Deep Learning Techniques for Biomedical and Health Informatics [1st ed. 2020] 978-3-030-33965-4, 978-3-030-33966-1

This book presents a collection of state-of-the-art approaches for deep-learning-based biomedical and health-related app


English Pages XXV, 383 [395] Year 2020


Table of contents :
Front Matter ....Pages i-xxv
Front Matter ....Pages 1-1
MedNLU: Natural Language Understander for Medical Texts (H. B. Barathi Ganesh, U. Reshma, K. P. Soman, M. Anand Kumar)....Pages 3-21
Deep Learning Based Biomedical Named Entity Recognition Systems (Pragatika Mishra, Sitanath Biswas, Sujata Dash)....Pages 23-40
Disambiguation Model for Bio-Medical Named Entity Recognition (A. Kumar)....Pages 41-55
Applications of Deep Learning in Healthcare and Biomedicine (Shubham Mittal, Yasha Hasija)....Pages 57-77
Deep Learning for Clinical Decision Support Systems: A Review from the Panorama of Smart Healthcare (E. Sandeep Kumar, Pappu Satya Jayadev)....Pages 79-99
Review of Machine Learning and Deep Learning Based Recommender Systems for Health Informatics (Jayita Saha, Chandreyee Chowdhury, Suparna Biswas)....Pages 101-126
Front Matter ....Pages 127-127
Deep Learning and Explainable AI in Healthcare Using EHR (Sujata Khedkar, Priyanka Gandhi, Gayatri Shinde, Vignesh Subramanian)....Pages 129-148
Deep Learning for Analysis of Electronic Health Records (EHR) (Pawan Singh Gangwar, Yasha Hasija)....Pages 149-166
Application of Deep Architecture in Bioinformatics (Sagnik Sen, Rangan Das, Swaraj Dasgupta, Ujjwal Maulik)....Pages 167-186
Intelligent, Secure Big Health Data Management Using Deep Learning and Blockchain Technology: An Overview (Sohail Saif, Suparna Biswas, Samiran Chattopadhyay)....Pages 187-209
Malaria Disease Detection Using CNN Technique with SGD, RMSprop and ADAM Optimizers (Avinash Kumar, Sobhangi Sarkar, Chittaranjan Pradhan)....Pages 211-230
Deep Reinforcement Learning Based Personalized Health Recommendations (Jayraj Mulani, Sachin Heda, Kalpan Tumdi, Jitali Patel, Hitesh Chhinkaniwala, Jigna Patel)....Pages 231-255
Using Deep Learning Based Natural Language Processing Techniques for Clinical Decision-Making with EHRs (Runjie Zhu, Xinhui Tu, Jimmy Huang)....Pages 257-295
Front Matter ....Pages 297-297
Diabetes Detection Using ECG Signals: An Overview (G. Swapna, K. P. Soman, R. Vinayakumar)....Pages 299-327
Deep Learning and the Future of Biomedical Image Analysis (Monika Jyotiyana, Nishtha Kesswani)....Pages 329-345
Automated Brain Tumor Segmentation in MRI Images Using Deep Learning: Overview, Challenges and Future (Minakshi Sharma, Neha Miglani)....Pages 347-383


Studies in Big Data 68

Sujata Dash · Biswa Ranjan Acharya · Mamta Mittal · Ajith Abraham · Arpad Kelemen   Editors

Deep Learning Techniques for Biomedical and Health Informatics

Studies in Big Data Volume 68

Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland

The series “Studies in Big Data” (SBD) publishes new developments and advances in the various areas of Big Data quickly and with high quality. The intent is to cover the theory, research, development, and applications of Big Data, as embedded in the fields of engineering, computer science, physics, economics and life sciences. The books of the series refer to the analysis and understanding of large, complex, and/or distributed data sets generated from recent digital sources coming from sensors or other physical instruments as well as simulations, crowd sourcing, social networks or other internet transactions, such as emails or video click streams, among others. The series contains monographs, lecture notes and edited volumes in Big Data spanning the areas of computational intelligence including neural networks, evolutionary computation, soft computing, fuzzy systems, as well as artificial intelligence, data mining, modern statistics and operations research, as well as self-organizing systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output.

Indexing: The books of this series are submitted to ISI Web of Science, DBLP, Ulrichs, MathSciNet, Current Mathematical Publications, Mathematical Reviews, Zentralblatt Math: MetaPress and Springerlink.

More information about this series at http://www.springer.com/series/11970


Editors

Sujata Dash, Department of Computer Science, North Orissa University, Takatpur, Odisha, India
Biswa Ranjan Acharya, School of Computer Science and Engineering, KIIT Deemed to University, Bhubaneswar, Odisha, India
Mamta Mittal, Computer Science and Engineering Department, G. B. Pant Government Engineering College, New Delhi, Delhi, India
Ajith Abraham, Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs, Auburn, AL, USA
Arpad Kelemen, Department of Organizational Systems and Adult Health, University of Maryland, Baltimore, MD, USA

ISSN 2197-6503
ISSN 2197-6511 (electronic)
Studies in Big Data
ISBN 978-3-030-33965-4
ISBN 978-3-030-33966-1 (eBook)
https://doi.org/10.1007/978-3-030-33966-1

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Overview

Biomedical and health informatics is an emerging field of research at the intersection of information science, computer science, and health care. Healthcare informatics and analytics brings tremendous opportunities and challenges because plentiful biomedical data are now readily available for analysis. The aim of healthcare informatics is to ensure high-quality, efficient health care, better treatment, and better quality of life by efficiently analyzing abundant biomedical and healthcare data, including patient data, electronic health records (EHRs), and lifestyle data. Earlier, a domain expert was commonly required to develop a model for a biomedical or healthcare problem; however, recent advances in representation learning algorithms (deep learning techniques) make it possible to learn the patterns and representations of the given data automatically when developing such a model. Deep learning methods use multiple levels of representation, in which the system learns a more abstract representation at each level. Deep learning based algorithms have demonstrated strong performance in a variety of areas, including computer vision, image processing, natural language processing, speech recognition, video analysis, and biomedical and health informatics. Deep learning approaches such as deep belief networks, convolutional neural networks, deep auto-encoders, and deep generative networks have emerged as powerful computational models. They have shown significant success in dealing with massive data for a large number of applications because of their ability to extract complex hidden features and learn efficient representations in an unsupervised setting. The book is intended to play a vital role in improving human life. Researchers and practitioners working in the fields of biomedical and health informatics and of deep learning should benefit from it.

This book is a collection of state-of-the-art approaches for deep learning based biomedical and health related applications. It will be beneficial for new researchers and practitioners in the field who want to quickly learn the best-performing methods. They will be able to compare different approaches and carry forward their research in this important area, which has a direct impact on the betterment of human life and health. The book is useful because no book currently on the market provides a good collection of state-of-the-art deep learning based models for biomedical and health informatics, as deep learning has emerged only recently and is still an immature field of research in biomedicine and health care. This book, Deep Learning Techniques for Biomedical and Health Informatics, aims to present discussions on various applications of deep learning to biomedical and health informatics problems and to suggest the latest research methodologies and emerging developments to benefit researchers and practitioners. In this volume, 49 researchers and practitioners of international repute present the latest research developments, current trends, state-of-the-art reports, case studies, and suggestions for further development in the field of biomedical and health informatics, and deep learning.

Objective

The purpose of this book is to report the latest advances and developments in the field of biomedical and health informatics, and deep learning. The book comprises the following three parts:

• Deep Learning for Biomedical Engineering and Health Informatics
• Deep Learning and Electronics Health Records
• Deep Learning for Medical Image Processing

Organization

There are 16 chapters in Deep Learning Techniques for Biomedical and Health Informatics. They are organized into three parts, as follows:

• Part One: Deep Learning for Biomedical Engineering and Health Informatics. This part focuses on deep learning paradigms and their application in biomedical and health informatics, clinical decision support systems, disease diagnosis and monitoring systems, and recommender systems for health informatics. There are six chapters in this part. The first chapter looks into the application of deep learning to healthcare data in tasks such as information and relation extraction. The second and third contributions focus on the discovery of biomedical named entities in biomedical text mining tasks using deep learning techniques. The fourth chapter introduces deep learning and developments in neural networks and then discusses its applications in healthcare and its relevance to biomedical informatics and computational biology research in the public health domain. The fifth chapter discusses various existing deep learning techniques and their applications for decision support in clinical systems. The sixth chapter discusses the challenges and issues of health recommender systems.

• Part Two: Deep Learning and Electronics Health Records. The second part comprises seven chapters. The first contribution discusses the design and implementation of an explainable deep learning system for healthcare using EHR. The second chapter reviews deep learning strategies for the analysis of, and inference from, EHR data. The third contribution focuses on the extensive application of deep learning in many domains, including bioinformatics, for the analysis and classification of biomedical imaging data, omics sequence data, and biomedical signals. The fourth chapter discusses advanced distributed security techniques such as blockchain for protecting health data from unauthorized access. The fifth contribution presents CNN-based classification for malaria, which classifies blood films as infected or normal. The sixth chapter presents a deep reinforcement learning based approach to complete healthcare recommendations, including medicines to take, doctors to consult, nutrition to acquire, and activities to perform, such as exercises and preferable sports. The seventh contribution presents the advantages of deep learning techniques for text-based extraction and retrieval.

• Part Three: Deep Learning for Medical Image Processing. There are three chapters in this part. The first chapter discusses several deep learning architectures that can be used effectively for HRV signal analysis to detect diabetes. The second chapter discusses the issues and challenges of DL approaches for analysing biomedical images and their application to classification, registration and segmentation. The last chapter gives an overview of deep learning-based segmentation algorithms, with special reference to brain tumor classification, along with the associated challenges and future scope.

Target Audiences

The current volume is a reference text aimed at a number of potential audiences, including the following:

• Researchers in this field who wish to have up-to-date knowledge of current practice, mechanisms, and research developments.
• Students and academicians in the biomedical and informatics fields who are interested in further enhancing their knowledge of current developments.
• Industry professionals and people from technical institutes and R&D organizations working in the fields of machine learning, deep learning, biomedical engineering, health informatics, and related areas.

Sujata Dash, Baripada, Odisha, India
Biswa Ranjan Acharya, Bhubaneswar, Odisha, India
Mamta Mittal, New Delhi, India
Ajith Abraham, Auburn, AL, USA
Arpad Kelemen, Baltimore, MD, USA

Acknowledgements

The editors would like to acknowledge the help of all the people involved in this project and, more specifically, the reviewers who took part in the review process. Without their support, this book would not have become a reality. First, the editors would like to thank each one of the authors for their time, contribution, and understanding during the preparation of the book. Second, the editors wish to acknowledge the valuable contributions of the reviewers regarding the improvement of quality, coherence, and content presentation of chapters. Last but not least, the editors wish to acknowledge the love, understanding, and support of their family members during the preparation of the book.

Sujata Dash, Baripada, Odisha, India
Biswa Ranjan Acharya, Bhubaneswar, Odisha, India
Mamta Mittal, New Delhi, India
Ajith Abraham, Auburn, AL, USA
Arpad Kelemen, Baltimore, MD, USA


Contents

Deep Learning for Biomedical Engineering and Health Informatics

MedNLU: Natural Language Understander for Medical Texts
H. B. Barathi Ganesh, U. Reshma, K. P. Soman and M. Anand Kumar

Deep Learning Based Biomedical Named Entity Recognition Systems
Pragatika Mishra, Sitanath Biswas and Sujata Dash

Disambiguation Model for Bio-Medical Named Entity Recognition
A. Kumar

Applications of Deep Learning in Healthcare and Biomedicine
Shubham Mittal and Yasha Hasija

Deep Learning for Clinical Decision Support Systems: A Review from the Panorama of Smart Healthcare
E. Sandeep Kumar and Pappu Satya Jayadev

Review of Machine Learning and Deep Learning Based Recommender Systems for Health Informatics
Jayita Saha, Chandreyee Chowdhury and Suparna Biswas

Deep Learning and Electronics Health Records

Deep Learning and Explainable AI in Healthcare Using EHR
Sujata Khedkar, Priyanka Gandhi, Gayatri Shinde and Vignesh Subramanian

Deep Learning for Analysis of Electronic Health Records (EHR)
Pawan Singh Gangwar and Yasha Hasija

Application of Deep Architecture in Bioinformatics
Sagnik Sen, Rangan Das, Swaraj Dasgupta and Ujjwal Maulik

Intelligent, Secure Big Health Data Management Using Deep Learning and Blockchain Technology: An Overview
Sohail Saif, Suparna Biswas and Samiran Chattopadhyay

Malaria Disease Detection Using CNN Technique with SGD, RMSprop and ADAM Optimizers
Avinash Kumar, Sobhangi Sarkar and Chittaranjan Pradhan

Deep Reinforcement Learning Based Personalized Health Recommendations
Jayraj Mulani, Sachin Heda, Kalpan Tumdi, Jitali Patel, Hitesh Chhinkaniwala and Jigna Patel

Using Deep Learning Based Natural Language Processing Techniques for Clinical Decision-Making with EHRs
Runjie Zhu, Xinhui Tu and Jimmy Huang

Deep Learning for Medical Image Processing

Diabetes Detection Using ECG Signals: An Overview
G. Swapna, K. P. Soman and R. Vinayakumar

Deep Learning and the Future of Biomedical Image Analysis
Monika Jyotiyana and Nishtha Kesswani

Automated Brain Tumor Segmentation in MRI Images Using Deep Learning: Overview, Challenges and Future
Minakshi Sharma and Neha Miglani

Editors and Contributors

About the Editors

Sujata Dash received her Ph.D. in computational modeling from Berhampur University, Orissa, India, in 1995. She is Associate Professor in the P.G. Department of Computer Science and Application, North Orissa University, Baripada, India. She has published more than 150 technical papers in international journals, conferences, and chapters of reputed publications. She has guided many scholars for their Ph.D. in computer science. She is associated with many professional bodies such as IEEE, CSI, ISTE, OITS, OMS, IACSIT, IMS, and IAENG. She is on the editorial board of several international journals and is also a reviewer for many international journals. Her current research interests include Machine Learning, Distributed Data Mining, Bioinformatics, Intelligent Agent, Web Data Mining, Recommender System, and Image Processing.

Biswa Ranjan Acharya is an academic currently associated with Kalinga Institute of Industrial Technology Deemed to be University, and is pursuing a Ph.D. in computer application at Veer Surendra Sai University of Technology (VSSUT), Burla, Odisha, India. He received an MCA in 2009 from IGNOU, New Delhi, India, and an M.Tech. in Computer Science and Engineering in 2012 from Biju Patnaik University of Technology (BPUT), Odisha, India. He is also associated with various educational and research societies such as IEEE, IACSIT, CSI, IAENG, and ISC. He has a total of 10 years of experience across academia, at reputed universities such as Ravenshaw University, and the software development field, including 2 years of industry experience as a software engineer. His current research is on multiprocessor scheduling, along with fields such as Data Analytics, Computer Vision, Machine Learning, and IoT. He has published research articles in reputed international journals and also serves as a reviewer.


Mamta Mittal graduated in computer engineering from Kurukshetra University, Kurukshetra, in 2001 and received a master's degree (Honors) in computer engineering from YMCA, Faridabad. She holds a Ph.D. in computer engineering from Thapar University, Patiala, and has more than 16 years of experience. Presently, she is working at G. B. Pant Government Engineering College, Okhla, New Delhi (under Government of NCT Delhi), and supervising Ph.D. candidates of GGSIPU, New Delhi. She is working on the DST-approved project “Development of IoT-based hybrid navigation module for mid-sized autonomous vehicles.” She has published many SCI/SCIE/Scopus indexed papers and is a book editor for renowned publishers.

Ajith Abraham currently works as Director of Machine Intelligence Research Labs (MIR Labs), which has members from more than 100 countries. Dr. Abraham’s research and development experience includes more than 27 years in industry and academia. He received an M.S. from Nanyang Technological University (NTU), Singapore, and a Ph.D. in Computer Science from Monash University, Melbourne, Australia. He works in a multi-disciplinary environment involving machine (network) intelligence, cyber security, sensor networks, Web intelligence, scheduling, and data mining, applied to various real-world problems. He has given more than 100 conference plenary lectures/tutorials and invited seminars/lectures in over 100 universities around the globe.

Arpad Kelemen is Professor of informatics at the University of Maryland, Baltimore. He has expertise in biomedical informatics, human–computer interaction, game development for education and self-management, data mining, machine learning, artificial intelligence, intelligent patient care technologies, and software and healthcare database development. He has published over 60 peer-reviewed research articles and two books, and has served as PI, Co-PI, and Co-I for multiple grants from NSF, NIH, HRSA, and New York State Foundation for Science, Technology, and Innovation. Dr. Kelemen holds a Ph.D. in computer science from the University of Memphis and M.S. and B.S. degrees from the University of Szeged, Hungary.

Contributors

M. Anand Kumar, Department of Information Technology, National Institute of Technology Karnataka, Surathkal, India
H. B. Barathi Ganesh, Amrita School of Engineering, Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India
Sitanath Biswas, North Orissa University, Baripada, India
Suparna Biswas, Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Kolkata, West Bengal, India
Samiran Chattopadhyay, Department of Information Technology, Jadavpur University, Kolkata, West Bengal, India
Hitesh Chhinkaniwala, Adani Institute of Infrastructure Engineering, Ahmedabad, India
Chandreyee Chowdhury, Computer Science and Engineering, Jadavpur University, Kolkata, India
Rangan Das, Department of Computer Science and Engineering, Jadavpur University, Jadavpur, Kolkata, India
Swaraj Dasgupta, Department of Computer Science and Engineering, Jadavpur University, Jadavpur, Kolkata, India
Sujata Dash, North Orissa University, Baripada, India
Priyanka Gandhi, Department of Computer Engineering, VESIT, Mumbai, India
Pawan Singh Gangwar, Delhi Technological University, Delhi, India
Yasha Hasija, Delhi Technological University, Delhi, India
Sachin Heda, Department of Computer Science and Engineering, Institute of Technology Nirma University, Ahmedabad, India
Jimmy Huang, Information Retrieval and Knowledge Management Research Lab, School of Information Technology, York University, Toronto, Canada
Monika Jyotiyana, Central University of Rajasthan, Bandar Sindri, Ajmer, India
Nishtha Kesswani, Central University of Rajasthan, Bandar Sindri, Ajmer, India
Sujata Khedkar, Department of Computer Engineering, VESIT, Mumbai, India
A. Kumar, Department of Computer Science and Engineering, National Institute of Technology Raipur, Raipur, Chhattisgarh, India
Avinash Kumar, School of Computer Engineering, KIIT DU, Bhubaneswar, India
Ujjwal Maulik, Department of Computer Science and Engineering, Jadavpur University, Jadavpur, Kolkata, India
Neha Miglani, Department of Computer Engineering, National Institute of Technology, Kurukshetra, India
Pragatika Mishra, Gandhi Institute for Technology, Bhubaneswar, India
Shubham Mittal, Delhi Technological University, Delhi, India
Jayraj Mulani, Department of Computer Science and Engineering, Institute of Technology Nirma University, Ahmedabad, India
Jigna Patel, Department of Computer Science and Engineering, Institute of Technology Nirma University, Ahmedabad, India
Jitali Patel, Department of Computer Science and Engineering, Institute of Technology Nirma University, Ahmedabad, India
Chittaranjan Pradhan, School of Computer Engineering, KIIT DU, Bhubaneswar, India
U. Reshma, Arnekt Solutions Pvt. Ltd., Magarpatta City, Pune, Maharashtra, India
Jayita Saha, Computer Science and Engineering, Jadavpur University, Kolkata, India
Sohail Saif, Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Kolkata, West Bengal, India
E. Sandeep Kumar, Department of Telecommunication Engineering, M.S. Ramaiah Institute of Technology, Bengaluru, India
Sobhangi Sarkar, School of Computer Engineering, KIIT DU, Bhubaneswar, India
Pappu Satya Jayadev, Department of Electrical Engineering, IIT Madras, Chennai, India
Sagnik Sen, Department of Computer Science and Engineering, Jadavpur University, Jadavpur, Kolkata, India
Minakshi Sharma, Department of Computer Engineering, National Institute of Technology, Kurukshetra, India
Gayatri Shinde, Department of Computer Engineering, VESIT, Mumbai, India
K. P. Soman, Amrita School of Engineering, Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India
Vignesh Subramanian, Department of Computer Engineering, VESIT, Mumbai, India
G. Swapna, Amrita School of Engineering, Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India
Xinhui Tu, School of Computer Science, Central China Normal University, Wuhan, China
Kalpan Tumdi, Department of Computer Science and Engineering, Institute of Technology Nirma University, Ahmedabad, India
R. Vinayakumar, Amrita School of Engineering, Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India
Runjie Zhu, Information Retrieval and Knowledge Management Research Lab, Department of Electrical Engineering and Computer Science, York University, Toronto, Canada

Abbreviations

AD AdaGrad ADC ADHD ADNI AE AES AFLC AHE AI AMBE ANFIS ANN ANS Anti-CPP ApEn AR AUC AV BBHE BCHC BERT BETA Bi-LSTM BiLM Bio-NER BMESO BMI BOE BoW

Alzheimer’s disease Adaptive gradient algorithm Analog-to-digital converter Attention-deficit hyperactivity disorder Alzheimer’s disease neuroimaging initiative Auto-encoders Advanced Encryption Standard Adaptive fuzzy leader clustering algorithm Adaptive histogram equalization Artificial intelligence Absolute mean brightness error Adaptive neuro-fuzzy inference system Artificial neural network Autonomic nervous system Anti-cyclic citrullinated peptide Approximate entropy Autoregressive Area under curve Atrioventricular Bi-histogram equalization Birmingham Community Healthcare Bidirectional Encoder Representations From Transformer Blackbox Explanations Using Transparent Approximations Bidirectional long short-term memory Bidirectional language model Biomedical named entity recognition B-Begin M-Middle E-End S-Single O-Outside Body mass index Bag of events Bag of words

xix

xx

BRCA1 BUN C4.5 CAD CAMDM CAN CAT CBIR CBoW CD CDBN CDC CDP CDSS CE CEC CGMS CLAHE CM CNN CP CPI CPT CRF CRP CRPS CSF CSG CT CUIs DAE DAG DBM DBN DCNNs DDI DES DET DFA DIARETDB1 DL DM DNA DNN

Abbreviations

Breast cancer gene type 1 Blood urea nitrogen A decision tree algorithm Computer-aided design Computer-aided medical decision making Cardiovascular autonomic neuropathy Computerized axial tomography Content-based image retrieval Continuous bag of words Correlation dimension Convolutional deep belief networks Centers For Disease Control And Prevention Code On Dental Procedures And Nomenclature Clinical decision support system Character embedding Constant error carousel Continuous glucose monitoring system Contrast-limited adaptive histogram equalization Confusion matrix Convolution neural network Clinical predictions Compound–protein interaction Current procedural terminology Conditional random field C-reactive protein Continuous ranked probability score Cerebrospinal fluid Continuous skip-gram Computed tomography Concept unique identifiers Denoising auto-encoders Directed acyclic graph Deep Boltzmann machine Deep belief network Deep convolutional neural networks Drug–drug interaction Data Encryption Standard Determinism Detrended fluctuation analysis Diabetic Retinopathy Database Deep learning Diabetes mellitus Deoxyribonucleic acid Deep neural network

Abbreviations

DQN DRG DRL DRMM EBV ECG ED EE EEG EHR EI ELMO EM E-Mail EMD EMR EPS ESR ESRD FCM FDA FFT FHE FIS FITBIR FN FP GAN GBDT GBT GLoVe GM GPS GPUs GRAM GRU GSN HAR HbA1c HCPCS HCUP HDL HE HER HF

xxi

Deep Q network Diagnostic related grouping Deep reinforcement learning Deep relevance matching model Epstein–Barr virus Electrocardiogram Encoder–decoder Energy expenditure Electroencephalogram Electronic health records Extended intelligence Embeddings from language models Expectation maximization Electronic mail Empirical mode decomposition Electronic Health Records Epsilon Erythrocyte sedimentation rate End-stage renal disease Fuzzy c-means Food And Drug Administration Fast Fourier transform Fuzzy logic-based histogram equalization Fuzzy inference system The Federal Interagency Traumatic Brain Injury Research False negative False positive Generative adversarial network Gradient boosting decision trees Gradient boosting tree Global vector Gray matter Global Positioning System Graphical processing units Graph-based attention model Gated recurrent unit Generative stochastic network Human activity recognition Hemoglobin A1c Healthcare Common Procedure Coding System Healthcare Cost And Utilization Project High-density lipoproteins Histogram equalization Hindsight experience replay Heart failure

xxii

HIN HMD HMM HOS HPI HRS HRV i2b2 IBL ICD ICD9 ID IDPs IDRs IE IMA IMF IOB IoT IR JSON KNN KPCA LA LAM LDA LDL LIDC LIME LL LoG LSDB LSTM RNN LSTM LV MCEMJ MDF medGAN MEMM MICCAI MIDAS MIL MILA MiME MIMIC

Abbreviations

Heterogeneous information network Human Mortality Database Hidden Markov model Higher-order spectrum History of patient illness Health recommender systems Heart rate variability Informatics For Integrating Biology and The Bedside Instance-based learning International Classification of Diseases International Classification of Diseases 9 Identifier Intrinsically disordered proteins Intrinsically disordered regions Information extraction Indian Medical Association Intrinsic mode function Inside—Outside—Beginning Internet of things Information retrieval JavaScript Object Notation K-nearest neighbor Kernel principal component analysis Left arm Laminarity Linear discriminant analysis Low-density lipoproteins Lung Image Database Consortium Dataset Local interpretable model Left leg Laplacian of Gaussian Locus-specific databases Long short-term Memory RNN Long short-term memory Left ventricle Medical Concept Embeddings From Medical Journals Markov decision process Medical Generative Adversarial Network Maximum entropy Markov model Medical image computing and computer-assisted intervention The Multimedia Medical Archiving System Multi-instance learning Montreal Institute For Learning Algorithms Multilevel Medical Embedding Medical Information Mart For Intensive Care

MinPts: Minimum points
MiRNA: Micro-ribonucleic acid
ML: Machine learning
MLEE: Multilevel event extraction
MLP: Multilayer perceptron
MRI: Magnetic resonance imaging
MRNA: Messenger ribonucleic acid
MSE: Mean square error
MTL: Multi-task learning
MTM: Multi-task model
NDC: National Drug Codes
NDD: Neurodegenerative disorders
NEC: Named entity classification
NED: Named entity detection
NER: Named entity recognition
NGS: Next-generation sequencing
NIHCC: National Institute of Health Clinical Centre
NLM: National Library of Medicine
NLP: Natural language processing
NLU: Natural language understanding
NMS: Non-maxima suppression
NN: Neural network
NNE: Non-named entities
NO: Nitric oxide
NP: Noun phrases
OAI: Osteoarthritis initiative
OASIS: Open Access Series of Imaging Studies
OGTT: Oral glucose tolerance test
PCA: Principal component analysis
PD: Parkinson’s disease
PDA: Personal digital assistant
PET: Positron emission tomography
PHQ: Patient Health Questionnaire
PII: Personally identifiable information
PINN: Pairwise input neural network
PNS: Parasympathetic nervous system
POMDP: Partially observed Markov decision process
POS: Part of speech
PoW: Proof of work
PP: Prepositional phrases
PPG: Photoplethysmography
PPI: Protein–protein interaction
PSD: Power spectrum density
PSNR: Peak signal–noise ratio

QoS: Quality of service
QSAR: Quantitative structure-activity relationship
RA: Right arm
RBM: Restricted Boltzmann machine
RCNNs: Region convolutional neural networks
RE: Relation extraction
ReLU: Rectified linear unit
RF: Random forest
RL: Representation learning
RMSE: Recursive mean separate histogram equalization
RMSProp: Root mean square propagation
RNA: Ribonucleic acid
RNN: Recurrent neural network
RoI: Region of interest
RP LIME: Random pick local interpretable model
RQA: Recurrence quantification analysis
RR: Recurrence rate
RS: Recommender systems
RSA: Rivest–Shamir–Adleman
RSNA: Radiological Society of North America
SA: Sinoatrial
SAE: Sparse auto-encoders
SampEn: Sample entropy
SBE: Surrounding-based embedding feature
SCR: Summary care records
SDP: Shortest dependency path
SEER: Survival Epidemiology And End Results Program
SGD: Stochastic gradient descent
SHMS: Smart healthcare monitoring system
SIFT: Scale-invariant feature transform
SiRNA: Small interfering ribonucleic acid
SMS: Short message service
SNS: Sympathetic nervous system
SP LIME: Selective pick local interpretable model
SPECT: Single-photon emission computed tomography
SPPMI: Shifted positive pointwise mutual information
SQL: Structured query language
SRL-RNN: Supervised reinforcement learning with recurrent neural network
SSIM: Structural similarity index mean
STARE: Structured analysis of the retina
SVD: Singular value decomposition
SVM: Support vector machine
TCIA: The Cancer Imaging Archive
TD: Temporal difference

TE: Tone entropy
TG: Triglycerides
TN: True negative
TP: True positive
t-SNE: T-distributed stochastic neighbor embedding
TT: Trapping time
UCI: University of California, Irvine
UMLS: Unified medical language system
UQI: Universal quality index
USF: University of Southern California
UTI: Urinary tract infection
VAE: Variational auto-encoders
VEGF: Vascular endothelial growth factor
VGG: Visual geometry group
VHL: Von-Hippel–Lindau Illness
VIA: Visual and image analysis
VP: Verb phrases
WBAN: Wireless body area network
WBCD: Wisconsin Breast Cancer Dataset
WE: Word embedding
WHO: World Health Organization
WM: White matter
XML: Extensible Markup Language

Deep Learning for Biomedical Engineering and Health Informatics

MedNLU: Natural Language Understander for Medical Texts H. B. Barathi Ganesh, U. Reshma, K. P. Soman and M. Anand Kumar

Abstract Natural Language Understanding is one of the essential tasks for building clinical text-based applications. Understanding of clinical texts can be achieved through Vector Space Models and sequential modelling tasks. This paper focuses on sequential modelling, i.e. Named Entity Recognition and Part of Speech Tagging, and attains state-of-the-art performance with an F1 score of 93.8% on the i2b2 clinical corpus and an F1 score of 97.29% on the GENIA corpus. The paper also reports the performance of feature fusion, which integrates word embedding, feature embedding and character embedding for sequential modelling tasks. We also propose a framework based on a sequential modelling architecture, named MedNLU, which can perform Part of Speech Tagging, Chunking and Entity Recognition on clinical texts. The sequence modeler in MedNLU is an integrated framework of a Convolutional Neural Network, Conditional Random Fields and a Bi-directional Long Short-Term Memory network.

H. B. Barathi Ganesh (corresponding author) · K. P. Soman: Amrita School of Engineering, Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India
U. Reshma: Arnekt Solutions Pvt. Ltd., Pentagon P-3, Magarpatta City, Pune, Maharashtra, India
M. Anand Kumar: Department of Information Technology, National Institute of Technology Karnataka, Surathkal, India
© Springer Nature Switzerland AG 2020. S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics, Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_1

1 Introduction

Medical fields generate digital data in the form of clinical reports (structured or semi-structured data), raw data, and the data that consumers and patients generate on social media platforms. No single individual can acquire and maintain the knowledge needed to comprehend the entirety of these data. This creates the need for Natural Language Processing (NLP), one of the subfields of Artificial Intelligence. MedNLU is a framework that addresses the challenges involved in understanding the information hidden in digital health data. The framework acts as a fundamental component in many healthcare applications that require natural language processing and understanding. MedNLU draws on subfields of Artificial Intelligence such as NLP, conventional Machine Learning and Deep Learning. It takes healthcare texts or documents as input and outputs the tokenized text, chunked text, parsed text, entities and Part of Speech (POS) tags associated with the medical text. Using these entities and POS tags, a knowledge base can be built, which can be used for database management and conversational systems. The components of the MedNLU framework are given in Fig. 1.

Fig. 1 MedNLU framework

Most health documents are produced using Electronic Health Records (EHRs), which include records of the patient's family history, reason for the initial complaint, diagnosis and treatment, prescription medication, lab tests and results, record of visits, administrative and billing data, patient demographics, progress notes, vital signs, medical histories, immunization dates, allergies, radiology images and so on. Almost all the details of a patient are readily available to clinicians and physicians at any point of time. With the help of NLP in healthcare, end systems that required human assistance to re-check documents have been replaced by NLP-based systems. Hospitals around the world have already started using NLP on a daily basis. Extracting information from these records helps to develop applications such as decision support systems, adverse drug reaction identification, pharmacovigilance, effective management of pharmacokinetics, patient cohort identification, and effective EMR development and maintenance. The required information is mostly extracted through NLU tasks such as Named Entity Recognition (NER), Part of Speech (POS) tagging and Chunking [1–4].

So far, the medical domain has mostly used NLU based on rule-based methodology [5, 6]. A rule-based system is a set of hand-coded rules for extracting valuable information from medical data. Depending on the knowledge to be extracted from each set of documents, rules were framed to suit the structure of each document; in short, document-specific rules were used for knowledge extraction. Algorithm-driven models came somewhat later and reduced the workload of manually encoding the data [7, 8]. Recently, researchers have moved to applying Deep Learning to healthcare data in tasks such as NER and relation extraction [9–11]. Following this trend, we report the performance of feature fusion, which integrates word embedding, feature embedding and character embedding for sequential modelling tasks. The sequence modeler in MedNLU is an integrated framework of a Convolutional Neural Network (CNN), a Long Short-Term Memory (LSTM) network and Conditional Random Fields (CRF). The proposed framework, named MedNLU, can perform Named Entity Recognition, Part of Speech Tagging, Parsing and Chunking on clinical texts. This experiment also shows that, without a domain-specific word embedding model, the sequence model architecture attains state-of-the-art performance using word embeddings developed from general English text.

2 Related Works

Efficient computation of dense word vectors with word2vec [13], together with downstream models built on them [12], has had a large impact on research groups seeking to exploit the big data available in the healthcare domain. Common NLP problems [15] such as text classification (for example, sentiment analysis), text summarization, Information Extraction (IE) [14] and Information Retrieval (IR) have consequently started using word embeddings, and healthcare and bioinformatics have followed. Healthcare problems such as Relation Extraction (RE), Named Entity Recognition (NER) [1], drug-disease interaction, medical synonym extraction [2] and chemical-disease relation extraction are receiving special attention. Embedding models have mostly been trained either on small closed-set corpora or on large general corpora such as Google News and Wikipedia [16]. These models cannot be used directly, since clinical texts contain more clinical words than general words and do not follow general grammar patterns.

Once word vectors are computed, different methodologies are used to evaluate the word embedding models. Context-predicting and context-counting semantic vectors are among them: the relation between the data and the different parameters is measured on lexical semantic tasks to evaluate the word embedding model [17]. Context-predicting models are chosen over count-based models because they give better results. Landauer Thomas [18] used Latent Semantic Analysis for indirect knowledge acquisition from text, analysing similarities in the space derived from local co-occurrence. Turney [19] used unsupervised vectors for classification problems in analogy tasks, and many others have tried to modify this unsupervised way of learning for text applications [20]. In the bioinformatics domain, word embeddings were assessed by Pakhomo [12].

Because of restrictions on the use of clinical texts (HIPAA), much less work is available on clinical POS tagging. Pestian et al. [3] reported the POS annotation of 390,000 pediatric sequences from text at Cincinnati Children's Medical Centre. With a special lexicon added to the tagger word list, a tagger comparable to dTagger reached an accuracy of 91.5% after training, but neither the tagger nor the corpus was made available. Liu et al. [4, 21] developed sampling methods to reduce the amount of clinical text annotation needed when co-training a POS tagger with the WSJ corpus. When one of the sampling methods was evaluated on tagging pathology reports, the training data was reduced by 84% while giving an accuracy of 92.7%. Because of domain constraints, neither the annotated corpus nor the trained tagger was available to the research community. Mayo Clinic in Rochester, Minnesota developed the MED corpus [4], which has 100,650 POS-tagged tokens from 273 clinical notes. An accuracy of 93.6% on the clinical notes was achieved when the annotations were pooled with GENIA and POS-tagged corpora [22]. Even though clinical text corpora are unavailable, Mayo Clinic released the biomedical NLP package cTAKES [6], which provides a fully established tagger as a pre-trained reusable model.

The classic NER methods were dictionary-based and rule-based approaches [5], which required domain expertise to devise proper rules. Earlier researchers working on named entity recognition mostly proposed conventional machine learning approaches, or a combination of conventional machine learning and rule-based approaches. In [23], different supervised and semi-supervised machine learning algorithms were used for NER, concentrating on domain-dependent attributes and specialized text features. Hybrid models that combine Conditional Random Fields (CRF) and Support Vector Machines (SVM) with different pattern matching rules gave better output, as shown in [7]. In [8], combining pre-processing techniques such as annotation and truecasing with CRF-based NER appeared to improve concept extraction performance. The top-performing models in the i2b2 challenge employed CRF and semi-Markov Hidden Markov Models (HMM), with an F-score of 0.85 in the shared task. The best performing system in the 2010 i2b2/VA challenge [24] used Brown clustering to derive unsupervised feature representations from unlabelled corpora, combined with a semi-supervised HMM algorithm. The one-hot unsupervised word feature representation from Brown clustering does not capture multiple-aspect relations between words. Jonnalagadda [25] therefore proposed improving clinical entity recognition by including distributional word representations from a random indexing model. Word embeddings obtained from the English Wikipedia corpus have also been applied to different NER tasks [25], which turned out to be a successful approach. A CRF-based concept extraction system [26] gained performance through binarized word embeddings obtained from domain-specific corpora.

With the advent of deep learning, a subset of machine learning, unparalleled results were obtained for vision, NER and speech. Neural networks learn features automatically, which reduces the manual effort that machine learning previously required and gives them an advantage over conventional machine learning algorithms. Researchers have now started applying deep learning algorithms to healthcare data in tasks such as NER and relation extraction [9–11].

3 Methodology

The neural network architecture used in our implementation has multiple components; the architecture of the entire process is depicted in Fig. 2. Text representation is the first and foremost step in any natural language understanding task, as it sets the stage for the performance of the subsequent machine learning or deep learning algorithm. We transform the input sentence into a vector representation that combines three attributes: word embedding, character embedding and a feature vector. The character embeddings are computed through a CNN using the methodology described in [27]. In this experiment we used a domain-specific word embedding model developed from the Journal of Medical Case Reports (Health Embedding), and we also experimented with word embeddings from Google (Google Embedding). These three vectors are fused and fed to a Bi-LSTM network, followed by a CRF or SoftMax layer that makes the final prediction. The developed health embeddings are evaluated through both qualitative and quantitative methods.

3.1 Word Embedding

Word embedding captures the contextual meaning of words in terms of a low-dimensional vector. A word vector should clearly represent the distribution of adjacent words around the current word. This approach of representing words has helped achieve state-of-the-art performance on many challenging natural language processing tasks. The two major models for learning word embeddings are the Continuous Bag-of-Words (CBoW) model, which learns the representation of the current word from its adjacent words (the context), and the Continuous Skip-Gram model, which learns by predicting the adjacent words given the current word [13, 28].

Fig. 2 Sequencing modeler

In our experiments we employed the CBoW model for word embeddings. The input layer consists of the context words $x_1, \ldots, x_S$ within a word window of size $S$, drawn from a vocabulary of size $V$. This input is passed to the hidden layer $h$, which is an $N$-dimensional vector. Finally, the output $y$ is a one-hot encoded word from the training examples. The input layer is connected to the hidden layer via a $V \times N$ weight matrix $W$, and the hidden layer is connected to the output layer via an $N \times V$ weight matrix $W_t$. The forward pass first computes the output of the hidden layer $h$ as follows:

$$h = \frac{1}{S} W^{\top} \sum_{i=1}^{S} x_i \tag{1}$$

Finally, the output is computed as:

$$y_j = P(w_1, \ldots, w_c) = \frac{\exp(u_j)}{\sum_{j'=1}^{V} \exp(u_{j'})} \tag{2}$$

where $u_j$ is the input to the $j$-th unit of the output layer. This forward pass is followed by backward propagation, in which the model learns its parameters, the weight matrices $W$ and $W_t$. The weight matrices are initialized with random values. The cost function (E), which is the conditional probability of the output word given the input words, is computed using the training examples fed to the model. Our objective is to maximize this conditional probability, which is equivalent to minimizing the negative log probability. The final objective function can be written as:

$$\text{minimize } J = -\log P(w_1, \ldots, w_c) \tag{3}$$

The optimization procedure computes the gradient of the objective function with respect to the unknown parameters [14]. The parameters are then updated at each iteration using Stochastic Gradient Descent.
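The forward pass and objective described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `cbow_forward` and `cbow_loss`, the toy sizes and the random initialization are hypothetical and are not taken from the authors' implementation, and the hidden-to-output matrix is called `W_t` to mirror the text.

```python
import math
import random

def cbow_forward(context_ids, W, W_t):
    """Forward pass of CBoW for one training example.

    W   : V x N input-to-hidden matrix (list of V rows of length N)
    W_t : N x V hidden-to-output matrix (list of N rows of length V)
    """
    V, N = len(W), len(W[0])
    S = len(context_ids)
    # Eq. (1): for a one-hot input, the projection selects one row of W,
    # so h is the average of the rows belonging to the context words.
    h = [sum(W[i][k] for i in context_ids) / S for k in range(N)]
    # Score u_j for every word in the vocabulary.
    u = [sum(h[k] * W_t[k][j] for k in range(N)) for j in range(V)]
    # Eq. (2): softmax, shifted by max(u) for numerical stability.
    m = max(u)
    exp_u = [math.exp(x - m) for x in u]
    total = sum(exp_u)
    y = [e / total for e in exp_u]
    return h, y

def cbow_loss(y, target_id):
    # Eq. (3): negative log probability of the true output word.
    return -math.log(y[target_id])

random.seed(0)
V, N = 6, 3  # toy sizes, chosen only for illustration
W = [[random.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]
W_t = [[random.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(N)]
h, y = cbow_forward([0, 2, 3, 5], W, W_t)
loss = cbow_loss(y, target_id=1)
```

Training would repeat this forward pass and update `W` and `W_t` along the negative gradient of the loss, as in the Stochastic Gradient Descent step mentioned above.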

3.2 Character Embedding

The character-level representation of words is extracted using a Convolutional Neural Network (CNN). The CNN helps extract morphological information from the characters of a word and transforms it into neural encodings. Earlier research has shown that CNNs are a prominent approach for mining prefix (first n characters) and suffix (last n characters) information from the characters of a word and representing it as a lower-dimensional vector called the character embedding. Figure 2 shows the CNN architecture used to mine the character-level vector representation of a given word. This architecture is similar to the one proposed by Chiu et al. [27], except that we omit the character type features: in this experiment only the character embeddings are used as inputs to the CNN. An overview of the architecture for extracting character embeddings using the CNN is given in Fig. 3.
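A minimal sketch of the idea, assuming a one-dimensional convolution over character embeddings followed by max-over-time pooling, may make the mechanism concrete. The filter width of 3, the character embedding size of 4 and the random weights are illustrative assumptions; only the 30-dimensional output follows the size reported later in the chapter. The helper name `char_cnn` is hypothetical.

```python
import random

def char_cnn(word, char_emb, filters, width=3):
    """Convolve over a word's character embeddings and max-pool over time."""
    dim = len(next(iter(char_emb.values())))
    pad = [0.0] * dim
    # Look up each character; characters without an embedding map to zeros.
    rows = [char_emb.get(c, pad) for c in word]
    # Pad both ends so that even a one-character word yields a window.
    rows = [pad] * (width - 1) + rows + [pad] * (width - 1)
    out = []
    for f in filters:  # each filter is a flat list of width * dim weights
        best = float("-inf")
        for start in range(len(rows) - width + 1):
            window = [v for r in rows[start:start + width] for v in r]
            activation = sum(w * v for w, v in zip(f, window))
            best = max(best, activation)  # max-over-time pooling
        out.append(best)
    return out

random.seed(0)
DIM, N_FILTERS, WIDTH = 4, 30, 3  # DIM and WIDTH are assumed values
alphabet = "abcdefghijklmnopqrstuvwxyz"
char_emb = {c: [random.uniform(-1, 1) for _ in range(DIM)] for c in alphabet}
filters = [[random.uniform(-1, 1) for _ in range(WIDTH * DIM)]
           for _ in range(N_FILTERS)]
vec = char_cnn("dyspnea", char_emb, filters, WIDTH)
```

Because each filter slides over every window of neighbouring characters, windows at the start and end of the word let the pooled vector respond to prefixes and suffixes.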

3.3 Feature Vector

The feature vector is a one-hot encoded vector that represents 7 categorical attributes: start case, uppercase, lowercase, all numeric, partially numeric, contains digit, and others. The final input vector is the concatenation of the character embedding (vector for character representation), the word embedding (vector for word representation) and the feature vector (categorical attributes). We call this concatenation Feature Fusion. Word embeddings from the pre-trained Google News vectors were used in one setup, while in the other we trained our own embeddings on healthcare data. These healthcare embeddings seem to work better than the pre-trained embeddings. In our experiments, we observed that the implementation using feature fusion yields better results than word embeddings alone.
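The feature vector and the fusion step can be sketched as follows. The chapter names the seven categories but does not define them precisely, so the rules in `casing_category` are assumptions, in particular the "more than half digits" threshold for partially numeric; the function names are hypothetical, and the vector sizes (30, 300 and 7) are those reported in the experiments section.

```python
CATEGORIES = ["start_case", "uppercase", "lowercase", "all_numeric",
              "partially_numeric", "contains_digit", "other"]

def casing_category(word):
    """Assign one of the seven categories (rules are illustrative assumptions)."""
    digits = sum(ch.isdigit() for ch in word)
    if word and digits == len(word):
        return "all_numeric"
    if word and digits / len(word) > 0.5:
        return "partially_numeric"
    if digits > 0:
        return "contains_digit"
    if word.islower():
        return "lowercase"
    if word.isupper():
        return "uppercase"
    if word[:1].isupper():
        return "start_case"
    return "other"

def one_hot_feature(word):
    # 7 x 1 one-hot vector with a single 1 at the word's category.
    vec = [0.0] * len(CATEGORIES)
    vec[CATEGORIES.index(casing_category(word))] = 1.0
    return vec

def feature_fusion(char_vec, word_vec, feat_vec):
    # Feature Fusion: plain concatenation of the three vectors.
    return list(char_vec) + list(word_vec) + list(feat_vec)

# Placeholder embeddings; real ones come from the CNN and word2vec models.
fused = feature_fusion([0.0] * 30, [0.0] * 300, one_hot_feature("HbA1c"))
```

With these sizes the fused input to the sequence model has 30 + 300 + 7 = 337 dimensions per word.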


Fig. 3 CNN for Character embedding

3.4 Bidirectional Long Short-Term Memory (Bi-LSTM)

Textual data is a string of words put together according to language-specific rules. Long Short-Term Memory (LSTM) networks are an architecture that inherently works well with such sequential data. LSTM architectures differ in directionality: they can be unidirectional or bi-directional. A Bi-LSTM has access to information from the past as well as the future [29, 30]. An LSTM network consists of a set of memory blocks. Each LSTM cell has a self-connected memory cell and three gates: input, output and forget. The input, output and forget gates correspond to write, read and reset operations for a single LSTM cell. These memory blocks help LSTM cells retain information for a longer duration and help solve long-range dependency issues. In sequential NLP tasks it is always better to have both past and future context, but an LSTM cell retains information only from past values, not future ones. An elegant solution is the bi-directional LSTM [29]: the LSTM cell is replicated and the two copies are placed side by side. The first cell reads the input as-is, and the second reads a reversed copy of the same input. This has proven to work better in practice for sequential tasks.
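The gate operations and the two reading directions can be illustrated with a deliberately tiny sketch in which the input, hidden state and cell state are all scalars. The parameter names (`wi`, `ui`, `bi` and so on) and their values are hypothetical; a real Bi-LSTM uses weight matrices and learned parameters.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step with scalar input, hidden state and cell state."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate (write)
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate (reset)
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate (read)
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # keep part of the old memory, write new content
    h = o * math.tanh(c)     # expose part of the memory as output
    return h, c

def run_lstm(xs, p):
    h, c, outputs = 0.0, 0.0, []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs

def run_bilstm(xs, p_fwd, p_bwd):
    fwd = run_lstm(xs, p_fwd)
    # The second cell reads a reversed copy; its outputs are re-aligned
    # so that position t pairs past context with future context.
    bwd = run_lstm(xs[::-1], p_bwd)[::-1]
    return list(zip(fwd, bwd))

KEYS = ["wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg"]
p_fwd = {k: 0.5 for k in KEYS}    # arbitrary illustrative values
p_bwd = {k: -0.3 for k in KEYS}
states = run_bilstm([1.0, 2.0, 3.0], p_fwd, p_bwd)
```

Changing only the last input leaves the forward state at the first position untouched but changes its backward state, which is exactly the access to future context that the unidirectional cell lacks.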


3.5 Conditional Random Fields

In sequence labelling tasks it is beneficial to consider the correlation between adjacent labels. In NLP tasks such as Part of Speech (POS) tagging and Named Entity Recognition (NER), there are multiple labels per sequence. Instead of decoding each label individually, we model the label sequence jointly using a CRF [31]. Let $x = x_1, x_2, \ldots, x_n$ be an input word sequence in which each element is the vector representation of a word (the equations below denote this input sequence by $z$), and let $y = y_1, y_2, \ldots, y_n$ be the sequence of labels for $x$. The probabilistic sequence model defines the conditional probability of a label sequence given the word sequence:

$$p(y \mid z; W, b) = \frac{\prod_{i=1}^{n} \exp\left(W_{y_{i-1},y_i}^{T} z_i + b_{y_{i-1},y_i}\right)}{\sum_{y' \in \gamma(z)} \prod_{i=1}^{n} \exp\left(W_{y'_{i-1},y'_i}^{T} z_i + b_{y'_{i-1},y'_i}\right)} \tag{4}$$

where $\gamma(z)$ is the set of possible label sequences for $z$, and $W_{y',y}$ and $b_{y',y}$ are the weight vector and bias corresponding to the label pair $(y', y)$. The CRF is trained by maximum likelihood estimation. For the training pairs $(x_i, y_i)$, the log-likelihood is given as:

$$L(W, b) = \sum_{i} \log p(y \mid z; W, b) \tag{5}$$

The objective is to choose the parameters such that the log-likelihood is maximized. To retrieve the sequence of labels with the highest probability, we use:

$$y^{*} = \underset{y \in \gamma(z)}{\operatorname{argmax}}\; p(y \mid z; W, b) \tag{6}$$
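The search in Eq. (6) is usually carried out with the Viterbi algorithm rather than by enumerating every label sequence. The sketch below assumes, as a simplification, that the score of each label pair splits into a per-position emission score and a position-independent transition score; the chapter does not state how its decoder is implemented, and the toy scores are invented for illustration.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring label sequence.

    emissions[i][k]   : score of label k at position i
    transitions[j][k] : score of moving from label j to label k
    """
    n, K = len(emissions), len(emissions[0])
    score = list(emissions[0])
    back = []
    for i in range(1, n):
        new_score, pointers = [], []
        for k in range(K):
            # Best previous label for ending at label k in position i.
            best_j = max(range(K), key=lambda j: score[j] + transitions[j][k])
            new_score.append(score[best_j] + transitions[best_j][k]
                             + emissions[i][k])
            pointers.append(best_j)
        score = new_score
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(range(K), key=lambda k: score[k])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy example with labels 0 = O, 1 = B, 2 = I; moving from O to I is penalized.
emissions = [[1.0, 0.8, 0.0],
             [0.0, 0.0, 1.0]]
transitions = [[0.0, 0.0, -10.0],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
best = viterbi(emissions, transitions)  # [1, 2], i.e. B followed by I
greedy = [max(range(3), key=lambda k: e[k]) for e in emissions]  # [0, 2]
```

Decoding each position independently picks O followed by I, a label pair the transition scores strongly discourage, whereas joint decoding returns B followed by I. This is the benefit of modelling adjacent labels together.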

4 Corpora Statistics

The data used to build the distributional representation model (word embedding) is created from text content drawn from sources such as GENIA [32], the Journal of Medical Case Reports (BMC) and i2b2 [33]. The small closed-set corpora, GENIA and i2b2, include the clinical data for performing POS tagging and NER. Content from the medical journal is collected by a web crawler, a program that accumulates relevant data from the internet. The web crawler fetches the documents corresponding to the seed URL, parses the links in the seed page and places each URL into a queue. These links are used to collect the text data. Uniform cleaning is applied to the crawled data (training) as well as the testing data (GENIA and i2b2) to remove irrelevant content from the raw data. It includes handling special characters such as ±, Latin alphabets and so on, which are not encoded by the UTF-8 encoding scheme. We have also removed the classes with negligible counts: predeterminers (PDT), interjections (UH) and ?/=. Statistics of the corpora used in creating MedNLU are shown in Table 1.

Table 1 Experimented data statistics

| | Crawled data BMC | GENIA corpus | i2b2 clinical |
|---|---|---|---|
| Number of documents | 4109 | 67 | |
| Number of sentences | 434,099 | 23,467 | 16,107 |
| Number of words | 7,861,071 | 439,403 | 201,015 |
| Average words/sentence | 18.10 | 18.72 | 12.48 |

The GENIA [32] corpus is used to create the part-of-speech tagging model in MedNLU. The i2b2 clinical [33] corpus is annotated with 3 types of clinical tags: problem, test and treatment. A tag can span several successive words. This corpus consists of 16,107 sentences from patient discharge summaries and follows the Inside-Outside-Beginning (IOB) format.
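As a small illustration of the IOB format, the sketch below groups B- and I- tags back into labelled entity spans. The sentence is invented for the example and is not taken from the i2b2 corpus; only the tag names (problem, test, treatment) come from the text, and the helper name `iob_to_spans` is hypothetical.

```python
def iob_to_spans(tokens, tags):
    """Collect (label, text) pairs from parallel token and IOB tag lists."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues and label is not None:
            spans.append((label, " ".join(tokens[start:i])))
            start, label = None, None
        # A B- tag opens a span; a stray I- tag is treated as an opening too.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]
    return spans

tokens = ["The", "patient", "had", "chest", "pain", "and",
          "received", "aspirin"]
tags = ["O", "O", "O", "B-problem", "I-problem", "O",
        "O", "B-treatment"]
spans = iob_to_spans(tokens, tags)
# [("problem", "chest pain"), ("treatment", "aspirin")]
```

The B- prefix marks the first word of an entity, I- marks its continuation and O marks words outside any entity, which is how a single tag type can cover several successive words.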

5 Experiments and Observations

The sequential modeler for MedNLU is constructed by integrating a CNN, a BLSTM and a CRF; the schematic diagram is given in Fig. 2. The experiments were performed on a system with the following configuration: 32 GB RAM, NVIDIA GeForce GTX 1080, i7 processor, Python 3 and Ubuntu 16.04 LTS. For every word, a character-level representation (a 30 × 1 vector) is computed using the CNN, as given in Fig. 2. We fine-tune the initial embeddings by modifying them during the weight updates of the neural network model by back-propagation. The character embeddings are concatenated with the corresponding word embedding (300 × 1) and a feature vector (7 × 1), and the concatenated vector is fed to the BLSTM followed by the CRF layer. To observe the difference in performance, we also combined the BLSTM with a typical SoftMax layer. Dropout is applied at multiple levels: before the input to the CNN during the computation of character embeddings, and on the input and output vectors of the BLSTM. The dropout rate is fixed at 0.25 for all dropout layers throughout the experiments. This is shown in Fig. 1. Parameters are optimized with the mini-batch Adam optimizer, with a batch size of 32 and early stopping set to 5.

We used pre-trained word embeddings generated from general news text (Google embedding) as well as an embedding model developed from clinical texts (Health embedding). The Python Gensim library was used to develop the health embeddings. The word2vec continuous bag-of-words model was used with the following parameters: minimum word frequency 1, embedding dimension 300 and window size 4. The corpus used to create the health embedding model is described under Corpora Statistics. The schematic diagram is given in Fig. 4.

Fig. 4 Model diagram for creation of health care embeddings

The created word embeddings are evaluated through qualitative and quantitative analysis. For the qualitative evaluation, we used cosine distance to infer the similarity among words and phrases, and took the top five words most similar to each target word for further analysis. The evaluation covers three categories: disorder, symptom and drug name. The qualitative results are given in Table 2. In the quantitative analysis, the health embedding (vectors computed for the clinical text) is validated using data from two sequential modelling tasks: POS tagging and NER.

Table 2 The performance of health embeddings through qualitative analysis

| Category | Target word/phrase | Health embedding | Google word embedding | Wikipedia embedding from GloVe |
|---|---|---|---|---|
| Disorder | Diabetes | Psoriasis, Neutropenia, Schizophrenia, Epilepsy, Obesity | Diabetics, Diabetic, Hypertension, Diabetes mellitus, Heart | Hypertension, Obesity, Arthritis, Cancer, Alzheimer |
| Symptom | Dyspnea | Fatigue, Diarrhea, Nausea, Arthralgia, Dizziness | Dyspnoea, Pruritus, Nasopharyngitis, Symptom severity, Rhinorrhea | Shortness, Breathlessness, Cyanosis, Photophobia, Faintness |
| Drug | Aspirin | Azathioprine, Rifampicin, Capecitabine, Doxorubicin, Fluconazole | Dose aspirin, Ibuprofen, Statins, Statin, Calcium supplements | Ibuprofen, Tamoxifen, Pills, Statins, Medication |

t-distributed Stochastic Neighbour Embedding (t-SNE) is primarily used for visualizing high-dimensional data. Techniques such as multidimensional scaling, Sammon mapping and graph-based techniques were developed before t-SNE. Here, D-dimensional data is visualized in two or three dimensions. In t-SNE, the Euclidean distances between vectors are converted into a probability distribution such that similar vectors have a high probability. The t-SNE map is generated by keeping the target words in different categories, and is shown in Fig. 5. t-SNE maps vectors from the high-dimensional space into a 2-dimensional space, so vectors from the health embeddings that are close in the vector space can be visualized on the t-SNE map. In Fig. 5, the orange, blue and green data points represent the disorder, drug and symptom categories respectively. From Fig. 5 we can clearly observe that the computed health embedding maps the different categories (disorder, drug and symptoms) into different clusters.

Fig. 5 t-SNE map of health embeddings computed through word2vec CBOW model

For the quantitative analysis, we modelled the evaluation as a classification task. As described in Ghanny et al. [34], we evaluated the word embedding on a POS-tagged representation of the GENIA corpus, as given in Fig. 4, to ensure the quality of the representation. The POS tagging task has 26 classes in total, which were mapped to 12 meta tags. The entity recognition task has 7 classes. The class statistics are given in Tables 3 and 4.

Table 3 POS corpus: target class statistics

| POS Tag | Count |
|---|---|
| SYM | 3217 |
| IN | 12,414 |
| CC | 4122 |
| CD | 4672 |
| JJ | 8454 |
| VB | 11,822 |
| NN | 36,085 |
| RB | 3034 |
| WDT | 442 |
| PRP | 807 |
| DT | 7171 |

Table 4 NER corpus: target class statistics

| NER Tag | Count |
|---|---|
| B-problem | 19,664 |
| I-problem | 27,938 |
| B-test | 13,831 |
| I-test | 11,898 |
| B-treatment | 14,185 |
| I-treatment | 12,053 |
| O | 291,706 |

The quantitative results are given in Fig. 6. They were obtained using 10 × 10 fold cross validation with an LSTM as the classifier.

Fig. 6 The performance of health embeddings through quantitative analysis

Finally, we obtained performance results for four architectures, i.e. ([Google Embedding or Health Embedding] + BLSTM + [CRF or SoftMax]). The results observed on the POS task for these combinations are given in Fig. 6. The performance of the sequence modeler on the NER corpus is shown in Fig. 7a, b, and the performance on the POS corpus in Fig. 8a, b.

Chunking and parsing are performed through a regular expression parser. A set of rules is defined for extracting Clauses (S), Prepositional Phrases (PP), Verb Phrases (VP) and Noun Phrases (NP). These commonly occurring grammar rules are extracted from the POS-tagged corpus based on their frequency of occurrence. The parse tree resulting from chunking is also part of MedNLU.

It can be observed that CRF performs better than SoftMax in both the NER and POS tagging tasks. This confirms the need for a sequence modeler at the output layer rather than a typical SoftMax layer. The time taken to build the CRF-based and SoftMax-based models is almost the same, so we do not report the time consumption for building the proposed sequence modeler. The Google embedding attains better results than the health embedding in both tasks, which indicates that the sequence modeler does not depend on domain knowledge. It can also be inferred that the character embeddings carry information about medical words that are not present in the Google embeddings. We also compared our results with those of other models on the experimented corpora. The sequence modeler achieves state-of-the-art performance on the i2b2 clinical corpus; the statistics are given in Table 5. MedNLU achieves an improvement of nearly 8%. Because standard separate train and test files are not available, we have not compared the results obtained on the GENIA corpus.
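The qualitative evaluation in this section ranks vocabulary words by their cosine similarity to a target word and keeps the top five. A minimal sketch of that ranking is given below. The three-dimensional vectors are invented purely for illustration (the embeddings in the chapter have 300 dimensions), and the function names are hypothetical rather than part of Gensim or of the authors' code.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(target, embeddings, top_n=5):
    """Rank every other word by cosine similarity to the target word."""
    scores = [(word, cosine_similarity(embeddings[target], vec))
              for word, vec in embeddings.items() if word != target]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_n]

# Toy vectors, invented for this example only.
embeddings = {
    "diabetes": [0.9, 0.1, 0.0],
    "obesity": [0.8, 0.2, 0.1],
    "aspirin": [0.0, 0.1, 0.9],
    "ibuprofen": [0.1, 0.0, 0.8],
}
neighbours = most_similar("diabetes", embeddings, top_n=2)
```

With real embeddings the same ranking, applied to targets such as the disorder, symptom and drug words in the qualitative table, yields the lists of five nearest words that are then inspected by hand.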

6 Conclusion

An integrated framework for Natural Language Understanding of clinical text has been developed. The proposed sequential modeler for Part of Speech Tagging and


Fig. 7 a Performance of google embedding—sequence modeler with CRF and SoftMax on NER b Performance of health embedding—sequence modeler with CRF and SoftMax on NER


Fig. 8 a Performance of Google embedding sequence modeler with CRF and SoftMax on POS tagging b Performance of health embedding sequence modeler with CRF and SoftMax on POS tagging

Table 5 Comparing obtained results with other systems on NER corpus

Methodology                                  Precision (%)  Recall (%)  F-Score (%)
Semi supervised hidden markov models [24]    83.64          86.88       85.23
Distributional semantics and CRF [25]        85.60          82.70       83.70
CRF-neural embedding [26]                    85.10          80.60       82.80
MedNLU                                       94.60          93.10       93.8


Named Entity Recognition attains state-of-the-art performance, with an F1 score of 93.8% on the i2b2 clinical corpus and 97.29% on the GENIA corpus. From the observed results, it is clear that the character embedding provides additional sub-word information about clinical words. Character embedding combined with word embedding (computed on general text) removes the need for a word embedding model trained on clinical text. The sub-word features extracted from the clinical words also contribute towards this objective. The proposed MedNLU is capable of performing Named Entity Recognition, Part of Speech Tagging, Parsing and Chunking on clinical texts. These results are good enough to extend the framework towards relation extraction and dependency parsing modules. It is also clear that the existing annotated corpora are not sufficient to drive deep learning algorithms. Hence, future work will also focus on creating large annotated clinical text corpora. The framework will be extended further with features that support the detection of adverse drug reactions and of disability.

References

1. Wang, Y., Wang, L., Rastegar-Mojarad, M., Moon, S., Shen, F., Afzal, N., Liu, S., Zeng, Y., Mehrabi, S., Sohn, S., et al.: Clinical information extraction applications: a literature review. J. Biomed. Inform. (2017)
2. Yogatama, D., Liu, F., Smith, N.A.: Extractive summarization by maximizing semantic volume. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1961–1966 (2015)
3. Pestian, J.P., Itert, L., Duch, W.: Development of a pediatric text-corpus for part-of-speech tagging. In: Proceedings of the International IIS: IIPWM'04 Conference held in Zakopane, Poland, pp. 219–226. Springer (2004)
4. Pakhomov, S.V., Coden, A., Chute, C.G.: Developing a corpus of clinical notes manually annotated for part-of-speech. Int. J. Med. Inform. 75(6), 418–429 (2006)
5. Hirschman, L., Morgan, A.A., Yeh, A.S.: The MITRE Corporation. Rutabaga by any other name: extracting biological names. J. Biomed. Inform. 35(4), 247–259 (2002)
6. Savova, G.K., Masanz, J.J., Ogren, P.V., Zheng, J., Sohn, S., Kipper-Schuler, K.C., Chute, C.G.: Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications. J. Am. Med. Inform. Assoc. 17(5), 507–513 (2010)
7. Boag, W., Wacome, K., Naumann, T., Rumshisky, A.: Cliner: a lightweight tool for clinical named entity recognition. AMIA Joint Summits on Clinical Research Informatics (poster) (2015)
8. Fu, X., Ananiadou, S.: Improving the extraction of clinical concepts from clinical records. In: Proceedings of BioTxtM14 (2014)
9. Lv, X., Guan, Y., Yang, J., Wu, J.: Clinical relation extraction with deep learning. International Journal of Hybrid Information Technology, pp. 237–248 (2016)
10. Wu, Y., Jiang, M., Lei, J., Xu, H.: Named entity recognition in Chinese clinical text using deep neural networks. Studies in Health Technology and Informatics, p. 624 (2015)
11. Dong, X., Qian, L., Guan, Y., Huang, L., Yu, Q., Yang, J.: A multiclass classification method based on deep learning for named entity recognition in electronic medical records. In: Scientific Data Summit (NYSDS), IEEE, pp. 1–10 (2016)
12. Pakhomov, S.V., Finley, G., McEwan, R., Wang, Y., Melton, G.B.: Corpus domain effects on distributional semantic modeling of medical terms. Bioinformatics 32(23), 3635–3644 (2016)


13. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)
14. Ganguly, D., Roy, D., Mitra, M., Jones, G.J.: Word embedding based generalized language model for information retrieval. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, pp. 795–798 (2015)
15. Ganesh, H.B., Kumar, M.A., Soman, K.P.: From vector space models to vector space models of semantics. In: Forum for Information Retrieval Evaluation, Springer, Cham, pp. 50–60 (2018)
16. Tang, B., Cao, H., Wang, X., Chen, Q., Xu, H.: Evaluating word representation features in biomedical named entity recognition tasks. BioMed Research International, 2014 (2014)
17. Jagannatha, A., Chen, J., Yu, H.: Mining and ranking biomedical synonym candidates from Wikipedia. In: Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis, pp. 142–151 (2015)
18. Gurulingappa, H., Toldo, L., Schepers, C., Bauer, A., Megaro, G.: Semi-supervised information retrieval system for clinical decision support. In: TREC (2016)
19. Turney, P.D.: A uniform approach to analogies, synonyms, antonyms, and associations. In: Proceedings of the 22nd International Conference on Computational Linguistics, vol. 1. Association for Computational Linguistics, pp. 905–912 (2008)
20. Landauer, T.K., Dumais, S.T.: A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol. Rev. 104(2), 211 (1997)
21. Liu, K., Chapman, W., Hwa, R., Crowley, R.S.: Heuristic sample selection to minimize reference standard training set for a part-of-speech tagger. J. Am. Med. Inform. Assoc. 14(5), 641–650 (2007)
22. Fan, J.W., Prasad, R., Yabut, R.M., Loomis, R.M., Zisook, D.S., Mattison, J.E., Huang, Y.: Part-of-speech tagging for clinical text: wall or bridge between institutions? In: AMIA Annual Symposium Proceedings, vol. 2011. American Medical Informatics Association, pp. 382–391 (2011)
23. Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. ICML, pp. 282–289 (2001)
24. de Bruijn, B., Cherry, C., Kiritchenko, S., Martin, J., Zhu, X.: Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010. J. Am. Med. Inform. Assoc. 18(5), 557–562 (2011)
25. Jonnalagadda, S., Cohen, T., Wu, S., Gonzalez, G.: Enhancing clinical concept extraction with distributional semantics. J. Biomed. Inform. 45(1), 129–140 (2012)
26. Wu, Y., Xu, J., Jiang, M., Zhang, Y., Xu, H.: A study of neural word embeddings for named entity recognition in clinical text. In: AMIA Annual Symposium Proceedings, vol. 2015, p. 1326. American Medical Informatics Association (2015)
27. Chiu, J.P.C., Nichols, E.: Named entity recognition with bidirectional LSTM-CNNs. arXiv preprint arXiv:1511.08308 (2015)
28. Ganesh, H.B., Kumar, M.A., Soman, K.P.: Distributional semantic representation in health care text classification. In: International Conference on Forum of Information Retrieval and Evaluation, pp. 201–204 (2016)
29. Dyer, C., Ballesteros, M., Ling, W., Matthews, A., Smith, N.A.: Transition based dependency parsing with stack long short-term memory. In: Proceedings of ACL-2015 (Volume 1: Long Papers), pp. 334–343 (2015)
30. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks 18(5–6), 602–610 (2005)
31. Settles, B.: Biomedical named entity recognition using conditional random fields and rich feature sets. In: Proceedings of the COLING 2004 NLPBA, pp. 104–108 (2004)
32. Verspoor, K., Cohen, K.B., Lanfranchi, A., Warner, C., Johnson, H.L., Roeder, C., Choi, J.D., Funk, C., Malenkiy, Y., Eckert, M., et al.: A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools. BMC Bioinformatics 13(1), 207 (2012)


33. Uzuner, O., South, B.R., Shen, S., DuVall, S.L.: 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. J. Am. Med. Inform. Assoc. 18(5), 552–556 (2011)
34. Ghannay, S., Favre, B., Esteve, Y., Camelin, N.: Word embedding evaluation and combination. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 300–305 (2016)
35. Baroni, M., Dinu, G., Kruszewski, G.: Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 238–247 (2014)

H. B. Barathi Ganesh Current Chief Technology Officer at Arnekt Solutions Pvt Ltd., a pioneering Artificial Intelligence technologist with 5+ years of experience in implementing AI-enabled technologies and enterprise systems that facilitate business processes and strategic objectives. A continuous practitioner in blending technology and business requirements to define future business strategies, as evidenced by cost-effective, high-performance services and products. Has broad AI expertise in domains such as Automotive, BFSI, Education, E-Commerce, Logistics, Manufacturing, and Retail.

U. Reshma Principal Engineer and Researcher in the fields of Natural Language Processing, Conventional Machine Learning and Deep Learning. Has a sound fundamental understanding of the sub-fields of Artificial Intelligence.

K. P. Soman Currently serves as Head and Professor at Amrita Center for Computational Engineering and Networking (CEN), Coimbatore Campus. He has 300+ publications in national and international journals and conference proceedings. He has organized a series of workshops and summer schools on advanced signal processing using wavelets, kernel methods for pattern classification, deep learning, big-data analytics, etc. for industry and academia. He has authored the books “Insight into Wavelets”, “Insight into Data mining”, “Support Vector Machines and Other Kernel Methods” and “Signal and Image processing-the sparse way”, published by Prentice Hall, New Delhi, and Elsevier.

M. Anand Kumar Received his Ph.D. in Machine Translation from Amrita Center for Computational Engineering and Networking (CEN), Coimbatore Campus. He currently serves as an assistant professor at the Department of Information Technology, National Institute of Technology, Karnataka. He has 100+ publications in national and international journals and conference proceedings. His research interests include Natural Language Processing, Text Mining, Deep Learning and Transfer Learning.

Deep Learning Based Biomedical Named Entity Recognition Systems

Pragatika Mishra, Sitanath Biswas and Sujata Dash

Abstract In this chapter, we address a crucial problem known as the biomedical Named Entity Recognition system. Named entity recognition is a vital task in natural language processing, relating to artificial intelligence, information retrieval and information extraction. Natural language processing is a subfield of computer science, artificial intelligence and information engineering that deals with the interaction between computers and human language. It is concerned with processing and analysing natural language data. It is a computing activity in which computers are made to understand, manipulate and analyse language, including the automation of activities and methods of communication. One of the vital components of natural language processing (NLP) is Named Entity Recognition (NER), which is used to find and classify expressions of specific meaning in texts written in natural language. The various types of named entities include person names, organization names, place names, numbers, etc. In this chapter we deal only with biomedical named entity recognition (Bio-NER), which is a basic task in the processing of biomedical text terms such as RNA, cell type, cell line, protein, and DNA. Biomedical NER is one of the most fundamental and crucial tasks in biomedical information extraction from documents. Recognizing biomedical named entities appears to be harder than recognizing ordinary named entities. In this chapter we use deep learning, which is also known as deep structured learning or hierarchical learning. It is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. This kind of learning can be supervised, semi-supervised or unsupervised. Deep learning models are largely inspired by information processing and communication patterns in biological nervous

P. Mishra
Gandhi Institute for Technology, Bhubaneswar, India

S. Biswas · S. Dash (B)
North Orissa University, Baripada, India

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics, Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_2


systems, yet differ in various ways from the structural and functional properties of biological brains. For experiments and analysis, we used the GENIA corpus, which was created by a group of researchers to support the development and evaluation of information extraction and text mining systems for biology. It consists of 1,999 MEDLINE abstracts. The GENIA corpus has been widely used by the natural language processing community to improve semantic search systems and to establish Bio-NLP tasks. In this work, we propose a multi-task learning framework for Bio-NER that relies on neural network models to save human effort. The deep neural network architecture has several layers, and every layer abstracts features based on the features generated by the lower layers. We compared our system with the results of several earlier experiments: Saha et al. (Pattern Recogn. Lett 3:1591–1597, 2010) with a Precision of 68.12, Recall 67.66 and F-Score 67.89; Liao et al. (Biomedical Named Entity Recognition Based on Skip-Chain Crfs. pp. 1495–1498, 2012) with a Precision of 72.8, Recall 73.6 and F-Score 73.2; ABNER (A Biomedical Named Entity Recognizer, pp. 46–51, 2013) with a Precision of 69.1, Recall 72.0 and F-Score 70.5; Sasaki et al. (How to Make the Most of Ne Dictionaries in Statistical Ner. pp. 63–70, 2008) with a Precision of 68.58, Recall 79.85 and F-Score 73.78; and Sun et al. (Comput. Biol. Med 37:1327–1333, 2007) with a Precision of 70.2, Recall 72.3 and F-Score 71.2. Our system achieved a Precision of 66.54, Recall 76.13 and F-score 71.01% on the GENIA standard test corpus, which is close to state-of-the-art performance using only the part-of-speech feature, and shows that deep learning can be applied effectively to biomedical Named Entity Recognition. This chapter is organized into the following sections: Introduction, Literature review, Architecture, Experiment, Results and analysis, Conclusion and future work, and References.
Keywords GENIA corpus · Deep learning · Machine learning · Natural language processing · Named entity recognition

1 Introduction

In this chapter, we deal with a crucial problem referred to as the biomedical Named Entity Recognition system. Named entity recognition is a vital task in natural language processing, touching on computational linguistics, information retrieval and information extraction. Natural language processing is a subfield of computer science, artificial intelligence and information engineering that deals with the interaction between computers and human language. It is concerned with processing and analysing natural language data. It is a computing activity in which computers are made to understand, manipulate and analyse language, including the automation of activities and methods of communication. Named Entity Recognition (NER) is one of the crucial components of natural language processing (NLP); it is used to find and categorise expressions of specific meaning in texts written in natural language. The various kinds of named entities include person names, organization names, place names, numbers, etc. In this chapter we


deal exclusively with biomedical named entity recognition (Bio-NER), which is a primary task in handling biomedical text terminology such as RNA, cell type, cell line, protein, and DNA. Biomedical NER is the most basic and important task in biomedical information extraction from text. Recognizing biomedical named entities appears to be harder than recognizing ordinary named entities. Biomedical named entity recognition faces five challenges:
• New medical terms emerge constantly, so it is hard to build a dictionary that includes the newest terms.
• The same word can be categorized as a different entity depending on context.
• An entity can be quite long and may include special characters such as hyphens.
• Abbreviations are used frequently in the biomedical area and are often ambiguous.
• In biomedical terminology, ordinary terms and functional terms are combined, which makes terms very long. It is challenging for Bio-NER to segment a sentence into named entities.

Recently, deep learning based approaches have been applied to biomedical named entity recognition (Bio-NER) and have shown promising results. However, deep learning approaches need a large quantity of training data, and a scarcity of data can hamper their performance. Deep learning is also known as deep structured learning or hierarchical learning. It is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Deep learning models are mostly inspired by information processing and communication patterns in biological nervous systems, yet differ in various ways from the structural and functional properties of biological brains (such as the human brain).
Deep learning methods such as deep belief networks, deep neural networks and recurrent neural networks have been applied to areas like audio recognition, computer vision, social network filtering, bioinformatics, natural language processing, etc., where they have shown results comparable to, and in certain cases superior to, those of human experts. This type of learning can be:
• Supervised learning, which is the machine learning task of learning a function that maps an input to an output based on example input–output pairs.
• Semi-supervised learning, which is a class of machine learning tasks and techniques that also make use of unlabelled data for training, typically a small quantity of labelled data with a large amount of unlabelled data.
• Unsupervised learning, which is a term used for Hebbian learning, associated with learning without a teacher, also known as self-organisation, and a method of modelling the probability density of inputs.

In this research work, we draw on a method based on a Convolutional Neural Network (CNN).

Named Entity Recognition (NER) is the automated process of finding and labelling entities in a given text. In the biomedical domain, typical entity types include disease, chemical, gene and protein. Biomedical NER (BioNER) is an essential building block of many downstream text mining applications, such as the extraction of drug-drug interactions [1] and disease-treatment relations. BioNER is also used in building a sophisticated biomedical


entity search tool [2] that allows users to pose complex queries to search for bio-entities. NER in biomedical text mining has focused mainly on dictionary-based, rule-based and machine learning-based approaches [3–5]. Dictionary-based systems have a simple and intuitive structure, but they cannot handle unseen entities or polysemous words, leading to lower recall [3, 4]. In addition, building and maintaining a comprehensive and up-to-date dictionary involves a substantial amount of manual work. The rule-based approach is more scalable; however, it needs manually crafted feature sets to fit a model to a dataset [5]. These dictionary-based and rule-based approaches can achieve high precision [2] but can produce incorrect predictions when a new word that is not in the training data appears in a sentence (the out-of-vocabulary problem). Habibi et al. [6, 7] utilised character-level word embeddings to capture characteristics, such as orthographic features, of biomedical entities and achieved state-of-the-art performance, demonstrating the effectiveness of character-level word embeddings in BioNER. Even though these models have shown promising results, NER remains a very difficult task in the biomedical domain for the following reasons. First, a limited amount of training data is available for BioNER tasks. The JNLPBA corpus [8], for example, contains annotations of only genes and proteins. Hence, the data for every entity type comprises only a small fraction of the total amount of annotated data. Multi-task learning (MTL) is a technique for training one model for multiple tasks at the same time. MTL can leverage different datasets that are collected for different but related tasks [9].
Though extracting genes is different from extracting chemicals, both tasks require learning some general features that help in understanding the linguistic expressions of biomedical texts. Since MTL-based models are trained on several types of entities and on larger training data, they have broad exposure to various biomedical entities, which as expected leads to higher recall. On the other hand, because the MTL models are trained on combinations of different entity types, they tend to have difficulty differentiating among entity types, resulting in low precision. Another reason NER is difficult in the biomedical domain is that an entity can be labelled as different entity types depending on its textual context. For example, BiLSTM-CRF based models for disease entities erroneously labelled the gene name “BRCA1” as a disease entity because there are disease names such as “BRCA1 abnormalities” or “Brca1-deficient” in the training sets. In addition, a training set that annotates “VHL” (Von Hippel-Lindau disease) as a disease entity confuses the model because VHL is also used as a gene name, since mutation of this gene causes VHL disease. Therefore, every model is an expert in its own domain and helps improve accuracy by leveraging the multi-domain information from the other models. Motivated by the work of Collobert [10], we build a neural network model for the biomedical NER task. Our work shows that deep learning can be applied efficiently to biomedical NER. Our architecture achieves


close to state-of-the-art performance on the GENIA corpus, a popular standard corpus that has been adopted by many research groups for evaluation.

2 Literature Review

In the biomedical field, the volume of data produced each day is on the order of gigabytes or even terabytes. The development of the biomedical research area has been driven in many ways by this enormous quantity of data. Biomedical Named Entity Recognition is an important first step for the biomedical sciences. Biomedical NER is far harder than general NER because of complexities such as class membership that changes daily, entity boundaries that are hard to distinguish, and irregular expressions [11–15]. The tasks addressed include the recognition of gene mentions, the extraction of a list of unique identifiers for human genes, and the extraction of physical protein-protein interaction annotation-relevant information. A balanced precision and recall was observed for the submissions to the gene mention and gene normalization tasks. In the protein-protein interaction task, different results were obtained depending on the annotation extraction process. The general finding was that combining system outputs gave better results than any single system, which led to the development of the first text mining meta-server in this context [12].

Numerous supervised techniques have been used to learn biomedical named entity recognition, such as:
• MEMMs (Maximum Entropy Markov Models) [16], or conditional Markov models: a graphical model that combines HMMs and maximum entropy models for sequence labelling. MEMMs find applications in natural language processing, in particular part-of-speech tagging and information extraction.
• HMMs (Hidden Markov Models) [17]: a statistical Markov model in which the system being modelled is assumed to be a process with unobserved states.
• CRFs (Conditional Random Fields) [18]: a class of statistical modelling methods applied in pattern recognition and machine learning and used for structured prediction. A CRF can take context into account. For instance, a linear-chain CRF predicts a sequence of labels for a sequence of input samples. CRFs are popular in natural language processing.

HMM, MEMM, and CRF are three popular statistical modelling methods, often applied to pattern recognition and machine learning problems. In the Hidden Markov Model (HMM), the word “hidden” reflects the fact that only the symbols emitted by the system can be observed. The advantages of the HMM are its strong statistical foundation and efficient learning algorithms. Its disadvantage is that each observation depends only on its corresponding state, whereas in sequence labelling a label relates not only to the individual word but also to aspects such as sequence length and word context. The Maximum Entropy Markov Model takes into account the dependencies between neighbouring states and the entire observed sequence, which gives it better expressive power. The Conditional Random Field model addresses the label bias problem. In


comparison to the Hidden Markov Model, the CRF does not make strict independence assumptions and can accommodate arbitrary context information, so its feature design is flexible. Compared to the Maximum Entropy Markov Model, the CRF computes the conditional probability of the globally optimal output labels, and it thereby overcomes the drawback of label bias. CRFs were also applied to biomedical entity recognition by Settles [2], achieving an F-score of 69.9% on the GENIA corpus in conjunction with various types of character features. An HMM applied to the GENIA corpus attained a precision of 65.5% and a recall of 66.9% [2]. Li presented a two-phase biomedical named entity recognition model on the GENIA corpus, which is split into two components [10, 19]:
• Named entity detection (NED): the first phase, which distinguishes named entities from non-named entities (NNE) without identifying their type.
• Named entity classification (NEC): the second phase, in which a multi-agent strategy is used, achieving an F-score of 76.06%.

BioNER is also used in building a sophisticated biomedical entity search tool [20] that allows users to pose complex queries to search for biomedical entities. NER in biomedical text mining concentrates mainly on dictionary-based, rule-based and machine learning-based approaches [3–5, 21–23]. Dictionary-based systems have a simple and intuitive structure, but they cannot handle unseen entities or ambiguous words, leading to low recall [3, 4]. These rule-based and dictionary-based approaches can achieve high precision [3] but can produce incorrect predictions when a new word that is not in the training data appears in a sentence (the out-of-vocabulary problem).
The out-of-vocabulary problem occurs often, especially in the biomedical domain, because new biomedical terms, such as new drug names, appear frequently. Habibi et al. [24] utilised character-level word embeddings to capture characteristics, such as orthographic features, of biomedical entities and achieved state-of-the-art performance, demonstrating the effectiveness of character-level word embeddings in BioNER. Though these models have shown promising results, NER continues to be a very difficult task in the biomedical domain for the following reasons. First, a limited amount of training data is available for BioNER tasks. Gold-standard datasets contain annotations of only one or two types of entity. For instance, the NCBI corpus [8] includes annotations of diseases but not of other types of entities such as proteins and genes. On the other hand, the JNLPBA corpus [9] contains annotations of only proteins and genes. Hence, the data for each entity type comprises only a small fraction of the total amount of annotated data. Multi-task learning (MTL) is a method for training one model for several tasks at the same time. MTL can leverage different datasets that are collected for different but related tasks [25]. Although extracting genes is different from extracting chemicals, both tasks require learning some general features that help in understanding the linguistic expressions of biomedical text. The models of [26, 27] achieved performance comparable to that of state-of-the-art single-task NER models. In contrast to standard MTL methods, which use only one static model, CollaboNet consists of several models trained on


different datasets for different tasks. On the other hand, because the MTL models are trained on mixtures of different entity types, they tend to have difficulty differentiating among entity types, leading to low precision. A further reason NER is difficult in the biomedical domain is that an entity can be labelled as different entity types depending on its textual context.

3 Architecture

In this chapter we use a technique based on a convolutional neural network (CNN) that has been applied to several natural language processing (NLP) tasks [28–30]. This neural network architecture was proposed by Bengio [30] for probabilistic language modelling. Neural networks were subsequently introduced for multiple NLP tasks. We adopt this approach for the biomedical named entity recognition task. The architecture is given in Fig. 1. Compared with earlier heavily engineered systems, the deep learning approach implemented here reduces the dependence on linguistic expertise. In Fig. 1, the token beta, located at the centre of the window, is processed at time t. The words within the sliding window, represented as real-valued vectors, are the inputs to this neural network. The score of each label for the word beta is produced after the transformations of the linear layers and sigmoid layers. Finally, the score lattice for the sentence is given as output at the end of the procedure. The Viterbi algorithm is then applied to obtain the best label sequence. The input length of our CNN is fixed and adapted to text data. First, a word dictionary S is built from a large collection of biomedical papers. The words in S are transformed into vectors that serve as input to the CNN. Each word in the dictionary has a vector of fixed dimension. The resulting word vectors are stored in the matrix

$M \in \mathbb{R}^{D \times |S|}$    (1)

where D is the vector dimension of a word and |S| is the size of the vocabulary (the dictionary of words). Here, we consider |S| to be finite. M is randomly initialized and trained with a general neural network on a large number of unlabeled biomedical papers. The real-valued vector representation M can be obtained in two ways:
• The first method is to initialize the vector of every word i with zero at all positions and one at position i, and to optimize the vectors as parameters during the training phase [29].
• The second method is to learn the word representation as part of training a neural network language model [31–34].


Fig. 1 Architecture for neural network


In this chapter, we optimize the word representation for the specific Bio-NER task, and we use the second method. After comparing various language models [34–37], we chose the skip-gram neural network language model. This model is not the best model overall, but it is more suitable for training on rare words.

3.1 Extraction of Features in Sentence Level

In biomedical NER tasks, every word in a sentence must be given a label indicating whether or not it is part of a biomedical named entity. Sentences are taken as inputs, and the corresponding label sequence is given as output for each sentence. The lengths of the sentences are not fixed, but the input size of the neural network is fixed. For this reason we choose the window approach. The window size k is set at the start, and different values of k may lead to different precision in the system. The window approach takes into account the dependency between the label of each word and its neighbouring words: the words surrounding the labelled word in the window pass through the layers together. When we consider the word at position C, the neighbouring words at positions in the range [C − (k − 1)/2, C + (k − 1)/2] are passed on to the mapping layer. Since every word is transformed into a D-dimensional vector by this layer, the input size of linear layer 1 is kept fixed.
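The window approach can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the padding token `<PAD>`, the zero vector for unknown words, and the function names are assumptions introduced only for illustration (the chapter does not say how sentence boundaries are handled).

```python
from typing import Dict, List

PAD = "<PAD>"  # hypothetical padding token for positions outside the sentence


def context_windows(tokens: List[str], k: int) -> List[List[str]]:
    """Return, for each position C, the k tokens in [C-(k-1)/2, C+(k-1)/2]."""
    if k < 1 or k % 2 == 0:
        raise ValueError("window size k must be a positive odd number")
    half = (k - 1) // 2
    padded = [PAD] * half + tokens + [PAD] * half
    return [padded[c:c + k] for c in range(len(tokens))]


def window_input(window: List[str],
                 embeddings: Dict[str, List[float]],
                 dim: int) -> List[float]:
    """Concatenate the D-dimensional vectors of the k words into one k*D vector."""
    unknown = [0.0] * dim  # assumption: unseen words and padding map to zeros
    vector: List[float] = []
    for word in window:
        vector.extend(embeddings.get(word, unknown))
    return vector


windows = context_windows(["IL-2", "gene", "expression"], k=3)
features = window_input(windows[1], {"gene": [1.0, 2.0]}, dim=2)
```

Whatever the sentence length, every window yields a vector of length k × D, which is why the input size of the first linear layer stays fixed.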

3.2 Criteria of Label

A deep neural network is a structure with several layers. Each layer computes features based on the features produced by the layers below it. Depending on the design of the neural network, each layer may be either a linear function or another transformation. A function f_θ(·) describes the three layers in our design as

$$f_\theta(x) = M_2 \, g(M_1 x + b_1) + b_2 \tag{2}$$

where $M_1 \in \mathbb{R}^{H \times Dk}$, $b_1 \in \mathbb{R}^{1 \times H}$, $M_2 \in \mathbb{R}^{|L| \times H}$, $b_2 \in \mathbb{R}^{1 \times |L|}$, and g(·) is the sigmoid function. H is the number of hidden units and |L| is the size of the set of possible label tags. Using stochastic gradient ascent on a training set T, the V-dimensional parameter vector θ = (θ_1, θ_2, …, θ_V) is trained by maximizing the likelihood

$$\sum_{(x,y) \in T} \log p(y \mid x, \theta)$$

This is a multi-class classification problem. The function f(x, l, θ) gives the score of the l-th label for example x, which corresponds to a training window. This score is turned into the conditional probability P(l | x, θ) by applying the softmax operation:

$$P(l \mid x, \theta) = \frac{e^{f(x, l, \theta)}}{\sum_{j} e^{f(x, j, \theta)}} \tag{3}$$

We define the log-add operation as

$$\operatorname{logadd}_{i} z_i = \log \Big( \sum_{i} e^{z_i} \Big)$$

Hence, the log-likelihood for a training example (x, y) is

$$\log p(y \mid x, \theta) = f(x, y, \theta) - \operatorname{logadd}_{j} f(x, j, \theta) \tag{4}$$
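Equations (3) and (4) can be checked numerically with a short sketch. The function names are illustrative, and subtracting the maximum before exponentiating is a standard numerical-stability device that the chapter does not mention.

```python
import math
from typing import List


def logadd(z: List[float]) -> float:
    """log(sum_i exp(z_i)), computed stably by factoring out the maximum."""
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z))


def log_likelihood(scores: List[float], y: int) -> float:
    """Eq. (4): log p(y|x) = f(x, y) - logadd_j f(x, j)."""
    return scores[y] - logadd(scores)


# Hypothetical node scores f(x, l, theta) for three labels.
scores = [2.0, -1.0, 0.5]
probs = [math.exp(log_likelihood(scores, l)) for l in range(len(scores))]
```

Exponentiating the log-likelihood of each label recovers the softmax probabilities of Eq. (3), which sum to one.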

For a whole sentence, f(x[1:T], l, t, θ) is the output score of sentence x[1:T] for tag l at time t, with parameters θ. In Bio-NER, the score of each label path matters because of the dependency among the tags within the same sentence. Thus, when interpreting the output, we must consider the dependencies between the labels. The score of a sentence x along a path is the sum of two components: first, the node scores described above, and second, the transition scores A_{lj}, which score the transition from label l to label j. Here, θ̃ denotes all the parameters, including A_{lj} and θ. For sentence x[1:T], the score of the path with labels l[1:T] is

$$W(x_{[1:T]}, l_{[1:T]}, \tilde{\theta}) = \sum_{t=1}^{T} \Big( A_{l_{[t-1]} l_{[t]}} + f(x_{[1:T]}, l_{[t]}, t, \theta) \Big) \tag{5}$$

The log conditional probability of the true label path y[1:T] is

$$\log P(y_{[1:T]} \mid x_{[1:T]}, \tilde{\theta}) = W(x_{[1:T]}, y_{[1:T]}, \tilde{\theta}) - \operatorname{logadd}_{\forall l_{[1:T]}} W(x_{[1:T]}, l_{[1:T]}, \tilde{\theta}) \tag{6}$$

During the training stage, all the parameters θ̃ are trained over the examples (x[1:T], y[1:T]) to maximize

$$\sum_{(x,y) \in T} \log P(y_{[1:T]} \mid x_{[1:T]}, \tilde{\theta})$$

In the inference procedure, the Viterbi algorithm is used to find

$$\arg\max_{l_{[1:T]}} W(x_{[1:T]}, l_{[1:T]}, \tilde{\theta}) \tag{7}$$
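The path score of Eq. (5) and the Viterbi search of Eq. (7) can be sketched as follows. This is a generic textbook Viterbi decoder, not the chapter's code; in particular, the `start` vector that scores the transition into the first label is an assumption, since the chapter does not define the label at t = 0.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def path_score(node: Matrix, trans: Matrix, start: List[float],
               path: List[int]) -> float:
    """Eq. (5): sum of transition scores and node scores along one label path."""
    score = start[path[0]] + node[0][path[0]]
    for t in range(1, len(path)):
        score += trans[path[t - 1]][path[t]] + node[t][path[t]]
    return score


def viterbi(node: Matrix, trans: Matrix,
            start: List[float]) -> Tuple[float, List[int]]:
    """Eq. (7): return the best path score and the label path that attains it."""
    n_labels = len(start)
    best = [start[j] + node[0][j] for j in range(n_labels)]
    backpointers: List[List[int]] = []
    for t in range(1, len(node)):
        new_best, pointers = [], []
        for j in range(n_labels):
            # Best previous label i for ending at label j at time t.
            i_best = max(range(n_labels), key=lambda i: best[i] + trans[i][j])
            pointers.append(i_best)
            new_best.append(best[i_best] + trans[i_best][j] + node[t][j])
        best = new_best
        backpointers.append(pointers)
    last = max(range(n_labels), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path
```

The decoder visits each (time, label) pair once, so its cost grows as T × |L|², whereas enumerating every path would cost |L|^T.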

3.3 Stochastic Gradient

Gradient descent algorithms are the simplest optimization algorithms for minimizing a function. Considering the large cost of computation, we choose the stochastic gradient [38] optimization technique. At every iteration step, the new value of θ is computed for one example (x, y):

$$\theta \leftarrow \theta + \epsilon \, \nabla_{\theta} \log p(y \mid x, \theta) \tag{8}$$

where $\nabla_{\theta} \log p(y \mid x, \theta)$ denotes the gradient of

$$\log p(y_{[1:T]} \mid x_{[1:T]}, \theta) \tag{9}$$

with respect to θ, and ε is a small positive constant, the chosen learning rate.
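The update of Eq. (8) can be illustrated on a deliberately simplified model. The sketch below assumes a linear scorer f(x, l, θ) = Σ_d θ[l][d]·x[d] in place of the chapter's multi-layer network, so that the gradient of the softmax log-likelihood has the closed form (1[l = y] − P(l | x, θ))·x[d]; all names are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(theta: List[List[float]], x: List[float], y: int,
             eps: float) -> float:
    """One stochastic gradient ascent step on log p(y|x, theta), in place.

    Returns the log-likelihood of (x, y) measured before the update.
    """
    scores = [sum(w * v for w, v in zip(row, x)) for row in theta]
    probs = softmax(scores)
    for l, row in enumerate(theta):
        # d log p(y|x) / d theta[l][d] = (1[l == y] - P(l|x)) * x[d]
        coeff = (1.0 if l == y else 0.0) - probs[l]
        for d in range(len(row)):
            row[d] += eps * coeff * x[d]
    return math.log(probs[y])


theta = [[0.0, 0.0] for _ in range(3)]  # 3 labels, 2 input features
before = sgd_step(theta, [1.0, 2.0], y=1, eps=0.1)
after = sgd_step(theta, [1.0, 2.0], y=1, eps=0.1)
```

Because the step follows the gradient of the log-likelihood with a plus sign, repeating it on the same example raises log p(y | x, θ), which is the ascent behaviour Eq. (8) describes.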

4 Experiment

The task of Bio-NER is to recognize entities such as diseases, viruses, proteins and genes, and to label them in plain biomedical text. Figure 2 shows that each word in a given sentence is taken as a token and associated with a label. Here, the labels O, B-C and I-C indicate not only the category but also the position of the token within the named entity, where C is the class, and B and I mark the beginning and the inside of an entity, respectively. There are five label categories: DNA, RNA, Protein, Cell_type and Cell_line. O indicates a token that is not part of a named entity. The test file uses the BIO notation of the GENIA corpus. This BIO notation yields 11 labels (Fig. 2), and every token is assigned one of these 11 labels in the result.


Fig. 2 Biomedical named entity recognition example
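The count of 11 labels follows from the five categories, each with a B- and an I- tag, plus O. The sketch below builds that label set and reads entities back from a BIO-tagged sentence; the example sentence and its tags are illustrative and are not taken from Fig. 2.

```python
from typing import List, Tuple

CLASSES = ["DNA", "RNA", "Protein", "Cell_type", "Cell_line"]


def bio_labels(classes: List[str]) -> List[str]:
    """O plus a B- and an I- label per class: 2 * 5 + 1 = 11 labels."""
    labels = ["O"]
    for c in classes:
        labels += [f"B-{c}", f"I-{c}"]
    return labels


def extract_entities(tokens: List[str],
                     tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity text, class) pairs from a BIO-tagged token sequence."""
    entities, current, cls = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), cls))
            current, cls = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == cls:
            current.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity.
            if current:
                entities.append((" ".join(current), cls))
            current, cls = [], None
    if current:
        entities.append((" ".join(current), cls))
    return entities


labels = bio_labels(CLASSES)
found = extract_entities(["IL-2", "gene", "expression"],
                         ["B-DNA", "I-DNA", "O"])
```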

5 Result of Experiment and Its Analysis

Unlabeled data were collected from the PubMed database using Biopython, and the keywords chosen for searching were ‘drug’, ‘protein’, ‘interaction’, ‘cell_type’ and ‘DNA’. We considered 339,074 papers from the PubMed database, of which 294,893 documents have abstracts. The resulting 430 MB file is used as our unlabeled data. We use the Word2vec tool to train our skip-gram language model. As a result, 205,914 words with 600-dimensional vectors are included in our word dictionary S. We also make use of a POS tagger. This tool is specifically designed for biomedical text, because the characteristics of biomedical text are quite different from those of other articles. The GENIA corpus is used in this experiment, with precision, recall and F-score selected for evaluation. Precision is the number of named entities correctly detected divided by the total number of named entities identified by the system. Recall is the number of named entities correctly detected divided by the number of named entities contained in the input text. The F-score is the harmonic mean of precision and recall:

$$\text{F-Score} = \frac{2\,(\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}}$$
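The three measures can be computed from entity counts as in the sketch below. The function name and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
from typing import Tuple


def precision_recall_f(correct: int, predicted: int,
                       gold: int) -> Tuple[float, float, float]:
    """correct: entities detected correctly; predicted: entities the system
    output; gold: entities annotated in the input text."""
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical counts: 8 correct out of 10 predicted, 16 entities in the text.
p, r, f = precision_recall_f(correct=8, predicted=10, gold=16)
```

Because the F-score is a harmonic mean, it stays closer to the lower of the two values, so a system cannot reach a high F-score with high recall alone.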

According to other systems, classes such as protein and DNA have the highest F-score. The number of each entity in the training data is shown in Table 1. Comparing Table 1 and Fig. 3, the class ‘cell_type’ has the smallest training data set but the highest precision and the second highest F-score. In Fig. 4, we can see that ‘B-DNA’ words wrongly labelled as ‘B-Protein’ have a larger count than errors in other biomedical categories. After examining the training data, we found two major reasons. First, biomedical named entities are often composed of many nested named entities. For example,

Table 1 Major entity categories and performances

Category     Precision   Recall   F-Score
Protein      0.6389      0.8062   0.7128
DNA          0.6427      0.6761   0.6590
RNA          0.6050      0.6102   0.6076
Cell_type    0.7344      0.7356   0.7351
Cell_line    0.5008      0.6160   0.5524
Overall      0.6486      0.7610   0.7004

Fig. 3 Major entity categories and training data of words contained for each category

words such as ‘Viruses’, ‘Epstein-Barr’, ‘protein’, ‘cell’ and ‘EBV’ occur in both entities but belong to different categories in Fig. 4. These words may appear at different positions depending on the category. The BMESO notation is applied to make use of this information, since the BIO notation cannot represent it (Fig. 5). BMESO notation is similar to BIO notation but gives a more detailed description of the position of every word within an entity. Here B indicates the beginning of an entity and E its end. Words between B and E are denoted M. If the entity consists of a single word, it is denoted S. The second reason is the limited training


Fig. 4 Error distribution

Fig. 5 NERs and labels examples

set for some labels, as well as entities that do not appear in the training set. The final results on the GENIA test file are listed in Table 2. The systems compared are Saha et al. [39], with a precision of 68.12, recall of 67.66 and F-score of 67.89; Liao et al. [40], with a precision of 72.8, recall of 73.6 and F-score of 73.2; ABNER [41], with a precision of 69.1, recall of 72.0 and F-score of 70.5; Sasaki et al. [42], with a precision of 68.58, recall of 79.85 and F-score of 73.78; and Sun et al. [43], with a precision of 70.2, recall of 72.3 and F-score of 71.2. Our system has achieved a precision of 66.54, recall of

Table 2 Comparison with state of the art systems

Teams                 Precision   Recall   F-Score
Saha et al. [39]      68.12       67.66    67.89
Liao et al. [40]      72.8        73.6     73.2
ABNER [41]            69.1        72.0     70.5
Sasaki et al. [42]    68.58       79.85    73.78
Sun et al. [43]       70.2        72.3     71.2
Our results           66.54       76.13    71.01

76.13 and an F-score of 71.01% on the GENIA standard test corpus, which is close to state-of-the-art performance. However, the biomedical vocabulary changes every day and differs across tasks and corpora.
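The BMESO notation described in this section can be derived mechanically from BIO tags. The following sketch assumes well-formed BIO input (every I- tag continues an entity of the same class); the function name is illustrative and the example tags are hypothetical.

```python
from typing import List


def bio_to_bmeso(tags: List[str]) -> List[str]:
    """Convert BIO tags to BMESO: B = begin, M = middle, E = end, S = single."""
    converted = []
    for i, tag in enumerate(tags):
        if tag == "O":
            converted.append("O")
            continue
        prefix, cls = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{cls}"  # does the entity go on?
        if prefix == "B":
            converted.append(("B-" if continues else "S-") + cls)
        else:  # prefix == "I"
            converted.append(("M-" if continues else "E-") + cls)
    return converted


bmeso = bio_to_bmeso(["B-Protein", "I-Protein", "I-Protein", "O", "B-DNA"])
```

The conversion keeps the entity boundaries unchanged and only adds the information about whether a word ends an entity or stands alone, which is the positional detail that BIO tags leave implicit.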

6 Conclusion and Future Scope

In this chapter, we have implemented a multi-layer neural network for a biomedical named entity recognition system. The results achieved are close to state-of-the-art performance. There is scope for further improvement in the performance of the neural network. Identifying the left boundary word of an entity is crucial: if it is not identified correctly, the following words are tagged incorrectly too. Combining reverse recognition with forward recognition can be explored in future to improve the accuracy of the system.

References

1. Lim, S., Lee, K., Kang, J.: Drug drug interaction extraction from the literature using a recursive neural network. PLoS ONE 13(1), e0190926 (2018)
2. Lee, K., Hwang, Y., Kim, S., Rim, H.: Biomedical named entity recognition using two-phase model based on SVMs. J. Biomed. Inform. 37(6), 436–447 (2004)
3. Hettne, K.M., Stierum, R.H., Schuemie, M.J., Hendriksen, P.J., Schijvenaars, B.J., Mulligen, E.M.V., et al.: A dictionary to identify small molecules and drugs in free text. Bioinformatics 25(22), 2983–2991 (2009)
4. Song, M., Yu, H., Han, W.S.: Developing a hybrid dictionary-based bio-entity recognition technique. BMC Med. Inform. Decis. Mak. 15(1), S9 (2015)
5. Fukuda, K.I., Tsunoda, T., Tamura, A., Takagi, T., et al.: Toward information extraction: identifying protein names from biological papers. In: Pac. Symp. Biocomput., vol. 707, pp. 707–718 (1998)
6. Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., Dyer, C.: Neural architectures for named entity recognition. In: HLT-NAACL. The Association for Computational Linguistics, pp. 260–270 (2016)
7. Pyysalo, S., Ginter, F., Moen, H., Salakoski, T., Ananiadou, S.: Distributional semantics resources for biomedical text processing. In: Proceedings of the 5th International Symposium on Languages in Biology and Medicine, Tokyo, Japan, pp. 39–43 (2013)
8. Kim, J.D., Ohta, T., Tsuruoka, Y., Tateisi, Y., Collier, N.: Introduction to the bio-entity recognition task at JNLPBA. In: Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications. Association for Computational Linguistics, pp. 70–75 (2004)
9. Caruana, R.: Multitask learning. Mach. Learn. 28(1), 41–75 (1997). https://doi.org/10.1023/A:1007379606734
10. Collobert, R.: Deep learning for efficient discriminative parsing. In: International Conference on Artificial Intelligence and Statistics (2011)
11. Dai, H., Chang, Y.C., Tsai, R.T.Z.H., Hsu, W.: New challenges for biological text-mining in the next decade. J. Comput. Sci. Technol. 25(1), 169–179 (2010)
12. Krallinger, M., Morgan, A., Smith, L., Leitner, F., Tanabe, L., Wilbur, J., Hirschman, L., Valencia, A.: Evaluation of text-mining systems for biology: overview of the second BioCreative community challenge. Genome Biol. 9(2) (2008)
13. Dai, H., Huang, C., Lin, R., Tsai, R., Hsu, W.: Biosmile web search: a web application for annotating biomedical entities and relations. Nucleic Acids Res. 36, 390–397 (2008)
14. Rebholz-Schuhmann, D., Arregui, M., Gaudan, S., Kirsch, H., Jimeno, A.: Text processing through web services: calling Whatizit. Bioinformatics 24(2), 296–300 (2008)
15. Si, L., Kanungo, T., Huang, X.: Boosting performance of bio-entity recognition by combining results from multiple systems. In: Proceedings of the 5th International Workshop on Bioinformatics. ACM, pp. 76–83 (2005)
16. Tsuruoka, Y., Tateishi, Y., Kim, J.-D., Ohta, T., McNaught, J., Ananiadou, S., Tsujii, J.I.: Developing a robust part-of-speech tagger for biomedical text. In: Advances in Informatics. Springer, pp. 382–392 (2005)
17. Vlachos, A.: Evaluating and combining biomedical named entity recognition systems. In: BioNLP 2007: Biological, Translational, and Clinical Language Processing, pp. 199–206 (2007)
18. Li, L., Zhou, R., Huang, D.: Two-phase biomedical named entity recognition using CRFs. Comput. Biol. Chem. 33(4), 334–338 (2009)
19. Li, L., Fan, W., Huang, D.: A two-phase bio-NER system based on integrated classifiers and multi-agent strategy. IEEE/ACM Trans. Comput. Biol. Bioinf. 10(4), 897–904 (2013)
20. Lee, S., Kim, D., Lee, K., Choi, J., Kim, S., Jeon, M., et al.: BEST: next-generation biomedical entity search tool for knowledge discovery from biomedical literature. PLoS ONE 11(10), e0164680 (2016)
21. Proux, D., Rechenmann, F., Julliard, L., Pillet, V., Jacq, B.: Detecting gene symbols and names in biological texts. Genome Inform. 9, 72–80 (1998)
22. Tsai, R.T.H., Sung, C.L., Dai, H.J., Hung, H.C., Sung, T.Y., Hsu, W.L.: NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition. In: BMC Bioinformatics. BioMed Central 7, S11 (2006)
23. Ju, M., Miwa, M., Ananiadou, S.: A neural layered model for nested named entity recognition. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 1446–1459 (2018)
24. Doğan, R.I., Leaman, R., Lu, Z.: NCBI disease corpus: a resource for disease name recognition and concept normalization. J. Biomed. Inform. 47, 1–10 (2014)
25. Crichton, G., Pyysalo, S., Chiu, B., Korhonen, A.: A neural network multi-task learning approach to biomedical named entity recognition. BMC Bioinform. 18(1), 368 (2017)
26. Zheng, J.G., Howsmon, D., Zhang, B., Hahn, J., McGuinness, D., Hendler, J., et al.: Entity linking for biomedical literature. In: Proceedings of the ACM 8th International Workshop on Data and Text Mining in Bioinformatics. ACM, pp. 3–4 (2014)
27. Tsutsui, S., Ding, Y., Meng, G.: Machine reading approach to understand Alzheimers disease literature. In: Proceedings of the Tenth International Workshop on Data and Text Mining in Biomedical Informatics (DTMBIO) (2016)
28. Bengio, Y., Ducharme, R., Vincent, P.: A neural probabilistic language model. In: NIPS, vol. 13 (2001)
29. Collobert, R., Weston, J.: A unified architecture for natural language processing: deep neural networks with multitask learning. In: ICML (2008)
30. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. JMLR (2011)
31. Bengio, Y., Ducharme, R., Vincent, P., Janvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)
32. Schwenk, H.: Continuous space language models. Comput. Speech Lang. 21(3), 492–518 (2007)
33. Mikolov, T., Karafiat, M., Burget, L., Cernocky, J., Khudanpur, S.: Recurrent neural network based language model. In: Eleventh Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 1045–1048 (2010)
34. Mnih, A., Teh, Y.W.: A fast and simple algorithm for training neural probabilistic language models. In: Proceedings of the 29th International Conference on Machine Learning (ICML-12), pp. 1751–1758 (2012)
35. Collobert, R.: Deep learning for efficient discriminative parsing. In: International Conference on Artificial Intelligence and Statistics (AISTATS) (2011)
36. Turian, J., Ratinov, L., Bengio, Y.: Word representations: a simple and general method for semi-supervised learning. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 384–394 (2010)
37. Yih, W.T., Mikolov, T., Zweig, G.: Linguistic regularities in continuous space word representations. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 746–751 (2013)
38. Bottou, L.: Stochastic gradient learning in neural networks. In: Proceedings of Neuro-Nimes, vol. 91 (1991)
39. Saha, S.K., Narayan, S., Sarkar, S., Mitra, P.: A composite kernel for named entity recognition. Pattern Recogn. Lett. 3, 1591–1597 (2010)
40. Liao, Z., Wu, H.: Biomedical named entity recognition based on skip-chain CRFs. In: Industrial Control and Electronics Engineering (ICICEE), 2012 International Conference on. IEEE, pp. 1495–1498 (2012)
41. ABNER: A Biomedical Named Entity Recognizer, pp. 46–51 (2013)
42. Sasaki, Y., Tsuruoka, Y., McNaught, J., Ananiadou, S.: How to make the most of NE dictionaries in statistical NER. In: Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, pp. 63–70 (2008)
43. Sun, C., Guan, Y., Wang, X., Lin, L.: Rich features based conditional random fields for biological named entities recognition. Comput. Biol. Med. 37, 1327–1333 (2007)

Pragatika Mishra holds an M.Tech in Computer Science and Engineering from Biju Patnaik University of Technology. She has around 2 years of experience in teaching undergraduate students. Her research interests are Artificial Intelligence and Machine Learning.

Sitanath Biswas completed his M.E. (CSE) at Utkal University and is currently pursuing a Ph.D. at North Orissa University, Baripada, Odisha. He is currently working as an Assistant Professor at Gandhi Institute for Technology, Bhubaneswar, Odisha. He has over 14 years of experience in teaching and research and has published over 18 research papers in various international journals of repute. His areas of research are Artificial Intelligence and Natural Language Processing.

Sujata Dash received her Ph.D. degree in Computational Modelling from Berhampur University, Orissa, India in 1995. She is an Associate Professor in the P.G. Department of Computer Science and Application, North Orissa University, Baripada, India. She has published more than 150 technical papers in international journals, conferences, and book chapters of reputed publications. She has guided many scholars for their Ph.D. degrees in computer science. She is associated with many professional bodies such as IEEE, CSI, ISTE, OITS, OMS, IACSIT, IMS and IAENG. She is on the editorial board of several international journals and is also a reviewer for many international journals. Her current research interests include Machine Learning, Distributed Data Mining, Bioinformatics, Intelligent Agent, Web Data Mining, Recommender System and Image Processing.

Disambiguation Model for Bio-Medical Named Entity Recognition

A. Kumar

Abstract Recognition of biomedical named entities is one of the preliminary steps in many biomedical text mining tasks. Typical entities in the biomedical domain include diseases, chemicals, genes, and proteins. To find these entities, deep learning-based approaches are currently applied to Biomedical Named Entity Recognition (Bio_NER) and give prominent results. Although deep learning-based approaches give satisfactory results, they still require a large amount of training data, and a lack of data can be one of the barriers to Bio_NER performance. Another obstacle for Bio_NER is polysemy, which leads to misclassification of entities: one biomedical entity may have different meanings in different places, e.g., a gene named entity may be labeled as a disease name. When a Conditional Random Field is combined with a deep learning-based approach, i.e. Bidirectional Long Short Term Memory (Bi-LSTM), the model mistakenly labels the gene entity “BRCA1” as a disease entity because “BRCA1 abnormality” or “Braca1-deficient” is present in the training dataset. Similarly, “VHL (Von Hippel-Lindau disease)”, which is also a gene name, is labeled as a disease by the Bi-LSTM CRF model. One more problem is addressed in this chapter: entities in the biomedical domain are long and complex, as in “A375M (B-Raf (V600E)) is a human melanoma cell line”. Such an entity contains multiple words, yet it is still difficult to find the context information of the entity. To address the lack of data and the entity misclassification problem, this chapter combines multiple Bio_NER models. In the proposed model, models trained on different datasets are connected so that the target model obtains information from the other models, which reduces the false-positive rate. A Recurrent Neural Network (RNN) based on Bi-LSTM gates is introduced to handle the long and complex range dependencies in biomedical entities. The BioCreative II GM corpus, PubMed, a gold-standard dataset, and the JNLPBA dataset are used in this research work.

A. Kumar
Department of Computer Science and Engineering, National Institute of Technology Raipur, Raipur, Chhattisgarh 492010, India
e-mail: [email protected]

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics, Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_3


Keywords Information extraction · Bio-Medical Named Entity Recognition (Bio-NER) · Conditional Random Field (CRF) · Deep learning · Machine Learning (ML) · Long Short Term Memory Network (LSTM) · Text mining

Abbreviations

CRF       Conditional random field
LSTM      Long short term memory
BILSTM    Bidirectional long short term memory
BioNER    Biomedical named entity recognition
NER       Named entity recognition
MTM       Multi task model
WE        Word embedding
CE        Character embedding

1 Introduction

As the internet grows day by day, the volume of biomedical text data is also increasing, and a robust technique is required to access meaningful and important information in it. This chapter uses Named Entity Recognition (NER), an information extraction (IE) technique that is part of text mining. NER is needed to label the entities in a text dataset: it automatically recognizes named entities in natural language within a domain of interest. In biomedical text, many entity types need to be labelled, such as gene names, protein names, diseases, chemicals, and medications. In the recent past, most researchers have focused on extracting protein and gene mentions, whereas some research addresses disease entity extraction. Biomedical NER is more difficult than general NER because biomedical named entities have a variety of aliases and abbreviations and varied naming conventions, and the same name may refer to a protein or gene in different organisms or to different biological entities altogether. For example, the biomedical entity p53 refers to a protein name in one context, but p53 can also refer to a protein with a molecular weight of 53 kDa. To tackle this type of problem, different NER approaches have been applied to biomedical text, namely the rule-based approach, the dictionary-based approach, and the machine learning-based approach. Thousands of biomedical articles are published in thousands of journals every day, which introduces new terms and spelling variations of existing biomedical words. The rule-based and dictionary-based approaches to named entity recognition are not suitable because of


their limited predictive power. This is where the ML-based approach comes in. The ML-based approach is more reliable and robust for biomedical named entity recognition because it can handle high-dimensional feature vectors for text processing and can predict new terms or variations based on learned patterns. A reliable, high-performance named entity recognition model must be able to fully capture words in their context. Biomedical NER systems have been developed using various kinds of features: linguistic features of a word such as lemmatization and stemming; morphological features such as prefixes or suffixes, word shape, and character weight; orthographic features such as word formation, symbols, and digits; and contextual features such as word windows and conjunctions. A binary-encoded feature set is used as input to the ML algorithm, which trains the named entity recognition model using the named entity annotations in the training dataset. In recent years, most researchers have worked on a single domain, such as protein, gene, disease, or chemical names, but none of the research described in the literature has covered all four together. This chapter considers multiple domains (protein, gene, disease, and chemical names) together so that the correct entity can be recognized and labelled automatically in a given text. The main goal of this research is to handle polysemous words, which are the main cause of lower recall. The model described in this chapter handles all four types of domain dataset in a single combined model.

1.1 Rule-Based Approach

The rule-based approach deals with orthographic and morphological structures, and it performs better than the dictionary-based approach. In such systems, character strings are used to identify a term, followed by rules and hand-crafted patterns that concatenate the adjacent words of a named entity. The drawback of the rule-based approach is that it depends heavily on domain-specific named entities that share common morphological and orthographic characteristics. Because it depends on hand-crafted features, it is unsuitable for a new domain or naming convention, which motivates switching to other approaches.
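As a rough illustration of this idea, the sketch below flags tokens with a single hand-written orthographic rule and concatenates adjacent flagged tokens into one mention. The rule itself (tokens mixing letters and digits, or all-capital tokens of two or more letters) is a hypothetical example chosen for this sketch and is not taken from any cited system.

```python
import re
from typing import List

# Hypothetical orthographic rule: letters mixed with digits (e.g. "p53",
# "BRCA1"), or an all-capital token of at least two letters (e.g. "VHL").
CANDIDATE = re.compile(r"^(?=.*[A-Za-z])(?=.*\d)[A-Za-z0-9-]+$|^[A-Z]{2,}$")


def rule_based_mentions(tokens: List[str]) -> List[str]:
    """Return candidate entity mentions, joining adjacent matching tokens."""
    mentions, current = [], []
    for token in tokens:
        if CANDIDATE.match(token):
            current.append(token)
        elif current:
            mentions.append(" ".join(current))
            current = []
    if current:
        mentions.append(" ".join(current))
    return mentions


mentions = rule_based_mentions("The p53 protein binds BRCA1 and VHL .".split())
```

A rule of this kind would also fire on any capitalized acronym in another domain, which illustrates why hand-crafted patterns transfer poorly to new naming conventions.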

1.2 Dictionary-Based Approach

The dictionary-based approach finds the named entities in a given text by looking them up in a dictionary, and various terminologies have been applied to biomedical text mining. For instance, HUGO is a terminology that provides 21,000 human gene entities. The Swiss-Prot section of the UniProt database, which contains 180,000 protein records, has been frequently used. BioThesaurus is a compilation of several million


gene and protein names mapped to UniProt entries through the cross-references in the iProClass database. Compared with the machine learning-based approach, the significant advantage of the dictionary-based approach is that each entry is built with an external identifier, which provides metadata for annotating the extracted names. However, this approach faces several challenges, including false positives caused by ambiguity in names, and false negatives caused by spelling variations and synonyms that the dictionary does not cover. The approach depends on the creation and curation of a lexicon for the particular domain, which may contain millions of entities. To address spelling variation, Tsuruoka et al. used a variant generator and string searching, achieving an improved F-score on the GENIA corpus compared with the exact matching algorithm [1, 2].
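A minimal sketch of dictionary lookup with generated spelling variants is given below. It is only loosely inspired by the idea of a variant generator and does not reproduce the method of Tsuruoka et al.; the variant rules (case folding, hyphen and space differences), the toy dictionary, and its identifiers are all hypothetical.

```python
from typing import Dict, Optional, Set


def variants(name: str) -> Set[str]:
    """Generate simple spelling variants: case, hyphen and space differences."""
    base = name.lower()
    return {
        base,
        base.replace("-", " "),
        base.replace("-", ""),
        base.replace(" ", ""),
    }


def build_index(dictionary: Dict[str, str]) -> Dict[str, str]:
    """Map every variant of every dictionary name to its external identifier."""
    index: Dict[str, str] = {}
    for name, identifier in dictionary.items():
        for variant in variants(name):
            index.setdefault(variant, identifier)
    return index


def lookup(mention: str, index: Dict[str, str]) -> Optional[str]:
    """Return the identifier for a mention if any of its variants is indexed."""
    for variant in variants(mention):
        if variant in index:
            return index[variant]
    return None


# Toy dictionary with made-up identifiers, for illustration only.
index = build_index({"NF-kappa B": "ID:0001", "interleukin 2": "ID:0002"})
```

Here `lookup("NF kappa B", index)` and `lookup("Interleukin-2", index)` both resolve to an identifier even though neither string matches a dictionary entry exactly, while an unlisted name returns `None`, which corresponds to the false negatives discussed above.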

1.3 Machine Learning Based Approach

The machine learning-based approach is one of the best and most frequently used approaches in text mining. The best performance in the BioCreative II gene and protein task was achieved with a machine learning-based approach. Different supervised learning methods, such as the Support Vector Machine (SVM) [3], Hidden Markov Model (HMM) [2], CRF [4], MEMMs [5], and case-based learning [6], have been used in named entity recognition. Supervised learning methods use only an annotated text corpus. To resolve the data sparseness that arises when a large feature set is used with a very small training set, a few semi-supervised learning methods have recently been applied to large unannotated text corpora. A vital part of the ML approach is the appropriate selection of the feature set that represents a named entity. Commonly used features are morphological patterns, part-of-speech (POS) tags, orthographic word patterns, tokenization, lemmatization, and conjunctions of contextual features. Recently, various studies have demonstrated the importance of deep learning-based methods. Sahu and Anand [7] showed the capability of the Recurrent Neural Network (RNN) for biomedical named entity recognition. Their model combines a Conditional Random Field (CRF) with Bi-directional Long Short-Term Memory (Bi-LSTM) and uses character-level (CL) and word-level (WL) embeddings, but they did not describe the benefits of CL and WL embeddings with the Bi-LSTM-CRF model. Habibi et al. [8] merged the Bi-LSTM-CRF model of Lample et al. [9] with the word embeddings of Pyysalo et al. [10]. Habibi et al. used CL-based word embeddings to capture characteristics such as the orthographic features of biomedical entities, and illustrated the potential of character-level word embeddings in biomedical named entity recognition. Although these models showed prominent results, biomedical named entity recognition remains a very challenging task. The first challenge is the small amount of training data available for biomedical NER. Gold-standard datasets contain only one or two types of entity annotation. The NCBI corpus [11] contains disease annotations only and does not contain any other entity types such as

Disambiguation Model for Bio-Medical Named Entity Recognition


genes and proteins, whereas the JNLPBA corpus [12] consists of gene and protein annotations only. Therefore, only a small amount of annotated data is available for each entity type. Multitask learning (MTL) trains a single model for multiple tasks at the same time. MTL can leverage distinct datasets collected for different but related tasks [13]. Although the extraction of gene entities is quite different from that of chemical entities, both tasks require learning some common features that help to capture the linguistic expressions of biomedical text. Crichton et al. [14] developed a multitask learning model that was trained on various datasets containing annotations of different entity types. The MTL model proposed by Wang et al. [15] performs better than other state-of-the-art single-task named entity recognition models. This literature inspired the proposed model, which is a combination of multiple models. Unlike conventional multitask learning methods, which use only a single model, the proposed model trains separate models on different datasets for different tasks. Each model is trained on the annotated dataset for a particular entity type so that it becomes an expert for its own entity type. The major drawback of multitask learning methods is that they produce high recall and low precision. Because MTL-based models are trained on multiple entity types and thus have a more extensive training dataset, their coverage of biomedical entities is broader, which results in higher recall. On the other hand, because they are trained on a combination of different entity types, they have difficulty differentiating among entity types, which results in lower precision.
Another reason that named entity recognition is difficult in the biomedical domain is that the same name can be labeled as a different entity type depending on the textual context. In this chapter, it is observed that many false predictions are due to the polysemy problem. For example, a word can be used as both a disease name and a gene name. A model designed to label disease entities may mistakenly label a gene as a disease, and this misidentification raises the false positive rate. For example, BI-LSTM-CRF models for labeling disease entities incorrectly label the gene name “BRCA1” as a disease entity because disease names such as “BRCA1 abnormalities” or “brca1 deficient” exist in the training dataset. In addition, the training dataset annotates “VHL” (Von Hippel-Lindau disease) as a disease entity, which confuses the model because “VHL” is also used as a gene name; mutation of this gene leads to the disease. To reduce the false positives that arise from polysemous words, the proposed model is introduced, in which the disease model utilizes the outputs of the chemical and gene models. Once the gene model predicts “BRCA1” as a gene, it informs the disease model, so that the disease model does not predict it as a disease. In the proposed model, each model is first trained individually on its own entity type and is then further trained with the outputs of the models for the other entity types. The remainder of the chapter is organized as follows: Sect. 2 describes the basic concepts of the Conditional Random Field (CRF), Long Short Term Memory (LSTM) and Bidirectional Long Short Term Memory (BILSTM), which are used in deep learning; Sect. 3 presents the proposed methodology using biomedical datasets;


A. Kumar

Sect. 4 describes the datasets and the evaluation metrics; Sect. 5 compares the proposed model with the existing multi-task model (MTM) for biomedical named entity recognition; and finally, Sect. 6 gives the conclusion of this chapter along with future work.

2 Background

This section describes the deep learning techniques that have been applied to biomedical named entity recognition. Brief introductions to the three approaches follow.

2.1 Deep Learning Technique

Deep learning is based on artificial neural networks and is a subclass of machine learning. In deep learning, multiple layers are used to extract higher-level features from the input data. LSTM (Long Short Term Memory) is a type of recurrent neural network used in deep learning. Unlike feedforward neural networks, an LSTM also contains feedback connections. An LSTM can process single data points as well as sequences of data such as video or speech. An LSTM unit contains a cell, an input gate, an output gate, and a forget gate; the cell remembers values, and the three gates regulate the flow of information.

2.1.1 CRF Model

The Conditional Random Field (CRF) is a probabilistic graphical model that is generally used for sequence tagging tasks such as named entity recognition, object recognition, and part-of-speech (POS) tagging. A CRF is a conditionally trained model that can work with a large number of non-independent features. Unlike a discrete classifier, which labels each example on its own, a CRF takes neighboring examples into account.

2.1.2 LSTM Network

Long Short Term Memory (LSTM) is a Recurrent Neural Network (RNN) architecture that efficiently handles variable-length inputs. Research has shown that RNNs are useful in various NLP tasks such as speech recognition, language modeling, and machine translation [16, 17], and RNN-based LSTM variants are the most widely used [18]. The proposed model uses the LSTM formulation of Graves et al. [16]. Given the output of the embedding layer, the hidden states are calculated as follows.


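$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) \tag{1}$$

$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) \tag{2}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \tag{3}$$

$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o) \tag{4}$$

$$h_t = o_t \odot \tanh(c_t) \tag{5}$$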

where tanh and σ denote the hyperbolic tangent function and the logistic sigmoid function, respectively, and ⊙ denotes the element-wise product. A forward LSTM extracts a representation of the input in the forward direction, and a backward LSTM represents the input in the backward direction. The hidden state is formed by concatenating the forward and backward LSTM states, as proposed by Schuster and Paliwal [19]; this construction is frequently used in sequence encoding tasks.
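A minimal sketch of Eqs. (1)–(5) and of the forward/backward concatenation may help make the data flow concrete. It uses plain Python lists; the helper names (`make_params`, `lstm_step`, `run_lstm`, `bilstm`), the constant weights and the zero initial states are assumptions made for illustration only, not the chapter's implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def make_params(n_in, n_hid, w=0.1):
    # Hypothetical constant weights; a trained model would learn these.
    p = {}
    for g in "ifco":
        p["Wx" + g] = [[w] * n_in for _ in range(n_hid)]
        p["Wh" + g] = [[w] * n_hid for _ in range(n_hid)]
        p["b" + g] = [0.0] * n_hid
    p["Wco"] = [[w] * n_hid for _ in range(n_hid)]
    return p

def lstm_step(x_t, h_prev, c_prev, p):
    def affine(g):
        a = matvec(p["Wx" + g], x_t)
        r = matvec(p["Wh" + g], h_prev)
        return [ai + ri + bi for ai, ri, bi in zip(a, r, p["b" + g])]
    i_t = [sigmoid(v) for v in affine("i")]                       # Eq. (1)
    f_t = [sigmoid(v) for v in affine("f")]                       # Eq. (2)
    g_t = [math.tanh(v) for v in affine("c")]
    c_t = [f * c + i * g
           for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]          # Eq. (3)
    peep = matvec(p["Wco"], c_t)                                  # W_co c_t term
    o_t = [sigmoid(a + b) for a, b in zip(affine("o"), peep)]     # Eq. (4)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]            # Eq. (5)
    return h_t, c_t

def run_lstm(xs, p, n_hid):
    # Assumption: hidden and cell states start at zero.
    h, c = [0.0] * n_hid, [0.0] * n_hid
    states = []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        states.append(h)
    return states

def bilstm(xs, p_fwd, p_bwd, n_hid):
    fwd = run_lstm(xs, p_fwd, n_hid)
    bwd = run_lstm(xs[::-1], p_bwd, n_hid)[::-1]
    # Concatenate forward and backward states for each token.
    return [f + b for f, b in zip(fwd, bwd)]

params = make_params(n_in=2, n_hid=3)
states = bilstm([[1.0, 0.0], [0.0, 1.0]], params, params, 3)
print(len(states), len(states[0]))
```

Each token therefore receives a hidden vector twice as long as that of a single LSTM, which is what the CRF layer consumes in the next subsection.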

2.1.3 BILSTM-CRF Network

A Bi-directional Long Short Term Memory (BILSTM) network handles the backward dependency issue and the long-term dependency problem, and modeling the dependencies between adjacent output tags enhances the performance of sequence labeling models [15]. A Conditional Random Field (CRF) is therefore applied just after the output layer of the BILSTM to capture these dependencies. The BILSTM-CRF network architecture is shown in Fig. 1. In the input layer, words are taken in the form of tokens, and these tokens are then passed through the BILSTM layers. The output of the BILSTM layer goes to the CRF

Fig. 1 BILSTM-CRF network model architecture (from bottom to top: input layer with Word2Vec representation, forward and backward LSTM layers forming the bidirectional LSTM layer, CRF algorithm, output layer)


model, where the CRF model tags the input token sequence according to the tagging scheme. The probability of each label given the sequence $S = w_1, \ldots, w_n$ is calculated by the following equations.

$$z_t = W_y h_t^{bi} + b_y \tag{6}$$

$$p(y_t \mid w_1, \ldots, w_n; \Theta) = \mathrm{softmax}(z_t) \tag{7}$$

$$\mathrm{softmax}(a_j) = \frac{\exp(a_j)}{\sum_k \exp(a_k)}$$

where $W_y$ and $b_y$ in Eq. (6) are the parameters of the fully connected layer for the BIO tagging scheme, and the softmax(·) function is applied to calculate the probability of each tag. Based on the probability p from Eq. (7), the training objective is to minimize the following loss.

$$L_{LSTM} = -\sum_{t=1}^{N} \log p(y_t \mid w_1, \ldots, w_n; \Theta) \tag{8}$$

$$L_{CRF} = -\sum_{t=1}^{T} \left( A_{y_{t-1}, y_t} + z_{t, y_t} \right) \tag{9}$$

$$\mathrm{Loss} = L_{LSTM} + L_{CRF} \tag{10}$$

where $L_{LSTM}$ is the cross-entropy loss for the label $y_t$ and $L_{CRF}$ stands for the negative sentence-level log likelihood. $A_{y_{t-1}, y_t}$ and $z_{t, y_t}$ are the transition and emission scores, respectively, and their sum gives the tag score.
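The losses in Eqs. (7)–(10) can be sketched in a few lines of plain Python. The function names and the `start` tag index used for the first transition are assumptions for illustration. The sketch follows Eq. (9) as printed, that is, the negative sum of transition and emission scores along the gold tag path; a normalized CRF likelihood would additionally include a log-partition term, which is not shown here.

```python
import math

def softmax(z):
    # Softmax over a score vector, shifted by max(z) for numerical stability.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def lstm_loss(Z, y):
    # Eq. (8): negative log-probability of the gold tags, summed over tokens.
    return -sum(math.log(softmax(z_t)[y_t]) for z_t, y_t in zip(Z, y))

def crf_loss(Z, y, A, start=0):
    # Eq. (9) as printed: negative sum of transition and emission scores.
    # `start` is a hypothetical tag index standing in for y_0.
    total, prev = 0.0, start
    for z_t, y_t in zip(Z, y):
        total += A[prev][y_t] + z_t[y_t]
        prev = y_t
    return -total

def total_loss(Z, y, A, start=0):
    # Eq. (10): sum of the two losses.
    return lstm_loss(Z, y) + crf_loss(Z, y, A, start)

# Two tokens, two tags: Z holds the scores z_t, A the transition scores.
Z = [[1.0, 2.0], [3.0, 0.5]]
A = [[0.0, 0.0], [0.0, 0.0]]
print(total_loss(Z, [1, 0], A))
```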

3 Methodology

This section describes the architecture of the proposed model. A combination of multiple datasets, NCBI [11], BC5CDR [20], JNLPBA [21] and BC4CHEMD [22], is used as the input. Fig. 2 shows the architecture of the proposed model. The following steps describe the proposed model.

1. All the biomedical datasets are first combined and sent to the individual models.
2. Each model is trained on the dataset according to its bio-entity type and sends its output to the max pooling function.
3. The function of max pooling is to progressively reduce the size of the representation in order to reduce the number of parameters and the amount of computation in the network. The pooling layer operates on each feature map independently.


Fig. 2 Architecture of the proposed model

4. The activation function introduces nonlinearity into the output of the neuron. The result is then sent to the targeted model.
5. The proposed model is again combined with a Conditional Random Field (CRF) to give the sequentially tagged output.
6. The target output gives the annotated, tagged dataset.

In this chapter, the deep learning concept is introduced. Deep learning-based methods require a very large amount of data, and biomedical datasets are able to fulfill this requirement. The advantage of using deep learning-based methods in biomedical named entity recognition is that they reduce the probability of error, as discussed later in the evaluation section.
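The pooling and activation steps of the procedure can be illustrated with a small sketch. The chapter does not state the pooling window, the stride or the activation function, so the window of 2, the stride of 2 and the ReLU activation below are assumptions chosen only for illustration.

```python
def max_pool_1d(values, window=2, stride=2):
    # Keep the largest value in each window, shrinking the representation.
    return [max(values[i:i + window])
            for i in range(0, len(values) - window + 1, stride)]

def relu(values):
    # A common nonlinearity; assumed here, not specified by the chapter.
    return [max(0.0, v) for v in values]

feature_map = [0.2, -1.0, 0.7, 0.1, -0.3, -0.5]
pooled = max_pool_1d(feature_map)
print(pooled)
print(relu(pooled))
```

With these settings the six-element feature map is reduced to three values before the nonlinearity is applied, which is the size reduction the pooling step describes.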


Table 1 Biomedical database description

S. no.  Corpus             Entity type   #annotation  #sentences  Data size
1.      NCBI-disease [11]  Disease       6881         7639        793 abstracts
2.      JNLPBA [21]        Gene/protein  35,336       22,562      2404 abstracts
3.      BC5CDR [14]        Disease       12,852       14,228      1500 articles
4.      BC4CHEMD [15]      Chemical      84,310       86,679      10,000 abstracts

4 Evaluation

4.1 Dataset Description

In this section, four biomedical datasets are considered for the experiments: NCBI [11], BC5CDR [20], JNLPBA [21] and BC4CHEMD [22]. All four datasets were collected by Crichton et al. [14]. They are constructed from MEDLINE abstracts [23], and each dataset concentrates on one of three biomedical entity types: gene or protein, disease, and chemical. Cell type entity tags from JNLPBA are not considered in this research. All datasets consist of input sentences containing biomedical entities. JNLPBA contains training and test sets, while the remaining three contain training, development, and test sets. For JNLPBA, a small part of the training set, approximately equal in size to the test set, is used as the development set. The JNLPBA dataset from Crichton et al. [14] contains split sentences. This chapter therefore uses the original dataset developed by Kim et al. [20], which has more accurate sentence separation. The datasets are described in Table 1.

4.2 Evaluation Metric

To evaluate performance on the biomedical named entity recognition task, Information Extraction (IE) metrics are considered. Precision (P), recall (R) and F-score or F-measure (F1) are defined in Eqs. (11), (12) and (13) as follows:

$$\mathrm{Precision}\,(P) = \frac{TP}{TP + FP} \tag{11}$$

$$\mathrm{Recall}\,(R) = \frac{TP}{TP + FN} \tag{12}$$

$$F\text{-measure}\,(F_1) = \frac{2 \times P \times R}{P + R} \tag{13}$$


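where:
• TP (true positives) is the number of correctly predicted entities in the sequence.
• FP (false positives) is the number of predicted entities that are not in the ground truth, so TP + FP is the total number of predicted entities in the sequence.
• FN (false negatives) is the number of ground truth entities that were not predicted, so TP + FN is the total number of ground truth entities in the sequence.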
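As a worked example of Eqs. (11)–(13), the sketch below computes entity-level precision, recall and F1 from sets of entities. Representing an entity as a `(start, end, type)` tuple and requiring an exact match are assumptions made for illustration; the chapter does not specify its matching criterion.

```python
def prf1(gold, predicted):
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)   # correctly predicted entities
    fp = len(predicted - gold)   # predicted but not in the ground truth
    fn = len(gold - predicted)   # in the ground truth but missed
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = [(0, 1, "Gene"), (5, 7, "Disease"), (9, 10, "Chemical")]
pred = [(0, 1, "Gene"), (5, 7, "Gene"), (9, 10, "Chemical"), (12, 13, "Disease")]
print(prf1(gold, pred))
```

In this example the span (5, 7) is found but given the wrong type, so it counts as both a false positive and a false negative, which is how a bio-entity type confusion lowers precision and recall at the same time.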

4.3 Post Processing and Parameters Setting

A post-processing step is applied to correct invalid BIOES sequences. This step increases precision by approximately 0.1–0.5% and the F1 score by about 0.04–0.3%. Precision, recall, and F1 score are used to evaluate the performance of the models. The AdaGrad optimizer [24] is used, with an initial learning rate of 0.01 that is exponentially decayed by a factor of 0.95 each epoch. The dimension of the character-level embedding (d_char) is set to 30, and the dimension of the character-level word embedding (d_clwe) is set to 200 * 3. For both the forward and backward LSTMs, 300 hidden units are used. Dropout [25] is applied to two parts of the proposed model: the outputs of the CLWE (0.5) and of the BILSTM (0.3). The mini-batch size in the experiments is 10. The parameter settings are mostly the same as those of Wang et al. [3]; only a few settings, such as the dropout rate, differ, and these were selected using the validation sets only.
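The learning rate schedule described above can be written out as a short sketch. It assumes that the first epoch (epoch 0) uses the initial rate of 0.01 and that the rate is multiplied by 0.95 once per epoch; the function name is hypothetical.

```python
def decayed_learning_rate(epoch, initial_lr=0.01, decay=0.95):
    # Exponential decay: the rate shrinks by the factor `decay` every epoch.
    return initial_lr * decay ** epoch

for epoch in (0, 1, 10, 50):
    print(epoch, decayed_learning_rate(epoch))
```

Under these assumptions the rate falls to roughly 60% of its initial value after ten epochs.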

5 Result and Discussion

Table 2 shows the comparison of the experimental results between the multitask learning model and the proposed model. The BC5CDR-disease dataset was used by Wang et al. [26] for their experiments. Their model was run again on the BC5CDR-disease dataset for comparison with the other models; the results of this re-run are marked with an asterisk in the table. The proposed model was run ten times with ten different initializations, and the arithmetic mean over the four datasets is taken to evaluate the overall performance of each model.

Table 2 Performance comparison

                            Proposed model                 Wang et al. [26]
S. no.  Datasets            Precision  Recall  F1 score    Precision  Recall  F1 score
1.      NCBI-disease        84.48      87.27   86.36       85.86      86.42   86.14
2.      JNLPBA              74.43      83.22   78.58       70.91      76.34   73.52
3.      BC5CDR-disease      85.61      82.61   84.08       *83.73     *82.93  *83.33
4.      BC4CHEMD            90.78      87.01   88.85       91.30      87.53   89.37
5.      Average             84.07      85.03   84.47       82.95      83.30   83.09

* Results obtained by re-running the experiment rather than taken from the original paper

As shown in Table 2, the proposed model achieves higher average precision, recall and F1 score than the Multi Task Model (MTM) of Wang et al. [15] over the four datasets, although not on every dataset: on BC4CHEMD, for example, the MTM scores slightly higher. The proposed model consists of an expert model trained for each entity type, which further enhances biomedical named entity recognition performance. Compared with the baseline models, the proposed model achieves higher precision on average. Even though recall increases only slightly, the increase in precision is more valuable than the increase in recall when considering the practical use of a bio-NER model. Important information is likely to be repeated in a large corpus, so an entity missed in one place may be recovered in another, and missed entities need not harm the performance of the overall system. False information and error propagation, however, can affect the entire system. Recognizing a biomedical entity as a different bio-entity type is a bio-entity error. For instance, recognizing the gene ‘VHL’ as a disease in a sentence is a bio-entity error. Interestingly, bio-entity errors generally occur when bio-entities are easily confused or when an entity name is shared by several entity types (e.g. BRCA1). The MTM produces 4334 errors on the four datasets (BC5CDR-disease, BC4CHEMD, JNLPBA, NCBI), whereas the proposed model produces 3966, which is 368 fewer. The proposed model thus shows the best performance in the error analysis.
The error analysis of the STM, a single LSTM-CRF model, shows many errors in classifying bio-entities in JNLPBA; these account for 49.3% of all JNLPBA errors. For the MTM model, 1333 of the 4334 errors (about 30.8%) are bio-entity errors. Bio-entity errors thus make up a large share of the errors, alongside span errors, the most common error type, which account for 38% of the errors. While most span errors stem from subjective annotations or can be easily fixed by non-experts, bio-entity errors are difficult to detect, even for biomedical researchers. Moreover, for biomedical text mining methods such as drug-drug interaction extraction, span errors cause only minor errors, whereas bio-entity errors could lead to entirely different results.


In the proposed model, every expert model is trained on a dataset for a single entity type, and its output is concatenated with the word embedding. The other expert models share their knowledge with the targeted model, as shown in Fig. 2, so that bio-entity type errors are reduced. As a result, only 736 of the errors of the proposed model are bio-entity errors, which is 18.6% of all its errors.
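The error counts quoted in this section can be cross-checked with simple arithmetic. The sketch below uses only the counts given in the text (4334 and 3966 total errors, 1333 and 736 bio-entity errors); the variable names are illustrative.

```python
mtm_errors, proposed_errors = 4334, 3966
mtm_bio_errors, proposed_bio_errors = 1333, 736

# Reduction in the total number of errors.
reduction = mtm_errors - proposed_errors

# Share of bio-entity errors among all errors of each model, in percent.
mtm_share = 100 * mtm_bio_errors / mtm_errors
proposed_share = 100 * proposed_bio_errors / proposed_errors

print(reduction)
print(round(mtm_share, 1), round(proposed_share, 1))
```

The computation gives a reduction of 368 errors and bio-entity shares of about 30.8% for the MTM and 18.6% for the proposed model.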

6 Conclusions

This chapter introduced the proposed model, which consists of multiple bidirectional LSTM-CRF (BILSTM-CRF) models for the recognition of biomedical entities. Most state-of-the-art methods are capable of handling only a single type of entity. The proposed model can handle multiple datasets while achieving higher F1 scores. Unlike multi-task models, the proposed model uses several single-task NER models, which relay information to one another to achieve higher precision. To improve on multi-task models, the proposed model disambiguates biomedical entities that are polysemous or that share the same orthographic features. As the results show, the proposed model achieves better average precision, recall, and F1 score than the related work on four BioNER datasets. Although the proposed model incurs some computational overhead, this overhead is of little concern given the accuracy of its results. In future work, the proposed model will be applied to a geospatial dataset.

Acknowledgements The authors would like to thank the National Institute of Technology Raipur for providing the necessary infrastructure and facilities for this research.

References

1. Zhong, H., Hu, X.: Disease named entity recognition by machine learning using semantic type of metathesaurus. Int. J. Mach. Learn. Comput. 3(6), 494–498 (2014)
2. Collier, N., Nobata, C., Tsujii, J.: Extracting the names of genes and gene products with a hidden Markov model. In: Proceedings of the 18th Conference on Computational Linguistics, vol. 1, pp. 201–207 (2000)
3. Zhou, G.D.: Recognizing names in biomedical texts using mutual information independence model and SVM plus sigmoid. Int. J. Med. Inf. 75(6), 456–467 (2006)
4. Lafferty, J., McCallum, A., Pereira, F.C.N.: Conditional Random Fields, pp. 282–289 (2001)
5. McCallum, A., Freitag, D., Pereira, F.C.N.: Maximum entropy Markov models for information extraction and segmentation. In: Proceedings of the Seventeenth International Conference on Machine Learning, Morgan Kaufmann Publishers Inc, San Francisco, CA, USA, pp. 591–598 (2000)
6. Neves, M.L., Carazo, J.-M., Pascual-Montano, A.: Moara: A Java library for extracting and normalizing gene and protein mentions. BMC Bioinf. 11(1), 157 (2010)
7. Sahu, S.K., Anand, A.: Recurrent neural network models for disease name recognition using domain invariant features. ArXiv E-Prints. arXiv:1606.09371 (2016)
8. Habibi, M., Weber, L., Neves, M., Wiegandt, D.L., Leser, U.: Deep learning with word embeddings improves biomedical named entity recognition. Bioinformatics (Oxford, England) 33(14), i37–i48 (2017)
9. Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., Dyer, C.: Neural architectures for named entity recognition. ArXiv E-Prints. arXiv:1603.01360 (2016)
10. Pyysalo, S., Ginter, F., Moen, H., Salakoski, T., Ananiadou, S.: Distributional semantics resources for biomedical text processing. In: Proceedings of the 5th Languages in Biology and Medicine Conference (LBM’13), pp. 39–44 (2013)
11. Doğan, R.I., Leaman, R., Lu, Z.: NCBI disease corpus: a resource for disease name recognition and concept normalization. J. Biomed. Inform. 47, 1–10 (2014)
12. Goulart, R.R.V., Strube de Lima, V.L., Xavier, C.C.: A systematic review of named entity recognition in biomedical texts. J. Braz. Comput. Soc. 17(2), 103–116 (2011)
13. Caruana, R.: Multitask learning. Mach. Learn. 28(1), 41–75 (1997)
14. Crichton, G., Pyysalo, S., Chiu, B., Korhonen, A.: A neural network multi-task learning approach to biomedical named entity recognition. BMC Bioinf. 18(1), 368 (2017)
15. Wang, X., Zhang, Y., Ren, X., Zhang, Y., Zitnik, M., Shang, J., Langlotz, C., Han, J.: Cross-type biomedical named entity recognition with deep multi-task learning. ArXiv E-Prints. arXiv:1801.09851 (2018)
16. Graves, A., Mohamed, A., Hinton, G.: Speech recognition with deep recurrent neural networks. In: IEEE International Conference, Department of Computer Science, University of Toronto, no. 3, pp. 6645–6649
17. Kim, Y., Jernite, Y., Sontag, D., Rush, A.M.: Character-aware neural language models. ArXiv E-Prints. arXiv:1508.06615 (2015)
18. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
19. Song, M., Yu, H., Han, W.-S.: Developing a hybrid dictionary-based bio-entity recognition technique. BMC Med. Inform. Decis. Mak. 15(1), S9 (2015)
20. Kim, J.-D., Ohta, T., Tsuruoka, Y., Tateisi, Y., Collier, N.: Introduction to the bio-entity recognition task at JNLPBA. In: Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and Its Applications, Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 70–75 (2004)
21. Kim, J., Ohta, T., Tsuruoka, Y., Tateisi, Y., Collier, N.: Introduction to the bio-entity recognition task at JNLPBA, 70–75 (n.d.)
22. Krallinger, M., Rabal, O., Leitner, F., Vazquez, M., Salgado, D., Lu, Z., Leaman, R., Lu, Y., Ji, D., Lowe, D.M., Valencia, A.: The CHEMDNER corpus of chemicals and drugs and its annotation principles. J. Cheminf. 7(Suppl 1 Text mining for chemistry and the CHEMDNER track), S2–S2 (2015)
23. Ratinov, L., Roth, D.: Design challenges and misconceptions in named entity recognition. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning, Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 147–155 (2009)
24. Campos, D., Matos, S., Oliveira, J.L.: Biomedical named entity recognition: a survey of machine-learning tools. In: Sakurai, S. (ed.) Theory and Applications for Advanced Text Mining (2012)
25. Fukuda, K., Tamura, A., Tsunoda, T., Takagi, T.: Toward information extraction: identifying protein names from biological papers. In: Pacific Symposium on Biocomputing, pp. 707–718 (1998)
26. Kim, S., Chen, J.Y., Cutello, V., Lee, D.: DTMBIO 2016: The Tenth International Workshop on Data and Text Mining in Biomedical Informatics. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 2511–2512 (2016)


Ashutosh Kumar completed his Bachelor of Engineering (B.E.) in Computer Science and Engineering in 2014 at Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal. He received his Master of Technology (M.Tech) degree in Computer Science and Engineering in 2017 from the Central University of Rajasthan. Currently, he is a Ph.D. Research Scholar at the National Institute of Technology Raipur. His research interests include “Text mining for biomedical literature” and “Named Entity Recognition.”

Applications of Deep Learning in Healthcare and Biomedicine Shubham Mittal and Yasha Hasija

Abstract Advances in medicine and healthcare over the past few decades have ushered in a data-driven era in which a huge amount of data is collected and stored. With this change, existing systems and processes need analytical and technological upgrades. The data collected is Electronic Health Data taken from individuals or patients, and it can take the form of readings, texts, speech or images. Machine learning, a means to artificial intelligence, is the study of models that computer systems use to learn from data, based on the weights of parameters, without being given explicit instructions. In parallel with biomedical advances in the past decade, the algorithms and tools of machine learning have been increasingly refined. Deep learning is one of the more promising of these algorithms. It is an artificial neural network approach that builds computational models composed of many processing layers in order to learn data representations with numerous levels of abstraction. Research suggests that deep learning might have benefits over previous machine learning algorithms, and its potentially better predictive performance is garnering significant attention. With its multiple levels of representation and results that can surpass human accuracy, deep learning has found widespread applications in health informatics and biomedicine. These include molecular diagnostics, comprising pharmacogenomics and the identification of pathogenic variants; experimental data interpretation, comprising DNA sequencing and gene splicing; protein structure classification and prediction; biomedical imaging; drug discovery; medical informatics; and more. The aim of this chapter is to discuss these applications and to elaborate on how they are instrumental in improving healthcare and medicine in the modern context. Deep learning algorithms show improved potential for learning patterns and extracting attributes from complex datasets. We first introduce deep learning and developments in artificial neural networks, then discuss its applications in healthcare, and finally discuss its relevance to biomedical informatics and computational biology research in the public health domain. In the end, the future scope of deep learning algorithms is discussed from a modern healthcare perspective.

S. Mittal · Y. Hasija (B) Delhi Technological University, Delhi, India
© Springer Nature Switzerland AG 2020 S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics, Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_4

1 Introduction

In the last 10–15 years there has been a drastic advancement in data acquisition technologies in the life sciences, together with improvements in computational biology and digital storage techniques, which has transformed modern biology from a data-poor science into a data-rich one. Owing to this development, research today is data-driven, and a biological problem may have multiple potential solutions, whereas before one question yielded one answer. Bioinformatics assists in handling these large datasets in different respects, be it storing, extracting or analyzing data. Data extraction is further handled by computational biology techniques using programming and algorithms. The set of methods used to discover meaningful relationships, patterns and functions in biological data is called ‘data mining’. In the early 1990s, ‘soft computing’ became a popular area of computational science. The term covers all methodologies that provide flexible information processing capability for handling ambiguous real-life situations. While hard computing aims for precision, soft computing works in the domain of partial truth, ambiguity, inaccuracy and approximation to obtain a solution to a problem [1, 2]. Earlier, only simple systems could be precisely analyzed and modelled by computational approaches, while the more complex systems of medicine, biology, management studies, humanities and similar fields remained difficult to handle with conventional analytical and mathematical methods. Soft computing techniques, however, complement each other, and they resemble biological processes more closely than traditional techniques, which mostly work on logic such as predicate logic and sentential logic.
The main constituents of soft computing are genetic algorithms, rough sets, fuzzy logic, neural networks, and signal processing tools such as wavelets. Of these, neural networks have a wide scope for classifying and representing biological data computationally. Neural networks are robust and exhibit good learning and generalization abilities in data-rich environments [3]. The algorithms used in neural networks are machine learning algorithms.


1.1 Machine Learning

Finding answers to problems through computer programs that use experience data is known as learning [4]. Several machine learning algorithms have been proposed in the last few decades. With the vast amount of biological data generated every day across the world, computational techniques are needed to analyze such data. Machine learning techniques provide this processing power by extracting useful hidden relationships from great volumes of data. Essentially, machine learning algorithms work as follows: they analyze the whole data, adjust their internal structure to the given data, and generate hidden layers that yield estimated models from which results are studied [5]. Bioinformatics applications of machine learning algorithms face two major challenges today. The first is the small number of samples available as data, and the second is that each sample in the life sciences is characterized by thousands of features. Machine learning algorithms have been categorized in the following manner:

Supervised learning The model is trained on inputs and desired outputs with the aim of precisely predicting future, unknown outputs. The data used is labelled; the task is known as ‘regression’ when the target output is a continuous variable and as ‘classification’ when it is a set of discrete values.

Unsupervised learning/clustering Unlike supervised learning, unsupervised learning uses unlabeled data. Clusters of data are created based on similarity and closeness, which helps in further analysis of the data.

Semi-supervised learning The model is trained on both types of input data, that is, a small amount of labelled data and a large amount of unlabeled data.

Reinforcement learning Unlike the previous learning models, this model is aimed at improving online performance and is not given pairs of input and output data.

Optimization Involves selecting, among the numerous possible models, the one that fits the data best and gives the most optimal result.
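The difference between the first two categories can be made concrete with a toy sketch on one-dimensional data. The 1-nearest-neighbour classifier and the small k-means routine below are illustrative choices, not methods prescribed by the chapter, and all names and data values are hypothetical.

```python
def nearest_neighbor_label(x, labelled):
    # Supervised: predict the label of the closest labelled example.
    return min(labelled, key=lambda pair: abs(pair[0] - x))[1]

def kmeans_1d(values, centers, iterations=10):
    # Unsupervised: group unlabeled values around moving cluster centers.
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            clusters[idx].append(v)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[k]
                   for k, c in enumerate(clusters)]
    return centers

labelled = [(1.0, "low"), (5.0, "high")]
print(nearest_neighbor_label(1.4, labelled))

unlabeled = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
print(kmeans_1d(unlabeled, centers=[0.0, 6.0]))
```

The classifier needs labels to make a prediction, while the clustering routine recovers the two groups from the values alone.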

1.2 Artificial Neural Network

Machine learning, an application of Artificial Intelligence (AI), empowers systems with the capability of performing various tasks by learning automatically and improving with experience, without being explicitly programmed. To do this, the system must be made conversant with a dataset, which is called the ‘training data’. The two major machine learning methods for training an algorithm are supervised and unsupervised learning. In supervised learning, a set of example inputs and outputs is provided at training time, and from them a function is generated that reproduces the output [2]. The training task is called ‘regression’ when the output data has continuous values and ‘classification’ when the output data has categorical values [6]. Unsupervised learning, unlike supervised learning, creates a function that captures the hidden structure of unlabeled input data. During the training phase, the training dataset is pre-processed and important features are extracted. Pre-processing involves noise reduction, feature extraction, image rectification and similar operations. Feature extraction is a challenging task, especially in medical applications, and features must be designed anew for every new application. In the deep learning literature, this process is frequently called “hand-crafting” of features. Given the feature vector x ∈ R^n, the classifier must predict the correct class y, which is typically estimated by a function ŷ = f(x) that gives the classification result ŷ directly. The parameter vector θ of the classifier is obtained during the training phase and later evaluated on a separate test dataset. The Artificial Neural Network (ANN) is a well-known classification and regression algorithm in machine learning that represents computation as units arranged in several layers, imitating the architecture and signal transmission of neurons and their synapses in the human brain.
The ANN consists of interconnected artificial neurons, each of which implements a simple classifier model that outputs a decision signal based on a weighted sum of its inputs. A large number of these basic computational elements are combined to form the ANN [7]. The parameters of the network are trained by an algorithm such as ‘back-propagation’, in which input signals and the expected decision outputs are presented in pairs, mirroring the situation where the brain relies on an external sensory stimulus to learn to perform specific tasks (Fig. 1). Input features used in machine learning can be numerical or nominal values. Defining logical and powerful features is fundamental to machine learning studies. ANNs have shown extraordinary performance in numerous areas, but they also have drawbacks, such as getting trapped in local minima during optimization and overfitting (overtraining). Over the last few years, predictive techniques based on artificial neural networks have shown remarkable capabilities in solving nonlinear modelling problems [8] in various applications, but most of these methods used shallow architectures because deep networks were difficult to train. With the fast learning algorithms proposed recently, deep architectures have attracted a lot of attention, especially since deep ANNs have been shown to outperform conventional methods in pattern recognition, classification and other machine learning domains. A deep neural network (DNN) is composed of a stack of layers. The first layer receives the input on which the prediction is based. The output of the last layer predicts a class or value. Hidden layers are those between the input and output layers; they are called hidden because their states do not correspond to observable data.
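
The weighted-sum neuron and the stacking of layers described above can be illustrated with a short sketch. It is not taken from the chapter: the function names, the sigmoid activation and the hand-picked weights are all assumptions chosen only to keep the example small.

```python
import math

def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs, then an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def layer(inputs, weight_rows, biases):
    """A layer is a group of neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

def forward(inputs, layers):
    """Pass the input through each layer in turn; the last layer is the output."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations

# Hypothetical weights: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden = ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0])
output = ([[1.0, -1.0]], [0.0])
prediction = forward([1.0, 0.0], [hidden, output])
```

Here `prediction` is a one-element list holding the output neuron's decision signal. In a trained network the weights would come from an algorithm such as back-propagation rather than being written by hand.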

Applications of Deep Learning in Healthcare and Biomedicine


Fig. 1 A conceptual analogy between real neurons (on the left) and artificial neurons (on the right)

The multi-layered construction of neural networks allows them to make more complicated decisions. To train a model, each edge requires a weight that is optimized. These weights are used to sum a large number of features; they are initialized at random and then tuned by an optimization algorithm such as ‘gradient descent’. After a training sample is applied to the network, a loss function between the target class and the prediction is evaluated. All weights are then adjusted slightly in the direction that reduces the loss function. Numerous classes of deep learning are built on these networks, each with a different approach. Compared with traditional ANNs, DNNs increase the number of layers and show better performance in recognition and prediction studies as the layers become more complex.
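
The training loop described above (random initialization, loss evaluation, small weight updates against the gradient) can be sketched for a single sigmoid neuron. The toy data, the cross-entropy loss, the learning rate and the number of steps are assumptions made for illustration and do not come from the chapter.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Output of one sigmoid neuron for the input x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def mean_loss(w, b, data):
    """Mean cross-entropy between the predictions and the target classes."""
    total = 0.0
    for x, y in data:
        p = predict(w, b, x)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(data)

def gradient_step(w, b, data, lr):
    """One gradient descent update of all weights and the bias."""
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, y in data:
        err = predict(w, b, x) - y  # derivative of the loss w.r.t. the weighted sum
        for i, xi in enumerate(x):
            grad_w[i] += err * xi
        grad_b += err
    n = len(data)
    new_w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    new_b = b - lr * grad_b / n
    return new_w, new_b

# Toy labelled data (logical AND), chosen only for illustration.
data = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 0), ([1.0, 1.0], 1)]

rng = random.Random(0)
w = [rng.uniform(-0.5, 0.5) for _ in range(2)]  # random initialization
b = 0.0
loss_before = mean_loss(w, b, data)
for _ in range(2000):
    w, b = gradient_step(w, b, data, lr=0.5)
loss_after = mean_loss(w, b, data)
```

Comparing `loss_before` with `loss_after` shows the effect of the repeated small updates. A deep network applies the same idea to every weight in every layer.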

1.3 Deep Learning

Deep learning allows computational models composed of many processing layers to learn representations of data with multiple levels of abstraction. Introduced in the year 2000 but gaining momentum today owing to advances in technology, deep learning techniques have improved drastically and found growing application, be it in speech recognition, object detection, or biological fields such as gene expression studies and drug discovery [9]. Using the backpropagation algorithm, deep learning discovers elaborate structure in large datasets by indicating how a machine should change its internal parameters, which are then used to calculate the representation in each layer from the representation in the previous layer [10, 11]. Deep convolutional networks have brought about a revolution in the processing of video, images, audio and speech, while recurrent nets have stood out in analyzing sequential data such as text and speech.
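
As a rough illustration of how backpropagation adjusts internal parameters layer by layer, the sketch below trains a network with one hidden layer on the XOR problem. Everything in it is an assumption made for the example: the network size, the squared-error loss, the learning rate and the random seed are not taken from the chapter or its references.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Compute each layer's representation from the layer before it."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2[0])
    return hidden, output

def train_epoch(data, W1, b1, W2, b2, lr):
    """One full-batch backpropagation update; returns the loss before the update."""
    n_hidden, n_in, n = len(W1), len(W1[0]), len(data)
    gW1 = [[0.0] * n_in for _ in range(n_hidden)]
    gb1 = [0.0] * n_hidden
    gW2 = [0.0] * n_hidden
    gb2 = 0.0
    total = 0.0
    for x, target in data:
        hidden, output = forward(x, W1, b1, W2, b2)
        total += (output - target) ** 2
        # Error signal at the output (chain rule through the sigmoid).
        delta_out = 2 * (output - target) * output * (1 - output)
        gb2 += delta_out
        for j in range(n_hidden):
            gW2[j] += delta_out * hidden[j]
            # Error signal propagated back to hidden unit j.
            delta_h = delta_out * W2[j] * hidden[j] * (1 - hidden[j])
            gb1[j] += delta_h
            for i in range(n_in):
                gW1[j][i] += delta_h * x[i]
    # Apply the averaged gradients to every parameter, in place.
    for j in range(n_hidden):
        W2[j] -= lr * gW2[j] / n
        b1[j] -= lr * gb1[j] / n
        for i in range(n_in):
            W1[j][i] -= lr * gW1[j][i] / n
    b2[0] -= lr * gb2 / n
    return total / n

# XOR cannot be separated by a single neuron, so a hidden layer is needed.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

rng = random.Random(42)
W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
b1 = [0.0, 0.0, 0.0]
W2 = [rng.uniform(-1, 1) for _ in range(3)]
b2 = [0.0]
losses = [train_epoch(data, W1, b1, W2, b2, lr=0.5) for _ in range(5000)]
```

The list `losses` records the mean squared error per epoch, so its first and last entries show how far training has progressed. How well XOR is learned depends on the initialization, so the sketch makes no claim about the final accuracy.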


Unlike shallow learning, which is typically supervised, deep learning can also be unsupervised, and with little guidance it learns complex patterns from high-dimensional raw data [12]. This optimization is referred to as the breadth-versus-depth tradeoff. Deep learning has demonstrated its usefulness in language and image recognition, video games, replication of painting styles and even the composition of classical music. The type of learning required in these tasks is representation learning, in which patterns are detected or classified from unprocessed raw data, especially when the data is hierarchical in structure. For example, image recognition starts by learning a hierarchy of sub-images, from pixels to edges and then motifs, until the final output is a full object. Being largely unsupervised, deep neural network algorithms can act as feature detectors at each layer, gradually extracting more sophisticated and invariant features from the original input signals [12, 13]. Machines can now accurately identify millions of images, a task that seems impossible by human standards. Using deep learning, machines can learn to differentiate between similar objects or sentences with high accuracy. These advances have also motivated the machine learning community to automate tasks such as image recognition, prediction, classification and annotation in biology, where the complexity and sheer volume of data now exceed human analytical capabilities [14, 15].

2 Deep Learning: Recent Trends

Deep learning has given rise to a vast range of ongoing and potential applications across the world, in both biological and non-biological domains.

2.1 In Non-biological Domains

The application of deep learning has progressed steadily since the advent of the Convolutional Neural Network in the early 2000s. It has since been used with wide success for numerous applications such as image segmentation and face recognition. However, these did not gain much attention in research and industry until 2012, when an open ImageNet competition provided millions of images for training and 150,000 pictures exclusively for validation and testing [16]. This competition created a new field and had an enormous effect, allowing researchers to collaborate and compete without having to collect a large-scale labelled dataset themselves [5]. ‘Dropout’, a new regularization technique, and a novel image augmentation technique were used to improve the results in this competition. Furthermore, giants of the IT and AI world such as Microsoft, Google and Facebook started treating image recognition with deep learning algorithms as an important area of research. Following this, the error rate of deep learning techniques fell from 16% in 2012 to 3% and below in 2016, thereby surpassing the object classification performance of


any human being. Innovations in object classification have been carried over to semantic segmentation and object localization. The RNN-based language model and the CNN-based image recognition framework were integrated to build visual question answering and image captioning systems. Another important area is speech recognition, which combines knowledge from computer science and electrical engineering with research in linguistics and health care (including radiology). Many researchers have developed technologies that let computational equipment, including robotics and smart devices, recognize speech and translate it to text. Lately, owing to advances in deep learning and big data, there has been tremendous progress in speech recognition [17]. This is evident from the numerous speech recognition systems offered by international firms such as Facebook and Google, and from the many scientific papers published on the topic.

2.2 In Biological Domain

The expression profile of a gene can be considered a snapshot or image of the activities taking place inside a given cell or tissue, much as a picture represents the objects in an environment. Patterns of gene expression reflect a cell’s physical state in the same way that a pattern of pixels represents the objects in a picture. This allows parallels to be drawn between biological data and the kinds of data with which deep learning has been particularly successful, namely audio and image data. Just as deep learning algorithms must distinguish two very similar but distinct images regardless of background, two very similar but distinct pathologies of a disease may need to be discerned, which is why the ability to discriminate fundamental differences is essential. Invariance and selectivity are needed for both gene expression analysis and image recognition, and they are also two hallmarks of CNNs [18]. Similar analogies can be made with other deep learning applications. For example, language prediction requires sequential learning with RNNs, which closely resembles signaling in biology, where one event can be predicted from previous events in the same way that a word in a sentence can be predicted from the preceding words. Another example is the structural prediction of biological targets such as proteins. While these comparisons are illustrative, DNNs also have several advantages that reinforce their case for biological applications. First and foremost, deep networks require large datasets for successful analysis, which the life sciences provide in abundance. Also, DNNs are well suited to sparse, noisy, high-dimensional data with non-linear relationships, which are endemic to biological data [18]. Furthermore, DNNs are able to generalize: once a network has been trained on a dataset, it can be applied to various other datasets as well, which is exactly what is required


Fig. 2 Deep neural network assembly. (a) Input data: data from Electronic Health Records, clinical data, and also molecular data from microarray, MRI, etc. (b) Data preprocessing: the source data is preprocessed before analysis by a deep neural network, using techniques such as standardization, normalization and noise reduction. (c) Deep neural network: the pre-processed data passes through several hidden layers, which extract important features and lead to an output layer of trained neurons. (d) Output: the result supports various biomedical and healthcare applications such as disease diagnosis, genotype-phenotype correlation, disease prediction, and the study of pharmacogenomics and drug response, among many others

for analysis of multi-platform heterogeneous data, such as gene expression data. Despite the good match between biological data and DNNs, their adoption in biology has been slow, for several possible reasons. One may be that biological training data has a large number of features associated with it, unlike non-biological data. More computational experiments and research are required when applying deep learning to such data. Moreover, DNNs lack the ability to interpret data transparently, as they learn only through simple relations, associations and patterns. Such models are called ‘black boxes’, and they therefore also require human support for interpretation [18]. Nevertheless, the benefits of deep learning outweigh its drawbacks, which may even be overcome with time (Fig. 2).
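
The preprocessing step of Fig. 2 names standardization and normalization. A minimal sketch of both operations follows; the function names and the sample values are hypothetical, and real pipelines typically apply these per feature across many samples.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score scaling: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature carries no information
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Min-max normalization: rescale the values to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical measurements of one feature across eight samples.
expression = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = standardize(expression)
scaled = min_max(expression)
```

After standardization the feature has zero mean and unit variance; after min-max normalization it lies between 0 and 1. Either form keeps features measured on different scales from dominating the weighted sums in the first layer.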

3 Applications of Deep Learning in Biomedicine

This section reviews current and potential applications of deep learning in biomedicine.

3.1 Biomarkers

In biomedicine, converting data into biomarkers that reflect physical states and phenotypes, such as disease, is a valuable task. Biomarkers are highly


important when it comes to assessing the outcomes of clinical trials and to identifying and monitoring diseases, particularly diseases such as cancer. Identifying specific biomarkers with high sensitivity is a major challenge for modern translational medicine [2, 10]. Computational biology is an essential tool for biomarker development, and it can draw on virtually any source of data, from proteomics to genomics.

3.2 Genomic Study

Next-generation sequencing (NGS) technology has produced a huge volume of genomic data. Much of this data can be analyzed computationally using in silico approaches such as structural annotation of genomes, including the prediction of protein binding sites, splice sites and noncoding regulatory sequences. An important branch of genomics that NGS has brought to attention is environmental genomics, or metagenomics. One of its challenges is the functional analysis of species diversity and sequence data. Deep belief networks and RNNs have been used for the phenotypic classification of human microbiome data and metagenomic pH data [2, 10]. These networks were able to learn hierarchical representations of the datasets, but they did not improve classification accuracy. Nevertheless, with large datasets and properly selected network parameters, DNNs are said to have the potential to greatly improve metagenomics algorithms.

3.3 Transcriptomic Analysis

Various kinds of transcripts, such as miRNA, mRNA and siRNA, are analyzed to gather functionally important information such as the splicing code and disease biomarkers. Normalization is required because gene expression data obtained from different sources depends on numerous factors. Deep neural networks are well suited to cross-platform analysis owing to their strong generalization capacity [10]. They are also well equipped to handle some of the other major issues with gene expression data, such as the size of the datasets and the need for dimension reduction and selectivity/invariance. Deep learning has also proven quite successful in analyzing high-dimensional matrices of transcriptomics data. In one study, deep learning was used to extract features from cancer datasets and proved considerably more successful than previously used basic feature selection methods [2]. The results showed high accuracy, with better classification and selection of cancer features. In another successful instance, Fakoor et al. applied an autoencoder network to cancer classification using gene expression data from microarrays.
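
To make the idea of dimension reduction with an autoencoder concrete, the sketch below trains a tiny linear autoencoder that compresses three correlated features into a single code and reconstructs them. It is a toy under stated assumptions and not the method of Fakoor et al.: the data, the linear encoder and decoder, the learning rate and all names are invented for illustration.

```python
import random

def encode(x, enc):
    """Compress a sample to a single number (the code)."""
    return sum(e * xi for e, xi in zip(enc, x))

def decode(code, dec):
    """Expand the code back to the original number of features."""
    return [d * code for d in dec]

def reconstruction_error(data, enc, dec):
    """Mean squared distance between each sample and its reconstruction."""
    total = 0.0
    for x in data:
        recon = decode(encode(x, enc), dec)
        total += sum((r - xi) ** 2 for r, xi in zip(recon, x))
    return total / len(data)

def train(data, enc, dec, lr=0.1, epochs=2000):
    """Full-batch gradient descent on the reconstruction error."""
    n = len(data)
    for _ in range(epochs):
        g_enc = [0.0] * len(enc)
        g_dec = [0.0] * len(dec)
        for x in data:
            code = encode(x, enc)
            residual = [d * code - xi for d, xi in zip(dec, x)]
            d_code = sum(2 * r * d for r, d in zip(residual, dec))
            for i in range(len(dec)):
                g_dec[i] += 2 * residual[i] * code
            for k in range(len(enc)):
                g_enc[k] += d_code * x[k]
        enc = [e - lr * g / n for e, g in zip(enc, g_enc)]
        dec = [d - lr * g / n for d, g in zip(dec, g_dec)]
    return enc, dec

# Hypothetical profiles: three features that vary together along one direction.
direction = (0.6, 0.8, 0.0)
data = [[t * v for v in direction] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

rng = random.Random(1)
enc = [rng.uniform(-0.5, 0.5) for _ in range(3)]
dec = [rng.uniform(-0.5, 0.5) for _ in range(3)]
error_before = reconstruction_error(data, enc, dec)
enc, dec = train(data, enc, dec)
error_after = reconstruction_error(data, enc, dec)
```

The one-dimensional code plays the role of the reduced feature set, which a downstream classifier could use in place of the raw features. Autoencoders used in practice add nonlinear activations, more layers and often sparsity constraints.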


3.4 Medical Image Processing

One of the most successful applications of DNNs has been in image analysis. Deep learning architectures have proven better at recognizing objects in pictures than both human detection and traditional image recognition. As a result, many advanced software systems around the world use deep learning for image analysis involving object recognition, retrieval and categorization [2, 14]. In medicine, this has been of great value to researchers and technicians in identifying disease from pictures of symptoms, especially in dermatological disorders, but also from images showing gene expression and from internal body imaging [2, 10]. Convolutional Neural Networks have proven the most useful in this area of image analysis.
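
The core operation of a convolutional layer can be sketched in a few lines: a small kernel slides over the image and responds where the local pattern matches. The 4x4 image and the hand-written edge kernel below are assumptions for illustration; in a CNN the kernel values are learned during training.

```python
def convolve2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Toy image: dark left half (0) and bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written kernel that responds to a dark-to-bright vertical edge.
vertical_edge = [[-1, 1], [-1, 1]]

feature_map = convolve2d(image, vertical_edge)
# feature_map == [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The non-zero middle column of the feature map marks the position of the edge. Stacking such layers lets later kernels combine edges into motifs and motifs into whole objects.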

3.5 Splicing

Another area of biomedicine where deep learning is widely used is splicing, which is indicative of biological activity in eukaryotic organisms. Current techniques prove insufficient for characterizing splicing regulation, be it the structure of the splice site, its state, or splicing silencers and enhancers. The most evident problem, however, is that of the ‘raw reads’ at splice code locations, which are much shorter than actual genes and show a very high level of duplication [2, 10]. Deep learning is highly effective for studying splicing mechanisms and understanding splice codes, and it outperforms Bayesian methods in splicing prediction.

3.6 Proteomic Study

Deep learning represents data hierarchically and extracts and learns from complex interactions, which is quite beneficial for protein network analysis. For example, a bimodal deep belief network was built on phosphorylation data to predict the response of human cells to a stimulus from the response of rat cells to the same stimulus. The algorithm showed considerably higher accuracy than traditional approaches [2, 10]. Also, analytical approaches (algorithms) in proteomics do not require large training data, unlike other ML algorithms. It is also true that proteomics is still a very new research field compared with transcriptomics and has far less data available for analysis.


3.7 Structural Biology and Chemistry

Structural biology and chemistry cover protein modelling, including protein folding and protein dynamics. Accurate structure determination is important for reliably predicting enzyme function, RNA binding, and substrate and antigen binding. Diseases such as Alzheimer’s and Parkinson’s result from the accumulation of abnormal proteins, which are identified through structural biology studies. Comparative modelling is a technique for predicting the secondary structure of a protein based on homology, but it is not easy because the number of well-annotated compounds is limited [2, 10]. Applying deep learning to sequence data has greatly improved protein structure prediction. Certain proteins are very important even though they lack a unique structure. These are called intrinsically disordered proteins (IDPs), and domains without a fixed structure are called intrinsically disordered regions (IDRs). Deep learning algorithms have been used to separate IDPs and IDRs from structured proteins. In 2013, Eickholt and Cheng published ‘DNdisorder’, a sequence-based deep learning predictor that was highly successful at predicting disordered proteins compared with other advanced predictors. In 2015, an even better predictor, ‘DeepCNF’, was developed, which could predict IDPs and proteins with IDRs by obtaining and analyzing data from experiments. It proved to be a better algorithm than those used in ab initio predictors.

3.8 Drug Discovery

Applying computational techniques to discover drugs and study their biochemistry has long been an important part of drug research across the world, as it saves time, cost and resources. Although several approaches exist, none has yet been declared ‘optimal’ because of limitations such as being restricted to a class of protein or being unable to perform high-throughput screening. Wang et al. used a Pairwise Input Neural Network (PINN) to study target-ligand interactions by extracting features from target profiles and protein sequences [2, 10, 19]. DNNs and CNNs make it possible to predict properties such as drug toxicity and high reactivity, which are highly valued aspects of drug design and discovery.


4 Applications of Deep Learning in Health Care

4.1 Translational Bioinformatics

The Human Genome Project has yielded a huge amount of previously unexplored biological data on genes and proteins, as well as knowledge of the processes by which genes interact with the external environment to produce proteins. Developments in life sciences and biotechnology have also drastically reduced the cost of gene sequencing and have steered disease treatment towards genome and proteome analysis [20]. Translational medicine essentially involves applying research performed in basic biological laboratories at the clinical level, making use of inputs from clinical observations. Bioinformatics entails the use of computational techniques and algorithms to store, represent and analyze biological data, including metabolites within cells, RNA expression, DNA sequences and proteins [21]. Translational bioinformatics integrates these two fields: it involves the development of databases and algorithms to study basic cellular and molecular data with the enhancement of clinical care as the ultimate goal. Simply put, research in translational bioinformatics unites molecular information (small molecules, lipids, proteins, RNA and DNA) with knowledge about clinical entities (patients, symptoms, diseases, pathology reports, laboratory tests, clinical images and drugs) to improve our biological understanding and, ultimately, patient care. This has given rise to bioinformatics research in personalized medicine, where treatment is designed specifically for the individual rather than generally for many. Machine learning in the field of traditional bioinformatics comprises three research areas: process prediction, disease prevention and personalized treatment. These domains are governed by three major areas of the life sciences: genomics, epigenomics and pharmacogenomics.
Genomics is the study of DNA structure, genes, protein synthesis and phenotypic expression for the creation of targeted therapies; pharmacogenomics aims to create more effective drugs with minimal side effects while providing specialized treatment for the individual; and epigenomics is the study of the effect of environmental factors on the formation of proteins and the interactions between them [21]. Genetic variants among populations and species arise as a result of alternative splicing, which is therefore one of the popular areas of study involving machine learning. Understanding these variants could be a stepping stone to detecting diseases early. Another application of deep learning and machine learning algorithms in computational biology is the study of protein-protein interactions using QSAR (Quantitative Structure-Activity Relationship) and CPI (Compound-Protein Interaction) [21]. These also help in modelling proteins binding to RNA. In addition, DNA methylation affects gene expression for several reasons, such as transcriptional or translational errors, chromosomal instability, cancer progression and cell differentiation, and this is an area that requires more study using deep neural networks.


4.2 Universal Sensing for Health and Wellbeing

One of the most practical fields of deep learning, and one showing great potential for growth, is biosensing in smart devices used in healthcare. The algorithms may be used in devices to monitor calorie intake, assist people with partial vision, detect irregularities in biomedical devices, and more. Some of these applications are discussed in the subsections below.

4.2.1 Recognizing Activity and Expenditure of Energy

Dieticians say that an optimum diet with a limited number of calories should be consumed in order to stay healthy and fit. Today, however, obesity is so widespread that it has become an epidemic and one of the causes of dangerous chronic conditions such as heart disease and type II diabetes. This can be countered by keeping track of the kind and quantity of food consumed and of the duration and type of exercise or physical activity performed, all of which contribute to good health. Doing so, however, requires capable technology that can select features which generalize across the many kinds of food and activity [21]. This is achieved using wearable devices and smartphones that monitor and manage food intake and energy expenditure. Recently, a calorie measurement system was created that acts as an assistant: it estimates the number of calories in a food item from a picture, and this information helps the consumer control or prevent diet-related health problems by controlling the intake of that food item [21]. The system runs on smartphones and uses Convolutional Neural Networks (CNNs), together with more advanced techniques such as mobile cloud computing, size calibration and distance estimation, which help in recognizing the food type and estimating the calories. CNNs are also used to classify human activities, such as a baby crawling or someone falling (abnormal activities raise an alarm and inform family members) [21]. In addition, a comparison of CNN-based methods across different human activity recognition datasets found that deep learning generalizes better, as it achieves higher classification accuracy. Low-powered smart wearable devices, however, cannot handle the greater computational complexity that deep learning requires. In such situations, preprocessing and standardization techniques are recommended, as they reduce the variation in the input data caused by sensor properties such as orientation and position.
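
The chapter does not say which preprocessing techniques are meant. One common option, shown here purely as an assumption, is to replace the three accelerometer axes by the magnitude of the acceleration vector, which does not change when the sensor is worn at a different orientation. The sample values and function names are hypothetical.

```python
import math

def magnitude(sample):
    """Length of the 3-axis acceleration vector (orientation independent)."""
    ax, ay, az = sample
    return math.sqrt(ax * ax + ay * ay + az * az)

def rotate_z(sample, angle):
    """Rotate a sample about the z axis, imitating a differently oriented sensor."""
    ax, ay, az = sample
    c, s = math.cos(angle), math.sin(angle)
    return (c * ax - s * ay, s * ax + c * ay, az)

# Hypothetical accelerometer reading (arbitrary units).
reading = (3.0, 4.0, 12.0)
rotated = rotate_z(reading, 0.7)

raw_changed = abs(rotated[0] - reading[0]) > 1e-6            # raw axis value differs
same_magnitude = abs(magnitude(rotated) - magnitude(reading)) < 1e-9
```

The raw axis values differ between the two orientations while the magnitude stays the same, so a model fed the magnitude sees the same input regardless of how the device is worn. The price is that directional information is discarded, which may matter for some activities.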

4.2.2 Abnormality Detection in Vital Signs

Individuals suffering from prolonged illness and those whose state is critical need to be closely monitored and it is hence important to analyze discrepancies in their vital


signs. Abnormalities, however, vary from patient to patient and are affected by equipment and noise. Machine learning techniques contribute greatly to detecting such irregularities [21]. Electroencephalography (EEG) is used to record the electrical activity of the brain; in 2010, Wulsin et al. [22] proposed an approach for detecting anomalies in an EEG using a Deep Belief Network (DBN). On large datasets, the DBN proved to be a more effective method, even outperforming SVM [21]. In 2015, Wang et al. [23] created a DBN that compressed the signal, resulting in a 50% energy saving while keeping the same neural decoding accuracy, which is a breakthrough for developing low-power implantable and wearable sensors.

4.2.3 Assistive Devices

These are devices that operate in three-dimensional space to understand the shape and volume of objects and to classify them. They can be used by patients with visual, hearing or speech impairments, with feedback given in the form of gestures, tactile feedback or audio feedback. Deep learning greatly helps in enhancing such devices; for example, in 2016, Poggi et al. [24] proposed a CNN-based wearable device to help people with impaired vision detect obstacles. Similarly, gesture-based assistive devices have been proposed for patients with hearing impairment and for highly sensitive environments such as surgery, where touch-free human-computer interaction is preferable [21]. In 2015, Huang et al. [25, 26] proposed a DNN-based method for recognizing sign language using real-time data. However, many such applications, like gesture recognition, remain challenging because of the large number of possible variations in hand posture and the resulting algorithmic complexity.

4.3 Informatics in Medicine

This field studies large amounts of aggregated medical data in order to extend clinical decision support systems and to improve the assessment of healthcare data, with the aim of assuring good quality and easy access to medical services. Electronic Health Records (EHRs) are very data-intensive sources of patient information, including drug prescriptions, recommended treatments, diagnosed diseases, vaccination records, laboratory test results, recordings from machines such as EEG, and both internal and external clinical images. Mining this extensive dataset would provide a greater understanding of disease and eventually improve its management [21]. However, there are several difficulties. For example, information is compiled irregularly, which makes the data complex. Similarly, erratic delays between the first recognition of a disease and its diagnosis increase the complexity of learning. Deep learning learns data representations in both supervised and unsupervised settings, and its success is largely attributed to its capacity to learn unique


patterns and characteristics. Deep learning methods scale exceptionally well to large datasets. Moreover, deep neural networks can combine several components of a data architecture, which is why they handle multimodal information well [21]. For these reasons and more, deep learning has been widely adopted in medical informatics research. In 2015, Shin et al. [27] proposed an image-text CNN to identify data linking radiology images and reports from hospital picture and information databases. In 2014, Liang et al. [28] used a modified version of CDBN to train on large hypertension datasets. In 2016, Putin et al. [29] used deep learning to identify markers that predict a person’s age from a blood test. In 2015, Nie et al. [30] suggested a DNN for automatically inferring disease. Later, more methods were developed to interpret healthcare data, some based on GBDT (gradient boosting decision trees), LSTM RNNs, etc. Deep learning has therefore greatly increased the accessibility of varied data, whether from clinics, hospitals, data clouds or research organizations.

4.4 Public Health

By analyzing the spread of disease and its interaction with the environment, public health aims to improve healthcare facilities, prevent diseases and prolong life. The domain involves epidemic and pandemic studies, and its applications include air quality checks, drug safety assurance, epidemic surveillance, and studies of the effect of environmental factors on lifestyle diseases such as obesity. Computational methodologies help in creating models for such studies; however, they are currently limited because they cannot include real-time data in the analysis. Deep learning, if incorporated, promises a better and stronger ability to generalize. This is because deep learning methods are data-driven and can continue to optimize the cost function as new datasets become available [10]. One such optimization algorithm is ‘stochastic gradient descent’, which is widely used in DNNs. Therefore, deep learning methods, along with network analysis and recommendation systems, are advised for analyzing public health data. Assessing and predicting air pollutant concentration is one such application of deep learning. In 2015, Ong et al. [31] designed a system that collects data from sensors in more than 52 cities in Japan and uses it to forecast air pollution levels in the country [21]. The DNN used consists of stacked autoencoders and is trained in an online manner. However, it was also found that deep learning techniques are affected by the incomplete data of the real world. Tracking disease outbreaks through epidemiological studies and assessing lifestyle diseases through social media is another very interesting application of deep learning in the health sector. Examples of such diseases are Ebola and influenza. In 2015, Zhao et al. [32] used Twitter to track public health continuously and quite accurately [1].
Here, DNNs are used to look for characteristics describing an epidemic and their changes with the environment in order to track the development of the disease. Beyond this, messages on Twitter may also


be used to study antibiotics and have shown good forecasting of intestinal diseases. A DBN was used to classify antibiotic-related classes, while in 2016 Zou et al. [33] used deep learning to identify three types of intestinal diseases. Furthermore, in 2016, Garimella et al. [34] used geotagged pictures from Instagram to track drinking, obesity, smoking and other lifestyle diseases, and compared the classification by users with deep learning annotations. The study found that deep learning-based algorithmic annotations were more successful in predicting and categorizing behaviours such as drug abuse and drinking. Mobile phone data, such as texts or call locations, can also be used to characterize human behaviour. This technique uses CNNs; it is gaining popularity and has been found highly accurate for predicting the gender and age of individuals. Therefore, individuals’ metadata, mobile network data, social media data and EHRs help in forming public health policy. They could also support large-scale disease surveillance and alert mechanisms that trigger at the onset of a disease or when symptoms appear. However, collecting such personal data also poses a risk of intruding on privacy, be it through social media platforms like Facebook, Twitter and Instagram or through poorly secured databases that contain sensitive data and are easy to exploit. Hence, the current situation requires that individuals be able to control access to their private health information, while at the same time mechanisms are created to gather more information for large-scale study using deep learning algorithms (Table 1).

Table 1 Summary of applications of deep learning in biomedicine and healthcare

Applications | Data source | Deep learning algorithm used | References
Cancer diagnosis and classification | Gene expression data | Deep autoencoders | [35]
Protein secondary structure prediction | PDB, CASP9, CASP10 | DNSS (multimodal DBNs) | [36]
Gene variants | Microarray data | DNN | [37]
Annotating gene expression patterns | Gene expression data | CNN | [38]
Metagenomic classification | Microbiome sequencing data | RNN, DBN | [39]
Target-ligand interaction prediction | sc-PDB | SVD and autoencoder | [40]
DNA methylation | DNA, RNA sequencing data | DNN | [41, 42]
Identification of Expression Quantitative Trait Loci (eQTL) | RNA-seq and whole genome-wide SNP-array data | DNN | [43]
Effects of noncoding variants | Transcription factors binding profiles, histone mark profile from ENCODE and epigenomics project | DeepSEA (CNN) | [44]
Modelling structural features of RNA-binding protein targets | doRiNA (database of RNA interactions in post-transcriptional regulation) | DBN (multimodal DBNs) | [45]
3D brain reconstruction | MRI | Deep autoencoders | [46, 47]
Alzheimer diagnosis | PET scans | DNN, CNN | [48]
Cell clustering | Microscopy | Deep autoencoder | [49, 50]
Hemorrhage detection | X-ray images | DNN | [51, 52]
Organ segmentation | Endoscopy images | CNN | [53]
Human activity recognition | Wearable devices | CNN, DNN | [54, 55]

5 Challenges of Deep Learning in Biomedicine and Healthcare

Although deep learning has shown an upper hand over earlier machine learning algorithms in feature extraction, recognition and classification, it faces several challenges. Biological data is highly complex and not easy for humans to interpret alone. Nor can it be interpreted well, with a controlled level of quality, by algorithms alone, since DNNs lack the transparency needed to uncover biological relationships [21]; this is known as the 'black box' problem. Human input is therefore required alongside computational analysis to analyze biological data holistically. In addition, to achieve high accuracy, deep learning algorithms require large training datasets, which in most cases are not readily available. Insufficient training data brings the risk of overfitting, i.e., a high test error despite a low training error. Moreover, it is often difficult to choose the type of DNN appropriate for a task. Although tools such as hyperparameter optimization help in this selection, deciding which architecture to use is not always straightforward, especially as new ones are continuously being added. Although computational techniques on the whole reduce the cost and time of analyzing data, DNNs in particular have an intensive training process that is time-consuming and requires skilled individuals such as GPU programmers. Scientists also find deep learning unable to answer some important questions. For one, many high-level visualizations obtained using deep learning are not easy to interpret. In addition, there is sometimes no provision for making corrections when a classification goes wrong. Moreover, deep learning is not suitable for all kinds of diseases, particularly rare diseases. Evidence also suggests that DNNs can easily be tricked into misclassifying an input by minute changes to that input.
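The last point, that minute input changes can flip a prediction, can be illustrated with a small sketch. It assumes a hypothetical linear classifier with random weights rather than a trained DNN, and the names `w`, `x`, `margin` and `eps` are introduced here purely for illustration. Every feature is shifted by the same small amount `eps` in the direction that lowers the classifier's confidence; in a high-dimensional input these small shifts add up and change the predicted class.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000                       # input dimension (for example, number of pixels)
w = rng.normal(size=d)           # weights of a hypothetical linear classifier
x = rng.normal(size=d)           # a hypothetical input

def predict(v):
    """Return +1 or -1 according to the sign of the linear score."""
    return 1 if w @ v > 0 else -1

score = w @ x
label = predict(x)
margin = 0.1
# Smallest uniform per-feature step that pushes the score past zero by `margin`.
eps = (abs(score) + margin) / np.abs(w).sum()
x_adv = x - label * eps * np.sign(w)   # move every feature slightly against the score

print(label, predict(x_adv), eps)
```

The per-feature change `eps` is small relative to the unit-scale feature values, yet the predicted label is reversed, because the perturbation is aligned with the weights across all 10,000 dimensions.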


6 Conclusion

It is no surprise that the tremendous amount of biological data generated today is simply too large for humans to analyze alone. We therefore require the support of machine learning, and specifically deep learning algorithms, to interpret data in biomedicine and healthcare effectively [10]. Recent research and development in deep learning for biomedicine has brought us into an era of widespread applications in drug design, genomic and proteomic analysis, transcriptomic gene expression analysis, splicing, medical image processing and multi-omics studies, among others [2, 14]. In healthcare, a similar advance is seen when DNNs are applied to translational bioinformatics, finding genetic variants, studying target and ligand interaction, medical imaging, assistive wearable devices, medical informatics, public health, etc. Although deep learning in the biological sciences is still in its infancy, it is a novel approach with great potential to change the life sciences drastically with respect to cost, time and the extent to which technology is used, both physically and computationally. Admittedly, there are a few drawbacks in applying these algorithms; however, they are overshadowed by the extent of improvement in the biological domain, even though we are only at the outset of deep learning applications. With time these methodologies are bound to improve, as a result of the partnerships forged by the thousands of people brought together by the common goal of researching and developing deep learning algorithms for life science data. Giants of the IT and pharmaceutical world across the globe are rapidly investing in research into computational biological sciences, as the next few years are projected to show immense growth in this sector.

This chapter aimed at two things: first, making the reader aware of recent advances in neural networks, particularly deep learning, along with their current and future applications in the biological sciences; and second, bridging the gap between pure biologists and the community of computational biologists [21, 56]. Many research results on DNNs in the biological sciences have not yet been announced due to lack of awareness and are awaiting communication. Research progresses exponentially when a large number of people associate and work together towards a common goal, just as deep neural networks work more effectively on large datasets.

References

1. Angermueller, C., Pärnamaa, T., Parts, L., Stegle, O.: Deep learning for computational biology. Mol. Syst. Biol. 12(7), 878 (2016)
2. Cao, C., et al.: Deep learning and its applications in biomedicine. Genom. Proteom. Bioinf. 16(1), 17–32 (2018)
3. Rajeswari, K., Vivekanandan, N., Amitaraj, P., Fulambarkar, A.: A study on redesigning modern healthcare using internet of things, pp. 59–69 (2017)
4. Jiang, F., et al.: Artificial intelligence in healthcare: past, present and future. Stroke Vasc. Neurol. 2(4), 230–243 (2017)


5. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)
6. Nelson, D., Wang, J.: Introduction to artificial neural systems. Neurocomputing 4(6), 328–330 (2003)
7. Jain, A.K., Mao, J., Mohiuddin, K.M.: Artificial neural networks: a tutorial. Computer 29(3), 31–44 (1996)
8. Pour, M.P., Seker, H., Shao, L.: Automated lesion segmentation and dermoscopic feature segmentation for skin cancer analysis. In: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, pp. 640–643 (2017)
9. Norgeot, B., Glicksberg, B.S., Butte, A.J.: A call for deep-learning healthcare. Nat. Med. 25(1), 14–15 (2019)
10. Mamoshina, P., Vieira, A., Putin, E., Zhavoronkov, A.: Applications of deep learning in biomedicine. Mol. Pharm. 13(5), 1445–1454 (2016)
11. Ching, T., et al.: Opportunities and obstacles for deep learning in biology and medicine. J. R. Soc. Interface 15(141) (2018)
12. Min, S., Lee, B., Yoon, S.: Deep learning in bioinformatics. Brief. Bioinf. 18(5), 851–869 (2017)
13. Erickson, B.J., Korfiatis, P., Akkus, Z., Kline, T., Philbrick, K.: Toolkits and libraries for deep learning. J. Digit. Imaging 30(4), 400–405 (2017)
14. Akkus, Z., Galimzianova, A., Hoogi, A., Rubin, D.L., Erickson, B.J.: Deep learning for brain MRI segmentation: state of the art and future directions. J. Digit. Imaging 30(4), 449–459 (2017)
15. Miotto, R., Wang, F., Wang, S., Jiang, X., Dudley, J.T.: Deep learning for healthcare: review, opportunities and challenges. Brief. Bioinf. 19(6), 1236–1246 (2017)
16. Hinton, G.E., Osindero, S., Teh, Y.-W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
17. Neapolitan, R.E., Neapolitan, R.E.: Neural networks and deep learning. In: Artificial Intelligence, pp. 389–411 (2018)
18. Esteva, A., et al.: A guide to deep learning in healthcare. Nat. Med. 25(1), 24–29 (2019)
19. Faust, O., Hagiwara, Y., Hong, T.J., Lih, O.S., Acharya, U.R.: Deep learning for healthcare applications based on physiological signals: a review. Comput. Methods Programs Biomed. 161, 1–13 (2018)
20. Kim, K.G.: Book review: deep learning. Healthc. Inform. Res. 22(4), 351 (2016)
21. Ravi, D., et al.: Deep learning for health informatics. IEEE J. Biomed. Health Inf. 21(1), 4–21 (2017)
22. Wulsin, D., Blanco, J., Mani, R., Litt, B.: Semi-supervised anomaly detection for EEG waveforms using deep belief nets. In: Proceedings - 9th International Conference on Machine Learning and Applications, ICMLA 2010, pp. 436–441 (2010)
23. Wang, A., Song, C., Xu, X., Lin, F., Jin, Z., Xu, W.: Selective and compressive sensing for energy-efficient implantable neural decoding. In: IEEE Biomedical Circuits and Systems Conference: Engineering for Healthy Minds and Able Bodies, BioCAS 2015 - Proceedings (2015)
24. Poggi, M., Mattoccia, S.: A wearable mobility aid for the visually impaired based on embedded 3D vision and deep learning. In: Proceedings - IEEE Symposium on Computers and Communications, Aug 2016, pp. 208–213
25. Huang, J., Zhou, W., Li, H., Li, W.: Sign language recognition using real-sense. In: 2015 IEEE China Summit and International Conference on Signal and Information Processing, ChinaSIP 2015 - Proceedings, pp. 166–170 (2015)
26. Tang, A., Lu, K., Wang, Y., Huang, J., Li, H.: A real-time hand posture recognition system using deep neural networks. ACM Trans. Intell. Syst. Technol. 6(2), 1–23 (2015)
27. Shin, H.-C., Lu, L., Kim, L., Seff, A., Yao, J., Summers, R.M.: Interleaved text/image deep mining on a large-scale radiology database for automated image interpretation. J. Mach. Learn. Res. 17(1–31), 2 (2015)
28. Liang, Z., Zhang, G., Huang, J.X., Hu, Q.V.: Deep learning for healthcare decision making with EMRs. In: Proceedings - 2014 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2014, pp. 556–559 (2014)


29. Korzinkin, M., et al.: Deep biomarkers of human aging: application of deep neural networks to biomarker development. Aging (Albany NY) 8(5), 1021–1033 (2016)
30. Nie, L., Wang, M., Zhang, L., Yan, S., Zhang, B., Chua, T.S.: Disease inference from health-related questions via sparse deep learning. IEEE Trans. Knowl. Data Eng. 27(8), 2107–2119 (2015)
31. Ong, B.T., Sugiura, K., Zettsu, K.: Dynamically pre-trained deep recurrent neural networks using environmental monitoring data for predicting PM2.5. Neural Comput. Appl. 27(6), 1553–1566 (2016)
32. Zhao, L., Chen, J., Chen, F., Wang, W., Lu, C.T., Ramakrishnan, N.: SimNest: social media nested epidemic simulation via online semi-supervised deep learning. In: Proceedings - IEEE International Conference on Data Mining, ICDM, Jan 2016, pp. 639–648
33. Zou, B., Lampos, V., Gorton, R., Cox, I.J.: On infectious intestinal disease surveillance using social media content, pp. 157–161 (2016)
34. Garimella, K., Alfayad, A., Weber, I.: Social media image analysis for public health. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pp. 5543–5547 (2015)
35. Fakoor, R., Ladhak, F., Nazi, A., Huber, M.: Using deep learning to enhance cancer diagnosis and classification. In: Proceeding of the ICML Work. Role Mach. Learn. Transform. Healthc. (2013)
36. Spencer, M., Eickholt, J., Cheng, J.: A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Trans. Comput. Biol. Bioinf. (2014)
37. Quang, D., Chen, Y., Xie, X.: DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics (2015)
38. Zeng, T., Li, R., Mukkamala, R., Ye, J., Ji, S.: Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC Bioinf. (2015)
39. Ditzler, G., Polikar, R., Rosen, G.: Multi-layer and recursive neural networks for metagenomic classification. IEEE Trans. Nanobiosci. (2015)
40. Wang, C., Liu, J., Luo, F., Tan, Y., Deng, Z., Hu, Q.N.: Pairwise input neural network for target-ligand interaction prediction. In: Proceedings - 2014 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2014 (2014)
41. Tian, K., Shao, M., Wang, Y., Guan, J., Zhou, S.: Boosting compound-protein interaction prediction by deep learning. Methods (2016)
42. Angermueller, C., Lee, H.J., Reik, W., Stegle, O.: DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biol. (2017)
43. Witteveen, M.J.: Identification and elucidation of expression quantitative trait loci (eQTL) and their regulating mechanisms using decodive deep learning (2014)
44. Zhou, J., Troyanskaya, O.G.: Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods (2015)
45. Zhang, S., et al.: A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res. (2015)
46. Mansoor, A., et al.: Deep learning guided partitioned shape model for anterior visual pathway segmentation. IEEE Trans. Med. Imaging (2016)
47. Shan, J., Li, L.: A deep learning method for microaneurysm detection in fundus images. In: Proceedings - 2016 IEEE 1st International Conference on Connected Health: Applications, Systems and Engineering Technologies, CHASE 2016 (2016)
48. Fritscher, K., Raudaschl, P., Zaffino, P., Spadea, M.F., Sharp, G.C., Schubert, R.: Deep neural networks for fast segmentation of 3D medical images. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016)
49. Avendi, M.R., Kheradvar, A., Jafarkhani, H.: A combined deep-learning and deformable-model approach to fully automatic segmentation of the left ventricle in cardiac MRI. Med. Image Anal. (2016)
50. Cheng, J.Z., et al.: Computer-aided diagnosis with deep learning architecture: applications to breast lesions in US images and pulmonary nodules in CT scans. Sci. Rep. (2016)


51. Rose, D.C., Arel, I., Karnowski, T.P., Paquit, V.C.: Applying deep-layered clustering to mammography image analytics. In: Proceedings of the 2010 Biomedical Science and Engineering Conference, BSEC 2010: Biomedical Research and Analysis in Neuroscience, BRAiN (2010)
52. Wang, J., MacKenzie, J.D., Ramachandran, R., Chen, D.Z.: A deep learning approach for semantic segmentation in histology tissue images. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016)
53. Xu, T., Zhang, H., Huang, X., Zhang, S., Metaxas, D.N.: Multimodal deep learning for cervical dysplasia diagnosis. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016)
54. Sun, L., Jia, K., Chan, T.H., Fang, Y., Wang, G., Yan, S.: DL-SFA: deeply-learned slow feature analysis for action recognition. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2014)
55. Ravi, D., Wong, C., Lo, B., Yang, G.Z.: Deep learning for human activity recognition: a resource efficient implementation on low-power devices. In: BSN 2016 - 13th Annual Body Sensor Networks Conference (2016)
56. Dolmans, D.H.J.M., Loyens, S.M.M., Marcq, H., Gijbels, D.: Deep and surface learning in problem-based learning: a review of the literature. Adv. Health Sci. Educ. 21(5), 1087–1112 (2016)

Shubham Mittal is a dynamic individual currently pursuing his master's in bioinformatics at Delhi Technological University. Having completed his bachelor's in biotechnology, he is highly motivated towards computational research in the life sciences and possesses in-depth knowledge of the field. In his free time Shubham likes to play basketball, listen to music and read fiction.

Dr. Yasha Hasija is an Associate Professor at Delhi Technological University. She holds bachelor's and master's degrees in biotechnology and a Ph.D. in bioinformatics. Besides having a sound academic foundation, Dr. Yasha is a vibrant individual and a very good orator. She specializes in genome informatics and the study of its interaction with human diseases; her research interests include genetic analysis of dermatological disorders, tuberculosis and the role of human genetic variations in age-related disorders.

Deep Learning for Clinical Decision Support Systems: A Review from the Panorama of Smart Healthcare

E. Sandeep Kumar and Pappu Satya Jayadev

Abstract Innovations in deep learning (DL) have been tremendous in recent years, and applications of DL techniques are ever expanding, encompassing a wide range of services across many fields. This is possible primarily for two reasons: the availability of massive amounts of data for analytics, and advances in hardware in terms of storage and computational power. Healthcare is one field undergoing a major uplift due to the large-scale adoption of DL. A wide variety of DL algorithms are being used, and further developed, to solve different problems in the healthcare ecosystem. Clinical healthcare is one of the foremost areas in which learning algorithms have been tried as an aid to decision making. In this direction, combining DL with existing areas such as image processing, natural language processing and virtual reality has paved the way for automating clinical healthcare and improving its quality enormously. Such intelligent decision making in healthcare and clinical practice is also expected to result in holistic treatment. In this chapter, we review various existing DL techniques and their applications for decision support in clinical systems. Three major application streams of DL, namely image analysis, natural language processing and wearable technology, are discussed in detail. Towards the end of the chapter, a section is included on directions for future research, such as handling class imbalance in diagnostic data, DL for prognosis leading to preventive care, and data privacy and security. The chapter is intended for budding researchers and engineers aspiring to a career in DL applied to healthcare.

Keywords Machine learning · Deep learning · Smart healthcare · Clinical decision support system

E. Sandeep Kumar (B)
Department of Telecommunication Engineering, M.S. Ramaiah Institute of Technology, Bengaluru, India

P. Satya Jayadev
Department of Electrical Engineering, IIT Madras, Chennai, India

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics, Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_5


1 Introduction

Deep learning has shown its potential in recent years for reducing large datasets to more abstract representations well suited to classification and prediction, which are at the heart of many smart systems. Most DL algorithms comprise a sequence of blocks, each applying primitive linear or non-linear operations to the data flowing from one block to the next, thereby learning a more condensed representation of the information in the dataset and aiding the decision making process [1].

A healthcare system comprises doctors, nurses, front-line managers, middle-level managers, senior managers and a board of directors. Decision making is a crucial activity of this group, and the decisions taken can be classified as clinical or non-clinical. Clinical decision support systems (CDSS) are technology-driven arrangements that assist a physician or other medical practitioner in making better decisions. Clinical decisions concern diagnosis, therapy, treatment and medical prescription, while non-clinical decisions concern resource allocation, budgets, strategic planning, etc. Even though DL can assist in decisions of all kinds, in this chapter we restrict ourselves to the use of DL in clinical decision making. Clinical decision making is among the more complex and challenging settings for decision support, mainly because of the many measurable and non-measurable attributes involved and the complex relations among them. The attributes include patients' beliefs, lifestyles, experiences, education level, diagnostic reports, historical health records and so on. DL algorithms can serve as effective tools for supporting the decision making process; however, the attributes input to the algorithms must be measurable and quantifiable.
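The two ideas above, a sequence of blocks applying linear and non-linear operations and the requirement that inputs be quantifiable, can be sketched together. The example below is illustrative only: the attribute names, the diagnosis-code vocabulary, the three output options and the untrained random weights are all assumptions made for this sketch, not part of any system described in this chapter.

```python
import numpy as np

# Hypothetical vocabularies, chosen only for illustration.
DIAGNOSIS_CODES = ["I10", "E11", "J18", "N17"]
OPTIONS = ["option_a", "option_b", "option_c"]

def encode(patient):
    """Turn a patient record (a dict) into a fixed-length numeric vector."""
    sex = [1.0 if patient["sex"] == "F" else 0.0]
    age = [patient["age"] / 100.0]
    codes = [1.0 if c in patient["codes"] else 0.0 for c in DIAGNOSIS_CODES]
    return np.array(sex + age + codes)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.5, size=(8, 6)), np.zeros(8)   # block 1 (untrained)
W2, b2 = rng.normal(scale=0.5, size=(3, 8)), np.zeros(3)   # block 2 (untrained)

def forward(patient):
    """Pass the encoded record through two blocks: linear + ReLU, then linear + sigmoid."""
    hidden = relu(W1 @ encode(patient) + b1)
    return sigmoid(W2 @ hidden + b2)

patient = {"sex": "F", "age": 67, "codes": {"I10", "E11"}}
scores = forward(patient)
for name, s in zip(OPTIONS, scores):
    print(f"{name}: {s:.2f}")
```

Attributes such as beliefs or experiences have no obvious counterpart in `encode`, which is exactly the limitation noted above: anything the model is to use must first be expressed as a number.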
To understand where DL fits in the bigger picture of CDSS, let us look at the general block diagram of a CDSS shown in Fig. 1. The system has three main blocks. The patient's primary data [2] comprises observed symptoms, diagnostic reports, medical records of the patient, etc. The secondary data comprises data external to the healthcare system, including the patient's food and drinking habits, the sensitivity of the body to certain allergens, the patient's rights, etc., which are to be considered for more informed healthcare decisions. The knowledge base refers to the historical medical records of other patients stored in the database, which can serve as a reference when taking decisions about the current patient. In [3, 4], the authors present a fuzzy-rule based system, referred to as a virtual clinic, that automatically assigns doctors to patients and then assists the doctors in giving prescriptions by using the historical knowledge base. However, as this database expands, it may contain thousands of entries, making it almost impossible to search thoroughly for informative records. This is where DL comes in handy, with the tremendous representation power of deep neural networks. Machine learning and/or deep learning models have the potential to compress huge databases into abstract miniature representations and can almost replace the knowledge base, making it optional. These models can be used directly as a decision making tool in the CDSS.

Fig. 1 Block diagram of a clinical decision support system

As an example, in [5], the authors use deep feed-forward neural networks to predict inpatient clinical order patterns. The features considered were comorbidity, patient sex and race, International Classification of Diseases (ICD) diagnosis codes and so on, taken from electronic health records. They concluded that the deep neural network based model outperformed standard-of-care, human-authored order sets in predicting actual clinical practices. In [6], the authors apply convolutional neural networks to patients' electronic health records, extract high-level semantic information about the diagnosis and generate a report. This result assists medical practitioners in reaching a conclusion on the patient's health status. There are many such applications of ML and DL in different domains of clinical healthcare, as well as potential applications of DL that can be explored in the future, all of which are covered in the later parts of this chapter.

The rest of the chapter is organized as follows: Sect. 2 reviews existing works on the applications of DL with image processing (computer vision) for CDSS; Sect. 3 deals with the applications of DL in Natural Language Processing (NLP) for CDSS while highlighting certain existing challenges; Sect. 4 deals with DL and wearable device technology based CDSS; Sect. 5 looks at the issues involved in using DL for CDSS; Sect. 6 discusses future research perspectives on the use of DL for CDSS; and finally, Sect. 7 summarizes and concludes the chapter.

2 Deep Learning and Image Analysis

Image analysis is a well-explored area of smart healthcare. Ever since digital imaging came into existence, images have been analyzed automatically using naive rule-based architectures. The early 1990s saw a shift to simple machine learning algorithms for extracting useful patterns and information from images. However, this involved a lot of hand engineering: deciding which features to extract, how to extract them, which algorithm to use for decision making, and so on. The recent advances in DL come as a big relief, since DL architectures can extract the features and approximate a prediction function directly from the given data. This capability has made DL a preferred tool for image analysis and computer vision applications.

In the existing research works, the blocks involved in image analysis are similar. They are summarized in Fig. 2: image acquisition covers the various methods by which images of an entity are captured; the images then pass through preprocessing stages, where they are filtered or subjected to manual bounding box carving; and the modified image is passed into a convolutional neural network (CNN) block for training. The CNN output is passed to an optional interpreter, which can be a fully connected network, an autoencoder, and so on, that converts it into a form suitable for a medical practitioner to understand. We shall now look at various applications of DL in the analysis of medical images proposed by researchers in recent years.

Convolutional neural networks appear often in works that use DL for image analysis. One reason for this extensive usage is that a CNN learns relevant features in a manner similar to how the human brain extracts features from an image. Another important characteristic of CNNs is weight sharing [7], where the kernels are shared across an image, which allows local patterns to be learned efficiently and improves model efficiency by reducing the number of parameters involved.
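The effect of weight sharing on the parameter count can be checked with a short calculation. The sketch below is an illustration under stated assumptions: a naive single-channel convolution written in NumPy, a toy 6x6 input, and a comparison between one 3x3 kernel and a dense layer mapping a 256x256 image to a 254x254 output. These sizes are chosen here for the example and do not come from the works reviewed.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same kernel weights are reused at every position (i, j).
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
kernel = np.array([[1.0, 0.0, -1.0]] * 3)          # simple horizontal-gradient kernel
feature_map = conv2d_valid(image, kernel)          # shape (4, 4)

# Parameter counts for mapping a 256x256 image to a 254x254 output:
conv_params = 3 * 3 + 1                                  # one 3x3 kernel plus a bias
dense_params = (256 * 256) * (254 * 254) + 254 * 254     # full weight matrix plus biases
print(feature_map.shape, conv_params, dense_params)
```

The convolution needs 10 parameters regardless of image size, whereas the dense layer in this comparison needs more than four billion, which is why sharing kernels makes CNNs practical for images.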
Transfer learning [7], which is widely used in image analysis, is also easier with CNNs than with conventional dense neural networks.

Fig. 2 General block diagram for image analysis using CNNs

Let us look at a few works that use CNNs for image analysis tasks. In [8], the authors review various medical imaging applications of DL. They note that image analysis has been carried out mainly on pathology, lung, brain, cardiac, abdomen, breast, bone, retina, etc. Various imaging modalities are used, such as MRI, CT, X-ray, PET, ultrasound and visible range, of which MRI and visible light microscopic imaging are the most used. In addition, the authors state that image analysis techniques such as segmentation, classification (for medical examination and inferences, and object detection) and registration are widely studied. Among these, segmentation of a required region of interest (RoI) and detection of an object in a given image are the most studied, owing to their practical implications.

In [9], the authors proposed a methodology for segmenting regions of interest, applied to identifying heart chambers. It has three parts: the first uses convolutional neural networks (CNNs) [10] to locate the area containing the left ventricle (LV) in the image frame; the second consists of stacked autoencoders that infer the shape of the LV from the image passed on by the first part; and the third comprises a dense NN that segments and delivers a binary mask of the LV. The algorithm was trained and validated on publicly available LV datasets (MRI scans), obtaining an accuracy of 96.69%. In [11], the authors propose a scribble-based CNN for image segmentation. As stated by the authors, a completely automated DL algorithm performs worse on unseen test images, and hence a bounding box is needed to narrow the search space for the algorithm. This bounding box based training method provided better segmentation accuracy. The work in [12] proposes a method of multimodal image segmentation using MRI, PET and CT imaging. The images are passed through three separate CNNs and the outputs are fused to obtain a more precise segmented image.
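The fusion step described for multimodal segmentation can be sketched as follows. This is a minimal illustration and not the method of [12]: it assumes that each of the three networks outputs a per-pixel foreground probability map and that fusion is a simple average followed by a threshold. The small arrays stand in for real network outputs, and the function names are hypothetical.

```python
import numpy as np

def fuse_and_segment(prob_maps, threshold=0.5):
    """Average per-modality probability maps and threshold into a binary mask."""
    fused = np.mean(np.stack(prob_maps), axis=0)
    return (fused >= threshold).astype(np.uint8)

def dice(mask, truth):
    """Dice overlap between two binary masks (1.0 means identical)."""
    intersection = np.sum(mask * truth)
    return 2.0 * intersection / (np.sum(mask) + np.sum(truth))

# Hypothetical 2x3 probability maps, standing in for the outputs of three CNNs.
mri = np.array([[0.9, 0.6, 0.1], [0.8, 0.4, 0.2]])
pet = np.array([[0.8, 0.3, 0.2], [0.9, 0.7, 0.1]])
ct = np.array([[0.7, 0.7, 0.3], [0.6, 0.5, 0.1]])
truth = np.array([[1, 1, 0], [1, 1, 0]], dtype=np.uint8)

mask = fuse_and_segment([mri, pet, ct])
print(mask)
print(dice(mask, truth))
```

In this toy input the MRI-only and PET-only maps would each miss one foreground pixel at the same threshold, while the fused mask recovers both, which is the intuition behind combining modalities.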
In [13], the authors propose a novel architecture in which deep CNNs (DCNNs) work collaboratively on the segmentation of brain tumors and skin lesions. The DCNNs are paired, and whenever a DCNN misclassifies an input, a synergic error is produced that updates the whole network together with the usual back-propagated error. Similar works using CNNs are summarized in Table 1.

Though CNNs have been shown to be very effective in object detection and segmentation, they require datasets with a large number of samples and correspondingly high memory and processing power. They also fail to detect the variations in pixel information at the boundaries. Therefore, the authors in [18] propose a recurrent neural network (RNN) based architecture that learns level-set based deformable models (LDMs, also known as geometric or implicit active contour models) evolving under constant and mean curvature velocities. The specific tasks considered in that work were the segmentation of the optic disc and cup in color fundus images, cell nuclei in histopathology images and the left atrium in cardiac MRI volumes. The block diagram remains the same as in Fig. 2, except that the CNN block is replaced by RNNs. Similar works that aim at medical image segmentation using CNNs and RNNs can be seen in [19–21]. Table 2 lists the image datasets used in the majority of research works related to medical imaging.
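The level-set deformable models mentioned in connection with [18] represent a contour as the zero level of a function and move it by repeatedly updating that function. The sketch below illustrates that iteration for the constant-velocity case only. It is not the architecture of [18]: it assumes a signed-distance initialization, a plain explicit update with central differences, and hypothetical values for `speed`, `dt` and `steps`; the mean curvature term is omitted. The repeated update is the kind of step that a recurrent model could unroll over time.

```python
import numpy as np

def evolve_constant_speed(phi, speed=1.0, dt=0.5, steps=10):
    """Evolve a level-set function under phi_t = -speed * |grad phi|."""
    for _ in range(steps):                 # each iteration is one evolution step
        gy, gx = np.gradient(phi)          # derivatives along rows and columns
        phi = phi - dt * speed * np.sqrt(gx ** 2 + gy ** 2)
    return phi

# Signed distance to a circle of radius 10 (negative inside the contour).
yy, xx = np.mgrid[0:64, 0:64]
phi0 = np.sqrt((xx - 32.0) ** 2 + (yy - 32.0) ** 2) - 10.0

phi1 = evolve_constant_speed(phi0)
area_before = int(np.sum(phi0 < 0))        # pixels inside the initial contour
area_after = int(np.sum(phi1 < 0))         # pixels inside the evolved contour
print(area_before, area_after)
```

With a positive speed the contour expands outwards, so the enclosed area grows; in a segmentation setting the speed would instead depend on image content so that the contour stops at object boundaries.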


Table 1 DL and image analysis works

Citations | Imaging modality | Remarks
Koitka et al. [14] | X-ray | Uses faster-RCNN with Inception ResNet V2 as a feature extractor. DL techniques are used to determine the ossification areas in bone to determine the age of fossils. Dataset was taken from RSNA pediatric bone age challenge
Deniz et al. [15] | MRI (magnetic resonance imaging) | UNets (type of CNNs) are used to work on the MR image slices to segment proximal-femur-RoI for fracture risk assessment.
Abd-Ellah et al. [16] | MRI (magnetic resonance imaging) | The work has two stages: the first stage has CNNs (AlexNets, VGG-16 and VGG-19) used with error-correcting output codes based support vector machine (ECOC-SVM) for tumor detection and in the second part, R-CNN for the tumor localization
Kamnitsas et al. [17] | MRI (magnetic resonance imaging) | The work uses 3D CNNs to segment lesions from the brain images taken from patients having brain tumors and experienced ischemic stroke

Summary

In this section, various DL algorithms and their use in image analysis tasks were reviewed. The majority of existing works in image analysis focus on segmentation and detection of RoIs in images. DL architectures such as CNNs and their variants are widely seen. RNNs have also been applied to a few imaging tasks, and combinations of deep learning with conventional machine learning techniques such as support vector machines (SVMs) are also encountered in the literature. Even though image segmentation using machine learning techniques has been studied for many decades, extracting meaningful information from images (especially medical images) was tedious because of the large amount of hand feature engineering involved. DL algorithms reduce this effort, giving clinical support systems that rely on medical imaging inferences a tool for taking decisions in a timely manner. It is also observed that supervised deep learning techniques have been employed more than unsupervised ones in the existing literature on image analysis.


Table 2 Image datasets

Dataset | Remarks
Brainweb [22] | Contains simulated brain MRI images of normal and multiple sclerosis brains
MICCAI [23] | Contains brain tumor data
NIHCC [24] | Contains chest X-ray images of 8 kinds of diseases: Atelectasis, Cardiomegaly, Effusion, Infiltration, Mass, Nodule, Pneumonia, Pneumothorax
TCIA [25] | Lung Image Database Consortium (LIDC), Reference Image Database to Evaluate Response (RIDER), and other image datasets related to breast cancer, lung phantom, non-small cell lung cancer, brain cancer, glioblastoma multiforme, squamous cell carcinoma, prostate cancer, etc.
OASIS [26] | Contains brain images (MR and PET)
ADNI [27] | Contains data pertaining to Alzheimer’s disease
FITBIR [28] | Traumatic brain injury dataset
STARE [29] | Retinal image dataset
Johns Hopkins Medical Institute repository [30] | Brain image database
MIDAS [31] | Brain lesions image dataset
UCI repository [32] | Parkinson’s disease dataset
Cornell repository [33] | Chest CT images
USF repository [34] | Mammography and gait baseline
LIDC [35] | Lung image database
SCR database [36] | Chest radiographs
VIA group [37] | Lung image database
Mini-MIAS [38] | Mammogram images
DIARETDB1 [39] | Diabetic retinopathy images
OAI [40] | General image database

3 DL and Natural Language Processing

Natural Language Processing (NLP) is the ability of a computer program to understand, interpret and manipulate human language. General applications of NLP include enterprise search, where a program extracts information from human speech, searches a database for the relevant records and returns an answer. Sentiment analysis is another powerful application, in which comments and reviews are analyzed to obtain feedback on performance for further improvement. In healthcare specifically, NLP has further vital roles. Firstly, even though many patients can access their electronic health record (EHR), which is a real-time patient data record, they cannot interpret it. One of the prime reasons is that medical practitioners lack the time to explain the EHR data to patients. With NLP, a patient can understand the data and keep their health in check through suggested medical prescriptions, daily activity charts, and so on. In addition, converting an image or a PDF into text and then parsing and analyzing it to extract useful information is another application of NLP. A prominent example is IBM Watson [41], which is trained to run on a patient’s data, extract risk features and thereby predict possible diseases that could affect the patient.

Let us go through the existing works in the literature that use NLP for clinical support systems. The work in [42] presents an approach that uses NLP to extract potential medical conditions from free-text medical reports. The process is composed of two main components: the background application and the problem list management application. The background application extracts information about possible medical conditions from the medical documents using rule based NLP and stores it in a central database. The problem list management application accesses the data stored in the database and concludes on the medical problems of a patient. The work focused on 80 different medical conditions, such as arrhythmia or ischemic heart disease, mitral stenosis or left bundle branch block, and wheeze or pain.

In [43], the authors propose an NLP based method to analyze and compare the health records of patients who are likely to attempt suicide and of those who have already attempted it. The work is based on the fact that many patients at risk of suicide consult their physicians.
This study used eNQUIRENet, a database that links EHR data across multiple non-integrated primary care clinical organizations representing more than 3 million patients and 1700 clinicians. Three sources were used to confirm that a patient has a suicidal tendency: first, a search of the EHR for ICD-9 codes (International Classification of Diseases codes) indicating suicide attempt or ideation, namely E950–959 (attempt) and V62.84 (ideation); second, parsing of the HPI (History of Present Illness) field to recognize entries relevant to suicide, such as self-harm, hanging or cutting attempts; and third, the PHQ-9 (Patient Health Questionnaire) examination, in which depression severity is recorded. The extracted fields indicated that suicide attempts were recorded more often than ideation alone. A similar work is seen in [44], which uses rule based NLP to infer the presence of acute bacterial pneumonia from the chest X-ray reports of 292 patients.

However, none of the methods mentioned above uses DL, even though they are considered NLP systems for clinical support. In this context, we propose a method that uses a CNN for text classification. The method has the following stages: (i) extraction of keywords from the data records; (ii) conversion of the word sequences from the text/sentences and medical codes to vector form using a look-up table/feature mapping process; and (iii) classification of the text into disease occurrence by feeding the obtained sequence of vectors to a sequence of convolution and pooling layers. The block diagram in Fig. 3 explains this method. The output of the classification layer can be used for any prediction or identification purpose in a CDSS.


Fig. 3 Classification from text data using CNNs
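The three stages of the CNN text classifier can be sketched as a forward pass in plain Python. This is only an illustrative sketch under stated assumptions: the vocabulary, the layer sizes and the randomly initialised (untrained) weights are hypothetical and do not come from the chapter, and a real system would learn the parameters from labelled records using a DL framework.

```python
import math
import random

random.seed(0)

# Hypothetical toy vocabulary; index 0 is reserved for unknown keywords.
VOCAB = {"<unk>": 0, "fever": 1, "cough": 2, "chest": 3, "pain": 4, "wheeze": 5}
EMBED_DIM, NUM_FILTERS, KERNEL, NUM_CLASSES = 4, 3, 2, 2


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Untrained, randomly initialised parameters (illustration only).
EMBEDDING = rand_matrix(len(VOCAB), EMBED_DIM)                    # look-up table
FILTERS = [rand_matrix(KERNEL, EMBED_DIM) for _ in range(NUM_FILTERS)]
DENSE = rand_matrix(NUM_CLASSES, NUM_FILTERS)                     # classification layer


def extract_keywords(text):
    """Stage (i): keep only the words that appear in the vocabulary."""
    return [w for w in text.lower().split() if w in VOCAB]


def embed(tokens):
    """Stage (ii): map each keyword to its vector via the look-up table."""
    return [EMBEDDING[VOCAB.get(t, 0)] for t in tokens]


def conv_relu_maxpool(vectors, kernel):
    """Stage (iii): slide one filter over the sequence, apply ReLU, max-pool over time."""
    width = len(kernel)
    activations = []
    for start in range(len(vectors) - width + 1):
        total = sum(
            kernel[k][d] * vectors[start + k][d]
            for k in range(width)
            for d in range(EMBED_DIM)
        )
        activations.append(max(0.0, total))
    return max(activations)


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    return [e / sum(exps) for e in exps]


def classify(tokens):
    # Pad short inputs so that at least one convolution window exists.
    tokens = list(tokens) + ["<unk>"] * max(0, KERNEL - len(tokens))
    vectors = embed(tokens)
    pooled = [conv_relu_maxpool(vectors, f) for f in FILTERS]
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in DENSE]
    return softmax(logits)


keywords = extract_keywords("Patient reports fever cough and chest pain")
probs = classify(keywords)
print(keywords, probs)
```

The output is a probability for each of the hypothetical classes. Because the weights are random, the numbers carry no clinical meaning; the sketch only shows how data flows through the look-up, convolution, pooling and classification layers.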

In a similar way, RNNs can be used for learning from text. Figure 4 shows a possible RNN-based architecture for this purpose: a series of RNN cells connected sequentially to form a network. The words in the clinical text or the medical codes are fed as input to this sequence learner, and the output can be taken from all the RNN cells or from only the last RNN cell, depending on the requirement. For instance, the text or symptomatic information extracted from the EHR can be fed to such a network to predict the most probable disease affecting the patient. A similar network can be built by replacing the vanilla RNN cells in the architecture with long short-term memory (LSTM) cells. LSTMs have the advantage of carrying information forward over a longer part of the sequence using a memory cell and multiple gates. This helps the neural network learn the variations in the training data with fewer errors.

Fig. 4 Block diagram of sequential text learning using RNNs

The following are a few links to the datasets often used in NLP for clinical support and healthcare applications.

Dataset | Remarks
MIMIC [45] | Developed by MIT; contains anonymised health records of approx. 40,000 critical care patients
i2b2 [46] | Health records of nearly 1500 patients
HealthData [47] | Health data from the US Federal Government
BCHC data platform [48] | Health data from 26 cities, for 34 health indicators and across 6 demographic indicators
HMD [49] | Human mortality database
MHealth dataset [50] | Database of body motion and physical activities
Medicare [51] | Data on services and procedures that physicians and other healthcare professionals provided to Medicare beneficiaries
LSDB [52] | Data related to life sciences
HCUP-US [53] | Datasets contain encounter-level information on inpatient stays, emergency department visits, and ambulatory surgery in US hospitals
SEER [54] | Data about cancer incidence segmented by demographic groups such as age, race, and gender, provided by the US government
BROAD [55] | Data categorized by project such as brain cancer, leukemia, melanoma, etc.
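Returning to the RNN-based sequence learner of Fig. 4, the following is a minimal sketch of a chain of vanilla RNN cells in plain Python. The dimensions, the random untrained weights and the one-hot input vectors are assumptions made for illustration. The `return_all` flag (a hypothetical name) mirrors the choice between taking the output from every cell or only from the last one.

```python
import math
import random

random.seed(1)

INPUT_DIM, HIDDEN_DIM = 3, 4


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Untrained parameters shared by every cell in the chain (illustration only).
W_IN = rand_matrix(HIDDEN_DIM, INPUT_DIM)     # input-to-hidden weights
W_REC = rand_matrix(HIDDEN_DIM, HIDDEN_DIM)   # hidden-to-hidden weights
BIAS = [0.0] * HIDDEN_DIM


def rnn_cell(x, h_prev):
    """One vanilla RNN step: h_t = tanh(W_in x_t + W_rec h_{t-1} + b)."""
    return [
        math.tanh(
            sum(W_IN[j][i] * x[i] for i in range(INPUT_DIM))
            + sum(W_REC[j][k] * h_prev[k] for k in range(HIDDEN_DIM))
            + BIAS[j]
        )
        for j in range(HIDDEN_DIM)
    ]


def run_rnn(sequence, return_all=False):
    """Feed the sequence through the chain; return every state or only the last."""
    h = [0.0] * HIDDEN_DIM
    states = []
    for x in sequence:
        h = rnn_cell(x, h)
        states.append(h)
    return states if return_all else h


# Hypothetical one-hot vectors standing in for three words or medical codes.
sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
all_states = run_rnn(sequence, return_all=True)
last_state = run_rnn(sequence)
print(last_state)
```

In a disease prediction setting, `last_state` would typically be passed to a classification layer. An LSTM variant would replace `rnn_cell` with a cell that also maintains a memory state and gates.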

In general, the overall system block diagram of DL applied to EHR is shown in Fig. 5. As the figure shows, deep learning techniques applied to the EHR should perform three major tasks. The first is single concept extraction, which extracts information such as possible diseases, treatments and procedures. The second is temporal event extraction, which assigns a time to events, such as within a few hours or from this month. The third is relation extraction, which identifies, for example, which treatment affects which condition and which test is for what.

Fig. 5 NLP for healthcare decisions [56]

In the above discussion, a few DL techniques such as CNNs and RNNs were explained in detail. However, other DL techniques are also widely used in NLP applications for clinical decision support, such as Boltzmann machines and their variants like deep belief networks [57], and autoencoders and their variants like sparse, variational and denoising autoencoders. In that context, the authors of [58] used deep belief networks (DBNs), which use restricted Boltzmann machines (RBMs) as building blocks, for call routing in a call-center customer hotline that provides technical assistance for a Fortune 500 company. RBMs have the advantage of extracting useful features from the data using an architecture of visible and hidden nodes. The RBMs are stacked in layers to form a DBN, which is trained using Kullback–Leibler divergence. In addition, DBNs are used as feature extractors for traditional machine learning algorithms such as SVMs, maximum entropy and boosting. The results of that work show that combining DBNs with SVMs provides better accuracy than using those learning models individually for the call-routing problem. The same method can be used to process speech in the medical domain as well. A few other applications of RBMs are seen in [59–61].
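For reference, the visible and hidden node architecture of a binary RBM is usually written in terms of an energy function. The symbols below (visible vector v, hidden vector h, weight matrix W, biases a and b) are standard textbook notation and are not taken from the chapter or from the cited works.

```latex
E(\mathbf{v}, \mathbf{h}) = -\mathbf{a}^{\top}\mathbf{v} - \mathbf{b}^{\top}\mathbf{h} - \mathbf{v}^{\top} W \mathbf{h}

p(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_i v_i W_{ij}\Big), \qquad
p(v_i = 1 \mid \mathbf{h}) = \sigma\Big(a_i + \sum_j W_{ij} h_j\Big), \qquad
\sigma(x) = \frac{1}{1 + e^{-x}}
```

The hidden activation probabilities p(h_j = 1 | v) are the extracted features. In a DBN, they serve as the visible input to the next RBM in the stack.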

3.1 Challenges for Using DL for NLP in Healthcare

• Data heterogeneity: EHR data is available in different forms, varying from handwritten text to printed documents. DL algorithms must be able to parse and understand this data. In particular, clinical texts contain abbreviations and shorthand notations, and vary from one clinician to another.
• Policy and data privacy issues: Training DL algorithms requires large datasets. Providing this data to DL researchers is always bound by policies and by the privacy concerns of the patients.
• Deciding benchmarks: Since many researchers use their own private data, they are hesitant to share it with other researchers, and hence setting a common benchmark for a task in clinical support is difficult.
• Inherent problems of DL: These problems come from the DL algorithms themselves, such as the choice of model for a task, data size, tuning of hyperparameters, high performance hardware requirements, overfitting and underfitting, generalization, flexibility (bias and variance tradeoffs) and multitasking (learning multiple tasks together by taking advantage of common knowledge).

Summary Natural language processing (NLP) is one of the most sought-after areas in the deep learning research community. Using DL to understand and interpret health records saves clinicians’ time and helps provide timely medication to patients. NLP applications involve a wide range of algorithms, from simple rule based data parsing techniques to convolutional and recurrent neural networks. There are a few challenges and issues in using DL based NLP for CDSS, and addressing these would be an important milestone in the progress of automated medicine.

4 DL and Wearable Device Technology

Wearable technology is revolutionizing consumer electronics. With advances in circuit technology, wearable devices are being widely used to capture the patterns of patients for clinical support and decision making. The Apple Watch [62], with the Mayo Clinic app capturing health conditions such as heart rate, blood pressure, body temperature and calories burnt, is one of the best examples of wearable technology. Remote patient monitoring is a key focus of this technology: a patient can be monitored without re-admission, and the patient’s progress can be followed remotely, with intervention when there is a sudden decline in health. The devices collect data and transmit it to a centralized cloud where the clinical support decisions are taken.

In general, the combination of DL with wearable technology is derived from comparing a big data system with the human nervous system [63]. The human central nervous system comprises the brain and the spinal cord as its major organs. The spinal cord carries signals picked up from different parts of the body by the sense organs. The same arrangement is imitated in wearable-DL technology, where the cloud supported by DL constitutes the brain of the system, and the sensing and communication modules are analogous to the sensory organs and the spinal cord respectively. The complete block diagram of wearable technology with DL is shown in Fig. 6.

The internal architecture of a wearable sensor is shown in Fig. 7. It has an internal battery and a charger unit, or sometimes the module is driven by an external processor to which it is interfaced. Bio-sensors fetch the physiological signals from the body. These signals are passed through a pre-amplifier and a signal conditioning circuit to eliminate noise and minor signal artifacts, and then through an ADC (Analog to Digital Converter) for conversion to digital form.
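The conditioning and conversion steps can be illustrated with a small numerical sketch. The 3.3 V reference, the 10-bit resolution, the moving-average filter and the sample voltages are all assumptions chosen for illustration. Real sensing modules perform conditioning in analog hardware, and their ADC parameters depend on the device.

```python
def quantize(voltage, v_ref=3.3, bits=10):
    """Map an analog voltage in [0, v_ref] to an integer ADC code."""
    levels = 2 ** bits
    clipped = min(max(voltage, 0.0), v_ref)          # clamp to the input range
    return min(int(clipped / v_ref * levels), levels - 1)


def moving_average(samples, window=3):
    """Crude software stand-in for signal conditioning: smooth small artifacts."""
    smoothed = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical sensor voltages containing one spike artifact.
raw = [1.00, 1.02, 2.50, 1.01, 0.99]
codes = [quantize(v) for v in moving_average(raw)]
print(codes)
```

With these assumed settings, each code is an integer between 0 and 1023. It is this digital stream that the controller would forward to the wireless module.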
From the ADC, the signal reaches a controller that transmits the data wirelessly through a suitable wireless module. Sensing modules may send data directly to the cloud or to a nearby aggregator that collects the data from many sensors and transmits it to the cloud.

Fig. 6 Block diagram of wearable DL

Fig. 7 Block diagram of sensing module

Wearable technology has evolved through many improvements and innovations, a few of which are discussed here. As discussed before, in a traditional wearable technology system DL resides in the cloud, since the algorithms require a large amount of computation power that a battery-driven sensing module cannot afford. In this context, the authors of [64] provide an innovative solution in which the DL tasks are allotted not to the cloud but to a local hand-held device such as a smart phone or a tablet, bringing in the notion of edge computing. This removes the need for constant internet connectivity and avoids the privacy breach that arises from transferring data to an external site (the cloud). In [65], the authors propose using a smart phone as the sensing device, with the DL programs running on the phone itself. The accelerometer, gyroscope and magnetometer sensors available on smart phones are used to study human activity. The work uses SIFT (Scale Invariant Feature Transform) for feature extraction from the signals picked up by the smart phone sensors, and the obtained features are passed to a convolutional neural network that classifies the signal into a human activity.

In [66], the authors propose a complete architecture for CDSS based on wearable technology and basic machine learning algorithms. The architecture contains four tiers. Tier-1 performs pervasive monitoring of physiological signals such as ECG, EEG, respiratory signals, oxygen and heart rate, body temperature, and ankle and foot motion. The obtained signals are passed to tier-2, which provides preliminary decision support to physicians even though accurate laboratory measurements are not yet available at this stage. In tier-3, a more detailed analysis of the patient, combined with the laboratory measurements, is carried out. Finally, tier-4 provides post-diagnostic suggestions, prescriptions and so on. All these tiers are internally connected to a diagnosis engine containing machine learning algorithms that provides decisions to every tier. All the laboratory test and diagnosis data is fed to that engine, which provides adequate decisions at every point in time. The machine learning assistance block contains a single learning algorithm or an ensemble. The authors explored random forest, naive Bayes, K-nearest neighbor, SVM, best-first decision tree and multilayer perceptron models for diagnosis inference. These models open up ways of exploring deep learning algorithms in place of traditional ML algorithms.

A very interesting work is observed in [67], where the authors propose a method to monitor the symptoms of mental health using wearable technology. Locomotion data is picked up by GPS, accelerometers and gyroscopes; speech by the microphones in smart phones or watches; facial expressions and eye blink patterns by the phone camera; electrodermal activity by a smart watch; and social interaction patterns from voice calls, Twitter and other social network data.
Though not many details are discussed as to how these signals can be utilized for monitoring mental health, this opens up a new direction in which a CDSS based on learning from mental health signals can be designed using the methodology of [65].

In [68], the use of wearable technology to remotely monitor elderly citizens is proposed and referred to as the Smart Healthcare Monitoring System (SW-SHMS). The architecture of SW-SHMS has three main parts. The first is the patient’s environment, where sensors attached to the body read temperature, blood oxygen level and heart rate, and the sensed data is transmitted to the patient’s smart phone or a gateway device, through which it reaches the cloud. The second is the cloud, which performs various analytics on the data using machine learning and/or DL algorithms to extract useful inferences. The third is the monitoring platform, to which these inferences are sent and where doctors can take clinical decisions and precautionary measures. The corresponding block diagram is shown in Fig. 8.

Fig. 8 SW-SHMS system architecture

According to the survey of existing works in [63], the DL algorithms often seen in combination with wearable technology are: deep unsupervised learning (restricted Boltzmann machines, deep belief networks, deep Boltzmann machines, autoencoders and variational autoencoders, generative adversarial networks and sequence learning); deep supervised learning (feed forward neural networks, deep neural networks, spiking neural networks, sequence to sequence learning, RNNs, LSTMs, GRUs and convolutional LSTMs); and deep reinforcement learning (deep Q networks and inverse DRL).

Summary The majority of works that use wearable technology with DL have a similar architecture, in which analytics and other DL tasks are performed in the cloud. However, exceptions are being developed in which researchers propose edge computing to address the privacy breach and computation burden of centralized cloud methods. Almost all kinds of DL algorithms are being used to perform the analytics. The results obtained are used to assist clinicians in taking further decisions with respect to a patient.
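The gateway or aggregator role that recurs in these architectures can be sketched as follows. The `Aggregator` class, its batch size and the summary record format are hypothetical and are not drawn from any of the cited systems. The list `uploaded` merely stands in for a cloud endpoint.

```python
from collections import defaultdict
from statistics import mean


class Aggregator:
    """Hypothetical gateway that batches sensor readings before upload."""

    def __init__(self, batch_size=3):
        self.batch_size = batch_size
        self.buffer = defaultdict(list)   # sensor name -> pending readings
        self.uploaded = []                # stand-in for the cloud endpoint

    def receive(self, sensor, value):
        """Store one reading and upload once a full batch is available."""
        self.buffer[sensor].append(value)
        if len(self.buffer[sensor]) >= self.batch_size:
            self.flush(sensor)

    def flush(self, sensor):
        readings = self.buffer.pop(sensor, [])
        if readings:
            # Send one summary record instead of every raw sample.
            self.uploaded.append(
                {"sensor": sensor, "count": len(readings), "mean": mean(readings)}
            )


gateway = Aggregator(batch_size=3)
for value in [72, 75, 78, 80]:            # illustrative heart-rate samples
    gateway.receive("heart_rate", value)
gateway.receive("temperature", 36.8)
print(gateway.uploaded)
```

Batching at the gateway reduces how often data leaves the patient’s environment. An edge-computing design would go further and run the learning algorithm on the gateway itself.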

5 Issues in Using DL for CDSS

The following are a few problems that still hinder the use of DL for CDSS:

1. Regulations and policies: There are no fixed rules and regulations for using DL in clinical decision support systems. To overcome this difficulty, the US FDA made the first set of regulations [69] for assessing AI systems in healthcare. The FDA guidelines clearly address the use of data and adaptive designs in clinical trials. In this direction, Arterys’ medical imaging platform became the first FDA-approved DL platform for CDSS.
2. Data sharing: Training and validation of DL systems require huge amounts of data and the sharing of that data among hospitals and DL experts. Currently there are no incentives for people to share data, and they are also bound by IP rights and privacy policies. However, data exchange is slowly turning towards a reward based system; for example, insurance companies collect data from physicians for data analytics, and crowd sourcing of health data is slowly growing.
3. Data compatibility: The data obtained from the machines and procedures used in healthcare is often not useful for DL/ML systems because it is not compatible with the algorithms in use.
4. Privacy issues: As already mentioned, health data is personal information, and family members, relatives and clinicians may often refuse to provide it for fear of a privacy breach. To address this, DL experts came up with the concept of distributed machine learning, in which the training and testing of the learning algorithm happen where the data is generated, without transferring it to a centralized cloud. However, the method might still take considerable time to become acceptable to medical practitioners and be regularly used by them.
5. Sociocultural issues: Most patients and clinicians do not trust the use of AI in healthcare, and in many cases people are wary of staking their lives or careers on AI. Also, people working in the medical domain fear job insecurity because AI systems can show higher accuracy than human experts. In addition, the concept of AI is not understood by the majority of people in our society, and there is fear due to unawareness and uncertainty.
6. Transparency: Many DL algorithms are black boxes that reveal few inner details and cannot explain to clinicians why a certain prediction is produced. This leaves clinicians with little trust in AI based systems.
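The distributed learning idea mentioned under privacy issues can be illustrated with federated averaging, which is one common instance of it. It is named here as an assumption, since the chapter does not specify a particular algorithm. In this minimal sketch, each hypothetical hospital fits a one-parameter linear model on its own data and shares only the resulting weight. The datasets, learning rate and number of rounds are invented for illustration.

```python
def local_update(weight, data, lr=0.01, epochs=50):
    """Train y ~ weight * x on one site's private data; only the weight leaves the site."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_round(global_weight, sites):
    """Average the locally trained weights, weighted by each site's sample count."""
    total = sum(len(data) for data in sites)
    return sum(local_update(global_weight, data) * len(data) for data in sites) / total


# Hypothetical private datasets held by two hospitals (both follow y = 2x).
hospital_a = [(1.0, 2.0), (2.0, 4.0)]
hospital_b = [(3.0, 6.0), (4.0, 8.0), (5.0, 10.0)]

weight = 0.0
for _ in range(5):
    weight = federated_round(weight, [hospital_a, hospital_b])
print(weight)
```

The raw (x, y) pairs never leave `local_update`; the coordinating server sees only model weights. Practical deployments add safeguards such as secure aggregation, which this sketch omits.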

6 Future Research Directions

1. Explanatory DL: As already mentioned, many DL algorithms lack explanations for their predictions and work like black boxes. This prevents sufficient trust from developing in DL based CDSS, where clinicians have to take life-and-death decisions. This is a prominent issue that needs to be addressed in future research.
2. Rationality versus irrationality: Humans can be extremely irrational with regard to medical decisions, whereas ML and DL algorithms are trained to be rational learners. This mismatch has to be addressed in future research; some researchers are proposing game-theoretic solutions, but it is still an open issue.
3. Data security and privacy: When a pool of databases or crowd-sourced data pools are used together for training and testing ML/DL algorithms online, there is always a chance of the algorithms being compromised or data being leaked. Hence, security is always a concern for these expert systems. In addition, the privacy of the patients should be safeguarded. In this context, novel security and privacy algorithms are always needed.
4. Artificial Intelligence (AI) to Extended Intelligence (EI): The notion of human intelligence versus artificial intelligence should fade away in upcoming research. This is possible only if AI transforms into EI, where machines become part of the learning and supporting ecosystem, i.e. machines support the activities performed by humans in their daily lives. Such a shift would also increase the trust of patients and clinicians in the use of AI in healthcare.
5. Skewed or imbalanced datasets: It is extremely important that the datasets used for training and validating AI algorithms are not skewed towards a single class. If the data is class imbalanced, then either the data must be pre-processed or the algorithms modified so that learning happens in an unbiased manner. This is hardly addressed in many existing works and needs to be looked into in the future.
6. DL for prognosis: The majority of works on deep learning aim at inferences required for diagnosis. However, there are limited works that aim at prognosis, that is, knowing beforehand how the health situation leading to a medical condition is likely to turn out. Using DL for prognosis in addition to diagnosis would be a good direction for future research.
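The two remedies named for imbalanced datasets, pre-processing the data or modifying the algorithm, can be sketched with random oversampling and inverse-frequency class weights. Both are common choices rather than methods prescribed by the chapter. The labels and sample identifiers below are invented for illustration.

```python
import random
from collections import Counter


def class_weights(labels):
    """Algorithm-side remedy: inverse-frequency weights n_samples / (n_classes * count)."""
    counts = Counter(labels)
    return {c: len(labels) / (len(counts) * n) for c, n in counts.items()}


def oversample(samples, labels, seed=0):
    """Data-side remedy: duplicate minority samples until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical dataset: 8 healthy records and 2 disease records.
labels = ["healthy"] * 8 + ["disease"] * 2
samples = list(range(10))

weights = class_weights(labels)
balanced_samples, balanced_labels = oversample(samples, labels)
print(weights, Counter(balanced_labels))
```

The weights would multiply each class’s contribution to the training loss, so that errors on the rare class cost more. Oversampling should be applied to the training split only, so that duplicated records do not leak into validation data.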

7 Conclusions

In this chapter, we discussed three important applications of DL for CDSS: image analysis, natural language processing and wearable technology. When disruptive technologies like AI make their way into the world of healthcare and medicine, the traditional methods of healthcare and of taking clinical decisions undergo drastic transformation. This is essential in today’s era, in which we are heading towards building smart cities where smart healthcare holds high prominence. Sophisticated learning algorithms have been making the tasks of clinicians easier, saving their time and improving the quality of life of patients and the general public. There are a few issues associated with using AI and DL in CDSS, especially security and privacy concerns; nevertheless, there is no doubt that DL will become one of the most powerful tools for decision making in clinical diagnosis, leading to smart healthcare.


References

1. Esteva, A., Robicquet, A., Ramsundar, B., Kuleshov, V., DePristo, M., Chou, K., Cui, C., Corrado, G., Thrun, S., Dean, J.: A guide to deep learning in healthcare. Nat. Med. 25, 24–29 (2019)
2. Safran, C., Bloomrosen, M., Hammond, W.E., Labkoff, S., Markel-Fox, S., Tang, P.C., Detmer, D.E.: Toward a national framework for the secondary use of health data: an American medical informatics association white paper. J. Am. Med. Inf. Assoc. 14(1), 1–9 (2007). https://doi.org/10.1197/jamia.m2273. ISSN 1067-5027. PMC 2329823. PMID 17077452
3. Atta-ur-Rahman, M.I.B.A: Virtual clinic: a CDSS assisted telemedicine framework. In: Telemedicine Technologies, chap. 15, 1st edn. Elsevier (2019)
4. Atta-ur-Rahman, S.M.H., Jamil, S.: Virtual clinic: a telemedicine proposal for remote areas of Pakistan. In: 3rd World Congress on Information and Communication Technologies (WICT’13), pp. 46–50, 15–18 Dec, Vietnam (2013)
5. Wang, J.X., Sullivan, D.K., Wells, A.J., Wells, A.C., Chen, J.H.: Neural networks for clinical order decision support. AMIA Jt. Summits Trans. Sci. Proc. 2019, 315–324 (2019)
6. Yang, Z., Huang, Y., Jiang, Y., Sun, Y., Zhang, Y.-J., Luo, P.: Clinical assistant diagnosis for electronic medical record based on convolutional neural network. Sci. Rep. 8(6329) (2018)
7. Yamashita, R., Nishio, M., Do, R.K.G., Togashi, K.: Convolutional neural networks: an overview and application in radiology. Insights Imaging 9, 611–629 (2018). https://doi.org/10.1007/s13244-018-0639-9. Springer Publications
8. Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., van der Laak, J.A.W.M., van Ginneken, B., Sánchez, C.I.: A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017)
9. Avendi, M., Kheradvar, A., Jafarkhani, H.: A combined deep-learning and deformable-model approach to fully automatic segmentation of the left ventricle in cardiac MRI. Med. Image Anal. 30, 108–119 (2016)
10. Szegedy, C., Toshev, A., Erhan, D.: Deep Neural Networks for Object Detection. NIPS (2013)
11. Wang, G., Li, W., Zuluaga, M.A., Pratt, R., Patel, P.A., Aertsen, M., Doel, T., David, A.L., Deprest, J., Ourselin, S., Vercauteren, T.: Interactive medical image segmentation using deep learning with image-specific fine tuning. IEEE Trans. Med. Imaging 37(7), 1562–1573 (2018)
12. Guo, Z., Li, X., Huang, H., Guo, N., Li, Q.: Medical image segmentation based on multimodal convolutional neural network: study on image fusion schemes. In: IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), 4–7 Apr 2018, Washington, D.C., USA, pp. 903–907
13. Zhang, J., Xie, Y., Wu, Q., Xia, Y.: Medical image classification using synergic deep learning. Med. Image Anal. 54, 10–19 (2019)
14. Koitka, S., Demircioglu, A., Kim, M.S., Friedrich, C.M., Nensa, F.: Ossification area localization in pediatric hand radiographs using deep neural networks for object detection. PLoS One 13(11), e0207496 (2018). https://doi.org/10.1371/journal.pone.0207496
15. Deniz, C.M., Xiang, S., Hallyburton, R.S., Welbeck, A., Babb, J.S., Honig, S., Cho, K., Chang, G.: Segmentation of the proximal femur from MR images using deep convolutional neural networks. Sci. Rep. 8(16485) (2018)
16. Abd-Ellah, M.K., Awad, A.I., Khalaf, A.A.M., Hamed, H.F.A.: Two-phase multi-model automatic brain tumour diagnosis system from magnetic resonance images using convolutional neural networks. EURASIP J. Image Video Process. 2018, 97 (2018)
17. Kamnitsas, K., Ledig, C., Newcombe, V.F.J., Simpson, J.P., Kane, A.D., Menon, D.K., Rueckert, S., Glocker, B.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal. 36, 61–78 (2017)
18. Chakravarty, A., Sivaswamy, J.: RACE-net: a recurrent neural network for biomedical image segmentation. IEEE J. Biomed. Health Inf.
19. Wang, S., He, K., Nie, D., Zhou, S., Gao, Y., Shen, D.: CT Male pelvic organ segmentation using fully convolutional networks with boundary sensitive representation. Med. Image Anal. (2019)


20. Ambellan, F., Tack, A., Ehlke, M., Zachow, S.: Automated segmentation of knee bone and cartilage combining statistical shape knowledge and convolutional neural networks: data from the osteoarthritis initiative. Med. Image Anal. 52, 109–118 (2019)
21. Gao, Y., Phillips, J.M., Zheng, Y., Min, R., Fletcher, P.T., Gerig, G.: Fully convolutional structured LSTM networks for joint 4D medical image segmentation. In: IEEE 15th international symposium on biomedical imaging (ISBI 2018), Washington, DC, 2018, pp. 1104–1108. https://doi.org/10.1109/isbi.2018.8363764
22. http://brainweb.bic.mni.mcgill.ca/brainweb/
23. http://braintumorsegmentation.org/
24. https://nihcc.app.box.com/v/ChestXray-NIHCC
25. https://www.cancerimagingarchive.net/
26. http://www.oasis-brains.org/#data
27. http://adni.loni.usc.edu/
28. https://fitbir.nih.gov/
29. http://cecas.clemson.edu/~ahoover/stare/
30. http://lbam.med.jhmi.edu/
31. https://www.insight-journal.org/midas/
32. http://archive.ics.uci.edu/ml/index.php
33. http://www.via.cornell.edu/databases/
34. http://www.eng.usf.edu/cvprg/
35. https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI
36. http://www.isi.uu.nl/Research/Databases/SCR/
37. http://www.via.cornell.edu/crpf.html
38. http://peipa.essex.ac.uk/info/mias.html
39. http://www2.it.lut.fi/project/imageret/diaretdb1/
40. https://oai.epi-ucsf.org/datarelease/
41. IBM Watson Clinical Decision support system. https://www.ibm.com/watson-health/solutions/clinical-decision-support
42. Meystre, S., Haug, P.J.: Natural language processing to extract medical problems from electronic clinical documents: performance evaluation. J. Biomed. Inf. 39(6), 589–599 (2006). ISSN 1532-0464
43. Anderson, H.D., Pace, W.D., Brandt, E., Nielsen, R.D., Allen, R.R., Libby, A.M., West, D.R., Valuck, R.J.: Monitoring suicidal patients in primary care using electronic health records. J. Am. Board Fam. Med. 28(1), 65–71 (2015). https://doi.org/10.3122/jabfm.2015.01.140181
44. Fiszman, M., Chapman, W.W., Aronsky, D., Evans, R.S., Haug, P.J.: Automatic detection of acute bacterial pneumonia from chest X Ray reports. J. Am. Med. Inform. Assoc. 7(6), 593–604 (2000)
45. https://mimic.physionet.org/
46. https://www.i2b2.org/NLP/DataSets/Main.php
47. https://healthdata.gov/search/type/dataset
48. https://bchi.bigcitieshealth.org/indicators/1827/searches/34444
49. https://www.mortality.org/
50. https://archive.ics.uci.edu/ml/datasets/MHEALTH+Dataset
51. https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Physician-and-Other-Supplier.html
52. https://dbarchive.biosciencedbc.jp/index-e.html
53. https://hcup-us.ahrq.gov/databases.jsp
54. https://seer.cancer.gov/faststats/index.html
55. https://gengo.ai/datasets/18-free-life-sciences-medical-datasets-for-machine-learning/?utm_campaign=c&utm_medium=quora&utm_source=rei
56. Shickel, B., Tighe, P.J., Bihorac, A., Rashidi, P.: Deep EHR: a survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE J. Biomed. Health Inf. 22(5), 1589–1604 (2018). https://doi.org/10.1109/JBHI.2017.2767063

Deep Learning for Clinical Decision Support Systems …

99

57. Sarikaya, R., Hinton, G.E., Deoras, A.: Application of deep belief networks for natural language understanding. IEEE/ACM Trans. Audio, Speech, Lang. Process. 22(4), 778–784 (2014). https://doi.org/10.1109/TASLP.2014.2303296 58. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010). https://doi.org/10.1109/TKDE.2009.191 59. Jin, Y., Zhang, H., Du, D.: Improving deep belief networks via delta rule for sentiment classification. In: IEEE 28th international conference on tools with artificial intelligence (ICTAI), San Jose, CA, pp. 410–414 (2016). https://doi.org/10.1109/ictai.2016.0069 60. Jiang, X., Zhang, H., Duan, F., Quan, X.: Identify Huntington’s disease associated genes based on restricted Boltzmann machine with RNA-seq data. BMC Bioinf. 18(1), 447 (2017). https:// doi.org/10.1186/s12859-017-1859-6 61. Tomczak, J.M.: Learning informative features from restricted Boltzmann machines. Neural Process. Lett. 44(3), 735–750 (2016). https://doi.org/10.1007/s11063-015-9491-9. Springer Publications 62. https://www.apple.com/in/watch/ 63. Dargazany, A.R., Stegagno, P., Mankodiya, K.: Wearable DL: wearable internet-of-things and deep learning for big data analytics—concept, literature, and future. Mob. Inf. Syst. (8125126), 20 (2018). https://doi.org/10.1155/2018/8125126 64. Xu, M., Qian, F., Zhu, M., Huang, F., Pushp, S., Liu, X.: DeepWear: adaptive local offloading for on-wearable deep learning. IEEE Nat. Future Mob. Inf. Syst. Article ID 8125126, 20 (2018). https://doi.org/10.1155/2018/8125126TransactionsonMobileComputing, https://doi.org/10.1109/tmc.2019.2893250 65. Ravi, D., Wong, C., Lo, B., Yang, G.: Deep learning for human activity recognition: a resource efficient implementation on low-power devices. In: IEEE 13th international conference on wearable and implantable body sensor networks (BSN), San Francisco, CA, pp. 71–76 (2016). https://doi.org/10.1109/bsn.2016.7516235 66. 
Yin, H., Jha, N.K.: A health decision support system for disease diagnosis based on wearable medical sensors and machine learning ensembles. IEEE Trans. Multi-Scale Comput. Syst. 3(4), 228–241 (2017). https://doi.org/10.1109/tmscs.2017.2710194 67. Abdullah, S., Choudhury, T.: Sensing technologies for monitoring serious mental illnesses. IEEE Multimedia 25(1), 61–75 (2018). https://doi.org/10.1109/mmul.2018.011921236 68. Al-khafajiy, M., Baker, T., Chalmers, C., Asim, M., Kolivand, H., Fahim, M., Waraich, A.: Remote health monitoring of elderly through wearable sensors. Multimed. Tools Appl. 78(17), 24681–24706 (2019). https://doi.org/10.1007/s11042-018-7134-7. Springer Publications 69. Jiang, F., Jiang, Y., Zhi, H., et al.: Artificial intelligence in healthcare: past, present and future. Stroke Vasc. Neurol. 2 (2017). https://doi.org/10.1136/svn-2017-000101

E. Sandeep Kumar completed his Bachelor of Engineering (B.E.) in Telecommunication Engineering at Jawaharlal Nehru National College of Engineering (JNNCE), Shimoga, Karnataka, India, with six merit awards and distinction. He completed his Master of Technology (M.Tech) in Digital Communication Engineering at M.S. Ramaiah Institute of Technology (MSRIT), Bangalore, India, with first rank and a gold medal. Currently, he is a collaborative Ph.D. scholar with MSRIT, IIT-Madras and FIU, Miami. His area of interest is data and network science, and he has published many papers in international and national journals and conferences.

Pappu Satya Jayadev earned his Bachelor's degree in Electrical and Electronics Engineering, with distinction, from Gayatri Vidya Parished College of Engineering, Visakhapatnam. Currently, he is a graduate scholar (M.S. + Ph.D.) at IIT Madras, working with Dr. Ramkrishna Pasumarthy and Dr. Nirav Bhatt. He is affiliated with the Robert Bosch Center for Data Science and AI, and with the Systems and Control groups at IIT Madras. His research interests include the analysis, optimization and control of systems using the tools of machine learning and deep learning. His work has been published in multiple national and international conferences.

Review of Machine Learning and Deep Learning Based Recommender Systems for Health Informatics

Jayita Saha, Chandreyee Chowdhury and Suparna Biswas

Abstract Recommender systems have become essential in personalized healthcare, as they provide meaningful information to patients depending on their specific requirements and the availability of health records. With the improvement of machine learning techniques, recommender systems bring several opportunities to medical science. Using deep learning, such systems can perform more efficiently and solve complex problems even when the data set is diverse and unstructured. Here we present a comprehensive overview of the challenges associated with existing recommender systems. Machine learning and deep learning techniques that are generally applied in health recommender systems are discussed in detail, along with their application to health informatics.

Keywords Health informatics · Recommender system · Machine learning · Deep learning · Smart healthcare · Semi supervised learning

J. Saha · C. Chowdhury (B)
Computer Science and Engineering, Jadavpur University, Kolkata, India

S. Biswas
Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Kolkata, India

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics, Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_6

1 Introduction to Biomedical and Health Informatics

The world is facing major demographic challenges, such as increasing life expectancy, which leads to an aging population, and the prevalence of chronic diseases. Treatment of such diseases requires daily monitoring, often through hospitalization. These challenges are compounded by rising healthcare costs. Thankfully, technology has come a long way in assisting citizens, especially in monitoring health parameters under free-living conditions. Thus, the period of hospitalization may be reduced while improving the quality of life of citizens. User behavior can be objectively monitored through non-invasive sensing technologies to shed light on how physical activity and daily lifestyle affect the health of the individual.

Health informatics helps to link technologies, communications and healthcare in order to improve safety, the quality of a healthy lifestyle and the operation of medical information systems. Here, informatics refers to the science of applying knowledge extracted from collected data to improve health and the quality of healthcare services. Hence, in health informatics, computer and information science principles are applied for the betterment of patient care and public health. Such technologies assist both health professionals and citizens, especially in monitoring health conditions through detailed analysis of various health records, as described in Fig. 1. These assistive applications play a crucial role in spreading health awareness. For instance, smartphones and wearable sensing devices are used to record daily human activities and information related to everyday life [1].

Fig. 1 Components of biomedical and health informatics


Current studies have shown that keeping track of lifestyle related information, such as daily steps, body weight and calories spent, is very useful for developing user awareness, which may ultimately lead to a healthy lifestyle, a crucial component in treating many chronic diseases. In fact, these measurements over time may reveal interesting insights into the user's efforts and the final outcome. In this context, recommender systems could be readily utilized for health informatics, which may lead to the improvement of chronic health conditions. Thus, this chapter presents an overview of recommender systems. The state-of-the-art applications in which they are used, and the machine learning techniques, especially deep learning techniques, that are applied, are also detailed in this chapter.

2 Introduction to Recommender System

Recommender systems aim to help users by providing suitable options to execute a task easily and efficiently. Such systems learn user behavior by filtering through a large amount of data [2]. Two scenarios for health recommendation systems are as follows.

• In the first scenario, health professionals are considered to be the end-users of health recommender systems.
• In the second scenario, patients are considered to be the end-users.

In the first scenario, health professionals would be able to retrieve additional information, such as related research articles or clinical guidelines, through the use of recommender systems. In the second scenario, the end-users may benefit from evidence-based, high-quality health related content. Recommender Systems (RS) [3] in healthcare are useful in decision making and in assisting personalized healthcare by generating meaningful recommendations depending on the domain and the particular characteristics of the available health records. However, a recommender system should also comprehend the patient, the requirements, and the attitudes in the context of health and disease management. Thus, health recommender systems should be more sensitive for this kind of application [2]. In the following subsections, the architecture of such systems and their applications reported in the literature are discussed.

2.1 Application in Healthcare

Recommendation systems are becoming increasingly popular, especially for applications in health informatics. In areas ranging from bioinformatics to predicting the spread of infectious diseases, such systems are often applied to detect hidden patterns and hence recommend a suitable solution. These systems are mostly based on machine learning, especially deep learning techniques, which are detailed in the subsequent sections. Table 1 summarizes representative health informatics applications that employ different deep learning techniques.

Table 1 Overview of the applications of recommendations for health informatics

Area of applications | Application | Input parameters | Learning techniques
Bioinformatics [4–6] | Drug design | Molecule compounds | Deep neural network
Bioinformatics [4–6] | RNA binding protein; compound protein interaction | Gene RNA/DNA sequences; molecule compounds | Deep belief network
Medical imaging [7–9] | Tissue classification; organ segmentation; tumor detection; hemorrhage detection | MRI/CT images; microscopy; hyperspectral images; endoscopy images | Convolutional neural network
Pervasive sensing [10–13] | Monitoring of biological parameters; anomaly detection | ECG, EEG; implanted devices | Convolutional neural network
Pervasive sensing [10–13] | Human activity recognition | Wearable sensing devices; smartphones; video | Convolutional neural network
Pervasive sensing [10–13] | Obstacle detection; sign language recognition; hand gesture recognition | RGB-D camera; real-sense camera; depth camera | Convolutional neural network
Public health [14, 15] | Lifestyle diseases; infectious disease epidemics | Text messages; social media data; geo-tagged images | Convolutional neural network

Additional learning techniques listed in the table: convolutional deep belief network, deep belief network, deep neural network.

The main application domains are as follows.

• Bioinformatics: It focuses on investigating and understanding biological processes at the molecular level. Pharmacogenomics is a field of bioinformatics that attempts to analyze how different subjects respond differently to drugs because of their genetic differences. Hence this field explores the design of more effective drugs for personalized treatment, thereby reducing side effects. Understanding the influence of environmental factors on the formation of proteins and their interactions is another interesting application where deep learning techniques are found to be very useful.
• Medical Imaging: Automated medical image analysis is a crucial requirement of modern medicine. In recent years, deep learning techniques, especially convolutional neural networks, have become increasingly popular in the medical imaging research community. This is because deep learning techniques perform extremely well in computer vision applications, and their run-time performance can be improved when they are parallelized on GPUs.
• Pervasive Sensing: Ambient, wearable and even implantable sensors are used to monitor body vitals, specifically for the elderly under free-living conditions. Regular monitoring of a person's energy expenditure throughout the day, along with food intake, helps to curb obesity and thus improve personal health. Different wearable and ambient sensors are used to monitor daily activities and assist elderly patients in improving their quality of life. Human activity recognition can also be utilized for the rehabilitation of heart and stroke patients and for post-trauma recovery. Such activity recognition can be performed using wearable and implantable assistive devices. Continuous monitoring of vital signs is important for improving the treatment of patients in critical care, as the physical condition of such patients needs to be carefully analyzed [16].
• Public Health: This has emerged as an important discipline, as it aims to prevent disease proactively by analyzing the possible spreading patterns of a disease. It also aims to investigate the influence of environmental factors on social behaviors. The spread of a disease, or even of social habits induced by environmental factors, can be localized to a small area or a state, or can extend across a country. Public health applications mostly focus on the different patterns in which epidemics and lifestyle diseases spread, and analyze the inherent factors influencing such behavior. As the data size increases, scalability becomes a crucial issue that is hard for conventional predictive models to address. Performance tuning of such systems is therefore difficult and can only be done by domain experts. Deep learning algorithm designs mostly explore online machine learning, in which cost function optimization takes place sequentially as new training data are fed to the system. So, evidently, deep learning techniques play an important role in health recommender systems for research studies in public health [17].
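The sequential cost function optimization mentioned for public health data can be sketched as online stochastic gradient descent. This is a minimal sketch on a logistic model rather than a deep network, to keep it short; the class name `OnlineLogistic`, the learning rate and the toy data stream are illustrative assumptions and are not taken from [17].

```python
import math

class OnlineLogistic:
    """Logistic model updated one example at a time (online SGD)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, x, y):
        """One gradient step on the log-loss of a single (x, y) pair."""
        err = self.predict_proba(x) - y  # derivative of the log-loss w.r.t. z
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err

# Hypothetical stream: (weekly case count scaled to [0, 1], outbreak flag).
stream = [([0.1], 0), ([0.9], 1), ([0.2], 0), ([0.8], 1)] * 50

model = OnlineLogistic(n_features=1)
for x, y in stream:          # examples arrive one by one
    model.partial_fit(x, y)  # the model is never retrained from scratch
```

Because each update touches only the current example, memory use stays constant however long the stream grows, which is the property that makes this style of training attractive when scalability is the main concern.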

2.2 System Architecture

A health recommender system has several phases that follow the basic architecture of a health informatics system, as described in Fig. 2. Publicly available health datasets and quality metrics are two key concerns for the success of recommender systems in health informatics.

Fig. 2 System architecture of health recommender system

3 Overview of Health Recommender System

Machine learning algorithms are very useful for recommendation systems in the application domains stated in the previous section. They can provide better recommendations than traditional approaches, reduce computational complexity and work with multi-source data. Existing Health Recommender Systems (HRS) can be classified into two categories based on their application.

(a) Disease Diagnosis HRS

People with multiple health conditions may have specific challenges and comorbidities. The onset of a challenge could divulge an underlying medical condition. In this way, medical conditions may be diagnosed early so that early care can be provided through recommendations, which is otherwise not possible. Thus, medical conditions leading to medical emergencies may also be prevented. Healthcare recommender systems for the diagnosis and monitoring of chronic diseases play an important role in the continuous monitoring and support of people in need, by extending proper advice and predicting the risks associated with diagnosed diseases. Such systems may act as managing and controlling tools to assist physicians and patients. However, providing an accurate recommendation from medical data in real time is a challenging task, because medical data are complex: they can be unbalanced, large, multi-dimensional, noisy and/or incomplete.

Depression and mental disorders are increasingly becoming a major problem in present society. Depression is usually accompanied by negative effects: an assortment of physical, emotional, and behavioral symptoms. Hence, an intelligent smartphone-based health recommender system is proposed in [18] to monitor patients with a mental disorder (mainly related to anxiety) and provide treatment as necessary. Recommender systems for the mHealth domain [3] are designed to exploit IoT enabled technologies to acquire patient data, based on which proper advice is rendered. Such systems facilitate the task of caregivers by suggesting suitable advice that may lead patients towards a better quality of life. To address the lack of a sufficient dataset, an existing benchmark dataset was used for the experiments with the proposed system.


Using heterogeneous sensors, various physiological signals are sensed to analyze the patient's condition and prescribe personalized solutions. The cloud based architecture of recommender systems helps in uploading and downloading health data under a proper access control policy. In [19], a recommender system for patients suffering from chronic diseases such as diabetes is designed to improve quality of life by assisting both patients and caregivers with accurate prediction of disease related risks and trustworthy health recommendations. An accurate prediction model has been built to diagnose risks related to chronic diseases by applying multiple classifications using decision tree algorithms, and to prescribe more accurate medical advice by applying unified collaborative filtering based on patients' medical history, external features, etc. The challenges of existing recommender systems are:

(i) missing or erroneous data due to human error or sensor devices, the large size of medical databases, etc.;
(ii) two dimensional data problems: one dimension is based on historical recommendations and the other is the relation between the patient's external features and the practitioner's advice.

Accordingly, the recommendation system presented in [19] is found to achieve better recall and precision with the random forest algorithm than with other algorithms such as J48, decision stump and REP tree.

The development of intelligent and accurate recommender systems has attracted funding because of its relevance to current socio-economic conditions and the support of enabling technologies such as IoT, machine learning and big data. In [20], the authors proposed and developed a recommender system for the personalized care and support of people suffering from dementia, which causes memory loss and affects an alarmingly increasing number of people worldwide. This work is funded by an EU H2020 project and aims to build a software platform that considers dementia patients and their caregivers as a dyad.
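The idea of recommending advice from the histories of similar patients can be sketched with plain user-based collaborative filtering. This is a minimal sketch, not the unified collaborative filtering method of [19]; the patient identifiers, advice items and 1 to 5 helpfulness ratings are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None  # None: no usable neighbour

# Hypothetical data: how helpful (1-5) each patient found a piece of advice.
ratings = {
    "p1": {"walk_30min": 5, "low_sugar_diet": 4},
    "p2": {"walk_30min": 5, "low_sugar_diet": 4, "foot_check": 5},
    "p3": {"walk_30min": 1, "low_sugar_diet": 2, "foot_check": 2},
}
score = predict(ratings, "p1", "foot_check")
```

Here `p1` is rated most like `p2`, so the predicted score for `foot_check` is pulled towards `p2`'s rating. When no other patient has rated the item, `predict` returns `None` instead of guessing.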
The dependence of recommender systems on user data creates what is termed the cold-start problem. Dealing with new users is problematic, as the database may not contain sufficient information about them. There should be a balance between generalized solutions based on a general model and over-accuracy/overestimation.

(b) Content based HRS

This type of recommender system is intended for users who want to semantically explore and detect their disease related conditions. Such systems often follow a layered architecture, as in [21]. It comprises (i) a user layer that keeps a record of the interactions of user agents and their preferences, and manages semantic search, data source access and the ranking of preferences, and (ii) a data layer that stores the acquired data with access control. The performance of this system can be improved further by combining the semantic based approach with a more structured, medical practitioner based method.

The huge popularity of health related videos on the Internet raises concerns about video quality and content. To aid people who refer to such videos, a content based recommender system is designed in [22] to link health related videos to content rich websites. The linking is done by applying NLP: metadata or keywords, such as video name, title and topic, are extracted from YouTube videos and used to search for semantic web based content for reference. The correctness and effectiveness of such linking are evaluated through several metrics, such as relevance and precision. Systems are also designed to search for and select trustworthy health related web content on the Internet for recommendation with an individualized approach [23].

In this context, recommender systems can be categorized as collaborative, content based and knowledge based recommender systems, among others. User and item profiles and social media information are generally fed as input to recommender systems.
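The keyword based linking of videos to web content can be sketched as TF-IDF weighting followed by cosine similarity. This is a minimal sketch of a common text matching approach and is not claimed to be the method of [22]; the video title and page texts are hypothetical, and the smoothed IDF formula is one of several in use.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenised document to a {term: tf-idf weight} dict."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in counts.items()
        })
    return vectors

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical keywords from a video title and from two candidate web pages.
video = "managing type 2 diabetes with diet".split()
pages = {
    "diabetes_diet_guide": "diet plan for type 2 diabetes patients".split(),
    "knee_exercises": "exercises for knee pain relief".split(),
}
names = list(pages)
vecs = tfidf_vectors([video] + [pages[n] for n in names])
scores = {n: cosine(vecs[0], v) for n, v in zip(names, vecs[1:])}
best = max(scores, key=scores.get)
```

The page sharing the most distinctive keywords with the video ranks first, while a page with no keyword overlap scores zero.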

4 Learning Techniques for Health Informatics

Selecting the proper learning technique for analyzing health data is important in mitigating several challenges of health recommender systems. Such techniques are applied to build patterns that describe, analyze and predict data, and that define the current health status of the users. Several works in medical image processing use different machine learning techniques for the diagnosis and early detection of diseases. The existing learning techniques are supervised, semi-supervised and unsupervised, as shown in Fig. 3.

Fig. 3 Learning techniques for machine learning and deep learning


4.1 Supervised Learning

Supervised learning is a learning technique used to identify objects or diagnose a disease based on previous related data. It can be applied when sufficient labeled data are available. Representative supervised classifiers applied in health recommender systems are detailed as follows.

4.1.1 Instance Based Learning

Instance-based learning (IBL) methods [24] are supervised learning algorithms that can learn from several types of databases and classify objects. Each instance can be described by n attribute-value pairs. In general, a group of related instances is fetched from memory to classify a new query. The most popular instance-based learning algorithm is the k-Nearest Neighbor (kNN) algorithm. Although kNN is a simple learning algorithm that can work with little information, it can solve complex problems. Several distance metrics can be used to define the nearest neighbor of an instance. For example, the Euclidean distance to all instances can be calculated to find the nearest neighbors in the n-dimensional space, as shown in Fig. 4b.

Fig. 4 Example of (a) logistic regression and (b) k nearest neighbor (kNN) classification

Human Activity Recognition (HAR) systems face a challenge when the training and test environments are totally different. One of the major issues is device independent activity monitoring. Smartphone based inertial sensors, such as the accelerometer and gyroscope, are generally used to collect raw sensory data for several daily activities. kNN is applied in [25] for device independent activity monitoring and is found to achieve considerable accuracy.

Researchers have recently introduced the term Energy Expenditure (EE). Both HAR and EE have been investigated, yet certain challenges remain, such as estimating energy consumption both during human movement and in the absence of movement. In [26], the authors made an attempt to solve this problem using an accelerometer and ECG. To build this system, data were collected from thirteen volunteers for six daily activities. Selected Heart-Rate Variability (HRV) parameters are used to analyze the performance of the HAR system. The activity-specific model with HRV parameters provides better performance. Their results indicate that human physiological data have an important effect on HAR and energy expenditure estimation, which are important for assisted living as they help the healthcare system work efficiently.
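The kNN procedure described in this subsection can be sketched in a few lines. This is a minimal sketch using Euclidean distance and majority voting; the two features (mean and standard deviation of accelerometer magnitude) and the sample values are hypothetical and are not taken from [25].

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority label among the k training points closest to `query`."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical features: (mean, standard deviation) of accelerometer magnitude.
train = [
    ((9.8, 0.10), "sitting"), ((9.7, 0.20), "sitting"), ((9.9, 0.15), "sitting"),
    ((10.5, 2.1), "walking"), ((10.8, 2.4), "walking"), ((10.2, 1.9), "walking"),
]
label = knn_predict(train, (10.4, 2.0))
```

No model is fitted in advance: all the work happens at query time, which is why instance-based methods are simple to set up but slow down as the stored training set grows.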

4.1.2 Decision Tree (J48)

The decision tree (J48) algorithm can be used for classification and regression problems, which it solves using a tree representation. It represents decisions explicitly and visually. Each tree contains internal nodes and leaf nodes. An internal node corresponds to an attribute, and class labels are present in the leaf nodes. The tree is easy to understand because it can be read as a set of if-then rules. Left unconstrained, a tree can grow arbitrarily, so a minimum number of inputs per leaf node should be fixed or the maximum depth of the model should be specified. Pruning helps to improve performance and reduce the complexity of the model. It removes branches of the tree that make use of features of low importance.

The authors in [19] proposed a health recommender system for disease based on decision trees and collaborative filtering. Disease related data are mostly huge and collected from multiple sources. Most of the time the data are multi-dimensional, with some values missing or with noise present in the dataset. It becomes difficult to handle such data using traditional approaches. Filtering techniques are used to remove the noise and reduce ambiguous labels. A decision tree is applied to build a model for predicting and diagnosing diseases and their risks. An ensemble Random Forest model is built using several decision trees. The unified collaborative filtering method helps to achieve better recommendations on the basis of previous records and other features.

Decision trees are used either alone or in combination with other supervised classifiers for HRS. In [27], the authors considered smartphone based and wrist worn motion sensors (accelerometer, gyroscope and linear acceleration) to identify several complex activities such as smoking, eating and drinking coffee. Three classifiers (Naive Bayes, decision tree and kNN) are used with different window sizes to recognize simple as well as complex activities. GENEActiv, a wrist-worn triaxial accelerometer, is used in [28] to classify walking, running and stationary activities with good accuracy. The authors in [29] deployed both support vector machines (SVM) and decision trees in their framework.

Depression prediction and monitoring is a crucial challenge for health recommender systems. Large amounts of data, such as user behavior, daily activities and mood details, are needed to analyze and predict the disease. The heterogeneous data make the system complex. Hence the authors in [18] proposed an intelligent system to provide useful recommendations. A combination of decision tree and SVM is used to build this system. Various external factors related to depression are considered in building the prediction model.
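How a decision tree chooses the attribute for an internal node can be sketched with entropy and information gain. J48 is the Weka implementation of C4.5, which ranks candidate splits by gain ratio; the sketch below uses plain information gain as a simplification, and the attributes and labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical patient records and risk labels.
rows = [
    {"high_glucose": "yes", "smoker": "no"},
    {"high_glucose": "yes", "smoker": "yes"},
    {"high_glucose": "no", "smoker": "yes"},
    {"high_glucose": "no", "smoker": "no"},
]
labels = ["at_risk", "at_risk", "healthy", "healthy"]

# The attribute with the highest gain becomes the test at the root node.
best_attr = max(rows[0], key=lambda a: information_gain(rows, labels, a))
```

Splitting on `high_glucose` separates the two classes completely, so it is selected for the root; `smoker` leaves both branches evenly mixed and gains nothing. A full tree repeats this choice recursively on each branch until a stopping rule such as maximum depth is met.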

4.1.3 Logistic Regression (LR)

Logistic regression is a classification technique that applies the sigmoid function to a linear combination of the input features. It makes predictions from real-valued inputs that are combined linearly using weights or coefficient values. In general, the outputs are the binary values 0 or 1. The output of logistic regression classification applied to a diabetic dataset with default parameters is shown in Fig. 4a. In [30], the authors proposed device independent activity monitoring with a minimal number of smartphone inertial sensors. This energy efficient ubiquitous system is machine learning based and performs well with logistic regression using inexpensive time domain features.
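The prediction step of logistic regression can be sketched as follows. The coefficients are hypothetical values chosen for illustration, not fitted to the diabetic dataset of Fig. 4a, and the 0.5 decision threshold is a common default rather than a requirement.

```python
import math

def sigmoid(z):
    """Squash any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(features, weights, bias, threshold=0.5):
    """Return (probability, class) for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))  # linear combination
    p = sigmoid(z)
    return p, int(p >= threshold)

# Hypothetical coefficients for (glucose in mmol/L, BMI); not fitted to real data.
weights, bias = [0.8, 0.1], -8.0
p_low, y_low = predict([5.0, 22.0], weights, bias)    # z = -1.8
p_high, y_high = predict([9.0, 31.0], weights, bias)  # z = 2.3
```

In practice the weights and bias are learned from labeled data, typically by minimising the log-loss with gradient descent.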

4.1.4 Multi-layer Perceptron (MLP)

The Multi-layer Perceptron is a feed forward artificial neural network composed of more than one perceptron. It has at least three layers: (i) an input layer that receives the input patterns, (ii) an output layer that makes the prediction for the given input, and (iii) an arbitrary number of hidden layers between these two. Each node of the network is a neuron that uses a nonlinear activation function. MLP is trained with a supervised learning technique called back-propagation. In the forward pass, the signal moves from the input layer through the hidden layers to the output layer. The error at the output is then propagated backward through the network by the back-propagation algorithm in order to adjust the weights and biases. An MLP is easily distinguished from a linear perceptron because it has multiple layers and nonlinear activation functions. MLP is heavily applied in HRS. For HAR, MLP can be utilized to monitor several detailed daily activities [31]. MLP can also be used in combination with other classifiers to further boost accuracy. For instance, in [31], it is applied in combination with LogitBoost and SVM to identify daily activities even when users hold the smartphone in their hands.
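The forward pass and the back-propagation update can be sketched for a network with one hidden layer. This is a minimal sketch assuming sigmoid activations, a squared-error loss and full-batch gradient descent; the layer sizes, learning rate and XOR toy data are illustrative choices and are not taken from [31].

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, params):
    """Forward pass: input layer -> hidden layer -> one output neuron."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, y

def train_epoch(data, params, lr=0.5):
    """One full-batch back-propagation update; returns the loss before it."""
    W1, b1, W2, b2 = params
    n_hid, n_in = len(W1), len(W1[0])
    gW1 = [[0.0] * n_in for _ in range(n_hid)]
    gb1, gW2, gb2, loss = [0.0] * n_hid, [0.0] * n_hid, 0.0, 0.0
    for x, t in data:
        h, y = forward(x, params)
        loss += 0.5 * (y - t) ** 2
        d_out = (y - t) * y * (1 - y)                  # error at the output neuron
        gb2 += d_out
        for j in range(n_hid):
            d_hid = d_out * W2[j] * h[j] * (1 - h[j])  # error sent back to hidden unit j
            gW2[j] += d_out * h[j]
            gb1[j] += d_hid
            for i in range(n_in):
                gW1[j][i] += d_hid * x[i]
    m = len(data)
    for j in range(n_hid):                             # gradient-descent update
        W2[j] -= lr * gW2[j] / m
        b1[j] -= lr * gb1[j] / m
        for i in range(n_in):
            W1[j][i] -= lr * gW1[j][i] / m
    params[3] = b2 - lr * gb2 / m
    return loss / m

rng = random.Random(0)
n_in, n_hid = 2, 3
params = [
    [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)],  # W1
    [0.0] * n_hid,                                                      # b1
    [rng.uniform(-1, 1) for _ in range(n_hid)],                         # W2
    0.0,                                                                # b2
]
xor = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
losses = [train_epoch(xor, params) for _ in range(500)]
```

The list `losses` records the training loss per epoch, so plotting it shows how the backward-propagated error gradually reduces the prediction error. A network this small may still need many more epochs to fit XOR well.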

4.1.5 Ensemble Model

Sometimes it can be hard to detect all individual class labels with appreciable accuracy using one base classifier. An ensemble of classifiers can be applied instead. The ensemble model combines the outcomes of different base learners. Every base learner attempts to classify the test set instances based on the training set instances. The ensemble model decides the class label of each test instance by combining the outcomes of all the base learners. This adds generality to the system.

Bagging and boosting are the two ensemble methods most heavily used in the literature. In bagging, a number of bags (bootstrap samples drawn with replacement from the training set) are created, and a base classifier is trained on each of them, forming a set of classification models. In boosting, by contrast, the same training set is used in every iteration, but each instance is assigned a different weight depending on how easily it was classified in the previous iteration.

An ensemble may also be a combination of different condition based classifiers. For instance, in [32], a condition based ensemble classifier is formed to address the effect on detailed HAR of using different smartphones (with various hardware configurations) and of usage behavior, such as how the smartphone is carried by the user (shirt pocket, right pants pocket, or right hand). It follows the principles of bagging. Healthcare recommendation systems for consumers need to make relevant suggestions on the basis of predicted probability values for different health conditions. An ensemble model is used in [33] for this purpose. A Bayesian network and Random Forest are used to build the ensemble model, which provides better recommendations.
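Bagging with majority voting can be sketched as follows. This is a minimal sketch assuming a 1-nearest-neighbour base learner on a single feature; the heart-rate values, the number of bags and the fixed seed are illustrative assumptions, not settings from [32] or [33].

```python
import random
from collections import Counter

def bootstrap(data, rng):
    """Draw len(data) items with replacement: one 'bag'."""
    return [rng.choice(data) for _ in data]

def nearest_label(bag, x):
    """Base learner: label of the closest training point in the bag (1-NN)."""
    return min(bag, key=lambda item: abs(item[0] - x))[1]

def bagging_predict(data, x, n_bags=15, seed=0):
    """Train one base learner per bag and combine them by majority vote."""
    rng = random.Random(seed)
    votes = Counter(nearest_label(bootstrap(data, rng), x) for _ in range(n_bags))
    return votes.most_common(1)[0][0]

# Hypothetical 1-D feature: resting heart rate; label: risk group.
data = [(58, "low"), (62, "low"), (65, "low"), (66, "low"),
        (88, "high"), (92, "high"), (95, "high"), (99, "high")]
prediction = bagging_predict(data, 60)
```

Each bag sees a slightly different view of the training set, so individual base learners may disagree near the class boundary; the vote smooths out those disagreements.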

4.2 Semi-supervised Learning

Labeling is expensive for various healthcare recommendation problems, for example when gathering enough data for different emergency conditions. Hence, semi-supervised learning algorithms are designed to deal with a combination of labeled and unlabeled data. Features are extracted from the unlabeled data and mapped to determine the dispersion of the data in the feature space.

4.2.1 Multi-instance Learning

In multi-instance learning, each object (bag) contains a set of instances and is associated with only a single label, as shown in Fig. 5a. Thus, every single instance need not be labeled; only the bag of instances is assigned a proper label.

Fig. 5 Semi-supervised learning techniques: (a) multi-instance learning and (b) multi-label learning

Semi-supervised learning is essential for sparsely labeled data. The authors in [34] proposed a HAR framework to monitor users' daily activity on a sparsely labeled dataset. They applied Multi-Instance Learning (MIL) to handle different annotation strategies. A few novel extensions of MIL are also found in the literature that reduce the required level of traditional supervision. MI-SVM and citation kNN classifiers are also designed to deal with multiple instances having a single label. Several types of bags are used to represent the continuous dataset in MIL. In [34], three types of labeling (single, multi-labeled and majority voting) for the bag of instances are considered to represent the entire test and training dataset. The iterative multi-instance Support Vector Machine (SVM) is found to perform better for single labeled bags, whereas the standard multi-instance SVM is found to perform better for multi labeled bags.
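The bag-level labeling idea can be sketched under the standard multi-instance assumption, in which a bag is positive if at least one of its instances is positive. This is a minimal sketch and not the MI-SVM of [34]; the instance scorer `movement_score`, the threshold and the sensor windows are hypothetical.

```python
def bag_predict(bag, instance_score, threshold=0.5):
    """Standard MIL assumption: a bag is positive if any instance is."""
    return int(any(instance_score(x) >= threshold for x in bag))

def movement_score(window):
    """Hypothetical instance scorer: spread of accelerometer magnitudes, clipped to [0, 1]."""
    return min(1.0, max(window) - min(window))

# Each bag is a list of unlabeled sensor windows; only the bag gets a label.
bags = {
    "morning": [[9.8, 9.8, 9.9], [9.7, 10.9, 9.1], [9.8, 9.9, 9.8]],
    "night":   [[9.8, 9.8, 9.8], [9.8, 9.9, 9.8]],
}
labels = {name: bag_predict(bag, movement_score) for name, bag in bags.items()}
```

Only one window in the `morning` bag shows movement, yet the whole bag is labeled positive, which is what lets an annotator label a long recording once instead of labeling every window.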

4.2.2 Multi-label Learning

In multi-label learning, each instance in the training dataset is associated with a set of labels. The learner predicts the label sets of unseen instances on the basis of training instances with known label sets. In general, a multi-label object consists of one instance with K class labels associated with it, as shown in Fig. 5b. The authors in [35] proposed a HAR system based on multi-label machine learning and the Expectation-Maximization (EM) algorithm. The system can identify several activities correctly when there is a time gap between two actions. Pseudo sequence data are used for the entire experiment, and the multi-label dataset is stochastically labeled. The EM algorithm is then executed to learn the probability distribution of the data labels.

4.2.3 Graph Based Learning

Graph based semi-supervised learning is also used for HRS based HAR systems. The experimental dataset reported in [36] contains a small set of labeled data together with unlabeled data. The HAR framework can record long duration activity data by using experience sampling, without detailed annotations, and propagates the provided labels to the neighboring data.
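The idea of propagating a few provided labels to neighboring data can be sketched as follows. This is a simplified illustration, not the multi-graph method of [36]: it assumes a single graph in which two points are connected when they lie within a chosen radius, and an unlabeled point adopts the majority label of its already labeled neighbors. The function name, the radius, and the toy points are hypothetical.

```python
from collections import Counter

def propagate_labels(points, labels, radius):
    """Spread known labels to unlabeled neighbours on a distance-threshold graph.

    points: list of (x, y) tuples; labels: one label or None per point.
    """
    labels = list(labels)
    n = len(points)
    # Connect two points when their squared distance is within radius squared.
    neighbours = [
        [j for j in range(n) if j != i
         and (points[i][0] - points[j][0]) ** 2
         + (points[i][1] - points[j][1]) ** 2 <= radius ** 2]
        for i in range(n)
    ]
    while True:
        updates = {}
        for i in range(n):
            if labels[i] is None:
                votes = Counter(labels[j] for j in neighbours[i] if labels[j] is not None)
                if votes:
                    updates[i] = votes.most_common(1)[0][0]
        if not updates:          # nothing left to propagate
            return labels
        for i, label in updates.items():
            labels[i] = label

# Two groups of points, each with a single annotated sample (illustrative values).
points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0), (12, 0)]
labels = ["walk", None, None, None, "sit", None, None]
result = propagate_labels(points, labels, radius=1.5)
print(result)
```

Points that are not connected to any labeled point, directly or through a chain of neighbors, simply remain unlabeled (None) in this sketch.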

4.3 Unsupervised Learning

In unsupervised learning, the datasets need not carry any labels: the data pattern is unknown, and hidden patterns must be found in the unlabeled data. It is useful when the approximation of the data labels is poor. Clustering is an unsupervised learning mechanism for grouping similar data into clusters. Representative clustering mechanisms that could be applied in HRS are discussed below.

4.3.1 k-Means Clustering

k-means clustering is an iterative clustering algorithm. The total number of clusters, k, must be defined in the initial state. A centroid is the center of a cluster, and the initial centroids are chosen randomly. All the data points are grouped into k clusters so that each data point belongs to exactly one cluster. The algorithm starts by choosing initial centroids for the k clusters. Each data point is assigned to its nearest centroid, and all the points assigned to the same centroid form a cluster. The centroid of each cluster is then updated, and the assignments of the data points change accordingly. These two steps are repeated until the clusters stabilize. The authors in [37] proposed a system to monitor daily activity based on skeletal movement data. Data are collected from an inexpensive RGBD (RGB-Depth) sensor. The data labeling is not properly maintained, hence unsupervised learning is used. Appreciable precision and recall values are reported using k-means clustering.
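The assignment and update steps described above can be sketched in a few lines. This is a minimal illustration that assumes squared Euclidean distance, random selection of k data points as initial centroids, and a stop when the assignments no longer change; the function names and the toy points are hypothetical and unrelated to the data of [37].

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, assignment) after iterating until the clusters stabilize."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # random initial centroids
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            for point in points
        ]
        if new_assignment == assignment:   # clusters have stabilized
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                    # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment

# Illustrative 2-D points forming two loose groups.
points = [(0.0, 0.2), (0.5, 1.0), (1.2, 0.1), (0.9, 0.8),
          (9.0, 9.5), (10.1, 10.0), (9.6, 11.0), (11.0, 9.9)]
centroids, assignment = kmeans(points, k=2)
print(centroids, assignment)
```

Because the initial centroids are random, k-means can settle in different local optima; in practice the algorithm is often restarted several times and the best clustering is kept.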

4.3.2 Density Based Spatial Clustering of Applications with Noise (DBSCAN)

DBSCAN is a clustering technique mainly used in unsupervised learning. The number of clusters need not be set a priori, and the algorithm can build arbitrarily shaped clusters. Two important parameters define the clusters. One is eps (epsilon), a positive number denoting the radius of the neighborhood of a point. The other is MinPts, the minimum number of points required within the eps-neighborhood of a point. The algorithm identifies the data points that lie in dense regions of the feature space. A data point with at least MinPts points in its eps-neighborhood is known as a core point. A border point has fewer than MinPts points in its eps-neighborhood but is directly reachable from a core point. The algorithm starts by randomly choosing a data point p that is not yet included in any cluster and computing its neighborhood. The point p is considered a core point when at least MinPts points, including p, lie within its eps-neighborhood. All the points directly reachable from p within the eps-neighborhood are included to create the cluster, which is then expanded as necessary to include all the density reachable points. When there are no more points to include in the present cluster, the algorithm randomly picks the next unclassified data point from the dataset. It continues until all the points are classified or processed. Points that have fewer than MinPts points in their neighborhood and do not belong to any cluster are considered noise points and are discarded. DBSCAN can thus separate high density clusters from low density regions and manage outlier points in a given dataset. Figure 6a shows the dataset before clustering and Fig. 6b the clusters after applying DBSCAN. DBSCAN does not work well when the data are very high dimensional or when the density of the clusters varies widely.

Sometimes it is difficult to obtain a proper label for each activity performed by a user, so instance wise labeling becomes costly. The authors in [39] proposed a HAR system using unsupervised learning. Here, the number of different activities is unknown. Data are collected from smartphone inertial sensors. Several features are


Fig. 6 DBSCAN clustering technique: (a) original dataset before clustering, (b) clustered data [38]

extracted. Combining the mixture of Gaussians method with DBSCAN clustering makes this system more efficient. With proper tuning of MinPts and eps, it achieves good accuracy for daily living activities when the number of activities is unknown to the system.
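The DBSCAN procedure of this section (core points, border points, cluster expansion, and noise) can be sketched as below. The sketch assumes Euclidean distance and visits the points in index order rather than randomly, which affects only the numbering of the clusters and possibly the cluster to which a shared border point is assigned. The names dbscan, region_query, and NOISE are hypothetical, and the toy data are not those of [39].

```python
NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

def dbscan(points, eps, min_pts):
    """Return one cluster index per point; NOISE marks points outside every cluster."""
    labels = [None] * len(points)          # None means not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE              # may later turn out to be a border point
            continue
        cluster += 1                       # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point reached from a core point
            if labels[j] is not None:
                continue                   # already assigned to a cluster
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:   # j is also a core point: keep expanding
                queue.extend(j_neighbours)
    return labels

# Two dense groups and one isolated point (illustrative values).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The isolated point has only itself in its eps-neighborhood, so it is reported as noise, while the number of clusters emerges from the data rather than being supplied in advance.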

4.3.3 Hierarchical Clustering

Hierarchical clustering is another type of clustering algorithm. The data points are grouped together to form a tree, or hierarchy, of clusters, which is represented graphically using a dendrogram. Initially, all data points are assigned to a cluster, and a terminating condition is needed to stop the algorithm. In general, two types of hierarchical clustering are available: agglomerative (bottom-up) and divisive (top-down). Agglomerative clustering starts with each cluster representing a single data point, and the most similar pair of clusters is merged in each step. Divisive clustering, on the other hand, starts from the top level with a single cluster that includes all the data points. It splits the top level cluster into child clusters in each step until each child cluster contains only a single data point. The criterion for building up clusters is known as the linkage, a measure of the dissimilarity between two clusters. Several types of linkage are used in hierarchical clustering. The smallest distance between two points of two different clusters is known as single link or min, whereas the maximum distance between two points of two different clusters is known as complete link or max. When the distance between each pair of points drawn from two different clusters is computed and these distances are averaged, the result is known as average link or group average. Alternatively, Ward's method can be used, which is based on the sum of the squared distances of the individual points of the two clusters. A few state-of-the-art works are summarized in Table 2. These works are based on pervasive sensing applications


Table 2 Comparison of state-of-the-art works

| Existing work and year | Sensor or device and position | Learning technique | Implemented classifier | Remarks |
| --- | --- | --- | --- | --- |
| [25] 2013 | Smartphone, accelerometer, gyroscope, magnetic sensor | Supervised, TM | k nearest neighbor (kNN) | Device independent activity monitoring, with expensive frequency domain features |
| [37] 2013 | RGBD (RGB-depth) sensor | Unsupervised learning, clustering | k-means clustering | Unlabeled data are recognized |
| [39] 2014 | Smartphone inertial sensor | Unsupervised learning, clustering | DBSCAN | Unlabeled data are recognized, even when the number of activities is unknown |
| [26] 2017 | IMU sensor accelerometer, wrist, T-Rex TR100A ECG | Supervised, TM | Linear SVM, RBFSVM, kNN, LDA | Recognition of ambulatory activity and energy expenditure during activity. LDA achieves the highest accuracy with IMU and selected heart rate |
| [27] 2017 | Wrist worn motion sensor (accelerometer, gyroscope, barometer) | Supervised, TM | Naive Bayes, decision tree and kNN | Recognition of complex activity, with varying window size |
| [32] 2018 | Smartphone accelerometer, gyroscope | Supervised, ensemble learning (condition based) | LR with parameter tuning | Device and position independent detailed activity (slow/brisk walk, sit on floor/chair) monitoring |


that also have implications in public health. The table shows how recent works heavily use different types of machine learning techniques stated above.

4.4 Performance Metrics

The performance of the system can be measured by several performance metrics [40].

Confusion Matrix: The confusion matrix ($CM_{n \times n}$) represents the classification results of a classification algorithm. It specifies the following.

• True Positives (TP): The number of positive instances that were classified as positive.
• True Negatives (TN): The number of negative instances that were classified as negative.
• False Positives (FP): The number of negative instances that were classified as positive.
• False Negatives (FN): The number of positive instances that were classified as negative.

Sensitivity: The probability that an instance is classified as positive when its actual label is positive.

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

Specificity: The probability that an instance is classified as negative when its actual label is negative.

$$\text{Specificity} = \frac{TN}{FP + TN}$$

Accuracy: The overall classification performance for all classes is denoted by the following equation in the state-of-the-art literature.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

F-measure (F1): It summarizes a model's accuracy by combining precision and recall, both defined below. If the output has few false positives and few false negatives, the classifier is correctly identifying real objects. It is defined as follows.

$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

Precision: Precision indicates how many of the data points predicted as positive are actually positive.

$$\text{Precision} = \frac{TP}{FP + TP}$$

Recall: The completeness of a classifier can be measured using recall. A low recall indicates many false negatives. Recall indicates how many of the actual positives are labeled as positive (true positives) by the model.

$$\text{Recall} = \frac{TP}{TP + FN}$$

Most of the existing works reported here use one or more of the above mentioned performance metrics.
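As a worked illustration of these metrics, the following sketch computes them from the four confusion matrix counts of a binary problem. The function name and the example counts are hypothetical, and the sketch assumes that no denominator is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics of Sect. 4.4 from binary confusion matrix counts.

    Assumes every denominator is non-zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (fp + tp)
    recall = tp / (tp + fn)            # identical to sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Illustrative counts: 50 actual positives and 50 actual negatives.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, sensitivity and recall are both 40/50, specificity is 45/50, accuracy is 85/100, precision is 40/45, and F1 reduces to 80/95.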

5 Deep Learning for Health Data

For bioinformatics and medical imaging applications, it is challenging to build a sufficiently accurate recommender system. The main challenge is that, with traditional supervised learning, accuracy does not improve appreciably beyond a certain point as more data are added: it improves initially, but then almost stabilizes and does not scale with additional data. The performance of such techniques depends heavily on the handcrafted features extracted from the data. Today, not only are machines powerful enough to execute complicated machine learning techniques, but huge amounts of data are also available that sufficiently represent each of the different diseases. Hence, deep learning techniques, which automatically extract features from data, can be applied to health records. These techniques can extract higher dimensional features that human users perceive but find hard to define mathematically. With huge data that sufficiently represent the different labels of the dataset, deep learning techniques can extract useful features so that accuracy scales with more data points. Deep learning is useful not only for analyzing medical images or gene sequences; with digitized societies, and hence the availability of multi-source data, it is also useful for HRS designed for personal recommendations.

Deep learning techniques are based on neural networks. Multiple hidden layers are present in a deep network. The input layer receives the data, and the activation functions of the first hidden layer are then applied. Those activations are passed on to the subsequent hidden layers to produce the desired output. With deep learning techniques, health recommender systems could be built based on recommendations from users and their social relations with the patients. In [41], a system is developed based on the trust and distrust relations of the recommenders. The


node information and structural information are merged with a deep learning method to achieve better performance. Some of the deep learning techniques are discussed below.

5.1 Supervised Learning

5.1.1 Convolutional Neural Network

A Convolutional Neural Network (CNN) is a type of multi-layer neural network containing several hidden layers. Each layer is composed of neurons, and weights are attached to those neurons. Convolution and pooling are the two main functions performed by the hidden layers. A convolution filter is used to generate feature maps. However, a large set of features increases computational complexity and makes the network prone to overfitting. So, pooling is used after the convolution layers: it combines a subset of perceptrons from the previous layer by applying a pooling function (max, min, etc.). Pooling thus reduces the dimensionality and helps to avoid overfitting. Different combinations of convolution and pooling are used by CNN based HRS. The Rectified Linear Unit (ReLU) can be used as the activation function for CNNs rather than the sigmoid functions mostly used in ANNs; this generally achieves better learning speed for the gradient descent search. The last layer of a CNN is a fully connected layer similar to those of ANNs, typically with a softmax function.

The authors in [10] proposed a CNN based algorithm to detect Myocardial Infarction (MI) from ElectroCardioGram (ECG) signals. The system is capable of detecting abnormal beats in unknown ECG signals even in the presence of noise. Two datasets are used for this work. The Daubechies 6 mother wavelet function is used to remove noise and baseline wander from the ECG records. R-peaks of the signal are detected, and the entire signal is segmented using the R-peaks. The segmented signals are normalized using the Z-score and are passed through the CNN. Features are extracted from the input ECG signal in the convolution layers. The Leaky Rectified Linear Unit activation function is used in several layers of the network, and the softmax function is used in the last layer. Max-pooling helps to reduce the size of the feature map and the number of neurons in the next layers. The network is trained with backpropagation. A few parameters need to be set for training, such as regularization (to control overfitting), momentum (to control how fast the network learns), and the learning rate. The proposed system performs better for ECG beats without noise and achieves good accuracy. It is beneficial for earlier diagnosis of cardiovascular diseases.

Parkinson's Disease is a neurological movement disorder. Accelerometer signals captured from wearable sensors attached to the patients could be beneficial for monitoring Parkinson's patients. In [42], the authors developed such a mechanism. Accelerometer signals are fed as input, and several one dimensional kernels are used to filter the input. A bias is applied to the accelerometer data. Max-pooling is used to


reduce the feature dimensions. The softmax function is used in the last layer for classification. It provides better results than state-of-the-art classifiers.

Nowadays, deep learning plays a crucial role in HAR. The authors in [43] presented a deep learning framework based on CNN, LSTM, and ELM classifiers. Most existing HAR systems apply handcrafted features based on expert knowledge, such as statistical methods. Here, a CNN is used in the first stage to extract features from accelerometer signals, which are considered a higher-level abstraction of the raw data. It is difficult to recognize a sequence of activities from real-time sensor data because temporal dependencies are ignored in the basic structure of this deep network. Several challenges may arise during human activity monitoring, such as the similarity between some activity classes (for example, normal walk and slow walk) and variable changes in the accelerometer values over a period of time. Hence, the authors applied Long Short-Term Memory (LSTM) along with the basic CNN to achieve better recognition accuracy than the basic CNN. In real time, however, it remains difficult to achieve good classification results. So, the Extreme Learning Machine (ELM) is integrated with CNN-LSTM to improve the real-time performance of the proposed framework. The parameters of the hidden layers are chosen randomly, and the weights are calculated using the least squares method. The framework achieves an F-score of 0.88 with the baseline CNN, whereas the proposed CNN-LSTM-ELM technique achieves better prediction results.

A CNN is used in [44] to model intelligent health recommender systems. It works on supplementary data to find the recommended hospital on the basis of previous data analysis. The Convolution Restricted Boltzmann Machine (RBM) model combines RBM and CNN; it works as a two layer model and uses the features of both learning methods. It can work with big data and helps to build an effective health recommender system. Two error measures, Root Mean Square Error and Mean Absolute Error, are used in minimizing the errors of the system.
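The stages discussed in this section (one dimensional convolution with a bias, ReLU activation, max-pooling, and a fully connected layer with softmax) can be sketched as a single forward pass. The sketch is illustrative only: the kernel, weights, and signal are hand-picked hypothetical values rather than learned parameters, and it does not reproduce the architectures of [10], [42], [43], or [44].

```python
import math

def conv1d(signal, kernel, bias=0.0):
    """Valid 1-D convolution in the cross-correlation form commonly used in CNNs."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Rectified Linear Unit: negative activations are set to zero."""
    return [max(0.0, v) for v in values]

def max_pool1d(values, size=2):
    """Non-overlapping max-pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

def dense(values, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Convert scores to class probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative segment of a sensor signal and a hand-picked edge-detecting kernel.
signal = [0, 1, 3, 1, 0, -1, -3, -1, 0]
feature_map = conv1d(signal, kernel=[-1, 0, 1])
pooled = max_pool1d(relu(feature_map), size=2)
probs = softmax(dense(pooled, weights=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]], biases=[0.0, 0.0]))
print(feature_map, pooled, probs)
```

Pooling shortens the seven-element feature map to three values before the fully connected layer, which is the dimensionality reduction referred to above; in a trained network the kernel and the dense weights would be learned by backpropagation.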

5.1.2 Recurrent Neural Network

In real life, it is difficult to represent all problems with fixed length inputs and outputs. Consider a time series such as an accelerometer trace of daily human activity: because the data arrive continuously, there are new data samples in each time window, so a system is required that can store the sequence of data and use the context of the information. The Recurrent Neural Network (RNN) is a neural network that can utilize the sequential information in the data pattern. This helps to build context aware recommendation systems. An RNN captures information about the previous computation and uses it as an input to the next step of the hidden layer. The network is processed along the time direction in the training phase, and sequences are processed quickly, step by step, in the identification phase. Sometimes, the output of an RNN model depends not only on the previous elements in the sequence but also on future elements, as shown in Fig. 7a. This kind of network is known as a Bidirectional RNN. An RNN can work well with inputs of different sizes, and it can take historical information into account in its computation. Generated


Fig. 7 Deep learning techniques: (a) recurrent neural network and (b) restricted Boltzmann machine

weights are shared across the network. However, computation is slower than in other networks, and it is sometimes difficult to access information from the distant past. The authors in [45] proposed a novel patient monitoring framework using an RNN and a density based clustering method. It can monitor ECG signals and identify ECG beats at different heart rates of the user. Here, features are extracted automatically, based on morphology information that includes the current heartbeat and the T wave of the former heartbeat. It computes a strong correlation between ECG signals and considers ECG beats of various lengths. Long Short-Term Memory (LSTM), a variation of the RNN, is applied to maintain the details of the previous context.
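The recurrence that lets an RNN carry context from one time step to the next can be sketched with a single scalar hidden unit. This is a plain (vanilla) RNN with a tanh activation, not the LSTM used in [45]; the function name and the weight values are hypothetical.

```python
import math

def rnn_forward(inputs, w_xh, w_hh, b_h, h0=0.0):
    """Run a one-unit vanilla RNN over a sequence and return every hidden state.

    The same weights (w_xh, w_hh, b_h) are reused at each time step.
    """
    h = h0
    states = []
    for x in inputs:
        # New state depends on the current input and on the previous state.
        h = math.tanh(w_xh * x + w_hh * h + b_h)
        states.append(h)
    return states

# Sequences of different lengths are handled by the same function and weights.
short_states = rnn_forward([1.0, 0.0], w_xh=1.0, w_hh=0.5, b_h=0.0)
long_states = rnn_forward([0.2, -0.1, 0.4, 0.0, 0.3], w_xh=1.0, w_hh=0.5, b_h=0.0)
print(short_states, long_states)
```

In the first sequence the second input is zero, yet the second hidden state is non-zero because it still carries information from the first input. Repeated multiplication by the recurrent weight shrinks that influence over many steps, which relates to the difficulty of accessing the distant past noted above.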

5.2 Unsupervised Deep Learning

5.2.1 Restricted Boltzmann Machine and Deep Belief Network

The Restricted Boltzmann Machine (RBM) is an important technique in unsupervised deep learning. It is a probabilistic undirected graphical model, as shown in Fig. 7b. It is useful for dimensionality reduction, classification, regression, feature learning, etc. An RBM has two layers, a visible layer and a hidden layer, and RBMs are stacked on top of each other to build a Deep Belief Network (DBN). Several low-level features are extracted from the data points, and each visible node captures that information. At a hidden node, each input is multiplied by a weight, and the sum of all these products is added to a bias. The result is then fed into an activation function, which generates the output of the node. RBMs can be trained in a greedy manner, one layer at a time, to build a DBN. Each layer of a DBN works in a dual mode: as the hidden layer for the preceding one and as the visible, or input, layer for the next one.
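The computation at the hidden nodes (weighted sum of the inputs, plus a bias, passed through an activation function) can be sketched as follows. The sketch assumes binary units with a sigmoid activation, which is the common choice for RBMs, and uses hand-picked hypothetical weights; it shows only the two conditional probability computations, not a training procedure.

```python
import math

def sigmoid(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, W, hidden_bias):
    """P(h_j = 1 | v) for each hidden unit j; W[i][j] links visible i to hidden j."""
    return [sigmoid(hidden_bias[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(hidden_bias))]

def visible_probs(h, W, visible_bias):
    """P(v_i = 1 | h) for each visible unit i, using the same (undirected) weights."""
    return [sigmoid(visible_bias[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(visible_bias))]

# Three visible units, two hidden units; all values are illustrative.
W = [[1.0, -1.0],
     [0.5, 0.5],
     [1.0, -1.0]]
v = [1, 0, 1]
p_hidden = hidden_probs(v, W, hidden_bias=[0.0, 0.0])
p_visible = visible_probs(p_hidden, W, visible_bias=[0.0, 0.0, 0.0])
print(p_hidden, p_visible)
```

Because the model is undirected, the same weight matrix is used in both directions. When RBMs are stacked into a DBN, the hidden probabilities of one RBM would serve as the visible input of the next.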


The authors in [11] proposed a HAR system based on a DBN using wearable sensors. Features are extracted from the raw dataset automatically. Linear Discriminant Analysis (LDA) and Kernel Principal Component Analysis (KPCA) are used to reduce the dimensionality of the feature space. Several hyperparameters, such as the mini batch size, initial weight values, learning rate, and number of hidden layers and units, need to be configured for the DBN.

5.2.2 Autoencoder

The autoencoder plays an important role in the unsupervised learning of deep networks using the backpropagation technique. An autoencoder network has two parts, an encoder and a decoder. In general, the encoder converts the input data to a compressed version without losing relevant information; the overall data size is thus reduced. This process is known as dimensionality reduction. The decoder expands the compressed data back to the original size to produce an output similar to the input. The network thus has the capability to reduce and reproduce the input features. It follows the architecture of a traditional neural network in which the input and output layers have the same number of nodes, while the hidden layers have fewer nodes. A few parameters are important for autoencoders: (i) the number of nodes in the middle layer, or code size, (ii) the total number of layers, (iii) the number of nodes per layer, and (iv) the loss function, chosen according to the range of the input values (cross-entropy or mean squared error). Different types of autoencoders are mentioned in the literature, such as sparse, regularized, and multilayer autoencoders.

The authors in [46] proposed a novel deep network based on an unsupervised deep feature learning autoencoder. Here, patients' information is retrieved from Electronic Health Records (EHRs) to derive a general representation. It helps to manage and compose multi domain patient data automatically, without user intervention. It works well for predicting diseases for several patients from a large database. The preprocessed data help to capture detailed information about patients from the EHR using a deep sequence of non-linear transformations.

Cancer is generally detected from the gene expression of cells. Gene expressions of normal, non-cancerous tissues can be differentiated from those of cancerous tissues using an unsupervised deep network [47]. Initially, PCA is used to represent the high dimensional raw feature space for sparse feature learning. Then, autoencoders are applied to improve the accuracy of cancer detection from gene expressions.

It is very difficult to find relevant information about a patient's current condition and to understand different medical terms and their relations. Collaborative health recommender systems provide useful recommendations in such cases, but they face several issues such as sparse data and the cold start problem. In [48], a recommender system is developed that incorporates various external factors (current time, weather, etc.) for monitoring users' daily behavior. It considers flexible, context specific input and transition matrices in place of constant data. An autoencoder is applied to build such a collaborative filtering recommendation system. The user preferences are generated


using the encoding and decoding process of the autoencoder, and the data are stored as a matrix. Parameters are optimized to reduce the reconstruction error. In [49], the authors proposed a deep learning based collaborative health recommender system that uses heterogeneous data from multiple sources. A variational autoencoder neural network is designed to learn the details of doctors' primary care and to extract various features of the patient for incorporation into the user profile. It is found to perform appreciably well.
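The encoder and decoder structure described in this section, with equal input and output sizes and a smaller code in the middle, can be sketched with a tiny linear autoencoder. The weights below are set by hand purely for illustration, whereas in practice they are learned by backpropagation so as to minimize the reconstruction loss; all names and values are hypothetical and unrelated to [46], [47], [48], or [49].

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def encode(x, W_enc):
    """Encoder: map the input to a smaller code."""
    return matvec(W_enc, x)

def decode(code, W_dec):
    """Decoder: expand the code back to the input size."""
    return matvec(W_dec, code)

def mse(x, x_hat):
    """Mean squared error, one of the reconstruction losses mentioned in the text."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

# 4 input nodes -> code size 2 -> 4 output nodes (hand-set, not trained).
W_enc = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0]]
W_dec = [[1.0, 0.0],
         [0.0, 1.0],
         [1.0, 0.0],
         [0.0, 1.0]]

x_redundant = [0.2, 0.7, 0.2, 0.7]   # second half repeats the first half
x_other = [0.2, 0.7, 0.9, 0.1]       # no such redundancy

loss_redundant = mse(x_redundant, decode(encode(x_redundant, W_enc), W_dec))
loss_other = mse(x_other, decode(encode(x_other, W_enc), W_dec))
print(loss_redundant, loss_other)
```

The redundant input is reconstructed without loss from a code half its size, whereas the other input is not. This illustrates why compression preserves the relevant information only when the code captures the structure actually present in the data.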

6 Conclusion

Learning from health data to detect and identify a disease or anomalies in the activities of a user is an important challenge in building a robust health recommender system. Such systems have various applications, including healthcare, early diagnosis, elderly care, fitness tracking, and activity monitoring or fall detection. This chapter provides an insight into the learning techniques used in health recommender systems. It presents the recent trends and developments in machine learning as well as deep learning techniques. Deep learning techniques are found to perform better and to make the system more efficient and intelligent owing to their automated feature extraction. We have also discussed several unsupervised learning techniques and how they are helpful when the dataset is completely unknown to the system.

References

1. Swan, M.: Sensor mania! The internet of things, wearable computing, objective metrics, and the quantified self 2.0. J. Sens. Actuator Netw. 1(3), 217–253 (2012)
2. Calero Valdez, A., Ziefle, M., Verbert, K., Felfernig, A., Holzinger, A.: Recommender systems for health informatics: state-of-the-art and future perspectives. In: Holzinger, A. (ed.) Machine Learning for Health Informatics. Lecture Notes in Computer Science, vol. 9605. Springer, Cham (2016)
3. Erdeniz, S.P., Maglogiannis, I., Menychtas, A., Felfernig, A., Tran, T.N.T.: Recommender systems for IoT enabled m-health applications. In: Iliadis, L., Maglogiannis, I., Plagianakos, V. (eds.) Artificial Intelligence Applications and Innovations. AIAI 2018. IFIP Advances in Information and Communication Technology, vol. 520. Springer, Cham (2018)
4. Ramsundar, B., Kearnes, S., Riley, P., Webster, D., Konerding, D., Pande, V.: Massively Multitask Networks for Drug Discovery (2015). arXiv:1502.02072
5. Zhang, S., Zhou, J., Hu, H., Gong, H., Chen, L., Cheng, C., Zeng, J.: A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res. 44 (2015). https://doi.org/10.1093/nar/gkv1025
6. Tian, K., Shao, M., Wang, Y., Guan, J., Zhou, S.: Boosting compound-protein interaction prediction by deep learning. Methods 110 (2016). https://doi.org/10.1016/j.ymeth.2016.06.024
7. Xu, T., Zhang, H., Huang, X., Zhang, S., Metaxas, D.N.: Multimodal deep learning for cervical dysplasia diagnosis. In: Ourselin, S., Joskowicz, L., Sabuncu, M., Unal, G., Wells, W. (eds.) Medical Image Computing and Computer-Assisted Intervention—MICCAI 2016. MICCAI 2016. Lecture Notes in Computer Science, vol. 9901. Springer, Cham (2016)


8. Brosch, T., Tam, R., The Alzheimer’s Disease Neuroimaging Initiative: Manifold learning of brain MRIs by deep learning. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) Medical Image Computing and Computer-Assisted Intervention—MICCAI 2013. MICCAI 2013. Lecture Notes in Computer Science, vol. 8150. Springer, Berlin (2013)
9. Rose, D.C., Arel, I., Karnowski, T.P., Paquit, V.C.: Applying deep-layered clustering to mammography image analytics. In: Biomedical Sciences and Engineering Conference, Oak Ridge, TN, pp. 1–4 (2010)
10. Acharya, U.R., Fujita, H., Oh, S., Hagiwara, Y., Tan, J.H., Adam, M.: Application of deep convolutional neural network for automated detection of myocardial infarction using ECG signals. Inf. Sci. 415–416, 190–198 (2017)
11. Hassan, M.M., Huda, S., Uddin, M.Z., Almogren, A., Alrubaian, M.: Human activity recognition from body sensor data using deep learning. J. Med. Syst. 42, 99 (2018)
12. Poggi, M., Mattoccia, S.: A wearable mobility aid for the visually impaired based on embedded 3d vision and deep learning. In: Proceeding of IEEE Symposium of Computer and Communication, pp. 208–213 (2016)
13. Huang, J., Zhou, W., Li, H., Li, W.: Sign language recognition using real-sense. In: Proceeding of IEEE China, SIP, pp. 166–170 (2015)
14. Garimella, V.R.K., Alfayad, A., Weber, I.: Social media image analysis for public health. In: Proceeding of CHI Conference Human Factors Computer System, pp. 5543–5547 (2016)
15. Zou, B., Lampos, V., Gorton, R., Cox, I.J.: On infectious intestinal disease surveillance using social media content. In: Proceeding of 6th International Conference on Digital Health Conference, pp. 157–161 (2016)
16. Saha, J., Chowdhury, C., Biswas, S.: Two phase ensemble classifier for smartphone based human activity recognition independent of hardware configuration and usage behavior. Microsyst. Technol. 24, 2737 (2018)
17. Huang, T., Lan, L., Fang, X., An, P., Min, J., Wang, F.: Promises and challenges of big data computing in health sciences. Big Data Res. 2(1), 2–11 (2015)
18. Yang, S., Zhou, P., Duan, K., Hossain, M.S., Alhamid, M.F.: emHealth: towards emotion health through depression prediction and intelligent health recommender system. Mob. Netw. Appl. 23, 216–226 (2018)
19. Hussein, A.S., Omar, W.M., Li, X., Ati, M.: Efficient chronic disease diagnosis prediction and recommendation system. In: Proceeding of IEEE-EMBS Conference on Biomedical Engineering and Sciences, Langkawi, pp. 209–214 (2012)
20. Felipe, LO., Barrué, C., Cortés, A., Wolverson, E., Antomarini, M., Landrin, I., Votis, K., Paliokas, I., Cortés, U.: Health recommender system design in the context of CAREGIVERSPRO-MMD project. In: Proceeding of PETRA ’18: The 11th PErvasive Technologies Related to Assistive Environments Conference, June, Corfu, Greece (2018)
21. Morrell, T.G., Kerschberg, I.: Personal health explorer: a semantic health recommendation system. In: Proceeding of IEEE 28th International Conference on Data Engineering Workshops, Arlington, VA, pp. 55–59 (2012)
22. Bocanegra, C.L.S., Ramos, J.L.S., Rizo, C., Civit, A., Fernandez-Luque, L.: HealthRecSys: a semantic content-based recommender system to complement health videos. BMC Med. Inform. Decis. Mak. 17, 63 (2017)
23. Sanchez-Bocanegra, C.L., Sanchez-Laguna, F., Sevillano, J.L.: Introduction on health recommender systems. Methods Mol. Biol. 1246, 131–146 (2015)
24. Keogh, E.: Instance-based learning. In: Sammut, C., Webb, G.I. (eds.) Encyclopedia of Machine Learning. Springer, Boston (2011)
25. Ustev, Y.E., Incel, O.D., Ersoy, C.: User, device and orientation independent human activity recognition on mobile phone challenges and a proposal. In: The ACM Conference on Pervasive and Ubiquitous Computing Adjunct Publication, Zurich, pp. 1427–1435 (2013)
26. Park, H., Dong, S.Y., Lee, M., Youn, I.: The role of heart-rate variability parameters in activity recognition and energy-expenditure estimation using wearable sensors. Sensors (Basel) 2017(7), 1698 (2017)


27. Shoaib, M., Bosch, S., Incel, O.D., Scholten, H., Havinga, P.J.M.: Complex human activity recognition using smartphone and wrist-worn motion sensors. In: Sensors, p. 426 (2016)
28. Zhang, S., Rowlands, A.V., Murray, P., Hurst, T.L.: Physical activity classification using the GENEA wrist-worn accelerometer. Med. Sci. Sports Exerc. 44, 742–748 (2012)
29. Garcia-Ceja, E., Brena, R.F., Carrasco-Jimenez, J.C., Garrido, L.: Long-term activity recognition from wristwatch accelerometer data. Sensors 14, 22500–22524 (2014)
30. Saha, J., Chowdhury, C., Biswas, S.: Device independent activity monitoring using smart handhelds. In: Proceeding of 7th International Conference on Cloud Computing, Data Science and Engineering—Confluence, Noida, pp. 406–411 (2017)
31. Bayat, A., Pomplun, M., Tran, D.A.: A study on human activity recognition using accelerometer data from smartphones. Procedia Comput. Sci. 34, 450–457 (2014)
32. Saha, J., Roy Chowdhury, I., Chowdhury, C., Biswas, S., Aslam, N.: An ensemble of condition based classifiers for device independent detailed human activity recognition using smartphones. Information 9(4), 94 (2018)
33. Jamshidi, S., Torkamani, M.A., Mellen, J., Jhaveri, M., Pan, P., Chung, J., Kardes, H.: A hybrid health journey recommender system using electronic medical records. In: The Proceedings of the 3rd International Workshop on Health Recommender Systems, HealthRecSys 2018, co-located with the 12th ACM Conference on Recommender Systems (ACM RecSys 2018), Vancouver, BC, Canada (2018)
34. Stikic, M., Schiele, B.: Activity recognition from sparsely labeled data using multi-instance learning. In: Proceeding of Location and Context Awareness. LoCA 2009. Lecture Notes in Computer Science, vol. 5561. Springer, Berlin (2009)
35. Toda, T., Inoue, S., Tanaka, S., Ueda, N.: Training human activity recognition for labels with inaccurate time stamps. In: Proceeding of UbiComp ’14 Adjunct, pp. 863–872, 13–17 Sept 2014
36. Stikic, M., Larlus, D., Schiele, B.: Multi-graph based semisupervised learning for activity recognition. In: Proceeding of International Symposium on Wearable Computers, Linz, pp. 85–92 (2009)
37. Ong, W.H.: An unsupervised approach for human activity detection and recognition. Int. J. Simul. Syst. Sci. Technol. 14(5) (2013)
38. https://medium.com/odessa-ml-club/a-journey-to-clustering-introduction-to-dbscane724fa899b6f. Last seen 20/5/2019
39. Kwon, Y., Kang, K., Bae, C.: Unsupervised learning for human activity recognition using smartphone sensors. Expert Syst. Appl. 41(14), 6067–6074 (2014)
40. Lara, O.D., Labrador, M.A.: A survey of human activity recognition using wearable sensors. In: IEEE Communication Surveys and Tutorials, vol. 15 (2013)
41. Yuan, W., Li, C., Guan, D., et al.: Socialized healthcare service recommendation using deep learning. Neural Comput. Appl. 30, 2071–2082 (2018)
42. Eskofier, B.M., et al.: Recent machine learning advancements in sensor-based mobility analysis: deep learning for Parkinson’s disease assessment. In: Proceeding of 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, FL, pp. 655–658 (2016)
43. Sun, J., Fu, Y., Li, S., He, J., Xu, C., Tan, L.: Sequential human activity recognition based on deep convolutional network and extreme learning machine using wearable sensors. Hindawi J. Sens. (8580959), 10 (2018)
44. Sahoo, A.K., Pradhan, C., Barik, R.K., Dubey, H.: DeepReco: deep learning based health recommender system using collaborative filtering. Computation 7(25) (2019)
45. Zhang, C., Wang, G., Zhao, J., Gao, P., Lin, J., Yang, H.: Patient-specific ECG classification based on recurrent neural networks and clustering technique. In: Proceeding of 13th IASTED International Conference on Biomedical Engineering (BioMed), Innsbruck, Austria, pp. 63–67 (2017)
46. Miotto, R., Li, L., Kidd, A.B., Dudley, J.T.: Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. Sci. Rep. 6, 1–10 (2016)

126

J. Saha et al.

47. Fakoor, R., Ladhak, F., Nazi, A., Huber, M.: Using deep learning to enhance cancer diagnosis and classification. In: Proceedings of the 30th International Conference on Machine Learning, JMLR: W&CP vol. 28, Atlanta, Georgia, USA (2013) 48. Sedhain, S., Menon, A.K., Xie, L., Sanner, S.: AutoRec: auto encoders meet collaborative filtering. In: Proceeding of 24th International Conference World Wide Web, Florence, Italy (2015) 49. Deng, X., Huangfu, F.: Collaborative variational deep learning for healthcare recommendation. IEEE Access 7, 55679–55688 (2019)

Jayita Saha is currently pursuing the Ph.D. degree in Computer Science and Engineering at Jadavpur University, India. She received her B.Tech. and M.Tech. degrees in Computer Science and Engineering from Durgapur Institute of Advanced Technology and Management and Jadavpur University, India, in 2008 and 2011, respectively. Her research interests include human activity recognition and machine learning.

Chandreyee Chowdhury has been an Assistant Professor in the Department of Computer Science and Engineering at Jadavpur University since 2006. She received her Ph.D. in Engineering from Jadavpur University in 2013 and her M.E. in Computer Science and Engineering from Jadavpur University in 2005. Her research interests include wireless sensor networks and their variants, mobile crowdsensing, and human activity recognition. She was awarded a Postdoctoral Fellowship from Erasmus Mundus in 2014 to carry out research work at Northumbria University, UK. She is a member of the technical program committees of many international conferences. She is a member of IEEE and the IEEE Computer Society.

Suparna Biswas has been an Associate Professor in the Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology (formerly WBUT), India, since 2018. She obtained her M.E. and Ph.D. from Jadavpur University. She was awarded a Postdoctoral Fellowship from Erasmus Mundus in 2014 to carry out research work at Northumbria University, UK. She has co-authored a number of research papers published in conferences and journals of international repute, and has served as a reviewer for such conferences and journals. Her research interests are mobile computing, network security, wireless body area networks, and healthcare applications.

Deep Learning and Electronics Health Records

Deep Learning and Explainable AI in Healthcare Using EHR

Sujata Khedkar, Priyanka Gandhi, Gayatri Shinde and Vignesh Subramanian

Abstract Over time, Artificial Intelligence (AI) has proved to be of great assistance in the medical field. Rapid advancements have made available technology that can predict the risk of many different diseases. Patients' Electronic Health Records (EHR) contain many kinds of medical data for each patient and each medical visit. Many predictive models, such as random forests and boosted trees, provide high accuracy but not end-to-end interpretability, while models such as Naive Bayes, logistic regression and single decision trees are intelligible but less accurate. The latter are interpretable, but they fail to capture the temporal relationships among the attributes present in EHR data, so their accuracy is compromised. Interpretability of a model is essential in critical healthcare applications: it gives medical personnel explanations that build trust in machine learning systems. This chapter presents the design and implementation of an Explainable Deep Learning System for Healthcare using EHR. It discusses the use of an attention mechanism and a Recurrent Neural Network (RNN) on EHR data to predict heart failure in patients and to provide insight into the key diagnoses that led to the prediction. The patient's medical history is given as a sequential input to the RNN, which predicts the heart failure risk and provides explainability along with it. This represents an ante-hoc explainability model. A neural network with two levels of attention is trained to detect the visits in a patient's history that are influential and significant for understanding the reasons behind a prediction made from that history. Considering the last visit first proves to be beneficial. When a prediction is made, the visit-level contribution is prioritized, i.e., which visit contributes the most to the final prediction, where each visit consists of multiple codes. This model can help medical practitioners predict the heart failure risk of patients from the diseases they have been diagnosed with, based on EHR. The model is then analyzed with local interpretable model-agnostic explanations (LIME), which provide the features that contribute positively and negatively to heart failure risk.

Keywords Heart failure · Predictive modeling · Deep learning · RNN · LIME · Explainability · Interpretability · Attention

S. Khedkar (B) · P. Gandhi · G. Shinde · V. Subramanian
Department of Computer Engineering, VESIT, Mumbai, India

© Springer Nature Switzerland AG 2020. S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics, Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_7

1 Introduction

Artificial Intelligence (AI) has a huge impact on the medical domain. Applications such as managing medical records, assisting physicians during operations, predicting diseases from patient history, drug creation, and health monitoring (for example, FitBit) can all be supported by AI. Yet the number of users of these systems remains small, mainly because it is difficult for people to trust a machine where their health is concerned. When dealing with healthcare, it should be mandatory to understand what exactly was responsible for the output a model gave, as suggested by the new European General Data Protection Regulation. Conversations with domain experts revealed that physicians prefer not to use AI systems as an aid to diagnosis. According to them, these systems speed up the diagnosis process but do not provide the reasons behind their decisions. They therefore cannot trust the systems and continue with their manual methods of diagnosis. Thus, even though huge advances have been made in research, a lack of trust prevents these systems from finding wide-ranging business applications. This chapter presents the design of an Explainable AI system based on EHR data [1, 2]. It discusses the use of an attention mechanism and a Recurrent Neural Network (RNN) on EHR data for predicting heart failure in patients, and the use of LIME for providing insight into the key diagnoses that led to a prediction. The chapter is organized as follows. Section 2 discusses related work. Section 3 describes the proposed methodology. Section 4 describes the experiments and evaluation. Section 5 presents the conclusion and future work.

2 Related Work

This section describes the related work.

Holzinger et al. [3] discussed various ways to build an explainable model for the medical domain. Explanations of predictions can be beneficial in teaching, learning, research, and even in court. In the medical domain, the demand for interpretable and explainable models is increasing. Such models must make it possible to re-enact the decision-making and knowledge extraction process. Explainability is classified into two categories: ante-hoc and post hoc. Ante-hoc systems incorporate explainability into the model itself, whereas post hoc systems explain the predictions of a complex model using a secondary, simpler model. Examples of ante-hoc models are decision trees, linear regression, and fuzzy inference systems. Examples of post hoc methods are algorithms like BETA (Blackbox Explanations using Transparent Approximations).

Zhao [4] used Electronic Health Records (EHR) datasets, which contain a wealth of medical data for each patient and each medical visit. Because of their size, dimensionality, and irregularity, EHR datasets are very difficult to analyze with existing methods. Heart failure (HF) is difficult to predict because it is an overarching condition rather than a distinct phenotype.

Choi et al. [5] use CNNs in the context of natural language processing to process this data. The MIMIC III dataset was used, consisting of 46,520 patients, 651,047 diagnosis events, 240,095 procedures, and 4,156,450 prescriptions. For each patient, the ICD9 (International Classification of Diseases) codes, procedure items, and drug names are extracted from the EHR records and arranged in a sequence similar to a "sentence". A word2vec model is then used to convert these sentences to embeddings, which are used to train the CNN. The activation function used in the convolutional and fully connected layers is the rectified linear unit (ReLU). Heart failure (HF) is a complex condition whose prediction has proved particularly difficult because of the various conditions and events that lead to it.
Heart failure may occur due to kidney failure, coronary artery diseases, neural disorders, diabetes, medications for other conditions, procedures performed, and previous heart attacks. This complex nature makes heart failure very difficult to predict. EHR datasets hold the key to solving this task; however, their size has made them virtually impenetrable by traditional techniques. Hence, the authors of this paper have taken the novel approach of using CNNs in the context of NLP (Natural Language Processing) to process this data efficiently. The data is first concatenated into a sequence drawn from diagnoses, procedures and medications. This sequence is then fed to an embedding layer. Both random and word2vec embeddings have been used for comparison. Multiple such embedding vectors are stacked and fed together into the CNN, which processes this input and produces a binary output (HF or not).

Guestrin et al. [6] discussed in their paper that machine learning models are black boxes. Trust can be built by understanding the reasons behind predictions. Such understanding provides insight into the model and can be used to assess model performance and to build better, more accurate models. Guestrin et al. [6] introduce a new algorithm, LIME, for explaining the predictions of any model. LIME treats the given model as a 'black box' and tries to explain a prediction instance x by learning the behavior of the prediction function f(x) in the surroundings of x. The instances surrounding x are obtained by randomly perturbing the input. The sampling is uniform, so that the samples are spread evenly around x. This yields a locally faithful explanation of the prediction instance. Explaining a single instance is not enough, so explanations for multiple instances are generated and presented to the user to explain the model as a whole. Two techniques, SP-LIME (Selective pick LIME) and RP-LIME (Random pick LIME), are used to pick the instances to be presented to the user. RP-LIME picks instances randomly, which may leave some features unexplained. SP-LIME picks instances such that all features are covered and as few redundant instances as possible are picked.

Bengio et al. [7] deal with neural machine translation, but the model they propose can be used in a host of other applications. Attention models in the context of EHR data can help in pointing out the features used to generate the prediction. This serves the dual purpose of adding interpretability to the model and allowing an assessment of whether the model is considering the right features while making a prediction. At each hidden state of the RNN, a context vector is formed by considering the attention weights of all the input features with respect to that hidden state. In the context of EHR data, attention weights can be learned from the data, and visualizing these weights can help us analyze the prediction and the reason behind it.

Choi et al. [8] have used the Graph-Based Attention Model (GRAM) for creating interpretable predictive models based on EHR data. GRAM uses a directed acyclic graph called the Knowledge DAG along with the predictive neural network model. In this graph, each leaf node represents a medical concept, and each non-leaf node represents a more general concept. GRAM exploits the robust hierarchical ontologies that have been established in medicine. The process of using the parent nodes (concepts) can be performed using attention mechanisms and end-to-end training.
GRAM is reported to achieve 10% higher accuracy than the basic RNN, the standard model used with EHR data, with an AUC of 84.48%, while also being interpretable, unlike RNNs. For qualitative assessment of the interpretations, a 2-D t-SNE plot of the final representations of 2000 randomly chosen diseases learned by GRAM is used. GRAM demonstrates how an auxiliary model can be used to interpret the predictions of any neural network. The Knowledge DAG successfully exploits heuristic knowledge of ontologies in medicine to learn interpretations even if the available dataset is small. The use of RNNs as the predictive neural network is very useful for evaluating the model, as RNNs are already being used to process EHR data.

3 Proposed Methodology

To achieve explainability, two methodologies are used, as shown in Fig. 1. Explainability is broadly classified into two types: ante-hoc explainability and post hoc explainability. In the ante-hoc explanatory model, the explanations are built into the prediction model, whereas in the post hoc explanatory model, the explanations are provided after the prediction is made.

Fig. 1 Proposed methodology

Electronic Health Records (EHR) data were obtained from the MIMIC III database [2], access to which required an examination administered by the CITI Program. The MIMIC III dataset consists of 46,520 patients, 651,047 diagnosis events, and 240,095 procedures. For each patient, it holds the ICD9 (International Classification of Diseases) codes for the diagnoses made and the procedures conducted. This dataset was used for the attention-based RNN model. The second dataset was the Cleveland dataset [9] from the UCI ML repository, which has been used by many to build heart disease prediction models. This is a small dataset consisting of 303 patient records. It was used to study the LIME algorithm.

Attention Mechanism: Attention mechanisms in neural networks work much like the visual attention mechanism found in humans. From the human perspective, attention tells the brain which part of what it sees should be focused on and understood. The attention mechanism allows the network to refer back to the input sequence while calculating attention values, and does not force the network to encode all input information into a vector of fixed length.

3.1 Conceptual System Design

• Data pre-processing module: The data is processed and dictionaries are created that map each patient number to a sequence of admissions and each admission to a sequence of ICD9 codes, along with other mappings such as timestamps and length of stay. These mappings are used directly by the model as a representation of the required data.
• Extraction of relevant codes: Only patients showing codes relevant to heart failure are considered for training. In addition, the patient must have made a minimum of three visits within twelve months.
• Visit level attention: Attention is given to each of a patient's visits as an overall feature, considering the length of stay and the time between visits.
• Variable level attention: Attention is given to the ICD9 codes in each visit, calculating their contributions to the output.
• Result integration: The results from both attention levels are integrated, and the final output of the model is forwarded along with contribution scores for presentation and visualization.
• Visualizations and sentential explanations: Visualizations of the attention scores of each ICD9 code by visit are created to analyze which diagnoses contributed the most to the output of the model.
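The pre-processing and code-extraction steps listed above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the record layout, the function name `build_cohort`, and the example code set `HF_CODES` are assumptions made for the sketch, and the twelve-month window is approximated as 365 days.

```python
from collections import defaultdict
from datetime import date

# Hypothetical set of heart-failure-related ICD9 codes (illustrative only).
HF_CODES = {"428", "427"}

def build_cohort(admissions, hf_codes=HF_CODES, min_visits=3, window_days=365):
    """Map patient id -> time-ordered visits, keeping only qualifying patients.

    `admissions` is an iterable of (patient_id, admit_date, [icd9 codes]).
    A patient qualifies if some visit carries a heart-failure code and at
    least `min_visits` visits fall within some `window_days`-day window.
    """
    visits = defaultdict(list)
    for patient_id, admit_date, codes in admissions:
        visits[patient_id].append((admit_date, list(codes)))

    cohort = {}
    for patient_id, seq in visits.items():
        seq.sort(key=lambda v: v[0])  # chronological order
        has_hf = any(c in hf_codes for _, codes in seq for c in codes)
        dates = [d for d, _ in seq]
        # Sliding check: do min_visits consecutive visits fit in the window?
        dense = any(
            (dates[i + min_visits - 1] - dates[i]).days <= window_days
            for i in range(len(dates) - min_visits + 1)
        )
        if has_hf and dense:
            cohort[patient_id] = seq
    return cohort

# Toy records: only patient 1 has both a heart-failure code and three
# visits within one year.
records = [
    (1, date(2012, 1, 5), ["428", "250"]),
    (1, date(2012, 4, 9), ["427"]),
    (1, date(2012, 9, 1), ["996"]),
    (2, date(2012, 2, 1), ["250"]),
    (2, date(2012, 3, 1), ["401"]),
    (2, date(2012, 5, 1), ["401"]),
    (3, date(2010, 1, 1), ["428"]),
]
cohort = build_cohort(records)
```

Keeping each patient as a time-ordered list of visits makes it straightforward to feed the sequence to a recurrent model in either chronological or reverse order.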

3.2 Attention Models

Attention models work by "attending" to parts of the input while predicting the output, instead of processing it sequentially. If the attention value of a particular input feature is high, that feature strongly influenced the output. This makes it possible to interpret the model and understand which part of the input was considered while predicting whether heart failure is present or not. Visualizing attention maps shows where the network looks when making a prediction. The attention mechanism gives the network the capability to access its internal memory, so the network chooses what to retrieve from memory. What it retrieves is a weighted combination of all memory locations.

Sequence is important for many tasks in everyday life, whether in language, where the order of words matters, or in genome data, where every sequence has a different meaning. In time series data, time defines the occurrence of events. A specific neural network model is therefore needed, the recurrent neural network, which is designed to work on data that is defined by time. The medical history of a patient is vital for an accurate medical diagnosis. RNNs use this medical history as sequential information to predict the patient's heart failure risk and the explanations behind this prediction.

Recurrent Neural Networks (RNNs) suffer from the vanishing gradient problem, which causes information from the past to be washed out for long input sequences. Several techniques have been proposed to address this problem, such as Long Short-Term Memory (LSTM) units and Gated Recurrent Units (GRUs). A GRU addresses the problem using two vectors called gates: the reset gate and the update gate. These two vectors determine what information (values) should be passed to the output (the next time step of the RNN). GRUs can be trained to preserve information from long ago, without forgetting it over time.
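To make the idea of retrieving "a weighted combination of all memory locations" concrete, the following sketch computes softmax attention weights over a sequence of hidden states and the resulting context vector. It is a generic soft-attention illustration under assumed names (`attention_context`, a dot-product scoring vector `w`), not the specific scoring function used in this chapter.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(hidden_states, w):
    """Return (weights, context) for a list of hidden-state vectors.

    Each state is scored by a dot product with `w` (an assumed, learned
    scoring vector); the context is the attention-weighted sum of states.
    """
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(a * h[k] for a, h in zip(weights, hidden_states))
        for k in range(dim)
    ]
    return weights, context

# One hidden-state vector per visit (toy values).
states = [[0.1, 0.0], [0.9, 0.2], [0.3, 0.8]]
weights, context = attention_context(states, w=[2.0, -1.0])
```

Because the weights are non-negative and sum to one, each weight can be read as the share of the context vector contributed by the corresponding visit, which is what makes attention values usable as explanations.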

3.3 GRU: How It Works

Let us look at how GRUs work and the mathematics behind them. A GRU can be represented diagrammatically as shown in Fig. 2. The notations in the figure are as follows:

The sigmoid functions represent the gates mentioned earlier, one each for the update and the reset gates.

Fig. 2 Gated recurrent unit
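As a rough companion to Fig. 2, one GRU time step can be sketched in plain Python. The sketch assumes the standard formulation used in this section: an update gate and a reset gate, each a sigmoid of a linear function of the input x_t and the previous state h_{t-1}, a tanh candidate state, and a final state that mixes the old state and the candidate. The helper names and the parameter dictionary `p` are assumptions for illustration, and bias terms are omitted.

```python
import math

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def mul(a, b):
    return [x * y for x, y in zip(a, b)]  # element-wise product

def gru_step(x, h_prev, p):
    """One GRU time step; `p` holds weight matrices Wz, Uz, Wr, Ur, W, U."""
    z = sigmoid(add(matvec(p["Wz"], x), matvec(p["Uz"], h_prev)))  # update gate
    r = sigmoid(add(matvec(p["Wr"], x), matvec(p["Ur"], h_prev)))  # reset gate
    # Current memory content: tanh(W x + r * (U h_prev))
    pre = add(matvec(p["W"], x), mul(r, matvec(p["U"], h_prev)))
    cand = [math.tanh(v) for v in pre]
    # Final memory: keep a fraction z of the old state, (1 - z) of the candidate
    return [zi * hp + (1.0 - zi) * ci for zi, hp, ci in zip(z, h_prev, cand)]

# With all-zero weights both gates equal 0.5 and the candidate is 0,
# so the new state is half of the previous state.
zeros = [[0.0, 0.0], [0.0, 0.0]]
params = {k: zeros for k in ("Wz", "Uz", "Wr", "Ur", "W", "U")}
h = gru_step([1.0, 1.0], [1.0, -2.0], params)
```

Processing a patient's visits in reverse time order then amounts to calling `gru_step` in a loop over `reversed(visits)`, carrying `h` from one call to the next.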

1. Update Gate
The update gate determines how much past information from previous time steps is passed along to the future. The update gate value $z_t$ for time step $t$ is calculated using the following formula:

$$z_t = \sigma\left(W^{(z)} x_t + U^{(z)} h_{t-1}\right)$$

2. Reset Gate
This gate is used to decide how much of the past information to forget. It is calculated as follows:

$$r_t = \sigma\left(W^{(r)} x_t + U^{(r)} h_{t-1}\right)$$

3. Final memory content
The final memory content, i.e., the output forwarded to the next time step, is calculated in two steps. First, the relevant information from the past is stored, using the reset gate, in a variable called the current memory content, $h'_t$:

$$h'_t = \tanh\left(W x_t + r_t \odot U h_{t-1}\right)$$

Finally, we calculate the vector $h_t$, which holds the information to be passed on down the network. The update gate is used for this. The information to be collected from $h'_t$ and $h_{t-1}$ is determined as follows:

$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot h'_t$$

The attention mechanism-based model has two levels of attention: it first detects influential past visits and then detects significant clinical variables within those visits. The attention model tries to imitate a physician's behavior during an encounter. Like a physician, it gives greater attention to recent clinical visits by considering the recent visits first and the earlier visits later, i.e., in reverse order. This is because stationary models often aggregate all the previous information and ignore anything time-dependent. The temporal relationships in the input data are then lost, so inputs with very different temporal patterns can receive similar predictions. Considering the last visit first therefore proves to be beneficial: the model knows which visit is more important and is trained on visit-specific features that contribute to the prediction. When a prediction is made, the visit-level contribution is prioritized, i.e., which visit contributes the most to the final prediction, where each visit consists of multiple codes. The variable-level contribution, i.e., which variable contributes more to the final prediction, must also be known.

The model can be viewed in three parts. Part 1 is a GRU that generates the visit-level attention weights. Since each visit consists of multiple variables, Part 2 is a GRU that generates the variable-level attention weights. Part 3 is a Multi-Layer Perceptron (MLP) that embeds the visit information in a lower-dimensional space so as to preserve interpretability. Parts 1 and 2 form side loops that are later combined with the MLP for prediction. As there is no loop in the prediction process, the model is interpretable end to end. There are two major advantages of the model:

1. Running the GRU in reverse time order gives computational advantages.
2. The prediction can improve substantially when timestamps are used. Timestamps provide the duration of the time spent by the patient in the ICU. This parameter adds to the accuracy of the model, as longer ICU stays can indicate increased risk.

The patients having heart failure and their qualifying ICD_9 diagnosis codes were extracted from the MIMIC III dataset to train and test the attention-based explanatory models. The prepared dataset contains 2349 patients with 9587 admission records, 135,709 diagnosis records, and 2989 unique ICD_9 codes. The extracted patients have at least 3 visits and have diagnosis codes from a list of heart failure related diagnosis codes. This list has been compiled from data provided by the creators of the MIMIC III dataset itself and by some experts in the field.

• Hyperparameter tuning: It is very easy to achieve very high accuracy on the training data using dense neural networks, but such models might not generalize well to the validation and test sets. Conversely, eschewing deep or complex architectures may lead to low accuracy on the data sets. Hence, a sweet spot has to be found that generalizes well and has high accuracy. Some models fail because saddle points and local minima make the gradients zero, so hyperparameters such as the learning rate need to be tweaked, and the optimizer changed to Adam or Adadelta, so that training does not get stuck and stop learning.

Hyperparameters:

• Number of Layers: This must be chosen wisely, as a very high number may introduce problems such as overfitting and vanishing or exploding gradients, while a very low number yields a model with high bias and low potential. As the model has two separate GRUs, one for visit-level codes and one for variable-level codes, they are trained with 128 hidden alpha layers and 128 hidden beta layers, respectively. The model's performance metric is accuracy, and changing the number of hidden layers to 256 produced no significant change in accuracy. The hidden layer count is therefore kept at 128 to maintain the simplicity of the model. The size of the linear embedding applied to the initial list of integers was changed from 128 to 256, as this increased the accuracy of the model by 9% and substantially decreased the cost function.
• Activation Function: The popular choices are ReLU, Sigmoid, Tanh, and LeakyReLU. For the update gate (used to determine how much of the past information is to be passed on) and the reset gate (used to decide how much of the past information to forget) of the GRU models, sigmoid and tanh are used as activation functions.
• Optimizer: This is the algorithm used by the model to update the weights of every layer after every iteration so as to minimize the cost function. Initially AdaGrad was used as the optimizer, but it has concerns of its own, such as a continually decaying learning rate η and the need to select the learning rate η manually. To resolve these concerns, the optimizer was switched to Adadelta.
• Initialization: Initialization does not play a very big role, as the defaults work well, but one must still avoid initializing the weights to zero or to any constant value that is the same across all units. The weights are initialized between −1 and 1 for the linear embedding, for the visit level (alpha), and for the variable level (beta). The biases are initialized with zeros of a suitable data type and format.
• Batch Size: This is the number of patterns shown to the network before the weight matrix is updated. If the batch size is small, fewer patterns are seen per update, so the weight updates fluctuate and convergence becomes difficult. For this model, the batch size is set to 100. This was appropriate, as modifying it further only increased the execution time.
• Number of Epochs: The number of epochs is the number of times the whole training dataset is passed through the model. Seventeen epochs are used here, as increasing or decreasing the number further does not affect accuracy. The number of epochs is an important hyperparameter: too many epochs may result in overfitting, and too few may yield poor results because the model does not reach its full potential. Overfitting leads to poor generalization on unseen data.
• Dropout: The keep-probability of the dropout layer can be thought of as a hyperparameter that acts as a regularizer and helps find the optimum point in the bias-variance trade-off. Dropout is applied in two places: (1) to the input embedding, and (2) to the context vector c_i. Both dropout rates are 0.4. Put simply, this value was found to suit the performance metrics of the model. Dropout values affect performance, so it is recommended to tune them for the data.
• L1/L2 Regularization: Any machine learning model needs to learn from all the features provided to it. L2 regularization is applied to W_emb (weight of the linear embedding layer), w_alpha (weight of the visit-level GRU model), W_beta (weight of the variable-level GRU model), and w_output (at the output layer, after the concatenation of the alpha and beta weights with the embedding of the input vector).

The trained model is evaluated on the test dataset using a performance measure. The cost function measures the difference between the predicted values and the corresponding real values. To minimize this cost (train_cost), the Adadelta optimization algorithm is used. Adadelta is an optimization algorithm from the family of Stochastic Gradient Descent algorithms. It searches for the minimum cost value by updating the weights according to the loss, so every iteration tries new weight values. The model is first run with some initial weights, and the algorithm updates them over thousands of iterations, trying to find the right combination. It is important to note that Adadelta looks for the minimum cost, not minimum weights; it only updates the weights and does not minimize them.
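The way Adadelta avoids a manually chosen learning rate can be illustrated with a minimal single-variable sketch. It follows the commonly stated Adadelta update (running averages of squared gradients and squared updates, with the step scaled by the ratio of their root mean squares). The function name `adadelta_minimize`, the toy cost, and the values of `rho` and `eps` are assumptions for illustration and are not taken from the chapter's experiments.

```python
import math

def adadelta_minimize(grad, x0, rho=0.95, eps=1e-6, steps=5000):
    """Minimize a function of one variable given its gradient `grad`."""
    x = x0
    avg_sq_grad = 0.0    # running average of squared gradients
    avg_sq_delta = 0.0   # running average of squared updates
    for _ in range(steps):
        g = grad(x)
        avg_sq_grad = rho * avg_sq_grad + (1 - rho) * g * g
        # The step is a ratio of RMS values, so no global learning rate is set.
        delta = -math.sqrt(avg_sq_delta + eps) / math.sqrt(avg_sq_grad + eps) * g
        avg_sq_delta = rho * avg_sq_delta + (1 - rho) * delta * delta
        x += delta       # the weight is updated, not minimized
    return x

# Toy cost (x - 3)^2 with gradient 2 (x - 3); its minimum lies at x = 3.
x_min = adadelta_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

In a real model the same update is applied to every weight independently, each with its own pair of running averages.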

3.4 LIME Algorithm

The general approach LIME takes to achieve the goal is as follows:

1. For each prediction to explain, permute the observation n times.
2. Let the complex model predict the outcome of all permuted observations.
3. Calculate the distance from all permutations to the original observation.
4. Convert the distance to a similarity score.
5. Select m features best describing the complex model outcome from the permuted data.
6. Fit a simple model to the permuted data, explaining the complex model outcome with the features from the permuted data weighted by its similarity to the original observation.
7. Extract the feature weights from the simple model and use these as explanations for the complex model's local behavior.
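The steps above can be sketched in plain Python as follows. This is a simplified illustration and not the LIME library itself: Gaussian perturbation, an exponential similarity kernel, and ridge regression as the simple model are assumed choices, feature selection (step 5) is skipped for brevity, and the names `lime_explain`, `solve`, and `black_box` are hypothetical.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def lime_explain(predict, x, n=500, sigma=0.5, kernel_width=1.0,
                 ridge=1e-3, seed=0):
    """Return local feature weights for black-box `predict` around `x`."""
    rng = random.Random(seed)
    d = len(x)
    # Steps 1-2: perturb the observation and query the complex model.
    samples = [[xi + rng.gauss(0.0, sigma) for xi in x] for _ in range(n)]
    preds = [predict(s) for s in samples]
    # Steps 3-4: distance to the original, converted to a similarity score.
    sims = []
    for s in samples:
        dist_sq = sum((a - b) ** 2 for a, b in zip(s, x))
        sims.append(math.exp(-dist_sq / kernel_width ** 2))
    # Step 6: similarity-weighted ridge regression on [1, features].
    rows = [[1.0] + s for s in samples]
    A = [[sum(w * r[i] * r[j] for w, r in zip(sims, rows))
          + (ridge if i == j else 0.0)
          for j in range(d + 1)] for i in range(d + 1)]
    b = [sum(w * r[i] * p for w, r, p in zip(sims, rows, preds))
         for i in range(d + 1)]
    beta = solve(A, b)
    # Step 7: the coefficients (without the intercept) are the explanation.
    return beta[1:]

def black_box(s):
    """Toy complex model that ignores the third feature."""
    return 1.0 / (1.0 + math.exp(-(3.0 * s[0] - s[1])))

weights = lime_explain(black_box, [0.2, 0.1, 0.5])
```

A positive weight indicates a feature that pushes the local prediction up, and a negative weight one that pushes it down, which mirrors the positive and negative contributions reported for the heart disease features in the results.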

4 Results and Discussions

Results for the LIME algorithm using Multilayer Perceptron, Random Forest and Naive Bayes are described below.

4.1 Multi-layer Perceptron (MLP)

A multilayer perceptron (MLP) is composed of three types of layers: an input layer receives the signal, an output layer makes a decision or prediction, and any number of hidden layers between them perform the actual processing of the features. The MLP used 14 hidden layers with the ReLU activation function and the lbfgs solver. The mean accuracy obtained was 83.11%. Figure 3 shows the result when LIME, the explainer model, was run on the MLP as a black-box model.

Fig. 3 Multi-layer perceptron (MLP)


Fig. 4 Random forest algorithm

4.2 Random Forest Algorithm
The Random Forest classifier is an ensemble classifier. It generates a set of decision trees, each built on a subset of the training set, aggregates the decisions of the different trees, and from these calculates the final class of the test object. The accuracy obtained for the Cleveland dataset was 77%, as shown in Fig. 4.

4.3 Naive Bayes Algorithm
Naive Bayes is a statistical classifier based on Bayes' theorem. It uses the class-conditional independence assumption, which simplifies the calculation of probabilities (hence the name Naive Bayes) and makes the calculations more tractable. It gives an accuracy of 77%, as shown in Fig. 5. From the above results, it can be said that for this particular patient the model predicted the presence of heart disease. On average, the features Thalach (maximum heart rate), presence of exercise-induced angina, asymptomatic chest pain, ca (number of major vessels) and resting electrocardiographic results contributed positively to the prediction of heart disease for the patient. These features must therefore be monitored in the future to decrease the risk of heart failure. The feature "Thal" has a value less than 3, indicating it is normal, and thus does not contribute to the indication of heart disease in the patient, or rather contributes negatively.


Fig. 5 Naive Bayes algorithm

4.4 Results for Attention Mechanisms
With a change in the number of input layers from 128 to 256, accuracy increased from an initial 73% to 82.6%, as shown in Figs. 6 and 7. In Figs. 8 and 9, D_428 denotes Congestive Heart Failure, D_427 Tachycardia and D_996 Mechanical Complication due to a cardiac pacemaker. On running the model on test data, a '.txt' file is generated. It contains the contribution score of each ICD9 code with respect to each visit for an individual patient. The resulting graph shows the contribution of the ICD9 code at a particular visit for a particular patient. As shown in Fig. 10, the accuracy of the RNN model was 82.5%. Some diseases were misclassified at this accuracy, but considering that the model is also capable of interpretation, its accuracy can be considered acceptable. The explanations also allow the engineer to see where the model is giving wrong predictions and must be improved, thus increasing transparency. Figure 11 plots the number of patients found to have a particular disease. For example, Chronic Kidney Disease has the highest count of 1670 patients; similarly, Acidosis is found in 1638 patients. The diseases that on average contribute negatively and have the least effect on heart failure are plotted in Fig. 12. For example, Tobacco use disorder contributes as low as −0.0134; similarly, Obesity, with an average of −0.0127, also contributes little.

Fig. 6 Before proper hyperparameter tuning

Fig. 7 Final hyperparameter tuning


Fig. 8 Various patients considered for testing

Fig. 9 Textual explanations for predictions

The diseases that on average contribute positively and have the most effect on heart failure are plotted in Fig. 13. For example, Atrial fibrillation contributes the most, with an average score of 0.0253; similarly, Congestive Heart Failure, with an average of 0.0206, also contributes strongly. Figure 14 shows the total number of times a particular disease was found in a visit for a particular patient. For example, Chronic Kidney Disease was found 4029 times in total; similarly, Congestive Heart Failure was found 3742 times.


Fig. 10 Graphical output for a patient using ante-hoc explanatory model—ATTENTION mechanism

Fig. 11 Total patients versus diagnosis


Fig. 12 Negatively contributing ICD_9 codes

Fig. 13 Positively contributing ICD_9 codes

Fig. 14 Occurrence of ICD_9 codes

5 Conclusions
Artificial Intelligence (AI), and neural networks in particular, have seen unprecedented advancement in the last decade, mainly due to constantly improving computational capabilities. However, these advancements have not been harnessed from a business and social perspective due to a lack of trust in the models owing to their black-box nature, with business applications still using relatively simple and less accurate algorithms. This conundrum signals an urgent need to bring explainability and interpretability to deep neural networks. This chapter addresses this need by describing an explainable neural network, which can explain its own predictions, and by comparing it with a post hoc explainer such as LIME. Predicting the possibility of heart failure in an interpretable manner would give doctors an early warning and help reduce readmission rates. The RNN-based model gives 82.5% accuracy. This solution would contribute towards building trust in AI, and towards putting neural networks into widespread and constructive use. In future, the model can be extended to predict other diseases, and the efficacy of ensemble algorithms can be tested for more precise predictions.

References
1. Goldberger, A.L., Amaral, L.A.N., Glass, L., Hausdorff, J.M., Ivanov, P.Ch., Mark, R.G., Mietus, J.E., Moody, G.B., Peng, C.-K., Stanley, H.E.: PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101(23), e215–e220 (2000, June 13). Circulation Electronic Pages: http://circ.ahajournals.org/content/101/23/e215.full
2. MIMIC III dataset (Medical Information Mart for Intensive Care III). https://mimic.physionet.org/
3. Holzinger, A., Biemann, C., Pattichis, C.S., Kell, D.B.: What do we need to build explainable AI systems for the medical domain (2017). arXiv:1712.09923v1
4. Zhao, C., Shen, Y., Yao, L.-P.: Convolutional neural network-based model for patient representation learning to uncover temporal phenotypes for heart failure (2017)
5. Choi, E., Bahadori, M.T., Kulas, J.A., Schuetz, A., Stewart, W.F., Sun, J.: RETAIN: an interpretable predictive model for healthcare using reverse time attention mechanism. In: 30th Conference on Neural Information Processing Systems (NIPS), Barcelona, Spain (2016)
6. Guestrin, C., Singh, S., Ribeiro, M.T.: Why should I trust you? Explaining the predictions of any classifier (2016). arXiv:1602.04938
7. Bahdanau, D., Cho, K.H., Bengio, Y.: Neural machine translation using attention mechanism paper, ICLR (2015)
8. Choi, E., Bahadori, M.T., Song, L., Stewart, W.F., Sun, J.: GRAM: graph-based attention model for healthcare. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 787–795 (2017)
9. Cleveland Heart Disease Dataset (1988). https://archive.ics.uci.edu/ml/datasets/heart+Disease

Prof. Sujata Khedkar is an Associate Professor in the Computer Engineering Department, Vivekanand Education Society's Institute of Technology, University of Mumbai, India. Her current research focuses on Artificial Intelligence and Big Data Analytics. She is a member of the ISTE and CSI.
Priyanka Gandhi is a Software Developer with active research interests in the field of AI and Big Data Technologies. She received her Bachelor's degree in Computer Engineering from V.E.S.I.T., University of Mumbai in 2019.
Gayatri Shinde is a Software Developer with active research interests in the field of AI and Deep Learning. She received her Bachelor's degree in Computer Engineering from V.E.S.I.T., University of Mumbai in 2019.
Vignesh Subramanian is a Software Developer with active research interests in the field of AI and Big Data Analytics. He received his Bachelor's degree in Computer Engineering from V.E.S.I.T., University of Mumbai in 2019.

Deep Learning for Analysis of Electronic Health Records (EHR)
Pawan Singh Gangwar and Yasha Hasija

Abstract In the current scenario, every piece of medical equipment, clinical instrument and lab setup in healthcare centres and hospitals is linked with digital devices, which has brought about a digital data explosion. As a result, the amount of digital information generated and stored in Electronic Health Records (EHRs) has increased exponentially. EHRs have therefore become an area of booming research, as the data contained in them opens up a host of untapped possibilities. EHRs use several classification schemas and controlled vocabularies to record relevant medical information and events. Harmonizing and analysing data among institutions and across terminologies is thus an ongoing field of research. Various deep learning EHR systems have proposed forms of clinical code representation that lend themselves easily to cross-institutional analysis and applications. The primary use of EHRs is storing patient information such as medical history, progress, demography, diagnosis and medications. But researchers across the globe have found secondary uses of EHRs in several clinical and health informatics applications. Secondary usage of EHRs promises to boost clinical research and result in better-informed clinical decision making. The challenge of summarizing and representing patient data prevents the widespread practice of predicting the future of patients using EHRs. Meanwhile, the machine learning field has witnessed widespread advancements in the area of deep learning. Current research in healthcare informatics focuses on applying deep learning based on EHRs to clinical tasks. In this context, the deep learning techniques described here can be applied to various types of clinical applications such as information extraction, representation learning, outcome prediction, phenotyping and de-identification. Several limitations of current research have been identified, such as model interpretability and heterogeneity of data.

P. S. Gangwar · Y. Hasija (corresponding author)
Delhi Technological University, Delhi 110042, India
© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics, Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_8


1 Introduction
Over the past decade, hospital adoption of electronic health record (EHR) systems has increased many-fold, helped by $30 billion in incentives offered to healthcare organizations, hospitals and physicians to adopt EHR systems [1]. According to the most recent report, about 84% of hospitals have adopted at least a basic EHR system, a 9-fold increase from 2008 [2]. Moreover, office-based physician adoption of basic and certified EHRs has increased from 42% to 87% [3]. EHR systems store data from every patient encounter, including demographic information, laboratory tests and results, diagnoses, prescriptions, clinical notes, radiological images, and so forth [1]. While EHRs are mostly designed to improve healthcare efficiency from an operational standpoint, numerous studies have found secondary uses of clinical data [4, 5]. Specifically, the patient data contained in EHR systems have been used for tasks such as medical concept extraction [6, 7], disease inference, patient trajectory modelling, clinical decision support systems, and more (Table 1). Until the last few years, most of the techniques for analysing rich EHR data relied on traditional statistical and machine learning techniques such as logistic regression, support vector machines (SVM), and random forests [13]. Recently, deep learning techniques have achieved great success in several domains by capturing long-range dependencies and constructing deep hierarchical features from data in an effective way [14].
Given the rise in the popularity of deep learning techniques and the increasingly vast amount of patient data, there has also been an increase in the number of publications that apply deep learning to EHR data for clinical informatics tasks, yielding better performance than conventional techniques while requiring less time-consuming pre-processing and feature engineering. This chapter reviews the specific deep learning techniques employed for EHR data analysis and inference, and discusses the concrete clinical applications enabled by such advances. Unlike other recent surveys, which explored deep learning in the broad context of health informatics, ranging from genome analysis to biomedical image analysis, this chapter focuses only on deep learning methods applied to EHR data [15].

Table 1 Recent deep EHR projects

Project        Deep EHR task                      References
Deepr          Hospital re-admission prediction   [8]
DeepPatient    Multi-outcome prediction           [9]
Doctor AI      Prediction of heart failure        [10]
DeepCare       EHR concept representation         [11]
eNRBM          Stratification of suicide risk     [12]
Med2Vec        EHR concept representation         [10]


Fig. 1 Trends in the number of publications relating to deep EHR [16]

Selection Criteria and Search Strategy
The survey covers publications that appeared up to August 2017. All searches include the terms “electronic-health-records” or “EHR” or “electronic-medical-records” or “EMR”, together with “deep learning” or the name of a specific deep learning technique (Sect. 4). Figure 1 shows the distribution of the number of publications per year in a variety of areas related to deep EHR. The first plot of Fig. 1 contains the overall results for “deep-learning” and “electronic-health-records”, which illustrates the overall yearly increase in the number of publications related to deep learning and EHR. The second plot shows these same terms in conjunction with a variety of specific application areas. For these searches, variations of the included terms are incorporated, for example: “recurrent neural network” OR “RNN”, “deep-learning”, “electronic-health-records”. As the overall number of publications is relatively small, the most prominent and original deep EHR publications are included in the rest of the chapter. Section 2 gives a survey of EHR systems. Key machine learning concepts are then explained in Sect. 3, followed by deep learning frameworks in Sect. 4. Section 5 discusses recent applications of deep learning (DL) to EHR data analysis. Finally, the chapter concludes by identifying current challenges and future opportunities in Sect. 7.

2 Electronic Health Record (EHR) Systems
Use of EHR systems has expanded greatly in both hospital and ambulatory care settings [2]. EHR use at hospitals and clinics can improve patient care by reducing errors, increasing efficiency, and improving care coordination, while also providing a rich source of data for researchers. EHR systems vary in functionality, and are typically categorized into basic EHR without clinical notes, basic EHR with clinical notes, and comprehensive systems.


While lacking more advanced functionality, even basic EHR systems can provide information on a patient's medical history, complications, and medication use. Since EHRs were primarily designed for internal hospital administrative tasks, several classification schemas exist for recording relevant medical information and events. Some examples include diagnosis codes, procedure codes, laboratory observations, and medication codes. These codes can vary between institutions, with partial mappings maintained by external resources. Given the wide array of schemata, harmonizing and analysing data across terminologies and between institutions is an ongoing area of research. Several of the deep EHR systems in this chapter propose forms of clinical code representation that lend themselves more easily to cross-institution analysis and applications. EHR systems store several types of patient information, including demographics, diagnoses, physical exams, sensor measurements, laboratory test results, prescribed or administered medications, and clinical notes [15]. EHR data is heterogeneous and includes the following data types:
(1) Numerical quantities, for instance BMI (body mass index),
(2) Date-time objects, for instance date of birth or time of admission,
(3) Categorical values, and
(4) Natural language free text, for instance progress notes or discharge summaries.
In addition, these data types can be ordered chronologically to form
(5) Derived time series, for instance perioperative vital signs or multimodal patient history.
While other biomedical data, such as medical images or genomic information, exist and are covered in recent relevant articles, in this review we focus on these five data types found in most modern EHR systems.
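To make the five data types concrete, the sketch below stores one of each in a single record. It is only an illustration: the class, its field names and the values are made up for the example and do not correspond to any particular EHR schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple


@dataclass
class PatientRecord:
    """Hypothetical container; all field names are illustrative assumptions."""
    bmi: float                  # (1) numerical quantity
    admission_time: datetime    # (2) date-time object
    admission_type: str         # (3) categorical value
    discharge_summary: str      # (4) natural language free text
    # (5) derived time series: (timestamp, value) pairs kept in time order
    heart_rate: List[Tuple[datetime, float]] = field(default_factory=list)

    def add_heart_rate(self, when: datetime, value: float) -> None:
        self.heart_rate.append((when, value))
        self.heart_rate.sort(key=lambda pair: pair[0])


# Made-up example values, not taken from any dataset.
record = PatientRecord(
    bmi=27.4,
    admission_time=datetime(2017, 8, 1, 9, 30),
    admission_type="emergency",
    discharge_summary="Patient stable at discharge.",
)
record.add_heart_rate(datetime(2017, 8, 1, 11, 0), 88.0)
record.add_heart_rate(datetime(2017, 8, 1, 10, 0), 95.0)
print([value for _, value in record.heart_rate])
```

Keeping the time series sorted on insertion reflects the point made above that the fifth data type is derived by ordering the other measurements chronologically.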

3 An Overview of Machine Learning
Machine learning approaches can be broadly divided into two major categories: supervised and unsupervised learning. Supervised learning techniques involve inferring a mapping function y = f(x) from inputs x to outputs y. Examples of supervised learning tasks include regression and classification, with algorithms including logistic regression and support vector machines. In contrast, the goal of unsupervised learning techniques is to learn interesting properties of the distribution of x. Examples of unsupervised learning tasks include clustering and density estimation. The representation of inputs is a fundamental issue spanning all types of machine learning frameworks. For each data point, a set of attributes known as features is extracted to use as input to ML techniques. In traditional ML, these features were hand-crafted based on domain knowledge. One of the core principles of deep learning is automatic, data-driven feature extraction.


4 Deep Learning and Its Approaches
Deep learning encompasses a wide variety of techniques. This section gives a brief overview of the most common deep learning approaches. For each architecture, a key equation is included that describes its fundamental method of operation. The central idea in deep learning is that of representation. Traditionally, input features to an ML algorithm must be hand-crafted from raw data, relying on practitioner expertise and domain knowledge to determine explicit patterns of prior interest. The engineering process of creating, analysing, selecting, and evaluating appropriate features can be laborious and time-consuming, and is often thought of as a "black art" requiring creativity, trial and error, and often luck. In contrast, deep learning techniques learn optimal features directly from the data itself, without any human guidance, allowing for the automatic discovery of latent data relationships that might otherwise be unknown or hidden. Complex data representation in deep learning is often expressed as compositions of other, simpler representations. For instance, recognizing a person in an image can involve finding representations of edges from pixels. This notion of unsupervised hierarchical representation of increasing complexity is a recurring theme in deep learning. The vast majority of deep learning algorithms and architectures are built on the framework of the artificial neural network (ANN). ANNs are composed of a number of interconnected nodes (neurons), arranged in layers as shown in Fig. 2.

Fig. 2 A fundamental neural network [16]

An ANN is trained by minimizing a loss function such as the regularized negative log-likelihood in Eq. (1):

$$E(\theta, D) = -\sum_{i=0}^{D} \left[\log P(Y = y_i \mid x_i, \theta)\right] + \lambda \lVert\theta\rVert_p \tag{1}$$

The first term in Eq. (1) minimizes the sum of the log loss over the whole training dataset D; the second term minimizes the p-norm of the learned model parameters θ, controlled by a tunable parameter λ. This second term is known as regularization, and is a technique used to prevent a model from overfitting and to increase its ability to generalize to new, unseen examples. The loss function is typically optimized using backpropagation, a mechanism for weight optimization that minimizes the loss from the final layer backwards through the network. In the rest of this section, several common types of deep learning models used for deep EHR applications are reviewed, all of which are based on the ANN's architecture and optimization strategy. A hierarchical view of these common deep learning models for analysing EHR data, along with selected works in this survey that implement them, is shown in Fig. 3.
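The regularized objective in Eq. (1) can be evaluated directly once the model's predicted probabilities for the true labels are known. The sketch below is a minimal illustration; the function name and the example values are assumptions, and the model that produces the probabilities is left out.

```python
import math


def regularized_nll(probs_of_true_class, params, lam, p=2):
    """Negative log-likelihood over the dataset plus lam times the p-norm of params."""
    # First term of Eq. (1): summed log loss over all training examples.
    nll = -sum(math.log(q) for q in probs_of_true_class)
    # Second term of Eq. (1): p-norm of the parameters, scaled by lambda.
    p_norm = sum(abs(t) ** p for t in params) ** (1.0 / p)
    return nll + lam * p_norm


# Two examples, each assigned probability 0.5 for its true label,
# and two parameters whose 2-norm is 5.
loss = regularized_nll([0.5, 0.5], [3.0, 4.0], lam=0.1)
print(loss)
```

With λ = 0 the function reduces to the plain log loss; increasing λ penalizes large parameter values more strongly, which is the mechanism by which regularization discourages overfitting.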

4.1 Multilayer Perceptron (MLP)
An MLP is a type of ANN that consists of multiple hidden layers, in which every neuron in layer i is fully connected to every neuron in layer i + 1. Typically, these networks are limited to a few hidden layers, and the data flows in one direction only, unlike recurrent or undirected models. Extending the idea of a single-layer ANN, each hidden unit computes a weighted sum of the outputs from the previous layer, followed by a nonlinear activation σ of the calculated sum, as in Eq. (2). Here, d is the number of units in the previous layer, x_j is the output from the previous layer's jth node, and w_ij and b_ij are the weight and bias terms associated with each x_j. Traditionally, sigmoid or tanh were chosen as the nonlinear activation functions, but modern networks use functions such as rectified linear units (ReLU) [17].

$$h_i = \sigma\left(\sum_{j=1}^{d} x_j w_{ij} + b_{ij}\right) \tag{2}$$

After the hidden-layer weights are optimized during training, the network has learned an association between input x and output y. As more hidden layers are added, it is expected that the input data will be represented in an increasingly abstract manner owing to each hidden layer's nonlinear activation. While the MLP is one of the simplest models, other architectures often incorporate fully connected neurons.
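The hidden-unit computation in Eq. (2) can be sketched as a forward pass through fully connected layers. This is an illustrative sketch, not a trained model: the weights are made up, the function names are hypothetical, and a single bias per hidden unit is assumed, which is the common convention.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(x, weights, biases, activation):
    """Each unit i applies the activation to sum_j x_j * w_ij + b_i."""
    return [activation(sum(xj * wij for xj, wij in zip(x, w_row)) + b)
            for w_row, b in zip(weights, biases)]


def mlp_forward(x, layers):
    """Feed x through the layers in order; data flows in one direction only."""
    for weights, biases, activation in layers:
        x = dense_layer(x, weights, biases, activation)
    return x


# Made-up weights: 2 inputs -> 2 hidden units (ReLU) -> 1 output (sigmoid).
hidden = ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu)
output = ([[1.0, 2.0]], [-3.0], sigmoid)

h = dense_layer([1.0, 2.0], *hidden)
y = mlp_forward([1.0, 2.0], [hidden, output])
print(h, y)
```

Each additional tuple in the `layers` list adds one more hidden layer, so deeper variants of the same network only require a longer list.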

4.2 Convolutional Neural Networks (CNN)
The CNN has become a very popular tool in recent years, especially in the image processing community. CNNs impose local connectivity on the raw data. For instance, rather than treating a 50 × 50 image as 2500 unrelated pixels, more meaningful features are extracted by viewing the image as a collection of local pixel patches. Similarly, a one-dimensional (1D) time series can be viewed as a collection of local signal segments. The equation for 1-D convolution is shown in Eq. (3), where x is the input signal and w is the weighting function or convolutional filter.

Fig. 3 The most common deep learning architectures for analysing EHR data [16]


$$C_{1d} = \sum_{a=-\infty}^{\infty} x(a)\, w(t - a) \tag{3}$$

In the same way, two-dimensional (2D) convolution is given by Eq. (4), in which X is a 2-D grid and K is the kernel.

$$C_{2d} = \sum_{m} \sum_{n} X(m, n)\, K(i - m, j - n) \tag{4}$$

CNNs involve sparse interactions, as the filters are typically smaller than the input, resulting in a relatively small number of parameters. Convolution also encourages parameter sharing, since every filter is applied across the whole input. In a CNN, a convolutional layer consists of several of the convolutional filters described above, all receiving the same input from the previous layer, which ideally learn to extract different lower-level features. Following this, a subsampling or pooling layer is typically applied to aggregate the extracted features (Fig. 4).

Fig. 4 Convolutional neural network (CNN) for classifying images [16]
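For finite inputs, the sums in Eqs. (3) and (4) run only over the positions where the signal and the filter overlap. The sketch below computes this "full" discrete convolution in plain Python; the function names and example values are assumptions for illustration, and deep learning libraries typically implement faster variants of the same operation.

```python
def conv1d(x, w):
    """Full 1-D convolution: c[t] = sum over a of x[a] * w[t - a]."""
    out = [0.0] * (len(x) + len(w) - 1)
    for a, xa in enumerate(x):
        for k, wk in enumerate(w):
            out[a + k] += xa * wk  # here t = a + k, so w[t - a] = w[k]
    return out


def conv2d(X, K):
    """Full 2-D convolution: c[i][j] = sum over m, n of X[m][n] * K[i - m][j - n]."""
    rows = len(X) + len(K) - 1
    cols = len(X[0]) + len(K[0]) - 1
    out = [[0.0] * cols for _ in range(rows)]
    for m, x_row in enumerate(X):
        for n, x_val in enumerate(x_row):
            for p, k_row in enumerate(K):
                for q, k_val in enumerate(k_row):
                    out[m + p][n + q] += x_val * k_val
    return out


signal = [1.0, 2.0, 3.0]
smoothing_filter = [1.0, 1.0]  # the same two weights are reused at every position
c1 = conv1d(signal, smoothing_filter)

grid = [[1.0, 2.0], [3.0, 4.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]
c2 = conv2d(grid, kernel)
print(c1, c2)
```

The filter has only two weights regardless of the length of the signal, which illustrates the parameter sharing and small parameter count described above.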

4.3 Recurrent Neural Networks (RNN)
RNNs are an appropriate choice when data is sequentially ordered (for instance, time series data or natural language). While one-dimensional (1D) sequences can be fed to a CNN, the resulting extracted features are shallow, in the sense that only closely localized relations between a few neighbours are factored into the feature representations. RNNs are designed to deal with long-range temporal dependencies. An RNN operates by sequentially updating a hidden state h_t based not only on the activation of the current input x_t at time t, but also on the previous hidden state h_{t−1}, which in turn was updated from x_{t−1}, h_{t−2}, and so on (Fig. 5). In this way, the final hidden state after processing an entire sequence contains information from all of its previous elements.
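The sequential update of the hidden state can be sketched as below. The text does not give the update equation, so the common form h_t = tanh(W_x x_t + W_h h_{t−1} + b) is assumed here; the weights and function names are made up for illustration.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One recurrent update (assumed form): h_t = tanh(W_x x_t + W_h h_prev + b)."""
    h_t = []
    for i in range(len(b)):
        total = b[i]
        total += sum(w * x for w, x in zip(w_x[i], x_t))
        total += sum(w * h for w, h in zip(w_h[i], h_prev))
        h_t.append(math.tanh(total))
    return h_t


def rnn_final_state(sequence, w_x, w_h, b):
    """Run the recurrence over a whole sequence, starting from a zero state."""
    h = [0.0] * len(b)
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)
    return h


# One hidden unit and scalar inputs, with made-up weights.
w_x, w_h, b = [[1.0]], [[0.5]], [0.0]
with_signal = rnn_final_state([[1.0], [0.0]], w_x, w_h, b)
without_signal = rnn_final_state([[0.0], [0.0]], w_x, w_h, b)
print(with_signal, without_signal)
```

The two sequences end with the same input, yet their final states differ because the first input is carried forward through the hidden state, which is the property described above.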


Fig. 5 RNN: symbolic representation (left), expanded representation (right) [16]

Popular RNN variants include the long short-term memory (LSTM) and gated recurrent unit (GRU) models, both referred to as gated RNNs. Whereas standard RNNs are composed of interconnected hidden units, each unit in a gated RNN is replaced by a special cell that contains an internal recurrence loop and a system of gates that controls the flow of information. Gated RNNs have shown benefits in modelling longer-term sequential dependencies, among other advantages.

4.4 Auto-encoders (AE)
The auto-encoder (AE) is a type of deep learning model that exemplifies the principle of unsupervised representation learning. AEs were first popularized as an early tool to pre-train supervised deep learning models, especially when labelled data was scarce, but they remain useful for entirely unsupervised tasks such as phenotype discovery. Auto-encoders are designed to encode the input into a lower-dimensional space z. The encoded representation is then decoded by reconstructing an approximate representation x̃ of the input x. The encoding and decoding steps are given in Eqs. (5) and (6), where W and W′ are the respective encoding and decoding weights. As the reconstruction error ‖x − x̃‖ is minimized, the encoded representation z is considered increasingly reliable.

$$z = \sigma(Wx + b) \tag{5}$$

$$\tilde{x} = \sigma(W'z + b') \tag{6}$$

Once an AE is trained, a single input is fed through the network, with the innermost hidden layer activations serving as the input's encoded representation. AEs serve to transform the input data into a format where only the most important derived dimensions are stored. In this way, they are similar to standard dimensionality reduction techniques such as principal component analysis (PCA) and singular value decomposition (SVD), but with a significant advantage for complex problems owing to the nonlinear transformations applied by each hidden layer's activation functions.
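The encoding and decoding steps in Eqs. (5) and (6) can be sketched as two small functions, together with the reconstruction error that training would minimize. The weights below are made up and untrained, and the function names are hypothetical; the sketch only shows the shape of the computation.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(matrix, vector, bias):
    """Compute matrix @ vector + bias for plain Python lists."""
    return [sum(m * v for m, v in zip(row, vector)) + b
            for row, b in zip(matrix, bias)]


def encode(x, W, b):
    """Eq. (5): z = sigma(W x + b)."""
    return [sigmoid(v) for v in affine(W, x, b)]


def decode(z, W_prime, b_prime):
    """Eq. (6): x_tilde = sigma(W' z + b')."""
    return [sigmoid(v) for v in affine(W_prime, z, b_prime)]


def reconstruction_error(x, x_tilde):
    """Euclidean norm of x - x_tilde."""
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(x, x_tilde)))


# Made-up weights: a 3-dimensional input is encoded into 2 dimensions.
W = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]
b = [0.0, 0.0]
W_prime = [[2.0, -2.0], [0.0, 0.0], [-2.0, 2.0]]
b_prime = [0.0, 0.0, 0.0]

x = [1.0, 1.0, 1.0]
z = encode(x, W, b)
x_tilde = decode(z, W_prime, b_prime)
error = reconstruction_error(x, x_tilde)
print(z, x_tilde, error)
```

Training would adjust W, b, W′ and b′ to reduce `error` across the dataset; after training, only `encode` is needed to obtain the compressed representation of a new input.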


Fig. 6 Stacked auto-encoder with two independently trained hidden layers [16]

Deep AE networks can be built and trained in a greedy fashion by a process known as stacking (Fig. 6). Many variants of AEs have been introduced, including denoising auto-encoders (DAE), sparse auto-encoders (SAE), and variational auto-encoders (VAE).

4.5 Restricted Boltzmann Machine (RBM)
The RBM is another unsupervised deep learning architecture for learning input data representations. The purpose of RBMs is similar to that of auto-encoders, but RBMs instead take a stochastic perspective by estimating the probability distribution of the input data. In this way, RBMs are often viewed as generative models, trying to model the underlying process by which the data was generated. The canonical RBM is an energy-based model with binary visible units v and hidden units h, with the energy function specified in Eq. (7).

$$E(v, h) = -b^{T} v - c^{T} h - v^{T} W h \tag{7}$$

In a Boltzmann machine (BM), all units are fully connected, while in an RBM there are no connections between any two visible units or any two hidden units. Training an RBM is typically accomplished through stochastic optimization, for instance Gibbs sampling.
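The energy function in Eq. (7) can be evaluated directly for a given configuration of binary units. The sketch below also computes the probability that each hidden unit is on given the visible units, which for a binary RBM is the standard conditional used during Gibbs sampling. The parameter values and function names are made up for illustration.

```python
import math


def rbm_energy(v, h, b, c, W):
    """Eq. (7): E(v, h) = -b.v - c.h - v^T W h, with W[i][j] linking v_i and h_j."""
    visible_term = sum(bi * vi for bi, vi in zip(b, v))
    hidden_term = sum(cj * hj for cj, hj in zip(c, h))
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return -visible_term - hidden_term - interaction


def hidden_on_probability(v, c, W):
    """P(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij) for a binary RBM."""
    probs = []
    for j in range(len(c)):
        activation = c[j] + sum(v[i] * W[i][j] for i in range(len(v)))
        probs.append(1.0 / (1.0 + math.exp(-activation)))
    return probs


# Made-up parameters: two visible units and one hidden unit.
b = [0.5, 0.5]
c = [-1.0]
W = [[2.0], [3.0]]

energy = rbm_energy([1, 0], [1], b, c, W)
p_hidden = hidden_on_probability([1, 0], c, W)
print(energy, p_hidden)
```

Because there are no connections among the hidden units, each probability in `p_hidden` depends only on the visible units, which is what makes the alternating Gibbs sampling steps straightforward.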

5 Deep EHR Learning Applications
In this section, we review the current state of the art in clinical applications resulting from recent advances in deep EHR learning. An overview of deep EHR learning projects and their target tasks is given in Table 2, where we present task and subtask definitions based on a logical grouping of current research. Many of the applications and results in the rest of this section are based on private EHR datasets belonging to independent healthcare institutions (see Sect. 7). However, several studies included a publicly available critical care database, as well as public clinical note datasets (Table 2).

Table 2 Summary of EHR deep learning tasks
Task                      Subtasks                            Input data
Information extraction    Temporal event extraction           Clinical notes
                          Abbreviation expansion
                          Relation extraction
                          Single concept extraction
Representation learning   Concept representation              Medical codes
                          Patient representation
Outcome prediction        Static prediction                   Mixed
                          Temporal prediction
Phenotyping               New phenotype discovery             Mixed
                          Improving existing definitions
De-identification         De-identification of clinical text  Clinical notes

5.1 EHR Information Extraction (IE)
In contrast to the structured portions of EHR data typically used for billing and administrative purposes, clinical notes are more nuanced and are primarily used by healthcare providers for detailed documentation. Each patient encounter is associated with several clinical notes, for instance admission notes, discharge summaries, and transfer orders. Historically, extracting information from such notes has required a great deal of manual feature engineering and ontology mapping, which is one reason why these techniques have seen limited adoption. Accordingly, several recent studies have focused on extracting relevant clinical information using deep learning (Fig. 7). The major subtasks include
(1) Single concept extraction,
(2) Temporal event extraction,
(3) Relation extraction, and
(4) Abbreviation expansion.

Fig. 7 EHR information-extraction (IE) [16]

Evaluation
Precision, recall, and F1 score are the primary classification metrics for the tasks involving single concept extraction, temporal event extraction [18], and clinical relation extraction [19]. The study on clinical abbreviation expansion used accuracy as its evaluation method. A few studies share similar tasks and evaluation metrics.
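For reference, precision, recall and F1 score can be computed from true and predicted labels as in the sketch below. The labels are made up for illustration and the function name is an assumption; the example stands for a binary decision such as whether a token belongs to a clinical concept.

```python
def precision_recall_f1(true_labels, predicted_labels, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up labels: 1 marks a token that is part of a clinical concept.
true_labels = [1, 1, 1, 0, 0]
predicted_labels = [1, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(true_labels, predicted_labels)
print(precision, recall, f1)
```

Precision penalizes spurious extractions, recall penalizes missed ones, and F1 is their harmonic mean, so it is high only when both are high.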

5.2 EHR Representation Learning
Currently, hand-crafted schemas are used for mapping between structured medical concepts, where each concept is assigned a distinct code by its relevant ontology. These static hierarchical relationships fail to quantify the inherent similarities between concepts of different types and coding schemes. Recent deep learning approaches have been used for more detailed analysis and more precise predictive tasks. In this section, we first describe deep EHR methods for representing discrete medical codes as real-valued vectors of arbitrary dimension. These projects are largely unsupervised and focus on natural relationships and clusters.

5.2.1 Concept Representation

Several recent studies have applied deep unsupervised representation learning techniques to derive EHR concept vectors that capture the latent similarities and natural clusters between medical concepts. We refer to this area as EHR concept representation. Its primary goal is to derive vector representations from sparse medical codes such that similar concepts lie close together in vector space.

Latent Encoding: Aside from NLP-inspired methods, other common deep representation learning techniques have also been used for representing EHR concepts. Tran et al. formulate a modified restricted Boltzmann machine (RBM) which uses a structured training procedure to increase the interpretability of the representation. In related work, the strength of relations between various medical concepts was evaluated, and training linear models on representations obtained through autoencoders (AEs) was found to greatly outperform traditional linear models alone, achieving state-of-the-art performance.
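The idea that similar concepts should lie close together in vector space can be made concrete with a small sketch. The three concept names and their three-dimensional vectors are invented for illustration; real concept embeddings are learned from data and have many more dimensions. Cosine similarity is used here as one common closeness measure, which is an assumption rather than the measure of any particular cited study.

```python
import math
from typing import Dict, List


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_concept(query: str, vectors: Dict[str, List[float]]) -> str:
    """Return the concept whose vector is most similar to the query's."""
    others = (name for name in vectors if name != query)
    return max(others,
               key=lambda name: cosine_similarity(vectors[query], vectors[name]))


# Hypothetical embeddings: two related concepts and one unrelated concept.
concept_vectors = {
    "type_2_diabetes": [0.9, 0.1, 0.0],
    "hyperglycemia": [0.8, 0.2, 0.1],
    "ankle_fracture": [0.0, 0.1, 0.9],
}
closest = nearest_concept("type_2_diabetes", concept_vectors)
```

With these made-up vectors the nearest neighbour of `type_2_diabetes` is `hyperglycemia`, which is the behaviour a good concept representation is expected to show.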

5.2.2 Patient Representation

Several different deep learning methods for obtaining vector representations of patients have been proposed in the literature. Most of them are either inspired by NLP methods, such as distributed word representations, or use dimensionality reduction techniques, such as autoencoders.

Evaluation methods for EHR representation learning. Many of the studies involving representation learning evaluate their representations on auxiliary classification tasks, with the implicit assumption that improvements in prediction are attributable to a more robust representation of either clinical concepts or patients. Evaluation methods are therefore varied and task-dependent. They include metrics such as AUC (heart failure onset prediction, disease prediction, clinical risk group prediction), precision@k (disease progression, disease tagging), recall@k (medical code prediction, sequential clinical event prediction), accuracy (unplanned readmission prediction), and precision, recall, and F1 score (relation extraction, unplanned readmission prediction, risk stratification). A few studies do not include any secondary classification task and focus on evaluating the learned representations directly.
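The ranking metrics precision@k and recall@k mentioned above can be sketched as follows. The code strings are placeholders chosen for illustration only, and the definitions used (hits in the top k divided by k, and hits in the top k divided by the number of relevant items) are the standard ones, assumed here rather than quoted from a specific study.

```python
from typing import List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


# Hypothetical ranked code predictions for the next visit, best first.
ranked_codes = ["I10", "E11", "J45", "N18", "I50"]
# Hypothetical codes actually recorded at the next visit.
true_codes = {"E11", "I50", "K21"}

p_at_3 = precision_at_k(ranked_codes, true_codes, 3)
r_at_3 = recall_at_k(ranked_codes, true_codes, 3)
r_at_5 = recall_at_k(ranked_codes, true_codes, 5)
```

Here one of the top three predictions is correct, so both precision@3 and recall@3 are 1/3, while recall@5 rises to 2/3 because a second true code appears in fifth place.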

5.2.3 Outcome Prediction

The ultimate goal of many deep EHR systems is to predict patient outcomes. Two types of outcome prediction can be distinguished: (1) static or one-time prediction (for example, heart failure prediction using data from a single encounter), and (2) temporal outcome prediction (for example, heart failure prediction within the next six months, or disease onset prediction using historical data from sequential encounters). Many of these prediction systems make use of unsupervised data modelling, such as clinical concept representation (Sect. 5.2.1). In many cases, the main contribution is the deep representation learning itself.

(1) Static Outcome Prediction: The simplest class of outcome prediction applications is the prediction of a particular outcome without considering temporal constraints.

(2) Temporal Outcome Prediction: Some systems additionally predict future readmission based on past diagnoses and interventions. For all tasks, the deep methods were found to give the best performance. Nickerson et al. forecast postoperative responses, including postoperative urinary retention (POUR) and temporal patterns of postoperative pain, using MLP and LSTM networks, with the aim of suggesting more effective postoperative pain management. The Deepr system of Nguyen et al. uses a CNN to predict unplanned readmission following discharge. Like several other methods, Deepr operates on discrete clinical event codes.

5.2.4 Computational Phenotyping

As the amount and availability of detailed clinical health records has exploded in recent years, there is a large opportunity for revisiting and refining broad disease and diagnosis definitions and boundaries. Computational phenotyping follows a data-driven rationale: it lets the data speak for itself by discovering latent relationships and hierarchical concepts from the raw data, without human supervision or prior bias. With the availability of massive amounts of clinical data, many recent studies have used deep learning for computational phenotyping. Computational phenotyping has two primary applications: (1) discovering and stratifying new subtypes; (2) discovering specific phenotypes for improving classification under existing disease boundaries and definitions. Both areas seek to discover new data-driven phenotypes. The former is a largely unsupervised task that is difficult to evaluate quantitatively, whereas the latter is inherently tied to a supervised learning task whose results can be readily validated.

5.2.5 Clinical Data De-identification

Clinical notes usually include explicit personal health information (PHI), which makes it difficult to publicly release many valuable clinical datasets [20]. Dernoncourt et al. [20] created a system for the automatic de-identification of clinical text, which replaces a traditionally laborious manual de-identification process for sharing restricted data. Their framework consists of a bidirectional LSTM network (Bi-LSTM) with both word-level and character-level embeddings. The authors found their method to be state-of-the-art, with an ensemble approach that adds conditional random fields also performing well. In a similar task, Shweta et al. explore different RNN architectures and word embedding techniques for identifying potentially identifiable named entities in clinical text.

6 Interpretability

While deep learning techniques have gained a reputation for producing state-of-the-art performance on a wide variety of tasks, one of their main criticisms is that the resulting models are difficult to interpret. For this reason, many deep learning frameworks are often referred to as "black boxes", where only the input and the output predictions convey meaning to a human observer. The main cause of this lack of model transparency is precisely what makes deep learning so effective: the layers of nonlinear data transformations that uncover hidden factors of entanglement in the input. This issue presents a trade-off between performance and transparency (Table 3). In the clinical domain, model transparency is of the utmost importance, given that predictions may be used to influence patient treatment and real-world medical decision-making. This is why interpretable linear models such as logistic regression still dominate applied clinical informatics. This section describes attempts to make clinical deep learning more interpretable.

6.1 Maximum Activation

A popular approach within the image processing community is to examine the classes of inputs that result in the maximum activation of each hidden unit of a model. This is an attempt to examine what exactly the model has learned, and it can be used to assign importance to the raw input features. This approach has been adopted by several of the studies included in our overview.

Table 3 Techniques of interpretability for deep EHR systems

Type | Methods
(1) Maximum activation | Output activation maximization [21]; Convolutional filter response [8]; Dense top-layer weight maximization [16]
(2) Constraints | Non-negative matrix factorization [21]; Non-negativity [12]; Ontology smoothing [12]; Sparsity [22]; Regularization [12]
(3) Qualitative clustering | t-SNE [8]
(4) Mimic learning | Interpretable mimic learning [23]

6.2 Constraints

Other studies have imposed training constraints specifically aimed at increasing the interpretability of deep models. In one such approach, the k largest values of each column of the resulting code weight matrix are taken as a distinct disease group that is interpretable upon qualitative inspection. The same procedure is applied to the resulting visit embedding matrix to analyse the types of visits each neuron learns to identify. Similarly, the eNRBM architecture enforces non-negativity in the weights of the RBM. In a phenotype discovery framework for time series data, regularization and sparsity constraints on the AE resulted in continuous features in the first layer that were interpretable as detectors of functional elements such as uphill or downhill signal slopes. This is another example of improving the interpretability of learned model weights through sparsity.
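The top-k inspection of a weight matrix described above can be sketched in a few lines. The matrix, the concept names and the value of k are invented for illustration; the sketch assumes that rows correspond to medical codes and columns to hidden dimensions, which is one possible layout and not necessarily the one used in the cited work.

```python
from typing import List


def top_k_per_column(weights: List[List[float]], row_names: List[str],
                     k: int) -> List[List[str]]:
    """For each column, return the names of the k rows with largest weight."""
    groups = []
    for col in range(len(weights[0])):
        ranked_rows = sorted(range(len(weights)),
                             key=lambda row: weights[row][col],
                             reverse=True)
        groups.append([row_names[row] for row in ranked_rows[:k]])
    return groups


# Hypothetical non-negative code weight matrix: 4 codes x 2 hidden units.
codes = ["diabetes", "insulin", "fracture", "cast"]
code_weights = [
    [0.9, 0.0],
    [0.7, 0.1],
    [0.1, 0.8],
    [0.0, 0.6],
]
groups = top_k_per_column(code_weights, codes, k=2)
```

With this toy matrix the first hidden unit groups `diabetes` with `insulin` and the second groups `fracture` with `cast`, which is the kind of grouping a reviewer would then inspect qualitatively.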

6.3 Qualitative Clustering

In the case of EHR concept representation and phenotype studies, some works point to a more indirect notion of interpretability by examining natural clusters of the resulting vector representations. In a similar manner, Nguyen et al. project distributed representations of clinical event and patient vectors into two dimensions via t-SNE, allowing a qualitative comparison of similar diagnoses and patient subgroups.

6.4 Mimic Learning

The problem of deep model transparency was tackled most directly in the Interpretable Mimic Learning framework. First, a deep neural network was trained on raw patient data with associated class labels, which yields a vector for each sample. An additional gradient boosting tree (GBT) model was then trained on the raw patient data, but with the deep network's probability prediction used as the target label. As GBTs are interpretable tree-based models, feature importance can be assigned to the raw input features while harnessing the power of deep networks. The mimic learning method has similar or better performance than both the baseline linear and deep models for several phenotyping and mortality prediction tasks, while retaining the desired feature transparency.
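The teacher and student arrangement of mimic learning can be illustrated with a deliberately simplified sketch. Two substitutions are made and should be read as assumptions: the trained deep network is replaced by a fixed hypothetical scoring function, and the gradient boosting trees are replaced by a single depth-1 regression tree (a stump). The point shown is only that the student is fitted to the teacher's probability outputs and that the feature it selects can then be read off directly.

```python
import math
from typing import List, Tuple


def teacher_probability(x: List[float]) -> float:
    """Stand-in for a trained deep network: returns P(y = 1 | x)."""
    # Hypothetical nonlinear score; feature 0 dominates by construction.
    score = 4.0 * x[0] - 2.0 + 0.1 * math.sin(x[1])
    return 1.0 / (1.0 + math.exp(-score))


def fit_stump(X: List[List[float]],
              targets: List[float]) -> Tuple[int, float, float, float]:
    """Fit a depth-1 regression tree by minimizing squared error.

    Returns (feature index, threshold, left mean, right mean).
    """
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({x[j] for x in X}):
            left = [t for x, t in zip(X, targets) if x[j] <= threshold]
            right = [t for x, t in zip(X, targets) if x[j] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((t - left_mean) ** 2 for t in left)
                   + sum((t - right_mean) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("no valid split found")
    return best[1:]


# Raw inputs on a small grid; the student is trained on the teacher's
# soft predictions rather than on the original class labels.
X = [[x0, x1] for x0 in (0.0, 0.25, 0.5, 0.75, 1.0) for x1 in (0.0, 1.0, 2.0)]
soft_targets = [teacher_probability(x) for x in X]
feature, threshold, left_mean, right_mean = fit_stump(X, soft_targets)
```

Because the hypothetical teacher depends mostly on feature 0, the stump splits on that feature, which is the kind of readable importance signal the mimic model is meant to expose. A practical implementation would use a full gradient boosting library and a trained network.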

7 Discussion and Future Prospectus

This chapter provides a brief overview of current deep learning research as it pertains to EHR analysis. This is an emerging area, as seen by the fact that most of the studies covered in this chapter were published in the past two years [1]. Tracing back the deep learning-based advances in image and natural language processing, we see a clear chronological similarity to the progression of current EHR-driven deep learning research. In particular, a majority of the studies reviewed are concerned with representation learning, i.e., how best to represent the vast amounts of raw patient data that have become available in the past decade. Fundamental image processing research is concerned with increasingly complex and hierarchical representations of images composed of individual pixels. Likewise, NLP focuses on word, sentence, and document-level representations of language composed of individual words or characters. In the same way, different schemes for representing patient health data built from individual medical codes, demographics, and vital signs are now being investigated [1].

References

1. Shickel, B., Tighe, P.J., Bihorac, A., Rashidi, P.: Deep EHR: a survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE J. Biomed. Health Inform. 22(5), 1589–1604 (2018)
2. Birkhead, G.S., Klompas, M., Shah, N.R.: Uses of electronic health records for public health surveillance to advance public health. Annu. Rev. Public Health 36(1), 345–359 (2015)
3. Charles, D., Gabriel, M., Searcy, T., Carolina, N., Carolina, S.: Adoption of Electronic Health Record Systems Among U.S. Non-federal Acute Care Hospitals: 2008–2014. The Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 Directed the Office of the National Coordinator for Health, vol. 4, no. 23, pp. 2008–2014 (2015)
4. Jamoom, E., Yang, N.: Table of electronic health record adoption and use among office-based physicians in the U.S., by state. In: 2015 National Electronic Health Records Survey, pp. 1–2 (2016)
5. Botsis, T., Hartvigsen, G., Chen, F., Weng, C.: Secondary use of EHR: data quality issues and informatics opportunities. In: AMIA Joint Summits Translational Science Proceedings, vol. 2010, pp. 1–5 (2010)
6. Skrøvseth, S.O., Augestad, K.M., Ebadollahi, S.: Data-driven approach for assessing utility of medical tests using electronic medical records. J. Biomed. Inform. 53, 270–276 (2015)
7. Meystre, S.M., Savova, G.K., Kipper-Schuler, K.C., Hurdle, J.F.: Extracting information from textual documents in the electronic health record: a review of recent research. In: Yearbook of Medical Informatics, pp. 128–144 (2008)
8. Ekbal, A., Saha, S., Bhattacharyya, P.: Deep learning architecture for patient data de-identification in clinical records. In: Proceedings of the Clinical Natural Language Processing Workshop, pp. 32–41 (2016)
9. Choi, Y., Chiu, C.Y.-I., Sontag, D.: Learning low-dimensional representations of medical concepts. In: AMIA Joint Summits Translational Science Proceedings, vol. 2016, pp. 41–50 (2016)
10. Nguyen, P., Tran, T., Wickramasinghe, N., Venkatesh, S.: Deepr: a convolutional net for medical records. IEEE J. Biomed. Health Inform. 21(1), 22–30 (2017)
11. Choi, E., et al.: Multi-layer representation learning for medical concepts. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1495–1504, 13–17 Aug 2016
12. Pham, T., Tran, T., Phung, D., Venkatesh, S.: DeepCare: a deep dynamic memory model for predictive medicine. In: Lecture Notes in Computer Science (including Subser. Lecture Notes in Artificial Intelligence Lecture Notes Bioinformatics), vol. 9652 LNAI, pp. 30–41, Feb 2016
13. Jiang, M., et al.: A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. J. Am. Med. Inform. Assoc. 18(5), 601–606 (2011)
14. Borovcnik, M., Bentz, H.-J., Kapadia, R.: A Probabilistic Perspective (1991)
15. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
16. Cheng, Y., Wang, F., Zhang, P., Hu, J.: Risk prediction with electronic health records: a deep learning approach. In: 16th SIAM International Conference on Data Mining 2016 (SDM 2016), pp. 432–440 (2016)
17. Wong, C., Deligianni, F., Berthelot, M., Andreu-Perez, J., Lo, B., Yang, G.: Deep learning for health informatics. IEEE J. Biomed. Health Inform. 21(1), 4–21 (2017)
18. Fries, J.A.: Brundlefly at SemEval-2016 task 12: recurrent neural networks vs. joint inference for clinical temporal information extraction. In: SemEval 2016, 10th International Workshop Semantic Evaluation Proceedings, pp. 1274–1279 (2016)
19. Lv, X., Guan, Y., Yang, J., Wu, J.: Clinical relation extraction with deep learning. Int. J. Hybrid Inf. Technol. 9(7), 237–248 (2016)
20. Dernoncourt, F., Lee, J.Y., Uzuner, O., Szolovits, P.: De-identification of patient notes with recurrent neural networks. J. Am. Med. Inform. Assoc. 24(3), 596–606 (2017)
21. Tran, T., Nguyen, T.D., Phung, D., Venkatesh, S.: Learning vector representation of medical objects via EMR-driven nonnegative restricted Boltzmann machines (eNRBM). J. Biomed. Inform. 54, 96–105 (2015)
22. Lasko, T.A., Phenotype Discovery from Electronic Medical Records How Do You Perceive a Chessboard? (2017)
23. Che, Z., Purushotham, S., Khemani, R., Liu, Y.: Interpretable deep models for ICU outcome prediction. In: AMIA … Annual Symposium Proceedings. AMIA Symposium, vol. 2016, pp. 371–380 (2016)

Mr. Pawan Singh Gangwar is currently pursuing his master's in bioinformatics at Delhi Technological University. Having completed his bachelor's in biotechnology, he is highly motivated towards computational research in the life sciences and possesses an in-depth knowledge of the field. In his free time, Pawan likes to play badminton and chess and to listen to music.

Dr. Yasha Hasija is an Associate Professor at Delhi Technological University. She holds bachelor's and master's degrees in biotechnology and a Ph.D. in bioinformatics. Besides having a sound academic foundation, Dr. Hasija is a vibrant individual and a very good orator. She specializes in genome informatics and the study of interactions with human diseases. Her research interests include genetic analysis of dermatological disorders, tuberculosis, and the role of human genetic variation in age-related disorders.

Application of Deep Architecture in Bioinformatics

Sagnik Sen, Rangan Das, Swaraj Dasgupta and Ujjwal Maulik

Abstract Recent discoveries in the field of biology have transformed it into a data-rich domain. This has invited multiple machine learning applications and, in particular, deep learning, a set of methodologies that has rapidly evolved over the last couple of decades. Deep learning (DL) is extensively used in many domains, including bioinformatics, for the analysis and classification of biomedical imaging data, sequence data from omics, and biomedical signal processing. It has been used to predict protein structures, uncover gene expression regulation, classify anomalies and understand functionalities of the brain. Basic deep neural networks, which contain stacked layers of non-linear processing units, are quite versatile and have been used extensively in almost every domain of bioinformatics. Convolutional neural networks have proved to be quite effective when working with image data and are used in classifying biomedical images such as histopathology images, cell images, X-ray images, magnetic resonance images and so on. They have been used for anomaly classification, recognition, and segmentation. For areas that require dealing with sequential data, such as protein structure prediction and brain decoding, recurrent neural networks have been used extensively. Besides these, many new architectures are currently being explored to address some of the common drawbacks of deep learning. Fuzzy systems have been incorporated into deep learning in an attempt to improve the performance of such models. Multimodal learning is enabling modern architectures to work with heterogeneous data.

Keywords Deep architecture · Bioinformatics · Biomedical images · Convolutional neural network · Recurrent neural network

S. Sen (✉) · R. Das · S. Dasgupta · U. Maulik
Department of Computer Science and Engineering, Jadavpur University, Jadavpur, Kolkata 700032, India

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics, Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_9

1 Introduction

Deep learning (DL) is an important area of machine learning that has received a lot of attention recently. Deep learning methods work by progressively extracting complex features from the input data and mapping those features to the output. The learning algorithm can thus efficiently build up complex relationships between an input and the desired output. It has therefore been used extensively in areas such as computer vision and pattern recognition, self-driving cars, robotics, weather forecasting, earthquake prediction, and even the generation of deep neural networks. These innovations were fueled not only by recent algorithmic advances in deep architectures but also by the availability of high-throughput data. Training deep learning models effectively requires copious amounts of data. With modern devices, sequencing techniques and improved imaging technologies, biology has become a data-rich field. Omics data alone makes up a major fraction of the accumulated data, and vast repositories of image and signal data are also available. Deep learning provides a well-suited set of tools for making good use of this data. Contemporary deep learning models are used to solve diverse biological problems such as protein structure prediction, protein-protein interaction analysis, protein function prediction, bioimage analysis, brain signal analysis and so on.

Previously, many other well-known algorithms were used to make sense of biological data, such as support vector machines (SVMs), hidden Markov models (HMMs), random forests, Gaussian networks, Bayesian networks, and so on. These have been heavily applied in proteomics, genomics, systems biology, and many other related domains. Whatever algorithm is used, the performance of the method depends heavily on the features presented as input. Features represent the input data, which is then processed by the machine learning algorithm to provide the relevant output. Selecting the right features, however, can be quite a difficult task, especially in the domain of omics. Learning features automatically has been a major contribution of deep learning, and it has helped make massive progress not only in other domains but in bioinformatics too.

1.1 Deep Learning: An Overview

Deep learning has shown great promise in real-world applications where many conventional machine learning algorithms have fallen short. Most early machine learning approaches relied heavily on the knowledge of domain experts for feature engineering, the task of crafting the inputs for the machine learning model. This was a great limitation, since processing raw data to create features is tedious. Deep learning takes care of this by generating the features on its own and mapping them to the outputs. This is done by multiple layers of nonlinear processing units, called artificial neurons or perceptrons. In a fully connected network, each neuron of a layer is connected with all the neurons of the preceding and succeeding layers, but with none of the neurons in its own layer. Layers of these neurons stacked one after another form a deep neural network. Each neuron is controlled by a set of parameters, called weights and biases. As the deep neural network is trained, these parameters are adjusted by the learning algorithm so that the error is minimized. As the layers are trained over successive iterations, they get better at extracting the relevant features from the input data. Most deep learning models are built on this technique.

Deep learning architectures can be broadly classified into three groups: deep neural networks (DNN), convolutional neural networks (CNN) and recurrent neural networks (RNN). The term deep neural network is a generic term that is often used to refer to all deep learning architectures, but here it refers to multilayer perceptrons (MLPs), restricted Boltzmann machines (RBMs) and stacked autoencoders (SAEs). CNNs have long been used for computer vision problems; here, medical and microscopic images are analysed extensively with CNNs. RNNs are used for predicting or analysing sequence data, such as biomedical signals and biological sequences.
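The layered structure described above, in which each neuron applies weights and a bias to the outputs of the preceding layer, can be shown with a minimal forward pass. The layer sizes, weight values and choice of activation functions below are arbitrary illustrative assumptions, and no training step is included.

```python
import math
from typing import Callable, List, Tuple

Layer = Tuple[List[List[float]], List[float], Callable[[float], float]]


def relu(z: float) -> float:
    return max(0.0, z)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs: List[float], weights: List[List[float]],
                biases: List[float],
                activation: Callable[[float], float]) -> List[float]:
    """Each row of `weights` holds one neuron's weights over all inputs."""
    return [activation(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
            for neuron_w, b in zip(weights, biases)]


def forward(inputs: List[float], layers: List[Layer]) -> List[float]:
    """Feed the input through the stacked layers in order."""
    activations = inputs
    for weights, biases, activation in layers:
        activations = dense_layer(activations, weights, biases, activation)
    return activations


# Hypothetical network: 2 inputs -> 2 hidden ReLU neurons -> 1 sigmoid output.
network: List[Layer] = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, 1.0]], [-1.0], sigmoid),
]
output = forward([2.0, 1.0], network)
```

For the input (2, 1) the hidden activations are 1.0 and 1.5, so the output neuron computes the sigmoid of 1.5. Training would adjust the weights and biases to reduce the error between such outputs and the desired targets.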

1.2 An Overview of Protein Structures

Proteins are essentially polymers of amino acids [1, 2]. Once an amino acid chain has been produced through transcription and translation, it takes up different shapes as it folds onto itself. The sequence of amino acids in the chain is the primary structure of a protein. The chain is held together by peptide bonds formed during protein synthesis; hence, amino acid sequences are also called polypeptides. These polypeptides fold into simple structures such as loops, sheets or helices, known as secondary structures. Secondary structures are regular local sub-structures on the polypeptide backbone chain, and which of them forms depends on the amino acids present in the chain. Formally, the structures are mainly of three types: α-helices, β-sheets and loops [1, 2]. They are determined by the hydrogen bonds between the carbonyl oxygen and amine hydrogen atoms in the peptide backbone. The tertiary structure of a protein is its 3D structure, formed as the secondary structure elements fold onto themselves. A tertiary structure has a single polypeptide chain backbone with one or more protein secondary structures (PSS). Different bonding and non-bonding energies determine the tertiary structure [1, 2]. These include the covalent bond energy, hydrogen-hydrogen (H–H) interaction energy, electrostatic energy, van der Waals forces and other intra-molecular forces. Several tertiary structures combine to form the quaternary structure. This happens when multiple protein structures bind together to reach a global minimum energy state [3].

Protein structure prediction is quite a difficult task because of the many parameters at play. Predicting secondary structure from the primary sequence is comparatively tractable with contemporary methods. The Chou-Fasman algorithm is a statistical tool that was initially used to find the secondary structure of a protein from its polypeptide sequence [4]. Multiple methods are now available that perform this prediction with higher accuracy. For instance, an RNN can easily outperform the Chou-Fasman algorithm. Predicting the 3D structure, however, remains a challenge. Two types of computational techniques, template-based and ab initio, are used to predict three-dimensional structure. The template-based technique is the older of the two and depends on sequence similarity with a sample of known structure, e.g., homology modelling. The ultimate target, however, is to design a structure with a global minimum energy. To date, multiple machine learning based algorithms have been designed [5, 6]. Most of them approximate the result with multiple candidate structures and then optimize that result further.

The function of a protein depends on its structure. A protein interacts with other molecules (mostly proteins, except in the case of DNA-binding proteins) through its binding site. The interaction partners of a protein are determined by compatible binding sites among proteins. The protein-protein interaction (PPI) network is in turn derived from multiple interaction partners [7, 8]. PPI networks are important at the biological level, since proteins are the main functional elements of any biological system. Their behaviour depends on their functions and interaction partners, which help to define the position of a protein in a biological pathway [7, 8]. Predicting the functions and interaction partners of proteins is a known challenge in computational biology. Hidden Markov models [9] and genetic algorithms have already been applied to these problems with some success. However, their processing time is quite long, so there is scope for algorithmic improvement, and a few deep architectures have been applied in this field; e.g., Zhao and Gong [10] describe a deep model to predict protein-protein interaction pairs.

Deep architectures have also had a great impact on image processing [11]. Deep learning approaches are therefore applied to medical images to diagnose abnormal disease conditions [12, 13]. Recent research has applied different deep architectures to MRI data [13], hyperspectral images [14] and so on.

2 Deep Learning Approaches for Predicting Protein Structures

Predicting protein structures is one of the oldest problems in the domain of bioinformatics. Since structures are determined by the sequence, recurrent neural networks intuitively appear to be a suitable choice. However, other networks such as CNNs and generative stochastic networks (GSNs) have also been used. These are discussed below.


2.1 Predicting with Long Short Term Memory (LSTM) Network A RNN, unlike DNNs or CNNs, have a cyclical computational graph. Traditional DNNs and CNNs are modeled as a feed-forward network. RNNs are commonly implemented using a Long Short Term Memory (LSTM) [15–17] units. In the feed forward neural network [18–21], the connection between neurons does not make any cycle while in the recurrent neural networks (RNN) [22, 23] allows cyclic connection. The main difference between RNN and multilayer perceptron (MLP) is that MLP can map only between input and output while RNN can map the entire history of the preceeding input to each output. LSTM recalls values over arbitrary intervals. An LSTM network is similar to a standard RNN, except that the hidden layer now has memory cells instead of summation units. A schematic to describe the workflow is described in Fig. 1. These memory cells can remember past patterns it has seen without loss. The same output layer which is utilized for RNN can also be used for LSTM [15]. LSTM can be implemented for structure prediction of proteins [24–26]. Even RNN can also be applied to find the predicting the secondary structure of protein [27], however, one of the disadvantages of RNN is the issue of vanishing gradients [27, 28]. LSTM is used for solving the problem of vanishing gradients. For predicting the secondary structure of protein, a simple RNN is not suitable. RNN only considers the past sequences, however, the entire sequence is required beforehand for protein sequences. This problem can be solved by bidirectional RNN [29]. Bidirectional RNN processes the data in both direction with two separate hidden layer, which subsequently are feed forwarded to output layer [29]. The combination of bidirectional RNN and LSTM is bidirectional LSTM [30]. Two layers are combined by normalizing the activation from each layer in a softmax layer [22]. The LSTM

Fig. 1 A schematic diagram to show the workflow of recurrent neural network

172

S. Sen et al.

uses a feed-forward network for PSS prediction using softmax prediction function [24]. Equations 1–8 gives the detailed description of the LSTM architecture which is used for protein secondary structure prediction [24]. Ft = σ (W F h t−1 + W F at + b F )

(1)

It = σ (W I h t−1 + W I at + b I )

(2)

t = tanh(at WG + h t−1 WG + bG ) G

(3)

Mt = Ft  Mt−1 + It   gt

(4)

Yt = σ (αt WY + h t−1 WY + bY )

(5)

h t = Yt  tanh(Mt )

(6)

h t−r ec = h t + Feed f or war d(h t )

(7)

σ (x) =

1 1 + exp(1 − x)

(8)

at : input from the previous layer: h l−1 t F t : Forget Gate I t : Input Gate t : New memory cell G M t : Final memory cell. Y t : Output Gate ht : Final hidden state ht −rec : Forward recursion. Sonderby and Winther [24] modified their LSTM architecture for protein secondary structure prediction by introducing a feed-forward network between recurrent-hidden state as in Eq. 7. This approach for protein secondary structure prediction mainly focuses on 8-class secondary structure [31] prediction which is more informative than the traditional 3-class and 8-class secondary structure labels were designed using the DSSP program [32]. The DSSP program classify each residue into eight classes (C: Loops and irregular elements (corresponding to the blank characters output by DSSP), E: β-strand, H: α-helix, B: β-bridge, G: 3_{10} helix, I: π-helix, T: Turn, S: Bend). This model uses 3 layers that have 300 or 500 LSTM units per layer. The FF network is implemented using a two layers ReLU activation with similar number of units per layer. The output from bidirectional forward and backward is connected to a vector that is forwarded through two ReLU activation layers which have 300 or 500 hidden units. This approach achieved accuracy of


67.40% [24], better than the GSN approach [33], which achieved an accuracy of 66.40%. The LSTM network also performs much better than the bidirectional RNN approach [34], which achieved an accuracy of 51.10%.
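The gate computations of Eqs. 1–6 and the bidirectional processing described above can be sketched in plain Python. This is a minimal illustration, not the implementation of [24]: the function names, the `params` dictionary layout, the use of separate input and recurrent weight matrices per gate, and the zero initial states are all assumptions made for this sketch, and the feed-forward term of Eq. 7 is omitted.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def affine(a, h_prev, w_a, w_h, b):
    """Compute a @ w_a + h_prev @ w_h + b for row vectors stored as lists."""
    out = []
    for j in range(len(b)):
        s = b[j]
        s += sum(a[i] * w_a[i][j] for i in range(len(a)))
        s += sum(h_prev[k] * w_h[k][j] for k in range(len(h_prev)))
        out.append(s)
    return out


def lstm_step(a_t, h_prev, m_prev, params):
    """One LSTM time step following the gate structure of Eqs. 1-6.

    params maps each gate name ("F", "I", "G", "Y") to a tuple
    (w_a, w_h, b); this layout is hypothetical, chosen for the sketch.
    """
    f = [sigmoid(z) for z in affine(a_t, h_prev, *params["F"])]    # forget gate
    i = [sigmoid(z) for z in affine(a_t, h_prev, *params["I"])]    # input gate
    g = [math.tanh(z) for z in affine(a_t, h_prev, *params["G"])]  # new memory cell
    y = [sigmoid(z) for z in affine(a_t, h_prev, *params["Y"])]    # output gate
    m = [f[k] * m_prev[k] + i[k] * g[k] for k in range(len(m_prev))]  # final memory cell
    h = [y[k] * math.tanh(m[k]) for k in range(len(m))]               # hidden state
    return h, m


def run_bidirectional(sequence, params_fwd, params_bwd, hidden_size):
    """Run one LSTM left-to-right and another right-to-left, then
    concatenate the two hidden states at every position."""
    def run(seq, params):
        # Zero initial hidden state and memory cell (an assumption).
        h, m = [0.0] * hidden_size, [0.0] * hidden_size
        states = []
        for a_t in seq:
            h, m = lstm_step(a_t, h, m, params)
            states.append(h)
        return states

    fwd = run(sequence, params_fwd)
    bwd = run(sequence[::-1], params_bwd)[::-1]
    return [hf + hb for hf, hb in zip(fwd, bwd)]
```

Each position of the output then carries information from both the preceding and the following residues, which is the property that motivates the bidirectional variant for secondary structure prediction.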

2.2 Deep Supervised and Convolutional Generative Stochastic Networks

Generative Stochastic Networks (GSNs) [35] have recently been used to learn generative models of a data distribution without specifying an explicit probabilistic graphical model [36]. A GSN is trained by backpropagation [35, 37]. Rather than directly parameterizing P(X), a GSN learns the transition operator of a Markov chain whose samples estimate the data distribution [38]. A GSN trains a stochastic computational graph to reconstruct the input X [39]. The primary advantage of a GSN is that the computational graph may have latent states, which makes it similar to generative models such as the Deep Boltzmann Machine (DBM) [40]. The architecture is described below. To apply a convolutional GSN, there are two inputs: a feature channel X and a label channel y. Figure 2 shows the architecture of a convolutional GSN model. For the supervised convolutional GSN, the computational graph corrupts the label channel and then reconstructs it. The feature map is given as input to the first hidden layer to compute the activations [33]. The convolutional GSN includes an input layer and a convolutional layer. Its computational graph uses layer-wise sampling, similar to the DBM [40]. Each convolutional GSN layer in the computational graph must contain a convolutional layer, while the pooling layer is optional. Convolutional layers can be stacked to build deeper architectures [36]. The convolutional GSN approach to PSS prediction mainly focuses on 8-state secondary structure prediction [31], which gives more structural information than the 3-state scheme. Unlike the 3-class scheme, the 8-class scheme can distinguish between the 3-turn helix (3_{10} helix) and the 4-turn helix (α-helix), and it can describe different types of loop regions. A position-specific scoring matrix (PSSM) is also used for predicting the secondary structure of a protein [41].

Fig. 2 Architecture of a convolutional GSN with two convolutional GSN layers

PSSM is a matrix of size n × b, where n is


the protein length and b is the number of amino acid types. The PSSM is generated using the UniRef90 data set. The PSSM scores are transformed into the range 0–1 using the sigmoid function [42], and the resulting matrix is used as input to the convolutional GSN model [33]. The protein data set is generated with the PISCES Cull PDB server [43]. It consists of 6128 proteins, divided randomly into a training set of 5600 proteins, a validation set of 256 proteins and a test set of 272 proteins [33]. The 8-state secondary structure labels are determined from the 3D structures in the Protein Data Bank (PDB) by the DSSP program [32]. The training data contain both labels and features. To inject noise into the input labels, half of the input labels were randomly set to zero. The convolutional GSN is trained globally by backpropagation [35]. Sigmoid activation is used in the visible layer, while the tanh activation function is used for all other layers. This convolutional GSN approach to protein secondary structure prediction [33] achieved a Q8 accuracy of 66.40%, better than CNF/Raptor-SS8 [44], which achieved a Q8 accuracy of 64.90%. The main disadvantage of the convolutional GSN approach is that the convolutional structure is hard-coded, so it may sometimes fail to capture the spatial organization of the protein sequence.
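The two preprocessing steps mentioned above, squashing PSSM scores into the range 0–1 and corrupting half of the input labels, might look as follows. This is a hedged sketch rather than the procedure of [33]: the function names, the choice to zero whole per-residue label vectors, sampling without replacement, and the fixed random seed are assumptions introduced here.

```python
import math
import random


def scale_pssm(pssm):
    """Map every raw PSSM score to (0, 1) with the logistic function."""
    return [[1.0 / (1.0 + math.exp(-score)) for score in row] for row in pssm]


def corrupt_labels(one_hot_labels, fraction=0.5, seed=0):
    """Set a random `fraction` of the per-residue label vectors to all zeros.

    The fraction of one half follows the text; zeroing whole vectors,
    sampling positions without replacement, and the fixed seed are
    assumptions of this sketch.
    """
    rng = random.Random(seed)
    n = len(one_hot_labels)
    dropped = set(rng.sample(range(n), int(n * fraction)))
    return [
        [0] * len(vec) if idx in dropped else list(vec)
        for idx, vec in enumerate(one_hot_labels)
    ]
```

For a protein of length n, `scale_pssm` keeps the n × b shape of the matrix, and `corrupt_labels` leaves the remaining label vectors untouched so that the network has to reconstruct the zeroed ones.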

2.3 Latent Convolutional Neural Networks

A deep architecture based on CNNs was used to implement a latent deep learning system for predicting protein structure. This architecture has two levels. First, a stacked sparse autoencoder extracts initial protein features; the resulting features are then used as input to the latent CNN architecture. Each level is described in detail below.

Stacked Sparse Autoencoder Approach to Extract Initial Protein Features

An autoencoder is an unsupervised feature extraction model. It consists of three layers of artificial neurons, where the intermediate hidden layer has fewer nodes than the input layer, while the output layer and the input layer have the same number of nodes. The goal of an autoencoder is to replicate the input at the output. Since the data is passed through a smaller number of intermediate nodes, the features are compressed and the output is reconstructed from only the most dominant features of the input. This is how the important features are automatically extracted by the autoencoder. A stacked autoencoder is made of multiple consecutive autoencoders, where the features extracted by one autoencoder are passed as the input to the next [45]. The sparse autoencoder is used to extract the initial level of protein features, which, in conjunction with a CNN, yields a better set of features. In the architecture, the sparse autoencoder works as a preprocessing and feature extraction unit. For preprocessing, the available protein dataset is separated into two parts, training data and validation data. The sequence string is mapped to a binary representation: of the 20 amino acids, the position of the amino acid that is present is coded with 1 and all other positions are set to 0. Twenty binary strings are needed where


each string represents one amino acid. So the size of the input data is 20 × M, where M is the number of amino acids in the chain [5]. The same procedure is used for the output: the α-helix is represented as [1 0 0], whereas the β-sheet and the loop are represented as [0 1 0] and [0 0 1], respectively [5]. The 20 × M input is fed into the autoencoder to detect the initial set of features from the training data. Using these features, a softmax classifier is trained [46] to predict the secondary structure [47].

Deep Learning Implemented Using Latent CNN Structure

CNN-based deep architectures are inspired by the animal visual cortex [48]. Al-Azzawi [47] describes a latent deep learning architecture based on the stacked sparse autoencoder. A CNN is a neural network composed of layers of artificial neurons that learn shared weights and biases, and it is trained with the backpropagation algorithm [49, 50]. A local receptive field, or local filter, scans the entire input. Because the filter shares the same weights and bias at every position, all the neurons in the first hidden layer detect the same feature at different positions of the input [49, 50]. Feature extraction is done by convolving the input with the filter, adding a bias term, and passing the result through an activation function. Each subsequent convolutional layer applies learned filters to the feature maps of the previous layer. The second operation is pooling. The pooling layer performs subsampling to decrease the size of the output. Max-pooling is a common subsampling method: the feature map is divided into small, equally sized regions and the maximum value of each region is kept [49, 50]. The final layer is a fully connected layer. As mentioned before, the latent CNN structure combines the stacked sparse autoencoder with a deep CNN.
Al-Azzawi [47] used the SCRATCH protein dataset, which contains primary and secondary structures with their three-class descriptions. The performance of the PSS prediction system is measured as the ratio of the number of correct predictions (true positives) to the total number of predictions [47]. With the stacked sparse autoencoder alone, the training accuracy is 62.67% and the testing accuracy is 61.04% [47]. The latent deep learning approach achieved an accuracy of 90.31% on the SCRATCH protein dataset [47], whereas the machine learning approach proposed by Magnan and Baldi achieved 84.51% [51].
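The 20 × M binary input encoding, the three-class target encoding, and the max-pooling step described in this section can be illustrated with a short sketch. The amino acid ordering in `AMINO_ACIDS`, the letter codes H, E and C for helix, sheet and loop, and the function names are assumptions made for this example; they are not specified in [5] or [47] as summarized here.

```python
# Ordering of the 20 standard residues is an assumption of this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Letter codes are hypothetical; the vectors follow the text:
# alpha-helix [1 0 0], beta-sheet [0 1 0], loop [0 0 1].
SS_CODES = {"H": [1, 0, 0], "E": [0, 1, 0], "C": [0, 0, 1]}


def one_hot_sequence(sequence):
    """Return a 20 x M matrix with a single 1 per column (one column per residue)."""
    matrix = [[0] * len(sequence) for _ in AMINO_ACIDS]
    for col, residue in enumerate(sequence):
        matrix[AMINO_ACIDS.index(residue)][col] = 1
    return matrix


def encode_structure(labels):
    """Map a string of H/E/C labels to the three-element target vectors."""
    return [SS_CODES[label] for label in labels]


def max_pool_1d(values, window):
    """Non-overlapping max pooling: keep the maximum of each window."""
    return [max(values[i:i + window]) for i in range(0, len(values), window)]
```

For a chain of M residues, `one_hot_sequence` produces twenty rows of length M, matching the 20 × M input size given above, and `max_pool_1d` shows how pooling shortens a feature map by the window size.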

3 Deep Learning Approach for Protein–Protein Interaction and Protein Function Prediction

Protein function prediction is key to understanding molecular mechanisms. Under the structure-function paradigm, the functional dependencies of proteins are associated with their structures at the cellular and subcellular level. Organism-specific prediction of protein function from structure or biophysical properties is a machine-learning-based modeling problem. Following that, the interaction


partners are determined by the functional classification of the proteins. Predicting interaction partners from a Protein–Protein Interaction (PPI) network is also a computational challenge. These issues can be addressed by applying deep architectures. A few recent studies on this topic are discussed below.

3.1 Identification of Protein Function Based on Its Structure Using Deep CNN

Protein function prediction methods are techniques used to determine the biological and biochemical roles of proteins. The deep CNN (DCNN), a high-performance machine learning model [49, 50], is used to design a predictive model for protein function prediction (Fig. 3). A DCNN consists of convolutional and pooling layers. The depth of the filters increases from the start to the end of the network. The last stage consists of one or more fully connected layers. The DCNN architecture can be used to predict the function of a protein [52]. Protein function is associated with the 3D structure of the protein, and the binding site also influences function. A domain in a protein is a structural motif that folds into a definite structure. CATH is a hierarchical classification of protein domain structures [53]. SCOP was introduced to provide a detailed description of the structures of known proteins and the relationships between them [54]. For tertiary structure recognition, feature extraction is a vital step. One conventional method for identifying the 3D structure of a protein is to extract feature vectors and compare them using some distance measure [55]. However, this distance-based method is very sensitive and may fail to retrieve similar structures of certain types. The 3D structure is based

Fig. 3 The DCNN architecture for tertiary protein structure prediction [52]


on a backbone polypeptide chain that is flanked by one or more protein secondary structures or domains. The bonding and interactions of the side chains with the subunits of the protein define the complete 3D structure. The protein tertiary structure is represented by the positions of the atoms in 3D space [2]. For protein function prediction using a DCNN, a three-dimensional (3D) array representing the tertiary structure of the protein is required. A visualization tool [56] can display the protein structure in 3D form. However, the DCNN requires coordinates that show the connections between atoms in a 3D array. Virtual Reality Modeling Language (VRML) can be used for better visualization and pixel coordinates. The Molscript tool can be used to convert PDB files to VRML format [57]. Binvox, a 3D mesh voxelization tool, is then used to generate a 3D array of 0s and 1s, in which a 1 represents the presence of the object at that position [52, 58]. After pre-processing, the tertiary protein structure is thus represented as a 3D array. This 3D array is projected onto the three perpendicular planes XY, XZ and YZ [52]. Each projected 2D image is provided as input to the DCNN, with a separate feature extraction layer for each projected view. The last layer applies the Rectified Linear Unit (ReLU) [11] for classification. The DCNN extracts features from the three projected images and classifies them using a fully connected neural network. Because the functional properties of a protein depend on the shape and size of the binding-site region, the proposed DCNN approach [52] can classify proteins by their active domains and thus by their functionality. The data set is divided into 5 non-overlapping subsets, and each subset is further divided into a training set and a validation set [52].
This model [52] achieves a maximum accuracy of 88% and an average accuracy of 81%, which is close to recently developed methods based on protein fold recognition [59–62], with accuracies of 89%, 74%, 80% and 84.04%, respectively (Fig. 4).
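The projection of the binary 3D array onto the XY, XZ and YZ planes can be sketched as below. The function name and the choice of a maximum (logical OR) projection, where a pixel is set whenever any voxel along the collapsed axis is set, are assumptions of this illustration; [52] may define the projection differently.

```python
def project_voxels(volume):
    """Project a binary 3D array volume[x][y][z] onto the XY, XZ and YZ planes.

    A pixel in a projection is 1 if any voxel along the collapsed axis is 1.
    Using a maximum (logical OR) projection is an assumption of this sketch.
    """
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    # Collapse z to obtain the XY view, indexed as xy[x][y].
    xy = [[max(volume[x][y][z] for z in range(nz)) for y in range(ny)] for x in range(nx)]
    # Collapse y to obtain the XZ view, indexed as xz[x][z].
    xz = [[max(volume[x][y][z] for y in range(ny)) for z in range(nz)] for x in range(nx)]
    # Collapse x to obtain the YZ view, indexed as yz[y][z].
    yz = [[max(volume[x][y][z] for x in range(nx)) for z in range(nz)] for y in range(ny)]
    return xy, xz, yz
```

Each of the three returned 2D images would then be passed to its own feature extraction branch, as described above.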

Fig. 4 Schematic diagram of the workflow of a deep convolutional neural network


3.2 DL Based PPI Interface Residue Pair Prediction

PPIs are biochemical events that involve two or more protein molecules, and they play an important role in the functioning of the cell. It is important to identify PPI sites, as they show which amino acid residues contribute most to protein–protein interactions. This also makes them potential drug targets. Furthermore, they give insight into metabolic and signal transduction networks. Fields such as protein engineering, protein design and drug design rely heavily on the understanding of PPI. To predict PPIs correctly, methods have been designed to predict the binding sites of monomeric proteins [63]. There are four main kinds of approach: machine learning [64], template-based [65], correlated mutations [66] and structural modeling [67]. The last is widely used but has some limitations [68]. Deep architectures have become a popular approach [69–71] to PPI interface residue pair prediction [10]. More precisely, LSTM is applied [15–17]. As mentioned before, LSTM is an RNN architecture that remembers values over arbitrary intervals. Unlike in a standard RNN, the summation units are replaced by memory cells, which can remember previously seen information. The output layer used for an RNN can also be used for an LSTM. The RNN consists of an input layer, a hidden layer and an output layer. The input propagates through these layers in order; this is the forward pass. There are two main methods for adjusting the weights of the network: real-time recurrent learning and backpropagation. For backpropagation, the partial derivative of the loss (error) function with respect to the output of the network is calculated first. The partial derivatives of the loss with respect to the weights are then obtained by applying the chain rule, and each weight is adjusted in the direction that reduces the loss. This procedure is called the backward pass. In the LSTM, the output depends on the cell state: a sigmoid layer determines which parts of the cell state are output. The cell state is passed through a tanh activation function, which scales its values to between −1 and 1, and the result is multiplied by the output of the sigmoid gate to give the final output. This LSTM model was trained, validated and tested on data from the International Critical Assessment of Protein–protein Interaction Prediction [72, 73]. The method achieved an accuracy of 90% for the prediction of protein–protein interaction interface residue pairs.
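The backward pass and the LSTM output step described above can be written compactly. In this sketch, $L$ is the loss, $o$ the network output, $w$ a single weight and $\eta$ a learning rate; these symbols are introduced here for illustration only and are not taken from [10]. The second line reuses the notation of Eq. 6.

$$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial o}\,\frac{\partial o}{\partial w}, \qquad w \leftarrow w - \eta\,\frac{\partial L}{\partial w}$$

$$h_t = Y_t \odot \tanh(M_t), \qquad Y_t \in (0, 1), \quad \tanh(M_t) \in (-1, 1)$$

The first line is the chain rule followed by a gradient-descent update, which moves each weight in the direction that reduces the loss. The second line shows why the output stays bounded: the sigmoid gate $Y_t$ scales a tanh-squashed cell state, so every component of $h_t$ lies strictly between −1 and 1.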

4 DL in Medical Imaging and Disease Diagnosis

CNNs are a powerful tool for solving problems in computer vision. A DCNN can automatically learn mid-level and high-level abstractions from raw input data. Accurate disease diagnosis depends heavily on both image acquisition and image interpretation. In 1996, a CNN was applied to medical image processing for breast cancer detection [74].


4.1 Patch-Based CNN Approach for Brain MRI Segmentation

Magnetic Resonance Imaging (MRI) plays a crucial role in medical diagnosis, especially for conditions of the brain. Structural variation of the brain may be a symptom of many diseases. Medical image segmentation is the process of automatic or semi-automatic recognition of boundaries within a 2D or 3D image. The high variability of such images is the biggest challenge. Not only is there huge variation in anatomy between individuals, but the different medical imaging methods, such as CT, PET and X-ray, also have their own distinguishing characteristics. MRI provides quite detailed images and has therefore been used for automatic segmentation. In modern medical research, segmentation of brain MRI plays an important role. The severity of some diseases of the brain can be evaluated by observing structural variation, measured as the volumes of regions of interest [75]. Several segmentation methods are available, which are basically edge-based and contour-based [12]. However, it is quite challenging to achieve good accuracy on brain MRI segmentation with these methods. A CNN is a suitable method because it can work with multidimensional inputs, so both gray-scale and color images can be processed [76–78]. Conventional approaches to brain MRI segmentation are also very time-consuming, and obtaining training data is a major problem. To overcome these difficulties, Cui et al. [79] implemented brain MRI segmentation using a patch-based CNN architecture on the public CANDI neuroimaging data set.
The dataset contains 103 MRI scans from four diagnostic groups: bipolar disorder with psychosis, bipolar disorder without psychosis, schizophrenia spectrum and healthy controls [80]. Cui et al. [79] extract a few sets of MRI data, where each set consists of 4 to 5 MRI scans. These 256 × 256 images are divided into 32 × 32 and 13 × 13 patches. The training set has nearly a hundred thousand image patches. This method uses a CNN for pixel-based automatic segmentation of brain MRI [81]. In image segmentation tasks, each image patch has a label, and the labels of the patches are used to create a new segmented MRI image. The proposed CNN architecture, which uses multiple 5 × 5 kernels, achieved an accuracy of 90.83% [79]. It is compared with five other architectures: three CNNs (CNN1, CNN2 and CNN3) and two artificial neural networks (ANN1 and ANN2). The layer structure of CNN1 and CNN2 is identical to that of the proposed CNN; the only difference is that CNN1 and CNN2 use fewer feature maps. The input patch size for both CNN1 and CNN2 is 32 × 32. The activation function is replaced by a sigmoid function in the convolutional layer. CNN3 uses a 13 × 13 input patch size and contains 4 convolutional layers and a fully connected layer, with no max-pooling layer. ANN1 is a 3-layer architecture and ANN2 is a 5-layer architecture. In ANN1,


Table 1 A list of applied machine learning approaches for different biological problems, along with their performance

| Implementation on biological issues | Applied machine learning approaches | Accuracy (%) |
|---|---|---|
| Protein secondary structure prediction | Latent CNN [47] | 90.3126 |
| | Machine learning and structural similarity [51] | 84.51 |
| | LSTM [24] | 67.4 |
| | GSN [33] | 66.4 |
| | Stacked sparse auto encoder [47] | 62.674 |
| | CNF/Raptor-SS8 [44] | 64.9 |
| | RNN [34] | 51.1 |
| Protein function prediction | Protein folding [59] | 89 |
| | Deep CNN [52] | 88 |
| | Graph Kernel [62] | 84.04 |
| | Hierarchical classification [61] | 80 |
| Protein–protein interaction interface residue pair prediction | LSTM [10] | 90 |
| Brain MRI segmentation | Proposed CNN ((conv, pool) = 48, (conv, pool) = 96, conv = 700, conv = 19, softmax = 19) [79] | 90.83 |
| | CNN1 ((conv, pool) = 20, (conv, pool) = 50, conv = 500, conv = 19, softmax = 19) [79] | 89.97 |
| | CNN2 ((conv, pool) = 20, (conv, pool) = 50, conv = 500, conv = 19, softmax = 19) [79] | 90.18 |
| | CNN3 (conv = 40, conv = 160, conv = 500, conv = 19, softmax = 19) [79] | 86.25 |
| | ANN1 (3 layers (1024-150-10)) [79] | 76.68 |
| | ANN2 (5 layers (1024-800-400-150-10)) [79] | 74.94 |
| Cell classification | Deep CNN (Bloodcell-3, size 973 × 799 × 33) [14] | 93 |
| | Deep CNN (Bloodcell-2, size 462 × 451 × 33) [14] | 89.92 |
| | SVM (Bloodcell-2) [14] | 63.11 |
| | SVM (Bloodcell-3) [14] | 56.35 |
| Alzheimer's disease recognition | Deep CNN [13] | 96.8588 |
| | SVM [83] | 84 |
| Identifying metastatic breast cancer | Deep CNN [84] | 98.4 |
| Annotating the pathogenicity of genetic variants | DNN [85] | 66.1 |
| Classifying and segmenting microscopy images | DCNN [86] | 72.3 |


the first, second and third layers contain 1024, 150 and 10 neurons, respectively. In ANN2, the first to fifth layers contain 1024, 800, 400, 150 and 10 neurons, respectively. The accuracies achieved by the five comparison architectures CNN1, CNN2, CNN3, ANN1 and ANN2 are 89.97%, 90.18%, 86.25%, 76.68% and 74.94%, respectively [79]. The proposed CNN performs best because of its larger number of feature maps. The Dice ratio (DR) [82] is also used to measure segmentation accuracy; a larger value indicates higher segmentation accuracy. The proposed CNN achieved a DR of 95.19%, while CNN1, CNN2 and CNN3 achieved DRs of 94.12%, 94.83% and 92.62%, respectively [79]. The proposed CNN can segment complex edge pixels successfully, although some pixels are still wrongly classified (Table 1).
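The patch extraction and the Dice ratio used in this section can be illustrated as follows. This is a minimal sketch: the function names, the sliding-window stride parameter, and the convention of returning 1.0 when a label is absent from both segmentations are assumptions made here, not details taken from [79] or [82].

```python
def dice_ratio(predicted, reference, label):
    """Dice ratio 2|A and B| / (|A| + |B|) for one label over flat label lists."""
    a = [p == label for p in predicted]
    b = [r == label for r in reference]
    overlap = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    # Returning 1.0 when the label is absent from both is a convention
    # chosen for this sketch.
    return 2.0 * overlap / total if total else 1.0


def extract_patches(image, size, stride):
    """Slide a size x size window over a 2D image (list of rows)."""
    rows, cols = len(image), len(image[0])
    patches = []
    for top in range(0, rows - size + 1, stride):
        for left in range(0, cols - size + 1, stride):
            patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches
```

With `size=32` on a 256 × 256 slice, `extract_patches` yields the kind of 32 × 32 inputs described above, and `dice_ratio` can be evaluated per tissue label on the flattened predicted and reference label maps.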

References

1. Pauling, L., Corey, R.B., Branson, H.R.: The structure of proteins: two hydrogen-bonded helical configurations of the polypeptide chain. Proc. Natl. Acad. Sci. 37(4), 205–211 (1951)
2. Ivar, B.C.: Introduction to Protein Structure. Garland Publishing, New York (1999)
3. Patel, M., Shah, H.: Protein secondary prediction using support vector machine. In: International Conference on Machine Intelligence and Research Advancement, pp. 594–598 (2013)
4. Chou, P.Y., Fasman, G.D.: Prediction of the secondary structure of proteins from their amino acid sequence. Trends Biochem. Sci. 2, 128–131 (1977)
5. Hasic, H., Buza, E., Akagic, A.: A hybrid method for prediction of protein secondary structure based on multiple artificial neural networks, pp. 1195–1200. MIPRO, Opatija (2017)
6. Cheng, J., Tegge, A.N., Baldi, P.: Machine learning methods for protein structure prediction. IEEE Rev. Biomed. Eng. 1, 41–49 (2008)
7. Andreopoulos, W., Labudde, D.: Protein-protein interaction networks. In: Protein Purification and Analysis I: Methods and Applications. iConcept Press (2013)
8. Jaimovich, A.: Understanding protein-protein interaction network. Ph.D. Thesis. Hebrew University (2010)
9. Asai, K., Hayamizu, S., Handa, K.I.: Prediction of protein secondary structure by the hidden Markov model. Bioinformatics 9(2), 141–146 (1993)
10. Zhao, Z., Gong, X.: Protein-protein interaction interface residue pair prediction based on deep learning architecture. IEEE/ACM Trans. Comput. Biol. Bioinform. (2017)
11. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
12. Cireşan, D.C., et al.: Mitosis detection in breast cancer histology images with deep neural networks. In: International Conference on Medical Image Computing and Computer-assisted Intervention. Springer, Berlin, Heidelberg (2013)
13. Sarraf, S., Tofighi, G.: Deep learning-based pipeline to recognize Alzheimer's disease using fMRI data. In: IEEE Future Technologies Conference, pp. 816–820 (2016)
14. Li, X., Li, W., Xu, X., Hu, W.: Cell classification using convolutional neural networks in medical hyperspectral imagery. In: 2nd International Conference on Image, Vision and Computing, pp. 501–504 (2017)
15. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
16. Greff, K., Kumar Srivastava, R., Koutník, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: a search space odyssey (2017). arXiv:1503.04069v1
17. Gers, F.A., Schraudolph, N.N., Schmidhuber, J.: Learning precise timing with LSTM recurrent networks. J. Mach. Learn. Res. 3, 115–143 (2002)


18. Svozil, D., Kvasnicka, V., Pospichal, J.: Introduction to multi-layer feed forward neural network. Chemom. Intell. Lab. Syst. 39, 43–62 (1997)
19. Toh, K.-A., Lu, J., Yau, W.-Y.: Global feedforward neural network learning for classification and regression. In: International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, pp. 407–422 (2001)
20. Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press Inc., New York (1995)
21. Schmidt, W.F., Kraaijveld, M.A., Duin, R.P.W.: Feed forward neural networks with random weights. In: 11th IAPR International Conference on Pattern Recognition, Conference B: Pattern Recognition Methodology and Systems, Proceedings, vol. 2, pp. 1–4 (1992)
22. Graves, A.: Supervised Sequence Labelling with Recurrent Neural Networks, pp. 5–13. Springer, Berlin (2012)
23. Pascanu, R., Gulcehre, C., Cho, K., Bengio, Y.: How to construct deep recurrent neural networks (2013). arXiv:1312.6026
24. Sonderby, S.K., Winther, O.: Protein secondary structure prediction with long short term memory networks (2015). arXiv:1412.7828v2
25. Hochreiter, S., Heusel, M., Obermayer, K.: Fast model-based protein homology detection without alignment. Bioinformatics 23(14), 1728–1736 (2007)
26. Min, S., Lee, B., Yoon, S.: Deep learning in bioinformatics. Brief. Bioinf. 18(5), 851–869 (2017)
27. Baldi, P., Brunak, S., Frasconi, P., Soda, G., Pollastri, G.: Exploiting the past and the future in protein secondary structure prediction. Bioinformatics 15(11), 937–946 (1999)
28. Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)
29. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45(11), 2673–2681 (1997)
30. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18(5–6), 602–610 (2005)
31. Yaseen, A., Li, Y.: Template-based prediction of protein 8-state secondary structures. In: IEEE 3rd International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), pp. 1–2 (2013)
32. Kabsch, W., Sander, C.: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22(12), 2577–2637 (1983)
33. Zhou, J., Troyanskaya, O.G.: Deep supervised and convolutional generative stochastic network for protein secondary structure prediction. In: Proceedings of the 31st International Conference on Machine Learning, Beijing, China, JMLR: W&CP, vol. 32, pp. 745–753 (2014)
34. Pollastri, G., Przybylski, D., Rost, B., Baldi, P.: Improving the prediction of protein secondary structure in three and eight classes using recurrent neural network and profiles. Proteins: Struct. Funct. Genet. 47(2), 228–235 (2002)
35. Bengio, Y., Thibodeau-Laufer, E., Alain, G.: Deep generative stochastic networks trainable by backprop. In: International Conference on Machine Learning, pp. 226–234 (2014)
36. Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)
37. Du, C., Zhu, J., Zhang, B.: Learning deep generative models with doubly stochastic gradient MCMC. IEEE Trans. Neural Netw. Learn. Syst. (2017)
38. Ozair, S., Yao, L., Bengio, Y.: Multimodal transitions for generative stochastic network (2014). arXiv:1312.5578v4
39. Bengio, Y., Yao, L., Alain, G., Vincent, P.: Generalized denoising auto-encoders as generative models. In: Advances in Neural Information Processing Systems, pp. 899–907 (2013)
40. Salakhutdinov, R., Hinton, G.: Deep Boltzmann machines. In: Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS), Clearwater Beach, Florida, USA, JMLR: W&CP, vol. 5 (2009)
41. Jones, D.T.: Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292(2), 195–202 (1999)


42. Jamel, T.M., Khammas, B.M.: Implementation of sigmoid activation function for neural network using FPGA. In: 13th Scientific Conference of Al-Ma’moon University College (2012)
43. Wang, G., Dunbrack Jr., R.L.: PISCES: a protein sequence culling server. Bioinformatics 19, 1589–1591 (2003)
44. Wang, Z., Zhao, F., Peng, J., Xu, J.: Protein 8-class secondary structure prediction using conditional neural fields. Proteomics 11(19), 3786–3792 (2011)
45. Ng, A.: Sparse Autoencoder. CS294A Lecture Notes, vol. 72 (2011)
46. Ng, A.: Supervised learning. CS229 Lecture Notes, pp. 1–3 (2000)
47. Al-Azzawi, A.: Deep learning approach for secondary structure protein prediction based on first level features extraction using a latent CNN structure. Int. J. Adv. Comput. Sci. Appl. 8(4), 5–12 (2017)
48. Hubel, D.H., Wiesel, T.N.: Receptive fields and functional architecture of monkey striate cortex. J. Physiol. 195, 215–243 (1967)
49. LeCun, Y., Bengio, Y.: Convolutional Networks for Image, Speech and Time-Series. AT and T Bell Laboratories, Dept Informatique Recherche (1995)
50. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Handwritten digit recognition with a back-propagation network. Adv. Neural Inf. Process. Syst. 396–404 (1990)
51. Magnan, C.N., Baldi, P.: SSpro/ACCpro 5: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity. Bioinformatics 30(18), 2592–2597 (2014)
52. Tavanaei, A., Maida, A.S., Kaniymattam, A., Loganantharaj, R.: Towards recognition of protein function based on its structure using deep convolutional network. In: IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 145–149 (2016)
53. Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., Swindells, M.B., Thornton, J.M.: CATH: a hierarchic classification of protein domain structures. Structure 5(8), 1093–1109 (1997)
54. Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C.: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247(4), 536–540 (1995)
55. Karim, R., Al-Aziz, M.M., Shatabda, S., Rahman, M.S., Mia, M.A.K., Zaman, F., Rakin, S.: CoMOGrad and PHOG: from computer vision to fast and accurate protein tertiary structure retrieval. Sci. Rep. 5, 1–11 (2015)
56. Pettersen, E.F., Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., Meng, E.C., Ferrin, T.E.: UCSF Chimera: a visualization system for exploratory research and analysis. J. Comput. Chem. 25(13), 1605–1612 (2004)
57. Kraulis, P.K.: MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallogr. 24, 946–950 (1991)
58. Nooruddin, F., Turk, G.: Simplification and repair of polygonal models using volumetric techniques. IEEE Trans. Vis. Comput. Graph. 9(2), 191–205 (2003)
59. Zakeri, P., Jeuris, B., Vandebril, R.: Protein fold recognition using geometric kernel data fusion. Bioinformatics 30(13), 1850–1857 (2014)
60. Brylinski, M., Lingam, D.: eThread: a highly optimized machine learning based approach to meta threading and the modeling of protein tertiary structure. PLoS One 7(11), e50200 (2012)
61. Lin, C., Zou, Y., Qin, J., Jiang, Y., Ke, C., Zou, Q.: Hierarchical classification of protein folds using a novel ensemble classifier. PLoS One 8(2), e56499 (2013)
62. Borgwardt, K.M., Ong, C.S., Schonauer, S., Vishwanathan, S.V.N., Smola, A.J., Kriegel, H.-P.: Protein function prediction via graph kernels. Bioinformatics 21, i47–i56 (2005)
63. Giard, J., Ambroise, J., Gala, L.J.: Regression applied to protein binding site prediction and comparison with classification. BMC Bioinform. 10(1), 1–12 (2009)
64. Cheng, J., Baldi, P.: Improved residue contact prediction using support vector machines and a large feature set. BMC Bioinform. 8(2), 1–9 (2007)
65. Ohue, M., Matsuzaki, Y., Shimoda, T.: Highly precise protein-protein interaction prediction based on consensus between template-based and de novo docking methods. BMC Proc. 7(7), S6 (2013)

184

S. Sen et al.

66. Gobel, U., Sander, C., Schneider, R.: Correlated mutations and residue contacts in proteins. BMC Proc. 7(7), S6 (2013) 67. Singh, R., Park, D., Xu, J., Hosur, R., Berger, B.: Struct2Net: a web service to predict protein–protein interactions using structure based approach. Nucleic Acids Res. 38(2), 508–515 (2010) 68. Moult, J.B., Fidelis, K., Rost, B.: Critical assessment of methods of protein structure prediction, CASP, Round 6. Proteins (2010) 69. Lena, D.P., Nagata, K., Baldi, P.: Deep architectures for protein contact map prediction. Bioinformatics 28(19), 2449–2457 (2012) 70. Larochelle, H., Bengio, Y., Louradour, J.: Exploring strategies for training deep neural networks. J. Mach. Learn. Res. 1–40 (2009) 71. Alessandro, L., Gianluca, P., Pierre, B.: Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules. J. Chem. Inform. Model. 53(7), 1563–1575 (2013) 72. Vreven, T., Moal, H.I., Vangone, A.: Updates to the integrated protein–protein interaction benchmarks: docking benchmark version 5 and affinity benchmark version 2. J. Mol. Biol. 427(19), 3031–3041 (2015) 73. Janin, J., Henrick, K., Moult, J.: Assessment of predicted interactions. CAPRI: a critical assessment of predicted interactions. Proteins Struct. Funct. Bioinform. 52(1), 2–9 (2003) 74. Sahiner, B.: Classification of mass and normal breast tissue: a convolution neural network classifier with spatial domain and texture images. Proteins Struct. Funct. IEEE Trans. Med. Imag. 15(5), 598610 (1996) 75. Shaun, P.: Brain MRI Segmentation, Computational Surgery and Dual Training, pp. 45–73. Springer, US (2010) 76. Yue-Hei Ng, J., Hausknecht, M., Vijayanarasimhan, S., Vinyals, O., Monga, R., Toderici, G.: Beyond short snippets: deep networks for video classification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4694–4702 (2015) 77. 
Ye, H., Wu, Z., Zhao, R.-W., Wang, X., Jiang, Y.-G., Xue, X.: Evaluating two-stream CNN for video classification. In: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, pp. 435–442 (2015) 78. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Largescale video classification with convolutional neural networks. In: Proceedings of the IEEE Conference on International Computer Vision and Pattern Recognition, pp. 1725–1732 (2014) 79. Cui, Z., Yang, J., Qiao, Y.: Brain MRI segmentation with patch-based CNN approach. In: Proceedings of the 35th Chinese Control Conference, pp. 27–29 (2016) 80. Kennedy, N.D., Haselgrove, C., Hodge, M.S.: CANDIShare: a resource for pediatric neuroimaging data. Neuroinformatics 10(3), 319–322 (2012) 81. Leena Silvoster, M., Govindan, V.K.: Convolutional neural network based segmentation. In: Computer Networks and Intelligent Computing: 5th International Conference on Information Processing, ICIP, vol 157, pp. 190 (2011) 82. Zhang, W., Li, R., Deng, H., Wenlu, L., Lin, W., Ji, S., Shen, D.: Deep convolutional neural networks for multi-modality isointense infant brain image segmentation. NeuroImage 214–224 (2015) 83. Tripoliti, E.E., Fotiadis, D.I., Argyropoulou, M.: A supervised method to assist the diagnosis and classification of the status of alzheimers disease using data from an FMRI experiment. In: Engineering in Medicine and Biology Society. EMBS 2008. 30th Annual International Conference of the IEEE, pp. 4419–4422 (2008) 84. Wang, D., Khosla, A., Gargeya, R., Irshad, H., Beck, A.H.: Deep learning for identifying metastatic breast cancer (2016). arXiv preprint arXiv:1606.05718 85. Quang, D., Chen, Y., Xie, X.: DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics 31(5), 761–763 (2014) 86. Kraus, O.Z., Grys, B.T., Ba, J., et al.: Automated analysis of high-content microscopy data with deep learning. Mol. Syst. Biol. 13(924 (2017)

Application of Deep Architecture in Bioinformatics

185

Sagnik Sen is currently a doctoral researcher at the Department of Computer Science and Engineering, Jadavpur University, Kolkata. His expertise is in Computational Biology, Bioinformatics and Structural Biology. A gold medallist in his Masters, Sagnik has been awarded the Department of Science and Technology (Govt. of India) INSPIRE fellowship for his research. His work has been published in many peer-reviewed international journals. In the last few years he has worked on intrinsically disordered proteins and their functional and structural dynamics. His work blends biology and computer science.

Rangan Das is a Master’s student in the department of Computer Science and Engineering at Jadavpur University, Kolkata, India. His current research interests encompass the area of deep learning.

Swaraj Dasgupta is a Master’s student in the department of Computer Science and Engineering at Jadavpur University, Kolkata, India. His current research interests encompass the area of advanced machine learning.

Dr. Ujjwal Maulik has been a Professor in the Department of Computer Science and Engineering, Jadavpur University, Kolkata, India since 2004. He completed his Bachelors in Physics and Computer Science in 1986 and 1989 respectively, and his Masters and Ph.D. in Computer Science in 1992 and 1997 respectively. Dr. Maulik has worked at Los Alamos National Laboratory, Los Alamos, New Mexico, USA in 1997, University of New South Wales, Sydney, Australia in 1999, University of Maryland Baltimore County, USA in 2004, University of Heidelberg, Germany in 2009, German Cancer Research Center (DKFZ) in 2010, 2011 and 2012, International Center of Theoretical Physics (ICTP), Trieste, Italy in 2014 and 2017, and University of Padova in 2014 and 2016.

Intelligent, Secure Big Health Data Management Using Deep Learning and Blockchain Technology: An Overview

Sohail Saif, Suparna Biswas and Samiran Chattopadhyay

Abstract Sensor-based health data collection and remote access to health data to render real-time advice are the key advantages of smart and remote healthcare. Such health monitoring and support are becoming immensely popular among both patients and doctors because they do not require physical movement, which is not always possible for elderly people, who mostly live alone in the current socio-economic situation. Healthcare informatics plays a key role in such circumstances. The huge amount of raw data emanating from sensors needs to be processed with machine learning and deep learning algorithms to extract useful information and develop an intelligent knowledge base that provides an appropriate solution as and when required. The real challenge lies in storing and retrieving data while preserving security, privacy, reliability and availability requirements. Health data in an Electronic Medical Record (EMR) are generally saved in a client-server database in which a central coordinator performs access control over operations such as create, access, update, or delete of health records. But in smart and remote healthcare supported by enabling technologies such as sensors, Internet of Things (IoT), cloud, deep learning and big data, the EMR needs to be accessed in a distributed manner by the multiple stakeholders involved, such as hospitals, doctors, research labs, patients’ relatives and insurance providers. Hence, health data must be protected from unauthorized access, specifically to maintain data integrity, using advanced distributed security techniques such as blockchain.

Keywords Electronic medical record · Big data · Security · Blockchain · Deep learning

S. Saif · S. Biswas (corresponding author)
Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Kolkata, West Bengal, India

S. Chattopadhyay
Department of Information Technology, Jadavpur University, Kolkata, West Bengal, India

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics, Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_10


1 Introduction

Smart and remote healthcare for elderly care [1] and patient monitoring is increasingly popular among researchers because of its applicability and acceptance in today’s socio-economic scenario, in which average human life expectancy has increased and many people live with age-related ailments and without personalized care. Electronic Medical Records (EMR) [2] have traditionally been saved in distributed databases, most of which follow a client-server architecture. A central controller, called the administrator, supervises and manages the permissions of end-users to create, access, update, or delete health records. Health data sensed by sensors are prone to security attacks and vulnerabilities [3]. In the sensing unit, several tiny sensors (wearable, implanted or ambient) acquire data. These devices may be damaged by a fall during manual handling, leading to lost or erroneous data; they may also be compromised by an adversary to steal or tamper with data, or be replaced by illegitimate devices. In the communication unit, sensor data travel through heterogeneous communication links such as Bluetooth, Wi-Fi, Zigbee and WiMax, which differ in range, link quality and security measures. In the storage and processing unit, data are stored in cloud servers for further access, processing, knowledge building, and feedback or advice generation [4]. The first phase of this process comprises sensing the data, transmitting them and saving them in the cloud. The second phase comprises accessing data from the cloud and analyzing, modifying, updating or deleting them at the cloud by multiple stakeholders in healthcare (Fig. 1).

Fig. 1 The traditional 3-tier architecture of wireless body area network (WBAN) based healthcare [8]

For healthcare data, privacy and integrity are two important properties to ensure, so that people need not worry about their sensitive data being revealed through unauthorized access. Integrity matters because advice generated from incorrect data is itself inaccurate. Relying on the underlying security measures and principles of an IoT and cloud enabled healthcare framework helps to avoid the additional computational complexity, and hence resource consumption, of implementing encryption algorithms separately. Because multiple parties are involved in health data access, encrypting data at each sender and decrypting them at the corresponding receiver would increase latency, which is undesirable in a time-critical application like healthcare. Moreover, devices in IoT enabled smart healthcare systems are heterogeneous and have varying resource levels, ranging from sensors through smartphones, tablets and laptops to high-end workstations and servers. Encryption-decryption algorithms such as Data Encryption Standard (DES), Advanced Encryption Standard (AES) or Rivest-Shamir-Adleman (RSA) therefore cannot be applied uniformly at all levels of a smart healthcare architecture. Thus direct confidentiality may be difficult to implement, but confidentiality may be ensured by implementing authentication and integrity.

In blockchain [5], stakeholders (hospitals, doctors, research labs, and insurance providers) are connected to each other in a decentralized network and are called blockchain nodes. Health sensors collect data and send them to a Personal Digital Assistant (PDA), which forwards the data to the blockchain network through access points. The data forwarded to the blockchain network in one session is called a block. The hash of the previous data is bound with the new data so that the blockchain network can validate the new block. Once the block is validated, the hash of the data is stored in the blockchain nodes and the health information is stored in the cloud database in encrypted form.

Big health data [6] stored in the cloud database require further analysis using machine learning techniques for knowledge extraction. Deep learning techniques are now widely used in healthcare; popular applications include early disease detection, DNA (Deoxyribonucleic Acid) analysis, prediction of new drug effectiveness, and personalized treatment. One of the big challenges of using deep learning techniques in health informatics is the need for a huge amount of labeled data, whereas an EMR may contain unlabeled data, for example X-ray images that are not annotated with a medical condition such as cancer or fibrosis. In such cases, unsupervised learning techniques can be used to label the data through data mining. For labeled data, supervised learning can be used; for a combination of labeled and unlabeled data, semi-supervised learning is applied. Because health data are nowadays predominantly image-based, the Convolutional Neural Network (CNN) has had the highest impact among deep learning techniques such as Deep Neural Network (DNN), Deep Autoencoder, Deep Belief Network (DBN) and Recurrent Neural Network (RNN). DNNs have been successfully implemented in real-time applications such as healthcare with the parallelism support of Graphics Processing Units (GPUs) [7].
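The hash linking described in this introduction, in which the hash of the previous data is bound to each new block so that the network can validate it, can be illustrated with a short Python sketch. It rests on assumptions that the chapter does not make: SHA-256 as the hash function, JSON as the serialization, the hypothetical helper names `block_hash`, `append_block` and `is_valid`, and invented sample readings. For brevity the readings are kept inside the block rather than encrypted in a cloud database.

```python
import copy
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for "no previous block"

def block_hash(index, prev_hash, payload):
    """Hash the block contents together with the previous block's hash."""
    record = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()

def append_block(chain, payload):
    """Bind the new payload to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    }
    chain.append(block)
    return block

def is_valid(chain):
    """Recompute every hash and check that each block points at its predecessor."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["prev_hash"], block["payload"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True

# Three sessions of invented sensor readings, one block per session.
chain = []
append_block(chain, {"patient": "P-001", "heart_rate": 72})
append_block(chain, {"patient": "P-001", "heart_rate": 75})
append_block(chain, {"patient": "P-001", "heart_rate": 118})

# An intruder silently edits an earlier reading in a copy of the ledger.
tampered = copy.deepcopy(chain)
tampered[1]["payload"]["heart_rate"] = 60

print(is_valid(chain))     # True
print(is_valid(tampered))  # False
```

Even if the intruder also recomputes the hash of the modified block, the next block still carries the old hash in `prev_hash`, so validation fails one step later. This is the property that makes the chain useful for integrity checking.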

2 Related Works

This section describes work related to big health data and to the intelligent processing of such data using deep learning techniques. Because the security of health data in terms of privacy, authentication, and integrity is of utmost importance, it also covers security techniques for big health data and their related issues and challenges.

In [7], the authors present a rigorous review of deep learning techniques and of the implementation issues in handling big health data, covering advantages, drawbacks and future scope. The application of deep learning to both sensor-based and image-based health data is also covered. In health informatics, an EMR contains the medical history of a patient, such as diagnoses with and without medical tests, prescription advice and follow-up, vaccination data, proneness to allergy, time-varying signals such as Electroencephalogram (EEG), Electrocardiogram (ECG) or Electromyogram (EMG) signals, and sensory data from pervasive sensors such as pressure, temperature, pulse rate and heart rate. Health data are not always complete and labeled, and may also be erroneous. Moreover, because sensing and data acquisition devices are heterogeneous, data differ in format and size. The rate and sequence of data acquisition also vary, which poses a great challenge to processing such data sets. The genuine challenges in applying deep learning techniques such as DNN and CNN to big health data lie in the nature of the techniques themselves. Inadequate correct and complete training data may lead to poor training of a DNN model. Likewise, a limited labeled training set may yield a very low training error but a large error when the model is tested on a new dataset. Researchers often apply deep learning models such as CNNs as a black box, without a proper interpretation of hidden layers, weights, etc. Correct and efficient preprocessing, filtering and normalization of the training data set is of utmost importance, as noise may lead to misclassification in machine learning techniques such as logistic regression. Nevertheless, if insight into and appropriate interpretation of the hyperparameters can be built up, the structure of a DNN and the number of filters in a CNN can be controlled and predefined.

On the security side, blockchain-based medical data sharing among untrusted parties, ensuring confidentiality, authentication and access control, has been proposed in [9]. To eliminate illegal modifications by intruders, transaction requests among legitimate parties to access data from the cloud are secured with cryptographic keys. A threat model has been developed that identifies security attacks on and threats to health data and medical reports. When a data access request comes from an entity, signature-based authentication is performed first, and then the data are retrieved, encrypted and sent to the requestor. The purpose of the data access request is also considered. Performance is evaluated in terms of latency as the number of requestors increases, based on real test scenarios.

In [10], the authors present a rigorous and logical review of related work on blockchain and its application in healthcare. The review systematically identifies the characteristics of blockchain technology that make it suitable for ensuring secure and trusted transactions among many stakeholders on the shared medical records of patients. A large number of publications are analyzed and evaluated on relevant parameters such as the blockchain platform used, the consensus algorithm implemented, the type of blockchain network, and smart contracts. The search covered blockchain papers in healthcare from 2008 to 2019 and found that work of sufficient quality to analyze has been published only from 2015 onwards, and implemented work only in 2017 and 2018, showing that interest in blockchain-based secure healthcare is increasing. Although blockchain is a very promising and efficient technique for handling medical records shared among multiple interested parties while ensuring authentication, confidentiality, integrity and access control, it has several limitations and challenges in real healthcare scenarios. In [11], the authors identify many challenges, e.g. additional overheads in communication and storage, delay in executing requests to access data, and scalability issues. They also evaluate the performance of their proposed blockchain-based algorithm.

A review of the application of deep learning techniques in healthcare is presented in [12]. The authors categorize deep learning techniques applied to specific physiological signals such as ECG, EMG and EEG, and give a detailed illustration of deep learning methods both mathematically and architecturally, which will be of interest to new and existing researchers in related areas.

The authors of [13] proposed a secure remote healthcare system using smart contracts, implemented as a private blockchain based on the Ethereum protocol. The Ethereum-based private blockchain ensures that only authenticated users can access patients’ health data. In this work, only events such as data sensing in sensors, processing in smart devices using smart contracts, and alert or alarm generation and delivery to caregivers are stored in the blockchain ledger. The actual confidential information related to medical records is saved in the EMR, and a mapping between the EMR and the blockchain ledger is maintained to access or retrieve data. Limitations identified while developing and implementing this work include efficient key management and latency. In IoT enabled systems the number of sensors is high and will increase rapidly over time, so key generation and distribution is a non-trivial issue. Reducing latency in blockchain ledger processing so that such a system can support emergency health scenarios is also a real challenge.

A recent work [14] proposed a CNN-based approach to predict chronic disease risk. The authors collected real-life hospital data for a regional chronic disease from central hospitals of China. Since health data are mostly unlabeled and contain missing values, they used a latent factor model to reconstruct the missing information. They implemented their proposal and compared it with other disease prediction algorithms; their experimental results show an accuracy of 94.8% and a comparatively better convergence speed.

In [15], the authors show that some machine learning algorithms are prone to a security attack known as a poisoning attack, in which the attacker augments the training dataset with malicious data so that the model produces wrong results. This can be life-threatening when diagnosing a patient. The authors also present prevention techniques for this type of attack.

Table 1 lists some works published in the last few years that apply deep learning based techniques in healthcare applications. Table 2 lists works on sharing health data using blockchain.
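The effect of a poisoning attack of the kind discussed for [15] can be seen on a toy example. The sketch below is not the method or data of [15]; it assumes a one-dimensional, invented "biomarker" value, a simple nearest-centroid classifier as a stand-in for a learning algorithm, and hypothetical helper names `fit` and `predict`. Label 0 stands for "normal" and label 1 for "at risk".

```python
def fit(samples):
    """samples: list of (value, label) pairs; returns {label: centroid}."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: sum(vals) / len(vals) for label, vals in by_label.items()}

def predict(model, value):
    """Assign the label whose centroid is closest to the value."""
    return min(model, key=lambda label: abs(value - model[label]))

# Clean training data: label 0 clusters near 1.0, label 1 clusters near 5.0.
clean = [(v, 0) for v in (1.0, 1.2, 0.8, 1.1, 0.9)]
clean += [(v, 1) for v in (5.0, 5.2, 4.8, 5.1, 4.9)]

# Poisoning: the attacker injects high readings that are falsely labeled 0,
# which drags the "normal" centroid from about 1.0 up to 4.5.
poison = [(8.0, 0)] * 5

clean_model = fit(clean)
poisoned_model = fit(clean + poison)

patient = 4.0  # a reading close to the "at risk" cluster
print(predict(clean_model, patient))     # 1
print(predict(poisoned_model, patient))  # 0
```

The model trained on clean data flags the patient, while the model trained on the augmented data does not, although the classifier code itself is unchanged. Only the training set was manipulated.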


Table 1 Application of various deep learning methods in health informatics

| Authors, year | Application | Input data | Deep learning method |
|---|---|---|---|
| Sun et al. [16], 2016 | Lung cancer diagnosis | Lung Image Database Consortium (LIDC) dataset | CNN |
| Esteva et al. [17], 2017 | Skin cancer classification | Clinical imaging | CNN |
| Ahmed et al. [18], 2016 | Breast cancer classification | Wisconsin breast cancer dataset (WBCD) | DBN |
| Fakoor et al. [19], 2013 | Cancer diagnosis and classification | Gene expression | Deep Autoencoder |
| Ramsundar et al. [20], 2015 | Drug discovery | Molecular compound | DNN |
| Rongjian et al. [21], 2014 | Brain disease diagnosis | Alzheimer’s disease neuroimaging initiative (ADNI) dataset | CNN |
| Mohsen et al. [22], 2017 | Brain tumor classification | Magnetic resonance imaging (MRI) images | DNN |
| Amin et al. [23], 2018 | Brain tumor detection | Magnetic resonance imaging (MRI) images | DNN |
| Yaniv et al. [24], 2015 | Chest pathology detection | X-ray images | CNN |
| Charissa et al. [25], 2016 | Human activity recognition | Raw sensor data | CNN |

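Table 2 Use of blockchain technology in health data sharing

Security requirements considered: Confidentiality, Authentication, Access control, Integrity

| Authors, year | Confidentiality | Authentication | Access control | Integrity | Implementation |
|---|---|---|---|---|---|
| Zhang et al. [26], 2018 | ✓ | ✓ | × | ✓ | ✓ |
| Azaria et al. [27], 2016 | ✓ | ✓ | × | × | × |
| Peterson et al. [28], 2016 | ✓ | ✓ | × | × | × |
| Patel et al. [29], 2018 | ✓ | ✓ | × | × | × |
| Xia et al. [9], 2017 | ✓ | ✓ | ✓ | × | ✓ |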


3 Preliminaries

This section briefly discusses the Internet of Things (IoT), Big Data, various deep learning techniques, and blockchain technology. The proposed architecture is then discussed in detail.

3.1 Internet of Things (IoT)

The concept behind IoT is to connect humans with the internet, which is achieved by connecting machines and other physical things to the internet [30]. The technology is growing rapidly and is being adopted in healthcare, where IoT based technologies have helped both physicians and patients. For example, a patient can take advice from doctors without physically visiting a clinic, and patients who need real-time monitoring do not need to visit a hospital. Using biological sensors and the internet, doctors can observe the physiological parameters of patients. The wireless body area network (WBAN) is one of the core technologies supporting remote healthcare. It consists of battery-powered, lightweight wireless sensors that can be wearable or implantable. The sensors are connected to an access point using short-range communication, and the access point forwards the data to a medical facility such as a clinic or hospital. These IoT systems produce massive amounts of data, which qualify as “Big Data”. The data need to be handled securely and efficiently so that they can be accessed by all stakeholders.

3.2 Big Data

Big Data is a large dataset that may contain structured, unstructured and semi-structured data. Structured data are stored in databases or spreadsheets in a tabular format. Images, video and audio belong to the unstructured category and are very difficult to analyze. Semi-structured data, such as XML, do not follow a strict standard. Such data can be used in emerging applications such as clinical decision support and disease prediction through various machine learning technologies. The healthcare sector produces a huge amount of data, such as sensor data, previous health records and drug records, which is difficult to manage using traditional software or hardware systems. Using a cloud platform reduces the cost of efficient storage and sharing.


3.3 Deep Learning

Deep learning (DL) is a prominent feature learning method used to extract high-level features from low-level data. Because manual feature identification is time-consuming and expensive, DL is used instead. The main advantage of unsupervised learning is that it does not need labeled data. In most cases medical health data do not carry a label, for example X-ray images without an annotated medical condition. Labeled data can also be used in DL techniques, in which case the learning is called supervised learning. There are various types of DL technique; this section discusses some of the popular ones.

One of the most popular deep learners is the Artificial Neural Network (ANN). It consists of perceptrons (neurons) organized in layers: an input layer, one or more hidden layers and an output layer. The hidden layers are where the features are learned, but increasing the number of hidden layers does not guarantee improved results. Overfitting may occur if too many layers or perceptrons are added: the network then captures noise in the data instead of the actual features, which decreases accuracy. The architecture of an Artificial Neural Network is shown in Fig. 2.

The Convolutional Neural Network (CNN) is the most helpful in healthcare. A fixed-size vector is given as input; for example, an array containing the pixel values of a pathological image is the input, and it is mapped to an output such as a type of tumor. In this case, images of different tumor types are given as input for training. The perceptrons are connected, and during training weights are assigned and adjusted in every iteration. After each iteration a loss function determines the error, and backpropagation is then used to adjust the weights; the loss is decreased by gradient descent. Since signals pass in one direction, from the input layer to the output layer, this is called a feed-forward network. In a Recurrent Neural Network, both forward and feedback connections are present.

Fig. 2 The architecture of an artificial neural network
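The training loop outlined in this section (forward pass, loss, backpropagation, gradient descent update) can be written out for a very small network. The following is a minimal sketch under assumptions of my own choosing rather than anything prescribed by the chapter: NumPy, one hidden layer of four sigmoid units, mean squared error as the loss, the XOR problem as toy data, and the hypothetical function name `train`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, hidden=4, lr=0.5, epochs=3000, seed=0):
    """Train a one-hidden-layer network with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(X.shape[1], hidden))  # input -> hidden weights
    b1 = np.zeros(hidden)
    W2 = rng.normal(size=(hidden, 1))           # hidden -> output weights
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass: signals flow from the input layer to the output layer.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        err = out - y
        losses.append(float(np.mean(err ** 2)))  # loss function (MSE)

        # Backpropagation: chain rule from the output back to the input.
        grad_z2 = (2.0 / len(X)) * err * out * (1.0 - out)
        grad_W2 = h.T @ grad_z2
        grad_b2 = grad_z2.sum(axis=0)
        grad_z1 = (grad_z2 @ W2.T) * h * (1.0 - h)
        grad_W1 = X.T @ grad_z1
        grad_b1 = grad_z1.sum(axis=0)

        # Gradient descent: move every weight against its gradient.
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2
    return losses

# Toy data: XOR, a classic task that needs a hidden layer.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

losses = train(X, y)
print(f"loss before training: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

The learning rate, number of epochs and hidden-layer size are arbitrary illustrative values. In practice they are the kind of hyperparameters that have to be tuned, and a larger hidden layer is not automatically better, as noted above for overfitting.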


3.4 Popular Deep Learners

Various deep learning techniques are available, and the best technique must be chosen carefully for a specific problem. Table 3 shows some popular methods which have been used in health informatics.

Table 3 Summary of popular deep learning architectures

Deep neural network
Description
• A type of neural network that contains multiple hidden layers
• In general, used for classification or regression
• Non-linear hypotheses can be expressed
Advantage
• Widely used because of its success rate in different applications
Disadvantage
• The training process can be very slow if computing power is limited

Deep autoencoder
Description
• A fully connected neural network with two phases, known as the encoder and the decoder
• Well suited to feature extraction
• The encoder transforms a high-dimensional input vector into a low-dimensional feature
Advantage
• Labeled data are not mandatory for the learning process. Many variations are available
Disadvantage
• Requires high processing time

Deep belief network
Description
• Composed of RBMs; each RBM has a visible layer and a hidden layer
• The hidden layer of each sub-network acts as the visible layer for the next RBM
Advantage
• Both supervised and unsupervised training are possible
Disadvantage
• Due to initialization and sampling, the learning process is expensive

Deep Boltzmann machine
Description
• Consists of multiple hidden layers which are connected to each other by undirected connections
• Nodes within a layer are independent of each other but depend on the neighboring layers
Advantage
• Both supervised and unsupervised training are possible
Disadvantage
• The mean-field technique used for approximate inference is slower than in deep belief networks

Recurrent neural network
Description
• Able to analyze streaming data. Useful for applications where the output depends on previous inputs
• Each hidden layer has its own weights and biases
Advantage
• Can store sequential events in the form of activations if a feedback connection is present
Disadvantage
• Training can be difficult if tanh or ReLU activation functions are used

Convolutional neural network
Description
• Consists of one or more convolutional layers followed by one or more fully connected layers
• Cross-entropy and squared-error losses are popular functions for calculating the error
Advantage
• Can take images as input, which is very helpful for medical applications
Disadvantage
• A large labeled dataset is required for training

Convolutional Neural Network

The Convolutional Neural Network (CNN) is one of the most popular deep learning methods and is inspired by the human visual cortex. It is a feed-forward network built from many interleaved layers with convolutional filters. As input data pass through the layers, progressively higher-level features are extracted in each layer. The technique is highly useful in medical imaging. For example, tumors can be classified from irregularities in tissue morphology. CNNs can also read patterns that are difficult for human experts to recognize; for example, early stages of many diseases can be detected from tissue samples.

Recurrent Neural Networks

The Recurrent Neural Network (RNN) is another useful technique for healthcare because it supports streaming data. Inputs are fixed-size vectors presented in sequence, so data such as speech, text or DNA sequences can be provided as input where the output depends on previous inputs. In the RNN architecture, perceptrons are connected to themselves, which acts as a memory for consecutive inputs. In healthcare, an RNN can be applied to the analysis of medical text such as an anamnesis. For instance, suppose a pool of patients has the same disease with different symptoms. An RNN can scan a set of text files to find the similarities, which can help a physician diagnose an illness.
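The two building blocks just described, a convolutional filter that extracts a local feature and a recurrent unit whose state acts as memory, can each be sketched in a few lines. This is an illustrative sketch only, assuming NumPy, a single hand-set filter instead of learned ones, a tanh recurrent cell, and the hypothetical function names `conv2d_valid` and `rnn_forward`; none of these choices come from the chapter.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding.
    As in most deep learning libraries, the kernel is not flipped
    (strictly speaking this is a cross-correlation)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def rnn_forward(inputs, W_x, W_h, b):
    """Run a simple recurrent cell over a sequence of fixed-size vectors.
    The hidden state h is fed back at every step and carries the memory."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in inputs:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.array(states)

# CNN part: a 4x4 "image" with a vertical edge between columns 1 and 2.
image = np.array([[0.0, 0.0, 1.0, 1.0]] * 4)
edge_filter = np.array([[-1.0, 1.0]])  # responds to a dark-to-bright step
feature_map = conv2d_valid(image, edge_filter)
print(feature_map)  # non-zero only in the column where the edge lies

# RNN part: two sequences with the same last input but a different first input.
W_x = np.array([[1.0]])
W_h = np.array([[0.5]])
b = np.array([0.0])
states_a = rnn_forward(np.array([[1.0], [0.0]]), W_x, W_h, b)
states_b = rnn_forward(np.array([[-1.0], [0.0]]), W_x, W_h, b)
print(states_a[-1], states_b[-1])  # final states differ because of the earlier input
```

In a real CNN the filter values are learned during training and many filters are stacked in layers; in a real RNN the matrices `W_x`, `W_h` and `b` are likewise learned. The fixed values here only make the behavior easy to follow by hand.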


Deep Autoencoders

Recent studies show that there is no universal set of features that works accurately across different datasets, and that feature extraction using data-driven learning is more accurate. The autoencoder neural network was introduced for this purpose. It uses the same number of input and output units, so that the network learns to recreate the input vector instead of assigning a class label. This is an unsupervised technique. Typically, the hidden layer has fewer units than the input and output layers. To extract the relevant features, the network encodes the data in a lower-dimensional space; if the input data has high dimensionality, a single hidden layer is not sufficient and a deeper architecture is needed.

Deep Boltzmann Machine

A Deep Boltzmann Machine (DBM) is an unsupervised learning technique that consists of multiple hidden layers, with undirected connections between the layers. If the odd layers are placed on one side and the even layers on the other, the network can be treated as a bipartite graph. No intralayer connections exist in a DBM; only the units of neighboring layers are connected. Markov chains are used to estimate the gradient of the likelihood function, but in practice this is slow.

Restricted Boltzmann Machine

One of the popular variants of the Boltzmann Machine is the Restricted Boltzmann Machine (RBM), which is stochastic in nature. The network is modeled using stochastic units with a specific distribution function. The learning process uses a series of steps called Gibbs sampling, and the weights are adjusted so that the reconstruction error is minimized. Connections in an RBM are undirected and, as a result, values can be propagated in both directions. One of the common methods for training an RBM is the Contrastive Divergence (CD) algorithm, which is an unsupervised learning technique. The CD algorithm has two phases, referred to as the positive and the negative phase. In the positive phase, the network configuration is adjusted to replicate the training set; in the negative phase, the data are recreated based on the current network configuration.

Deep Belief Network

A Deep Belief Network (DBN) can be treated as a composition of Restricted Boltzmann Machines (RBMs). In a DBN, the hidden layer of every sub-network is connected to the visible layer of the next RBM. Connections between the top two layers are undirected, while the connections in the lower layers are directed. A greedy layer-by-layer learning technique is used to initialize the DBN, and the network is then gradually modified to achieve the target outputs.

Some popular Deep Learning Techniques are summarized in Table 3. Table 4 presents popular software packages in which Neural Networks can be implemented.
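The positive and negative phases of Contrastive Divergence described for the RBM can be illustrated with a small sketch. It assumes a binary RBM trained with a single Gibbs step (CD-1) on two toy patterns; the function cd1_step, the learning rate, the layer sizes and the data are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, rng, lr=0.2):
    """One CD-1 update for a binary RBM; returns new parameters and error."""
    # Positive phase: hidden probabilities driven by the training data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)   # sample hidden units
    # Negative phase: one Gibbs step recreates the data from the model.
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)
    n = v0.shape[0]
    # Move the weights toward the data statistics and away from the
    # statistics of the model's own reconstruction.
    W = W + lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
    b_vis = b_vis + lr * (v0 - p_v1).mean(axis=0)
    b_hid = b_hid + lr * (p_h0 - p_h1).mean(axis=0)
    err = float(np.mean((v0 - p_v1) ** 2))                # reconstruction error
    return W, b_vis, b_hid, err

rng = np.random.default_rng(0)
patterns = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]], dtype=float)    # toy training set
data = np.repeat(patterns, 10, axis=0)

n_visible, n_hidden = 6, 4
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b_vis = np.zeros(n_visible)
b_hid = np.zeros(n_hidden)

errors = []
for epoch in range(2000):
    W, b_vis, b_hid, err = cd1_step(data, W, b_vis, b_hid, rng)
    errors.append(err)

print(f"reconstruction error: {errors[0]:.3f} -> {errors[-1]:.3f}")
```

Practical implementations usually add mini-batches, momentum and weight decay, which are omitted here to keep the two phases visible.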


S. Saif et al.

Table 4 Software tools for implementation of neural networks

Software                         | Developer                                         | Platform                       | Supported technique   | Cloud support
                                 |                                                   |                                | CNN | RNN | DBN | RBM |
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Neural Designer [31]             | Artelnics                                         | Microsoft Windows, Linux       | ✓   | ✓   | ✓   | ×   | ✓
Keras [32]                       | François Chollet                                  | Microsoft Windows, Linux       | ✓   | ✓   | ✓   | ×   | ×
Apache SINGA [33]                | Apache Software Foundation                        | Linux, macOS, Windows          | ✓   | ✓   | ×   | ✓   | ×
Deeplearning4j [34]              | Adam Gibson, Josh Patterson                       | Linux, macOS, Windows, Android | ✓   | ✓   | ✓   | ✓   | ×
Microsoft Cognitive Toolkit [35] | Microsoft Research                                | Windows, Linux                 | ✓   | ✓   | ×   | ×   | ×
Apache MXNet [36]                | Apache Software Foundation                        | Windows, macOS, Linux          | ✓   | ✓   | ×   | ✓   | ✓
OpenNN [37]                      | Artelnics                                         | Microsoft Windows, Linux       | ✓   | ✓   | ×   | ×   | ×
PyTorch [38]                     | Adam Paszke                                       | Linux, macOS, Windows          | ✓   | ✓   | ✓   | ✓   | ×
TensorFlow [39]                  | Google Brain                                      | Linux, macOS, Windows, Android | ✓   | ✓   | ✓   | ✓   | ×
Theano [40]                      | Montreal Institute for Learning Algorithms (MILA) | Linux, macOS, Windows          | ✓   | ✓   | ×   | ✓   | ×

3.5 Applications and Challenges of Deep Learners

Machine Learning (ML) has many successful applications in the area of health informatics, whereas Deep Learning (DL) techniques are more recent and their adoption has been slower. However, DL is progressing rapidly and its results are promising in spite of the challenges. Medical applications of DL can be divided into three categories.


• Predictive healthcare, e.g., predicting the efficiency of treatment for various diseases.
• Medical decision support, e.g., detecting and diagnosing various diseases using the physiological information of the patient.
• Personalized treatments, e.g., designing personalized drugs according to the needs of individual patients.

Predictive Healthcare

Applications of this type are designed to detect diseases at an early stage so that treatment can be started before the patient reaches a critical state. For example, Alzheimer's disease is very difficult to detect in its early stages. Other areas of predictive healthcare include predicting the effectiveness of treatments. Deep learning (DL) can be used to detect anomalies that are difficult for the human eye to detect, for example in computerized axial tomography (CAT) scans or radiographs. DL can be very effective in anomaly detection since it can detect small variations that may go unnoticed by humans in the early stages. Medical images are easy to obtain and can be used as training data, which can solve the sparse data problem. Behavioral data of patients can also be used for the early detection of illness. Using these different kinds of medical data, DL can build a prediction model. Another important application of predictive healthcare is predicting the efficiency of new drugs. So far the results are not promising, but new approaches may be developed.

Medical Decision Support

One of the important applications of Deep Learning in health informatics is medical decision support, which is currently a very active area. Deep Learning techniques can help doctors at every stage of a medical diagnosis, such as detecting the disease, proposing personalized treatment and planning post-treatment therapy. In disease prediction from image analysis, Deep Learning techniques can be more accurate than humans. Biomedical text analysis can also be done through DL. Owing to its domain-independent nature, DL can analyze and correlate any kind of data. Correlation analysis can be carried out across different kinds of electronic health records of patients to provide a better diagnosis. Correlation analysis can also be done within a single data set; for example, brain regions can be correlated across different MRI images. CNN techniques are widely used for correlation analysis. A CNN can create abstractions of the input data even when the data are collected from heterogeneous sensors. A medical practitioner may not be able to go through the long medical history of a patient; DL can do that task and provide medical decision support.

Personalized Treatments

Personalized treatments are closely related to medical decision support. Based on its predictions, Deep Learning can support decision making, so that personalized treatments can be provided and drugs can be designed. Electronic health records stored in cloud databases are mostly multimodal and unstructured and, owing to recent technological advances, DL can offer a diagnosis based on


the data. Personalized treatments can be offered based on various kinds of data. For example, biomarkers can be determined by DNA analysis and genome mining. A biomarker is a measurable indicator of a biological state (disease). Because diseases develop within the human body, biomarkers can indicate the probability of that development, which helps medical experts to provide better predictions and diagnoses. Genomics helps to identify the gene allele that is responsible for the development of an illness. Drug effectiveness can be determined by evaluating the differences in genes when the drug is applied; this is called pharmacogenomics. It helps to reduce the dosage levels as well as the side effects of the drug. Deep Learning techniques perform very well in cancer classification from gene expression data. For example, to predict splicing patterns, feature extraction from ribonucleic acid (RNA) and micro ribonucleic acid (miRNA) data can be done efficiently using DL. So, DL can help to analyze data from EMRs and can support personalized medicine.

Challenges

There are many challenges for Deep Learning in the domain of health data. Given the nature of medicine, security, availability, reliability and efficiency are required. For example, a health sensor must work continuously without interruption so that emergencies can be handled. Some recent works show that weight filters can be used in CNNs to extract high-level features, but the entire learning module may become non-interpretable. Most researchers use DL techniques without knowing the likelihood of success; if misclassification occurs, they have little ability to modify the model. As discussed in the previous sections, large datasets are required to train an effective and reliable model. Nowadays an enormous amount of healthcare data is available, but disease-specific data is still limited. So DL is not well suited to applications involving rare diseases.

Another common issue in training a Deep Neural Network (DNN) is overfitting when a small training dataset is used. This happens when the number of parameters in the network is large relative to the number of samples in the training set. Overfitting can be reduced by using regularization techniques such as dropout during training. A DNN often cannot take raw data directly as input, so some preprocessing is needed or the input domain needs to be changed. Choosing the hyperparameters that control the architecture of a DNN, for example the number of filters in a CNN, is a blind exploration process, and accurate validation is required. Finding an optimal set of hyperparameters and preprocessing the raw data correctly are challenging tasks and can lead to a long training process.

Another important issue in DL is that many DNNs can be fooled easily: if a minor change is made to the input data (such as adding imperceptible noise to an image), the samples can be misclassified [41]. Most machine learning algorithms can be affected by this issue. In logistic regression, misclassification can arise if the value of a particular feature is set very high or very low. In decision trees, if a single binary feature is switched in the final layer, the tree will produce incorrect results. So, any machine learning technique is also vulnerable to security attacks, as a simple alteration can lead the system to produce wrong results.
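The sensitivity of logistic regression to small or extreme input changes can be shown with a short sketch. The weights, bias and feature values below are hand-picked, hypothetical numbers, and the perturbation steps each feature by a bounded amount against the sign of its weight; this is an assumption made for illustration and not a method taken from [41].

```python
import numpy as np

def predict_proba(w, b, x):
    """Probability of the positive class under logistic regression."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# Hypothetical, hand-picked model and input (not fitted to real data).
w = np.array([2.0, -1.0, 0.5])
b = -0.25
x = np.array([1.3, 2.2, 0.1])

p_clean = predict_proba(w, b, x)          # logit = 0.2, positive class

# Bounded perturbation: each feature moves by at most eps, in the
# direction that lowers the logit (against the sign of its weight).
eps = 0.1
x_adv = x - eps * np.sign(w)
p_adv = predict_proba(w, b, x_adv)        # logit = -0.15, class flips

# Extreme value: one feature set very high dominates the decision.
x_extreme = x.copy()
x_extreme[1] = 100.0
p_extreme = predict_proba(w, b, x_extreme)

print(f"clean: {p_clean:.3f}, perturbed: {p_adv:.3f}, extreme: {p_extreme:.3g}")
```

In this toy setting a change of at most 0.1 per feature is enough to move the prediction across the 0.5 decision threshold.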


3.6 Blockchain Technology

A blockchain is a decentralized collection of CPUs/nodes in which data are stored in blocks. It is also known as a decentralized ledger whose data blocks are updated continuously (Fig. 3). Data blocks may contain agreements, contracts, sales, financial transactions, health data, etc. Blockchain was introduced by Satoshi Nakamoto in 2008, originally to secure cryptocurrency (Bitcoin) transactions. Nowadays this peer-to-peer technology has been adopted by various sectors such as finance, transportation, education, healthcare and governance. Cryptographic algorithms make the system tamper-resistant: altering the data/transactions stored in the blockchain becomes computationally infeasible. An intruder would need to compromise 51% of the CPUs/nodes to overcome the hashing power of the targeted blockchain network.

A block contains a header and a message. The header consists of the following fields: (i) Timestamp, which records the exact time of creation of the block; (ii) Previous block hash, which refers to the hash of the previous block in the chain; (iii) Merkle root, the root hash of the hash tree built from the SHA-256 hashes of the transactions; (iv) Difficulty target, a value set by the Proof of Work (PoW) algorithm that is difficult to achieve; (v) Nonce, the value that is varied to achieve the difficulty target and that also defends against replay attacks. Each block is linked to its predecessor through the hash of the previous block. The message, on the other hand, contains the hash of the previous transaction, the transaction/message to be sent and a digital signature of the owner. This digital signature is treated as proof of ownership of the transaction/message and can be verified with the public key of the owner. Once a block is approved and added to the blockchain, it becomes immutable and cannot be tampered with or altered.

Fig. 3 Structure of a block in blockchain [42]
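The header fields listed above can be tied together in a small sketch. It is a simplified illustration under several assumptions: the header is serialized as JSON, the difficulty target is expressed as a number of leading zero hex digits, and digital signatures are omitted. The helper names (merkle_root, header_hash, mine_block, chain_is_valid) are hypothetical and do not correspond to any particular blockchain platform.

```python
import copy
import hashlib
import json
import time

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def merkle_root(transactions):
    """Hash transactions pairwise until a single root hash remains."""
    level = [sha256_hex(tx.encode()) for tx in transactions]
    if not level:
        return sha256_hex(b"")
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])        # duplicate the last hash on odd levels
        level = [sha256_hex((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]

def header_hash(header: dict) -> str:
    return sha256_hex(json.dumps(header, sort_keys=True).encode())

def mine_block(prev_hash, transactions, difficulty=3):
    """Increment the nonce until the header hash meets the difficulty target."""
    header = {
        "timestamp": time.time(),
        "prev_hash": prev_hash,
        "merkle_root": merkle_root(transactions),
        "difficulty": difficulty,
        "nonce": 0,
    }
    while not header_hash(header).startswith("0" * difficulty):
        header["nonce"] += 1
    return {"header": header, "transactions": list(transactions)}

def chain_is_valid(chain):
    """Check Merkle roots, proof of work and the previous-hash links."""
    for i, block in enumerate(chain):
        header = block["header"]
        if header["merkle_root"] != merkle_root(block["transactions"]):
            return False
        if not header_hash(header).startswith("0" * header["difficulty"]):
            return False
        if i > 0 and header["prev_hash"] != header_hash(chain[i - 1]["header"]):
            return False
    return True

genesis = mine_block("0" * 64, ["genesis"])
block1 = mine_block(header_hash(genesis["header"]), ["ECG reading A", "BP reading B"])
block2 = mine_block(header_hash(block1["header"]), ["Pulse reading C"])
chain = [genesis, block1, block2]

tampered = copy.deepcopy(chain)
tampered[1]["transactions"][0] = "ECG reading X"   # alter stored data

print(chain_is_valid(chain), chain_is_valid(tampered))
```

Altering a stored transaction changes the Merkle root, so the tampered copy fails validation unless the block and all later blocks are mined again.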


3.7 Types of Blockchain

In general, blockchains can be distinguished by the data they manage, the availability of that data and the actions a user may perform. They fall into three categories:

• Public Blockchain (permissionless)
• Consortium (public permissioned)
• Private Blockchain

As the names suggest, blockchains that are accessible and visible to the public are public blockchains. However, the entire data may not be accessible to the public, since some part of the data can be kept in an encrypted format to preserve the anonymity of participants [43]. In public blockchains, anyone can join the blockchain and act as a node or become a miner; hence, no approval is required. Cryptocurrency networks fall into this category, where a miner gains some economic incentive. For instance, Bitcoin, Ethereum and Litecoin are cryptocurrency networks based on the public blockchain.

In consortium blockchains, only selected nodes are allowed to participate in the distributed consensus process [43]. Any kind of industry can use this kind of blockchain. Sometimes consortium blockchains are developed for a particular industry (e.g., the healthcare sector) but are open for public use subject to approval.

Private blockchains are decentralized networks [43] where only permissioned nodes can join the network. The tasks of the nodes, such as performing transactions, executing smart contracts or acting as a miner, are controlled in private blockchains. Basically, a trusted organization manages the blockchain. Platforms like Ripple [44] and Hyperledger Fabric [45] support only private blockchain networks.

3.8 Challenges of Blockchain in Healthcare

Integrating blockchain into healthcare systems is very challenging. Both management and technical problems need to be addressed. Several of these challenges are discussed here.

Interoperability

Sharing data among different healthcare providers quickly and effectively is a challenging task. Non-collaboration and lack of coordination are barriers to effective data sharing [27]. Patients and other healthcare stakeholders may face problems in the data sharing and retrieval process.

Management, Anonymity, and Privacy of Data

Managing large health records and sharing them among healthcare providers is not an easy task when integrating blockchain. Since health data is sensitive in nature,


it should be shared only with trusted parties. It must be ensured that no unauthorized entity gains access to the data. National regulations and data privacy requirements must be adhered to when adopting blockchain in healthcare.

Quality of Service (QoS)

One of the big concerns in adopting blockchain in healthcare is delivery time: data must be delivered within the required time. Patients' lives can be in danger if the required data is not delivered on time. Since the blockchain architecture is complex, incorporating blockchain may create computational delays. A lot of research needs to be done to reduce delay and maintain QoS in terms of reliability before incorporating blockchain in healthcare.

Heterogeneous Devices and Traffic

Biological sensors are important parts of healthcare, and these sensor devices generate various kinds of traffic. In general, data traffic is classified into two categories: emergency traffic and normal traffic. Traffic generated from data gathered from patients in an emergency situation is emergency traffic, and data gathered by sensors during regular monitoring is normal traffic. So, when implementing blockchain in healthcare, a priority mechanism is required so that emergency traffic experiences minimal delay compared to regular traffic.

Latency

Latency is an important parameter in healthcare. Some healthcare applications rely on real-time monitoring to make the diagnosis process faster. In a blockchain, blocks are verified before they are added and shared with different stakeholders, and this process delays data access and analysis. So, this delay must be considered when designing a blockchain-based healthcare system.

Resource Constraints and Energy Efficiency

Since blockchains add computational complexity, cryptographic approaches can be a burden for the sensors [46]. Biological sensors are resource-constrained devices in terms of computational power, battery backup, etc., so this high computational load may cause the temperature of the sensors to rise, creating discomfort for the patient. Energy efficiency is another challenge since the sensors are battery-powered.

Storage Capacity and Scalability

Since the data generated by health sensors is huge in volume, the nodes of the blockchain should be capable of storing it. Health data may consist of medical images, laboratory reports and drug history records, all of which require a large amount of storage space. This issue could be solved by using cloud storage platforms.
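The priority mechanism called for under Heterogeneous Devices and Traffic could, for example, be realized with a priority queue at the forwarding device. The sketch below is one possible approach under that assumption; the class name TrafficQueue, the two priority levels and the sample packets are hypothetical.

```python
import heapq
import itertools

EMERGENCY, NORMAL = 0, 1   # a lower value is served first

class TrafficQueue:
    """Serve emergency packets before normal ones, FIFO within each class."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # arrival order breaks ties

    def push(self, priority, packet):
        heapq.heappush(self._heap, (priority, next(self._counter), packet))

    def pop(self):
        _, _, packet = heapq.heappop(self._heap)
        return packet

    def __len__(self):
        return len(self._heap)

queue = TrafficQueue()
queue.push(NORMAL, "routine temperature sample")
queue.push(NORMAL, "routine pulse sample")
queue.push(EMERGENCY, "abnormal ECG alert")
queue.push(NORMAL, "routine BP sample")

order = [queue.pop() for _ in range(len(queue))]
print(order)
```

The emergency packet is forwarded first even though it arrived third, while normal packets keep their arrival order.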


Security

Another important issue in incorporating blockchain in healthcare is the reliability of the gathered data. Although blockchain is popular because the data stored in it is immutable, data coming from the sensors may sometimes be corrupted, and such data will then remain corrupt. Data may also be altered before it is received by the blockchain nodes through security attacks such as fake data injection and eavesdropping. So, an effective security mechanism is needed to ensure the integrity of the data.

Data Mining

Blockchain is based on the validation of data blocks: each item of data that comes from the sensors is considered a block of data, and every block sent from the sensors needs to be validated before it is added to the chain. A problem therefore arises when the number of patients increases; mining takes more time because the computational load increases. So, efficient mining is also a very challenging issue when integrating blockchain in healthcare.

4 System Model

A typical IoT-based healthcare system consists of three layers. The first layer consists of different health sensors for ECG, pulse, blood pressure, etc. Usually, these sensors are placed on the body of a patient. They sense different physiological parameters from the patient's body and send this information to a PDA device. In the second layer, the PDA device forwards the data to the medical server through an internet connection, and in the third layer the doctor or medical facility gets access to the data. However, data in transit is vulnerable to various cyber-attacks. So, security measures that provide confidentiality, authentication, integrity and access control must be adopted. These four parameters are well-known security requirements for healthcare applications [47].

Attack Model

Traditional IoT-based applications mainly face two types of attacks: attacks against confidentiality and attacks against integrity. Confidentiality means non-disclosure of the private information of a patient, which is prone to different threats. Some common security attacks on confidentiality are eavesdropping, impersonation attacks, side-channel attacks and packet sniffing. Therefore, it is very important to handle security attacks against confidentiality. Integrity ensures the intactness of data during communication. Nowadays, IoT-based biological sensors gather physiological information from a patient and send it to medical facilities. Since these data are sent through insecure wired/wireless links, it is easy for an adversary to physically or remotely capture the forwarding device and manipulate the information gathered by the sensors. As a result, it may lead to a wrong diagnosis. Some of the common attacks against integrity are data modification attack, fake data injection attack,


replay attack, etc. In our proposed framework we have considered blockchain, which can defend against attacks on integrity due to the way it works. To handle attacks against confidentiality, various cryptographic schemes can be used to encrypt the data at the forwarder device, and the data can be decrypted using the secret key of the medical service providers.

Proposed Architecture

Here, we propose a secure and smart framework to share data with different medical facilities in an effective manner. The overall blockchain-based architecture is shown in Fig. 4, where cloud storage is used to store the electronic medical record (EMR). In our proposed framework, data gathered through sensors are first sent to a PDA device; this device generates the hash of the health data using a standard hash algorithm, and the hash is then forwarded to a private blockchain network through the internet. In the blockchain, each medical facility (hospitals, labs, insurance companies, research labs, etc.) acts as a blockchain node. The hash sent from the PDA device is received by each node, and that data block needs to be validated and verified by the nodes. Verification is based on the received hash, which is compared with the hash of the previously received data block. This is possible because the data block generated by the PDA device also contains the previous block hash. A majority of the blockchain nodes need to verify the block. Once verified, the block is added to the chain, and a unique secret key and an identifier (ID number) are generated. The key and ID are sent back to the PDA device. The PDA device encrypts the actual health data using the key, and the encrypted data, the hash of the health data and the ID are sent to the cloud-based database server. If someone tries to tamper with the data of one block, the subsequent blocks are also affected. Whenever a medical facility needs to access health data stored in the cloud, the data is first identified through the ID and then decrypted using the secret key. Once the decryption is done, the health data becomes available. It can then be given as input to various Deep Learning-based healthcare applications. We have discussed earlier the applications of Deep Learning techniques in healthcare, like Predictive healthcare,

Fig. 4 Proposed blockchain-based health data-sharing framework


Medical Decision Support, Personalized treatments, etc. The main advantage of the proposed architecture is that data is shared among the different medical facilities in a secure manner and can be used for various healthcare applications. The architecture maintains the security requirements. Since a private blockchain is used, data is stored and accessed by authenticated users only. Cryptographic algorithms help to maintain the confidentiality of the data. Integrity is maintained by the way the blockchain works, and access control is based on the secret key. Since the secret key is generated by the blockchain nodes, only they have permission to decrypt the data.
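The data flow of the proposed architecture (hashing at the PDA, majority verification by the nodes, key and ID generation, encrypted storage in the cloud and later retrieval) can be walked through in a minimal sketch. Everything below is an assumption made for illustration: the Node class, submit_block and toy_cipher are hypothetical names, the link between blocks is simplified to the previous data hash, and the XOR keystream is only a stand-in for a real encryption algorithm and must not be used to protect actual health data.

```python
import hashlib
import secrets

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR with a SHA-256-derived keystream. Placeholder only, NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))   # same call decrypts

class Node:
    """A medical facility acting as a blockchain node (simplified)."""

    def __init__(self):
        self.last_hash = "0" * 64

    def vote(self, block):
        # Verify that the block refers to the previously received hash.
        return block["prev_hash"] == self.last_hash

    def accept(self, block):
        self.last_hash = block["data_hash"]

def submit_block(nodes, block):
    """Add the block if a majority of nodes verify it; return (ID, key)."""
    votes = sum(node.vote(block) for node in nodes)
    if votes <= len(nodes) // 2:
        return None
    for node in nodes:
        node.accept(block)
    return secrets.token_hex(8), secrets.token_bytes(32)

nodes = [Node() for _ in range(5)]
cloud = {}                                   # stands in for the cloud database

# PDA side: hash the reading and submit the hash to the private blockchain.
reading = b'{"patient": "P-001", "ecg_bpm": 72}'
block = {"data_hash": sha256_hex(reading), "prev_hash": "0" * 64}
record_id, key = submit_block(nodes, block)

# PDA side: encrypt the health data and store it with its hash and ID.
cloud[record_id] = {"ciphertext": toy_cipher(reading, key),
                    "data_hash": block["data_hash"]}

# Medical facility side: look up by ID, decrypt and check integrity.
record = cloud[record_id]
recovered = toy_cipher(record["ciphertext"], key)
intact = sha256_hex(recovered) == record["data_hash"]

# A block that does not link to the previous hash is rejected.
forged = {"data_hash": sha256_hex(b"fake"), "prev_hash": "f" * 64}
rejected = submit_block(nodes, forged) is None

print(intact, rejected)
```

How the key is distributed to authorized facilities is not specified in the text, so the sketch simply reuses the same key variable on both sides.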

5 Open Research Issues

In this chapter, we have described the role of blockchain and deep learning in health informatics. Both of these emerging technologies face challenges that remain open research areas, and proper research can mitigate them. As discussed earlier, conventional cryptographic algorithms like DES, AES and 3-DES are not a good choice for a blockchain in healthcare applications, because applying them increases the latency of data sharing. So the design of low-complexity encryption-decryption algorithms is an important research area. Key generation and key sharing should be done efficiently so that they do not increase the complexity of a blockchain-based health data-sharing platform. Moreover, as the number of stakeholders in IoT-enabled smart and remote healthcare may keep increasing, communication overhead and storage overhead need to be taken care of while designing a blockchain-based secure healthcare system. Health data stored in EMRs are largely unlabeled and often contain missing and noisy values. So researchers should consider reconstructing the missing data, and data filtration is needed to remove the noise. Also, health data is big data due to the large sample size and volume of data. Budding researchers can explore preparing their own database based on their research context, in addition to using existing benchmark databases, considering the demography, geographical location, concerned disease, etc. of the target subjects to achieve more realistic intelligent data processing results. There are many research issues in this field, and proper exploration is needed to adopt deep learning and blockchain technology in health informatics in an efficient way.
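As a very simple illustration of reconstructing missing values and filtering noise, the sketch below fills gaps by linear interpolation and then applies a sliding median filter. Both techniques are generic choices assumed here for illustration, and the heart-rate series is made-up example data.

```python
import numpy as np

def fill_missing(signal):
    """Linearly interpolate NaN samples from their valid neighbours."""
    signal = np.asarray(signal, dtype=float)
    idx = np.arange(signal.size)
    valid = ~np.isnan(signal)
    return np.interp(idx, idx[valid], signal[valid])

def median_filter(signal, window=3):
    """Sliding median with an odd window; edges repeat the end values."""
    signal = np.asarray(signal, dtype=float)
    pad = window // 2
    padded = np.pad(signal, pad, mode="edge")
    return np.array([np.median(padded[i:i + window])
                     for i in range(signal.size)])

# Made-up heart-rate samples with two gaps and one spike (180).
heart_rate = [72, 73, np.nan, 75, 180, 76, np.nan, np.nan, 79]

filled = fill_missing(heart_rate)
cleaned = median_filter(filled, window=3)

print(filled)
print(cleaned)
```

Real EMR data would usually call for more careful, domain-aware imputation, since a genuine clinical event can look like a spike.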

6 Conclusion

The present era is one of smart and remote applications in many areas, healthcare in particular, where multiple stakeholders are involved with big health data that need to be acquired, stored and retrieved in a distributed manner using security techniques such as blockchain, and processed intelligently by applying deep learning techniques. Issues and challenges remain in applying deep learning techniques, as medical health records are not always complete, may be erroneous and may not be labeled.


Also, as health records are huge in size and multiple parties are involved, executing all the steps of the blockchain method may lead to additional storage overhead, communication overhead and latency in processing a submitted request to access data, which can make IoT-enabled real-time healthcare support unrealistic. This book chapter discusses the relevant deep learning algorithms and tools, presents basic and fundamental concepts related to big data, healthcare, security, IoT, etc., illustrates the blockchain-based architecture and defines an attack model, giving researchers in this domain a complete view for further exploration.

Acknowledgements This work has been carried out as part of a sanctioned research project from the Government of West Bengal, Department of Science & Technology and Biotechnology, project sanction no. 230(Sanc)/ST/P/S&T/6G-14/2018.

References

1. Majumder, S., Aghayi, E., Noferesti, M., Memarzadeh-Tehran, H., Mondal, T., Pang, Z., Deen, M.J.: Smart homes for elderly healthcare: recent advances and research challenges. Sensors 17, 2496 (2017)
2. Bahga, A., Madisetti, V.K.: Healthcare data integration and informatics in the cloud. Computer 48(2), 50–57 (2015)
3. Movassaghi, S., Abolhasan, M., Lipman, J., Smith, D., Jamalipour, A.: Wireless body area networks: a survey. IEEE Commun. Surv. Tutor. 1–29 (2013)
4. Zhang, Y., Qiu, M., Tsai, C., Hassan, M.M., Alamri, A.: Health-CPS: healthcare cyber-physical system assisted by cloud and big data. IEEE Syst. J. 11(1), 88–95 (2017)
5. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system (2008)
6. Andreu-Perez, J., Poon, C.C.Y., Merrifield, R.D., Wong, S.T.C., Yang, G.: Big data for health. IEEE J. Biomed. Health Inf. 19(4), 1193–1208 (2015)
7. Ravi, D., Wong, C., Deligianni, F., Berthelot, M., Andreu-Perez, J., Lo, B., Yang, G.: Deep learning for health informatics. IEEE J. Biomed. Health Inf. 21(1), 2–41 (2017)
8. Karmakar, K., Saif, S., Biswas, S., Neogy, S.: WBAN security: study and implementation of a biological key based framework. In: 2018 Fifth International Conference on Emerging Applications of Information Technology (EAIT), pp. 1–6 (2018)
9. Xia, Q., Sifah, E.B., Asamoah, K.O., Gao, J., Du, X., Guizani, M.: MeDShare: trust-less medical data sharing among cloud service providers via blockchain. IEEE Access 5, 14757–14767 (2017)
10. Hölbl, M., Kompara, M., Kamišalić, A., Nemec Zlatolas, L.: A systematic review of the use of blockchain in healthcare. Symmetry 10, 470 (2018)
11. Shen, B., Guo, J., Yang, Y.: MedChain: efficient healthcare data sharing via blockchain. Appl. Sci. 9, 1207 (2019)
12. Faust, O., Hagiwara, Y., Hong, T.J., Lih, O.S., Rajendra Acharya, U.: Deep learning for healthcare applications based on physiological signals: a review. Comput. Methods Progr. Biomed. 161, 1–13 (2018)
13. Griggs, K.N., Ossipova, O., Kohlios, C.P., et al.: Healthcare blockchain system using smart contracts for secure automated remote patient monitoring. J. Med. Syst. 42, 130 (2018)
14. Chen, M., Hao, Y., Hwang, K., Wang, L., Wang, L.: Disease prediction by machine learning over big data from healthcare communities. IEEE Access 5, 8869–8879 (2017)
15. Mozaffari-Kermani, M., Sur-Kolay, S., Raghunathan, A., Jha, N.K.: Systematic poisoning attacks on and defenses for machine learning in healthcare. IEEE J. Biomed. Health Inf. 19(6), 1893–1905 (2015)


16. Sun, W., Zheng, B., Qian, W.: Computer aided lung cancer diagnosis with deep learning algorithms. In: Proceedings of SPIE 9785, Medical Imaging 2016: Computer-Aided Diagnosis, 97850Z, 24 Mar 2016
17. Esteva, A., Kuprel, B., Novoa, R.A., Ko, J., Swetter, S.M., Blau, H.M., Thrun, S.: Dermatologist-level classification of skin cancer with deep neural networks. Nature 542(7639), 115–118 (2017)
18. Abdel-Zaher, A.M., Eldeib, A.M.: Breast cancer classification using deep belief networks. Expert Syst. Appl. 46, 139–144 (2016)
19. Fakoor, R., Ladhak, F., Nazi, A., Huber, M.: Using deep learning to enhance cancer diagnosis and classification. In: Proceedings of the ICML Workshop on the Role of Machine Learning in Transforming Healthcare, June 2013
20. Ramsundar, B., Kearnes, S., Riley, P., Webster, D., Konerding, D., Pande, V.: Massively multitask networks for drug discovery. arXiv preprint arXiv:1502.02072 (2015)
21. Li, R., Zhang, W., Suk, H., Wang, L.: Deep learning based imaging data completion for improved brain disease diagnosis. In: Proceedings of MICCAI 2014, pp. 305–312, Sept 2014
22. Mohsen, H., El-Dahshan, E.S.A., El-Horbaty, E.S.M., Salem, A.: Classification using deep learning neural networks for brain tumors. Fut. Comput. Inf. J. 3(1), 68–71 (2018)
23. Amin, J., Sharif, M., Yasmin, M., Fernandes, S.: Big data analysis for brain tumor detection: deep convolutional neural networks. Fut. Gener. Comput. Syst. 87, 290–297 (2018)
24. Bar, Y., Diamant, I., Wolf, L., Lieberman, S., Konen, E., Greenspan, H.: Chest pathology detection using deep learning with non-medical training. In: 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI), New York, pp. 294–297 (2015)
25. Ronao, C.A., Cho, S.B.: Human activity recognition with smartphone sensors using deep learning neural networks. Expert Syst. Appl. 59, 235–244 (2016)
26. Zhang, P., White, J., Schmidt, D.C., Lenz, G., Rosenbloom, S.T.: FHIRChain: applying blockchain to securely and scalably share clinical data. Comput. Struct. Biotechnol. J. 16, 267–278 (2018)
27. Azaria, A., Ekblaw, A., Vieira, T., Lippman, A.: MedRec: using blockchain for medical data access and permission management. In: 2016 2nd International Conference on Open and Big Data (OBD), Vienna, pp. 25–30 (2016)
28. Peterson, K., Deeduvanu, R., Kanjamala, P., Boles, K.: A blockchain based approach to health information exchange networks. In: Proceedings of NIST Workshop Blockchain Healthcare, vol. 1, pp. 110 (2016)
29. Patel, V.: A framework for secure and decentralized sharing of medical imaging data via blockchain consensus. Health Inf. J. 1–14 (2018)
30. Chun-Wei, T., Chin-Feng, L., Ming-Chao, C., Yang, L.T.: Data mining for internet of things: a survey. IEEE Commun. Surv. Tutor. 16(1), 77–97 (2014)
31. Artelnics: Neural designer (2015). Available online: https://www.neuraldesigner.com
32. Chollet, F.: Keras (2016). Available online: https://keras.io/
33. Apache Software Foundation: Apache Singa (2016). Available online: https://singa.incubator.apache.org
34. Skymind: Deeplearning4j (2016). Available online: http://deeplearning4j.org
35. Microsoft: Microsoft cognitive toolkit (2016). Available online: https://github.com/microsoft/cntk
36. Apache Software Foundation: Apache MXNet (2016). Available online: https://mxnet.apache.org/
37. Artelnics: OpenNN (2014). Available online: http://www.opennn.net
38. Paszke, A., Gross, S., Chintala, S., Chanan, G.: PyTorch (2016). Available online: https://pytorch.org
39. Google: Tensorflow (2016). Available online: https://www.tensorflow.org
40. Universite de Montreal: Theano (2019). Available online: http://deeplearning.net/software/theano/
41. Ismail Fawaz, H., Forestier, G., Weber, J., Idoumghar, L., Muller, P.-A.: Adversarial attacks on deep neural networks for time series classification. In: IEEE International Joint Conference on Neural Networks (2019)


42. Bahga, A., Madisetti, V.K.: Blockchain platform for industrial internet of things. J. Softw. Eng. Appl. 09, 533–546 (2016)
43. Zheng, Z., Xie, S., Dai, H., Chen, X., Wang, H.: An overview of blockchain technology: architecture, consensus, and future trends. In: Proceedings of the 2017 IEEE International Congress on Big Data (BigData Congress), Boston, MA, USA, 11–14 Dec 2017, pp. 557–564 (2017)
44. Ripple: Ripple: one frictionless experience to send money globally (2018). Available online: https://ripple.com
45. Androulaki, E., Manevich, Y., Muralidharan, S., Murthy, C., Nguyen, B., Sethi, M., Singh, G., Smith, K., Sorniotti, A., Stathakopoulou, C., et al.: Hyperledger fabric: a distributed operating system for permissioned blockchains. In: Proceedings of the Thirteenth EuroSys Conference, Porto, Portugal, 23–26 Apr 2018
46. Zhang, J., Xue, N., Huang, X.: A secure system for pervasive social network-based healthcare. IEEE Access 4, 9239–9250 (2016)
47. Saif, S., Gupta, R., Biswas, S.: Implementation of cloud assisted secure data transmission in WBAN for healthcare monitoring. In: Proceedings of International Conference on Advanced Computational and Communication Paradigms (ICACCP 2017), Advances in Intelligent Systems and Computing, vol. 705, pp. 665–674 (2018)

Sohail Saif is a full-time Ph.D. Research Scholar at Maulana Abul Kalam Azad University of Technology, West Bengal, India. He completed his B.Tech in Computer Science and Engineering and his M.Tech in Software Engineering at Maulana Abul Kalam Azad University of Technology, WB in 2014 and 2018, respectively. His research interests are internet of things, network security and remote healthcare.

Suparna Biswas is an Associate Professor in the Department of Computer Science and Engineering at Maulana Abul Kalam Azad University of Technology, WB. She completed her ME and Ph.D. at Jadavpur University, West Bengal in 2004 and 2013, respectively. She was an ERASMUS MUNDUS Postdoctoral Research Fellow in the cLINK project at Northumbria University, Newcastle, UK during 2014–2015. Her research interests are internet of things, wireless body area networks, machine learning, network security and remote healthcare. She has authored a number of research papers published in peer-reviewed international journals and conferences of repute. She is currently PI of a WB DST funded major research project on IoT based secure remote healthcare.

Samiran Chattopadhyay is a professor in the Department of Information Technology, Jadavpur University. He has served as the head of the department for more than twelve years and as the Joint Director of the School of Mobile Computing and Communication since its inception. A graduate, postgraduate and gold medalist of the Indian Institute of Technology, Kharagpur, he received his Ph.D. degree from Jadavpur University. He has two decades of experience serving reputed industry houses such as Computer Associates, Interra Systems India, Agilent and Motorola as a technical consultant. He led the development of an open-source C++ infrastructure and tool set for reconfigurable computing, released under the GNU GPL 3.0 license. He has visited several universities in the United Kingdom as a visiting professor. He has been working on algorithms for security, bioinformatics, distributed and mobile computing, and middleware. He has authored and edited several books and book chapters. Prof. Chattopadhyay has acted as program chair, organizing chair and IPC member of over 30 international conferences. He has published more than 180 papers in reputed journals and international peer-reviewed conferences.

Malaria Disease Detection Using CNN Technique with SGD, RMSprop and ADAM Optimizers Avinash Kumar, Sobhangi Sarkar and Chittaranjan Pradhan

Abstract Malaria is a life-threatening disease spread when an infected female Anopheles mosquito bites a person, and it is one of the most prevalent diseases in the world. Many drugs make malaria curable, but inadequate technology and equipment often prevent it from being detected and treated. Diagnosis involves counting parasites and red blood cells manually, which is a labor-intensive and error-prone process, especially if patients have to be tested several times a day. This issue can be addressed by training machines to do the work of pathologists using deep learning algorithms. Our model uses CNN-based classification to classify blood films as infected or normal. The experimental results show that our model works well on microscopic images and achieves an accuracy of 96.62%, while having lower model complexity and requiring less computation time, thus outperforming the previous state of the art.

Keywords Malaria · CNN · SGD · ADAM · RMSprop · Deep learning

A. Kumar (B) · S. Sarkar · C. Pradhan, School of Computer Engineering, KIIT DU, Bhubaneswar, India. e-mail: [email protected] S. Sarkar e-mail: [email protected] C. Pradhan e-mail: [email protected]
© Springer Nature Switzerland AG 2020 S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics, Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_11

1 Introduction

Malaria is among the world's leading causes of death. It is spread when an infected female Anopheles mosquito bites a person. It is one of the most prevalent life-threatening diseases in the world and increases the mortality rate in countries such as India. Several species of malaria parasite, including P. falciparum, P. ovale, P. vivax and P. malariae, can cause disease in humans, of which P. falciparum is the deadliest. As per the WHO Malaria Report of 2015 [1], roughly 3.3 billion people in different countries are estimated to be at risk of being affected by malaria, of whom around 1.2 billion are at higher risk. There were an estimated 214 million cases of malaria worldwide in 2015 and about 438,000 deaths due to malaria. The impact was greatest in Africa, which accounted for approximately 91% [2] of all malaria deaths; two thirds of all deaths were among children below 5 years of age. Signs of malaria include muscle pain, vomiting and chills, and in critical cases the disease leads to coma, which can result in the person's death. Many medicines make malaria a curable disease, but owing to a lack of modern equipment and the reliance on manual counting of blood cells, the death rate is increasing rapidly.

The standard method used worldwide for the diagnosis of malaria is light microscopy of blood films. Although frequently used, this method has serious drawbacks. It relies heavily on the expertise of the pathologist, whose performance suffers under the burden of large-scale analysis that is common in malaria-prone areas. It involves counting parasites and red blood cells (RBCs) manually, which is a labor-intensive and error-prone process, especially if patients have to be tested several times a day. However, accurate counts are essential for diagnosing malaria accurately, and are an important part of testing for drug effectiveness and drug resistance and of estimating disease severity. This issue can be addressed by training machines to do the work of pathologists using deep learning algorithms [3–5].

Deep learning, one of the fastest growing areas of machine learning, has been performing exceptionally well in the medical field. Our model uses a deep learning architecture known as the convolutional neural network (CNN) [4]. The main feature of a CNN is that it automatically detects the important features without human supervision, by training its learning layers on the input data. A CNN also lends itself to visualization, which helps in understanding the relations it has learned. CNNs are computationally more efficient than other models. A further advantage is that they are easy to train and have fewer parameters than fully connected networks with the same number of hidden units [6–8].

2 Background

CNNs are mainly used to categorize images, group images by similarity and carry out recognition tasks such as image or object recognition. Their application is not limited to any one field: among other things, CNNs can detect anomalies in medical images, generate character text and automate many devices [9].


2.1 Convolutional Neural Network (CNN)

CNNs are now applied almost everywhere and are among the most sought-after deep learning architectures. The popularity and effectiveness of ConvNets increased interest in deep learning: since AlexNet in 2012, interest in CNNs has grown rapidly and continues to grow. For image-related problems, a CNN is usually the go-to model because of its accuracy. CNNs can also be applied in other areas, such as recommendation models and natural language processing. The main advantage of a CNN over other algorithms is that it automatically detects the features that are essential for classification, without these features being specified by hand. For example, given pictures of two different objects, it learns the features that differentiate the two classes. A CNN follows the architecture shown in Fig. 1. The input image first passes through convolution and pooling operations and then through a number of fully connected layers. For multiclass classification, the output is produced by a softmax layer.

2.1.1 Layers of CNN

Fig. 1 Neural network with many convolutional layers [10]

Convolution The fundamental layer of a CNN is the convolution layer. Convolution is a mathematical operation that merges two sets of information: the input image and the convolution filter, also called the kernel [11]. Figure 2 shows the convolution operation, with the input image on the left and the convolution filter on the right. Convolution is performed by sliding the filter window over the input. At each position, the filter and the underlying patch of the input are multiplied element by element and the results are added. This sum is written to the feature map. The receptive field is the region of the input where these operations take place, and its size is the same as that of the filter. Figure 3 shows the convolutional layer. For image-related problems, 3D convolution is performed, because an image has three dimensions: height, width and depth. The depth represents the colour (RGB) channels of the image. A convolution layer performs multiple convolutions using different filters, and the outputs of the individual convolutions are stacked together to form the output of the layer.

Fig. 2 The convolution operation [10]

Fig. 3 The convolutional layer [10]
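The sliding-window computation described above can be sketched in a few lines of plain Python. This is a minimal illustration for a single channel with stride 1 and no padding, not the chapter's implementation; the function name `convolve2d`, the 4 × 4 input and the 3 × 3 kernel are hypothetical values chosen only for the example. As in most deep learning libraries, the kernel is applied without flipping.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel with the receptive field element by element, then sum.
            total = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k_h)
                for b in range(k_w)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical single-channel input and kernel, for illustration only.
image = [
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)  # [[4, 3], [2, 4]]
```

A 4 × 4 input and a 3 × 3 kernel give a 2 × 2 feature map, which shows why the output shrinks when no padding is used.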


Fig. 4 Relu activation function [10]

Non Linearity Neural networks such as ANNs and autoencoders are powerful because of their non-linearity: the weighted sum of the inputs is passed through an activation function to produce the output. CNNs use the same technique. The output of the convolution layer is passed through the ReLU activation function, so the values stored in the feature map are not just the sums of the element-wise products but those sums with ReLU applied. ReLU is applied after every convolution in the network, because without a non-linearity the network would lose its expressive power [11]. Equation 1 defines the ReLU activation function mathematically.

$$y = \max(0, x) \tag{1}$$

Figure 4 shows the ReLU curve. A layer with ReLU activation can be added to a model in Keras as follows:

    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential()  # container to which layers are added
    model.add(Dense(64, activation='relu'))  # 64 units followed by ReLU

Stride and Padding The stride is the number of positions by which the convolution filter moves at each step of the convolution. The default stride is 1. The bigger the stride, the smaller the feature map. Even with a stride of 1, the feature map is smaller than the input image, because the filter must fit entirely inside the image. To keep the dimensions of the feature map the same as those of the image, padding is added around the image [11]. The padding can consist of zeros or of the values on the edges of the input image. Padding is therefore used in CNNs to maintain the size of the feature map, which would otherwise shrink with each layer. Figure 5 illustrates how full padding and same padding are applied.

Pooling Pooling is performed after convolution to reduce the size of the feature maps. It also reduces the number of parameters in later layers, which in turn reduces training time. Pooling downsamples the feature maps by reducing their height and width while keeping the depth (the number of channels) unchanged. Max pooling is the most commonly used technique: a window slides over the feature map, as in a convolution, and the maximum value in each window is taken as the output. Pooling has no learnable parameters; only the window size and the stride are specified [11]. Figure 6 shows max pooling.
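The effect of stride, padding and pooling on feature-map size can be checked with a short sketch. The helper names `conv_output_size` and `max_pool2d` and the sample values are assumptions made for illustration; the size formula is the standard one for a filter of width f applied to an input of width n.

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Output length along one axis for input length n and filter length f."""
    return (n + 2 * pad - f) // stride + 1


def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window; nothing is learned here."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


print(conv_output_size(50, 3))            # 48: no padding, the map shrinks
print(conv_output_size(50, 3, pad=1))     # 50: one pixel of padding keeps the size
print(conv_output_size(50, 3, stride=2))  # 24: a larger stride gives a smaller map

# Hypothetical 4 x 4 feature map, for illustration only.
fm = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 0, 8],
]
pooled = max_pool2d(fm)
print(pooled)  # [[6, 4], [7, 9]]
```

With a 2 × 2 window and a stride of 2, height and width are halved while the number of channels would stay the same.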

Fig. 5 Full padding and same padding [10]

Fig. 6 Max pooling layer [10]


Hyper parameters If only convolution is considered and pooling is ignored, four important hyperparameters have to be chosen:
• Filter size: a filter size of 3 × 3, 5 × 5 or 7 × 7 is generally used.
• Filter count: generally a value in the range 32–1024. The more filters are used, the more powerful the network becomes, but the risk of overfitting also increases because of the larger number of parameters.
• Stride: the stride is usually kept at 1.
• Padding: padding is generally preferred.

Fully Connected After the convolution and pooling layers, fully connected layers are added to complete the CNN architecture. The output of the convolution and pooling layers is a 3D volume, whereas a fully connected layer expects a 1D vector as input. The output of the last pooling layer is therefore flattened, that is, its values are rearranged from a 3D volume into a 1D vector. Figure 7 shows the fully connected layer of a convolutional neural network.

Training A CNN is trained in the same way as an ANN, using back propagation with gradient descent. The only additional mathematics comes from the convolution operation.

Fig. 7 Fully connected layer [10]


Fig. 8 Implementation architecture of convolutional neural network [10]

2.1.2 Intuition and Implementation

A CNN architecture has two stages: feature extraction and feature classification. Feature extraction is performed by the convolution and pooling layers. In this phase, the important features are extracted from an arbitrary input picture. For example, from a picture of a human being it extracts features such as the number of legs, hands and eyes. The convolution layers learn to identify these features by stacking several layers on top of one another. For example, the 1st layer may detect outlines, the 2nd layer sizes and the 3rd layer colors, which combine into a particular feature when compared across many images [12]. The architecture used by the model consists of four convolution layers, each followed by a pooling layer, followed by two fully connected layers and an output layer. The implementation architecture of the convolutional neural network is shown in Fig. 8. The basic code for implementing the CNN is given below:

    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense

    model = Sequential()
    # Feature extraction: four convolution + max-pooling blocks
    model.add(Conv2D(32, (3, 3), activation='relu', padding='same', name='conv_1', input_shape=(150, 150, 3)))
    model.add(MaxPooling2D((2, 2), name='maxpool_1'))
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same', name='conv_2'))
    model.add(MaxPooling2D((2, 2), name='maxpool_2'))
    model.add(Conv2D(128, (3, 3), activation='relu', padding='same', name='conv_3'))
    model.add(MaxPooling2D((2, 2), name='maxpool_3'))
    model.add(Conv2D(128, (3, 3), activation='relu', padding='same', name='conv_4'))
    model.add(MaxPooling2D((2, 2), name='maxpool_4'))
    # Feature classification: flatten, dropout, two dense layers and a sigmoid output
    model.add(Flatten())
    model.add(Dropout(0.5))
    model.add(Dense(512, activation='relu', name='dense_1'))
    model.add(Dense(128, activation='relu', name='dense_2'))
    model.add(Dense(1, activation='sigmoid', name='output'))

2.2 Stochastic Gradient Descent (SGD)

In SGD, the word stochastic refers to a process that involves randomness. Instead of the whole data set, a few samples are selected at random from it: SGD computes the gradient of the parameters using only a single training example or a small number of them [12]. Equation 2 shows the update for each training example, where w is the parameter vector, η is the learning rate and Q_i(w) is the loss on the i-th training example.

$$w := w - \eta \nabla Q_i(w) \tag{2}$$
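To make the update in Eq. 2 concrete, the following sketch fits a single weight to a toy data set, one randomly ordered example at a time. The function name `sgd`, the squared-error loss, the data and the learning rate are assumptions made for illustration and are not taken from the chapter's experiments.

```python
import random


def sgd(data, w=0.0, lr=0.05, epochs=50, seed=0):
    """Fit y = w * x with per-example updates w := w - lr * grad Q_i(w)."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in random order
        for x, y in data:
            grad = 2 * (w * x - y) * x  # gradient of Q_i(w) = (w * x - y) ** 2
            w = w - lr * grad           # Eq. 2
    return w


# Hypothetical data generated from y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w_fit = sgd(data)
print(round(w_fit, 6))  # 2.0
```

Each update uses the gradient of one example only, so a single pass over the data performs as many parameter updates as there are examples.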

2.3 RMSprop

The RMSprop optimizer is similar to the gradient descent algorithm with momentum. It limits oscillations in the vertical direction, so the learning rate can be increased and the algorithm can take larger steps in the horizontal direction, converging faster. RMSprop differs from gradient descent in how the gradients are used to compute the update [13]. A running average of the squared gradients is maintained, as shown in Eq. 3,

$$v(w, t) = \gamma \, v(w, t-1) + (1 - \gamma)\left(\nabla Q_i(w)\right)^2 \tag{3}$$

where γ is the forgetting factor. Equation 4 shows the parameter update.

$$w := w - \frac{\eta}{\sqrt{v(w, t)}} \nabla Q_i(w) \tag{4}$$
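Equations 3 and 4 translate almost directly into code. The sketch below applies them to a one-dimensional quadratic; the function name `rmsprop_step`, the objective, the hyperparameter values and the small constant `eps` (added to the denominator to avoid division by zero, and not part of Eq. 4) are assumptions for illustration.

```python
import math


def rmsprop_step(w, v, grad, lr=0.01, gamma=0.9, eps=1e-8):
    """One RMSprop update for a scalar parameter."""
    v = gamma * v + (1 - gamma) * grad ** 2   # Eq. 3: running average of squared gradients
    w = w - lr / (math.sqrt(v) + eps) * grad  # Eq. 4, with eps guarding the division
    return w, v


# Minimise the hypothetical objective Q(w) = (w - 3) ** 2.
w, v = 0.0, 0.0
for _ in range(2000):
    grad = 2 * (w - 3)
    w, v = rmsprop_step(w, v, grad)
print(w)  # close to 3
```

Because the gradient is divided by the square root of its own running average, the step length stays on the order of the learning rate even when the raw gradient is large.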

2.4 Adaptive Moment Estimation (ADAM)

ADAM is an optimization algorithm that can be used as a substitute for classical stochastic gradient descent to update the network weights during training. It is regarded as one of the best optimizers at present. ADAM is derived from AdaGrad and is the more adaptable approach: it combines AdaGrad-style adaptive learning rates with momentum [14]. Given parameters $w^{(t)}$ and a loss function $L^{(t)}$, where the index t indicates the current training iteration, the parameter update in ADAM is given by:

$$m_w^{(t+1)} \leftarrow \beta_1 m_w^{(t)} + (1 - \beta_1) \nabla_w L^{(t)} \tag{5}$$

$$v_w^{(t+1)} \leftarrow \beta_2 v_w^{(t)} + (1 - \beta_2) \left(\nabla_w L^{(t)}\right)^2 \tag{6}$$

$$\hat{m}_w = \frac{m_w^{(t+1)}}{1 - (\beta_1)^{t+1}} \tag{7}$$

$$\hat{v}_w = \frac{v_w^{(t+1)}}{1 - (\beta_2)^{t+1}} \tag{8}$$

$$w^{(t+1)} \leftarrow w^{(t)} - \eta \frac{\hat{m}_w}{\sqrt{\hat{v}_w} + \epsilon} \tag{9}$$

In Eqs. 5 and 6, β1 and β2 are the forgetting factors for the gradients and for the second moments of the gradients, respectively. In Eq. 9, ε is a small scalar used to prevent division by 0.
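Equations 5 to 9 can be followed step by step in a small sketch. The function name `adam_step`, the quadratic objective and the hyperparameter values are assumptions for illustration. The argument `step` is the 1-based update count, which corresponds to t + 1 in the equations.

```python
import math


def adam_step(w, m, v, grad, step, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update for a scalar parameter; `step` counts updates from 1."""
    m = beta1 * m + (1 - beta1) * grad             # Eq. 5: first moment
    v = beta2 * v + (1 - beta2) * grad ** 2        # Eq. 6: second moment
    m_hat = m / (1 - beta1 ** step)                # Eq. 7: bias correction
    v_hat = v / (1 - beta2 ** step)                # Eq. 8: bias correction
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)  # Eq. 9
    return w, m, v


# Minimise the hypothetical objective L(w) = (w - 3) ** 2.
w, m, v = 0.0, 0.0, 0.0
first_w = None
for step in range(1, 1001):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, m, v, grad, step)
    if step == 1:
        first_w = w
print(first_w)  # about 0.1: the first step has length close to lr
print(w)        # close to 3
```

The bias correction in Eqs. 7 and 8 matters most in the first iterations, when the moment estimates are still close to their zero initial values.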

3 Automated Diagnosis of Malaria

Deep learning can be instrumental in preventing wrong diagnostic decisions by automating the classification of cell images. Deep learning, an area of machine learning, has performed outstandingly well in many fields, but until recently it was applied less in medicine because of a lack of domain expertise and because of privacy concerns. In the last few years, however, the medical sector has started using deep learning [15]. The convolutional neural network (CNN), a well-known class of artificial neural networks, has become highly influential in computer vision and has gained recognition across a wide range of domains, including medical science. A CNN is composed of different building blocks and learns spatial features by means of back propagation. Figure 9 depicts an example of a CNN model. CNNs are particularly well suited to 2-dimensional data such as images and video frames. A CNN also lends itself to visualization, which helps in understanding the relations it has learned. Its main feature is that it automatically detects the important features without human supervision, by training its learning layers on the input data. CNNs are computationally more efficient than other models. A further advantage is that they are easy to train and have fewer parameters than fully connected networks with a similar number of hidden units [17–20].


Fig. 9 Example of CNN Model [16]

Fig. 10 Sample image of dataset

3.1 Image Acquisition

The data used in the development of the system were taken from the official website of the National Library of Medicine (NLM). The data set contains 27,558 cell images, divided into infected and uninfected cells. Figure 10 shows sample images from the dataset.

3.2 Data Visualization

Data visualization is the technique of using static and interactive graphics within a specific context to help us understand and interpret a large amount of data. We randomly plotted parasitized and uninfected cells, labeled 1 and 0 respectively, as shown in Fig. 11.

Fig. 11 Labeled image of infected and uninfected cells

3.3 Data Preprocessing

Preprocessing is the process of transforming the raw data before a machine learning or deep learning algorithm is applied to it. It is an essential stage in machine learning, because the quality of the data and of the information that can be extracted from it affects the quality and accuracy of the model. For example, a convolutional neural network trained on raw images tends to give poor results. Preprocessing also helps to speed up the whole model. In our model, the images are processed in a Jupyter Notebook. Before an image is fed to the CNN for training, it is normalized by dividing its pixel values by 255.
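The normalization step can be sketched as follows, assuming 8-bit images whose pixel values lie between 0 and 255. The function name `normalize_image` and the tiny sample image are hypothetical; in practice the same division is usually applied to a whole image array at once.

```python
def normalize_image(image):
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return [[[channel / 255 for channel in pixel] for pixel in row] for row in image]


# Hypothetical 1 x 2 RGB image, for illustration only.
image = [[[0, 128, 255], [51, 102, 204]]]
normalized = normalize_image(image)
print(normalized[0][0][2])  # 1.0: the largest 8-bit value maps to 1.0
```

After this step every input value lies in the interval [0, 1], which keeps the inputs to the first convolution layer on a common scale.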


4 Proposed Model The Convolutional Neural Network is one of the most effective neural networks to work with images and make classifications. In our model we have used Keras to create the CNN model. Figure 12 depicts the basic flow of our model.

Fig. 12 Flow chart of proposed model


Convolution 2D This layer creates a convolution kernel. We set the following properties:
• Filters: the first parameter defines the number of filters and therefore the depth of the layer's output. For the different layers we used the values 16, 32 and 64.
• Kernel Size: the size of the window that traverses the image. We set it to 2.
• Input Size: the size of each input image. The input shape is (50, 50, 3) and needs to be defined only for the first layer.
• Activation: the activation function. We used ReLU, the Rectified Linear Unit.
• Padding: padding keeps the size of the output feature maps the same as that of the input feature maps.

MaxPool 2D Pool_Size: the size of the window whose pixel values are reduced to a single value. We used a pool_size of 2.

Dropout Sets a randomly selected fraction of the values to 0 in order to prevent overfitting. We used only the rate parameter and set it to 0.2.

Flatten Flattens the complete n-dimensional matrix into a single array.

Dense Defines a densely connected neural network layer, for which we defined the following parameters:
• Activation: the activation function, which we set to ReLU.
• Units: the number of neurons in the layer.

Model Training and Result Analysis Using the fit method, we trained the model with x_train and y_train for 50 epochs, that is, 50 passes over the complete training set, with a batch size of 50. We also used a validation split of 0.1, so the model was trained on 90% of the training data and validated on the remaining 10%. A summary of our experimental model is shown in Table 1. We evaluated the model with different optimizers and obtained different accuracies.


Table 1 Summary of model

Layer (type)                    Output shape          Param #
conv2d_1 (Conv2D)               (None, 50, 50, 16)    208
max_pooling2d_1 (MaxPooling2    (None, 25, 25, 16)    0
conv2d_2 (Conv2D)               (None, 25, 25, 32)    2080
max_pooling2d_2 (MaxPooling2    (None, 12, 12, 32)    0
conv2d_3 (Conv2D)               (None, 12, 12, 64)    8256
max_pooling2d_3 (MaxPooling2    (None, 6, 6, 64)      0
dropout_1 (Dropout)             (None, 6, 6, 64)      0
flatten_1 (Flatten)             (None, 2304)          0
dense_1 (Dense)                 (None, 500)           1,152,500
dropout_2 (Dropout)             (None, 500)           0
dense_2 (Dense)                 (None, 2)             1002

Total parameters: 1,164,046 Trainable parameters: 1,164,046 Non-trainable parameters: 0
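The parameter counts in Table 1 can be reproduced by hand from the layer settings given above (2 × 2 kernels, 16, 32 and 64 filters, a 500-unit dense layer and a 2-unit output). The sketch below uses the standard counting rules for convolution and dense layers with biases; the helper names `conv2d_params` and `dense_params` are hypothetical.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights per filter (kernel_size^2 * in_channels) plus one bias, times the filter count."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


def dense_params(inputs, units):
    """One weight per input plus one bias for every unit."""
    return (inputs + 1) * units


counts = {
    "conv2d_1": conv2d_params(2, 3, 16),       # RGB input, 16 filters
    "conv2d_2": conv2d_params(2, 16, 32),
    "conv2d_3": conv2d_params(2, 32, 64),
    "dense_1": dense_params(6 * 6 * 64, 500),  # flattened 6 x 6 x 64 volume
    "dense_2": dense_params(500, 2),
}
total = sum(counts.values())
print(counts)
print(total)  # 1164046
```

Pooling, dropout and flatten layers contribute no parameters, so the total is dominated by the first dense layer, which holds 1,152,500 of the 1,164,046 parameters.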

4.1 Malaria Detection Using SGD Optimizer

As described in Sect. 2.2, SGD computes the gradient of the parameters using only a single training example or a few randomly selected examples instead of the whole data set. With the SGD optimizer, our model achieved an accuracy of 95.54% on the test set and 95.33% on the train set. The accuracy and log-loss (also known as the cost function) recorded during training are plotted in Fig. 13. The classification report obtained with the SGD optimizer is given in Table 2.

Fig. 13 Graph of log-loss and accuracy while using SGD technique


Table 2 Classification report using SGD

Class        Precision   Recall   F1-score   Support
0            0.98        0.94     0.96       1408
1            0.94        0.98     0.96       1347
Avg/total    0.96        0.96     0.96       2755
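The columns of a classification report follow from the counts of true positives, false positives and false negatives for each class. The sketch below shows the standard formulas; the function name `classification_metrics` and the counts are hypothetical illustration values, not the confusion matrix of the experiments reported here.

```python
def classification_metrics(tp, fp, fn):
    """Precision, recall and F1-score for one class from its confusion counts."""
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1


# Hypothetical counts, for illustration only.
precision, recall, f1 = classification_metrics(tp=90, fp=10, fn=5)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.9 0.95 0.92
```

The support column is simply the number of test samples that truly belong to each class.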

4.2 Malaria Detection Using RMSprop Optimizer

As described in Sect. 2.3, the RMSprop optimizer is similar to the gradient descent algorithm with momentum. It limits oscillations in the vertical direction, so the learning rate can be increased and the algorithm can take larger steps in the horizontal direction, converging faster. With the RMSprop optimizer, our model achieved an accuracy of 95.54% on the test set and 95.32% on the train set. The accuracy and log-loss (also known as the cost function) recorded during training are plotted in Fig. 14. The classification report obtained with the RMSprop optimizer is given in Table 3.

Fig. 14 Graph of log-loss and accuracy while using RMSprop technique

Table 3 Classification report using RMSprop

Class        Precision   Recall   F1-score   Support
0            0.96        0.96     0.96       1408
1            0.95        0.95     0.95       1347
Avg/total    0.96        0.96     0.96       2755


Fig. 15 Graph of log-loss and accuracy while using ADAM technique

Table 4 Classification report using ADAM

Class        Precision   Recall   F1-score   Support
0            0.95        0.96     0.95       1408
1            0.96        0.96     0.96       1347
Avg/total    0.96        0.96     0.96       2755

4.3 Malaria Detection Using ADAM Optimizer

As described in Sect. 2.4, ADAM is an optimization algorithm that can be used as a substitute for classical stochastic gradient descent to update the network weights during training. It is derived from AdaGrad and combines adaptive learning rates with momentum. With the ADAM optimizer, our model achieved an accuracy of 96.88% on the train set and 96.62% on the test set. The accuracy and log-loss (also known as the cost function) recorded during training are plotted in Fig. 15. The classification report obtained with the ADAM optimizer is given in Table 4.

5 Comparison of Different Techniques

Evaluating the different optimizers on our dataset gave different accuracies on the train and test sets, which are plotted in Figs. 16 and 17. On both sets, the ADAM optimizer worked best, with an accuracy of 96.62% on the test set and 96.88% on the train set.


Fig. 16 Accuracy on test set (bar chart of accuracy in % per optimizer: SGD 95.54, RMSProp 95.54, ADAM 96.62)

Fig. 17 Accuracy on train set (bar chart of accuracy in % per optimizer: SGD 95.33, RMSProp 95.32, ADAM 96.88)

6 Conclusion and Future Work

The purpose of the proposed method is to improve the quality of malaria detection, helping microscopists to detect malaria easily and accurately so that proper medication can start as soon as possible. Future work is directed towards improving the performance of the algorithm and denoising the blood cell images for better detection of malaria. Another direction is to package the model as a single application that can run on any smartphone to detect malaria easily.

References
1. Malaria Microscopy Quality Assurance Manual, version 2. World Health Organization (2016)
2. World Malaria Report. World Health Organization (2016)
3. O’Meara, W.P., Mckenzie, F.E., Magill, A.J., Forney, J.R., Permpanich, B., Lucas, C., Gasser, R.A., Wongsrichanalai, C.: Sources of variability in determining malaria parasite density by microscopy. Am. J. Trop. Med. Hyg. 73(3), 593–598 (2005)
4. Rajaraman, S., Antani, S.K., Xue, Z., Candemir, S., Jaeger, S., Thoma, G.R.: Visualizing abnormalities in chest radiographs through salient network activations in deep learning. In: Life Sciences Conference, IEEE, Australia, pp. 71–74 (2017)
5. Liang, Z., Powell, A., Ersoy, I., Poostchi, M., Silamut, K., Palaniappan, K., Guo, P., Hossain, M.A., Sameer, A., Maude, R.J., Huang, J.X., Jaeger, S., Thoma, G.: CNN-based image analysis for malaria diagnosis. In: International Conference on Bioinformatics and Biomedicine, IEEE, China, pp. 493–496 (2016)
6. Dong, Y., Jiang, Z., Shen, H., Pan, W.D., Williams, L.A., Reddy, V.V.B., Benjamin, W.H., Bryan, A.W.: Evaluations of deep convolutional neural networks for automatic identification of malaria infected cells. In: International Conference on Biomedical and Health Informatics, IEEE, USA, pp. 101–104 (2017)
7. Shang, W., Sohn, K., Almeida, D., Lee, H.: Understanding and improving convolutional neural networks via concatenated rectified linear units. In: International Conference on Machine Learning, ACM, USA, pp. 2217–2225 (2016)
8. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: International Conference on Computer Vision and Pattern Recognition, IEEE, USA, pp. 1–9 (2015)
9. Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13(1), 281–305 (2012)
10. Saha, S. https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
11. Majumdar, S.: DenseNet Implementation in Keras. GitHub
12. Shang, F., Zhou, K., Liu, H., Cheng, J., Tsang, I.W., Zhang, L., Tao, D., Jiao, L.: VR-SGD: a simple stochastic variance reduction method for machine learning. IEEE Trans. Knowl. Data Eng. (2018)
13. Yazan, E., Talu, M.F.: Comparison of the stochastic gradient descent based optimization techniques. In: International Artificial Intelligence and Data Processing Symposium. IEEE, Turkey (2017)
14. Zhang, Z.: Improved Adam optimizer for deep neural networks. In: International Symposium on Quality of Service. IEEE, Canada (2018)
15. Gopakumar, G.P., Swetha, M., Sai Siva, G., Sai Subrahmanyam, G.R.K.: Convolutional neural network-based malaria diagnosis from focus stack of blood smear images acquired using custom-built slide scanner. J. Biophotonics 11(3) (2017)
16. Saha, S.: A comprehensive guide to convolutional neural networks—the ELI5 way. Towards Data Science
17. Prabhu, R.: Understanding of convolutional neural network (CNN)—deep learning
18. Bibin, D., Nair, M.S., Punitha, P.: Malaria parasite detection from peripheral blood smear images using deep belief networks. IEEE, pp. 9099–9108 (2017)
19. Das, D.K., Maiti, A.K., Chakraborty, C.: Automated system for characterization and classification of malaria-infected stages using light microscopic images of thin blood smears. J. Microsc. 257(3), 238–252 (2015)
20. Kumar, A., Sarkar, S., Pradhan, C.: Recommendation system for crop identification and pest control technique in agriculture. In: International Conference of Communication and Signal Processing. IEEE, India, pp. 185–189 (2019)

Avinash Kumar is a final-year student at KIIT DU, Bhubaneswar, India. His research interests include Image Processing, Deep Learning and Machine Learning, and he is currently working in several research domains. Sobhangi Sarkar is a final-year student at KIIT DU, Bhubaneswar, India. Her research interests include Deep Learning, Image Processing and Machine Learning, and she is currently working in several research domains.

Chittaranjan Pradhan works at the School of Computer Engineering, KIIT DU, Bhubaneswar, India. His research interests include Information Security, Image Processing, Data Analytics and Multimedia Systems. Dr. Pradhan has published more than 50 articles in national and international journals and conferences.

Deep Reinforcement Learning Based Personalized Health Recommendations

Jayraj Mulani, Sachin Heda, Kalpan Tumdi, Jitali Patel, Hitesh Chhinkaniwala and Jigna Patel

Abstract In this age of informatics, it has become paramount to provide personalized recommendations in order to mitigate the effects of information overload. The domain of biomedical and health care informatics is still largely untapped as far as personalized recommendations are concerned. Most existing recommender systems have not fully addressed the sparsity of data and the non-linearity of user-item relationships, among other issues. Deep reinforcement learning systems can revolutionize recommendation architectures because of their ability to use nonlinear transformations, representation learning and sequence modelling, and because of the flexibility with which these architectures can be implemented. In this chapter, we present a deep reinforcement learning based approach for complete health care recommendations, including medicines to take, doctors to consult, nutrition to acquire and activities to perform, the latter consisting of exercises and preferable sports. We exploit an “Actor-Critic” model to enhance the ability of the system to continuously update its information-seeking strategies based on the user’s real-time feedback. The health industry usually deals with long-term issues. Traditional recommender systems fail to consider long-term effects and hence fail to capture the dynamic sentiments of people. This approach treats the process of recommendation as a sequential decision process, which addresses the aforementioned issues. It is estimated that over 700 million people will possess wearable devices that will monitor every step they take. Data collected with these smart devices, combined with other sources such as Electronic Health Records, nutrition data and data collected from surveys, can be processed using Big Data analysis tools and fed to recommendation systems to generate desirable recommendations. These data, after being encoded into an appropriate format (the state), are fed to the Actor network, which learns a policy for prioritizing a particular recommendation (the action). The (action, state) pair is fed to the Critic network, which generates a reward associated with that pair. This reward is used to update the policy of the Actor network. The Critic network learns using a pre-defined Expected Reward. Hence, we find that using tools for Big Data analytics together with intelligent approaches like Deep Reinforcement Learning can significantly improve recommendation results for health care, aiding in the creation of seamlessly personalized systems.

Keywords Big data · Deep reinforcement learning · Recommendation systems · Biomedical and health informatics · Actor critic model · Electronic health records

J. Mulani · S. Heda · K. Tumdi · J. Patel (B) · J. Patel
Department of Computer Science and Engineering, Institute of Technology Nirma University, Ahmedabad, India
e-mail: [email protected]
J. Mulani e-mail: [email protected]
S. Heda e-mail: [email protected]
K. Tumdi e-mail: [email protected]
J. Patel e-mail: [email protected]
H. Chhinkaniwala
Adani Institute of Infrastructure Engineering, Ahmedabad, India
e-mail: [email protected]

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics, Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_12

1 Introduction

Recommender systems have begun to play a key role in industries including entertainment, retail, education, tourism and many more. However, one of the largest industries exploiting the potential of effective and accurate recommendation systems is the healthcare industry. The time has come for people to unleash the potential of state-of-the-art technologies, both those we have and those we can develop, to improve the most important aspect of their lives: health. The facts and figures presented in Sect. 2.2 are surprising. Studies by the World Health Organization conclude that there are over 12,400 diseases, disorders and health-related ailments that can potentially strike us on any given day. On one hand, around half the population of the United States is unaware of potential health threats such as obesity, diabetes and heart failure; on the other, quintillions of bytes of data related to health and fitness alone are being generated on a daily basis. Personalized health recommendation systems are meant to bridge this gap. In this chapter, we propose a deep reinforcement learning based framework for generating personalized health recommendations. Sections 2 and 3 provide basic yet important background on the topic at hand. Section 2 discusses various facts and figures and the role of big data and reinforcement learning in recommendation systems, elaborates on existing recommendation systems and motivates the need for reinforcement learning based recommendations. Section 3 discusses the problems that we are trying to address: lack of health awareness, data not being harnessed to its potential, security issues and the low doctor to patient ratio. Section 4 deals with some of the limitations of existing solutions to these problems, which include the lack of an all-round recommendation system, system biases and myopic recommendations.

Next, we use some of the features of Deep Reinforcement Learning, combined with standard machine learning and data mining algorithms and techniques and with the potential of Big Data, to help address the problems by overcoming the said limitations. Thereafter, a three-layer deep reinforcement learning based framework is discussed to support our claims. The first layer is the data integration and preprocessing layer, in which we integrate the data collected from various sources and process it, using big data [1] and data mining techniques, so that it can be fed to the second layer. The second layer is the disease probability prediction layer. It consists of 10 legacy machine-learning and deep-learning algorithms that predict the probability of 10 commonly occurring diseases for which some recommendations can be made. Finally, the third layer is the recommendation generation layer. It consists of an actor-critic model that helps us make sequential decisions and hence desirable recommendations. Towards the end, the method for processing the outputs of the actor-critic model and putting them to use is described. Lastly, some sample recommendations prescribed by a medical practitioner are provided for ready perusal.

2 Background

In this era of the internet and informatics, the amount of information being consumed and generated has grown exponentially. Before the advent of the myriad applications that generate and use information, managing information was not so difficult and it was fairly possible to deliver the right information to the right person. In the present scenario, however, it has become extremely difficult to deliver personalized information to a targeted audience. This is where powerful recommendations come into the picture. Previous work on recommendation systems primarily includes content-based and collaborative filtering techniques, deep learning models, factorization machines, regression models, hybrid mechanisms, etc.

2.1 Recommendation Systems

A content-based recommendation system works on the user’s profile, which contains information about the user as well as a gist of the types of items the user likes. Using this information, a model can be built to identify and recommend other similar items that the user is likely to prefer. However, this model fails to identify any other domain that the user might be interested in. To solve this issue, a new approach called collaborative filtering came into existence. The basic idea behind collaborative filtering is that “similar users share appreciations”. Collaborative filtering [2] exploits the fact that people with similar choices tend to prefer similar products. Moreover, similar items preferred by peers can be worthwhile to recommend to other people in the same neighborhood. Accordingly, collaborative filtering can be implemented in two ways: the first is called user-user collaborative filtering, and the other is called item-item collaborative filtering. Furthermore, a recommender system can be built by combining the content-based and collaborative filtering approaches; this is called a hybrid recommendation system [3]. A third approach, called property-based collaborative filtering, can be used to solve some of the persistent issues of data sparsity, overspecialization, cold start, etc. Health Aware REcommendation System (HARE) is an ontology-based model that uses levels of appeal as a basis for providing recommendations. All of the above-mentioned approaches are built from the perspective of customers, to help them choose effectively something that they may or may not be looking for. However, we could not find efficient models for providing recommendations in the discipline of health and bioinformatics. Moreover, these methods have some issues that need to be fixed. They fail to consider the long-term effects of the recommendations they make, whereas in the health domain especially, long-term results far outweigh short-term successes. What sets reinforcement learning based models apart is their ability, combined with the power of neural networks, to work in a dynamic environment and thereby overcome the shortcomings of traditional recommendation systems. Healthcare recommendations have a dire need for sequential decisions rather than spontaneous ones [4]. The proposed framework helps resolve such issues and provides a unique all-round perspective on effective health recommendations.
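To make the user-user variant of collaborative filtering concrete, the following is a minimal Python sketch. It is an illustration only, not the method used in this chapter: the users, items and ratings are hypothetical, and cosine similarity over co-rated items is just one common choice of similarity measure.

```python
import math

# Hypothetical ratings (user -> {item: rating}); all names and numbers are illustrative.
ratings = {
    "alice": {"yoga": 5, "cycling": 4, "swimming": 1},
    "bob": {"yoga": 4, "cycling": 5, "running": 4},
    "carol": {"swimming": 5, "running": 1},
}

def cosine(u, v):
    """Cosine similarity restricted to the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def recommend(user, ratings, k=1):
    """Rank items the user has not rated by the similarity-weighted ratings of other users."""
    scores, weights = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], theirs)
        for item, rating in theirs.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(
        scores,
        key=lambda i: scores[i] / weights[i] if weights[i] else 0.0,
        reverse=True,
    )
    return ranked[:k]

print(recommend("alice", ratings))
```

Item-item collaborative filtering follows the same pattern with the roles of users and items exchanged. Note that such a scheme scores each recommendation in isolation, which is exactly the short-term view criticized above.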

2.2 Facts and Figures

For machines to learn and perform well, what they need is data; not just any data, but relevant, complete, well-formatted and consistent data. According to the International Data Corporation, an estimated 2314 exabytes (1 exabyte = 1 billion gigabytes) of data relating to the healthcare industry alone will be produced annually by 2020, growing at a rate of 48% per annum. Given this, and given highly advanced algorithms for extracting valuable information from such data, coupled with compatible, sophisticated hardware, we have an opportunity to give something back to society. Having said this, the biggest question that arises concerns the sources of this data and their authenticity. We were not surprised to learn some of the following facts from the Stanford Medicine 2017 Health Trends Report, titled “Harnessing the Power of Data in Health”: 84% of patients are ready to share vital statistics such as blood pressure or basic lab test results, and 75% of people are willing to share information about the health of their internal organs. We have been hammered with buzzwords like IoT, Big Data, Machine Learning, Deep Learning and what not. Statistically speaking, wearable technology is going to be a $34 billion market, generating quintillions of bytes of research-usable data every day. The exponentially growing pace of research in the health domain motivates many researchers to make significant contributions. Apart from wearable devices, a substantial amount of data is available for public and research use on platforms like Kaggle, World Health Organization datasets, data.gov, etc. Big Data tools help intelligent systems gather, manage and process the data effectively.

2.3 Big Data

A few buzzwords have gained momentum in the past few decades, and one of them is Big Data. Before defining it technically, let us give some reasons for raising this topic. Forbes has reported that every minute approximately 4.15 million YouTube videos are watched, 456,000 tweets are sent on Twitter, 46,740 photos are posted on Instagram, and on Facebook 510,000 comments are posted and 293,000 statuses are updated. Forbes has also reported that, at our current pace, we are creating 2.5 quintillion bytes of data each day, and this pace is only accelerating. The Internet of Things (IoT) is one of the major technologies playing a vital role in this acceleration. Just imagine the volume of data produced by these activities. This rapid creation of data by social media, telecom, business applications and various other domains is leading to the formation of Big Data. “Big Data is all about the size and volume of data” is the biggest myth that people hold about Big Data. In reality, it is not limited to the huge volume of data being collected; it is a collection of large volumes of data coming from various sources in different formats. Data was generated previously as well, but it came in well-defined formats, which is why relational databases were capable of storing it. Because of the varied nature of today’s data, it is no longer possible to store it in traditional formats. Big Data comes in three formats: structured, unstructured and semi-structured.

2.3.1 Characteristics

The following figure explains the five V’s of Big Data [5] (Fig. 1):

1. Volume: Huge amount of data
2. Variety: Different formats of data from various resources, being integrated
3. Velocity: Pace of generation of data
4. Value: Extraction of useful data
5. Veracity: Inconsistencies and uncertainty in data

Fig. 1 Big Data Characteristics

2.3.2 Big Data Analytics

Apart from storing this huge amount of data, there is another vital problem associated with it: finding useful information (knowledge) in this data collection. This gives birth to Big Data Analytics. It is the complex process of examining big data to uncover hidden information, interesting patterns, market trends and customer preferences, which can help organizations shape their marketing strategies. It refines the raw, unstructured data retrieved from various sources into useful information. Various tools are available for this task, such as Hadoop, Spark, Hive and Pig. Present-day organizations realize that Big Data is ground-breaking; however, they are beginning to understand that it is far more valuable when paired with intelligent automation. With enormous computational power, Machine Learning (ML) and Reinforcement Learning (RL) frameworks help organizations manage, analyze and use their data far more effectively than ever before. ML and RL are also used to find hidden information and patterns in huge amounts of data quickly and accurately using complex algorithms. Their capabilities are impacting almost every field. They have a profound effect on healthcare by providing personalized treatment plans and improving diagnostics. Predictive analytics enables doctors and clinicians to focus on providing better service and patient care, creating a proactive framework for addressing patient needs before they fall ill.

2.4 Reinforcement Learning

Reinforcement Learning is an area of machine learning in which an agent learns how to behave in a given environment by taking various actions and observing the rewards or results obtained after taking those actions. It essentially maps situations, or states, to the corresponding actions to be taken. Concretely, a learner or agent interacts with the environment by taking actions and aims to maximize its expected reward by acting optimally. The reward is the result that the agent receives from the environment after taking a particular action. However, to maximize the total or expected reward, the agent cannot always act greedily and maximize the immediate reward; reinforcement learning algorithms try to maximize the rewards in the long run. A policy can be defined as the plan of action of an agent (Fig. 2).

Fig. 2 Reinforcement learning elements

2.4.1 Markov Decision Process

A Markov Decision Process (MDP) is the standard mathematical model for sequential decision problems in reinforcement learning. The environment in a reinforcement learning problem consists of a set of states $S$, a set of actions $A$, transition probabilities $p(s_{t+1} \mid s_t, a_t)$, a probability distribution over initial states $p(s_0)$, a reward function $r: S \times A \rightarrow \mathbb{R}$ (where $\mathbb{R}$ is the set of real numbers) and a discount factor $\gamma \in [0, 1]$. These components are used to formulate the MDP, which is defined as the tuple $(S, A, p, r, \gamma)$. A policy $\pi: S \rightarrow A$ maps each state to a corresponding action. Rewards are discounted by the factor $\gamma$, and the goal of the agent is to maximize the expected return, where the return is defined in Eq. 1.

$$G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \tag{1}$$
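As a small worked illustration of the discounted return in Eq. 1, the sketch below evaluates the sum for a finite list of future rewards. The reward values and the discount factor are made up for the example; an infinite horizon is truncated to the rewards that are given.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k], i.e. Eq. 1 truncated to a finite horizon."""
    g = 0.0
    # Work backwards so each step applies G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

future_rewards = [1.0, 0.0, 2.0]   # R_{t+1}, R_{t+2}, R_{t+3} (illustrative values)
print(discounted_return(future_rewards, gamma=0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
```

With $\gamma$ close to 0 the agent effectively looks only at the immediate reward, while $\gamma$ close to 1 gives nearly equal weight to rewards far in the future.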

2.4.2 Q Learning

Q learning uses the action-value function for a policy $\pi$, which denotes how good it is for an agent to take an action $a$ while in state $s$. Equation 2 gives the Q-value function.

$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[G_t \mid S_t = s, A_t = a\right] \tag{2}$$

The basic version of Q learning maintains a table of Q values, one for each state-action pair. The Bellman equation (Eq. 3) is used to learn the optimal Q-value function over multiple iterations. The optimal action-value function obtained from the Q table is denoted $Q^{*}(s, a)$.

$$Q^{*}(s, a) = \mathbb{E}\left[R_{t+1} + \gamma \max_{a'} Q^{*}(s', a')\right] \tag{3}$$

Here $(s', a')$ denotes a possible next state-action pair.
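The tabular procedure described above can be sketched as a single update step. This is a minimal illustration under stated assumptions: the learning rate `alpha`, the state names and the action names are hypothetical and do not come from the chapter; the update simply moves the stored Q value towards the Bellman target of Eq. 3.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Move Q[(s, a)] towards the target r + gamma * max_a' Q[(s_next, a')]."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

Q = defaultdict(float)            # Q table, unseen pairs default to 0
actions = ["rest", "exercise"]    # hypothetical action set

q_update(Q, "tired", "rest", 1.0, "fresh", actions)
q_update(Q, "fresh", "exercise", 2.0, "tired", actions)
print(Q[("tired", "rest")], Q[("fresh", "exercise")])
```

Repeating such updates over many observed transitions is what "performing multiple iterations" amounts to in the tabular setting.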

2.4.3 Deep Q Learning

Finding Q values for every state-action pair is not feasible when the actions and states are continuous and high-dimensional. Moreover, in recommender systems the number of states is very large, so learning a Q value for each state-action pair becomes very slow as the state space grows. Therefore, a parameterized value function $Q(s, a; \theta)$ is required to approximate the Q values. Here, $\theta$ denotes the parameter vector that defines the Q values. Various function approximators such as linear combinations of features, neural networks, nearest neighbors and Fourier/wavelet bases can be used. Deep Q Network (DQN) [6], an algorithm used in Deep Q Learning, uses a neural network as the value function approximator. The DQN outputs the Q value $Q(s, a)$ for each action $a$ that can be taken from the given state $s$. In DQN the dataset consists of tuples of the form $(s, a, r, s')$, where an action $a$ is taken in state $s$ and the immediate reward $r$ is observed on reaching the new state $s'$. Experience replay is performed by sampling random tuples from this stored memory once a sufficient number of iterations have been completed. DQN uses an ε-greedy policy to collect information about various states in the memory. The network updates its weights based on the loss function given below.

$$\text{loss} = \mathbb{E}\left[\left(Q(s, a; \theta) - \left(r(s, a) + Q(s', a'; \theta^{-})\right)\right)^{2}\right] \tag{4}$$

Here $\theta^{-}$ is a previously stored (frozen) parameter value and $\theta$ denotes the newly derived parameters. An improvement on DQN, called Dueling DQN, estimates the state-value function $V(s)$ and the advantage function $A(s, a)$ with shared network parameters [7].
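The role of the frozen parameters in the loss can be illustrated with a small sketch. Several things here are assumptions made for the example and are not taken from the chapter: a linear function stands in for the neural network, the target uses the discount factor and the maximum over next actions as in the commonly used DQN formulation, and the numbers in the replay batch are invented.

```python
def q_values(theta, state):
    """Linear stand-in for the network: one weight vector per action."""
    return [sum(w * x for w, x in zip(weights, state)) for weights in theta]

def dqn_loss(batch, theta, theta_frozen, gamma=0.9):
    """Mean squared difference between Q(s, a; theta) and the frozen-network target."""
    total = 0.0
    for s, a, r, s_next in batch:
        # The target is computed with the frozen parameters theta^-.
        target = r + gamma * max(q_values(theta_frozen, s_next))
        total += (q_values(theta, s)[a] - target) ** 2
    return total / len(batch)

theta = [[1.0, 0.0], [0.0, 1.0]]          # current parameters (two actions, two features)
theta_frozen = [[0.5, 0.5], [0.0, 1.0]]   # previously stored parameters
replay_batch = [([1.0, 0.0], 0, 1.0, [0.0, 1.0])]  # one (s, a, r, s') tuple

print(dqn_loss(replay_batch, theta, theta_frozen))
```

Keeping the target network fixed for a number of updates means the quantity being regressed on does not shift with every gradient step, which is the usual motivation for storing $\theta^{-}$ separately.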

2.4.4 Policy Gradient

The DQN method learns the state-action value function through a neural network and then selects actions accordingly. The policy gradient method instead learns the policy directly as a parameterized function $\pi_\theta(a \mid s)$ [8]. The value of the reward function depends on this policy, and various algorithms can be applied to maximize the reward. The reward function can be defined as follows:

$$J(\theta) = \sum_{s \in S} d^{\pi}(s) \sum_{a \in A} \pi_\theta(a \mid s)\, Q^{\pi}(s, a) \tag{5}$$

Here $d^{\pi}$ is the stationary distribution of the Markov chain under $\pi_\theta$. The equation shows that the reward function depends on action selection as well as on the stationary distribution of states. The policy gradient theorem uses likelihood ratios to compute the policy gradient as follows.

$$\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\left[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi}(s, a)\right] \tag{6}$$
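The likelihood-ratio form of the gradient can be checked on a toy problem. The sketch below assumes a single state and a softmax policy over action preferences, with known Q values; these choices, the step size and the numbers are illustrative assumptions only. For a softmax policy, the gradient of $\log \pi_\theta(a)$ with respect to preference $j$ is $\mathbb{1}[j = a] - \pi_\theta(j)$.

```python
import math

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient(prefs, q):
    """Exact E_pi[grad log pi(a) * Q(a)] for a one-state softmax policy."""
    pi = softmax(prefs)
    grad = [0.0] * len(prefs)
    for a, p_a in enumerate(pi):
        for j in range(len(prefs)):
            grad_log = (1.0 if j == a else 0.0) - pi[j]
            grad[j] += p_a * grad_log * q[a]
    return grad

q = [1.0, 3.0]         # hypothetical Q values of two actions
theta = [0.0, 0.0]     # policy parameters (action preferences)
for _ in range(200):   # plain gradient ascent on J(theta)
    g = policy_gradient(theta, q)
    theta = [t + 0.5 * gi for t, gi in zip(theta, g)]

print(softmax(theta))  # probability mass shifts towards the higher-valued action
```

In practice the expectation is not computed exactly but estimated from sampled actions, and $Q^{\pi}$ itself has to be estimated, which is where the critic of the next subsection comes in.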

2.4.5 Actor-Critic Model

Policy-based methods and value-based methods (such as Deep Q Learning) both have certain drawbacks. The problem with policy-based methods is that it is very hard to find a good score function that evaluates the policy generated by the algorithm. For value-based methods, the policy is only implicit in the value function approximation, so it is hard to evaluate the behavior of the model. The Actor-Critic model is a hybrid method that incorporates the features of both policy-based and value-based methods. It uses two neural networks: an actor network that controls the behavior of the agent (policy based) and a critic network that evaluates the actions taken by the actor (value based). Figure 3 shows the architecture of the Actor-Critic model. The actor interacts with the environment and updates the parameters θ of the actor network, which estimates the policy. The critic evaluates the actions of the actor and updates the parameters of the value function approximation based on the reward obtained.

Fig. 3 Actor-critic model
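The interplay between the two components can be sketched on a deliberately tiny problem. The following is not the architecture of Fig. 3: instead of neural networks it uses a table of action preferences as the actor and a single value estimate as the critic, and it assumes one state, one-step episodes, fixed rewards per action and hypothetical learning rates.

```python
import math
import random

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def train_actor_critic(rewards, steps=2000, actor_lr=0.1, critic_lr=0.1, seed=0):
    """Actor: softmax over preferences. Critic: running estimate of the state's value."""
    rng = random.Random(seed)
    prefs = [0.0] * len(rewards)   # actor parameters
    value = 0.0                    # critic parameter
    for _ in range(steps):
        pi = softmax(prefs)
        action = rng.choices(range(len(rewards)), weights=pi)[0]
        td_error = rewards[action] - value     # episode ends after one step
        value += critic_lr * td_error          # critic update from the observed reward
        for j in range(len(prefs)):            # actor update, scaled by the critic's signal
            grad_log = (1.0 if j == action else 0.0) - pi[j]
            prefs[j] += actor_lr * td_error * grad_log
    return softmax(prefs), value

policy, value = train_actor_critic([0.0, 1.0])  # action 1 is the better recommendation
print(policy, value)
```

The critic's error term tells the actor whether the chosen action turned out better or worse than expected, so the actor does not need a hand-designed score function for its policy.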

3 Problems

The problems that we are trying to address are four-fold:

3.1 Data Utilization

First, despite having so much information about people’s previous health records, and knowledge of how these can affect a person’s present health given the environmental conditions and the medication he or she is taking, we are not able to use it to its full potential. Apart from this, the data that we have may be highly time-critical: data that is useful now may become obsolete at any point in time. Hence, it is important to make the right use of the data and generate useful insights from it.

3.2 Health Awareness

The second, and the most important, perspective is that even with the advancements in technology, most people are not fully aware that they are even suffering from a disease. Apart from this, primarily because of medical jargon, even people who do get tested do not keep track of the results afterwards. Caught up in busy schedules, many people forget about the health threats hanging right in front of them. With our approach, we provide an end-to-end solution to collect the data, interpret it, and make people more aware of their own health and health issues.

3.3 Doctor to Patient Ratio

A third perspective is that, even with high-end, state-of-the-art medical treatment techniques and technology, doctors are unable to attend to so many patients. We have a doctor to patient ratio of less than 1:1000, making it quite difficult for doctors to handle such a huge volume of patients in time. If we can develop smart machines that may substitute for a doctor in the case of lower-risk diseases, people will gain access to data-driven intelligent decision systems that recommend methods to mitigate diseases, or in some cases even prevent them by predicting an illness that may strike, based on the available data and the history of similar patients.

3.4 Information Security

Finally, the fourth perspective concerns security. Data collected from various healthcare-related sources can be used to provide people with better solutions to their health-related problems. However, the security of the data should be of prime importance. One must ensure that health-related data is used for the benefit and betterment of society by providing health-related suggestions; it should not be misused for the financial benefit of a company. The framework that we have proposed ensures the security of the data. The health-related datasets that we have collected are used only for giving recommendations to improve people’s health. We have tried to avoid including recommendations that involve the financial benefit of companies in the health sector, doctors or hospitals. The sole purpose of the framework is to use health-related data and the knowledge of various intelligent Machine Learning and AI algorithms for the betterment of society. It is true that every individual is different, and that no two people can have the same medication even if they suffer from the same disease. However, steps other than medication, like a good diet or a better exercise routine, can also help conquer the disease. Our objective here is to provide better recommendations on those aspects that can be generalized and are beneficial to everyone irrespective of metabolic differences.

4 The Limitations of Existing Solutions

There are many solutions to the aforementioned problems. However, none of the problems has been solved completely. We found the following to be some of the many limitations of these solutions:

4.1 Lack of an All-round Solution

Many existing solutions for health-related recommendations lack all-round solutions and suggestions. Various online chatbot systems (WebMD) ask the user questions about his or her health and suggest medicine to be taken. However, these systems take only the present scenario into consideration. The diseases that a person may face in the future cannot be identified by currently existing solutions. Moreover, the medical history and family details of a particular person are not considered for medicine recommendation and disease identification. Some diseases are inherited from family members. Hence, if family history is not taken into consideration, the disease prediction can be wrong. Similarly, various online systems suggest the food to be eaten and the diet to be followed after obtaining a person’s age, sex, weight, height and other required details. However, these systems lack the ability to identify potential diseases.

4.2 System Bias

Existing recommender systems deal with items and the feedback provided by users for those items. When building a model, these systems take into consideration only the feedback of users on items that the system has already recommended. This problem is called System Bias. In our case, the system has to recommend content and detailed information based on the given situation, and it need not take only already recommended content into account.

4.3 Myopic Recommendation

Recommendation systems are trained to optimize the immediate response. Hence, they tend to recommend content or items that are catchy in nature or that users are highly familiar with. These systems avoid exploring new things that could give higher long-term benefits. However, as aforementioned, reinforcement learning aims to maximize rewards in the long run, and it also incorporates a facility to control exploration and exploitation. Hence, reinforcement learning based algorithms can be used to solve the problems of existing recommender systems.

5 Features of RL that Can Help Solve the Problems

5.1 Discounted Future Rewards

Reinforcement Learning based algorithms use discounted future rewards. The rewards that an agent will receive after a few actions are also considered while deciding the action for the current state. Hence, long term benefits can be achieved with the help of RL based methods [9].

5.2 Exploration-Exploitation Control

Reinforcement Learning based algorithms provide the facility to control exploration (taking a random action) and exploitation (taking the greedy action). The ε-greedy policy allows us to control the exploration of an agent. Moreover, an exploration decay parameter is used when building Deep Reinforcement Learning models in order to reduce the exploration of the agent after a certain number of iterations or actions.
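A minimal sketch of ε-greedy action selection with a decaying exploration rate is given below. The exponential decay schedule and its constants (`start`, `end`, `decay`) are illustrative assumptions; the chapter does not specify a particular schedule.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon explore (random action), otherwise exploit (greedy action)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)

def decayed_epsilon(step, start=1.0, end=0.05, decay=0.99):
    """Exponentially decay the exploration rate, never dropping below `end`."""
    return max(end, start * decay ** step)

rng = random.Random(0)
q_row = [0.1, 0.9, 0.3]   # hypothetical Q values of three actions in one state
for step in (0, 100, 1000):
    eps = decayed_epsilon(step)
    print(step, round(eps, 3), epsilon_greedy(q_row, eps, rng))
```

Early in training the agent acts almost entirely at random, and as ε decays it increasingly follows its learned value estimates while retaining a small amount of exploration.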

5.3 Ability to Learn in Dynamic Environments

Reinforcement Learning and Deep Reinforcement Learning algorithms are being used to train robots to work in dynamic, real-time environments. These algorithms are also being used to train agents to play various games. The agent gathers information about the environment by taking various actions and then behaves optimally once sufficient training has been done [DeepMind]. Hence, RL based recommendation systems can also work efficiently in a dynamic environment.

6 The Proposed Framework

In the preceding sections, we saw various diseases, along with some shocking statistics. It is very clear that the problem persists. Here is how we can contribute to a possible solution: a three-layer framework named “Deep Reinforcement Learning based Personalized Health Recommendation Framework”. The first layer is the data preprocessing layer, the second is the disease identification and prediction layer, and the third is the recommendation generation layer. An overview is shown in Fig. 4. We discussed earlier that millions of gigabytes of data are being generated every second. Websites, smartphones, wearable devices, hospital reports, etc. are the key contributors. The accumulation of such a huge amount of data, which is varied, versatile, volumetric, velocious and veracious in nature, is nowadays referred to as Big Data. As shown in Fig. 4, data collected from various sources first have to be integrated. The process of integration is cumbersome because of the irregular and inconsistent structure and format of data gathered from a variety of sources. However, it is a necessary step. In the proposed framework, integration is done with patients as the subjects. Each patient can be assigned a patient ID, unique worldwide, and all the data concerning that patient can be stored in a semi-structured format, giving us the flexibility to accommodate structured, semi-structured and unstructured data, which may be obtained from reports, wearable devices, health records, hospital patient records, etc. Integration alone is not sufficient, however. Quality data mining techniques have to be employed to preprocess the data before actually using it. Preprocessing and the usage of the data are described in detail below.

Fig. 4 Framework overview

6.1 The Data Preprocessing Layer

As discussed earlier, we can collect huge volumes of data from a myriad of sources. All these data, however, are raw and cannot be used directly. The data consist of many heterogeneous parameters. Some of the common issues with the raw data are:

1. Missing Data: It is not possible to obtain all the details about all people, especially patients, so we have to deal with missing data. There are several alternatives for doing so. Some of them are:

Replace with a constant: If we think about the reason behind the missing data, there is a high probability that the person does not suffer from the disease in question, and hence no test results for that attribute are available. The opposite case is also possible: the person may be unaware of the test, or unaware that he or she may suffer from such a disease in the foreseeable future. To account for both scenarios, we duplicate the record. In the first copy, we replace the missing value with the value of that attribute for a normal person and use this record to predict the disease. In the second copy, we ignore that parameter or, where possible, use an alternative, possibly less correlated, parameter for the prediction. Finally, the two predictions can either be compared

Deep Reinforcement Learning Based Personalized …


and the maximum chosen, or the mean of both predictions can be taken as the final result.

Interpolation: Another possible cause of missing data is that the person did not undergo a particular test in a particular year, while data for the preceding and succeeding periods are available. Different interpolation techniques can be employed to fill in the missing information.

2. Data formats: The data that we plan to collect come from different sources, are collected and maintained by different organizations and hospitals, concern different diseases, and are stored under different models (unstructured, structured or semi-structured). The best way to deal with such data is to convert them to a format that accommodates not just the presently available data but also the data generated and collected in years to come. XML and JSON are well suited for this purpose. Many document-based databases help in converting data from different formats to these formats.

3. Normalization: Deep learning and machine learning algorithms generally require data in normalized form.

4. Data Integration: The data that we collect and preprocess have to be integrated into a format that is compliant with the input formats of machine learning and deep learning algorithms. Hence, integration of the data is also an important step before using the data.
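The preprocessing steps listed above can be sketched in a few lines. The following is a minimal illustration under stated assumptions: the record layout, the attribute name `fasting_glucose` and the "normal" value of 90 are made up for the example, linear interpolation stands in for "different types of interpolation techniques", and min-max scaling is one common choice of normalization that the chapter does not itself name.

```python
def duplicate_for_missing(record, attribute, normal_value):
    """Return two copies of a record whose `attribute` is missing.

    Copy 1 fills the gap with the value of a normal (healthy) person;
    copy 2 drops the attribute so a model can ignore it.
    """
    filled = dict(record, **{attribute: normal_value})
    dropped = {k: v for k, v in record.items() if k != attribute}
    return filled, dropped


def combine_predictions(p_filled, p_dropped, how="max"):
    """Combine the two predicted probabilities by maximum or mean."""
    if how == "max":
        return max(p_filled, p_dropped)
    if how == "mean":
        return (p_filled + p_dropped) / 2.0
    raise ValueError("how must be 'max' or 'mean'")


def interpolate_linear(series):
    """Fill interior None gaps by linear interpolation between neighbours.

    Leading or trailing gaps are left as None because no neighbour
    exists on one side.
    """
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def min_max_normalize(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical patient record with one missing test result.
record = {"age": 45, "bmi": 27.5, "fasting_glucose": None}
filled, dropped = duplicate_for_missing(record, "fasting_glucose", 90)

# Yearly readings with two missing years in the middle.
yearly_glucose = interpolate_linear([95, None, None, 110])
scaled = min_max_normalize(yearly_glucose)
```

In practice the two record copies would be passed to the disease prediction layer separately, and `combine_predictions` would merge the two resulting probabilities.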

6.2 The Disease Prediction Layer

After preprocessing, the processed and integrated data are fed to the disease prediction layer. In this layer, we employ the most accurate existing machine learning algorithms to predict the chances of occurrence of some of the common diseases that we target for recommendation generation.

6.2.1 Diseases

1. Obesity: Obesity has increased at an alarming rate over the last few decades. A survey conducted by the Centers for Disease Control and Prevention (CDC) reveals that around 39.8% of the population in the US is obese. Obesity can lead to heart attack, Type-2 diabetes and certain types of cancer [10]. The CDC has initiated many campaigns to raise awareness. Research shows that if obesity can be detected before the age of 5, necessary steps can be taken to prevent it. The Support Vector Machine (SVM) performs best in determining whether a person is suffering or will suffer from obesity [11]. It was tested on the National Human Genome Research Institute Catalog, which is a manually curated and publicly available database.

2. Heart Disease: Next, we take up heart disease prediction. Recent reports reveal that heart attacks have become the major cause of death due to medical illness. Many researchers have contributed optimized solutions for predicting whether a person is suffering from heart disease. Once again, the Support Vector Machine algorithm [12] is preferred over others. A typical heart disease dataset contains 13 features and 1 target variable. Wearable devices are among the other data contributors. An accuracy of up to 87% can be achieved.

3. Diabetes: Diabetes is one of the most frequently occurring chronic diseases. As of 2015, 30.3 million people in the USA, or 9.4% of the population, had diabetes. More shocking is that 1 in 4 of them did not know that they had it. Advances in Machine Learning can help address this problem. A Naive Bayes classifier is recommended for predicting the presence of diabetes; it outperformed other classifiers with an accuracy of 76.3% [13].

4. Rheumatoid Arthritis: Rheumatism is a painful condition of the musculoskeletal system that lowers the quality of life of patients. It is important to predict which patients will develop rheumatic diseases. Common symptoms include fatigue, vague pain in muscles and joints, anorexia, etc. These are difficult to diagnose unless the patient is aware of them. Other identifiable symptoms are morning stiffness, inflammation in the hands and wrists, etc. Features that can be used for early prediction include rheumatoid factor, anti-CCP, BUN, total cholesterol, LDL, HDL, TG, glucose, ESR and CRP. The best proposed algorithm for predicting this disease is the K-Means clustering algorithm, which gives an accuracy of 84% with k = 4.

5. Liver Disease: Any disturbance in the functioning of the liver that can lead to illness is termed liver disease, also known as hepatic disease. According to a World Health Organization (WHO) report, around 3% of the world’s population is infected with hepatitis C. Out of every 6 infected people, 5 are unaware of their disease. It is important for people to know about it, as the liver coordinates some critical activities within the body. We try to predict the disease and help the person know about it. We use the C4.5 decision tree algorithm to predict the disease [14], tested on the UCI Liver dataset.

6. Asthma: Asthma is a chronic disease in which the bronchial tubes inside the airways of the lungs become swollen or inflamed, making them more susceptible to an allergic reaction. Moreover, the swelling makes the movement of air to and from the lungs difficult, causing trouble with breathing. Annual U.S. expenditures for asthma are $56 billion. Around 8.3% of people suffer from some form of asthma. These numbers clearly justify the need for an effective method of identifying severe exacerbations before the patient actually experiences them, so that an intervention can be delivered. The authors of [15] built a prediction system that addresses this issue by applying an Adaptive Bayesian Network algorithm to data from a Daily Asthma Diary, achieving a sensitivity of 73.8% and a specificity of 71.4%.

7. Dementia: Dementia is a neurodegenerative condition of the brain that results in the death of nerve cells. The damage interferes with the ability of the cells to communicate with each other. Dementia is not a specific disease; rather, the term describes a group of symptoms associated with a decline in memory or other skills that hinders a person’s ability to perform daily tasks. Alzheimer’s disease accounts for 60–80% of cases, followed by vascular dementia. A Naive Bayes classifier [16] is advised for predicting the disease from the available data.

8. Thyroid: In India, around 42 million people suffer from thyroid disorders, mainly hypothyroidism. One in every 10 adults suffers from it. Most of the patients are women. Three out of every 10 women suffering from this disease are unaware of it. It is often confused with obesity. With the help of an Artificial Neural Network (ANN), we try to determine whether a person is suffering from it. We use the Thyroid Disease Dataset from the UCI Machine Learning Repository for our framework.

9. Urinary Tract Infection: Around 150 million people are reported to be diagnosed with Urinary Tract Infection (UTI) per year. It is common among women due to their urethral anatomy. It can pose a serious danger to life, so people suffering from it should be examined frequently. UTI is hard to diagnose, as most of its symptoms are similar to those caused by inflammation, etc. Using a Back Propagation Neural Network, we try to predict this disease with its complex symptoms [17]. The algorithm works on the following parameters:

Anamnesis: Gender, Age, Fever, Sudation, Chill, Low back pain, Suprapubic pain, Malaise, Dysuria, Pollakiuria
Full urine analysis: Urine culture, Glucose, Leucocyte, Erythrocyte, Protein
Ultrasound: Renal ultrasound, Bladder ultrasound

10. Infectious Diseases: Infectious diseases tend to be visible enough. Still, some of them require care to improve health conditions. Being infectious, they are liable to spread quickly and need to be addressed in time. Due to their vast variety, LSTM (Long Short-Term Memory) networks and DNNs (Deep Neural Networks) are the preferred algorithms, as in Chae et al. [18]. These diseases may be the easiest to notice, but getting to the root cause is really difficult. Some basic care can be taken to mitigate their harmful effects on a person as well as the people around him or her.

State Representation Module

The actor network of the Actor-Critic model takes a state, or the features of a state, as input and gives recommendations based on it [19]. The output of the Disease Prediction Layer contains the probability of occurrence of each of the mentioned diseases. These probabilities, along with the patient’s general information, are given as input to the State Representation Module. As the name suggests, the module represents this information in the format required by the actor model. Basically, it represents the values as a finite number of states that are used to predict the actions in the actor model.
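One way to picture the State Representation Module is as a function that flattens the prediction-layer output and the patient's general information into a fixed-length vector. The sketch below is an assumption-laden illustration: the disease keys, the choice of age and sex as "general information", and the scaling constants are all hypothetical. It produces a continuous vector; binning each probability into the categories of Table 1 would be one way to obtain the finite set of states mentioned above.

```python
# Fixed ordering of the ten diseases targeted by the prediction layer
# (key names are hypothetical).
DISEASES = [
    "obesity", "heart_disease", "diabetes", "rheumatoid_arthritis",
    "liver_disease", "asthma", "dementia", "thyroid",
    "urinary_tract_infection", "infectious_disease",
]


def build_state(disease_probs, age, sex, max_age=100):
    """Flatten prediction-layer outputs and patient info into one vector.

    disease_probs: dict mapping disease name -> probability in [0, 1];
                   diseases without a prediction default to 0.0.
    age, sex:      hypothetical examples of "general information".
    """
    for name, p in disease_probs.items():
        if name not in DISEASES:
            raise KeyError(f"unknown disease: {name}")
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range for {name}: {p}")
    probs = [float(disease_probs.get(name, 0.0)) for name in DISEASES]
    age_scaled = min(max(age / max_age, 0.0), 1.0)  # clip to [0, 1]
    sex_flag = 1.0 if sex == "female" else 0.0
    return probs + [age_scaled, sex_flag]


state = build_state({"diabetes": 0.62, "obesity": 0.40}, age=67, sex="male")
```

The fixed ordering in `DISEASES` matters: the actor network sees only positions in the vector, so every patient's state must place each disease probability at the same index.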

6.3 The Recommendation Generation Layer

The recommendation generation layer consists of an Actor-Critic model, whose architecture is shown in Fig. 6. The module can be divided into the following two parts.

6.3.1 Actor Network

The actor network, also called the policy network, is shown in the left part of Fig. 6. It generates the action a based on the given state s and tries to learn the policy by adjusting the parameters of the neural network. Here, the actor tries to approximate the policy that gives health-related recommendations. The details of the generated recommendations are specified in Sect. 6.3.3. The actor network receives a state as input from the State Representation Module and, based on this input, predicts the corresponding recommendations. The parameters of the actor network are updated using the Q values of each state-action pair (s, a) produced by the critic network. The Policy Gradient algorithm is used for updating the parameters of the actor network (Fig. 6).
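For reference, a standard form of the policy-gradient update, following the policy gradient theorem of Sutton et al. [8], can be written as below. The symbols $\theta$ (actor parameters), $w$ (critic parameters), $\alpha$ (learning rate) and $\pi_\theta$ (the actor's policy) are notation introduced here, and a stochastic policy is assumed; the chapter does not state its exact update rule.

```latex
\nabla_\theta J(\theta) \approx
  \mathbb{E}_{s,\; a \sim \pi_\theta}\!\left[
    \nabla_\theta \log \pi_\theta(a \mid s)\, Q_w(s, a)
  \right],
\qquad
\theta \leftarrow \theta + \alpha\, \nabla_\theta \log \pi_\theta(a \mid s)\, Q_w(s, a)
```

The critic's estimate $Q_w(s, a)$ plays the role of the "Q values of each state-action pair" mentioned above: actions that the critic scores highly have their probability increased, and poorly scored actions have it decreased.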


Fig. 5 Disease prediction layer

Fig. 6 Recommendation Generation Layer

6.3.2 Critic Network

The critic network, also referred to here as the target network, is shown in the right part of Fig. 6. It tries to approximate the value function of the system based on the rewards r(s, a) obtained from the environment after the actor takes an action a in the current state s. The reward function mainly depends on the environment in which the system is implemented. The environment can give a positive reward for pertinent recommendations and a negative reward for irrelevant ones.


The parameters of the critic network are updated using Temporal Difference (TD) learning: the TD error is calculated from the reward obtained and the Q values predicted. The output Q(s, a) generated by the critic network is also used for evaluating the actions of the actor and for updating the parameters of the actor network, as mentioned earlier.
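The TD-based critic update can be illustrated with a deliberately simplified sketch. It assumes a one-step TD error of the form r + γ·Q(s', a') − Q(s, a) and replaces the neural critic with a lookup table so that the arithmetic is visible; the discount factor `gamma`, the step size `alpha`, and the state and action names are illustrative values, not taken from the chapter.

```python
def td_error(reward, q_current, q_next, gamma=0.9, terminal=False):
    """One-step TD error: r + gamma * Q(s', a') - Q(s, a)."""
    target = reward if terminal else reward + gamma * q_next
    return target - q_current


def critic_update(q_table, state, action, reward, next_state, next_action,
                  alpha=0.1, gamma=0.9, terminal=False):
    """Move Q(s, a) a step of size alpha along the TD error.

    q_table is a dict keyed by (state, action); unseen pairs start at 0.0.
    The TD error is returned so the caller can inspect it.
    """
    q_sa = q_table.get((state, action), 0.0)
    q_next = q_table.get((next_state, next_action), 0.0)
    delta = td_error(reward, q_sa, q_next, gamma, terminal)
    q_table[(state, action)] = q_sa + alpha * delta
    return delta


q = {}
# A pertinent recommendation earns a positive reward from the environment.
delta_good = critic_update(q, "high_risk", "walking", reward=1.0,
                           next_state="high_risk", next_action="yoga")
# An irrelevant recommendation earns a negative reward.
delta_bad = critic_update(q, "high_risk", "running", reward=-1.0,
                          next_state=None, next_action=None, terminal=True)
```

With a neural critic the same TD error would drive a gradient step on the network weights instead of a table write, but the target and the sign of the correction are the same.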

6.3.3 Interpreting the Outputs

The activation function, tanh(x), produces outputs ranging from −1 to 1. The output is an array whose size equals the number of output neurons. Each output neuron represents a health-related recommendation, for example walking, jogging, playing a particular sport, a recommended diet, or a specific set of physical activities to be carried out. The number of output neurons depends on the scale of the application and the number of diseases targeted. The question arises of how this array of numbers helps in generating actual recommendations. The following steps are suggested:

Step 1: Categorize. The probabilities and age groups may be categorized as shown in Table 1. These categories can be altered and adjusted as required. They are meant to help the end users (the ones for whom the recommendations are generated). We then generate the recommendations based on these categories.

Step 2: Generate. After categorizing, recommendations can be generated by combining the outputs of the Disease Prediction Layer, the age groups and the probabilities of occurrence, as shown in Table 2.

The next question is how these recommendations are communicated to the target user. If we have a dedicated medical portal, we can show pop-ups when the user is logged in. In the absence of such a facility, we can use push notification services, which may include media like e-mail, SMS, etc.
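The two steps above can be sketched as follows, using the category boundaries of Table 1 and the sample actor outputs of Table 2. This is an illustrative reading rather than the authors' implementation: Table 1 lists both 51–75 and 75–100, so assigning exactly 75 to "High" is an assumption, and the `threshold` used to drop low-scoring activities is a hypothetical parameter.

```python
def probability_category(percent):
    """Map a probability in percent to the Table 1 category.

    Table 1 lists both 51-75 and 75-100; this sketch assigns exactly 75
    to "High" (an assumption).
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be in [0, 100]")
    if percent <= 25:
        return "Very low"
    if percent <= 50:
        return "Low"
    if percent <= 75:
        return "High"
    return "Very high"


# (lowest age, highest age, label), as listed in Table 1.
AGE_GROUPS = [(10, 20, "Adolescent"), (21, 30, "Young"), (31, 40, "Adult"),
              (41, 50, "Middle aged"), (51, 60, "Old"), (61, 80, "Veteran")]


def age_category(age):
    """Map an integer age to the Table 1 age group, or None if uncovered."""
    for low, high, label in AGE_GROUPS:
        if low <= age <= high:
            return label
    return None


def rank_recommendations(actor_output, threshold=0.0):
    """Sort activities by actor score, keeping those above threshold.

    actor_output: dict mapping activity -> tanh output in [-1, 1].
    threshold:    hypothetical cut-off; scores at or below it are
                  treated as "not recommended" here.
    """
    kept = [(name, score) for name, score in actor_output.items()
            if score > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Second example of Table 2: high probability of diabetes, veteran age group.
veteran_output = {"Walking": 0.8, "Jogging/Running": -0.4, "Yoga": 0.5}
ranked = rank_recommendations(veteran_output)
context = (probability_category(62), age_category(67))
```

Here `ranked` lists walking ahead of yoga and drops running, which matches the textual recommendation given for this row in Table 2.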

7 Future Improvements

The proposed system leaves considerable scope for improvement. Some of the improvements that we aim to make are described below.

7.1 Actor-Critic Recommendation System

The approach presented here uses the Actor-Critic model for generating health-related recommendations. However, with advancements in Deep Reinforcement Learning

Table 1 Categories of probabilities and age

Categories of probability of occurrence of disease

Probability of occurrence of disease (%)    Category
0–25                                        Very low
26–50                                       Low
51–75                                       High
75–100                                      Very high

Categories of age group

Age group    Category
10–20        Adolescent
21–30        Young
31–40        Adult
41–50        Middle aged
51–60        Old
61–80        Veteran

Table 2 Generated recommendations

Disease: diabetes

Probability of occurrence    Age group    Output of actor network
                                          Context            Value
Low                          Young        Jogging            0.5
                                          Walking            0.3
                                          Swimming           0.7
                                          Aerobics           0.8
Recommendation: Jogging is more recommended than brisk walking for young people, along with exercises that include swimming, aerobics, etc.

High                         Veteran      Walking            0.8
                                          Jogging/Running    −0.4
                                          Yoga               0.5
Related diseases: Obesity
Recommendation: For elderly people with a high probability of occurrence of diabetes, walking is better recommended than running. Moreover, instead of vigorous exercises, yoga is recommended, considering the age range


based algorithms, various improved and more efficient algorithms can be used for generating recommendations by interacting with the environment. For example, Hindsight Experience Replay (HER) uses the mistakes committed by the model to learn a better policy [20]. Hence, the current model can be combined with HER to achieve higher accuracy by learning from the negative rewards obtained for wrong recommendations.

7.2 Recommendations

The proposed system mainly gives general recommendations for physical activities and food. However, various other health-related recommendations can be incorporated in order to give efficient and personalized recommendations to each patient.

7.3 Data Preprocessing

The system aims to collect readily available data from hospitals, wearable devices and laboratories. However, the availability of more specific data, such as the family health history of a patient, the type of physical activities that the patient performs daily, and other details about the patient’s daily routine, can significantly improve the disease prediction accuracy of the model. More specific and accurate recommendations can then be generated based on the disease and the daily routine of the patient. Information retrieval systems can be employed to collect data more efficiently.

7.4 Disease Prediction

A generalized approach can be developed for predicting various diseases from the details provided about the patient. This can help us achieve the goal of “Personalized Health Recommendations” in an efficient and robust way.

8 Conclusion

The road to health is paved with good recommendations. Healthcare is one of the fastest growing fields, alongside others such as space exploration and software development. With the advancement in technology, it is now possible to build and achieve what could not even be imagined a few years


back. The proposed approach is a step towards building such systems by exploiting the technology we have. In this chapter, we have tried to use the concepts of Recommendation Systems, Reinforcement Learning, Machine Learning and Big Data for this purpose. The increasing health consciousness among people, along with the gigantic growth of data and improvements in technology, makes this framework promising for the future. Because of its personalized solutions, it can even be a propitious business model.

References

1. Elgendy, N., Elragal, A.: Big data analytics: a literature review paper. In: Industrial Conference on Data Mining, pp. 214–227. Springer, Cham (2014, July)
2. Pan, C., Li, W.: Research paper recommendation with topic analysis. In: 2010 International Conference On Computer Design and Applications, vol. 4, pp. V4–264. IEEE (2010, June)
3. Han, Q., Ji, M., de Troya, I.M.D.R., Gaur, M., Zejnilovic, L.: A hybrid recommender system for patient-doctor matchmaking in primary care. In: 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), pp. 481–490. IEEE (2018, Oct)
4. Wiesner, M., Pfeifer, D.: Health recommender systems: concepts, requirements, technical basics and challenges. Int. J. Environ. Res. Public Health 11(3), 2580–2607 (2014). https://doi.org/10.3390/ijerph110302580
5. Patgiri, R., Ahmed, A.: Big data: the v’s of the game changer paradigm. In: 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 17–24. IEEE (2016, Dec)
6. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M.: Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)
7. Wang, Z., Schaul, T., Hessel, M., Van Hasselt, H., Lanctot, M., De Freitas, N.: Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581 (2015)
8. Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Advances in Neural Information Processing Systems, pp. 1057–1063 (2000)
9. Zhao, X., Zhang, L., Ding, Z., Yin, D., Zhao, Y., Tang, J.: Deep reinforcement learning for list-wise recommendations. CoRR, vol. abs/1801.00209 (2018)
10. Mokdad, A.H., Ford, E.S., Bowman, B.A., Dietz, W.H., Vinicor, F., Bales, V.S., Marks, J.S.: Prevalence of obesity, diabetes, and obesity-related health risk factors, 2001. JAMA 289(1), 76–79 (2003)
11. Montañez, C.A.C., Fergus, P., Hussain, A., Al-Jumeily, D., Abdulaimma, B., Hind, J., Radi, N.: Machine learning approaches for the prediction of obesity using publicly available genetic profiles. In: 2017 International Joint Conference on Neural Networks (IJCNN), pp. 2743–2750. IEEE (2017, May)
12. Sharmila, R., Chellammal, S.: A conceptual method to enhance the prediction of heart diseases using the data techniques. Int. J. Comput. Sci. Eng. (2018, May)
13. Sisodia, D., Sisodia, D.S.: Prediction of diabetes using classification algorithms. Proc. Comput. Sci. 132, 1578–1585 (2018)
14. Sindhuja, D., Priyadarsini, R.J.: A survey on classification techniques in data mining for analyzing liver disease disorder. Int. J. Comput. Sci. Mob. Comput. 5(5), 483–488 (2016)
15. Finkelstein, J.: Machine learning approaches to personalize early prediction of asthma exacerbations. Ann. New York Acad. Sci. 1387(1), 153–165 (2017)
16. Jammeh, E.A., Camille, B.C., Stephen, W.P., Escudero, J., Anastasiou, A., Zhao, P., Chenore, T., Zajicek, J., Ifeachor, E.: Machine-learning based identification of undiagnosed dementia in primary care: a feasibility study. BJGP Open 2(2), bjgpopen18X101589 (2018)
17. Ozkan, I.A., Koklu, M., Sert, I.U.: Diagnosis of urinary tract infection based on artificial intelligence methods. Comput. Methods Progr. Biomed. 166, 51–59 (2018)
18. Chae, S., Kwon, S., Lee, D.: Predicting infectious disease using deep learning and big data. Int. J. Environ. Res. Public Health 15(8), 1596 (2018)
19. Liu, F., Tang, R., Li, X., Zhang, W., Ye, Y., Chen, H., et al.: Deep reinforcement learning based recommendation with explicit user-item interactions modeling. arXiv preprint arXiv:1810.12027 (2018)
20. Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., et al.: Hindsight experience replay. In: Advances in Neural Information Processing Systems, pp. 5048–5058 (2017)

Jayraj Mulani is pursuing a B. Tech at the Institute of Technology, Nirma University, and is currently in his penultimate year. He was a versatile student at high school and received the Best Student award for all-round performance at Divine Child School. Passionate about learning new things, he has always looked to explore and work on technologies ranging from development to modelling to machine learning. His areas of interest include data science, image processing, recommendation techniques, machine learning, deep learning, reinforcement learning and information retrieval. He is a computer science enthusiast and aims to pursue higher education at one of the best universities across the globe. His interest in recommendation systems, combined with the alarming health-related issues people face, led him and his friends to think of this personalized health recommendation system. He hopes that it will change the way personalized recommendations are generated and that the technology will become domain independent. He strongly believes in solving data-driven problems using the state-of-the-art tools ruling the Indian market.

Sachin Heda is a third-year undergraduate pursuing a B. Tech at the Institute of Technology, Nirma University. Since childhood he has engaged himself in solving real-life problems. He is an enthusiastic learner and has high conceptual clarity. He was Head Boy of his school. As a computer science enthusiast, he has explored and worked on many technologies, including data science, recommendation techniques and big data. He is an extrovert and thinks that people are now becoming more aware of their health. He and his friends think that this personalized health system will bring a huge positive change in the lives of people.

Kalpan Tumdi is a third-year undergraduate pursuing a B. Tech at the Institute of Technology, Nirma University. His enthusiasm and passion for learning and developing new things, and his keen interest in Machine Learning, led him to get involved in various Machine Learning related research. His areas of interest include data science, image processing, reinforcement learning, machine learning and computer vision. As people are becoming more and more health conscious, he aims to use machine learning and deep reinforcement learning algorithms to help people learn how to improve their health and fitness.

Prof. Jitali Patel is an Assistant Professor in the Computer Science and Engineering Department at the Institute of Technology, Nirma University. She obtained her postgraduate degree, ME (CE), from Dharmsinh Desai University in 2010. She has more than 10 years of teaching experience. She has taught Artificial Intelligence, Information Retrieval, Data Mining, Object Oriented Programming and Data Structures. Her areas of interest and research are Machine Learning and its applications. She has published more than 10 peer-reviewed research articles.


Hitesh Chhinkaniwala is an Associate Professor and Head of the Department of Information and Communication Technology, Adani Institute of Infrastructure Engineering, India. His areas of interest are Data Mining and Knowledge Discovery, Privacy Preserving, Text Mining, Text Summarization, Information Extraction, Sentiment Analysis, Statistical Data Analysis and Ontology Learning. He has published more than 20 peer-reviewed research articles and a book. He is a reviewer for Transactions on Knowledge Discovery from Data (TKDD).

Prof. Jigna Patel is an Assistant Professor in the Computer Science and Engineering Department at the Institute of Technology, Nirma University. She obtained her postgraduate degree, ME, from Dharmsinh Desai University in 2008. She has more than 10 years of teaching experience. She has taught Theory of Computation, Cyber Security, Artificial Intelligence, Big Data Analytics, Principles of Programming Language, Mathematical Foundation for Computer Science and C Programming. Her areas of interest and research are Data Warehousing, Data Mining and Big Data Analytics.

Using Deep Learning Based Natural Language Processing Techniques for Clinical Decision-Making with EHRs

Runjie Zhu, Xinhui Tu and Jimmy Huang

R. Zhu
Information Retrieval and Knowledge Management Research Lab, Department of Electrical Engineering and Computer Science, York University, Toronto, Canada

X. Tu
School of Computer Science, Central China Normal University, Wuhan, China

J. Huang
Information Retrieval and Knowledge Management Research Lab, School of Information Technology, York University, Toronto, Canada

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics, Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_13

Abstract Natural language processing (NLP) is an interdisciplinary domain of research that focuses on the interactions between human languages and computers. There has been a recent trend of solving NLP problems using deep learning approaches. The applications of deep learning in the healthcare sector are mostly associated with the canonical examples of applying image processing and computer vision techniques to medical scans for disease diagnosis. The Electronic Health Record (EHR) is another, often neglected, source of data, equally if not more important than medical scans, that can change the way we learn useful features and information from the medical records of patients. The text-based information stored within the EHR is data-rich by nature, but is often not well understood because of its high volume, variety, velocity and complexity. However, these specific characteristics fit the nature of deep learning well. Therefore, we believe it is the right time to summarize the current status and to review and learn from the state-of-the-art medical NLP techniques. Different from the existing reviews, we examine and categorize the current deep learning-based NLP techniques in the medical domain into three major purposes: representation learning, information extraction and clinical predictions. Meanwhile, we discuss whether the application of deep learning methods has tackled the problems differently and transformed these tasks in a revolutionary way. Based on the results, we find that there is still a long way to go before deep learning methods revolutionize the existing healthcare sector. However, the recent progress made by these proposed methods is already a promising start. Furthermore, we state some of the legal and ethical considerations,


R. Zhu et al.

present the status quo of the healthcare industry applications, and provide several possible directions of future research.

Keywords Deep learning · Electronic health records · Natural language processing · Representation learning · Information extraction · Clinical predictions

1 Introduction

Health problems remain a central issue in human life. Healthcare usually refers to a set of services provided by health professionals that help patients improve or maintain their health through check-ups, diagnosis, treatment, restoration, prevention, etc. By its nature, the healthcare sector produces data in many different forms and structures, such as DNA sequences, medical scans, and electronic health records (EHRs), at a large scale and an unprecedented speed, driven by the continuously growing number of patients, medical facilities, and healthcare providers. However, the data provided by these healthcare units are often neither well processed nor well understood because of their high volume, velocity, variety and complexity. Thus, the adoption of deep learning-based natural language processing technologies in healthcare studies has become increasingly common over the past few decades.

Deep learning, a subfield of machine learning, is a class of algorithms that can extract features directly from raw inputs using hidden layers, without human intervention. In the past few years, this class of algorithms has caught the attention of researchers because of its promising and robust results across a wide variety of tasks and domains. It is now widely accepted that deep learning approaches, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), can perform well and achieve robust results on different data structures and under different circumstances. In healthcare, for example, convolutional neural networks are more commonly applied to radiology images, whereas recurrent neural networks are more commonly applied to text-based medical notes.

Electronic Health Record (EHR) systems, as a component of the healthcare industry, have gradually drawn more attention recently. An EHR system collects patients’ health information and stores it in digital format. Researchers and healthcare professionals consider EHRs a very important source of medical data that can provide insights into domain problems. However, because the data types in EHRs vary extensively and the system contains a great amount of free text, it has been challenging for traditional models to tackle these problems. Deep learning models, which can extract features without human intervention, are therefore particularly well suited to EHR problems. The useful information in an EHR system includes, but is not limited to, patient demographics, lab results, medical scan images, prescriptions, medical history, and clinical notes. Hospitals initially adopted EHR systems to store patients’ data for tracking care, as well as for administrative and

Using Deep Learning Based Natural Language …

billing purposes. In fact, EHRs can also reveal otherwise hidden insights that help to capture disease trends, support clinical decisions, and draw medical conclusions. Among all the information that an EHR system contains, text-based clinical notes are one of the most important resources. However, doctors, nurses, physiotherapists, and pharmacists usually complete their sections separately, using different medical wordings and representations, which makes information alignment difficult. Hence, the biggest challenge in current EHR systems is that the records are structurally different. Aligning the information in medical profiles is already time consuming, let alone building a pipeline that processes EHR notes, extracts terms, learns embeddings, and aligns records over years and across hospitals or other healthcare providers. Therefore, the current goal of EHR management is to increase clinical efficiency by empowering physicians with better and more user-friendly EHR systems. Moreover, such systems should help to lower the cost of clinical diagnosis and minimize the possibility of misdiagnosis. In this chapter, we present the current status of deep learning-based methods adopted in the healthcare sector. We then discuss the unique challenges and outline clinical and technical opportunities for future work. In Sect. 2, we give an overview of existing deep learning methods and cover the background and motivation for applying them to the medical domain. In Sect. 3, we examine the current deep learning-based NLP techniques and categorize them by three major purposes: representation learning, information extraction, and clinical prediction. We also compare the experimental results of these models in the medical domain, focusing on the novelty and diversity of the techniques as well as their evaluation metrics. In Sect. 4, we present other related application themes and issues of deep learning in healthcare, followed by a discussion of legal and ethical considerations and of industry applications. Finally, in Sect. 5, we acknowledge that the recent progress made by these methods is a promising start, and we provide several possible directions for future research.

2 Deep Learning for Natural Language Processing

Natural language processing is an active research field in computer science and artificial intelligence that focuses on the interactions between human languages and computers. Specifically, it studies how to represent, process, and analyze large amounts of natural language data. Neural networks are powerful learning models in general, and deep learning approaches have achieved impressive success in image and speech recognition. In the past few years, natural language processing has also made great advances by adopting deep learning algorithms and methods. Attention has shifted from traditional machine learning models, such as Support Vector Machines (SVMs) and logistic regression, towards deep neural network

R. Zhu et al.

models such as CNN and RNN. These deep learning approaches eliminate the time-consuming work of hand-crafting features and replace it with automatic feature learning. In this section, we present the main deep learning approaches and architectures applied in natural language processing research. They include distributed representations, which are the foundation of deep learning models for NLP, as well as CNNs, RNNs, and Transformer-based neural networks.

2.1 Distributed Representation

In the past, local representations were often used to store memories, representing each entity directly with a single element. Such a structure is easy to understand and easy to implement but very inefficient, because each unit is associated with only one represented thing [1]. A distributed representation, by contrast, represents each entity with more than one representational element, which is both effective and efficient. Because the representations of different entities overlap in the neural network, the network can respond to a new input based on its generalization capability. By pretraining on raw data, the network learns to output significant features automatically. In most cases, researchers do not have enough annotated data to derive features for classification tasks. Therefore, an unsupervised approach such as distributed representation is needed to pretrain on the data and to embed words with similar meanings into similar vectors.

2.1.1 Word Embeddings

Word embeddings follow the distributional hypothesis, which states that words or terms with similar meanings are likely to appear in similar contexts. An embedding typically serves as a data preprocessing function in the first layer of a deep learning architecture. It also captures the neighboring context of a target term, so that the degree of similarity between words can be calculated. In most cases, word embeddings are pretrained on a large text corpus to capture the syntactic and semantic features of the document collection. Once learnt, the embeddings are fixed and reusable: each word is mapped to the same vector regardless of the context in which it appears.
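
As a small illustration of how the degree of similarity between embedded words can be calculated, the following sketch compares word vectors by cosine similarity. Cosine similarity is a common choice but is not named in the text, and the three-dimensional vectors, the vocabulary, and the function names below are invented for illustration rather than taken from any trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings (made-up values, not from a trained model).
EMBEDDINGS = {
    "fever": [0.9, 0.1, 0.0],
    "pyrexia": [0.8, 0.2, 0.1],
    "fracture": [0.0, 0.3, 0.9],
}


def most_similar(word, embeddings):
    """Return the other vocabulary word whose vector is closest to `word`."""
    target = embeddings[word]
    candidates = [w for w in embeddings if w != word]
    return max(candidates, key=lambda w: cosine_similarity(target, embeddings[w]))
```

With these toy values, the vector for "fever" lies much closer to "pyrexia" than to "fracture", which is the behavior that embeddings of words with similar meanings are expected to show.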

2.1.2 Word2Vec

Word2Vec, created by Tomas Mikolov et al. [2–4], refers to a group of models that take a large text corpus as input and produce word embeddings, typically with several hundred dimensions. Each unique word in the corpus is mapped to a corresponding vector in the space, and the vectors are positioned so that words sharing similar contexts lie close to each other. These models are structured as neural networks of two layers, trained to reconstruct the linguistic contexts of words across the entire corpus. Although Word2Vec is not a deep neural network (DNN), it transforms text into numerical values that a DNN can process, without human intervention. In general, Word2Vec trains each input word against its neighboring words in one of two ways: Skip-Gram or Continuous Bag of Words. Skip-Gram predicts the surrounding context from a given word. Specifically, the goal of the Skip-Gram model is to maximize the average log probability

$$\frac{1}{T}\sum_{t=1}^{T}\ \sum_{-c \le j \le c,\ j \ne 0} \log p(w_{t+j} \mid w_t) \tag{1}$$

where $w_1, w_2, \ldots, w_T$ is the sequence of training words and $c$ is the size of the training context. A larger $c$ yields more training examples and can therefore lead to higher accuracy, at the expense of training time. The basic Skip-Gram model defines $p(w_{t+j} \mid w_t)$ using the softmax function:

$$p(w_O \mid w_I) = \frac{\exp\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\left({v'_{w}}^{\top} v_{w_I}\right)} \tag{2}$$
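
The two Skip-Gram formulas can be evaluated directly on a toy vocabulary. The following is a minimal sketch, not the Word2Vec implementation: the function names and the vector values are hypothetical, context positions that fall outside the sequence are simply skipped (an assumption, since the text does not say how boundaries are handled), and the full softmax is computed without the approximations that practical implementations rely on.

```python
import math


def softmax_probability(v_in, out_vectors, target):
    """p(w_O | w_I) as in Eq. (2): softmax over output-vector dot products."""
    scores = [sum(a * b for a, b in zip(v_out, v_in)) for v_out in out_vectors]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return exps[target] / sum(exps)


def average_log_probability(word_ids, in_vectors, out_vectors, c):
    """Eq. (1): (1/T) * sum_t sum_{-c<=j<=c, j!=0} log p(w_{t+j} | w_t)."""
    T = len(word_ids)
    total = 0.0
    for t in range(T):
        for j in range(-c, c + 1):
            # Assumption: skip j = 0 and positions outside the sequence.
            if j == 0 or not (0 <= t + j < T):
                continue
            p = softmax_probability(in_vectors[word_ids[t]], out_vectors, word_ids[t + j])
            total += math.log(p)
    return total / T


# Toy vocabulary of 3 words with 2-dimensional vectors (made-up values).
IN_VECTORS = [[0.5, 0.1], [0.2, -0.3], [-0.4, 0.6]]
OUT_VECTORS = [[0.3, 0.2], [-0.1, 0.4], [0.0, -0.5]]
```

Training would adjust the input and output vectors to increase `average_log_probability`; the sketch only evaluates the objective for fixed vectors.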

where $v_w$ and $v'_w$ are the input and output vector representations of word $w$, and $W$ is the number of words in the vocabulary. Continuous Bag of Words (CBOW) predicts a target word from its given context. It builds on the bag-of-words concept. Bag of Words (BoW) is a simplifying representation used in NLP: a text is represented as the bag of its words, disregarding word order and grammar. It is often used to train document or text classifiers, with the frequency of each word serving as a feature. CBOW represents an unbounded number of features with a fixed-size vector, which is useful when the number of features is not known in advance. It works much like BoW, summing or averaging the embedding vectors of the corresponding features while ignoring word order:

$$\mathrm{CBOW}(f_1, f_2, \ldots, f_k) = \frac{1}{k}\sum_{i=1}^{k} v(f_i) \tag{3}$$

Weighted CBOW is a simple variation of CBOW in which each vector receives a different weight, by associating a weight $a_i$ with feature $f_i$:

$$\mathrm{WCBOW}(f_1, f_2, \ldots, f_k) = \frac{1}{\sum_{i=1}^{k} a_i}\sum_{i=1}^{k} a_i\, v(f_i) \tag{4}$$
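
The CBOW and weighted CBOW averages translate almost line for line into code. This is a minimal sketch with hypothetical function names; it assumes the feature vectors $v(f_i)$ have already been looked up and all have the same dimension.

```python
def cbow(vectors):
    """Eq. (3): unweighted average of the feature vectors v(f_i)."""
    k = len(vectors)
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / k for d in range(dim)]


def weighted_cbow(vectors, weights):
    """Eq. (4): weighted average, normalized by the sum of the weights a_i."""
    total = sum(weights)
    dim = len(vectors[0])
    return [
        sum(a * v[d] for a, v in zip(weights, vectors)) / total
        for d in range(dim)
    ]
```

Because both functions only sum over the vectors, reordering the inputs does not change the result, which is the loss of word order information noted above. With equal weights, `weighted_cbow` reduces to `cbow`.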

Chalapathy et al. [5] propose an RNN approach that combines a bidirectional long short-term memory (LSTM) model with conditional random field (CRF) decoding and uses general-purpose word embeddings, namely GloVe [6] and Word2Vec [3]. For concept representation and extraction, this bidirectional LSTM-CRF model first maps every word of an input sentence to a randomly initialized word embedding vector. The model then applies the GloVe, Skip-Gram, and CBOW embedding methods to learn vector representations from the entire data collection. The resulting sequences of vectors are fed into the LSTM, which is well suited to sequential data, to label the tokens with classes of medical concepts. Because an LSTM tends to favor the most recent inputs, Chalapathy et al. compute both the forward and the backward hidden states to reduce this bias. Besides GloVe, the other most popular way of learning medical concept representations is to generate distributed embeddings with Skip-Gram [4, 7]. Distributed representations of words in a vector space group similar words together, which improves the performance of learning algorithms in natural language processing. The Skip-Gram technique [4] introduced by Mikolov et al. is an efficient method for learning vector representations of words from large quantities of unstructured text. Because it is trained to predict a word's context, it captures relations between words.

2.1.3 Contextualized Word Embeddings

Learning high-quality representations has never been an easy task. In the past few years, contextualized word embeddings have achieved impressive results and have been adopted in many recent deep learning NLP models. These approaches are regarded as strong pretrained word representation models because they capture complex characteristics of syntactic and semantic word use and how these uses vary across linguistic contexts. ELMo (Embeddings from Language Models) is one such deep contextualized word representation model. Its word vectors are learnt functions of the internal states of a deep bidirectional language model (biLM) that is pretrained on a large text corpus. In other words, the vectors stacked above each input token are linearly combined, with weights specific to each end task, to produce the final representation. This design can boost model performance significantly: when ELMo representations are added to existing models, they bring significant improvements on existing NLP problems. The biggest difference between commonly used word embeddings and ELMo is that ELMo word representations are functions of the entire input sentence. The embeddings are computed on top of biLMs with character convolutions, as a linear function of the internal network states. The representations generated by the model are contextual, deep, and character-based, meaning that each word representation depends on the entire context, combines all layers of the deep neural network, and allows the network to form robust representations for out-of-vocabulary tokens. The model is defined as follows:

$$\mathrm{ELMo}_k^{task} = E\left(R_k; \Theta^{task}\right) = \gamma^{task} \sum_{j=0}^{L} s_j^{task}\, h_{k,j}^{LM} \tag{5}$$

$$R_k = \left\{ x_k^{LM}, \overrightarrow{h}_{k,j}^{LM}, \overleftarrow{h}_{k,j}^{LM} \mid j = 1, \ldots, L \right\} = \left\{ h_{k,j}^{LM} \mid j = 0, \ldots, L \right\} \tag{6}$$

where $R_k$ is the set of $2L+1$ representations that an $L$-layer biLM computes for token $k$; $h_{k,0}^{LM}$ is the token layer and $h_{k,j}^{LM} = [\overrightarrow{h}_{k,j}^{LM}; \overleftarrow{h}_{k,j}^{LM}]$ for each bidirectional layer $j \ge 1$; $s^{task}$ are softmax-normalized weights; and $\gamma^{task}$ is a scalar that aids the optimization process by allowing the task model to scale the entire ELMo vector. $\mathrm{ELMo}_k = E(R_k; \Theta_e)$ denotes the case in which ELMo collapses all layers in $R_k$ into a single vector.
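
The task-specific combination of biLM layers can be illustrated with a few lines of code. The sketch below is not the ELMo implementation: it assumes the layer states $h_{k,j}^{LM}$ for one token are already available as plain lists, and the function names, the raw (pre-softmax) weights, and the toy values are hypothetical.

```python
import math


def softmax(xs):
    """Normalize raw weights so that they are positive and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def elmo_vector(layer_states, raw_weights, gamma):
    """gamma * sum_j s_j * h_{k,j}, with s = softmax(raw_weights).

    layer_states: L + 1 vectors for one token (token layer first).
    raw_weights:  one unnormalized scalar per layer (learnt per task).
    gamma:        scalar that rescales the whole vector.
    """
    s = softmax(raw_weights)
    dim = len(layer_states[0])
    return [
        gamma * sum(s_j * h[d] for s_j, h in zip(s, layer_states))
        for d in range(dim)
    ]
```

With equal raw weights the result is simply the scaled average of the layers, whereas a strongly dominant weight makes the combination approach a single layer.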

2.2 Convolutional Neural Networks (CNN)

A convolutional neural network is a class of deep neural networks that is commonly applied in computer vision [8] and natural language processing (NLP) studies. Its connectivity pattern is inspired by that of neurons in the human brain, and it can be seen as a regularized version of the multilayer perceptron, which is fully connected. Specifically, a CNN is made up of an input layer, multiple hidden layers, and an output layer. The hidden layers typically include convolutional layers, ReLU (activation function) layers, pooling layers, fully connected layers, and normalization layers. Compared to other classification algorithms, a CNN requires much less preprocessing, and its results improve with more training. In natural language processing, a CNN is used to identify locally predictive features in large text corpora. The extracted features are then combined into a fixed-size vector representation of the entire structure. In essence, a CNN is an effective feature extraction architecture that automatically identifies the predictive n-grams in a sentence. The basic convolution-and-pooling model for NLP applies a learnt nonlinear function to every instance of a sliding window of k words over the sentence. Each such function acts as a filter that transforms a window of k words into a scalar value. Applying l filters yields an l-dimensional vector that captures important characteristics of the words in the window. A pooling layer then combines the vectors from all windows into a single l-dimensional vector by taking the maximum or the average value of each of the l dimensions over the windows. The purpose is to extract the most prominent features in the sentence regardless of their location. The combined vector is then passed to a prediction network. During training, the gradients propagated back from the prediction loss tune the parameters of the filter functions so that they emphasize the aspects of the data that are important for the given task. In general, as a window of size k slides over the sentence or text corpus, the filter functions learn to extract informative k-gram features.
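
The convolution-and-pooling step described above can be sketched as follows. This is an illustrative forward pass only, with hypothetical names and a `tanh` nonlinearity chosen as an assumption (the text only says "nonlinear functions"); the filters are given rather than learnt, and max pooling is used for the pooling layer.

```python
import math


def conv_max_pool(sentence, filters, biases, k):
    """Apply l filters to every window of k word vectors, then max-pool.

    sentence: list of word vectors, each of dimension d.
    filters:  l filters, each a flat list of k * d weights.
    biases:   one bias per filter.
    Returns an l-dimensional vector regardless of sentence length.
    """
    # Concatenate the k word vectors of each sliding window.
    windows = [
        [x for vec in sentence[i:i + k] for x in vec]
        for i in range(len(sentence) - k + 1)
    ]
    # Each filter maps a window to one scalar, giving l values per window.
    feature_maps = [
        [math.tanh(sum(w * x for w, x in zip(f, win)) + b)
         for f, b in zip(filters, biases)]
        for win in windows
    ]
    # Max pooling over window positions, one value per filter.
    return [max(fm[j] for fm in feature_maps) for j in range(len(filters))]
```

The output has one entry per filter no matter how many words the sentence contains, which is what allows a fixed-size vector to be passed on to the prediction network.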

2.3 Recurrent Neural Networks

The connections between nodes in an RNN form a directed graph along a temporal sequence, which allows the network to exhibit temporal dynamic behavior. Because RNNs can process sequences of inputs, RNN architectures are commonly used for NLP tasks such as information extraction and speech recognition.

2.3.1 Recurrent Neural Network

The simple recurrent neural network architecture is sensitive to the sequential ordering of elements. Mikolov explored it in 2012 and applied it to language modeling [9]. As described there, a basic RNN is a network of nodes organized in successive layers, where each node in a layer has a one-way directed connection to every node in the next layer. The nodes comprise input nodes, output nodes, and hidden nodes. Each node has a real-valued activation that changes over time, and each connection has a real-valued weight that can be modified as well. By forming a directed graph along a temporal sequence, the RNN exhibits dynamic behavior, using its internal memory state to process input sequences. It is worth noting that the basic RNN structure is difficult to train effectively because of the vanishing gradient problem. Because the gradients diminish quickly during backpropagation, error signals from later steps can hardly reach earlier inputs, and the basic RNN model struggles to capture long-range dependencies.
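
The vanishing gradient effect can be made concrete with a deliberately simplified recurrent network. The sketch below assumes a single scalar hidden unit with a `tanh` activation, $h_t = \tanh(w_x x_t + w_h h_{t-1})$; the function names and weight values are hypothetical, and real RNNs use weight matrices rather than scalars.

```python
import math


def run_rnn(xs, w_x, w_h, h0=0.0):
    """Hidden states h_1..h_T of a scalar RNN: h_t = tanh(w_x*x_t + w_h*h_{t-1})."""
    hs = []
    h = h0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        hs.append(h)
    return hs


def gradient_wrt_initial_state(hs, w_h):
    """d h_T / d h_0 = prod_t w_h * (1 - h_t^2), by the chain rule through tanh."""
    grad = 1.0
    for h in hs:
        grad *= w_h * (1.0 - h * h)
    return grad
```

Each time step multiplies the gradient by a factor of magnitude at most `abs(w_h)`, so for `abs(w_h) < 1` the influence of the initial state on the final state shrinks exponentially with the sequence length.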

2.3.2 Long Short Term Memory

LSTM was the first architecture to introduce a gating mechanism to address the vanishing gradient problem. It is one of the most successful types of RNN architecture. Unlike standard feedforward neural networks, LSTM has feedback connections, which allow it to process not only single data points but also entire sequences of data. A classic LSTM unit consists of a memory cell and three regulating gates: an input gate, a forget gate, and an output gate. The memory cell keeps track of the dependencies between elements in the input sequence. The input gate controls the extent to which a new value flows into the memory cell. The forget gate controls the extent to which an existing value remains in the memory cell. The output gate controls the extent to which the value in the memory cell is used to compute the output activation of the unit. The gates themselves commonly use the logistic sigmoid function as their activation. In other words, LSTM splits the state vector into a memory cell, which preserves memory and error gradients across time, and a working memory. Smooth mathematical functions that simulate logical gates control the memory cell. At each input step, gates decide how much of the new input should be written to the memory cell and how much of the current content of the memory cell should be forgotten.
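
The interplay of the three gates can be sketched for a single time step. The code below follows the commonly used LSTM formulation, which the text describes only in words; it uses scalar states for readability, and the function names and the parameter layout (a dictionary of `(w_x, w_h, bias)` triples) are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell.

    p maps 'input', 'forget', 'output', 'candidate' to (w_x, w_h, bias).
    Returns the new working memory h and the new memory cell c.
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))         # how much new content to write
    f = sigmoid(pre("forget"))        # how much old content to keep
    o = sigmoid(pre("output"))        # how much of the cell to expose
    g = math.tanh(pre("candidate"))   # proposed new content
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c
```

When the forget gate is close to 1 and the input gate close to 0, the memory cell is carried over almost unchanged, which is how the cell preserves information and error gradients across many steps.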

2.3.3 Gated Recurrent Unit (GRU)

To address the vanishing gradient problem and capture long-range dependencies, Cho et al. [10] proposed the GRU as a gating-based architecture that builds on the LSTM introduced earlier by Hochreiter and Schmidhuber [11]. The GRU uses two gate vectors, an update gate and a reset gate, to decide which information should be passed on to the output. Like the LSTM, the GRU can be trained to retain information from many steps back without it being washed out over time, and to discard information that is irrelevant to future predictions. Unlike the LSTM, it has no separate memory cell and uses fewer gates. In fact, the architecture of the GRU is similar to that of an LSTM with a forget gate.
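
A single GRU step can be sketched in the same style. The code assumes scalar states and the convention in which the update gate weights the previous state, $h_t = z\,h_{t-1} + (1 - z)\,\tilde{h}_t$; some libraries swap the roles of $z$ and $1 - z$. The function names and the parameter layout are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gru_step(x, h_prev, p):
    """One step of a scalar GRU.

    p maps 'update', 'reset', 'candidate' to (w_x, w_h, bias).
    """
    w_x, w_h, b = p["update"]
    z = sigmoid(w_x * x + w_h * h_prev + b)   # update gate
    w_x, w_h, b = p["reset"]
    r = sigmoid(w_x * x + w_h * h_prev + b)   # reset gate
    w_x, w_h, b = p["candidate"]
    # The reset gate decides how much of the previous state enters the candidate.
    h_tilde = math.tanh(w_x * x + w_h * (r * h_prev) + b)
    # Assumed convention: z close to 1 keeps the old state.
    return z * h_prev + (1.0 - z) * h_tilde
```

Compared with the LSTM step, there is no separate memory cell: the hidden state itself is interpolated between its previous value and the candidate.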

2.4 Transformer-Based Neural Networks

Bidirectional Encoder Representations from Transformers (BERT) was proposed by Google in late 2018. Upon release, it attracted wide attention from both academia and industry and prompted much further research. BERT is built from word embeddings and the Transformer. The word embeddings in BERT are simply low-dimensional vector representations onto which words from the high-dimensional vocabulary space are projected. Compared to conventional sequential models such as RNN, LSTM, and GRU, the Transformer, also presented by Google, is a neural network architecture that models long-term dependencies between tokens in a sequence more effectively. It is also more efficient to train, because it eliminates the sequential dependency on previous tokens. Instead of processing the input token by token, the Transformer uses an encoder-decoder architecture in which an attention mechanism gives the decoder access to the entire sequence at once. BERT incorporates these features while using only the encoder. Google released two versions of the model, $\mathrm{BERT}_{\mathrm{Base}}$ and $\mathrm{BERT}_{\mathrm{Large}}$. $\mathrm{BERT}_{\mathrm{Base}}$ consists of 12 Transformer blocks, a hidden size of 768, and 12 attention heads, while the much larger $\mathrm{BERT}_{\mathrm{Large}}$ consists of 24 Transformer blocks, a hidden size of 1024, and 16 attention heads [12].

GPT-2, proposed by OpenAI, drew attention away from BERT shortly afterwards. It is a large Transformer-based language model with 1.5 billion parameters, trained on a dataset of 8 million web pages [13]. Its training objective is simple: predict the next word, given all previous words, within 40 GB of Internet text. It generates remarkable results that have advanced NLP research further; in particular, it demonstrates an unprecedented capability for generating synthetic text samples. The results show that it generally outperforms other language models trained on specific domains without itself being trained on the domain-specific datasets. Because the developers decided at the time not to release the data or the parameters of the largest model, it is not elaborated further here.

It is worth noting that, compared to RNN and LSTM models, Transformer-based neural networks are more hardware friendly. RNN and LSTM models are difficult to train because their computation is bound by memory bandwidth. This is a headache for hardware designers, since training such networks takes up a lot of cloud resources and scales poorly, which limits the applicability of these solutions. For example, an LSTM requires four linear layers to be computed for each cell at every time step of the sequence, which consumes a great amount of memory bandwidth. For Transformer-based approaches, by contrast, only a 2D convolution-based network with causal convolution is required, and the generated results can be even better.
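
The attention mechanism at the core of the Transformer can be sketched as follows. The sketch uses scaled dot-product attention, the form used in the original Transformer; the text above does not spell out the formula, and the function names and toy inputs are illustrative assumptions. Learnt projection matrices, multiple heads, and masking are omitted.

```python
import math


def softmax(xs):
    """Normalize scores into positive weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(q . k / sqrt(d_k)) applied to values.

    Every query attends to all positions at once, so no position has to
    wait for the result of the previous one.
    """
    d_k = len(keys[0])
    dim_v = len(values[0])
    outputs = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim_v)]
        )
    return outputs
```

Because each output row depends only on the full set of keys and values and not on the previous output, all positions can be computed in parallel, which is the property that removes the sequential dependency of RNN-style models.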

2.5 Generative Adversarial Network

The Generative Adversarial Network (GAN) was proposed by Ian Goodfellow et al. in 2014. It is a class of deep neural network architectures in which two neural networks compete with each other, one acting as a generator and the other as a discriminator. The generator produces data of all sorts, including text, images, music, and speech, that look as close as possible to the original training set, while the discriminator identifies whether a given input is authentic training data or generated data. The potential of GANs is huge, because they can learn to mimic any type and distribution of data, including EHR data in the medical domain. When a GAN is applied to EHR systems, the generator creates new, synthetic patient records and passes them to the discriminator. The mission of the generator is to produce fake clinical data that look authentic enough that the discriminator cannot catch them. The discriminator, in turn, takes in both real and fake medical data and returns one or more probability values between 0 and 1 (0 meaning fake and 1 meaning authentic) that represent how likely the given data are to be real. Many recent papers use GANs as the architecture for EHR studies, as discussed further in the next section.
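
The opposing goals of the two networks can be expressed as two loss functions over the discriminator's output probabilities. This is a minimal sketch under stated assumptions: it uses binary cross-entropy for the discriminator and the non-saturating form of the generator loss, neither of which is specified in the text, and it takes the discriminator outputs as given numbers instead of computing them with a network. The function names are hypothetical.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy with real records labelled 1 and fakes labelled 0.

    d_real: discriminator outputs in (0, 1) for real records.
    d_fake: discriminator outputs in (0, 1) for generated records.
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss (an assumed variant): low when the
    discriminator scores generated records as likely real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)
```

In training, the two losses are minimized in alternation: the discriminator loss falls as real and fake records are told apart, while the generator loss falls as generated records receive scores closer to 1.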

3 Major Applications of Deep Learning in Medical Information Processing

In the past few years, there has been a rising trend of applying deep learning-based NLP techniques to medical information processing. The current goal of EHR management is to increase clinical efficiency by empowering physicians with better and more user-friendly EHR systems. Moreover, such systems should help to lower the cost of clinical diagnosis and minimize the possibility of misdiagnosis. Neural network-based representation learning has achieved promising results in many fields, and many natural language processing applications of representation learning have been developed. Hinton [14] introduced distributed representations for symbolic data in 1986. The idea is to form a word embedding layer by learning a distributed representation for each word in the given text. Bengio et al. [15] later developed the idea in the context of statistical language modeling, under the name of neural net language models [16]. The quality of a learnt representation normally depends on how expressively it captures the features of the huge number of possible inputs [17]. In 2006, Hinton [18] initiated a breakthrough in representation learning with greedy layer-wise unsupervised pretraining, and many other scholars soon followed the same track [19–23]. The method uses unsupervised feature learning to learn one level of features at a time, each new level building on the output of the previous one. Specifically, by stacking the layers learnt in this way, the model builds a deep neural network whose representations are learnt in an unsupervised manner. A final deep supervised predictor can thus be built directly from the raw data inputs.
To study the techniques that serve these purposes, we classify the existing deep learning-based NLP techniques into three major groups: representation learning, information extraction, and clinical prediction. These three are considered the key technologies and applications in current EHR systems.

3.1 Representation Learning (RL)

The performance of deep learning models depends primarily on the data representation or the choice of features. Representation learning, also known as feature learning, refers to a group of techniques that allow a system or model to learn representations of the raw data inputs for feature extraction, and to build predictors or classifiers. Representation learning tasks can be supervised or unsupervised: supervised tasks learn features from labeled inputs, whereas unsupervised tasks learn from unlabeled data. As studies in both academia and industry have grown rapidly, representation learning has benefited from new discoveries and achieved broad empirical success. Generally, the existing literature uses learnt word embeddings in three different ways. First, some scholars train the entire model end-to-end as a supervised task, starting from a randomly initialized embedding matrix. This method is easy to adopt; however, it skips the embedding pretraining step entirely and can therefore cause problems such as overfitting. Second, some scholars use part of their data to learn word embeddings and freeze them while training the rest of the model. Third, most research in the past few years uses learnt word embeddings and continues to train them end-to-end with the rest of the model. Because deep learning approaches perform better with larger amounts of raw input data, and most recent studies fall into this category, we focus only on the literature of the third approach in this chapter. In the medical domain, representation learning usually involves learning from the medical codes or notes in a patient's EHR, which serve administrative purposes or record diagnoses and medication needs. Unlike sentences, which consist of an ordered sequence of words, the medical codes in a patient's profile are unordered. To use these codes and notes as inputs to machine learning models, representation learning is adopted to turn them into meaningful representations. Skip-gram, GloVe, CBOW, stacked autoencoders, and BERT are commonly used NLP techniques for learning distributed embeddings.
In this section, the trending deep learning-based NLP methods are discussed in three subcategories: representations of medical concepts, representations of patients, and representations for disambiguating clinical abbreviations.

3.1.1 Medical Concept

Both medical codes and clinical notes contain plenty of valuable information that physicians can use for medical prediction and decision-making. In a typical patient's EHR profile, unstructured data take up a considerable proportion of the file. Doctors, nurses, physiotherapists, and pharmacists are each in charge of one section of the profile and fill in the relevant information in an unstructured format, known as free text. For patients, the difficulty in reading these notes lies in understanding the medical jargon and instructions. For researchers, these free texts are valuable for producing effective clinical predictions, but they are also difficult to process. In reality, because of the different structures of EHR systems across institutions and the wide variety of medical jargon used by different healthcare providers, extracting useful information from the large pool of clinical notes remains an unsolved problem.

The heterogeneous nature of medical data elements and the high volume of unstructured data make clinical care and medical analytics difficult. Most existing studies learn features and representations by applying ontology mappings or by exploiting the raw data inputs, such as medical notes or codes, directly. Although higher-level medical features such as disease phenotypes can reduce the feature space to some extent, they may still fail to capture the meaningful information embedded in patient data across the entire EHR system. Medical concept learning from patients' medical notes is a dominant research subfield. Researchers recognize that many existing approaches to concept representation in the medical domain still face data inefficiency challenges and depend heavily on hand-crafted features and extensive domain knowledge that are difficult to define.

To address the problem, Choi et al. [24] take advantage of the relationships encoded among medical codes, which are inherently multilevel in EHR systems. Specifically, they propose a Multilevel Medical Embedding (MiME) architecture that learns multilevel embeddings of EHR data and makes clinical predictions based on the inherent EHR structure, without the help of external labels. The model is evaluated on two separate prediction tasks: heart failure prediction and sequential disease prediction. MiME consistently outperforms all baseline models by a significant margin.

Escudie et al. [25] demonstrate a way of learning low-dimensional representations of patient visits using a deep neural network to predict International Classification of Diseases (ICD) diagnosis categories when these codes are not provided.
The deep neural network approach adopted in this paper takes both structured/semistructured data and unstructured free-texts notes in MIMIC-III as inputs. These learnt codes are pertinent to medical domain, meanwhile they can directly be used as inputs to DL or ML algorithms for future patient’s health status prediction and prevention. Choi et al. [26] proposed a different data driven approach to leverage EHR data directly for medical concept learning. Specifically, the method maps medical concepts to similar concept vectors close to each other depending on temporal co-occurrence relationships among raw data inputs. Furthermore, it is capable of transforming heterogeneous patient’s medical data in EHR system to clinically meaningful features. Hence, the patient vectors are constructed at the same time from the related clinical concept vectors. As a result, their proposed representation manages to generate patient representations by learning representations of medical concepts. In their paper [26], the authors presented the method based on Skip-gram [4, 7] to learn multi-dimensional vectors, and to capture the latent relationships between diagnoses, medications and procedures with multi-dimensional real-valued vectors. De Vine et al. [27] in their paper utilize the UMLS concepts to learn representations from patient records of free-texts and abstracts of journals. Basically, rather than directly learn representations from terms in free-texts, they propose a variation of neural language modeling to learn concepts from structured ontologies and to extract information from free-texts by preprocessing the medical texts mapping words to medical concepts in the UMLS. Then, the Skip-gram model is adopted to learn


word representations of these medical concepts. The empirical findings suggest that the proposed model correlates more strongly with expert judgements of semantic similarity than existing benchmarks in the medical domain. Choi et al. [28] in 2016 likewise demonstrated how to learn low-dimensional representations of medical concepts using neural language modeling. The novelty of this method is that it learns representations not only from texts but also from abundant claims data. Besides the most direct way of learning medical concept embeddings from medical journals (MCEMJ), Choi et al. introduce two further sets of medical concept embeddings: one for temporal data, learnt from medical claims, and one learnt from a graph of medicine constructed from word co-occurrences in medical corpora. The first set of embeddings, MCEMJ, is the one introduced in [27]. The second set is learnt from the medical claims datasets of a private health insurance company. As the data contain many duplicate codes and multiple events happening within a short period of time, the authors apply partitioning and random shuffling to the data before feeding them into the Skip-Gram model. The medical claims data are first partitioned into intervals; the duplicates within each interval are then removed, and the remaining codes are randomly shuffled into a sequence of concepts. Finally, each such sequence is fed as a sentence into the word2vec system, which trains the Skip-Gram model by stochastic gradient descent. The last set of embeddings comes from an open-source EHR data collection. The authors learn these representations in two ways: (1) based on the co-occurrence counts, they sample graph edges proportionally to the edge weights and feed the resulting word pairs to the Skip-Gram model; (2) they exploit the fact that Word2Vec implicitly factorizes the shifted positive pointwise mutual information (SPPMI) matrix of words and contexts [28]. Minarro-Gimenez et al. 
[29] apply Skip-gram to medical text collections from PubMed, Merck Manuals, Medscape and Wikipedia to learn representations of medical terms. However, their experiments show a low hit rate for the word2vec methods. The authors conclude that these methods are not suitable for tasks requiring high precision, such as retrieving medical concepts from a restricted collection of medical text. Liu et al. [30] propose a multi-task framework for disease prediction that integrates structured information such as medical codes with information-rich free-text medical notes. The proposed model is flexible enough to use both structured data with numerical values and unstructured free text to generate vector representations of words or texts. In their paper, they evaluate CNN, LSTM and hierarchical models on processing clinical notes in EHR systems. They also propose a novel approach that takes negations in free text into account when making clinical predictions. The results suggest that their approach not only predicts diseases effectively within a prediction window, but also requires no disease-specific feature engineering, which affirms the benefit of deep learning approaches. Besides learning medical concepts from patients’ medical notes, the medical codes within text-based patient encounters are also a useful source for concept representation. Medical codes are strings of numbers and/or characters used by health providers to symbolize or describe diagnoses, disease types, exercised treatment,


bills and costs, applied medicines, etc. A patient usually receives his/her own EHR report with a list of demographic codes serving each hospital’s administrative purposes, medical jargon accompanied by medical codes, and lab test values. The most common medical codes include, but are not limited to, CPT codes (Current Procedural Terminology), HCPCS codes (Healthcare Common Procedure Coding System), ICD codes (International Classification of Diseases), ICF codes for disabilities, Diagnosis-Related Groups (DRG), NDC codes (National Drug Codes), CDT codes (Code on Dental Procedures and Nomenclature), and DSM-IV-TR codes for psychiatric illnesses. Although these medical codes are common knowledge for health providers, their meanings are difficult for the public to understand. Hence, researchers feed these codes into models both to generate information that the public can comprehend and to produce credible clinical predictions. In general, there are two approaches to making clinical decisions with medical codes extracted from patients’ medical profiles. The more straightforward approach is static: it predicts medical outcomes by feeding a model a single set of inputs once. For example, Choi et al. [26] feed EHR data directly into models that learn heterogeneous concept and patient representations based on co-occurrence patterns. This method of medical concept and patient representation learning uses single inputs to predict a possible heart failure (HF), and it also serves to link relevant concepts and to boost the performance of predictive modeling. The more complicated approach is dynamic: it predicts medical outcomes by feeding a model a sequence of inputs. 
Such models can produce clinical decisions after each raw EHR input is fed in or after the entire sequence of EHR data points has been processed. For example, Choi et al. [31] leveraged a large EHR dataset to develop a temporal predictive model, Doctor AI, for learning observed medical conditions and uses, which is discussed further in Sect. 3.3. In 2016, Choi et al. [32] proposed an algorithm named Med2Vec, using a dataset that consists of patient visit records with diagnosis codes (ICD-9), lab test results (LOINC) and drug usage (NDC). Since Skip-Gram learns word representation vectors by predicting the context of a word, and thereby captures relationships between words, the medical codes used in the study have to be converted into (target, context) pairs. The authors therefore define the (target, context) pairs at the level of each patient’s profile rather than at the level of the medical code sequence. By doing so, Choi et al. aim to learn medical concept representations effectively and efficiently. The authors were also able to predict a patient’s neighboring visits by representing his/her medical records as binary vectors and feeding them into a two-layer neural network. Another popular representation technique applied to medical concept and event extraction is bag of words (BOW). Li et al. [33] propose an embedding learning method that incorporates the distributional characteristics of words into medical event extraction. Their model uses BOW features as a baseline, and the results obtained with the learnt word embedding features are promising, since the n-grams effectively enrich the context information.
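
The conversion of unordered medical codes into (target, context) pairs and binary vectors can be sketched as follows. This is a minimal illustration, not the Med2Vec implementation: it assumes that pairs are formed among the codes recorded in the same visit, and the code names, `code_pairs` and `multi_hot` are hypothetical.

```python
from itertools import permutations

def code_pairs(visit):
    """Return all ordered (target, context) pairs among the distinct codes of one visit."""
    return list(permutations(sorted(set(visit)), 2))

def multi_hot(visit, vocabulary):
    """Encode a visit as a binary vector over a fixed code vocabulary."""
    present = set(visit)
    return [1 if code in present else 0 for code in vocabulary]

# Hypothetical record: each visit is an unordered collection of placeholder codes.
patient_visits = [
    ["DX_HF", "RX_DIURETIC"],
    ["DX_HF", "LAB_CREATININE", "RX_DIURETIC"],
]
vocabulary = sorted({code for visit in patient_visits for code in visit})
pairs = [pair for visit in patient_visits for pair in code_pairs(visit)]
visit_vectors = [multi_hot(visit, vocabulary) for visit in patient_visits]
```

The pairs could then be passed to any Skip-Gram trainer, while the binary visit vectors correspond to the kind of input that a small feedforward network would take to predict neighboring visits.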


Tang et al. [34] apply features such as bag-of-words (BOW) and part-of-speech (POS) tags in their study, in contrast to the GloVe and Skip-Gram approaches. In their experiment, Tang et al. also adopt a neural language model to generate word embedding vectors from a biomedical corpus, and the experimental results are slightly better than those of existing work. Gong et al. [23] adapt the BOW representation into a bag of events (BOE). The BOE records the number of events that occurred in the first 24 h of a patient’s stay. The authors aim to map database-specific representations to a shared list of medical concepts, so that the model can be transferred across databases. The Item ID feature is constructed as a new identifier pair consisting of each patient’s unique (ID, text value). Lastly, the representations are converted to UMLS concepts with a frequently used tool for identifying UMLS concepts. Whether medical codes are fed as a single set of inputs or as a sequence of inputs, they are, like medical notes, a common source of data for medical concept representation and for the final clinical decision-making process. However, since medical decision-making is complicated, researchers should not rely on a single type of input data to generate effective clinical predictions.
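
As an illustration of the bag-of-events idea, the following sketch counts the events recorded within the first 24 h after admission. It is a simplified reading of the description above, not the feature pipeline of [23]; the event names, the `(hour, name)` layout and `bag_of_events` are assumptions made for the example.

```python
from collections import Counter

def bag_of_events(events, admit_hour=0.0, window_hours=24.0):
    """Count each event type recorded within the window after admission."""
    return Counter(
        name
        for hour, name in events
        if admit_hour <= hour < admit_hour + window_hours
    )

# Hypothetical stay: (hours since admission, event name).
events = [
    (1.5, "heart_rate"),
    (2.0, "heart_rate"),
    (6.0, "creatinine"),
    (30.0, "heart_rate"),  # outside the 24 h window, so it is ignored
]
boe = bag_of_events(events)
```

In a transferable setting, the event names would first be mapped to shared concept identifiers so that counts from different databases refer to the same vocabulary.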

3.1.2

Patient Representation

The purpose of learning a patient representation is to map the raw information in the patient’s medical notes to a dense vector that can be used for clinical predictions and analytics such as disease phenotyping or clustering. Miotto et al. [35] use unsupervised deep representation learning on EHR datasets to support clinical predictions and to derive a general-purpose patient representation. The study covers over 700,000 patients from the Mount Sinai data warehouse and uses a three-layer stack of denoising autoencoders, yielding so-called “deep patients”, to capture the hierarchical regularities and dependencies in the EHR data. The experimental results show that the proposed design performs significantly better than representations taken directly from raw EHR inputs. In this multi-layer network, each layer produces a higher-level representation of the patterns observed in the output of the previous layer. As each layer generates more abstract features than the layer before, the last layer outputs the final patient representation. To continue the discussion of the work of Choi et al. [26] from Sect. 3.1.1: because of the properties of Skip-Gram [4], the word vectors here support word analogy calculations that reflect both syntactic and semantic regularities of the corpus. Thus, the patient representation is generated directly by summing up all


vectors obtained by converting the medical concepts in his/her profile to medical concept vectors, which yields a single representation vector. Dligach et al. [36] consider an alternative way of learning patient representations that applies a deep neural network to text variables only. They use billing codes, for example ICD-9 or CPT, as a source of supervision to learn patient vectors. The proposed model is trained on a set of UMLS Concept Unique Identifiers (CUIs) generated from the clinical notes in a patient’s profile to predict all billing codes that might be associated with the patient. The experimental results show that the representations learnt with the new method are good enough to match existing performance on standard comorbidity detection tasks. Zhang et al. [37] propose a computational framework named Patient2Vec to learn patient representations while addressing the interpretability problem of deep learning architectures. It learns a personalized deep representation of each patient’s longitudinal EHR data. To evaluate the method, they use it to predict future hospitalizations with EHR data from real hospitals and compare its performance with baseline models. The proposed architecture consists of five parts: learning vector representations of medical codes with Skip-Gram; learning within-subsequence self-attention with a one-side convolution operation using a filter and a nonlinear activation function; learning subsequence-level self-attention with a bidirectional GRU-based RNN; constructing the aggregated deep representation by adding patient characteristics such as demographic information and static medical conditions; and predicting the outcome with a linear layer and a softmax layer. 
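
The summation of concept vectors into a patient vector, described above for the Skip-Gram based approach of [26], can be sketched in a few lines. The embedding values and code names are made up for illustration, and skipping codes without an embedding is an assumption of this sketch rather than something stated in the source.

```python
def patient_vector(codes, embeddings):
    """Sum the concept vectors of all codes in a patient's profile."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for code in codes:
        vector = embeddings.get(code)
        if vector is None:
            continue  # assumption: codes without an embedding are ignored
        total = [t + v for t, v in zip(total, vector)]
    return total

# Hypothetical two-dimensional concept embeddings.
embeddings = {"DX_A": [1.0, 0.0], "RX_B": [0.5, 2.0]}
profile = ["DX_A", "RX_B", "DX_A", "UNKNOWN"]
representation = patient_vector(profile, embeddings)
```

Because repeated codes are added repeatedly, frequently recorded concepts weigh more heavily in the resulting vector; attention-based models such as Patient2Vec replace this uniform sum with learnt weights.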
Indeed, the Patient2Vec model produces a meaningfully structured vector space and outperforms the baseline models by a significant margin. Denaxas et al. [38] propose a method for learning word embeddings of disease diagnoses and medical procedures using global vectors (GloVe), based on the UK national EHR system. They use the learnt patient representations to identify patients who are more likely to be hospitalized due to congestive heart failure. Specifically, they apply the GloVe model to four different corpora of their own creation to learn embeddings for the concepts. They then evaluate the learnt and normalized patient-level embeddings by framing the prediction of heart failure onset as a supervised binary classification task with linear SVM classifiers. The experiments show marginally improved performance on clinical prediction compared with conventional one-hot models, so the approach could help to build robust EHR-based disease risk prediction models in the near future. Wei et al. [39] propose an end-to-end clinical decision support system that uses distant supervision to generate and retrieve relevant information and literature for target patients. The experiment trains GloVe on Wikipedia texts and Word2Vec on biomedical texts in order to train a model that predicts ICD codes from raw text inputs. All raw input data are drawn from the MIMIC-III data collection. Then, the Deep Relevance Matching Model (DRMM) is adopted as a semantic matching model to learn the terms. After that, the user’s query


and the candidate documents are split into paragraphs, and the word embeddings are replaced with paragraph-level convolutional embeddings. Lastly, cosine similarity is computed as a direct semantic similarity score to produce the final results. Their experiment shows promising results, with substantial improvement in information retrieval tasks. Zhu et al. [40] introduce both supervised and unsupervised methods to evaluate patient similarity while matching the temporal properties of patients’ longitudinal EHR data. The authors define the context of a medical event by the medical events that happen before and after it, and use fixed-length vectors to express medical concept embeddings and to make further predictions. Given the embedded matrix of patient representations, supervised and unsupervised methods are used to measure similarity. Specifically, the supervised approach adopts a CNN architecture to learn an optimal representation of a patient’s medical record, mapping the outputs of the convolutional filters to a fixed-length feature vector, whereas the unsupervised approach applies the RV and dCov coefficients to capture linear and nonlinear relationships between patients, respectively. In their experiments on test data, these methods outperform the baseline models significantly, and the authors suggest directions for future study. Liu et al. [41] tackle the medical event and patient representation problem differently, distinguishing long-time-scale medical events with strong temporal patterns from short-time-scale medical events with disordered co-occurrences. To accommodate clinical events that happen on different time scales, Liu et al. propose a model that learns hierarchical representations of event sequences, which adapt to events of different time ranges and capture core temporal dependencies. 
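
To make the unsupervised similarity measure mentioned above for [40] more concrete, the following sketch computes the RV coefficient between two patient matrices using its standard definition, RV(X, Y) = tr(XXᵀYYᵀ) / sqrt(tr((XXᵀ)²) · tr((YYᵀ)²)) on column-centered matrices. The layout is an assumption of this example: rows are the shared embedding dimensions and columns are the events of each patient, so the two patients may have different numbers of events. The numbers are invented.

```python
import math

def center_columns(matrix):
    """Subtract each column's mean so that every column sums to zero."""
    means = [sum(col) / len(col) for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]

def gram(matrix):
    """Return M M^T, the matrix of inner products between rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in matrix] for r1 in matrix]

def trace_of_product(a, b):
    """Return tr(A B) for two square matrices of the same size."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))

def rv_coefficient(x, y):
    """RV coefficient of two matrices that share the same number of rows."""
    gx, gy = gram(center_columns(x)), gram(center_columns(y))
    denom = math.sqrt(trace_of_product(gx, gx) * trace_of_product(gy, gy))
    return trace_of_product(gx, gy) / denom

# Hypothetical patients: 3 embedding dimensions, 2 and 3 events respectively.
patient_a = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]]
patient_b = [[1.0, 0.0, 3.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0]]
similarity = rv_coefficient(patient_a, patient_b)
```

The coefficient lies between 0 and 1 and equals 1 when a matrix is compared with itself or with a rescaled copy of itself, which is why it captures only linear relationships; the distance covariance (dCov) coefficient is the nonlinear counterpart and is not shown here.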
In detail, the proposed model first splits the entire sequence of medical events into several groups of events with an adaptive, RNN-based event sequence segmentation module. Second, the model learns hierarchical representations of the event sequences with two different mechanisms, namely event attention with a GRU-based event group aggregation function and temporal attention with a GRU-based sequence representation. The model outperforms most existing models and gives promising results for predicting deaths and ICU admissions.

3.1.3

Clinical Abbreviation Representation

Abbreviations appear frequently in EHR systems. According to a popular online knowledge base (www.allacronyms.com), 197,787 of its 3,096,346 stored abbreviations are in the medical domain, the highest number among all ten domains [42]. Disambiguation of clinical abbreviations is a special case of word sense disambiguation. The disambiguation and expansion of clinical abbreviations in medical texts have been important tasks in medical research. Since there is no universal dictionary or set of rules for recording clinical abbreviations, healthcare providers, such as doctors, pharmacists and nurses,


tend to use their own abbreviations to denote certain diagnoses, treatments or medications in a patient’s medical profile. Studying ambiguous abbreviations thus becomes one of the most difficult tasks in EHR systems, especially in the intensive care unit (ICU), where medical notes are taken under heavy workload and time pressure. A deeper understanding of the abbreviations in clinical notes would therefore not only help medical researchers to understand diseases better, but also improve the quality of healthcare services. In [42], Liu et al. learn word embeddings for clinical abbreviation expansion by exploiting task-oriented resources. They pursue two goals: (1) to reduce misinterpretation of clinical abbreviations by normalizing all abbreviations used in ICU documentation; (2) to allow the public to better understand the abbreviations in medical free text. Specifically, based on the intuition introduced by Harris in 1954 [2], Liu et al. use word embeddings, i.e. distributional semantic representations, to learn the meaning of an abbreviation in a given medical context without labelled input data. They first use Word2Vec [3] to learn word embeddings. They then follow the traditional approach of using regular expressions to detect all medical abbreviations in the ICU notes, and generate the possible candidate expansions from a domain-specific knowledge base [42]. They compute the expansion of an abbreviation, which is a multi-word phrase in most cases, by defining each candidate Ci as the group of words in that candidate and then computing similarities. Although Liu et al. did not apply deep learning in their experiment, their method still significantly outperforms all baseline methods and achieves 82.27% accuracy. Wu et al. 
[43] examine the use of neural word embeddings in clinical abbreviation disambiguation and develop two new word embedding features, named LR_SBE and MAX_SBE, to generate word sense disambiguation representations from a large unlabelled medical corpus. Li et al. [44] proposed the surrounding based embedding (SBE) feature in 2014, which serves as the foundation for the work of Wu et al. The SBE representation of a target word is obtained by combining the embedding row vectors of all neighboring words within a window of a given size k. Building on SBE, Wu et al. assume that taking the direction of the context into account helps to learn better word representations, which motivates LR_SBE. Similarly, the MAX_SBE representation takes the maximum value of each embedding dimension over the surrounding words. The intuition presented by the authors is that the higher the score of a latent feature, the more important that feature is for the word.
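
The three surrounding-based features can be sketched as follows. This is an illustrative reading of the description above, not the authors’ code: SBE is taken to be the sum of the neighboring word vectors, LR_SBE is assumed to keep separate sums for the left and right context and concatenate them, and MAX_SBE takes the element-wise maximum. The tokens and embedding values are invented.

```python
def context(tokens, index, k):
    """Split the k-word window around tokens[index] into left and right parts."""
    return tokens[max(0, index - k):index], tokens[index + 1:index + 1 + k]

def vector_sum(words, embeddings, dim):
    """Sum the embeddings of the given words, skipping unknown words."""
    total = [0.0] * dim
    for word in words:
        if word in embeddings:
            total = [t + v for t, v in zip(total, embeddings[word])]
    return total

def sbe(tokens, index, k, embeddings, dim):
    """Sum of the embeddings of all words in the window."""
    left, right = context(tokens, index, k)
    return vector_sum(left + right, embeddings, dim)

def lr_sbe(tokens, index, k, embeddings, dim):
    """Assumed LR_SBE: left-context sum concatenated with right-context sum."""
    left, right = context(tokens, index, k)
    return vector_sum(left, embeddings, dim) + vector_sum(right, embeddings, dim)

def max_sbe(tokens, index, k, embeddings, dim):
    """Element-wise maximum over the embeddings of the words in the window."""
    left, right = context(tokens, index, k)
    vectors = [embeddings[w] for w in left + right if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [max(column) for column in zip(*vectors)]

# Hypothetical note fragment in which "ra" is the ambiguous abbreviation.
tokens = ["pt", "has", "ra", "and", "pain"]
embeddings = {"pt": [3.0, 3.0], "has": [1.0, 0.0], "and": [0.0, 2.0], "pain": [-1.0, -1.0]}
features = {
    "SBE": sbe(tokens, 2, 2, embeddings, 2),
    "LR_SBE": lr_sbe(tokens, 2, 2, embeddings, 2),
    "MAX_SBE": max_sbe(tokens, 2, 2, embeddings, 2),
}
```

Each feature vector would then be passed to a word sense classifier that chooses among the candidate expansions of the abbreviation.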

3.2 Information Extraction (IE)

Information extraction is considered a classical and widely used task in NLP. It refers to a set of techniques for extracting target structured information from unstructured natural (human) language texts. Although studies of and attention to information extraction tasks have increased in the past due to the availability of the


data volume, the development of information extraction still remains confined to narrow and restricted domains because of its relatively high degree of difficulty. Information extraction in the medical domain has always been one of the most important tasks, especially since the adoption of electronic health record systems. In healthcare, extracting information from a patient’s EHR profile involves learning and extracting medical information from doctors’ notes, ambulance records, prescriptions, etc. for use in machine learning or deep learning models. This is necessary not only for health practitioners to learn about a patient’s health condition more efficiently and thoroughly, but also for researchers, scholars and policy makers in the healthcare sector to understand diseases and patient groups better in order to provide better diagnoses, treatment, intervention and even prevention. In the EHR system, text-based clinical notes are among the most important and informative resources for studying patients. However, as mentioned before, the biggest challenge is that doctors, nurses, physiotherapists and pharmacists normally complete their sections of the same patient’s file separately, without referring to each other’s notes. On the one hand, it is time-consuming and unnecessary for them to flip through the documents; on the other hand, each health practitioner has his/her own preferred medical jargon, which is hard to align. These different medical wordings and representations therefore make it difficult to align information in the patient’s EHR. In other words, the most challenging task in current EHR systems is to align EHRs that are structurally different. 
In fact, aligning the information in medical profiles is already time-consuming, let alone creating a pipeline for processing these EHR notes, extracting terms, and learning or aligning embeddings over years and across hospitals or other health institutions. It is therefore necessary for current research to determine what entities mean in different contexts. In general, there are three ways of performing information extraction on medical corpora. The traditional way is to follow a rule-based approach, with rules written manually; this is both financially costly and time-consuming. The second way is to use traditional machine learning approaches. These are more efficient than the rule-based approach but still require a certain degree of human involvement, so they can be costly as well. A more effective and efficient way of extracting information has been developed recently, namely the deep learning-based approach. Many recent studies on medical information extraction use this third approach, as it promises lower cost and minimal human intervention.

3.2.1

Named Entity Recognition

Current approaches to entity recognition fall into three major groups: the classical rule-based approach, which usually applies keyword matching and assigns document-level labels; the RNN approach, which requires large datasets of annotated entities; and the transfer learning approach,


which uses language modeling for biomedical named entity recognition (NER). To reduce the dependence on large numbers of annotated entities as a prerequisite for training, most recent studies adopt the second and the third approaches. Gligic et al. [45] introduce a transfer learning approach that addresses the dependency of neural network models on large labelled datasets and the data scarcity issue in named entity recognition. Specifically, neural networks, as a more robust structure, are applied to all datasets released by I2B2 (2007–2012). Both CBOW and Continuous Skip-Gram (CSG) are used to train embeddings that are fed into three term classification architectures, namely a context-free feedforward NN, a context-aware feedforward NN, and an RNN-based LSTM model. The method first extracts information on medications, dosages, modes, frequencies, durations and reasons individually, and then studies the relationships between them, comparing a sequence-to-sequence bidirectional RNN model comprising one hundred GRUs with a bidirectional LSTM encoder-decoder framework. Sachan et al. [46] tackle the problem by using unlabelled text data to improve NER models. Specifically, they train a bidirectional language model (BiLM) on unannotated data from PubMed abstracts and transfer its weights to pretrain an NER model with the same architecture as the BiLM. This pretraining gives the NER model a better parameter initialization, improving both the F1 scores and the speed of convergence with less training data. Hence, the transferred weights, along with pretrained word embeddings, allow the authors to train the supervised NER task end to end. Gorinski et al. 
[47] take a different perspective by comparing three systems: (1) the rule-based EdIE-R, (2) EdIE-N, a bidirectional Long Short-Term Memory network combined with a conditional random field, and (3) the transfer learning based SemEHR with GATE Bio-YODIE. They evaluate the named entity recognition performance of these three architectures on brain imaging reports of stroke patients. By working with a common dataset, the experiment is able to identify the advantages and disadvantages of the three systems. It also allows rules to be constructed and the performance of each system to be evaluated empirically. The authors conclude that although machine learning approaches can be easier to adopt, the rule-based hand-crafted system remains the most accurate and trustworthy means of labeling EHR contents automatically. Other related research on entity extraction from genomic data includes Yin et al. [48], Huang et al. [49, 50], and An et al. [51]. The work of An et al. [51], built on top of [48–50], proposes a new metric to evaluate the novelty and relevancy of a medical term in information retrieval, based on the aspect-level performance measure provided by the TREC Genomics Track. The experimental results show that the proposed geNov metric is superior to the existing metrics in capturing novelty, redundancy and relevancy in the ranking process. Moreover, it is sensitive to the novelty and relevancy of a medical term, and its three parameters can be tuned according to different evaluation requirements.
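
Comparisons of NER systems such as those above are typically reported in terms of precision, recall and F1 over predicted entities. The following sketch shows one common way to compute these scores with exact span matching; the `(start, end, label)` tuple format and the example annotations are assumptions for illustration, and the cited studies may use different matching criteria.

```python
def precision_recall_f1(gold, predicted):
    """Entity-level scores, counting a prediction as correct only on an exact match."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical annotations for one report: (start token, end token, label).
gold_entities = [(0, 2, "stroke"), (5, 6, "tumour")]
system_entities = [(0, 2, "stroke"), (8, 9, "stroke")]
scores = precision_recall_f1(gold_entities, system_entities)
```

Running the same function on the output of each system over a common test set gives directly comparable numbers.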


3.2.2


Relationship Extraction

Both entity recognition and relationship extraction are standard tasks in natural language processing. In biomedical research, besides named entity recognition, it is also necessary to extract the relationships between biomedical entities from texts. Much of the existing literature applies feature-based pipeline models to relationship extraction, which can cause problems such as error propagation, subtasks that are solved without interaction, and heavy feature engineering work. To overcome these issues, deep learning based natural language processing techniques are commonly applied. Li et al. [52] present a competitive and effective neural joint model for relationship extraction that minimizes the work on feature engineering. The approach first uses a CNN to encode the characters of each word into a character-level representation. Then, the character-level representation, word embeddings and part-of-speech embeddings are fed into an RNN-based BiLSTM model to learn entity representations and the related context for medical entity recognition. After that, a second BiLSTM model learns the relationship representation along the shortest dependency path (SDP) between the two target entities and feeds the relationship classifier. The parameters of the LSTM units in the two BiLSTM models are shared, so the entity recognition and relation classification tasks influence each other during training. Mehryary et al. [53] also propose an extraction approach based on an LSTM model over syntactic dependency graphs, with a Skip-Gram model to obtain word embeddings. Similar to the work of Li et al., Quan et al. [54] presented a multichannel CNN in 2016 for automatic relation extraction in the medical domain, tackling drug-drug interaction (DDI) extraction and protein-protein interaction (PPI) extraction problems. 
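
The shortest dependency path between two entities, which the joint model of [52] uses as input for relation classification, can be found with a breadth-first search over the dependency tree treated as an undirected graph. The sketch below assumes the parse is already available as (head, dependent) pairs; the sentence, its hand-written parse and the function name are hypothetical.

```python
from collections import deque

def shortest_dependency_path(edges, source, target):
    """Breadth-first search over an undirected view of the dependency edges."""
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, []).append(dependent)
        graph.setdefault(dependent, []).append(head)
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # the two words are not connected

# Hand-written, simplified parse of "DrugA increases the effect of DrugB".
edges = [
    ("increases", "DrugA"),
    ("increases", "effect"),
    ("effect", "the"),
    ("effect", "DrugB"),
    ("DrugB", "of"),
]
path = shortest_dependency_path(edges, "DrugA", "DrugB")
```

The words on the path (here the two entities and the connecting words) would be embedded and passed to the second BiLSTM, while words off the path such as the determiner are dropped.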
The proposed method also eases the complicated feature engineering work through CNN-based automated feature learning. CBOW was used in the study to learn embeddings from the entire Medline corpus, while all other word embeddings are taken from the study of Pyysalo et al. [55]. As a next step, five versions of word embeddings from PubMed, PMC, MedLine and Wikipedia are combined in the multichannel word embedding input layer. The model with multichannel word embeddings outperformed the best existing DDI extraction models by 5.1%. In the convolutional layer, filters with different window sizes are applied to the embeddings to extract n-gram features. The max pooling layer then selects the most important local features while effectively reducing the feature dimensions. Finally, the softmax layer performs the relationship classification based on the pooled features. Cheng et al. [56] focus on extracting medical information from patients’ EHRs for disease phenotyping. The authors construct a temporal matrix representation for each patient in the EHR system, with time on one dimension and events on the other. A four-layer CNN is then used to extract phenotypes and predict future medical events. Specifically, the first layer of the framework consists of the temporal EHR matrices. The second layer then performs


a one-side convolution to extract features from the first layer. Similar to the works presented above, the max pooling layer on the third level discards sparse data points and keeps the most important ones. Lastly, a fully connected layer with a softmax activation function outputs the predictions. Zeng et al. [57] study the relationship between medical notes in the EHR and the identification of distant recurrences of breast cancer. To avoid relying on manual chart reviews to discover distant recurrences of breast cancer, they design a hybrid model that works with clinical narratives and structured data from the EHR system only. Specifically, the model first extracts features from medical narratives with MetaMap, while retrieving the structured EHR data from the system directly. Second, a support vector machine with a linear kernel is adopted as the prediction model to identify patients with potential distant recurrences of breast cancer. The model involves four baseline classifiers that learn from different types of features, both structured and unstructured, in order to achieve the best results. Overall, the model gives promising results for identifying distant recurrences of breast cancer by combining features extracted from unstructured clinical notes with structured EHR data. Galko et al. [58] study relationship extraction on a broad scale by retrieving relevant passages from publicly available data and BioASQ tasks. In other words, they retrieve passages in a question answering setting. In detail, they use neural network word embeddings to propose a weighting scheme for cosine distance retrieval. The paper first projects the terms into semantically meaningful vector spaces learnt with Word2Vec or GloVe.
Thus, both the query questions and the retrieved passages are represented as fixed-length vectors. Terms in the space can then be compared with one another using cosine distance. Lastly, with the given representation and similarity measure, the passages are clustered and ranked to generate the final results. The proposed cosine distance text matching scheme is reported to outperform traditional models significantly, and future work may apply it to a broader range of topical domains. Li et al. [59] also use a convolutional neural network and distributed semantic representations for binary event relation extraction. Specifically, the study employs a CNN with a convolutional layer and a max pooling layer to model raw inputs represented as word embeddings from medical texts. As a result, the most important features are generated automatically by the max pooling layer and feed directly into the relation extraction task.
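The convolution and max pooling steps that recur in the CNN models above can be illustrated with a minimal sketch. It is not the implementation of any cited paper: the two-dimensional embeddings, the single bigram filter, and the function names conv1d_ngram and max_over_time are assumptions chosen purely for illustration. Each window of consecutive word vectors yields one n-gram score, and max pooling keeps the strongest score per filter.

```python
# Toy illustration only: embeddings and filter weights are made-up values.

def conv1d_ngram(embeddings, kernel, bias=0.0):
    """Slide one filter over consecutive word vectors (one n-gram per window).

    embeddings: list of word vectors (each a list of floats).
    kernel: list of `window` weight vectors with the same dimension as the words.
    Returns one ReLU-activated score per n-gram window.
    """
    window = len(kernel)
    scores = []
    for start in range(len(embeddings) - window + 1):
        total = bias
        for offset in range(window):
            word_vec = embeddings[start + offset]
            weights = kernel[offset]
            total += sum(w * x for w, x in zip(weights, word_vec))
        scores.append(max(0.0, total))  # ReLU non-linearity
    return scores


def max_over_time(scores):
    """Keep only the strongest response of a filter across the sentence."""
    return max(scores) if scores else 0.0


# Four words with hypothetical 2-dimensional embeddings.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# One bigram filter (window size 2).
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]

feature_map = conv1d_ngram(sentence, bigram_filter)  # one score per bigram
pooled = max_over_time(feature_map)                  # single feature per filter
```

In a full model, several filters with different window sizes would each contribute one pooled value, and the concatenated values would be passed to a softmax classifier.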

3.2.3

Medical Event Extraction

Within the literature on deep learning NLP techniques for medical information extraction, event extraction holds an important place. A medical event refers to a change recorded in a patient’s medical records. Such medical events can be insightful and useful for discovering abnormal

clinical decisions and applications that could cause serious problems, such as a patient’s adverse reactions to certain treatments and medications. Rahul et al. [60] apply a bidirectional recurrent neural network (RNN) to sequence labeling for medical event extraction and for understanding unstructured clinical texts in the EHR system. The proposed RNN model avoids time-consuming handcrafted features generated by NLP toolkits and extracts higher-level features directly from the sentences in the text corpus, achieving comparable F1-scores on the Multi-Level Event Extraction (MLEE) corpus. The input layer uses embedded representations of words, and higher-level feature representations are learnt layer by layer until the final classification. Specifically, in the input feature layer, the proposed method extracts two types of features for each word in the text corpus, namely the word and the entity. Then, in the embedding layer, each input feature is mapped to a dense feature vector for the next layer to use. In the bidirectional RNN layer, each word is processed by both a forward and a backward RNN to capture representations of the past and of the future. In this way, the entire context is learnt within the neural network. Similarly, Jagannatha et al. [61] in 2016 tackle the EHR semantic understanding problem through sequence labeling for medical event extraction. Conditional Random Fields (CRF) are used as the baseline model in the experiments. Initially, to ensure unbiased representations of infrequent words, the system trains word vectors on a large corpus with the Skip-Gram technique in the embedding layer. Then, once the words are mapped to their corresponding vectors, they are fed into a bidirectional long short-term memory (LSTM) model that is trained in both the forward and backward directions.
The output of the bidirectional LSTM layer, which combines the representations of the words and their context, is then fed into feedforward neurons with softmax functions that produce the label probabilities. The paper also uses another recurrent architecture, the GRU [62], in the same structure as the LSTM to train on the input data. The experiments show that RNN models in general are valuable techniques for extracting medical events from large EHR corpora. The improvements achieved by these models, especially the GRU, suggest that the ability of RNN models to remember information across different ranges and dependencies of context is very important for effective information extraction.
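The idea of reading a sentence in both directions can be sketched as follows. This is a deliberately simplified illustration, not the model of Rahul et al. [60] or Jagannatha et al. [61]: it uses a scalar Elman-style RNN instead of an LSTM or GRU, and the weights w_in and w_rec, the input values, and the function names are all hypothetical.

```python
import math


def rnn_pass(inputs, w_in, w_rec):
    """Run a scalar Elman RNN over `inputs`; return the hidden state per step."""
    hidden = 0.0
    states = []
    for x in inputs:
        hidden = math.tanh(w_in * x + w_rec * hidden)
        states.append(hidden)
    return states


def bidirectional_states(inputs, w_in=1.0, w_rec=0.5):
    """Pair each token's forward (past) state with its backward (future) state."""
    forward = rnn_pass(inputs, w_in, w_rec)
    # Process the reversed sequence, then flip back to align with token order.
    backward = rnn_pass(inputs[::-1], w_in, w_rec)[::-1]
    return list(zip(forward, backward))


# One hypothetical scalar feature per token of a three-token sentence.
token_features = [0.2, -0.1, 0.7]
states = bidirectional_states(token_features)
```

The forward state of the first token sees only that token, while its backward state already summarizes the rest of the sentence. A sequence labeler would feed each (forward, backward) pair to a softmax layer to score the event labels for that token.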

3.2.4

Generalization and Summarization

Natural language generation is the NLP task of producing natural language from structured data sources such as a knowledge base or a logical form. The technique can be applied to long texts, such as content summaries and news reports, or to short texts, such as product descriptions on online shopping websites. In the past few years, DL approaches have made huge progress on language generation tasks. Typically, a natural language generation model is trained as an end-to-end NN consisting of an encoder and a decoder. The

encoder produces the hidden representation of the source text, while the decoder generates the target text. Choi et al. [63] apply natural language generation techniques to tackle the problem of data scarcity. They propose a deep learning based generative adversarial network (GAN) model to synthesize data in EHR systems. The model learns the distributions of count-valued and binary-valued variables with two neural networks. The first generates fake records, while the second distinguishes real records from fake ones. The advantage of this system is that the GAN can generate the patient-level records needed for a study while keeping patients’ personal information private. However, the GAN system proposed by Choi et al. can only produce discrete data and cannot produce the free-text EHR records that are valuable to the research community. Lee [64] introduces an end-to-end DL encoder-decoder algorithm that builds synthetic chief complaints from discrete EHR variables, including age, gender and exercised diagnosis. These synthetic chief complaints benefit from the model’s optimization process, which eliminates comparatively uncommon medical abbreviations and misspellings, and they protect patients’ privacy because they preserve no personally identifiable information (PII). The chief complaints are preprocessed with an LSTM model to downsize the word embedding matrix. The encoder is a single feedforward layer that compresses the records for the LSTM cell, while the decoder is a single-layer LSTM following Vinyals et al. [65].
Following the same concept, a word embedding matrix transforms the complaints from integer sequences into dense vector sequences, while a feedforward layer with a softmax activation function delivers the final output of predicted word probabilities. Besides natural language generation, text summarization is also a common problem and research subfield in natural language processing. Text summarization refers to creating a brief, accurate and fluent summary of a longer text. Summarizing text automatically makes it easier to discover and extract relevant information and to consume it more efficiently. In general, there are two ways of summarizing texts in natural language processing. The first is extraction-based summarization, which covers the algorithms and models that pull key terms and phrases out of the source document and join them fluently into a summary. The second is abstraction-based summarization, which relies on paraphrasing and shortening the information contained in the source documents. In other words, abstractive summarization methods can create or rewrite terms, phrases and sentences, as human beings do, to relay the most important information from the source documents. However, extractive summarization algorithms are still more popular, as abstraction-based ones are more difficult to develop and adopt. Liu et al. [66] apply extractive summarization to EHR data in the medical domain. They use an unsupervised pseudo-labeling approach to

study how to make use of the intrinsic correlation between different data in the EHR. Their method generates pseudo-labels and trains supervised models without any external source of annotated data. To find the subset that best summarizes a patient’s entire record, they train a supervised model without direct human annotations. The intrinsic correlation between medical notes and the patient is then used to find pseudo-labels and produce summaries that answer the three research questions they pose [66]. The system answers RQ1 by learning the clinical entities related to a specific disease. For RQ2, the model generates binary label vectors for notes and applies Integer Linear Programming to train on data with pseudo-labels while optimizing the results. For RQ3, the medical records are summarized by a supervised neural model, a two-layer bidirectional GRU. Overall, the study confirms the effectiveness of the proposed model for text summarization by showing that it outperforms existing unsupervised baselines. Future improvements could further help physicians understand patients’ medical histories while reducing clinical costs.
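To make the extractive approach concrete, the sketch below scores each sentence by the average corpus frequency of its words and keeps the top-scoring sentences in their original order. It is a generic frequency-based baseline, not the pseudo-labeling method of Liu et al. [66]; the sample note, the scoring rule, and the function name extractive_summary are assumptions for illustration.

```python
import re
from collections import Counter


def extractive_summary(text, num_sentences=1):
    """Return the `num_sentences` highest-scoring sentences, in document order."""
    # Split after sentence-final punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        # Average word frequency, so long sentences are not favored by length alone.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore original order
    return " ".join(sentences[i] for i in keep)


# Hypothetical three-sentence clinical note.
note = ("Patient reports chest pain. "
        "Chest pain worsens with exertion. "
        "Family visited today.")
summary = extractive_summary(note, num_sentences=1)
```

Because the output reuses source sentences verbatim, no new wording is generated; an abstractive system would instead need a decoder that can produce text not present in the note.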

3.2.5

Information Extraction on Specific Disease

Datta et al. [67] released a scoping review in 2019 of the existing medical NLP techniques applied to cancer research. It aims to provide a resource of cancer frame annotations as well as related general-purpose natural language processing tools. The paper summarizes the trending NLP techniques that can learn useful cancer-related features from EHR systems across a wide range of data collections. By reviewing 79 papers, the authors create frame semantic principles with pertinent information including cancer diagnosis, tumor descriptions, cancer procedure, breast cancer diagnosis, prostate cancer diagnosis and pain in prostate cancer patients [67]. The study finds that most recent work specializes in information extraction for treatment and breast cancer diagnosis, while cancer diagnosis is the most common focus of the reviewed papers, with 36 out of 79.

3.3 Clinical Predictions (CP)

Clinical prediction is inherently uncertain, as it is a probability problem. Clinical predictions are supervised tasks performed by researchers, physicians or other healthcare providers to forecast future outcomes. Specifically, they go through both the structured and unstructured data in a patient’s EHR profile and estimate the probability of a specific disease or outcome by leveraging the learnt representations, signs, symptoms and codes. A calculated probability score is then given to the

patient to predict the likelihood of a certain disease, diagnosis or outcome. The probability scores provided by physicians depend heavily on their clinical experience. Computer-based technologies that provide data to physicians therefore act as assistants to the physician in the prediction process. These technologies include, but are not limited to, understanding medical codes, reading clinical notes, interpreting time-series data, and handling medical scans. In the existing literature, clinical prediction tasks fall into two subfields: general clinical predictions and disease-specific predictions.

3.3.1

General Clinical Predictions

Zeng et al. [68] review and compare the traditional NLP methodologies and the DL-based NLP techniques used for disease phenotyping in EHR systems in the past few years. The paper gives a thorough review of current applications of EHR-based computational phenotyping, as well as NLP-based computational phenotyping methods. On the one hand, traditional keyword search and rule-based approaches give promising results for the prediction task; however, these methods require costly manual work. Supervised machine learning models that can classify data patterns and structures are popular among researchers instead. On the other hand, as DL methods grow in importance in natural language processing, more studies adopt deep learning approaches because of their power to generate novel phenotypes. Besides discussing the opportunities and challenges that remain in the field, the paper concludes that combining multiple sources of data from the EHR system generally produces better performance. Rajkomar et al. [69] present a deep learning-based representation of a patient’s entire medical record in the EHR system using the Fast Healthcare Interoperability Resources (FHIR) format. Using this sequential format and de-identified patient data, they study 46,864,534,945 data points from 216,221 adult patients who had been hospitalized for at least 24 h in two American academic medical centers. Since patient records vary in length and density, the authors propose three different deep learning models to tackle the issue. The first is an LSTM-based RNN; the second is an attention-based time-aware NN; and the third is a NN built with boosted time-based decision stumps.
The experimental results show that the proposed method outperforms the traditional predictive models used in current clinical studies and can accurately predict multiple clinical events across multiple medical centers without site-specific data harmonization. In the future, this method could be extended to a variety of scenarios owing to its accuracy and scalability. Zhang et al. [70] present a novel meta-learning approach, named MetaPred, to predict clinical risks from longitudinal patient EHRs. MetaPred uses a set of related risk prediction tasks to train the meta-learner to learn a good predictor

for target risks where patient data are limited. Specifically, the MetaPred framework builds on the model-agnostic meta-learning strategy to generate a risk predictor for a specific domain. The meta-learners can directly serve as inputs to the risk prediction function, while the limited target data can further boost model performance through fine-tuning. The risk prediction models used in the experiments are built on either a CNN or an LSTM-based RNN. Experiments on real patient data provided by Oregon Health and Science University show that the CNN-based and RNN-based MetaPred predictors significantly outperform other predictors trained with limited samples. Hosseini et al. [71] introduce HeteroMed, a heterogeneous information network approach for accurate and robust clinical diagnosis prediction from the high-dimensional data and abundant relationships within EHR data. The model uses heterogeneous network embedding to capture higher-level semantic relationships between words and terms in the EHR system for disease diagnosis, while handling missing values and heterogeneous data directly. Furthermore, its joint embedding framework tailors the representations of medical events to the goal of disease diagnosis. As the first study to model clinical data and disease diagnosis with a Heterogeneous Information Network (HIN), HeteroMed achieves significantly better results than existing work in diagnosis code extraction and disease prediction. Avati et al. [72] present Survival-CRPS, a scoring rule that generalizes the continuous ranked probability score (CRPS) to survival prediction, along with two variants for right-censored and interval-censored data. In addition, to evaluate the quality of event predictions over time, they propose the Survival-AUPRC evaluation metric, which computes a precision-recall-like curve.
To demonstrate the effectiveness of these techniques, the paper runs experiments on EHR data with a multilayer deep RNN as the prediction model for patient survival. The model takes a sequence of features as input and predicts the mortality probability over time. Extensive experiments show that the proposed RNN method with log-normal parameterization performs strongly on large-scale survival prediction. Chung et al. [73] narrow the scope of research from large populations to individualized, reliable, patient-centric prediction models. The proposed framework aims to extract useful information from the EHR system to make predictions and to provide tailored clinical services for disease diagnosis, treatment, intervention and prevention with time-series data. The framework consists of two parts: (1) a global section that captures trends across various groups of patients; and (2) an individualized section that models tailored services for each patient. To combine the two sections, an RNN that captures global patient trends and a Gaussian process model that captures individual patient characteristics are built together on a deep RNN foundation to make clinical predictions more accurate and credible. Heo et al. [74] introduce the notion of input-dependent uncertainty to the attention mechanism, observing that attention can sometimes be unreliable when it is generated by weakly supervised networks. Indeed, the newly

proposed notion can assign attention to each feature by effectively learning the input noise level. The framework is based on a stochastic attention mechanism, in which attentions are generated with input-adaptive Gaussian noise and variational inference. An attentional RNN model with both timestep attention and feature attention is then adopted to learn the prediction probabilities. As a result, the uncertainty-aware attention mechanism performs significantly better than the baseline models on the training datasets. Wang et al. [75] combine supervised learning and reinforcement learning to generate treatment recommendations for patients. They present a novel architecture, Supervised Reinforcement Learning with Recurrent Neural Network (SRL-RNN), an off-policy actor-critic framework that handles the complicated relationships between different types of data in the EHR system. The indicator signal and the evaluation signal co-supervise the actor in SRL-RNN to generate effective prescriptions and a low mortality rate. In real-world medical settings, states are never fully observed. The RNN is therefore adopted to handle the resulting Partially-Observed Markov Decision Process (POMDP). The paper conducts experiments on MIMIC-III data, and the results show that the proposed architecture achieves good accuracy in matching doctors’ prescriptions while lowering the estimated mortality rate. Pham et al. [76] present DeepCare, an end-to-end DNN that interprets clinical records, stores a patient’s full medical history, infers current medical conditions and makes clinical predictions based on the given information. DeepCare is built on an LSTM recurrent neural network that can store memories of the applicable experiences.
At the micro data level, DeepCare uses the LSTM to read the given input, update the memory cell and output a representation of each care episode. It also suggests medical interventions to help patients with current illness and future clinical risks. At the macro health-state level, DeepCare aggregates the recorded health states with multiscale temporal pooling and feeds them into a deep dynamic neural network for future estimations. Experiments on two chronic conditions, diabetes and mental health, show that the proposed method can model disease progression, recommend clinical interventions, improve general modeling and make accurate clinical predictions. Ma et al. [77] stress the importance of incorporating prior medical knowledge into risk prediction tasks. In this study, they introduce PRIME, a deep learning framework that uses posterior regularization to incorporate prior knowledge into the predictive models. Specifically, the paper introduces the PRIMEr and PRIMEc models, based on LSTM and CNN respectively, to carry out the prediction steps. Moreover, the prior knowledge is applied in the risk prediction model without human intervention; in other words, the knowledge does not need to be processed by humans to set bounds on the disease distribution. By modeling the prior knowledge with a log-linear model, the PRIME framework can even learn the importance of each piece of prior knowledge automatically.
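The read-input, update-memory, emit-output cycle attributed to the LSTM in DeepCare above follows the standard LSTM cell equations, which can be sketched in scalar form. This is a textbook LSTM step, not DeepCare itself (which adds further mechanisms); the parameter values, the observation sequence, and the names lstm_step and params are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    params maps each gate name to a tuple (w_x, w_h, b).
    Returns the new hidden state h and the new memory cell c.
    """
    def pre(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much old memory to keep
    i = sigmoid(pre("input"))         # how much new information to write
    o = sigmoid(pre("output"))        # how much memory to expose
    g = math.tanh(pre("candidate"))   # candidate content from the current input
    c = f * c_prev + i * g            # update the memory cell
    h = o * math.tanh(c)              # output representation for this step
    return h, c


# Hypothetical parameters shared by all gates, for illustration only.
params = {gate: (0.5, 0.5, 0.0)
          for gate in ("forget", "input", "output", "candidate")}

h, c = 0.0, 0.0
for observation in [1.0, 0.0, -1.0]:  # e.g. one scalar feature per care episode
    h, c = lstm_step(observation, h, c, params)
```

When the forget gate saturates near 1 and the input gate near 0, the cell carries its previous content forward almost unchanged, which is what lets such models retain information about earlier episodes.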

Suresh et al. [78] focus on real-time clinical predictions from intensive care unit (ICU) data in MIMIC-III. Unlike previous studies, this work integrates ICU data from all available sources to learn insightful representations for predicting clinical interventions. In particular, the authors compare the two most commonly used approaches, LSTM and CNN, on five clinical intervention tasks: invasive ventilation, non-invasive ventilation, vasopressors, colloid boluses, and crystalloid boluses [78]. The experiments show strong results compared with other state-of-the-art work. Choi et al. [31] propose an RNN model for clinical prediction. This RNN-based model takes historical ICD codes from the EHR system as raw inputs to perform multilabel predictions over a period of time. Specifically, the model predicts the patient’s possible future visits, future diagnoses made by physicians, and future medication use. Choi et al. apply skip-gram embeddings to the ICD codes to initialize the inputs of the recurrent neural network. These high-dimensional input vectors are projected to a lower-dimensional space by an RNN with gated recurrent units. Finally, the patient’s potential next visit is predicted by a rectified linear unit (ReLU), while diagnosis codes and medication codes are predicted by a softmax layer. Thus, the medical concepts and the patients are better represented in the proposed architecture. The experiments also suggest the potential of adapting RNN-based models to other medical systems through transfer learning, which gives medical systems with insufficient patient data an opportunity to improve clinical predictions for their smaller patient base. Lasko et al. [79] propose a computational phenotype discovery method for EHR clinical data.
Since medical data in EHR systems are unstructured, noisy and sparse, the method adopts a deep learning approach with longitudinal probability densities inferred from Gaussian process regression to study these clinical data. As a result, the study produces continuous phenotypic features that accurately indicate multiple population subtypes in the data collection. In mid-2019, Liang et al. [80] propose a deep belief network-based model to tackle computer-aided medical decision making (CAMDM), with a focus on clinical decision support and medical data analysis in traditional Chinese medicine. The model uses a seven-layer deep belief network (DBN), trained without supervision, to learn feature representations, followed by a supervised support vector machine (SVM) on top of the DBN. The experimental results suggest that the DBN + SVM model outperforms simple decision tree and SVM models in computer-aided medical decision-making tasks. Earlier, in 2014, Liang et al. [81] used a convolutional deep belief network trained on electronic medical records to support clinical decision making. The experiments were run on a hypertension dataset retrieved from an HIS system and on a dataset of Chinese medical diagnoses and treatment prescriptions from a manually converted EHR system. The model performs significantly better than conventional shallow models in discovering previously unknown medical concepts.
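The two output heads described above for the RNN model of Choi et al. [31], a softmax over codes and a ReLU for the time of the next visit, can be sketched on top of a given hidden state. This is an illustration under stated assumptions rather than the authors’ implementation: the hidden state, the weight matrices, the three-code vocabulary, and the function name output_heads are all hypothetical.

```python
import math


def softmax(logits):
    """Convert scores to probabilities; subtract the max for numerical stability."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def relu(z):
    return max(0.0, z)


def output_heads(hidden, code_weights, time_weights):
    """Map one hidden state to code probabilities and a non-negative time value."""
    code_logits = [sum(w * h for w, h in zip(row, hidden)) for row in code_weights]
    time_raw = sum(w * h for w, h in zip(time_weights, hidden))
    return softmax(code_logits), relu(time_raw)


hidden_state = [0.5, -1.0]                           # hypothetical RNN output
code_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three hypothetical codes
time_weights = [2.0, 3.0]

code_probs, time_to_next_visit = output_heads(hidden_state, code_weights, time_weights)
```

The softmax head yields a distribution over codes that sums to one, while the ReLU head clips negative values to zero so that the predicted time until the next visit is never negative.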

3.3.2

Specific Disease Predictions

Apart from studies on general disease, clinical or patient trend predictions, much of the existing literature narrows its focus to predicting specific diseases. Among these, diabetes has been a very popular subject in the past few years. Mei et al. [82] take raw data from the EHR system as input to build their “Deep Diabetologist” model, which uses an RNN to model sequential EHR data. The goal of their study is to generate personalized clinical predictions, specifically for hypoglycemic medications, for diabetic patients. Data preprocessing links patient IDs with event IDs. An RNN medication prediction model then generates the probabilities, followed by a hierarchical RNN medication prediction model that follows those time steps. The hierarchical RNN model significantly outperforms the baseline models and provides useful insights for physicians and researchers conducting secondary studies. While most existing work takes raw EHR data as input, Sousa et al. [83] study chronic diabetes using only financial records from health plan providers to predict the evolution of the disease. They regard financial data as a way of aligning with the international standard, in which records can encode medical procedures. The experiment uses a self-attentive RNN model, which is expected to select the most relevant sentences. Specifically, the input embedding layer is pretrained with Word2Vec Skip-Gram. The embedding layer is then connected to two fully connected layers of a BiLSTM model with a self-attention mechanism. The experimental results suggest that this is an effective way of predicting diabetes; however, a full paper on the task had yet to be published at the time of writing.
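The way an attention mechanism selects the most relevant positions from recurrent hidden states can be sketched as attention-weighted pooling. The scoring function here (a dot product with a single query vector) is an assumption, since the source does not specify how Sousa et al. [83] compute attention; the hidden states, the query, and the function names are likewise hypothetical.

```python
import math


def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each hidden state by its similarity to `query` and sum them."""
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)  # attention distribution over positions
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context


# Hypothetical BiLSTM outputs for three positions, plus a learned query vector.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]

weights, context = attention_pool(hidden_states, query)
```

Positions whose hidden state aligns best with the query receive the largest weight, so the pooled context vector is dominated by them, and the weights themselves indicate which parts of the record influenced the prediction.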

4 Challenges and Remaining Problems

Section 3 gives a clear picture of the current state of the published literature applying deep learning-based natural language processing techniques to the medical domain. It is also worth reviewing the problems and challenges in this research that remain to be solved. The challenges include, but are not limited to:
(1) Data Volume: Deep learning-based techniques need a huge amount of data to perform well. In the healthcare sector, the volume of data available for research is limited by the poor accessibility of primary healthcare in certain areas and by the fact that most patients regard medical records as highly private information and are unwilling to share them.
(2) Data Variability: Collecting a wide and unbiased variety of data is difficult.

(3) Data Quality: Unlike data generated in other domains, healthcare data are by nature “dirty”: heterogeneous, unstructured, noisy, incomplete and ambiguous. Preprocessing such data for deep learning models is challenging and sometimes very time-consuming.
(4) Uncertainty: Diseases and viruses develop and evolve in uncertain patterns all the time. Designing deep learning based natural language processing techniques that account for this temporal characteristic of the data is therefore important.
(5) Causal Inference: Identifying a sound causal relationship between viruses and diseases, or between treatments and patients’ bodily reactions, is never easy. Kale et al. [84] proposed a DNN approach to the causal inference issue that analyzes the relationships between hidden feature representations and generated outputs.
(6) Interpretability: Although DL approaches have delivered promising results in the medical domain, interpretability remains one of their biggest challenges. Scholars often refer to deep learning methods as a black box, because it is hard to interpret how and why the proposed algorithms perform so well. Since medical results are closely tied to matters of life and death, it is still hard to convince healthcare providers to do exactly what the machines recommend.
(7) Legal and Ethical Issues: As Choi et al. [63] discussed in their 2018 paper, data privacy and synthetic patient EHRs are rising issues in the medical domain. In many countries, patients’ EHR records are confidential and may not be shared across health institutions or across industries. Research institutions, including government ones, usually find it hard to study ongoing diseases because real patient data are in the hands of hospitals. Thus, as the owners of the data, hospitals usually need to form their own teams to conduct research in a specific medical domain.
However, these legal and ethical restrictions on data sharing limit not only the scale of studies but also the diversity of the data, leading to weaker experimental results. A few papers have focused on this issue. For example, Choi et al. [63] address limited data availability by proposing a novel deep learning approach, medical GAN (medGAN), to generate synthetic patient records for medical research. Given real patients’ medical records, the proposed model generates high-dimensional discrete variables by combining autoencoders and GANs. Furthermore, they use minibatch averaging to avoid mode collapse, and they improve learning efficiency with batch normalization and shortcut connections. In summary, the approach produces synthetic patient records that achieve performance comparable to real data on many medical prediction tasks.
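One plausible reading of the minibatch averaging mentioned above is that each record shown to the discriminator is accompanied by the mean of its minibatch, so that a generator producing near-identical records becomes easy to detect. The sketch below illustrates only that augmentation step; it is an assumption about the mechanism, not code from Choi et al. [63], and the function names and toy batches are hypothetical.

```python
def minibatch_average(batch):
    """Feature-wise mean of a batch of equal-length records."""
    n = len(batch)
    dim = len(batch[0])
    return [sum(record[d] for record in batch) / n for d in range(dim)]


def augment_with_batch_mean(batch):
    """Append the batch mean to every record (hypothetical discriminator input)."""
    mean = minibatch_average(batch)
    return [record + mean for record in batch]


# A collapsed generator emits the same binary record every time...
collapsed_batch = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]
# ...while a diverse batch has fractional feature means.
diverse_batch = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]

collapsed_input = augment_with_batch_mean(collapsed_batch)
diverse_input = augment_with_batch_mean(diverse_batch)
```

In the collapsed batch the appended mean is itself a binary vector identical to every record, a pattern that rarely occurs in real data, whereas the diverse batch yields fractional means closer to what real minibatches would show.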

5 Conclusion and Direction of Future Research

This book chapter provides a thorough review of deep learning-based NLP techniques applied to clinical research. It presents the current state of deep learning-based NLP models and their recent changes, and reviews their adoption in specific medical NLP tasks. Unlike other existing surveys, we use a novel categorization that splits the published deep learning-based NLP techniques for clinical decision making into three major task-oriented groups: representation learning, information extraction and clinical predictions. From the experimental results presented in this literature, we believe that it is still early for deep learning approaches to revolutionize the healthcare industry, owing to inherent challenges such as uncertainty, interpretability and ethical issues. However, recent advances made by deep learning-based NLP models suggest a promising start. Further research in various directions in the medical domain is therefore necessary. Possible directions for future work include, but are not limited to:
(1) Feature enrichment: To address the data volume and variability problems, we should enrich models by capturing as many features as possible to obtain a good representation of patients, and build more robust models to process the growing number of features.
(2) Privacy control: Governments could collect medical data and provide inference on it at the federal level, protecting patients’ privacy while allowing necessary research projects to proceed efficiently.
(3) Incorporating expert knowledge into current deep learning approaches: Owing to the challenges presented above and the limited amount of diverse data, human experts will continue to play a dominant role in the healthcare sector in the near future.
Therefore, incorporating the invaluable expert knowledge into current deep learning processes can not only produce better results, but also train the machines to learn in a more accurate way. And, (4) Improving model interpretability: The performance of the DL models and the interpretability of the model performance are equally important in the healthcare sector. It is a serious ethical problem for healthcare providers to adopt a system if they do not understand. Therefore, in the future studies, it is necessary to find logical and reasonable explanations about how and why the black box of the DNN can perform well on given tasks. Acknowledgements This work is supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada, an NSERC CREATE award in ADERSIM,1 the York Research Chairs (YRC) program and an ORF-RE (Ontario Research Fund-Research Excellence) award in BRAIN Alliance.2

1 http://www.yorku.ca/adersim
2 http://brainalliance.ca


R. Zhu et al.

References 1. Hinton, G.E., Mcclelland, J.L., Rumelhart, D.E.: Distributed representation. https://web. stanford.edu/jlmcc/papers/PDP/Chapter3.pdf 2. Harris, Z.S.: Distributional structure. Word (1954) 3. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013) 4. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: arXiv preprint. arXiv:1301.3781 (2013) 5. Chalapathy, R., Borzeshi, E.Z., Piccardi, M.: Bidirectional LSTM-CRF for clinical concept extraction. arXiv. https://arxiv.org/abs/1611.08373v1. (2016) 6. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: European Conference on Machine Learning (ECML), pp. 1532–1543 (2014) 7. Kiela, D., Grace, E., Joulin, A., Mikolov, T.: Efficient large scale multi-modal classfication. arXiv. http://arxiv.org/pdf/1802/02892.pdf 8. Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. NIPS. http://papers.nips.cc/paper/4824-imagenet-classification-with-deepconvolutional-neural-networks.pdf 9. Mikolov, T.: Statistical language models based on neural networks. Ph.D. thesis, Brno University of Technology (2012) 10. Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F, Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724–1734, Doha, Qatar. Association for Computational Linguistics, Oct 2014b. https://doi.org/10.3115/v1/d14-1179 (2014) 11. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735 (1997) 12. 
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv. https://arxiv.org/pdf/1810.04805.pdf 13. Better Language Models and Their Implications. https://openai.com/blog/better-languagemodels/ 14. Hinton, G.E.: Learning distributed representations of concepts. In: Proceedings of the Eighth Conference Cognitive Science Society, pp. 1–12. (1986) 15. Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 137–1155 (2003) 16. Bengio, Y.: Neural net language models. Scholarpedia 3(1), 3881 (2008) 17. Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Patt. Anal. Mach. Intell. 35(8), 1798–1828 (2013) 18. Hinton, G.E., Osindero, S., Teh, Y.: A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554 (2006) 19. Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep networks. In: Proceedings of the Neural Information and Processing Systems (2006) 20. Ranzato, M., Poultney, C., Chopra, S., LeCun, Y.: Efficient learning of sparse representations with an energy-based model. In: Proceedings of the Neural Information and Processing Systems (2006) 21. Lee, H., Ekanadham, C., Ng, A.: Sparse deep belief net model for visual area V2. In: Proceedings of the Neural Information and Processing Systems (2007) 22. Bengio, Y.: Learning Deep Architectures for AI. Foundations and Trends in Machine Learning 2(1), 1–127 (2009) 23. Gong, J.J., Naumann, T., Szolovits, P., Guttag, J.V.: Predicting clinical outcomes across changing electronic health record systems. In: International Conference on Knowledge Discovery and Data Mining (KDD). ACM, pp. 1497–1505 (2017)

Using Deep Learning Based Natural Language …

291

24. Choi, T., Xiao, C., Stewart, W.F., Sun, J.: MiME: multilevel medical embedding of electronic health records for predictive healthcare. arXiv. https://arxiv.org/pdf/1810.09593.pdf 25. Escudie, J.-B., Saade, A., Coucke, A., Lelarge, M.: Deep representation for patient visits from electronic health records. arXiv. https://arxiv.org/pdf/1803.09533.pdf 26. Choi, E., Schuetz, A., Steward, W.F., Sun, J.: Medical concept representation learning from electronic health records and its application on heart failure prediction. arXiv. https://arxiv. org/abs/1602.03686 (2017) 27. De Vine, L., Zuccon, G., Koopman, B., Sitbon, L., Bruza, P.: Medical semantic similarity with a neural language model. In: Proceedings of the 23rd ACM International conference on Information and Knowledge Management-CIKM ‘14, 3–7 Nov 2014, Shanghai, China, pp. 1819–1822. ACM, New York, NY, USA 28. Choi, E., Chiu, C.Y., Sontag, D.: Learning low-dimensional representations of medical concepts. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001761/pdf/2381736.pdf (2016) 29. Minarro-Gim ÃÅenez, J.A., Mar Ãѱn-Alonso, O., Samwald, M.: Exploring the application of deep learning techniques on medical text corpora. Studies in health technology and informatics (2013) 30. Liu, J., Zhang, Z., Razavian, N.: Deep EHR: chronic disease prediction using medical notes. arXiv. https://arxiv.org/pdf/1808.04928.pdf (2018) 31. Choi, E., Bahadori, M.T., Schuetz, A., Stewart, W.F., Sun, J.: Doctor AI: predicting clinical events via recurrent neural networks. arXiv. https://arxiv.org/abs/1511.05942 (2016) 32. Choi, E., Bahadori, M.T., Searles, E., Coffey, C., Thompson, M., Bost, J., Tejedor-Sojo, J., Sun, J.: Multi-layer representation learning for medical concepts. In: Proceedings of the 22nd ACM SIGKDD International Conference Knowledge Discovery and Data Mining—KDD ’16’, 13–17 Aug 2016, San Francisco, CA, USA, pp. 1495–1504. ACM, New York, NY, USA (2016) 33. 
Li, C., Song, R., Liakata, M., Vlachos, A., Seneff, S., Zhang, X.: Using word embedding for bio-event extraction. In: Proceedings of the 2015 Workshop on Biomedical Natural Language Processing (BioNLP 2015), Beijing, China, 30 July 2015, pp. 121–126. Association for Computational Linguistics, Stroudsburg, PA (2015) 34. Tang, B., Cao, H., Wang, X., Chen, Q., Xu, H.: Evaluating word representation features in biomedical named entity recognition tasks. Biomed. Res. Int. 2014, 1–6 (2014). https://doi. org/10.1155/2014/240403 35. Miotto, R., Li, L., Kidd, B.A., Dudley, J.T.: Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. Sci. Rep. 6, 26094 (2016). https://doi.org/10.1038/srep26094 36. Dligach, D., Miller, T.: Learning patient representations from text. ARXIV. https://arxiv.org/ pdf/1805.02096.pdf 37. Zhang, Z., Kowsari, K., Harrison, J.H., Lobo, J.M., Barnes, L.E.: Patient2Vec: a personalized interpretable deep representation of the longitudinal electronic health record. arXiv. https:// arxiv.org/pdf/1810.04793.pdf 38. Denaxas, S., Stenetorp, P., Riedel, S., Pikoula, M., Dobson, R., Hemingway, H.: Application of clinical concept embeddings for heart failure prediction in UK EHR data. arXiv. https:// arxiv.org/pdf/1811.11005.pdf 39. Wei, X., Eickhoff, C.: Embedding electronic health records for clinical information retrieval. arXiv. https://arxiv.org/pdf/1811.05402.pdf 40. Zhu, Z., Yin, C., Qian, B., Cheng, Y., Wei, J., Wang, F., Measuring patient similarities via a deep architecture with medical concept embedding. arXiv. https://arxiv.org/pdf/1902.03376. pdf 41. Liu, L., Li, H., Hu, Z., Shi, H., Wang, Z., Tang, Z., Zhang, M.: Learning hierarchical representations of electronic health records for clinical outcome prediction. arXiv. https://arxiv. org/pdf/1903.08652.pdf 42. 
Liu, Y., Ge, T., Mathews, K., Ji, H., McGuinness, D.: Exploiting task-oriented resources to learn word embeddings for clinical abbreviation expansion. In: Proceedings of the 2015 Workshop on Biomedical Natural Language Processing (BioNLP 2015), Beijing, China, 30 July 2015. Association for Computational Linguistics, Stroudsburg, PA, pp. 92–97 (2015)

292

R. Zhu et al.

43. Wu, Y., Xu, J., Zhang, Y., Xu, H.: Clinical abbreviation disambiguation using neural word embeddings. In: Proceedings of the 2015 Workshop on Biomedical Natural Language Processing (BioNLP 2015), Beijing, China, 30 July 2015. Association for Computational Linguistics, Stroudsburg, PA, pp. 171–176 (2015) 44. Li, C., Ji, L., et al.: Acronym disambiguation using word embedding. In: Proceedings of the 29th AAAI Conference on Artificial Intelligence (2014) 45. Gligic, L., Kormilitzin, A., Goldberg, P., Nevado-Holgado, A.: Named entity recognition in electronic health records using transfer learning bootstrapped neural networks. arXiv. https:// arxiv.org/pdf/1901.01592.pdf 46. Sachan, D.S., Xie, P., Sachan, M., Xing, E.P.: Effective use of bidirectional language modeling for transfer learning in biomedical named entity recognition. arXiv https://arxiv.org/pdf/1711. 07908.pdf (2018) 47. Gorinski, P.J., Wu, H., Grover, C., Tobin, R., Talbot, C., Whalley, H., Sudlow, C., Whiteley, W., Alex, B.: Named entity recognition for electronic health records: a comparison of rule-based and machine learning approaches. arXiv. https://arxiv.org/pdf/1903.03985.pdf 48. Yin, X., Huang, X.J., Li, Z., Zhou, X.: A survival modeling approach to biomedical search result diversification using wikipedia. IEEE Trans. Knowl. Data Eng. (TKDE) 25(6), 1201–1212 49. Huang, X., Zhong, M., Si, X.: York University at TREC 2005: genomics track. In: Proceedings of the Fourteenth Text REtrieval Conference (TREC), Gaithersburg, Maryland, USA, 15–18 Nov (2005) 50. Huang, X., Hu, Q.: A bayesian learning approach to promoting diversity in ranking for biomedical information retrieval. In: Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 307–314. Boston, MA, USA, 19–23 July (2009) 51. 
An, X., Huang, X., geNov: a new metric for measuring novelty and relevancy in biomedical information retrieval (Special Issue on Biomedical Information Retrieval). Nov 2017, 68(11), 2620–2635 (2017) 52. Li, F., Zhang, M., Fu, G., Ji, D.: A neural joint model for entity and relation extraction from biomedical text. BMC Bioinformatics 18, 1 (2017). https://doi.org/10.1186/s12859017-1609-9 53. Mehryary, F., Bjo¨rne, J., Pyysalo, S., Salakoski, T., Ginter, F.: Deep learning with minimal training data: TurkuNLP entry in the BioNLP shared task 2016. In Proceedings of the 4th BioNLP Shared Task Workshop, 13 Aug 2016, Berlin, Germany, pp. 73–81. Association for Computational Linguistics, Stroudsburg, PA (2016) 54. Quan, C., Hua, L., Sun, X., Bai, W.: Multichannel convolutional neural network for biological relation extraction. Biomed. Res. Int. 2016, 1–10 (2016). https://doi.org/10.1155/2016/ 1850404 55. Pyysalo, S., Ginter, F., Moen, F., Salakoski, T.: Distributional semantics resources for biomedical text processing. In: Proceedings of the Languages in Biology and Medicine (LBM ’13), pp. 39–44, Tokyo, Japan, Dec 2013 (2013) 56. Cheng, Y., Wang, F., Zhang, P., Hu, J.: Risk prediction with electric health record: a deep learning approach. SDM 2016. https://astro.temple.edu/tua87106/sdm16.pdf (2016) 57. Zhang, Z., Roy, A., Li, X., Espino, S., Clara, S., Khan, S., Luo, Y.: Using clinical narratives and structured data to identify distant recurrences in breast cancer. arXiv. https://arxiv.org/ pdf/1806.04818.pdf 58. Galk´o, F., Eickhof, C.: Biomedical question answering via weighted neural network passage retrieval. arXiv. https://arxiv.org/pdf/1801.02832.pdf 59. Li, H., Zhang, J., Wang, J., Lin, H., Yang, Z.: DUTIR in BioNLP-ST 2016: utilizing convolutional network and distributed representation to extract complicate relations. In: Proceedings of the 4th BioNLP Shared Task Workshop, 13 Aug 2016, Berlin, Germany, pp. 93–100. 
Association for Computational Linguistics, Stroudsburg, PA (2016) 60. Rahul, P.V.S.S., Sahu, S.K., Anand, A.: Biomedical event trigger identification using bidirectional recurrent neural network based models. arXiv. https://arxiv.org/abs/1705.09516v1 (2017)

Using Deep Learning Based Natural Language …

293

61. Jagannatha, A.N., Yu, H.: Bidirectional RNN for medical event detection in electronic health records. In: Proceedings of the Conference Association for Computational Linguistics. North American Chapter. Meeting. See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5119627/ 62. Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: encoder-decoder approaches. arXiv e-prints. 2014 Sep. 1409:arXiv:1409.1259 (2014) 63. Choi, E., Biswal, S., Malin, B., Duke, J., Stewart, W.F., Sun, J.: Generating multi-label discrete electronic health records using generative adversarial networks. arXiv. https://arxiv.org/abs/ 1703.06490v1 (2017) 64. Lee, S.: Natural language generation for electronic health records. arXiv. https://arxiv.org/ pdf/1806.01353.pdf 65. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on 2015 Jun 7, pp. 3156-3164. IEEE (2015) 66. Liu, X., Xu, K., Xie, P., Xing, E.: Unsupervised pseudo-labeling for extractive summarization on electronic health records. arXiv. https://arxiv.org/pdf/1811.08040.pdf 67. Datta, S., Bernstam, S.V., Roberts, K.: A frame semantic overview of NLP-based information extraction for cancer-related EHR notes. arXiv. https://arxiv.org/pdf/1904.01655.pdf 68. Zeng, Z., Deng, Y., Li, X., Naumann, T., Luo, Y.: Natural language processing for EHR-based computational phenotyping. arXiv. https://arxiv.org/pdf/1806.04820.pdf 69. Rajkomar, A., Oren, E., Chen, K., Dai, A.M., Hajaj, N., Liu, P.J., Liu, X., Sun, M., Sundberg, P., Yee, H., et al.: Scalable and accurate deep learning for electronic health records. arXiv preprint. arXiv:1801.07860 (2018) 70. Zhang, X.S., Tang, F., Dodge, H., Zhou, J., Wang, F.: MetaPred: meta-learning for clinical risk prediction with limited patient electronic health records. arXiv. https://arxiv.org/pdf/1905. 03218.pdf 71. 
Hosseini, A., Chen, T., Wu, W., Sun, Y., Sarrafzadeh, M.: HeteroMed: heterogeneous information network for medicaldiagnosis. arXiv., https://arxiv.org/pdf/1804.08052.pdf 72. Avati, A., Duan, T., Jung, K., Shah, N.H., Ng, A.: Countdown regression: sharp and calibrated survival predictions. arXiv. https://arxiv.org/pdf/1806.08324.pdf 73. Chung, I., Kim, S., Lee, J., Hwang, S.J., Yang, E.: Mixed effect composite RNN-GP: a personalized and reliable prediction model for healthcare. arXiv. https://arxiv.org/pdf/1806. 01551.pdf 74. Heo, J., Lee, H.B., Kim, S., Lee, J., Kim, K.J., Yang, K., Hwang, S.J.: Uncertainty-aware attention for reliable interpretation and prediction. arXiv. https://arxiv.org/pdf/1805.09653. pdf 75. Wang, L., Zhang, W., He, X., Zha, H.: Supervised reinforcement learning with recurrent neural network for dynamic treatment recommendation. arXiv. https://arxiv.org/pdf/1807.01473.pdf 76. Pham, T., Tran, T., Phung, D., Venkatesh, S.: DeepCare: a deep dynamic memory model for predictive medicine. arXiv. https://arxiv.org/abs/1602.00357v2 (2016) 77. Ma, F., Gao, J., Suo, Q., You, Q., Zhou, J., Zhang, A.: 2018 risk prediction on electronic health records with prior medical knowledge. In: KDD ’18: The 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 19–23 Aug 2018, London, United Kingdom. ACM, New York, NY, USA, p. 10. https://doi.org/10.1145/3219819.3220020 78. Suresh, H., Hunt, N., Johnson, A., Celi, L.A., Szolovits, P., Ghassemi, M.: Clinical intervention prediction and understanding with deep neural networks. In: Machine Learning for Healthcare Conference, pp. 322–337 (2017) 79. Lasko, T.A., Denny, J.C., Levy, M.A.: Computational phenotype discovery using unsupervised feature learning over noisy, sparse, and irregular clinical data. PLoS ONE 8, e66341 (2013). https://doi.org/10.1371/journal.pone.0066341 80. 
Liang, Z., Liu, J., Ou, A., Zhang, H., Li, Z., Huang, X.: Deep generative learning for automated EHR diagnosis of traditional Chinese medicine. Comput. Methods Progr. Biomed. 174, 17–23 (2019)

294

R. Zhu et al.

81. Liang, Z., Zhang, G., Huang, X., Hu, Q.: Deep learning for healthcare decision making with EMRs. In: Proceedings of 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 556–559 82. Mei, j., Zhao, S., Jin, F., Xia, E., Liu, H., Li, X.: Deep diabetologist: learning to prescribe hypoglycemia medications with hierarchical recurrent neural networks. arXiv. https://arxiv. org/pdf/1810.07692.pdf 83. Sousa, R.T., Pereira, L.A., Soares, A.S.: Predicting diabetes disease evolution using financial records and recurrent neural networks. arXiv. https://arxiv.org/pdf/1811.09350.pdf 84. Kale, D.C, Che, Z., Bahadori, M.T., Li, W., Liu, Y., Wetzel, R.: Causal phenotype discovery via deep networks. AMIA Annual Symposium Proceedings https://www.ncbi.nlm.nih.gov/ pmc/articles/PMC4765623/ (2015) 85. Ghassemi, M., Naumann, T., Schulam, P., Beam, A.L., Ranganath, R.: Opportunities in machine learning for healthcare. arXiv. https://arxiv.org/pdf/1806.00388.pdf (2018) 86. Lyu, X., Huser, M., Hyland, S.L., Zerveas, G., Ratsch, G.: Improving clinical predictions through unsupervised time series representation learning. arXiv https://arxiv.org/pef/1812. 00490.pdf (2018) 87. Nickel, M., Kiela, D.: Poincar\’e embeddings for learning hierarchical representations. arXiv preprint arXiv:1705.08039 (2017) 88. Greenland, S., Robins, J.M., Pearl, J.: Confounding and collapsibility in causal inference. Stat. Sci., pp. 29–46 (1999) 89. Miotto, R., Wang, F., Wang, S., Jiang, Z., Dudley, J.T.: Deep learning for healthcare: review, opportunities and challenges. Brief. Bioinform. 375, 4 (2017). https://doi.org/10.1093/bib/ bbx044 90. Wei, C.-H., Harris, B.R., Kao, H.-Y., Lu, Z.: tmVar: a text mining approach for extracting sequence variants in biomedical literature. Bioinformatics 29, 1433–1439 (2013). https://doi. org/10.1093/bioinformatics/btt156 91. 
Liu, S., Tang, B., Chen, Q., Wang, X.: Effects of semantic features on machine learning-based drug name recognition systems: word embeddings vs. manually constructed dictionaries. Information 6, 848–865 (2015). https://doi.org/10.3390/info6040848 92. Mohan, S., Fiorini, N., Kim, S., Lu, Z.: Deep learning for biomedical information retrieval: learning textual relevance from click logs. In: Proceedings of the BioNLP 2017 Workshop, Vancouver, Canada, 4 Aug 2017, pp. 222–231. Association for Computational Linguistics Stroudsburg, PA (2017) 93. Ohno-Machado, L.: Realizing the full potential of electronic health records: the role of natural language processing. J. Am. Med. Inform. Assoc. 18, 539 (2011). https://doi.org/10.1136/ amiajnl-2011-000501 94. Bruijn, Bd, Cherry, C., Kiritchenko, S., Martin, J., Zhu, X.: Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010. J. Am. Med. Inform. Assoc. 18, 557–562 (2011). https://doi.org/10.1136/amiajnl-2011-000150 95. Yoon, H.-J., Ramanathan, A., Tourassi, G.: Multi-task deep neural networks for automated extraction of primary site and laterality information from cancer pathology reports. In: Advances in big data, INNS 2016, 23–25 Oct 2016, Thessaloniki, Greece; Angelov, P., Manolopoulos, Y., Iliadis, L., Roy, A., Vellasco, M. (eds.)Advances in Intelligent Systems and Computing, vol. 529. Springer, Cham (2016) 96. Beaulieu-Jones, B.K., Greene, C.S.: Semi- supervised learning of the electronic health record for phenotype stratification. J. Biomed. Inform. 64, 168–178 (2016). https://doi.org/10.1016/ j.jbi.2016.10.007 97. Bowman, S.: Impact of electronic health record systems on information integrity: quality and safety implications. Perspect. Health Inf. Manag. 10, 1c (2013) 98. Beaulieu-Jones, B.K., Wu, Z.S., Williams, C., Byrd, J.B., Greene, C.S.: Privacy-preserving generative deep neural networks support clinical data sharing. bioRxiv https://doi.org/10. 
1101/159756 (2017) 99. Letham, B., Rudin, C., McCormick, T.H., Madigan, D., et al.: Interpretable classifiers using rules and bayesian analysis: building a better stroke prediction model. Ann. Appl. Stat. 9(3), 1350–1371 (2015)

Using Deep Learning Based Natural Language …

295

100. Robins, J.M.: Robust estimation in sequentially ignorable missing data and causal inference models. Proc. Am. Stat. Assoc. 1999, 6–10 (2000) 101. Robins, J.M., Rotnitzky, A., Scharfstein, D.O.: Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In: Statistical models in epidemiology, the environment, and clinical trials. Springer, pp 1–94 (2000) 102. Papernot, N., McDaniel, P., Sinha, A., Wellman, M.: Towards the science of security and privacy in machine learning. arXiv. https://arxiv.org/abs/1611.03814v1 (2016) 103. Xu, Z., Chou, J., Zhang, X.S., Luo, Y., Isakova, T., et al.: Identification of predictive subphenotypes of acute kidney injury using structured and unstructured electronic health record data with memory networks. arXiv. https://arxiv.org/pdf/1904.04990.pdf 104. Chou, E., Nguyen, T., Beal, J., Haque, A., Fei-Fei, L.: A fully private pipeline for deep learning on electronic health records. arXiv. https://arxiv.org/pdf/1811.09951.pdf 105. Banerjee, I., Gensheimer, M.F., Wood, D.J., Henry, S., Chang, D., Rubin, D.L.: Probabilistic prognostic estimates of survival in metastatic cancer patients (PPES-Met) utilizing free-text clinical narratives. arXiv. https://arxiv.org/pdf/1801.03058.pdf 106. Kayali, I.: Expert system for diagnosis of chest diseases using neural networks. arXiv. https:// arxiv.org/pdf/1802.06866.pdf 107. de la Torre, J., Valls, A., Puig, D.: A deep learning interpretable classifier for diabetic retinopathy disease grading. arXiv. https://arxiv.org/pdf/1712.08107.pdf 108. Holzinger, A., Malle, B., Kieseberg, P., Roth, P.M., M¨uller, H., Reihs, R., Zatloukal, K.: Towards the augmented pathologist: challenges of explainable-ai in digital pathology. arXiv. https://arxiv.org/pdf/1712.06657.pdf

Runjie Zhu is currently a Ph.D. student in the Electrical Engineering and Computer Science program at the Lassonde School of Engineering, York University. Her research interests are in information retrieval and natural language processing, with a specialization in biomedical information retrieval, electronic health records, and clinical decisions and predictions.

Xinhui Tu is currently an Associate Professor at the School of Computer, Central China Normal University. He received his Ph.D., master's and bachelor's degrees from Central China Normal University in 2012, 2006 and 2001, respectively. His current research interests include information retrieval and natural language processing. He has published more than 30 papers in leading journals and conferences, such as SIGIR and CIKM.

Jimmy Huang, School of Information Technology, holds a York Research Chair Professorship. His research focuses on information retrieval, AI and big data analytics with complex structures, and their applications to the Web and healthcare. He has published 230+ papers in top-tier venues (e.g. ACM Transactions on Information Systems, IEEE Transactions on Knowledge & Data Engineering, ACM SIGIR, CIKM, KDD, ACL, IJCAI and AAAI). His contributions in developing and applying probabilistic modeling techniques to large-scale data analysis have had a significant impact on both academia and industry. He was and will be a General Chair for the 19th CIKM and the 43rd SIGIR.

Deep Learning for Medical Image Processing

Diabetes Detection Using ECG Signals: An Overview

G. Swapna, K. P. Soman and R. Vinayakumar

Abstract Diabetes mellitus (or diabetes) is a clinical condition marked by hyperglycaemia, and it affects a large number of people worldwide. Hyperglycaemia is the condition in which a high amount of glucose is present in the blood along with a lack of insulin. The number of people affected by diabetes is increasing every year. Diabetes cannot be cured; it can only be managed. If not managed properly, it can lead to serious complications that can be fatal. Timely diagnosis of diabetes is therefore of great importance. In this chapter, we examine the effect of diabetes on cardiac health and how heart rate variability (HRV) signals indicate the existence and severity of diabetes by measuring diabetes-induced cardiac impairments. Extracting useful information from the nonstationary and nonlinear HRV signal is extremely challenging. We review how deep learning methods perform this extraction very effectively, identifying the correlation between the presence of diabetes and HRV signal variations accurately and quickly. We discuss several deep learning architectures that can be used effectively for HRV signal analysis for the purpose of detecting diabetes. Deep learning methods are the state of the art for understanding and analysing fine deviations from the normal in HRV signals. Deep learning networks can be developed into a scalable framework that processes large amounts of data in a distributed manner. This can be followed by the application of distributed deep learning algorithms to learn the patterns and thereby make correct predictions about the future progress of the disease. At present, there is no publicly available dataset of normal and diabetic HRV. If large amounts of private diabetic and normal HRV data can be made available, deep learning networks can give the authorities different kinds of statistics from the stored data and projections of the future prognosis of diabetes.

G. Swapna (corresponding author) · K. P. Soman · R. Vinayakumar
Amrita School of Engineering, Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics, Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_14


Keywords ECG · Diabetes · Machine learning · Heart rate variability · Deep learning · Cardiovascular autonomic neuropathy

1 Introduction Biosignals are biological signals extracted from the human body or in general human beings. Commonly referred biosignals are electrical in nature, but there are nonelectrical biosignals also. Some examples of biosignals are electrocardiography (ECG) which measures electrical activity of the heart, electroencephalography (EEG) which measures brain activity, photoplethysmography (PPG) which depicts the volumetric changes of an organ. ECG signal is employed for the noninvasive diagnosis of diabetes. ECG is used by clinicians to electrically measure rhythm of the heart, attaching electrodes to the skin surface. ECG depicts the complete electrical patterns of the heart including atrial depolarization and ventricular repolarization. Heart rate variability (HRV) data is extracted out of ECG signal. HRV is a simple, but powerful signal which clearly reflects the condition of cardiovascular system. From the initial days, biosignals are processed and analysed mainly through extracting features and then classifying them. These processes are performed by developing computer-aided design (CAD) systems. The features are manually selected and need to be optimal since identification of suitable features requires domain knowledge. The performance of above-mentioned approaches is not satisfactory as the complexity of the data increases. The analysis of complex, high dimensional, real-world data can be effectively done using deep learning. Deep learning is done using deep learning architectures typically made up of very large number of hidden layers and containing millions of neurons interconnected in a structure similar to a 2D matrix. These complex networks are capable of handling and analysing complex, very large sized and very high dimensional data. Raw data (or data undergone very little signal processing) can be directly fed into these networks. 
Each layer of the network produce at its output, representations which are automatically designed by the deep learning network, using a general learning method (in place of manually decided feature extraction in the case of machine learning based typical neural networks, which are very small sized and of very simple structure compared to deep learning networks). Though deep learning networks are commonly used for two-dimensional image analysis problems, it can be very effectively used for one-dimensional data also. We review application of deep learning based methods to one-dimensional HRV data. The main bottle neck of applying deep learning to biosignal in general is the present non-availability of very large sized training data belonging to medical domain which is required for training deep learning networks having gigantic number of parameters. Diabetes mellitus, which is usually called diabetes, is a long-term metabolic disorder wherein the body is incapable of metabolizing glucose (sugar) properly. This creates a very high level of glucose in the blood (this condition is known by the term hyperglycaemia). Insulin is a hormone that is necessary for the body cells to absorb

Diabetes Detection Using ECG Signals: An Overview

301

blood glucose (produced from the carbohydrates in the food we intake) and to store glucose for future needs. The condition of diabetes is either because of the incapability of the body to generate sufficient insulin or because of the state where body cells do not react to the generated insulin. Medically, there is no cure for diabetes. Hence it should be properly controlled. Below are the different types of diabetes. Type 1 diabetes is the name of the diabetes found in children. In type 1 diabetes, the immune system of the body destroys its own beta cells resulting in deficiency of insulin. Type 2 diabetes is the common type of diabetes that develops in adults usually above the age of 40. The cells generally become insensitive to the insulin produced or the cells are unable to use the produced insulin properly. This is known as insulin resistance. Gestational diabetes is the glucose intolerance developed during pregnancy period. Out of these three types, type 2 diabetes is the most commonly prevalent type. In this chapter, we mean type 2 diabetes by the word diabetes. A 2017 statistics estimates that 8.8% of people worldwide have diabetes. It is rising more alarmingly in underdeveloped countries. According to National Diabetes Statistics Report 2017 (pertaining to United States), about 9.4% of U.S. population has diabetes in 2015. Of these, about 23% were not aware or did not report having diabetes (diabetes was undiagnosed for them). As per the statistics of International Diabetes Federation, India has a diabetes population of 6.9 crores. India is the country having the second largest diabetes population in the world. Kerala is one among the states having the largest number of diabetes affected people in India. As per the new statistics of Indian Medical Association (IMA), in Kerala every year, 138 people are being newly diagnosed by diabetes out of a population of 1000 people. 
The World Health Organization (WHO) has summarized some of the consequences of diabetes as follows. In 2015, approximately 1.6 million deaths globally were directly caused by diabetes. Almost 50% of these deaths occur before the age of 70 because of increased blood sugar. Diabetes causes damage to nerves, known as diabetic neuropathy. It increases the risk of heart ailments and stroke, and about 50% of people with diabetes die of heart-related complications. Diabetes can lead to limb amputation caused by neuropathy in the feet. Another complication is diabetic retinopathy, in which the heavy damage caused by diabetes to the blood vessels in the retina may affect vision (10% of diabetic people) and may even lead to blindness (2% of diabetic people). Kidney failure is the cause of death in about 15% of diabetic people on average. Thus, over time, uncontrolled diabetes seriously damages many vital parts of the body, such as the heart, blood vessels, kidneys (nephropathy), nerves, feet and eyes. Deaths from diabetes are mainly due to the complications caused by the disease. A less severe form of hyperglycaemia is known as impaired glucose tolerance. This condition carries a high risk of large blood vessel disease and may lead to complications such as myocardial infarction. Unlike diabetes-induced hyperglycaemia, impaired glucose tolerance does not lead to microvascular disease to any considerable extent.

G. Swapna et al.

All the above data and reports underline both the necessity and the challenges of developing effective methods for diabetes detection and management. Some of the symptoms of hyperglycaemia due to diabetes are excessive urine excretion, intense thirst, hunger and fatigue. Weight loss and impaired vision are also likely. In terms of diagnosis, the major challenge is that these symptoms are not very marked at the onset of diabetes. They become pronounced only after diabetes has worsened to the point of causing complications. To minimize such complications, early detection of diabetes is important. Methods should be developed that help to prevent or delay diabetes, together with effective ways of diagnosing and treating it. A further challenge is to develop cost-effective methods capable of predicting diabetes well in advance, so that corrective steps and treatment can be given in time to avert the disease and spare the person the serious complications to which undetected or poorly managed diabetes can lead. Here, we review methods for the non-invasive, high-accuracy diagnosis of diabetes using HRV signals derived from ECG signals. Heart-rate-based diabetes detection has been observed to be computationally more efficient than the decision theoretic approach and has therefore been heavily explored. Initially, machine learning techniques were used extensively for HRV-based diabetes detection. Deep learning methods are now increasingly used in healthcare analytics, and deep learning architectures have the potential to improve the accuracy of diabetes detection by capturing minute variations in the ECG. A further big stride possible in the future is the prediction of diabetes, provided that sufficiently large amounts of training and testing data are made available.
In this chapter, Sects. 2 and 3 discuss the relevant medical aspects of diabetes and its detection methods. Sections 4 and 5 detail the machine learning and deep learning methods used by researchers for diabetes detection. Section 6 gives a detailed literature survey of works using ECG-derived HRV as input for diabetes detection. A sample architecture and implementation details are described in Sect. 7. The limitations and challenges of deep learning methods are discussed in Sect. 8. The chapter concludes with Sect. 9.

2 Diabetes

2.1 Diabetes and Its Associated Mechanism

Glucose homeostasis is the natural regulation mechanism by which the body maintains blood glucose (blood sugar) levels within a narrow range. Diabetes refers to a group of conditions in which the blood glucose balance of the body has gone out of control. For proper functioning of the body, blood glucose values have to stay strictly within a very narrow range, between 70 mg/dl and 110 mg/dl (mg is milligram and dl is decilitre). The pancreatic endocrine hormones insulin and

glucagon make this happen. Insulin and glucagon are vital hormones secreted by the pancreatic islet cells in response to the blood sugar level, but they act in opposite ways. The beta cells of the pancreas secrete insulin. Glucose is the main source of energy for the body cells, but it is a large molecule that cannot pass through the cell membrane by simple diffusion. Insulin enables the transport of glucose into the cells. A very low base level of insulin is always secreted. When we eat, carbohydrates are converted to glucose and most of it enters the blood. When blood glucose is high, a proportional amount of insulin is produced. In the presence of insulin, the cells of the body can absorb glucose from the blood, which reduces the blood glucose level. The cells use the absorbed glucose as energy for carrying out their functions. When blood glucose falls to the normal level, insulin secretion also drops back to the base level. Thus, high blood glucose serves as a signal to the pancreas to release insulin into the blood. If the blood glucose level remains high even after absorption by the cells, insulin facilitates the storage of the excess glucose in the liver cells in the form of glycogen, by a process called glycogenesis. The alpha cells of the pancreas secrete glucagon, whose action is opposite to that of insulin. Glucagon production is inversely related to the amount of blood glucose: if blood glucose is high, no glucagon is produced; if blood glucose is low (for example, when there is a long gap after a meal), a large amount of glucagon is secreted. Glucagon induces the liver to release its stored glucose by converting glycogen to glucose, a process called glycogenolysis, which raises the blood glucose level. Glucagon also induces the liver and some muscle cells to produce glucose from other nutrients such as protein. These processes are summarized in Fig. 1.

2.2 Types of Diabetes

Type 1, type 2 and gestational diabetes are the commonly seen categories of diabetes. Type 1 is mainly found in children. It is characterized by the inability of the body to generate insulin, mainly because of autoimmune damage to the insulin-producing beta cells in the pancreas. People with this type of diabetes have to depend on insulin injections throughout their lives; otherwise, complications will occur due to the increased blood glucose. People with type 1 diabetes commonly show symptoms of rapid weight loss, polydipsia (abnormally high thirst), polyuria (production of large amounts of urine) and the associated nocturia (a tendency to urinate more often during the night). Ketone bodies are present in the urine (a condition known as ketonuria).

Fig. 1 Mechanism of maintaining desired blood glucose levels

Table 1 Important distinguishing features of type 1 and 2 diabetes

Feature | Type 1 | Type 2
Age of the start of disease | | 50 years
Duration of symptoms | Weeks | Months to years
Body weight | Normal or low | Above normal
Ketonuria | Present | Absent
If insulin treatment is not given | Can lead to rapid death | Does not pose immediate threat to life
Complications at the time of diagnosis | No | Around 25%
Family history of diabetes | Need not be there | More likely to be there

Type 2 diabetes is a state of decreased sensitivity to the action of insulin. Diabetic patients need external insulin support to maintain the proper balance of blood glucose. If not treated properly, the diabetes is likely to progress. Type 2 is the most prevalent type of diabetes (Table 1).

Gestational diabetes develops during pregnancy (gestation). Blood sugar levels that were normal before pregnancy rise beyond the allowable range. If not properly managed, it affects the pregnancy and the baby’s health. A related term is prediabetes, the condition in which sufficient insulin is produced but the body does not make proper use of it. Blood glucose levels are high in prediabetes, but not as high as in type 2 diabetes. Prediabetes indicates a high future risk of developing type 2 diabetes. Diabetes that is not treated properly results in excessively high blood glucose (hyperglycaemia), leading to complications. If people with diabetes take too much insulin, or exercise without eating enough, they can develop low blood sugar, a condition known as hypoglycaemia, which is highly life threatening.

2.3 Complications Due to Diabetes

Uncontrolled diabetes over a long duration can lead to many complications. Type 2 diabetes does not show noticeable symptoms at the initial stage. Because of this, about 25% of people already show evidence of diabetic complications at the time of diagnosis. Cardiovascular diseases account for 70% of the deaths in diabetes. Statistics from the USA indicate that, among people aged 20 and above, diabetic people have cardiovascular death rates 1.7 times higher than their non-diabetic counterparts. The chances of diabetic people being affected by myocardial infarction and stroke are 1.8 and 1.5 times higher, respectively, than those of non-diabetic people. The effects of cardiovascular risk factors such as smoking and hypertension are magnified by the presence of diabetes. Macrovascular (large blood vessel) disease caused by diabetes leads to potentially fatal complications such as angina, stroke, myocardial infarction, cardiac failure and intermittent claudication (cramping pain in the leg). Diabetic people suffer from atherosclerosis (deposition of fatty material on the inner walls of the arteries) much earlier and much more severely than non-diabetic people. Diabetes also affects the small blood vessels in the body. This condition, known as microvascular disease or diabetic microangiopathy, leads to thickening of the basement membrane of the capillaries and, in turn, to increased vascular permeability throughout the body. Retinopathy induced by diabetes is the most common form of vision-related impairment in adults. Capillary occlusion (blockage) due to hyperglycaemia increases local vascular endothelial growth factor (VEGF) in the retina. The occlusion of many capillaries leads to the growth of new vessels in the retina. Swellings called microaneurysms form in the retinal capillaries and leak fluid and blood, resulting in retinal haemorrhages. The most serious form of diabetic retinopathy is proliferative retinopathy, which, if left untreated, causes extensive visual damage in the form of retinal detachment and frequent haemorrhages.

Diabetic nephropathy refers to damage to the kidneys, which may finally lead to kidney failure. The kidney is made up of microscopic units called nephrons, which filter impurities out of the blood. Diabetes-induced hyperglycaemia impairs the filtering function of the nephrons. Diabetic nephropathy is a prominent cause of long-term kidney disease and of end-stage renal disease (ESRD), the last stage of diabetic nephropathy, in which the kidneys no longer work properly and the person cannot survive without dialysis. Diabetic neuropathy is an important cause of morbidity and mortality in diabetes. In peripheral neuropathy, the peripheral nerves are affected, resulting in problems such as deficiencies in motor and sensory functions. Weakening of the proximal muscles (muscles close to the body’s midline), abnormal gait and pain in the limbs and feet can occur. In autonomic neuropathy, parasympathetic or sympathetic nerves may be affected in many visceral systems. Autonomic neuropathy has numerous clinical features affecting different systems of the body, such as the cardiovascular system (e.g. resting tachycardia), the gastrointestinal system (e.g. constipation, abdominal fullness, nocturnal diarrhoea) and the pupillary system (e.g. reduced reflexes to light, reduced pupil size). All the complications described above are shown in Fig. 2.

2.4 Causes (Risk Factors) of Diabetes

According to epidemiological studies, overeating, underactivity and obesity may lead to diabetes in middle-aged people. People with a body mass index (BMI) larger than 30 kg/m² are 10 times more prone to developing type 2 diabetes. Middle-aged and elderly people are also at greater risk of diabetes. Ethnic origin is another major risk factor. In the USA, only 5.5% of Alaskan people are affected by diabetes, compared with 7.1% of non-Hispanic white people and 13% of non-Hispanic black people. The highest value, 33%, is found among Native Americans in the USA. These disparities between ethnic groups may be due to a variety of known and unknown factors, such as lifestyle and BMI.

2.5 Treatment and Management of Diabetes

Proper treatment and effective monitoring and control of blood glucose are essential for preventing diabetes from causing complications. A popular treatment for diabetic people is the oral intake of drugs that maintain a proper blood glucose level. Another mode of treatment is insulin injection, applied subcutaneously, commonly to the upper arms, thighs or buttocks, with a disposable plastic syringe and a sharp needle. Injections are normally given in multiple doses several times a day. In acute cases, especially in type 1 diabetes, continuous subcutaneous insulin therapy (an insulin pump) is administered. A further improvement of

Fig. 2 Complications of diabetes

the insulin pump that incorporates a closed-loop system is known as the artificial pancreas. The artificial pancreas is an integrated closed-loop system consisting of an insulin pump together with a continuous glucose monitoring system (CGMS). The overall system can be considered to comprise an interstitial glucose measurement taken every 5–15 min, a personal glucose monitor that uses the glucose information to calculate the amount of insulin to be delivered, and the insulin pump that delivers the insulin into the body. It is also important to adopt a healthy lifestyle through regular physical activity and by maintaining a proper BMI. A healthy diet is very important, and alcohol consumption, smoking and stress have to be avoided. Many of the important medical aspects discussed in this chapter are taken from the book Davidson’s Principles and Practice of Medicine [1].

3 Common Methods of Diabetes Detection

3.1 Invasive Methods of Diabetes Detection (Blood Testing)

As stated earlier, the blood glucose level has to be maintained between 70 and 110 mg/dl in the fasting condition. A value below 70 mg/dl indicates hypoglycaemia. If food has been taken within the previous two or three hours, the glucose level can exceed 110 mg/dl. Irrespective of the amount of food taken, blood sugar should not exceed 180 mg/dl in the normal case; a higher value indicates hyperglycaemia, which is indicative of diabetes. All the commonly used methods for detecting diabetes are invasive. They generally involve drawing a blood sample from the person and testing it for possible anomalies. Popular invasive tests for detecting diabetes and assessing its severity are explained below. Table 2 gives the values of these tests that indicate diabetes and prediabetes.

3.1.1

Oral Glucose Tolerance Test (OGTT)

OGTT is mainly done to check for gestational diabetes in pregnant women. The person under test is given a drink containing a prescribed amount of sugar, and blood samples are tested at prescribed time intervals. A blood glucose measurement greater than 200 mg/dl indicates the presence of diabetes. If diabetes goes undetected in a pregnant woman, it may lead to complications.

3.1.2

HaemoglobinA1c (HbA1c)

The HbA1c blood test gives the average blood sugar value for the past three months. HbA1c stands for glycated haemoglobin. Haemoglobin is a protein contained in red blood cells whose task is to carry oxygen throughout the body. Haemoglobin becomes glycated when it combines with blood glucose. An HbA1c value greater than 6.5% indicates diabetes.

Table 2 Indication of diabetes and prediabetes

Test | Indication of diabetes | Indication of prediabetes
Fasting blood sugar | ≥126 mg/dl | ≥110 mg/dl and ≤126 mg/dl
Blood sugar two hours after a 75 g oral glucose drink | ≥200 mg/dl | 140–200 mg/dl
HbA1c | ≥6.5% | 5.7–6.4%
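
The cut-offs in Table 2 can be written down as a small decision rule. The Python sketch below is illustrative only and is not a clinical tool. The function name `classify_glycaemic_status`, its parameter names, and the rule that a single value at or above a diabetes cut-off is sufficient are assumptions made for this example; they are not taken from the chapter.

```python
from typing import Optional


def classify_glycaemic_status(fasting_mg_dl: Optional[float] = None,
                              ogtt_2h_mg_dl: Optional[float] = None,
                              hba1c_percent: Optional[float] = None) -> str:
    """Return 'diabetes', 'prediabetes' or 'normal' using the Table 2 cut-offs.

    Assumption: any single supplied value at or above a diabetes cut-off
    yields 'diabetes'; otherwise any value in a prediabetes range yields
    'prediabetes'.
    """
    # (measured value, diabetes cut-off, lower bound of prediabetes range)
    checks = [
        (fasting_mg_dl, 126.0, 110.0),
        (ogtt_2h_mg_dl, 200.0, 140.0),
        (hba1c_percent, 6.5, 5.7),
    ]
    supplied = [(v, d, p) for v, d, p in checks if v is not None]
    if not supplied:
        raise ValueError("at least one measurement is required")
    if any(v >= d for v, d, _ in supplied):
        return "diabetes"
    if any(v >= p for v, _, p in supplied):
        return "prediabetes"
    return "normal"


# Hypothetical readings, for illustration only.
status = classify_glycaemic_status(fasting_mg_dl=118.0, hba1c_percent=5.9)
```

Taking the most severe category across the supplied tests keeps the rule simple; a real protocol would also specify repeat testing, which is outside the scope of this sketch.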

3.1.3

Interstitial Glucose Monitoring

This is a recently developed test that detects diabetes through interstitial continuous glucose monitoring (CGM). A tiny sensor is inserted under the skin to measure the glucose level in the interstitial fluid. A sensor can remain in place for two weeks, after which it has to be replaced. The sensor measures the glucose level in real time every one to five minutes. Over two weeks, it collects a substantial amount of data, which can be analysed to obtain information such as the daily glucose profile and the night-time glucose profile. Alarms can be incorporated into the sensor to warn the wearer when hypoglycaemia occurs.

3.2 Non-invasive Methods of Diabetes Detection (Using ECG Analysis)

3.2.1

Diabetes and Associated Cardiac Changes

Diabetes can cause severe autonomic impairment. Diabetes-induced high blood glucose (hyperglycaemia) causes cardiovascular malfunction and precapillary damage. This damage affects the normal working of the endothelial cells and blocks the normal pathway of nitric oxide (NO) [2]. NO is essential for vasodilation. Diabetes-induced hyperglycaemia reduces the activation of the phosphorylation cascade, leading to less endothelial NO synthase, which is required to synthesize NO. Diabetes thus reduces the availability of NO. The endothelial cell damage caused by diabetes leaves the blood vessels vasoconstricted and affects normal blood circulation. Hyperglycaemia results in the production of free oxygen radicals, which inactivate endothelium-derived NO, and in the activation of protein kinase C, which boosts vasoconstrictive prostanoid production [3]. Hyperglycaemia leads to endothelial damage and increases the activity and aggregability of the platelets [3, 4]. Eventually, monocytes, leukocytes and platelets adhere strongly to the endothelium. Blood coagulability increases and fibrinolytic activity decreases. Thus, because of the high blood glucose, fatty material is increasingly deposited on the inner side of the blood vessel wall. The deposits lead to blockages and hardening of the blood vessels (atherosclerosis), obstructing the flow of blood. Two major types of cardiovascular disease are coronary artery disease and cerebral vascular disease. Coronary artery disease (ischemic heart disease) is caused by thickening of the blood vessels that supply the heart, due to deposits of fatty material. The blood flow to the heart is thus decreased or blocked, leading to a heart attack. Increased blood sugar levels not only damage blood vessels but also change the level of blood lipids. Diabetic people are at least twice more likely to develop

heart disorders or stroke than non-diabetic people. Heart attacks in people with diabetes are more serious (more likely to result in death). Between 60 and 70% of diabetic patients have some form of neuropathy caused by diabetes. Diabetic neuropathy can be further grouped into autonomic, focal, peripheral and proximal neuropathy. Our focus is on the diabetic neuropathy affecting the nerves connected with the functioning of the heart, known as cardiovascular autonomic neuropathy (CAN). CAN affects heart rate and blood pressure. The high glucose level associated with diabetes causes serious problems in different organs of the body. The autonomic microvascular damage also decreases local reflexes. CAN leads to diminished HRV, which is indicative of diabetic neuropathy [5]. Diabetes-induced CAN may cause ECG alterations such as ST-T changes, sinus tachycardia, heart rate variability changes and long QTc. It has also been confirmed that QT, QTc and ST dispersions are predictors of death in diabetic patients [6, 7]. Among these ECG alterations, we concentrate on the HRV signal, which can be used for diabetes diagnosis because HRV is indicative of cardiac disorders developed due to diabetes.

3.2.2

ECG Changes Due to Diabetes

The ECG reflects the role of the autonomic nervous system (ANS) in regulating the natural rhythm of the heart. The ECG signal is generated as follows. The heartbeat originates as an electrical impulse from the sino-atrial (SA) node. This impulse contracts both atria, then activates the atrioventricular (AV) node and spreads through both ventricles. The complete activity is represented in the ECG waveform (Fig. 3).

Fig. 3 Conducting system of the heart [8]

P, QRS, T and U are the prominent electrocardiographic deflections in the ECG signal. The P wave represents the activation (depolarization) of the atria. The QRS complex represents the depolarization of the ventricles, and the T wave their repolarization. The U wave represents the repolarization of the papillary muscles. The normal duration of the P wave is 0.11 s and that of the QRS complex is 0.10 s. The normal range of the QT interval is 0.35–0.43 s and that of the PR interval is 0.12–0.20 s. A widely used configuration for ECG measurement consists of 5 electrodes, one each positioned on the left arm (LA), right arm (RA), left leg (LL), right leg (RL) and the chest, to the right of the reference electrode. Another widely used ECG capture system consists of 10 electrodes (12 leads). Stern et al. found from the ECG that a diabetic person who showed no indications of CAN developed left ventricular hypertrophy [9]. This shows the high risk that a diabetic patient will develop cardiovascular disease in the future. Stern et al. then strictly monitored the patient’s diet, took proper measures to ensure cardiac health and performed a six-year follow-up under these conditions. They observed that the person’s diabetes remained well controlled, the ECG did not change further and no further clinical or ECG signs of neuropathy appeared. The shape of the ECG indicates the cardiac health of the person [10]. The difficulty in using the ECG for analysis is that the subtle variations in the ECG waveform are extremely difficult for humans to distinguish by eye. The performance of the usual biosignal analysis methods on ECG signals is therefore not up to the mark.

3.2.3

Heart Rate Variability

The SA node functions as the pacemaker of the heart. The cardiac impulse generated there is influenced by the parasympathetic and sympathetic nervous systems, the two branches of the ANS that together control the heart rate. Cardio-acceleration is caused by enhanced activity of the sympathetic nervous system (SNS) or decreased activity of the parasympathetic nervous system (PNS). Cardio-deceleration is caused by decreased SNS or increased PNS activity. HRV signals therefore give a clear picture of the status of the ANS and of the sympathetic-parasympathetic balance. The instantaneous heart rate, decided jointly by the SNS and PNS, is strongly influenced by different kinds of neural, myocardial and hormonal factors [11]. The analysis of non-invasive HRV data has numerous applications in the clinical areas of cardiology, physiology and pharmacology. HRV-based analysis of cardiological impairment is of real significance. HRV measurements are simple and non-invasive, and they can detect impairments that have not yet reached the stage of showing clear symptoms. If an impairment is detected, the patient can then undergo detailed clinical tests. Research has shown that non-invasive HRV measurements are also reproducible if taken under standard conditions [12, 13].

The heart rate signal contains the RR interval information ordered in time. The variation of the RR intervals is known as HRV. The variations in the ANS due to hyperglycaemia are represented well by HRV signals. Shape is an irrelevant feature for the discrete HRV signal. The available HRV data (i.e. instantaneous heart rate against the time axis) can be analysed by different methods. HRV can serve as an excellent and accurate non-invasive technique for understanding the state of the ANS, which regulates cardiac activity and heart rate.
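
As a minimal illustration of how an HRV series is obtained, the Python sketch below derives RR intervals and instantaneous heart rate from a list of R-peak times. It assumes that the R peaks have already been located in the ECG; the function names and the sample times are hypothetical and chosen only for this example.

```python
def rr_intervals(r_peak_times_s):
    """Successive differences between R-peak times, in seconds."""
    return [t1 - t0 for t0, t1 in zip(r_peak_times_s, r_peak_times_s[1:])]


def instantaneous_heart_rate(rr_s):
    """Heart rate in beats per minute for each RR interval."""
    return [60.0 / rr for rr in rr_s]


# Hypothetical R-peak times in seconds (not measured data).
r_peaks = [0.00, 0.80, 1.62, 2.40, 3.25]
rr = rr_intervals(r_peaks)            # the RR series ordered in time
hr = instantaneous_heart_rate(rr)     # instantaneous heart rate series
```

The list `rr` is the discrete series whose beat-to-beat variation is referred to as HRV in the text.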

4 Machine Learning for Diabetes Detection

Before deep learning techniques emerged, biosignals were analysed mainly using machine learning (ML) techniques. ML applies artificial intelligence (AI) to systems to make them capable of learning automatically, without explicit rule-based programming and without human assistance. In the case of anomaly detection, the ML algorithm itself finds a mathematical function that produces the correct outcome (anomaly present or absent) from the input training data (data from diagnostic tests such as ECG or HRV) by learning the hidden patterns in the input data. With this learned function, it should be able to predict the output state for a new set of input data with high accuracy. Extensive domain knowledge of the human system and its intricate mechanisms, coupled with a deep understanding of the biosignal variations that occur during the anomaly, is imperative for deciding which features have to be extracted from the biosignal and analysed. The initial step is therefore the selection of features that can be used effectively for anomaly detection. These features are then extracted and fed to classifiers to detect the presence of anomalies. For diabetes detection using HRV, the initial research used time domain, frequency domain and nonlinear methods, among others. All these methods gave different parameter ranges for normal and abnormal signals. These distinctive ranges enabled classifiers to reach accuracies above 85%. The nonlinear methods are particularly suited to biosignals such as ECG, which are inherently nonlinear and nonstationary. The important methods of HRV analysis for diabetes detection using ML techniques are discussed briefly below. The features belonging to the domains described below are then passed through suitable classifiers.

4.1 Time Domain Methods

Time domain measures involve statistical operations such as calculating the mean and variance of the RR intervals of the HRV data. Important time domain parameters are the average heart rate, RMSSD (root mean square of successive RR interval differences) and SDNN (standard deviation of normal-to-normal RR intervals). Parameters like RMSSD are indicators of high frequency changes affecting heart rate and thus reflect the state of parasympathetic activity. The shortcoming of time domain measurements is that they are easily affected by outliers and artifacts. These artifacts therefore have to be eliminated before the data are analysed.
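
The three time domain parameters named above can be computed in a few lines. The following Python sketch assumes an artifact-free RR series in milliseconds; the function name, the sample values and the use of the sample (n − 1) standard deviation for SDNN are choices made for this illustration.

```python
import math
import statistics


def time_domain_features(rr_ms):
    """Mean heart rate, SDNN and RMSSD from RR intervals in milliseconds."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]   # successive differences
    return {
        "mean_hr_bpm": 60000.0 / statistics.fmean(rr_ms),
        "sdnn_ms": statistics.stdev(rr_ms),             # sample standard deviation
        "rmssd_ms": math.sqrt(statistics.fmean([d * d for d in diffs])),
    }


# Hypothetical RR intervals in milliseconds (not measured data).
features = time_domain_features([800, 810, 790, 805, 795])
```

Because RMSSD is built from beat-to-beat differences, a single misdetected beat changes it strongly, which is one way to see why artifact removal matters for these measures.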

4.2 Frequency Domain Methods

Frequency domain measures analyse all the frequency components present in the HRV. The power spectral density (PSD) can give valuable information about the neurogenic heart rhythms [14]. The high frequency region (0.15–0.5 Hz) is an indicator of parasympathetic activity, while the low frequency region (0.04–0.15 Hz) reflects both sympathetic and parasympathetic activity. The fast Fourier transform (FFT) is generally used to estimate the PSD. The autoregressive (AR) model is another popular frequency domain representation that is well suited to the analysis of biosignals such as ECG and EEG. The reliability of frequency domain methods decreases as the signal-to-noise ratio decreases.
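
A rough sketch of band-power estimation is given below in Python. To stay within the standard library it uses a direct discrete Fourier transform rather than an FFT, which gives the same periodogram but more slowly. The 4 Hz resampling rate, the function names and the synthetic test signal are assumptions made for this illustration; the band edges follow the values quoted in the text.

```python
import cmath
import math


def resample_uniform(times_s, values, fs_hz):
    """Linearly interpolate an unevenly sampled series onto a uniform grid."""
    n = int((times_s[-1] - times_s[0]) * fs_hz) + 1
    out, j = [], 0
    for k in range(n):
        t = times_s[0] + k / fs_hz
        while j < len(times_s) - 2 and times_s[j + 1] < t:
            j += 1
        t0, t1 = times_s[j], times_s[j + 1]
        w = (t - t0) / (t1 - t0)
        out.append(values[j] + w * (values[j + 1] - values[j]))
    return out


def band_powers(x, fs_hz, bands):
    """Sum periodogram values |X_k|^2 / n over each (low, high) band in Hz."""
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]                      # remove the DC component
    powers = {name: 0.0 for name in bands}
    for k in range(1, n // 2 + 1):
        f = k * fs_hz / n
        hits = [name for name, (lo, hi) in bands.items() if lo <= f < hi]
        if not hits:
            continue
        X = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
        for name in hits:
            powers[name] += abs(X) ** 2 / n
    return powers


# Synthetic check: a pure 0.25 Hz oscillation sampled at 4 Hz for 64 s.
fs = 4.0
signal = [math.sin(2 * math.pi * 0.25 * i / fs) for i in range(256)]
bands = {"lf": (0.04, 0.15), "hf": (0.15, 0.5)}
powers = band_powers(signal, fs, bands)   # power should fall in the "hf" band
```

An RR series is unevenly sampled in time, so in practice it would first be passed through `resample_uniform` (or an equivalent interpolation) before the spectrum is estimated.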

4.3 Wavelet Transform

Traditional frequency domain techniques cannot provide exact time localization in a typical nonstationary biosignal. To overcome this limitation, better techniques were developed. Wavelet analysis, which performs very well, compares the signal with a selected wavelet of limited duration and derives parameters from the comparison. HRV analysis can thus be performed effectively using the wavelet transform, which can also be used to obtain time-related information about various frequency bands [15].
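
To make the idea concrete, the Python sketch below implements a discrete wavelet decomposition with the Haar wavelet, the simplest wavelet of limited duration. The choice of the Haar wavelet, the function names and the sample values are assumptions made for this illustration; the works surveyed in this chapter may use other wavelets.

```python
import math


def haar_dwt_level(x):
    """One Haar DWT level: returns (approximation, detail) coefficients."""
    if len(x) % 2:
        raise ValueError("length must be even")
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_dwt(x, levels):
    """Multi-level decomposition: [detail_1, ..., detail_L, approx_L]."""
    coeffs, current = [], list(x)
    for _ in range(levels):
        current, detail = haar_dwt_level(current)
        coeffs.append(detail)          # detail_1 holds the fastest variations
    coeffs.append(current)
    return coeffs


# Hypothetical series of length 8, decomposed over three levels.
series = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coeffs = haar_dwt(series, levels=3)
```

Each detail coefficient is tied to a specific position in the series, which is the time localization that a plain Fourier spectrum does not provide.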

4.4 Nonlinear Methods

Nonlinear methods are well suited to analysing nonlinear and nonstationary biosignals such as ECG. Some of the important nonlinear parameters used for HRV analysis are approximate entropy (ApEn), higher order spectrum (HOS), detrended fluctuation analysis (DFA), correlation dimension (CD), recurrence quantification analysis (RQA) features and empirical mode decomposition (EMD) features.

4.4.1

Detrended Fluctuation Analysis (DFA)

DFA (Peng et al.) is very useful in assessing the fractal scaling characteristics of HRV data [16]. The fluctuation inherent in the data is represented by the parameter α, which indicates the irregularity of the input data. Typically, α is close to 1 for normal (young and healthy) people, and it varies with different cardiac disorders.
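
A compact sketch of how α can be estimated is shown below in Python. It follows the usual DFA recipe (integrate the mean-removed series, detrend it linearly in non-overlapping boxes, and fit the slope of log fluctuation against log box size). The box sizes, the function names and the synthetic test series are assumptions made for this illustration.

```python
import math
import random


def _linear_fit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def dfa_alpha(series, box_sizes):
    """Scaling exponent alpha from first-order detrended fluctuation analysis."""
    mean = sum(series) / len(series)
    profile, acc = [], 0.0
    for v in series:                      # integrate the mean-removed series
        acc += v - mean
        profile.append(acc)
    log_n, log_f = [], []
    for n in box_sizes:
        xs = list(range(n))
        sq_sum, count = 0.0, 0
        for start in range(0, len(profile) - n + 1, n):
            seg = profile[start:start + n]
            slope, icpt = _linear_fit(xs, seg)            # local linear trend
            sq_sum += sum((y - (slope * x + icpt)) ** 2 for x, y in zip(xs, seg))
            count += n
        log_n.append(math.log(n))
        log_f.append(0.5 * math.log(sq_sum / count))      # log of RMS fluctuation
    alpha, _ = _linear_fit(log_n, log_f)
    return alpha


# Synthetic sanity check (not HRV data): uncorrelated noise versus its running sum.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(1024)]
walk, total = [], 0.0
for v in noise:
    total += v
    walk.append(total)
boxes = [4, 8, 16, 32, 64]
alpha_noise = dfa_alpha(noise, boxes)   # expected near 0.5 for white noise
alpha_walk = dfa_alpha(walk, boxes)     # expected near 1.5 for a random walk
```

The two synthetic series bracket the value α ≈ 1 mentioned above: uncorrelated noise lies below it and a random walk above it.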

4.4.2

Correlation Dimension (CD)

CD is a nonlinear feature that can be used effectively for detecting anomalies. CD is a type of fractal dimension. A popular technique for finding CD (proposed by Grassberger et al.) constructs a function C(r) by computing the distances between all pairs of data points and counting the pairs that lie within a distance r of each other [17]. CD is then given by

$$ CD = \lim_{r \to 0} \frac{\log C(r)}{\log r} \qquad (1) $$

Normal subjects produce a higher CD value than diabetic subjects because the normal RR signal has higher variability.
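
The following Python sketch shows one way to estimate CD in practice: since the limit r → 0 cannot be taken on finite data, the slope of log C(r) against log r is fitted over a range of small radii. The time-delay embedding helper, the function names, the chosen radii and the test points are assumptions made for this illustration.

```python
import math


def embed(series, dim, delay):
    """Time-delay embedding of a scalar series into dim-dimensional points."""
    n = len(series) - (dim - 1) * delay
    return [[series[i + k * delay] for k in range(dim)] for i in range(n)]


def correlation_sum(points, r):
    """C(r): fraction of distinct point pairs closer than r (Euclidean)."""
    n, close = len(points), 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < r:
                close += 1
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(points, radii):
    """Least-squares slope of log C(r) against log r over the given radii.

    Every radius must capture at least one pair, otherwise log(0) is undefined.
    """
    xs = [math.log(r) for r in radii]
    ys = [math.log(correlation_sum(points, r)) for r in radii]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Sanity check on synthetic points lying on a straight line (dimension near 1).
line = [[i / 300.0, 0.0] for i in range(300)]
cd_line = correlation_dimension(line, [0.021, 0.043, 0.087, 0.171])
```

For an RR series, the points would come from `embed(rr, dim, delay)`, with the embedding dimension and delay chosen by the analyst.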

4.4.3

Approximate Entropy (ApEn)

ApEn is a measure of disorder in the HR signal [18]. The value of ApEn is larger for more complex or irregular data (the normal case) and smaller for cardiac impairment (diabetic) cases.
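
A minimal Python sketch of ApEn is given below. It uses the common formulation with template length m, tolerance r and the Chebyshev distance, counting self-matches. The default values, the absolute (rather than standard-deviation-scaled) tolerance and the synthetic test series are assumptions made for this illustration.

```python
import math
import random


def approximate_entropy(series, m=2, r=0.2):
    """ApEn(m, r) of a series, using the Chebyshev distance between templates."""
    def phi(dim):
        n = len(series) - dim + 1
        templates = [series[i:i + dim] for i in range(n)]
        total = 0.0
        for a in templates:
            matches = sum(
                1 for b in templates
                if max(abs(u - v) for u, v in zip(a, b)) <= r
            )                               # includes the self-match, so >= 1
            total += math.log(matches / n)
        return total / n
    return phi(m) - phi(m + 1)


# Synthetic comparison (not HRV data): a regular series versus random noise.
random.seed(1)
regular = [0.0, 1.0] * 50
irregular = [random.gauss(0.0, 1.0) for _ in range(200)]
apen_regular = approximate_entropy(regular)      # close to zero
apen_irregular = approximate_entropy(irregular)  # clearly larger
```

In HRV practice the tolerance r is usually set as a fraction of the standard deviation of the series, so that the result does not depend on the signal amplitude.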

4.4.4

Recurrence Quantification Analysis (RQA)

The recurrence plot (Eckmann et al.) is a graphical aid for identifying hidden recurrences in a time domain signal that may not be pronounced [19]. It measures the nonstationarity of the time series. Several important parameters can be calculated from the recurrence plot, for example laminarity (LAM), mean diagonal line length, recurrence rate (RR), determinism (DET), entropy and trapping time (TT).
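
Two of these parameters, recurrence rate and determinism, are sketched below in Python. For brevity the recurrence matrix is built directly on the scalar series without phase-space embedding, which is a simplification; the threshold `eps`, the minimum line length and the function names are likewise assumptions made for this illustration.

```python
def recurrence_matrix(series, eps):
    """R[i][j] = 1 when samples i and j are within eps of each other."""
    n = len(series)
    return [[1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]


def recurrence_rate(R):
    """Fraction of recurrent points in the matrix (main diagonal included)."""
    n = len(R)
    return sum(map(sum, R)) / (n * n)


def determinism(R, l_min=2):
    """Share of off-diagonal recurrence points on diagonal lines >= l_min."""
    n = len(R)
    on_lines = total = 0
    for offset in range(1, n):          # upper diagonals; the matrix is symmetric
        run = 0
        for i in range(n - offset):
            if R[i][i + offset]:
                run += 1
                total += 1
            else:
                if run >= l_min:
                    on_lines += run
                run = 0
        if run >= l_min:                # close a run that reaches the border
            on_lines += run
    return on_lines / total if total else 0.0


# Strictly periodic synthetic series: recurrences form long diagonal lines.
periodic = [0, 1, 2, 3] * 5
R = recurrence_matrix(periodic, eps=0.1)
rate = recurrence_rate(R)
det = determinism(R)
```

Long diagonal lines indicate that stretches of the signal repeat, so a periodic series yields a determinism of one, while irregular series give lower values.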

4.4.5

Higher Order Spectrum (HOS)

HOS is very useful in the dynamical analysis of nonlinear, nonstationary and non-Gaussian biosignals. HOS (also called polyspectra) represents the cumulants and moments of order three and above. HOS can be used effectively for the analysis of HRV signals. Several useful HOS features can be extracted from HRV data and fed to different classifiers for the purpose of diabetes detection.

4.4.6

Empirical Mode Decomposition (EMD)

EMD splits the input signal into intrinsic mode functions (IMFs). The features generated from the IMFs are well suited to capturing the nonlinear and nonstationary characteristics of biosignals such as HRV.

5 Methodology of Deep Learning Techniques

A variety of time, frequency, wavelet and nonlinear features, together with classifiers, have been used for detecting diabetes in previous works. This chapter concentrates on deep learning. Deep learning is an advancement of machine learning that is particularly suited to high dimensional data and to complex artificial intelligence problems. The shortcomings of machine learning led to the development of deep learning [20]. All the explicit feature-related processes found in conventional machine learning networks are performed implicitly in deep learning networks. Deep networks learn features directly from the data, and for such problems they are typically more effective than traditional networks that rely on hand-crafted feature extraction. Deep learning networks use cascaded layers of nonlinear processing units, which perform feature extraction and transformation. The output of one unit is fed as input to the succeeding unit. Learning can be supervised or unsupervised. Training normally uses some form of gradient descent, with the gradients computed by back propagation. Popular deep learning networks are briefly explained below.

5.1 Autoencoder (AE)

AE is a type of neural network trained by unsupervised learning using back propagation. Its target values are set equal to its inputs [21]. An AE is built from two symmetrical deep networks (typically four or five layers deep), one for encoding and the other for decoding. An AE is thus implemented much like a conventional neural network, except that its goal is to recreate the input by learning a representation of the input data [22, 23].
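The idea of setting the targets equal to the inputs can be shown with a deliberately tiny sketch: a linear autoencoder with a single hidden unit, trained by plain gradient descent on two-dimensional toy data. This is far smaller than the four or five layer networks mentioned above, and the data, initial weights, learning rate and function names are all hypothetical.

```python
def reconstruct(x, w, v):
    """Encode x to one hidden value h (weights w), then decode it back (weights v)."""
    h = sum(wj * xj for wj, xj in zip(w, x))
    return [vk * h for vk in v], h

def mean_squared_error(data, w, v):
    total = 0.0
    for x in data:
        x_hat, _ = reconstruct(x, w, v)
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)

def train(data, steps=500, lr=0.05):
    """Gradient descent on the reconstruction error; the target is the input itself."""
    w = [0.1, 0.1]
    v = [0.1, 0.1]
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        grad_v = [0.0, 0.0]
        for x in data:
            x_hat, h = reconstruct(x, w, v)
            err = [a - b for a, b in zip(x_hat, x)]
            back = sum(e * vk for e, vk in zip(err, v))   # error propagated to h
            for k in range(2):
                grad_v[k] += 2.0 * err[k] * h / len(data)
                grad_w[k] += 2.0 * back * x[k] / len(data)
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        v = [vk - lr * g for vk, g in zip(v, grad_v)]
    return w, v

# Toy data lying on a line, so one hidden unit is enough to reconstruct it
data = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
loss_before = mean_squared_error(data, [0.1, 0.1], [0.1, 0.1])
w, v = train(data)
loss_after = mean_squared_error(data, w, v)
```

The hidden value h is the learned code; practical autoencoders use nonlinear layers and many more units, but the training signal is the same reconstruction error.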

G. Swapna et al.

5.2 Convolutional Neural Network (CNN)

CNN is a modified multilayer perceptron (MLP) that employs the convolution operation in one or more of its layers. A CNN is basically built of three types of layer: a convolutional layer followed by pooling and fully connected layers. CNN resembles conventional neural networks in many of its characteristics. In a conventional neural network, y = f(x · w), where x and y denote the input vector and output vector respectively and w the set of weights. In the convolutional layer of a CNN, y = f(s(x, w)), where s indicates the convolution operation between inputs and weights. CNN can be applied to time series input data (1D) or to an image (2D).
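The relation y = f(s(x, w)) followed by pooling can be sketched for the 1D case as below. The sketch assumes ReLU for f, no padding, and a pooling stride equal to the pool size; the input and kernel values are made up for illustration.

```python
def conv1d(x, w):
    """Valid 1D 'convolution' as used in CNNs (cross-correlation: the kernel is not flipped)."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]

def relu(values):
    """Nonlinearity f applied element-wise."""
    return [max(0.0, v) for v in values]

def max_pool1d(values, pool=2, stride=2):
    """Keep the largest value in each window of length pool."""
    return [max(values[i:i + pool]) for i in range(0, len(values) - pool + 1, stride)]

x = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]   # hypothetical input samples
w = [-1.0, 0.0, 1.0]                   # hypothetical kernel of size 3

s = conv1d(x, w)        # s(x, w): one output per kernel position
y = relu(s)             # y = f(s(x, w))
pooled = max_pool1d(y)  # downsampled feature map
```

A convolutional layer applies many such kernels in parallel (for example 64 filters), each producing its own feature map.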

5.3 Recurrent Structures (RNN, LSTM and GRU)

5.3.1 Recurrent Neural Network (RNN)

RNN is an extension of the feedforward network (FFN). An RNN contains feedback loops (Fig. 4) that serve as a short-term memory in which past information (in time) can be stored and retrieved. This feedback makes RNNs well suited to temporal tasks. Unlike an MLP, an RNN places no constraint on the permitted length of temporal sequences, and its parameters are shared across time steps. The simple storage of an RNN can also be replaced by a more elaborate model in which the stored states are controlled by gates; such controlled states are called gated memory. RNN is widely used in speech recognition, language modelling and machine translation. The cyclic connections in the RNN architecture make it difficult to understand the working of an RNN in its entirety. For better understanding and analysis, the RNN structure can be converted to FFN form by unfolding it in time (Fig. 4).
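Unfolding in time and parameter sharing can be made concrete with a scalar sketch: the same three parameters are applied at every time step, and the hidden state carries past information forward. The tanh activation and the parameter values are assumptions chosen for illustration.

```python
import math

def rnn_forward(xs, w_x, w_h, b, h0=0.0):
    """Unrolled simple RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    h = h0
    states = []
    for x in xs:                      # one loop iteration per unfolded time step
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# The same parameters handle sequences of any length
short = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
longer = rnn_forward([1.0] + [0.0] * 9, w_x=1.0, w_h=0.5, b=0.0)
```

After the single nonzero input, the hidden state decays step by step, which illustrates why a simple RNN acts only as a short-term memory.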

5.3.2 Long Short-Term Memory (LSTM)

LSTM (Hochreiter et al.) is an enhanced model of RNN, developed in order to model long-range dependencies of temporal sequences more accurately than conventional RNNs [24]. LSTM contains memory blocks in place of the simple memory units of RNN (Fig. 5). This property has led to its wide use in complex tasks such as language modelling and, more generally, in areas where long time series must be analysed. A memory block in LSTM can be considered a complex processing centre built of memory cells. The input and output gates are multiplicative gates that can permit or block the flow of cell activation through the memory unit to the nodes further along the path. A set of adaptive multiplicative gates manages all the processes in the memory block. Peephole connections and the forget gate were added to the LSTM architecture as research progressed. The forget gate takes the place of the fixed self-connection of the constant error carousel (CEC), so that the cell state can be reset adaptively. Together, the three gates also help the memory cell to store information across many time steps.

Fig. 4 Schema of RNN and unfolded RNN in time (t = 1, t = 2) in onward path

Fig. 5 Memory blocks in RNN (left) and LSTM (right)
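The role of the three gates can be sketched with a scalar LSTM step. The sketch uses the common formulation without peephole connections, and the parameter names and values are hypothetical: the forget gate is held open and the input gate held shut, so the cell state should survive many time steps unchanged.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell (no peephole connections)."""
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget gate
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate value
    c = f * c_prev + i * g        # memory cell update
    h = o * math.tanh(c)          # gated output
    return h, c

# Hypothetical parameters: forget gate open (bias +50), input gate shut (bias -50)
params = {name: 0.0 for name in (
    "w_i", "u_i", "b_i", "w_f", "u_f", "b_f",
    "w_o", "u_o", "b_o", "w_g", "u_g", "b_g")}
params.update({"b_f": 50.0, "b_i": -50.0, "w_g": 1.0})

h, c = 0.0, 0.7
for x in [0.3, -1.2, 2.5] * 20:   # 60 time steps of arbitrary input
    h, c = lstm_step(x, h, c, params)
```

In a trained network the gate values depend on the input and the previous output, so the cell learns when to keep, overwrite or expose its stored value.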

5.3.3 Gated Recurrent Unit (GRU)

GRU is a simplified variant of LSTM with fewer parameters. GRU enables each recurrent unit to capture dependencies corresponding to different time scales in an adaptive manner. GRU has gating units that modulate the flow of information inside its memory but, unlike LSTM, it does not have separate memory cells. The memory consumption and computational cost of GRU are lower than those of LSTM.
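The difference in size can be estimated with simple arithmetic. The sketch below assumes the textbook formulations, in which an LSTM layer has four weight sets (three gates plus the candidate) and a GRU layer has three (two gates plus the candidate). Some framework implementations add extra bias terms, so actual counts may differ slightly. The layer sizes are hypothetical.

```python
def lstm_parameter_count(input_dim, units):
    """Four weight sets, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)

def gru_parameter_count(input_dim, units):
    """Three weight sets, each with input weights, recurrent weights and a bias."""
    return 3 * (units * (input_dim + units) + units)

# Hypothetical layer: 64 input features per time step, 70 recurrent units
lstm_n = lstm_parameter_count(64, 70)
gru_n = gru_parameter_count(64, 70)
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same width.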

5.4 Hybrid of CNN-RNN, CNN-LSTM, CNN-GRU

A hybrid deep neural network is, in general, a fusion of generative and discriminative neural networks, so that the advantages of both can be combined effectively. Hybrid deep learning networks can be built by cascading heterogeneous networks, as in CNN-LSTM. The CNN extracts the spatial features and the LSTM extracts the sequential information. CNN-LSTM therefore helps to extract the spatio-temporal information of signals such as ECG (the details of the experimental analysis and the topology of the work using CNN and CNN-LSTM are explained in Sect. 7). In hybrid architectures such as CNN-LSTM, the CNN is made up of 1D convolution and 1D max pooling layers alone. The output of the max pooling layer is passed as input to the subsequent network:

\[ y_i = \mathrm{CNN}(x_i) \qquad (2) \]

Here x_i and y_i are the input and output of the CNN respectively. Each input x_i has an associated class label, and y_i is the output vector of the max pooling layer in the CNN. y_i is fed to the deep learning network placed after the CNN, which can be an RNN, LSTM or GRU.

6 Literature Survey

6.1 Earlier Methods of Analysis of HRV Signals

HRV signals were earlier analysed using the time, frequency and nonlinear parameters described above. Evidence suggests that the heart does not oscillate periodically under normal conditions [25]. Nonlinear techniques, capable of extracting and analysing nonlinear features from HRV signals, are therefore also widely used. Nonlinear features such as the Lyapunov exponent (Rosenstein et al.), 1/f slope (Kobayashi et al.), approximate entropy (ApEn) (Pincus) and detrended fluctuation analysis (DFA) (Peng et al.) can be extracted from HRV signals for further analysis [16, 18, 26, 27]. The range of the feature values gives an indication of a possible anomaly. HRV signals have also been classified by nonlinear techniques [28, 29], and nonlinear techniques have been employed in cardiac signal analysis to develop cardiac arrhythmia detection algorithms [30, 31].

6.2 Previous Works of Diabetes Detection Using Heart Rate (Including Machine Learning Based)

Wheeler et al. first reported that diabetic neuropathy reduces beat-to-beat variation during deep breathing [32]. The works of Pfeifer, Singh and Villareal confirmed that parasympathetic autonomic activity is reduced in people with diabetes well before neuropathic symptoms become clinically visible [5, 33, 34]. Researchers have also found that diabetic patients who produced negative results in traditional cardiac function tests nevertheless showed a decreased HRV. The correlation between fasting blood sugar and cardiovascular complications has been clearly established by many works [35, 36]. About one-fourth of the patients with serious coronary disorders turned out to be diabetic as well [37, 38], because diabetes results in early development of coronary disease and atherosclerosis. All these results indicate that HRV analysis can be used to identify diabetes.

Diabetes-induced CAN can be very damaging, so early detection of CAN due to diabetes is very important. Khandoker et al. showed that HRV analysis using features such as sample entropy (SampEn) and Poincare plots is very useful in detecting CAN in diabetic people [39]. Kirvela et al. performed frequency and time domain analysis of HRV extracted from ECG recordings of 24 h duration [40]. All analysis parameters (both time and frequency) were significantly reduced in diabetic HRV samples compared with those from normal people. Mackay measured heart rate variation at different levels of breathing for normal and diabetic subjects and observed that heart rate variation was markedly lower in diabetic people [41]. Jelinek et al. studied the consequences of QT dispersion in normal and diabetic people, ensuring that the people in both classes had no previous history of cardiac disease [42]. Heart rate variability was measured through a parameter named tone-entropy (T-E), where tone (T) represents the sympatho-vagal balance and entropy (E) represents the autonomic regularity. T-E was observed to be reduced in diabetic people. In a similar group of people under similar conditions, Awdah et al. observed that time domain parameters such as the St. George index, RMSSD and SDRR were reduced in diabetic cases in comparison with normal cases [43]. Chemla et al. used autoregressive frequency modelling to study HRV signals in people with diabetes [44]. Schroeder et al. found that the time domain parameters RMSSD, SDNN and RR interval were lower in diabetic people. They also observed that autonomic function deteriorates as diabetes progresses [45]. Seyd et al. performed time and frequency analysis of HRV [46]. Time domain parameters such as mean RR interval, TINN, RMSSD, SDNN, NN50 count and HRV triangular index were lower in diabetic patients than in normal people. Frequency domain analysis showed a considerable difference in power across different frequency ranges between diabetic and normal people. Trunkvalterova et al. showed that multiscale entropy (MSE) is capable of detecting very small aberrations in the cardiovascular systems of patients with type 1 diabetes. In their work, they used SampEn as the estimator together with linear measures such as RMSSD [47]. Faust et al. analysed time, frequency and nonlinear features derived from HRV signals and showed that nonlinear methods gave better results in the diagnosis of diabetes than time domain and frequency domain methods [48].

Jian et al. applied principal component analysis (PCA) to HOS bispectrum magnitude plots obtained from HRV signals. The resulting features were fed to an SVM classifier to obtain a diabetes detection accuracy of 79.93% [49]. Acharya et al. arrived at an innovative diabetic integrated index (DII) making use of nonlinear features derived from the HRV signal [50]. They obtained a diabetes detection accuracy of 86% using an AdaBoost classifier. Swapna et al. used HOS based features for diabetes detection with an accuracy of 90.5% [51]. Acharya et al. obtained an accuracy of 90% by extracting four nonlinear features and using an AdaBoost classifier [52]. Acharya et al. used entropies, energy, skewness and kurtosis to achieve a diabetes detection accuracy of 92.02% employing a decision tree (DT) classifier [53]. Pachori et al. used EMD on HRV signals along with a Morlet wavelet kernel function to achieve the very high accuracy of 95.63% [54]. Table 3 summarises all the above works.

Table 3 A summary of machine learning methods used for detecting HRV parameters that were significantly different in diabetic patients (DM = Diabetes Mellitus)

Authors | Methods/features | Observed activity for extracted features for DM
Pfeifer et al. [5] | Time domain |
Kirvela et al. [40] | Frequency domain, time domain | HRV reduced
Singh et al. [33] | Frequency domain, time domain | Reduced LF power
Awdah et al. [43] | Time domain | Reduced
Flynn et al. [55] | DFA | Reduced short-term correlation in DM
Chemla et al. [44] | FFT, autoregressive spectral analysis | Decreased
Schroeder et al. [45] | Time domain | Decreased
Seyd et al. [46] | Time, frequency domain | Decreased
Trunkvalterova et al. [47] | Nonlinear methods (multiscale entropy (MSE)) | Decreased MSE
Faust et al. [48] | Time, frequency, nonlinear | Decreased
Acharya et al. [50] | Nonlinear (RQA, CD) | Accuracy is 86%
Swapna et al. [51] | HOS | Accuracy is 90.5%
Jian et al. [49] | HOS | Accuracy is 79.93%
Acharya et al. [52] | Nonlinear features | Accuracy is 90.0%
Acharya et al. [53] | DWT | Accuracy is 92.02%
Pachori et al. [54] | EMD related features | Accuracy is 95.63%
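Several of the studies above compare time domain parameters such as SDNN, RMSSD and the NN50 count. A minimal sketch of how these are computed from a series of RR intervals is given below, using their usual definitions. The RR values are made up for illustration and the function name is hypothetical.

```python
import math
import statistics

def time_domain_hrv(rr_ms):
    """Common time domain HRV parameters from RR (NN) intervals in milliseconds."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]   # successive differences
    nn50 = sum(1 for d in diffs if abs(d) > 50)          # differences above 50 ms
    return {
        "mean_rr": statistics.mean(rr_ms),
        "sdnn": statistics.stdev(rr_ms),                 # standard deviation of intervals
        "rmssd": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        "nn50": nn50,
        "pnn50": 100.0 * nn50 / len(diffs),
    }

rr = [800, 810, 790, 850, 845]   # hypothetical RR intervals (ms)
features = time_domain_hrv(rr)
```

Lower values of these parameters correspond to the reduced variability reported for diabetic subjects in the works cited above.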

6.3 Deep Learning Based Diabetes Detection Works Using HRV

Several works connect deep learning analysis methods and ECG. CNN based deep learning methods have been used to analyse ECG in order to detect coronary artery disease (Acharya et al.), detect myocardial infarction (Acharya et al.) and classify heartbeats (Acharya et al.) [56–58]. Sujadevi et al. analysed ECG to detect atrial fibrillation [59].

Regarding diabetes detection using ECG signals, Swapna et al. employed a hybrid deep learning CNN-LSTM network with HRV as input to achieve a very high accuracy of 95.1%, which is comparable to the maximum accuracy achieved so far [60]. Swapna et al. improved this diabetes detection accuracy to 95.7% by adding an SVM classifier after the CNN-LSTM network [61]. Accuracy details are given in Table 4.

Table 4 Deep learning methods used for diabetes detection (with HRV as input)

Authors | Methods/features | Accuracy
Swapna et al. [60] | Deep learning (CNN-LSTM) | Accuracy is 95.1%
Swapna et al. [61] | Deep learning (CNN-LSTM) followed by SVM | Accuracy is 95.7%

7 Architecture and Implementation of Deep Learning Architecture: Sample Study

The hybrid architecture for diabetes detection is discussed in detail in [60, 61]. The workflow of the hybrid architecture is shown in Fig. 6. In [60, 61], the deep learning architecture is implemented using the TensorFlow software framework [62]. TensorFlow is Google's open-source software library. It represents numerical computations as dataflow graphs, in which nodes denote mathematical operations and edges carry tensors between them. Heterogeneous platforms such as CPUs, GPUs and mobile devices can be used for performing the computations.

Fig. 6 The architecture of proposed system of [60, 61] (with and without SVM)

In the work of Swapna et al. [60], the following network structure was implemented. The input layer was made of 1000 neurons (the number of samples in the input data set was 1000). The values of the input data were normalized to fall between 0 and 1. The hidden layers consisted of a CNN layer (pool size 2, stride 1, 64 filters, kernel size 3), followed by max pooling, flatten and dropout of 0.5, and ended with a fully connected layer with sigmoid activation function. There was full connectivity between the input, hidden and output layers. Three trials were run for 300 epochs (learning rate 0.001, batch size 16). The final values of the above hyperparameters were fixed after experimenting with different values and selecting the optimum values based on the performance of the deep learning network. These hyperparameters correspond to the first network architecture we tried, CNN1 (the number of CNN layers in the topology is 1). The number of CNN layers was increased one by one to five, and we then moved to the hybrid architecture by attaching LSTM to each of the above five configurations. The accuracy of diabetes detection was found to be 95.1% (maximum value) for CNN5-LSTM.

In the second work of Swapna et al. [61], the same configuration was used, with the modification that the features extracted from the CNN/CNN-LSTM architecture were passed to an SVM classifier. This improved the accuracy of diabetes detection. Figure 6 shows the network topology comparison of the works [60, 61].
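The input scaling and the size of the feature maps in such a network can be sketched with plain arithmetic. The sketch assumes min-max scaling, no padding in the convolution, that the quoted stride of 1 refers to the convolution, and a pooling stride equal to the pool size; these details are not stated explicitly above, and the sample values and function names are hypothetical.

```python
def min_max_normalize(values):
    """Scale values linearly so that they fall between 0 and 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def conv1d_output_length(n, kernel_size, stride=1):
    """Output length of a 1D convolution without padding."""
    return (n - kernel_size) // stride + 1

def pool1d_output_length(n, pool_size, stride=None):
    """Output length of 1D max pooling; the stride defaults to the pool size."""
    stride = pool_size if stride is None else stride
    return (n - pool_size) // stride + 1

rr = [812.0, 790.0, 845.0, 801.0]      # hypothetical raw input values
scaled = min_max_normalize(rr)

after_conv = conv1d_output_length(1000, kernel_size=3, stride=1)  # input of 1000 samples
after_pool = pool1d_output_length(after_conv, pool_size=2)
flattened = after_pool * 64             # 64 filters, flattened for the dense layer
```

Under these assumptions a single convolution and pooling stage turns the 1000-sample input into 64 feature maps of length 499 before the fully connected layer.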

8 Deep Learning in Big Data Analysis: Limitations and Challenges

The amount of data being handled today is increasing enormously in both size and dimensionality; applications such as Twitter and Facebook are examples. In the biomedical area, if data are acquired continuously from patients in real time, the collected data can be viewed as big data, and big data analytics can play a remarkable role. Traditional machine learning based analysis techniques are inadequate for big data analytics. In big data, the volume of data handled is very high: in the vertical dimension this is the number of records or samples in the dataset, and in the horizontal dimension it is the number of features or parameters. This explosion in data volume has brought huge challenges in analysing the data. The time and memory needed for computation grow steeply with increasing dataset size. The solution is to develop architectures capable of processing data in parallel.

Another issue is that, because the data volume is very high, it may not be possible to hold the entire dataset in memory or on a single disk. Many training and testing algorithms are designed on the assumption that the data are available in their entirety in memory, so such algorithms cannot be run successfully on big data. This is known as the curse of modularity. Distributed computing and parallelization can be used to tackle this challenge. Further challenges are the high dimensionality of the data, its highly diverse nature and the high variation in the probability of occurrence of the classes, which, if not handled, will degrade the performance of the machine learning network. In machine learning, proper selection of features using domain knowledge is crucial. As the dataset grows in dimension as well as in sample size, it becomes extremely difficult to create relevant features, and feature selection is also very difficult in high dimensional data. These issues in handling and analysing big data have led deep learning networks to take the place of traditional machine learning networks.

Concerning the application of deep learning techniques to ECG-derived HRV data for diabetes detection, the best performing models [60, 61] were applied to real-time data, and these works can be considered a foundation for future work in this direction. Further improvements in accuracy can be attempted by feeding the developed architecture with larger input data sets than those used in the above works. Present advanced ECG measurement equipment needs only a short duration (less than 5 min) to acquire an ECG signal for analysis. On the other hand, Holter monitors record the ECG signal of a person continuously (for at least 24–48 h) to check for possible abnormalities that cannot be detected by short-term ECG monitoring. Machine learning techniques are sufficient to handle short-term ECG data. Deep learning networks and algorithms are also suitable for relatively short-term data, considering that analysis results can be obtained very quickly in real time. The second case, the analysis of a large amount of data (a continuous ECG signal with a duration of more than 24 h, for example from Holter monitors), requires big data analytics and deep learning algorithms. If long duration ECG data are available to researchers, deep learning architectures such as LSTM and hybrid systems such as CNN-LSTM are capable of analysing these non-invasive data for the future possibility of being affected by diabetes. Hence, if real-time big data are made available to deep learning networks, the scenario will shift quickly from the detection of a disease to the prediction of a disease in the near future.
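One common way around the assumption that all data fit in memory is to process the data in chunks and keep only running summaries. The sketch below is an illustration of that idea, not a method taken from [60, 61]. It uses Welford's online algorithm to accumulate the mean and variance of a long stream of values, for example to obtain normalization statistics for a long recording. The stream contents, chunk size and names are hypothetical.

```python
class RunningStats:
    """Mean and variance accumulated one chunk at a time (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0        # running sum of squared deviations from the mean

    def update(self, chunk):
        for x in chunk:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of all values seen so far."""
        return self._m2 / self.n if self.n else 0.0

def chunks(stream, size):
    """Yield successive lists of at most size items from an iterable."""
    buf = []
    for x in stream:
        buf.append(x)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf

stats = RunningStats()
stream = (800.0 + (i % 7) for i in range(10_000))   # stands in for a long recording
for chunk in chunks(stream, 256):
    stats.update(chunk)       # only one chunk is held in memory at a time
```

The same pattern extends to distributed settings, where partial summaries computed on separate machines are merged afterwards.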

9 Conclusion

The body of a person affected by diabetes is either incapable of producing sufficient insulin or resistant to the insulin produced, leading to persistently high blood sugar. Autonomic impairment is initially asymptomatic and becomes clinically evident only many years after the onset of diabetes. HRV can therefore be used as an early sign of impending diabetic neuropathy and for diabetes detection with high accuracy. HRV analysis is thus a simple, non-invasive and reproducible method of detecting diabetes. Deep learning methods can be used to detect diabetes with very high accuracy. Distributed deep learning systems can deliver results quickly enough to make real-time analysis of biosignals a reality. The future of biomedical engineering thus appears to belong to featureless, deep learning based systems that can perform big data analytics without the need for domain knowledge.

References

1. Ralston, S.H., Penman, I.D., Strachan, M.W., Hobson, R.P.: Davidson’s Principles and Practice of Medicine, 23rd edn. Elsevier
2. Viktor, S., Steven, I., Marina, D.I., Aleksander, N., Vojislava, M.: Impact of diabetes on heart rate variability and left ventricular function in patients after myocardial infarction. Facta Univ. Ser.: Med. Biol. 12(3), 130–134 (2005)
3. Di Carli, M.F., Janisse, J., Grunberger, G., Ager, J.: Role of chronic hyperglycemia in the pathogenesis of coronary microvascular dysfunction in diabetes. J. Am. Coll. Cardiol. 41, 1387–1393 (2003)
4. Gresele, P., Guglielmini, G., Deangelis, M., et al.: Acute short-term hyperglycemia enhances heart stress-induced platelet activation in patients with type 2 diabetes mellitus. J. Am. Coll. Cardiol. 41, 1013–1020 (2003)
5. Pfeifer, M.A., Cook, D., Brodsky, J., Tice, D., Reenan, A., Swedine, S., et al.: Quantitative evaluation of cardiac parasympathetic activity in normal and diabetic man. Diabetes 339–345 (1982)
6. Sawicki, P.T., Dahne, R., Bender, R., Berger, M.: Prolonged QT interval as a predictor of mortality in diabetic nephropathy. Diabetologia 39(1), 77–81 (1996)
7. Okin, P.M., Devereaux, R.B., Howard, B.V., Welty, T.K.: Assessment of QT interval and QT dispersion for prediction of all-cause mortality and cardiovascular mortality in American Indians: the Strong Heart Study. Circulation 101, 61–66 (2000)
8. Barrett, K.E., Barman, M.S., Boitano, S., Brooks, H.: Ganong’s Review of Medical Physiology. McGraw-Hill Companies
9. Stern, S., Sclarowsky, S.: The ECG in diabetes mellitus. Am. Heart Assoc. (AHA) J. (2009)
10. Sokolow, M., Mcllroy, M.B., Chiethin, M.D.: Clinical Cardiology. VLANGE Medical Book (1990)
11. Constant, I., Laude, D., Murat, I., Elghozi, J.L.: Pulse rate variability is not a surrogate for heart rate variability. Clin. Sci. 97, 391–397 (1999)
12. Kleiger, R.E., Bigger, J.T., Bosner, M.S., Chung, M.K., Cook, J.R., Rolnitzky, L.M., et al.: Stability over time of variables measuring heart rate variability in normal subjects. Am. J. Cardiol. 68, 626–630 (1991)
13. Ge, D., Srinivasan, N., Krishnan, S.M.: Cardiac arrhythmia classification using autoregressive modeling. Biomed. Eng. Online 1(1), 5 (2002)
14. Akselrod, S., Gordon, D., Madwed, J.B., Snidman, N.C., Shannon, D.C., Cohen, R.J.: Hemodynamic regulation: investigation by spectral analysis. Am. J. Physiol. 249(4 Pt 2), H867–H875 (1985)
15. Gamero, L.G., Vila, J., Palacios, F.: Wavelet transform analysis of heart rate variability during myocardial ischaemia. Med. Biol. Eng. Comput. 40, 72–78 (2002)
16. Peng, C.K., Havlin, S., Hausdorff, J.M., Mietus, J.E., Stanley, H.E., Goldberger, A.L.: Fractal mechanisms and heart rate dynamics. J. Electrocardiol. 28(Suppl), 59–64 (1996)
17. Grassberger, P., Procaccia, I.: Measuring the strangeness of strange attractors. Phys. D 9, 189–208 (1983)
18. Pincus, S.M.: Approximate entropy as a measure of system complexity. Proc. Natl. Acad. Sci. U.S.A. 88, 2297–2301 (1991)
19. Eckmann, J.P., Kamphorst, S.O., Ruelle, D.: Recurrence plots of dynamical systems. Europhys. Lett. 4, 973–977 (1987)
20. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press. http://www.deeplearningbook.org (2016)
21. Poultney, C., Chopra, S., Cun, Y.L., et al.: Efficient learning of sparse representations with an energy-based model. In: Advances in Neural Information Processing Systems, pp. 1137–1144 (2006)
22. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
23. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103. ACM (2008)
24. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
25. Goldberger, A.L., West, B.J.: Application of non-linear dynamics to clinical cardiology. Ann. N. Y. Acad. Sci. 504, 195–213 (1987)
26. Rosenstein, M., Collins, J.J., De Luca, C.J.: A practical method for calculating largest Lyapunov exponents from small data sets. Phys. D 65, 117–134 (1993)
27. Kobayashi, M., Musha, T.: 1/f fluctuation of heart beat period. IEEE Trans. Biomed. Eng. 29, 456–457 (1982)
28. Acharya, U.R., Kannathal, N., Krishnan, S.M.: Comprehensive analysis of cardiac health using heart rate signals. Physiol. Meas. J. 25, 1130–1151 (2004)
29. Acharya, U.R., Paul Joseph, K., Kannathal, N., Lim, C.M., Suri, J.S.: Heart rate variability: a review. Med. Biol. Eng. Comput. 44(12), 1031–1051 (2006)
30. Chua, K.C., Chandran, V., Acharya, U.R., Lim, C.M.: Computer-based analysis of cardiac state using entropies, recurrence plots and Poincare geometry. J. Med. Eng. Technol. 2(4), 263–272 (2008)
31. Acharya, U.R., Suri, J.S., Spaan, J.A.E., Krishnan, S.M.: Advances in Cardiac Signal Processing. Springer Verlag GmbH Berlin Heidelberg (2007)
32. Wheeler, T., Watkins, P.J.: Cardiac denervation in diabetes. Br. Med. J. 4, 584–586 (1973)
33. Singh, J.P., Larson, M.G., O’Donnell, C.J., Wilson, P.F., Tsuji, H., Lloyd-Jones, D.M., Levy, D.: Association of hyperglycemia with reduced heart rate variability: the Framingham heart study. Am. J. Cardiol. 86, 309–312 (2000)
34. Villareal, R.P., Liu, B.C., Massumi, A.: Heart rate variability and cardiovascular mortality. Curr. Atheroscler. Rep. 4(2), 120–127 (2002)
35. Stamler, J., Vaccaro, D., Neaton, J.D., Wentworth, D.: Diabetes, other risk factors, and 12-year cardiovascular mortality for men screened in the multiple risk factor intervention trial. Diabetes Care 16, 434–444 (1993)
36. Coutinho, M., Gerstein, H.C., Wang, Y., Yusuf, S.: The relationship between glucose and incidence cardiovascular events: a meta-regression analysis of published data from 20 studies of 95783 individuals followed for 12.4 years. Diabetes Care 22, 233–240 (1999)
37. Melchior, T., Kober, L., Madsen, C.R., et al.: Accelerating impact of diabetes mellitus on mortality in the years following an acute myocardial infarction. Eur. Heart J. 20, 973–978 (1999)
38. Braunwald, E., Antman, E., Beasley, J.W., et al.: ACC/AHA guidelines for the management of patients with unstable angina and non-ST-segment elevation myocardial infarction. J. Am. Coll. Cardiol. 36, 970–1062 (2000)
39. Khandoker, A.H., Jelinek, H.F., Palaniswami, M.: Identifying diabetic patients with cardiac autonomic neuropathy by heart rate complexity analysis. Biomed. Eng. Online 8, 1–12 (2009)
40. Kirvela, M., Salmela, K., et al.: Heart rate variability in diabetic and non-diabetic renal transplant patients. Acta Anaesthesiol. Scand. 40(7), 804–808 (1996)
41. Mackay, J.D.: Respiratory sinus arrhythmia in diabetic neuropathy. Diabetologia 24(4), 253–256 (1983). https://doi.org/10.1007/BF00282709
42. Jelinek, H.F., Flynn, A., Warner, P.: Automated assessment of cardiovascular disease associated with diabetes in rural and remote health practice. In: The National SARRAH Conference, pp. 1–7 (2004)
43. Awdah, A., Nabil, A., Ahmad, S., Reem, Q., Khidir, A.: Time-domain analysis of heart rate variability in diabetic patients with and without autonomic neuropathy. Ann. Saudi Med. 22, 5–6 (2002)
44. Chemla, D., Young, J., Badilini, F., Maison, B.P., Affres, H., Lecarpentier, Y., Chanson, P.: Comparison of fast Fourier transform and autoregressive spectral analysis for the study of heart rate variability in diabetic patients. Int. J. Cardiol. 104(3), 307–313 (2005)
45. Schroeder, E.B., Chambless, L.E., Liao, D., Prineas, R.J., Evans, G.W., Rosamond, W.D., et al.: Diabetes, glucose, insulin, and heart rate variability: the Atherosclerosis Risk in Communities (ARIC) study. Diabetes Care 28(3), 668–674 (2005)
46. Seyd, P.T.A., Ahamed, V.T., Jacob, J., Joseph, P.: Time and frequency domain analysis of heart rate variability and their correlations in diabetes mellitus. World Acad. Sci. Eng. Technol. 2(3) (2008)
47. Trunkvalterova, Z., Javorka, M., Tonhajzerova, I., Javorkova, J., Lazarova, Z., Javorka, K., Baumert, M.: Reduced short-term complexity of heart rate and blood pressure dynamics in patients with diabetes mellitus type 1: multiscale entropy analysis. J. Physiol. Meas. 29(7) (2008)
48. Faust, O., Acharya, U.R., Molinari, F., Chattopadhyay, S., Tamura, T.: Linear and non-linear analysis of cardiac health in diabetic subjects. Biomed. Signal Process. Control 7(3), 295–302 (2012)
49. Jian, L.W., Lim, T.C.: Automated detection of diabetes by means of higher order spectral features obtained from heart rate signals. J. Med. Imaging Health Inform. 3, 440–447 (2013)
50. Acharya, U.R., Faust, O., VinithaSree, S., Ghista, D.N., Dua, S., Joseph, P., Thajudin, A.V.I., Janarthanan, N., Tamura, T.: An integrated diabetic index using heart rate variability signal features for diagnosis of diabetes. Comput. Methods Biomech. Biomed. Eng. 16, 222–234 (2013)
51. Swapna, G., Acharya, U.R., VinithaSree, S., Suri, J.S.: Automated detection of diabetes using higher order spectral features extracted from heart rate signals. Intell. Data Anal. 17(2), 309–326 (2013)
52. Acharya, U.R., Faust, O., Kadri, N.A., Suri, J.S., Yu, W.: Automated identification of normal and diabetes heart rate signals using nonlinear measures. Comput. Biol. Med. 43(10), 1523–1529 (2013)
53. Acharya, U.R., Vidya, S., Ghista, D.N., Lim, W.J.E., Molinari, F., Sankaranarayanan, M.: Computer-aided diagnosis of diabetic subjects by HRV signals using discrete wavelet transform method. Knowl.-Based Syst. 42, 4567–4581 (2015)
54. Pachori, R.B., Kumar, M., Avinash, P., Shashank, K., Acharya, U.R.: An improved online paradigm for screening of diabetic patients using RR-interval signals. J. Mech. Med. Biol. 16, 1640003 (2016)
55. Flynn, A.C., Jelinek, A.F., Smith, M.: Heart rate variability analysis: a useful assessment tool for diabetes associated cardiac dysfunction in rural and remote areas. Aust. J. Rural Health 13(2), 77–82 (2005)
56. Acharya, U.R., Fujita, H., Oh, S.L., Adam, M., Tan, J.H., Chua, C.K.: Automated detection of coronary artery disease using different durations of ECG segments with convolutional neural network. Knowl.-Based Syst. 132, 62–71 (2017)
57. Acharya, U.R., Fujita, H., Oh, S.L., Hagiwara, Y., Tan, J.H., Adam, M.: Application of deep convolutional neural network for automated detection of myocardial infarction using ECG signals. Inf. Sci. 415, 190–198 (2017)
58. Acharya, U.R., Oh, S.L., Hagiwara, Y., Tan, J.H., Adam, M., Gertych, A., Tan, R.S.: A deep convolutional neural network model to classify heartbeats. Comput. Biol. Med. 89, 389–396 (2017)
59. Sujadevi, V.G., Soman, K.P., Vinayakumar, R.: Real-time detection of atrial fibrillation from short time single lead ECG traces using recurrent neural networks. In: The International Symposium on Intelligent Systems Technologies and Applications, pp. 212–221, Sept 2017. Springer
60. Swapna, G., Soman, K.P., Vinayakumar, R.: Automated detection of diabetes using CNN and CNN-LSTM network and heart rate signals. Procedia Comput. Sci. 132, 1253–1262 (2018)
61. Swapna, G., Vinayakumar, R., Soman, K.P.: Diabetes detection using deep learning algorithms. ICT Express 4, 243–246 (2018)
62. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al.: Tensorflow: a system for large-scale machine learning. OSDI 16, 265–283 (2016)

G. Swapna has been a Ph.D. student in Computational Engineering and Networking, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India, since July 2015. She is also a faculty member at Government Engineering College, Kozhikode, Kerala, India.

K. P. Soman has 25 years of research and teaching experience at Amrita School of Engineering, Coimbatore. He has around 150 publications in national and international journals and conference proceedings. He has organized a series of workshops and summer schools on advanced signal processing using wavelets, kernel methods for pattern classification, deep learning, and big-data analytics for industry and academia. He has authored the books “Insight into Wavelets”, “Insight into Data mining”, “Support Vector Machines and Other Kernel Methods” and “Signal and Image processing-the sparse way”, published by Prentice Hall, New Delhi, and Elsevier.

R. Vinayakumar has been a Ph.D. student in Computational Engineering and Networking, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India, since July 2015. He has published several papers on machine learning applied to cyber security. His Ph.D. work centers on the application of machine learning (and, in some cases, deep learning) to cyber security and discusses the importance of natural language processing, image processing and big data analytics for cyber security. He has participated in several international shared tasks and organized a shared task on detecting malicious domain names (DMD 2018) as part of SSCC’18 and ICACCI’18. More details are available at https://vinayakumarr.github.io/.

Deep Learning and the Future of Biomedical Image Analysis

Monika Jyotiyana and Nishtha Kesswani

Abstract Deep Learning (DL) is popular among researchers and academicians because of its reliability and accuracy, especially in engineering and the medical sciences. In medical imaging, DL techniques are very helpful for the early detection and diagnosis of disease. Among the most important features of DL techniques are that they are uncomplicated and of low complexity, which ultimately saves time and money, and that they can tackle many difficult tasks simultaneously. Artificial Intelligence (AI) and DL technologies have improved rapidly in recent years. They play an important role in many fields of application, especially in the medical field, in tasks such as image processing, image fusion, image segmentation, image retrieval, image analysis, computer-aided diagnosis (CAD), image registration, and image-guided therapy. This chapter describes DL methods and the future of biomedical imaging using DL in detail, and discusses the associated issues and challenges.

Keywords Machine Learning · Deep Learning · Convolutional Neural Networks · Recurrent Neural Network · Computer-Aided Diagnosis

1 Introduction

Currently, DL techniques are among the most frequently used algorithms for obtaining better, more scalable, and more accurate results from data than conventional state-of-the-art Machine Learning (ML) methods. DL is also applied to biomedical images to detect (diagnose) diseases and to support precisely tailored treatment plans that improve the patient’s health. EEG, ECG, MEG, MRI, and similar modalities are widely used biomedical signals and images for diagnosing patients with minimal human intervention. These medical images may also contain noise, which makes it difficult to analyse them accurately. Deep Learning has the potential to give reliable and precise results with higher accuracy. Every technology has pros and cons, and DL is no exception: it gives promising outcomes only when the dataset is large, and it needs GPUs and a high-end system configuration to process medical images. Despite these disadvantages, DL remains popular because of its capability to process huge amounts of data. This chapter discusses state-of-the-art DL approaches for biomedical images. We also discuss the application of DL to classification, registration, and segmentation, the issues and challenges of DL approaches, and the future of Deep Learning in biomedical imaging.

M. Jyotiyana (B) · N. Kesswani
Central University of Rajasthan, Bandar Sindri, Ajmer, India
e-mail: [email protected]
N. Kesswani
e-mail: [email protected]

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics, Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_15

1.1 Deep Learning

Within the wide assortment of Machine Learning (ML) approaches, Deep Learning has truly made its mark with its excellent performance, particularly in medical image processing. Deep Learning (DL) is a branch of ML, which in turn is a subfield of AI. It deals with algorithms inspired by the structure and functioning of the brain. It permits computational models to learn representations of a dataset with the aid of numerous hidden processing layers [1]. These layers perform feature extraction and transformation; the output of each layer is fed into the subsequent one. DL is a way to automate predictive analysis, and it can perform well in both supervised and unsupervised settings (Fig. 1).

The working of the Deep Learning approach is shown in Fig. 2. First, the dataset and the particular Deep Learning algorithm for which the model is to be designed are chosen; in further steps, comprehensive experiments are performed, and thereafter the results are generated and analyzed. Numerous Deep Learning models have been developed, such as Convolutional Neural Networks (CNN), Deep Belief Networks (DBN), and Recurrent Neural Networks (RNN). Some of them are discussed in the following subsections.

Convolutional Neural Networks (CNN)

A CNN is a DL technique that takes an image as input and assigns learnable weights and biases to various objects in the image so that they can be distinguished from one another. A CNN requires less pre-processing than other classification techniques. Its structure is inspired by the architecture of the visual cortex, in which a single neuron responds to stimuli only in a restricted region of the visual field, known as its receptive field. A collection of such fields overlaps to cover the entire visual area. A CNN has multiple convolutional layers. The first layer captures low-level features, while additional layers build on them to provide an overall understanding of the images in the dataset (Fig. 3).
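As a minimal sketch of what a single convolutional layer computes, the following pure-Python example slides a 3×3 kernel over a toy image and applies a ReLU. The function names `conv2d_valid` and `relu`, the toy image, and the hand-picked edge kernel are illustrative assumptions and do not come from the chapter; a real CNN learns its kernels during training.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (no padding, stride 1).

    As in most deep learning libraries, this is cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Each output value depends only on a kh x kw patch of the
            # input: the receptive field of that output unit.
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


def relu(feature_map: Matrix) -> Matrix:
    """Keep positive responses, set negative ones to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Toy 5x5 "image": dark left part (0), bright right part (1).
toy_image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Hand-picked vertical-edge kernel (illustrative; a CNN would learn this).
edge_kernel = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]

feature_map = relu(conv2d_valid(toy_image, edge_kernel))
```

The resulting feature map is zero where the patch is uniformly dark and positive where the patch straddles the dark-to-bright edge, which is the kind of low-level feature a first convolutional layer typically responds to.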


Fig. 1 Deep Neural Networks (DNN) architecture

Fig. 2 Working of the Deep Learning model

Fig. 3 Working model of Convolutional Neural Networks

Recurrent Neural Network (RNN)

RNNs have been successfully deployed on sequential data, for example in Google Voice Search and Apple Siri. Their distinguishing feature is an internal memory that lets them remember earlier inputs. An RNN considers the current input together with previously received inputs, which are fed back into the network, whereas a Convolutional Neural Network works as a Feed-Forward Neural Network (FFNN) that is concerned only with the current input. RNNs are used for time series, text classification, audio, and video because they develop a deeper understanding of sequence and context. In an RNN, information travels through a loop, and decisions are based on both the previously learned parameters and the current input (Fig. 4).

Long-Short Term Memory (LSTM)

The LSTM is a version of the RNN that can handle long input sequences through the operation of gates. It has three gates: an input gate, a forget gate, and an output gate. These gates decide, respectively, whether to add new input, whether to forget the least important information, and how strongly the stored information affects the output at the current timestamp. It has a memory on which read, write, and delete operations can be performed. It mitigates the vanishing gradient issue and achieves relatively high training accuracy.

Encoder-Decoder (ED)

The Encoder-Decoder architecture surpasses traditional ML methods. It has become a core technology for prediction in neural networks and for sequence-to-sequence techniques. It is able to handle variable-length input and output.

Fig. 4 RNN architecture


Fig. 5 Basic architecture of Encoder and Decoder

The encoder reads the input sequence and maps it to an encoded representation. The decoder then uses this encoded representation to generate the output (Fig. 5).
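The gate mechanism of the LSTM described earlier in this section can be sketched for a single memory unit as follows. This is a minimal illustration of the standard LSTM update, not the authors' implementation; the function name `lstm_step`, the weight layout, and the numerical weights are assumptions chosen only to make the example runnable.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, h_prev: float, c_prev: float,
              w: Dict[str, Tuple[float, float, float]]) -> Tuple[float, float]:
    """One time step of a single-unit LSTM.

    `w[name]` holds (input weight, recurrent weight, bias) for the
    gates "i" (input), "f" (forget), "o" (output) and for the
    candidate value "g".
    """
    def pre(name: str) -> float:
        wx, wh, b = w[name]
        return wx * x + wh * h_prev + b

    i = sigmoid(pre("i"))      # how much new information to write
    f = sigmoid(pre("f"))      # how much of the old memory to keep
    o = sigmoid(pre("o"))      # how much of the memory to expose
    g = math.tanh(pre("g"))    # candidate content to write

    c = f * c_prev + i * g     # update the memory cell
    h = o * math.tanh(c)       # hidden state passed to the next step
    return h, c


# Arbitrary illustrative weights (a trained network would learn these).
weights = {
    "i": (0.5, 0.1, 0.0),
    "f": (0.4, 0.2, 1.0),
    "o": (0.3, 0.1, 0.0),
    "g": (0.8, 0.3, 0.0),
}

h, c = 0.0, 0.0
states: List[Tuple[float, float]] = []
for x_t in [1.0, 0.5, -0.5, 0.0]:   # a short input sequence
    h, c = lstm_step(x_t, h, c, weights)
    states.append((h, c))
```

Because the memory cell is updated additively (`f * c_prev + i * g`) rather than being overwritten at each step, information and gradients can survive over more time steps than in a plain RNN.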

1.2 Biomedical Imaging

Deep Learning is a growing and popular research area in medicine for the diagnosis of diseases. Today, many people suffer from lifestyle diseases such as type-2 diabetes, obesity, heart disease, and neurodegenerative diseases, which are associated with the consumption of drugs and alcohol, smoking, and an unhealthy diet. Deep Learning plays a vital role in the prediction of such diseases. In day-to-day practice, Computer-Aided Diagnosis (CAD) is preferred for testing and diagnosing disease using Computerised Tomography (CT), Single Photon Emission Computed Tomography (SPECT), Positron Emission Tomography (PET), Magnetic Resonance Imaging (MRI), and other modalities. Deep Learning speeds up the diagnostic process and can exploit both 2D and 3D image parameters for further detail. It can also alleviate, to some extent, the problems of data labeling and over-fitting.

1.3 Role of Deep Learning in Diagnosis from Various Medical Images

Many diseases can be classified or diagnosed using DNNs, such as breast cancer, aphasia, and attention deficit hyperactivity disorder (ADHD). Deep Learning is a very popular research area that supports the diagnosis of many types of disease. For example, ADHD is a very common mental disorder among children. A child suffering from ADHD may face problems such as poor concentration, distractibility, weakness, and excessive activity. Similarly, Deep Learning is also used for the detection of cancer, Alzheimer’s disease, Parkinson’s disease, brain tumors, and many other conditions.

1.4 Applications

Deep Learning is now prevalent not only in health informatics but also in everyday life. Many prediction and classification tasks are handled by Deep Learning because of its promising results, accuracy, and fast processing with low complexity. Deep Learning has many applications; some typical health informatics applications are:

1. Content-based image retrieval
2. Object detection
   • Face detection
   • Disease diagnosis
   • Lesion detection
3. Machine vision and medical imaging
   • Tumor detection
   • Tumor stage
   • Surgery planning
   • Remote surgery
   • Intra-surgery navigation
   • Virtual surgery simulation
4. Recognition tasks
   • Iris recognition
   • Pattern recognition

2 Deep Learning in Medical Imaging

For many well-defined tasks, machines are faster and more accurate than humans, so such tasks are increasingly delegated to computers. In the medical sciences, Computer-Aided Diagnosis (CAD) and automatic medical image analysis are preferred and, arguably, crucial. CAD also plays an important role in modeling disease progression [2, 3]. In many neurodegenerative disorders (NDD), such as stroke, Parkinson’s disease (PD), Alzheimer’s disease (AD), and other types of dementia, brain scans are crucial, and detailed maps of brain regions are available for the analysis and prediction of these diseases. Other popular CAD tasks in medical imaging are cancer diagnosis and measuring the intensity of lesions.

In recent years, CNNs have become more popular because of their spectacular performance and reliability. Their efficiency and performance are documented in a survey of CNN methods for brain pathology segmentation [4] and in an overview of Deep Learning approaches used in CAD, shape prediction, and segmentation [2]. A major challenge in CAD is coping with the variation in tumor intensity and shape, and with the variation in imaging protocols within the same neuro-imaging modality. In many cases, the intensity of pathological tissue overlaps with that of healthy tissue, and noise such as Rician noise, intensity-based noise, and non-isotropic resolution effects in MRI cannot be handled easily with elementary Machine Learning (ML) approaches. Traditionally, such complications are handled with hand-crafted features that are then classified by well-established ML methods in an entirely separate step. Deep Learning approaches can automate feature extraction and unite it with classification [5, 6].

CNNs are capable of learning more complex features and can therefore handle image patches centered on unhealthy tissue. In medical imaging, CNNs have been used to classify tuberculosis manifestations in X-ray images [7] and lung disease in CT images [8]. For hemorrhage detection in color fundus images [9], a CNN can identify the least and the most discriminative patches during the pre-training stage. CNN-based methods have also been proposed for the segmentation of isointense-stage brain tissue [10] and for the extraction of brain regions from multi-modality Magnetic Resonance Images (MRI) [11].

Many hybrid approaches have been proposed in which a CNN is combined with other architectures. For example, in [12] a DL approach is proposed that encodes the parameters of a deformable model and segments the left ventricle of the heart in short-axis Magnetic Resonance Imaging. The CNN itself locates the left ventricle, while a Deep Auto-Encoder (DAE) is employed to infer its shape.

2.1 Classification

Classification assigns data to classes according to the task at hand. There are many established techniques for classification, such as Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Random Forest (RF), and Neural Networks (NN); the most recent is Deep Learning (DL), which offers several approaches to classification. CNN is a popular method for classification in biomedical imaging and health informatics. Image classification is discussed in detail in the next section.

2.1.1 Image Classification

Image classification is a broad area to which Deep Learning has contributed immensely. In classification, one or more images are used as input and a single variable is produced as output; this output is compared with the desired output to check whether the disease has been diagnosed correctly. Different classifiers can be used, such as Support Vector Machine (SVM), Random Forest (RF), and Artificial Neural Networks (ANN). Medical image classification is a crucial part of image recognition; its prime focus is to classify medical images into categories to diagnose a disease or to help researchers in further research. It can be performed by extracting useful features from the image and using those features to build classification models that classify the images in the dataset. Before CAD became as popular as it is today, doctors commonly relied on their experience to extract features from a medical image and then classify the image into one of several classes. This is usually a complicated, tedious, and time-consuming job. Deep Learning addresses the problem of accurate prediction: in some tasks DL gives more precise results than humans, and it is also faster. It can also process many datasets from different patients. In recent years, medical imaging applications have proved valuable not only for solving doctors’ problems but also in research. However, researchers have not yet fully succeeded in this mission. If classification could be performed efficiently and accurately, it would be a great help to doctors in diagnosing diseases.
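The classical two-step pipeline mentioned above (extract features, then classify) can be sketched with a K-Nearest Neighbor classifier, one of the techniques listed in Sect. 2.1. This is a toy illustration under stated assumptions: the two hand-crafted features (mean and variance of intensity), the synthetic 4×4 images, the labels "normal" and "lesion", and all function names are hypothetical and are not taken from the chapter.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Image = List[List[float]]
Features = Tuple[float, float]


def extract_features(image: Image) -> Features:
    """Two hand-crafted features: mean intensity and intensity variance."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return mean, var


def knn_predict(train: Sequence[Tuple[Features, str]],
                query: Features, k: int = 3) -> str:
    """Majority vote among the k training samples closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def make_image(background: float, blob: float) -> Image:
    """Synthetic 4x4 image with a 2x2 central blob (toy data only)."""
    img = [[background] * 4 for _ in range(4)]
    img[1][1] = img[1][2] = img[2][1] = img[2][2] = blob
    return img


training_set = [
    (extract_features(make_image(0.20, 0.20)), "normal"),
    (extract_features(make_image(0.25, 0.30)), "normal"),
    (extract_features(make_image(0.30, 0.25)), "normal"),
    (extract_features(make_image(0.20, 0.90)), "lesion"),
    (extract_features(make_image(0.25, 0.95)), "lesion"),
    (extract_features(make_image(0.30, 0.85)), "lesion"),
]

prediction = knn_predict(training_set,
                         extract_features(make_image(0.22, 0.88)))
```

In a Deep Learning pipeline the hand-written `extract_features` step disappears: the network learns its own features and the classifier jointly from the labeled images.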

2.1.2 Object or Lesion Classification

In medical image analysis and diagnosis, CAD acts as an assistant that provides an objective second (or additional) opinion. In recent years, many studies have shown that incorporating a CAD system makes the diagnostic process faster and more accurate, improving image-based diagnosis by reducing inter-observer variation [13, 14]. CAD also provides quantitative support for clinical recommendations such as biopsy [15]. For tumor identification, a CAD system is typically built from the following important steps: feature selection, feature extraction, and classification [16–19]. Various ML and DL classification techniques [20] have been proposed to distinguish cancerous from healthy cells [21]. The main challenge is to reduce the dimensionality of the features without losing significant information. In Deep Learning, the dataset is a major issue: if it is small, it becomes difficult to make predictions without the risk of over-fitting [21]. Researchers have proposed many solutions for lesion classification, but most of them achieve feature space reduction by deriving short feature sets, either by selecting features or by constructing new features in a supervised way [21].


2.2 Detection

To detect an organ or tissue, the image is first pre-processed and segmented and then classified, either to detect a disease or to classify the subject. The detection of carcinoma cells, for example basal cell carcinoma (BCC), using Deep Learning consists of a few steps: unsupervised or supervised feature learning, image representation using a CNN, automatic detection of BCC, and finally visual interpretation [4, 5, 22].

2.2.1 Organ, Region, Landmark Recognition

In medical imaging, organ and region detection is an important task, especially in cancer and neurodegenerative diseases. When organ deformation is recorded in MRI or another modality, it becomes easier to diagnose the type of disease the subject is suffering from and its stage [23]. In cancer diagnosis, for example of brain tumors, it plays a vital role in treatment planning. A prime challenge in microscopic image analysis is to analyze every individual cell for precise detection, because the differentiation of most disease grades depends on cell-level information [24]. To meet this challenge, researchers have used CNNs for robust and accurate detection and segmentation of cells in histopathological images [24, 25], notably for cancer diagnosis.

2.2.2 Object and Lesion Detection

Object and lesion detection is similar to object and lesion classification, discussed in Sect. 2.1.2. The only difference is that, to detect a lesion, segmentation must be performed first, followed by classification or prediction for the diagnosis of the disease [23, 26]. At present, Deep Learning provides promising results, so that disease can be detected at an early stage and the patient can be treated at the right time. For example, in 2018 Abraham et al. suggested a novel lesion segmentation method that uses the U-Net Deep Learning architecture to enhance segmentation accuracy and disease diagnosis or prediction [27].

2.3 Segmentation

Segmentation plays a significant role in predicting a disease or disorder: an image is divided into multiple segments, and the segments are then compared with test data [3]. Segmentation is mostly performed with CNN architectures; U-Net is currently among the most widely used architectures, including for 3D image segmentation. In medical image analysis, the image is divided on the basis of similar properties such as color, contrast, brightness, and grey level.


Fig. 6 Image segmentation methods

Image segmentation methods include threshold-based, edge-based, and region-based methods, ANN-based methods, unsupervised learning methods, and many more. For the sake of brevity, the details are not given here (Fig. 6).
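As an illustration of the simplest family named above, threshold-based segmentation, the following sketch picks a global threshold with Otsu's method and produces a binary mask. Otsu's method is our choice of example and is not named in the chapter; the toy 4×4 "scan", the 8-bit intensity range, and the function names are likewise assumptions.

```python
from typing import List

Image = List[List[int]]


def otsu_threshold(image: Image, levels: int = 256) -> int:
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form the background class, the rest the
    foreground class.
    """
    hist = [0] * levels
    for row in image:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    total_sum = sum(v * n for v, n in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0 or w_fg == 0:
            continue  # one class is empty, so this is not a valid split
        mean_bg = sum_bg / w_bg
        mean_fg = (total_sum - sum_bg) / w_fg
        # Proportional to the between-class variance (constant factor dropped).
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def segment(image: Image, threshold: int) -> Image:
    """Binary mask: 1 where the pixel is brighter than the threshold."""
    return [[1 if p > threshold else 0 for p in row] for row in image]


# Toy 8-bit "scan" with a bright 2x2 region on a dark background.
toy_scan = [
    [10, 12, 11, 13],
    [12, 200, 210, 11],
    [13, 205, 198, 12],
    [11, 10, 12, 13],
]

threshold = otsu_threshold(toy_scan)
mask = segment(toy_scan, threshold)
```

A single global threshold works here only because the toy foreground and background intensities do not overlap; the overlap of pathological and healthy tissue intensities noted in Sect. 2 is one reason learned methods such as CNNs are preferred on real scans.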

2.3.1 Organ and Substructure Segmentation

Researchers and medical practitioners perform segmentation to diagnose the stage and intensity of a disease. It is widely used in cancer diagnosis. Cancer is prevalent nowadays: in the US alone, for example, 23,000 cases of brain tumor were reported in 2015, and the number is increasing. Although the usual treatment for brain tumors is surgery, other treatments, including chemotherapy and radiotherapy, slow down the growth of the tumor. MRI gives full structural and functional details of the brain. Segmenting the tumor from MR images, CT images, or other medical imaging modalities can improve diagnosis, the estimation of the tumor’s growth rate and size, and treatment planning [28]. Tumors such as meningiomas can be segmented easily, whereas gliomas are difficult to segment because of their poor contrast and extended, tentacle-like structure [28]. The prime objective of tumor segmentation is to mark the location of the tumor, detect its extended region (where cancer cells are present), and compare the affected tissue with healthy tissue for diagnosis [28].

2.3.2 Lesion Segmentation

There are many state-of-the-art approaches to lesion segmentation, but CNNs give the most promising results on 2D as well as 3D biological data [29]. Yuan proposed a lesion segmentation method that uses convolution and deconvolution to automatically separate melanoma from the surrounding skin [30]. CNNs and other DL methods are used for the diagnosis of various types of cancerous cells because they give accurate and promising results in less time.


2.4 Registration

Registration is an image analysis task in which a coordinate transform is computed that maps one image onto another. Registration is usually carried out in an iterative framework in which a particular (for example, non-parametric) transformation is assumed and a predetermined metric is optimized [31]. MRI analysis now involves multi-parametric tissue information gathered in shorter acquisition times, larger cohorts, higher spatial and temporal resolution, and atlases. Mathematically, image registration is a challenging problem that combines geometric analysis, optimization strategies, and numerical schemes [32–34].
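The idea of registration as the optimization of a similarity measure over transform parameters can be sketched in its simplest form: a pure integer translation found by exhaustive search. This is a deliberately simplified illustration, with brute-force search standing in for the iterative optimizer and mean squared difference chosen by us as the measure; the function names and the toy images are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]


def mean_squared_difference(fixed: Image, moving: Image,
                            dy: int, dx: int) -> float:
    """Mean squared intensity difference over the overlap when
    `moving` is shifted by (dy, dx) relative to `fixed`."""
    h, w = len(fixed), len(fixed[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            my, mx = y - dy, x - dx   # source pixel in the moving image
            if 0 <= my < h and 0 <= mx < w:
                total += (fixed[y][x] - moving[my][mx]) ** 2
                count += 1
    return total / count if count else float("inf")


def register_translation(fixed: Image, moving: Image,
                         max_shift: int = 2) -> Tuple[int, int]:
    """Try every integer shift up to `max_shift` and keep the one
    with the lowest mean squared difference."""
    candidates = [(dy, dx)
                  for dy in range(-max_shift, max_shift + 1)
                  for dx in range(-max_shift, max_shift + 1)]
    return min(candidates,
               key=lambda s: mean_squared_difference(fixed, moving, *s))


def blob_image(top: int, left: int, size: int = 6) -> Image:
    """Toy image: a bright 2x2 blob on a dark background."""
    img = [[0.0] * size for _ in range(size)]
    for y in range(top, top + 2):
        for x in range(left, left + 2):
            img[y][x] = 1.0
    return img


fixed_image = blob_image(top=2, left=3)
moving_image = blob_image(top=1, left=1)
shift = register_translation(fixed_image, moving_image)
```

Here `shift` is the (row, column) displacement that best aligns the moving image with the fixed one. Practical registration replaces the translation with richer (affine or deformable) transforms and the exhaustive search with an iterative optimizer.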

2.5 Other Tasks in Medical Imaging

There are many other tasks in medical imaging that improve image quality and support the diagnosis of disease. They are described in the following subsections.

2.5.1 Content-Based Image Retrieval

The prime goal of Content-Based Image Retrieval (CBIR) is to assist the physician in decision making by returning medical cases similar to a given image. When used with DL, it requires a massive dataset, a sharp image representation, and algorithms that reliably retrieve the most similar images and their interpretations. The first application of DL to CBIR came in 2015 [35]. In 2019, Pizarro et al. [36] designed a CNN architecture that automatically infers the contrast of MRI scans from the pixel amplitude or intensity of multiple slices of the MR images [37].
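The retrieval step of a CBIR system can be sketched as a nearest-neighbor search over image representations. The example below assumes that every stored case has already been reduced to a feature vector (for instance, the activations of a CNN layer) and ranks cases by cosine similarity. The similarity measure, the case identifiers, and the vectors are illustrative assumptions, not details from the cited works.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: Sequence[float],
             database: Dict[str, Sequence[float]],
             top_k: int = 2) -> List[Tuple[str, float]]:
    """Return the `top_k` stored cases most similar to the query."""
    scored = [(case_id, cosine_similarity(query, vec))
              for case_id, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical feature vectors of previously diagnosed cases.
case_database = {
    "case_A": [0.9, 0.1, 0.0, 0.2],
    "case_B": [0.1, 0.8, 0.3, 0.0],
    "case_C": [0.8, 0.2, 0.1, 0.3],
    "case_D": [0.0, 0.1, 0.9, 0.4],
}

query_vector = [0.85, 0.15, 0.05, 0.25]
results = retrieve(query_vector, case_database)
```

The quality of the retrieved cases depends almost entirely on the representation: with learned features, images that are clinically similar should end up close together even when their raw pixels differ.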

2.5.2 Image Generation and Enhancement

DL in medical imaging has mostly focused on the classification, prediction, and segmentation of reconstructed images. Recently, Deep Learning has also penetrated the lower levels of MR measurement techniques, from MR image acquisition to denoising and super-resolution [32].

2.5.3 Combining Image Data with Reports

Because Deep Learning processes massive amounts of pre-processed data, it gives better results, which helps the radiologist in disease diagnosis and further research. The subject’s report includes similar cases and the probabilities with which symptoms of the disease occur, which supports sound decision making.


3 Future of Deep Learning in Biomedical Imaging

The health sector is entering a new era in which medical imaging and data will play a vital role. As the human population grows, the number of cases and subjects will also increase. Because Deep Learning relies on massive datasets, a growing number of recorded cases will ease the problem of obtaining large datasets. The fundamental requirement for any subject is that the right treatment be given to the right subject in limited time. In this context, the availability of massive datasets brings immense opportunities as well as challenges. Many studies report that CAD is more accurate than humans in disease diagnosis and that it can handle many cases simultaneously. The availability and reliability of CAD is therefore less of an issue than it used to be. In recent years, Deep Learning has been replacing conventional ML and pattern recognition in a great number of data-driven solutions in medical imaging, because it permits automatic feature creation and reduces human intervention during the procedure [20]. It is beneficial in many health informatics problems, and it is advancing rapidly for unstructured data originating from biomedical imaging, bioinformatics, and medical informatics. Most applications of DL to medical imaging process health data from unstructured sources [20]. However, plenty of information is also encoded in structured data [20], which gives complete information about the subject’s history, treatment, diagnosis, and pathology. In tumor detection, for example, the cytological notes include information about the tumor stage and its spread [20]. Such information is crucial for judging the patient’s condition or disease. Combined with artificial intelligence (AI), Deep Learning boosts the reliability of clinical decision support systems.

3.1 Recent Methods and Predictive Models

As the popularity of Deep Learning has grown because of its reliability and flexibility, many approaches and frameworks have become popular in biomedical imaging. Recently, CNNs have often been combined with other architectures and algorithms, for example CNN with Auto-Encoder, CNN with SVM for classification, and CNN with the K-Means algorithm for image segmentation; various other methods and architectures are available for solving real-life and research problems. Models with different layers and structures include VGG [38], AlexNet [39], GoogLeNet [40], ResNet [41], Highway nets [42], DenseNet [43], ResNext [44], SENets [45], NASNet [46], YOLO [47], GANs [48], Siamese nets [49], U-net [50], V-net [51], and many more.


4 Challenges and Issues

Various issues and challenges are associated with the different application domains, in particular with medical applications, and need to be solved:

• Data volume: Deep Learning is highly computational and needs to process large amounts of data. There is no fixed requirement for the number of training samples, but a general rule of thumb is to have at least about 10 times as many samples as parameters in the network. Large volumes of data are available in application domains such as computer vision, speech, and natural language. As the world population grows, the number of disease cases also increases, which makes data collection easier.
• Data quality: Data quality is another pertinent issue in Deep Learning. In some application domains the data are heterogeneous, raw, noisy, and incomplete, which may lead to wrongly interpreted results. Maintaining data quality with such a huge and heterogeneous raw database while training a good DL model raises several issues that need to be considered, such as data scarcity, repetition of data, and missing values.
• Interpretability: Despite the successful implementation of Deep Learning models in several application domains, these models are still treated as black boxes, although interpretability is crucial for predictive systems in many application domains.
• Domain complexity: Domain complexity is another issue. In the medical domain, the datasets are highly heterogeneous, and knowledge about the causes and progression of diseases is incomplete. Hence, taking domain complexity into account is a very important aspect of designing and developing Deep Learning models.
• Temporality: In various application domains, such as the medical domain, the data change over time in a nondeterministic way because diseases progress, whereas Deep Learning models are trained with static vector-based inputs and are not designed to handle the time factor. Designing and developing DL models that take temporal data into consideration is therefore another important aspect of Deep Learning.

These challenges and issues open the door to future research directions:

• Feature enrichment: Limited data are available because the number of patients that characterize each disease is small. The data needed to generate features need not be limited to a specific source such as social media; data can also be collected through wearable devices, surveys, social communities, and other sources. Integrating these data sources with Deep Learning models is another research challenge for the community.
• Temporal modeling: In the health sector and in real-life problems, time is crucial. When machines such as CAD systems, EHR systems, and other monitoring devices are involved, time is very sensitive, and training with Deep Learning should be fast, accurate, and reliable in order to understand the subject’s condition and detect the stage of the disease. RNNs and architectures coupled with memory can be relied on to address this issue.
• Interpretable modeling: In Deep Learning, the performance of a model is important, but its reliability and interpretability are also very important. Deep Learning is popular because of its promising results and great performance, yet making the results more explainable remains a task. Researchers should focus on the algorithms as well as on model performance in order to develop systems with better predictive ability.
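The data-volume rule of thumb in the list above can be made concrete with a little arithmetic. The sketch below assumes that the rule is read as roughly ten training samples per trainable parameter, and it uses a hypothetical fully connected network; the layer sizes and function names are illustrative only.

```python
from typing import List


def count_dense_parameters(layer_sizes: List[int]) -> int:
    """Number of weights plus biases in a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def rule_of_thumb_samples(n_parameters: int,
                          samples_per_parameter: int = 10) -> int:
    """Suggested minimum number of training samples (assumed reading
    of the rule: about ten samples per parameter)."""
    return n_parameters * samples_per_parameter


# Hypothetical network: 64 input features, two hidden layers, 2 classes.
layers = [64, 32, 16, 2]
n_params = count_dense_parameters(layers)
n_samples = rule_of_thumb_samples(n_params)
```

Even this very small network has a few thousand parameters, which under the assumed rule calls for tens of thousands of labeled samples; this illustrates why data scarcity is such a pressing concern for much larger medical imaging models.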

References

1. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)
2. Greenspan, H., Van Ginneken, B., Summers, R.M.: Guest editorial deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans. Med. Imaging 35(5), 1153–1159 (2016)
3. Stoyanov, D., Taylor, Z., Sarikaya, D., McLeod, J., Ballester, M.A.G., Codella, N.C., De Ribaupierre, S. (eds.): OR 2.0 Context-Aware Operating Theaters, Computer Assisted Robotic Endoscopy, Clinical Image-Based Procedures, and Skin Image Analysis: First International Workshop, OR 2.0 2018, 5th International Workshop, CARE 2018, 7th International Workshop, CLIP 2018, Third International Workshop, ISIC 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16 and 20, 2018, Proceedings, vol. 11041. Springer (2018)
4. Havaei, M., Guizard, N., Larochelle, H., Jodoin, P.M.: Deep learning trends for focal brain pathology segmentation in MRI. In: Machine Learning for Health Informatics, pp. 125–148. Springer, Cham (2016)
5. Nie, D., Zhang, H., Adeli, E., Liu, L., Shen, D.: 3D deep learning for multi-modal imaging-guided survival time prediction of brain tumor patients. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 212–220, Oct 2016. Springer, Cham
6. Xu, T., Zhang, H., Huang, X., Zhang, S., Metaxas, D.N.: Multimodal deep learning for cervical dysplasia diagnosis. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 115–123, Oct 2016. Springer, Cham
7. Cao, Y., Liu, C., Liu, B., Brunette, M.J., Zhang, N., Sun, T., Curioso, W.H.: Improving tuberculosis diagnostics using deep learning and mobile health technologies among resource-poor and marginalized communities. In: 2016 IEEE First International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), pp. 274–281, June 2016. IEEE
8. Anthimopoulos, M., Christodoulidis, S., Ebner, L., Christe, A., Mougiakakou, S.: Lung pattern classification for interstitial lung diseases using a deep convolutional neural network. IEEE Trans. Med. Imaging 35(5), 1207–1216 (2016)
9. van Grinsven, M.J., van Ginneken, B., Hoyng, C.B., Theelen, T., Sánchez, C.I.: Fast convolutional neural network training using selective data sampling: application to hemorrhage detection in color fundus images. IEEE Trans. Med. Imaging 35(5), 1273–1284 (2016)
10. Zhang, W., Li, R., Deng, H., Wang, L., Lin, W., Ji, S., Shen, D.: Deep convolutional neural networks for multi-modality isointense infant brain image segmentation. NeuroImage 108, 214–224 (2015)
11. Kleesiek, J., Urban, G., Hubert, A., Schwarz, D., Maier-Hein, K., Bendszus, M., Biller, A.: Deep MRI brain extraction: a 3D convolutional neural network for skull stripping. NeuroImage 129, 460–469 (2016)

Deep Learning and the Future of Biomedical Image Analysis

12. Avendi, M.R., Kheradvar, A., Jafarkhani, H.: A combined deep-learning and deformable-model approach to fully automatic segmentation of the left ventricle in cardiac MRI. Med. Image Anal. 30, 108–119 (2016) 13. Singh, S., Maxwell, J., Baker, J.A., Nicholas, J.L., Lo, J.Y.: Computer-aided classification of breast masses: performance and interobserver variability of expert radiologists versus residents. Radiology 258(1), 73–80 (2011) 14. Sahiner, B., Chan, H.P., Roubidoux, M.A., Hadjiiski, L.M., Helvie, M.A., Paramagul, C., Blane, C.: Malignant and benign breast masses on 3D US volumetric images: effect of computer-aided diagnosis on radiologist accuracy. Radiology 242(3), 716–724 (2007) 15. Joo, S., Yang, Y.S., Moon, W.K., Kim, H.C.: Computer-aided diagnosis of solid breast nodules: use of an artificial neural network based on multiple sonographic features. IEEE Trans. Med. Imaging 23(10), 1292–1300 (2004) 16. Chen, C.M., Chou, Y.H., Han, K.C., Hung, G.S., Tiu, C.M., Chiou, H.J., Chiou, S.Y.: Breast lesions on sonograms: computer-aided diagnosis with nearly setting-independent features and artificial neural networks. Radiology 226(2), 504–514 (2003) 17. Sun, T., Zhang, R., Wang, J., Li, X., Guo, X.: Computer-aided diagnosis for early-stage lung cancer based on longitudinal and balanced data. PLoS ONE 8(5), e63559 (2013) 18. Newell, D., Nie, K., Chen, J.H., Hsu, C.C., Hon, J.Y., Nalcioglu, O., Su, M.Y.: Selection of diagnostic features on breast MRI to differentiate between malignant and benign lesions using computer-aided diagnosis: differences in lesions presenting as mass and non-mass-like enhancement. Eur. Radiol. 20(4), 771–781 (2010) 19. Tourassi, G.D., Frederick, E.D., Markey, M.K., Floyd, C.E.: Application of the mutual information criterion for feature selection in computer-aided diagnosis. Med. Phys. 28(12), 2394–2402 (2001) 20. Ravì, D., Wong, C., Deligianni, F., Berthelot, M., Andreu-Perez, J., Lo, B., Yang, G.Z.: Deep learning for health informatics. 
IEEE J. Biomed. Health Inform. 21(1), 4–21 (2017) 21. Lu, J., Getz, G., Miska, E.A., Alvarez-Saavedra, E., Lamb, J., Peck, D., Downing, J.R.: MicroRNA expression profiles classify human cancers. Nature 435(7043), 834 (2005) 22. Cruz-Roa, A.A., Ovalle, J.E.A., Madabhushi, A., Osorio, F.A.G.: A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 403–410, Sept 2013. Springer, Berlin, Heidelberg 23. Bowles, C., Qin, C., Guerrero, R., Gunn, R., Hammers, A., Dickie, D.A., Rueckert, D.: Brain lesion segmentation through image synthesis and outlier detection. NeuroImage Clin. 16, 643–658 (2017) 24. Shen, D., Wu, G., Suk, H.I.: Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 19, 221–248 (2017) 25. Chen, H., Dou, Q., Wang, X., Qin, J., Heng, P.A.: Mitosis detection in breast cancer histology images via deep cascaded networks. In: Thirtieth AAAI Conference on Artificial Intelligence, Feb 2016 26. Van Leemput, K., Maes, F., Vandermeulen, D., Colchester, A., Suetens, P.: Automated segmentation of multiple sclerosis lesions by model outlier detection. IEEE Trans. Med. Imaging 20(8), 677–688 (2001) 27. Abraham, N., Khan, N.M.: A novel focal Tversky loss function with improved attention U-Net for lesion segmentation. arXiv preprint arXiv:1810.07842 (2018) 28. Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., Larochelle, H.: Brain tumor segmentation with deep neural networks. Med. Image Anal. 35, 18–31 (2017) 29. Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S., Cardoso, M.J.: Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 240–248. Springer, Cham (2017) 30. 
Yuan, Y.: Automatic skin lesion segmentation with fully convolutional-deconvolutional networks. arXiv preprint arXiv:1703.05165 (2017)

M. Jyotiyana and N. Kesswani

31. Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., Sánchez, C.I.: A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017) 32. Lundervold, A.S., Lundervold, A.: An overview of deep learning in medical imaging focusing on MRI. Z. Med. Phys. (2018) 33. Maclaren, J., Herbst, M., Speck, O., Zaitsev, M.: Prospective motion correction in brain imaging: a review. Magn. Reson. Med. 69(3), 621–636 (2013) 34. Zaitsev, M., Akin, B., LeVan, P., Knowles, B.R.: Prospective motion correction in functional MRI. NeuroImage 154, 33–42 (2017) 35. Juneja, K., Verma, A., Goel, S., Goel, S.: A survey on recent image indexing and retrieval techniques for low-level feature extraction in CBIR systems. In: 2015 IEEE International Conference on Computational Intelligence & Communication Technology, pp. 67–72, Feb 2015. IEEE 36. Pizarro, R., Assemlal, H.E., De Nigris, D., Elliott, C., Antel, S., Arnold, D., Shmuel, A.: Using deep learning algorithms to automatically identify the brain MRI contrast: implications for managing large databases. Neuroinformatics 17(1), 115–130 (2019) 37. Sklan, J.E., Plassard, A.J., Fabbri, D., Landman, B.A.: Toward content-based image retrieval with deep convolutional neural networks. In: Medical Imaging 2015: Biomedical Applications in Molecular, Structural, and Functional Imaging, vol. 9417, p. 94172C, Mar 2015. International Society for Optics and Photonics 38. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014) 39. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012) 40. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
1–9 (2015) 41. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 42. Srivastava, R.K., Greff, K., Schmidhuber, J.: Training very deep networks. In: Advances in Neural Information Processing Systems, pp. 2377–2385 (2015) 43. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017) 44. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492–1500 (2017) 45. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018) 46. Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning transferable architectures for scalable image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8697–8710 (2018) 47. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) 48. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014) 49. Koch, G., Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. In: ICML Deep Learning Workshop, vol. 2, July 2015 50. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, Oct 2015. Springer, Cham

51. Milletari, F., Navab, N., Ahmadi, S.A.: V-net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571, Oct 2016. IEEE

Monika Jyotiyana is currently a doctoral student in the Department of Computer Science at the Central University of Rajasthan, Rajasthan, India. She received her Master's degree in 2012 and completed her undergraduate degree in 2010 at Aryan International College, affiliated to MDS University Ajmer, Rajasthan, India. Her research interests include medical image processing, Machine Learning, Deep Learning and Neural Networks. She has published papers in various conferences and book chapters.

Nishtha Kesswani is currently an assistant professor in the Department of Computer Science at the Central University of Rajasthan. She carried out her post-doctoral research at California State University, San Bernardino, USA, and received her doctorate from the University of Rajasthan, Rajasthan, India. She received her Master's degree from Malaviya National Institute of Technology, Rajasthan, India. Her research interests include Algorithms, Human-Computer Interaction and Wireless Networks. She has publications in various international journals, conferences, books and book chapters.

Automated Brain Tumor Segmentation in MRI Images Using Deep Learning: Overview, Challenges and Future

Minakshi Sharma and Neha Miglani

Abstract Brain tumor segmentation of MRI images is a crucial task in medical image processing. Diagnosing a brain tumor at an early stage improves both treatment and the patient's chances of survival. Manual segmentation depends heavily on the doctor, varies from one expert to another, and is very time-consuming. Automated segmentation, by contrast, helps a doctor make decisions quickly, produces reproducible results, and allows records to be maintained electronically, all of which improve diagnosis and treatment planning. Numerous automated approaches to brain tumor detection have been popular over the last few decades, notably Neural Networks (NN) and Support Vector Machines (SVM). Recently, however, deep learning has taken center stage in the automation of brain tumor segmentation, because deep architectures can represent complex structures, learn by themselves, and efficiently process large amounts of MRI-based image data. The chapter begins with an introduction to brain tumors and their types. The next section discusses preprocessing techniques; preprocessing is a crucial step for the correctness of an automated system. Feature extraction and feature reduction techniques are discussed after that. Conventional methods of image segmentation are then covered, followed by the deep learning algorithms relevant to this domain. A further section discusses the challenges faced when applying deep learning to medical image segmentation. The last section presents a comparative study of existing algorithms in terms of accuracy, specificity, and sensitivity on about 200 brain images. The motivation of this chapter is to give an overview of deep learning-based segmentation algorithms in terms of existing work, challenges, and future scope. The chapter presents the crux of the algorithms involved in brain tumor classification, together with a comparative analysis to determine which algorithm performs best.

M. Sharma (B) · N. Miglani
Department of Computer Engineering, National Institute of Technology, Kurukshetra, India
e-mail: [email protected]
N. Miglani
e-mail: [email protected]
© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics, Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_16

Keywords Convolution neural networks · Brain tumor segmentation · Deep learning · Magnetic resonance images · Support vector machine · Medical image processing

1 Introduction

In earlier times one could hardly imagine having access to a large amount of health care data, whereas today an enormous amount of data (big data, to be precise) is available. The reason is the improvement in image acquisition devices and tools, which in turn raises new challenges in image analysis. The growing volume and variety of medical data, such as images and imaging techniques, demands exhaustive and tedious effort from medical professionals, and that effort is both error-prone and inconsistent across professionals. An automated alternative for the diagnostic process is therefore a real necessity. Although machine learning can help with such automation, traditional approaches do not work efficiently for complicated problems, so combining techniques is worth considering to raise accuracy and precision in such fields. Machine learning together with high-performance computing can help to handle complex medical images and to reach reliable and adequate diagnostic outcomes. Similarly, feature extraction can be more powerful when done with deep learning, which can also help to generate new images. The results obtained by deep learning affect many tasks, namely diagnosing disease, measuring targets accurately, and building predictive models that suggest which actions could be performed, thereby guiding field experts.

In the past few years fields such as Artificial Intelligence, Deep Learning, and Machine Learning have evolved rapidly. These techniques play a crucial role in medical tasks such as image segmentation, image registration and interpretation, automated diagnosis, image processing, and the analysis and retrieval of image data. Machine learning assists in extracting data and features from images and in presenting this information in an organized way. Artificial Intelligence and Machine Learning techniques can help medical experts predict the likelihood of diseases in a more detailed and precise manner and, eventually, help to prevent them. Specialists and researchers in the medical field gain a clearer view for analyzing the genetic variations that are actually responsible for disease manifestation. Numerous traditional algorithms form the core of these techniques, namely the K Nearest Neighbors algorithm, Neural Networks, etc. [1]. Although these approaches are effective, they have shortcomings in terms of processing power and time consumption: they can process images in their raw form, but they need more time for feature extraction as well as expert knowledge.

Alongside these conventional approaches, many other approaches have begun to strengthen the field, namely Long Short-Term Memory, Extreme Learning Machine, Recurrent Neural Network, Convolution Neural Network and many more. These techniques overcome the limitations of conventional approaches because feature extraction is automated and learning is fast. They automate the representation of information by learning multiple levels of abstraction from a broad set of images that exhibit the required data behavior [2]. Although conventional approaches have delivered reasonably precise results in medicine, emerging technologies have made accurate solutions possible for complex problems as well. Numerous deep learning algorithms have produced significant performance and speed improvements in major areas such as drug discovery, text and speech recognition, and facial recognition. This chapter offers an extensive review of deep learning algorithms in medicine, particularly medical image analysis, outlining future perspectives while also considering past work. It aims to provide both elementary information and the state of the art of deep learning approaches in the medical domain.

2 What Is Brain Tumor?

The brain controls all vital functions of the human body. It is one of the most crucial and complicated organs and is the dominant part of the central nervous system (CNS). The skull encloses the brain, which consists of gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF). CSF is a translucent liquid that surrounds the brain as well as the spinal cord. It serves several functions for the CNS: it acts as a shock absorber, and it carries ions, oxygen, and glucose that are distributed throughout the nervous tissue. CSF also helps to remove waste from nervous tissue [3–5].

2.1 Types of Brain Tumor [6–9]

According to the World Health Organization (WHO), approximately 120 different types of brain tumor have been identified so far. The WHO classification criterion is the cell of origin. Broadly, brain tumors can be classified into two categories (as shown in Fig. 1):

a. Primary brain tumors: These are tumors that originate in the brain. A tumor is named after the cells from which it originates. Primary tumors can be benign or malignant. Benign tumors are non-cancerous and do not affect other parts of the body, while malignant brain tumors start in the brain itself and can affect other parts of the body such as the spine. Malignant tumors grow fast. Benign brain tumors are easier to treat than malignant tumors because they are not deeply buried in the brain and have well-defined boundaries. In addition, a benign tumor that has been removed successfully has very little chance of coming back, whereas a malignant tumor may recur even after it has been removed.

b. Secondary brain tumors: These are tumors that spread to the brain from other body parts (metastasis). A tumor of this type is named after the body part from which it originally spread. If a brain tumor develops from the lung, it is known as a metastatic lung tumor, because it arises from the abnormal growth of lung cells [10].

Fig. 1 Brain tumor classification based on their originating cell (diagram labels: Types of Brain Tumor; Primary Brain Tumor; Secondary Brain Tumor; Glioma (45%); Astrocytoma (34%); Ependymoma (2%); Oligodendroglioma (3%); Subependymoma; Pilocytic, Grade 1; Low grade, Grade 2; Anaplastic, Grade 3; Glioblastoma, Grade 4; Metastasis)

By Grade

In medical terms, brain tumors are classified into four grades. Grade 1 tumors are in the initial phase and grow slowly. Grade 2 tumors are also regarded as benign. Grade 1 and Grade 2 fall into the low-grade category. Grade 3 and Grade 4 are high-grade (malignant) tumors and need urgent treatment (refer to Fig. 2).

3 What Is Deep Learning?

Deep learning is also known as structured learning or hierarchical learning. It is a subtype of machine learning that mimics the human brain. A simple neural network architecture has only three types of layers (input layer, hidden layer, output layer). A deep neural network, in contrast, contains many hidden layers for multi-stage processing.

Fig. 2 Brain tumor classification according to grade
Benign (Grade 1/Grade 2): slow growth; little chance of recurrence; no proliferation to other parts; surgery is enough to treat; radiotherapy or chemotherapy is not required.
Malignant (Grade 3/Grade 4): fast growth rate; may recur even after treatment (surgery); infects other parts as well; radiotherapy/chemotherapy is required.

3.1 Deep Learning Architecture and Neural Network

An Artificial Neural Network (ANN) is an attempt to build a computer model of the human brain, with the aim of performing computational work faster than traditional approaches. An ANN consists of basic units that are interconnected so that they can communicate with one another [11]. One of the earliest neural networks built on the analogy of the biological neural network was the perceptron. In the simple perceptron, the nodes of an input layer are directly linked to the nodes of an output layer, which is sufficient to solve problems in which the patterns are linearly separable. This simple network cannot handle complex problems, however, so a layered structure was proposed for complex patterns, comprising an input layer, an output layer, and n hidden layers between them. In the basic neural network approach, interconnected neurons receive input, perform operations on the received data, and forward the result to the next layer, which may be a hidden layer or the output layer depending on the structure, as shown in Fig. 3. An activation function determines the value passed to the next layer: depending on the chosen threshold, a neuron is excited or inhibited. Increasing the number of hidden layers allows more complex problems to be solved, because hidden layers capture non-linear relationships. Although hidden layers help with complex problems, it is important to use as few of them as possible, because each additional layer also complicates the structure. Networks with many hidden layers are called Deep Neural Networks. Training and learning from data are cost-effective in such networks. The extra hidden layers allow features to be composed from lower layers towards upper layers, which makes it possible to design complex architectures.

Fig. 3 Neural network structure [70] (diagram labels: Input Values, Input Layer, Hidden Layer 1, Hidden Layer 2, Output Layer)

Deep learning has emerged as a promising approach for designing and developing automated applications and has set a benchmark as well. Results obtained by automation have outperformed manual observation: in the medical domain, for example, computer vision applications based on deep learning have provided accurate and precise results in capturing cancer indicators in tumors and blood in MRIs. Deep learning can be regarded as an extension of the artificial neural network with numerous hidden layers, which permits a more abstract view and refined image analysis. The approach has attracted researchers from many fields because of the strong results it has achieved in applications such as facial recognition, object detection, and medicine. A Deep Neural Network stacks many layers of nodes/neurons into a hierarchical structure; the layer count has even exceeded a thousand in a single network. With such modeling capacity, and given a rigorous training process on a huge database, the network can memorize a very wide range of mappings and make apt predictions for unseen cases. The approach therefore has a strong impact on medical imaging and computer vision, and its influence can also be seen in the fields of voice and text.
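The role of the hidden layer and of the threshold activation described in this section can be illustrated with a minimal sketch. The weights below are picked by hand for illustration only (they are not learned and do not come from the chapter), and the function names are hypothetical. The example uses the XOR pattern, which is not linearly separable and therefore cannot be solved by a single perceptron, but is solved once one hidden layer is added.

```python
def step(value, threshold=0.0):
    """Threshold activation: the neuron fires (1) only above the threshold."""
    return 1 if value > threshold else 0


def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per neuron, then activation."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(step(total))
    return outputs


# Hand-picked (not learned) weights, chosen only for illustration.
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_BIASES = [-0.5, -1.5]   # first neuron behaves like OR, second like AND
OUTPUT_WEIGHTS = [[1.0, -1.0]]
OUTPUT_BIASES = [-0.5]         # fires for "OR but not AND", i.e. XOR


def forward(inputs):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]


if __name__ == "__main__":
    for pattern in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(pattern, "->", forward(pattern))
```

In practice the weights are learned from data rather than set by hand, and smooth activations are normally used in place of the hard threshold so that gradient-based training is possible.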
Researchers are exploring different domains and extensions of the deep neural network; one such example is the Convolution Neural Network (CNN), which is very popular nowadays. Many other architectures in this domain have also attracted researchers, such as the deep Boltzmann machine (DBM), deep neural network (DNN), deep autoencoder (dA), deep belief network (DBN), and recurrent neural network (RNN) with its variants such as MDLSTM or BLSTM (listed with their advantages and disadvantages in Table 1). The CNN model is attracting particular attention in digital image processing and vision.

Basic Working

The working of a deep neural network is divided into five steps, as depicted in Fig. 4. The first step is to identify the problem and carry out a feasibility study; it is crucial to know whether deep learning can solve the given problem or not. In the second step, relevant data is collected. Several deep learning algorithms are available, so the third step is to select an appropriate algorithm. Finally, the fourth and fifth steps deal with training and testing, respectively.
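The five steps can be sketched end to end in a few lines. This is only an illustration under strong assumptions: the data are synthetic 2-D points rather than MRI features, a single logistic unit stands in for a deep network, and every function name and parameter value is hypothetical.

```python
import math
import random

# Step 1 (problem and feasibility) is a human decision: here, a two-class
# problem that a very small model is known to be able to solve.


def collect_data(samples_per_class, seed=0):
    """Step 2: synthetic 2-D points standing in for real feature vectors."""
    rng = random.Random(seed)
    data = []
    for label, centre in ((0, 0.0), (1, 3.0)):
        for _ in range(samples_per_class):
            point = [rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)]
            data.append((point, label))
    rng.shuffle(data)
    return data


def predict_probability(weights, bias, point):
    """Step 3: the selected model, a single logistic unit (assumed stand-in)."""
    z = sum(w * x for w, x in zip(weights, point)) + bias
    z = max(-60.0, min(60.0, z))  # keep math.exp within floating-point range
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=200, learning_rate=0.1):
    """Step 4: stochastic gradient descent on the logistic loss."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for point, label in data:
            error = predict_probability(weights, bias, point) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, point)]
            bias -= learning_rate * error
    return weights, bias


def evaluate(data, weights, bias):
    """Step 5: fraction of held-out samples classified correctly."""
    correct = sum(
        (predict_probability(weights, bias, point) >= 0.5) == bool(label)
        for point, label in data
    )
    return correct / len(data)


if __name__ == "__main__":
    samples = collect_data(samples_per_class=100)
    train_set, test_set = samples[:160], samples[160:]
    weights, bias = train(train_set)
    print("held-out accuracy:", evaluate(test_set, weights, bias))
```

Keeping the test samples out of training is the point of separating steps four and five: the reported accuracy then reflects performance on unseen cases.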

4 Benefits of Deep Learning Over Machine Learning

Image acquisition and image interpretation are the two stages on which correct disease diagnosis depends. In the past few years, image acquisition tools and devices have improved considerably, so that high-resolution radiological images such as CT scans, X-rays, and MRI are now available for further analysis. The benefits of automating image interpretation, however, are only beginning to be realized. Numerous applications of machine learning, such as computer vision, already exist, yet the conventional machine learning approaches used for image interpretation depend strongly on experts for feature extraction; one instance is the detection of brain tumors, which entails the extraction of structural features. Conventional approaches, although efficient, can yield inaccurate and unreliable results because of the large dissimilarities between patients' data. Machine learning algorithms therefore play a crucial role in handling disordered and convoluted data [12].

Deep learning, being a more specialized and precise approach, has attracted great interest in every field and especially in medical image analysis: it is expected to account for around $300 million of the medical imaging market by 2021. It is also expected to attract substantial investment in the medical domain, since it provides better accuracy and results for complex data. Deep learning has grown tremendously over the years (as shown in Fig. 5). The approach falls in the class of supervised machine learning methods. Deep learning has been applied to many fields, computer vision being one of them. Its main success, however, lies in reducing human involvement in disease diagnosis and relying on automated results to reach a high level of accuracy, specifically in the field of brain tumors, where even a minute mistake in analysis could be a serious blunder. In such cases deep learning provides significant results, as it can approximate and mimic the human brain better than the basic neural network approach by using leading methodologies and technologies. Deep learning exploits deep, insightful neural network models. The technique proves its worth when the available knowledge is limited and the problem at hand is complicated and realistic. The crux of a neural network is its basic unit, the neuron, which is inspired by the working of the human brain: multiple signals act as inputs, signals are passed from one layer to another, and the layers are linked together by inter-connection weights. Finally, the combined signals are passed through non-linear operations, resulting in an output signal.

Fig. 4 Basic working of deep neural network: (1) understand the problem and check for feasibility; (2) collect data; (3) select a deep learning algorithm according to the requirement; (4) train the algorithm; (5) test for performance

Fig. 5 Growth of deep neural network [12]

4.1 Comparison of Different Architecture of Deep Learning Models

The different deep learning architectures are compared in Table 1.

Table 1 Comparative analysis of deep learning architecture

| Type of network | Description | Advantages | Disadvantages |
|---|---|---|---|
| Deep Neural Network [1, 72] | Contains more than two hidden layers, so it can model more complex relationships. Mainly used for classification and regression | Adapts to new problems very easily. Does not require feature engineering, which consumes a lot of time and effort | Requires a large amount of data. Requires more computational time to train. Cannot summarize its classification process |
| Convolution Neural Network [62] | Comprises basic building blocks, namely convolution layers, pooling layers, and fully connected layers, and is designed to learn spatial hierarchies of features automatically using back-propagation | Needs relatively little image preprocessing. Shows good results for 2D data | Requires a lot of labeled data for classification |
| Recurrent Neural Network (RNN) [73] | Weights are shared across the sequence | Helps with time-dependent events. Also plays a major role in speech recognition and natural language processing | Needs a large dataset |
| Deep Boltzmann Machine (DBM) [74] | Maintains only undirected connections between layers | Helpful for ambiguous datasets | Optimization is very difficult for large datasets |
| Deep Auto-encoder (DA) [47] | Used for unsupervised learning; helps in feature extraction or dimensionality reduction | Does not require labeled data | Suffers from the vanishing-gradient problem. Processing time is higher because of the pre-training step |
| Deep Belief Network (DBN) [75, 76] | Comprises unidirectional connections. Uses supervised as well as unsupervised machine learning approaches. The hidden layer of every sub-network serves as the visible layer of the next | A greedy layer-wise approach together with tractable inference improves the likelihood | The initialization phase makes the learning process computationally expensive |
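The CNN row of Table 1 names convolution and pooling layers as building blocks. The sketch below shows what those two operations compute on a tiny image, using plain Python lists. The image, the kernel values, and the function names are illustrative assumptions; the fully connected layer and the back-propagation training mentioned in the table are deliberately left out.

```python
def convolve2d(image, kernel):
    """Valid (no padding) 2-D cross-correlation, as used in CNN layers."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


def relu(feature_map):
    """Non-linearity applied element-wise after the convolution."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; incomplete trailing windows are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


if __name__ == "__main__":
    # 5 x 5 image with a dark-to-bright vertical edge between columns 1 and 2.
    image = [[0, 0, 9, 9, 9] for _ in range(5)]
    # Hand-picked kernel that responds to left-to-right intensity increases.
    kernel = [[-1, 1], [-1, 1]]
    print(max_pool(relu(convolve2d(image, kernel))))
```

The pooled map keeps a strong response only where the edge lies, which is the sense in which stacked convolution and pooling layers build up a spatial hierarchy of features.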

5 Brain Tumor Classification Steps

Brain tumor classification consists of seven stages, from data collection to tumor detection, as shown in Fig. 6.

5.1 MR Image Acquisitions

The first step is to build an MR image database. Images are collected from a 1.5 T MRI machine, and the images generally used are 256 × 256 pixels in size. The intensity of a grey-scale image lies in the range [0, 255], where 0 represents black and 255 represents white. The database is divided into two parts: a training database and a testing database. The images are stored in JPEG format. Examples of brain images are shown in Fig. 7.
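The division into training and testing databases, and the stated image properties (256 × 256 pixels, grey levels in [0, 255]), can be checked with a small sketch. The 80/20 split, the fixed random seed, and the file names are assumptions made for illustration; the chapter does not state the proportions used.

```python
import random


def split_database(image_names, train_fraction=0.8, seed=0):
    """Shuffle the image list reproducibly and split it into two databases."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    names = sorted(image_names)
    random.Random(seed).shuffle(names)
    cut = int(round(len(names) * train_fraction))
    return names[:cut], names[cut:]


def is_valid_slice(pixels, size=256):
    """Check that an image is size x size with 8-bit grey levels in [0, 255]."""
    return (
        len(pixels) == size
        and all(len(row) == size for row in pixels)
        and all(0 <= value <= 255 for row in pixels for value in row)
    )


if __name__ == "__main__":
    # Hypothetical file names; a real database would be listed from disk.
    names = ["img_%03d.jpg" % index for index in range(10)]
    training, testing = split_database(names)
    print(len(training), "training images,", len(testing), "testing images")
```

Sorting before shuffling with a fixed seed makes the split independent of the order in which files happen to be listed, so the same images end up in the testing database on every run.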

5.2 Image Preprocessing

Image preprocessing is a crucial step for obtaining accurate results in the subsequent steps. It removes image noise, detects edges, enhances contrast, and covers loading the input image into the MATLAB environment [13–17]. The techniques used by the proposed system for preprocessing are:

1. MR Image Acquisition
2. Image Pre-Processing
3. Feature Extraction (GLCM)
4. Feature Reduction (Genetic Algorithm)
5. Image Classification and image segmentation using ANFIS, FCM
6. Result (Type of tumor, Area of tumor)
7. Performance (Evaluation and comparisons)

Fig. 6 Seven steps involved in the brain classification (the figure also contains a Knowledge Base block)


Fig. 7 Brain images having tumor [4]

(a) Histogram Equalization
(b) Conversion of colored images (RGB) to grey scale images
(c) Morphological Operations
(d) Edge Detection

(a) Histogram Equalization: Histogram equalization is applied to the image to enhance its contrast, which is beneficial for the subsequent steps. The image histogram represents the grey level distribution of the image as a graph. To produce a uniform histogram, the intensity values are spread over the entire scale. Generally, CLAHE (contrast limited adaptive histogram equalization) is more popular due to its high accuracy [10, 17–21] (Fig. 8).

(b) Conversion of colored images (RGB) to grey scale images: Colored images are converted into grey scale images to reduce the complexity of later steps (as shown in Fig. 9). A grey scale image contains only one image plane instead of the three planes of an RGB image, so the conversion reduces the data to be maintained to one third. This data reduction results in faster processing, which makes it a crucial step before any further processing [22, 23].

Fig. 8 Brain image before and after histogram equalization. The contrast of the image is enhanced, which is beneficial for later steps [71]. a The original MRI [5]. b Histogram equalized MRI
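The histogram equalization step described in (a) can be illustrated with a minimal sketch. It assumes a plain global equalization on a small 2-D list of integer grey levels; it is not CLAHE, and the function name `equalize_histogram` is hypothetical rather than taken from the chapter.

```python
def equalize_histogram(image, levels=256):
    """Global histogram equalization for a 2-D list of integer grey levels."""
    # Count how often each grey level occurs.
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1

    # Cumulative distribution of the grey levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: there is nothing to spread out.
        return [row[:] for row in image]

    # Look-up table that stretches the occupied levels over the full range.
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
           for c in cdf]
    return [[lut[value] for value in row] for row in image]


# Hypothetical low-contrast patch: all values lie between 50 and 52.
patch = [[50, 50, 51], [51, 52, 52]]
equalized = equalize_histogram(patch)
```

After equalization the darkest level of the patch maps to 0 and the brightest to 255, which is the contrast stretch that Fig. 8 illustrates.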


Fig. 9 a Colored MRI [4]. b Grey scale MRI

This can be done using the MATLAB function I = rgb2gray(RGB), where RGB is the image to be converted and I is the resulting grey scale image.

(c) Morphological Operations: Morphological operations can be applied to images to sharpen regions and to fill gaps in the image. There are four basic operations: dilation, erosion, opening and closing. Figure 10 shows the result before and after morphological operations [24, 25]. In MATLAB: Image = imerode(Image1, SE0); where SE0 = strel('disk', 8); here SE0 is a structuring element and strel creates a structuring element whose shape is a disk of radius 8.

Fig. 10 a Original image. b After morphological operation
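To make the erosion operation in (c) concrete, the following is a minimal sketch of binary erosion with a disk-shaped structuring element. It is an illustration under stated assumptions, not a reimplementation of the MATLAB call: the disk is a simple Euclidean disk, pixels outside the image are treated as background, and the names `disk_offsets` and `erode` are hypothetical.

```python
def disk_offsets(radius):
    """Offsets (dr, dc) that fall inside a Euclidean disk of the given radius."""
    return [(dr, dc)
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)
            if dr * dr + dc * dc <= radius * radius]


def erode(binary, radius):
    """Binary erosion: a pixel stays 1 only if every pixel under the disk is 1."""
    rows, cols = len(binary), len(binary[0])
    offsets = disk_offsets(radius)
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Assumption: anything outside the image counts as background (0).
            out[r][c] = int(all(
                0 <= r + dr < rows and 0 <= c + dc < cols
                and binary[r + dr][c + dc] == 1
                for dr, dc in offsets))
    return out


# A 3x3 block of foreground pixels inside a 5x5 image.
image = [[0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0]]
eroded = erode(image, radius=1)
```

With radius 1 the block shrinks to its single center pixel. Dilation, opening and closing can be built from the same offsets in the usual way.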

Pi1 Pi2 Pi3
Pi4 Pi5 Pi6
Pi7 Pi8 Pi9

Fig. 11 Masks used for fuzzy edge detection

(d) Edge Detection [26–48]: Edge detection algorithms find sudden changes in the intensity of an image, so that useless information can be filtered out. These algorithms find application in many areas such as computer vision, image enhancement, security during multimedia communication, medical diagnosis, image encryption, image compression and image segmentation. There are various edge detection algorithms, such as the Sobel, Roberts, Prewitt, Laplacian of Gaussian (LoG), Canny and fuzzy based edge detectors. A fuzzy based edge detector is generally used because medical images contain more vagueness and uncertainty, and the other standard edge detection algorithms fail to determine the true edges correctly. The fuzzy based edge detection algorithm has further advantages: no parameter setting is needed, it works well under all conditions, and noise does not affect the edge detection process, so no noise filtering is needed, unlike in other edge detection algorithms. The fuzzy based edge algorithm works in the following steps:

(1) The colored image is converted into a grey scale image.
(2) A 3*3 mask (Fig. 11) is used to scan the whole image, one 3*3 block at a time, until the complete image has been scanned.
(3) The crisp inputs p1 to p9 are fuzzified using the defined membership functions. Fuzzy input values: for Black, [0 0 255] with a triangular membership function; for White, [0 255 255] with a triangular membership function.
(4) The corresponding output is classified as white, black or an edge according to the fuzzy output values: for Black, [0 2 4] with a triangular membership function; for Edge, [133 131 122] with a triangular membership function; for White, [209 232 235] with a triangular membership function (as shown in Fig. 12).
(5) A total of 81 rules are made. A sample of the fuzzy rules is shown in Table 2.
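The fuzzification in step (3) can be sketched as follows. The sketch assumes that the three numbers given for each input set are the left foot, peak and right foot of a triangular membership function; the helper names `trimf` and `fuzzify` are chosen for illustration and are not taken from the chapter.

```python
def trimf(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b (a <= b <= c)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge


# Input membership parameters as quoted in step (3).
BLACK = (0, 0, 255)
WHITE = (0, 255, 255)


def fuzzify(pixel):
    """Degree to which a grey level in [0, 255] is 'black' and 'white'."""
    return {"black": trimf(pixel, *BLACK), "white": trimf(pixel, *WHITE)}


memberships = fuzzify(51)   # a fairly dark pixel
```

A dark pixel such as 51 gets a high degree of "black" and a low degree of "white". The fuzzy rules then combine these degrees for the nine mask positions p1 to p9.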


Fig. 12 Output membership function

5.3 Feature Extraction

Features are important attributes of an image, and one of the vital features is its texture. Extracting features from a pre-processed image is known as feature extraction; such features are used in classifying images [49–51]. There are two approaches to segmenting an image: the structured approach and the statistical approach. The proposed study uses the statistical approach. The techniques used for texture measurement include Gabor filters, the co-occurrence matrix, the wavelet transform and fractals. This study applies the Gray Level Co-occurrence Matrix (GLCM), which captures feature values numerically by making use of the spatial relationships among neighboring pixels. These numerical values also aid in classification and in comparing different features. The function used to compute the GLCM for a given image is available in MATLAB: GLCM2 = graycomatrix(image, 'Offset', offset)

Table 2 Sample of fuzzy rules

PIX2
BK BK WT WT WT
PIX1
BK BK WT BK WT
WT WT WT BK BK
PIX3
WT WT WT BK BK
PIX4
WT WT WT BK BK
PIX5
WT WT WT BK BK
PIX6
WT WT WT BK BK
PIX7
WT WT WT BK BK
PIX8
BK WT WT BK WT
PIX9
EDGE WT WT BK EDGE
PIX_OUTPUT


where image is the input image and offset specifies the directions in which the features are measured: the four directions 0°, 45°, 90° and 135° have offset values [0 1], [−1 1], [−1 0] and [−1 −1], respectively.

135°: (−1, −1)    90°: (−1, 0)    45°: (−1, 1)
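How a co-occurrence matrix is built from these offsets can be shown with a short sketch. It assumes an image that is already quantized to a small number of grey levels and counts ordered pixel pairs for a single (row, column) offset; the function name `glcm` and the toy image are illustrative assumptions.

```python
def glcm(image, offset, levels):
    """Count grey-level pairs (image[r][c], image[r + dr][c + dc]) for one offset."""
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            # Pairs whose neighbor falls outside the image are skipped.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts


# The four directions and offsets quoted in the text.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

# Hypothetical 2x3 image with three grey levels (0, 1, 2).
toy = [[0, 0, 1],
       [1, 2, 2]]
matrices = {angle: glcm(toy, off, levels=3) for angle, off in OFFSETS.items()}
```

Dividing each matrix by the sum of its entries gives the normalized GLCM on which texture features are computed.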

These features are used for segmenting the image. Image segmentation can be done in two ways: the statistical approach and the structured approach, and most researchers use the statistical approach. There are several statistical techniques for measuring texture, such as the co-occurrence matrix, fractals, Gabor filters and the wavelet transform. The proposed research work uses the Gray Level Co-occurrence Matrix (GLCM). GLCM captures numerical feature values using the spatial relationship among neighboring pixels, and these values are then used for comparing and classifying features. GLCM extracts 20 texture features, “Autocorrelation, Contrast, Correlation, Cluster Prominence, Cluster Shade, Dissimilarity, Energy, Entropy, Homogeneity, Maximum probability, Sum of squares, Variance, Sum average, Sum variance, Sum entropy, Difference variance, Difference entropy, information measure of Correlation, Information measure of correlation 2, Inverse difference (INV), Inverse difference normalized (INN), Inverse difference moment normalized” (as shown in Fig. 13) [52, 53]. GLCM features are extracted for three different brain images, namely Brain image 1, Brain image 2 and Brain image 3, as depicted in Fig. 14; Table 3 presents the results obtained from these images.

1. Contrast (contdr): It measures the variation between a pixel and its adjoining pixel in terms of grey scale change. Contrast can be computed using the formula below:

\[ \mathrm{Contdr} = \sum_{a,b} |a - b|^{2}\, P_i(a, b) \tag{1} \]

where P_i(a, b) is the GLCM entry for the grey-level pair (a, b).

2. Energy (energd): It measures how uniform an image is.

\[ a = \sum_{i,j} P_i^{2}(i, j) \tag{2} \]

Feature types

Shape Based Features: 1. Area 2. Perimeter 3. Circularity 4. Irregularity 5. Shape Index

Intensity Based Features: 1. Mean 2. Variance 3. S.D 4. Median 5. Skewness 6. Kurtosis 7. Range 8. Pixel Orientation

Texture Based Features: 1. Autocorrelation (autoc) 2. Contrast (contrd) 3. Co-relation1 (corrpd) 4. Co-relation2 (cpromd) 5. Cluster shade (cshad1) 6. Energy (energd) 7. Dissimilarity (dissid) 8. Entropy (entrod) 9. Entropy (entrod) 10. Homogeneity (homopd) 11. Maximum probability (maxprd) 12. Sum of Squares (sosvhd) 13. Sum Average (savghd) 14. Sum Variance (svarhd) 15. Sum entropy (senthd) 16. Difference Variance (dvhd) 17. Difference entropy (denthd) 18. Information measure of Co-relation1 (inf1hd) 19. Information measure of Co-relation2 (inf2h) 20. Inverse difference (indncd) 21. Inverse difference moment normalized (idmncd)

Fig. 13 List of feature that can be extracted from image

Fig. 14 a Brain image 1. b Brain image 2. c Brain image 3


Table 3 GLCM features for brain image 1, brain image 2, brain image 3

No.  Feature name                                   Image 1    Image 2     Image 3
1    Autocorrelation (autoc)                        0.07978    0.152848    43.1530
2    Contrast (contrd)                              0.95866    0.919698    1.8692
3    Co-relation1 (corrpd)                          295.685    294.6303    0.1392
4    Co-relation2 (cpromd)                          30.1227    30.70965    34.6933
5    Cluster shade (cshad1)                         0.06146    0.093196    5.2662
6    Energy (energd)                                0.83309    0.787922    0.1233
7    Dissimilarity (dissid)                         0.5369     0.672409    0.6877
8    Entropy (entrod)                               0.97182    0.960849    2.6980
9    Homogeneity (homopd)                           0.97524    0.959027    0.65645
10   Maximum probability (maxprd)                   0.91314    0.886946    0.6411
11   Sum of Squares (sosvhd)                        2.31867    2.48507     0.1973
12   Sum Average (savghd)                           2.31703    2.447587    44.9329
13   Sum Variance (svarhd)                          0.16651    0.622724    13.2626
14   Sum entropy (senthd)                           0.53192    0.96682     133.5676
15   Difference Variance (dvhd)                     10.9731    0.965064    1.8188
16   Difference entropy (denthd)                    1.85308    0.895411    1.8927
17   Information measure of Co-relation1 (inf1hd)   0.15269    2.227421    1.2145
18   Information measure of Co-relation2 (inf2h)    0.65648    0.886946    −0.0322
19   Inverse difference (indncd)                    0.96834    2.48507     0.2863
20   Inverse difference moment                      0.62785    2.447587    0.9107

\[ \mathrm{energd} = \sqrt{a} \tag{3} \]

3. Homogeneity (HOM): It measures how close the grey values of neighboring pixels are. If there are large variations in grey values, homogeneity is small, and vice versa.

\[ \mathrm{HOM} = \sum_{a,b} \frac{P_i(a, b)}{1 + |a - b|} \tag{4} \]

4. Energy (E): It yields the sum of squared elements in the GLCM. If an image is constant, the value of Energy becomes one.

\[ E = \sum_{a,b} P_i(a, b)^{2} \tag{5} \]

5. Entropy (Entrod): It measures the extent of disorder in an image.

\[ \mathrm{Entrod} = -\sum_{a,b} P_i(a, b)\, \log_2 \{P_i(a, b)\} \tag{6} \]

6. Variance (VAR): It measures the difference between the grey levels and the mean value obtained.

\[ \mathrm{VAR} = \sum_{a} \sum_{b} P_i(a, b)\, P_i(a, b) - \mu^{2} \tag{7} \]

7. Maximum Probability (MAX): The Max value represents the largest value of P_i in the matrix.

8. Cluster Shade: It calculates the skewness of the matrix.

9. Information measure of correlation 1

\[ \mathrm{IMC1} = \frac{H_{xy} - H_{xy1}}{\max(H_x, H_y)} \tag{8} \]

where H_xy represents entropy.
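Several of the features above can be computed directly from a normalized GLCM. The sketch below covers contrast, energy, homogeneity and entropy; it assumes the matrix entries sum to one and takes entropy with the conventional minus sign so that the value is non-negative. The function name `glcm_features` and the example matrix are illustrative only.

```python
import math


def glcm_features(p):
    """Contrast, energy, homogeneity and entropy of a normalized GLCM p."""
    contrast = energy = homogeneity = entropy = 0.0
    for a, row in enumerate(p):
        for b, value in enumerate(row):
            contrast += (a - b) ** 2 * value          # Eq. (1)
            energy += value ** 2                      # Eq. (5)
            homogeneity += value / (1 + abs(a - b))   # Eq. (4)
            if value > 0:                             # 0 * log(0) is taken as 0
                entropy -= value * math.log2(value)   # Eq. (6), with minus sign
    return {"contrast": contrast, "energy": energy,
            "homogeneity": homogeneity, "entropy": entropy}


# Hypothetical 2-level GLCM in which every pair is equally likely.
uniform = [[0.25, 0.25],
           [0.25, 0.25]]
features = glcm_features(uniform)
```

For this uniform matrix the sketch gives contrast 0.5, energy 0.25, homogeneity 0.75 and entropy 2.0. A matrix concentrated on the diagonal would instead give zero contrast and homogeneity one.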

5.4 Feature Reduction Using Genetic Algorithm

Feature reduction selects a minimal feature set out of all available features in order to enhance the accuracy and precision of segmentation and to reduce time complexity. The key idea behind feature reduction is to retain only those


Fig. 15 General steps of genetic algorithm: initialization and representation → selection based on fitness value → mutation and cross-over → stopping criteria? (false: repeat; true: exit)

features which are more relevant. The most popular feature reduction algorithms are “Sequential Forward Selection, Sequential Backward Selection, Genetic Algorithm and Particle Swarm Optimization, Principal Component Analysis [54, 55]”. The Genetic Algorithm was developed by John Holland in 1975 and relies on the biological concept that only the fittest survive [56, 57]: only the best parents produce offspring and, in the same manner, only the best solutions lead to further good solutions. Genetic algorithms find application in many areas such as optimization problems, machine learning and pattern recognition. Generally, a genetic algorithm has the following steps (as shown in Figs. 15 and 16):

(1) Initialization and representation: In the first phase, the initial population is generated randomly from the available search space. The genetic algorithm uses a binary coding scheme for representation, where 1 shows that a gene is present and 0 shows that it is absent.
(2) Selection: Selection is also known as the “survival of the fittest” operator. In this phase, the worst solutions are removed from the population while the best ones are duplicated. A fitness function is used to decide whether a solution is among the best or the worst.
(3) Cross Over and Mutation: In mutation, a position in the string is chosen at random and the value of that bit is flipped, i.e. 1 to 0 or 0 to 1. In crossover, two of the best chromosomes are joined at some point to generate a new population.
(4) Stopping Criteria: There must be some stopping criterion for the feature selection process, otherwise it would continue indefinitely. There are various ways to stop the feature selection process: (1) a pre-defined number of features, which depends on the user requirement, can be used as a stopping criterion,


Feature index: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Example chromosome: 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
Features: f1 f2 … f20

Flow: initialization of different parameters (chromosome length, population size) → generate temporary feature subset → ANFIS calculates fitness in terms of accuracy → accuracy > threshold? (yes: feature subset is promising) → new feature subset is generated by mutation and single point cross-over → maximum iteration reached? (no: repeat; yes: exit)

Fig. 16 Different steps involved in the process of feature reduction

(2) the number of iterations can be used as a stopping criterion, (3) the algorithm is stopped when the fitness function value no longer changes.

1. Initially, the grey level co-occurrence matrix (GLCM) extracts twenty texture features from each image. Each feature is assigned a number from 1 to 20, for example f1, f2 … f20.


2. These 20 features are passed to the genetic algorithm to reduce them to Y features. Before the genetic algorithm starts, the following parameters should be set: Population Size: 20; Maximum Chromosomal Length: Y.

3. After the feature reduction phase, only the Y most promising features are selected out of the 20 features.

4. For initialization, Y features are randomly selected and assigned to a temporary feature set. Each time the algorithm is executed, a different feature set is selected. Features are represented as a binary string such as “10010110101010000000”, where 1 signifies that the respective feature is present and 0 signifies that it is absent. In this example, the stopping criterion chosen is the maximum number of iterations: when it is reached, the algorithm stops.

5. Fitness Function: For the genetic algorithm to succeed, a fitness function must be defined to determine whether a particular feature subset is promising or not. In the proposed research work, ANFIS is used to calculate the fitness value using Eq. (9).

\[ \mathrm{Fitness}(f_i) = \sum_{i=1}^{20} \left( \frac{TrP}{TrP + FaN} + \frac{TrN}{TrN + FaP} + \frac{TrP + TrN}{TrP + FaN} \right) \tag{9} \]

where Fitness(fi) represents the fitness value corresponding to a particular feature subset. In Eq. (9), “True Positive (Tr P) means both training algorithm and testing algorithm results are positive, True Negative (Tr N) is both training algorithm and testing algorithm results are negative, False Positive (Fa P) signifies training algorithm result is positive and testing algorithm is negative and False Negative (Fa N) suggests Training algorithm result is negative and Testing algorithm results are positive”. Thus, two sets are formed, A and B: A contains the features that are selected and B those that are not, so that Total features = A ∪ B. The penalized fitness is Fitness(A) = F(i) − penalty(A), where A is the subset of selected features and penalty(A) = w * (|A| − d), with w the penalty coefficient. On the basis of the fitness function, the next-generation features are selected (as shown in Table 4). The fitness value helps in deciding whether a selected feature subset is good or not.
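The genetic feature selection loop described above can be sketched in a few lines. This is a minimal illustration under explicit assumptions, not the chapter's implementation: the ANFIS accuracy is replaced by a hypothetical stand-in (`toy_accuracy`, which simply rewards a made-up set of "useful" features), the penalty parameters `w` and `d` are arbitrary, and half of the population is kept in each generation.

```python
import random

N_FEATURES = 20
USEFUL = {2, 5, 6, 8, 13, 15, 19}   # hypothetical informative features (0-based)


def toy_accuracy(chromosome):
    """Stand-in for the ANFIS accuracy: fraction of 'useful' features selected."""
    return sum(chromosome[i] for i in USEFUL) / len(USEFUL)


def random_chromosome(n_selected, rng):
    """Binary string with exactly n_selected ones (1 = feature present)."""
    chosen = set(rng.sample(range(N_FEATURES), n_selected))
    return [1 if i in chosen else 0 for i in range(N_FEATURES)]


def mutate(chromosome, rng):
    """Flip one randomly chosen bit (1 to 0 or 0 to 1)."""
    child = chromosome[:]
    pos = rng.randrange(len(child))
    child[pos] = 1 - child[pos]
    return child


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def penalized_fitness(chromosome, accuracy, w=0.01, d=7):
    """Fitness(A) = F - w * (|A| - d), the penalty form given in the text."""
    return accuracy(chromosome) - w * (sum(chromosome) - d)


def run_ga(accuracy, population_size=20, n_selected=7, max_iterations=50, seed=0):
    rng = random.Random(seed)
    population = [random_chromosome(n_selected, rng)
                  for _ in range(population_size)]
    for _ in range(max_iterations):      # stopping criterion: maximum iteration
        ranked = sorted(population,
                        key=lambda ch: penalized_fitness(ch, accuracy),
                        reverse=True)
        survivors = ranked[:population_size // 2]     # selection
        children = []
        while len(survivors) + len(children) < population_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = survivors + children
    return max(population, key=lambda ch: penalized_fitness(ch, accuracy))


best = run_ga(toy_accuracy)
selected = [i + 1 for i, bit in enumerate(best) if bit]   # 1-based feature numbers
```

Because the best chromosomes survive each generation unchanged, the best fitness never decreases. In the chapter's setting the accuracy callback would be the classification accuracy obtained with ANFIS on the selected features.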

Table 4 Best chromosome selected by Genetic Algorithm with their classification accuracy

Sr. no   Feature number selected   Classification accuracy
1        14,16,6,12,9,7,20         74.0331
2        13,19,20,4,6,20,7         74.0222
3        15,19,17,12,14,7,20       73.3425
4        20,17,9,18,3,15,19        73.2432
5        9,20,15,5,19,18,3         73.2044
6        1,2,1617,16,1,11          73.0
7        3,7,16,20,19,6,15         75
8        9,16,3,13,14,6,3          72.0994
9        9,20,15,5,19,18,3         74.7238
10       18,20,19,6,10,5,13        71.2707

5.5 Neuro Fuzzy Modeling

The neuro-fuzzy concept was developed in 1995 by J.S.R. Jang. The neuro-fuzzy hybrid is the most fruitful integration of soft computing techniques: a neuro-fuzzy system combines the benefits of both fuzzy systems and neural networks. Fuzzy logic is capable of modeling vagueness, handling uncertainty and supporting human-type reasoning. The Adaptive Network based Fuzzy Inference System (ANFIS) uses a Takagi-Sugeno fuzzy inference system and has five layers, as shown in Fig. 17. The first hidden layer maps the input variables to their corresponding membership functions. In the second hidden layer, a T-norm is applied to calculate the antecedent of each rule; the final shape of the membership functions is also tuned in the second layer. The third hidden layer normalizes the rule strengths.
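A forward pass through such a layered system can be sketched as follows. The sketch assumes two inputs, Gaussian membership functions, the product as T-norm and first-order Takagi-Sugeno consequents; the last two layers (weighted consequents and their sum) follow the usual ANFIS formulation rather than anything stated in the chapter, and all names and parameter values are hypothetical.

```python
import math


def gaussian_mf(x, center, sigma):
    """Gaussian membership value of x for a fuzzy set (center, sigma)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def anfis_forward(x1, x2, rules):
    """Return (normalized rule strengths, crisp output) for inputs x1, x2."""
    # Layer 1: membership degrees. Layer 2: product T-norm gives rule strength.
    strengths = []
    for rule in rules:
        m1 = gaussian_mf(x1, *rule["mf1"])
        m2 = gaussian_mf(x2, *rule["mf2"])
        strengths.append(m1 * m2)

    # Layer 3: normalize the rule strengths so that they sum to 1.
    total = sum(strengths)
    normalized = [w / total for w in strengths]

    # Layers 4 and 5: weighted linear consequents p*x1 + q*x2 + r, then their sum.
    output = 0.0
    for w_bar, rule in zip(normalized, rules):
        p, q, r = rule["consequent"]
        output += w_bar * (p * x1 + q * x2 + r)
    return normalized, output


# Two hypothetical rules: "both inputs low -> 1" and "both inputs high -> 3".
RULES = [
    {"mf1": (0.0, 1.0), "mf2": (0.0, 1.0), "consequent": (0.0, 0.0, 1.0)},
    {"mf1": (2.0, 1.0), "mf2": (2.0, 1.0), "consequent": (0.0, 0.0, 3.0)},
]
weights, y = anfis_forward(1.0, 1.0, RULES)
```

At the midpoint (1, 1) both rules fire equally, so the output is the average of the two consequents, 2.0. Training an ANFIS amounts to adjusting the membership and consequent parameters so that such outputs match the target data.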

Fig. 17 Layered architecture of anfis


Fig. 18 GUI of ANFIS editor

5.6 ANFIS Editor

The ANFIS Editor GUI can be used for initialization of FIS properties. To start the ANFIS editor in MATLAB, type: anfisedit

Figure 18 shows the GUI of the ANFIS editor, in which there are 7 inputs and 1 corresponding output. Each input has two membership functions of custom type. The GUI has different panels for loading the data, generating the FIS, training the FIS and testing the FIS; loading the data is the first step. Data should be in matrix form and can be taken either from a file or from the workspace.

5.7 Training and Testing Phase

The proposed system comprises two steps: training is done in the first step and testing in the second, as shown in Fig. 19.

Fig. 19 Schematic diagram for MRI training and testing. Training: training image data set → feature extractor → feature vectors [X1 X2 … X7] stored in database. Testing: test image → feature extractor → features of test image [Y1, Y2 … Y7] → best match searched in database

In the training phase, features are extracted from the images using GLCM, reduced to a 7-feature subset using the Genetic Algorithm, and then stored in the database along with the corresponding output. A total of 57 images are used to train the proposed system. When a query image arrives for tumor identification, its GLCM features are first extracted and then sent to the recognizer of the proposed work to find the best match. Once a suitable match is found, the corresponding output is generated, namely the type of tumor and its grade [58].

Image Segmentation and Classification Methods

Region Based: 1. Region growing and splitting 2. Region merging method 3. Watershed segmentation 4. Level set method 5. Active Contour

Edge Based Methods: 1. Gradient based methods 2. Gray Histogram

Unsupervised Methods: 1. K means 2. FCM 3. ANT Tree Algorithm

Supervised Methods: 1. KNN 2. SVM 3. PCA

Neural network Based:
  Feed Forward learning: 1. Single layer 2. Multi-Layer
  Feed Back Learning: 1. ART models

Fig. 20 Classification of MRI brain image segmentation methods

5.8 Image Segmentation Methods [59–61]

There are various image segmentation methods, as shown in Fig. 20.

5.9 Fuzzy C-Means Segmentation

Fuzzy C-Means (FCM) is a well-known clustering algorithm used in pattern recognition [62–68]. Its advantage is that a data point need not belong to exactly one cluster; instead, it can belong to more than one cluster. The basic FCM steps are shown in Fig. 21. The FCM algorithm partitions a finite collection of n elements X = {x1, …, xn} into a collection of c fuzzy clusters with respect to some given criterion.

Step 1: Initialization. Initialize the membership function, that is, assign each element a degree of membership in each cluster. For example, four clusters (C1, C2, C3, C4) have been used for detecting four types of brain tumor. The memberships of each element sum to one:

\[ \sum_{j=1}^{C} \mu_j(x_i) = 1 \tag{10} \]

Fig. 21 Flow chart of FCM algorithm: start → initialize membership matrix → calculate centroids → calculate dissimilarity between the data points and centroids using Euclidean distance → update the membership matrix → is the previous cluster center the same as the new cluster center? (no: repeat; yes: stop)

where
i = 1, 2, 3, …, n, and n represents the number of elements to be partitioned into clusters;
j = 1, 2, 3, …, C, and C represents the number of clusters into which the elements are to be partitioned;
µ_j(x_i) represents the degree to which element x_i belongs to cluster C_j.

Step 2: Calculate centroids

\[ c_j = \frac{\sum_{i} \mu_j(x_i)^{m}\, x_i}{\sum_{i} \mu_j(x_i)^{m}} \tag{11} \]

where m is the fuzzification parameter, whose value generally lies between 1.25 and 2.

Step 3: Calculate the dissimilarity between the data points and the centroids using the Euclidean distance

\[ D_i = \sqrt{(x_2 - x_1)^{2} + (y_2 - y_1)^{2}} \tag{12} \]

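Step 4: Update the membership matrix using the equation

\[ \mu_j(x_i) = \frac{\left( \frac{1}{d_{ji}} \right)^{\frac{1}{m-1}}}{\sum_{k=1}^{c} \left( \frac{1}{d_{ki}} \right)^{\frac{1}{m-1}}} \tag{13} \]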

Step 5: Go back to Step 2 until the centroids no longer change.

In Fig. 22, the four clusters are represented by four colors (red, blue, purple and green) and each cluster center is marked with an “X”.

• Shape features can also be used to increase classification accuracy. Extra information from the patient, such as history and age, can be collected to increase classification accuracy.
• A modified Sugeno type ANFIS can be used.

Fig. 22 Output after FCM segmentation
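Steps 1 to 5 of the FCM algorithm can be sketched compactly. The sketch makes two assumptions that are not stated in the chapter: d_ji in Eq. (13) is read as the squared Euclidean distance (the common FCM formulation), and the iteration stops when no centroid moves by more than a small tolerance. The function name and the sample points are hypothetical.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Cluster points into c fuzzy clusters; returns (centroids, memberships)."""
    rng = random.Random(seed)
    dim = len(points[0])

    # Step 1: random membership matrix whose rows each sum to 1 (Eq. 10).
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([value / total for value in row])

    centroids = None
    for _ in range(max_iter):
        # Step 2: centroids as membership-weighted means (Eq. 11).
        new_centroids = []
        for j in range(c):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            new_centroids.append(tuple(
                sum(w * p[k] for w, p in zip(weights, points)) / total
                for k in range(dim)))

        # Steps 3 and 4: squared distances, then updated memberships (Eq. 13).
        for i, p in enumerate(points):
            d2 = [sum((a - b) ** 2 for a, b in zip(p, cj))
                  for cj in new_centroids]
            if min(d2) == 0.0:
                # The point sits exactly on a centroid: full membership there.
                hits = [d == 0.0 for d in d2]
                u[i] = [1.0 / sum(hits) if h else 0.0 for h in hits]
            else:
                inv = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in d2]
                total = sum(inv)
                u[i] = [value / total for value in inv]

        # Step 5: stop once the centroids no longer move.
        converged = centroids is not None and all(
            math.dist(old, new) < tol
            for old, new in zip(centroids, new_centroids))
        centroids = new_centroids
        if converged:
            break
    return centroids, u


# Two well-separated hypothetical groups of 2-D points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, memberships = fuzzy_c_means(data, c=2)
```

Each row of `memberships` gives the degree to which a point belongs to every cluster, so a point near a cluster boundary can share membership between clusters instead of being forced into one.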


6 Various Challenges Faced by Deep Learning

Although deep learning offers numerous benefits and has a large number of practical applications, attaining those benefits involves several challenges, discussed below.

6.1 Huge Amount of Data

The human brain requires a great deal of information and experience to reach a conclusion. In a similar way, artificial neural networks demand a huge amount of data for training and learning. A large dataset helps in obtaining accurate and precise results, and a deep learning classifier relies heavily on the size and quality of the available dataset. If only limited data is available, it can directly hamper the success of deep learning, specifically in medical domains [69]. Beyond the sheer size of the dataset, another challenge lies in generating such data for medical imaging, as it depends on the observations and interpretations provided by experts in the field. To minimize inaccuracies and human errors, it is important to consider the opinions of multiple experts, which becomes difficult if field experts are not available. Moreover, for rare diseases, sufficient cases might not be available. A further issue is imbalance in the data: for a rare disease, positive cases are scarce, so the dataset may be heavily skewed.

6.2 Domain Specific and Multi-tasking

In deep learning, training on the data can yield productive and precise results, but only for a specific problem. At present, the deep learning approach is highly domain-specific: to solve even a similar kind of problem or pattern, one has to re-assess and re-train on the data all over again. Although the approach is efficient for solving a specific problem, it is too inflexible to accommodate multi-tasking. Research is ongoing to support multi-tasking without revising the complete architecture; Multi-Task Learning (MTL) and Progressive Neural Networks are being explored to bring improvement in this respect.

6.3 Deep Learning Is Intrinsically a Black Box

Deep learning algorithms brought new hope to the field of medical imaging and opened new opportunities. They provided solutions to problems which were previously considered unsolvable by conventional approaches. Still, deep learning has its own shortcomings, one of which is the black-box problem. Although it is clear what input has been fed to the network and how the inputs are combined, the generation of the output is quite complex, and there is no clear understanding of how the output has been produced. Identifying inputs, applying model parameters and building the model are all accessible, but how the model actually works is difficult to understand. For this reason, the approach is weak in situations where verification is the foremost requirement, as the internal manipulations are hidden from the user.

6.4 Optimizing Hyper-parameters

Parameters whose values are set before the learning process begins are called hyper-parameters. A small change in these values can largely affect model performance. For real-life problems, default parameter values rarely yield accurate results and can hamper system performance significantly. Considering only a small number of hyper-parameters and tuning them manually, instead of optimizing them with standard methods, can also cause performance issues.

6.5 Requires High Performance Hardware

Deep learning requires high-capacity hardware, which is costly and also consumes a large amount of power.

6.6 Less Flexibility

A deep neural network can be trained for one domain only and cannot adapt to another. For a different problem, the network has to be trained again.

7 Research Issues and Future Perspectives

Processing power, big data and deep learning algorithms inspired by the human brain are the three key factors stimulating the deep learning revolution. Undoubtedly, the benefits achieved by deep learning are remarkable, but the human effort and cost incurred in attaining them are also high. Large companies and research laboratories, together with prominent hospitals, are working together towards the most favorable solutions in medical fields. Numerous companies, such as Hitachi and Siemens, have already stepped forward to invest heavily in the domain. For the detection of pediatric brain disorders, GE Healthcare is developing smart imaging technology with Boston Children's Hospital. Research labs are also spending money on delivering potent image-based applications.

7.1 Enhancements in Deep Learning Approach

Deep learning technology relies on the supervised learning approach. However, annotations of medical data, specifically of medical images, are often not available, for instance when a disease is rare or a field expert is not available. To overcome this data unavailability, it is crucial to switch from supervised to either unsupervised or semi-supervised learning. If the training approach is shifted in this way, specifically in medical fields, the accuracy and precision of the final results may be at stake. Though efforts are being made in this direction, no robust solution has yet been found to tackle the resulting inaccuracies. There remains considerable scope for improvement and modification.

7.2 Big Image Data Exploitation

Applying deep learning methods requires a huge dataset, and obtaining such data is itself a crucial and difficult task. Annotating real-world data is easy in comparison with medical image data. For instance, labeling objects or distinguishing men from women in real-world images is a trivial task, whereas interpreting medical images requires field expertise and is a costly, time-consuming affair. In fact, the opinions of multiple experts on the same data, not just a single expert, are required for accurate interpretation of image data. Another issue is whether data is available at all when diseases are rare; in such cases it becomes even more difficult to obtain a large dataset. A possible solution is for different healthcare service providers to share data as far as possible, which would reduce the problem of data access.

7.3 Pervasive Inter-organization Collaboration

Even though stakeholders make numerous predictions about the benefits and growth of deep learning in the medical imaging field, the replacement of humans with machines or tools will always remain a debatable issue. The significant improvements in the accuracy of analysis and prediction in disease diagnosis achieved by deep learning cannot be ignored. However, some issues persist which need the immediate attention of researchers. Collaboration between vendors, field experts and hospitals is essential to achieve substantial benefits for health quality; it would resolve the problem of data availability for field experts and researchers. Another issue concerns the advanced tools and equipment needed to handle exhaustive and ever-growing healthcare data, which would be especially helpful where sensor networks are increasing the volume of data exponentially.

7.4 Privacy and Judicial Concerns

Both technical and sociological issues can affect data confidentiality, so it must be dealt with from both perspectives. As far as privacy in the medical field is concerned, HIPAA comes to mind. HIPAA, the Health Insurance Portability and Accountability Act of 1996, is a US law. It gives patients legal rights over their individually identifiable information and provides standards and protocols to secure their personal details and their use in any form. This privacy protection is an absolute need in the current scenario, yet it is challenging to secure and hide patients' personal information so as to prevent its misuse. Restrictions on data limit the availability of content, which leads to limited datasets and hence to inaccurate results. Although it is not mandatory to comply with HIPAA, secure health information can be stored and maintained as a HIPAA covered entity. HIPAA applies only if Protected Health Information for transactions is transmitted electronically. Indian organizations and companies are also being assisted with HIPAA compliance in order to stay ahead in the world of data protection. Moreover, health care data is dynamic in nature, so existing methodologies are insufficient to tackle the problem.

8 Performance Comparison

The diagnostic accuracy of different image segmentation algorithms can be analyzed (as shown in Fig. 23 and Table 5) in terms of the following parameters:

Sensitivity = True Positive / (True Positive + False Negative) × 100%
Specificity = True Negative / (True Negative + False Positive) × 100%
Accuracy = (True Positive + True Negative) / (True Positive + True Negative + False Positive + False Negative) × 100%
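The three measures defined above can be computed directly from the four confusion counts. The following is a minimal sketch in Python, not the evaluation code used for Table 5; the class name `ConfusionCounts`, the function names and the example counts are assumptions introduced only for illustration, and all denominators are assumed to be non-zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for the four confusion counts."""
    tp: int  # true positives: tumor correctly labelled as tumor
    tn: int  # true negatives: background correctly labelled as background
    fp: int  # false positives: background labelled as tumor
    fn: int  # false negatives: tumor labelled as background


def sensitivity(c: ConfusionCounts) -> float:
    """Sensitivity = TP / (TP + FN) * 100%."""
    return 100.0 * c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Specificity = TN / (TN + FP) * 100%."""
    return 100.0 * c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Accuracy = (TP + TN) / (TP + TN + FP + FN) * 100%."""
    return 100.0 * (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)


if __name__ == "__main__":
    # Illustrative counts only; they are not taken from the chapter's experiments.
    counts = ConfusionCounts(tp=90, tn=80, fp=20, fn=10)
    print(f"Sensitivity: {sensitivity(counts):.2f}%")
    print(f"Specificity: {specificity(counts):.2f}%")
    print(f"Accuracy:    {accuracy(counts):.2f}%")
```

With the illustrative counts above, the sketch reports a sensitivity of 90.00%, a specificity of 80.00% and an accuracy of 85.00%.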


Fig. 23 Comparative analysis between deep learning and other segmentation methods

Table 5 Comparative analysis between deep learning and other segmentation methods (see also Fig. 23)

Algorithms                    Sensitivity (%)    Specificity (%)    Accuracy (%)
Fuzzy C means segmentation    96.1               93.4               86.16
ANFIS + Genetic               95.1               93.1               90.1
K-Mean + FCM                  80.1               93.32              83.4
Deep learning (CNN)           97.01              96.1               97.17

9 Conclusion

Deep learning has gained much popularity in recent years for the automation of daily life tasks, and in the coming years most routine jobs are likely to be performed by automatic devices rather than manually. This chapter gives an overview of different segmentation methods for images. Deep learning methods are more efficient and can address the problem better than other algorithms, providing improved results in comparison to conventional machine learning approaches. In this chapter we discussed the various phases of brain tumor segmentation, each in brief. Various deep learning algorithms were compared along with their relevant advantages and disadvantages. The chapter also discussed the reasons behind the slow growth of deep learning in the medical field and the solutions proposed by different researchers. The last section addresses open research issues and future directions.

10 Future Scope

(1) More features can be embedded to enhance classification precision. Shape is one such feature that can help raise the accuracy of classification. Extra information from the patient, such as medical history and age, can also increase classification accuracy.
(2) A more efficient deep learning model. A major problem in automatic brain tumor segmentation is the similarity between background and tumor pixels: some background pixels are misclassified as brain tumor pixels. In future, a more efficient deep learning model could be developed that differentiates between tumor and background pixels more accurately.
(3) A more efficient loss function can be chosen to train the deep CNN. A more effective loss function helps in differentiating between background and tumor pixels with improved accuracy.
(4) Colored images may also be considered. This study targets only grey-scale images; it could be extended to handle colored images.
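As an illustration of item (3), one loss that is commonly used when tumor pixels are far fewer than background pixels is the soft Dice loss. The chapter does not name a specific loss function, so this choice is an assumption made purely for the example. The sketch below is written in plain Python over flattened pixel sequences; the function name `soft_dice_loss` and the smoothing constant `eps` are hypothetical.

```python
from typing import Sequence


def soft_dice_loss(pred: Sequence[float],
                   target: Sequence[float],
                   eps: float = 1e-6) -> float:
    """Soft Dice loss over flattened pixels (an assumed example, not the chapter's method).

    pred   -- predicted tumor probabilities in [0, 1], one per pixel
    target -- ground-truth labels, 1 for tumor and 0 for background
    eps    -- small constant that avoids division by zero on empty masks
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    # Overlap between the prediction and the ground-truth tumor mask.
    intersection = sum(p * t for p, t in zip(pred, target))
    # Total predicted mass plus total ground-truth mass.
    total = sum(pred) + sum(target)
    dice = (2.0 * intersection + eps) / (total + eps)
    # Loss is 0 for perfect overlap and approaches 1 for no overlap.
    return 1.0 - dice


if __name__ == "__main__":
    truth = [0, 1, 1, 0]
    print(soft_dice_loss([0.0, 1.0, 1.0, 0.0], truth))  # perfect overlap
    print(soft_dice_loss([1.0, 0.0, 0.0, 1.0], truth))  # no overlap
```

Because the score depends only on the overlap with the tumor mask, the large number of correctly classified background pixels does not dominate the loss, which is why it is a frequent candidate in segmentation settings of this kind.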


Minakshi Sharma received the Ph.D. degree in Computer Science from Banasthali University, Rajasthan, India, in 2015. In 2017, she joined NIT Kurukshetra as an Assistant Professor in the Department of Computer Engineering. She has more than 10 papers to her credit in national and international conferences and journals. Her research interests include Deep Learning, Artificial Intelligence, Neural Networks, and Fuzzy Logic Based Systems.

Neha Miglani received her Master's degree in Computer Science from Kurukshetra University, India, in 2012. Currently, she is working as an Assistant Professor at the National Institute of Technology, Kurukshetra, India. Her research interests include Cloud Computing, Neural Networks, and Software Reliability, ranging from Cost Models and Software Reliability Growth Models to Reliability Metrics.