Information Technology in Biomedicine [1st ed.] 978-3-030-23761-5;978-3-030-23762-2

This book provides a comprehensive overview of advances in the field of medical data science, presenting carefully selected…


English Pages XII, 653 [650] Year 2019


Table of contents :
Front Matter ....Pages i-xii
Front Matter ....Pages 1-1
Functional Thermal Imaging of Skin Tissue Using the Discrete Thermal Time Constants Spectrum (Maria Strąkowska, Robert Strąkowski, Michał Strzelecki)....Pages 3-12
Contextual Classification of Tumor Growth Patterns in Digital Histology Slides (Zaneta Swiderska-Chadaj, Zhaoxuan Ma, Nathan Ing, Tomasz Markiewicz, Malgorzata Lorent, Szczepan Cierniak et al.)....Pages 13-25
Cervical Histopathology Image Classification Using Ensembled Transfer Learning (Chen Li, Dan Xue, Fanjie Kong, Zhijie Hu, Hao Chen, Yudong Yao et al.)....Pages 26-37
Functional Kidney Analysis Based on Textured DCE-MRI Images (Marcin Kociołek, Michał Strzelecki, Artur Klepaczko)....Pages 38-49
Incorporating Patient Photographs in the Radiology Image Acquisition and Interpretation Process (Elizabeth A. Krupinski)....Pages 50-55
Iterative Statistical Reconstruction Algorithm Based on C-C Data Model with the Direct Use of Projections Performed in Spiral Cone-Beam CT Scanners (Robert Cierniak, Piotr Pluta)....Pages 56-66
Deformable Mesh for Regularization of Three-Dimensional Image Registration (Piotr M. Szczypiński, Artur Klepaczko)....Pages 67-78
Simulator for Modelling Confocal Microscope Distortions (Katarzyna Sprawka, Piotr M. Szczypiński)....Pages 79-90
Front Matter ....Pages 91-91
Electromyography Based Translator of the Polish Sign Language (Noemi Kowalewska, Przemysław Łagodziński, Marcin Grzegorzek)....Pages 93-102
Electrooculography Application in Vision Therapy Using Smart Glasses (Maja Trzepacz, Przemysław Łagodziński, Marcin Grzegorzek)....Pages 103-116
Assessment of Muscle Fatigue, Strength and Muscle Activation During Exercises with the Usage of Robot Luna EMG, Among Patients with Multiple Sclerosis (Krystyna Stańczyk, Anna Poświata, Anna Roksela, Michał Mikulski)....Pages 117-128
Information Models of Dynamics in Healthcare (Václav Řepa)....Pages 129-140
Effects of External Conditions to Chaotic Properties of Human Stability (Radek Halfar, Martina Litschmannová, Martin Černý)....Pages 141-150
A Preliminary Evaluation of Transferring the Approach Avoidance Task into Virtual Reality (Tanja Joan Eiler, Armin Grünewald, Alla Machulska, Tim Klucken, Katharina Jahn, Björn Niehaves et al.)....Pages 151-163
Front Matter ....Pages 165-165
Convolutional Neural Networks in Speech Emotion Recognition – Time-Domain and Spectrogram-Based Approach (Bartłomiej Stasiak, Sławomir Opałka, Dominik Szajerman, Adam Wojciechowski)....Pages 167-178
Convolutional Neural Networks for Computer Aided Diagnosis of Interdental and Rustling Sigmatism (Andre Woloshuk, Michal Krecichwost, Zuzanna Miodonska, Dominika Korona, Pawel Badura)....Pages 179-186
Barley Defects Identification by Convolutional Neural Networks (Michał Kozłowski, Piotr M. Szczypiński)....Pages 187-198
Wavelet Convolution Neural Network for Classification of Spiculated Findings in Mammograms (Magdalena Jasionowska, Aleksandra Gacek)....Pages 199-208
Weakly Supervised Cervical Histopathological Image Classification Using Multilayer Hidden Conditional Random Fields (Chen Li, Hao Chen, Dan Xue, Zhijie Hu, Le Zhang, Liangzi He et al.)....Pages 209-221
A Survey for Breast Histopathology Image Analysis Using Classical and Deep Neural Networks (Chen Li, Dan Xue, Zhijie Hu, Hao Chen, Yudong Yao, Yong Zhang et al.)....Pages 222-233
Front Matter ....Pages 235-235
Descriptive Seons: Measure of Brain Tissue Impairment (Artur Przelaskowski, Ewa Sobieszczuk, Izabela Domitrz)....Pages 237-248
An Automatic Method of Chronic Wounds Segmentation in Multimodal Images (Joanna Czajkowska, Marta Biesok, Jan Juszczyk, Agata Wijata, Bartłomiej Pyciński, Michal Krecichwost et al.)....Pages 249-257
Evaluation of Methods for Volume Estimation of Chronic Wounds (Jan Juszczyk, Agata Wijata, Joanna Czajkowska, Marta Biesok, Bartłomiej Pyciński, Ewa Pietka)....Pages 258-267
Infrared and Visible Image Fusion Objective Evaluation Method (Daniel Ledwoń, Jan Juszczyk, Ewa Pietka)....Pages 268-279
Wavelet Imaging Features for Classification of First-Episode Schizophrenia (Kateřina Maršálová, Daniel Schwarz)....Pages 280-291
Dynamic Occlusion Surface Estimation from 4D Multimodal Data (Agnieszka A. Tomaka, Leszek Luchowski, Dariusz Pojda, Michał Tarnawski)....Pages 292-303
Evaluation of Dental Implant Stability Using Radiovisiographic Characterization and Texture Analysis (Marta Borowska, Janusz Szarmach)....Pages 304-313
Patella – Atlas Based Segmentation (Piotr Zarychta)....Pages 314-322
Front Matter ....Pages 323-323
De-Identification of Electronic Health Records Data (Piotr Borowik, Piotr Brylicki, Mariusz Dzieciątko, Waldemar Jęda, Łukasz Leszewski, Piotr Zając)....Pages 325-337
Correcting Polish Bigrams and Diacritical Marks (Mariusz Dzieciątko, Dominik Spinczyk, Piotr Borowik)....Pages 338-348
Initial Motivation as a Factor Predicting the Progress of Learning Mathematics for the Blind (Michał Maćkowski, Katarzyna Rojewska, Mariusz Dzieciątko, Dominik Spinczyk)....Pages 349-357
Front Matter ....Pages 359-359
Relationship Between Body Sway and Body Building in Girls and Boys in Developmental Age (Anna Lipowicz, Tomasz Szurmik, Monika N. Bugdol, Katarzyna Graja, Piotr Kurzeja, Andrzej W. Mitas)....Pages 361-370
Classification of Girls’ Sexual Maturity Using Factor Analysis and Analysis of Moderated Mediation (Monika N. Bugdol, Marta Marszałek, Marcin D. Bugdol)....Pages 371-381
Subjective and Objective Assessment of Developmental Dysfunction in Children Aged 0–3 Years – Comparative Study (Mariola Ciuraj, Katarzyna Kieszczyńska, Iwona Doroniewicz, Anna Lipowicz)....Pages 382-391
Intra- and Intergroup Measurement Errors in Anthropometric Studies Carried Out on Face Images (Katarzyna Graja, Anna Lipowicz, Monika N. Bugdol, Andrzej W. Mitas)....Pages 392-401
Comparative Analysis of Selected Stabilographic Parameters Among Polish and Slovak Children in Aspect of Factors Indirectly Affecting the Body Posture (Tomasz Szurmik, Piotr Kurzeja, Jarosław Prusak, Zuzanna Hudakova, Bartłomiej Gąsienica-Walczak, Karol Bibrowicz et al.)....Pages 402-412
The Impact of Physical Activity on the Change of Pulse Wave Parameters (Anna Mańka, Robert Michnik, Andrzej W. Mitas)....Pages 413-424
Evaluation of Selected Parameters Related to Maintaining the Body Balance in the Aspect of Physical Activity in Young Adults (Piotr Kurzeja, Jarosław Prusak, Tomasz Szurmik, Jan Potoniec, Zuzanna Hudakova, Bartłomiej Gąsienica-Walczak et al.)....Pages 425-435
RAS in the Aspect of Symmetrization of Lower Limb Loads (Patrycja Romaniszyn, Damian Kania, Katarzyna Nowakowska, Marta Sobkowiak, Bruce Turner, Andrzej Myśliwiec et al.)....Pages 436-447
Rhythmic Auditory Stimulation. Biocybernetics Dimension of Music Entrainment (Bruce Turner, Andrzej W. Mitas)....Pages 448-459
Front Matter ....Pages 461-461
Pointwise Estimation of Noise in a Multilead ECG Record (Piotr Augustyniak)....Pages 463-472
Heart Rate Variability Analysis on Reference Heart Beats and Detected Heart Beats of Smartphone Seismocardiograms (Szymon Sieciński, Paweł S. Kostka, Ewaryst J. Tkacz, Natalia Piaseczna, Marta Wadas)....Pages 473-480
The New Approach for ECG Signal Quality Index Estimation on the Base of Robust Statistic (Tomasz Pander, Tomasz Przybyła)....Pages 481-494
8-Lead Bioelectrical Signals Data Acquisition Unit (Tadeáš Bednár, Branko Babušiak, Milan Smetana)....Pages 495-506
Smart Sheet Design for Electrocardiogram Measurement (Branko Babusiak, Stefan Borik, Maros Smondrk, Ladislav Janousek)....Pages 507-517
Evaluation of Specific Absorption Rate in SAM Head Phantom with Cochlear Implant with and Without Hand Model Near PIFA Antenna (Jana Mydlova, Mariana Benova, Zuzana Psenakova, Maros Smondrk)....Pages 518-527
Front Matter ....Pages 529-529
The Evaluation of Gait Strategy on a Treadmill Using a New IGS Index Based on Frequency Analysis of COP Course (Piotr Wodarski, Jacek Jurkojć, Andrzej Bieniek, Miłosz Chrzan, Robert Michnik, Zygmunt Łukaszczyk et al.)....Pages 531-542
The Analysis of the Influence of Virtual Reality on Parameters of Gait on a Treadmill According to Adjusted and Non-adjusted Pace of the Visual Scenery (Piotr Wodarski, Jacek Jurkojć, Andrzej Bieniek, Miłosz Chrzan, Robert Michnik, Jacek Polechoński et al.)....Pages 543-553
Assessment of Loads Exerted on the Lumbar Segment of the Vertebral Column in Everyday-Life Activities – Application of Methods of Mathematical Modelling (Hanna Zadoń, Robert Michnik, Katarzyna Nowakowska, Andrzej Myśliwiec)....Pages 554-565
Evaluation of Muscle Activity of the Lower Limb During Isometric Rotation Based on Measurements Using a Dynamometric and Dynamographic Platform (Miłosz Chrzan, Robert Michnik, Andrzej Bieniek, Piotr Wodarski, Andrzej Myśliwiec)....Pages 566-577
Adhesion of Poly(lactide-glycolide) Coating (PLGA) on the Ti6Al7Nb Alloy Substrate (Janusz Szewczenko, Wojciech Kajzer, Anita Kajzer, Marcin Basiaga, Marcin Kaczmarek, Roman Major et al.)....Pages 578-589
Front Matter ....Pages 591-591
Classification System for Multi-class Biomedical Data that Allows Different Data Fusion Strategies (Sebastian Student, Krzysztof Łakomiec, Alicja Płuciennik, Wojciech Bensz, Krzysztof Fujarewicz)....Pages 593-602
Preliminary Study of Computer Aided Diagnosis Methodology for Modeling and Visualization the Respiratory Deformations of the Breast Surface (Aleksandra Juraszczyk, Mateusz Bas, Dominik Spinczyk)....Pages 603-613
Software Tool for Tracking of Voxel Phantom’s Anatomical Features (Maros Smondrk, Branko Babusiak, Mariana Benova)....Pages 614-622
Evaluation of the Usefulness of Images Transmitted by the Internet Communicator for the Diagnosis of Surgical Changes (Dariusz Dzielicki, Paweł Mikos, Krzysztof Dzielicki, Witold Lukas, Józef Dzielicki)....Pages 623-629
Digitized Records with Automatic Evaluation for Natural Family Planning and Hormonal Treatment (Zuzana Judáková, Ivana Gálová, Michal Gála, Ladislav Janoušek)....Pages 630-637
Design and Testing of Radiofrequency Instrument RONLINE (Alice Krestanova, Jan Kracmar, Milada Hlavackova, Jan Kubicek, Petr Vavra, Marek Penhaker et al.)....Pages 638-649
Back Matter ....Pages 651-653


Advances in Intelligent Systems and Computing 1011

Ewa Pietka Pawel Badura Jacek Kawa Wojciech Wieclawek Editors

Information Technology in Biomedicine

Advances in Intelligent Systems and Computing Volume 1011

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science & Electronic Engineering, University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong

The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, Perception and Vision, DNA and immune based systems, self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia. The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results. ** Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink **

More information about this series at http://www.springer.com/series/11156

Ewa Pietka · Pawel Badura · Jacek Kawa · Wojciech Wieclawek





Editors

Information Technology in Biomedicine


Editors Ewa Pietka Faculty of Biomedical Engineering Silesian University of Technology Zabrze, Poland

Pawel Badura Faculty of Biomedical Engineering Silesian University of Technology Zabrze, Poland

Jacek Kawa Faculty of Biomedical Engineering Silesian University of Technology Zabrze, Poland

Wojciech Wieclawek Faculty of Biomedical Engineering Silesian University of Technology Zabrze, Poland

ISSN 2194-5357 ISSN 2194-5365 (electronic) Advances in Intelligent Systems and Computing ISBN 978-3-030-23761-5 ISBN 978-3-030-23762-2 (eBook) https://doi.org/10.1007/978-3-030-23762-2 © Springer Nature Switzerland AG 2019 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Information technology is a rapidly evolving discipline in medical data science, presenting some of the most significant potential for the future of health care. Multimodal acquisition systems, mobile devices, sensors, and AI-powered applications give a new meaning to the optimization of clinical processes. Traditional signal and image data is complemented by patient-related and case-related data that is able to reflect the patient condition and support the diagnosis, treatment, and physiotherapy procedures.

The book includes nine parts that discuss various problems related to problem-dependent issues as well as general approaches to data acquisition, analysis, classification, and visualization. Special attention is paid to Active and Assisted Living in an aging society. More specifically, the thematic scope of the particular parts includes the aspects listed below.

The Quantitative Data Analysis in Medical Diagnosis part addresses methods for processing big quantitative medical data. Particular chapters discuss biomedical image registration, visualization and modeling, data analysis, recognition, and retrieval employed in detection and diagnosis support systems.

The Medical Data Science part presents original studies reporting on scientific approaches, especially pattern-recognition and machine-learning-based algorithms for the interpretation of health-related sensory data, and ICT solutions promoting an active and healthy lifestyle in the face of demographic changes leading to an aging society.

The Data Mining Tools and Methods in Medical Applications part covers a broad range of medical data mining approaches in diagnostic applications and decision support systems, including classical and convolutional neural networks and other artificial intelligence tools at the stage of feature selection and transformation, outlier detection, pattern recognition, and classification.

The Image Analysis part introduces multimodal data acquisition as well as 3D and 4D medical image analysis.

The Analytics in Action on SAS Platform part presents techniques including cognitive computing, deep learning, natural language processing, and machine learning combined with the scalable, high-performance SAS platform in healthcare applications that may improve patient care.


The Biocybernetics in Physiotherapy part introduces biocybernetic support for innovative physiotherapy procedures by incorporating anthropometrics, data acquisition, and processing systems.

The Signal Processing and Analysis part presents various approaches to ECG signal analysis, noise estimation, and heart rate variability analysis.

The Biomechanics and Biomaterials part presents the evaluation of gait strategy on a treadmill based on frequency analysis, an assessment of the influence of virtual reality on gait parameters, the evaluation of muscle activities, and a study on biodegradable polymer technology.

The Medical Tools and Interfaces part presents studies on workstation interface design and medical informatics tools employed in diagnosis, treatment, and visualization.

The editors would like to express their gratitude to all authors who contributed their original research reports as well as to all reviewers for their valuable comments. Your effort has contributed to the high quality of the book that we pass on to the readers.

Zabrze, Poland
June 2019

Ewa Pietka

Contents

Quantitative Data Analysis in Medical Diagnosis

Functional Thermal Imaging of Skin Tissue Using the Discrete Thermal Time Constants Spectrum . . . 3
Maria Strąkowska, Robert Strąkowski, and Michał Strzelecki

Contextual Classification of Tumor Growth Patterns in Digital Histology Slides . . . 13
Zaneta Swiderska-Chadaj, Zhaoxuan Ma, Nathan Ing, Tomasz Markiewicz, Malgorzata Lorent, Szczepan Cierniak, Ann E. Walts, Beatrice S. Knudsen, and Arkadiusz Gertych

Cervical Histopathology Image Classification Using Ensembled Transfer Learning . . . 26
Chen Li, Dan Xue, Fanjie Kong, Zhijie Hu, Hao Chen, Yudong Yao, Hongzan Sun, Le Zhang, Jinpeng Zhang, Tao Jiang, Jianying Yuan, and Ning Xu

Functional Kidney Analysis Based on Textured DCE-MRI Images . . . 38
Marcin Kociołek, Michał Strzelecki, and Artur Klepaczko

Incorporating Patient Photographs in the Radiology Image Acquisition and Interpretation Process . . . 50
Elizabeth A. Krupinski

Iterative Statistical Reconstruction Algorithm Based on C-C Data Model with the Direct Use of Projections Performed in Spiral Cone-Beam CT Scanners . . . 56
Robert Cierniak and Piotr Pluta

Deformable Mesh for Regularization of Three-Dimensional Image Registration . . . 67
Piotr M. Szczypiński and Artur Klepaczko


Simulator for Modelling Confocal Microscope Distortions . . . 79
Katarzyna Sprawka and Piotr M. Szczypiński

Medical Data Science

Electromyography Based Translator of the Polish Sign Language . . . 93
Noemi Kowalewska, Przemysław Łagodziński, and Marcin Grzegorzek

Electrooculography Application in Vision Therapy Using Smart Glasses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 Maja Trzepacz, Przemysław Łagodziński, and Marcin Grzegorzek Assessment of Muscle Fatigue, Strength and Muscle Activation During Exercises with the Usage of Robot Luna EMG, Among Patients with Multiple Sclerosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 Krystyna Stańczyk, Anna Poświata, Anna Roksela, and Michał Mikulski Information Models of Dynamics in Healthcare . . . . . . . . . . . . . . . . . . . 129 Václav Řepa Effects of External Conditions to Chaotic Properties of Human Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 Radek Halfar, Martina Litschmannová, and Martin Černý A Preliminary Evaluation of Transferring the Approach Avoidance Task into Virtual Reality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 Tanja Joan Eiler, Armin Grünewald, Alla Machulska, Tim Klucken, Katharina Jahn, Björn Niehaves, Carl Friedrich Gethmann, and Rainer Brück Data Mining Tools and Methods in Medical Applications Convolutional Neural Networks in Speech Emotion Recognition – Time-Domain and Spectrogram-Based Approach . . . . . . 167 Bartłomiej Stasiak, Sławomir Opałka, Dominik Szajerman, and Adam Wojciechowski Convolutional Neural Networks for Computer Aided Diagnosis of Interdental and Rustling Sigmatism . . . . . . . . . . . . . . . . . . . . . . . . . . 179 Andre Woloshuk, Michal Krecichwost, Zuzanna Miodonska, Dominika Korona, and Pawel Badura Barley Defects Identification by Convolutional Neural Networks . . . . . . 187 Michał Kozłowski and Piotr M. Szczypiński Wavelet Convolution Neural Network for Classification of Spiculated Findings in Mammograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199 Magdalena Jasionowska and Aleksandra Gacek


Weakly Supervised Cervical Histopathological Image Classification Using Multilayer Hidden Conditional Random Fields . . . . . . . . . . . . . . 209 Chen Li, Hao Chen, Dan Xue, Zhijie Hu, Le Zhang, Liangzi He, Ning Xu, Shouliang Qi, He Ma, and Hongzan Sun A Survey for Breast Histopathology Image Analysis Using Classical and Deep Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222 Chen Li, Dan Xue, Zhijie Hu, Hao Chen, Yudong Yao, Yong Zhang, Mo Li, Qian Wang, and Ning Xu Image Analysis Descriptive Seons: Measure of Brain Tissue Impairment . . . . . . . . . . . . 237 Artur Przelaskowski, Ewa Sobieszczuk, and Izabela Domitrz An Automatic Method of Chronic Wounds Segmentation in Multimodal Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249 Joanna Czajkowska, Marta Biesok, Jan Juszczyk, Agata Wijata, Bartłomiej Pyciński, Michal Krecichwost, and Ewa Pietka Evaluation of Methods for Volume Estimation of Chronic Wounds . . . . 258 Jan Juszczyk, Agata Wijata, Joanna Czajkowska, Marta Biesok, Bartłomiej Pyciński, and Ewa Pietka Infrared and Visible Image Fusion Objective Evaluation Method . . . . . 268 Daniel Ledwoń, Jan Juszczyk, and Ewa Pietka Wavelet Imaging Features for Classification of First-Episode Schizophrenia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280 Kateřina Maršálová and Daniel Schwarz Dynamic Occlusion Surface Estimation from 4D Multimodal Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292 Agnieszka A. Tomaka, Leszek Luchowski, Dariusz Pojda, and Michał Tarnawski Evaluation of Dental Implant Stability Using Radiovisiographic Characterization and Texture Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 304 Marta Borowska and Janusz Szarmach Patella – Atlas Based Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314 Piotr Zarychta Analytics in Action on SAS Platform De-Identification of Electronic Health Records Data . . . . . . . . . . . . . . . 325 Piotr Borowik, Piotr Brylicki, Mariusz Dzieciątko, Waldemar Jęda, Łukasz Leszewski, and Piotr Zając


Correcting Polish Bigrams and Diacritical Marks . . . . . . . . . . . . . . . . . 338 Mariusz Dzieciątko, Dominik Spinczyk, and Piotr Borowik Initial Motivation as a Factor Predicting the Progress of Learning Mathematics for the Blind . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349 Michał Maćkowski, Katarzyna Rojewska, Mariusz Dzieciątko, and Dominik Spinczyk Biocybernetics in Physiotherapy Relationship Between Body Sway and Body Building in Girls and Boys in Developmental Age . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361 Anna Lipowicz, Tomasz Szurmik, Monika N. Bugdol, Katarzyna Graja, Piotr Kurzeja, and Andrzej W. Mitas Classification of Girls’ Sexual Maturity Using Factor Analysis and Analysis of Moderated Mediation . . . . . . . . . . . . . . . . . . . . . . . . . . 371 Monika N. Bugdol, Marta Marszałek, and Marcin D. Bugdol Subjective and Objective Assessment of Developmental Dysfunction in Children Aged 0–3 Years – Comparative Study . . . . . . . . . . . . . . . . 382 Mariola Ciuraj, Katarzyna Kieszczyńska, Iwona Doroniewicz, and Anna Lipowicz Intra- and Intergroup Measurement Errors in Anthropometric Studies Carried Out on Face Images . . . . . . . . . . . . . . . . . . . . . . . . . . . 392 Katarzyna Graja, Anna Lipowicz, Monika N. Bugdol, and Andrzej W. Mitas Comparative Analysis of Selected Stabilographic Parameters Among Polish and Slovak Children in Aspect of Factors Indirectly Affecting the Body Posture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402 Tomasz Szurmik, Piotr Kurzeja, Jarosław Prusak, Zuzanna Hudakova, Bartłomiej Gąsienica-Walczak, Karol Bibrowicz, and Andrzej W. Mitas The Impact of Physical Activity on the Change of Pulse Wave Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413 Anna Mańka, Robert Michnik, and Andrzej W. Mitas Evaluation of Selected Parameters Related to Maintaining the Body Balance in the Aspect of Physical Activity in Young Adults . . . . . . . . . . 425 Piotr Kurzeja, Jarosław Prusak, Tomasz Szurmik, Jan Potoniec, Zuzanna Hudakova, Bartłomiej Gąsienica-Walczak, Karol Bibrowicz, and Andrzej W. Mitas RAS in the Aspect of Symmetrization of Lower Limb Loads . . . . . . . . . 436 Patrycja Romaniszyn, Damian Kania, Katarzyna Nowakowska, Marta Sobkowiak, Bruce Turner, Andrzej Myśliwiec, Robert Michnik, and Andrzej W. Mitas


Rhythmic Auditory Stimulation. Biocybernetics Dimension of Music Entrainment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448 Bruce Turner and Andrzej W. Mitas Signal Processing and Analysis Pointwise Estimation of Noise in a Multilead ECG Record . . . . . . . . . . 463 Piotr Augustyniak Heart Rate Variability Analysis on Reference Heart Beats and Detected Heart Beats of Smartphone Seismocardiograms . . . . . . . . 473 Szymon Sieciński, Paweł S. Kostka, Ewaryst J. Tkacz, Natalia Piaseczna, and Marta Wadas The New Approach for ECG Signal Quality Index Estimation on the Base of Robust Statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481 Tomasz Pander and Tomasz Przybyła 8-Lead Bioelectrical Signals Data Acquisition Unit . . . . . . . . . . . . . . . . 495 Tadeáš Bednár, Branko Babušiak, and Milan Smetana Smart Sheet Design for Electrocardiogram Measurement . . . . . . . . . . . 507 Branko Babusiak, Stefan Borik, Maros Smondrk, and Ladislav Janousek Evaluation of Specific Absorption Rate in SAM Head Phantom with Cochlear Implant with and Without Hand Model Near PIFA Antenna . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 518 Jana Mydlova, Mariana Benova, Zuzana Psenakova, and Maros Smondrk Biomechanics and Biomaterials The Evaluation of Gait Strategy on a Treadmill Using a New IGS Index Based on Frequency Analysis of COP Course . . . . . . . . . . . . . . . 531 Piotr Wodarski, Jacek Jurkojć, Andrzej Bieniek, Miłosz Chrzan, Robert Michnik, Zygmunt Łukaszczyk, and Marek Gzik The Analysis of the Influence of Virtual Reality on Parameters of Gait on a Treadmill According to Adjusted and Non-adjusted Pace of the Visual Scenery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543 Piotr Wodarski, Jacek Jurkojć, Andrzej Bieniek, Miłosz Chrzan, Robert Michnik, Jacek Polechoński, and Marek Gzik Assessment of Loads Exerted on the Lumbar Segment of the Vertebral Column in Everyday-Life Activities – Application of Methods of Mathematical Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554 Hanna Zadoń, Robert Michnik, Katarzyna Nowakowska, and Andrzej Myśliwiec


Evaluation of Muscle Activity of the Lower Limb During Isometric Rotation Based on Measurements Using a Dynamometric and Dynamographic Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566 Miłosz Chrzan, Robert Michnik, Andrzej Bieniek, Piotr Wodarski, and Andrzej Myśliwiec Adhesion of Poly(lactide-glycolide) Coating (PLGA) on the Ti6Al7Nb Alloy Substrate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 578 Janusz Szewczenko, Wojciech Kajzer, Anita Kajzer, Marcin Basiaga, Marcin Kaczmarek, Roman Major, Wojciech Simka, Joanna Jaworska, Katarzyna Jelonek, Paulina Karpeta-Jarząbek, and Janusz Kasperczyk Medical Tools and Interfaces Classification System for Multi-class Biomedical Data that Allows Different Data Fusion Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593 Sebastian Student, Krzysztof Łakomiec, Alicja Płuciennik, Wojciech Bensz, and Krzysztof Fujarewicz Preliminary Study of Computer Aided Diagnosis Methodology for Modeling and Visualization the Respiratory Deformations of the Breast Surface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603 Aleksandra Juraszczyk, Mateusz Bas, and Dominik Spinczyk Software Tool for Tracking of Voxel Phantom’s Anatomical Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 614 Maros Smondrk, Branko Babusiak, and Mariana Benova Evaluation of the Usefulness of Images Transmitted by the Internet Communicator for the Diagnosis of Surgical Changes . . . . . . . . . . . . . . 623 Dariusz Dzielicki, Paweł Mikos, Krzysztof Dzielicki, Witold Lukas, and Józef Dzielicki Digitized Records with Automatic Evaluation for Natural Family Planning and Hormonal Treatment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 630 Zuzana Judáková, Ivana Gálová, Michal Gála, and Ladislav Janoušek Design and Testing of Radiofrequency Instrument RONLINE . . . . . . . . 638 Alice Krestanova, Jan Kracmar, Milada Hlavackova, Jan Kubicek, Petr Vavra, Marek Penhaker, and Petr Ihnat Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 651

Quantitative Data Analysis in Medical Diagnosis

Functional Thermal Imaging of Skin Tissue Using the Discrete Thermal Time Constants Spectrum

Maria Strąkowska, Robert Strąkowski, and Michał Strzelecki

Lodz University of Technology, Institute of Electronics, 211/215 Wólczańska Str., 90-924 Łódź, Poland
{maria.strakowska,robert.strakowski,michal.strzelecki}@p.lodz.pl

Abstract. In this paper we present functional thermal imaging using InfraRed (IR) thermography to measure the temperature rise in skin tissue in the transient state after weak cooling. Skin tissue is a complex multilayer biological structure. It can be modelled, using the thermal-electrical analogy, by Foster and/or Cauer networks consisting of thermal resistances and capacitances Rth–Cth. The proposed methodology allows identifying the thermal time constants of a multilayer biomedical structure. The distributions of the parameters used to approximate the temperature evolution at the upper surface of skin tissue with psoriasis are presented. They correlate with the inflammation areas of the skin.

Keywords: IR active thermography · cold provocation · screening · medical imaging

1 Introduction

Spectrum of time constants is frequently used for modelling of electrical and electronic dynamic systems described by their transfer functions in many practical applications [2–4,34–36]. This technique has already been applied to thermal biomedical structures [31,33], including the skin, which also represents a complex biological structure [12,16,17,25,32]. In this case, the thermal model takes the form of a low-pass filter with a few single-order poles [2,4]. There are different methods for identifying the parameters of transfer functions. The best known and most widely used are: Network Identification by Deconvolution (NID), Continuous-time System Identification (CONTSID), Vector Fitting (VF), Computer-Aided Program for Time-series Analysis and Identification of Noisy Systems (CAPTAIN) and Transfer Function Estimation (TFEST), implemented in Matlab [1–4,6–11,13,15,18,20–23,26–31,34–37]. In this paper we present a methodology for thermal modelling of a tissue using numerical gradient-free optimisation. The chosen optimisation methods are implemented in Matlab. After a few trials, the fminsearch method was selected for this research [19,24].


One has to underline that thermal Rth–Cth modelling of multilayer structures is a difficult numerical task. It is a thermal inverse problem, which is in the general case very ill-conditioned. This problem becomes even more severe for biomedical applications, because a tissue is not a solid material with constant thermal parameters. Tissue is a kind of porous material with perfusion and blood flow, and it reacts to an external stimulus as a closed negative-feedback dynamic system with thermoregulation [33].

2 Spectrum of Thermal Time Constants for Multilayer Structures

The overall method for identification and visualisation of the time constants is presented in Fig. 1. It starts with a temperature measurement using an IR thermal camera just after weak cooling of the upper part of the skin tissue. It is recommended to use a high-speed thermal camera that captures images with a frame rate of at least 25 Hz; the higher the frame rate, the shorter the time constants that can be identified. As the method is tested on patients in a clinic, the first pre-processing step of the algorithm is movement correction [30]. Movement correction is necessary because the measurement typically lasts several minutes, during which patients are not able to stand still. Next, the region of interest (ROI) is determined as the area where the skin was cooled (the difference between the first and the last frame of the measured sequence).

Fig. 1. The block scheme of time constant visualisation for the multilayer skin structure

This automatically selected ROI is divided into rectangular sub-blocks of 3 × 3 pixels each, and the average temperature in time is calculated for each block.
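As a rough illustration of this block-averaging step, the sketch below computes one temperature-versus-time curve per 3 × 3 sub-block of the detected ROI. It is a minimal NumPy sketch, not the authors' Matlab code; the array shapes and the cropping of the ROI to a multiple of the block size are assumptions made for illustration only.

```python
import numpy as np

def block_time_curves(frames, roi_mask, block=3):
    """Average temperature over time for each block x block sub-block of the ROI.

    frames:   (T, H, W) array of temperature images from the IR camera
    roi_mask: (H, W) boolean mask of the automatically detected (cooled) ROI
    Returns an array of shape (T, H_roi // block, W_roi // block),
    i.e. one temperature-vs-time curve per sub-block.
    """
    ys, xs = np.nonzero(roi_mask)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    roi = np.asarray(frames, dtype=float)[:, y0:y1, x0:x1]
    t, h, w = roi.shape
    h, w = h - h % block, w - w % block          # crop to a multiple of the block size
    roi = roi[:, :h, :w].reshape(t, h // block, block, w // block, block)
    return roi.mean(axis=(2, 4))
```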

The crucial step of the proposed method is the approximation of the temperature curve in the time domain using a given series of exponential and error functions. Two models of temperature variation in time were proposed. The first is a series of 2 or 3 elementary exponential functions – Eq. (1). The second is a linear combination of one exponential term and the complementary error function erfc(x) – Eq. (2).

T(t) = \sum_{i=1}^{N} T_i \left( 1 - e^{-t/\tau_i} \right), \quad \text{where } N = 2 \text{ or } 3    (1)

T(t) = T_1 \left( 1 - e^{-t/\tau_1} \right) + T_2 \left( 1 - e^{t/\tau_2} \, \mathrm{erfc}\!\left( \sqrt{t/\tau_2} \right) \right)    (2)

The first approximation corresponds directly to the Foster Rf-th–Cf-th thermal network [4,30,31]. The Foster network has no physical meaning, in contrast to the Cauer network. In this research, visualisation of the parameters (Ti, τi) of the approximating functions is presented. Instead of the time constants, we propose to use their reciprocals – the angular frequencies ωi = 1/τi. The mathematical details of each step of the algorithm have already been presented in [31].
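For readers who want to experiment with the two approximating models, a minimal Python/NumPy sketch of Eqs. (1) and (2) is given below. It is only an illustration, not the authors' Matlab implementation, and the erfc argument in Eq. (2) follows the reconstruction above; scipy.special.erfcx is used so that the exp·erfc product does not overflow for large t.

```python
import numpy as np
from scipy.special import erfcx  # erfcx(x) = exp(x**2) * erfc(x), numerically stable

def model_exp(t, amplitudes, taus):
    """Eq. (1): series of N = 2 or 3 elementary exponential terms."""
    t = np.asarray(t, dtype=float)
    return sum(Ti * (1.0 - np.exp(-t / tau_i)) for Ti, tau_i in zip(amplitudes, taus))

def model_exp_erfc(t, T1, tau1, T2, tau2):
    """Eq. (2): one exponential term plus a complementary-error-function term."""
    t = np.asarray(t, dtype=float)
    # exp(t/tau2) * erfc(sqrt(t/tau2)) is rewritten as erfcx(sqrt(t/tau2))
    return T1 * (1.0 - np.exp(-t / tau1)) + T2 * (1.0 - erfcx(np.sqrt(t / tau2)))
```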

3 Software for Functional Imaging of Skin Tissues

The program τ-spectrum for thermal parameter visualisation was written in the Matlab environment. It implements all steps of the method presented above. At first, the user loads the sequence of images from the thermal camera; in this research, the CEDIP Titanium IR camera was used. Note that for the example ROI presented in Fig. 2 one needs to calculate the time constants 8820 times. It is a very time-consuming task, taking about 50 min on a Core i7-3820QM CPU. One subroutine takes most of the processing time: the optimisation. The fminsearch method implemented in Matlab was used. It uses the Nelder-Mead downhill simplex algorithm [19,24], a kind of nonlinear search for the minimum of the objective function that does not calculate derivatives. In consequence, it has an advantage when fitting noisy curves, but on the other hand it is sensitive to the starting point [19,24]. The rest of the calculations is relatively fast. The main window of the τ-spectrum program with the first thermal image of the sequence is presented in Fig. 2. In addition, τ-spectrum allows calibration of the camera, detection of the cooled area (the ROI mentioned earlier), extraction of all curves within the detected ROI and, finally, visualization of the distribution of the curve parameters on the surface of the skin.
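The fragment below sketches how such a fit could be reproduced outside Matlab. It mirrors the described fminsearch-based procedure with SciPy's Nelder-Mead implementation of the same simplex algorithm; the starting point and tolerances are arbitrary placeholders, not values from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def fit_three_exp(t, temperature, x0=(1.0, 0.3, 1.5, 0.05, 2.0, 0.01)):
    """Fit a 3-exponential recovery model (Eq. 1) by least squares with Nelder-Mead.

    x0 packs (T1, omega1, T2, omega2, T3, omega3), where omega_i = 1 / tau_i.
    Returns the fitted parameters and the correlation with the measured curve.
    """
    t = np.asarray(t, dtype=float)
    temperature = np.asarray(temperature, dtype=float)

    def model(x):
        return sum(Ti * (1.0 - np.exp(-wi * t)) for Ti, wi in zip(x[0::2], x[1::2]))

    def sse(x):
        return np.sum((model(x) - temperature) ** 2)

    res = minimize(sse, x0, method="Nelder-Mead",
                   options={"xatol": 1e-6, "fatol": 1e-9, "maxiter": 20000})
    return res.x, np.corrcoef(model(res.x), temperature)[0, 1]
```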


Fig. 2. Main screen of the τ-spectrum program

Table 1. Exemplary values of the parameters of the approximation functions

              2 tau (exp+erfc)   2 tau (exp)   3 tau (exp)
T1 [°C]       0.6363             2.5346        1.4812
ω1 [1/s]      0.0814             0.1534        0.3419
T2 [°C]       5.3681             2.2840        1.6981
ω2 [1/s]      0.0166             0.0110        0.0471
T3 [°C]       –                  –             1.7753
ω3 [1/s]      –                  –             0.0072
Correlation   0.9994             0.9980        0.9994

Approximation of the temperature rise after cooling of a tissue with 3 time constants should be better than with 2 time constants, and indeed it is, but the correlation coefficients are sometimes very close to each other – Table 1. Although this is an obvious statement, the significant difference is always at the beginning of the warming-up process, so precise approximation at the beginning of the thermal recovery is recommended. The shortest time constants describe the fast reaction of the skin due to the blood flow in the superficial part of the skin. This important information can be useful for screening or diagnosis in medical treatment. The accuracy of the presented approximation in the early part of the temperature rise is shown on a logarithmic scale in Fig. 3. Quantitatively, this accuracy is expressed by the correlation coefficients in Table 1.

4 Results

The experiments were performed at the Bieganski State Hospital (Lodz, Poland) and included patients suffering from psoriasis.


Fig. 3. Approximation of temperature versus time using (a) 2 exponential functions, (b) single exponential and complementary error functions and (c) 3 exponential functions

Psoriasis in its developed form easily allows identifying the inflammation areas. In consequence, it is easy to compare and correlate the functional images presenting the parameters' distributions with inflammation of the tissue. In Fig. 4, exemplary visual and thermal images of the skin at the upper extremity of a patient are shown. The thermal image alone does not allow localizing the part of the skin with developed psoriasis. In Figs. 5, 6 and 7 the maps of the distributions of the Ti and ωi parameters are presented. As one can notice, most of the functional images presenting the distributions of angular frequencies correspond to the visual image and the inflammation regions. The short and large time constants describe the short- and long-term variation of the skin temperature. In general, the distribution of time constants allows segmenting the areas of the skin reacting faster and slower. Detailed analysis of the parametric images leads to the conclusion that the healthy part of the skin reacts faster, while the diseased tissue is more thermally inertial. This agrees with general medical knowledge and is an important conclusion, especially if no inflammation is visible. As is known, this can happen, e.g. when psoriasis returns after previous resection.


Fig. 4. Photo (a) and a single thermal image (b), selected from the sequence, of the skin of the patient suffering from psoriasis

Fig. 5. Maps of thermal angular frequencies (reciprocals of time constants) obtained using the exp+erfc approximation presented in Eq. (2): (a) ω1, (b) ω2; and the approximation using 2 exp – Eq. (1): (c) ω1, (d) ω2 (where ω1 > ω2)

The parameters Ti (amplitudes of the time constants) are valuable as well. They show the contribution of the corresponding exponential component in the series approximating the temperature rise (Eqs. 1–2). A high value of this parameter means that the corresponding term in the approximation equation has a significant contribution and its influence is more visible on the time curve. The image presenting the distribution of the largest time constant (the lowest angular frequency) can be eliminated from the analysis, as it mainly depends on environmental conditions, such as convection cooling and evaporation at the surface of the skin. More diagnostic information can be extracted by investigating the relations between the time constants and their amplitudes in the spectrum. This problem requires detailed examination in the future.


Fig. 6. Distributions of thermal angular frequencies – reciprocals of time constants: (a) ω1, (b) ω2, (c) ω3, using the series of 3 exponential components for the temperature curve approximation presented in Eq. (1) (where ω1 > ω2 > ω3)

Fig. 7. Distributions of (a) T1 , (b) T2 , (c) T3 parameters using series of 3 exponential components for temperature curve approximation presented in Eq. (1)

Fig. 8. The colormap used to present maps of calculated parameters


In addition, the presented results clearly confirm the necessity of approximating the temperature-versus-time curve after cold provocation using a series of at least 3 exponential components; otherwise it is not possible to estimate the thermal parameters of the skin corresponding to the perfusion and blood flow.

5 Conclusions

In this paper a novel method of visualisation of the multilayer structure of the skin is presented. We propose parametric images showing the distributions of angular frequencies, time constants, and their spectral amplitudes to visualise the inflamed parts of the tissue, where different perfusion and blood flow occur. The proposed approach is a kind of functional imaging using non-invasive thermal provocation by weak cooling and temperature measurement in the transient state using high-speed thermal cameras. A practical and important result of the research is the conclusion that an approximation with a minimum of 3 thermal time constants is required for quantitative analysis of skin pathologies. The presented method of functional imaging requires the solution of an inverse thermal problem, which is an extremely ill-conditioned numerical procedure. It requires care in choosing the appropriate numerical methods and criteria for assessing the results. The presented method can be used for screening of selected skin pathologies (such as psoriasis) and/or monitoring of healing, e.g. after skin burns or medical treatment. It can also be used to improve breast cancer detection, combined with the approach described in [14].

References

1. Büttner, W.: Ein numerisches Verfahren zur Exponentialapproximation von transienten Wärmewiderständen (A numerical method for exponential approximation of transient thermal impedances). Archiv Elektrotech. 59(6), 351–359 (1977)
2. Chatziathanasiou, V., Chatzipanagiotou, P., Papagiannopoulos, I., De Mey, G., Więcek, B.: Dynamic thermal analysis of underground medium power cables using thermal impedance, time constant distribution and structure function. Appl. Thermal Eng. 60(1–2), 256–260 (2013)
3. Chatzipanagiotou, P., Chatziathanasiou, V., De Mey, G., Więcek, B.: Influence of soil humidity on the thermal impedance, time constant and structure function of underground cables: a laboratory experiment. Appl. Thermal Eng. 113, 1444–1451 (2017)
4. Chatzipanagiotou, P., Strąkowska, M., De Mey, G., Chatziathanasiou, V., Więcek, B., Kopeć, M.: A new software tool for transient thermal analysis based on fast IR camera temperature measurement. Measur. Autom. Monit. 63 (2017)
5. Organizing Committee of the Conference QIRT 2018 (ed.): Proceedings of the 14th Quantitative InfraRed Thermography Conference, Berlin. QIRT Council (2018)
6. Mentor Graphics Corporation: T3Ster-Master Thermal Evaluation Tool, Version 2.2
7. Drmac, Z., Gugercin, S., Beattie, C.: Quadrature-based vector fitting for discretized H2 approximation. SIAM J. Sci. Comput. 37(2), A625–A652 (2015)


8. Garnier, H., Mensler, M., Richard, A.: Continuous-time model identification from sampled data: implementation issues and performance evaluation. Int. J. Control 76(13), 1337–1357 (2003)
9. Górecki, K., Rogalska, M., Zarębski, J.: Parameter estimation of the electrothermal model of the ferromagnetic core. Microelectron. Reliab. 54(5), 978–984 (2014)
10. Górecki, K., Zarębski, J.: The influence of the selected factors on transient thermal impedance of semiconductor devices. In: Proceedings of the 21st International Conference on Mixed Design of Integrated Circuits & Systems (MIXDES), pp. 309–314. IEEE (2014)
11. Hellen, E.H.: Padé-Laplace analysis of signal averaged voltage decays obtained from a simple circuit. Am. J. Phys. 73(9), 871–875 (2005)
12. Herman, C.: The role of dynamic infrared imaging in melanoma diagnosis. Expert Rev. Dermatol. 8(2), 177–184 (2013)
13. Jakopović, Z., Benčić, Z., Končar, R.: Identification of thermal equivalent-circuit parameters for semiconductors. In: 1990 IEEE Workshop on Computers in Power Electronics, pp. 251–260. IEEE (1990)
14. Jakubowska, T., Wiecek, B., Wysocki, M., Drews-Peszynski, C., Strzelecki, M.: Classification of breast thermal images using artificial neural networks. In: The 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 1, pp. 1155–1158. IEEE (2004)
15. Jibia, A.U., Salami, M.J.E.: An appraisal of Gardner transform-based methods of transient multiexponential signal analysis. Int. J. Comput. Theory Eng. 4(1), 16 (2012)
16. Kaczmarek, M.: A new diagnostic IR-thermal imaging method for evaluation of cardiosurgery procedures. Biocybern. Biomed. Eng. 36(2), 344–354 (2016)
17. Kaczmarek, M., Nowakowski, A.: Active IR-thermal imaging in medicine. J. Nondestruct. Eval. 35(1), 19 (2016)
18. Kałuża, M., Więcek, B., De Mey, G., Hatzopoulos, A., Chatziathanasiou, V.: Thermal impedance measurement of integrated inductors on bulk silicon substrate. Microelectron. Reliab. 73, 54–59 (2017)
19. Lagarias, J.C., Reeds, J.A., Wright, M.H., Wright, P.E.: Convergence properties of the Nelder-Mead simplex method in low dimensions. SIAM J. Optim. 9(1), 112–147 (1998)
20. Ljung, L.: Experiments with Identification of Continuous Time Models. Linköping University Electronic Press (2009)
21. Marco, S., Palacín, J., Samitier, J.: Improved multiexponential transient spectroscopy by iterative deconvolution. IEEE Trans. Instrum. Measur. 50(3), 774–780 (2001)
22. MathWorks: Transfer function estimation (tfest) help. https://nl.mathworks.com/help/ident/ref/tfest.html. Accessed 01 April 2019
23. Murthy, K., Bedford, R.: Transformation between Foster and Cauer equivalent networks. IEEE Trans. Circ. Syst. 25(4), 238–239 (1978)
24. Nelder, J.A., Mead, R.: A simplex method for function minimization. Comput. J. 7(4), 308–313 (1965)
25. Nowakowski, A., Kaczmarek, M.: Active dynamic thermography – problems of implementation in medical diagnostics. Quant. InfraRed Thermogr. J. 8(1), 89–106 (2011)
26. Ozdemir, A.A., Gumussoy, S.: Transfer function estimation in system identification toolbox via vector fitting. IFAC-PapersOnLine 50(1), 6232–6237 (2017)


27. Protonotarios, E., Wing, O.: Theory of nonuniform RC lines, part I: analytic properties and realizability conditions in the frequency domain. IEEE Trans. Circ. Theory 14(1), 2–12 (1967)
28. Russo, S.: Measurement and simulation of electrothermal effects in solid-state devices for RF applications. Ph.D. thesis, Università degli Studi di Napoli Federico II (2010)
29. Savitzky, A., Golay, M.J.: Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 36(8), 1627–1639 (1964)
30. Strąkowska, M., Strąkowski, R., Strzelecki, M., De Mey, G., Więcek, B.: Thermal modelling and screening method for skin pathologies using active thermography. Biocybern. Biomed. Eng. 38(3), 602–610 (2018)
31. Strąkowska, M., Chatzipanagiotou, P., De Mey, G., Chatziathanasiou, V., Więcek, B.: Novel software for medical and technical thermal object identification (TOI) using dynamic temperature measurements by fast IR cameras. In: Proceedings of the 14th Quantitative InfraRed Thermography Conference, Berlin [5], pp. 531–538 (2018). https://doi.org/10.21611/qirt.2018.053
32. Strąkowska, M., De Mey, G., Więcek, B., Strzelecki, M.: A three layer model for the thermal impedance of the human skin: modeling and experimental measurements. J. Mech. Med. Biol. 15(04), 1550044 (2015)
33. Strąkowska, M., Strzelecki, M., Więcek, B.: Thermal modelling and thermography measurements of thermoregulation effects in a skin tissue. In: Proceedings of the 14th Quantitative InfraRed Thermography Conference, Berlin [5], pp. 430–435 (2018). https://doi.org/10.21611/qirt.2018.034
34. Székely, V.: On the representation of infinite-length distributed RC one-ports. IEEE Trans. Circ. Syst. 38(7), 711–719 (1991)
35. Székely, V.: Identification of RC networks by deconvolution: chances and limits. IEEE Trans. Circ. Syst. I Fundam. Theory Appl. 45(3), 244–258 (1998)
36. Vermeersch, B.: Thermal AC modelling, simulation and experimental analysis of microelectronic structures including nanoscale and high-speed effects. Ph.D. thesis, Ghent University, Faculty of Engineering (2009)
37. Young, P., Jakeman, A.: Refined instrumental variable methods of recursive time-series analysis, part III: extensions. Int. J. Control 31(4), 741–764 (1980)

Contextual Classification of Tumor Growth Patterns in Digital Histology Slides

Zaneta Swiderska-Chadaj1, Zhaoxuan Ma3,4, Nathan Ing2,3, Tomasz Markiewicz1,5, Malgorzata Lorent5, Szczepan Cierniak5, Ann E. Walts4, Beatrice S. Knudsen3,4, and Arkadiusz Gertych3,4

1 Faculty of Electrical Engineering, Warsaw Technical University, Warsaw, Poland
2 Department of Surgery, Cedars-Sinai Medical Center, Los Angeles, USA
3 Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, USA
4 Department of Pathology and Laboratory Medicine, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
[email protected]
5 Department of Pathomorphology, Military Institute of Medicine, Warsaw, Poland

Abstract. Patch-based image classification approaches are frequently employed in digital pathology applications with the goal of automatically delineating diagnostically important regions in digital slides. However, the patches into which a slide is partitioned are often classified singly and sequentially, without additional context. To address this issue, we tested a contextual classification of image patches with soft voting applied to a multi-class classification problem. The context comprised five or nine overlapping patches. The classification is performed using convolutional neural networks (CNNs) trained to recognize four histologically distinct growth patterns of lung adenocarcinoma and non-tumor areas. Classification with soft voting outperformed sequential classification of patches, yielding higher whole-slide classification accuracy. The F1-scores for the four tumor growth patterns improved by 3% and 4.9% when the context consisted of five and nine neighboring patches, respectively. We conclude that context can improve the classification performance for areas sharing the same histological features. Soft voting is a non-trainable approach and therefore straightforward to implement; however, it is computationally more expensive than the classical single-patch-based approach.

Keywords: Deep learning · Digital pathology · Lung cancer

1 Introduction

Convolutional neural networks (CNNs) are the central component in the current generation of machine learning tools designed to build decision-making workflows for digital pathology. They have shown applicability in pattern recognition,


image classification and image segmentation. The main advantage of CNN-based techniques in evaluating digital images from histological or cytological specimens is a high level of accuracy, often comparable to that achieved by human observers. Specific applications in pathology include delineating areas with cancer cells, grading tumors, detecting metastases, or classifying cells by sub-type [2,11,13,18] in digital slides. More advanced tools have been reported to predict patient outcomes directly from digital pathology images or from histological findings linked to genomics data [3,15]. CNNs also have the ability to generate readouts that are concordant with or exceed the diagnostic performance of a panel of experts [2].

Digital pathology slides are often analyzed using a patch-based approach. In this case, CNNs classify pixels in a square patch that slides over the digital slide [1,3,14]. After each patch has been visited and classified, the classification results are assembled into an output image for visualization and further analysis. Patch-based classification is straightforward to implement and can be applied to digital slides regardless of image format and resolution. However, in typical patch-based pipelines, the spatial dependencies between patches are neglected and the inference is based on individual patches without context. Hence, inference on individual patches that are nearby may not be consistent. To address this problem, several trainable context-aware approaches have been proposed. In [1], the authors used activation features of a CNN that was trained for a patch-based classification. They also separately trained a support vector machine classifier (SVM) using features of overlapping patches and implemented majority voting to aggregate the classifications outputted by the SVM. In [4] conditional random fields (CRF) formed from latent spaces of a trained CNN were applied to jointly assign labels to the patches. In this approach, features extracted from intermediate layers of the CNN are considered as observations in a fully-connected CRF model.

In the current study, we evaluated the performance of a contextual classification of image patches using a non-trainable approach that we applied to a multi-class classification problem. During the diagnostic workup of lung adenocarcinoma (LAC), pathologists manually evaluate distinct histological tumor growth patterns. The predominant tumor growth pattern has been reported to impact prognosis [16]. Tumors with predominantly micropapillary and solid patterns have been consistently associated with poor prognosis, and the percentage of cribriform pattern has also been identified as a marker of unfavorable prognosis [12,20]. The subjective nature of the manual slide evaluation yields sub-optimal agreement between pathologists in assessing growth patterns of LAC [19]. Our tests involved whole slide images (WSI) from LAC specimens. In this task, we aimed to automatically delineate areas of four growth patterns of LAC (acinar, micropapillary, solid, and cribriform), and non-tumor areas [6,10]. When applied to the classification of four tumor growth patterns of LAC, our proposed solution outperformed a classical patch-based approach.

2 Materials

136 H&E stained slides diagnosed by an experienced pulmonary pathologist as primary LAC were obtained from the archives of the Department of Pathology and Laboratory Medicine at Cedars-Sinai Medical Center (CSMC) in Los Angeles, USA (91 slides from 50 cases of LAC) and the Department of Pathology of the Military Institute of Medicine in Warsaw (MIMW), Poland (45 slides from 15 cases of LAC). The number of slides in the study varied from 2 to 8 per case. At CSMC, the slides were digitized into WSIs using an Aperio AT Turbo Scanner (Leica Biosystems, Vista, CA). The slides from MIMW were digitized with a Pannoramic 250 Flash II scanner (3DHISTECH, Budapest, Hungary). All slides were digitized with the same scanning objective (20x); however, the pixel size in WSIs from CSMC was larger than the pixel size in MIMW WSIs (0.492 µm × 0.492 µm versus 0.398 µm × 0.398 µm) due to differences in optical path and image capture hardware settings in the scanners. The WSIs in our dataset encode gigapixel tissue sections varying from 74,000 × 47,000 pixels (largest) to 20,000 × 15,500 pixels (smallest). Each WSI was reviewed and manually annotated for four tumor growth patterns [acinar (AC), micropapillary (MP), solid (SO), and cribriform (CR)], and non-tumor areas (NT). A WSI with pathologist annotations and the four LAC growth patterns are illustrated in Fig. 1. The WSI dataset was partitioned into training and test sets. The training set consisted of all 45 MIMW slides and 33 randomly selected CSMC slides. The remaining 58 WSIs from CSMC comprised the test set. The CSMC slides were partitioned into the training or the test set by case to ensure that all WSIs from each case were either in the training or the test set. All patient identifiers were removed from the glass slides before scanning. The WSIs and annotations are a subset of the data collected for and used in our previous project [6,10].

3 Methods

We previously developed a pipeline to identify the four LAC growth patterns in digital slides (Fig. 2) [6]. Briefly, the tissue area in a WSI is divided into square patches and each patch is classified into one of the five classes: AC, SO, MP, or CR (tumor growth patterns), or NT (non-tumor). To determine the class for a given patch, we first classified that patch and eight additional patches that slightly overlap with it. A decision function was then applied to the classification results from these nine patches, and the output classification result identified by this function was assigned to the patch under consideration. The decision function, termed “voting” (Sect. 3.1), is illustrated in Fig. 3, and the patches were classified by a trained CNN (Sect. 3.2). In our previous work we had applied a voting scheme involving eight+one patches (V9) (Fig. 3C) [6], but did not fully evaluate its advantages and disadvantages. In the current project, we compared the performance of V9, V5 (a scheme involving four+one patches shown in Fig. 3B) and V0 (a scheme without voting shown in Fig. 3A). All three voting schemes are described in Sect. 3.1. The OpenSlide libraries [7] were used to access data from WSIs.

3.1 Voting Procedure

Voting is one way of combining predictions from an ensemble of sub-classifiers and applying them to make predictions for unseen data. In hard voting, only class labels are needed and the final prediction is generated by the majority vote. In soft voting, the probabilities output by the sub-classifiers are averaged and the final prediction is determined by the highest average class probability as follows: y = arg max_i Σ_{j=1..n} a_j p_{ij}, where p_{ij} is the probability of the j-th patch belonging to the i-th tissue class ∈ {1 . . . 5}, a_j is the weight, and n is the total number of patches in the neighborhood. In our implementation, the weights are uniform, and the soft voting is performed using probabilities output by the CNN for each of the nine (V9, n = 9) or five (V5, n = 5) overlapping patches. No voting (V0) is performed when a single patch is classified by the CNN (Fig. 3). Overlapping patches located in the neighborhood of a centrally positioned patch are found by shifting the patch's frame horizontally or vertically or in both directions by 1/3 of the patch's length.
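The voting rule above is simple enough to express directly in code. The following is a minimal, illustrative sketch (not the authors' implementation); the function names and the uniform weights are assumptions based on the description above:

```python
import numpy as np

def neighborhood_offsets(patch_size, scheme="V9"):
    """(dx, dy) shifts of the overlapping patches; the shift step is 1/3 of the patch length.
    V0 -> central patch only, V5 -> centre + 4 axial shifts, V9 -> centre + 8 shifts."""
    s = patch_size // 3
    if scheme == "V0":
        return [(0, 0)]
    if scheme == "V5":
        return [(0, 0), (-s, 0), (s, 0), (0, -s), (0, s)]
    return [(dx, dy) for dx in (-s, 0, s) for dy in (-s, 0, s)]  # V9

def soft_vote(probabilities, weights=None):
    """probabilities: (n, n_classes) array of CNN outputs for the n overlapping patches.
    Returns the class maximizing the weighted average probability (uniform weights by default)."""
    p = np.asarray(probabilities, dtype=float)
    w = np.ones(len(p)) if weights is None else np.asarray(weights, dtype=float)
    return int(np.argmax(np.average(p, axis=0, weights=w)))
```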

Fig. 1. An example of WSI annotations (upper panel) and square patches extracted to train CNNs to recognize four LAC growth patterns and non-tumor (lower panel)


Fig. 2. A schematic of the whole slide processing workflow. Tissue area separated from background is divided into patches by a grid. Each patch in the grid together with a small set of overlapping patches is extracted from WSI for CNN classification. A voting procedure aggregates the CNN-based classification results to provide inference for each patch in the grid. The patches are classified into one of the four classes of LAC (acinar, micropapillary, solid or cribriform), or non-tumor. The final result reconstructed from classified patches in the grid is a tumor growth pattern map. The four patterns of tumor growth and non-tumor tissue in the map are represented by different gray levels. The patch overlapping and voting schemes are illustrated in Fig. 3

Fig. 3. Illustration of contextual voting schemes. Classification of a single patch (no voting, A) can be contextually extended by adding four (B) or eight (C) overlapping patches from the neighborhood

The central patch shares 33% of its area with any of the horizontally or vertically shifted patches and 22% of its area with any patch shifted both vertically and horizontally. The patches are classified using the same CNN, and the final classification generated by soft voting is assigned to the central patch, which is then color coded and saved in the tumor growth pattern map.

3.2 CNN Model Training

Areas annotated in the 78 training WSIs were randomly sampled to extract non-overlapping patches. In total 19,942 patches: 4203 (AC), 3236 (MP), 3562 (SO), 3238 (CR), 5703 (NT) were prepared for training. Patches extracted from MIMW (756 × 756 pixels) were larger than those from CSMC (600 × 600 pixels) due to different pixel sizes in the WSIs. However, all patches included the same physical area (9 × 10⁻³ mm²) of tissue. Extracted patches were augmented by perturbing the color and geometry of each patch, yielding 797,680 patches for CNN training [6]. Two currently used CNNs (GoogLeNet [17] and ResNet-50 [8]) were selected for training. Training parameters are detailed in Table 1. The training patches were downsized to fit the receptive field of each network (227 × 227 pixels). Each network was trained de novo: the weights were randomly initialized (uniform distribution) and optimized using stochastic gradient descent.

Table 1. Software environment and CNN training parameters

CNN model              ResNet-50             GoogLeNet
Training environment   MatConvNet (Matlab)   Caffe
Number of epochs       90                    60
Learning rate          0.1*                  0.001
Batch size             64                    64
Momentum               0.9                   0.95
Dropout rate           0.5                   0.5
*multiplied by 1/10 every 30 epochs
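The study used MatConvNet and Caffe (Table 1); purely as an illustration, a roughly equivalent ResNet-50 schedule could be sketched in PyTorch as follows, where `train_loader` and the five-class output head are assumptions:

```python
import torch
from torch import nn, optim
from torchvision import models

model = models.resnet50(weights=None)              # trained de novo, no pre-trained weights
model.fc = nn.Linear(model.fc.in_features, 5)      # AC, MP, SO, CR, NT

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)  # lr x 1/10 every 30 epochs

for epoch in range(90):
    for patches, labels in train_loader:           # hypothetical DataLoader of 227x227 patches
        optimizer.zero_grad()
        loss = criterion(model(patches), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```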

3.3 Performance Evaluation

Tissue classification performance was evaluated in 58 test WSIs by comparing computer (classification) and human (true) generated class labels at the pixel level. The comparison was performed by overlaying the tumor growth pattern map onto a map with pathologist annotations. True positive (TP), true negative (TN), false positive (FP) and false negative (FN) classifications were collected for each of the four tumor growth patterns and the non-tumor categories in a 5 × 5 confusion matrix (rows = pathologist GT, columns = computer classification). The results were tabulated as accuracy (ACC), precision (PR), recall (RE) and F1-score. The accuracy was calculated as ACC = (ΣTP + ΣTN)/(ΣTP + ΣTN + ΣFP + ΣFN), with Σ indicating the summation of pixels in a labeled category over all images in the test set. The F1-score was calculated as F1-score = 2 · (PR · RE)/(PR + RE), that is, the harmonic mean of the precision PR = TP/(TP + FP) and the recall RE = TP/(TP + FN).
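A sketch of how these metrics can be computed from the 5 × 5 confusion matrix (illustrative only; variable names are assumptions):

```python
import numpy as np

def evaluation_metrics(cm):
    """cm: K x K confusion matrix of pixel counts (rows = ground truth, columns = prediction)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = cm.sum() - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                                 # RE (sensitivity)
    f1 = 2 * precision * recall / (precision + recall)
    acc = (tp.sum() + tn.sum()) / (tp.sum() + tn.sum() + fp.sum() + fn.sum())
    return precision, recall, f1, acc
```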


The ACC, PR, RE and F1-score are commonly used to measure correctness of classification systems. In our work the ACC assesses the performance of recognizing all five tissue classes globally. PR is the ratio of correctly predicted positive observations to the total predicted positive observations. RE (sensitivity) is the ratio of correctly predicted positive observations to all of the observations in the actual class. As a harmonic mean of PR and RE, the F1-score serves as a measure of correct recognition of each tumor growth pattern. We also juxtaposed computer- and manually-generated tumor maps for visual and qualitative evaluations. The maps were colored to highlight differences between results obtained from different classification approaches with voting. WSI processing times were measured to estimate computational burden of each of the two voting schemes and no-voting.

4 Results

Trained CNNs were embedded into our WSI analysis pipeline to evaluate the effect of voting on the tissue classification performance in digital slides with LAC. We began by comparing the overall classification accuracies for the V0 and V9 voting schemes with GoogLeNet and ResNet-50 CNNs used as patch classifiers. Figure 4 and Table 2 show an increase in the average and median accuracy of whole slide tissue classification. When the V9 voting scheme was applied, the pipeline with GoogLeNet yielded on average 3.36% higher whole slide classification accuracy compared to the same pipeline with no voting. When the same slides were processed by the pipeline with ResNet-50, the accuracy improved by 1.3%. Since ResNet-50 on average slightly outperformed GoogLeNet in this task, we elected to use ResNet-50 in the remaining evaluations. Table 3 shows F1-scores quantifying the success rate of classifying tumor areas into each of the four tumor growth patterns using ResNet-50 conditioned with the V0, V5 and V9 voting schemes. Tumor maps output by the pipeline with ResNet-50 are illustrated with pathologists' manual outlines in Fig. 5. To assess the computational burden, we first recorded the processing time of all test WSIs with the V0, V5 and V9 voting schemes, and then divided the recorded time lapses (TL) for each WSI as follows: TL(V5)/TL(V0), TL(V9)/TL(V0), and TL(V9)/TL(V5). On average, the analysis of a WSI using the V9 or the V5 voting schemes took 3.8× and 2.6× longer, respectively, than the analysis of the same slide using the V0 scheme. The processing time increased by 1.46× when the pipeline was upgraded from V5 to V9.

Table 2. Average and median of overall classification accuracy in WSIs (N = 58) analyzed by CNNs with no-voting (V0), Voting-5 (V5), and Voting-9 (V9) voting classification schemes

          GoogLeNet          ResNet-50
          V0       V9        V0       V9
Average   80.73%   84.09%    84.98%   86.28%
Median    86.81%   91.79%    84.98%   90.01%

Fig. 4. Boxplots of overall tissue classification accuracy in 58 WSIs from the test set. V0 (blue) and V9 (green) voting schemes were tested with GoogLeNet and ResNet-50 CNNs as patch classifiers. Red lines indicate median accuracy

Table 3. Precision, Recall and F1-scores of tumor growth pattern recognition by ResNet-50 V0, V5 and V9 classification schemes

Tumor growth     Precision               Recall                  F1-score
pattern          V0     V5     V9        V0     V5     V9        V0     V5     V9
Acinar           0.862  0.891  0.899     0.669  0.692  0.704     0.753  0.779  0.790
Micropapillary   0.777  0.807  0.818     0.840  0.865  0.874     0.808  0.835  0.845
Solid            0.961  0.970  0.973     0.868  0.884  0.891     0.912  0.925  0.930
Cribriform       0.332  0.346  0.365     0.695  0.719  0.734     0.450  0.467  0.487

5 Discussion

Patch-based image classification is frequently employed in digital pathology applications [1,3,4,14] with the goal of automatically delineating diagnostically important regions. Once the digital slide has been partitioned into patches, the patches can be classified without or with additional context (consideration of the previous, next or any of the neighboring patches). An example of results produced by these techniques is shown in Fig. 5. Voting is one of the techniques that has been used to determine the class labels for entire WSIs [5,9]. In those studies, voting was used globally, aggregating predictions from a large set of tumor patches collected from the entire WSI. In the current study we explored classifying image patches in a contextual manner locally, with the expectation that the context would improve classification performance of areas that share


Fig. 5. Example results of whole slide analysis by a pipeline with ResNet-50 contextually classifying image patches. Row A - pathologist’s manual annotations. Results for the classification schemes with voting are as follows: B - V9, C - V5, and D - V0. The tumor growth patterns are colored as follows: acinar (red), solid (blue), micropapillary (green), and cribriform (magenta). Sampled non-tumor areas in A are colored yellow. In B, C and D, non-tumor coloring while identified by the CNN is turned off for ease of figure viewing. Small spaces of white pixels are excluded from analysis (no-tissue)

same histological features in digital slides. Our study utilized CNNs that were trained to recognize four different tumor growth patterns of lung adenocarcinoma and non-tumor areas. The context was formed by the classification probabilities output by the CNN when classifying a small set of overlapping patches.


In this study we showed that: (a) patch-based tissue classification with contextual voting achieves higher accuracy than the classical patch-based classification (with no voting), (b) the improvement in slide classification accuracy does not depend on the CNN model used for classification, and (c) the number of patches analyzed contextually using voting positively impacts the tumor growth pattern recognition performance – the highest PR, RE and F1-scores were obtained with the V9 voting scheme across all four growth patterns, and all metrics were lower using the V5 voting scheme. The classification performance (as measured by PR, RE and F1-score) gradually improved as voting increased from V0 to V5 and V9. By upgrading the classification scheme from V0 to V5, and then from V5 to V9, the F1-score for the cribriform tumor growth pattern alone increased by 3.77% and 4.28%, respectively. Upgrading the scheme from V0 directly to V9 yielded an 8.22% increase in F1-score. On average, the F1-score, RE and PR improved by 3% (V0 → V5), 1.86% (V5 → V9), and 4.9% (V0 → V9), respectively. We infer that the voting technique incorporates spatial information into the classification process, thus allowing more tissue area "to be seen" at a single step of inference. The smoothing effect of the voting scheme is seen in Fig. 5 row B, with V9 yielding a tumor map most closely approaching the ground truth (Fig. 5 row A). V5 (Fig. 5 row C) produced an intermediate result between V9 and V0 (Fig. 5 row D). Importantly, the utilization of V9 did not introduce any false positive identifications.

In our study, voting improved recognition of tumor growth patterns. In contrast to other ensemble classification systems [5,9], which typically combine different classification models (decision trees, support vector machines, neural networks, and others) to vote a class label for an image, we implemented voting to aggregate votes from a single CNN classifying contextually related image patches. By applying this technique, more homogeneous and "smoothened" tumor maps were produced. We recommend the V5 scheme as a first-line tool to reduce the number of outliers during inference. For platforms with more computing power, the V9 scheme would offer a better solution. As a non-trainable function, contextual voting can be seamlessly implemented in WSI processing pipelines. Our results show that it can improve tissue classification performance in tumor areas comprising different and often mixed tumor growth patterns. The pipeline with voting was tested on WSIs exhibiting areas of tumor or nests of tumor cells that are much larger than the area of the patch. Additional tests are needed to determine if voting would be suitable for analysis of slides in which tumor cell clusters smaller than the size of the patch predominate.

Our software implementation of contextual voting was not optimal. The average time needed to process a digital slide was proportional to the fold increase in the number of patches that the CNN had to process. For the V5 and V9 voting schemes there were respectively 5× and 9× more patches to classify per slide. In our implementation, a neighborhood with patches (Fig. 2) was fetched from the file into memory and then divided into patches for sequential classification by the CNN. The time required for a voting scheme could be kept relatively


constant by processing the whole slide once with an overlapping sliding window caching the results. The soft voting using the cached probabilities and class labels could be performed afterwards. To remove bottlenecks, a future upgrade of our pipeline will include batch reading and queueing of patches for faster analysis.
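A minimal sketch of this caching idea (not the authors' code; the helper `classify_patch` wrapping the CNN is a hypothetical name): the slide is classified once on a grid whose stride is one third of the patch size, and voting then only re-uses the cached probabilities.

```python
import numpy as np

def cached_probability_grid(classify_patch, slide, patch, n_classes):
    """Classify the slide once on a grid with stride = patch/3 and cache the class probabilities."""
    stride = patch // 3
    ny = (slide.shape[0] - patch) // stride + 1
    nx = (slide.shape[1] - patch) // stride + 1
    cache = np.zeros((ny, nx, n_classes))
    for iy in range(ny):
        for ix in range(nx):
            cache[iy, ix] = classify_patch(slide, ix * stride, iy * stride, patch)
    return cache

def vote_from_cache(cache, iy, ix):
    """V9 soft voting for grid cell (iy, ix) from the cached 3x3 neighbourhood of probabilities."""
    block = cache[max(iy - 1, 0):iy + 2, max(ix - 1, 0):ix + 2]
    return int(np.argmax(block.reshape(-1, cache.shape[-1]).mean(axis=0)))
```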

6 Conclusion

Contextual voting can improve the classification performance and detection rates of four histologically distinct growth patterns of LAC. Classification of patches extracted from a larger neighborhood yields more accurate classification results than the classification of patches extracted from a smaller neighborhood. Contextual voting is straightforward to implement but carries an increased computational burden when compared to classical patch-based whole slide analysis without voting.

Acknowledgments. This work has been supported in part by the Precision Health Grant at C-S, seed grants from the Department of Surgery at Cedars-Sinai Medical Center and a grant from the National Science Centre, Poland (grant 2016/23/N/ST6/02076).

References
1. Awan, R., Koohbanani, N.A., Shaban, M., Lisowska, A., Rajpoot, N.: Context-aware learning using transferable features for classification of breast cancer histology images. In: Campilho, A., Karray, F., ter Haar Romeny, B. (eds.) Image Analysis and Recognition, pp. 788–795. Springer International Publishing, Cham (2018)
2. Ehteshami Bejnordi, B., Veta, M., van Diest, P.J., et al.: Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318(22), 2199–2210 (2017)
3. Coudray, N., Ocampo, P.S., Sakellaropoulos, T., Narula, N., Snuderl, M., Fenyö, D., Moreira, A.L., Razavian, N., Tsirigos, A.: Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nature Med. 24(10), 1559 (2018)
4. Zanjani, F.G., Zinger, S.: Cancer detection in histopathology whole-slide images using conditional random fields on deep embedded spaces (2018)
5. Gecer, B., Aksoy, S., Mercan, E., Shapiro, L.G., Weaver, D.L., Elmore, J.G.: Detection and classification of cancer in whole slide breast histopathology images using deep convolutional networks. Pattern Recog. 84, 345–356 (2018)
6. Gertych, A., Swiderska-Chadaj, Z., Ma, Z., Ing, N., Markiewicz, T., Cierniak, S., Salemi, H., Guzman, S., Walts, A.E., Knudsen, B.S.: Convolutional neural networks can accurately distinguish four histologic growth patterns of lung adenocarcinoma in digital slides. Sci. Rep. 9, 1483 (2019)
7. Goode, A., Gilbert, B., Harkes, J., Jukic, D., Satyanarayanan, M.: OpenSlide: a vendor-neutral software foundation for digital pathology. J. Pathol. Inf. 4(1), 27 (2013)


8. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
9. Hou, L., Samaras, D., Kurc, T.M., Gao, Y., Davis, J.E., Saltz, J.H.: Patch-based convolutional neural network for whole slide tissue image classification. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2424–2433 (2016)
10. Ing, N., Salman, S., Ma, Z., Walts, A., Knudsen, B., Gertych, A.: Machine learning can reliably distinguish histological patterns of micropapillary and solid lung adenocarcinomas. In: Pietka, E., Badura, P., Kawa, J., Wieclawek, W. (eds.) Proceedings of the 5th International Conference on Information Technologies in Medicine, ITIB 2016, Kamien Slaski, Poland, 20–22 June 2016, vol. 2, pp. 193–206. Springer International Publishing, Cham (2016)
11. Janowczyk, A., Madabhushi, A.: Deep learning for digital pathology image analysis: a comprehensive tutorial with selected use cases. J. Pathol. Inf. 7 (2016)
12. Kadota, K., Yeh, Y.C., Sima, C.S., Rusch, V.W., Moreira, A.L., Adusumilli, P.S., Travis, W.D.: The cribriform pattern identifies a subset of acinar predominant tumors with poor prognosis in patients with stage I lung adenocarcinoma: a conceptual proposal to classify cribriform predominant tumors as a distinct histologic subtype. Modern Pathol. 27(5), 690 (2014)
13. Li, W., Li, J., Sarma, K.V., Ho, K.C., Shen, S., Knudsen, B.S., Gertych, A., Arnold, C.W.: Path R-CNN for prostate cancer diagnosis and Gleason grading of histological images. IEEE Trans. Med. Imaging 38(4), 945–954 (2019)
14. Ma, Z., Swiderska-Chadaj, Z., Ing, N., Salemi, H., McGovern, D., Knudsen, B., Gertych, A.: Semantic segmentation of colon glands in inflammatory bowel disease biopsies. In: Pietka, E., Badura, P., Kawa, J., Wieclawek, W. (eds.) Information Technology in Biomedicine, Proceedings of the 6th International Conference, ITIB 2018, Kamien Slaski, Poland, 18–20 June 2018, pp. 379–392. Springer International Publishing, Cham (2019)
15. Mobadersany, P., Yousefi, S., Amgad, M., Gutman, D.A., Barnholtz-Sloan, J.S., Velázquez Vega, J.E., Brat, D.J., Cooper, L.A.D.: Predicting cancer outcomes from histology and genomics using convolutional networks. Proc. Natl Acad. Sci. 115(13), E2970–E2979 (2018)
16. Russell, P.A., Wainer, Z., Wright, G.M., Daniels, M., Conron, M., Williams, R.A.: Does lung adenocarcinoma subtype predict patient survival? A clinicopathologic study based on the new International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society international multidisciplinary lung adenocarcinoma classification. J. Thorac. Oncol. 6(9), 1496–1504 (2011)
17. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
18. Tellez, D., Balkenhol, M., Otte-Höller, I., van de Loo, R., Vogels, R., Bult, P., Wauters, C., Vreuls, W., Mol, S., Karssemeijer, N., Litjens, G., van der Laak, J., Ciompi, F.: Whole-slide mitosis detection in H&E breast histology using PHH3 as a reference to train distilled stain-invariant convolutional networks. IEEE Trans. Med. Imaging 37(9), 2126–2136 (2018)
19. Thunnissen, E., Beasley, M.B., Borczuk, A.C., Brambilla, E., Chirieac, L.R., Dacic, S., Flieder, D., Gazdar, A., Geisinger, K., Hasleton, P., et al.: Reproducibility of histopathological subtypes and invasion in pulmonary adenocarcinoma. An international interobserver study. Mod. Pathol. 25(12), 1574 (2012)


20. Tsao, M.S., Marguet, S., Le Teuff, G., Lantuejoul, S., Shepherd, F.A., Seymour, L., Kratzke, R., Graziano, S.L., Popper, H.H., Rosell, R., et al.: Subtype classification of lung adenocarcinoma predicts benefit from adjuvant chemotherapy in patients undergoing complete resection. J. Clin. Oncol. 33(30), 3439 (2015)

Cervical Histopathology Image Classification Using Ensembled Transfer Learning Chen Li1 , Dan Xue1 , Fanjie Kong2 , Zhijie Hu1 , Hao Chen1 , Yudong Yao1 , Hongzan Sun3 , Le Zhang3 , Jinpeng Zhang1 , Tao Jiang4 , Jianying Yuan4 , and Ning Xu5(B) 1

Microscopic Image and Medical Image Analysis Group, Northeastern University, Shenyang, China {lichen,yyao}@bmie.neu.edu.cn, [email protected], [email protected], [email protected], [email protected] 2 Duke University, Durham, USA [email protected] 3 Shengjing Hospital, China Medical University, Shenyang, China [email protected], [email protected] 4 Chengdu University of Information Technology, Chendu, China {jiang,yuan}@cuit.edu.cn 5 Liaoning Shihua University, Fushun, China [email protected]

Abstract. In recent years, transfer learning has made great breakthroughs in the field of machine learning, and the use of transfer learning technology in Cervical Histopathology Image Classification (CHIC) has become a new research domain. In this paper, we propose an Ensembled Transfer Learning (ETL) framework to classify well, moderately and poorly differentiated cervical histopathology images. In this ETL framework, Inception-V3 and VGG-16 based transfer learning structures are first built up. Then, a fine-tuning approach is applied to extract effective deep learning features from these two structures. Finally, a late fusion based ensemble learning strategy is designed for the final classification. In the experiment, a practical dataset with 100 VEGF stained cervical histopathology images is applied to test the proposed ETL method in the CHIC task, and an average accuracy of 80% is achieved.

Keywords: Histopathology image · Cervical cancer · Classification · Differentiation stages · Deep learning · Transfer learning · Ensemble learning

1 Introduction

Cervical cancer is known as one of the malignant tumors with high incidence in women [10]. There are a large number of potential cervical cancer patients in the

world, especially in developing countries, so efficient and accurate methods for diagnosing cervical cancer are urgently needed. Although there are some preliminary tests and non-invasive methods for detecting cancer in various organs, histopathological studies remain an indispensable test. Histopathological studies are also considered to be the "gold standard" for cervical cancer in clinical diagnosis [6]. However, the application of Computer Aided Diagnosis (CAD) technologies to cervical cancer is still an emerging research area that lacks in-depth development [19]. To this end, this paper focuses on Cervical Histopathology Image Classification (CHIC) problems, and mainly solves a cervical cancer differentiation stage classification task. In this task, the histopathology images of well, moderately and poorly differentiated cervical cancer stages are classified using an Ensembled Transfer Learning (ETL) framework. The workflow of the proposed ETL framework is shown in Fig. 1.

Fig. 1. Workflow of the proposed ETL framework. The blue box shows the training part, and the yellow box shows the test part

In Fig. 1, the microscopic images of cervical cancer are first acquired and used as training examples. Then, the acquired images are augmented to expand the size of the training set. Thirdly, two Transfer Learning approaches of VGG16 and Inception-V3 networks are built up to extract deep learning features. After that, the extracted deep learning features are ensembled to obtain a more accurate classification result with a late fusion approach. Finally, test images


are used to evaluate the effectiveness of the proposed method, where accuracy, precision, recall and F1-score are calculated. This paper is structured as follows. Section 2 introduces the related work about CHIC. Section 3 introduces the proposed ETL methods, including transfer learning using the VGG-16 and Inception-V3 networks, and the proposed ETL framework. Section 4 introduces the experimental results, including the evaluation and analysis of the ETL method. Section 5 concludes this paper and discusses the future work.

2 Related Work

Since the beginning of the 21st century, CAD technologies have been applied to colposcopic gray-scale images for automatic screening in cervical cancer diagnosis [8,14]. In [11,13], k-means clustering, Gabor wavelet transform, graph cutting, color segmentation algorithms, cellular morphological methods, and binary tree algorithms are used to classify epithelial cells and stromal cells in the histopathological images of cervical cancer. There are also some studies on the staging diagnosis of uterine tumors by MRI, e.g., the work in [5]. However, few studies have been published on the staging diagnosis of uterine tumors using CAD techniques on the histopathological images of cervical cancer.

In recent years, deep learning approaches have shown a robust development trend in the CHIC field. For example, in [1], a method is proposed to automatically classify normal and abnormal cervical cells using Artificial Neural Network (ANN) and learning vector quantization algorithms. In [9], a method for the diagnosis of histopathological images of cervical cancer using support vector machine (SVM) and ANN is introduced. In [16], an ANN method is proposed to extract new features of cervical cells, providing a classification method for cervical smear examination using the ANN, and comparing it with k-means and Bayesian classifiers.

Over the past decade, research on CAD of cervical cancer histopathological microscopic images has focused on using handcrafted image feature extraction methods and machine learning classification methods for image segmentation and screening of pathologic abnormalities; there is little research on the differentiation stage analysis of cervical cancer histopathological images. Hence, an effective CAD software can greatly improve the diagnostic efficiency of doctors and reduce their workload in this field.

3 Classification Using Ensembled Transfer Learning

3.1 Transfer Learning

Artificial Neural Networks (ANNs) are one of the main tools used in machine learning. ANNs consist of input and output layers, as well as (in most cases) a hidden layer consisting of units that transform the input into something that the output layer can use. The input data enters the neural network from the input layer,


and then enters the hidden layers. A deep neural network usually refers to a neural network with more than two hidden layers. The data output from the hidden layers is passed to the final output layer and, after the activation function, the output is obtained.

The most classical network model for image feature extraction in deep learning is the Deep Convolutional Neural Network (DCNN), which is a deep neural network designed to classify and identify images. The difference between DCNNs and normal ANNs is that a DCNN includes a feature extractor composed of convolutional layers and sub-sampling layers. In a convolutional layer of a DCNN, a neuron is only connected to part of the neighbouring layer. A convolutional layer usually contains several feature maps; each feature map is composed of neurons arranged in a rectangular array, and the neurons of the same feature map share weights. The shared weights are the convolution kernels. The convolution kernels are generally initialized in the form of a random fractional matrix. During the training of the network, the convolution kernels learn to obtain reasonable weights. The direct benefit of shared weights is the reduction of connections between layers of the network, while also reducing the risk of over-fitting. Sub-sampling, also called pooling, usually has two forms: average pooling and max pooling. Sub-sampling can be regarded as a special convolution process. Convolution and sub-sampling greatly simplify the model complexity and reduce the parameters of the model. A DCNN consists of three parts: the first part is the input layer, the second part is composed of many convolutional layers and pooling layers, and the third part consists of a fully connected multilayer perceptron classifier.

Transfer learning uses neural networks pre-trained on other datasets to train on a real dataset, which shows excellent performance on the small-dataset problem in the training process of deep learning. Transfer learning focuses on storing knowledge gained in solving a problem and applying it to different but related problems. It essentially uses additional data so that neural networks can decode the data by using the features of past experience training; after that, the neural networks can have better generalization ability [7]. With transfer learning technology, we can directly use pre-trained deep learning models that are trained on a large number of readily available datasets. Then, we find out which layers of the output can be reused. Finally, we can use the output of these layers as input to train a network with fewer parameters and smaller scale. This small-scale network only needs to understand the internal relationships of specific problems, and it learns the patterns contained in the data through the pre-trained models.

3.2 Ensemble Learning

Ensemble learning is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. Ensemble learning is primarily used to improve the (classification, prediction, function approximation, etc.) performance of a model,

30

C. Li et al.

or reduce the likelihood of an unfortunate selection of a poor one. Other applications of ensemble learning include assigning a confidence to the decision made by the model, selecting optimal (or near optimal) features, data fusion, incremental learning, nonstationary learning and error-correcting [20]. Hence, ensemble learning is an effective and robust strategy that combines single "weaker" methods to obtain a "stronger" combined result.

3.3 Our Ensembled Transfer Learning Model

First, we divide the images into training, validation and test sets. Then, we input the training images and the corresponding category labels (well, moderately and poorly differentiated) into the pre-trained DCNN models for training. Finally, we output the feature vectors for classification. In particular, we adopt the transfer learning approach to build the DCNN models, which restrains the over-fitting problem and improves the performance of the classifier under the condition of small data. In this paper, we apply the VGG-16 [15] and Inception-V3 [18] networks for this transfer learning process, where the parameters are pre-trained with the ImageNet dataset [3]. Then, based on our pre-tests, we apply fine-tuning with a learning rate of 0.0001 on the last eight layers of VGG-16 and the last 24 layers of Inception-V3. The structure of the ETL is demonstrated in Fig. 2.

After that, as shown in Fig. 2, we heuristically add a dense layer and a softmax layer to each transfer learning model. The dense layer is a general deep neural network. The softmax layer is used to expand the features extracted by the DCNN into a one-dimensional feature vector. In order to suppress the gradient disappearance, gradient explosion and over-fitting problems, we insert a batch normalization layer and a drop-out layer in the dense layer. Based on our pre-tests, the drop-out rate is set to 0.5, and the dense layer output is passed to the softmax layer for classification. The cross entropy function is selected as the objective loss function, the optimizer is AdamOptimizer, the learning rate is set to 0.0005, the batch size during training is set to 64, and the number of training epochs is set to 80. Finally, we save the model parameter checkpoint that yields the highest accuracy on the validation set as the final model parameters of our ETL networks. After training the dense layer, we perform a fine-tuning operation on the real dataset and adjust the pre-trained parameters using a small learning rate. After the fine-tuning process, the deep learning feature vectors of VGG-16 and Inception-V3 are extracted. Figure 3 shows an example of the feature maps extracted by the transfer learning of the VGG-16 and Inception-V3 networks. It can be seen that the transfer learning method can obtain some representative information from the images.

Finally, the extracted deep learning feature vectors obtained with transfer learning are first fused into one feature vector. Then, based on these feature vectors, a stacking method is used for the classification. That is, the final output feature vector is used as the input of a second-level neural network, and finally the feature vector is input into a new dense layer for the final classification [12].
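As an illustration of the structure in Fig. 2 (a sketch only, not the authors' code; the pooling layer, the 256-unit dense layer, the input sizes and the stacking classifier width are assumptions), the two fine-tuned branches and the second-level stacking classifier could look roughly as follows in Keras:

```python
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import VGG16, InceptionV3

def branch(base, trainable_tail, n_classes=3):
    """Pre-trained backbone with a new dense + batch-norm + dropout + softmax head;
    only the last `trainable_tail` backbone layers are fine-tuned."""
    for layer in base.layers[:-trainable_tail]:
        layer.trainable = False
    x = layers.GlobalAveragePooling2D()(base.output)   # pooling choice is an assumption
    x = layers.Dense(256, activation="relu")(x)        # 256 units is an assumption
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.5)(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    model = models.Model(base.input, out)
    model.compile(optimizer=optimizers.Adam(learning_rate=5e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

vgg_branch = branch(VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3)), 8)
inc_branch = branch(InceptionV3(weights="imagenet", include_top=False, input_shape=(299, 299, 3)), 24)

# Late fusion (stacking): concatenate the penultimate-layer features of both fine-tuned
# branches and train a small second-level dense classifier on them.
stacker = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(512,)),   # 256 assumed features per branch
    layers.Dense(3, activation="softmax"),                     # well / moderately / poorly differentiated
])
stacker.compile(optimizer=optimizers.Adam(learning_rate=5e-4),
                loss="categorical_crossentropy", metrics=["accuracy"])
```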


Fig. 2. The structure of the proposed ETL model. The top network is the VGG-16, and the bottom network is the Inception-V3. The blue box shows the training part using transfer learning, and the yellow box shows the test part using ensemble learning

4 Experimental Result

4.1 Image Dataset

To test the effectiveness of the proposed ETL method in this paper, a practical histopathology image dataset of cervical cancer tissue sections is applied. The detailed information of this dataset is as follows.
Data source: Two practicing medical doctors from Shengjing Hospital of China Medical University provided image samples and gave image-level labels;
Staining method: Immunohistochemical (IHC) Staining, VEGF;
Magnification: 400×;
Microscope: Nikon (Japan);
Acquisition software: NIS-Elements F 3.2;
Image size: 1280 × 960 pixels;
Image format: *.png;
Image bits per pixel depth: 8 × 3 = 24;
Image category: There are 100 images in the dataset, where 38 are well differentiated, 33 are moderately differentiated, and 29 are poorly differentiated.
Among them, the well differentiated tumor cells are the least malignant, the poorly differentiated tumor cells have the highest degree of malignancy, and the moderately differentiated tumor cells are moderately malignant. An example of this dataset is shown in Fig. 4. The morphological characteristics of these three differentiation stages are as follows:
– Well differentiation: The tumor cells are closer to normal cells; cell heteromorphism is relatively small; cell sizes and morphology are similar.
– Moderate differentiation: Most cancer cells are moderately differentiated; their characteristics are between those of well and poorly differentiated cervical cancer cells.
– Poor differentiation: The cell structure is not visible, and the topological structure is disordered.

Fig. 3. An example of the deep learning feature maps using the transfer learning approach with the VGG-16 and Inception-V3 networks, respectively: (a) the third layer of VGG-16, (b) the third layer of Inception-V3

4.2 Data Augmentation

Data augmentation adds value to the underlying data by transforming the information inside the dataset. Since the total number of samples is too small, only 100, neural network training on such a small dataset is prone to over-fitting. Therefore, we use data augmentation technology to enhance the original dataset. The expanded training set can improve the generalization ability of the neural networks, as well as help the neural networks to learn some features with scale, rotation and color invariance, thus improving the predictive


Fig. 4. An example of the cervical cancer histopathology image dataset using VEGF staining. The first row shows the well differentiation stage, the middle row demonstrates the moderate differentiation stage, and the bottom row is the poor differentiation stage

performance of the classifier. Specifically, we use data rotation and mirroring to augment our images. For each sample picture x_i, i = 1, 2, ..., n, where n is the total number of pictures in a sample set X, we first divide it into 16 equal-sized sub-images z_(i,j), i = 1, 2, ..., n, j = 1, 2, ..., 16. Then we use mirror edge padding to fill in the sub-images so that they have equal length and width. For each sub-image z_(i,j), we apply two data augmentation operations: the first operation is to rotate the image by 0°, 90°, 180° and 270°; the second operation is horizontal flipping, vertical flipping and channel flipping. In this way each sub-image z_(i,j) can generate 16 images, where the image labels are the same as for the original image x_i. Hence, each sample image x_i is augmented to 256 images, and the size of the dataset expanded from 100 to 25,600 after data augmentation. Figure 5 shows an example of the applied data augmentation process.
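One plausible reading of this 16-variant scheme, written as an illustrative numpy sketch (the exact combination of rotations and flips, and the omission of the mirror padding, are assumptions):

```python
import numpy as np

def augment(image, grid=4):
    """Split an image into grid x grid sub-images and generate rotated / flipped variants.
    Here: 4 rotations x {original, horizontal flip, vertical flip, channel flip} = 16 variants
    per sub-image, i.e. 256 images per input when grid = 4."""
    h, w = image.shape[0] // grid, image.shape[1] // grid
    variants = []
    for i in range(grid):
        for j in range(grid):
            sub = image[i * h:(i + 1) * h, j * w:(j + 1) * w]
            for k in range(4):                      # rotations by 0, 90, 180, 270 degrees
                rot = np.rot90(sub, k)
                variants.extend([rot,
                                 rot[:, ::-1],      # horizontal flip
                                 rot[::-1, :],      # vertical flip
                                 rot[..., ::-1]])   # channel flip
    return variants
```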

4.3 Experimental Setting

In the experiment, the augmented 25,600 images with VEGF staining are applied to examine the proposed ETL method. The training set, validation set, and test set are divided according to the ratio of 6 : 2 : 2 in Table 1. To split the dataset into disjoint sets, we divided the three sets by the order of the images.


Fig. 5. An example of the data augmentation process

Table 1. The experimental setting of the VEGF image dataset. The first column shows the usages of the datasets. The second to the last columns denote three differentiation stages, respectively

Dataset      Well   Moderate   Poorly
Training     5632   4864       4352
Validation   2048   1792       1536
Test         2048   1792       1536

4.4 Evaluation of the ETL Framework

In this paper, we use accuracy, precision, recall and F1-score to evaluate the CHIC result, as shown in Table 2. It can be seen from the table that the poorly differentiated stage has the best classification performance, the second is the well differentiated stage, and the worst classification performance is for the moderately differentiated stage. Because the morphological characteristics of the moderately differentiated stage are in between those of the well and poorly differentiated stages, it is the most difficult to classify. Furthermore, the confusion matrix of the ETL method is shown in Table 3. It can be seen that no poorly differentiated images are predicted to be well differentiated. Meanwhile, the ratio of poorly differentiated images that are predicted to be moderately differentiated and of well differentiated images that are predicted to be poorly differentiated is low. In addition, we conducted a comparison experiment of the VGG-16 transfer learning, Inception-V3 transfer learning and the proposed ETL in terms of accuracy, and the result is shown in Fig. 6. From Fig. 6, we can see that the ETL method has a higher accuracy than the original VGG-16 and Inception-V3 transfer learning approaches on all three differentiation stages. Meanwhile, for the other evaluation indices of precision, recall and F1-score, the ETL method is also superior to the VGG-16 and Inception-V3


Table 2. The cervical histopathology image classification result. The first column shows the evaluation methods. The second to the fourth columns denote three differentiation stages, respectively. The last column shows the average values

Evaluation   Well    Moderately   Poorly   Average
Accuracy     85.4%   58.7%        96.0%    80.0%
Precision    85.9%   80.7%        77.3%    81.3%
Recall       85.4%   58.7%        96.0%    80.0%
F1-score     85.7%   68.0%        85.6%    79.8%

Table 3. The confusion matrix of the ETL method for CHIC. The first column shows the differentiation stages. The second to the last columns denote the predicted differentiation stages, respectively. The second to the last rows denote three actual differentiation stages, respectively

Stage      Well    Moderate   Poorly
Well       85.4%   11.7%      2.9%
Moderate   11.2%   58.7%      30.1%
Poorly     0%      4.0%       96.0%

Fig. 6. A comparison between VGG-16, Inception-V3 transfer learning and the proposed ETL classification performance

transfer learning, respectively. This experiment demonstrates the effectiveness of the ensemble learning strategy. Finally, an example of the classification result is shown in Fig. 7. It can be seen that the wrongly predicted images contain little information about the differentiation stages of cervical cancer tumor cells, which can mislead the ETL algorithm when classifying the differentiation stages.


Fig. 7. An example of the classification result: (a) and (b) are correctly classified images, (c) and (d) are wrongly classified images

5 Conclusion and Future Work

In this paper, an ensembled transfer learning (ETL) framework is proposed to classify cervical histopathology images. In particular, three cervical cancer differentiation stages are classified, where the highest accuracy of 96.05% is achieved on the poor differentiation stage, showing the effectiveness and potential of the proposed method. In the future, we plan to insert more effective transfer learning models into the proposed ETL framework to further enhance the classification ability, e.g., the ResNet-101 network [4], Inception-ResNet-V2 network [17], and Xception network [2].

Acknowledgements. We thank the funds supported by the "National Natural Science Foundation of China" (No. 61806047), the "Fundamental Research Funds for the Central Universities" (No. N171903004), the "Scientific Research Launched Fund of Liaoning Shihua University" (No. 2017XJJ-061), and the "Sichuan Science and Technology Program China" (No. 2018GZ0385). We also thank Dan Xue, whose contribution is considered equal to that of the first author of this paper.

References
1. Anousouya, D., Subban, R., Vaishnavi, J., Punitha, S.: Classification of cervical cancer using artificial neural networks. Proc. Comput. Sci. 89(2016), 465–472 (2016)
2. Chollet, F.: Xception: deep learning with depthwise separable convolutions (2017). arXiv:1610.02357
3. Deng, J., Dong, W., Socher, R., et al.: ImageNet: a large-scale hierarchical image database. In: Computer Vision and Pattern Recognition, pp. 1063–6919 (2009)
4. He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: Proceedings of CVPR 2016, pp. 770–778 (2016)
5. Ji, Q., John, E., Eric, C.: Classifying cervix tissue patterns with texture analysis. Pattern Recog. 33(2000), 1561–1573 (1999)
6. Jothi, J., Rajam, V.: A survey on automated cancer diagnosis from histopathology images. Artif. Intell. Rev. 48(1), 31–81 (2016)


7. Kamishima, T., Hamasaki, M., Akaho, S.: TrBagg: a simple transfer learning method and its application to personalization in collaborative tagging. In: Proceedings of the 9th IEEE International Conference on Data Mining, pp. 219–228 (2009)
8. Lange, H., Ferris, D.: Computer-aided-diagnosis (CAD) for colposcopy. Proc. SPIE 5747, 71–84 (2005)
9. Mustafa, N., MIsa, N., Mashor, M.: Colour contrast enhancement on preselected cervical cell for ThinPrep images. In: Proceedings of IIH-MSP 2007, pp. 209–212 (2007)
10. Parkin, D., Bray, F., Ferlay, J., et al.: Global cancer statistics, 2002. Cancer J. Clin. 55(2), 74–108 (2005)
11. Purwanti, E., Bustomi, M., Aldian, R.: Applied computing based artificial neural network for classification of cervical cancer. In: Proceedings of CISAK (2013)
12. Qi, Z., Wang, B., Tian, Y.: When ensemble learning meets deep learning. Knowl.-Based Syst. 107(2016), 54–60 (2016)
13. Rahmadwati, R., Naghdy, G., Todd, C., et al.: Computer aided decision support system for cervical cancer classification. Proc. SPIE 8499, 1–13 (2012)
14. Sala, E., Wakely, S., Senior, E.: MRI of malignant neoplasms of the uterine corpus and cervix. AJR Am. J. Roentgenol. 188(6), 1577–87 (2008)
15. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. Comput. Sci. (2014)
16. Sukumar, P., Gnanamurthy, R.: Computer aided detection of cervical cancer using Pap smear images based on adaptive neuro fuzzy inference system classifier. J. Med. Imaging Health Inf. 6(2), 312–319 (2016)
17. Szegedy, C., Ioffe, S., Vanhoucke, V.: Inception-V4, Inception-ResNet and the impact of residual connections on learning (2016). arXiv:1602.07261v2
18. Szegedy, C., Vanhoucke, V., Ioffe, S., et al.: Rethinking the inception architecture for computer vision. Comput. Sci. (2015)
19. Wied, G., Bartels, P., Bahr, G., et al.: Taxonomic intra-cellular analytic system (TIICAS) for cell identification. Acta Cytol. 12(3), 180 (1968)
20. Zhou, Z.: Machine Learning. Tsinghua University Press, Beijing (2016)

Functional Kidney Analysis Based on Textured DCE-MRI Images Marcin Kociolek(B), Michal Strzelecki, and Artur Klepaczko Institute of Electronics, Lodz University of Technology, ul. Wólczańska 211/215, 90-924 Łódź, Poland {marcin.kociolek,michal.strzelecki,artur.klepaczko}@p.lodz.pl http://eletel.p.lodz.pl/kociolek

Abstract. The increasing number of renal disorders requires application of modern medical imaging techniques that in a non-invasive and efficient way enable monitoring of various kidney diseases. The dynamic contrast-enhanced (DCE) sequence is a magnetic resonance imaging method, which allows visualizing kidney state and estimating a number of functional kidney parameters, e.g. glomerular filtration rate. In this paper we propose application of texture analysis to provide numerical descriptors of DCE-MR images. It is demonstrated that such an approach extends the possibilities of DCE-MR examination, providing additional information related to kidney functionality. The proposed method was verified on a data set of real DCE-MRI examinations acquired for 10 healthy volunteers.

Keywords: Kidney · DCE-MRI · Texture features

1 Introduction

Recent advances in medical imaging technology establish its broad and increasing role in various clinical applications, especially those associated with current and frequent health problems. One such application area is renal diseases, such as chronic kidney disease, hypertension, diabetes, and cancer. Kidneys are responsible for blood filtration, removal of water-soluble waste products of metabolism, surplus glucose and other organic substances. Renal performance is an indicator of physiological homeostasis, acid-base balance and arterial blood pressure. The growing rate of renal disorders in modern aging societies calls for a diagnostic method that would allow for continuous monitoring of patients belonging to risk groups. Routinely, renal performance can be assessed using the creatinine clearance test [2]. This method, however, requires acquisition of two blood samples in a 24-h interval and as a result one obtains the overall estimate of glomerular filtration rate (GFR), which characterizes renal perfusion. On the other hand, dynamic contrast-enhanced (DCE) is a magnetic resonance imaging protocol, which allows visualizing kidney state, separately for the left and


right side [1]. Moreover, perfusion estimates can be calculated voxel-wise, hence localization of lesions within renal tissue becomes possible. DCE-MRI involves intravenous administration of a gadolinium-based contrast agent (CA) in the form of a bolus, which passes through the arterial system and is filtered out in the kidneys. Tracer kinetic is reflected in temporal variations of image signal. By inspecting signal intensity time-curves various metrics can be derived to evaluate perfusion. Usage of pharmacokinetic models [8,12] leads to estimation of a series of quantitative measures, such as single-kidney GFR, based on model parameters after its fitting to the observed signal data. However, reliability of these estimates is limited by the approximation of a function converting image signal to CA concentration. This function requires knowledge of the blood and kidney tissue longitudinal relaxation rates, which are difficult to measure for each patient in the clinical practice [3,13]. Thus, in an alternative semi-quantitative approach, renal perfusion is described by the attributes of observed intensity time-courses, such as time to peak, mean residence time or area under curve [4]. In this paper, we postulate yet another approach to estimating kidney perfusion in a semi-quantitative manner. Our method utilizes texture analysis [7,10] to provide numerical descriptors of DCE-MR images. Although absolute quantification of perfusion is not possible in this case, texture enables objective monitoring of a patient’s state between multiple examinations along with assessment of functional kidney operation. In addition, texture analysis offers many numerical descriptors which encapsulate various patterns inherent in an image. The proposed strategy, as a first step, involves segmentation of kidneys parenchyma in every frame of the dynamic series. For every patient, the region of interest (ROI) was manually delineated over the whole kidney. The ROI drawn in one frame was used in all other frames after shifting and necessary adjustments to compensate for the kidney motion (mainly in the vertical direction) and small local deformations due to respiration. The potential of the proposed approach was verified on a data set of real DCE-MRI examinations acquired for 10 healthy volunteers, as described below.

2 Materials and Methods

Our material constitutes a subset of the images described in [2]. We had access to 3D image sequences acquired by means of dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) from 10 healthy nonsmoking volunteers. Sequences were acquired by means of a 1.5 T Siemens Magnetom Avanto scanner (Erlangen, Germany) using a standard phased-array coil. For each volunteer two DCE-MRI examinations were performed at an interval of 7 days, resulting in 20 datasets. Each dataset consists of 74 volumetric images taken at intervals of 2.3 s. Each volumetric image constitutes a coronal-oblique DCE-MRI volume acquired using a 3D spoiled gradient-recalled (SPGR) pulse sequence: TE = 0.8 ms, TR = 2.36 ms; flip angle FA = 20°; parallel imaging factor = 3. The field of view was 425 mm × 425 mm × 90 mm. The voxel size is 2.2135 mm × 2.2135 mm × 3 mm.


This results in a stack of 30 slices having dimensions of 192 pixels × 192 pixels. All voxel intensities are stored in a 16-bit format. The minimum voxel intensity for all datasets is 0, while the maximum varies from 466 up to 890. The size of the left kidney, for the selected slice, ranged from 707 to 1013 pixels, while for the right kidney it ranged from 755 to 1074 pixels. The shape of a kidney remains the same while the internal structure (appearance) changes as the contrast agent enters the kidney with the blood and then is filtered out and transferred to the pelvis. Due to respiratory movements, the kidneys change their vertical position in the image over time. Figure 1 shows a sample of different slices taken at the same time point. Kidneys are visible on up to 12 consecutive slices. The cortex is visible on all of these slices, but the medulla and pelvis are visible only on a few of them.

Fig. 1. Example images for patient no. 8, examination no. 1, time frame no. 12: (a) slice 10, (b) slice 13, (c) slice 16, (d) slice 19, (e) slice 22, (f) slice 25

Each dataset was manually segmented into 6 different ROIs: left kidney (LK), left medulla (LM), left pelvis (LP), right kidney (RK), right medulla (RM), right pelvis (RP) (Fig. 2). Additionally, the left cortex (LC) and right cortex (RC) were found as (1) and (2):

LC = LK \ (LM ∪ LP)    (1)

RC = RK \ (RM ∪ RP)    (2)
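With binary masks for the annotated ROIs, the cortex regions of (1) and (2) reduce to simple mask arithmetic, e.g. (illustrative sketch only, hypothetical variable names):

```python
def cortex_mask(kidney, medulla, pelvis):
    """Boolean ROI masks (numpy arrays of identical shape);
    cortex = kidney voxels outside the medulla and pelvis."""
    return kidney & ~(medulla | pelvis)

# e.g. lc = cortex_mask(lk_mask, lm_mask, lp_mask); rc = cortex_mask(rk_mask, rm_mask, rp_mask)
```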

Fig. 2. ROIs superimposed on a sample image: (a) all regions, (b) kidney contours, (c) medulla contours, (d) pelvis contours. Colors correspond to the following regions: red – left kidney (LK), blue – left medulla (LM), green – left pelvis (LP), purple – right kidney (RK), brown – right medulla (RM), yellow – right pelvis (RP)

In the conducted experiment, for each dataset a slice containing a clear view of the medulla and pelvis was selected (Fig. 3). For such a slice, a number of intensity and textural features were calculated independently for the left and right kidney regions over all 74 time points. The features were calculated by means of the QMaZda software [11]. The following feature sets were calculated:
– 13 based on the intensity histogram;
– 220 based on the Grey Level Co-occurrence Matrix (11 features for GLCMs built for combinations of 4 directions (0°, 45°, 90°, 135°) and 5 distances/offsets (1, 2, 3, 4, 5));
– 24 based on the Grey-Level Run-Length Matrix (6 features for GLRLMs built for 4 directions (0°, 45°, 90°, 135°));
– 5 based on the Gradient Map;
– 5 based on the Auto-Regressive Model;


Fig. 3. DCE-MRI images for characteristic time points of the renal filtration action (patient 4, examination 1, slice 16); panels (a)–(i) show time frames 9, 10, 11, 13, 14, 15, 38, 39 and 40. The contour of the left kidney is superimposed on the images. The first row shows three consecutive images of the contrast agent entering the cortex (perfusion phase). The second row shows three consecutive images of the contrast agent entering the medulla region (uptake phase). The third row shows three consecutive images of the contrast agent entering the pelvis region (wash-out phase)

– 20 based on the Gabor Transform (Gabor transforms created for combinations of 4 directions (0°, 45°, 90°, 135°) and 5 sizes of the Gaussian envelope (4, 6, 8, 12, 16, 24));
– 12 based on the 2D Haar Wavelet Transform (3 filters and 4 scales);
– 12 based on the Histogram of Oriented Gradients (3 bins and 4 orientations);


– 626 based on Local Binary Patterns (a combination of 3 algorithms: Overcomplete, Transition and Center-symmetric, and 3 different neighborhood sizes (4, 8, 12 pixels)).

A total of 937 features were calculated. All textural features, except the histogram- and LBP-based features, were calculated for pixel intensities normalized by means of ±3σ normalization followed by a reduction of the number of values coding the brightness [5,6] (16 intensity values were used). The details of the calculated features and a description of the QMaZda functionalities can be found in the QMaZda manual [9].
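As an illustration of the normalization and GLCM computation described above, the following is a minimal NumPy sketch for a single 2D slice. The ±3σ clipping, the 16 grey levels and the angular-second-moment and sum-of-squares definitions follow the text, but the function names are ours and QMaZda's actual implementation may differ in details.

```python
import numpy as np

def normalize_3sigma(values, levels=16):
    """Clip intensities to mean +/- 3 sigma and quantize them to a given number of grey levels."""
    mu, sigma = values.mean(), values.std()
    lo, hi = mu - 3 * sigma, mu + 3 * sigma
    clipped = np.clip(values, lo, hi)
    q = np.floor((clipped - lo) / (hi - lo + 1e-12) * levels).astype(int)
    return np.clip(q, 0, levels - 1)

def glcm_features(quantized, mask, dx=1, dy=0, levels=16):
    """Build a symmetric GLCM for one offset inside a ROI mask and return ASM and sum of squares."""
    glcm = np.zeros((levels, levels), dtype=float)
    h, w = quantized.shape
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if mask[y, x] and 0 <= x2 < w and 0 <= y2 < h and mask[y2, x2]:
                glcm[quantized[y, x], quantized[y2, x2]] += 1
                glcm[quantized[y2, x2], quantized[y, x]] += 1  # make the matrix symmetric
    p = glcm / max(glcm.sum(), 1.0)
    asm = np.sum(p ** 2)                                  # angular second moment
    i = np.arange(levels)
    mean_level = np.sum(i * p.sum(axis=1))
    sum_of_squares = np.sum(((i - mean_level) ** 2)[:, None] * p)  # GLCM variance
    return asm, sum_of_squares

# usage sketch: slice_img is a 2-D array, kidney_mask a boolean ROI of the same shape
# q = np.zeros_like(slice_img, dtype=int)
# q[kidney_mask] = normalize_3sigma(slice_img[kidney_mask])
# asm, ss = glcm_features(q, kidney_mask)
```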

3 Results

In Fig. 4, an example plot of the mean intensity across the whole left kidney area is shown. The feature was measured for the first examination of patient 4; slice 16 was used.

Fig. 4. Example plot of mean intensity across the whole left kidney area (patient 4, examination 1, slice 16). Vertical lines indicate time frames corresponding to images shown in Fig. 3. Blue lines mark time frames when the contrast agent enters the cortex, red lines mark time frames when the contrast agent enters the medulla and green lines mark time frames when the contrast agent enters the pelvis

Mean intensity is related to the amount of contrast agent present in the tested area. This parameter can be easily measured directly from MRI images, assuming that the MR scanner is equipped with the appropriate acquisition protocol and software. There is a clearly visible time point at which mean intensity rises which is caused by the contrast agent flowing into the cortex (time frames 9–11,


Fig. 3(a)–(c)). Then there is a mean intensity drop caused by the contrast agent flowing into the medulla (time frames 12–14, Fig. 3(d)–(f)). However, we are not able to identify the time point at which the contrast agent flows into the pelvis (time frames 38–40, Fig. 3(g)–(i)). In Fig. 5, a plot of the GLCM-based angular second moment (horizontal direction, distance/offset 1) is shown. This is the same dataset as for Fig. 4 (patient 4, examination 1, slice 16).

Fig. 5. Plot of GLCM based angular second moment (direction horizontal distance/offset 1) for entire left kidney region (patient 4, examination 1, slice 16). Blue lines mark time frames when the contrast agent entering the cortex and red lines mark time frames when the contrast agent entering the medulla

A sudden drop in the GLCM-based angular second moment value can be easily seen when the contrast agent enters the kidney cortex. It is followed by a rise of the feature value when the contrast agent starts to flow into the medulla region. The GLCM-based differential variance (horizontal direction, distance/offset 1) plotted for the same dataset is shown in Fig. 6. A sudden jump of the feature value can be observed when the contrast agent flows into the pelvis. In Fig. 7, a plot of the GLCM-based sum of squares (horizontal direction, distance/offset 1) is shown. For this feature, all three contrast agent transitions between kidney structures are visible. Similar properties can be found for other texture features, e.g. the mean value of the gradient matrix or the energies of certain sub-bands and scales of the Haar DWT (see Discussion).
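The way such feature time-courses can be screened for the characteristic transitions can be sketched as follows. The simple largest-change heuristic below is only illustrative and is not the procedure used by the authors; the feature callback and threshold are assumptions.

```python
import numpy as np

def feature_time_course(frames, mask, feature_fn):
    """Compute one scalar texture feature inside the kidney ROI for every time frame."""
    return np.array([feature_fn(frame, mask) for frame in frames])

def flag_transitions(values, n_events=3):
    """Return the time indices with the largest absolute frame-to-frame change of the feature."""
    diffs = np.abs(np.diff(values))
    return np.sort(np.argsort(diffs)[-n_events:] + 1)

# usage sketch: frames is a list of 74 two-dimensional slices, kidney_mask the ROI of the selected slice
# curve = feature_time_course(frames, kidney_mask, lambda f, m: f[m].mean())
# print(flag_transitions(curve))  # candidate frames where the contrast agent enters a new structure
```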


Fig. 6. Plot of GLCM based differential variance (direction horizontal distance/offset 1) green lines mark time frames when the contrast agent is entering the pelvis

Fig. 7. Plot of GLCM based sum of squares (direction horizontal distance/offset 1) Blue lines mark time frames when the contrast agent is entering the cortex, red lines mark time frames when the contrast agent is entering the medulla and green lines mark time frames when the contrast agent is entering the pelvis

4 Discussion

As shown in the Results section, calculation of the mean intensity in the kidney ROI allows identification of the time points when the contrast agent enters the cortex and then when it starts to enter the medulla. This is rather obvious,


since such ROI intensity corresponds to the contrast agent amount and distribution in the kidney. On the other hand, it is not possible, by means of this feature, to identify the time point at which the contrast agent flows into the pelvis (the so-called wash-out phase). To do this, one needs to calculate the mean intensity from the pelvis area only. Such an approach requires additional segmentation of the kidney region, which is quite tedious. The pelvis region is visible on a smaller portion of the frames, and the whole kidney changes its vertical position because of respiratory movements. Another solution is the application of more sophisticated parameters for kidney region analysis. We have shown that features commonly used for texture analysis can improve identification of the most important steps of the filtration process in the kidneys. Automated identification of the wash-out phase is important from the pharmacokinetic modeling perspective. The time range of the model support must be limited to the filtration and uptake phases only. Otherwise, a very high intensity in the pelvis region at the end of the DCE time series may apparently increase the signal value averaged over the whole parenchymal ROI, effectively leading to incorrect GFR estimates. During our experiments we have tested a vast number of popular texture features. We have found that some of them can be used for identification of characteristic moments during the filtration process when the contrast agent enters the cortex, the medulla and the pelvis. Some of these features are more selective, like the previously mentioned angular second moment calculated by means of the GLCM or the energy of the discrete wavelet transform calculated for scale 1 and sub-band HL (Fig. 8). Other features can catch more events, such as the earlier mentioned sum of squares based on the GLCM and the mean value of the gradient matrix (Fig. 9).

Fig. 8. Plots of energy calculated for discrete wavelet transform (Haar wavelet) at scale 1 and sub-band LH (low-pass filter in horizontal direction high-pass in vertical direction)


Fig. 9. Plots of gradient matrix based mean value

The cross-section of the internal structure of the kidney visualized by the MR scanner does not represent a uniform texture, since a kidney consists of several organs that are characterized by their particular textures. Nevertheless, the values of selected features, customarily used in texture analysis, encode some temporal properties of the filtration process performed by the kidney. This is due to the properties of the analyzed DCE-MRI images, which sequentially show the contrast agent flow through the kidney. A single kidney occupies a small image area of between 700 and 1100 pixels. The medulla pyramids are visible when the contrast agent enters the cortex. They are small dark areas contrasting with the brighter cortex. The pelvis is visible when the contrast agent is present in it. It is another small area, this time much brighter than the cortex. However, the information obtained from these images thanks to texture analysis extends the findings that can be acquired from the contrast intensity distribution. It was demonstrated that texture analysis provides information about important time instants during the filtration process that can all be identified from the distribution of a single texture feature, without the need for further segmentation of the kidney ROI into smaller organs.

5 Summary

We have shown that textural features can be applied for analyzing the state of the kidneys during the filtration process, imaged by means of dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). In order to enable such analysis, the following steps should be performed:


1. Simple manual or semi-automatic segmentation of the whole kidney region, performed for only one slice from the entire z-stack (provided that for such a slice both medullas and the pelvis are visible).
2. Spatial registration of this region on consecutive time frames.
3. Calculation of textural features over the kidney region for consecutive time frames.

The presented approach utilizes analysis of the entire kidney region on only one slice, so segmentation and registration are simpler than in the case of multi-slice and/or multi-region techniques. The proposed approach requires verification on more patient data, which will be the topic of further research. In this paper we intended to show that variations of feature values correspond to the dynamics of the contrast agent, thus all analyses were performed on healthy patients. In the future we are planning discriminative analyses of the data for patients with renal tract disorders.

Acknowledgements. This paper was supported by the Polish National Science Centre grant no. UMO-2014/15/B/ST7/05227. The authors express their gratitude to Prof. Arvid Lundervold, Prof. Jarle Rørvik, and Dr. Eli Eikefjørd from the Haukeland University Hospital Bergen, Norway, for providing the DCE-MRI data set used in this study.

References

1. Bammer, R.: MR and CT Perfusion and Pharmacokinetic Imaging: Clinical Applications and Theoretical Principles. Lippincott Williams & Wilkins (2016)
2. Eikefjord, E., Andersen, E., Hodneland, E., Hanson, E.A., Sourbron, S., Svarstad, E., Lundervold, A., Rørvik, J.T.: Dynamic contrast-enhanced MRI measurement of renal function in healthy participants. Acta Radiol. 58(6), 748–757 (2017)
3. Huang, Y., Sadowski, E.A., Artz, N.S., Seo, S., Djamali, A., Grist, T.M., Fain, S.B.: Measurement and comparison of T1 relaxation times in native and transplanted kidney cortex and medulla. J. Magn. Reson. Imaging 33(5), 1241–1247 (2011)
4. Jackson, A., Li, K.L., Zhu, X.: Semi-quantitative parameter analysis of DCE-MRI revisited: Monte-Carlo simulation, clinical comparisons, and clinical validation of measurement errors in patients with type 2 neurofibromatosis. PLoS One 9(3), e90300 (2014)
5. Kociolek, M., Strzelecki, M., Szymajda, S.: On the influence of the image normalization scheme on texture classification accuracy. In: 2018 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), pp. 152–157. IEEE (2018)
6. Materka, A., Strzelecki, M.: On the importance of MRI nonuniformity correction for texture analysis. In: Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), 2013, pp. 118–123. IEEE (2013)
7. Obuchowicz, R., Nurzynska, K., Obuchowicz, B., Urbanik, A., Piórkowski, A.: Caries detection enhancement using texture feature maps of intraoral radiographs. Oral Radiol. (2018). https://doi.org/10.1007/s11282-018-0354-8
8. Sourbron, S.P., Michaely, H.J., Reiser, M.F., Schoenberg, S.O.: MRI-measurement of perfusion and glomerular filtration in the human kidney with a separable compartment model. Investig. Radiol. 43(1), 40–48 (2008)


9. Szczypiński, P.M.: QMaZda manual (2018). http://www.eletel.p.lodz.pl/pms/Programy/qmazda.pdf
10. Szczypiński, P.M., Klepaczko, A.: MaZda – a framework for biomedical image texture analysis and data exploration. In: Biomedical Texture Analysis, pp. 315–347. Elsevier (2017). https://doi.org/10.1016/b978-0-12-812133-7.00011-9
11. Szczypiński, P.M., Klepaczko, A., Kociolek, M.: QMaZda – software tools for image analysis and pattern recognition. In: Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), 2017, pp. 217–221. IEEE (2017)
12. Tofts, P.S., Cutajar, M., Mendichovszky, I.A., Peters, A.M., Gordon, I.: Precise measurement of renal filtration and vascular parameters using a two-compartment model for dynamic contrast-enhanced MRI of the kidney gives realistic normal values. Eur. Radiol. 22(6), 1320–1330 (2012)
13. Zhang, X., Petersen, E.T., Ghariq, E., De Vis, J., Webb, A., Teeuwisse, W.M., Hendrikse, J., Van Osch, M.: In vivo blood T1 measurements at 1.5 T, 3 T, and 7 T. Magn. Reson. Med. 70(4), 1082–1086 (2013)

Incorporating Patient Photographs in the Radiology Image Acquisition and Interpretation Process Elizabeth A. Krupinski(B) Department of Radiology & Medical Imaging, Emory University, Atlanta, GA 30322, USA [email protected]

Abstract. Radiologic image interpretation is a complex task and unfortunately errors do occur. Some errors, however, might be preventable if the radiologist had additional information about the state of the patient when the images were acquired. A system was developed that automatically acquires a photograph of the patient simultaneously with portable radiographic images. Studies using this novel technology have demonstrated improved recognition of incorrectly labeled images and wrong-side errors, improved ability to correctly detect and follow tubes and lines, and demonstrated that inclusion of the photo does not significantly change viewing times.

Keywords: Radiology · Photographs · Point-of-care · Observer performance

1 Introduction

Nearly 100,000 people die annually from medical errors [1] in the United States alone. Radiology is no exception, with error rates often estimated to be as high as 30% (although most people do not die from the majority of these errors). Many of these errors are preventable simply by providing the interpreting radiologist with more information about the patient. In its 2010 National Patient Safety Goals (NPSG), the Joint Commission on Accreditation of Healthcare Organizations (JCAHO) provides a specific requirement (NPSG.01.01.01) that at least two patient identifiers be used when providing care, treatment, and services. This is because wrong-patient errors are actually rather common and can occur at any point in the care continuum. An individual's name, an assigned identification number, telephone number, or other person-specific number are acceptable patient identifiers. This, however, can be difficult in radiology, especially when a good number of patients receive multiple images, often from different modalities, at different points in time. The National Quality Forum has therefore endorsed the use of standardized protocols to prevent radiograph mislabeling. It does not, however, specify what aspects should be standardized or how.


To address this, a system was developed (Camerad Technologies, LLC, Decatur, GA) to simultaneously acquire a photograph of the patient at the time the portable X-ray exam is acquired. The system uses smart cameras built around the Raspberry Pi Zero platform (Raspberry Pi Foundation, Cambridge, UK) that are integrated with the portable radiograph acquisition systems (initially the Carestream DRX Revolution; Carestream, Rochester, NY). They are powered by a USB port available on the system. A non-mechanical force-sensing resistor (FSR 400, Interlink Electronics, Westlake Village, CA) is attached to the radiography hand switch, triggering the photo acquisition at the same time the X-ray is triggered. The image is tagged with the acquisition time and the radiography machine it is attached to, then a custom-designed server retrieves the images securely through the hospital WiFi. Using the time and machine data, the server polls the PACS (Picture Archiving and Communications System) for the corresponding radiographic image. The system processes the photos, converts them to DICOM and matches them with the images in the PACS. All data are encrypted using the Advanced Encryption Standard (AES-XTS). Initial studies [2] described the core technology (which has evolved significantly since first introduced in 2013), its potential use and cost.

With more than 4 of such points (Fig. 1), the problem becomes ill-posed and requires regularization to be solved. To do this, we assume that the coordinates measured after the transformation include some error. In this case we seek the solution that minimizes the mean squared error ε. The error is a function of the elements of the transformation matrix and vector (2). The partial derivatives (3) and (4) of the error function should be equal to zero to minimize the error. The resulting system of equations has a unique solution (5). More detailed reasoning and the derivation of this solution can be found in [13], and its application to motion estimation in [14].

$$\mathbf{v}=\begin{bmatrix} v_1 \\ \vdots \\ v_D \end{bmatrix}\approx\begin{bmatrix} j_{11} & \cdots & j_{1D} \\ \vdots & \ddots & \vdots \\ j_{D1} & \cdots & j_{DD} \end{bmatrix}\begin{bmatrix} w_1 \\ \vdots \\ w_D \end{bmatrix}+\begin{bmatrix} t_1 \\ \vdots \\ t_D \end{bmatrix}=\mathbf{J}\mathbf{w}+\mathbf{T} \qquad (1)$$

$$\varepsilon=\sum_{p=1}^{P}\left|\mathbf{J}\mathbf{w}_p+\mathbf{T}-\mathbf{v}_p\right|^2=\sum_{p=1}^{P}\sum_{n=1}^{D}\left(\sum_{m=1}^{D} j_{nm}w_{pm}+t_n-v_{pn}\right)^2 \qquad (2)$$

$$\frac{\partial\varepsilon}{\partial t_k}=2\sum_{p=1}^{P}\left(t_k+\sum_{m=1}^{D} j_{km}w_{pm}-v_{pk}\right)=0 \qquad (3)$$

$$\frac{\partial\varepsilon}{\partial j_{kl}}=2\sum_{p=1}^{P}\left(w_{pl}\sum_{m=1}^{D} w_{pm}j_{km}+w_{pl}t_k-w_{pl}v_{pk}\right)=0 \qquad (4)$$

$$\begin{bmatrix} j_{k1} \\ \vdots \\ j_{kD} \\ t_k \end{bmatrix}=\begin{bmatrix} \sum_{p=1}^{P} w_{p1}^2 & \cdots & \sum_{p=1}^{P} w_{p1}w_{pD} & \sum_{p=1}^{P} w_{p1} \\ \vdots & \ddots & \vdots & \vdots \\ \sum_{p=1}^{P} w_{p1}w_{pD} & \cdots & \sum_{p=1}^{P} w_{pD}^2 & \sum_{p=1}^{P} w_{pD} \\ \sum_{p=1}^{P} w_{p1} & \cdots & \sum_{p=1}^{P} w_{pD} & P \end{bmatrix}^{-1}\begin{bmatrix} \sum_{p=1}^{P} w_{p1}v_{pk} \\ \vdots \\ \sum_{p=1}^{P} w_{pD}v_{pk} \\ \sum_{p=1}^{P} v_{pk} \end{bmatrix} \qquad (5)$$

As presented in (5), the problem of estimating the affine transformation from a set of paired points can be solved analytically. It requires calculation of the inverse matrix, which in three-dimensional case (D = 3) is trivial.
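A compact NumPy sketch of this closed-form estimation follows: stacking the point coordinates with a column of ones and solving the resulting least-squares problem is algebraically equivalent to the normal equations (5). The variable names and the synthetic test data are ours.

```python
import numpy as np

def fit_affine(w, v):
    """Least-squares affine fit v ~ J w + T from paired points.
    w, v: (P, D) arrays of corresponding coordinates before/after the transformation."""
    P, D = w.shape
    A = np.hstack([w, np.ones((P, 1))])        # rows [w_p1 ... w_pD 1]
    X, *_ = np.linalg.lstsq(A, v, rcond=None)  # solves the normal equations of (5)
    J = X[:D, :].T                             # D x D linear part
    T = X[D, :]                                # translation vector
    return J, T

# usage sketch: recover a known transformation from noisy point pairs
rng = np.random.default_rng(0)
w = rng.normal(size=(20, 3))
J_true = np.eye(3) + 0.1 * rng.normal(size=(3, 3))
T_true = np.array([1.0, -2.0, 0.5])
v = w @ J_true.T + T_true + 0.01 * rng.normal(size=w.shape)
J_est, T_est = fit_affine(w, v)
```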


Fig. 1. The goal is to estimate a transformation of the green points (LHS) onto the corresponding red points (RHS) with a minimum error

As aforementioned, the affine transformation defines scaling, shear and rotation of the image space. To establish the contribution of these components in the transformation, the matrix J has to be decomposed. The matrix is decomposed into an orthogonal matrix U and a symmetric matrix S (6). Next, the symmetric matrix S is eigendecomposed into the orthogonal matrix Q and the diagonal matrix D.

$$\mathbf{J}=\mathbf{U}\mathbf{S}=\mathbf{U}\mathbf{Q}^{-1}\mathbf{D}\mathbf{Q}=|\mathbf{D}|\,\mathbf{U}\mathbf{Q}^{-1}\begin{bmatrix}\lambda_1 & & 0\\ & \ddots & \\ 0 & & \lambda_D\end{bmatrix}\mathbf{Q} \qquad (6)$$

Determinant |D| represents scaling, matrix U defines rotation, and the eigenvalues λn determine shear in the directions indicated by the column vectors of matrix Q. Both the U and Q matrices are orthogonal, which in particular means that $\mathbf{Q}^{-1}=\mathbf{Q}^T$ and $|\mathbf{Q}|=|\mathbf{U}|=1$. If all the eigenvalues are equal to one (λn = 1), the transformation does not involve shear and is called a Procrustes transformation. If |D| = 1 then the transformation is volume preserving. However, it must be noted that computing the affine transformation and then removing the U matrix from (6) yields only a rough estimation of the Procrustes transformation. The way to exactly estimate the Procrustes transformation is presented in [4,13] – it is a numerical solution and it is much more demanding computationally. In contrast to the Procrustes transformation, the volume-preserving transformation can be estimated based on the solution (5) by removal of the |D| factor from Eq. (6). The above approach of determining the affine transformation and then removing selected factors gives control over the regularizer, which is needed in medical image registration.
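The decomposition of (6) can be reproduced with standard linear-algebra routines, for example a polar decomposition obtained from the SVD followed by an eigendecomposition of the symmetric factor. The sketch below assumes det(J) > 0 and uses the determinant as the volume-change factor; it illustrates the idea rather than reproducing the paper's exact formulation.

```python
import numpy as np

def decompose_affine(J):
    """Polar decomposition J = U @ S (U orthogonal, S symmetric positive-definite),
    followed by the eigendecomposition of S used in (6)."""
    W, sig, Vt = np.linalg.svd(J)
    U = W @ Vt                       # rotation factor
    S = Vt.T @ np.diag(sig) @ Vt     # symmetric (scale/shear) factor
    lam, Q = np.linalg.eigh(S)       # eigenvalues and shear directions of S
    volume_factor = np.linalg.det(S) # volume change introduced by the transformation
    return U, S, lam, Q, volume_factor

def remove_volume_change(J):
    """Rescale J so that its determinant becomes 1 (volume-preserving variant), assuming det(J) > 0."""
    return J / np.linalg.det(J) ** (1.0 / J.shape[0])
```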

3 Image Similarity

Finding a correspondence between points in two images requires definition of a similarity function. Selecting a suitable similarity function is not a trivial decision and depends on the properties of the images to be co-registered. The simplest and most straight-forward method to establish similarity between image fragments is the mean absolute difference (MAD). A block of voxels from one image is compared with the block of the same size from the other image. Absolute difference between corresponding pixels in blocks is computed and averaged. The value of this measure is lower if the similarity between blocks is higher. However, this method can be applied exclusively for images of the same modality and acquired under the same conditions, which is a rare case in medical applications. Measurement of similarity in images which differ in terms of brightness and contrast can be solved with the normalized covariance measure (NCM) [2]. The NCM computes covariance between voxel intensities in two image blocks and divides it by a geometric mean of intensity variances in the blocks. The most complex situation is to compare images of differing modalities. This problem is often solved by applying the mutual information (MI) measure [11], which compares information entropies in the compared image blocks. In NCM and MI, the more similar the image blocks, the higher is the value of the measure. The definitions of the MAD and NCM functions are given in equations (7) and (8). They define relation between the block a of voxels in image IA and the similar block b with the center at coordinates (xb ,yb ,zb ) in image IB . The R parameter defines a so-called radius of the block.

$$\mathrm{MAD}_a(x_b, y_b, z_b)=\frac{1}{(2R+1)^3}\sum_{i=-R}^{R}\sum_{j=-R}^{R}\sum_{k=-R}^{R}\left|I_A(x_a+i, y_a+j, z_a+k)-I_B(x_b+i, y_b+j, z_b+k)\right| \qquad (7)$$

$$\mathrm{NCM}_a(x_b, y_b, z_b)=\frac{\sum_{i,j,k=-R}^{R}\bigl(I_A(x_a+i, y_a+j, z_a+k)-\mu_a\bigr)\bigl(I_B(x_b+i, y_b+j, z_b+k)-\mu_b\bigr)}{\sqrt{\sum_{i,j,k=-R}^{R}\bigl(I_A(x_a+i, y_a+j, z_a+k)-\mu_a\bigr)^2\;\sum_{i,j,k=-R}^{R}\bigl(I_B(x_b+i, y_b+j, z_b+k)-\mu_b\bigr)^2}} \qquad (8)$$

where μa and μb denote the mean voxel intensities of blocks a and b.

In some solutions the image IA is divided into blocks of the same sizes organized in a regular raster. Then, for every block its counterpart is searched in the image IB . This approach is often criticized since not all the blocks from IA can be uniquely matched with specific blocks in IB . The example of such situation may be a block presenting a part of homogeneous background region, which will match any location of the similar background in the other image. To solve this problem, the concept of feature points was introduced. Only the blocks presenting specific and unique content of image IA are matched with the image IB . This means, the blocks presenting homogeneous regions or recurring content should be excluded from this process. Having this in mind, the regularization of three-dimensional image registration should support the concept of feature point matching and also should enable application of alternative similarity measures.
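A direct NumPy transcription of the block measures (7) and (8) may look as follows; bounds checking and sub-voxel interpolation are omitted for brevity, and the helper names are ours.

```python
import numpy as np

def block(img, center, R):
    """Extract a cubic block of radius R around a voxel (no bounds checking for brevity)."""
    x, y, z = center
    return img[x - R:x + R + 1, y - R:y + R + 1, z - R:z + R + 1].astype(float)

def mad(img_a, center_a, img_b, center_b, R):
    """Mean absolute difference (7) between two blocks of radius R."""
    return np.mean(np.abs(block(img_a, center_a, R) - block(img_b, center_b, R)))

def ncm(img_a, center_a, img_b, center_b, R):
    """Normalized covariance measure (8) between two blocks of radius R."""
    a = block(img_a, center_a, R)
    b = block(img_b, center_b, R)
    a, b = a - a.mean(), b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum()) + 1e-12
    return (a * b).sum() / denom
```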

4 Elastic Registration

There are two components considered in image registration: the first is derived from the images and involves image similarity measures, the second is the regularization term and is derived from the transformation of paired points. The way the two components are combined was inspired by the concept of deformable parametric models and meshes [6,10] (Fig. 2). This concept defines an energy of the model as a sum of the two components. The energy derived from the image content (from block matching) is usually called the external component, and the regularization term is referred to as the internal component. The model's energy can be defined by (9), where the parameters ρ and ξ control the contribution of both energy components, and P is the number of all the points (nodes) of the model. The equation contains the error ε defined by (2). However, note that in Sect. 2 the error was adopted as a function of the elements of J and T. Now, the error is used as a function of the coordinates of all the nodes, with the assumption that the optimal J and T were already established. The function Mp is an image block dissimilarity function, computed for a block from the image IA and linked to the point p. The Mp may be represented by the MAD function, or optionally can be equal to the inverted or negated NCM or MI functions.

$$E=\sum_{p=1}^{P}\rho\,M_p(x, y, z)+\xi\,\varepsilon \qquad (9)$$

Fig. 2. Concept of the deformable mesh; nodes are linked with each other to reflect the regularization term and linked with matching blocks to depict the image-derived term

Solving the registration problem requires minimization of the energy E. If we neglect the influence of point p on the J matrix and on the T vector, which is valid if the displacements of the points are small, the problem can be solved independently for individual points. To do this, we apply a gradient-descent optimization (10), where (i) indicates the iteration step and $\mathbf{v}_p=[x_p\;y_p\;z_p]^T$.

$$\mathbf{v}_p^{(i+1)}=\mathbf{v}_p^{(i)}-\rho\,\nabla M_p\bigl(x_p^{(i)}, y_p^{(i)}, z_p^{(i)}\bigr)-\xi\,\nabla\varepsilon_p^{(i)}\bigl(x_p^{(i)}, y_p^{(i)}, z_p^{(i)}\bigr) \qquad (10)$$

$$\nabla\varepsilon_p=\mathbf{v}_p-(\mathbf{J}_p\mathbf{w}_p+\mathbf{T}_p) \qquad (11)$$

It must be noted that the coordinates v change in subsequent iterations. This affects the J and T elements, which in turn affect the form of the ε function. Therefore, in Eq. (10) the ε function is indexed with (i) – the iteration number. Moreover, in (11) the ε, J and T are indexed with the index p. This means that the function is computed in different ways for different nodes of the mesh. On the other hand, examination of Eq. (1) may lead to the conclusion that the matrix J and the vector T are computed in the same way for all the nodes in the mesh. This inconsistency requires explanation. As aforementioned, the affine transformation preserves straight lines and flat planes and keeps their parallelism. Unfortunately this means that the transformation cannot model bending, which may appear in some human or animal body structures. Therefore, applying the same matrix J and vector T for all the nodes of the mesh would prevent the mesh from bending. To overcome this difficulty the affine transformation is estimated locally, within a limited neighborhood of nodes surrounding the selected node p. If the neighborhood is narrow then bending of the whole mesh is possible. Otherwise, the use of wide neighborhoods would counteract bending, and using all the nodes for estimation of the transformation would eventually prevent bending completely.

The procedure for image registration by means of the proposed regularization method is as follows:

1. The feature points are selected in the image to be transformed (IA – the moving image). The points should be persistently linked with the centers of unique blocks in the image.
2. The mesh is constructed from the points by virtually linking the neighboring points. This step has an impact on the ability of the mesh to bend, since narrow neighborhoods enable bending and wider neighborhoods may restrict the bending ability.
3. The ρ and ξ parameters are set to establish the contribution of the image similarity term and the regularization term, respectively.
4. The transformation matrix and translation vector are computed for every node and its neighborhood. The matrix is decomposed and modified by removal of its selected factors (e.g. removal of the determinant introduces resistance to scaling).
5. For every point, and its persistently linked block, the gradient of the image dissimilarity function is computed in the matched (fixed) image (IB).
6. New coordinates of every point of the mesh are computed from (10).


7. Steps 4, 5 and 6 are repeated for a given number of iterations. In the final iterations the values of the ρ and ξ parameters can be gradually and proportionally reduced.
8. Finally, the moving image is transformed from the initial to the final location of the mesh.
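A schematic NumPy sketch of one pass through steps 4–6 is given below. The image-dissimilarity gradient is left as a user-supplied callback (grad_M is a placeholder), and the local affine fit is the least-squares solution discussed in Sect. 2; this illustrates the update rule rather than the authors' implementation.

```python
import numpy as np

def local_affine(w, v):
    """Least-squares affine fit v ~ J w + T for a neighborhood of paired node positions."""
    A = np.hstack([w, np.ones((w.shape[0], 1))])
    X, *_ = np.linalg.lstsq(A, v, rcond=None)
    return X[:-1, :].T, X[-1, :]

def iterate_nodes(nodes, init_nodes, neighbors, grad_M, rho, xi):
    """One gradient-descent update of all mesh nodes according to (10) and (11).
    nodes, init_nodes: (P, 3) current and initial node coordinates,
    neighbors: list of index arrays (the local neighborhood of each node),
    grad_M: callable returning the dissimilarity gradient at a node position."""
    new_nodes = nodes.copy()
    for p, nb in enumerate(neighbors):
        J, T = local_affine(init_nodes[nb], nodes[nb])   # step 4: locally estimated transformation
        grad_eps = nodes[p] - (J @ init_nodes[p] + T)    # (11): pull towards the affine prediction
        new_nodes[p] = nodes[p] - rho * grad_M(nodes[p]) - xi * grad_eps  # (10)
    return new_nodes
```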

5 Results

The registration algorithm was tested on 20 cases of time series of Dynamic Contrast-Enhanced (DCE) MRI images. Each of the time series is made up of 74 three-dimensional images, each of 192 × 192 × 30 voxels and with a voxel spacing of 2.2 × 2.2 × 3 mm. The images present the flow of the tracer agent into both kidneys, the filtration process and the outflow of the agent through the renal pelvis. In the subsequent image time frames, the brightness of the medulla, renal pyramids and renal cortex significantly changes over time. Moreover, the kidneys move in the antero-posterior direction due to the respiration process, and are slightly squeezed and bent by the movement of the diaphragm. Local changes in brightness and deformations make the registration task nontrivial. We applied two strategies of finding feature points and building a mesh. In the first strategy the feature points were indicated manually – voxels were selected on the kidney and medulla boundaries, along the main arteries and along the pelvis. In the second strategy we manually outlined the volumes of the kidneys. The positions of the feature points were indicated randomly, so that they evenly fill the volume of each kidney. The second approach can be justified by the fact that the resolution of the image is low, so that even accidentally indicated voxels lie on or close to distinctive anatomical structures. The coordinates of the selected voxels were used to determine the initial locations of the model nodes. The choice of either of the two strategies did not have a significant impact on the obtained results. The neighborhood of nodes that surrounds any specific node was established for the initial locations of the nodes and then preserved during the matching process. The neighborhood was determined on the basis of the Euclidean distance and included nodes located at a distance equal to or less than the width of 6 image voxels. However, the presented algorithm for image registration is not restricted to this strategy. An alternative approach to be considered may utilize a generalized Delaunay tessellation [7] to build a 3-dimensional simplex mesh. Also, the neighborhoods may be established based on the graph-theory distance instead of the Euclidean one. In every time series we arbitrarily selected a single frame – the 17th frame. In this particular frame the regions of the kidneys were manually outlined and the feature points were identified exclusively within these regions. Figure 3 presents an example mesh of interconnected feature points placed in the three-dimensional region of the kidney. Next, for every series, the registration procedure was performed to adjust the mesh to the remaining frames of the series. Based on the final and the initial form of the mesh, all the frames were transformed to match


the image content in the 17th frame. The results were qualitatively assessed by an expert, recognized as adequate, and enabled correct estimation of the glomerular filtration rate.

Fig. 3. The example mesh placed in the space of a three-dimensional image

The number of nodes ranged from 550 to 1015 depending on the specific time series. The NCM was used as the image similarity function with a block of 7 × 7 × 7 voxels and parameter ξ = 5.0. The unimodal transformation was used as a regularization term with ρ = 0.7. The registration procedure was split into two stages. At first, 150 iterations with the neighborhood comprising all the nodes of the mesh were executed. In the second stage an additional 10 iterations were performed with neighborhoods including nodes located within a range of 12 voxels. The mesh in the first stage behaved as semi-rigid, and in the second stage it allowed for bending. This multistage approach was successfully applied in two-dimensional models [14] and proved to be computationally efficient and accurate. The algorithm enabled registration of 4–8 images per second on an Intel Core i7-4790 3.60 GHz processor. It must be noted that quantitative assessment of the correspondence in the registered images is difficult since there is no ground truth. Also, an attempt to manually indicate the corresponding points in the images is burdened with a significant error. Therefore, we present the results of registration in the form of overlapping images to enable the reader to make his or her own assessment of the method. Figure 4 presents selected frames before (LHS) and after (RHS) the registration. The fusion of two co-registered images is presented by means of color components: one of the images is shown in green and the other in magenta. If


Fig. 4. The images of kidneys before and after the registration

both the images overlap, the green and magenta contours and structures align to compose a gray-scale pattern. Otherwise, misaligned green or magenta streaks or patches are present. The alignment of the kidney regions after the registration seems accurate. It can be noticed, however, that some image fragments outside the kidney volumes, especially the outer contour of the body, are misaligned. This effect was expected and is acceptable, since only the kidney regions were registered, whilst the locations of the other image fragments were ignored as irrelevant. For comparison with a state-of-the-art method, the DCE-MR images were co-registered using the b-spline deformable registration algorithm implemented in the Plastimatch module [12] of the 3D Slicer software. The entire procedure was fully automatic, i.e. no fiducial points were annotated on the registered frames. For each analyzed series, one time-frame was selected as a reference (fixed) and all the other frames were registered to it. In Plastimatch, image matching can be configured as a multi-stage procedure, where each stage corresponds to a different scale of image content. The first stage performs a preliminary alignment based on translation and affine transformation. This step is followed by three b-spline deformable stages varied by the image subsampling rate (in the x, y, and z directions) and the grid size. In the experiment we used the following parameter settings: subsampling rate = (4, 4, 2) with grid size = 100 mm (in stage 2), and subsampling rate = (2, 2, 1) with grid sizes = 50 and 25 mm (in stages 3 and 4). In each stage, registration was evaluated using the mean squared error criterion. The four registration stages are performed in a sequence and the result of a given step is simultaneously an input to the next one. This procedure provided a registration rate of roughly 1 image per second. Figure 5 presents frame 63 of the 1st time series. After the Plastimatch application the shape and size of the white spot of the pelvis is much smaller than required. Moreover, the boundaries of the kidneys are not aligned with the red contour


Fig. 5. Comparison of registration results by the reference (LHS) and the proposed (RHS) methods

indicating the expected location of the boundary. This means that in the reference algorithm the volume of the whole kidney is inadequate, the boundaries are not properly aligned, and the volume of the pelvis is reduced in size in an undesirable way. These unwanted artifacts are not present in the image obtained by the proposed algorithm.

6 Conclusions

The presented algorithm fulfills the needs of medical image registration. It enables focusing the registration on selected regions of interest and on selected feature points. The user has control to properly balance the image-derived contribution of individual feature points and the contribution of the regularization term, which maintains the mutual spatial relations of these points. The regularization term is based on the locally estimated transformation. The optional choice of the affine, Procrustes or unimodal transform makes it feasible to model various physical properties of tissues, including susceptibility or resistance to stretching, bending or volume changes. The proposed method can be successfully applied to two- or three-dimensional images. Moreover, the algorithm enables arbitrary selection of the image similarity function. Therefore, it enables registration of images of the same or varying modalities. Compared to the reference method, the proposed regularization term and deformable mesh performed more accurate alignment of the kidney regions. The position of the kidneys is more stable and the motion effect is completely compensated. The ability to preserve volume has been confirmed in the registration of the pelvis region. The proposed solution keeps the shape and the size of the pelvis, whereas the method based on b-splines extremely reduces its volume when the contrast agent is present. The presented results confirmed the ability of the proposed method to correctly register time series of kidney images. The visual assessment of the resulting images confirmed the high accuracy of the


registration. The results were found useful for further analysis, specifically for estimation of the glomerular filtration rate. Further work will focus on quantitative evaluation of the algorithm, its comparison with other state-of-the-art methods, and application to registration of images of various modalities.

Acknowledgment. This work was supported by the Polish National Science Centre Grant 2014/15/B/ST7/05227. The DCE-MR images of kidneys were provided by the Department of Radiology, Haukeland University Hospital, Bergen, Norway. The source codes of the presented program for image registration are available from https://gitlab.com/piotr.szczypinski/deformowalne.

References

1. Bookstein, F.L.: Principal warps: thin-plate splines and the decomposition of deformations. IEEE Trans. Pattern Anal. Mach. Intell. 11(6), 567–585 (1989)
2. Chen, F., Loizou, P.C.: Analysis of a simplified normalized covariance measure based on binary weighting functions for predicting the intelligibility of noise-suppressed speech. J. Acoust. Soc. Am. 128(6), 3715–3723 (2010)
3. Crum, W.R., Hartkens, T., Hill, D.: Non-rigid image registration: theory and practice. Br. J. Radiol. 77(suppl 2), S140–S153 (2004)
4. Eggert, D.W., Lorusso, A., Fisher, R.B.: Estimating 3-D rigid body transformations: a comparison of four major algorithms. Mach. Vis. Appl. 9(5–6), 272–290 (1997)
5. Hill, D.L., Batchelor, P.G., Holden, M., Hawkes, D.J.: Medical image registration. Phys. Med. Biol. 46(3), R1 (2001)
6. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. Int. J. Comput. Vis. 1(4), 321–331 (1988)
7. Liu, Y., Snoeyink, J.: A comparison of five implementations of 3D Delaunay tessellation. Comb. Comput. Geom. 52(439–458), 56 (2005)
8. Maintz, J.A., Viergever, M.A.: A survey of medical image registration. Med. Image Anal. 2(1), 1–36 (1998)
9. Maurer, C.R., Fitzpatrick, J.M.: A review of medical image registration. Interact. Image-Guided Neurosurg. 1, 17–44 (1993)
10. McInerney, T., Terzopoulos, D.: Deformable models in medical image analysis: a survey. Med. Image Anal. 1(2), 91–108 (1996)
11. Pluim, J.P., Maintz, J.A., Viergever, M.A.: Mutual-information-based registration of medical images: a survey. IEEE Trans. Med. Imaging 22(8), 986–1004 (2003)
12. Sharp, G.C., Li, R., Wolfgang, J., Chen, G., Peroni, M., Spadea, M.F., Mori, S., Zhang, J., Shackleford, J., Kandasamy, N.: Plastimatch – an open source software suite for radiotherapy image processing. In: Proceedings ICCR, Amsterdam, Netherlands (2010)
13. Späth, H.: Fitting affine and orthogonal transformations between two sets of points. Math. Commun. 9(1), 27–34 (2004)
14. Szczypiński, P.M., Sriram, R.D., Sriram, P.V., Reddy, D.N.: A model of deformable rings for interpretation of wireless capsule endoscopic videos. Med. Image Anal. 13(2), 312–324 (2009)

Simulator for Modelling Confocal Microscope Distortions Katarzyna Sprawka(B) and Piotr M. Szczypiński Institute of Electronics, Lodz University of Technology, Wólczańska 211, 90-924 Łódź, Poland [email protected], [email protected]

Abstract. In research on stem cell differentiation, studying the interactions between a cell and a scaffold may lead to understanding of complex biological processes. The evolution of a cell's morphology may be quantified by application of confocal fluorescent microscopy and image processing techniques. However, selection of appropriate image processing algorithms, their optimization and validation is difficult due to the lack of a ground truth. In this paper, we present a simulator to construct three-dimensional models of fiber scaffolds, mimic imaging phenomena of confocal microscopy and produce artificial yet realistic images. The model of the point spread function used to represent optical distortions is based on the vectorial theory of light and accounts for setups used under non-design conditions. Having the artificial images and the corresponding ground truth models, one can test the image processing procedures, optimize them and compare the results with the models. We compare three algorithms for image segmentation to present the utility of the simulator.

Keywords: Confocal microscopy · Point spread function · Simulator

1 Introduction

The examination of the contact points between cell and scaffold is of great importance for tissue engineering, because it allows for assessing the influence of a given type of scaffold on a cell [2]. The cellular differentiation depends on the shape of the growing cell, which in turn is influenced by the shape of the scaffold and its biomaterial substrate. Therefore, the examination of the scaffold-cell contacts can provide important information on the relationship between the cell behaviour and the types of scaffolds, which would allow for their design aimed at desired function of the cell [11,13,14,18]. Recently, the research on processing of the confocal fluorescent microscopy images of cell cultures growing on different types of scaffolds are conducted to investigate the cell-scaffold contact points [1–3]. An indispensable step during the investigation is development of various data-driven segmentation methods of fiber scaffolds images. However, until now, no evaluation method is known c Springer Nature Switzerland AG 2019  E. Pietka et al. (Eds.): ITIB 2019, AISC 1011, pp. 79–90, 2019. https://doi.org/10.1007/978-3-030-23762-2_8


enabling quantitative verification of the results. In order to overcome this issue, we aim to design the simulator to generate the ground truth images of the fiber scaffold and create realistic artificial images by means of introducing the distortions specific for the confocal fluorescent microscope. The distortions involve the system’s point spread function (PSF) and the photon shot noise. In this way, the segmentation algorithms can be tested on the artificial images and the results can be verified with the ground truth models.

2 Image Formation in Confocal Fluorescent Microscope

For most applications any optical microscope can be assumed to be [16]: (1) linear, which implies that the image of an object is a linear superposition of all object elements; (2) shift-invariant, meaning that when the object moves, the image is shifted but not deformed. Under these assumptions, the optical system can be fully described by its point spread function (PSF). The PSF characterises the response of the system to an infinitesimal, point-like object. Due to diffraction resulting from the circular aperture of the microscope, the response extends in three dimensions. In other words, the PSF is the impulse response of the system [15,16]. As a result, the 3D image formation can be modelled by a convolution of the object's fluorescence distribution with the system's PSF [15–17,19]. In addition, photon shot noise of a Poisson distribution is introduced [6,10], associated with the particle nature of light and image sensor characteristics. Hence, the image f′(x, y, z) arises from the convolution of the object intensity function f(x, y, z) with the system's PSF(x, y, z) plus the shot noise. The process of image formation in a confocal microscope is summarised in Fig. 1 and can be described by formula (1).

$$f'(x, y, z)=f(x, y, z)\ast \mathrm{PSF}(x, y, z)+\xi, \qquad (1)$$

where ξ is the photon shot noise. In a typical confocal microscope, a small volume of a sample is illuminated by a beam of linearly polarized monochromatic light of wavelength λill, highly focused by the microscope's objective. On its way, the illumination light passes through three different media: the immersion medium, the cover glass and the sample medium, of thicknesses ti, tg and ts, respectively. These media have different refractive indices (RIs): ni, ng and ns, respectively, which results in refraction of the light wave at the interfaces between the media and thus in a change of the incidence angles θ1, θ2, θ3. The illumination path is shown in Fig. 2. The fluorescent molecules near the focal point get excited and emit light of wavelength λem,

Fig. 1. Image formation in a confocal microscope


Fig. 2. Illumination path of a confocal microscope; Microscope operated under (a) the design conditions (b) the actual conditions

which is collected by the objective and focused into a pinhole, a small circular aperture placed in front of the detector blocking the out-of-focus light [8,16,19]. The PSF of the confocal microscope is thus a product of the illumination PSF, determined by the field distribution of the focused illuminating beam, and the detection PSF, described by the spatial filtering properties of the pinhole [8,17]. The amount of light that can enter and exit a given objective is defined by its numerical aperture (NA), given by NA = ni sin α, where α is the half-angle of the marginal rays [16,17]. If the rays in Fig. 2 are assumed to be the marginal rays, α = θ1. When the microscope is operated under the design conditions used by the manufacturer for objective correction, the focal point lies immediately under the cover glass of a given thickness, t*g, and RI, n*g. Also, the cover glass is separated from the lens system by an immersion medium of given thickness, t*i, and RI, n*i, as shown in Fig. 2a. However, in biological applications the actual values of the system often significantly differ from the design values, see Fig. 2b. The design parameters are denoted with an asterisk (e.g. t*g), whereas the actual ones without (e.g. tg). Every refractive-index or thickness mismatch introduces additional aberrations to the system, influencing its PSF [5,8,16].
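A minimal sketch of the forward model (1) is given below. Here the PSF is approximated by an anisotropic Gaussian, a common simplification; the simulator described in this paper uses a vectorial PSF model instead, and all parameter values are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def simulate_confocal(ground_truth, sigma_xyz=(1.0, 1.0, 3.0), photons_per_unit=50.0, rng=None):
    """Blur a fluorophore-distribution volume with a Gaussian PSF approximation and add Poisson shot noise."""
    rng = np.random.default_rng() if rng is None else rng
    blurred = gaussian_filter(ground_truth.astype(float), sigma=sigma_xyz)  # f * PSF
    expected_counts = np.clip(blurred, 0, None) * photons_per_unit
    return rng.poisson(expected_counts).astype(float) / photons_per_unit    # shot-noise-corrupted image

# usage sketch: a single bright voxel in an otherwise empty volume
volume = np.zeros((32, 32, 32))
volume[16, 16, 16] = 1.0
image = simulate_confocal(volume)
```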

3 Methods

3.1 The Simulator

The goal of the simulator is to mimic image formation of a 2-channel fluorescent confocal microscope. This technique may be used to visualize cells growing on fiber scaffolds, where the cell and the scaffold are stained to emit fluorescent light of varying wavelengths. Therefore, there are two channels or images generated, one presenting the scaffold and another showing the cell. The simulator produces such images in two stages. In the first stage, the ground truth images are created by combining threedimensional solids such as cylinders and toruses, which approximate the shape

82

K. Sprawka and P. M. Szczypi´ nski

of the fibers, and ellipsoids which approximate the shape of the cells. The blocks are defined by text instructions, each specifying the type of the solid and its parameters. A cylinder is described by its radius (r) and coordinates of two points ([x1 , y1 , z1 ], [x2 , y2 , z2 ]) by which it passes. A torus is defined by its centre ([x0 , y0 , z0 ]), two rotation angles (αx , αy ), major (R) and minor radius (r). An ellipsoid is defined by its centre ([x0 , y0 , z0 ]), semi-axes (a, b, c) and three rotation angles (αx , αy , αz ). The simulator creates two raster images of a size and voxel space set by the user. Voxel by voxel it evaluates if a given point [x, y, z] is inside of any of the solids. In case of a torus and an ellipsoid the point is rotated by the relevant angles around the center of the solid. Next, it is verified if the coordinates satisfy appropriate inequality of a cylinder (2a), a torus (2b) or an ellipsoid (2c). If it does, it is labeled as a cell (first channel) or a scaffold (second channel) or otherwise as a background. The partial volume effect may be also simulated by modeling a raster image with increased resolution and then locally averaging and downsampling it. The images resulting from the first stage of simulation are used as a ground truth. 2

((z − z1 )(x − x2 ) − (x − x1 )(z − z2 )) ((y − y1 )(z − z2 ) − (z − z1 )(y − y2 )) + (x2 − x1 )2 + (y2 − y1 )2 + (z2 − z1 )2 (x2 − x1 )2 + (y2 − y1 )2 + (z2 − z1 )2

2

2

((x − x1 )(y − y2 ) − (y − y1 )(x − x2 )) < r2 (x2 − x1 )2 + (y2 − y1 )2 + (z2 − z1 )2  2 (x − x0 )2 + (y − y0 )2 − R + (z − z02 ) < r2 +

(2a) (2b)

$$\frac{(x-x_0)^2}{a^2}+\frac{(y-y_0)^2}{b^2}+\frac{(z-z_0)^2}{c^2}<1 \qquad (2c)$$

38 °C. The same event is the starting event of the clinical process.
– The very first state of the life cycle caused by the febrile seizure attack is Unclassified Febrile Seizure. In this state the model expects three important possible events:
  • Either the seizure ends within 10 min and does not repeat within 24 h. In such a case the initial incident is classified as a Simple Febrile Seizure and probably does not mean anything serious. In the case of children between 6 and 60 months of age, which is the relevant group of patients for the problem of simple febrile seizures, the seizure together with the fever can occur for many other 'innocent' reasons, as a natural phenomenon associated with the quick development of the organism at this age.
  • If the seizure lasts over 10 min or repeats within 24 h, it is classified as a Complex Febrile Seizure, which represents completely different, more serious, possibilities of further development.
  • The third possible event in this state is the detection of an Inflammation in the Nervous System, which can explain both the seizure and the fever. If successfully treated, this state can also result in nothing serious; otherwise it can lead to the state of Complex Febrile Seizure.
Based on this information the process immediately starts with checking the possible inflammation in the nervous system in order to exclude this possibility before the eventual conclusion that the incident is just a simple febrile seizure. Such a conclusion might mean, in the case of the inflammation, neglecting a possible cause of a consequent complex febrile seizure. Just after the inflammation is excluded, the consequent decision about whether the incident is a simple or complex febrile seizure can be made. If the seizure lasts over 10 min, the condition for the complex febrile seizure is met, so the process does not needlessly wait for a possible repetition and continues immediately with the consequent magnetic resonance investigation. If the seizure ends within 10 min, then the process still has to wait the next 24 h to exclude the complex febrile seizure. Only after 24 h without the next seizure can the process end in the state Patient healthy.
– The life cycle model in Fig. 4 also contains the possible state of MRI Abnormality, which can follow the Complex Febrile Seizure if focal epilepsy is diagnosed. This state can be progressively worsened by subsequent seizure incidents. This possibility is modeled by the self-transition, representing the causal view of the cyclic associations between the concepts MRI Abnormality and MRI Abnormality Precondition in the conceptual model in Fig. 3. In the process model in Fig. 5 this information is reflected in the unconditional

Information Models of Dynamics in Healthcare


activity Magnetic Resonance Investigation after the complex febrile seizure is diagnosed. According to the result of the investigation, the process then either continues with the treatment of the diagnosed focal epilepsy or returns to the 24-hour waiting in order to exclude the complex febrile seizure. If the seizure comes repeatedly even when focal epilepsy is excluded by MRI, then the process ends in the state of persisting complex febrile seizure, as there is no further generally recommended medical action. This possible situation follows from the circular relationship between the states Unclassified Febrile Seizure and Complex Febrile Seizure in the life cycle, which may result, under the relevant combination of conditions, in the persisting complex febrile seizure.

As follows from the previous paragraphs, the event is a common denominator of both basic types of dynamics in healthcare: the causality of disease and the clinical process. The same events which occur in the life cycle as causes of the transitions of states play in the clinical process the roles of stimuli for the relevant activities focused on the process goal. A clinical process modelled this way is then determinately related to the general causality of the Real World relevant to the goal of the process. This approach to the conception of clinical processes and their IT support can also significantly contribute to the needed integration of naturally related but unfortunately still not integrated fields of eHealth activities, namely the field of Evidence-based Healthcare (EBHC) and the field of medical ontology engineering. Medical ontologies can be enriched with models of life cycles of essential medical concepts which are regular subjects of their interest. Such enrichment shifts current ontological models from the position of a merely static modality model closer to the position of a fully-fledged causality model, which can then be directly used as an exact basis for the design of clinical processes. Clinical processes designed in this way might then be closely bound to the Real World causality recognized in medical research and exactly defined in formal informatics models.

Acknowledgement. The work presented in this paper was processed with the financial contribution of the long-term institutional support of research activities by the Faculty of Informatics and Statistics, University of Economics, Prague.


Effects of External Conditions to Chaotic Properties of Human Stability

Radek Halfar1, Martina Litschmannová2, and Martin Černý1

1 Department of Cybernetics and Biomedical Engineering, VSB – Technical University of Ostrava, 17. listopadu 15/2172, 708 33 Ostrava, Czech Republic
{radek.halfar,martin.cerny}@vsb.cz
2 Department of Applied Mathematics, VSB – Technical University of Ostrava, 17. listopadu 15/2172, 708 33 Ostrava, Czech Republic
[email protected]

Abstract. Maintaining balance is one of the main tasks that the human body needs to perform. This ability is crucial for performing daily activities, and its examination can be used for the prediction of falls in elderly people. It is therefore important to analyse and understand how the body achieves this state. In this paper, data obtained from a force platform were examined. The investigated experiment consists of steady standing on a force plate for 60 s under different conditions of vision and surface. From these data, maximal Lyapunov exponents were calculated and the effects of the test conditions were analysed.

Keywords: Posturography · Maximal Lyapunov exponent · Chaos · Statistical analysis

1 Introduction

Maintaining balance is the result of the coordination of many systems of the human body (muscles, vision, the neural system, etc.). Thanks to this coordination we are able to perform everyday tasks and be independent human beings. When this coordination is disrupted (e.g. by age-related disabilities or by illness), it can lead to impaired balance and an increased risk of falling. For these reasons, examining and understanding this process is the goal of many researchers. The most common way to examine balance is posturography. During a posturography test, the patient stands on a force plate that is able to detect tiny oscillations of the body. These data are most commonly evaluated in the form of centre of pressure (CoP) displacement. There are many possible ways to investigate these data, and researchers are still searching for the best analysis techniques. One example is the work carried out by Javaid et al. [1]. In this article, the researchers propose a set of features for handling CoP data and for discriminating between young and elderly subjects from these data. The paper written by Malik


and Lai [2] deals with time series clustering techniques for CoP. In this work, partitioning clustering with the Dynamic Time Warping measure, Permutation Distribution Clustering, and k-means for longitudinal data are studied as clustering techniques for CoP time series. In the work by Montesinos et al. [3], entropy measures of CoP time series are studied using different parameter values. The authors describe the effect of the embedding dimension (m) and tolerance (r) on the calculation of Approximate Entropy and Sample Entropy. On the other hand, the maximal Lyapunov exponent is a well-established tool used in many studies dealing with the diagnosis of various diseases. One of these studies is the paper by Pavlov et al. [4], in which the maximal Lyapunov exponent calculated from RR sequences is used for the diagnosis of cardiovascular disease. Another example is the paper written by Übeyli and Güler [5], in which statistics over Lyapunov exponents are computed for feature extraction in electroencephalography. In the work carried out by Alonso et al. [6], the maximal Lyapunov exponent is used as a feature in an artificial neural network to distinguish healthy and pathological ECG signals. The investigation of human balance, as well as the use of the maximal Lyapunov exponent as a diagnostic tool to distinguish between physiological and pathological data, is a frequently examined problem these days. Nevertheless, for proper data analysis, the researcher must ensure that the data acquisition process is not impaired by the effects of external conditions on the measuring system, and must design the experiment so as to inhibit these undesirable effects as much as possible. In order to determine these unfavourable external conditions and help researchers with proper experiment design, the possible effects of external conditions are evaluated in this paper. Data acquired during static posturography (forces F, moments of forces MoF, and centre of pressure CoP) are investigated for the presence of chaos under different conditions of vision (open and closed) and surface (firm and foam). This analysis is performed by computing the maximal Lyapunov exponent and by subsequent statistical analysis. Data are described by standard summary statistics for quantitative variables and visualized by histograms and boxplots. For the determination of outliers, inner fences are used. For statistical inference, the Wilcoxon signed-rank test is used; the assumption of normality is tested by the Shapiro-Wilk test. All hypothesis tests used in this study are performed at the 5% significance level. The statistical analysis performed in this study is computed in R, designed by the R Core Team [7], using the packages ggpubr proposed by Kassambara [8], moments introduced by Komsta and Novomestky [9], dplyr [10] and tidyr [11] proposed by Wickham et al., and ggplot2 introduced by Wickham [12].

2 Materials and Methods

2.1 Dataset

In this paper, the publicly available dataset from the PhysioNet website [13] is used. More specifically, it is the dataset called the Human Balance Evaluation Database (HBEDB), provided by Santos and Duarte [14]. The HBEDB contains data

recorded using a force platform from subjects performing posturography tests. These tests consist of steady standing for 1 min on the force plate under 4 different conditions:
– Eyes open on firm surface.
– Eyes open on foam surface.
– Eyes closed on firm surface.
– Eyes closed on foam surface.
Each condition is tested three times and the total number of test subjects is 163. The overall number of records in the dataset is 1930. The data are sampled at 100 Hz. Next, a low-pass filter with a cut-off frequency of 10 Hz is applied (a minimal filtering sketch is given below). Each recording includes 8 channels with 3 different recording types:
– Force (N) measured in x, y and z.
– Moment of forces (Nm) measured in x, y, and z.
– Center of pressure (cm) calculated in x and y.
More information about this dataset can be obtained from [14].
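The filtering step can be written in a few lines. Note that the original processing was done in R; the Python sketch below is only an illustration, and the filter type and order are assumptions, since the paper does not state them.

```python
# Minimal sketch (not from the paper): zero-phase 10 Hz low-pass filtering of one
# HBEDB channel sampled at 100 Hz. A 4th-order Butterworth filter with filtfilt
# is assumed here purely for illustration.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 100.0      # sampling frequency of the HBEDB recordings [Hz]
CUTOFF = 10.0   # cut-off frequency [Hz]

def lowpass(signal, fs=FS, cutoff=CUTOFF, order=4):
    """Zero-phase Butterworth low-pass filter."""
    b, a = butter(order, cutoff / (fs / 2.0), btype="low")
    return filtfilt(b, a, signal)

# Example with a synthetic 60 s channel (6000 samples)
raw = np.random.randn(6000)
filtered = lowpass(raw)
```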

2.2 Lyapunov Exponent

The Lyapunov exponent is a quantity that can be used to analyse the divergence of trajectories in phase space that differ in their initial condition by a small amount δZ0. Under the assumption that the divergence can be treated in the linearized approximation, the rate of divergence is given by

|δZ(t)| ≈ e^(λt) |δZ0|,

where λ is the Lyapunov exponent. Since the rate of divergence can be different for different orientations of the initial separation vector, there exists a whole spectrum of Lyapunov exponents, whose number is equal to the number of dimensions of the phase space. The largest value in this spectrum is commonly called the maximal Lyapunov exponent. This number determines the predictability of a dynamical system. If further assumptions are fulfilled (such as compactness of the phase space), a positive maximal Lyapunov exponent is an indication of deterministic chaos in the dynamical system. The maximal Lyapunov exponent is given by

λmax = lim_(t→∞) lim_(δZ0→0) (1/t) ln( |δZ(t)| / |δZ0| ).
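As a quick illustration of what the exponent measures (this example is not part of the study), the sketch below estimates λ for the logistic map x_{n+1} = 4 x_n (1 − x_n), whose exact value is ln 2 ≈ 0.693. For a one-dimensional map, the exponent reduces to the orbit average of ln|f′(x_n)|.

```python
# Illustrative only: Lyapunov exponent of the logistic map x_{n+1} = r*x*(1 - x)
# with r = 4, where the exact value is ln(2) ≈ 0.693.
import numpy as np

def logistic_lyapunov(r=4.0, x0=0.3, n_transient=1000, n_iter=100_000):
    x = x0
    for _ in range(n_transient):        # discard the transient
        x = r * x * (1.0 - x)
    acc = 0.0
    for _ in range(n_iter):
        acc += np.log(abs(r * (1.0 - 2.0 * x)))  # ln|f'(x)|, f'(x) = r(1 - 2x)
        x = r * x * (1.0 - x)
    return acc / n_iter

print(logistic_lyapunov())  # ≈ 0.693
```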

2.3 Maximal Lyapunov Exponent Calculation

In order to analyze the system measured in multiple dimensions as a whole unit, new variables are computed from the data contained in the investigated dataset.


These variables (force F, moment of force MoF and centre of pressure CoP) are calculated as the modulus over the axes of the measured system (see Eq. (1)):

F = sqrt(F_X^2 + F_Y^2 + F_Z^2)
MoF = sqrt(MoF_X^2 + MoF_Y^2 + MoF_Z^2)    (1)
CoP = sqrt(CoP_X^2 + CoP_Y^2 + CoP_Z^2)

From these variables, the maximal Lyapunov exponent is calculated. These calculations are performed in R [7] using the nonlinearTseries package proposed by Garcia and Sawitzki [15]. In this package, the maximal Lyapunov exponent is estimated using Takens' vectors. Next, for each tested subject, a value representing λmax for the investigated system (F, MoF, CoP) is assigned. This value is calculated as the mean of all λmax values computed from the F, MoF, and CoP signals generated by the tested subject during all performed tests.
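The per-subject pipeline can be sketched as follows. The study itself used the R package nonlinearTseries; the Python code below is only an illustrative, Rosenstein-style stand-in built on Takens' delay embedding, and the embedding dimension, delay, Theiler window and fit range are assumed values, not those used in the paper.

```python
# Sketch of the per-subject pipeline: combine the axes into one signal (Eq. 1),
# estimate the maximal Lyapunov exponent from the average log-divergence of
# nearest-neighbour trajectories, and average over a subject's trials.
import numpy as np
from scipy.spatial.distance import cdist

def modulus(x, y, z):
    """Combine the per-axis channels into a single signal (Eq. 1)."""
    return np.sqrt(x**2 + y**2 + z**2)

def max_lyapunov(signal, emb_dim=6, delay=10, theiler=50, n_steps=100,
                 fs=100.0, fit_points=30):
    """Rosenstein-style estimate of the maximal Lyapunov exponent (in 1/s)."""
    n = len(signal) - (emb_dim - 1) * delay
    # Takens' delay embedding: shape (n, emb_dim)
    emb = np.column_stack([signal[i * delay:i * delay + n] for i in range(emb_dim)])
    usable = n - n_steps
    # nearest neighbour of each point, excluding temporally close points (Theiler window)
    d = cdist(emb[:usable], emb[:usable])
    idx = np.arange(usable)
    d[np.abs(idx[:, None] - idx[None, :]) <= theiler] = np.inf
    nn = d.argmin(axis=1)
    # average logarithmic divergence curve
    div = np.empty(n_steps)
    for k in range(n_steps):
        dist = np.linalg.norm(emb[idx + k] - emb[nn + k], axis=1)
        dist = dist[dist > 0]
        div[k] = np.mean(np.log(dist))
    # slope of the (assumed linear) initial part of the divergence curve
    t = np.arange(n_steps) / fs
    slope, _ = np.polyfit(t[:fit_points], div[:fit_points], 1)
    return slope

# λmax per subject = mean over all of that subject's trials (dummy random data here)
trials = [np.random.randn(2000, 3) for _ in range(12)]   # 12 trials, axes X, Y, Z
lams = [max_lyapunov(modulus(tr[:, 0], tr[:, 1], tr[:, 2])) for tr in trials]
subject_lambda = float(np.mean(lams))
print(subject_lambda)
```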

Fig. 1. Boxplots of λmax depicted according to the vision (panels: MoF, F, CoP; groups: Open, Closed)

A graphical representation of these calculated values can be seen in Figs. 1 and 2, which show boxplots of λmax according to the different vision and surface conditions. From this representation it can be seen that all calculated values are distributed around 0 and that λmax computed from F has more variability than the values computed from MoF and CoP. It can also be seen that the biggest difference in variability (between opposite test conditions) is in λmax calculated from F and grouped according to surface. This information is also summarized in Tables 1 and 2. Another fact worth noting is that all data contain outliers (see Figs. 1 and 2). For the subsequent analysis, these outliers are removed from the investigated dataset. After the outlier removal, another dataset reduction is performed: only data from subjects that took part in both comparative measurements of a tested condition (vision open/closed and surface firm/foam) are kept.

Fig. 2. Boxplots of λmax depicted according to the surface (panels: MoF, F, CoP; groups: Firm, Foam)

Table 1. Summary statistics of λmax according to the vision

Type                 MoF              F                CoP
Vision               Open    Closed   Open    Closed   Open    Closed
Frequency            163     163      163     163      163     163
Minimum              −2.54   −2.27    −6.95   −6.28    −2.48   −2.40
Lower Quartile       −0.42   −0.44    −1.85   −1.18    −0.44   −0.47
Median               0.02    0.02     −0.41   0.07     0.01    0.04
Upper Quartile       0.29    0.49     1.37    1.74     0.26    0.51
Maximum              2.25    2.35     7.14    7.92     2.27    2.67
Standard Deviation   0.65    0.74     2.64    2.33     0.63    0.76

Table 2. Summary statistics of λmax according to the surface

Type                 MoF              F                CoP
Surface              Firm    Foam     Firm    Foam     Firm    Foam
Frequency            163     160      163     160      163     160
Minimum              −1.64   −2.73    −5.42   −10.48   −1.63   −2.59
Lower Quartile       −0.26   −0.51    −1.26   −1.75    −0.26   −0.50
Median               0.04    0.01     0.02    0.12     0.03    0.02
Upper Quartile       0.40    0.42     1.19    1.83     0.39    0.41
Maximum              2.32    1.77     6.22    8.78     2.29    1.67
Standard Deviation   0.65    0.74     2.21    3.03     0.65    0.76

Therefore, for each investigated variable, the reduced dataset contains values of λmax only from those test subjects whose maximal Lyapunov exponents are available for both measurements of the tested condition (vision open/closed and surface firm/foam).
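The two reduction steps map onto a few lines of table manipulation. The pandas sketch below is illustrative only: the column names ("subject", "vision", "lambda_max") are assumptions, and inner fences are the usual [Q1 − 1.5·IQR, Q3 + 1.5·IQR] interval.

```python
# Sketch of the dataset reduction: inner-fence outlier removal followed by
# restriction to subjects with λmax available for both levels of a condition.
import pandas as pd

def remove_inner_fence_outliers(df, value_col="lambda_max"):
    q1, q3 = df[value_col].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return df[(df[value_col] >= lower) & (df[value_col] <= upper)]

def keep_paired_subjects(df, condition_col="vision"):
    # keep only subjects that still have a value for every level of the condition
    n_levels = df[condition_col].nunique()
    counts = df.groupby("subject")[condition_col].nunique()
    paired = counts[counts == n_levels].index
    return df[df["subject"].isin(paired)]

# Toy example with λmax values computed from the force signal F
df = pd.DataFrame({
    "subject":    [1, 1, 2, 2, 3],
    "vision":     ["open", "closed", "open", "closed", "open"],
    "lambda_max": [0.1, -0.2, 5.9, 0.3, -0.1],
})
cleaned = df.groupby("vision", group_keys=False).apply(remove_inner_fence_outliers)
reduced = keep_paired_subjects(cleaned)
```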

3 Statistical Analysis

3.1 Exploratory Data Analysis

Summary statistics of the above-mentioned dataset can be found in Tables 3 and 4. Data visualization can be seen in Figs. 3 and 4. From these tables it can be seen that the medians of λmax (except for the data calculated from F during open vision) are close to 0. Another fact worth mentioning is that all data have a negative minimum and a positive maximum value. This indicates that the nature of human stability (chaotic or regular) cannot be decided.

Table 3. Summary statistics of λmax according to the vision (data without outliers)

Type                 MoF (n = 154)    F (n = 153)      CoP (n = 153)
Vision               Open    Closed   Open    Closed   Open    Closed
Minimum              −1.39   −1.56    −5.71   −4.74    −1.39   −1.58
Lower Quartile       −0.41   −0.44    −1.77   −1.11    −0.42   −0.47
Median               0.02    −0.02    −0.41   0.13     0.01    −0.02
Upper Quartile       0.28    0.48     1.32    1.76     0.25    0.49
Maximum              1.32    1.56     5.78    5.55     1.17    1.47
Standard Deviation   0.56    0.67     2.37    2.08     0.53    0.67

Table 4. Summary statistics of λmax according to the surface (data without outliers)

Type                 MoF (n = 149)    F (n = 150)      CoP (n = 148)
Surface              Firm    Foam     Firm    Foam     Firm    Foam
Minimum              −1.21   −1.77    −4.69   −6.97    −1.22   −1.84
Lower Quartile       −0.27   −0.45    −1.21   −1.74    −0.27   −0.45
Median               0.02    0.04     −0.02   0.12     0.02    0.03
Upper Quartile       0.36    0.44     1.16    1.85     0.34    0.43
Maximum              1.30    1.77     4.70    6.64     1.26    1.67
Standard Deviation   0.52    0.68     1.90    2.89     0.52    0.70

From Tables 3 and 4 it can also be seen that λmax calculated from MoF and CoP has similar variability. The data computed from F not only have more variability than the values calculated from CoP and MoF, but also show the biggest difference in variability between the opposite test conditions (vision open/closed and surface firm/foam). The distribution of Lyapunov exponents according to condition (open/closed vision and firm/foam surface) can be seen in the histograms in Figs. 3 and 4.

Fig. 3. Histograms of λmax depicted according to the vision (panels: MoF, F, CoP; rows: Open, Closed)

Fig. 4. Histograms of λmax depicted according to the surface (panels: MoF, F, CoP; rows: Firm, Foam)

3.2 Paired Tests

Since the data are calculated pairwise on the same statistical units (test subjects), paired tests are used to investigate the effect of the different vision and surface conditions on λmax. The non-parametric Wilcoxon signed-rank test is used because the assumption of normality is rejected for the differences of λmax computed from F when the data are grouped by vision condition (normality is tested by the Shapiro-Wilk test, see Tables 5 and 6). In this test, the hypothesis that the differences between the pairs follow a symmetric distribution around zero is tested; in other words, whether μdif = 0. For this purpose, a dataset of differences between λmax values calculated from variables acquired under the opposite conditions is defined. From these results (see Tables 5 and 6 and Figs. 3 and 4) it can be seen that, except for the differences of λmax computed from F with data grouped by vision condition, all p-values are greater than 0.05. This implies that there is no statistically significant effect on the value of λmax, because the median of the differences of λmax is not significantly different from zero. The only statistically significant effect is found for F under the different vision conditions.
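The testing step corresponds directly to standard library calls. The paper performed these tests in R; the sketch below shows the equivalent SciPy calls on invented placeholder arrays of per-subject λmax values.

```python
# Sketch of the paired-test procedure: Shapiro-Wilk on the paired differences,
# then the Wilcoxon signed-rank test. The arrays are dummy placeholders for the
# per-subject λmax values under the two levels of one condition.
import numpy as np
from scipy.stats import shapiro, wilcoxon

rng = np.random.default_rng(0)
lam_open = rng.normal(0.0, 2.3, size=150)     # λmax from F, eyes open (dummy data)
lam_closed = rng.normal(0.4, 2.1, size=150)   # λmax from F, eyes closed (dummy data)

diff = lam_open - lam_closed
_, p_normality = shapiro(diff)                   # normality of the differences
_, p_wilcoxon = wilcoxon(lam_open, lam_closed)   # H0: differences symmetric around 0

alpha = 0.05
print(f"Shapiro-Wilk p = {p_normality:.3f}, Wilcoxon p = {p_wilcoxon:.3f}")
if p_wilcoxon < alpha:
    print("Significant effect of the condition on lambda_max")
```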


Table 5. Results of Shapiro-Wilk tests computed from differences of λmax and results (p-values) of Wilcoxon signed-rank test according to vision

Type   Shapiro-Wilk test (p-value)   Wilcoxon signed-rank test (p-value)
MoF    0.158                         0.888
F      0.027                         0.015
CoP    0.256                         0.873

Table 6. Results of Shapiro-Wilk tests computed from differences of λmax and results (p-values) of Wilcoxon signed-rank test according to the surface

Type   Shapiro-Wilk test (p-value)   Wilcoxon signed-rank test (p-value)
MoF    0.656                         0.330
F      0.157                         0.659
CoP    0.518                         0.561

4 Conclusion

In this work, the effects of different vision and surface conditions on the chaotic properties of variables acquired during posturography tests are investigated. For this purpose, the maximal Lyapunov exponents of the recorded force, the moment of forces, and the centre of pressure are computed. On these results, statistical analysis is performed using exploratory analysis and hypothesis testing. Using these techniques, a statistically significant effect of vision on the value of λmax calculated from the force is shown. These results suggest that missing visual information has a statistically significant effect on human balance (more precisely, on the dynamical properties of the forces during steady standing). This fact should be kept in mind by all researchers designing posturography experiments. All subjects involved in such tests should have an open and clear view, to ensure that the obtained results are not affected by this condition. For future work, it is planned to verify these results using a different algorithm for computing the maximal Lyapunov exponent and using different tools of nonlinear analysis, such as entropy calculation.

Acknowledgement. The work and the contributions were supported by the project SV4508811/2101 'Biomedical Engineering Systems XIV' and the internal grant agency of VSB – Technical University of Ostrava, Faculty of Electrical Engineering and Computer Science, Czech Republic, under the project no. SP2018/68. This study was also supported by the research project of the Czech Science Foundation (GACR) 2017 No. 17-03037S Investment evaluation of medical device development, run at the Faculty of Informatics and Management, University of Hradec Kralove, Czech Republic. This study was supported by the research project of the Technology Agency of the Czech Republic (TACR) ETA No. TL01000302 Medical devices development as an effective investment for public and private entities.


References

1. Javaid, A.Q., Gupta, R., Mihalidis, A., Etemad, S.A.: 2017 IEEE EMBS International Conference on Biomedical Health Informatics (BHI), pp. 453–456 (2017). https://doi.org/10.1109/BHI.2017.7897303
2. Malik, O.A., Lai, D.T.C.: In: Phon-Amnuaisuk, S., Ang, S.P., Lee, S.Y. (eds.) Multi-disciplinary Trends in Artificial Intelligence, pp. 112–125. Springer International Publishing, Cham (2017)
3. Montesinos, L., Castaldo, R., Pecchia, L.: In: Lhotska, L., Sukupova, L., Lacković, I., Ibbott, G.S. (eds.) World Congress on Medical Physics and Biomedical Engineering 2018, pp. 315–319. Springer, Singapore (2019)
4. Pavlov, A., Janson, N., Anishchenko, V., Gridnev, V.I., et al.: Chaos, Solitons & Fractals 11 (2000). https://doi.org/10.1016/S0960-0779(98)00212-4
5. Derya Übeyli, E., Güler, N.: (2019)
6. Alonso-Hernández, J.B., Barragán-Pulido, M.L., Travieso-González, C.M., Ferrer-Ballester, M.A., Plata-Pérez, R., Dutta, M.K., Singh, A.: 2018 5th International Conference on Signal Processing and Integrated Networks (SPIN), pp. 372–379 (2018). https://doi.org/10.1109/SPIN.2018.8474274
7. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2018). https://www.R-project.org/
8. Kassambara, A.: ggpubr: 'ggplot2' Based Publication Ready Plots (2018). https://CRAN.R-project.org/package=ggpubr. R package version 0.2
9. Komsta, L., Novomestky, F.: moments: Moments, cumulants, skewness, kurtosis and related tests (2015). https://CRAN.R-project.org/package=moments. R package version 0.14
10. Wickham, H., François, R., Henry, L., Müller, K.: dplyr: A Grammar of Data Manipulation (2018). https://CRAN.R-project.org/package=dplyr. R package version 0.7.7
11. Wickham, H., Henry, L.: tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions (2018). https://CRAN.R-project.org/package=tidyr. R package version 0.8.2
12. Wickham, H.: ggplot2: Elegant Graphics for Data Analysis. Springer, New York (2016). http://ggplot2.org
13. Goldberger, A.L., Amaral, L.A.N., Glass, L., Hausdorff, J.M., Ivanov, P.C., Mark, R.G., Mietus, J.E., Moody, G.B., Peng, C.K., Stanley, H.E.: Circulation 101(23), e215 (June 13, 2000). Circulation Electronic Pages: http://circ.ahajournals.org/content/101/23/e215.full; PMID: 10851218; https://doi.org/10.1161/01.CIR.101.23.e215
14. Santos, D.A., Duarte, M.: PeerJ 4, e2648 (2016). https://doi.org/10.7717/peerj.2648
15. Garcia, C.A.: nonlinearTseries: Nonlinear Time Series Analysis (2018). https://CRAN.R-project.org/package=nonlinearTseries. R package version 0.2.5
16. Bryant, P., Brown, R., Abarbanel, H.D.I.: Phys. Rev. Lett. 65, 1523 (1990). https://doi.org/10.1103/PhysRevLett.65.1523
17. Constantine, W., Percival, D.: fractal: A Fractal Time Series Modeling and Analysis Package (2017). https://CRAN.R-project.org/package=fractal. R package version 2.0-4


18. Acharya, U.R., Fujita, H., Adam, M., Sudarshan, V.K., Koh, J.E.: 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 000533–000538 (2016). https://doi.org/10.1109/SMC.2016.7844294
19. Wang, Y., Wang, K., Zhang, L.: 2017 32nd Youth Academic Annual Conference of Chinese Association of Automation (YAC), pp. 89–95 (2017). https://doi.org/10.1109/YAC.2017.7967384
20. Quang Dang Khoa, T., Huong, N., Vo, T.: Computational and Mathematical Methods in Medicine 2012, 847686 (2012). https://doi.org/10.1155/2012/847686

A Preliminary Evaluation of Transferring the Approach Avoidance Task into Virtual Reality

Tanja Joan Eiler1, Armin Grünewald1, Alla Machulska2, Tim Klucken2, Katharina Jahn3, Björn Niehaves3, Carl Friedrich Gethmann4, and Rainer Brück5

1 Medical Informatics and Microsystems Engineering, University of Siegen, 57076 Siegen, Germany
{tanja.eiler,armin.gruenewald}@uni-siegen.de
http://www.eti.uni-siegen.de/mim/
2 Department of Clinical Psychology, University of Siegen, 57076 Siegen, Germany
{alla.machulska,tim.klucken}@uni-sigen.de
3 Department of Information Systems, University of Siegen, 57076 Siegen, Germany
{katharina.jahn,bjoern.niehaves}@uni-sigen.de
4 Research College FoKoS, University of Siegen, 57076 Siegen, Germany
[email protected]
5 Life Science Faculty, University of Siegen, 57076 Siegen, Germany
[email protected]

Abstract. Following our previous study, a new demonstrator was developed, which transfers the Approach Avoidance Task (AAT) into virtual reality (VR) to support the therapy of substance dependency diseases. This was done in consideration of past evaluation results in order to further increase the effectiveness of the training, for instance by improving usability, functionalities and graphics. A study was carried out with twenty-five people who were then asked to complete a questionnaire focusing on presence, involvement, realism, the possibility to act, and quality of interface. The results show that the use of the Leap Motion sensor as an interaction variant, compared to using controllers, performed best in most areas. Due to the simplified interaction and lower susceptibility to errors, this interaction variant, in combination with the new feature "simplified object interaction", is to be preferred when calculating the cognitive bias.

Keywords: Addiction · Approach avoidance task · AAT · Approach bias · Cognitive bias · Cognitive bias modification · CBM · Dual process model · Embodiment · Game design · Immersion · Presence · Smoking · Therapy · Virtual reality · VR


1 Introduction and Motivation

About 29% of German adults and 7.2% of adolescents between twelve and seventeen years are smokers, which makes nicotine the most consumed addictive substance, apart from alcohol. In Germany, it is estimated that 121,000 people die each year as a result of tobacco consumption, most of them because of cancer or cardiovascular diseases [7]. Only one out of four smokers manages to stop smoking for more than six months, which indicates an alarmingly high relapse rate [5,12]. Our study's aim is to find innovative methods that support the therapy of substance dependence diseases with the help of digital medicine [9], in this case virtual reality (VR) and mobile applications. This should not only improve the accessibility, but also the motivation of the patients and thus the success rate of therapies. We chose the Approach Avoidance Task (AAT, [25,35]) as the base therapy procedure, as several studies have shown that AAT can not only be used to measure the cognitive bias, but also to modify it in order to counteract addiction [15]. This approach is known as cognitive bias modification (CBM). Cognitive biases are responsible for the selective processing of stimuli in the environment, and thus influence the emotions and motivation of the viewer [22]. However, the dropout rate of CBM is relatively high [2,21,27], which makes long-term successes rather mediocre. Based on these facts, we already implemented a first VR application [8], which was used to find out how effectively the AAT process can be transferred into virtual reality. Usability was examined, but also whether reasonable values could be achieved in the calculation of the cognitive bias. Based on the results and insights gained from this, we have developed a new demonstrator that offers new features, fixes several issues discovered during the previous study and should further contribute to increased effectiveness and motivation through immersion, body-ownership and game design elements.

2 Theoretical Background

2.1 Dual Process Model of Addiction

Dual process models [6,13,38] assume that two qualitatively different mental processes mediate between input (stimuli) and output (behavior): Reflective processes, which result in conscious actions, and impulsive processes, which lead to automatic behavior. In case of addiction, these two processes, which are normally balanced, become imbalanced, whereby the impulsive information processing gains more dominance and influence on the behavior. Reflective processes therefore have a minor influence on addictive behavior, which means that control over it is increasingly lost. Addictive stimuli trigger a strong craving, which is succumbed to, usually resulting in an automatic approach and consumption of the substance.


This condition can be characterized by the development of cognitive biases [22]. One of them is the approach bias [20,37], which reflects automatic approach tendencies towards stimuli. This is particularly important in addiction disorders, as it is associated with both a psychological and a physical approach. One method, which can not only measure but also modify this, is the Approach Avoidance Task.

Fig. 1. Idea of the approach avoidance task [8]

2.2 Approach Avoidance Task

The Approach Avoidance Task (AAT) is a procedure developed by Solarz [35] and further developed for therapeutic purposes by Rinck and Becker [25]. In this method, participants are shown images with a certain distinguishing feature on a computer screen, to which they should react as fast as possible using a joystick. For example, all images tilted to the left are pulled and become larger, and all images tilted to the right are pushed away and shrink in size (see Fig. 1). During training, every addiction-related stimulus must be pushed, while the positive/neutral ones are pulled, by which the test persons should learn to avoid stimuli related to their addiction and approach the positive/neutral ones. For the bias measurement, however, 50% of the addiction-related images are pulled, and the other 50% are pushed. The same applies to the positive/neutral stimuli. In addition, the reaction times (RTs) for each stimulus are measured and evaluated to determine the approach bias. The calculation for each stimulus type is as follows:

Bias = MedianRT[PUSH] − MedianRT[PULL]    (1)

A positive value implies an approaching behavior towards the stimulus type, whereas a negative value represents an averting behavior. People with an addictive disorder should therefore receive a higher value for addiction-related stimuli than people who are not affected.
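To make the calculation concrete, the sketch below applies Eq. (1) to per-trial reaction times; the trial records and the "smoking"/"neutral" labels are invented example data, not values from the study.

```python
# Sketch of the approach-bias calculation from Eq. (1): for each stimulus type,
# bias = median RT of PUSH trials minus median RT of PULL trials.
# The trial records below are invented example data.
from statistics import median

trials = [
    # (stimulus_type, movement, reaction_time_ms)
    ("smoking", "push", 612), ("smoking", "pull", 655),
    ("smoking", "push", 590), ("smoking", "pull", 640),
    ("neutral", "push", 640), ("neutral", "pull", 602),
    ("neutral", "push", 628), ("neutral", "pull", 611),
]

def approach_bias(trials, stimulus_type):
    push = [rt for s, m, rt in trials if s == stimulus_type and m == "push"]
    pull = [rt for s, m, rt in trials if s == stimulus_type and m == "pull"]
    return median(push) - median(pull)

print(approach_bias(trials, "smoking"))  # negative here: avoidance tendency
print(approach_bias(trials, "neutral"))  # positive here: approach tendency
```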

2.3 Virtual Reality

Virtual Reality (VR) is understood as a computer-generated three-dimensional alternative reality in which the users can interact and move [32,33]. This immersive virtual environment (VE) can be used for many applications, e.g. in medical or military context, psychology, education, or entertainment [10]. Presence is understood as the feeling of actually being in the virtual world, of interacting in and with it, and of adapting one’s own behavior to it. While presence describes a psychological phenomenon, immersion expresses the feeling of being physically in another world than the real one [29]. Immersion can be achieved through visual, auditive, or haptic feedback and can increase the effectiveness of interactions inside the VE by leading the users to lose their sense of time, and allowing them to focus better on the application or given tasks [1,31]. Game design elements can further increase this effect, which means that users can be motivated to interact with the system in a more targeted manner [26]. Further factors that can contribute to immersion and cognitive absorption are embodiment and body-ownership, describing the feeling that artificial or virtual body parts belong to one’s own physical body [16,34]. Possible technologies for this are data gloves, or the Leap Motion infrared sensor [17]. The Leap Motion device consists of three infrared LEDs and two cameras, which track infrared light with a wavelength of 850 nm. The visual range is about 80 cm above the device, limited by LED light propagation through space. The signals captured by the binocular cameras are sent into the local memory via USB controller, and then streamed to the tracking software, where advanced algorithms are applied to the raw sensor data. After the images are analyzed, a three-dimensional representation of what the device sees is reconstructed, and the tracking layer matches the data to extract tracking information like fingers [4].

3 Related Work

Schroeder et al. [28] started a first attempt to transfer the CBM to VR. For this study, which was focused on the therapy of eating disorders, the aim was to measure how quickly the test subjects approach nutrients. The Oculus Rift DK2 Head-Mounted Display (HMD) and the Leap Motion sensor were used for the task, as the visible virtual hand should further increase body-ownership [34]. Once subjects have placed their preferred hand inside a predefined area and held their head oriented centrally for 1 s, an object appears on the virtual table in front of them. This stimulus shows either a nourishment or a ball, which should be rejected by a defensive hand movement, or grabbed and collected inside a box. The results showed that food objects, especially with increasing body mass index (BMI), were collected significantly faster than ball objects. In summary, Schroeder et al. came to the conclusion, that CBM training within a VE can be helpful for detecting and treating addiction disorders.


Further studies on the therapy of addiction disorders in VR were conducted by Lee et al. [18], and Girard et al. [11]. These studies investigated the cue exposure therapy (CET), which is designed to expose subjects to addictive stimuli until their tolerance towards them has increased to such an extent, that they no longer react automatically, but can consciously decide on their actions again [24]. Due to the findings, that three-dimensional stimuli produce higher craving than two-dimensional images or neutral cues within VR [3,19], both studies worked exclusively with such. While the test subjects found themselves in a virtual bar, in which numerous smoking-related stimuli are present during the experiment by Lee et al., cigarettes were to be found and destroyed, or balls collected in the medieval VE created by Gorini and colleagues. Both came to the conclusion that therapy in VR shows effect, and that the addictive behavior is reduced. In addition, embodiment further increases the effect.

4 Design and Implementation

Based on the preliminary work for the project [8], the demonstrator presented here represents an extension and improvement compared to our previously developed demonstrator. The requirements have remained largely the same. As with the Desktop-AAT, stimuli should have a distinguishing feature that is not too dominant, since the stimulus should still be recognizable despite fast automatic actions. In addition, RTs of the test persons must be measured as accurately as possible, ideally to the millisecond. Presence, immersion, and body-ownership should be as strong as possible, but not distracting. Game design elements have been incorporated to achieve increased motivation, thereby reducing the rate of discontinuation of therapy.

4.1 Hard- and Software

The demonstrator is designed for use with the HTC Vive HMD, which has a resolution of 2160 × 1200 pixels and a refresh rate of 90 Hz [14]. In addition, the Leap Motion infrared sensor [17] is used to transfer the user's own hand movements into the VE, thereby enhancing body-ownership and presence. The Unity3D engine [36] was used for the implementation; therefore, the C# language was used for programming the scripts.

4.2 Concept and Implementation

After starting the demonstrator, a start screen will be shown, which offers the possibilities to start the training, go into the settings menu, or to view information about credits and licensing. This user-interface is visible exclusively for the test leaders, as only they should be able to make changes. The test subject won’t be distracted, the VE will be completely unaffected. Within the settings menu (see Fig. 2), the test leader can edit the name for the current test subject, the number of stimuli that will be shown, choose a


mode (training or bias measurement), starting level, control type (Leap Motion or HTC VIVE controllers), if only default or additional custom models should be used, and if the “simplified object interaction” should be activated. The latter can be used to help the subjects performing automatic actions easier, as only a push of the stimulus in the desired direction is necessary. This function was implemented because grabbing, especially small objects, can be rather difficult while using the Leap Motion sensor. These options are, besides the possibilities to restart the level or to load the next/previous level, accessible at any time during the training via an in-game menu, only visible to the test leader.

Fig. 2. Settings menu

If controllers were selected as interaction method, these are represented in the VE as white androgynous hands, which have a gripping animation. When using the Leap Motion sensor, the following representations can be selected at run time:
– capsule hands
– low poly hands
– male hands with forearm (white or skin texture)
– female hands with forearm (white or skin texture)

In both modes 50% of the stimuli are pushed, and 50% are pulled. In "training" mode, all stimuli pushed away are smoke-related and all stimuli pulled are neutral/positive. In the "bias" mode, on the other hand, half of the smoke-related stimuli are pulled and half of the neutral/positive ones pushed, while the other half is treated as in training. After starting the first level, the test subject will stand inside an office room with a table in the middle. A cardboard box stands in front of it, a garbage bin behind it. On the table is an instruction on how to start; the illustration depends on the selected control mode. If the controllers are used, the thumb-stick must


be pressed, if the Leap Motion sensor is active, a thumbs-up gesture must be made. There is also a clipboard on the right edge of the table, where it can be seen at any time which edge color is to be sorted into which container. Sounds of nature can be heard in the background as an ambient soundtrack. Once the training is started, a stimulus, which has either a red or blue border color, accompanied by a particle effect, appears in the middle of the table. Red objects have to be sorted into the garbage bin (PUSH), blue objects into the box (PULL, see Fig. 3). In this way, the arm movements required for the AAT procedure are maintained in three-dimensional space. If the stimulus is sorted incorrectly, the lighting inside the room turns red, and a negative sound is played back to give the test person feedback about the error. If, on the other hand, the stimulus is sorted correctly, a positive tone sounds, and one second later the next stimulus appears. If there was an error before, the light returns to its natural color.

Fig. 3. State right after starting the training with a thumbs-up gesture

RTs are recorded twice for each stimulus: the first records the elapsed time between the stimulus appearance and the first contact by the subject, the second the time required to handle the object correctly. To achieve a time measurement as accurate as possible, threads and the Stopwatch class [23] were used. RTs are stored in an external .csv-file, which contains the times (in milliseconds), the test person's name, the name and border color of each shown stimulus, and a note indicating whether it was handled incorrectly, as incorrect runs are excluded from the bias analysis.
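The two-timestamp idea can be illustrated outside of Unity with any monotonic high-resolution clock; the Python sketch below uses time.perf_counter as a rough analogue of the C# Stopwatch, and the two wait functions are hypothetical stand-ins for the actual VR interaction, not part of the demonstrator.

```python
# Illustration of the two per-stimulus time measurements using a monotonic,
# high-resolution clock. The wait_* functions are placeholder stubs that stand
# in for the real interaction events in the VE.
import random
import time

def wait_for_first_contact():
    time.sleep(random.uniform(0.4, 0.8))   # stand-in for the first touch of the stimulus

def wait_until_sorted():
    time.sleep(random.uniform(0.2, 0.5))   # stand-in for correctly sorting the stimulus

def measure_trial():
    t_appear = time.perf_counter()                               # stimulus appears
    wait_for_first_contact()
    rt_first_contact_ms = (time.perf_counter() - t_appear) * 1000.0
    wait_until_sorted()
    rt_handled_ms = (time.perf_counter() - t_appear) * 1000.0
    return rt_first_contact_ms, rt_handled_ms

print(measure_trial())
```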

5 Experimental Setup

5.1 Participants and Design

Twenty-five participants (seven females and eighteen males; mean age: 29.6 years, range: 22–44) took part in an evaluation, the aim of which was to find out


how good presence, involvement, realism, and the possibility to act are. Seven participants used the Leap Motion sensor without (LM−) and eight of them with "simplified object interaction" (LM+); ten test persons used the HTC Vive controllers (C) to interact within the VE. 4% have never used VR, 96% have used VR at least once, and 37.5% of those use it regularly. Each participant was shown twenty stimuli in training mode, to which they should react according to their border color. No instructions about the control were given; only the AAT principle was known to the participants. All further information should be taken from the instructions within the VE. After the test persons were introduced to the framework plot and AAT procedure, they signed a declaration of consent and were assigned to a group. After the participants put on the HTC Vive HMD, the application was started. Afterwards, all subjects were asked to complete a questionnaire, containing twenty-six questions, to determine how they perceived the program. Each question should be answered on a scale from −3 (fully disagree) to +3 (fully agree). This questionnaire combined the "Igroup Presence Questionnaire" (IPQ, [29,30]) and the "Presence Questionnaire" by Witmer and Singer [39]. In order to find out if side effects occurred during, or shortly after the run, and whether the instructions within the VE were sufficient, two additional questions were added.

5.2 Results

Results of the questionnaire (see Fig. 4) and some observations will be presented in the following: Overall, the general presence received good ratings (M = 1.75, SD = 0.65), whereby LM− had the best rating (M = 1.86, SD = 0.7). Spatial presence was evaluated quite balanced (C: M = 2.04, SD = 1.0; LM−: M = 2.14, SD = 0.9; LM+: M = 2.03, SD = 0.9), whereby also in this case LM− achieved the highest rating. This is probably due to the increased embodiment. In addition, compared to LM+, all stimuli that were interacted with had realistic physics and reacted to each interaction accordingly. Involvement, meaning to what extent the real world could be ignored, was judged rather moderately (M = −0.12, SD = 1.37), which may be because there were other work groups present in the test room. Interestingly, LM- performed much better here, despite the same conditions. This could also be due to the increased embodiment, and thus higher immersion and cognitive absorption. The acoustic aspects of the environment were perceived rather mediocre as well (M = 1.0, SD = 1.96), which might have the same reasons. Concerning experienced realism, a positive trend is clearly visible (M = 1.14, SD = 1.24). Again, the valuations of the three variants were quite balanced. The Quality of Interface was positively rated as well (M = 1.13, SD = 1.88). However, it should be noted that the controller performed significantly worse than the other two variants in the observed delay between action and reaction (C: M = 0.0, SD = 2.45; LM−: M = 0.14, SD = 2.12; LM+: M = 0.5, SD = 2.2), and in the perceived impairment in the performance of the task (C: M = 0.0,


SD = 1.63; LM−: M = 1.14, SD = 1.77; LM+: M = 1.87, SD = 1.13), whereby 0 means moderate, and a positive value less delay or distraction. Regarding the possibility to act, LM+ achieved the highest score (M = 2.08, SD = 0.78), compared to LM− (M = 1.38, SD = 1.16) and C (M = 1.13, SD = 1.07). As grasping small objects with the Leap Motion sensor is often problematic, this is probably the reason why LM+ got a better rating concerning the possibility to act and quality of interface. Since stimuli only needed to be pushed in order to be sorted, there were significantly fewer problems when interacting with the object and fulfilling the task. This reduced the error rate, and improved the captured RTs as well. Only one test person complained about side effects (mild nausea), the existing instructions were considered sufficient by most, but by no means all of the participants (M = 1.96, SD = 1.31).

Fig. 4. This graphic shows a summary of the average points given for each interaction variant (Controllers (C), Leap Motion sensor without (LM−) and with (LM+) activated “simplified object interaction”) in the different categories, as well as the total average

6 Conclusion and Future Work

Compared to the previous demonstrator, this one has received a more positive response from the testers. Presence is generally better pronounced, which will be partly due to improved graphics and a now present ambient background soundscape. Since the rating for presence in both Leap Motion variants is not too far apart, LM+ should be the preferred variant for RT measurements, as more precise values can be calculated for the cognitive bias.


We must find out how involvement and experienced realism can be increased. In future evaluations, care must be taken to ensure complete silence in the laboratory, or that noise-reducing headphones will be used. It may also be helpful to enlarge the VR area, so that participants can move around the entire office room. A further improvement of the graphics, e.g. through more realistic models or post processing, could also contribute to this. Other features to be implemented in the future include further improvements in the acquisition of RTs, especially with respect to the initial hand movement towards stimuli. This should be followed by a study to check whether reasonable RTs can be recorded and cognitive bias values calculated with this demonstrator as well. Furthermore, a whole-body motion (step forward or backward with simultaneous approaching or deflecting motion of the arms) is currently being implemented as an interaction option to compare it with the other variants. This can be extended to full-body tracking, and then connected to a 3D avatar to further increase immersion and embodiment. Additionally, new levels will be implemented, e.g. a pub where stimuli will be offered to the test subjects, and gamification elements, like a score or achievements, will be added. In the first quarter of 2019, a large-scale study will be started, which will use the demonstrator presented here. For this study, we are currently recruiting smokers who are willing to quit smoking and who consume at least five cigarettes a day. They will be accompanied over eight sessions, of which six will include VRAAT training. A placebo level is currently being developed, in which the stimuli are to be sorted into containers located to the left and right of the stimulus. At the first and last session, the cognitive bias will be measured, to determine whether there has been an improvement in dependence behavior over the course of our study. Besides that, an associated project partner, a regional clinic, will be responsible for the application of the VR software and the accompanying mobile app in social practice. The clinic has a high self-interest in expanding its competences in the field of addiction therapy by integrating the project solution into its own service spectrum. In addition, it must be examined whether this can achieve a further broad effect in patient care.

References 1. Agarwal, R., Karahanna, E.: Time flies when you’re having fun: cognitive absorption and beliefs about information technology usage. MIS Q. 24(4), 665 (2000). https://doi.org/10.2307/3250951 2. Beard, C., Weisberg, R.B., Primack, J.: Socially anxious primary care patients’ attitudes toward cognitive bias modification (CBM): a qualitative study. Behav. Cogn. Psychother. 40(5), 618–633 (2012). https://doi.org/10.1017/ S1352465811000671 3. Bordnick, P.S., Graap, K.M., Copp, H., Brooks, J., Ferrer, M., Logue, B.: Utilizing virtual reality to standardize nicotine craving research: a pilot study. Addict. Behav. 29(9), 1889–1894 (2004). https://doi.org/10.1016/j.addbeh.2004.06.008


4. Colgan, A.: How does the leap motion controller work? http://blog.leapmotion. com/hardware-to-software-how-does-the-leap-motion-controller-work/ (2014) 5. Cummings, K.M., Hyland, A.: Impact of nicotine replacement therapy on smoking behavior. Ann. Rev. Public Health 26, 583–599 (2005). https://doi.org/10.1146/ annurev.publhealth.26.021304.144501 6. Deutsch, R., Strack, F.: Reflective and impulsive determinants of addictive behavior. In: Wiers, R.W.H.J., Stacy, A.W. (eds.) Handbook of Implicit Cognition and Addiction, pp. 45–57. Sage Publications, Thousand Oaks, California (2006). https://doi.org/10.4135/9781412976237 7. Donath, C.: Drogen- und suchtbericht. https://www.drogenbeauftragte.de/ fileadmin/dateien-dba/Drogenbeauftragte/Drogen und Suchtbericht/pdf/DSB2018.pdf (2018). (in German) 8. Eiler, T.J., Gr¨ unewald, A., Br¨ uck, R.: Fighting substance dependency combining AAT therapy and virtual reality with game design elements. In: Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications-Volume 2: HUCAPP, pp. 28–37. SciTePress INSTICC (2019). ISBN: 978-989-758-354-4. https://doi.org/10.5220/ 0007362100280037 9. Elenko, E., Underwood, L., Zohar, D.: Defining digital medicine. Nat. Biotechnol. 33(5), 456–461 (2015). https://doi.org/10.1038/nbt.3222 10. Giraldi, G., Silva, R., Oliveira, J.: Introduction to virtual reality. LNCC Res. Rep. 6 (2003) 11. Girard, B., Turcotte, V., Bouchard, S., Girard, B.: Crushing virtual cigarettes reduces tobacco addiction and treatment discontinuation. Cyberpsychol. Behav. Impact Internet Multimed. Virtual Real. Behav. Soc. 12(5), 477–483 (2009). https://doi.org/10.1089/cpb.2009.0118 12. Hajek, P., Stead, L.F., West, R., Jarvis, M., Hartmann-Boyce, J., Lancaster, T.: Relapse prevention interventions for smoking cessation. Cochrane Database Syst. Rev. 8, CD003,999 (2013). https://doi.org/10.1002/14651858.CD003999.pub4 13. Hofmann, W., Friese, M., Strack, F.: Impulse and self-control from a dual-systems perspective. Perspect. Psychol. Sci. J. Assoc. Psychol. Sci. 4(2), 162–176 (2009). https://doi.org/10.1111/j.1745-6924.2009.01116.x 14. HTC Corporation: ViveTM — vive virtual reality system. https://www.vive.com/ us/product/vive-virtual-reality-system/ (2017) 15. Kakoschke, N., Kemps, E., Tiggemann, M.: Approach bias modification training and consumption: a review of the literature. Addict. Behav. 64, 21–28 (2017). https://doi.org/10.1016/j.addbeh.2016.08.007 16. Kilteni, K., Groten, R., Slater, M.: The sense of embodiment in virtual reality. Presence Teleoperators Virtual Environ. 21(4), 373–387 (2012) 17. Leap Motion: Leap motion: Reach into virtual reality with your bare hands. https://www.leapmotion.com (2018) 18. Lee, J., Lim, Y., Graham, S.J., Kim, G., Wiederhold, B.K., Wiederhold, M.D., Kim, I.Y., Kim, S.I.: Nicotine craving and cue exposure therapy by using virtual environments. Cyberpsychol. Behav. Impact Internet Multimed. Virtual Real. Behav. Soc. 7(6), 705–713 (2004). https://doi.org/10.1089/cpb.2004.7.705 19. Lee, J.H., Ku, J., Kim, K., Kim, B., Kim, I.Y., Yang, B.H., Kim, S.H., Wiederhold, B.K., Wiederhold, M.D., Park, D.W., Lim, Y., Kim, S.I.: Experimental application of virtual reality for nicotine craving through cue exposure. Cyberpsychol. Behav. Impact Internet Multimed. Virtual Real. Behav. Soc. 6(3), 275–280 (2003). https://doi.org/10.1089/109493103322011560


20. Machulska, A., Zlomuzica, A., Adolph, D., Rinck, M., Margraf, J.: A cigarette a day keeps the goodies away: smokers show automatic approach tendencies for smoking– but not for food-related stimuli. PloS One 10(2), e0116, 464 (2015). https://doi. org/10.1371/journal.pone.0116464 21. Machulska, A., Zlomuzica, A., Rinck, M., Assion, H.J., Margraf, J.: Approach bias modification in inpatient psychiatric smokers. J. Psychiatr. Res. 76, 44–51 (2016). https://doi.org/10.1016/j.jpsychires.2015.11.015 22. MacLeod, C., Mathews, A.: Cognitive bias modification approaches to anxiety. Ann. Rev. Clin. Psychol. 8, 189–217 (2012). https://doi.org/10.1146/annurevclinpsy-032511-143052 23. Microsoft: Stopwatch class. https://docs.microsoft.com/de-de/dotnet/api/system. diagnostics.stopwatch?view=netframework-4.7.2 (2018) 24. Murphy, K.: Cue exposure therapy: What the future holds. https://www.rehabs. com/pro-talk-articles/cue-exposure-therapy-what-the-future-holds/ (2014) 25. Rinck, M., Becker, E.S.: Approach and avoidance in fear of spiders. J. Behav. Ther. Exp. Psychiatry 38(2), 105–120 (2007). https://doi.org/10.1016/j.jbtep.2006.10. 001 26. Sailer, M., Hense, J.U., Mayr, S.K., Mandl, H.: How gamification motivates: an experimental study of the effects of specific game design elements on psychological need satisfaction. Comput. Hum. Behav 69, 371–380 (2017). https://doi.org/10. 1016/j.chb.2016.12.033 27. Schoenmakers, T.M., de Bruin, M., Lux, I.F.M., Goertz, A.G., van Kerkhof, D.H.A.T., Wiers, R.W.: Clinical effectiveness of attentional bias modification training in abstinent alcoholic patients. Drug Alcohol Depend. 109(1–3), 30–36 (2010). https://doi.org/10.1016/j.drugalcdep.2009.11.022 28. Schroeder, P.A., Lohmann, J., Butz, M.V., Plewnia, C.: Behavioral bias for food reflected in hand movements: a preliminary study with healthy subjects. Cyberpsychol. Behav. Soc. Netw. 19(2), 120–126 (2016). https://doi.org/10.1089/cyber. 2015.0311 29. Schubert, T., Friedmann, F., Regenbrecht, H.: Embodied presence in virtual environments. In: Paton, R., Neilson, I. (eds.) Visual Representations and Interpretations, pp. 269–278. Springer, London and s.l. (1999) 30. Schubert, T., Friedmann, F., Regenbrecht, H.: The experience of presence: factor analytic insights. Presence Teleoperators Virtual Environ. 10(3), 266–281 (2001). https://doi.org/10.1162/105474601300343603 31. Schultze, U.: Embodiment and presence in virtual worlds: a review. J. Inf. Technol. 25(4), 434–449 (2010). https://doi.org/10.1057/jit.2010.25 32. Sherman, W.R., Craig, A.B.: Understanding Virtual Reality: Interface, Application, and Design. Morgan Kaufmann Series in Computer Graphics and Geometric Modeling. Morgan Kaufmann, San Francisco, CA (2003) 33. Simpson, R.M., LaViola, J.J., Laidlaw, D.H., Forsberg, A.S., van Dam, A.: Immersive vr for scientific visualization: a progress report. IEEE Comput. Graph. Appl. 20(6), 26–52 (2000). https://doi.org/10.1109/38.888006 34. Slater, M., Perez-Marcos, D., Ehrsson, H.H., Sanchez-Vives, M.V.: Inducing illusory ownership of a virtual body. Front. Neurosci. 3(2), 214–220 (2009). https:// doi.org/10.3389/neuro.01.029.2009 35. Solarz, A.: J. Exp. Psychol. 59(4), 239–245 (1960). https://doi.org/10.1037/ h0047274 36. Unity Technologies: Unity. https://unity3d.com/ (2018)


37. Wiers, C.E., K¨ uhn, S., Javadi, A.H., Korucuoglu, O., Wiers, R.W., Walter, H., Gallinat, J., Bermpohl, F.: Automatic approach bias towards smoking cues is present in smokers but not in ex-smokers. Psychopharmacology 229(1), 187–197 (2013). https://doi.org/10.1007/s00213-013-3098-5 38. Wiers, R.W., Rinck, M., Dictus, M., van den Wildenberg, E.: Relatively strong automatic appetitive action-tendencies in male carriers of the OPRM1 G-allele. Genes Brain Behav. 8(1), 101–106 (2009). https://doi.org/10.1111/j.1601-183X. 2008.00454.x 39. Witmer, B.G., Singer, M.J.: Measuring presence in virtual environments: a presence questionnaire. Presence Teleoperators Virtual Environ. 7(3), 225–240 (1998). https://doi.org/10.1162/105474698565686

Data Mining Tools and Methods in Medical Applications

Convolutional Neural Networks in Speech Emotion Recognition – Time-Domain and Spectrogram-Based Approach

Bartłomiej Stasiak, Sławomir Opałka, Dominik Szajerman, and Adam Wojciechowski

Institute of Information Technology, Łódź University of Technology, ul. Wólczańska 215, 93-005 Łódź, Poland
{bartlomiej.stasiak,dominik.szajerman,adam.wojciechowski}@p.lodz.pl, [email protected]

Abstract. In this work a convolutional neural network is applied for classification of emotional speech. Two significantly different approaches to speech signal pre-processing are compared: traditional, based on the frequency spectrum, and time domain-based. In the first case, a mel-scale spectrogram of the sound signal is computed and used as a 2-dimensional input for the network, similarly as in image recognition tasks. In the second approach, the raw sound signal in the time domain is fed to the network. Despite the radically different form and content of the input data, the neural architecture is similar, with 2D convolutional layers in the first approach and 1D convolutional layers in the second one, and also identical fully-connected output layers in both approaches. We put emphasis on using practically the same number of trainable parameters in both networks, as well as the same size of input signal snippets used for training. The obtained results show that, under this setting, the frequency-based approach offers very little advantage over direct application of the raw sound signal. In both cases, the total accuracy of whole-file classification exceeded 93% for a dataset with three emotion types.

Keywords: Deep learning · Mel-frequency filter bank · Emotional speech · Emo-DB

1 Introduction

Recent development of deep learning tools and techniques is paving the way for changes in how artificial intelligence (AI) influences our everyday lives. Deep learning methods have already introduced a new quality in information technology, with numerous applications in a broad range of areas, including i.a. natural language processing (NLP), human-computer interaction (HCI), autonomous car control, automated trading, entertainment, virtual reality (VR), and medical diagnostics. In medicine, deep neural networks (DNNs) allow doctors to make more accurate and faster diagnoses as they basically can analyze much more


data than a human specialist. They prevent diseases by predicting their risks and make the treatment more individual. They enable researchers to study the genetic background of various medical conditions. Their ability to accurately analyze medical signals and images is invaluable, as they are able to find both linear and nonlinear relationships between input and output data, owing to the multi-layered neural architecture and appropriate optimization techniques. One of the key changes introduced by the modern approach to machine learning is a significant shift in input data preparation methods. More and more often, carefully selected hand-crafted features are replaced by raw data fed directly to the DNN input. We might say that DNNs learn from input data similarly to the brains of children who develop their knowledge about the environment by experiencing it just as it is. The inherent redundancy of many typical signals, such as natural images, is no longer a headache for data scientists armed with the convolutional neural network (CNN) paradigm and powerful GPU accelerators [1–4]. In this work we question yet another well-established view on input data preparation, regarding specifically the sound signal. As confirmed by decades-long practice, spectral representation of a sound is usually definitely superior to the raw time domain sequence-of-samples form in most problems involving signal content analysis, description and classification. This view has also been adopted for deep learning, where the sound signal is often represented by its spectrogram, processed by convolutional neural layers just like an ordinary image. However, an alternative approach is also possible – instead of two-dimensional convolution of the spectrogram images, the direct, time-domain input signal may be processed by one-dimensional convolutional layers constituting the initial part of the neural network architecture. Inspired by some recent works [5], demonstrating effectiveness of the latter approach, we decided to perform a comparison with the more traditional, i.e. spectral one in the task of speech emotion recognition. The remaining part of the paper is structured as follows: in the next section we present the recent research in the field of emotional speech analysis with deep learning tools, then we present our solution and describe in detail the two compared architectures; the last two sections contain experimental results, discussion and conclusion.

2 Previous Work

Speech emotion recognition (SER) has been studied for several decades as a specific task of speech analysis complementary to automatic speech recognition (ASR). The research in this area differs from ASR i.a. in specific needs with regard to data collection, as the speech samples must explicitly express emotions (usually 2–7 emotions), which may be natural, elicited or simulated by professional actors [6–10]. Emotion recognition in a speech signal often involves analysis of prosodic elements (energy, fundamental frequency, speech rate), vocal tract parameters (formant frequencies), and sequences of some general short-time spectral characteristics, e.g. Mel-frequency cepstral coefficients (MFCC) [11,12] or log-frequency power coefficients (LFPCs) [6]. Basically, many of the applied

tools and parameterization methods are otherwise quite similar to those used in ASR solutions. Similarly, the increased interest in deep learning, evident throughout the last decade, is also currently reflected in SER research, starting from the influential works on long short-term memory (LSTM) networks and restricted Boltzmann machine (RBM) application [13,14]. Apart from the LSTM networks, convolutional neural networks (CNN) are often applied in SER research to process speech signal spectrograms [15–18]. In [15] two separate CNNs are applied to extract features from speech spectrograms and text data, respectively. The feature vectors are then used to build an attention matrix, processed further by a third CNN to obtain the predicted emotion. A detailed study on the CNN properties (the kernel sizes in time and frequency domains) is presented in [17], where a multi-layer convolutional network is compared to several classifiers with fixed feature vectors (mel-frequency cepstral coefficients, MFCC), showing the superiority of the CNN with rectangular kernels. Convolutional neural networks are often used as a part of a bigger emotion classification system, responsible for the extraction of features which are then processed by an LSTM network playing the role of the classifier [19,20]. Acknowledging the importance and the influence of convolutional neural networks on the speech emotion recognition research area, in this work we present a comparative analysis of two distinct approaches to CNN application in SER. In order to evaluate the influence of the input data form – spectrogram-based [17,18] vs. time-domain [5] – we refrain from using a more complex architecture, leaving the classification task to the fully connected output layers of the network, as described in the next section.

3 Data Preprocessing and Neural Architecture

The source material used for our tests and analyses was taken from the wellknown Emo-DB dataset [7]. It is composed of recordings of 10 sentences in German, repeatedly pronounced several times with different emotional expression by 10 actors: five males and five females. Each sentence is recorded in strictly controlled acoustic conditions (anechoic chamber) with sampling frequency 16 kHz (16 bits per sample, mono). In our data preprocessing procedure we cut each recording, i.e. a single sentence with a given emotion to recognize, into short segments, each of which is an independent training example. This is preceded by removal of silence from the beginning and the end of the recording. We do not however repeat the silence detection after cutting, so some segments may actually correspond to pauses between words or the breath sound. After training, the testing procedure is also based on identically prepared short segments, but apart from reporting the recognition rates for each segment separately, we also collect all the segments from a single recording and decide on the final emotion class of this recording via the majority rule. The actual steps of the speech signal preprocessing procedure differ between both compared approaches, as detailed below.
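The whole-file decision rule described above can be summarized with a minimal sketch, shown below; the function and variable names are illustrative only and are not taken from the paper.

```python
import numpy as np

def whole_file_prediction(segment_probabilities):
    """Assign a class to a whole recording by majority vote over its segments.

    segment_probabilities: array of shape (n_segments, n_classes) holding the
    network (softmax) outputs for every segment cut from one recording.
    """
    # Each segment votes for its most probable class ...
    votes = np.argmax(segment_probabilities, axis=1)
    # ... and the recording receives the most frequent vote.
    counts = np.bincount(votes, minlength=segment_probabilities.shape[1])
    return int(np.argmax(counts))

# Example: 5 segments, 3 emotion classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.6, 0.3],
                  [0.8, 0.1, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.3, 0.5]])
print(whole_file_prediction(probs))  # -> 0 (three of five segments vote for class 0)
```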

3.1 Frequency Domain Approach (2D)

In this approach the sound signal is represented by its spectrogram defined as a sequence of magnitude spectra computed for consecutive frames of N samples each. We decided to set N = 512 which, considering the sampling frequency fs = 16 kHz, yielded spectral resolution of 31.25 Hz. Relatively small hop-size of 128 samples, resulting in 3/4 overlap between frames, was used to guarantee high time resolution. With these parameter values, five consecutive frames covered 1024 samples of the raw input signal (64 ms) and this was exactly the scope of a single segment being a training example for the neural network, as shown in Fig. 1. Each of the five consecutive frames was windowed with von Hann window and transformed to the frequency domain with a discrete Fourier transform. Afterwards, a mel-scale logarithmic frequency mapping was done to emphasize the low-frequency part of the spectrum. We used 205 filters spanning the range up to 8kHz. In this way a single training example was a spectrogram fragment represented by an image of size 205 × 5 (frequency × time; 1025 pixels in total) directly fed – after normalization – to the input of the first convolutional layer (Fig. 2).
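A minimal sketch of this segmentation, using librosa, is given below. The logarithmic amplitude compression and the normalization details are assumptions, as the paper only specifies the mel-scale frequency mapping and a final normalization step.

```python
import numpy as np
import librosa

SR, N_FFT, HOP, N_MELS, FRAMES = 16000, 512, 128, 205, 5

def spectrogram_segments(wav_path):
    """Cut a recording into 205 x 5 mel-spectrogram patches (one patch per training example)."""
    y, _ = librosa.load(wav_path, sr=SR, mono=True)
    # Magnitude spectrogram with a Hann window, mapped onto 205 mel bands up to 8 kHz
    mel = librosa.feature.melspectrogram(y=y, sr=SR, n_fft=N_FFT, hop_length=HOP,
                                         window='hann', n_mels=N_MELS, fmax=8000)
    mel = np.log1p(mel)  # logarithmic compression (assumed)
    # Five consecutive frames cover 1024 signal samples; stepping by five frames
    # reproduces the 640-sample example hop used for the raw-signal (1D) variant.
    patches = [mel[:, i:i + FRAMES]
               for i in range(0, mel.shape[1] - FRAMES + 1, FRAMES)]
    return np.stack(patches)[..., np.newaxis]  # shape: (n_examples, 205, 5, 1)
```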

Fig. 1. Input signal preprocessing (2D approach): 5 frames overlaid on the input signal

The neural network comprised six convolutional layers followed by three fully connected (dense) layers, as presented in Table 1. Rectified linear units (ReLU) were used as activation functions in all convolutional and dense layers. The only exception was the last layer with a softmax activation function, suitable for the classification task. After every second convolutional layer an average pooling layer was used. We also applied several dropout layers, with dropout probability of 0.1 and 0.2 (not shown in Table 1) in order to increase robustness and to limit generalization error of the network.
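For illustration, the following Keras sketch reproduces the layer shapes listed in Table 1; only the dropout probabilities (0.1 and 0.2) are stated in the text, so their placement here is an assumption.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_2d_model(n_classes=3):
    """Approximate 2D (spectrogram) architecture following Table 1."""
    return keras.Sequential([
        keras.Input(shape=(205, 5, 1)),
        layers.Conv2D(32, (5, 3), activation='relu'),
        layers.Conv2D(32, (7, 3), activation='relu'),
        layers.AveragePooling2D(pool_size=(3, 1)),
        layers.Dropout(0.1),                      # dropout placement assumed
        layers.Conv2D(32, (7, 1), activation='relu'),
        layers.Conv2D(32, (9, 1), activation='relu'),
        layers.AveragePooling2D(pool_size=(3, 1)),
        layers.Dropout(0.2),                      # dropout placement assumed
        layers.Conv2D(16, (7, 1), activation='relu'),
        layers.Conv2D(16, (7, 1), activation='relu'),
        layers.GlobalAveragePooling2D(),          # 16 values passed to the dense part
        layers.Dense(480, activation='relu'),
        layers.Dense(240, activation='relu'),
        layers.Dense(n_classes, activation='softmax'),
    ])
```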


Fig. 2. Input signal preprocessing (2D approach): the spectrogram (single input for the neural network marked with the black rectangle)

3.2 Time Domain Approach (1D)

For this approach exactly the same segment length was applied, but apart from normalization, no other preprocessing was done and the segments were directly fed to the network input. To be more precise, each segment comprised 1024 consecutive sound signal samples, with overlap of 384 samples (hop-size: 640 samples), which – as we should stress – strictly corresponds to the five overlapped frames used to compute a single spectrogram fragment in the frequency domain approach (2D) described above. In this way, despite the completely different form of the neural network input, the source data size was kept identical, both in terms of the range of samples in time domain and the number of elements in the neural network input.
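A short sketch of this raw-signal segmentation is given below; the per-segment normalization shown is an assumption, since the exact normalization used in the paper is not specified.

```python
import numpy as np

SEG_LEN, SEG_HOP = 1024, 640   # 64 ms segments with 384-sample overlap at 16 kHz

def waveform_segments(y):
    """Cut a (silence-trimmed) waveform into raw 1024-sample training examples."""
    if len(y) < SEG_LEN:
        return np.empty((0, SEG_LEN, 1))
    starts = range(0, len(y) - SEG_LEN + 1, SEG_HOP)
    segments = np.stack([y[s:s + SEG_LEN] for s in starts]).astype(np.float32)
    # Simple per-segment amplitude normalization (assumed)
    segments /= np.max(np.abs(segments), axis=1, keepdims=True) + 1e-9
    return segments[..., np.newaxis]   # shape: (n_segments, 1024, 1)
```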

Table 1. Neural architecture (2D)

Layer type                Output shape (freq × time × channels)  Neurons  Kernel (pool) size  Num. weights
Input                     205 × 5 × 1                            –        –                   –
Conv2D                    201 × 3 × 32                           32       5 × 3               512
Conv2D                    195 × 1 × 32                           32       7 × 3               21536
Average pooling2D         65 × 1 × 32                            –        (3 × 1)             –
Conv2D                    59 × 1 × 32                            32       7 × 1               7200
Conv2D                    51 × 1 × 32                            32       9 × 1               9248
Average pooling2D         17 × 1 × 32                            –        (3 × 1)             –
Conv2D                    11 × 1 × 16                            16       7 × 1               3600
Conv2D                    5 × 1 × 16                             16       7 × 1               1808
Global average pooling2D  16                                     –        (5 × 1)             –
Dense                     480                                    480      –                   8160
Dense                     240                                    240      –                   115440
Dense                     3                                      3        –                   723

The neural architecture used in this case was based on 1-dimensional convolutional layers, as shown in Table 2. It is worth noting that both neural architectures, although different due to the input data form, exhibit some important similarities, including the same number of layers of corresponding types, and a very similar total number of neural weights to adapt: 168,227 and 168,483 in the 2D and 1D case, respectively. We used the same activation function types (ReLU in all layers but the last one, and softmax in the last one) and the same number and locations of dropout layers with the same dropout probabilities. Most importantly, the fully-connected part (the last three layers) is identical in both networks, sharing also the same number of input values (16) received from the preceding convolutional part.
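The corresponding 1D network can be sketched in Keras in the same way; again, the dropout placement is an assumption.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_1d_model(n_classes=3):
    """Approximate 1D (raw waveform) architecture following Table 2."""
    return keras.Sequential([
        keras.Input(shape=(1024, 1)),
        layers.Conv1D(32, 7, activation='relu'),
        layers.Conv1D(32, 7, activation='relu'),
        layers.AveragePooling1D(2),
        layers.Dropout(0.1),                      # dropout placement assumed
        layers.Conv1D(32, 13, activation='relu'),
        layers.Conv1D(32, 13, activation='relu'),
        layers.AveragePooling1D(2),
        layers.Dropout(0.2),                      # dropout placement assumed
        layers.Conv1D(16, 13, activation='relu'),
        layers.Conv1D(16, 13, activation='relu'),
        layers.GlobalAveragePooling1D(),          # 16 values, identical dense part as in 2D
        layers.Dense(480, activation='relu'),
        layers.Dense(240, activation='relu'),
        layers.Dense(n_classes, activation='softmax'),
    ])
```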

4 Experimental Validation

Random weight initialization and arbitrary training set selection are among the main sources of unrepeatability and subjectiveness in machine learning research. In order to limit these factors, we decided to apply a stratified cross-validation scheme with multiple repetitions of the training process with the same data, at the cost of some reduction of the number of recordings included in the study. Out of the seven emotions available in the Emo-DB dataset, we selected three: Sadness, Neutral and Anger, represented by 62, 79, and 127 recordings, respectively. These 268 recordings were split into four parts with roughly the same proportions of individual emotions (16/20/32, 16/20/32, 15/20/32 and 15/19/31 recordings in each part, respectively). In every cross-validation fold,

Table 2. Neural architecture (1D)

Layer type                Output shape (time × channels)  Neurons  Kernel (pool) size  Num. weights
Input                     1024 × 1                        –        –                   –
Conv1D                    1018 × 32                       32       7                   256
Conv1D                    1012 × 32                       32       7                   7200
Average pooling1D         506 × 32                        –        (2)                 –
Conv1D                    494 × 32                        32       13                  13344
Conv1D                    482 × 32                        32       13                  13344
Average pooling1D         241 × 32                        –        (2)                 –
Conv1D                    229 × 16                        16       13                  6672
Conv1D                    217 × 16                        16       13                  3344
Global average pooling1D  16                              –        (217)               –
Dense                     480                             480      –                   8160
Dense                     240                             240      –                   115440
Dense                     3                               3        –                   723

one part was treated as the testing set, while the remaining three were used for training. Each of the four folds was repeated 10 times with the same training/testing data and the results were summed over all the folds and averaged over all the repetitions. The recordings were split into short segments (17861 in total), as detailed in the previous section. In every fold the segments from the part used for testing were mapped to the respective recordings (in order to allow for the implementation of the majority rule for voting, as mentioned above), while the training segments were all mixed, i.e. set in random order – different in every training epoch. Naturally, in each of the four parts complete recordings were included, so that no segment from any testing recording was ever used for training within a given fold. The speaker-to-fold assignment was less strict, although the overlap was very low, as shown in Table 3. In order to control the generalization error, some training segments (the last 25% of the training set) were always set aside and used as a validation set during training. For building and training both networks we used the Keras API with the TensorFlow backend engine. The Adam optimization algorithm with Nesterov momentum (Nadam) and a batch size equal to 21 was used to minimize categorical cross-entropy between the network output and the target. The target vectors encoded the true class of the input segment in a standard way, by three-element vectors: [0, 0, 1], [0, 1, 0] and [1, 0, 0] expected at the network output for Sadness, Neutral and Anger, respectively. A fixed number of training epochs (40)

was used, but the accuracy (defined as the proportion of correctly classified segments to the number of all the segments) obtained on the validation set was recorded after every epoch, and the best neural weights (maximizing this accuracy) were used in the testing phase.

Table 3. Individual speaker contribution

Speaker ID [7]  Part 1            Part 2            Part 3            Part 4
                Angr. Neutr. Sad  Angr. Neutr. Sad  Angr. Neutr. Sad  Angr. Neutr. Sad
03              14    11     7    0     0      0    0     0      0    0     0      0
08              12    9      9    0     1      0    0     0      0    0     0      0
09              6     0      0    7     9      4    0     0      0    0     0      0
10              0     0      0    10    4      3    0     0      0    0     0      0
11              0     0      0    11    6      7    0     3      0    0     0      0
12              0     0      0    4     0      2    8     4      2    0     0      0
13              0     0      0    0     0      0    12    9      5    0     0      0
14              0     0      0    0     0      0    12    4      8    4     3      2
15              0     0      0    0     0      0    0     0      0    13    11     4
16              0     0      0    0     0      0    0     0      0    14    5      9
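The training configuration described above (Nadam, categorical cross-entropy, batch size 21, 40 epochs, best-validation-accuracy weights retained) can be expressed as the following hedged Keras sketch; file names and the helper function are illustrative only.

```python
from tensorflow import keras

def train(model, x_train, y_train, x_val, y_val):
    """Illustrative training setup: Nadam optimizer, categorical cross-entropy
    on one-hot targets, batch size 21, 40 epochs, keeping the weights with the
    best validation accuracy."""
    model.compile(optimizer=keras.optimizers.Nadam(),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    best = keras.callbacks.ModelCheckpoint('best.weights.h5', monitor='val_accuracy',
                                           save_best_only=True, save_weights_only=True)
    model.fit(x_train, y_train, validation_data=(x_val, y_val),
              batch_size=21, epochs=40, callbacks=[best])
    model.load_weights('best.weights.h5')
    return model
```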

4.1 Results

The results with respect to individual segments from the testing set are presented in Tables 4 and 5 for the spectral and the time-domain approach, respectively. The next two tables (Tables 6 and 7) show the results of assigning the class label to whole testing recordings on the basis of the classification of their respective segments. In each table, all four cross-validation folds contribute to the presented figures, which means that the row sums correspond exactly to the number of recordings in each class. The non-integer values in the confusion matrices result from averaging over all ten repetitions for each of the folds. As for the execution time, in both approaches a single epoch lasted ca 5 s (4.91 s in the 2D case and 5.03 s in the 1D case), which yields ca 3'20'' for one training session.

Table 4. Segment-based confusion matrix – spectral approach (2D)

True class  Predicted class
            Anger    Neutral  Sadness
Anger       7089.60  329.90   51.50
Neutral     396.80   3125.20  807.00
Sadness     56.10    816.00   5188.90

Total segment accuracy 15403.70/17861 = 86.19%

4.2 Discussion

As may be observed, both approaches yielded very similar results (slightly better in the spectral case). The segment-based and whole-file results are coherent, with the Anger class being the easiest to recognize, while the two other classes are more prone to be confused with each other. The whole-file classification result is naturally significantly higher than in the segment-based case. Closer inspection of individual training repetitions revealed that in some cases 100% successful classification of the recordings was achieved (2 times for the 3rd fold in the 1D approach and 5 times for the 1st fold in the 2D approach). The overall result significantly exceeding 90% may be deemed very good if we consider that no long-term information was extracted from the analyzed speech signal. Emotional speech is strongly related to prosodic elements such as variability of the fundamental frequency or the energy contour observed in the course of the whole sentence. Using only the short (64 ms) segments as complete training examples probably impedes the classification significantly.

Table 5. Segment-based confusion matrix – time-domain approach (1D)

True class  Predicted class
            Anger    Neutral  Sadness
Anger       7046.50  302.90   121.60
Neutral     430.60   2823.10  1075.30
Sadness     99.10    799.60   5162.30

Total segment accuracy 15031.90/17861 = 84.15%

Table 6. Whole-file confusion matrix – spectral approach (2D)

True class  Predicted class
            Anger   Neutral  Sadness
Anger       125.80  1.20     0.00
Neutral     3.30    69.00    6.70
Sadness     0.00    3.50     58.50

Total whole-file accuracy 253.30/268 = 94.46%

Table 7. Whole-file confusion matrix – time-domain approach (1D)

True class  Predicted class
            Anger   Neutral  Sadness
Anger       125.10  1.60     0.30
Neutral     1.00    65.10    12.90
Sadness     0.00    2.50     59.50

Total whole-file accuracy 249.70/268 = 93.12%

This limitation was imposed deliberately, as were some assumptions regarding the spectrogram parameters we used. Let us stress that both the time and the frequency resolution (or, more precisely, the number of filters in the logarithmically transformed frequency domain and the time-domain segment overlap) could probably have been set much lower, while the time span of a single input "spectrogram image" could have been bigger. Our settings implied significant redundancy, with the number of elements in a single spectrogram fragment corresponding to the number of input samples of the signal used to generate this fragment (1025 vs 1024). However, our goal was to allow for a fair comparison of the influence of the input data representation on the classification capabilities of the network. The input size in the 2D approach was determined by the assumptions taken for the time-domain approach. In fact, we have also performed a test with even shorter segments in the time domain (320 samples), similarly as in [5], where the authors used the same database (Emo-DB) and the same three emotions for their experiments. We applied our 1D network architecture, shown in Table 2, obtaining segment-based and whole-file accuracy of 79.39% and 90.74%, respectively (Tables 8 and 9).

Table 8. Segment-based confusion matrix – short segments (320 samples, 1D)

True class  Predicted class
            Anger     Neutral  Sadness
Anger       13978.10  689.70   485.20
Neutral     1225.70   4665.10  2905.20
Sadness     453.80    1695.60  10074.60

Total segment accuracy 28717.80/36173 = 79.39%

Table 9. Whole-file confusion matrix – short segments (320 samples, 1D)

True class  Predicted class
            Anger   Neutral  Sadness
Anger       125.50  1.00     0.50
Neutral     1.90    57.90    19.20
Sadness     0.00    2.00     60.00

Total whole-file accuracy 243.40/268 = 90.74%

These results are a few percent lower than those obtained for the 1024-sample segments, but still satisfactory, considering the degree of input size reduction (from 64 ms down to 20 ms per training example). The authors of [5] report very similar segment-based accuracy of 77.51% with a significantly higher whole-file accuracy of 96.97%. The latter outcome may, however, result from a serendipitous distribution of the segments in the testing recordings, as the testing set in [5] was fixed (no cross-validation) and relatively small (33 recordings). It is worth noting here that in our test, the individual results for each of the cross-validation folds (each averaged over 10 repetitions) were 97.06%, 88.68%, 94.78% and 82.46%, respectively, which demonstrates quite significant variability of the classification difficulty among individual recordings and their subsets.

5 Conclusion and Future Works

In this paper a study regarding the feasibility of raw time-domain sound signal classification by a convolutional neural network was presented. The obtained results were very close to those obtained with a more traditional frequency-domain signal representation, assuming a similar structure and complexity of the neural network and the same size of the input training examples. This outcome may be attributed to the potential of convolutional neural networks and gradient optimization methods, enabling effective processing of raw input data and extraction of the significant information, without the explicit need for a time-to-frequency transformation. In fact, this type of transformation is probably simply learned by the initial convolutional layers of the 1D architecture in the training process, which leads to similar final results and, importantly, does not require additional time and effort for manual preprocessing of the input signal. The presented results are obtained on the basis of short segments of the input speech signal. Including more long-term information in the training and testing process should improve the classification potential of the neural network, which is the research area we are going to explore in future works.

References

1. Dean, J., Patterson, D., Young, C.: A new golden age in computer architecture: empowering the machine-learning revolution. IEEE Micro 38(2) (2018)
2. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 25, pp. 1097–1105. Curran Associates Inc. (2012)
3. Opalka, S., Stasiak, B., Szajerman, D., Wojciechowski, A.: Multi-channel convolutional neural networks architecture feeding for effective EEG mental tasks classification. Sensors 18(10), 3451 (2018)
4. Tarasiuk, P., Pryczek, M.: Geometric transformations embedded into convolutional neural networks. J. Appl. Comput. Sci. 24(3), 33–48 (2016)
5. Harár, P., Burget, R., Dutta, M.K.: Speech emotion recognition with deep learning. In: Proceedings of the 4th International Conference on Signal Processing and Integrated Networks (SPIN), pp. 137–140 (2017)
6. Ververidis, D., Kotropoulos, C.: Emotional speech recognition: resources, features, and methods. Speech Commun. 48(9), 1162–1181 (2006)


7. Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., Weiss, B.: A database of German emotional speech. In: Proceedings of INTERSPEECH 2005, Lisbon, Portugal, pp. 1517–1520 (2005)
8. Uhrin, D., Partila, P., Frnda, J., Sevcik, L., Voznak, M., Lin, J.C.-W.: Design of emotion recognition system. In: Proceedings of the 2nd Czech-China Scientific Conference 2016, pp. 53–63 (2017)
9. Kolakowska, A., Landowska, A., Szwoch, M., Szwoch, W., Wróbel, M.R.: Emotion recognition and its applications. In: Hippe, Z., Kulikowski, J., Mroczek, T., Wtorek, J. (eds.) Human-Computer Systems Interaction: Backgrounds and Applications 3. Advances in Intelligent Systems and Computing, vol. 300, pp. 51–62. Springer (2014)
10. Partila, P., Voznak, M.: Speech emotions recognition using 2-D neural classifier. In: Zelinka, I., Chen, G., Rössler, O., Snasel, V., Abraham, A. (eds.) Nostradamus 2013: Prediction, Modeling and Analysis of Complex Systems. Advances in Intelligent Systems and Computing, vol. 210, pp. 221–231. Springer, Heidelberg (2013)
11. Stasiak, B., Rychlicki-Kicior, K.: Fundamental frequency extraction in speech emotion recognition. In: Communications in Computer and Information Science, CCIS, vol. 287, pp. 292–303 (2012)
12. Pan, Y., Shen, P., Shen, L.: Speech emotion recognition using support vector machine. Int. J. Smart Home 6(2), 101–108 (2012)
13. Wöllmer, M., Eyben, F., Reiter, S., Schuller, B., Cox, C., Douglas-Cowie, E., Cowie, R.: Abandoning emotion classes – towards continuous emotion recognition with modeling of long-range dependencies. In: Proceedings of INTERSPEECH, Brisbane, Australia, ISCA, pp. 597–600 (2008)
14. Stuhlsatz, A., Meyer, C., Eyben, F., Zielke, T., Meier, G., Schuller, B.: Deep neural networks for acoustic emotion recognition: raising the benchmarks. In: Proceedings of ICASSP, Prague, Czech Republic, pp. 5688–5691. IEEE (2011)
15. Lee, C.W., Song, K.Y., Jeong, J., Choi, W.Y.: Convolutional attention networks for multimodal emotion recognition from speech and text data (2018). arXiv:805.06606
16. Mao, Q., Dong, M., Huang, Z., Zhan, Y.: Learning salient features for speech emotion recognition using convolutional neural networks. IEEE Trans. Multimed. 16(8), 2203–2213 (2014)
17. Badshah, A.M., Rahim, N., Ullah, N., Ahmad, J., Muhammad, K., Lee, M.Y., Kwon, S., Baik, S.W.: Deep features-based speech emotion recognition for smart affective services. Multimed. Tools Appl. (2017). https://doi.org/10.1007/s11042-017-5292-7
18. Weisskirchen, N., Böck, R., Wendemuth, A.: Recognition of emotional speech with convolutional neural networks by means of spectral estimates. In: 7th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos, ACII, pp. 50–55 (2017)
19. Jianfeng, Z., Xia, M., Lijiang, C.: Learning deep features to recognise speech emotion using merged deep CNN. IET Signal Process. 12(6), 713–721 (2018)
20. Zhang, L., Wang, L., Dang, J., Guo, L., Guan, H.: Convolutional neural network with spectrogram and perceptual features for speech emotion recognition. In: Cheng, L., Leung, A., Ozawa, S. (eds.) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol. 11304, pp. 62–71. Springer (2018)

Convolutional Neural Networks for Computer Aided Diagnosis of Interdental and Rustling Sigmatism

Andre Woloshuk (1), Michal Krecichwost (2), Zuzanna Miodonska (2), Dominika Korona (2), and Pawel Badura (2)

(1) Weldon School of Biomedical Engineering, Purdue University, 206 S Martin Jischke Dr, West Lafayette, IN 47907, USA
    [email protected]
(2) Faculty of Biomedical Engineering, Silesian University of Technology, Roosevelta 40, 41-800 Zabrze, Poland
    {michal.krecichwost,zuzanna.miodonska,pawel.badura}@polsl.pl, [email protected]

Abstract. Sigmatism (lisping), is the misarticulation of sibilant sounds. Multiple classes of sigmatism exist, and the treatment for each type differs. An automatic classifier may improve therapeutic options with the supervision of a speech therapist. A database containing 1188 multichannel recordings of children diagnosed as having normative pronunciation, interdental sigmatism, or rustling sigmatism was used to create visual representations of the spectrum, mel-filter bank energies (FBE), and mel-frequency cepstral coefficients. These images were used to train a convolutional neural network. The network achieved a binary accuracy of 97.67% using FBE images to distinguish between normative pronunciation and any other of the analyzed types of sigmatism.

Keywords: Computer-aided pronunciation evaluation · Sigmatism diagnosis · Convolutional neural network · Sibilants

1 Introduction

Sigmatism, or lisp, is a common speech disorder that appears in children at the preschool age and is defined as the misarticulation of sibilants. Sigmatism is diagnosed by a speech therapist by identifying anatomical flaws and functional inaccuracies during sibilant articulation, which results in incorrect sounds. Sigmatism is further divided into interdental, rustling, lateral, palatal, or many other types of lisp. In proper pronunciation, airflow from the lungs is directed between the upper and lower teeth by the tongue, resulting in high intensity sibilant noises with spectral peaks between 6500 and 10000 Hz [1]. In pathological cases, both the direction of airflow and the spectral peaks may be changed. Treatment usually begins at the phoneme level, where children produce single
sounds with or without the aid of tactile indicators in the mouth. The prevalence of speech disorders in children aged 3–17 is roughly 5%, and the prevalence of articulation disorders is even higher at 8–9% [2]. Children with different misarticulation types often face social stigma and exclusion [3–5], and a rapid, automatic diagnostic may expand and improve current therapeutic options. Machine learning classifiers are becoming a popular way to analyze and classify speech samples for different applications. Traditionally, Hidden Markov Models (HMM) [6,7] and Support Vector Machines (SVM) [8,9] have been combined with feature reduction techniques such as Linear Discriminant Analysis (LDA) [10] to achieve pronunciation pathology discrimination accuracies of up to 90% [8]. Similar methods have been employed for other problems, like voice pathology detection [11]. More recently, neural networks present an alternative to these machine learning algorithms [12]. However, the feature sets used in the literature differ considerably and affect the accuracy of classification. A promising alternative is using a convolutional neural network (CNN) in combination with time-frequency sound representations, which has been used for environmental sound classification [13], emotion classification [14], and music classification [15]. CNNs provide an advantage over traditional classifiers by potentially learning small changes in spectral distribution. However, they are limited by the number of training samples available. In images of objects, jitter is a commonly used tool to increase the number of training images and generalizability using techniques such as flipping, cropping, and rotating [16]. Based on the concept of jitter, variation was introduced in this experiment by using 15 microphones at slightly different positions around the subject’s mouth, which could potentially record pathological airflows or spectral peaks at off-center locations. The main objective of this study was to investigate the effectiveness of a convolutional neural network classifier on different types of time-frequency representations of children’s speech to aid speech therapists in lisp diagnosis. In order to achieve this objective, three different image types were compared in binary classification on recordings from fifteen spatially distributed microphones.

2 Materials and Methods

2.1 Database Description

A database was created in [17], which contains speech samples from 85 children aged 6–8 years prior to any speech therapy. It includes 1188 recordings—612 normative, 378 interdental, and 198 rustling recordings. For each child, a speech therapist provided a pathology description and diagnosis. The dictionary used during measurements consists of the following logatomes: ASA, ESE, ISI, OSO, USU, YSY, SAS, SES, SIS, SOS, SUS and SYS. These sequences are used to analyze the phoneme /s/ in various acoustic environments at the beginning, middle, and end of the logatome. A multichannel recording device was designed in [18]. The array of 15 directional microphones is characterized by a linear transmission characteristic in the frequency range of the speech signal. Recordings were carried out at 44.1 kHz
to capture the high frequency components of the logatomes. Manual segmentation was used to isolate the /s/ phoneme in each recording.

2.2 General Workflow

The general workflow of the proposed methodology is presented in Fig. 1.

Fig. 1. The general workflow of the methodology

Preprocessing and Feature Extraction. Based on the previously described database, feature extraction from individual segments was carried out. First, the recording was converted to a logarithmically scaled spectrogram from 4 to 22 kHz using a Hamming window of 25 ms and an overlap of 10 ms. Second, the recording was converted to the mel-cepstrum using a series of triangular filters, and the energy in each mel-scale filter bank was used to denote a pixel value in the Filter Bank Energy (FBE) image. The third image was a visual representation of the MFCC coefficients after using the discrete cosine transform (DCT) on the filter bank signal, where time is represented horizontally and the 13 MFCC coefficients determine the pixel values [6]. In order to unify the number of features describing individual speech segments, linear interpolation was used. As a result, each segment of the phoneme /s/ was composed of 23 frames. The result of this processing stage was the preparation of three different types of images. An example of each image type can be seen in Fig. 2. All images were converted to a grayscale image of size 32 × 32 pixels and the values were scaled to [−1, 1] for the purposes of CNN classification. The size is similar to the CIFAR10 dataset, an image classification standard, which has been used as a basis for training a network to detect lung nodes [19]. The average training image was also subtracted from all images prior to training and testing [20]. For each image type, there were a total of 17820 images (15 channels × 1188 recordings). These images were used to develop a CNN to distinguish between normative pronunciation, interdental, and rustling lisp types.

Convolutional Neural Network Design. The CNN structure can be seen in Fig. 3. Two different convolutions, 5 × 5 and 3 × 3 with a stride of 2, are used to identify smaller and larger trends in the spectral pattern. Batch normalization is a commonly used technique to improve model performance [21]. Max pooling layers with kernel sizes of 2 and 4 pixels, respectively, were used to downsample the images. A softmax layer was used in the classification layer. Training was completed using a batch size of 64 and a learning rate of 0.001 for 50 epochs. An equal number of training examples was used for each class.

Metrics and Analysis. The networks were trained using 10-fold cross-validation and repeated 10 times. Therefore, the sizes of the training sets were always 90% of the overall number of images (16038 elements), and test sets always consisted


Fig. 2. Examples of three different types of images from the center microphone of a normative pronunciation (top), interdental sigmatism (middle), and rustling sigmatism (bottom) recording. The horizontal axis represents time frames and the vertical axis represents frequency bins. The value of the pixel is determined by the feature (e.g. spectral power, FBE, or MFCC)

Fig. 3. CNN architecture using a 32 × 32 pixel grayscale input image with a 5 × 5 convolution to 32 filters and a 3 × 3 convolution to 64 filters. Number of filters and channel size are shown above each convolutional layer visualization. Batch normalization and max pooling were performed after both convolutions

of the remaining part of the dataset (1782 elements). Performance was evaluated using the network accuracy (ACC), sensitivity (TPR), and specificity (SPC), calculated according to the formulas:

– sensitivity:
  TPR = TP / (TP + FN) · 100%,  (1)

– specificity:
  SPC = TN / (TN + FP) · 100%,  (2)

– accuracy:
  ACC = (TP + TN) / (TP + FP + FN + TN) · 100%,  (3)

where TP – true positives, TN – true negatives, FP – false positives, and FN – false negatives. The statistical significance of the accuracy was evaluated using a Wilcoxon-Mann-Whitney test for non-normal data.
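For reference, Eqs. (1)–(3) translate directly into the following small Python helper; the example counts at the bottom are made up for illustration only.

```python
def classification_metrics(tp, tn, fp, fn):
    """Sensitivity (TPR), specificity (SPC) and accuracy (ACC) as in Eqs. (1)-(3), in percent."""
    tpr = tp / (tp + fn) * 100.0
    spc = tn / (tn + fp) * 100.0
    acc = (tp + tn) / (tp + fp + fn + tn) * 100.0
    return tpr, spc, acc

# Example with made-up counts
print(classification_metrics(tp=90, tn=85, fp=15, fn=10))  # -> (90.0, 85.0, 87.5)
```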

3 Results and Discussion

The CNN classifier was trained on different binary combinations to evaluate the discrimination abilities of different image types. In all cases, an equal number of images from both classes were used, i.e. the forward probability was 50%. The norm-all designation indicates that the classifier attempted to distinguish between normative recordings and a mix of interdental and rustling sigmatism recordings. Figure 4 shows a box plot representation of CNN accuracy for binary classifiers, and Table 1 shows the accuracy, sensitivity, and specificity of the CNN.

Table 1. Classifier performance on 3 different image types for 3 binary classification modes

Image type   Norm vs. All           Norm vs. Interdent.    Norm vs. Rustling
             ACC    TPR    SPC      ACC    TPR    SPC      ACC    TPR    SPC
FBE          97.67  98.81  91.13    93.89  96.38  96.22    96.97  98.39  98.68
MFCC         94.87  94.82  95.57    91.39  92.68  94.78    95.35  95.85  95.85
Spectrogram  89.13  92.26  86.60    88.45  97.89  70.62    87.11  94.42  82.50

The results in the binary performance table (Table 1) suggest that there was a statistically significant improvement in accuracy when using the filter bank energy images to distinguish normative vs rustling recordings. However, FBE images only performed as well as MFCC images when distinguishing between normative vs interdental or normative vs all pathologies. The spectrogram image achieved significantly worse accuracy than the other two image types for all binary comparisons. Additionally, the CNN trained on spectrogram images had a lower specificity than the CNNs trained on the other image types. From the binary performance boxplot (Fig. 4), it can be seen that the data variability is larger for normative vs interdental comparisons than for the other binary comparisons. One possible explanation for this phenomenon is that the vocal features of interdental sigmatism are quite similar to normative articulation, and the CNN is susceptible to variations across speakers and recordings. Additionally, the beginning and end of the phoneme segments are marked with coarticulation of the vowel sound and other interference, which may contribute to decreased performance [22,23].


Fig. 4. Classifier accuracy boxplot for binary classification for spectrogram, FBE, and MFCC images. FBE norm-all and FBE norm-rustling are significantly higher than the remaining classifiers (p < 0.05)

The high accuracy of FBE is potentially due to the complexity of learning linear transforms, since pre-processing with linear transforms may make the data more easily learned by the classifier. The fast Fourier transform (FFT) performed between the spectrogram and FBE images provides assistance to the learning algorithm, while the discrete cosine transform (DCT) between the FBE and MFCC images decorrelates the data and removes data that are highly nonlinear [24,25]. Therefore, it is beneficial for the classifier to use data that have been transformed using the FFT, but not the DCT. A study investigating automatic diagnosis of two types of laryngeal disease compared a variety of cepstral, pitch, frequency, pitch and amplitude perturbation, and other feature sets and achieved an accuracy of 84.6% and 95.1% for three-class and binary classification using a support vector machine (SVM), respectively [26]. Therefore, using CNN architectures in speech classification could be an alternative to classic recognition methods. Future experimentation can seek to address some of the limitations of this experiment as well as further characterize CNNs for visual representations of speech. Specifically, the input resolution and CNN depth could be increased to implement different filters while also increasing computation time. Finally, the effect of image jitter from the spatial microphones on CNN performance can be more thoroughly investigated.

4 Conclusion

The objective of this article was to evaluate the effectiveness of convolutional neural networks in classifying 3 types of pronunciation (normative, interdental sigmatism, and rustling sigmatism) using recordings from a multichannel recording device. Spectrograms, filter bank energies, and mel-frequency cepstral coefficients were used as inputs to the CNN. The FBE images showed the highest accuracy for binary classification (norm vs. all pathologies), 97.67%. Future experimentation should seek to evaluate the effect of jitter from the multichannel microphone as well as to investigate different network architecture.

References

1. Khinda, V., Grewal, N.: Relationship of tongue-thrust swallowing and anterior open bite with articulation disorders: a clinical study. J. Indian Soc. Pedod. Prev. Dent. 17(2), 33–39 (1999)
2. Black, L.I., Vahratian, A., Hoffman, H.J.: Communication disorders and use of intervention services among children aged 3–17 years: United States, 2012. NCHS Data Brief 205, 1–8 (2015)
3. Jerome, A., Fujiki, M., Brinton, B., James, S.: Self-esteem in children with specific language impairment. J. Speech, Lang. Hear. Res. 45(4), 700–714 (2002)
4. Blood, G., Blood, I., Tellis, G., Gabel, R.: A preliminary study of self-esteem, stigma, and disclosure in adolescents who stutter. J. Fluen. Disord. 28(2), 143–159 (2003)
5. McKinnon, S., Hess, C., Landry, R.: Reactions of college students to speech disorders. J. Commun. Disord. 19(1), 75–82 (1986)
6. Miodonska, Z., Krecichwost, M., Szymanska, A.: Computer-aided evaluation of sibilants in preschool children sigmatism diagnosis. In: Information Technologies in Medicine, pp. 367–376. Springer International Publishing (2016)
7. Hu, W., Qian, Y., Soong, F., Wang, Y.: Improved mispronunciation detection with deep neural network trained acoustic models and transfer learning based logistic regression classifiers. Speech Commun. 67, 154–166 (2015)
8. Ali, S.M., Karule, P.T.: MFCC, LPCC, formants and pitch proven to be best features in diagnosis of speech disorder using neural networks and SVM. Int. J. Appl. Eng. Res. 11(2), 897–903 (2016)
9. Krecichwost, M., Miodonska, Z., Badura, P., Trzaskalik, J., Mocko, N.: Multi-channel acoustic analysis of phoneme /s/ mispronunciation for lateral sigmatism detection. Biocybern. Biomed. Eng. 39(1), 246–255 (2019)
10. Bugdol, M.N., Bugdol, M., Lipowicz, A.M., Mitas, A.W., Bienkowska, M.J., Wijata, A.M.: Prediction of menarcheal status of girls using voice features. Comput. Biol. Med. 100, 296–304 (2018)
11. Akbari, A., Arjmandi, M.: An efficient voice pathology classification scheme based on applying multi-layer linear discriminant analysis to wavelet packet-based features. Biomed. Signal Proc. Control 10, 209–223 (2014)
12. Majidnezhad, V.: A novel hybrid of genetic algorithm and ANN for developing a high efficient method for vocal fold pathology diagnosis. EURASIP J. Audio Speech Music. Process. 2015(1), 3 (2015)


13. Huzaifah, M.: Comparison of time-frequency representations for environmental sound classification using convolutional neural networks. CoRR (2017). arXiv:1706.07156
14. Badshah, A.M., Ahmad, J., Rahim, N., Baik, S.W.: Speech emotion recognition from spectrograms with deep convolutional neural network. In: 2017 International Conference on Platform Technology and Service (PlatCon), pp. 1–5 (2017)
15. Costa, Y., Oliveira, L., Silla, C.: An evaluation of convolutional neural networks for music classification using spectrograms. Appl. Soft Comput. 52, 28–38 (2017)
16. Reed, R., Marks, R.J., Oh, S.: Similarities of error regularization, sigmoid gain scaling, target smoothing, and training with jitter. IEEE Trans. Neural Netw. 6(3), 529–538 (1995)
17. Woloshuk, A., Krecichwost, M., Miodonska, Z., Badura, P., Pietka, E.: CAD of sigmatism using neural networks. In: Pietka, E., Badura, P., Kawa, J., Wieclawek, W. (eds.) Information Technology in Biomedicine, pp. 260–271. Springer International Publishing, Cham (2019)
18. Krecichwost, M., Miodonska, Z., Trzaskalik, J., Pyttel, J., Spinczyk, D.: Acoustic mask for air flow distribution analysis in speech therapy. In: Information Technologies in Medicine, pp. 377–387. Springer International Publishing (2016)
19. Shin, H.C., Roth, H.R., Gao, M., Lu, L., Xu, Z., Nogues, I., Yao, J., Mollura, D., Summers, R.M.: Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 35(5), 1285–1298 (2016)
20. Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105. Curran Associates Inc. (2012)
21. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. CoRR (2015). arXiv:1502.03167
22. Soli, S.D.: Second formants in fricatives: acoustic consequences of fricative vowel coarticulation. J. Acoust. Soc. Am. 70(4), 976–984 (1981)
23. Sereno, J.A., Baum, S.R., Marean, G.C., Lieberman, P.: Acoustic analyses and perceptual data on anticipatory labial coarticulation in adults and children. J. Acoust. Soc. Am. 81(2), 512–519 (1987)
24. Sahidullah, Md, Saha, G.: Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition. Speech Commun. 54(4), 543–565 (2012)
25. Nadeu, C., Macho, D., Hernando, J.: Time and frequency filtering of filter-bank energies for robust HMM speech recognition. Speech Commun. 34(1), 93–114 (2001). (Noise Robust ASR)
26. Gelzinis, A., Verikas, A., Bacauskiene, M.: Automated speech analysis applied to laryngeal disease categorization. Comput. Methods Programs Biomed. 91(1), 36–47 (2008)

Barley Defects Identification by Convolutional Neural Networks

Michal Kozlowski (1) and Piotr M. Szczypiński (2)

(1) Faculty of Technical Science, Department of Mechatronics and Technical and IT Education, University of Warmia and Mazury, Sloneczna 46A, 10-710 Olsztyn, Poland
    [email protected]
(2) Institute of Electronics, Łódź University of Technology, Wólczańska 211/215, 90-924 Łódź, Poland
    [email protected]

Abstract. The right choice of ingredients, particularly barley, is a key issue in the malting and brewing industry. Nowadays, controlling barley quality involves visual inspection to identify defective or infected kernels. It requires expertise and is labour-intensive. Computer vision solutions sequentially applying attribute extraction and classification algorithms tend to be inaccurate. Deep learning networks combine the two aspects together to enable their mutual adjustment and to increase classification ability. We use this technique to identify the most common defects of malting barley. Two ways of data presentation, two implementations of convolutional neural networks and a handcrafted-features-based method are examined. The classification results are presented, compared and discussed.

Keywords: Barley grain · Classification · Defects · Discrimination · Computer vision · Machine learning · Convolutional neural network

1 Introduction

Malting is a process of partial germination of grains which causes the development of carbohydrates. It is suspended by drying when a maximum amount of sugar is gained. Grains should be intact and contain the germ, which is necessary for the germination. The existence of kernels with fungal infections is unacceptable due to the possible contamination of the final product with the toxins and, in case of beer, an extensive gushing. The utilization of low quality cereal for malting results in lower quality of the final product and in economic losses. Therefore, assuring superior quality of barley grains for the malting process is crucial. When a new supply of cereal, usually several tons, is acquired by a malting house, its utility for malting is assessed from a sample of 100 g. The evaluation of barley is typically carried out by an expert, who visually identifies kernels with spots of fungal infection, the grains which are already sprouted, undeveloped or without the germ. This kind of visual evaluation is a tedious and time-consuming
task, while its reliability depends on the skills, observation and experience of the evaluator. Inferior quality is indicated by the presence of infected kernels, a contribution of foreign matter, and undeveloped or mechanically damaged kernels. The details of the evaluation procedure and acceptable amounts of defective kernel fractions are specified by the Polish industry standard BN-87/9131-13.

Attempts to apply computer vision and machine learning techniques to automate the evaluation procedure have already been made. In [13] an ontology was introduced, which formalizes the human expert knowledge, together with an expert system based on the ontology to classify barley kernels. The system extracts selected visual attributes by applying computer vision methods. However, the assessment of more complex or abstract attributes requires human involvement – therefore, the solution is not fully automatic. In [11] an algorithm is presented which sequentially applies image processing, segmentation, morphology analysis, extraction of texture and colour attributes, supervised learning and classification. The procedures were validated on a small dataset and enabled classification of defective kernels with an accuracy of 91–97%. Deep learning networks have gained huge interest recently. They join the attribute extraction (in convolution layers), the feature selection (max pooling) and the classification (in the fully connected layers) in one body. Thus, the machine learning procedure enables joint and mutual optimization of these components, which was not feasible in traditional approaches. Therefore, we expect that deep learning networks can gain higher accuracy ratios in the detection of defective barley kernels than the traditional algorithms.

In this paper, a machine vision approach to identify barley defects using convolutional neural networks is presented. A fully-trained and size-tailored model of a network is compared with a pretrained model commonly used in image classification tasks. For reference we apply the hand-engineered-features-based method presented in [11]. Moreover, we present two approaches to the image data presentation. In one of them, a single side of the kernel is analysed. In the other, both sides, the ventral and the dorsal, are jointly presented to the networks. The classification results obtained by each of the examined solutions are presented, compared and discussed.

2 Related Works

In the literature there are several approaches to the identification of grains using image analysis. Researchers usually use hand-engineered methods to identify and extract significant features from the images. Such features are statistics or model parameters used to characterize the colour, texture or shape of the cereal kernels. Most solutions use machine learning for the selection of the most important discriminative features and at the final stage of the classification. Ngampak and Piamsa-nga [7] explored the possibilities of classifying images of broken rice grains using a least-squares support vector machine (LS-SVM) with a radial basis function (RBF) kernel. In [5] the grayscale histogram statistical
features have distinguished eight classes of wheat varieties. The classification of four types of cereal grains was performed with use of the morphological and colour features in [6].

Fig. 1. Example images: (a) broken, (b) infected or sprouted, (c) with missing germ, (d) green or undeveloped and (e) normal

There were several approaches to use perceptron neural networks for classification. The inputs for the classifiers were feature vectors extracted from the cereal images. Equalised colour component extraction, computation of morphological features, and the application of a multilayer perceptron neural network were used to analyse bulk samples of three barley varieties in [9]. In [14] four different architectures of neural classifiers were examined, namely the back propagation network (BPN), Ward network, general regression neural network (GRNN) and probabilistic neural network (PNN). The quantitative characteristics of shape, colour and texture computed for every kernel served as an input to the classifier, capable of recognizing grain types, varieties, defective kernels or foreign objects. It enabled detection of defective kernels in [15]. All the above methods assumed that the features are extracted from images presenting only one side of the kernel. In [12] it was shown that classification accuracy rises if two sides of the kernel, the dorsal and the ventral one, are analyzed concurrently. Recently, an innovative automatic acquisition system has appeared [4,8], which records images of both sides of kernels. It enabled the development of a novel workflow for classification of barley, in which features extracted from both images are combined and jointly presented to the classifier [11].

In recent years convolutional neural networks (CNNs) have made an impact on many vision-based problems. CNNs are capable of learning meaningful image features directly from data and in some applications they significantly outperform traditional computer vision methods [2]. A CNN applied for classification of 8 barley varieties was presented in [1]. Two separate convolution layers analyse the images of the dorsal and ventral sides, respectively. The information is merged at the stage of fully connected layers. The network was trained on a relatively small set of 200–500 cases per class and gained a classification accuracy of 97%.

Following the above works, we examine the CNN application in detection and classification of defective barley kernels. To our knowledge, this issue has not been the subject of scientific publication yet. We compare two different architectures of CNNs and verify whether the analysis of a single or both sides of kernels has a significant impact on the classification performance.

Fig. 2. (a) Input images presenting two opposite sides of kernel, (b) outliers

3 Materials

Barley grains were obtained from selected farms in Poland. The experimental material consisted of the images of four classes of barley defects and one reference class of healthy grains (Fig. 1). The classes of defects included kernels which were (a) broken, (b) infected or sprouted, (c) without a germ, (d) green or not fully developed. The initial classification of all grains was carried out by an expert of the Slodownia Soufflet Polska Sp. z o.o. malt house. The grains were photographed by a two-camera acquisition system [4]. The device drops kernels on a flat transparent surface and enables image acquisition of both their sides. However, the dorso-ventral orientation of each kernel is random and the system does not determine which of the cameras takes an image of which side. An additional problem is the random orientation of the anteroposterior axis, as kernels can rotate on the flat surface (Fig. 2(a)).

The data set is based on biological material, which makes it highly diversified within classes, while at the same time kernels belonging to different classes may look similar. The exception is the class of broken grains, which includes grain pieces highly distinctive in their morphological attributes. Another problem is a large number of difficult or unusual cases which significantly differ from other kernels (Fig. 2(b)). These are kernels which were deformed by other than mechanical causes, sprouted kernels, kernels with awns, which in most other cases are detached, and kernels covered with a husk sticking out. All these cases differ considerably in their attributes from the other kernels and become outliers, which may cause problems in machine learning. Nevertheless, the outliers were deliberately kept present in the training and the test sets. This enabled assessment of the algorithms' immunity to the difficult cases. It also makes the experiment conditions closer to the real-life situation. Moreover, if the object does not fit within the image frame, the algorithm analyses its visible part only.

The entire dataset of images was randomly divided into training, validation and test sets, containing 80%, 10% and 10% of samples, respectively. The detailed number of cases used in each of the three sets and in every class is listed in Table 1.


Table 1. Dataset

ID  Class name    Total (100%)  Training (80%)  Validation (10%)  Test (10%)
    All classes   29714         23772           2972              2972
1   Broken        6046          4836            604               604
2   Infected      8808          7046            880               880
3   Missing germ  5258          4206            526               526
4   Green         5260          4208            526               526
5   Normal        4342          3474            434               434

Methods Preprocessing

Since the brightness of the kernel is higher than the brightness of the background, the initial step in image processing is grayscale thresholding to find the region of interest. Next, the binary image is median-filtered to smooth the contour of the region. The connected set of the highest intensity area is selected as a mask of a kernel. Next, the contour of every mask is approximated by an ellipse. The longer diameter of the ellipse designates the kernel main axis. Then, the width of the kernel is estimated along the axis to determine the germ-brush orientation. This information is used to correct the anteroposterior orientation of the kernel image to set the germ side upward [11]. Moreover, the background of the original image outside the mask is replaced with a uniform black colour. The images are then prepared to fit to the inputs of CNNs. If a single side of the kernel is analysed the resolution of a single image is reduced to fit in the 80× 170 (the proposed CNNs configuration) or in the 227 × 227 (pretrained AlexNet model) pixel window. If the goal is to analyse pairs of the images showing the opposite sides of a single kernel, the images of both sides are combined in a single frame. One of the images is located on the left-hand side and the other on the right-hand side, next to each other. The joint image is resized to fit in the 170 × 170 or in the 227 × 227 pixel window for respective CNNs. The Fig. 3 presents an average of all the images belonging to the training set, preprocessed in the

Fig. 3. Averages of training samples: (a) 80 × 170 (CNN), (b) 170 × 170 (CNN), (c) and (d) 227 × 227 (AlexNet)

192

M. Kozlowski and P. M. Szczypi´ nski

way explained above. The images average is used during training to properly bias neurons belonging to the input layer. 4.2

Convolutional Neural Network

In the last decade, CNNs have become state-of-the-art tools for many images classification and recognition tasks. CNNs are made from many layers that come arranged one after another. During the classification, the entire layer is displayed on the input layer. As opposed to multi-layer perceptron (MLP) neural networks which are usually composed of 3 fully connected layers. CNNs usually consist of more layers that have shaped the concept of deep networks. Our experiments were based on a simplified architecture that was adapted to the problem under study. The known AlexNet model was used as the reference method [3]. This model was trained using 1.5 million natural images and won the ImageNet Large Scale Visual Recognition Competition (ILSVRC) 2010 and 2012. For this reason this architecture has become state of art models. AlexNet contains eight layers, the first five are convolutional layers (11 × 11, 5 × 5, 3 × 3), and the last three are fully connected (2×2048 and 1000 neurons). It implements a non-saturating rectified-linear layer (RELU) the activation function, which outperforms hyperbolic tangent or sigmoid functions. In addition to AlexNet, there are more models that are much more complex and achieve better classification results (when classifying objects for thousands of classes). It was noticed that the smaller 3×3 filters in the convolutional layers are better at distinguishing the important features from the image. This information was confirmed using the VGG [10] and ResNet [2] models. By increasing the depth of the model (the number of convolutional layers), the classification results were improved at the expense of the speed of operation dictated by a large number of calculations. The motivation of our work is to find the optimal solution between the speed of model operation and high classification. This is dictated by the practical application in the malt house, where the speed of classification is important. The quoted models were designed for classification tasks based on natural images in a thousand classes, as evidenced by the number of exits of the last layer of FC (Fully-Connected Layer) in AlexNet. Our problem is different. The image acquisition conditions are repeatable and the number of classes is limited to five. We suggest that basing on proven solutions create a model suited to the classification task. A deep learning framework Caffe was used to build a new architecture suited to the problem being studied. It contains 2 convolutional layers and 2 fully connected layers. The input layer is adapted to the objects of interest and has a resolution of 80 × 170 (one grain) or 170 × 170 (two grains in one image). This reduced the size of the final model and influenced on the speed of the learning process. The number of convolutional layers (CONV) was also limited. The first consisting of 64 filters with a size of 3 × 3 and the second with 128 filters of the same size. We assume that the simplification of the convolutional layers will reduce the computational requirements and the important features


After each CONV layer, the ReLU activation function was used, which speeds up the training process while maintaining the same level of accuracy [3]. The amount of data produced by the CONV and ReLU layers is large, so further spatial downsampling of the information is performed by max pooling (POOL) layers: the max operator is applied across a local neighbourhood of the previous layer outputs, with predefined strides. To prevent overfitting, dropout was used before the first fully connected (FC) layer; it is worth noting that a dropout rate of 0.5 extends the training time. We used two FC layers: the first has 1024 neurons and the second has 5 neurons, equal to the number of classes. Two approaches were compared (Fig. 4). The difference between them is how the data enter the first layer of the convolutional neural network. In the first approach, for the proposed CNN model, the images are read one at a time at 80 × 170 resolution; combining a pair of images showing both sides of the grain increases the resolution of the input image to 170 × 170. Higher-resolution images (227 × 227) were used in the reference method, which was the pretrained AlexNet model. Caffe provides a so-called Model Zoo that allows the use of models trained on a large dataset such as ImageNet. This technique is called transfer learning; it uses the experience gained in a very advanced classification task to significantly shorten the training time and, usually, to improve the classification results. In order to adapt such a model to a new problem, it is enough to modify the last FC layer by changing the number of outputs to match the number of classes; in the AlexNet model that we use, the number of FC-layer outputs was reduced to 5. This model revealed excessive overfitting during the initial tests, which is due to the large capacity of the AlexNet model: it is able to memorise all the samples instead of learning discriminative features. To deal with this problem, the number of neurons in the first two FC layers was halved, which brought the intended effect. Finally, the architecture of both models is presented in Fig. 4.
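For illustration, the described architecture can be re-expressed in Keras as follows; this is only a sketch (the authors used Caffe), and the pooling sizes, padding and input orientation are assumptions not stated in the text.

```python
# Minimal Keras re-expression of the proposed architecture (the original was defined in Caffe).
from tensorflow.keras import layers, models

def build_kernel_cnn(input_shape=(170, 80, 1), n_classes=5):
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),    # CONV1: 64 filters, 3x3
        layers.MaxPooling2D((2, 2)),                                      # POOL1
        layers.Conv2D(128, (3, 3), padding="same", activation="relu"),   # CONV2: 128 filters, 3x3
        layers.MaxPooling2D((2, 2)),                                      # POOL2
        layers.Flatten(),
        layers.Dropout(0.5),                             # dropout before the first FC layer
        layers.Dense(1024, activation="relu"),           # FC1: 1024 neurons
        layers.Dense(n_classes, activation="softmax"),   # FC2: 5 defect classes
    ])
```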

Fig. 4. The proposed algorithm includes image preprocessing and classification with two alternative CNNs


4.3 Training

During training, the network was validated on the validation data and the classification accuracy was estimated. The training was continued until the accuracy value was no longer increasing. Finally, the network was validated on the test set to compute a credible value of the classification accuracy. The goal of learning is to find a set of weights that minimises the loss function. The experiments were run on a Linux Ubuntu computer with an Intel Core i7-4930K CPU @ 3.40 GHz, 16 GB RAM and an nVidia GeForce GTX 780 Ti with 3 GB of memory. Four configurations of CNNs were examined and underwent the complete training. The loss function was calculated using the Softmax Loss layer, which is a combination of the Multinomial Logistic Loss layer and the Softmax layer. The proposed model was trained using a modified stochastic gradient descent with momentum. The main hyperparameters were the highest stable initial learning rate α = 0.001 and momentum μ = 0.8. The batch size was chosen to fit into the available GPU memory: it was 50 for images with a resolution of 80 × 170 and was reduced to 8 for 170 × 170 images. For the reference method, the hyperparameters were as follows: α = 0.001, μ = 0.9, batch size = 35. We set the maximum number of iterations to 200000, and a model snapshot was taken every 1000 iterations.
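A hedged Keras equivalent of this training setup is sketched below (the original used a Caffe solver); the categorical cross-entropy with softmax output corresponds to the Softmax Loss layer, and the callback choices are assumptions.

```python
# Hedged sketch of the training configuration described above, in Keras instead of Caffe.
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

model = build_kernel_cnn()  # builder from the previous sketch
model.compile(optimizer=SGD(learning_rate=0.001, momentum=0.8),   # alpha = 0.001, mu = 0.8
              loss="categorical_crossentropy",                    # softmax loss
              metrics=["accuracy"])

callbacks = [
    # Keep a snapshot of the best model, analogous to periodic Caffe snapshots.
    ModelCheckpoint("kernel_cnn_best.h5", monitor="val_accuracy", save_best_only=True),
    # Stop once the validation accuracy is no longer increasing.
    EarlyStopping(monitor="val_accuracy", patience=10, restore_best_weights=True),
]
# x_train, y_train, x_val, y_val are assumed to hold preprocessed images and one-hot labels:
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           batch_size=50, epochs=200, callbacks=callbacks)
```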

4.4 Reference Method

The method for detection of defective barley kernels was presented in [11]. Following this publication, we compare the proposed neural network with the image analysis procedure implemented in QMaZda software. The software computes quantitative features to characterise every kernel in terms of morphology (shape), colour and texture. The morphological features are computed from binary masks and they estimate area, height, width, perimeter, minimum and maximum diameters, slenderness, compactness, corrugation, circularity, elongation, Danielson index, Blair-Bliss ratio, Malinowska ratio and Hu moments. The attributes of brightness distribution, texture and colour are extracted from the image fragments bounded by the contours of the masks. Colour and brightness distribution features are statistics computed from histograms of colour components. They include components of RGB, YUV, YIQ, HSB, CIE XYZ and CIE Lab colour models. The texture is described in terms of second order statistics derived from the grey-level co-occurrence matrix and the grey-level run-length matrix, magnitudes of Haar, Fourier and Gabor transform components, parameters of an autoregressive model, local binary patterns, and histograms of oriented gradients. Altogether, if the single side of the kernel is considered, we compute over 750 attributes to characterise an individual kernel. Optionally, if the feature vector combines information extracted from the pair of images, both sides, the number of attributes amounts to over 1500. Not all of the attributes carry information relevant for identification of defective kernels. Moreover, applying the 1500-dimensional vectors for training would


Fig. 5. Normalized confusion matrices comparing the classification of the defective grains

cause overfitting problems. Therefore, a subset of the most discriminative features is selected before training of the classifier. The goal is to establish a feature subset, or a lower-dimensional subspace, which would enable the best discrimination of every class from each other. The criterion for the selection is the Fisher discriminant resulting from linear discriminant analysis. This procedure reduces the feature space dimensionality from 750 or 1500 to 50. The 50-dimensional feature vectors are used for training the support vector machine classifier; we use the classifier with a 3rd-order polynomial kernel. The entire dataset of image pairs was split into training and test sets containing 80% and 20% of cases, respectively. The procedure of feature selection and classifier training was performed on the training set images. Finally, the classification ability was established on the test set.
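A rough scikit-learn analogue of this pipeline is sketched below; it is not the QMaZda implementation, the ANOVA F-score is used as a stand-in for the Fisher-discriminant criterion, and `features`/`labels` are assumed to hold the precomputed attribute vectors and class labels.

```python
# Sketch of feature selection to 50 attributes followed by a 3rd-order polynomial SVM.
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=0)  # 80%/20% split

pipeline = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=50),          # stand-in for the Fisher-discriminant selection
    SVC(kernel="poly", degree=3),          # SVM with a 3rd-order polynomial kernel
)
pipeline.fit(X_train, y_train)
print("test accuracy:", pipeline.score(X_test, y_test))
```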

5 Results

The results (Fig. 5) are presented in the form of normalized confusion matrices. Each matrix presents classification counts for one of the six examined methods. The three matrices at the top present results obtained by the analysis of one side of the kernels; the bottom row presents results generated for the joint images. Matrices on the left relate to the analysis with the proposed CNN architecture, those in the middle to AlexNet, whereas the right-hand-side ones present the results of the QMaZda reference method. The values presented on the diagonals express the percentage of correctly classified cases, whereas the values off the diagonal indicate error rates. We use a balanced accuracy to quantitatively compare the results obtained by the examined approaches. The balanced accuracy is computed as the sum of the diagonal elements divided by the sum of all the elements of the particular normalized matrix. Classification accuracy, error rates and the duration of training and classification are compared and summarised in Table 2.
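A minimal illustration of this accuracy measure (the sum of the diagonal of a row-normalized confusion matrix divided by the sum of all its elements) is given below; the example matrix is hypothetical.

```python
import numpy as np

def balanced_accuracy(conf_matrix):
    cm = np.asarray(conf_matrix, dtype=float)
    cm = cm / cm.sum(axis=1, keepdims=True)   # normalize each true-class row
    return np.trace(cm) / cm.sum()            # diagonal sum over total sum

# Hypothetical 3-class confusion matrix (rows = true class, columns = predicted class):
print(balanced_accuracy([[90, 5, 5], [10, 80, 10], [4, 6, 90]]))
```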


The proposed size-tailored CNN model generates about 2% fewer errors than AlexNet. It is also evident that the classification accuracy increases by 1.5-2.5% when the two sides of the kernel are analysed concurrently. The duration of the training is the time from the start of the training until reaching the highest classification accuracy estimated on the validation set. It can be noticed that the use of transfer learning significantly reduces the training time in favour of AlexNet (over 3 times).

Table 2. Training and classification ratings

Test                 samples  Errors  Accuracy [%]  Test duration on CPU [s]  Test duration on GPU [s]  Training [hh:mm:ss]
CNN (one side)       2972     274     90.80         484.3                     8.9                       05:13:49
CNN (two sides)      1486     102     93.14         506.0                     9.6                       05:35:49
AlexNet (one side)   2972     312     89.49         3615.3                    29.2                      01:35:22
AlexNet (two sides)  1486     131     91.18         1421.8                    16.4                      01:37:27
QMaZda (one side)    5944     701     88.2          224.0                     N/A                       41:00:05
QMaZda (two sides)   2972     131     91.6          111.7                     N/A                       41:00:05

Combining two images into one to present both sides of the kernel resulted in a different duration of the classification process: it extended the duration slightly in the proposed CNN, whereas in AlexNet the test duration decreased significantly, by 1.8-2.5 times. The Caffe framework can use either the CPU or the GPU for testing (classification). The use of the graphics processing unit (GPU) resulted in faster computations; the time was reduced by 50-130 times when compared to the central processing unit (CPU). The reference method enabled a classification accuracy of 88.2% for the analysis of one side of the kernels and 91.6% for the analysis of the both-sides images, which places this method between the proposed network configuration and the AlexNet solution. The computation of feature vectors for 23772 cases required 40 h, the feature selection process lasted 1 hour, and the classifier training took less than 5 s. Table 2 presents the total time of all the stages of the analysis. However, it should be noted that, once calculated, the feature vectors can be reused many times in various selection and training experiments. Computation of the 50 features required for the classification of a single kernel takes 0.037 s on average; therefore, feature extraction from 2972 kernels takes 112 s. The classification time is negligible and was estimated as 0.2 s for the entire test set.

6 Conclusions

In this article, we compared four approaches to classifying defects of malting barley using computer vision and convolutional neural networks. We proposed a simple CNN model with a reduced number and size of layers.


In comparison with the state-of-the-art AlexNet solution, and despite the lower resolution of the input images, the model was able to classify the examined objects with an accuracy higher by 2%. The experiment confirmed that concurrent analysis of images presenting both sides of the kernels increases the classification accuracy by 1.5-2.5%. The overall accuracy obtained by the simple CNN on the images combining views of the two sides of the kernel reached 93%. This result is satisfactory considering the in-class complexity and diversity of the input data. The highest classification errors can be seen in the identification of green and infected cases; these classes are the most often confused by the classifier. From 13% to 20% of green or undeveloped kernels are identified as infected. It can be noticed that the analysis of two-side images significantly reduces these errors. In contrast, the defects originating from mechanical causes, such as broken kernels or kernels with a missing germ, are discriminated fairly easily, with errors below 2.1% (in the most effective classifier). The highest error in missing-germ detection occurred in AlexNet fed with single-side images, and it exceeded 5%. The broken kernels and the kernels with a missing germ differ significantly in shape from the kernels belonging to the other classes; this shows that the CNN can cope with classification problems requiring the analysis of morphological characteristics. The classification time is 5-10 times shorter in the simplified CNN, which gives the proposed method an advantage over the pretrained AlexNet. The application of GPU technology enabled a reduction of the classification time by 50-100 times when compared with the usage of the CPU. With the GPU, a classification rate of 300 kernels per second was achieved, which is more than satisfactory in quality assessment applications. On the other hand, the training of the proposed model took over 5 h and was about 4 times longer than in the case of the AlexNet solution; however, we still find this time acceptable in practical applications. We have compared two CNNs: the proposed one required complete training of the convolutional layers, whereas AlexNet was pretrained. It can be observed that training based on a public image database may be insufficient for the analysis of specific biological material. It was demonstrated that the network in which the convolutional layers were trained from scratch, on dedicated data, gained a higher classification efficiency. Moreover, the reduction of the number of layers and of their sizes resulted in better generalization capabilities of the proposed network. The classification accuracy of the reference method ranks in between the compared neural networks. Computationally it is the most efficient when using a CPU; however, the complex algorithms for handcrafted feature extraction and the nonlinear decision boundaries of the classifiers can hardly be implemented on GPUs. Therefore, on a modern computer system with an efficient GPU the proposed CNN configuration would be the method of choice, whereas on other systems methods based on feature extraction, selection and data classification would prevail. The results of this study indicate that the recognition of individual defects of barley grains can be achieved by a CNN with satisfactory accuracy.


The proposed model gains an advantage over AlexNet by shortening the classification time and improving the classification accuracy.

Acknowledgment. This work was supported by the National Center for Research and Development (NCBR) in Poland, grant no. PBS3/A8/38/2015.

References 1. Dolata, P., Reiner, J.: Barley variety recognition with viewpoint-aware doublestream convolutional neural networks. In: 2018 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 101–105 (2018) 2. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 3. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012) 4. Lampa, P., Mrzygl´ od, M., Reiner, J.: Methods of manipulation and image acquisition of natural products on the example of cereal grains. Control. Cybern. 45(3), 339–354 (2016) 5. Manickavasagan, A., Sathya, G., Jayas, D., White, N.: Wheat class identification using monochrome images. J. Cereal Sci. 47(3), 518–527 (2008) 6. Mebatsion, H., Paliwal, J., Jayas, D.: Automatic classification of non-touching cereal grains in digital images using limited morphological and color features. Comput. Electron. Agric. 90, 99–105 (2013) 7. Ngampak, D., Piamsa-nga, P.: Image analysis of broken rice grains of Khao Dawk Mali rice. In: 2015 7th International Conference on Knowledge and Smart Technology (KST), pp. 115–120 (2015). https://doi.org/10.1109/KST.2015.7051471 8. Ni, C., Wang, D., Vinson, R., Holmes, M., Tao, Y.: Automatic inspection machine for maize kernels based on deep convolutional neural networks. Biosyst. Eng. 178, 131–144 (2019) 9. Pazoki, A., Pazoki, Z., Sorkhilalehloo, B.: Rain fed barley seed cultivars identification using neural network and different neurons number. World Appl. Sci. J. 22(5), 755–762 (2013) 10. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv:1409.1556 11. Szczypi´ nski, P.M., Klepaczko, A., Kociolek, M.: Barley defects identification. In: Proceedings of the 10th International Symposium on Image and Signal Processing and Analysis, pp. 216–219 (2017). https://doi.org/10.1109/ISPA.2017.8073598 12. Szczypi´ nski, P.M., Zapotoczny, P.: Computer vision algorithm for barley kernel identification, orientation estimation and surface structure assessment. Comput. Electron. Agric. 87, 32–38 (2012) 13. Szturo, K., Szczypi´ nski, P.M.: Ontology based expert system for barley grain classification. In: Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), 2017, pp. 360–364. IEEE (2017) 14. Visen, N., Paliwal, J., Jayas, D., White, N.: Automation and emerging technologies: specialist neural networks for cereal grain classification. Biosyst. Eng. 82(2), 151– 159 (2002) 15. Zapotoczny, P., Zielinska, M., Nita, Z.: Application of image analysis for the varietal classification of barley: morphological features. J. Cereal Sci. 48(1), 104–110 (2008)

Wavelet Convolution Neural Network for Classification of Spiculated Findings in Mammograms Magdalena Jasionowska(B) and Aleksandra Gacek Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland [email protected], [email protected] http://www.mini.pw.edu.pl/~jasionowskam

Abstract. The subject of this paper is computer-aided recognition of spiculated findings in low-contrast noisy mammograms, such as architectural distortions and spiculated masses. The issue of computer-aided detection still remains unresolved, especially for architectural distortions. The methodology applied was based on wavelet convolution neural network. The originality of the proposed method lies in the way of input image creation. The input images were created as the maximum value maps based on three wavelet decomposition subbands (HL,LH,HH), each describing local details in the original image. Moreover, two types of convolution neural network architecture were optimized and empirically verified. The experimental study was conducted on the basis of 1585 regions of interest (512 × 512 pixels) taken from the Curated Breast Imaging Subset of Digital Database for Screening Mammography (CBIS-DDSM), containing both normal (1191) and abnormal (406) breast tissue images including clinically confirmed architectural distortions (141) and spiculated masses (265). With the use of wavelet convolutional neural network with a reverse bioorthogonal wavelet, the recognition accuracy of both types of pathologies reached over 87%, whereas the recognition accuracy for architectural distortions was 85% and for spiculated masses - 88%. Keywords: Wavelet convolution neural network · Breast cancer Spiculated pathology recognition · Architectural distortions · Mammograms

1 Introduction

The interpretation process of mammographic images is significantly affected by image quality, the conditioning of content assessment, as well as the knowledge and experience of the individual radiologists who interpret mammograms by describing the physical components of potential visualized findings, such as shape, size and tissue density [1].


Precise characterisation of the observed structures in the background of the imaged tissue seems a very difficult matter, especially for subtle spiculated findings in mammograms of dense breast tissue. In the case of some pathological findings, such as architectural distortions (ADs) and subtle spiculated masses (SMs) [1], their detection is impossible even for the most experienced radiologists [2]. The reason for the ineffective recognition of both ADs and subtle SMs is their subtlety and ambiguity in appearance, as well as the lack of a typical model of normal breast tissue, as a mammogram is different for each patient. Moreover, the error rate in screening mammography is estimated at up to 30% for false positive cases and 20% for false negative cases [3]. What is more, the efficiency of computer-aided diagnosis systems is still insufficient to automatically recognize subtle spiculated findings in mammograms. The detection sensitivity of the commercial systems (R2 Image Checker, CADx Second Look) for ADs does not exceed 50% with the number of false positives per image equal to 1.0 [2,4].

2 Materials and Methods

Various manifestations of spiculated findings in mammograms prove highly unstable. The relevant properties include the number of radiating spicules, their size, their angular distribution, spicule overlapping, and the correlation of the pathological findings with the surrounding tissue. Therefore, many research groups use various methods of texture analysis or local edge detection [5], as well as statistical analysis of developed directional maps [6], Gabor filtering and phase portraits [7]. Other approaches concentrate on non-directional properties of spiculated findings; these include the intensity distribution of a pixel context matched to a symptom template [8] and fractal texture analysis [9]. Moreover, various image representation domains are used to extract mammographic spicules, such as the Radon domain [10] and multiscale domains, including discrete tensor wavelets [11], steerable complex filtering [12] and the dual-tree complex wavelet [13]. Nowadays, there is more and more research based on convolution neural networks (CNNs) [15-17], which process images directly in the spatial domain.

2.1 Wavelet Convolution Neural Network

The objective of this paper was to develop a method for the classification of mammographic regions of interest containing both normal and abnormal breast tissue - spiculated findings such as architectural distortions and spiculated masses. For this purpose, both spatial and spectral approaches are desirable for image classification. In our preliminary studies, the results of classification with the use of a convolution neural network (CNN) were unsatisfactory for the multidirectional structures of the subtle spiculated findings in low-contrast noisy mammograms. Hence, an attempt was made to verify a multiscale wavelet representation in order to extract the desired image content from mammograms, based on [14]. The discrete wavelet transform, which concentrates the signal energy in a small number of transform coefficients thanks to the multiresolution decomposition of the signal, was selected to acquire the local directional characteristics of image texture.


Fig. 1. The model of wavelet convolution neural network architecture with 1-channel (top) and 2-channel (bottom) input

To differentiate the images of normal and abnormal breast tissue, two types of CNN architecture were proposed, which differ in the input images of the first stage (Fig. 1). The first architecture model is an architecture with a 1-channel input, in which the discrete wavelet transform is performed on the raw input images with a single-level decomposition. As a result of this transform, four images of wavelet coefficients (LL, LH, HL and HH) are obtained, each two times smaller than the original image in each dimension. In the case of the first architecture model, the 1-channel input images are formed by concatenating the LL, LH, HL and HH wavelet coefficient images into one. For the second model of CNN architecture, two input channels are used: the first is the LL wavelet coefficient image, whereas each pixel pij of the second channel is obtained by taking the absolute maximum of the LH, HL and HH coefficient images. The formula for getting the second input channel is as follows:

pij = arg max{LH,HL,HH} |xij|   (1)

Both models of the wavelet CNN architecture consist of several convolution layers, each followed by LeakyReLU and MaxPooling layers. The output of the Convolution-ReLU-MaxPooling layers is followed by a Flatten layer and subsequently by two Dense layers; the final activation function used is softmax [18]. The size of the neural network input might vary, depending on the type of the chosen wavelet (and consequently on the size of the filter mask) and on the number of wavelet decomposition levels used.
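A sketch of the 2-channel input construction, using the PyWavelets library, is given below; the wavelet name follows the paper, while the stacking order and the placeholder ROI are assumptions.

```python
import numpy as np
import pywt

def two_channel_input(roi, wavelet="rbio3.1"):
    """roi: 2-D array, e.g. a 512 x 512 mammographic region of interest."""
    ll, (lh, hl, hh) = pywt.dwt2(roi, wavelet)              # single-level decomposition
    # Second channel: per-pixel maximum of the absolute detail coefficients (cf. Eq. 1).
    detail_max = np.max(np.abs(np.stack([lh, hl, hh])), axis=0)
    return np.stack([ll, detail_max], axis=-1)              # roughly (H/2, W/2, 2)

roi = np.random.rand(512, 512)   # placeholder for a real ROI
print(two_channel_input(roi).shape)
```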

3 Results and Discussion

The proposed method was tested on mammograms taken from the Curated Breast Imaging Subset of the Digital Database for Screening Mammography (CBIS-DDSM) [19,20], a curated and standardized version of the DDSM intended for the evaluation of future research into CADx and CADe systems in mammography [21].


For testing purposes, in the case of architectural distortions, data from the CBIS-DDSM test dataset was used with a small addition of images from the CBIS-DDSM train dataset. It was a fixed dataset, used for all the tests. The data used for the training of the neural networks consisted of the rest of the images, i.e. the training CBIS-DDSM dataset with the images used for the test excluded. Before the training of the network, the data was divided into two datasets in the proportion 4:1, the first one being used as the network training data and the other one as the network validation data. As a result, 512 × 512 pixel ROIs were extracted from complete mammograms, containing 141 architectural distortions (ADs) and 265 spiculated masses (SMs). Moreover, 1191 ROIs (512 × 512 pixels) depicting normal breast tissue, which surrounds the pathological findings in mammograms, were extracted. As in a real scenario, the selected ROIs provide appropriate proportions of the various types of breast tissue density, including images of normal breast tissue similar to architectural distortions. The effectiveness of our method was assessed with the following objective measures of spicule recognition in the analyzed ROIs:

- accuracy = (TP + TN) / (P + N)
- precision = TP / (TP + FP)
- recall = TP / P

where TP denotes true positives, FP false positives, P all positive cases and N all negative cases;

- the area under the receiver operating characteristic curve (AUC ROC) - a curve created by plotting the true positive rate against the false positive rate at various thresholds.
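For illustration, the scikit-learn equivalents of these measures are shown below on hypothetical labels and scores for the pathological class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0])                  # 1 = pathological ROI, 0 = normal
y_score = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3])  # classifier scores
y_pred  = (y_score >= 0.5).astype(int)

print("accuracy :", accuracy_score(y_true, y_pred))    # (TP + TN) / (P + N)
print("precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))      # TP / P
print("AUC ROC  :", roc_auc_score(y_true, y_score))
```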

Classification assessment measures are calculated taking into account that the ROIs contain the pathological findings in their entirety, as marked by radiologists in the CBIS-DDSM database. Hence, TP and FP relate to the detection of spiculated findings in ROIs, and not to the recognition of their accurate contours. Firstly, the types of wavelet were selected based on an experiment with the use of the 1-channel architecture model of the CNN. The wavelet types used for testing purposes were the Haar wavelet (haar), the reverse biorthogonal wavelet (rbio3.1), and the biorthogonal wavelet (bior1.3). Each of these wavelet types was used in a 3-level wavelet decomposition, thus forming input images of size 256 × 256 pixels, 128 × 128 pixels, and 64 × 64 pixels, approximately (the exact size, differing only by a few pixels, depends on the wavelet type used). The most promising results were obtained using rbio3.1 and 1-level wavelet decomposition (Table 1). Compared to the other wavelet types, the classification accuracy for both ADs and SMs was slightly higher using rbio3.1, whereas in the case of ADs, pathologies more difficult to detect in mammograms, the improvement was even more noticeable and the accuracy reached 88%. Therefore, the reverse biorthogonal wavelet was used in further experiments. Moreover, to enhance the model performance and to obtain a more balanced training dataset, the data was augmented by flipping and rotating the images. Hence, the database was increased to 3425 input images (1900 ROIs with normal breast tissue, 500 ROIs with architectural distortions, and 1025 ROIs with spiculated masses) with a nearly 1:1 class balance.


Table 1. The classification results of wavelet CNN (1-channel architecture) for spiculated findings, divided into three groups: architectural distortions (ADs), spiculated masses (SMs) and both ADs+SMs simultaneously, with the use of different wavelet types and CNN without data augmentation

Wavelet type  Accuracy
              ADs    SMs    ADs+SMs
haar          0.86   0.89   0.83
rbio3.1       0.88   0.90   0.84
bior1.3       0.75   0.76   0.78

The tests performed on the augmented dataset yielded slightly improved results for the recognition of both ADs and SMs (Table 2) - the accuracy increased by 3%, with a simultaneous increase in the precision for normal cases and in the recall for pathological cases. Subsequently, the influence of the number of wavelet decomposition levels on spicule recognition was examined with the use of the augmented dataset. There were no significant differences between the results for the first and the second level of wavelet decomposition used in the classification CNN with the 1-channel or 2-channel architecture (Table 2); the results achieved with the use of the 2-level wavelet decomposition were comparable, with a slight decrease in the recall for pathologies. Moreover, when taking into consideration the results for the 1-channel and 2-channel architecture models, it is noticeable that the recognition accuracy for both ADs and SMs is comparable, but slightly higher in the case of SMs recognition, whereas the precision for all pathological cases (ADs+SMs) is slightly increased and the recall slightly decreased. The database was divided into a training and a validation dataset in the proportion 3:1. All models of the CNN were evaluated on ROIs without data augmentation, which were used neither in the training nor in the validation step. The balanced evaluation dataset consists of 101 ROIs with normal breast tissue, 41 ROIs with ADs and 60 ROIs with SMs. An unbalanced evaluation dataset was also examined, as in a real-life scenario, containing more images with normal breast tissue (240 ROIs) than with pathologies (41 ROIs with ADs and 60 ROIs with SMs). The evaluation of the proposed classification method was made on three datasets - one containing only images with both normal breast tissue and ADs, the second with both normal breast tissue and SMs, and the third with both normal breast tissue and all pathologies (ADs+SMs). The neural network model was fed with a 32-element batch for each training step. The input images were standardized before being fed into the network. As for the optimizer, stochastic gradient descent was used. For the first 10 epochs, the learning rate was set to 0.01, whereas for the next 20 epochs it was set to 0.001, giving 30 epochs in total. For the neural network implementation the Keras framework was used [18,22], and the PyWavelets library was applied for the wavelet transform [23].
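A minimal Keras sketch of this schedule is shown below; `build_wavelet_cnn` is a hypothetical builder for the 2-channel architecture, and the loss choice is an assumption consistent with the softmax output.

```python
from tensorflow.keras.callbacks import LearningRateScheduler
from tensorflow.keras.optimizers import SGD

def lr_schedule(epoch, lr):
    return 0.01 if epoch < 10 else 0.001   # 0.01 for 10 epochs, then 0.001 for 20 epochs

model = build_wavelet_cnn()                # hypothetical builder for the 2-channel wavelet CNN
model.compile(optimizer=SGD(learning_rate=0.01),
              loss="categorical_crossentropy", metrics=["accuracy"])
# x_train, y_train, x_val, y_val are assumed to hold standardized inputs and one-hot labels:
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           batch_size=32, epochs=30, callbacks=[LearningRateScheduler(lr_schedule)])
```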


Table 2. The classification results of wavelet CNN for spiculated findings, divided into three groups: architectural distortions (ADs), spiculated masses (SMs) and both ADs+SMs simultaneously, with the use of the rbio3.1 wavelet type and CNN with data augmentation

1-LEVEL DWT (2-LEVEL DWT), 1-CHANNEL CNN
Type of pathologies  Accuracy     Precision    Recall       AUC ROC
ADs                  0.85 (0.85)  0.50 (0.50)  0.85 (0.78)  0.74 (0.75)
SMs                  0.87 (0.88)  0.62 (0.63)  0.93 (0.92)  0.80 (0.81)
ADs+SMs              0.87 (0.87)  0.72 (0.73)  0.90 (0.86)  0.84 (0.85)

1-LEVEL DWT (2-LEVEL DWT), 2-CHANNEL CNN
Type of pathologies  Accuracy     Precision    Recall       AUC ROC
ADs                  0.87 (0.86)  0.52 (0.52)  0.80 (0.80)  0.93 (0.82)
SMs                  0.88 (0.88)  0.64 (0.64)  0.88 (0.92)  0.95 (0.85)
ADs+SMs              0.87 (0.87)  0.74 (0.74)  0.85 (0.87)  0.94 (0.84)

The results of the ROI classification based on the evaluation dataset are presented in Table 3 and Table 4, respectively, for the unbalanced and the balanced test.

Table 3. The classification results for spiculated findings, divided into three groups: architectural distortions (ADs), spiculated masses (SMs) and both ADs+SMs simultaneously, for the unbalanced dataset, using the rbio3.1 wavelet type, 1-level wavelet decomposition and the 2-channel architecture of the CNN

Type of pathologies  Accuracy  Precision  Recall  AUC ROC
ADs                  0.85      0.52       0.80    0.93
SMs                  0.88      0.64       0.88    0.95
ADs+SMs              0.87      0.74       0.85    0.94

It should be noted that for ADs the results were significantly worse than for SMs, due to the characteristics of ADs. This fact is consistent with the data reported in the literature. Our results seem to confirm that the computer-aided recognition of ADs is more difficult than the recognition of SMs, which are more precisely defined, have more visible characteristics and a repeatable size in the radiological assessment. The lack of a normal breast tissue model is another obstacle in detection. The image similarity of normal and abnormal tissue (Fig. 4) is so significant that it is difficult to find well-differentiating measures. These limitations make subtle spiculated pathologies, especially ADs, extremely difficult to differentiate due to their subtlety and a high degree of similarity to other, non-abnormal types of spiculated structures. Various manifestations of mammographic spicules are highly unstable and relatively case-dependent.


Fig. 2. The ROC curves for recognition of architectural distortions (left), spiculated masses (middle), and both architectural distortions and spiculated masses (right) - the unbalanced test

Table 4. The classification results for spiculated findings, divided into three groups: architectural distortions (ADs), spiculated masses (SMs) and both ADs+SMs simultaneously, for the balanced dataset, using the rbio3.1 wavelet type, 1-level wavelet decomposition and the 2-channel architecture of the CNN

Type of pathologies  Accuracy  Precision  Recall  AUC ROC
ADs                  0.83      0.81       0.85    0.90
SMs                  0.88      0.87       0.88    0.94
ADs+SMs              0.85      0.85       0.85    0.93

Fig. 3. The ROC curves for recognition of architectural distortions (left), spiculated masses (middle), and both architectural distortions and spiculated masses (right) - the balanced test


Fig. 4. The examples of classification results - correctly (top) and incorrectly (bottom) recognized ROIs with architectural distortions (left), spiculated masses (middle), and normal breast tissue (right)

Generally, breast tissue is manifested as a directionally oriented image texture. Normal breast tissue converges frequently, but not always, towards the nipple, whereas spiculated findings can be distinguished in mammograms by groups of spicules radiating from a certain area with an invisible or almost invisible mass. Consequently, it is difficult to define unambiguous measures that correctly differentiate the images of normal breast tissue from the images of ADs or subtle SMs. When summarizing the results presented in Tables 3 and 4 and in the charts with the ROC curves (Fig. 2 and Fig. 3, respectively), it is worth emphasizing the importance of data balance in the CNN test step. While using the balanced dataset, it is possible to obtain a similar accuracy value with a higher precision. In the case of the unbalanced dataset, a lower precision value results from the higher number of false positives, which is expected when the dataset of normal breast tissue is larger than the dataset containing pathologies.

4 Conclusions

This paper attempts to show that a wavelet-like convolution neural network can be useful for the recognition of spiculated findings in low-contrast noisy mammograms under certain conditions. Firstly, the training dataset used in the convolution neural network should be adequately balanced. From the viewpoint of image processing, mammographic spicules are piecewise linear structures of various directionality.


Hence, suitably selected local directional characteristics of the image texture should be ensured. Thus, the concept is that the use of a more precise directional image representation tends to be a more appropriate solution for input image creation in the convolution neural network. This approach is planned to be verified in further research.

References 1. Dziukowa, J. (ed.): Mammografia w diagnostyce raka sutka, Warszawa (1998). (in Polish) 2. Sampat, M.P., Markey, M.K., Bovik, A.C.: Computer-aided detection and diagnosis in mammography. In: Bovik, A.C. (ed.) Handbook of Image and Video Processing, 2nd edn., pp. 1195–1217. Academic, New York (2005) 3. Kolb, T.M., Lichy, J., Newhouse, J.H.: Comparison of the performance of screening mammography, physical examination and breast US and evaluation of factors that influence them: an analysis of 27,825 patient evaluations. Radiology 225(1), 165– 175 (2002) 4. Leon, S., Brateman, L., Honeyman-Buck, J., Marshal, J.: Comparison of two commercial CAD systems for digital mammography. J. Digit. Imaging 22(4), 421–423 (2009) 5. Karssemeijer, N., Te Brake, G.M.: Detection of stellate distortions in mammograms. IEEE Trans. Med. Imaging 15(5), 611–619 (1996) 6. Kegelmeyer, W.P.: Evaluation of stellate lesion detection in a standard mammogram data set. In: Bowyer, K.W., Astley, S. (eds.) State of the Art in Digital Mammographic Image Analysis, pp. 262–279. World Scientific (1993) 7. Rangayan, R.M., Ayres, F.J.: Gabor filters and phase portraits for the detection of architectural distortion in mammograms. Med. Biol. Eng. Comput. 44(10), 883– 894 (2006) 8. Ozekes, S., Osman, O., Camurcu, A.Y.: Mammographic mass detection using a mass template. Korean J. Radiol. 6(3), 221–228 (2005) 9. Kim, H.J., Kim W.H.: Automatic detection of spiculated masses using fractal analysis in digital mammography. LNCS, vol. 3691, pp. 256–263. Springer, Heidelberg (2005) 10. Sampat M.P., Whitman G.J., Markey M.K.,Bovik A.C.: Evidence based detection of spiculated masses and architectural distortions. SPIE, Medical Imaging 2005: Image Processing, vol. 5747, pp. 26–37 (2005) 11. Rashed, E.A., Ismail, I.A., Zaki, S.I.: Multiresolution mammogram analysis in multilevel decomposition. Pattern Recogn. Lett. 28, 286–292 (2007) 12. Shenk, V.U.B., Brady, M.: Finding CLS using multiresolution oriented local energy feature detection. In: Proceedings 6th International Workshop on Digital Mammography, pp. 64–68 (2002) 13. Berks, M., Taylor, C., Rahim, R., Boggis, C., Astley, S.: Modelling structural deformations in mammographic tissue using the dual-tree complex wavelet. LNCS, vol. 6136, pp. 145–152 (2010) 14. Fujieda, S., Takayama, K., Hachisuka, T.: Wavelet convolution neural networks. Comput. Vis. Pattern Recognit. (2018). arXiv:1805.08620 15. Costa, A.C., Oliveira, H.C.R., Catani, J.H., de Barros, N., Melo, C.F.E, Vieira, M.A.C.: Data augmentation for detection of architectural distortion in digital mammography using deep learning approach. Comput. Vis. Pattern Recognit. (2018). arXiv:1807.03167


16. Liu, X., Zhai, L., Zhu, T.: Recognition of architectural distortion in mammographic images with transfer learning. In: 2016 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) (2016) 17. Ben-Ari, R., Akselrod-Ballin, A., Karlinsky, L., Hashoul, S.: Domain specific convolutional neural nets for detection of architectural distortion in mammograms. In: IEEE 14th International Symposium on Biomedical Imaging, Israel (2017) 18. DataCamp: Convolutional Neural Networks in Python with Keras. https://www. datacamp.com/community/tutorials/convolutional-neural-networks-python 19. Sawyer Lee R., Gimenez F., Hoogi A., Rubin D.: Curated Breast Imaging Subset of DDSM. The Cancer Imaging Archive (2016). http://dx.doi.org/10.7937/K9/ TCIA.2016.7O02S9CY 20. Clark, K., Vendt, B., Smith, K., Freymann, J., Kirby, J., Koppel, P., Moore, S., Phillips, S., Maffitt, D., Pringle, M., Tarbox, L., Prior, F.: The cancer imaging archive (TCIA): maintaining and operating a public information repository. J. Digit. Imaging 26(6), 1045–1057 (2013) 21. Digital Database for Screening Mammography (DDSM). University of South Florida, Florida, USA. http://marathon.csee.usf.edu/Mammography/Database. html 22. Keras: The Python Deep Learning Library. https://keras.io/ 23. Lee, G., Gommers, R., Wasilewski, F., Wohlfahrt, K., O’Leary, A., Nahrstaedt, H., Contributors: PyWavelets - Wavelet Transforms in Python (2006). https://github. com/PyWavelets/pywt

Weakly Supervised Cervical Histopathological Image Classification Using Multilayer Hidden Conditional Random Fields Chen Li1 , Hao Chen1 , Dan Xue1 , Zhijie Hu1 , Le Zhang2 , Liangzi He1 , Ning Xu3 , Shouliang Qi1 , He Ma1 , and Hongzan Sun2(B) 1

Microscopic Image and Medical Image Analysis Group, Northeastern University, Shenyang, China {lichen,qisl,mahe}@bmie.neu.edu.cn, [email protected], [email protected], [email protected], [email protected] 2 Shengjing Hospital, China Medical University, Shenyang, China [email protected], [email protected] 3 Liaoning Shihua University, Fushun, China [email protected]

Abstract. In this paper, a novel Multilayer Hidden Conditional Random Fields based weakly supervised Cervical Histopathological Image Classification framework is proposed to classify well, moderately and poorly differentiation stages of cervical cancer. First, color, texture and Deep Learning features are extracted to represent the histopathological image patches. Then, based on the extracted features, Artificial Neural Network, Support Vector Machine and Random Forest classifiers are designed to calculate the patch-level classification probability. Thirdly, effective features are selected to generate unary and binary potentials of the proposed Multilayer Hidden Conditional Random Fields framework. Lastly, using the generated potentials, the final image-level classification result is predicted by our Multilayer Hidden Conditional Random Fields model, and an accuracy of 88% is obtained on a practical histopathological image dataset with more than 100 AQP stained samples. Keywords: Cervical cancer · Histopathological image · Weakly supervised learning · Feature extraction · Deep learning Conditional random fields

1 Introduction

Among females, cervical cancer ranks fourth for both incidence and mortality in the world. In 2018, the worldwide number of new cases of cervical cancer was 569847, accounting for 3.2% of all new cancer cases; the number of cervical cancer deaths was 311365, accounting for 3.3% of all cancer deaths. In all of the 185 countries surveyed, the incidence of cervical cancer is the highest among


women in 28 countries, and the number of countries with the highest mortality rates reaches 42 [1]. In recent years, Machine Learning (ML) plays a more and more important role in the computer-aided diagnosis (CAD) of cervical cancer. In terms of Cervical Histopathological Image Classification (CHIC), a variety of ML methods are developed and applied to image segmentation, feature extraction and classification tasks. From decision trees to Support Vector Machines (SVMs) [4], from classical Artificial Neural Networks (ANNs) [15] to complex Deep Learning (DL) [25], the ML methods in the CHIC field are constantly updated with the development of technology. Due to the fact that Conditional Random Fields (CRFs) [19] can characterize the spatial relationship of images, they are suitable for analyzing the contents of complex images. Hence, we propose a weakly supervised Multilayer Hidden Conditional Random Fields (MHCRF) framework to address the CHIC problem, where the cervical histopathological images are mixed with complicated nuclei, interstitial and tissue fluid, and only have imagelevel labels from medical doctors. Furthermore, as far as we know, MHCRF methods are not used in the CHIC field before this work. The workflow of the proposed weakly supervised MHCRF model is shown in Fig. 1, and the details are introduced in Sect. 3:

Fig. 1. Workflow of the proposed weakly supervised MHCRF model

– Step 1 (Input Data): Cervical histopathological digital images of the training set and the validation set are input to the proposed MHCRF framework for weakly supervised learning, where these images only have image-level labels.
– Step 2 (Image Pre-processing): Image meshing is used as the image pre-processing method to match the subsequent feature extraction step. First, all the image sizes are unified to 1280 × 960 pixels; then the images are meshed into patches (100 × 100 pixels), as illustrated in the sketch after this list.











– Step 3 (Feature Extraction): Multiple features are extracted from the pre-processed image patches, including color, texture and DL features. Color features: color histograms [23] of the R, G, B and Gray channels. Texture features: Scale-invariant Feature Transform (SIFT) [21], DAISY [28], Gray-level Co-occurrence Matrix (GLCM) [11] and Histogram of Oriented Gradient (HOG) [5] features. DL features: transfer-learning based Inception-V3 [27] and VGG-16 [25] features.
– Step 4 (Post-processing): To obtain a priori probability, SVMs, ANNs and RFs are used to pre-classify the image patches. In the SVMs, 'RBF' and 'Linear' kernels are applied; in the ANNs, different numbers of hidden layers are compared; in the RFs, different numbers of trees are tested. In this way 19 classifiers are obtained, each of which can be trained with the seven features, resulting in 133 patch-level classification results. Finally, the top 8% of the 133 are selected for a further joint probability calculation.
– Step 5 (Classifier Design): Based on the selected patch-level classification results, unary and binary potentials of the MHCRF are generated and combined to calculate the joint probability for the final image-level classification result.
– Step 6 (System Evaluation): Test images are input to the trained MHCRF framework to evaluate the effectiveness of the proposed method.
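The meshing in Step 2 can be illustrated with the short sketch below (referenced in the list above); the file path and the use of Pillow are assumptions, and border remainders are simply dropped here.

```python
import numpy as np
from PIL import Image

def mesh_into_patches(path, target=(1280, 960), patch=100):
    img = np.asarray(Image.open(path).resize(target))      # unify the image size (width, height)
    h, w = img.shape[0], img.shape[1]
    return [img[r:r + patch, c:c + patch]
            for r in range(0, h - patch + 1, patch)
            for c in range(0, w - patch + 1, patch)]        # non-overlapping 100 x 100 patches

# patches = mesh_into_patches("cervical_slide.png")
```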

2 Related Work

2.1 Cervical Cancer and Histopathological Image Analysis

According to the worldwide data on female cancers [1], cervical cancer accounted for 6.6% of new cancer cases in women in 2018, and its mortality rate was 7.5%; both the morbidity and the mortality are ranked fourth. Therefore, the prevention and timely diagnosis of cervical cancer are particularly important. To diagnose cervical cancer, biopsy-based diagnosis is a general and effective way, including the Pap test [8], the colposcopy test [18] and cervical conization [24]. Furthermore, histopathological image analysis is considered the "gold standard" of the whole cervical cancer diagnostic procedure.

2.2 Machine Learning Techniques

Feature Extraction Scale Invariant Feature Transform (SIFT) is a classical operator for extracting local features of images [21]. The essence of SIFT algorithm is to find the key points in different scale space and calculate the direction of the key points [22]. DAISY is a description operator which can quickly calculate local image features in the face of dense feature extraction [28]. DAISY extends the basic idea of SIFT: Block statistical gradient direction histogram. The difference is


that DAISY uses Gaussian convolution to aggregate the histograms of gradient direction [29]. Because of the convolution property of Gaussian kernels, the gradient graphs with different weights can be obtained only by convoluting the gradient graphs several times when calculating DAISY operators. The gray level of the pixel appears repeatedly in the spatial position to form the texture of an image [11]. Gray-level Co-occurrence Matrix (GLCM) describes the joint distribution of the gray level of two pixels with some spatial position relationship. GLCM not only reflects the distribution characteristics of brightness, but also reflects the location distribution characteristics between pixels with the same brightness or near brightness, so it is a second-order statistical feature of image brightness change [14]. Histogram of Oriented Gradient (HOG) feature is a descriptor for object detection in computer vision and image processing [5]. The HOG method is based on the computation of normalized local orientation gradient histograms in dense grids [26]. The HOG method can maintain good geometric invariance and optical deformation. In image processing, a color histogram is a representation of the distribution of colors in an image [23]. For digital images, a color histogram represents the number of pixels that have colors in each of a fixed list of color ranges. The color histogram is a statistic that can be viewed as an approximation of an underlying continuous distribution of colors values. Classifier Models Support Vector Machine (SVM) is a supervised learning algorithm [4]. The basic idea of SVM is to find the best separating hyperplane in a feature space to maximize the interval between positive and negative samples in the training set. The core idea of SVM is to make every effort to maximize the separation between the two categories, so as to make the separation more credible. Moreover, it has good classification and prediction ability for unknown new samples [3,6]. Artificial Neural Networks (ANNs) are computing systems vaguely inspired by the biological neural networks that constitute animal brains [15]. The neural network itself is not an algorithm, but rather a framework for many different machine learning algorithms to work together and process complex data inputs. Random Forest (RF) algorithm is to train multiple decision trees, generate models, and then use multiple decision trees to classify [13]. Random forests have improved the prediction accuracy without increasing the computational complexity [2]. Random forests are insensitive to multivariate collinearity. The results are robust to missing data and unbalanced data, and can predict the effects of up to thousands of explanatory variables. Weakly supervised learning is a branch of ML strategy, which only annotates an image with the label of its category, but do not give any other annotations. Hence, the weakly supervised learning approach is a suitable solution for big dataset or complex image labelling problems. For example, in the work of [20], a sparse coding based weakly supervised learning framework is introduced to address a microscopic image classification task.
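A hedged scikit-image sketch of some of these handcrafted descriptors (intensity histogram, GLCM statistics and HOG; SIFT and DAISY are omitted) is given below; all parameter values are illustrative, not the paper's.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops, hog

def handcrafted_features(gray_patch):
    """gray_patch: 2-D uint8 array, e.g. a 100 x 100 grayscale patch."""
    # Brightness distribution: a simple intensity histogram.
    hist, _ = np.histogram(gray_patch, bins=256, range=(0, 256), density=True)
    # GLCM: second-order statistics of co-occurring gray levels.
    glcm = graycomatrix(gray_patch, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    glcm_feats = np.hstack([graycoprops(glcm, p).ravel()
                            for p in ("contrast", "homogeneity", "energy", "correlation")])
    # HOG: normalized local orientation-gradient histograms on a dense grid.
    hog_feats = hog(gray_patch, orientations=9, pixels_per_cell=(10, 10), cells_per_block=(2, 2))
    return np.hstack([hist, glcm_feats, hog_feats])
```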

2.3 Applications of Conditional Random Fields

In [7], a systematic survey on automated cancer diagnosis based on histopathological images is specially proposed, where related technologies from three aspects are summarized, including image pre-processing, feature extraction and diagnosis. These technologies basically cover five feature extraction and three classification methods mentioned above. However, as far as we know, effective CRF models are not applied in the CHIC fields. CRFs are a kind of undirected graph model, and are often used in the fields of natural language processing [19], and computer vision [12]. Especially, because the CRFs can describe the spatial information of an image effectively, they are good at analysing complex contents within the image. For example, in [31], a probabilistic discriminative approach is proposed to fuse contextual constraints in functional images based on the CRF and applied it to the detection of brain activation from both synthetic and real fMRI data. As well, in our previous work [16], an environmental microorganism classification engine is proposed, which can automatically analyse microscopic images using CRF and deep learning features.

3 Multilayer Hidden Conditional Random Fields

3.1 Basic Knowledge of CRFs

The Conditional Random Field (CRF) was first proposed in [19]. The definition of a CRF is as follows. Firstly, X is defined as a random variable over the data sequences to be labelled, and Y is a random variable over the corresponding label sequences. Then, let G = (V, E) be a graph such that Y = (Yv)v∈V, so that Y is indexed by the vertices of G; V is the set of all sites, corresponding to the nodes of the associated undirected graph G = (V, E), whose edges E model interactions between adjacent sites. Then (X, Y) is a conditional random field if, when conditioned on X, the random variables Yv obey the Markov property with respect to the graph: p(Yv | X, Yw, w ≠ v) = p(Yv | X, Yw, w ∼ v), where w ∼ v means that w and v are neighbours in G. This means that a CRF is an undirected graphical model whose nodes can be divided into two disjoint sets X and Y, the observed variables and the output variables, and the modelled conditional distribution is p(Y|X). According to the basic theorem of random fields in [10], the joint distribution over the label sequence Y given X has the form of Eq. (1):

pθ(y|x) ∝ exp( Σe∈E,k λk fk(e, y|e, x) + Σv∈V,k μk gk(v, y|v, x) )   (1)

where x is a data sequence, y a label sequence, and y|S is the set of components of y associated with the vertices in the sub-graph S. It is known from the literature [9,32] that Eq. (1) can be rewritten as Eq. (2):

p(Y|X) = (1/Z) ΠC ψC(YC, X)   (2)


where Z = ΣY ΠC ψC(YC, X) is the normalization factor, and ψC(YC, X) is the potential function on the clique C. A clique C in an undirected graph G = (V, E) is a subset of the vertices, C ⊆ V, such that every two distinct vertices are adjacent.

3.2 The Proposed MHCRF Model

Structure. Our MHCRF can be expressed by Eq. (3):

p(X|Y) = (1/Z) Πi∈V ϕi(xi; Y) · Π(i,j)∈E ψij(xi, xj; Y)   (3)

where

Z = ΣX Πi∈V ϕi(xi; Y) · Π(i,j)∈E ψij(xi, xj; Y)   (4)

is the normalization factor, V is the set of all nodes in the graph G, and E is the set of all edges. The clique potential function consists of two parts: the unary potential function ϕi(xi; Y) is used to measure the probability that a node i is labelled as xi for a given observation vector Y; the binary potential function ψij(xi, xj; Y) describes the spatial context relationship between the adjacent nodes i and j in the graph, which is related not only to the label of node i but also to the label of its neighbour node j. Finally, finding the maximum a posteriori labelling X̃ = arg maxX p(X|Y) solves the image classification problem. The structure of our MHCRF model is shown in Fig. 2. In Fig. 2, Layer 1 shows the real labels of the patches, which correspond one-to-one with Layer 2. Layer 3 denotes the seven kinds of features of each patch, including the RGBGray color histogram, SIFT, DAISY, GLCM, HOG, Inception-V3 and VGG-16 features. The binary potential has an additional layer, Layer 3.5, where the feature of a target patch is obtained from the features of the surrounding patches according to the layout. Then, in Layer 4, these features are classified by four kinds of classifiers (Linear-SVMs, RBF-SVMs, ANNs and RFs) to obtain a priori probability. Furthermore, in Layer 5, the most effective features are selected according to the Gaussian distribution and proportion. In Layer 6, the selected features are jointly used, and the best results are further selected to obtain the final unary or binary potential models in Layer 7. Finally, the unary and binary potential models are combined, and the proposed MHCRF model is obtained in Layer 8.

Unary Potential. The probability of a label xi taking a value c ∈ L is connected with the unary potential part ϕi(xi; Y) of Eq. (3), given the data Y, by ϕi(xi; Y) ∝ p(xi = c | fi(Y)) [17], where the image data are expressed as site-wise feature vectors fi(Y), which may depend on all the data of Y. We extract seven kinds of features from each patch. Color features: we extract the histograms of the R, G and B color channels and of the gray-scale version of the image, and obtain a 1024-dimensional feature vector [30].


Fig. 2. The structure of our weakly supervised MHCRF model. The left part shows the structure of the unary potential model and the right part shows the structure of the binary potential model

Texture features: there are four texture features, described in Sect. 2.2, namely the SIFT [22], DAISY [28], GLCM [14] and HOG [5] features; the dimensions of these texture feature vectors are 128, 200, 64 and 4356, respectively. In addition, two deep learning features are extracted, one from Inception-V3 [27] and another from VGG-16 [25], where transfer learning from ImageNet is applied and the second-to-last layer is fine-tuned with our cervical histopathological images; the length of the extracted deep learning feature vectors is set to 1000 dimensions based on our pre-tests. In order to get the label probability, we use a total of 19 classifiers in four categories: Linear-SVMs, RBF-SVMs, ANNs and RFs. When choosing ANNs as the classifier, we use the scaled conjugate gradient algorithm ("trainscg"), and the hidden part uses from one to six layers, respectively. Similarly, when we use RFs as classifiers, the number of trees is 2^n (n = 1, 2, ..., 11). According to the seven features and 19 classifiers, we initially obtain 133 results. Then, based on the Gaussian distribution of these 133 results, we select the top 8% of them (about 11). Among them, the number of selected deep learning features is about three and the number of selected handcrafted features is about eight; among the handcrafted features, the color and texture features have the same numbers, so the most effective four features of each of them are selected, respectively. Next, the selected 11 results are combined separately, and the number of combinations is the factorial of 11, i.e. 39916800. Finally, the top ten combinations are further selected as promising candidates to generate the unary potential.
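An illustrative scikit-learn version of this classifier bank is sketched below; the original ANNs were trained with MATLAB's "trainscg", so the MLP here is only an approximation, and the hidden-layer width is an assumption.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

def build_classifier_bank():
    bank = {
        "SVM-linear": SVC(kernel="linear", probability=True),
        "SVM-rbf": SVC(kernel="rbf", probability=True),
    }
    for depth in range(1, 7):                   # ANNs with 1 to 6 hidden layers
        bank[f"ANN-{depth}"] = MLPClassifier(hidden_layer_sizes=(100,) * depth)
    for n in range(1, 12):                      # RFs with 2^1 ... 2^11 trees
        bank[f"RF-{2 ** n}"] = RandomForestClassifier(n_estimators=2 ** n)
    return bank                                 # 2 + 6 + 11 = 19 classifiers

# Training each of the 19 classifiers on each of the 7 patch features gives the
# 133 patch-level results from which the top 8% are selected.
```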


features have the same counts, so the four most effective features of each kind are selected, respectively. Next, the selected 11 results are combined, and the number of possible ordered combinations is the factorial of 11, i.e. 39916800. Finally, the top ten combinations among them are further selected as promising candidates to generate the unary potential.

Binary Potential. The binary potential part ψij(xi, xj; Y) of Eq. (3) shows how likely the pair of adjacent sites i and j is to take the labels (xi, xj) = (c, c′) given the data: ψij(xi, xj; Y) = p(xi = c, xj = c′ | fi(Y), fj(Y)) [17]. Figure 3 shows the layout of our binary potential. We use this "lattice" layout to characterize the feature vector of each patch by calculating the sum of the feature vectors of its eight neighbourhood patches; a minimal sketch of this accumulation is given after Fig. 3. The other steps are consistent with the unary potential in Sect. 3.2.

 Fig. 3. Binary potential layout. “ ” denotes that the sum of the eight neighbourhood feature vectors is used as the feature vector of the target patch
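To make the lattice layout concrete, the following minimal sketch (our own illustration, not the authors' code; the patch-grid shape and the `features` array are assumptions) accumulates the feature vectors of the eight neighbouring patches for every target patch:

```python
import numpy as np

def eight_neighbourhood_sum(features: np.ndarray) -> np.ndarray:
    """features: (rows, cols, dim) array of per-patch feature vectors arranged
    on the patch grid of one image. Returns an array of the same shape in which
    each entry is the sum of the (up to) eight surrounding patch features."""
    rows, cols, _ = features.shape
    summed = np.zeros_like(features)
    offsets = [(-1, -1), (-1, 0), (-1, 1),
               (0, -1),           (0, 1),
               (1, -1),  (1, 0),  (1, 1)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    summed[r, c] += features[rr, cc]
    return summed

# Example: a 4 x 4 patch grid with 1024-dimensional colour-histogram features
grid = np.random.rand(4, 4, 1024)
context = eight_neighbourhood_sum(grid)  # input to the binary-potential classifiers
```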

4 Experimental Results

4.1 Experimental Setting

Data source: two practising medical doctors from Shengjing Hospital of China Medical University provide the image samples and give image-level labels; Staining method: immunohistochemical (IHC) staining, AQP; Magnification: 400×; Microscope: Nikon (Japan); Acquisition software: NIS-Elements F 3.2; Image size: 1280 × 960 pixels; Image format: *.png. There are 103 images in the dataset, of which 35 are well differentiated, 35 are moderately differentiated, and 33 are poorly differentiated. We divide this dataset into training, validation and test sets. There are 9 well, 9 moderate and


9 poorly differentiated images in the training set; 9 well, 9 moderate and 8 poorly differentiated images in the validation set; and 17 well, 17 moderate and 16 poorly differentiated images in the test set. Figure 4 shows some examples of the dataset.
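For reference, the split can be summarised in a small dictionary, a direct restatement of the counts given above (the variable names are ours):

```python
# Image-level split of the 103-image dataset (well / moderate / poor differentiation)
split = {
    "train":      {"well": 9,  "moderate": 9,  "poor": 9},
    "validation": {"well": 9,  "moderate": 9,  "poor": 8},
    "test":       {"well": 17, "moderate": 17, "poor": 16},
}
assert sum(sum(part.values()) for part in split.values()) == 103
```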

Fig. 4. Examples of our CHIC dataset. The first, second and last rows show well, moderately and poorly differentiated images, respectively

4.2 Evaluation of Unary and Binary Potentials

First, in order to select effective features to generate our unary and binary potentials, we compare the 133 accuracies of the single results on the validation set at the patch level. Then, the classification results of the selected combinations and the generated potentials at the image level are shown in Fig. 5, where the labels on the horizontal axis denote the ten selected candidates and the final optimized combination for the potentials. Here, “RGBGray” means the color features extracted from the R, G, B channels and the gray-level image, the number after “ANN” means the number of hidden layers, and the number (2^n, n = 1, 2, ..., 11) after “RF” refers to the number of trees in the forest. The image-level classification result of the final unary potential model on the validation set images is shown in Fig. 6(a), where the classification accuracy is 84.6%. The classification result of the final binary potential model on the validation set images is shown in Fig. 6(b), where the classification accuracy is also 84.6%.
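As a rough illustration of this selection step (not the authors' code; the `accuracies` dictionary and its keys are invented placeholders), the top 8% of the 133 single results can be picked with a simple quantile threshold on their validation accuracies:

```python
import numpy as np

# Hypothetical patch-level validation accuracies for the 7 features x 19 classifiers
# = 133 single results; only a few entries are shown.
accuracies = {
    ("VGG-16", "ANN-3"): 0.78,
    ("Inception-V3", "RBF-SVM"): 0.74,
    ("HOG", "RF-256"): 0.66,
    ("RGBGray", "Linear-SVM"): 0.61,
    # ... 129 more (feature, classifier) pairs
}

def top_fraction(results, fraction=0.08):
    """Keep the best `fraction` of results (about 11 of 133 for 8%)."""
    values = np.fromiter(results.values(), dtype=float)
    threshold = np.quantile(values, 1.0 - fraction)
    return [key for key, acc in results.items() if acc >= threshold]

candidates = top_fraction(accuracies)  # candidates later combined into the potentials
```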


Fig. 5. Comparison of image-level classification accuracies between selected single results and (a) unary or (b) binary potentials on the validation set

Fig. 6. Classification result. (a)–(d) represent the confusion matrices for the classification results of unary potential on validation set, binary potential on validation set, the MHCRF model on validation set, and the MHCRF model on test set, respectively

4.3 Joint Probability Calculation

The classification results of the proposed weakly supervised MHCRF model on the validation and test sets are shown in Fig. 6(c) and (d), respectively. We can see that the accuracies on the validation and test sets are 84.6% and 88%, respectively. From these results, it can be seen that although the combined result of the unary and binary potentials is stable on the validation set, it yields an improved classification performance on the test set. Furthermore, Fig. 7 shows some examples of the classification results of the MHCRF model on the test set. According to our analysis, the reasons for the image classification errors are as follows. Firstly, the contents of the cervical histopathological images are complex, and the characteristics and properties of the various differentiation stages are not always obviously different, which makes image feature extraction difficult. Secondly, the applied binary potential layout has a small coverage and cannot effectively capture spatial information.
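For illustration only (this is not the authors' inference code; the tiny patch grid, the class set and the potential values below are invented), the joint label assignment maximising the posterior p(X|Y) can be found by exhaustively scoring every label configuration of a small grid with the unary and binary potentials:

```python
import itertools
import numpy as np

classes = ["well", "moderate", "poor"]          # three differentiation stages
rng = np.random.default_rng(0)

# Hypothetical potentials for a 2 x 2 patch grid:
# unary[p][c]            ~ p(x_p = c | features of patch p)
# binary[(p, q)][(c, d)] ~ p(x_p = c, x_q = d | features), for adjacent patches p, q
patches = [(0, 0), (0, 1), (1, 0), (1, 1)]
edges = [((0, 0), (0, 1)), ((0, 0), (1, 0)), ((0, 1), (1, 1)), ((1, 0), (1, 1))]
unary = {p: dict(zip(classes, rng.dirichlet(np.ones(3)))) for p in patches}
binary = {e: {(c, d): float(rng.uniform(0.05, 1.0)) for c in classes for d in classes}
          for e in edges}

def log_score(labelling):
    """Unnormalised log-posterior of one complete labelling of the grid."""
    s = sum(np.log(unary[p][labelling[p]]) for p in patches)
    s += sum(np.log(binary[(p, q)][(labelling[p], labelling[q])]) for p, q in edges)
    return s

best = max((dict(zip(patches, combo))
            for combo in itertools.product(classes, repeat=len(patches))),
           key=log_score)
print(best)   # arg max_X of the (unnormalised) posterior p(X | Y)
```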


Fig. 7. An example of the classification results

5 Conclusion and Future Work

In this paper, we propose a weakly supervised MHCRF model to classify cervical histopathological images into three stages: well, moderate and poorly differentiated. The proposed MHCRF method not only considers handcrafted color and texture features, but also combines state-of-the-art deep learning techniques into the framework. Furthermore, this MHCRF model builds both unary and binary potentials to describe the spatial relationship between the image locations. In the experiment, the proposed method is tested on a practical dataset and obtains an overall classification accuracy of 88%, showing the effectiveness and potential of the method. In the future, we plan to use our MHCRF model to classify cervical histopathological images stained by other IHC methods, and even other cancers. We will also try to use more types of features and classification algorithms to improve the MHCRF model.

Acknowledgment. We acknowledge the support of the "National Natural Science Foundation of China" (No. 61806047), the "Fundamental Research Funds for the Central Universities" (No. N171903004), and the "Scientific Research Launched Fund of Liaoning Shihua University" (No. 2017XJJ-061). We also thank Hao Chen and He Ma, whose contributions are considered as important as those of the first author and the corresponding author, respectively.

References

1. Bray, F., Ferlay, J., Soerjomataram, I., et al.: Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 68(6), 394–424 (2018)
2. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)


3. Chang, C., Lin, C.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27 (2011)
4. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
5. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Proceedings of CVPR 2005, vol. 1, pp. 886–893 (2005)
6. Decoste, D., Schölkopf, B.: Training invariant support vector machines. Mach. Learn. 46(1–3), 161–190 (2002)
7. Demir, C., Yener, B.: Automated Cancer Diagnosis Based on Histopathological Images: A Systematic Survey. Technical Report, Rensselaer Polytechnic Institute (2005)
8. Fahey, M., Irwig, L., Macaskill, P.: Meta-analysis of Pap test accuracy. Am. J. Epidemiol. 141(7), 680–689 (1995)
9. Gupta, R.: Conditional Random Fields. Unpublished Report, IIT Bombay (2006)
10. Hammersley, J., Clifford, P.: Markov Fields on Finite Graphs and Lattices. Unpublished (1971)
11. Haralick, R., Shanmugam, K., Dinstein, I.: Textural features for image classification. IEEE Trans. Syst. Man Cybern. 6, 610–621 (1973)
12. He, X., Zemel, R., Carreira-Perpiñán, M.: Multiscale conditional random fields for image labeling. In: Proceedings of CVPR 2004, vol. 2, pp. II–II (2004)
13. Ho, T.: Random decision forests. In: Proceedings of ICDAR 1995, vol. 1, pp. 278–282 (1995)
14. Kekre, H., Thepade, S., Sarode, T., et al.: Image retrieval using texture features extracted from GLCM, LBG and KPE. Int. J. Comput. Theory Eng. 2(5), 695 (2010)
15. Kohonen, T.: An introduction to neural computing. Neural Netw. 1(1), 3–16 (1988)
16. Kosov, S., Shirahama, K., Li, C., et al.: Environmental microorganism classification using conditional random fields and deep convolutional neural networks. Pattern Recognit. 77, 248–261 (2018)
17. Kumar, S., Hebert, M.: Discriminative random fields. Int. J. Comput. Vis. 68(2), 179–201 (2006)
18. Kumar, V., Robbins, S.: Robbins Basic Pathology. Saunders/Elsevier, America (2007)
19. Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of ICML 2001, pp. 282–289 (2001)
20. Li, C., Shirahama, K., Grzegorzek, M.: Environmental microorganism classification using sparse coding and weakly supervised learning. In: Proceedings of EMC@ICMR 2015, pp. 9–14 (2015)
21. Lowe, D.: Object recognition from local scale-invariant features. In: Proceedings of ICCV 1999, vol. 2, pp. 1150–1157 (1999)
22. Lowe, D.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
23. Novak, C., Shafer, S.: Anatomy of a color histogram. In: Proceedings of CVPR 1992, pp. 599–605 (1992)
24. Nyirjesy, I.: Conization of Cervix (2015). http://emedicine.medscape.com/article/270156-overview
25. Simonyan, K., Zisserman, A.: Very Deep Convolutional Networks for Large-scale Image Recognition. arXiv preprint (2014)


26. Suard, F., Rakotomamonjy, A., Bensrhair, A., et al.: Pedestrian detection using infrared images and histograms of oriented gradients. In: Proceedings of IV 2006, pp. 206–212 (2006)
27. Szegedy, C., Vanhoucke, V., Ioffe, S., et al.: Rethinking the inception architecture for computer vision. In: Proceedings of CVPR 2016, pp. 2818–2826 (2016)
28. Tola, E., Lepetit, V., Fua, P.: A fast local descriptor for dense matching. In: Proceedings of CVPR 2008, pp. 1–8 (2008)
29. Tola, E., Lepetit, V., Fua, P.: Daisy: an efficient dense descriptor applied to wide-baseline stereo. IEEE Trans. Pattern Anal. Mach. Intell. 32(5), 815–830 (2010)
30. Wang, X., Wu, J., Yang, H.: Robust image retrieval based on color histogram of local feature regions. Multimed. Tools Appl. 49(2), 323–345 (2010)
31. Wang, Y., Rajapakse, J.: Contextual modeling of functional MR images with conditional random fields. IEEE Trans. Med. Imaging 25(6), 804–812 (2006)
32. Zheng, S., Jayasumana, S., Romera-Paredes, B., et al.: Conditional random fields as recurrent neural networks. In: Proceedings of ICCV 2015 (2015)

A Survey for Breast Histopathology Image Analysis Using Classical and Deep Neural Networks

Chen Li1, Dan Xue1, Zhijie Hu1, Hao Chen1, Yudong Yao1, Yong Zhang2, Mo Li2, Qian Wang2, and Ning Xu3(B)

1 Microscopic Image and Medical Image Analysis Group, Northeastern University, Shenyang, China
{lichen,yyao}@bmie.neu.edu.cn, [email protected], [email protected], [email protected]
2 Liaoning Cancer Hospital & Institute, Shenyang, China
[email protected], [email protected], wangqian an [email protected]
3 Liaoning Shihua University, Fushun, China
[email protected]

Abstract. Because Breast Histopathology Image Analysis (BHIA) plays a very important role in breast cancer diagnosis and medical treatment processes, more and more effective Machine Learning (ML) techniques are developed and applied in this field to assist histopathologists in obtaining more rapid, stable, objective, and quantified analysis results. Among all the ML algorithms applied in the BHIA field, Artificial Neural Networks (ANNs) show a very positive and healthy development trend in recent years. Hence, in order to clarify the development history and find the future potential of ANNs in the BHIA field, we survey more than 60 related works in this paper, referring to classical ANNs, deep ANNs and methodology analysis.

Keywords: Breast cancer · Histopathology image · Artificial neural networks · Deep learning · Feature extraction · Classification

1 Introduction

According to the statistics from the American Cancer Society (ACS), the incidence of breast cancer ranks first among malignancies in the world [51]. In order to diagnose and treat breast cancer, Breast Histopathology Image Analysis (BHIA) is applied as the most direct and effective approach in medical processes. In histopathological research, the sections are examined under a microscope by a histopathologist to analyse the characteristics and properties of the tissues [48]. In the traditional way, the tissue sections are observed directly by the naked eye of the histopathologist, and the visual information is analysed manually based on prior medical knowledge. However, the objectivity of this manual


analysing process is unstable, depending on experiences, work load and mood of the histopathologist greatly. To this end, in order to enhance the objectivity and quantitative level of the histopathological research, effective Machine Learning (ML) approaches are developed and applied in recent decades, e.g., principal component analysis (PCA), support vector machines (SVMs) and Artificial Neural Networks (ANNs) [59]. Especially, because the ANN algorithms have a very positive and robust performance in the BHIA field, we propose this paper to review more than 60 related works from the technical points, and focus on the development history and further potential of the ANNs. Hence, this paper is very suitable for computer scientists to consult. In Fig. 1, an example of breast histopathology images stained by different methods is shown, where the Hematoxylin and Eosin (H&E) staining approach is the most common method.

Fig. 1. An example of breast histopathology images stained by different methods: (a) H&E, (b) P63, (c) CK5/6, and (d) Calponin

Because ML systems are usually semi- or full-automatic, they are effective and can save a lot of human resource. In addition, because ML approaches only need some cheap equipment, like microscopes and computers, the BHIA work can reduce many financial investments. Hence, ML can help histopathologists to obtain useful microscopic information effectively. Especially, because ANN methods, including both classical and deep neural networks, are a kind of very effective ML algorithms [44], they are widely used in the BHIA fields for image segmentation, feature extraction and classification tasks in recent years. The development trend is shown in Fig. 2. Although there exist some related survey papers about the BHIA work (e.g., the reviews in [1,4,5,9,10,13,17,20,22–24,58,62]), there is not a special one that focuses on the ANN approaches in this field. Hence, in order to clarify the BHIA work using ANN approaches in recent years, we propose this review paper with


Fig. 2. The development trend of ANN methods for BHIA tasks. The vertical axis shows the total number of related works. The horizontal axis denotes the time line by years. The blue, orange and green curves represent All ANNs, Classical ANNs and Deep ANNs, respectively

the following structure: in Sec. 2, the BHIA work using classical ANN methods is introduced; in Sec. 3, the state-of-the-art deep ANN methods are summarized; in Sec. 4, the reasons for the effectiveness of ANNs in the BHIA field are analysed and conclusions are drawn.

2 BHIA Using Classical ANNs

2.1 Related Works

Classification Tasks
In [46], in order to classify H&E stained breast cancer tissue sections into five types, a third-party software (the LNKnet package) is applied to build a forward/back ANN classifier based on nine texture features. In the experiment, 536 samples are used for classifier training, and 526 samples are used for testing. Finally, an overall classification accuracy higher than 90% is achieved. In [68–70], an automatic breast cancer classification scheme based on histological images is proposed. First, edge, texture and intensity features are extracted. Then, based on each of the extracted features, an ANN classifier is designed, respectively. Thirdly, an ensemble learning approach, namely 'random subspace ensemble', is used to select and aggregate these classifiers for an even better classification performance. Finally, a classification accuracy of 95.22% is obtained on a public image dataset. In [52], four types of H&E stained breast histopathology images are classified, using eight features and a three-layer forward/back ANN classifier. In the experiment, 1808 training samples, 387 validation samples, and 387 test samples are tested, and an overall accuracy around 95% is achieved. In [31], in order to classify low magnification (10×) breast cancer histopathology images (H&E stained) into three malignancy grades, 30 texture features are


extracted first. Then, feature selection is applied to find more effective information from the extracted features. Thirdly, a probabilistic neural network (PNN) classifier is built up based on the selected features. Lastly, 65 images are tested in the experiment, and an overall accuracy around 87% is obtained. In [2], to classify cancerous and non-cancerous cells in breast histopathology images, multiple morphological features are extracted first. In the experiment, an ANN classifier achieves an accuracy of 80%.

2.2 Segmentation Tasks

In [26], a competitive neural network is applied as a clustering based method to segment breast cancer regions from needle biopsy microscopic images. In this work, 21 shape, texture and topological features are extracted first. Then, the network is used to cluster the images into different regions based on these features. In the experiment, a dataset with over 500 images is tested, and an overall accuracy around 98.7% is achieved. In [36], a supervised segmentation scheme using a multilayer neural network and a color active contour model to detect breast cancer nuclei is proposed. In this work, 24 images are used to test the method, and an average accuracy of 95.5% is finally achieved.

2.3 Summary

From the review above, we can find that from the 2000s until the early 2010s, most classifiers in the BHIA field were classical ANNs. This situation is mainly because of the limitations of hardware during these decades, when the computational ability of computers was not high enough to carry out the large-scale calculations needed to extract effective ANN features. Hence, many BHIA works chose ANNs only for their robust classification abilities.

3 BHIA Using Deep Neural Networks

3.1 Related Works

“ICPR 2012” Tasks
In the 2012 International Conference on Pattern Recognition (ICPR), a “mitotic figure recognition contest” is released. In [12], in order to detect the mitosis in a breast histology image, a deep max-pooling CNN is built up, which is trained to classify each pixel in the image into a labelled region. In the experiment, 26 images are used for training, 9 for validation, and 15 for testing. Finally, an F1-score of 78.2% is achieved. Furthermore, a similar method is used in the work of [61], and an F1-score of 61.1% is obtained. In [33,34], manually designed color, texture, and shape features are jointly used with the machine-learned features extracted by a multi-layer CNN. Finally, this method obtains F1-scores of up to 65.9% on color scanners and 58.9% on multi-spectral scanners. Similarly, in the work of [64], handcrafted features and deep CNN features are used together in an ensemble learning process, and an F1-score of 73.5% is obtained.


“BreaKHis” Tasks
In 2016, the BreaKHis dataset is released in [56], with 7909 images acquired from 82 patients, including both benign and malignant samples. Based on this dataset, many related works are carried out.

Related Works of BreaKHis in 2017
– In [54,55,57], based on LeNet and AlexNet, deep ANN methods are used to classify breast histopathology images in the BreaKHis dataset. In the experiment, the dataset is divided into training (70%) and testing (30%) sets, and an overall accuracy around 85% is obtained.
– In [53], a transfer learning work is carried out on this task, where an image is first represented by Fisher Vector (FV) encoding of local features extracted using a CNN model pretrained on ImageNet. Then, a new adaptation layer is designed to fine-tune the whole deep learning structure. Finally, an accuracy around 87% is achieved on the 30% testing images. Similarly, in [71], another transfer learning strategy is applied to the same task, and achieves an overall accuracy around 90%.
– In [43], a deep learning structure with a single convolutional layer is proposed for this classification task, which obtains an accuracy of 77.5%. In contrast, in [28], a deep learning model with multi-layer CNNs is built up, and obtains an accuracy of up to 90%. Furthermore, in [21], a CNN model, namely the ‘class structure-based deep CNN’ (CSDCNN), is proposed to represent the spatial information within a deep CNN.

Related Works of BreaKHis in 2018
– In [35], different ResNet structures are tested and compared for this task, and the ResNet-V1-152 model obtains the best performance with an overall accuracy of 99.6% after 3000 epochs. Similarly, in [50], the effectiveness of three well-recognized pre-trained transfer learning models (VGG-16, VGG-19, and ResNet-50 networks) is compared in this task. In the experiment, the VGG-16 with a logistic regression classifier obtains the best performance of a 92.6% accuracy. Furthermore, in [40], Inception-V1, Inception-V2 and ResNet-V1-50 based transfer learning methods are compared, and the ResNet-V1-50 obtains the highest accuracy of 95%.
– In [39], two restricted Boltzmann machine and back-propagation based deep CNN models are proposed. Using these two models, 81.7% and 88.4% accuracies are obtained, respectively. Furthermore, in [38], based on CNN and recurrent neural network (RNN) algorithms, a combined deep learning structure is introduced. In this work, unsupervised learning algorithms are first used to segment different tissues into different regions. Then, based on the segmentation result, the proposed deep learning approach is applied to the final classification task. Lastly, an accuracy of 91% is achieved. In addition, in the work of [37], five deep CNN models are built up, considering handcrafted features and deep learning features jointly. In the experiment, the second model obtains the best performance of 92.19% accuracy.


– In [14], a classification approach via deep active learning and confidence boosting is introduced, and achieves an overall accuracy around 90%. Similarly, in [27], an in-house CNN model is proposed, which combines the advantages of both machine-learnt features and classical color features.
– In [41], a DenseNet based CNN model is proposed for this task, including four dense blocks and three transition layers. In the experiment, a 95.4% accuracy is achieved. Similarly, in [15], a ResNet based 152-layer deep learning model is built, and achieves a correct classification rate of 98.77%.

“Camelyon” Tasks
In the “Camelyon Grand Challenge”, the task is to evaluate computational systems for the automated detection of metastatic breast cancer in whole slide images of sentinel lymph node biopsies.

Related Works of Camelyon 2016
– In [30], a deep CNN is built for this task, and achieves an AUC of 97%. In [63], a GoogLeNet based deep learning method is introduced, where 270 images are used for training and 130 are used for testing. Lastly, an area under the receiver operating curve (AUC) of 92.5% is obtained. With the same experimental setting, in [8], a recurrent visual attention model is proposed, which includes three primary components composed of dense or convolutional layers to describe the information flow between components within one timestep. Finally, a 96% AUC is achieved.
– In [7], a summary of Camelyon 2016 shows that 25 of the 32 submitted algorithms are deep learning based methods, and the 19 top-performing algorithms are all deep CNN approaches.

Related Works of Camelyon 2017
In Camelyon 2017 [11], in order to detect four types of breast cancer from whole slide histopathology images, a deep learning architecture is proposed with limited computational resources. In this work, two CNNs are applied in a cascade, followed by local maxima extraction and SVM classification of the local maxima regions. In the experiment, 300 images are used for training, 200 for validation and 500 for testing, and an accuracy of 92% is finally achieved.

“BACH” Tasks
In the “Breast Cancer Histology Challenge” (BACH) 2018, in order to classify four types of breast cancer histopathology images, an Inception-V3 based deep learning model is introduced in [18]. In the experiment, 300 images are used for training and 100 for testing. Finally, an average accuracy of 85% is achieved. For the same BACH 2018 task, a two-stage CNN model is also proposed in [42], where the first stage is for pixel-level classification and the second stage is for image-level classification. In the experiment, an overall accuracy around 94% is obtained. Similarly, in [25], a two-stage classification approach is proposed.


In the first stage, an AlexNet based feature extraction is applied. In the second stage, three different classifiers are used. In the experiment, a support vector machine (SVM) classifier achieves the best result (99.84% accuracy). Similarly, in [49], AlexNet is also applied as a basic model to build a hierarchical classification model, and an accuracy of 95% is obtained.

Other Tasks
In [45], a Convolutional Neural Network (CNN) model with three hidden layers is built up to segment breast cancer cell nuclei in histopathology images. In this work, 58 H&E stained images are tested, and an overall accuracy around 95% is achieved on both the RGB and Lab color spaces. In [66], a principal component analysis network (PCANet) is introduced to classify ductal carcinoma in situ (DCIS) and usual ductal hyperplasia (UDH) images. In this work, a dataset with 20 DCIS and 31 UDH images is tested, where 10000 patches are randomly sampled from the training set to learn the models. Finally, an accuracy around 79% is achieved. In [6], a novel deep learning structure is introduced to solve a magnification independent breast cancer histopathology image classification task, referring to 40×, 100×, 200×, and 400× images. Finally, an average classification rate of about 80% is achieved. In [67], a deep learning strategy, named ‘stacked sparse auto-encoder’ (SSAE), is presented to detect nuclei in high-resolution breast cancer images. In [29], a deep CNN model is presented to detect breast cancer metastases in sentinel lymph nodes. In the experiment, 100 examples are used for training, 50 for validation, and 75 for testing. Finally, a sensitivity of 99.9% is achieved. In [3], a deep CNN model is trained to classify four breast cancer histopathology types in whole slide images. In the experiment, 249 images are used for training, 20 images are used for testing, and an accuracy of 77.8% is obtained. Additionally, in the work of [47], using the same image set and a data augmentation process, pre-trained ResNet-50, Inception-V3 and VGG-16 networks are fused into a deep learning structure, achieving a mean accuracy of 87.2%. In [32], in order to classify different breast cancer types in H&E stained histopathology images, pre-trained ResNet-50 and ResNet-101 networks are applied with a fine-tuning process and a fusion strategy. In the experiment, the BioImaging 2015 Challenge (BI) dataset and the ICIAR 2018 Grand Challenge (ICIAR) dataset are tested. Finally, 97.22% and 88.5% accuracies are obtained on the BI and ICIAR datasets, respectively. In [65], to classify four breast cancer types in histopathology images, a deep learning method is introduced with hierarchical loss and global pooling. In this work, VGG-16 and VGG-19 networks are applied as the basic deep learning structures, and a dataset with 400 images is tested. In the experiment, 280 images are used for training, 60 for validation and 60 for testing. Finally, an average accuracy around 92% is obtained. In [16], an impressive work is carried out to classify five diagnostic breast cancer styles in whole histopathology images. First, a saliency detector performs multi-scale localization of diagnostically relevant regions of interest in


the images. Then, a CNN classifies image patches as five types of carcinoma. Lastly, the saliency and classification maps are fused for the final categorization. In the experiment, 240 images are used to examine the effectiveness of the proposed method, and a 55% accuracy is finally achieved. The highlight of this work is that 45 pathologists took part in the final evaluation of the test images, and an average accuracy around 65% was obtained. Hence, the performance of the proposed method is comparable to the performances of pathologists that practice breast pathology in their daily routines.

3.2 Summary

From the survey above, we can find that the deep ANN techniques are used more and more since the middle of the 2010s, when nearly all pattern analysis tasks can be solved by them, including image pre-processing, feature extraction, post-processing and classifier design. This development trend is mainly driven by the fast evolution of hardware, which makes it feasible to implement deep ANN algorithms of high computational complexity. In addition, in contrast to traditional handcrafted feature extraction methods, deep ANNs support fully automatic feature extraction approaches, which are more robust in describing the complex morphological characteristics and structures of microscopic breast tissues. Furthermore, more and more transfer learning strategies have been applied in the past three years to solve the small training dataset problem. Therefore, the deep ANNs show a huge potential in the BHIA field.

4 Methodology Analysis and Conclusion

There are many classification methods used in BHIA tasks, like Bayesian and SVM classifiers. In contrast to other classifiers, the ANN classifiers have a very stable development history [44]. Compared to similarity-based classifiers, ANNs are good at working in a high-dimensional feature space. Because of many irrelevant dimensions, similarity-based classifiers fail to appropriately measure similarities in high-dimensional feature spaces. In contrast to probability-based classifiers, ANNs are able to solve the small dataset problem much better. Probability-based classifiers need a large number of image examples to appropriately estimate probabilistic distributions in high-dimensional feature spaces [19]. However, in practical BHIA tasks, it is usually difficult to collect a large and statistically relevant amount of data for training. For these reasons, weight-based classifiers can train different weights (parameters) for different components in a feature vector and construct a decision boundary between images of different data classes based on the margin maximisation principle. Due to this principle, the generalisation error of the ANN is theoretically independent of the number of feature dimensions [60]. Furthermore, a complex (non-linear) decision boundary can be extracted using a non-linear ANN to enhance the classification performance [44]. Hence, ANN classifiers are chosen and applied in many BHIA works


in the past periods for classification tasks. Although SVMs are also weight-based classifiers and can solve the high-dimensional feature space and small dataset problems effectively, they cannot perform image pre-processing, feature extraction or post-processing work as the deep ANNs can. In conclusion, the ANN approaches support not only classification methods, but also other pattern analysis functions. Hence, ANNs are very effective and promising tools for BHIA applications and have a huge potential in the future.

Acknowledgment. We acknowledge the support of the "National Natural Science Foundation of China" (No. 61806047), the "Fundamental Research Funds for the Central Universities" (No. N171903004), and the "Scientific Research Launched Fund of Liaoning Shihua University" (No. 2017XJJ-061). We also thank Dan Xue, whose contribution is considered as important as that of the first author of this paper.

References 1. Acs, B., Rimm, D.: Not just digital pathology, intelligent digital pathology. J. Am. Med. Assoc. 4(3), 403–404 (2018) 2. Anuranjeeta, Shukla, K., Tiwari, A., Sharma, S.: Classification of histopathological images of breast cancerous and non cancerous cells based on morphological features. Biomed. Pharmacol. J. 10(1), 353–366 (2017) 3. Araujo, T., Aresta, G., Castro, E., et al.: Classification of breast cancer histology images using convolutional neural networks. Plos One 12(6), 1–14 (2017) 4. Arevalo, J., Cruz-Roa, A., Gonzelez, F.: Histopathology image representation for automatic analysis: a state-of-the-art review. Revista Med 22(2), 79–91 (2014) 5. Aswathy, M., Jagannath, M.: Detection of breast cancer on digital histopathology images: present status and future possibilities. Inform. Med. Unlocked 8, 74–79 (2017) 6. Bayramoglu, N., Kannala, J., Heikkilae, J.: Deep learning for magnification independent breast cancer histopathology image classification. In: Proceedings of ICPR 2016 (2016) 7. Bejnordi, B., Veta, M., Diest., P., et al.: Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318(22), 2199–2210 (2017) 8. BenTaieb, A., Hamarneh, G.: Predicting cancer with a recurrent visual attention model for histopathology images. In: Proceedings of MICCAI 2018, pp. 129–137 (2018) 9. Bhattacharjee, S., et al.: Review on histopathological slide analysis using digital microscopy. Int. J. Adv. Sci. Technol. 62, 65–96 (2014) 10. Chen, J., Li, Y., Xu, J., et al.: Computer-aided prognosis on breast cancer with hematoxylin and eosin histopathology images: a review. Tumor Biol. 39(3), 1–12 (2017) 11. Chervony, L., Polak, S.: Fast Classification of Whole Slide Histopathology Images for Breast Cancer Detection. Camelyon Grand Challenge 2017 (2017) 12. Ciresan, D., et al.: Mitosis detection in breast cancer histology images with deep neural networks. In: Proceedings of MICCAI 2013, pp. 411–418 (2013) 13. Demir, C., Yener, B.: Automated cancer diagnosis based on histopathological images: a systematic survey. Technical Report, Rensselaer Polytechnic Institute, Department of Computer, TR-05-09 (2005)


14. Du, B., Qi, Q., Zheng, H., et al.: Breast cancer histopathological image classification via deep active learning and confidence boosting. In: Proceedings of ICANN 2018, pp. 109–116 (2018) 15. Gandomkar, Z., Brennan, P., Mello-Thoms, C.: A framework for distinguishing benign from malignant breast histopathological images using deep residual networks. In: Proceedings of SPIE 10718 (2018) 16. Gecer, B., Aksoy, S., Mercan, E., et al.: Detection and classification of cancer in whole slide breast histopathology images using deep convolutional networks. Pattern Recognit. 84, 345–356 (2018) 17. Gil, J., Wu, H., Wang, B.Y.: Image analysis and morphometry in the diagnosis of breast cancer. Microsc. Res. Tech. 59(2), 109–118 (2002) 18. Golatkar, A., Anand, D., Sethi, A.: Classification of breast cancer histology using deep learning. arXiv Breast Cancer Histology Challenge 2018 (2018) 19. Guo, G., Dyer, C.: Learning from examples in the small sample case: face expression recognition. IEEE Trans. Syst. Man Cybern. 35(3), 477–488 (2005) 20. Gurcan, M., Boucheron, L., Can, A., et al.: Histopathological image analysis: a review. IEEE Rev. Biomed. Eng. 2, 147–171 (2009) 21. Han, Z., Wei, B., Zheng, Y., et al.: Breast cancer multi-classification from histopathological images with structured deep learning model. Sci. Rep. 7(4172), 1–10 (2017) 22. He, L., Long, L., Antani, S., Thoma, G.: Computer assisted diagnosis in histopathology. In: Zhao, Z. (ed.) Sequence and Genome Analysis: Methods and Applications, pp. 271–287. iConcept Press, Hong Kong (2010) 23. He, L., Long, L., Antani, S., Thoma, G.: Histology image analysis for carcinoma detection and grading. Comput. Methods Programs Biomed. 107(3), 538–556 (2012) 24. Irshad, H., Veillard, A., Roux, L., Racoceanu, D.: Methods for nuclei detection, segmentation, and classification in digital histopathology: a review - current status and future potential. IEEE Rev. Biomed. Eng. 7, 97–114 (2014) 25. Kiambe, K.: Breast histopathological image feature extraction with convolutional neural networks for classification. ICSES Trans. Image Process. Pattern Recognit. 4(2), 4–12 (2018) 26. Kowal, M., et al.: Computer-aided diagnosis of breast cancer based on fine needle biopsy microscopic images. Comput. Biol. Med. 43(10), 1563–1572 (2013) 27. Lee, G., et al.: Deep learning and color variability in breast cancer histopathological images: a preliminary study. In: Proceedings of SPIE 10718 (2018) 28. Li, Q., Li, W.: Using Deep Learning for Breast Cancer Diagnosis. Technical Report, Chinese University of Hong Kong, China (2017) 29. Litjens, G., et al.: Deep learning as a tool for increased accuracy and efficiency of histopathology diagnosis. Sci. Rep. 6(26286), 1–11 (2016) 30. Liu, Y., Gadepalli, K., Norouzi, M., et al.: Detecting Cancer Metastases on Gigapixel Pathology Images. arXiv Camelyon Grand Challenge 2016 (2017) 31. Loukas, C., Kostopoulos, S., Tanoglidi, A., et al.: Breast cancer characterization based on image classification of tissue sections visualized under low magnification. Comput. Math. Methods Med. 2013, 1–8 (2013) 32. Mahbod, A., et al.: Breast cancer histological image classification using fine-tuned deep network fusion. In: Proceedings of ICIAR 2018, pp. 754–762 (2018) 33. Malon, C., Cosatto, E.: Classification of mitotic figures with convolutional neural networks and seeded blob features. J. Pathol. Inform. 4(8) (2013) 34. Malona, C., et al.: Mitotic figure recognition: agreement among pathologists and computerized detector. Anal. Cell. Pathol. 
35(2), 97–100 (2012)


35. Motlagh, M., Jannesari, M., Aboulkheyr, H., et al.: Breast Cancer Histopathological Image Classification: A Deep Learning Approach. bioRxiv (2018) 36. Mouelhi, A., Sayadi, M., Fnaiech, F.: A supervised segmentation scheme based on multilayer neural network and color active contour model for breast cancer nuclei detection. In: Proceedings of ICEESA, pp. 1–6 (2013) 37. Nahid, A., Kong, Y.: Histopathological breast-image classification using local and frequency domains by convolutional neural network. Information 9(19), 1–26 (2018) 38. Nahid, A., Mehrabi, M., Kong, Y.: Histopathological breast Cancer image classification by deep neural network techniques guided by local clustering. BioMed Res. Int. 2018, 1–20 (2018) 39. Nahid, A., Mikaelian, A., Kong, Y.: Histopathological breast-image classification with restricted boltzmann machine along with backpropagation. Biomed. Res. 29(10), 2068–2077 (2018) 40. Nawaz, M., Sewissy, A., Soliman, T.: Automated classification of breast cancer histology images using deep learning based convolutional neural networks. Int. J. Comput. Sci. Netw. Secur. 18(4), 152–160 (2018) 41. Nawaz, M., Sewissy, A., Soliman, T.: Multi-class breast cancer classification using deep learning convolutional neural network. Int. J. Adv. Comput. Sci. Appl. 9(6), 316–332 (2018) 42. Nazeri, K., et al.: Two-stage convolutional neural network for breast cancer histology image classification. arXiv Breast Cancer Histology Challenge 2018 43. Nejad, E., Affendey, L., Latip, R., Ishak, I.: Classification of histopathology images of breast into benign and malignant using a single-layer convolutional neural network. In: Proceedings of ICISPC 2017, pp. 50–53 (2017) 44. Nielsen, M.: Neural Networks and Deep Learning. Determination Press (2015) 45. Pang, B., Zhang, Y., Chen, Q., et al.: Cell nucleus segmentation in color histopathological imagery using convolutional networks. In: Proceedings of CCPR, pp. 1–5 (2010) 46. Petushi, S., Garcia, P., Haber, M., et al.: Large-scale computations on histology images reveal grade-differentiating parameters for breast cancer. BMC Med. Imaging 6(14), 1–11 (2006) 47. Rakhlin, A., Shvets, A., Iglovikov, V., Kalinin, A.: Deep convolutional neural networks for breast cancer histology image analysis. In: Proceedings of ICIAR 2018, pp. 737–744 (2018) 48. Ramos-Vara, J.: Principles and methods of immunohistochemistry. In: Gautier, J. (ed.) Drug Safety Evaluation. Methods in Molecular Biology (Methods and Protocols), vol. 691, pp. 83–96. Springer, Humana Press, Germany (2011) 49. Ranjan, N., et al.: Hierarchical approach for breast cancer histopathology images classification. In: Proceedings of MIDL 2018, pp. 1–7 (2018) 50. Shallu, Mehra, R.: Breast cancer histology images classification: training from scratch or transfer learning? ICT Express 4(4), 247–254 (2018) 51. Siegel, R., Miller, K., Fedewa, S., et al.: Colorectal cancer statistics, 2017. CA Cancer J. Clin. 67(3), 177–193 (2017) 52. Singh, S., Gupta, P., Sharma, M.: Breast cancer detection and classification of histopathological images. Int. J. Eng. Sci. Tech. (IJEST) 3(5), 4228–4332 (2011) 53. Song, Y., Zou, J., Chang, H., Cai, W.: Adapting fisher vectors for histopathology image classification. In: Proceedings of ISBI 2017, pp. 600–603 (2017) 54. Spanhol, F.: Automatic breast cancer classification from histopathological images: a hybrid approach. Ph.D. thesis. Federal University of Parana, Brazil (2018)


55. Spanhol, F., et al.: Deep features for breast cancer histopathological image classification. In: Proceedings of SMC, pp. 1868–1873 (2017) 56. Spanhol, F., et al.: A dataset for breast cancer histopathological image classification. IEEE Trans. Biomed. Eng. 63(7), 1455–1462 (2016) 57. Spanhol, F., et al.: Breast cancer histopathological image classification using convolutional neural networks. In: Proceedings of IJCNN (2016) 58. Steiner, D., MacDonald, R., Liu, Y., et al.: Impact of deep learning assistance on the histopathologic review of lymph nodes for metastatic breast cancer. Am. J. Surg. Pathol. 42(12), 1636–1646 (2018) 59. Theodoridis, S., Koutroumbas, K.: Pattern Recognition, 4th edn. Elsevier (2009) 60. Vapnik, V.: Statistical Learning Theory. Wiley-Interscience, US (1998) 61. Veta, M.: Breast cancer histopathology image analysis. Ph.D. thesis in Utrecht University, Netherlands (2014) 62. Veta, M., Pluim, J., Diest, P., Viergever, M.: Breast cancer histopathology image analysis: a review. IEEE Trans. Biomed. Eng. 61(5), 1400–1411 (2014) 63. Wang, D., Khosla, A., Gargeya, R., et al.: Deep learning for identifying metastatic breast cancer. arXiv Camelyon Grand Challenge 2016 (2016) 64. Wang, H., Cruz-Roa, A., Basavahally, A., et al.: Cascaded ensemble of convolutional neural networks and handcrafted features for mitosis detection. In: Proceedings of SPIE 9041 (2014) 65. Wang, Z., Dong, N., Dai, W., et al.: Classification of breast cancer histopathological images using convolutional neural networks with hierarchical loss and global pooling. In: Proceedings of ICIAR 2018, pp. 745–753 (2018) 66. Wu, J., Shi, J., Li, Y., et al.: Histopathological image classification using random binary hashing based PCANet and bilinear classifier. In: Proceedings of EUSIPCO, pp. 2050–2054 (2016) 67. Xu, J., Xiang, L., Liu, Q., et al.: Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans. Med. Imaging 35(1), 119–130 (2016) 68. Zhang, Y., Zhang, B., Coenen, F., Lu, W.: Breast cancer diagnosis from biopsy images with highly reliable random subspace classifier ensembles. Mach. Vis. Appl. 24(7), 1405–1420 (2013) 69. Zhang, Y., Zhang, B., Lu, W.: Breast cancer classification from histological images with multiple features and random subspace classifier ensemble. In: Proceedings of AIP 1371, no. 1, pp. 19–28 (2011) 70. Zhang, Y., Zhang, B., Lu, W.: Breast cancer histological image classification with multiple features and random subspace classifier ensemble. In: Pham, T.D., Jain, L.C. (eds.) Knowledge-based Systems in Biomedicine, SCI 450, pp. 27–42. Springer, Germany (2013) 71. Zhi, W., Yueng, H., Chen, Z., et al.: Using transfer learning with convolutional neural networks to diagnose breast cancer from histopathological images. In: Proceeding of ICONIP 2017, pp. 669–676 (2017)

Image Analysis

Descriptive Seons: Measure of Brain Tissue Impairment

Artur Przelaskowski1(B), Ewa Sobieszczuk2, and Izabela Domitrz3

1 Faculty of Mathematics and Information Science, Warsaw University of Technology, 75 Koszykowa st., 00-662 Warsaw, Poland, [email protected]
2 Department of Neurology, 1st Faculty of Medicine, Medical University of Warsaw, 1a Banacha st., 02-097 Warsaw, Poland, [email protected]
3 Department of Neurology, 2nd Faculty of Medicine, Medical University of Warsaw, 80 Ceglowska st., 01-809 Warsaw, Poland, [email protected]

Abstract. In this paper, a new numerical interpretation of CT scans to aid stroke care is considered. Semantic models are used to explain the essence of the observed reality contained in complex structures and imaged features, in order to have full cognition of the reality under investigation. The proposed concept and particular implementation of a cognitive seon is based on integrated, image-based descriptors of CT diagnostic imaging. Brain tissue impairment was numerically measured to characterize ischemic stroke severity, dynamics and extent. In consequence, emergent stroke decisions could be supported using the seon of the integrated descriptive components to predict stroke treatment output. Respective experiments with a database of 145 strokes and controls have confirmed the usefulness and significant efficiency of the specific seon, while the correlation of the descriptive measurements with the ground truth of clinical assessments of stroke cases was satisfactory. The most highlighted contribution is the model-based interpretation of the stroke problem with insightful data-driven parametrization.

Keywords: Semantic descriptors · Ischemia patterns · Cognitive models · Medical image interpretation · Computerized decision support · Stroke care

1 Introduction

Prognostic model of acute stroke development applies primarily to severity and dynamics of ischemia process, both defined in the relationship of causes and effects. The effects are described in the form of clinical and neurological symptoms confirmed and described subjectively in expert neurological assessment (mainly NIHSS1 ). Their specificity points to the supposed causes [1], while the 1

NIH Stroke Scale for quantifying stroke severity.

c Springer Nature Switzerland AG 2019  E. Pietka et al. (Eds.): ITIB 2019, AISC 1011, pp. 237–248, 2019. https://doi.org/10.1007/978-3-030-23762-2_21

238

A. Przelaskowski et al.

reliably measured intensity allows to conclude on the stroke severity level. Knowing the causes, it is better to define real or hypothetical extent of ischemia, confirmed dynamically. The symptomatic causes of the ischemia (i.e. involved artery, location of ischemic lesion and pathomechanism of the ischemia) might be confirmed or better diagnosed using imaging techniques. Radiological opinions allow to better understand the symptoms, but also more precisely assess the causes by observing the changes taking place in the brain tissue. However, time tendency of ischemia progress which covers the speed and expected limitation of its development is difficult to assess without reliable estimates of intensity in fairly accurate timestamps. The problem is the limitations of the imaging systems that can be used, primarily in relation to the sensitivity of measuring the distribution of tissue density (CT), accuracy of impaired blood flow (perfusion CT or MR), precision of vascular status (angiography CT or MR) and sensitivity of diffusion (DWI). The use of dynamic imagery allows to assess the progression of ischemic processes, ascertaining the dynamics of disease development. Especially, serial imaging is suggested future technology [2]. However, only NCCT (Non-Contrast CT) has in fact wide accessibility and applicability. The unclear nature of the ischemic changes in local and global blood supply to the brain makes brain tissue fate difficult to evaluate and prognose. There are no adequately controlled studies to conclude the constancy of the time course for evolution through the stages of ischemic cell change [3]. But even if not as visible signs in the image, it is possible to differentiate impaired tissue using advanced computer-based image analysis [4]. It is because the significant impairment of blood flow, hemodynamically distributed in brain circulatory system changes generally any fluid flow and distribution in cerebral tissue. So that the specificity of focal ischemia can be determined according to sensitive CT-based assessment of global brain response to the perturbations of integrated cerebral blood flow system. But the assessment should refer to established tissue properties that can be reliably calculated, and which are crucial in describing the dynamics, severity and extent of ischemia development. Because of that, physical and technological limits of the contemporary imaging systems resulting in ambiguity of the provided information may be reduced having regard to biological reasons and clinical determinants of performed observations used to explain more deeply and individually image-percepted or calculated information. Detailed and specific measurement of CT scan-distributed response of whole brain volume is used as substitute of expensive, burdensome and difficult to access imaging case study after the time. Analysis of mutual relations and relationships between uniformity of tissue density distribution, multifocal clusters of density centers, and subtle texture differentiation in volume leads to possibly complete picture of ischemia. Diverse assessment of density distribution enables estimation of dynamics of ischemia progress but the specificity of density centers can be properly interpreted only in relation to total density changes to interpret stroke severity. A context of subtle texture characteristics projected to density approximant points out the extent of tissue impairment in addition. 
Therefore, effective integration of all these descriptors could allow a prognostic assessment

Descriptive Seons: Measure of Brain Tissue Impairment

239

of the development and effects of the disease potentially to aid decision of stroke confirmation, hospital admission or thrombolysis application. Our proposal is numerical measurement of global brain tissue response contained in CT imaging. Decisive cognition components were selected and integrated in one descriptive model to conclude the state of the ischemic brain injury. 1.1

Concept of Ischemia Seon

According to general concept of empirical model building [5], semantic descriptors are used to sparsify-by components and consequently to simplify the problem defining kernel understanding of its specificity instead of data-dependent uncontrolled assumptions. Compound analysis of imaged ischemia was implemented basing on semantic descriptors of stroke severity, dynamics and extent in a form of descriptive seon. The seon of ischemia is defined as integrated description of this pathology, rooted in domain knowledge adapted to the specifics of the CT imaging method used. In consequence, a set of matched and effectively integrated cognitive components constitutes the calculation model of specific medical nature in a form of the seon to facilitate urgent decisions related to urgent clinical actions, challenging diagnostic interpretations or responsible therapeutic, e.g. thrombolytic procedures. A design and implementation of seon-based ischemia model to aid diagnosis (cognitive model) or therapy (prognostic model) focus primarily on CT-based computational analysis under specialist knowledge and clinical control. The next steps of the completed model implementation are as follows: (a) formal description of domain knowledge including the comprehensive description of causeand-effect relationships regarding the development of pathology at the possibly fundamental level, (b) searching for sensitive measurement methods that give the greatest cognitive benefit, (c) determination and implementation of the significant cognitive descriptors which model the problem as comprehensively and universally as possible, (d) integration of cognitive components in the form of the seon based on verified clinical knowledge. The model parametrization, learning and verification procedures relate to reliable observations and medical assessments used to establish ground truth (GT) in terms of severity, dynamics and extent description, stroke confirmation and clinical output verification. The GT was established as a consensus of neurologists and clinicians in a retrospective analysis of representative strokes and controls.

2

Implementation of the Ischemia Seon

The primary objective was to enhance CT scans so that the ischemia occurrence and progression could be measured if there is even the slightest trace of stroke, tissue impairment or any consequences of blood flow dysfunction. Scan processing was based on multiscale image decomposition with nonlinearly approximated

240

A. Przelaskowski et al.

sparse, multicomponent representation of decisive information. Strengthen manifestation of tissue impairment in processed brain images was used to approximate the synthetic state of whole brain functioning through the integration of interscan deficit specificity of the cognitive model. We assumed hypothetically that volumetric and scalable specificity of enhanced brain tissue distribution makes it possible to implement. Locally varying characteristics of approximated tissue density and patterns related to more general texture trends monitored across the whole brain at the time of measurement were investigated as a comprehensive picture of the phenomenon able to conclude prognostic model of thrombolytic treatment. Sensitively measured key properties of impaired tissue across volumetric scan data of CT imaging were normalized and adjusted to the formal conditions of the diagnosis and key premises of therapeutic decisions in learning process. Basing on that, decisive suggestions to support clinical stroke protocols in emergency were formulated (Fig. 1).

Fig. 1. The implementation of the ischemia seon to prognose the efficiency of stroke treatment

In details, we propose: – descriptor of ischemia extent (ExtDesc) used for subtle characteristics of tissue texture differentiation globally; statistically-identifiable deviations of

Descriptive Seons: Measure of Brain Tissue Impairment

241

various texture properties are calculated to estimate volumetric differentiation of the imaged tissue; basing on that, a sensitive sensor of volumetric tissue impairment was optimized to represent extent of blood supply changes; the proposed construction includes selected local Haralick’s features with the contexts adjusted to more or less local signs of edema; – stroke severity descriptor (SevDesc) designed to analyze volumetric density distribution basing on non-linearly, coarsely approximated tissue density in a scaled base of surfacelets2 [6]; adjusted representation of the subsequent density maps (isodense maps or isomaps defined later in this section) with scale progression underlines specificity of possible tissue impairments to measure general statistics of approximated density distribution and the parameters of respective isomaps over volume scans of both hemispheres; the implemented vector of severity characteristics includes the statistics of the tissue density distributions in the left and right binarized iso-hemispheres (i.e. pseudo hemispheres estimated for each isomap to extract the asymmetry of density levels) or in individual hemispheres and relative to each other; – descriptor of tissue impairment dynamics (DynDesc) used to precisely monitor any variation of locally approximated tissue density in TVL1-L2 variational image reconstruction based on partial Fourier basis [7]; for this purpose spatially distributed singularities and smooth discontinuities of scan data are estimated to notice any slight change in the structure manifestation representing sensitively isodense swelling because of altered perfusion([8]); to calculate it, successive isomaps adaptively estimated for each slice of the examination were used to specify detailed density patterns of cerebral structures and diversified tissue properties. The design and implementation of the integrated measure of ischemia, which will effectively define the abovementioned elements of the collective description of the explored brain tissue was realized using supervised learning process, serving selection of a set of effective components of the descriptor and the synthesis of the final description of ischemia in terms of a sudden diagnosis and therapeutic decision necessary. Substantive Rules of Descriptive Computations Let the CT scan volume ×N of the brain imaging be represented as a sequence of data matrices F(k) ∈ ZN + with N = 512 indexed by the scan number k = 1, . . . , K. The single scan is (k) defined as intensity image fx,y where x, y ∈ ΩF(k) are Cartesian coordinates of scan domain points in Euclidean space. Any descriptive analysis of the scans should be directed to prior segmented regions of stroke-susceptible tissue in subsequent slices. For this purpose, simple two step segmentation S was applied, i.e.: (a) soft thresholding of F(k) with a window extending from 960 to 1080 of intensity (−40 HU to 80 HU), (b) morphological erosion for smoothing the shape 2

² Family of filter banks with angular resolution iteratively refined by invoking more levels of decomposition was used to efficiently capture and represent surface-like singularities.


of the initially segmented regions to eliminate any artifacts. The obtained compact representation of the volume of interest is defined as V(k) = S(F(k)) with the segmented ROI support supp V(k) = {(x, y) : v_{x,y} > 0} for each k. Next, the brain orientation in the scan images was estimated from its main axis, and the left and right hemispheres were reliably segmented, to minimize inaccuracies caused by possibly unstable patient placement during the examination and to account for the specificity of the case. The idea of centroidal principal axes referring to the subject's unique brain structure was adopted [9]. The implemented method is insensitive to spatially uniform random noise and to the positioning of the head, reflecting the symmetry level of the cerebral hemispheres. In brief, the geometric properties of the sliced brain were first characterized by locating centroids of the region element coordinates (x, y) ∈ supp V(k) for subsequent slices. Additionally, the eigenvalues and eigenvectors of the covariance matrix of the brain coordinates were used to estimate the orientation of the brain axis and its shift relative to a global reference point. The principal axes of the successive slices depend on their geometric shape, while the origins of the axes coincide with their geometric centers.

The descriptor of ischemia extent was defined by selected scan features extracted from V(k), describing the diversity of the brain texture across the whole volume. The set of selected Haralick features includes: autocorrelation, contrast, correlation, cluster prominence, cluster shade, dissimilarity, energy, entropy [10], variance, sum average, sum variance, sum entropy, difference variance, difference entropy, information measure of correlation [11], inverse difference normalized and inverse difference moment normalized [12]. All of them were calculated for both hemispheres separately and for the whole brain. The most effective elements of the ExtDesc are: autocorrelation, sum average, energy, dissimilarity and inverse difference normalized, calculated for both hemispheres separately and together.

In addition, a multiscale image decomposition, defined by the matrix Φ of a sparsifying transformation, was applied to extract and characterize the tissue density distribution used for the design of the other two descriptors. In the case of SevDesc, Φ represents a scalable basis of surfacelets, while a partial Fourier transform was used for the measurement of ischemia dynamics (DynDesc). Next, a nonlinear approximation with the hard-thresholding operator H, parametrized with h adjusted in the learning procedure, was applied in the transform domain. The intended effect was the subsequently approximated density scans D(k) = H(F(k)Φ, h) with subtle differentiation and extraction of sparse tissue patterns, focuses or local orientations. An exemplary effect of processing the scans is presented in Fig. 2. Tissue density analysis was based on a set of isomaps extracted from D(k), where subsequent isomaps were used to characterize the distribution of tissue density slice-by-slice in dominant subranges of the processed scan intensity.
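The two-step segmentation, the centroidal principal-axis estimation and the GLCM-based texture features described above lend themselves to a compact implementation. The following is a minimal sketch, assuming NumPy, SciPy and scikit-image; the function names, the intensity window values and the GLCM parameters (single distance, two angles) are our own illustrative choices, not the authors' implementation.

import numpy as np
from scipy import ndimage
from skimage.feature import graycomatrix, graycoprops

def segment_brain(scan, lo=960, hi=1080, erosion_iter=2):
    # (a) soft-threshold window of approx. -40..80 HU in scanner units, (b) erosion
    mask = (scan >= lo) & (scan <= hi)
    return ndimage.binary_erosion(mask, iterations=erosion_iter)

def principal_axes(mask):
    # Centroid and principal axes of one slice from the coordinate covariance matrix
    ys, xs = np.nonzero(mask)
    coords = np.stack([xs, ys]).astype(float)
    centroid = coords.mean(axis=1)
    evals, evecs = np.linalg.eigh(np.cov(coords))  # columns of evecs span the axes
    return centroid, evals, evecs

def texture_features(slice_u8, mask):
    # A few Haralick-type GLCM features of the masked slice (8-bit image assumed)
    roi = np.where(mask, slice_u8, 0).astype(np.uint8)
    glcm = graycomatrix(roi, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    return {p: float(graycoprops(glcm, p).mean())
            for p in ("contrast", "correlation", "dissimilarity", "energy")}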


Fig. 2. Representations of sample scan data used to calculate the descriptors integrated in ischemia seon (left to right): source scan (extent), its approximation with surfacelets (severity) and partial Fourier-based reconstruction (dynamics)

A brief description of the method for determining the set of isomaps T_{k,i} ∈ {0,1}^{N×N}, i = 1, . . . , I, where I is the number of adaptively adjusted dominant isomaps determined for scan k, includes: (a) calculating the I relevant peaks h_i of the D(k) histogram with more than 1200 counts of the normalized histogram, (b) fixing binary maps t(k,i)_{x,y} over the scan domain, where t_{x,y} = 1 ⇔ d_{x,y} = arg h_i.

Next, the concept of centroidal principal axes was applied to the brain sub-density in order to analyze the geometry of the isodensity maps representing the slice tissue distribution within the various subranges of density, because subtle changes of the density distribution affect the geometry of the subsequent isomaps. The asymmetry of size and shape differences between successively designated "isodense hemispheres" (iso-hemispheres) was measured on the basis of the calculated isomap-specific region axis referenced to the main axis. The identified principal axes and centroids of the respective iso-regions, the slopes and offsets of the axes, and the eigenvectors and eigenvalues were statistically analyzed to estimate asymmetry rates of the tissue density distribution on subsequent levels of intensity. Feature vectors formulated on this basis proved really effective in defining and recognizing the hypodense trace of the edema, which is highly specific for ischemic tissue damage.

The descriptor of ischemia severity is defined primarily using the size specificity of the left and right iso-hemispheres (LH_i and RH_i, i = 1, . . . , I) calculated for subsequent scans. Moreover, the statistics calculated on the tissue density distributions in individual hemispheres and relative to each other characterize the differences in tissue density more generally. This descriptor contains the following vector components:

• the number of isomaps determined in subsequent scans;
• |1 − mean(LH)/mean(RH)|;
• |1 − mean(RH|_C / LH|_C)| and mean(RH|_C / LH|_C), where the condition C restricts the iso-hemispheres to arg(·) < 0.5·max(·) ∧ arg(·) > arg(max(·));
• var(LH_i/RH_i); var(|LH_i − RH_i| / |LH_i + RH_i|); rms(LH_i/RH_i); rms(|LH_i − RH_i| / |LH_i + RH_i|); iqr(|LH_i − RH_i| / |LH_i + RH_i|)³.
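A possible reading of the isomap construction and of the iso-hemisphere asymmetry statistics listed above is sketched below; peak detection with scipy.signal.find_peaks, the use of raw counts against the 1200 threshold and the helper names are our assumptions, not the published implementation.

import numpy as np
from scipy.signal import find_peaks

def extract_isomaps(d, min_counts=1200):
    # Binary isomaps T_{k,i}: one map per dominant peak h_i of the density histogram
    # (d is the hard-thresholded density scan, assumed quantized to integer levels)
    hist = np.bincount(d[d > 0].astype(int).ravel())
    peaks, _ = find_peaks(hist, height=min_counts)
    return [(int(h), d == h) for h in peaks]

def asymmetry_stats(lh, rh):
    # var / rms / iqr of the relative size difference of left/right iso-hemispheres
    lh, rh = np.asarray(lh, float), np.asarray(rh, float)
    r = np.abs(lh - rh) / (lh + rh)
    q75, q25 = np.percentile(r, [75, 25])
    return {"var": r.var(), "rms": np.sqrt((r ** 2).mean()), "iqr": q75 - q25}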

The other features of this descriptor do not use the isomap representation. These are selected statistics quantifying the differentiation of the tissue density distributions of both hemispheres with respect to each other and in relation to the properties of the entire brain, such as: entropy, variation, kurtosis, skewness, energy, joint entropy, three Tamura texture attributes [13] and four basic GLCM⁴ features (contrast, correlation, energy, homogeneity). The most effective components of this descriptor are: (a) the root-mean-square and variance calculated for ratios of the difference to the sum of the sizes of both iso-hemispheres extracted in the scan, (b) the ratio of the entropies of both hemispheres, (c) the ratio of Tamura's first attributes of both hemispheres, (d) the ratio of entropies calculated for the respective context of the GLCM structure of both hemispheres, (e) the ratio of homogeneities calculated for the respective context of the GLCM structure of both hemispheres, (f) the ratio of entropy to Tamura's second attribute calculated for the entire brain.

The descriptor of ischemia dynamics concentrates on differential analysis of the geometric properties of iso-regions defined in successive isodensity maps. The diversified asymmetry of the successive isodense maps was sensitively characterized by sequentially determining centroid distances, absolute differences of principal iso-axis coordinates, attitudes of the axis directional factors referenced to the main brain axis, the slopes and offsets of the principal iso-axes, distributions of eigenvalues and eigenvectors across successive isomaps, and point projections on eigenvectors. The following statistics were calculated: average, variance, median, skewness, energy and correlation coefficient. In addition, a 1D Fourier transform (FT) was calculated across the columns of the binary map representing the geometry of the isomaps. The most effective components of this descriptor relate to: (a) statistics of relative isomaps' centroids, i.e. distances between the centroids of the extracted isomaps and the centroid of the brain ROI, (b) statistics estimated for the distributed angles of slopes calculated for the principal axes of the isomaps, (c) relative statistics defined for the quotient of imaginary energy to real energy of the 1D FFT calculated for the columns of each isomap.

To conclude, the ischemia seon was implemented to measure tissue impairment, predict tissue fate during the acute phase and estimate the final tissue outcome (infarct/recovery), affecting stroke confirmation and influencing treatment applicability. A numerical description of ischemia specificity could be useful to predict a favorable outcome of the available treatment. Such a reliable measurement of ischemia would be helpful in making clinical decisions that maximize benefits and

³ var – variance, rms – root-mean-square, iqr – interquartile range.
⁴ Grey-Level Co-occurrence Matrix.


minimize losses of thrombolysis. Therefore, the efficiency assessment was limited to verification of the effects of accurate ischemic stroke recognition and reliable prediction of treatment usability.

3 Experimental Setup

Implementation of the proposed method was optimized in terms of (a) the conceptual and numerical efficiency of the model and its universality, (b) correlation to the experimentally established clinical patterns of ischemia severity, dynamics and extent, (c) classification procedures to recognize stroke and predict the usefulness of the therapy. Learning of the seon-based model was realized with a lasso selector to define the effective semantic components of each descriptor to be integrated in the ischemia seon. Next, the weights of such a complex model were established with multiple linear regression. Finally, a linear discriminant classifier was applied to recognize stroke and predict the applicability of treatment.
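A minimal sketch of this learning chain, assuming scikit-learn and a feature matrix X with one row per case, is given below; the alpha value and function name are placeholders and the authors' exact selector settings are not reproduced here.

import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fit_seon_model(X, y_pattern, y_class, alpha=0.01):
    # 1) lasso picks the effective semantic components of a descriptor
    selected = np.flatnonzero(Lasso(alpha=alpha).fit(X, y_pattern).coef_)
    Xs = X[:, selected]
    # 2) multiple linear regression establishes the component weights
    weights = LinearRegression().fit(Xs, y_pattern)
    # 3) a linear discriminant classifier recognizes stroke / treatment applicability
    classifier = LinearDiscriminantAnalysis().fit(Xs, y_class)
    return selected, weights, classifier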

3.1 Representatives of Ground Truth Data

Reliable datasets having the essential characteristics of tissue impairments of various intensity, developmental dynamics and consequences were used to learn and verify the model implemented by means of seons. A representative collection of cases was formed on the basis of significantly different cases. The NCCT examinations were carried out in clearly different procedural and technological conditions over many years in order to optimize and verify the semantic component model constructed using the seon of ischemic tissue impairments. The context of its use is the diagnosis and treatment of stroke. The complete dataset of 145 normal and stroke cases was collected in two medical centers (MC1 and MC2) at an interval of more than 10 years: 2005–2008 at MC1 and 2013–2015 at MC2, respectively. Four CT scanners of different generations were used to verify the independence of the constructed model from the conditioning of NCCT imaging. In detail, the group of 71 stroke cases had a mean age of 67 years (29–83), with 50% male patients. The median time from onset to first CT was approximately 3 h (45 min–8 h). The control group of 74 cases consisted of patients with non-confirmed stroke and with other neurological or general non-neurological diseases, with a mean age of 64 years (26–97) and 54% males. The complete dataset of 145 cases, considered representative enough, was first used to learn the seon-based model. In addition, we selected a distinctive subset of 51 older (2005–2008) and lower quality cases (acquired with older generations of tomographs) collected in only one center (MC1). The characteristics of the selected subset are as follows: 35 strokes with a mean age of 76.7 years (52–92), 65% males and a median time to first CT of approximately 3 h 50 min (1 h–8 h); 16 controls with a mean age of 65.3 years (31–88), 69% males. The selected subset of reference data was used to verify the universality of the implemented model, i.e. its efficiency was measured only on this specific subset of the training datasets.


Follow-up clinical data of the test cases, with accurate assessments of ischemia intensity and dynamics, confirmation of stroke, and treatment efficiency and results, were used to formulate the GT reference for the calculated measures of impairments, diagnostic indications and treatment predictors. The GT was defined according to the common opinion of 2 clinicians participating in the experiments. The proposed pattern of severity was formulated as a combination of the normalized neurological scores of the input NIHSS and ABCD2⁵, taking into account age and the verified state of risk factors. In turn, the dynamics pattern was estimated depending on the time between symptom onset and the CT examination, the scores of the input and output NIHSS and patient age, while the ischemia extent ground truth was based on the relation of the NIHSS input/output and ABCD2 scores. A pattern of treatment usefulness (beneficial or not) was retrospectively assessed as the common opinion of the clinicians: confirming (or not) and grading the usefulness of the applied therapy (i.e. improved prognosis in the follow-up assessment), or recommending (or not) therapy for the test cases of stroke untreated with thrombolysis in the analyzed dataset. These suggestions were supported by a detailed analysis of stroke progression in the context of all documented conditions, based on the clinical evidence available in the hospital centers involved in the study, agreed by the neurologists and radiologists supervising the constituted clinical evidence.

3.2 Procedures and Results

Retrospective verification of the proposed method was applied in patient-oriented procedures of (a) correlation to the severity, dynamics and extent patterns, (b) stroke recognition and prediction of treatment usability in clinical practice based on the established consensus of clinicians. The Pearson correlation coefficient was calculated for the assessment of the impairment measure, while leave-one-patient-out cross-validation was applied for the complete dataset. Two indicators of classification efficiency were used: the Area Under the Curve (AUC) computed from the estimated ROC curve and the Correct Rate (CR) determining the effectiveness of the automatic suggestions that relate to decision-making processes in confirming the stroke or the usability of thrombolytic therapy. Selected results of the experiments verifying the effectiveness of the optimized method are presented in Table 1. The achieved efficiency is high enough to conclude the usefulness of the proposed seon-based model of brain tissue impairments for both potential applications. Almost perfect stroke recognition and effective therapy prognosis do not depend on the subset of testing cases, confirming the universality of the implemented model. Among the proposed descriptors, the descriptor of ischemia extent proved to be the most effective because of its versatility. The other two showed a significant decrease in their usefulness in the tests on the subset.

⁵ The ABCD2 is used to identify patients at high risk of stroke following a TIA.
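The evaluation protocol can be expressed compactly as below; this is an illustrative sketch assuming SciPy/scikit-learn and one feature vector per patient (so plain leave-one-out equals leave-one-patient-out), not the authors' code.

import numpy as np
from scipy.stats import pearsonr
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score, accuracy_score

def evaluate(measure, pattern, X, y):
    # Correlation of a seon measure with the clinical pattern (y assumed binary 0/1)
    r, _ = pearsonr(measure, pattern)
    # Leave-one-patient-out AUC and Correct Rate of an LDA classifier
    proba = cross_val_predict(LinearDiscriminantAnalysis(), X, y,
                              cv=LeaveOneOut(), method="predict_proba")[:, 1]
    return r, roc_auc_score(y, proba), accuracy_score(y, proba > 0.5)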


Table 1. The experimental verification of the implemented ischemia seon based on the semantic descriptors. Calculated measures of severity, dynamics and ischemia extent were correlated to the clinically defined patterns. Stroke recognition and the efficiency of thrombolytic therapy were tested on the complete dataset and its selected subset

Descriptors of impairment   Correlation (Complete)   Correlation (Subset)
ExtDesc                     0.93                     0.92
SevDesc                     0.83                     0.60
DynDesc                     0.92                     0.80

Integrated ischemia seon    Diagnosis (AUC / CR)   Therapy (AUC / CR)
Complete dataset            0.99 / 0.99            1.0 / 1.0
Selected subset             1.0 / 1.0              0.98 / 1.0

4 Achievements to Highlight

In this paper, we extended the measurable stroke-affected areas to the whole imaged volume of soft tissue prone to ischemia. Inspired by previous results of NCCT texture analysis demonstrating that non-lesional tissue of patients is statistically different from normal tissue of controls [4], we proposed a differentiated analysis of dynamic tissue impairment based on the spatially distributed intensities of the imaged brain tissue. We therefore hypothesized that integrated global and local volume characteristics of tissue fate are potentially more predictive of the ischemia outcome than the regional analysis suggested, among others, by [14]. This is due to the fact that hemodynamic changes following ischemic attack onset concern the majority of the brain tissue, including the balancing of collateral circulations of blood flow and regional compromises, but also the subsequent accumulation of excess water inside the brain tissue, local compensation of edema, reductions in the volume of cerebrospinal fluid, etc. [15]. Our proposal of the seons refers to a complex, multicomponent analysis of imaging effects understood in the context of biological or cell-based approaches to modeling stroke ischemia [16]. Aiding stroke confirmation and accurate prediction of treatment efficiency were verified positively and shown to be useful. The proposed design framework for optimization of computerized assistance of stroke care can be expanded both in depth, by perfecting the proposed model and the computational representations of the NCCT-based seons, and outward, by choosing the method of its integration with clinical observations projected into prospective practice. Other applications in the field of computer-aided diagnosis and clinical decision support are also possible.

Acknowledgment. This publication was funded by the National Science Centre (Poland) based on the decision DEC-2011/03/B/ST7/03649.

References

1. Ciszek, B., Jozwiak, R., Sobieszczuk, E., et al.: Stroke Bricks – spatial brain regions to assess ischemic stroke location. Folia Morphol. 76(4), 568–573 (2017)


2. Ip, H.L., Liebeskind, D.S.: The future of ischemic stroke: flow from prehospital neuroprotection to definitive reperfusion. Interv. Neurol. 2, 105–117 (2013)
3. Pulsinelli, W.A.: Selective neuronal vulnerability and infarction in cerebrovascular disease. In: Welch, K.M.A., Caplan, L.R., et al. (eds.) Primer on Cerebrovascular Diseases, pp. 104–107. Gulf Professional Publishing (1997)
4. Oliveira, M.S., Fernandes, P.T., Avelar, W.M., et al.: Texture analysis of computed tomography images of acute ischemic stroke patients. Braz. J. Med. Biol. Res. 42(11), 1076–1079 (2009)
5. Thompson, J.R.: Empirical Model Building: Data, Models, and Reality. Wiley, Hoboken, New Jersey (2011)
6. Lu, Y.M., Do, M.N.: Multidimensional directional filter banks and surfacelets. IEEE Trans. Image Proc. 16(4), 918–931 (2007)
7. Yang, J., Zhang, Y., Yin, W.: A fast alternating direction method for TVL1-L2 signal reconstruction from partial Fourier data. IEEE J. Sel. Top. Signal Process. 4(2), 288–297 (2010)
8. Muir, K.W., Baird-Gunning, J., Walker, L., et al.: Can the ischemic penumbra be identified on noncontrast CT of acute stroke? Stroke 38, 2485–2490 (2007)
9. Levy, A.V., Brodie, J.D., Russell, A.G., et al.: The metabolic centroid method for PET brain image analysis. J. Cereb. Blood Flow Metab. 9, 388–397 (1989)
10. Soh, L., Tsatsoulis, C.: Texture analysis of SAR sea ice imagery using gray level co-occurrence matrices. IEEE Trans. Geosci. Remote Sens. 37(2), 780–795 (1999)
11. Haralick, R.M., Shanmugam, K., Dinstein, I.: Textural features for image classification. IEEE Trans. Syst. Man Cybern. SMC-3(6), 610–621 (1973)
12. Clausi, D.A.: An analysis of co-occurrence texture statistics as a function of grey level quantization. Can. J. Remote Sens. 28(1), 45–62 (2002)
13. Karmakar, P., Teng, S.W., Zhang, D., et al.: Improved Tamura features for image classification using kernel based descriptors. In: Proceedings of the IEEE International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp. 1–8 (2017)
14. Scalzo, F., Hao, Q., Alger, J.R., Hu, X., Liebeskind, D.S.: Regional prediction of tissue fate in acute ischemic stroke. Ann. Biomed. Eng. 40(10), 2177–2187 (2012)
15. Krieger, D.W., Demchuk, A.M., Kasner, S.E., Jauss, M., Hantson, L.: Early clinical and radiological predictors of fatal brain swelling in ischemic stroke. Stroke 30, 287–292 (1999)
16. Canazza, A., Minati, L., Boffano, C., Parati, E., Binks, S.: Experimental models of brain ischemia: a review of techniques, magnetic resonance imaging, and investigational cell-based therapies. Front. Neurol. 5(19), 1–15 (2014)

An Automatic Method of Chronic Wounds Segmentation in Multimodal Images

Joanna Czajkowska(B), Marta Biesok, Jan Juszczyk, Agata Wijata, Bartlomiej Pyciński, Michal Krecichwost, and Ewa Pietka

Faculty of Biomedical Engineering, Silesian University of Technology, 40-800 Zabrze, Poland
{joanna.czajkowska,marta.biesok,jan.juszczyk,agata.wijata,bartlomiej.pycinski,michal.krecichwost,ewa.pietka}@polsl.pl

Abstract. Chronic wounds are common diseases in an aging society. An automatic image segmentation method is required to effectively and objectively monitor the healing process. The segmentation method proposed in the paper employs Histograms of Oriented Gradients, Weighted Fuzzy C-Means Clustering, Edge Detection, Gradient Vector Flow and Active Contour techniques. The method gives high compliance with manual outlines performed by two experts. The mean Dice Index for 11 cases was 0.84. The obtained results indicate the possibility of automation of diagnosis and monitoring processes. An infrared image reveals the parts of the wound under the skin which are invisible to commonly used cameras, and it might give physicians valuable information for the selection of treatment.

Keywords: Wound segmentation · Image processing · Active contour · Histogram of oriented gradients

1 Introduction

The skin, the largest organ in the human body, fulfills a protective function. It is the outer shell of the body, and its disruption can lead to infections and further complications if not treated properly. The time of wound healing varies depending on its location, size, depth, and type. When the patient is affected by pathological factors (e.g. diabetes, cancer, radiation), wounds can transform into chronic ones [1]. Chronic wounds can be divided into 3 main categories: pressure sores, diabetic ulcers, and venous ulcers [2]. The mean age of patients suffering from persistent wounds is greater than 60 years. Nowadays, when the population is aging, the estimated number of patients with chronic wounds is growing rapidly [3]. The gold standard in wound assessment is the biopsy of the wound tissue, but there are also many non-invasive methods in clinical use, inter alia: laser Doppler


imaging, indocyanine green videoangiography, near-infrared spectroscopy, in vivo capillary microscopy, orthogonal polarization spectral imaging, reflectance-mode confocal microscopy, hyperspectral imaging, optical coherence tomography, laser speckle imaging, and photoacoustic microscopy [4]. They are based on the analysis of different types of images. The most easily available imaging technique, however, is a simple video camera, yet digital planimetry still requires the delineation of the wound by a clinician. Wound segmentation can be performed automatically using various image-based methods. Some of the most common methods are based on different variants of active contours [23]. There are also methods based on fuzzy divergence [5], color histograms [6], region growing [7,8], the differential evolution algorithm [9], and even artificial neural networks [10] or Bayesian classifiers [11]. Many of them use images in the HSV (Hue-Saturation-Value) color space [5,7]. Some features can be based on other color spaces, such as RGB, YIQ, and YCbCr [15,16]. A significant part of the wound is placed under the surface of the skin and is therefore invisible. By using infrared thermal imaging it is possible to measure the difference between healthy and damaged tissues also under the skin. Information about wound temperature is useful for predicting the healing time [12] and determining the depth of a burn injury [13]. In this study a new semiautomatic method for evaluating the area of the wound using infrared (IR) and visible light (VIS) images is designed. Due to the diversity of wound etiology, appearance, and locations, an automatic analysis of their images is impeded.

2 Materials and Methods

The database consists of eleven images of lower limbs' chronic wounds from five patients. The VIS images were recorded using a Fujifilm X-T1 camera and the IR images with a FLIR A300, in accordance with the method described in the paper [14], to provide a common field of view and to perform the fusion between the modalities.

2.1 Methods

Due to the fact that the analysed skin wounds differ in shape, location on the body, size and colours, a fully automated detection or segmentation method does not work properly. Therefore, to reduce the detection error, before the actual segmentation starts, a rectangular region of interest (ROI) containing the whole wound and the adjacent healthy skin is selected. The block diagram including the ROI selection and all the further segmentation steps is shown in Fig. 1. The analysis starts with a colour space change and starting point selection by introducing HoG features, followed by reduction of the colour features using a clustering step, edge detection and Gradient Vector Flow (GVF), and finally the active contour segmentation technique. The influence of the variability of wound colours is reduced by transferring the previously obtained ROI to another colour space. The acquired colour images of wounds are in RGB format, and due to the high correlation between the RGB components the direct usage of the chromatic information is not appropriate [5]. In studies


Fig. 1. Workflow of the proposed method

of wound segmentation [17] the authors claim that the YCbCr colour space is better for completing this task because the distribution of the skin colour in this space, especially the Cr values of skin pixels, is more concentrated. However, based on our experiments it is better to use the Y and Cr channels interchangeably. The YCbCr colour space represents colours in terms of three components: a Luminance (Y) and two Chrominances (Cb and Cr), where

Y = 0.299R + 0.587G + 0.114B
Cb = 128 − 0.168736R − 0.331264G + 0.5B          (1)
Cr = 128 + 0.5R − 0.418688G − 0.081312B

and R, G and B denote the channels of the RGB colour space. The selected segmentation technique, namely the Active Contour Model [18,19], requires a starting point to be defined. The starting contour is selected based on Histogram of Oriented Gradients (HoG) feature analysis. The HoG descriptor is based on analysing local histograms of image gradient orientations [21,22]. The analysed image is divided into subregions, further called "HoG cells", of the size of [s × s] pixels, where s is the previously defined HoG scale. Then, over the pixels of each cell, a local histogram of gradient directions or edge orientations is calculated. Similarly to s, the number of gradient directions has to be predefined by the user. The algorithm of HoG computation can be defined in five steps: (1) cell image normalization, (2) gradient computation, (3) orientation binning, (4) gradient histogram estimation, and (5) normalization across the blocks. The gradient image of a single cell (step (2)) is computed using a Gaussian smoothing function followed by the centred derivative mask [−1 0 1] at σ = 0. An exemplary wound image with the corresponding HoG descriptor is shown in Fig. 2. The obtained HoG cells are then classified into two groups using the Weighted Fuzzy C-Means (WFCM) clustering method [24]. The feature vectors for the classification consist of HoG features calculated for each of the gradient directions. The clustering results, subjected to morphological erosion, are then used for the selection of the starting points required in the segmentation step (the green region in Fig. 4). Moreover, as a clustering result, the region possibly including the wound (the area delineated in red in Fig. 4) and the region excluded from further analysis are extracted.
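The colour conversion of Eq. (1) and the per-cell HoG features used as WFCM inputs can be sketched as follows; this assumes NumPy and scikit-image, and the hog call here merely stands in for the five-step computation described above (cell size, number of orientations and function names are illustrative, not the authors' settings).

import numpy as np
from skimage.feature import hog

def rgb2ycbcr(rgb):
    # Eq. (1); rgb is an (H, W, 3) array with values in 0..255
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def hog_cell_features(channel, cell=16, orientations=8):
    # One gradient-orientation histogram per HoG cell, one feature row per cell
    cells = hog(channel, orientations=orientations,
                pixels_per_cell=(cell, cell), cells_per_block=(1, 1),
                feature_vector=False)
    return cells.reshape(-1, orientations)

The rows returned by hog_cell_features would then be clustered (e.g. with a fuzzy c-means implementation) into wound-candidate and background cells, as described above.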


Fig. 2. Left: analysed image (Y channel), and Right: corresponding HoG descriptor

Independently from the previously described analysis, the input image is prepared for the segmentation step. The rescaled image is subjected to the WFCM clustering procedure with multiple clusters. Thanks to this, the grey intensity levels are reduced to 8. An exemplary Cr image after the clustering step is shown in Fig. 3. Next, the Canny edge detector [25] is applied to extract the wound contour. To reduce the influence of artifacts and wound structure heterogeneity on the edge results, all the edges covered by the starting points are excluded from further analysis. The final segmentation step incorporates the Active Contour Model developed by Kass et al. [18], where the optimized contour energy is given by the following equation:

E = ∫₀¹ ( E_int(ψ(s, t)) + E_ext(ψ(s, t)) ) ds,          (2)

where ψ(s, t) is the parametrised curve and E_int and E_ext denote the contour energies. While the external energy (E_ext) is defined on the basis of the gradient image on the previously obtained edges, the internal one (E_int) defines the physical features of the deformed curve:

E_int = α |∂ψ/∂s|² + β |∂²ψ/∂s²|²,          (3)

where α limits the contour tension and β limits its stiffness. Due to the fact that the snake approach is limited by the local character of the estimated gradient and will not cause movements of a curve placed far from the edges, the Gradient Vector Flow (GVF) [19] technique is used. The GVF field g(x, y) = (u(x, y), v(x, y)) minimizes the energy function:

ε = ∬ μ(u_x² + u_y² + v_x² + v_y²) + |∇f|² |g − ∇f|² dx dy,          (4)

where μ is a parameter set by the user adjusting the trade-off between the equation terms. The snake movement as well as the GVF are visualized in Fig. 3. Exemplary final segmentation results are shown in Fig. 4.
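For reference, an explicit iteration of the GVF field of Eq. (4) can be written as below (the standard Xu–Prince update); the smoothing parameter, the time step and the normalization of the edge map are illustrative choices, not those used in the paper.

import numpy as np
from scipy.ndimage import laplace, sobel

def gvf(edge_map, mu=0.2, iters=200, dt=0.5):
    # Gradient Vector Flow (u, v) minimizing Eq. (4) by explicit gradient descent
    f = edge_map / (edge_map.max() + 1e-12)
    fx, fy = sobel(f, axis=1), sobel(f, axis=0)
    mag2 = fx ** 2 + fy ** 2
    u, v = fx.copy(), fy.copy()
    for _ in range(iters):
        u += dt * (mu * laplace(u) - (u - fx) * mag2)
        v += dt * (mu * laplace(v) - (v - fy) * mag2)
    return u, v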


Fig. 3. Left: Cr channel after the clustering step, Right: GVF field and Active Contour Model iterations

Fig. 4. Left: WFCM classification results of HoG descriptor - red area and starting points for the segmentation step - green, Right: final segmentation results - green, and experts delineations - blue and red

3 Results

Evaluation of the obtained results was carried out using the Dice Index (DI) measure. DI was calculated according to the formula:

DI = 2TP / (2TP + FP + FN),          (5)

where TP, FP and FN denote the numbers of True Positive, False Positive and False Negative pixels, respectively.
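A direct implementation of Eq. (5) for binary masks might look as follows (an illustrative helper, NumPy assumed):

import numpy as np

def dice_index(seg, ref):
    # DI = 2TP / (2TP + FP + FN) for two binary masks of equal shape
    seg, ref = seg.astype(bool), ref.astype(bool)
    tp = np.count_nonzero(seg & ref)
    fp = np.count_nonzero(seg & ~ref)
    fn = np.count_nonzero(~seg & ref)
    return 2 * tp / (2 * tp + fp + fn)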


The analysis of the effectiveness of the method was performed for 11 cases. The segmentation results were confronted with delineations made by two independent experts, and the numerical values of DI are presented in Table 1.

Table 1. Results of Dice Indices between the proposed method (M) and two experts (E1 and E2) for all cases

Case   M vs E1   M vs E2   E1 vs E2
1      0.782     0.836     0.824
2      0.903     0.849     0.921
3      0.868     0.836     0.954
4      0.845     0.825     0.971
5      0.823     0.823     0.984
6      0.869     0.868     0.986
7      0.828     0.823     0.963
8      0.816     0.862     0.872
9      0.842     0.830     0.976
10     0.894     0.873     0.971
11     0.852     0.833     0.960

The mean and median values of DI for both experts are close to 0.84, whereas the inter-expert DI oscillates around 0.94. The similarity of the mean and median values and the low standard deviation (0.03) indicate the repeatability of the method. High differences between the experts in some cases are probably caused by ambiguous borders or the small size of the wounds. The best result is presented in Fig. 5. In some cases, due to the high homogeneity of the analysed images, the HoG-based detection technique results in a wrong starting point area. In such cases the starting region can be roughly defined by an expert inside the wound. In 3 of the analysed cases an expert intervention was required.

4 Discussion

The presented segmentation method gives satisfactory results despite the variety of images, which indicates the versatility of the method. The fusion of IR and VIS images could improve the estimation of treatment efficiency despite a seemingly unchanged area of the wound. In wound assessment a manual method is commonly used. The local temperature could indicate characteristic features of the healing process, and observing it makes it possible to create clear, chronological medical documentation. Nowadays, VIS images are widely used due to their noninvasiveness, low cost, widespread accessibility and ease of interpretation. Because the information about the scale of the VIS images is obtained from an external videometric system, the proposed segmentation method makes it possible to estimate the skin wound area remotely and exactly, without manual measurements. Moreover, it is possible to use the method in clinical practice with very little overhead time due to the low resource consumption and limited interaction with the user.


Fig. 5. Final segmentation results with the highest Dice index - green, and experts delineations - blue and red

5 Conclusion

The number of patients suffering from chronic wounds is still growing because of the aging of the population. Therefore it is very important to find an easy way to monitor the progress of wound healing. Nowadays, all medical databases are stored digitally and a huge part of them is processed automatically. Currently, manual assessment is widely used in clinical practice. Automation of the diagnostic process makes it possible to accelerate and objectify the assessment of the result of treatment. The obtained DI results indicate high compliance of the presented method with the experts' manual outlines as well as high repeatability. Infrared images could also help to visualize the suspicious areas under the skin surface. The method might be used not only for chronic wound segmentation but probably also for postoperative wounds. Future work includes automatic volume estimation of the wound based on the fusion of the scanned 3D surface with VIS and IR images.


Acknowledgment. This research is supported by the Polish National Science Centre (NCN) grant No.: UMO-2016/21/B/ST7/02236. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Han, G., Ceilley, R.: Chronic wound healing: a review of current management and treatments. Adv. Ther. 34(3), 599–610 (2017). https://doi.org/10.1007/s12325-017-0478-y
2. Mustoe, T.: Understanding chronic wounds: a unifying hypothesis on their pathogenesis and implications for therapy. Am. J. Surg. 187(5), 65S–70S (2004). https://doi.org/10.1016/S0002-9610(03)00306-4
3. Sen, C.K., Gordillo, G.M., Roy, S., Kirsner, R., Lambert, L., Hunt, T.K., Gottrup, F., Gurtner, G.C., Longaker, M.T.: Human skin wounds: a major and snowballing threat to public health and the economy. Wound Repair Regen. 17(6), 763–771 (2009). https://doi.org/10.1111/j.1524-475X.2009.00543.x
4. Frykberg, R.G., Banks, J.: Challenges in the treatment of chronic wounds. Adv. Wound Care 4(9), 560–582 (2015). https://doi.org/10.1089/wound.2015.0635
5. Mukherjee, R., Manohar, D.D., Das, D.K., Achar, A., Mitra, A., Chakraborty, C.: Automated tissue classification framework for reproducible chronic wound assessment. BioMed Res. Int. (2014). https://doi.org/10.1155/2014/851582
6. Kolesnik, M., Fexa, A.: Multi-dimensional color histograms for segmentation of wounds in images. In: International Conference Image Analysis and Recognition 2005, LNCS, vol. 3656, pp. 1014–1022 (2005). https://doi.org/10.1007/11559573_123
7. Filko, D., Cupec, R., Nyarko, E.K.: Detection, reconstruction and segmentation of chronic wounds using Kinect v2 sensor. Procedia Comput. Sci. 90, 151–156 (2016). https://doi.org/10.1016/j.procs.2016.07.022
8. Fauzi, M.F.A., Khansa, I., Catignani, K., Gordillo, G., Sen, C.K., Gurcan, M.N.: Computerized segmentation and measurement of chronic wound images. Comput. Biol. Med. 60, 74–85 (2015). https://doi.org/10.1016/j.compbiomed.2015.02.015
9. Aslantas, V., Tunckanat, M.: Differential evolution algorithm for segmentation of wound images. In: 2007 IEEE International Symposium on Intelligent Signal Processing, pp. 1–5 (2007). https://doi.org/10.1109/WISP.2007.4447606
10. Song, B., Sacan, A.: Automated wound identification system based on image segmentation and artificial neural networks. In: 2012 IEEE International Conference on Bioinformatics and Biomedicine, pp. 1–4 (2012). https://doi.org/10.1109/BIBM.2012.6392633
11. Veredas, F., Mesa, H., Morente, L.: Binary tissue classification on wound images with neural networks and Bayesian classifiers. IEEE Trans. Med. Imaging 29(2), 410–427 (2010). https://doi.org/10.1109/TMI.2009.2033595
12. Chaves, M., Silva, F., Soares, V., Ferreira, R., Gomes, F., Andrade, R., Pinotti, M.: Evaluation of healing of pressure ulcers through thermography: a preliminary study. Res. Biomed. Eng. 31(1), 3–9 (2015). https://doi.org/10.1590/2446-4740.0571
13. Renkielska, A., Nowakowski, A., Kaczmarek, M., Dobke, M.K., Grudzinski, J., Karmolinski, A., Stojek, W.: Static thermography revisited – an adjunct method for determining the depth of the burn injury. Burns 31(6), 768–775 (2005). https://doi.org/10.1016/j.burns.2005.04.006


14. Woloshuk, A., Krecichwost, M., Juszczyk, J., Pyciński, B., Rudzki, M., Choroba, B., Ledwon, D., Spinczyk, D., Pietka, E.: Development of a multimodal image registration and fusion technique for visualising and monitoring chronic skin wounds. In: Information Technology in Biomedicine, pp. 138–149 (2018). https://doi.org/10.1007/978-3-319-91211-0_12
15. Yadav, M.K., Manohar, D.D., Mukherjee, G., Chakraborty, C.: Segmentation of chronic wound areas by clustering techniques using selected color space. J. Med. Imaging Health Inform. 3(1), 22–29 (2013). https://doi.org/10.1166/jmihi.2013.1124
16. Navas, M., Luque-Baena, R.M., Morente, L., Coronado, D., Rodriguez, R., Veredas, F.J.: Computer-aided diagnosis in wound images with neural network. In: IWANN 2013, vol. 7903, Advances in Computational Intelligence, pp. 439–448 (2013). https://doi.org/10.1007/978-3-642-38682-4_47
17. Li, F., Wang, C., Liu, X., Peng, Y., Jin, S.: A composite model of wound segmentation based on traditional methods and deep neural networks. Comput. Intell. Neurosci. 2018 (2018)
18. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. Int. J. Comput. Vis. 1, 321–331 (1988). https://doi.org/10.1007/BF00133570
19. Xu, C., Prince, J.L.: Gradient vector flow: a new external force for snakes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 66–71. Computer Society Press, Los Alamitos, June 1997
20. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: International Conference on Computer Vision & Pattern Recognition, vol. 2, pp. 886–893 (2005)
21. Czajkowska, J., et al.: HoG feature based detection of tissue deformations in ultrasound data. In: 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 6326–6329 (2015)
22. Czajkowska, J., Juszczyk, J., Pycinski, B., Pietka, E.: Biopsy needle and tissue deformations detection in elastography supported ultrasound. In: Pietka, E., Badura, P., Kawa, J., Wieclawek, W. (eds.) Information Technologies in Medicine. ITiB 2016: Advances in Intelligent Systems and Computing, vol. 471. Springer, Cham (2016)
23. Czajkowska, J., Feinen, C., Grzegorzek, M., Raspe, M., Wickenhofer, R.: Skeleton graph matching vs. maximum weight cliques aorta registration techniques. Comput. Med. Imaging Graph. 46, part II, 142–152 (2015). https://doi.org/10.1016/j.compmedimag.2015.05.001
24. Szwarc, P., Kawa, J., Pietka, E.: White matter segmentation from MR images in subjects with brain tumours. In: Third International Conference on Information Technologies in Biomedicine: ITIB 2012, Gliwice, Poland, June 11–13, 2012, Proceedings, pp. 36–46. Springer, Heidelberg (2012)
25. Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8(6), 679–698 (1986)

Evaluation of Methods for Volume Estimation of Chronic Wounds

Jan Juszczyk(B), Agata Wijata, Joanna Czajkowska, Marta Biesok, Bartlomiej Pyciński, and Ewa Pietka

Faculty of Biomedical Engineering, Silesian University of Technology, Roosevelta 40, 41-800 Zabrze, Poland
{Jan.Juszczyk,Agata.Wijata,Joanna.Czajkowska,Marta.Biesok,Bartlomiej.Pycinski,Ewa.Pietka}@polsl.pl
http://ib.polsl.pl

Abstract. Chronic wounds coexist with many diseases or injuries. The diagnostic and therapeutic process relies on monitoring and assessment of the wound, including such parameters as its area or volume. The aim of this paper is to validate and compare various point cloud based wound volume estimation techniques. The obtained results were compared with the Kundin technique (as the most popular) and the liquid volumetric fill method. The point clouds were acquired using three different types of devices. For point cloud data analysis and processing, a surface reconstruction algorithm was applied, based on Delaunay triangulation and bi-cubic interpolation. In the performed experiments, the point clouds were acquired threefold: from a Time-of-Flight camera, from the Claron video-metric system and from a 3D scanner. The obtained volume estimation results were compared with the liquid volumetric fill method as the reference method. According to the obtained results, the best method of wound volume estimation based on a textured point cloud uses the Time-of-Flight camera, whose results are comparable to the measurements obtained using the Kundin device.

Keywords: Chronic wounds · Volume estimation · Point cloud

1 Introduction

The diagnosis and treatment of chronic skin wounds is an interdisciplinary medical problem, and the wound occurrence increases as the population ages. A chronic wound is defined as a skin loss due to a disease process or injury that cannot be treated [1]. The most common types of chronic wounds are venous stasis ulcers, diabetic foot ulcers, and pressure ulcers. These types of wounds affect the psychophysical condition of patients, who lose their self-esteem as a result of immobility for several months, which prevents walking and maintaining personal hygiene.


Chronic wounds constitute a serious problem in the health care system. Their treatment generates high costs of medical care in many areas: human suffering, burdens of care, and resources. Healing of wounds often takes several months or years. Moreover, complete healing is very often not possible, which can lead to different complications such as depression, limb amputation, and even death. There are numerous factors which can adversely affect the healing process. The disruptive factors can be divided into local (tissue maceration, infection) and systemic (advanced age, malnutrition, diabetes, and renal disease) [2,3]. A report published by the World Health Organization (WHO) shows that in 2014 around 422 million adults with diabetes lived in the world (in 1980 there were 108 million). In Poland, the problem of diabetes affects about 9.5% of the population [4]. Due to the upward trend in the prognosis of chronic wounds, for example in the case of diabetes, there is a need to develop new tools that will allow effective monitoring of the progress and treatment of chronic skin wounds. The condition of chronic wounds is assessed using various tools. Among them, measurements of the greatest length and width of the wound, its area, depth, and the condition of the tissues under the surface of the wound can be mentioned. The measurements of width and length are made during the patient's visit to the physician or by measuring the distances in the wound image [5]. The area of wounds is often calculated using measurements obtained with a ruler or a Kundin device, by multiplying them [5], according to the formula:

V_area = 0.73 · length · width,          (1)

or by elliptical approximation [6]. Wound segmentation is applied to two-dimensional images and can be done by manual wound delineation [7] or by computer methods of image processing. It can be accomplished using active contours [7–9] or texture analysis [9]. The assessment of the condition of subcutaneous tissues is carried out with ultrasound imaging [10]. An important parameter for chronic wound assessment is the volume. The volume measurement can be performed either in vivo or using digital images of the wound. In the first case, the measurements are performed using the already mentioned measurements obtained with the ruler or the Kundin device, or the Liquid Volumetric Fill Method (LVFM), in which the wound is filled with a sterile liquid [5]. The wound volume is estimated using the formula [6]:

V_volume = 0.327 · length · width · depth.          (2)
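Both formulas reduce to one-line helpers; the sketch below (function names are ours) assumes all measurements are taken in centimetres, so the result of Eq. (2) is in cm³.

def kundin_area(length, width):
    # Eq. (1): wound area from the two greatest in-plane dimensions
    return 0.73 * length * width

def kundin_volume(length, width, depth):
    # Eq. (2): the Ruler method (RM) estimate of the wound volume
    return 0.327 * length * width * depth

# e.g. kundin_volume(4.0, 2.5, 1.2) gives approx. 3.92 cm^3 for measurements in cm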

In this paper, the method based on the above formula (2) is called the Ruler method (RM). The second group of methods is based on three-dimensional digital data analysis, where a digital model of the wound shape is generated. A commonly used tool for creating the model is the 3D scanner, which allows acquiring the shape of the wound surface. However, it does not provide information concerning the texture [5,7]. Some other tools used for this purpose are the stereo-camera


[11], the Time-of-Flight camera and the stereovision 3D digital optical system. These devices generate point clouds of the surface together with information about the texture. The wound volume can also be calculated based on the information from the Kinect sensor [12]. Estimation of wound volume using a point cloud can be done incorporating a 3D Riemann sum [5]. A spherical approximation and a prismatic approximation model can also be used for 3D volume reconstruction [13]. The 3D model based on segmentation and reconstruction of the wound surface may be used in the preparation of treatment methods applying a 3D printer [14]. The main contribution of this paper is the verification of several methods of wound volume estimation based on 3D point cloud acquisition in relation to the Ruler technique and to the Liquid Volumetric Fill Method, the latter assumed as the reference one. The paper is organized as follows. In Sect. 2 the phantom preparation, data acquisition and methodology of volume estimation are described. In Sect. 3 the results are shown and discussed. Conclusions and future plans are described in Sect. 4.

2 Materials and Methods

2.1 Phantoms

To verify the wound volume estimation method, eight phantoms were prepared and examined. Each of the phantoms was made of a cuboid polyurethane sponge with a fragment removed from it. The phantoms are shown in Fig. 1. The surface of the phantoms was covered with wax to prevent deformations and to ensure smoothness of the surface. The cavity imitates the wound and the other part represents the skin.

Fig. 1. All phantoms of the skin wound used in the experiment, numbered 1–8 from left to right

For each phantom the shape of the wound and its volume were different. Four of them (3, 5, 6, 7) simulated deep wounds and four of them imitated extensive wounds. Various shapes of the wound were necessary to verify the influence of the shape on the volume determination accuracy. To compare several methods of determining the wound volume, each synthetic wound was measured with a ruler by measuring its greatest dimensions


and the depth. Moreover, each phantom was scanned using three different 3D scanning devices. Each phantom was set to the initial position (Fig. 2(a)) and the acquisition parameters were consistent with the description presented in [3]. Each phantom was scanned four times (first in the initial position, then after three consecutive rotations of 90 degrees around the optical axis) by the following devices: (1) a photo camera (Fujifilm XT-1), (2) a thermal imaging camera (IR camera, FLIR A300), and the 3D scanning devices: (3) a 3D handheld scanner (ZScanner 700), in this paper called the 3D Scanner, (4) a Time-of-Flight camera (Mesa Imaging SR4000), in this paper called the ToF Camera (ToF), and (5) a stereography camera (Claron, MicronTracker ClaroNav), in this paper called the Stereocamera. In total, 32 datasets were obtained and each of them consisted of:

1. colour photography from the photo camera (VIS, Fig. 2(a)),
2. a thermovision image (IR, Fig. 2(b)),
3. a mesh from the 3D scanner (Fig. 2(c)),
4. a scattered point cloud generated by the ToF camera with an intensity map (Fig. 2(d)),
5. a scattered point cloud from the Claron stereocamera with a grayscale image (Fig. 2(e)).

The ground truth volumes (V_water) of the wounds were measured using the liquid volumetric fill method [5].

2.2 Image Fusion

Using the intrinsic and extrinsic parameters of the cameras computed during a calibration procedure [3], in the first step the fusion of the VIS and thermovision images was performed (Fig. 3(a)). In the second step, the photo image was registered and fused with the grayscale image from the Claron system (Fig. 3(b)). The fusion was also carried out between the resulting image from the first stage and the intensity map generated by the ToF device (Fig. 3(c)). As the 3D scanning devices yield point clouds in common coordinate systems with grayscale values, finally three point clouds textured by the image fusion obtained in the second step were generated.

2.3 Automatic Volume Estimation

The manual delineation of a synthetic wound was performed on the 2D image registered with the 3D point cloud. Points inside the delineation were labelled as the wound (PC_wound), and the other points were labelled as the skin (PC_skin) (Fig. 4(a)). Both point clouds (wound and skin) are scattered, with no defined faces and normals. The scattered character of the point clouds is forced by the X and Y locations of the points, which depend on their depth (Z location) relative to the camera


Fig. 2. Acquired image modalities of a phantom: (a) VIS, (b) IR, (c) 3D scanner, (d) ToF amplitude image, (e) Claron left and right image

Fig. 3. An example of a fusion: (a) VIS and IR images, (b) Fig. 3(a) and left Claron images, (c) Fig. 3(a) and amplitude image of ToF


Fig. 4. Point clouds on following stages: (a) segmented point cloud, where red points belong to wound, blue to skin, (b) Aligned point clouds of skin (blue) and wound (red), (c) reconstructed skin, (d) wound volume estimated by composition of two point clouds

position. Due to this fact, for both point clouds a new mesh grid is created using Delaunay triangulation [15,16]. The projection plane for points in the Delaunay triangulation is normal to the optical axis of the scanning device (Z direction). The new point clouds are gridded in the common mesh grid space. This means that the distances between point projections on the XY-plane are the same for both point clouds, and the origins of both spaces are common (Fig. 4(b)). In the next step, the space left in the skin point cloud after wound separation is interpolated using bi-cubic interpolation (Fig. 4(c)) [7,17]. After the reconstruction of the skin covering the wound, the wound volume is determined (Fig. 4(d)). The wound volume is determined using a projection of the wound point cloud onto the XY plane. From the skin point cloud, the points whose projections are covered by the wound point cloud projection are chosen. Due to the fact that the skin point cloud and the wound point cloud are located in a common coordinate space, the points belonging to the wound are located exactly below (along the Z direction) the points of the reconstructed skin surface. Therefore, the estimated volume value can be determined without surface reconstruction. The value of the wound volume


is calculated as a sum of distances between corresponding points belonging to the skin point cloud and the wound point cloud, respectively. To evaluate the investigated technique, the wound volume was also estimated using Eq. 2, where the width, length and depth were measured using a ruler. The proposed methodology can be transferred to images of real wounds. The use of phantoms was motivated by the application of multiple measurements (also invasive ones), not always possible to perform on a patient, such as the Ruler method and the Liquid Volumetric Fill Method (LVFM).
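A compact sketch of this volume computation is given below; it assumes NumPy/SciPy, both point clouds expressed as (N, 3) arrays in a common frame with Z along the optical axis, and it uses scipy.interpolate.griddata (Delaunay-based, piecewise-cubic) in place of the exact gridding and bi-cubic interpolation described above.

import numpy as np
from scipy.interpolate import griddata

def wound_volume(skin_xyz, wound_xyz, step=1.0):
    # Regular XY grid over the wound footprint (step in the same unit as the coordinates)
    xs = np.arange(wound_xyz[:, 0].min(), wound_xyz[:, 0].max(), step)
    ys = np.arange(wound_xyz[:, 1].min(), wound_xyz[:, 1].max(), step)
    gx, gy = np.meshgrid(xs, ys)
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    # Skin surface reconstructed over the wound, and the wound bottom itself
    z_skin = griddata(skin_xyz[:, :2], skin_xyz[:, 2], grid, method="cubic")
    z_wound = griddata(wound_xyz[:, :2], wound_xyz[:, 2], grid, method="linear")
    depth = z_skin - z_wound                        # NaN outside the wound footprint
    valid = np.isfinite(depth) & (depth > 0)
    return float(depth[valid].sum() * step * step)  # sum of column volumes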

3 Results and Discussion

Wound volumes determined by the Ruler method and obtained by the proposed method, based on different point clouds (ToF, Stereocamera, Scanner), are presented in Fig. 5. The volumes given by LVFM are shown for comparison, and LVFM is treated as the reference method. Volume measurements using the LVFM were performed 30 times by 3 independent experts. The analysis was carried out using percentiles because, due to the occurrence of the meniscus and the human factor, the obtained volumes did not have a normal distribution (Lilliefors test, Table 1). The experts subjectively assessed the water level in relation to the edge of the wound (analogously to the subjective preparation of expert outlines). These factors can be sources of large errors between the reference method and the others.

Table 1. The results of the Lilliefors test for LVFM for each phantom

Phantom   1      2      3      4      5      6      7      8
p-value   0.150  0.004  0.001  0.251  0.051  0.025  0.005  0.001

The volume estimation results are shown in Fig. 5. The chart shows the average volumes obtained using the Stereocamera, the ToF camera and the 3D Scanner point clouds, and the Ruler method. The whiskers indicate max and min, whereas the estimated volumes for the liquid volumetric fill method are given as boxplots. For the proposed method of wound volume estimation, the largest differences between the Reference Wound Volume (RWV) and the estimated volume were obtained for the Stereocamera point clouds. The most repeatable results were obtained for the ToF camera point clouds. For the reference method, the dependency (Pearson correlation coefficient) between the mean volume and the standard deviation of the mean volume was determined and is equal to 0.81. This allows assuming that a high correlation between the estimated volume and the measurement accuracy exists: for a higher volume of a wound the accuracy is lower. Similarly, the correlation coefficient for the point cloud method was determined, and its value is 0.77, 0.83, 0.95


Fig. 5. The estimated volumes for the liquid volumetric fill method (V_LVFM) given as boxplots, and the mean values of the proposed method (V_Ruler, V_Stereocam, V_ToF, V_Scanner). The phantoms are sorted according to the increasing average volume designated by LVFM. The results are presented in cm³

for the ToF, Stereocamera and 3D scanner point clouds, respectively. For the Ruler method the correlation coefficient is 0.66. Most of the volumes estimated using point clouds and by the Ruler method take a value lower than the LVFM. This dependence suggests underestimation of the volume by these methods. The volume estimation method based on the ToF point cloud and the Ruler method give comparable results. However, the point cloud method is a non-contact technique, which ensures the sterility of the measurements. Measurements using the Ruler method require contact with the wound and are harder to perform in clinical conditions because of, e.g., the necessity of using a plastic sheeting to protect the wound. In order to quantify the accuracy of the evaluated methods, a relative error (RE) was calculated according to the formula:

RE = |RWV − VE| / RWV · 100%          (3)

where RWV is the reference volume calculated using the liquid volumetric fill method and VE is the estimated wound volume. The results are presented in Fig. 6. The lowest mean RE was obtained for the Ruler method (18%). Quite similar results (20%) were obtained using the proposed method with the ToF point cloud. The highest RE was obtained for the Stereocamera point cloud (42%). RE for the 3D scanner point cloud was equal to 25%. Analysis of the spread of the measurements (Fig. 5) suggests that the ToF point cloud method and the Ruler method are comparable and significantly better than the stereocamera system and the 3D scanner. Considering the widespread use of the Ruler method to estimate the wound volume, it may be concluded that the ToF camera can be used interchangeably with it.
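Eq. (3) is a one-line helper; for instance, with a reference volume of 10 cm³ and an estimate of 8 cm³ it returns 20%, the order of the mean RE reported here for the ToF point cloud (the function name and the example numbers are purely illustrative).

def relative_error(rwv, ve):
    # Eq. (3): RE = |RWV - VE| / RWV * 100%
    return abs(rwv - ve) / rwv * 100.0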


Fig. 6. Range of relative errors RE for the presented methods of estimating the volume of the wound. RE is defined in Eq. 3; V_Stereocam, V_ToF and V_Scanner are the volumes calculated using the Claron system, the ToF camera and the 3D scanner, respectively, and V_Ruler is obtained based on formula 2. The phantoms are sorted according to the increasing average volume designated by LVFM

The poor quality of the Stereocamera point cloud may result from the low spatial resolution of this device along the Z-axis. In this case, the low resolution flattens the wound depth and generates an incorrect wound model.

4

Conclusion

The volume of chronic wounds is one of the basic parameters for the evaluation of the diagnostic and therapeutic process. Although the main advantage of volume measurement with the Ruler method is its simplicity, its greatest disadvantages are the human impact on the results during measurements and the necessity of touching the ruler to the surface of the wound. The use of non-contact tools makes it possible to compute the volume of the wound in a non-invasive, sterile and comfortable way. The proposed method, using a point cloud and manual segmentation, gives a reliable assessment of the volume of the wound. Objective and independent measurement methods make it possible to correctly assess the progress of the treatment. The comparison of different methods of surface acquisition indicates that the best results are obtained for the point cloud generated by the Time of Flight camera, due to its low relative error. Moreover, the texture applied on the surface of the cloud facilitates the visual assessment of the wounds. The proposed method using point clouds supports the monitoring of chronic wound healing. Further work will focus on extending the proposed method with automatic wound segmentation to speed up the diagnostic process. Acknowledgement. This research is supported by the Polish National Science Centre (NCN) grant No.: UMO-2016/21/B/ST7/02236. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.



Infrared and Visible Image Fusion Objective Evaluation Method

Daniel Ledwoń, Jan Juszczyk, and Ewa Pietka

Faculty of Biomedical Engineering, Silesian University of Technology, Zabrze, Poland
{daniel.ledwon,jan.juszczyk,ewa.pietka}@polsl.pl

Abstract. Infrared–visible (IR–VIS) pixel level image fusion has been applied in many fields in which thermal measurements are used. This process adds spatial information about objects and otherwise invisible details to infrared images. In medical applications, infrared thermography provides information about physiology and is a supplement to classical diagnostic methods. The purpose of fusion in the presented study was to improve human body thermograms. A novel objective method for comparing fusion results, based on statistical analysis, has been proposed to indicate the best fusion method. The fusion methods were chosen based on the literature and subjective assessment. The resulting images were parameterized with numerical fusion evaluation metrics. The obtained numerical values were cumulated into one parameter corresponding to one image. These parameters were used to compare the results using the Friedman test and a post-hoc 1 × N procedure.

Keywords: Image fusion · Image registration · Multimodal imaging · Infrared imaging · Statistical analysis

1 Introduction

Image fusion is the process of combining information from multiple input images of one object, coming from different acquisition processes (e.g. with different modalities, camera settings or at time intervals). In the literature, the main goal of data integration is to obtain specific, unified information about the object. Fusion is justified by the higher accuracy achieved in further analysis than with only one data source [5,13,18]. Image fusion is often one stage of a decision-making process based on multiple images. Depending on the level at which data integration takes place, the fusion process may be classified into three classes: pixel fusion, feature fusion and decision fusion [9,17]. The result of fusion at the pixel level is an image which should provide more information for the observer and for further image processing [2]. Currently, novel fusion methods are still being developed and applied in an increasing number of fields related to image processing and analysis. The main areas in which it is currently used include remote sensing, medical imaging, surveillance systems and photography [11,20].


Fusion of images in the visible range (VIS) with infrared images (IR) is a popular issue in multimodal image fusion. Spatial information in thermal images is limited compared to visible images, so the structure of objects is not well visible, but imaging in the infrared range makes otherwise invisible details and objects observable. The main goal of fusing these two modalities is to obtain an image containing the relevant information included in the images from the particular modalities [9]. It is commonly useful in every field in which thermal measurements are used. One advantage of this process is the increased efficiency of night vision imaging systems used for navigation and tracking; such applications include military systems and pedestrian detection and tracking. Biometric identification systems applying image fusion are also being developed; in these systems, images obtained as a result of VIS–IR fusion are used to improve the detection efficiency in various conditions [9]. In medical applications, thermal imaging is used to obtain information about physiology. The purpose of fusion in this study was to improve thermograms of the human body with spatial information from the visible image. To achieve this goal, the best fusion method should be indicated in an objective way. Despite the increasing number of publications about image fusion, there is still a lack of a systematic way to verify and compare results obtained with different methods and to indicate the best solution. In the literature, the evaluation methods for fusion results are divided into subjective assessment, based on the visual assessment of an expert group, and objective assessment, using numerical parameters to assess the fusion results [7,12]. The disadvantages of subjective assessment are its high costs and the dependence of the results on the test conditions and on the knowledge, preferences and experience of the expert [7]. The objective evaluation consists of determining the values of numerical metrics. They allow the quality of the resulting image and the spectral and spatial similarities between this image and the input images to be determined. In the literature, the comparison of evaluation metric values is most often carried out on a small number of cases, which are compared directly with results obtained by other methods. The latest publications suggest using statistical methods to compare the obtained results [12]. The aim of this work was to develop a methodology for an objective evaluation of IR–VIS fusion results. The proposed method is based on the normalisation and cumulation of evaluation metric values. Comparison of the obtained numerical values with the Friedman test allowed the best of the selected IR–VIS image fusion methods to be identified. Table 1 presents the tested fusion methods and their signatures. These methods were chosen based on the current literature about IR–VIS fusion [9] and medical image fusion [8]. At a high level of generality they can be divided into spatial domain algorithms and transform domain algorithms. From the first group, simple methods such as average, maximum and PCA were chosen. The PCNN fusion variant based on image pixel values in the spatial domain was tested as a representative fusion method based on neural networks [21]. From the group of algorithms based on the transform domain, the Laplacian Pyramid and the Discrete Wavelet Transform were chosen as two very popular decomposition algorithms.


Using these decomposition algorithms, several fusion strategies were tested and evaluated by subjective, visual assessment. Only some of the best methods, presented in Table 1, have been included in the further analysis. The other tested algorithms, also belonging to the group of transform domain fusion methods, were two different variants of methods based on mathematical morphology [1,14]. All of the mentioned methods were applied within a fusion algorithm based on the IHS transform. The simplest fusion method was the replacement of the intensity channel by the grayscale image from the other modality. This approach was also included in the further analysis.

Table 1. The chosen fusion methods and their signatures

2 Methodology

2.1 Fusion Algorithm

Image fusion was performed to improve the interpretation of infrared images with the spatial information about objects taken from the visible images. Before the fusion procedure, the input images were registered and prepared in accordance with the assumptions described in the next paragraph. The visible image was converted to grayscale and then both the VIS and IR images were normalized to values from 0 to 1. With this operation, the three-dimensional information about color from the visible image was changed into one-dimensional information about light intensity. It preserves object edges and their spatial characteristics, which are the most important for the fusion result. After normalization, the input infrared image was mapped to the range of the chosen colormap. Next, the color thermogram was transformed from RGB to the IHS color space.


In the IHS space, the Hue and Saturation channels contain information about color and the Intensity channel is a grayscale representation of the image. Only the Intensity channel was used in the fusion, and the result was transformed back to RGB with the H and S channels taken from the IR image. Similar solutions have often been used in color image fusion [6]. The general scheme of the image fusion algorithm is presented in Fig. 1.

Fig. 1. General scheme of IR–VIS images fusion algorithm. It shows steps from two original input images to one fusion result. More complex processes like image registration and fusion are marked with a dotted line. Operations presented in blocks with solid line border are constant for every input set and fusion method
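To make the pipeline of Fig. 1 more concrete, a minimal Python sketch is given below. It uses the HSV colour space as a simple stand-in for IHS and a generic colormap; the function names, the 'jet' colormap and the pluggable fuse_intensity strategy are assumptions for illustration, not the authors' exact implementation.

```python
# Minimal sketch of the IHS-style fusion of Fig. 1 (HSV used in place of IHS).
import numpy as np
from matplotlib import cm
from skimage.color import rgb2gray, rgb2hsv, hsv2rgb

def normalize01(img):
    img = img.astype(np.float64)
    return (img - img.min()) / (img.max() - img.min() + 1e-12)

def ihs_fusion(vis_rgb, ir_raw, fuse_intensity):
    vis_gray = normalize01(rgb2gray(vis_rgb))                    # VIS -> grayscale in [0, 1]
    ir_color = cm.get_cmap('jet')(normalize01(ir_raw))[..., :3]  # IR -> colour thermogram
    ir_hsv = rgb2hsv(ir_color)                                   # RGB -> H, S, V (intensity)
    fused_i = fuse_intensity(ir_hsv[..., 2], vis_gray)           # fuse only the intensity
    ir_hsv[..., 2] = np.clip(fused_i, 0.0, 1.0)
    return hsv2rgb(ir_hsv)                                       # back to RGB, IR hue kept

# Example with the simplest strategies listed in Table 1:
# fused_max = ihs_fusion(vis, ir, np.maximum)                    # MAX rule
# fused_avg = ihs_fusion(vis, ir, lambda a, b: 0.5 * (a + b))    # AVG rule
```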

2.2

Fusion Methods Evaluation

To evaluate the results of pixel level image fusion, the resulting image had to be transformed into numerical metric values. Five fusion evaluation metrics were chosen [7]: standard deviation, average gradient, spatial frequency, fusion mutual information and entropy. Among the many proposed metrics, the chosen ones are relatively easy to compute and to interpret with regard to image quality and its features. Each of the proposed metrics assesses a different property of the resulting image, such as contrast, the level of details and noise, the amount of information, or the similarity to the input images.


All chosen metric values should be interpreted proportionally: a higher numerical value means that the result is better. Since color images were analyzed, the values of the parameters were calculated for each of the RGB channels and then averaged to a single value. The obtained values of the numerical fusion evaluation metrics were used to compare the different fusion methods. This comparison was carried out using statistical analysis methods. Following the recommendations contained in the article Statistical comparison of image fusion algorithms: Recommendations [12], the Friedman test, a non-parametric equivalent of the repeated-measures analysis of variance, has been used. In the first step, as presented in [12], every metric value was treated as a separate observation; this means that five independent observations were obtained for one resulting image. In the post-hoc 1 × N analysis, the method with the best rank was compared to the results of the other methods to check whether it achieved a significant difference. This was done using the Friedman test with Bonferroni correction. The statistical analysis was performed using the R language and the StatService software [15]. The proposed modification of the statistical comparison of image fusion methods is based on creating one numeric value from all evaluation metrics. The first step was the normalization of the metric values obtained for the different methods to the range from 0 to 1, in a way which did not change the ranks from the Friedman test. Next, the normalized metrics were added together, which gives one evaluation value for one image. Outlier results were removed based on the distribution of these values. After these operations, the prepared data were analyzed with the Friedman test and the post-hoc 1 × N analysis to indicate the best fusion methods. Figure 2 presents a general scheme of the proposed comparison of image fusion methods.
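As an illustration, common textbook forms of four of the no-reference metrics are sketched below in Python; the exact variants used in [7] may differ in detail, and for colour images each value would be computed per RGB channel and averaged, as stated above.

```python
# Sketch of no-reference fusion evaluation metrics (textbook definitions).
import numpy as np

def std_metric(img):                        # contrast
    return img.std()

def entropy_metric(img, bins=256):          # amount of information
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def average_gradient(img):                  # level of detail
    g = np.gradient(img.astype(np.float64))
    return np.mean(np.sqrt((g[0] ** 2 + g[1] ** 2) / 2.0))

def spatial_frequency(img):                 # overall activity
    rf = np.diff(img, axis=1) ** 2          # row (horizontal) differences
    cf = np.diff(img, axis=0) ** 2          # column (vertical) differences
    return np.sqrt(rf.mean() + cf.mean())
```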

Fig. 2. Scheme of the proposed fusion results evaluation and methods comparison algorithm. It shows the steps from the resulting image, through parametrization, to the statistical analysis
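The normalisation, cumulation and omnibus Friedman step of Fig. 2 can be expressed compactly; the sketch below assumes SciPy and a scores array of shape (images, methods, metrics), and only indicates where the 1 × N post-hoc comparison with Bonferroni correction would follow.

```python
# Sketch of the proposed comparison: per-metric min-max normalisation across
# methods, summation to one score per image and method, and the Friedman test.
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

def cumulate_metrics(scores):
    # scores: (n_images, n_methods, n_metrics); normalise each metric to [0, 1]
    # across methods (rank-preserving), then sum the metrics per image/method.
    mn = scores.min(axis=1, keepdims=True)
    mx = scores.max(axis=1, keepdims=True)
    return ((scores - mn) / (mx - mn + 1e-12)).sum(axis=2)

def compare_methods(scores, method_names):
    cumulated = cumulate_metrics(scores)                     # (n_images, n_methods)
    stat, p = friedmanchisquare(*[cumulated[:, j] for j in range(cumulated.shape[1])])
    mean_ranks = rankdata(-cumulated, axis=1).mean(axis=0)   # lower rank = better
    print(f'Friedman chi2 = {stat:.2f}, p = {p:.4f}')
    for j in np.argsort(mean_ranks):
        print(f'{method_names[j]:15s} mean rank = {mean_ranks[j]:.3f}')
    # The best-ranked method would then be compared 1 x N against the others
    # with Bonferroni-corrected p-values, as recommended in [12].
```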


Additionally, the importance of using many evaluation metrics in the assessment process has been verified. For this purpose, the Kendall coefficient of concordance was computed for every pair of metrics. Next, the significance of every coefficient was tested with the chi-square test with Holm's correction for multiple comparisons.
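A compact way to obtain Kendall's W and the chi-square approximation used for its significance is sketched below (ties are ignored for simplicity; Holm's correction would be applied to the resulting p-values afterwards).

```python
# Sketch of Kendall's coefficient of concordance W with its chi-square test.
import numpy as np
from scipy.stats import chi2

def kendalls_w(rank_matrix):
    """rank_matrix: (m_judges, n_objects), here metrics x fusion methods."""
    r = np.asarray(rank_matrix, dtype=float)
    m, n = r.shape
    rank_sums = r.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    w = 12.0 * s / (m ** 2 * (n ** 3 - n))            # no tie correction
    p_value = chi2.sf(m * (n - 1) * w, df=n - 1)      # chi-square approximation
    return w, p_value

# Example: two metrics ranking 14 methods (ranks 1..14 in some order), e.g.
# w, p = kendalls_w(np.vstack([ranks_metric_a, ranks_metric_b]))
```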

3 Experiment

3.1 Measuring Station

The first step of the presented methodology was to create a database of visible and infrared images of human body parts. The image acquisition was carried out using the measuring station and software created under grant No. UMO-2016/21/B/ST7/02236 and described in detail in [19]. The measuring station was a full setup of a chronic wound monitoring system. In the presented study, only two modalities were used: the thermal infrared camera FLIR A300 and one lens of the stereo camera Micron Tracker Claron Hx40, which was a part of an optical tracking system. The thermal imaging camera recorded images with a resolution of 320 × 240 px and the resolution of the grayscale stereo camera was 1024 × 768 px. The cameras were placed in a stereovision setup on one tripod, which could be moved and rotated without displacement between the cameras. Their settings were manually adjusted with the use of a calibration pattern in such a way that their optical axes were arranged in parallel and as close as possible to each other.

3.2 Image Acquisition

In order to verify the fusion methods comparison methodology, a database including 72 sets of images was built. The images were taken of 5 different body parts, which are listed in Table 2. In some images, markers made from paper soaked with cold water were applied to the exposed skin. This was done to improve the precision of the visual assessment of the registration and fusion results.

Table 2. The number of acquisitions carried out for each human body part

Body part   Number of images
Back        13
Face        13
Arm          7
Hand        14
Leg         25

3.3 Image Registration

In Fig. 3, a general scheme of the IR–VIS image registration is presented. The proposed image registration process has been divided into two stages. In the first step, an alignment checkerboard pattern was used for the initial adjustment of the coordinate systems of the two cameras. This method has been described in detail in [19]. The second stage involved registration of the initially adjusted images with an area-based method using the mutual information of the visible and infrared image gradients. The first image derivative was computed with the use of the Sobel operator. In both stages of registration an affine transformation was determined. The described registration methodology was fully automated, and with one set of parameters good results were obtained for the whole image dataset.

Fig. 3. General scheme of the IR–VIS image registration methodology. The left side of the scheme shows the steps of the initial alignment stage based on the checkerboard phantom images. The right side shows the stage of accurate alignment of the objects in the input images based on the mutual information of the image gradients. After the two stages, the final affine transform of the VIS image is determined
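A possible realisation of the second, gradient-based stage is sketched below, assuming the SimpleITK library; the optimizer, its parameter values and the Mattes mutual information metric are illustrative choices standing in for the implementation actually used.

```python
# Sketch of the accurate registration stage: mutual information computed on
# Sobel gradient images, estimating a 2-D affine transform (SimpleITK assumed).
import SimpleITK as sitk

def register_vis_to_ir(ir_image, vis_image):
    fixed = sitk.SobelEdgeDetection(sitk.Cast(ir_image, sitk.sitkFloat32))
    moving = sitk.SobelEdgeDetection(sitk.Cast(vis_image, sitk.sitkFloat32))

    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
    reg.SetOptimizerAsRegularStepGradientDescent(learningRate=1.0,
                                                 minStep=1e-4,
                                                 numberOfIterations=200)
    reg.SetInterpolator(sitk.sitkLinear)
    initial = sitk.CenteredTransformInitializer(
        fixed, moving, sitk.AffineTransform(2),
        sitk.CenteredTransformInitializerFilter.GEOMETRY)
    reg.SetInitialTransform(initial, inPlace=False)

    transform = reg.Execute(fixed, moving)
    # Resample the original VIS image into the IR image grid.
    return sitk.Resample(vis_image, ir_image, transform, sitk.sitkLinear, 0.0)
```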

4

Results and Discussion

The dataset of 72 visible and thermal images after registration was fused with the 14 methods presented in Table 1. Figure 4 shows an exemplary set of results for one pair of input images. The obtained results were compared using the proposed comparison method. The variant in which every metric value is an independent observation was tested. The next step of the data analysis was to verify the proposed methodology of evaluation metric preparation.


Fig. 4. Results obtained with every tested method for exemplary input images of a leg. Images (a) and (b) are the input VIS and IR, respectively

After normalisation and summation of the metric values for every image, the outlier images for every method were removed from further analysis. For the tested dataset this resulted in the removal of 13 images. The distributions of the new metric after these steps for every method are presented in Fig. 5. Table 3 contains the results of the Friedman test and the post-hoc analysis in the 1 × N variant. The objective evaluation with the proposed method indicated as the best the fusion algorithm based on Laplacian Pyramid decomposition with the fusion strategy in which the lower levels of the pyramid were taken from the visible image and the highest level from the infrared image. In this method, the high frequency details were transferred from the visible image to the thermogram with the smallest impact on the temperature distribution. The visual assessment of the results of this method confirmed that the edges of objects were better visible and more details could be spotted than in a standard thermogram. The other methods placed high in the resulting ranking were characterized by different levels of disturbance of the temperature distribution.


Fig. 5. Distributions of the sum of normalized evaluation metric values of the resulting images for every tested method after removing the outliers. The red central mark indicates the median, the bottom and top edges of the box indicate the first and third quartile, respectively. The whiskers extend to the most extreme data points and red pluses are the outliers

Table 3. The rank values obtained by individual methods for the proposed parameter after removing the outliers. P-values after Bonferroni correction refer to the comparison of each method with LP_IR-VIS

Method          Rank     Bonferroni
LP_IR-VIS        2.6271  -
LP_avg-VIS       4.1356  0.0502
Morph_center     4.9831  0.0022
DWT_max-max      5.0678  0.0015
LP_max-VIS       5.7458  0.0001
Morph_multi10    6.5424  0.0000
LP_max-max       6.9492  0.0000
DWT_avg-max      7.7966  0.0000
Morph_multi5     8.4746  0.0000
MAX              8.6610  0.0000
VIS             10.6610  0.0000
PCNN            10.6780  0.0000
PCA             11.1525  0.0000
AVG             11.5254  0.0000


Their advantage compared to the LP_IR-VIS method was that objects in the output images from these algorithms had a more three-dimensional character. This was caused by the transfer of light reflections and shadows visible in the VIS images. The results obtained for the methods at the bottom of the ranking were images of lower quality in comparison to the others. The level of disturbance of the temperature interpretation was also higher, among others due to the high amount of information about light taken from the VIS image.

4.1 Role of Different Evaluation Metrics

To justify the choice of different fusion evaluation metrics and to show the necessity of using each of them in the statistical analysis, Kendall's coefficient of concordance has been determined. The coefficient of concordance among several judges was computed and tested for the rankings from all five metrics. The value of the W coefficient was 0.2113, with a p-value of 0.3930, which means that there was no significant similarity between the rankings. In the post-hoc procedure, the rankings obtained for assessment with different metrics were compared in pairs, and the values of Kendall's W and the p-values with Holm's correction were computed (Table 4).

Table 4. Values of Kendall's coefficient of concordance obtained for pairwise comparisons of the rankings awarded by the various fusion evaluation metrics. None of the presented values achieved the statistical significance level

       IE      AG      STD     SF      FMI
IE     -       0.6769  0.5121  0.5670  0.5143
AG     0.6769  -       0.4835  0.9363  0.2396
STD    0.5121  0.4835  -       0.5670  0.3934
SF     0.5670  0.9363  0.5670  -       0.1802
FMI    0.5143  0.2396  0.3934  0.1802  -

The obtained results confirm the importance of using many different metrics in the statistical comparison of image fusion algorithms. In other studies presenting new image fusion methods, the comparison with reference results is often made by comparing the absolute values of every metric. Statistical analysis based on a larger group of cases makes it possible to indicate the best methods and to draw conclusions regarding the evaluation metrics.

5

Conclusion

The proposed methodology of comparing fusion methods by statistical analysis of numerical fusion evaluation metrics allowed the best among the tested methods to be indicated. The results obtained with the objective evaluation confirmed the conclusions from the visual, subjective assessment.


Thanks to the normalization and summation of the parameters calculated for each image, it was possible to reject outliers for every method. This increased the reliability of the conducted statistical analysis and the differences between the ranks obtained by the individual methods. The proposed evaluation method indicated two variants of the method based on Laplacian Pyramid decomposition as the two best fusion algorithms. Subjective assessment made by the authors confirmed these results, but also showed that the resulting images differ in the amount of depth information coming from light and shadows. These differences come from the fusion strategies used with the LP decomposition in the fusion process. Similar results were also obtained with the DWT decomposition; the lower positions of these methods in the resulting ranking may have been caused by the lower quality of, and artefacts in, the resulting images. In the case of human body parts, the results obtained from pixel level IR–VIS image fusion were characterized by a higher level of spatial information than standard thermograms. Different methods gave resulting images with different levels of disturbance in the temperature distribution, often caused by illuminated regions in the VIS images. The best methods indicated by the proposed objective evaluation were characterized by the highest level of spatial detail without influence on the temperature distribution. The quality of the resulting images obtained by these methods was also better than that of the others. The transfer of spatial information about edges, depth and details from the visible image to the thermogram could improve the interpretive value of these images. In medical diagnosis, IR–VIS fusion may simplify discovering the dependency between objects invisible in the thermogram and the temperature around them. Further investigation should examine the indicated fusion methods and the evaluation methodology on images containing real, known abnormalities. In further analysis a greater number of fusion algorithms and their parameters should be taken into account. The presented method should also be examined on different image databases with a greater number of images. Acknowledgements. This research is supported by the Polish National Science Center grant No.: UMO-2016/21/B/ST7/02236. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Bai, X.: Morphological center operator based infrared and visible image fusion through correlation coefficient. Infrared Phys. Technol. 76, 546–554 (2016)
2. Dhirendra, M.: Image fusion techniques: a review 7(3), 406–410 (2015)
3. El-Gamal, F.E.Z.A., Elmogy, M., Atwan, A.: Current trends in medical image registration and fusion. Egypt. Inform. J. 17(1), 99–124 (2016)
4. Flusser, J., Sroubek, F., Zitová, B.: Image fusion: principles, methods, and applications. Signal Process. Tutor. EUSIPCO 2007, 60 (2007)
5. Hall, D.L., Llinas, J.: An introduction to multisensor data fusion. Proc. IEEE 85(1), 6–23 (1997)
6. He, C., Liu, Q., Li, H., Wang, H.: Multimodal medical image fusion based on IHS and PCA. Procedia Eng. 7, 280–285 (2010)


7. Jagalingam, P., Hegde, A.V.: A review of quality metrics for fused image. Aquat. Procedia 4, 133–142 (2015)
8. James, A.P., Dasarathy, B.V.: Medical image fusion: a survey of the state of the art. Inf. Fusion 19(1), 4–19 (2014)
9. Jin, X., Jiang, Q., Yao, S., Zhou, D., Nie, R., Hai, J., He, K.: A survey of infrared and visual image fusion methods. Infrared Phys. Technol. (2017)
10. Lahiri, B.B., Bagavathiappan, S., Jayakumar, T., Philip, J.: Medical applications of infrared thermography: a review. Infrared Phys. Technol. 55(4), 221–235 (2012)
11. Li, S., Kang, X., Fang, L., Hu, J., Yin, H.: Pixel-level image fusion: a survey of the state of the art. Inf. Fusion 33, 100–112 (2017)
12. Liu, Z., Blasch, E., John, V.: Statistical comparison of image fusion algorithms: recommendations. Inf. Fusion 36, 251–260 (2017)
13. Mandic, D., Obradovic, D., Kuh, A., Adali, T., Trutschel, U., Golz, M., De Wilde, P., Barria, J., Constantinides, A., Chambers, J.: Data fusion for modern engineering applications: an overview. Artif. Neural Netw. Form. Model. Appl. ICANN 2005, 715–721 (2005)
14. Mukhopadhyay, S., Chanda, B.: Fusion of 2D grayscale images using multiscale morphology. Pattern Recognit. 34(10), 1939–1949 (2001)
15. Parejo, J.A., García, J., Ruiz-Cortés, A., Riquelme, J.C.: StatService: Herramienta de análisis estadístico como soporte para la investigación con metaheurísticas. In: Actas del VIII Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bio-inspirados (2012)
16. Nahm, F.S.: Infrared thermography in pain medicine. Korean J. Pain 26(3), 219–222 (2013)
17. Pohl, C., Van Genderen, J.L.: Review article: multisensor image fusion in remote sensing: concepts, methods and applications 19 (1998)
18. Wald, L.: Some terms of reference in data fusion. IEEE Trans. Geosci. Remote Sens. 37(3), 1190–1193 (1999)
19. Woloshuk, A., Krecichwost, M., Juszczyk, J., Pyciński, B., Rudzki, M., Choroba, B., Ledwon, D., Spinczyk, D., Pietka, E.: Development of a multimodal image registration and fusion technique for visualising and monitoring chronic skin wounds. In: International Conference on Information Technologies in Biomedicine, pp. 138–149. Springer (2018)
20. Yadav, J., Dogra, A., Goyal, B., Agrawal, S.: A review on image fusion methodologies and applications. Res. J. Pharm. Technol. 10(4), 1239–1251 (2017)
21. Zhanbin, W., Yide, M.: Dual-channel PCNN and its application in the field of image fusion. In: Proceedings - Third International Conference on Natural Computation, ICNC 2007, vol. 1, pp. 755–759 (2007)

Wavelet Imaging Features for Classification of First-Episode Schizophrenia

Kateřina Maršálová and Daniel Schwarz

Masaryk University, Faculty of Medicine, Kamenice 5, 625 00 Brno-Bohunice, Czech Republic
Institute of Biostatistics and Analyses, Ltd., Poštovská 68/3, 602 00 Brno, Czech Republic
{marsalova,schwarz}@iba.muni.cz

Abstract. Recently, multiple attempts have been made to support computer diagnostics of neuropsychiatric disorders, using neuroimaging data and machine learning methods. This paper deals with the design and implementation of an algorithm for the analysis and classification of magnetic resonance imaging data for the purpose of computer-aided diagnosis of schizophrenia. Features for classification are first extracted using two morphometric methods: voxel-based morphometry (VBM) and deformation-based morphometry (DBM); and then transformed into a wavelet domain by discrete wavelet transform (DWT) with various numbers of decomposition levels. The number of features is reduced by thresholding and subsequent selection by: Fisher's Discrimination Ratio, Bhattacharyya Distance, and Variances – a metric proposed in the literature recently. A Support Vector Machine with a linear kernel is used here as a classifier. The evaluation strategy is based on leave-one-out cross-validation. The highest classification accuracy – 73.08% – was achieved with 1000 features extracted by VBM and DWT at four decomposition levels and selected by Fisher's Discrimination Ratio and Bhattacharyya distance. In the case of DBM features, the classifier achieved the highest accuracy of 72.12% with 5000 discriminating features, five decomposition levels and the use of Fisher's Discrimination Ratio. Keywords: Classification · Machine learning · Neuroimaging · Schizophrenia · Support vector machines · Wavelet transformation

1

Introduction

In recent years, medical imaging methods have been intensively developed to provide comprehensive and extensive data for further processing. This progress in the field of neuroscience allows, on the one hand, thorough study of brain structures, and on the other, the discovery of connections between brain structure and function.


Brain image analysis in psychiatric research has begun to be used with great potential, which has led to finding connections between neuropsychiatric disorders and structural changes of the brain. Machine learning and artificial intelligence are very promising methods for computer aided diagnosis of neuropsychiatric disorders. The use of these computing methods is highly desirable, as they might create an objective way of early diagnosis, resolution and prognosis. An objective and early diagnosis would then allow for higher treatment efficiency. Many attempts have been made to use machine learning and artificial intelligence for the classification of patients and healthy controls, and also for prediction of the disease progression based on neuroimaging data [9,26]. The methods are, however, still showing too low an accuracy to be used in clinical practice, and further research is therefore needed. Computational neuroanatomy represents a set of methods for processing and analysis of structural data from magnetic resonance imaging (MRI). The brain images are analyzed in voxel-by-voxel tests to uncover any statistically significant changes resulting from neuropsychiatric disorders, and thus illustrate the relationship between brain morphology and impaired brain functions. The two most frequently used computational neuroanatomy methods are Deformation-based Morphometry (DBM) and Voxel-based Morphometry (VBM). The main advantage compared to other morphometry methods (such as volumetry based on manual segmentation of regions of interest) is the ability to analyze data from the whole brain without arbitrarily defined boundaries [4]. This is very important for neuroimaging analysis of mental disorders, where the morphological abnormalities are suspected to have very complex spatial patterns rather than to follow the shapes of anatomic structures. The DBM method detects structural differences, such as changes in shape, position and size, compared to a template of normal brain anatomy taken from an available digital brain atlas [4,21]. The DBM results depend on the technique used for non-linear image registration. During the registration, a so-called deformation field is computed for each subject, which contains information about the displacement of voxels - how much and in what directions they should be moved to match the template. These displacements are then analyzed statistically in a group comparison to uncover only the significant abnormalities in terms of shape, volume or other metrics which can be computed from the underlying vector fields. The VBM method aims at unveiling statistical differences in the local volume of the brain tissue in each voxel. The VBM pipeline consists of multiple preprocessing and analysis steps resulting in a Statistical Parametric Map (SPM), which displays voxels where the density [25] or volume [10] of gray or white matter differs significantly between groups of healthy and diseased subjects. The VBM results depend not only on the registration technique used inside the pipeline, but also on the segmentation of the brain tissues. In order to classify 52 healthy subjects and 52 patients with schizophrenia, a Support Vector Machine (SVM) classifier was used, which has been a widely used classification tool in recent years [20].


The authors proposed a classification algorithm for MRI data which are preprocessed using both the DBM and VBM methods and subsequently transferred into a domain of discrete wavelet detail coefficients for further analysis. The quality of classification with various configurations of the algorithm's parameters was then quantitatively evaluated using leave-one-out cross-validation.

2

Methods

Our proposed algorithm follows the ideas of Dluhoš et al. in [5]. The main contribution compared to the original algorithm lies in carrying out more experiments with different decomposition levels of the wavelet transform. The algorithm is outlined in Fig. 1 and consists of 5 main steps. (i) The data are first preprocessed using VBM and DBM to extract features. (ii) Subsequently, these images are decomposed by the wavelet transformation into another domain. (iii) After that, the number of coefficients is reduced by discarding those with absolute values smaller than 0.05. (iv) The number of features is further reduced by selection, and the most discriminating features are obtained. (v) Lastly, the classification is carried out, the quality of which is evaluated using sensitivity, specificity and accuracy.

2.1 Data

The proposed classification algorithm is tested on a dataset provided by The University Hospital Brno. This dataset contains a total of 104 subjects, of which 52 are patients with a first episode of schizophrenia (FES) and 52 are healthy volunteers (NC – Normal Control). The dataset thus contains 104 T1-weighted MR images of the studied subjects with a resolution of 160 × 512 × 512 voxels. The data were obtained with a 1.5 T magnetic resonance device. The subjects are hospitalized individuals, exclusively men with a mean age of 24 years, who participated in a study at the Psychiatric Clinic of Masaryk University in Brno. The primary criterion for entering the research is suffering from symptoms of this disease for the first time and for at least one month. The subjects are included in the study after a diagnostic interview with an experienced psychiatrist of the clinic. Additional investigations resulted in a number of subjects being excluded because of neurological disease, drug addiction and other causes [21]. The remaining 52 male schizophrenia patients are paired with healthy volunteers based on age and hand preference. All subjects signed informed consent and the study was approved by the ethics committee. The dataset used in this paper is the same as the one in [7]. The obtained T1-weighted data of the 104 subjects are first preprocessed using DBM and VBM. In addition to these morphometrically preprocessed images, image data processed by the wavelet transform also enter the classification algorithm.

2.2 Wavelet Transform

The 3-D discrete wavelet transformation is used to extract the features from the data. The wavelet transformation is a tool that allows the studied signal to be processed and converted into another domain. While the more familiar Fourier transform represents the signal in the form of harmonic signals of different frequencies, the wavelet transformation converts the signal into a time-frequency description. This means that apart from the different frequency components that occur in the signal, it also records their localization in time [17]. The principle of signal transformation into the wavelet domain consists of decomposition of the signal into a linear combination of functions derived from the so-called mother wavelet ψ(t). The coefficients of this linear combination then describe the correlation with the original signal [16,17]. The base function or mother wavelet, which determines the basic shape of the wavelet, is defined as

ψ_a,b(t) = (1/√a) ψ((t − b)/a),   (1)

where a ∈ R+ and b ∈ R are its dilatation and displacement parameters. The transformation of the signal into the new wavelet domain relies on the displacement and dilatation of this function. The shift of the mother wavelet determines the time location at which the transformation is performed. Its dilatation then determines the frequency on which the transformation focuses [17]. The features that enter the next steps of the analysis are extracted by the Discrete Wavelet Transform (DWT), which can also be seen as a specially sampled continuous wavelet transform in which the wavelet function ψ(t) behaves as a bandpass filter around its central frequency. The DWT is implemented with the so-called Fast Algorithm. The processed data have been transformed into a space where the information contained is sparsely represented (a discrete signal can be called sparse if most of its coefficients are equal to zero [22]). For decomposition into the wavelet domain, two optional parameters are needed: the signal decomposition level and the mother wavelet. On the basis of the literature [3,11] and consultation with a specialist, the sym5 mother wavelet is selected, which has proven itself in research on natural images [11]. Decomposition levels of 3, 4 and 5 are set as a parameter entering the classification algorithm. After decomposition of the image data by the wavelet transformation, each subject is described by a wavelet coefficient vector, which is ordered in descending order of magnitude. The largest amount of energy in the image is contained in the coefficients with the highest values [11]. Based on this information, coefficients with an absolute value smaller than the selected threshold of 0.05 are removed from the coefficient vector [1]. After wavelet transformation of the DBM preprocessed images to 3 decomposition levels, only 90 thousand of the original 8.5 million coefficients are retained, while 99% of the energy is preserved.


In the case of VBM, after the wavelet transformation the number of features is reduced from 3 million to 23 thousand while maintaining the same percentage of energy. The algorithm also works with data that are not processed by the wavelet transform. In this case, instead of thresholding, the primary data size reduction and feature selection use a binary brain mask that removes the non-brain fragments from the image data. For all data sets, this step is followed by the calculation of metrics in order to select the discriminating features.
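A minimal sketch of the 3-D wavelet feature extraction and hard thresholding described above is given below, assuming the PyWavelets package; the dummy input volume is a placeholder for a DBM- or VBM-preprocessed brain image.

```python
# Sketch of 3-D DWT feature extraction with hard thresholding (PyWavelets).
import numpy as np
import pywt

def wavelet_features(volume, level=3, wavelet='sym5', threshold=0.05):
    coeffs = pywt.wavedecn(volume, wavelet=wavelet, level=level)
    arr, _ = pywt.coeffs_to_array(coeffs)        # all coefficients as one array
    vec = arr.ravel()
    kept = vec[np.abs(vec) >= threshold]         # discard near-zero coefficients
    retained_energy = (kept ** 2).sum() / (vec ** 2).sum()
    return kept, retained_energy

# Example with a dummy 64x64x64 volume standing in for a preprocessed image.
features, energy = wavelet_features(np.random.rand(64, 64, 64), level=3)
print(len(features), f'{100 * energy:.1f}% of energy retained')
```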

2.3 Feature Selection

Despite the reduction of the number of features by thresholding or by the binary brain mask, the amount of data entering the classification algorithm is still too large. Therefore, for the systematic experiments, the following criteria are chosen to express the ability of the features to divide subjects into the desired groups:

– Fisher's Discrimination Ratio (FDR), which has been used several times in this context and has achieved great results [24,27]:

FDR = (μ1 − μ2)² / (σ1² + σ2²),   (2)

– Bhattacharyya distance [8]:

Bha = (1/4) · (μ1 − μ2)² / (σ1² + σ2²) + (1/2) · ln[(σ1² + σ2²) / (2 σ1 σ2)],   (3)

where μ1 and μ2 are the mean feature values in the first and second class (i.e. in the class of schizophrenic patients and in the class of healthy controls) and σ1² and σ2² denote the variances of the feature values within the classes.

– The last tested criterion was the Variances metric [5]:

variances = σ² / (σ1² + σ2²),   (4)

where σ² represents the variance of the feature over all subjects.
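For illustration, the three criteria (2)-(4) can be implemented directly; in the sketch below x1 and x2 are the values of one feature in the patient and control groups, and select_features is a hypothetical helper name for the ranking step described next.

```python
# Sketch of the feature-selection criteria (2)-(4) and the ranking step.
import numpy as np

def fdr(x1, x2):                                       # Eq. (2)
    return (x1.mean() - x2.mean()) ** 2 / (x1.var() + x2.var())

def bhattacharyya(x1, x2):                             # Eq. (3)
    v1, v2 = x1.var(), x2.var()
    return (0.25 * (x1.mean() - x2.mean()) ** 2 / (v1 + v2)
            + 0.5 * np.log((v1 + v2) / (2.0 * np.sqrt(v1 * v2))))

def variances(x1, x2):                                 # Eq. (4)
    pooled = np.concatenate([x1, x2])
    return pooled.var() / (x1.var() + x2.var())

def select_features(X1, X2, criterion, p):
    # X1, X2: (subjects x features) matrices of the two classes.
    scores = np.array([criterion(X1[:, j], X2[:, j]) for j in range(X1.shape[1])])
    return np.argsort(scores)[::-1][:p]                # indices of the p best features
```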

After the value of the criterion is computed for each feature, the values are sorted in descending order of magnitude, and the p features with the largest values are passed to the next step of the algorithm. An important issue is a suitable choice of the number p of best-discriminating features. For this reason, the number of features p is an optional parameter of the classification algorithm; within the experiment it is set to {100, 500, 1000, 5000, 10000}.

2.4 Classification

The extracted and selected features of each subject then enter the classification algorithm. These features, together with the label indicating whether a subject is a schizophrenia patient or a healthy control, enter the learning phase of a Support Vector Machine (SVM) classifier with a linear kernel. This algorithm classifies entities into predefined classes by searching for a hyperplane that divides the image data into the individual groups in the most robust way, i.e. it seeks the separation that maximizes the distance of the borderline feature vectors from the decision boundary [2]. The output of the trained classifier is an indicator of the group to which the algorithm assigned the subject. The recognition of schizophrenia patients and healthy controls is verified on test subjects, and the quality of the classification is evaluated using sensitivity, specificity and overall accuracy:

SENS = TP / (TP + FN),   (5)

SPEC = TN / (FP + TN),   (6)

ACC = (TP + TN) / (TP + FN + TN + FP),   (7)

where TP denotes true positive results, TN true negative results, FP false positives, and FN false negatives. The frequencies of these variables are obtained from the results of leave-one-out cross-validation [12].
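A minimal sketch of the linear SVM with leave-one-out cross-validation and the indices (5)-(7) is shown below, assuming scikit-learn; note that in a fully rigorous setup the feature selection step would be repeated inside every fold.

```python
# Sketch of linear-kernel SVM evaluation with leave-one-out cross-validation.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut

def loo_evaluation(X, y):                     # y: 1 = FES patient, 0 = control
    predictions = np.empty_like(y)
    for train_idx, test_idx in LeaveOneOut().split(X):
        clf = SVC(kernel='linear')
        clf.fit(X[train_idx], y[train_idx])
        predictions[test_idx] = clf.predict(X[test_idx])
    tp = np.sum((predictions == 1) & (y == 1))
    tn = np.sum((predictions == 0) & (y == 0))
    fp = np.sum((predictions == 1) & (y == 0))
    fn = np.sum((predictions == 0) & (y == 1))
    sens = tp / (tp + fn)                     # Eq. (5)
    spec = tn / (fp + tn)                     # Eq. (6)
    acc = (tp + tn) / (tp + fn + tn + fp)     # Eq. (7)
    return sens, spec, acc
```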

3

Experiments and Results

Several experiments are performed with the proposed classification algorithm on the dataset described above. The classifier depends on four parameters, namely:

1. Preprocessing of the magnetic resonance images: {DBM, VBM},
2. Decomposition level of the wavelet transform: {3, 4, 5} decomposition levels,
3. Metric for the selection of the most discriminating features: {FDR, Bha, Variances},
4. Number of selected features p: {100, 500, 1000, 5000, 10000} features.

Experiments are performed on all possible combinations of parameter settings, i.e. 2 × 3 × 3 × 5 = 90 experiments with data processed by the wavelet transformation and 30 experiments with data not so processed; together, 120 experiments are carried out. Tables 1 and 2 show the best results of the experiments for the different metrics for selecting the most discriminating features, the decomposition levels of the wavelet transformation, and the two best numbers of selected features.


For DBM data, the most accurate results are obtained for 5 decomposition levels when working with Fisher's Discrimination Ratio and 5000 discriminating features, and again for 5 decomposition levels when working with the Variances metric and 5000 features. For VBM preprocessed data, the best results are obtained when working with Fisher's Discrimination Ratio and the Variances metric at 4 decomposition levels and 1000 features.

4

Discussion

This article deals with the design of a classification algorithm and the application of the wavelet transformation to image data, aiming at computer aided diagnostics of the first episode of schizophrenia. In current clinical practice, neuropsychiatric disorders are diagnosed only on the basis of an interview with a psychiatrist. For this reason, these procedures could open the door to more efficient and accurate diagnostics, and possibly to the prediction of such diseases. At the same time, it would then be possible to refine the medication and possibly set up individual treatment for each patient. The classification algorithm worked with 4 types of data sets differing in the image processing methods: data processed by deformation-based morphometry with and without decomposition into the wavelet domain, and data processed using voxel-based morphometry with and without the wavelet transformation.

Fig. 1. A schematic diagram of the proposed classification algorithm. TP expresses true positive results, TN denotes true negative results, FP indicates false positives, and FN indicates false negatives


Table 1. Classifier success in recognition between FES and HC datasets preprocessed by DBM, depending on the metric for selecting the most discriminating features, the decomposition level of the wavelet transform and the number of features. Highlighted values reach the best results

Metric  Decomposition level  Number of features  Accuracy [%]  Sensitivity [%]  Specificity [%]
FDR     0                    1000                47.12         51.92            42.31
FDR     0                    5000                63.46         67.31            59.62
FDR     3                    1000                60.58         59.62            61.54
FDR     3                    5000                68.27         69.23            67.31
FDR     4                    1000                56.73         55.77            57.69
FDR     4                    5000                69.23         67.31            71.15
FDR     5                    1000                58.65         57.69            59.62
FDR     5                    5000                72.12         67.31            76.92
Bha     0                    1000                50.00         55.77            44.23
Bha     0                    5000                61.54         67.31            55.77
Bha     3                    1000                62.50         63.46            61.54
Bha     3                    5000                62.50         63.46            61.54
Bha     4                    1000                61.54         61.54            61.54
Bha     4                    5000                65.38         63.46            67.31
Bha     5                    1000                60.58         61.54            59.62
Bha     5                    5000                64.42         61.54            67.31
Var     0                    1000                46.15         51.92            40.38
Var     0                    5000                63.46         69.23            57.69
Var     3                    1000                63.46         61.54            65.38
Var     3                    5000                70.19         71.15            69.23
Var     4                    1000                57.69         55.77            59.62
Var     4                    5000                69.23         67.31            71.15
Var     5                    1000                59.62         57.69            61.54
Var     5                    5000                71.15         67.31            75.00

Such preprocessed data, together with the other selected parameters that affected the resulting image analysis, then entered the classification algorithm. The algorithm worked mainly with data decomposed by the wavelet transformation into a new, more suitable wavelet domain. In this case, it was necessary to specify 2 parameters: the decomposition level and the mother wavelet. Based on consultation with an expert and on the literature [3,11], the sym5 mother wavelet was chosen, which gives good performance for the decomposition of natural images such as medical data. The decomposition levels were set to several values and appear as one of the parameters of the classification algorithm.


Table 2. Classifier success in recognition between FES and HC datasets preprocessed by VBM, depending on the metric for selecting the most discriminating features, the decomposition level of the wavelet transform and the number of features. Highlighted values reach the best results

Metric  Decomposition level  Number of features  Accuracy [%]  Sensitivity [%]  Specificity [%]
FDR     0                    1000                57.69         51.92            63.46
FDR     0                    5000                54.81         50.00            59.62
FDR     3                    1000                70.19         69.23            71.15
FDR     3                    5000                71.15         67.31            75.00
FDR     4                    1000                73.08         71.15            75.00
FDR     4                    5000                67.31         65.38            69.23
FDR     5                    1000                71.15         69.23            73.08
FDR     5                    5000                67.31         63.46            71.15
Bha     0                    1000                53.85         48.08            59.62
Bha     0                    5000                53.85         50.00            57.69
Bha     3                    1000                73.08         73.08            73.08
Bha     3                    5000                71.15         67.31            75.00
Bha     4                    1000                72.12         71.15            73.08
Bha     4                    5000                66.35         63.46            69.23
Bha     5                    1000                60.58         61.54            59.62
Bha     5                    5000                64.42         61.54            67.31
Var     0                    1000                59.62         53.85            65.38
Var     0                    5000                55.77         51.92            59.62
Var     3                    1000                71.15         71.15            71.15
Var     3                    5000                71.15         67.31            75.00
Var     4                    1000                73.08         71.15            75.00
Var     4                    5000                69.23         67.31            71.15
Var     5                    1000                71.15         69.23            73.08
Var     5                    5000                68.27         65.38            71.15

By selecting the mother wavelet and the exact values of the decomposition levels, the parameters of the classification algorithm were limited to a certain extent. Coefficients of the wavelet-transformed data with an absolute value below the predetermined threshold T = 0.05 were deleted from the dataset. The threshold value was determined experimentally by calculating the retained image energy, which declines rapidly with an increasing threshold value. Furthermore, the selection of the number of discriminating features was performed. Three different metrics were used for this purpose, which also figured as parameters of the classification algorithm: Fisher's Discrimination Ratio, Bhattacharyya distance, and Variances.


In Table 1, it can be seen that the classification algorithm achieves better results for image data decomposed into the wavelet domain than for data without the transformation. For data describing gray matter density, the best results are obtained for 3 and 4 decomposition levels, while data describing local volume changes achieve the best results for 5 decomposition levels. The higher classification accuracy achieved for the data processed by the wavelet transformation could be caused by the fact that a smaller number of wavelet coefficients contained a substantial amount of information, so the data were not burdened with ballast. In addition to the decomposition levels and the image preprocessing methods, the metric selecting the most discriminating features and the number of selected features were also among the important factors that undoubtedly affect the classifier's final success. For data describing local volume changes, the classifier achieved its best performance at 5 decomposition levels with 5000 features selected using Fisher's Discrimination Ratio and the Variances metric. The first case reached 72.12% accuracy, 67.31% sensitivity and 76.92% specificity; the second case 71.15% accuracy, 67.31% sensitivity and 75.00% specificity. Using data describing gray matter density, 73.08% accuracy, 71.15% sensitivity, and 75.00% specificity were reached. In this case, the data were decomposed into 4 decomposition levels and the classifier worked with 1000 features selected in the first case by the Variances metric and in the second by Fisher's Discrimination Ratio. In order to make a comparison with other studies dealing with the classification of subjects with schizophrenia and healthy volunteers, only studies that work with patients in the initial phase of schizophrenia should be selected. If a patient suffers from a chronic form of schizophrenia, the morphological abnormalities differ from those of patients in the initial first episode and are even more noticeable [15,18]. The results from this image data analysis are comparable to those of similarly oriented works [9,21,22,26]. Despite these results of classifying schizophrenia patients and healthy individuals, the resulting accuracy is still too low for the proposed algorithm to be used in clinical practice as an objective method for diagnosis of this devastating neuropsychiatric disorder. The low resulting accuracy might have been caused by several difficulties during the analysis. The difficulties, compared to conventional pattern recognition, might have come from at least the four following sources. (i) The feature-to-instance ratio is extremely large, in the order of 5000:1, while in a typical pattern recognition problem it is expected to be much smaller than 1. Thus, algorithms for brain-image classification often cope with the problem known as the "curse of dimensionality" or the so-called "small sample size problem" - a well-known and described problem in the domain of computational neuroscience [14]. (ii) There is a spatial relationship between the features that needs to be taken into account. (iii) The signal-to-noise ratio is low.


(iv) There is great redundancy in the feature set [13]. Due to these problems, there is still space for additional improvements and analyses. Acknowledgements. This work was supported by the research grant from the Ministry of Health, Czech Republic No. 17-33136A.

References

1. Anutam, R.: Performance analysis of image denoising with wavelet thresholding methods for different levels of decomposition. Int. J. Multimed. Its Appl. 6(3) (2014)
2. Bishop, C.: Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, New York (2006). ISBN 9780387310732
3. Al-Qazzaz, N.K., Bin Mohd Ali, S.H., Ahmad, S.A., Islam, M.S., Escudero, J.: Selection of mother wavelet functions for multi-channel EEG signal analysis during a working memory task. Sensors 15(11), 29015–29035 (2015)
4. Ashburner, J., Friston, K.J.: Voxel-based morphometry – the methods. NeuroImage 11, 805–821 (2000)
5. Dluhoš, P.: Multiresolution feature selection for recognition in magnetic resonance brain images. Master thesis, Masaryk University, Faculty of Science, Department of Experimental Biology, Brno (2013)
6. Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York (2009). ISBN 9780387848587
7. Janousova, E., Montana, G., Kasparek, T., Schwarz, D.: Supervised, multivariate, whole-brain reduction did not help to achieve high classification performance in schizophrenia research. Front. Neurosci. 10 (2016)
8. Kailath, T.: The divergence and Bhattacharyya distance measures in signal selection. IEEE Trans. Commun. 15(1), 52–60 (1967)
9. Kašpárek, T., Thomaz, C.E., Sato, J.R., Schwarz, D., Janousova, E., Marecek, R., Prikryl, R., Vanicek, J., Fujita, A., Ceskova, E.: Maximum-uncertainty linear discrimination analysis of first-episode schizophrenia subjects. Psychiatry Res. Neuroimaging 191(3), 174–181 (2011)
10. Kennedy, K.M., Erickson, K.I., Rodrigue, K.M., Voss, M.W., Colcombe, S.J., Kramer, A.F., Acker, J.D., Raz, N.: Age-related differences in regional brain volumes: a comparison of optimized voxel-based morphometry to manual volumetry. Neurobiol. Aging 30, 1657–1676 (2009)
11. Kumari, S.: Effect of symlet filter order on denoising of still images. Adv. Comput. Int. J. 3(1), 137–143 (2012)
12. Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley (2004). ISBN 0471210781
13. Kuncheva, L.I., Rodriguez, J.J., Plumpton, C.O., Linden, D.E.J., Johnston, S.J.: Random subspace ensembles for fMRI classification. IEEE Trans. Med. Imaging 29(2), 531–542 (2010)
14. Lemm, S., Blankertz, B., Dickhaus, T., Müller, K.-R.: Introduction to machine learning for brain imaging. Neuroimage 56, 387–399 (2011)
15. Lieberman, J.A., Tollefson, G.D., Charles, C., Zipursky, R., Sharma, T., Kahn, R.S., Group, H.S.: Antipsychotic drug effects on brain morphology in first-episode psychosis. Arch. Gen. Psychiatry 62(4), 361–370 (2005)


16. Mallat, S.: A Wavelet Tour of Signal Processing: The Sparse Way. Elsevier, Boston (2007). ISBN 9780123743701
17. Misiti, M., Misiti, Y., Oppenheim, G., Poggi, J.-M.: Wavelets and their Applications. Wiley-ISTE (2007). ISBN 9781905209316
18. Pierson, R., Magnotta, V.: Long-term antipsychotic treatment and brain volumes: a longitudinal study of first-episode schizophrenia. Arch. Gen. Psychiatry 68(2), 128–137 (2012)
19. Rathi, V.P.G.P., Palani, S.: Brain tumor MRI image classification with feature selection and extraction using linear discriminant analysis (2012). arXiv:1208.2128 [cs]
20. Schölkopf, B.: Learning with kernels. J. Electrochem. Soc. 129, 2865 (2002)
21. Schwarz, D., Kašpárek, T., Provazník, I., Jarkovský, J.: A deformable registration method for automated morphometry of MRI brain images in neuropsychiatric research. IEEE Trans. Med. Imaging 26, 452–461 (2007)
22. Starck, J.-L., Murtagh, F., Fadili, J.M.: Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity. Cambridge University Press (2010). ISBN 9780521119139
23. Vyškovský, R., Schwarz, D., Janoušová, E., Kašpárek, T.: Random subspace ensemble artificial neural networks for first-episode schizophrenia classification. Ann. Comput. Sci. Inf. Syst. 8, 317–321 (2016)
24. Wang, S., Li, D., Song, X., Wei, Y., Li, H.: A feature selection method based on improved Fisher's discriminant ratio for text sentiment classification. Expert Syst. Appl. 38(7), 8696–8702 (2011)
25. Wright, I.C., McGuire, P.K., Poline, J.B., Travere, J.M., Murray, R.M., Frith, C.D., Frackowiak, R.S., Friston, K.J.: A voxel-based method for the statistical analysis of gray and white matter density applied to schizophrenia. Neuroimage 2(4), 244–252 (1995)
26. Yoon, J.H., Nguyen, D.V., McVay, L.M., Deramo, P., Michael, J., Ragland, J.D., Niendham, T., Solomon, M., Carter, C.S.: Automated classification of fMRI during cognitive control identifies more severely disorganized subjects with schizophrenia. Schizophr. Res. 135, 28–33 (2013)
27. Zhang, H., Ho, T.B., Lin, M., Liang, X.: Feature extraction for time series classification using discriminating wavelet coefficients. Adv. Neural Netw. 3971, 1394–1399 (2006)

Dynamic Occlusion Surface Estimation from 4D Multimodal Data

Agnieszka A. Tomaka(B), Leszek Luchowski, Dariusz Pojda, and Michal Tarnawski

Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Baltycka 5, Gliwice, Poland, {ines,leszek.luchowski,dpojda}@iitis.pl, [email protected]

Abstract. Methods are described of determining the dynamic occlusion surface based on dynamic sequences of mandibular motion, acquired with a dynamic scanner, and measurements acquired with the Zebris axiograph. The Zebris data is brought into register with multimodal imagery, using three corresponding transformations. This registration allows dense motion sequences to be simulated, which in turn allows occlusal conditions to be known at various stages of movement and the dynamic occlusal surface to be identified for the purposes of occlusal splint design. Keywords: Dynamic occlusion · Mandibular movement simulation · Multimodal image registration · Temporomandibular disorder diagnosis

1

Introduction

The occlusal surface of a single tooth is the one facing the opposing jaw. The sum of these surfaces over the maxillary or mandibular teeth is, respectively, the upper or lower occlusal surface. Their shape is determined by skull and tooth anatomy and by tooth positions in the dental arches, and is adjusted for functions such as biting, mastication and swallowing. There are points and areas of contact between the occlusal surfaces, which differ between the articulation positions of the mandible; their correct distribution protects against overload. Changes of the occlusal surface can lead to patient discomfort and to malfunctions of the temporomandibular joint (TMJ). For this reason, any change in the shape of teeth during dental treatment is corrected with the use of articulating paper to identify excess fragments - places with undesirable tooth contact during various phases of mandibular motion - and cut them off. The importance of the occlusal surface is even greater in prosthetics and in the design of occlusal splints for the treatment of TMJ disorders and bruxism. A distinction should be made between static occlusion - defined as the contact points of the dental arches during swallowing, when the jaws are momentarily clenched and the contact area is the greatest - and dynamic occlusion,


defined as points or areas where tooth contact occurs in various positions of the mandible. Dynamic analysis of the occlusal surface is very often used in dentistry for the analysis of jaw movements and for testing dental appliances: occlusal splints, dentures, brackets, crowns etc. A device which augments this process is the articulator, combining plaster models of the dental arches with a simplified mechanism emulating the function of the temporomandibular joint. Articulators help the dental practitioner adjust the appliances to the patient's individual occlusal conditions. The procedure is tedious and requires the manual work of an experienced technician. At present, the development of 3D imaging, image processing and measurement techniques makes the creation of a virtual articulator possible [1,2]. Virtual dental models can be acquired with intraoral scanners [3,4]. Another possibility is to scan plaster dental models, or to reconstruct tooth surfaces from cone beam computed tomography or computed tomography (CBCT/CT) [5]. While acquiring 3D data of the dental arches is itself not a problem, a precondition for using it is to find the relationship between this data and other anatomical structures. This issue has been addressed in various ways by a number of authors [5–8], but their solutions are proposals rather than accepted standards. The motion of the mandible is assessed either by using a virtual hinge contained in the design of the articulator, or by acquisition of the true motion [9,10]. A number of systems for tracking the motion of the mandible exist, for example Zebris or Cadiax; a problem, however, is to precisely register the positions of the mandible against other structures. 3D scanners have also been used to assess mandibular motion; other methods use calibrated photogrammetry or stereo vision as an affordable and readily available solution [7–9]. Dynamic occlusion data can be used to form a tooth surface, for example when the crown of an implanted tooth is formed by its opposing tooth sliding over it [11]. Knowledge of the dynamic occlusion surface also helps in assessing the influence of TMJ function and mandibular motion on the shape of teeth [12]. One of the applications of a dynamic occlusal surface is in the design of occlusal splints. In [13] it was proposed that the splint should replicate the shape of the arch that bears it; in this way, the same contact conditions are ensured with the splint inserted as there were before its application. While dynamic occlusion was not available, it was assumed that the shape of the occlusion is reflected in the tooth, and those parts of the tooth surface which lay within a preset distance of the opposing arch were taken to be the occlusal surface. Depending on what maximum distance is interpreted as contact, various surface fragments can be designated. In a previous paper [10] it was proposed to follow mandibular motion with the use of a dynamic 3dMD scanner, which allowed the dynamic data to be put in register with CBCT reconstructions, leading to the creation of dynamic virtual X-ray images and visualisation of mandible motion. Unfortunately, the temporal resolution of the dynamic 3D scanner was insufficient for fluent observation. The present work concentrates on registering dense measurements of mandible motion acquired with the Zebris (75 measurements per second) to static and dynamic scanner output and CBCT reconstructions. At the same time, a


method is presented for determining dynamic occlusal surfaces, which can serve in the construction of a splint and be the basis for analysis of the work of the TMJ.

2 Materials and Methods

The paper is a continuation of previous work concerning the registration of static multimodal images of the head [5], mandibular movement acquisition with a dynamic 3D scanner [14] and the registration of this static and dynamic data [10].

2.1 The Idea of the Method

A schematic diagram of the method is shown in Fig. 1. Operations published in earlier papers are on the left-hand side. The right-hand side, on a shaded background, is the original contribution of the present paper. An important part of it is the registration of two imaging modalities (the Zebris data and the 3D scans) based not on point or other feature correspondences, but on motions. These motions are known to be the same in the two coordinate systems, even though in each of them they affect a different set of points: the Zebris axiograph yields an abstract triangle in different positions, while the 3D scans contain images of actual tissues before and after each move. Another aspect is the detection of occlusion in various positions of the mandible.

Fig. 1. Diagram of the data flow


The previously developed operations will be briefly recapitulated insofar as is necessary for the sake of clarity. For more detailed descriptions, the reader is referred to the bibliography.

2.2 Input Data

Orthodontic treatment is a long-term process, during which various image documentation is collected from recommended diagnostic examinations. The work uses data from the following static examinations: CBCT/CT and 3D scanning of the face and of the teeth or dental models, as well as measurements from dynamic examinations of mandible movement with a 3D scanner and the Zebris axiograph. All these imagery and measurement data constitute the virtual patient record.

Static Imagery Data. 3D surface reconstructions obtained from the CBCT/CT slices with different thresholds represent different body tissues: soft tissues, bones and teeth (Fig. 2). All the reconstructions are done in the same coordinate system, connected with the CBCT device; therefore the geometrical relations between the patient's different tissues are preserved in the set of reconstructions. Further analysis of the

Fig. 2. CBCT Reconstructions (from left to right): soft tissues, bones, teeth and their mutual relation

Fig. 3. 3D photos and virtual dental models registration


mandible movement requires separation of the reconstruction of the mandible from the other skull bones. If the CBCT is acquired with the patient's jaws in static occlusion, which is common practice in the examination of joint conditions, the reconstruction of the teeth produces a single mesh, which has to be separated manually. Some of the information obtained from these examinations can also be acquired from 3D photos and virtual dental models - 3D images of, respectively, the patient's face and his teeth or dental models, acquired with 3D scanners. Since the data obtained from the 3D scanners concern different regions and the images have no common part, the 3D scanning has to be repeated with a reference object mounted to the upper jaw, which provides the relation between the 3D photo and the virtual dental models (Fig. 3) [5].

Mandible Movement Data from a 3D Scanner. A dynamic 3D scanner makes it possible to acquire the motion of 3D objects in 3D space. Its application to the acquisition of mandibular motion was described in [10]. Since mandibular motion can be regarded as a rigid-body transformation, and the mandibular bone and the teeth are not visible in the obtained images (they are covered by soft tissues), a reference rigid body is needed - a kind of extension, fixed to the mandibular dental arch throughout its movement, and protruding outside of the mouth so as to be visible. The output of the scanner is a sequence of 3D scans containing the surface of the face with the markers attached to the mandibular teeth, which have to be separated in a process of segmentation (Fig. 4).

Fig. 4. 3D photos with arches for chosen mandibular positions (four left); Head motion compensated out (two right)

The scanner used for the present work can take 7 frames per second, which is not quite enough for continuous observation. It is, however, sufficient to capture the position of the mandible at crucial points of its movement, such as static occlusion, extreme mandibular depression or deviation. The data describing these key positions are then converted into rigid-body transformation matrices relating them to the reference position of maximal intercuspation. These transformation matrices are expressed in the CBCT coordinate system, but they physically describe the same changes of position as will be acquired by the Zebris axiograph.


Fig. 5. Mandible movement acquisition using the Zebris axiograph (left), Bonwille triangle (center), diagrams of displacement (right)

Zebris Axiograph Movement Data. In the Zebris axiograph, sets of ultrasound transmitters and receivers attached to the kinematic and facial arches allow position changes to be determined (Fig. 5). As in the case of the 3D scanner, the mandible is assumed to undergo a rigid-body transformation. The transformation is not represented by its parameters; instead, displacement vectors are given for the vertices of the Bonwille triangle (Fig. 5), defined by a point at the junction of the lower central incisor edges and two points on the condyles.
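The displacement vectors of the three Bonwille-triangle vertices implicitly define the mandible's rigid-body transformation. As an illustration (not code from the paper; a minimal NumPy sketch with an invented function name), the rotation and translation can be recovered from the three vertex positions before and after a move using the standard SVD-based (Kabsch) solution:

```python
import numpy as np

def rigid_transform_from_points(p_before, p_after):
    """Estimate R, t such that p_after ~ R @ p_before + t.

    p_before, p_after: (3, 3) arrays, one Bonwille-triangle vertex per row.
    Three non-collinear points fully determine the rigid-body transform
    (up to measurement noise).
    """
    c_before = p_before.mean(axis=0)
    c_after = p_after.mean(axis=0)
    # Cross-covariance of the centred point sets
    H = (p_before - c_before).T @ (p_after - c_after)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:        # guard against a reflection solution
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = c_after - R @ c_before
    return R, t
```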

2.3 Registration of Multimodal Data

Registering the Craniomaxillary Part of the Dynamic Images to the Static Ones. Motion acquisition with the dynamic scanner was performed using two arches: one attached to the upper part of the face, the other to the mandible. Because the patient's position is not constrained in any way while moving the mandible, the dynamic scanner also acquires the motion of the entire head. First, the images in the dynamic series are brought into register using the upper facial arch. After cancelling the movement of the head, the positions of the mandibular arch reflect the motion of the mandible (Fig. 4). Registration using the skin surface allows the movement to be transferred into the head-centred coordinate system used by CBCT. Under unknown correspondence, the iterative closest point (ICP) algorithm can be applied [15]. The distribution of spheres in the arches is known; by registering the surface of the arch to the position of maximal (static) occlusion, it is possible to determine the locations of the sphere centres in this position. Motion can be described equivalently either by a sequence of rigid-body transformation matrices (12 coefficients per instant of time) or by a sequence of positions of 3 non-collinear markers (9 coordinates per instant of time). In the case of transformation matrices, it is preferable to represent the position at each time instant t_i by a transform to it from the position at t_0, rather than from the immediately preceding position at t_{i-1}, because in the latter case determining the position at any given moment would involve concatenating many transformations and would incur a significant accumulation of numerical errors.
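To make the last point concrete, the sketch below (not from the paper; a minimal NumPy illustration with assumed function names and conventions) contrasts the two representations: frame-to-frame transforms must be chained to recover a position, so rounding errors accumulate with the frame index, whereas transforms stored directly against the reference position at t_0 are applied in a single step.

```python
import numpy as np

def to_homogeneous(R, t):
    """Build a 4x4 rigid-body transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def absolute_from_relative(relative_transforms):
    """Accumulate frame-to-frame transforms T_{i-1 -> i} into transforms T_{0 -> i}.

    Every matrix product adds rounding error, so the error grows with i; this is
    why the text prefers storing T_{0 -> i} directly against the reference frame.
    """
    absolute = [np.eye(4)]
    for T_rel in relative_transforms:
        absolute.append(T_rel @ absolute[-1])
    return absolute

def apply_transform(T, points):
    """Apply a 4x4 transform to an (N, 3) array of mesh vertex coordinates."""
    homog = np.hstack([points, np.ones((points.shape[0], 1))])
    return (homog @ T.T)[:, :3]
```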


Registering the Transformations Collected by the Zebris Axiograph. The Zebris JMA axiograph uses ultrasound tracking to measure the position of the mandible relative to the head in all 6 degrees of freedom. Motion acquisition is carried out by tracking the movement of markers. The result is represented as displacement vectors of 3 preselected points (usually two on the heads of the mandible and one between the first mandibular incisors). The diagrams of those displacements (Fig. 5) are evaluated by specialists. The difficulty in using this kind of data lies in the fact that the position of the face is not known in the patient-centred coordinate system used by the Zebris. It is possible to designate 3 points on the upper part of the face, but the designation is done manually, and when the doctor tries to locate the same three points on a 3D scan, their positions are subject to error. Even small registration errors can lead to very significant errors in the simulation of motion in the coordinate system bound to the static imaging data; these errors become visible as the maxillary and mandibular meshes intersecting (a situation not physically possible) in various stages of motion. To eliminate such errors, a different registration technique is needed. Instead of registering the artificial points of the triangle used by the axiograph, registration relies on the rigid-body transformation parameters derived from the positions of the triangle in various stages of mandibular motion. The task is to bring into register the parameters of the same motions acquired - and expressed - in two different coordinate systems:

P_1(t_f) = T_f \cdot P_1(t_0), \qquad P_2(t_g) = T_g \cdot P_2(t_0)    (1)

where f is the index in the first sequence and g the index of the corresponding transformation in the second sequence. The fact that no point correspondences are known poses a significant challenge: the points in P_2 do not correspond to those in P_1. The matrix of a rigid-body transformation comprises a rotation and a translation. Such a transformation can be interpreted as a composition of a pure rotation by the matrix R_f (resp. R_g) and a pure translation by the vector \gamma_f (resp. \gamma_g). The rotation matrix can be represented as a quaternion of rotation, q_f = [q_{f0}, q_{fi}, q_{fj}, q_{fk}] and q_g = [q_{g0}, q_{gi}, q_{gj}, q_{gk}], where

q_{f0} = \cos(\alpha_f/2), \quad q_{fi} = \sin(\alpha_f/2)\, a_f, \quad q_{fj} = \sin(\alpha_f/2)\, b_f, \quad q_{fk} = \sin(\alpha_f/2)\, c_f,
q_{g0} = \cos(\alpha_g/2), \quad q_{gi} = \sin(\alpha_g/2)\, a_g, \quad q_{gj} = \sin(\alpha_g/2)\, b_g, \quad q_{gk} = \sin(\alpha_g/2)\, c_g,    (2)

\alpha_f = \alpha_g is the rotation angle, the same in both coordinate systems, and a_f, b_f, c_f and a_g, b_g, c_g are the directional coefficients of the straight lines (rotation axes) around which the rotation occurs in each coordinate system. A general rigid-body transformation can be uniquely represented as a rotation around an axis and a translation along the same axis. When the transformation matrix has been exactly determined, it is possible to derive the direction and location of this axis in a given coordinate system:

\begin{bmatrix} X_{f_1} & X_{f_2} & X_{f_3} \\ Y_{f_1} & Y_{f_2} & Y_{f_3} \\ Z_{f_1} & Z_{f_2} & Z_{f_3} \end{bmatrix}
+ \begin{bmatrix} a_{f_1} & a_{f_2} & a_{f_3} \\ b_{f_1} & b_{f_2} & b_{f_3} \\ c_{f_1} & c_{f_2} & c_{f_3} \end{bmatrix} t
= R\left(
\begin{bmatrix} X_{g_1} & X_{g_2} & X_{g_3} \\ Y_{g_1} & Y_{g_2} & Y_{g_3} \\ Z_{g_1} & Z_{g_2} & Z_{g_3} \end{bmatrix}
+ \begin{bmatrix} a_{g_1} & a_{g_2} & a_{g_3} \\ b_{g_1} & b_{g_2} & b_{g_3} \\ c_{g_1} & c_{g_2} & c_{g_3} \end{bmatrix} t
\right)
+ \begin{bmatrix} \gamma_x \\ \gamma_y \\ \gamma_z \end{bmatrix} \mathbf{1}^T_3

where \mathbf{1}^T_k is a row vector of ones. After substituting the straight-line equations into the equation of the transformation, a formula is obtained for R and \gamma. X, Y and Z with subscripts f and g are corresponding points, either of intersection or of smallest distance between those lines. This situation corresponds to bringing into register a set of 3 straight lines in two coordinate systems. To solve this problem, 3 corresponding movements are needed, each of them recorded in the coordinate systems of both the dynamic scanner and the Zebris. To achieve this, the so-called occlusal bits can be used, preserving the relations of the dental arches in three different positions. In this situation, the transformations are known to correspond to each other.
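As an illustration of this idea (not code from the paper; a minimal sketch assuming NumPy and SciPy, with invented function names), the rotational part of the registration can be estimated by extracting the rotation axis of each of the three corresponding transformations in both coordinate systems and aligning the two sets of axis directions with an orthogonal Procrustes (Kabsch) fit. The axes must not be parallel, which is exactly the condition on the occlusal bits mentioned in the Conclusions.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def rotation_axis(T):
    """Unit direction of the rotation (screw) axis of a 4x4 rigid-body transform.

    Degenerate near-zero rotations are not handled in this sketch.
    """
    rotvec = Rotation.from_matrix(T[:3, :3]).as_rotvec()
    return rotvec / np.linalg.norm(rotvec)

def align_axes(transforms_f, transforms_g):
    """Estimate the rotation R mapping axis directions expressed in the Zebris
    frame (g) onto the same axes expressed in the CBCT/scanner frame (f)."""
    F = np.column_stack([rotation_axis(T) for T in transforms_f])  # 3 x N
    G = np.column_stack([rotation_axis(T) for T in transforms_g])  # 3 x N
    U, _, Vt = np.linalg.svd(F @ G.T)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])                 # avoid reflections
    return U @ D @ Vt
```

The translational part \gamma would then follow from the axis-location equation above; that step is omitted in this sketch.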

Fig. 6. Fusion of Zebris triangles with the face surface, obtained from CBCT (left), both jaw models obtained from the CBCT (center) and a tooth model (right)

Simulation of Mandible Movement. The Zebris determines the sequence of mandible movements with much better temporal resolution than the dynamic scanner used for registration. When the sequence of Zebris-based transformations has been registered to the head-centred coordinate system (Fig. 6), each of those transformations can be applied to the mesh representing the surface of the mandible. As a result, a dense sequence of positions of this mesh is obtained and a smooth simulation of the movement of the mandible is generated.

2.4 Occlusion Determination

Static Occlusion. In practice, a particularly important position is the static occlusion (during swallowing); the contact area is the greatest in this position. The static occlusion is mostly used when the dynamic occlusion cannot be determined. The static occlusion is identified as the points (of the tooth arch being


analysed) whose smallest distance from the mesh of the opposing arch is less than a preset threshold. Consequently, to each point of a mesh the closest point in the opposing arch is assigned. Determining the closest point does not take into account the general geometric relations between the meshes or the topology of each mesh: the vector connecting corresponding points can intersect a mesh (e.g. for the outer points of the first incisors), or the search for corresponding points can turn in a forbidden direction, i.e. inside a mesh. To eliminate such cases, the matchings for which self-occlusion has been detected are filtered out, as well as those where the vector connecting the matched points forms an obtuse angle with the vertex normal (Fig. 7).

Fig. 7. The occlusion surface (red) at a distance smaller than 3 mm (left), filtered by the cosine between the normal and the vector joining the vertex with the nearest vertex of the opposite arch (center), and without self-occluded faces (right); green - surfaces filtered out in both cases

As occlusion detection requires that, for each mesh vertex, all the faces of the opposite mesh be checked, steps were taken to reduce the computational cost. In particular, the vertices and faces which lie outside the common part of the bounding boxes of the two meshes were omitted. When designing an occlusal splint [16], it is desirable that the occlusion surface be not an assortment of many small areas of contact between the dental arches but, rather, a coherent area which can be used as a basis for the construction of the splint. Consequently, the static occlusion is determined with an arbitrarily adjusted distance parameter ensuring the cohesion of the area.

Dynamic Occlusion Estimation. When the initial mutual position of the dental arches has been registered, the static occlusion can be determined. When the motion of the jaws relative to each other is also known, thanks to the registrations, the static-occlusion algorithm can be applied to each position of the jaws in the recorded sequence (Fig. 8). The distance margin, which was set very large when determining the static occlusion, no longer has to be so generous.
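A simplified sketch of this contact test is given below (not the authors' dpVision implementation; a NumPy/SciPy approximation that matches each vertex to the nearest vertex of the opposing arch rather than to the nearest face, and omits the self-occlusion filter). The function name and the default threshold are illustrative; the paper's figures use 1-3 mm margins.

```python
import numpy as np
from scipy.spatial import cKDTree

def occlusion_points(verts, normals, opposing_verts, max_dist=1.0):
    """Boolean mask of the vertices of one arch that are in (near-)contact
    with the opposing arch.

    verts, normals : (N, 3) vertex positions and unit normals of the analysed arch
    opposing_verts : (M, 3) vertex positions of the opposing arch
    max_dist       : contact threshold in mesh units (e.g. mm)
    """
    # Cheap pruning: keep only vertices inside the dilated overlap of bounding boxes
    lo = np.maximum(verts.min(axis=0), opposing_verts.min(axis=0)) - max_dist
    hi = np.minimum(verts.max(axis=0), opposing_verts.max(axis=0)) + max_dist
    in_box = np.all((verts >= lo) & (verts <= hi), axis=1)

    mask = np.zeros(len(verts), dtype=bool)
    if not np.any(in_box):
        return mask

    tree = cKDTree(opposing_verts)
    dist, idx = tree.query(verts[in_box])
    vec = opposing_verts[idx] - verts[in_box]
    # Keep matches that are close enough and roughly along the outward normal;
    # an obtuse angle between normal and connecting vector is rejected.
    cos_angle = np.einsum('ij,ij->i', vec, normals[in_box]) / np.maximum(dist, 1e-9)
    mask[in_box] = (dist < max_dist) & (cos_angle > 0.0)
    return mask
```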

3 Results

The techniques for determining the dynamic occlusion described above were implemented in the dpVision program [17,18]. As a result, for each acquired relation between the jaws, knowledge about the contact points can be derived. Based on the occlusions so determined, visualisations can be generated for all the registered sequences, which is of diagnostic


Fig. 8. Three selected maxillary-mandible relations (upper row), and their corresponding momentary occlusions within 1mm (middle row) and 2mm (bottom row) distances

value to doctors, as it indicates, for each phase of motion, which teeth are in contact. This information has a high diagnostic significance for the evaluation of the work of the TMJ. Based on the points of contact in various phases of the motion, a set of all contact points can be obtained, and compared to the static occlusion (Fig. 9). It is also feasible to follow the paths of contact for individual teeth, but this will require the implementation of additional functions, allowing the specialist to select the points for which such paths should be displayed.

4 Conclusions

The paper presents new possibilities offered by integrating multidimensional, multimodal imaging and measurement data. Acquisition of mandibular motion, in conjunction with models of anatomical structures obtained from CBCT and a 3D scanner, creates powerful possibilities for simulating the motion represented by the recorded sequence. This is an innovative method, with few parallels in the literature against which it might be compared, and it is still very much at the preclinical stage, with further research under way. Thanks to these simulations, the work of the TMJ can be evaluated, as was shown in [10]. In the present work, methods have been illustrated which allow the relations between the teeth of both arches to be evaluated in various phases of motion. This leads to determining a summary occlusal surface which can be used for splint design. Compared to our previous work, an algorithm has been presented for the registration of motion sequences based on three known corresponding moves


Fig. 9. Comparison of static (left) and dynamic occlusion (right) within 1mm (top row) and 2mm (bottom) distances

in two different coordinate systems. This allows data with greater temporal resolution to be used. The drawback of the method is that the precision of the occlusal surfaces depends on the quality of registration; with poor registration, confusing or unrealistic situations can be created, such as intersecting solids. From the medical point of view, the following aspects have to be improved:

– the occlusal bits need to be diversified so that the rotation axes of the transformations from position zero to that imposed by each of the bits are not parallel to each other;
– the motion strategy needs to be developed so that all possible contact points will be recorded.

References

1. Luthra, R.P., Gupta, R., Kumar, N., Mehta, S., Sirohi, R.: Virtual articulators in prosthetic dentistry: a review. J. Adv. Med. Dent. Sci. Res. 3(4) (2015)
2. Park, S.: Digitalization of virtual articulator: methods, discrepancy to real articulators, comparing of each methods. Master's thesis, Lithuanian University of Health Sciences, Kaunas (2017)
3. Richert, R., Goujat, A., Venet, L.: Intraoral scanner technologies: a review to make a successful impression. J. Healthc. Eng. (2017). https://doi.org/10.1155/2017/8427595
4. Zimmermann, M., Mehl, A., Mörmann, W.H., Reich, S.: Intraoral scanning systems–a current overview. Int. J. Comput. Dent. 18(2), 101 (2015)
5. Tomaka, A.A., Tarnawski, M.: Integration of multimodal image data for the purposes of supporting the diagnosis of the stomatognatic system. In: Lecture notes of the ICB Seminar. 137th ICM Seminar: Novel Methodology of Both Diagnosis and Therapy of Bruxism, IBIB PAN, pp. 68–75 (2015)


6. Solaberrieta, E., Minguez, R., Barrenetxea, L., Etxaniz, O.: Direct transfer of the position of digitized casts to a virtual articulator. J. Prosthet. Dent. 109(6), 411 (2013). https://doi.org/10.1016/S0022-3913(13)60330-3
7. Solaberrieta, E., Garmendia, A., Minguez, R., Brizuela, A., Pradies, G.: Virtual facebow technique. J. Prosthet. Dent. 114(6), 751 (2015). https://doi.org/10.1016/j.prosdent.2015.06.012
8. Lam, W.Y.H., Hsung, R.T.C., Choi, W.W.S., Luk, H.W.K., Cheng, L.Y.Y., Pow, E.H.N.: A clinical technique for virtual articulator mounting with natural head position by using calibrated stereophotogrammetry. J. Prosthet. Dent. 119(6), 902 (2018). https://doi.org/10.1016/j.prosdent.2017.07.026
9. Yuan, F., Sui, H., Li, Z., Yang, H., Lü, P., Wang, Y.: A method of three-dimensional recording of mandibular movement based on two-dimensional image feature extraction. PLoS ONE 10(9), e0137507 (2015). https://doi.org/10.1371/journal.pone.0137507
10. Tomaka, A.A., Tarnawski, M., Pojda, D.: Multimodal image registration for mandible motion tracking. In: Information Technologies in Medicine, pp. 179–191. Springer International Publishing, Cham (2016). https://doi.org/10.1007/978-3-319-39796-2_15
11. Fang, J.J., Kuo, T.H.: Tracked motion-based dental occlusion surface estimation for crown restoration. Comput. Aided Des. 41(4), 315 (2009). https://doi.org/10.1016/j.cad.2008.10.006
12. Oancea, L., Stegaroiu, R., Cristache, C.M.: The influence of temporomandibular joint movement parameters on dental morphology. Ann. Anat. 218, 49 (2018). https://doi.org/10.1016/j.aanat.2018.02.013
13. Luchowski, L., Tomaka, A.A., Skabek, K., Tarnawski, M., Kowalski, P.: Forming an occlusal splint to support the therapy of bruxism. In: Information Technologies in Medicine: 5th International Conference, ITIB 2016, vol. 2, pp. 267–273 (2016). https://doi.org/10.1007/978-3-319-39904-1_24
14. Tarnawski, M., Tomaka, A.A.: Acquisition of mandible movement with a 3D scanner. In: Lecture notes of the ICB Seminar. 137th ICM Seminar: Novel Methodology of Both Diagnosis and Therapy of Bruxism, IBIB PAN, pp. 63–67 (2015)
15. Besl, P., McKay, N.: A method for registration of 3-D shapes. IEEE Trans. PAMI 14(2), 239 (1992). https://doi.org/10.1109/34.121791
16. Pojda, D., Tomaka, A.A., Luchowski, L., Skabek, K., Tarnawski, M.: Applying computational geometry to designing an occlusal splint. In: 6th International Symposium CompIMAGE'18, Kraków (2018). https://doi.org/10.1007/978-3-030-20805-9_16
17. Kowalski, P., Pojda, D.: Visualization of heterogenic images of 3D scene. In: Man-Machine Interactions 3, pp. 291–297. Springer International Publishing, Cham (2014). https://doi.org/10.1007/978-3-319-02309-0_31
18. Pojda, D., Kowalski, P.: Assumptions for a software tool to support the diagnosis of the stomatognathic system: data gathering, visualizing compound models and their motion. In: Lecture notes of the ICB Seminar. 137th ICM Seminar: Novel Methodology of Both Diagnosis and Therapy of Bruxism. IBIB PAN, Warszawa (2015)

Evaluation of Dental Implant Stability Using Radiovisiographic Characterization and Texture Analysis

Marta Borowska1(B) and Janusz Szarmach2

1 Faculty of Mechanical Engineering, Department of Biocybernetics and Biomedical Engineering, Bialystok University of Technology, Wiejska 45C, 15-351 Bialystok, Poland, [email protected]
2 Department of Oral Surgery, Medical University of Bialystok, M. Curie-Sklodowskiej 24A, 15-276 Bialystok, Poland, [email protected]

Abstract. The aim of this study was to assess the bone structure using texture features of panoramic radiographs taken directly after the surgery and 12 months after the implant prosthetic loading. The study also examined the possibility of using texture features as a prognostic indicator of the implant integration process, which is dynamic and modifies the bone structure; depending on the type of implant, this process is more or less visible. The panoramic radiographs of 40 patients who underwent implant treatment with single-threading dental materials were analyzed using texture methods based on first order statistics, the gray level co-occurrence matrix and the fractal dimension. Irregular regions of interest were cropped and filtered, and a texture feature analysis was performed to evaluate their suitability for monitoring bone integration with the implant surface. The Wilcoxon test revealed a significant difference between the features obtained from radiographs taken directly after the surgery and those performed 12 months later. This difference could indicate changes in the bone microstructure around the implant. In the future, the analysis will also be carried out for double-threading dental materials.

Keywords: Texture analysis · Radiographic images · Dental implant · Grey level co-occurrence matrix · First order statistics · Fractal dimension

1 Introduction

Treatment with the use of dental implants in patients with missing dentition is a common and reliable procedure. It allows full dental arches to be restored in patients missing a single tooth, lacking a few teeth, or edentulous. The surgical procedure consists of the introduction of single or multiple implants into the upper jaw or the mandible [23]. There are several implant systems, as well as a variety of tool shapes, surfaces and screw formations, all of them useful


in different clinical situations. Conical intraosseous titanium implants, similar in shape to the tooth, are most commonly used [19]. Bone integration with the implant surface is a dynamic process of bone tissue growth and its simultaneous resorption. The balance between these two opposing bone mechanisms depends on many stimuli, such as the biomechanical forces present at the prosthetic-implant site. The bone of the implant zone is a type of connective tissue with unique mechanical and biological characteristics [23]; it heals with no scar formation, and it adapts to load changes by modifying its structure. Radiological assessment is one of the methods of osseointegration monitoring [20]. In conventional radiology, the properties of the spongeous bone layer are evaluated indirectly, from an increase of the external surface thickness or a decrease of the shade density on the radiogram. Proper image interpretation is highly dependent on the examiner's perceptual abilities and therefore leaves some uncertainty. The subjective evaluation of typical radiologic examinations of bone changes in the course of the dental implant integration process shows the need for new, measurable methods.

Radiographic image analysis can be based on texture analysis. The texture represents regular features of the object surface, so it determines whether the surface is smooth or rough and whether the pattern presented on it is more or less regular. The parameters characterizing textures are calculated from specified properties of the digital image, such as coarseness, homogeneity, or local contrast. The following methods of parameter calculation can be found in the literature:

– statistical methods, e.g. first order statistics, gradient matrices, run length matrices, or co-occurrence matrices [5,7,8,14,18,24,28];
– mathematical models, e.g. fractals [10,11,22] and Markov fields [9,17];
– structural methods [12,16,27,29];
– signal processing techniques, e.g. the Fourier transform, the Gabor transform, or wavelets [3,4,15,21];
– morphological methods using mathematical morphology operations [2,13,26].

Recent studies have demonstrated that features obtained from texture analysis are appropriate for quantifying bone integration with the implant surface in rectangular regions of interest. Sansare et al. [22] assessed the changes in the FD before and after implant placement and showed that the fractal dimension can be a diagnostic indicator of implant osseointegration success. Abdulhameed et al. [1] evaluated the fractal dimension for predicting implant stability from intraoral periapical radiographs using two implant protocols; the study was performed after the placement of the dental implant and at 3 and 6 months postoperatively, and the FD analysis proved usable for predicting implant stability with very high validity (ROC area exceeding 0.8). The aim of this study was to evaluate the bone structure by comparing panoramic radiographs performed directly after the surgery with those taken 12 months after the implant prosthetic loading. The structural modification of bone was observed near the implant zone; therefore, the analysis was done in an irregular region of interest (irROI), in order to detect any changes in bone structure.

2 Materials and Methods

2.1 Data Collection

The examination consisted of radiovisiograms (RVG) of 40 patients (females and males, aged 18–74) treated with implant-based prosthetics due to single or multiple tooth loss. The whole group of patients received the same implant system (Nobel Biocare): intraosseous, screw-in, made of titanium (Ti6Al-4V). Radiographic images were acquired with the KODAK RVG 6100 set, with a real resolution above 14 lp/mm. RVG images of the patient's maxillary bone were analyzed at the conical implant insertion sites, for the single-threading dental materials. Radiovisiograms of the implant zones taken directly after the surgery (Group A) were compared with those performed 12 months after the implant prosthetic loading (Group B) (Fig. 1).

Fig. 1. Two X-ray radiovisiogram images: (a) taken directly after the procedure, (b) taken 12 months after the implant prosthetic loading

Two irregular ROIs, each one-third of the implant length in size and not including any part of the implant, were selected from the images taken directly after the surgery (Group A): the mesial peripheral region of the implant and the distal peripheral region of the implant. All ROIs were outlined on the radiographs using Corel Photo Paint (CorelDRAW Graphics Suite). Each ROI was cut out and saved as a 256 gray level (8-bit) PNG (Portable Network Graphics) image and submitted to Matlab R2018a for Windows (MathWorks Inc.) for image texture feature extraction. ROI selection was done by the same examiner. Another irregular ROI was placed in a similar region on the implant radiograph taken 12 months after the implant prosthetic loading. The cropped image was duplicated, and the duplicate was blurred with a Gaussian filter (kernel size 30). The blurred image was then subtracted from the original image, and a gray value of 128 was added to each pixel (Fig. 2). After this filtering operation, the texture features were calculated in Matlab. SPSS 10.0 software (SPSS Inc., Chicago, IL, USA) was used to store and analyze the data. The Wilcoxon paired test was used to compare the texture features of Group A and Group B. The level of statistical significance was set at 0.05.
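The preprocessing and the paired test can be illustrated with a short sketch. The paper used Matlab and SPSS; the Python version below (scikit-image and SciPy) is only an assumed equivalent, the function name is invented, and the sigma used for the Gaussian blur is a rough guess for the stated kernel size of 30.

```python
import numpy as np
from scipy import stats
from skimage import io, filters

def highpass_roi(roi_path, sigma=5.0):
    """Blur - subtract - offset preprocessing of a cropped ROI image.

    The paper specifies a Gaussian kernel of size 30; the sigma here is an
    assumed rough equivalent.
    """
    roi = io.imread(roi_path).astype(float)           # 8-bit grayscale ROI
    blurred = filters.gaussian(roi, sigma=sigma, preserve_range=True)
    return np.clip(roi - blurred + 128.0, 0, 255)     # removes slow background trends

# Paired comparison of one texture feature between the two time points
# (feature_a, feature_b would hold the per-ROI feature values for Groups A and B).
feature_a = np.random.rand(80)   # placeholder data
feature_b = np.random.rand(80)   # placeholder data
statistic, p_value = stats.wilcoxon(feature_a, feature_b)
print(f"Wilcoxon paired test: p = {p_value:.4f}")
```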


Fig. 2. Different steps involved in calculating the texture features from a Group A: (a) Cropped rectangular ROI, (b) Cropped irregular ROI, (c) blurred image, (d) image with added 128, and from a Group B: (e) Cropped rectangular ROI, (f) Cropped irregular ROI, (g) blurred image, (h) image with added 128

2.2 First Order Statistics

Parameters determined on the basis of the first-order histogram Hist(i) of the image intensity [25] characterize the distribution of intensity in the analyzed textures, including sharpness or contrast. For an image I(x, y) the normalized histogram is defined as:

Hist(i) = \frac{1}{XY} \sum_{x=1}^{X} \sum_{y=1}^{Y} \begin{cases} 1 & I(x, y) = i \\ 0 & \text{otherwise} \end{cases}, \qquad i \in [0, \ldots, N-1]    (1)

where X and Y are the width and the height of the image I, and N is the intensity range of the image I. Based on this definition, different features can be calculated (Table 1):

Table 1. First order statistics features

Texture features | Formulae
Mean | FOS_1 = \frac{1}{XY} \sum_{x=1}^{X} \sum_{y=1}^{Y} I(x, y)
Variance | FOS_2 = \frac{1}{XY} \sum_{x=1}^{X} \sum_{y=1}^{Y} (I(x, y) - FOS_1)^2
Skewness | FOS_3 = \frac{1}{XY} \sum_{x=1}^{X} \sum_{y=1}^{Y} (I(x, y) - FOS_1)^3 (\sqrt{FOS_2})^{-3}
Kurtosis | FOS_4 = \frac{1}{XY} \sum_{x=1}^{X} \sum_{y=1}^{Y} \{(I(x, y) - FOS_1)^4 (\sqrt{FOS_2})^{-4}\} - 3
Energy | FOS_5 = \sum_{i=1}^{N} Hist(i)^2
Entropy | FOS_6 = -\sum_{i=1}^{N} Hist(i) \log Hist(i)
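For illustration (not the authors' Matlab code; a minimal NumPy sketch with an assumed function name), the six first order statistics of Table 1 can be computed directly from an 8-bit ROI as follows.

```python
import numpy as np

def first_order_statistics(roi, n_levels=256):
    """Compute FOS1-FOS6 of Table 1 for a 2-D grayscale ROI with values 0..n_levels-1."""
    roi = roi.astype(float)
    hist = np.bincount(roi.astype(int).ravel(), minlength=n_levels) / roi.size
    mean = roi.mean()                                   # FOS1
    var = ((roi - mean) ** 2).mean()                    # FOS2
    skew = ((roi - mean) ** 3).mean() / var ** 1.5      # FOS3
    kurt = ((roi - mean) ** 4).mean() / var ** 2 - 3    # FOS4
    energy = np.sum(hist ** 2)                          # FOS5
    nonzero = hist[hist > 0]
    entropy = -np.sum(nonzero * np.log(nonzero))        # FOS6
    return mean, var, skew, kurt, energy, entropy
```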

2.3 Gray-Level Co-occurrence Matrix

Grey Level Co-Occurrence Matrix (GLCM) is a mathematical tool based on the analysis of the spatial relations of pixels in the texture. The co-occurrence matrix is a matrix of estimated transition probabilities from the level of intensity i to the level of intensity j for a given angle θ and the assumed distance between


image pixels d, where i, j = 0, . . . , N − 1 and N is the number of intensity levels. The distance between the analyzed pixels is used as a parameter (its value was set to 1 in this study). The GLCM features were proposed by Haralick [14], Soh [24] and Clausi [8]; they are: autocorrelation, contrast, correlation, cluster prominence, cluster shade, dissimilarity, energy, entropy, homogeneity, homogeneity, maximum probability, sum of squares: variance, sum average, sum variance, sum entropy, difference variance, difference entropy, information measure of correlation 1, information measure of correlation 2, inverse difference, and diagonal moment. The matrix was created for 4 different directions of pixel pairs: 0°, 45°, 90° and 135°, and the features obtained for the different directions were averaged. Among these parameters, correlation, cluster prominence, energy, entropy, maximum probability, sum entropy and information measure of correlation 2 are given in Table 2:

Table 2. GLCM features [7]

Texture features | Formulae
Correlation | GLCM_1 = \frac{\sum_i \sum_j (ij) P(i, j) - \mu_x \mu_y}{\sigma_x \sigma_y}
Cluster prominence | GLCM_2 = \sum_i \sum_j (i + j - \mu_x - \mu_y)^4 P(i, j)
Energy | GLCM_3 = \sum_i \sum_j P(i, j)^2
Entropy | GLCM_4 = -\sum_i \sum_j P(i, j) \log P(i, j)
Maximum probability | GLCM_5 = \max_{i,j} P(i, j)
Sum entropy | GLCM_6 = -\sum_{i=2}^{N_g - 1} P_{x+y}(i) \log P_{x+y}(i)
Information measure of correlation 2 | GLCM_7 = (1 - \exp[-2.0(HXY2 - HXY)])^{1/2}, where HXY = -\sum_i \sum_j P(i, j) \log P(i, j), HXY1 = -\sum_i \sum_j P(i, j) \log(P_x(i) P_y(j)), HXY2 = -\sum_i \sum_j P_x(i) P_y(j) \log(P_x(i) P_y(j))

where P(i, j, d, θ) is the probability of the simultaneous occurrence of two pixels: i – the pixel with a given gray level in the image I(x, y), j – the pixel with a given gray level in the image I(x + Δx, y + Δy), θ – the deviation (direction), d – the distance.
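A sketch of this computation is shown below (not the authors' Matlab implementation; scikit-image is assumed, and features not covered by graycoprops are derived directly from the normalized matrix). Note that scikit-image's 'energy' is the square root of the angular second moment, so 'ASM' is used here to match the paper's definition of GLCM energy.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(roi, levels=256, distance=1):
    """Average a few GLCM features over the four directions used in the paper.

    roi must be an integer image with values in [0, levels).
    """
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]        # 0, 45, 90, 135 degrees
    glcm = graycomatrix(roi, distances=[distance], angles=angles,
                        levels=levels, symmetric=True, normed=True)
    # Built-in properties, averaged over the four directions
    energy = graycoprops(glcm, 'ASM').mean()
    correlation = graycoprops(glcm, 'correlation').mean()
    # Entropy and maximum probability computed directly from the normalized matrix
    p = glcm[:, :, 0, :]                                     # (levels, levels, n_angles)
    entropy = np.mean([-np.sum(pa[pa > 0] * np.log(pa[pa > 0]))
                       for pa in np.moveaxis(p, -1, 0)])
    max_prob = p.max(axis=(0, 1)).mean()
    return energy, correlation, entropy, max_prob
```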

2.4 Fractal Dimension

Fractal dimension (FD) analysis is a relatively new mathematical technique that can help in quantifying complex structures, including that of the trabecular bone [10,22]. Existing algorithms for computing the fractal dimension are based on either a geometrical or a stochastic approach. In this paper, the FD was calculated using Chen's algorithm [6], in which the quantification of the surface is realized by intensity difference scaling. Given an N × M image with the maximum possible scale n, the grey level intensity vector is defined as follows:

V = [v(1), v(2), \ldots, v(n)]    (2)


where v(i) is the average of the absolute intensity differences of all pixel pairs at scale i. The realization of the FD estimation can be defined as:

v(i) = \frac{1}{L_i} \sum_{x_1=0}^{N-1} \sum_{y_1=0}^{M-1} \sum_{x_2=0}^{N-1} \sum_{y_2=0}^{M-1} |I(x_2, y_2) - I(x_1, y_1)|    (3)

where L_i is the number of pairs for scale i with distance \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}.
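A rough sketch of this intensity-difference-scaling estimate is given below. It is not the paper's Matlab code: it samples only horizontal and vertical pairs at each scale for speed, the function name is invented, and it assumes the common convention FD = 3 − H, where H is the slope of log v(i) versus log i — that relation is not stated explicitly in the paper, so treat it as an assumption.

```python
import numpy as np

def fractal_dimension_ids(image, max_scale=8):
    """Estimate the fractal dimension of a grayscale ROI by intensity difference scaling.

    For each scale i the mean absolute intensity difference v(i) of pixel pairs
    separated by i (sampled along rows and columns only, as a simplification)
    is computed; the slope H of log v(i) vs. log i then gives FD = 3 - H
    (assumed convention).
    """
    img = image.astype(float)
    scales, v = [], []
    for i in range(1, max_scale + 1):
        diffs = np.concatenate([
            np.abs(img[:, i:] - img[:, :-i]).ravel(),   # horizontal pairs at distance i
            np.abs(img[i:, :] - img[:-i, :]).ravel(),   # vertical pairs at distance i
        ])
        scales.append(i)
        v.append(diffs.mean())
    H, _ = np.polyfit(np.log(scales), np.log(v), 1)
    return 3.0 - H
```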

3 Results

The analysis of the radiovisiograms of forty patients, comparing those performed directly after the surgery (Group A) with those performed 12 months after the implant prosthetic loading (Group B), identified 13 significant parameters of the irregular region of interest. The features were tested using the Wilcoxon paired test (p < 0.05). The following parameters are statistically significant: 4 derived from first order statistics (variance, kurtosis, energy, entropy), 7 derived from the GLCM matrix (correlation, cluster prominence, energy, entropy, maximum probability, sum entropy, information measure of correlation 2) and the fractal dimension (Chen's method). Table 3 shows the first order statistics of the features, considering basic image parameters, whereas Table 4 shows the GLCM features and Table 5 the fractal dimension. In this study, all the features characterize the smoothness of the image. The lower values of standard deviation, variance, fractal dimension and entropy for Group B indicate greater regularity, smoothness and a less complex texture. This is shown in the box plots in Fig. 3.

Table 3. First order statistics (mean ± SD) for Group A and Group B. Statistical analysis was performed by means of the nonparametric Wilcoxon signed rank test (p < 0.05)

FOS features | Group A N = 80 | Group B N = 80 | p-values
… | 24.62 ± 4.36 | 22.61 ± 4.29 | …