Introduction to Artificial Intelligence

This book aims to provide physicians and scientists with the basics of Artificial Intelligence (AI) with a special focus on medical imaging.


Table of contents :
Preface
Contents
1 What Is Artificial Intelligence: History and Basic Definitions
1.1 Twentieth Century: Setting the Foundations of Artificial Intelligence
1.1.1 Artificial Intelligence
1.1.2 Machine Learning
1.1.2.1 Neural Networks
1.2 The Period 2000–2020
References
2 Using Commercial and Open-Source Tools for Artificial Intelligence: A Case Demonstration on a Complete Radiomics Pipeline
2.1 Introduction
2.2 Image Segmentation
2.3 Image Pre-processing
2.4 Radiomics Extraction
2.5 Radiomics Modeling
2.6 From Theory to Practice
2.7 Discussion
2.8 Conclusion
References
3 Introduction to Machine Learning in Medicine
3.1 Introduction
3.2 What Is Machine Learning?
3.3 Principal ML Algorithms
3.3.1 Supervised Machine Learning
3.3.1.1 Linear Regression
3.3.1.2 Support Vector Machine
3.3.1.3 Random Decision Forest
3.3.1.4 Extreme Gradient Boosting
3.3.1.5 Naive Bayes
3.3.2 Unsupervised Machine Learning
3.3.2.1 k-Nearest Neighbours
3.3.2.2 Principal Component Analysis
3.3.2.3 k-Means Clustering
3.3.3 Artificial Neural Networks
3.3.4 Reinforcement Learning
3.4 Issues and Challenges
3.4.1 Data Management
3.4.2 Machine Learning Model Evaluation Metrics
3.4.3 Explainability, Interpretability, and Ethical and Legal Issues
3.4.4 Perspectives in Personalized Medicine
3.5 Conclusions
References
4 Machine Learning Methods for Radiomics Analysis: Algorithms Made Easy
4.1 Introduction
4.2 Methods for Region of Interest Segmentation
4.2.1 R-CNN
4.2.2 U-Net and V-Net
4.2.3 DeepLab
4.3 Methods for Exploratory Data Analysis
4.3.1 Correlation Analysis
4.3.2 Clustering
4.3.3 Principal Component Analysis
4.4 Methods for Feature Selection
4.4.1 Boruta
4.4.2 Recursive Feature Elimination
4.4.3 Maximum Relevance: Minimum Redundancy
4.5 Methods for Predictive Model Construction
4.5.1 Decision Trees
4.5.2 Random Forests
4.5.3 Gradient Boosting Algorithms
4.5.4 Support Vector Machines
4.5.5 Neural Networks
4.6 Conclusion
References
5 Natural Language Processing
5.1 Brief History of NLP
5.2 Basics of Natural Language Processing
5.3 Current Applications of Natural Language Processing
References
6 Deep Learning Fundamentals
Abbreviations
6.1 Deep Learning in Medical Imaging
6.1.1 Key Concepts
6.1.2 DL Architectures for Medical Image Analysis
6.1.3 Cloud Computing for Deep Learning
6.1.4 DL-Based Computer-Aided Diagnosis
6.2 Quality and Biases of Medical Databases
6.3 Pre-processing for Deep Learning
6.3.1 CT Radiation Absorption Map to Grayscale
6.3.2 MRI Bias Field Correction
6.3.3 Tissue-Based Standardization
6.3.4 Pixel Intensities Normalization
6.3.5 Harmonization
6.3.6 Spacing Resampling
6.3.7 Image Enhancement
6.3.8 Image Denoising
6.3.9 Lowering Dimensionality at the Imaging Level for Deep Learning
6.4 Learning Strategies
6.4.1 Transfer Learning
6.4.2 Multi-task Learning
6.4.3 Ensemble Learning
6.4.4 Multimodal Learning
6.4.5 Federated Learning
6.5 Interpretability and Trustworthiness of Artificial Intelligence
6.5.1 Reproducibility
6.5.2 Traceability
6.5.3 Explainability
6.5.4 Trustworthiness
References
7 Data Preparation for AI Analysis
7.1 Introduction
7.2 Data Quality and Numerosity
7.2.1 Intrinsic Image Quality
7.2.2 Image Diagnostic Quality
7.2.3 Image Quality for AI Analyses
7.3 Data Preprocessing for Machine Learning Analyses
7.3.1 The Machine Learning Pipeline
7.3.2 The Machine Learning Pipeline: A Case Study
References
8 Current Applications of AI in Medical Imaging
8.1 Introduction
8.2 Detection
8.3 Classification
8.4 Segmentation
8.4.1 Monitoring
8.4.2 Prediction
8.4.3 Additional Applications
8.4.3.1 Image Enhancement and Reconstruction
8.4.4 Workload Reduction?
8.5 Conclusions
References


Imaging Informatics for Healthcare Professionals

Michail E. Klontzas · Salvatore Claudio Fanni · Emanuele Neri, Editors

Introduction to Artificial Intelligence

Imaging Informatics for Healthcare Professionals

Series Editors:
Peter M. A. van Ooijen, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Erik R. Ranschaert, Department of Radiology, ETZ Hospital, Tilburg, The Netherlands
Annalisa Trianni, Department of Medical Physics, ASUIUD, Udine, Italy
Michail E. Klontzas, University Hospital of Heraklion, Heraklion, Greece; Institute of Computer Science, Foundation for Research and Technology (FORTH), Heraklion, Greece

The series Imaging Informatics for Healthcare Professionals is the ideal starting point for physicians and residents and students in radiology and nuclear medicine who wish to learn the basics in different areas of medical imaging informatics. Each volume is a short pocket-sized book that is designed for easy learning and reference. The scope of the series is based on the Medical Imaging Informatics subsections of the European Society of Radiology (ESR) European Training Curriculum, as proposed by ESR and the European Society of Medical Imaging Informatics (EuSoMII). The series, which is endorsed by EuSoMII, will cover the curricula for Undergraduate Radiological Education and for the level I and II training programmes. The curriculum for the level III training programme will be covered at a later date. It will offer frequent updates as and when new topics arise.


Editors:
Michail E. Klontzas, University Hospital of Heraklion, Heraklion, Greece; Institute of Computer Science, Foundation for Research and Technology (FORTH), Heraklion, Greece
Salvatore Claudio Fanni, Academic Radiology, Department of Translational Research, University of Pisa, Pisa, Italy
Emanuele Neri, Academic Radiology, Department of Translational Research, University of Pisa, Pisa, Italy

ISSN 2662-1541 ISSN 2662-155X (electronic)
Imaging Informatics for Healthcare Professionals
ISBN 978-3-031-25927-2 ISBN 978-3-031-25928-9 (eBook)
https://doi.org/10.1007/978-3-031-25928-9

© EuSoMII 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland. Paper in this product is recyclable.

Preface

Artificial intelligence (AI) is rapidly infiltrating the scientific world while steadily demonstrating important real-life applications. The increasing number of publications in the field and the numerous commercial applications of AI algorithms, available not only to computer scientists but to experts in numerous fields, necessitate a deep understanding of basic AI principles. An increasing number of professionals in disciplines other than mathematics and computer science encounter terminology related to basic AI principles on a daily basis. Even though AI principles are slowly being introduced to medical school and residency curricula, there is a great need for basic education on the foundations of this exciting field. This book aims to provide physicians and scientists with the basics of artificial intelligence, with a special focus on medical imaging. The book provides an introduction to the main topics of artificial intelligence currently applied to medical image analysis. Starting with a chapter explaining the basic terms used in artificial intelligence for novice readers, the book embarks on a series of chapters, each of which provides the basics of one AI-related topic. The second chapter utilizes a radiomics paradigm to practically demonstrate how programming languages and available automated tools can be used for the development of machine learning models. The third chapter endeavours to analyse the main traditional machine learning techniques, explaining algorithms such as random forests and support vector machines as well as basic neural networks. The applications of those algorithms to the analysis of radiomics data are expanded in the fourth chapter. Chapter 5 provides the basics of natural language processing, which has revolutionized the analysis of complex radiological reports, and Chap. 6 affords a succinct introduction to convolutional neural networks, which have revolutionized medical image analysis. The penultimate chapter provides an introduction to data preparation for use in the aforementioned artificial intelligence applications. The book concludes with a chapter demonstrating the main landscape of current AI applications while providing an insight into the foreseeable future. Ultimately, we sought to provide a succinct textbook that can offer all the basic knowledge on AI required for professionals dealing with medical images. This volume comes as the third addition to the "Imaging Informatics for Healthcare Professionals" book series endorsed by EuSoMII, aiming to become the basic resource of information for healthcare professionals dealing with AI applications.

Michail E. Klontzas, Heraklion, Greece
Salvatore Claudio Fanni, Pisa, Italy
Emanuele Neri, Pisa, Italy

Contents

1 What Is Artificial Intelligence: History and Basic Definitions
  Emmanouil Koltsakis, Michail E. Klontzas, and Apostolos H. Karantanas

2 Using Commercial and Open-Source Tools for Artificial Intelligence: A Case Demonstration on a Complete Radiomics Pipeline
  Elisavet Stamoulou, Constantinos Spanakis, Katerina Nikiforaki, Apostolos H. Karantanas, Nikos Tsiknakis, Alexios Matikas, Theodoros Foukakis, and Georgios C. Manikis

3 Introduction to Machine Learning in Medicine
  Rossana Buongiorno, Claudia Caudai, Sara Colantonio, and Danila Germanese

4 Machine Learning Methods for Radiomics Analysis: Algorithms Made Easy
  Michail E. Klontzas and Renato Cuocolo

5 Natural Language Processing
  Salvatore Claudio Fanni, Maria Febi, Gayane Aghakhanyan, and Emanuele Neri

6 Deep Learning Fundamentals
  Eleftherios Trivizakis and Kostas Marias

7 Data Preparation for AI Analysis
  Andrea Barucci, Stefano Diciotti, Marco Giannelli, and Chiara Marzi

8 Current Applications of AI in Medical Imaging
  Gianfranco Di Salle, Salvatore Claudio Fanni, Gayane Aghakhanyan, and Emanuele Neri

1 What Is Artificial Intelligence: History and Basic Definitions

Emmanouil Koltsakis, Michail E. Klontzas, and Apostolos H. Karantanas

E. Koltsakis: Department of Radiology, Karolinska University Hospital, Stockholm, Sweden
M. E. Klontzas: University Hospital of Heraklion, Heraklion, Greece; Institute of Computer Science, Foundation for Research and Technology (FORTH), Heraklion, Greece
A. H. Karantanas: Department of Medical Imaging, University Hospital of Heraklion, Heraklion, Crete, Greece; Department of Radiology, School of Medicine, University of Crete, Heraklion, Crete, Greece; Advanced Hybrid Imaging Systems, Institute of Computer Science, Foundation for Research and Technology (FORTH), Heraklion, Crete, Greece

1.1 Twentieth Century: Setting the Foundations of Artificial Intelligence

1.1.1 Artificial Intelligence


One would expect that the answer to a theoretically simple question such as "What is artificial intelligence?" would be correspondingly simple. However, the greatest difficulty is hidden in simplicity. The purpose of this chapter is to simplify and define the basic principles that fall within the sphere of artificial intelligence while delving into it and guiding the reader through its history. Among his great contributions to science, the mathematician Alan Turing was the first, back in 1950, to ask whether machines can think, and he proposed the famous Turing test, or imitation game. According to this, the three participants are two humans (an interrogator and a contestant) and a machine. The interrogator poses blinded questions to the other two and has to determine which answers come from the human contestant and which from the machine [1]. Up to the day this chapter is written, no machine has ever successfully passed the Turing test. At the same time, Alan Turing's fellow student Christopher Strachey developed a program that played checkers [2], while Dietrich Prinz, who learned programming from Alan Turing's seminars, developed a program that played chess. In 1956 the Dartmouth Summer Research Project on Artificial Intelligence took place, at which Professor John McCarthy introduced the term Artificial Intelligence (AI). The aim of the Dartmouth Workshop was that, over a 2-month period, a group of 10 people would have a machine adopt a characteristic of intelligence, such as using language, forming abstractions, or self-improvement. The proposal for this workshop stated that "An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves". The workshop was not successful, but the term AI came to life.

Artificial Intelligence: is the science and engineering of inventing machines or computer systems with features imitating human-like abilities or behaviors, such as visual and speech interpretation, problem-solving, and self-teaching.


Three years later, in 1959, the first AI laboratory was established at MIT. It was in that very laboratory that ELIZA, the first chatbot, was created by Joseph Weizenbaum. ELIZA worked in a simple manner: an input was analyzed and inspected for keywords, and an output was then created according to a rule associated with the keyword [3]. ELIZA and the Georgetown experiment, in which 60 sentences were automatically translated from Russian to English, were the first steps toward Natural Language Processing (NLP) and, more specifically, Symbolic NLP.

Natural Language Processing: is a subfield of AI focused on understanding human language.

1.1.2 Machine Learning

While NLP was starting to form, Arthur Samuel published in 1959 the first paper that introduced the term Machine Learning (ML) [4]. In this publication, two ML procedures for the game of checkers were proposed [4]. While the two procedures will not be further discussed in this book, it is important to mention that there are four types of ML: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.

Machine Learning: is a subfield of AI in which computer systems are able to understand and learn patterns in order to solve problems without external reprogramming.

In supervised learning the input and output are provided with annotations (labels). Thus, one feeds the algorithm with information that helps the machine to learn. An example of supervised learning is an algorithm for analyzing chest X-rays for pneumothorax, where during training the algorithm is fed with manually annotated images on which a pneumothorax is marked. In unsupervised learning, by contrast, the user does not assist the machine in the learning process. The machine finds patterns in the unlabeled input and classifies the results, creating clusters based on their differences. For example, fed a large number of chest X-rays without manually provided labels, the algorithm will divide the images into two groups depending on the presence or absence of pneumonia. Combining the first two processes produces semi-supervised learning, where unlabeled data is provided in combination with labeled data. This process helps to overcome the limitations of supervised or unsupervised learning, while increasing the accuracy and performance of the machine learning model. The fourth learning process is reinforcement learning. As one may guess, it follows the principle of Pavlov's dog: the computer program performs actions, and either positive or negative feedback is provided. Through this reinforcement process the computer tries to maximize the positive feedback. It is worth mentioning that reinforcement learning was described for the first time in 1961 by Donald Michie, who used it in a machine that plays tic-tac-toe [5]. In 1975 Edward Shortliffe published his paper on MYCIN, the first artificial intelligence program with an application in medicine. MYCIN identified bacteria causing infection and recommended antibiotics (hence the name MYCIN) with a dose adjusted to the patient's weight [6]. MYCIN was never used in clinical practice. The "Informatica" symposium took place in Slovenia in 1976, and a paper by S. Bozinovski and A. Fulgosi published in its proceedings was about transfer learning (TL), as applied in ML. Likewise, Suzanna Becker and Geoffrey E. Hinton described how the output of modules that have already undergone a learning process can be used as input for more complex modules, making the learning process of large-scale networks faster [7].
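To make the learning types above concrete, the following is a minimal sketch using scikit-learn; the toy feature matrix and labels are hypothetical stand-ins for measurements that would be extracted from images, not a clinical pipeline.

```python
# A minimal sketch contrasting supervised and unsupervised learning
# with scikit-learn; the data here is random toy data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))      # 100 cases, 4 toy image-derived features
y = (X[:, 0] > 0).astype(int)      # hypothetical labels (e.g., pneumothorax yes/no)

# Supervised: the algorithm is given both the inputs and the labels.
clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.predict(X[:5]))

# Unsupervised: the algorithm sees only the inputs and forms clusters itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_[:5])
```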

Transfer Learning: is a learning type in which knowledge gained while processing a dataset is stored and then transferred for the processing of a different dataset.

Transfer learning is currently very popular in deep learning because it can train deep neural networks with comparatively little data. This is very useful in the data science field, since most real-world problems typically do not have millions of labeled data points with which to train such complex models. This method also has significant potential in medicine, as we will see later.
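As a minimal sketch of this idea, assuming TensorFlow/Keras is available: a network pre-trained on ImageNet is reused as a frozen feature extractor, and only a small task-specific head is trained on the new dataset.

```python
# A minimal transfer-learning sketch with Keras: knowledge gained on
# ImageNet is transferred to a new task with comparatively little data.
import tensorflow as tf

base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      pooling="avg", input_shape=(224, 224, 3))
base.trainable = False                   # keep the transferred knowledge frozen

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),  # new task-specific output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(new_images, new_labels, epochs=5)  # hypothetical target dataset
```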

1.1.2.1 Neural Networks

Moving forward to 1979, a group of students from Stanford University built a cart that had the ability to navigate and avoid obstacles. Concurrently, Kunihiko Fukushima published an article on the neocognitron, which is a type of artificial neural network (ANN) and the inspiration for convolutional neural networks (CNNs).

Artificial Neuron: is a mathematical function which receives one or multiple inputs, sums them, and produces an output. The basic principle on which an artificial neuron functions is the same as in a biological neuron. Artificial Neural Network: is a network of artificial neurons which communicate with each other through edges (equivalent to the synapses of biological neurons).


The edges and the neurons are weighted, and the weights can be adjusted through the learning process of the machine. As long as the output is not the desired one and there is a difference (error), the machine will adjust the weights of the neurons in order to reduce that error. The neurons are arranged in layers. The first layer corresponds to the input and the last to the output.
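The definition above translates directly into a few lines of code. The following is a minimal sketch; the sigmoid activation is one common choice and is an assumption, not part of the definition.

```python
# A minimal sketch of an artificial neuron: inputs are weighted,
# summed with a bias, and passed through an activation function.
import numpy as np

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs followed by a sigmoid activation."""
    z = np.dot(inputs, weights) + bias
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])     # three inputs
w = np.array([0.4, 0.1, -0.6])     # adjustable weights
print(neuron(x, w, bias=0.2))      # single output in (0, 1)
```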

The neocognitron is a multilayered artificial neural network consisting of multiple types of cells, two of which are the simple cells—S-cells—and the complex cells—C-cells—similar to the visual nervous system model as introduced by Hubel and Wiesel [8, 9]. Stimulus patterns are recognized based on geometric similarity and each specific pattern is processed only by specific C-cells, which corresponds to the way the visual cortex works. This is how deep neural networks and deep learning came to life.

Deep Neural Network: is an ANN with multiple layers of neurons between the input and the output layer. There are different subtypes of DNNs and various types of layers. Deep Learning: is a subset of machine learning that uses DNNs and works in a way similar to that of the human brain. As in machine learning, there are various deep learning algorithms.

The connection of the neuron layers of a deep neural network resembles the way neurons of the human brain are connected to each other. While chess champion Garry Kasparov was defeated by IBM's Deep Blue chess computer in 1997, Yann LeCun published the first paper on CNNs, inspired by the neocognitron [10].

Convolutional Neural Network: is a subtype of DNN which works by using mathematical convolution in at least one of its layers. CNNs are used in computer vision and pixel pattern recognition.
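A minimal sketch of such a network, assuming Keras and hypothetical input dimensions, might look as follows; the convolutional layer is where the mathematical convolution of the definition takes place.

```python
# A minimal CNN sketch in Keras: convolution in an early layer,
# followed by a small classification head.
import tensorflow as tf

cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, kernel_size=3, activation="relu",
                           input_shape=(64, 64, 1)),   # convolutional layer
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2, activation="softmax"),    # e.g., two image classes
])
cnn.summary()
```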

The applicability of CNNs in computer vision renders them ideal for use in radiological applications where images have to be analyzed.

1.2 The Period 2000–2020

As the research usage of AI, ML, DL, and CNNs became greater and greater, new milestones were added to the timeline. In 2002, Torch, the first machine learning library that provided algorithms for DNNs, was created and released by Ronan Collobert, Samy Bengio, and Johnny Mariéthoz. As the name implies, in a machine learning library one may find common learning algorithms that are available to the public. Depending on the purpose of the program or the programming language, different libraries are more applicable than others, much like traditional libraries. In the following years, important changes escalated quickly. In 2005, a Stanford robot won the DARPA Grand Challenge by driving autonomously for 131 miles along an unrehearsed desert trail. With raw data from GPS, a camera, and 3D mapping composed with LIDAR, the car controlled its speed and direction in order to avoid obstacles. In 2006, the term "Machine Reading" was used by Oren Etzioni, Michele Banko, and Michael J. Cafarella to describe the automatic, unsupervised interpretation of text. The same year, the data scientist Fei-Fei Li set up ImageNet, a database that now contains more than 14 million annotated images available for computer vision applications such as object recognition, localization, and image classification. A couple of years later, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) began, in which different algorithms for object detection and image classification at large scale were evaluated. When Microsoft launched the Kinect for the Xbox 360 in 2010, AI began to become available for everyday activities such as video games. Kinect was the first gaming device that tracked human body movement using a 3D camera and infrared detection, and VR gaming gradually came to life. In 2011, IBM's Watson computer beat the TV game show "Jeopardy!" champions Brad Rutter and Ken Jennings. Importantly, not only did a computer beat the two champions, but it was also the first time people witnessed AI using NLP. Then Apple's Siri (2011), Google's Google Now (2012), and Microsoft's Cortana (2014), smartphone apps that used natural language to answer questions, make recommendations, and perform actions, became part of our daily lives. In 2017 Arterys became the first AI company to receive FDA clearance to use cloud-based DL in a clinical setting, and in 2018 the FDA approved marketing of the AI-based device IDx-DR to detect diabetic retinopathy, assisting clinicians who may not routinely be involved in eye care. More and more AI models with clinical applicability became available for direct use on radiological images in picture archiving and communication systems (PACS) or in electronic health records (EHRs), while others were used for automatic recognition of melanomas in dermoscopy or analysis of the retina in retinal optical coherence tomography (OCT). However, AI did not enter just these medical domains. AI has the potential to make a substantial or promising contribution to multiple specialties, ranging from pathology, cytology, and radiology to oncology, gastroenterology, and anesthesiology, and to any specialty in which NLP, computer vision, biometrics, or decision-making can be applied [11]. As AI increasingly integrates into clinical practice, it is logical that questions are raised from an ethical perspective around its usability. A quick search in PubMed reveals that an increasing number of studies on the topic have been published since 2019, and a joint European and North American multisociety statement on AI in radiology was published in the same year [12]. By this point there is no uncertainty regarding AI software and hardware in medicine. Guides and books have been published targeting medical professionals. AI is expanding and pushing the boundaries of imagination. Entrepreneurial inventions are proliferating. AI fuses with science, art, literature, production, machinery, people's safety, and anything one can imagine. In January 2021, the DALL-E 1 program started generating digital images from natural language descriptions as input. The same year, the AlphaFold project released predicted protein structures for about 1 million proteins, including almost all human proteins. A year later, more than 200 million protein structures were predicted. DALL-E 2 entered its beta phase, and ChatGPT was launched, providing its users with detailed and articulate answers. Before continuing to the next chapter, the authors encourage you to think and identify how many times and in which ways you have interacted with AI since this morning. The next steps are ongoing research into AI and the invention of new software until models finally arrive that successfully pass the Turing test. At that point, we can speak of human-level intelligence, also called "general AI." However, it is not yet clear how long it will take to reach that level of AI, nor when AI will exceed human intelligence. A brief overview of all major milestones in AI history is presented in Fig. 1.1.


Fig. 1.1 Major milestones in the history of artificial intelligence (created with biorender.com)


References

1. Turing AM. Computing machinery and intelligence. Mind. 1950;LIX:433–60.
2. Strachey CS. Logical or non-mathematical programmes. ACM '52; 1952.
3. Weizenbaum J. ELIZA: a computer program for the study of natural language communication between man and machine. Commun ACM. 1966;9:36–45.
4. Samuel AL. Some studies in machine learning using the game of checkers. IBM J Res Dev. 1959;3:210–29.
5. Michie D. Experiments on the mechanization of game-learning. Part I. Characterization of the model and its parameters. Comput J. 1963;6:232–6.
6. Shortliffe EH, Buchanan BG. A model of inexact reasoning in medicine. Math Biosci. 1975;23:351–79.
7. Becker S, Hinton GE. Self-organizing neural network that discovers surfaces in random-dot stereograms. Nature. 1992;355:161–3.
8. Fukushima K. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern. 1980;36:193–202.
9. Hubel DH, Wiesel TN. Receptive fields of single neurones in the cat's striate cortex. J Physiol. 1959;148:574–91.
10. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86:2278–324.
11. Hinton G. Deep learning: a technology with the potential to transform health care. JAMA. 2018;320:1101–2.
12. Geis JR. Ethics of artificial intelligence in radiology: summary of the joint European and North American multisociety statement. Radiology. 2019;293:436–40.

2 Using Commercial and Open-Source Tools for Artificial Intelligence: A Case Demonstration on a Complete Radiomics Pipeline

Elisavet Stamoulou, Constantinos Spanakis, Katerina Nikiforaki, Apostolos H. Karantanas, Nikos Tsiknakis, Alexios Matikas, Theodoros Foukakis, and Georgios C. Manikis

Elisavet Stamoulou and Constantinos Spanakis have contributed equally.

E. Stamoulou, C. Spanakis, and K. Nikiforaki: Computational BioMedicine Laboratory, Institute of Computer Science, Foundation for Research and Technology (FORTH), Heraklion, Greece
A. H. Karantanas: Computational BioMedicine Laboratory, Institute of Computer Science, FORTH, Heraklion, Greece; Department of Medical Imaging, University Hospital, Heraklion, Greece; Department of Radiology, School of Medicine, University of Crete, Voutes Campus, Heraklion, Greece
N. Tsiknakis: Computational BioMedicine Laboratory, Institute of Computer Science, FORTH, Heraklion, Greece; Department of Oncology-Pathology, Karolinska Institutet, Stockholm, Sweden
A. Matikas and T. Foukakis: Department of Oncology-Pathology, Karolinska Institutet, Stockholm, Sweden
G. C. Manikis: Computational BioMedicine Laboratory, Institute of Computer Science, FORTH, Heraklion, Greece; Department of Oncology-Pathology, Karolinska Institutet, Stockholm, Sweden

2.1 Introduction

Radiology images can be considered the cornerstone of healthcare, as they affect the complete pathway from diagnosis and optimal treatment selection to the evaluation of treatment response. However, imaging modalities, including but not limited to tomographic techniques such as magnetic resonance imaging (MRI) or computed tomography (CT), are often not used in their raw form. The reason is that image characteristics vital for physicians usually lie beyond human perception, not only because of their subtle manifestation but also because they can be masked by a number of image degradation factors, such as artifacts. Recent advances in artificial intelligence (AI) and medical image analysis have drastically upgraded the role of radiology images in the clinical routine, facilitating their digital decoding into high-throughput quantitative features instead of their use as a subjective qualitative assessment of the examined tissue. In particular, the emerging field of radiomics, in conjunction with machine learning (ML) modeling, has enabled the conversion of routine radiology images into high-throughput quantitative data that describe non-intuitive properties of the imaging phenotype and tissue micro-environment [1]. In general, a radiomics analysis workflow comprises four main steps: image segmentation, image pre-processing, radiomics extraction, and ML modeling (an indicative analysis workflow to predict isocitrate dehydrogenase (IDH) mutations in gliomas can be found in [2]). Image segmentation is an essential step of the analysis workflow that seeks to define a specific region of interest (ROI) within the image from which radiomics features are extracted. Image pre-processing aims to improve the quality of images by removing or alleviating noise and artifacts, which can result in more accurate ML model predictions. Next, radiomics extraction can be achieved either by using image analysis techniques to calculate high-dimensional handcrafted features or by using an end-to-end deep learning (DL) architecture to extract "deep features" [3]. In the end, these features are used to build radiomics-based ML models for clinical outcome predictions. Recent efforts aim to address technical details in radiomics development [4–10]; however, these are mostly addressed to data scientists with a background in AI modeling. Consequently, it is difficult for the wide range of doctors and medical physicists with little to no experience in radiomics analysis to go beyond the theoretical framework and exploit its full potential for the benefit of the medical community and society. In this direction, we believe that studies raising interest and actively engaging these professionals in the field of radiomics will increase attention toward delivering AI in the clinical routine and accelerate productivity and precision in healthcare delivery. This is the main scope of this chapter: to present the basic steps and the available tools and plugins required to develop a radiomics analysis pipeline, under the criterion of requiring only high-level interaction with AI and minimal or no programming skills. In addition, a "from theory to practice" section summarizes a practical guide on how to perform radiomics by illustrating each of the analysis steps and the appropriate tools/plugins.

2.2 Image Segmentation

Image segmentation is the process of partitioning an image into multiple components, usually with reference to anatomical structures, known as image segments or image regions. It is required in many scientific and clinical projects and can be conducted manually, semi-automatically, or fully automatically to delineate a 2D ROI or a three-dimensional (3D) volume of interest (VOI). Manual and semi-automatic methods have the drawbacks that they are tedious and time-consuming, that they need a starting point from which the segmentation process can begin, and that interactions with the user (even if only at the beginning of the process) can add bias to the segmentation. Another major issue is the intra- and inter-observer variability in the segmentations, which highly affects the generalization performance of AI models [11]. The rapid development of AI in the last decade has brought great advances in image segmentation in terms of precision, accuracy, and repeatability. In particular, the use of fully automated AI techniques leveraging DL has recently become a sine qua non in the field, demonstrating state-of-the-art results when segmenting a variety of anatomical structures from different image modalities. These techniques not only show better results than traditional image analysis methods for segmentation but are also fast and robust, especially when developed using large-scale datasets [12]. Current DL-based segmentation approaches include the Deep Grow module [13] and MedSeg [14], an online image segmentation tool written in JavaScript and WebGL that provides seamless and lightning-fast performance. FiJi [15] employs a WEKA plugin [16] that combines a collection of AI and image analysis tools to support fully integrated segmentation analysis workflows. Several other techniques have been implemented as plugins in 3DSlicer, such as MONAI [17], NVIDIA-AIAA (https://github.com/NVIDIA/ai-assisted-annotation-client/tree/master/slicer-plugin), and TotalSegmentator [18]. Another group of AI tools and plugins can be found in the literature for brain (Slicer-DeepSeg [19]), lung (DensityLungSegmentation [20]), and liver parenchyma and vessel segmentation (SlicerRVXLiverSegmentation, https://github.com/R-Vessel-X/SlicerRVXLiverSegmentation). Other DL image segmentation tools are AutoRadiomics [21], Avizo [22], Qupath [23, 24], RECOMIA [25], nucleAIazer [26], aimis3d [27], MVision [28], MedViso [29], ImFusion [30], and RSIP [31]. Although there is a plethora of AI tools already in the literature that facilitate image segmentation in an efficient and user-friendly manner, technical challenges frequently arise related to the number of annotated images required for model training, the inherent physiological heterogeneity among patients, model overfitting, and gradient vanishing [32].
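Although the tools above require no code, it may help to see what a segmentation step produces. The following is a minimal sketch using classical Otsu thresholding with SimpleITK (a library introduced later in this chapter) rather than the DL methods named above; the file names are hypothetical.

```python
# A minimal non-DL segmentation sketch with SimpleITK: Otsu thresholding
# produces a binary mask that could serve as a ROI for radiomics extraction.
import SimpleITK as sitk

image = sitk.ReadImage("case01_t1.nii.gz")        # hypothetical MRI volume
mask = sitk.OtsuThreshold(image, 0, 1)            # background=0, foreground=1
mask = sitk.BinaryMorphologicalClosing(mask, [2, 2, 2])  # smooth the mask
sitk.WriteImage(mask, "case01_mask.nii.gz")
```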

2.3 Image Pre-processing

Image pre-processing, digital or geometric, focuses on improving image data quality by removing noise, artifacts, and non-uniformity in the signal intensities of an image. It is usually conducted through several discrete actions (e.g., image reconstruction, registration, filtering, interpolation, bias field correction, intensity-based normalization, and discretization) and has been shown to have a considerable impact on radiomics analysis results, since it comprises a series of techniques aiming to transform and homogenize images from which reproducible and reliable radiomics models can be produced. An in-depth discussion of these steps is beyond the scope of this review; however, readers can consult recent literature with comprehensive technical presentations of image pre-processing [9, 11, 33]. Digital pre-processing is the processing of the signal intensities of an image, and image enhancement is one of its most frequently used applications. Image enhancement refers to noise reduction, artifact correction, and resolution improvement of an image, focusing on highlighting features that are difficult to distinguish due to degraded or inherently poor image contrast. AI has shown great success in this field through artificial neural networks [34]. Specifically, convolutional neural networks (CNNs) based on super resolution (SR) have been deployed to restore corrupted images, where a low-quality and noisy (low-resolution) image is upsampled to match the size of a high-resolution (HR) enhanced image [35]. The Python programming language [36] supports AI libraries for medical image resolution improvement such as MedSRGAN [37] and ANTsPyNet [38]. Regarding noise and artifact reduction, the denoising convolutional neural network (DnCNN) [39] and the intensity inhomogeneity correction network (InhomoNet) [40] have been successfully applied to MRI studies. Geometric pre-processing, mostly performed by image registration and reconstruction, is the process that geometrically alters the view of the depicted anatomical areas in an image. AI in image registration is used either to assess the similarity of two or more images [41] or to search for their best transformation (e.g., using evolutionary [42] and swarm algorithms [43]). Novel approaches include learning-based image registration methods [44] and generative adversarial networks (GANs) [45]. Additionally, there are several AI tools for geometric image pre-processing steps, such as ImFusion [46] for image registration and StudierFenster [47] for reconstruction. In ImFusion [46], DL techniques are utilized to learn the features used as descriptors for image registration. Image registration tools are also embedded in 3DSlicer [48]. AI has also revolutionized the field of image reconstruction, the agglutination of two or more images to create two-dimensional (2D) high-resolution images (known as image stitching) or 3D images, mainly focusing on conventional DL architectures [49] and novel approaches such as GANs [50]. Although several AI techniques exist for image pre-processing, they should be used with caution since they can result in unrecoverable signal loss [7]. Indicatively, in a typical MRI-based radiomics analysis workflow where images suffer from spatial signal inhomogeneity due to the bias field corruption effect, it is a prerequisite to utilize image analysis algorithms such as the N4 bias field correction to minimize or correct for this low-frequency signal variation [51]. In addition, there is an evident variability in studies where radiomics is applied to heterogeneous data coming from different acquisition protocols and/or vendors. Since this can hamper the quality of the extracted radiomics features [52], image-based harmonization has been proposed as a solution to reduce this inherent variability and bring images into a common analysis workspace [11]. In this direction, a lot of effort has been put into developing user-friendly open-source tools for image pre-processing. Among others, this chapter highlights software such as ImageJ [53], MIPAV [54], MITK [55], 3DSlicer [48], and LIFEx [56]. It is worth mentioning that readers with experience in software development using Python can also refer to open-source libraries including SimpleITK [57] and PyRadiomics [58]. PyRadiomics provides several pre-processing options and supports a wide variety of image formats. It can be used standalone or through 3DSlicer, which has incorporated a plugin providing a convenient front-end interface for PyRadiomics. MITK not only provides image processing tasks but, more importantly, a stable, well-tested, and extensible framework for radiomics analysis. It also has a unified framework for different user applications, accessible through a GUI, a command-line tool, Python, or C++, making it usable for users with experience in software programming. An indicative image pre-processing example is given in the "from theory to practice" section using 3DSlicer.
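As a minimal sketch of the N4 bias field correction discussed above, following SimpleITK's documented usage (file names are hypothetical):

```python
# A minimal N4 bias field correction sketch with SimpleITK.
import SimpleITK as sitk

image = sitk.ReadImage("case01_t1.nii.gz", sitk.sitkFloat32)
head_mask = sitk.OtsuThreshold(image, 0, 1, 200)   # rough foreground mask

corrector = sitk.N4BiasFieldCorrectionImageFilter()
corrected = corrector.Execute(image, head_mask)    # removes low-frequency bias field
sitk.WriteImage(corrected, "case01_t1_n4.nii.gz")
```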

2.4 Radiomics Extraction

The emerging field of radiomics proposes a challenging image analysis framework involving the massive extraction of quantitative features to reveal non-intuitive properties of the imaging phenotype and tissue micro-environment [59]. The analysis demands an effective and efficient representation of the image content, which is performed either by using mathematical equations from the image analysis domain to extract handcrafted features (descriptors of shape, size, and textural patterns) from a ROI, or by using a complex DL architecture that applies non-linear transformations to the image to extract a massive number of "deep features" without the need for any human intervention. For handcrafted feature extraction we can list, among the widely used approaches, software tools that require little to no programming skills from the user, such as MaZda [60], LIFEx [56], IBEX [61], StudierFenster [47], RadiomiX ToolBox [62], AutoRadiomics [21], and 3DSlicer [48, 63]. On the other hand, software platforms that utilize AI to calculate "deep features" directly from images include Deep Learning Studio [64] and Nvidia's Digits [65]. While there is a wide range of AI applications and image analysis techniques that can be incorporated in the radiomics extraction phase, the recent literature raises concerns about sources of variation (e.g., inter-patient and inter-scanner factors) that can significantly hamper the stability of radiomics features [11]. These variation effects are more evident in multicenter studies, having a significant impact on the repeatability and robustness of radiomics analysis [66]. To overcome this issue, efforts from the Image Biomarker Standardization Initiative (IBSI) have focused on the standardization of image pre-processing in order to improve the reproducibility of radiomics feature calculations [67]. It is worth mentioning that the aforementioned RadiomiX ToolBox, AutoRadiomics, 3DSlicer, and LIFEx are software tools that can extract fully IBSI-compliant radiomics features. In particular, the first three integrate PyRadiomics, which follows the IBSI guidelines, whereas LIFEx is an IBSI-compliant freeware tool with its own user interface [58]. Additional effort involves the development of data-driven techniques that operate directly on the calculated features (i.e., feature-based) to compensate for variability effects. These include ComBaTool [68], a popular standalone web application that adjusts the feature values into a common space by estimating their differences using an empirical Bayes framework, and Neuroharmony [69], an AI tool that enables radiomics feature harmonization without knowledge of the image acquisition parameters. An illustrative representation of the radiomics extraction using 3DSlicer is given in the "from theory to practice" section.
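For readers curious about the scripted route, the following is a minimal sketch of handcrafted feature extraction using PyRadiomics' documented Python interface; the paths and the parameter file name are hypothetical.

```python
# A minimal handcrafted feature extraction sketch with PyRadiomics;
# a YAML parameter file keeps the pre-processing settings explicit.
from radiomics import featureextractor

extractor = featureextractor.RadiomicsFeatureExtractor("params.yaml")
features = extractor.execute("case01_t1_n4.nii.gz", "case01_mask.nii.gz")

for name, value in features.items():
    if name.startswith("original_"):   # skip the diagnostic metadata entries
        print(name, value)
```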

2.5 Radiomics Modeling

The use of AI in radiomics analysis is gaining momentum, since it shows great promise in transforming the field of medical imaging and bringing medicine from the era of "sickcare" to the era of healthcare and prevention. This potential is prominent in AI algorithms that, once trained to learn from existing imaging data, can perform statistical inference to make accurate predictions on new "unseen" data [70, 71]. This learning process facilitates the development of a variety of traditional machine learning models such as naive Bayes [72], logistic regression [73], support vector machines [74], decision trees [75], and random forests [76]. Different from the traditional ML architecture, deep learning introduces a subfield of ML that operates on large-scale neuronal architectures in which higher-level features are obtained by applying multiple non-linear transformations to the input images [77]. The most popular DL models are autoencoders, convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs) [77]. In general, an analysis pipeline comprising model training, hyperparameter optimization, selection, and validation is technically demanding and can be properly performed by a data scientist with AI skills. A thorough review of the basics of ML and DL is far beyond the scope of this chapter; however, several practical guides and tutorials are available online [78–80]. Automated Machine Learning (AutoML) was recently introduced to address these technical challenges by proposing off-the-shelf AI implementations through a user-friendly interface [81]. A comprehensive review of the functionalities and benefits of working with AutoML can be found in a recent survey [82]. In this direction, users can benefit from cloud-based AutoML platforms such as Google AutoML [82] and Microsoft AutoML [83] to train and validate high-quality models. Recently, a popular commercial AutoML platform, RapidMiner [84], was released, incorporating a comprehensive set of tools for every step of the analysis pipeline. Another commercial AutoML platform is JADBio [85], which includes predictive and diagnostic clinical models for the analysis of both low- and high-dimensional data. AutoRadiomics [21] and RadiomiX [62] have been designed to meet the needs of ML-based radiomics. KNIME [86] and BigML [87] are user-friendly AI tools that perform basic data analytics and assist users with interactive data visualization to build ML models without coding experience. In the literature, radiologists recommend the use of WEKA [88, 89] and Orange [90], considering the simplicity and ease they provide to non-experts for conducting their own experiments [10]. Table 2.1 summarizes state-of-the-art platforms that are mainly dedicated to end users with little to no programming skills. For readers having even low-level programming skills, Python provides a significant list of AI libraries such as Google's TensorFlow [91], Auto-Sklearn [92], AutoKeras [93], and MLBox [94]. An indicative radiomics analysis is presented in the "from theory to practice" section using RapidMiner.
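As a minimal sketch of what such a library-based modeling step can look like, here is an example using scikit-learn (not one of the AutoML platforms above); the csv layout and file name are hypothetical: one row per patient, feature columns, and a binary outcome column.

```python
# A minimal radiomics-based modeling sketch with scikit-learn.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("radiomics_features.csv")
X = df.drop(columns=["outcome"])
y = df["outcome"]

model = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")  # 5-fold CV
print("AUC: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```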

2.6 From Theory to Practice

It is evident from Table 2.1 that no single software tool can serve as a one-stop-shop solution for the design of a complete radiomics analysis workflow, since each one has its own design style, customization, and functionality. To the best of our knowledge, several AI models can be deployed and executed within the 3DSlicer ecosystem (e.g., MONAI); however, this integration demands Python software skills to enable compatibility between the components. A user lacking programming skills can only partially exploit the capabilities of each software package. It might be discouraging to create a seamless AI workflow where the output of each action is fed as input to a different software platform, not only because a general overview of the available tools is necessary but also because ensuring compatibility is a non-trivial task. A possible course of action for structuring a complete ML-based radiomics project in practice is presented in the following paragraphs, shedding light on the interplay between different actions and how they can be combined to compose the full pathway from the clinical question to the AI-derived answer. In this direction, our demonstration of how to perform a radiomics workflow starts with an MRI region from a cancer patient, aiming to predict diagnosis and assess the evolution of the cancer and its mortality. Since our proposed pipeline is designed to be used by doctors or medical physicists with little or no programming skills, we recommend 3DSlicer as a user-friendly tool which provides the flexibility of using PyRadiomics for both the image pre-processing and radiomics feature extraction steps. As for the radiomics harmonization process, ComBaTool seems to be the ideal tool, as it needs no programming skills and can be used online by selecting each parameter manually. Then, in order to develop the ML-based radiomics model, we propose RapidMiner, one of the most popular data science AutoML tools.


Table 2.1 Tools and plugins for radiomics analysis

| Image process | Tool | Coding | Commercial | Platform |
| Image segmentation | AutoRadiomics [21] | Python | No | Any |
| | Avizo [22] | No | Yes | Any |
| | MedViso [29] | Matlab | Both | Windows |
| | 3DSlicer [19, 48, 63] | No | No | Any |
| | FiJi [15] | JavaScript | No | Any |
| | Qupath [23, 24] | Python | No | Any |
| | RECOMIA [25] | No | Yes | Online |
| | MVision [28] | No | Yes | Windows |
| | nucleAIazer [26] | Python | No | Windows, Linux |
| | MedSeg [14] | No | No | Online |
| | RSIP [31] | No | No | Online |
| | aimis3d [27] | Python | No | Windows |
| | ImFusion [30] | No | Yes | Windows |
| Image pre-processing | ImageJ [53] | No | No | macOS, Linux, Windows |
| | MIPAV [54] | No | No | Any |
| | 3DSlicer [48] and PyRadiomics [58] | No | No | Any |
| | LIFEx [56] | No | No | Any |
| | StudierFenster [47] | No | No | Online |
| | ImFusion [46] | No | Yes | Windows |
| | MITK [55] | No | No | Windows, Linux |
| Radiomics extraction | 3DSlicer [48] and PyRadiomics [58] | No | No | Any |
| | RadiomiX [62] and PyRadiomics [58] | Matlab | Yes | macOS, Windows |
| | AutoRadiomics [21] and PyRadiomics [58] | Python | No | Any |
| | MaZda [60] | No | No | Windows |
| | IBEX [61] | No | No | Windows |
| | LIFEx [56] | No | No | Any |
| | StudierFenster [47] | No | No | Online |
| | Nvidia's Digits [65] | No | No | Any, Cloud |
| | Deep Learning Studio [64] | No | No | Windows, Linux, Cloud |
| Radiomics harmonization | NeuroHarmony [69] | Python | No | Any |
| | ComBaTool [68] | No | No | Online |
| Modeling | AutoRadiomics [21] | Python | No | Any |
| | RadiomiX [62] | Matlab | Yes | macOS, Windows |
| | RapidMiner [95] | No | Yes | Any |
| | JADBio [85] | No | Yes | Online |
| | WEKA [88, 89] | No | No | Windows, Linux |
| | Orange [90] | No | No | macOS, Linux, Windows |
| | KNIME [86] | No | No | macOS, Linux, Windows |
| | BigML [87] | No | No | Online |
| | Google [82] | No | Yes | Online |
| | Microsoft [83] | No | Yes | Online |


RapidMiner is preferred because of its ability to provide a unique, user-friendly visual workflow, helping the user to build models with speed and automation. The acquired raw images are imported into 3DSlicer in DICOM format, and information about the acquisition protocol, study date, etc. is retrieved from the DICOM headers. 3DSlicer supports image viewing and a segment editor module which activates a wide range of segmentation methods (Fig. 2.1a). Within 3DSlicer, the user can create and edit slice-by-slice manual ROI segmentations (e.g., paint, draw), semi-automated segmentations (e.g., using thresholding, region growing), and fully automated ones (e.g., the MONAI plugin). Subsequently, the image pre-processing phase using 3DSlicer and PyRadiomics (Fig. 2.1b) supports the N4 bias field correction, filtering modules for noise and artifact reduction, and embedded tools for manual and automatic registration and reconstruction. Further pre-processing (e.g., interpolation, intensity-based normalization, and discretization) can be performed at the feature extraction step (Fig. 2.1c), either by setting up the required parameters from the user interface or by automatically loading a PyRadiomics parameter file (e.g., a YAML or JSON structured text file). Indicative parameter files can be found in the PyRadiomics GitHub repository (https://github.com/AIM-Harvard/pyradiomics/tree/master/examples). For more detailed instructions and examples, extensive documentation is available for PyRadiomics (https://pyradiomics.readthedocs.io/en/latest/) and 3DSlicer (https://slicer.readthedocs.io/en/latest/index.html). As we have mentioned in the radiomics extraction section, radiomics features need to be harmonized before modeling. This is implemented using the free online application ComBaTool (https://forlhac.shinyapps.io/Shiny_ComBat/), simply by uploading the radiomics values in tabular format (e.g., a csv or txt file) (Fig. 2.1d). In addition to the radiomics values, the uploaded file includes an extra column containing information about the image protocol (e.g., extracted from the DICOM header). Density plots and descriptive statistics of the radiomics values before and after harmonization are also available through the online application (the details can be found in [96]).

Fig. 2.1 A proposed radiomics analysis pipeline using commercial tools and plugins: (a) image segmentation with 3DSlicer; (b) image pre-processing with 3DSlicer and PyRadiomics (bias field correction, filtering, reconstruction/registration, intensity normalization); (c) radiomics extraction with 3DSlicer and PyRadiomics (interpolation, discretization; radiomics features and deep features); (d) radiomics harmonization with ComBaTool; (e) radiomics modeling with RapidMiner


At the next step, the harmonized radiomics features are downloaded as a csv file and imported into RapidMiner (Fig. 2.1e). For beginners, Auto Model is a recommended extension of RapidMiner for model development. It consists of the following steps: (i) data selection, (ii) task selection (e.g., classification or regression), (iii) target preparation, (iv) input selection, and (v) model selection (including automated validation and optimization of the selected models). By selecting the Overview section in the RapidMiner menu, the user can find the resulting performance metrics for each model, while selecting the ROC comparison section visualizes the AUC curves. Furthermore, there is the "open process" tool, in which the user can make changes without needing to start the analysis from scratch. Plenty of tutorials, examples, and instructions are included in the online documentation of RapidMiner (https://docs.rapidminer.com/).
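A minimal sketch (with hypothetical file and column names) of assembling the tabular file described above for ComBaTool, i.e., the radiomics values plus an extra column identifying each case's acquisition protocol:

```python
# A minimal sketch of preparing the ComBaTool input table with pandas.
import pandas as pd

features = pd.read_csv("radiomics_features.csv")   # one row per patient
protocol = pd.read_csv("dicom_protocols.csv")      # e.g., parsed from DICOM headers

table = features.merge(protocol[["patient_id", "protocol"]], on="patient_id")
table.to_csv("combat_input.csv", index=False)      # upload this file to ComBaTool
```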

2.7

Discussion

Despite the contribution of AI to medical image analysis, there are still challenges that need to be addressed. Two main challenges are (i) the reproducibility of radiomics features and (ii) the explainability of AI models [97]. To the best of our knowledge, there is no user-friendly platform that enables users to assess and qualify these aspects automatically. As mentioned above, radiomics features suffer from variability, which makes feature selection based on feature stability an appropriate step for building robust radiomics models [98]. In the literature, stability has been investigated by calculating the concordance correlation coefficient (CCC) and the intra-class correlation coefficient (ICC) [11] after the radiomics extraction step. However, the discriminating power of radiomics cannot be guaranteed, and therefore harmonization methods have been proposed [99]. Another key problem in using AI is the lack of model interpretation, which leaves users unable to understand how model predictions are generated. Explainable AI (XAI) methods are needed to understand the mechanism behind the models and how the selected radiomics features correlate with biological phenotypes [97]. Recently, the SHapley Additive exPlanations (SHAP) method has been proposed as an explanatory technique [100] and has been successfully incorporated into multiparametric radiomics models for the diagnosis of schizophrenia [33] and of IDH mutations in gliomas [2]. Efforts should be made to integrate XAI into medical software tools in order to increase the reliability and transparency of AI predictions.

After a thorough and meticulous search of the armamentarium of medical image processing tools, it is evident that there are some common points and trends. First of all, the majority of the aforementioned tools are not commercial, which means that they can be used without any engagement or cost. An experienced user can witness a steady, albeit slow, transition from conventional methods to more advanced and sophisticated ones influenced by the current trends in AI. In addition, a significant number of them can be installed on any platform, which can be useful for users not familiar with Linux or macOS. Since the results of AI often cannot be explained, AI is frequently described as a black-box mechanism in healthcare [101]. In order to encourage increased use of AI in the clinical domain, the user needs to grasp the rationale bridging the input to the output by coming into contact with intermediate results or with the strength of each factor contributing to the final result. The latter is the object of XAI research, which should grow in parallel with every other branch of AI. Last but not least, the fact that a large number of applications require no programming skills is contributing to the development of an extended medical community that can produce, share, and discuss its AI studies, thus shaping the relevant work with interdisciplinary knowledge and feedback. This in turn will produce results tightly bound to clinical needs and will positively affect both clinical practice, with new insights, and the technical parties, with real-world scenarios and abundant data resources.
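As a hedged illustration of how SHAP can be attached to a trained classifier, the snippet below uses synthetic data standing in for a radiomics feature matrix; the dataset and model choice are placeholders, not the setups used in [33] or [2]:

```python
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a radiomics feature matrix and binary outcome.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)   # efficient explainer for tree ensembles
shap_values = explainer.shap_values(X)  # per-sample, per-feature contributions
shap.summary_plot(shap_values, X)       # global ranking of feature influence
```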


2.8 Conclusion

There is a plethora of AI tools, indicative of the vivid interest of users in this area. Nevertheless, further research and work are still needed to increase their impact in radiology. First, the large number of tools itself escalates the need for a method to unify or integrate different medical-image-acquisition-oriented tools into a seamless workflow embracing different imaging modalities (i.e., the tool should be able to handle images from different image acquisition techniques). The optimal option would be to unite them into a single tool that can deal with many tomographic techniques (MRI, CT, ultrasound, etc.). Second, the transition from conventional methods to AI-based ones needs to be supported by additional improvements in AI algorithms regarding their area of applicability. Although AI techniques tend to overcome the limitations of traditional techniques, they frequently focus on particular body areas or clinical problems. It also needs to be taken into account that, despite the advancements in AI, the usefulness of these methods is still affected by the way a specific pipeline is constructed. Therefore, the construction of an optimal AI analysis pipeline requires the complementary knowledge of physicians, image acquisition experts, and AI specialists. To conclude, the ultimate challenge is to develop an AI API that performs the aforementioned tasks on all medical images, regardless of the image acquisition technique, with high-quality results in a seamless, user-friendly fashion.

Acknowledgments Georgios C. Manikis is a recipient of a postdoctoral scholarship from the Wenner–Gren Foundations (www.swgc.org (accessed on January 27, 2023)) (grant number F2022-0005).

References

1. Lambin P, Leijenaar RTH, Deist TM, Peerlings J, De Jong EEC, Van Timmeren J, Sanduleanu S, Larue RTHM, Even AJG, Jochems A, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol. 2017;14(12):749–62


2. Manikis GC, Ioannidis GS, Siakallis L, Nikiforaki K, Iv M, Vozlic D, Surlan-Popovic K, Wintermark M, Bisdas S, Marias K. Multicenter DSC–MRI-based radiomics predict IDH mutation in gliomas. Cancers. 2021;13(16):3965 3. Afshar P, Mohammadi A, Plataniotis KN, Oikonomou A, Benali H. From handcrafted to deep-learning-based cancer radiomics: challenges and opportunities. IEEE Sig Process Mag. 2019;36(4):132–60 4. Tian J, Dong D, Liu Z, Zang Y, Wei J, Song J, Mu W, Wang S, Zhou M. Radiomics in medical imaging – detection, extraction and segmentation. In: Artificial intelligence in decision support systems for diagnosis in medical imaging. Berlin: Springer; 2018. p. 267–333 5. Severn C, Suresh K, Görg C, Choi YS, Jain R, Ghosh D. A pipeline for the implementation and visualization of explainable machine learning for medical imaging using radiomics features. Sensors. 2022;22(14):5205 6. Bibault J-E, Xing L, Giraud P, El Ayachy R, Giraud N, Decazes P, Burgun A. Radiomics: a primer for the radiation oncologist. Cancer/Radiothérapie. 2020;24(5):403–10 7. Papanikolaou N, Matos C, Koh DM. How to develop a meaningful radiomic signature for clinical use in oncologic patients. Cancer Imaging. 2020;20(1):1–10 8. Lohmann P, Galldiks N, Kocher M, Heinzel A, Filss CP, Stegmayr C, Mottaghy FM, Fink GR, Shah NJ, Langen K-J. Radiomics in neuro-oncology: basics, workflow, and applications. Methods. 2021;188:112–21 9. Van Timmeren JE, Cester D, Tanadini-Lang S, Alkadhi H, Baessler B. Radiomics in medical imaging – "how-to" guide and critical reflection. Insights Imaging. 2020;11(1):1–16 10. Koçak B, Durmaz EŞ, Ateş E, Kılıçkesmez Ö. Radiomics with artificial intelligence: a practical guide for beginners. Diagn Interventional Radiol. 2019;25(6):485 11. Stamoulou E, Spanakis C, Manikis GC, Karanasiou G, Grigoriadis G, Foukakis T, Tsiknakis M, Fotiadis DI, Marias K. Harmonization strategies in multicenter MRI-based radiomics. J Imaging. 2022;8(11):303 12. Kumar BV, Sabareeswaran S, Madumitha G. A decennary survey on artificial intelligence methods for image segmentation. In: Advanced engineering optimization through intelligent techniques. Berlin: Springer; 2020. p. 291–311 13. Sakinis T, Milletari F, Roth H, Korfiatis P, Kostandy P, Philbrick K, Akkus Z, Xu Z, Xu D, Erickson BJ. Interactive segmentation of medical images through fully convolutional neural networks. Preprint. arXiv:1903.08205; 2019 14. Medseg, October 2021


15. Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, Preibisch S, Rueden C, Saalfeld S, Schmid B, et al. Fiji: an open-source platform for biological-image analysis. Nat Methods. 2012;9(7):676–82 16. Arganda-Carreras I, Kaynig V, Rueden C, Eliceiri KW, Schindelin J, Cardona A, Seung HS. Trainable Weka Segmentation: a machine learning tool for microscopy pixel classification. Bioinformatics. 2017;33(15):2424–26 17. Diaz-Pinto A, Alle S, Ihsani A, Asad M, Nath V, Pérez-García F, Mehta P, Li W, Roth HR, Vercauteren T, Xu D, Dogra P, Ourselin S, Feng A, Cardoso MJ. MONAI Label: a framework for AI-assisted interactive labeling of 3D medical images. arXiv e-prints; 2022 18. Wasserthal J, Meyer M, Breit H-C, Cyriac J, Yang S, Segeroth M. TotalSegmentator: robust segmentation of 104 anatomical structures in CT images. Preprint. arXiv:2208.05868; 2022 19. Zeineldin RA, Weimann P, Karar ME, Mathis-Ullrich F, Burgert O. Slicer-DeepSeg: open-source deep learning toolkit for brain tumour segmentation. Curr Directions Biomed Eng. 2021;7(1):30–4 20. Zaffino P, Marzullo A, Moccia S, Calimeri F, De Momi E, Bertucci B, Arcuri PP, Spadea MF. An open-source COVID-19 CT dataset with automatic lung tissue classification for radiomics. Bioengineering. 2021;8(2):26 21. Woznicki P, Laqua F, Bley T, Baeßler B, et al. AutoRadiomics: a framework for reproducible radiomics research. Front Radiol. 2022;2:919133. https://doi.org/10.3389/fradi.2022.919133 22. Fermentas Inc. Thermo Scientific™ Amira-Avizo software; 2021. November 2008 23. Bankhead P, Loughrey MB, Fernández JA, Dombrowski Y, McArt DG, Dunne PD, McQuaid S, Gray RT, Murray LJ, Coleman HG, et al. QuPath: open source software for digital pathology image analysis. Sci Rep. 2017;7(1):1–7 24. Achanta R, Shaji A, Smith K, Lucchi A, Fua P, Süsstrunk S. SLIC superpixels. Technical report; 2010 25. Trägårdh E, Borrelli P, Kaboteh R, Gillberg T, Ulén J, Enqvist O, Edenbrandt L. RECOMIA – a cloud-based platform for artificial intelligence research in nuclear medicine and radiology. EJNMMI Phys. 2020;7(1):1–12 26. Hollandi R, Szkalisity A, Toth T, Tasnadi E, Molnar C, Mathe B, Grexa I, Molnar J, Balind A, Gorbe M, et al. nucleAIzer: a parameter-free deep learning framework for nucleus segmentation using image style transfer. Cell Syst. 2020;10(5):453–58 27. Jia G, Huang X, Tao S, Zhang X, Zhao Y, Wang H, He J, Hao J, Liu B, Zhou J, et al. Artificial intelligence-based medical image segmentation for 3D printing and naked eye 3D visualization. Intell Med. 2022;2(01):48–53


28. Kiljunen T, Akram S, Niemelä J, Löyttyniemi E, Seppälä J, Heikkilä J, Vuolukka K, Kääriäinen O-S, Heikkilä V-P, Lehtiö K, et al. A deep learning-based automated CT segmentation of prostate cancer anatomy for radiation therapy planning – a retrospective multicenter study. Diagnostics. 2020;10(11):959 29. Heiberg E, Sjögren J, Ugander M, Carlsson M, Engblom H, Arheden H. Design and validation of Segment – freely available software for cardiovascular image analysis. BMC Med Imag. 2020;10(1):1–13 30. Salehi M, Prevost R, Moctezuma J-L, Navab N, Wein W. Precise ultrasound bone registration with learning-based segmentation and speed of sound calibration. In: International conference on medical image computing and computer-assisted intervention. Berlin: Springer; 2017. p. 682–90 31. Lee Y, Veerubhotla K, Jeong MH, Lee CH. Deep learning in personalization of cardiovascular stents. J Cardiovasc Pharmacol Ther. 2020;25(2):110–20 32. Hesamian MH, Jia W, He X, Kennedy P. Deep learning techniques for medical image segmentation: achievements and challenges. J Digit Imag. 2019;32(4):582–96 33. Bang M, Eom J, An C, Kim S, Park YW, Ahn SS, Kim J, Lee SK, Lee S-H. An interpretable multiparametric radiomics model for the diagnosis of schizophrenia using magnetic resonance imaging of the corpus callosum. Transl Psychiatry. 2021;11(1):1–8 34. Chen Z, Pawar K, Ekanayake M, et al. Deep learning for image enhancement and correction in magnetic resonance imaging – state-of-the-art and challenges. J Digit Imag. 2023;36:204–30. https://doi.org/10.1007/s10278-022-00721-9 35. Yamashita K, Markov K. Medical image enhancement using super resolution methods. In: International conference on computational science. Berlin: Springer; 2020. p. 496–508 36. Van Rossum G, Drake FL Jr. Python reference manual. Amsterdam: Centrum voor Wiskunde en Informatica; 1995 37. Gu Y, Zeng Z, Chen H, Wei J, Zhang Y, Chen B, Li Y, Qin Y, Xie Q, Jiang Z, et al. MedSRGAN: medical images super-resolution using generative adversarial networks. Multimedia Tools Appl. 2020;79(29):21815–40 38. Tustison NJ, Cook PA, Holbrook AJ, Johnson HJ, Muschelli J, Devenyi GA, Duda JT, Das SR, Cullen NC, Gillen DL, et al. The ANTsX ecosystem for quantitative biological and medical imaging. Sci Rep. 2021;11(1):1–13 39. Zhang K, Zuo W, Chen Y, Meng D, Zhang L. Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising. IEEE Trans Image Process. 2017;26(7):3142–55


40. Venkatesh V, Sharma N, Singh M. Intensity inhomogeneity correction of MRI images using InhomoNet. Comput Med Imaging Graphics. 2020;84:101748 41. Cheng X, Zhang L, Zheng Y. Deep similarity learning for multimodal medical images. Comput Methods Biomech Biomed Eng Imag Visualization. 2018;6(3):248–52 42. Spanakis C, Mathioudakis E, Kampanis N, Tsiknakis M, Marias K. Machine-learning regression in evolutionary algorithms and image registration. IET Image Process. 2019;13(5):843–49 43. Manoj S, Ranjitha S, Suresh HN. Hybrid BAT-PSO optimization techniques for image registration. In: 2016 International conference on electrical, electronics, and optimization techniques (ICEEOT). Piscataway: IEEE; 2016. p. 3590–96 44. Wodzinski M, Müller H. DeepHistReg: unsupervised deep learning registration framework for differently stained histology samples. Comput Methods Prog Biomed. 2021;198:105799 45. Dey N, Ren M, Dalca AV, Gerig G. Generative adversarial registration for improved conditional deformable templates. In: Proceedings of the IEEE/CVF international conference on computer vision; 2021. p. 3929–41 46. Markova V, Ronchetti M, Wein W, Zettinig O, Prevost R. Global multi-modal 2D/3D registration via local descriptors learning. Preprint. arXiv:2205.03439; 2022 47. Li J. Deep learning for cranial defect reconstruction. Master's Thesis, Graz University of Technology; 2020 48. Fedorov A, Beichel R, Kalpathy-Cramer J, Finet J, Fillion-Robin J-C, Pujol S, Bauer C, Jennings D, Fennessy F, Sonka M, et al. 3D Slicer as an image computing platform for the quantitative imaging network. Magn Reson Imaging. 2012;30(9):1323–41 49. Schlemper J, Caballero J, Hajnal JV, Price A, Rueckert D. A deep cascade of convolutional neural networks for MR image reconstruction. In: International conference on information processing in medical imaging. Berlin: Springer; 2017. p. 647–58 50. Vasudeva B, Deora P, Bhattacharya S, Pradhan PM. Co-VeGAN: complex-valued generative adversarial network for compressive sensing MR image reconstruction. Preprint. arXiv:2002.10523; 2020 51. Tustison NJ, Avants BB, Cook PA, Zheng Y, Egan A, Yushkevich PA, Gee JC. N4ITK: improved N3 bias correction. IEEE Trans Med Imaging. 2010;29(6):1310–20 52. Carré A, Klausner G, Edjlali M, Lerousseau M, Briend-Diop J, Sun R, Ammari S, Reuzé S, Andres EA, Estienne T, et al. Standardization of brain MR images across machines and protocols: bridging the gap for MRI-based radiomics. Sci Rep. 2020;10(1):1–15 53. Schneider CA, Rasband WS, Eliceiri KW. NIH Image to ImageJ: 25 years of image analysis. Nat Methods. 2012;9(7):671–5


54. Bazin P-L, Cuzzocreo JL, Yassa MA, Gandler W, McAuliffe MJ, Bassett SS, Pham DL. Volumetric neuroimage analysis extensions for the MIPAV software package. J Neurosci Methods. 2007;165(1):111–21 55. Götz M, Nolden M, Maier-Hein K. MITK Phenotyping: an open-source toolchain for image-based personalized medicine with radiomics. Radiother Oncol. 2019;131:108–11 56. Nioche C, Orlhac F, Boughdad S, Reuzé S, Goya-Outi J, Robert C, Pellot-Barakat C, Soussan M, Frouin F, Buvat I. LIFEx: a freeware for radiomic feature calculation in multimodality imaging to accelerate advances in the characterization of tumor heterogeneity. Cancer Res. 2018;78(16):4786–9 57. Yaniv Z, Lowekamp BC, Johnson HJ, Beare R. SimpleITK image-analysis notebooks: a collaborative environment for education and reproducible research. J Digital Imaging. 2018;31(3):290–303 58. Van Griethuysen JJM, Fedorov A, Parmar C, Hosny A, Aucoin N, Narayan V, Beets-Tan RGH, Fillion-Robin J-C, Pieper S, Aerts HJWL. Computational radiomics system to decode the radiographic phenotype. Cancer Res. 2017;77(21):e104–7 59. van Timmeren JE, Cester D, Tanadini-Lang S, Alkadhi H, Baessler B. Radiomics in medical imaging – "how-to" guide and critical reflection. Insights Imaging. 2020;11:91 60. Szczypiński PM, Strzelecki M, Materka A, Klepaczko A. MaZda – the software package for textural analysis of biomedical images. In: Computers in medical activity. Berlin: Springer; 2009. p. 73–84 61. Zhang L, Fried DV, Fave XJ, Hunter LA, Yang J, Court LE. IBEX: an open infrastructure software platform to facilitate collaborative work in radiomics. Med Phys. 2015;42(3):1341–53 62. RadiomiX Research Toolbox. https://radiomics.bio/radiomix-toolbox/. Available online; accessed on 21 Nov 2022 63. Talukder S. GPU-based medical image segmentation: brain MRI analysis using 3D Slicer. In: Artificial intelligence applications for health care. Boca Raton: CRC Press; 2022. p. 109–21 64. Deep Learning Studio. https://deeplearningstudio.com/. Available online; accessed on 21 Nov 2022 65. Nvidia's DIGITS system. https://developer.nvidia.com/digits. Available online; accessed on 21 Nov 2022 66. Zwanenburg A, Vallières M, Abdalah MA, Aerts HJWL, Andrearczyk V, Apte A, Ashrafinia S, Bakas S, Beukinga RJ, Boellaard R, et al. The image biomarker standardization initiative: standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology. 2020;295(2):328–38 67. Zwanenburg A, Leger S, Vallières M, Löck S. Image biomarker standardisation initiative. Preprint. arXiv:1612.07003; 2016


68. Fortin J-P, Cullen N, Sheline YI, Taylor WD, Aselcioglu I, Cook PA, Adams P, Cooper C, Fava M, McGrath PJ, et al. Harmonization of cortical thickness measurements across scanners and sites. Neuroimage. 2018;167:104–20 69. Garcia-Dias R, Scarpazza C, Baecker L, et al. Neuroharmony: a new tool for harmonizing volumetric MRI data from unseen scanners. Neuroimage. 2020;220:117127. https://doi.org/10.1016/j.neuroimage.2020.117127 70. Bouhali O, Bensmail H, Sheharyar A, David F, Johnson JP. A review of radiomics and artificial intelligence and their application in veterinary diagnostic imaging. Vet Sci. 2022;9(11):620 71. Wagner MW, Namdar K, Biswas A, Monah S, Khalvati F, Ertl-Wagner BB. Radiomics, machine learning, and artificial intelligence – what the neuroradiologist needs to know. Neuroradiology. 2021;63(12):1957–67 72. Webb GI, Keogh E, Miikkulainen R. Naïve Bayes. Encycl Mach Learn. 2010;15:713–14 73. Wright RE. Logistic regression. In: Grimm LG, Yarnold PR, editors. Reading and understanding multivariate statistics. American Psychological Association; 1995. p. 217–44 74. Suthaharan S. Support vector machine. In: Machine learning models and algorithms for big data classification. Berlin: Springer; 2016. p. 207–35 75. Myles AJ, Feudale RN, Liu Y, Woody NA, Brown SD. An introduction to decision tree modeling. J Chemom. 2004;18(6):275–85 76. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32 77. Puttagunta M, Ravi S. Medical image analysis based on deep learning approach. Multimedia Tools Appl. 2021;80(16):24365–98 78. Hodgson J. The 5 stages of machine learning validation. Retrieved from https://towardsdatascience.com/the-5-stages-of-machine-learning-validation-162193f8e5db, 2022 79. Karthikeyan N. Step-by-step guide for a deep learning project. Retrieved from https://medium.com/@neelesh_k/structuring-deeplearning-projects-b83d29513aea, 2022 80. Brownlee J. Machine learning mastery with Python: understand your data, create accurate models, and work projects end-to-end. Melbourne: Machine Learning Mastery; 2016 81. Hutter F, Kotthoff L, Vanschoren J. Automated machine learning: methods, systems, challenges. Berlin: Springer Nature; 2019 82. Mustafa A, Azghadi MR. Automated machine learning for healthcare and clinical notes analysis. Computers. 2021;10(2):24 83. Microsoft AutoML. https://www.microsoft.com/en-us/research/project/automl/. Available online; accessed on 21 Nov 2022 84. Goudas T, Doukas C, Chatziioannou A, Maglogiannis I. A collaborative biomedical image-mining framework: application on the image analysis of microscopic kidney biopsies. IEEE J Biomed Health Inf. 2012;17(1):82–91


85. Tsamardinos I, Charonyktakis P, Papoutsoglou G, Borboudakis G, Lakiotaki K, Zenklusen JC, Juhl H, Chatzaki E, Lagani V. Just add data: automated predictive modelling for knowledge discovery and feature selection. NPJ Precis Oncol. 2022;6:38 86. KNIME Software. 2021. https://www.knime.com/knime-software/. Available online; accessed on 18 Nov 2022 87. BigML, Inc. Corvallis, Oregon, USA; 2011. https://bigml.com. Available online; accessed on 21 Nov 2022 88. Witten IH, Frank E, Hall MA, Pal CJ. Data mining: practical machine learning tools and techniques. vol. 2; 2005 89. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsl. 2009;11(1):10–8 90. Demšar J, Curk T, Erjavec A, Gorup Č, Hočevar T, Milutinovič M, Možina M, Polajnar M, Toplak M, Starič A, Štajdohar M, Umek L, Žagar L, Žbontar J, Žitnik M, Zupan B. Orange: data mining toolbox in Python. J Mach Learn Res. 2013;14:2349–53 91. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mané D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viégas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X. TensorFlow: large-scale machine learning on heterogeneous systems; 2015. Software available from https://tensorflow.org 92. Feurer M, Klein A, Eggensperger K, Springenberg J, Blum M, Hutter F. Efficient and robust automated machine learning. In: Advances in neural information processing systems 28; 2015. p. 2962–70 93. Jin H, Song Q, Hu X. Auto-Keras: an efficient neural architecture search system. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. New York: ACM; 2019. p. 1946–56 94. Vasile M-A, Pop F, Mihaela-Cătălina N, Cristea V. MLBox: machine learning box for asymptotic scheduling. Inf Sci. 2018;433:401–16 95. Kotu V, Deshpande B. Predictive analytics and data mining: concepts and practice with RapidMiner. Burlington: Morgan Kaufmann; 2014 96. Orlhac F, Eertink JJ, Cottereau A-S, Zijlstra JM, Thieblemont C, Meignan M, Boellaard R, Buvat I. A guide to ComBat harmonization of imaging biomarkers in multicenter studies. J Nucl Med. 2022;63(2):172–9 97. Fernandes S, Chong JJH, Paige SL, Iwata M, Torok-Storb B, Keller G, Reinecke H, Murry CE.


98. Da-Ano R, Visvikis D, Hatt M. Harmonization strategies for multicenter radiomics investigations. Phys Med Biol. 2020;65(24):24TR02 99. Mali SA, Ibrahim A, Woodruff HC, Andrearczyk V, Müller H, Primakov S, Salahuddin Z, Chatterjee A, Lambin P. Making radiomics more reproducible across scanner and imaging protocol variations: a review of harmonization methods. J Pers Med. 2021;11(9):842 100. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, editors. Advances in neural information processing systems 30. Red Hook: Curran Associates Inc.; 2017. p. 4765–74 101. Loh HW, Ooi CP, Seoni S, Barua PD, Molinari F, Acharya UR. Application of explainable artificial intelligence for healthcare: a systematic review of the last decade (2011–2022). Comput Methods Programs Biomed. 2022;226:107161. https://doi.org/10.1016/j.cmpb.2022.107161

3 Introduction to Machine Learning in Medicine

Rossana Buongiorno, Claudia Caudai, Sara Colantonio, and Danila Germanese

3.1 Introduction

The increasing availability of patient-related data is driving new research trends addressing personalized prediction and disease management. Nevertheless, the complexity of such analyses makes the use of cognitive augmentation in the form of Artificial Intelligence (AI) systems necessary [1, 2]. Modern AI techniques have considerable potential to exploit complex medical data toward improving current healthcare. Many of the advances in this field are tied to progress in a subdomain of AI research known as Machine Learning (ML). The scientist Arthur Lee Samuel was the first to introduce the term Machine Learning, in 1959. He created a checkers-playing programme designed to develop its own logic while playing and to improve its own performance.

R. Buongiorno · C. Caudai · S. Colantonio · D. Germanese
Institute of Information Science and Technologies "A. Faedo" (ISTI), Pisa, Italy
e-mail: [email protected]; [email protected]



ML algorithms are fruitfully applied in many research fields and in the most varied applications [3–5]. As far as applications in the medical and clinical fields are concerned, many uses, advantages, and opportunities can be enumerated [6–8]. Nevertheless, ML is not yet used in the medical field as a direct diagnostic tool, but exclusively as a support to diagnosis, because human supervision is still indispensable and will probably remain so for a long time. A major trend is the analysis and interpretation of medical images, such as X-ray, ultrasound (US), echocardiography, and MRI, using ML algorithms. Another widespread use is the prediction of the prognosis associated with particular diseases, such as different types of cancer. ML algorithms can also help automatically select patients suitable for particular clinical pathways and experimental trials. ML analysis of anatomical features can also be very useful in the real-time and non-real-time support of surgery, with the aim of promptly identifying unexpected critical or irregular anatomical structures. Other uses include drug discovery, genomic screening, cardiovascular tasks, and epidemiology. Table 3.1 summarizes many examples of medical applications. In this chapter, we aim to describe, as simply as possible, what Machine Learning is and how it can be used fruitfully in the medical field. In Sect. 3.2, we describe the flow of a learning algorithm; in Sect. 3.3, we present the main ML techniques and their most widespread clinical applications. In Sect. 3.4, we briefly address some highly interesting issues (i.e., model evaluation, explainability, reproducibility, sharing, and ethical and legal problems) and the great challenges of precision medicine and personalized medicine, which are increasingly within reach thanks to Machine Learning.

3.2 What Is Machine Learning?

Machine learning is a branch of AI which comprises elements of mathematics, statistics, and computer science. The term machine learning describes the ability of an algorithm to "learn" from data by recognizing patterns and making inferences and/or predictions of future events with minimal human intervention and without explicit programming.


Table 3.1 Main medical applications for Machine Learning methods

• Cardiovascular tasks: cardiovascular risk prediction; hyper-myocardial infarction prediction; coronary artery disease detection; heart failure prediction; stroke risk prediction; cardiac arrhythmias evaluation [9–14]
• Human cancer imaging: cancer detection on CT, X-ray, US, and MRI images; lesion segmentation; regression on cancer grade; cancer risk evaluation; cancer evolution prediction; cancer classification; cancer genomics classification [15–21]
• Cancer genomics: identification of pathogenic variants; detection of oncogenic states; bioactivity prediction; cancer cell-line specific interactions; clinical trials monitoring; prediction of protein secondary structure [22–27]
• Functional genomics: prediction of epigenomic marks; investigation of transcriptional and post-transcriptional regulation; detection of transcription biological rules; protein mass spectrometry classification [28–31]
• Metabolic disorders: metabolic syndrome risk prediction; alignment of metabolomic peaks; metabolites classification; kinetic metabolic modelling; prediction of oestrogen receptor status [32–35]
• Prognostic prediction: survival prediction; disease prognosis; trials outcome prediction; identification of risk factors; disease recurrence prediction [36–40]
• Drug discovery: drug target identification; target druggability prediction; splice variants classification; anticancer drugs prioritization [41–44]

In other words, the results produced by ML algorithms are inferences made from complex statistical analyses of adequately large datasets, expressed as the likelihood of a relationship between variables [45]. Furthermore, machine learning methods improve their performance adaptively as the number of examples from which they learn increases. Over the past 5 years, machine-learned tools have demonstrated visible successes in the medical field, in particular in disease detection, patient monitoring, prognosis prediction, surgical assistance, patient care, and systems management, by supporting complex clinical decision-making [46, 47]. A wide variety of machine learning algorithms are in use today. The choice of a particular model for a given problem is determined by the characteristics of the data as well as by the type of desired outcome. The majority of ML algorithms can be categorized into three types of learning techniques: supervised learning, unsupervised learning, and reinforcement learning (Fig. 3.1).

Fig. 3.1 ML algorithms are generally categorized as supervised learning, unsupervised learning, and reinforcement learning

However, any type of algorithm consists of a series of key steps, as shown in Fig. 3.2:

• Data collection: as machines learn from the data that one gives them, it is of the utmost importance to collect data from reliable sources. The higher the quality of the data, the higher the accuracy of the developed model. Good data are relevant, contain very few missing and repeated values, and have a good representation of the various subcategories/classes present. Noisy or incorrect data will clearly reduce the effectiveness of the model.


• Data pre-processing: after collecting data, they have to be correctly prepared. For instance, they may be randomized to make sure that data are evenly distributed and that the ordering does not affect the learning process. Also, data may be cleaned by removing missing/duplicate values, converting data types, etc. Finally, the cleaned data should be split into two sets: a training set and a testing set. The training set is the set the model learns from. The testing set is used to check the accuracy of the trained model.


• Model training: the prepared data are analysed, elaborated, and interpreted by the machine learning model to find patterns and make predictions. Over time, with training, the model gets better at learning from the data and making inferences.
• Model testing: after training the model, its performance has to be checked. This is done by testing the accuracy and the speed of the model on previously unseen data (the testing set).
• Model improving: this step is also known as parameter tuning. Once the model is created and evaluated, tuning the model parameters allows for improving the accuracy of the model itself. Parameters are the variables in the model that fit the relationship between the data. At certain values of the parameter set, the accuracy may reach its maximum; parameter tuning refers to finding these values.

Fig. 3.2 Core steps of any type of ML approach

In the following sections, the three main types of learning are described: supervised, unsupervised, and reinforcement learning.
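As a compact, illustrative sketch of these key steps, the snippet below runs the whole loop on a dataset bundled with scikit-learn, standing in for real clinical data; the model and grid values are arbitrary choices, not recommendations:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# 1. Data collection: a bundled dataset stands in for real clinical data.
X, y = load_breast_cancer(return_X_y=True)

# 2. Pre-processing: split into training and testing sets; scaling is done
#    inside a pipeline so the test set never leaks into training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
pipe = make_pipeline(StandardScaler(), SVC())

# 3 and 5. Model training and improving: cross-validated grid search tunes
#          the regularization parameter while fitting the model.
grid = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)

# 4. Model testing on previously unseen data.
print("best parameters:", grid.best_params_)
print("test accuracy:", grid.score(X_test, y_test))
```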

3.3 Principal ML Algorithms

Machine learning comprises three main types of algorithms: supervised learning, unsupervised learning, and reinforcement learning. The difference between them lies in how each algorithm learns from the data to make predictions.

3.3.1 Supervised Machine Learning

Supervised learning refers to approaches in which a model is trained on a set of numerical inputs (also called features or predictors) which are associated with known outcomes (also referred to as ground truth or prior knowledge). As reported in Fig. 3.3, the goal in the first stage of learning is to best approximate the relationship between the input and output observables in the data. In the validation step, the model is iteratively improved to reduce the prediction error using optimization techniques: in other words, the learning algorithm iteratively compares its predictions with the correct outputs (ground truth labels) and finds errors in order to modify itself accordingly. Once the algorithm is successfully trained, it will be able to make outcome predictions when applied to new data. Predictions can be either discrete (sometimes referred to as classes, e.g., positive or negative, benign or malignant, no risk–low risk–high risk, etc.) or continuous (e.g., a value from 0 to 100). A model that maps input to a discrete output is based on a classification algorithm (Fig. 3.4). Examples of classification algorithms include those which predict whether a tumour is benign or malignant or establish whether comments written by a patient convey a positive or a negative sentiment [48–50]. Classification algorithms return the probability of a class (between 0 for impossible and 1 for certain). Typically, a probability above 0.50 will be mapped to class 1, but this threshold may be modified according to the required algorithm performance. A model that maps input to a continuous value is based on a regression algorithm (Fig. 3.4). A regression algorithm might be used, for instance, to estimate the percentage of fat in the liver in case of steatosis or to predict an individual's life expectancy [51, 52].

Fig. 3.3 Three steps of Supervised Learning: (i) training of the algorithm, (ii) validation of the trained model, and (iii) test of the model

Fig. 3.4 How classification (left) and regression (right) algorithms work. Classification algorithms find the hyperplane(s) that best divide the data into two (or more) classes; a regression model aims to find the function that best approximates the trend of the data


Note that in [51] the estimation is performed based on ultrasound images. For this type of task, i.e., image processing, the predictors must be processed by a feature selector. A feature selector extracts measurable characteristics from the image dataset which can then be represented in a numerical matrix and understood by the algorithm (see Fig. 3.3). Four key concerns to be considered in supervised learning are:

• Bias-variance trade-off: in any supervised model, there is a balance between bias, which is the constant error term, and variance, which is the amount by which the error may vary between different training sets. Increasing bias will usually lead to lower variance and vice versa. Generally, in order to produce models that generalize well, the variance of the model should scale with the size and complexity of the training data: small datasets should usually be processed with low-variance models, while large, complex datasets will often require higher-variance models to fully learn the structure of the data.
• Model complexity and volume of training data: the proper level of model complexity is generally determined by the nature of the training data. A small amount of data, or data that are not uniformly spread throughout different possible scenarios, will be better explained by a low-complexity model. This is because a high-complexity model will overfit if used on a small number of data points.
• Overfitting: this refers to learning a function that fits the training data very well but does not generalize to other data points. In other words, the model strictly learns the training data without learning the actual trend or structure in the data that leads to those outputs.
• Noise in the output values: this issue concerns the amount of noise in the desired output values. Noisy or incorrect data labels will clearly reduce the effectiveness of the trained model.

Here, the most prominent and common methods used in supervised machine learning are reported: Linear Regression, Support Vector Machine, Random Decision Forest, Extreme Gradient Boosting, and Naive Bayes.


3.3.1.1 Linear Regression

Linear Regression (LR) is one of the simplest and most used supervised learning methods. In essence, this method formalizes and identifies the relationship between two or more variables. The linearity assumption is very strong, and therefore regression methods with more complex model functions have been developed: Non-linear Regression, Polynomial Regression, Logistic Regression with a sigmoid function, Poisson Regression, and many others. As one of the oldest approaches, LR has been widely used in many fields, including medicine [15]. This approach is mainly used when a relationship between variables is strongly assumed and the value of one (unknown) variable is to be deduced from the values of the other (known) ones. A recent example of the use of LR in medicine concerns the prediction of the evolution of systemic diseases starting from clinical evidence [31, 53]; in the genomic field, for example, it can be useful for estimating gene expression patterns in particular biological conditions [54].

3.3.1.2 Support Vector Machine

Support Vector Machines (SVMs) are supervised learning methods for binary classification. SVMs represent the data as points in space and build a hyperplane, with as wide a margin as possible, positioned as a separator between the two classification categories. SVMs perform a linear classification, but it is also possible to perform a nonlinear classification using an adequate kernel, projecting the data into a higher-dimensional space. This algorithm is used in many classification and regression problems, and in the medical field it is often used for signal separation or for clinical discrimination starting from well-specified characteristics [55, 56]. A very interesting and promising use concerns the early diagnosis [20] or the classification of some types of cancer starting from genomic data [19, 57].
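A minimal sketch of both methods with scikit-learn follows; the toy data and relationships below are illustrative assumptions, not clinical examples:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC

# Linear regression: recover y ≈ 2x + 1 from noisy synthetic samples.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))
y = 2 * x.ravel() + 1 + rng.normal(scale=0.5, size=100)
reg = LinearRegression().fit(x, y)
print("slope:", reg.coef_[0], "intercept:", reg.intercept_)

# SVM: binary classification with an RBF kernel (the "adequate kernel"
# that implicitly projects data into a higher-dimensional space).
X = rng.normal(size=(200, 2))
labels = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)  # nonlinearly separable
clf = SVC(kernel="rbf").fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```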


3.3.1.3 Random Decision Forest

Random Decision Forests (RDFs) were first proposed by Tin Kam Ho in 1995 [58]. An RDF is a learning method based on training many Decision Trees (DTs), whose outputs are then aggregated into a decision strategy. The various Decision Trees are based on the observation of certain randomly selected characteristics of the data. Individual DTs are often unstable methods but have the great advantage of being easily interpretable. RDFs can be used for both regression and classification. They are mainly used in problems for which there is not yet a precise idea of the weight of the data characteristics or of the relationships between them. In the medical field, RDF is a widely used method for relating clinical features and pathologies [22, 59–62]. In a recent work by Wang et al. [63], RDFs were used to detect the factors that most impact medical expenses for diabetic individuals in the USA, while Hsich [12] studied the most critical risk factors for survival in patients with systolic heart failure through Random Survival Forests (RSFs). RDF models have also been used for the analysis of genomic data [64].

3.3.1.4 Extreme Gradient Boosting

Extreme Gradient Boosting is a supervised Machine Learning technique for regression and classification problems that aggregates an ensemble of weak individual models to obtain a more accurate final model. This method is applied to multicollinearity problems in which there are high correlations between the variables. It helps a lot in improving the predictive accuracy of the model and is often used in risk assessment problems. In the medical field, it has been used in many applications; some examples are the evaluation of the outcomes of clinical treatments [65] and the study of systemic diseases that depend on multifactorial conditions that are difficult to interpret [66].

3.3.1.5 Naive Bayes

Bayes classifiers are ML methods that use Bayes' theorem for the classification process. These classifiers are very fast and, despite their simplicity, are efficient at many complex tasks, even with small training datasets.


They are used to calculate the conditional probability of an event based on the information available about other related events. A disadvantage of such classifiers is that they require knowledge of all the data of the problem, especially the simple and conditional probabilities (information that is difficult to obtain). They also assume the independence of the input characteristics and therefore provide a simple (naive) approximation of the problem. In the medical field, they have often been used in classification problems [67] or in feature selection [68]. Silla et al. [69] used an extension of the Naive Bayes approach in the context of proteomics, for the hierarchical classification of protein function, while Sandberg et al. [70] used a naive Bayes classifier for the analysis of complete bacterial genome sequences, capturing highly specific genomic signatures.

3.3.2 Unsupervised Machine Learning

In contrast with supervised learning, unsupervised learning models process unlabeled data to uncover the underlying data structure. In unsupervised learning, patterns are found by the algorithms without any input from the user. Unsupervised techniques are thus used to find undefined patterns or clusters of data points which are "closer" or more similar to each other. A visual illustration of an unsupervised dimension reduction technique is given in Fig. 3.5. In this figure, the raw data (represented by various shapes in the left panel) are presented to the algorithm, which then groups the data into clusters of similar data points (represented in the right panel). Note that data that do not have sufficient commonality with the clustered data are typically excluded, thereby reducing the number of features within the dataset. Indeed, these techniques are often referred to as dimension reduction techniques. The ability of unsupervised methods to discover similarities and differences in information makes them the ideal solution for exploratory data analysis. The output is highly dependent on the algorithm and hyperparameters selected. Hyperparameters, also called tuning parameters, are values used to control the behaviour of the ML algorithm (e.g., the number of clusters, distance or density thresholds, or the type of linkage between clusters).

Fig. 3.5 How unsupervised machine learning algorithms work. They use a more self-contained approach, in which a computer learns to identify patterns without any guidance, inputting only data that are unlabeled and for which no specific output has been defined. In practice, this type of algorithm will learn to divide the data into different clusters based on the characteristics that unite or discriminate them the most

Algorithms exist to detect clusters based on the spatial distance between data points, space or subspace density, network connectivity between data points, etc. By compressing the information in a dataset into fewer features, or dimensions, issues such as multiple collinearity between data or the high computational cost of the algorithm may be avoided. Unsupervised approaches also share many similarities with statistical techniques that will be familiar to medical researchers, as they make use of algorithms similar to those used for clustering and dimension reduction in traditional statistics. Those familiar with Principal Component Analysis, for instance, will already be familiar with many of the techniques used in unsupervised learning. Here, the most prominent and common methods used in Unsupervised Machine Learning are reported: k-Nearest Neighbours, Principal Component Analysis, and k-Means Clustering.


3.3.2.1 k-Nearest Neighbours

k-Nearest Neighbours is an instance-based learning algorithm used for pattern recognition in classification or regression. The algorithm uses similarity criteria between elements that are close to each other: in essence, the closest neighbours contribute more to the attribution of characteristics than the distant ones. The parameter k represents the number of neighbours that will contribute to the decision in the feature space. This algorithm is often used in problems concerning the recognition of similarity patterns aimed at classification; it is a fast and efficient algorithm, but it has the disadvantage that the precision of its predictions is strongly dependent on the quality of the data. In the medical field, it has often been used for the analysis of hidden patterns in very large amounts of data from clinical repositories [71] and in the genomic field [72].

3.3.2.2 Principal Component Analysis

Principal Component Analysis (PCA, also called the Karhunen–Loève transform) is a statistical procedure for reducing the dimensionality of the space of variables. PCA consists of a linear transformation of the variables that projects the original ones into a new Cartesian system in which the new variables try to transfer most of the significance (variance) of the old variables onto a lower-dimensional subspace, thus obtaining a dimensional reduction without losing too much information. One major limitation of this method is that it can only capture linear correlations between variables. To overcome this disadvantage, sparse PCA and nonlinear PCA have recently been introduced. PCA is widely used especially in the fields of medicine and psychology, where scientists work with datasets made up of numerous variables [26, 73, 74].

3.3.2.3 k-Means Clustering

k-Means Clustering is a vector quantization method for partitioning input data into k clusters. The goal of the algorithm is to minimize the total intra-group variance; each group is identified by a centroid, or mid-point. At each step of the algorithm, the input points are assigned to the group with the closest centroid. The centroids are recalculated at each step until the algorithm converges and the centroids are stable.


k-Means is both a simple and an efficient algorithm for clustering problems, but it has the drawbacks of being very sensitive to outliers, which can significantly shift the centroids, and of requiring the number of clusters to be chosen a priori. In medicine, it is mainly used in situations where a lot of unlabelled data are available [75–77].
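A minimal sketch combining the two techniques with scikit-learn: PCA first reduces the dimensionality, and k-Means then clusters the reduced data. The Iris dataset is used purely as a stand-in for unlabelled measurements:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X = load_iris().data                        # 4 measurements per sample; labels unused
X2 = PCA(n_components=2).fit_transform(X)   # project onto 2 principal components

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X2)
print("cluster sizes:", [int((km.labels_ == k).sum()) for k in range(3)])
print("centroids:\n", km.cluster_centers_)
```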

3.3.3 Artificial Neural Networks

Artificial Neural Networks (ANNs) deserve a very long description, which is beyond the scope of this chapter. We will only say that they are computational learning models made up of artificial "neurons," inspired by a simplification of biological neural networks. Such models consist of layers of artificial neurons linked by computational connections. They are adaptive systems, which change their structure based on the information they process during the learning phase. The layers of neurons can also be very deep (hence the term Deep Learning). There are several ANN models, which can be trained in a supervised or unsupervised manner. They are used to create very complex learning algorithms, with very high abstraction capabilities, which are therefore difficult to interpret. In the medical field, they have many applications, especially in areas where large amounts of data are available (Big Data) [78–81]. Among ANNs, Convolutional Neural Networks (CNNs) deserve particular attention: their pattern of connectivity between neurons is inspired by the organization of the animal visual cortex, and for this reason they are particularly suitable for processing images. They are widely used for the analysis of medical images for the detection, segmentation, and classification of anomalies or lesions [14, 82, 83].
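As an illustration only, a tiny CNN can be written in a few lines of PyTorch; the input size, channel counts, and two-class output below are arbitrary assumptions, not a recommended medical imaging architecture:

```python
import torch
import torch.nn as nn

# A minimal CNN for single-channel 64x64 images and two output classes.
class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # After two 2x poolings, a 64x64 input becomes 16 channels of 16x16.
        self.classifier = nn.Linear(16 * 16 * 16, 2)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = TinyCNN()
logits = model(torch.randn(4, 1, 64, 64))  # batch of 4 dummy images
print(logits.shape)                        # torch.Size([4, 2])
```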

3.3.4 Reinforcement Learning

Reinforcement learning is a machine learning technique in which a computer (the agent) keeps learning continuously to perform a task through repeated trial-and-error interactions with an interactive environment. In other words, the agent is self-trained on reward and punishment mechanisms (see Fig. 3.6).


Fig. 3.6 Basic diagram of Reinforcement Learning

This learning approach allows the agent to make a series of decisions that maximize a reward metric for the activity, without being explicitly programmed to do so and without human intervention.
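A minimal sketch of tabular Q-learning on a toy five-state environment illustrates the reward-driven trial-and-error loop; this is entirely illustrative, as real medical RL applications involve far richer states and rewards:

```python
import numpy as np

# Toy corridor: the agent starts at state 0 and earns reward +1 on reaching
# state 4; actions are move left (0) or right (1).
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

for _ in range(500):                        # episodes of trial and error
    s = 0
    while s != 4:
        # epsilon-greedy action choice: mostly exploit, sometimes explore
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else min(4, s + 1)
        r = 1.0 if s_next == 4 else 0.0     # reward signal from the environment
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q)  # the learned values favour "right" in every state
```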

3.4 Issues and Challenges

3.4.1 Data Management

Data used in ML applications, derived from medical protocols and experimental trials, may be incomplete and may contain errors, biases, and artefacts. Moreover, some data points may be missing for some of the samples. In this scenario, data imputation, denoising, and integration should be part of the design of ML algorithms applied to medicine. We will not go into detail, but it is important to know that the performance of AI algorithms is strongly dependent on the quality of the dataset, and that in the case of unbalanced, incomplete, noisy, or biased datasets, there are many solutions that can be applied to improve datasets and performance [84–88].

3.4.2 Machine Learning Model Evaluation Metrics

Evaluating the developed machine learning model is an essential part of any project. The most widely used evaluation metrics are listed below:


• Classification Accuracy: the sum of all correct predictions divided by the total number of input samples. Nevertheless, it works well only for balanced datasets, i.e., when there are equal numbers of samples belonging to each class. In the case of unbalanced classes, it can be misleading and give the false impression of high accuracy. For instance, if we have 98% samples of class "A" and 2% samples of class "B," the model can easily reach 98% training accuracy by simply predicting class A for every training sample. But when tested on a test set with 60% samples of class A and 40% samples of class B, the same model will only reach 60% accuracy. This issue becomes a real problem when the cost of misclassifying the minority class is very high: in the case of serious pathologies, the cost of not diagnosing a sick person's disease is much higher than the cost of further testing a healthy person.
• Logarithmic Loss (or Log Loss): it penalizes false classifications and works well for multi-class classification. A Log Loss nearer to 0 indicates higher accuracy; in general, minimizing Log Loss gives higher classifier accuracy.
• Area Under Curve (AUC): one of the most widely used evaluation metrics, especially for binary classification problems. The AUC of a classifier indicates the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative example. To understand how it is computed, let us introduce (i) the True Positive Rate (TPR, or Sensitivity), which corresponds to the proportion of positive data samples that are correctly classified as positive, with respect to all positive data samples, and (ii) the True Negative Rate (TNR, or Specificity), which corresponds to the proportion of negative data samples that are correctly classified as negative, with respect to all negative data samples. Sensitivity and (1 − Specificity) are plotted at varying threshold values in the range [0,1], yielding the Receiver Operating Characteristic (ROC) curve. The AUC is the area under the ROC curve, and the higher its value, the better the performance of the ML model.


• Mean Absolute Error: the average of the absolute difference between the correct outputs and the predicted outputs. It gives us a measure of how far the predictions are from the actual values. Nevertheless, it does not give any indication of whether we are under-predicting or over-predicting the data.
• Mean Squared Error (MSE): quite similar to the Mean Absolute Error, except that it takes the average of the square of the difference between the correct outputs and the predicted outputs. As the square of the error is calculated, the effect of larger errors becomes more pronounced than that of smaller errors, and hence the model can focus more on larger errors.
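The sketch below computes these metrics with scikit-learn on small hand-made arrays; the numbers are arbitrary illustrations, not real results:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, log_loss, roc_auc_score,
                             mean_absolute_error, mean_squared_error)

# Classification metrics: true labels, predicted probabilities of class 1,
# and hard predictions obtained by thresholding at 0.5.
y_true = np.array([0, 0, 1, 1, 1])
y_prob = np.array([0.2, 0.6, 0.7, 0.8, 0.3])
y_pred = (y_prob >= 0.5).astype(int)

print("accuracy:", accuracy_score(y_true, y_pred))
print("log loss:", log_loss(y_true, y_prob))
print("AUC:", roc_auc_score(y_true, y_prob))

# Regression errors on continuous outputs.
t = np.array([3.0, 5.0, 2.5])
p = np.array([2.5, 5.0, 4.0])
print("MAE:", mean_absolute_error(t, p), "MSE:", mean_squared_error(t, p))
```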

3.4.3 Explainability, Interpretability, and Ethical and Legal Issues

Both the interpretability and the explainability of ML algorithms go in the direction of strengthening trust in them [89, 90]. Interpretability concerns the way in which the ML model reaches its conclusions and has the aim of verifying that the accuracy of the model derives from a correct representation of the problem and not from artefacts present in the training data, while explainability seeks to explain the reasons why an algorithm made one decision rather than another. The concept of explainability is receiving more and more attention from a legal point of view, in accordance with the right of every individual to receive detailed explanations regarding the decisions that impact their life. In general, explainability and interpretability are readily achievable with simple classification algorithms such as decision trees or linear regressors, but the relationships between data characteristics are not always linear, and consequently these approaches may not be suitable for many tasks. Most ML algorithms are too complex to be understood directly, so it is necessary to adopt post hoc analysis [91]. In some cases, the explanation is achieved by deriving a transparent model that mimics the original one, although such an approach is not always possible. It is often necessary to work hard to obtain explanations, through empirical approaches, recursive attempts, and series of examples.


Applications of ML in the medical field are often directed toward diagnosis and therapy, and, as with doctors, they can make mistakes, generate delays, and sometimes lead toward wrong therapies, running into major legal and economic problems. For this reason, the use of ML in the medical field is still limited to supporting diagnosis, providing for the review of outputs by human experts. Another very important aspect of the application of ML in medicine concerns the equity of data. By their nature, and certainly not by intention, artificial intelligence algorithms can be highly discriminatory against minorities, generating deep ethical issues [92, 93]. Given their heavy dependence on the available data, it is clear that their performance will be very good for people from well-represented populations and very bad, misleading, and even dangerously wrong for people from underrepresented populations [94]. With most medical data repositories coming from hospitals in rich, industrialized countries, developing countries risk finding themselves completely shut out of new medical advances if they rely heavily on ML. This is certainly a problem that needs attention and prompt solutions.

3.4.4 Perspectives in Personalized Medicine

The synergy between artificial intelligence and precision medicine is revolutionizing the concepts of diagnosis, prognosis, and assistance [95, 96]. Conventional symptom-based treatment of patients is slowly giving way to more holistic approaches in which aggregate data and biological indicators of each patient are combined with specific observations and general patterns inferred from artificial intelligence approaches on large numbers of patients. Genetics, genomics, and precision medicine, combined with machine learning and deep learning algorithms, make it possible to generate a personalized therapy even for individuals with less common therapeutic responses or with particular medical needs [97–99].


In general, the shared adoption of the EHR (Electronic Health Record) format for storing clinical data has allowed for the systematic collection of a great deal of information on the health of individuals in digital form. This format has greatly facilitated the use of ML tools in the medical field, thanks to the uniformity of the data and the ease of retrieval and use. Other projects born with the aim of facilitating the application of ML in the medical field are the health databases containing data on millions of individuals, such as the All of Us Research Program, the Human Genome Project, the UK Biobank, the IARC Biobank, and the European Biobank. To conclude our contribution on the innovative, indeed revolutionary, perspectives of the application of ML in medicine, we would like to mention Digital Twins: models that mimic the biological background of a patient as closely as possible, making it possible to test drugs, therapies, and treatments while maximizing the results and minimizing the risks to the patient's health [100].

3.5 Conclusions

The introduction of Artificial Intelligence has shaken up all fields of research, medicine necessarily included. It has changed our perception of diseases and treatments and our relationship with doctors; it has changed procedures; and it has opened many doors while bringing to the table issues and problems that we had not really thought about yet. We consider ML a huge opportunity to improve more or less everything we can operate on, but it is a tool to understand, to handle with care, and to use with attention. In this chapter, we have provided some basic indications to navigate the ocean of ML, especially for non-experts. We have described the most used methods, giving guidance on when to use them, and we have provided an extensive bibliography for readers who wish to go further. We have tried not so much to provide answers, but rather to help readers understand the right questions to ask when using ML in the medical field, which is already a very important starting point.


USEFUL GLOSSARY

- Accuracy: Measure of the algorithm's ability to give correct predictions.
- Algorithm: Any systematic calculation scheme or procedure.
- Classification: Learning process in which the data are divided into two or more classes and the system assigns one or more classes among the available ones to an input.
- Clustering: Learning process in which a set of data is divided into groups that are not known a priori.
- Features: Interesting parts, qualities, or characteristics of something.
- Layer: Collection of nodes operating together.
- Model: Formal representation of knowledge related to a phenomenon.
- Normalisation: Process of feature transformation to obtain a similar scale.
- Neural Networks: Computational model made of artificial neurons, vaguely inspired by a simplification of a biological neural network.
- Node: Computational unit (also called artificial neuron) which receives one or more inputs and composes them to produce an output.
- Overfitting: Excessive reliance of the model on training data, leading to inability to generalise and evaluate other data well.
- Pre-processing: Adjusting data before it is used in order to ensure or improve performance in the data mining process.
- Regression: Learning process similar to classification, with the difference that the output has a continuous domain and not a discrete one.
- Training: The process of creating a model from the training data. The data is fed into the training algorithm, which learns a representation for the problem and produces a model. Also called "learning".
- Training Set: Set of data used as input during the learning process to fit the parameters.
- Test Set: Set of data, independent of the training set, used only to assess the performance of a fully specified classifier or regressor.
- Validation Set: Set of data used to tune the parameters and to assess the performance of a classifier or regressor. It is sometimes also called the development set (dev set).
- Weights: Parameters within a neural network calibrating the transformation of input data.

References

1. Wartman S, Combs C. Medical education must move from the information age to the age of artificial intelligence. Acad Med. 2018;93:1107–9. 2. Obermeyer Z, Lee T. Lost in thought – the limits of the human mind and the future of medicine. N Engl J Med. 2017;377(13):1209–11. 3. Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, Li B, Madabhushi A, Shah P, Spitzer M, Zhao S. Applications of machine learning in drug discovery and development. Nat Rev Drug Discov. 2019;18:463–77. 4. Valletta JJ, Torney C, Kings M, Thornton A, Madden J. Applications of machine learning in animal behaviour studies. Anim Behav. 2017;124:203–20. 5. Recknagel F. Applications of machine learning to ecological modelling. Ecol Modell. 2001;146:303–310. 6. Garg A, Mago V. Role of machine learning in medical research: a survey. Comput Sci Rev. 2021;40:100370. 7. Rajkomar A, Dean J, Kohane I. Machine learning in medicine. N Engl J Med. 2019;380:1347–1358.


8. Erickson B, Korfiatis P, Akkus Z, Kline T. Machine learning for medical imaging. Radiographics Rev Publ Radiol Soc North Am Inc. 2017;37(2):505–15. 9. Poplin R, Varadarajan A, Blumer K, Liu Y, McConnell M, Corrado G, Peng L, Webster D. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat Biomed Eng. 2018;2:158–64. 10. Rumsfeld J, Joynt K, Maddox T. Big data analytics to improve cardiovascular care: promise and challenges. Nat Rev Cardiol. 2016;13:350–9. 11. Ramalingam V, Dandapath A, Raja M. Heart disease prediction using machine learning techniques: a survey. Int J Eng Technol. 2018;7:684. 12. Hsich E, Gorodeski E, Blackstone E, Ishwaran H, Lauer M. Identifying important risk factors for survival in patient with systolic heart failure using random survival forests. Circul Cardiovas Qual Outcomes. 2011;4:39–45. 13. Zhou M, Scott J, Chaudhury B, Hall L, Goldgof D, Yeom K, Iv M, Ou Y, Kalpathy-Cramer J, Napel S, Gillies R, Gevaert O, Gatenby R. Radiomics in brain tumor: image assessment, quantitative feature descriptors, and machine-learning approaches. Am J Neuroradiol. 2018;39:208–16. 14. Setio A, Ciompi F, Litjens G, Gerke P, Jacobs C, Riel S, Wille M, Naqibullah M, Sánchez C, Ginneken B. Pulmonary nodule detection in ct images: false positive reduction using multi-view convolutional networks. IEEE Trans Med Imag. 2016;35:1160–9. 15. Sidey-Gibbons JAM, Sidey-Gibbons CJ. Machine learning in medicine: a practical introduction. BMC Med Res Methodol. 2019;19(1):64. PMID: 30890124; PMCID: PMC6425557. https://doi.org/10.1186/s12874-0190681-4. 16. Koh DM, Papanikolaou N, Bick U, Illing R, Kahn CE Jr, Kalpathi-Cramer J, Matos C, Martí-Bonmatí L, Miles A, Mun SK, Napel S, Rockall A, Sala E, Strickland N, Prior F. Artificial intelligence and machine learning in cancer imaging. Commun Med (Lond). 2022;2:133. PMID: 36310650; PMCID: PMC9613681. https://doi.org/10.1038/s43856-022-00199-0. 17. Zerouaoui H, Idri A. Reviewing machine learning and image processing based decision-making systems for breast cancer imaging. J Med Syst. 2021;45:1–20. 18. Bi W, Hosny A, Schabath M, Giger M, Birkbak N, Mehrtash A, Allison T, Arnaout O, Abbosh C, Dunn I, Mak R, Tamimi R, Tempany C, Swanton C, Hoffmann U, Schwartz L, Gillies R, Huang R, Aerts H. Artificial intelligence in cancer imaging: clinical challenges and applications. Ca. 2019;69:127–57. 19. Liao C, Li S. A support vector machine ensemble for cancer classification using gene expression data. In: International symposium on bioinformatics research and applications; (2007). 20. Zhang F, Kaufman H, Deng Y, Drabier R. Recursive SVM biomarker selection for early detection of breast cancer in peripheral blood. BMC Med Genom. 2013;6:S4–S4.


21. Kircher M, Witten D, Jain P, O’Roak B, Cooper G, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46:310–5. 22. Vlahou A, Schorge J, Gregory B, Coleman R. Diagnosis of ovarian cancer using decision tree classification of mass spectral data. J Biomed Biotechnol. 2003;2003:308–14. 23. Hsu Y, Huang P, Chen D. Sparse principal component analysis in cancer research. Transl Cancer Res. 2014;3(3):182–90. 24. Chen L, Li H, Xie L, Zuo Z, Tian L, Liu C and Guo X. Editorial: big data and machine learning in cancer genomics. Front. Genet. 2021;12:749584. https://doi.org/10.3389/fgene.2021.749584. 25. Pan XM. Multiple linear regression for protein secondary structure prediction. Proteins. 2001;43(3):256–9. PMID: 11288175. https://doi. org/10.1002/prot.1036. 26. Taguchi Y, Okamoto A. Principal component analysis for bacterial proteomic analysis. In: 2011 IEEE international conference on bioinformatics and biomedicine workshops (BIBMW); 2011. p. 961–3. 27. Cao C, Liu F, Tan H, Song D, Shu W, Li W, Zhou Y, Bo X, Xie Z. Deep learning and its applications in biomedicine. Genomics Proteomics Bioinformatics. 2018;16:17–32. 28. Asgari E, Mofrad MRK. Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS ONE 2015;10(11): e0141287. https://doi.org/10.1371/journal.pone.0141287. 29. Mathlin J, Le Pera L, Colombo T. A census and categorization method of epitranscriptomic marks. Int J Mol Sci. 2020;21(13):4684. PMID: 32630140; PMCID: PMC7370119. https://doi.org/10.3390/ ijms21134684. 30. Caudai C, Galizia A, Geraci F, Pera L, Morea V, Salerno E, Via A, Colombo T. AI applications in functional genomics. Comput Struct Biotechnol J. 2021;19:5762–90. 31. Coelewij L, Waddington KE, Robinson GA, Chocano E, McDonnell T, Farinha F, Peng J, Dönnes P, Smith E, Croca S, Bakshi J, Griffin M, Nicolaides A, Rahman A, Jury EC, Pineda-Torra I. Serum metabolomic signatures can predict subclinical atherosclerosis in patients with systemic lupus erythematosus. Arterioscler Thromb Vasc Biol. 2021;41(4):1446– 1458. PMID: 33535791; PMCID: PMC7610443. https://doi.org/10.1161/ ATVBAHA.120.315321. 32. Da-Yuan, Liang Y, Yi L, Xu Q, Kvalheim O. Uncorrelated linear discriminant analysis (ULDA): a powerful tool for exploration of metabolomics data. Chemom Intell Lab Syst. 2008;93:70–9. 33. Alakwaa F, Chaudhary K, Garmire L. Deep learning accurately predicts estrogen receptor status in breast cancer metabolomics data. J Proteome Res. 2017;17:337–47.


34. Khodayari, A., Maranas, C. A genome-scale Escherichia coli kinetic metabolic model k-ecoli457 satisfying flux data for multiple mutant strains. Nat Commun 2016;7:13806. https://doi.org/10.1038/ ncomms13806. 35. Yang H, Yu B, Ouyang P, Li X, Lai X, Zhang G, Zhang H. Machine learning-aided risk prediction for metabolic syndrome based on 3 years study. Sci Rep. 2022;12(1):2248. PMID: 35145200; PMCID: PMC8831522. https://doi.org/10.1038/s41598-022-06235-2. 36. Grinfeld J, Nangalia J, Baxter J, Wedge D, Angelopoulos N, Cantrill R, Godfrey A, Papaemmanuil E, Gundem G, Maclean C, Cook J, O’Neil L, O’meara S, Teague J, Butler A, Massie C, Williams N, Nice F, Andersen C, Hasselbalch H, Guglielmelli P, McMullin M, Vannucchi A, Harrison C, Gerstung M, Green A, Campbell P. Classification and personalized prognosis in myeloproliferative neoplasms. N Engl J Med. 2018;379:1416–30. 37. Denis F, Basch E, Septans A, Bennouna J, Urban T, Dueck A, Letellier C. Two-year survival comparing web-based symptom monitoring vs routine surveillance following treatment for lung cancer. JAMA. 2019;321:306– 7. 38. Hasnain Z, Mason J, Gill K, Miranda G, Gill IS, Kuhn P, Newton PK. Machine learning models for predicting post-cystectomy recurrence and survival in bladder cancer patients. PLoS One 2019 Feb 20;14(2):e0210976. PMID: 30785915; PMCID: PMC6382101. https:// doi.org/10.1371/journal.pone.0210976. 39. Nie D, Zhang H, Adeli E, Liu L, Shen D. 3D deep learning for multimodal imaging-guided survival time prediction of brain tumor patients. Med Image Comput Comput Assist Interv. 2016;9901:212–220. PMID: 28149967; PMCID: PMC5278791. https://doi.org/10.1007/978-3-31946723-8_25. 40. Meiring C, Dixit A, Harris S, MacCallum NS, Brealey DA, Watkinson PJ, Jones A, Ashworth S, Beale R, Brett S, Singer M, Ercole A. Optimal intensive care outcome prediction over time using machine learning. PLoS ONE 2018;13(11):e0206862. https://doi.org/10.1371/journal.pone. 0206862. 41. Chen H, Engkvist O, Wang Y, Olivecrona M, Blaschke T. The rise of deep learning in drug discovery. Drug Discovery Today. 2018;23(6):1241–50. 42. Gupta S, Chaudhary K, Kumar R, Gautam A, Nanda JS, Dhanda SK, Brahmachari SK, Raghava GP. Prioritization of anticancer drugs against a cancer using genomic features of cancer cells: A step towards personalized medicine. Sci Rep. 2016;6:23857. PMID: 27030518; PMCID: PMC4814902. https://doi.org/10.1038/srep23857. 43. Hejase H, Chan, C. Improving drug sensitivity prediction using different types of data. CPT: Pharmacometrics Syst Pharmacol. 2015;4. 44. Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, Li B, Madabhushi A, Shah P, Spitzer M, Zhao S. Applications of machine


learning in drug discovery and development. Nat Rev Drug Discov. 2019;18:463–77. 45. Frankish K, Ramsey W. The Cambridge Handbook of Artificial Intelligence. Cambridge: Cambridge University Press; 2014. 46. Smiti A. When machine learning meets medical world: current status and future challenges. Comput Sci Rev. 2020;37:100280. https://doi.org/10. 1016/j.cosrev.2020.100280. 47. Garg A, Mago V. Role of machine learning in medical research: a survey. Comput Sci Rev. 2021;40:100370. https://doi.org/10.1016/j.cosrev.2021. 100370. 48. Esteva A, Kuprel B, Novoa R, Ko J, Swetter S, Blau H, Thrun S. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542:115–8. 49. Hawkins J, Brownstein J, Tuli G, Runels T, Broecker K, Nsoesie E, McIver D, Rozenblum R, Wright A, Bourgeois F, Greaves F. Measuring patient-perceived quality of care in US hospitals using Twitter. BMJ Qual Saf. 2015;25:404–13. 50. Mangasarian O, Street W, Wolberg W. Breast cancer diagnosis and prognosis via linear programming. Oper Res. 1995;43:570–7. 51. Colantonio S, Salvati A, Caudai C, Bonino F, De Rosa L, Pascali MA, Germanese D, Brunetto MR, Faita F. A deep learning approach for hepatic steatosis estimation from ultrasound imaging. In: Proceedings of ICCCI 2021 – 13th international conference on computational collective intelligence, Rhodes, Greece; 2021. p. 703–4. 52. Ali N, Srivastava D, Tiwari A, Pandey A, Pandey AK, Sahu A. Predicting life expectancy of hepatitis B patients using machine learning. In: IEEE international conference on distributed computing and electrical circuits and electronics (ICDCECE); 2022. 53. Simos N, Manikis G, Papadaki E, Kavroulakis E, Bertsias G, Marias K. Machine learning classification of neuropsychiatric systemic lupus erythematosus patients using resting-state fMRI functional connectivity. In: 2019 IEEE international conference on imaging systems and techniques (IST); 2019. p. 1–6. 54. Liu S, Lu M, Li H, Zuo Y. Prediction of gene expression patterns with generalized linear regression nodel. Front. Genet. 2019;10:120. https:// doi.org/10.3389/fgene.2019.00120. 55. Taylor RA, Moore CL, Cheung KH, Brandt C. Predicting urinary tract infections in the emergency department with machine learning. PLoS One. 2018;13(3):e0194085. PMID: 29513742; PMCID: PMC5841824. https://doi.org/10.1371/journal.pone.0194085. 56. Leha A, Hellenkamp K, Unsöld B, Mushemi-Blake S, Shah AM, Hasenfuß G, Seidler T. A machine learning approach for the prediction of pulmonary hypertension. PLoS One. 2019;14(10):e0224453. PMID: 31652290; PMCID: PMC6814224. https://doi.org/10.1371/journal.pone. 0224453.


57. Huang S, Cai N, Pacheco P, Narrandes S, Wang Y, Xu W. Applications of support vector machine (SVM) learning in cancer genomics. Cancer Genom Proteom. 2018;15(1):41–51. 58. Ho T. Random decision forests. In: Proceedings of 3rd international conference on document analysis and recognition; 1995. vol. 1. p. 278– 82. 59. Zhu M, Xia J, Jin X, Yan M, Cai G, Yan J, Ning G. Class weights random forest algorithm for processing class imbalanced medical data. IEEE Access. 2018;6:4641–52. 60. Martin-Gutierrez L, Peng J, Thompson NL, Robinson GA, Naja M, Peckham H, Wu W, J’bari H, Ahwireng N, Waddington KE, Bradford CM, Varnier G, Gandhi A, Radmore R, Gupta V, Isenberg DA, Jury EC, Ciurtin C. Stratification of patients with Sjögren’s syndrome and patients with systemic lupus erythematosus according to two shared immune cell signatures, with potential therapeutic implications. Arthritis & Rheumatology 2021;73(9):1626–37. https://doi.org/10.1002/art.41708. 61. Seccia R, Gammelli D, Dominici F, Romano S, Landi AC, Salvetti M, Tacchella A, Zaccaria A, Crisanti A, Grassi F, Palagi L. Considering patient clinical history impacts performance of machine learning models in predicting course of multiple sclerosis. PLoS ONE 2020;15(3): e0230219. https://doi.org/10.1371/journal.pone.0230219. 62. Baumgartner C, Bóhm C, Baumgartner D. Modelling of classification rules on metabolic patterns including machine learning and expert knowledge. J Biomed Inf. 2005;38(2):89–98. 63. Wang J, Shi L. Prediction of medical expenditures of diagnosed diabetics and the assessment of its related factors using a random forest model, MEPS 2000–2015. Int J Qual Health Care. 2020;32(2):99–112. PMID: 32159759. https://doi.org/10.1093/intqhc/mzz135. 64. Chen X, Ishwaran H. Random forests for genomic data analysis. Genomics. 2012;99(6):323–9. 65. Mo X, Chen X, Ieong C, Zhang S, Li H, Li J, Lin G, Sun G, He F, He Y, Xie Y, Zeng P, Chen Y, Liang H, Zeng H. Early prediction of clinical response to etanercept treatment in juvenile idiopathic arthritis using machine learning. Front Pharmacol. 2020;11:1164. PMID: 32848772; PMCID: PMC7411125. https://doi.org/10.3389/fphar.2020.01164. 66. Murray S, Avati A, Schmajuk G, Yazdany J. Automated and flexible identification of complex disease: building a model for systemic lupus erythematosus using noisy labeling. J Am Med Inf Assoc JAMIA. 2019;26(1):61–5. 67. D’souza K, Ansari Z. Big data science in building medical data classifier using Naïve Bayes model. In: 2018 IEEE international conference on cloud computing in emerging markets (CCEM); 2018. p. 76–80. 68. Degroeve S, Baets B, Peer Y, Rouzé P. Feature subset selection for splice site prediction. Bioinformatics. 2002;18(Suppl 2):S75–83.


69. Silla C, Freitas A. A global-model naive bayes approach to the hierarchical prediction of protein functions. In: 2009 Ninth IEEE international conference on data mining; 2009. p. 992–997. 70. Sandberg R, Winberg G, Brändén C, Kaske A, Ernberg I, Cöster J. Capturing whole-genome characteristics in short sequences using a naïve Bayesian classifier. Gen Res. 2001;11(8):1404–9. 71. Khamis H. Application of k-nearest neighbour classification in medical data in the context of Kenia. Digit Repositry Unimib. 2014. 72. Parry R, Jones W, Stokes T, Phan J, Moffitt R, Fang H, Shi L, Oberthuer A, Fischer M, Tong W, Wang M. k-Nearest neighbor models for microarray gene expression analysis and clinical outcome prediction. Pharmacogenomics J. 2010;10:292–309. 73. Alexe G, Dalgin G, Ganesan S, DeLisi C, Bhanot G. Analysis of breast cancer progression using principal component analysis and clustering. J Biosci. 2007;32:1027–39. 74. Maisuradze G, Liwo A, Scheraga H. Principal component analysis for protein folding dynamics. J Mol Biol. 2009;385(1):312–29. 75. Le T. Fuzzy C-means clustering interval type-2 cerebellar model articulation neural network for medical data classification. IEEE Access. 2019;7:20967–73. 76. Khanmohammadi S, Adibeig N, Shanehbandy S. An improved overlapping k-means clustering method for medical applications. Expert Syst Appl. 2017;67:12–8. 77. Handhayani T, Hiryanto L. Intelligent kernel K-means for clustering gene expression. Procedia Comput Sci. 2015;59:171–7. 78. Greenspan H, Ginneken B, Summers R. Guest editorial deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans Med Imaging. 2016;35:1153–9. 79. Litjens G, Kooi T, Bejnordi B, Setio A, Ciompi F, Ghafoorian M, Laak J, Ginneken B, Sánchez C. A survey on deep learning in medical image analysis. Med Image Anal. 2017;42:60–88. 80. Gao X, Lin S, Wong T. Automatic feature learning to grade nuclear cataracts based on deep learning. IEEE Trans Biomed Eng. 2015;62:2693–701. 81. Sundaram L, Gao H, Padigepati S, McRae J, Li Y, Kosmicki J, Fritzilas N, Hakenberg J, Dutta A, Shon J, Xu J, Batzoglou S, Li X, Farh K. Predicting the clinical impact of human mutation with deep neural networks. Nat Genet. 2018;50:1161–70. 82. Frid-Adar M, Diamant I, Klang E, Amitai M, Goldberger J, Greenspan H. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing. 2018;321:321–31. 83. Kamnitsas K, Ledig C, Newcombe V, Simpson J, Kane A, Menon D, Rueckert D, Glocker B. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med Image Anal. 2017;36:61–78.


84. Jain A, Patel H, Nagalapatti L, Gupta N, Mehta S, Guttula S, Mujumdar S, Afzal S, Mittal R, Munigala V. Overview and importance of data quality for machine learning tasks. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining. 2020. 85. Dai W, Yoshigoe K, Parsley W. Improving data quality through deep learning and statistical models. ArXiv. abs/1810.07132; 2018. 86. Luca A, Ursuleanu T, Gheorghe L, Grigorovici R, Iancu S, Hlusneac M, Grigorovici A. Impact of quality, type and volume of data used by deep learning models in the analysis of medical images. Inf Med Unlocked. 2022;29:100911. https://doi.org/10.1016/j.imu.2022.100911. 87. Wang Z, Poon J, Sun S, Poon S. Attention-based multi-instance neural network for medical diagnosis from incomplete and low quality data. In: 2019 International joint conference on neural networks (IJCNN); 2019. p. 1–8. 88. Chang Y, Yan L, Chen M, Fang H, Zhong S. Two-stage convolutional neural network for medical noise removal via image decomposition. IEEE Trans Instrument Meas. 2020;69:2707–21. 89. Marcinkevics R, Vogt J. Interpretability and explainability: a machine learning zoo mini-tour. ArXiv. abs/2012.01805; 2020. 90. Samek W, Müller K. Towards explainable artificial intelligence. ArXiv. abs/1909.12072; 2019. 91. Montavon G, Samek W, Müller K. Methods for interpreting and understanding deep neural networks. ArXiv. abs/1706.07979; 2018. 92. Chen I, Pierson E, Rose S, Joshi S, Ferryman K, Ghassemi M. Ethical machine learning in health care. Ann Rev Biomed Data Sci. 2021;4:123– 44. 93. Yoon C, Torrance R, Scheinerman N. Machine learning in medicine: should the pursuit of enhanced interpretability be abandoned? J Med Ethics. 2021;48:581–5. 94. Martin A, Kanai M, Kamatani Y, Okada Y, Neale B, Daly M. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 2019;51:584–91. 95. Johnson K, Wei W, Weeraratne D, Frisse M, Misulis K, Rhee K, Zhao J, Snowdon J. Precision medicine, AI, and the future of personalized health care. Clin Transl Sci. 2020;14:86–93. 96. Quazi, S. Artificial intelligence and machine learning in precision and genomic medicine. Med Oncol 2022;39:120. https://doi.org/10.1007/ s12032-022-01711-1. 97. Xu J, Yang P, Xue S, Sharma B, Sanchez-Martin M, Wang F, Beaty K, Dehan E, Parikh B. Translating cancer genomics into precision medicine with artificial intelligence: applications, challenges and future perspectives. Hum Genet. 2019;138:109–24. 98. Grapov D, Fahrmann J, Wanichthanarak K, Khoomrung S. Rise of deep learning for genomic, proteomic, and metabolomic data integration in precision medicine. OMICS J Integrat Biol. 2018;22:630–6.


99. Hamamoto R, Komatsu M, Takasawa K, Asada K, Kaneko S. Epigenetics analysis and integrated analysis of multiomics data, including epigenetic data, using artificial intelligence in the era of precision medicine. Biomolecules. 2019;10(1):62. PMID: 31905969; PMCID: PMC7023005. https://doi.org/10.3390/biom10010062. 100. Björnsson B, Borrebaeck C, Elander N, Gasslander T, Gawel DR, Gustafsson M, Jörnsten R, Lee EJ, Li X, Lilja S, Martínez-Enguita D, Matussek A, Sandström P, Schäfer S, Stenmarker M, Sun XF, Sysoev O, Zhang H, Benson, M. Digital twins to personalize medicine. Gen Med. 2019;12(1):4. PMID: 31892363; PMCID: PMC6938608. https://doi.org/ 10.1186/s13073-019-0701-3.

4 Machine Learning Methods for Radiomics Analysis: Algorithms Made Easy

Michail E. Klontzas and Renato Cuocolo

M. E. Klontzas: University Hospital of Heraklion, Heraklion, Greece; Institute of Computer Science, Foundation for Research and Technology (FORTH), Heraklion, Greece
R. Cuocolo: Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, Italy

4.1 Introduction

Radiomics analysis represents the extraction of quantitative textural data from medical images by mathematically obtaining a series of values representing signal intensities or a variety of pixel interrelationship metrics [1]. Radiomics is the image-based counterpart of traditional omics methods that provide big data for biological systems, including genomics, proteomics, transcriptomics, and metabolomics [2]. In the case of radiomics, data can be extracted from any kind of medical image, including X-rays, ultrasound, CT, MRI, and PET, in an attempt to obtain a detailed representation of image characteristics that cannot be seen with the naked eye of a radiologist and that can be used for the quantitative characterization of tissues and the identification of novel imaging biomarkers [3]. The use of big data for the characterization of biological systems and disease states has emerged over the past couple of decades because of the shortcomings of single-variable analysis. The complexity of biological systems and the multitude of factors that dictate the imaging appearance of a normal or diseased tissue cannot be fully described by univariate or limited multivariate analyses. Omics analyses provide a global evaluation of the examined system/tissue/image region and overcome limitations attributed to factor interactions and information loss when examining a limited number of variables [4]. Radiomics provides a comprehensive analysis of image components, allowing the identification of image biomarkers that cannot be seen with the eye of radiologists but can be derived by means of mathematical transformations of the original image [1, 5, 6]. Dealing with big data necessitates the use of sophisticated algorithms that allow data curation and analysis to extract meaningful information and allow the construction of predictive models [4]. These algorithms include a series of machine learning methods that span the majority of steps required for radiomics analysis, from the segmentation of regions of interest, to data curation and selection of radiomics features, to the achievement of predictions based on the extracted data [1, 6]. The aim of this chapter is to present a basic overview of the most commonly used machine learning algorithms in radiomics manuscripts for each of the aforementioned radiomics pipeline steps (Fig. 4.1). This overview is primarily aimed at physicians aiming to pursue radiomics research or wanting to acquire a basic knowledge of the field.


Fig. 4.1 Overview of the most common machine learning methods used in radiomics analysis (created with biorender.com)

4.2 Methods for Region of Interest Segmentation

Once the image dataset has been constructed and appropriately preprocessed to increase the quality of images and reduce inherent sources of noise and bias, the tissue or lesion of interest needs to be identified in order to extract radiomics features. This process of selecting the appropriate region of interest (ROI) is called segmentation and was traditionally performed in a manual manner, where a radiologist would draw a line by hand at the border of the lesion to delineate it. This manual approach is accompanied by considerable bias and can lead to substantial errors, since many commonly encountered radiomics features are sensitive to small changes in segmentation borders [1, 6]. Other traditional segmentation techniques include thresholding and region-based and edge-based methods. These methods are sensitive to noise, may require manual interaction to define a seed point, and are sensitive to intensity inhomogeneities, which are very common in medical images. For all the aforementioned reasons, researchers are starting to use tools for automatic AI-based lesion segmentation. These methods reduce the bias associated with manual segmentation, which may depend on the meticulousness, skills, and experience of the reader. Both supervised and unsupervised methods have been used for automatic segmentation. Unsupervised methods include k-means, hard c-means, and fuzzy c-means algorithms. Supervised methods include deep learning architectures such as encoder–decoder networks (e.g., U-Net, V-Net), regional convolutional neural networks (e.g., Fast R-CNN, Faster R-CNN, Mask R-CNN), and DeepLab models. Recently, transformers have also been combined with U-Net architectures for image segmentation purposes (e.g., UNETR, TransUNet).

4.2.1 R-CNN

One of the biggest breakthroughs in object detection and segmentation was the development of R-CNN (regions with convolutional neural networks). This method receives an input image, extracts approximately 2000 region proposals, utilizes a CNN to compute features for each region proposal, and finally uses a support vector machine (SVM) classifier to assign regions to different objects [7]. Several improved variants of R-CNN have been published, including Fast R-CNN, Faster R-CNN, and Mask R-CNN, which overcome drawbacks of R-CNN such as slow training and object detection [8]. Mask R-CNN specifically creates object masks in addition to the object bounding boxes produced by other versions of the algorithm such as Faster R-CNN [9].
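
As a minimal usage sketch, the snippet below loads the pretrained Mask R-CNN shipped with torchvision. Note that these weights were trained on natural images, so any medical application would require fine-tuning on annotated scans, and the random input tensor is only a placeholder.

```python
# Hypothetical inference sketch with torchvision's Mask R-CNN
# (natural-image weights; fine-tuning would be needed for medical data).
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 512, 512)          # placeholder RGB image in [0, 1]
with torch.no_grad():
    output = model([image])[0]           # list of images in, list of dicts out
# Each output dict holds per-instance boxes, labels, scores, and masks
print(output["boxes"].shape, output["masks"].shape)
```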

4.2.2 U-Net and V-Net

U-Net is a fully convolutional neural network named after its U-shaped architecture, in which information flows down and then up through the symmetric layers of the network. U-Net is composed of a combination of symmetric encoder and decoder modules. The encoder module reduces the spatial resolution of the image while increasing the number of filters, and the decoder module performs the exact opposite operation, increasing spatial resolution while decreasing the number of filters. A characteristic of U-Net is that each decoder stage incorporates a feature map derived from the corresponding encoder stage. This design allows the U-Net to understand the input image as a whole while identifying and segmenting the objects of interest. Ultimately, the network produces a segmentation of the original image, labeling each image pixel as part of either the background or the object of interest [10]. V-Net is another type of fully convolutional neural network, similar to U-Net, which was published in 2016. V-Net performs segmentation in 3D, in contrast to U-Net, which performs segmentation in 2D, yielding an output where each voxel is labeled as background or object of interest. The original V-Net paper introduced a novel objective function for segmentation based on the maximization of the Dice coefficient [11].
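
To make the encoder–decoder idea concrete, below is a deliberately shallow U-Net-style network in PyTorch. This is a sketch with hypothetical, reduced layer sizes (the published architecture is deeper); note how each decoder stage concatenates the matching encoder feature map before convolving.

```python
# A minimal 2D U-Net-style sketch in PyTorch (reduced depth for brevity).
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU: the basic U-Net building block
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.enc1 = double_conv(1, 16)     # encoder: resolution down, filters up
        self.enc2 = double_conv(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = double_conv(32, 64)
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = double_conv(64, 32)    # 64 = 32 upsampled + 32 from skip
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = double_conv(32, 16)
        self.head = nn.Conv2d(16, n_classes, 1)  # per-pixel class scores

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        # decoder: upsample, then concatenate the matching encoder feature map
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)

logits = MiniUNet()(torch.randn(1, 1, 128, 128))  # -> shape (1, 2, 128, 128)
```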

4.2.3 DeepLab

DeepLab is a fully convolutional neural network published by Google for semantic segmentation purposes, with the ability to capture information at a variety of scales. It has a structure similar to U-Net, with convolutional and pooling layers that reduce resolution while increasing feature maps, followed by a sequence of transposed convolutional layers that increase resolution while decreasing feature maps [12]. DeepLab architectures introduced the "atrous convolution" method, which allows the network to "understand" information at various scales by effectively changing kernel sizes without increasing the number of pixels to be processed. This also offers advantages in terms of speed. A series of DeepLab versions have been published (v1, v2, v3), with the latest being v3+, a revision of v3 that includes depthwise separable convolution and extra batch normalization [12, 13]. A detailed description of these network structures falls beyond the scope of this chapter.

4.3 Methods for Exploratory Data Analysis

Exploratory data analysis is the first, unsupervised step to recognize groups/clusters among the samples of a dataset. This process is important to identify data patterns and potential data problems, such as outliers skewing the data. Exploratory data analysis starts by exploring the summary statistics of the dataset, including group member counts, mean, minimum, maximum, and standard deviation. Histograms and/or box plots are commonly used to visualize these summary statistics, as in the sketch below.
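
A minimal pandas sketch of this first pass might look as follows; the file name and column names ("group", "glcm_contrast") are hypothetical placeholders.

```python
# First-pass exploratory statistics for a radiomics feature table
# (file and column names are hypothetical).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("radiomics_features.csv")       # one row per patient/lesion
print(df["group"].value_counts())                # group member counts
print(df.describe())                             # mean, std, min, max per feature

df.hist(figsize=(12, 8))                         # histogram of each feature
df.boxplot(column="glcm_contrast", by="group")   # one feature, split by group
plt.show()
```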

4.3.1 Correlation Analysis

Correlation analysis is a part of exploratory data analysis that is of great importance for radiomics. It is a statistical method used to measure the strength and direction of the relationship between two variables. Even though it does not represent a machine learning method per se, its combination with clustering brings it to the interface of classical statistics and machine learning [14, 15]. It can be used for feature selection by identifying which features are highly correlated with the target variable and which features are highly correlated with each other. In order to select between the two types of correlation coefficient, Pearson and Spearman, one needs to know whether the data are normally distributed. For normally distributed data, Pearson correlation can be used, which measures the linear association between two continuous variables. It ranges between −1 and 1, where −1 represents a perfect negative linear correlation, 0 represents no correlation, and 1 represents a perfect positive linear correlation. Spearman correlation is the non-parametric equivalent of Pearson correlation and also ranges between −1 and 1. Analyzing correlations between variables can provide important information, such as the identification of highly correlated and redundant features, the identification of outliers, and the identification of relationships between variables. Such correlation data is usually presented on correlation heatmaps [15]. It is important to keep in mind that these coefficients measure only linear (Pearson) or monotonic (Spearman) relationships and will not capture more complex nonlinear relationships between variables.
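
The following sketch computes a pairwise correlation matrix and a simple heatmap for a randomly generated, purely illustrative feature table; Spearman is used here, with Pearson as the drop-in alternative for normally distributed data.

```python
# Pairwise feature correlations on random placeholder data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(50, 6)),
                 columns=[f"feature_{i}" for i in range(6)])

corr = X.corr(method="spearman")          # method="pearson" for normal data
plt.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
plt.colorbar(label="Spearman rho")
plt.xticks(range(len(corr)), corr.columns, rotation=90)
plt.yticks(range(len(corr)), corr.columns)
plt.show()

# Flag redundant pairs, e.g. absolute correlation above 0.9
redundant = [(a, b) for a in corr.columns for b in corr.columns
             if a < b and abs(corr.loc[a, b]) > 0.9]
print(redundant)
```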

4.3.2 Clustering

Identification of clusters in the data can be done in a supervised or unsupervised manner. The former requires labeled data, whereas the latter looks for relationships in the data disregarding any labels. Since most radiomics manuscripts reserve supervised learning for the subsequent development of classification models, unsupervised clustering is most suitable for this initial exploratory step of identifying patterns in the data. Hierarchical clustering is one of the most commonly used methods for unsupervised clustering; it creates a hierarchical representation of the clusters, where each cluster is represented as a node in a tree-like structure called a dendrogram [16]. The similarity between clusters can be measured using different metrics, such as Euclidean distance, Manhattan distance, or cosine similarity. The choice of similarity metric depends on the nature of the data and the problem. For high dimensional data such as omics data, however, Manhattan distance has been suggested as the optimal metric [17]. Dendrograms can be formed either in an agglomerative or a divisive fashion. In agglomerative clustering, each data point starts as its own cluster and clusters are iteratively merged based on their similarity, whereas in divisive clustering all data points start in the same cluster and this cluster is iteratively divided into smaller clusters based on similarity. Results can be visualized using dendrograms and associated heatmaps, which provide a visual representation of the range of radiomics features [18].
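
A short SciPy sketch of agglomerative clustering on random placeholder data, using Manhattan ("cityblock") distance as suggested above for high dimensional feature sets:

```python
# Agglomerative hierarchical clustering with a dendrogram.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 100))            # 30 samples, 100 radiomics features

Z = linkage(X, method="average", metric="cityblock")
dendrogram(Z)                             # each leaf is one sample
plt.xlabel("sample index")
plt.ylabel("cluster distance")
plt.show()
```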

4.3.3 Principal Component Analysis

Another way of unsupervised visualization of data patterns is the use of dimensionality reduction techniques, such as principal component analysis (PCA), linear discriminant analysis, and multidimensional scaling. Like other omics analyses, radiomics suffers from the "curse of dimensionality": the presence of more features than dataset samples increases the chance of encountering redundant information and correlations in high dimensional datasets, obscuring data interpretation and reducing the performance of machine learning algorithms [19]. A way to address this problem is to use dimensionality reduction techniques. This also allows the visualization of data relationships in 2D or 3D space, which is not possible otherwise. The main idea behind PCA is to project the data points onto a new set of axes, called principal components, which are orthogonal (and uncorrelated) to each other and capture the most important information in the dataset. A PCA graph typically shows the data points projected onto the first two or three principal components. The principal components are chosen such that the first principal component explains the most variation in the data, the second principal component explains the second most variation, and so on. In a two-dimensional PCA graph, the first principal component is represented on the x-axis and the second principal component on the y-axis. Each data point is then plotted as a point in this space, with the position of the point indicating the values of the data point along the principal components [20, 21].
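
A minimal scikit-learn sketch of such a projection, on random placeholder data, standardizing features first so that scale differences do not dominate the components:

```python
# Projection onto the first two principal components of standardized features.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))            # more features than samples

X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
scores = pca.fit_transform(X_std)

plt.scatter(scores[:, 0], scores[:, 1])
plt.xlabel(f"PC1 ({pca.explained_variance_ratio_[0]:.0%} of variance)")
plt.ylabel(f"PC2 ({pca.explained_variance_ratio_[1]:.0%} of variance)")
plt.show()
```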

4.4 Methods for Feature Selection

The number of features extracted by radiomics methods/software (see Chap. 2) is usually much higher than the number of data points used for the analysis. This renders model construction highly prone to overfitting. Therefore, feature reduction techniques are used to produce a smaller number of valid and reproducible radiomics features that can be used for model construction [6]. Several algorithms can be used, including regression methods (e.g., the least absolute shrinkage and selection operator, LASSO), as well as methods based on machine learning (e.g., Boruta, recursive feature elimination, maximum relevance-minimum redundancy). This section will present the machine learning based feature selection methods most commonly encountered in the radiomics literature.

4.4.1 Boruta

Boruta is an algorithm for feature selection in machine learning. It is a wrapper method, which means that it uses another algorithm (such as a random forest) to evaluate the importance of each feature. Boruta works by creating a copy of each feature in the dataset and randomly shuffling the values of these copies (referred to as "shadow features"), producing a "shadow" dataset alongside the original. The algorithm then trains a classifier (e.g., a random forest) on both the original features and the shadow features and statistically compares the importance of each original feature with that achieved by its shuffled counterpart: if the original feature performs significantly better than the corresponding shadow feature, it is considered important; if its contribution is indistinguishable from the shuffled values, it is considered unimportant. The algorithm repeats this process multiple times, recording the number of times each feature is selected as important or not important, and finally uses a threshold to decide which features are truly important [22]. One of the most important advantages of Boruta is that it accounts for the presence of correlated features and is therefore robust against overfitting. In addition, it yields a measure of feature importance, enabling better model interpretation.
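
A sketch using the open-source boruta package (BorutaPy), which wraps a scikit-learn random forest as described above; the data are random placeholders in which only the first two columns actually carry signal.

```python
# Boruta feature selection via the boruta package (BorutaPy).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # only two informative features

rf = RandomForestClassifier(n_estimators=200, max_depth=5, n_jobs=-1)
selector = BorutaPy(rf, n_estimators="auto", random_state=0)
selector.fit(X, y)                        # BorutaPy expects plain numpy arrays

print("confirmed features:", np.where(selector.support_)[0])
```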

4.4.2 Recursive Feature Elimination

Another commonly used machine learning method for feature selection is recursive feature elimination (RFE). RFE recursively removes the least important features of a dataset, training a model on the remaining features. Like Boruta, it is a wrapper method, meaning that it uses another algorithm (such as a decision tree or a linear model) to build a model for the evaluation of feature importance. RFE starts by training a model with all the dataset features and sequentially removes the features that contribute the least to the performance of the model. This process is repeated, training the model on the remaining features after each removal, and stops when either a pre-specified number of features is reached or a stopping criterion is met. RFE is computationally expensive, since it requires training a new model for each feature set, and it is affected by the selected model and by the number of features removed at each model-training round [23–25]. Both Boruta and RFE are widely used in the radiomics literature and are well-accepted methods. However, certain differences between them should be considered when selecting which of the two to use. The first major difference is that RFE does not provide a measure of feature importance, whereas Boruta provides an importance statistic. Another important difference is that RFE is more sensitive to overfitting than Boruta, since feature removal depends on performance in the training set, whereas Boruta eliminates features by comparing the performance of the model against randomized versions of the features. Moreover, the fact that RFE creates a separate model for each elimination round renders it more computationally expensive. Nonetheless, RFE is a simple and efficient method, and the choice between the two greatly depends on the problem, the size of the dataset, and the computational power available [22, 25].
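
A scikit-learn sketch of RFE with cross-validation (RFECV) wrapped around a linear SVM, again on random placeholder data with two informative columns:

```python
# Recursive feature elimination with cross-validated feature count.
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFECV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

rfecv = RFECV(SVC(kernel="linear"), step=1, cv=5)  # drop 1 feature per round
rfecv.fit(X, y)

print("optimal number of features:", rfecv.n_features_)
print("kept features:", np.where(rfecv.support_)[0])
```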

4.4.3 Maximum Relevance: Minimum Redundancy

In contrast to Boruta, which seeks to identify all relevant features that can be used to construct a radiomics signature [22], the maximum relevance-minimum redundancy (mRMR) method aims to identify the minimum number of relevant features that, when combined, can predict an outcome. mRMR requires the user to set the number of features to be selected. This number is usually defined empirically based on the number of images, the model that will subsequently be used, and the computational capacity of the system. As the name implies, mRMR attempts to select features with maximum relevance to the outcome but minimum redundancy. This is performed by calculating a relevance metric (based on the F-statistic) and a redundancy metric (based on Pearson correlation). These metrics are used to rank the features based on a score that accounts for both metrics at each iteration of the algorithm, and the feature with the highest score at each round is selected [26]. It is worth mentioning that Boruta and mRMR have also been combined in the literature, extracting all relevant features with Boruta and then ranking them with mRMR [27, 28].
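
Dedicated mRMR packages exist, but the greedy idea is simple enough to sketch by hand. The version below scores relevance with the F-statistic and redundancy with the mean absolute Pearson correlation to already-selected features, combining them by subtraction (one common variant; the exact combination scheme differs between implementations).

```python
# A minimal hand-rolled greedy mRMR sketch on placeholder data.
import numpy as np
import pandas as pd
from sklearn.feature_selection import f_classif

def mrmr_select(X: pd.DataFrame, y, k: int):
    relevance = pd.Series(f_classif(X, y)[0], index=X.columns)  # F-statistic
    corr = X.corr().abs()                                       # Pearson
    selected = [relevance.idxmax()]           # start with most relevant feature
    while len(selected) < k:
        remaining = [c for c in X.columns if c not in selected]
        # score = relevance minus mean redundancy with features chosen so far
        scores = {c: relevance[c] - corr.loc[c, selected].mean()
                  for c in remaining}
        selected.append(max(scores, key=scores.get))
    return selected

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(80, 10)),
                 columns=[f"f{i}" for i in range(10)])
y = (X["f0"] + X["f1"] > 0).astype(int)
print(mrmr_select(X, y, k=3))
```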

4.5 Methods for Predictive Model Construction

Once the final set of important features has been selected using the aforementioned methods, the final step is to use these features to create predictive models. These can be classification or regression models, and a set of traditional machine learning methods can be used for this purpose. The most commonly used ones are logistic regression, decision trees, random forests, gradient boosting models, support vector machines (SVM), and neural networks. These represent supervised machine learning models, trained and tuned on one partition of the dataset, tested on another partition, and ideally on one or more external datasets from other institutions. Regression methods include algorithms shared with classical statistics, such as linear and logistic regression, as well as purely machine learning ones, such as random forest and support vector regressors. Since linear and logistic regression stand at the border between statistical and machine learning models, they fall outside the scope of this text.

4.5.1 Decision Trees

A decision tree is a non-parametric algorithm that works by recursively partitioning the data into subsets based on the values of the input variables. It is named after its tree-like hierarchical structure, starting with a root node and branching into internal nodes and finally leaf nodes. The algorithm performs repeated splitting of the dataset in a top-down fashion until it finds the optimal splits and classifies most of the records into the predefined labels. As the size of a tree increases, it becomes extremely difficult to maintain node purity, and the tree becomes prone to overfitting [29].
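
A minimal scikit-learn sketch, capping tree depth because, as noted above, deep trees overfit easily (the bundled breast cancer dataset is used purely for illustration):

```python
# A small, depth-limited decision tree classifier.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_tr, y_tr)
print("test accuracy:", tree.score(X_te, y_te))
```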

4.5.2 Random Forests

Random forest is an ensemble method that combines multiple decision trees to create a "forest," thus improving the accuracy and robustness of the predictions. It was created based on the assumption that combining multiple unrelated decision trees in one model can give better predictions than any single decision tree. Random forest uses the input dataset to create multiple decision trees by randomly selecting subsets of the data (resampling it with replacement, i.e., bootstrapping) and of the feature set, then yielding a combined final result through majority voting ("bootstrap aggregating," i.e., bagging) [30]. Combining several "weak" decision trees largely overcomes the overfitting problem encountered in single, complex decision trees. However, as can be easily understood, combining several decision trees in one ensemble model is computationally costlier than running a single decision tree. Gini importance and variable importance can be computed to provide an estimate of factor importance for the resulting model [31].
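
A companion sketch in scikit-learn, where the fitted model exposes the impurity-based (Gini) importances mentioned above:

```python
# A bagged random forest with Gini feature importances.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)
forest.fit(X_tr, y_tr)
print("test accuracy:", forest.score(X_te, y_te))
# Indices of the five most important features by Gini importance
print("top features:", forest.feature_importances_.argsort()[::-1][:5])
```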

4.5.3 Gradient Boosting Algorithms

Gradient boosting algorithms include some of the most successful machine learning algorithms for model development with tabular data. Gradient boosting includes powerful algorithms such as XGBoost, LightGBM, AdaBoost, and CatBoost. They represent ensemble methods that combine multiple weak learners to improve the accuracy and robustness of the predictions. AdaBoost first appeared in 1997 [29, 32], setting the basis of subsequent gradient boosting models. Gradient boosting algorithms optimize a loss function, such as log-loss for classification, by adding weak learners (decision trees) one at a time; at each step, candidate splits are chosen so that the loss is minimized using a gradient descent method [33]. Gradient descent is a commonly used optimization algorithm that finds local minima of given functions (in our case, the loss function) [34]. The most successful and commonly used gradient boosting algorithm is XGBoost, which has been used to win a series of machine learning competitions with tabular data (a list of competitions won with the use of XGBoost can be found at https://github.com/dmlc/xgboost/blob/master/demo/README.md#machine-learning-challenge-winning-solutions). XGBoost aims to minimize a regularized objective function, representing a convex loss function that penalizes model complexity. Importantly, XGBoost can be scaled up using minimal resources [35]. Computations are performed in C++, but there are R and Python packages that expose XGBoost through simple commands. Gradient boosting algorithms have been widely used in radiomics studies, achieving excellent performance [36–38].
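
A minimal sketch with the xgboost Python package; the hyperparameter values here are arbitrary illustrative choices, not recommendations.

```python
# Gradient boosting with XGBoost on an illustrative binary task.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = XGBClassifier(n_estimators=300, learning_rate=0.05, max_depth=3,
                      eval_metric="logloss")   # log-loss, as described above
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```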

4.5.4 Support Vector Machines

The support vector machine (SVM) is one of the most common traditional machine learning techniques and works well with small training sets. The aim of SVM is to find a geometrical way to maximize the separation between data classes. This is realized by identifying a separating hyperplane that passes between the data classes in n-dimensional space (where n is the number of features for each sample of the dataset), after transformation of the dataset using a kernel function. The model then finds the hyperplane that maximizes the margin between the different classes in the data [29]. SVM has been widely used in radiomics manuscripts because of its ease of use and its excellent generalization capacity.
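
A minimal scikit-learn sketch with an RBF kernel; features are standardized first, which matters for margin-based methods:

```python
# An RBF-kernel SVM inside a standardization pipeline.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_tr, y_tr)
print("test accuracy:", svm.score(X_te, y_te))
```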

4.5.5 Neural Networks

Neural networks have also been used with radiomics data and will be discussed extensively in Chap. 6. However, it is important to mention that even though deep learning excels in computer vision tasks, it has been shown to underperform when used with tabular data, especially given the relatively small size of datasets typically available in medical imaging. Importantly, it has been shown that on tabular data, ensemble methods (e.g., random forest and XGBoost) may perform better than deep learning models [39] while needing less data and tuning. This is the reason why other machine learning algorithms are often preferred over deep learning when creating radiomics predictive models in the literature.

4.6 Conclusion

Radiomics has revolutionized medical imaging research by providing the imaging alternative to traditional biological omics. Radiomics analysis is a multi-step pipeline that utilizes machine learning methods in almost every step of the process. A basic understanding of these machine learning algorithms is crucial for comprehending radiomics manuscripts and for selecting the appropriate methods in radiomics research. Selection of the ideal algorithm at each step of the pipeline depends on the application, the dataset, and the experience of the user. In conclusion, radiomics is a "partner in crime" with machine learning, and one needs to understand both in order to keep up with developments in the field.

References

1. van Timmeren JE, Cester D, Tanadini-Lang S, Alkadhi H, Baessler B. Radiomics in medical imaging—“how-to” guide and critical reflection. Insights Imaging. 2020;11:91. 2. Yamada R, Okada D, Wang J, Basak T, Koyama S. Interpretation of omics data analyses. J Human Gen. 2021;66(1):93–102. https://doi.org/10.1038/s10038-020-0763-5. 3. Gillies RJ, Kinahan PE, Hricak H. Radiomics: images are more than pictures, they are data. Radiology. 2016;278(2):563–77. [cited 2021 Oct 18] https://pubs.rsna.org/doi/abs/10.1148/radiol.2015151169 4. Rohart F, Gautier B, Singh A, Lê Cao KA. mixOmics: an R package for ‘omics feature selection and multiple data integration. PLoS Comput Biol. 2017;13(11):e1005752. 5. Lambin P, Leijenaar RTH, Deist TM, Peerlings J, de Jong EEC, van Timmeren J, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol. 2017;14(12):749–62. https://doi.org/10.1038/nrclinonc.2017.141. 6. Papanikolaou N, Matos C, Koh DM. How to develop a meaningful radiomic signature for clinical use in oncologic patients. Cancer Imaging. 2020;20(1):33. 7. Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. ArXiv. 2013;1311.2524. http://arxiv.org/abs/1311.2524. 8. Girshick R. Fast R-CNN. ArXiv. 2015;1504.08083. http://arxiv.org/abs/1504.08083. 9. He K, Gkioxari G, Dollár P, Girshick R. Mask R-CNN. ArXiv. 2017;1703.06870. http://arxiv.org/abs/1703.06870. 10. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. ArXiv. 2015;1505.04597. http://arxiv.org/abs/1505.04597.


11. Milletari F, Navab N, Ahmadi SA. V-Net: fully convolutional neural networks for volumetric medical image segmentation. ArXiv. 2016;1606.04797. http://arxiv.org/abs/1606.04797. 12. Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. ArXiv. 2016;1606.00915:1–14. http:// arxiv.org/abs/1606.00915. 13. Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H. Encoder-decoder with atrous separable convolution for semantic image segmentation. ArXiv. 2018;1802.02611:1–18. http://arxiv.org/abs/1802.02611. 14. Wu HM, Tien YJ, Ho MR, Hwu HG, Lin WC, Tao MH, et al. Covariateadjusted heatmaps for visualizing biological data via correlation decomposition. Bioinformatics. 2018;34(20):3529–38. 15. Gu Z, Eils R, Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics. 2016;32(18):2847–9. 16. Thalamuthu A, Mukhopadhyay I, Zheng X, Tseng GC. Evaluation and comparison of gene clustering methods in microarray analysis. Bioinformatics. 2006;22(19):2405–12. 17. Aggarwal CC, Hinneburg A, Keim DA. On the surprising behavior of distance metrics in high dimensional space. In: van den Bussche J, Vianu V, editors. Database theory — ICDT 2001. Berlin, Springer; 2001. p. 420–34. 18. Roux M. A comparative study of divisive and agglomerative hierarchical clustering algorithms. J Classif. 2018;35(2):345–66. 19. Misra BB, Langefeld CD, Olivier M, Cox LA. Integrated omics: tools, advances, and future approaches. J Mol Endocrinol. 2018;JME-18-0055. 20. Yeung KY, Ruzzo WL. Principal component analysis for clustering gene expression data. Bioinformatics. 2001;17(9):763–74. http:// www.cs.washington.edu/homes/kayee/pca 21. Yao F, Coquery J, Lê Cao KA. Independent principal component analysis for biologically meaningful dimension reduction of large biological data sets. BMC Bioinf. 2012;13(1):24. 22. Kursa MB, Rudnicki WR. Feature selection with the Boruta Package. J Stat Softw. 2010;36(11):1–13. http://www.jstatsoft.org/. 23. Guyon I, Weston J, Barnhill S. Gene selection for cancer classification using support vector machines. Mach Learn. 2002;46:389–422. 24. Degenhardt F, Seifert S, Szymczak S. Evaluation of variable selection methods for random forests and omics data sets. Brief Bioinform. 2019;20(2):492–503. 25. Pfannschmidt L, Hammer B. Sequential feature classification in the context of redundancies. ArXiv. 2020;2004.00658:1–10. http://arxiv.org/ abs/2004.00658. 26. Ding C, Peng H. Minimum redundancy feature selection from microarray gene expression data. J Bioinform Comput Biol. 2005;3(2):185–205.


27. Li ZD, Guo W, Ding SJ, Chen L, Feng KY, Huang T, et al. Identifying key microRNA signatures for neurodegenerative diseases with machine learning methods. Front Genet. 2022;13:880997. 28. Zhang YH, Li H, Zeng T, Chen L, Li Z, Huang T, et al. Identifying transcriptomic signatures and rules for SARS-CoV-2 infection. Front Cell Dev Biol. 2021;8:627302. 29. Wu X, Kumar V, Ross QJ, Ghosh J, Yang Q, Motoda H, et al. Top 10 algorithms in data mining. Knowl Inf Syst. 2008;14(1):1–37. 30. Goldstein BA, Polley EC, Briggs FBS. Random forests for genetic association studies. Stat Appl Genet Mol Biol. 2011;10(1):32. 31. Toloşi L, Lengauer T. Classification with correlated features: unreliability of feature ranking and solutions. Bioinformatics. 2011;27(14):1986–94. 32. Freund Y, Schapire RE. A decision-theoretic generalization of on-line learning and application to boosting. J Comput Syst Sci. 1997;55:119–39. 33. He Z, Lin D, Lau T, Wu M. Gradient boosting machine: a survey point zero one technology. ArXiv. 2019;1908.06951:1–9. 34. Ruder S. An overview of gradient descent optimization algorithms. ArXiv. 2016;1609.04747:1–14. http://arxiv.org/abs/1609.04747. 35. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. ArXiv. 2016;1603.02754:1–13. http://arxiv.org/abs/1603.02754. 36. Klontzas ME, Manikis GC, Nikiforaki K, Vassalou EE, Spanakis K, Stathis I, et al. Radiomics and machine learning can differentiate transient osteoporosis from avascular necrosis of the hip. Diagnostics. 2021;11:1686. 37. Chen PT, Chang D, Yen H, Liu KL, Huang SY, Roth H, et al. Radiomic features at CT can distinguish pancreatic cancer from noncancerous pancreas. Radiol Imaging Cancer. 2021;3(4):e210010. 38. Awe AM, van den Heuvel MM, Yuan T, Rendell VR, Shen M, Kampani A, et al. Machine learning principles applied to CT radiomics to predict mucinous pancreatic cysts. Abdom Radiol. 2022;47(1):221–31. 39. Shwartz-Ziv R, Armon A. Tabular data: deep learning is not all you need. ArXiv. 2021;2106.03253:1–13. http://arxiv.org/abs/2106.03253.

5 Natural Language Processing

Salvatore Claudio Fanni, Maria Febi, Gayane Aghakhanyan, and Emanuele Neri

Academic Radiology, Department of Translational Research, University of Pisa, Pisa, Italy. e-mail: [email protected]

5.1 Brief History of NLP

Natural language processing (NLP) is a field at the intersection of artificial intelligence (AI), computational linguistics, and computer science, concerned with the interaction between natural human languages and computers [1]. The beginning of the field is often attributed to the early 1950s, when it emerged as a subfield of AI and linguistics aimed at studying the problems of automatic generation and understanding of natural language. Although rudimentary work from earlier periods can be found, it was in 1950 that Alan Mathison Turing, a leading cryptanalyst during World War II at the Government Code and Cypher School in Bletchley Park, Buckinghamshire, England, published an article entitled "Computing Machinery and Intelligence" [2].


In it, he proposed an intelligence criterion, nowadays widely known as the Turing test, to empirically determine whether a computer has achieved intelligence. The Turing test provides a powerful, simple, traceable, and pragmatic tool for evaluating the ability of a computer to perform indistinguishably from a human. A necessary tenet of the Turing test is that the computer does not have to think like a human; rather, it must simulate intelligence so that it is indistinguishable from human intelligence [3]. Several cornerstone stages, marked by momentous events, can be identified in the history of NLP: machine translation, the impact of AI, the adoption of a logico-grammatical style, and the use of massive language data [4]. In the modern era, NLP has undergone a renaissance primarily fueled by researchers working at Google. In 2013, the Word2Vec algorithm (Google, https://code.google.com/archive/p/word2vec/) was developed, which employed neural networks to learn word associations from free text without additional input from the user. This work was further developed and refined in 2018 with the advent of the bidirectional encoder representations from transformers (BERT) language model, which builds on the framework of Word2Vec to learn not only from the text itself but also from the context in which it is used [5]. The first phase of work in NLP was a period of enthusiasm and optimism and was focused on machine translation. Most NLP research in this period focused on syntax, partly because syntactic processing was manifestly necessary, and partly through implicit or explicit endorsement of the idea of syntax-driven processing [4]. This emphasis also arose because many researchers came to NLP with a background and established status in linguistics and language study rather than in computer science, as in later periods. In 1964, the U.S. National Research Council (NRC) created the Automatic Language Processing Advisory Committee (ALPAC), whose task was to evaluate the progress of NLP research. In 1966, the NRC and ALPAC initiated the first AI and NLP stoppage by halting the funding of research on NLP and machine translation.

After 12 years of research and about $20 million of investment, machine translation was still more expensive than manual human translation, and there were still no computers that came anywhere near being able to carry on a basic conversation. Machine translation research was all but killed by the 1966 ALPAC report, which concluded that machine translation was nowhere near achievement, leading to a significant cut in funding [4]. The second phase of NLP was prompted by AI, with much more emphasis on world knowledge and on its role in the construction and manipulation of meaning representations. Overall, it took nearly 14 years (until 1980) for NLP and AI research to recover from the broken expectations created by extreme enthusiasts during the first phase of development. While the second phase of NLP work was AI-driven and semantics-oriented, the third phase can be described, in reference to its dominant style, as a grammatical-logical phase. This trend, a response to the failures of practical system building, was stimulated by the development of grammatical theory among linguists during the 1970s and by the move toward the use of logic for knowledge representation and reasoning in AI. Computational grammar theory became a very active area of research, linked with work on logics for meaning and knowledge representation that can deal with the language user's beliefs and intentions, capture discourse features and functions such as emphasis and theme, and indicate semantic case roles. Research and development extended worldwide, notably in Europe and Japan, aimed not only at interface subsystems but at autonomous NLP systems, for instance for message processing or translation [1]. Until the 1980s, the majority of NLP systems used complex, "handwritten" rules. In the late 1980s, however, a revolution in NLP came about, the result of both the steady increase in computational power and the shift to machine learning (ML) algorithms. In the 1990s, the popularity of statistical models for natural language processing rose dramatically, and purely statistical NLP methods became remarkably valuable. In addition, recurrent neural network (RNN) models were introduced and found their niche in 2007 for voice and text processing. Currently, neural network models are considered the cutting edge of research and development in NLP for text understanding and speech generation [1].

Nowadays, the combination of a dialog manager with NLP makes it possible to develop a system capable of holding a conversation and sounding human-like, with back-and-forth questions, prompts, and answers. Nevertheless, current AI models are still not able to pass Alan Turing's test and still do not sound like real human beings. NLP in the field of medical informatics and, in particular, in radiological reporting has received increasing attention only in recent years. A PubMed search for [natural language processing] or [text mining] shows that 52 manuscripts were published in 1998, compared with 1862 manuscripts in 2022, an overall 35.8-fold increase. Recently, the Food and Drug Administration (FDA) and the Centers for Disease Control and Prevention (CDC) launched a collaborative effort for the "Development of a Natural Language Processing (NLP) Web Service for Structuring and Standardizing Unstructured Clinical Information." This project aims to create an NLP platform for clinical text that will be extensible to many different subdomains [6]; the overall plan is to carry out the development needed to maximize the use of existing tools and fill certain gaps. It should be noted that NLP is a generic term covering a wide variety of techniques that account for the complexity of language, with humbler origins dating much further back [7]. When the phrases "natural language processing" and "radiology" are combined in the PubMed search, it yields only 7 manuscripts in 1998 but 135 manuscripts in 2022, a 19.3-fold increase. Radiology is particularly suited to benefit from applications of NLP, given that the primary mode of inter-physician and physician-to-patient communication is the radiology report. In the following sections, we will address the basics of NLP and cover cutting-edge applications in the field of radiology and radiological reporting.

5.2 Basics of Natural Language Processing

NLP encompasses any computer-based method that analyzes written or spoken human language to convert it into structured, mineable data [8]. Through the combination of linguistic, statistical, and AI methods, NLP can be used either to determine the meaning of a text or even to produce a human-like response [9]. According to these two different purposes, NLP can be categorized into two subsets: natural language understanding (NLU) and natural language generation (NLG). NLU is the subset of NLP dedicated to the analysis of text and speech in order to interpret natural language, determine its meaning, and identify its context using syntactic and semantic analysis. Conversely, NLG focuses on producing a response in human language based on previously analyzed data input [10]. To accomplish these tasks, different approaches have been investigated, reflecting the above-mentioned phases of NLP history and not differing greatly from those already described in previous chapters for image analysis. Similarly to the well-known radiomics pipeline, the first step of NLP analysis is segmentation, here meant as the identification of sections and paragraphs in the analyzed text. Each section is further divided into sentences (sentence splitting) and words (tokenization). Before more sophisticated analysis can start, it is necessary to normalize the words by determining their lexical root (stemming), expanding abbreviations, and correcting spelling mistakes. When normalization is completed, a syntactic analysis is carried out to determine the part of speech of each word (e.g., noun, verb, adverb, adjective) and the dependency relations between words, followed by a semantic analysis to determine the meaning of words [9]. Similarly to radiomics, the results of this preprocessing are called NLP features and are used as input for subsequent rule-based or ML-based processing steps [11].
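As a concrete illustration of this pipeline, the following is a minimal sketch in Python, assuming the open-source NLTK library with its standard tokenizer and tagger resources downloaded; the sample report text is invented for demonstration.

```python
# Minimal sketch of the NLP preprocessing steps described above,
# assuming NLTK is installed (pip install nltk) and the required
# resources have been downloaded beforehand, e.g.:
#   nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
import nltk
from nltk.stem import PorterStemmer

report = ("Findings: No focal consolidation. "
          "Impression: No evidence of acute pulmonary embolism.")

sentences = nltk.sent_tokenize(report)  # sentence splitting
stemmer = PorterStemmer()

for sentence in sentences:
    tokens = nltk.word_tokenize(sentence)      # tokenization
    stems = [stemmer.stem(t) for t in tokens]  # stemming (lexical roots)
    pos_tags = nltk.pos_tag(tokens)            # syntactic analysis (POS)
    print(stems, pos_tags)
```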

The first NLP systems to be developed were rule-based classifiers, which resemble computer-aided detection systems. A rule-based classifier is explicitly programmed by experts and relies on very sophisticated handcrafted or "handwritten" rules. Conversely, an ML classifier is based on rules that are automatically generated from labeled input. These two approaches may be combined in a hybrid approach, where both handcrafted and automatically generated rules are used to produce an output. In both cases, it is necessary to perform the preprocessing step of NLP feature extraction, which is not actually required with deep learning (DL) based classifiers [12]. DL is a subfield of ML based on artificial neural networks (ANN), whose structure resembles that of the neural cortex [13]. ANNs consist of artificial neurons organized in input, hidden computational, and output layers and are classified according to their structure [14]. Convolutional neural networks (CNN) are among the best known and are widely, but not exclusively, adopted in image analysis for detection, classification, and segmentation tasks. As written or spoken text is a sequence of words, the RNN is the natural ANN for NLP [15]. RNNs process sequential information and consist of neurons connected sequentially in a long chain. In an RNN, the processed output is transferred from one neuron to the next, and this transfer generates a "memory." However, just as in humans, this memory may lose effectiveness over long sentences. For this reason, the long short-term memory (LSTM) network was developed, which is more effective for analyzing long and complex written text [16]. Recently, DL-based NLP has outperformed traditional rule-based classifiers and ML-based algorithms [17]. To quantitatively measure performance and compare different systems, several metrics have been adopted, but the F1 score is undoubtedly the most frequently used. The F1 score is defined as the harmonic mean of recall (sensitivity) and precision (positive predictive value) and is an overall measure of an NLP algorithm's performance.
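For reference, writing precision as P = TP/(TP + FP) and recall as R = TP/(TP + FN), where TP, FP, and FN are the true positives, false positives, and false negatives, the F1 score is

$$
F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
$$

Because it is a harmonic mean, the F1 score is high only when precision and recall are both high, penalizing systems that trade one for the other.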

Different variables affect an algorithm's performance; as in image analysis, one of the most important is the quality of the dataset used for training. Beyond the particular system or approach, it is worth noting the importance of a standardized vocabulary that allows an NLP engine to work properly with medical terms. To solve this problem, many biomedical lexicons have been created, such as the unified medical language system (UMLS) developed by the US National Library of Medicine. UMLS includes a large lexicon of biomedical and general English, and it integrates terms and codes from other vocabularies (the Metathesaurus) such as CPT, ICD-10-CM, LOINC, MeSH, RxNorm, and SNOMED CT. UMLS also integrates a semantic network: each concept in the Metathesaurus is assigned one or more semantic types, and concepts are linked together by semantic relationships [18]. SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) is a multilingual clinical healthcare terminology maintained and distributed by SNOMED International [19].

5.3 Current Applications of Natural Language Processing

The development of new NLP models, nowadays empowered by AI, potentially enables an almost unlimited number of applications. Contrary to common perception, NLP is already part of our everyday life, in our computer software and mobile phones, and increasingly in our hospitals. Everyday uses of NLP include language translation, virtual assistants, and e-mail spam detection. As previously described, translation is one of the very first applications of NLP, and it is nowadays used to instantly translate online text from one language to another [20]. Many people use virtual assistants (e.g., Alexa, Siri, Google Assistant) built with NLP and AI; virtual assistants can understand written and spoken language and answer appropriately, holding a conversation with the user. NLP is also used in e-mail spam detection, processing the written text to classify the e-mail and decide whether or not it is spam. While e-mail spam detection mostly uses NLU, which allows computers to understand human language, virtual assistants use both NLU and NLG [21]. Regarding healthcare applications, the real-world clinical use of NLP is still in its infancy, but some hospitals already use it, with very good results, to facilitate and speed up the work of doctors and researchers.

In healthcare, the use of NLP is spreading, and some hospitals have integrated it into their systems thanks to the large-scale use of electronic health records (EHR). Two of the most common applications of NLP in healthcare are information extraction and information retrieval; beyond these, many applications are being developed, and more will emerge in the future. Information extraction is the capability to extract structured information from a large pool of unstructured free text. Hospitals produce a large amount of written free-text data that are vital for the patient, covering diagnosis, treatment, follow-up, and clinical and private information; using information extraction software, it is possible to extract relevant information for clinical decision support, evidence-based medicine, or research [22]. MedLEE (Medical Language Extraction and Encoding system), created in 1995 and developed at Columbia University in New York, is one of the first NLP software systems used for clinical information extraction [23]. MedLEE can be used for diverse clinical applications such as surveillance, quality assurance, decision support, data mining, and adverse drug event detection, and it has been used to detect patients with suspected tuberculosis and breast cancer [24, 25]. Another American information extraction NLP engine is cTAKES (Clinical Text Analysis and Knowledge Extraction System), which is open source, was developed in 2006 at the Mayo Clinic, and is released under the Apache License [26]. A recent study used cTAKES to validate pneumonia diagnoses in radiology reports [27]. The aim of information retrieval, instead, is to return a set of documents in response to a user's query; the most common example in everyday life is a Google search [23]. Within a single hospital there are different informatics systems that have to communicate with each other, such as the reservation system, the picture archiving and communication system, and the radiology information system, and there may be no single user interface to search across all of them. Using NLP to retrieve the right data at the right time can simplify clinical practice and research work, as has been done with CogStack.

CogStack is an open-source information retrieval and information extraction NLP engine deployed at King's College Hospital (KCH) in the UK [28]. CogStack has made it possible to implement a real-time psychosis risk detection and alerting service as well as a clinical decision support system for diabetes management [29, 30]. Radiology is a branch of medicine that would benefit significantly from NLP software due to its reliance on written reports and the need to communicate with clinicians and patients. NLP can be applied to all types of reports, both structured reports and the far more common unstructured free-text reports. Moreover, NLP can be used to automatically convert unstructured reports into structured ones, combining the advantages of both reporting styles [31, 32]. Some of the main uses of NLP in radiology are information extraction, text classification, topic modeling, simplification, and summarization. Information extraction makes it possible to identify specific words or phrases in millions of reports and then classify them to answer clinical questions. Using text classification and topic modeling, radiology reports can be organized by categories such as diagnosis, topic, or severity; this can be very helpful, for example, for finding cohorts for clinical trials. Because radiology reports use a specific language that can be challenging for patients to understand, NLP also finds a field of application in simplifying reports. Moreover, radiologists need to communicate effectively with clinicians, and the use of simplification and summarization applications could simplify and accelerate clinical and therapeutic decision-making [5]. The literature shows an increase, especially in recent years, in new NLP models used in radiology. A systematic review published in 2021 on NLP applied to radiology reports included 164 publications from 2015 onward; the most frequently used techniques were rule-based and ML approaches. However, DL publications have been rising in recent years, with recurrent neural networks (RNN) being the most common type of DL architecture.

The most frequently used embedding models (used to convert report text into numbers) were Word2Vec, followed by GloVe (global vectors for word representation), FastText, ELMo (embeddings from language models), and BERT (bidirectional encoder representations from transformers) [33]. An example of a rule-based classifier is PeFinder (Pulmonary Embolism Finder), a tool developed in 2011 by Chapman et al. PeFinder classifies reports based on the presence or absence of pulmonary embolism, its temporal state (acute or chronic), the certainty of the diagnosis, and the quality of the examination (diagnostic or not) [34]. Miao et al. developed a DL-based NLP method to extract BI-RADS findings from breast ultrasound reports to support clinical decisions and breast cancer research [35]. A different use is that of Brown et al., who used ML-based software to predict radiology resource utilization in patients with hepatocellular carcinoma, starting from abdominal CT reports, with a view to improving healthcare management decision-making and reducing costs [36]. Another growing use of NLP, and a current challenge, is its application to sentiment analysis, which analyzes people's attitudes, opinions, and emotions. This field relies mostly on social media, an enormous pool of written opinions from people around the world. For example, during the COVID-19 pandemic, NLP models were proposed to understand the population's feelings toward COVID-19 and the vaccines using comments and tweets [37]. Moreover, NLP makes it possible to extract health information from social media to detect depression, mental health problems, or insomnia [38, 39]. Sentiment analysis is also being used in radiology to extract radiologists' opinions about the severity of a radiological finding and the urgency of treatment [5].

References

1. Kochmar E. Getting started with natural language processing. New York: Simon and Schuster; 2022.
2. Turing AM. Computing machinery and intelligence. Mind. 1950;236:433–60.
3. Harnad S. Minds, machines and Searle. J Exp Theor Artif Intell. 1989;1(1):5–25.
4. Jones KS. Natural language processing: a historical review. In: Zampolli A, Calzolari N, Palmer M, editors. Current issues in computational linguistics: in honour of Don Walker. Dordrecht: Springer; 1994. p. 3–16. https://doi.org/10.1007/978-0-585-35958-8_1.
5. Mozayan A, Fabbri AR, Maneevese M, Tocino I, Chheang S. Practical guide to natural language processing for radiology. RadioGraphics. 2021;41(5):1446–53.
6. Kreimeyer K, Foster M, Pandey A, Arya N, Halford G, Jones SF, et al. Natural language processing systems for capturing and standardizing unstructured clinical information: a systematic review. J Biomed Inform. 2017;73:14–29.
7. Chen P-H. Essential elements of natural language processing: what the radiologist should know. Acad Radiol. 2020;27(1):6–12.
8. Fanni SC, Gabelloni M, Alberich-Bayarri A, Neri E. Structured reporting and artificial intelligence. In: Fatehi M, Pinto dos Santos D, editors. Structured reporting in radiology. Imaging informatics for healthcare professionals. Cham: Springer; 2022. https://doi.org/10.1007/978-3-031-91349-6_8.
9. Pons E, Braun LM, Hunink MG, Kors JA. Natural language processing in radiology: a systematic review. Radiology. 2016;279(2):329–43. https://doi.org/10.1148/radiol.16142770.
10. Kao A, Poteet S. Overview. In: Kao A, Poteet S, editors. Natural language processing and text mining. New York: Springer; 2007. p. 1–7.
11. Goecks J, Jalili V, Heiser LM, Gray JW. How machine learning will transform biomedicine. Cell. 2020;181(1):92–101. https://doi.org/10.1016/j.cell.2020.03.022.
12. Cheng LT, Zheng J, Savova GK, Erickson BJ. Discerning tumor status from unstructured MRI reports—completeness of information in existing reports and utility of automated natural language processing. J Digit Imaging. 2010;23(2):119–32. https://doi.org/10.1007/s10278-009-9215-7. Epub 2009 May 30.
13. Soffer S, Ben-Cohen A, Shimon O, Amitai MM, Greenspan H, Klang E. Convolutional neural networks for radiologic images: a radiologist's guide. Radiology. 2019;290:590–606.
14. Chartrand G, Cheng PM, Vorontsov E, et al. Deep learning: a primer for radiologists. RadioGraphics. 2017;37:2113–31.
15. Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform. 2001;34(5):301.
16. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9:1735–80.
17. Ruder S. NLP's ImageNet moment has arrived. 2018. https://thegradient.pub/nlp-imagenet/. Accessed Mar 2021.
18. Lindberg C. The unified medical language system (UMLS) of the National Library of Medicine. J Am Med Rec Assoc. 1990;61(5):40–2.
19. Millar J. The need for a global language – SNOMED CT introduction. Stud Health Technol Inform. 2016;225:683–5.
20. Khurana D, Koli A, Khatter K, Singh S. Natural language processing: state of the art, current trends and challenges. Multimed Tools Appl. 2022. https://doi.org/10.1007/s11042-022-13428-4.
21. Garg P, Girdhar N. A systematic review on spam filtering techniques based on natural language processing framework. In: 2021 11th international conference on cloud computing, data science & engineering (Confluence). 2021. p. 30–5. https://doi.org/10.1109/Confluence51648.2021.9377042.
22. Malmasi S, Hosomura N, Chang L-S, Brown CJ, Skentzos S, Turchin A. Extracting healthcare quality information from unstructured data. AMIA Annu Symp Proc. 2017;2017:1243–52.
23. Iroju OG, Olaleke JO. A systematic review of natural language processing in healthcare. BMC Med Inform Decis Mak. 2015;21:179. https://doi.org/10.5815/ijitcs.2015.08.07.
24. Jain NL, Knirsch CA, Friedman C, Hripcsak G. Identification of suspected tuberculosis patients based on natural language processing of chest radiograph reports. In: Proceedings of the AMIA Fall Symposium. 1996. p. 542–6.
25. Jain NL, Friedman C. Identification of findings suspicious for breast cancer based on natural language processing of mammogram reports. In: Proceedings of the AMIA Fall Symposium. 1997. p. 829–33.
26. Savova GK, et al. Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc. 2010;17(5):507–13. https://doi.org/10.1136/jamia.2009.001560.
27. Panny A, et al. A methodological approach to validate pneumonia encounters from radiology reports using natural language processing. Methods Inf Med. 2022;61(1–2):38–45. https://doi.org/10.1055/a-1817-7008.
28. Jackson R, et al. CogStack – experiences of deploying integrated information retrieval and extraction services in a large National Health Service Foundation Trust hospital. BMC Med Inform Decis Mak. 2018;18(1):1–13. https://doi.org/10.1186/s12911-018-0623-9.
29. Wang T, et al. Implementation of a real-time psychosis risk detection and alerting system based on electronic health records using CogStack. J Vis Exp. 2020;159:60794. https://doi.org/10.3791/60794.
30. Patel D, et al. An implementation framework and a feasibility evaluation of a clinical decision support system for diabetes management in secondary mental healthcare using CogStack. BMC Med Inform Decis Mak. 2022;22(1):100. https://doi.org/10.1186/s12911-022-01842-5.
31. Spandorfer A, Branch C, Sharma P, et al. Deep learning to convert unstructured CT pulmonary angiography reports into structured reports. Eur Radiol Exp. 2019;3:37. https://doi.org/10.1186/s41747-019-0118-1.
32. Fanni SC, Colligiani L, Spina N, Colasanti G, Gabelloni M, Cioni D, et al. Current knowledge of radiological structured reporting. J Radiol Rev. 2022;9:93–9. https://doi.org/10.23736/S2723-9284.22.00189-1.
33. Casey A, et al. A systematic review of natural language processing applied to radiology reports. BMC Med Inform Decis Mak. 2021;21(1):179. https://doi.org/10.1186/s12911-021-01533-7.
34. Chapman BE, Lee S, Kang HP, Chapman WW. Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm. J Biomed Inform. 2011;44(5):728–37. https://doi.org/10.1016/j.jbi.2011.03.011.
35. Miao S, et al. Extraction of BI-RADS findings from breast ultrasound reports in Chinese using deep learning approaches. Int J Med Inform. 2018;119:17–21. https://doi.org/10.1016/j.ijmedinf.2018.08.009.
36. Brown AD, Kachura JR. Natural language processing of radiology reports in patients with hepatocellular carcinoma to predict radiology resource utilization. J Am Coll Radiol. 2019;16(6):840–4. https://doi.org/10.1016/j.jacr.2018.12.004.
37. Sv P, et al. Twitter-based sentiment analysis and topic modeling of social media posts using natural language processing, to understand people's perspectives regarding COVID-19 booster vaccine shots in India: crucial to expanding vaccination coverage. Vaccines. 2022;10:11. https://doi.org/10.3390/vaccines10111929.
38. Doan S, Yang EW, Tilak SS, Li PW, Zisook DS, Torii M. Extracting health-related causality from Twitter messages using natural language processing. BMC Med Inform Decis Mak. 2019;19(3):79. https://doi.org/10.1186/s12911-019-0785-0.
39. Patel R, et al. Frequent discussion of insomnia and weight gain with glucocorticoid therapy: an analysis of Twitter posts. NPJ Digit Med. 2018;1:20177. https://doi.org/10.1038/s41746-017-0007-z.

6 Deep Learning Fundamentals

Eleftherios Trivizakis and Kostas Marias

E. Trivizakis: Computational BioMedicine Laboratory (CBML), Institute of Computer Science (ICS), Foundation for Research and Technology-Hellas (FORTH), Heraklion, Greece. e-mail: [email protected]
K. Marias: Computational BioMedicine Laboratory (CBML), Institute of Computer Science (ICS), Foundation for Research and Technology-Hellas (FORTH), Heraklion, Greece; Department of Electrical and Computer Engineering, Hellenic Mediterranean University, Heraklion, Greece. e-mail: [email protected]

Abbreviations

AE	autoencoder
AI	artificial intelligence
ANN	artificial neural networks
CAD/CADe	computer-aided diagnosis/detection
CLAHE	contrast-limited adaptive histogram equalization
CNN	convolutional neural networks
CT	computed tomography
DL	deep learning
DrCNN	denoising residual convolutional neural network
FL	federated learning
GAN	generative adversarial network
GPU	graphics processing unit
HE	histogram equalization
HU	Hounsfield units
ML	machine learning
MRI	magnetic resonance imaging
PET	positron emission tomography
ROI	region of interest
TL	transfer learning
VAE	variational autoencoder
VGG	visual geometry group
ViT	visual transformer
XAI	explainable artificial intelligence

6.1 Deep Learning in Medical Imaging

6.1.1 Key Concepts

The field of medical image analysis has recently shifted its focus from traditional "hand-crafted" image processing and simple statistical models to the cutting-edge technique of deep learning. Medical professionals can potentially benefit from accurate lesion identification, segmentation of regions of interest, progression tracking, and categorization of pathological anatomical structures to aid them in clinical practice. It is therefore important for healthcare to adopt DL-based applications for these tasks, since they can give overburdened doctors more agency and facilitate swift decision-making in the highly demanding clinical environment. Data analysis in deep neural networks follows a hierarchical architecture that progressively identifies hidden patterns inside the examined region of interest [1] and can potentially correlate those patterns with clinical outcomes. Like biological neurons, artificial neurons receive a number of inputs, perform a computation, and output the result. Each neuron performs a straightforward computation, including a nonlinear activation function and a mechanism for sparse feature coding. Typical nonlinear activation functions [2] of ANNs include the sigmoid transformation, the hyperbolic tangent, and the commonly used rectified linear unit.
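As a minimal, self-contained sketch (not taken from this chapter), the following Python code implements these three activation functions and a single artificial neuron in NumPy; the weights, bias, and inputs are arbitrary toy values.

```python
# Minimal sketch of an artificial neuron with the three activation
# functions named above; all numeric values are arbitrary toy examples.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def neuron(x, w, b, activation=relu):
    # Weighted sum of inputs followed by a nonlinear activation
    return activation(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.4, 0.1, -0.6])   # learnable weights
b = 0.2                          # learnable bias

for act in (sigmoid, tanh, relu):
    print(act.__name__, neuron(x, w, b, act))
```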

Through these operations, salient elements of an imaging examination can be isolated and enhanced for certain tasks (such as classification or detection), while less significant ones are suppressed. During supervised learning, each training image is assigned a label, and the model's parameters are tuned using the prediction error over the image-label pairings. A DL model is able to uncover complicated patterns in specialized datasets by utilizing back-propagation [3], which indicates how the machine should adjust the internal weights used to compute the representation in each layer from the features in the preceding layer. In a similar way, in unsupervised learning, deep models learn hidden features or underlying patterns even in the absence of labels, for example by reconstructing the input image at the output. Contrastive learning [4] uses a loss based on a similarity metric, aiming to pull similar feature pairings together and push dissimilar pairs apart, so that the trained model can identify discriminative features in the targeted domain. In this context, DL models are highly adaptable and trainable universal approximators of data distributions that make it possible to establish a relationship between input data, including medical images, and clinical outcomes. These models are essentially collections of complicated mathematical transformations with millions of learnable parameters, and their accuracy is highly dependent on the quality of the dataset, the data analysis method, and the training approach [5]. The rapidly evolving field of deep learning has shown remarkable potential in several areas of medical image analysis, including disease classification, pixel-based segmentation, detection, and image registration in anatomical regions such as the nervous system [6], liver [7], lungs [8], breast [9], and bones [10]. Significant research has been carried out on this topic; however, numerous technological difficulties and pitfalls remain to be overcome before DL-based CAD schemes with high scientific consistency can be widely adopted in a clinical setting.

One of the most significant drawbacks of using deep learning for medical image analysis is that medical institutions do not always make adequate imaging examinations accessible [11] for model convergence, owing to privacy concerns and challenges in data ownership and patient protection. Other problems in DL for medical imaging include the lack of standardization of medical terminology and knowledge for clinical endpoints, limited access to well-defined and descriptive information about the data, laborious image annotation processes, the lack of extensive clinical trials assessing the impact of DL in clinical settings, and the absence of established criteria for the quality, quantity, specificity, and labeling protocols of imaging data.

6.1.2 DL Architectures for Medical Image Analysis

Convolutional neural network (CNN). This type of network was first built by LeCun et al. [12] as an end-to-end image analysis model for identifying handwritten digits. The core concept of this deep model is to exhaustively process and integrate feature maps to infer nonlinear correlations between the input data and the known ground truth. A set of convolutions with adaptive filters is incorporated hierarchically into the feature extraction part of the model, while the selection, reduction, and classification of these imaging features are performed in the neural part of the model, as depicted in Fig. 6.1a. In medical image analysis, CNNs are widely used for the fully automated, end-to-end analysis of high-dimensional imaging data, offering robust modeling of challenging clinical endpoints. Popular architectures include VGG [13], Inception [14], ResNet [15], and DenseNet [16], which are mainly utilized as pre-trained models.

CapsuleNet. A capsule is a collection of neurons that represents a component of the input subject by activating a subspace of its features. The CapsuleNet [17] is composed of separate groups of capsules, as opposed to the kernels or filters of CNNs, which propagate information to successively higher-level capsules via the routing-by-agreement procedure. This paradigm is one of the most recent developments in the field of deep learning and has not yet been thoroughly evaluated by data scientists and medical practitioners.

Fig. 6.1 Deep learning architectures: (a) convolutional neural network, (b) artificial neural network, (c) autoencoder

Autoencoders (AE). By recreating their input, AEs learn a compact representation of the examined distribution. After each hidden layer, the encoder component of the AE contains fewer neurons per layer, thereby decreasing the dimensionality of the input image. The complexity of the architecture and the layout of the neurons differentiate a basic ANN (Fig. 6.1b) from an AE (Fig. 6.1c). By back-propagating the reconstruction error, the decoder reconstructs an estimate of the original image from the learned latent space. AEs are self-supervised deep models that are widely used in medical imaging for the pre-training of deep models, feature extraction, and synthetic image generation.

Visual transformers (ViT). Transformers have been effectively translated to a number of computer vision applications, delivering state-of-the-art performance and challenging the dominance of CNNs as the de facto choice for image analysis tasks [18]. Building on these advancements, visual transformers have been applied to medical imaging because they can better exploit global dependencies in the data, as opposed to other architectures that leverage local features from a narrow receptive field. Furthermore, considerable effort has been invested in integrating attention mechanisms into CNN-based architectures. These hybrid CNN-transformer models have emerged as an alternative to ViTs, and a promising one, mainly because of their capacity both to encode global dependencies and to obtain highly discriminative feature distributions. These attention mechanisms were inspired by the primary cortex of mammals, which isolates only the relevant sensory information for scene interpretation. The transformer architecture uses self-attention to capture the global interdependence of labeled data without the need for sequential computation. Its neural network layers (attention blocks) capture information from the full input sequence. Attention modules can focus on adaptively learned localizations of interest, so model predictions are based on the most relevant visual information and attributes. Soft and hard attention are two categories based on how image localizations are addressed: the former learns a weighted average of all characteristics, whereas the latter samples from a uniform subset. A typical algorithm for training a ViT will initially extract fixed-size patches from the original examination, transform these patches into vectors that are encoded into compact embeddings, and finally adapt the ViT encoder to the embedding sequences for the downstream classification task.

Generative adversarial networks (GAN). This deep model framework has attracted interest for its capacity to produce synthetic data that matches real data after learning its distribution. Synthetic data can be quite beneficial in certain cases, such as training deep models on size-limited datasets.

Synthetic data can also potentially alleviate biases related to distribution imbalances by augmenting the available samples of the minority class [19]. Generating synthetic data for the various medical imaging modalities in oncology is extremely difficult due to the substantial biological diversity that gives rise to different genetic subtypes and the varied physical composition of neoplasms [20]. Furthermore, medical imaging has specific properties and intricacies, such as the high variability of measured signals, which depend on the scanner manufacturer, acquisition settings, and many other confounding variables. By increasing the heterogeneity and volume of the analyzed sample distributions, a well-fitted generative model can potentially overcome some of these disadvantages. In particular, GAN is a methodology for fitting generative models that draws samples from the targeted distribution without explicitly defining a probability distribution. A pair of models, a generator G and a discriminator D, is adapted in an adversarial manner. The generative model G is fed a noise vector, sampled from a Gaussian distribution, with the objective of mapping it to the desired distribution; the resulting new samples are supposed to mimic data from the existing distribution. The discriminating network D assigns a probability that a sample originates from the known set rather than being synthesized by G. Training the pair of deep models is like a two-player minimax game: the discriminator D is trained to maximize the probability of labeling synthetic and genuine samples correctly, while the generator G maximizes the error rate of D. Through this adversarial process, G eventually creates realistic samples by capturing the underlying data distribution. The architecture of a typical GAN is shown in Fig. 6.2. Deep generative modeling is a major methodology for overcoming significant drawbacks that make collecting data challenging, such as the rarity of disease subtypes, privacy concerns at clinical sites, and the supply of imaging data from different scanner vendors.
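For reference, this two-player minimax game is commonly written, following the standard formulation of Goodfellow et al. (summarized here, not quoted from this chapter), as

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
$$

where D(x) is the probability the discriminator assigns to x being a real sample and G(z) is a sample synthesized from the noise vector z.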


Fig. 6.2 A high-level illustration of a generative adversarial network. The generator G creates synthetic samples (red), and the discriminator D assigns a probability to the input image for being either real (green) or generated. The adversarial loss L estimates the distance between the two distributions

6.1.3 Cloud Computing for Deep Learning

Because of the sheer volume of imaging data, specialized analytics infrastructure is needed for predictive modeling and demanding computational tasks on high-dimensional medical data. The rise of cloud computing platforms from major technology companies such as Alphabet, Amazon, and Microsoft, together with large data repositories [21], will greatly simplify swift diagnosis and clinical outcome prediction using advanced DL models. Success in this area would not have been possible without parallel computing (large GPU server farms) and cloud-based infrastructure (large volumes of storage, fast networking, high-performance computing), which have been crucial in resolving the difficulties inherent in processing large amounts of data.

6.1.4 DL-Based Computer-Aided Diagnosis

Medical image analysis is crucial at all phases of patient treatment and in monitoring lesion progression. The state of the art in image analysis has been vastly advanced by deep learning, which uses many processing layers to learn representations of data at different levels of abstraction. In particular, medical image applications benefit from DL architectures that automatically learn intermediate and high-level abstractions from raw input. Computer-aided diagnosis/detection (CADx/CADe) can benefit from DL-based analysis because the latter has the potential to provide unbiased assessments of medical imaging data free from interobserver variability. Demystifying DL's black-box functionality by uncovering associations and underlying connections with patient data will be crucial to the adoption of this technology in clinical practice. A CADx/CADe system should therefore intelligently convey suggestions based on AI estimations to physicians, link the findings with other patient data and the overall health state, and provide further justification whenever clinicians have doubts about the proposed recommendations. Explainability techniques for DL will allow a diagnosis system to deliver interpretable smart recommendations to clinicians, establishing a trustworthy framework. Future CAD/CADe systems should also be able to interpret multimodal data simultaneously, thereby mirroring the reasoning of clinicians, who likewise consider a variety of sources before treating patients [22]. The conventional role of medical practitioners will be bolstered by DL-based applications in terms of precision, repeatability, and scalability, all of which contribute to the delivery of effective care across a wide range of geographical areas. Moreover, medical imaging is anticipated to make significant strides forward in the new areas of deep learning model development and deployment. AI can also streamline diagnostics and treatment decision-making in the near future.

Medical professionals, in turn, will have more time and energy to devote to what they were trained to do: treating patients and preventing illness, rather than staring at an endless amount of raw data.

6.2 Quality and Biases of Medical Databases

It is extremely challenging for modern machine learning to converge to a useful model without access to high-dimensional data. One factor that has contributed to the rapid development and widespread adoption of conventional DL systems is the ease with which vast quantities of human-annotated data can now be shared. However, significant difficulties still exist in imaging data gathering, expert annotation procedures, accessibility, and the availability of infrastructure, all of which contribute to data scarcity for specialized medical cases and, consequently, limit the efficacy of DL-based models. These constraints during data collection frequently result in selection bias. A typical pattern of selection bias occurs when a single center provides the data used for model convergence and development, leading to biases toward the patient groups included in the training cohort. Integrating an AI system known to perform poorly, and to discriminate against patients with characteristics underrepresented in the original medical center's dataset, into a different institution with a different acquisition protocol may be problematic, as such a system lacks the generalization ability expected of a medical device. Data shift is a type of selection bias and one of the greatest challenges to the generalization and usefulness of DL systems. Data shift often occurs when the distribution of the data used to train a DL-based model does not accurately represent the features of the data encountered in future deployments of the AI. These distribution shifts and biases can be detrimental to generalizability.

For example, when attempting to redistribute DL-based systems developed in advanced industrialized countries with an underrepresented rural population to regions with different population characteristics, the AI model will almost certainly show reduced efficiency and prediction performance [5]. Due to differences across equipment and acquisition protocols, image acquisition can also suffer from technical bias, which is particularly common in radiology. Critical issues such as a suitable study design, a well-defined data collection protocol, realistic goals for data availability, proper infrastructure for collecting large numbers of imaging examinations, and an outlined quality control process are often overlooked by researchers while collecting data. Researchers must be committed to mitigating the causes of poor data quality, because applying DL to such data distributions has been shown to systematize or exacerbate biases [23]. Evidence shows [24] that addressing these problems can help decrease human bias in the clinical setting and thereby enhance best practices in healthcare. Although radiologists frequently evaluate and account for technical differences in image acquisition, such as voxel spacing and scanner model or vendor, the resulting AI is not aware of these properties of the data distribution unless such differences were included in the model's convergence process. Therefore, to facilitate robust development of models trained on a single institution's data and to make multi-institutional model deployments feasible, it is necessary to establish standardized protocols and pre-processing methodologies that allow the harmonization of data distributions across medical centers and different scanners. Annotating and labeling imaging data to train DL models requires specialist knowledge. In spite of its importance, high-quality data can be difficult to collect, since human annotations are regularly plagued by unclear boundaries or noisy information [25]. Because medical examinations can be interpreted differently, the performance of a DL analysis is determined by the quality of the annotated data. The human element in data annotation is inherently subjective, which can be a significant obstacle to the success of a research project; annotations from different experts are therefore likely to vary owing to disagreements about the boundaries of the examined regions of interest.

The underlying class distribution can potentially be approximated or modeled more accurately by using a consensus of numerous annotations. Data quality can be significantly improved by stratifying patients according to specific features or case rarity, isolating data with limited availability, and producing careful image annotations. The latter are time-consuming and laborious, require multiple expert clinicians to avoid interobserver variability in the delineation of important regions, and call for explicitly defined criteria for selecting a given region of interest.

6.3 Pre-processing for Deep Learning

6.3.1 CT Radiation Absorption Map to Grayscale

Windowing, often referred to as gray-level mapping, is the method of manipulating the grayscale representation of a computed tomography (CT) image using CT Hounsfield units (HU) and other relevant parameters of the CT examination. It alters the appearance of the image to emphasize certain anatomical structures: the image's luminosity is modified by adjusting the window level, and its contrast by adjusting the window width. DL image analysis tasks are greatly affected by HU windowing, especially when converting DICOM to other image formats (e.g., PNG) [26].
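A minimal sketch of HU windowing follows, assuming the scan is already available as a NumPy array of Hounsfield units; the soft-tissue window used (level 40, width 400) is an illustrative choice, not one prescribed by the chapter.

```python
# Minimal sketch of CT HU windowing; `hu` is assumed to be a NumPy
# array of Hounsfield units, and the window settings are illustrative.
import numpy as np

def window_to_grayscale(hu, level=40.0, width=400.0):
    low, high = level - width / 2.0, level + width / 2.0
    # Clip intensities outside the window, then rescale to 8-bit grayscale
    windowed = np.clip(hu, low, high)
    return ((windowed - low) / (high - low) * 255.0).astype(np.uint8)

hu = np.array([[-1000.0, 0.0], [40.0, 1000.0]])  # toy 2x2 "scan"
print(window_to_grayscale(hu))  # air maps to 0, dense bone to 255
```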

6.3.2 MRI Bias Field Correction

The bias field artifact in MR imaging results in additional voxel intensity variation across scans acquired with the same device and patient. When the bias field and the MR image occupy different spatial frequencies, the bias field can be removed by filtering out the spatial frequencies that represent the magnetic field; it is also common to correct the scan with distance-minimization algorithms such as the Manhattan distance and the squared Euclidean distance. Other prevalent bias field correction approaches include N4ITK [27] and joint removal of bias [28].

6.3.3 Tissue-Based Standardization

Standardization based on a reference tissue has been used extensively in brain MRI tasks [29]. This type of standardization can be applied to obtain a uniform pixel distribution across MR examinations, which allows tissue-specific signatures across scans and enables improved quantification. This can be observed in Fig. 6.3, where the original pixel intensities of each prostate MRI scan (Fig. 6.3a) were standardized (Fig. 6.3b) using the distribution of the fat tissue near the prostate gland as a reference.
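A minimal sketch of this idea follows, assuming a NumPy array of MR intensities and a boolean mask marking the reference (fat) tissue; the array names and toy values are illustrative.

```python
# Minimal sketch of reference-tissue standardization; `scan` is a NumPy
# array of MR intensities and `fat_mask` marks the reference fat tissue
# (names and values are illustrative).
import numpy as np

def tissue_standardize(scan, fat_mask):
    ref_mean = scan[fat_mask].mean()
    ref_std = scan[fat_mask].std()
    # Express every voxel relative to the reference-tissue statistics
    return (scan - ref_mean) / ref_std

rng = np.random.default_rng(0)
scan = rng.normal(300.0, 50.0, size=(4, 4))  # toy MR intensities
fat_mask = scan > 320.0                      # toy "fat" region
print(tissue_standardize(scan, fat_mask))
```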

6.3.4 Pixel Intensities Normalization

Normalization is an essential pre-processing procedure that translates the spectrum of pixel intensities to a standard scale; typically the new minimum is close to zero and the maximum close to one. Standardization, or z-score normalization, is a normalization method that employs the statistical features of mean and standard deviation. DL models converge better when the input data are standardized to zero mean and unit variance [30].
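A minimal sketch of both schemes on a toy array:

```python
# Minimal sketch of min-max normalization and z-score standardization
# applied to a toy NumPy array of pixel intensities.
import numpy as np

def min_max_normalize(x):
    # Rescale intensities to the [0, 1] range
    return (x - x.min()) / (x.max() - x.min())

def z_score_normalize(x):
    # Standardize to zero mean and unit variance
    return (x - x.mean()) / x.std()

pixels = np.array([0.0, 50.0, 100.0, 200.0])
print(min_max_normalize(pixels))  # [0.   0.25 0.5  1.  ]
print(z_score_normalize(pixels))  # zero mean, unit variance
```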

6.3.5 Harmonization

Harmonization strategies have been recommended as a way of minimizing the inherent variability of medical imaging [31]. Harmonization generally aims to address the lack of uniformity across medical scans and the resulting instability of imaging characteristics. Several pre-processing techniques can be applied for harmonization at the image level, such as applying a uniform voxel spacing across scans, filtering to produce similar spatial characteristics and noise patterns, quantitative standardization, and using generative models to standardize multicenter images [32].

Fig. 6.3 A comparison of histograms of pixel intensities calculated from: (a) the original MRI scans, and (b) the tissue-based normalized data

6.3.6 Spacing Resampling

Through upsampling or downsampling, matrix interpolation resizes an image from its initial pixel grid to an interpolated grid. Several resampling techniques have been proposed and are widely used in the literature, such as nearest neighbor, trilinear, tricubic convolution, and tricubic spline interpolation. This technique allows the extraction of rotationally invariant texture characteristics from tomographic scans. Additionally, an isotropic voxel space across scans can increase the reproducibility of radiomics in multicenter studies with different spatial properties [33–35].
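A minimal sketch of isotropic resampling, assuming the open-source SimpleITK package; the 1 mm target spacing and the input path are illustrative placeholders.

```python
# Minimal sketch of resampling a volume to isotropic 1 mm spacing,
# assuming SimpleITK (pip install SimpleITK); "input.nii.gz" is a
# placeholder path.
import SimpleITK as sitk

def resample_isotropic(image, new_spacing=(1.0, 1.0, 1.0)):
    old_spacing, old_size = image.GetSpacing(), image.GetSize()
    # Keep the same physical extent: the size scales with the spacing ratio
    new_size = [int(round(sz * sp / nsp))
                for sz, sp, nsp in zip(old_size, old_spacing, new_spacing)]
    return sitk.Resample(image, new_size, sitk.Transform(),
                         sitk.sitkLinear, image.GetOrigin(), new_spacing,
                         image.GetDirection(), 0, image.GetPixelID())

image = sitk.ReadImage("input.nii.gz")  # placeholder path
print(resample_isotropic(image).GetSpacing())
```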

6.3.7 Image Enhancement

There are several image enhancement methods, including histogram equalization and its many variants. Typical histogram equalization (HE) spreads the distribution of pixel intensities across a broader histogram, increasing the perceived contrast of an imaging examination. The alternative, contrast-limited adaptive histogram equalization (CLAHE), restricts the histogram range of the processed image, preventing or clipping outlier pixel values that may reduce the benefits of HE.
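A minimal sketch of both methods, assuming the open-source scikit-image package and using one of its built-in sample images as a stand-in for a medical scan:

```python
# Minimal sketch of HE and CLAHE, assuming scikit-image
# (pip install scikit-image).
from skimage import data, exposure

image = data.camera()  # built-in sample image as a stand-in for a scan

# Global histogram equalization: spreads intensities over the full range
he_image = exposure.equalize_hist(image)

# CLAHE: equalizes locally while clipping the histogram to limit
# over-amplification of outliers and noise
clahe_image = exposure.equalize_adapthist(image, clip_limit=0.03)
```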

6.3.8 Image Denoising

To maintain reliable quantification and high diagnostic value in medical imaging, it is necessary to minimize the effects of medical equipment and imaging distortions, such as light oscillations and signal loss, while retaining the texture quality of key tissues or regions of interest. Gaussian noise is a form of electronic noise that originates from the amplifier and detector components of the device and distorts pixel distributions. Using the median and other statistics-based filters is the standard method for reducing different types of noise in images without compromising textural information.
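A minimal sketch of such statistics-based denoising, assuming the open-source SciPy package; the noisy image is synthetic.

```python
# Minimal sketch of median-filter denoising with SciPy; the noisy
# image is synthetic (a bright square plus Gaussian noise).
import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(42)
clean = np.zeros((64, 64))
clean[16:48, 16:48] = 1.0                          # bright square "tissue"
noisy = clean + rng.normal(0.0, 0.2, clean.shape)  # add Gaussian noise

# A 3x3 median filter suppresses the noise while preserving the edges
denoised = median_filter(noisy, size=3)
```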

Fig. 6.4 The architecture of a DrCNN model used for denoising. This model architecture estimates the noise distribution patterns of the input dataset. DrCNN, denoising residual convolutional neural network

Several filtering techniques [36], such as the Gaussian, averaging, and Wiener filters, can reduce noise but also give the image a soft appearance, losing edge information at landmarks [37]. Deep learning-based models have significantly improved image denoising over traditional methods such as the aforementioned average and median filtering. In particular, the internal representation of a DL model is adapted to the specific data distribution during training, whereas traditional denoising uses predetermined methods that cannot be customized to the needs of specific datasets. Deep learning denoising maintains texture edges and granular details in the image. Additionally, the utilization of DL models in medical imaging, such as the DrCNN in Fig. 6.4, has led to significant image quality enhancements while preserving high-frequency information, and has been applied to several anatomical areas, including scans of the liver [37], abdominal [38], lung [39], and pelvis [40] regions.

6.3.9 Lowering Dimensionality at the Imaging Level for Deep Learning

DL models require constant image dimensionality for their input, which can be challenging in oncological tasks, since tumors appear in a variety of shapes and forms and since the imaging data themselves may have extremely high resolution, as in a typical pathology image (Fig. 6.5a). Trimming unwanted tissue from radiology data based on a segmentation mask results in pixel arrays with different dimensions, as shown in Fig. 6.5b.

Fig. 6.5 Two patch extraction methods: (a) exhaustive extraction of non-overlapping patches from a high-resolution pathology image and (b) extraction based on regions of interest

To mitigate this problem, zero padding can be employed as the preferred method for DL applications, since zero is the neutral element of the convolutional operator. However, when standardization is applied, an appropriate padding constant should be selected. Normalization is performed prior to padding, because padding would otherwise alter key statistical features of the dataset, such as the standard deviation, maximum, and mean values, which are used to perform this pre-processing step. Patch extraction is an important pre-processing step for ViTs; a minimal code sketch follows below. A few parameters affect the extraction process, such as the stride of the sliding window (overlap), localization based on a segmented mask of a region of interest (ROI), and the patch size. These parameters can have a negative effect on the training of a neural network, since they can introduce overfitting or underrepresent smaller regions of interest (e.g., strides larger than the ROI, or more patches from large tumors).
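A minimal sketch of exhaustive, non-overlapping patch extraction from a 2D array; the patch size and stride are illustrative choices.

```python
# Minimal sketch of exhaustive patch extraction from a 2D image array;
# stride == patch_size yields non-overlapping patches, while a smaller
# stride would yield overlapping ones.
import numpy as np

def extract_patches(image, patch_size=64, stride=64):
    patches = []
    h, w = image.shape
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size])
    return np.stack(patches)

image = np.zeros((256, 256))         # stand-in for a pathology slide
print(extract_patches(image).shape)  # (16, 64, 64)
```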

118

E. Trivizakis and K. Marias

of data handling are implemented because, for example, samples (patches, 2D slices, etc.) from the same patient might be present in both training and testing sets, introducing sample selection bias that may result in overfitting on the model level.
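A small sketch of both points follows, under simple assumptions: exhaustive non-overlapping tiling of a large image (Fig. 6.5a) and a patient-wise split using scikit-learn's GroupShuffleSplit; the image, labels, and patient IDs are synthetic stand-ins.

import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def extract_patches(image, patch=224, stride=224):
    """stride == patch gives exhaustive, non-overlapping tiling."""
    h, w = image.shape[:2]
    return [image[y:y + patch, x:x + patch]
            for y in range(0, h - patch + 1, stride)
            for x in range(0, w - patch + 1, stride)]

wsi = np.random.rand(2048, 2048)                 # stand-in pathology image
patches = np.stack(extract_patches(wsi))         # shape (81, 224, 224)

labels = np.random.randint(0, 2, len(patches))   # one synthetic label per patch
patient_ids = np.repeat(np.arange(9), 9)         # 9 patients, 9 patches each

# Patient-based stratification: no patient appears in both training and test.
train_idx, test_idx = next(GroupShuffleSplit(test_size=0.2, random_state=0)
                           .split(patches, labels, groups=patient_ids))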

6.4 Learning Strategies

6.4.1 Transfer Learning

Under ideal conditions, deep learning methods require a large amount of training data, and the level of generalizability of a deep model has a significant impact on the performance of the application. The restricted population of the available patient cohorts and the human resources, such as expert clinicians, required for annotating these sets, as mentioned in previous sections, are well-known hurdles in developing DL models. Transfer learning (TL) strategies have been utilized in several studies to circumvent these issues. It is common knowledge that people can accomplish tasks that share some similarities by utilizing prior knowledge. Deep models are likewise transferable between similar tasks and can enhance performance on a targeted task in a domain that may lack the required amount of data. This has been the case for medical imaging applications, where the efficacy of TL can reduce computing costs and save time without compromising prediction accuracy. Additionally, TL allows the introduction of deeper models for medical analysis tasks. Two primary forms of domain adaptation have been proposed in the literature: (1) fine-tuning TL (Fig. 6.6a), where the weights of the pre-trained model are updated for the target task via a new training procedure, and (2) "off-the-shelf" TL (Fig. 6.6b), where the feature extraction component of a trained model is used to produce imaging descriptors for a separate downstream task. In particular, "off-the-shelf" TL keeps the original convolutional weights unchanged while discarding the fully-connected layers, and a machine learning algorithm such as a support vector machine or a Gaussian process classifier is used for the downstream task. Because fine-tuning TL updates a subset of the convolutional layers with new parameters, fine-tuning is similar to training from scratch in terms of being time-consuming, although it requires only a modest volume of data. TL has been successfully integrated into a variety of medical image classification tasks, such as lung lesions [41], colonic polyps [42], breast cancer [43], tissue density calculation [44], and brain tumors [45], and it has been evaluated across a variety of other pathology imaging datasets [46]. In one study, the model with a single fine-tuned layer for object detection and two fine-tuned layers for the classification tasks achieved the highest performance [47]. Fine-tuning every layer is the most popular TL strategy in the literature; however, this strategy does not significantly increase the model's performance, and it has a higher computational cost compared to the above-mentioned fine-tuning strategies, since it adapts all the layers. Therefore, gradually updating the convolutional layers, usually starting from the last convolutional layer, is highly recommended.

Fig. 6.6 The two types of transfer learning methods that have been proposed in the literature: (a) fine-tuning TL, where the transferred weights are adapted for the new data distribution, and (b) "off-the-shelf" TL, where only the convolutional weights are transferred to the target model for feature extraction
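A hedged sketch of both TL strategies with PyTorch/torchvision (version 0.13 or later assumed for the weights argument) follows; the ResNet-18 backbone, two-class head, and toy batch are illustrative assumptions, not the specific models discussed above.

import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import SVC

# (a) Fine-tuning TL: replace the classification head and gradually update only
# the last convolutional block plus the new head.
model = models.resnet18(weights="IMAGENET1K_V1")
for param in model.parameters():
    param.requires_grad = False                 # freeze everything first
for param in model.layer4.parameters():
    param.requires_grad = True                  # unfreeze last convolutional block
model.fc = nn.Linear(model.fc.in_features, 2)   # new head, e.g., benign/malignant
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)

# (b) "Off-the-shelf" TL: keep the convolutional weights fixed, discard the
# fully-connected layer, and feed the descriptors to a classical classifier.
feature_extractor = nn.Sequential(*list(model.children())[:-1])  # drop model.fc
with torch.no_grad():
    x = torch.randn(8, 3, 224, 224)             # stand-in batch of 2D slices
    descriptors = feature_extractor(x).flatten(1).numpy()  # shape (8, 512)
clf = SVC().fit(descriptors, [0, 1] * 4)        # downstream task on descriptors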

6.4.2 Multi-task Learning

Enhancing generalization is the goal of multi-task learning, which achieves it by mixing information from several tasks (this can be seen as imposing constraints on the model's weights). Multi-task learning is an effective method in situations where large amounts of labeled input data for one task are available and can be transferred to another task with significantly less labeled data [48]. For instance, multi-task learning can be used in applications where the same features might be utilized in other supervised learning problems to predict different outcomes [49]. In this case, the model's shared feature extraction part learns to generalize over the common inputs across tasks, while each output is predicted by a separate, task-specific portion of the model.
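A minimal sketch of this hard parameter sharing idea in PyTorch follows; the tiny encoder, the two heads, and the synthetic batch and targets are illustrative assumptions.

import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared feature extractor, constrained by the losses of both tasks
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head_diagnosis = nn.Linear(16, 2)   # e.g., classification output
        self.head_outcome = nn.Linear(16, 1)     # e.g., regression output

    def forward(self, x):
        z = self.encoder(x)                      # identical inputs, shared features
        return self.head_diagnosis(z), self.head_outcome(z)

model = MultiTaskNet()
x = torch.randn(4, 1, 64, 64)                    # stand-in image batch
logits, risk = model(x)
# Mixing both task losses makes each task constrain the shared weights.
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 1, 0])) \
     + nn.MSELoss()(risk.squeeze(1), torch.randn(4))
loss.backward()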

6.4.3 Ensemble Learning

Ensemble learning is a strategy in which multiple models are trained in parallel on the same data and their predictions are combined synergistically [50]. With ensemble learning, several models process different views of the data in parallel to provide enhanced predictions that would be unattainable by a single, simpler model. This involves merging the data perspectives of the different model types used in the group, as well as fusing pre-trained individual models at the prediction level. This strategy can increase the robustness of stochastic learning algorithms such as CNNs. Common examples of ensemble techniques include bootstrap aggregating (bagging) [51], weighted averaging [52], and stacking [53].
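A minimal sketch of prediction-level fusion with scikit-learn follows: soft voting and stacking over heterogeneous base models; the toy dataset and choice of base models are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
base = [("rf", RandomForestClassifier(random_state=0)),
        ("svm", SVC(probability=True, random_state=0))]

# Soft voting averages class probabilities across the base models.
voting = VotingClassifier(estimators=base, voting="soft").fit(X, y)
# Stacking learns a meta-model on top of the base models' predictions.
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression()).fit(X, y)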

6.4.4 Multimodal Learning

The use of artificial intelligence has evolved to the point where it is now a necessary approach for deducing information from a high-dimensional space with a data-driven point of view in a variety of fields. Increasing volumes of information in the field of medicine, particularly in oncology, might provide an overview of the intricacies of the underlying biology of certain lesions. Multimodal machine learning is inspired by the way people learn best when exposed to a variety of stimuli at once. Currently, the majority of healthcare machine learning approaches consider data from only one modality. Modern computer-aided diagnosis systems should be able to handle several types of data at once, just as human clinicians do when diagnosing patients. Typically, a radiologist is responsible for summarizing the results of scans to support the physician in reaching a clinical decision. A physician's decision to select the appropriate treatment for a patient is based on inputs from a variety of data sources that may include laboratory results, pathology images, and multiple modalities of radiographic scans. Therefore, it is apparent that multimodality is an intrinsic property of healthcare data. It is reasonable to assume that the vast majority of data produced and amassed over the course of a patient's lifetime will contain at least some information useful for delivering precise and individualized treatment. In a clinical setting, an AI-based support system should be able to reason with and interpret high-throughput data from multiple imaging modalities and other sources, as illustrated in Fig. 6.7, in order to make a clinically rational decision, just like a human medical expert would. The utilization of high-dimensional and high-throughput data (semantic, radiomics from varying modalities, laboratory, clinical, and transcriptomics) can lead to the discovery of composite markers with predictive properties for assessing treatment outcomes in oncology [22].

Modality fusion  The most intuitive multimodal technique is the combination of multiple sensory inputs prior to a supervised classification process. This is accomplished by merging the different data sources at the feature level with vector concatenation, addition, or mean or maximum pooling. Fusion techniques are further classified as early (feature-level) and late (model-level) fusion, based on the stage in the analysis pipeline at which the merger is performed.
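A compact sketch of early (feature-level) fusion follows, in the spirit of Fig. 6.7; the radiomics and transcriptomics arrays are synthetic stand-ins, and in a real study the scalers should be fitted on the training set only (see Chap. 7).

import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
radiomics = rng.normal(size=(100, 120))   # e.g., 120 imaging descriptors
omics = rng.normal(size=(100, 300))       # e.g., 300 expression features

# Vector concatenation maps both modalities into a common feature space.
fused = np.concatenate(
    [StandardScaler().fit_transform(radiomics),
     StandardScaler().fit_transform(omics)], axis=1)   # shape (100, 420)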


Fig. 6.7 An example of multimodal analysis with imaging (top) and genomic (bottom) data in a common feature space by utilizing early fusion

Representation learning  This approach is focused on acquiring enhanced feature representations by using data from many modalities. It includes techniques from self-supervised learning (GANs, autoencoders), weakly-supervised learning (CNNs), and contrastive learning. Raw imaging sequences of multiple modalities can be integrated into a single multi-channel data structure that preserves their spatial properties (image processing tasks such as registration or interpolation are required) and respects the clinical context (e.g., combining high b-value diffusion MRI with T2-weighted sequences), exploiting extensive unlabeled data and sparse representations.

Modality translation  A subcategory of multimodal learning includes processes that translate data across modalities, such as CT to PET [54]. This is particularly interesting since deep generative networks are designed to learn nonlinear correlations, generally between an input image and the corresponding output data. This is a promising technology for datasets with incomplete or less informative imaging sequences, where the accompanying clinical data might provide solutions to unmet clinical needs.
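As a minimal illustration of the multi-channel integration mentioned under representation learning, the following sketch stacks two co-registered MRI sequences into one array; the volumes are synthetic stand-ins assumed to be already registered and resampled to a common grid.

import numpy as np

t2 = np.random.rand(24, 256, 256)    # stand-in T2-weighted volume
dwi = np.random.rand(24, 256, 256)   # stand-in high b-value diffusion volume

# One multi-channel structure preserving the spatial properties of both inputs
multi_channel = np.stack([t2, dwi], axis=1)   # shape (24, 2, 256, 256)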

6.4.5 Federated Learning

Modern DL models may learn tens of millions of parameters through a training process that requires a large population to achieve high performance in clinical scenarios and generalizability to unseen data distributions. Such large quantities of data, especially in the medical field, are quite challenging to collect, since they are sensitive in terms of privacy and are carefully regulated in many jurisdictions to safeguard the confidentiality of medical records. Moreover, the deployment of a centralized infrastructure requires substantial effort, including the establishment of secure connections, the provision of efficient and dependable communications among the different parties of a centralized architecture, and the negotiation of complex data sharing agreements and governance among multiple institutions under varying jurisdictions. Maintaining and scaling this type of infrastructure is also challenging. Data anonymization may help to overcome some of these challenges; however, removing critical information reduces the database's usefulness for future research. Federated learning (FL) is a strategy, or platform, that allows learning across remote and separate data centers without requiring their private data to be shared with an external institution [55, 56]. The FL framework offers a robust environment for AI development in the medical imaging area by exploiting existing computing infrastructure and avoiding bureaucratic procedures. When trained on limited data from a single institution, DL models are susceptible to overfitting. Consequently, the data distributions used for developing AI models must incorporate a diverse set of cases, preferably originating from a variety of acquisition sites, backgrounds, and demographics. Therefore, multi-institutional patient cohorts are key to training reliable DL models.


The contribution of remote agents, such as participating clinical sites, is necessary for the development of a distributed global predictive model that addresses unmet clinical needs [57]. The FL strategy has the potential to increase the robustness and generalizability of a global predictive model by allowing scalability via external agents that can provide data distributions from different scanners and populations from geographically distant regions with varied socioeconomic and genetic origins. The aggregator server, acting as the orchestrator, is central to the FL architecture because it provides all the necessary constraints and functionality to perform vital tasks in a uniform manner, such as data pre-processing, selection of the DL architectures and hyperparameters, evaluation protocols, and gradient distribution methods. The remote agents perform the analysis, based on the distributed protocol, on their locally accessible private databases. Finally, these agents return to the aggregator server all the parameters required to construct or refine the global predictive model for the desired clinical outcome.
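A minimal sketch of one aggregation round in this spirit follows: the server combines parameter updates from remote agents, here weighted by local cohort size in the style of federated averaging; the parameter lists and cohort sizes are synthetic stand-ins, and real FL frameworks add secure communication and many other safeguards.

import numpy as np

def federated_average(agent_weights, agent_sizes):
    """agent_weights: one list of np.ndarray parameters per clinical site."""
    total = sum(agent_sizes)
    return [sum(w[i] * (n / total) for w, n in zip(agent_weights, agent_sizes))
            for i in range(len(agent_weights[0]))]

# Three sites with different cohort sizes contribute one round of updates.
site_a = [np.ones((4, 4)), np.zeros(4)]
site_b = [np.full((4, 4), 2.0), np.ones(4)]
site_c = [np.full((4, 4), 3.0), np.ones(4)]
global_params = federated_average([site_a, site_b, site_c], [100, 50, 50])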

6.5 Interpretability and Trustworthiness of Artificial Intelligence

It is a fundamental need of human nature [58] to want to comprehend how decisions are formed and what components motivate them, especially when medical issues are involved. Transparency, interpretability, and explainability are principles that are intimately connected to ethics in data science and are required for establishing confidence in DL models so that they are safe to deploy and use for the benefit of patients. Interpretability is the capacity to comprehend how an AI model generates decisions. Transparency has a twofold meaning: (1) it relates to the way a model is produced, and (2) it requires that the decision-making process of the model be observable and follow a meaningful path that is comprehensible to an external observer. The FUTURE-AI guidelines [59] were developed to provide actionable directions for the creation, evaluation, and integration of AI solutions for healthcare that are fair, universal, traceable, usable, robust, and explainable (FUTURE). The "black-box" aspect of AI can be rather overwhelming for experts in radiology and healthcare in general. Typically, a clinician can elaborate on the reasoning behind a diagnosis. Likewise, procedures that enable a certain level of traceability and explainability for DL-based diagnostic assessments are necessary.

6.5.1 Reproducibility

The rapid advancement of computer vision is intimately tied to a research culture that promotes the repeatability of experiments. In medical image analysis, an increasing number of researchers prefer to make their code publicly accessible, which considerably aids in building strong foundations for more complex projects and in gaining the trust of a wider community of data scientists and clinicians. A well-documented and detailed data selection protocol is a strong indicator of increased reproducibility of experiments and results. Therefore, ensuring a reproducible AI system requires the systematic accumulation of metadata during the development phase, including extensive data descriptions, the impact of the various experimental settings and hyperparameters, the performance metrics used to evaluate the models, and detailed documentation of the complete development cycle.

6.5.2 Traceability

The traceability aspects of an AI system provide end-users (clinical sites, clinicians, patients) with clarity about the actions taken throughout the development and deployment phases of a model. It is well known that DL models are susceptible to overfitting and to memorization of the data distribution, particularly with size-limited datasets or due to choices of specific parameters in the employed experimental protocol. Additionally, data stratification on a subject or sample basis is crucial for fairly dividing the original dataset into training, validation, and testing sets and is also indicative of the model's validity. A platform that documents metadata records for traceability must contain the most crucial parameters of an experimental protocol, such as the data pre-processing protocol, the convergence strategy of a DL model, the overall design of the deep learning analysis, the patient cohorts used during the different development phases (training, validation, evaluation), and the optimal hyperparameters, for ensuring repeatability or for future reference.

6.5.3 Explainability

The explainability of artificial intelligence (XAI) is interconnected with the introduction of transparency and traceability into so-called black-box DL systems. Although attempts to address XAI-related issues have existed for a number of years, there has been an extraordinary increase in research studies over the past few years [60]. With regard to DL model interpretability, deep saliency maps have been introduced [61] to indicate which elements of an image the model has identified as the most relevant to the analysis of a particular clinical outcome. This method is based on perceptual interpretability and aims to reconstruct maps of causal dependencies among clinical outcomes in the examined data distributions.
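A minimal gradient-based saliency sketch follows, a simpler relative of the Grad-CAM approach cited above [61]: the gradient of the predicted class score with respect to the input highlights the pixels most relevant to the decision. The untrained ResNet-18 and random input are stand-ins for a trained model and a real scan.

import torch
from torchvision import models

model = models.resnet18(weights=None).eval()       # stand-in for a trained model
x = torch.randn(1, 3, 224, 224, requires_grad=True)

score = model(x)[0].max()        # score of the top predicted class
score.backward()                 # back-propagate the class score to the input
saliency = x.grad.abs().max(dim=1)[0]   # (1, 224, 224) per-pixel relevance map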

6.5.4 Trustworthiness

Low-quality data or poorly curated databases in radiology might incorporate and perpetuate the socioeconomic biases that are currently a cause of inequities in health services. A poorly fitted AI system might produce predictions biased against treating patients with lower income if the model was adapted from skewed data distributions with limited representation of specific patient groups. Consequently, societal prejudices have the potential to widen inequalities in healthcare [62] when they are not thoroughly evaluated during the system's design phase. Moreover, whenever an AI system leads to unfavorable incidents, the developers responsible for its design must be able to decipher and specify why and how the system reached that decision. Finally, an AI-based CADx/CADe system must fulfill some key prerequisites in order to be consistent with trustworthiness principles. These requirements include complying with the legal system, conforming to agreed-upon ethical norms (privacy protection, fairness, and respect for individual rights), maintaining human input, transparency in the behavior of the AI, and explainability of the system's decisions.

Acknowledgments  We thank Aikaterini Dovrou (FORTH) for providing the original histograms of tissue-based normalization.

References

1. Chen X, Wang X, Zhang K, Fung KM, Thai TC, Moore K, Mannel RS, Liu H, Zheng B, Qiu Y. Recent advances and clinical applications of deep learning in medical image analysis. Med Image Anal. 2022;79:102444.
2. He K, Zhang X, Ren S, Sun J. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. 2015.
3. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature. 1986;323:533–6.
4. Jaiswal A, Babu AR, Zadeh MZ, Banerjee D, Makedon F. A survey on contrastive self-supervised learning. Technologies. 2021;9:2.
5. Luca AR, Ursuleanu TF, Gheorghe L, Grigorovici R, Iancu S, Hlusneac M, Grigorovici A. Impact of quality, type and volume of data used by deep learning models in the analysis of medical images. Inform Med Unlocked. 2022;29:100911.
6. Xia W, Hu B, Li H, et al. Deep learning for automatic differential diagnosis of primary central nervous system lymphoma and glioblastoma: multi-parametric magnetic resonance imaging based convolutional neural network model. J Magn Reson Imaging. 2021;54:880–7.
7. Trivizakis E, Manikis GC, Nikiforaki K, Drevelegas K, Constantinides M, Drevelegas A, Marias K. Extending 2D convolutional neural networks to 3D for advancing deep learning cancer classification with application to MRI liver tumor differentiation. IEEE J Biomed Health Inform. 2018:1–1.
8. Asuntha A, Srinivasan A. Deep learning for lung cancer detection and classification. Multimed Tools Appl. 2020;79:7731–62.
9. Trivizakis E, Ioannidis G, Melissianos V, Papadakis G, Tsatsakis A, Spandidos D, Marias K. A novel deep learning architecture outperforming 'off-the-shelf' transfer learning and feature-based methods in the automated assessment of mammographic breast density. Oncol Rep. 2019;42:2009–15.
10. Allegra A, Tonacci A, Sciaccotta R, Genovese S, Musolino C, Pioggia G, Gangemi S. Machine learning and deep learning applications in multiple myeloma diagnosis, prognosis, and treatment selection. Cancers. 2022;14:606.
11. Trivizakis E, Papadakis GZ, Souglakos I, Papanikolaou N, Koumakis L, Spandidos DA, Tsatsakis A, Karantanas AH, Marias K. Artificial intelligence radiogenomics for advancing precision and effectiveness in oncologic care (Review). Int J Oncol. 2020;57:43–53.
12. LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989;1:541–51.
13. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. 2014. arXiv preprint arXiv:1409.1556.
14. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society; 2016. p. 2818–26.
15. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society; 2016. p. 770–8.
16. Huang G, Liu Z, van der Maaten L, Weinberger KQ. Densely connected convolutional networks. 2016. arXiv preprint arXiv:1608.06993.
17. Sabour S, Frosst N, Hinton GE. Dynamic routing between capsules. Adv Neural Inf Process Syst. 2017:3856–66.
18. Shamshad F, Khan S, Zamir SW, Khan MH, Hayat M, Khan FS, Fu H. Transformers in medical imaging: a survey. 2022. https://doi.org/10.48550/arxiv.2201.09873.
19. Osuala R, Kushibar K, Garrucho L, Linardos A, Szafranowska Z, Klein S, Glocker B, Diaz O, Lekadir K. Data synthesis and adversarial networks: a review and meta-analysis in cancer imaging. Med Image Anal. 2021;102704.
20. Dimitriadis A, Trivizakis E, Papanikolaou N, Tsiknakis M, Marias K. Enhancing cancer differentiation with synthetic MRI examinations via generative models: a systematic review. Insights Imaging. [Accepted].
21. National Institutes of Health – National Cancer Institute (NIH–NCI) Imaging Data Commons (IDC). https://portal.imaging.datacommons.cancer.gov/. Accessed 30 Nov 2022.
22. Trivizakis E, Souglakos I, Karantanas AH, Marias K. Deep radiotranscriptomics of non-small cell lung carcinoma for assessing molecular and histology subtypes with a data-driven analysis. Diagnostics. 2021;11:1–15.
23. Wiens J, Saria S, Sendak M, et al. Do no harm: a roadmap for responsible machine learning for health care. Nat Med. 2019;25(10):1337–40.
24. Chen IY, Joshi S, Ghassemi M. Treating health disparities with artificial intelligence. Nat Med. 2020;26:16–7.
25. Schmarje L, Grossmann V, Zelenka C, et al. Is one annotation enough? A data-centric image classification benchmark for noisy and ambiguous label estimation. 2022. https://doi.org/10.48550/arxiv.2207.06214.
26. Gul S, Khan MS, Bibi A, Khandakar A, Ayari MA, Chowdhury MEH. Deep learning techniques for liver and liver tumor segmentation: a review. Comput Biol Med. 2022;147:105620.
27. Tustison NJ, Avants BB, Cook PA, Zheng Y, Egan A, Yushkevich PA, Gee JC. N4ITK: improved N3 bias correction. IEEE Trans Med Imaging. 2010;29:1310–20.
28. Learned-Miller EG, Jain V. Many heads are better than one: jointly removing bias from multiple MRIs using nonparametric maximum likelihood. Lect Notes Comput Sci. 2005;3565:615–26.
29. Haur Ong K. White matter lesion intensity standardization using adaptive landmark based brain tissue analysis on FLAIR MR image. Int J Adv Soft Comput Appl. 2018.
30. Hinton GE. Learning multiple layers of representation. Trends Cogn Sci. 2007. https://doi.org/10.1016/j.tics.2007.09.004.
31. Ugga L, Romeo V, Stamoulou E, et al. Harmonization strategies in multicenter MRI-based radiomics. J Imaging. 2022;8:303.
32. Da-Ano R, Visvikis D, Hatt M. Harmonization strategies for multicenter radiomics investigations. Phys Med Biol. 2020;65:24TR02.
33. Park JE, Park SY, Kim HJ, Kim HS. Reproducibility and generalizability in radiomics modeling: possible strategies in radiologic and statistical perspectives. Korean J Radiol. 2019;20:1124.
34. Larue RTHM, van Timmeren JE, de Jong EEC, et al. Influence of gray level discretization on radiomic feature stability for different CT scanners, tube currents and slice thicknesses: a comprehensive phantom study. Acta Oncol (Madr). 2017;56:1544–53.
35. Loi S, Mori M, Benedetti G, et al. Robustness of CT radiomic features against image discretization and interpolation in characterizing pancreatic neuroendocrine neoplasms. Phys Medica. 2020;76:125–33.
36. Das KP, Chandra J. A review on preprocessing techniques for noise reduction in PET-CT images for lung cancer. Lect Notes Data Eng Commun Technol. 2022;111:455–75.
37. Park S, Yoon JH, Joo I, et al. Image quality in liver CT: low-dose deep learning vs standard-dose model-based iterative reconstructions. Eur Radiol. 2022;32:2865–74.
38. Akagi M, Nakamura Y, Higaki T, Narita K, Honda Y, Zhou J, Yu Z, Akino N, Awai K. Deep learning reconstruction improves image quality of abdominal ultra-high-resolution CT. Eur Radiol. 2019;29:6163–71.
39. Hata A, Yanagawa M, Yoshida Y, Miyata T, Tsubamoto M, Honda O, Tomiyama N. Combination of deep learning-based denoising and iterative reconstruction for ultra-low-dose CT of the chest: image quality and Lung-RADS evaluation. Am J Roentgenol. 2020;215:1321–8.
40. Feng TS, Lian LA, Hong LJ, Jun LY, Dong PJ. Potential value of the PixelShine deep learning algorithm for increasing quality of 70 kVp+ASiR-V reconstruction pelvic arterial phase CT images. Jpn J Radiol. 2019;37:186–90.
41. Shin HC, Roth HR, Gao M, Lu L, Xu Z, Nogues I, Yao J, Mollura D, Summers RM. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans Med Imaging. 2016;35:1285–98.
42. Ribeiro E, Uhl A, Wimmer G, Häfner M. Transfer learning for colonic polyp classification using off-the-shelf CNN features. Lect Notes Comput Sci. 2016;10170 LNCS:1–13.
43. Zhi W, Wing H, Yueng F, Chen Z, Zandavi SM, Lu Z, Chung YY. Using transfer learning with convolutional neural networks to diagnose breast cancer from histopathological images. https://doi.org/10.1007/978-3-319-70093-9_71.
44. Trivizakis E, Ioannidis GS, Melissianos VD, Papadakis GZ, Tsatsakis A, Spandidos DA, Marias K. A novel deep learning architecture outperforming 'off-the-shelf' transfer learning and feature-based methods in the automated assessment of mammographic breast density. Oncol Rep. 2019. https://doi.org/10.3892/or.2019.7312.
45. Ioannidis GS, Trivizakis E, Metzakis I, Papagiannakis S, Lagoudaki E, Marias K. Pathomics and deep learning classification of a heterogeneous fluorescence histology image dataset. Appl Sci. 2021;11:3796.
46. Mormont R, Geurts P, Maree R. Comparison of deep transfer learning strategies for digital pathology. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. IEEE; 2018. p. 2343.
47. Kim HE, Cosa-Linan A, Santhanam N, Jannesari M, Maros ME, Ganslandt T. Transfer learning for medical image classification: a literature review. BMC Med Imaging. 2022;22:1–13.
48. Amyar A, Modzelewski R, Vera P, Morard V, Ruan S. Multi-task multi-scale learning for outcome prediction in 3D PET images. Comput Biol Med. 2022;151:106208.
49. Kainz P, Pfeiffer M, Urschler M. Semantic segmentation of colon glands with deep convolutional neural networks and total variation segmentation. 2015. https://doi.org/10.48550/arxiv.1511.06919.
50. Suganyadevi S, Seethalakshmi V, Balasamy K. A review on deep learning in medical image analysis. Int J Multimed Inf Retr. 2021;11:19–38.
51. Bovis K. Classification of mammographic breast density using a combined classifier paradigm. Med Image Underst Anal. 2002:1–4.
52. Rao T. Performance analysis of deep learning models using bagging ensemble.
53. Yurttakal AH, Erbay H, İkizceli T, Karaçavuş S, Biçer C. Classification of breast DCE-MRI images via boosting and deep learning based stacking ensemble approach. Adv Intell Syst Comput. 2021;1197 AISC:1125–32.
54. Armanious K, Jiang C, Fischer M, Küstner T, Hepp T, Nikolaou K, Gatidis S, Yang B. MedGAN: medical image translation using GANs. Comput Med Imaging Graph. 2020;79:101684.
55. Darzidehkalani E, Ghasemi-Rad M, van Ooijen PMA. Federated learning in medical imaging: Part I: toward multicentral health care ecosystems. J Am Coll Radiol. 2022;19:969–74.
56. Darzidehkalani E, Ghasemi-Rad M, van Ooijen PMA. Federated learning in medical imaging: Part II: methods, challenges and considerations. J Am Coll Radiol. 2022;19:975–82.
57. Hoffman RR, Mueller ST, Klein G, Litman J. Metrics for explainable AI: challenges and prospects. 2018. https://doi.org/10.48550/arxiv.1812.04608.
58. Doshi-Velez F, Kim B. Towards a rigorous science of interpretable machine learning. 2017. https://doi.org/10.48550/arxiv.1702.08608.
59. Lekadir K, Osuala R, Gallin C, et al. FUTURE-AI: guiding principles and consensus recommendations for trustworthy artificial intelligence in medical imaging. 2021. https://doi.org/10.48550/arxiv.2109.09658.
60. Angelov PP, Soares EA, Jiang R, Arnold NI, Atkinson PM. Explainable artificial intelligence: an analytical review. Wiley Interdiscip Rev Data Min Knowl Discov. 2021;11:e1424.
61. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision. IEEE; 2017. p. 618–26.
62. AI HLEG, European Commission. Ethics guidelines for trustworthy AI. 2019.

7 Data Preparation for AI Analysis

Andrea Barucci, Stefano Diciotti, Marco Giannelli, and Chiara Marzi

A. Barucci · C. Marzi
"Nello Carrara" Institute of Applied Physics, National Research Council of Italy (IFAC-CNR), Florence, Italy
e-mail: [email protected]; [email protected]

S. Diciotti
Department of Electrical, Electronic, and Information Engineering "Guglielmo Marconi", University of Bologna, Cesena, Italy
Alma Mater Research Institute for Human-Centered Artificial Intelligence, University of Bologna, Bologna, Italy
e-mail: [email protected]

M. Giannelli
Unit of Medical Physics, Pisa University Hospital "Azienda Ospedaliero-Universitaria Pisana", Pisa, Italy
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. M. E. Klontzas et al. (eds.), Introduction to Artificial Intelligence, Imaging Informatics for Healthcare Professionals, https://doi.org/10.1007/978-3-031-25928-9_7

7.1 Introduction

In the past decade, artificial intelligence (AI) has definitely prevailed as a "disruptive technology" with widespread applications in every field of human knowledge, from space travel to cultural heritage [1–3], through medicine and biology [4, 5]. Machine learning [6] and deep learning [7] techniques are at the heart of the current AI success in medicine, and they have proven that they can adapt to seemingly extremely distant tasks, such as recognising a hieroglyph in a photograph [1] or finding a cancerous lesion in medical images.

Historically, the first AI applications in medicine, thanks to a large amount of data, were focused on radiological imaging, specifically computed tomography (CT) and magnetic resonance (MR) imaging. Nowadays, AI involves most imaging techniques, including X-ray [8], positron emission tomography [9], ultrasound imaging [10], and digital pathology [11]. Initially, radiologically derived hand-crafted features, such as radiomic features [12], were analysed through machine learning techniques. More recently, deep learning networks that operate directly on images are also being used. Some AI applications are now widely used at the research level [13] or are being validated through clinical trials; others are already being used in clinical practice [14–17].

The potential of AI in clinical imaging is supported by a large number of studies (see, e.g., [18–22]), but at the same time some current limitations have become apparent, such as the lack of generalisation of the results obtained. Many studies have led to results that are only valid under the conditions in which they were performed and therefore cannot be directly exported to other clinical contexts. These limitations were immediately attributed to data quality, preprocessing, and algorithm architecture. It is worth noting that, in this scenario, the interaction between different experts, such as clinicians, engineers, physicists, and data scientists, becomes fundamental in defining a system consisting of data and algorithms of the highest quality and performance. Quality should become a feature of the entire system, taking into account the quality of the individual image, the quality of the image dataset, and the high potential of the AI algorithms. In this chapter, we will focus on the first two aspects. More specifically, we will discuss how data quality affects the results of AI, from the acquisition of the image and the information contained therein to the methods for preprocessing the data to create large, task-specific datasets for AI applications (Fig. 7.1).

Fig. 7.1 Overview of the impact of data quality and numerosity in clinical image analysis using AI

7.2 Data Quality and Numerosity

7.2.1 Intrinsic Image Quality

If we want to investigate how the quality of a medical image is related to the result of the subsequent AI analysis, we must first define what exactly we observe in an image. The latter is basically the result of a measurement of the object under examination, such as X-rays for a CT scan or electromagnetic waves for an MRI examination, reconstructed through various steps. First, there is the instrumentation (the scanner), which introduces its signature, the so-called hardware fingerprint, related to the hardware components (such as detectors for CT or coils for MRI). At the same time, the acquisition protocol introduces its idiosyncrasies (protocol signature) into the image, as a result of that particular procedure's ability to acquire and highlight information from the tissue under investigation. Moreover, the protocol signature has its own intrinsic specificity related to the actual software and hardware implementation of the protocol itself. The result of this measurement can then be processed by mathematical algorithms (reconstruction, filtering, noise suppression, etc.) to produce the graphical image we are used to, which in turn contains "traces" of the processing (software fingerprint) [23]. Although this final image makes it possible to analyse the physical, chemical, biochemical, and biological processes taking place inside the human body, the whole chain that constitutes the observation system is deeply involved in the obtained image. Each measurement is thus able to highlight the processes occurring within the observed object (in this case, a patient) but imprints its fingerprint on the images, which must be taken into account (Fig. 7.2).

Fig. 7.2 The medical imaging formation process. The final image is the result of different processes related to image acquisition and processing

7.2.2 Image Diagnostic Quality

As discussed earlier, the quality of a medical image depends on various factors (Fig. 7.2), including scanner hardware and acquisition protocol, as well as preprocessing (interpolation, normalisation, artefact removal, filtering, etc.). All these factors directly affect the "intrinsic" image quality [24], and several measurements have been used in the literature for this assessment [25], including the signal-to-noise ratio, contrast-to-noise ratio, signal uniformity, etc. While these indices are extremely useful and valid for describing the intrinsic characteristics of the image, they need to be supported by other information that takes into account the "diagnostic" quality of a medical image. The diagnostic quality of a medical image is defined as "the quality of decisions taken as a result of the image evaluation in terms of benefit for the patient (does the therapeutic treatment change? What is the effect on mortality? etc.)" [24]. Clearly, a radiologist does not necessarily need a "beautiful" image (assuming he can define what the term "beautiful" means), but rather the image that provides the most useful and effective representation for his purposes, such as the identification and classification of a lesion or pathological condition. Moreover, given the same diagnostic quality, thanks to his experience, the clinical context, and the capabilities of the human mind, a radiologist is able to respond to the specific task even with two images of different intrinsic quality. Of course, diagnostic quality remains more difficult to achieve for highly complex clinical questions, for example, when looking for small structural or functional changes. Therefore, in radiology, since human observers are the final recipients of the visual information, subjective evaluation of image quality is currently considered the most reliable approach [26].

7.2.3 Image Quality for AI Analyses

Machine learning and deep learning algorithms are agnostic, i.e. they can be used to answer questions that are very far apart. This does not mean that the same model trained to recognise hieroglyphs, for example, can be applied to recognise pathology in an MR image, but rather that the structure of the algorithm (e.g. the architecture of a neural network) can address different problems through specific training. Their versatility makes them powerful tools but, at the same time, makes their optimisation and adaptation essential for working efficiently on specific questions. Achieving this goal requires a training phase in which the AI algorithms learn from data to model the problem. Essentially, it is about finding a function that describes the data based upon the data itself; as Vladimir Vapnik said, "learning is a problem of function estimation based upon empirical data."

In general, machine learning algorithms, and deep learning networks in particular, require extensive datasets. The amount of data needed to train and test a machine learning algorithm depends on many factors, including the complexity of the problem (i.e. the unknown function that relates the input variables to the output variable) and the complexity of the learning algorithm (i.e. the algorithm used to inductively learn the unknown mapping function). Tasks with strongly predictive variables require fewer samples to obtain well-performing models than tasks with weakly informative variables or variables mixed with noise. Moreover, a large dataset is usually more statistically representative of the problem in terms of population, and algorithms trained on it are more robust to errors in the dataset. In contrast, small datasets are highly sensitive to poor homogeneity, typically suffer from overfitting problems and low generalisability [27], and often report unrealistic performance. When the number of "available" images is limited, as in some clinical applications, the requirement of a homogeneous dataset becomes stringent in order to reduce the impact of confounding factors such as acquisition protocols, preprocessing, etc. An example of the impact of data quality and data volume on the performance of a machine learning algorithm for a classification task is reported in Fig. 7.3.


Fig. 7.3 An example of the impact of data quality and numerosity on the performance of a machine learning algorithm for a classification task. Each point represents a patient with a colour referring to three different pathologies (yellow, green, and purple). The sketch shows that, in the presence of few data with low quality, the algorithm is unable to separate the pathologies. By simultaneously improving the quality and number of data, the algorithm’s performance increases

since, on the one hand, each image of the dataset has sufficient diagnostic quality to solve the problem, and, on the other hand, there are be no sources of unwanted variability between the different images, due to differences in intrinsic quality, that could confuse the AI algorithms. In medical imaging, acquiring a set of images with intrinsic “constant” quality means examining all the subjects in the same setting, with the same scanner, the same acquisition protocol, carrying out the same preprocessing, etc. In this scenario, it is not easy to obtain a large dataset. Therefore, in recent years, many studies have combined data and images collected in different ways (different acquisition institutes, scanners, acquisition protocols, etc.) to obtain a multicentre dataset that is clinically representative of the population to be analysed. Thereby, each image showed sufficient diagnostic quality but different intrinsic quality as a function of the different acquisition protocols and processing parameters. Therefore, the process of appropriately combining data from different sources, known as data pooling, is becoming fundamental to the success of AI in radiology and

140

A. Barucci et al.

is currently one of the steps that healthcare professionals and researchers need to handle carefully. In summary, the image quality for AI analyses focuses not only on the single image, but on the quality of the entire dataset. The whole dataset should be informative and contain biological, medical, and clinical information. There is a principle in information theory known as GIGO (garbage in–garbage out), which states that the outgoing information quality from an algorithm cannot exceed that of incoming information, meaning that it is not possible to extract information where there is none. This principle outlines the importance of data quality and the risk of obtaining random results using inappropriate AI tools. This is particularly true for deep learning, which, due to its power and an intrinsic complex interpretation of results, can lead to incorrect results beyond the control of experts [28].

7.3 Data Preprocessing for Machine Learning Analyses

In machine learning, data pre-processing is a crucial step because the quality of the data directly affects the learning ability of the model [29]. In the following, we briefly describe the most common preprocessing steps for tabular data; a short code sketch covering the first three steps is given after their description.

Missing Values Imputation  Many clinical and image datasets contain missing values, frequently coded as blanks, NaNs, or other placeholders. However, many machine learning models cannot use such datasets directly because they assume the presence of all values. Discarding entire rows and/or columns with missing values is a possible solution for working with incomplete datasets; however, this approach further reduces the size of the dataset. Imputation of the missing values, i.e. their inference from the known portion of the data, is a preferable approach [30]. One type of imputation algorithm is univariate, where the values in a specific feature are imputed using the statistics (mean, median, or most frequent value) of the non-missing values of that feature. In contrast, multivariate imputation procedures use the entire set of available features to estimate the missing values, modelling each feature with missing values as a function of the other features and using that estimate for imputation [31].

Encoding Categorical Features  Encoding categorical data (e.g., gender, education, drug level, etc.) is a process that transforms categorical data into numerical data that can be provided to the machine learning models [32]. Label or ordinal encoding is used when the categorical variables are ordinal; ordinal encoding converts each categorical label into integer values, respecting the sequence of the labels. By contrast, the one-hot encoding strategy converts each category into binary numbers (0 or 1). This type of encoding is used when the data are nominal. The newly created binary features can be considered dummy variables; after one-hot encoding, the number of dummy variables depends on the number of categories present in the data [32].

Standardisation  Standardisation of datasets is a common requirement for many machine learning models, which might behave badly if individual features have very different ranges. One way to standardise the data is to remove the mean value from each feature and divide it by the standard deviation, where the mean value and standard deviation are calculated across samples. Other examples of data standardisation are detailed in reference [33].

Multicentre Data Harmonisation  Pooling data from multiple institutions and hospitals provides an opportunity to assemble more extensive and diverse groups of subjects [34–37], increases statistical power [35, 38–41], and allows for the study of rare disorders and subtle effects [42, 43]. However, a major drawback of combining data across sites is the introduction of confounding effects due to non-biological variability in the data, usually related to the hardware and the data acquisition protocol. For example, in an MRI study, scanner properties such as field strength, radiofrequency coil type, gradient coil characteristics, hardware, image reconstruction algorithm, and non-standardised acquisition protocol parameters can introduce unwanted technical variability, which is also reflected in MRI-derived features [44–46]. Harmonisation of multicentre data, defined as applying mathematical and statistical concepts to reduce unwanted site variability while preserving biological content, is therefore necessary to ensure the success of cooperative analyses. Several harmonisation techniques exist, including functional normalisation [47], Removal of Artificial Voxel Effect by Linear regression (RAVEL) [48], global scaling [49, 50], and ComBat [51, 52].
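The following scikit-learn sketch combines univariate imputation, one-hot encoding, and standardisation; the toy clinical table and its column names are illustrative assumptions, and scikit-learn's IterativeImputer could replace the univariate SimpleImputer for multivariate imputation.

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({"age": [63, np.nan, 71], "sex": ["F", "M", np.nan],
                   "feature": [1.2, 0.7, np.nan]})   # toy clinical table

numeric = Pipeline([("impute", SimpleImputer(strategy="median")),
                    ("scale", StandardScaler())])
categorical = Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                        ("encode", OneHotEncoder(handle_unknown="ignore"))])

prep = ColumnTransformer([("num", numeric, ["age", "feature"]),
                          ("cat", categorical, ["sex"])])
X = prep.fit_transform(df)   # in practice, fit on the training set only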

Dimensionality Reduction  Dimensionality reduction refers to the process of reducing the number of features in a dataset while keeping as much of the variation in the original dataset as possible. Dimensionality reduction can manage multicollinearity of the features and remove noise in the data. From a machine learning analysis perspective, a lower number of features means less training time and less computational power. It also avoids the potential problem of overfitting, leading to an increase in overall performance. Principal component analysis (PCA) is a linear dimensionality reduction technique that transforms a set of correlated variables into a smaller number of uncorrelated variables, called principal components, while retaining as much of the variation in the original dataset as possible [53]. Other linear dimensionality reduction methods are factor analysis (FA) [54] and linear discriminant analysis (LDA) [55].

Feature Selection  In machine learning and statistics, feature selection is the process of selecting a subset of relevant features for use in model construction. In medicine and health care, feature selection is advantageous because it enables the interpretation of the machine learning model and the discovery of new potential biomarkers related to a specific disorder or condition [56]. Feature selection methods can be grouped into three categories: filter methods, wrapper methods, and embedded methods [57, 58]. In a filter method, features are selected based on the general characteristics of the dataset, without using any predictive model. In a wrapper method, the feature selection algorithm is wrapped around the predictive model algorithm, and the same model is used to select the best features [59]. In embedded methods, the feature selection process is integrated into the model learning phase by using algorithms that have their own built-in feature selection, such as the classification and regression tree (CART) and least absolute shrinkage and selection operator (LASSO) algorithms.
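A brief scikit-learn sketch of these ideas follows: PCA for dimensionality reduction, a univariate filter, a wrapper (recursive feature elimination), and an embedded LASSO-based selector; the toy dataset and parameter values are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import Lasso, LogisticRegression

X, y = make_classification(n_samples=100, n_features=50, random_state=0)

X_pca = PCA(n_components=10).fit_transform(X)                  # principal components
X_filter = SelectKBest(f_classif, k=10).fit_transform(X, y)    # filter method
X_wrapper = RFE(LogisticRegression(max_iter=1000),
                n_features_to_select=10).fit_transform(X, y)   # wrapper method
X_embedded = SelectFromModel(Lasso(alpha=0.05)).fit_transform(X, y)  # embedded method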

7.3.1 The Machine Learning Pipeline

Training and testing machine learning models requires choosing a proper validation scheme that handles the data splitting (e.g., hold-out, cross-validation (CV), bootstrap, etc.). This choice is crucial to avoid data leakage, by ensuring that the model is built on training data and evaluated on test data never seen during the learning phase. Indeed, data leakage, which occurs when information from outside the training set is used to create the model, can lead to falsely overestimated performance in the test set (see, e.g., [60, 61]). In this view, all preprocessing steps involving more than one sample (e.g. some types of missing value imputation, standardisation, multicentre data harmonisation, dimensionality reduction, feature selection, etc.) should be fitted only on the training data and subsequently applied to the test data. In medicine and health care, where relatively small datasets are usually available, the straightforward hold-out validation scheme is rarely applied. In contrast, CV and its nested version (nested CV), for hyperparameter optimisation of the entire workflow [62–64], are frequently preferred. Repeated CV or repeated nested CV is also suggested for improving the reproducibility of the entire machine learning system [63]. In all these validation schemes, several training and test procedures are performed on different data splits, underlining the need for a compact code structure to avoid errors that may lead to data leakage. All things considered, machine learning pipelines are a solution because they orchestrate all the processing steps and the actual model in a short, easier-to-read, and easier-to-maintain code structure.


Fig. 7.4 Scheme of a machine learning pipeline consisting of two preprocessing steps (i.e., transformers #1 and #2) and one prediction step (i.e., estimator). Using the pipeline, preprocessing is performed on the training data only, regardless of the validation scheme selected (e.g., hold-out, nested hold-out, cross-validation (CV), nested CV). Reprinted from [52]

A pipeline represents the entire data workflow, combining all preprocessing steps and the training of the machine learning model. It makes it possible to automate an end-to-end training/test process without any form of data leakage, and it improves reproducibility, ease of deployment, and code reuse, especially when complex validation schemes are needed (Fig. 7.4).
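A minimal scikit-learn sketch of the pipeline in Fig. 7.4 follows, with two transformers and one estimator; under cross-validation, both transformers are re-fitted on the training folds only at every split. The toy dataset and parameter values are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=140, random_state=0)

pipe = Pipeline([
    ("scaler", StandardScaler()),      # transformer #1
    ("selector", SelectKBest(k=20)),   # transformer #2
    ("clf", SVC()),                    # estimator
])
scores = cross_val_score(pipe, X, y, cv=5)   # preprocessing stays inside the folds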

7.3.2 The Machine Learning Pipeline: A Case Study

To highlight the importance of performing all preprocessing steps on the training data, which is needed in order to avoid data leakage and the overestimation of workflow performance on the test data, we present the following case study. From the MR T1-weighted scans of 86 healthy subjects belonging to the International Consortium for Brain Mapping (ICBM) dataset [65], we estimated radiomic [66] and fractal descriptors [67–71] of the brain cortical grey matter, for a total of 140 MRI-derived features. By setting an arbitrary age threshold (in this case, 45 years), each subject was labelled 0 or 1 depending on their age. With a hold-out validation scheme (80% of the subjects in the training set and 20% in the test set), we predicted the age class using a Support Vector classifier (with the Scikit-learn version 1.0.2 default hyperparameters) trained on the radiomic and fractal features. We performed several data preprocessing procedures, each run either on the entire dataset or on the training set only: data standardisation, feature selection, and data standardisation followed by feature selection. In Table 7.1, we show the following classification scores estimated in the test set: area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, and specificity. All the scores obtained by performing the preprocessing steps on the entire dataset are higher than those estimated by running the preprocessing only on the training data and then applying it to the test data. This is clear evidence of how the incorrect application of preprocessing leads to data leakage, falsely inflating the performance of machine learning models.

Table 7.1 Age class prediction scores in the test set

                                          AUROC   Accuracy   Sensitivity   Specificity
Standardisation
  Entire dataset                           0.74     0.78        0.77          0.80
  Training set                             0.71     0.72        0.69          0.80
Feature selection
  Entire dataset                           0.86     0.67        0.86          0.55
  Training set                             0.58     0.50        0.71          0.37
Standardisation and feature selection
  Entire dataset                           0.92     0.83        0.86          0.82
  Training set                             0.81     0.72        0.71          0.73

AUROC area under the receiver operating characteristic
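A small sketch reproducing the same leakage effect on synthetic data follows: fitting the scaler and selector on the entire dataset (wrong) versus keeping them inside a pipeline fitted on the training portion only (correct). The data are random stand-ins with no real signal, so the honest test score should stay near chance level.

import numpy as np
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(86, 140))        # 86 subjects, 140 MRI-derived features
y = rng.integers(0, 2, size=86)       # random age-class labels: no real signal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Wrong: the selector sees the test labels through the full dataset.
X_leaky = SelectKBest(k=10).fit_transform(StandardScaler().fit_transform(X), y)
# ... splitting X_leaky afterwards yields optimistically biased test scores.

# Correct: all preprocessing is fitted on the training set only.
pipe = Pipeline([("scale", StandardScaler()),
                 ("select", SelectKBest(k=10)),
                 ("clf", SVC())]).fit(X_tr, y_tr)
print(pipe.score(X_te, y_te))         # stays near chance (~0.5), as it should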


References

1. Barucci A, Cucci C, Franci M, Loschiavo M, Argenti F. A deep learning approach to ancient Egyptian hieroglyphs classification. IEEE Access. 2021;9:123438–47.
2. Cucci C, Barucci A, Stefani L, Picollo M, Jiménez-Garnica R, Fuster-Lopez L. Reflectance hyperspectral data processing on a set of Picasso paintings: which algorithm provides what? A comparative analysis of multivariate, statistical and artificial intelligence methods. In: Groves R, Liang H, editors. Optics for arts, architecture, and archaeology VIII. Bellingham: SPIE; 2021. p. 1.
3. Li Z, Shen H, Cheng Q, Liu Y, You S, He Z. Deep learning based cloud detection for medium and high resolution remote sensing images of different sensors. ISPRS J Photogramm Remote Sens. 2019;150:197–212.
4. Scapicchio C, Gabelloni M, Barucci A, Cioni D, Saba L, Neri E. A deep look into radiomics. Radiol Med. 2021;126(10):1296–311.
5. Hamet P, Tremblay J. Artificial intelligence in medicine. Metabolism. 2017;69:36–40.
6. Jordan MI, Mitchell TM. Machine learning: trends, perspectives, and prospects. Science. 2015;349(6245):255–60.
7. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44.
8. Ismael AM, Şengür A. Deep learning approaches for COVID-19 detection based on chest X-ray images. Expert Syst Appl. 2021;164:114054.
9. Ding Y, Sohn JH, Kawczynski MG, Trivedi H, Harnish R, Jenkins NW, et al. A deep learning model to predict a diagnosis of Alzheimer disease by using 18F-FDG PET of the brain. Radiology. 2019;290(2):456–64.
10. Van Sloun RJ, Cohen R, Eldar YC. Deep learning in ultrasound imaging. Proc IEEE. 2019;108(1):11–29.
11. Deng S, Zhang X, Yan W, Chang EI, Fan Y, Lai M, et al. Deep learning in digital pathology image analysis: a survey. Front Med. 2020;14(4):470–87.
12. Guiot J, Vaidyanathan A, Deprez L, Zerka F, Danthine D, Frix A, et al. A review in radiomics: making personalized medicine a reality via routine imaging. Med Res Rev. 2022;42(1):426–40.
13. MONAI Consortium. Project MONAI. Zenodo; 2020. https://zenodo.org/record/4323059.
14. van Leeuwen KG, Schalekamp S, Rutten MJCM, van Ginneken B, de Rooij M. Artificial intelligence in radiology: 100 commercially available products and their scientific evidence. Eur Radiol. 2021;31(6):3797–804.
15. imbio. https://www.imbio.com.
16. BRAINOMIX. https://www.brainomix.com.
17. Goebel J, Stenzel E, Guberina N, Wanke I, Koehrmann M, Kleinschnitz C, et al. Automated ASPECT rating: comparison between the Frontier ASPECT Score software and the Brainomix software. Neuroradiology. 2018;60(12):1267–72.
18. Ciulli S, Citi L, Salvadori E, Valenti R, Poggesi A, Inzitari D, et al. Prediction of impaired performance in trail making test in MCI patients with small vessel disease using DTI data. IEEE J Biomed Health Inform. 2016;20(4):1026–33.
19. Yagis E, De Herrera AGS, Citi L. Generalization performance of deep learning models in neurodegenerative disease classification. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). San Diego: IEEE; 2019. p. 1692–8. https://ieeexplore.ieee.org/document/8983088/.
20. Bertelli E, Mercatelli L, Marzi C, Pachetti E, Baccini M, Barucci A, et al. Machine and deep learning prediction of prostate cancer aggressiveness using multiparametric MRI. Front Oncol. 2022;11:802964.
21. Trajkovic J, Di Gregorio F, Ferri F, Marzi C, Diciotti S, Romei V. Resting state alpha oscillatory activity is a valid and reliable marker of schizotypy. Sci Rep. 2021;11(1):10379.
22. Marzi C, d'Ambrosio A, Diciotti S, Bisecco A, Altieri M, Filippi M, et al. Prediction of the information processing speed performance in multiple sclerosis using a machine learning approach in a large multicenter magnetic resonance imaging data set. Hum Brain Mapp. 2022:26106.
23. Barca P, Marfisi D, Marzi C, Cozza S, Diciotti S, Traino AC, et al. A voxel-based assessment of noise properties in computed tomography imaging with the ASiR-V and ASiR iterative reconstruction algorithms. Appl Sci. 2021;11(14):6561.
24. Coppini G, Diciotti S, Valli G. Bioimmagini. 3rd ed. Bologna: Pàtron; 2012.
25. Ding Y. Visual quality assessment for natural and medical image. Cham: Springer; 2018.
26. Lévêque L, Outtas M, Liu H, Zhang L. Comparative study of the methodologies used for subjective medical image quality assessment. Phys Med Biol. 2021;66(15):15TR02.
27. Geirhos R, Temme CR, Rauber J, Schütt HH, Bethge M, Wichmann FA. Generalisation in humans and deep neural networks. Adv Neural Inf Process Syst. 2018;31:7549–61.
28. Barucci A, Neri E. Adversarial radiomics: the rising of potential risks in medical imaging from adversarial learning. Eur J Nucl Med Mol Imaging. 2020;47(13):2941–3.
29. Marfisi D, Tessa C, Marzi C, Del Meglio J, Linsalata S, Borgheresi R, et al. Image resampling and discretization effect on the estimate of myocardial radiomic features from T1 and T2 mapping in hypertrophic cardiomyopathy. Sci Rep. 2022;12(1):10186.
30. Little RJA, Rubin DB. Statistical analysis with missing data. 3rd ed. Hoboken: Wiley; 2020. p. 1.
31. Rubin DB, editor. Multiple imputation for nonresponse in surveys. Hoboken: Wiley; 1987.
32. Cohen P, West SG, Aiken LS. Applied multiple regression/correlation analysis for the behavioral sciences. London: Psychology Press; 2014.
33. Raju VNG, Lakshmi KP, Jain VM, Kalidindi A, Padma V. Study the influence of normalization/transformation process on the accuracy of supervised classification. In: 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT). Tirunelveli: IEEE; 2020. p. 729–35.
34. Pomponio R, Erus G, Habes M, Doshi J, Srinivasan D, Mamourian E, et al. Harmonization of large MRI datasets for the analysis of brain imaging patterns throughout the lifespan. NeuroImage. 2020;208:116450.
35. Radua J, Vieta E, Shinohara R, Kochunov P, Quidé Y, Green MJ, et al. Increased power by harmonizing structural MRI site differences with the ComBat batch adjustment method in ENIGMA. NeuroImage. 2020;218:116956.
36. Fortin JP, Cullen N, Sheline YI, Taylor WD, Aselcioglu I, Cook PA, et al. Harmonization of cortical thickness measurements across scanners and sites. NeuroImage. 2018;167:104–20.
37. Fortin JP, Parker D, Tunç B, Watanabe T, Elliott MA, Ruparel K, et al. Harmonization of multi-site diffusion tensor imaging data. NeuroImage. 2017;161:149–70.
38. Beer JC, Tustison NJ, Cook PA, Davatzikos C, Sheline YI, Shinohara RT, et al. Longitudinal ComBat: a method for harmonizing longitudinal multi-scanner imaging data. NeuroImage. 2020;220:117129.
39. Keshavan A, Paul F, Beyer MK, Zhu AH, Papinutto N, Shinohara RT, et al. Power estimation for non-standardized multisite studies. NeuroImage. 2016;134:281–94.
40. Pinto MS, Paolella R, Billiet T, Van Dyck P, Guns PJ, Jeurissen B, et al. Harmonization of brain diffusion MRI: concepts and methods. Front Neurosci. 2020;14:396.
41. Suckling J, Ohlssen D, Andrew C, Johnson G, Williams SCR, Graves M, et al. Components of variance in a multicentre functional MRI study and implications for calculation of statistical power. Hum Brain Mapp. 2008;29(10):1111–22.
42. Dansereau C, Benhajali Y, Risterucci C, Pich EM, Orban P, Arnold D, et al. Statistical power and prediction accuracy in multisite resting-state fMRI connectivity. NeuroImage. 2017;149:220–32.
43. Yu M, Linn KA, Cook PA, Phillips ML, McInnis M, Fava M, et al. Statistical harmonization corrects site effects in functional connectivity measurements from multi-site fMRI data. Hum Brain Mapp. 2018;39(11):4213–27.
44. Han X, Jovicich J, Salat D, van der Kouwe A, Quinn B, Czanner S, et al. Reliability of MRI-derived measurements of human cerebral cortical thickness: the effects of field strength, scanner upgrade and manufacturer. NeuroImage. 2006;32(1):180–94.
45. Jovicich J, Czanner S, Greve D, Haley E, van der Kouwe A, Gollub R, et al. Reliability in multi-site structural MRI studies: effects of gradient non-linearity correction on phantom and human data. NeuroImage. 2006;30(2):436–43.
46. Takao H, Hayashi N, Ohtomo K. Effect of scanner in longitudinal studies of brain volume changes. J Magn Reson Imaging. 2011;34(2):438–44.
47. Fortin JP, Triche TJ, Hansen KD. Preprocessing, normalization and integration of the Illumina human methylation EPIC array with minfi. Bioinformatics. 2016;33(4):558–60.
48. Fortin JP, Sweeney EM, Muschelli J, Crainiceanu CM, Shinohara RT. Removing inter-subject technical variability in magnetic resonance imaging studies. NeuroImage. 2016;132:198–212.
49. Cleveland WS. LOWESS: a program for smoothing scatterplots by robust locally weighted regression. Am Stat. 1981;35(1):54.
50. Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003;19(2):185–93.
51. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostat Oxf Engl. 2007;8(1):118–27.
52. Marzi C, Giannelli M, Barucci A, Tessa C, Mascalchi M, Diciotti S. Efficacy of MRI data harmonization in the age of machine learning. A multicenter study across 36 datasets. 2022.
53. Jolliffe IT, Cadima J. Principal component analysis: a review and recent developments. Philos Trans R Soc A Math Phys Eng Sci. 2016;374(2065):20150202.
54. Lord FM, Wainer H, Messick S, editors. Principals of modern psychological measurement: a Festschrift for Frederic M[ather] Lord. Hillsdale: Erlbaum; 1983. p. 377.
55. Duda RO, Hart PE, Stork DG. Pattern classification. 2nd ed. New York: Wiley; 2001. p. 654.
56. Remeseiro B, Bolon-Canedo V. A review of feature selection methods in medical applications. Comput Biol Med. 2019;112:103375.
57. Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res. 2003;3(Mar):1157–82.
58. Stańczyk U. Feature evaluation by filter, wrapper, and embedded approaches. In: Stańczyk U, Jain LC, editors. Feature selection for data and pattern recognition. Berlin: Springer; 2015. p. 29–44.
59. Kohavi R, John GH. Wrappers for feature subset selection. Artif Intell. 1997;97(1–2):273–324.

150

A. Barucci et al.

60. Yagis E, Atnafu SW, Seco G, de Herrera A, Marzi C, Scheda R, Giannelli M, et al. Effect of data leakage in brain MRI classification using 2D convolutional neural networks. Sci Rep. 2021;11(1):22544. 61. Tampu IE, Eklund A, Haj-Hosseini N. Inflation of test accuracy due to data leakage in deep learning-based classification of OCT images. Sci Data. 2022;9(1):580. 62. Müller AC, Guido S. Introduction to machine learning with Python: a guide for data scientists. 1st ed. Sebastopol: O’Reilly Media; 2016. p. 376. 63. Scheda R, Diciotti S. Explanations of machine learning models in repeated nested cross-validation: an application in age prediction using brain complexity features. Appl Sci. 2022;12(13):6681. 64. Varma S, Simon R. Bias in error estimation when using cross-validation for model selection. BMC Bioinf. 2006;7(1):91. 65. 1000 functional connectomes project (FPC). https://fcon_1000.projects.nitrc.org/fcpClassic/FcpTable.html. 66. van Griethuysen JJM, Fedorov A, Parmar C, Hosny A, Aucoin N, Narayan V, et al. Computational radiomics system to decode the radiographic phenotype. Cancer Res. 2017;77(21):e104–7. 67. Marzi C, Ciulli S, Giannelli M, Ginestroni A, Tessa C, Mascalchi M, et al. Structural complexity of the cerebellum and cerebral cortex is reduced in spinocerebellar ataxia type 2. J Neuroimaging Off J Am Soc Neuroimaging. 2018;28(6):688–93. 68. Pantoni L, Marzi C, Poggesi A, Giorgio A, De Stefano N, Mascalchi M, et al. Fractal dimension of cerebral white matter: a consistent feature for prediction of the cognitive performance in patients with small vessel disease and mild cognitive impairment. NeuroImage Clin. 2019;24:101990. 69. Marzi C, Giannelli M, Tessa C, Mascalchi M, Diciotti S. Toward a more reliable characterization of fractal properties of the cerebral cortex of healthy subjects during the lifespan. Sci Rep. 2020;10(1):16957. 70. Marzi C, Giannelli M, Tessa C, Mascalchi M, Diciotti S. Fractal analysis of MRI data at 7 T: how much complex is the cerebral cortex? IEEE Access. 2021;9:69226–34. 71. Pani J, Marzi C, Stensvold D, Wisløff U, Håberg AK, Diciotti S. Longitudinal study of the effect of a 5-year exercise intervention on structural brain complexity in older adults. A generation 100 substudy. NeuroImage. 2022;2022:119226.

8 Current Applications of AI in Medical Imaging

Gianfranco Di Salle, Salvatore Claudio Fanni, Gayane Aghakhanyan, and Emanuele Neri

G. Di Salle · S. C. Fanni · E. Neri
Department of Translational Research, Academic Radiology, University of Pisa, Pisa, Italy
e-mail: [email protected]; [email protected]

G. Aghakhanyan
Department of Translational Research, University of Pisa, Pisa, Italy

8.1 Introduction

In recent years, a growing interest in artificial intelligence (AI) has been observed, and its use has been investigated in a variety of clinical contexts and applications. Oncologic imaging is undeniably the most investigated application field of AI, in the forms of radiomics-based machine learning (ML) and deep learning (DL), already well described in the previous chapters. However, non-oncologic imaging has not been spared by the AI breakthrough either, as demonstrated by the increasing number of research studies and clinically applicable tools. AI applications have been tested at virtually all stages of the imaging pipeline.

These stages range from exam modality and protocol selection to data acquisition, image reconstruction, image processing, image interpretation, and reporting. DL models have also been trained to understand and generate natural language. Such models may be useful as clinical decision support tools providing guidance on imaging appropriateness, and they may even be able to generate radiological reports from keywords provided by the radiologist. However, AI applications in diagnostic radiology are mostly aimed at reasoning and perception tasks, based on the interpretation of sensory information. AI could be particularly useful for the detection of findings, either as an additional reader providing a second interpretation or as the sole reader in low-resource regions where the availability of radiologists is limited. As proof of its value, in 2021 the WHO recommended the use of AI-powered computer-aided detection (CAD) software for screening and triage purposes in countries suffering from a high burden of pulmonary tuberculosis. AI becomes even more valuable in classification tasks, discriminating between benign and malignant tumors or between different histological subtypes of malignancy. Radiomics-based algorithms are designed for diagnostic, prognostic, and classification tasks, and their implementation in clinical practice may be accelerated by AI-based segmentation. Segmentation is the process of tagging the diagnostic image and dividing it into subregions representing organ structures and components, tumor masses, and their necrotic, proliferating, hypervascular, and quiescent cellular compartments. Lesion contouring is often useful for assessing disease severity and prognosis, but it is a time-consuming and operator-dependent task, which hampers the diffusion of handcrafted segmentation protocols for diagnosis and follow-up. Indeed, nearly all radiomics research studies ground their pipelines in a solid segmentation, which is essential for the correct calculation of radiomic features. This is why AI-based automated segmentation may improve the efficiency and methodological reliability of radiomic studies themselves.
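To make the segmentation-to-radiomics pipeline just described more concrete, the fragment below sketches how radiomic features can be computed from an image and a (possibly AI-generated) segmentation mask with the open-source PyRadiomics library. It is a minimal illustration rather than a validated workflow: the file names are hypothetical placeholders, and default extraction settings are assumed.

```python
# Minimal sketch: radiomic feature extraction from a segmented lesion
# with PyRadiomics. "image.nii.gz" and "mask.nii.gz" are hypothetical,
# co-registered files; label 1 marks the lesion in the mask.
from radiomics import featureextractor

# Default settings compute shape, first-order, and texture features.
extractor = featureextractor.RadiomicsFeatureExtractor()

features = extractor.execute("image.nii.gz", "mask.nii.gz", label=1)

# Entries prefixed "original_" are the features themselves; the rest
# are diagnostic metadata about the extraction run.
for name, value in features.items():
    if name.startswith("original_"):
        print(name, value)
```

Because the feature values depend directly on the mask, any error made by an automated segmentation model propagates into every downstream radiomic analysis, which is why segmentation quality control remains essential.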


This brief chapter aims to summarize the most up-to-date applications of AI in medical imaging, dividing them by the specific task they are conceived for: lesion detection, disease classification, and organ/tumor segmentation. Instead of listing all approaches proposed to date in the literature (an unproductive exercise, as their number is constantly growing), we discuss the most relevant contributions for each task, while trying to highlight arguments for and against their translation into clinical practice.

8.2 Detection

Lesion detection in medical images can be tricky in day-to-day clinical practice, especially for incidental findings on exams performed for other purposes. Less conspicuous findings may also escape the reporter’s attention, for example in the emergency setting, where all stages of the diagnostic exam have to be performed very quickly. Therefore, AI algorithms designed to automatically detect imaging findings can help improve the sensitivity of human reporting. Examples of the use of DL in disease detection include pulmonary embolism (PE) in urgent CT pulmonary angiograms (CTPAs) [1, 2], as well as large vessel occlusion (LVO) in noncontrast head CT scans (NCCT) of acute stroke patients [3]. Detection of intracranial hemorrhage (ICH) was investigated through DL in works by Ginat [52] and by Seyam et al. [4]. Oncologic imaging is, of course, one of the fields that benefit most from AI, in that new algorithms aim to automatically detect new lesions and may profoundly impact oncologic screening and follow-up. Automatic tumor detection is mostly pursued using DL techniques, especially CNNs. Neuroradiology already counts a large number of applications, mostly based on MRI [5, 6] and nuclear medicine [7], owing to the functional and multiparametric information provided by these techniques. By contrast, computed tomography data are more frequently used to train algorithms for body applications, such as colorectal [8], liver [9], and ovarian [10] cancer. Of note, interesting emerging applications use DL algorithms to detect pancreatic neoplasms in endoscopic ultrasound videos [11].
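As a purely illustrative sketch of how such detection systems are often framed, the fragment below scores a single image patch for the presence of a finding with a 3D convolutional classifier from the open-source MONAI framework. The network weights, the patch, and the decision logic are hypothetical; real CAD systems add candidate generation, trained weights, calibration, and extensive validation.

```python
# Minimal sketch: lesion detection framed as patch classification with
# a 3D DenseNet from MONAI. Weights are random here; a real system
# would load trained weights and scan candidate patches of the volume.
import torch
from monai.networks.nets import DenseNet121

model = DenseNet121(spatial_dims=3, in_channels=1, out_channels=2)
model.eval()

# One intensity-normalized CT patch: (batch, channel, x, y, z).
patch = torch.randn(1, 1, 64, 64, 64)

with torch.no_grad():
    probs = torch.softmax(model(patch), dim=1)

# probs[0, 1] is the estimated probability that the patch holds a lesion.
print("lesion probability:", float(probs[0, 1]))
```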


In the surgical field, DL is increasingly applied to detect critical events in the intraoperative setting, a task called “surgical phase recognition” or “surgical workflow recognition.” This task is currently recognized as potentially critical to improving workflow optimization, intraoperative planning, surgical training, and patient safety. As for the tasks discussed in the following paragraphs, the average reported performance of these algorithms was remarkably good. Their integration into clinical practice, however, depends on their generalizability, which in turn is a function of the methodological quality of the research, especially in terms of external validation.
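The external-validation step mentioned above can be illustrated with a minimal, hypothetical sketch: a model is fitted on an internal cohort and then evaluated, untouched, on an external one. The arrays below are random placeholders; on real data, a drop in external performance relative to internal cross-validation signals limited generalizability.

```python
# Minimal sketch: train on an internal cohort, evaluate untouched on an
# external cohort. With random placeholder data the external AUC hovers
# around 0.5 by construction.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X_int, y_int = rng.normal(size=(200, 20)), rng.integers(0, 2, 200)
X_ext, y_ext = rng.normal(size=(80, 20)), rng.integers(0, 2, 80)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_int, y_int)  # the external cohort plays no role in training

auc = roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1])
print(f"external AUC: {auc:.2f}")
```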

8.3 Classification

The devil is in the details, as the famous idiom says. Diagnostic strategies in medicine, including imaging, are all designed to collect enough information to discriminate among physiological and pathological, benign and malignant, stable and quickly evolving clinical conditions. As the number of available diagnostic modalities increases, complex data integration is needed to subdivide patients into homogeneous groups benefitting from similar therapies, sharing a similar prognosis, and characterized by a similar response to treatments. Disease classification is one of the most investigated AI-based tasks, as witnessed by the high number of applications proposed in the literature and tested in hospital workflows. Most available algorithms are based on supervised learning, as they require large datasets of labeled data in order to infer predictions about new cases. An example of the use of AI algorithms in image-based disease classification is Alzheimer’s disease (AD), the most common cause of dementia among older adults. MR images were used as training data for distinguishing AD patients from healthy controls in works by Lu et al. [12] and Qiu et al. [13]. Nuclear medicine data have been used for the same research question by Alongi et al. [14], reaching good accuracy.
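The supervised workflow just described can be illustrated with a minimal sketch: labeled cases are used to fit a classifier, its behavior on unseen cases is estimated by cross-validation, and it then predicts a new case. All features and labels below are random placeholders standing in for, e.g., imaging features of AD patients and controls; this is not the method of the cited studies.

```python
# Minimal sketch of the supervised workflow: labeled cases in, a fitted
# model out, a prediction for a new unseen case. Data are placeholders
# (e.g., imaging features; 0 = control, 1 = AD).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X = rng.normal(size=(150, 30))    # one row of features per patient
y = rng.integers(0, 2, size=150)  # diagnostic label per patient

clf = make_pipeline(StandardScaler(), SVC())

# Cross-validation estimates performance on cases the model never saw.
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())

clf.fit(X, y)
new_case = rng.normal(size=(1, 30))
print("predicted class for a new case:", clf.predict(new_case)[0])
```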


Another field of AI application in diagnostic classification is Parkinson’s disease (PD), a prevalent neurodegenerative disorder especially affecting motor and cognitive domains. In routine clinical practice, PD patients need to be distinguished in a timely manner from patients with atypical parkinsonian syndromes (APS) and from healthy controls, so that they can undergo the most appropriate clinical management. For this purpose, MRI-based [15] and nuclear medicine-based [16, 17] approaches have been proposed. Good accuracy has also been obtained in the validation of AI-based classification algorithms for glioma molecular subtypes (Jiang et al. 2021, [18]) and grading [19], and for breast tissue density [20]. Further oncological applications include computer-aided colorectal polyp classification [21], classification of the origin of brain metastases [22], and malignancy assessment of breast masses [23]. In summary, the prospect of classifying patients based on AI-powered elaboration of imaging data is enticing, in oncological as well as non-oncological settings. The most plausible and direct way to integrate this information into clinical practice is to use it as a support for clinical diagnosis, considering it alongside multimodal data while keeping the physician at the center of the diagnostic process. Conversely, the lack of human and financial resources, the tendency toward centralization of care, and the overload of existing infrastructure are urging decision-makers to implement rapid and easy-to-automate solutions, possibly skipping human supervision or intervention. Of note, it is currently unclear what amount, and even what type, of evidence is needed to accomplish such a transition and safely integrate AI-based automation into diagnostic imaging.

8.4 Segmentation

As mentioned above, segmentation is a very useful activity for calculating quantitative indices to guide disease diagnosis and prognosis, but also for extracting radiomic features from the segmented image. Manual segmentation is not widespread in routine clinical practice because it is time-consuming and operator-dependent.


This is why many AI tools for the semi- and fully automated contouring of lesions, organs, and tissues have recently been implemented. Automatic segmentation for volume quantification has recently been introduced to improve the selection of ischemic stroke patients for fibrinolytic and endovascular therapy. A large number of clinical trials, including EXTEND [24], EXTEND-IA [25], DAWN [26], and DEFUSE [27], were based on the use of a single software package for the analysis of CT perfusion imaging (RAPID-AI, iSchemaView), aimed at contouring and measuring the ischemic core and penumbra. Since then, an increasing number of software applications have been developed to serve as therapeutic decision support in stroke patient triage. The widespread diffusion of this software in clinical practice is grounded in specific American Heart Association (AHA) guidelines [28] and has profoundly impacted the workflow of stroke patients, with considerable advantages in terms of efficiency gains and time savings [29]. However, most of this software lacks scientific validation by competent regulatory authorities, and concerns can be raised regarding output variability across different vendors, software packages, and calculation methods. Segmentation is also the most time-consuming task in many diagnostic cardiac CT and MRI applications, where structural volume measurements throughout the cardiac cycle can give valuable information about anatomy and physiology [30]. Dedicated commercial algorithms are widespread in most diagnostic centers and are mostly intended for the measurement of cardiac chamber volumes [31, 32] and myocardial thickness [33], and for great vessel/coronary artery segmentation [34]. In a recent paper, Monti et al. [35] implemented a CNN-based approach to measure aortic diameters at the nine landmark sites proposed by the AHA for the follow-up of aneurysm growth and the prediction of rupture risk. Automatic segmentation is also a matter of current investigation in oncologic imaging, where tumor volumetric parameters give invaluable information about staging, radiation therapy dosimetry, prognosis, and treatment response.


Investigated organs include the brain [36], liver [37], lung [38], breast [39], head and neck [40], rectum [41], and stomach [42].
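Many of the cited approaches rely on encoder-decoder architectures of the U-Net family (e.g., [37, 38, 41]). The fragment below is a minimal, hypothetical sketch of one training step for a 3D U-Net with a Dice loss using the open-source MONAI framework; the volumes are random placeholders, and a real pipeline would add preprocessing, augmentation, and validation.

```python
# Minimal sketch: one training step of a 3D U-Net with Dice loss
# (MONAI). Volumes are random placeholders standing in for a CT volume
# and its ground-truth organ/lesion mask.
import torch
from monai.losses import DiceLoss
from monai.networks.nets import UNet

model = UNet(
    spatial_dims=3,
    in_channels=1,               # one grayscale modality, e.g., CT
    out_channels=2,              # background vs. structure of interest
    channels=(16, 32, 64, 128),  # feature maps per resolution level
    strides=(2, 2, 2),           # downsampling between levels
)
loss_fn = DiceLoss(to_onehot_y=True, softmax=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

image = torch.randn(1, 1, 96, 96, 96)            # toy input volume
label = torch.randint(0, 2, (1, 1, 96, 96, 96))  # toy ground truth

loss = loss_fn(model(image), label)
loss.backward()
optimizer.step()
print("Dice loss after one step:", float(loss))
```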

8.4.1 Monitoring

Emerging evidence is being collected on the potential of monitoring oncologic patients during follow-up to derive prognostic information. For example, radiomics-based [43] and neural network-based (Trebeschi et al. [53]) models have been developed to predict the survival of patients with metastatic urothelial cancer from follow-up whole-body CT images. Integrating such algorithms into clinical practice could add valuable information to the established response evaluation criteria and ultimately influence the therapeutic management of oncologic patients.
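As a minimal, hypothetical sketch of how follow-up imaging features can be linked to survival, the fragment below fits a Cox proportional-hazards model with the open-source lifelines library. The DataFrame columns are invented placeholders; the cited models [43, 53] are considerably more sophisticated.

```python
# Minimal sketch: linking follow-up imaging features to survival with a
# Cox proportional-hazards model (lifelines). All columns are invented
# placeholders.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "volume_change": rng.normal(size=100),    # e.g., from follow-up CT
    "baseline_burden": rng.normal(size=100),  # e.g., baseline tumor load
    "months": rng.exponential(12, size=100),  # follow-up duration
    "event": rng.integers(0, 2, size=100),    # 1 = death observed
})

cph = CoxPHFitter()
cph.fit(df, duration_col="months", event_col="event")
cph.print_summary()  # hazard ratio per imaging feature
```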

8.4.2 Prediction

Disease outcome information is never just binary, as disease severity classes profoundly influence quality of life, organizational needs, and healthcare costs. Within comparable disease durations, how the patient will live with the disease is valuable information, allowing the focus of healthcare to shift toward qualitative interventions in disease management. For example, several attempts have recently been made to predict long-term motor outcome in PD patients using baseline nuclear medicine data. More specifically, multi-dimensional databases of clinical and radiological information were used to predict the year-4 motor UPDRS-III score [44, 45]. A recent study by Salmanpour et al. [46] identified three distinct disease clusters in the longitudinal progression of PD by training an algorithm on a clinical-imaging dataset. These studies represent only a small part of AI-powered innovation in medical imaging, but they are likely to have a preferential impact on future healthcare organization, and especially on resource allocation.
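Framed as a regression problem, such outcome prediction can be sketched in a few lines: baseline clinical-imaging features in, a continuous future motor score out. Everything below is a random placeholder and is not the method of the cited studies.

```python
# Minimal sketch: outcome prediction as regression, mapping baseline
# clinical-imaging features to a continuous year-4 motor score. All
# data are random placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.normal(size=(120, 25))    # baseline imaging + clinical features
y = rng.normal(30, 10, size=120)  # future motor score to predict

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=7)
reg = GradientBoostingRegressor(random_state=7).fit(X_tr, y_tr)
print("MAE on held-out patients:",
      mean_absolute_error(y_te, reg.predict(X_te)))
```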

8.4.3 Additional Applications

8.4.3.1 Image Enhancement and Reconstruction

In recent years, the field of medical image processing has been revolutionized by the availability of AI-powered image enhancement and reconstruction tools. Image enhancement is the process of reducing erroneous information within medical images, e.g., denoising, artifact correction, and resolution enhancement. Conversely, image reconstruction is the process of transforming one series of images into another. Despite receiving less attention than clinical diagnostic tasks, the use of AI for image quality enhancement and reconstruction may have even greater potential in terms of imaging cost reduction and safety. Two reports have recently highlighted the good performance obtained by state-of-the-art techniques [47, 48]. In particular, the dose reduction obtained using AI algorithms in pediatric radiology has been quantified at 36–70% [47]. In adult applications, such as abdominal CT scans for the detection of urinary tract calculi, reduction rates of up to 84% compared with standard iterative reconstruction (IR) algorithms have been documented with similar image quality [48]. The available literature suggests that DL-based reconstruction may also overcome the “waxy/plastic” appearance [49] of the newer low-dose reconstruction algorithms while simultaneously providing better signal-to-noise and contrast-to-noise ratios (SNR and CNR).
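As a minimal, hypothetical sketch of the learning principle behind DL-based denoising, the fragment below trains a small convolutional network to map simulated low-dose slices to their higher-quality counterparts. Commercial DL reconstruction operates on raw projection data and is far more elaborate.

```python
# Minimal sketch: a small CNN learns to map noisy ("low-dose") slices
# to their clean counterparts; the noise here is simulated Gaussian noise.
import torch
import torch.nn as nn

denoiser = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, kernel_size=3, padding=1),
)
optimizer = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

clean = torch.rand(8, 1, 128, 128)             # toy "standard-dose" slices
noisy = clean + 0.1 * torch.randn_like(clean)  # simulated low-dose noise

loss = nn.functional.mse_loss(denoiser(noisy), clean)
loss.backward()
optimizer.step()
print("MSE after one step:", float(loss))
```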

8.4.4 Workload Reduction?

Intuitively, AI applications should simplify decision-making and reduce radiologists’ workload by automating complex calculations, time-consuming handcrafted segmentations, and careful comparison with the literature and textbooks for diagnosis and classification. Curiously, this seems not to be the case. A literature review by Kwee and Kwee [50] reports that novel applications of AI in radiology increase radiologists’ workload in approximately 48% of the analyzed studies and decrease it in only 4%. On the one hand, the common goal of AI applications is enhancing diagnostic accuracy and patient care; on the other hand, the radiology field is not exempt from the inverse relationship between decision accuracy and decision speed [51].


AI may thus improve medical decision accuracy by powering segmentation, classification, and detection while, at the same time, worsening it by pushing radiologists’ workload to and beyond the limits of their optimal functioning. In light of these considerations, as direct recipients of the AI-based revolution in radiology, we should be aware that this revolution can only take place if appropriate resources and organizational support are ensured for radiologists.

8.5 Conclusions

The current literature is rich in potential AI applications in medical imaging, involving multiple imaging modalities, covering all body regions, and using innumerable variants of technically refined ML algorithms. Across the available works, there is considerable variability in the explainability of the AI methods, the quality of the ML design, and the amount of evidence for single conditions. All of these factors influence the potential of individual models to be translated into the clinical workflow. Model generalizability and rigorous validation techniques are essential for integration into real-world clinical scenarios. As these criteria become more widely recognized, and as biobanks start collecting publicly available imaging data, sparse stand-alone AI integration experiments will hopefully converge into large trials with unquestionable generalizability and outcomes scalable to clinical routine. The simultaneous, across-the-board discussion of the ethical and regulatory aspects of AI not only influences the availability of the proposed techniques, but also shapes current and future research and defines the role of technological advancements in clinical decision-making.


References

1. Weikert T, Winkel DJ, Bremerich J, Stieltjes B, Parmar V, Sauter AW, et al. Automated detection of pulmonary embolism in CT pulmonary angiograms using an AI-powered algorithm. Eur Radiol. 2020;30(12):6545–53. https://doi.org/10.1007/s00330-020-06998-0.
2. Schmuelling L, Franzeck FC, Nickel CH, Mansella G, Bingisser R, Schmidt N, et al. Deep learning-based automated detection of pulmonary embolism on CT pulmonary angiograms: no significant effects on report communication times and patient turnaround in the emergency department nine months after technical implementation. Eur J Radiol. 2021;141:109816.
3. Olive-Gadea M, Crespo C, Granes C, Hernandez-Perez M, Pérez de la Ossa N, Laredo C, et al. Deep learning based software to identify large vessel occlusion on noncontrast computed tomography. Stroke. 2020;51:3133–7. https://doi.org/10.1161/STROKEAHA.120.030326.
4. Seyam M, Weikert T, Sauter A, Brehm A, Psychogios MN, Blackham KA. Utilization of artificial intelligence-based intracranial hemorrhage detection on emergent noncontrast CT images in clinical workflow. Radiol Artif Intell. 2022;4(2):1–6. https://doi.org/10.1148/ryai.210168.
5. Yang S, Yoon HI, Kim JS. Deep-learning-based automatic detection and segmentation of brain metastases with small volume for stereotactic ablative radiotherapy. Cancers. 2022;14:2555.
6. Turk O, Ozhan D, Acar E, Cetin T, Yilmaz M. Automatic detection of brain tumors with the aid of ensemble deep learning architectures and class activation map indicators by employing magnetic resonance images. Z Med Phys. 2022; https://doi.org/10.1016/j.zemedi.2022.11.010.
7. Rahimpour M, Boellaard R, Jentjens S, Deckers W, Goffin K, Koole M. A multi-label CNN model for the automatic detection and segmentation of gliomas using [18F]FET PET imaging. Eur J Nucl Med Mol Imaging. 2023; https://doi.org/10.1007/s00259-023-06193-5.
8. Akilandeswari A, Sungeetha D, Joseph C, Thaiyalnayaki K, Baskaran K, Ramalingam RJ, et al. Automatic detection and segmentation of colorectal cancer with deep residual convolutional neural network. Evid Based Complement Alternat Med. 2022;2022:3415603.
9. Othman E, Mahmoud M, Dhahri H, Abdulkader H, Mahmood A, Ibrahim M. Automatic detection of liver cancer using hybrid pretrained models. Sensors. 2022;22:5429.
10. Wang X, Li H, Zheng P. Automatic detection and segmentation of ovarian cancer using a multitask model in pelvic CT images. Oxid Med Cell Longev. 2022;2022:6009107.


11. Jaramillo M, Ruano J, Gómez M, Romero E. Automatic detection of pancreatic tumors in endoscopic ultrasound videos using deep learning techniques. In: Medical imaging 2022: ultrasonic imaging and tomography, vol. 12038. Bellingham, WA: SPIE; 2022. p. 106–15.
12. Lu D, Popuri K, Ding GW, Balachandar R, Beg MF; Alzheimer’s Disease Neuroimaging Initiative. Multimodal and multiscale deep neural networks for the early diagnosis of Alzheimer’s disease using structural MR and FDG-PET images. Sci Rep. 2018;8:5697.
13. Qiu S, Joshi PS, Miller MI, Xue C, Zhou X, Karjadi C, et al. Development and validation of an interpretable deep learning framework for Alzheimer’s disease classification. Brain. 2020;143(6):1920–33. https://doi.org/10.1093/brain/awaa137.
14. Alongi P, Laudicella R, Panasiti F, Stefano A, Comelli A, Giaccone P, et al. Radiomics analysis of brain [18F]FDG PET/CT to predict Alzheimer’s disease in patients with amyloid PET positivity: a preliminary report on the application of SPM cortical segmentation, pyradiomics and machine-learning analysis. Diagnostics. 2022;12(4):933. https://doi.org/10.3390/diagnostics12040933.
15. Shinde S, Prasad S, Saboo Y, Kaushick R, Saini J, Pal PK, et al. Predictive markers for Parkinson’s disease using deep neural nets on neuromelanin sensitive MRI. NeuroImage Clin. 2019;22:101748. https://doi.org/10.1016/j.nicl.2019.101748.
16. Zhao Y, Wu P, Wu J, Brendel M, Lu J, Ge J, et al. Decoding the dopamine transporter imaging for the differential diagnosis of parkinsonism using deep learning. Eur J Nucl Med Mol Imaging. 2022;49(8):2798–811. https://doi.org/10.1007/s00259-022-05804-x.
17. Salmanpour MR, Shamsaei M, Saberi A, Hajianfar G, Soltanian-Zadeh H, Rahmim A. Robust identification of Parkinson’s disease subtypes using radiomics and hybrid machine learning. Comput Biol Med. 2021;129:104142. https://doi.org/10.1016/j.compbiomed.2020.104142.
18. Hsu WW, Guo JM, Pei L, Chiang LA, Li YF. A weakly supervised deep learning-based method for glioma subtype classification using WSI and mpMRIs. Sci Rep. 2022;12:6111. https://doi.org/10.1038/s41598-022-09985-1.
19. Yu X, Wu Y, Bai Y, Han H, Chen L, Gao H, et al. A lightweight 3D UNet model for glioma grading. Phys Med Biol. 2022;67:155006.
20. Magni V, Interlenghi M, Cozzi A, Alì M, Salvatore C, Azzena AA, et al. Development and validation of an AI-driven mammographic breast density classification tool based on radiologist consensus. Radiol Artif Intell. 2022;4(2):e210199. https://doi.org/10.1148/ryai.210199.
21. Younas F, Usman M, Yan WQ. A deep ensemble learning method for colorectal polyp classification with optimized network parameters. Appl Intell. 2023;53:2410–33.
22. Jiao T, Li F, Cui Y, Wang X, Li B, Shi F, et al. Deep learning with an attention mechanism for differentiating the origin of brain metastasis using MR images. J Magn Reson Imaging. 2023; https://doi.org/10.1002/jmri.28695.
23. Abdel Rahman AS, Belhaouari SB, Bouzerdoum A, Baali H, Alam T, Eldaraa AM. Breast mass tumor classification using deep learning. In: 2020 IEEE international conference on informatics, IoT, and enabling technologies (ICIoT), Doha, Qatar; 2020. p. 271–6. https://doi.org/10.1109/ICIoT48696.2020.9089535.
24. Ma H, Campbell BCV, Parsons MW, et al. Thrombolysis guided by perfusion imaging up to 9 hours after onset of stroke. N Engl J Med. 2019;380:1795–803. https://doi.org/10.1056/NEJMoa1813046.
25. Campbell BC, Mitchell PJ, Kleinig TJ, Dewey HM, Churilov L, Yassi N, et al.; EXTEND-IA Investigators. Endovascular therapy for ischemic stroke with perfusion-imaging selection. N Engl J Med. 2015;372(11):1009–18. https://doi.org/10.1056/NEJMoa1414792.
26. Nogueira RG, Jadhav AP, Haussen DC, Bonafe A, Budzik RF, Bhuva P, et al.; DAWN Trial Investigators. Thrombectomy 6 to 24 hours after stroke with a mismatch between deficit and infarct. N Engl J Med. 2018;378(1):11–21. https://doi.org/10.1056/NEJMoa1706442.
27. Albers GW, Marks MP, Kemp S, Christensen S, Tsai JP, Ortega-Gutierrez S, et al.; DEFUSE 3 Investigators. Thrombectomy for stroke at 6 to 16 hours with selection by perfusion imaging. N Engl J Med. 2018;378(8):708–18. https://doi.org/10.1056/NEJMoa1713973.
28. Powers WJ, Rabinstein AA, Ackerson T, Adeoye OM, Bambakidis NC, Becker K, et al. Guidelines for the early management of patients with acute ischemic stroke: 2019 update to the 2018 guidelines for the early management of acute ischemic stroke: a guideline for healthcare professionals from the American Heart Association/American Stroke Association. Stroke. 2019;50(12):e344–418. https://doi.org/10.1161/STR.0000000000000211.
29. Vagal A, Saba L. Artificial intelligence in “code stroke”—a paradigm shift: do radiologists need to change their practice? Radiol Artif Intell. 2022;4(2):6–8. https://doi.org/10.1148/ryai.210204.
30. Yang DH. Application of artificial intelligence to cardiovascular computed tomography. Korean J Radiol. 2021;22(10):1597–608. https://doi.org/10.3348/kjr.2020.1314.
31. Bruns S, Wolterink JM, Takx RAP, van Hamersvelt RW, Suchá D, Viergever MA, et al. Deep learning from dual-energy information for whole-heart segmentation in dual-energy and single-energy non-contrast-enhanced cardiac CT. Med Phys. 2020;47:5048–60.
32. Baskaran L, Maliakal G, Al’Aref SJ, Singh G, Xu Z, Michalak K, et al. Identification and quantification of cardiovascular structures from CCTA: an end-to-end, rapid, pixel-wise, deep-learning method. JACC Cardiovasc Imaging. 2020;13:1163–71.
33. Koo HJ, Lee JG, Ko JY, Lee G, Kang JW, Kim YH, et al. Automated segmentation of left ventricular myocardium on cardiac computed tomography using deep learning. Korean J Radiol. 2020;21:660–9.
34. Morris ED, Ghanem AI, Dong M, Pantelic MV, Walker EM, Glide-Hurst CK. Cardiac substructure segmentation with deep learning for improved cardiac sparing. Med Phys. 2020;47:576–86.
35. Monti CB, van Assen M, Stillman AE, Lee SJ, Hoelzer P, Fung GSK, et al. Evaluating the performance of a convolutional neural network algorithm for measuring thoracic aortic diameters in a heterogeneous population. Radiol Artif Intell. 2022;4(2):e210196. https://doi.org/10.1148/ryai.210196.
36. Chen W, Zhou W, Zhu L, Cao Y, Gu H, Yu B. MTDCNet: a 3D multi-threading dilated convolutional network for brain tumor automatic segmentation. J Biomed Inform. 2022;133:104173. https://doi.org/10.1016/j.jbi.2022.104173.
37. Manjunath RV, Kwadiki K. Modified U-NET on CT images for automatic segmentation of liver and its tumor. Biomed Eng Adv. 2022;4:100043. https://doi.org/10.1016/j.bea.2022.100043.
38. Yang J, Wu B, Li L, Cao P, Zaiane O. MSDS-UNet: a multi-scale deeply supervised 3D U-Net for automatic segmentation of lung tumor in CT. Comput Med Imaging Graph. 2021;92:101957. https://doi.org/10.1016/j.compmedimag.2021.101957.
39. Yue W, Zhang H, Zhou J, Li G. Deep learning-based automatic segmentation for size and volumetric measurement of breast cancer on magnetic resonance imaging. Front Oncol. 2022;12:984626. https://doi.org/10.3389/fonc.2022.984626.


40. Abed M, Khanapi M, Ghani A, Ibraheem R, Ahmed D, Khir M. Artificial neural networks for automatic segmentation and identification of nasopharyngeal carcinoma. J Comput Sci. 2017;21:263–74. https://doi.org/10.1016/j.jocs.2017.03.026.
41. Zhu H-T, Sun S. Automatic segmentation of rectal tumor on diffusion-weighted images by deep learning with U-Net. J Appl Clin Med Phys. 2021;22:324. https://doi.org/10.1002/acm2.13381.
42. Li H, Liu B, Zhang Y, Fu C, Han X, Du L. 3D IFPN: improved feature pyramid network for automatic segmentation of gastric tumor. Front Oncol. 2021;11:618496. https://doi.org/10.3389/fonc.2021.618496.
43. Park KJ, Lee JL, Yoon SK, Heo C, Park BW, Kim JK. Radiomics-based prediction model for outcomes of PD-1/PD-L1 immunotherapy in metastatic urothelial carcinoma. Eur Radiol. 2020;30(10):5392–403. https://doi.org/10.1007/s00330-020-06847-0.
44. Rahmim A, Huang P, Shenkov N, Fotouhi S, Davoodi-Bojd E, Lu L, et al. Improved prediction of outcome in Parkinson’s disease using radiomics analysis of longitudinal DAT SPECT images. NeuroImage Clin. 2017;16:539–44. https://doi.org/10.1016/j.nicl.2017.08.021.
45. Tang J, Yang B, Adams MP, Shenkov NN, Klyuzhin IS, Fotouhi S, et al. Artificial neural network-based prediction of outcome in Parkinson’s disease patients using DaTscan SPECT imaging features. Mol Imaging Biol. 2019;21(6):1165–73. https://doi.org/10.1007/s11307-019-01334-5.
46. Salmanpour MR, Shamsaei M, Hajianfar G, Soltanian-Zadeh H, Rahmim A. Longitudinal clustering analysis and prediction of Parkinson’s disease progression using radiomics and hybrid machine learning. Quant Imaging Med Surg. 2022;12(2):906–19. https://doi.org/10.21037/qims-21-425.
47. Ng CKC. Artificial intelligence for radiation dose optimization in pediatric radiology: a systematic review. Children. 2022;9(7):1–12. https://doi.org/10.3390/children9071044.
48. McLeavy CM, Chunara MH, Gravell RJ, Rauf A, Cushnie A, Staley Talbot C, et al. The future of CT: deep learning reconstruction. Clin Radiol. 2021;76(6):407–15. https://doi.org/10.1016/j.crad.2021.01.010.
49. Laurent G, Villani N, Hossu G, Rauch A, Noël A, Blum A, et al. Full model-based iterative reconstruction (MBIR) in abdominal CT increases objective image quality, but decreases subjective acceptance. Eur Radiol. 2019;29(8):4016–25. https://doi.org/10.1007/s00330-018-5988-8.
50. Kwee TC, Kwee RM. Workload of diagnostic radiologists in the foreseeable future based on recent scientific advances: growth expectations and role of artificial intelligence. Insights Imaging. 2021;12:88. https://doi.org/10.1186/s13244-021-01031-4.
51. Alexander R, Waite S, Bruno MA, Krupinski EA, Berlin L, Macknik S, et al. Mandating limits on workload, duty, and speed in radiology. Radiology. 2022;304(2):274–82. https://doi.org/10.1148/radiol.212631.
52. Ginat DT. Analysis of head CT scans flagged by deep learning software for acute intracranial hemorrhage. Neuroradiology. 2020;62(3):335–40. https://doi.org/10.1007/s00234-019-02330-w.
53. Trebeschi S, Bodalal Z, van Dijk N, Boellaard TN, Apfaltrer P, Tareco Bucho TM, et al. Development of a prognostic AI-monitor for metastatic urothelial cancer patients receiving immunotherapy. Front Oncol. 2021;11:637804. https://doi.org/10.3389/fonc.2021.637804.