Basics of Image Processing: The Facts and Challenges of Data Harmonization to Improve Radiomics Reproducibility 9783031484452, 9783031484469

This book, endorsed by EuSoMII, provides clinicians, researchers and scientists a useful handbook to navigate the intric


English Pages 218 [169] Year 2024


Table of contents:
Preface
Contents
1: Era of AI Quantitative Imaging
1.1 Precision Medicine Needs Precision Imaging
1.2 Transforming Clinical Care from Qualitative to Quantitative
1.2.1 Automated Methods Capable of Quantifying Imaging Features Related to Clinical Endpoints
1.2.2 AI-Based Methods as Gold-Standard for Imaging Biomarkers
Image Acquisition and Reconstruction
Image Harmonization
Image Synthesis for Data Augmentation
Image Segmentation
Extraction of Deep Features
AI Models for Prediction of Clinical Endpoints
Integration of Imaging, Clinical, Biological and Pathology Data
References
2: Principles of Image Formation in the Different Modalities
2.1 Ionizing Radiation Imaging
2.1.1 X-Ray Beam Generation
X-Ray Tube
Generator
2.1.2 Radiation-Patient Interaction
Photoelectric Effect
Compton Effect
2.1.3 Image Acquisition and Reconstruction
Computed Tomography
Reconstruction Algorithms
2.1.4 Image Quality
Spatial Resolution
Contrast Resolution
Image Noise
Artifacts
2.2 Nuclear Medicine Imaging
2.2.1 Radiopharmaceuticals
2.2.2 Physics Concepts in PET: Decay, Annihilation, and Coincidences
Radioactive Decay
Electron–Positron Annihilation
Scattered Coincidences
Random Coincidences
Multiple Coincidences
2.2.3 PET Detector Materials
2.2.4 Image Acquisition and Reconstruction
2.2.5 Image Quality
2.3 Magnetic Resonance Imaging
2.3.1 Hardware Components of MR
Magnetic Field Magnet
Magnetic Field Gradient Magnets
Radiofrequency Coils
2.3.2 Physical Basis
Nuclear Spin and Magnetic Moment
Precession and Larmor Frequency
Parallel or Antiparallel Alignment
Resonance and Nutation Motion
Longitudinal Relaxation: T1
Transverse Relaxation: T2 and T2*
Proton Density Image (PD)
T1, T2 or PD Weighted Images
2.3.3 Image Acquisition and Reconstruction
K-Space
2.3.4 Image Quality
Signal-to-Noise Ratio (SNR)
Spatial Resolution
Contrast-to-Noise Ratio (CNR)
Image Acquisition Time
References
3: How to Extract Radiomic Features from Imaging
3.1 Introduction to Radiomic Analysis
3.2 Deep Learning vs. Traditional Machine Learning
3.3 Radiomic Features Extraction Process
3.3.1 Image Preprocessing
3.3.2 Image Segmentation
3.3.3 Feature Extraction and Selection
3.3.4 Standardization
3.4 Deep Learning Radiomic Features
3.4.1 Deep Learning Radiomics and Hand-Crafted Radiomics
References
4: Facts and Needs to Improve Radiomics Reproducibility
4.1 Introduction
4.2 Factors Influencing Reproducibility
4.2.1 Acquisition
4.2.2 Segmentation
4.2.3 Radiomic Features Extraction
4.2.4 Model Construction
4.3 How to Improve Reproducibility
4.3.1 Guidelines and Checklists
4.3.2 Code and Development Platforms
4.4 Recommendations for Achieving Clinical Adoption of Radiomics
References
5: Data Harmonization to Address the Non-biological Variances in Radiomic Studies
5.1 Non-biological Variances in Radiomic Analysis
5.2 Data Harmonization
5.2.1 Data Harmonization in Radiomics Studies
5.2.2 Automatic Harmonization Schemes
5.2.3 Automatic Harmonization Approaches
Location and Scale Methods
Clustering Methods
Matching Methods
Synthesis Methods
Invariant Representation Learning Methods
5.3 Challenges for Data Harmonization
References
6: Harmonization in the Image Domain
6.1 The Need for Image Harmonization
6.2 Image Variability Sources
6.2.1 Image Acquisition
6.3 Harmonization Techniques
6.3.1 Non-AI Methods
Intensity Scaling
Z-Score Normalization
Histogram Equalization
Histogram Matching
6.3.2 AI Methods
Autoencoders
Generative Adversarial Networks (GANs)
Applications and Other Approaches
6.4 Conclusions
References
7: Harmonization in the Features Domain
7.1 Introduction
7.2 Reproducibility of Radiomic Features
7.2.1 Imaging Data Reproducibility
Image Acquisition and Reconstruction Parameters
CT Scans
PET Scans
MRI Sequences
Intra-individual Test-Retest Repeatability
Multi-scanner Reproducibility
7.2.2 Segmentation Reproducibility
7.2.3 Post-processing and Feature Extraction
7.2.4 Reporting Reproducibility
7.3 Normalization Techniques
7.3.1 Statistical Normalization
7.3.2 ComBat
7.3.3 Deep Learning Approaches
7.4 Strategies Overview
References


Imaging Informatics for Healthcare Professionals

Ángel Alberich-Bayarri Fuensanta Bellvís-Bataller Editors

Basics of Image Processing The Facts and Challenges of Data Harmonization to Improve Radiomics Reproducibility

Imaging Informatics for Healthcare Professionals

Series Editors
Peter M. A. van Ooijen, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Erik R. Ranschaert, Department of Radiology, ETZ Hospital, Tilburg, The Netherlands
Annalisa Trianni, Department of Medical Physics, ASUIUD, Udine, Italy
Michail E. Klontzas, Institute of Computer Science, Foundation for Research and Technology (FORTH) & University Hospital of Heraklion, Heraklion, Greece

The series Imaging Informatics for Healthcare Professionals is the ideal starting point for physicians, residents, and students in radiology and nuclear medicine who wish to learn the basics of the different areas of medical imaging informatics. Each volume is a short pocket-sized book designed for easy learning and reference. The scope of the series is based on the Medical Imaging Informatics subsections of the European Society of Radiology (ESR) European Training Curriculum, as proposed by the ESR and the European Society of Medical Imaging Informatics (EuSoMII). The series, which is endorsed by EuSoMII, will cover the curricula for Undergraduate Radiological Education and for the level I and II training programmes. The curriculum for the level III training programme will be covered at a later date. The series will offer frequent updates as and when new topics arise.


Editors Ángel Alberich-Bayarri Founder and CEO Quibim SL Valencia, Spain

Fuensanta Bellvís-Bataller VP of Clinical Studies Quibim SL Valencia, Spain

ISSN 2662-1541  ISSN 2662-155X (electronic)
Imaging Informatics for Healthcare Professionals
ISBN 978-3-031-48445-2  ISBN 978-3-031-48446-9 (eBook)
https://doi.org/10.1007/978-3-031-48446-9

© EuSoMII 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland. Paper in this product is recyclable.

Preface

In the rapidly evolving landscape of medical imaging and cancer research, radiomics has emerged as a promising field with the potential to revolutionize diagnosis and treatment and to improve patient outcomes. Radiomics delves into the extraction of quantitative features from medical images with the aim of transforming them into actionable predictions. With the emergence of radiomics come associated challenges that could hinder the growth of the field in clinical practice. One such challenge is data harmonization, which plays an essential role in ensuring the reproducibility, robustness, and generalizability of radiomic studies. In an ideal scenario, images acquired under the same conditions should exhibit consistent technical quality with minimal deviations. However, in a real-world, multi-centric setting, various factors come into play, including different manufacturers, scanners, and a high degree of variability originating from the different sites where the images are acquired. These variations inevitably lead to shifts in the signal intensities of the image voxels, even when similar acquisition protocols are employed. When different manufacturers, scanners, or acquisition protocols are involved, the resulting image variability can impact the development of generalizable AI predictive models. Additionally, it can affect the reproducibility of quantitative imaging biomarker calculations.

This book delves into this crucial aspect of radiomics reproducibility, aiming to provide a comprehensive exploration of the intricacies surrounding data harmonization. We bring together a diverse group of experts from the fields of radiology, engineering, data science, and oncology, who collectively share their invaluable insights and experiences. Our journey begins with a foundational understanding of radiomics and its transformative potential in the realm of precision medicine. We explore the principles of image formation in the different modalities, followed by the methodologies employed in radiomic feature extraction and the significant strides that have been made in this field. The book then navigates through the factors influencing reproducibility in radiomic studies, ending with the fundamental principles of data harmonization in both the image and feature domains. We also recognize the limitations and potential biases inherent in the methodologies described and emphasize the need for a balanced and nuanced approach to data harmonization that depends on the specific application, available data, and resources. Furthermore, we address the critical role of standardization and the initiatives that have been undertaken to establish guidelines and best practices in radiomics research. We acknowledge that collaboration and open data sharing strategies are vital components to foster reproducibility and accelerate progress in this field.

We express our sincere gratitude to all the authors who have generously shared their expertise, experiences, and passion in the creation of this book. Additionally, we extend our appreciation to the readers whose curiosity and interest will drive progress in this field. Together, let us embark on a journey into the captivating world of data harmonization, where we strive to enhance radiomics reproducibility and make a meaningful clinical impact on diagnosis, prognosis, and treatment planning.

Valencia, Spain  Ángel Alberich-Bayarri
Valencia, Spain  Fuensanta Bellvís-Bataller

Contents

1 Era of AI Quantitative Imaging (L. Marti-Bonmati and L. Cerdá-Alberich) 1
2 Principles of Image Formation in the Different Modalities (P. A. García-Higueras and D. Jimena-Hermosilla) 27
3 How to Extract Radiomic Features from Imaging (A. Jimenez-Pastor and G. Urbanos-García) 61
4 Facts and Needs to Improve Radiomics Reproducibility (P. M. A. van Ooijen, R. Cuocolo, and N. M. Sijtsema) 79
5 Data Harmonization to Address the Non-biological Variances in Radiomic Studies (Y. Nan, X. Xing, and G. Yang) 95
6 Harmonization in the Image Domain (F. Garcia-Castro and E. Ibor-Crespo) 117
7 Harmonization in the Features Domain (J. Lozano-Montoya and A. Jimenez-Pastor) 145


1 Era of AI Quantitative Imaging

L. Marti-Bonmati and L. Cerdá-Alberich

1.1 Precision Medicine Needs Precision Imaging

In the quest for personalized healthcare, precision medicine has emerged as a transformative approach, tailoring treatments to the unique genetic expression and characteristics of each patient. By considering subject variability in genes, environmental exposure, and lifestyle, healthcare professionals aim to enhance the efficacy of therapies and minimize potential side effects. For years, diseases were considered to affect patients in a similar way, and medical treatments were mainly designed for the "average patient." As a result of this one-size-fits-all approach, treatments are very successful for some patients but not for others, because patient and disease heterogeneity are always present. While advancements in genomics have been at the forefront of this revolution, another crucial aspect that stands to reshape the landscape of precision medicine is precision imaging. Genetic information alone

L. Marti-Bonmati (*) · L. Cerdá-Alberich
Biomedical Imaging Research Group (GIBI230), La Fe Health Research Institute, Valencia, Spain
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
Á. Alberich-Bayarri, F. Bellvís-Bataller (eds.), Basics of Image Processing, Imaging Informatics for Healthcare Professionals, https://doi.org/10.1007/978-3-031-48446-9_1


may not provide a comprehensive understanding of the patient's condition and the disease's intricacies. Precision imaging may serve as the gateway to unveiling these hidden facets. The drivers of precision medicine innovation are health data accessibility and biological treatment developments. First, more and more data with biopathological understanding is becoming available, the use of real-world (RW) imaging data is increasing, and data extraction and analytical tools are improving, making available data that would be very difficult or infeasible to obtain under the basic design of traditional randomized controlled trials (RCTs). Second, more therapy choices exist, from novel classes of targeted therapies to pharmacokinetic tailoring, such as blood transfusion type matching or cancer therapies targeted by molecular phenotyping.

Conventional medical imaging has long been a cornerstone of clinical practice, offering valuable glimpses into the inner workings of the human body. X-rays, computed tomography (CT) scans, and magnetic resonance imaging (MRI) have proven invaluable in diagnosing and monitoring diseases. However, precision medicine demands a higher level of detail, one that transcends what traditional imaging modalities can provide. In recent years, advancements in imaging technology have enabled us to visualize molecular and cellular processes in unprecedented ways. Positron emission tomography (PET) scans combined with radiotracers can detect metabolic changes at the cellular level. Functional MRI (fMRI) can map neural activity, allowing us to understand brain function better. Techniques like diffusion tensor imaging (DTI) offer insights into the microstructure of tissues and nerve fiber tracts. These cutting-edge imaging tools offer the potential to identify subtle changes that are indicative of early-stage diseases and to predict a patient's response to specific treatments.
This increase in knowledge will allow decisions to be taken at the very first steps of disease, or even before the disease is present, as in the recognition of pre-metastatic niches that changes treatment from surgery to neoadjuvant therapy in some tumors. Early detection of microscopic lesions, before tumors reach macroscopic size, may also allow better treatment allocation, as in early microscopic regional lymph node invasion in breast cancer [1]. In macroscopic tumors, precise lesion characterization will avoid biopsy sampling inaccuracies, and the identification of heterogeneous habitats with different behaviors in different locations will help target biopsies, as in neuroblastoma patients [2]. Patients also need an accurate disease staging map to identify the full anatomical tumor extension even before invasion is evident, such as the evaluation of microvascular extension in hepatocellular carcinoma [3]. Knowing the expected overall prognosis will allow adequate stratification of patients before treatment allocation, avoiding sampling biases. Optimization of treatment options is particularly relevant in cases where either pathology or liquid biopsy may not be sufficient, such as immunotherapy in solid tumors. Imaging can also guide targeted focal therapies, such as radiofrequency ablation of osteoid osteomas or dose-painting radiotherapy. Images are also used to evaluate and grade the response to treatment, defining next steps in the management of the disease, using scores such as RECIST 1.1 or RANO. Radiologists are proud of the essential role of medical imaging in daily clinical practice. Radiologist involvement is further fostered by the development of new technical devices with improved sensitivity to small changes and abnormalities, such as spectral CT or contrast-enhanced tomosynthesis. By combining new devices and acquisition protocols with advanced computational and artificial intelligence algorithms to extract quantitative insights from medical images, radiologists are improving both the understanding of the expression that new pathobiological pathways have on images (Fig. 1.1), such as the regional neoangiogenesis profiles in glioblastoma [4], and the response to new clinical needs, such as the inflammatory and fibrotic progression in patients with metabolic liver disease.


Fig. 1.1  The art of science and life representation based on phenotype imaging for personalized classification and prediction of clinical outcomes to achieve a diagnostic gain with respect to standard of care clinical practice

1.2 Transforming Clinical Care from Qualitative to Quantitative

Imaging analysis is crucial for tumor detection, staging, and follow-up. Cancer patients are stratified based on tumor properties, such as size and shape, invasion into lymph nodes, and extension toward distant organs. All this information was summarized into the TNM international staging system, and radiologists included this classification within the radiology report. With the inputs of pathology, molecular, and genetic information, new grades and subtypes were defined and most treatments were reallocated. However, today we recognize that further stratification is still needed to properly allocate treatment. The need to recognize tumor phenotypes and heterogeneity organization as a driving tool toward treatment allocation strategies rests on several facts. The same genetic and pathologic expression can have different phenotypic and aggressiveness behaviors; tumors vary locally as different habitats develop owing to differences in cellular clones, microenvironment, and stroma evolution; and distant sites may display genetic and biological expressions different from those of the primary tumor. Different sites in different organs have a distinct and specific extracellular matrix and cellular composition compared with that of the originating site. Metastases are usually biologically different from the primary tumor. Therefore, imaging has the potential to help target treatments if cancer hallmarks and biological behavior can be estimated from it.

Radiomics and imaging biomarkers are surrogate features and parameters extracted from medical images, providing quantitative information on the regional distribution and magnitude of the evaluated property. They can also be clustered into nosological signatures by combining relevant features into a single value. This information is resolved in space, as parametric maps, and in time, through delta analysis of longitudinal changes. Artificial intelligence (AI) offers a paradigm shift toward data-driven decision-making tools that is revolutionizing medicine. AI can be used to improve the process of data acquisition (such as faster and higher-quality MRI), extract new information from existing data (such as data labeling, lesion detection and segmentation, and deep radiomics extraction for patient stratification), and generate predictions on future disease-related events (such as predictive models of therapy response and time-to-event models for patient outcomes). Nowadays, AI-powered imaging is widely used in cancer care, providing more reliable diagnosis and early detection, improving screening results, adjusting follow-up schemes, aiding in the discovery of new drugs, grading aggressiveness, defining best treatments, and improving final prognostic outcomes (Fig. 1.2).

Fig. 1.2  Diagram of the AI and medical imaging innovation research pathway, containing aspects related to the clinical question to answer, the data to be employed, the model to be developed to predict a particular clinical outcome and the proposed improvements for sustainability and reproducibility of research


1.2.1 Automated Methods Capable of Quantifying Imaging Features Related to Clinical Endpoints

Handcrafted computational methods to extract radiomics and imaging biomarkers suffer from variability and low reproducibility. This limitation is due to the inherent differences between medical images obtained on different machines, given the large variability of acquisition and reconstruction protocols. Vendors and technicians modify the way images are obtained to provide radiologists with images of the highest possible quality for their subjective clinical evaluation. Unfortunately, this pathway introduces a huge spectrum of differences due to geometrical and contrast dispersion. To partially avoid this issue, clinical trials force centers to use similar imaging protocols. When dealing with real-world evidence, standardization of image acquisition protocols will surely never happen (new machines, new releases, different approaches). As a further challenge, similar image acquisition protocols can yield different biomarker results, and repeatability studies (test-retest reliability: variation in measurements taken by a single person or instrument on the same item under the same conditions) and reproducibility studies (replicating the study in different locations, on different equipment, by different people) usually show discrepancies. To minimize this reproducibility crisis, calibration methods (comparison with a known magnitude for correctness) can be applied, introducing corrections such as the intraclass correlation coefficient or linear regression. Also, different image preparation steps can be applied before biomarker extraction to transform source images into a common framework via resizing, intensity normalization, and noise reduction.
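As an illustrative sketch of one such preparation step, z-score intensity normalization maps images acquired with different scanner gain and offset onto a comparable scale. The function and variable names below are our own, not part of any specific toolkit:

```python
import numpy as np

def zscore_normalize(image, mask=None):
    """Rescale voxel intensities to zero mean and unit variance.

    If a mask is given, statistics are computed inside it only
    (e.g., an organ mask), making the normalization less sensitive
    to background voxels.
    """
    voxels = image[mask] if mask is not None else image
    mu, sigma = voxels.mean(), voxels.std()
    return (image - mu) / sigma

# Toy example: two "scans" of the same object acquired with a
# different scanner gain and offset land on the same scale.
rng = np.random.default_rng(0)
obj = rng.normal(100.0, 10.0, size=(32, 32))
scan_a = 1.0 * obj + 0.0
scan_b = 2.5 * obj + 40.0           # different gain and offset
a, b = zscore_normalize(scan_a), zscore_normalize(scan_b)
print(np.allclose(a, b))            # True: affine differences removed
```

Because z-scoring removes any per-image affine intensity transform, the two simulated acquisitions become numerically identical after normalization; real inter-scanner differences are of course only partly affine, which is why harmonization remains necessary.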
Unfortunately, all these measures have not been sufficient for most applications, and only a few have succeeded outside the oncology field, such as the proton density fat fraction and R2* measures for non-invasive liver fat and iron quantification [5, 6]. There is a lack of centralized and distributed repositories with large, labeled, high-quality, and high-complexity imaging data [7]. The lack of standardization of cancer-related health data is hampering the use of AI in cancer care, mainly because of difficulties in accessing and sharing patient data and in testing, validating, certifying, and auditing AI algorithms. Data standardization, interoperability, biases, completeness, safety, privacy, and ethical and regulatory sharing aspects are crucial for the secondary use of data in predictive analytics. Massive data extraction, multicenter observational studies, and federated machine learning (ML) approaches are fostering the impact of imaging in medicine [8]. The heterogeneity of real-world data (RWD), due to routine clinical practice variability between and within sites, together with dataset size and accessibility limitations in capturing the full complexity of biological diversity, are the main forces behind the reproducibility crisis. The scientific method tries to avoid biases when proving facts and causal relations, but heterogeneity and diversity biases can only be minimized. As standardization in data acquisition will never be achieved (there are too many different vendors and changing platforms, releases, and protocols), data harmonization is the only feasible solution. AI will have a role in enabling data harmonization and capturing diverse patterns.
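As a concrete illustration of the repeatability checks mentioned above, a one-way intraclass correlation coefficient for test-retest feature measurements can be sketched in a few lines. The function name and simulated data are our own illustrative choices, not a reference implementation:

```python
import numpy as np

def icc_oneway(measurements):
    """One-way random-effects ICC(1,1) for test-retest data.

    `measurements` is an (n_subjects, k_repeats) array; values near 1
    indicate that a feature is repeatable across repeated scans.
    """
    m = np.asarray(measurements, dtype=float)
    n, k = m.shape
    grand = m.mean()
    ms_between = k * ((m.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    ms_within = ((m - m.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Simulated radiomic feature measured twice per subject.
rng = np.random.default_rng(1)
truth = rng.normal(0, 5, size=(50, 1))
stable = truth + rng.normal(0, 0.2, size=(50, 2))  # small test-retest noise
noisy = truth + rng.normal(0, 5.0, size=(50, 2))   # noise as large as signal
print(icc_oneway(stable) > 0.9, icc_oneway(noisy) < 0.8)
```

Features whose ICC falls below a chosen threshold (often 0.75 to 0.9 in the radiomics literature) are typically discarded before model construction.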

1.2.2 AI-Based Methods as Gold-Standard for Imaging Biomarkers

AI-based methods are increasingly used in medical imaging for the extraction of radiomics and imaging biomarkers (Fig. 1.3), as large and standardized imaging repositories are currently being built through different regional, national, and European initiatives [9, 10]. Given the ability of AI tools to analyze thousands of images and develop their own expertise, the global market for AI in medical imaging is projected to grow significantly in the coming years. Some of the most relevant AI-based methods for the extraction of imaging biomarkers are found in the following medical imaging areas:


Fig. 1.3  Schema of the AI-based workflow in medical imaging and oncology, including tumor detection and segmentation, derivation of hallmarks in terms of parametric maps, and extraction of diagnostic models and tools for the prediction of aggressiveness, overall survival, angiogenesis, cellularity, and phenotype–genotype relationships, for the development of a Clinical Decision Support System (CDSS) that may influence treatment decisions [based on the probability of treatment response, confidence level, impact on radiotherapy (RT), etc.]. *DW diffusion-weighted, DCE dynamic contrast-enhanced, MR magnetic resonance

Image Acquisition and Reconstruction

AI can help automate image acquisition and workflows, streamline processes, and improve patient care. While AI-based methods are still being developed and tested, they are showing promise in creating faster and more reliable US/CT/MR/PET scans, improving the efficiency of the image acquisition process and the quality of the reconstructed images [11]. AI can help with tasks such as planning, physiological tracking, parameter optimization, noise and artifact reduction, and quality assessment. Deep learning (DL) algorithms may aid in the transformation of raw k-space data into image data, specifically for accelerated imaging, noise reduction, and artifact suppression. Recent efforts in these areas show that deep learning-based algorithms can eclipse conventional reconstruction methods in terms of lower acquisition times, improved image quality, and higher computational efficiency across all clinical applications, such as brain, musculoskeletal, cardiac, and abdominal imaging [12]. AI-based DL reconstruction and post-processing techniques can consistently improve diagnostic image quality at the lowest attainable source signal across all patients and procedures, far beyond what is possible with current reconstruction techniques. This represents a huge step for image optimization programs [13].
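As a point of reference for what these DL methods improve upon, the classical baseline (reconstructing a fully sampled Cartesian k-space with an inverse 2D Fourier transform) can be sketched as follows. The function name and the toy phantom are illustrative:

```python
import numpy as np

def kspace_to_image(kspace):
    """Reconstruct a magnitude image from fully sampled Cartesian
    k-space via the inverse 2D FFT, the classical baseline that
    deep learning reconstruction methods are compared against."""
    shifted = np.fft.ifftshift(kspace)        # move DC term back to the corner
    image = np.fft.ifft2(shifted)
    return np.abs(np.fft.fftshift(image))     # centered magnitude image

# Round trip: simulate k-space from a known image, then reconstruct it.
phantom = np.zeros((64, 64))
phantom[24:40, 24:40] = 1.0                   # simple square "phantom"
kspace = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(phantom)))
recon = kspace_to_image(kspace)
print(np.allclose(recon, phantom, atol=1e-6))  # True: exact round trip
```

Accelerated acquisitions undersample this k-space, which breaks the exact round trip and is precisely where learned reconstruction and denoising come in.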

Image Harmonization

One of the main challenges when developing AI models with RWD is the large image heterogeneity caused by the many different vendors, scanners, protocols, acquisition parameters, and clinical practices. One of the most promising areas of research in image harmonization is the use of generative adversarial networks (GANs) [14] to generate synthetic images belonging to a new common framework space of standardized imaging data. GANs make use of a generator and a discriminator to improve their ability to (1) create new (fake) images as similar as possible to the reference (real) images used as ground truth and (2) distinguish the real images from the fake ones. This allows an effective and efficient learning procedure whose only requirement is a well-defined ground truth; for instance, we may target images in a particular imaging domain (e.g., images belonging to a specific manufacturer, scanner, magnetic field strength, or type of weighting in MR images). Additionally, if paired images are not available, the CycleGAN architecture is a good alternative: a popular DL model for image-to-image translation tasks without paired examples. The models are trained in an unsupervised manner using collections of images from the source and target domains that do not need to be related in any way. However, when aiming to generate images in a new common standardized imaging data space, we may not be able to define the specific characteristics these images should have to achieve better resolution and lower noise. Potential solutions include the use of the frequency space, which makes it possible to isolate specific components of the image, keeping its main information and excluding the components related to its contrast, which is at the core of image acquisition heterogeneity. This strategy can be used in combination with autoencoders, which have demonstrated good performance in image reconstruction even when large parts of the images are removed.

Other relevant harmonization techniques include distribution-based methods, such as location-and-scale strategies (e.g., ComBat, used to address the heterogeneity of cortical thickness, surface area, and subcortical volumes caused by various scanners and sequences [15], or to harmonize the radiomic features extracted across multicenter MR datasets [16]), and image processing techniques, such as image filtering, physical-size resampling, standardization, and normalization [17]. These harmonization techniques aim at reducing batch effects in quantitative imaging feature extraction and, therefore, at decreasing the variability observed across different manufacturers and acquisition protocols.
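The core location-and-scale adjustment behind methods such as ComBat can be sketched as follows. This is a deliberately simplified illustration (full ComBat adds empirical-Bayes shrinkage of the batch estimates and can preserve biological covariates), and all names are our own:

```python
import numpy as np

def location_scale_harmonize(features, batch):
    """Simplified location-and-scale harmonization of radiomic features.

    Each scanner/site ("batch") is shifted and scaled so its feature
    distribution matches the pooled mean and standard deviation.
    """
    x = np.asarray(features, dtype=float)      # (n_samples, n_features)
    batch = np.asarray(batch)
    out = np.empty_like(x)
    pooled_mu, pooled_sd = x.mean(axis=0), x.std(axis=0)
    for b in np.unique(batch):
        idx = batch == b
        mu, sd = x[idx].mean(axis=0), x[idx].std(axis=0)
        out[idx] = (x[idx] - mu) / sd * pooled_sd + pooled_mu
    return out

# Two sites whose scanners introduce a shift and scale difference.
rng = np.random.default_rng(2)
site_a = rng.normal(0.0, 1.0, size=(100, 3))
site_b = rng.normal(5.0, 3.0, size=(100, 3))   # scanner-induced offset
x = np.vstack([site_a, site_b])
batch = np.array([0] * 100 + [1] * 100)
h = location_scale_harmonize(x, batch)
print(np.allclose(h[:100].mean(axis=0), h[100:].mean(axis=0)))  # True
```

After the adjustment, per-site means and standard deviations coincide; the caveat, discussed later in the book, is that such blind matching can also remove genuine biological differences if sites differ in patient populations.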

Image Synthesis for Data Augmentation

Image synthesis for data augmentation is a technique widely used in computer vision and ML to improve the performance and robustness of models. Data augmentation involves generating additional training data by applying various transformations to existing images, such as rotations, translations, scaling, and flipping. This augmentation process helps the model generalize better by exposing it to a wider range of variations while reducing overfitting. Image synthesis techniques play a crucial role in data augmentation by creating new images that closely resemble the original dataset while introducing controlled variations. These synthesized images can include realistic deformations, different lighting conditions, and other visual changes. By integrating these synthetic images with the original dataset, the model becomes more resilient to variations in real-world scenarios.

There are several approaches to image synthesis for data augmentation. One common technique is geometric transformation, where images are modified by applying operations like rotation, translation, scaling, and shearing. These transformations can simulate changes in perspective or object position, enhancing the model's ability to recognize objects from different viewpoints. Another method is to alter the color and texture properties of images. This can involve changing the brightness, contrast, saturation, or hue of the original images. Adding noise, blurring, or sharpening effects can also simulate variations in image quality or focus. These modifications allow the model to adapt to different lighting conditions and improve its robustness against image distortions. AI algorithms can also be employed for medical imaging data augmentation. Different types of deep generative models, such as variational autoencoders (VAEs) [18] and GANs, can learn the underlying distribution of the training dataset and generate new images that resemble the original data.

Regarding clinical applications, synthetic images can improve the accuracy and robustness of medical imaging tasks (classification, regression, segmentation) in several ways. For instance, they can be used to generate additional data to improve model performance and avoid overfitting. Additionally, synthetic images can address limited data and privacy issues, as they can be generated both to provide additional training data and to anonymize patient data. Synthetic images can also be generated with their corresponding segmentation masks to aid segmentation network generalization and adaptation, which can improve the robustness of segmentation models. By leveraging image synthesis techniques, researchers and practitioners can create augmented datasets with a larger and more diverse range of samples. With a more robust and diverse training set, models are better equipped to handle real-world scenarios and exhibit improved performance, accuracy, and reliability. However, the use of synthetic medical images raises several methodological considerations that should be weighed carefully to ensure the proper use of this technology. The first is the potential for bias in the data used to generate the synthetic images, which can lead to biased models and inaccurate diagnoses.
Another concern is the privacy of patient data, as synthetic images can be used to anonymize patient data, but there might be a risk of re-identification if the synthetic images are not properly de-identified.
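The geometric and intensity transformations described above can be sketched with plain NumPy. This is a minimal illustration on 2D grayscale arrays; the function name, transformation choices, and parameter ranges are illustrative, not a prescribed recipe:

```python
# Random flip, 90-degree rotation, brightness jitter, and additive Gaussian
# noise: simple examples of the augmentation operations discussed above.
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    out = image.astype(float)
    if rng.random() < 0.5:                       # random horizontal flip
        out = out[:, ::-1]
    k = int(rng.integers(0, 4))                  # random multiple-of-90 rotation
    out = np.rot90(out, k)
    out = out * rng.uniform(0.9, 1.1)            # brightness/contrast jitter
    out = out + rng.normal(0, 0.01, out.shape)   # additive Gaussian noise
    return out
```

In practice such operations are applied on the fly during training, so every epoch sees a slightly different version of each image.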


L. Marti-Bonmati and L. Cerdá-Alberich

Image Segmentation

Volume of interest segmentation plays a crucial role in various aspects of medical applications, such as quantifying the size and shape of organs in population studies, detecting and extracting lesions in disease analysis, defining computer-aided treatment volumes, and surgical planning, among others. While manual segmentation by medical experts has traditionally been considered the ground truth, it is expensive, time-consuming, and prone to disagreements among readers. On the other hand, automatic segmentation methods offer faster, cost-effective, and more reproducible results after manual checking and editing [19]. Traditionally, segmentation relied on classical techniques like region growing [20], deformable models [21], graph cuts [22], clustering methods [23], and Bayesian approaches [24]. However, in recent years, DL methods have surpassed these classical handcrafted techniques, achieving unprecedented performance in various medical image segmentation tasks [25, 26]. Recent reviews and advancements in DL for medical image segmentation are available, focusing on improving network architecture, loss functions, and training procedures [27]. Remarkably, it has been demonstrated that standard DL models can be trained effectively using limited labeled training images by making use of several transfer learning techniques [28]. Although there is considerable variation in proposed network architectures, they all share a common foundation: the use of convolution as the primary building block. Some alternative network architectures have explored recurrent neural networks [29] and attention mechanisms [30] but still rely on convolutional operations. However, recent studies suggest that a basic fully convolutional network (FCN) with an encoder–decoder structure can handle diverse segmentation tasks with comparable accuracy to more complex architectures [31].
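For intuition, the classical region-growing technique cited above [20] can be sketched in a few lines. This is a toy 2D version under simplifying assumptions (4-connectivity, a fixed intensity tolerance around the seed value), not the symmetric region-growing algorithm of the reference:

```python
# Toy region growing: starting from a seed pixel, iteratively absorb
# 4-connected neighbors whose intensity stays within `tol` of the seed.
import numpy as np
from collections import deque

def region_grow(image: np.ndarray, seed: tuple, tol: float) -> np.ndarray:
    mask = np.zeros(image.shape, dtype=bool)
    ref = image[seed]
    queue = deque([seed])
    mask[seed] = True
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < image.shape[0] and 0 <= nc < image.shape[1]
                    and not mask[nr, nc] and abs(image[nr, nc] - ref) <= tol):
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask
```

The fragility of such handcrafted rules (a single tolerance, a single seed) is precisely what motivated the shift to learned DL segmentation described in the text.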
Convolutional neural networks (CNNs), including FCNs, owe their effectiveness in modeling and analyzing images to key properties such as local connections, parameter sharing, and translation equivariance [32]. These properties provide CNNs with a strong and valuable inductive bias, enabling them to excel in various vision tasks. However, CNNs also have limitations: the fixed weights determined during training treat different images and parts of an image equally, lacking the ability to adapt based on image content. Additionally, the local nature of convolution operations limits the learning of long-range interactions between distant parts of an image.

Attention-based neural network models offer a potential solution to these limitations. These models focus on learning relationships between different parts of a sequence, deviating from the fixed-weights approach of CNNs. Among attention-based networks, widely adopted in natural language processing (NLP) applications, transformers are the dominant models [33]. Transformers outperform recurrent neural networks in capturing complex and long-range interactions and overcome limitations like vanishing gradients. They also enable parallel processing, resulting in shorter training times on modern hardware.

Despite these advantages, the adoption of transformer networks in computer vision applications and medical image segmentation is still limited. Challenges arise from the significantly larger number of pixels in images compared to the length of signal sequences in typical NLP applications, limiting the direct application of standard attention models to images. Furthermore, training transformer networks is more challenging due to their minimal inductive bias, which requires larger amounts of training data. Recent studies propose practical solutions to these challenges. Vision transformers (ViTs) consider image patches as the units of information [34], embedding them into a shared space and learning their relationships through self-attention modules. ViTs have shown superior image classification accuracy compared to CNNs when massive labeled datasets and computational resources are available.
Knowledge distillation from a CNN teacher has been proposed as a potential solution to training transformer networks, enabling them to achieve image classification accuracy comparable to CNNs with the same amount of labeled training data [35]. Additionally, self-attention-based deep neural networks, relying on self-attention between linear embeddings of 3D image patches without convolution operations, have been proposed. These models typically require large, labeled training datasets and are often combined with unsupervised pre-training methods that leverage large unlabeled medical image datasets [36]. While U-Net [37], a U-shaped CNN architecture, has achieved tremendous success on most medical image segmentation tasks, transformer-based models are challenging the well-configured U-Net architectures with promising results [34].
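The scaled dot-product self-attention at the core of these transformer models can be sketched in NumPy. This minimal single-head sketch operates on a hypothetical sequence of patch embeddings; the projection matrices and shapes are illustrative:

```python
# Single-head scaled dot-product self-attention over a sequence of
# (hypothetical) image-patch embeddings.
import numpy as np

def self_attention(x: np.ndarray, wq: np.ndarray, wk: np.ndarray, wv: np.ndarray) -> np.ndarray:
    """x: (n_patches, d_model); wq/wk/wv: (d_model, d_head) projection matrices."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])         # pairwise patch similarities
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)        # softmax over all patches
    return attn @ v                                # content-dependent mixing of patches
```

Because every patch attends to every other patch, the output weights adapt to the image content, which is exactly the long-range, input-dependent interaction that fixed convolution kernels cannot express.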

Extraction of Deep Features

DL is an emerging approach primarily utilized in tasks related to recognition, prediction, and classification. By propagating data through multiple hidden layers, a neural network can learn and construct a representation of the data, which can then be used for prediction or classification. In the case of image data, a CNN typically employs multiple convolutional kernels to extract various textures and edges before passing the extracted information through multiple hidden layers. After learning, the convolutional layers of CNNs contain representations of edge gradients and textures. When these representations are propagated through fully connected layers, the network is believed to have learned diverse high-level features. From these fully connected layers, deep features, which refer to the outputs of units in the layer, are extracted; each deep feature is indexed by the position of its unit within the hidden layer's output vector. Due to the still limited data availability in the medical imaging field, training a CNN from scratch is often unfeasible. In these cases, a pretrained CNN can be employed, such as the VGG16 network [38], which has been trained on the ImageNet dataset. Additionally, transfer learning methods [39], which apply previously acquired knowledge from one domain to a new task domain, are an alternative option.

To extract deep features from the medical images within a given exam, a commonly used approach relies on selecting the 2-dimensional slice containing the largest lesion area. In this case, only features from the lesion region are extracted by taking the largest rectangular box around the tumor. The resulting images can then be resized to an isometric voxel size by employing a bicubic interpolation in order to match the required input size of the neural network. In addition, these pretrained networks were originally trained on natural camera images with three color channels (R, G, B), whereas medical images are usually grayscale, lacking color components. Voxel intensities of the medical images are usually converted to a 0–255 range, the same grayscale image can then be replicated three times to mimic an image with three color channels, and normalization should be carried out using the appropriate color channel image. The deep features can then be generated from the last fully connected layer, followed by the application of a ReLU activation function. The resulting feature vector, in the case of the VGG16 network, has a size of 4096. Consequently, further feature engineering and dimensionality reduction techniques may be necessary to maximize their effectiveness and address the following specific challenges:

1. The presence of redundant or irrelevant information within the deep features. High-dimensional feature vectors can be computationally expensive to process and may lead to overfitting or increased model complexity. Dimensionality reduction techniques, such as principal component analysis (PCA) or t-distributed stochastic neighbor embedding (t-SNE), can be applied to reduce the dimensionality of deep features while preserving the most informative characteristics. By reducing the number of dimensions, these techniques can enhance model efficiency, improve generalization, and facilitate visualization of the data.

2. The need for more interpretable or domain-specific features. Deep features are often abstract and lack direct human interpretability. In certain applications, it may be beneficial to incorporate domain knowledge or expert insights into the feature engineering process. This can involve designing handcrafted features based on prior knowledge or using rule-based algorithms to extract specific patterns or characteristics. By incorporating domain expertise, the deep features can be transformed into more meaningful representations that align with the specific problem at hand.


3. Deep features may require further processing to address specific challenges related to data variability or noise. For instance, in tasks involving medical imaging, there can be variations in image acquisition protocols, artifacts, or noise levels. Preprocessing techniques, such as image normalization, denoising, or data augmentation, can be applied to enhance the quality and robustness of deep features. These techniques help mitigate the impact of data variability and improve the model's ability to generalize across different conditions or sources [40].

It is important to note that the need for further feature engineering and dimensionality reduction depends on the specific task and dataset used. In some cases, the raw deep features obtained from DL models may be sufficiently informative and effective without additional processing. In other situations, incorporating additional techniques can enhance the performance, interpretability, and efficiency of the models. In other words, while deep features extracted from DL models offer powerful representations of data, the choice and extent of these techniques depend on the specific requirements and characteristics of the task at hand.
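The PCA step from challenge 1 above can be sketched via the singular value decomposition. This is a minimal NumPy sketch assuming a deep-feature matrix of shape (samples, dimensions), e.g., 4096-dimensional VGG16 vectors; the function name is illustrative:

```python
# PCA via SVD: project centered deep features onto their leading
# principal axes (the right-singular vectors of the centered matrix).
import numpy as np

def pca_reduce(features: np.ndarray, n_components: int) -> np.ndarray:
    """features: (n_samples, n_dims) deep-feature matrix."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

The leading components capture the directions of greatest variance, so downstream models train on a far smaller, less redundant representation.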

AI Models for Prediction of Clinical Endpoints

AI models have emerged as powerful tools in the prediction of clinical outcomes, offering valuable insights and assisting healthcare professionals in making informed decisions [41]. In this context, two prominent types of AI models are radiomics models and end-to-end deep learning models. Radiomics models leverage the vast amount of quantitative imaging data extracted from medical images to predict clinical outcomes. Radiomics involves extracting a large number of quantitative features from medical images, including texture, shape, and intensity, and combining them with clinical and demographic data. These features are then used as inputs to ML algorithms, such as support vector machines, random forests, or neural networks, to build predictive models [42].
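The radiomics modeling step can be sketched end-to-end in miniature. As a stand-in for the SVMs and random forests named above, this sketch trains a plain logistic regression by gradient descent on a small feature matrix; all names and the toy data are illustrative, not a clinically validated pipeline:

```python
# Toy radiomics classifier: quantitative features in, binary endpoint out.
import numpy as np

def train_logistic(X: np.ndarray, y: np.ndarray, lr: float = 0.1, steps: int = 3000) -> np.ndarray:
    """X: (n_samples, n_features) radiomic features; y: (n_samples,) binary endpoint."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])    # append intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))          # predicted probabilities
        w -= lr * Xb.T @ (p - y) / len(y)            # gradient of the log-loss
    return w

def predict(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return (1.0 / (1.0 + np.exp(-(Xb @ w))) > 0.5).astype(int)
```

In a real study the columns of `X` would be standardized (and harmonized, as discussed earlier) radiomic features, and performance would be assessed on held-out patients rather than the training set.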


Radiomics models offer several advantages. Firstly, they allow for an in vivo objective assessment of disease characteristics by analyzing imaging data. This can aid in the early detection of diseases [43], prediction of treatment response [44], and prognosis estimation [45]. Secondly, radiomics models can capture intricate patterns and relationships within the data that may not be evident to the human observer, enabling more accurate predictions. Lastly, radiomics models have the potential to facilitate personalized medicine by identifying biomarkers or image-based signatures that correlate with specific clinical outcomes, thus guiding individualized treatment plans [46, 47] (Fig. 1.4). Some examples of AI-based radiomic models include the identification of the biological target volume from functional images, using AI-derived imaging biomarkers to shift radiology from a mostly subjective analysis to a more objective one, applying sophisticated ML and computational intelligence to revolutionize cancer image analysis, and using radiomic-based approaches and AI to analyze medical images and construct prediction algorithms for precision digital oncology [48, 49].

Fig. 1.4  Diagram of the AI image processing pipeline, including image harmonization (spatial resolution, common framework, normalization), image annotation and tumor extraction (3D segmentation), properties extraction (parameters, deep features) and modeling, and personalized cancer phenotyping, prediction, and prognosis estimations

On the other hand, end-to-end DL models represent a more recent and rapidly advancing approach in AI [50]. These models apply deep neural networks to automatically learn hierarchical representations from the raw data, without the need for explicit feature extraction. In the context of clinical outcome prediction, end-to-end DL models can directly analyze medical images or combine them with other types of data, such as electronic health records or genetic information.

End-to-end DL models offer several advantages. Firstly, they can learn complex patterns and relationships in a data-driven manner, enabling them to extract features that are highly relevant for the prediction task. Secondly, these models have the potential to outperform traditional methods by automatically discovering intricate image features that may be challenging to capture through manual feature engineering. Moreover, they can handle multi-modal data integration seamlessly, incorporating diverse information sources to improve prediction accuracy. Lastly, their ability to generalize across different datasets makes them highly adaptable and transferable to various clinical settings. A recent example is the use of these models for evaluating COVID-19 patients, which has demonstrated potential for improving clinical decision making and assessing patient outcomes from images [51].

However, both radiomics models and end-to-end DL models face challenges and limitations. Radiomics models heavily rely on the quality and consistency of imaging data and are susceptible to variations in image acquisition protocols. Standardization of image acquisition and feature extraction techniques is crucial to ensure robust and reproducible results. On the other hand, end-to-end DL models require large amounts of labeled data for training, which can be a limitation in domains where annotated data is scarce or time-consuming to obtain.
Additionally, the black-box nature of DL models can hinder their interpretability, making it difficult to understand the reasoning behind their predictions, which is a critical aspect in the healthcare domain.

In conclusion, both radiomics models and end-to-end deep learning models have shown promise in predicting clinical outcomes and assisting clinical users in detecting and quantifying a wide array of clinical conditions with excellent accuracy, sensitivity, and specificity. Radiomics models leverage quantitative imaging features to provide insights into disease characteristics and treatment response. Meanwhile, end-to-end DL models offer the advantage of learning from raw data and can handle multi-modal integration. While challenges exist, ongoing research and advancements in these AI models will further enhance their predictive capabilities and contribute to improved patient care and outcomes.

Integration of Imaging, Clinical, Biological and Pathology Data

The integration of imaging, clinical, biological, and pathology information has revolutionized the field of healthcare by providing valuable new insights into the diagnosis, treatment, and management of various diseases. This multidisciplinary approach brings together different types of data from various sources, enabling a more comprehensive and holistic understanding of a patient's condition.

Imaging data from the different modalities, such as radiographs, CT, MR, and PET, provide visual representations of the internal structures and organs of the body. These images allow healthcare professionals to identify abnormalities, tumors, lesions, or other indicators of disease. By integrating imaging data with clinical information, such as patient history, symptoms, and laboratory results, a more accurate diagnosis can be made. This integration enhances diagnostic accuracy and improves patient outcomes [52].

Biological data, including genetic and molecular profiles, provide insights into the underlying mechanisms of diseases at a cellular and molecular level. With the advancement of technologies like genomics and proteomics, healthcare professionals can analyze an individual's genetic makeup and identify specific genetic variations that may contribute to the development or progression of a disease. By integrating this information with imaging and clinical data, personalized treatment plans can be tailored to each patient's unique genetic profile, leading to more effective and targeted therapies.

Pathology data, derived from the microscopic examination of tissues and cells, play a crucial role in diagnosing and characterizing diseases, particularly cancer. Pathologists analyze biopsy samples and provide information about the presence, type, and stage of the disease. By integrating pathology data with imaging, clinical, and biological data, healthcare professionals can gain a comprehensive understanding of the disease, allowing for more accurate prognostic predictions and personalized treatment strategies.

The integration of these diverse datasets is made possible by advancements in technology and the development of specialized software platforms. These platforms allow for the aggregation, storage, and analysis of large volumes of data from different sources. Data integration techniques, such as data mining, ML, and AI, help identify patterns, correlations, and predictive models that can aid clinical decision-making.

The benefits of integrating imaging, clinical, biological, and pathology data are manifold. Firstly, it improves diagnostic accuracy, enabling healthcare professionals to detect diseases at an earlier stage when they are more treatable. Secondly, it facilitates personalized medicine, where treatment plans can be tailored to an individual's unique characteristics, resulting in better outcomes and reduced side effects. Thirdly, it enhances research and development by providing researchers with a wealth of data to study disease mechanisms, identify new therapeutic targets, and develop innovative treatments.

However, there are challenges in integrating these different types of data. One significant challenge is data interoperability, as each type of data is often stored in different formats and systems. Efforts are being made to develop standards and protocols that allow seamless data exchange and interoperability across different platforms and healthcare settings.

In conclusion, the integration of imaging, clinical, biological, and pathology data is transforming healthcare and research by providing a comprehensive and multidimensional view of patients' conditions.
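At its most basic, this kind of integration is a join on patient identity across data sources. The following toy sketch merges imaging-derived features with clinical records by patient ID; all identifiers, field names, and values are hypothetical:

```python
# Toy multi-source integration: inner join of imaging features and
# clinical data on a shared patient identifier.
imaging = {"P001": {"tumor_volume_ml": 14.2, "mean_adc": 1.1},
           "P002": {"tumor_volume_ml": 8.7, "mean_adc": 0.9}}
clinical = {"P001": {"age": 54, "stage": "II"},
            "P002": {"age": 61, "stage": "III"}}

def integrate(imaging: dict, clinical: dict) -> dict:
    """Keep only patients present in both sources and merge their records."""
    return {pid: {**imaging[pid], **clinical[pid]}
            for pid in imaging.keys() & clinical.keys()}

merged = integrate(imaging, clinical)
```

Real deployments face exactly the interoperability issues described below: identifiers, formats, and coding systems differ across hospitals, which is why standards-based exchange rather than ad hoc merging is the long-term goal.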
This integrated approach enhances diagnostic accuracy, enables personalized treatment plans, and fosters advancements in research and development. With continued advancements in technology and data analysis through AI techniques, the integration of these diverse datasets will continue to play a vital role in improving patient care and advancing medical knowledge.

References

1. Demicheli R, Fornili M, Querzoli P et al (2019) Microscopic tumor foci in axillary lymph nodes may reveal the recurrence dynamics of breast cancer. Cancer Commun 39:35. https://doi.org/10.1186/s40880-019-0381-9
2. Cerdá Alberich L, Sangüesa Nebot C, Alberich-Bayarri A et al (2020) A confidence habitats methodology in MR quantitative diffusion for the classification of neuroblastic tumors. Cancers (Basel) 12(12):3858. https://doi.org/10.3390/cancers12123858. PMID: 33371218; PMCID: PMC7767170
3. Ni M, Zhou X, Lv Q et al (2019) Radiomics models for diagnosing microvascular invasion in hepatocellular carcinoma: which model is the best model? Cancer Imaging 19:60. https://doi.org/10.1186/s40644-019-0249-x
4. Juan-Albarracín J, Fuster-Garcia E, Pérez-Girbés A et al (2018) Glioblastoma: vascular habitats detected at preoperative dynamic susceptibility-weighted contrast-enhanced perfusion MR imaging predict survival. Radiology 287(3):944–954. https://doi.org/10.1148/radiol.2017170845. Epub 2018 Jan 19. PMID: 29357274
5. Reeder SB, Yokoo T, França M et al (2023) Quantification of liver iron overload with MRI: review and guidelines from the ESGAR and SAR. Radiology 307(1):e221856. https://doi.org/10.1148/radiol.221856. Epub 2023 Feb 21. PMID: 36809220; PMCID: PMC10068892
6. Martí-Aguado D, Jiménez-Pastor A, Alberich-Bayarri Á et al (2022) Automated whole-liver MRI segmentation to assess steatosis and iron quantification in chronic liver disease. Radiology 302(2):345–354. https://doi.org/10.1148/radiol.2021211027. Epub 2021 Nov 16. PMID: 34783592
7. Kondylakis H, Kalokyri V, Sfakianakis S et al (2023) Data infrastructures for AI in medical imaging: a report on the experiences of five EU projects. Eur Radiol Exp 7(1):20. https://doi.org/10.1186/s41747-023-00336-x. PMID: 37150779; PMCID: PMC10164664
8. Marti-Bonmati L, Koh DM, Riklund K et al (2022) Considerations for artificial intelligence clinical impact in oncologic imaging: an AI4HI position paper. Insights Imaging 13(1):89. https://doi.org/10.1186/s13244-022-01220-9. PMID: 35536446; PMCID: PMC9091068


9. Martí-Bonmatí L, Alberich-Bayarri Á, Ladenstein R et al (2020) PRIMAGE project: predictive in silico multiscale analytics to support childhood cancer personalised evaluation empowered by imaging biomarkers. Eur Radiol Exp 4(1):22. https://doi.org/10.1186/s41747-020-00150-9. PMID: 32246291; PMCID: PMC7125275
10. Martí-Bonmatí L, Miguel A, Suárez A et al (2022) CHAIMELEON project: creation of a Pan-European repository of health imaging data for the development of AI-powered cancer management tools. Front Oncol 12:742701. https://doi.org/10.3389/fonc.2022.742701. PMID: 35280732; PMCID: PMC8913333
11. Reader AJ, Schramm G (2021) Artificial intelligence for PET image reconstruction. J Nucl Med 62(10):1330–1333. https://doi.org/10.2967/jnumed.121.262303. Epub 2021 Jul 8. PMID: 34244357
12. Lin DJ, Johnson PM, Knoll F, Lui YW (2021) Artificial intelligence for MR image reconstruction: an overview for clinicians. J Magn Reson Imaging 53(4):1015–1028. https://doi.org/10.1002/jmri.27078. Epub 2020 Feb 12. PMID: 32048372; PMCID: PMC7423636
13. Shan H, Padole A, Homayounieh F et al (2019) Competitive performance of a modularized deep neural network compared to commercial algorithms for low-dose CT image reconstruction. Nat Mach Intell 1:269–276. https://doi.org/10.1038/s42256-019-0057-9
14. Goodfellow I, Pouget-Abadie J, Mirza M et al (2014) Generative adversarial nets. In: Advances in neural information processing systems (NIPS 2014), pp 2672–2680
15. Radua J, Vieta E, Shinohara R et al (2020) ENIGMA Consortium collaborators. Increased power by harmonizing structural MRI site differences with the ComBat batch adjustment method in ENIGMA. Neuroimage 218:116956. https://doi.org/10.1016/j.neuroimage.2020.116956. Epub 2020 May 26. PMID: 32470572; PMCID: PMC7524039
16. Whitney HM, Li H, Ji Y et al (2020) Harmonization of radiomic features of breast lesions across international DCE-MRI datasets. J Med Imaging (Bellingham) 7(1):012707. https://doi.org/10.1117/1.JMI.7.1.012707. Epub 2020 Mar 5. PMID: 32206682; PMCID: PMC7056633
17. Nan Y, Ser JD, Walsh S et al (2022) Data harmonisation for information fusion in digital healthcare: a state-of-the-art systematic review, meta-analysis and future research directions. Inf Fusion 82:99–122. https://doi.org/10.1016/j.inffus.2022.01.001. PMID: 35664012; PMCID: PMC8878813
18. Kingma DP, Welling M (2019) An introduction to variational autoencoders. Found Trends Mach Learn 12(4):307–392. https://doi.org/10.1561/2200000056
19. Veiga-Canuto D, Cerdà-Alberich L, Jiménez-Pastor A et al (2023) Independent validation of a deep learning nnU-net tool for neuroblastoma detection and segmentation in MR images. Cancers (Basel) 15(5):1622. https://doi.org/10.3390/cancers15051622. PMID: 36900410; PMCID: PMC10000775
20. Wan SY, Higgins WE (2003) Symmetric region growing. IEEE Trans Image Process 12(9):1007–1015. https://doi.org/10.1109/TIP.2003.815258. PMID: 18237973
21. Bogovic JA, Prince JL, Bazin PL (2013) A multiple object geometric deformable model for image segmentation. Comput Vis Image Underst 117(2):145–157. https://doi.org/10.1016/j.cviu.2012.10.006. PMID: 23316110; PMCID: PMC3539759
22. Chen X, Pan L (2018) A survey of graph cuts/graph search based medical image segmentation. IEEE Rev Biomed Eng 11:112–124. https://doi.org/10.1109/RBME.2018.2798701. Epub 2018 Jan 26. PMID: 29994356
23. Mittal H, Pandey AC, Saraswat M et al (2022) A comprehensive survey of image segmentation: clustering methods, performance parameters, and benchmark datasets. Multimed Tools Appl 81(24):35001–35026. https://doi.org/10.1007/s11042-021-10594-9. Epub 2021 Feb 9. PMID: 33584121; PMCID: PMC7870780
24. Wong WC, Chung AC (2005) Bayesian image segmentation using local iso-intensity structural orientation. IEEE Trans Image Process 14(10):1512–1523. https://doi.org/10.1109/tip.2005.852199. PMID: 16238057
25. Veiga-Canuto D, Cerdà-Alberich L, Sangüesa Nebot C et al (2022) Comparative multicentric evaluation of inter-observer variability in manual and automatic segmentation of neuroblastic tumors in magnetic resonance images. Cancers (Basel) 14(15):3648. https://doi.org/10.3390/cancers14153648. PMID: 35954314; PMCID: PMC9367307
26. Kamnitsas K, Ledig C, Newcombe VFJ et al (2017) Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med Image Anal 36:61–78. https://doi.org/10.1016/j.media.2016.10.004. Epub 2016 Oct 29. PMID: 27865153
27. Taghanaki SA, Abhishek K, Cohen JP, Cohen-Adad J, Hamarneh G (2020) Deep semantic segmentation of natural and medical images: a review. Artif Intell Rev 54(1):1–42
28. Ghafoorian M, Mehrtash A, Kapur T et al (2017) Transfer learning for domain adaptation in MRI: application in brain lesion segmentation. In: Proceedings of the international conference on medical image computing and computer assisted intervention. Springer, Cham, pp 516–524
29. Bai W, Suzuki H, Qin C et al (2018) Recurrent neural networks for aortic image sequence segmentation with sparse annotations. In: Proceedings of the international conference on medical image computing and computer assisted intervention. Springer, Cham, pp 586–594
30. Chen J, Lu Y, Yu Q et al (2021) TransUNet: transformers make strong encoders for medical image segmentation. arXiv:2102.04306


31. Isensee F, Kickingereder P, Wick W et al (2018) No new-net. In: Proceedings of the international MICCAI brain lesion workshop. Springer, Cham, pp 234–244
32. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436
33. Vaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need. In: Proceedings of the 31st international conference on neural information processing systems (NIPS'17). Curran Associates Inc., Red Hook, pp 6000–6010
34. He K, Gan C, Li Z et al (2022) Transformers in medical image analysis: a review. arXiv:2202.12165
35. Touvron H, Cord M, Douze M et al (2020) Training data-efficient image transformers & distillation through attention. arXiv:2012.12877
36. Karimi D, Dou H, Gholipour A (2022) Medical image segmentation using transformer networks. IEEE Access 10:29322–29332. https://doi.org/10.1109/access.2022.3156894. Epub 2022 Mar 4. PMID: 35656515; PMCID: PMC9159704
37. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells W, Frangi A (eds) Medical image computing and computer-assisted intervention—MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Springer, Cham. https://doi.org/10.1007/978-3-319-24574-4_28
38. Liu S, Deng W (2015) Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian conference on pattern recognition (ACPR), Kuala Lumpur, pp 730–734. https://doi.org/10.1109/ACPR.2015.7486599
39. Tammina S (2019) Transfer learning using VGG-16 with deep convolutional neural network for classifying images. Int J Sci Res Publ 9:9420. https://doi.org/10.29322/IJSRP.9.10.2019.p9420
40. Fernández Patón M, Cerdá Alberich L, Sangüesa Nebot C et al (2021) MR denoising increases radiomic biomarker precision and reproducibility in oncologic imaging. J Digit Imaging 34(5):1134–1145. https://doi.org/10.1007/s10278-021-00512-8. Epub 2021 Sep 10. PMID: 34505958; PMCID: PMC8554919
41. Oltra-Sastre M, Fuster-Garcia E, Juan-Albarracin J et al (2019) Multi-parametric MR imaging biomarkers associated to clinical outcomes in gliomas: a systematic review. Curr Med Imaging Rev 15(10):933–947. https://doi.org/10.2174/1573405615666190109100503. PMID: 32008521
42. Marti-Bonmati L, Cerdá-Alberich L, Pérez-Girbés A et al (2022) Pancreatic cancer, radiomics and artificial intelligence. Br J Radiol 95(1137):20220072. https://doi.org/10.1259/bjr.20220072. Epub 2022 Jun 28. PMID: 35687700

1  Era of AI Quantitative Imaging

25

43. Sanz-Requena R, Martínez-Arnau FM, Pablos-Monzó A et al (2020) The role of imaging biomarkers in the assessment of sarcopenia. Diagnostics (Basel) 10(8):534. https://doi.org/10.3390/diagnostics10080534. PMID: 32751452; PMCID: PMC7460125 44. Carles M, Fechter T, Radicioni G et al (2021) FDG-PET radiomics for response monitoring in non-small-cell lung cancer treated with radiation therapy. Cancers (Basel) 13(4):814. https://doi.org/10.3390/cancers13040814. PMID: 33672052; PMCID: PMC7919471 45. Fuster-Garcia E, Juan-Albarracín J, García-Ferrando GA et  al (2018) Improving the estimation of prognosis for glioblastoma patients by MR based hemodynamic tissue signatures. NMR Biomed 31(12):e4006. https://doi.org/10.1002/nbm.4006. Epub 2018 Sep 21. PMID: 30239058 46. Paiar F, Gabelloni M, Pasqualetti F et al (2023) Correlation of pre- and post-radio-chemotherapy MRI texture features with tumor response in rectal cancer. Anticancer Res 43(2):781–788. https://doi.org/10.21873/ anticanres.16218. PMID: 36697103 47. Pang Y, Wang H, Li H (2022) Medical imaging biomarker discovery and integration towards AI-based personalized radiotherapy. Front Oncol 11:764665. https://doi.org/10.3389/fonc.2021.764665. PMID: 35111666; PMCID: PMC8801459 48. Weiss J, Hoffmann U, Aerts HJWL (2020) Artificial intelligence-derived imaging biomarkers to improve population health. Lancet Digit Health 2(4):e154–e155. https://doi.org/10.1016/S2589-­7500(20)30061-­3. Epub 2020 Mar 2. PMID: 33328074 49. Forghani R (2020) Precision digital oncology: emerging role of radiomics-­ based biomarkers and artificial intelligence for advanced imaging and characterization of brain tumors. Radiol Imaging Cancer 2(4):e190047. https://doi.org/10.1148/rycan.2020190047. PMID: 33778721; PMCID: PMC7983689 50. Koh DM, Papanikolaou N, Bick U et al (2022) Artificial intelligence and machine learning in cancer imaging. Commun Med (Lond) 2:133. https:// doi.org/10.1038/s43856-­022-­00199-­0. 
PMID: 36310650; PMCID: PMC9613681 51. Zhao W, Jiang W, Qiu X (2021) Deep learning for COVID-19 detection based on CT images. Sci Rep 11:14353. https://doi.org/10.1038/s41598-­ 021-­93832-­2 52. Rodríguez-Ortega A, Alegre A, Lago V et al (2021) Machine learning-­ based integration of prognostic magnetic resonance imaging biomarkers for myometrial invasion stratification in endometrial cancer. J Magn Reson Imaging 54(3):987–995. https://doi.org/10.1002/jmri.27625. Epub 2021 Apr 1. PMID: 33793008

2: Principles of Image Formation in the Different Modalities

P. A. García-Higueras and D. Jimena-Hermosilla

2.1 Ionizing Radiation Imaging

X-rays are a form of ionizing electromagnetic radiation discovered in 1895 by the German physicist Wilhelm Röntgen. Very soon after their discovery, two types of medical applications were defined: the diagnosis of disease and therapeutic use. Since then, X-ray applications have increased dramatically with the evolution of radiological technology [1]. Ionizing radiation imaging involves three basic processes: the generation of the X-ray beam, its interaction with the patient's tissues, and the formation of the image.

2.1.1 X-Ray Beam Generation

All equipment that produces X-rays for radiodiagnostic purposes has a common structure for beam generation: the X-ray tube and the generator.

P. A. García-Higueras (*) · D. Jimena-Hermosilla
Hospital Radiophysics Clinical Management Unit, University Hospital of Jaén, Jaén, Spain
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
Á. Alberich-Bayarri, F. Bellvís-Bataller (eds.), Basics of Image Processing, Imaging Informatics for Healthcare Professionals, https://doi.org/10.1007/978-3-031-48446-9_2


X-Ray Tube
The X-ray tube is the radiation source and is electrically powered by the generator. It is composed of a protective housing and a glass envelope in which a vacuum is maintained. Two main parts can be found inside the glass envelope:
• Cathode—the negative side of the X-ray tube. Its main component is the filament. When a high electric current passes through the filament, it heats up and releases electrons, which are accelerated towards the anode.
• Anode—the positive side of the X-ray tube. When an accelerated electron interacts with the anode target, its trajectory can be altered and it thereby loses part or all of its kinetic energy. This energy is transferred to the medium as heat (about 99%) or emitted as photons through radiation losses (about 1%), which form the X-ray beam.

Generator
The generator is the device that transforms and adapts the power from the electrical grid to the needs of the X-ray tube. It usually consists of two separate elements: the console and the electric transformer. The operator uses the console to define the radiological technique, which basically consists of three adjustable parameters:
• Peak kilovoltage (kV). A higher voltage increases the energy of the generated photons and therefore their penetration depth. Changing the peak kilovoltage affects the image contrast.
• Tube current (mA). The number of electrons generated per unit time. A higher tube current means that more photons are generated per unit time.
• Exposure time (ms). The duration of X-ray beam production. The product of the tube current and the exposure time determines the total number of photons produced, expressed in milliampere-seconds (mAs).
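As a minimal illustration of the last point, the exposure in mAs can be computed from the tube current and exposure time (the function name is illustrative, not from the chapter):

```python
def exposure_mas(tube_current_ma, exposure_time_ms):
    """Milliampere-seconds: tube current (mA) times exposure time in seconds.

    The number of photons produced is proportional to this product.
    """
    return tube_current_ma * exposure_time_ms / 1000.0
```

For example, 200 mA applied for 100 ms gives 20 mAs; doubling either parameter doubles the photon output.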


Nowadays, most equipment operates with systems that automatically adjust the technique according to the characteristics of the object being studied.

2.1.2 Radiation-Patient Interaction

Once the X-ray beam is generated, it passes through the patient and interacts with the tissues. Although photons can interact with matter in many ways, in radiodiagnosis the interactions are fundamentally reduced to two: the photoelectric effect and the Compton effect [2, 3].

Photoelectric Effect
In the photoelectric effect, the photon is absorbed by an atomic electron of the medium and disappears. In other words, the X-ray beam loses a photon that will not reach the imaging system. The photoelectric effect is most likely at low X-ray beam energies (low kV).

Compton Effect
In the Compton effect, the photon interacts with an electron of the medium and transfers part of its energy; the photon is deflected by a scattering angle. The process results in a decrease of the photon energy and the emission of an atomic electron.

2.1.3 Image Acquisition and Reconstruction

Although X-ray beam generation has not changed for many decades and its interaction with matter is governed by invariant physical laws, in recent decades image receptors have evolved towards systems generically called digital. The way in which the image is obtained can be used to classify X-ray equipment, distinguishing between X-ray and computed tomography modalities: the former obtains the image through direct exposure of the X-ray beam passing through the patient, while the latter obtains the image from a mathematical reconstruction.

Computed Tomography
Computed tomography (CT) was the earliest application of digital radiology and is considered by many the greatest advance in radiodiagnosis since the discovery of X-rays [4]. In a modern CT scanner, both the X-ray tube and the image detector are rigidly mounted on a platform and rotate together. This structure is called the gantry. CT detection systems generally consist of an array of solid-state detectors fabricated in modules. The field of view (FOV) is delimited by the physical extension of the detector array. At the tube exit there are "shape filters" to adjust the intensity gradient of the X-ray beam and a collimation system to limit the beam width along the longitudinal axis. Another intrinsic element of a CT scanner is the couch, which has precision motors for accurate movement and lasers to centre the patient. CT acquisition modes can be classified into axial and helical (Fig. 2.1):
• Axial acquisition: the tube does not irradiate while the patient moves between acquisition cycles.
• Helical acquisition: the tube traces a spiral (helical) trajectory around the patient. The ratio between the travel length of the couch in a complete revolution and the slice thickness is called the pitch.
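Using the chapter's definition, the pitch is a simple ratio (a minimal sketch; the function name is illustrative):

```python
def ct_pitch(couch_travel_per_rotation_mm, slice_thickness_mm):
    """Helical pitch: couch travel in one gantry revolution / slice thickness.

    Pitch < 1: overlapping spirals; pitch = 1: contiguous; pitch > 1: separated.
    """
    return couch_travel_per_rotation_mm / slice_thickness_mm
```

For example, 15 mm of couch travel per rotation with a 10 mm slice thickness gives a pitch of 1.5.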

Reconstruction Algorithms
The basic principle of tomographic image reconstruction is the accurate reproduction of an object from a set of projections taken at different angles. Although tomographic reconstruction was originally performed by algebraic methods, the slowness of the calculation led to the use of analytical methods. The most popular analytical method is the filtered backprojection (FBP) algorithm. This algorithm is based on the Fourier slice theorem, which states that from the projections of an image it is possible to determine the image through a two-dimensional inverse Fourier transform [5]. This means that by taking n projections of an object at different angles ϑ1, ϑ2, ..., ϑn and computing the Fourier transform of each of them, the two-dimensional transform of the object along the lines passing through the origin at angles ϑ1, ϑ2, ..., ϑn can be determined.

Fig. 2.1 Schematic representation of the different acquisition modes and the pitch effect. If the pitch is less than 1 the spirals overlap each other, if it is equal to 1 the spirals are contiguous, and if it is greater than 1 the spirals are separated from each other. Pitch usually takes values between 1 and 2

Nowadays, thanks to the great improvement in computing capacity, algebraic methods are used again, with iterative reconstruction algorithms the most widely employed. These algorithms improve image quality compared to FBP, especially when the image is noisy or complete acquisitions cannot be obtained, at the expense of longer reconstruction times. There are many different algorithms, but all of them start from an assumed image, calculate projections from it, compare the original projection data with the calculated projections, and update the image based on the differences found. This process is repeated for a specified number of iterations or until the differences between the original and calculated projections become very small.
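The FBP principle described above can be sketched in a few lines of NumPy. This is a didactic toy (nearest-neighbour parallel-beam geometry, approximate overall scaling, illustrative function names), not a clinical implementation:

```python
import numpy as np

def project(image, angles):
    """Forward projection: sum image values along parallel rays at each angle."""
    n = image.shape[0]
    centre = n // 2
    y, x = np.mgrid[:n, :n] - centre
    sino = np.zeros((len(angles), n))
    for i, theta in enumerate(angles):
        # detector bin hit by each pixel for this projection angle
        t = np.round(x * np.cos(theta) + y * np.sin(theta)).astype(int) + centre
        valid = (t >= 0) & (t < n)
        sino[i] = np.bincount(t[valid], weights=image[valid], minlength=n)[:n]
    return sino

def ramp_filter(sinogram):
    """Weight each projection by |f| in frequency space (the 'ramp' filter)."""
    ramp = np.abs(np.fft.fftfreq(sinogram.shape[1]))
    return np.real(np.fft.ifft(np.fft.fft(sinogram, axis=1) * ramp, axis=1))

def backproject(sinogram, angles):
    """Smear each filtered projection back across the image plane and sum."""
    n = sinogram.shape[1]
    centre = n // 2
    y, x = np.mgrid[:n, :n] - centre
    recon = np.zeros((n, n))
    for proj, theta in zip(sinogram, angles):
        t = np.round(x * np.cos(theta) + y * np.sin(theta)).astype(int) + centre
        valid = (t >= 0) & (t < n)
        recon[valid] += proj[t[valid]]
    return recon * np.pi / (2 * len(angles))
```

Reconstructing a simple disc phantom with project → ramp_filter → backproject recovers a bright disc on a dark background; omitting the ramp filter yields the characteristic blurring of simple backprojection described in the text.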

2.1.4 Image Quality

Image quality is a crucial concept in the evaluation of any imaging system [6]. Although it is usually evaluated subjectively, certain characteristics allow the system's capability to reproduce or distinguish the different tissues of a studied volume or pathology to be established. The basic characteristics to analyse in CT image quality are: spatial resolution, contrast resolution, noise, and artifacts.

Spatial Resolution
Spatial resolution, or high-contrast resolution, is the ability of an imaging system to discriminate and represent small details in an image of a volume of interest. This capability provides the detail of the anatomical structures of the tissues. The elements that affect spatial resolution in a CT scan are summarized as follows:
• Equipment hardware. The detector size (directly related to pixel size), the number of detectors, and the focal spot size strongly affect spatial resolution.
• Radiation dose, which determines the quality of the radiation used. The choice of peak kilovoltage (kV) and the amount of emitted radiation (mAs) are directly related to the achievable spatial resolution and to the patient's absorbed dose.
• Acquisition parameters. Slice thickness, pitch, and FOV affect spatial resolution: a smaller slice thickness or a lower pitch improves the spatial resolution of the image.
• Image processing and reconstruction.


Contrast Resolution
Contrast resolution, or low-contrast resolution, is the ability of an imaging system to discriminate between different structures or tissues within a volume of interest. The elements that affect contrast resolution in a CT scan are:
• Physical properties of the studied volume. Differences in tissue density, and hence in the respective attenuation coefficients, strongly determine the image contrast.
• Radiation technique. Employing a low mAs can reduce the contrast and increase the image noise. Using a low kV increases the contrast resolution, but also the image noise.
• Acquisition parameters, such as slice thickness or pitch, which can increase the image contrast.
• Equipment characteristics, such as the response function of the detector or the dynamic range used, which are related to the contrast resolution obtained.
• Image processing and the mathematical filters applied. Different image processing or reconstruction techniques can be used to increase contrast resolution and decrease image noise.

Image Noise
Image noise refers to the presence of random signals or fluctuations that are not related to the patient's anatomy or pathology. Noise has a negative impact on image quality, as it may hinder the visualization of tissues or obscure subtle details. The most common sources of image noise in CT are:
• Noise generated by the equipment. This noise is inherent to the equipment and can be generated, for example, by its electronic components.
• Noise produced by radiation physics. Noise is strongly associated with the Compton effect; through a correct selection of the radiological technique or of the radiation filters applied, the final image noise can be reduced.
• Patient thickness. As the thickness of the patient crossed by the radiation increases, the Compton effect becomes greater and, therefore, the image noise is higher.
• Acquisition parameters. A high pitch value reduces the number of acquisitions and therefore leads to higher image noise.
• Mathematical filters and reconstruction methods are available to reduce image noise.
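Quantum noise, the dominant radiation-physics contribution, follows Poisson statistics, so the relative noise scales as 1/√(counts): quadrupling the mAs halves the noise. A minimal simulation of this relationship (illustrative, not from the chapter):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def relative_noise(mean_counts, n_pixels=100_000):
    """Simulate quantum (Poisson) noise in detected photon counts.

    Returns std/mean, which for Poisson statistics is ~ 1/sqrt(mean_counts).
    """
    counts = rng.poisson(mean_counts, size=n_pixels)
    return counts.std() / counts.mean()
```

Here relative_noise(100) comes out close to 0.10, while relative_noise(400) comes out close to 0.05.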

Artifacts
Artifacts are undesired structures in the image, resulting from distortions, anomalies, or interference, which do not represent the patient's anatomy or pathology. The most common are artifacts from patient motion (voluntary or involuntary), artifacts from metallic objects in the volume of interest, and artifacts from hardware problems in the equipment.
Although these image quality characteristics describe different aspects, they cannot be treated as completely independent factors, because the improvement of one of them is often obtained at the cost of deteriorating one (or more) of the others.

2.2 Nuclear Medicine Imaging

The defining characteristic of nuclear medicine imaging is the administration of radioactive tracers, or radiopharmaceuticals, to patients in order to determine their distribution. For this purpose, a molecule of interest is labelled by replacing one of its atoms with a radioisotope. Metabolic processes then distribute the substance within the patient. The amount of radiopharmaceutical administered (its activity) is traditionally measured in millicuries (mCi); the SI unit is the becquerel (1 mCi = 37 MBq). For diagnostic purposes, tracers are labelled with short half-life radioisotopes that emit gamma photons or positrons. This allows imaging to be performed in a short period of time, with a lower dose deposited in the patient and rapid removal from the organism.


Gamma scintigraphy consists of obtaining planar images by detecting the gamma radiation emitted by a radionuclide. Single photon emission computed tomography (SPECT) is a volumetric scintigraphic technique presented as tomographic slices or images. Both techniques are performed on equipment called gamma cameras. Positron emission tomography (PET) is a tomographic technique whose main difference is that it uses radiopharmaceuticals that emit positrons. This technique makes it possible to obtain regional distributions of functional processes that cannot be measured by any other technology. Furthermore, PET imaging offers several additional advantages. Firstly, it is dynamic: studies may be performed in a short time, keeping pace with the kinetics of physiological processes. Secondly, micromolar amounts of the tracer can be detected, making the technique highly sensitive. Moreover, quantitative information can be obtained on physiological processes, and it is a non-invasive diagnostic method. In addition, modern PET tomographs are combined with a CT scanner in the same device, which allows a CT scan to be acquired before the PET study and the two images to be registered. Consequently, it is possible to combine the advantages of both diagnostic techniques: the CT image provides anatomical and morphological information, whereas the PET study provides functional data, such as the metabolic behaviour of tissues.

2.2.1 Radiopharmaceuticals

A radiopharmaceutical is made up of two different components:
• Tracer. It conditions the metabolic pathway of the radiopharmaceutical and is directed towards the target organ to be studied.
• Radionuclide. The isotope that emits the radiation, which allows information to be obtained on the process under study.


In PET, 18FDG (18F-2-fluoro-2-deoxy-D-glucose) is the most widely used radiopharmaceutical, allowing the measurement of glucose consumption in real time [7]. In cardiology, myocardial blood flow can be measured by means of 13N-ammonia, while 18FDG is used to study the viability and glucose consumption of the heart. In neurology, 18F-DOPA and 11C-raclopride are used to study the dopamine transporter and D2 receptors in the brain, with notable application in Parkinson's disease. In addition, H215O is employed in cerebral blood flow studies to explore the functional behaviour of the brain in areas such as language processing and the effects of pharmacological drugs. Lastly, PET is widely used in oncology to diagnose pulmonary nodules, breast cancer, lymphoma, colorectal cancer, and head and neck cancer, among others.

2.2.2 Physics Concepts in PET: Decay, Annihilation, and Coincidences

Radioactive Decay
Each element of the periodic table has multiple isotopes. An isotope of an element is a nucleus with the same atomic number (number of protons) but a different number of neutrons. Some isotopes are unstable and have some likelihood of undergoing a decay process; if so, they are called radionuclides. These radionuclides may decay by different means. PET is based on the decay path called beta plus decay (β+). Radioisotopes with an excess of protons are likely to decay via β+ [8]: one of the protons in the nucleus is converted into a neutron, emitting a positron and an electron neutrino in the process. The energy difference between the parent radionuclide and the daughter is shared between the positron and the neutrino. A general expression for this process is given below, where X is the parent nucleus, Y the daughter nucleus (with one proton fewer), e⁺ the positron, and νe the electron neutrino; Z and A are the atomic and mass numbers, respectively:

X(A, Z) → Y(A, Z−1) + e⁺ + νe    (2.1)


Electron–Positron Annihilation
Immediately after the decay, the positron loses its kinetic energy within a very short range (on the order of 10⁻¹ cm) and, when it is almost at rest, interacts with an electron of the tissue. This results in the disappearance of both particles and the emission of two photons of 511 kiloelectronvolts (keV) moving in opposite directions, an interaction called electron–positron annihilation. Considering that neither the positron nor the electron has kinetic energy, the energy of the photons follows from Einstein's mass–energy equation, where me and mp are the masses of the electron and the positron, respectively, and c is the speed of light:

E = mc² = me·c² + mp·c²    (2.2)

Therefore, the two opposed photons resulting from this process are the basis of PET. As these photons are very energetic, they have a high probability of escaping from the body and being detected externally. Hence, by placing two detectors along the line of flight of the photons and detecting them simultaneously, the line on which the annihilation (that is, the decay) happened becomes known (Fig. 2.2). A true coincidence occurs when the two photons coming from one annihilation are detected within a short period of time, called the coincidence time (τ). However, not only true coincidences occur; other detections also occur that deteriorate the image quality and degrade the quantitative information. They are called random, scattered, and multiple coincidences (Fig. 2.3).
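The 511 keV figure follows directly from Eq. (2.2): each photon carries the rest energy of one electron mass. A quick check using standard physical constants (values not taken from the chapter):

```python
M_E_KG = 9.1093837015e-31    # electron (and positron) rest mass, kg
C_M_S = 299_792_458.0        # speed of light, m/s
J_PER_EV = 1.602176634e-19   # joules per electronvolt

def rest_energy_kev(mass_kg):
    """E = m*c^2, converted from joules to kiloelectronvolts."""
    return mass_kg * C_M_S**2 / J_PER_EV / 1e3
```

Here rest_energy_kev(M_E_KG) evaluates to approximately 511 keV, the energy of each annihilation photon.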

Scattered Coincidences
Scattered coincidences are produced when a photon interacts within the studied object via Compton scattering. These scattered photons are registered by a detector outside the line of coincidence of the annihilation photons. This effect may be reduced by using tungsten septa rings, which absorb photons arriving at large angles, and by identifying and removing scattered photons, the low energy resolution of the detectors being the main drawback of this filtering method.

Fig. 2.2 A proton-rich radionuclide decays, emitting a positron that finally annihilates after interacting with an electron, producing the two opposed photons that will be detected. Figure adapted from [9]

Random Coincidences
It can happen that two photons produced by different events are detected by two opposed detectors within the timing coincidence window and are mistaken for a true coincidence. The random event rate (Crandom) increases directly with the timing coincidence window τ and with the single event rate of each detector (S1 and S2), as follows:

Crandom = 2·τ·S1·S2    (2.3)

As the activity increases, so does the ratio between the random and single event rates. For this reason, the use of septa rings notably reduces this ratio, as does the development of faster detectors with a narrower timing coincidence window.
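Eq. (2.3) is easy to evaluate. With an assumed coincidence window of 6 ns and singles rates of 10⁵ counts/s per detector (illustrative values, not from the chapter):

```python
def random_coincidence_rate(tau_s, s1_cps, s2_cps):
    """Random coincidence rate, Eq. (2.3): C_random = 2 * tau * S1 * S2."""
    return 2.0 * tau_s * s1_cps * s2_cps
```

This gives random_coincidence_rate(6e-9, 1e5, 1e5) = 120 random coincidences per second; halving τ halves the randoms rate, which is why faster detectors help.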

Fig. 2.3 Different events that result in a detection. True coincidences (a) are the ones used to create the image. Scattered, random, and multiple coincidences (b–d) worsen the image quality, and it is necessary to identify and minimize them. Figure adapted from [9]

Multiple Coincidences
There is a chance that three or more single events are detected within the timing coincidence window. In such a case, the events are normally discarded, because the position of the originating event becomes ambiguous. Nevertheless, such detections still contain information on the quantity and spatial location of the decays; therefore, under specific circumstances, one of the lines is selected at random from the multiple detections.


2.2.3 PET Detector Materials

A PET scanner consists of hundreds of detectors arranged in a ring surrounding the source; the two detectors placed along the line of flight of the two opposed emitted photons detect and identify those coming from the electron–positron annihilation [10]. Generally, inorganic scintillators are employed as detectors. The energy absorbed from the photon raises an electron of the material to a higher energy state; that vacancy is then filled by another electron from a higher state, emitting a photon in the transition. These scintillation photons are detected by the cathode of a photomultiplier tube, which transforms them into electrons that are accelerated by electric fields in several stages. The electrons are collected by the anode, generating the electric signal needed to form the final image.

Originally, PET detectors were made of the NaI(Tl) crystals used in gamma cameras. Large surfaces of these crystals are not excessively difficult to manufacture; nonetheless, they have a low sensitivity to 511 keV photons. Thus, in the 1970s a new material, bismuth germanate (Bi4Ge3O12), known as BGO, was introduced. Despite having worse timing and light-production properties, it turned out to be more sensitive to annihilation photons and became massively used in PET scanners. In the late 1990s, new scintillators were developed with characteristics more appropriate for PET, and they are the ones used today: lutetium oxyorthosilicate (LSO) and gadolinium oxyorthosilicate (GSO). Both materials improve photon detection efficiency and have a narrower time coincidence window, with better energy resolution, allowing, for instance, random detections to be rejected.

Furthermore, new materials were introduced, such as lutetium–yttrium oxyorthosilicate (LYSO). Its narrow coincidence window permits the use of the "time of flight" (TOF) technique [9]. This method exploits the fact that annihilation photons produced at a point away from the centre of the scanner travel different distances to reach their respective opposed detectors. Hence, using fast detectors (LYSO) with a narrow coincidence window, it is possible to measure the time difference between the two detections and, therefore, to estimate, with a certain uncertainty, the point on the coincidence line where the photons were produced. Consequently, this technique improves image quality, especially in tomographies of heavier patients.
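The TOF localization is simple geometry: if one photon arrives Δt earlier than the other, the annihilation point lies c·Δt/2 from the midpoint of the line of response. A hedged sketch (the function name and example timing value are illustrative):

```python
C_M_S = 299_792_458.0  # speed of light, m/s

def tof_offset_m(delta_t_s):
    """Displacement of the annihilation point from the centre of the
    line of response, given the photon arrival-time difference."""
    return C_M_S * delta_t_s / 2.0
```

For an assumed timing resolution of 400 ps, tof_offset_m(400e-12) is about 0.06 m, i.e. a positional uncertainty of roughly 6 cm along the coincidence line.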

2.2.4 Image Acquisition and Reconstruction

In a 2D acquisition, a uniform angular sampling is made. The events collected by every pair of detectors are arranged in a 2D matrix called a sinogram. The elements of this matrix correspond to the number of events detected by each pair of detectors, ordered such that the rows are a function of the azimuthal angle of the coincidence lines and the columns are ordered according to the distance from the coincidence line to the centre. When 3D acquisitions are performed, it is necessary to include a polar angle besides the azimuthal one. Additionally, it is important to apply some corrections to the collected data to optimize the resulting images and minimize undesirable effects:
• Dead time correction: the dead time is the time a detector requires to process and register an event, during which it cannot detect anything else. This effect is more significant at high activity rates. It is corrected through empirical relations between the true and measured detection rates.
• Normalization: PET tomographs have thousands of detectors, which can present slight differences in thickness, light emission, and electronics. It is necessary to calibrate every detector periodically by different methods, such as the use of a linear 68Ge source.
• Random detections correction: as explained previously, random events reduce the image quality and distort the activity values. A method to correct for them is the delayed window method. It consists of a second measurement, delayed 50 ns with respect to the time coincidence window, which gives an estimate of the random detections that can be subtracted from the coincidence measurement in real time.
• Scattered photons correction: this effect is corrected by different, complex mathematical treatments of the data after the random detection correction. Such methods depend on whether the reconstruction is 2D or 3D.
• Attenuation correction: attenuation deteriorates PET images and is the most important effect to be corrected. For this purpose, four methods may be considered: measured correction, CT-based correction, mathematical methods, and segmentation-based correction.

After acquisition and corrections, the image must be reconstructed. Reconstruction aims to obtain cross-sectional images of the spatial distribution of the radionuclide throughout the object under study. Basically, there are two reconstruction approaches. The first is essentially analytic, that is, it uses the mathematical basis of computed tomography to relate the line-integral data to the activity distribution in the object. There are multiple reconstruction algorithms of this kind, such as filtered backprojection and Fourier reconstruction [5]. To reconstruct the image, the sinogram information is needed. The projection data for every angle undergo a Fourier transform, and the resulting values are arranged in a grid as a function of the azimuthal angle Φ. A two-dimensional inverse Fourier transform is then applied to the grid to obtain the reconstructed image. Unlike simple backprojection, filtered backprojection applies a filter during reconstruction. This ramp filter weights the data proportionally to spatial frequency, compensating for the oversampling of low frequencies and thereby reducing the blurring present in simple backprojection images. The second approach uses iterative methods that model the data collection process to obtain, by means of successive iterative steps, the image that best matches the measured data.
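As an illustration of the dead-time bullet above, one common empirical form (the non-paralyzable model, a standard textbook choice assumed here, not named in the chapter) relates the measured rate m and the true rate n as m = n / (1 + n·τ), which inverts to n = m / (1 − m·τ):

```python
def true_rate_cps(measured_cps, dead_time_s):
    """Invert the non-paralyzable dead-time model m = n / (1 + n*tau)."""
    return measured_cps / (1.0 - measured_cps * dead_time_s)

def measured_rate_cps(true_cps, dead_time_s):
    """Forward non-paralyzable dead-time model."""
    return true_cps / (1.0 + true_cps * dead_time_s)
```

For example, with an assumed 1 µs dead time, a measured rate of 100,000 cps corresponds to a true rate of about 111,111 cps; the size of the correction grows with activity, as the text notes.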


2.2.5 Image Quality

The main factors affecting image quality in nuclear medicine are [11]:
• Radiopharmaceutical properties: the choice of radiopharmaceutical and its half-life, energy, and emission characteristics play a significant role in image quality. Radiopharmaceuticals with characteristics appropriate for the specific study are essential.
• Radiopharmaceutical dosage: administering the appropriate radiopharmaceutical dosage based on the patient's weight and condition is crucial for achieving optimal image quality.
• Radiotracer uptake: the uptake and distribution of the radiotracer within the patient's body affect image quality. Variability in uptake patterns can lead to image distortion or poor contrast.
• Patient preparation: patient preparation, including fasting, hydration, and any necessary medication adjustments, can influence image quality. Adequate patient cooperation is also essential to minimize motion artifacts.
• Detector sensitivity: the sensitivity of the gamma camera or PET scanner detectors impacts image quality. More sensitive detectors provide higher-quality images with better signal-to-noise ratios.
• Collimator design: the collimator is a critical component that shapes the gamma photon paths to form the image. Different collimator designs involve varying trade-offs between spatial resolution and sensitivity.
• Acquisition time: the duration of image acquisition affects image quality. Longer acquisition times can yield better image quality but may not always be practical, due to patient comfort and radiotracer decay.


• Motion artifacts: patient motion during image acquisition can result in blurring or misalignment of structures. Strategies to minimize motion, such as immobilization devices or gating techniques, are important.
• Noise: various sources of noise, such as statistical noise, electronic noise, and patient motion, can degrade image quality. Noise reduction techniques and longer acquisition times can mitigate this issue.
• Attenuation correction: correcting for attenuation (the absorption and scattering of radiation within the body) is crucial for accurate image quantification and improved image quality, especially in PET imaging.
• Scatter correction: scattered gamma photons can degrade image contrast and quality. Advanced algorithms can be used to correct for scatter and improve image quality.
• Image reconstruction algorithms [11]: the choice of image reconstruction algorithm can significantly impact image quality. Iterative reconstruction methods often provide superior results compared to traditional filtered backprojection.
• Count statistics: sufficient counts are needed to generate high-quality images. Low-count studies may result in noisy images with poor contrast.
• Technician skills: the skill and experience of the nuclear medicine technician in positioning the patient, setting acquisition parameters, and monitoring the procedure influence image quality.
• Quality control: routine quality control measures, including calibration and maintenance of the imaging equipment, are essential to ensure consistent, high-quality nuclear medicine images.
• Post-processing: image post-processing techniques, such as image filtering and contrast enhancement, can be used to improve image quality and diagnostic accuracy.

Optimizing these factors and adhering to best practices in nuclear medicine imaging helps ensure that high-quality images are obtained, leading to more accurate interpretations.

2  Principles of Image Formation in the Different Modalities


2.3 Magnetic Resonance Imaging

Magnetic resonance (MR) imaging (MRI) is based on the response of tissues to magnetic fields and radiofrequency waves, which is exploited to generate detailed images of the different body structures [12]. The magnetic field is a vectorial quantity (with magnitude and direction) measured in units of Tesla (T) in the International System. The advantages of MR include that it does not use ionizing radiation, allows multiplanar acquisitions, provides a large amount of information for each anatomical slice, and allows dynamic and functional studies. The disadvantages include longer acquisition times, sequences that are complex to optimize, high heterogeneity dependent on acquisition parameters, and higher cost than other imaging techniques.

2.3.1 Hardware Components of MR

Magnetic Field Magnet

To obtain an MR image, it is necessary to create a very intense, uniform, and stable magnetic field within a defined volume. Magnets are used to generate this field, the most common field strengths in current equipment being 1.5 T and 3 T. The higher the magnetic field strength, the stronger the signals obtained. Most MR equipment currently used in the clinical environment employs superconducting magnets to generate the main magnetic field. These fields are generated by wire coils through which high-intensity current flows. The conductive wires are usually made of metallic alloys (commonly niobium and titanium) which lose their resistance to current flow when cooled to temperatures close to absolute zero, becoming "superconductive". For this purpose, the conductive wires are immersed in a liquid helium bath. The main maintenance cost of this equipment is refilling the helium (about once a year), which gradually evaporates.


Magnetic Field Gradient Magnets

MRI equipment uses magnetic field gradients to create a spatial differentiation of the studied region. This provides spatial encoding along the X, Y, and Z axes to produce sagittal, coronal, and axial slices, respectively. Oblique slices can be obtained by activating several coils simultaneously. These magnetic fields are much weaker than the main field.

Radiofrequency Coils

They generate the radiofrequency (RF) radiation and are also responsible for detecting the signal returned by the studied tissues. The most important components are as follows:

• Frequency synthesizer: produces a central frequency matched to the excitation frequency of the nuclei.
• RF envelope: produces a range of frequencies (bandwidth) around the central frequency.
• Power amplifiers: magnify the RF pulses in order to increase the energy responsible for exciting the nuclei.
• Transmitting and receiving antennas: there are many types; they are responsible for emitting RF signals to excite the nuclei and for collecting the signal emitted by the tissues.

2.3.2 Physical Basis

Most existing MR equipment is based on the excitation of hydrogen (H) nuclei [13]. The H nucleus consists of a single proton, and hydrogen is the most abundant element in living organisms since it is part of water molecules.

Nuclear Spin and Magnetic Moment

Spin is a quantized particle property, i.e., it only takes certain discrete values. Particles with non-zero spin, such as protons, being electric charges in motion, generate around them a magnetic field with an associated "magnetic moment" vector (μ), which is oriented in the spin direction.


In a nucleus, composed of protons and neutrons, the spins tend to be paired, because this is an energetically favourable situation. Magnetically active nuclei are those with non-zero spin, i.e., those with an odd number of protons and/or neutrons. The H nucleus, consisting of a single proton, therefore has spin. Without an external magnetic field, the magnetic moments of the H nuclei are randomly oriented, cancelling each other out and resulting, macroscopically, in a net magnetization of the body equal to zero. However, when an external magnetic field is applied, the magnetic moment (μ) tends to align in the direction of the magnetic field.

Precession and Larmor Frequency

The magnetic moment μ of the H nucleus is not completely parallel to the direction of the magnetic field and performs a conical rotational movement, defined as precession motion (Fig. 2.4). The precession angle is determined by quantum laws, but the precession frequency is characteristic of each nucleus and depends on the applied magnetic field (B). This characteristic frequency is called the Larmor frequency (ν) and is calculated as follows, where γ is the gyromagnetic ratio of the particle, which depends on its charge and mass:

ν = γ · B    (2.4)
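As a quick numeric check of Eq. (2.4): with the hydrogen gyromagnetic ratio expressed per unit of 2π (γ/2π ≈ 42.58 MHz/T, a standard tabulated value), the formula gives the familiar clinical resonance frequencies:

```python
GAMMA_BAR_H = 42.58  # MHz/T, gamma/(2*pi) for the 1H nucleus

def larmor_mhz(b_tesla):
    """Larmor frequency of 1H (Eq. 2.4), expressed in MHz, at field strength B."""
    return GAMMA_BAR_H * b_tesla

print(larmor_mhz(1.5))  # ~63.9 MHz, typical 1.5 T scanner
print(larmor_mhz(3.0))  # ~127.7 MHz, typical 3 T scanner
```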

Parallel or Antiparallel Alignment

Once the H nucleus is placed in a magnetic field B, there are two possibilities: it can occupy a lower-energy state parallel to the magnetic field or a higher-energy state antiparallel to it. For the human body and the magnetic fields used in MR at ambient temperature, there is a slight excess of H nuclei in the parallel state over the antiparallel state. The vector sum of the magnetic moments of all nuclear spins is defined as the net magnetization vector, represented by M. Without an external magnetic field, the magnetization vector is generally zero, because the magnetic moments of the H nuclei are


Fig. 2.4  The particle rotation (spin) generates a magnetic moment μ. When an external magnetic field B is applied, the magnetic moment tends to orient in the direction of the magnetic field (B), forming an angle θ with it and producing a precession motion at the Larmor frequency. Figure adapted from [13]

oriented in random directions. When a magnetic field is applied, there is an excess of parallel-oriented nuclei, giving rise to a net magnetization oriented in the direction and sense of the magnetic field, without any transverse magnetization component.

Resonance and Nutation Motion

In the presence of the magnetic field B alone, the magnetization vector M is in equilibrium (Fig. 2.5a). To obtain information for image generation, it is necessary to excite the nuclei that compose the tissues. This is done by applying radiofrequency pulses at the Larmor frequencies of the nuclei to be excited. During the radiofrequency pulse, the H nuclei with lower energy (parallel state) absorb energy and switch to a higher energy


Fig. 2.5 (a) In the presence of a magnetic field B, the magnetization vector M is in equilibrium with the same direction and sense as B due to a higher concentration of atoms in the parallel state. (b) Once the RF pulse is applied, the atoms precess coherently, causing M to be projected onto the transverse plane (X,Y). When the RF pulse ends, the particles enter a relaxation phase, returning to their original state (a). Figure adapted from [13]

state (antiparallel state). Macroscopically, the magnetization vector M moves away from its equilibrium position during the pulse. In addition, all the protons subjected to the RF pulse enter into resonance simultaneously, i.e., coherently (Fig. 2.5b). Besides modifying the magnetization vector along the Z axis, this coherence produces a growth of the vector projected onto the transverse plane (X,Y). This process is called "radiofrequency pulse excitation".

Longitudinal Relaxation: T1

At the moment the RF pulse ends, the H nuclei release their energy to the surrounding medium, so some of those oriented in the antiparallel state return to the parallel state. A more homogeneous surrounding medium means a more coherent and uniform energy release.


Water, due to its chemical properties, exchanges energy with difficulty, so its relaxation is coherent and very slow. This gives water a long longitudinal relaxation time (long T1). In contrast, fat, due to its molecular mobility, produces rapid energy exchanges and has short T1 relaxation times. When, instead of a single RF pulse, several pulses are emitted separated by a time called the repetition time (TR), tissues with a long T1 (water) do not have time to relax completely, so they have fewer relaxed nuclei available to excite when a new RF pulse arrives. As a result, tissues with a short T1 (fat) emit a stronger signal than those with a long T1 (water).

Transverse Relaxation: T2 and T2*

Once the RF pulse ends, there is a loss of precession coherence, or dephasing, in the X,Y plane. Some protons will precess more slowly than others depending on the influence of the medium or on local variations of the applied magnetic field. One of these two factors, the inhomogeneity of the external magnetic field, can be compensated. The transverse relaxation time is called T2 when the inhomogeneities of the external magnetic field are compensated and only the medium is considered; if the inhomogeneity of the magnetic field is included, it is called T2*. Thus, T2* will always be shorter than T2, since the two influences together cause a faster loss of coherence. Free induction decay (FID) is the electrical current induced by the relaxation motion of the H nuclei after the RF signal ends. This signal is registered in a receiving antenna and is processed to obtain the image. The time between sending the RF pulse and collecting the FID is called the echo time (TE). Consequently, with a fixed TE, the tissue that loses coherence more slowly (long T2, e.g., water) gives a stronger signal.

Proton Density Image (PD)

For a given TE, if the TR is lengthened, the longitudinal relaxation of the tissues will be complete and the effect of the T2 relaxation time will be minimal, so the resulting image will depend on the density of H nuclei in the voxel.

T1, T2 or PD Weighted Images

All MR images have both T1 and T2 components. A correct selection of the TR and TE parameters allows a weighting towards T1, T2 or a suitable combination of both (PD weighted image). To summarize:

• Short TR / short TE: T1 weighted image
• Long TR / short TE: PD weighted image
• Long TR / long TE: T2 weighted image
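These TR/TE rules can be made concrete with the standard simplified spin-echo signal equation, S = PD · (1 − e^(−TR/T1)) · e^(−TE/T2). The tissue values below are rough illustrative numbers, not reference data:

```python
import math

def spin_echo_signal(pd, t1, t2, tr, te):
    """Simplified spin-echo signal: S = PD * (1 - exp(-TR/T1)) * exp(-TE/T2).
    All times in ms; PD in arbitrary units."""
    return pd * (1.0 - math.exp(-tr / t1)) * math.exp(-te / t2)

# Illustrative tissue values: water-like tissue has long T1/T2, fat short T1/T2.
water = dict(pd=1.0, t1=2500.0, t2=200.0)
fat = dict(pd=0.9, t1=260.0, t2=80.0)

# Short TR / short TE -> T1 weighting: fat (short T1) is brighter.
print(spin_echo_signal(**fat, tr=500, te=15) > spin_echo_signal(**water, tr=500, te=15))  # True

# Long TR / long TE -> T2 weighting: water (long T2) is now brighter.
print(spin_echo_signal(**fat, tr=4000, te=100) > spin_echo_signal(**water, tr=4000, te=100))  # False
```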

2.3.3 Image Acquisition and Reconstruction

The magnetic gradient fields Gx, Gy, and Gz are activated to create a spatial encoding along the three space directions. The Gz gradient is used to select the slice along the longitudinal axis; for the transverse plane, Gy (phase encoding) and Gx (frequency encoding) are used [14]. Phase encoding begins when Gy is activated. The rows that receive a higher magnetic field precess at a higher frequency than rows that receive a lower magnetic field. When the Gy gradient closes, there is a phase shift between the different rows whereby each row in the plane can be uniquely identified. Frequency encoding is performed by Gx, which is perpendicular to Gy, so that each column receives a different magnetic field. By Larmor's law, the H nuclei of different columns will precess at different frequencies. To prevent Gx and Gy from overlapping, which would make the encoding of each row impossible, a bipolar Gx gradient is applied, with two lobes of the same amplitude and duration but opposite polarity. During the first lobe (−Gx) no signal is captured; it is used to produce a phase shift that will be compensated by the one produced during the reading. When the second gradient (+Gx) is applied, the echo signal is collected. The second lobe (+Gx) is applied just after the first one and inverts the gradient over the
nuclei of H. Therefore, the nuclei that had advanced in phase are now delayed, and after a time (tx) all the spins are in phase again. During the reading phase each column precesses at a frequency which depends on its position. At the beginning of the echo signal collection the spins are strongly phase-shifted, so the collected signal is very small. As time elapses the phase shifts are gradually recovered, and the echo signal increases up to a maximum after a time tx. Generally, the second lobe is allowed to act for another tx to collect the complete echo signal, because the signal then decreases progressively as the phase shift grows again under the effect of this gradient (Fig. 2.6). The encoding process of a complete plane is repeated as many times as there are rows in the plane, since for each row the signal corresponding to the action of the bipolar gradient Gx, with a phase encoding determined by Gy, is collected. The echo signals are digitized and stored in an ordered fashion in a matrix that constitutes the k-space.
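Frequency encoding works because, under a gradient, the Larmor relation of Eq. (2.4) becomes position-dependent: ν(x) = (γ/2π)(B0 + Gx·x). A small sketch with illustrative (not vendor-specific) gradient values:

```python
GAMMA_BAR = 42.58e6  # Hz/T, gamma/(2*pi) for 1H

def precession_frequency(b0_t, gx_t_per_m, x_m):
    """Precession frequency (Hz) of 1H at position x along the readout axis."""
    return GAMMA_BAR * (b0_t + gx_t_per_m * x_m)

# With an illustrative 10 mT/m readout gradient at 1.5 T,
# two columns 10 cm apart precess ~42.6 kHz apart, which is how
# the Fourier transform can separate them:
delta = precession_frequency(1.5, 10e-3, 0.10) - precession_frequency(1.5, 10e-3, 0.0)
print(delta)  # ~42 580 Hz
```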

Fig. 2.6  Schematic diagram of the application of magnetic gradients for echo signal generation and spatial encoding of the tomographic plane. Figure adapted from [14]


K-Space

The echo signal collected by the receiving antenna is subjected to a series of stages before digitization [15]. The bandwidth (BW) is the frequency range collected and accepted for digitization, measured in Hertz (Hz). The echo signal is digitized by measuring the voltage at regular time intervals called sampling intervals (∆tm). The number of samples to be taken corresponds to the number of pixels to be displayed in a row. Actually, two components are generated in the process (a real component and an imaginary component), although for simplicity of explanation a single component will be considered for the moment. By Nyquist's theorem, a signal can be mathematically reconstructed if it is band-limited and the sampling rate is more than two times the maximum frequency of the sampled signal. The echo signal's maximum frequency is found at the extreme of the acquired band, that is, at BW/2. Applying Nyquist's theorem, the minimum reading frequency of the signal will be BW, and the sampling interval will be determined by

∆tm = 1 / BW = 1 / (2 · fmax)    (2.5)

Hence, for each encoding, a line of digitized values of the echo signal is obtained, spaced by a sampling interval ∆tm. From the Gy values employed, it is possible to arrange the lines obtained from the different echoes as rows of a matrix in which the columns are separated by a time ∆tm and the rows by the time it takes to pass from one echo signal to another, i.e., the TR. This matrix, which constitutes the digitized data space in the time domain, must be transformed to frequency-domain data to reconstruct the image. The signals that constitute the echo, obtained by the action of the Gx gradient, belong to a frequency range that depends on the position in the voxel plane; therefore, we obtain as many digitized values expressed on a spatial frequency scale (kx) as pixels to represent in a row.


The maximum spatial frequency in the direction of Gx, at the extreme of the field of view in that direction (FOVx), is determined by

fmax = (γ / 2π) · Gx · (FOVx / 2)    (2.6)

Applying Nyquist's theorem again, the spatial frequency spacing is given by

∆kx = 1 / FOVx = (γ / 2π) · Gx · ∆tm    (2.7)

Since time t is referenced to the echo maximum (at TE), the values of kx are ordered on a line of spatial frequencies spaced one interval ∆kx apart, arranged symmetrically with respect to the centre. This line constitutes a row of the k-space matrix. In order to fill the matrix, it is necessary to collect as many echoes as there are voxels in a column, which are encoded by the gradient value Gy. Considering that the application time of Gy is always the same (ty), the difference between echoes will be the variation of the Gy gradient value; therefore:

∆ky = 1 / FOVy = (γ / 2π) · ∆Gy · ty    (2.8)
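Equations (2.5)–(2.7) are mutually consistent: sampling at the Nyquist rate for the bandwidth set by the readout gradient automatically yields ∆kx = 1/FOVx. A numeric check with illustrative (not vendor-specific) values:

```python
GAMMA_BAR = 42.58e6  # Hz/T, gamma/(2*pi) for 1H
GX = 10e-3           # T/m, illustrative readout gradient strength
FOV_X = 0.25         # m, illustrative field of view

f_max = GAMMA_BAR * GX * (FOV_X / 2)  # Eq. (2.6): frequency at the FOV edge
bw = 2 * f_max                        # acquisition bandwidth
dt_m = 1 / bw                         # Eq. (2.5): Nyquist sampling interval
dk_x = GAMMA_BAR * GX * dt_m          # Eq. (2.7): resulting k-space spacing

print(dk_x, 1 / FOV_X)  # both 4.0 (1/m), i.e., dk_x = 1/FOV_x
```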

This is how the data matrix that constitutes the k-space is obtained, where rows are separated by ∆ky and columns by ∆kx, and each matrix position (kx, ky) holds a value (signal strength). Thus, the outermost (peripheral) lines of the k-space matrix are filled with the highest values of Gy. Since high spatial frequencies carry information about fast signal variations in space, the outermost rows of k-space carry information about the spatial resolution of the image. Analogously, the central part of the matrix is where the highest signal intensities are stored, because the lowest spatial frequencies (low Gy values) carry most of the contrast information; i.e., the central part of the k-space matrix carries information about contrast resolution.


As explained above, the digitization of the echo signal generates two components, so k-space is composed of a two-component matrix (real and imaginary). From these two components, two different images are formed: the magnitude image, which is the image usually presented, and the phase image. When all the k-space data have been filled in, a spatial-domain image can be obtained by Fourier transformation (FT) [5], assigning a chromatic value (representation scale) to each spatial position (image pixel). Therefore, every image has an associated equivalent k-space, and it is possible to pass from one to the other by Fourier transforms. Although the rows and columns of the matrix coincide in the image and in k-space, each k-space value contains information about the entire image plane, so a position in the image plane (x,y) is not necessarily highly correlated with the analogous k-space value (kx,ky). K-space in MRI is a mathematical space that represents the information collected during the acquisition process and is one of the most versatile tools in image generation. Image artifacts, such as those caused by patient movements or hardware limitations, appear in k-space and affect image reconstruction. Depending on the way the matrix is filled, the amount of information stored, or its rearrangement, different types of images can be obtained with different acquisition times. Advances in acquisition and reconstruction techniques have been fundamental in improving the quality and speed of MR imaging.
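The image↔k-space duality, and the fact that the centre of k-space carries contrast while the periphery carries detail, can be illustrated with a toy NumPy example (a synthetic square "image", not real MR data):

```python
import numpy as np

# Toy "image": a bright square on a dark background.
image = np.zeros((64, 64))
image[24:40, 24:40] = 1.0

# Simulated k-space: centred 2D Fourier transform of the image.
k_space = np.fft.fftshift(np.fft.fft2(image))

# Full reconstruction: the inverse FT recovers the image exactly.
recon = np.abs(np.fft.ifft2(np.fft.ifftshift(k_space)))
print(np.allclose(recon, image))  # True

# Keep only the central 16x16 of k-space: gross contrast survives,
# but edges blur (low spatial frequencies carry the contrast).
mask = np.zeros_like(k_space)
mask[24:40, 24:40] = 1
low_pass = np.abs(np.fft.ifft2(np.fft.ifftshift(k_space * mask)))
print(low_pass[32, 32] > low_pass[0, 0])  # True: interior still brighter
```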

2.3.4 Image Quality

Several parameters that affect image quality can be adjusted before starting an MRI acquisition [16]. Therefore, to achieve a good-quality image in a reasonable period of time, it is necessary to select them appropriately. The principal factors responsible for image quality in MRI are the signal-to-noise ratio (SNR), the contrast-to-noise ratio (CNR), the spatial resolution, and the image acquisition time.


Signal-to-Noise Ratio (SNR)

SNR represents the ratio of the received signal to the received noise. A simple way to improve image quality is to increase the signal and decrease the noise, although this is not always possible, as some parameters improve SNR at the cost of deteriorating other factors. The parameters that most affect SNR are the following:

• Proton density. This affects the received signal, since a tissue with a higher proton density sends a stronger signal than a tissue with a lower proton density.
• Voxel volume. The SNR is proportional to the voxel volume. Thus, SNR can be increased by increasing the slice thickness or by increasing the FOV while leaving the matrix size unchanged. Equivalently, the matrix size can be decreased leaving the FOV unchanged, by decreasing the number of rows/columns acquired in k-space.
• TR, TE, and flip angle of the magnetization vector. A long TR allows complete recovery of the longitudinal magnetization, increasing the signal. Likewise, when a short TE is used, less transverse magnetization is lost before the signal is picked up. The flip angle determines how much magnetization is generated in the transverse plane; thus, a sequence using a flip angle of 90° provides more signal, since all the longitudinal magnetization is tipped into the transverse plane to be collected, increasing the SNR.
• Number of acquisitions or excitations. This represents the number of times the data collection is repeated. The SNR is proportional to the square root of the number of acquisitions: SNR ∝ √(number of acquisitions).
• Type of coil used in the acquisition.
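The √N behaviour of signal averaging can be checked with a quick Monte Carlo sketch (synthetic Gaussian noise and illustrative values, not scanner data):

```python
import random
import statistics

random.seed(0)

def residual_noise_sd(n_averages, noise_sd=1.0, signal=10.0, n_trials=2000):
    """Average n_averages noisy 'acquisitions'; return the residual noise SD."""
    residuals = []
    for _ in range(n_trials):
        avg = sum(signal + random.gauss(0, noise_sd) for _ in range(n_averages)) / n_averages
        residuals.append(avg - signal)
    return statistics.stdev(residuals)

sd1 = residual_noise_sd(1)
sd4 = residual_noise_sd(4)
# Four averages quadruple the scan time but only halve the noise:
print(sd1 / sd4)  # ~2, i.e., SNR grows with the square root of the averages
```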

Spatial Resolution

The parameters involved in spatial resolution are the slice thickness, the FOV, and the image matrix size. To increase the spatial resolution, the voxel volume has to be reduced by decreasing the slice thickness or the FOV, or by increasing the matrix size. These improvements, as discussed above, result in a reduction of the SNR.


Contrast-to-Noise Ratio (CNR)

CNR is the difference in grey level between tissues relative to the noise. Some of the parameters that affect the CNR are as follows:

• Relaxation times T1 and T2, and proton density. These properties, related to the emitted signal, are intrinsic physical properties of the tissues.
• TR, TE, and flip angle of the magnetization vector. With a long TR, the magnetization vector is fully recovered before the next pulse is received and is therefore available to be tipped into the transverse plane, which enhances the contrast. Using a long TE (T2-weighted sequences), only tissues with long T2 relaxation times retain transverse magnetization when the echo is read; the rest of the tissues present no signal, and although the SNR is worse, there is a high CNR. The flip angle also influences the CNR, since it determines the amount of magnetization generated in the transverse plane: the greater the angle, the greater the contrast.

Image Acquisition Time

A shorter image acquisition time means that the image is less likely to be impaired by patient movements. The parameters affecting acquisition time are as follows:

• TR. Decreasing TR results in an incomplete recovery of the magnetization vector, with a consequent decrease in the SNR.
• Number of phase encodings. By reducing the number of k-space lines, a rectangular field of view is obtained. This reduces the acquisition time and the spatial resolution and increases the SNR.
• Number of acquisitions. If the number of acquisitions is reduced, the acquisition time decreases without changing the spatial resolution, but the SNR is reduced.
• Echo reading time. By reducing the TE, the signal drop is reduced and the SNR is increased. What is used to further
reduce the acquisition time is to acquire a partial (fractional) echo, in which only part of the frequency encodings is acquired and the rest is calculated by the equipment. In this way the resolution does not decrease, but the SNR deteriorates.

It is important to know these parameters and their interrelationships, since their correct management allows obtaining an image with a strong signal, good resolution, and good contrast in the shortest feasible time.
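For a conventional (non-accelerated) sequence, the trade-offs above combine into a simple scan-time formula: acquisition time = TR × number of phase encodings × number of acquisitions. A small helper (the parameter values are illustrative):

```python
def scan_time_seconds(tr_ms, n_phase_encodings, n_acquisitions):
    """Acquisition time of a conventional sequence: TR x Npe x NEX."""
    return tr_ms / 1000.0 * n_phase_encodings * n_acquisitions

# TR = 500 ms, 256 phase-encoding steps, 2 acquisitions:
print(scan_time_seconds(500, 256, 2))  # 256.0 s, about 4.3 min

# Halving the phase encodings (rectangular FOV) halves the time:
print(scan_time_seconds(500, 128, 2))  # 128.0 s
```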

References

1. Brenner DJ, Hall EJ (2007) Computed tomography, an increasing source of radiation exposure. New Engl J Med 357(22):2277–2284. https://doi.org/10.1056/NEJMra072149
2. Johns HE, Cunningham JR (1983) The physics of radiology, 4th edn. ISBN: 0-398-04669-7
3. Curry TS, Dowdey JE, Murry RC (1990) Christensen's physics of diagnostic radiology, 4th edn. ISBN: 0-8121-1310-1
4. Herman GT (2009) Fundamentals of computerized tomography. Image reconstruction from projections, 2nd edn. ISBN: 978-1-84628-737-7
5. Bracewell RN (2000) The Fourier transform and its applications, 3rd edn. ISBN: 0-07-303938-1
6. Aichinger H, Dierker J, Joite-Barfuß S, Säbel M (2012) Radiation exposure and image quality in X-ray diagnostic radiology, 2nd edn. ISBN: 978-3-642-11240-9
7. Anand SS, Singh H, Dash AK (2009) Clinical applications of PET and PET-CT. Med J Armed Forces India 65(4):353–358. https://doi.org/10.1016/S0377-1237(09)80099-3
8. Shukla AK, Kumar U (2006) Positron emission tomography: an overview. J Med Phys 31(1):13–21. https://doi.org/10.4103/0971-6203.25665
9. Surti S (2015) Update on time-of-flight PET imaging. J Nucl Med 56(1):98–105. https://doi.org/10.2967/jnumed.114.145029
10. Cherry SR, Dahlbom M (2006) PET: physics, instrumentation and scanners. ISBN: 978-0-387-32302-2
11. Cherry SR, Sorenson JA, Phelps ME (2012) Image quality in nuclear medicine. Phys Nucl Med 233–251. https://doi.org/10.1016/b978-1-4160-5198-5.00015-0


12. Weishaupt D, Köchli VD, Marincek B (2008) How does MRI work? An introduction to the physics and function of magnetic resonance imaging, 2nd edn. ISBN: 978-3-540-30067-0
13. Harms SE, Morgan TJ, Yamanashi WS, Harle TS, Dodd GD (1984) Principles of nuclear magnetic resonance imaging. Radiographics 4:26–43. https://doi.org/10.1148/radiographics.4.1.26
14. Conolly S, Macovski A, Pauly J, Schenck J, Kwong KK, Chesler DA (2000) Magnetic resonance imaging. In: The biomedical engineering handbook, 2nd edn. ISBN: 0-8493-0461-X
15. Horowitz AL (1995) MRI physics for radiologists: a visual approach, 3rd edn. ISBN: 978-0-387-94372-5
16. Westbrook C, Kaut C (2006) MRI in clinical practice, 2nd edn. ISBN: 978-1-84628-161-7

3: How to Extract Radiomic Features from Imaging

A. Jimenez-Pastor and G. Urbanos-García

A. Jimenez-Pastor (*) · G. Urbanos-García
Department of AI Research, Quibim, Valencia, Spain
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
Á. Alberich-Bayarri, F. Bellvís-Bataller (eds.), Basics of Image Processing, Imaging Informatics for Healthcare Professionals, https://doi.org/10.1007/978-3-031-48446-9_3

3.1 Introduction to Radiomic Analysis

Radiomic analysis has been widely applied in cancer research and has demonstrated its potential to improve patient care. For instance, radiomic analysis can provide information on tumor heterogeneity, which is known to be an important factor in cancer progression and treatment resistance. In addition, radiomic analysis has shown promising results in patient staging and risk stratification at diagnosis [1, 2] and in predicting patient outcomes, such as overall survival, disease-free survival, and progression-free survival [3, 4]. Furthermore, radiomic analysis can be used to predict treatment response and identify patients who are likely to benefit from specific therapies [5, 6]. This can help avoid unnecessary treatments and reduce the risk of side effects, ultimately improving patient quality of life. These predictions are based on a combination of radiomic features, clinical data, and other biomarkers. By integrating multiple sources of data, radiomic analysis can provide a more comprehensive understanding of the patient's disease status and help clinicians make informed treatment decisions [7].

Radiomic analysis is not limited to cancer research and can also be applied to other medical conditions, such as neurological disorders, cardiovascular diseases, and respiratory diseases. In these fields, radiomics can help identify early disease markers and monitor disease progression. For instance, radiomic analysis has been used to diagnose Alzheimer's disease and predict cognitive decline in older adults [8]. In cardiovascular diseases, radiomic analysis has been used to evaluate the morphology and function of the heart, which can help identify patients at risk of developing heart failure or other cardiovascular complications [9]. Finally, in respiratory disorders, radiomics has been used to stage interstitial lung disease (ILD) [10] and to risk-stratify patients with chronic obstructive pulmonary disease (COPD) [11].

Radiomic analysis can be applied to any imaging modality, including computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), and others. The process of radiomic analysis involves several steps, starting with image acquisition and preprocessing, followed by the segmentation of the region of interest (ROI). Once the ROI is defined, radiomic features are extracted, which include shape, intensity, texture, and statistical measures, among others. These features are then subjected to statistical and machine learning algorithms to identify patterns and relationships between features and clinical endpoints.

In addition to radiomic features, other features such as deep features or other imaging biomarkers can be extracted from the image. These features can be combined with radiomic features to provide the model with a more comprehensive view of the patient's condition. To extract deep features from medical images, pretrained convolutional neural networks (CNNs) are used.
These architectures utilize multiple convolutional filters to extract features at different levels. This approach results in a high-dimensional feature vector for each input image. On the other hand, imaging biomarkers are computationally derived parameters that have been shown to correlate with a physiological or pathological process. Some examples of imaging biomarkers
include the apparent diffusion coefficient (ADC), a measure of tissue cellularity obtained from MRI diffusion-weighted images (DWI); Ktrans, a measure of tissue permeability obtained from dynamic contrast-enhanced (DCE) MRI; and the proton density fat fraction (PDFF), a measure of fat concentration obtained from multi-echo T1-weighted MRI. These are just a few of the many imaging biomarkers that can be extracted from medical images using various computational methods. The use of imaging biomarkers can help improve diagnostic accuracy, predict treatment response, and monitor disease progression. Additionally, imaging biomarkers can provide valuable insights into the underlying biology of diseases [12].

Despite the potential benefits of radiomic analysis, there are still several challenges that need to be addressed. One major challenge is the lack of standardization in image acquisition and analysis protocols, which can affect the reproducibility and reliability of radiomic features [13]. Another is the lack of robustness of some radiomic features due to intra- and inter-observer variability [14]. A further limitation is the need for larger datasets to develop robust and reproducible predictive models, and the lack of external validation limits the application of these solutions in clinical practice. Finally, the integration of radiomic analysis into clinical workflows remains a challenge, as it requires a multidisciplinary approach and collaboration between radiologists, oncologists, and other healthcare professionals. However, there are different initiatives that try to overcome these limitations.

In conclusion, radiomic analysis is a rapidly growing field that has the potential to transform medical imaging and improve patient outcomes.
By providing quantitative information on tissue structure and function, radiomic analysis can help clinicians make more informed treatment decisions and develop personalized treatment strategies. However, several challenges need to be overcome to ensure the widespread adoption of radiomic analysis in clinical practice. With continued research and development, radiomic analysis holds great promise in advancing personalized medicine and improving patient care.
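As a minimal illustration of the feature-extraction step described above, the sketch below computes a few first-order intensity features from a flattened ROI. The definitions follow common conventions for first-order statistics but are simplified; the bin count and input values are arbitrary, not from any real study:

```python
import math
import statistics

def first_order_features(roi_values):
    """A few illustrative first-order radiomic features of ROI intensities."""
    n = len(roi_values)
    mean = statistics.fmean(roi_values)
    sd = statistics.pstdev(roi_values)
    # Skewness: asymmetry of the intensity distribution.
    skewness = sum((v - mean) ** 3 for v in roi_values) / (n * sd ** 3) if sd else 0.0
    # Shannon entropy over a coarse 16-bin intensity histogram.
    lo, hi = min(roi_values), max(roi_values)
    width = (hi - lo) / 16 or 1.0
    counts = {}
    for v in roi_values:
        b = min(int((v - lo) / width), 15)
        counts[b] = counts.get(b, 0) + 1
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "std": sd, "skewness": skewness, "entropy": entropy}

# Toy ROI with a low-intensity cluster and a high-intensity cluster:
print(first_order_features([10, 12, 11, 50, 52, 49, 11, 10]))
```

In practice, validated libraries that implement the standardized feature definitions are preferred over hand-rolled code, precisely because of the reproducibility concerns discussed above.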


3.2 Deep Learning vs. Traditional Machine Learning

When building a predictive model based on medical imaging, there are two main approaches: feature-based models and imaging-based models (Fig. 3.1). Feature-based models involve extracting features from the images, which can be radiomic features, deep features, or other imaging biomarkers used for tissue characterization. Before feature extraction, an ROI is usually defined, so only a limited area of the image is analyzed and contextual image information is lost. Machine learning pipelines are then used to build prognostic models, based on these input features, for the prediction of clinical outcomes. A main limitation of this approach is that these features are sensitive to differences between scanners and acquisition protocols, which can reduce the generalizability of the model. However, initiatives such as the Quantitative Imaging Biomarkers Alliance (QIBA), the European Imaging Biomarkers Alliance (EIBALL), and the Image Biomarker Standardization Initiative (IBSI) have been established to promote standardization in image acquisition and processing and to facilitate data sharing among researchers. The IBSI is an international collaborative effort between researchers, clinicians, and industry partners aimed at promoting radiomics standardization through the development of guidelines and standards

Fig. 3.1  Feature-based models vs. imaging-based models in the development of predictive models in medical imaging

3  How to Extract Radiomic Features from Imaging

for radiomics feature extraction and analysis. In addition, feature harmonization techniques such as ComBat can be applied to minimize these differences at a later stage [15, 16].

In contrast, imaging-based models have emerged as a promising alternative to feature-based models. One reason is that imaging-based models learn features automatically, which helps to overcome the sensitivity of feature-based models to inter-scanner and inter-protocol variability. Furthermore, imaging-based models can detect complex patterns that may not be visible to the human eye or through manual feature extraction. In addition, the whole image can be used as input to these models (i.e., an ROI does not need to be defined), exploiting all the contextual information. This method is based on end-to-end solutions using deep learning (DL), specifically convolutional neural networks (CNNs). A CNN is built from multiple convolutional layers that extract features from the input image. During training, the weights of these layers are learned, allowing the extracted features to adapt to the specific problem. This is a significant advantage of imaging-based models over feature-based models. Additionally, with a large and heterogeneous dataset, a CNN can learn the differences found across scanners and acquisition protocols and provide a more generalizable solution.

However, CNNs are complex models with many parameters to adjust during training. As such, large datasets are required to avoid overfitting to the training data, which can make their use challenging given the difficulty of collecting such datasets. Another challenge in building predictive models based on medical imaging is the interpretability of the models. DL models, such as CNNs, are often criticized for being black-box models, meaning that it can be challenging to understand how the model arrives at its predictions.
While recent advances in interpretability techniques have addressed this issue to some extent [17], it remains a significant challenge in the field. One of the most significant challenges in building predictive models based on medical imaging is the availability of high-quality data. Collecting and curating large datasets is a time-consuming and expensive process, particularly when dealing with medical images that require annotation by trained experts.
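Feature harmonization with ComBat was mentioned above. ComBat itself pools information across features with empirical Bayes shrinkage; as a deliberately simplified illustration of the underlying idea, the sketch below (plain NumPy, function name our own) removes per-scanner location and scale shifts by standardizing each feature within each scanner group:

```python
import numpy as np

def per_scanner_zscore(features, scanner_ids):
    """Center and scale each feature within each scanner group.

    A simplified stand-in for ComBat: it removes per-scanner
    location/scale differences but, unlike ComBat, applies no
    empirical Bayes shrinkage across features.

    features: (n_samples, n_features) array; scanner_ids: length n_samples.
    """
    features = np.asarray(features, dtype=float)
    ids = np.asarray(scanner_ids)
    out = np.empty_like(features)
    for s in np.unique(ids):
        mask = ids == s
        mu = features[mask].mean(axis=0)
        sd = features[mask].std(axis=0)
        sd[sd == 0] = 1.0  # guard against constant features
        out[mask] = (features[mask] - mu) / sd
    return out
```

After harmonization, each feature has zero mean and unit variance within every scanner group, so batch-level shifts no longer dominate downstream modeling.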

In conclusion, the use of medical imaging for predictive modeling is a rapidly evolving field with significant potential for improving patient outcomes. Both feature-based models and imaging-based models have their advantages and limitations, and the choice of approach depends on the specific problem and available data. In some cases, feature-based models may be sufficient, while in others, imaging-based models may be necessary to achieve higher accuracy and generalizability. As more high-quality data becomes available, and initiatives continue to promote standardized data sharing and collaboration, it is expected that the field will continue to advance, leading to more accurate and interpretable predictive models.

3.3 Radiomic Features Extraction Process

The goal of radiomic feature extraction is to obtain as much quantitative information as possible from the medical images, which can be used for subsequent analysis. Radiomic features can be categorized into four main groups: shape-based, intensity-based, texture-based, and higher-order features:

• Shape-based features describe the shape and size of the lesion and include features such as volume, surface area, compactness, and sphericity.
• Intensity-based features describe the intensity distribution within the ROI and include features such as mean, median, variance, skewness, and kurtosis.
• Texture-based features describe the spatial distribution of intensity values within the ROI and include features such as entropy, energy, homogeneity, and correlation.
• Finally, higher-order features are extracted from a derived image obtained by applying a filter to the original image. The most common filters are the wavelet transform, which decomposes the image into different frequency bands so that features are extracted from each band; the Laplacian of Gaussian filter

(LoG), which enhances edges; and other mathematical operations such as square, square root, logarithm, or exponential. Both intensity-based and texture-based features are extracted from these derived images. Following this process, thousands of features are extracted from the ROI. The process of radiomic feature extraction typically involves several steps.
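As an illustration of the intensity-based (first-order) group above, these features can be computed directly from the ROI voxel values. This NumPy sketch uses common textbook definitions; for IBSI-compliant reference values, a validated package such as PyRadiomics should be used:

```python
import numpy as np

def intensity_features(roi_values, bins=32):
    """First-order (intensity) features from the voxel values inside an ROI.

    Illustrative definitions only; production pipelines should follow
    the IBSI reference implementations.
    """
    v = np.asarray(roi_values, dtype=float).ravel()
    mu, sigma = v.mean(), v.std()
    z = (v - mu) / sigma
    hist, _ = np.histogram(v, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]  # discard empty bins before taking the log
    return {
        "mean": mu,
        "median": np.median(v),
        "variance": v.var(),
        "skewness": np.mean(z ** 3),
        "kurtosis": np.mean(z ** 4) - 3.0,   # excess kurtosis
        "energy": np.sum(v ** 2),
        "entropy": -np.sum(p * np.log2(p)),  # Shannon entropy (bits)
    }
```

Note that the entropy value depends on the chosen bin count, which is exactly the kind of parameter the IBSI guidelines ask authors to report.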

3.3.1 Image Preprocessing

First, the medical images are preprocessed to enhance image quality and remove noise. Image harmonization techniques can also be applied at this stage. These preprocessing methods aim to reduce variability across images acquired with different scanners and acquisition protocols; DL has shown strong results in both image quality enhancement [18] and harmonization [19].

One of the most common and basic normalization techniques is z-score normalization, in which the image intensities are normalized to a distribution of zero mean and unit variance. In some cases, it can also be useful to remove outliers from the image to avoid biasing the radiomic values with spurious voxel intensities. The most commonly used method is to calculate the mean (μ) and standard deviation (σ) of the intensities within the ROI and to exclude those outside the range μ ± 3σ.

Another source of variance is the size of the image and the reconstructed voxel size [20]. When images are acquired using different scanners, the voxel dimensions can vary between images, resulting in discrepancies in the extracted radiomic features. To mitigate these differences, images are usually resampled to reduce the variability across scans. Currently, there is no clear recommendation on whether to upsample or downsample the image, or on the exact final voxel size. In general, if the spacing between slices is small compared to the voxel size in the acquisition plane, the image can be resampled to an isotropic voxel, commonly of size 1 or

2 mm³. However, for images with low resolution between the slices, a 2D approximation is often used, in which the image is resampled to isotropic pixels, i.e., the same voxel size only within the acquisition plane. Resampling the image in this way can help to reduce the variability of the radiomic features and improve the accuracy of the analysis. To address the image harmonization problem, AI-based methods have been developed that minimize the influence of the scanner type, the center to which the patient belongs, and the parameters used for image acquisition.
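The preprocessing steps just described (z-score normalization with μ ± 3σ outlier exclusion, and resampling to an isotropic voxel size) can be sketched as follows. This is a minimal illustration using nearest-neighbour indexing; real pipelines typically use spline interpolation (e.g., scipy.ndimage.zoom):

```python
import numpy as np

def zscore_normalize(img, roi_mask=None, k=3.0):
    """Z-score normalize an image, excluding intensities outside
    mu ± k*sigma (computed within the ROI if a mask is given)."""
    vals = img[roi_mask] if roi_mask is not None else img.ravel()
    mu, sigma = vals.mean(), vals.std()
    keep = vals[np.abs(vals - mu) <= k * sigma]  # drop outlier voxels
    mu, sigma = keep.mean(), keep.std()
    return (img - mu) / sigma

def resample_isotropic(img, spacing, new_spacing=1.0):
    """Nearest-neighbour resampling to isotropic voxels.

    spacing: physical voxel size per axis, in mm. Illustrative only;
    spline interpolation is preferred in practice.
    """
    spacing = np.asarray(spacing, dtype=float)
    new_shape = np.round(np.array(img.shape) * spacing / new_spacing).astype(int)
    idx = [np.minimum((np.arange(n) * img.shape[d] / n).astype(int), img.shape[d] - 1)
           for d, n in enumerate(new_shape)]
    return img[np.ix_(*idx)]
```

For example, a 10 × 10 × 4 volume with 1 × 1 × 2.5 mm voxels resampled to 1 mm isotropic spacing yields a 10 × 10 × 10 volume.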

3.3.2 Image Segmentation

Once the image is ready, the ROI is defined by a radiologist or by an automated segmentation algorithm. ROI definition is a critical step in radiomic feature extraction, as it can significantly impact the reproducibility of the results. Manual segmentation by a radiologist is the gold standard, but it is time-consuming and can introduce intra- and inter-reader variability [21], particularly in cases where the ROI borders are unclear. To mitigate this limitation, multiple readers may be used, and radiomic features with a low intraclass correlation coefficient (ICC) can be discarded. The ICC is a measure of reproducibility, where 0 indicates no reproducibility and 1 indicates perfect reproducibility.

To overcome the limitations of manual segmentation, semi-automatic and automatic methods have been developed. Semi-automatic methods, such as thresholding or region growing, can reduce intra-reader variability, although inter-reader variability remains, whereas fully automatic methods based on DL can significantly reduce both sources of variability. However, automatic methods can introduce systematic errors, which can be minimized by training them on large and heterogeneous datasets. In recent years, deep learning-based solutions for automatic segmentation of organs and lesions have shown promising results compared with traditional computer vision methods. Nevertheless, one of the main disadvantages of these automatic methods is the lack of generalization to new data. When using an automatic method, it is

important to check whether any external validation was performed after model building and to assess its performance on the dataset of interest.

3.3.3 Feature Extraction and Selection

Once the ROI is defined, radiomic features can be extracted using specialized software or programming libraries. A variety of open-source and commercial packages are available, including PyRadiomics, LIFEx, and PyCWT. The choice of package depends on several factors, such as the type of images being analyzed, the specific features of interest, and the user's level of expertise. With the availability of these tools, radiomic analysis has become more accessible and standardized, enabling its broader adoption in clinical practice and research.

The choice of radiomic features to extract is a critical step in the analysis of medical imaging data and can vary depending on the specific research question or clinical application. Typically, a large number of features are extracted for each use case; therefore, to avoid overfitting, the next step is to reduce the number of features through feature selection. Several methods can be used for feature selection, including traditional statistical and machine learning techniques, as well as more recent methods such as deep feature selection. It is worth mentioning that when dealing with radiomic features, it is crucial to perform a correlation analysis before feature selection. Many radiomic features are highly correlated, and it is important to identify these correlations and remove redundant features to avoid overfitting the model. This can be achieved by keeping only the most informative feature when two variables are highly correlated. Overall, careful selection of radiomic features is essential for building accurate and robust predictive models from medical imaging data.

It is important to note that the quality and reproducibility of radiomic features can be affected by several factors, including image acquisition protocols, segmentation methods, and feature extraction algorithms. Therefore, standardization and validation of the radiomics workflow are essential to obtain robust and reliable results. Several initiatives, such as the IBSI, have been launched to address these challenges and promote the use of radiomics in clinical practice.
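The correlation analysis before feature selection can be sketched as a greedy filter. This illustrative version keeps the first feature encountered in each highly correlated pair; in practice one would keep the more informative of the two (e.g., the one with the stronger univariate association with the outcome):

```python
import numpy as np

def drop_correlated(X, names, threshold=0.95):
    """Greedy correlation filter: retain a feature only if its absolute
    Pearson correlation with every already-kept feature is <= threshold.

    X: (n_samples, n_features) matrix; names: feature names."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return X[:, keep], [names[j] for j in keep]
```

With thousands of radiomic features, a filter like this is a cheap first pass before more expensive wrapper or embedded selection methods.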

3.3.4 Standardization

Radiomics standardization is a crucial aspect of medical imaging research, especially when using radiomic features for predictive modeling. Standardization helps to ensure that results are reproducible and that radiomic features can be compared across different studies and datasets. The IBSI guidelines [22] provide recommendations for all aspects of the radiomics workflow, from image acquisition and preprocessing to feature extraction and analysis. They include recommendations for the definition of the ROI, image preprocessing, and feature extraction parameters. The IBSI guidelines also provide a standardized nomenclature for radiomic features, which helps to ensure consistency across studies and datasets. The standardization of radiomic feature extraction is divided into two chapters:

• Chapter 1 focuses on the standardization of 169 commonly used radiomic features. This effort was initiated in 2016 and completed in 2020, providing a standard radiomics image processing scheme together with reference values for the different radiomic features.
• Chapter 2 is dedicated to the standardization of imaging filters commonly used in radiomic studies (e.g., wavelet, LoG, etc.), since features derived from filter response maps have been found to be poorly reproducible. Its main goal is to standardize the way image filters for radiomics studies are implemented in order to improve reproducibility.

Each chapter is divided into three phases: (1) standardization of radiomic feature computations using a digital phantom without image processing; (2) standardization of radiomic feature computations under a general image processing scheme using CT data of a lung cancer patient; and (3) validation using a multi-modality imaging dataset of multiple patients. To demonstrate that an algorithm follows the IBSI guidelines for radiomic feature extraction, the guidelines specify, for a given image processing configuration, the exact value (together with a tolerance) that the algorithm should produce for each radiomic feature.

The IBSI guidelines are constantly evolving and being updated based on new research findings and feedback from the community. They are freely available online, and software packages implementing them are also available. Following the IBSI guidelines is not only important for ensuring consistency across studies but can also help to improve the accuracy and reliability of radiomics-based predictive models.

In conclusion, the IBSI guidelines are an essential resource for researchers and clinicians working with radiomics data. Standardization ensures that radiomic features can be compared across studies and datasets, leading to more robust and reliable results. Following the IBSI guidelines not only ensures consistency but also facilitates the development of more accurate and reliable predictive models, ultimately improving patient outcomes.

3.4 Deep Learning Radiomic Features

In recent years, DL has shown strong performance in medical image classification, detection, and segmentation [23–25]. The fundamental notion of DL is the neural network (NN), which emulates the behavior of the human brain to solve complex data-based problems. An NN transforms the input image through several layers, extracting features from the image. These features capture texture and edge information that propagates through the layers of the network while maintaining the spatial relationships in the image. Commonly, the features extracted from the last layer of the NN are called deep features. The approach of extracting deep features from medical imaging is referred to as deep learning-based radiomics (DLR).

DLRs have been used for disease diagnosis tasks such as cancer type prediction [26] or survival prediction [27]. These features can be extracted with different DL architectures; in imaging, the most common architectures are based on CNNs. Deep feature extraction can be approached in different ways and at different levels. On the one hand, the input can be given at the slice level, the volume level, or the patient level. On the other hand, DLRs can be extracted from either pretrained or custom models. Designing a model from scratch has the advantage of a network tailored to the problem to be solved. However, problems such as overfitting and class imbalance may arise due to the lack of available training datasets.

To address these problems, transfer learning (TL) has been used as an alternative way to construct new models. TL consists of taking a DL model pretrained on a natural image dataset and retraining (fine-tuning) it on the desired medical dataset. This approach has been used in different studies, for example, in automatic polyp detection in CT colonography [28], detection and classification of breast cancer in microscope images [29], and pulmonary nodules in thoracic CT images [30]. TL is usually applied using a pretrained CNN model such as GoogleNet [31], the Visual Geometry Group Network (VGGNet) [32], or Residual Networks (ResNet) [33] trained on the ImageNet dataset. Deep features from pretrained CNNs have achieved higher prediction accuracy than hand-crafted radiomics signatures and clinical factors [34, 35]. However, TL has been arbitrarily configured in most studies, and it is not evident whether good performance will be obtained until the model is evaluated.

DLRs can be extracted using both discriminative and generative deep learning networks [36]. Discriminative models use supervised learning and rely on labels to distinguish classes, such as distinguishing lesion from healthy tissue.
Generative models use unsupervised learning and extract general image features to generate new data with the same structure. Consequently, the features extracted from generative models can be used as input to a classifier.

Furthermore, choosing the optimal architecture to extract the DLRs is a challenge that remains to be studied. One of the most popular deep learning architectures is the CNN. A CNN transforms the input image through convolutional layers, ReLU (rectified linear unit) activation layers, and pooling layers, which are responsible for extracting features from the image. In discriminative models, the output of a CNN can be a classification or regression result, or it can be used as input to the rest of the radiomics pipeline [27]. In generative models, autoencoders are often used. An autoencoder consists of an encoder, which encodes the input image into a lower-dimensional feature space (the latent space), and a decoder, which maps this latent space back to the original space. Once the autoencoder has been trained to encode and decode images by minimizing the reconstruction error, the encoder is used to map a given input image to a lower-dimensional feature space, which constitutes the DLR. Convolutional autoencoders (CAEs) are used in generative radiomic problems to preserve the spatial correlations of the image [37].
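The convolution, ReLU, and pooling sequence can be illustrated with a toy NumPy feature extractor. Real DLR pipelines use trained multi-layer CNNs (often pretrained backbones such as ResNet or VGG); the hand-set kernels and single layer here are purely illustrative:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2-D convolution (strictly, cross-correlation, as in
    most DL frameworks)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def deep_features(img, kernels):
    """Toy deep-feature extractor: one convolutional layer with ReLU,
    then global average pooling, yielding one scalar feature per kernel.
    In a trained CNN the kernels would be learned, not hand-set."""
    return np.array([np.maximum(conv2d(img, k), 0).mean() for k in kernels])
```

In a real network, many such layers are stacked and the kernels are learned by backpropagation; the pooled activations of the last layer are the deep features described above.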

3.4.1 Deep Learning Radiomics and Hand-Crafted Radiomics

The main benefit of DLR over hand-crafted radiomics (HCR) is that no manual segmentation step is required. Manual segmentations are highly dependent on the reader, which makes them unreliable. In addition, eliminating manual segmentation saves experts and radiologists time on a tedious task. However, DLR extraction requires large datasets and can have a high computational cost. Moreover, deep features are difficult to interpret and explain from a clinical perspective.

The combination of HCR and DLR features has been shown to increase model performance [38]. This combination can be made in two ways: at the decision level or at the feature level. The decision-level approach is based on training the two models (HCRs and DLRs) separately and combining the outputs by

voting to obtain the best results [39, 40]. This voting can be soft, hard, or adaptive. Feature-level fusion consists of concatenating the HCR and DLR vectors, applying feature reduction to avoid overfitting, and using them as input to a model; this has yielded better results in lung cancer survival models [27] and tumor detection [38]. Thus, radiomics and deep features have proven to be two novel technologies with high potential for early detection, prediction of treatment response, and prognosis of the disease. Figure 3.2 shows the different approaches introduced along the chapter, going from HCR to the different approaches to extract DLR and how they can be combined.

Fig. 3.2  Pipeline of the different radiomics models. In yellow, prediction models with HCRs; in blue, prediction models with DLRs; in red, prediction models combining HCRs and DLRs
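The feature-level fusion described above (concatenating HCR and DLR vectors and then applying feature reduction) can be sketched as follows; the variance-based reduction used here is a simple, hypothetical stand-in for the reduction methods applied in the cited studies:

```python
import numpy as np

def fuse_features(hcr, dlr, top_k=10):
    """Feature-level fusion: concatenate hand-crafted (HCR) and deep
    (DLR) feature matrices, then keep the top_k highest-variance
    columns as a naive dimensionality-reduction step.

    hcr, dlr: (n_samples, n_hcr) and (n_samples, n_dlr) arrays."""
    fused = np.concatenate([hcr, dlr], axis=1)
    order = np.argsort(fused.var(axis=0))[::-1][:top_k]
    return fused[:, np.sort(order)]
```

The fused matrix can then be fed to any downstream classifier or survival model, exactly as a pure HCR or pure DLR feature matrix would be.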

References

1. Chetan MR, Gleeson FV (2021) Radiomics in predicting treatment response in non-small-cell lung cancer: current status, challenges and future perspectives. Eur Radiol 31(2):1049–1058. https://doi.org/10.1007/s00330-020-07141-9. Epub 2020 Aug 18. PMID: 32809167; PMCID: PMC7813733
2. Du P, Liu X, Shen L, Wu X, Chen J, Chen L, Cao A, Geng D (2023) Prediction of treatment response in patients with brain metastasis receiving stereotactic radiosurgery based on pre-treatment multimodal MRI radiomics and clinical risk factors: a machine learning model. Front Oncol 13:1114194. https://doi.org/10.3389/fonc.2023.1114194. PMID: 36994193; PMCID: PMC10040663
3. Huynh LM, Hwang Y, Taylor O, Baine MJ (2023) The use of MRI-derived radiomic models in prostate cancer risk stratification: a critical review of contemporary literature. Diagnostics (Basel) 13(6):1128. https://doi.org/10.3390/diagnostics13061128. PMID: 36980436; PMCID: PMC10047271
4. Zhang Y, Yang Y, Ning G, Wu X, Yang G, Li Y (2023) Contrast computed tomography-based radiomics is correlation with COG risk stratification of neuroblastoma. Abdom Radiol (NY). https://doi.org/10.1007/s00261-023-03875-4. Epub ahead of print. PMID: 36951989
5. Cui Y, Li Z, Xiang M, Han D, Yin Y, Ma C (2022) Machine learning models predict overall survival and progression free survival of non-surgical esophageal cancer patients with chemoradiotherapy based on CT image radiomics signatures. Radiat Oncol 17(1):212. https://doi.org/10.1186/s13014-022-02186-0. PMID: 36575480; PMCID: PMC9795769
6. Chu F, Liu Y, Liu Q, Li W, Jia Z, Wang C, Wang Z, Lu S, Li P, Zhang Y, Liao Y, Xu M, Yao X, Wang S, Liu C, Zhang H, Wang S, Yan X, Kamel IR, Sun H, Yang G, Zhang Y, Qu J (2022) Development and validation of MRI-based radiomics signatures models for prediction of disease-free survival and overall survival in patients with esophageal squamous cell carcinoma. Eur Radiol 32(9):5930–5942. https://doi.org/10.1007/s00330-022-08776-6.
Epub 2022 Apr 6. PMID: 35384460
7. Chen W, Qiao X, Yin S, Zhang X, Xu X (2022) Integrating radiomics with genomics for non-small cell lung cancer survival analysis. J Oncol 2022:5131170. https://doi.org/10.1155/2022/5131170. PMID: 36065309; PMCID: PMC9440821
8. Feng Q, Ding Z (2020) MRI radiomics classification and prediction in Alzheimer's disease and mild cognitive impairment: a review. Curr Alzheimer Res 17(3):297–309. https://doi.org/10.2174/1567205017666200303105016. PMID: 32124697
9. Pujadas ER, Raisi-Estabragh Z, Szabo L, McCracken C, Morcillo CI, Campello VM, Martín-Isla C, Atehortua AM, Vago H, Merkely B, Maurovich-Horvat P, Harvey NC, Neubauer S, Petersen SE, Lekadir K (2022)

Prediction of incident cardiovascular events using machine learning and CMR radiomics. Eur Radiol. https://doi.org/10.1007/s00330-022-09323-z. Epub ahead of print. PMID: 36512045
10. Gabryś HS, Gote-Schniering J, Brunner M, Bogowicz M, Blüthgen C, Frauenfelder T, Guckenberger M, Maurer B, Tanadini-Lang S (2022) Transferability of radiomic signatures from experimental to human interstitial lung disease. Front Med (Lausanne) 9:988927. https://doi.org/10.3389/fmed.2022.988927. PMID: 36465941; PMCID: PMC9712180
11. Cho YH, Seo JB, Lee SM, Kim N, Yun J, Hwang JE, Lee JS, Oh YM, Do Lee S, Loh LC, Ong CK (2021) Radiomics approach for survival prediction in chronic obstructive pulmonary disease. Eur Radiol 31(10):7316–7324. https://doi.org/10.1007/s00330-021-07747-7. Epub 2021 Apr 13. PMID: 33847809
12. Martí-Bonmatí L, Alberich-Bayarri A (2018) Imaging biomarkers: development and clinical integration. Springer International Publishing, Cham
13. Jha AK, Mithun S, Jaiswar V, Sherkhane UB, Purandare NC, Prabhash K, Rangarajan V, Dekker A, Wee L, Traverso A (2021) Repeatability and reproducibility study of radiomic features on a phantom and human cohort. Sci Rep 11(1):2055. https://doi.org/10.1038/s41598-021-81526-8. PMID: 33479392; PMCID: PMC7820018
14. Liu R, Elhalawani H, Radwan Mohamed AS, Elgohari B, Court L, Zhu H, Fuller CD (2019) Stability analysis of CT radiomic features with respect to segmentation variation in oropharyngeal cancer. Clin Transl Radiat Oncol 21:11–18. https://doi.org/10.1016/j.ctro.2019.11.005. PMID: 31886423; PMCID: PMC6920497
15. Leithner D, Schöder H, Haug A, Vargas HA, Gibbs P, Häggström I, Rausch I, Weber M, Becker AS, Schwartz J, Mayerhoefer ME (2022) Impact of ComBat harmonization on PET radiomics-based tissue classification: a dual-center PET/MRI and PET/CT study. J Nucl Med 63(10):1611–1616. https://doi.org/10.2967/jnumed.121.263102. Epub 2022 Feb 24. PMID: 35210300; PMCID: PMC9536705
16. Cabini RF, Brero F, Lancia A, Stelitano C, Oneta O, Ballante E, Puppo E, Mariani M, Alì E, Bartolomeo V, Montesano M, Merizzoli E, Aluia D, Agustoni F, Stella GM, Sun R, Bianchini L, Deutsch E, Figini S, Bortolotto C, Preda L, Lascialfari A, Filippi AR (2022) Preliminary report on harmonization of features extraction process using the ComBat tool in the multi-center "Blue Sky Radiomics" study on stage III unresectable NSCLC. Insights Imaging 13(1):38. https://doi.org/10.1186/s13244-022-01171-1. PMID: 35254525; PMCID: PMC8901939
17. Zeineldin RA, Karar ME, Elshaer Z, Coburger J, Wirtz CR, Burgert O, Mathis-Ullrich F (2022) Explainability of deep neural networks for MRI analysis of brain tumors. Int J Comput Assist Radiol Surg 17(9):1673–1683. https://doi.org/10.1007/s11548-022-02619-x. Epub 2022 Apr 23. PMID: 35460019; PMCID: PMC9463287
18. Zerunian M, Pucciarelli F, Caruso D, Polici M, Masci B, Guido G, De Santis D, Polverari D, Principessa D, Benvenga A, Iannicelli E, Laghi A (2022) Artificial intelligence based image quality enhancement in liver

MRI: a quantitative and qualitative evaluation. Radiol Med 127(10):1098–1105. https://doi.org/10.1007/s11547-022-01539-9. Epub 2022 Sep 7. PMID: 36070066; PMCID: PMC9512724
19. Tixier F, Jaouen V, Hognon C, Gallinato O, Colin T, Visvikis D (2021) Evaluation of conventional and deep learning based image harmonization methods in radiomics studies. Phys Med Biol 66(24). https://doi.org/10.1088/1361-6560/ac39e5. PMID: 34781280
20. Shafiq-Ul-Hassan M, Zhang GG, Latifi K, Ullah G, Hunt DC, Balagurunathan Y, Abdalah MA, Schabath MB, Goldgof DG, Mackin D, Court LE, Gillies RJ, Moros EG (2017) Intrinsic dependencies of CT radiomic features on voxel size and number of gray levels. Med Phys 44(3):1050–1062. https://doi.org/10.1002/mp.12123. PMID: 28112418; PMCID: PMC5462462
21. Covert EC, Fitzpatrick K, Mikell J, Kaza RK, Millet JD, Barkmeier D, Gemmete J, Christensen J, Schipper MJ, Dewaraja YK (2022) Intra- and inter-operator variability in MRI-based manual segmentation of HCC lesions and its impact on dosimetry. EJNMMI Phys 9(1):90. https://doi.org/10.1186/s40658-022-00515-6. PMID: 36542239; PMCID: PMC9772368
22. Zwanenburg A, Vallières M, Abdalah MA, Aerts HJWL, Andrearczyk V, Apte A, Ashrafinia S, Bakas S, Beukinga RJ, Boellaard R, Bogowicz M, Boldrini L, Buvat I, Cook GJR, Davatzikos C, Depeursinge A, Desseroit MC, Dinapoli N, Dinh CV, Echegaray S, El Naqa I, Fedorov AY, Gatta R, Gillies RJ, Goh V, Götz M, Guckenberger M, Ha SM, Hatt M, Isensee F, Lambin P, Leger S, Leijenaar RTH, Lenkowicz J, Lippert F, Losnegård A, Maier-Hein KH, Morin O, Müller H, Napel S, Nioche C, Orlhac F, Pati S, Pfaehler EAG, Rahmim A, Rao AUK, Scherer J, Siddique MM, Sijtsema NM, Socarras Fernandez J, Spezi E, Steenbakkers RJHM, Tanadini-Lang S, Thorwarth D, Troost EGC, Upadhaya T, Valentini V, van Dijk LV, van Griethuysen J, van Velden FHP, Whybra P, Richter C, Löck S (2020) The image biomarker standardization initiative: standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology 295(2):328–338. https://doi.org/10.1148/radiol.2020191145. Epub 2020 Mar 10. PMID: 32154773; PMCID: PMC7193906
23. Li W (2015) Automatic segmentation of liver tumor in CT images with deep convolutional neural networks. J Comput Commun 3(11):146
24. Menze BH, Jakab A, Bauer S, Kalpathy-Cramer J, Farahani K, Kirby J, Burren Y et al (2014) The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans Med Imaging 34(10):1993–2024
25. Wang S, Zhou M, Liu Z, Liu Z, Gu D, Zang Y, Dong D, Gevaert O, Tian J (2017) Central focused convolutional neural networks: developing a data-driven model for lung nodule segmentation. Med Image Anal 40:172–183
26. Huynh BQ, Li H, Giger ML (2016) Digital mammographic tumor classification using transfer learning from deep convolutional neural networks. J Med Imaging 3(3):034501
27. Paul R et al (2016) Deep feature transfer learning in combination with traditional features predicts survival among patients with lung adenocarcinoma. Tomography 2(4):388–395

28. Summers RM, Johnson CD, Pusanik LM, Malley JD, Youssef AM, Reed JE (2001) Automated polyp detection at CT colonography: feasibility assessment in a human population. Radiology 219(1):51–59
29. Wang Y, Sun L, Ma K, Fang J (2018) Breast cancer microscope image classification based on CNN with image deformation. In: Image analysis and recognition: 15th international conference, ICIAR 2018, Póvoa de Varzim, Portugal, 27–29, 2018, proceedings 15. Springer International Publishing, pp 845–852
30. Dehmeshki J, Amin H, Valdivieso M, Ye X (2008) Segmentation of pulmonary nodules in thoracic CT scans: a region growing approach. IEEE Trans Med Imaging 27(4):467–480
31. Szegedy C et al (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
32. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
33. He K et al (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
34. Zhu Y et al (2019) A deep learning radiomics model for preoperative grading in meningioma. Eur J Radiol 116:128–134
35. Zheng X et al (2020) Deep learning radiomics can predict axillary lymph node status in early-stage breast cancer. Nat Commun 11:1–9
36. Afshar P et al (2019) From handcrafted to deep-learning-based cancer radiomics: challenges and opportunities. IEEE Signal Process Mag 36(4):132–160
37. Echaniz O, Graña M (2017) Ongoing work on deep learning for lung cancer prediction. In: Biomedical applications based on natural and artificial computing: international work-conference on the interplay between natural and artificial computation, IWINAC 2017, Corunna, Spain, June 19–23, 2017, proceedings, part II. Springer International Publishing, pp 42–48
38. Fu L et al (2017) Automatic detection of lung nodules: false positive reduction using convolution neural networks and handcrafted features. In: Medical imaging 2017: computer-aided diagnosis. SPIE, pp 60–67
39. Hassan AH, Wahed ME, Metwally MS, Atiea MA (2022) A hybrid approach for classification breast cancer histopathology images. Frontiers in Scientific Research and Technology 3(1):1–10
40. Liu S et al (2017) Pulmonary nodule classification in lung cancer screening with three-dimensional convolutional neural networks. J Med Imaging 4(4):041308

4

Facts and Needs to Improve Radiomics Reproducibility

P. M. A. van Ooijen, R. Cuocolo, and N. M. Sijtsema

4.1 Introduction

Quantitative imaging aims to extract quantifiable features (radiomics, deep features, and/or imaging biomarkers) to characterize normal anatomy, disease, tumors, and the severity or status of chronic conditions. These quantitative measures can be used to obtain an objective measurement of a biological process or endpoint, to perform early diagnosis, to predict patient outcomes, to measure response to therapy, or to assist surgery planning. However, after the initial hype of quantitative imaging, it became clear that not all quantitative information obtained from imaging data was reliable, because of the many dependencies involved [1]. Currently, in clinical practice, commonly used and accepted quantifiable features are limited to rather simple measurements such as size, volume, or histogram analysis (Fig. 4.1).

P. M. A. van Ooijen (*) · N. M. Sijtsema Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands e-mail: [email protected] R. Cuocolo Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, Italy © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Á. Alberich-Bayarri, F. Bellvís-Bataller (eds.), Basics of Image Processing, Imaging Informatics for Healthcare Professionals, https://doi.org/10.1007/978-3-031-48446-9_4


Fig. 4.1  Example of quantitative imaging. Lung volumes and areas of low attenuation are provided in cm3 and in percentage of low attenuation

To become clinically accepted and used, the reproducibility of radiomic features is a crucial factor, since it determines the generalizability of the resulting radiomics-based models. This generalizability is required for a model to be usable in clinical practice in any hospital around the world, not just at the center where it was developed. However, so far, the reproducibility and generalizability of radiomic features and models have been reported to be limited [2]. The main reason for this low reproducibility is that each step of the radiomic analysis introduces its own factors that influence the final output, often with little or no ability to predict their effect. This was already demonstrated in the move from qualitative to quantitative radiology, where the quantification results, among other factors, rely heavily on the properties of the obtained images, which in turn depend greatly on the image acquisition equipment manufacturer, version, and setup. This reliance on the inherent properties of the acquired images has only increased with the use of radiomics because of the full dependency on the, sometimes hundreds of, radiomic features derived directly from these images. Furthermore, the reproducibility of the radiomics approach, and thus of the prediction gained from it, depends heavily on the reproducibility and repeatability of the selected individual radiomic features. High reproducibility indicates that the radiomic features are stable when obtained from imaging data of different origin (site, equipment, image acquisition protocol, etc.). High repeatability indicates that the radiomic features are stable when obtained multiple times from the same subject using the same imaging equipment. This chapter focuses on defining the confounding factors that can lead to inaccurate and unreliable radiomic feature estimation, explaining how inconsistencies across CT scanners and MRI scanners/sequences may decrease the reliability of image-derived radiomic features. The needs that underlie the process of evaluating these factors and reducing their influence as far as possible are also covered.
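Repeatability and reproducibility of individual features are commonly summarized with agreement statistics such as the intraclass correlation coefficient (ICC). As an illustration only (the chapter does not prescribe a specific statistic), a minimal one-way random-effects ICC(1,1) can be computed with NumPy; the function name and the toy data are ours:

```python
import numpy as np

def icc_1_1(measurements):
    """One-way random-effects ICC(1,1) for an (n_subjects, k_repeats) array."""
    y = np.asarray(measurements, dtype=float)
    n, k = y.shape
    row_means = y.mean(axis=1)
    msb = k * np.sum((row_means - y.mean()) ** 2) / (n - 1)      # between-subject mean square
    msw = np.sum((y - row_means[:, None]) ** 2) / (n * (k - 1))  # within-subject mean square
    return (msb - msw) / (msb + (k - 1) * msw)

# Feature values for 3 subjects, each scanned twice; identical repeats -> ICC = 1
print(icc_1_1([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]))  # prints 1.0
```

A feature with an ICC near 1 across repeated scans is repeatable; computed across sites or scanners, the same statistic expresses reproducibility.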

4.2 Factors Influencing Reproducibility

Previous review papers have shown that radiomic features are sensitive to image acquisition, reconstruction, tumor segmentation, and interpolation [3–5] (Table 4.1). They have also shown that the level of sensitivity depends on the radiomic feature itself. As an example, textural features are reported to be less robust than statistical features [4], and reproducibility was claimed to be higher in first-order features when compared to shape metrics and textural features [5]. However, because of the variation in the design and implementation of radiomics studies and in the scanner acquisition protocols used, the reported robustness of specific radiomic features or feature groups is sometimes contradictory across reports.


Table 4.1  Overview of the steps in the radiomics process, the related factors influencing the accuracy and reproducibility, and the possible solutions reported in literature

Step: Acquisition
  Factors influencing accuracy: digital image pre-processing; voxel size/slice thickness; reconstruction filters; contrast enhancement protocol; image noise; patient size; artifacts
  Possible solutions: protocol standardisation/harmonization; histogram normalization; interpolation

Step: Segmentation
  Factors influencing accuracy: lack of accuracy; inter-reader variability; intra-reader variability; validation
  Possible solutions: fixed protocols; consensus segmentation

Step: Feature extraction
  Factors influencing accuracy: feature definition; feature parameter setting; feature implementation; software used for feature extraction
  Possible solutions: IBSI guidelines + digital phantom; well accepted, open source feature implementations; delta-radiomics

Step: Model construction
  Factors influencing accuracy: feature selection; machine learning model selection; cut-off selection; model validation
  Possible solutions: IBSI guidelines; inter-software comparison

4.2.1 Acquisition

One of the major issues with quantitative medical imaging in general, and radiomics specifically, is the vast variation in the acquisition process of the imaging data. This variation is already introduced when using data from scanners of different vendors, each of which has its own specific acquisition and pre-processing techniques to obtain the best medical image. This means that even when the configurable acquisition parameters are kept the same, the images can still yield different radiomic feature values. Another problem is the variation in the reconstruction parameters as defined by local protocols. These reconstruction parameters influence the appearance of the imaging data to such an extent that they affect quantitative measurements and radiomic features. Examples of such reconstruction parameters are the in-plane resolution, the slice thickness, and the applied reconstruction kernels. Although previous imaging studies have shown the effects of slice thickness and reconstruction kernels on computed features, between ~5% and ~25% of radiomics studies prior to 2020 did not even report their imaging protocols, and most of those that did included only the slice thickness information [2]. In addition to the scan protocol, the contrast enhancement protocol also plays a major role in the presentation of the image. This includes the injection protocol itself (bolus timing and size) but also the type of contrast media used (e.g., its iodine concentration in CT scanning). One must keep in mind that the effect of the contrast media can extend beyond the targeted area: with intravenous injection of contrast media for the enhancement of the arteries, enhancement can also be apparent beyond the arterial wall because of partial volume effects. Kristanto et al., for example, showed a strong positive correlation between lumen contrast enhancement and mean plaque HU-value [6]. Scan artifacts can also hamper the determination of quantitative features. These include not only artifacts caused by foreign objects such as metal implants (e.g., pacemakers, dental fillings, hip/knee prostheses), but also those caused by inaccurate acquisition (e.g., incorrect triggering/gating or contrast timing) or by voluntary and involuntary movement of the patient.
Finally, the patients themselves also play a role in the determination of radiomic features. Patients of different sizes, or female patients with different breast sizes, can, because of the disturbance caused by fatty tissue, yield different quantitative measures for the same structure.


4.2.2 Segmentation

Before computing the radiomic features, segmentation of the region of interest (e.g., tumor or other pathology) is required. This segmentation can be performed manually, semi-automatically, or fully automatically. These three methods each have their own characteristics concerning accuracy, intra- and inter-reader variability, and validation status. It has been shown that manual segmentation affects the reproducibility of radiomic features to some extent because of intra- and inter-reader variability, and that those differences are amplified in textural features [5]. Moving to semi-automatic segmentation has been shown to improve feature reproducibility when compared to fully manual segmentation [7]. Automatic segmentation promises more deterministic models able to provide more consistent outputs. However, these automatic segmentation methods are nowadays mostly deep learning based and thus heavily dependent on the training data used. This dependency makes them sensitive to variations in the input data, such as differences in the acquisition protocol used. They should therefore still be verified for accuracy, as varying degrees of precision in different cases would still negatively influence radiomics robustness. Previous work showed that even with high agreement among segmentation methods, subtle differences can significantly affect radiomic features and their predictive power [8].
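Agreement between readers or segmentation methods is typically quantified with an overlap measure such as the Dice similarity coefficient. The sketch below uses hypothetical masks and NumPy only; note that even a seemingly high Dice value still permits the subtle boundary differences that affect radiomic features [8]:

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary segmentation masks."""
    a, b = np.asarray(mask_a, dtype=bool), np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / total if total else 1.0

# Two hypothetical readers contouring the same lesion, offset by one pixel
reader1 = np.zeros((10, 10), dtype=bool); reader1[2:7, 2:7] = True
reader2 = np.zeros((10, 10), dtype=bool); reader2[3:8, 3:8] = True
print(round(dice(reader1, reader2), 3))  # prints 0.64
```

A one-pixel shift of a small lesion contour already drops the overlap to 0.64, even though both contours are plausible delineations of the same structure.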

4.2.3 Radiomic Features Extraction

In an extensive review of 41 studies, Traverso et al. showed that reproducibility was highest in texture-based features when compared to shape-based and intensity-based metrics [5]. The most stable texture-based feature reported was entropy, while the least reproducible features were reported to be coarseness and contrast. Pfaehler et al. showed that, in general, texture-based features were less robust than statistical features [4]. However, as stated before, there is no consensus on this in the literature, and contradictory results are reported on the reproducibility of features or feature groups. One frequently reported reason is the incomplete reporting of radiomics studies in the literature, which makes it extremely difficult to replicate previous work and thus leads to new, (slightly) different, implementations for each new study conducted. Another possible reason for differing results for the same features is the lack of standardized metrics to report feature repeatability and/or reproducibility. Besides the differences in the procedures used to extract the features from the imaging data, there is also variety in the way feature parameters are configured. These parameters are mostly fine-tuned on the local dataset to obtain the best results. However, because of the lack of standardization, there is no guarantee that the same feature parameters will produce the same results on a different dataset with (slightly) different properties. Furthermore, the exact configuration of the feature parameters is, again, not always fully reported in radiomics publications, making it hard to accurately replicate earlier studies. Finally, the feature implementation itself can contain slight variations, both in the interpretation and exact definition of specific features and in the exact name given to the features.
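The effect of feature parameter settings can be seen even for the simplest first-order feature: the histogram entropy of one and the same region changes with the chosen bin count. A small illustration on synthetic intensities (the data and function name are ours, not from any cited study):

```python
import numpy as np

def first_order_entropy(intensities, n_bins):
    """First-order (histogram) entropy in bits; the value depends on the bin count."""
    hist, _ = np.histogram(intensities, bins=n_bins)
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins to avoid log(0)
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(0)
roi = rng.normal(40.0, 10.0, size=1000)  # synthetic intensity values for one ROI
# The same ROI yields a different "entropy" depending on the discretization choice:
print(first_order_entropy(roi, 16), first_order_entropy(roi, 64))
```

Unless the bin count (or bin width) is reported, two studies extracting "entropy" from identical images can legitimately obtain different values, which is exactly the replication problem described above.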

4.2.4 Model Construction

Feature selection is one of the most important steps in model construction. Features should be selected based on their reproducibility and on their ability to differentiate between the outcome classes. However, there is often a strong correlation between different features from the same feature group, increasing the risk of falsely significant associations when multiple features from the same feature group are adopted to construct the predictive machine learning model.


4.3 How to Improve Reproducibility

One obvious solution to increase reproducibility would be the standardization of acquisition protocols and segmentation strategies. The standardization of acquisition protocols can be very difficult to achieve because of the variety in hardware and implementation across vendors; achieving it therefore requires an increased effort from both the users and the vendors of image acquisition equipment. Efforts to standardize acquisition protocols in medical imaging have taken place in recent years, especially through the proposal of Reporting and Data Systems (RADS). These are typically tailored to specific organs or pathologies, such as prostate (PI-RADS) or bladder (VI-RADS) cancer imaging, and often include technical requirements for image acquisition [9, 10]. However, it should be noted that adherence to these acquisition guidelines is still far from ideal in clinical practice [11, 12]. Standardization of segmentation strategies could be achieved in manual segmentation by providing strict guidelines on how to perform the segmentation and how to deal with specific situations that may occur [13]. Furthermore, contouring consensus could be implemented to reduce intra- and inter-reader variability. When moving to semi- or fully automatic segmentation, standardization becomes challenging because not only the human reader but also the specific implementation of the software plays a major role in the decisions made. However, automatic or semi-automatic segmentation is also reported to reduce inter-reader variability and increase reproducibility. Both can be combined by using automatic segmentation with human oversight: supervision of the computerized results with the ability to adjust them when they do not comply with pre-defined rules. Once the image acquisition is done, pre-processing steps can be taken to prepare the images for a higher level of reproducibility.
Image normalization and interpolation are two basic steps to obtain comparable results from radiomic features derived from data of different origin. Normalization ensures that the histogram distribution is similar in data with unit-less voxels. This can be achieved through different means, including gray-level z-score normalization or even discretization with a fixed bin number. While generally advisable to implement, the effect of normalization and discretization on texture-based feature reproducibility may vary based on the feature type and use case [14–16]. Interpolation ensures that the voxel size is the same for datasets acquired, for example, with different slice thickness. With interpolation, or even when resampling data from a single scanner, the goal should be to obtain high-resolution isotropic voxels with the same dimensions in all three directions. This is important for obtaining rotationally invariant texture matrices. The reproducibility of radiomics can also be improved by reducing the noise in the imaging data before feeding it into the radiomics model. Noise can be reduced by applying filters (e.g., Laplacian or Gaussian filters in PyRadiomics) or wavelet decomposition to the images. While such conventional noise reduction methods are commonly used, a more novel methodology using deep learning to decrease the noise in, for example, (low-dose) computed tomography is gaining interest. Chen et al. demonstrated a cycle-GAN-based approach to reduce image noise and showed that the survival prediction AUC increased from 0.52 to 0.59 on a simulated-noise CT dataset and to 0.58 on the RIDER dataset [17]. They concluded that cycle GANs trained to reduce noise in CT can improve radiomics reproducibility and performance in low-dose CT. The implementation of delta radiomic features, which describe not a single time point but the change of a radiomic feature over time in repeated scanning, is also reported to increase the multi-site reproducibility of features in a phantom study [18]. Another possible solution to tackle the diversity in the acquisition of imaging data is deep-learning-based harmonization.
By exploiting the generative capabilities of deep learning networks, image harmonization can improve the accuracy of deep-learning predictors [19]; it has also been shown to increase the reproducibility of radiomic features [17] and to outperform more conventional, histogram-based techniques [20].
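The two basic pre-processing steps above, gray-level normalization and resampling to isotropic voxels, can be sketched in a few lines of NumPy. This is a deliberately simplified illustration: nearest-neighbour resampling is used to keep the code self-contained, whereas real pipelines (e.g., PyRadiomics) typically use linear or B-spline interpolation:

```python
import numpy as np

def zscore_normalize(volume):
    """Gray-level z-score normalization: zero mean, unit standard deviation."""
    v = np.asarray(volume, dtype=float)
    return (v - v.mean()) / v.std()

def resample_isotropic(volume, spacing, new_spacing=1.0):
    """Nearest-neighbour resampling to isotropic voxels (illustration only)."""
    v = np.asarray(volume, dtype=float)
    new_shape = [int(round(s * sp / new_spacing)) for s, sp in zip(v.shape, spacing)]
    # Map each output index back to the nearest source index along each axis
    idx = [np.minimum((np.arange(n) * new_spacing / sp).astype(int), s - 1)
           for n, sp, s in zip(new_shape, spacing, v.shape)]
    return v[np.ix_(*idx)]

vol = np.arange(24, dtype=float).reshape(2, 3, 4)       # 2 slices of 3 x 4 voxels
iso = resample_isotropic(vol, spacing=(5.0, 1.0, 1.0))  # 5 mm slices, 1 mm in-plane
print(iso.shape)  # prints (10, 3, 4): an isotropic 1 mm grid
```

After resampling, texture matrices computed in different orientations operate on voxels of equal physical size, which is what makes them rotationally comparable.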


4.3.1 Guidelines and Checklists

A more general development that could benefit radiomics is the Quantitative Imaging Biomarkers Alliance (QIBA) initiative, which aims to standardize quantitative imaging. Adhering to these standardization guidelines would result in a more consistent representation of the imaging data for a specific application. Although the aim of QIBA is to improve quantitative imaging in general, this will also have a direct positive effect on the reproducibility of radiomic features. The Image Biomarker Standardization Initiative (IBSI) is also worth mentioning here [21]. IBSI compliance is certainly positive but still requires the use of comparable extraction parameters to ensure robustness [22]. A reporting checklist for scientific radiomics papers was proposed by Pfaehler et al. [4]. The main goal of their checklist was to evaluate the feasibility of reproducing a reported study. Similar work was advocated earlier by Traverso et al., who proposed that radiomics software should be benchmarked on publicly available datasets [5]. However, public datasets are not synonymous with perfect datasets. Benchmarking datasets should therefore contain data from different institutions to guarantee maximum heterogeneity, and should be audited externally to ensure reliability. Furthermore, the use of public benchmarks needs to be carefully implemented because of the risk of overfitting through iterative testing. A solution for this could be to provide benchmark datasets only with “hidden” labels and to include automated feedback on the results. Traverso et al. also propose a standard reporting format for the benchmark study [5]. Recently, a quality scoring tool has been developed to assess and improve the research quality of radiomics studies: the METhodological RadiomICs Score (METRICS). It is based on a large international panel and a modified Delphi protocol, with a conditional format to cover methodological variations. It provides a well-constructed framework of the key methodological concepts for assessing the quality of radiomics research papers [23].


4.3.2 Code and Development Platforms

To increase reproducibility in the implementation of radiomics, standardized public-domain code or development platforms could also provide a means to avoid variation caused by factors such as feature implementation and model construction. Examples of such code bases and development platforms are the radiomics extension of the Computational Environment for Radiological Research (CERR) [24], the International Radiomics Platform (IRP) [25], PyRadiomics, and LIFEx. The challenge here lies in the choice, since the different solutions provide different capabilities and require varying levels of programming knowledge, which in itself implies less reproducibility because of the lack of standardization between the different code bases and platforms. PyRadiomics is an open-source Python package for the extraction of radiomics data from medical images. It provides a variety of feature groups that can be extracted, namely first-order, shape, GLCM, GLRLM, and GLSZM features. PyRadiomics is easily imported into Python code, providing all necessary procedures; an example Jupyter Notebook can be found at https://www.radiomics.io/pyradiomicsnotebook.html. A downside of PyRadiomics is that it lacks DICOM-RT input of anatomical structures [24]. A more practical downside is that it obviously requires Python programming skills. To overcome this, SlicerRadiomics was developed. SlicerRadiomics is an extension to 3D Slicer that encapsulates the PyRadiomics library, which in turn implements the calculation of a variety of radiomic features. SlicerRadiomics can be obtained from GitHub and used in 3D Slicer by building it from the source provided, or installed directly from within 3D Slicer by searching for “radiomics” in the Extensions Manager. The advantage of the radiomics extension is that it allows radiomic features to be calculated on a segmentation in 3D Slicer without requiring any programming knowledge. The radiomics extension of CERR is based on MATLAB [24]. It provides batch calculation and visualization of radiomic features using a tailored data structure for radiomics metadata. A test suite is also provided to allow comparison with radiomic features computed with other platforms such as PyRadiomics. LIFEx is an end-user freeware that allows a broad range of conventional, textural, and shape features to be obtained from medical imaging data (www.lifexsoft.org). Other implementations have also been released with the aim of harmonizing radiomic features. One example is the statistical method ComBat, originally developed for genomics but adapted by Orlhac et al. to correct variations in radiomics measurements [26]. ComBat does not require modification of the images but allows for harmonization of the radiomic features themselves, based on their distribution and knowledge of covariates. It is claimed to be a data-driven approach that enables pooling of radiomic features from different CT protocols. Although the ComBat method shows promise and meaningful improvements in the reproducibility of radiomic features, it performed worse in patients than in phantom images. More work is needed to improve the method and to extend it to patient cohorts other than lung cancer.
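The core of a ComBat-style correction is a per-feature location-scale adjustment across batches (scanners or protocols). The sketch below is a deliberately simplified version, written by us for illustration: it omits ComBat's empirical Bayes shrinkage and covariate model, so it is not the implementation of Orlhac et al. [26]:

```python
import numpy as np

def simple_batch_harmonize(features, batch):
    """Per-feature location-scale harmonization in the spirit of ComBat:
    every batch is shifted and scaled to the pooled mean and standard
    deviation. Omits empirical Bayes shrinkage and covariates (sketch only)."""
    x = np.asarray(features, dtype=float)   # shape: (n_samples, n_features)
    batch = np.asarray(batch)
    out = np.empty_like(x)
    pooled_mean, pooled_std = x.mean(axis=0), x.std(axis=0)
    for b in np.unique(batch):
        sel = batch == b
        mu, sd = x[sel].mean(axis=0), x[sel].std(axis=0)
        out[sel] = (x[sel] - mu) / sd * pooled_std + pooled_mean
    return out

rng = np.random.default_rng(1)
site_a = rng.normal(10.0, 1.0, size=(50, 3))  # features from scanner A
site_b = rng.normal(14.0, 2.0, size=(50, 3))  # same features, shifted/scaled by scanner B
harmonized = simple_batch_harmonize(np.vstack([site_a, site_b]), [0] * 50 + [1] * 50)
```

After this adjustment, the per-site feature distributions share the same mean and spread, which is the precondition for pooling multi-centre data; the empirical Bayes step in the real ComBat additionally stabilizes the per-batch estimates when batches are small.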

4.4 Recommendations for Achieving Clinical Adoption of Radiomics

Previous reviews have shown that the assessment of repeatability and reproducibility of radiomic features is mainly performed in a limited number of pathologies, the most frequent being non-small cell lung cancer and oropharyngeal cancer [5]. Furthermore, they have shown that detailed information about the radiomics methodology is often lacking or incomplete. In published papers, the radiomics methodology is also frequently applied to single-site databases, resulting in possible overfitting of the prediction model to the local data and thus no guarantee of high reproducibility when applied to data from a different origin. One of the main tasks for current radiomics development is therefore to enable validation by replication of results in an external dataset. This requires extensive and complete reporting of the radiomics development and implementation, including a detailed description of the patient cohort and of the image acquisition and reconstruction protocols used. Furthermore, the developed software and the datasets used in the model development should be made publicly available. For radiomics to become clinically useful in the future, extensive quality control must be implemented in the radiomics process to avoid problems caused by data and model drift. To detect data drift, quality control should be performed on the acquired data, to ensure that it complies with the expectations of the radiomics evaluation, and on the segmentation performed, especially in the case of automatic segmentation. For the detection of model drift, quality control should be implemented on the radiomics predictions themselves. In case of changes in imaging equipment, in the image acquisition and reconstruction protocols, or in the segmentation protocols or automatic segmentation tools, a more extensive model validation and a model update may be necessary.
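A simple concrete form of the data-drift check described above is to monitor whether incoming feature values move away from the distribution seen during model development. The rule below (mean shift measured in standard-error units, with an illustrative threshold) is our sketch of such a check, not a standard QC rule:

```python
import numpy as np

def feature_drift_alert(reference, incoming, z_threshold=3.0):
    """Flag features whose incoming batch mean deviates from the reference
    cohort mean by more than z_threshold standard errors. The statistic
    and threshold are illustrative choices, not a prescribed standard."""
    ref = np.asarray(reference, dtype=float)   # (n_ref, n_features)
    new = np.asarray(incoming, dtype=float)    # (n_new, n_features)
    se = ref.std(axis=0) / np.sqrt(new.shape[0])
    z = np.abs(new.mean(axis=0) - ref.mean(axis=0)) / se
    return z > z_threshold

ref = np.column_stack([np.linspace(-1, 1, 200)] * 2)   # development-cohort features
new = np.column_stack([np.linspace(-1, 1, 40),
                       np.linspace(-1, 1, 40) + 2.0])  # feature 2 has drifted
alerts = feature_drift_alert(ref, new)                 # only feature 2 is flagged
```

A flagged feature would then trigger the more extensive model validation (and possibly a model update) described above, for example after a scanner or protocol change.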

References

1. Steiger P, Sood R (2019) How can radiomics be consistently applied across imagers and institutes. Radiology 291:60–61
2. Zhao B (2021) Understanding sources of variation to improve the reproducibility of radiomics. Front Oncol 11:633176
3. Lennartz S, O'Shea A, Parakh A, Persigehl T, Baessler B, Kambadakone A (2022) Robustness of dual-energy CT-derived radiomic features across three different scanner types. Eur Radiol 32:1959–1970
4. Pfaehler E, Zhovannik I, Wei L, Boellaard R, Dekker A, Monshouwer R, El Naqa I, Bussink J, Gillies R, Wee L, Traverso A (2021) A systematic review and quality of reporting checklist for repeatability and reproducibility of radiomic features. Phys Imaging Radiat Oncol 20:69–75
5. Traverso A, Wee L, Dekker A, Gillies R (2018) Repeatability and reproducibility of radiomic features: a systematic review. Int J Radiat Oncol Biol Phys 102(4):1143–1158
6. Kristanto W, van Ooijen PMA, Greuter MJW, Groen JM, Vliegenthart R, Oudkerk M (2013) Non-calcified coronary atherosclerotic plaque visualization on CT: effects of contrast-enhancement and lipid-content fractions. Int J Cardiovasc Imaging 29:1137–1148


7. Parmar C, Rios Velazquez E, Leijenaar R et al (2014) Robust radiomics feature quantification using semiautomatic volumetric segmentation. PLoS One 9:e102107
8. Poirot MG, Caan MWA, Ruhe HG, Bjornerug A, Groote I, Reneman L, Marquering HA (2022) Robustness of radiomics to variations in segmentation methods in multimodal brain MRI. Sci Rep 12:16712
9. Panebianco V, Narumi Y, Altun E et al (2018) Multiparametric magnetic resonance imaging for bladder cancer: development of VI-RADS (vesical imaging-reporting and data system). Eur Urol 74(3):294–306
10. Turkbey B, Rosenkrantz AB, Haider MA et al (2019) Prostate imaging reporting and data system version 2.1: 2019 update of prostate imaging reporting and data system version 2. Eur Urol 76(3):340–351
11. Cuocolo R, Stanzione A, Ponsiglione A et al (2019) Prostate MRI technical parameters standardization: a systematic review on adherence to PI-RADSv2 acquisition protocol. Eur J Radiol 120:108662
12. Esses SJ, Taneja SS, Rosenkrantz AB (2018) Imaging facilities' adherence to PI-RADS v2 minimum technical standards for the performance of prostate MRI. Acad Radiol 25(2):188–195
13. deSouza NM, van der Lugt A, Deroose CM, Alberich-Bayarri A, Bidaut L, Fournier L, Costaridou L, Oprea-Lager DE, Kotter E, Smits M, Mayerhoefer ME, Boellaard R, Caroli A, de Geus-Oei LF, Kunz WG, Oei EH, Lecouvet F, Franca M, Loewe C, Lopci E, Caramella C, Persson A, Golay X, Dewey M, O'Connor JPB, deGraaf P, Gatidis S, Zahlmann G, European Society of Radiology, European Organisation for Research and Treatment of Cancer (2022) Standardised lesion segmentation for imaging biomarker quantitation: a consensus recommendation from ESR and EORTC. Insights Imaging 13(1):159. https://doi.org/10.1186/s13244-022-01287-4. PMID: 36194301; PMCID: PMC9532485
14. Duron L, Balvay D, Vande Perre S et al (2019) Gray-level discretization impacts reproducible MRI radiomics texture features. PLoS One 14(3):e0213459
15. Kociolek M, Strzelecki M, Obuchowicz R (2020) Does image normalization and intensity resolution impact texture classification? Comput Med Imaging Graph 81:101716
16. Schwier M, van Griethuysen J, Vangel MG et al (2019) Repeatability of multiparametric prostate MRI radiomics features. Sci Rep 9(1):9441
17. Chen J, Wee L, Dekker A, Bermejo I (2022) Improving reproducibility and performance of radiomics in low-dose CT using cycle GAN. J Appl Clin Med Phys 23:e13739
18. Nardone V, Reginelli A, Guida C, Belfiore MP, Biondi M, Mormile M et al (2020) Delta-radiomics increases multicentre reproducibility: a phantom study. Med Oncol 37(5):38
19. Bashyam VM, Doshi J, Erus G, Srinivasan D et al (2022) Deep generative medical image harmonization for improving cross-site generalization in deep learning predictors. J Magn Reson Imaging 55(3):908–916


20. Tixier F, Jaouen V, Hognon C, Gallinato O, Colin T, Visvikis D (2021) Evaluation of conventional and deep learning based image harmonization methods in radiomics studies. Phys Med Biol 66(24):ac39e5
21. Zwanenburg A, Vallières M, Abdalah MA, Aerts HJWL, Andrearczyk V, Apte A, Ashrafinia S, Bakas S, Beukinga RJ, Boellaard R, Bogowicz M, Boldrini L, Buvat I, Cook GJR, Davatzikos C, Depeursinge A, Desseroit MC, Dinapoli N, Dinh CV, Echegaray S et al (2020) The image biomarker standardization initiative: standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology 295:328–338
22. Fornacon-Wood I, Mistry H, Ackermann CJ, Blackhall F, McPartlin A, Faivre-Finn C, Price GJ, O'Connor JPB (2020) Reliability and prognostic value of radiomic features are highly dependent on choice of feature extraction platform. Eur Radiol 30:6241–6250
23. Kocak B, Akinci d'Antonoli T, Mercaldo N, Alberich-Bayarri A, Baessler B et al (2024) METhodological RadiomICs Score (METRICS): a quality scoring tool for radiomics research endorsed by EuSoMII. Insights Imaging 15:8. https://doi.org/10.1186/s13244-023-01572-w
24. Apte AP, Iyer A, Crispin-Ortuzar M, Pandya R, van Dijk LV, Spezi E, Thor M, Um H, Veeraraghavan H, Oh JH, Shukla-Dave A, Deasy JO (2018) Technical note: extension of CERR for computational radiomics: a comprehensive MATLAB platform for reproducible radiomics research. Med Phys 45(8):3712–3720
25. Overhoff D, Kohlmann P, Frydrychowicz A, Gatidis S, Loewe C, Moltz J, Kuhnigk J-M, Gutberlet M, Winter H, Volker M, Hahn H, Schoenberg SO (2021) The international radiomics platform—an initiative of the German and Austrian radiological societies—first application examples. Rofo 193(3):276–288
26. Orlhac F, Frouin F, Nioche C, Ayache N, Buvat I (2019) Validation of a method to compensate multicenter effects affecting CT radiomics. Radiology 291:53–59

5

Data Harmonization to Address the Non-biological Variances in Radiomic Studies

Y. Nan, X. Xing, and G. Yang

5.1 Non-biological Variances in Radiomic Analysis

To ensure the reliability and reproducibility of radiomics models, it is essential to establish strict standards for data collection and pre-processing. This means that the imaging data needs to be collected and processed in the same way for all patients, to ensure that the radiomic features are accurate and comparable across different samples. However, medical imaging data obtained from different scanners or hospitals can vary significantly under different image acquisition protocols (such as slice thickness, spatial resolution, and reconstruction kernels), which results in immense variability in the extracted radiomic features. For example, even when imaging the same lung tumour region, CT scans acquired from

Y. Nan · X. Xing · G. Yang (*) Bioengineering Department and Imperial-X, National Heart and Lung Institute, Imperial College London, London, UK e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Á. Alberich-Bayarri, F. Bellvís-Bataller (eds.), Basics of Image Processing, Imaging Informatics for Healthcare Professionals, https://doi.org/10.1007/978-3-031-48446-9_5




Fig. 5.1  Illustration of how non-biological variances arise and how they affect radiomic studies (RoI region of interest)

different imaging devices (SIEMENS and PHILIPS in Fig. 5.1, respectively) present different visual patterns, which can lead to different texture-based features and prognostication results. Additionally, there can be considerable variation between two repeated scans of the same patient due to manual operations and the patient's status [1]. To minimize motion artifacts in the images, patients are often asked to hold their breath when undergoing CT scans. However, movement during the scanning process can still occur, resulting in blurred or distorted images that are difficult to interpret. To address these issues, there is an urgent need for harmonization algorithms that can harmonize data of varying quality acquired under different protocols (Fig. 5.2).

Fig. 5.2  Three main types of non-biological variances

We refer to the variability caused by non-biological factors, such as acquisition devices, acquisition protocols, and laboratory preparations, as non-biological variance. These variances may reduce the reproducibility and generalizability of radiomic models and lead to incorrect or unreliable conclusions. Here, we summarize these non-biological factors into three main types:

1. Acquisition devices (Hardware): The variation of acquisition devices contributes significantly to the variation in multicentre data, especially in CT and MRI scans. Differences in the detector systems of different vendors, coil sensitivity, positional and physiological differences during acquisition, as well as magnetic field variations in MRI, are some of the factors that cause differences in these data. Such variations can have serious implications for the reproducibility of radiomic features, even when a fixed acquisition protocol is used across different brands of scanners. For example, researchers have investigated the reproducibility of radiomic features on different scanners despite using the same acquisition protocol and found substantial differences [2]. The reproducibility of radiomic features ranged from 16% to 85%, indicating that even with a fixed protocol, there are still significant differences in the images produced by different scanners. Similarly, Sunderland et al. [3] found a large variation in the standardized uptake value (SUV) across different brands of scanners. They discovered that newer scanners had a much higher maximum SUV compared to older ones, indicating that the heterogeneity of acquisition devices can significantly impact the interpretation of imaging data. These findings suggest that the variability in acquisition devices can significantly affect the reproducibility and reliability of imaging data, which can have serious implications for clinical decision-making. To address this issue, there is a need for data harmonization strategies to ensure that imaging data is


consistent across different devices and vendors. This would help to increase the reproducibility of radiomic features and enhance the reliability of imaging data in clinical practice.

2. Patient status: Patient status can also affect the image quality of CT and MRI scans. Patient positional changes during image acquisition can cause variation or artifacts that reduce image quality and introduce inconsistencies between scans. For instance, if a patient moves during a CT or MRI scan, image distortions may occur, resulting not only in poor image quality and inconsistent results but also in artifacts that could be misinterpreted as pathological changes. This can be especially problematic in functional imaging studies such as PET and functional MRI (fMRI), where patient motion can affect brain activation patterns or tracer uptake, significantly impacting the reproducibility and consistency of the results. Therefore, patient status, including positional changes and motion artifacts, should be carefully monitored and minimized during imaging procedures to improve the reproducibility and accuracy of the results. Techniques such as immobilization devices, patient coaching, and motion correction software can help minimize these effects and improve the quality and consistency of medical imaging to some extent, albeit with additional costs and a reliance on patient cooperation.

3. Acquisition protocols: Variations in acquisition protocols are a significant cause of cross-cohort variability and can have a significant impact on the reproducibility of radiomic features. These acquisition protocols include scanning parameters such as voltage, tube current, field of view, slice thickness, and microns per pixel, as well as reconstruction approaches such as different reconstruction kernels. Details of the factors that lead to non-biological variances in CT and MRI scans are given in Table 5.1.
Table 5.1  Factors that lead to non-biological variances in CT and MRI imaging

| Acquisition parameters | Impact on CT imaging | Impact on MRI imaging |
|---|---|---|
| Voltage | Image contrast and noise levels | Signal-to-noise ratio |
| Tube current | Image noise and radiation dose | N/A |
| Field of view | Image resolution and anatomy coverage | Image resolution and anatomy coverage |
| Slice thickness | Image resolution and anatomical detail | Image resolution and anatomical detail |
| Pixel/voxel size | Image resolution and spatial detail | Image resolution and spatial detail |
| Kernels | Image sharpness and texture | N/A |
| Magnetic field strength | N/A | Signal-to-noise ratio |
| Pulse sequence | N/A | Contrast and spatial resolution |
| Echo time | N/A | Contrast and signal-to-noise ratio |
| Repetition time | N/A | Contrast and temporal resolution |
| Radiation dose | Image quality | N/A |

N/A means "not applicable" as the parameter does not apply to the given imaging modality

To investigate the reproducibility of radiomic features, several reproducibility studies have been conducted using test-retest experiments. These experiments compute the correlation coefficient or error between two repeated measurements of a feature, where a high correlation coefficient or a low error indicates good reproducibility/repeatability. For instance, a radiomic feature is considered reproducible/repeatable when the correlation coefficient between features extracted from two comparison scans is greater than 0.90 [4]. We summarize previous reproducibility studies in Table 5.2, which further demonstrates that scanning parameters significantly affect radiomic features, making statistical analysis difficult. As illustrated in Table 5.2, the reproducibility of radiomic features ranges from 8.0% to 63.3%, showing the weak stability of radiomic features derived from unharmonized images. Among these variables, the reconstruction kernel used in CT has a distinct effect on radiomics reproducibility, which cannot be eliminated by unifying the reconstruction kernels, as different kernels are used to meet different clinical demands. For example, when different kernels (soft and sharp, respectively) are used during

Table 5.2  Summary of the reproducibility/repeatability studies

| Reference | Reproducibility | Definition | Variables | Object | Modality |
|---|---|---|---|---|---|
| Jha et al. [5], 2021 | 30.7% (332/1080) | ICC > 0.90 | Slice thickness | Phantoms | CT |
| Emaminejad et al. [6], 2021 | 8.0% (18/226); 7.5% (17/226) | CCC > 0.90 | R-Kernel; radiation dose | Human | CT |
| Kim et al. [7], 2021 | 11.0% (112/1020) | CCC > 0.85 | Acceleration factors | Human | MRI |
| Saeedi et al. [8], 2019 | 20.5% (8/39); 30% (13/39) | CoV | — | Human | Radiography |
| Meyer et al. [9], 2019 | 20.8% (22/106); 52.8% (56/106); 39.6% (42/106); 12.3% (13/106) | R2 > 0.95 | Radiation dose; R-Kernel; slice thickness | Human | CT |
| Perrin et al. [10], 2018 | 24.8% (63/254); 13.4% (34/254) | CCC > 0.90 | Injection rates; resolution | Human | CECT |
| Midya et al. [11], 2018 | 11.7% (29/248); 19.8% (49/248); 63.3% (157/248) | CCC > 0.90 | Tube current; noise | Phantoms | CT |
| Altazi et al. [12], 2017 | 21.5% (17/79) | CCC > 0.90 | R-Kernel | Human | PET |
| Zhao et al. [13], 2016 | 11.2% (10/89) | L1 | R-Kernel | Human | CT |
| Choe et al. [14], 2019 | 15.2% (107/702) | CCC > 0.85 | R-Kernel | Human | CT |

R2 R-squared coefficient, CCC concordance correlation coefficient, ICC intraclass correlation coefficient, CoV coefficient of variation, CECT contrast-enhanced computed tomography, PET positron emission tomography, L1 mean difference score, R-Kernel reconstruction kernel


the reconstruction, only 15.2% of radiomic features are reproducible [14]. While strict standard protocols can reduce non-biological variances, radiologists often require specific acquisition protocols to meet personalized, centre-specific image quality requirements. For instance, radiologists may adjust the spacings (voxel sizes) on a case-by-case basis to assist the diagnosis. This heterogeneity in acquisition protocols is therefore unavoidable and requires a general solution to harmonize these data.
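To make the test-retest screening criterion above concrete, the concordance correlation coefficient (CCC) between two repeated measurements of a feature can be computed directly. The sketch below is our own illustration on synthetic data (the `ccc` helper and the simulated scan/rescan arrays are not taken from any of the cited studies):

```python
import numpy as np

def ccc(x, y):
    """Lin's concordance correlation coefficient between two
    repeated measurements of the same radiomic feature."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()           # population variances
    cov = ((x - mx) * (y - my)).mean()  # population covariance
    return 2 * cov / (vx + vy + (mx - my) ** 2)

# Screen features: keep those with CCC > 0.90 across scan/rescan
rng = np.random.default_rng(0)
scan1 = rng.normal(size=(50, 4))  # 50 patients, 4 simulated features
# rescan: small measurement noise on features 0-1, large on features 2-3
scan2 = scan1 + rng.normal(scale=[0.05, 0.05, 1.0, 1.0], size=(50, 4))
reproducible = [j for j in range(4) if ccc(scan1[:, j], scan2[:, j]) > 0.90]
print(reproducible)
```

Features whose rescan noise is small pass the 0.90 threshold, while heavily perturbed features are discarded before modelling.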

5.2 Data Harmonization

5.2.1 Data Harmonization in Radiomics Studies

Data harmonization refers to the process of integrating data from multiple sources to facilitate analysis and comparison. Collecting data using different methods, storing it in different formats, or measuring it on different scales can make it challenging to integrate and compare data across sources. To overcome these challenges, data harmonization methods are used to standardize, match, transform, aggregate, or clean the data. The choice of method depends on the nature of the data being harmonized and the objectives of the project, with each approach having its own advantages and disadvantages. Factors such as data quality, available resources, and research topics are considered when choosing the appropriate method.

Standardization involves creating a common set of data elements and ensuring that they are consistently defined and measured across different sources. This may involve developing a data dictionary that includes definitions of each data element, as well as instructions for how to collect and record data in a consistent manner. Standardization is often used in cases where multiple data sources need to be integrated, such as in the case of clinical trials where data may be collected from multiple sites.

Matching involves identifying corresponding data elements in different sources and reconciling any differences between them.


Matching may involve comparing data elements based on a set of pre-defined criteria, such as matching on patient ID or other demographic information. Once corresponding data elements have been identified, any differences between them may be resolved through manual review or automated algorithms.

Transformation involves converting data from one format or measurement scale to another so that it can be integrated with other data. This may involve converting data from one type of unit (e.g., pounds to kilograms) or from one type of measurement (e.g., self-reported data to objectively measured data). Transformation may also involve data normalization, which adjusts data values so that they are on a common scale, often by dividing each value by a baseline value or standard deviation.

Aggregation involves combining data from multiple sources into a single dataset, often by creating summary statistics or aggregating individual records. Aggregation may involve summarizing data by specific categories, such as age or geographic location, or by creating overall summary statistics, such as means or medians.

Cleaning involves identifying and correcting errors or inconsistencies in the data, such as misspellings, duplicate entries, or outliers. Data cleaning may involve manual review or automated algorithms to identify and correct errors. Once errors have been identified and corrected, the data can be harmonized across multiple sources.

It is of note that many data harmonization approaches rely on pre-defined criteria to guide the process. For instance, standardization usually involves creating a set of pre-defined data elements with clear definitions and measurement scales. To address the non-biological variances in radiomics studies, smart harmonization approaches are employed to integrate image data.
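As a toy illustration of the transformation, standardization, and aggregation steps described above (all field names and values here are invented for illustration, not drawn from any real dataset):

```python
# Two hypothetical sites record weight under different schemas/units.
site_a = [{"id": 1, "weight_lb": 154.0}, {"id": 2, "weight_lb": 198.0}]
site_b = [{"id": 3, "weight_kg": 80.0},  {"id": 4, "weight_kg": 65.0}]

LB_TO_KG = 0.45359237

def standardize(record):
    """Map each site's schema onto a common data element (weight_kg)."""
    if "weight_lb" in record:  # transformation: unit conversion
        return {"id": record["id"], "weight_kg": record["weight_lb"] * LB_TO_KG}
    return dict(record)

# Aggregation: merge both sources into one dataset and summarize.
merged = [standardize(r) for r in site_a + site_b]
mean_kg = sum(r["weight_kg"] for r in merged) / len(merged)
print(round(mean_kg, 1))
```

Once every record exposes the same data element in the same unit, summary statistics and downstream analysis can be computed over the pooled cohort.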

5.2.2 Automatic Harmonization Schemes

In radiomics studies, large-scale analysis of multicentre datasets has become increasingly important for improving the generalizability of radiomics models and for gaining more insight into complex disease processes. To increase the efficiency of data harmonization and to alleviate human workload, automatic data harmonization has been proposed. There are two main schemes of automatic data harmonization: samplewise and featurewise.

Featurewise harmonization (shown in blue in Fig. 5.3) aims to reduce the bias of extracted features by fusing them to eliminate cohort variances. In this workflow, a separate model is developed for each data source. Multicentre features are then extracted following the same feature extraction criteria, and featurewise harmonization techniques are applied to eliminate the non-biological variances. While this approach can improve data consistency and comparability, it can be more complex than samplewise harmonization, as it often requires multiple models to extract the features of interest. Additionally, when the number of samples in each cohort is small, it can be challenging to develop the corresponding models due to limited training samples.

Samplewise harmonization (shown in orange in Fig. 5.3) is typically performed before modelling and involves reducing the cohort variance of all training samples. Normally, the different source datasets are first pre-processed under the same criteria, and a harmonization model then merges all these data together. This process is achieved through

Fig. 5.3  Two typical ways (featurewise and samplewise) of automatic data harmonization


various techniques, such as image processing, synthesis, and invariant feature learning. By harmonizing the data in this way, multicentre samples can be fused into a single dataset, allowing for a more robust and accurate model. Based on these harmonized data, a single model is trained to extract the features of interest for clinical analysis. This scheme is also known as image-domain harmonization.

Task-driven harmonization differs from the samplewise and featurewise approaches, which harmonize the data or the features for further analysis. Task-driven harmonization is designed to learn cohort-invariant features from multiple data sources and then applies these features to the primary task (e.g., segmentation, classification, regression). The concept behind task-driven harmonization is that, if a sparse dictionary/mapping can be constructed from the data of various cohorts, the learned representations will not contain intra-/inter-cohort variability. It focuses on developing robust computational models rather than harmonizing raw data or extracted features.
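The samplewise and featurewise schemes can be contrasted with a schematic numpy sketch; all functions and data below are illustrative stand-ins of our own, not a published pipeline:

```python
import numpy as np

def align_to_reference(arr, ref):
    """Location-scale alignment: map arr's mean/std onto ref's (illustrative)."""
    return (arr - arr.mean()) / (arr.std() + 1e-9) * ref.std() + ref.mean()

def extract_features(samples):
    """Stand-in feature extractor: per-sample mean and standard deviation."""
    return np.stack([samples.mean(axis=1), samples.std(axis=1)], axis=1)

rng = np.random.default_rng(1)
cohort_a = rng.normal(0.0, 1.0, size=(20, 64))  # e.g. intensities from scanner A
cohort_b = rng.normal(5.0, 2.0, size=(20, 64))  # scanner B: shifted and rescaled

# Samplewise: harmonize the raw samples first, then pool them for one model
pooled = np.vstack([cohort_a, align_to_reference(cohort_b, cohort_a)])
features_samplewise = extract_features(pooled)

# Featurewise: extract per cohort, then align the feature distributions
fa, fb = extract_features(cohort_a), extract_features(cohort_b)
fb_aligned = np.stack(
    [align_to_reference(fb[:, j], fa[:, j]) for j in range(fb.shape[1])], axis=1)
features_featurewise = np.vstack([fa, fb_aligned])
print(features_samplewise.shape, features_featurewise.shape)
```

Both routes end with one pooled feature matrix; they differ in whether the cohort variance is removed before or after feature extraction.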

5.2.3 Automatic Harmonization Approaches

In this section, we summarize harmonization approaches into different groups based on the techniques behind them, including location and scale, clustering, matching, synthesis, and invariant feature learning. Among these approaches, location and scale and clustering methods can be used for both samplewise and featurewise harmonization, while matching and synthesis can only be applied to samplewise harmonization (Fig. 5.4).

Location and Scale Methods

Location and scale methods are statistical techniques utilized to estimate the distribution of a dataset. These methods are commonly used in descriptive statistics and can be used to summarize the distribution of a dataset in a few key measures.


Fig. 5.4  Harmonization approaches. Blue blocks represent methods that can be used for samplewise harmonization only, while orange blocks correspond to methods that can be used for both harmonization schemes. The yellow block indicates the invariant representation learning approach, which is mainly used to develop harmonized models

The location method describes the centre of the data distribution. Among the different estimation approaches, the mean or average value is the most commonly used measurement. Another helpful measure of location is the median, which is the middle value when the samples are arranged in order. The median is a useful measure of location when the dataset includes outliers, which may skew the mean value.

The scale method describes the variation of the data distribution. The most common measure of scale is the standard deviation, which measures how much the values in the dataset deviate from the mean value. A low standard deviation indicates that the values are tightly clustered around the mean, while a high standard deviation indicates that the values are more spread out. In addition to the standard deviation, the range and the interquartile range (the difference between the 75th and 25th percentiles) are also used as measurements.

Based on these location and scale parameters, data collected from different sites are aligned towards the same location and scale values. One intuitive way is normalization (also called standardization), which rescales the samples to the same range. Given the mean μ and standard deviation σ of sample x, the commonly used z-score normalization and max–min normalization are given by

x′ = (x − μ) / σ,  (5.1)

x′ = (x − min(x)) / (max(x) − min(x)),  (5.2)

respectively. In addition to normalization/standardization, the ComBat algorithm, as described in [15, 16], was proposed for featurewise harmonization. For instance, researchers used ComBat to harmonize image-derived features from multicentre MRI datasets [16]. It utilizes empirical Bayes shrinkage to accurately estimate the mean and variance for each batch of data. These estimates are then used to harmonize the data across cohorts. The first step standardizes the data to ensure a similar overall mean and variance, followed by empirical Bayes estimation with parametric empirical priors. The resulting adjusted bias estimators are then used in location-scale model-based functions to harmonize the data.

Another type of location and scale method is based on the alignment of data distributions, using cumulative distribution functions or probability density functions. For instance, Wrobel et al. [17] proposed a method to harmonize multicentre MRI data, which aligns the voxel intensities of the source dataset with the target cumulative distribution function by estimating a non-linear intensity transformation. In another study [18], the empirical density was estimated and the distance between probability density functions was calculated. Common features from different datasets were selected first, and then their probability density functions were estimated to determine the most suitable matching offsets. The harmonized data were obtained by subtracting the estimated offsets from the source cohorts.
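Equations (5.1) and (5.2) translate directly into code. The following sketch is our own illustration (not the ComBat implementation): it applies z-score normalization separately per site so that both sites share a common location and scale:

```python
import numpy as np

def z_score(x):
    """Eq. (5.1): centre to zero mean and unit standard deviation."""
    return (x - x.mean()) / x.std()

def min_max(x):
    """Eq. (5.2): rescale values into the [0, 1] range."""
    return (x - x.min()) / (x.max() - x.min())

rng = np.random.default_rng(42)
site_a = rng.normal(100.0, 10.0, size=500)  # simulated feature, scanner A
site_b = rng.normal(140.0, 25.0, size=500)  # same feature, scanner B

# Per-site z-scoring removes the location/scale difference between cohorts
harmonized = np.concatenate([z_score(site_a), z_score(site_b)])
print(round(float(harmonized.mean()), 6))
```

After per-site normalization, the pooled feature has zero mean and unit scale within each cohort, so the between-site offset no longer dominates downstream statistics.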

Clustering Methods

Clustering methods are commonly used in data harmonization to group data samples based on their distances. In clustering, distance measures the similarity or dissimilarity between pairs of data samples. The distance between two observations is typically calculated based on the values of their attributes or features. The aim of clustering is to create subsets or clusters of samples that are


Fig. 5.5  Steps of using clustering methods for data harmonization

more similar to each other than to samples in other clusters. This grouping can help to harmonize the data by creating a more uniform representation of the samples that can be used for subsequent analysis. Figure 5.5 illustrates the steps of using clustering methods for data harmonization. Several factors, such as the choice of clustering algorithm, the selection of distance metrics, the pre-processing of data, and the determination of the number of clusters, can influence the quality of harmonization obtained through clustering methods.

Clustering algorithm: The selection of the clustering algorithm can affect the quality of harmonization. Different algorithms have different assumptions and properties and may perform differently on various types of data. For instance, k-means assumes spherical clusters and is sensitive to initialization, while hierarchical clustering can handle non-spherical clusters but incurs higher computational costs.

Distance metrics: The choice of distance metric can also impact the quality of harmonization. Different distance metrics lead to different clustering results, as the similarity or dissimilarity between samples is calculated in different ways.


Data pre-processing: Pre-processing steps such as scaling, normalization, or handling missing values can impact the distances between observations, and therefore the clustering results. It is important to apply appropriate pre-processing steps carefully before clustering.

The number of clusters: Defining an appropriate number of clusters can be challenging; it can be guided by prior knowledge (when the number of data sources is known) or by assessment approaches such as silhouette scores and gap statistics. Choosing an inappropriate number of clusters can lead to poor harmonization performance.

Here we introduce some clustering methods for harmonization.

Nearest neighbours methods: These methods first identify pairs of mutual nearest neighbours and then estimate bias correction vectors between the paired samples. These vectors are then subtracted from the source cohort. The differences among NN methods primarily relate to the way in which the mutual nearest pairs are located within the geometric space [19–23]. For instance, MNN [19] identifies nearest neighbours between different datasets and uses them as reference points to calculate cohort bias. It employs cosine normalization to pre-normalize the data and then estimates the bias correction vector by computing the Euclidean distances between paired samples. The bias correction vector is subsequently applied to all samples, not just the paired samples.

Iterative clustering methods: Iterative clustering methods aim to address cohort bias by conducting multiple bias correction iterations through repeated clustering procedures. Typically, these methods (1) cluster all samples from different cohorts, and (2) determine the correction vectors for harmonization based on the cluster centroids.
Harmony [24] first utilized principal component analysis (PCA) to reduce the dimensionality of all samples and then divided them into multiple groups, with one centroid per group, by using k-means clustering. These centroids were then used to calculate the correction factors for harmonization. The clustering and correction steps were repeated until convergence was achieved.
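A heavily simplified, single-cluster sketch of this centroid-based correction is shown below; it is our own illustration (Harmony itself uses PCA, soft k-means clustering, and convergence checks rather than one hard cluster and a fixed iteration count):

```python
import numpy as np

def centroid_correct(cohorts, n_iter=3):
    """Shift each cohort by its offset from the global centroid
    (simplified: one cluster, hard assignment, fixed iterations)."""
    cohorts = [np.asarray(c, float).copy() for c in cohorts]
    for _ in range(n_iter):
        global_centroid = np.vstack(cohorts).mean(axis=0)
        for c in cohorts:
            c += global_centroid - c.mean(axis=0)  # per-cohort correction vector
    return cohorts

rng = np.random.default_rng(7)
a = rng.normal([0, 0], 1.0, size=(30, 2))
b = rng.normal([4, -3], 1.0, size=(30, 2))  # cohort shifted by a batch effect
a2, b2 = centroid_correct([a, b])
print(np.allclose(a2.mean(axis=0), b2.mean(axis=0)))
```

After correction both cohorts share the same centroid, while the within-cohort structure (the biological signal, in the ideal case) is preserved.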


Matching Methods

Matching methods in data harmonization are used to align data collected from different sources that may have different formats or structures. Resampling is the most common matching method used for automatic data harmonization. Resampling, also known as resizing, alters the dimensions or resolution of images or signals to match those of other datasets. This method can be used to harmonize data collected from different sources with varying resolutions or image sizes. In radiomic studies, the reproducibility of radiomic features is heavily affected by the voxel/pixel size (the physical length of a single pixel in the CT/MRI image).

Synthesis Methods

Synthesis is a method used to generate samples that belong to a specific modality or domain, effectively harmonizing multi-cohort datasets. This approach simplifies the task of data harmonization by treating each cohort as a distinct style and transferring all samples to a common style. Synthesis techniques can be divided into paired synthesis and unpaired synthesis, depending on the characteristics of the training samples. Paired synthesis is used when corresponding samples from different cohorts are available, while unpaired synthesis is used when such correspondence is absent (Fig. 5.6). Paired synthesis approaches are trained on paired samples that originate from the same object but are obtained using different protocols (e.g., CT scans collected from the same patient with different scanners). These techniques are developed to learn how to transform data between the source and reference cohorts. For example, Park et al. proposed “deep harmonics” for CT slice

Fig. 5.6  Workflow of paired synthesis and unpaired synthesis model


thickness harmonization by introducing an end-to-end deep neural network to generate CT scans. However, in clinical practice, paired data are difficult and costly to acquire. Unpaired synthesis mainly refers to cycle-GAN and conditional VAE approaches (Fig. 5.6), which can be trained well with sufficient unpaired samples. The cycle-GAN-based approaches [25, 26] are trained in a cycle-consistent manner, including forward translation, backward translation, and cycle consistency. The training procedure iterates until the synthetic images are close to the target-domain images. Unlike cycle-GAN-based methods, a VAE applies an encoder to compress the input (high-dimensional data) into data representations (low-dimensional vectors) and a decoder to reconstruct the raw data from these representations. A conditional VAE replaces the decoder with a conditional one that transfers the data representations back to harmonized data according to cohort prior knowledge. By integrating a conditional VAE with an adversarial module, the cohort transfer can be performed without paired training samples. For instance, Moyer et al. [27] proposed a conditional VAE that provides cohort-invariant representations by introducing spherical harmonics coefficients as inputs and outputs.
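Returning to the matching methods above, resampling scans to a common voxel size is often the first samplewise pre-processing step. A minimal nearest-neighbour sketch is given below; this is our own illustration (production pipelines typically use spline interpolation via libraries such as SimpleITK or scipy):

```python
import numpy as np

def resample_nn(volume, spacing, target_spacing):
    """Nearest-neighbour resampling of a 3-D volume from `spacing`
    (mm per voxel) to `target_spacing`."""
    spacing = np.asarray(spacing, float)
    target = np.asarray(target_spacing, float)
    new_shape = np.maximum(
        1, np.round(np.array(volume.shape) * spacing / target)).astype(int)
    # map each new-grid index back to a source voxel (floor, clipped to bounds)
    idx = [np.minimum((np.arange(n) * t / s).astype(int), d - 1)
           for n, t, s, d in zip(new_shape, target, spacing, volume.shape)]
    return volume[np.ix_(*idx)]

# A 4x4x10 volume with 5 mm slices, resampled to isotropic 1 mm voxels
vol = np.arange(4 * 4 * 10).reshape(4, 4, 10).astype(float)
out = resample_nn(vol, spacing=(1.0, 1.0, 5.0), target_spacing=(1.0, 1.0, 1.0))
print(out.shape)
```

Resampling every cohort to the same target spacing removes one of the main protocol-driven sources of feature variability before features are extracted.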

Invariant Representation Learning Methods

Invariant feature learning techniques aim to identify features that are consistent across different sets of data and then use these features to perform specific tasks such as segmentation, classification, or regression. The idea behind representation learning methods for harmonization is that, by creating a concise dictionary or mapping from diverse data sources, the resulting representations will not include any variability that is specific to a particular dataset or cohort. There are two main schemes of invariant feature learning (Fig. 5.7). Approaches such as “deep unlearning harmonics” [28] are usually applied with an adversarial module, or domain classifier, to aid the encoder in identifying features that are consistent across various cohorts of data. This is accomplished by maximizing the adversarial loss L_adv while simultaneously minimizing the



Fig. 5.7  Invariant feature learning schemes

main task loss L_task. To obtain accurate representations of the features, methods such as the normalization autoencoder [29] introduce a decoder that reconstructs the original input data, thereby minimizing the reconstruction loss L_rec. By incorporating these optimization functions, these methods can ensure stable performance when working with data from multiple cohorts.

5.3 Challenges for Data Harmonization

For years, computational data harmonization has been proposed as a solution to mitigate the data inconsistency issue in digital healthcare research. However, applying this concept to real-world multicentre, multimodal, and multi-scanner medical practice and clinical trials poses a significant challenge. Although transfer/federated/multitask learning approaches have shown promising results, their success depends on ideal conditions, and they may fail when working across different data sources, requiring effective data harmonization. Unfortunately, there is limited consensus on which approaches and metrics are best suited for dealing with multimodal datasets [1]. In addition, the lack of a standardized stepwise design methodology makes it difficult to reproduce existing studies, hindering progress in the field. The


challenges of data harmonization in radiomics span several aspects.

Firstly, the location and scale approaches present several issues. Most distribution-based methods require refined feature vectors that depend on prior knowledge of the regions of interest, while accurate prediction of regions of interest cannot be achieved without proper data harmonization. Furthermore, although some distribution-based methods, such as ComBat, can remove cohort bias while preserving differences between radiomic features on phantoms, they are not well suited to images or high-dimensional signals due to their demanding computational complexity. Additionally, when new data are added, data harmonization needs to be performed on the entire dataset again, and some pairwise approaches require a complex training procedure, such as repeated training, when applied to multicentre datasets with more than two cohorts.

Secondly, despite the significant progress made in deep learning-based synthesis methods, their reproducibility and generalizability are still a concern. These methods face clear limitations, such as (1) being mainly built on existing multicentre datasets, lacking evaluations on new datasets; (2) being based on GAN models that are unstable and may introduce unrealistic changes or hallucinations; and (3) requiring a large amount of training data for all cohorts, which may not be feasible for clinical studies. To address these issues, researchers should report the performance of data harmonization on new datasets that were not involved in the model development, improve the stability of data synthesis, and develop data harmonization strategies that require less training data.

Moreover, while extracting invariant features across cohorts is a promising approach to address the limitations of synthesis methods, it also has its challenges. Specifically, it can only extract invariant features for analysis and cannot generate harmonized data.
Therefore, future research should aim to develop methods that can generate harmonized data using the extracted invariant features. An unexplored research area in data harmonization is the use of explainable artificial intelligence (XAI) methods [30]. XAI


techniques can provide insight into the possible reasons for inconsistent data representations that contribute to bias in data-based models. By analysing these insights, researchers can determine whether the biasing artifacts are due to inadequate data harmonization before the learning phase. Additionally, local explanatory methods can identify out-of-distribution examples that may relate to data harmonization issues, such as equipment miscalibration or changes in data capture protocols. Improved data harmonization can, in turn, benefit XAI by standardizing all data and eliminating cohort biases [31, 32]. In summary, we anticipate an exciting cross-disciplinary research area at the intersection of harmonization and XAI.

References 1. Nan Y et al (2022) Data harmonization for information fusion in digital healthcare: a state-of-the-art systematic review, meta-analysis and future research directions. Inf Fusion 82:99 2. Berenguer R et al (2018) Radiomics of CT features may be nonreproducible and redundant: influence of CT acquisition parameters. Radiology 288(2):407–415 3. Sunderland JJ, Christian PE (2015) Quantitative PET/CT scanner performance characterization based upon the society of nuclear medicine and molecular imaging clinical trials network oncology clinical simulator phantom. J Nucl Med 56(1):145–152 4. Yamashita R et  al (2020) Radiomic feature reproducibility in contrast-­ enhanced CT of the pancreas is affected by variabilities in scan parameters and manual segmentation. Eur Radiol 30(1):195–205 5. Jha A et  al (2021) Repeatability and reproducibility study of radiomic features on a phantom and human cohort. Sci Rep 11(1):1–12 6. Emaminejad N, Wahi-Anwar MW, Kim GHJ, Hsu W, Brown M, McNitt-­ Gray M (2021) Reproducibility of lung nodule radiomic features: multivariable and univariable investigations that account for interactions between CT acquisition and reconstruction parameters. Med Phys 48:2906 7. Kim M, Jung SC, Park JE, Park SY, Lee H, Choi KM (2021) Reproducibility of radiomic features in SENSE and compressed SENSE: impact of acceleration factors. Eur Radiol 31:1–14 8. Saeedi E et al (2019) Radiomic feature robustness and reproducibility in quantitative bone radiography: a study on radiologic parameter changes. J Clin Densitom 22(2):203–213


Y. Nan et al.

9. Meyer M et al (2019) Reproducibility of CT radiomic features within the same patient: influence of radiation dose and CT reconstruction settings. Radiology 293(3):583–591
10. Perrin T et al (2018) Short-term reproducibility of radiomic features in liver parenchyma and liver malignancies on contrast-enhanced CT imaging. Abdom Radiol 43(12):3271–3278
11. Midya A, Chakraborty J, Gönen M, Do RK, Simpson AL (2018) Influence of CT acquisition and reconstruction parameters on radiomic feature reproducibility. J Med Imaging 5(1):011020
12. Altazi BA et al (2017) Reproducibility of F18-FDG PET radiomic features for different cervical tumor segmentation methods, gray-level discretization, and reconstruction algorithms. J Appl Clin Med Phys 18(6):32–48
13. Zhao B et al (2016) Reproducibility of radiomics for deciphering tumor phenotype with imaging. Sci Rep 6(1):1–7
14. Choe J et al (2019) Deep learning–based image conversion of CT reconstruction kernels improves radiomics reproducibility for pulmonary nodules or masses. Radiology 292(2):365–373
15. Johnson WE, Li C, Rabinovic A (2007) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8(1):118–127
16. Whitney HM, Li H, Ji Y, Liu P, Giger ML (2020) Harmonization of radiomic features of breast lesions across international DCE-MRI datasets. J Med Imaging 7(1):012707
17. Wrobel J et al (2020) Intensity warping for multisite MRI harmonization. NeuroImage 223:117242
18. Lazar C et al (2013) GENESHIFT: a nonparametric approach for integrating microarray gene expression data based on the inner product as a distance measure between the distributions of genes. IEEE/ACM Trans Comput Biol Bioinform 10(2):383–392
19. Haghverdi L, Lun AT, Morgan MD, Marioni JC (2018) Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat Biotechnol 36(5):421–427
20. Hie B, Bryson B, Berger B (2019) Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat Biotechnol 37(6):685–691
21. Wolf FA, Angerer P, Theis FJ (2018) SCANPY: large-scale single-cell gene expression data analysis. Genome Biol 19(1):1–5
22. Polański K, Young MD, Miao Z, Meyer KB, Teichmann SA, Park J-E (2020) BBKNN: fast batch alignment of single cell transcriptomes. Bioinformatics 36(3):964–965
23. Butler A, Hoffman P, Smibert P, Papalexi E, Satija R (2018) Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol 36(5):411–420

5  Data Harmonization to Address the Non-biological Variances…


24. Korsunsky I et al (2019) Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods 16(12):1289–1296
25. Zhu J-Y, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision, pp 2223–2232
26. Zhao F et al (2019) Harmonization of infant cortical thickness using surface-to-surface cycle-consistent adversarial networks. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 475–483
27. Moyer D, Ver Steeg G, Tax CM, Thompson PM (2020) Scanner invariant representations for diffusion MRI harmonization. Magn Reson Med 84(4):2174–2189
28. Dinsdale NK, Jenkinson M, Namburete AI (2021) Deep learning-based unlearning of dataset bias for MRI harmonization and confound removal. NeuroImage 228:117689
29. Rong Z et al (2020) NormAE: deep adversarial learning model to remove batch effects in liquid chromatography mass spectrometry-based metabolomics data. Anal Chem 92(7):5082–5090
30. Arrieta AB et al (2020) Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fusion 58:82–115
31. Yang G, Ye Q, Xia J (2022) Unbox the black-box for the medical explainable AI via multi-modal and multi-centre data fusion: a mini-review, two showcases and beyond. Inf Fusion 77:29–52
32. Holzinger A et al (2022) Information fusion as an integrative cross-cutting enabler to achieve robust, explainable, and trustworthy medical artificial intelligence. Inf Fusion 79:263–278

6: Harmonization in the Image Domain

F. Garcia-Castro and E. Ibor-Crespo

F. Garcia-Castro (*), Department of Technology and Innovation, Quibim, Valencia, Spain. e-mail: [email protected]
E. Ibor-Crespo, Department of AI Research, Quibim, Valencia, Spain

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. Á. Alberich-Bayarri, F. Bellvís-Bataller (eds.), Basics of Image Processing, Imaging Informatics for Healthcare Professionals, https://doi.org/10.1007/978-3-031-48446-9_6

6.1 The Need for Image Harmonization

Medical imaging presents variability that depends on several factors [1]. The scanner model and manufacturer, the acquisition protocol and patient preparation all affect image quality. When describing image quality, two main aspects must be considered within the field of medical imaging: diagnostic quality and technical quality. Although the two definitions focus on different aspects when assessing whether an image is satisfactory, they are not independent or unrelated. Obtaining adequate diagnostic quality requires the images in a study to capture the attributes and characteristics that represent physiology and pathology, allowing the clinician or radiologist to identify the relevant findings through a correct interpretation of the image. Technical quality is defined by the parameters used to acquire, reconstruct or generate the images, ensuring, for example, that the images of the study will be free of artifacts, with sufficient


spatial resolution, or with satisfactory contrast. Technical quality has, of course, a large effect on diagnostic quality. A low signal-to-noise ratio (SNR) can lead to an inaccurate representation of tissues and organs, as a high noise level can render features indistinguishable [2], thus decreasing the diagnostic quality. However, high technical quality does not automatically produce an image of high diagnostic quality: for instance, the image focus could fall completely outside the tissues or organs that need to be properly represented on the image. In an ideal scenario, images acquired under the same conditions should present the same technical quality with minimal deviations. In a real-world scenario, however, clinical practice involves different manufacturers and scanner models. These variations will cause shifts in the signal intensities of the image voxels¹ even with similar acquisition protocols. This situation is accentuated when dealing with multi-centric real-world data (RWD), as a high degree of variability affects images coming from the different imaging sites [3]. Different manufacturers, scanners or acquisition protocols introduce image variability that can hinder the development of generalizable AI models, while also affecting the reproducibility of quantitative imaging biomarkers (QIBs) [4], both of which are paramount to successfully developing and applying radiomic techniques and methodologies. These sources of variability are especially pronounced in magnetic resonance (MR) images, since their intensity values are not normalized to standard units as in computed tomography (CT) scans, where intensity values can be standardized to Hounsfield units thanks to the intrinsic physics of X-ray attenuation. To reduce this variability, two large groups of techniques can be used: standardization and harmonization.

¹ Since we will be discussing mainly tomographic images, we will refer to voxels instead of pixels.

Standardization intends to ensure that processes are always carried out in the most similar way possible, to reduce variability in image quality and in the quantification of imaging biomarkers. Examples include always using the same acquisition protocol [5, 6] or performing patient preparation in a strict and reproducible manner [7]. Also, when it comes to quantifying imaging biomarkers, standardization should seek to perform the measurements in a homogeneous way in all cases. Initiatives such as the Quantitative Imaging Biomarkers Alliance (QIBA) from the Radiological Society of North America (RSNA) or the European Imaging Biomarkers Alliance (EIBALL) from the European Society of Radiology (ESR) pursue standardization in the quantification of imaging biomarkers by setting profiles for specific methodologies. These profiles include recommendations that describe not only image acquisition protocols in detail, but also the specific methodologies to be followed for the calculation of a given quantitative imaging biomarker.

The objective of harmonization processes in medical imaging, and specifically of image harmonization techniques (IHTs), is to bring the images to a common space, where "space" refers to the set of characteristics that define the appearance of the image: contrast, resolution, dynamic range, etc. Harmonization is an important process to ensure better generalization of artificial intelligence (AI) models [8]. Supervised AI models based on convolutional neural networks (CNNs) for segmentation, classification or object detection learn from a finite dataset. This dataset will contain images with a specific contrast range determined by the acquisition protocols and the characteristics of the scanners used. The trained network will therefore have learned to recognize findings or structures in images that present these characteristics, and the generalization the network is capable of reaching will be limited by the particularities of the training dataset.
Applying IHTs before using a CNN to segment a specific anatomy can indirectly help the generalization of the model, as the objective of the IHT is to bring the image contrast into the general vicinity of the images used for training, thereby ensuring better model performance than with a completely foreign contrast. A model predicting patient relapse from radiomic features would also greatly benefit from the application of IHTs, since they ensure that radiomic features are always calculated from images with similar contrast and signal intensities.


The field of radiomics has gained significant attention in recent years, with researchers seeking to develop models that can extract useful information from medical images to aid in diagnosis, treatment planning, and patient outcomes. In this context, IHTs have emerged as a crucial tool for improving the performance of radiomic models [9]. A wide range of IHTs has been investigated to address the various sources of image variability that can impact the accuracy and reliability of radiomic features. These techniques range from traditional computer vision algorithms for image normalization to more advanced solutions based on AI, such as generative adversarial networks (GANs) and autoencoders. In this chapter, we will describe several of these IHTs, while also considering the various sources of image variability, including factors that affect signal intensity, contrast, and spatial resolution, as well as issues related to image artifacts and noise. By gaining a better understanding of these topics, we can develop more effective strategies for optimizing radiomics performance and advancing the field of medical imaging in research and clinical practice.

6.2 Image Variability Sources

Variability sources in medical imaging are usually inherent to each modality. While these sources can be grouped into general categories, such as patient preparation, acquisition protocol, foreign bodies, etc., all of them have concrete root causes tied to the intrinsic characteristics of each modality. Patient preparation induces variability across all modalities, but with very specific issues arising depending on the modality. The effect of patient preparation is especially relevant because, if the process is not performed appropriately, the imaging study might be rendered completely useless, not only for radiomic model development but, in some scenarios, also for reading purposes. Lack of patient preparation can create a wide array of issues, many of them affecting the possibility of developing AI


models. If, when acquiring a lung CT, the patient does not perform the full inspiration cycle properly, the study will not be adequate for any kind of volumetric analysis [10]. In multiparametric prostate MRI (mpMRI), if the patient underwent suboptimal rectal preparation, the rectum could show an excessive amount of gas, creating susceptibility artifacts that especially affect the diffusion-weighted imaging (DWI) sequence with a large degree of unwanted deformation [11], affecting image registration or organ segmentation. Or, if during the acquisition of an FDG PET/CT (fluorodeoxyglucose positron emission tomography + computed tomography) scan the patient is placed with the arms alongside the body rather than raised above the head, beam-hardening artifacts and field-of-view (FOV) truncation artifacts might appear [12]. Poor patient preparation, in general, cannot be solved by harmonization techniques. Standardizing patient preparation for each type of imaging study is the best way to reduce variability in the cases where this procedure is needed. However, consensus patient preparation protocols are not depicted in all reporting guidelines. For instance, the PI-RADS v2.1 prostate mpMRI reading guidelines of the American College of Radiology (ACR) specifically state that there is no consensus patient preparation that demonstrates an improvement in diagnostic accuracy. Nevertheless, artifacts introduced by lack of patient preparation will significantly hinder the performance of AI models and traditional computer vision algorithms, as the anatomy can suffer heavy deformations or other unwanted modifications. While IHTs may not be the ideal solution for these kinds of issues, they are particularly useful when dealing with variability sources that directly impact the signal intensity and contrast of an image.
Factors such as the vendor of the equipment, the specific scanner model, the firmware version, the reconstruction algorithms employed, and the acquisition protocols utilized can all affect image contrast in different ways depending on the modality. For instance, the tube voltage in CT scans and the repetition time (TR) in MRI scans can fall within acceptable ranges for diagnostic purposes. However, the differences in SNR and image contrast that they introduce can hinder the performance of AI


algorithms, rendering them less effective. In such cases, employing IHTs to standardize the contrast of the images can significantly enhance the accuracy and reliability of AI models [13]. It is important to note that the choice of IHT will depend on the specific imaging modality and the particular variability sources that need to be addressed. Furthermore, factors such as the computational resources required and the impact of the IHT on final image quality should also be weighed when selecting the appropriate IHT for a given application.

6.2.1 Image Acquisition

Several aspects influence image SNR and contrast across different modalities. Because of the physical principles each modality is based on, the characteristics that affect the image differ. In CT, the SNR is affected by many different aspects, from the detector, collimator and tube current to the slice thickness or the reconstruction algorithm. All these features affect the number of photons detected, which in turn affects the appearance of the image. Tube current, and hence dose, is one of the aspects most closely related to SNR variability: increasing the current by a factor of 2 potentially means twice the signal. However, since CT is based on the emission of ionizing radiation, the trend in clinical practice is to follow the "as low as reasonably achievable" (ALARA) principle [14]. The ALARA principle reflects the fact that, with ionizing radiation, there is no safe dose, no matter how small, and the dose must therefore be kept as low as possible. This is why, for good reason, most CT acquisitions in clinical practice are low-dose CT scans. In MRI, one of the main factors affecting the SNR is the strength of the magnetic field: SNR is directly proportional to field strength, which introduces variability in image quality depending on whether the study was acquired with a 1.5 Tesla (T) or a 3 T scanner. TR and echo time (TE) also affect SNR in different ways. Long TRs will increase SNR, as the longitudinal magnetization gets closer to its maximum. However, excessively long TRs in


T1-weighted (T1w) images will result in contrast loss among tissues. TE behaves contrary to TR: decreasing TE will increase the SNR. Short TEs ensure that the transverse magnetization is high, resulting in high signal. However, for T2-weighted (T2w) images, greatly decreasing TE will result in suboptimal image contrast. Many other acquisition-related factors affect SNR, and hence contrast, in MRI. Flip angle, slice thickness, spacing between slices, matrix size, and field-of-view (FOV) are among the key factors that influence image contrast. Each of these parameters can be adjusted in different ways depending on the specific MRI imaging protocol and the scanner configuration, resulting in significant variability in image contrast. The radiofrequency coil used for signal reception is another important factor that can impact the contrast of an MRI sequence. Different types of coils have varying sensitivity to different tissue types and magnetic field strengths, and selecting the appropriate coil for a particular imaging task is crucial for obtaining high-quality images with consistent contrast. In addition to these acquisition-related factors, the chosen k-space filling technique can also affect image contrast. K-space is a mathematical representation of the raw MRI data, and the way it is sampled and filled can impact image contrast and resolution. Figure 6.1 shows the effect of different TR and TE combinations on prostate T2w image series acquired in clinical practice. The contrast differences can easily be appreciated and might introduce a degree of variability that an AI model simply cannot overcome without proper application of IHTs.

Fig. 6.1  Effect of TR and TE on prostate T2w images


Due to this complexity, without the use of IHTs, an AI model trained on a finite dataset of MRI images could perform poorly on new, unseen images [15] due to the variability in image acquisition parameters. However, by incorporating IHTs into the preprocessing pipeline, the AI model can learn to recognize important features and patterns in the images despite the differences in acquisition parameters, leading to more accurate and robust predictions.

6.3 Harmonization Techniques

The scope of image harmonization is broad and encompasses a variety of techniques. For the purposes of this chapter, however, we will focus on IHTs that aim to normalize voxel intensities across multiple images, i.e., techniques that adjust voxel intensity values to a similar range, regardless of their specific implementation. As a result, we will not discuss techniques such as blurring filters, inhomogeneity filters, or other image correction methods in this chapter. Instead, we will concentrate on methods designed to bring image intensity into alignment across different images. Application-specific techniques will not be depicted in detail either. These techniques make use of particular characteristics of the acquired images, usually specific tissues, in order to achieve a certain degree of harmonization. Using the values of healthy liver parenchyma to normalize the standardized uptake values (SUV) on a PET scan [16] or the WhiteStripe method [17] to normalize tissue intensities in brain MRI are examples of such techniques. From an implementation point of view, IHTs can be classified into two main categories: those that employ conventional computer vision algorithms and those that utilize AI methodologies. While non-AI methods are applicable in many different situations, they often lack robustness and overall performance compared to AI-based techniques. This is where CNNs have emerged as a more robust alternative, learning from the complexities and nuances of images, resulting in a more


accurate and effective image harmonization. Despite the many advantages of AI-based IHTs, they also have some limitations that must be addressed to ensure their successful implementation in certain scenarios. Nonetheless, incorporating IHTs into the preprocessing pipeline can significantly improve an AI model’s ability to recognize important features and patterns in images, even in the face of variability in image acquisition parameters.

6.3.1 Non-AI Methods

Image harmonization techniques that do not rely on AI are based on conventional image processing. These techniques have been in use for many years and rely on traditional computer vision algorithms. Their primary goal, as with AI-based IHTs, is to adjust the intensity and contrast of images so that they appear similar, regardless of differences in the imaging conditions or equipment used to acquire them. The most commonly used image harmonization techniques are based on intensity scaling, histogram matching or normalization, with the purpose of adjusting the intensity values of each voxel so that they have a similar distribution across different images. These techniques are used extensively in the medical imaging field to improve the quality of the images and reduce the variability among images acquired using different equipment or protocols. The aim is to produce images that are visually comparable and that can be used to facilitate diagnosis or further analysis. One of the most significant advantages of non-AI image harmonization techniques is that they are generally straightforward to implement and can be applied to a broad range of imaging modalities. They do not require any specific hardware or software and can run on a standard computer. This makes them cost-effective and accessible to a broad range of users, including those with limited computational resources. While non-AI techniques have been widely used for several years, their limitations include a lack of robustness and the inability to handle complex variations in image characteristics. These


limitations can result in suboptimal performance, especially when dealing with images acquired using different protocols or equipment. However, despite these limitations, non-AI image harmonization techniques remain a valuable tool in the field of medical imaging, even as part of preprocessing pipelines for the implementation of AI models. Some of these techniques are described in the following sections.

Intensity Scaling

Intensity scaling is a technique used in image processing to adjust the contrast and brightness of an image by scaling the range of voxel intensity values. In other words, it involves mapping the original intensity values of an image to a new range of values. The scaling process of a grayscale digital image typically involves two steps: normalization and rescaling. In the normalization step, the minimum and maximum intensity values in the image are identified. Then, the intensity values of all the voxels in the image are shifted and scaled into the range [0,1] using Eq. 6.1:

Ni = (Ii − Imin) / (Imax − Imin)  (6.1)

where Ii is an image voxel, Imin and Imax are the image minimum and maximum intensity, respectively, and Ni is the normalized voxel. After normalization, the image is rescaled to a new range of intensity values. This is usually done to enhance the contrast of the image by stretching the intensity range to occupy the full available range of values. The new intensity values are obtained using Eq. 6.2:

Ri = Ni * (Nmax − Nmin) + Nmin  (6.2)

where Ri is a rescaled voxel and Nmin and Nmax are the minimum and maximum of the new target intensity range, respectively. Other intensity scaling approaches can be used depending on the application, ranging from simply dividing by the maximum intensity of the image to more complex solutions.
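Eqs. 6.1 and 6.2 are straightforward to implement. The following sketch (function name and defaults are illustrative, assuming NumPy) normalizes a volume to [0,1] and then rescales it to a target range:

```python
import numpy as np

def rescale_intensity(volume, new_min=0.0, new_max=1.0):
    """Min-max normalize (Eq. 6.1), then rescale to [new_min, new_max] (Eq. 6.2)."""
    volume = volume.astype(np.float64)
    i_min, i_max = volume.min(), volume.max()
    normalized = (volume - i_min) / (i_max - i_min)    # Eq. 6.1
    return normalized * (new_max - new_min) + new_min  # Eq. 6.2

# Example: map a small "volume" onto [0, 1]
vol = np.array([[100.0, 300.0], [500.0, 700.0]])
out = rescale_intensity(vol)
```

Note that a constant image (where Imax equals Imin) would cause a division by zero; a production implementation should guard against that case.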


Z-Score Normalization

Z-score is a statistical measure used to evaluate how many standard deviations a data point is from the mean of a dataset. In the context of image processing, Z-score normalization is a technique used to normalize the intensity of voxels in an image. It is a linear transformation method that scales the voxel values to have a mean of zero and a standard deviation of one. Z-score normalization is applied to images by computing the mean and standard deviation of the intensity values of all the voxels in the image; the mean is subtracted from each voxel value, and the result is divided by the standard deviation (Eq. 6.3):

Zi = (Ii − μ) / σ  (6.3)

where Zi is a Z-score normalized voxel and μ and σ are the mean and standard deviation of all the image voxels, respectively. As a standalone method for image harmonization, the Z-score might not be applicable in all scenarios, as it assumes that the distribution of voxel intensities in an image is Gaussian; if the distribution is non-Gaussian, the normalization may not be appropriate. Additionally, Z-score normalization can only adjust the overall brightness and contrast of an image and may not be able to correct for more complex artifacts or variability in image acquisition. However, it has been used in recent research in different situations, such as a normalization method in MRI of head and neck cancer [18] or as part of normalization strategies for radiomic pipelines [19]. It can also be of great help as part of the preprocessing pipeline of an AI model training, as data with an average close to zero can help speed up convergence in specific scenarios [20].
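A minimal NumPy sketch of Eq. 6.3 (the function name is illustrative):

```python
import numpy as np

def zscore_normalize(volume):
    """Shift voxel intensities to zero mean and unit standard deviation (Eq. 6.3)."""
    volume = volume.astype(np.float64)
    return (volume - volume.mean()) / volume.std()

vol = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
z = zscore_normalize(vol)  # z.mean() is ~0 and z.std() is ~1 afterwards
```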

Histogram Equalization

Histogram equalization (HE) is a technique used in image processing to enhance the contrast of an image by redistributing voxel values in the image's histogram. It works by increasing the


global contrast of the image, which can reveal hidden details and improve the overall quality of the image. While HE does not guarantee obtaining the same contrast across a dataset, it will create a similar effect on all images due to the flattening of the histogram. HE works by transforming the original image's voxel intensities to a new set of intensities such that the cumulative distribution function (CDF) of the resulting image is as flat as possible. The CDF is a measure of the distribution of voxel intensities in the image. The histogram equalization algorithm is a two-step process. The first step is to calculate the histogram of the input image, which is a plot of the frequency of occurrence of each gray level in the image. The second step is to calculate the cumulative distribution function of the histogram, which represents the number of voxels with intensity levels less than or equal to a given level. The image is then transformed by mapping the original voxel intensities to their new values in a way that equalizes the CDF, as seen in Eq. 6.4:

Ei = round( ((L − 1) / M) * CDF(Ii) )  (6.4)

where Ei is the new voxel intensity value, Ii is the original voxel intensity value, M is the total number of voxels in the image and L is the number of possible voxel intensity levels. Figure 6.2 shows the effect of HE on a T2w prostate MRI slice. Histogram equalization may not work well for images with a bimodal or multimodal histogram, where there are several peaks in the histogram. In such cases, adaptive histogram equalization techniques such as contrast limited adaptive histogram equalization (CLAHE) may be used [21]. CLAHE is a modified version of the traditional HE technique which overcomes the limitations of HE by dividing the image into small rectangular regions called tiles and then applying the HE technique to each tile separately. The size of the tiles is usually chosen based on the size of the features of interest in the image.
For example, for medical images such as MRI scans, smaller tiles can be used to capture the fine details of the image.
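The global HE mapping of Eq. 6.4 can be sketched as follows for an integer-valued image (an illustrative helper, assuming NumPy; CLAHE additionally layers tiling and clip-limit logic on top of this):

```python
import numpy as np

def equalize_histogram(image, levels=256):
    """Global histogram equalization of an integer image with values in [0, levels-1]."""
    hist = np.bincount(image.ravel(), minlength=levels)
    cdf = np.cumsum(hist)    # number of voxels with intensity <= each level
    m = image.size           # M, total number of voxels
    # Eq. 6.4: Ei = round(((L - 1) / M) * CDF(Ii)), precomputed as a lookup table
    mapping = np.round((levels - 1) * cdf / m).astype(image.dtype)
    return mapping[image]    # apply the mapping to every voxel

img = np.array([[0, 0, 1, 1], [2, 2, 3, 3]], dtype=np.uint8)
eq = equalize_histogram(img, levels=4)  # spreads the four gray levels toward [0, 3]
```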


Fig. 6.2  Effect of HE on a T2w prostate MRI slice. Top left, original slice. Top right, equalized slice. Bottom left, original histogram. Bottom right, equalized histogram

To prevent over-enhancement, CLAHE limits the maximum amount of contrast enhancement that can be applied to each tile. This limit is determined by the contrast distribution of the surrounding tiles. In other words, the maximum enhancement that can be applied to a tile is based on the contrast distribution of the neighboring tiles. This approach ensures that the contrast enhancement is adaptive to the local features of the image and prevents the formation of artifacts or noise. CLAHE has applications as part of machine and deep learning preprocessing pipelines [22], improving contrast enhancement before model training or inference.

Histogram Matching

Histogram matching, also known as histogram specification, is a technique used to match the histogram of one image to that of another, typically a reference image. It has been applied as a normalization technique for medical images for many years [23]. The goal of


histogram matching is to adjust the intensity values of an input image so that they match the intensity distribution of a reference image. To obtain the new histogram, the histograms of the input image and the reference image are computed. The cumulative distribution functions (CDFs) of both histograms are then calculated and normalized to the range 0–1. The inverse CDF of the reference histogram is calculated, and the intensity values of the input image are mapped to the corresponding values of the inverse CDF. Figure 6.3 shows the effect of histogram matching on a slice of a brain FLAIR MRI. The resulting image has a histogram that matches that of the reference image. By matching the histograms of two images, it is possible to transfer the statistical properties of the reference image to the input image.
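The CDF-inversion procedure above can be sketched with NumPy (the function name is illustrative; `np.interp` performs the inverse-CDF lookup):

```python
import numpy as np

def match_histogram(source, reference):
    """Map source intensities so their distribution follows the reference image."""
    s_vals, s_counts = np.unique(source.ravel(), return_counts=True)
    r_vals, r_counts = np.unique(reference.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / source.size      # normalized CDF of the source
    r_cdf = np.cumsum(r_counts) / reference.size   # normalized CDF of the reference
    # Invert the reference CDF: for each source quantile, find the reference value
    matched_vals = np.interp(s_cdf, r_cdf, r_vals)
    return matched_vals[np.searchsorted(s_vals, source)]

src = np.array([0, 1, 2, 3])
ref = np.array([10, 20, 30, 40])
out = match_histogram(src, ref)  # each source level moves to its reference quantile
```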

Fig. 6.3  Effect of histogram matching on a FLAIR MRI slice. Top left, input FLAIR image. Top middle, reference FLAIR image. Top right, result image after histogram matching. Bottom left, input histogram and CDF. Bottom middle, reference histogram and CDF. Bottom right, result histogram and CDF after histogram matching. Note that the result image CDF matches the shape of the reference image CDF


Compared to histogram equalization, histogram matching provides more control over the output histogram, since it uses a reference histogram to determine the mapping function. Histogram matching can be used to match the histogram of an image to a specific desired histogram, whereas histogram equalization attempts to spread the intensity values evenly over the entire dynamic range of the image; histogram equalization can in fact be considered a particular case of histogram matching. Histogram matching can also preserve the spatial structure of the image, as it does not rely on global operations, whereas histogram equalization can produce undesirable artifacts due to the global nature of the operation. However, histogram matching can also introduce artifacts, particularly in regions where the reference image has very few voxels. In such cases, the mapping function may be non-monotonic, resulting in non-linearities and distortion in the output image. To mitigate these artifacts, various modifications of the standard histogram matching algorithm have been proposed, such as adaptive histogram matching (AHM). AHM divides the image into small regions or blocks and works independently on each block. The size of the blocks and the number of bins used in the histogram can be adjusted depending on the characteristics of the image. This approach ensures that the contrast and brightness of different regions of the image are adjusted independently, while preserving the local details and avoiding the over-enhancement of noise. One advantage of AHM over other histogram-based techniques is its ability to enhance images with varying illumination conditions or local contrast variations, such as medical images. Piecewise linear histogram matching (PLHM) introduces a different approach to histogram matching.
Unlike regular histogram matching, which applies a global transformation to the entire image, PLHM performs piecewise linear transformations on the image histogram, allowing for more precise and fine-grained adjustments. PLHM first divides the image histogram into several equal intervals or bins. It then computes the CDF of both the source image and the target histogram. Next, it divides the CDF of the


F. Garcia-Castro and E. Ibor-Crespo

source image into the same number of segments as the histogram bins and fits a linear function to each segment. The slope of each linear function represents the degree of contrast enhancement or attenuation for that segment. Finally, it applies the piecewise linear transformation to the image, mapping the intensities of each voxel to the corresponding intensities of the target histogram.

Compared to regular histogram matching, PLHM offers several advantages. First, it preserves the local contrast of the image, which can be important for preserving fine details and textures. Second, it can handle non-linear mappings between the source and target histograms, which can occur when the histograms have different shapes. Third, it can be used to selectively enhance or attenuate specific regions of the image, by adjusting the slopes of the linear functions in different parts of the histogram.

However, PLHM also has some limitations. One major issue is the potential for introducing artifacts or discontinuities in the image, particularly at the boundaries between adjacent histogram segments. Another issue is the need for careful selection of the number of histogram bins and the placement of the linear functions, as poorly chosen parameters can lead to suboptimal results. PLHM has been used in recent research for harmonizing image quality in multi-center radiomic studies [24].
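A simplified variant of the piecewise idea can be sketched with numpy. This sketch places segment breakpoints at equally spaced quantiles of the two distributions rather than at equal intensity intervals as described above, so it is an illustration of the principle, not the exact algorithm; the function name is ours.

```python
import numpy as np

def piecewise_linear_match(source, reference, n_segments=8):
    """Piecewise linear intensity matching via matched quantile breakpoints.

    Each segment maps linearly between consecutive source and reference
    quantiles; the per-segment slope controls the local contrast change.
    """
    qs = np.linspace(0.0, 1.0, n_segments + 1)
    src_breaks = np.quantile(source, qs)      # segment boundaries on the source
    ref_breaks = np.quantile(reference, qs)   # corresponding reference boundaries
    # np.interp applies the piecewise linear mapping defined by the breakpoints
    mapped = np.interp(np.ravel(source), src_breaks, ref_breaks)
    return mapped.reshape(np.shape(source))
```

Increasing `n_segments` gives finer-grained control at the cost of more potential discontinuities at segment boundaries, mirroring the trade-off discussed above.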

6.3.2 AI Methods

As with non-AI methodologies, IHTs based on AI are designed to reduce variability among images that come from different sources or scanners, and AI approaches have proven particularly effective at achieving this goal. Specifically, image harmonization techniques based on deep learning methods have been found to be more computationally advanced and capable of generating much more satisfactory results than traditional techniques [25], greatly reducing the variability in the resulting images. Compared to traditional techniques, AI-based methods can effectively capture and model the underlying distribution of the data, leading to more accurate and robust results.


In this section, we will explore two techniques for image harmonization: autoencoders and GANs. Both methods are based on deep learning and have shown promising results in reducing variability among images from different sources or scanners.

Autoencoders

Autoencoders are a type of CNN that can be trained to learn a compressed representation of input images by encoding them into a low-dimensional latent space. The encoded information is then formatted to solve a particular task, which depends on the architecture design and loss function, among other factors. In medical imaging, autoencoders have been used for various tasks, such as denoising [26] or segmentation [27]. Recently, autoencoders have also been explored for image harmonization purposes [28].

The basic idea behind image harmonization using autoencoders is to train the network to learn the underlying features of a set of medical images and then use this knowledge to generate new images that have similar features but reduced variability in appearance. To achieve this, the network is first trained on a set of input medical images. The encoder part of the network extracts features from the images, which are compressed into a lower-dimensional latent space. The decoder part of the network then takes this compressed representation and reconstructs an output image that is as close as possible to the original input image. Once the autoencoder has been trained, it can be used to generate new images that are similar to the original input images but with reduced variability. To do this, a new image is first fed through the encoder to generate its latent representation. This latent representation is then fed into the decoder to generate a new image that is similar in appearance to the original input image but has been harmonized to match the features of the training set. Figure 6.4 shows a generic diagram of an autoencoder architecture.

There are some limitations that should be considered. One limitation is that the quality of the harmonized image is heavily dependent on the quality and quantity of the training data. If the


Fig. 6.4  Generic autoencoder architecture

training dataset is small or unrepresentative of the target population, the performance of the model may be limited. Additionally, autoencoders may struggle with image harmonization tasks that require complex transformations or adjustments, such as correcting for geometric distortions or registration errors. Another limitation of autoencoders for image harmonization is their sensitivity to noise and artifacts in the input images. This can be particularly problematic in medical imaging, where images may have low signal-to-noise ratios or other types of artifacts. Furthermore, autoencoders may struggle with images whose appearance varies very significantly. In such cases, GANs may be more effective for image harmonization.
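The encode-compress-decode-reconstruct loop described above can be illustrated with a deliberately tiny example. This is a toy, fully linear autoencoder trained by gradient descent on synthetic "images" (numpy only); real harmonization autoencoders are convolutional and far larger, so every name and number here is ours for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a set of images: 200 samples of 64 "voxels" that
# actually live on a 4-dimensional latent structure plus a little noise.
latent = rng.normal(size=(200, 4))
mixing = rng.normal(size=(4, 64))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 64))

# Linear autoencoder: encoder compresses 64 -> 4, decoder reconstructs 4 -> 64.
W_enc = 0.1 * rng.normal(size=(64, 4))
W_dec = 0.1 * rng.normal(size=(4, 64))

def loss():
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))

initial_loss = loss()
lr = 5e-3
for _ in range(2000):
    Z = X @ W_enc                 # encoder: latent representation
    err = Z @ W_dec - X           # decoder output minus input (reconstruction error)
    # gradient steps on the squared reconstruction loss
    gW_dec = Z.T @ err / len(X)
    gW_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * gW_dec
    W_enc -= lr * gW_enc

final_loss = loss()
```

After training, passing a new image through `W_enc` and then `W_dec` reproduces it from the shared latent space, which is the mechanism harmonization autoencoders exploit: the reconstruction discards source-specific variability that does not fit the learned representation.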

Generative Adversarial Networks (GANs)

Generative adversarial networks (GANs) have emerged as a powerful tool for image synthesis and manipulation in medical imaging [29]. They were introduced as a means of creating synthetic images in an unsupervised manner. GANs can learn complex, high-dimensional data distributions, making them suitable for many different applications. Unlike autoencoders, which learn a compressed representation of input images, GANs learn to generate new images that are similar to a target distribution. This makes GANs well suited for harmonizing images [30] from different sources or with different acquisition protocols, as they can learn to generate new images that match the target distribution.


Fig. 6.5  Generic GAN architecture

The GAN architecture consists of two neural networks: a generator and a discriminator. The generator network learns to generate new images that match the target distribution, while the discriminator network learns to distinguish between real images and generated images. The two networks are trained simultaneously in an adversarial manner, where the generator tries to fool the discriminator, and the discriminator tries to correctly classify the images as real or generated. Figure 6.5 shows a generic GAN architecture.

In the context of medical image harmonization, the generator network can be trained to generate new images that match the distribution of a target dataset, such as a dataset of images acquired with a specific scanner or protocol. This can help to reduce the variability in image appearance and make the images more comparable across different sources or protocols. One approach for using GANs for medical image harmonization is to use a CycleGAN [31] architecture, which can learn mappings between two different domains of images without the need for paired training data. In the context of medical images, this means that the CycleGAN can learn to map images from one acquisition protocol to another, without the need for images that are acquired with both protocols. Another approach is to use a conditional GAN (cGAN) [32], which can learn to generate new images conditioned on a specific input, such as an image acquired with a specific scanner or protocol. This can help to generate new images that match the target


distribution, even if the input image is from a different source or protocol.

Some limitations should be taken into account when using GANs for the development of IHTs. One of the main limitations is the lack of control over the generated images: GANs generate images by learning the distribution of the training data, but controlling the features of the generated images can be challenging, leading to unrealistic or undesired characteristics. "Hallucinations" can occur in both GANs and autoencoders, but they are more common in GANs due to their generative nature. In the context of image harmonization, "hallucinations" refer to the generation of unrealistic or implausible details in the output images. GANs are designed to generate new images by learning the distribution of the training data, but this can lead to the generation of images with unwanted features. To prevent "hallucinations" in image harmonization, it is important to use a diverse and representative dataset during training. Additionally, regularization techniques such as dropout and batch normalization can help prevent overfitting and improve generalization.

Another limitation is the difficulty of training GANs: large amounts of training data are required, and choosing the appropriate hyperparameters can be challenging. Moreover, GANs can be unstable during training and prone to mode collapse, where the generator produces a limited set of similar images, ignoring the diversity of the training data. Finally, GANs are designed to generate images that are similar to the training data and may not generalize well to new, unseen data.
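As a reference point, the adversarial game between generator $G$ and discriminator $D$ described above corresponds to the minimax objective of the original GAN formulation [29]:

$$
\min_{G} \max_{D} \; \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] \;+\; \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
$$

Here $p_{\text{data}}$ is the target image distribution (e.g., the reference scanner or protocol), $p_z$ is the generator's input distribution, $D(x)$ is the probability the discriminator assigns to an image being real, and training alternates gradient steps on $D$ (maximizing) and $G$ (minimizing). Mode collapse and training instability mentioned below are consequences of this adversarial structure.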

Applications and Other Approaches

The applications of AI methodologies for image harmonization are varied and can be very problem-specific. The effectiveness of AI-based harmonization techniques largely depends on the characteristics of the imaging data being analyzed and the specific requirements of the application. For example, the types of harmonization required for X-ray or CT scans may be very different from those required for MRI or ultrasound images. In CT, image harmonization is most commonly used to mitigate the variability induced by the acquisition protocol, and


Fig. 6.6  High-dose CT image reconstructed from low-dose CT image. Left, reconstructed high-dose CT image. Right, original low-dose CT image

more specifically for converting low-dose images into high-dose images (Fig. 6.6). This approach involves the generation of high-quality medical images from low-quality, low-dose images, which has significant implications for reducing radiation exposure to patients. To achieve this goal, GANs have been utilized to generate high-dose CT images from low-dose CT images [33]. GANs can learn and mimic the image characteristics of high-dose images based on the low-dose images and can generate realistic, high-quality images with fine details.

In MRI, the range of applications is much wider, from image harmonization for a specific sequence, such as T1w, T2w or FLAIR, to style transfer to convert contrast from a T1w sequence to a T2w sequence. However, image harmonization in MRI is mainly used to reduce the variability introduced by different acquisition protocols. This is particularly important for multi-center studies where imaging data are acquired from different sites. Several research studies have focused on the use of GANs for image harmonization, comparing the performance of GANs for T1w harmonization to traditional histogram matching techniques applied to the development of radiomic prediction models [25]. To avoid the need for paired data, style-blind autoencoders can also be used for T1w harmonization from previously unseen scanners [34].


Besides GANs and autoencoders, other deep learning architectures may be suitable for image harmonization. U-Net has been extensively used for anatomy and pathology segmentation, but it is also being researched as an effective method for image harmonization [35]. As some types of autoencoders and GANs do, U-Net requires paired data for training, with one scan acting as the ground truth and the other as the image to be adapted to that ground truth.
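In such a paired setting, training reduces to supervised regression. As a sketch (the notation is ours, not from this chapter): a U-Net-style harmonizer $f_\theta$ is fitted by minimizing a voxel-wise reconstruction loss over paired scans, for example

$$
\mathcal{L}(\theta) \;=\; \mathbb{E}_{\left(x_{\text{src}},\, x_{\text{ref}}\right)} \left\| f_\theta\!\left(x_{\text{src}}\right) - x_{\text{ref}} \right\|_{1}
$$

where $x_{\text{src}}$ is the image to be adapted and $x_{\text{ref}}$ is the corresponding ground-truth scan; the $L_1$ norm is a common choice because it tends to produce sharper outputs than $L_2$.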

6.4 Conclusions

Radiomics has emerged as a promising field for personalized medicine, where medical images are used to extract quantitative features for diagnosis and treatment planning. However, the accuracy and reproducibility of radiomic studies can be affected by variations in image acquisition protocols, leading to inconsistencies in the extracted features. Medical image harmonization techniques have been developed to address these issues by standardizing images acquired from different scanners or at different timepoints.

Conventional image processing techniques have been used for image harmonization, such as histogram matching or intensity scaling. However, these methods may not capture the complex inter-voxel relationships in medical images, leading to limited success in harmonizing images across different modalities or scanners. Recent advances in deep learning have shown great potential in medical image harmonization using GANs and autoencoders. These methods can learn the complex mappings between images from different sources. GAN-based methods can generate realistic images with high fidelity, while autoencoder-based methods can preserve the structural information of the original image. Autoencoders have been shown to be effective in harmonizing medical images with and without the need for paired data, but they require careful design and training to ensure the quality of the generated images. GANs, on the other hand, have been shown to be highly effective in generating realistic images that can be used for


medical image harmonization, but some types of GANs require paired data and are prone to instability during training.

Overall, both conventional image processing techniques and AI methods have their strengths and limitations in medical image harmonization. While conventional image processing techniques can provide some level of harmonization, AI methods based on GANs and autoencoders have shown greater success in capturing the complex relationships in medical images. The choice of approach depends on the specific application, available data, and resources.

In conclusion, medical image harmonization is a crucial step in radiomic studies to ensure the accuracy and reproducibility of extracted features. Future research in this area should focus on developing more robust and efficient AI methods for medical image harmonization, as well as validating their clinical impact on diagnosis and treatment planning.

References

1. Smith TB, Zhang S, Erkanli A, Frush D, Samei E (2021) Variability in image quality and radiation dose within and across 97 medical facilities. J Med Imaging (Bellingham) 8(5):052105. https://doi.org/10.1117/1.JMI.8.5.052105
2. Smith NB, Webb A (2010) Introduction to medical imaging: physics, engineering and clinical applications. Cambridge University Press
3. Yan W, Huang L, Xia L, Gu S, Yan F, Wang Y, Tao Q (2020) MRI manufacturer shift and adaptation: increasing the generalizability of deep learning segmentation for MR images acquired with different scanners. Radiol Artif Intell 2(4):e190195. https://doi.org/10.1148/ryai.2020190195
4. Shukla-Dave A, Obuchowski NA, Chenevert TL, Jambawalikar S, Schwartz LH, Malyarenko D, Huang W, Noworolski SM, Young RJ, Shiroishi MS, Kim H, Coolens C, Laue H, Chung C, Rosen M, Boss M, Jackson EF (2019) Quantitative imaging biomarkers alliance (QIBA) recommendations for improved precision of DWI and DCE-MRI derived biomarkers in multicenter oncology trials. J Magn Reson Imaging 49(7):e101–e121. https://doi.org/10.1002/jmri.26518
5. Schellinger PD, Jansen O, Fiebach JB, Hacke W, Sartor K (1999) A standardized MRI stroke protocol: comparison with CT in hyperacute intracerebral hemorrhage. Stroke 30(4):765–768. https://doi.org/10.1161/01.str.30.4.765
6. Purysko AS, Baroni RH, Giganti F, Costa D, Renard-Penna R, Kim CK, Raman SS (2021) PI-RADS version 2.1: a critical review, from the AJR special series on radiology reporting and data systems. AJR Am J Roentgenol 216(1):20–32. https://doi.org/10.2214/AJR.20.24495
7. Sheikh-Sarraf M, Nougaret S, Forstner R, Kubik-Huch RA (2020) Patient preparation and image quality in female pelvic MRI: recommendations revisited. Eur Radiol 30(10):5374–5383. https://doi.org/10.1007/s00330-020-06869-8
8. Bashyam VM, Doshi J, Erus G, Srinivasan D, Abdulkadir A, Habes M, Fan Y, Masters CL, Maruff P, Zhuo C, Völzke H, Johnson SC, Fripp J, Koutsouleris N, Satterthwaite TD, Wolf DH, Gur RE, Gur RC, Morris JC, Albert MS, Grabe HJ, Resnick SM, Bryan RN, Wolk DA, Shou H, Nasrallah IM, Davatzikos C (2020) Medical image harmonization using deep learning based canonical mapping: toward robust and generalizable learning in imaging. ArXiv abs/2010.05355
9. Isaksson LJ, Raimondi S, Botta F, Pepa M, Gugliandolo SG, De Angelis SP, Marvaso G, Petralia G, De Cobelli O, Gandini S, Cremonesi M, Cattani F, Summers P, Jereczek-Fossa BA (2020) Effects of MRI image normalization techniques in prostate cancer radiomics. Phys Med 71:7–13. https://doi.org/10.1016/j.ejmp.2020.02.007
10. Petersen J, Wille MM, Rakêt LL, Feragen A, Pedersen JH, Nielsen M, Dirksen A, de Bruijne M (2014) Effect of inspiration on airway dimensions measured in maximal inspiration CT images of subjects without airflow limitation. Eur Radiol 24(9):2319–2325. https://doi.org/10.1007/s00330-014-3261-3
11. Plodeck V, Radosa CG, Hübner HM, Baldus C, Borkowetz A, Thomas C, Kühn JP, Laniado M, Hoffmann RT, Platzek I (2020) Rectal gas-induced susceptibility artefacts on prostate diffusion-weighted MRI with epi read-out at 3.0 T: does a preparatory micro-enema improve image quality? Abdom Radiol (NY) 45(12):4244–4251. https://doi.org/10.1007/s00261-020-02600-9
12. Boellaard R, Delgado-Bolton R, Oyen WJ, Giammarile F, Tatsch K, Eschner W, Verzijlbergen FJ, Barrington SF, Pike LC, Weber WA, Stroobants S, Delbeke D, Donohoe KJ, Holbrook S, Graham MM, Testanera G, Hoekstra OS, Zijlstra J, Visser E, Hoekstra CJ, Pruim J, Willemsen A, Arends B, Kotzerke J, Bockisch A, Beyer T, Chiti A, Krause BJ, European Association of Nuclear Medicine (EANM) (2015) FDG PET/CT: EANM procedure guidelines for tumour imaging: version 2.0. Eur J Nucl Med Mol Imaging 42(2):328–354. https://doi.org/10.1007/s00259-014-2961-x
13. Shao M, Zuo L, Carass A, Zhuo J, Gullapalli RP, Prince JL (2022) Evaluating the impact of MR image harmonization on thalamus deep network segmentation. Proc SPIE Int Soc Opt Eng 12032:120320H. https://doi.org/10.1117/12.2613159
14. Krishnamoorthi R, Ramarajan N, Wang NE, Newman B, Rubesova E, Mueller CM, Barth RA (2011) Effectiveness of a staged US and CT protocol for the diagnosis of pediatric appendicitis: reducing radiation exposure in the age of ALARA. Radiology 259(1):231–239. https://doi.org/10.1148/radiol.10100984
15. Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K, Cui C, Corrado G, Thrun S, Dean J (2019) A guide to deep learning in healthcare. Nat Med 25(1):24–29. https://doi.org/10.1038/s41591-018-0316-z
16. Kuhnert G, Boellaard R, Sterzer S, Kahraman D, Scheffler M, Wolf J, Dietlein M, Drzezga A, Kobe C (2016) Impact of PET/CT image reconstruction methods and liver uptake normalization strategies on quantitative image analysis. Eur J Nucl Med Mol Imaging 43(2):249–258. https://doi.org/10.1007/s00259-015-3165-8
17. Shinohara RT, Sweeney EM, Goldsmith J, Shiee N, Mateen FJ, Calabresi PA, Jarso S, Pham DL, Reich DS, Crainiceanu CM, Australian Imaging Biomarkers Lifestyle Flagship Study of Ageing, Alzheimer's Disease Neuroimaging Initiative (2014) Statistical normalization techniques for magnetic resonance imaging. Neuroimage Clin 6:9–19. https://doi.org/10.1016/j.nicl.2014.08.008
18. Wahid KA, He R, McDonald BA, Anderson BM, Salzillo T, Mulder S, Wang J, Sharafi CS, McCoy LA, Naser MA, Ahmed S, Sanders KL, Mohamed ASR, Ding Y, Wang J, Hutcheson K, Lai SY, Fuller CD, van Dijk LV (2021) Intensity standardization methods in magnetic resonance imaging of head and neck cancer. Phys Imaging Radiat Oncol 20:88–93. https://doi.org/10.1016/j.phro.2021.11.001
19. Carré A, Klausner G, Edjlali M, Lerousseau M, Briend-Diop J, Sun R, Ammari S, Reuzé S, Alvarez Andres E, Estienne T, Niyoteka S, Battistella E, Vakalopoulou M, Dhermain F, Paragios N, Deutsch E, Oppenheim C, Pallud J, Robert C (2020) Standardization of brain MR images across machines and protocols: bridging the gap for MRI-based radiomics. Sci Rep 10(1):12340. https://doi.org/10.1038/s41598-020-69298-z
20. LeCun YA, Bottou L, Orr GB, Müller KR (2012) Efficient BackProp. In: Montavon G, Orr GB, Müller KR (eds) Neural networks: tricks of the trade. Lecture notes in computer science, vol 7700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35289-8_3
21. Pizer SM, Eberly DH, Fritsch DS, Yushkevich PA (1987) Adaptive histogram equalization and its variations. Comput Vis Graph Image Process 39(3):355–368. https://doi.org/10.1016/s0734-189x(87)80186-x
22. Alghamedy FH, Shafiq M, Liu L, Yasin A, Khan RA, Mohammed HS (2022) Machine learning-based multimodel computing for medical imaging for classification and detection of Alzheimer disease. Comput Intell Neurosci 2022:9211477. https://doi.org/10.1155/2022/9211477
23. Wang L, Lai HM, Barker GJ, Miller DH, Tofts PS (1998) Correction for variations in MRI scanner sensitivity in brain studies with histogram matching. Magn Reson Med 39(2):322–327. https://doi.org/10.1002/mrm.1910390222
24. Campello VM, Martín-Isla C, Izquierdo C, Guala A, Palomares JFR, Viladés D, Descalzo ML, Karakas M, Çavuş E, Raisi-Estabragh Z, Petersen SE, Escalera S, Seguí S, Lekadir K (2022) Minimising multi-centre radiomics variability through image normalisation: a pilot study. Sci Rep 12(1):12532. https://doi.org/10.1038/s41598-022-16375-0
25. Tixier F, Jaouen V, Hognon C, Gallinato O, Colin T, Visvikis D (2021) Evaluation of conventional and deep learning based image harmonization methods in radiomics studies. Phys Med Biol 66(24). https://doi.org/10.1088/1361-6560/ac39e5
26. Nishio M, Nagashima C, Hirabayashi S, Ohnishi A, Sasaki K, Sagawa T, Hamada M, Yamashita T (2017) Convolutional auto-encoder for image denoising of ultra-low-dose CT. Heliyon 3(8):e00393. https://doi.org/10.1016/j.heliyon.2017.e00393
27. Baur C, Denner S, Wiestler B, Navab N, Albarqouni S (2021) Autoencoders for unsupervised anomaly segmentation in brain MR images: a comparative study. Med Image Anal 69:101952. https://doi.org/10.1016/j.media.2020.101952
28. An L, Chen J, Chen P, Zhang C, He T, Chen C, Zhou JH, Yeo BTT; Alzheimer's Disease Neuroimaging Initiative; Australian Imaging Biomarkers and Lifestyle Study of Aging (2022) Goal-specific brain MRI harmonization. Neuroimage 263:119570. https://doi.org/10.1016/j.neuroimage.2022.119570
29. Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Adv Neural Inf Process Syst 27:2672–2680
30. Bashyam VM, Doshi J, Erus G, Srinivasan D, Abdulkadir A, Singh A, Habes M, Fan Y, Masters CL, Maruff P, Zhuo C, Völzke H, Johnson SC, Fripp J, Koutsouleris N, Satterthwaite TD, Wolf DH, Gur RE, Gur RC, Morris JC, Albert MS, Grabe HJ, Resnick SM, Bryan NR, Wittfeld K, Bülow R, Wolk DA, Shou H, Nasrallah IM, Davatzikos C, iSTAGING and PHENOM Consortia (2022) Deep generative medical image harmonization for improving cross-site generalization in deep learning predictors. J Magn Reson Imaging 55(3):908–916. https://doi.org/10.1002/jmri.27908
31. Zhu JY, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: IEEE international conference on computer vision (ICCV), pp 2242–2251
32. Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784
33. Chen J, Wee L, Dekker A, Bermejo I (2022) Improving reproducibility and performance of radiomics in low-dose CT using cycle GANs. J Appl Clin Med Phys 23(10):e13739. https://doi.org/10.1002/acm2.13739
34. Fatania K, Clark A, Frood R, Scarsbrook A, Al-Qaisieh B, Currie S, Nix M (2022) Harmonisation of scanner-dependent contrast variations in magnetic resonance imaging for radiation oncology, using style-blind auto-encoders. Phys Imaging Radiat Oncol 22:115–122. https://doi.org/10.1016/j.phro.2022.05.005
35. Dewey BE, Zhao C, Reinhold JC, Carass A, Fitzgerald KC, Sotirchos ES, Saidha S, Oh J, Pham DL, Calabresi PA, van Zijl PCM, Prince JL (2019) DeepHarmony: a deep learning approach to contrast harmonization across scanner changes. Magn Reson Imaging 64:160–170. https://doi.org/10.1016/j.mri.2019.05.041

7
Harmonization in the Features Domain

J. Lozano-Montoya and A. Jimenez-Pastor

7.1 Introduction

Harmonization of radiomic features is the process of standardizing the extraction and quantification of imaging features from medical images by establishing guidelines and protocols for each step of the radiomics workflow, ensuring that the extracted features are consistent and comparable across imaging platforms, institutions, and studies. Harmonization methods in the feature domain are applied either during or after feature extraction to ensure consistency among the extracted radiomic features once the image has been processed. Within the feature domain, there are two main approaches. On the one hand, some methods seek to identify radiomic variables that remain stable across image types, changes in acquisition parameters, or the center effect. On the other hand, there are methods based on normalization techniques, which use statistical or deep learning approaches built on variable standardization or scaling.

J. Lozano-Montoya (*) · A. Jimenez-Pastor
Department of AI Research, Quibim, Valencia, Spain
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
Á. Alberich-Bayarri, F. Bellvís-Bataller (eds.), Basics of Image Processing, Imaging Informatics for Healthcare Professionals, https://doi.org/10.1007/978-3-031-48446-9_7
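The simplest instance of the normalization branch mentioned above is per-center standardization of each feature. The sketch below (numpy; function name and data layout are our own illustration, not a method prescribed by this chapter) z-scores every radiomic feature within each acquisition center so that feature distributions become comparable across centers.

```python
import numpy as np

def zscore_per_center(features, centers):
    """Standardize each radiomic feature within each acquisition center.

    features: (n_samples, n_features) array of radiomic feature values.
    centers:  (n_samples,) array of center labels for each sample.
    """
    out = np.asarray(features, dtype=float).copy()
    for c in np.unique(centers):
        idx = centers == c
        mu = out[idx].mean(axis=0)
        sd = out[idx].std(axis=0)
        # guard against constant features (zero standard deviation)
        out[idx] = (out[idx] - mu) / np.where(sd == 0, 1.0, sd)
    return out
```

After this transformation, each feature has zero mean and unit variance within every center, removing additive and multiplicative center offsets while preserving within-center ordering of patients.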



7.2 Reproducibility of Radiomic Features

The selection of reproducible features is a task performed in radiomic studies that seeks to ensure robustness to imaging variability by identifying a subset of features that are insensitive to variations in imaging protocols and acquisition settings. One of the challenges in radiomics is that the reproducibility of radiomic features is often not generalizable to different sites, modalities, or scanners, due in part to the retrospective nature of many radiomic studies and the lack of standardization and variability in imaging protocols [1]. Radiomic feature values are also influenced by patient variability, such as weight and height or patient movement, which can affect the levels of noise and the presence of artifacts in the image [2]. Moreover, when assessing the reproducibility of radiomics, it is important to be aware that cut-offs for correlation coefficients are often chosen arbitrarily, and the number of "robust" features depends on the number of subjects involved. It is also important to note that information from studies assessing the impact of imaging settings on radiomics is often not directly helpful to future studies, as the reproducibility of radiomic features is not necessarily generalizable to different disease sites, modalities, or scanners [3]. To overcome these challenges, several methods and considerations can be applied at different reproducibility stages to ensure reliability (see Fig. 7.1). These methods aim to minimize the impact of the variability of imaging protocols and patient characteristics on the extracted features.

Fig. 7.1  Reproducibility in radiomic features can be affected by different aspects: imaging data, region of interest (ROI) segmentation, post-processing and feature extraction algorithms, and research reproducibility


7.2.1 Imaging Data Reproducibility

Imaging data reproducibility refers to the ability to consistently obtain similar results from imaging studies. It encompasses two aspects: repeatability and reproducibility. Repeatability refers to the consistency of results obtained from repeated measurements under identical or near-identical conditions, using the same equipment and procedures. Reproducibility, on the other hand, refers to the consistency of results obtained from measurements taken in different settings, using different equipment or operators [4]. The methodology for conducting these analyses differs depending on the type of study [5].

Image Acquisition and Reconstruction Parameters

Image acquisition and reconstruction parameters play an important role in the extraction of reproducible and robust radiomic features. These parameters include the imaging modality (e.g., computed tomography [CT], magnetic resonance imaging [MRI]), the imaging protocol (e.g., slice thickness, field of view), and the reconstruction algorithm (e.g., filtered back projection, iterative reconstruction) [6]. Ideally, the same imaging parameters should be used across different scans of the same patient, or different patients with similar characteristics. This ensures that the features extracted from the images are directly comparable and that any changes in the features can be attributed to changes in the underlying disease rather than to variations in the imaging. However, when dealing with retrospective multi-centric real-world data (RWD), it becomes difficult to ensure the same acquisition protocol across scanners. Furthermore, images should be acquired and reconstructed in a way that minimizes noise and artifacts. This can be achieved through the use of high-quality imaging equipment and standard imaging protocols, as well as advanced reconstruction algorithms that can reduce noise and improve the signal-to-noise ratio of the images.

Several studies have investigated the repeatability and reproducibility of radiomic features using CT, MRI, and positron


emission tomography (PET), with strategies such as test-retest or phantom studies (a phantom is a specialized object used in medical imaging for quality control, equipment calibration, dosimetry, and education) to ensure that radiomic features are robust and reproducible. Although such studies reduce patient exposure, it is important to note that phantom studies do not fully replicate the complexity and heterogeneity of human tissues and are commonly used for equipment calibration.

CT Scans

One of the most studied factors influencing reconstruction in CT scans is the voxel size, although some studies also investigate the impact of image discretization on radiomic features [1]. When different acquisition modes and image reconstructions were applied to CT, most features were found to be redundant, with only 30% of them being reproducible across test-retest, with a concordance correlation coefficient (CCC) of at least 0.90 [7]. When a phantom with 177 features was used and the pitch factor and reconstruction kernel were modified, between 76 and 151 of the features were found to be reproducible [8]. This highlights the importance of carefully considering factors such as voxel size, image discretization, and the reconstruction kernel when analyzing CT-based datasets.

PET Scans

Many studies have been conducted to assess the reproducibility of radiomic features in PET scans, but most of them only examine the impact of variability in scanner and imaging parameters and do not provide specific methods for achieving reproducible features. The full-width half maximum (FWHM) of the Gaussian filter is the most frequently investigated reconstruction factor in this context [1]. An important study evaluated the impact of various image reconstruction settings in PET/CT scans using data from a phantom and a patient dataset from two different scanners [9]. The study grouped the radiomic features into intensity-based, geometry-based, and texture-based features, and their

7  Harmonization in the Features Domain


reproducibility and variability were measured using the coefficient of variation (COV). The results from both the phantom and patient studies showed that 47% of all radiomic features were reproducible. Another study [10] investigated whether radiomic models developed using PET/CT images could be transferred to PET/MRI images by assessing the reproducibility of radiomic features under test-retest and attenuation correction variability. The results of this study showed that intensity-based and geometry-based features were also reproducible.

MRI Sequences
The impact of test-retest, acquisition, and reconstruction settings in MRI has been explored less extensively than for PET and CT. A recent study investigated the robustness of radiomic features across different MRI scanning protocols and scanners using a phantom [11]. The robustness varied by feature: most intensity-based and gray-level co-occurrence matrix (GLCM) features showed intermediate or small variation, while most neighborhood gray-tone difference (NGTD) features showed high variation. Among the GLCM features, variance, cluster shade, cluster tendency, and cluster prominence had poor robustness. However, these features had high reproducibility when the scanning parameters were kept the same, making them useful for intra-scanner studies. Nevertheless, the study had limitations, including the effect of subject movement and uncertainty in lesion segmentation.

Intra-individual Test-Retest Repeatability
Intra-individual test-retest repeatability studies involve measuring the same individual multiple times. Radiomic features in test-retest repeatability assessments may be influenced by a variety of factors, including variations in patient positioning, respiration phase, contrast enhancement, and acquisition and processing parameters [5]. Studies have shown that respiration can have a significant impact on the reproducibility of radiomic features in CT images of lung cancer patients during test-retest assessments [12]. An MRI study performed a test-retest analysis for three


acquisitions, showing that only 37% of radiomic features were reproducible with a CCC > 0.8 [13].
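The CCC values reported in these test-retest studies follow Lin's concordance correlation coefficient, which penalizes both poor correlation and systematic shifts between the two measurement series. A minimal sketch of such a robustness filter (the feature values below are illustrative, not taken from the cited studies):

```python
import numpy as np

def ccc(x, y):
    """Lin's concordance correlation coefficient between two measurement series."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()           # population variances
    cov = ((x - mx) * (y - my)).mean()  # population covariance
    # CCC = 2*cov / (vx + vy + (mean difference)^2); equals 1 only for perfect agreement
    return 2 * cov / (vx + vy + (mx - my) ** 2)

# Test-retest values of one radiomic feature for five hypothetical patients
test   = [1.0, 2.0, 3.0, 4.0, 5.0]
retest = [1.1, 1.9, 3.2, 3.8, 5.1]
print(ccc(test, retest) > 0.9)  # whether the feature passes a CCC > 0.9 filter
```

Unlike the Pearson correlation, the denominator term `(mx - my) ** 2` makes the CCC drop when one acquisition is systematically shifted, even if the two series are perfectly correlated.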

Multi-scanner Reproducibility
Multi-scanner reproducibility studies involve acquiring the same image on different scanners. A recent study on the reproducibility of radiomic features across several MRI scanners and acquisition protocol parameters, using both phantom and patient data with a test-retest strategy, revealed very little difference in variability between the filtering and normalization approaches used for preprocessing [11]. Moreover, the intra-class correlation coefficient (ICC) measurements showed higher reproducibility for the phantom data than for the patient data; however, the study was unable to mitigate the impact of patient movement, despite simulating movement during scanning. A similar study extracted stable MRI radiomic features with a minimum CCC of 0.85 between data derived from 61 patients' test and retest apparent diffusion coefficient (ADC) maps across various MRI systems, tissues, and vendors [14].

7.2.2 Segmentation Reproducibility

Segmentation is a challenging and contentious aspect, particularly in oncology research, where tumors often have complex borders and there is a high degree of inter-reader variability in manual tumor contouring. While normal structures can now be segmented with full automation, diseases such as cancer require operator input due to inter- and intra-reader morphologic and contrast heterogeneity at the initial examination. There is an ongoing debate over the optimal methods for segmentation, including the use of manual or automatic techniques and the pursuit of ground truth versus reproducibility. One solution to this problem is the use of semiautomatic segmentation, which is more reproducible than manual segmentation [15]. However, even with semiautomatic segmentation, reproducibility is not ideal, and researchers continue exploring automatic segmentation methods. One study of MRI brain exams that were segmented with


four different methods found that deep learning-based approaches had higher accuracy in predictive models, but also noted that subtle differences in the segmentation methods can affect the radiomic features obtained [16]. The reproducibility of segmentation can also vary depending on the type of tumor being studied. For example, a study of CT scans from patients with head and neck cancer, pleural mesothelioma, and non-small cell lung cancer found that the ROIs and radiomic features were most reproducible in lung cancer [17]. Despite these challenges, a consensus is emerging that the optimal approach to segmentation is computer-aided contouring followed by manual curation [18].
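Inter-reader segmentation variability of the kind discussed above is commonly quantified with overlap metrics such as the Dice similarity coefficient. A minimal sketch on two synthetic 2D masks (the masks are illustrative, not from the cited studies):

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary segmentation masks."""
    a, b = np.asarray(mask_a, bool), np.asarray(mask_b, bool)
    inter = np.logical_and(a, b).sum()
    # 2 * |A ∩ B| / (|A| + |B|): 1.0 for identical masks, 0.0 for disjoint ones
    return 2.0 * inter / (a.sum() + b.sum())

# Two hypothetical readers contouring the same lesion on one slice
reader1 = np.zeros((10, 10), int); reader1[2:7, 2:7] = 1  # 25 voxels
reader2 = np.zeros((10, 10), int); reader2[3:8, 3:8] = 1  # 25 voxels, shifted by one
print(round(dice(reader1, reader2), 2))
```

Even a one-voxel shift between otherwise identical contours visibly lowers the Dice score, which is why radiomic features sensitive to the ROI border tend to be the least reproducible.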

7.2.3 Post-processing and Feature Extraction

The process of feature extraction is complex and can be influenced by several factors, such as outlier control, the intensity ranges used, and the number of bins used to discretize an image (e.g., for GLCM matrix calculation). An MRI study tested 33 different combinations of variations using different voxel sizes, four gray-level discretizations, and three quantization methods [19], but did not find a strong CCC across the combinations. Furthermore, to ensure the reproducibility and generalizability of results across studies, it is important to use consistent methods for discretization and quantification, as the IBSI (Image Biomarker Standardization Initiative) manual proposes [20]. Moreover, the use of different software packages for radiomic extraction can lead to increased variability in feature values and can negatively impact the reliability and prognostication ability of radiomic models. Thus, one way to improve the reproducibility and generalizability of results is to use a standardized, open-source platform for feature extraction that follows IBSI guidelines to obtain comparable features. PyRadiomics [21] is commonly used in this field and is publicly available, with source code, documentation, and examples. Recently, a study investigated the reproducibility of radiomic features with two widely used radiomic software packages (IBEX


and MaZda) in comparison to an IBSI-compliant package (PyRadiomics) [22]. The non-compliant packages yielded significantly fewer reproducible features compared with the IBSI-compliant one; however, both options had similar predictive power in a model of response to radiotherapy for head and neck cancer.
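The discretization choices mentioned above can be made concrete: the IBSI distinguishes between discretization with a fixed number of bins and with a fixed bin width, and the two schemes assign different gray levels to the same intensities. A minimal sketch (the intensity values are illustrative):

```python
import numpy as np

def discretize_fixed_bin_count(img, n_bins):
    """Rescale ROI intensities to n_bins gray levels (1..n_bins)."""
    lo, hi = img.min(), img.max()
    levels = np.floor(n_bins * (img - lo) / (hi - lo)).astype(int) + 1
    # the maximum intensity would land in bin n_bins + 1, so clip it into the top bin
    return np.clip(levels, 1, n_bins)

def discretize_fixed_bin_width(img, width):
    """Assign gray levels of a fixed intensity width (IBSI 'fixed bin size')."""
    return np.floor(img / width).astype(int) - int(np.floor(img.min() / width)) + 1

roi = np.array([12.0, 30.5, 55.2, 80.9, 99.0])  # hypothetical ROI intensities
print(discretize_fixed_bin_count(roi, 4))   # gray levels depend on the intensity range
print(discretize_fixed_bin_width(roi, 25))  # gray levels depend on the chosen width
```

Because the fixed-bin-count scheme depends on the minimum and maximum intensity in each ROI, the same texture matrix can change between acquisitions even when the underlying tissue does not, which is one reason the discretization method must be reported.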

7.2.4 Reporting Reproducibility

Open-source data plays an important role in the improvement and reproducibility of radiomics. The availability of open datasets, like the RIDER dataset, and of public phantoms can help in understanding the effects of different factors on radiomics and in further assessing the influence of acquisition settings. Ensuring consistency and transparency in radiomic studies is crucial, and detailed reporting of preprocessing steps is necessary to enhance reproducibility and repeatability. Fortunately, recent developments in the field aim to improve the quality of radiomic studies by providing guidelines that facilitate their execution. Initiatives such as the IBSI [20], the RQS (radiomics quality score) [23], or TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) [24] are recommended to improve the final quality and reproducibility of studies. Furthermore, to increase the potential for clinically relevant and valuable radiomic studies, some authors [1] recommend assessing whether the following questions can be answered affirmatively before starting a new study:
– Is there an actual clinical need which could potentially be answered with the help of radiomics?
– Is there enough expertise in the research team to ensure high quality of the study and potential for clinical implementation?
– Is there access to enough data to support the conclusions with sufficient power, including external validation datasets?
– Is it possible to retrieve all other non-imaging data that is known to be relevant for the research question?


– Is information on the acquisition and reconstruction of the images available?
– Are the imaging protocols standardized and, if not, is there a solution to harmonize images or to ensure minimal influence of varying settings on the modeling?

7.3 Normalization Techniques

Normalization techniques are statistical approaches used to account for variations in image intensity, brightness, and contrast. They standardize features so that they are comparable across different imaging modalities, scanners, and centers, improving the reliability and comparability of radiomic features and making it possible to use them in clinical practice. Data normalization methods are crucial for radiomic features, as these often differ in scale, range, and statistical distribution. Without normalization, features may exhibit high levels of skewness, which can artificially result in lower p-values in statistical analysis [25]. Neglecting feature normalization, or using inappropriate methods, can also lead to individual features being over- or underrepresented and introduce bias into the developed models. Normalization techniques can be divided into several subtypes: statistical normalization, ComBat methods, and normalization with deep learning. It is important to note that while image preprocessing normalization steps are important to reduce technical variability across images, additional feature normalization steps are still necessary and should not be overlooked [26].

7.3.1 Statistical Normalization

Normalization techniques are used to correct biases and differences in radiomic features that may be caused by variations in imaging devices, acquisition protocols, or reconstruction parameters, and several studies have specifically evaluated the benefit of normalization for this purpose. Various methods of statistical normalization can be applied; some of the most commonly used include the following:
• Z-score normalization, which scales feature values to have a mean of 0 and a standard deviation of 1. A variation of this method is the robust Z-score, which uses the median and the absolute deviation from the median instead of the mean and standard deviation, to account for outliers.
• Min–max normalization, which performs a linear transformation to scale feature values to a common range of 0–1. This preserves relationships among the original data but can suppress the effect of outliers due to the bounded range.
• Square root or log transformations, which can be used to decrease the skewness of distributions. However, these transformations can only be used for positive values and can sometimes make the distribution more skewed than the raw data.
• Upper quartile normalization, which divides each read count by the 75th percentile of read counts in the sample [27].
• Quantile normalization, which transforms the original data to remove undesirable technical variation by forcing the observed distributions to be the same. This method can work well in practice but can wipe out important information and artificially induce features that are not statistically different across samples [28].
• Whitening normalization using principal component analysis (PCA), which is based on a linear transformation that converts a vector of random variables with a known covariance matrix into a set of new variables whose covariance is the identity matrix [29]. This technique can normalize the features more substantially but can also exaggerate noise in the data [30].
It is important to keep in mind that different normalization methods have their own advantages and disadvantages, and it may be necessary to experiment with multiple methods to find the best approach for a given set of radiomic features.
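Three of the methods listed above (z-score, robust z-score, and min–max) can be sketched in a few lines; the feature vector below, with one deliberate outlier, is illustrative:

```python
import numpy as np

def z_score(x):
    """Scale to mean 0 and standard deviation 1."""
    return (x - x.mean()) / x.std()

def robust_z_score(x):
    """Center on the median and scale by the median absolute deviation (MAD)."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return (x - med) / mad

def min_max(x):
    """Linear rescaling to the range [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

# One radiomic feature measured on six hypothetical patients, with an outlier (40.0)
feature = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 40.0])
print(np.round(z_score(feature), 2))
print(np.round(robust_z_score(feature), 2))
print(np.round(min_max(feature), 2))
```

Comparing the outputs makes the trade-offs tangible: the outlier inflates the standard deviation used by the plain z-score and compresses the min–max range, while the median-based robust z-score leaves the spread of the typical values largely intact.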


A recent study compared different normalization methods to standardize radiomic features extracted from CT images of non-small cell lung cancer (NSCLC) patients. Z-score normalization yielded the best-performing radiomic prediction model, with an AUC of 0.789, compared to min–max normalization (AUC 0.725) and PCA (AUC 0.785) [30]. Another study evaluated the effect of several normalization techniques, including no normalization, z-score, robust z-score, log-transformation, upper quartile, and quantile normalization, on predicting several clinical phenotypes using a machine learning pipeline [26]. The correlation analysis showed that radiomic features were perfectly correlated with the non-normalized radiomic features when using scaling, z-score, robust z-score, and upper quartile normalization; these methods help reduce bias without altering the information. In contrast, log-transformation, quantile, and whitening methods showed poor correlation with the non-normalized radiomic features.

7.3.2 ComBat

ComBat is a statistical method originally developed to harmonize gene expression arrays and correct "batch effects" in genomic studies [31], but it can also be applied to radiomics to remove discrepancies introduced by technical differences in the images. It is a data-driven post-processing technique that employs empirical Bayes methods to estimate the differences in feature values due to a batch effect. Moreover, it can provide satisfactory results even for small datasets, depending on the representativeness of the samples available for each site [32]. ComBat standardizes radiomic features by centering them to the overall mean of all samples. This process shifts the data to a new location that is different from the original centers; consequently, features lose their physical meaning [33]. In cases of high heterogeneity, where the number of variables to use with ComBat would be too high for the number of patients, unsupervised clustering can be used to identify potential labels for harmonization. One of the main limitations of the ComBat method is


its inability to harmonize new data that comes from a different source than the data used during the feature transformation phase. This means that if new data is added to the analysis, ComBat needs to be reapplied to the entire dataset to ensure reliability. A detailed manual was recently published to explain the correct application of ComBat in multi-site research. The guide illustrates and clarifies under what conditions ComBat can be utilized to standardize image-based biomarkers [34]:
1. The distributions of the features to be realigned must be similar except for shift (additive factor) and spread (multiplicative factor) effects.
2. Any covariates that might explain different distributions at the two sites must be identified and considered.
3. The different sets of feature values to be realigned must be independent.

An extensive comparison of ComBat with the previous normalization techniques in the context of radiomics has not yet been carried out, although comparisons between ComBat and similar batch effect correction techniques in other fields indicated the superiority of ComBat [35]. Several modifications of ComBat have been proposed to improve its performance and address its limitations, although some variations still require validation for certain image modalities [36]:
• M-ComBat addresses the problem of features losing their physical meaning after harmonization by allowing the selection of a reference center to which the other centers are aligned, with no loss of performance.
• B-ComBat and BM-ComBat are modifications that use bootstrapping to improve the accuracy of the estimates, as they account for the uncertainty in the data. The initial estimates are resampled a specified number of times (B) with replacement, the resamples are fitted to obtain new estimates of the coefficients, and the final estimates are calculated with the Monte Carlo method by taking the mean of all of them. A study reported an increase in the radiomic models' performance using these variations compared with standard ComBat and M-ComBat [33].
• Transfer learning ComBat addresses ComBat's inability to harmonize new unseen data by coupling the method with a transfer learning technique, allowing previously learned harmonization transforms to be used on new data [37].
• Nested ComBat provides a sequential radiomic feature harmonization workflow to compensate for multicenter heterogeneity caused by multiple batch effects. NestedD was introduced to handle bimodal feature distributions instead of Gaussian distributions and is recommended for high-dimensional datasets [38].
• GMM ComBat is a modification that uses the Gaussian mixture model split method to handle bimodality arising from unknown factors, providing an alternative to ComBat when not all imaging factors are known [38].
• Longitudinal ComBat extends ComBat to the longitudinal domain by eliminating additive and multiplicative scanner effects [39].

Figure 7.2 shows an example of the application of ComBat on two radiomic variables (median gray-level intensity and entropy) extracted from non-small cell lung cancer (NSCLC) lesions to correct for the manufacturer effect (GE, Canon, Siemens, and Philips). In this case, the GE manufacturer was used as the reference to standardize the other distributions. A study used ComBat to transform MRI-based radiomic features from T1 phantom images to T1-weighted brain tumors [40]. The study found that ComBat eliminated the scanner effect and increased the number of statistically significant features that could be used to differentiate between low and intermediate/high-risk scores. Additionally, a radiomic model based on linear discriminant analysis achieved a higher Youden index when ComBat was used (0.43) than when it was not (0.12). Another study evaluated its use for harmonizing radiomic features in a combined PET and MRI radiomic model


Fig. 7.2  Application of ComBat for two radiomic features extracted from NSCLC lesions to correct for manufacturer’s batch effect: median gray-level intensity (top) and entropy values (bottom). On the left, original density distributions and, on the right, density distributions after harmonization with ComBat

using ADC parametric maps to predict recurrence in advanced cervical cancer patients [41]. In this case, ComBat improved the accuracy when the model was validated externally on two different cohorts: 82–85% accuracy without harmonization versus 90% after ComBat. Most radiomic features were significantly affected by differences in acquisition and reconstruction parameters in CT scans. In one study [42], the authors reported an improvement in the performance of the developed radiomic signatures with ComBat harmonization, after which all the features could be used.


Nevertheless, harmonizing the distribution without paying attention to individual values and ranks is not expected to benefit the generalizability of radiomic signatures. The effective use of ComBat ideally requires evaluating the consistency of radiomic features after applying ComBat on samples that lack biological variability, such as phantoms. Radiomic features extracted from patients' scans obtained with the same imaging settings can then be transformed using the location/scale parameters determined by applying ComBat on the phantom data. A framework that guides the use of ComBat in radiomic analyses has been published; the same procedure could also be applied to other feature harmonization methods [43]. The workflow starts by collecting imaging datasets and extracting the image acquisition and reconstruction parameters. A phantom is then scanned with the different acquisition and reconstruction parameters used for acquiring the scans in the patient imaging dataset. Radiomic features are extracted from the phantom scans and their reproducibility is assessed using the CCC; features with a CCC > 0.9 are considered reproducible and used for modeling. Finally, to assess the performance of the feature harmonization method, it is applied to the phantom scans and the robust features are obtained again (CCC > 0.9). The combination of the identified stable and harmonizable features should be used for further analysis. One study applied this framework with ComBat to 13 scans of a phantom using different imaging protocols and vendors [44]. By investigating the reproducibility of radiomic features in a pairwise manner, the study found a wide range of reproducible features, between 9 and 78. Harmonization did not have a uniform impact on radiomic features, and the number of features that could be used after harmonization varied widely.
The impact of ComBat harmonization should therefore be analyzed carefully for each dataset. In summary, ComBat is a promising and straightforward method for standardizing radiomic features, as long as there is a sufficient number of labels and the sources of variation are identified.
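To illustrate the location/scale idea behind ComBat, the sketch below realigns each batch's per-feature mean and standard deviation to the pooled values. It deliberately omits ComBat's empirical Bayes shrinkage and covariate handling, so it is a simplified illustration of the additive/multiplicative correction rather than the method of [31, 34]; real analyses should use a validated implementation.

```python
import numpy as np

def simple_location_scale_harmonize(features, batch):
    """Align each batch's per-feature mean/std to the pooled mean/std.

    Simplified sketch of ComBat's location/scale model: full ComBat additionally
    shrinks the per-batch estimates with empirical Bayes and can preserve
    biological covariates.
    """
    features = np.asarray(features, float)  # shape (n_samples, n_features)
    batch = np.asarray(batch)
    grand_mu, grand_sd = features.mean(axis=0), features.std(axis=0)
    out = np.empty_like(features)
    for b in np.unique(batch):
        idx = batch == b
        mu, sd = features[idx].mean(axis=0), features[idx].std(axis=0)
        # remove the batch's additive (mu) and multiplicative (sd) effect,
        # then map onto the pooled location and scale
        out[idx] = (features[idx] - mu) / sd * grand_sd + grand_mu
    return out

# Two hypothetical scanners measuring the same feature with different bias/spread
rng = np.random.default_rng(0)
scanner_a = rng.normal(5.0, 1.0, size=(50, 1))
scanner_b = rng.normal(8.0, 2.0, size=(50, 1))
x = np.vstack([scanner_a, scanner_b])
labels = np.array(["A"] * 50 + ["B"] * 50)
harmonized = simple_location_scale_harmonize(x, labels)
# after harmonization both scanner groups share the pooled mean
print(np.allclose(harmonized[labels == "A"].mean(),
                  harmonized[labels == "B"].mean()))
```

Because each batch is forced onto the pooled location and scale, the scanner shift disappears, which also makes the caveat in the text concrete: the harmonized values no longer carry their original physical meaning.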


7.3.3 Deep Learning Approaches

Deep learning has been widely used in image-domain harmonization, but recent studies have shown that it can also be an effective alternative for batch effect removal in the feature domain. One study [45] trained a deep neural network to standardize radiomic and deep features across different scanner models, acquisition, and reconstruction settings, using a publicly available texture phantom dataset. The idea behind this approach was to use a neural network to learn a nonlinear normalization transformation that reduces intra-scanner clustering while maintaining informative and discriminative features. Generalization to unknown textures and scans was demonstrated through a series of experiments on a publicly available phantom CT texture dataset scanned with various imaging devices and parameters. Another approach that has been explored is the use of domain adversarial neural networks (DANNs). DANNs use a label predictor and a domain classifier to optimize features so that they are discriminative for the main task but not between domains. A study used this method with an iterative update approach to generate harmonized features from MRI images and evaluated their performance in segmentation, resulting in a decreased influence of scanner variation on predictions [46]. The method was tested on a multi-centric dataset, making it a more suitable approach for feature harmonization. Finally, other studies have achieved good results in reducing the divergence between source and target feature distributions in other fields, but these methods tend to be less successful with medical imaging data. Overall, deep learning is a promising alternative for feature harmonization in medical imaging, but more research is needed to fully understand its potential and limitations.


7.4 Strategies Overview

Radiomics is a powerful tool for characterizing and predicting diseases. However, multiple factors can influence feature values, including scanner and patient variability, image acquisition and reconstruction settings, and image preprocessing. Table 7.1 summarizes the technical factors that influence radiomics stability along the radiomics workflow. Radiomics harmonization is thus an important step to ensure robust and stable features that enable reproducibility and generalizability in radiomics modeling. One common approach to dealing with variability is to eliminate radiomic features that are not robust against these factors. This is typically done by evaluating the variability of radiomic features across different scanners and protocols using metrics such as the ICC and the COV. The challenge is to find the ideal threshold for selecting stable radiomic features, because potentially relevant information could be removed [36]. To ensure that radiomic features are reproducible and robust, it is recommended to follow a standard protocol for image acquisition and feature extraction and to perform appropriate quality control measures. Figure 7.3 shows an overview of the strategies described in this chapter to ensure the harmonization of radiomic features. In conclusion, harmonization techniques in the feature domain are important for ensuring that imaging data is consistent across different studies. Two studied approaches are deep learning and statistical methods. While deep learning is well suited for detecting nonlinear patterns in imaging data, it can be more difficult to apply than statistical methods, and more research is needed in this field. One statistical method that has been extensively studied is ComBat, which has been shown to offer better results than other methods.
However, it is important to consider the number of samples available for analysis when choosing a harmonization technique: for example, ComBat requires a minimum of 20–30 patients per batch, whereas other methods, such as Z-score or White-Stripe normalization, have no restriction on dataset size.

Table 7.1  Factors influencing radiomics stability and posterior reproducibility. Adapted from: van Timmeren JE, Cester D, Tanadini-Lang S, Alkadhi H, Baessler B, "Radiomics in medical imaging—'how-to' guide and critical reflection", Insights Imaging 11(1):91, 2020, doi: https://doi.org/10.1186/s13244-020-00887-2. Licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/)

Image acquisition
•  MRI: Field strength; Sequence design; Acquired matrix size; Field of view; Slice thickness; Acceleration techniques; Vendor; Contrast timing; Movement
•  CT: Tube voltage; Milliamperage; Pitch; Field of view/pixel spacing; Slice thickness; Acquisition mode; Vendor; Contrast timing; Movement
•  PET: Field of view/pixel spacing; Slice thickness; Injected activity; Acquisition time; Scan timing; Duty cycle; Vendor; Movement

Reconstruction parameters
•  MRI: Reconstructed matrix size; Reconstruction technique
•  CT: Reconstruction matrix; Slice thickness; Reconstruction kernel; Reconstruction technique
•  PET: Reconstruction matrix; Slice thickness; Reconstruction technique; Attenuation correction

Segmentation (all modalities)
•  Manual 2D; Manual 3D; Semi-automated 2D; Semi-automated 3D; Automated 2D; Automated 3D; Size of the ROI

Post-processing (all modalities)
•  Image interpolation; Intensity discretization; Normalization

Feature extraction (all modalities)
•  Mathematical formula; Package


Fig. 7.3  Summary of strategies for harmonization of radiomic features

References 1. van Timmeren JE, Cester D, Tanadini-Lang S, Alkadhi H, Baessler B (2020) Radiomics in medical imaging—“how-to” guide and critical reflection. Insights Imaging 11(1):91. https://doi.org/10.1186/s13244-­ 020-­00887-­2 2. Mühlberg A et al (2020) The technome—a predictive internal calibration approach for quantitative imaging biomarker research. Sci Rep 10(1):1103. https://doi.org/10.1038/s41598-­019-­57325-­7 3. van Timmeren JE et al (2016) Test–retest data for radiomics feature stability analysis: generalizable or study-specific? Tomography 2(4):361– 365. https://doi.org/10.18383/j.tom.2016.00208 4. Kessler LG et  al (2015) The emerging science of quantitative imaging biomarkers terminology and definitions for scientific studies and regulatory submissions. Stat Methods Med Res 24(1):9–26. https://doi. org/10.1177/0962280214537333 5. Park JE, Park SY, Kim HJ, Kim HS (2019) Reproducibility and generalizability in radiomics modeling: possible strategies in radiologic and statistical perspectives. Korean J Radiol 20(7):1124. https://doi.org/10.3348/ kjr.2018.0070 6. Midya A, Chakraborty J, Gönen M, Do RKG, Simpson AL (2018) Influence of CT acquisition and reconstruction parameters on radiomic feature reproducibility. J Med Imaging 5(01):1. https://doi.org/10.1117/1. JMI.5.1.011020

164

J. Lozano-Montoya and A. Jimenez-Pastor

7. Balagurunathan Y et al (2014) Test–retest reproducibility analysis of lung CT image features. J Digit Imaging 27(6):805–823. https://doi. org/10.1007/s10278-­014-­9716-­x 8. Berenguer R et al (2018) Radiomics of CT features may be nonreproducible and redundant: influence of CT acquisition parameters. Radiology 288(2):407–415. https://doi.org/10.1148/radiol.2018172361 9. Shiri I, Rahmim A, Ghaffarian P, Geramifar P, Abdollahi H, Bitarafan-­ Rajabi A (2017) The impact of image reconstruction settings on 18F-­ FDG PET radiomic features: multi-scanner phantom and patient studies. Eur Radiol 27(11):4498–4509. https://doi.org/10.1007/s00330-­017-­ 4859-­z 10. Vuong D et  al (2019) Interchangeability of radiomic features between [18F]-FDG PET/CT and [18F]-FDG PET/MR.  Med Phys 46(4):1677– 1685. https://doi.org/10.1002/mp.13422 11. Lee J et  al (2021) Radiomics feature robustness as measured using an MRI phantom. Sci Rep 11(1):3973. https://doi.org/10.1038/s41598-­021-­ 83593-­3 12. Hunter LA et  al (2013) High quality machine-robust image features: identification in nonsmall cell lung cancer computed tomography images: robust quantitative image features. Med Phys 40(12):121916. https://doi. org/10.1118/1.4829514 13. Kickingereder P et al (2018) Radiomic subtyping improves disease stratification beyond key molecular, clinical, and standard imaging characteristics in patients with glioblastoma. Neuro Oncol 20(6):848–857. https:// doi.org/10.1093/neuonc/nox188 14. Peerlings J et al (2019) Stability of radiomics features in apparent diffusion coefficient maps from a multi-centre test-retest trial. Sci Rep 9(1):4800. https://doi.org/10.1038/s41598-­019-­41344-­5 15. Parmar C et al (2014) Robust radiomics feature quantification using semiautomatic volumetric segmentation. PLoS One 9(7):e102107. https://doi. org/10.1371/journal.pone.0102107 16. Poirot MG et al (2022) Robustness of radiomics to variations in segmentation methods in multimodal brain MRI. Sci Rep 12(1):16712. 
https:// doi.org/10.1038/s41598-­022-­20703-­9 17. Pavic M et al (2018) Influence of inter-observer delineation variability on radiomics stability in different tumor sites. Acta Oncol 57(8):1070–1074. https://doi.org/10.1080/0284186X.2018.1445283 18. Gillies RJ, Kinahan PE, Hricak H (2016) Radiomics: images are more than pictures, they are data. Radiology 278(2):563–577. https://doi. org/10.1148/radiol.2015151169 19. Li Q et  al (2017) A fully-automatic multiparametric radiomics model: towards reproducible and prognostic imaging signature for prediction of overall survival in glioblastoma multiforme. Sci Rep 7(1):14331. https:// doi.org/10.1038/s41598-­017-­14753-­7

7  Harmonization in the Features Domain

165

20. Zwanenburg A, Leger S, Vallières M, Löck S (2020) Image biomarker standardisation initiative. Radiology 295(2):328–338. https://doi.org/10.1148/radiol.2020191145
21. van Griethuysen JJM et al (2017) Computational radiomics system to decode the radiographic phenotype. Cancer Res 77(21):e104–e107. https://doi.org/10.1158/0008-5472.CAN-17-0339
22. Korte JC et al (2021) Radiomics feature stability of open-source software evaluated on apparent diffusion coefficient maps in head and neck cancer. Sci Rep 11(1):17633. https://doi.org/10.1038/s41598-021-96600-4
23. Lambin P et al (2017) Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol 14(12):749–762. https://doi.org/10.1038/nrclinonc.2017.141
24. Collins GS, Reitsma JB, Altman DG, Moons KGM (2015) Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 350:g7594. https://doi.org/10.1136/bmj.g7594
25. Parmar C, Barry JD, Hosny A, Quackenbush J, Aerts HJWL (2018) Data analysis strategies in medical imaging. Clin Cancer Res 24(15):3492–3499. https://doi.org/10.1158/1078-0432.CCR-18-0385
26. Castaldo R, Pane K, Nicolai E, Salvatore M, Franzese M (2020) The impact of normalization approaches to automatically detect radiogenomic phenotypes characterizing breast cancer receptors status. Cancers 12(2):518. https://doi.org/10.3390/cancers12020518
27. Bullard JH, Purdom E, Hansen KD, Dudoit S (2010) Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 11(1):94. https://doi.org/10.1186/1471-2105-11-94
28. Hicks SC, Irizarry RA (2014) When to use quantile normalization? Genomics, preprint. https://doi.org/10.1101/012203
29. Kessy A, Lewin A, Strimmer K (2018) Optimal whitening and decorrelation. Am Stat 72(4):309–314. https://doi.org/10.1080/00031305.2016.1277159
30. Haga A et al (2019) Standardization of imaging features for radiomics analysis. J Med Invest 66(1.2):35–37. https://doi.org/10.2152/jmi.66.35
31. Johnson WE, Li C, Rabinovic A (2007) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8(1):118–127. https://doi.org/10.1093/biostatistics/kxj037
32. Goh WWB, Wang W, Wong L (2017) Why batch effects matter in omics data, and how to avoid them. Trends Biotechnol 35(6):498–507. https://doi.org/10.1016/j.tibtech.2017.02.012
33. Da-ano R et al (2020) Performance comparison of modified ComBat for harmonization of radiomic features for multicenter studies. Sci Rep 10(1):10248. https://doi.org/10.1038/s41598-020-66110-w


J. Lozano-Montoya and A. Jimenez-Pastor

34. Orlhac F et al (2022) A guide to ComBat harmonization of imaging biomarkers in multicenter studies. J Nucl Med 63(2):172–179. https://doi.org/10.2967/jnumed.121.262464
35. Papadimitroulas P et al (2021) Artificial intelligence: deep learning in oncological radiomics and challenges of interpretability and data harmonization. Phys Med 83:108–121. https://doi.org/10.1016/j.ejmp.2021.03.009
36. Stamoulou E et al (2022) Harmonization strategies in multicenter MRI-based radiomics. J Imaging 8(11):303. https://doi.org/10.3390/jimaging8110303
37. Da-ano R et al (2021) A transfer learning approach to facilitate ComBat-based harmonization of multicentre radiomic features in new datasets. PLoS One 16(7):e0253653. https://doi.org/10.1371/journal.pone.0253653
38. Horng H et al (2022) Generalized ComBat harmonization methods for radiomic features with multi-modal distributions and multiple batch effects. Sci Rep 12(1):4493. https://doi.org/10.1038/s41598-022-08412-9
39. Beer JC et al (2020) Longitudinal ComBat: a method for harmonizing longitudinal multi-scanner imaging data. NeuroImage 220:117129. https://doi.org/10.1016/j.neuroimage.2020.117129
40. Orlhac F et al (2021) How can we combat multicenter variability in MR radiomics? Validation of a correction procedure. Eur Radiol 31(4):2272–2280. https://doi.org/10.1007/s00330-020-07284-9
41. Lucia F et al (2019) External validation of a combined PET and MRI radiomics model for prediction of recurrence in cervical cancer patients treated with chemoradiotherapy. Eur J Nucl Med Mol Imaging 46(4):864–877. https://doi.org/10.1007/s00259-018-4231-9
42. Orlhac F, Frouin F, Nioche C, Ayache N, Buvat I (2019) Validation of a method to compensate multicenter effects affecting CT radiomics. Radiology 291(1):53–59. https://doi.org/10.1148/radiol.2019182023
43. Ibrahim A et al (2021) Radiomics for precision medicine: current challenges, future prospects, and the proposal of a new framework. Methods 188:20–29. https://doi.org/10.1016/j.ymeth.2020.05.022
44. Ibrahim A et al (2021) The application of a workflow integrating the variable reproducibility and harmonizability of radiomic features on a phantom dataset. PLoS One 16(5):e0251147. https://doi.org/10.1371/journal.pone.0251147
45. Andrearczyk V, Depeursinge A, Müller H (2019) Neural network training for cross-protocol radiomic feature standardization in computed tomography. J Med Imaging 6(02):1. https://doi.org/10.1117/1.JMI.6.2.024008
46. Dinsdale NK, Jenkinson M, Namburete AIL (2021) Deep learning-based unlearning of dataset bias for MRI harmonisation and confound removal. NeuroImage 228:117689. https://doi.org/10.1016/j.neuroimage.2020.117689