Biomedical Information Technology (Biomedical Engineering) [2 ed.] 0128160349, 9780128160343

Biomedical Information Technology, Second Edition, contains practical, integrated clinical applications for disease detection…


English · Pages 820 [795] · Year 2019


Table of contents :
Cover
Biomedical Information Technology
Copyright
Contributors
Acknowledgements
Introduction
Part One: Biomedical data technologies
ONE. Medical imaging
1.1 Introduction
1.2 Digital radiography
1.2.1 Formation and characteristics of X-rays
1.2.2 Scatter and attenuation of X-rays in tissue
1.2.3 Instrumentation for digital radiography
1.3 Computed tomography
1.3.1 Principles of computed tomography
1.3.2 Spiral and multislice computed tomography
1.4 Nuclear medicine
1.4.1 Radioactive nuclides in nuclear medicine
1.4.2 Nuclear medicine detectors
1.4.3 Single-photon emission-computed tomography
1.4.4 Positron-emission tomography
1.4.5 Combined positron-emission tomography/computed tomography scanners
1.4.6 Combined positron-emission tomography/magnetic resonance scanners
1.5 Ultrasonic imaging
1.5.1 Fundamentals of ultrasound
1.5.2 Transducers and beam characteristics
1.5.3 Image acquisition and display
1.6 Magnetic resonance imaging
1.6.1 Basis of magnetic resonance imaging
1.6.2 Magnetic field gradients
1.6.3 Fourier imaging techniques
1.6.4 Magnetic resonance imaging contrast agents
1.7 Diffuse optical imaging
1.7.1 Propagation of light through tissue
1.7.2 Measurement of blood oxygenation
1.7.3 Image reconstruction
1.7.4 Measurement techniques
1.8 Biosignals
1.8.1 Electroencephalography
1.8.2 Electrocardiograms
1.9 Digital cameras and microscopes
Appendix
A.1 Fourier transforms
A.2 Filtered backprojection
A.3 Iterative image reconstruction
Exercises
Further reading
General imaging textbooks
X-ray and computed tomography books
Nuclear medicine books
Ultrasonic imaging
Magnetic resonance imaging
Diffuse optical imaging books
Diffuse optical imaging review papers
Biosignals
Digital cameras and microscopes
TWO. Biomedical sensors
2.1 Introduction
2.2 Wearable devices
2.2.1 Wearable sensing technology and needs
2.2.2 Application examples of wearable sensing technology
2.2.2.1 Microelectromechanical system motion sensor
2.2.2.2 Flexible sensor
2.2.2.3 Wearable biosensors
2.2.3 GluSense artificial islet system
2.2.4 Contact lenses for detecting blood sugar
2.3 Biochip
2.3.1 Gene chips
2.3.2 Protein chips
2.3.3 Cell chips
2.3.4 Tissue chips
2.3.5 Organoid chips
2.4 Biosensors
2.4.1 Biological molecular sensor
2.4.2 Cell-based biosensors
2.5 Implantable sensors
2.5.1 Biocompatibility
2.5.2 Biofunctionality: sensitivity and specificity
2.5.3 Miniaturizing: nanomaterials
2.5.4 Lifetime
2.6 Neural sensing and interfacing
2.7 Summary
References
THREE. Biological computing
3.1 Introduction
3.2 General workflow for the analysis of biological samples
3.3 Overview of genomic methods
3.3.1 Introduction of next-generation DNA sequencing
3.3.2 Workflow for DNA sequencing data processing
3.3.3 Other types of sequencing data and applications
3.4 Overview of proteomic methods
3.4.1 Noise filtering
3.4.2 Deisotoping
3.4.3 Peak detection
3.4.4 Normalization
3.4.5 Retention time alignment and peak matching
3.4.6 Differential expression analysis
3.4.7 Analysis of targeted quantitative proteomic data
3.4.8 Introduction to label-based protein quantitation
3.4.9 Introduction of protein data processing pipeline of MS-PyCloud
3.5 Biological databases and open-source software
3.5.1 Brief introduction of major biological databases
3.5.2 Introduction of open-source software
3.5.3 Usability of open source software
3.5.4 Commercial products based on open-source software
3.6 Biological network analysis
3.6.1 Brief introduction of biological network analysis
3.6.2 Introduction of differential dependency network analysis
3.7 Summary
Acknowledgments
References
FOUR. Picture archiving and communication systems and electronic medical records for the healthcare enterprise
4.1 Introduction
4.1.1 The role of the picture archiving and communication system in the clinical environment
4.1.2 The role of the picture archiving and communication system in medical imaging informatics
4.1.3 General picture archiving and communication system design: introduction and impact
4.1.4 Chapter overview
4.2 Picture archiving and communication system infrastructure
4.2.1 Introduction to picture archiving and communication system infrastructure design
4.2.2 Industry standards
4.2.2.1 Health Level 7
4.2.2.2 Digital Imaging and Communications in Medicine version 3.0 standard
4.2.2.3 Digital Imaging and Communications in Medicine data model
4.2.2.4 Digital Imaging and Communications in Medicine service classes
4.2.2.5 Integrating the Healthcare Enterprise
4.2.3 Connectivity and open architecture
4.2.4 Reliability
4.2.5 Security
4.2.6 Current picture archiving and communication system architectures
4.2.6.1 Client/server picture archiving and communication system architecture
4.2.6.2 Web-based model
4.3 Picture archiving and communication system components and workflow
4.3.1 Introduction of components
4.3.2 Image acquisition gateway
4.3.3 Picture archiving and communication system server and image archive
4.3.4 Display workstations
4.3.5 Communications and networking
4.3.6 Picture archiving and communication system workflow
4.4 Picture archiving and communication system server and image archive
4.4.1 Image management and design concept
4.4.2 Picture archiving and communication system server and storage archive functions
4.4.2.1 The archive server
4.4.2.2 The database system
4.4.2.3 The storage archive or library
4.4.2.4 Communication networks
4.4.2.5 Picture archiving and communication system server and storage archive functions
4.4.3 Digital Imaging and Communications in Medicine–compliant picture archiving and communication system archive server
4.4.4 Hardware and software components
4.4.4.1 Redundant array of inexpensive disks
4.4.4.2 Digital linear tape
4.4.4.3 Storage area network
4.4.4.4 Cloud storage
4.4.4.5 Vendor neutral archive
4.4.4.6 Archive server software
4.4.5 Disaster recovery and backup archive solutions
4.4.6 Current changes in picture archiving and communication system architecture: the vendor neutral archive
4.5 Picture archiving and communication system clinical experiences
4.5.1 Introduction
4.5.2 Picture archiving and communication system implementation strategy
4.5.2.1 Risk assessment analysis
4.5.2.2 Implementation phase development
4.5.2.3 Development of workgroups
4.5.2.4 Implementation management
4.5.3 System acceptance
4.5.4 Image/data migration
4.5.5 Picture archiving and communication system clinical experiences and pitfalls
4.5.5.1 Clinical experiences at Baltimore VA Medical Center
4.5.5.2 Clinical experience at Saint John's Health Center
4.5.5.3 Picture archiving and communication system pitfalls
4.6 Introduction to hospital clinical systems
4.6.1 Hospital information system and the electronic medical record
4.6.2 Radiology information system
4.6.3 Voice recognition system
4.6.4 Interfacing picture archiving and communication, hospital information, radiology information, and voice recognition systems
4.6.4.1 Database-to-database transfer
4.6.4.2 Interface engine
4.6.4.3 Integrating health information, radiology information, picture archiving and communication, and voice recognition systems
4.7 Picture archiving and communication systems and electronic medical records
4.7.1 Changes in the roles of the picture archiving and communication systems and electronic medical records in healthcare
4.7.2 Large-scale enterprise-wide electronic medical record implementation and design
4.7.2.1 Step 1: strategic planning
4.7.2.2 Step 2: adapting the workflow
4.7.2.3 Step 3: financing
4.7.2.4 Step 4: recruiting the workforce
4.7.2.5 Step 5: collaboration
4.7.2.6 Step 6: choosing an electronic medical record vendor
4.7.2.7 Step 7: go-live and preparation for clinical use
4.7.2.7.1 Data migration and cleansing
4.7.2.7.2 Training program development
4.7.2.7.3 Go-live activities
4.7.2.8 Step 8: system evaluation and optimizing for quality assessment
4.7.3 Electronic medical record integration with medical images and picture archiving and communication system
4.7.4 Electronic medical record implementation use case: Los Angeles County department of Health Services ORCHID project
4.7.4.1 Integration of ORCHID with picture archiving and communication system and non-DICOM images
4.8 Summary
4.9 Exercises
Further reading
Part Two: Artificial intelligence and big data processing in biomedicine
FIVE. Machine learning in medical imaging
5.1 Medical imaging
5.1.1 Role in healthcare
5.2 Machine intelligence and machine learning
5.3 Supervised learning
5.3.1 Overview
5.3.2 Classification with supervised machine learning
5.3.2.1 Nearest neighbor approaches
5.3.2.2 Support vector machines
5.3.2.3 Supervised deep learning
5.3.2.4 Multilabel classification
5.3.2.5 Classification of multimodality imaging data
5.3.3 Image segmentation with supervised machine learning
5.3.3.1 Segmentation with convolutional neural networks
5.3.3.2 Segmentation via statistical shape models
5.3.3.3 Saliency-based segmentation
5.3.4 Image synthesis with supervised machine learning
5.4 Unsupervised learning
5.4.1 Overview
5.4.2 Unsupervised clustering
5.4.2.1 Image segmentation via unsupervised clustering
5.4.3 Unsupervised representation learning
5.4.3.1 Statistical approaches for unsupervised representation learning
5.4.3.2 Deep unsupervised representation learning
5.5 Semisupervised learning
5.6 Reinforcement learning
5.7 Summary
5.8 Questions
References
SIX. Health intelligence
6.1 Introduction
6.2 Predictive modeling and forecasting for health intelligence
6.3 Multiple facets of health intelligence
6.3.1 Global health intelligence
6.3.2 Public and population health intelligence
6.3.2.1 Social components of public and population health intelligence
6.3.2.2 Population health intelligence and health disparities
6.3.2.3 Ethical dilemmas in public and population health intelligence
6.3.3 Personalized health and point-of-care intelligence
6.3.3.1 Point-of-care analytics
6.3.3.2 Research themes
6.3.3.2.1 Heart rate characteristics
6.3.3.2.2 Physiological multimodal methods
6.3.3.2.3 Future directions
6.4 Conclusions
References
SEVEN. Artificial intelligence in bioinformatics: automated methodology development for protein residue contact map prediction
7.1 Background
7.2 Evaluation of prediction performance
7.3 Contact map prediction models
7.3.1 Correlated mutation analysis
7.3.2 Direct correlation analysis
7.3.2.1 Direct-coupling analysis
7.3.2.2 Sparse inverse covariance estimation
7.3.2.3 Network deconvolution
7.3.3 Supervised learning models
7.3.3.1 Traditional machine learning models
7.3.3.2 Convolutional neural network-based models
7.4 Performance significantly depends on MSA features
7.5 Conclusions
References
EIGHT. Deep learning in biomedical image analysis
8.1 Introduction—deep learning meets medical image analysis
8.2 Basics of deep learning
8.2.1 Feed-forward neural networks
8.2.2 Stacked autoencoder
8.2.3 Convolutional neural networks
8.2.4 Tips to reduce overfitting
8.2.5 Open-source software toolkits for deep learning
8.2.6 Brief summary of deep learning in biomedical imaging
8.3 Applications in biomedical imaging
8.3.1 Deep feature representation learning in the medical imaging area
8.3.2 Medical image segmentation using deep learning
8.3.3 Nuclear segmentation in mouse microscopy images using convolutional neural networks
8.3.3.1 3-D convolutional neural network for cell segmentation
8.3.3.2 Cascaded convolution neural network using contextual features
8.3.3.3 Advantage of cascaded convolutional neural network over single convolutional neural network
8.3.3.4 Evaluation of cell segmentation accuracy with comparison to current state-of-the-art methods
8.4 Conclusion
References
NINE. Automatic lesion detection with three-dimensional convolutional neural networks
9.1 Introduction
9.2 3-D convolutional neural network
9.2.1 3-D convolutional kernel
9.2.2 3-D CNN hierarchical model
9.3 Efficient fully convolutional architecture
9.3.1 Fully convolutional transformation
9.3.2 3-D score volume generation
9.3.3 Score volume index mapping
9.4 Two-stage cascaded framework for detection
9.4.1 Candidate screening stage
9.4.2 False positive reduction stage
9.5 Case study I: cerebral microbleed detection in brain magnetic resonance imaging
9.5.1 Background of the application
9.5.2 Dataset, preprocessing and evaluation metrics
9.5.3 Experimental results
9.6 Case study II: lung nodule detection in chest computed tomography
9.6.1 Background of the application
9.6.2 Improved learning strategy
9.6.3 Dataset, preprocessing and evaluation metrics
9.6.4 Experimental results
9.7 Discussion
9.8 Conclusions
Acknowledgments
References
TEN. Biomedical image segmentation for precision radiation oncology
10.1 Introduction
10.2 Graph models in biomedical image segmentation
10.2.1 Graph nodes
10.2.2 Graph edges
10.2.2.1 Nodes connection
10.2.2.2 Weighting function
10.2.3 Graph matrices
10.2.4 Graph-theoretic methods in target object segmentation
10.2.4.1 Random walker–based models
10.2.4.2 Graph Cut, Normalized Cut and Average Cut
10.2.5 Applications in medical image segmentation
10.3 Deep network in object detection and segmentation
10.3.1 Deep object detection
10.3.1.1 Region-based convolutional neural network–based models
10.3.1.2 Multiscale location-aware kernel representation
10.3.2 Deep image segmentation
10.3.2.1 Architecture of mask region-based convolutional neural networks
10.4 Applications for medical image processing
10.4.1 Nucleus segmentation
10.4.2 Ultrasound image segmentation
10.5 Computational delineation and quantitative heterogeneity analysis for personalized radiation treatment planning
10.6 Summary
References
ELEVEN. Content-based large-scale medical image retrieval
11.1 Introduction
11.2 Fundamentals of content-based image retrieval
11.2.1 General framework architecture
11.2.2 Image features used in retrieval
11.2.3 Retrieval in medical imaging
11.3 Visual feature-based retrieval
11.3.1 Retrieval based on color
11.3.2 Retrieval based on texture
11.4 Geometric spatial feature-based retrieval
11.4.1 Retrieval based on shape
11.4.2 Retrieval by 3-D volumetric features
11.4.3 Retrieval by spatial relationships
11.5 Clinical contextual and semantic retrieval
11.5.1 Retrieval by semantic pathology interpretation
11.5.2 Retrieval based on generic models
11.5.3 Retrieval based on physiological functional features
11.5.4 Understanding visual features and their relationship to retrieved data
11.6 Summary
11.7 Exercises
Acknowledgments
References
TWELVE. Diversity and novelty in biomedical information retrieval
12.1 Introduction and motivation
12.2 Overview of novelty and diversity boosting in biomedical information retrieval
12.3 Boosting diversity and novelty in biomedical information retrieval
12.3.1 Boosting novelty by maximal marginal relevance
12.3.2 Boosting novelty by probabilistic latent semantic analysis
12.3.3 Boosting diversity by relevance-novelty graphical model
12.4 Diversity and novelty evaluation metrics
12.4.1 Subtopic retrieval metrics
12.4.2 α-nDCG
12.4.3 geNov
12.5 Evaluation results of diversity and novelty metrics
12.5.1 Sensitiveness to the ranking qualities
12.5.2 Discriminative power and running time
12.6 Summary and future work
Acknowledgments
References
THIRTEEN. Toward large-scale histopathological image analysis via deep learning
13.1 Introduction
13.2 Unique challenges in histopathological image analysis
13.3 Computer-aided diagnosis for histopathological image analysis
13.3.1 Fine-grained analysis of regions of interest
13.3.2 High-level analysis of whole-slide images
13.3.3 Deep learning acceleration for histopathological image analysis
13.4 Deep learning for histopathological image analysis
13.4.1 Overview
13.4.2 Patch encoding with convolutional neural networks
13.4.3 Accurate prediction via two-dimensional long short-term memory
13.4.4 Loss function
13.4.5 Results and discussions
13.5 High-throughput histopathological image analysis
13.5.1 Overview
13.5.2 Small-capacity network
13.5.3 Transfer learning from large-capacity network
13.5.4 Feature adaptation from intermediate layers
13.5.5 Efficient inference
13.5.5.1 Results and analysis
13.6 Summary
References
FOURTEEN. Data modeling and simulation
14.1 Introduction
14.2 Compartmental models
14.2.1 Tracee model
14.2.2 Tracer model
14.2.3 Linking tracer and tracee models
14.3 Model identification
14.3.1 A priori identifiability
14.3.1.1 Examples
14.3.1.2 Definitions
14.3.1.3 The model is a priori
14.3.1.4 The transfer function method
14.3.2 Parameter estimation
14.3.2.1 Weighted least squares
14.3.3.1 Residuals and weighted residuals defined for the aforementioned linear case
14.3.3.2 Test of model order
14.4 Model validation
14.4.1 Simulation
14.5 Case study
14.6 Quantification of medical images
14.6.1 Positron-emission tomography
14.6.2 Blood flow
14.6.2.1 Glucose metabolism
14.6.2.2 Receptor binding
14.6.3 Arterial spin labeling–magnetic resonance imaging
14.6.4 Dynamic susceptibility contrast magnetic resonance imaging
14.7 Exercises
References
Further reading
FIFTEEN. Image-based biomedical data modeling and parametric imaging
15.1 Introduction
15.1.1 Anatomical and molecular imaging
15.1.2 Compartmental models
15.1.3 Kinetic modeling in molecular imaging
15.1.4 Parameter estimation and parametric images in molecular imaging
15.1.5 Compartment model parameter estimation
15.1.5.1 Nonlinear least squares fitting
15.1.5.2 Steady state techniques
15.2 Parametric image estimation methods
15.2.1 Autoradiographic technique
15.2.2 Standardized uptake value method
15.2.3 Integrated projection method
15.2.4 Weighted integrated method
15.2.5 Spectral analysis
15.2.6 Graphical analysis methods
15.2.6.1 Patlak graphical analysis
15.2.6.2 Logan graphical analysis
15.2.6.3 Yokoi plot
15.2.6.4 Relative equilibrium-based graphical plot
15.2.7 Linear least squares method
15.2.7.1 Linear least squares
15.2.7.2 Generalized linear least squares
15.2.7.3 Improved versions for generalized linear least squares methods
15.2.7.4 Multiple linear analysis for irreversible radiotracer
15.2.8 Parametric image reconstruction method
15.3 Noninvasive methods
15.3.1 Image-derived input function
15.3.2 Reference tissue model
15.3.3 Population-based input function and cascaded modeling approaches
15.4 Applications of parametric imaging and kinetic modeling
15.4.1 Blood flow parametric images
15.4.2 Oxygen-consumption parametric images
15.4.3 Glucose metabolism parametric images
15.4.4 Receptor-specific parametric images
15.4.5 Recent applications of kinetic modeling in preclinical and clinical studies
15.5 Summary
References
SIXTEEN. Molecular imaging in biology and pharmacology
16.1 Introduction and background
16.1.1 Basic elements and new developments in molecular imaging
16.1.2 Recent developments in biology and pharmaceuticals
16.2 Considerations for quantitative molecular imaging
16.2.1 Input function
16.2.2 Physiological/biological model
16.3 Design/development of molecular imaging probes
16.3.1 Chemical probes (small molecules)
16.3.2 Biological probes (antibodies, peptides, aptamers)
16.4 Molecular imaging of beta-amyloids and neurofibrillary tangles
16.4.1 Brief review of molecular probes for beta-amyloid imaging
16.4.2 In vitro characterization of FDDNP
16.4.3 In vivo imaging of beta-amyloids and neurofibrillary tangles in Alzheimer disease
16.5 Molecular imaging using antibody probes
16.5.1 Imaging cell-surface phenotype
16.5.2 Optimization of antibodies for in vivo targeting
16.5.3 Measurement of target expression
16.5.4 Monitoring response to therapy
16.6 Some other molecular imaging applications
16.6.1 In vivo regional substrate metabolism in human brain
16.6.2 Cell proliferation rate in mouse tumor
16.6.3 Measurement of murine cardiovascular physiology
16.7 Summary and future perspectives
16.7.1 Optical imaging, MicroSPECT, microfluidic blood sampler
16.7.2 Automated image/data analysis
16.7.3 Virtual experimentation
16.7.4 Total-body imaging and tracer kinetics in the entire human body
16.7.5 Artificial intelligence in molecular imaging
16.8 Exercises
References
SEVENTEEN. Biomedical image visualization and display technologies
17.1 Introduction
17.2 Biomedical imaging modalities
17.2.1 Single-modality volumetric biomedical imaging data
17.2.2 Multimodality biomedical imaging
17.2.3 Serial scans of biomedical imaging modalities
17.3 Biomedical image visualization pipeline
17.4 Volume rendering techniques
17.4.1 Two-dimensional visualization
17.4.2 Three-dimensional surface rendering visualization
17.4.3 Three-dimensional direct volume rendering visualization
17.4.3.1 Direct volume rendering computing pipeline
17.4.3.2 Image semantic analysis for direct volume rendering visualization
17.4.3.3 Transfer function designs
17.4.3.4 Volume clipping and viewpoint selection
17.4.4 Multimodality direct volume rendering visualization
17.4.5 Direct volume rendering visualization for serial scans
17.5 Display technology
17.5.1 Two-dimensional conventional visualization display technologies
17.5.2 Virtual reality visualization
17.5.3 Augmented reality visualization
17.6 Development platforms for biomedical image visualization
17.6.1 Voreen (volume rendering engine)
17.6.2 The visualization toolkit
17.6.3 MeVisLab
17.7 Conclusions
17.8 Questions
References
EIGHTEEN. Biomedical image characterization and radiogenomics
18.1 Introduction
18.2 Radiomic characterization of medical imaging
18.2.1 Handcrafted radiomic analysis
18.2.1.1 Region of interest identification
18.2.1.1.1 Tumor area.
18.2.1.1.2 Heterogeneous intratumoral subregion.
18.2.1.1.2.1 Tumor image heterogeneity evaluation by human definition.
18.2.1.1.2.2 Tumor image heterogeneity evaluation by clustering analysis.
18.2.1.1.2.3 Tumor heterogeneity analysis by image decomposition.
18.2.1.1.3 Normal-appearing tissue area
18.2.1.1.3.1 Background parenchyma that surrounds tumor.
18.2.1.1.3.2 Tumor contralateral areas.
18.2.1.2 Feature extraction and quantification
18.2.1.2.1 Human-defined features.
18.2.1.2.2 Semiautomatic approaches.
18.2.1.3 Feature selection and predictive model building
18.2.2 Deep learning-based radiomic analysis
18.2.3 Multimodality/multiparametric radiomics
18.3 Radiogenomics for uncovering cancer mechanism
18.3.1 Individual genomic signatures
18.3.2 Multiomics whole-genome genomic features
18.4 Radiomics as signatures for non-invasive probes of cancer related molecular biomarkers
18.4.1 Molecular subtypes prediction
18.4.2 Clinical biomarkers
18.5 Radiogenomic applications in cancer diagnosis and treatment
18.5.1 Radiomics for tumor diagnosis
18.5.2 Radiomics for prediction of treatment response
18.5.3 Radiomics for prediction of tumor prognosis
18.5.4 Radiomics for prediction of tumor recurrence scores
18.5.5 Integration of image and clinical/genomic features for cancer diagnosis and treatment
18.6 Summary
References
Part Three: Emerging technologies in biomedicine
NINETEEN. Medical robotics and computer-integrated interventional medicine
19.1 Introduction
19.2 Technology and techniques
19.2.1 System architecture
19.2.2 Registration and transformations between coordinate systems
19.2.3 Navigational trackers
19.2.4 Robotic devices
19.2.5 Intraoperative human–machine interfaces
19.2.6 Sensorized instruments
19.2.7 Software and robot control architectures
19.2.8 Accuracy evaluation and validation
19.2.9 Risk analysis and regulatory compliance
19.3 Surgical CAD/CAM
19.3.1 Example: robotically assisted joint reconstruction
19.3.2 Example: needle placement
19.4 Surgical assistance
19.4.1 Basic concepts
19.4.2 Surgical navigation systems as information assistants
19.4.3 Surgeon extenders
19.4.4 Auxiliary surgeon supports
19.4.5 Remote telesurgery and telementoring
19.4.6 Toward “intelligent” surgical assistance
19.5 Summary and conclusion
19.6 Exercises
References
TWENTY. Virtual and augmented reality in medicine
20.1 Introduction
20.2 Surgical education with virtual reality technologies
20.2.1 Laparoscopic virtual reality surgery simulations
20.2.1.1 Minimally invasive surgery trainer—virtual reality [31]
20.2.1.2 LapSim [28]
20.2.1.3 Laparoscopy virtual reality [26]
20.2.1.4 SINERGIA [32]
20.2.2 Arthroscopy training with virtual reality
20.3 Minimally invasive surgery with augmented reality
20.3.1 Neurosurgery with augmented reality
20.3.2 Soft-tissue surgery with augmented reality
20.3.3 Catheterized interventional procedures with augmented reality
20.3.4 Orthopedic surgery
20.3.5 Intravenous injection
20.4 Mental health care with virtual reality and augmented reality technologies
20.4.1 Virtual reality
20.4.2 Augmented reality
20.5 Other medical applications with virtual and augmented reality technologies
20.5.1 Telementoring
20.5.2 Anatomy education
20.6 Future research and development opportunities as well as challenges in the healthcare zone
20.7 Summary
References
Further reading
TWENTY ONE. Sensory information feedback for neural prostheses
21.1 Introduction
21.2 Background: anatomy and physiology of the somatosensory system
21.2.1 Somatosensory receptors
21.2.1.1 Touch
21.2.1.2 Proprioception
21.2.2 Thermoreception and nociception
21.2.3 Properties of somatosensory receptors
21.2.3.1 Location
21.2.3.2 Intensity
21.2.3.3 Duration
21.2.4 Integration of somatosensory input
21.2.5 Spinal reflexes
21.2.6 Ascending sensory pathways
21.2.7 Dorsal column–medial lemniscus tract
21.2.7.1 Spinothalamic tract
21.2.7.2 Spinocerebellar tract
21.3 Overview of sensory feedback in neural prostheses
21.4 Anatomical targets and interface technologies for stimulating somatosensory inputs
21.4.1 Transcutaneous targets and techniques
21.4.1.1 Vibrotactile
21.4.1.2 Electrotactile
21.4.1.3 Applications
21.4.2 Implantable targets and techniques
21.5 Anatomical targets and interface technologies for sensing somatosensory inputs
21.6 Summary and future directions
21.6.1 Summary
21.6.2 Future directions
21.6.2.1 Technology
21.6.2.2 Neural reinnervation (surgical)
References
TWENTY TWO. Mobile health (m-health): evidence-based progress or scientific retrogression
22.1 Introduction
22.1.1 What is mobile health?
22.1.2 Defining mobile health and rapprochement with digital health
22.1.3 Advances in the triangular pillars of mobile health
22.1.4 The evidence of mobile health: market progress or clinical retrogression
22.1.5 m-Health for diabetes care: an exemplar of market vs. clinical retrogression
22.2 The science of mobile health: recent developments and challenges
22.3 Conclusions
References
Further reading
TWENTY THREE. Health and medical behavior informatics
23.1 Introduction
23.2 Behavior and behavior informatics
23.2.1 Behavior
23.2.2 Behavior informatics
23.2.2.1 Behavior representation and reasoning
23.2.2.2 Behavior analysis and learning
23.2.2.3 Behavior management and applications
23.2.3 Applications of behavior informatics
23.3 Health and medical behavior
23.3.1 Health behavior
23.3.2 Medical behavior
23.4 Health and medical behavior informatics
23.4.1 Health behavior informatics
23.4.1.1 Health behavior acquisition and construction
23.4.1.2 Health behavior modeling and representation
23.4.1.3 Health behavior analysis, learning and evaluation
23.4.1.4 Health behavior management and applications
23.4.2 Medical behavior informatics
23.4.2.1 Medical behavior acquisition and construction
23.4.2.2 Medical behavior modeling and representation
23.4.2.3 Medical behavior analysis, learning, and evaluation
23.4.2.4 Medical behavior applications and management
23.4.3 Integrative health and medical behavior informatics
23.5 Related work
23.5.1 Connection to health behavior research
23.5.2 Connection to behavioral medicine
23.5.3 Connection to health/medical informatics and medical imaging
23.6 Prospects
Acknowledgment
References
Index
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
Back Cover


Biomedical Information Technology

SECOND EDITION Edited by

DAVID DAGAN FENG

Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

Copyright © 2020 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-12-816034-3

For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Mara Conner
Acquisition Editor: Chris Katsaropoulos
Editorial Project Manager: Emma Hayes
Production Project Manager: Sruthi Satheesh
Cover Designer: Miles Hitchen

Typeset by TNQ Technologies

Contributors

Oguz Akbilgic University of Tennessee Health Science Center – Oak Ridge National Laboratory (UTHSC-ORNL) Center for Biomedical Informatics, Department of Pediatrics, Memphis, TN, United States

Turki AlAnzi College of Public Health, Imam AbdulRahman Bin Faisal University, Dammam, Saudi Arabia

Xiangdong An Department of Computer Science, University of Tennessee at Martin, Martin, TN, United States

Jorge R. Barrio Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, CA, United States

Alessandra Bertoldo Department of Information Engineering, University of Padova, Padova, Italy

Lei Bi Biomedical & Multimedia Information Technology (BMIT) Research Group, School of Computer Science, The University of Sydney, Sydney, NSW, Australia; ARC Training Centre for Innovative BioEngineering, Sydney, NSW, Australia

Sheng Bin Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China

Weidong Cai Biomedical & Multimedia Information Technology (BMIT) Research Group, School of Computer Science, The University of Sydney, Sydney, NSW, Australia

Longbing Cao Advanced Analytics Institute, University of Technology Sydney, Sydney, NSW, Australia

Hao Chen Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong

Robert Clarke Lombardi Cancer Research Center, Georgetown University Medical Center, Washington, DC, United States

Claudio Cobelli Department of Information Engineering, University of Padova, Padova, Italy

Hui Cui Biomedical & Multimedia Information Technology (BMIT) Research Group, School of Computer Science, The University of Sydney, Sydney, NSW, Australia; Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia

Qi Dou Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong


Yiping P. Du Shanghai Jiao Tong University, China

Stefan Eberl Department of Molecular Imaging, Royal Prince Alfred Hospital, Sydney, NSW, Australia; Biomedical & Multimedia Information Technology (BMIT) Research Group, School of Computer Science, The University of Sydney, Sydney, NSW, Australia

Ming Fan Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou, Zhejiang, China

Shi-Hao Feng Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China

David Dagan Feng Biomedical & Multimedia Information Technology (BMIT) Research Group, School of Computer Science, The University of Sydney, Sydney, NSW, Australia

Gregory S. Fischer Department of Mechanical Engineering, Worcester Polytechnic Institute, Worcester, MA, United States

Yi Fu The Bradley Department of Electrical and Computer Engineering, Virginia Tech Research Center – Arlington, Arlington, VA, United States

Michael J. Fulham Department of Molecular Imaging, Royal Prince Alfred Hospital, and Biomedical & Multimedia Information Technology (BMIT) Research Group, School of Computer Science, and Sydney Medical School, The University of Sydney, Sydney, NSW, Australia

Fan Gao Biosensor National Special Laboratory, Key Laboratory of Biomedical Engineering of Ministry of Education, Department of Biomedical Engineering, Zhejiang University, Hangzhou, China

Manzhao Hao School of Biomedical Engineering, and Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China

Pheng-Ann Heng Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong

Sung-Cheng Huang Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, CA, United States

H.K. Huang University of Southern California, Los Angeles, CA, USA; Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong; Shanghai Institute of Technical Physics, The Chinese Academy of Sciences, Shanghai, China

Jimmy Xiangji Huang School of Information Technology, York University, Toronto, Canada


Robert S.H. Istepanian College of Public Health, Imam AbdulRahman Bin Faisal University, Dammam, Saudi Arabia

Younhyun Jung Biomedical & Multimedia Information Technology (BMIT) Research Group, School of Computer Science, The University of Sydney, Sydney, NSW, Australia

Rishikesan Kamaleswaran University of Tennessee Health Science Center – Oak Ridge National Laboratory (UTHSC-ORNL) Center for Biomedical Informatics, Department of Pediatrics, Memphis, TN, United States

Peter Kazanzides Department of Computer Science, The Johns Hopkins University, Baltimore, MD, United States

Jinman Kim Biomedical & Multimedia Information Technology (BMIT) Research Group, School of Computer Science, The University of Sydney, Sydney, NSW, Australia; ARC Training Centre for Innovative BioEngineering, Sydney, NSW, Australia

Minjeong Kim Department of Computer Science, University of North Carolina at Greensboro, NC, United States

Bin Kong Department of Computer Science, University of North Carolina at Charlotte, Charlotte, NC, United States

Ashnil Kumar Biomedical & Multimedia Information Technology (BMIT) Research Group, School of Computer Science, The University of Sydney, Sydney, NSW, Australia; ARC Training Centre for Innovative BioEngineering, Sydney, NSW, Australia

Ning Lan School of Biomedical Engineering, and Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China

Zhongyu Li Department of Computer Science, University of North Carolina at Charlotte, Charlotte, NC, United States

Lihua Li Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou, Zhejiang, China

Brent J. Liu University of Southern California, Los Angeles, CA, USA

Junbo Ma Department of Psychiatry, University of North Carolina at Chapel Hill, NC, United States

Chunhong Mao Biocomplexity Institute and Initiative, University of Virginia, Charlottesville, VA, United States

Saleha Masood Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; Department of Computer Science, COMSATS University, Islamabad, Pakistan


Seong K. Mun Arlington Innovation Center: Health Research, Virginia Tech National Capital Region, Arlington, VA, United States

Yuxiang Pan Biosensor National Special Laboratory, Key Laboratory of Biomedical Engineering of Ministry of Education, Department of Biomedical Engineering, Zhejiang University, Hangzhou, China

Jing Qin School of Nursing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong

Habtom W. Ressom Department of Oncology, Genomics and Epigenomics Shared Resource, Georgetown University Medical Center, NW, Washington, DC, United States

Caroline Schoenewald Department of Physical Medicine and Rehabilitation, University of Pittsburgh, PA, United States

Arash Shaban-Nejad University of Tennessee Health Science Center – Oak Ridge National Laboratory (UTHSC-ORNL) Center for Biomedical Informatics, Department of Pediatrics, Memphis, TN, United States

Hong-Bin Shen Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China

Eun Kyong Shin University of Tennessee Health Science Center – Oak Ridge National Laboratory (UTHSC-ORNL) Center for Biomedical Informatics, Department of Pediatrics, Memphis, TN, United States

Nabil Simaan Department of Mechanical Engineering, Vanderbilt University, Nashville, TN, United States

Nadine Smith Penn State University, United States

Yang Song School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia

Russell H. Taylor Department of Computer Science, The Johns Hopkins University, Baltimore, MD, United States

Tsung-Heng Tsai Khoury College of Computer Sciences, Northeastern University, Boston, MA, United States

Jiawei Tu Biosensor National Special Laboratory, Key Laboratory of Biomedical Engineering of Ministry of Education, Department of Biomedical Engineering, Zhejiang University, Hangzhou, China

Michael A. Urbin Department of Physical Medicine and Rehabilitation, University of Pittsburgh, PA, United States; VA Pittsburgh Healthcare System, PA, United States


Hao Wan Biosensor National Special Laboratory, Key Laboratory of Biomedical Engineering of Ministry of Education, Department of Biomedical Engineering, Zhejiang University, Hangzhou, China

Hao Wang School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

Yuqi Wang School of Information Technology, York University, Toronto, Canada

Qian Wang School of Communication and Information Engineering, Xi’an University of Posts & Telecommunications, Xi’an, China

Xiuying Wang Biomedical & Multimedia Information Technology (BMIT) Research Group, School of Computer Science, The University of Sydney, Sydney, NSW, Australia

Minkun Wang iCarbonX, Shenzhen, Guangdong Province, China

Ping Wang Biosensor National Special Laboratory, Key Laboratory of Biomedical Engineering of Ministry of Education, Department of Biomedical Engineering, Zhejiang University, Hangzhou, China

Yue Wang Grant A. Dove Professor, Computational Bioinformatics and Bio-imaging Laboratory, The Bradley Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, United States

Andrew Webb Penn State University, United States

Douglas J. Weber Department of Bioengineering, University of Pittsburgh, PA, United States

Lingfeng Wen Department of Molecular Imaging, Royal Prince Alfred Hospital, Sydney, NSW, Australia; Biomedical & Multimedia Information Technology (BMIT) Research Group, School of Computer Science, The University of Sydney, Sydney, NSW, Australia

Anna M. Wu Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, CA, United States

Guorong Wu Department of Psychiatry, University of North Carolina at Chapel Hill, NC, United States

Jia-Yan Xu Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China


Chenggang Yan Institute of Information and Control, Hangzhou Dianzi University, Hangzhou, China

Ke Yan Biomedical & Multimedia Information Technology (BMIT) Research Group, School of Computer Science, The University of Sydney, Sydney, NSW, Australia

Defu Yang Institute of Information and Control, Hangzhou Dianzi University, Hangzhou, China

Xiaofeng Zhang Penn State University, United States

Shaoting Zhang Department of Computer Science, University of North Carolina at Charlotte, Charlotte, NC, United States

Bin Zhang Biosensor National Special Laboratory, Key Laboratory of Biomedical Engineering of Ministry of Education, Department of Biomedical Engineering, Zhejiang University, Hangzhou, China

Zhen Zhang Departments of Pathology and Oncology, Johns Hopkins Medical Institute, Baltimore, MD, United States

Yitan Zhu Program for Computational Genomics and Medicine, NorthShore University HealthSystem, Evanston, IL, United States

Liujing Zhuang Biosensor National Special Laboratory, Key Laboratory of Biomedical Engineering of Ministry of Education, Department of Biomedical Engineering, Zhejiang University, Hangzhou, China

Wangmeng Zuo School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

Acknowledgements

The editor would like to take this opportunity to express his sincere appreciation to all of the contributors to this book for making it possible to have such comprehensive coverage of the most current information in this very dynamic field; to Ms Cindy Bai for assisting with data collection; and to the Australian Research Council (ARC) for its grant support.


Introduction

We are living in a time of constant and exciting change that has been driven, arguably, by advances in computer science, information technologies, and human imagination and innovation. These advances continue, directly and indirectly, to influence society, biomedicine, and health. “Big data”, “artificial intelligence”, and “machine learning” are now commonplace terms, although most people have limited understanding of what they mean and what they do. So now, we as scientists, engineers, and researchers have a tremendous, collective opportunity to use our skills to better assimilate, analyze, interpret, and understand the vast amounts of biomedical information, and thus improve the quality of life for everyone in society.

The 1st edition of this book was published in 2008 and had two main sections: Technological Fundamentals and Integrated Clinical Applications. In this 2nd edition, there are three sections: Biomedical Data Technologies; Artificial Intelligence/Big Data Processing in Biomedicine; and Emerging Technologies in Biomedicine. We have updated 9 chapters and rewritten another 14 chapters to reflect the emergence of new technologies. The new chapters are: Biomedical Sensors; Machine Learning in Medical Imaging; Health Intelligence; Artificial Intelligence in Bioinformatics: Automated Methodology Development for Protein Residue Contact Map Prediction; Deep Learning in Bio-Medical Image Analysis; Automatic Lesion Detection with 3D Convolutional Neural Networks; Biomedical Image Segmentation for Precision Radiation Oncology; Diversity and Novelty in Biomedical Information Retrieval; Towards Large-Scale Histopathological Image Analysis via Deep Learning; Biomedical Image Characterization and Radiogenomics; Virtual and Augmented Reality in Medicine; Sensory Information Feedback for Neural Prostheses; Mobile Health (m-Health): Evidence Based Progress or Scientific Retrogression; and Health and Medical Behavior Informatics.

These chapters reflect the expansion of biomedical knowledge and data across genes, proteins, metabolism, pathology, organs, systems, individuals, and populations, often using new engineering and computer science innovations. The chapters have been written by domain experts from world-renowned research groups, and their names and affiliations can be found in the Contributors List. As with the 1st edition, this 2nd edition can be used as a handbook, reference, and survey for undergraduate and postgraduate students, researchers, and everyday practitioners. Exercises are provided at the end of some chapters to consolidate knowledge and understanding.


Our aim is to provide readers with a comprehensive, up-to-date overview of computer science and information technologies in biomedicine. We hope that this 2nd edition will continue to provide inspiration and encouragement to the next generation of innovators in our field who share our aspirations of contributing to and enhancing society through their chosen profession.

Professor David Dagan Feng, FACS, FATSE, FHKIE, FIET, & FIEEE
Director, Biomedical & Multimedia Information Technology Research Group
School of Computer Science, The University of Sydney

CHAPTER ONE

Medical imaging

Xiaofeng Zhang, Nadine Smith, Andrew Webb
Penn State University, United States

Revised by Yiping P. Du, Shanghai Jiao Tong University, China

1.1 Introduction

Medical imaging forms a key part of clinical diagnosis, and improvements in the quality and type of information available from such images have extended the diagnostic accuracy and range of new applications in healthcare. Previously seen as the domain of a hospital’s radiology department, recent technological advances have expanded medical imaging into neurology, cardiology, and cancer centers, to name a few. The past decade in particular has seen many significant advances in each imaging method covered in this chapter. Since a large number of texts (see Bibliography) deal in great detail with the basic physics, instrumentation, and clinical applications of each imaging modality, this chapter summarizes these aspects in a succinct fashion and emphasizes recent technological advances. State-of-the-art instrumentation for clinical imaging now comprises, for example, 320-slice spiral computed tomography (CT), multielement multidimensional phased arrays in ultrasound, combined positron-emission tomography (PET)/CT and PET/magnetic resonance scanners, and rapid parallel imaging techniques in magnetic resonance imaging (MRI) using large multidimensional coil arrays, with developments such as integrated diffuse optical tomography (DOT)/MRI on the horizon. Considered together with significant developments in new imaging contrast agents, the so-called “molecular imaging agents,” medical imaging looks set to continue expanding its role in modern-day healthcare.

1.2 Digital radiography

Planar X-ray imaging has traditionally been film-based and is used for diagnosing bone breaks, lung disease, a number of gastrointestinal (GI) diseases (fluoroscopy), and conditions of the genitourinary tract, such as kidney stones (pyelography). Increasingly, images are being formed and stored in digital format for integration with PACS systems, ease of storage and transfer, and image manipulation in, for example, digital subtraction angiography. Many components of conventional film-based systems (X-ray source, collimators, antiscatter grids) are essentially identical to those in digital radiography, the only difference being the detector itself.


1.2.1 Formation and characteristics of X-rays

A schematic of an X-ray source is shown in Fig. 1.1A. A potential difference, termed the accelerating voltage (kVp) and typically between 90 and 150 kV, is applied between a small helical cathode coil of tungsten wire and a rotating anode consisting of a tungsten target embedded in a rotating copper disc. When an electric current is passed through the cathode, electrons are emitted via thermionic emission and accelerate toward the anode target; this electron flow is termed the tube current, and X-rays are created by the interaction of these electrons with the target. X-rays then pass through a “window” in the X-ray tube. To create the desired thin X-ray beam, a negatively charged focusing cup is placed around the cathode. A broad spectrum of X-ray energies is emitted from the X-ray tube, as shown in Fig. 1.1B. Characteristic lines are produced when the accelerated electrons knock out a bound electron in the K-shell of the tungsten anode, with the resulting hole being filled by an electron from the L-shell, and the difference in binding energy of the two electrons being transferred to an X-ray. The broad “hump” component of the X-ray spectrum arises from “general radiation,” which corresponds to an accelerated electron losing part of its kinetic energy when it passes close to a tungsten atom in the target, this energy being emitted as an X-ray. Overall, the number of X-rays produced by the source is proportional to the tube current, and the energy of the X-ray beam is proportional to the square of the accelerating voltage. The collimator, also termed a beam restrictor, consists of lead sheets that can be slid over one another to restrict the beam dimensions to match those of the patient area to be imaged.
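These two proportionalities are straightforward to apply numerically. The short sketch below is illustrative only: the reference technique of 200 mA at 100 kVp is an assumed operating point, not a value from the text.

# Relative X-ray tube output, following the proportionalities above:
# total X-ray output scales with tube current, and the energy of the
# beam scales with the square of the accelerating voltage (kVp).

def relative_tube_output(tube_current_ma, kvp, ref_current_ma=200.0, ref_kvp=100.0):
    """Return tube output relative to an assumed reference technique."""
    return (tube_current_ma / ref_current_ma) * (kvp / ref_kvp) ** 2

# Example: raising the accelerating voltage from 100 to 120 kVp at a
# constant 200 mA increases output by a factor of (120/100)^2 = 1.44.
print(relative_tube_output(200.0, 120.0))  # 1.44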

1.2.2 Scatter and attenuation of X-rays in tissue The two dominant mechanisms for the interaction of X-rays with tissue are photoelectric absorption and Compton scattering. Photoelectric interactions in the body involve the energy of an incident X-ray being absorbed by an atom in tissue, with a glass/metal envelope induction stator induction rotor induction stator

Relative number of X-rays

rotating tungsten anode cathode ee- focussing cup

X-rays

0 100

20

40 60 X-ray energy

80

Figure 1.1 Schematic of an X-ray tube (left). Typical energy spectrum from a tungsten anode with an accelerating voltage of w100 kVp (right).


tightly bound electron emitted from the K- or L-shell; the incident X-ray is completely absorbed and does not reach the detector. The probability of photoelectric absorption occurring ($P_{photo}$) is given by

$$P_{photo} \propto \frac{Z_{eff}^{3}}{E^{3}} \qquad (1.1)$$

where $Z_{eff}$ is the effective atomic number and $E$ is the X-ray energy. Since there is a large difference in the values of $Z_{eff}$ for bone ($Z_{eff} = 20$ due to the presence of Ca) and soft tissue ($Z_{eff} = 7.4$), photoelectric absorption produces high contrast between bone and soft tissue. Compton scattering involves the transfer of a fraction of an incident X-ray’s energy to a loosely bound outer-shell electron of an atom in tissue. The X-ray is deflected from its original path but typically maintains a substantial component of its original energy. The probability of Compton scattering is essentially independent of the effective atomic number of the tissue, linearly proportional to tissue electron density, and weakly dependent on the X-ray energy. Since the electron density is quite similar for bone and soft tissue, Compton-scattered X-rays result in very little image contrast. Attenuation of the intensity of the X-ray beam as it travels through tissue can be expressed mathematically by

$$I_x = I_0 \, e^{-(\mu_{Compton} + \mu_{photoelectric})\, x} \qquad (1.2)$$

where $I_0$ is the intensity of the incident X-ray beam, $I_x$ is the X-ray intensity at a distance $x$ from the source, and $\mu$ is the linear attenuation coefficient of tissue, measured in cm$^{-1}$. The contribution from photoelectric interactions dominates at lower energies, whereas Compton scattering is more important at higher energies. X-ray attenuation is often characterized in terms of a mass attenuation coefficient, equal to the linear attenuation coefficient divided by the density of the tissue. Fig. 1.2 plots the mass attenuation coefficient of fat, bone, and muscle as a function of the incident X-ray energy. At low incident X-ray energies, bone has a much higher mass attenuation coefficient. As incident X-ray energy increases, the probability of photoelectric interactions decreases greatly, and the value of the mass attenuation coefficient becomes much lower. At X-ray energies greater than about 80 keV, Compton scattering is the dominant mechanism, and the difference in the mass attenuation coefficients of bone and soft tissue is less than a factor of two. At incident X-ray energies greater than around 120 keV, the mass attenuation coefficients for bone and soft tissue are very similar.

Figure 1.2 Mass attenuation coefficient (cm²g⁻¹) for bone, muscle, and fat as a function of incident X-ray energy (keV).
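Eq. (1.2) is simple to evaluate for a stack of tissue layers. The sketch below is a minimal illustration; the attenuation coefficients (soft tissue ≈ 0.2 cm⁻¹, bone ≈ 0.5 cm⁻¹ at a representative diagnostic energy) are assumed placeholder values, not figures from the text.

import math

def transmitted_fraction(layers):
    """Fraction I_x/I_0 transmitted through a sequence of
    (mu_per_cm, thickness_cm) tissue layers, per Eq. (1.2)."""
    return math.exp(-sum(mu * x for mu, x in layers))

# Assumed path: 3 cm soft tissue, 1 cm bone, 3 cm soft tissue.
path = [(0.2, 3.0), (0.5, 1.0), (0.2, 3.0)]
print(f"I_x/I_0 = {transmitted_fraction(path):.3f}")  # exp(-1.7), ~0.18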

In cases in which there is little contrast, for example between blood vessels and surrounding tissue, X-ray contrast agents can be used. There are two basic classes of contrast agents: those based on barium and those based on iodine. Barium sulfate is used to investigate abnormalities such as ulcers, polyps, tumors, or hernias in the GI tract. Since barium has a K-edge at 37.4 keV, X-ray attenuation is much higher in areas where the agent accumulates. Barium sulfate is administered as a relatively thick slurry. Orally, barium sulfate is used to explore the upper GI tract, including the stomach and esophagus (the so-called “barium meal”). As an enema, barium sulfate can be used either as a single or “double” contrast agent. As a single agent, it fills the entire lumen of the GI tract and can detect large abnormalities. As a double contrast agent, barium sulfate is introduced first, followed usually by air; the barium sulfate coats the inner surface of the GI tract, and the air distends the lumen. This double-agent approach is used to characterize smaller disorders of the large intestine, colon, and rectum. Iodine-based X-ray contrast agents are used for a number of applications, including intravenous urography, angiography, and intravenous and intraarterial digital subtraction angiography. An iodine-based agent is injected into the bloodstream, and because iodine has a K-edge at 33.2 keV, X-ray attenuation in blood vessels is enhanced compared with the surrounding soft tissue. This makes it possible to visualize arteries and veins within the body. Digital subtraction angiography (DSA) is a technique in which one image is taken before the contrast agent is administered, a second is taken after injection of the agent, and the difference between the two images is computed. DSA gives very high contrast between vessels and tissue and can produce angiograms with extremely high spatial resolution, resolving vessels down to ~100 μm in diameter.
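The subtraction step of DSA can be sketched in a few lines. The example below is a toy illustration with synthetic arrays, not clinical data; subtracting log-transformed intensities (rather than raw intensities) is a common practical choice because, by Eq. (1.2), it isolates the iodine-induced change in attenuation.

import numpy as np

def dsa_subtract(pre, post, eps=1e-9):
    """Log-subtraction difference image for DSA: anatomy common to both
    frames cancels, leaving the contrast-filled vessels."""
    return np.log(np.asarray(pre, float) + eps) - np.log(np.asarray(post, float) + eps)

# Toy example: a horizontal "vessel" that attenuates only after injection.
vessel = np.zeros((64, 64))
vessel[30:34, :] = 1.0                   # hypothetical vessel mask
pre = np.full((64, 64), 1000.0)          # detected intensity before contrast
post = pre * np.exp(-0.3 * vessel)       # extra iodine attenuation, Eq. (1.2)
angio = dsa_subtract(pre, post)          # ~0.3 inside the vessel, ~0 elsewhere
print(angio[32, 0], angio[0, 0])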


1.2.3 Instrumentation for digital radiography

The detector, placed on the opposite side of the patient to the X-ray source, consists of an antiscatter grid and a recording device. The role of the antiscatter grid is to minimize the number of Compton-scattered X-rays that reach the detector, since these reduce image contrast. The grid consists of thin strips of lead spaced by aluminum for structural support. The grid ratio, which is the length of the lead strips divided by the interstrip distance, has values between 4:1 and 16:1, and the strip line density ranges from 25 to 60 per cm. Digital radiography has largely replaced the use of X-ray film for recording images. A large-area (41 × 41 cm) flat-panel detector (FPD) consists of an array of thin-film transistors (TFTs). The FPD is fabricated on a single monolithic glass substrate. A thin-film amorphous silicon transistor array is then layered onto the glass. Each pixel of the detector consists of a photodiode and an associated TFT switch. On top of the array is a structured thallium-doped cesium iodide (CsI) scintillator that consists of many thin, rod-shaped crystals (approximately 6–10 μm in diameter) aligned parallel to one another. When an X-ray is absorbed in a CsI rod, the CsI scintillates and produces light. The light undergoes internal reflection within the fiber and is emitted from one end of the fiber onto the TFT array. The light is then converted to an electrical signal by the photodiodes in the TFT array. This signal is amplified and converted into a digital value for each pixel using an analogue-to-digital converter. Each pixel typically has dimensions of 200 × 200 μm.
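The readout chain described above (scintillator light, photodiode signal, amplification, digitization) can be mimicked with a short sketch. All numbers below (the amplifier gain, the 5 V full-scale range, and the 14-bit depth) are assumptions for illustration, not specifications from the text.

import numpy as np

def digitize_panel(photodiode_signal, gain=50.0, full_scale_v=5.0, bits=14):
    """Amplify per-pixel photodiode signals and quantize them with a
    uniform analogue-to-digital converter (assumed 14-bit, 5 V range)."""
    v = np.clip(np.asarray(photodiode_signal, float) * gain, 0.0, full_scale_v)
    levels = 2 ** bits - 1
    return np.round(v / full_scale_v * levels).astype(np.int32)

# A 2 x 2 patch of arbitrary photodiode signals (volts before amplification).
patch = np.array([[0.00, 0.02], [0.05, 0.10]])
print(digitize_panel(patch))  # digital pixel values in 0..16383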

1.3 Computed tomography

1.3.1 Principles of computed tomography
CT acquires X-ray data at different angles with respect to the patient and then reconstructs these data into images. The basic scanner geometry is shown in Fig. 1.3. A wide X-ray "fan-beam" and a large number of detectors (typically between 512 and 768) rotate synchronously around the patient. The detectors used are ceramic scintillators based on Gd2O2S, with different companies adding trace amounts of various elements to improve performance characteristics. Behind each scintillator is a silicon photodiode to convert light into current flow. The current is amplified and then digitized. The combined data represent a series of one-dimensional projections. Prior to image reconstruction, the data are corrected for the effects of beam hardening, in which the effective energy of the X-ray beam increases as it passes through the patient due to the greater attenuation of lower X-ray energies. Corrections are also made for imbalances in the sensitivities of individual detectors and detector channels. Reconstructing a two-dimensional image from a set of projections p(r,φ), acquired as a function of r,


Figure 1.3 Schematic of the operation of a third-generation CT scanner (left). Photograph of a CT scanner with patient bed (right).

the distance along the projection, and φ, the rotation angle of the X-ray source and detector, is performed using filtered backprojection. Each projection p(r,φ) is Fourier transformed along the r-dimension to give P(k,φ), and then P(k,φ) is multiplied by H(k), the Fourier transform of the filter function h(r), to give P_filt(k,φ). The filtered projections P_filt(k,φ) are inverse Fourier transformed back into the spatial domain and backprojected to give the final image, f̂(x,y):

\hat{f}(x,y) = \sum_{j=1}^{n} \mathcal{F}^{-1}\left[ P_{\text{filt}}(k,\phi_j) \right] \Delta\phi \qquad (1.3)

where \mathcal{F}^{-1} represents an inverse Fourier transform and n is the number of projections. The filter is typically a low-pass cosine or generalized Hamming function. The reconstruction algorithm assumes that all projections are parallel; however, Fig. 1.3 shows that in the case of an X-ray fan-beam this is not the case. The backprojection algorithm is adapted by multiplying each projection by the cosine of the fan-beam angle, with the angle also incorporated into the filter. After reconstruction, the image is displayed as a map of tissue CT numbers, defined by

\text{CT}_o = 1000 \, \frac{\mu_o - \mu_{\text{H}_2\text{O}}}{\mu_{\text{H}_2\text{O}}} \qquad (1.4)

where CT_o is the CT number, and μ_o the linear attenuation coefficient, of the tissue.
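The following is a minimal parallel-beam filtered backprojection sketch in Python/NumPy illustrating Eq. (1.3); the Hamming apodization of the ramp filter and the interpolation-based backprojection are illustrative choices, not the exact algorithm of any commercial scanner.

```python
import numpy as np

def filtered_backprojection(sinogram: np.ndarray, angles_deg: np.ndarray) -> np.ndarray:
    """Reconstruct an image from parallel-beam projections p(r, phi).

    sinogram: shape (n_angles, n_detectors), one row per projection angle.
    A ramp filter apodized by a generalized Hamming window plays the role
    of H(k) in Eq. (1.3).
    """
    n_angles, n_det = sinogram.shape
    k = np.fft.fftfreq(n_det)                            # spatial frequencies
    hamming = 0.54 + 0.46 * np.cos(np.pi * k / k.max())  # apodization window
    H = np.abs(k) * hamming                              # apodized ramp filter

    # Filter each projection along the r-dimension in the k-domain
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=1) * H, axis=1))

    # Backproject: smear each filtered projection across the image grid
    image = np.zeros((n_det, n_det))
    xs = np.arange(n_det) - n_det / 2
    X, Y = np.meshgrid(xs, xs)
    for proj, theta in zip(filtered, np.deg2rad(angles_deg)):
        r = X * np.cos(theta) + Y * np.sin(theta) + n_det / 2  # detector coordinate
        image += np.interp(r, np.arange(n_det), proj, left=0.0, right=0.0)
    return image * np.pi / n_angles
```

As a worked example of Eq. (1.4), water has a CT number of 0 by definition, and air (μ_o ≈ 0) has a CT number of approximately −1000.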

Figure 1.4 Continuous motion of the patient while the X-ray source and detectors rotate causes the X-rays to trace out a helical trajectory through the patient. Multislice detectors (not shown to scale) enable very thin slice thicknesses to be acquired.

1.3.2 Spiral and multislice computed tomography
Spiral CT acquires data as the patient table is moved continuously through the scanner, with the trajectory of the X-ray beam through the patient tracing out a spiral pattern as shown in Fig. 1.4. This technique enables very rapid scan times, which can be used, for example, for a complete chest and abdominal study during a single breath-hold. Full three-dimensional vascular imaging data sets can be acquired very shortly after injection of an iodinated contrast agent. The instrumentation for spiral CT is very similar to that of conventional third-generation CT scanners but uses multiple slip rings for power and signal transmission. The spiral trajectory is defined in terms of parameters such as the spiral pitch (p), defined as the ratio of the table feed (d) per rotation of the X-ray source to the collimated slice thickness (S); for example, a table feed of 10 mm per rotation with a collimated slice thickness of 5 mm gives p = 2. Due to the spiral trajectory of the X-rays through the patient, modification of the backprojection reconstruction algorithm is necessary to form images that correspond closely to those that would have been acquired using a single-slice CT scanner. Images are usually processed in a way that results in considerable overlap between adjacent slices. This has been shown to increase the accuracy of lesion detection, for example, since with overlapping slices there is less chance that a significant portion of the lesion lies between slices. The vast majority of new CT scanners are multislice scanners, i.e., they incorporate an array of detectors in the direction of table motion, as shown in Fig. 1.4, in addition to spiral data acquisition. Multislice spiral CT can be used to image larger volumes in a


Figure 1.5 Three-dimensional volume rendering of the cardiac surface with data from a multislice spiral CT system (left). Three-dimensional cardiac angiogram (right).

given time or to image a given volume in a shorter scan time, compared with conventional spiral CT. The collimated X-ray beam can also be made thinner, giving higher-quality three-dimensional scans with slice thicknesses well below 1 mm. Multislice machines now offered by vendors can scan up to 320 slices and allow very high-resolution images to be acquired, as shown in Fig. 1.5.

1.4 Nuclear medicine

1.4.1 Radioactive nuclides in nuclear medicine
In contrast to X-ray, ultrasound, and MRI, nuclear medicine imaging techniques do not produce an anatomical map of the body but instead image the spatial distribution of radioactive materials (radiotracers) introduced into the body. Nuclear medicine detects early biochemical indicators of disease by imaging the kinetic uptake, biodistribution, and clearance of very small amounts (typically nanograms) of radiotracers that enter the body via inhalation into the lungs, direct injection into the bloodstream, or oral administration. These radiotracers are compounds consisting of a chemical substrate linked to a radioactive element. Abnormal tissue distribution, or an increase or decrease in the rate at which the radiopharmaceutical accumulates in a particular tissue, is a strong indicator of disease. Radiation in the form of γ-rays is detected using an imaging device called a "gamma camera." The vast majority of nuclear medicine scans are performed using technetium-containing radiotracers. 99mTc exists in a metastable state and is formed from 99Mo according to the scheme shown below:

^{99}_{42}\text{Mo} \;\xrightarrow{t_{1/2}\,=\,66\ \text{h}}\; \beta^{-} + \,^{99m}_{43}\text{Tc} \;\xrightarrow{t_{1/2}\,=\,6\ \text{h}}\; ^{99g}_{43}\text{Tc} + \gamma


The energy of the emitted γ-ray is 140 keV, which is high enough for a significant fraction to pass through the body without being absorbed and low enough not to penetrate the collimator septa used in gamma cameras to reject scattered γ-rays. Tc-based radiotracers are produced from an on-site technetium generator, which can be replenished on a weekly basis. The generator consists of an alumina ceramic column with radioactive 99Mo absorbed onto its surface in the form of ammonium molybdate. The column is housed within a lead shield for safety considerations. 99mTc is obtained by flowing an eluting solution of saline through the generator. The solution washes out the 99mTc, which binds very weakly to the alumina, leaving the 99Mo behind. The 99mTc eluted from the generator is in the form of sodium pertechnetate, NaTcO4. The majority of radiotracers, however, are prepared by reducing the pertechnetate to ionic technetium (Tc4+) and then complexing it with a chemical ligand that binds to the metal ion. Examples of ligands include diphosphonate for skeletal imaging, diethylenetriaminepentaacetic acid (DTPA) for renal studies, hexamethylpropyleneamine oxime for brain perfusion, and macroaggregated albumin for lung perfusion.
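A small worked example of the half-lives quoted above (Python; the function name is ours): after one week of decay, the 99Mo loaded on the column retains only about 17% of its initial activity, which is why generators are replenished weekly.

```python
import numpy as np

def activity(a0: float, t_hours: float, t_half_hours: float) -> float:
    """Radioactive decay law: A(t) = A0 * exp(-ln(2) * t / T_half)."""
    return a0 * np.exp(-np.log(2) * t_hours / t_half_hours)

# 99Mo (T1/2 = 66 h) remaining after one week (168 h):
print(activity(1.0, 168.0, 66.0))  # ~0.17 of the initial activity
```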

1.4.2 Nuclear medicine detectors
The gamma camera is based on a large scintillation crystal that transduces the energy of a γ-ray into light. In front of the crystal is a lead collimator, usually of a hexagonal "honeycomb" structure, that minimizes the contribution of Compton-scattered γ-rays, analogous to the setup described previously for X-ray imaging. The crystal itself is made of thallium-activated sodium iodide, NaI(Tl), which converts the γ-ray energy into light at 415 nm. The intensity of the light is proportional to the energy of the incident γ-ray. The light emission decay constant, which is the time for the excited states within the crystal to return to equilibrium, is 230 ns, which means that count rates of 10⁴–10⁵ γ-rays per second can be recorded accurately. The linear attenuation coefficient of NaI(Tl) is 2.22 cm⁻¹, so 90% of the γ-rays that strike the scintillation crystal are absorbed in a 1 cm thickness. Approximately 13% of the energy deposited in the crystal via γ-ray absorption is emitted as visible light. The only disadvantage of the NaI(Tl) crystal is that it is hygroscopic and so must be sealed hermetically. The light photons emitted by the crystal are detected by hexagonal (sometimes square) photomultiplier tubes (PMTs), which are closely coupled to the scintillation crystal via light pipes. Arrays of 61, 75, or 91 PMTs, each with a diameter between 25 and 30 mm, are typically used. The output currents of the PMTs pass through a series of low-noise preamplifiers and are digitized. The PMTs situated closest to a particular scintillation event produce the largest output current. By comparing the magnitudes of the currents from all PMTs, the location of individual scintillations within the crystal can be estimated using an Anger logic circuit. In addition, the summed signal from all the PMTs, termed the "z-signal," is sent to a pulse-height analyzer (PHA) that compares the


Figure 1.6 Schematic of an Anger gamma camera used for planar nuclear medicine (left).

z-signal to a threshold value that corresponds to that produced by a γ-ray with energy 140 keV. If the z-signal is significantly below this threshold, it is rejected as having originated from a Compton-scattered γ-ray. A range of values of the z-signal is accepted, with the energy resolution of the system defined as the full-width at half-maximum (FWHM) of the photopeak, typically about 14 keV (10%) for gamma cameras. The narrower the FWHM of the system, the better it is at discriminating between unscattered and scattered γ-rays (Fig. 1.6).

1.4.3 Single-photon emission-computed tomography
The relationship between single-photon emission-computed tomography (SPECT) and planar nuclear medicine is exactly the same as that between CT and planar X-ray imaging. In SPECT, two or three gamma cameras are rotated around the patient to obtain a set of projections that are then reconstructed to produce a two-dimensional image. Adjacent slices are produced from separate rows of PMTs in the two-dimensional array. SPECT uses instrumentation and radiotracers similar to those of planar scintigraphy, and most SPECT machines can also be used for planar scans. Projections can either be acquired in a "stop-and-go" mode or acquired during continuous rotation of the gamma camera. Image reconstruction can be performed either by filtered backprojection, as in CT, or using iterative methods. In either case, attenuation and scatter correction of the data are required prior to image reconstruction. Attenuation correction is performed using either of two methods. In the first, the attenuation coefficient is assumed to be uniform in the tissue being imaged. A patient outline is formed by fitting an ellipse or circle to the acquired data. This approach works well when imaging homogeneous tissue such as the brain. However, for cardiac applications, for example, a spatially


variant correction must be applied based on direct measurements of tissue attenuation, using a transmission scan with tubes of a known concentration of radioactive gadolinium (153Gd), which emits ~100 keV γ-rays, placed around the patient. The transmission scan can be performed with the patient in place before the actual diagnostic scan or can be acquired simultaneously with the diagnostic scan. Since the attenuation coefficients are measured for 100 keV γ-rays, a fixed multiplication factor is used to convert these numbers to 140 keV. The attenuation map is calculated from the transmission projections using filtered backprojection. The second step in data processing is scatter correction, which must be performed on a pixel-by-pixel basis since the number of scattered γ-rays is not spatially uniform. The most common method uses dual-energy window detection: one energy window is centered at 140 keV with a fractional width (W_m) of ~20%, and a "subwindow" is centered at 121 keV with a fractional width (W_s) of ~7%. The main window contains contributions from both scattered and unscattered γ-rays, but the subwindow has contributions only from scattered γ-rays. The true number of primary γ-rays, C_prim, can be calculated from the total count, C_total, in the main window and the count, C_sub, in the subwindow:

C_{\text{prim}} = C_{\text{total}} - C_{\text{sub}} \, \frac{W_m}{2 W_s} \qquad (1.5)
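Eq. (1.5) is simple enough to state directly in code; the sketch below (function name and counts are illustrative) shows the correction for one pixel.

```python
def primary_counts(c_total: float, c_sub: float,
                   w_main: float = 0.20, w_sub: float = 0.07) -> float:
    """Dual-energy-window scatter correction (Eq. 1.5):
    C_prim = C_total - C_sub * W_m / (2 * W_s)."""
    return c_total - c_sub * w_main / (2.0 * w_sub)

# Example: 10,000 counts in the main window, 1,400 in the subwindow
print(primary_counts(10_000, 1_400))  # 8000.0 primary counts
```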

In addition to filtered backprojection, iterative reconstruction methods are also available on commercial machines. These iterative methods can often give better results than filtered backprojection because accurate attenuation corrections based on transmission source data can be built into the iteration process, as can the overall modulation transfer function of the collimator and gamma camera. Typically, the initial estimate of the distribution of radioactivity is produced using filtered backprojection. Projections are then calculated from this initial estimate and the measured attenuation map and compared with the projections actually acquired. The differences (errors) between these two data sets are computed and the estimated image is correspondingly updated. This

Figure 1.7 Single-photon emission-computed tomography images of the brain.


process is repeated a number of times until a predetermined error threshold is reached. The most commonly used iterative methods are based on maximum likelihood–expectation maximization (ML-EM), the most common implementation being the ordered subsets expectation maximization (OSEM) algorithm. Potential instability in the reconstruction from noisy data normally necessitates applying a filter, such as a two- or three-dimensional Gaussian filter with an FWHM comparable to the intrinsic spatial resolution of the data (Fig. 1.7).
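A minimal sketch of the ML-EM update in Python/NumPy, under the assumption of a dense system matrix A mapping voxel activities to projection bins (in practice A would encode the measured attenuation map and the collimator/camera transfer function mentioned above):

```python
import numpy as np

def ml_em(A: np.ndarray, y: np.ndarray, n_iter: int = 20) -> np.ndarray:
    """Maximum-likelihood expectation-maximization reconstruction.

    A: system matrix, shape (n_projection_bins, n_voxels)
    y: measured projection counts, shape (n_projection_bins,)
    """
    x = np.ones(A.shape[1])      # initial uniform activity estimate
    sens = A.sum(axis=0)         # sensitivity image, A^T * 1
    for _ in range(n_iter):
        forward = A @ x                               # project current estimate
        ratio = y / np.maximum(forward, 1e-12)        # measured / estimated
        x *= (A.T @ ratio) / np.maximum(sens, 1e-12)  # multiplicative update
    return x
```

OSEM accelerates convergence by applying the same multiplicative update over ordered subsets of the projections rather than over all projections at once.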

1.4.4 Positron-emission tomography
Radionuclides used in PET scanning emit positrons, which travel a short distance in tissue before annihilating with an electron, resulting in the formation of two γ-rays, each with an energy of 511 keV. The two γ-rays travel in opposite directions to one another and are detected by a ring of detectors placed around the patient (Fig. 1.8). The location of the two crystals that detect the two antiparallel γ-rays defines a line along which the annihilation occurred. This process is referred to as annihilation coincidence detection and forms the basis of signal localization in PET. The spatial distribution, rate of uptake, and rate of washout of a particular radiotracer are all quantities that can be used to distinguish diseased from healthy tissue. Radiotracers for PET have very short half-lives (e.g., 11C (20.4 min), 15O (2.07 min), 13N (9.96 min), and 18F (109.7 min)) and must be synthesized on-site using a cyclotron. After production they are incorporated via rapid chemical synthesis into structural analogues of biologically active molecules, such as 18F-fluorodeoxyglucose (FDG) and 11C-palmitate. Robotic units are available commercially to synthesize 18FDG, 15O2, C15O2, C15O, and H2 15O.

Figure 1.8 Image formation using PET: antiparallel γ-rays strike pairs of detectors that form a line integral for filtered backprojection (left). Abdominal PET study using fluorodeoxyglucose, with hot spots indicating the presence of small tumors (right).


The individual scintillation crystals used in PET are either bismuth germanate (BGO: Bi4Ge3O12) or, increasingly more commonly, lutetium silicon oxide (LSO: Lu2SiO5:Ce). The advantages of LSO are its short decay time (allowing a short coincidence time, reducing accidental coincidences as described below), a high emission intensity, and an emission wavelength close to 400 nm, which corresponds to the maximum sensitivity of standard PMTs. Multislice capability can be introduced into PET imaging, as for CT, by having a number of detector rings stacked adjacent to one another. Each ring typically consists of 16 "buckets" of 8 × 8 blocks of scintillation crystals, each block coupled to either 16 (BGO) or 4 (LSO) PMTs. The number of rings in a high-end multislice PET scanner can be up to 48. Retractable septa (lead or tungsten) are positioned between each ring; these can be retracted for imaging in three-dimensional mode. When a γ-ray interacts with a particular detector crystal, it produces a number of photons. These photons are converted into an amplified electrical signal at the output of the PMT, which is fed into a PHA. If the electrical signal is above a certain threshold, the PHA generates a "logic pulse" that is sent to a coincidence detector. Typically, this logic pulse is 6–10 ns long. When the next γ-ray is detected, a second logic pulse is sent to the coincidence detector, which adds the logic pulses together and passes the summed signal through a separate PHA. If the logic pulses overlap in time, the system accepts the two γ-rays as having evolved from one annihilation and records a line integral between the two crystals. The PET system can be characterized by its "coincidence resolving time," which is defined as twice the length of the logic pulse, i.e., 12–20 ns in this case. Prior to reconstruction, the data must undergo attenuation correction as well as the removal of accidental and scattered coincidences. Prior to the development of dual CT/PET scanners (see the next section), an external ring source of positron emitters, usually containing germanium-68, was used for a transmission-based calibration. However, with the advent of CT/PET scanners, anatomical information from the CT scan, together with knowledge of tissue attenuation factors, is used for attenuation correction. Accidental coincidences refer to events in which the line integral formed by the detection of the two γ-rays is assigned incorrectly. These occur due to the finite coincidence resolving time of the system, γ-rays passing through the crystal without being detected, and the presence of background radiation. The most common method of estimating accidental coincidences uses additional parallel timing circuitry, which splits the logic pulse from one detector into two components. The first component is used in the standard mode to measure the total number of coincidences. The second component is delayed well beyond the coincidence resolving time so that only accidental coincidences are recorded. The accidental coincidences are then removed from the acquired data. Image reconstruction uses either filtered backprojection or iterative methods. Due to the detection of two γ-rays, the point spread function (PSF) in PET is essentially constant through the patient. The PSF is limited by three factors: (1) the finite


distance the positron travels before annihilation with an electron (~1 mm for 18F), (2) the statistical distribution (180 ± 0.3 degrees) that characterizes the relative trajectories of the two γ-rays, meaning that a 60 cm diameter ring has a spatial resolution of 1.6 mm, whereas a 100 cm diameter ring has a resolution of 2.6 mm, and (3) the size of the detection crystal; one-half the crystal diameter is often assumed. The most common clinical application of PET is in tumor detection using 18F-FDG. In the body, the radiopharmaceutical FDG is metabolized in exactly the same way as naturally occurring 2-deoxyglucose. Once injected, FDG is actively transported across the blood–brain barrier into the cells in brain tissue. Inside the cell, FDG is phosphorylated by glucose hexokinase to give FDG-6-phosphate. This chemical is trapped inside the cell, since it cannot react with G-6-phosphate dehydrogenase, the next step in the glycolytic cycle. The amount of intracellular FDG is therefore proportional to both the rate of initial glucose transport and the subsequent intracellular phosphorylation. Malignant cells, in general, have higher rates of aerobic glucose metabolism than healthy cells, and therefore in PET scans using FDG the tumors show up as areas of increased signal intensity, as seen in Fig. 1.8. Future technical advances in PET technology seem likely to be based on time-of-flight (TOF) PET scanners, which can potentially increase signal-to-noise significantly over today's scanners. If PET detectors have good time resolution, the actual location of the annihilation can be estimated by measuring the difference in the arrival times of the two γ-rays. In its original implementation in the early 1980s, the only scintillator that was sufficiently fast was BaF2, with a timing resolution of 5

[Table: specifications of next-generation sequencing platforms (release year, run time, maximum reads per run, read length, and error rate) for the Illumina MiSeq, NextSeq, HiSeq X, and NovaSeq 6000; the Ion Torrent Ion Proton™ system and Ion S5XL 540; the PacBio Sequel (per SMRT cell); and the Oxford Nanopore MinION Mk 1B, GridION X5, and PromethION (1 flow cell). The Oxford Nanopore instruments list run times from 1 min up to 72 h, read lengths up to 2 Mb, variable maximum reads per run, and error rates >5%.]
Notes: Numbers are either from the NGS company websites (www.illumina.com, www.lifetechnologies.com/iontorrent, www.pacb.com, nanoporetech.com) or allseq.com/kb-category/sequencing-platforms as of January 2019. Max. reads/run for Oxford Nanopore platforms are variable, depending on sample types and fragment lengths.


relatively small number of very long reads instead of generating a large number of short reads. However, the raw read error rate is around 14%, substantially higher than the 0.1%–1% error rate of other leading NGS platforms. Oxford Nanopore is a relatively new sequencing technology that offers real-time, direct DNA/RNA sequencing with ultralong reads of up to 2 Mbp. Long reads make it easier to assemble genomes, as they span repetitive genomic regions that are difficult to assemble with short reads. However, the error rate of Oxford Nanopore is still relatively high at >5%. The read accuracies for PacBio and Oxford Nanopore platforms are still improving, and sequence accuracy can be greatly improved by error correction through iterative read consensus.

3.3.2 Workflow for DNA sequencing data processing
The enormous amount of sequencing data generated by NGS platforms requires adequate computational methods for processing and analysis. Fig. 3.1 shows a typical workflow for genomic DNA sequencing data processing. First, the raw sequence reads go through a quality control and preprocessing step to remove sequencing adaptors and to trim and filter out low-quality reads. There are two approaches for further analysis and genome assembly. When a reference genome sequence is available, the reference-based assembly method can be used to align the sequence reads to the reference genome. The other approach is de novo assembly, without using a reference genome. As mentioned earlier, complex genomes that contain highly repetitive sequences are difficult to assemble with short reads alone. Long reads from either PacBio

Figure 3.1 Workflow for genomic DNA sequencing data processing.

Biological computing

or Oxford Nanopore are often needed. In addition, other methods such as physical mapping (Bionano, www.bionanogenomics.com [15]) or chromatin conformation capture [16–18] can complement NGS methods in generating a more contiguous and accurate genome assembly. When sequence reads are aligned to the reference genome, variant calling can be done to provide information on single-nucleotide polymorphisms, insertions/deletions, and other structural variations. Such variation analyses can help us understand how genetic variations contribute to the phenotype of an organism. After the assembly step, the assembled contigs or scaffolds can be analyzed for genome structure and function. This step is called genome annotation, which typically includes the prediction of protein- and RNA-coding genes and noncoding regions and their functions. Other advanced genome analyses can also be performed, such as comparative genomics for individual gene and gene family functions.
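As a toy illustration of the quality-control and preprocessing step described above (production pipelines use dedicated tools such as Trimmomatic or fastp; all thresholds below are placeholder values):

```python
def quality_trim(seq: str, quals: list, min_q: int = 20):
    """Trim low-quality bases from the 3' end of a read.

    quals holds per-base Phred scores; real tools use windowed
    heuristics rather than this simple end-trimming rule."""
    end = len(quals)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]

def passes_filter(seq: str, quals: list,
                  min_len: int = 50, min_mean_q: float = 25.0) -> bool:
    """Discard reads that are too short or too low quality after trimming."""
    return len(seq) >= min_len and sum(quals) / max(len(quals), 1) >= min_mean_q
```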

3.3.3 Other types of sequencing data and applications
In addition to whole-genome DNA sequencing, NGS technologies can generate other types of high-throughput sequencing data, such as whole-exome sequencing (WES), RNA sequencing (RNA-Seq), and chromatin immunoprecipitation followed by sequencing (ChIP-Seq). These types of data also have wide applications in genomic studies. WES is a technique for sequencing all protein-coding genes in a genome [19]. The WES method can help identify genetic variants that alter protein sequences at a much lower cost than whole-genome sequencing, as it only sequences the protein-coding regions. RNA-Seq is a very powerful technique for sequencing transcriptomes [20,21]. RNA-Seq is used to study the transcription profile of an organism to help understand gene expression and regulation under different conditions. Thanks to recent advances that effectively compartmentalize single cells and molecular reagents, it is now possible to perform single-cell RNA sequencing simultaneously on tens of thousands of cells (www.10xgenomics.com [22–26]). ChIP-Seq combines chromatin immunoprecipitation with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins [27,28]. By combining RNA-Seq and ChIP-Seq technologies, researchers can investigate the complex gene regulatory networks in an organism [29–31].

3.4 Overview of proteomic methods
In a typical untargeted proteomic study, proteins are first enzymatically digested into smaller peptides, which are then analyzed by liquid chromatography coupled with mass spectrometry (LC–MS). This involves chromatographic separation and MS-based analysis of the peptides. Due to differences in hydrophobicity and polarity, among other properties, peptides elute from the LC column at different retention time (RT)


points. The eluted peptides are then analyzed by mass spectrometry (MS) or tandem MS (MS/MS). An LC–MS run therefore contains RT information in the chromatogram, the mass-to-charge ratio (m/z) in the MS spectrum, and the relative ion abundance of each particular ion. MS signals detected throughout the range of chromatographic separation are formatted as a three-dimensional map, which defines the data from a single LC–MS run. The data contain quantitative information on the detected peptides and their associated proteins, which are identified by de novo sequencing or database searching using MS/MS spectra. During data preprocessing, features (usually referred to as peaks) are extracted from the LC–MS data, with each peptide characterized by the isotopic pattern resulting from common isotopes such as 12C and 13C in a set of MS spectra within its elution duration. There are different sources of bias: noise, variability during sample collection and storage, poor experimental design, etc. Furthermore, instrument variability in experiments involving a large number of LC–MS runs leads to drift in RT and intensity measurements. Thus, LC–MS data preprocessing involves various steps, including noise filtering, deisotoping, peak detection, RT alignment, peak matching, and normalization. These preprocessing steps generate a list of detected peaks characterized by their RTs, m/z values, and intensities. The preprocessed data can be used in subsequent analysis, e.g., identification of peaks with significant differences between biological groups and association of these peaks with peptides/proteins through MS/MS identification. In the following subsections, the steps for LC–MS data preprocessing and differential expression analysis in a typical untargeted proteomic study are described [32]. In addition, a brief description of multiple reaction monitoring (MRM) for targeted quantitation of selected proteins and of label-based protein quantitation methods is provided. Finally, a cloud-computing-based open-source software (OSS) package, MS-PyCloud, is introduced.

3.4.1 Noise filtering
LC–MS data are subject to electronic/chemical noise due to contaminants present in the column solvent or instrumental interference. Noise filtering increases the signal-to-noise ratio (SNR) and facilitates the subsequent peak detection step. Software tools such as MZmine 2 [33] integrate noise filtering into the peak detection step to ensure coherence. Smoothing filters such as Gaussian and Savitzky–Golay filters are commonly applied to reduce noise effects. Due to differences in resolution and detection limits among LC–MS platforms, parameters for smoothing filters need to be adaptively selected, preferably through a pilot experiment with similar experimental settings.
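As an illustration of the smoothing step, the sketch below applies SciPy's Savitzky–Golay filter to a synthetic noisy peak; the window length and polynomial order are placeholder values that would be tuned per platform, as noted above.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
x = np.arange(200)
clean = np.exp(-0.5 * ((x - 100) / 8.0) ** 2)        # synthetic elution peak
noisy = clean + 0.05 * rng.standard_normal(x.size)   # add measurement noise

# Parameters must be adapted to the platform's resolution (see text)
smoothed = savgol_filter(noisy, window_length=11, polyorder=3)
```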


3.4.2 Deisotoping
Most chemical elements have naturally occurring isotopes. For example, 12C and 13C are two stable isotopes of the element carbon, with mass numbers 12 and 13, respectively. Consequently, each analyte gives rise to more than one ion peak in an MS spectrum, where the peak arising solely from the most common isotope is called the monoisotopic peak. In LC–MS-based proteomics, each peptide is characterized by an envelope of ion peaks due to its constituent amino acids. 13C constitutes about 1.11% of the carbon species, and the approximately 1 dalton (Da) mass difference between 13C and 12C results in a 1/z difference between adjacent ion peaks in the isotopic envelope, where z is the charge state of the peptide. The deisotoping step integrates the sibling ion peaks originating from the same peptide and summarizes them by the monoisotopic mass. This facilitates the interpretation of LC–MS data and reduces complexity in subsequent analysis. DeconTools is widely used to deisotope MS spectra; it involves (1) identification of the isotopic pattern, (2) prediction of the charge state based on the distance between the ion peaks, and (3) comparison between the observed isotopic pattern and a theoretical distribution generated based on an average residue.
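A minimal sketch of step (2) of this procedure: the charge state follows from the ~1/z spacing of adjacent isotopic peaks (the 13C–12C mass difference is about 1.003 Da; the function name and peak list are illustrative).

```python
def charge_from_spacing(mz_isotopes):
    """Infer charge state z from the m/z spacing of adjacent isotope
    peaks: spacing ~ 1.003/z, so z ~ 1.003/spacing."""
    spacing = mz_isotopes[1] - mz_isotopes[0]
    return round(1.003 / spacing)

# A doubly charged peptide shows isotope peaks ~0.5 m/z apart:
print(charge_from_spacing([500.27, 500.77, 501.27]))  # -> 2
```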

3.4.3 Peak detection
Peak detection is a procedure to determine the existence of a peak in a specific range of RT and m/z values. Most existing methods perform peak detection via a pattern matching process followed by a filtering step based on quantified peak characteristics. Since elution profiles may vary across different RTs, the use of a single pattern throughout the whole RT range in current approaches may lead to inaccurate estimates of peak characteristics and SNR, the latter of which is often employed as a filtering criterion. Also, peak detection is usually performed for each LC–MS run individually, without leveraging the information from other runs in the same experiment. Utilizing information from multiple runs could improve the ability to detect peaks during subsequent peak matching across multiple runs.
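A sketch of a simple detection-plus-filtering step using SciPy's find_peaks; the MAD-based noise estimate and the SNR threshold of 3 are illustrative choices rather than a published method.

```python
import numpy as np
from scipy.signal import find_peaks

def detect_peaks(intensity: np.ndarray, snr_min: float = 3.0) -> np.ndarray:
    """Find local maxima in a smoothed chromatogram and keep those whose
    height exceeds snr_min times a robust (MAD-based) noise estimate."""
    noise = 1.4826 * np.median(np.abs(intensity - np.median(intensity)))
    noise = max(noise, 1e-12)  # avoid a zero threshold on flat signals
    peaks, _ = find_peaks(intensity, height=snr_min * noise, prominence=noise)
    return peaks
```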

3.4.4 Normalization
Due to the presence of various analytical and technical variabilities, normalization of LC–MS-based intensity measurements is needed. One typical normalization approach identifies a reference for ion intensities and makes adjustments based on that reference. Other, more rigorous approaches use regression methods based on a set of matched peaks or spiked-in internal standards. However, it is unclear whether neighboring ions (in terms of RT, m/z value, or intensity) necessarily share a similar drifting trend along the analysis order. Another alternative utilizes quality control (QC) runs to assess and correct variability in LC–MS data. QC runs can be collected using a reference sample or a mixture pooled


from analyzed samples. This idea has been successfully implemented for large-scale metabolomic studies, where variability along the analysis order is estimated for each of the detected peaks through assessment of the QC runs. This circumvents the need to select an arbitrary reference, at the cost of additional experimental challenges in assuring appropriate coverage and reproducible detection of ions in the QC runs. We developed a Bayesian normalization model (BNM) that utilizes scan-level information from LC–MS data. Specifically, BNM uses peak shapes to model the scan-level data acquired from extracted ion chromatograms, with parameters treated within a linear mixed effects model. We extended the model into BNM with drift (BNMD) to compensate for the variability in intensity measurements during long LC–MS runs. We evaluated the performance of our method using synthetic and experimental data. In comparison with several existing methods, BNM and BNMD yielded improvements in terms of decreasing the variability of ion intensities among quality control runs [32–34].
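The QC-run idea can be sketched in a few lines (this is a simple LOESS-based correction, not the Bayesian BNM/BNMD models themselves; function and variable names are hypothetical):

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def qc_drift_correct(order, intensity, is_qc):
    """For one peak, fit a smooth drift curve through the QC runs along
    the injection order and divide it out of every run."""
    qc_x, qc_y = order[is_qc], intensity[is_qc]
    trend = lowess(qc_y, qc_x, frac=0.7, return_sorted=False)
    idx = np.argsort(qc_x)                      # np.interp needs sorted x
    drift = np.interp(order, qc_x[idx], trend[idx])
    return intensity * np.median(qc_y) / np.maximum(drift, 1e-12)
```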

3.4.5 Retention time alignment and peak matching
The peak matching step groups consensus peaks across multiple LC–MS runs prior to subsequent analysis, e.g., identification of significant differences between samples, to ensure a valid comparison of the LC–MS runs. It is also crucial for potential extensions of the peak detection and normalization steps that leverage information from multiple runs. The main challenge in peak matching results from the presence of RT variability among LC–MS runs. Most LC–MS preprocessing pipelines (OpenMS, msInspect, MZmine 2, etc.) integrate the estimation of RT variability into the peak matching step in order to perform RT alignment and achieve reliable identification of consensus peaks. RT alignment approaches can be categorized as (1) feature-based approaches and (2) profile-based approaches. Feature-based approaches perform the alignment task based on detected peaks and rely on the correct identification of a set of consensus peaks among LC–MS runs. Profile-based approaches, on the other hand, utilize chromatograms of the LC–MS runs to estimate variability along RT and then adjust accordingly. Incorporating information from peptide identification can reduce matching ambiguity and improve the alignment result. For example, the PEPPeR platform integrates peak lists and MS/MS identifications for RT alignment. A more sophisticated approach has been implemented in MaxQuant, which leverages each preprocessing step to enhance overall performance. We developed a Bayesian alignment model (BAM) for RT alignment in LC–MS data [35,36]. The alignment model provides estimates of RT variability along with uncertainty measures. The model enables integration of multiple sources of information, including internal standards and clustered chromatograms, in a mathematically rigorous framework. The performance of the model was evaluated on ground-truth data by measuring the coefficient of variation, the RT difference across runs, and peak-matching performance.


We demonstrated that BAM improves RT alignment performance through integration of relevant information [36].
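A minimal feature-based alignment sketch (a piecewise-linear warp anchored at matched landmark peaks; this is a simplification, not the BAM model itself):

```python
import numpy as np

def align_rt(rt, landmarks_run, landmarks_ref):
    """Warp the retention times of one run onto a reference run using
    consensus landmark peaks matched between the two runs."""
    idx = np.argsort(landmarks_run)
    return np.interp(rt, landmarks_run[idx], landmarks_ref[idx])
```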

3.4.6 Differential expression analysis
Following LC–MS data preprocessing, statistical methods are used to identify peaks with significant changes in peptide or protein expression levels between distinct biological groups. Because label-free LC–MS methods measure the relative abundance of peptides/proteins without the use of stable isotope labeling, they require a rigorous workflow to detect differential abundances. In addition to analytical considerations, crucial steps include (1) an experimental design that reduces bias during data acquisition and enables effective utilization of available resources; (2) data preprocessing steps that extract meaningful features; and (3) a statistical test that identifies significant changes while accounting for the experimental design. Good experimental design provides an opportunity to process and compare samples in an unbiased manner. This benefit can diminish if the data analyst fails to conduct subsequent statistical tests in accordance with the experimental design. For example, a t-test is commonly applied for detecting differences between groups; however, its independence assumption is not valid in a study using multiple analytical and/or technical replicates and could lead to false positives. Variability assessment is a key component in both the design of experiments and the evaluation of hypothesis tests. It provides guidelines for replication assignment, sample size calculation, and identification of significant differences in statistical tests. In view of this, mixed effects models can be used as alternatives to the t-test to capture the variability in peak intensities of LC–MS data and to overcome the t-test's failure to account for the dependence structure.
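As a sketch of point (3), a random-intercept mixed model fitted with statsmodels; the file name and column names are hypothetical, and a single random intercept per biological subject is just one simple way to encode the replicate dependence that the t-test ignores.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical layout: one row per technical replicate, with columns
# log_intensity, group (disease/control), and subject (biological sample)
df = pd.read_csv("peak_intensities.csv")

# A random intercept per subject captures the dependence among replicates
model = smf.mixedlm("log_intensity ~ group", df, groups=df["subject"])
result = model.fit()
print(result.summary())  # the 'group' coefficient tests differential expression
```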

3.4.7 Analysis of targeted quantitative proteomic data
Untargeted LC–MS-based proteomics is generally biased toward analysis of the most abundant and observable proteins. Biologically relevant molecular responses, however, are often less discernible in such an analysis. Targeted quantification by MRM using triple quadrupole mass spectrometers has been introduced to overcome the limitations of untargeted analysis. Briefly, the MRM method organizes the analysis around a specific list of peptides associated with targeted proteins, characterized by the m/z values of their precursor and fragment ions. The precursor-fragment ion pairs are called transitions, which are highly specific and unique to the targeted peptides. A specific ion is selected in the first quadrupole based on its precursor m/z value. The ion is fragmented by collision-induced dissociation in the second quadrupole. Only the relevant ions produced by the fragmentation are selected in the third quadrupole. The resulting transitions are then used for quantification. As the data acquisition is highly specific, with less


Figure 3.2 Fourplex iTRAQ labeling. Peptides from four protein samples are labeled with different tags and pooled in MS/MS analysis. An iTRAQ tag includes a reporter ion and a balance ion. The peptide fragments from the four samples have an identical m/z value in the first MS stage. In the second stage of MS/MS, reporter ions with different m/z values are separated from the peptide fragments and measured for abundance.

interference from irrelevant ions, the MRM analysis can yield more sensitive and accurate quantification results.

3.4.8 Introduction to label-based protein quantitation
Isobaric tags for relative and absolute quantification (iTRAQ) and tandem mass tag (TMT) multiplex labeling are chemical labels that are reacted with protein samples. Multiple differentially labeled samples (fourplex or eightplex for iTRAQ; 6-, 8-, or 10-plex for TMT) can be pooled and analyzed by LC–MS/MS simultaneously for protein identification and quantification; see Fig. 3.2.

3.4.9 Introduction of protein data processing pipeline of MS-PyCloud
A cloud-computing-based OSS package, MS-PyCloud [37], was developed by the JHU team to integrate various open-source tools and resources into a pipeline for LC–MS/MS data processing. The pipeline contains various customizable components: data file integrity validation, data quality control, false discovery rate estimation, protein


Figure 3.3 Schematic workflow of the MS-PyCloud pipeline for LC–MS/MS data analysis. The pipeline consists of three major components for the functions of peptide identification, protein inference, and quantitation.

inference, PTM identification, and quantitation of PTMs and global proteins; see Fig. 3.3. MS-PyCloud supports cloud computing on Amazon Web Services. The software is available for download at https://bitbucket.org/mschnau/ms-pycloud.git. Fig. 3.3 shows the schematic workflow of MS-PyCloud, which includes three major components: peptide identification, protein inference, and protein/PTM quantitation. First, raw data from the LC–MS/MS platform are converted to XML format and validated for integrity. Then the data are searched against standard protein databases for peptide identification using publicly available search engines such as MyriMatch [38], MS-GF+ [39], Comet [40], and X! TANDEM [41]. The identified peptides are represented in the pepXML or mzID format. The false discovery rate (FDR) is estimated for identified peptides using a decoy search, and peptide spectrum matches (PSMs) are then filtered based on a custom-defined FDR cutoff. Significant PSMs from all files are grouped to infer the represented proteins parsimoniously, using the bipartite graph analysis algorithm adopted in many protein inference tools [42–44]. Proteins/PTMs can be quantified from both label-free data and labeled tags such as iTRAQ. For label-free data, a spectral counting method is used to measure protein abundance. For labeled data such as fourplex iTRAQ data, the iTRAQ reporter ion intensity is extracted based on the identified peptides and the raw LC–MS/MS data. Peptide and protein quantifications are calculated from the iTRAQ reporter ion intensities at the PSM level, and the corresponding FDR is estimated at both the peptide and the protein level. Relative protein abundance is calculated as the median of the relative abundances of peptides belonging to the same protein, each of which is in turn calculated as the median of the log2 ratio of reporter ion intensities of the PSMs belonging to the same peptide [45–47].
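A sketch of this median-of-medians rollup in pandas (column names are hypothetical, not the MS-PyCloud schema):

```python
import numpy as np
import pandas as pd

def protein_log2_ratios(psms: pd.DataFrame, channel: str,
                        ref: str = "reporter_114") -> pd.Series:
    """Roll PSM-level reporter intensities up to proteins: median over
    PSMs of log2(channel/reference) per peptide, then median over
    peptides per protein, as described in the text."""
    psms = psms.assign(log2_ratio=np.log2(psms[channel] / psms[ref]))
    per_peptide = psms.groupby(["protein", "peptide"])["log2_ratio"].median()
    return per_peptide.groupby("protein").median()
```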


3.5 Biological databases and open-source software
There is an increasing diversity of data sources in the public domain and increasingly flexible tools with which large-scale biological data analytics can be performed. Public-access datasets and software tools provide a link between the biologists generating the large-scale data and the bioinformaticians or statisticians who wish to access the data and tools for method improvement or biological analysis.

3.5.1 Brief introduction of major biological databases
Numerous databases have emerged in this genomic era, providing gateways for researchers to access information about genes, transcripts, and proteins. The information includes the nucleotide sequences of genes, the amino acid sequences of proteins, the functions of genes or proteins, genomic origins, associations with diseases, and much more. Depending on the biological questions that the researcher seeks to answer, these databases can be roughly categorized into a few groups, as listed below.
1. Sequence databases
   a. Nucleotide sequence-based hits
   b. Protein hits
2. Mutations and structural variations
3. Gene functions, pathway databases, ontology resources
4. Gene expression
5. Protein structure and sequence motif databases
6. Protein–protein interaction databases
7. PTM databases
Both the biology and the data collection become considerably more complex beyond the genomic level. While RNA experiments are typically conducted by grinding up an entire sample into a "pool" of transcripts, proteomic or metabolomic experiments often study different subcellular compartments (cytoplasm, membrane, nucleus, metabolites), adding an additional dimension to data acquisition, storage, and interpretation. For a single experiment, the raw data can occupy up to 100 gigabytes of storage. How will this be databased and provided to the public? Again, it would be ideal if computational scientists could have access to the raw data, but the logistics of databasing and public-access inclusion, as well as the complexity of the data, become quite challenging.

3.5.2 Introduction of open-source software
In recent years, OSS has become a common resource and a readily available, powerful tool in all aspects of research and development, including biomedical research. However, many misunderstandings persist about its definition, its use, and its relation to other users and


various business models. Additionally, the role that copyright plays in OSS is poorly understood, and as a result one can face unexpected complications when attempting to take advantage of the power of OSS strategies and business models. First, the copyright of the software belongs to the developer, or to the developer's employer if the software was produced as part of the job. Second, OSS is computer software whose source code is made available under an open-source license in which the copyright holder provides (depending upon the specific license) various rights to study, change, and distribute the software. Software is not considered open-source unless it is licensed under one of the many open-source licenses available today. This need for an open-source license is a cause of widespread misunderstanding among those who mistakenly equate "license" with "royalty." The existence of the license does not mean that OSS is not free: the license simply provides the terms under which the code can be modified, distributed, and reused. The word "free" has two meanings here, freedom and free of cost. The tradition of OSS can be traced back to the beginning of the Unix operating system. AT&T Bell Laboratories developed the Unix operating system in the 1960s and early 1970s. As part of a long-running antitrust dispute with the Department of Justice, AT&T entered into a consent decree with the government to stay out of "any business other than the furnishing of common carrier communications service" [48]. One of the terms of the consent decree was a provision that Bell Systems patents be licensed to others without requiring royalty payments, that is, free of any licensing fees. Though AT&T was under no obligation to provide any technical support or services for Unix software, it was a "godsend" for university computer science departments in terms of academic research and development. Because of this, the Unix operating system soon became very popular in the computer software community. This process established the tradition of open-source code as royalty-free software offered as-is, without additional support. In 1991, Linus Torvalds of Finland succeeded in making the Unix-style operating system more useful by developing both an operating system kernel and tools to install and run the code [49]. Meanwhile, another group, GNU, was developing GNU Hurd, a multiserver microkernel that can utilize various processes on the Unix kernel [50]. Torvalds released his kernel as free open-source code in 1992, and Richard Stallman of the GNU project did major work in integrating Torvalds's kernel with his own GNU system, which resulted in a complete and free operating system called GNU/Linux [51]. The combined operating system was released under GNU's General Public License (GPL), an open-source licensing model. The GPL granted recipients unfettered rights to redistribute the software, with the condition that the source code could not be kept secret. The GNU/Linux system is commonly known as Linux, as we know it today. In 1999, IBM announced that it would invest $1 billion in developing the open-source Linux software, and in addition, Sun Microsystems launched its own open-source initiative, called OpenOffice [52]. These major new investments in open-source gave


Figure 3.4 Three major components of OSS: software, governance, and community.

rise to the industry's confidence in OSS business models. These activities and many others laid the foundation for today's vibrant OSS activity. The essence of OSS is developing a body of code collaboratively with community participation, often with volunteers. The core issue is how to get the best out of a diverse community and at the same time have disciplined code or product management for a superior product that can be shared. This requires the interplay of software, community management, and governance of the decision-making process to ensure efficient collaboration. Thus the OSS ecosystem consists of three interacting major components (software, community, and governance), as shown in Fig. 3.4. As mentioned earlier, OSS must be properly licensed. More than 80 different types of open-source licenses have been approved by the Open Source Initiative [53,54]. These licenses generally fall into one of two categories: permissive licenses and copyleft licenses, which include more restrictions. A permissive license is simple and is the most basic type of open-source license: it allows users to do whatever they want with the software as long as they abide by the notice requirements spelled out in the license included with the software. Permissive licenses such as Apache 2.0 have gained popularity in recent years, as they are seen as more flexible and business friendly. Copyleft licenses such as the GPL, Mozilla Public License, and Eclipse Public License add further restrictions, such as requiring that the source code be made available under the same original copyleft terms under which the initial code was acquired. When one uses multiple open-source codes, additional care is required. License incompatibility can arise because different licenses can contain conflicting terms, rendering it impossible to legally combine source code from separately licensed OSS to create and publish new OSS. One has to be very careful not to mix incompatible licenses, because the terms of the more restrictive license can supersede the terms of a more flexible one, resulting in a restrictive license for the resultant code [55]. This is what is commonly known as a viral license, because it can impose the same restrictions on all derived works downstream. The following diagram by Wheeler highlights the incompatibility of various OSS


Figure 3.5 Free-Libre/open-source software licenses workflow diagram.

licenses: if code under a permissive license is combined with more protective code, the resultant combined code becomes more protective (Fig. 3.5). When one has selected an open-source license to support a business model, one needs tools such as GitHub to support distribution and change management of the software. GitHub, the largest open repository of OSS, is a hosting service that offers the distributed version control and source code management functionality of Git [56]. During the earlier years of Linux development, changes in the software were passed around as patches and archived files among collaborators. In 2005, Git was conceived by Linus Torvalds to automate software development and version control involving many contributors around the world. GitHub has become the preferred hosting service of many major corporations and millions of developers around the world. It has been acquired by Microsoft Corporation. The second and critical component of the open-source ecosystem is the community. The community can consist of code developers, open-source promoters, bug fixers, documentation experts, mentors, testers, and others who share the common interest of improving the code at hand. Community members often work for competing organizations that share common interests based on business need. In any successful open-source operation, there are a number of highly motivated members who are willing to share and learn from others with complementary expertise. These communities, small and large, usually have sponsoring individuals, research labs, or corporations. Some communities are completely open to all, while others may limit participation to certain groups and individuals, depending on the business models of the community and its sponsors. The third component of the ecosystem is governance [57]. Governance has to do with the decision-making processes for managing the code and community. The open-source community uses several decision-making models. The benevolent dictator model refers to one person (usually the initial author of the project) who has final say and controls all major project decisions. Python and Linux are


some classic examples. When a company launches an open-source project, the company can be the benevolent dictator. This dictator, however, must be benevolent, seeking out the best from the community and avoiding forking of the source code. In the meritocracy model, active project contributors who demonstrate high merit earn decision-making authority. Decisions are usually made based on consensus voting. The contribution model is based on the people who do the most work: they are recognized as the most influential. Major project decisions are made through a consensus-seeking process, rather than voting, that tries to incorporate as many community perspectives as possible. The governance model offers a framework for making many decisions, such as
• establishing shared goals and actions, including ROI (return on investment) objectives
• establishing a product management strategy and product life cycle
• soliciting volunteers
• providing a mechanism for code submissions and repository access
• growing the community through efficiency, quality, and volume of communication
The development and governance of a fully functional collaborative community can be a significant challenge, and the quality of the resulting community will ultimately determine the success of any open-source activity [58]. Organizing a community takes time, effort, and resources. The size and diversity of an open-source community will depend on the potential user base (market size) of the software being developed or reused. If the potential market is large, a community can be formed quickly and become self-sustaining within a short period. However, if the software has limited applications within a few organizations, building a community with multiple markets will take a substantial amount of time and resources.

3.5.3 Usability of open-source software
There is an explosion of available OSS in a variety of code repositories, GitHub being one of the most popular. However, not all openly available code is usable. There is an emerging effort by OSEHRA Inc. to develop an OSS usability standard with multiple levels of certification, based on requirements involving community peer review: (1) a proper open-source license, (2) availability of documentation, (3) proper coding conventions, (4) availability of automated unit and regression tests, (5) testing results, and (6) availability of test data [59].

3.5.4 Commercial products based on open-source software
A growing number of successful commercial products are based on OSS. One of the pioneers in this space is the Red Hat product line, based on the open-source community activities of Fedora. OSS is freely available as-is, without any warranty or support service, following the early tradition established by AT&T. For example, Fedora


is an open-source community from which Red Hat draws many open-source innovations for its commercial product [60]. Many software engineers with sufficient code expertise can use the Fedora code as-is. On the other hand, corporations such as Red Hat can productize OSS by packaging the software with additional capabilities, testing, warranties, services, training, and user manuals. These corporations often become sponsors of the relevant communities. Red Hat has been acquired by IBM. Open-source-related operations encompass more than just using software from a website. A project can have a long pedigree of initial release, evolution, a collaboration community, and licenses based on a certain business model, community, and sponsors. It is important to understand this pedigree for successful adoption of, and contribution to, the code as part of a community.

3.6 Biological network analysis

3.6.1 Brief introduction of biological network analysis
The rapid advance of high-throughput genomic technologies and the growing number of large-scale public biological datasets provide ample opportunities for bioinformatics researchers to study cellular activities at the individual gene level and at the higher level of systematic networks. Although huge progress has been made by the scientific research community in discovering new genes, transcripts, and proteins and their functions, finding the regulatory mechanisms between genes and the network structure of these regulations remains one of the key goals of systems biology [61]. Genes do not act in isolation but instead work as parts of complex networks to perform various cellular processes [62]. Many human diseases, including various types of cancer, are caused by dysregulated genes. DNA and/or epigenetic mutations within a gene region or its regulatory elements can lead to perturbation of biological pathways and cause topological changes in the regulatory network structure. This can ultimately impair normal cell physiology and cause disease. For example, cancer driver mutations in a transcription factor can alter its interactions with many of the target genes important in cell proliferation. Various approaches have been proposed and tested for inferring conserved biological networks from high-throughput gene expression data, including Bayesian networks [63], probabilistic Boolean networks [64], state-space models [65], and network component analysis [66].

3.6.2 Introduction of differential dependency network analysis
Genetic regulatory networks are context-specific and dynamic in nature. Under different conditions, different regulatory components and mechanisms are selectively activated or deactivated [67]. One example is that the topology of the underlying biological network changes in response to internal or external stimuli, where cellular components exert their functions through interactions with other molecular components [68]. To explicitly address differential network analysis, some initial efforts have recently been reported [6,67,69,70]. However, most differential analysis methods that compare gene expression datasets from two conditions address the question of which genes are significantly differentially expressed between conditions. In addition to asking "which genes are differentially expressed," the new question is "which genes are differentially connected." It is important to focus on and examine the topological changes in transcriptional networks between disease and normal conditions or under different stages of cell development. For example, a deviation from normal regulatory network topology may reveal the mechanism of pathogenesis, and the genes that undergo the most network topological changes may serve as biomarkers or drug targets. The differential dependency network (DDN) method addresses a distinct question concerning which genes are significantly rewired in the inferred gene-regulatory network in disease tissues [71–73]. DDN infers gene networks based on conditional dependencies among genes, a key type of probabilistic relationship among genes that is fundamentally distinct from correlation. If two genes are conditionally dependent, then by definition their expression levels are still correlated even after accounting for (e.g., regressing out) the expression levels of all other genes. Thus, a conditional dependence relationship is less likely to reflect transitive effects than mutual correlation and provides stronger evidence that those genes are functionally related. These functional relationships could be regulatory, physical, or other molecular functionality that causes the expression of two genes to be tightly coupled. DDN uses local dependency models to characterize the dependencies of genes in the network and to represent local network structures. Local dependency models decompose the whole network into a series of local networks, which serve as the basic elements of the network used for statistical testing. DDN uses an efficient neighborhood selection strategy based on penalized regression to enable the inference of a genome-wide network. Unlike pairwise correlation, the conditional dependence between two genes cannot be measured based on just the expression levels of these two genes; instead, all possible networks among the genes must be considered to find the one that best explains the expression data. Mathematically, DDN solves, for each node, an optimization problem with the objective function

$$
f(\beta_i) = \frac{1}{2}\left\| y_i - X\beta_i \right\|_2^2 + \lambda_1 \sum_{j=1}^{p} \left(1 - q W_{ji}\right)\left( \left|\beta_{ji}^{(1)}\right| + \left|\beta_{ji}^{(2)}\right| \right) + \lambda_2 \left\| \beta_i^{(1)} - \beta_i^{(2)} \right\|_1
$$

where $i$ is the node index (e.g., mRNA, protein, miRNA); $\beta_{ji}$ is the regression coefficient from node $i$ to node $j$, with superscripts (1) and (2) denoting the two biological conditions being compared; $y_i$ and $X$ are the expression values of the dependent and input variables, respectively; $p$ is the number of nodes; $W_{ji}$ is the a priori link from node $i$ to node $j$ in the adjacency matrix extracted from existing biological knowledge (e.g., KEGG, PPI, STRING, HPRD, IPA); $q$ is the degree of prior influence; and $\lambda_1$ and $\lambda_2$ are the regularization parameters on the two penalty terms, used to assure both a sparse common network and a sparse differential network (significant rewiring). DDN adopts the block coordinate descent (BCD) algorithm, which gives an efficient closed-form solution for each block, to perform the above convex optimization, fully exploiting the block-wise separable penalties in Lasso-type models. Note that the differential connections, indicated by the differences between $\beta_{ji}^{(1)}$ and $\beta_{ji}^{(2)}$, are of particular interest, because such network rewiring may reveal pivotal information on how the biological system responds to different biological conditions or interventions. The network rewiring under different conditions is inferred jointly by solving the above optimization sequentially for all nodes. The detection procedure proposed by DDN assures the statistical significance of the detected network topological changes by performing a permutation test on individual local structures. DDN can also pinpoint "hot spots" in the network where genes exhibit topological changes between two conditions above a given significance level, focusing on identifying the hub genes involved in changes in the topology of gene networks between two distinct biological statuses, for example, between diseased and normal tissues. The hub genes identified by DDN could play key roles in the regulatory mechanisms that differ between the two conditions. For example, in a GPAA project, the hub genes identified by DDN were found to be significantly enriched in the tricarboxylic acid cycle pathway, which generates energy in cells (Fig. 3.6) [1]. In CPTAC ovarian cancer data, DDN analysis of ovarian cancer samples helped identify a subnetwork of 30 proteins involved in histone acetylation or deacetylation (Fig. 3.7) [2]. The differential acetylation was experimentally verified using synthetic peptides and targeted analysis [2].

Figure 3.6 (A) Differential dependency network analysis of fibrous plaque proteins collected from left anterior descending artery samples. The plot depicts rewiring of the protein network between fibrous plaque (FP) samples and normal samples. Green nodes indicate downregulated proteins and red nodes indicate upregulated proteins in FP samples. Green edges indicate significant correlation in normal samples but not in FP samples. Red edges indicate significant correlation in FP samples. Black squares indicate differential network hub proteins. GO term analysis of the differential network hub proteins indicated significant enrichment of tricarboxylic acid cycle proteins (P = 4.8 × 10⁻⁶). (B) Hub genes enriched in the TCA cycle pathway. Every protein indicated by a red box was quantitatively lower in FP samples than in normal samples.

Figure 3.7 (A) Differential dependency network (DDN) analysis of protein expression data from ovarian cancer samples in a CPTAC project revealed a subnetwork of proteins that displayed distinct coexpression patterns between HRD (homologous recombination deficiency) and non-HRD patients. A significant enrichment of HDAC1 and its coregulated proteins is observed in tumors with HRD. (B) H4 acetylation regulated by HDAC1 is found to be significantly higher in non-HRD tumors. (C) In independent ovarian cancer samples from a CPTAC prospective dataset, decreased acetylation of H4 in HRD tumors is also observed.
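As an illustrative sketch only, and not the BCD solver used by the DDN authors, the per-node objective above can be written directly in a general-purpose convex optimization library such as cvxpy. The data, penalty values, and variable names below are synthetic assumptions, and the squared-error term is split by condition, which is equivalent when X is block-structured across the two conditions.

```python
# Sketch of the per-node DDN objective in cvxpy (synthetic data; not the
# authors' block coordinate descent implementation).
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 20                                  # samples per condition, nodes
X1, X2 = rng.normal(size=(n, p)), rng.normal(size=(n, p))
y1, y2 = rng.normal(size=n), rng.normal(size=n)
W = rng.integers(0, 2, size=p)                 # prior links for node i (0/1)
q, lam1, lam2 = 0.5, 0.1, 0.05                 # prior influence, penalties

b1, b2 = cp.Variable(p), cp.Variable(p)        # beta_i under conditions (1), (2)
fit = 0.5 * (cp.sum_squares(y1 - X1 @ b1) + cp.sum_squares(y2 - X2 @ b2))
sparsity = lam1 * cp.sum(cp.multiply(1 - q * W, cp.abs(b1) + cp.abs(b2)))
rewiring = lam2 * cp.norm1(b1 - b2)            # penalizes differential edges
cp.Problem(cp.Minimize(fit + sparsity + rewiring)).solve()

# Nonzero differences between the two coefficient vectors are candidate
# rewired connections for node i.
print(np.flatnonzero(np.abs(b1.value - b2.value) > 1e-3))
```

In the full DDN procedure, this problem would be solved for every node, and the significance of each detected difference between the two coefficient vectors would then be assessed with the permutation test described above.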

3.7 Summary
The bioinformatics and biological computing associated with DNA and mRNA are quite mature. The use of genomic DNA sequencing of humans and many other organisms creates an anchor for hundreds of associated databases: DNA polymorphisms, mRNA and EST mapping, microRNA, noncoding RNA, enhancers, evolutionary conservation, and others. Proteins are orders of magnitude more complex than DNA or mRNA patterns, with posttranslational modifications, subcellular localization, and binding partners all dictating protein activity and function. High-throughput proteomics is coming of age with the advent of high-resolution mass spectrometers and associated spectra-matching databases. Proteomic profiling using differentially labeled solutions of peptides is reaching widespread use, but bioinformatics and biological computing approaches are just beginning to be developed. Future challenges in biological computing include defining cell- and tissue-specific pathways and networks and the response of networks to environmental and physiological challenges. A focus will be on the integration of DNA, mRNA, and proteomics datasets and databases, with attempts to garner support for established networks while defining new networks through a combination of computational modeling and experimental validation.

Acknowledgments
The authors are grateful for research grant support from the National Institutes of Health under Grants HL111362-05A1, HL133932, W81XWH-18-1-0723 (BC171885P1), CA184902-01, CA185188, and GM123766.

References
[1] D.M. Herrington, et al., Proteomic architecture of human coronary and aortic atherosclerosis, Circulation 137 (2018) 2741–2756.
[2] H. Zhang, et al., Integrated proteogenomic characterization of human high-grade serous ovarian cancer, Cell 166 (2016) 755–765.


[3] W. Liu, et al., Copy number analysis indicates monoclonal origin of lethal metastatic prostate cancer, Nature Medicine 15 (2009) 559–565.
[4] R. Clarke, et al., The properties of high-dimensional data spaces: implications for exploring gene and protein expression data, Nature Reviews Cancer 8 (2008) 37–49.
[5] E.P. Hoffman, et al., Expression profiling: best practices for data generation and interpretation in clinical trials, Nature Reviews Genetics 5 (2004) 229–237.
[6] A.L. Barabasi, N. Gulbahce, J. Loscalzo, Network medicine: a network-based approach to human disease, Nature Reviews Genetics 12 (2011) 56–68.
[7] F. Sanger, et al., Nucleotide sequence of bacteriophage phi X174 DNA, Nature 265 (1977) 687–695.
[8] E.L. van Dijk, Y. Jaszczyszyn, D. Naquin, C. Thermes, The third revolution in sequencing technology, Trends in Genetics 34 (2018) 666–681.
[9] E.R. Mardis, DNA sequencing technologies: 2006–2016, Nature Protocols 12 (2017) 213–218.
[10] M. Kchouk, J. Gibrat, M. Elloumi, Generations of sequencing technologies: from first to next generation, Biology and Medicine 9 (2017) 8.
[11] S. Goodwin, J.D. McPherson, W.R. McCombie, Coming of age: ten years of next-generation sequencing technologies, Nature Reviews Genetics 17 (2016) 333–351.
[12] J.M. Heather, B. Chain, The sequence of sequencers: the history of sequencing DNA, Genomics 107 (2016) 1–8.
[13] E.L. van Dijk, H. Auger, Y. Jaszczyszyn, C. Thermes, Ten years of next-generation sequencing technology, Trends in Genetics 30 (2014) 418–426.
[14] C.-Y. Lee, et al., Common applications of next-generation sequencing technologies in genomic research, Translational Cancer Research 2 (2013) 33–45.
[15] H. Stankova, et al., BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes, Plant Biotechnology Journal 14 (2016) 1523–1531.
[16] O. Dudchenko, et al., De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds, Science 356 (2017) 92–95.
[17] N.L. van Berkum, et al., Hi-C: a method to study the three-dimensional architecture of genomes, Journal of Visualized Experiments (2010).
[18] J.M. Belton, et al., Hi-C: a comprehensive technique to capture the conformation of genomes, Methods 58 (2012) 268–276.
[19] L. Mamanova, et al., Target-enrichment strategies for next-generation sequencing, Nature Methods 7 (2010) 111–118.
[20] Z. Wang, M. Gerstein, M. Snyder, RNA-Seq: a revolutionary tool for transcriptomics, Nature Reviews Genetics 10 (2009) 57–63.
[21] J. Costa-Silva, D. Domingues, F.M. Lopes, RNA-Seq differential expression analysis: an extended review and a software tool, PLoS One 12 (2017) e0190152.
[22] F. Tang, et al., mRNA-Seq whole-transcriptome analysis of a single cell, Nature Methods 6 (2009) 377–382.
[23] E. Shapiro, T. Biezuner, S. Linnarsson, Single-cell sequencing-based technologies will revolutionize whole-organism science, Nature Reviews Genetics 14 (2013) 618–630.
[24] S. Liu, C. Trapnell, Single-cell transcriptome sequencing: recent advances and remaining challenges, F1000Research 5 (2016).
[25] T.K. Olsen, N. Baryawno, Introduction to single-cell RNA sequencing, Current Protocols in Molecular Biology 122 (2018) e57.
[26] B. Hwang, J.H. Lee, D. Bang, Single-cell RNA sequencing technologies and bioinformatics pipelines, Experimental & Molecular Medicine 50 (2018) 96.
[27] S. Jaini, et al., Transcription factor binding site mapping using ChIP-seq, Microbiology Spectrum 2 (2014).
[28] R. Nakato, K. Shirahige, Recent advances in ChIP-seq analysis: from quality management to whole-genome annotation, Briefings in Bioinformatics 18 (2017) 279–290.
[29] J.T. Wade, Mapping transcription regulatory networks with ChIP-seq and RNA-seq, Advances in Experimental Medicine & Biology 883 (2015) 119–134.
[30] C. Smith, A.M. Stringer, C. Mao, M.J. Palumbo, J.T. Wade, Mapping the regulatory network for Salmonella enterica serovar Typhimurium invasion, mBio 7 (2016).
[31] G. Pavesi, ChIP-Seq data analysis to define transcriptional regulatory networks, Advances in Biochemical Engineering 160 (2017) 1–14.
[32] T.H. Tsai, M. Wang, H.W. Ressom, Preprocessing and analysis of LC-MS-based proteomic data, Methods in Molecular Biology 1362 (2016) 63–76.
[33] M.R. Nezami Ranjbar, Y. Zhao, M.G. Tadesse, Y. Wang, H.W. Ressom, Gaussian process regression model for normalization of LC-MS data using scan-level information, Proteome Science 11 (2013) S13.
[34] M.R. Ranjbar, M.G. Tadesse, Y. Wang, H.W. Ressom, Bayesian normalization model for label-free quantitative analysis by LC-MS, IEEE/ACM Transactions on Computational Biology and Bioinformatics 12 (2015) 914–927.
[35] T.H. Tsai, et al., Multi-profile Bayesian alignment model for LC-MS data analysis with integration of internal standards, Bioinformatics 29 (2013) 2774–2780.
[36] T.H. Tsai, M.G. Tadesse, Y. Wang, H.W. Ressom, Profile-based LC-MS data alignment: a Bayesian approach, IEEE/ACM Transactions on Computational Biology and Bioinformatics 10 (2013) 494–503.
[37] L. Chen, et al., MS-PyCloud: an open-source, cloud computing-based pipeline for LC-MS/MS data analysis, 2018, 320887.
[38] D.L. Tabb, C.G. Fernando, M.C. Chambers, MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis, Journal of Proteome Research 6 (2007) 654–661.
[39] S. Kim, P.A. Pevzner, MS-GF+ makes progress towards a universal database search tool for proteomics, Nature Communications 5 (2014) 5277.
[40] J.K. Eng, T.A. Jahan, M.R. Hoopmann, Comet: an open-source MS/MS sequence database search tool, Proteomics 13 (2013) 22–24.
[41] R. Craig, R.C. Beavis, TANDEM: matching proteins with tandem mass spectra, Bioinformatics 20 (2004) 1466–1467.
[42] Z.Q. Ma, et al., IDPicker 2.0: improved protein assembly with high discrimination peptide identification filtering, Journal of Proteome Research 8 (2009) 3872–3881.
[43] R. Patro, C. Kingsford, Predicting protein interactions via parsimonious network history inference, Bioinformatics 29 (2013) i237–i246.
[44] B. Zhang, M.C. Chambers, D.L. Tabb, Proteomic parsimony through bipartite graph analysis improves accuracy and transparency, Journal of Proteome Research 6 (2007) 3549–3557.
[45] K.W. Lau, A.R. Jones, N. Swainston, J.A. Siepen, S.J. Hubbard, Capture and analysis of quantitative proteomic data, Proteomics 7 (2007) 2787–2799.
[46] P.L. Ross, et al., Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents, Molecular & Cellular Proteomics 3 (2004) 1154–1169.
[47] S.M. Herbrich, et al., Statistical inference from multiple iTRAQ experiments without using common reference standards, Journal of Proteome Research 12 (2013) 594–604.
[48] U.S. House, Committee on the Judiciary, 1958.
[49] The Editors of Encyclopaedia Britannica, Britannica, 2018 (accessed 2019).
[50] G.H. Community, Free Software Foundation, Inc., 2010 (accessed 2019).
[51] R. Stallman, Free Software Foundation, gnu.org, 1997 (accessed 2019).
[52] IBM, 2019.
[53] Open Source Initiative, Licenses by Name, 2019.
[54] Open Source Initiative, Open Source Licenses by Category, 2019.
[55] D. Wheeler, dwheeler.com, 2007 (accessed 2019).
[56] GitHub, Inc., Microsoft, 2019.
[57] Open Source Guides, GitHub, 2019.
[58] D. Wynn, R. Pratt, R. Bradley, in: J. Chenok, N. Gardner (Eds.), Innovation Series, vol. 35, IBM Center for The Business of Government, 2015.
[59] OSEHRA, 2019.


[60] Red Hat, Inc., Linux Platform, Red Hat, 2019.
[61] H. Kitano, Systems biology: a brief overview, Science 295 (2002) 1662–1664.
[62] M. Isalan, et al., Evolvability and hierarchy in rewired bacterial gene networks, Nature 452 (2008) 840–845.
[63] N. Friedman, Inferring cellular networks using probabilistic graphical models, Science 303 (2004) 799–805.
[64] I. Shmulevich, et al., Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks, Bioinformatics 18 (2002) 261–274.
[65] C. Rangel, et al., Modeling T-cell activation using gene expression profiling and state-space models, Bioinformatics 20 (2004) 1361–1372.
[66] J.C. Liao, et al., Network component analysis: reconstruction of regulatory signals in biological systems, Proceedings of the National Academy of Sciences of the United States of America 100 (2003) 15522–15527.
[67] A. Califano, Rewiring makes the difference, Molecular Systems Biology 7 (2011) 463.
[68] L. Hood, et al., Systems biology and new technologies enable predictive and preventative medicine, Science 306 (2004) 640–643.
[69] S. Bandyopadhyay, et al., Rewiring of genetic networks in response to DNA damage, Science 330 (2010) 1385–1389.
[70] R. Gill, S. Datta, S. Datta, A statistical framework for differential network analysis from microarray data, BMC Bioinformatics 11 (2010) 95.
[71] Y. Tian, et al., Integration of network biology and imaging to study cancer phenotypes and responses, IEEE/ACM Transactions on Computational Biology and Bioinformatics 11 (2014) 1009–1019.
[72] Y. Tian, et al., KDDN: an open-source Cytoscape app for constructing differential dependency networks with significant rewiring, Bioinformatics 31 (2015) 287–289.
[73] Y. Tian, et al., Knowledge-fused differential dependency network models for detecting significant rewiring in biological networks, BMC Systems Biology 8 (2014) 87.

CHAPTER FOUR

Picture archiving and communication systems and electronic medical records for the healthcare enterprise
Brent J. Liu¹ and H.K. Huang¹,²,³
¹University of Southern California, Los Angeles, CA, USA
²Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
³Shanghai Institute of Technical Physics, The Chinese Academy of Sciences, Shanghai, China

4.1 Introduction
Picture archiving and communication systems (PACSs), based on digital, communication, display, and information technologies, have revolutionized the practice of radiology, and in a sense the entire clinical continuum in medicine, during the past 10 years. This chapter introduces the basic concept, terminology, technology development, and implementation of PACS, as well as its integration and experiences within clinical practice. There are many advantages to introducing digital, communication, display, and information technologies to the conventional paper- and film-based operation in radiology and medicine. In addition, the integration of the hospital information system (HIS) with electronic medical records (EMRs), and its impact on the healthcare enterprise and PACS in the last 10 years, are discussed in the latter portion of the chapter.

4.1.1 The role of the picture archiving and communication system in the clinical environment
PACS and information technology (IT) can be utilized to improve healthcare delivery workflow efficiency, resulting in faster healthcare delivery and reduced operating costs. With all these benefits, digital, communication, and information technologies are gradually changing the method of acquiring, storing, viewing, and communicating medical images and related information in the healthcare industry. One natural development along this line is the emergence of digital radiology departments and the digital healthcare delivery environment. A digital radiology department has two components: a radiology information system (RIS) and a digital imaging system. The RIS is a subset of the HIS or clinical management system (CMS). When these systems are combined with the EMR system, which manages selected data of the patient, the arrival of the totally filmless and paperless healthcare delivery system can become a reality. The digital imaging system, sometimes referred to as a PACS or an image management and communication system, involves image acquisition, archiving, communication, retrieval, processing, distribution, and display. A digital healthcare environment consists of the integration of EMR, HIS/CMS, PACS, and other digital clinical systems. The healthcare delivery market related to PACS and IT is reaching one billion dollars per year (excluding imaging modalities) and continues to grow.

4.1.2 The role of the picture archiving and communication system in medical imaging informatics
PACS originated as an image management system for improving the efficiency of radiology practice. However, it has evolved into an enterprise-wide healthcare system that integrates information media in multiple forms, including voice, text, medical records, waveform images, and video recordings. Integrating these various data types requires multimedia technology including hardware platforms, information systems and databases, communication protocols, display technology, and system interfacing and integration. As PACS continues to grow and evolve in its role within the clinical continuum, it has been integrated with these various enterprise-wide media formats to provide a more complete patient record. This wealth of data becomes the fundamental basis for new approaches in medical research and practice through the discipline of medical imaging informatics, thus ultimately improving overall healthcare delivery, research, and education.

4.1.3 General picture archiving and communication system design: introduction and impact
A PACS consists of image and data acquisition, storage, and display subsystems integrated by digital networks and application software. PACS design should emphasize system connectivity and integration. A general multimedia data management system that is easily expandable, flexible, and versatile in its operation calls for both top-down management to integrate various HISs and a bottom-up engineering approach to build a foundation (i.e., the PACS infrastructure). A hospital-wide or enterprise PACS is attractive to hospital administrators because it provides economic justification through a return-on-investment cost analysis for implementing the system. In addition, proponents of PACS are convinced that its ultimately favorable cost-benefit ratio should not be evaluated against the resources of the radiology department alone but should extend to the entire healthcare enterprise. Many hospitals and enterprise-level healthcare entities around the world have implemented large-scale PACS and provided solid evidence that PACS improves the efficiency of healthcare delivery while concurrently saving hospital operational costs. From an engineering point of view, the PACS infrastructure is the basic design concept that ensures the PACS includes features such as standardization, open architecture, expandability for future growth, connectivity, reliability, fault tolerance, and cost-effectiveness. This design philosophy can be constructed in a modular fashion with the infrastructure design described in Section 4.2.

4.1.4 Chapter overview
This chapter first describes the PACS infrastructure and its various components in detail. The latter half of the chapter covers implementation and integration strategies for installing a PACS within a healthcare environment, as well as clinical experiences derived from various healthcare institutions' PACS implementations. The final sections introduce the EMR and its integration with the HIS and PACS to provide a more complete data record for the patient and enable large-scale health data analytics to ultimately improve overall healthcare delivery, research, and education.

4.2 Picture archiving and communication system infrastructure
4.2.1 Introduction to picture archiving and communication system infrastructure design
PACS infrastructure design is a necessary framework for the integration of distributed and heterogeneous imaging modalities while supporting intelligent database management and clinical workflow of all patient-related information. The infrastructure design offers an efficient means of viewing, analyzing, and documenting image study results and provides a distribution method for effectively communicating study results to both radiologists and referring physicians. The PACS infrastructure consists of a basic skeleton of components (imaging modality interfaces, data storage devices, computers, communication networks, and display systems) integrated with a standardized and robust software system with flexibility for communication, database management, storage management, job scheduling, interprocessor communication, error handling, and network monitoring. When new technologies become available, the infrastructure design provides a fundamental framework for one-to-one replacement of these hardware components. The infrastructure as a whole is versatile and can incorporate rules to reliably perform not only basic PACS management operations but also more complex research, clinical service, and educational requests. The software modules of the infrastructure utilize the ability to handshake and communicate at a system level to permit the components to work together as a system rather than as individual networked computers. The corresponding components of the general PACS infrastructure include patient and image data servers, imaging modalities, data/modality interfaces, the PACS server with database and archive, and display workstations, connected by communication networks for handling the data/image flow in the PACS and tuned for more efficient clinical workflow. Images and data stored in the PACS can be extracted from the archive and transmitted to application servers for various uses. Fig. 4.1 shows the basic PACS

Figure 4.1 Generic picture archiving and communication system components and data flow.
components and data flow. This diagram will be expanded to present additional details in later chapters, as well as new technologies that have transformed the overall PACS landscape. The PACS application server concept shown at the bottom of Fig. 4.1 broadens the role of PACS in the healthcare delivery system, as it has contributed to the advance of the medical imaging informatics field during the past several years. The Web server is the current state-of-the-art solution for distributing PACS studies through wide area networks (WANs) to clinics and physicians' offices. Sometimes the Web server is used within healthcare enterprise local area networks (LANs) to distribute PACS studies throughout the hospital or healthcare institution.

4.2.2 Industry standards
Transmission of images and textual information between healthcare information systems has always been challenging for two main reasons. First, information systems use different computer hardware and software platforms, and second, images and data are generated from various imaging modalities by different manufacturers. The healthcare industry standards Health Level 7 (HL7) and Digital Imaging and Communications in Medicine (DICOM) have facilitated the integration of these heterogeneous, disparate medical images and textual data into an organized system. In general, interfacing two healthcare components requires two ingredients: a common data format and a communication protocol. HL7 is the standard textual data format, whereas DICOM includes image and textual data formats and communication protocols. By conforming to the HL7 standard, it is possible to share data among the HIS/EMR, RIS, and PACS. By adopting the DICOM standard, medical images generated from a variety of imaging modalities and manufacturers can be interfaced as an integrated healthcare system. These two standards will be discussed in more detail in the following paragraphs. Furthermore, the healthcare initiative titled Integrating the Healthcare Enterprise (IHE), which provides a clinical workflow model for driving the adoption of standards, will also be addressed. With the available HL7 and DICOM standards, IHE combines the standards with clinical workflow profiles to persuade users and manufacturers to adopt and use them as best-practice methods within daily clinical practice.
4.2.2.1 Health Level 7
HL7, established in March 1987, was organized by a user-vendor committee to develop a standard for electronic data exchange in healthcare environments, particularly for hospital applications. Within the HL7 standard, "Level Seven" refers to the highest level, the application level, in the Open Systems Interconnection (OSI) seven-level communication model. The common goal is to simplify interface implementation between computer applications from multiple vendors. HL7 emphasizes data formats and protocols for exchanging certain key textual data among healthcare information systems, such as the HIS/EMR, RIS, and PACS. HL7 addresses the highest level (level 7) of the OSI model of the International Standards Organization (ISO), but it does not conform specifically to the defined elements of the OSI's seventh level; it conforms to the conceptual definitions of an application-to-application interface placed in the seventh layer of the OSI model. These definitions were developed to facilitate data communication in a healthcare setting by providing rules to convert abstract messages associated with real-world events into strings of characters comprising an actual message. The most commonly used HL7 today is version 2.X, which has many options and is thus flexible. During the past few years, version 2.X has been developed continuously and is widely and successfully implemented in the healthcare environment. Version 2.X and older versions use a "bottom-up" approach, beginning with very general concepts and adding new features as needed. These new features become options to the implementers, so the standard is very flexible and easy to adapt to different sites. However, these options and flexibility also make it impossible to have reliable conformance tests of any vendor's implementation, forcing vendors to spend more time analyzing and planning their interfaces to ensure that the same optional features are used by both interfacing parties. There is also no consistent view of the data when HL7 moves to a new version, or of that data's relationship to other data. Therefore, a consistently defined and object-oriented version of HL7 was needed: version 3. Development of version 3 started around 1995, with an initial standard release in December 2005. The primary goal of HL7 version 3 is to offer a definite and testable standard. Version 3 is based on a formal methodology called the HL7 development framework and uses an object-oriented methodology with a reference information model (RIM) to create HL7 messages. The object-oriented method is a "top-down" method. The RIM is the backbone of HL7 version 3, as it provides an explicit representation of the semantic and lexical connections between the information in the fields of HL7 messages. Because each aspect of the RIM is well defined, very few options exist in version 3. Through the object-oriented method and the RIM, HL7 version 3 will improve on many of the shortcomings of the 2.X versions. Version 3 uses XML (extensible markup language) for message encoding to increase interoperability between systems and will include new data interchange formats beyond the American Standard Code for Information Interchange (ASCII) as well as support for component-based technology. HL7 version 3 will offer tremendous benefits to providers and vendors as well as analysts and programmers, but complete adoption of the new standard will take time and effort.
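To make the segment-and-field structure of an HL7 version 2.X message concrete, the following minimal Python sketch parses a pipe-delimited message into its segments. The sample order message, its field values, and the parser itself are illustrative assumptions rather than excerpts from the standard.

```python
# Minimal illustration of HL7 v2.X structure: segments separated by carriage
# returns, fields separated by "|". The message and its values are hypothetical.

def parse_hl7_v2(message):
    """Group the fields of each segment under its three-letter segment ID."""
    segments = {}
    for raw in filter(None, message.strip().split("\r")):
        fields = raw.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments

# Hypothetical order message sent from a RIS to a PACS.
msg = (
    "MSH|^~\\&|RIS|HOSP|PACS|HOSP|20200101120000||ORM^O01|MSG0001|P|2.3\r"
    "PID|1||PAT12345||DOE^JOHN||19600101|M\r"
    "OBR|1|ORD98765||XR CHEST 2 VIEWS"
)

parsed = parse_hl7_v2(msg)
print(parsed["PID"][0][5])  # field PID-5, the patient name: "DOE^JOHN"
```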

4.2.2.2 Digital Imaging and Communications in Medicine version 3.0 standard
ACR-NEMA, formed by the American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA), created a committee to develop a set of standards to serve as common ground for various medical imaging equipment vendors. The goal was for newly developed instruments to be able to communicate and participate in sharing medical image information, in particular within the PACS environment. The committee, which focused chiefly on issues concerning information exchange, interconnectivity, and communications between medical systems, began development in 1982. The first version, which emerged in 1985, specified standards in point-to-point message transmission, data formatting, and presentation and included a preliminary set of communication commands and a data format dictionary. The second version, ACR-NEMA 2.0, published in 1988, was an enhancement to the first release. It included hardware definitions and software protocols as well as a standard data dictionary. However, networking issues were not addressed adequately in either version. For this reason, a new version that included more fully defined network protocols was released in 1992. Because of the magnitude of changes and additions in this newer version, it was given an entirely new name: DICOM version 3.0. In 1996, an updated version was released consisting of 13 published parts that form the basis of subsequent DICOM versions and parts. Manufacturers readily adopted this version for their imaging products. Currently, the latest version of DICOM has been expanded to 18 parts. Two fundamental components of DICOM are the information object class and the service class. Information objects define the contents of a set of images and their relationships, and service classes describe what to do with these objects. The service classes and information object classes are combined to form the fundamental units of DICOM, called service-object pairs (SOPs). The next few paragraphs describe the DICOM data model, which represents the information object, and the DICOM service classes.
4.2.2.3 Digital Imaging and Communications in Medicine data model
Two components relate to the DICOM data model: the DICOM model of the real world and the DICOM file format. The former defines the hierarchical data structure levels from the patient to studies, series, and images and waveforms. The latter describes how to encapsulate a DICOM file ready for a DICOM SOP service.


The DICOM model of the real world defines several real-world objects in the clinical imaging arena (e.g., patient, study, series, image) and their relationships within the scope of the DICOM standard. It provides a framework for the various DICOM information object definitions. The DICOM model defines four object levels: (1) patient; (2) study; (3) series and equipment; and (4) image, waveform, and structured report document. Each of the above levels can contain several (1 to n or 0 to n) sublevels. Fig. 4.2 shows the DICOM real-world data model. Note the levels at which the four objects mentioned above reside.
The DICOM file format defines how to encapsulate the DICOM data set of an SOP instance in a DICOM file. Each file usually contains one SOP instance. The DICOM file starts with the DICOM file metadata information (optional), followed by the bit stream of the data set, and ends with the image pixel data if it is a DICOM image file. The DICOM file metadata information includes file identification information and is encoded with the explicit value representation transfer syntax; therefore, the metadata information does not exist in an implicit value representation-encoded DICOM file. Explicit and implicit value representation are two encoding methods in DICOM. Vendors or implementers have the option of choosing either one for encoding, and DICOM files encoded by both methods can be processed by most DICOM-compliant software. One data set represents a single SOP instance. A data set is constructed of data elements, and data elements contain the encoded values of the attributes of the DICOM object. If the SOP instance is an image, the last part of the DICOM file is the image pixel data.

Figure 4.2 DICOM model of the real world showing the four main object levels: (1) patient; (2) study; (3) series; and (4) image. Note that there can be multiple instances of each object belonging to the patient.
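As a sketch of how the four object levels of the data model appear in an actual DICOM file, the open-source pydicom library (an assumed tool, not one prescribed by this chapter) exposes a file's data elements by attribute name; the file name below is hypothetical.

```python
# Reading a DICOM file's data set with pydicom; "image.dcm" is hypothetical.
import pydicom

ds = pydicom.dcmread("image.dcm")

# One data element from each of the four object levels of the real-world model.
print(ds.PatientID)           # patient level
print(ds.StudyInstanceUID)    # study level
print(ds.SeriesInstanceUID)   # series level
print(ds.SOPInstanceUID)      # image level (one SOP instance per file)

# For an image SOP instance, the pixel data is the last part of the file.
print(ds.pixel_array.shape)   # decoded image matrix (requires NumPy)
```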


4.2.2.4 Digital Imaging and Communications in Medicine service classes
DICOM services are used for the communication of imaging-related information objects within a device and for the device to perform a service for the object (for example, to store the object, to display the object, etc.). A service is built on top of a set of "DICOM message service elements" (DIMSEs). These DIMSEs are computer software programs written to perform specific functions. There are two types of DIMSEs, one for normalized objects and the other for composite objects. DIMSEs are paired in the sense that a device issues a command request and the receiver responds to the command accordingly. The composite commands are generalized, whereas the normalized commands are more specific. DICOM services are referred to as "service classes" because of the object-oriented nature of the DICOM information structure model. If a device provides a service, it is called a service class provider; if it uses a service, it is a service class user. Note that a device can be a service class provider, a service class user, or both, depending on how it is used. DICOM uses existing network communication standards based on the ISO Open Systems Interconnection (ISO-OSI) model for transmitting imaging-related information. The ISO-OSI model consists of seven layers, from the lowest physical (cables) layer to the highest application layer. When imaging-related information objects are sent between layers in the same device, the process is called a service; when objects are sent between two devices, it is called a protocol. When a protocol is involved, several steps are invoked in the two devices, which are then referred to as being in "association" using DICOM. If an imaging device transmits an image object with a DICOM command, the receiver must use a DICOM command to receive the information. On the other hand, if a device transmits a DICOM object over a network with the transmission control protocol/Internet protocol (TCP/IP) without invoking DICOM communication, any device connected to the network can receive the data with TCP/IP; however, a decoder is still needed to convert the DICOM object for proper use. The most commonly used communication protocol in DICOM is TCP/IP for transmitting DICOM image objects within PACS. To an end user, the two most important DICOM services are (1) send and receive images and (2) query and retrieve images. The query and retrieve services are built on top of the send and receive services.
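As an illustrative sketch of the query service from the service class user side, the open-source pynetdicom library (again an assumed tool) can issue a study-level C-FIND request to a PACS server; the server address, port, and application entity titles below are hypothetical.

```python
# Study-level DICOM query (C-FIND); the server address and AE titles are
# hypothetical.
from pydicom.dataset import Dataset
from pynetdicom import AE
from pynetdicom.sop_class import StudyRootQueryRetrieveInformationModelFind

ae = AE(ae_title="WORKSTATION")        # this node acts as the service class user
ae.add_requested_context(StudyRootQueryRetrieveInformationModelFind)

query = Dataset()                      # the C-FIND identifier data set
query.QueryRetrieveLevel = "STUDY"
query.PatientID = "PAT12345"           # match on this patient
query.StudyInstanceUID = ""            # empty value: return this attribute

assoc = ae.associate("pacs.example.org", 104, ae_title="PACS_SERVER")
if assoc.is_established:
    for status, identifier in assoc.send_c_find(
        query, StudyRootQueryRetrieveInformationModelFind
    ):
        if status and status.Status in (0xFF00, 0xFF01):  # "pending": one match
            print(identifier.StudyInstanceUID)
    assoc.release()
```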

4.2.2.5 Integrating the Healthcare Enterprise
Even with the DICOM and HL7 standards available, there is still a need for common consensus on how to use these standards to integrate heterogeneous healthcare information systems. IHE is neither a standard nor a certifying authority; instead, it is an initiative created to develop a high-level information model for driving the adoption of the HL7 and DICOM standards. IHE is a joint initiative of the RSNA (Radiological Society of North America) and HIMSS (Healthcare Information and Management Systems Society) started in 1998. Its mission is to define and guide manufacturers in utilizing DICOM- and HL7-compliant equipment and information systems to facilitate daily clinical workflow operations. The IHE technical framework defines a common information model and vocabulary for using DICOM and HL7 to complete a set of well-defined radiological and clinical transactions for a certain task. These common vocabularies and models facilitate healthcare providers and technical personnel in understanding each other better, which in turn leads to smoother systems integration. The first large-scale demonstration was held at the RSNA annual meeting in 1999, and thereafter at RSNA 2000 and 2001 and HIMSS 2001 and 2002. In these demonstrations, manufacturers came together to show how actual products could be integrated based on certain IHE protocols. It is the belief of RSNA and HIMSS that with successful adoption of IHE, better integration and use of healthcare systems will benefit both users and providers. The IHE Integration Profiles provide a common language, vocabulary, and platform for healthcare providers and manufacturers to discuss integration needs and the integration capabilities of products. IHE initially started with a few clinical domains, of which radiology was the major domain with the most integration profiles. As of the 2018 publication year, there are 10 IHE domains, each with implemented profiles, and the number continues to grow. The 10 implemented IHE domains with profiles are:
(1) IHE cardiology profiles
(2) IHE eye care profiles
(3) IHE IT infrastructure profiles
(4) IHE pathology and laboratory medicine profiles
(5) IHE patient care coordination profiles
(6) IHE patient care device profiles
(7) IHE pharmacy profiles
(8) IHE quality, research, and public health profiles
(9) IHE radiation oncology profiles
(10) IHE radiology profiles


Specifically, the IHE radiology integration profiles, each in either final draft or trial draft status, are divided into the following subgroups:
Profiles for workflow
(1) Scheduled workflow
(2) Patient information reconciliation
(3) Postprocessing workflow
(4) Reporting workflow
(5) Import reconciliation workflow
(6) Encounter-based imaging workflow
(7) Mammography acquisition workflow
(8) Postacquisition workflow
Profiles for content
(1) Nuclear medicine image
(2) Mammography image
(3) Evidence documents
(4) Simple image and numeric report
(5) Radiation exposure monitoring
(6) Radiation exposure monitoring for nuclear medicine
(7) Computed tomography (CT)/MR perfusion imaging
(8) Magnetic resonance (MR) diffusion imaging
(9) Chest X-ray CAD display
(10) Digital breast tomosynthesis
Profiles for presentation
(1) Key image note
(2) Consistent presentation of images
(3) Presentation of grouped procedures
(4) Image fusion
(5) Basic image review
Profiles for infrastructure
(1) Portable data for imaging
(2) Cross-enterprise document sharing for imaging
(3) Teaching file and clinical trial export
(4) Access to radiology information
(5) Audit trail and node authentication (radiology option)
(6) Charge posting
(7) Cross-community access for imaging
(8) Cross-enterprise reliable document interchange
(9) Imaging object change management
(10) Invoke image display
(11) Clinical decision support order appropriateness tracking
(12) Web-based imaging capture
(13) Standardized operational log of events

4.2.3 Connectivity and open architecture
If PACS modules in the same hospital cannot communicate with each other, they become isolated systems, each with its own images and patient information, and it would be difficult to combine them into a total hospital-integrated PACS. Open network design is essential, allowing a standardized method for data and message exchange between heterogeneous systems. Because computer and communications technology changes rapidly, a closed architecture would hinder system upgradeability. For example, an independent imaging workstation from a given manufacturer might, at first glance, make a good additional component to an MRI scanner for viewing images. If the workstation has a closed, proprietary architecture, however, no components except those specified by the same manufacturer can be added to the system, and potential overall system upgrading and improvement would be limited. Considerations of connectivity are important even when a small-scale PACS is planned.

4.2.4 Reliability
Reliability is a major concern in a PACS for two reasons. First, a PACS has many components, so the probability of a component failing is high. Second, because PACS manages and displays critical patient information, extended periods of downtime cannot be tolerated. The PACS can be considered a mission-critical system within the healthcare enterprise that should strive for continuous operation 24 hours a day, 7 days a week. In designing a PACS, it is therefore important to use fault-tolerant measures including error detection and logging software, external auditing programs (i.e., network management processes that check network circuits, disk space, database status, processor status, and queue status), hardware redundancy, and intelligent software recovery. Failure recovery mechanisms that can be used include automatic retry of failed jobs with alternative resources and algorithms, and intelligent bootstrap routines (a software block executed by a computer when it is restarted) that allow a PACS computer to automatically resume operations after a power outage or system failure. Improving reliability is costly; however, it is essential to maintain high reliability of a complex system.
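A minimal sketch of one such software recovery measure is shown below, assuming a generic job function and a simple exponential backoff policy (both hypothetical); a production PACS would combine this with the external auditing and redundancy measures described above.

```python
# Generic automatic-retry wrapper for a failed PACS job (e.g., an image
# transfer); the job function and its arguments are hypothetical.
import logging
import time

def run_with_retry(job, *args, attempts=5, base_delay=2.0):
    """Run job(*args); on failure, wait exponentially longer and retry."""
    for attempt in range(1, attempts + 1):
        try:
            return job(*args)
        except Exception as exc:                               # error detection
            logging.error("attempt %d failed: %s", attempt, exc)  # error logging
            if attempt == attempts:
                raise                                          # give up; alert operator
            time.sleep(base_delay * 2 ** (attempt - 1))        # back off, then retry

# Hypothetical usage: run_with_retry(send_study, "study-001", "PACS_SERVER")
```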

4.2.5 Security
Security, particularly the need for patient confidentiality, is an important consideration because of medical-legal issues and HIPAA (the Health Insurance Portability and Accountability Act), mandated in April 2003. Violations of data security occur in three different forms: physical intrusion, misuse, and behavioral violations. Physical intrusion relates to facility security, which can be handled by building management. Misuse and behavioral violations can be minimized by account control and privilege control. Most sophisticated database management systems have identification and authorization mechanisms that use accounts and passwords. Application programs may supply additional layers of protection. This is especially the case with web-based applications, where transmission of healthcare data occurs over public networks and further secure transfer protocols are needed. Privilege control refers to granting and revoking user access to specific tables, columns, or views in the database. These security measures provide the PACS infrastructure with a mechanism for controlling access to clinical and research data. With these mechanisms, the system designer can enforce policy as to which persons have access to clinical studies. In some hospitals, for example, referring clinicians are granted image study access only after a preliminary radiology reading has been performed and attached to the image data. An additional security measure is the use of an image digital signature during data communication. If implemented, this feature increases the system software overhead, but it makes data transmission through open communication channels more secure.
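As a minimal sketch of integrity protection during data communication, the example below uses a keyed hash (HMAC) as a stand-in for a full public-key digital signature; the shared secret and image bytes are hypothetical.

```python
# Keyed-hash integrity tag computed over image bytes before transmission and
# verified on receipt; the shared key and image bytes are hypothetical.
import hashlib
import hmac

SECRET_KEY = b"shared-between-sender-and-receiver"

def sign_image(image_bytes):
    """Sender: compute an HMAC-SHA256 tag transmitted alongside the image."""
    return hmac.new(SECRET_KEY, image_bytes, hashlib.sha256).hexdigest()

def verify_image(image_bytes, tag):
    """Receiver: recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign_image(image_bytes), tag)

data = b"...image bytes..."      # stands in for the transmitted image object
tag = sign_image(data)
print(verify_image(data, tag))   # True only if the data were not altered
```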

4.2.6 Current picture archiving and communication system architectures
In the past, there were three basic PACS architectures: (1) stand-alone, (2) client/server, and (3) Web-based. From a fault-tolerance standpoint, the stand-alone architecture was the most robust solution, since it would ensure 24/7 mission-critical operations should one of the various PACS components fail, especially the PACS server. However, from a clinical workflow perspective, there were many weaknesses associated with the stand-alone architecture. With the advent of more affordable cluster servers and fault-tolerant software solutions, the PACS industry has moved to the two remaining PACS architectures, which are discussed in more detail below. From these two basic PACS architectures, there are variations and hybrid design types.
4.2.6.1 Client/server picture archiving and communication system architecture
The three major features of the client/server model are:
(1) Images are centrally archived at the PACS server.
(2) From a single worklist at the client workstation, an end-user selects images via the archive server. Worklist filters provide the necessary workflow and can be personalized according to the specific user.
(3) Because workstations have no cache storage, images are flushed after reading or stored in a temporary cache.
The data workflow of the client/server PACS model is shown in Fig. 4.3.


Figure 4.3 Client/server picture archiving and communication system architecture and the five workflow steps as described in the text.

Following the numerals in Fig. 4.3:
(1) Images from an exam acquired by the imaging modality are sent to the PACS archive server. Image acquisition gateways may be implemented to buffer and stage the archival process for large institutions with a large number of modalities.
(2) The PACS archive server stores the exam.
(3) End-user workstations, or client workstations, have access to the entire patient/study database of the archive server. The end-user may select preset filters on the main worklist to shorten the number of worklist entries for easier navigation, customized to specific preferences.
(4) Once the exam is located on the worklist and selected, images from the PACS exam are loaded from the server directly into the memory of the client workstation for viewing. Historical PACS exams are loaded in the same manner.
(5) Once the end-user has completed reading/reviewing the exam, the image data are flushed from memory or stored temporarily in a cache, leaving no image data in local persistent storage on the client workstation.
Advantages:
• Any PACS exam is available on any end-user workstation at any time, making it convenient to read/review.
• No prefetching or study distribution is needed.
• No query/retrieve function is needed. The end-user just selects the exam from the worklist on the client workstation, and images are loaded automatically.
• Because the main copy of a PACS exam is located on the PACS server and is shared by the client workstations, radiologists will be aware when they are reading the same exam at the same time and thus avoid duplicate readings.


Disadvantages:
• The PACS server is a single point of failure; if it goes down, the entire PACS is down. In this case, end-users will not be able to view any exams on the client workstations, and newly acquired exams must be held back from archival at the modalities until the server is back up.
• Because there are more database transactions in the client/server architecture, the system is exposed to more transaction errors.
• The architecture is very dependent on network performance.
• Exam modification to the DICOM header for quality control is not available before archiving.
4.2.6.2 Web-based model
The Web-based PACS model is similar to the client/server architecture with regard to data flow. However, the main difference is that the client software is a Web-based application.
Additional advantages compared with client/server:
• The client workstation hardware can be platform independent as long as the web browser is supported.
• The system is a completely portable application that can be used both on-site and at home with an Internet connection.
Additional disadvantages compared with client/server:
• The system may be limited in the amount of functionality and performance by the web browser.
As previously mentioned, with continual new technology, hardware, and software improvements to database management and performance, clustered and parallel servers, network performance, and browser functionality, the client/server and Web-based models have become the architectures of choice for PACS vendors.

4.3 Picture archiving and communication system components and workflow
4.3.1 Introduction of components
This section provides an overview of each of the PACS components and their integration within the clinical workflow. This includes the basic concept of PACS and its components, describing the general component architecture and requirements. In addition, a generic PACS workflow in radiology will be discussed that highlights the functionalities and use of these components. As discussed in the previous section, a PACS should be DICOM compliant. The basic components of a PACS consist of an image and data acquisition gateway, a PACS server and archive, and display workstations, integrated together by digital networks as shown in Fig. 4.1. The following subsections introduce these components in more detail.


4.3.2 Image acquisition gateway
PACS requires that images from imaging modalities (devices) and related patient data from the HIS/EMR and the RIS be sent to the PACS server and archive. A major task in PACS is to acquire images reliably and in a timely manner from each radiological imaging modality, together with relevant patient data, including supporting text information about the patient, a description of the study, and parameters pertinent to image acquisition and processing. Image acquisition is a major task for three reasons. First, the imaging modality is not under the auspices of the PACS. Many manufacturers supply various imaging modalities, each of which has its own DICOM conformance statement. Worse, some older imaging modalities (often called "legacy systems") may not even be DICOM compliant. Connecting many imaging modalities to the PACS therefore requires tedious, labor-intensive work and the cooperation of modality manufacturers. Second, image acquisition is a slower operation than other PACS functions because patients are involved, and it takes the imaging modality some time to acquire the necessary data for image reconstruction. Third, images and patient data generated by the modality may sometimes contain format information unacceptable to the PACS operation. To circumvent these difficulties, an image acquisition gateway computer is usually placed between the imaging modality, or modalities, and the rest of the PACS network to isolate the host computer in the radiological imaging modality from the PACS. Isolation is necessary because traditional imaging device computers lack the communication and coordination software that is standardized within the PACS infrastructure. Furthermore, these host computers may not contain enough software intelligence to coordinate with the PACS server to recover from various errors, such as those arising during network transmission. The image acquisition gateway computer has three primary tasks: it receives image study data from the radiological imaging modality, converts the data from manufacturer specifications to a PACS standard format (header format, byte ordering, matrix sizes) compliant with the DICOM data formats as needed, and queues the image study for forwarding to the PACS server. Connecting a general-purpose PACS acquisition gateway computer to a radiological imaging modality can be implemented with two methods. With peer-to-peer network interfaces, which use TCP/IP over Ethernet, image transfers can be initiated either by the radiological imaging modality (a "push" operation) or by the destination PACS acquisition gateway computer (a "pull" operation). The pull mode is advantageous because if an acquisition gateway computer goes down, images can be queued in the radiological imaging modality computer until the gateway computer becomes operational again, at which time the queued images can be pulled and normal image flow resumed. Assuming that sufficient data buffering is available in the imaging modality computer, the pull mode is the preferred mode of operation because an acquisition computer can be programmed to reschedule study transfers if failure occurs (whether in itself or in the radiological imaging modality). If the designated acquisition gateway computer is down and a delay in acquisition is not acceptable, images from the examination can be rerouted to another networked, designated backup acquisition gateway computer or a workstation. However, because of the rapid improvements in central processing unit (CPU) power and memory within the host computer of the imaging modality, the "push" method is currently the most popular clinical implementation. Although the image acquisition gateway has traditionally been a separate computer device within PACS, improvements in server hardware processing speed and memory have allowed some manufacturers to integrate the image acquisition gateway component within the PACS server. Although the image acquisition gateway then shares the same hardware as the PACS server, its main functionalities remain the same as those of a standalone image acquisition gateway.
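To make the gateway's three tasks concrete, the following minimal sketch (Python, using the open-source pydicom/pynetdicom libraries; the queue directory, port number, and normalization step are illustrative assumptions, not any vendor's implementation) receives an image over DICOM C-STORE in the "push" mode described above and queues it on disk for forwarding to the PACS server.

# Minimal acquisition-gateway sketch: receive images via DICOM C-STORE,
# normalize them, and queue them for forwarding to the PACS server.
# Assumes the open-source pynetdicom/pydicom libraries; the queue path
# and port are hypothetical.
import os
from pynetdicom import AE, evt, AllStoragePresentationContexts

QUEUE_DIR = "/var/pacs/gateway_queue"   # hypothetical disk-resident queue

def handle_store(event):
    """Receive one image, normalize its header, and queue it."""
    ds = event.dataset
    ds.file_meta = event.file_meta      # keep transfer-syntax information
    # Normalization step: here we only guarantee a SpecificCharacterSet;
    # a real gateway would map vendor-specific fields to the PACS standard.
    if "SpecificCharacterSet" not in ds:
        ds.SpecificCharacterSet = "ISO_IR 100"
    os.makedirs(QUEUE_DIR, exist_ok=True)
    ds.save_as(os.path.join(QUEUE_DIR, f"{ds.SOPInstanceUID}.dcm"),
               write_like_original=False)
    return 0x0000                       # DICOM "Success" status

ae = AE(ae_title="GATEWAY")
ae.supported_contexts = AllStoragePresentationContexts
# In the "push" mode, the modality initiates the transfer; the gateway
# simply listens as a Storage SCP on a well-known port.
ae.start_server(("0.0.0.0", 11112), block=True,
                evt_handlers=[(evt.EVT_C_STORE, handle_store)])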

4.3.3 Picture archiving and communication system server and image archive
Imaging study examinations along with pertinent patient information from the acquisition gateway computer, the HIS/EMR, and the RIS are sent to the PACS server. The PACS server is the engine of the PACS, consisting of high-end computer hardware servers; its two major components are a database server and an archive system. The archive system consists of short-term, long-term, and permanent storage. Current trends in storage technologies have provided the image archive system with a variety of solutions. These components and solutions are explained in more detail in the next section. The following lists some major functions of a PACS server (a sketch of the routing function follows the list):
(1) Receives images from examinations (exams) via acquisition gateway computers
(2) Extracts text information describing the received exam
(3) Updates a network-accessible database management system
(4) Determines the destination workstations to which newly generated exams are to be forwarded
(5) Automatically retrieves necessary comparison images from a distributed cache storage or long-term library archive system
(6) Automatically corrects the orientation of computed radiography images
(7) Determines optimal contrast and brightness parameters for image display
(8) Performs image data compression if necessary
(9) Performs data integrity check if necessary
(10) Archives new exams onto long-term archive library
(11) Deletes images that have been archived from acquisition gateway computers
(12) Services query/retrieve requests from workstations and other PACS servers in the enterprise PACS
(13) Interfaces with PACS application servers
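Function (4), automatic routing, is often realized as a simple rule table. The sketch below is a hypothetical illustration (the rules, field names, and workstation names are invented), not an actual vendor algorithm.

def route_exam(exam):
    """Return destination workstations for a newly received exam.
    A minimal, hypothetical rule table keyed on modality and ward."""
    rules = {
        ("CT", "ICU"): ["icu_ws1", "rad_ct_ws"],
        ("CR", "ICU"): ["icu_ws1"],
        ("MR", "NEURO"): ["neuro_ws", "rad_mr_ws"],
    }
    # Default: send to the general radiology reading workstation.
    return rules.get((exam["modality"], exam["ward"]), ["rad_general_ws"])

print(route_exam({"modality": "CT", "ward": "ICU"}))  # ['icu_ws1', 'rad_ct_ws']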


4.3.4 Display workstations
A workstation includes communication network hardware and software, a local database, a display monitor system, resource management, and processing software. The fundamental workstation operations are listed in Table 4.1. There are four types of display workstations categorized by their resolutions: (1) high-resolution (2.5K × 2K) liquid crystal display (LCD) for primary diagnosis at the radiology department, (2) medium-resolution (2000 × 1600 or 1600 × 1K) LCD for primary diagnosis of sectional images and at the hospital wards, (3) physician desktop workstation (1K × 768) LCD, and (4) hard copy workstations for printing images on film or paper. The primary diagnostic workstation also has access to the PACS server database for retrieving images in an on-demand fashion. Figs. 4.4–4.6 show examples of a typical PACS diagnostic workstation displaying various PACS studies. Note the toolset at the bottom of each figure, which is used to manipulate the digital PACS study for case presentation, interpretation, and documentation.

4.3.5 Communications and networking
A basic function of any computer network is to provide an access path by which end users (e.g., radiologists and clinicians) at one geographic location can access information (e.g., images and reports) at another location. The important networking data needed for system design include the location and function of each network node, the frequency of information passed between any two nodes, the cost of transmission between nodes over various-speed lines, the desired reliability of the communication, and the required throughput. The variables in the design include the network topology, communication line capacities, and data flow assignments.

Table 4.1 Major functions of a picture archiving and communication system display workstation.

Function                                 Description
Case preparation                         Accumulation of all relevant images and information belonging to a patient examination
Case selection                           Selection of cases for a given subpopulation
Image arrangement or hanging protocols   Tools for arranging and grouping images for easy review
Interpretation                           Measurement tools for facilitating the diagnosis
Documentation                            Tools for image annotation, text, and voice reports
Case presentation                        Tools for a comprehensive case presentation
Image reconstruction                     Tools for various types of image reconstruction for proper display


Figure 4.4 Example of a picture archiving and communication system diagnostic workstation displaying a magnetic resonance brain exam.

At the local area network level, digital communication in the PACS infrastructure design uses fast (1 gigabit/s) Ethernet and high-speed asynchronous transfer mode (ATM, 155–622 megabits/s and up) technology. In a wide area network, various digital service (DS) speeds can be used, which range from DS-1 (T1, 1.544 megabits/s) to DS-3 (45 megabits/s) and ATM (155–622 megabits/s). There is a trade-off between transmission speed and cost.
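The trade-off can be made concrete with a back-of-envelope calculation. The snippet below estimates the raw transfer time of a hypothetical 500 MB imaging study at each line rate, ignoring protocol overhead and network contention.

# Raw transfer time for a 500 MB imaging study at common line rates
# (idealized: ignores TCP/DICOM overhead and contention).
study_mb = 500
study_megabits = study_mb * 8
for name, mbps in [("DS-1/T1", 1.544), ("DS-3", 45.0),
                   ("ATM", 155.0), ("Gigabit Ethernet", 1000.0)]:
    seconds = study_megabits / mbps
    print(f"{name:>17}: {seconds:8.1f} s")
# DS-1/T1 takes roughly 43 minutes; gigabit Ethernet about 4 seconds.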

Figure 4.5 Example of a picture archiving and communication system diagnostic workstation displaying a computed radiography chest exam.


Figure 4.6 Example of a picture archiving and communication system diagnostic workstation displaying a CT chest exam.

The network protocol used should be standard, for example, TCP/IP and the DICOM communication protocol (a higher-level protocol running on top of TCP/IP). Sometimes several segmented local area Ethernet branches may be used to transfer data from imaging devices to acquisition gateway computers. A 1 gigabit/s image network is used between acquisition gateway computers and the PACS server because several acquisition computers may send large image files to the server at the same time. High-speed networks are necessary between the PACS server and workstations. They are crucial to supporting the client-server PACS architecture, since the PACS workstation depends on the transfer of images from the PACS server into the workstation's local memory with performance expectations comparable to accessing images on the workstation's own hard disk storage. Process coordination between tasks running on different computers connected to the network is an extremely important issue in system networking. This coordination of processes running either on the same computer or on different computers is accomplished by using interprocessor communication methods with socket-level interfaces to TCP/IP. Commands are exchanged as ASCII messages to ensure standard encoding of messages. Various PACS-related job requests are lined up in disk-resident priority queues, which are serviced by various computer system daemon (agent) processes. The queue software can have a built-in job scheduler that is programmed to retry a job several times, using either a default set of resources or alternative resources, if a hardware error is detected. This mechanism ensures that no jobs will be lost during the complex negotiation for job priority among processes.
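A minimal sketch of such a priority queue with retry is shown below (Python; the job representation, retry limit, and backoff policy are illustrative assumptions rather than the actual PACS queue software).

# Prioritized job queue with retry, sketching the daemon behavior
# described above. Jobs are plain callables; a sequence counter breaks
# priority ties so jobs never need to be compared directly.
import heapq, itertools, time

MAX_RETRIES = 3
_seq = itertools.count()

def push(queue, priority, job, attempts=0):
    heapq.heappush(queue, (priority, next(_seq), attempts, job))

def daemon_loop(queue):
    """Service prioritized PACS jobs; retry on failure up to a limit."""
    while queue:
        priority, _, attempts, job = heapq.heappop(queue)
        try:
            job()                              # e.g., forward a study
        except OSError:                        # hardware/network error
            if attempts + 1 < MAX_RETRIES:
                time.sleep(2 ** attempts)      # back off, then retry
                push(queue, priority, job, attempts + 1)
            else:
                print("job failed; flagged for operator review")

jobs = []
push(jobs, 1, lambda: print("forward STAT CT study"))   # high priority
push(jobs, 5, lambda: print("archive routine CR study"))
daemon_loop(jobs)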

4.3.6 Picture archiving and communication system workflow
This section discusses the generic PACS workflow, starting from patient registration in the HIS/EMR and examination ordering in the RIS, through the technologist performing the exam, to image viewing, reporting, and archiving. Comparing this PACS workflow with the PACS components in Fig. 4.1 and the client-server architecture workflow integrated into the general radiology workflow, PACS has replaced many manual steps of the film-based workflow. The steps of a PACS workflow are as follows (an HL7 parsing sketch follows the list):
(1) Patient registers in HIS/EMR; radiology exam is ordered in RIS. An exam accession number is automatically assigned.
(2) RIS outputs HL7 messages of HIS/EMR and RIS demographic data to the PACS broker/interface engine.
(3) PACS broker notifies the archive server of the scheduled exam for the patient.
(4) Following prefetching rules, historical PACS exams of the scheduled patient are prefetched from the storage system and staged for fast access from the archive server.
(5) Patient arrives at the modality. Modality queries the PACS broker/interface engine for the DICOM worklist.
(6) Technologist acquires images and sends the PACS exam of acquired images and patient demographic data to the PACS server.
(7) On arrival of the PACS exam at the PACS archive server, the archive server database is updated, the PACS exam is marked with prepared status, and the exam is archived in the storage system.
(8) End-user PACS display workstations now have access to the entire patient/study database of the archive server, and radiologists can view imaging studies based on preset filters on the main worklist to shorten the number of worklist entries.
(9) Once an exam is located on the worklist and selected, images from the PACS exam are loaded from the server directly into the memory of the PACS display workstations.
(10) Reading radiologist dictates a report with the exam accession number on the dictation system. Radiologist signs off on the PACS exam with any changes. The archive database is updated with the changes and marks the PACS exam as signed-off status.
(11) Transcriptionist fetches the dictation and types the report that corresponds to the exam accession number within RIS.


(12) RIS outputs an HL7 message of the results report data along with any previously updated RIS data.
(13) Radiologist queries the PACS broker/interface engine for previous reports of PACS exams on reading workstations.
(14) In addition, referring physicians have access to view PACS exams and reports on review workstations.
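Steps (2) and (12) carry HL7 v2 messages between the RIS and PACS. HL7 v2 segments are pipe-delimited, so the fields a PACS broker needs can be extracted with ordinary string handling, as in the sketch below (the message content is fabricated purely for illustration).

# Parse a (fabricated) HL7 v2 ADT message and pull the fields a PACS
# broker needs: the triggering event type, patient ID, and name.
raw = ("MSH|^~\\&|RIS|HOSP|PACS|HOSP|202301011200||ADT^A04|123|P|2.3\r"
       "PID|1||PAT12345^^^HOSP||DOE^JOHN||19600101|M\r")

segments = {}
for line in raw.strip().split("\r"):
    fields = line.split("|")
    segments[fields[0]] = fields

event_type = segments["MSH"][8]              # e.g., "ADT^A04" (registration)
patient_id = segments["PID"][3].split("^")[0]
last, first = segments["PID"][5].split("^")[:2]
print(event_type, patient_id, last, first)   # ADT^A04 PAT12345 DOE JOHN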

4.4 Picture archiving and communication system server and image archive
At the heart of the PACS is the central node, considered the engine of the PACS, which has two major components: the PACS server and the image archive. Together they consist of both hardware and software architecture in a subsystem. The PACS server directs the data flow for the entire PACS by using interprocess communication among the various major computer algorithm task processes. The image archive provides a hierarchical image storage management system for short-, medium-, and long-term image archiving. Over the years, as more information technology solutions have presented themselves as viable alternatives to existing PACS designs, the PACS server and image archive have undergone drastic changes. In this section, the original PACS design will be presented and discussed, as well as the current new technology trends for image archive systems.

4.4.1 Image management and design concept
Two major aspects should be considered in the design of the PACS image storage management system: data integrity, which protects against loss of images once they are received by the PACS from the imaging modalities, and system efficiency, which minimizes the access time of images at the display workstations. To ensure data integrity, the PACS always retains at least two copies of an individual patient's imaging study on separate storage devices until the study has been archived successfully to the long-term storage device, e.g., a storage area network (SAN) or digital linear tape (DLT) library. This backup scheme is achieved through intercomponent communication among the PACS components shown in Fig. 4.1. Fig. 4.7 shows the hierarchical image storage management in PACS:
(1) A copy of the PACS study is stored on the imaging modality until the technologist has verified that the study has been successfully archived to PACS.
(2) A copy of the PACS study is stored on the acquisition gateway computer until the image archive subsystem has acknowledged that the study has been received successfully.
(3) A copy of the study is retained until the PACS study has been successfully stored to permanent long-term storage (e.g., SAN or DLT).
(4) The study is deleted from the display workstation when the review has been completed.

Figure 4.7 A diagram showing hierarchical image storage management in a picture archiving and communication system (PACS), from image modality through acquisition gateway and PACS archive server (RAID/DLT) to display workstation. Note that at least two copies of each PACS image reside on separate storage devices.

4.4.2 Picture archiving and communication system server and storage archive functions
The PACS server and storage archive consists of four components: an archive server, a database, a data storage archive or library (e.g., a DLT library), and a communication network. Attached to the archive system through the communication network are the acquisition computers and the display workstations. Images acquired by the acquisition computers from various radiological imaging devices are transmitted to the archive server, from which they are archived and made available for access from the appropriate display workstations. The following is a brief description of each of the four subcomponents as well as some of the major functions.

4.4.2.1 The archive server
The archive server consists of multiple powerful CPUs, computer systems interface (e.g., fiber channel) data connections, and network interfaces (Ethernet and ATM). With its redundant hardware configuration, the archive server can support multiple processes running simultaneously, and image data can be transmitted over different data connections and networks. In addition to its primary function of archiving images, the archive server acts as a PACS controller, directing the flow of images within the entire PACS from the acquisition gateway computers to various destinations such as storage archive solutions, workstations, or print stations.


The archive server can utilize a large-capacity redundant array of inexpensive disks (RAID) or SAN as a data cache, capable of storing several weeks', months', or years' worth of images acquired from different radiological imaging devices. As an example, a small 20 GB disk store, without using compression, can simultaneously hold up to 500 CT, 1000 MR, and 500 computed radiography (CR) studies. Nowadays, very large RAID and SAN technologies are available in the archive server in the client/server model. The magnetic cache disks configured in the archive server should sustain high data throughput for read operations, which provides fast retrieval of images from the RAID or SAN.

4.4.2.2 The database system
The database system comprises redundant database servers running identical, reliable commercial database systems (e.g., Sybase, Oracle) with structured query language (SQL) utilities. A mirrored database with two identical copies can be used to duplicate the data during every PACS transaction (not images) involving the server. The data can be queried from any PACS computer via the communication networks. The mirroring feature of the system provides the entire PACS database with uninterruptible data transactions that guarantee no loss of data in the event of system failure or a disk crash. Besides its primary role of image indexing to support the retrieval of images, the database system must interface with the RIS and the HIS, allowing the PACS database to collect additional patient information from these two healthcare databases.

4.4.2.3 The storage archive or library
The storage archive or library consists of multiple input/output drives (usually using data media such as hard disks, DLT, etc.) and disk controllers, which allow concurrent archival and retrieval operations on all of its drives. Newer technologies available as archive library solutions include large-scale RAID and SAN, which may comprise either tape media or hard disks. The newest technologies available are vendor neutral archives (VNAs) and cloud-based storage, which will be discussed later. The archive or library must have a large storage capacity of terabytes and support mixed storage media if migrating to newer solutions. In practice, most hospitals opt to migrate PACS studies entirely from one data media solution to another in order to reduce the complexity of managing mixed storage media. A redundant power supply is essential for uninterrupted operation.

4.4.2.4 Communication networks
The PACS archive system is connected to both the PACS LAN and WAN. The PACS LAN can have a two-tiered communication network composed of Ethernet and ATM or high-speed Ethernet networks. The WAN provides connection to remote sites and can consist of T1/T3 lines, ATM, and fast Ethernet. The PACS LAN uses the high-speed ATM or Ethernet switch to transmit high-volume image data from the archive server to 1K and 2K display workstations. Gigabit Ethernet is used for interconnecting components to the PACS server, including acquisition gateway computers, the RIS, and the HIS, and as a backup of the ATM or gigabit Ethernet trunk.

4.4.2.5 Picture archiving and communication system server and storage archive functions
In the PACS server and storage archive, processes of diverse functions run independently and communicate simultaneously with other processes using client-server programming, queuing control mechanisms, and job prioritizing mechanisms. Because the functions of the server and the archive are closely related, the term archive server is sometimes used to represent both. Major tasks performed by the archive server include image receiving, image stacking, image routing, image archiving, study grouping, RIS interfacing, PACS database updating, image retrieving, and image prefetching. The following paragraphs describe the functionality carried out by each of these tasks. Wherever appropriate, the DICOM standard is highlighted in these processes.

Image receiving: Images acquired from various imaging modalities at the gateway computers are converted into the DICOM data format if they are not already in DICOM. DICOM images are then transmitted to the archive server via Ethernet or ATM using client-server applications over standard TCP/IP. The archive server can accept concurrent connections for receiving images from multiple acquisition gateway computers. DICOM commands take care of the send and receive processes.

Image stacking: Images arriving at the archive server from various gateway computers are stored on its local magnetic disks or RAID/SAN (short-term archive) based on the DICOM data model and managed by the database. The archive server holds as many images in its 1000-gigabyte disks as possible and manages them on the basis of aging criteria. During a hospital stay, for example, images belonging to a given patient remain in the archive server's short-term archive until the patient is discharged or transferred. Thus all recent images can be retrieved from the archive server's high-speed short-term archive instead of the lower-speed long-term storage solution. This feature is particularly convenient for radiologists and referring physicians who must retrieve images from different display workstations. In the client/server PACS model, the short-term archive is very large, sometimes terabytes in capacity, with a SAN storage device as the long-term archive library solution.

Image archiving: Images arriving at the archive server from gateway computers are copied from short-term storage to the archive library for long-term storage. When the copy process is complete, the archive server acknowledges the corresponding acquisition gateway, allowing it to delete the images from its local storage and reclaim its disk space. In this way, the PACS always has two copies of an image on separate storage media systems until the image is archived to permanent storage. Images that belong to a given patient with multiple examinations during a hospital stay are temporarily scattered across the archive library.

RIS and HIS interfacing and PACS database updates: The archive server accesses data from the HIS/RIS through a PACS gateway computer or interface engine. The HIS/RIS relays a patient admission, discharge, and transfer (ADT) message to the PACS only when a patient is scheduled for an examination in the radiology department or when a patient in the radiology department is discharged or transferred. Forwarding ADT messages to PACS not only supplies patient demographic data to the PACS but also provides the information the archive server needs to initiate the prefetch, image archive, and study grouping tasks. The exchange of messages among these heterogeneous computer systems can use the HL7 standard data format running over TCP/IP on a client/server basis. In addition to receiving ADT messages, PACS receives examination data and diagnostic reports from the RIS. This information is used to update the PACS database, which can be queried and reviewed from any display workstation. Data transactions performed in the archive server, such as insertion, deletion, selection, and update, are carried out by using SQL utilities in the database. Data in the PACS database are stored in predefined tables, with each table describing only one kind of entity. The design of these tables should follow the DICOM data model for operational efficiency. Individual PACS processes running in the archive server update these tables with information extracted from the DICOM image header and from the RIS interface to reflect any changes in the corresponding tables.

Image retrieving: Image retrieval takes place at the display workstations. The display workstations are connected to the archive system through the communication networks. The archive library, configured with multiple drives, can support concurrent image retrievals. The retrieved data are then transmitted from the archive library to the archive server. The archive server handles retrieve requests from display workstations on an on-demand basis. For example, a display workstation used for primary diagnosis, in a conference session, at an intensive care unit, or exclusively for research and teaching would in every case have on-demand access based on the client/server architecture.

Image prefetching: The prefetching mechanism is initiated as soon as the archive server detects the arrival of a patient via the ADT message from HIS/RIS. Selected historical images, patient demographics, and relevant diagnostic reports are retrieved from the slower-access long-term storage archive and the PACS database and staged in the faster-access short-term archive before the completion of the patient's current examination. The prefetch algorithm is based on predefined parameters such as examination type, disease category, radiologist, referring physician, location of the workstation, and the number and age of the patient's archived images. These parameters determine which historical images should be retrieved.
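As a hedged illustration, the prefetch logic can be reduced to a small rule table evaluated when the ADT message arrives; in the sketch below, the exam types, matching modalities, and retrieval counts are invented for illustration and would in practice come from the predefined parameters listed above.

# Hypothetical prefetch rules: given the scheduled exam type, decide
# which historical studies to stage from long-term to short-term storage.
PREFETCH_RULES = {
    # exam type -> (relevant prior modalities, how many priors to stage)
    "CHEST CR": (["CR", "CT"], 2),
    "BRAIN MR": (["MR", "CT"], 3),
}

def select_priors(exam_type, history):
    """history: list of (study_uid, modality, study_date), newest first."""
    modalities, count = PREFETCH_RULES.get(exam_type, ([], 0))
    priors = [s for s in history if s[1] in modalities]
    return priors[:count]          # stage only the most recent matches

history = [("1.2.3.1", "CT", "20230105"),
           ("1.2.3.2", "US", "20221220"),
           ("1.2.3.3", "CR", "20220901")]
print(select_priors("CHEST CR", history))   # stages the CT and CR priors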


4.4.3 Digital Imaging and Communications in Medicine-compliant picture archiving and communication system archive server
The purpose of the Digital Imaging and Communications in Medicine (DICOM) standard is to promote a standard communication method for heterogeneous imaging systems, allowing the transfer of images and associated information among them. By using the DICOM standard, a PACS is able to interconnect its individual components and allow the acquisition gateways to link to imaging devices. However, imaging equipment vendors often select different DICOM-compliant implementations for their own convenience, which may lead to interoperability difficulties between these systems. It is therefore an important step to perform throughput testing of the entire system, from PACS study acquisition to archival, to ensure that the system is integrated properly. A well-designed DICOM-compliant PACS server can use two mechanisms to ensure system integration. One mechanism is to connect to the acquisition gateway computer with DICOM, providing reliable and efficient processes for acquiring images from imaging devices. The other mechanism is to develop specialized server software allowing interoperability of multivendor imaging systems. Both mechanisms can be incorporated in the DICOM-compliant PACS server.
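A common first step in such integration testing is a DICOM verification (C-ECHO) exchange against each component; the sketch below uses the open-source pynetdicom library (the server hostname, port, and AE titles are hypothetical).

# DICOM connectivity check: send a C-ECHO (Verification) request to a
# PACS component before deeper throughput testing.
from pynetdicom import AE

ae = AE(ae_title="TEST_SCU")
ae.add_requested_context("1.2.840.10008.1.1")   # Verification SOP Class
assoc = ae.associate("pacs-server.example.org", 104, ae_title="ARCHIVE")
if assoc.is_established:
    status = assoc.send_c_echo()
    print("C-ECHO status: 0x{0:04X}".format(status.Status))
    assoc.release()
else:
    print("Association rejected or failed; check conformance statements")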

4.4.4 Hardware and software components
Generic PACS archive server hardware components consist of the PACS archive server itself, peripheral image storage solutions, and fast Ethernet and fiber channel interfaces. For large-scale PACS, the server computers used are mostly LINUX-based blade server machines. The fast Ethernet interface connects the PACS archive server to the fast Ethernet network, where acquisition gateways and display workstations are connected. The fiber channel integrates peripheral storage archive devices with the PACS archive server. The main archive devices for a PACS server may include RAID, DLT, and newer SAN technologies. RAID and SAN, because of their fast access speed and reliability, are extensively used as the short-term archive devices in PACS. Because of their large data storage capacity, DLT and possibly SAN are used for long-term archiving. Many different kinds of storage devices are available for PACS applications; in the following, we describe the two most popular ones, RAID and DLT, along with the newer SAN technology. In addition to local peripheral devices for image storage, two emerging technologies have become available: cloud storage and the VNA concept. These two technologies have slowly replaced peripheral image storage devices as the solutions of choice, mostly for long-term storage.

4.4.4.1 Redundant array of inexpensive disks
RAID is a disk array architecture developed for fast and reliable data access. A RAID groups several magnetic disks (e.g., eight disks) as a disk array and connects the array to one or more RAID controllers. The size of a RAID is usually several hundred gigabytes (e.g., 320 GB for eight disks) to terabytes. As individual disk sizes increase, the size of the RAID can also be increased. The RAID controller has a fiber channel interface to connect to the PACS server. Multiple RAID controllers with multiple fiber channel interfaces can avoid a single point of failure in the RAID device.

4.4.4.2 Digital linear tape
DLT uses a multiple magnetic tape and drive system housed inside a library or jukebox for large-volume and long-term archiving. With current tape drive technology, the data storage size can reach 40–200 GB per tape. One DLT library can hold from 20 to hundreds of tapes. Therefore, the storage size of a DLT library can be from one to tens of terabytes, which can hold PACS images for one to several years. A DLT library usually has multiple drives to read and write tapes. The tape drive is connected to the server through a fiber channel interface. The data transmission speed is several megabytes per second for each drive. The tape loading time and data locating time together take several minutes (e.g., 3 min). Hence, in general, it takes several minutes to retrieve one CR image from DLT. PACS image data on DLT are usually prefetched to RAID for fast access.

4.4.4.3 Storage area network
A current data storage trend in large-scale archiving is SAN technology. With this configuration, for long-term storage and even short-term storage, the PACS data are stored in a SAN. The SAN is a stand-alone data storage repository with a single IP address. File management and data backup can be achieved with a combination of digital media (e.g., RAID, DLT, etc.) smoothly and with total transparency to the user. In addition, the SAN can be partitioned into several different repositories, each storing different data file types. The storage manager within the SAN is configured to recognize and distribute the different clients' data files and store them in distinct and separate parts of the SAN.

4.4.4.4 Cloud storage
Cloud storage is a cloud computing model in which data are stored on remote servers accessed from the Internet, or "cloud." For most PACS manufacturers, cloud storage is primarily utilized as a long-term storage solution only. It is maintained, operated, and managed by a cloud storage service provider on a storage server built on virtualization techniques. The physical storage spans multiple servers (and often locations), and the physical environment is typically owned and managed by a hosting company, virtual private network service, or hospital. Cloud storage providers are responsible for keeping the data available and accessible instantaneously and continuously and for providing a secure physical environment. The customer or PACS provider can buy or lease storage capacity from the service providers to store, organize, or utilize the imaging data. Cloud storage services may be accessed through a colocated cloud computing service, a Web service application programming interface (API), or applications that utilize the API, such as cloud desktop storage, a cloud storage gateway, or Web-based content management systems.

4.4.4.5 Vendor neutral archive
The VNA is an emerging medical imaging technology in which images and documents, as well as other forms of multimedia data, are stored in a standard format with a standard interface such that they are accessible regardless of the vendor-specific system, hence the term "vendor neutral." A VNA provides storage that is scalable throughout the life cycle of the image data so that images and related information can be queried, stored, and retrieved as defined by open standards from multiple departments throughout the healthcare enterprise while maintaining patient privacy and security. Further details of the VNA are discussed later in this chapter.

4.4.4.6 Archive server software
PACS archive server software is DICOM compliant and supports the DICOM Storage Service Class and Query/Retrieve Service Class. Through DICOM communication, the archive server receives DICOM studies/images from the acquisition gateway, appends study information to the database, and stores the images in the archive device, whether RAID, DLT, or SAN. It receives DICOM query/retrieve requests from display workstations and sends the query/retrieve results (patient/study information or images) back to the workstations. The DICOM services supported by the PACS archive server are C-Store, C-Find, and C-Move. All software implemented in the archive server should be coded in standard programming languages, for example, C and C++ on the LINUX open systems architecture. PACS archive server software is composed of at least six independent components (processes), including receive, insert, routing, send, Q/R-server, and RetrieveSend. It also includes a PACS database. All of these processes run independently and simultaneously and communicate with other processes through queue control mechanisms.
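As a hedged sketch of the C-Store and C-Find services just named (Python with pynetdicom; the in-memory index stands in for the SQL database, and the port and AE title are illustrative), a minimal archive server might look like the following.

# Minimal DICOM archive-server sketch: C-STORE to ingest, C-FIND to query.
# The in-memory index stands in for the PACS SQL database (assumption).
from pydicom.dataset import Dataset
from pynetdicom import AE, evt, AllStoragePresentationContexts
from pynetdicom.sop_class import PatientRootQueryRetrieveInformationModelFind

index = []                                  # [(PatientID, StudyInstanceUID)]

def handle_store(event):
    ds = event.dataset
    index.append((str(ds.PatientID), str(ds.StudyInstanceUID)))
    # A real server would also write the file to RAID/SAN and update SQL.
    return 0x0000

def handle_find(event):
    """Yield one Pending response per matching study."""
    query = event.identifier
    for pid, study_uid in index:
        # Simplified matching: empty PatientID means universal match.
        if query.get("PatientID", "") in ("", pid):
            match = Dataset()
            match.PatientID = pid
            match.StudyInstanceUID = study_uid
            yield 0xFF00, match             # 0xFF00 = Pending (match)

ae = AE(ae_title="ARCHIVE")
ae.supported_contexts = AllStoragePresentationContexts
ae.add_supported_context(PatientRootQueryRetrieveInformationModelFind)
ae.start_server(("0.0.0.0", 11112), block=True,
                evt_handlers=[(evt.EVT_C_STORE, handle_store),
                              (evt.EVT_C_FIND, handle_find)])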

4.4.5 Disaster recovery and backup archive solutions
The PACS archive server is the most important component in a PACS; even though it may have fault-tolerant features, chances are it will fail occasionally. This is especially challenging to the clinical workflow in the client-server architecture, since if the PACS archive server is down, the entire PACS is down and access to images is not possible. A backup archive server is necessary to guarantee uninterrupted service. Two copies of identical images can be saved through two different paths in the PACS network to two archive libraries. Ideally, the two libraries should be in two different buildings in case of natural disaster. To reduce the cost of redundant archiving, the primary unit can be another DLT or SAN library. The backup archive server can be short term (3 months) or long term. The functions of a backup archive server are twofold: maintaining continuous PACS operation and preventing loss of image data. Data loss is especially troublesome because, if a major disaster occurs, it is possible to lose an entire hospital's PACS data. In addition, scheduled downtimes of the main PACS archive also greatly impact a filmless institution. Current PACS archives feature disaster recovery or a backup archive in the form of cloud storage or a VNA, with both solutions providing fault tolerance and disaster recovery specific to their respective designs. Furthermore, current general disaster recovery solutions vary in their approach to creating redundant copies of PACS data. One approach is to provide a short-term fault-tolerant backup archive server using the application service provider (ASP) model at an offsite location, similar to cloud storage. The ASP backup archive provides instantaneous, automatic backup of acquired PACS image data and instantaneous recovery of stored PACS image data, all at an acceptable service cost because it utilizes the ASP business model. Fig. 4.8 shows the general architecture of an ASP backup archive server. In addition, should a downtime event render the network communication inoperable, a portable solution is available with a data migrator. The data migrator is a portable laptop with a large-capacity hard disk that contains DICOM software for exporting and importing PACS exams. The data migrator can populate PACS exams that were stored on the backup archive server directly onto the clinical PACS within hours, allowing radiologists to continue to read previous PACS exams until new replacement hardware arrives and is installed or until a scheduled downtime event has been completed.

Figure 4.8 General architecture of the application service provider backup archive server, connecting the hospital site (clinical PACS server) with an offsite fault-tolerant backup archive. One DICOM gateway and one PACS gateway are used as the buffers between the two sites. T1 or DS-3 can be used for the wide area network.

4.4.6 Current changes in picture archiving and communication system architecture: the vendor neutral archive
During the last half decade or so, the PACS climate has changed dramatically. In the past, PACS was usually sold and implemented as an entire system, with the integrated components described in this chapter, which could be proprietary in nature, utilizing intercommunication protocols among the PACS components. The current trend is to break the various components apart and sell them as separate pieces. This new concept has been referred to as "deconstructed PACS." Because of this, PACS vendors have been focusing their marketing only on the viewing and end-user application software and not on the storage solution. PACS vendors continue to sell PACS because they have knowledge of the radiology workflow, so their viewing software embeds these workflow features (e.g., study protocols, automatic hanging protocols, decoding the DICOM header) and emphasizes them as the work of the domain expert. As described previously in this chapter, storage solutions now include a new technology sold as a third-party product and no longer part of the PACS vendor product: the VNA. The VNA stores the PACS image files in the native DICOM format and is also used to store all other kinds of data, including nonradiology images, whether they are DICOM or non-DICOM. Most VNA technologies do not focus on the more detailed DICOM fields. Instead, they extract only the basic patient information and related metadata needed to query and retrieve the images. Characteristics of a VNA include a patient-centric approach; flexibility and robustness to support upgrades and changes across different viewing, acquisition, and workflow challenges; and interchangeability without having to migrate, convert, or change data formats or interfaces. Image data and related data are stored in native DICOM format with standard DICOM services for query and retrieval. What makes the VNA appealing to hospital administrations is its ability not only to store data from PACS and related PACS applications but also to store and provide access to other multimedia data (e.g., non-DICOM) from various image acquisition systems throughout the healthcare enterprise, making it a one-stop image archive solution for all clinical imaging and related data. Fig. 4.9 shows how a VNA incorporates imaging and imaging-related data from the radiology, cardiology, and other specialty departments into a single VNA, where originally each department was responsible for maintaining and supporting a separate image archive. Not only does the VNA support consolidation of all imaging and imaging-related data, it also provides built-in disaster recovery solutions and easier IT management for one image archive instead of many. In addition to the above benefits of the VNA for IT management in a healthcare enterprise, the hospital can focus on implementing a universal image viewer through an API to support all departments that need access to the data. Fig. 4.10 further shows how VNA middleware provides access to all imaging and imaging-related data for various end-user applications. These can include an EMR or other health information exchange software, or even specific diagnostic viewing applications such as a radiology PACS viewer, cardiology PACS viewer, or other specialty PACS viewer.
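As an illustration of that basic-metadata approach, the sketch below (pydicom; the file path is hypothetical) pulls only the patient/study fields a VNA typically indexes for query and retrieval, leaving the stored DICOM object untouched.

# Extract the minimal patient/study metadata a VNA indexes for
# query/retrieve, leaving the full DICOM object untouched in storage.
from pydicom import dcmread

ds = dcmread("incoming/example.dcm")        # hypothetical file path
record = {
    "PatientID":        str(ds.get("PatientID", "")),
    "PatientName":      str(ds.get("PatientName", "")),
    "StudyInstanceUID": str(ds.get("StudyInstanceUID", "")),
    "StudyDate":        str(ds.get("StudyDate", "")),
    "Modality":         str(ds.get("Modality", "")),
}
print(record)   # only this record goes into the VNA's query index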


Figure 4.9 The concept of the vendor neutral archive (VNA). In the past, each department (e.g., radiology, cardiology, or a specialty) would maintain its own PACS or data storage repository. With the VNA, there is a single storage infrastructure that consolidates all data and is accessible through an interface.

Figure 4.10 The vendor neutral archive (VNA) middleware provides access to imaging and imaging-related data either to the requesting picture archiving and communication system (PACS) viewers of each of the departments (e.g., radiology, cardiology, and specialty PACS covering ultrasound, arthroscopy, dermatology, surgical endoscopy, gastroenterology, and ophthalmology) or to the electronic medical record (EMR) or other health information exchange (HIE) software. Note that imaging modalities that generate the imaging data can send directly to the VNA through the middleware without having to integrate with a PACS.


4.5 Picture archiving and communication system clinical experiences

4.5.1 Introduction
In this section, a methodology and road map for PACS implementation and system evaluation within a clinical hospital environment will be discussed. In addition, some examples of clinical experiences and pitfalls will be presented. The philosophy of PACS design and implementation is that, regardless of the scale of the PACS being planned, the strategy should always leave room for future expansion, including integration with an enterprise PACS. Thus, if the current plan is for a large-scale PACS, the PACS architecture should allow for future growth to an enterprise PACS. On the other hand, if only a PACS module is being implemented, then the connectivity and compatibility of this module with future modules or a large-scale PACS are important. The concepts discussed in previous chapters, including open architecture, connectivity, standardization, portability, modularity, and IHE workflow profiles, should all be considered.

4.5.2 Picture archiving and communication system implementation strategy
When implementing a PACS within a clinical environment, it is important to identify some key fundamental concepts that will serve as cornerstones for a successful implementation. First, PACS is an enterprise-wide system or product. It is no longer just for the radiology or imaging department; therefore, careful consideration of all decisions and strategies going forward should include the entire healthcare continuum, from referring physicians to the radiology department's clinical and technical staff to the healthcare institution's information technology (IT) department. It is crucial for a successful implementation that the key areas within the healthcare institution have buy-in to the PACS process, including administration, the radiology department, the IT department, and high-profile customers of radiology (e.g., orthopedics, surgery). Furthermore, one or more champions should be identified for the PACS process. Usually this is the medical director of radiology, but it can include other physicians as well as IT administrators. Second, PACS is a system with multiple complex components interacting with one another. Each of these components can itself be an accumulation of multiple hardware components. A general clinical PACS usually comprises the archive, archive server/controller, DICOM gateway, web server, workstations, and an RIS/PACS interface. Whether considering implementation or acceptance, all components of the system must be assessed. The following describe some of the steps involved in implementing a PACS within a healthcare institution.


4.5.2.1 Risk assessment analysis
It is important to perform a risk assessment analysis before implementation so that problem areas and challenges can be mapped out accordingly and timeline schedules made to accommodate the potential roadblocks. Some areas to focus on are the network infrastructure that will support the PACS, the integration of acquisition modality scanners with PACS (e.g., legacy systems, modality worklist, quality control workstations), physical space for the PACS equipment, and resource availability. Resource availability is especially crucial because a successful PACS implementation hinges on the support provided by the in-house radiology department. In making risk assessments, it is also helpful to determine areas in which there is low risk and a high return. These are usually departments with a high volume of images (film) and a low rate of return of film back to the radiology department (e.g., critical care areas, orthopedics, surgery). These low-risk/high-return areas can help drive the implementation phase timeline and serve as a good first push in the implementation process.

4.5.2.2 Implementation phase development
Implementation of PACS should be performed in distinct phases, tailored based on the risk assessment analysis performed at the healthcare institution. Usually, the first phase is when the main components are implemented, such as the archive, archive server/controller, network infrastructure, HIS-RIS-PACS interfaces, workstations, and one or two modality types. The next phases are targeted toward implementing all modality types and a web server for enterprise-wide and off-site distribution of PACS exams. The phased approach allows for a gradual introduction of PACS into the clinical environment, with the ultimate goal being the transformation into a filmless department/hospital. The timing of each phase does not have to be extensive; however, it is important to draw clear lines between the phases so that management of the implementation is clear cut.

4.5.2.3 Development of workgroups
Because PACS covers such a broad area within the healthcare institution, it is important to develop workgroups to handle some of the larger tasks and responsibilities. In addition, a PACS implementation team should be in place to oversee the timely progress of the implementation process. The following are some key workgroups and their responsibilities:
(1) RIS-PACS interface and testing: Responsible for integration and testing of RIS/PACS interfaces, including the modality worklist on the acquisition scanners.
(2) PACS modalities and system integration: Responsible for the technical integration of modalities with PACS and installation of all PACS devices.


Figure 4.11 Different stages of the conversion of a clinical space into a reading room for radiologists. Note the pinwheel-shaped design utilizing the center of the room. Power and networking are supplied through the column in the center of the floor.

(3) PACS acquisition workflow and training: Responsible for developing workflow and training for clerical and technical staff and for any construction needed in the clinical areas.
(4) PACS diagnostic workflow and training: Responsible for developing workflow and training for radiologists and clinicians and for any construction needed in the clinical diagnostic areas (e.g., reading room designs). Fig. 4.11 shows an example of the different stages of conversion of a clinical space into a reading room for radiologists.
(5) PACS network infrastructure: Responsible for all design and implementation of the network infrastructure to support PACS.
In addition to the above-listed workgroups, a PACS implementation team should be formed to oversee the implementation process. Members should include at least one point person from each workgroup; additional members should include the PACS implementation manager, the medical director of imaging, the administrative director of imaging, an IT representative, and an engineering/facilities representative. This team should meet at least every 2 weeks, and more frequently as the date of live implementation nears. The goals of this team are to update any status items and to highlight any potential stumbling blocks in the implementation process. In addition, these team meetings provide a forum for higher-level administrators to observe the progress of the implementation. It is crucial to identify particular in-house resources for the implementation process. These include a technical supervisor for each modality, a clerical supervisor, a film librarian or film clerk, an RIS support person, and an IT network support person. These resources are an excellent source of information on issues related to PACS, such as technologist workflow, clerical workflow, film distribution workflow, designing and performing RIS interface testing with PACS, and the overall hospital IT infrastructure.

4.5.2.4 Implementation management
Developing a schedule and implementation checklist can assist management of the implementation process. This template includes topics such as the task description, the date scheduled for the task, and the owner of the task. The template allows for finer granularity in the implementation process to protect against overlooked implementation tasks. Input for the checklist can come from the PACS implementation team meetings. Furthermore, the checklist can be broken down into smaller subtask checklists for tracking issues within each of the workgroups.

4.5.3 System acceptance
One of the key milestones before system turnover is the completion of acceptance testing (AT) of the PACS. There are a few reasons why AT is important to PACS. First, AT provides vendor accountability for delivering the final product that was initially scoped and promised. It also provides accountability for in-house administration, with documentation that the system was tested and accepted. AT also provides a glimpse into the characteristics of the PACS during uptime and whether it will function as promised. Finally, AT provides proof of both PACS performance and functionality as originally promised by the vendor. Most vendors provide their own AT plan; however, it is usually not thorough enough and is a template that is not customized to the specific healthcare institution's needs. The following describes some of the steps in designing and developing a robust AT that can be used for the final turnover of the PACS in the clinical environment. Acceptance test criteria are divided into two categories. The first category is quality assurance, which includes PACS image quality, functionality, and performance. The second category is technical testing, which focuses on the concept of "no single point of failure" throughout the PACS and includes simulation of downtime scenarios. Acceptance criteria should identify which PACS components are to be tested. Components that should be included are:
(1) RIS/PACS interface and/or PACS broker
(2) Acquisition gateways
(3) Modality scanner(s)
(4) Archive server/storage
(5) Diagnostic workstations
(6) Review workstations
(7) Network devices
(8) Web server (if included)
Each of the implementation phases of the PACS process should have an AT performed. Acceptance at each phase is also crucial for the vendor, because it is only after acceptance that the vendor can collect the remainder of the fee balance negotiated beforehand. The implementation of the AT is a two-phased approach. The first phase should be performed approximately 1 week before the live date. The content of phase one includes the technical component testing focusing on single points of failure, end-to-end testing, contingency solutions for downtime scenarios, and any baseline performance measurements. The second phase should be performed approximately 2 weeks after the live date, so that the PACS has stabilized in the clinical environment. The content of phase two includes PACS functional and performance testing as well as any additional network testing, all on a loaded clinical network.

4.5.4 Image/data migration
Two scenarios can trigger image/data migration: conversion to a new storage technology and increasing data volumes. A healthcare institution can experience a dramatic increase in PACS data volumes once it transforms into a filmless institution. This is due in part to continuous image accumulation as well as the integration of new modalities that generate large volumes of PACS data and archive those large quantities to PACS. For example, a multislice detector CT scanner is capable of generating up to 1000 images, amounting to almost 500 MB of data per exam. It is very likely that a hospital may need to expand its archive storage capacity. Furthermore, most PACS installed in previous years do not have a secondary backup copy of all the archived PACS image data for disaster recovery purposes; offering disaster recovery solutions is only a recent trend in PACS. Therefore, should a hospital decide to upgrade the archive server performance and expand with a higher-capacity data media storage system, a few major challenges face a successful upgrade. One challenge is how to upgrade to a new PACS archive server in a live clinical setting. Another is how to migrate the previous PACS data to a new data media storage system in a live clinical setting. Among the issues that surround a migration plan is that the data migration must not hamper the live clinical workflow in any way or reduce system performance. With any migration, it is important that verification be performed to prevent any data loss. Once the data have been successfully migrated to the new data media, the original data media storage system should be removed, which may incur additional downtime of the archive server. Development of a migration plan is key to addressing these issues and ensuring a data migration that has the least impact on the live clinical PACS. Because data migration occurs in a live clinical setting, it is important to determine the times at which the data migration will not impact normal clinical workflow. This may mean scheduling a heavier data migration rate during off-hours (e.g., nights and weekends) and a lighter rate during operating hours and hours of heavy clinical PACS use. Expert knowledge of the clinical workflow is valuable input toward developing a good data migration schedule. Downtime may be involved both initially and at the end of the data migration process and should be scheduled accordingly, with contingency procedures. It may be necessary to fine-tune the data migration rate because initial estimates of the migration rate may not be accurate. Fine-tuning is crucial because an aggressive migration rate can adversely affect the performance of the entire clinical PACS. Careful attention to archive and system performance is especially important at the onset of the data migration, and the data migration rate may need to be scaled back. This may be an iterative cycle until an optimal migration rate is achieved that does not adversely affect the clinical PACS.
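One simple way to express the heavier-off-hours, lighter-during-clinic policy is a time-of-day rate table, as in the sketch below; the rates and hour boundaries are invented for illustration and would be tuned iteratively as described above.

# Hypothetical data-migration throttle: heavy rate off-hours and on
# weekends, light rate during clinical hours. Rates in studies/hour.
from datetime import datetime

def migration_rate(now: datetime) -> int:
    weekend = now.weekday() >= 5           # Saturday or Sunday
    clinical_hours = 7 <= now.hour < 19    # assumed operating hours
    if weekend or not clinical_hours:
        return 200                         # aggressive off-hours rate
    return 20                              # light rate during clinic

print(migration_rate(datetime(2023, 1, 2, 14)))  # Monday 2 p.m. -> 20
print(migration_rate(datetime(2023, 1, 7, 3)))   # Saturday 3 a.m. -> 200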

4.5.5 Picture archiving and communication system clinical experiences and pitfalls
The following paragraphs give an overview of PACS clinical experiences at two differently sized healthcare institutions: a large-scale healthcare institution and a high-profile community-sized hospital. In addition, some PACS pitfalls will be discussed.

4.5.5.1 Clinical experiences at Baltimore VA Medical Center
The Baltimore VA Medical Center (VAMC) started its PACS implementation in the late 1980s and early 1990s. The VAMC purchased a PACS in late 1991 for approximately $7.8 million, which included $7.0 million for PACS and $800,000 for CR. The manufacturers involved were Siemens Medical (Erlangen, Germany) and Loral Western Developed Labs (San Jose, CA); the product later changed hands to Loral/Lockheed Martin and then to General Electric Medical Systems. The goals of the project were to integrate with the VA's home-grown clinical patient record system and the then to-be-developed VistA imaging system. The project has been under the leadership of Dr. Eliot Siegel, Chairman of the Radiology Department. The system went into operation in the middle of 1993 in the new Baltimore VAMC. It has since evolved and has been integrated with other VA hospitals in Maryland into a single imaging network, the VA Maryland Healthcare System. Four major benefits at the Baltimore VAMC are the change to filmless operation, fewer unread cases, lower retake rates, and drastically improved clinical workflow.


The two major contributors to the cost of the system are the depreciation and the service contract. The VA depreciates its medical equipment over a period of 8.8 years, whereas computer equipment is typically depreciated over a 5-year time period. The other significant contributor to the cost of the PACS is the service contract, which includes all of the personnel required to operate and maintain the system. It also includes software upgrades and replacement of all hardware components that fail or demonstrate suboptimal performance. This includes replacement of any monitors that do not pass the quality control tests. No additional personnel are required other than those provided by the vendor through the service contract. In the Baltimore VAMC, the radiology administrator, chief technologist, and chief of radiology share the responsibilities of a PACS departmental system administrator. Cost savings attributed to PACS include three cost areas: (1) film operations; (2) space; and (3) personnel. Films are still used in two circumstances. Mammography exams are still using films, but they are digitized and integrated to the PACS. Films are also printed for patients who need to have them for hospital or outpatient visits outside the VA healthcare network. Despite these two uses, film costs have been cut by 95% compared with the figure that would have been required in a conventional film-based department. Additional savings include reductions in filmrelated supplies such as film folders and film chemistry and processors. The second area in cost savings is space. The ability to recover space in the radiology department because of PACS contributes to a substantial savings in terms of space indirect costs. Finally, the personnel cost savings include radiologists, technicians, and film library clerks. An estimate was made that at least two more radiologists would have been needed to handle the current workload at the VAMC had the PACS not been installed. The efficiency of technologists has improved by about 60% in sectional imaging exams, which translates to three to four additional technologists had the PACS not been used. Only one clerk is required to maintain the film library and to transport film throughout the medical center. 4.5.5.2 Clinical experience at Saint John’s Health Center Saint John’s Health Center, Santa Monica, CA, has a filmless PACS that acquires approximately 130,000 radiological exams annually. As the first phase, St. John’s implemented the PACS with CR for all critical care areas in April 1999. Phase II, completed in April 2000, comprised the integration of MR, CT, US, digital fluorography, and digital angiography within the PACS. Since then, St. John’s PACS volumes have increased steadily. The original storage capacity of the PACS archive was a 3.0 TB MOD Jukebox, which would mean that older PACS exams would have to remain offline before a year is over. Also, the archive had only a single copy of the PACS data. Therefore, should St. John’s encounter a disaster, it might lose all the PACS data because there was no backup. With these considerations, St. John’s determined to overhaul its PACS archive system with the following goals: • Upgrade the archive server to a much larger capacity


These goals were accomplished in late 2001 based on the concepts discussed in this section. With the archive upgrade, all new PACS exams were archived through a Sun Enterprise 450 platform server with a 270 GB RAID. The exams were then archived to a network-attached digital tape storage system comprising an additional Sun Enterprise 450 with a 43 GB RAID and a digital tape library with 7.9 TB of storage capacity. The capacity of the tape library technology was forecast to double within a few years as tape density doubled, eventually making it a 16 TB library.
4.5.5.3 Picture archiving and communication system pitfalls
PACS pitfalls mostly stem from human error, whereas bottlenecks are due to imperfect design in either the PACS or the image acquisition devices. These drawbacks can only be identified through accumulated clinical experience. Pitfalls due to human error are often initiated at imaging acquisition devices and at workstations. Three major errors at the acquisition devices are entering wrong input parameters, stopping an image transmission process improperly, and incorrect patient positioning. Errors occur most often at the workstations, where users have to enter many keystrokes or mouse clicks before the workstation responds. Other pitfalls at the workstation unrelated to human error are missing location markers in a CT or MR scout view, images displayed with unsuitable lookup tables, and white borders in CR images due to X-ray collimation. Pitfalls created by human intervention can be minimized by a better quality assurance program, periodic in-service training, and interfacing image acquisition devices directly to the HIS/RIS through a DICOM broker. Bottlenecks affecting PACS operation include network contention; CR, CT, and MR images stacking up at acquisition devices; slow responses from workstations; and long delays for image retrieval from the long-term archive. Improving the system architecture, reconfiguring the networks, and streamlining operational procedures through a gradual understanding of the PACS clinical environment can alleviate these bottlenecks. Utilizing the IHE workflow profiles discussed earlier would also help circumvent some of the bottleneck problems. During the integration of multivendor PACS components, the components may not be compatible even though each comes with a DICOM conformance statement. These pitfalls can be minimized through the implementation of two DICOM-based mechanisms, one in the image acquisition gateway and the second in the PACS server, to provide better connectivity for multivendor imaging equipment in a large-scale PACS environment.
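As a rough illustration of the kind of gateway-side check that can catch such mismatches early, the sketch below uses the open-source pydicom library (our choice for illustration; the chapter does not prescribe a specific toolkit) to verify that key identifying attributes are present before an object is forwarded to the PACS server.

```python
# Minimal sketch of a gateway-side sanity check on incoming DICOM objects.
# A real acquisition gateway would also validate the values against
# HIS/RIS data, e.g., via a DICOM broker, before forwarding to the PACS.
import pydicom

REQUIRED = ["PatientID", "PatientName", "AccessionNumber",
            "StudyInstanceUID", "Modality"]

def validate(path):
    # Read only the header; pixel data are not needed for this check.
    ds = pydicom.dcmread(path, stop_before_pixels=True)
    missing = [kw for kw in REQUIRED if not getattr(ds, kw, None)]
    if missing:
        raise ValueError(f"{path}: missing or empty attributes {missing}")
    return ds
```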


4.6 Introduction to hospital clinical systems
PACS is a workflow-integrated imaging system designed to streamline operations throughout the entire patient care delivery process. One of its major components, image distribution, delivers relevant electronic images and related patient information to healthcare providers for timely patient care, either within a hospital or across a healthcare enterprise. Enterprise-level healthcare delivery emphasizes the sharing of integrated enterprise resources and the streamlining of operations. In this respect, if an enterprise consists of several hospitals and clinics, it is not necessary for every hospital and clinic to offer the same specialist services. A particular clinical service such as radiology can be shared among all entities in the enterprise. Under this setup, all patients registered in the same enterprise can be referred to a radiology expert center for examinations. In this scenario, the patient being cared for becomes the focus of the operation. A single master index, such as the patient's name/ID, would be sufficient for any healthcare provider in the enterprise to retrieve the patient's comprehensive record. For this reason, the data management system would not be the conventional HIS, RIS, or other organizational information system. Rather, the EMR concept will prevail; it is currently the leading technology for distributing health data for patients. The EMR concept has become so widely accepted that current HIS vendors have abandoned the term HIS and now market their products as EMRs. To develop the EMR, the successful integration of the HIS, the RIS, and additionally the voice recognition system is crucial. The following sections describe each of these hospital clinical systems, their interfaces with each other, and the EMR concept.
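Before turning to the individual systems, the single-master-index idea mentioned above can be illustrated with a toy lookup: one enterprise-wide identifier resolves to records held by any facility in the enterprise. All names, fields, and values below are hypothetical.

```python
# Toy master patient index (MPI): one enterprise-wide ID resolves to
# encounters held across facilities. Structure and values are hypothetical.
mpi = {
    "MRN-000123": {
        "name": "DOE^JANE",
        "encounters": [
            {"facility": "Hospital A", "system": "RIS", "accession": "ACC-77"},
            {"facility": "Clinic B",   "system": "Lab", "order": "LAB-9001"},
        ],
    },
}

def comprehensive_record(patient_id):
    # Any provider in the enterprise retrieves the full record via one key.
    return mpi[patient_id]["encounters"]

print(comprehensive_record("MRN-000123"))
```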

4.6.1 Hospital information system and the electronic medical record
The HIS is a computerized management system for handling three categories of tasks in a healthcare environment: (1) supporting clinical and medical patient care activities in the hospital; (2) administering the hospital's daily business transactions (financial, personnel, payroll, bed census, etc.); and (3) evaluating hospital performance and costs, and projecting long-term forecasts. Many clinical departments in a healthcare center, such as radiology, pathology, pharmacy, clinical laboratories, and other units, have their own specific operational requirements that differ from general hospital operations. For this reason, special information systems may be needed in these departments. Often, these information systems are under the umbrella of the HIS, which maintains their operations. Other departments may have their own separate information systems, with interface mechanisms built to integrate their data with the HIS. For example, the RIS was originally a


component of the HIS; later, independent RIS were developed because of the limited support offered by the HIS for the special information required by radiology departmental operations. However, the integration of these two systems is still extremely important for the healthcare center to operate as a total functional entity. Large-scale HIS mostly use mainframe computers. These can be purchased from a manufacturer with certain customized software or grown in-house through the integration of many commercial products, progressively over years. A home-grown system may contain many reliable legacy components but out-of-date technology. Therefore, to interface an HIS to PACS, caution must be taken to circumvent the legacy problem. Most HIS are an integration of many information data systems, starting from the day the healthcare data center was established, with older components replaced by newer ones over many years of operation. In addition to supporting the clinical operation, the HIS also supports hospital and healthcare center business and administrative functions. It provides automation for events such as patient registration, ADT, and patient accounting. It also provides on-line access to patient clinical results (e.g., laboratory, pathology, microbiology, pharmacy, radiology). The system broadcasts patient demographics and encounter information in real time to the RIS using the HL7 standard. Through this path, ADT and other pertinent data can be transmitted to the RIS and the PACS. An EMR is the collection of patient and population health information stored in a digital format. EMRs may include a range of data including demographics, medical history, medications and allergies, immunization status, laboratory test results, radiology images, vital signs, personal statistics such as age and weight, and billing information. The EMR is the current-generation successor to the HIS, developed to address compliance with the meaningful use law (the HITECH Act, described in a later section).

Figure 4.12 The concept of the electronic medical record (EMR) and its relation to HIS. The diagram depicts the EMR encompassing the HIS, lab system, RIS, nursing documentation, live vital data, pharmacy, billing system, registration system, physician documentation, and (optional) imaging. Although imaging data are currently optional in the EMR, they will soon be a requirement for all EMRs.


The EMR includes all HIS modules plus applications addressing the needs of the various departments in a hospital (see Fig. 4.12). This includes the management of data related to the clinic, the finance department, the laboratory, nursing, pharmacy, radiology, and pathology departments, nursing and physicians' documentation, and live vital signs and bedside device information. Many hospitals have as many as 200 disparate systems feeding information into their EMR. The EMR is the ultimate information system in a healthcare enterprise. In an even broader sense, if the information system includes the health record of an individual, it is called an EHR (electronic health record). In this context, we concentrate on the EMR. An EMR performs five major functions: (1) accepts direct digital input of patient data; (2) analyzes across patients and providers; (3) provides clinical decision support and suggests courses of treatment; (4) performs outcome analysis and patient and physician profiling; and (5) distributes information across different platforms and health information systems. As previously discussed, the HIS and even the RIS, which deal with patient nonimaging data management and hospital operation, can be considered components of the EMR. An integrated HIS-RIS-PACS system, which extends the patient data to include imaging, forms the cornerstone of the EMR. Existing EMRs share certain commonalities: they have large data dictionaries with time-stamped contents, and they can query and display data flexibly. Examples of successfully implemented EMRs are COSTAR (computer-stored ambulatory record) developed at Massachusetts General Hospital (in the public domain), the Regenstrief Medical Record System at Indiana University, the HELP (Health Evaluation through Logical Processing) system developed at the University of Utah and Latter-Day Saints Hospital, and the VAHE (Department of Veterans Affairs Healthcare Enterprise) information system. Among these, the VAHE is one of the most advanced in the sense that it is used daily in many VA Medical Centers and includes images in the EMR. Like any other medical information system, the development of the EMR faces several obstacles:
• A common method to input patient examination and related data into the system
• Development of an across-the-board data and communication standard
• Buy-in from manufacturers to adopt the standards
• Acceptance by healthcare providers
An integrated HIS-RIS-PACS system provides solutions for some of these obstacles. It has adopted the DICOM and HL7 standards for imaging and text, respectively:
• Images and patient-related data are entered into the system almost automatically.
• The majority of imaging manufacturers have adopted DICOM and HL7 as de facto industry standards.


Therefore, in the course of developing an integrated PACS, one should keep the big picture, the EMR, in mind. Future connections, and the role of the integrated PACS as an image-bearing subsystem of the EMR, should be considered thoroughly.

4.6.2 Radiology information system
The RIS is designed to support both the administrative and clinical operations of a radiology department, to reduce administrative overhead, and to improve the quality of radiological examination delivery. The RIS therefore manages general radiology-related patient demographics and billing information, procedure descriptions and scheduling, diagnostic reports, patient arrival scheduling, film location, film movement, and examination room scheduling. The RIS configuration is very similar to that of the HIS, except on a smaller scale. RIS equipment consists of a computer system with peripheral devices such as RIS workstations (normally without image display), printers, and bar code readers. Most independent RIS are autonomous systems with limited access to the HIS. However, some HIS vendors offer an embedded RIS as a subsystem with a higher degree of integration. The RIS maintains many types of patient- and examination-related information, including medical, administrative, patient demographic, examination scheduling, diagnostic reporting, and billing information. The major tasks of the system include: (1) processing patient and film folder records; (2) monitoring the status of patients, examinations, and examination resources; (3) scheduling examinations; (4) creating, formatting, and storing diagnostic reports with digital signatures; (5) tracking film folders; (6) maintaining timely billing information; and (7) performing profile and statistics analysis. The RIS interfaces to PACS based on the HL7 standard through TCP/IP over Ethernet on a client/server model utilizing a trigger mechanism. Events such as examination scheduling, patient arrivals, and examination begin and end times trigger the RIS to send previously selected information (patient demographics, examination description, diagnostic report, etc.) associated with the event to the PACS in real time.
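To make the trigger mechanism concrete, the sketch below shows how an event-driven HL7 v2 message might be pushed from the RIS to a PACS gateway over TCP/IP using the common MLLP framing. The message content, host name, and port are illustrative assumptions, not values from the text.

```python
# Sketch of a RIS event pushed to PACS as an HL7 v2 message over TCP/IP
# using MLLP framing (<VT> message <FS><CR>). Host, port, and message
# fields are hypothetical.
import socket

VT, FS, CR = b"\x0b", b"\x1c", b"\x0d"

def send_hl7(message: str, host="pacs-gateway", port=2575):
    with socket.create_connection((host, port)) as s:
        s.sendall(VT + message.encode("ascii") + FS + CR)
        return s.recv(4096)  # acknowledgment from the PACS interface

# A minimal order-style message triggered by a patient-arrival event:
msg = "\r".join([
    "MSH|^~\\&|RIS|RAD|PACS|RAD|202301011200||ORM^O01|MSG0001|P|2.3",
    "PID|1||MRN-000123||DOE^JANE||19600101|F",
    "ORC|NW|ACC-77",
    "OBR|1|ACC-77||71020^CHEST XRAY",
])
```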

4.6.3 Voice recognition system
Typically, radiological reports are archived and transmitted independently of the image files. They are first dictated by the radiologist and recorded on a digital voice recorder, from which a textual form is transcribed and inserted into the RIS several hours later. The interface between the RIS and the PACS allows these reports to be sent and inserted into the PACS database, from which the report corresponding to a set of images can be displayed on the PACS workstation at the user's request. This process is not efficient because of the delay imposed by transcription, which prevents the text report from


reaching the referring physician in a timely manner. One interim method is to append the radiologist's digital voice recordings to the PACS study. The concept is to associate the digital voice database with the PACS image database; thus, before the written report becomes available, the referring physician can look at the images and listen to the report simultaneously. The radiologist views images from the PACS workstation and uses the digital dictation system to dictate the report; the system converts the dictation from analog signals to digital format and stores the result in the voice message server. The voice message server in turn sends a message to the PACS data server, which links the voice with the images. A referring physician at a workstation, for example in an intensive care unit, can request certain images for review and at the same time listen to the voice report through the voice message server linked to the images. Later, a transcriptionist transcribes the voice data file using the RIS to generate the text report. The transcribed report is inserted into the RIS database server automatically, and the RIS server sends a message to the PACS database server. The latter appends the transcribed report to the PACS image file and signals the voice message server to delete the voice message. The ideal method, which is the current technology in most healthcare institutions, is to use a voice recognition (VR) system that automatically translates voice into text. In this case, the VR system is called from within either the PACS application or the RIS application. All the necessary fields are populated (e.g., patient name, medical record number, type of study), the radiologist begins to dictate, and the text report is generated instantaneously by converting the voice into text. Once the radiologist has completed the dictation, the report can be edited, reviewed, electronically signed off, and readied for distribution using voice command controls. In addition, report templates can be created for common diagnostic results, allowing the radiologist to quickly create a report via VR commands. The report is then sent to the RIS via an interface, and the RIS can forward the report to the PACS as needed. Thus the digital voice dictation system may see less use, while the VR system is enhanced with a full set of automatic templates that can be created on demand and utilized by radiologists.
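A signed-off VR report is typically carried from the RIS to the PACS as an HL7 result message. The following is an illustrative ORU^R01 skeleton with hypothetical field values; a production RIS interface would populate these from its database.

```python
# Illustrative HL7 ORU^R01 result message that a RIS might forward to PACS
# once the radiologist signs off a voice-recognition report. All values
# are hypothetical.
report_text = "CHEST XRAY: No acute cardiopulmonary abnormality."
oru = "\r".join([
    "MSH|^~\\&|RIS|RAD|PACS|RAD|202301011230||ORU^R01|MSG0002|P|2.3",
    "PID|1||MRN-000123||DOE^JANE",
    "OBR|1|ACC-77||71020^CHEST XRAY|||202301011200",
    f"OBX|1|TX|REPORT||{report_text}||||||F",  # F = final (signed) result
])
```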

4.6.4 Interfacing picture archiving and communication, hospital information, radiology information, and voice recognition systems for an electronic medical record
There are two current methods of transmitting health data between information systems: database-to-database transfer and an interface engine.
4.6.4.1 Database-to-database transfer
The database-to-database transfer method allows two or more networked information systems to share a subset of data by storing it in a common local area. For example,


the ADT data from the HIS can be reformatted to the HL7 standard and broadcast periodically to a designated local database in the HIS. A TCP/IP communication protocol can be set up between the HIS and the RIS, allowing the HIS to populate the local database and the ADT data to reach the RIS through either a pull or a push operation. This method is most often used to share information between the HIS and the RIS. A recent trend is the integration of the RIS and PACS databases. In this configuration, common elements are shared between both databases, and any changes or modifications made to the patient, study, or image information are updated once, without the need to update both databases manually. In addition, at the diagnostic workstation the RIS application can call the PACS application to display a particular study. An additional monitor is usually utilized to display the RIS application. The user navigates through the RIS application to identify and select the particular radiology study to be diagnosed, and the RIS application makes a function call to the PACS application to display the selected PACS study. This workflow is called "RIS-driven workflow," since the RIS drives the diagnostic workflow and the PACS acts as a client.
4.6.4.2 Interface engine
The interface engine provides a single interface and language to access distributed data in networked heterogeneous information systems. In operation, it appears to the user as if he/she were operating on a single integrated database from his/her workstation. In the interface engine, a query protocol is responsible for analyzing the requested information, identifying the required databases, fetching the data, assembling the results in a standard format, and presenting them at the workstation. Ideally, all these processes are done transparently to the user and without affecting the autonomy of each database system. Building a universal interface engine is not a simple task; most currently available commercial interface engines are tailored to a limited set of specific information systems.
4.6.4.3 Integrating health information, radiology information, picture archiving and communication, and voice recognition systems
Another recent trend to streamline the diagnostic workflow and provide as much clinical information as possible to the radiologist has resulted in the integration of the HIS/RIS/PACS/VR applications on one diagnostic workstation. This integrated workstation is often referred to as the radiology "command center" and allows the radiologist full access to all available pertinent and historical clinical data of a patient while making a primary diagnosis. Because this is a fairly recent technology trend, the complexities and challenges of integrating multiple applications on a single workstation have impacted users in terms of ease of use, reliability, and efficiency. More work is needed to fully realize the potential of such an integrated workstation.
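As a highly simplified illustration of the interface-engine idea in Section 4.6.4.2, the sketch below fans a single patient-ID query out to mock HIS and RIS sources and assembles the results into one standard record; the source systems, field names, and query shape are all hypothetical.

```python
# Toy interface engine: one query, several autonomous heterogeneous
# sources, one assembled result. Real engines map vendor-specific schemas
# and protocols; the sources and fields here are hypothetical.
def query_his(patient_id):
    return {"demographics": {"name": "DOE^JANE", "dob": "19600101"}}

def query_ris(patient_id):
    return {"reports": [{"accession": "ACC-77", "status": "final"}]}

SOURCES = [query_his, query_ris]

def unified_query(patient_id):
    record = {"patient_id": patient_id}
    for source in SOURCES:          # fetch from each autonomous system
        record.update(source(patient_id))
    return record                   # presented as one integrated view

print(unified_query("MRN-000123"))
```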


In a hospital environment, interfacing the PACS, RIS, and HIS has become necessary to enhance the diagnostic process; PACS image management, RIS administration, and research and training are also important aspects to consider when integrating these systems. For that matter, in the near future the EMR will become the main end-user application for the entire healthcare enterprise. Specialty departments such as radiology and cardiology will use the EMR application to initiate access to the imaging and imaging-related data needed for the diagnostic workflow, and the storage of each specialty department's imaging data will reside within a VNA.

4.7 Picture archiving and communication systems and electronic medical records
4.7.1 Changes in the roles of the picture archiving and communication systems and electronic medical records in healthcare
On February 17, 2009, the $787 billion American Recovery and Reinvestment Act of 2009 was signed into law by the United States federal government. Included in this law is the appropriation of $19.2 billion intended to increase the use of the EHR (another term for the EMR) by physicians and hospitals. This portion of the bill is called the Health Information Technology for Economic and Clinical Health Act, or the HITECH Act. The U.S. government believed in the benefits of using electronic health records and was ready to invest federal resources to proliferate their use. The HITECH Act accelerated the implementation of EMRs across the US, through a combination of government reimbursement rewards covering the cost of implementing an EMR and penalties, in the form of reduced future medical reimbursement rates, for healthcare institutions that did not meet the meaningful use timeline. Healthcare institutions can earn various incentive payments by meeting the criteria for three stages of "meaningful use" over the benchmarks set by the HITECH Act. "Meaningful use" consists of a set of standards that govern how electronic health records are used by healthcare providers such as physicians, clinicians, and hospitals. According to the Centers for Disease Control and Prevention, meaningful use is defined by a series of policy priorities for EMRs, including improved quality, safety, and efficiency of care, better coordination between providers, ensured privacy and security of personal information, and the engagement of patients in their own health. Organizations eligible for the Medicare EHR Incentive Program that achieved meaningful use by 2014 were eligible for incentive payments; those that failed to achieve that standard by 2015 were penalized. Stage 1 of the meaningful use program was announced in 2010, and its primary focus was EHR data and sharing. Healthcare providers were required to focus on storing health information electronically in a standardized format that makes it easy to


access for authorized providers and patients. Stage 1 also focused on tracking clinical conditions, using the EHR to better coordinate care, utilizing information to begin evaluating and reporting both clinical quality and public health information, and using the EHR to better involve patients in their own healthcare. The deadline to receive incentives for Stage 1 was 2015. Stage 2 of the meaningful use program was announced in 2014 and broadens the use of EHR software for health information exchange among providers. It features further integration of e-prescribing and lab results as well as more extensive sharing of patient care summaries. Those seeking meaningful use incentives have to continue to encourage patients to become active in their care. The deadline to receive incentives for Stage 2 was 2016. Stage 3 of the meaningful use program was announced in 2016; it takes the advancing clinical EHR practices of Stage 2 and refines them. The goal is to improve the quality of health information exchanged, which will lead to improved health for patients on a large scale. For providers, access to comprehensive patient data will be efficient and easy. Even public health problems, from chronic disease to the flu, can be impacted and reduced with the help of the EHR. The goal of Stage 3 is a more collated information network, from lab reports to immunization information. The deadline to receive incentives for Stage 3 is 2019. It is important for PACS users (especially radiologists) to have access to patient information while reviewing imaging studies in the PACS in order to make the best clinical diagnostic decision. It is therefore vital for imaging data to be consistent, during image acquisition, with the patient record and with the textual clinical information passed from the RIS to the PACS via the HL7 interface, as described previously. Before the implementation of the EMR as the current standard across the healthcare enterprise, PACS and HIS coexisted as two different silos within hospital clinical systems. They shared some data through a clinical interface but were generally treated as two different sets of patient records. The PACS utilized patient demographics and the accession number from the RIS to correlate images with existing patient information in the HIS. This was considered a PACS-driven workflow, where accessing the HIS was not always necessary. Once the EMR became the standard of care for health informatics data, PACS and imaging data were considered secondary to the patient record, integrating with the EMR to complete the overall patient record. PACS users can now easily access all clinically available data if there is integration between the PACS and the EMR. However, PACS image data access is now driven from the clinical data provided by the EMR, which differs from the PACS-driven workflow.


Figure 4.13 The Healthcare Information and Management Systems Society electronic medical record adoption model showing the eight stages of adoption and utilization. Note that PACS is integrated in Stage 1.

The Electronic Medical Record Adoption Model from the Healthcare Information and Management Systems Society is an eight-stage (Stages 0–7) model that measures the adoption and utilization of the EMR functions required to achieve a near-paperless environment that harnesses technology to support optimized patient care (see Fig. 4.13). In this eight-stage model, the PACS is integrated within the overall strategy at Stage 1 of the complete EMR implementation model. In this model, EMR integration with PACS is optional: it is considered a best practice for users, not a requirement carrying penalties like those of the HITECH Act. In the last 10 years, most of the resources and focus of healthcare enterprise organizations have shifted from investing in the advancement of imaging technology such as PACS to investing heavily in the implementation of the EMR. Therefore, with imaging as only part of the overall strategy, the PACS is no longer the driving force for medical information but only a part of the overall medical information solution centering on the EMR. This has had widespread effects not only on the strategies of healthcare institutions but on the PACS manufacturers as well. For example, as previously discussed, PACS


Figure 4.14 Eight major steps toward a successful electronic medical record implementation.

storage technologies have moved from a proprietary solution to the concept of the VNA in preparation for integration with the EMR.

4.7.2 Large-scale enterprise-wide electronic medical record implementation and design
Fig. 4.14 shows eight major steps for a successful large-scale implementation of an enterprise-wide EMR. Applying these steps to a specific healthcare enterprise produces a strategy that provides a foundation for implementation. Execution of this strategy can face a variety of workflow nuances, disgruntled users, and missing data that may require slight modifications and changes to even the best implementations. The steps are described in more detail below.
4.7.2.1 Step 1: strategic planning
The first step in any EMR implementation is to outline all the tasks and processes that need to be executed by the team of physicians, practice managers, and IT staff. This includes determining readiness and establishing goals, identifying key stakeholders and staff, considering collaboration possibilities, and developing and managing a project plan. Key tasks include the following:


• Recruit an implementation committee from stakeholder groups.
• Outline expected implementation costs and define the total budget.
• Schedule the implementation.
• Migrate patient and practice data.
• Create a user training program.
• Conduct EMR testing (testing and evaluating in a "live" practice environment).
• Clearly define go-live activities.
• Define critical success factors and evaluation strategies.

4.7.2.2 Step 2: adapting the workflow
There are two workflow approaches, and it is best to choose one of them before implementation. The first approach follows an industry best-practice workflow, whereas the second modifies the EMR to fit an existing clinical workflow. Usually, the best-practice workflow is preferred because it is a collection of successful industry implementation experiences, but it may require changes to the current clinical workflow that are disruptive by nature. Preparation strategies similar to those of a PACS implementation help here, such as outlining and analyzing the current workflow without the EMR (pre) and then developing a new workflow with the EMR (post) for each department throughout the hospital.
4.7.2.3 Step 3: financing
It is important to prepare a budget and forecast the potential costs of the EMR implementation. This is challenging, since unforeseen costs cannot be predicted or quantified. However, by defining a budget based on the workflow analysis and implementation timeline from the previous steps, the potential for additional costs can be minimized. Close monitoring of the following budget items can also help protect against unexpected cost overruns:
• hardware and network upgrades
• practice staff overtime and temporary staffing
• productivity loss (can be as high as a 35% reduction in patient throughput)
• customizations and related consultant costs from the EMR vendor
• vendor training fees
• any additional consultant costs
• data backups and storage
4.7.2.4 Step 4: recruiting the workforce
Another important step is identifying all staff leaders and consultants who will be involved in the project. The team members are responsible for developing and executing all communication and training plans, in addition to implementation tasks and the tracking of


milestones. The following is a list of team members and their responsibilities. The size of the implementation team will depend on the scope of the EMR implementation as well as the budget allocation. One key decision is whether to hire external EMR implementation specialists or consultants; while such specialists may be extremely beneficial during the implementation process, they come at a cost that will impact the overall budget.
• Project manager: responsible for managing the overall project
• Application analyst: responsible for data migration and cleansing
• Application developer: responsible for system customization
• QA tester: responsible for system testing and performance
• Physician advocate: represents the physicians and advises on training, data, and testing
• Nurse advocate: represents the nurses and advises on training, data, and testing
• Superusers: early adopters for training programs
• SME: subject matter expert consultant on EMR implementation
4.7.2.5 Step 5: collaboration
As with any large-scale implementation, team-based collaboration is a key contributor to success. This includes identifying and working collaboratively with members of the organization who are capable of assisting with the EMR implementation. A joint collaboration between nurses, physicians, and technology specialists throughout the clinical healthcare continuum is highly recommended to develop an effective EMR system with user acceptance and buy-in from the key stakeholders.
4.7.2.6 Step 6: choosing an electronic medical record vendor
Identifying the right EMR vendor is a complex process that may involve a variety of input metrics. Each healthcare institution may have its own criteria defining what makes a particular EMR vendor the right choice. However, the process for making that decision involves a few key steps. First, determine and define the system requirements for acquiring an EMR system; if a proposed solution does not meet the initial system requirements, the vendor may not be the right choice for the institution. Second, investigate potential vendors and solicit proposals through a request-for-proposal process. Finally, select the desired vendor and negotiate the final contract before beginning to plan the EMR implementation and, eventually, collaborating with the vendor following system go-live.
4.7.2.7 Step 7: go-live and preparation for clinical use
An EMR implementation plan should be developed based on the strategic planning described in Step 1. This can include major task items such as customizing data collection functionality, revisiting the clinical workflow, training system users, scheduling system


testing, evaluation, and pilot testing, migrating data, and, finally, going live with the system. Three things will impact an EMR implementation timeline: (1) the scope of the project; (2) the size of the project team; and (3) the budget of the project. If these three items have been defined through the previous steps, then the length of the implementation will depend on how carefully the three attributes were defined and outlined. For example, if the size of the project team is limited due to budget concerns, then the implementation timeline may be longer than usual. In a later section, a use case of a large-scale implementation is discussed to give the reader a sense of the length of an EMR implementation. In addition to the three key attributes above, there are additional factors to consider during implementation.
4.7.2.7.1 Data migration and cleansing

Key considerations for EMR data migration are the conversion of paper records (if any), data cleansing and verification, mapping legacy data to new database fields, and the testing and verification of legacy and new data.
4.7.2.7.2 Training program development

Successful training programs share the following characteristics: identifying and utilizing superusers as EMR user advocates, maintaining clear communication with vendor support teams, implementing role-based training to ensure relevancy, and making sure there are feedback loops to keep users in dialogue with project management. These characteristics are vital to a successful implementation, as one of the primary causes of EMR implementation failure is poor user adoption of the system.
4.7.2.7.3 Go-live activities

Activities both before and after go-live, including system testing, should be clearly defined. In addition to patient communication, the hospital staff should plan for expected downtime and project staffing schedules, including required overtime or temporary staff. Other activities include in-practice communications, such as signs on bulletin boards and other means of communication, checks of network (wired and wireless) speed and reliability, and data backup processes.
4.7.2.8 Step 8: system evaluation and optimizing for quality assessment
Establishing evaluation criteria early in the process is key to properly assessing the overall system during the actual system evaluation. This includes developing evaluation modules and logistics, pilot-testing the evaluation modules, executing evaluation-related assignments, and reporting, analyzing, and optimizing the findings. The evaluation of an EMR implementation can take many forms, and the best one for a given hospital practice depends on the overall project goals. In general, there are four areas that are often used to assess the


success of the EHR implementation. These are profitability, system efficiency, quality of care, and adoption through training.

4.7.3 Electronic medical record integration with medical images and picture archiving and communication system
Physicians need access to the highest-quality and most up-to-date information in order to make the best decisions for their patients. However, patient data are often spread across multiple platforms and clinical systems that are not necessarily fully integrated. The EMR generally stores textual information such as clinical notes from patient visits and test results. Separately, the PACS provides storage of and access to medical images from the radiology department. Until recently, DICOM images from radiology were stored only in the PACS, sometimes in a proprietary fashion. With the current VNA technology, both DICOM and non-DICOM image data from multiple departments can be stored together. Radiology images (in the PACS) and reports (in the EMR) are critical components of a patient's medical history. Ensuring that all physician providers have easy access to radiology information can help reduce costs by minimizing duplicate scans and redundant image and report distribution. The most common integration between the EMR and the PACS is based on the results report stored within the EMR. This approach assumes that users will first view the results report in the EMR; if there is a

Figure 4.15 The differences between how images are linked to reports for both radiology and nonradiology images.


finalized radiology report, the corresponding radiology images can be viewed through an integrated image viewer, either native to the PACS or a third-party application. Unfortunately, the PACS is limited in this role, since most PACS only allow access to radiology image studies; the PACS application may be unable to display and store many of the medical images that do not come from the radiology department. As described in previous sections, the VNA is designed to handle and store all types of medical images. Fig. 4.15 shows the differences in how reports are linked with images for radiology and nonradiology image data. On the left side of Fig. 4.15, for radiology images, the radiology report is linked with the DICOM image study in a one-study-to-one-report relationship. Each radiology study has a unique accession number referring to the image study. The radiology report, however, describes diagnostically only the specific DICOM images in that study, and is considered a study-centric view. On the right side of Fig. 4.15, most nonradiology images have one short clinical note covering many images within the same visit. Instead of a report, nonradiology images are part of a clinical note that shows the complete picture of the patient visit, and sometimes no report linked with the nonradiology images is generated at all. Also, for nonradiology images, the images and the clinical note are keyed to a visit or encounter number; integration within the EMR has to utilize this unique encounter number to associate all images of a single visit. These two differences show why integrating images with the EMR is complex and challenging: any image storage repository such as the VNA has to integrate like a PACS for radiology studies but also integrate for nonradiology (non-DICOM) images. In a typical PACS and EMR integration, the EMR makes a request to the PACS with the accession number, and the PACS returns a URL link (EMR systems are usually web-based) that links back to the images belonging to the study for display. However, for an EMR integrated with the VNA, in order to access both radiology and nonradiology images, the VNA not only has to receive an accession number but also a visit/encounter ID from the EMR, so that all imaging data for that specific patient visit are available for viewing. Ultimately, when considering the integration of the VNA/PACS and the EMR, healthcare organizations should consider the following four key elements to optimize immediate value: (1) plan for a best-of-breed solution architecture; (2) minimize the impact on the existing clinical care workflow; (3) keep the patient experience in mind when designing the integration; and (4) leverage current standards compliance where available.
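The accession-number handoff described above can be sketched as follows; the endpoint path and parameter name are hypothetical, not a specific vendor's API.

```python
# Sketch of the accession-number handoff: the web-based EMR composes a
# request for a viewer link for one study. Endpoint and parameter names
# are hypothetical.
import urllib.parse

def study_viewer_url(pacs_base: str, accession: str) -> str:
    query = urllib.parse.urlencode({"accession": accession})
    return f"{pacs_base}/viewer/study?{query}"

print(study_viewer_url("https://pacs.example.org", "ACC-77"))
# -> https://pacs.example.org/viewer/study?accession=ACC-77
```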


Figure 4.16 The six separate and distinct HIS (Affinity) instances for the four traditional hospitals and two multiservice ambulatory care centers, which produce the silo effect and, subsequently, the need for integration.

4.7.4 Electronic medical record implementation use case: Los Angeles County Department of Health Services ORCHID project
The Los Angeles County Department of Health Services (LAC-DHS) is the second largest public health system in the U.S., serving nearly 10 million residents with an operating budget of $4 billion annually. The LAC-DHS includes four traditional hospital-based facilities: Harbor-UCLA Medical Center (Harbor-UCLA), Los Angeles County-University of Southern California Medical Center (LAC+USC), Olive View-UCLA Medical Center, and Rancho Los Amigos National Rehabilitation Center (RLANRC); two multiservice ambulatory care centers: High Desert and Martin Luther King (MLK); and 10 additional comprehensive health clinics. As shown in Fig. 4.16, data from the LAC-DHS HIS operating in the six primary facilities could not be shared with other providers. The simple task of transporting a patient from a LAC-DHS hospital to another hospital was highly inefficient and cumbersome, since the data did not travel with the patient. For example, paper medical records were photocopied while transport ambulances sat idling. In some instances, the receiving hospital might not even have complete medical information about an incoming patient because not all the paper records had been forwarded.


Figure 4.17 The new design of ORCHID showing how all patient-related data are integrated into a single and centralized electronic medical record system supporting the six major hospitals and various health clinics.

Figure 4.18 The ORCHID timeline from the project kickoff and the various milestones including system design, build and implementation, and finally system validation.

Because of this, LAC-DHS made the decision to eliminate the existing HIS infrastructure, and the subsequent silo effect, through the implementation of a countywide EMR system. This EMR system would provide one single medical record throughout the LAC-DHS health network, with all patient records maintained in one integrated system. The project was called the Online Real-time Centralized Health Information Database (ORCHID), and the overall implementation design is shown in Fig. 4.17. The main goals of ORCHID were to comply with the attestation requirements of the meaningful use program while improving patient safety and care quality. The hope was that this new EMR system would make LAC-DHS more competitive in the healthcare industry by replacing the fragmented and obsolete HIS infrastructure and supporting outpatient care


Figure 4.19 Order of ORCHID deployment for the six major LAC-DHS facilities with Harbor-UCLA going first and Olive View the last hospital to go live.

restructuring as part of the overall healthcare reform. ORCHID would be a single source of truth, so that LAC-DHS could maintain a single medical record for each person receiving healthcare at any of the multiple LAC-DHS healthcare institutions. Fig. 4.18 shows the implementation timeline for the ORCHID project from the kickoff meeting to system validation. The project began on 11/27/2012, with the official kickoff occurring on 5/13/2013. The ORCHID project followed the eight implementation steps described in the previous section, reaching 10% completion by July 2013, 50% completion by October 2013, and 90% completion by January 2014. The project was finally completed in February 2015, with the Harbor-UCLA facility as the first go-live implementation, used for testing and evaluation. Fig. 4.19 shows the order of deployment for the six major LAC-DHS facilities. As stated before, Harbor-UCLA was the first facility to go live with the ORCHID system, followed by MLK (2/2015), LAC+USC (5/2015), High Desert (8/2015), RLANRC (11/2015), and finally Olive View (2/2016), in that order. With a uniform, standardized, and fully integrated countywide EMR system, LAC-DHS is now compliant with the meaningful use program.
4.7.4.1 Integration of ORCHID with picture archiving and communication system and non-DICOM images
Currently, the LAC-DHS radiologist workflow is PACS-driven, which lacks the presentation of all hospital-related patient information to the radiologist. As a result, radiologists may be missing clinical information that is key to the decision-making process. The current trend is to move to a RIS-driven (or EMR-driven) workflow, where PACS image studies are displayed with the EMR as the starting point. In this workflow, all of the patient's clinical information will be available, so that the


clinical decision will be much more comprehensive, with relevant clinical data integrated with images. For the LAC-DHS facilities, the current EMR (Cerner Corporation) integration is one-way, from the PACS (Fujifilm Medical Systems) to the EMR, even though the EMR vendor has achieved tighter two-way integration with other PACS vendors. It is highly likely, however, that the RIS/EMR-driven workflow integration will be completed by early 2019. Although the EMR/PACS integration mostly benefits the radiology department, LAC-DHS must also address the needs of all of its healthcare providers, who constitute 98% of all imaging-related users. Access to both DICOM and non-DICOM images will be provided through the planned implementation of VNA technology. Since there is currently no standard for integration between EMR and VNA technologies, LAC-DHS decided to integrate the VNA using the two methods below (described in the previous section) to gain the most flexibility in linking images to the EMR. (1) For radiology, cardiology, and laboratory imaging, where image studies have one accession number linked to one report (mostly DICOM), images will be linked from the VNA through the results report review window whenever the physician provider views a radiology, cardiology, or lab result. (2) For other images, where no report accompanies the images (mostly non-DICOM, and more than 60% of all hospital medical images), images will be linked from the VNA when reviewing a clinical note (e.g., clinical documentation). The physician provider will be able to view images based on the visit note, which, unlike a radiology study, can have different types of images associated with it.
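The two linking methods can be summarized as a simple dispatch on the image's source department, as in the sketch below; the function, field names, and URL shapes are illustrative inventions, not taken from the ORCHID implementation.

```python
# Hypothetical dispatch mirroring the two linking methods above: studies
# with an accession number (radiology/cardiology/lab, mostly DICOM) are
# linked from the results report, while other images are linked from the
# clinical note by visit/encounter ID.
ACCESSION_LINKED = {"radiology", "cardiology", "laboratory"}

def vna_link(image_meta: dict) -> str:
    if image_meta["department"] in ACCESSION_LINKED:
        # Method 1: one accession number <-> one report (study-centric)
        return f"/vna/studies?accession={image_meta['accession']}"
    # Method 2: many images per visit, linked from the clinical note
    return f"/vna/encounters/{image_meta['encounter_id']}/images"

print(vna_link({"department": "dermatology", "encounter_id": "ENC-42"}))
```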

4.8 Summary
In this chapter, the various components, terminology, and standards used in PACS and the EMR were presented and discussed. IHE consists of protocols for image data workflow that allow connectivity of PACS components from various vendors based on existing standards. The information systems used in hospitals are the HIS or CMS, which consist of many clinical databases, such as the RIS. These databases are operation oriented, designed for specific clinical services. The new trend in healthcare information systems is the EMR, which is patient oriented (i.e., the data go where the patient goes) and will eventually be integrated with all hospital-related medical images. Up-to-date information on these topics can be found in multidisciplinary literature, reports from research laboratories of university hospitals, and medical imaging manufacturers, but not in a coordinated way. It is therefore difficult for a radiologist, hospital administrator, medical imaging researcher, radiological technologist, trainee in diagnostic radiology, or student in engineering and computer science to collect and assimilate this information. One major purpose of this chapter is to provide a brief overview and to consolidate PACS-related topics and their integration with clinical information systems such as the EMR. PACS and medical imaging informatics is an ever-growing field that


mirrors the ever-changing information technology (IT) landscape. However, the fundamental concepts remain as important as ever and continue to form the bedrock of this expanding field. PACS has impacted the healthcare industry financially and operationally, streamlining clinical workflow and increasing the efficiency of the healthcare enterprise. Medical imaging informatics infrastructure is an emerging field focused on taking advantage of existing PACS resources, and their image and related data, for large-scale horizontal and longitudinal clinical, research, and education applications that could not be performed before due to insufficient data.

4.9 Exercises
(1) Based on the generic PACS basic components diagram and data flow (Fig. 4.1) and the component descriptions, identify the single points of failure in the client/server PACS architecture.
(2) Describe how the clinical workflow would be impacted by each of the single points of failure.
(3) Provide solutions to address the single points of failure identified.
(4) Develop a testing script describing how to perform acceptance testing for each of the single points of failure.
(5) If the EMR is now integrated with PACS, describe how the workflow would be impacted during each of the single points of failure in PACS identified in Exercise 1.

Further reading
[1] R.A. Bauman, G. Gell, S.J. Dwyer III, Large picture archiving and communication systems of the world, parts 1 and 2, Journal of Digital Imaging 9 (3 and 4) (1996) 99–103, 172–177.
[2] R.E. Dayhoff, K. Meldrum, P.M. Kuzmak, Experience providing a complete online multimedia patient record, in: Session 38, Healthcare Information and Management Systems Society, 2001 Annual Conference and Exhibition, Feb. 4–8, 2001.
[3] R. Dayhoff, E.L. Siegel, Digital imaging within and among medical facilities, in: R. Kolodner (Ed.), Computerized Large Integrated Health Networks: The VA Success, Springer Publishing, New York, 1997, pp. 473–490.
[4] D.S. Channin, Integrating the healthcare enterprise: a primer. II. Seven brides for seven brothers: the IHE integration profiles, RadioGraphics 21 (2001) 1343–1350.
[5] D. Channin, C. Parisot, V. Wanchoo, A. Leontiew, E.L. Siegel, Integrating the healthcare enterprise: a primer. III. What does IHE do for me? RadioGraphics 21 (2001a) 1351–1358.
[6] D.S. Channin, E.L. Siegel, C. Carr, Sensmeier, Integrating the healthcare enterprise: a primer. V. The future of IHE, RadioGraphics 21 (2001b) 1605–1608.
[7] DICOM Standard, 2003. http://medical.nema.org/.
[8] DICOM: Digital Imaging and Communication in Medicine, National Electrical Manufacturers' Association, NEMA, Rosslyn, VA, 1996.
[9] A.J. Duerincks, Picture Archiving and Communication System (PACS), in: Proc. SPIE for Medical Applications, vol. 318, 1982, Newport Beach, CA.
[10] HL7: Health Level Seven, An Application Protocol for Electronic Data Exchange in Health Care Environments, Version 2.1, Health Level Seven, Inc., Ann Arbor, MI, 1991.
[11] Health Level Seven. http://www.hl7.org/.
[12] HL7 Version 3.0: Preview for CIOs, Managers and Programmers. http://www.neotool.com/company/press/199912_v3.htm#V3.0_preview.
[13] H.K. Huang, Enterprise PACS and image distribution, Computerized Medical Imaging and Graphics 27 (2–3) (2003) 241–253.
[14] H.K. Huang, PACS and Imaging Informatics: Basic Principles and Applications, Wiley & Sons, NY, 2004.
[15] H.K. Huang, Picture Archiving and Communication Systems: Principles and Applications, Wiley & Sons, NY, 1999.
[16] H.K. Huang, K. Andriole, T. Bazzill, et al., Design and implementation of a picture archiving and communication system: the second time, Journal of Digital Imaging 9 (1996) 47–59.
[17] H.K. Huang, S.T.C. Wong, E. Pietka, Medical image informatics infrastructure design and applications, Medical Informatics 22 (4) (1997) 279–289.
[18] B.J. Liu, F. Cao, M.Z. Zhou, G. Mogel, L. Documet, Trends in PACS image storage and archive, Computerized Medical Imaging and Graphics 27 (2–3) (2003) 165–174.
[19] B.J. Liu, L. Documet, D. Sarti, H.K. Huang, J. Donnelly, PACS archive upgrade and data migration: clinical experiences, in: Proceedings SPIE Medical Imaging, San Diego, CA, vol. 4685-14, 2002, pp. 83–88.
[20] B.J. Liu, H.K. Huang, F. Cao, L. Documet, D.A. Sarti, A fault-tolerant back-up archive using an ASP model for disaster recovery, SPIE Medical Imaging 4685-15 (2002) 89–95.
[21] C.J. McDonald, The barriers to electronic medical record systems and how to overcome them, Journal of the American Medical Informatics Association 4 (May/June) (1997) 213–221.
[22] R. Osman, M. Swiernik, J.M. McCoy, From PACS to integrated EMR, Computerized Medical Imaging and Graphics 27 (2–3) (2003) 207–215.
[23] E.L. Siegel, J.N. Diaconis, S. Pomerantz, R.M. Allman, B. Briscoe, Making filmless radiology work, Journal of Digital Imaging 8 (1995) 151–155.
[24] E.L. Siegel, B.I. Reiner, Filmless radiology at the Baltimore VA Medical Center: a nine year retrospective, Computerized Medical Imaging and Graphics 27 (2–3) (2003) 101–109.
[25] E.L. Siegel, D.S. Channin, Integrating the healthcare enterprise: a primer. Part 1. Introduction, RadioGraphics 21 (2001) 1339–1341.
[26] F. Yu, K. Hwang, M. Gill, H.K. Huang, Some connectivity and security issues of NGI in medical imaging applications, Journal of High Speed Networks 9 (2000) 3–13.
[27] X. Zhou, H.K. Huang, Authenticity and integrity of digital mammography images, IEEE Transactions on Medical Imaging 20 (8) (2001) 784–791.
[28] www.rsna.org/IHE.
[29] EMRConsultant, Hospital Information Systems, 8/21/2013. http://www.emrconsultant.com/emreducation-center/emr-selection-and-implementation/hospital-information-systems-his/.
[30] Wikipedia, Electronic Health Record. https://en.wikipedia.org/wiki/Electronic_health_record.
[31] HIMSS Analytics EMRAM. https://www.himssanalytics.org/emram.
[32] B.L. Mooney, A.M. Boyle, 10 steps to successful EHR implementation, Medical Economics 88 (9) (2011) S4-6, S8-11.
[33] D. Golder, Meaningful use of EHR: the rules are final. Now what? Journal - Oklahoma State Medical Association 103 (9) (September 2010) 433.
[34] H.K. Huang, PACS-based Multimedia Imaging Informatics: Basic Principles and Applications, third ed., Wiley & Sons, NY, 2019.
[35] M.J. Gray, Latest trends in PACS/IT; solutions that mitigate PACS shortcomings are today's hot commodity, Advance for Imaging & Radiation Oncology 21 (6) (2011) 22–24.
[36] D. Yeager, VNA: should you? Radiology Today 16 (11) (2015) 24.

CHAPTER FIVE

Machine learning in medical imaging
Ashnil Kumar¹,², Lei Bi¹,², Jinman Kim¹,², David Dagan Feng¹
¹ Biomedical & Multimedia Information Technology (BMIT) Research Group, School of Computer Science, The University of Sydney, Sydney, NSW, Australia
² ARC Training Centre for Innovative BioEngineering, Sydney, NSW, Australia

5.1 Medical imaging 5.1.1 Role in healthcare In medicine, imaging refers to the use of devices to capture visual information about a given patient. These devices are not limited to optical cameras (e.g., dermoscopes) but also comprise imaging scanners that use reconstruction algorithms to derive images from other nonvisual forms of datade.g., the location of positron annihilation in positron emission tomography (PET) [1,2], reflection of sound in ultrasound (US) [3], or radiation absorption in X-rays and computed tomography (CT) [4]. The exact characteristics of medical imaging data may vary depending on the image modality, the department/application the images were acquired for, and the acquisition protocols used. A key example is magnetic resonance (MR), where changes in the acquisition protocol result in differences in image contrast among different tissues [5]. Another common example is the use of chemical contrast mediums to modulate the physiology of the patient to capture particular anatomical structures (e.g., the use of iodine in angiography to enable differential visualization of blood vessels) [6]. Similarly, in nuclear medicine, different radiotracers are bound to ligands that are preferentially taken up by different structures, with areas of high radiotracer concentration appearing as “hot spots” within the image data [7]. Finally, different imaging modalities exhibit different noise characteristics [8e10]. The variety and complexity of imaging data continue to increase through new advancements in device and tracer technologies; hybrid devices now combine multiple-modalities (PET-CT, PET-MR, SPECT-CT, etc.) to depict complementary information, new contrast media for CT and MR allow better visual discrimination of tissues, and novel radiotracers for PET enable new methods to examine biological function [11]. At the same time, developments in the sensor technologies used in imaging devices are increasing both the image resolution and the field of view, capturing larger views of subjects more precisely. The EXPLORER total-body PET scanner is one such example, where breakthroughs in lutetium oxyorthosilicate crystals and lutetiumyttrium oxyorthosilicate detectors have enabled the creation of a device with a larger Biomedical Information Technology ISBN 978-0-12-816034-3, https://doi.org/10.1016/B978-0-12-816034-3.00005-5


axial field of view that can capture dynamic activities across the entire human body at high resolutions with an almost 40-fold increase in sensitivity [12]. The scale of acquisition of visual data in volume, variety, and velocity is increasing the effort required to analyze and interpret the images. Given the volume of data and its complex visual contents (as just described), machine intelligence approaches offer a way to complement and support the analysis and interpretation of medical images for a range of clinical tasks. This is a challenge that researchers and industry (IBM, Siemens, etc.) are actively looking to address.

5.2 Machine intelligence and machine learning

Machine intelligence, often called artificial intelligence, refers to the capacity of machines to exhibit the cognitive functions demonstrated by humans and other animals [13,14]. It is, in essence, the ability of human-constructed "agents" (software tools or algorithms) to mimic the natural processes of knowledge acquisition, interpretation, and decision-making, often for the purpose of performing tasks in an autonomous or semiautonomous fashion. As such, machine intelligence is predicated on algorithms that are capable of using the acquired data to perceive the information necessary for the task from the environment [15,16], of reasoning about potential actions to complete the task, and of deciding upon the actions that maximize the chance of success [17]. In the current era, a fundamental component in building machine intelligence solutions is the process of machine learning (ML) [18,19]. Conceptually, ML refers to computerized algorithms that are able to "learn from experience" by progressively improving their performance on a given task based on their ongoing successes and failures at that task on a given dataset [18,20]. ML thus introduces the critical feedback loop whereby machine intelligence solutions can improve their ability to perform the requisite task, analogous to the human capacity to improve and adapt through experience. An overview is shown in Fig. 5.1. ML solutions can iteratively improve their perception, reasoning, or decision-making performance for different tasks. In contrast, systems based on programmatically following predefined rulesets [21,22] rely upon the humans writing the rules to "think" of all the possible variations the system will need to process. As such, these rule-based systems are generally incapable of adapting to changes or inputs that were not anticipated because they lack the fundamental ability to learn and improve. Modern ML techniques developed for the general domain usually make use of the "big data" nature of different problem domains (photographic image analysis, financial analysis, etc.). They are typically predicated on the ability to capture vast quantities of data alongside structured labels that define the "meaning" of the data. However, these circumstances are not present in the medical domain, leading to datasets that are smaller relative to those in the general domain, that have uneven distributions, and where labels

Machine learning in medical imaging

Figure 5.1 An overview of a typical machine learning (ML) approach for image data. The image data are fed into an ML computerized model, which analyzes the data to perform a particular task. An optimizer verifies how well this task has been performed, and the model is then updated (blue line) to improve its performance. This process continues iteratively until the optimizer determines that the model's error rate has reached its minimum or its accuracy has reached its maximum. The dashed lines represent additional inputs into the ML approach that are necessary for specific forms of learning.

may be unstructured or uncertain. In the general domain, labeled datasets have been created with mechanical turks [23], in which an Internet-based platform is used to solicit annotations for image data from everyday people. In the medical domain, however, it is difficult to use mechanical turks to create labeled datasets, due to the requirement that trained physicians conduct the labeling, as well as the variance in physician training, experience, and specialty expertise. These characteristics of medical imaging data mean that the direct application of generic ML solutions can be limited. The range of cognitive problems in medical imaging means that different learning approaches may need to be derived for each individual challenge. These challenges are compounded by the relatively smaller volume of data, especially labeled training data, available in medical imaging. The consistency of labeling and the distribution of samples are other characteristics of medical data that need to be accounted for by ML solutions. In the following sections, we briefly summarize the main categories of ML algorithms and then describe the ways in which they have been adapted for a variety of medical image analysis applications. Our intent is not to list all research that has explored ML in medical imaging, but rather to provide an overview of the main approaches through which ML research has been adapted, tuned, or extended for the characteristics of medical imaging data.

5.3 Supervised learning

5.3.1 Overview

Supervised learning is a form of ML that learns the mapping between a data sample and the desired output based on a set of sample-output pairs (also known as the training dataset); it is the most common form of ML [24]. Let X be the domain of the data samples and Y the domain of all possible outputs. The process of supervised learning


derives the parameters θ of the mapping function f: X → Y given the training dataset Dtrain = {(x1,y1), (x2,y2), ..., (xn,yn)}, in which xi ∈ X represents the ith data sample and yi ∈ Y represents the desired output (the label) of xi. In medical imaging, every xi is usually a representation of a particular image. This representation can take many forms, but the most common is a d-dimensional feature vector (xi ∈ R^d) that quantifies the visual attributes of an image: e.g., color [25], texture [26,27], or shape [28]. The parameters θ of the mapping function f are learned either by maximizing a score S over Dtrain, prioritizing functions f that obtain a higher degree of output correctness:

$$\max_{\theta} \; \frac{1}{n} \sum_{i}^{n} S\big(y_i, f(x_i; \theta)\big) \tag{5.1}$$

or by minimizing a loss L over Dtrain, prioritizing functions f that have a lower degree of output error:

$$\arg\min_{\theta} \; \frac{1}{n} \sum_{i}^{n} L\big(y_i, f(x_i; \theta)\big) \tag{5.2}$$

The choice of S or L may vary depending upon the specific task or the data. Common examples are the hinge loss used in support vector machines (SVMs) [29–31] and the cross-entropy loss common in convolutional neural network (CNN) classifiers [32,33]. When the size of Dtrain is small (in comparison to the number of parameters θ in f), a risk of supervised training is overfitting: the parameters θ produce a function f that is highly accurate for the data in Dtrain but does not generalize well to unseen data; in effect, the parameters θ memorize Dtrain. A common technique to combat overfitting is regularization [34], which adds a penalty to S or L according to the complexity of the parameters θ. The penalty perturbs the learning process, ideally forcing the optimization toward parameters θ that generalize better to unseen data. A cross-validation evaluation process [35] is usually followed to assess generalizability, especially with smaller Dtrain. In such schemes, the training data are divided into multiple folds and the algorithm is repeatedly trained on different folds (and evaluated on the held-out folds) to discover whether the parameters lead to consistent performance across all folds. One issue with cross-validation in the medical imaging domain is that many datasets include multiple images from the same patient. In these cases, it is important to ensure that all the images from one patient remain in the same fold to avoid contaminating the cross-validation process; the risk is that this may further reduce the size of each fold, leading to further challenges with overfitting.
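As a concrete illustration of patient-level fold assignment, the following is a minimal sketch using scikit-learn's GroupKFold; the feature matrix, labels, and patient IDs are synthetic placeholders, not data from any study cited above.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical data: one 64-dimensional feature vector per image,
# a binary label, and the ID of the patient each image came from.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))               # image feature vectors
y = rng.integers(0, 2, size=100)             # image labels
patient_ids = rng.integers(0, 25, size=100)  # several images per patient

# GroupKFold guarantees that all images sharing a patient ID fall in
# the same fold, so no patient appears in both training and validation.
cv = GroupKFold(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(cv.split(X, y, groups=patient_ids)):
    assert set(patient_ids[train_idx]).isdisjoint(patient_ids[val_idx])
    # ... train on X[train_idx], evaluate on X[val_idx] ...
```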


5.3.2 Classification with supervised machine learning

Classification is the process of categorizing samples into categories (referred to as classes) that are known a priori. It is often used in medical imaging to separate a dataset of images into clinically relevant classes, e.g., classifying images by disease type [36], grouping images by modality [37,38], identifying the numerous clinical subtypes in one image [39,40], or determining whether an image depicts a benign or malignant case [41,42]. Many medical image classification tasks require the consideration of multiple classes. A major challenge in multiclass classification problems is class imbalance, i.e., where the distribution of the classes in the training dataset is disproportionate. An example of such skewness can be seen in the public ImageCLEF modality classification dataset, which comprises 30 classes in which the number of training samples ranges from small (1 sample) to large (2954 samples) [43]. Obtaining additional data from some other source is one possible way to address this class imbalance but faces the same labeling challenges identified previously; moreover, in image classification tasks that rely upon subtle visual characteristics, there is a risk that the labels assigned by the new source are inconsistent with those assigned in the original dataset due to differences in experience, training, etc. As such, resampling techniques are the most common methods used to address data imbalance issues. Some classification methods use oversampling [44,45], in which the less frequent classes are randomly sampled additional times to achieve an equivalent distribution; however, this risks overfitting the smaller classes to only those samples that are present, as they will be seen multiple times during the training process. Other methods use undersampling [46], in which some elements of the larger classes are randomly subsampled to achieve data balance; however, this further reduces the size of the overall dataset, potentially leading to overfitting. Another technique is bootstrap resampling [47], where ensembles of multiple classifiers are trained on the same problem with different datasets, each of which comprises elements of the original dataset with combined over- and undersampling. The idea is to use the ensemble to avoid overfitting: each classifier in the ensemble will have been trained on different data, potentially leading to better generalization as a whole. The final method is data augmentation [32], where the data from the original dataset are resampled with label-preserving distortions (crops, flips, etc.), allowing the creation of different images with the same label.
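The following is a minimal sketch of naive random oversampling using scikit-learn's resample utility; the function name and data are illustrative assumptions, and the overfitting caveat noted above applies, since minority-class samples are simply repeated.

```python
import numpy as np
from sklearn.utils import resample

def oversample(X, y, seed=0):
    """Randomly re-draw samples from each minority class until every
    class matches the size of the largest class (naive oversampling)."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [], []
    for c in classes:
        Xc = X[y == c]
        # Resample with replacement up to the majority-class count.
        Xc_up = resample(Xc, replace=True, n_samples=target, random_state=seed)
        X_parts.append(Xc_up)
        y_parts.append(np.full(target, c))
    return np.concatenate(X_parts), np.concatenate(y_parts)

X_bal, y_bal = oversample(np.random.rand(50, 8), np.random.randint(0, 3, 50))
```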

5.3.2.1 Nearest neighbor approaches

One of the earliest uses of labeled data in medical imaging was the k-nearest neighbors (kNN) approach [48]. The training dataset is projected onto the feature space, after which a new unlabeled sample is classified based upon a weighted combination of the k labeled samples that are closest to the new sample within the feature space. This approach can be used in a variety of ways. Van Ginneken et al. [49] used kNNs for automatically classifying abnormalities in chest X-ray images according to texture patterns. Murphy et al. [50] used two kNNs as part of a framework for shape-based lung nodule detection and to reduce the number of false positive detections. Kumar et al. [51] used a kNN approach inspired by image retrieval to enable the annotation of CT images of the liver, classifying aspects such as the lobes of the liver, properties of the lesion, and vasculature.

5.3.2.2 Support vector machines

SVMs have been used to define a space in which the different classes are maximally separable, i.e., SVM classification learns from the training dataset a projection into a higher dimensional space where two classes can be separated by a hyperplane that maximizes the margin between the classes [34]; new unlabeled samples are classified according to the side of the hyperplane on which they fall. While linear SVMs are the most common form, many nonlinear variants have also been designed [52], enabling the application of SVMs to a wide variety of data and applications. Fan et al. [52] introduced a framework leveraging nonlinear SVMs to classify abnormal brain MR images using morphological features. Zhou et al. [53] trained SVMs to identify bone fractures across a variety of anatomical imaging modalities (X-rays, CT, MR). Teramoto et al.'s [54] pipeline for lung nodule classification first detected nodule candidates with a shape filter and then reduced the false positives among the candidates with an SVM tuned on seven key shape characteristics. In a follow-up study, they applied a similar approach to PET-CT images, this time using two SVMs for false positive reduction, one for each modality [55]. Song et al. [56] designed a multistage classification approach for thoracic PET-CT images in which SVMs were used at each stage: to detect abnormalities, to distinguish tumors from nodal disease, and for false positive reduction. While these studies have all demonstrated the flexibility of SVMs for medical image classification, they are generally limited by the SVM's nature as a binary classifier: it divides the dataset into two distinct classes. Two main SVM techniques have been utilized to deal with multiclass classification problems: 1-v-1 and 1-v-all SVMs. The concept behind the 1-v-1 SVM approach is to train (m² − m)/2 SVMs, where m = |Y| is the number of classes. Each SVM distinguishes whether the unlabeled sample belongs to one of two different classes; the results from the multiple binary classifications are combined to determine the final output. Similarly, the 1-v-all approach comprises m SVMs, where each SVM is trained to determine whether the unlabeled sample belongs to a specific class. Generally, one class-specific SVM will return a positive result, indicating the class of the new sample. In the case where multiple class-specific SVMs return positive results, tiebreakers with 1-v-1 SVMs may be used. Multiclass SVM approaches have been widely applied to a variety of problems. Zhu et al. [57] adapted a 1-v-1 multiclass SVM to classify neurodegenerative disorders in PET and MR using 202 images; a tenfold cross-validation scheme was

used to ensure that the developed method did not overfit to the training set. Rahman et al. [58] used multiclass SVMs within a medical image search framework; the probabilistic outputs of the SVMs were used to filter the search space and identify the relevant categories to be searched.
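A brief sketch of the two multiclass strategies using scikit-learn's generic wrappers follows; the data are random placeholders and the kernel choice is arbitrary.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 16))    # placeholder feature vectors
y = rng.integers(0, 3, size=60)  # m = 3 classes

# 1-v-1: trains m(m-1)/2 binary SVMs and combines them by voting.
ovo = OneVsOneClassifier(SVC(kernel="rbf")).fit(X, y)

# 1-v-all: trains m SVMs (each class vs. the rest); ties between
# positive responses are resolved via the real-valued decision scores.
ovr = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)
print(ovo.predict(X[:5]), ovr.predict(X[:5]))
```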

5.3.2.3 Supervised deep learning

Artificial neural networks, another form of ML, are designed to learn approximations of nonlinear functions [59]. They can be designed as multiclass classifiers through the use of activations such as the softmax function [60], which provides the probability that the sample belongs to each class, and the cross-entropy loss [36], which decreases logarithmically as the probability of the true class approaches 1. Deep CNNs, which have dominated image classification competitions in recent years [32,33,61,62], are a deep learning architecture optimized for imaging data: they learn spatial features from imaging data in a hierarchical fashion, with deeper layers learning features that are potentially more relevant to the classification application. Thus, the strength of deep CNNs lies in their capacity to internally learn a representation of the feature space in which the classification problem can be solved more effectively, i.e., where Eqs. (5.1) and (5.2) can be more effectively optimized. For this reason, deep CNNs are now considered the state of the art for medical image classification across a variety of imaging modalities for both binary and multiclass problems, e.g., CT lung nodule detection [63], cell pattern classification [64], and prostate cancer detection in MR [65]. Two landmark studies on large-scale datasets have demonstrated that CNNs can obtain classification accuracies consistent with those of clinical specialists in skin lesion classification in dermoscopic images [42] and in the identification of macular abnormalities in optical coherence tomography images [66]. Researchers have also explored how different CNNs can be used as an ensemble to address very challenging problems with skewed data distributions, the concept being that individual models within the ensemble have distinct strengths that can be pooled to achieve higher accuracy than the individual CNNs could achieve separately [37,67,68]. A key drawback of many CNNs is that training them generally requires a large quantity of labeled image data, e.g., roughly 1.2 million labeled images across the 1000 classes in the ImageNet classification challenge [69]. Deep CNNs require such vast quantities of data because of the number of parameters they contain; the AlexNet architecture, for example, has about 60 million parameters [32]. Shin et al. [70] carried out an extensive empirical study examining the performance of different CNN architectures and learning variants on different datasets to derive insights that could be applied to medical data. They demonstrated a transfer-learning approach, in which a CNN pretrained on some other problem domain (usually the ImageNet challenge data) is adapted for a medical imaging problem where there is limited data. The CNN is used as a feature extractor and the features are passed to a different classifier (generally an SVM) that is trained to address the medical imaging classification problem [70]; an example is illustrated in Fig. 5.2. Alternatively, the pretrained CNN is retrained (fine-tuned) with the smaller medical image dataset to make it more relevant to the medical imaging domain. Tajbakhsh et al. [71] demonstrated that a CNN fine-tuned on a smaller dataset could achieve classification results similar to those obtained by CNNs trained from scratch. Transfer-learned and fine-tuned CNNs have shown great success in a number of areas including cell image classification [72–74], X-ray image classification [75–77], US image classification [78–81], skin cancer classification [42,82–84], and ensembles in a wide variety of applications [37,67,84].

Figure 5.2 The concept of transfer learning with CNNs. A CNN is first trained in a general (photographic) domain where there is a large quantity of labeled training data. After training, the convolutional (Conv) layers and the fully connected (FC) layer associated with generic features are treated as a pretrained CNN. This pretrained CNN can be used as a generic feature extractor for a medical imaging domain where there is a smaller quantity of labeled training data; in this example, the generic features extracted from the medical images are used to train a support vector machine (SVM).
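A minimal sketch of the feature-extractor form of transfer learning shown in Fig. 5.2, assuming a torchvision ResNet-18 pretrained on ImageNet as the generic CNN (the chapter does not prescribe a particular architecture); the downstream SVM call is left commented since the medical images are assumed placeholders.

```python
import torch
import torch.nn as nn
import torchvision.models as models
from sklearn.svm import SVC

# Pretrained CNN (assumption: ResNet-18); dropping the final FC layer
# leaves a generic feature extractor, as in Fig. 5.2.
backbone = models.resnet18(pretrained=True)
extractor = nn.Sequential(*list(backbone.children())[:-1])  # remove FC
extractor.eval()

def extract_features(images):
    """images: float tensor (N, 3, 224, 224), ImageNet-normalized."""
    with torch.no_grad():
        feats = extractor(images)    # (N, 512, 1, 1)
    return feats.flatten(1).numpy()  # (N, 512)

# The generic features then train a conventional classifier on the
# small labeled medical dataset:
# svm = SVC(kernel="linear").fit(extract_features(train_images), train_labels)
```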

CNNs can also be custom designed for different applications. In medical imaging, the limited data mean that CNN design requires further tuning to select the architectural hyperparameters (the number of convolutional layers, the size of the convolutional kernels, the number of filters, etc.) as well as the training hyperparameters (learning rate, regularization, etc.); the intent is to have a model with sufficient capacity for the volume of training data available. Kamnitsas et al. [85] designed a 3-D CNN for brain lesions, choosing initialization and normalization strategies as well as kernel sizes to avoid overfitting to the smaller dataset available to them. In anatomical landmark detection, Zhang et al. [86] designed two CNNs staged in a hierarchical fashion: the first was trained on millions of patches from a small dataset to learn the association between patches and the anatomical landmarks, while the second (which shared some weights with the first) was used to predict the coordinates of the landmark.

5.3.2.4 Multilabel classification

A different type of classification problem is multilabel classification, in which each sample has more than one label [87,88]. The most common techniques approach this problem with an ensemble of binary classifiers that predict the presence of each possible label [89] or with loss functions that do not assume that the labels are mutually exclusive (e.g., Hamming distance or Jaccard index) [90,91]. Multilabel classification is a common task in medical imaging due to the visually complex content of the images as well as the complexity of the clinical conditions they describe. However, while multilabel classification is well suited to the complex nature of medical image data, current applications are limited by the significant effort required to collate training data in which samples have been given multiple labels. Where such training data are available, researchers have created algorithms for organ detection [92] and image annotation [51,93,94]. A major challenge is that many multilabel problems, and thus their corresponding datasets, can have skewed label distributions (some labels are more frequent than others, e.g., in cell protein localization [95]). This requires that the classifier be designed to reduce overfitting to the more frequent labels.
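A minimal multilabel head in PyTorch follows, using per-label sigmoid outputs with a binary cross-entropy loss rather than a mutually exclusive softmax; all sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

num_features, num_labels = 512, 14
head = nn.Linear(num_features, num_labels)
criterion = nn.BCEWithLogitsLoss()  # treats each label independently

features = torch.randn(8, num_features)                 # batch of image features
targets = torch.randint(0, 2, (8, num_labels)).float()  # multi-hot labels
loss = criterion(head(features), targets)

# At inference, labels may co-occur: threshold each sigmoid output.
predicted = torch.sigmoid(head(features)) > 0.5
```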

5.3.2.5 Classification of multimodality imaging data

The emergence of multimodality imaging, where one scan can comprise multiple types of images, is introducing new challenges for medical image classification. Multimodality PET-CT, PET-MR, and SPECT-CT scans, as well as different MR acquisition protocols, contain complementary information (e.g., anatomy from CT and function from PET) that needs to be appropriately integrated for particular classification tasks. For example, the classification of lung cancer may benefit from information about the anatomical structures from CT, information about the tumor metabolism from PET, and the spatial localization from the alignment of both modalities [2,96,97]. The general approach to multimodality image classification is to construct a feature space that combines or fuses information from each modality. With SVMs, this has been done by concatenating feature vectors obtained from each modality or by integrating the outputs of modality-specific SVMs [98–101]. A number of different CNN approaches have also recently been suggested; these either use each modality as a separate input channel [102] or process each modality separately and then merge the outcomes [103].
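The following sketch illustrates the second CNN strategy mentioned above: separate convolutional branches per modality whose outputs are merged before classification. It is an illustrative two-branch toy design, not a reproduction of any cited architecture; all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    """Late fusion for co-aligned PET-CT: each modality has its own
    convolutional branch, and the resulting feature vectors are
    concatenated before the classifier."""
    def __init__(self, num_classes=2):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.ct_branch, self.pet_branch = branch(), branch()
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, ct, pet):
        fused = torch.cat([self.ct_branch(ct), self.pet_branch(pet)], dim=1)
        return self.classifier(fused)

model = TwoBranchFusion()
logits = model(torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64))
```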

5.3.3 Image segmentation with supervised machine learning

Segmentation is the process of separating an image into different regions of interest (ROIs). This can take the form of delineating the boundary of each ROI or identifying the set of pixels that belong within the ROI. Segmentation is a fundamental element of many medical image analysis pipelines as it allows the separation of different parts of a complex image. The ROIs can then be processed separately in a more optimal manner, e.g., separating anatomical and pathological structures to improve medical image visualization [104]. When viewed from a supervised learning perspective, segmentation can be considered a per-pixel classification task. The output of a segmentation algorithm is a classification of whether each pixel belongs to a class that represents the boundary of an ROI (delineation or contour algorithms) [105,106] or is contained within an ROI (area algorithms) [107]. The classification decision can be based on image features extracted from the pixel in question, features from its neighborhood, or the contours of the different objects within the image. These features are used to ensure visual consistency for the extracted region. SVMs have been used for the segmentation of retinal blood vessels [108], 3-D US [109], brain tumors [110], and cardiac MR [111].

5.3.3.1 Segmentation with convolutional neural networks

The current state of the art in medical image segmentation is based on CNNs. Numerous studies have taken the Fully Convolutional Network (FCN) [112] or the U-Net [113] and optimized them for different forms of medical imaging data, e.g., skin lesion segmentation [114,115], liver segmentation [116], brain image segmentation [117], and cell detection and counting [118]. The 3-D FCN [119] and V-Net [120] are modifications of the FCN and U-Net that take advantage of the third spatial dimension in volumetric medical imaging modalities (CT, PET, MR). These CNN-based segmentation approaches use the convolutional architecture to learn the visual feature maps that are most relevant for the segmentation task, while the deconvolution and up-sampling layers generate the segmentation output (delineation or area) based on these learned characteristics. It should be noted that because such architectures perform per-pixel classification, they can be trained with relatively smaller datasets in comparison to image classification CNNs. However, as the number of pixels belonging to each class or object may differ (a class imbalance issue), a scaled loss function can be used to reduce overfitting to the dominant class [113].
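A minimal sketch of one way to realize such a scaled loss in PyTorch is a class-weighted per-pixel cross-entropy; the 1:9 weighting is an arbitrary illustration of up-weighting a rare foreground class, not a value from the cited work.

```python
import torch
import torch.nn as nn

# Per-pixel classification with a class-weighted cross-entropy: the
# rarer foreground (lesion) class is up-weighted so the network is not
# rewarded for predicting background everywhere.
class_weights = torch.tensor([0.1, 0.9])     # [background, lesion], assumed
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(2, 2, 128, 128)         # (batch, classes, H, W)
target = torch.randint(0, 2, (2, 128, 128))  # per-pixel ground truth
loss = criterion(logits, target)
```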


The scarcity of medical imaging training data means that in some cases the boundaries of the segmented ROI may not be well defined; the multiple down-sampling and up-sampling processes in CNN-based segmentation techniques may lose subtle information at region boundaries or may produce small artifacts in sparse regions of the segmented output. As such, it is quite common to couple CNN-based segmentation techniques with postprocessing algorithms that are optimized specifically for the segmentation task. For example, Kamnitsas et al. [85] used a conditional random field to determine the final segmentation from the probabilistic "soft" segmentations produced by the CNN. Similarly, Pereira et al. [121] designed a CNN segmentation algorithm for brain images in which a postprocessing step removed artifacts smaller than a predefined threshold; the CNN kernel sizes were chosen to reduce overfitting to a small training set. Bi et al. [114] designed a cascaded FCN capable of learning both the coarse appearance and the boundary information, as shown in Fig. 5.3.

5.3.3.2 Segmentation via statistical shape models

Another challenge with small training datasets is that algorithms must be flexible enough to account for the fact that the same object in different patients may exhibit different structural properties, based on the normal expected variation among humans as well as changes due to disease. Statistical shape models (SSMs) [122] quantify the expected variation of these structures according to a given shape dataset. During segmentation, a reference shape or initial contour is morphed or deformed according to the statistical constraints within the SSM, thereby restricting the segmented shape to the expected variations (i.e., reducing the likelihood of shape outliers). SSM-based techniques have been applied across a variety of anatomical image segmentation tasks, e.g., bone segmentation from MR [123], spine segmentation from CT [124], and

Figure 5.3 A cascaded FCN for skin lesion segmentation from dermoscopic images that was designed to address the challenges caused by limited training data [114]. Multiple FCNs were cascaded in a hierarchical structure such that the early-stage FCNs learnt the coarse appearance and localization information while the late-stage FCNs learnt the subtle characteristics of the ROI boundaries. The final output is produced by integrating the complementary segmentation results from the individual FCNs. In the diagram, t represents the index of the FCN across T cascades and Y is the intermediate segmentation output (a probability map).


liver segmentation from CT [125]. Level set methods, which can model changes in shape topology through the evolution of the SSM contour, are a well-established family of shape segmentation techniques that have been applied to the segmentation of retinal images [126], neonatal brain images [127,128], and liver tumors [129]. While the noise characteristics of the input image data can be a problem for all ML, shape models can be particularly susceptible to such noise, which may affect segmentation accuracy. Image modalities with a low signal-to-noise ratio, such as US, may present multiple candidate locations where the object of interest may occur or may display a high degree of variation/distortion at the borders of the segmented shape [125,130]. For such images, an initialization step indicating the initial position of the contour or a smoothing postprocessing step may be required to obtain a consistent shape boundary in the correct location. A new approach to address this is the integration of deep learning and deformable shape models [131–133], with the deep learning component used to correctly place the initial contour.

5.3.3.3 Saliency-based segmentation

The identification of salient regions is another way to approach the segmentation task. The main concept of the saliency approach is to quantify different regions according to their ability to draw the viewer's visual attention [134]. Saliency algorithms generate saliency maps: images in which the pixel intensities indicate areas of the input that have similar "salient" properties, i.e., similar visual importance. Supervised saliency algorithms are often trained with the location of the object of interest as a label (e.g., whether an arbitrary patch from the image contains the object of interest [135]) to learn a visual representation that can be used to quantify the saliency of different regions [136–138]. Saliency algorithms have been used in medical image segmentation tasks as a first stage to distinguish the ROI from the background, with other techniques used to refine the boundaries. Example applications involve the segmentation of skin lesions [139], noisy cell images [140], and vasculature [141]. In recent years, the concept of visual "attention" has combined image and descriptive text data (e.g., a picture and its corresponding caption) with the feature maps learned by CNNs to identify the areas of the image that correspond to different words [142]. This form of attention mapping is also possible with class labels rather than captions. Zhou et al. [143] used the global average pooling layer to create class activation maps for each individual label, which could be projected back onto the input image to examine the area of the input that was activated for a particular label. In medical imaging, such attention models have been used for the automatic generation of text descriptions, captions, or reports of medical imaging data [40,144,145]. This form of text generation offers a means to potentially reduce clinician reporting times, while also providing the visualization capacity that allows humans to interpret and verify the regions of the image used to create the machine-generated report.
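A sketch of class activation mapping in the spirit of Zhou et al. [143] follows: the final-layer classifier weights for one class linearly weight the last convolutional feature maps of a network ending in global average pooling. Shapes and names are illustrative assumptions.

```python
import torch

def class_activation_map(feature_maps, fc_weights, class_idx):
    """feature_maps: (C, H, W) from the last conv layer; fc_weights:
    (num_classes, C) weights of the final linear layer. Returns an
    (H, W) map highlighting regions that activated for class_idx."""
    cam = torch.einsum("c,chw->hw", fc_weights[class_idx], feature_maps)
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()  # normalize to [0, 1] for overlay/visualization
    return cam            # upsample to the input size to inspect regions

cam = class_activation_map(torch.randn(64, 7, 7), torch.randn(5, 64), class_idx=2)
```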


5.3.4 Image synthesis with supervised machine learning

Image synthesis is the process of artificially generating images that contain some particular desired content. It is analogous to the inverse of the classification problem: generating an image that contains the visual contents associated with a specific label. Generative adversarial networks (GANs) [146] are an architecture that can be trained to generate synthetic images. GANs consist of two CNNs trained in an adversarial fashion: the generator CNN is trained to create synthetic images that fool the discriminator CNN, which in turn is trained to distinguish between real and artificial images. In the general domain, GANs have been used to generate synthetic images for a given caption [147]. In the medical domain, it has been surmised that GANs may have the potential to address the limited availability of data through their ability to create visually realistic artificial images that share visual characteristics with real images of a given label or class [147]. These artificial images can then be used to augment the training dataset, increasing the number of samples for the learning process. GANs can be applied separately for each class to rebalance skewed datasets, creating new samples for classes with few examples, as shown in Fig. 5.4. A number of research studies have used synthetically generated images for data augmentation in cardiovascular MR segmentation [149], skin lesion segmentation [150], CT liver lesion classification [151], and mammogram classification [152]. Bi et al. [148] demonstrated that in segmentation applications, this type of GAN-based data augmentation can be used for a wide variety of medical imaging data. Studies have also suggested that GANs may enable the reconstruction of imaging data of one modality from an input of another modality [153,154]. This has the potential to reduce the number of imaging scans required for individual patients, which can be time-consuming and expensive. It may also open the door to improving patient safety, e.g., capturing images with nonionizing MR and then using GANs to reconstruct CT, which may reduce patient exposure to ionizing X-ray radiation.

Figure 5.4 Using GANs to augment a segmentation training dataset by deriving new samples for each class [148]. A data splitter separates the training data according to class information. Different GAN models are then trained using the corresponding training data for each class. After training, new samples are generated and combined with the original training data to train a new FCN for medical image segmentation.
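A toy sketch of one adversarial training step follows, with the generator and discriminator reduced to small MLPs over flattened 28×28 "images"; practical GANs for medical imaging use convolutional networks and many thousands of such steps.

```python
import torch
import torch.nn as nn

# Minimal generator/discriminator pair (assumed toy architectures).
G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(32, 784)   # stand-in for a batch of real images
z = torch.randn(32, 100)     # latent noise

# Discriminator step: real images labeled 1, generated images labeled 0.
fake = G(z).detach()
loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to make D label generated images as real.
loss_g = bce(D(G(z)), torch.ones(32, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```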

5.4 Unsupervised learning

5.4.1 Overview

Unsupervised learning [155] is a form of ML that learns from data samples for which the desired output is not specified. As such, unlike supervised approaches, unsupervised learning cannot directly learn mappings between X, the domain of the data samples, and Y, the domain of possible outputs. Instead, unsupervised approaches aim to discover hidden patterns or relationships among groups of the elements in the training dataset Dtrain = {x1, x2, ..., xn} that may encode or represent the structure of X. Thus, unsupervised learning derives the parameters θ of a function f: X → R^d such that


$$\big\| f(x_i; \theta) - f(x_j; \theta) \big\| \to \infty \tag{5.3}$$

as the patterns in the samples xi and xj become increasingly different, or

$$\big\| f(x_i; \theta) - f(x_j; \theta) \big\| \to 0 \tag{5.4}$$

as the patterns become increasingly similar. Multiple functions/patterns may be learned for any given Dtrain, and it is possible for samples to share some patterns and not others. Conceptually, unsupervised learning is the process of searching for commonalities and differences in the raw data, and of defining new samples according to the commonalities and differences they share. The lack of reliance on labels means that unsupervised learning techniques can take advantage of domains where there is a large volume of data that are difficult, time-consuming, or expensive to label, as is the case in medical


imaging. Furthermore, unsupervised techniques open the door to the possibility of using large unlabeled datasets to initialize supervised learning on smaller datasets (e.g., transfer learning or fine-tuning), enabling the development of medical image analysis techniques in a semisupervised fashion. A key limitation of unsupervised learning is that the patterns discovered may not necessarily have any semantic meaning within the data domain. As such, careful definition of the search process is necessary to ensure that the patterns that are learned are useful for the problem domain. For example, if the data are distributed nonlinearly, then a method that returns a linear form of f may not be applicable. Unsupervised techniques may also be susceptible to outliers, and care must be taken to validate how the learned patterns have been affected by the presence of potential outliers.

5.4.2 Unsupervised clustering

Clustering is an ML technique that groups samples such that the samples in each group fulfill some measured or perceived criteria that are not fulfilled by samples in another group; the grouping occurs without the use of any prior knowledge [156]. The exact criteria may differ depending on the dataset or application. For example, in the k-means algorithm [157], samples are divided into k groups (or clusters) such that the samples in one group have a different mean (as a group) compared to the samples in a different group. Meanwhile, density-based clustering algorithms [158] attempt to group points such that each group forms a dense collection within the feature space. Hierarchical clustering algorithms [159] work by first defining each sample as a separate cluster (or all samples as one cluster, if top-down) and then iteratively merging (or dividing, if top-down) clusters according to a distance metric until only one cluster remains (or as many clusters as there are samples remain). A human can then view the generated hierarchical tree to determine which level of the hierarchy is most appropriate, based upon their knowledge of the problem domain. Hierarchical clustering is generally the method of choice for smaller datasets because the hierarchies can be visualized with a dendrogram [160], which can be used to visually confirm the appropriateness of the clustering (the number of clusters and the nesting of levels) and to identify outliers. Unlike supervised classification, where the samples are divided across multiple known classes, unsupervised clustering aims to divide the data into several groups whose identity is not defined. Unsupervised clustering can be utilized in circumstances where there exists a large collection of unlabeled data and where it is not feasible to label sufficient data in pursuit of a supervised process. This form of unsupervised clustering can lead to semisupervised approaches (see Section 5.5). Humans can examine the samples in the clusters (e.g., visualize the images that fall within the clusters) and determine whether each cluster represents a class or category from the larger problem domain. Alternatively, in situations where small amounts of labeled data are available, the labeled


samples can be used as constraints to ensure that the generated clusters reflect the known groups; e.g., labeled samples can be used as cluster seeds or to refine the final clusters [161].

5.4.2.1 Image segmentation via unsupervised clustering

Unsupervised image segmentation is a well-established domain for clustering algorithms [161]. The pixels within a particular image are divided into different groups based on characteristics such as their intensity and spatial location; after the computation is complete, each cluster represents a different object or tissue type. Depending on the modality or application, such segmentation algorithms may be "fuzzy," allowing pixels to belong to multiple clusters with different degrees of membership. This may be due to characteristics such as the partial volume effect [162], where the details of small regions are lost due to the scanner's limited ability to capture fine details. Pham et al. [162] adapted the fuzzy C-means algorithm to model the intensity heterogeneity in MR and demonstrated that this technique was more effective when automatically segmenting noisy image data. Similarly, Belhassen et al. [163] proposed a fuzzy clustering algorithm to quantify PET tumors, adapting to the noise and low resolution of the imaging modality. As shown in Fig. 5.5, Bi et al. [164] used unsupervised clustering to identify the boundary, foreground, and background of skin lesions for cluster-based reconstruction of these regions. These approaches have also been adapted for brain tumor segmentation [165,166]. Other works have used clustering to generate initial segmentations that are later refined via level sets [167–169].
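A minimal k-means pixel clustering sketch with scikit-learn follows; clustering on intensity plus weighted spatial position is one simple choice, and the spatial weight is a free (assumed) parameter rather than a value from the cited works.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_segment(image, k=3, spatial_weight=0.1):
    """Cluster pixels on intensity plus (weighted) position, so each
    cluster roughly corresponds to one tissue type or region."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    features = np.stack([image.ravel(),
                         spatial_weight * ys.ravel() / h,
                         spatial_weight * xs.ravel() / w], axis=1)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
    return labels.reshape(h, w)  # per-pixel cluster index

segmentation = kmeans_segment(np.random.rand(64, 64))
```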


Figure 5.5 An unsupervised skin lesion segmentation method [164]. An input image (A) is segmented into smaller superpixels (B). The image boundary (in the form of superpixels) is then clustered (C). Reconstruction maps (D–F) are created for each cluster by measuring the similarity of the boundary cluster to the rest of the image. Finally, the different reconstruction maps are merged into a single map (G), which is then iteratively refined (H).


5.4.3 Unsupervised representation learning

Representation learning (also called feature learning) refers to ML techniques that derive from the dataset a feature space in which the data can best be characterized [170]. Note that representations can also be learned in a supervised setting, such that the learned representations rely upon the characteristics that differentiate samples belonging to each label. The presence of labels in a supervised setting means that the optimal representations can be learned directly. In contrast, in an unsupervised setting the representations must be derived entirely from the data alone. It is generally important to have more samples than the dimensionality of the input to avoid learning the identity function (memorizing the input samples, leading to an inability to generalize to unseen data).

5.4.3.1 Statistical approaches for unsupervised representation learning

One way to construct representations is according to the statistical properties of the features in an image, e.g., the correlation or covariance among pairs of features. Independent component analysis (ICA) [171] and principal component analysis (PCA) [172] are two common approaches that have been used in medical imaging. ICA is a technique for deriving a set of independent latent variables (components) that can be linearly combined to form the raw samples in the dataset. Conceptually, it is the process of representing the samples according to the uncorrelated features across the entire dataset. ICA-based approaches have been used on fMRI imaging to discover features related to physical or physiological processes in the brain [173], to extract brain networks based on white matter pathways [174], and to identify tumor tissues [175]. PCA is a similar process for data representation via components that can be linearly combined, with the key difference that the PCA components are derived to explain the variability in the dataset, i.e., the first component encodes most of the variability, the second component the second most, etc. By selecting the subset of components that encode a high degree of the variability (90% or greater are common thresholds), PCA can be used as a dimensionally reduced representation of the data. The PCA-reduced representation can then be used in supervised classification (even when the dataset is small or skewed) [37,176], in unsupervised segmentation [165,166], and for indexing and searching medical image data [177–179].
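A short sketch of variance-threshold PCA with scikit-learn follows; the library accepts the explained-variance fraction directly, and the data here are random placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 256))  # placeholder image feature vectors

# Keep the smallest number of principal components explaining >= 90%
# of the variance (a common threshold, as noted above).
pca = PCA(n_components=0.90).fit(X)
X_reduced = pca.transform(X)
print(pca.n_components_, pca.explained_variance_ratio_.sum())
```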

5.4.3.2 Deep unsupervised representation learning

Another form of representation learning is the autoencoder [19], an unsupervised artificial neural network trained to reconstruct the input data through an intermediate hidden or latent representation. Specifically, the autoencoder comprises two components: an encoder that transforms the input data into the hidden representation and a decoder that transforms the hidden representation in an attempt to reconstruct the input. The autoencoder is trained to minimize the difference between the original input and the reconstruction, meaning that the hidden representation becomes an encoding of the original data. Autoencoders are generally seen as a dimensionality reduction or data compression method that preserves the underlying information in the original data [180]. A key element in autoencoder design is the size of the hidden representation in contrast to the number of original data points it can model; an expressive autoencoder will produce a hidden representation in which each element can be reused in reconstructing many original samples [170]. Examples of the use of autoencoders in medical imaging include the compression of mammogram data [181], multiple organ detection [182], and breast cancer histopathology image analysis [183]. There are many variants of autoencoder architectures. Denoising autoencoders are designed to work with corrupted or noisy data and have been used in the medical domain for data imputation and for smoothing/denoising noisy images [184]. Su et al. [185] utilized a stacked denoising autoencoder for cell segmentation by learning to reconstruct cell boundaries from the gradients of image patches containing cells. Mehta and Majumdar [186] used a denoising approach for the reconstruction of CT and MRI images from subsampled data. Convolutional autoencoders incorporate convolutional layers to learn a hidden representation that exploits the spatial characteristics of image data [187]. Chen et al. [188] formulated a convolutional encoder-decoder network to reconstruct low-dose CT images in a manner that reduced noise while preserving structural details. Huang et al. [189] described a convolutional autoencoder used to learn functional MR patterns when studying task-specific brain network activity in a domain with no underlying ground truth of the neural activity.
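A minimal dense autoencoder in PyTorch illustrating the encoder/decoder structure and the reconstruction objective follows; practical medical-imaging variants would be convolutional, as discussed above, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Encoder compresses the input into a small hidden representation;
    the decoder reconstructs the input from it."""
    def __init__(self, n_in=784, n_hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, 128), nn.ReLU(),
                                     nn.Linear(128, n_hidden))
        self.decoder = nn.Sequential(nn.Linear(n_hidden, 128), nn.ReLU(),
                                     nn.Linear(128, n_in))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.rand(16, 784)           # flattened image patches (placeholder)
loss = nn.MSELoss()(model(x), x)  # reconstruction objective
# Backpropagating this loss in a training loop drives the hidden code
# to retain the information needed to rebuild the input.
```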

5.5 Semisupervised learning

Semisupervised learning [190] is a form of ML in which both labeled and unlabeled data are used to learn mappings between samples and the desired output; it is often a combination of unsupervised learning to derive a representation from a large dataset and supervised learning to map the representation space to known classes or categories. Thus the process of semisupervised learning derives the parameters θ and w of two mapping functions f: X → R^d and g: R^d → Y given the labeled training dataset Dtrain,labeled = {(x1,y1), (x2,y2), ..., (xn,yn)} as well as the unlabeled training dataset Dtrain,unlabeled = {xn+1, xn+2, ..., xn+m}, in which xi ∈ X represents the ith data sample and yi ∈ Y represents the desired output (the label) of xi. Conceptually, a general approach is to weight the sample-output mapping (supervised form) with the sample similarity (unsupervised form):

$$\min_{\theta, w} \left( \frac{1}{n} \sum_{i}^{n} L\big(y_i, g(f(x_i; \theta); w)\big) + \lambda \, \frac{1}{m} \sum_{j=n+1}^{n+m} \big\| f(x_i; \theta) - f(x_j; \theta) \big\| \right) \tag{5.5}$$

where L is a loss function and λ is the weight. While the optimization above is shown in a combined form, it can be carried out separately, in which case the unsupervised learning (the latter term) can act as an initialization for the supervised component (the former term). Semisupervised learning methods incorporate a broad range of the supervised and unsupervised learning techniques described in the previous sections. Due to the scarcity of labeled training data in the medical domain, many studies have attempted to leverage larger collections of unlabeled data together with smaller labeled datasets in semisupervised medical image analysis applications. As an example, Chang et al. [191] proposed an unsupervised convolutional sparse coding technique to learn a visual dictionary from a large unlabeled dataset, with the intent that the visual dictionary could be transferred or fine-tuned to another smaller dataset with labeled samples. Similarly, semisupervised techniques have also been used for medical image segmentation [192–197] and for improving classification and detection applications [198–200]. Semisupervised methods can also include interactive techniques in which the supervised component comprises a user's input, which is used to initialize the unsupervised process. In segmentation, the supervised input could be the specification of cluster seeds: initial points that should be included within the region being segmented (e.g., points that represent the foreground or background regions). Alternatively, the supervised input could specify the area of the image within which the unsupervised algorithm is expected to operate [201] (e.g., the area to be segmented, as shown in Fig. 5.6).
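A toy transcription of the combined objective in Eq. (5.5) follows; pairing each unlabeled embedding with its nearest labeled embedding is an assumed, illustrative choice for the similarity term, not a construction from the cited works.

```python
import torch
import torch.nn as nn

f = nn.Linear(64, 16)  # representation f(x; theta)
g = nn.Linear(16, 3)   # classifier g(.; w)
ce = nn.CrossEntropyLoss()

x_lab, y_lab = torch.randn(8, 64), torch.randint(0, 3, (8,))
x_unlab = torch.randn(24, 64)
lam = 0.5              # weight lambda in Eq. (5.5), assumed

z_lab, z_unlab = f(x_lab), f(x_unlab)
supervised = ce(g(z_lab), y_lab)
# Distance of each unlabeled embedding to its closest labeled embedding
# stands in for the unsupervised similarity term.
dists = torch.cdist(z_unlab, z_lab).min(dim=1).values
loss = supervised + lam * dists.mean()
```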

5.6 Reinforcement learning

Reinforcement learning [202,203] is a form of ML in which an agent learns to complete a particular task through cumulative rewards for successfully achieving the task via a sequence of actions. In contrast to supervised learning, reinforcement learning does not require labeled sample-output data, nor does it directly/immediately penalize incorrect or suboptimal predictions or actions. Rather, the focus is on rewarding the machine for eventually arriving at the optimal or correct solution even if this requires actions that may be suboptimal individually. Let p = (a1, a2, ..., aT) be a sequence of actions chosen from an action set A, taken over T time steps, and r(a|e) be the reward for taking action a given environment state e ∈ E = {e1, e2, ..., eN}. Then reinforcement learning is the process of maximizing the total reward to obtain a desired output state U, by applying action at within the environment et at time t via a transformation function φ:


Figure 5.6 Semisupervised PET tumor segmentation via a grow cut algorithm [201]. A user defines a bounding box that encapsulates the tumor on a PET image slice (A). The box is used to calculate the fore- and background seeds (pixel values); the foreground seeds are set to the peak value within the box while the background seeds are set to the value of the pixels belonging to the edges of the box. Beginning from these seeds, the grow cut algorithm will iteratively propagate through the neighborhood in an unsupervised manner to identify pixels belonging to the associated region. Upon convergence it yields a segmentation output (B) that can be visualized through 3-D rendering (C).

$$\max_{p} \; \sum_{t}^{T} \gamma^{t} \, r(a_t \mid e_t) \tag{5.6}$$

$$\text{given} \quad e_T = U \quad \text{and} \quad e_{t+1} = \phi(e_t, a_t) \tag{5.7}$$

where 0 < γ ≤ 1 is a discount factor that weights the contribution of future rewards.

CHAPTER SEVEN

Artificial intelligence in bioinformatics: automated methodology development for protein residue contact map prediction

Shi-Hao Feng, Jia-Yan Xu and Hong-Bin Shen

$$E = \begin{cases} 0, & d \le d_{\max} \\ P \, (d - d_{\max}), & d > d_{\max} \end{cases} \tag{7.1}$$

where P is the predicted contact score for a residue pair, d represents the current Cβ-Cβ Euclidean distance between them, and dmax is a statistic of the maximum Cβ-Cβ distance between two residues with significant contact scores (P > 0.5), computed on a dataset consisting of 150 protein sequences. Optimizing this energy term minimizes the distance between residues predicted to be in contact, which accelerates the conformation search procedure. More and more studies [18–20] and recent CASP competitions [21–23] have demonstrated that the reliable restraints provided by an accurately predicted 2-D contact map are the key factor in realizing accurate 3-D structure prediction.

7.2 Evaluation of prediction performance

Contacts in a protein can be divided into three classes: short-, medium-, and long-range, according to the number of residues that separate the contacting residues along the sequence. The numbers of separating residues for these three classes are 6–11, 12–23, and at least 24, respectively. Residue pairs separated by fewer than six residues along the sequence are not considered because they are sequentially close and quite likely to be in contact. Among the three classes, long-range contacts receive more attention than the other two due to the important role they play in protein folding [24]. Long-range contacts can largely reduce the conformation space and lead the structure prediction process to a more precise result. In contact-assisted structure prediction, not all predicted contacts are used. Usually, contacts are sorted according to their predicted scores, and those with higher scores are selected [25]. Thus, the commonly used measurements for evaluating the performance of contact map predictors are the precisions on the top L/10, L/5, L/2, and L long-range predicted contacts.
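A small sketch of the top-L/k precision measurement on a predicted score matrix follows; the score and contact matrices are random placeholders, and the sequence-separation cutoff of 24 implements the long-range definition above.

```python
import numpy as np

def top_k_precision(scores, contacts, seq_sep=24, k_frac=5):
    """Precision of the top L/k_frac long-range predictions. scores and
    contacts are symmetric L x L arrays: predicted contact scores and
    the true contact map (1 = contact)."""
    L = scores.shape[0]
    i, j = np.triu_indices(L, k=seq_sep)    # pairs separated by >= seq_sep
    order = np.argsort(scores[i, j])[::-1]  # sort pairs by predicted score
    top = order[: max(1, L // k_frac)]
    return contacts[i[top], j[top]].mean()

scores = np.random.rand(100, 100); scores = (scores + scores.T) / 2
contacts = (np.random.rand(100, 100) > 0.97).astype(float)
contacts = np.maximum(contacts, contacts.T)
print(top_k_precision(scores, contacts, k_frac=5))  # top-L/5 precision
```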

7.3 Contact map prediction models

Although the contact map is a simplified representation of protein 3-D structure, it is still not easy to predict. There are Pair = C(L, 2) = L(L − 1)/2 residue pairs in a protein with L residues. Pair grows quadratically with length L, as shown in Fig. 7.3, so even a short protein sequence contains many residue pairs. However, among these residue pairs, only about 2%–3% are truly in contact [26], so predicted contact maps tend to contain many false-positive samples. Despite the difficulties, contact map prediction has attracted many researchers, and various prediction methods have been applied to this problem, such as the support vector machine (SVM) [26–28], random forest [29,30], direct-coupling analysis (DCA) [31–36], sparse inverse covariance estimation [20], network deconvolution (ND) [37,38], neural network [25,39–44], recurrent neural network [45,46], deep belief network [47], and convolutional neural network (CNN) [48–54]. Meanwhile, many powerful protein features have been employed in prediction models, for


Figure 7.3 Illustration of the rapid growth of residue pairs with protein length. The numbers in the figure represent the residues in the protein. Pair represents the number of residue pairs; it grows quadratically with the number of residues in the protein, which causes difficulties for contact prediction.

instance, protein secondary structure, protein solvent accessibility, the position-specific scoring matrix (PSSM), and the multiple sequence alignment (MSA). In this section, we classify the recently proposed models into three categories: correlated mutation analysis (CMA), direct correlation analysis, and supervised learning models, as shown in Fig. 7.4. These three categories correspond to three successive stages in the development of contact map prediction. For each category, we describe several representative models. Table 7.1 lists some of the models mentioned in this chapter along with the methods they employ and the URLs of their Web servers or programs. PSICOV, BiLSTM, and ND in the Method column denote sparse inverse covariance estimation, bidirectional long short-term memory, and network deconvolution, respectively.

7.3.1 Correlated mutation analysis

At the beginning of contact prediction, the majority of successful approaches extracted correlated mutation residue pairs from the MSA, a process known as CMA. The MSA is a sequence alignment file consisting of three or more protein sequences that are homologous to the query protein sequence. These sequences are thought to descend from a common ancestor. The MSA is usually generated using a sequence alignment tool, such as BLAST [56] or HHblits [57], to search against a large protein sequence database. The principle of CMA is that if a pair of residues are in contact and critical for maintaining the fold of a protein, they will likely show the correlated mutation phenomenon, also called coevolution. That is, if one of them mutates in a way that may destroy the protein structure, the other residue is more likely to mutate into a complementary amino acid to ensure that the structure and function of the protein remain unchanged. Thus, it is reasonable to regard two correlated mutation residues as being in contact.


Figure 7.4 Illustration of three classes of models. Correlated mutation analysis models mainly predict contacts from the MSA file through correlated mutation phenomena; however, due to transitive noise, their performance is not satisfactory. Direct correlation analysis models are designed to tackle transitive noise and enhance precision. Supervised learning models employ powerful algorithms, such as SVM and CNN, and take advantage of a large number of proteins with known structures for training to make predictions.

To analyze and quantify the correlated mutation between two residues, several measures have been proposed. The Pearson correlation coefficient [58], originally a statistical measure of the correlation between two variables, was first used to predict contacts in 1994. From the MSA file, it first generates an N by N matrix for each position in the sequence to describe the variation at that position, where N is the number of sequences in the MSA file. The (k, l) entry in the matrix corresponding to position i represents the mutation from the residue observed at position i in the protein labeled k to the residue observed at position i in the protein labeled l in the MSA. Each entry is then transformed into a number using a residue similarity matrix derived from statistics. The newly obtained matrix is denoted s(i, k, l) and describes the mutations at position i. Finally, the correlated mutation between position i and position j can be calculated as


Table 7.1 Some protein contact map prediction models.

Name           Year   Method        URL                                                 Reference
PSICOV         2011   PSICOV        http://bioinf.cs.ucl.ac.uk/downloads/PSICOV         [20]
GREMLIN        2013   plmDCA        http://gremlin.bakerlab.org                         [32]
FreeContact    2014   mfDCA         https://rostlab.org/owiki/index.php/FreeContact     [33]
CCMpred        2014   plmDCA        https://github.com/soedinglab/ccmpred               [55]
MetaPSICOV     2014   NN            http://bioinf.cs.ucl.ac.uk/MetaPSICOV               [25]
PconsC         2014   RF            http://c.pcons.net/                                 [29]
BND            2015   ND            http://www.csbio.sjtu.edu.cn/bioinf/BND/            [38]
R2C            2016   SVM           http://www.csbio.sjtu.edu.cn/bioinf/R2C/            [26]
DNCON2         2017   CNN           http://sysbio.rnet.missouri.edu/dncon2/             [50]
RaptorX        2017   CNN           http://raptorx.uchicago.edu/ContactMap/             [52]
DeepContact    2018   CNN           https://github.com/largelymfs/deepcontact           [48]
DeepCov        2018   CNN           https://github.com/psipred/DeepCov                  [49]
SPOT-Contact   2018   CNN+BiLSTM    http://sparks-lab.org/jack/server/SPOT-contact/     [53]
ResPRE         2019   CNN           https://zhanglab.ccmb.med.umich.edu/ResPRE/         [54]

$$ r_{ij} = \frac{1}{N^2}\sum_{k,l} u_{kl}\,\frac{\left(s_{ikl} - \langle s_i \rangle\right)\left(s_{jkl} - \langle s_j \rangle\right)}{\sigma_i \sigma_j} \tag{7.2} $$

where σ_i and ⟨s_i⟩ represent the standard deviation and the mean of s_ikl, respectively, and u_kl is a weight that reduces the influence of sequentially similar proteins in the MSA; its value is related to the fraction of mismatched residues in the two aligned sequences. Later, more powerful measures were designed to predict contacts. Representative measures include mutual information (MI) [59] and its refinement MIp [60], both derived from information theory. They are based on Shannon's entropy (H), which measures the randomness or uncertainty of a random variable. In contact map prediction, for a column c in the MSA file, entropy is defined as follows:

$$ H_c = -\sum_{i=1}^{20} p(AA_i)\,\log_{20} p(AA_i) \tag{7.3} $$

where p(AA_i) represents the frequency with which amino acid i appears in column c of the MSA file. When there is only one type of amino acid in column c, the entropy equals 0; when all 20 types of amino acids appear with equal frequency, the entropy equals 1. In all other situations, the entropy lies between 0 and 1. The MI between column a and column b is calculated as follows:

$$ MI(a, b) = H_a + H_b - H_{ab} \tag{7.4} $$


where H_ab is the joint entropy, calculated analogously from the frequencies of all possible residue pairs in columns a and b; its value ranges from 0 to 2. MI measures the correlation between two columns in the MSA file: a large MI means the corresponding residue pair is possibly in contact. Although MI brought great progress to contact prediction, it is still not precise, because the MI between a pair of columns derives not only from structural interactions but also from random noise and phylogenetic noise [61]. Fig. 7.5A gives an illustration of phylogenetic noise. The black lines in the figure represent the phylogenetic tree, which describes the evolutionary relationship between the plotted sequences. One mutation from C to M occurs in the upper part of the tree, while another mutation from T to D occurs in the lower part. These two mutations are independent of each other but show a significant correlated mutation pattern. To deal with this problem, the MIp measure [60] was proposed. It shows that the MI contributed by random noise and phylogenetic noise can be approximated as follows:

$$ APC(a, b) = \frac{\overline{MI(a, x)}\;\overline{MI(b, x)}}{\overline{MI}} \tag{7.5} $$

where APC denotes the approximation term, and \(\overline{MI(a, x)}\) and \(\overline{MI}\) represent the mean MI of column a and the mean MI over all column pairs, defined respectively as

$$ \overline{MI(a, x)} = \frac{1}{m}\sum_{x \neq a} MI(a, x) \tag{7.6} $$

$$ \overline{MI} = \frac{2}{mn}\sum_{x=1}^{m}\sum_{y=x+1}^{n} MI(x, y) \tag{7.7} $$

Figure 7.5 Illustration of the two main sources of noise in contact map prediction. Phylogenetic noise derives from the evolution of the protein. In (A), the black lines represent the phylogenetic tree. There are two kinds of mutations: from C to M in the upper tree, and from T to D in the lower tree. These two mutations are independent but present the pattern of correlated mutations in the MSA file. (B) illustrates transitive noise—i.e., if both residue A and residue B are in contact with residue C, it is much more likely that residues A and B are predicted to be in contact.


where m = n − 1 and n is the number of columns in the MSA. Then, by subtracting APC(a, b) from MI(a, b), the MI derived from structural interactions is obtained:

$$ MI_p(a, b) = MI(a, b) - APC(a, b) \tag{7.8} $$

Because MIp reduces the influence of phylogenetic and random noise, it can give a more precise prediction than MI.
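To make Eqs. (7.3)–(7.8) concrete, the following minimal NumPy sketch computes MI and the APC-corrected MIp for an MSA stored as an N × L character matrix (as above); for simplicity it ignores gap characters and sequence weighting, which real implementations handle:

```python
import numpy as np
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_entropy(col):
    """Shannon entropy of one MSA column, log base 20 as in Eq. (7.3)."""
    freqs = np.array([np.mean(col == aa) for aa in AMINO_ACIDS])
    freqs = freqs[freqs > 0]                      # avoid log(0)
    return -np.sum(freqs * np.log(freqs) / np.log(20))

def joint_entropy(col_a, col_b):
    """Joint entropy H_ab from the frequencies of residue pairs."""
    pairs = Counter(zip(col_a, col_b))
    n = len(col_a)
    p = np.array([c / n for c in pairs.values()])
    return -np.sum(p * np.log(p) / np.log(20))

def mutual_information(msa):
    """MI over all column pairs of an N x L alignment, Eq. (7.4)."""
    L = msa.shape[1]
    H = np.array([column_entropy(msa[:, i]) for i in range(L)])
    mi = np.zeros((L, L))
    for a in range(L):
        for b in range(a + 1, L):
            mi[a, b] = mi[b, a] = H[a] + H[b] - joint_entropy(msa[:, a], msa[:, b])
    return mi

def mip(mi):
    """Apply the APC correction of Eqs. (7.5)-(7.8)."""
    L = mi.shape[0]
    mean_a = mi.sum(axis=1) / (L - 1)            # mean MI per column, Eq. (7.6)
    mean_all = mi[np.triu_indices(L, 1)].mean()  # mean MI over all pairs, Eq. (7.7)
    apc = np.outer(mean_a, mean_a) / mean_all    # Eq. (7.5)
    return mi - apc                              # Eq. (7.8)
```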

7.3.2 Direct correlation analysis

Besides phylogenetic noise, a further type of noise was recognized as contact map prediction developed, called indirect correlation or transitive noise. As shown in Fig. 7.5B, if residue A and residue B are both in contact with residue C, it is much more likely that residue A and residue B are predicted to be in contact. This kind of noise is widespread in contact map prediction. Multiple models have been proposed to disentangle direct correlations from indirect correlations. These models are referred to as direct correlation analysis models in this section and can be divided into three classes according to the models they employ.

7.3.2.1 Direct-coupling analysis

DCA models [31–36,55] reduce transitive noise based on the Potts model, a generalization of the Ising model in statistical mechanics. The Potts model was initially used to describe interacting spins on a crystalline lattice. It consists of several discrete variables, each of which can be in one of several states. Here, to understand the Potts model, we can regard it as a sequence generator: the state of each variable is one of the 20 amino acids or a gap, and the number of variables equals the number of columns in the MSA. By concatenating the generated amino acids and gaps, we obtain a sequence with the same length as the sequences in the MSA file. According to the theory of the Potts model, the probability of each generated sequence S is defined as

$$ P(S) = \frac{1}{Z} \exp\!\left( \sum_{i=1}^{n} h_i(S_i) + \sum_{1 \le i < j \le n} J_{ij}(S_i, S_j) \right) \tag{7.9} $$

The R2C model fuses the predictions of its supervised and unsupervised components with a range-dependent dynamic weight:

$$ O_{R2C} = weight \cdot O_{CMA} + (1 - weight) \cdot O_{ML}, \quad weight = \begin{cases} \min\{\log_2(M_{eff})/20,\; 0.3\}, & \text{if short-range} \\ \min\{\log_2(M_{eff})/16,\; 0.5\}, & \text{if medium-range} \\ \min\{\log_2(M_{eff})/12,\; 0.8\}, & \text{if long-range} \end{cases} \tag{7.26} $$

where O_R2C, O_ML, and O_CMA are the outputs of R2C, the ensemble SVM, and PSICOV, respectively. M_eff represents the number of effective sequences in the MSA file and is defined as

$$ M_{eff} = \sum_{p=1,\dots,N} \frac{1}{1 + \sum_{q=1,\dots,N} S_{p,q}} \tag{7.27} $$

where N is the number of sequences in the MSA. S_p,q measures the difference between sequences p and q: if the Hamming distance between them is less than 0.38, it is set to 1; otherwise, it equals 0. From Eq. (7.26), we can see that when the MSA contains a large number of effective sequences, the fusion strategy puts more weight on the unsupervised learning model. Moreover, compared with long-range contact prediction, the strategy puts more weight on the results of the ensemble SVM when predicting short-range contacts. These preferences were designed according to statistics observed in the experiments. After the fusion step, a 2-D Gaussian noise filter is employed to postprocess the raw prediction. Experiments show that the filter further improves the performance of the model by at least 2%, which demonstrates the existence of Gaussian noise in the predicted contact map. R2C also took part in the CASP competition and ranked second on top L/5 long-range contacts in CASP11.

7.3.3.2 Convolutional neural network-based models

CNN is a class of deep learning networks that has attracted much attention in recent studies. It can automatically extract high-level features from raw input features, and these learned features are much more powerful than human-designed ones. It has therefore brought significant improvements to a number of fields—for instance, image segmentation [69] and recognition [70]. In the contact map prediction field, CNN has also been employed in many works, and the performance largely exceeds that of the models mentioned above [48–53]. At the beginning of the application of CNN, a pipeline similar to that of the supervised learning models was adopted, except that the machine learning algorithms were replaced by CNN. Representative models include RaptorX [52], SPOT-Contact [53], DeepContact [48], and DNCON2 [50]. The features used in these models can be divided into two classes, 1-D and 2-D features, similar to those of MetaPSICOV.


The former are residue-level sequence features and consist of the PSSM, the HMM profile generated by HHblits, the predicted secondary structure, and the predicted solvent accessibility. The latter are the predicted results of unsupervised learning models, such as PSICOV, CCMpred, and mfDCA. Then, to generate a 3-D matrix Mat for the convolution operation, the 1-D features of residues i and j in the sequence are concatenated to form a new, longer 1-D feature vector that is placed in Mat(i, j, ·). In this way, the 1-D features of the residues neighboring residues i and j are also considered by the convolution operation when predicting whether i and j are in contact. Finally, the 2-D features are concatenated with Mat along the channel dimension to generate the final 3-D matrix. The deep learning architectures employed in the prediction models are mainly residual networks (ResNets) [70], which are widely used in fields like image processing and natural language processing. The pipeline of CNN-based models is illustrated in Fig. 7.6. Due to the great power of CNN in feature extraction and pattern recognition, these models deliver a significant performance enhancement. It should be noticed that all the features mentioned above are derived from the MSA: the 2-D features are generated from the MSA using the DCA model or the sparse inverse covariance estimation model, and the 1-D features, such as the PSSM and HMM profile, are all statistics of the MSA. So it is natural to wonder whether we can construct a model that directly predicts contacts from the MSA. This idea was implemented in the plmConv model [51]. To generate the MSA, plmConv employs Jackhmmer [71] to search against the UniParc database.
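The 1-D-to-2-D feature transformation described above (and illustrated in Fig. 7.6) can be sketched in a few lines of NumPy; this is a generic illustration of the idea, not the code of any particular predictor:

```python
import numpy as np

def build_input_tensor(feat_1d, feat_2d):
    """Assemble the CNN input tensor described in the text.

    feat_1d : (L, D) per-residue features (e.g., PSSM rows, predicted SS).
    feat_2d : (L, L, C) pairwise features (e.g., CCMpred/PSICOV outputs).
    Returns an (L, L, 2*D + C) tensor whose (i, j) entry holds the
    concatenated 1-D features of residues i and j plus C pairwise channels.
    """
    L, D = feat_1d.shape
    row = np.repeat(feat_1d[:, None, :], L, axis=1)   # (L, L, D): residue i
    col = np.repeat(feat_1d[None, :, :], L, axis=0)   # (L, L, D): residue j
    return np.concatenate([row, col, feat_2d], axis=-1)
```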

Figure 7.6 Illustration of a CNN-based model. Given a protein sequence, 1-D features of size L × D are first extracted. The 1-D features are then transformed into an L × L × (2 × D) matrix, whose (i, j, ·) entry is a vector consisting of the concatenated 1-D features of residues i and j. Meanwhile, the 2-D features are extracted and concatenated with the generated matrix along the channel dimension. The input matrix of the CNN model is thus an L × L × (2 × D + C) matrix, where C represents the number of 2-D features. The CNN model is composed of several residual blocks; in each residual block, there are two convolutional layers with a shortcut connection (red lines). The output of the CNN model is the predicted contact map.


Figure 7.7 Illustration of the feature transformation in plmConv. The L × L × 21 × 21 matrix J is derived from plmDCA, where L represents the length of the protein sequence. By concatenating all the 21 × 21 submatrices along the channel dimension, an L × L × (21 × 21) matrix Mat is obtained, which serves as the input of the following CNN model.

Then, by using the plmDCA model introduced in Section 7.3.2.1, the L × L × 21 × 21 matrix J is obtained, where L is the length of the sequence. The entry (m, n) of the 21 × 21 submatrix J_ij reflects the correlated mutation between residue type m at position i and residue type n at position j. To convert matrix J into a 3-D matrix Mat suitable for the subsequent convolution operations, all 21 × 21 submatrices in J are concatenated along the channel dimension, so the size of matrix Mat is L × L × 441. The transformation process is illustrated in Fig. 7.7. Finally, a three-layer CNN is employed to predict the final contact map from matrix Mat. In 2018, a powerful model called DeepCov [49] was proposed. It adopts a pipeline similar to plmConv's, except that the CNN is deeper, with five convolutional layers, and the matrix J is replaced by the covariance matrix S introduced in the PSICOV model section. In the latest CASP13, the first-ranked model, TripletRes [72], also predicts the contact map only from coevolution features. It uses three ResNets with 24 convolutional layers each to process the covariance matrix S, the matrix J predicted by plmDCA, and the matrix Q predicted by PSICOV, respectively. The outputs of the three ResNets are then concatenated along the channel dimension to form a 3-D matrix that serves as the input of a further ResNet with 24 convolutional layers, whose output is the predicted contact map. All these studies indicate that predicting the contact map directly from the MSA with deep learning models is a promising future research direction.
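The plmConv feature transformation of Fig. 7.7 is essentially a reshape; a minimal NumPy sketch:

```python
import numpy as np

def flatten_couplings(J):
    """Flatten plmDCA couplings of shape (L, L, 21, 21) into (L, L, 441),
    turning each 21 x 21 coupling submatrix into 441 channels."""
    L = J.shape[0]
    return J.reshape(L, L, 21 * 21)
```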

7.4 Performance significantly depends on MSA features

MSA is the main information source for generating the features of almost all contact map predictors. The quality of the MSA, which is related to a certain degree to the number of effective sequences, determines the precision of the final prediction [19]. Thus, besides employing powerful prediction algorithms, it is beneficial to explore more effective frameworks for generating high-quality MSAs.


There are at least two ways to obtain more effective sequences: taking advantage of larger databases, and employing more sensitive sequence alignment algorithms to retrieve more remote homologous sequences. For instance, a two-stage generation framework was proposed in MetaPSICOV [25]. It first uses HHblits [57] to search against a small database, uniprot20. If the number of homologous sequences is less than 3000, it uses a more sensitive algorithm, Jackhmmer from the HMMER software suite [71], to search against a larger database, UniRef100. This generation framework was later refined in the recent model TripletRes [72]. First, HHblits is employed to search against Uniclust30 to obtain a homologous sequence set h1 and an MSA M1. If the number of effective sequences in M1 is less than 128, Jackhmmer is employed to search against UniRef90, yielding a larger homologous sequence set h2; from h1 and h2, a more accurate MSA, M2, is generated by HHblits. Again, if the number of effective sequences in M2 is less than 128, the HMMsearch algorithm in the HMMER software is employed to search against Metaclust [73] using an HMM profile generated from M2, giving a homologous sequence set h3. Finally, the most accurate MSA, M3, is generated from h1, h2, and h3 by HHblits. This elaborate MSA generation framework is one of the reasons that TripletRes achieved better performance in CASP13.
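The control flow of this staged strategy can be summarized with the following Python sketch; run_hhblits, run_jackhmmer, run_hmmsearch, build_hmm_profile, align, and num_effective are hypothetical wrappers around the corresponding tools, not real APIs:

```python
def build_msa(query_seq, neff_cutoff=128):
    """Staged MSA generation in the spirit of TripletRes (Section 7.4).

    Each run_* function is assumed to return a set of homologous
    sequences; align() builds an MSA from such a set, and
    num_effective() counts effective sequences as in Eq. (7.27).
    """
    h1 = run_hhblits(query_seq, db="uniclust30")
    msa = align(h1)                       # M1
    if num_effective(msa) < neff_cutoff:
        h2 = run_jackhmmer(query_seq, db="uniref90")
        msa = align(h1 | h2)              # M2, rebuilt from h1 and h2
        if num_effective(msa) < neff_cutoff:
            profile = build_hmm_profile(msa)
            h3 = run_hmmsearch(profile, db="metaclust")
            msa = align(h1 | h2 | h3)     # M3
    return msa
```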

7.5 Conclusions

The contact map is a 2-D representation of a protein's 3-D structure. It provides crucial constraints for protein structure prediction, which greatly reduce the conformation space to be explored. At least three classes of predictors have appeared in contact map prediction: CMA, direct correlation analysis, and supervised learning models. Besides the development of prediction algorithms, much effort has also been devoted to MSA feature generation. Accurate contact map prediction remains a challenging problem, considering the extreme imbalance between the true contact pattern and all candidate pairs. In the future, exploring transparent deep learning models that can effectively incorporate more domain knowledge, as well as generating more discriminative features from the MSA, will be a promising way to further improve performance in this area.

References
[1] J. Schlessinger, Cell signaling by receptor tyrosine kinases, Cell 103 (2) (2000) 211–225.
[2] M.M. Davis, P.J. Bjorkman, T-cell antigen receptor genes and T-cell recognition, Nature 334 (6181) (1988) 395.
[3] R.K. Murray, et al., Harper's Illustrated Biochemistry, McGraw-Hill, New York.
[4] L. Schrodinger, The PyMOL Molecular Graphics System, Version 1.8, 2015. https://pymol.org/2/.
[5] C.B. Anfinsen, Principles that govern the folding of protein chains, Science 181 (4096) (1973) 223–230.
[6] Y. Zhang, Progress and challenges in protein structure prediction, Current Opinion in Structural Biology 18 (3) (2008) 342–348.
[7] P. Fariselli, R. Casadio, A neural network based predictor of residue contacts in proteins, Protein Engineering 12 (1) (1999) 15–21.
[8] O. Lund, et al., Protein distance constraints predicted by neural networks and probability density functions, Protein Engineering 10 (11) (1997) 1241–1248.
[9] M. Vendruscolo, E. Kussell, E. Domany, Recovery of protein structure from contact maps, Folding & Design 2 (5) (1997) 295–306.
[10] J. Chen, H.-B. Shen, Glocal: reconstructing protein 3D structure from 2D contact map by combining global and local optimization schemes, Current Bioinformatics 7 (2) (2012) 116–124.
[11] M. Vassura, et al., Reconstruction of 3D structures from protein contact maps, IEEE/ACM Transactions on Computational Biology and Bioinformatics 5 (3) (2008) 357–367.
[12] J.B. Saxe, Embeddability of weighted graphs in k-space is strongly NP-hard, in: Proc. of 17th Allerton Conference in Communications, Control and Computing, Monticello, IL, 1979, pp. 480–489.
[13] G.M. Crippen, T.F. Havel, Distance Geometry and Molecular Conformation, vol. 74, Research Studies Press, Taunton.
[14] B. Adhikari, et al., CONFOLD: residue-residue contact-guided ab initio protein folding, Proteins 83 (8) (2015) 1436–1449.
[15] B. Adhikari, J. Cheng, CONFOLD2: improved contact-driven ab initio protein structure modeling, BMC Bioinformatics 19 (1) (2018) 22.
[16] T. Nugent, D.T. Jones, Accurate de novo structure prediction of large transmembrane protein domains using fragment-assembly and correlated mutation analysis, Proceedings of the National Academy of Sciences 109 (24) (2012) E1540–E1547.
[17] T. Kosciolek, D.T. Jones, De novo structure prediction of globular proteins aided by sequence variation-derived contacts, PLoS One 9 (3) (2014) e92197.
[18] M. Michel, et al., PconsFold: improved contact predictions improve protein models, Bioinformatics 30 (17) (2014) i482–i488.
[19] S. Ovchinnikov, et al., Protein structure determination using metagenome sequence data, Science 355 (6322) (2017) 294–298.
[20] D.T. Jones, et al., PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments, Bioinformatics 28 (2) (2011) 184–190.
[21] M. Mabrouk, et al., Analysis of free modeling predictions by RBO aleph in CASP 11, Proteins 84 (2016) 87–104.
[22] J. Schaarschmidt, et al., Assessment of contact predictions in CASP12: co-evolution and deep learning coming of age, Proteins 86 (2018) 51–66.
[23] J. Xu, S. Wang, Analysis of Distance-Based Protein Structure Prediction by Deep Learning in CASP13, bioRxiv, 2019, p. 624460.
[24] M.M. Gromiha, S. Selvaraj, Comparison between long-range interactions and contact order in determining the folding rate of two-state proteins: application of long-range order to folding rate prediction, Journal of Molecular Biology 310 (1) (2001) 27–32.
[25] D.T. Jones, et al., MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins, Bioinformatics 31 (7) (2014) 999–1006.
[26] J. Yang, et al., R2C: improving ab initio residue contact map prediction using dynamic fusion strategy and Gaussian noise filter, Bioinformatics 32 (16) (2016) 2435–2443.
[27] S. Wu, Y. Zhang, A comprehensive assessment of sequence-based and template-based methods for protein contact prediction, Bioinformatics 24 (7) (2008) 924–931.
[28] J. Cheng, P. Baldi, Improved residue contact prediction using support vector machines and a large feature set, BMC Bioinformatics 8 (1) (2007) 113.
[29] M.J. Skwark, et al., Improved contact predictions using the recognition of protein like contact patterns, PLoS Computational Biology 10 (11) (2014) e1003889.
[30] Z. Wang, J. Xu, Predicting protein contact map using evolutionary and physical constraints by integer programming, Bioinformatics 29 (13) (2013) i266–i273.
[31] F. Morcos, et al., Direct-coupling analysis of residue coevolution captures native contacts across many protein families, Proceedings of the National Academy of Sciences 108 (49) (2011) E1293–E1301.
[32] H. Kamisetty, S. Ovchinnikov, D. Baker, Assessing the utility of coevolution-based residue–residue contact predictions in a sequence- and structure-rich era, Proceedings of the National Academy of Sciences 110 (39) (2013) 15674–15679.
[33] L. Kaján, et al., FreeContact: fast and free software for protein contact prediction from residue co-evolution, BMC Bioinformatics 15 (1) (2014) 85.
[34] M. Ekeberg, et al., Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models, Physical Review E 87 (1) (2013) 012707.
[35] M. Ekeberg, T. Hartonen, E. Aurell, Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences, Journal of Computational Physics 276 (2014) 341–356.
[36] M. Weigt, et al., Identification of direct residue contacts in protein–protein interaction by message passing, Proceedings of the National Academy of Sciences 106 (1) (2009) 67–72.
[37] S. Feizi, et al., Network deconvolution as a general method to distinguish direct dependencies in networks, Nature Biotechnology 31 (8) (2013) 726.
[38] H.P. Sun, et al., Improving accuracy of protein contact prediction using balanced network deconvolution, Proteins 83 (3) (2015) 485–496.
[39] M. Punta, B. Rost, PROFcon: novel prediction of long-range contacts, Bioinformatics 21 (13) (2005) 2960–2968.
[40] B. He, et al., NeBcon: protein contact map prediction using neural network training coupled with naïve Bayes classifiers, Bioinformatics 33 (15) (2017) 2296–2306.
[41] P. Fariselli, et al., Prediction of contact maps with neural networks and correlated mutations, Protein Engineering 14 (11) (2001) 835–843.
[42] N. Hamilton, et al., Protein contact prediction using patterns of correlation, Proteins 56 (4) (2004) 679–684.
[43] G. Shackelford, K. Karplus, Contact prediction using mutual information and neural nets, Proteins 69 (S8) (2007) 159–164.
[44] W. Ding, et al., CNNcon: improved protein contact maps prediction using cascaded neural networks, PLoS One 8 (4) (2013) e61533.
[45] A. Vullo, I. Walsh, G. Pollastri, A two-stage approach for improved prediction of residue contact maps, BMC Bioinformatics 7 (1) (2006) 180.
[46] P. Di Lena, K. Nagata, P. Baldi, Deep architectures for protein contact map prediction, Bioinformatics 28 (19) (2012) 2449–2457.
[47] B. Adhikari, J. Hou, J. Cheng, Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning, Proteins 86 (2018) 84–96.
[48] Y. Liu, et al., Enhancing evolutionary couplings with deep convolutional neural networks, Cell Systems 6 (1) (2018) 65–74, e3.
[49] D.T. Jones, S.M. Kandathil, High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features, Bioinformatics 34 (19) (2018) 3308–3315.
[50] B. Adhikari, J. Hou, J. Cheng, DNCON2: improved protein contact prediction using two-level deep convolutional neural networks, Bioinformatics 34 (9) (2017) 1466–1472.
[51] V. Golkov, et al., Protein contact prediction from amino acid co-evolution using convolutional networks for graph-valued images, in: Advances in Neural Information Processing Systems, Barcelona, Spain, December 5–10, 2016, pp. 4222–4230.
[52] S. Wang, et al., Accurate de novo prediction of protein contact map by ultra-deep learning model, PLoS Computational Biology 13 (1) (2017) e1005324.
[53] J. Hanson, et al., Accurate prediction of protein contact maps by coupling residual two-dimensional bidirectional long short-term memory with convolutional neural networks, Bioinformatics 34 (23) (2018) 4039–4045.
[54] Y. Li, et al., ResPRE: high-accuracy protein contact prediction by coupling precision matrix with deep residual neural networks, Bioinformatics (2019), https://doi.org/10.1093/bioinformatics/btz291.
[55] S. Seemayer, M. Gruber, J. Söding, CCMpred—fast and precise prediction of protein residue–residue contacts from correlated mutations, Bioinformatics 30 (21) (2014) 3128–3130.
[56] S.F. Altschul, et al., Basic local alignment search tool, Journal of Molecular Biology 215 (3) (1990) 403–410.
[57] M. Remmert, et al., HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment, Nature Methods 9 (2) (2012) 173.
[58] U. Göbel, et al., Correlated mutations and residue contacts in proteins, Proteins 18 (4) (1994) 309–317.
[59] G.B. Gloor, et al., Mutual information in protein multiple sequence alignments reveals two classes of coevolving positions, Biochemistry 44 (19) (2005) 7156–7165.
[60] S.D. Dunn, L.M. Wahl, G.B. Gloor, Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction, Bioinformatics 24 (3) (2007) 333–340.
[61] K.R. Wollenberg, W.R. Atchley, Separation of phylogenetic and functional associations in biological sequences by using the parametric bootstrap, Proceedings of the National Academy of Sciences 97 (7) (2000) 3288–3291.
[62] N. Meinshausen, P. Bühlmann, High-dimensional graphs and variable selection with the lasso, Annals of Statistics 34 (3) (2006) 1436–1462.
[63] O. Banerjee, L.E. Ghaoui, A. d'Aspremont, Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data, Journal of Machine Learning Research 9 (2008) 485–516.
[64] J. Friedman, T. Hastie, R. Tibshirani, Sparse inverse covariance estimation with the graphical lasso, Biostatistics 9 (3) (2008) 432–441.
[65] H.M. Berman, et al., The Protein Data Bank, Nucleic Acids Research 28 (1) (2000) 235–242.
[66] S. Miyazawa, R.L. Jernigan, Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation, Macromolecules 18 (3) (1985) 534–552.
[67] M.R. Betancourt, D. Thirumalai, Pair potentials for protein folding: choice of reference states and sensitivity of predicted native states to variations in the interaction schemes, Protein Science 8 (2) (1999) 361–369.
[68] D. Jones, et al., Prediction of novel and analogous folds using fragment assembly and fold recognition, Proteins 61 (S7) (2005) 143–151.
[69] L.-C. Chen, et al., DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2017) 834–848.
[70] K. He, et al., Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Nevada, USA, June 26–July 1, 2016, pp. 770–778.
[71] R.D. Finn, J. Clements, S.R. Eddy, HMMER web server: interactive sequence similarity searching, Nucleic Acids Research 39 (Suppl. 2) (2011) W29–W37.
[72] W.B. Eric, et al., ResTriplet/TripletRes: Learning Contact-Maps from a Triplet of Coevolutionary Matrices, 2018. http://predictioncenter.org/casp13/doc/presentations/Pred_CASP13_contacts_ResTriplet_TripletRes_Redacted.pdf.
[73] M. Steinegger, J. Söding, Clustering huge protein sequence sets in linear time, Nature Communications 9 (1) (2018) 2542.

CHAPTER EIGHT

Deep learning in biomedical image analysis

Minjeong Kim¹, Chenggang Yan², Defu Yang², Qian Wang³, Junbo Ma⁴ and Guorong Wu⁴

¹ Department of Computer Science, University of North Carolina at Greensboro, NC, United States
² Institute of Information and Control, Hangzhou Dianzi University, Hangzhou, China
³ School of Communication and Information Engineering, Xi'an University of Posts & Telecommunications, Xi'an, China
⁴ Department of Psychiatry, University of North Carolina at Chapel Hill, NC, United States

8.1 Introduction—deep learning meets medical image analysis

The emergence of modern imaging techniques, such as magnetic resonance imaging (MRI) and positron emission tomography, offers the opportunity to study the human brain in ways that previously were not possible. As an effective measurement, imaging-based analysis has been increasingly employed in many research and clinical studies, such as of brain development/aging [1–3] and the effects of pharmacological interventions [4]. It thus brings forth the need for sophisticated and highly automated image analysis methods to identify and quantify anatomical changes, which are often confounded by complex morphological patterns and interindividual variations in structure and function [5–10]. Recently, deep learning approaches have gained significant interest as a way of building hierarchical representations for unlabeled data [11,12]. Deep learning has achieved many successes in the computer vision and machine learning areas [11–26]. Dedicated deep architectures attempt to learn hierarchical structures by learning simple concepts first and then building up more complex concepts by composing the simpler ones. In light of this, deep learning was named a top-10 breakthrough of 2013 by MIT Technology Review. Distinctive feature representation is always the key to success in pattern recognition and machine learning applications. Conventionally, feature representations such as SIFT are designed ad hoc based on expert knowledge or a predefined model; as a result, these handcrafted features work well for only a limited number of applications. Deep learning has removed this obstacle by automatically learning discriminative representations of the data in a self-taught manner. Since the burden of feature engineering has shifted from the human to the computer side, deep learning allows for the rapid development of machine learning algorithms for new research projects.


The unprecedented successes of deep learning arise mostly from the following: (1) advancements in high-performance central processing units and graphics processing units (GPUs); (2) the availability of data with annotations (i.e., big data); and (3) new developments in neural network inference algorithms. There is wide consensus that deep learning can be regarded as an improvement over conventional artificial neural networks, achieved by making networks much deeper and training them with faster algorithms. It has been empirically shown that deep neural networks (DNNs) can discover hierarchical feature representations such that higher-level features can be derived from lower-level features. Due to this capability of hierarchical feature representation learning, deep learning has achieved record-breaking performance in a variety of artificial intelligence applications and grand challenges. In particular, great improvements in computer vision inspired its application to medical image analysis, such as image segmentation, image registration, image fusion, image parcellation, computer-aided diagnosis and prognosis, lesion/landmark detection, and microscopy image analysis, to name a few. Given large numbers of training samples, deep learning methods are very effective at finding the nonlinear mapping between observed variables and the outcome. For example, more than one million annotated images have been collected in the ImageNet Large-Scale Visual Recognition Challenge. For medical imaging applications, the common difficulty is usually the very limited number of training samples, due to the cost of data collection and manual annotation. Hence, one of the main challenges in medical image analysis is to develop state-of-the-art deep models without suffering from overfitting. To that end, various strategies have been proposed to either reduce the data dimension or augment the data size, which can be classified into the following categories: (1) use 2-D/3-D image patches instead of the entire image, as in the computer vision area, which significantly reduces the dimensionality of the input data and the complexity of the learning model; (2) expand the dataset by artificially generating samples via spatial perturbation (such as affine transformation), which improves the robustness of DNNs; and (3) the pretraining strategy—use deep models trained on a huge number of natural images in computer vision as "off-the-shelf" feature extractors and then fine-tune with respect to the target-task samples, or, alternatively, initialize the model parameters with those of models pretrained on nonmedical or natural images and then fine-tune the network parameters with the task-specific samples. In terms of input types, we can categorize deep models into (1) typical multilayer neural networks that take input values in vector form (i.e., nonstructured), and (2) CNNs that take 2-D- or 3-D-shaped (i.e., structured) values as inputs. We first briefly illustrate the computational concepts and theories of neural networks and deep models, and their fundamentals for extracting high-level feature representations, in Section 2. Section 3 introduces recent studies that have explored deep models for several typical applications in the medical imaging area, covering image segmentation, image registration, anatomy localization, and nuclei/cell detection in microscopy images.


Finally, we conclude this chapter by summarizing research trends and suggesting directions for further improvement in Section 4.

8.2 Basics of deep learning

In this section, we explain the fundamental concepts of feed-forward neural networks and basic deep models in the literature. The contents are specifically focused on learning hierarchical feature representations from the existing data. We also describe tips for how to efficiently learn the network parameters of DNNs without overfitting.

8.2.1 Feed-forward neural networks

In machine learning, artificial neural networks are a family of learning models that mimic the structural elegance of the neural system and learn patterns from observation. The perceptron is the earliest trainable neural network, with a single-layer architecture composed of an input layer and an output layer. The perceptron, or a modified perceptron with multiple output units as in Fig. 8.1A, can be regarded as a linear model, which limits its application to tasks involving complicated data patterns, despite the use of a nonlinear activation function in the output layer. This limitation has been successfully addressed by introducing a so-called hidden layer between the input and output layers. Note that the units of neighboring layers are fully connected to each other in these networks, while there is no connection among units in the same layer. An example is the two-layer neural network shown in Fig. 8.1B, which is also called a multilayer perceptron. Given an input vector v = [v_i] ∈ ℝ^D, we can write the estimation function of an output unit y_k as a composition function:

$$ y_k(v; \Theta) = f^2\!\left( \sum_{j=1}^{M} W^2_{kj}\, f^1\!\left( \sum_{i=1}^{D} W^1_{ji} v_i + b^1_j \right) + b^2_k \right) \tag{8.1} $$

where the superscript denotes the layer index, f¹(·) and f²(·) denote the nonlinear activation functions of the units at the specified layer, and M is the number of hidden units.

Figure 8.1 Network architecture of feed-forward neural networks. (A) Single-layer neural network. (B) Multilayer neural network.


Here, Θ = {W¹, W², b¹, b²} denotes the parameter set, with W¹ = [W¹_ji] ∈ ℝ^{M×D}, W² = [W²_kj] ∈ ℝ^{K×M}, b¹ = [b¹_j] ∈ ℝ^M, and b² = [b²_k] ∈ ℝ^K. Conventionally, the hidden units' activation function f¹(·) is defined as a sigmoidal function, such as the logistic sigmoid or the hyperbolic tangent, while the output units' activation function f²(·) depends on the target task. Since the estimation proceeds in a forward manner, this type of network is also called a feed-forward neural network. If we regard the hidden layer in Eq. (8.1) as a feature extractor Φ(v) = [φ_j(v)] ∈ ℝ^M from input v, the output layer is eventually a simple linear model:

$$ y_k(v; \Theta) = f^2\!\left( \sum_{j=1}^{M} W^2_{kj}\, \varphi_j(v) + b^2_k \right) \tag{8.2} $$

where $\varphi_j(v) = f^1\!\left( \sum_{i=1}^{D} W^1_{ji} v_i + b^1_j \right)$. The same interpretation holds when we add more hidden layers to the neural network.

Thus, it is intuitive to understand that the role of the hidden layers is to extract feature information for the specific learning task. For practical use of neural networks, the model parameters Θ must be learned from data, which can be done by minimizing the learning error. From an optimization perspective, the error function E of a neural network is highly nonlinear and nonconvex, so there is no analytic solution for the parameter set Θ. However, one can resort to a gradient descent algorithm, updating the parameters iteratively. To utilize gradient descent, a way must be found to compute the gradient ∇E(Θ) evaluated at the parameter set Θ. For a feed-forward neural network, the gradient can be efficiently evaluated by means of error backpropagation. Once we obtain the gradient vectors for all layers, each parameter θ ∈ {W¹, W², b¹, b²} can be updated as

$$ \theta^{s+1} = \theta^{s} - \eta \nabla E(\theta^{s}) \tag{8.3} $$

where η is the learning rate and s denotes the iteration index. The update process repeats until convergence or until reaching a predefined number of iterations. For the parameter update in Eq. (8.3), stochastic gradient descent on a small subset of training samples (called a mini-batch) is commonly used in the literature.
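To make Eqs. (8.1)–(8.3) concrete, the following NumPy sketch implements a two-layer network with a logistic-sigmoid hidden layer, an identity output (a regression-style choice made here for simplicity), a squared-error loss, and one gradient descent update via backpropagation; the layer sizes are arbitrary:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical sizes: D inputs, M hidden units, K outputs.
rng = np.random.default_rng(0)
D, M, K, eta = 4, 8, 2, 0.1
W1, b1 = 0.1 * rng.standard_normal((M, D)), np.zeros(M)
W2, b2 = 0.1 * rng.standard_normal((K, M)), np.zeros(K)

def forward(v):
    """Forward pass of Eq. (8.1); f1 = logistic sigmoid, f2 = identity."""
    h = sigmoid(W1 @ v + b1)   # hidden activations
    y = W2 @ h + b2            # linear output, the Eq. (8.2) view
    return h, y

def sgd_step(v, target):
    """One update theta <- theta - eta * grad E (Eq. 8.3) for
    E = 0.5 * ||y - t||^2, with gradients from error backpropagation."""
    global W1, b1, W2, b2
    h, y = forward(v)
    delta2 = y - target                     # output-layer error
    delta1 = (W2.T @ delta2) * h * (1 - h)  # backpropagated hidden error
    W2 = W2 - eta * np.outer(delta2, h); b2 = b2 - eta * delta2
    W1 = W1 - eta * np.outer(delta1, v); b1 = b1 - eta * delta1
```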

8.2.2 Stacked autoencoder

Autoencoder. An autoencoder (AE) is a typical neural network structurally defined by three sequential layers: the input layer, the hidden layer, and the output layer.


Here, the goal of the AE is to learn latent feature representations from 3-D image patches collected from medical images. Let D and L denote, respectively, the dimensions of the hidden representations and the input patches. Given an input image patch x_m ∈ ℝ^L (m = 1, …, M), the AE maps it to an activation vector h_m = [h_m(j)]^T_{j=1,…,D} ∈ ℝ^D by h_m = f(W x_m + b_1), where the weight matrix W ∈ ℝ^{D×L} and the bias vector b_1 ∈ ℝ^D are the encoder parameters. Here, f is the logistic sigmoid function f(a) = 1/(1 + exp(−a)). It is worth noting that h_m is considered the representation vector of the particular input training patch x_m via the AE. Next, the representation h_m from the hidden layer is decoded to a vector y_m ∈ ℝ^L, which approximately reconstructs the input image patch vector x_m by another deterministic mapping, y_m = f(W^T h_m + b_2) ≈ x_m, where the bias vector b_2 ∈ ℝ^L is the decoder parameter. Therefore, the energy function of the AE can be formulated as

$$ \{W, b_1, b_2\} = \arg\min_{W, b_1, b_2} \sum_{m=1}^{M} \left\| f\!\left(W^{T} f(W x_m + b_1) + b_2\right) - x_m \right\|_2^2. \tag{8.4} $$

The sparsity constraint on the hidden nodes of the network usually leads to more interpretable features. Specifically, we regard each hidden node h_m(j) as "active" if its activation degree is close to 1, or "inactive" if it is close to 0. The sparsity constraint thus requires most hidden nodes to remain "inactive" for each training patch x_m. Kullback–Leibler divergence is used to impose the sparsity constraint on each hidden node by enforcing the average activation degree over the whole training data—i.e., $\hat{\rho}_j = \frac{1}{M}\sum_{m=1}^{M} h_m(j)$—to be close to a very small value ρ (ρ is set to 0.001 in the experiments):

$$ KL\!\left(\rho \,\|\, \hat{\rho}_j\right) = \rho \log\frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log\frac{1 - \rho}{1 - \hat{\rho}_j}. \tag{8.5} $$

Then, the overall energy function of the AE with the sparsity constraint is defined as

$$ \{W, b_1, b_2\} = \arg\min_{W, b_1, b_2} \sum_{m=1}^{M} \left\| f\!\left(W^{T} f(W x_m + b_1) + b_2\right) - x_m \right\|_2^2 + \beta \sum_{j=1}^{D} KL\!\left(\rho \,\|\, \hat{\rho}_j\right), \tag{8.6} $$

where β controls the strength of the sparsity penalty term. A typical gradient-based backpropagation algorithm can be used to train a single AE [26,27]. A single AE is limited in what it can represent, because the model is shallow. As shown in Fig. 8.2A, a set of training image patches is sampled from brain MR images, each sized 15 × 15 (for demonstration, we use 2-D patches as examples). We set the number of hidden nodes to 100 in this single AE.


Figure 8.2 The image patches reconstructed by a single autoencoder (B) and by a stacked autoencoder (D), with the residual errors shown in (C) and (E), respectively.

The reconstructed image patches are shown in Fig. 8.2B. It is obvious that many details have been lost after reconstruction from the low-dimensional representations, as displayed in Fig. 8.2C. Stacked AE. The power of deep learning emerges when several AEs are stacked to form a stacked AE (SAE), where each AE becomes a building block of the deep learning model. To train an SAE, we use greedy layer-wise learning [12,25], training one AE at a time. Specifically, there are three steps in training an SAE: (1) pretraining, (2) unrolling, and (3) fine-tuning [11]. In the pretraining step, we train the first AE with all image patches as the input. Then, we train the second AE by using the activations h(1) of the first AE (pink circles in Fig. 8.3) as the input. In this way, each layer of features captures strong high-order correlations based on the outputs of the layer below.

Figure 8.3 The hierarchical architecture of a stacked autoencoder.


This layer-by-layer learning can be repeated many times. After pretraining, we build a deep learning network by stacking the AE of each layer, with the higher-layer AE nested within the lower-layer AE. Figure 8.3 shows an SAE consisting of two stacked AEs. Since the layer-by-layer pretraining procedure provides very good initialization for the multilevel network, we can efficiently use a gradient-based optimization method (such as L-BFGS or conjugate gradient [28]) to further refine the parameters of the SAE in the fine-tuning stage. Due to the deep and hierarchical nature of the network structure, the SAE can discover highly nonlinear and complex feature representations for patches in medical images. As shown in Fig. 8.2D and E, the patch reconstruction performance of the SAE is much better than that of a single AE, even though the SAE here consists of only two layers with 255 and 100 hidden nodes, respectively.
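For reference, the sparse-AE objective of Eqs. (8.4)–(8.6) can be evaluated with a short NumPy sketch; the tied weights and batch form below are our own convention, not code from the cited works:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sparse_ae_loss(W, b1, b2, X, rho=0.001, beta=3.0):
    """Reconstruction + sparsity objective of Eq. (8.6).

    X : (M, L) matrix of vectorized image patches.
    W : (D, L) tied encoder/decoder weights, b1 : (D,), b2 : (L,).
    rho follows the text (0.001); beta weights the KL penalty (assumed value).
    """
    H = sigmoid(X @ W.T + b1)        # (M, D) hidden activations
    Y = sigmoid(H @ W + b2)          # (M, L) reconstructions
    recon = np.sum((Y - X) ** 2)     # Eq. (8.4) reconstruction term
    rho_hat = H.mean(axis=0)         # average activation per hidden node
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))  # Eq. (8.5)
    return recon + beta * kl
```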

8.2.3 Convolutional neural networks

In deep learning models such as the SAE, the inputs are always in vector form. However, for medical images, the spatial structure among neighboring pixels or voxels is another important source of context, and simple vectorization destroys such structural and contextual information. A convolutional neural network (CNN) is designed to better utilize spatial and configuration information by taking each 2-D or 3-D image as input. Structurally, CNNs have convolutional layers interspersed with pooling layers, followed by fully connected layers as in a standard multilayer neural network. CNNs exploit three mechanisms—local receptive fields, weight sharing, and subsampling, as illustrated in Fig. 8.4—that greatly reduce the degrees of freedom of the model.

Figure 8.4 A graphical illustration of three key components in convolutional neural networks.


The role of a convolutional layer is to detect local features at different positions in the input feature maps with learnable kernels k_ij^(l)—i.e., the connection weights between feature map A_i at layer l−1 and feature map A_j at layer l. Specifically, the units of convolution layer l compute their activations A_j^(l) based only on a spatially contiguous subset of units in the feature maps A_i^(l−1) of the preceding layer l−1, by convolving the kernels k_ij^(l) as follows:

$$ A_j^{(l)} = f\!\left( \sum_{i=1}^{M^{(l-1)}} A_i^{(l-1)} * k_{ij}^{(l)} + b_j^{(l)} \right) \tag{8.7} $$

convolution layer l compute their activations Aj based only on a spatial contiguous ðl1Þ subset of units in the feature maps, Ai , of the preceding layer l1 by convolving the ðlÞ kernels kij as follows: ! ðl1Þ M X ðlÞ ðl1Þ ðlÞ ðlÞ (8.7) Aj ¼ f Ai  kij þ bj ; i¼1

where M^(l−1) denotes the number of feature maps in layer l−1, "*" denotes the convolution operator, b_j^(l) is a bias parameter, and f(·) is a nonlinear activation function. Due to the mechanisms of weight sharing and local receptive fields, when the input feature map is slightly shifted, the activations of the units in the feature maps are shifted by the same amount. A pooling layer follows a convolution layer to downsample the feature maps of the preceding convolution layer. Specifically, each feature map in a pooling layer is linked with a feature map in the convolution layer, and each unit in a feature map of the pooling layer is computed from a subset of units within a local receptive field by taking a representative value—e.g., the maximum or average among the units in its field. Usually, the stride in pooling layers is set equal to the size of the receptive field for subsampling, which helps make a CNN translationally invariant. In principle, the gradient descent method combined with a backpropagation algorithm is also applied to learn the parameters of a CNN. However, due to the special mechanisms of weight sharing, local receptive fields, and pooling, it needs slight modifications—i.e., summing the gradients for a given weight over all connections that use that kernel weight, determining which patch in a layer's feature map corresponds to a unit in the next layer's feature map, and upsampling the feature maps of the pooling layer to recover the reduced map size.
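A didactic NumPy sketch of Eq. (8.7) and of non-overlapping max pooling follows (loop-based for clarity; real frameworks use highly optimized kernels):

```python
import numpy as np

def conv2d_valid(A, k):
    """Valid 2-D convolution of one feature map A with one kernel k
    (the kernel is flipped, as in true convolution)."""
    kh, kw = k.shape
    H, W = A.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(A[r:r + kh, c:c + kw] * k[::-1, ::-1])
    return out

def conv_layer(maps, kernels, biases, f=np.tanh):
    """Eq. (8.7): A_j = f( sum_i A_i * k_ij + b_j ).

    maps    : list of 2-D input feature maps A_i.
    kernels : nested list, kernels[i][j] is the kernel from map i to map j.
    biases  : list of scalar biases b_j, one per output map.
    """
    return [f(sum(conv2d_valid(A, kernels[i][j]) for i, A in enumerate(maps))
              + biases[j])
            for j in range(len(biases))]

def max_pool(A, s=2):
    """Non-overlapping s x s max pooling (stride equal to field size)."""
    H, W = A.shape[0] // s * s, A.shape[1] // s * s
    return A[:H, :W].reshape(H // s, s, W // s, s).max(axis=(1, 3))
```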

8.2.4 Tips to reduce overfitting

A critical challenge in training deep models arises mostly from the limited number of training samples compared with the number of learnable parameters, which creates a persistent risk of overfitting. In this regard, recent studies have devised algorithmic techniques to better train deep models, including the following (see the sketch after this list):
1. Initialization and momentum: use a well-designed random initialization and a schedule that slowly increases the momentum parameter as the iterations proceed.
2. Rectified linear unit: apply it as the nonlinear activation function.
3. Denoising: stack layers of denoising AEs that are trained locally to reconstruct the original "clean" inputs from corrupted versions of them.
4. Dropout, DropConnect: randomly deactivate a fraction of the units or connections—e.g., 50%—in the network on each training iteration.
5. Batch normalization: normalize each mini-batch and backpropagate the gradients through the normalization parameters.
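A minimal PyTorch sketch (an illustrative toy network, not one from this chapter) showing where ReLU, dropout, and batch normalization typically sit:

```python
import torch.nn as nn

# Toy CNN; the input is assumed to be a 1-channel 32 x 32 image with
# 2 output classes -- all sizes here are arbitrary choices.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),          # tip 5: batch normalization
    nn.ReLU(),                   # tip 2: rectified linear unit
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(p=0.5),           # tip 4: deactivate 50% of units per iteration
    nn.Linear(16 * 16 * 16, 2),  # 16 channels x 16 x 16 after pooling
)
```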

8.2.5 Open-source software toolkits for deep learning

With the rapid development of deep learning technologies, the architectures of DNNs have grown more and more complex, which makes implementing DNNs from scratch very challenging, especially when executing code on GPUs. Software toolkits for deep learning, also called deep learning frameworks or libraries, have been developed to ease the coding of DNNs. These toolkits aim to encapsulate low-level operations, modularize DNNs, and provide high-level application programming interfaces (APIs) that use GPUs transparently. More importantly, they can automatically calculate derivatives for DNNs, which significantly simplifies the implementation of backpropagation. The most famous deep learning framework historically is Theano [29], developed mainly by the Montreal Institute for Learning Algorithms (MILA) at the Université de Montréal since 2007. It innovatively introduced higher-order automatic differentiation, computation graphs, and transparent execution on GPUs; all these features have become mainstream ideas in the deep learning community. MILA stopped developing Theano after the 1.0 release on November 15, 2017 [30], as many giant industry players had joined the deep learning community and brought in more resources to support the development of deep learning frameworks. According to Jeff Hale's report on the power scores of deep learning frameworks from September 20, 2018 [31], the most popular deep learning library is TensorFlow, which has been developed by the Google Brain team since 2015. TensorFlow has the most GitHub activity, Google searches, and arXiv articles [31], owing to its high-performance numerical computation and its flexibility of deployment across various platforms (desktops, clusters, servers, and even mobile and edge devices) [32]. Many commercial companies leverage this framework, such as DeepMind, Uber, Airbnb, and Dropbox. Many documents and guidelines are available online, and a large community of developers and researchers supports it. However, pure TensorFlow is still low-level and requires much boilerplate code. To address this issue, many high-level frameworks have been developed, the most successful of which is Keras [33]. Keras offers much higher-level APIs and is capable of running on top of different low-level backends, such as TensorFlow, Theano, and CNTK (also known as the Microsoft Cognitive Toolkit, another deep learning framework, developed by Microsoft [34]). Keras is famous for fast and easy prototyping with many fully configurable modules.


However, due to its high-level design, the customization of DNNs by researchers is not as flexible as with low-level frameworks. Other high-level frameworks, such as Sonnet [35], built on top of TensorFlow by DeepMind, are also very useful. Another problem with TensorFlow is that its static computation graph makes debugging very difficult. PyTorch has been the most successful at addressing this issue [36], which makes it the second most popular stand-alone framework. PyTorch is the Python successor of Torch (a scientific computing framework written in the Lua programming language since 2002, which became inactive after the release of PyTorch). The primary advantage of PyTorch is its support of dynamic computation graphs, which makes it behave more like a traditional programming language that can be easily debugged with common debugging tools. With the release of version 1.0 on December 7, 2018, PyTorch integrated Caffe2 [37] (another deep learning framework developed by Facebook Research, the successor to the older CAFFE (Convolutional Architecture for Fast Feature Embedding) [38]), which makes PyTorch the biggest competitor of TensorFlow. All the aforementioned deep learning frameworks use Python as their main interface, while some also support the R or C++ programming languages. Deeplearning4j [39] is a framework mainly using the Java and Scala programming languages and is integrated with Hadoop and Apache Spark for distributed environments. MATLAB also has its Deep Learning Toolbox [40] for implementing DNNs. Other frameworks, such as MXNet [41], developed by Apache, are still growing. Overall, thanks to the stimulating competition between deep learning frameworks backed by strong industrial players, the software ecosystem for deep learning is now flourishing, resulting in various open-source deep learning toolkits available for commercial applications and academic research.
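As a small illustration of what these high-level APIs hide, a Keras model can be defined, compiled, and trained in a few lines; the layer sizes below are arbitrary:

```python
import tensorflow as tf

# Arbitrary layer sizes for illustration; compile()/fit() hide gradient
# computation, optimization, and device placement from the user.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(100,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(x_train, y_train, epochs=5)  # derivatives handled automatically
```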

8.2.6 Brief summary of deep learning in biomedical imaging

Over the last 10 years, machine learning methods have brought a revolution to the computer vision area, as evidenced by novel, efficient solutions to many image analysis problems that had long remained unsolved. To carry this revolution into the biomedical imaging community, many dedicated methods have been designed using deep learning techniques that account for the domain knowledge of biomedical images. Quite a few review papers are highly recommended and provide a comprehensive summary of machine learning in biomedical imaging. To name a few, the recent review paper by Litjens et al. [42] compiles a thorough list of papers on deep learning techniques published in top conferences such as CVPR and NIPS and in peer-reviewed journals such as IEEE Transactions on Medical Imaging and Medical Image Analysis. A few years ago, Shen et al. [43], Suzuki [44], and Greenspan et al. [45] summarized the advances of deep learning with a focus on biomedical imaging applications.


These reviews give the reader a comprehensive overview of deep learning in biomedical imaging. Researchers and institutions have also released numerous datasets and useful open-source frameworks. Here, we give a brief summary. The website Grand Challenges in Biomedical Image Analysis (https://grand-challenge.org/all_challenges) hosts many competitions and image datasets. For example, the Cancer Imaging Archive [46] and the National Institutes of Health [47] have released tranches of datasets for research use. NiftyNet (www.niftynet.io) [48] provides a useful open-source framework for researchers to easily explore many published machine learning algorithms.

8.3 Applications in biomedical imaging

Deep learning has been widely used in the medical imaging area over the last 5 years. Successful applications include feature representation learning, image segmentation, image registration, and classification of anatomical structures, with impressive improvements over other machine learning techniques demonstrated in the literature. These successes have attracted the attention of medical imaging researchers, prompting them to investigate the potential of deep learning for analyzing medical images such as computed tomography, MRI, microscopy imaging, and so on. In the following, we use feature representation learning, segmentation, and computer-assisted disease diagnosis as examples to illustrate the deployment of deep learning in medical applications.

8.3.1 Deep feature representation learning in the medical imaging area

Many medical image processing methods rely on morphological feature representations to identify local anatomical characteristics. However, current feature representations are handcrafted, which requires intensive, dedicated effort.

Figure 8.5 Large structural difference around the hippocampus between 1.5-tesla (A) and 7.0-tesla (B) MR images. The 1.5-tesla image is enlarged w.r.t. the image resolution of the 7.0-tesla image for convenience of visual comparison.


Moreover, designed image features are often problem-specific and hardly reusable—i.e., not guaranteed to work for all types of images. For instance, there are large appearance differences between a 7.0-tesla MR brain image (with a spatial resolution of 0.35 × 0.35 × 0.35 mm³) and a 1.5-tesla scan (with a resolution of 1 × 1 × 1 mm³), as shown in Fig. 8.5. Such a large gap in image appearance makes image processing methods (e.g., image segmentation and registration) designed for 1.5 T T1-weighted brain MR images inapplicable to 7.0 T MR images [49,50], not to mention other modalities or different organs. As demonstrated in Ref. [51], 7.0-tesla MR images can reveal brain architecture with a resolution equivalent to that obtained from thin slices in vitro. Thus, researchers are able to clearly observe fine brain structures at millimeter scale, which in the past was only possible with in vitro imaging. The lack of efficient computational tools substantially hinders the translation of new imaging techniques into the medical imaging arena. Although current state-of-the-art methods use supervised learning to find the most relevant and essential features, they require a significant amount of manually labeled training data, while the learned features may be superficial and misrepresent the complexity of anatomical structures. More critically, the learning procedure is often confined to a particular template domain with a certain number of predesigned features; therefore, once the template or image features change, the entire training process must start over. To address these limitations, Wu et al. [49,50] developed a general feature representation framework that (1) is able to sufficiently capture the intrinsic characteristics of anatomical structures for accurate ROI segmentation and correspondence detection and (2) can be flexibly applied to different kinds of medical images. Specifically, they use an SAE to hierarchically learn feature representations in a layer-by-layer manner. As shown at the bottom of Fig. 8.3, the input is a large number of 3-D image patches. Due to the complex nature of medical images, learning latent feature representations in medical data with deep learning is much more difficult than in similar applications in the computer vision and machine learning areas. In particular, the dimensionality of the input training patches is often very high; for example, the intensity vector of a 21 × 21 × 21 3-D image patch has 9261 elements. Thus, the training of an SAE network becomes very challenging. To alleviate this issue, they resorted to a convolutional technique to construct the SAE network. The power of the feature representations learned by deep learning is demonstrated in Fig. 8.6. A typical image registration result for elderly brain images is shown at the top of Fig. 8.6, where the deformed subject image (Fig. 8.6C) is far from well registered with the template image (Fig. 8.6A), especially for the ventricles. Obviously, it is very difficult to learn meaningful features given the inaccurate correspondences derived from imperfect image registration, a problem suffered by many supervised learning methods. The performance of the learned features is shown in Fig. 8.6F. For a template point (indicated by the red cross in Fig. 8.6A), using the deep learned feature representations one can successfully find the corresponding point in the subject image, whose ventricle is significantly larger.


Figure 8.6 The similarity maps for identifying the correspondence of the red-crossed point in the template (A) w.r.t. the subject (B) by handcrafted features (D-E) and by the features learned with unsupervised deep learning (F). The registered subject image is shown in (C). It is clear that inaccurate registration results might undermine supervised feature representation learning, which relies heavily on correspondences across all training images.

The power of the feature representations learned by deep learning is demonstrated in Fig. 8.6. A typical image registration result for elderly brain images is shown at the top of Fig. 8.6, where the deformed subject image (Fig. 8.6C) is far from well registered with the template image (Fig. 8.6A), especially around the ventricles. Obviously, it is very difficult to learn meaningful features given the inaccurate correspondences derived from imperfect image registration, a problem suffered by many supervised learning methods. The performance of the learned features is shown in Fig. 8.6F. For a template point (indicated by the red cross in Fig. 8.6A), the deep learned feature representations allow one to successfully find the corresponding point in the subject image, whose ventricle is significantly larger. Each point in Fig. 8.6F indicates its likelihood of being selected as the correspondence at the respective location. According to the color bar shown in Fig. 8.6, it is easy to locate the correspondence of the red-crossed template point in the subject image domain, since high correspondence probabilities are densely distributed right at the corresponding location and quickly fade away from it. Other handcrafted features either detect too many noncorresponding points (when using the entire intensity patch as the feature vector, as shown in Fig. 8.6D) or have too few responses and thus miss the correspondence (when using SIFT features, as shown in Fig. 8.6E). In general, the feature representations learned using the SAE reveal the least confusing correspondence information for the subject point under consideration and imply the best correspondence detection performance.

To evaluate the eventual improvement in registration accuracy, the authors in Refs. [49,50] further showed deformable image registration performance using deep learning for feature selection on various public datasets. Fig. 8.7C and D shows registration results using the conventional intensity-based diffeomorphic Demons [52] and the feature-based HAMMER [53] registration methods, respectively, which are state-of-the-art registration methods for 1.5 and 3.0 T MR images. The registration results using


Figure 8.7 Typical registration results on 7.0-tesla MR brain images by Demons (C), HAMMER (D), and H + DP (E), respectively. The three rows represent three different slices in the template (A), subject (B), and registered subjects (C-E).

the learned feature representations are shown in Fig. 8.7E. In Fig. 8.7, the manually labeled hippocampus on the template image and the deformed subject's hippocampus produced by the different registration methods are shown by red and blue contours, respectively. By visual inspection (the overlap of red and blue contours), the registration result using deep learned features is much better than that of either the diffeomorphic Demons or HAMMER. The diffeomorphic Demons registration method fails to register the 7T image, as shown in Fig. 8.7C, since it is driven simply by image intensities, which suffer from image noise and inhomogeneity in 7T images. In addition, due to the huge differences in image characteristics between 7 and 3T images, the handcrafted features optimized for 3T images do not work well for 7T images in the feature-based HAMMER registration method either, as shown in Fig. 8.7D.

The above applications demonstrate that (1) the latent feature representations inferred by deep learning can well describe local image characteristics; (2) we can rapidly develop image analysis methods for new medical imaging modalities by using a deep learning framework to learn the intrinsic feature representations; and (3) the whole learning-based framework is fully adaptive to the image data and is reusable for various medical imaging applications, such as segmenting the hippocampus [54] and the prostate [55] from MR images.


8.3.2 Medical image segmentation using deep learning
Machine (deep) learning-based methods have been proposed for years to extract complicated anatomical structures such as the hippocampus in the brain. Those methods have achieved high accuracy in evaluations comparing their segmentation results with ground truths manually delineated by experts. The evaluation is mostly performed on large public cohorts, for example, the ADNI (Alzheimer's Disease Neuroimaging Initiative) database. Note that the imaging sources that many researchers have been working on consist mostly of 1.5T or 3.0T MR images. As illustrated in Fig. 8.5, much more detailed hippocampal structures can be observed in 7.0T images compared with 1.5T or even 3.0T images. Therefore, there may be limitations in extracting the hippocampus from recent 7.0T imaging data using existing learning-based algorithms designed for 1.5T images. The limitations come from (1) severe intensity inhomogeneity in 7.0T images, which can adversely affect the feature consistency of similar anatomical structures; (2) the abundance of anatomical detail, which comes at the expense of troublesome image noise; and (3) incomplete brain volumes (i.e., containing only a segment of the brain, owing to practical issues during image acquisition).

According to Kim et al. [54], the latent feature representation learned by deep learning (e.g., SAE) has also shown its power in extracting hippocampi from 7.0T images. The authors applied a two-layer stacked independent subspace analysis (ISA) network (Fig. 8.8) to efficiently learn hierarchical feature representations from image patches extracted from 7.0T MR images. Specifically, the first ISA model is trained using a set of overlapping image patches at a smaller scale, which are acquired from the input patches. Next, the combined activations from all small-scale patches are used as the input of another ISA model in the second layer. The activations generated by the two-layer stacked ISA act as significantly distinctive features for identifying the hippocampal structure when integrated with multiatlas-based segmentation frameworks. A multiatlas-based segmentation framework based on an auto-context model (ACM) was employed in their method design, as

Figure 8.8 Stacked convolutional ISA networks for extracting features from the hippocampal area.


Figure 8.9 Multiatlas hippocampus segmentation framework: training stage (left panel) and testing stage (right panel).

illustrated in Fig. 8.9. The ACM utilizes spatial context information that is iteratively derived from the probability map of the segmented hippocampus at the previous iteration, without requiring well-aligned training images. After deploying an ACM classifier sequence for each atlas, a set of classifier sequences, one per atlas, is trained, where the training samples include not only the underlying atlas but also the other N-1 linearly aligned atlases. In the testing stage (right panel of Fig. 8.9), the same hierarchical features learned by the ISA network are extracted for each subject point. Then, the following steps are repeated for each atlas to predict the label of each subject point: (1) map the classifiers of each atlas to the underlying subject space using the affine registration between the atlas and the subject image; (2) predict the probabilistic labeling map by applying the trained ACM classifiers, trained w.r.t. each atlas, to each point in the subject; and (3) fuse the labeling results from all atlases to obtain the final segmentation result. A minimal sketch of this testing loop is given after Table 8.1.

Comparing the performance of the hierarchical feature representation learned through deep learning with that of handcrafted image features, each incorporated into the same multiatlas-based segmentation framework, the deep-learning-based features achieved consistently higher accuracy in terms of four overlap metrics w.r.t. the ground truths, as shown in Table 8.1.

Table 8.1 Quantitative comparisons based on the four averaged overlap metrics, precision (P), recall (R), relative overlap (RO), and similarity index (SI), for the 20 leave-one-out cases, showing the improvements by deep-learning-based feature representation over handcrafted features in the same segmentation framework (unit: %).

| Features | P | R | RO | SI |
|---|---|---|---|---|
| Handcrafted features | 84.3 | 84.7 | 77.2 | 86.5 |
| Deep-learning-based feature representations | 88.3 | 88.1 | 81.9 | 89.4 |
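As a rough illustration of the three testing-stage steps above, consider the following Python sketch. Here `affine_register`, `extract_isa_features`, and the trained `acm_classifiers` are hypothetical helpers (not the authors' implementation), and label fusion is simplified to averaging.

```python
# Minimal sketch of the multiatlas testing loop with auto-context classifiers.
import numpy as np

def segment_hippocampus(subject, atlases, acm_classifiers,
                        extract_isa_features, affine_register, n_iters=3):
    feats = extract_isa_features(subject)       # hierarchical ISA features
    prob_maps = []
    for a, atlas in enumerate(atlases):
        # (1) map the atlas-specific classifiers into the subject space
        warp = affine_register(atlas, subject)
        prob = np.zeros(subject.shape)
        # (2) iteratively refine labels with the auto-context model (ACM);
        #     the context at iteration t is the probability map from t-1
        for t in range(n_iters):
            context = prob
            prob = acm_classifiers[a][t].predict(feats, context, warp)
        prob_maps.append(prob)
    # (3) fuse the atlas-wise probabilistic labelings (averaging here)
    fused = np.mean(prob_maps, axis=0)
    return fused > 0.5
```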


Figure 8.10 Comparison of segmented hippocampus regions using (B) handcrafted features and (C) hierarchical feature representations. Compared with the manual ground truths (A), the hippocampus segmentation results with the learned features show better performance.

In detail, smaller image patches of size 16 × 16 × 3 were used to train the first-layer ISA, and larger image patches of size 20 × 20 × 5 were used to train the second-layer ISA network. The initial dimension of the patch representation from the first ISA layer is 200, while the final dimension from the second layer is 100. Fig. 8.10 shows two typical segmentation results using handcrafted image features (Fig. 8.10B) and the learned hierarchical feature representations (Fig. 8.10C). It can be observed that the hippocampi segmented with deep-learning-based features (Fig. 8.10C) are much closer to the manual ground truths (Fig. 8.10A), especially in the regions indicated by circles.

8.3.3 Nuclear segmentation in mouse microscopy images using convolutional neural networks
Our understanding of nervous system function depends critically on visualizing the 3-D microstructures of the brain. However, there are several challenges in segmenting cells from microscopy images, such as intensity inhomogeneity and poor image resolution, not to mention splitting clumped cells, which is the most intractable problem. Generally speaking, the computational challenges stem from data size and complexity. A mouse brain (volume of 1000 mm³) imaged at high resolution (e.g., 0.25 × 0.25 × 1 μm) results in ~30 TB of data for each fluorescent label [56]. More critically, image quality is usually limited by the image acquisition hardware and scanning time. As shown in Fig. 8.11, low image contrast (displayed in the red box) and intensity inhomogeneity (displayed in the blue box) are very common, which makes conventional image processing methods unable to produce accurate cell segmentation results.


Figure 8.11 The challenges in segmenting cells from microscopy images.

Current cell detection/segmentation methods are tailored for cell nuclei and use background subtraction by a morphological opening method [57,58]. Although the computational cost of this method is low, it assumes that the shape of the feature to segment (in this case, nuclei) is similar across cell types. This assumption is likely invalid across all cell types (e.g., neuronal nuclei are round whereas endothelial nuclei are oblong) and will therefore result in poor segmentation. Moreover, since each image point in the microscopy image stack is treated equally, such low-level image processing methods are not sufficient to deal with the inhomogeneity issue. To achieve accurate cell segmentation results from a mouse microscopy image, we propose a novel cascaded CNN approach, as shown in Fig. 8.12. The building block of our proposed segmentation method is a CNN (the bottom of Fig. 8.12).

8.3.3.1 3-D convolutional neural network for cell segmentation
Suppose we have a set of image patches $X = \{x_i \mid i = 1, \dots, N\}$ and the known (manually identified) labels $Y = \{y_i \mid i = 1, \dots, N\}$, where $y_i \in \{\text{cell}, \text{cell edge}, \text{background}\}$ refers to the center of the image patch. The goal is to learn a nonlinear mapping $f$ such that $y_i = f(x_i)$. Since the mapping function $f$ is usually highly complex, we use the deep learning technique to find the mapping in a layer-by-layer manner, that is, $y_i = f_L(f_{L-1}(\cdots f_1(x_i)))$, where the neural network consists of $L$ layers. Note that there are only three neurons (green nodes at the bottom of Fig. 8.12) in the last layer, which produce the probabilities of cell, cell edge, and background, respectively. Specifically, let $D$ and $M$ denote, respectively, the dimensions of the hidden representations and the input patches. Given an input image patch


Figure 8.12 Overview of the cascaded convolutional neural network (CNN) for the nuclear segmentation method (top) and the architecture of the CNN (bottom).

$x_i \in \mathbb{R}^M$, the neural network maps it to an activation vector $h_i^1 = \left[h_i^1(j)\right]_{j=1,\dots,D}^{T} \in \mathbb{R}^D$ via $h_i^1 = f_1(W_1 x_i + b_1)$, where the weight matrix $W_1 \in \mathbb{R}^{D \times M}$ and the bias vector $b_1 \in \mathbb{R}^D$ are the network parameters in the first layer. Here, $f$ is the logistic sigmoid function $f(a) = 1/(1 + \exp(-a))$. It is worth noting that $h_i^1$ is considered the low-level representation vector of the particular input training patch $x_i$. Next, the representation $h_i^1$ from the hidden layer is used as the input of the second layer to learn the network parameters $W_2$ and $b_2$, where the activation vector $h_i^2$ encodes the correlations across the low-level features. We repeat the same procedure and construct $L$ layers, as shown at the bottom of Fig. 8.12. A typical gradient-based backpropagation algorithm can be used for fine-tuning the network parameters [26,27].

For robustness, the input image patch size must be set sufficiently large, e.g., 61 voxels in each dimension. However, it is too complex to learn a nonlinear mapping in such a high-dimensional space. Hence, the convolutional technique is employed to reduce the data dimension. The input to the CNN is the large image patch $\mathcal{P}^v$ with patch size $L_v$. For simplicity, we explain the CNN with a 2-D image patch as an example. Since the dimension of the image patch $\mathcal{P}^v$ is too large, we let an $L_w \times L_w$ ($L_w < L_v$) sliding window $\mathcal{P}^w$ go through the entire big image patch $\mathcal{P}^v$, thus obtaining


$(L_v - L_w + 1) \times (L_v - L_w + 1)$ small image patches. Eventually, we use these small image patches $\mathcal{P}^w$ to train the AE in each layer, instead of the entire big image patch $\mathcal{P}^v$. Given the parameters of the network (weight matrix $W_l$ and bias vector $b_l$ in each layer), we can compute $(L_v - L_w + 1) \times (L_v - L_w + 1)$ activation vectors. Then max pooling [18] is used to shrink the representations by a factor of $C$ in each direction (horizontally or vertically). Specifically, we compute the representative activation vector among the four activation vectors in each $2 \times 2$ neighborhood by choosing the maximum absolute value for each vector element. Thus, the number of activation vectors is significantly reduced to $\frac{L_v - L_w + 1}{C} \times \frac{L_v - L_w + 1}{C}$. Since we apply the maximum operation, shrinking the representation with max pooling makes the high-level representation invariant to small translations of the input image patches and reduces the computational burden.

8.3.3.2 Cascaded convolutional neural network using contextual features
In the previous discussion, we only used image appearance information to train the CNN. Due to low image contrast, low-level features derived from image intensity are not sufficient to steer the training of neural networks, and other high-level features are needed to alleviate the issue of poor image quality. To this end, we resort to contextual features [59,60], which encode the spatial relationship of one structure to other structures. Since the output of the CNN includes the probability of a cell at each voxel, we construct patch-wise contextual features based on the tentative cell probability map. It is then reasonable to train another CNN using both low-level image appearance and high-level contextual information. Leveraged by such high-level heuristics, we can enhance the reliability of the cell probability map, use the refined contextual features to train another CNN, and so on until the segmentation results converge. Eventually, we turn the conventional CNN method into a cascaded architecture, as shown at the top of Fig. 8.12; a minimal sketch of this loop follows the experiment settings below.

Experiment setting. In the training stage, we randomly sampled ~165,000 training patches to train the cascaded CNN, each image patch having a known label (cell, cell edge, or background) at the patch center. The patch size is set to 61 voxels. Max pooling with a 2 × 2 × 2 window is applied to combine the activation vectors from the convolutional filters. To evaluate the cell segmentation result, we first compare our cascaded CNN cell segmentation method with the classic Otsu method, where the intensity threshold is optimized to separate cell and background over the whole image domain. We also apply the enhanced Otsu method [61], which accounts for image artifacts such as noise. Since the main challenge of cell segmentation in microscopy images comes from intensity inhomogeneity, we further deploy a local Otsu method, where the intensity threshold is adapted to each local region. Furthermore, we show the cell segmentation results of an object detection method [62] using the filter convolution technique, which works in the frequency domain and assumes that cell voxels usually respond strongly to certain specifically designed band-pass filters.
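The cascaded loop described above can be sketched as follows; `train_cnn`, `predict_prob_map`, and `context_features` are hypothetical helpers standing in for CNN training, whole-volume inference, and contextual feature extraction, not the authors' exact implementation.

```python
# Minimal sketch of the cascaded CNN training loop: each round augments the
# appearance features with contextual features computed from the previous
# round's cell probability map.
def train_cascaded_cnn(patches, labels, volume, train_cnn, predict_prob_map,
                       context_features, n_rounds=3):
    models, prob_map = [], None
    for _ in range(n_rounds):
        if prob_map is None:
            feats = patches                              # round 0: appearance only
        else:
            ctx = [context_features(prob_map, p) for p in patches]
            feats = list(zip(patches, ctx))              # appearance + context
        model = train_cnn(feats, labels)                 # cell / edge / background
        prob_map = predict_prob_map(model, volume)       # tentative probability map
        models.append(model)
    return models
```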


Figure 8.13 The advantage of the cascaded convolutional neural network (CNN) over a single CNN. (A) The original microscopy image. (B) The microscopy image after intensity normalization. (C) The cell probability map by a single CNN. (D) The cell probability map by the cascaded CNN.

8.3.3.3 Advantage of cascaded convolutional neural network over single convolutional neural network
First, we demonstrate the cell segmentation with and without the cascaded architecture. Fig. 8.13A and B displays the original mouse microscopy image and the image after intensity normalization, where the low illumination at the top left (red bounding box) makes most cells in that dark region appear as background. The cell probability maps by the single CNN (using intensity information only) and by our cascaded CNN are shown in Fig. 8.13C and D, respectively. It is observable that (1) learning-based approaches efficiently alleviate the issue of local intensity inhomogeneity; and (2) the cell probability map by the cascaded CNN is sharper than that by the single CNN (and thus more reliable to binarize into cell and background), indicating the advantage of using contextual features and the cascaded architecture.

8.3.3.4 Evaluation of cell segmentation accuracy with comparison to current state-of-the-art methods
Next, we show the segmentation by the classic Otsu method (using a global threshold) in Fig. 8.14A, the enhanced Otsu method (using a corrected global threshold) in Fig. 8.14B,


Figure 8.14 Cell segmentation results by current state-of-the-art methods and the cascaded convolutional neural network method. (A)-(C) show zoomed-in views of the segmentation in blocks A-C.

the local Otsu method (using a region-adaptive threshold) in Fig. 8.14C, the band-pass convolution filter in Fig. 8.14D, the single CNN (using intensity information only) in Fig. 8.14E, and our proposed cascaded CNN (using both intensity and contextual features) in Fig. 8.14F. In general, learning-based approaches outperform non-learning-based methods under visual inspection. Furthermore, we calculate the overlap between the manual and automatic cell segmentation results for the six aforementioned approaches. The Dice ratios are shown in Table 8.2. It is apparent that the improvement by our cascaded CNN is significant in terms of segmentation accuracy.

Table 8.2 The Dice ratio and computational time of the six automatic cell segmentation methods.

| Method | Dice | Time (s) |
|---|---|---|
| Otsu | 0.421 | 0.303 |
| Enhanced Otsu | 0.639 | 1.572 |
| Local Otsu | 0.671 | 3.710 |
| Band-pass filter | 0.576 | 3.756 |
| Single CNN | 0.590 | 84.21 |
| Cascaded CNN | 0.767 | 124.0 |


We tested our cell segmentation method on a Dell workstation with a GPU card (NVIDIA TITAN X with a 12 GB frame buffer and 3584 cores @ 1.5 GHz). Without specific program optimization, our cascaded CNN method requires 124 s to complete voxel-by-voxel cell segmentation of an 800 × 600 image region.

8.4 Conclusion
In this chapter, we briefly introduced the concepts behind, and the most popular, deep learning models used in biomedical imaging applications. It is worth noting that domain knowledge is very important in selecting the most appropriate learning model for a specific application. We showed examples with a focus on feature representation learning and image segmentation. In general, we found that deep learning technology is very useful and outperforms conventional image processing approaches given a large number of annotated samples.

References
[1] T. Paus, et al., Structural maturation of neural pathways in children and adolescents: in vivo study, Science 283 (5409) (1999) 1908-1911.
[2] E.R. Sowell, et al., Localizing age-related changes in brain structure between childhood and adolescence using statistical parametric mapping, NeuroImage 9 (6, Pt 1) (1999) 587-597.
[3] P.M. Thompson, et al., Growth patterns in the developing brain detected by using continuum mechanical tensor maps, Nature 404 (2000) 190-193.
[4] S. Resnick, P. Maki, Effects of hormone replacement therapy on cognitive and brain aging, Annals of the New York Academy of Sciences 949 (2001) 203-214.
[5] P.M. Thompson, et al., Tracking Alzheimer's disease, Annals of the New York Academy of Sciences 1097 (1) (2007) 183-214.
[6] J. Lerch, et al., Automated cortical thickness measurements from MRI can accurately separate Alzheimer's patients from normal elderly controls, Neurobiology of Aging 29 (1) (2008) 23-30.
[7] N. Schuff, et al., MRI of Hippocampal Volume Loss in Early Alzheimer's Disease in Relation to ApoE Genotype and Biomarkers, 2009, pp. 1067-1077.
[8] G. Frisoni, et al., In vivo neuropathology of the hippocampal formation in AD: a radial mapping MR-based study, NeuroImage 32 (1) (2006) 104-110.
[9] L. Apostolova, et al., Conversion of mild cognitive impairment to Alzheimer disease predicted by hippocampal atrophy maps, Archives of Neurology 64 (9) (2007) 1360-1361.
[10] A.D. Leow, et al., Alzheimer's Disease Neuroimaging Initiative: a one-year follow-up study using tensor-based morphometry correlating degenerative rates, biomarkers and cognition, NeuroImage 45 (3) (2009) 645-655.
[11] G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504-507.
[12] G.E. Hinton, S. Osindero, Y.-W. Teh, A fast learning algorithm for deep belief nets, Neural Computation 18 (7) (2006) 1527-1554.
[13] H.-C. Shin, et al., Stacked autoencoders for unsupervised feature learning and multiple organ detection in a pilot study using 4D patient data, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012 (special issue on deep learning).
[14] N. Srivastava, R. Salakhutdinov, Multimodal learning with deep Boltzmann machines, in: Advances in Neural Information Processing Systems (NIPS), 2012.


[15] R. Salakhutdinov, A. Mnih, G. Hinton, Restricted Boltzmann machines for collaborative filtering, in: Proceedings of the 24th International Conference on Machine Learning, Corvallis, Oregon, 2007, pp. 791-798.
[16] M.A. Ranzato, et al., Efficient learning of sparse representations with an energy-based model, in: Advances in Neural Information Processing Systems (NIPS), 2006.
[17] H. Lee, et al., Unsupervised learning of hierarchical representations with convolutional deep belief networks, Communications of the ACM 54 (10) (2011) 95-103.
[18] H. Lee, et al., Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, in: Proceedings of the 26th Annual International Conference on Machine Learning, ACM, Montreal, Quebec, Canada, 2009, pp. 609-616.
[19] H. Lee, C. Ekanadham, A.Y. Ng, Sparse deep belief net model for visual area V2, in: Advances in Neural Information Processing Systems (NIPS), 2008.
[20] Q.V. Le, et al., Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
[21] H. Larochelle, et al., An empirical evaluation of deep architectures on problems with many factors of variation, in: Proceedings of the 24th International Conference on Machine Learning, ACM, Corvallis, Oregon, 2007, pp. 473-480.
[22] H. Larochelle, et al., Exploring strategies for training deep neural networks, Journal of Machine Learning Research 1 (2009) 1-40.
[23] A. Hyvärinen, P. Hoyer, Emergence of phase- and shift-invariant features by decomposition of natural images into independent feature subspaces, Neural Computation 12 (7) (2000) 1705-1720.
[24] H. Chen, A.F. Murray, Continuous restricted Boltzmann machine with an implementable training algorithm, IEE Proceedings - Vision, Image and Signal Processing 150 (3) (2003) 153-158.
[25] Y. Bengio, et al., Greedy layer-wise training of deep networks, in: Advances in Neural Information Processing Systems (NIPS), 2006.
[26] Y. Bengio, A. Courville, P. Vincent, Representation learning: a review and new perspectives, arXiv, 2012.
[27] L. Arnold, et al., An introduction to deep learning, in: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), 2011.
[28] Q.V. Le, et al., On optimization methods for deep learning, in: ICML, 2011.
[29] The Theano Development Team, et al., Theano: a Python framework for fast computation of mathematical expressions, arXiv preprint arXiv:1605.02688, 2016.
[30] Y. Bengio, MILA and the future of Theano, 2017. Available from: https://groups.google.com/forum/#!msg/theano-users/7Poq8BZutbY/rNCIfvAEAwAJ.
[31] J. Hale, Deep learning framework power scores 2018, 2018. Available from: https://towardsdatascience.com/deep-learning-framework-power-scores-2018-23607ddf297a.
[32] M. Abadi, et al., TensorFlow: a system for large-scale machine learning, in: OSDI, 2016.
[33] F. Chollet, Keras, 2015.
[34] F. Seide, A. Agarwal, CNTK: Microsoft's open-source deep-learning toolkit, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2016.
[35] DeepMind, Sonnet, 2017. Available from: https://github.com/deepmind/sonnet.
[36] A. Paszke, et al., Automatic differentiation in PyTorch, 2017.
[37] Facebook Research, Caffe2, 2017. Available from: http://caffe2.ai.
[38] Y. Jia, et al., Caffe: convolutional architecture for fast feature embedding, in: Proceedings of the 22nd ACM International Conference on Multimedia, ACM, 2014.
[39] D. Team, Deeplearning4j: open-source distributed deep learning for the JVM, vol. 2, Apache Software Foundation License, 2016.
[40] MATLAB, Deep Learning Toolbox, 2015. Available from: https://au.mathworks.com/products/deeplearning.html.
[41] T. Chen, et al., MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems, arXiv preprint arXiv:1512.01274, 2015.


[42] G. Litjens, et al., A survey on deep learning in medical image analysis, Medical Image Analysis 42 (2017) 60-88.
[43] D. Shen, G. Wu, H.-I. Suk, Deep learning in medical image analysis, Annual Review of Biomedical Engineering 19 (2017) 221-248.
[44] K. Suzuki, Overview of deep learning in medical imaging, Radiological Physics and Technology 10 (3) (2017) 257-273.
[45] H. Greenspan, B. van Ginneken, R.M. Summers, Guest editorial deep learning in medical imaging: overview and future promise of an exciting new technique, IEEE Transactions on Medical Imaging 35 (5) (2016) 1153-1159.
[46] K. Clark, et al., The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository, Journal of Digital Imaging 26 (6) (2013) 1045-1057.
[47] X. Wang, et al., ChestX-ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2017.
[48] E. Gibson, et al., NiftyNet: a deep-learning platform for medical imaging, Computer Methods and Programs in Biomedicine 158 (2018) 113-122.
[49] G. Wu, et al., Unsupervised deep feature learning for deformable image registration of MR brains, in: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2013.
[50] G. Wu, et al., Scalable high performance image registration framework by unsupervised deep feature representations learning, IEEE Transactions on Biomedical Engineering 99 (2015).
[51] Z.-H. Cho, et al., New brain atlas: mapping the human brain in vivo with 7.0 T MRI and comparison with postmortem histology: will these images change modern medicine? International Journal of Imaging Systems and Technology 18 (1) (2008) 2-8.
[52] T. Vercauteren, et al., Diffeomorphic demons: efficient non-parametric image registration, NeuroImage 45 (1, Suppl. 1) (2009) S61-S72.
[53] G. Wu, et al., S-HAMMER: hierarchical attribute-guided, symmetric diffeomorphic registration for MR brain images, Human Brain Mapping (2013).
[54] M. Kim, G. Wu, D. Shen, Unsupervised deep learning for hippocampus segmentation in 7.0 tesla MR images, in: MICCAI Workshop on Machine Learning in Medical Imaging (MLMI 2013), Nagoya, Japan, 2013.
[55] Y. Guo, Y. Gao, D. Shen, Deformable MR prostate segmentation via deep feature learning and sparse patch matching, IEEE Transactions on Medical Imaging 35 (4) (2016) 1077-1089.
[56] N. Kasthuri, et al., Saturated reconstruction of a volume of neocortex, Cell 162 (3) (2015) 648-661.
[57] N. Renier, et al., Mapping of brain activity by automated volume analysis of immediate early genes, Cell 165 (7) (2016) 1789-1802.
[58] D.S. Richardson, J.W. Lichtman, Clarifying tissue clearing, Cell 162 (2) (2015) 246-257.
[59] M. Kim, et al., Joint labeling of multiple regions of interest (ROIs) by enhanced auto context models, in: IEEE International Symposium on Biomedical Imaging (ISBI), New York, 2015.
[60] Z. Tu, X. Bai, Auto-context and its application to high-level vision tasks and 3D brain image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (10) (2010) 1744-1757.
[61] M. Sezgin, B. Sankur, Survey over image thresholding techniques and quantitative performance evaluation, Journal of Electronic Imaging 13 (1) (2004) 146-165.
[62] P. Ghamisi, et al., Multilevel image segmentation based on fractional-order Darwinian particle swarm optimization, IEEE Transactions on Geoscience and Remote Sensing 52 (5) (2014) 2382-2395.


CHAPTER NINE

Automatic lesion detection with three-dimensional convolutional neural networks

Qi Dou¹, Hao Chen¹, Jing Qin² and Pheng-Ann Heng¹

¹ Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong
² School of Nursing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong

9.1 Introduction
Automatic lesion detection in medical images has been a fundamental and crucial topic in the area of medical image analysis. Accurate and efficient localization of lesions is essential for many clinical procedures, such as disease diagnosis decision-making, primary cancer screening, and management of early treatment. For example, cerebral microbleeds serve as important diagnostic biomarkers for brain vascular diseases and can potentially cause neurologic dysfunction and cognitive impairment [1,2]. Pulmonary nodules are critical indicators of primary lung cancer, and timely surgical intervention helps dramatically increase the survival rate of patients [3,4].

The automatic detection tasks are, however, very challenging. Lesions in medical images can be very small (i.e., on the scale of millimeters), especially in patients with early-stage cancer or other diseases. Moreover, these small lesions are sparsely distributed throughout the anatomical area; the widespread and unpredictable lesion locations make complete and accurate detection even more difficult. In addition, the lesions themselves present large variations in characteristics and contextual information. There also exist many hard mimics, normal tissues that heavily resemble the appearance of lesions in scanned medical images. These complicated intra-/interclass variations set further obstacles for a computer-aided detection system aiming at high sensitivity with a low false positive rate.

Typically, an automatic lesion detection system consists of two steps: (1) candidate screening, which sensitively screens candidates but admits many false positives, and (2) false positive reduction, which aims to remove the false positives and produce the final detection results. In the last generation of computer-aided detection systems,


the first stage relied on rule-based methods, such as curvature computation, voxel clustering, intensity thresholding, and morphological operations. The second stage commonly employed various classifiers, such as support vector machines (SVM), decision trees, and random forests, on the basis of handcrafted features that are heuristically designed to describe the key characteristics of lesions, such as intensity, size, sphericity, texture, and context. The limited representation capability of these low-level features has constrained the accuracy of previous computer-aided detection systems.

Recently, with the remarkable success of deep convolutional neural networks (CNNs) in image processing [5,6], deep learning-based representations have been broadly employed in medical image computing. Given the unique high dimensionality of medical images (e.g., computed tomography (CT) and magnetic resonance (MR) imaging), how to effectively unleash the power of CNNs on 3-D volumetric medical data requires dedicated research. One straightforward way is to employ a 2-D CNN on each single slice and process the slices sequentially. Apparently, this solution disregards the spatial information along the third dimension. Alternatively, aggregations of adjacent slices or orthogonal planes (i.e., axial, coronal, and sagittal) are useful to exploit complementary spatial information in the 3-D space. Nevertheless, these solutions are still suboptimal, as the employed 2-D kernels are independent of each other and the repeated patterns along the third dimension are insufficiently modeled.

In this chapter, we present the 3-D convolutional neural network (3-D CNN), which aims to tailor highly representative and discriminative features for volumetric medical data. We further introduce a 3-D CNN-based cascaded two-step framework to efficiently and accurately perform lesion detection in medical images. Two distinct case studies using the developed system are demonstrated, with state-of-the-art performance achieved. Our early works related to this chapter were published in Refs. [7-9].

9.2 3-D convolutional neural network
Basically, a CNN classification model alternately stacks convolutional (C) and subsampling, e.g., max-pooling (M), layers. In a C layer, small feature extractors (kernels) sweep over the topology and transform the input into feature maps. In an M layer, activations within a neighborhood are abstracted to acquire invariance to local translations. After several C and M layers, the feature maps are flattened into a feature vector, followed by fully connected (FC) layers. Finally, a softmax classification layer yields the prediction probability. This section describes the 3-D CNN for medical image analysis, which also follows this fundamental construction.


9.2.1 3-D convolutional kernel
In a typical C layer, a feature map is produced by convolving the input with convolution kernels, adding a bias term, and finally applying a nonlinear activation function. Denoting the $i$-th feature map of the $l$-th layer as $h_i^l$ and the $k$-th feature map of the previous layer as $h_k^{l-1}$, a C layer is formulated as:

$$h_i^l = \sigma\Big(\sum_k h_k^{l-1} * W_{ki}^l + b_i^l\Big), \tag{9.1}$$

where $W_{ki}^l$ and $b_i^l$ are the filter and bias term connecting the feature maps of adjacent layers, $*$ denotes the convolution operation, and $\sigma(\cdot)$ is the element-wise nonlinear activation function.

In 2-D natural image processing, the input of a CNN usually consists of three color channels (i.e., RGB). Inspired by this, the most straightforward way to adapt a 2-D CNN to volumetric medical image processing is to replace the color channels with adjacent slices of the volume. As shown in Fig. 9.1A, given a volumetric image of size $X \times Y \times Z$, when we employ this scheme to generate a feature map, we first need to split the input volume along the third dimension into $Z$ isolated slices and then feed these $Z$ isolated slices into the network. Correspondingly, $Z$ 2-D kernels are formed, with each single slice swept over by a unique kernel (see the red line). However, this scheme cannot sufficiently leverage the spatial information, since the $Z$ 2-D kernels are different from each other. In other words, due to the absence of kernel sharing across the third dimension, the encoded volumetric spatial information is inevitably deficient.

where W lki and bli are the filter and bias term connecting the feature maps of adjacent layers, the * denotes the convolution operation and the s(,) is the element-wise nonlinear activation function. In 2-D natural image processing, the input of CNN usually consists of three color channels (i.e., RGB). Inspired by this, the most straightforward way to adapt 2-D CNN to support volumetric medical image processing is to replace the color channels with adjacent slices of the volume. As shown in Fig. 9.1A, given a volumetric image of size X  Y  Z, when we employ this scheme to generate a feature map, we first need to split the input volume along the third dimension into Z isolated slices, and then feed these Z isolated slices into the network. Correspondingly, Z 2-D kernels are formed, with each single slice swept over by a unique kernel (see the red line). However, this scheme cannot sufficiently leverage the spatial information, since the Z 2-D kernels are different from each other. In other words, due to the absence of kernel sharing across the third dimension, the encoded volumetric spatial information is inevitably deficient.

(A)

(B)

Y

Y

Y’ X X’

M M

N

2D feature map

Z slices Network input

2D convolution kernel

X

1 kernel

N

Z kernels

M T N

Y’

M T N

X’ Z’

Z Network input

3D feature volume

3D convolution kernel

Figure 9.1 Comparison of two- and three-dimensional convolution kernels for a volumetric image of size X × Y × Z, in terms of network input, kernel behavior, and generated feature map. Red lines represent the moving direction of the kernels, i.e., sweeping over the two- and three-dimensional topologies, respectively. (A) With the two-dimensional convolution (kernel size M × N), the volume is first split into Z isolated slices along the third direction and these slices are input to the network; each generated feature map is a two-dimensional patch. (B) With the three-dimensional convolution (kernel size M × N × T), the entire volume is input to the network; each generated feature map is a three-dimensional volume. (Note that the kernel sizes M, N, and T need not be equal. Best viewed in color.)


Learning feature representations from all three dimensions is vitally important for biomarker detection tasks in volumetric medical images. In this regard, we propose to set up the 3-D convolution kernel in pursuit of encoding richer spatial information of the volumetric data. In this case, the feature maps are 3-D blocks instead of 2-D patches (we call them feature volumes hereafter). As shown in Fig. 9.1B, given the same volumetric image of size $X \times Y \times Z$, when we employ a 3-D convolution kernel to generate a 3-D feature volume, the input to the network is the entire volumetric data. Consequently, a 3-D kernel is formed and it sweeps over the whole 3-D topology (see the red line). By leveraging kernel sharing across all three dimensions, the network can take full advantage of the volumetric contextual information. Generally, the following equation formulates the exploited 3-D convolution operation in an element-wise manner:

$$u_{ki}^l(x, y, z) = \sum_{m,n,t} h_k^{l-1}(x - m,\; y - n,\; z - t)\, W_{ki}^l(m, n, t), \tag{9.2}$$

where $W_{ki}^l$ denotes the 3-D kernel in the $l$-th layer, which convolves over the 3-D feature volume $h_k^{l-1}$, and $W_{ki}^l(m, n, t)$ is an element-wise weight in the 3-D convolution kernel. Following Eqs. (9.1) and (9.2), the 3-D feature volume $h_i^l$ is obtained by summing over the 3-D convolution kernels:

$$h_i^l = \sigma\Big(\sum_k u_{ki}^l + b_i^l\Big). \tag{9.3}$$
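To make Eqs. (9.2) and (9.3) concrete, here is a direct, unoptimized NumPy sketch of a single 3-D convolutional layer. For brevity it assumes zero-padded, same-size convolution via `scipy.ndimage.convolve`, whereas a real network may use nonpadding convolutions as described later.

```python
# Direct rendering of Eqs. (9.2)-(9.3): each output feature volume is the
# nonlinearity of the sum of 3-D convolutions over all input feature volumes
# plus a bias.
import numpy as np
from scipy.ndimage import convolve

def sigma(a):                       # element-wise nonlinearity (sigmoid here)
    return 1.0 / (1.0 + np.exp(-a))

def conv3d_layer(h_prev, kernels, biases):
    """h_prev: list of K input volumes; kernels[i][k]: M x N x T kernel
    connecting input volume k to output volume i; biases[i]: scalar."""
    out = []
    for i, b in enumerate(biases):
        u = sum(convolve(h_prev[k], kernels[i][k], mode="constant")
                for k in range(len(h_prev)))          # Eq. (9.2), summed over k
        out.append(sigma(u + b))                      # Eq. (9.3)
    return out
```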

9.2.2 3-D CNN hierarchical model
With the 3-D convolutional kernels and layers, we can hierarchically construct a deep 3-D CNN model by stacking the C, M, and FC layers, as shown in Fig. 9.2. Specifically, in a C layer, a series of 3-D feature volumes is produced. In an M layer, the max-pooling operation, or any other down-sampling operation, is also conducted in the 3-D fashion, i.e., the feature volumes are subsampled based on a cubic neighborhood. The following FC layer flattens the 3-D feature volumes into a feature vector as its input. The ultimate output layer employs the softmax activation to yield the prediction probabilities for the input image. During 3-D CNN implementation, a nonlinear activation function (e.g., ReLU or LeakyReLU) is used in the C and FC layers. The 3-D convolution kernels are randomly initialized from a Gaussian distribution, and the trainable parameters in the network are updated using standard backpropagation with stochastic gradient descent or other advanced optimizers. The loss function is derived according to the specific task to be solved, for example, the cross-entropy loss for classification tasks, the Dice loss for segmentation tasks, or an adversarial loss for generative models. Useful


Figure 9.2 The hierarchical architecture of the 3-D CNN model.

strategies that benefit the learning procedure, such as dropout, batch normalization, and residual connections, can be seamlessly embedded into the 3-D convolutional neural network.
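As a concrete illustration of stacking C, M, and FC layers, the following PyTorch sketch follows the layer configuration of the screening model summarized later in Table 9.1. It is an illustrative reconstruction under stated assumptions, not the authors' released code.

```python
# Minimal 3-D CNN in the fashion of Fig. 9.2: C1 - M1 - C2 - C3 - FC1 - FC2,
# with kernel sizes and channel counts matching Table 9.1 for a 16 x 16 x 10 input.
import torch.nn as nn

class Small3DCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=(5, 5, 3)), nn.ReLU(),   # C1 -> 12x12x8
            nn.MaxPool3d(kernel_size=2, stride=2),                # M1 -> 6x6x4
            nn.Conv3d(64, 64, kernel_size=3), nn.ReLU(),          # C2 -> 4x4x2
            nn.Conv3d(64, 64, kernel_size=(3, 3, 1)), nn.ReLU(),  # C3 -> 2x2x2
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 2 * 2 * 2, 150), nn.ReLU(),            # FC1
            nn.Linear(150, n_classes),                            # FC2 (softmax applied in the loss)
        )

    def forward(self, x):            # x: (batch, 1, 16, 16, 10)
        return self.classifier(self.features(x))
```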

9.3 Efficient fully convolutional architecture
One of the main concerns about exploiting CNNs in the medical imaging domain lies in runtime performance, as many medical applications require prompt responses for further diagnosis and treatment. The situation is even more demanding when processing volumetric medical data. Directly applying 3-D CNNs to detect lesions with the traditional sliding window strategy is usually impracticable, especially when the input volumetric images are acquired at high resolution, because thousands or even millions of 3-D block samples need to be analyzed. In most biomarker detection applications, the targets are sparsely distributed throughout the volume, such as microbleeds in a 3-D brain MR image. To this end, one promising solution for detection is to first obtain candidates with high sensitivity and then perform fine-grained discrimination only on these candidates, so that the computational cost can be greatly reduced. Previous work proposed to retrieve lesion candidates (also called regions-of-interest in some papers) by employing local statistical information, including size, intensity, shape, and


other geometric features [10-12]. However, due to the large variations of lesions, relying only on these statistical values or handcrafted features is not effective enough. We propose to use a 3-D CNN to robustly screen candidates by leveraging high-level spatial representations of lesions learned from a large number of 3-D training samples. However, we still face the challenge of time performance when employing a 3-D CNN to retrieve candidates with the traditional sliding window strategy. To this end, inspired by 2-D fully convolutional networks (FCNs) [13], we propose to extend the strategy into a 3-D format for efficient retrieval of candidates from volumetric medical images. The proposed 3-D FCN can take an arbitrary-sized volume as input and produce a 3-D score volume within a single forward propagation, and hence can greatly speed up the candidate retrieval procedure without damaging the sensitivity.

9.3.1 Fully convolutional transformation
In a 3-D CNN architecture, both the convolutional and down-sampling layers can process arbitrary-sized input, where convolution or max-pooling kernels sweep over the input and generate correspondingly sized output. However, the traditional FC layers flatten the feature volumes into vectors, thus dismissing the spatial relationships. These FC layers then utilize vector-matrix multiplications to generate the output:

$$h^l = \sigma\big(W^l h^{l-1} + b^l\big), \tag{9.4}$$

where $h^{l-1} \in \mathbb{R}^P$ and $h^l \in \mathbb{R}^Q$ are the feature vectors in the $(l-1)$-th and the $l$-th FC layers, respectively, $W^l \in \mathbb{R}^{Q \times P}$ is the weight matrix, and $b^l$ denotes the bias term. In a traditional CNN, once trained, the weight $W^l$ has a fixed shape, and hence the FC layer has fixed input/output sizes. As a result, a network with traditional FC layers requires that the initial inputs have a fixed size. For example, when the network is trained on 3-D samples of size 16 × 16 × 10, errors will arise if we input a test sample of size 20 × 16 × 10, due to the shape mismatch in the first dimension. In this regard, we equivalently rewrite the FC layers in the following convolutional format:

$$h_q^l = \sigma\Big(\sum_p h_p^{l-1} * W_{pq}^l + b_q^l\Big), \tag{9.5}$$

where each neuron in the FC layer is regarded as a 1 × 1 × 1 feature volume, $W_{pq}^l \in \mathbb{R}^{1 \times 1 \times 1}$ is the 3-D kernel, and $*$ is the 3-D convolution operation described in Eq. (9.2). In this way, the vector-matrix multiplications are formulated as convolution operations with 1 × 1 × 1 kernels. With the FC layers converted into convolutional layers, the network can therefore support arbitrary-sized input.
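A minimal PyTorch sketch of this transformation is given below; `fc_to_conv3d` is a hypothetical helper name, and `spatial` is the spatial extent of the preceding feature volume (e.g., 2 × 2 × 2 for FC1 of the screening model, 1 × 1 × 1 for FC2).

```python
# Rewrite a trained Linear layer (weights in R^{Q x P}) as an equivalent
# Conv3d, so the network accepts arbitrary-sized inputs (Eq. (9.5)).
import torch
import torch.nn as nn

def fc_to_conv3d(fc: nn.Linear, in_channels: int, spatial=(1, 1, 1)) -> nn.Conv3d:
    """Map weights R^{QxP} -> R^{Q x C x d x h x w}, where P = C*d*h*w."""
    q, p = fc.weight.shape
    d, h, w = spatial
    assert p == in_channels * d * h * w
    conv = nn.Conv3d(in_channels, q, kernel_size=spatial)
    with torch.no_grad():
        conv.weight.copy_(fc.weight.view(q, in_channels, d, h, w))
        conv.bias.copy_(fc.bias)
    return conv
```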


9.3.2 3-D score volume generation
During the training phase, a traditional 3-D CNN model is learned. Once training is done, to acquire the 3-D FCN model, the FC layers in the traditional 3-D CNN are transformed into the convolutional fashion. More specifically, the multiplication matrix $W^l \in \mathbb{R}^{Q \times P}$ is reshaped into a 5-D tensor $W^l \in \mathbb{R}^{Q \times 1 \times P \times 1 \times 1}$ (the dimensions are ordered for ease of implementation), and hence the weight matrix is converted into a series of convolution kernels. During the testing phase, the 3-D FCN model directly takes a volume as input and outputs a 3-D score volume (with reduced resolution compared with the original input size). The value at each location of the score volume indicates the probability of being a lesion.

There are some implementation issues that need to be handled when developing the 3-D FCN model. Specifically, when converting the traditional FC layers into the convolutional fashion by casting the 2-D multiplication matrix ($\mathbb{R}^{Q \times P}$) into the 5-D tensor ($\mathbb{R}^{Q \times 1 \times P \times 1 \times 1}$), we should precisely maintain the spatial correlation. In addition, during whole-volume testing, we need to ensure dimension consistency in the logistic regression layer, where the feature volumes are first flattened into vectors, then passed through the softmax function, and finally reshaped back to form the 3-D score volume. One alternative practice is to directly train the model in an FCN format, such that there are only convolutional and down-sampling layers in the network, without any FC layer.

9.3.3 Score volume index mapping
Due to successive layers of convolution and down-sampling operations, the size of the generated 3-D score volume is reduced compared with the original input. Effectively, the 3-D score volume is a coarse version of the voxel-wise predictions that would be produced by the sliding window strategy. Meanwhile, the locations in this coarse score volume can be traced back to coordinates in the original input space. Since all three dimensions follow the same index mapping mechanism, we demonstrate the mapping process with one dimension. In our formulation, indices are numbered from zero. Generally, for each C or M layer (supposing nonpadding convolution and nonoverlapping pooling) in the model, the index mapping for a convolution or max-pooling operation can be calculated by:

$$x' = d \cdot x + \left\lfloor \frac{c - 1}{2} \right\rfloor, \tag{9.6}$$

where $x'$ and $x$ denote the coordinates before and after the convolution or max-pooling operation; $d$ and $c$ represent the stride and kernel size, respectively; and $\lfloor \cdot \rfloor$ represents the floor function. When mapping a location $x_s$ in the coarse score volume back through the architecture toward the location $x_o$ in the original input volume, we successively deduce


Table 9.1 The architecture of the three-dimensional FCN screening model.

| Layer | Kernel size | Stride | Output size | Feature volumes |
|---|---|---|---|---|
| Input | - | - | 16 × 16 × 10 | 1 |
| C1 | 5 × 5 × 3 | 1 | 12 × 12 × 8 | 64 |
| M1 | 2 × 2 × 2 | 2 | 6 × 6 × 4 | 64 |
| C2 | 3 × 3 × 3 | 1 | 4 × 4 × 2 | 64 |
| C3 | 3 × 3 × 1 | 1 | 2 × 2 × 2 | 64 |
| FC1 | 2 × 2 × 2 | 1 | 1 × 1 × 1 | 150 |
| FC2 | 1 × 1 × 1 | 1 | 1 × 1 × 1 | 2 |

the index mapping procedures along all intermediate convolution and max-pooling layers until the initial input layer. For example, based on the screening network architecture shown in Table 9.1, for each position index $x_s$ in the coarse score volume, we can obtain its corresponding index $x_o$ in the original input as follows:

$$x_o = \left\lfloor \frac{c_1 - 1}{2} \right\rfloor + \left\lfloor \frac{c_2 - 1}{2} \right\rfloor + d_2 \left( x_s + \left\lfloor \frac{c_3 - 1}{2} \right\rfloor + \left\lfloor \frac{c_4 - 1}{2} \right\rfloor + \left\lfloor \frac{c_5 - 1}{2} \right\rfloor + \left\lfloor \frac{c_6 - 1}{2} \right\rfloor \right) = D \cdot x_s + C, \tag{9.7}$$

where, according to the network architecture, $c_1 = 5$, $c_2 = 2$, $d_2 = 2$, $c_3 = 3$, $c_4 = 3$, $c_5 = 2$, $c_6 = 1$, and we can calculate $D = 2$ and $C = 6$ for the X dimension. As shown in Fig. 9.3, with this mechanism, each location in the 3-D score volume can be mapped back to the centroid of the corresponding receptive field of the neuron.
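The mapping of Eqs. (9.6) and (9.7) can be checked with a few lines of Python. The layer list encodes the (kernel, stride) pairs of Table 9.1 along the X dimension, and the assertions verify $D = 2$ and $C = 6$.

```python
# Trace a score-volume index back to the input coordinate by applying
# Eq. (9.6) in reverse through the layer stack (kernel size c, stride d).
def map_back(x_s, layers):
    x = x_s
    for c, d in reversed(layers):
        x = d * x + (c - 1) // 2
    return x

layers_x = [(5, 1), (2, 2), (3, 1), (3, 1), (2, 1), (1, 1)]  # C1, M1, C2, C3, FC1, FC2
assert map_back(0, layers_x) == 6                            # C = 6
assert map_back(1, layers_x) - map_back(0, layers_x) == 2    # D = 2
```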


Figure 9.3 The mapping from the three-dimensional score volume onto the original input space.


Equivalently, if the cubic patch centered on the traced position is input to the traditional 3-D CNN, the prediction probability is indeed the value at the corresponding location of the coarse score volume. Consequently, the prediction scores are sparsely mapped back onto the input volume, and regions with high probabilities are retrieved as potential candidates.

9.4 Two-stage cascaded framework for detection
Based on the 3-D CNN and the fully convolutional architecture, we build a two-stage detection framework. Specifically, we use a 3-D FCN model and a 3-D CNN model tailored to the two different stages and integrate them into an efficient and robust detection framework. In this cascaded framework for lesion detection, each stage serves its own mission: the candidate screening stage with the 3-D FCN aims to accurately reject background regions and rapidly retrieve a small number of potential candidates, while the false positive reduction stage with the 3-D CNN focuses only on the screened set of candidates to further single out true lesions from challenging mimics.

9.4.1 Candidate screening stage
The workflow of the candidate screening stage is presented in Fig. 9.4, including both the training and testing phases. During the training phase, the positive samples are extracted from lesion regions and augmented to expand the training database. In practice, the network is trained in three substeps. We start by training an initial 3-D CNN with randomly selected nonlesion regions throughout the image as negative samples. Next, we add false positive samples obtained by applying the initial model to the training dataset. Finally, the initial model is fine-tuned with the enlarged training database, which consists of positives, randomly selected negatives, and supplemental false positives. In this way, the discrimination capability of the network is further enhanced. During the testing phase, the 3-D FCN model takes the whole volume as input and generates the corresponding coarse 3-D score volume. Considering that the produced score volume can be noisy, we utilize local non-maximum suppression in a 3-D fashion as postprocessing. Locations in the 3-D score volume are then sparsely traced back to coordinates in the original input space, according to the index mapping process. Finally, regions with high prediction probabilities are selected as the potential candidates.

9.4.2 False positive reduction stage
In this stage, small 3-D blocks are cropped centered on the screened candidate positions. The extracted 3-D candidate regions are classified by a newly constructed 3-D CNN model to remove the remaining false positives. Note that randomly selected nonlesion samples are not strongly representative, especially when we aim to distinguish true


Figure 9.4 Illustration of the workflow of the screening stage. The training phase is conducted in three substeps: (1) train an initial traditional 3-D CNN with positive samples and randomly selected negative samples; (2) apply the initial model to the training set and obtain false positive samples to enlarge the training database; (3) fine-tune the initial traditional 3-D CNN model with the enlarged database to strengthen its discrimination capability. Once training is done, the traditional FC layers are converted into the convolutional fashion (as shown in the brown box). During the testing phase, the 3-D FCN takes a whole volume as input, extracts representative feature volumes, and finally produces a 3-D score volume to retrieve candidates.

lesions from their mimics. To generate representative samples and improve the discrimination capability of the 3-D CNN model, the false positives obtained on the training set in the screening stage (which have a very similar appearance to lesions) are taken as negative samples when training the 3-D CNN in the second stage. Model ensembling can be employed in this stage to further improve the performance.
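Putting the two stages together, the following Python sketch outlines the cascaded detection flow; all helper functions are hypothetical placeholders rather than the authors' actual pipeline.

```python
# High-level sketch of the two-stage cascaded detection framework.
def detect_lesions(volume, fcn_screen, cnn_classifier, crop_block,
                   nonmax_suppress_3d, map_index_back, threshold=0.5):
    # Stage 1: 3-D FCN screening over the whole volume
    score_volume = fcn_screen(volume)                 # coarse 3-D score volume
    peaks = nonmax_suppress_3d(score_volume)          # suppress noisy responses
    candidates = [map_index_back(p) for p in peaks]   # Eq. (9.7) index mapping
    # Stage 2: 3-D CNN false positive reduction on cropped candidate blocks
    detections = []
    for center in candidates:
        block = crop_block(volume, center)
        prob = cnn_classifier(block)
        if prob > threshold:
            detections.append((center, prob))
    return detections
```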


9.5 Case study I: cerebral microbleed detection in brain magnetic resonance imaging

9.5.1 Background of the application
Cerebral microbleeds (CMBs) refer to small foci of chronic blood products, composed of hemosiderin deposits that leak through pathological brain blood vessels [2]. This lesion is prevalent in patients with cerebrovascular and cognitive diseases (such as stroke and dementia) and is also present in healthy aging populations. The existence of cerebral microbleeds and their distribution patterns have been recognized as important biomarkers for the diagnosis of cerebrovascular diseases. For example, a lobar distribution of CMBs would suggest probable cerebral amyloid angiopathy, and deep hemispheric or infratentorial microbleeds may imply probable hypertensive vasculopathy. The existence of CMBs brings an increased risk of symptomatic intracerebral hemorrhage and recurrent ischemic stroke [14]. Furthermore, these lesions can structurally damage nearby brain tissues and further cause cognitive impairment and neurologic dysfunction [15]. In these regards, reliable detection of CMBs is crucial for cerebral diagnosis and may guide physicians in determining which drug to choose for necessary treatment, such as stroke prevention.

Modern advances in MR imaging technologies make the paramagnetic blood products more readily detectable [16] and hence facilitate the recognition of CMBs. As shown in Fig. 9.5, a cerebral microbleed is radiologically visualized as a small, rounded hypointensity within susceptibility weighted imaging (SWI) MR data (refer to the yellow rectangle). In general, the clinical routine to detect CMBs is based on visual inspection and manual localization, which is laborious and error-prone. Alternatively, computer-aided detection systems can help relieve the workload

Figure 9.5 Illustration of a CMB and a CMB mimic, denoted by the yellow and red rectangles, respectively. Within each large rectangle, the rows show adjacent slices in the axial, sagittal, and coronal planes, from top to bottom. The importance of 3-D information can be observed.


on radiologists and also improve clinical efficiency. However, the automatic detection of CMBs encounters several challenges: (1) the small size of the lesions; (2) the widespread, scattered locations of the lesions; and (3) the existence of hard mimics (e.g., flow voids, calcification, and cavernous malformations) that resemble the appearance of CMBs and heavily impede the detection process.

9.5.2 Dataset, preprocessing and evaluation metrics
Our employed dataset includes 320 SWI images with 1149 CMBs, scanned on a 3.0T Philips Medical System with a 3-D spoiled gradient-echo sequence using venous blood oxygen level dependent series and the following parameters: repetition time 17 ms, echo time 24 ms, volume size 512 × 512 × 150, in-plane resolution 0.45 × 0.45 mm, slice thickness 2 mm, slice spacing 1 mm, and a 230 × 230 mm² field of view. The subjects came from two separate groups: 126 cases with stroke (mean age ± standard deviation: 67.4 ± 11.3) and 194 cases of normal aging (mean age ± standard deviation: 71.2 ± 5.0). The dataset was labeled by an experienced rater and verified by a neurologist following the guidance of the Microbleed Anatomical Rating Scale [17]. We employed the Pearson correlation coefficient (PCC) to assess the interobserver agreement between the two raters [18]. Due to the large dataset and the expensive manual annotation effort, we tested the interobserver agreement on a subset of 20 subjects (including 10 cases with stroke and 10 cases of normal aging). The PCC turned out to be 0.91 (P = ...).

[...]

... nodules ≥ 3 mm accepted by three or four radiologists formed the ground truth. Annotations that failed to be included in the reference standard (i.e., nonnodules, nodules < 3 mm, and nodules annotated by only one or two radiologists) were referred to as irrelevant findings. For preprocessing the CT scans, we clipped the grayscale values to the interval of (-1000, 400) Hounsfield units and normalized them into the range (0, 1). The mean intensity was subtracted before inputting the samples to the network. We conducted a series of augmentations for the positive nodule samples, including random translation within the radius region of the pulmonary nodule, flipping, random scaling between [0.9, 1.1], and random rotation by [90, 180, 270] degrees in the transverse plane. When training the multi-tasking neural network, we set a relatively small training patch size (30 × 30 × 10) in the candidate screening stage for fast processing, while the second stage utilized a larger size (60 × 60 × 24) to include richer contextual information and accurately detect nodules. The 3-D fully convolutional model was randomly initialized from a Gaussian distribution $\mathcal{N}(0, 0.01)$, and we initialized the learning rate to 0.001. When training the hybrid-loss 3-D residual network, the first three convolutional


The 3-D fully convolutional model was randomly initialized from a Gaussian distribution N(0, 0.01), and we initialized the learning rate to 0.001. When training the hybrid-loss 3-D residual network, the first three convolutional layers were initialized from the FCN model and the remaining parameters of the deeper layers were randomly initialized as in Ref. [29]. The convolution layers in the residual units used padding to maintain the dimensions of the feature volumes. The trade-off parameters λ and β were set to 0.5 and 1e−4, respectively. Detection performance was evaluated by measuring the sensitivity and the average false positive rate per scan, as defined in the challenge. A predicted candidate location was counted as a true positive if it was positioned within the radius of a true lung nodule center. Detections of irrelevant findings were not considered (i.e., they were regarded as neither false positives nor true positives) in the evaluation. We conducted free-response receiver operating characteristic (FROC) analysis by setting different thresholds on the raw prediction probabilities, and the evaluation also computed the 95% confidence interval with bootstrapping [30]. A competition performance metric (CPM) score [31] was calculated as the average sensitivity at seven predefined false positive rates: 1/8, 1/4, 1/2, 1, 2, 4, and 8 false positives per scan.
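As a worked illustration of the CPM, the sketch below interpolates a FROC curve at the seven predefined false positive rates and averages the sensitivities; the helper name is hypothetical, and the sample values are taken from the "Our method" row of Table 9.6.

    import numpy as np

    def cpm_score(fp_rates, sensitivities):
        # Average sensitivity at the seven predefined FP rates per scan.
        targets = [0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
        return float(np.mean(np.interp(targets, fp_rates, sensitivities)))

    sens = [0.659, 0.745, 0.819, 0.865, 0.906, 0.933, 0.946]
    print(cpm_score([0.125, 0.25, 0.5, 1, 2, 4, 8], sens))  # approximately 0.839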

9.6.4 Experimental results

To investigate the contribution of the proposed learning strategies, we conducted extensive ablation experiments. We first assessed the capability of screening lung nodule candidates using the 3-D FCN trained to convergence with and without the online sample filtering (OSF) scheme. The results are presented in the first two rows of Table 9.5. Training with the online sample filtering strategy significantly improves candidate screening performance, increasing the sensitivity from 94.3% to 97.1% and reducing the FPs/scan rate from 286.2 to 219.1. These improvements indicate that selecting the high-loss (hard) samples with the online sample filtering strategy greatly enhances the network's discrimination capability. To evaluate the effectiveness of the residual learning technique and the hybrid-loss objective in our false positive reduction model, we implemented three different networks, i.e., a plain deep network (DeepNet), a residual network (ResNet), and our proposed hybrid-loss residual network (ResNet + HL), according to the architecture illustrated in Fig. 9.11. Their results are presented in the last three rows of Table 9.5: at 1.0 FPs/scan, the three networks achieve detection sensitivities of 84.8%, 86.7%, and 90.5%, respectively.

Table 9.5 Evaluation of the learning strategies in our detection framework.

Stage                      Method        Sensitivity   FPs/scan
Candidate screening        FCN           94.3%         286.2
Candidate screening        FCN + OSF     97.1%         219.1
False positive reduction   DeepNet       84.8%         1.0
False positive reduction   ResNet        86.7%         1.0
False positive reduction   ResNet + HL   90.5%         1.0
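The core of the online sample filtering idea can be sketched in a few lines of PyTorch-style code: within each minibatch, only the highest-loss (hard) samples contribute to the gradient. The keep ratio and the function name are illustrative assumptions, not the exact configuration used in these experiments.

    import torch

    def online_sample_filtering(per_sample_losses, keep_ratio=0.5):
        # Keep only the hardest (highest-loss) samples in the minibatch.
        k = max(1, int(keep_ratio * per_sample_losses.numel()))
        hard_losses, _ = torch.topk(per_sample_losses, k)
        # Backpropagate only through the selected hard samples.
        return hard_losses.mean()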


Figure 9.11 False positive reduction with multi-task and hybrid-loss learning.
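A hedged sketch of such a hybrid objective is given below: a classification loss is combined with a regression loss on the nodule location and size, applied only to positive samples. All names and the weighting term are assumptions for illustration.

    import torch
    import torch.nn.functional as F

    def hybrid_loss(cls_logits, cls_targets, reg_pred, reg_targets, beta=1.0):
        # Classification branch: nodule vs. non-nodule.
        cls_loss = F.cross_entropy(cls_logits, cls_targets)
        # Regression branch: supervise center/diameter on positive samples only.
        pos = cls_targets == 1
        if pos.any():
            reg_loss = F.smooth_l1_loss(reg_pred[pos], reg_targets[pos])
        else:
            reg_loss = reg_pred.sum() * 0.0  # keep the graph when no positives
        return cls_loss + beta * reg_loss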

These results demonstrate that, while the residual learning technique improves the performance of plain networks by facilitating gradient flow during optimization, the proposed hybrid-loss objective further boosts detection performance by additionally supervising the learning with location and size information. Fig. 9.12 presents the FROC curves of the three networks for a more comprehensive comparison across a wider range of false positive rates. The proposed ResNet + HL consistently obtains the best performance among the three configurations. For overall lung nodule detection, Table 9.6 compares the performance of our method with that of other approaches in the LUNA16 challenge. All participants employed deep learning based approaches; we refer readers to Ref. [32], a comprehensive summary of LUNA16, for details of the other participating methods. As Table 9.6 shows, our method achieves a CPM score of 0.839, setting a state-of-the-art result. At false positive rates of 0.5, 1, 2, 4, and 8 per scan, our detection framework achieved sensitivities of 81.9%, 86.5%, 90.6%, 93.3%, and 94.6%, respectively, the highest among the compared methods. It has been reported that, in real-world clinical practice, false positive rates between one and four per scan are of most concern [33]. Our method achieves a sensitivity of 90.6% at two FPs/scan, highlighting its promising potential for real clinical practice. Fig. 9.13 depicts typical examples of final detection results with the classification probability and regressed diameter indicated. Our model recognizes the various lung nodules with high probability, as well as reliably predicts the


Figure 9.12 Comparison of FROC curves using different network configurations for lung nodule detection, with shaded areas presenting the 95% confidence interval.

Table 9.6 Comparison with other lung nodule detection methods in the LUNA16 Challenge (sensitivity at 0.125-8 false positives per scan).

Teams          0.125   0.25    0.5     1       2       4       8       CPM score
DIAG_ConvNet   0.692   0.771   0.809   0.863   0.895   0.914   0.923   0.838
ZENT           0.661   0.724   0.779   0.831   0.872   0.892   0.915   0.811
Aidence        0.601   0.712   0.783   0.845   0.885   0.908   0.917   0.807
MOT_M5Lv1      0.597   0.670   0.718   0.759   0.788   0.816   0.843   0.742
VisiaCTLung    0.577   0.644   0.697   0.739   0.769   0.788   0.793   0.715
Etrocad        0.250   0.522   0.651   0.752   0.811   0.856   0.887   0.676
Our method     0.659   0.745   0.819   0.865   0.906   0.933   0.946   0.839

size of the detected nodules. Last but not least, our lesion detection framework is highly efficient, taking less than 1 minute per case, which makes it well suited to large-scale data processing such as annual lung cancer screening programs for high-risk populations.

9.7 Discussion

To illustrate the discrimination capability of the intermediate features, the representations extracted by the 2-D CNN and 3-D CNN models on the CMB detection task


Figure 9.13 Examples of lung nodule detection results of our method with the prediction probability and diameter indicated in red.

are embedded into the 2-D plane using the t-SNE toolbox [34], as shown in Fig. 9.14. The CMB and non-CMB samples are distinctly separated based on the features extracted by our 3-D CNN. In contrast, the embedding of the aggregated 2-D CNN representations does not present such a clear partition pattern, highlighting the discrimination capability of the 3-D CNN based features, which encode richer spatial information within the volumetric medical data. Meanwhile, we also visualize the 3-D convolution kernels of the first two convolutional layers of the 3-D FCN. Fig. 9.15A illustrates the C1 layer kernels (of size 5 × 5 × 3), where each column represents one 3-D kernel shown as three 5 × 5 maps. On closer observation, we find that the learned kernels attend to the spherical shapes of the lesions as well as to the intensity gradients between the microbleeds and the surrounding background. More importantly, the slight changes across the three maps within each column indicate that the 3-D kernels have effectively captured the spatial information along the third dimension of the volumetric data. Fig. 9.15B illustrates the C2 layer kernels (of size 3 × 3 × 3), where each column represents one 3-D kernel visualized as three 3 × 3 maps. These kernels are relatively difficult to interpret directly, because they construct higher-level concepts from the activations of the preceding layer. Nevertheless, we can still observe that these kernels exhibit clearly organized patterns.
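A feature embedding of this kind can be reproduced with off-the-shelf tools; the sketch below, with placeholder features, uses scikit-learn's t-SNE and is only illustrative of the procedure behind Fig. 9.14.

    import numpy as np
    from sklearn.manifold import TSNE

    # Placeholders for extracted intermediate CNN features and CMB labels.
    features = np.random.rand(200, 256)
    labels = np.random.randint(0, 2, 200)

    # Project the high-dimensional representations onto a 2-D plane.
    embedding = TSNE(n_components=2, perplexity=30).fit_transform(features)
    # embedding can then be scattered and colored by label, as in Fig. 9.14.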


Figure 9.14 Feature embedding from the two-dimensional CNN (left) and three-dimensional CNN methods (right) with t-SNE toolbox. The red and blue colors correspond to the CMBs and non-CMBs, respectively. Best viewed in color.


Figure 9.15 Visualization of typical learned filters in the screening three-dimensional CNN model: (A) visualization of the C1 layer kernels, where each column represents a three-dimensional kernel of size 5 × 5 × 3, visualized as three 5 × 5 maps; (B) visualization of the C2 layer kernels, where each column represents a three-dimensional kernel of size 3 × 3 × 3, visualized as three 3 × 3 maps.

We designed the two-stage cascaded detection framework with two aims in mind: efficiency and accuracy. For an automatic lesion detection system targeting real clinical practice, we believe both are equally crucial. In the cascaded architecture, the first stage focuses on excluding the massive background regions and screening potential candidates; in this stage, we develop the 3-D FCN to reduce the computational cost and thus meet the requirement of efficiency. The second stage focuses on the small number of remaining candidates and removes the difficult false positives whose appearance is similar to that of true lesions; in this stage, we employ a discriminative 3-D CNN to identify the true lesions with high sensitivity and a low false positive rate, thus meeting the requirement of accuracy. Quantitatively, the first stage yields hundreds of false positives per subject. After the second stage, only a few false


positives remain. The second stage thus removes nearly 99% of the false positive candidates using the 3-D CNN discrimination model.
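The overall control flow of the cascade can be summarized in the following sketch, where `screen` and `discriminate` stand in for the trained 3-D FCN and the discriminative 3-D CNN; all names, sizes, and thresholds are illustrative assumptions.

    import numpy as np
    from scipy.ndimage import maximum_filter

    def cascade_detect(volume, screen, discriminate, prob_thresh=0.5, patch=24):
        # Stage 1: dense screening via the 3-D FCN score volume.
        scores = screen(volume)
        peaks = (scores == maximum_filter(scores, size=5)) & (scores > 0.1)
        detections, h = [], patch // 2
        # Stage 2: re-score each candidate patch with the discriminative CNN.
        for z, y, x in zip(*np.nonzero(peaks)):
            sub = volume[max(z - h, 0):z + h, max(y - h, 0):y + h, max(x - h, 0):x + h]
            p = discriminate(sub)
            if p > prob_thresh:
                detections.append(((z, y, x), p))
        return detections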

9.8 Conclusions

This chapter presents a 3-D convolutional neural network based deep learning framework for automatic lesion detection in volumetric medical images. For efficiency, we further elaborate the 3-D FCN, which takes an arbitrary-sized volumetric image as input and directly outputs a 3-D prediction score volume within a single forward propagation. The two-stage cascaded framework has been extensively validated on two distinct and challenging applications, i.e., cerebral microbleed detection in brain MR images and lung nodule detection in chest CT images, with outstanding performance demonstrated. The efficiency and accuracy of the proposed lesion detection system show appealing potential for real-world clinical practice.

Acknowledgments

We thank our colleagues Dr. Shi Lin, Dr. Vincent CT Mok, Dr. Defeng Wang, Dr. Lei Zhao, Mr. Lequan Yu, Ms. Yueming Jin, and Mr. Huangjing Lin for their early work, which was valuable for the contents of this chapter. This work was supported by a grant from the Research Grants Council of HKSAR under the General Research Fund (Project no. 14225616) and a grant from the Hong Kong Innovation and Technology Commission under the ITSP Tier 2 Platform Funding Scheme (Project no. ITS/426/17FP).

References

[1] S.M. Greenberg, M.W. Vernooij, C. Cordonnier, A. Viswanathan, R. Al-Shahi Salman, S. Warach, L.J. Launer, M.A. Van Buchem, M. Breteler, Cerebral microbleeds: a guide to detection and interpretation, The Lancet Neurology 8 (2) (2009) 165–174.
[2] A. Charidimou, A. Krishnan, D.J. Werring, H.R. Jäger, Cerebral microbleeds: a guide to detection and clinical relevance in different disease settings, Neuroradiology 55 (6) (2013) 655–674.
[3] C.I. Henschke, et al., Early Lung Cancer Action Project: overall design and findings from baseline screening, The Lancet 354 (9173) (1999) 99–105.
[4] H. MacMahon, et al., Guidelines for management of small pulmonary nodules detected on CT scans: a statement from the Fleischner Society, Radiology 237 (2) (2005) 395–400.
[5] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444.
[6] A. Krizhevsky, et al., ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[7] Q. Dou, H. Chen, L. Yu, L. Zhao, J. Qin, D. Wang, V.C.T. Mok, L. Shi, P.-A. Heng, Automatic detection of cerebral microbleeds from MR images via 3D convolutional neural networks, IEEE Transactions on Medical Imaging 35 (5) (2016) 1182–1195.
[8] Q. Dou, H. Chen, Y. Jin, H. Lin, J. Qin, P.-A. Heng, Automated pulmonary nodule detection via 3D ConvNets with online sample filtering and hybrid-loss residual learning, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2017, pp. 630–638.
[9] H. Chen, Q. Dou, L. Yu, J. Qin, L. Zhao, V.C.T. Mok, D. Wang, L. Shi, P.-A. Heng, Deep cascaded networks for sparsely distributed object detection from medical images, in: Deep Learning for Medical Image Analysis, Elsevier, 2017, pp. 133–154.
[10] S.R.S. Barnes, E.M. Haacke, M. Ayaz, A.S. Boikov, W. Kirsch, D. Kido, Semiautomated detection of cerebral microbleeds in magnetic resonance images, Magnetic Resonance Imaging 29 (6) (2011) 844–852.
[11] B. Ghafaryasl, F. van der Lijn, M. Poels, H. Vrooman, M.A. Ikram, W.J. Niessen, A. van der Lugt, M. Vernooij, M. de Bruijne, A computer aided detection system for cerebral microbleeds in brain MRI, in: 2012 9th IEEE International Symposium on Biomedical Imaging (ISBI), IEEE, 2012, pp. 138–141.
[12] Q. Dou, H. Chen, L. Yu, L. Shi, D. Wang, V.C.T. Mok, P.-A. Heng, Automatic cerebral microbleeds detection from MR images via independent subspace analysis based hierarchical features, in: 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2015, pp. 7933–7936.
[13] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
[14] C. Cordonnier, R. Al-Shahi Salman, J. Wardlaw, Spontaneous brain microbleeds: systematic review, subgroup analyses and standards for study design and reporting, Brain 130 (8) (2007) 1988–2003.
[15] A. Charidimou, D.J. Werring, Cerebral microbleeds and cognition in cerebrovascular disease: an update, Journal of the Neurological Sciences 322 (1) (2012) 50–55.
[16] J.D.C. Goos, W.M. van der Flier, D.L. Knol, P.J.W. Pouwels, P. Scheltens, F. Barkhof, M.P. Wattjes, Clinical relevance of improved microbleed detection by susceptibility weighted magnetic resonance imaging, Stroke 42 (7) (2011) 1894–1900.
[17] S.M. Gregoire, U.J. Chaudhary, M.M. Brown, T.A. Yousry, C. Kallis, H.R. Jäger, D.J. Werring, The Microbleed Anatomical Rating Scale (MARS): reliability of a tool to map brain microbleeds, Neurology 73 (21) (2009) 1759–1766.
[18] J. de Bresser, M. Brundel, M.M. Conijn, J.J. van Dillen, M.I. Geerlings, M.A. Viergever, P.R. Luijten, G.J. Biessels, Visual cerebral microbleed detection on 7T MR imaging: reliability and effects of image processing, American Journal of Neuroradiology 34 (6) (2013) E61–E64.
[19] H. Chen, L. Yu, Q. Dou, L. Shi, V.C.T. Mok, P.-A. Heng, Automatic detection of cerebral microbleeds via deep learning based 3D feature representation, in: 2015 IEEE International Symposium on Biomedical Imaging (ISBI), IEEE, 2015, pp. 764–767.
[20] A. Liaw, M. Wiener, Classification and regression by randomForest, R News 2 (3) (2002) 18–22. URL: http://CRAN.R-project.org/doc/Rnews/.
[21] S. Geman, E. Bienenstock, R. Doursat, Neural networks and the bias/variance dilemma, Neural Computation 4 (1) (1992) 1–58.
[22] M. Tan, et al., A novel computer-aided lung nodule detection system for CT images, Medical Physics 38 (10) (2011) 5630–5645.
[23] T. Messay, et al., A new computationally efficient CAD system for pulmonary nodule detection in CT imagery, Medical Image Analysis 14 (3) (2010) 390–406.
[24] S.G. Armato III, et al., The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): a completed reference database of lung nodules on CT scans, Medical Physics 38 (2) (2011) 915–931.
[25] Q. Dou, H. Chen, L. Yu, J. Qin, P.-A. Heng, Multilevel contextual 3-D CNNs for false positive reduction in pulmonary nodule detection, IEEE Transactions on Biomedical Engineering 64 (7) (2017) 1558–1567.
[26] M. Firmino, et al., Computer-aided detection system for lung cancer in computed tomography scans: review and future prospects, BioMedical Engineering OnLine 13 (2014) 1–16.
[27] A. Shrivastava, A. Gupta, R. Girshick, Training region-based object detectors with online hard example mining, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 761–769.
[28] R. Girshick, Fast R-CNN, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
[29] K. He, X. Zhang, S. Ren, J. Sun, Identity mappings in deep residual networks, in: European Conference on Computer Vision, Springer, 2016, pp. 630–645.
[30] B. Efron, R.J. Tibshirani, An Introduction to the Bootstrap, CRC Press, 1994.
[31] M. Niemeijer, et al., On combining computer-aided detection systems, IEEE Transactions on Medical Imaging 30 (2) (2011) 215–223.


[32] A.A.A. Setio, A. Traverso, B. van Ginneken, C. Jacobs, et al., Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the LUNA16 challenge, Medical Image Analysis 42 (2017) 1–13; arXiv preprint arXiv:1612.08012.
[33] B. van Ginneken, S.G. Armato, B. de Hoop, et al., Comparing and combining algorithms for computer-aided detection of pulmonary nodules in computed tomography scans: the ANODE09 study, Medical Image Analysis 14 (6) (2010) 707–722.
[34] L. van der Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008) 2579–2605.


CHAPTER TEN

Biomedical image segmentation for precision radiation oncology

Hui Cui¹,², Hao Wang³, Ke Yan¹, Xiuying Wang¹, Wangmeng Zuo³, David Dagan Feng¹
¹ Biomedical & Multimedia Information Technology (BMIT) Research Group, School of Computer Science, The University of Sydney, Sydney, NSW, Australia
² Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia
³ School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

10.1 Introduction

The development of medical imaging technologies contributes to precision radiation oncology, cancer diagnosis, and treatment. With the multimodality biomedical images generated during various treatment processes, computer-assisted diagnosis makes it possible to analyze and evaluate the rich information in these images comprehensively and efficiently. Accurate segmentation of a region of interest (ROI) or target object from medical images plays an indispensable role in computer-assisted diagnosis. For instance, tumor boundary definition reveals the characteristics of the tumor, including its shape, size, and location, for accurate cancer diagnosis. Gross tumor volume segmentation and metabolic tumor volume measurement are the basis of effective treatment planning and treatment monitoring. Before machine learning techniques became well developed, boundary identification of ROIs mostly relied on manual delineation. The manual approach is, however, time-consuming and labor-intensive, as well as subjective with limited reproducibility, and there may also be inter- and intraobserver variations. Although there has been intensive research on computational image segmentation algorithms, tumor segmentation in precision oncology remains a long-standing challenge. This is mostly because the appearance of tumors varies among cancer stages, scans, and patients. In addition, a tumor may exhibit indistinct boundaries and heterogeneous intensity distributions. Medical imaging scans may also suffer from limited resolution, and pathologies vary widely in, for instance, location, size, texture, and shape. The development of machine learning techniques has contributed to the efficiency and effectiveness of ROI segmentation on biomedical images. Image segmentation can be considered as the partition of a given image into multiple nonoverlapping regions depending on grayscale value, color distribution, spatial texture, geometric shape, and other features. These features may present similar attributes or consistency within the same


region but differ across regions. Before the emergence of deep neural networks, handcrafted feature-based methods were commonly used for image segmentation, such as thresholding, clustering, and graph partitioning methods. Graph theory has a long history in mathematics and computer science [1]. Since at least 2011, graph models have been used in image processing, for instance, in semisupervised clustering, user-interactive segmentation, and saliency detection. In this section, we introduce representative graph-based models and region-based neural networks in biomedical image segmentation. First, key issues in graph construction are introduced, and we then review graph theoretical models for image segmentation. Second, deep neural networks for object detection and segmentation are briefly reviewed, followed by region-based and location-aware kernel approximation methods. The applications of deep networks to medical image segmentation are also presented. We finally address the contribution of complete tumor segmentation and quantitative computation of metabolic subvolumes to tailored dose painting for precision oncology and personalized treatment planning.

10.2 Graph models in biomedical image segmentation

The way a graph is designed and constructed serves as the basis of data clustering and image segmentation and is essential for fully utilizing prior information and achieving the expected results. Conventionally, a graph G = (V, E) is constructed with a set of vertices or nodes V = {v_i | i ∈ [1, N_V]} and edges E ⊆ V × V, where an edge e_ij ∈ E connects nodes v_i and v_j. The weight of a node is denoted by w(v_i) or w_i, and a weight w(v_i, v_j) or w_ij is assigned to edge e_ij to reflect the similarity between nodes v_i and v_j. Given an image I = {x_i | i ∈ [1, N_I]}, there are many ways in which a graph G can be specialized for image representation and applied in image segmentation.

10.2.1 Graph nodes

Conventionally, a graph node v_i corresponds to and represents an image pixel or voxel x_i. With the introduction of superpixel techniques, regions or patches are also widely used as graph nodes. In these models, the image is preprocessed and partitioned into several irregular regions in an unsupervised manner, for example by mean shift or quick shift. Compared with pixel-level nodes, region-level nodes can represent more informative image features and textures, improving the segmentation of textured or coarse images. Region-level models also have the advantage of propagating local grouping cues over a larger image range while minimizing the influence of frequent local intensity changes or noise. However, the segmentation results largely depend on the initial region partition, as perfect superpixels cannot always be guaranteed.


10.2.2 Graph edges

Graph edges are another essential component in graph construction and are obtained by connecting nodes. Generally, node connections can be categorized as geometrical or topological. Geometrical connections rely on the spatial locations of the nodes, while topological connections depend on the image structure, such as the gradient or the evolution of level sets. Edges may be weighted or unweighted depending on the task to be accomplished. In this section, we introduce widely used node connections and weighting functions in graph-based image segmentation.

10.2.2.1 Node connection

1) Geometrical connection

Neighboring/adjacent connection: In most pixel-level graph models [2], the graph nodes constructed for a two-dimensional image are normally 4-adjacent/neighboring connected, and those of a three-dimensional volume can be 6-adjacent, 18-adjacent, or 26-adjacent connected.

Radiation connection: In some literature [3], a radiation connection is proposed in which each node is connected to its neighboring nodes as well as to the nodes sharing common boundaries with those neighboring nodes. In this way, local cues are propagated more broadly than with neighboring connections.

Full connection: When the number of nodes is small, the nodes can be fully connected to achieve whole-image information propagation [4,5]. However, full connection decreases computational efficiency as the number of nodes increases. Moreover, full connection is not commonly used in pixel-level graph models (Fig. 10.1).

2) Topological connection

Compared with geometrical connections, topological connections or topological graphs provide abstracted and structured data representations. Topology information is widely used in research areas such as volume rendering and scalar field visualization [6–8].

Figure 10.1 Geometrical connections: four and eight neighboring connections, and radiation connection.


Cui et al. [9–11,11a] investigated topological connections extensively and exploited them for graph-based medical image segmentation. As an abstract data representation, the nodes in topological graphs correspond to critical points in the image, namely local maxima, local minima, and saddle points. Two widely used topological connections are as follows.

Extremum graph: An extremum graph, a simplified Morse–Smale complex, is an abstraction of the gradient flow of an image [6]. Local extremes are connected by edges along the steepest ascending or descending gradient flows. In this way, the geometrical locations of the nodes are preserved.

Contour tree: In a contour tree, an edge tracks the evolution of level sets or isocontours. The edges are constructed by sweeping through the changes of isovalues from the minimum to the maximum value based on the rules in Table 10.1, as defined by Ref. [12]. Compared with the extremum graph connection, the contour tree connection represents how the level sets merge and split to form individual components [6]. A contour tree is an abstraction of topological inclusion and neighboring relations [13,14]. As the above introduction shows, the processing and construction of a topological connection is more complex than that of a geometrical connection, and much research has focused on the simplification of topological representations [6,15,16]. For instance, as PET images are noisy and have low resolution, conventional contour tree construction procedures may produce giant tree structures that make data analysis and visualization impractical and normally require further simplification [15]; the direct adoption of a conventional contour tree may also include redundant information [15]. Cui et al. [10] focused on abstracting only the topology relations of the ROI extracted from PET using predefined foreground seeds and SUV thresholding. In that work, local extremes were first detected to generate topological regions, which are composed of sets of isocontours within the ROI. The joint point of two isocontours is then defined by a saddle point. In the constructed topology tree, a leaf node corresponds to a local extreme, an interior node corresponds to a saddle point, and an edge corresponds to a topology region. On the basis of the topology graph, the topology relations of image regions are derived by the rules below and shown in Fig. 10.2.

Table 10.1 Contour tree connection.

- A node is created at a local minimum, where an individual component appears.
- An existing component disappears at a local maximum.
- A node is created when two or more existing components merge into a new component at a saddle point.
- An existing component splits into new components at a saddle point.
- An edge connects the node where a new component appears and the node where this component disappears.
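Detecting the local extremes that become leaf nodes can be sketched with standard image filters; this toy example ignores saddle points and the topological simplification that real contour tree constructions require.

    import numpy as np
    from scipy.ndimage import maximum_filter, minimum_filter

    def local_extrema(image, size=3):
        # Candidate leaf nodes: voxels equal to the max/min of their neighborhood.
        maxima = image == maximum_filter(image, size=size)
        minima = image == minimum_filter(image, size=size)
        return np.argwhere(maxima), np.argwhere(minima)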


Figure 10.2 A topology graph construction procedure for PET foreground ROI by Ref. [10]. (A) cropped transaxial FDG-PET image with region of interest in red bounding box; (B) corresponding topology regions; (C) corresponding topology tree with edges representing the regions in (B) with the same color; (D) instance topology branches and corresponding topology regions with isocontours and interior nodes corresponding to saddle points and leaf nodes corresponding to local extremes; (E) derived topology structural relations and topology graph.

10.2.2.2 Weighting function

When constructing a graph, weighting functions need to be defined so that the affinities between nodes are directly reflected by the edge weights. Generally, the weighting functions can be specialized in the following ways. An intensity-based weighting function captures the intensity changes between connected nodes and is widely calculated by a Gaussian model [2,17] as

$$ w_{ij} = \exp\left( -\alpha \left\| g_i - g_j \right\|^2 \right) \quad (10.1) $$


where g_i denotes the intensity or color of node v_i and α is a scaling parameter. With this definition, the connections between nodes with similar intensities or colors are stronger than those between nodes with dissimilar values, so nodes with similar intensities are more likely to be grouped together. The intensity-based function can also be combined with distance information as

$$ w_{ij} = \exp\left( -\alpha \left\| g_i - g_j \right\|^2 - \beta \left\| \varphi_i - \varphi_j \right\|^2 \right) \quad (10.2) $$

where φ_i is the position of node v_i. Based on this formulation, some methods also estimate the prior likelihood [17,18] of nodes as an alternative to the intensity value. However, if the intensity distributions of the foreground and background objects are similar, or the object boundary is indistinct, this weighting may fail to separate them appropriately. To capture the object boundary, especially when the foreground and background have similar intensity distributions, a boundary-based weighting function [5] is defined by measuring the edge magnitude as

$$ w_{ij} = \exp\left( -\alpha \max_{i' \in \overline{ij}} \left\| k_{i'} \right\|^2 \right) \quad (10.3) $$

where \(\overline{ij}\) denotes the straight line connecting nodes v_i and v_j, and k_{i'} denotes the edge strength at node v_{i'}. However, this weighting function may fail to completely delineate textured objects because of frequent changes in gradient magnitude. Thus, in some literature, a combined intensity and boundary weighting function is defined as

$$ w_{ij} = \sqrt{ w_{ij}^{C} \cdot w_{ij}^{B} } + \gamma \cdot w_{ij}^{B} \quad (10.4) $$

where w_ij^C and w_ij^B denote the intensity-based and boundary-based weighting functions, respectively. With defined edge weights, a weighted graph can be categorized as follows.

Undirected and directed graph: if at least one edge weight w_ij ≠ w_ji, the graph is called a directed graph; otherwise, it is an undirected graph.

Unweighted and weighted graph: if w_ij = 1 for all edges, the graph is considered unweighted; otherwise, it is a weighted graph.
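As a small illustration, the combined intensity and distance weight of Eq. (10.2) can be computed directly; the scaling parameters below are arbitrary example values.

    import numpy as np

    def edge_weight(g_i, g_j, pos_i, pos_j, alpha=10.0, beta=0.1):
        # Eq. (10.2): strong weight for nodes that are similar and close.
        d_int = np.sum((np.asarray(g_i) - np.asarray(g_j)) ** 2)
        d_pos = np.sum((np.asarray(pos_i) - np.asarray(pos_j)) ** 2)
        return np.exp(-alpha * d_int - beta * d_pos)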

10.2.3 Graph matrices

Graphs are commonly represented by the following matrices associated with graph nodes and edges, as shown in Fig. 10.3.

Incidence matrix: the |E| × |V| matrix A represents the direction of each edge instead of the edge weight:

$$ A_{e_{ij}, v_k} = \begin{cases} -1, & \text{if } s(e_{ij}) = v_k \\ +1, & \text{if } t(e_{ij}) = v_k \\ 0, & \text{otherwise} \end{cases} \quad (10.5) $$

where s(e_ij) = v_i is the origin or source of e_ij, and t(e_ij) = v_j is its endpoint or target.

Adjacency matrix or weight matrix: the |V| × |V| matrix W represents the graph weights; note that the adjacency matrix of an undirected graph is symmetric:

$$ W_{ij} = \begin{cases} w_{ij}, & \text{if } e_{ij} \in E \\ 0, & \text{otherwise} \end{cases} \quad (10.6) $$

Laplacian matrix: the |V| × |V| Laplacian of an undirected graph is

$$ L_{ij} = \begin{cases} d_i, & \text{if } i = j \\ -w_{ij}, & \text{if } e_{ij} \in E \\ 0, & \text{otherwise} \end{cases} \quad (10.7) $$

where d_i = Σ_j w_ij.

Constitutive matrix: the |E| × |E| matrix C carries the edge weights on its diagonal:

$$ C_{ij} = \begin{cases} w_{ij}, & \text{if } i = j \\ 0, & \text{otherwise} \end{cases} \quad (10.8) $$

Given an undirected graph and a diagonal matrix G with G_ii = w(v_i), the relations between the above matrices are

$$ A^{T} C A = D - W = L \quad (10.9) $$

where D = diag(d_i).
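The relations among these matrices are easy to verify numerically on a toy graph; the following sketch builds W, D, and L for a three-node chain.

    import numpy as np

    # Toy undirected graph: edges (0,1) and (1,2) with weights 2 and 3.
    W = np.array([[0., 2., 0.],
                  [2., 0., 3.],
                  [0., 3., 0.]])   # adjacency/weight matrix, Eq. (10.6)
    D = np.diag(W.sum(axis=1))     # degree matrix, d_i = sum_j w_ij
    L = D - W                      # Laplacian matrix, Eqs. (10.7) and (10.9)
    print(L)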

10.2.4 Graph-theoretic methods in target object segmentation

It has been shown in the literature that, since 2011, graph models such as the random walker (RW) [2] and graph cut (GC) [19] have proved superior to other conventional methods, such as thresholding and fuzzy c-means [20–22], for tumor boundary definition from medical images. Graph-based segmentation is formulated in a semisupervised manner by constructing graph models and solving a sparse linear system in a spatially discrete space. Fundamental graph theory approaches such as GC [19] and RW [2] have been


widely used [20–22]. For example, RW [2] can capture local intensity changes and has proven capable of solving "weak boundary" problems for various organs in different image modalities [2,23,24]. In this section, we introduce the popular graph models in target object segmentation. According to Ref. [25], the core graph segmentation models GC, RW, and shortest paths can be formulated by a general framework with varying parameters, as

$$ E = E_{\text{unary}}(f) + \lambda E_{\text{binary}}(f) \quad (10.10) $$

where f_i ∈ [0, 1] denotes the probability of node v_i belonging to the foreground set F or the background set B; if a node is known to belong to F (B), then f_i = 1 (f_i = 0). Nodes with predefined labels are called seeds, and λ is a parameter balancing the unary and binary terms. The unary term is formulated over the nodes as

$$ E_{\text{unary}}(f) = \sum_{v_i \in V} w_{B_i}^{q} \, \lvert f_i - 0 \rvert^{p} + w_{F_i}^{q} \, \lvert f_i - 1 \rvert^{p} \quad (10.11) $$

and the binary term is formulated over the nodes and edges as

$$ E_{\text{binary}}(f) = \sum_{e_{ij} \in E} w_{ij}^{q} \, \lvert f_i - f_j \rvert^{p} \quad (10.12) $$

where w_{F_i} denotes the unary object weight and w_{B_i} denotes the unary background weight. With various combinations of the parameters p and q, different core graph models can be generated according to Ref. [25]. In the following sections, we review the graph regulation models in the three core aspects.

10.2.4.1 Random walker-based models

The RW algorithm is derived by setting p = 2 and giving q a small finite value in Eq. (10.11). The RW model [2] is a minimization of

$$ E(f) = \sum_{e_{ij} \in E} w_{ij} \, \lvert f_i - f_j \rvert^{2} = f^{T} L f \quad (10.13) $$

By partitioning the nodes into seeds V_M and unlabeled nodes V_U, where V_M ∪ V_U = V and V_M ∩ V_U = ∅, Eq. (10.13) can be decomposed into

$$ E(f_U) = \begin{bmatrix} f_M^T & f_U^T \end{bmatrix} \begin{bmatrix} L_M & B \\ B^T & L_U \end{bmatrix} \begin{bmatrix} f_M \\ f_U \end{bmatrix} = f_M^T L_M f_M + 2 f_U^T B^T f_M + f_U^T L_U f_U \quad (10.14) $$

and, by differentiating E(f_U) with respect to f_U, the final segmentation is obtained by solving a linear system:

$$ L_U f_U = - B^T f_M \quad (10.15) $$
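Eq. (10.15) is just a small linear solve once the Laplacian is partitioned; the toy example below uses one foreground and one background seed (the node ordering and weights are arbitrary).

    import numpy as np

    # Laplacian of a 4-node graph, ordered with the two seeds (V_M) first.
    L = np.array([[ 5., -2., -3.,  0.],
                  [-2.,  6.,  0., -4.],
                  [-3.,  0.,  4., -1.],
                  [ 0., -4., -1.,  5.]])
    f_M = np.array([1., 0.])                 # foreground seed = 1, background seed = 0
    B = L[:2, 2:]                            # off-diagonal block of the partitioned L
    L_U = L[2:, 2:]
    f_U = np.linalg.solve(L_U, -B.T @ f_M)   # Eq. (10.15)
    print(f_U)                               # foreground probabilities in [0, 1]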


The RW algorithm [2] can be interpreted as finding the probability that a walker starting from an unlabeled node first reaches a seed; the node is then assigned the label with the maximum probability. Because it focuses on the first arrival probability, which depends on the local relationships between nodes, RW has been proven to perform better than GC under challenging conditions, for instance, weak boundaries [2,23,26]. RW, however, is sensitive to the locations and number of input seeds. Many investigators have noted that RW lacks informative global features and depends on local changes in intensities [26], and local pixel-based intensities are insufficient for robust segmentation [18,27]. One solution that incorporates global relations is the random walk with restart (RWR) model [27,28]. RWR can be interpreted as an RW that, at each step, has a restarting probability c of returning to the seeds, and it is formulated as

$$ f = (1 - c)\, P f + c\, b_M \;\Rightarrow\; f = c \left( I - (1 - c) P \right)^{-1} b_M \quad (10.16) $$

where P is the transition matrix, the row-normalized adjacency matrix P = D^{-1} W. Similarly, manifold ranking [29] models the segmentation as an iterative process with a binormalized adjacency matrix S:

$$ f(t+1) = c\, S f(t) + (1 - c)\, b_M \;\Rightarrow\; f = (I - c S)^{-1} b_M \quad (10.17) $$

where S = D^{-1/2} W D^{-1/2}.

Prior knowledge can also be incorporated into the RW model. For instance, by considering the prior intensity distribution, the RW with a prior term [1] can be modeled as

$$ E_{\text{total}} = E_{\text{spatial}} + \gamma E_{\text{aspatial}}, \qquad E_{\text{aspatial}} = \sum_{q=1}^{k} (f^{s})^{T} R^{q} f^{s} + (f^{s} - 1)^{T} R^{s} (f^{s} - 1) \quad (10.18) $$

where R^s = diag(r^s) and r_i^s are the priors at each node. Differentiating Eq. (10.18) and setting the derivative to zero gives

$$ \frac{\partial E_{\text{total}}}{\partial f^{s}} \;\rightarrow\; \left( L + \gamma \sum_{q=1}^{k} R^{q} \right) f^{s} = \gamma\, r^{s} \quad (10.19) $$

If user input is also considered, then

$$ \left( L_U + \gamma \sum_{q=1}^{k} R_U^{q} \right) f_U^{s} = \gamma\, r_U^{s} - B^{T} y^{s} \quad (10.20) $$
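The closed-form RWR solution of Eq. (10.16) can likewise be sketched in a few lines; the restart probability below is an arbitrary example value.

    import numpy as np

    def rwr(W, b_M, c=0.15):
        # Row-normalized transition matrix P = D^{-1} W.
        P = W / W.sum(axis=1, keepdims=True)
        n = W.shape[0]
        # Eq. (10.16): f = c (I - (1 - c) P)^{-1} b_M.
        return c * np.linalg.solve(np.eye(n) - (1.0 - c) * P, b_M)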

To consider local and regional grouping information, Kim et al. [30] incorporated regional information obtained by superpixels as a fully connected region layer, which improved the segmentation of textural objects in natural images compared with the RW of Ref. [2].

10.2.4.2 Graph Cut, Normalized Cut and Average Cut

Given p = 1 and q as a small finite value, a GC model is derived as

$$ \text{Cut}(F, B) = \sum_{e_{ij} \in E,\; v_i \in F,\; v_j \in B} w_{ij} \quad (10.21) $$

By updating the energy functions, other segmentation models can be obtained for varying tasks, such as the minimum cut (MCut):

$$ \text{MCut}(F, B) = \arg\min_{F, B} \text{DCut}(F, B), \qquad \text{DCut}(F, B) = \max_{v_i \in F,\; v_j \in B} w_{ij} \quad (10.22) $$

The normalized cut (NCut) [31] is obtained as

$$ \text{NCut}(F, B) = \frac{\text{Cut}(F, B)}{\text{Assoc}(F, V)} + \frac{\text{Cut}(F, B)}{\text{Assoc}(B, V)} \quad (10.23) $$

where Assoc(F, V) = Σ_{v_i ∈ F, v_j ∈ V} w_ij. The average cut (ACut) is

$$ \text{ACut}(F, B) = \frac{\text{Cut}(F, B)}{\lvert F \rvert} + \frac{\text{Cut}(F, B)}{\lvert B \rvert} \quad (10.24) $$
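Evaluating NCut for a candidate partition is straightforward given the weight matrix; the sketch below scores a binary foreground mask with Eq. (10.23).

    import numpy as np

    def ncut_value(W, mask):
        # mask[i] is True when node i belongs to the foreground set F.
        cut = W[mask][:, ~mask].sum()    # Cut(F, B)
        assoc_F = W[mask].sum()          # Assoc(F, V)
        assoc_B = W[~mask].sum()         # Assoc(B, V)
        return cut / assoc_F + cut / assoc_B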

10.2.5 Applications in medical image segmentation

Graph models have been used for the automated boundary delineation of ROIs from multimodality biomedical images. For instance, an RW-based model and a combined GC and Markov random field model [20,32] have been applied to regulate and penalize the energy functions of PET and CT intensity distributions for co-segmentation. In this section, we briefly review graph-based models in medical image segmentation according to how the graph is constructed. Specialized graph models vary in the definition of nodes. In some graph models, image regions are used as graph nodes [3–5,33], where the images are preprocessed and partitioned into regions in an unsupervised manner (mean shift, quick shift, etc.). Region-level nodes represent more informative image features and textures compared


with pixel-level nodes. These specialized graph models thus have improved segmentation results on textured or coarse images [5]. Models with region-level information also have the advantage of propagating local grouping cues over broader image ranges while minimizing the influence of frequent local intensity changes or noise, but the segmentation results largely depend on the initial region partition, as perfect superpixel presegmentation cannot always be guaranteed. Specialized graph models also vary in their node connections and edge definitions, which can be categorized as geometrical or topological. In geometrical connections, the edges are constructed based on the spatial locations of nodes. For instance, a radiation connection is proposed in the graph model of Ref. [3], where each node is connected to its neighboring nodes as well as to the nodes sharing common boundaries with these neighboring nodes. Other edge definitions include full node connections: when the number of nodes is limited, the nodes can be fully connected to achieve information propagation across the whole image [5]. Full connection, however, decreases computational efficiency as the number of nodes increases, and we have previously reported that full regional connections produce misleading grouping information and result in leakage or undersegmentation of inhomogeneous objects [9]. Topological connections and graphs provide abstracted and structured data representations that can be analyzed practically. Topological graph models include the extremum graph, an abstraction of the gradient flow of an image [6], and the contour tree, which represents how the level sets merge and split to form individual components [13,14]. The construction of topology graphs can be complicated in medical image processing. For instance, since PET images are noisy, conventional contour tree construction methods may produce giant tree structures, making data analysis and visualization impractical [15]. Hence, research has focused on the simplification of topological representations [6,15,16]. Recently, Cui et al. [9] have investigated ROI topology extraction from PET images and exploited topological connections for graph-based segmentation of PET-CT images [9–11] and entropy maps from CT images [11a].

10.3 Deep network in object detection and segmentation

Object detection and segmentation are two essential steps in many computer vision systems, such as surveillance and pedestrian detection, and they also have substantial applications in medical image analysis. As an important technique in computer vision, deep neural networks have recently achieved unprecedented success in object detection and segmentation. In this section, we present an introduction to several representative models with a brief survey of their applications in medical image analysis.


10.3.1 Deep object detection

Object detection is a fundamental process in computer vision. Before the emergence of deep learning, object detection was mainly driven by SIFT [34] and HOG [35] features, as well as by the development of the deformable part model [36] and its variants [37]. With the development of deep neural networks, object detection has become more precise. For example, Krizhevsky et al. [38] revived the convolutional neural network (CNN) by showing substantially improved image classification accuracy on the ImageNet Large Scale Visual Recognition Challenge [39], demonstrating that CNN-based methods can substantially outperform traditional classification methods. However, compared with whole-image classification, object detection requires the localization of multiple objects within an image, so a CNN cannot be directly applied to object detection. This issue can be addressed by operating within the recognition-using-regions paradigm [40]. A representative model is R-CNN, proposed by Girshick et al. [41], which combines region proposals with CNNs and is considered one of the first successful models to associate CNNs with object detection tasks. Its promising results demonstrated that CNNs can deliver dramatically more accurate object detection on the PASCAL VOC benchmark [42] than other methods using handcrafted features. Other representative CNN architectures have also been proposed for object detection. In the following sections, we mainly introduce Faster R-CNN [43] and the multiscale location-aware kernel representation (MLKP) [44] as two representative models.

10.3.1.1 Region-based convolutional neural network based models

A number of methods, including Fast R-CNN [45] and its later version, Faster R-CNN [43], have been developed based on R-CNN [41]. R-CNN incorporates region generation methods (e.g., selective search [46]) and features extracted from a CNN model for object detection, and it demonstrated the power of deep learning in object detection. Although R-CNN achieved impressive improvements over other conventional methods, it has three weaknesses: (1) each region proposal must be input to the network individually, so there is no shared computation; (2) because the SVM classifier and bounding-box regressor are learned in a multistage pipeline, R-CNN cannot be trained in an end-to-end manner; and (3) the training process is heavy in computation time and memory because the features extracted from the CNN must be precomputed and saved, and then used to train the classifier and regressor. To address these weaknesses, Fast R-CNN was proposed with a single-stage training process and an ROI-pooling layer. In the Fast R-CNN model, each region proposal generated by selective search is projected onto the last feature map (e.g., conv5_3 in VGG16 [47]) and pooled into a fixed-size feature map by the ROI-pooling layer, so each proposal no longer needs to be warped and passed through the network individually. The SVM is


also replaced by a SoftMax layer to classify the proposals. There are two main training strategies for Faster R-CNN: end-to-end training and 4-step alternating training. In the end-to-end strategy, all network parameters are updated in a single iteration, and the losses of the region proposal network (RPN) and the base network are combined to compute the parameter gradients. This solution is easy to implement and trains quickly. The other solution is a pragmatic 4-step alternating training algorithm, adopted to learn shared features via alternating optimization of the RPN and the base convolutional network; both parts share the same convolution layers to form a unified network. It trains more slowly than end-to-end training but yields more accurate performance. Faster R-CNN solves the key challenge of efficiently generating region proposal boxes; it achieved a significant improvement in the object detection research area and was also the first method toward real-time deep object detection. Generating region proposals is one of the computational bottlenecks in detection algorithms. To address this issue, R-CNN [41] and Fast R-CNN [45] can be replaced by real-time algorithms: in 2015, Ren et al. proposed Faster R-CNN [43], considering the trade-off between effectiveness and efficiency. Faster R-CNN utilizes a shared convolutional network, the RPN, to predict the proposal boxes directly rather than relying on prior generation methods (e.g., selective search). The embedded RPN improves both object detection accuracy and efficiency.

10.3.1.2 Multiscale location-aware kernel representation

Although Faster R-CNN and its variants [48–51] have shown promising performance in object detection, they focus on simple first-order CNN representations, which limit further improvements in detection accuracy. To address this issue, Wang et al. [44] proposed MLKP to exploit high-order statistics in object detection. MLKP focuses on generating discriminative representations of the data and provides an alternative feature fusion method; a location-weight-based framework is inserted into the network so that the framework is sensitive to location. The MLKP architecture has three parts: multiscale feature map generation, high-order representation, and a location-weight network. MLKP also adopts an alternative feature fusion method in which the last two feature maps in each convolution block (e.g., conv4_3/2 and conv5_3/2 in VGG16 [47]) are used to capture a high-resolution feature map. A deconvolution layer based on an upsampling mechanism is used to normalize the sizes of the feature maps of different convolutional blocks, and the modified multiscale feature map is recovered by a convolution layer to explore the high-order representation. High-order statistical information in deep CNNs has proven to improve performance on challenging fine-grained visual categorization tasks [52,53]. However, such methods are based on spatially irrelevant global representations with high-order statistics and are therefore inappropriate for the bounding box regression task. MLKP therefore modifies the homogeneous polynomial kernel by extending its high-order formulation.
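For readers who want to experiment with the detection pipeline discussed above, pretrained Faster R-CNN models are available off the shelf; the snippet below uses torchvision's implementation, which is unrelated to the specific networks evaluated in this chapter, purely for illustration.

    import torch
    import torchvision

    # Pretrained Faster R-CNN with a ResNet-50 FPN backbone.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    model.eval()

    image = torch.rand(3, 512, 512)           # dummy RGB image with values in [0, 1]
    with torch.no_grad():
        output = model([image])[0]            # dict with 'boxes', 'labels', 'scores'
    print(output["boxes"].shape, output["scores"][:5])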


... diagnostic report, voice report); this can significantly help with image description. For instance, with help from natural language processing or voice segmentation, the diagnostic report can be used to clarify missing factors in situations like "describing a lung X-ray of a person remains incomplete and insignificant if we ignore that he smokes." The content space, on the other hand, provides a global image description and can be used for various query types. In general, a medical image is considered to be composed of a set of salient image objects in three different forms: (1) the anatomic organ (AO) represents the medical organs found in the image, such as the brain, lungs, or hand; it gathers a set of medical regions (MRs) and is also called the organ of interest; (2) the MR describes the internal structure of the AO, such as the left ventricle or the right lobe; it allows one to locate any anomaly


and is also referred to as the ROI; and (3) the medical sign (MS) concerns either medical anomalies (such as a tumor, fracture, or lesion) identified and detected by physicians, or unidentified (or variable) objects found in the image; it is sometimes referred to as the PBR. Each salient object (AO, MR, or MS) is projected onto the following subspaces. (1) The physical subspace contains low-level physical properties of the image content, such as various global or local color and texture features, which can be extracted manually, semiautomatically, or automatically depending on the contextual space (image type, format, quality, etc.) and may later be used to analyze other subspaces. Moreover, the physical analysis can be achieved based on the pseudo-independent and dependent contexts; for example, the patient's age is a determinant factor when considering the shape of a medical organ, and the type of the image determines the appropriate color extraction approaches. (2) The spatial subspace holds middle-level geometric features of salient objects, such as shape and spatial relationship features. (3) The semantic subspace concerns high-level semantic properties of salient objects. The objective of the semantic subspace is to integrate the high-level features of objects and relations judged most important by medical users for image description. However, such semantic feature analysis may require human intervention, since explicit semantic objects must be recognized. The semantic subspace is usually described manually by the user because the medical domain is very complex and each term may have several meanings depending on the context. Medical signs can be codified by some existing, albeit controversial, labeling codes for disease classification, such as ICD-10 (International Classification of Diseases, 10th Revision) [213] or the Unified Medical Language System [214]. The above hyperspaced image data model has been integrated into the MIMS prototype [179,209,215–217].

Lehmann et al. [208] presented a general structure for content-based image retrieval in medical applications (IRMA) based on a generic multistep approach including categorization of the entire image; registration with respect to prototypes; extraction and query-dependent selection of local features; hierarchical blob/object representation; and image retrieval. To cope with complex medical knowledge, IRMA splits the whole retrieval process into seven consecutive steps, where each step represents a higher level of image abstraction, reflecting an increasing level of image content understanding. (1) Image categorization (based on global features) determines the imaging modality and body orientation, as well as the examined body region and biological system, for each image entry with a detailed hierarchical coding scheme [218] to supplement existing standards such as DICOM. (2) Image registration (in geometry and contrast): diagnostic inferences derived from medical images are normally deduced from an incomplete but continuously evolving model of normality [4]; therefore, registration is based on prototype images defined for each category by experts with medical prior knowledge or by statistical analysis, and the prototypes can be used to determine parameters for rotation, translation, scaling, and contrast adjustment. (3) Feature extraction (using local features) derives various local image descriptions with either a category-free or category-specific approach. Like


the global features for categorization, the number of local feature images is extensible. (4) Feature selection (category and query dependent): separating feature selection from feature extraction enables the former to be retrieval-dependent; it can integrate both the image category and the query context into the abstraction process with a precomputed set of adequate features. For instance, the retrieval of radiographs with respect to bone fractures or tumors can be conducted using a shape-based or texture-based feature set, respectively. (5) Indexing (multiscale blob representation) provides an abstraction of the previously generated and selected image features, resulting in a compact image description obtained by clustering similar image parts into regions described by invariant moments as "blobs." Thereafter, the blob representation of the image is adjusted with respect to the parameters determined in the registration step, yielding a multiscale "blob tree." (6) Identification (incorporating prior knowledge) links medical a priori knowledge to certain blobs generated during the indexing step; it is therefore the fundamental basis for introducing high-level image understanding by analyzing regional or temporal relationships between blobs. Finally, (7) retrieval (on an abstract blob level) is performed by searching the hierarchical blob structures. This retrieval step requires online computation, while all other steps can be performed automatically in batch mode at image entry time (offline computation). The above multistep approach has been applied to the IRMA database of radiographs (consisting of medical images of six major body regions taken from daily routine), narrowing the gap between the semantic imprint of images and alphanumerical descriptions, which are always incomplete [208,209]. Other CBMIR systems that provide varied medical image retrieval include I2C (Image Indexing by Content) [219], COBRA (Content-Based Retrieval Architecture) [220], ImageEngine [221], and MedGIFT [111,222,223]. In particular, MedGIFT, with the integration of the GNU image-finding tool [224], the multimedia retrieval markup language [225], and Casimage, provides an open source framework of reusable components for a variety of CBMIR systems to foster resource sharing and avoid costly redevelopment.

11.5.3 Retrieval based on physiological functional features

The CBMIR techniques introduced in this chapter so far are mainly designed for anatomical images that capture human anatomy at different levels and primarily provide structural information. Unlike anatomical images, functional/molecular images such as PET and SPECT allow the in vivo study of physiological and biochemical processes, providing functional information not previously available; this is what most distinguishes medical images from other types of general images [84,226,227]. Physiological function can be estimated at the molecular level by observing the behavior of a small quantity of an administered substance "tagged" with radioactive atoms. Images


are formed by the external detection of gamma rays emitted from the patient when the radioactive atoms decay. Glucose metabolism, oxygen utilization, and blood flow in the brain and heart can be measured with compounds labeled with carbon (11C), fluorine (18F), nitrogen (13N), and oxygen (15O), which are the major elemental constituents of the body. Existing CBMIR approaches may not be optimal when applied to functional images because of their unique characteristics: the inherent knowledge of the disease state as it affects the physiological and biochemical processes before any morphological change of the body. Such quantitative physiological information inside the functional image content is unlikely to be retrieved by common image retrieval techniques using color, texture, and shape features. Color is not captured in the imaging process, and functional images are usually acquired and displayed in grayscale or pseudocolor; the color feature is therefore unlikely to be applicable to functional images. Texture is likely to be confounded by statistical noise in functional images. Shape is also unlikely to be relevant to function; indeed, function is likely to result in changes in apparent shape during acquisition as the tracer redistributes. It appears that the development of CBMIR for functional images should take the specific physiological functional features into account [169,228]. An early study on content-based retrieval of dynamic PET functional images was reported in Ref. [228]. Based on this work, Ref. [169] developed a new VOI-based retrieval system for multidimensional dynamic functional [18F]2-fluoro-deoxy-glucose (FDG) brain PET images, which are widely used to determine the local cerebral metabolic rate of glucose (CMRGlc) and to depict the glucose consumption and energy requirements of various structural and functional components in the human brain. In dynamic functional imaging studies, the prior knowledge takes the form of a tracer kinetic model fitted to a time series of PET tracer uptake measurements. Such functional information can be defined in terms of a mathematical model m(t|p) (where t = 1, 2, ..., T are the discrete sampling times of the uptake measurements, the number of conventional scan time intervals T is 22, and p is a set of model parameters) whose parameters describe the delivery, transport, and biochemical transformation of the tracer. The input function for the model is the plasma time activity curve (PTAC) obtained from serial blood samples. Reconstructed PET images provide the tissue time activity curve (TTAC), or the output function, denoted by f(t) for every voxel in the image. Application of the model on a voxel-by-voxel basis to measured PTAC and TTAC data using rapid parameter estimation algorithms [229,230] yields physiological parametric images. In Ref. [169], a four-dimensional fuzzy c-means cluster analysis [231,232] was used to construct VOI functional groups consisting of voxels that have similar kinetic behaviors. The physiological TTACs were first extracted for each of the N nonzero voxels in the image to form kinetic feature vectors comprising the voxel values at the dynamic time sequence of tracer uptake measurements. After applying the optimal image sampling schedule technique [233,234] for the dynamic FDG brain PET image study based on the five-parameter
After applying the optimal image sampling schedule technique [233,234] to the dynamic FDG brain PET image study based on the five-parameter FDG model, the dimension of the TTAC vectors was reduced from 22 to 5, while increasing the signal-to-noise ratio of the individual image frames for better clustering output. The fuzzy c-means cluster analysis was then applied to assign each of the N feature vectors to a set number C of distinct cluster groups by minimizing an objective function. Upon convergence, a cluster map is created by assigning to each voxel a value equal to the number of the cluster for which it has the highest degree of fuzzy membership. From the clustered results, a region-growing algorithm [147] was applied to the voxels in each cluster to construct the VOIs, grouping voxels that were spatially connected and separating distinct structures that may have been assigned to the same cluster because of the similarity of their kinetic behavior. The TTAC feature vectors extracted from the VOIs are indexed as physiological functional features and used as a key query method in the VOI-FIRS [169]. In VOI-FIRS, the query component “Query by functional and physiologic features” allows users to manually draw a TTAC feature curve on a labeled grid, or to select from a list of predefined sample TTACs. Once a selection has been made, the TTAC curve can be adjusted manually at individual sampling points, and because the TTAC is concentrated in the early temporal frames, the drawn curve can be zoomed for closer inspection. The results demonstrated that, by combining the functional features with the spatial properties of the dynamic PET images captured in the 3-D volumetric location feature, VOIs with user-defined kinetic TTAC characteristics could be successfully identified, which may not have been possible with the functional features alone [157,212]. In Ref. [235], a scheme for efficient 3-D content-based neurological image retrieval was proposed, based on 3-D pathology-centric masks for extracting CMRGlc features with volumetric co-occurrence matrices from neurological FDG PET images in clinical dementia studies.
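The clustering step above can be made concrete with a small sketch. The following Python code is an illustration of the general fuzzy c-means algorithm applied to per-voxel TTAC feature vectors, not the implementation used in Ref. [169]; the function name, array shapes, and toy data are assumptions.

```python
import numpy as np

def fuzzy_c_means(X, C, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Cluster the N rows of X (per-voxel TTAC feature vectors) into C fuzzy groups.

    m is the fuzziness exponent (m > 1; m = 2 is a common default).
    Returns (U, centers): U is the (N, C) fuzzy membership matrix and
    centers is the (C, T) array of cluster-mean TTACs.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], C))
    U /= U.sum(axis=1, keepdims=True)            # memberships sum to 1 per voxel
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Squared Euclidean distance of every voxel to every cluster center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + 1e-12
        U_new = 1.0 / (d2 ** (1.0 / (m - 1)))    # standard FCM membership update
        U_new /= U_new.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return U, centers

# Toy usage: 500 voxels, TTACs reduced to 5 time frames, 3 kinetic groups.
X = np.random.rand(500, 5)
U, centers = fuzzy_c_means(X, C=3)
cluster_map = U.argmax(axis=1)   # hard label = highest fuzzy membership per voxel
```

On convergence, `cluster_map` plays the role of the cluster map described above; region growing would then be applied within each cluster to separate spatially disconnected structures.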

11.5.4 Understanding visual features and their relationship to retrieved data

To overcome the semantic gap, as defined in Section 11.2.1, several medical CBIR systems [236–239] retrieve a set of the most similar images (based on feature similarity) and let the user be the final arbiter of which images are relevant to the query. Results are often presented to the user as a ranked list, but there is no guarantee that such lists contain all the relevant images in the dataset. Indeed, the interpretation of modern medical images is time-consuming given the complexity of the data, so it is not feasible simply to expand the number of images included in a ranked list, and users may not have the opportunity to consider low-ranked images for their specific application. Because CBMIR methods often combine multiple types of features, ranked lists do not provide the user with an explanation of which features were responsible for the rankings and thus do not explicitly communicate how the result set is distributed in the feature space.
For example, the retrieval may provide similar rankings for two images despite different characteristics, e.g., features representing tumor size versus the number of tumors. It is therefore important for users to understand which features were relevant to each individual retrieved image, because the retrieved images might represent diverse subsets of the database. In such circumstances, it can be important to communicate to users which visual features are important for each image.
Most approaches to this problem employ some form of visual analytics, using interactive visualizations to facilitate analytical reasoning based upon information extracted from the database [240]. This augments complex retrieval tasks by combining human and automated analyses. The VOI-FIRS system [169] used a GUI to support user navigation within a 3-D viewing space, allowing the user to compare the input volume and the retrieved image data to determine whether the images were visually relevant. In the general domain, Hiroike et al. [241] transformed the high-dimensional image feature space into a 3-D coordinate space to display thumbnails of images clustered by the similarity of their features. Similarly, Gao et al. [242] used interactive refinement in multimedia collections. Other approaches break the challenge down further and examine the contribution of individual features numerically, such as the analysis of the feature similarity of retrieved data by Rodrigues et al. [243]. Itoh et al. [244] created a visualization of high-dimensional feature spaces to identify relationships and interdependencies between features, which could be used to eliminate redundant features and emphasize relevant ones; this was applied to medical image data to allow humans to optimize feature spaces for applications such as CBMIR.
Recent research in CBMIR has described tools for explaining the relevance of the retrieved images by combining the visualization of the image data with a breakdown of the image features used to rank the images [245,246]. Such tools allow users to interact with and further refine the retrieved results and are a complementary approach that can be combined with relevance feedback. Kumar et al. [245] reported a GUI for PET-CT data designed to explain the ranking of retrieved results. The GUI included multiple views of volumetric and multimodality images, as well as abstractions that represented the spatial features of the imaging data. By interacting with the abstractions, the user could obtain the image features associated with different regions (e.g., the size of tumors). At the same time, interaction with the abstraction triggered a visualization that highlighted similarities between the regions in the query and the retrieved results. An example is shown in Fig. 11.3. The visual analytics for medical image retrieval (VAMIR) tool [246] similarly guides users in an exploration of a multidimensional feature space using the query image as a point of reference (see Fig. 11.4). Its goal is to allow users to understand which features are important for a particular query and to discover “new” relevant images that may not appear within or near the top of a ranked list.
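To make this concrete, the short Python sketch below ranks hypothetical images by a weighted combination of per-feature similarities while retaining the per-feature breakdown for display, in the spirit of the explanation tools discussed above; the feature names, weights, and similarity measure are illustrative assumptions, not the method of Refs. [245,246].

```python
import numpy as np

# Hypothetical feature vectors for a query and two database images.
db = {
    "img001": {"tumor_size": np.array([0.8]), "texture": np.array([0.2, 0.5])},
    "img002": {"tumor_size": np.array([0.3]), "texture": np.array([0.9, 0.4])},
}
query = {"tumor_size": np.array([0.7]), "texture": np.array([0.3, 0.5])}
weights = {"tumor_size": 0.6, "texture": 0.4}    # assumed relative importance

def per_feature_similarity(q, x):
    """Similarity in (0, 1] per feature: 1 / (1 + Euclidean distance)."""
    return {f: 1.0 / (1.0 + np.linalg.norm(q[f] - x[f])) for f in q}

results = []
for image_id, feats in db.items():
    sims = per_feature_similarity(query, feats)
    combined = sum(weights[f] * s for f, s in sims.items())
    results.append((image_id, combined, sims))

# Rank by the combined score, but keep the per-feature breakdown so a user
# can see *why* each image was ranked where it was.
for image_id, combined, sims in sorted(results, key=lambda r: -r[1]):
    print(f"{image_id}: combined={combined:.3f}  "
          + "  ".join(f"{f}={s:.3f}" for f, s in sims.items()))
```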


Figure 11.3 A tool for explaining the relevance of each retrieved image to the query according to the image data and the image features of individual ROIs [245]. The query image is on the top left of the GUI; the search parameters/weights for different features are presented on the bottom left. The other four sections in the GUI show the retrieved images. The graphs represent the abstraction of the spatial relationships of the regions in the images. Users are able to examine the individual features of a region (bottom left) by interacting with the abstraction.

Figure 11.4 Visualization of image feature spaces for image retrieval with the VAMIR tool [246]. Each node represents an individual image in the database. The nodes are plotted according to the difference in a specific feature relative to the query. Colors identify different clusters according to their distance from the query.

In VAMIR, retrieved images are plotted as nodes on a canvas representing specific feature spaces (e.g., tumor size or lung texture). The distance of a node from the node representing the query image within the visualization canvas thus represents its similarity to the query for that specific feature. This allows the user to quickly identify clusters of similar images as well as outliers (images that are quite dissimilar to the query). VAMIR plots multiple feature spaces at the same time, and a universal selection allows clusters of images to be examined across multiple spaces. This means that a user can verify whether a similar image is relevant across all feature spaces or in only one. It also allows them to identify “missed” images: those with high similarity in the feature the user considers most important but low similarity in other features; such retrieved images would be ranked low in a traditional ranked list. Finally, VAMIR allows users to interactively hover over a node to obtain a pop-up visualization of the associated image.
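The following Python/matplotlib sketch mimics this style of display on synthetic data: each retrieved image becomes a node whose distance from the query (placed at the origin) reflects its dissimilarity in one feature space. The feature names and random data are assumptions, and this is an illustrative mock-up rather than the actual VAMIR implementation [246].

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Hypothetical scalar feature values for 30 retrieved images.
features = {f"img{i:03d}": {"tumor_size": rng.random(), "lung_texture": rng.random()}
            for i in range(30)}
query = {"tumor_size": 0.5, "lung_texture": 0.5}

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, feat in zip(axes, ["tumor_size", "lung_texture"]):
    # Distance to the query in this single feature space.
    d = np.array([abs(v[feat] - query[feat]) for v in features.values()])
    # Place each image on a circle whose radius is its distance to the query,
    # so radial position encodes (dis)similarity and angle is arbitrary.
    angle = rng.uniform(0, 2 * np.pi, size=d.size)
    ax.scatter(d * np.cos(angle), d * np.sin(angle), c=d, cmap="viridis")
    ax.scatter([0], [0], marker="*", s=200, color="red")   # the query image
    ax.set_title(f"Feature space: {feat}")
plt.tight_layout()
plt.show()
```

Plotting the same images in several feature-space panels side by side makes it easy to spot images that are close to the query in one panel but far from it in another, which is exactly the kind of “missed” result described above.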

11.6 Summary

This chapter introduced CBIR and its key components, including image feature extraction, similarity comparison, indexing schemes, and interactive query interfaces. The need for CBMIR and its related challenges were discussed, followed by a detailed review of the current major CBMIR techniques in four categories: retrieval based on physical visual features (color and texture); retrieval based on geometric spatial features (shape, 3-D volumetric features, and spatial relationships); retrieval by a combination of semantic and visual features (semantic pathology interpretation and generic models); and retrieval based on physiological functional features. The success of CBMIR could open up many new vistas in medical services and research, such as disease tracking, differential diagnosis, noninvasive surgical planning, clinical training, and outcomes research.

11.7 Exercises

1. Describe the mechanism of a content-based image retrieval technique.
2. What is the primary differentiating factor between CBIR and CBMIR?
3. Texture as a visual feature has been successfully applied in numerous CBMIR systems, e.g., for MRI head images and HRCT lung images. What is an image texture, and what are the attributes that enable content-based retrieval?
4. Why can 3-D volumetric features be used in CBMIR? What are the advantages and disadvantages of 3-D volumetric features versus 2-D shape features?
5. Give an example of a CBMIR application in clinical decision support.
6. What are the advantages and disadvantages of combining semantic and visual features in CBMIR? How does combining these two components exceed the results expected from using just one?
7. Discuss different approaches to interacting with the retrieved results in CBMIR.


Acknowledgments

This work was partially supported by ARC and PolyU/UGC grants.

References

[1] I. Bankman (Ed.), Handbook of Medical Imaging: Processing and Analysis, Academic Press, San Diego, 2000.
[2] H.K. Huang, PACS and Imaging Informatics: Basic Principles and Applications, second ed., Wiley-Liss, 2004.
[3] J. Duncan, N. Ayache, Medical image analysis: progress over two decades and the challenges ahead, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (1) (2000) 85–106.
[4] H.D. Tagare, C. Jaffe, J. Duncan, Medical image databases: a content-based retrieval approach, Journal of the American Medical Informatics Association 4 (3) (1997) 184–198.
[5] L.H.Y. Tang, R. Hanka, H.H.S. Ip, A review of intelligent content-based indexing and browsing of medical images, Health Informatics Journal 5 (1999) 40–49.
[6] H. Müller, N. Michoux, D. Bandon, A. Geissbuhler, A review of content-based image retrieval systems in medical applications – clinical benefits and future directions, International Journal of Medical Informatics 73 (2004) 1–23.
[7] A. Holt, I. Bichindaritz, R. Schmidt, P. Perner, Medical applications in case-based reasoning, The Knowledge Engineering Review 20 (3) (2005) 289–292.
[8] J. Boissel, M. Cucherat, E. Amsallem, P. Nony, M. Fardeheb, W. Manzi, M. Haugh, Getting evidence to prescribers and patients or how to make EBM a reality, in: Proc. Med. Info. Europe Conf., France, 2003.
[9] Y. Rui, T.S. Huang, S.-F. Chang, Image retrieval: past, present, and future, in: Proc. Int. Symposium on Multimedia Information Processing, Taipei, Taiwan, 1997.
[10] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain, Content-based image retrieval at the end of the early years, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (12) (2000) 1349–1380.
[11] A.D. Bimbo, Visual Information Retrieval, Morgan Kaufmann Publishers, San Mateo, CA, 1999.
[12] V. Castelli, L.D. Bergman (Eds.), Image Databases: Search and Retrieval of Digital Imagery, John Wiley & Sons, New York, 2002.
[13] D. Feng, W.C. Siu, H.J. Zhang (Eds.), Multimedia Information Retrieval and Management: Technological Fundamentals and Applications, Springer, Berlin, 2003.
[14] F. Long, H. Zhang, D. Feng, Fundamentals of content-based image retrieval, in: D. Feng, W.C. Siu, H. Zhang (Eds.), Multimedia Information Retrieval and Management: Technological Fundamentals and Applications, Springer, Berlin, 2003, pp. 1–26.
[15] Y. Rui, T.S. Huang, S.F. Chang, Image retrieval: current techniques, promising directions and open issues, Journal of Visual Communication and Image Representation 10 (1999) 39–62.
[16] R.M. Rangayyan, Biomedical Image Analysis, CRC Press, 2005.
[17] J. Vendrig, M. Worring, A. Smeulders, Filter image browsing: exploiting interaction in retrieval, in: Proc. Visual'99 Information and Information Systems, 1999.
[18] J. Robinson, The k-d-B-tree: a search structure for large multidimensional dynamic indexes, in: Proc. SIGMOD Conf., Ann Arbor, April 1981.
[19] D. Lomet, B. Salzberg, A robust multi-attribute search structure, in: Proc. 5th Int. Conf. Data Eng., 1989, pp. 296–304.
[20] T. Brinkhoff, H. Kriegel, B. Seeger, Efficient processing of spatial joins using R-trees, in: Proceedings of the ACM SIGMOD, 1993, pp. 237–246.
[21] A. Guttman, R-tree: a dynamic index structure for spatial searching, in: Proceedings of the ACM SIGMOD, 1984, pp. 47–57.
[22] T. Sellis, N. Roussopoulos, C. Faloutsos, The R+-tree: a dynamic index for multidimensional objects, in: Proc. 12th VLDB, 1987, pp. 507–518.
[23] N. Beckmann, H. Kriegel, R. Schneider, B. Seeger, The R*-tree: an efficient and robust access method for points and rectangles, in: Proceedings of the ACM SIGMOD, 1990, pp. 322–331.
[24] S. Berchtold, D.A. Keim, H.-P. Kriegel, The X-tree: an index structure for high-dimensional data, in: Proc. 22nd Int. Conf. on Very Large Data Bases, Bombay, India, 1996, pp. 28–39.
[25] K.-I. Lin, H.V. Jagadish, C. Faloutsos, The TV-tree: an index structure for high-dimensional data, VLDB Journal 3 (4) (1994) 517–549.
[26] D.A. White, R. Jain, Algorithms and Strategies for Similarity Retrieval, Technical Report VCL-9601, Visual Computing Laboratory, University of California, San Diego, 1996.
[27] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, P. Yanker, Query by image and video content: the QBIC system, IEEE Computer 28 (9) (September 1995) 23–32.
[28] D. White, R. Jain, Similarity indexing: algorithms and performance, in: Proc. SPIE Storage and Retrieval for Image and Video Databases, 1996.
[29] R. Ng, A. Sedighian, Evaluating multi-dimensional indexing structures for images transformed by principal component analysis, in: Proc. SPIE Storage and Retrieval for Image and Video Databases, 1996.
[30] Y.A. Aslandogan, C.T. Yu, Techniques and systems for image and video retrieval, IEEE Transactions on KDE 11 (1) (January/February 1999) 56–63.
[31] Y. Rui, T.S. Huang, M. Ortega, S. Mehrotra, Relevance feedback: a power tool for interactive content-based image retrieval, IEEE Transactions on Circuits and Systems for Video Technology 8 (5) (1998) 644–655.
[32] J. Huang, S. Kumar, M. Metra, Combining supervised learning with color correlograms for content-based image retrieval, in: Proceedings of ACM Multimedia '97, November 1997, pp. 325–334.
[33] R. Torres, A. Falcao, Content-based image retrieval: theory and applications, Revista de Informatica Teorica e Aplicada 13 (2) (2006) 161–185.
[34] X.S. Zhou, T.S. Huang, Relevance feedback in image retrieval: a comprehensive review, Multimedia Systems 8 (2003) 536–544.
[35] C. Lopez-Pujalte, V. Bote, F. Anegon, Order-based fitness functions for genetic algorithms applied to relevance feedback, Journal of the Association for Information Science and Technology 54 (2) (2003) 152–160.
[36] I. Cox, M. Miller, T. Minka, T. Papathomas, P. Yianilos, The Bayesian image retrieval system, PicHunter: theory, implementation, and psychophysical experiments, IEEE Transactions on Image Processing 9 (1) (2000) 20–37.
[37] S. Tong, E. Chang, Support vector machine active learning for image retrieval, in: Proc. 9th ACM Int. Conf. Multimedia, NY, USA, 2001, pp. 107–118.
[38] J. Huang, S.R. Kumar, M. Metra, W.J. Zhu, R. Zabith, Spatial color indexing and applications, International Journal of Computer Vision 35 (3) (1999) 245–268.
[39] G. Pass, R. Zabith, Comparing images using joint histograms, Multimedia Systems 7 (1999) 234–240.
[40] W. Niblack, et al., Querying images by content using color, texture, and shape, SPIE Conference on Storage and Retrieval for Image and Video Databases 1908 (April 1993) 173–187.
[41] G. Pass, R. Zabith, Histogram refinement for content-based image retrieval, in: IEEE Workshop on Applications of Computer Vision, 1996, pp. 96–102.
[42] J. Huang, et al., Image indexing using color correlogram, in: IEEE Int. Conf. Computer Vision and Pattern Recognition, Puerto Rico, June 1997, pp. 762–768.
[43] P. Howarth, A. Yavlinsky, D. Heesch, S. Rüger, Medical image retrieval using texture locality and colour, CLEF 2004 – LNCS 3491 (2005) 740–749.
[44] B.S. Manjunath, J.R. Ohm, Color and texture descriptors, IEEE Transactions on Circuits and Systems for Video Technology 11 (2001) 703–715.
[45] J.R. Smith, S.-F. Chang, Automated binary texture feature sets for image retrieval, in: Proc. ICASSP, Atlanta, 1996.
[46] R.M. Haralick, K. Shanmugam, I. Dinstein, Texture features for image classification, IEEE Transactions on Systems, Man, and Cybernetics SMC-3 (6) (1973).
[47] H. Tamura, S. Mori, T. Yamawaki, Texture features corresponding to visual perception, IEEE Transactions on Systems, Man, and Cybernetics SMC-8 (6) (June 1978).
[48] X. Tang, Texture information in run-length matrices, IEEE Transactions on Image Processing 7 (11) (November 1998) 1602–1609.
[49] A. Laine, J. Fan, Texture classification by wavelet packet signatures, IEEE Transactions on Pattern Analysis and Machine Intelligence 15 (11) (November 1993) 1186–1191.
[50] T. Chang, C.C.J. Kuo, Texture analysis and classification with tree-structured wavelet transform, IEEE Transactions on Image Processing 2 (4) (October 1993) 429–441.
[51] J.G. Daugman, Complete discrete 2D Gabor transforms by neural networks for image analysis and compression, IEEE Transactions on ASSP 36 (July 1988) 1169–1179.
[52] B.S. Manjunath, W.Y. Ma, Texture features for browsing and retrieval of image data, IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (8) (August 1996) 837–842.
[53] J. Francos, Orthogonal decompositions of 2D random fields and their applications in 2D spectral estimation, in: N.K. Bose, C.R. Rao (Eds.), Signal Processing and its Applications, North Holland, 1993, pp. 20–27.
[54] F. Liu, R.W. Picard, Periodicity, directionality, and randomness: Wold features for image modeling and retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (7) (July 1996).
[55] C. Shyu, C. Brodley, A. Kak, A. Kosaka, A. Aisen, L. Broderick, ASSERT: a physician-in-the-loop content-based retrieval system for HRCT image databases, Computer Vision and Image Understanding 75 (1–2) (1999) 111–132.
[56] J. Mao, A.K. Jain, Texture classification and segmentation using multiresolution simultaneous autoregressive models, Pattern Recognition 25 (2) (1992) 173–188.
[57] J. Weszka, C. Dyer, A. Rosenfeld, A comparative study of texture measures for terrain classification, IEEE Transactions on Systems, Man, and Cybernetics 6 (4) (1976).
[58] A.P. Pentland, Fractal-based description of natural scenes, IEEE Transactions on Pattern Analysis and Machine Intelligence 6 (6) (1984) 661–674.
[59] C. Chatfield, A. Collins, Introduction to Multivariate Analysis, Chapman & Hall, London, 1983.
[60] E. Persoon, K. Fu, Shape discrimination using Fourier descriptors, IEEE Transactions on Systems, Man, and Cybernetics 7 (1977) 170–179.
[61] H. Kauppinen, T. Seppänen, M. Pietikäinen, An experimental comparison of autoregressive and Fourier-based descriptors in 2D shape classification, IEEE Transactions on Pattern Analysis and Machine Intelligence 17 (2) (1995) 201–207.
[62] E.M. Arkin, L. Chew, D. Huttenlocher, K. Kedem, J. Mitchell, An efficiently computable metric for comparing polygonal shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence 13 (3) (1991).
[63] A. Pentland, R.W. Picard, S. Sclaroff, Photobook: content-based manipulation of image databases, International Journal of Computer Vision (1996).
[64] S. Abbasi, F. Mokhtarian, J. Kittler, Enhancing CSS-based shape retrieval for objects with shallow concavities, Image and Vision Computing 18 (3) (2000) 199–211.
[65] F. Mokhtarian, S. Abbasi, Shape similarity retrieval under affine transforms, Pattern Recognition 35 (1) (2002) 31–41.
[66] Z. You, A.K. Jain, Performance evaluation of shape matching via chord length distribution, Computer Vision, Graphics, and Image Processing 28 (1984) 185–198.
[67] A.K. Jain, Fundamentals of Digital Image Processing, Prentice Hall, New York, 1986.
[68] N. Arica, F. Vural, BAS: a perceptual shape descriptor based on the beam angle statistics, Pattern Recognition Letters 24 (9–10) (2003) 1627–1639.
[69] G.C.-H. Chuang, C.-C.J. Kuo, Wavelet descriptor of planar curves: theory and applications, IEEE Transactions on Image Processing 5 (1) (January 1996) 56–70.
[70] M.K. Hu, Visual pattern recognition by moment invariants, IRE Transactions on Information Theory 8 (1962).
[71] L. Yang, F. Albregtsen, Fast computation of invariant geometric moments: a new method giving correct results, in: Proc. IEEE Int. Conf. on Image Processing, 1994.
[72] Y.S. Kim, W.Y. Kim, Content-based trademark retrieval system by using visually salient feature, in: Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1997, pp. 307–312.
[73] L. Prasad, Morphological Analysis of Shapes, CNLS Research Highlights, Los Alamos National Laboratory, Los Alamos, NM, July 1997.
[74] D.H. Ballard, C.M. Brown, Computer Vision, Prentice Hall, Englewood Cliffs, NJ, 1982.
[75] S.-K. Chang, Principles of Pictorial Information Systems Design, Prentice Hall Int'l Editions, Englewood Cliffs, NJ, 1989.
[76] S.K. Chang, Q.Y. Shi, C.Y. Yan, Iconic indexing by 2-D strings, IEEE Transactions on Pattern Analysis and Machine Intelligence 9 (3) (May 1987) 413–428.
[77] H. Samet, The quadtree and related hierarchical data structures, ACM Computing Surveys 16 (2) (1984) 187–260.
[78] V.N. Gudivada, V.V. Raghavan, Design and evaluation of algorithms for image retrieval by spatial similarity, ACM Transactions on Information Systems 13 (2) (April 1995) 115–144.
[79] M.E. Mattie, L. Staib, E. Stratmann, H.D. Tagare, J. Duncan, P.L. Miller, PathMaster: content-based cell image retrieval using automated feature extraction, Journal of the American Medical Informatics Association 7 (4) (2000) 404–415.
[80] Q. Wang, V. Megalooikonomou, D. Kontos, A medical image retrieval framework, in: Proc. IEEE Workshop on Machine Learning for Signal Processing (MLSP'05), 2005, pp. 233–238.
[81] C. Brodley, A. Kak, C. Shyu, J. Dy, L. Broderick, A.M. Aisen, Content-based retrieval from medical image databases: a synergy of human interaction, machine learning and computer vision, in: Proc. 10th National Conf. on Artificial Intelligence, Orlando, FL, USA, 1999, pp. 760–767.
[82] A. Marchiori, C. Brodley, J. Dy, C. Pavlopoulou, A. Kak, L. Broderick, A.M. Aisen, CBIR for medical images – an evaluation trial, in: Proc. IEEE Workshop on Content-Based Access of Image and Video Libraries, 2001, pp. 89–93.
[83] I. El-Naqa, Y. Yang, N.P. Galatsanos, R.M. Nishikawa, M.N. Wernick, A similarity learning approach to content-based image retrieval: application to digital mammography, IEEE Transactions on Medical Imaging 23 (10) (2004) 1233–1244.
[84] S.T.C. Wong, CBIR in medicine: still a long way to go, in: Proc. IEEE Workshop on Content-Based Access of Image and Video Libraries, 1998, p. 114.
[85] S.T.C. Wong, H.K. Huang, Design methods and architectural issues of integrated medical image data base systems, Computerized Medical Imaging and Graphics 20 (4) (1996) 285–299.
[86] T. Glatard, J. Montagnat, I.E. Magnin, Texture based medical image indexing and retrieval: application to cardiac imaging, in: Proc. 6th ACM SIGMM Int. Workshop on Multimedia Information Retrieval, 2004, pp. 135–142.
[87] M.M. Rahman, P. Bhattacharya, B.C. Desai, A framework for medical image retrieval using machine learning and statistical similarity matching techniques with relevance feedback, IEEE Transactions on Information Technology in Biomedicine 11 (1) (2007) 58–69.
[88] C.H. Li, P.C. Yuen, Regularized color clustering in medical image database, IEEE Transactions on Medical Imaging 19 (11) (2000) 1150–1155.
[89] S. Tamai, The color of digital imaging in pathology and cytology, Digital Color Imaging in Biomedicine (2) (2001) 61–66.
[90] H.K. Choi, H.G. Hwang, M.K. Kim, T.Y. Kim, Design of the breast carcinoma cell bank system, in: Proc. 6th Int. Workshop on Enterprise Networking and Computing in Healthcare Industry (HEALTHCOM'04), 2004, pp. 88–91.
[91] F. Schnorrenberg, C.S. Pattichis, C.N. Schizas, K. Kyriacou, Content-based retrieval of breast cancer biopsy slides, Technology and Health Care 8 (2000) 291–297.
[92] L. Zheng, A.W. Wetzel, J. Gilbertson, M.J. Becich, Design and analysis of a content-based pathology image retrieval system, IEEE Transactions on Information Technology in Biomedicine 7 (4) (December 2003) 249–255.
[93] H.L. Tang, R. Hanka, H.S. Ip, Histological image retrieval based on semantic content analysis, IEEE Transactions on Information Technology in Biomedicine 7 (1) (March 2003) 26–36.
[94] R.W.K. Lam, K.T. Cheung, H.S. Ip, L.H.Y. Tang, R. Hanka, An iconic and semantic content based retrieval system for histological images, VISUAL 2000 – LNCS 1929 (2000) 384–395.
[95] L.H. Tang, R. Hanka, R. Lan, H.H.S. Ip, Automatic semantic labelling of medical images for content-based retrieval, in: Proc. Int. Conf. Artificial Intelligence, Expert Systems and Applications (EXPERSYS 1998), Virginia Beach, VA, USA, 1998, pp. 77–82.
[96] L.H. Tang, R. Hanka, H.H.S. Ip, R. Lam, Extraction of semantic features of histological images for content-based retrieval of images, in: Proc. IEEE Symp. Computer-Based Medical Systems (CBMS 2000), 2000.
[97] M. Nischik, C. Forster, Analysis of skin erythema using true-color images, IEEE Transactions on Medical Imaging 16 (12) (1997) 711–716.
[98] G.L. Hansen, E.M. Sparrow, J.Y. Kokate, K.Y. Leland, P.A. Iaizzo, Wound status evaluation using color image processing, IEEE Transactions on Medical Imaging 16 (2) (1997) 78–86.
[99] M.M. Rahman, B.C. Desai, P. Bhattacharya, Image retrieval-based decision support system for dermatoscopic images, in: Proc. 19th IEEE Symp. on Computer-Based Medical Systems, 2006, pp. 285–290.
[100] M. Nishibori, Problems and solutions in medical color imaging, in: Proc. Second Int. Symp. on Multispectral Imaging and High Accurate Color Reproduction, 10–11 October 2000, pp. 9–17.
[101] B.V. Dhandra, R. Hegadi, M. Hangarge, V.S. Malemath, Analysis of abnormality in endoscopic images using combined HSI color space and watershed segmentation, in: Proc. 18th Int. Conf. on Pattern Recognition, 2006.
[102] S. Xia, W. Mo, Y. Yan, X. Chen, An endoscopic image retrieval system based on color clustering method, in: Int. Symp. on Multispectral Image Processing and Pattern Recognition, Proc. SPIE, vol. 5286, 2003, pp. 410–413.
[103] K.B. Kim, S. Kim, G.H. Kim, Analysis system of endoscopic image of early gastric cancer, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E89-A (10) (2006) 2662–2669.
[104] U. Honmyo, A. Mitsui, A. Murakami, S. Mizumoto, I. Yoshinada, M. Maeda, S. Yamamoto, S. Shimada, Mechanisms producing color change in flat early gastric cancers, Endoscopy 29 (1997) 366–371.
[105] T. Ogihara, H. Watanabe, M. Namihisa, N. Sato, Display of mucosal blood flow function and color enhancement based on blood flow index (IHb color enhancement), Clinical Gastroenterology 12 (1997) 109–117.
[106] S. Tsuji, N. Sato, S. Kawano, T. Kamada, Functional imaging for the analysis of the mucosal blood hemoglobin distribution using electronic endoscopy, Gastrointestinal Endoscopy 34 (1988) 332–336.
[107] M.P. Tjoa, S.M. Krishnan, Feature extraction for the analysis of colon status from the endoscopic images, BioMedical Engineering OnLine 2 (9) (2003) 1–17.
[108] Z.L. Chen, Research on Tongue Diagnosis, Shanghai Sci. & Tech. Publishing House, Shanghai, China, 1982.
[109] G. Maciocia, Tongue Diagnosis in Chinese Medicine, Eastland Press, 1995.
[110] L.S. Shen, B.G. Wei, Y.H. Cai, X.F. Zhang, Y.Q. Wang, Image analysis for tongue characterization, Chinese Journal of Electronics 12 (3) (2003) 317–323.
[111] H. Müller, A. Rosset, J.-P. Vallee, A. Geissbuhler, Integrating content-based visual access methods into a medical case database, in: Proc. Medical Informatics Europe Conf., St. Malo, France, 2003, pp. 480–485.
[112] A. Rosset, H. Müller, M. Martins, N. Dfouni, J.-P. Vallee, O. Ratib, Casimage project – a digital teaching files authoring environment, Journal of Thoracic Imaging 19 (2) (2004) 1–6.
[113] P. Clough, H. Müller, M. Sanderson, The CLEF 2004 cross-language image retrieval track, in: Proc. 5th Workshop Cross-Language Evaluation Forum, CLEF 2004, vol. 3491, Bath, UK, September 2004, pp. 597–613.
[114] D. Xu, A.S. Kurani, J.D. Furst, D.S. Raicu, Run-length encoding for volumetric texture, in: Proc. VIIP, 2004.
[115] T.M. Lehmann, M.O. Guld, D. Keysers, T. Deselaers, H. Schubert, B. Wein, K. Spitzer, Similarity of medical images computed from global feature vectors for content-based retrieval, in: Proc. KES 2004 – LNAI 3214, 2004, pp. 989–995.
[116] D.S. Raicu, J.D. Furst, D. Channin, D. Xu, A. Kurani, S. Aioanei, A texture dictionary for human organs tissues' classification, in: Proc. 8th World Multiconf. on Sys. Cyb. and Info. (SCI 2004), USA, July 18–21, 2004.
[117] L. Soh, C. Tsatsoulis, Texture analysis of SAR sea ice imagery using gray level co-occurrence matrices, IEEE Transactions on Geoscience and Remote Sensing 37 (2) (March 1999).
[118] D.A. Clausi, An analysis of co-occurrence texture statistics as a function of grey level quantization, Canadian Journal of Remote Sensing 28 (1) (2002) 45–62.
[119] W. Tsang, A. Corboy, K. Lee, D. Raicu, J. Furst, Texture-based image retrieval for computerized tomography databases, in: Proc. 18th IEEE Symp. on Computer-Based Medical Systems (CBMS'05), 2005, pp. 593–598.
[120] S. Orphanoudakis, C. Chronaki, D. Vamvaka, I2Cnet: content-based similarity search in geographically distributed repositories of medical images, Computerized Medical Imaging and Graphics 20 (4) (1996) 193–207.
[121] P.A. Freeborough, N.C. Fox, MR image texture analysis applied to the diagnosis and tracking of Alzheimer's disease, IEEE Transactions on Medical Imaging 17 (3) (June 1998) 475–479.
[122] J.C. Felipe, A.J.M. Traina, C. Traina, Retrieval by content of medical images using texture for tissue identification, in: Proc. 16th IEEE Symp. on Computer-Based Medical Systems (CBMS'03), 2003, pp. 175–180.
[123] D.M. Kwak, B.S. Kim, O.K. Yoon, C.H. Park, J.U. Won, K.H. Park, Content-based ultrasound image retrieval using a coarse to fine approach, Annals of the New York Academy of Sciences 980 (2002) 212–224.
[124] M.A. Sheppard, L. Shih, Efficient image texture analysis and classification for prostate ultrasound diagnosis, in: Proc. IEEE Conf. Computational Systems Bioinformatics, 2005, pp. 7–8.
[125] H. Alto, R.M. Rangayyan, J.E. Leo Desautels, Content-based retrieval and analysis of mammographic masses, Journal of Electronic Imaging 14 (2) (2005) 1–17.
[126] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[127] S. Xia, D. Ge, W. Mo, Z. Zhang, A content-based retrieval system for endoscopic images, in: Proc. 27th Annual Conf. of the IEEE EMBS, Shanghai, China, September 1–4, 2005, pp. 1720–1723.
[128] A.K. Jain, F. Farrokhnia, Unsupervised texture segmentation using Gabor filters, Pattern Recognition 24 (12) (1991) 1167–1186.
[129] C.G. Zhao, H.Y. Cheng, Y.L. Huo, T.G. Zhuang, Liver CT-image retrieval based on Gabor texture, in: Proc. 26th Annual Conf. of the IEEE EMBS, San Francisco, CA, USA, September 1–5, 2004, pp. 1491–1494.
[130] D. Zhao, Y. Chen, H. Correa, Statistical categorization of human histological images, in: Proc. IEEE Int. Conf. on Image Processing (ICIP'05), 2005, pp. 628–631.
[131] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.
[132] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: Proc. Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[133] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in: Proc. Int. Conf. Learning Representations, 2015, pp. 1–14.
[134] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[135] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A.C. Berg, L. Fei-Fei, ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (3) (2015) 211–252.
[136] Y. Bar, I. Diamant, L. Wolf, S. Lieberman, E. Konen, H. Greenspan, Chest pathology detection using deep learning with non-medical training, in: Proc. IEEE Int. Symp. Biomedical Imaging, 2015, pp. 294–297.
[137] W. Shen, M. Zhou, F. Yang, C. Yang, J. Tian, Multi-scale convolutional neural networks for lung nodule classification, in: Proc. Int. Conf. Information Processing in Medical Imaging, 2015, pp. 588–599.
[138] R. Li, W. Zhang, H. Suk, L. Wang, J. Li, D. Shen, S. Ji, Deep learning based imaging data completion for improved brain disease diagnosis, in: Proc. Int. Conf. Medical Image Computing and Computer-Assisted Intervention, 2014, pp. 305–312.
[139] J. Shin, M.R. Orton, D.J. Collins, S.J. Doran, M.O. Leach, Stacked autoencoders for unsupervised feature learning and multiple organ detection in a pilot study using 4D patient data, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (8) (2013) 1930–1943.
[140] A. García Seco de Herrera, R. Schaer, S. Bromuri, H. Müller, Overview of the ImageCLEF 2016 medical task, in: Proc. Working Notes of CLEF 2016, 2016, pp. 1–13.
[141] Z. Li, X. Zhang, H. Müller, S. Zhang, Large-scale retrieval for medical image analytics: a comprehensive review, Medical Image Analysis 43 (2018) 66–84.
[142] S. Antani, D.J. Lee, L.R. Long, G.R. Thoma, Evaluation of shape similarity measurement methods for spine X-ray images, Journal of Visual Communication and Image Representation 15 (2004) 285–302.
[143] D. Comaniciu, D. Foran, P. Meer, Shape-based image indexing and retrieval for diagnostic pathology, in: Proc. 14th Int. Conf. on Pattern Recognition, 1998, pp. 902–904.
[144] R.M. Rangayyan, N.M. El-Faramawy, J.E. Leo Desautels, O.A. Alim, Measures of acutance and shape for classification of breast tumors, IEEE Transactions on Medical Imaging 16 (6) (1997) 799–810.
[145] S. Ciatto, L. Cataliotti, V. Distante, Nonpalpable lesions detected with mammography: review of 512 consecutive cases, Radiology 165 (1) (1987) 99–102.
[146] M.K. Hu, Visual pattern recognition by moment invariants, in: J.K. Aggarwal, R.O. Duda, A. Rosenfeld (Eds.), Computer Methods in Image Analysis, IEEE Computer Society, Los Angeles, CA, 1977.
[147] R.C. Gonzalez, R.E. Woods, Digital Image Processing, Prentice-Hall, New Jersey, 2002.
[148] S. Zhu, G. Schaefer, Thermal medical image retrieval by moment invariants, in: Int. Symposium on Biological and Medical Data Analysis – LNCS 3337, 2004, pp. 182–187.
[149] S. Maitra, Moment invariants, Proceedings of the IEEE 67 (1979) 697–699.
[150] G. Schaefer, S.Y. Zhu, S. Ruszala, Visualisation of medical infrared image databases, in: Proc. 27th Annual Conf. of the IEEE EMBS, Shanghai, China, September 1–4, 2005, pp. 634–637.
[151] G.P. Robinson, H.D. Tagare, J.S. Duncan, C.C. Jaffe, Medical image collection indexing: shape-based retrieval using KD-trees, Computer Vision, Graphics, and Image Processing 20 (4) (1996) 209–217.
[152] A.J.M. Traina, A.G.R. Balan, L.M. Bortolotti, C. Traina Jr., Content-based image retrieval using approximate shape of objects, in: Proc. 17th IEEE Symposium on Computer-Based Medical Systems (CBMS'04), 2004, pp. 91–96.
[153] W. Zhang, S. Dickinson, S. Sclaroff, J. Feldman, S. Dunn, Shape-based indexing in a medical image database, in: Proc. IEEE Workshop on Biomedical Image Analysis, 1998, pp. 221–230.
[154] J. Felipe, M. Ribeiro, E. Sousa, A. Traina, C. Traina Jr., Effective shape-based retrieval and classification of mammograms, in: Proc. 2006 ACM Symp. on Applied Computing, 2006, pp. 250–255.
[155] P. Korn, N. Sidiropoulos, C. Faloutsos, E. Siegel, Z. Protopapas, Fast and effective retrieval of medical tumor shapes, IEEE Transactions on KDE 10 (6) (November/December 1998).
[156] J.Z. Wang, Pathfinder: multiresolution region-based searching of pathology images using IRM, in: J. Am. Med. Informatics Assoc. (AMIA) – Proc. AMIA Annual Symp. (Symposium Suppl.), Los Angeles, CA, USA, November 2000, pp. 883–887.
[157] C. Ng, G. Martin, Content-Description Interfaces for Medical Imaging, Technical Report CS-RR-383, Coventry, UK, 2001.
[158] W. Liu, Q. Tong, Medical image retrieval using salient point detector, in: Proc. 2005 IEEE EMBS 27th Annual Conference, Shanghai, China, September 2005, pp. 6352–6355.
[159] S. Chu, S. Narayanan, C.-C. Kuo, Efficient rotation invariant retrieval of shapes with applications in medical databases, in: Proc. 19th IEEE Symp. on Computer-Based Medical Systems (CBMS'06), 2006, pp. 673–678.
[160] P.A. Mlsna, N.M. Sirakov, Intelligent shape feature extraction and indexing for efficient content-based medical image retrieval, in: Proc. 6th IEEE Southwest Symp. on Image Analysis and Interpretation, 2004, pp. 172–176.
[161] B. Fischer, C. Thies, M.O. Guld, T.M. Lehmann, Content-based image retrieval by matching hierarchical attributed region adjacency graphs, Proceedings of SPIE – Medical Imaging: Image Processing 5370 (2004) 598–606.
[162] S. Antani, L.R. Long, G.R. Thoma, D.J. Lee, Evaluation of shape indexing methods for content-based retrieval of x-ray images, in: M.M. Yeung, R.W. Lienhart, C.S. Li (Eds.), Proceedings of SPIE 5021, 2003, pp. 405–416.
[163] D.J. Lee, S. Antani, L.R. Long, Similarity measurement using polygon curve representation and Fourier descriptors for shape-based vertebral image retrieval, in: M. Sonka, J.M. Fitzpatrick (Eds.), Proc. SPIE 5032, 2003, pp. 1283–1291.
[164] X. Xu, D.J. Lee, S. Antani, L. Long, A spine X-ray image retrieval system using partial shape matching, IEEE Transactions on Information Technology in Biomedicine 12 (1) (2008) 100–108.
[165] W. Hsu, S. Antani, L.R. Long, L. Neve, G.R. Thoma, SPIRS: a web-based image retrieval system for large biomedical databases, International Journal of Medical Informatics 78 (Suppl. 1) (2009) S13–S24.
[166] D.J. Lee, S. Antani, Y. Chang, K. Gledhill, L.R. Long, P. Christensen, CBIR of spine X-ray images on inter-vertebral disc space and shape profiles using feature ranking and voting consensus, Data & Knowledge Engineering 68 (12) (2009) 1359–1369.
[167] X. Qian, H.D. Tagare, R.K. Fulbright, R. Long, S. Antani, Optimal embedding for shape indexing in medical image databases, Medical Image Analysis 14 (3) (2010) 243–254.
[168] J.K. Udupa, G.T. Herman, 3D Imaging in Medicine, CRC Press, 2000.
[169] J. Kim, W. Cai, D. Feng, H. Wu, A new way for multidimensional medical data management: volume of interest (VOI)-based retrieval of medical images with visual and functional features, IEEE Transactions on Information Technology in Biomedicine 10 (3) (July 2006) 598–607.
[170] Y. Liu, N.A. Lazar, W.E. Rothfus, F. Dellaert, A. Moore, J. Schneider, T. Kanade, Semantic-based biomedical image indexing and retrieval, in: L. Shapiro, H. Kriegel, R. Veltkamp (Eds.), Trends and Advances in Content-Based Image and Video Retrieval, Springer, 2004.
[171] Y. Liu, R.T. Collins, W.E. Rothfus, Robust midsagittal plane extraction from normal and pathological 3-D neuroradiology images, IEEE Transactions on Medical Imaging 20 (3) (2001) 175–192.
[172] Y. Liu, W.E. Rothfus, T. Kanade, Content-based 3D neuroradiologic image retrieval: preliminary results, in: Proc. IEEE Workshop Content-Based Access of Image and Video Libraries, in conjunction with Int. Conf. Computer Vision, Bombay, India, January 1998, pp. 91–100.
[173] J. Declerck, G. Subsol, J.-P. Thirion, N. Ayache, Automatic Retrieval of Anatomical Structures in 3D Medical Images, Technical Report 2485, INRIA, Sophia-Antipolis, France, 1995.
[174] A. Guimond, G. Subsol, Automatic MRI database exploration and applications, International Journal of Pattern Recognition and Artificial Intelligence 11 (1997) 1345–1365.
[175] V. Megalooikonomou, H. Dutta, D. Kontos, Fast and effective characterization of 3D region data, in: Proc. IEEE Int. Conf. on Image Processing (ICIP'02), 2002, pp. 421–424.
[176] W.W. Chu, A.F. Cardenas, R.K. Taira, KMED: a knowledge-based multimedia medical distributed database system, Information Systems 19 (4) (1994) 33–54.
[177] S. Aksoy, G. Marchisio, C. Tusk, K. Koperski, Interactive classification and content-based retrieval of tissue images, in: Proc. SPIE – Applications of Digital Image Processing XXV, vol. 4790, 2002, pp. 71–81.
[178] R. Chbeir, F. Favetta, A global description of medical image with high precision, in: Proc. IEEE Int. Symp. on Bio-Informatics and Biomedical Engineering (BIBE'2000), Washington, DC, USA, November 8–10, 2000, pp. 289–296.
[179] R. Chbeir, Y. Amghar, A. Flory, MIMS: a prototype for medical image retrieval, in: Proc. 6th Int. Conf. of Content Based Multimedia Information Access, RIAO 2000, April 2000, pp. 846–861.
[180] E. Petrakis, Content-based retrieval of medical images, International Journal of Computer Research 11 (2) (2002) 171–182.
[181] W.W. Chu, C.C. Hsu, A.F. Cardenas, R.K. Taira, Knowledge-based image retrieval with spatial and temporal constructs, IEEE Transactions on KDE 10 (6) (1998).
[182] P.M. Willy, K.H. Küfer, Content-based medical image retrieval (CBMIR): an intelligent retrieval system for handling multiple organs of interest, in: Proc. 17th IEEE Symp. on Computer-Based Medical Systems (CBMS'04), 2004, pp. 103–108.
[183] A. Kumar, J. Kim, W. Cai, M. Fulham, D. Feng, Content-based medical image retrieval: a survey of applications to multidimensional and multimodality data, Journal of Digital Imaging 26 (6) (2013) 1024–1039.
[184] N. Alajlan, M. Kamel, G. Freeman, Geometry-based image retrieval in binary image databases, IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (6) (2008) 1003–1013.
[185] A. Kumar, J. Kim, L. Wen, M. Fulham, D. Feng, A graph-based approach for the retrieval of multimodality medical images, Medical Image Analysis 18 (2) (2014) 330–342.
[186] C.R. Shyu, C. Pavlopoulou, A.C. Kak, C.E. Brodley, Using human perceptual categories for content-based retrieval from a medical image database, Computer Vision and Image Understanding 88 (2002) 119–151.
[187] A.S. Barb, C.R. Shyu, Y.P. Sethi, Knowledge representation and sharing using visual semantic modeling for diagnostic medical image databases, IEEE Transactions on Information Technology in Biomedicine 9 (4) (December 2005) 538–553.
[188] E. Stern, S. Swensen, High Resolution CT of the Chest: Comprehensive Atlas, second ed., Lippincott Williams & Wilkins, Philadelphia, PA, 2000.
[189] W. Webb, N. Muller, D. Naidich, High-Resolution CT of Lung, Lippincott-Raven, Philadelphia, PA, 1996.
[190] K.K.T. Cheung, R.W.K. Lam, H.H.S. Ip, R. Hanka, L.H.Y. Tang, G. Fuller, An object-oriented framework for content-based image retrieval based on 5-tier architecture, in: Proc. Asia-Pacific Software Eng. Conf. 1999, Takamatsu, Japan, December 7–10, 1999, pp. 174–177.
[191] R.W.K. Lam, H.H.S. Ip, K.K.T. Cheung, L.H.Y. Tang, R. Hanka, A multi-window approach to classify histological features, in: Proc. Int. Conf. Pattern Recognition, vol. 2, Barcelona, Spain, September 2000, pp. 259–262.
[192] K.K.T. Cheung, R. Lam, H.H.S. Ip, L.H.Y. Tang, R. Hanka, A software framework for combining iconic and semantic content for retrieval of histological images, VISUAL 2000 – LNCS 1929 (2000) 488–499.
[193] L.H. Tang, R. Hanka, H.H.S. Ip, K.K.T. Cheung, R. Lam, An intelligent system for integrating semantic and iconic features for image retrieval, Proceedings of Computer Graphics International (2001) 240–245.
[194] L. Tang, R. Hanka, H. Ip, K. Cheung, R. Lam, Integration of intelligent engines for a large scale medical image database, in: Proc. 13th IEEE Symp. on Computer-Based Medical Systems, 2000, pp. 235–240.
[195] Y. Song, W. Cai, S. Eberl, M. Fulham, D. Feng, Thoracic image case retrieval with spatial and contextual information, in: Proc. IEEE Int. Symp. Biomedical Imaging, 2011, pp. 1885–1888.
[196] Y. Song, W. Cai, S. Eberl, M. Fulham, D. Feng, A content-based image retrieval framework for multi-modality lung images, in: Proc. IEEE Symp. on Computer-Based Medical Systems, 2010, pp. 285–290.
[197] Y. Song, W. Cai, Y. Zhou, L. Wen, D. Feng, Pathology-centric medical image retrieval with hierarchical contextual spatial descriptor, in: Proc. IEEE Int. Symp. Biomedical Imaging, 2013, pp. 202–205.
[198] Y. Song, W. Cai, D. Feng, Hierarchical spatial matching for medical image retrieval, in: Proc. Annual ACM Int. Conf. Multimedia Workshop on Medical Multimedia Analysis and Retrieval, 2011, pp. 1–6.
[199] G. Ng, Y. Song, W. Cai, Y. Zhou, S. Liu, D. Feng, Hierarchical and binary spatial descriptors for lung nodule image retrieval, in: Proc. Annual Int. Conf. IEEE Eng. in Medicine and Biology Society, 2014, pp. 6463–6466.
[200] W. Cai, Y. Song, D. Feng, Regression and classification based distance metric learning for medical image retrieval, in: Proc. IEEE Int. Symp. Biomedical Imaging, 2012, pp. 1775–1778.
[201] M.C. Jaulent, C.L. Bozec, Y. Cao, E. Zapletal, P. Degoulet, A property concept frame representation for flexible image content retrieval in histopathology databases, in: Proc. Annual Symp. of the Am. Soc. Med. Informatics (AMIA), Los Angeles, CA, USA, 2000, pp. 379–383.
[202] H. Shao, W.C. Cui, L. Tang, Medical image description in content-based image retrieval, in: Proc. 27th Annual Conf. of the IEEE EMBS, Shanghai, China, September 1–4, 2005, pp. 6336–6339.
[203] W. Cai, F. Zhang, Y. Song, S. Liu, L. Wen, S. Eberl, M. Fulham, D. Feng, Automated feedback extraction for medical imaging retrieval, in: Proc. IEEE Int. Symp. Biomedical Imaging, 2014, pp. 907–910.
[204] S. Liu, W. Cai, Y. Song, S. Pujol, R. Kikinis, D. Feng, A bag of semantic words model for medical content-based retrieval, in: MICCAI Workshop on Medical Content-Based Retrieval for Clinical Decision Support, 2013, pp. 1–8.
[205] F. Zhang, Y. Song, S. Liu, S. Pujol, R. Kikinis, D. Feng, W. Cai, Latent semantic association for medical image retrieval, in: Proc. Int. Conf. Digital Image Computing: Techniques and Applications, 2014, pp. 50–55.
[206] F. Zhang, Y. Song, W. Cai, S. Liu, S. Liu, S. Pujol, R. Kikinis, Y. Xia, M. Fulham, D. Feng, ADNI, Pairwise latent semantic association for similarity computation in medical imaging, IEEE Transactions on Biomedical Engineering 63 (5) (2016) 1058–1069.
[207] F. Zhang, Y. Song, W. Cai, A. Hauptmann, S. Liu, S. Pujol, R. Kikinis, M. Fulham, D. Feng, M. Chen, Dictionary pruning with visual word significance for medical image retrieval, Neurocomputing 177 (2) (2016) 75–88.
[208] T.M. Lehmann, M.O. Guld, C. Thies, B. Fischer, K. Spitzer, D. Keysers, H. Ney, M. Kohnen, H. Schubert, B.B. Wein, Content-based image retrieval in medical applications, Methods of Information in Medicine (4) (2004) 354–361.
[209] D. Keysers, J. Dahmen, H. Ney, B. Wein, T. Lehmann, A statistical framework for model-based image retrieval in medical applications, Journal of Electronic Imaging 12 (1) (2003) 59–68.
[210] C. Hsu, W. Chu, R. Taira, A knowledge-based approach for retrieving images by content, IEEE Transactions on KDE (August 1996).
[211] R. Chbeir, F. Favetta, A global description of medical imaging with high precision, IEEE Transactions on Systems, Man, and Cybernetics Part B: Cybernetics 33 (5) (October 2003) 752–757.
[212] R. Chbeir, Y. Amghar, A. Flory, L. Brunie, A hyper-spaced data model for content and semantic-based medical image retrieval, in: Proc. ACS/IEEE Int. Conf. Computer Systems and Applications, 2001, pp. 161–167.
[213] International Classification of Diseases, 10th Revision. http://www.who.int/classifications/icd/en/.
[214] B.L. Humphreys (Ed.), UMLS Knowledge Sources – First Experimental Edition Documentation, National Library of Medicine, Bethesda, MD, 1990.
[215] R. Chbeir, Y. Amghar, A. Flory, System for medical image retrieval: the MIMS model, in: Proc. 3rd Int. Conf. Visual (VISUAL'99), LNCS 1614, Amsterdam, The Netherlands, June 1999, pp. 37–42.
[216] S. Atnafu, R. Chbeir, L. Brunie, Content-based and metadata retrieval in medical image database, in: Proc. 15th IEEE Symp. on Computer-Based Medical Systems (CBMS 2002), 2002, pp. 327–332.
[217] R. Chbeir, S. Atnafu, L. Brunie, Image data model for an efficient multi-criteria query: a case in medical databases, in: Proc. 14th Int. Conf. Scientific and Statistical Database Management (SSDBM'02), 2002, pp. 165–174.
[218] T. Lehmann, H. Schubert, D. Keysers, M. Kohnen, B. Wein, The IRMA code for unique classification of medical images, Proceedings of SPIE 5033 (2003) 440–451.
[219] S.C. Orphanoudakis, C. Chronaki, S. Kostomanolakis, I2C: a system for the indexing, storage, and retrieval of medical images by content, Medical Informatics 19 (2) (1994) 109–122.
[220] E. El-Kwae, H. Xu, M. Kabuka, Content-based retrieval in picture archiving and communication systems, Journal of Digital Imaging 13 (2) (2000) 70–81.
[221] H. Lowe, I. Antipov, W. Hersh, C. Smith, M. Mailhot, Automated semantic indexing of imaging reports to support retrieval of medical images in the multimedia electronic medical record, Methods of Information in Medicine 38 (1999) 303–307.
[222] H. Müller, C. Lovis, A. Geissbuhler, The medGIFT project on medical image retrieval, in: Proc. 15th IEEE Symp. on Computer-Based Medical Systems (CBMS 2002), 2002, pp. 321–326.
[223] H. Müller, A. Rosset, J.P. Vallee, A. Geissbuhler, Comparing feature sets for content-based image retrieval in a medical case database, Proceedings of SPIE 2004 (2004).
[224] http://www.gnu.org/software/gift/.
[225] http://www.mrml.net/.
[226] D. Feng, Information technology applications in biomedical functional imaging, IEEE Transactions on Information Technology in Biomedicine 3 (3) (1999) 221–230.
[227] D. Feng, D. Ho, H. Iida, K. Chen, Techniques for functional imaging, in: C.T. Leondes (Ed.), Medical Imaging Techniques and Applications, Gordon and Breach International Series in Engineering, Technology and Applied Science, Gordon and Breach Science Publishers, 1997, pp. 85–145.
[228] W. Cai, D. Feng, R. Fulton, Content-based retrieval of dynamic PET functional images, IEEE Transactions on Information Technology in Biomedicine 4 (2) (2000) 152–158.
[229] S.C. Huang, M.E. Phelps, E.J. Hoffman, K. Sideris, C. Selin, D.E. Kuhl, Non-invasive determination of local cerebral metabolic rate of glucose in man, American Journal of Physiology 238 (1980) E69–E82.
[230] D. Feng, D. Ho, K. Chen, L.C. Wu, J.K. Wang, R.S. Liu, S.H. Yeh, An evaluation of the algorithms for determining local cerebral metabolic rates of glucose using positron emission tomography dynamic data, IEEE Transactions on Medical Imaging 14 (1995) 697–710.
[231] J. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Kluwer, Norwell, MA, 1981.
[232] J. Kim, W. Cai, D. Feng, S. Eberl, An objective evaluation framework for segmentation techniques of functional positron emission tomography studies, in: IEEE NSS-MIC Conference, vol. 5, October 16–22, 2004, pp. 3217–3221.
[233] X. Li, D. Feng, K. Chen, Optimal image sampling schedule: a new effective way to reduce dynamic image storage and functional image processing time, IEEE Transactions on Medical Imaging 15 (5) (October 1996) 710–719.
[234] D. Feng, W. Cai, R. Fulton, An optimal image sampling schedule design for cerebral blood volume and partial volume correction in neurologic FDG-PET studies, Australian & New Zealand Journal of Medicine 28 (1998) 361.
[235] W. Cai, S. Liu, L. Wen, S. Eberl, M. Fulham, D. Feng, 3D neurological image retrieval with localized pathology-centric CMRGlc patterns, in: Proc. IEEE Int. Conf. on Image Processing, 2010, pp. 3201–3204.
[236] G. Quellec, M. Lamard, G. Cazuguel, B. Cochener, C. Roux, Wavelet optimization for content-based image retrieval in medical databases, Medical Image Analysis 14 (2) (2010) 227–241.
[237] G. Quellec, M. Lamard, L. Bekri, G. Cazuguel, C. Roux, B. Cochener, Medical case retrieval from a committee of decision trees, IEEE Transactions on Information Technology in Biomedicine 14 (5) (2010) 1227–1235.
[238] G. Quellec, M. Lamard, G. Cazuguel, C. Roux, B. Cochener, Case retrieval in medical databases by fusing heterogeneous information, IEEE Transactions on Medical Imaging 30 (1) (2011) 108–118.
[239] T. Deserno, M. Güld, B. Plodowski, K. Spitzer, B. Wein, H. Schubert, H. Ney, T. Seidl, Extended query refinement for medical image retrieval, Journal of Digital Imaging 21 (3) (2008) 280–289.
[240] J. Thomas, K. Cook, A visual analytics agenda, IEEE Computer Graphics and Applications 26 (1) (2006) 10–13.
[241] A. Hiroike, Y. Musha, A. Sugimoto, Y. Mori, Visualization of information spaces to retrieve and browse image data, Lecture Notes in Computer Science 1614 (1999) 155–163.
[242] Y. Gao, C. Yang, Y. Shen, J. Fan, Incorporate visual analytics to design a human-centered computing framework for personalized classifier training and image retrieval, in: Advances in Information and Intelligent Systems, Studies in Computational Intelligence, vol. 251, Springer, 2009, pp. 165–187.
[243] J. Rodrigues, L. Romani, A. Traina, C. Traina, Combining visual analytics and content based data retrieval technology for efficient data analysis, in: IEEE Int. Conf. Information Visualisation, London, United Kingdom, 2010, pp. 61–67.
[244] T. Itoh, A. Kumar, K. Klein, J. Kim, High-dimensional data visualization by interactive construction of low-dimensional parallel coordinate plots, Journal of Visual Languages & Computing 43 (2017) 1–13.
[245] A. Kumar, et al., Designing user interfaces to enhance human interpretation of medical content-based image retrieval: application to PET-CT images, International Journal of Computer Assisted Radiology and Surgery 8 (6) (2013) 1003–1014.
[246] A. Kumar, et al., A visual analytics approach using the exploration of multidimensional feature spaces for content-based medical image retrieval, IEEE Journal of Biomedical and Health Informatics 19 (5) (2015) 1734–1746.

CHAPTER TWELVE

Diversity and novelty in biomedical information retrieval

Xiangdong An¹, Jimmy Xiangji Huang¹ and Yuqi Wang²

¹Department of Computer Science, University of Tennessee at Martin, Martin, TN, United States
²School of Information Technology, York University, Toronto, Canada

12.1 Introduction and motivation

Biomedicine, i.e., biological medicine, is the branch of medical science that applies biological and physiological principles to clinical practice, including vaccines, immunotherapies, gene therapy, and stem cell or tissue therapy [1]. Large amounts of biological and clinical data, such as DNA sequences, electronic health records (EHRs), radiology images, and biomedical research literature, have been produced at an unprecedented scale and speed [2,3]. Big data technologies are increasingly used in biomedical and healthcare informatics. It is widely accepted that big data has three major properties or dimensions, commonly known as the three Vs: volume, variety, and velocity. Volume refers to the amount of data, variety to the types of data, and velocity to the speed of data accumulation. According to this model, the challenges of big data processing and management come from all three properties, not from volume alone. Biomedical data have been accumulating at a rapid velocity into an enormous volume with complex and heterogeneous data types, and new big data technologies are continuously being proposed to deal with these challenges.

There are four major biomedical subdisciplines [1]: (1) bioinformatics, (2) clinical informatics, (3) imaging informatics, and (4) public health informatics. Bioinformatics studies biological system variations at the molecular level and the associations between genes and diseases. Clinical informatics focuses on the relationship between a patient's main diagnosis and its underlying causes. Imaging informatics studies methods for generating, managing, analyzing, and using imaging information in biomedical applications; with the growing need for more personalized care, the need to incorporate imaging data into EHRs is rapidly increasing. Public health informatics applies big data techniques to monitor and predict infectious disease outbreaks.

The volume of published biomedical research results from the four subdisciplines is growing at an increasing rate [4]. Biomedical literature retrieval is therefore becoming increasingly difficult and complex, and there is a fundamental need for advanced information retrieval systems. Information retrieval (IR) programs search unstructured materials, such as text documents, in a large





volume to identify documents relevant to user queries. IR studies the representation, storage, organization, and access of information items. One of the main problems in IR is to determine which documents are relevant to a user's information need and which are not.

There are many ways to classify biomedical information. In Ref. [5], biomedical information is broadly divided into two categories: patient-specific information and knowledge-based information. Patient-specific biomedical information describes the health status of a patient and may comprise the patient's medical records. Knowledge-based biomedical information is derived from biomedical research and can be applied to individual patients.

Researchers have worked on medical information and knowledge retrieval for over a century [5]. In 1879, Dr. John Shaw Billings created Index Medicus to help medical professionals find relevant journal articles [6]; articles were indexed by author name(s) and subject heading(s). In 1966, the National Library of Medicine (NLM) offered an electronic version of the printed Index Medicus, called the Medical Literature Analysis and Retrieval System [7]. Full-text databases began to emerge in the 1980s, when computing power and storage became more plentiful. In the late 1990s, the NLM made all of its databases freely available to the world [5].

In information retrieval, we are interested in information that is not only relevant but also diverse and novel. Novelty measures the degree of dissimilarity between the document under consideration and the documents already seen in the ranked list; this judgment depends on both the current document and the previously read documents [8]. Novelty targets the reduction of redundancy. Diversity quantifies how well a ranked list satisfies the different possible interpretations of a query [9]; it addresses query ambiguity. In the following, we introduce how to boost and evaluate diversity and novelty in biomedical information retrieval.

12.2 Overview of novelty and diversity boosting in biomedical information retrieval

As noted above, there are four major biomedical subdisciplines [1]: bioinformatics, clinical informatics, imaging informatics, and public health informatics. In this chapter, we focus on diversity and novelty boosting and evaluation in biomedical IR. In the IR field, topicality is the degree of topical match between a topic and a document [10], which is independent of any other documents [11]. Novelty differs from topicality in that novelty measures the degree of dissimilarity between the document under consideration and the documents already seen [8]. Novelty identifies originality and redundancy, and stresses the dynamics of judgment: users' knowledge and information needs are constantly modified by the information they encounter [12]. Diversity quantifies how well a ranked list satisfies different possible interpretations of a query [9], addressing query ambiguity [8,9,13,14]. IR performance evaluation


involves test collections, sampling, topic (query, task) formation, and relevance evaluation [15–22].

Since queries represent information needs in ranking, we may improve IR performance by making queries more representative of those needs. Query expansion is one way to do so. Query expansion algorithms have been proposed to deal with synonyms, acronyms, and homonyms in concept extraction in biomedical IR [23,24]. An integration of different query expansion techniques is used to enhance biomedical information retrieval in Refs. [25,26]. Natural language processing can be used to improve biomedical IR through tokenization and annotation. In Ref. [27], a set of tokenization heuristics is studied to improve biomedical information retrieval. Natural language technology is used to annotate key terms and disambiguate the concepts they represent in Ref. [28]. Two natural language processing IR models are proposed in Ref. [29] to incorporate automatic part-of-speech (POS)-based term weighting schemes into bag-of-words and Markov random field models.

Machine learning methods can be applied to learn models for ranking. In Ref. [30], instead of offering a flat list of documents, a new information navigation system for biomedical IR presents users with a list of ranked clusters; topically similar documents are grouped together to give users a better overview of the search results and to support exploration of similar literature within a cluster. In Ref. [31], a dictionary-based approach to biomedical cross-language information retrieval is proposed, using a multilingual lexicon. In Ref. [32], auxiliary document information along three basic dimensions, namely "temporal," "journal," and "author," is used to enhance genomics IR. A deep learning model for the relevance of a document to a keyword-style query in biomedical information retrieval is presented in Ref. [33].

Methods to represent and optimize diversity and novelty in information retrieval have been widely studied [8,34–38]. A criterion that linearly combines the relevance of a document to a query and the novelty of the document is presented in Ref. [8]. Interactive information retrieval is studied in Ref. [37], where users' judgment is dynamically involved. It is shown in Ref. [38] that Wikipedia concepts are superior to the MeSH vocabulary for aspect detection. In Ref. [35], probabilistic latent semantic analysis is used to cluster and rerank documents to boost novelty. In Ref. [34], it is proposed to locate novel sentences rather than documents. Bayesian learning is applied to promoting diversity in biomedical IR in Ref. [36]. In Ref. [39], different techniques used in the TREC Genomics track are analyzed to show how complementary they are to each other.

Nevertheless, there are few studies on metrics that evaluate diversity and novelty. In Ref. [9], a cumulative gain metric, α-nDCG, is introduced to measure diversity and novelty in IR, where the parameter α (0 < α ≤ 1) reflects possible assessor errors regarding whether an information nugget belongs to a document. Metrics subtopic recall, subtopic




precision, and weighted subtopic precision are introduced in Ref. [14]. Based on these three metrics, it is shown that a proposed ranking model combining both relevance and novelty modestly outperforms a baseline relevance ranking model. In Ref. [40], the metric NRBP is proposed to combine α-nDCG and rank-based precision; however, the advantage of NRBP over α-nDCG remains unclear [41]. Intent-aware (IA) metrics for diversity evaluation are proposed in Ref. [42], where intent probabilities can be estimated from click logs or obtained through human judgments from crowdsourcing Internet marketplaces. IA versions of traditional IR metrics have accordingly been proposed, such as ERR-IA [43], nERR-IA [43], MAP-IA [42], P-IA [42], and D# [44]. In Ref. [41], it is shown that IA metrics do not necessarily reward high intent recall. An empirical study [44] on novelty and diversity metrics implies that α-nDCG and D# are among the best with regard to discriminative power [45], whereas another study suggests that MAP, MAP-IA, and subtopic recall show higher discriminative power than α-nDCG [41]. A diversity metric for recommender systems that considers both ranking and relevance is proposed in Ref. [46]. The metric geNov proposed in Ref. [47] considers not only diversity, novelty, relevancy, and ranking but also information richness, and it differentiates redundant documents from irrelevant ones.

In this chapter, we review recent major research and advances on diversity and novelty boosting approaches and evaluation metrics, and present comparison results between geNov and state-of-the-art metrics for diversity and novelty with regard to their sensitivity to ranking qualities, discriminative power, and time efficiency.

12.3 Boosting diversity and novelty in biomedical information retrieval

In the IR field, topicality is the degree to which a document matches a topic. A judge who is not the information requestor may make the judgment based on topicality, which is independent of any other documents [11]. Novelty measures the degree of dissimilarity between the document under consideration and the documents already seen in the ranked list, and so depends on both the current document and the previously read documents [8]; novelty aims to reduce redundancy. Diversity quantifies how well a ranked list satisfies different possible interpretations of a query [9]; diversity addresses query ambiguity.

12.3.1 Boosting novelty by maximal marginal relevance

The first work that addresses the novelty issue in IR is Ref. [8]. Traditionally, IR systems rank documents based on their relevance to the user query [11]. Pure relevance ranking is considered appropriate when there are few relevant documents or when very high recall



is required [8]. However, when numerous relevant documents are potentially redundant with each other, document ranking with diversity and novelty naturally becomes necessary. In Ref. [8], maximal marginal relevance (MMR) is proposed to rank documents according to a combined criterion of "relevant novelty" that considers both query relevance and novelty of information. In this simple method, relevance and novelty are measured independently and linearly combined into a metric called "marginal relevance," as shown in Eq. (12.1):

$$\mathrm{MMR} \stackrel{\text{def}}{=} \arg\max_{d_i \in R \setminus S} \Big[ \lambda\, \mathrm{sim}_1(d_i, q) - (1 - \lambda) \max_{d_j \in S} \mathrm{sim}_2(d_i, d_j) \Big], \tag{12.1}$$

where q is a query or user profile; R = IR(C, q, θ), i.e., the ranked list of documents retrieved by an IR system given a collection of documents C, the query q, and a relevance threshold θ; S is the subset of documents in R already selected; R \ S is the set difference; sim1 is the similarity metric used in document retrieval and relevance ranking; and sim2 can be the same as sim1. A document has high marginal relevance if it is both relevant to the query and minimally similar to previously selected documents. The method strives to maximize marginal relevance in retrieval, hence the label "maximal marginal relevance" (MMR) [8].

MMR becomes a pure relevance measure when the parameter λ = 1, and a pure diversity measure when λ = 0. Users who wish to locate novel or diverse information for a query with less redundancy should set λ to a smaller value, and those who wish to focus on potentially overlapping or reinforcing relevant documents should set λ to a value closer to 1. In a pilot study with five undergraduates from various disciplines, four users preferred MMR to pure relevance ranking [8]. In a government-run evaluation of 15 summarization systems in 1998, the MMR-based summarizer produced the highest F-score of 0.73 [8], where the F-score is defined as in Eq. (12.2):

$$F = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} = \left( \frac{\text{recall}^{-1} + \text{precision}^{-1}}{2} \right)^{-1}. \tag{12.2}$$

MMR or similar methods have been applied in many diversity or novelty boosting or subtopic retrieval approaches [14,48–52].
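To make Eq. (12.1) concrete, below is a minimal Python sketch of greedy MMR reranking. It assumes the query and documents are given as dense vectors and uses cosine similarity for both sim1 and sim2; the function and variable names are illustrative rather than taken from Ref. [8].

```python
import numpy as np

def mmr_rerank(query_vec, doc_vecs, lam=0.7, k=10):
    """Greedy MMR reranking (Eq. 12.1): repeatedly pick from R \\ S the document
    maximizing lam*sim1(d_i, q) - (1 - lam)*max_{d_j in S} sim2(d_i, d_j)."""
    def cosine(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    remaining = list(range(len(doc_vecs)))  # indices of R \ S
    selected = []                           # indices of S, in selection order
    while remaining and len(selected) < k:
        def marginal_relevance(i):
            relevance = cosine(doc_vecs[i], query_vec)
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=marginal_relevance)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Setting lam closer to 0 emphasizes novelty, while lam = 1 reduces the loop to pure relevance ranking, matching the behavior of Eq. (12.1) described above.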

12.3.2 Boosting novelty by probabilistic latent semantic analysis

In Ref. [35], probabilistic latent semantic analysis was used to boost novelty in biomedical IR. In that study, the dataset, topics, and evaluation were taken from the Genomics track of the Text REtrieval Conference (TREC).



Algorithm 1 Aspect-level performance evaluation (Aspect-based MAP).

The TREC Genomics track provided a common platform to evaluate methods and techniques proposed for biomedical IR. To evaluate the diversity and novelty of passage retrieval results, the aspect-based mean average precision (MAP) was proposed in 2006 and was also used in 2007 [13]. Aspects were sets of terms used to indicate the subtopics covered by a passage. A passage for a topic is novel if it contains aspect terms assigned to the topic that have not appeared in passages ranked higher. The aspect-based MAP over all topics can be computed by Algo. 1, which is summarized from the Python evaluation program of the TREC 2007 Genomics track [13,48]. According to Algo. 1, a search needs to reach as many novel passages and new aspects as possible, and as early as possible, to achieve a high aspect-based MAP score.

"Aspects" are assigned to each passage by the judges and are considered latent. It is well known that a topic model such as latent semantic analysis [53], probabilistic latent semantic analysis (PLSA) [54], or latent Dirichlet allocation [55] can represent a document as a mixture of latent aspects. In Ref. [35], PLSA is applied to boosting novelty in biomedical IR, where each retrieved passage $d_i$ in $D = \{d_i\}_{i=1}^{M}$ and each word $w_j$ from a vocabulary $W = \{w_j\}_{j=1}^{N}$ are assumed to be generated from a set of latent aspects $Z = \{z_k\}_{k=1}^{K}$ following conditional probabilities. All passages retrieved initially can therefore be described as an $M \times N$ matrix $T = (c(d_i, w_j))_{ij}$, where $c(d_i, w_j)$ is the number of times $w_j$ appears in passage $d_i$.





Figure 12.1 Given a hidden aspect z, a document d and the word w are independent.

Each row in T is then a frequency vector for one passage. Assume that, given a hidden aspect factor z, a passage d is independent of the word w, as shown by Fig. 12.1 for a symmetric formulation: starting with a hidden aspect z drawn with probability P(z), the document d is generated with P(d|z) and the word w with P(w|z), independently. Then, by Bayes' rule, the joint probability P(d, w) can be obtained as

$$P(d, w) = \sum_{z \in Z} P(z)\, P(d|z)\, P(w|z).$$

Therefore, the values of P(z), P(d|z), and P(w|z) need to be found to maximize the following likelihood function:

$$L(D, W) = \sum_{d \in D} \sum_{w \in W} c(d, w) \log P(d, w).$$

The solution can be obtained with the EM algorithm [35]. All passages can then be clustered using

$$z_d = \arg\max_{z_i \in Z} P(z_i | d),$$

and the passages in each cluster are sorted by the probability P(z_d | d) in descending order. All clusters are then merged by repeatedly picking one passage from each cluster to form a new reranked list. A set of four runs was reranked using this method; over all runs, the maximum improvement in aspect-based MAP was 46.95%, the minimum improvement 1.47%, and the average improvement 20.06% [35].
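To make the EM fitting concrete, below is a minimal Python sketch of PLSA on the passage-word count matrix T. The random initialization, fixed iteration count, and dense K × M × N E-step are simplifying assumptions (Ref. [35] does not prescribe these details, and a practical implementation would exploit sparsity).

```python
import numpy as np

def plsa(T, K, n_iter=100, seed=0):
    """Fit PLSA by EM on an M x N count matrix T, where T[d, w] = c(d, w).
    Returns P(z), P(d|z), and P(w|z) for K latent aspects."""
    rng = np.random.default_rng(seed)
    M, N = T.shape
    Pz = np.full(K, 1.0 / K)
    Pd_z = rng.random((K, M)); Pd_z /= Pd_z.sum(axis=1, keepdims=True)
    Pw_z = rng.random((K, N)); Pw_z /= Pw_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|d, w) is proportional to P(z) P(d|z) P(w|z)
        joint = Pz[:, None, None] * Pd_z[:, :, None] * Pw_z[:, None, :]
        Pz_dw = joint / (joint.sum(axis=0, keepdims=True) + 1e-12)
        # M-step: reweight the posteriors by the observed counts c(d, w)
        weighted = Pz_dw * T[None, :, :]
        Pd_z = weighted.sum(axis=2)
        Pd_z /= Pd_z.sum(axis=1, keepdims=True) + 1e-12
        Pw_z = weighted.sum(axis=1)
        Pw_z /= Pw_z.sum(axis=1, keepdims=True) + 1e-12
        Pz = weighted.sum(axis=(1, 2))
        Pz /= Pz.sum()
    return Pz, Pd_z, Pw_z
```

Each passage can then be assigned to the cluster $z_d = \arg\max_{z} P(z|d)$, with $P(z|d) \propto P(z)P(d|z)$ by Bayes' rule, and the clusters interleaved as described above.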

12.3.3 Boosting diversity by relevance-novelty graphical model

In Ref. [56], a probabilistic graphical model, RelNov, was proposed to boost diversity using Wikipedia. Acronyms, homonyms, and synonyms are frequently used in biomedical literature, but domain-specific thesauri such as UMLS, MeSH, and the Gene Ontology only



provide synonyms, hypernyms, and hyponyms of a specific term without context. Unlike the domain-specific thesauri, Wikipedia provides semantic background knowledge and context for almost every entity in the world and is therefore a potentially valuable knowledge resource for differentiating terms. Using Wikipedia for aspect detection involves three steps:
1. Locating the candidate aspect terms in the retrieved document;
2. Mapping these candidate aspect terms to Wikipedia articles;
3. Selecting the most salient concepts from the matched Wikipedia articles.

The proposed RelNov model [56] is based on an undirected probabilistic graphical model, the Markov random field. Nodes in the graph represent a set of random variables, and edges between nodes represent dependencies. Nodes in Markov random fields satisfy three Markov properties:
1. Pairwise Markov property: any two nonadjacent variables are conditionally independent given all other variables.
2. Local Markov property: a variable is conditionally independent of all other variables given its neighbors.
3. Global Markov property: any two subsets A and B of variables are conditionally independent given a separating subset S through which every path from A to B passes.
The three Markov properties are equivalent for a positive probability distribution [57].

The RelNov model has four nodes: the retrieved document model θd, the previously ranked documents θo, the relevance R of the retrieved document, and the novelty N of the retrieved document. The joint distribution of these four nodes represents the probability of a document being both relevant and novel. The document model θd can be further decomposed into a term-based document model θt and an aspect-based document model θa to capture both the lexical and the conceptual information in a retrieved document. For the graphical representation of the Markov random field, please refer to Fig. 12.2 in Ref. [56]. Based on conditional independence assumptions, the joint probability distribution can be expressed as a product of potential functions over the maximal cliques. Using RelNov to rerank runs submitted to the TREC 2007 Genomics track, a 16.4% improvement over the highest aspect-level MAP and a 9.8% improvement over the highest passage-level MAP were achieved [56].
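The concrete cliques and potential functions of RelNov are given in Ref. [56] and are not reproduced here; for orientation, the generic factorization that any such positive Markov random field instantiates (the Hammersley–Clifford form) is

$$P(R, N, \theta_d, \theta_o) = \frac{1}{Z} \prod_{c \in \mathcal{C}(G)} \psi_c(\mathbf{x}_c), \qquad Z = \sum_{\mathbf{x}} \prod_{c \in \mathcal{C}(G)} \psi_c(\mathbf{x}_c),$$

where $\mathcal{C}(G)$ is the set of maximal cliques of the graph, $\psi_c$ are nonnegative potential functions over the variables $\mathbf{x}_c$ in clique c, and Z is the normalizing constant. Documents can then be reranked by this joint probability of being both relevant and novel.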

12.4 Diversity and novelty evaluation metrics

Many diversity and novelty evaluation models exist in the literature [9,14,42–44,47]. In this section, we introduce three evaluation frameworks: the subtopic retrieval metrics [14], α-nDCG [9], and geNov [47].


Figure 12.2 Comparison of the scores assigned by aspect-based MAP and geNov to the submitted run and its optimal counterpart run obtained by reranking, where “Aspect-based MAP” and “geNov” are the respective scores assigned to the submitted run, and “Aspect-based MAP (rerank)” and “geNov (rerank)” are the respective scores assigned to the optimal counterpart run.





12.4.1 Subtopic retrieval metrics

A set of three measures (s-recall, s-precision, and ws-precision) was introduced in Ref. [14] to help assess and promote diverse rankings. A diverse ranking addresses an ambiguous query by covering different subtopics of a general topic; this is also called subtopic retrieval. The objective of subtopic retrieval is to produce a ranked list that covers as many subtopics as possible, and as early in the ranking as possible. Given a topic t with n subtopics $a_1, \ldots, a_n$ and a ranking $d_1, \ldots, d_m$ of m documents, let $s(d_i)$ be the set of subtopics included in $d_i$. The subtopic recall (s-recall) at rank k, i.e., s-recall@k, is defined as the percentage of subtopics covered by the first k documents [14], i.e.,

$$\text{s-recall}@k \stackrel{\text{def}}{=} \frac{\left| \bigcup_{i=1}^{k} s(d_i) \right|}{n}. \tag{12.3}$$

To reflect ranking differences across topics, s-precision is proposed. Let S be an IR system that produces rankings, and let r (0 ≤ r ≤ 1) be a recall level; minRank(S, r) is defined as the minimal rank k at which the ranking produced by S reaches s-recall r. The subtopic precision (s-precision) at recall r, i.e., s-precision@r, is defined as [14]


$$\text{s-precision}@r \stackrel{\text{def}}{=} \frac{\text{minRank}(S_o, r)}{\text{minRank}(S, r)}, \tag{12.4}$$

where $S_o$ is an optimal system producing the optimal ranking for recall r, i.e., $\text{minRank}(S_o, r)$ is the smallest rank k at which subtopic recall r can be reached. To penalize redundancy, the cost of a ranking can be incorporated into s-precision, where the cost of a ranking is defined as [14]

$$\text{cost}(d_1, \ldots, d_k) \stackrel{\text{def}}{=} \sum_{i=1}^{k} \big( a\,|s(d_i)| + b \big) = a \sum_{i=1}^{k} |s(d_i)| + kb, \tag{12.5}$$

where a is the cost of processing a single subtopic in document $d_i$, and b is the cost of presenting a document $d_i$ to a user. Then minCost(S, r) is defined as the minimal cost at which the ranking produced by S reaches s-recall r. The weighted subtopic precision (ws-precision) at recall level r, i.e., ws-precision@r, is defined as [14]

$$\text{ws-precision}@r \stackrel{\text{def}}{=} \frac{\text{minCost}(S_o, r)}{\text{minCost}(S, r)}, \tag{12.6}$$

where again $S_o$ produces the optimal (lowest-cost) ranking that reaches recall r; this should be a ranking with minimum redundancy. s-recall has been used as a metric in many subtopic retrieval evaluations [58–61] and comparison studies [62,63], but s-precision and ws-precision are not widely seen in the literature.
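A small Python sketch of Eq. (12.3), assuming each ranked document is represented by its judged subtopic set s(d_i); the names are illustrative:

```python
def s_recall_at_k(ranked_subtopics, n, k):
    """s-recall@k (Eq. 12.3): the fraction of the n subtopics of a topic
    covered by the union of the subtopic sets of the first k documents."""
    covered = set().union(*ranked_subtopics[:k]) if k > 0 else set()
    return len(covered) / n

# Example: 4 subtopics in total; the first two documents cover {a1, a2} and {a2, a3}:
# s_recall_at_k([{"a1", "a2"}, {"a2", "a3"}, {"a4"}], n=4, k=2) == 0.75
```

Note that computing minRank(S_o, r) for s-precision amounts to a minimal set-cover problem, which is NP-hard and is approximated greedily in Ref. [14].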

12.4.2 α-nDCG

α-nDCG differs from standard nDCG in that it uses a parameter α to represent the assessor's judgment of the probability that a nugget belongs to a document; the parameter α (0 < α ≤ 1) reflects assessor error. Below is an introduction to the derivation of the measure; for more details, please refer to Ref. [9].

Let q be the query representing a user's information need, and let d be a document that may be relevant to q. Let $N = \{n_i\}_{i=1}^{m}$ be the space of possible information nuggets. The user's information need q can be modeled as a set of information nuggets $Q \subseteq N$; similarly, the information present in a document is modeled as a set of nuggets $D \subseteq N$. Let the binary random variable r = 1 represent relevance. A document is relevant to q if it contains at least one nugget among the information nuggets modeled in q. Let $P(n_i \in q)$ denote the probability that the user's information need contains $n_i$, and $P(n_i \in d)$ the probability that the document contains $n_i$. Assume that $n_i$ and $n_{j \neq i}$ are independent when $n_i \in d$ and $n_{j \neq i} \in d$, or when $n_i \in q$ and



$n_{j \neq i} \in q$. Then

$$P(r = 1 | q, d) = 1 - \prod_{i=1}^{m} \big( 1 - P(n_i \in q)\, P(n_i \in d) \big). \tag{12.7}$$

In Ref. [9], it is assumed that a human assessor may err. Let J(d, i) = 1 if the assessor has judged that d contains nugget $n_i$, and J(d, i) = 0 otherwise. Let the constant α (0 < α ≤ 1) denote the probability that the assessor's judgment on $P(n_i \in d)$ is correct. Assume nuggets are equally likely to be relevant, i.e., $P(n_i \in q) = \gamma$ for all i with constant γ. Then Eq. (12.7) becomes

$$P(r = 1 | q, d) = 1 - \prod_{i=1}^{m} \big( 1 - \alpha\gamma\, J(d, i) \big). \tag{12.8}$$

Let $J_{i,k-1}$ be the number of documents ranked up to position k − 1 that have been judged to contain nugget $n_i$, and let the random variables associated with relevance at each rank be $r_1, \ldots, r_k$. Thus

$$P(r_k = 1 | q, d_1, \ldots, d_k) = 1 - \prod_{i=1}^{m} \Big( 1 - \gamma\alpha\, J(d_k, i)\,(1-\alpha)^{J_{i,k-1}} \Big) \approx \gamma\alpha \sum_{i=1}^{m} J(d_k, i)\,(1-\alpha)^{J_{i,k-1}}. \tag{12.9}$$

Dropping the constant γα, the kth element of the gain vector G can be defined as

$$G[k] = \sum_{i=1}^{m} J(d_k, i)\,(1-\alpha)^{J_{i,k-1}}. \tag{12.10}$$

The discounted cumulative gain vector can be defined as

$$DCG[k] = \sum_{j=1}^{k} \frac{G[j]}{\log_2(1 + j)}.$$

The normalized discounted cumulative gain nDCG is obtained by normalizing DCG with the ideal discounted cumulative gain vector DCG′:

$$nDCG[k] = \frac{DCG[k]}{DCG'[k]}.$$

These versions of DCG and nDCG are referred to as α-DCG and α-nDCG. Both were implemented for the TREC Web track diversity and novelty evaluations from 2010 to 2014 [60]. α-nDCG has been popular since its introduction and has been used in many diversity and novelty evaluation studies [52,59,64,65].
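For concreteness, the following Python sketch assembles α-nDCG@k from Eq. (12.10) and the DCG definitions above, assuming the nugget judgments are supplied as one set per ranked document. Since computing the exact ideal ranking is NP-complete, it is approximated greedily in practice [9]; here the `ideal` ranking is simply assumed to be given.

```python
import math

def alpha_ndcg_at_k(run, ideal, alpha=0.5, k=10):
    """alpha-nDCG@k: run[r] and ideal[r] are the sets of nugget ids judged
    to be contained in the document at rank r+1, i.e., {i : J(d, i) = 1}."""
    def dcg(ranking):
        seen = {}      # J_{i,k-1}: how often nugget i has been seen so far
        total = 0.0
        for rank, nuggets in enumerate(ranking[:k], start=1):
            gain = sum((1 - alpha) ** seen.get(i, 0) for i in nuggets)  # G[k]
            for i in nuggets:
                seen[i] = seen.get(i, 0) + 1
            total += gain / math.log2(1 + rank)                          # DCG[k]
        return total
    ideal_dcg = dcg(ideal)
    return dcg(run) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Note how repeated nuggets are discounted by the factor $(1-\alpha)^{J_{i,k-1}}$, so a run that keeps returning the same nuggets gains little at later ranks.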


12.4.3 geNov

The measure geNov [47] is a metric for genomics novelty, proposed on the basis of the aspect-based MAP (mean average precision) from the TREC Genomics track [13]. The aspect-based MAP from TREC is summarized in Algo. 1 [48]. Some problems have been noted with Algo. 1 [47]:
1. Relevant but redundant passages have no impact on the score;
2. The number of relevant aspects in the current document is not counted in scoring, though the number of new aspects is;
3. Before an irrelevant passage is reached in the nominated document list, the ordering of documents with different numbers of new aspects makes no difference to the score, although we would expect a document with more new aspects to be ranked higher.

A new metric, geNov (genomics novelty) [47], was proposed to address these problems. It is expected that:
1. Novel passages should be ranked higher than merely relevant ones;
2. Relevant (possibly redundant) passages should be ranked higher than irrelevant ones;
3. The level of novelty (i.e., the number of new aspects) and the level of relevancy (i.e., the number of aspects) should make a difference in scoring.
It is shown by propositions in Ref. [47] that novelty and redundancy, relevancy and irrelevancy, and the levels of relevancy and novelty are recognized under ideal conditions.

For novel passages, we prefer to see the more novel ones earlier; for redundant passages, we prefer to see the more relevant ones first. We combine numNewAspects and numAspects linearly to measure the "relevant novelty" [48] as follows:

$$rn = \lambda \cdot \text{numNewAspects} + (1 - \lambda) \cdot \text{numAspects}.$$

Algo. 1 is then modified into Algo. 2, which is the proposed metric geNov. We use an increasing arithmetic sequence as the incremental values of the denominator (e.g., 1, 1.5, 2, 2.5, ...) to penalize less novel or less relevant passages; this is incorporated at line seven of Algo. 2 with parameter s ≥ 0. If t denotes the number of topics and p the number of passages a run can nominate for each topic, the algorithm has a worst-case complexity of O(tp).

In Ref. [47], geNov was experimentally studied with the 24 variants of the three runs submitted to the TREC 2007 Genomics track by York University [66]. In this chapter, we further study the metric with the variants of 60 more runs submitted to the TREC 2007 Genomics track. In the TREC 2006 and 2007 Genomics tracks [13], a corpus of 162,259 HTML-formatted documents was used by researchers to search for answering passages for a list of 36 topics. Each research group could submit up to three runs, each of which may include up to 1000 nominated passages for each of the 36 topics.




Algorithm 2 Aspect-level performance evaluation, geNov.

Therefore, each run may include up to 36,000 passages. For the TREC 2007 Genomics track, a total of 66 runs from 27 research groups around the world were received, but only 63 runs from 26 groups were available for our study. Excluding the three runs submitted by York University that were already used in the evaluation of geNov in Ref. [47], in this section we report evaluation results for the other 60 runs.

First, we check how differently geNov and the aspect-based MAP assign scores. As shown in Fig. 12.2, geNov consistently assigns higher scores to a run (the original run or its optimal counterpart obtained by reranking) than aspect-based MAP. The figure also reveals that both aspect-based MAP and geNov assign significantly higher scores to the optimal counterpart run than to the original run. An optimal run is obtained by reranking the original run such that the new run receives the highest possible score from the designated metric. The results shown in Fig. 12.2 are consistent with the conclusion obtained in Ref. [47] with three runs.

Next, we check how geNov responds to different ranking qualities compared with aspect-based MAP. We use the same method as in Ref. [47] to rerank each of the 60 submitted runs and generate eight categories of variant runs, such that each category represents a different ranking quality. Each category therefore has 60 variant runs, one from each of the 60 submitted runs, for a total of 480


variant runs. The eight sets of variants are divided into two groups. The first group comprises sets 1 to 5, which differ in how the novel, redundant, and irrelevant passages are distributed in a ranked list for each topic in a run; this group mainly checks a metric's ability to differentiate between novelty and redundancy, and between relevancy and irrelevancy. The second group comprises sets 6 to 8, which differ in whether passages are ranked based on their level of novelty or their level of relevancy; this group checks how well a metric recognizes the levels of novelty and relevancy of the passages.

We use the same labeling as in Ref. [47] for each category of variant runs: "Nov" denotes the novel passages ranked by novelty, "Red" the redundant passages ranked by relevancy, "Ire" the irrelevant passages, "Nov-1" or "Red-1" the novel or redundant passages ranked in reverse of their level of novelty or relevancy, "Red + Ire" the redundant and irrelevant passages mixed with their relative positions from the respective submitted runs maintained, and "Nov[Red]" some novel and some redundant passages mixed at the end of the novel-passage section and the beginning of the redundant-passage section. The eight sets of run variants can thus be labeled with their set number and characteristics as:
• 1:Nov,Red,Ire
• 2:Nov,Red+Ire
• 3:Nov[Red],Ire
• 4:Nov,Ire,Red
• 5:Ire,Nov,Red
• 6:Nov,Red-1,Ire
• 7:Nov-1,Red,Ire
• 8:Nov-1,Red-1,Ire

The experimental results for variant group 1 are shown in Fig. 12.3. Aspect-based MAP and geNov both give a significantly low score to 5:Ire,Nov,Red. This is reasonable, since set 5 has the worst rankings, which put irrelevant passages at the top of the lists. Otherwise, aspect-based MAP does not respond to the differences in ranking quality among the other four sets: 1:Nov,Red,Ire, 2:Nov,Red+Ire, 3:Nov[Red],Ire, and 4:Nov,Ire,Red. This is because aspect-based MAP does not consider redundant passages in scoring, and irrelevant passages ranked behind all novel passages have no impact on the score. geNov, by contrast, does assign different scores to these four categories of variant runs. This is consistent with the results obtained from the variants of three runs in Ref. [47], though the score differences reported in Ref. [47] are larger than those obtained in this experiment. The results also indicate that geNov assigns higher scores to 3:Nov[Red],Ire than to 4:Nov,Ire,Red for the variant runs from some research groups, but the other way around for the variant runs from other groups, whereas in Ref. [47] higher scores were consistently assigned to 3:Nov[Red],Ire than to 4:Nov,Ire,Red. In the future, we will investigate what causes this inconsistency.




Figure 12.3 Comparison of the scores assigned by aspect-based MAP and geNov to variant runs from variant group 1 including 1:Nov,Red,Ire, 2:Nov,Red+Ire, 3:Nov[Red],Ire, 4:Nov,Ire,Red, and 5:Ire,Nov,Red. Each subfigure presents the results of variant runs from one research group.


The experimental results for variant group 2 are shown in Fig. 12.4. Again, aspect-based MAP does not distinguish among the four run sets in this group. geNov, however, does recognize the differences in ranking quality and assigns decreasing scores to the four variant sets 1:Nov,Red,Ire, 6:Nov,Red-1,Ire, 7:Nov-1,Red,Ire, and 8:Nov-1,Red-1,Ire. This result is consistent with the result obtained in Ref. [47].

12.5 Evaluation results of diversity and novelty metrics

The experimental study over a set of 10 diversity and/or novelty metrics in Ref. [47] is based on the variants of the three runs submitted to TREC Genomics track '07 by York University; these three runs were generated using the Okapi IR system [24,66,67]. In this section, we perform an empirical study of these 10 metrics based on the variant runs of a larger dataset (the 63 runs submitted to TREC Genomics track '07 by 26 research groups worldwide) and present the results.

12.5.1 Sensitivity to ranking qualities

The metric geNov and the nine other metrics used in the TREC Web track [60] for diversity and novelty evaluation have been evaluated and compared in Ref. [47]. The nine metrics are ERR-IA, nERR-IA, α-DCG, α-nDCG, NRBP, nNRBP, MAP-IA, P-IA, and strec (subtopic recall, s-recall). For geNov, the default parameter values s = 0.5, λ = 0.8, and δ = 0.5 are used; for the other nine metrics, the default value α = 0.5 for α-DCG, α-nDCG, and NRBP and the default value β = 0.5 for NRBP are used. The study in Ref. [47] was conducted with the 24 variants of the three runs submitted to the TREC 2007 Genomics track by York University [66], where eight variants of each submitted run were created to display the different ranking qualities discussed in Section 12.4.3.

In this chapter, we further study these metrics based on the variant runs of the larger dataset (the 63 runs submitted to TREC Genomics track '07 by 26 research groups worldwide). All variant runs for the 63 runs are obtained as discussed in Section 12.4.3. We then apply the metrics to these runs to examine their responses to the differences in ranking quality across the variants. The results in Fig. 12.5 are obtained from variant sets 1 to 5 and concern sensitivity to novelty, redundancy, and irrelevancy; the results in Fig. 12.6 are obtained from variant sets 6 to 8 and concern sensitivity to the levels of novelty and relevancy. Due to space limitations, we only report results from the first run of each research group.

As shown in Fig. 12.5, all 10 metrics give the lowest scores to the variant runs in set 5 (i.e., 5:Ire,Nov,Red). This is expected, since this set places irrelevant passages at the top of the lists and is the worst among the eight sets. Nevertheless, among the nine metrics used in the TREC Web track, only α-nDCG, strec, nERR-IA, and nNRBP have a moderate response to the variant runs in set 3 (i.e., 3:Nov[Red],Ire), which has some




Figure 12.4 Comparison of the scores assigned by aspect-based MAP and geNov to variant runs from variant group 2 including 1:Nov,Red,Ire, 6:Nov,Red-1,Ire, 7:Nov-1,Red,Ire, and 8:Nov-1,Red-1,Ire. Each subfigure presents the results of variant runs from one research group.


Figure 12.5 Sensitivity of the 10 metrics to novelty, redundancy, and irrelevancy. The results are obtained from the variant runs of the 63 runs submitted to TREC Genomics track '07.





novel passages and some redundant passages mixed at the end of the novel-passage section. The other metrics have no perceivable response to this mixture, which reveals that they are incapable of penalizing redundant passages ranked higher than novel ones. None of the nine TREC Web track metrics distinguishes the runs in sets 1 (1:Nov,Red,Ire), 2 (2:Nov,Red+Ire), and 4 (4:Nov,Ire,Red), implying that they do not differentiate between redundant passages and irrelevant ones. geNov, on the other hand,


Figure 12.6 Sensitivity of the 10 metrics to the level of novelty and the level of relevancy. The results are obtained from the variant runs of the 63 runs submitted to TREC Genomics track '07.





assigns distinct scores to the variant runs in all five sets. In particular, geNov assigns decreasing scores to the variant runs from set 1 to set 5; as discussed in Section 12.4.3, the ranking qualities from set 1 to set 5 are intuitively decreasing. The results obtained here are consistent with those obtained in Ref. [47].


From Fig. 12.6, none of the nine metrics used in the TREC Web track could differentiate between the variants in set 1 (1:Nov,Red,Ire) and those in set 6 (6:Nov,Red-1,Ire), or between the variants in set 7 (7:Nov-1,Red,Ire) and those in set 8 (8:Nov-1,Red-1,Ire). This means they have problems recognizing the level of relevancy of redundant passages. The metrics α-nDCG, nNRBP, and nERR-IA do distinguish sets 1 and 6 from sets 7 and 8 in scoring, reflecting their differences with respect to the level of novelty; the other six metrics show only a tiny or no response to these differences, implying that they have problems recognizing the level of novelty. geNov, on the other hand, assigns distinct scores to the variant runs in all four sets. In particular, the scores assigned by geNov decrease in the order of sets 1, 6, 7, and 8, consistent with our intuition about the ranking qualities of the four sets as discussed in Section 12.4.3. The conclusion obtained here is consistent with that of Ref. [47].

12.5.2 Discriminative power and running time

Discriminative power has been used in metric evaluation as a standard criterion for the reliability of a metric [45]. The method simply computes a significance test between every pair of experimental runs and reports the percentage of pairs that are significant at some fixed significance level. A study has been conducted to evaluate the discriminative powers of the 10 metrics above, based on the 24 variant runs derived from the three runs submitted to TREC Genomics track '07, which produce 276 pairs in total. The discriminative power is calculated using the two-tailed paired t-test with a significance level of 0.05, and the retrieval depth k is set to 1000. The result indicates that geNov has the highest discriminative power among all 10 metrics; for more details, please refer to Ref. [47].

In the above discriminative power test, each set of eight variant runs derived from the same submitted run has the same novel, redundant, and irrelevant passages (though distributed differently across the eight variants after reranking), so some precision-oriented measures may obtain exceptionally low discriminative power when k = 1000. Therefore, we use all 63 runs submitted to the TREC Genomics track 2007 as a new dataset to further test the discriminative powers of the 10 metrics. These 63 runs produce 1953 pairs in total for the discriminative power experiment. The results are shown in Table 12.1, where the metrics are sorted in descending order of discriminative power. These results differ from those in Ref. [47], but geNov is still among the top three measures with the highest discriminative power.

A study has also been conducted to compare the runtime performance of the 10 diversity and novelty evaluation algorithms above [47]. The result indicates that geNov takes about 36% of the time that any of the other nine state-of-the-art algorithms typically uses; for more details, please refer to Ref. [47].




Table 12.1 The discriminative powers of measures under the two-tailed paired t-test with a significance level of 0.05 and a retrieval depth of 1000. The test is conducted with 1953 pairs of measures from the 63 runs submitted to the TREC Genomics track 2007.

Measure     Discriminative power@1000
P-IA        56.02%
strec       52.07%
geNov       50.94%
α-nDCG      49.10%
α-DCG       40.71%
nERR-IA     39.89%
nNRBP       34.51%
ERR-IA      31.95%
MAP-IA      29.59%
NRBP        27.39%
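A sketch of the pairwise significance-testing procedure behind Table 12.1, using SciPy's two-tailed paired t-test; the layout of the per-topic metric scores and the function name are assumptions for illustration.

```python
from itertools import combinations
from scipy import stats

def discriminative_power(per_topic_scores, significance=0.05):
    """Fraction of run pairs whose per-topic metric scores differ
    significantly under a two-tailed paired t-test [45].

    per_topic_scores maps each run id to its list of per-topic scores
    (computed by some metric at a fixed retrieval depth, e.g., k = 1000)."""
    runs = list(per_topic_scores)
    pairs = list(combinations(runs, 2))   # 63 runs yield 1953 pairs
    significant = sum(
        1 for a, b in pairs
        if stats.ttest_rel(per_topic_scores[a], per_topic_scores[b]).pvalue
           < significance)
    return significant / len(pairs)
```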

12.6 Summary and future work

In this chapter, biomedical IR boosting methods for diversity and novelty and their evaluation metrics were reviewed and reexamined. We presented three sample methods that boost diversity and novelty in biomedical IR: maximal marginal relevance, probabilistic latent semantic analysis, and Markov random fields. Many other boosting methods in the literature could not be covered due to space constraints. We also presented three sample frameworks for diversity and novelty evaluation: the subtopic retrieval metrics, α-nDCG, and geNov. geNov is relatively new, while the other two frameworks, especially subtopic recall and α-nDCG, have been widely used in diversity and novelty evaluations; again, many other evaluation metrics could not be covered here due to space limitations.

We experimentally reexamined geNov with a larger dataset, presenting results on the differences between its scores for regular and optimal runs and on how well geNov distinguishes different ranking qualities. Finally, we conducted an empirical study over a set of 10 diversity and/or novelty metrics with the larger dataset, reporting their abilities to recognize ranking qualities, their discriminative powers, and their running times. geNov shows advantages over the other nine metrics in recognizing ranking qualities, discriminative power, and running time.

There is much interesting research to explore in the future. First, we will study more metrics for measuring diversity and novelty. Second, we will conduct an in-depth study of more methods that can boost diversity and novelty [68–70]. It is also


interesting to evaluate the methods and metrics presented in this chapter on more datasets (e.g., Ref. [71]) and to apply these ideas and concepts in other real-world applications (e.g., Refs. [38,72,73]).

Acknowledgments
We thank the book editor, Prof. David Feng, and his assistant, Cindy Bai, for their great support during the writing of this chapter. We also thank the reviewers for their insightful feedback. In addition, we would like to thank the National Institute of Standards and Technology in Gaithersburg, USA, for providing us with the biomedical/genomics datasets. This research was supported by the Natural Sciences and Engineering Research Council of Canada and the Premier's Research Excellence Award.

References
[1] J. Luo, M. Wu, D. Gopukumar, Y. Zhao, Big data application in biomedical research and health care: a literature review, Biomedical Informatics Insights 8 (2016) 1–10.
[2] Z. Liang, G. Zhang, J.X. Huang, Q.V. Hu, Deep learning for healthcare decision making with EMRs, in: BIBM'14, 2014, pp. 556–559.
[3] W. Raghupathi, V. Raghupathi, Big data analytics in healthcare: promise and potential, Health Information Science and Systems 2 (3) (2014) 1–10.
[4] A.M. Cohen, W.R. Hersh, A survey of current work in biomedical text mining, Briefings in Bioinformatics 6 (1) (2005) 57–71.
[5] W.R. Hersh, Information retrieval and digital libraries, in: E.H. Shortliffe, J.J. Cimino (Eds.), Biomedical Informatics, Springer-Verlag London, 2014, pp. 613–641.
[6] M. DeBakey, The national library of medicine: evolution of a premier information center, Journal of the American Medical Association 266 (1991) 1252–1258.
[7] W. Miles, A History of the National Library of Medicine: The Nation's Treasury of Medical Knowledge, Technical report, U.S. Department of Health and Human Services, Bethesda, 1982.
[8] J. Carbonell, J. Goldstein, The use of MMR, diversity-based reranking for reordering documents and producing summaries, in: SIGIR'98, 1998, pp. 335–336.
[9] C.L.A. Clarke, M. Kolla, G.V. Cormack, O. Vechtomova, A. Ashkan, S. Büttcher, I. MacKinnon, Novelty and diversity in information retrieval evaluation, in: SIGIR'08, 2008, pp. 659–666.
[10] J. Miao, J.X. Huang, J. Zhao, TopPRF: a probabilistic framework for integrating topic space into pseudo relevance feedback, ACM Transactions on Information Systems 34 (4) (2016) 22:1–22:36.
[11] B. Boyce, Beyond topicality: a two stage view of relevance and the retrieval process, Information Processing & Management 18 (3) (1982) 105–109.
[12] S.P. Harter, Psychological relevance and information science, Journal of the American Society for Information Science 43 (9) (1992) 602–615.
[13] W. Hersh, A. Cohen, L. Ruslen, P. Roberts, TREC 2007 Genomics track overview, in: TREC'07, NIST Special Publication SP 500-274, 2007.
[14] C. Zhai, W.W. Cohen, J. Lafferty, Beyond independent relevance: methods and evaluation metrics for subtopic retrieval, in: SIGIR'03, 2003, pp. 10–17.
[15] F. Corcoglioniti, M. Dragoni, M. Rospocher, A.P. Aprosio, Knowledge extraction for information retrieval, in: ESWC'16, 2016, pp. 317–333.
[16] G.V. Cormack, T.R. Lynam, Statistical precision of information retrieval evaluation, in: SIGIR'06, 2006.
[17] Q. Hu, J.X. Huang, X. Hu, Modeling and mining term association for improving biomedical information retrieval performance, BMC Bioinformatics 12 (Suppl. 9) (2012).
[18] K. Järvelin, J. Kekäläinen, Cumulated gain-based evaluation of IR techniques, ACM Transactions on Information Systems 20 (4) (2002) 422–446.
[19] B. Koopman, P. Bruza, L. Sitbon, M. Lawley, Evaluating medical information retrieval, in: SIGIR'11, 2011, pp. 1139–1140.




[20] L. Tamine, C. Chouquet, T. Palmer, Analysis of biomedical and health queries: lessons learned from TREC and CLEF evaluation benchmarks, Journal of the American Society for Information Science and Technology 66 (12) (2015) 2626–2642.
[21] J. Waitelonis, C. Exeler, H. Sack, Linked data enabled generalized vector space model to improve document retrieval, in: NLP & DBpedia@ISWC'15, 2015, pp. 34–44.
[22] E. Yilmaz, E. Kanoulas, J.A. Aslam, A simple and efficient sampling method for estimating AP and nDCG, in: SIGIR'08, 2008, pp. 603–610.
[23] C. Crangle, A. Zbyslaw, J.M. Cherry, E.L. Hong, Concept extraction and synonymy management for biomedical information retrieval, in: TREC'04, 2004.
[24] J.X. Huang, M. Zhong, L. Si, York University at TREC 2005: Genomics track, in: TREC'05, 2005.
[25] A. Abdulla, H. Lin, B. Xu, S.K. Banbhrani, Improving biomedical information retrieval by linear combinations of different query expansion techniques, BMC Bioinformatics 17 (Suppl.) (2016) 443–543.
[26] A.R. Rivas, E.L. Iglesias, L. Borrajo, Study of query expansion techniques and their applications in the biomedical information retrieval, The Scientific World Journal 2014 (2014), Article ID 132158, 10 pages.
[27] J. Jiang, C. Zhai, An empirical study of tokenization strategies for biomedical information retrieval, Information Retrieval 10 (4–5) (2007) 341–363.
[28] S. Karimi, J. Zobel, F. Scholer, Quantifying the impact of concept recognition on biomedical information retrieval, Information Processing & Management 48 (2012) 94–106.
[29] Y. Wang, S. Wu, D. Li, S. Mehrabi, H. Liu, A part-of-speech term weighting scheme for biomedical information retrieval, Journal of Biomedical Informatics 63 (2016) 379–389.
[30] X. Mu, H. Ryu, K. Lu, Supporting effective health and biomedical information retrieval and navigation: a novel facet view interface evaluation, Journal of Biomedical Informatics 44 (2011) 576–586.
[31] P. Daumke, K. Markó, Biomedical information retrieval across languages, Medical Informatics & the Internet in Medicine 32 (2) (2007) 131–147.
[32] Q. Hu, X. Huang, Enhancing genomics information retrieval through dimensional analysis, Journal of Bioinformatics and Computational Biology 11 (3) (2013).
[33] S. Mohan, N. Fiorini, S. Kim, Z. Lu, A fast deep learning model for textual relevance in biomedical information retrieval, in: WWW'18, 2018.
[34] J. Allan, C. Wade, A. Bolivar, Retrieval and novelty detection at the sentence level, in: SIGIR'03, 2003, pp. 314–321.
[35] X. An, J.X. Huang, Boosting novelty for biomedical information retrieval through probabilistic latent semantic analysis, in: SIGIR'13, 2013, pp. 829–832.
[36] J.X. Huang, Q. Hu, A Bayesian learning approach to promoting diversity in ranking for biomedical information retrieval, in: SIGIR'09, 2009, pp. 307–314.
[37] Y. Xu, H. Yin, Novelty and topicality in interactive information retrieval, Journal of the American Society for Information Science and Technology 59 (2) (2008) 201–215.
[38] X. Yin, J.X. Huang, Z. Li, X. Zhou, Survival modeling approach to biomedical search result diversification using Wikipedia, IEEE Transactions on Knowledge and Data Engineering 25 (6) (2013) 1201–1212.
[39] X. An, N. Cercone, How complementary are different information retrieval techniques? A study in biomedical domain, in: CICLing'14, 2014, pp. 367–380.
[40] C.L.A. Clarke, M. Kolla, O. Vechtomova, An effectiveness measure for ambiguous and underspecified queries, in: ICTIR'09, 2009, pp. 188–199.
[41] C.L.A. Clarke, N. Craswell, I. Soboroff, A. Ashkan, A comparative analysis of cascade measures for novelty and diversity, in: WSDM'11, 2011, pp. 75–84.
[42] R. Agrawal, S. Gollapudi, A. Halverson, S. Ieong, Diversifying search results, in: WSDM'09, 2009, pp. 5–14.
[43] O. Chapelle, D. Metzler, Y. Zhang, P. Grinspan, Expected reciprocal rank for graded relevance, in: CIKM'09, 2009, pp. 621–630.
[44] T. Sakai, R. Song, Evaluating diversified search results using per-intent graded relevance, in: SIGIR'11, 2011, pp. 1043–1052.


[45] T. Sakai, Evaluating evaluation metrics based on the bootstrap, in: SIGIR'06, 2006, pp. 525–532.
[46] S. Vargas, P. Castells, Rank and relevance in novelty and diversity metrics for recommender systems, in: RecSys'11, 2011, pp. 109–116.
[47] X. An, J.X. Huang, geNov: a new metric for measuring novelty and relevancy in biomedical information retrieval, Journal of the Association for Information Science and Technology 68 (11) (2017) 2620–2635.
[48] X. An, N. Cercone, H. Wang, Z. Ye, A study on novelty evaluation in biomedical information retrieval, in: SPIRE'12, 2012, pp. 54–60.
[49] J.F. Forst, A. Tombros, T. Roelleke, Less is more: maximal marginal relevance as a summarisation feature, in: ICTIR'09, 2009, pp. 350–353.
[50] J. Goldstein, J. Carbonell, Summarization: (1) using MMR for diversity-based reranking and (2) evaluating summaries, in: TIPSTER'98, Baltimore, Maryland, 1998.
[51] S. Guo, S. Sanner, Probabilistic latent maximal marginal relevance, in: SIGIR'10, 2010.
[52] M. Koniaris, I. Anagnostopoulos, Y. Vassiliou, Evaluation of diversification techniques for legal information retrieval, Algorithms 10 (1) (2017).
[53] S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Landauer, R. Harshman, Indexing by latent semantic analysis, Journal of the American Society for Information Science 41 (6) (1990) 391–407.
[54] T. Hofmann, Probabilistic latent semantic analysis, in: UAI'99, 1999.
[55] D.M. Blei, A.Y. Ng, M.I. Jordan, Latent Dirichlet allocation, Journal of Machine Learning Research 3 (2003) 993–1022.
[56] X. Yin, Z. Li, X. Huang, X. Hu, Promoting ranking diversity for genomics search with relevance-novelty combined model, BMC Bioinformatics 12 (Suppl. 5) (2011).
[57] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, San Francisco, CA, 1988.
[58] T. Arni, P. Clough, M. Sanderson, M. Grubinger, Overview of the ImageCLEFphoto 2008 photographic retrieval task, in: Evaluating Systems for Multilingual and Multimodal Information Access, 2008.
[59] P. Chandar, B. Carterette, Analysis of various evaluation measures for diversity, in: ECIR'11, Dublin, Ireland, 2011.
[60] K. Collins-Thompson, C. Macdonald, P. Bennett, F. Diaz, E.M. Voorhees, TREC 2014 web track overview, in: TREC'14, 2014.
[61] T. Leelanupab, G. Zuccon, J.M. Jose, A query-basis approach to parametrizing novelty-biased cumulative gain, in: ICTIR'11, 2011, pp. 327–331.
[62] J. Luo, C. Wing, H. Yang, M. Hearst, The water filling model and the cube test: multi-dimensional evaluation for professional search, in: CIKM'13, 2013.
[63] T. Sakai, N. Craswell, R. Song, S. Robertson, Z. Dou, C.-Y. Lin, Simple evaluation metrics for diversified search results, in: EVIA'10, 2010.
[64] D. Wemhoener, J. Allan, Balancing aspects in retrieved search results, in: ICTIR'15, 2015, pp. 305–308.
[65] H.-T. Yu, A. Jatowt, R. Blanco, H. Joho, J.M. Jose, An in-depth study on diversity evaluation: the importance of intrinsic diversity, Information Processing & Management 53 (2017) 799–813.
[66] J.X. Huang, D. Sotoudeh-Hosseini, H. Rohian, X. An, York University at TREC 2007: Genomics track, in: TREC'07, 2007, pp. 787–791.
[67] M. Hancock-Beaulieu, M. Gatford, X. Huang, S.E. Robertson, S. Walker, P.W. Williams, Okapi at TREC-5, in: TREC'96, 1996.
[68] X. Huang, F. Peng, D. Schuurmans, N. Cercone, S.E. Robertson, Applying machine learning to text segmentation for information retrieval, Information Retrieval 6 (3–4) (2003) 333–362.
[69] J. Miao, J.X. Huang, Z. Ye, Proximity-based Rocchio's model for pseudo relevance feedback, in: SIGIR'12, 2012, pp. 535–544.
[70] J. Zhao, J.X. Huang, B. He, CRTER: using cross terms to enhance probabilistic information retrieval, in: SIGIR'11, 2011, pp. 155–164.
[71] Y. Liu, X. Huang, A. An, X. Yu, ARSA: a sentiment-aware model for predicting sales performance using blogs, in: SIGIR'07, 2007, pp. 607–614.




[72] W. Feng, Q. Zhang, G. Hu, J.X. Huang, Mining network data for intrusion detection through combining SVMs with ant colony networks, Future Generation Computer Systems 37 (2014) 127–140.
[73] X. Yu, Y. Liu, X. Huang, A. An, Mining online reviews for predicting sales performance: a case study in the movie domain, IEEE Transactions on Knowledge and Data Engineering 24 (4) (2012) 720–734.

CHAPTER THIRTEEN

Toward large-scale histopathological image analysis via deep learning

Bin Kong1,a, Zhongyu Li1,a and Shaoting Zhang1
1 Department of Computer Science, University of North Carolina at Charlotte, Charlotte, NC, United States

13.1 Introduction

Different types of cancers endanger lives worldwide. Fortunately, the recovery rate of multiple types of cancer (e.g., breast cancer, gastric cancer) can be greatly improved with early diagnosis and treatment [1]. The histopathological image has served as the gold standard of cancer diagnosis and has played a vital role in clinical settings for over a century. Traditionally, tissue samples were reviewed by pathologists under a microscope, a process that is time-consuming and prone to error. This is changing in the age of digitized glass slides. The current practice is to first digitize the glass slides into histopathological images with a whole-slide image (WSI) scanner and then employ specialized software (i.e., a virtual slide viewer [2]) to view and inspect these images. WSI scanners can digitize glass slides into histopathological images in just a couple of minutes, and the software can present histopathological images on a screen in real time. Nevertheless, due to the complexity and sheer volume of histopathological image data, effectively diagnosing cancer in gigapixel-sized histopathological images still requires a considerable amount of an experienced pathologist's time, and the inter- and intra-observer errors introduced in this process make it even more challenging.

Computer-assisted diagnosis (CAD) is a potentially powerful tool for more reliable, efficient, and consistent cancer diagnosis from histopathological images. Currently, state-of-the-art deep learning-based algorithms can automatically analyze the properties (e.g., cells and cancer regions) of histopathological images. Nevertheless, these powerful algorithms are still at the prototype stage and thus are not widely used. In this chapter, we aim to bridge this gap by reviewing recent efforts toward CAD-based histopathological image analysis algorithms, introducing the unique challenges encountered when building effective cancer diagnostic tools and the essential knowledge needed to overcome some of them.

a Indicates equal contribution



13.2 Unique challenges in histopathological image analysis
In this section, we briefly introduce the challenges of developing algorithms for analyzing histopathological images. The discipline of pathology is built on more than a century of microscopic diagnosis of cancer tissue samples. The creation of histopathological images is complex. Surgeons or other clinicians collect cancer tissue samples and send them to the pathology laboratory for analysis. There, the materials are processed, and glass slides are created from portions of the specimens. Finally, the glass slides are reviewed, and a report is dictated by pathologists. In the pathology laboratory, the tissue samples undergo a series of processing steps before the microscopic histopathological images are acquired, as illustrated in Fig. 13.1. First, a cancer tissue specimen preserved in formalin arrives in the pathology laboratory. A pathologist then thoroughly examines the specimen (e.g., measuring, sketching, and taking photos) and cuts out multiple pieces of tissue for further analysis. An automatic processing machine processes these pieces with chemical solutions, the pieces are embedded into paraffin blocks, and a microtome is used to section micrometer-thin slices of tissue from the blocks. These almost-transparent tissue slices are then placed on glass slides and stained with different colors to highlight different tissue structures. Finally, the glass slides are digitized with a WSI scanner to create histopathological images. Typically, a large set of glass slides is generated during this process.

Figure 13.1 The typical process of producing glass slides in the pathology laboratory.


Figure 13.2 Tissue folds, out-of-focus regions, bubbles, dirt and dust in histopathological images.

Although parts of the process are automated nowadays, the whole procedure still must be supervised. Errors during this process lead to different image artifacts (e.g., tissue folds, out-of-focus regions, bubbles, dirt, and dust), as shown in Fig. 13.2. Moreover, differences in any step of histopathological image preparation may result in completely different images. In addition, tissues may have very different colors, anatomical structures, and nucleus shapes, resulting in large variations among histopathological images, as shown in Fig. 13.3. Addressing these variations is the major concern in developing CAD-based histopathological image analysis systems. Earlier work [3] on histopathological image analysis mainly focused on quantitative analysis of manually selected regions. Nevertheless, it is also vital to develop algorithms that directly analyze whole-slide images, because this allows quantitative analysis of the entire landscape of histopathological images. A gigapixel-sized WSI can have an image resolution of up to 200,000 × 100,000 pixels and contain tens of thousands or even millions of nuclei or cells. The sheer volume of WSI data presents multiple unique challenges. First, it takes a significant amount of time to analyze the entire image, especially with state-of-the-art algorithms such as deep learning. The expensive

Figure 13.3 Large variations (e.g., color, anatomical structure, and nucleus shape) of histopathological images.


computation requirement becomes a barrier to clinical deployment. In order to address this problem, recent research has mainly been based on the sliding-window approach. However, this may lead to the second challenge: each patch is considered independently and the correlation of topological or structural information between adjacent patches is ignored, thus lowering diagnostic accuracy.

13.3 Computer-aided diagnosis for histopathological image analysis
In this section, we briefly review recent advances in histopathological image analysis, including fine-grained analysis of regions of interest (ROIs), high-level analysis of WSIs, and deep learning acceleration for histopathological image analysis, with an emphasis on recent deep learning methods. Automatic histopathological image analysis systems date back to the earliest days of slide digitization. In recent years, with top-performing convolutional neural networks (CNNs) revolutionizing computer vision [4–6] and medical imaging [7–10], deep neural networks have gained popularity in the histopathological image analysis community [11–13]. Neural networks consist of layers of artificial neurons. These hierarchical architectures are able to learn multiple levels of representation and discover complex nonlinear features and relationships in data, making it much easier to extract key information for high-level tasks such as classification and segmentation. However, a great amount of labeled training data is required to optimally train deep learning models with large numbers of parameters. Fortunately, with more publicly available large-scale histopathological image databases such as the Cancer Genome Atlas [14], deep learning-based methods have become a methodology of choice for histopathological image analysis tasks. In this field, deep neural networks were originally introduced to solve traditional histopathological image analysis tasks, including detection, segmentation, and classification of nuclei or cells and segmentation of certain anatomical structures such as tubules or organs. These fine-grained analysis methods are usually limited to small ROIs because of their intensive computational cost. Recently, however, several lines of research have been conducted to provide high-level analysis directly on WSIs.

13.3.1 Fine-grained analysis of regions of interest
The grading of invasive cancer is usually based on histologic criteria such as the Bloom–Richardson grading system [15]. Thus, earlier work [3] on histopathological image analysis evaluates the aggressiveness of invasive cancer based on histologic characteristics such as mitotic activity, tubule formation, and nuclear pleomorphism (nuclear features).


Recently, the increasing popularity of deep learning, as well as the introduction of multiple histopathological image challenges, has fostered the development of CAD approaches to these problems. For instance, in the ICPR 2012 challenge, the best-performing team outperformed other approaches with a CNN-based method [11]. They first applied a trained CNN to the unseen histopathological image in a sliding-window manner, yielding a probability map that was postprocessed to predict mitosis locations. The same team won the AMIDA13 challenge with a similar method. Xu et al. [16] addressed the problem of nuclei detection with a stacked sparse autoencoder: in the training phase, the autoencoder was trained to reconstruct nuclei; in the testing stage, patches with low reconstruction errors were deemed nuclei, and vice versa. Gland segmentation has also been extensively explored. In Ref. [17], Chen et al. designed a multihead network to simultaneously predict a contour and a mask for each gland; by integrating the contour information, this unified network was able to separate overlapping or clustered glands. Multiple works focus on speeding up fine-grained analysis of histopathological images. For instance, Wang et al. [18] replaced all the convolution, pooling, and fully connected layers with d-regularly sparse kernels [19], so that the resulting model can take a large image as input instead of a small patch. Xu et al. [20] focused on speeding up cell detection with asynchronous prefetching and GPU parallelization. However, a tremendous amount of computational resources is still required.

13.3.2 High-level analysis of whole-slide images
Due to computational cost, fine-grained histologic evaluation of histopathological images is usually limited to some predefined ROIs. Additionally, Basavanhally et al. [21] demonstrated that computerized Bloom–Richardson grading of invasive breast cancers within predefined ROIs of WSIs is problematic. Recently, some approaches have shown the potential of deep learning-based methods for directly analyzing WSIs. Hou et al. [22] present a weakly supervised approach to classify WSIs: as only the ground truth labels of the WSIs are given, they extract discriminative patches with expectation maximization to train a CNN. In the testing stage, each patch of the WSI is scored by the trained CNN, and the scores are aggregated to determine cancer subtypes. To the best of our knowledge, the first challenge that directly analyzed WSIs was CAMELYON16 [23]. Instead of focusing on histologic-level predictions, CAMELYON16 addressed the classification of WSIs into two categories, cancerous or normal. Moreover, participants in CAMELYON16 were asked to tackle a second challenge: detecting invasive cancer regions in WSIs. Gigabyte WSIs cannot be fed into CNNs for prediction due to memory limits. To address this problem, the best-performing team [24] extracts patches of fixed size (e.g., 256 × 256) in a sliding-window fashion and feeds them to a GoogleNet [25]. Kong et al. [26] extend this approach by considering neighborhood context information decoded using a recurrent neural


network (RNN). Using model ensembles (i.e., several Inception V3 models [27]), Liu et al. [28] bring the detection result to 88.5% in terms of average FROC on the CAMELYON16 dataset. Chen et al. [29] present an augmented reality microscope that gives pathologists real-time results from deep learning methods. More recently, Xing et al. [13] presented a comprehensive review of deep learning techniques for the analysis of microscopy images, summarizing current deep learning achievements in tasks such as detection, segmentation, and classification. Nevertheless, the above deep learning-based methods face a common problem: the computation is expensive, which becomes a barrier to clinical deployment.

13.3.3 Deep learning acceleration for histopathological image analysis
One of the most challenging problems in WSI analysis is handling large-scale images (e.g., 2 GB or more). High-performance computing resources such as cloud computing and HPC machines mitigate the computational challenges, but they either suffer from data traffic problems or are not always available due to high cost. Thus, high accuracy and efficiency are essential for deploying WSI invasive cancer diagnosis software in clinical applications. Few previous works focus on both accuracy and computational efficiency in computer-aided WSI cancer diagnosis. Resource efficiency in neural networks, such as network compression, is an active research topic: product quantization [30], hashing, Huffman coding, and low-bit networks [31] have been studied, but they sacrifice accuracy.

13.4 Deep learning for histopathological image analysis
In this section, we present our solutions for the analysis of histopathological images based on recent advances in deep learning techniques. In particular, we consider the spatial information among neighboring patches for more accurate analysis, which differs from related methods that tackle image patches separately.

13.4.1 Overview
The top row of Fig. 13.4 shows an overview of the proposed framework. First, WSIs are divided into small image patches of a fixed size. Unlike previous deep neural network methods that treat each small image patch independently, the proposed framework considers each image patch together with its neighbors. In particular, the Spatio-Net not only extracts discriminative features from each patch but also considers the spatially structured information embedded among the image patch and its neighbors. The Spatio-Net then returns a probability map, which indicates the probability of each image patch being a normal or a tumor region. Metastases in each WSI can then be located from the generated probability map.
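To make the patch-division step concrete, the following is a minimal sketch of how a WSI can be split into a grid of fixed-size patches whose 3 × 3 neighborhoods feed the Spatio-Net. This is our illustration, not the authors' code; the array layout, the 256-pixel patch size, and the function name are assumptions.

```python
import numpy as np

def tile_wsi(wsi: np.ndarray, patch: int = 256) -> np.ndarray:
    """Sketch: split a WSI array of shape (H, W, 3) into a grid of patches.
    Border pixels that do not fill a whole patch are dropped for simplicity."""
    rows, cols = wsi.shape[0] // patch, wsi.shape[1] // patch
    cropped = wsi[:rows * patch, :cols * patch]
    # Result shape (rows, cols, patch, patch, 3): one cell per patch, keeping
    # the spatial layout so each patch's 8 neighbors are adjacent grid cells.
    return (cropped.reshape(rows, patch, cols, patch, 3)
                   .transpose(0, 2, 1, 3, 4))
```

Keeping the patches in a 2D grid rather than a flat list preserves the neighborhood structure that the spatially structured constraint relies on.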


Figure 13.4 Overview of the proposed framework and the architectures of our Spatio-Net [26]. The top row is the proposed framework. The WSIs are divided into small patches. Each patch and its neighbors are fed into Spatio-Net with the spatially structured constraint, resulting in a probability map, which is further processed to locate the metastases. The bottom row shows the detailed structure of the Spatio-Net. The CNN encodes each patch and its neighbors into a grid of fixed-length vectors. Afterward, 2D-LSTM layers are employed to further explore the remaining spatially structured information in the grid to give a more accurate prediction. Unlike Ref. [24], Spatio-Net explicitly models the spatially structured information in 2D-LSTM layers.

The Spatio-Net architecture (the bottom row of Fig. 13.4) includes two main modules: a CNN and a two-dimensional long short-term memory (2D-LSTM). The CNN acts as an effective feature extractor that encodes each patch, together with its neighboring patches, into compact fixed-length vectors, resulting in a small grid of vectors. The spatially structured information embedded in this grid is then further explored by the 2D-LSTM layers.

13.4.2 Patch encoding with convolutional neural networks
Handcrafted features [32,33] are not powerful enough to represent and discriminate the large variances of WSIs. To fully capture the useful information from every image patch, a CNN is employed as a powerful feature extractor. In recent years, we have witnessed numerous kinds of CNN architectures [5,25,34]. In our framework, the deep residual network [6] is employed for feature extraction, since it is highly discriminative and can distinguish the subtle differences between normal and tumor patches. Essentially, the deep residual network acts as a feature transformer $\psi(I; \Theta)$, which maps the image patch $I$ to a fixed-length vector $x \in \mathbb{R}^d$, where $\Theta$ denotes the learned weights.
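A minimal PyTorch sketch of this encoding step follows. It is our illustration rather than the authors' implementation; the 224 × 224 patch size, the random input, and the use of torchvision are assumptions.

```python
import torch
import torchvision.models as models

# ResNet-101 trunk as the feature transformer psi(I; Theta): drop the final
# classification head so each patch maps to a fixed-length vector (d = 2048).
resnet = models.resnet101()  # load pretrained weights in practice
encoder = torch.nn.Sequential(*list(resnet.children())[:-1])

patches = torch.randn(9, 3, 224, 224)   # a patch and its 8 neighbors (3 x 3)
with torch.no_grad():
    x = encoder(patches).flatten(1)     # (9, 2048) fixed-length vectors
grid = x.view(3, 3, -1)                 # restore the 3 x 3 spatial layout
```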


13.4.3 Accurate prediction via two-dimensional long short-term memory
A straightforward way to incorporate the spatial dependencies among image patches is postprocessing, such as smoothing and averaging neighboring predictions. However, the patch configurations are usually complex, and the spatial dependencies captured by postprocessing are always suboptimal. Thus, we incorporate the spatial dependency by passing the above-mentioned grid of feature vectors into a 2D-LSTM model. Compared with a traditional RNN [35], an LSTM [36] is much easier to train because of its special structure, which avoids the vanishing or exploding gradient problem during backpropagation. Each LSTM unit at the current time step $t$ contains a hidden state $h_t$ and a memory cell $c_t$. The memory cell learns when to forget previous memory and when to update it. In addition, the unit contains four gates that control the flow of the corresponding information: the input gate $i_t$, forget gate $f_t$, memory gate $m_t$, and output gate $o_t$. Accordingly, the hidden state and the memory cell for the next time step $t+1$ are updated as

$$
\begin{aligned}
i_{t+1} &= \sigma(W_i H_t), \quad f_{t+1} = \sigma(W_f H_t), \quad o_{t+1} = \sigma(W_o H_t), \quad m_{t+1} = \sigma(W_m H_t),\\
c_{t+1} &= f_{t+1} \odot c_t + i_{t+1} \odot m_t,\\
h_{t+1} &= \tanh(o_{t+1} \odot c_t),
\end{aligned}
\tag{13.1}
$$

where $H_t$ is the concatenation of the input $x_t$ and the current hidden state $h_t$; $W_i$, $W_f$, $W_o$, and $W_m$ are the weight matrices for the input, forget, output, and memory gates, respectively; the nonlinear function $\sigma(x) = (1 + e^{-x})^{-1}$ squashes its input $x$ to $(0, 1)$; and $\odot$ denotes the element-wise product. Following [37], LSTM is used as a shorthand for Eq. (13.1), which can then be simplified as

$$
(m_{t+1}, h_{t+1}) = \mathrm{LSTM}(H_t, m_t, W)
\tag{13.2}
$$

where $W$ is the concatenation of the weight matrices of the four gates discussed above. To determine the current hidden states at each location $j$, we extract the corresponding hidden states from the $N$ neighboring ($N = 8$ in our case) LSTM units in the $i$-th 2D-LSTM layer and the corresponding hidden states from the previous 2D-LSTM layer. Let $h^s_{j,i,n}$ ($n = 1, 2, \ldots, N$) denote the hidden states from the $n$-th neighboring LSTM unit in the $i$-th 2D-LSTM layer, and let $h^d_{j,i}$ denote the hidden states of the $i$-th 2D-LSTM layer at location $j$. As in a regular 1D-LSTM, for the current LSTM unit $j$ we use $H_{j,i}$ as the concatenation of the neighboring hidden states $h^s_{j,i,n}$ and the input from the $i$-th 2D-LSTM layer at the corresponding position, $h^d_{j,i}$. The memory cells and hidden states at each location $j$ in the $(i+1)$-th 2D-LSTM layer are calculated by

$$
\begin{aligned}
\bigl(m^s_{j,i+1,n},\, h^s_{j,i+1,n}\bigr) &= \mathrm{LSTM}\bigl(H_{j,i},\, m^s_{j,i,n},\, W^s_i\bigr)\\
\bigl(m^d_{j,i+1},\, h^d_{j,i+1}\bigr) &= \mathrm{LSTM}\bigl(H_{j,i},\, m^d_{j,i},\, W^d_i\bigr)
\end{aligned}
\tag{13.3}
$$

where $m^s_{j,i,n}$ is the memory cell for the $n$-th neighboring 2D-LSTM node and $m^d_{j,i}$ is the corresponding memory cell from the previous layer. $W^s_i$ and $W^d_i$ are the weight matrices for the spatial and depth dimensions, respectively. The 2D-LSTM layers are followed by a fully connected layer and normalized by a softmax function. The final output can be interpreted as the probabilities of patches being tumorous or normal.
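The gate updates of Eq. (13.1) can be sketched for a single unit as follows. This is our illustration, following the equation as printed; in the full 2D-LSTM the same update is applied with hidden states gathered from the N = 8 neighboring units and from the previous layer, as in Eq. (13.3).

```python
import torch
import torch.nn as nn

class GatedUnit(nn.Module):
    """One LSTM unit following Eq. (13.1) as printed (a sketch, not the authors' code)."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        cat_dim = input_dim + hidden_dim           # H_t = [x_t, h_t]
        self.W_i = nn.Linear(cat_dim, hidden_dim)  # input gate weights
        self.W_f = nn.Linear(cat_dim, hidden_dim)  # forget gate weights
        self.W_o = nn.Linear(cat_dim, hidden_dim)  # output gate weights
        self.W_m = nn.Linear(cat_dim, hidden_dim)  # memory gate weights

    def forward(self, x_t, h_t, c_t):
        H_t = torch.cat([x_t, h_t], dim=-1)
        i = torch.sigmoid(self.W_i(H_t))
        f = torch.sigmoid(self.W_f(H_t))
        o = torch.sigmoid(self.W_o(H_t))
        m = torch.sigmoid(self.W_m(H_t))
        c_next = f * c_t + i * m        # memory cell update, Eq. (13.1)
        h_next = torch.tanh(o * c_t)    # hidden state update as printed in Eq. (13.1)
        return h_next, c_next
```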

13.4.4 Loss function
Given an image patch and its neighbors, the proposed Spatio-Net can learn to predict their categories, taking spatially structured information into account. Aside from the 2D-LSTM layers, we further model this constraint in the loss layer. The deep neural network proposed in this chapter defines a classifier $h$. Ideally, given the current patch $x^*$, its neighbors $x_l$ ($l = 1, 2, \ldots, N$), and their ground truth labels $y^*$ and $y_l$, the predictions should be consistent with the labels: $|h(x^*) - h(x_l)|$ should be small if $y^* = y_l$ and large if $y^* \neq y_l$. To enforce this constraint in Spatio-Net, a novel loss function, the spatially structured loss, is defined:

$$
\begin{aligned}
L_{spatio} &= \frac{1}{2}\left(L_{ind} - L_{dif}\right)\\
L_{ind} &= \frac{1}{N}\sum_{x^* \in D}\sum_{l} \mathbb{1}(y^* = y_l)\,\bigl[h(x^*) - h(x_l)\bigr]^2\\
L_{dif} &= \frac{1}{N}\sum_{x^* \in D}\sum_{l} \mathbb{1}(y^* \neq y_l)\,\bigl[h(x^*) - h(x_l)\bigr]^2
\end{aligned}
\tag{13.4}
$$

where $\mathbb{1}(\cdot)$ is the indicator function and $D$ is the training set. $L_{ind}$ ensures that the predictions for $x^*$ and $x_l$ are similar if they belong to the same category (i.e., $y^* = y_l$), while $L_{dif}$ rewards the network for maximally distinguishing $x^*$ from $x_l$ if $y^* \neq y_l$. Having defined the spatially structured loss, the training criterion for the whole network can be formulated.


Given the training set $D$, the training objective becomes the task of estimating the network weights $\lambda = (\Theta, V)$, where $\Theta$ and $V$ are the parameters of the CNN and the 2D-LSTM layers, respectively:

$$
\begin{aligned}
L_{reg}(\lambda) &= \frac{1}{2}\left(\|\Theta\|_2^2 + \|V\|_2^2\right)\\
L_{cls} &= -\sum_{x^* \in D} \log\bigl(h(x^*)\bigr)\\
\lambda^* &= \arg\min_{\lambda}\ \bigl\{L_{cls} + \alpha L_{reg}(\lambda) + \beta L_{spatio}\bigr\}
\end{aligned}
\tag{13.5}
$$

where $L_{reg}$ is the regularization penalty term, which keeps the learned weights small, and $L_{cls}$ is the cross-entropy loss function. $\alpha$ and $\beta$ are hyperparameters that are cross-validated in our experiments.
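A sketch of the spatially structured loss of Eq. (13.4) and the overall objective of Eq. (13.5) is given below; tensor shapes, argument names, and the placeholder values of alpha and beta are our assumptions, not the authors' released code.

```python
import torch

def spatially_structured_loss(h_c, h_n, y_c, y_n):
    """Eq. (13.4): h_c (B,) center predictions, h_n (B, N) neighbor predictions,
    y_c (B,) and y_n (B, N) the corresponding ground-truth labels."""
    diff2 = (h_c.unsqueeze(1) - h_n) ** 2        # [h(x*) - h(x_l)]^2
    same = (y_c.unsqueeze(1) == y_n).float()     # indicator 1(y* = y_l)
    N = h_n.shape[1]
    L_ind = (same * diff2).sum() / N             # same-label terms
    L_dif = ((1.0 - same) * diff2).sum() / N     # different-label terms
    return 0.5 * (L_ind - L_dif)

def total_objective(L_cls, L_spatio, params, alpha=1e-4, beta=0.1):
    """Eq. (13.5): alpha and beta are cross-validated; values here are placeholders."""
    L_reg = 0.5 * sum((p ** 2).sum() for p in params)  # L2 penalty on all weights
    return L_cls + alpha * L_reg + beta * L_spatio
```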

13.4.5 Results and discussions
We conducted multiple experimental evaluations of the Spatio-Net. In the first experiment, we focused on the effectiveness of the spatially structured constraint. We implemented [24] as the baseline system, which is a single-network architecture; the only difference is that we use ResNet [6] with 101 layers (Resnet101) instead of GoogleNet [25], since it achieved better performance in our experiments.1 The other methods follow the same network architecture as the baseline. The differences are that the baseline + PostPro adds smoothing and averaging as postprocessing to the neighboring output labels, which should be similar, and that the Spatio-Net integrates the spatially structured constraint directly into the network. The results of these three algorithms are summarized in Table 13.1. Although adding postprocessing brings a certain spatial dependency, it only marginally improves the result (less than 1%), since the constraint is not integrated into the model. Our Spatio-Net outperforms the method of [24] by more than 5%, benefiting from optimizing the problem with the spatially structured information, whereas the previous network architecture reaches only locally optimal solutions. To summarize, adding the spatially structured constraint does not significantly sacrifice computational efficiency but greatly benefits accuracy.

1 The published state-of-the-art results for the cancer metastasis detection competition were reported in our chosen baseline [24]. However, the competition had already closed and the ground truth of the testing data was no longer available, so we were unable to evaluate our method on the same testing data. Therefore, for a fair comparison, we reimplemented the framework of [24] and evaluated all methods with five-fold cross-validation on the same released dataset.


Table 13.1 Quantitative comparisons. We compare Spatio-Net (Ours without postprocessing) with the state-of-the-art framework [24]. Note that GoogleNet [25] is substituted with ResNet with 101 layers [6] (Resnet101) for better accuracy. We also compare Spatio-Net with Resnet101 with postprocessing (smoothing and averaging operations are added to the output labels as postprocessing to enforce the spatially structured constraints).

Methods      Resnet101    Resnet101 with PostPro.    Ours without PostPro.
Ave. FROC    0.7012       0.7104                     0.7539
STD          0.012        0.015                      0.008

13.5 High-throughput histopathological image analysis
How can we make an invasive cancer detection system as efficient as possible while maintaining its accuracy? We answer this question with a suite of training and inference techniques. (1) For efficiency, we design a small-capacity network based on depth-wise separable convolution [38]. (2) To improve accuracy, we refine the small-capacity network by learning from a large-capacity network trained on the same dataset: we force the logits layer of the small-capacity network to produce responses close to those of the large-capacity network's logits layer, a strategy similar to the one investigated in Ref. [39]. In addition, we use a teacher guided loss to help the small network learn from the intermediate layers of the high-capacity network. (3) To further speed up computation in the inference stage, we avoid repeatedly extracting small patches in a sliding-window fashion and instead convert the model into a fully convolutional network (a network without multilayer perceptron layers). As a result, our method is five times faster than a popular state-of-the-art benchmark solution without sacrificing accuracy.

13.5.1 Overview
Fig. 13.5 shows an overview of the proposed method. The method derives from detection by patch classification (normal patches vs. cancer patches) via a sliding window, but it differs from the traditional detection-by-classification approach: the base network is a small-capacity network designed to solve the patch classification problem with a faster inference speed than a large-capacity network.

13.5.2 Small-capacity network
This small network is trained on the training patches. The small-capacity network has weak learning capability due to its small number of learnable weights, which may cause underfitting and lower inference accuracy than a large-capacity network. To solve this problem, we force the small-capacity network to learn the "useful knowledge" of the high-capacity network in order to improve inference accuracy. Thus, we first train a high-capacity network on the same training set. Then, we distill the small-capacity network's weights in a fine-tuning stage discussed in the following section. In the inference stage, we convert the multilayer perceptron layers (fully connected layers) of the network into fully convolutional layers. This change allows the network to accept arbitrarily sized tiles, so we can use large tiles, resulting in faster speed. The output probability map is postprocessed, and detection results are produced from it using a method similar to Ref. [28].

Figure 13.5 Overview of the proposed framework [40]. The upper part shows the training phase and the lower part the inference phase. Note that only the proposed transfer learning method is illustrated in the training phase.

The training objective function can be denoted as follows:

$$
L = \frac{1}{|S|}\sum_{x \in S}\bigl(L_{cls}(x) + \lambda L_{guide}(x)\bigr) + \gamma L_{reg}
\tag{13.6}
$$

where $S$ is the set of training patches. $L_{cls}$ denotes the classification loss, comprising the softmax loss using the hard ground truth label of the training patch and the regression loss using the soft probability label from the large-capacity network; we discuss it in detail in the following section. $L_{guide}$ is the teacher guided loss, which will be elaborated later. $L_{reg}$ denotes the regularization penalty term, which punishes large weights. Finally, $\lambda$ and $\gamma$ are balancing hyperparameters that control the weights of the different losses and are cross-validated in our experiments. To reduce the model's capacity, we utilize depth-wise separable convolution in our small-capacity network architecture. Depth-wise separable convolution (depth-wise convolution followed by pointwise convolution) was proposed in Ref. [38] and replaces standard convolution layers. Each kernel in a depth-wise convolution layer performs the convolution


operation on only a single channel of the input feature maps. To incorporate cross-channel information and change the number of output feature maps, a pointwise convolution (i.e., a 1 × 1 convolution) is applied after the depth-wise convolution. The depth-wise separable convolution in Ref. [41] achieves a large reduction in computation compared with the corresponding standard convolutions.
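A minimal sketch of such a block, under common MobileNet-style conventions rather than the chapter's exact architecture, is:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Sketch of a depth-wise separable convolution block (cf. Refs. [38,41])."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Depth-wise: one 3x3 kernel per input channel (groups=in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # Point-wise: 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))
```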

13.5.3 Transfer learning from large-capacity network
We utilize a large-capacity network (a deep and wide network with more weights) to "teach" the small-capacity network, adapting the small model toward the large network's manifold so that the logits of the two networks become closer. We use the knowledge of both the output (probability) and the intermediate layers (features) of the large-capacity network to teach the small-capacity network. The network distilling technique proposed in Ref. [39] serves this transfer learning task. The softmax layer transforms the logit $z_i$ for each class into the corresponding probability $p_i$:

$$
p_i = \frac{\exp(z_i / T)}{\sum_{j \in \{0,1\}} \exp(z_j / T)}
\tag{13.7}
$$

where $i = 0$ and $i = 1$ represent the negative and positive labels, respectively, and $T$ is the temperature, which controls the softness of the probability distribution over the labels. A higher temperature $T > 1$ produces a softer probability distribution over the classes, which helps the transfer learning. We use a soft regression loss, $L_{soft} = \|p_i - \hat{p}_i\|^2$, where $p_i$ and $\hat{p}_i$ are the probabilities produced by the small- and large-capacity networks, respectively, to force the small-capacity network's outputs to match those of the large-capacity network. We pretrained the large-capacity network using $T = 2$. In transfer learning, the large-capacity network's weights are fixed, and $T = 2$ is used in both the small and large networks. In prediction, $T = 1$ is used. Additionally, we use the hard ground truth label of the training patch to supervise the training. The total classification loss is then

$$
L_{cls} = L_{hard} + \beta L_{soft}
\tag{13.8}
$$

where $L_{hard}$ denotes the softmax (hard) loss and the hyperparameter $\beta$ controls the weighting of the hard and soft losses; it is cross-validated in our experiments.
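The temperature-softened distillation loss of Eqs. (13.7) and (13.8) can be sketched as follows; argument names and the default weights are our assumptions.

```python
import torch.nn.functional as F

def distillation_cls_loss(student_logits, teacher_logits, labels, T=2.0, beta=1.0):
    """Sketch of the total classification loss of Eq. (13.8). The soft term
    follows the squared-error form L_soft = ||p - p_hat||^2 used in the text."""
    p_student = F.softmax(student_logits / T, dim=1)            # Eq. (13.7)
    p_teacher = F.softmax(teacher_logits / T, dim=1).detach()   # teacher is fixed
    L_soft = ((p_student - p_teacher) ** 2).sum(dim=1).mean()
    L_hard = F.cross_entropy(student_logits, labels)            # softmax (hard) loss
    return L_hard + beta * L_soft
```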

13.5.4 Feature adaptation from intermediate layers
Romero et al. [42] demonstrated that the features learned in the intermediate layers of large-capacity networks can be used to guide a student network to learn effective representations and improve its accuracy. Inspired by this idea, we apply the L2 distance between features of the teacher network, $F_{tea}$, and the student network, $F_{stu}$, which we call the teacher guided loss:

$$
L_{guide} = \|F_{tea} - F_{stu}\|^2
\tag{13.9}
$$

When applying the teacher guided loss, the shape of the feature map from the teacher network must match that of the student network. However, these two features come from different networks, and their shapes can differ. Thus, we use an adaptation layer (i.e., a fully connected layer) to map the feature from the student network to the same shape as that of the teacher network.
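A sketch of the teacher guided loss with its fully connected adaptation layer follows; the feature dimensions are illustrative assumptions.

```python
import torch.nn as nn

class TeacherGuide(nn.Module):
    """Sketch of the teacher guided loss of Eq. (13.9) with an adaptation layer."""
    def __init__(self, stu_dim, tea_dim):
        super().__init__()
        # Fully connected adaptation layer: student feature -> teacher shape
        self.adapt = nn.Linear(stu_dim, tea_dim)

    def forward(self, f_stu, f_tea):
        f_stu = self.adapt(f_stu.flatten(1))             # (B, stu_dim) -> (B, tea_dim)
        f_tea = f_tea.flatten(1).detach()                # teacher weights are fixed
        return ((f_stu - f_tea) ** 2).sum(dim=1).mean()  # L2 distance
```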

13.5.5 Efficient inference
In most popular WSI detection solutions such as [24,28], fixed-size patch-based classification is performed in a sliding-window fashion. The number of forward computations is linear in the number of evaluated patches, and since memory cannot hold all the patches, frequent I/O operations have to be performed; this is the major source of the computational bottleneck. Inspired by Ref. [43], we replace all the fully connected layers in the small-capacity network with equivalent convolutional layers. After the transformation, the fully convolutional network (FCN) can take a significantly larger image if memory allows. Let $size_p$ be the input image size used by the classification network before the FCN transformation. After the transformation, the output of the network is a 2D probability map whose resolution is scaled down by the strided convolution and pooling operations. Let $d$ be the scale factor, and assume that $n$ layers (either convolution or pooling) have stride values greater than 1 (i.e., stride = 2 in our implementation); then $d = 2^n$. A pixel location $x_o$ in the probability map corresponds to the center $x_i$ of a patch of size $size_p$ in the input image. Adjacent centers are displaced $d$ pixels from each other, and $x_i$ is computed as $x_i = d \cdot x_o + \lfloor (size_p - 1)/2 \rfloor$.

13.5.5.1 Results and analysis
We compared the Inception V3 network using an explicit sliding window as proposed in Ref. [28] (method I), an FCN-accelerated Inception V3 implementation (method FI), the student network using an explicit sliding window (method S), the FCN-accelerated student network (method FS), the FCN-accelerated and distilled student network (FDS), and our final proposed approach, the FCN-accelerated and distilled student network with teacher guided loss (FDSG). The stride of the sliding window is 128. The explicit sliding-window method is the most widely used and achieved the state-of-the-art results [24,28]. Note that the original Inception V3 in Ref. [28] utilized eight ensembled models; for a fair comparison, we used only a single model here. Due to GPU memory limitations, for the FCN-based methods (FI, FS, FDS, and FDSG) we partition the WSI into several blocks with overlaps and stitch the probability maps into a single map after inference.
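Before turning to the results, here is a minimal sketch of the two ingredients just described: converting a fully connected layer into an equivalent convolution, and mapping a probability-map pixel back to the corresponding patch center. Function names are our own; this is not the authors' code.

```python
import torch.nn as nn

def fc_to_conv(fc: nn.Linear, in_ch: int, k: int) -> nn.Conv2d:
    """Sketch: turn a fully connected layer that consumed a (in_ch, k, k)
    feature map into an equivalent k x k convolution (cf. Ref. [43])."""
    conv = nn.Conv2d(in_ch, fc.out_features, kernel_size=k)
    conv.weight.data = fc.weight.data.view(fc.out_features, in_ch, k, k)
    conv.bias.data = fc.bias.data
    return conv

def patch_center(x_o: int, n_strided: int, size_p: int) -> int:
    """Map a probability-map pixel x_o back to the input-image patch center
    x_i = d * x_o + floor((size_p - 1) / 2), with d = 2 ** n_strided."""
    d = 2 ** n_strided
    return d * x_o + (size_p - 1) // 2
```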


Table 13.2 Comparison of different detection approaches regarding computational cost and accuracy. Our framework (FDSG) is compared with Inception V3 (I), FCN-accelerated Inception V3 (FI), our student network (S), the FCN-accelerated student network (FS), and the FCN-accelerated and distilled student network (FDS).

Dataset          Metric       I        FI       S        FS       FDS      FDSG
CAMELYON16       Time (s)     1023     546      468      216      216      216
CAMELYON16       Ave. FROC    0.857    0.859    0.809    0.815    0.847    0.856
Gastric cancer   Time (s)     228      138      90       36       36       36
Gastric cancer   Ave. FROC    0.806    0.813    0.768    0.773    0.801    0.811

In method FI, we used 1451 × 1451 blocks with an overlap of 267 pixels; in methods FS, FDS, and FDSG, we used 1792 × 1792 blocks with an overlap of 192 pixels. Fully convolution-based detection significantly speeds up inference compared with the corresponding sliding-window approach. As shown in Table 13.2, method FI is 1.7 and 1.9 times faster than method I for the gastric cancer and CAMELYON16 datasets, respectively, and method FS is 2.5 and 2.2 times faster than method S. Note that the small-capacity model (FS) is also about 2.5 and 2.2 times faster than the large-capacity model (FI) for the two datasets. In addition, we observed that the small-capacity model reduced the Ave. FROC by about 4% and 5% for the gastric cancer and CAMELYON16 datasets, respectively. However, once the small network gained knowledge through transfer learning, its detection accuracy became close to that of the large model. For the CAMELYON16 dataset, a single Inception V3 model caused the Ave. FROC to decrease to 85.7% from the 88.5% reported in Ref. [28]. This drop is expected because ensembled models reduce model variance and overfitting; nevertheless, this result is state-of-the-art among single-model methods. Lin et al. [44] developed an efficient inference algorithm and reported 15 min per WSI on CAMELYON16; while we achieved a much faster computation time, the validations were performed in different hardware and software environments. These experiments demonstrate that we can keep the same detection accuracy as method I while improving efficiency significantly (5 times faster) via model "compression" and transfer learning. Our proposed model is also more memory efficient, costing only 12 MB of memory in contrast to the 84 MB required by method I.

13.6 Summary
State-of-the-art deep CNN-based invasive cancer detection methods have pushed the accuracy boundary closer to clinical application. However, their computational and accuracy demands remain barriers in real clinical setups. We proposed new methods that keep detection accuracy high while using computation and memory efficiently. In particular, we improved detection accuracy by considering the spatial information among neighboring patches, and we showed that a larger, higher-performing network can teach a small network to attain similar predictive power. In this way, the proposed method requires fewer high-performance computing resources and runs much faster than state-of-the-art methods. We therefore expect our work to become more applicable in clinical use.

References
[1] M. Horner, L. Ries, M. Krapcho, N. Neyman, R. Aminou, N. Howlader, S. Altekruse, E. Feuer, L. Huang, A. Mariotto, et al., SEER Cancer Statistics Review, 1975–2006, National Cancer Institute, Bethesda, MD, 2009.
[2] A.J. Gifford, A.J. Colebatch, S. Litkouhi, F. Hersch, W. Warzecha, K. Snook, M. Sywak, A.J. Gill, Remote frozen section examination of breast sentinel lymph nodes by telepathology, ANZ Journal of Surgery 82 (11) (2012) 803–808.
[3] X. Shi, F. Xing, K. Xu, Y. Xie, H. Su, L. Yang, Supervised graph hashing for histopathology image retrieval and classification, Medical Image Analysis 42 (2017) 117–128.
[4] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, T. Darrell, DeCAF: a deep convolutional activation feature for generic visual recognition, in: International Conference on Machine Learning, 2014, pp. 647–655.
[5] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[6] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[7] G. Litjens, T. Kooi, B.E. Bejnordi, A.A.A. Setio, F. Ciompi, M. Ghafoorian, J.A. Van Der Laak, B. Van Ginneken, C.I. Sánchez, A survey on deep learning in medical image analysis, Medical Image Analysis 42 (2017) 60–88.
[8] D. Shen, G. Wu, H.-I. Suk, Deep learning in medical image analysis, Annual Review of Biomedical Engineering 19 (2017) 221–248.
[9] E. Wu, B. Kong, X. Wang, J. Bai, Y. Lu, F. Gao, S. Zhang, K. Cao, Q. Song, S. Lyu, Y. Yin, Residual attention based network for hand bone age assessment, arXiv preprint arXiv:1901.05876.
[10] B. Kong, X. Wang, J. Bai, Y. Lu, F. Gao, K. Cao, Q. Song, S. Zhang, S. Lyu, Y. Yin, Attention-driven tree-structured convolutional LSTM for high dimensional data understanding, arXiv preprint arXiv:1902.10053.
[11] D.C. Cireşan, A. Giusti, L.M. Gambardella, J. Schmidhuber, Mitosis detection in breast cancer histology images with deep neural networks, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2013, pp. 411–418.
[12] A. Cruz-Roa, J. Arevalo, A. Basavanhally, A. Madabhushi, F. González, A comparative evaluation of supervised and unsupervised representation learning approaches for anaplastic medulloblastoma differentiation, in: 10th International Symposium on Medical Information Processing and Analysis, vol. 9287, International Society for Optics and Photonics, 2015, p. 92870G.
[13] F. Xing, Y. Xie, H. Su, F. Liu, L. Yang, Deep learning in microscopy image analysis: a survey, IEEE Transactions on Neural Networks and Learning Systems 29 (2018).
[14] J.N. Weinstein, E.A. Collisson, G.B. Mills, K.R.M. Shaw, B.A. Ozenberger, K. Ellrott, I. Shmulevich, C. Sander, J.M. Stuart, C.G.A.R. Network, et al., The Cancer Genome Atlas pan-cancer analysis project, Nature Genetics 45 (10) (2013) 1113.
[15] C. Genestie, B. Zafrani, B. Asselain, A. Fourquet, S. Rozan, P. Validire, A. Vincent-Salomon, X. Sastre-Garau, Comparison of the prognostic value of Scarff–Bloom–Richardson and Nottingham histological grades in a series of 825 cases of breast cancer: major importance of the mitotic count as a component of both grading systems, Anticancer Research 18 (1B) (1998) 571–576.
[16] J. Xu, L. Xiang, Q. Liu, H. Gilmore, J. Wu, J. Tang, A. Madabhushi, Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images, IEEE Transactions on Medical Imaging 35 (1) (2016) 119–130.
[17] H. Chen, X. Qi, L. Yu, P.-A. Heng, DCAN: deep contour-aware networks for accurate gland segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2487–2496.
[18] S. Wang, J. Yao, Z. Xu, J. Huang, Subtype cell detection with an accelerated deep convolution neural network, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2016, pp. 640–648.
[19] H. Li, R. Zhao, X. Wang, Highly efficient forward and backward propagation of convolutional neural networks for pixelwise classification, arXiv preprint arXiv:1412.4526.
[20] Z. Xu, J. Huang, Detecting 10,000 cells in one second, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2016, pp. 676–684.
[21] A. Basavanhally, S. Ganesan, M. Feldman, N. Shih, C. Mies, J. Tomaszewski, A. Madabhushi, Multi-field-of-view framework for distinguishing tumor grade in ER+ breast cancer from entire histopathology slides, IEEE Transactions on Biomedical Engineering 60 (8) (2013) 2089–2099.
[22] L. Hou, D. Samaras, T.M. Kurc, Y. Gao, J.E. Davis, J.H. Saltz, Patch-based convolutional neural network for whole slide tissue image classification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2424–2433.
[23] B.E. Bejnordi, M. Veta, P.J. Van Diest, B. Van Ginneken, N. Karssemeijer, G. Litjens, J.A. Van Der Laak, M. Hermsen, Q.F. Manson, M. Balkenhol, et al., Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer, JAMA 318 (22) (2017) 2199–2210.
[24] D. Wang, A. Khosla, R. Gargeya, H. Irshad, A. Beck, Deep learning for identifying metastatic breast cancer, arXiv preprint arXiv:1606.05718.
[25] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[26] B. Kong, X. Wang, Z. Li, Q. Song, S. Zhang, Cancer metastasis detection via spatially structured deep network, in: International Conference on Information Processing in Medical Imaging, Springer, Cham, 2017, pp. 236–248.
[27] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in: CVPR, 2016, pp. 2818–2826.
[28] Y. Liu, K. Gadepalli, M. Norouzi, G. Dahl, T. Kohlberger, A. Boyko, S. Venugopalan, A. Timofeev, P. Nelson, G. Corrado, et al., Detecting cancer metastases on gigapixel pathology images, arXiv preprint arXiv:1703.02442.
[29] P. Chen, K. Gadepalli, R. MacDonald, Y. Liu, K. Nagpal, T. Kohlberger, G.S. Corrado, J.D. Hipp, M.C. Stumpe, An augmented reality microscope for real-time automated detection of cancer, in: Proc. Annu. Meeting American Association Cancer Research, 2018.
[30] J. Wu, et al., Quantized convolutional neural networks for mobile devices, in: CVPR, 2016, pp. 4820–4828.
[31] M. Rastegari, et al., XNOR-Net: ImageNet classification using binary convolutional neural networks, in: ECCV, Springer, 2016, pp. 525–542.
[32] X. Zhang, W. Liu, M. Dundar, S. Badve, S. Zhang, Towards large-scale histopathological image analysis: hashing-based image retrieval, IEEE Transactions on Medical Imaging 34 (2) (2015) 496–506.
[33] X. Zhang, F. Xing, H. Su, L. Yang, S. Zhang, High-throughput histopathological image analysis via robust cell segmentation and hashing, Medical Image Analysis 26 (1) (2015) 306–315.
[34] M.D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, in: European Conference on Computer Vision, Springer, 2014, pp. 818–833.
[35] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
[36] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (8) (1997) 1735–1780.
[37] Z. Peng, R. Zhang, X. Liang, X. Liu, L. Lin, Geometric scene parsing with hierarchical LSTM, in: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, AAAI Press, 2016, pp. 3439–3445.
[38] F. Chollet, et al., Xception: deep learning with depthwise separable convolutions, in: CVPR, 2017, pp. 1251–1258.
[39] G. Hinton, et al., Distilling the knowledge in a neural network, arXiv preprint arXiv:1503.02531.
[40] B. Kong, S. Sun, X. Wang, Q. Song, S. Zhang, Invasive cancer detection utilizing compressed convolutional neural network and transfer learning, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, Cham, 2018, pp. 156–164.
[41] A. Howard, et al., MobileNets: efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861.
[42] A. Romero, et al., FitNets: hints for thin deep nets, arXiv preprint arXiv:1412.6550.
[43] J. Long, et al., Fully convolutional networks for semantic segmentation, in: CVPR, 2015, pp. 3431–3440.
[44] H. Lin, H. Chen, Q. Dou, L. Wang, J. Qin, P.-A. Heng, ScanNet: a fast and dense scanning framework for metastatic breast cancer detection from whole-slide image, in: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, 2018, pp. 539–546.

CHAPTER FOURTEEN

Data modeling and simulation
Alessandra Bertoldo and Claudio Cobelli
Department of Information Engineering, University of Padova, Padova, Italy

14.1 Introduction
In vivo imaging techniques like positron-emission tomography (PET) and magnetic resonance imaging (MRI) provide crucial functional information at the organ/tissue level of the human body. However, this functional information is not directly available in quantitative terms by simply looking at the images; it usually requires an interpretation of the image with a mathematical model of the underlying physiological process. Various classes of models, e.g., models of data (or input–output models), models of system, and graphical models, have been proposed for interpreting PET and MRI data. The focus here is on the specific class of system model, i.e., compartmental modeling, that is most frequently used. Compartmental models are widely employed to solve a broad spectrum of physiologic and clinical problems related to the distribution of materials in living systems. The governing law of these models is conservation of mass, and they describe the events in the system by a finite number of variables, i.e., they are described by ordinary differential equations. These characteristics make them very attractive to users because they formalize physical intuition in a simple and reasonable way. Their usefulness in research, especially in conjunction with tracer experiments, has been demonstrated at the whole-body, organ, and cellular levels. Examples and references can be found in several books [1–5]. Compartmental models have been developed for various purposes, but the most relevant here are
• identification of system structure: such models examine different hypotheses regarding the nature of specific physiologic mechanisms;
• estimation of unmeasurable quantities: these might include internal parameters and other variables of physiologic interest; and
• simulation of intact system behavior, where ethical or technical reasons do not allow direct experimentation on the system itself.
In this chapter we first briefly review some fundamentals of compartmental models, focusing on tracee and tracer kinetics. Subsequently we discuss some aspects of model identification and parameter estimation. Finally, we will show the power of compartmental



modeling methodology in interpreting PET and nuclear magnetic resonance functional imaging data.

14.2 Compartmental models
To discuss the theory of compartmental models, we first need some definitions. A compartment is an amount of material that acts as if it is well mixed and kinetically homogeneous. A compartmental model consists of a finite number of compartments with specified interconnections among them. These interconnections represent fluxes of material that physiologically correspond to transport from one location to another, a chemical transformation, or both. An example is shown in Fig. 14.1. Control signals arising in neuroendocrine systems can also be described; in this case, one can have two separate compartmental models, one for the hormone and one for the substrate, that interact via control signals. An example is shown in Fig. 14.2. Given our introductory definitions, it is useful to discuss possible candidates for compartments before explaining what we mean by well mixed and kinetically homogeneous. Consider the notion of a compartment as a physical space. Plasma is a candidate for a compartment; a substance such as plasma glucose could be a compartment; zinc in bone could be a compartment, as could insulin in β-cells. In some experiments, several different substances in plasma can be followed, such as glucose, lactate, and alanine. Thus there can be more than one plasma compartment in the same experiment, one for each substance being studied. This notion extends beyond plasma. Glucose and glucose-6-phosphate might need to be represented by two different compartments depending on whether they are found in liver or muscle tissue. Thus a single physical space or substance may actually be represented by more than one compartment, depending on the components measured or their location.


Figure 14.1 The compartmental system model showing the interconnections among compartments. The administration of material into and sampling from the accessible pools are indicated by the input arrow and measurement symbols (dotted line), respectively. The solid arrows represent the flux of material from one compartment to another.



Figure 14.2 An example of a multicompartmental model of an endocrine–metabolic control system. The top and bottom multicompartmental models describe the metabolism of the substrate and hormone, respectively. The dotted arrows represent control signals. For example, the dotted arrow from compartment 3 to the input arrow into compartment 4 indicates that the amount of material in compartment 3 controls the input of material into compartment 4.

In addition, one must distinguish between compartments that are accessible and nonaccessible for measurement. Researchers often try to assign physical spaces to nonaccessible compartments. This is a very difficult problem, best addressed by recognizing that a compartment is actually a theoretical construct, one that may in fact combine material from several different physical spaces. Whether a compartment can be equated with a physical space depends on the system under study and the assumptions made in the particular model. With these notions of what might constitute a compartment in mind, it is easier to define the concepts of well-mixed and kinetic homogeneity. Well-mixed means that any two samples taken from a compartment at the same time would have the same concentration of the substance being studied and would therefore be equally representative; the concept thus relates to the uniformity of the information contained in a single compartment. Kinetic homogeneity means that every particle in a compartment has the same probability of taking any of the pathways leaving the compartment. When a particle leaves a compartment, it does so because of metabolic events related to transport and utilization, and all particles in the compartment have the same probability of leaving due to one of these events. This process of combining material with similar characteristics into collections that are homogeneous and behave identically is what allows one to reduce a complex physiologic system to a finite number of compartments and pathways. The number of compartments required depends both on the system being studied and on the richness of the experimental configuration. A compartmental model is unique for each system studied, because it incorporates known and hypothesized physiology and biochemistry specific to that system. It provides the investigator with insights into the system's structure and is only as good as the assumptions incorporated into that structure.


14.2.1 Tracee model
In this section, we will define the tracee model using Fig. 14.3, which shows a typical compartment, the ith compartment. The tracee model is formalized by precisely defining the flux of tracee material into and out of this compartment and by establishing the measurement equation if this compartment is accessible for sampling. Once this is understood, the process of connecting several such compartments into a multicompartmental model and writing the corresponding equations is straightforward. Let Fig. 14.3 represent the ith compartment of an n-compartment model of the tracee system, with $Q_i$ denoting the mass of the compartment. The arrows represent fluxes into and out of the compartment. The input flux into the compartment from outside the system (the de novo synthesis of material) is represented by $F_{i0}$; the flux to the environment, and therefore out of the system, by $F_{0i}$; the fluxes to and from compartment j by $F_{ji}$ and $F_{ij}$, respectively; and finally, an exogenous input is denoted by $U_h$ ($h = 1, \ldots, r$). All fluxes $F_{ij}$ ($i = 0, 1, \ldots, n$; $j = 0, 1, \ldots, n$; $i \neq j$) and masses $Q_i$ ($i = 1, 2, \ldots, n$) are nonnegative. The dashed arrow with a bullet indicates that the compartment is accessible to measurement. This measurement is denoted by $C_l$ ($l = 1, \ldots, m$), where we assume it is a concentration, $C_l = Q_i / V_i$, with $V_i$ the volume of compartment i. As already noted, usually only a small number of compartments are accessible to test inputs and measurements.

Figure 14.3 The ith compartment of an n-compartmental model showing fluxes into and out of the compartment, inputs, and measurements.

Using the mass balance principle, one can write for each compartment

$$
\begin{aligned}
\dot{Q}_i(t) &= -\sum_{\substack{j=0 \\ j \neq i}}^{n} F_{ji}(Q_1(t), \ldots, Q_n(t)) + \sum_{\substack{j=1 \\ j \neq i}}^{n} F_{ij}(Q_1(t), \ldots, Q_n(t)) + F_{i0}(Q_1(t), \ldots, Q_n(t)) + U_h(t), \quad Q_i(0) = Q_{i0}\\
C_l(t) &= \frac{Q_i(t)}{V_i}
\end{aligned}
\tag{14.1}
$$


where $\dot{Q}_i(t) = \frac{dQ_i(t)}{dt}$, and $t > 0$ is time, the independent variable. All the fluxes $F_{ij}$, $F_{i0}$, and $F_{0i}$ are assumed to be functions of the compartmental masses $Q_i$. If one writes the generic flux $F_{ji}$ ($j = 0, 1, \ldots, n$; $i = 1, 2, \ldots, n$; $j \neq i$) as

$$
F_{ji}(Q_1(t), \ldots, Q_n(t)) = k_{ji}(Q_1(t), \ldots, Q_n(t))\, Q_i(t)
\tag{14.2}
$$

where $k_{ji}(\cdot) \geq 0$ denotes the fractional transfer coefficient from compartment i to compartment j, Eq. (14.1) can be rewritten as

$$
\begin{aligned}
\dot{Q}_i(t) &= -\sum_{\substack{j=0 \\ j \neq i}}^{n} k_{ji}(Q_1(t), \ldots, Q_n(t))\, Q_i(t) + \sum_{\substack{j=1 \\ j \neq i}}^{n} k_{ij}(Q_1(t), \ldots, Q_n(t))\, Q_j(t) + F_{i0}(Q_1(t), \ldots, Q_n(t)) + U_h(t), \quad Q_i(0) = Q_{i0}\\
C_l(t) &= \frac{Q_i(t)}{V_i}
\end{aligned}
\tag{14.3}
$$

Eq. (14.3) describes the nonlinear compartmental model of the tracee system. To make the model operative, one must specify how the $k_{ij}$ and $F_{i0}$ depend on the $Q_i$. This obviously depends on the system being studied. Usually $k_{ij}$ and $F_{i0}$ are functions of one or a few $Q_i$. Some possible examples include the following.
• $k_{ij}$ are constant and thus do not depend on any $Q_i$:

$$
k_{ij}(Q_1(t), \ldots, Q_n(t)) = k_{ij} = \text{constant}
\tag{14.4}
$$

• $k_{ij}$ are described by a saturative relationship such as Michaelis–Menten:

$$
k_{ij}(Q_j(t)) = \frac{V_M}{K_m + Q_j(t)}
\tag{14.5}
$$

or the Hill equation:

$$
k_{ij}(Q_j(t)) = \frac{V_M\, Q_j^{m-1}(t)}{K_m + Q_j^m(t)}
\tag{14.6}
$$

Note that when $m = 1$, Eq. (14.6) reduces to Eq. (14.5).
• $k_{ij}$ is controlled by the arrival compartment, such as by a Langmuir relationship:

$$
k_{ij}(Q_j(t)) = a \left(1 - \frac{Q_i(t)}{b}\right)
\tag{14.7}
$$

• $k_{ij}$ is controlled by a remote compartment different from the source ($Q_j$) or arrival ($Q_i$) compartments. For example, using the model shown in Fig. 14.2, one could have

$$
k_{02}(Q_5(t)) = \gamma + Q_5(t)
\tag{14.8}
$$

or a more complex description such as

$$
k_{02}(Q_2(t), Q_5(t)) = \frac{V_m(Q_5(t))}{K_m(Q_5(t)) + Q_2(t)}
\tag{14.9}
$$

where now one must further specify how $V_m$ and $K_m$ depend on the controlling compartment, $Q_5$. The input $F_{i0}$ can also be controlled by remote compartments. For example, for the model shown in Fig. 14.2, one can have

$$
F_{30}(Q_4(t)) = \frac{\delta}{\varepsilon + Q_4(t)}
\tag{14.10}
$$

$$
F_{40}(Q_3(t)) = \eta + \lambda Q_3(t) + \mu \dot{Q}_3(t)
\tag{14.11}
$$

The nonlinear compartmental model given in Eq. (14.3) permits the description of a physiological system in nonsteady state under very general circumstances. Having specified the number of compartments and functional dependencies, there is now the problem of assigning a numerical value to the unknown parameters that describe them. Some may be assumed to be known, but some need to be tuned to the particular subject being studied. Often, however, the data are not enough to arrive at the unknown parameters of the model, and a tracer is employed to enhance the information content of the data.
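As a concrete illustration, the following sketch simulates a hypothetical one-compartment tracee model with a Michaelis–Menten fractional transfer coefficient (Eq. (14.5)) and constant endogenous production. All numerical values are illustrative, not taken from the text.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical single-compartment tracee model: Michaelis-Menten elimination
# k_01(Q_1) (Eq. 14.5) plus constant endogenous production F_10 (illustrative).
V_M, K_m = 2.0, 5.0     # Michaelis-Menten parameters of k_01
F_10 = 1.0              # de novo production, mass/min
V_1 = 10.0              # volume of the accessible compartment

def tracee_rhs(t, Q):
    k01 = V_M / (K_m + Q[0])          # state-dependent transfer coefficient
    return [-k01 * Q[0] + F_10]       # mass balance, Eq. (14.3) with n = 1

sol = solve_ivp(tracee_rhs, (0.0, 60.0), [20.0], dense_output=True)
t = np.linspace(0.0, 60.0, 7)
print(np.round(sol.sol(t)[0] / V_1, 3))  # measured concentration C_1 = Q_1 / V_1
```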

14.2.2 Tracer model
In this section, we formalize the definition of the tracer model using Fig. 14.4. This parallels exactly the notions introduced above, except that we now follow the tracer, denoted by lowercase letters, instead of the tracee. The link between the tracee and tracer models is given in the following section. Suppose an isotopic radioactive tracer is injected (denoted by $u_h$) into the ith compartment, and denote by $q_i$ its tracer mass at time t (Fig. 14.4). Assuming an ideal tracer, tracer–tracee indistinguishability ensures that the tracee rate constants $k_{ij}$ also apply to the tracer. Again, as with the tracee, the measurement is usually a concentration, $y_l(t) = q_i(t)/V_i$.

Figure 14.4 The ith compartment of an n-compartmental tracer model showing fluxes into and out of the compartment, inputs, and measurements.

The tracer model, given the tracee model Eq. (14.3), is

$$
\begin{aligned}
\dot{q}_i(t) &= -\sum_{\substack{j=0 \\ j \neq i}}^{n} k_{ji}(Q_1(t), \ldots, Q_n(t))\, q_i(t) + \sum_{\substack{j=1 \\ j \neq i}}^{n} k_{ij}(Q_1(t), \ldots, Q_n(t))\, q_j(t) + u_h(t), \quad q_i(0) = 0\\
y_l(t) &= \frac{q_i(t)}{V_i}
\end{aligned}
\tag{14.12}
$$

Note that the endogenous production term $F_{i0}$ in Eq. (14.3) does not appear in Eq. (14.12); this is because that term applies only to the tracee.

14.2.3 Linking tracer and tracee models

The model necessary to arrive at the nonaccessible system properties is obtained by linking the tracee and tracer models to form the tracer-tracee model; this model is described by Eq. (14.3) and Eq. (14.12). The problem one wishes to solve is how to use the tracee data $C_l(t)$ and tracer data $y_l(t)$ to obtain the unknown parameters of the model. In the general setting, i.e., the tracee system in nonsteady state, the problem is complex. This difficulty reduces considerably when the tracee system is in steady state. Since this situation is also the experimental protocol most frequently encountered in PET and nuclear magnetic resonance functional imaging studies, we will consider this important special case in the following. If the tracee is in a constant steady state, the exogenous input $U_h$ is zero, all the fluxes $F_{ij}$ and masses $Q_i(t)$ in the tracee model Eq. (14.1) are constant, and the derivatives $\dot{Q}_i(t)$ are zero. As a result, all the fractional transfer coefficients $k_{ij}$ are constant.


The tracee and tracer models given in Eq. (14.3) and Eq. (14.12), respectively, thus become

$$0 = -\Bigg[\sum_{\substack{j=0 \\ j \neq i}}^{n} k_{ji}\Bigg] Q_i + \sum_{\substack{j=1 \\ j \neq i}}^{n} k_{ij} Q_j + F_{i0}, \qquad Q_i(0) = Q_{i0}; \qquad C_l = \frac{Q_i}{V_i} \tag{14.13}$$

$$\dot{q}_i(t) = -\Bigg[\sum_{\substack{j=0 \\ j \neq i}}^{n} k_{ji}\Bigg] q_i(t) + \sum_{\substack{j=1 \\ j \neq i}}^{n} k_{ij} q_j(t) + u_h(t), \qquad q_i(0) = 0; \qquad y_l(t) = \frac{q_i(t)}{V_i} \tag{14.14}$$

This is an important result: the tracer compartmental model is linear and time invariant if the tracee is in a constant steady state, irrespective of whether the tracee system itself is linear or nonlinear. The modeling machinery for Eq. (14.13) and Eq. (14.14) is greatly simplified with respect to the nonlinear models shown in Eq. (14.3) and Eq. (14.12). The strategy is to use the tracer data to arrive at the $k_{ij}$ and the accessible pool volume $V_i$ of Eq. (14.14), and subsequently use the steady-state tracee model of Eq. (14.13) to solve for the unknown parameters $F_{i0}$ and the remaining $Q_i$.
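Since Eq. (14.14) is linear and time invariant under tracee steady state, the tracer impulse response can be computed directly with the matrix exponential. The following is a minimal sketch, not from the original text; rate constants, dose, and volume are illustrative assumptions.

```python
# Minimal sketch: impulse response of an LTI tracer model (Eq. 14.14)
# via the matrix exponential. All numerical values are assumptions.
import numpy as np
from scipy.linalg import expm

# State matrix A: A[i, j] = kij for i != j; A[i, i] = -(k0i + sum of kji)
k21, k12, k01 = 0.1, 0.05, 0.02   # 1/min, assumed values
A = np.array([[-(k01 + k21), k12],
              [k21,         -k12]])
d, V1 = 100.0, 5.0                # bolus dose and accessible-pool volume

t = np.linspace(0.0, 120.0, 25)
q = np.array([expm(A * ti) @ np.array([d, 0.0]) for ti in t])
y = q[:, 0] / V1                  # measured tracer concentration y(t)
print(y[:5])
```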

14.3 Model identification

With the tracer-tracee model described by Eq. (14.13) and Eq. (14.14), we can now proceed to model identification, the process by which we arrive at numerical values for the unknown model parameters from the actual tracer (and tracee) measurements. Let us assume that the measurement error is additive, so that the actual tracer measurements (taking the scalar case) are described at sample times $t_i$ by

$$z(t_i) = y(t_i) + v(t_i), \qquad i = 1, \ldots, N \tag{14.15}$$

where $v(t_i)$ is the tracer measurement error. The errors are usually given a probabilistic description: they are assumed to be independent and are often assumed to be Gaussian. With Eq. (14.15) and the models of Eqs. (14.13) and (14.14), the compartmental model identification problem can now be defined: estimate the unknown model parameters from the noisy data $z(t_i)$. Before solving this problem, however, we must deal with a prerequisite issue for the well-posedness of our parameter estimation. This is the issue of a priori identifiability. As


seen below, this requires reasoning that uses ideal noise-free data, i.e., Eqs. (14.13) and (14.14).

14.3.1 A priori identifiability

A priori identifiability is a key step in the formulation of a structural model whose parameters are to be estimated from a set of data. The question that a priori identifiability addresses is the following: do the data contain enough information to estimate all unknown parameters of the postulated model structure? This question is usually referred to as the a priori identifiability problem. It is set in the ideal context of an error-free model structure and noise-free, continuous-time measurements, and it is an obvious prerequisite for well-posedness of parameter estimation from real data. In particular, if it turns out in such an ideal context that the postulated model structure is too complex for the particular set of data (i.e., some model parameters are not identifiable from the data), there is no way in a real situation, with error in the model structure and noise in the data, that the parameters can be identified. The a priori identifiability problem is also referred to as the structural identifiability problem because it is set independently of a particular set of parameter values. For the sake of simplicity in what follows, only the term a priori will be used to qualify the problem. Only if the model is a priori identifiable is it meaningful to use the techniques discussed later to estimate the numerical values of the parameters from the data. If the model is a priori nonidentifiable, a number of strategies can be considered. One would be to enhance the information content of the experiment by adding, when feasible, inputs and/or measurements. Another possibility would be to reduce the complexity of the model by simplifying its structure, e.g., by lowering the model order or by aggregating some parameters. These simple statements allow one to foresee the importance of a priori identifiability also in relation to qualitative experiment design, e.g., the definition of an experiment that allows one to obtain an a priori identifiable model with the minimum number of inputs and measurements. Before discussing the problem in depth and the methods available for its solution, it is useful to illustrate the fundamentals through some simple examples. Then some formal definitions will be given, using these simple examples where the identifiability issue can be easily addressed.

14.3.1.1 Examples

Example 1. Consider the single-compartment model shown in Fig. 14.5 (left), where the input is a bolus injection of a tracer given at time zero and the measured variable is the tracer concentration. The model and measurement equations are

$$\dot{q}(t) = -k\, q(t) + u(t), \qquad q(0) = 0 \tag{14.16}$$



Figure 14.5 A single compartment (left), a two-compartment (middle), and a two-compartment model in which the irreversible loss is from compartment 1 (right). For all models, the tracer input u(t) is a bolus injection of dose d given at time zero. The compartments are characterized by a volume V, and y is the measured tracer concentration.

$$y(t) = \frac{q(t)}{V} \tag{14.17}$$

where $u(t) = d\,\delta(t)$; that is, $d$ is the magnitude of the bolus dose. The unknown parameters for the model are the rate constant $k$ and the volume $V$. Eq. (14.17) defines the observation of the system in an ideal context of noise-free and continuous-time measurements. In other words, the model output describes what would be measured continuously and without error; it does not represent noisy, discrete-time measurements. To see how the experiment can be used to obtain estimates of these parameters, note that the solution of Eq. (14.16) is the monoexponential

$$q(t) = d\, e^{-kt} \tag{14.18}$$

The model output $y(t)$ is thus given by

$$y(t) = \frac{d}{V} e^{-kt} \equiv A e^{-\lambda t} \tag{14.19}$$

The model output or ideal data are thus described by a function of the form $A e^{-\lambda t}$, and the parameters that can be determined by the experiment are $A$ and $\lambda$. These parameters are called the observational parameters. What is the relationship between the unknown model parameters $k$ and $V$ and the observational parameters $A$ and $\lambda$? From Eq. (14.19), one sees immediately that

$$A = y(0) = \frac{d}{V} \tag{14.20}$$

$$\lambda = k \tag{14.21}$$

In the example just given, the unknown parameters $k$ and $V$ of the model are a priori uniquely or globally identifiable from the designed experiment because they can be evaluated uniquely from the observational parameters $A$ and $\lambda$. Since all model parameters


are uniquely identifiable, the model is said to be a priori uniquely or globally identifiable from the designed experiment. So far, we have analyzed the identifiability properties of the model by inspecting the expression of the model output in order to derive the relationships between the observational parameters and the unknown model parameters. The method is easy to understand because it only requires some fundamentals of differential calculus. However, the approach is not practicable in general because it works easily only for some simple linear models of orders one and two. For linear models of higher order, the method becomes quite cumbersome and its application is practically impossible. A simpler method to derive the desired relationships between observational parameters and unknown model parameters consists of writing the Laplace transform of the model output and is known as the transfer function method. Briefly, the advantage of the Laplace transform method is that there is no need to use the analytical solution of the system of linear differential equations. By writing the Laplace transform of the state variables, e.g., masses, and then of the model outputs, e.g., concentrations, one obtains an expression that defines the observational parameters as functions of the unknown model parameters. This gives a set of nonlinear algebraic equations in the original parameters. For the model of Fig. 14.5, the Laplace transforms of Eqs. (14.16) and (14.17) are, respectively,

$$s\, Q(s) = -k\, Q(s) + d \tag{14.22}$$

$$Y(s) = \frac{Q(s)}{V} \tag{14.23}$$

where $s$ is the Laplace variable, and the capital letter denotes the Laplace transform of the corresponding lowercase variable. The transfer function is

$$H(s) \equiv \frac{Y(s)}{U(s)} = \frac{Q(s)/V}{d} = \frac{1/V}{s + k} \equiv \frac{b}{s + a} \tag{14.24}$$

The coefficients $a$ and $b$ are determinable from the experiment, i.e., they are the observational parameters, and thus one finds that

$$b = \frac{1}{V} \tag{14.25}$$

$$a = k \tag{14.26}$$


That is, the model is a priori uniquely identifiable. For this simple model, the advantage of the Laplace transform method is not evident, but its power will be appreciated when we consider the next example.

Example 2. Consider the two-compartment model shown in Fig. 14.5 (middle), where a bolus injection of tracer is made into compartment 1. The accessible compartment is compartment 2. Assume the measured variable is the tracer concentration $y(t) = q_2(t)/V_2$. The equations describing this model, assuming a bolus input, are

$$\dot{q}_1(t) = -k_{21}\, q_1(t) + u(t), \qquad q_1(0) = 0 \tag{14.27}$$

$$\dot{q}_2(t) = k_{21}\, q_1(t) - k_{02}\, q_2(t), \qquad q_2(0) = 0 \tag{14.28}$$

$$y(t) = \frac{q_2(t)}{V_2} \tag{14.29}$$

where $u(t) = d\,\delta(t)$. The unknown model parameters are $k_{21}$, $k_{02}$, and $V_2$. To see how the experiment can be used to obtain estimates of these parameters, one can use either the time-domain solution of Eqs. (14.27)-(14.29) (a sum of two exponentials) or the transfer function method, which is much more straightforward. The transfer function is

$$H(s) = \frac{Y(s)}{U(s)} = \frac{k_{21}/V_2}{(s + k_{21})(s + k_{02})} \equiv \frac{b_1}{s^2 + a_2 s + a_1} \tag{14.30}$$

where the coefficients $a_1$, $a_2$, and $b_1$ are the observational parameters (known from the experiment), linked to the unknown model parameters by

$$b_1 = k_{21}/V_2 \tag{14.31}$$

$$a_2 = k_{21} + k_{02} \tag{14.32}$$

$$a_1 = k_{21} k_{02} \tag{14.33}$$

Eqs. (14.31)-(14.33) are nonlinear, and it is easy to verify that it is not possible to obtain a unique solution for the unknown parameters. In fact, from Eqs. (14.32) and (14.33), parameters $k_{21}$ and $k_{02}$ are interchangeable, and thus each has two solutions, say $k_{21}^{I}$, $k_{21}^{II}$ and $k_{02}^{I}$, $k_{02}^{II}$. As a result, from Eq. (14.31), $V_2$ also has two solutions, $V_2^{I}$ and $V_2^{II}$. The two solutions provide the same expression for the model output $y(t)$. When there is a finite number of solutions (more than one; two in this case), the unknown parameters are said to be a priori nonuniquely or locally identifiable from the designed experiment. When all model parameters are identifiable (uniquely or nonuniquely) and at least one model


parameter is nonuniquely identifiable (in this case, all three are), the model is said to be a priori nonuniquely or locally identifiable. It is also worth noting that in this case, one has parameters that are a priori uniquely identifiable but are not the original parameters of interest. They are combinations of the original parameters, in particular $k_{21} k_{02}$, $k_{21} + k_{02}$, and $k_{21}/V_2$. To achieve unique identifiability of this nonuniquely identifiable model, one could design a more complex experiment or, if available, exploit additional independent information on the system. In this particular case, knowledge of $V_2$, or of a relationship between $k_{21}$ and $k_{02}$, allows one to achieve unique identifiability of all model parameters.

Example 3. Consider the two-compartment model shown in Fig. 14.5 (right), where a bolus injection of a tracer is given at time zero and the measured variable is the concentration of drug in plasma. The equations describing this model are

$$\dot{q}_1(t) = -(k_{01} + k_{21})\, q_1(t) + u(t), \qquad q_1(0) = 0 \tag{14.34}$$

$$\dot{q}_2(t) = k_{21}\, q_1(t), \qquad q_2(0) = 0 \tag{14.35}$$

$$y(t) = \frac{q_1(t)}{V_1} \tag{14.36}$$

The unknown model parameters are $k_{21}$, $k_{01}$, and $V_1$. To see how the experiment can be used to obtain estimates of these parameters, one notes that the transfer function is

$$H(s) = \frac{Y(s)}{U(s)} = \frac{1/V_1}{s + k_{21} + k_{01}} \equiv \frac{b}{s + a} \tag{14.37}$$

and thus

$$b = 1/V_1 \tag{14.38}$$

$$a = k_{21} + k_{01} \tag{14.39}$$

It is easy to see that while $V_1$ is uniquely identifiable, $k_{21}$ and $k_{01}$ have an infinite number of solutions lying on the straight line $a = k_{21} + k_{01}$. When there is an infinite number of solutions for a parameter, one says the parameter is a priori nonidentifiable from the designed experiment. When at least one model parameter is nonidentifiable (in this case, there are two), the model is said to be a priori nonidentifiable. As with the previous example, one can find a uniquely identifiable parameterization, i.e., a set of parameters that can be evaluated uniquely. In this case, the parameter is the sum $k_{01} + k_{21}$ ($V_1$ has already been seen to be uniquely identifiable). Again,


to achieve unique identifiability of $k_{01}$ and $k_{21}$, either a more informative experiment is needed, e.g., measuring also in compartment 2, or additional information on the system, such as a relationship between $k_{01}$ and $k_{21}$, is required.

14.3.1.2 Definitions

These simple examples have allowed for understanding the importance of the a priori identifiability problem and provided a means of introducing some basic definitions. Below, we will give some generic definitions that also hold for more general models, e.g., the nonlinear compartmental models of Eq. (14.3) and Eq. (14.12). Consider the model Eq. (14.14). Define by $\mathbf{p} = [p_1, p_2, \ldots, p_M]^T$ the set of $M$ unknown model parameters, i.e., the $k_{ij}$ and $V_i$. The model Eq. (14.14) can then be written as $y_l = g_l(t, \mathbf{p})$. Define now the observational parameter vector $\boldsymbol{\Phi} = [\varphi_1, \ldots, \varphi_R]^T$ having the observational parameters $\varphi_j$, $j = 1, \ldots, R$, as entries. Each particular input-output experiment will provide a particular value $\hat{\boldsymbol{\Phi}}$ of the parameter vector $\boldsymbol{\Phi}$; i.e., the components of $\hat{\boldsymbol{\Phi}}$ can be estimated uniquely from the data by definition. Moreover, the observational parameters are functions of the basic model parameters $p_i$, which may or may not be identifiable:

$$\boldsymbol{\Phi} = \boldsymbol{\Phi}(\mathbf{p}) \tag{14.40}$$

Thus, to investigate the a priori identifiability of the model parameters $p_i$, it is necessary to solve the system of nonlinear algebraic equations in the unknown $p_i$ obtained by setting the polynomials $\boldsymbol{\Phi}(\mathbf{p})$ equal to the observational parameter vector $\hat{\boldsymbol{\Phi}}$:

$$\boldsymbol{\Phi}(\mathbf{p}) = \hat{\boldsymbol{\Phi}} \tag{14.41}$$

These equations are called the exhaustive summary. Examples of this have already been provided in working out Examples 1, 2, and 3: Eqs. (14.20)-(14.21) and (14.25)-(14.26); Eqs. (14.31)-(14.33); and Eqs. (14.38)-(14.39), respectively. One can now generalize the definitions, first for a single parameter of the model and then for the model as a whole. The single parameter $p_i$ is a priori
• uniquely (globally) identifiable if and only if the system of Eq. (14.41) has one and only one solution for $p_i$;
• nonuniquely (locally) identifiable if and only if the system of Eq. (14.41) has, for $p_i$, more than one but a finite number of solutions;
• nonidentifiable if and only if the system of Eq. (14.41) has, for $p_i$, an infinite number of solutions.

14.3.1.3 The model is a priori
• uniquely (globally) identifiable if all its parameters are uniquely identifiable;
• nonuniquely (locally) identifiable if all its parameters are identifiable, either uniquely or nonuniquely, and at least one is nonuniquely identifiable;
• nonidentifiable if at least one of its parameters is nonidentifiable.


14.3.1.4 The transfer function method

The problem now is to assess, on the basis of knowledge only of the assumed model structure and of the chosen experimental configuration, whether the model is a priori nonidentifiable, nonuniquely identifiable, or uniquely identifiable. The most common method to test a priori identifiability of linear dynamic models is the transfer function method. Assuming we have $r$ inputs and $m$ outputs, the approach is based on analysis of the $r \times m$ transfer function matrix:

$$H(s, \mathbf{p}) = [H_{ij}(s, \mathbf{p})] = \left[\frac{Y_i(s, \mathbf{p})}{U_j(s)}\right] \tag{14.42}$$

where each element $H_{ij}$ of $H$ is the Laplace transform of the response in the measurement variable at port $i$, $y_i(t, \mathbf{p})$, to a unit impulse at port $j$, $u_j(t) = \delta(t)$. The transfer function approach makes reference to the coefficients of the numerator and denominator polynomials of each of the $m \times r$ elements $H_{ij}(s, \mathbf{p})$ of the transfer function matrix, respectively $\beta_1^{ij}(\mathbf{p}), \ldots, \beta_n^{ij}(\mathbf{p})$ and $\alpha_1^{ij}(\mathbf{p}), \ldots, \alpha_n^{ij}(\mathbf{p})$. These coefficients are the $2n \times r \times m$ observational parameters $\varphi_l^{ij}$. Therefore, the exhaustive summary can be written as

$$\begin{aligned}
\beta_1^{11}(\mathbf{p}) &= \varphi_1^{11} \\
&\;\;\vdots \\
\alpha_n^{11}(\mathbf{p}) &= \varphi_{2n}^{11} \\
&\;\;\vdots \\
\beta_1^{rm}(\mathbf{p}) &= \varphi_1^{rm} \\
&\;\;\vdots \\
\alpha_n^{rm}(\mathbf{p}) &= \varphi_{2n}^{rm}
\end{aligned} \tag{14.43}$$

This system of nonlinear algebraic equations needs to be solved for the unknown parameter vector $\mathbf{p}$ to define the identifiability properties of the model. We have discussed the Laplace transform method to generate the exhaustive summary of the models. The method is simple to use even for system models of order greater than two. What becomes more and more difficult is the solution, i.e., determining which original parameters of the model are uniquely determined by the system of nonlinear algebraic equations. In fact, one must solve a system of nonlinear algebraic equations that increases both in number of terms and in degree of nonlinearity with model order. In other words, the method works well for models of low dimension, e.g., order two or three, but fails when applied to relatively large models, because the system of nonlinear algebraic equations becomes too difficult to solve.


To deal with the problem in general, there is the need to resort to computer algebra methods. In particular, a tool to test a priori identifiability of linear compartmental models of general structure, which combines the transfer function method with a computer algebra method, the Gröbner basis algorithm, has been developed [6]. Finally, it is worth noting that for some classes of linear compartmental models, i.e., catenary and mammillary models, and for the general two- and three-compartment models, explicit identifiability results are available (see Ref. [4]). From the above considerations, it follows that a priori unique identifiability is a prerequisite for well-posedness of parameter estimation and for reconstructability of state variables in nonaccessible compartments. It is a necessary step that, because of the ideal context in which it is posed, does not guarantee successful estimation of model parameters from real input-output data.
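To illustrate the computer algebra route on a case where the answer is known, the following minimal sketch, not part of the original text, solves the exhaustive summary of Example 2, Eqs. (14.31)-(14.33), symbolically and recovers the two solutions discussed above.

```python
# Minimal sketch: solve the exhaustive summary of Example 2
# (Eqs. 14.31-14.33) with a computer algebra system.
import sympy as sp

k21, k02, V2 = sp.symbols('k21 k02 V2', positive=True)
b1, a2, a1 = sp.symbols('b1 a2 a1', positive=True)  # observational parameters

exhaustive_summary = [
    sp.Eq(k21 / V2, b1),     # Eq. (14.31)
    sp.Eq(k21 + k02, a2),    # Eq. (14.32)
    sp.Eq(k21 * k02, a1),    # Eq. (14.33)
]
solutions = sp.solve(exhaustive_summary, [k21, k02, V2], dict=True)
# Two solutions -> k21, k02, and V2 are a priori nonuniquely
# (locally) identifiable, as derived in the text.
print(len(solutions), solutions)
```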

14.3.2 Parameter estimation

At this stage a model has been formulated, and the parameter estimation problem is well posed. In this section we describe how to obtain numerical estimates of the unknown parameters from noisy experimental data and how to judge the quality of the parameter estimation, i.e., is the model able to describe the data, and what is the precision with which the unknown model parameters are estimated? We will confine ourselves to Fisher estimation techniques, and particularly to weighted least squares, probably the most widely used parameter estimation technique. For its connection to maximum likelihood estimation, as well as for Bayesian estimation techniques, the reader is referred to Ref. [5].

14.3.2.1 Weighted least squares

A model of the system has now been formulated. The model contains a set of unknown parameters to which we would like to assign numerical values from the data of an experiment. We assume that we have checked its a priori identifiability. The experimental data are also available. In mathematical terms, the ingredients we have are the model output, which can be written as

$$y(t) = g(t, \mathbf{p}) \tag{14.44}$$

where $g(t, \mathbf{p})$ is determined by the model of the system, and the discrete-time noisy output measurements $z_i$:

$$z(t_i) = z_i = y(t_i) + v(t_i) = g(t_i, \mathbf{p}) + v_i, \qquad i = 1, \ldots, N \tag{14.45}$$

where $v_i$ is the measurement error of the $i$th measurement. The problem is to assign a numerical value to $\mathbf{p}$ from the data $z_i$. Regression analysis is the most widely used method to "adjust" the parameters characterizing a model to obtain


the "best" fit to a set of data. The weighted residual sum of squares, WRSS, is a good and commonly used measure of how good the fit to the data is. It is given by

$$\text{WRSS} = \sum_{i=1}^{N} w_i (z_i - y_i)^2 \tag{14.46}$$

where $N$ is the number of observations, $(z_i - y_i)$ is the error between the observed and predicted value at each sample time $t_i$, and $w_i$ is the weight assigned to the $i$th datum. WRSS can be considered a function of the model parameters, i.e., WRSS = WRSS($\mathbf{p}$). The idea is to minimize WRSS with respect to the parameter values characterizing the model to be fitted to the data. It is natural to link the choice of weights to what is known about the precision of each individual datum. In other words, one seeks to give more credibility, or weight, to those data whose precision is high, and less credibility, or weight, to those data whose precision is low. The measurement error is $v_i$ in Eq. (14.45). It is a random variable, and assumptions about its characteristics must be made. The most common assumption is that the sequence of the $v_i$ is a random process with zero mean (i.e., no systematic error), independent samples, and known variance. This can be formalized in the statistical setting, using the notation E, Var, and Cov to represent mean, variance, and covariance, respectively:

$$E(v_i) = 0 \tag{14.47}$$

$$\text{Cov}(v_i, v_j) = 0 \quad \text{for } t_i \neq t_j \tag{14.48}$$

$$\text{Var}(v_i) = \sigma_i^2 \tag{14.49}$$

Eq. (14.47) means the errors $v_i$ have zero mean; Eq. (14.48) means they are independent; and Eq. (14.49) means the variance is known. A standardized measure of the error is provided by the fractional standard deviation, FSD, or coefficient of variation, CV:

$$\text{FSD}(v_i) = \text{CV}(v_i) = \frac{\text{SD}(v_i)}{z_i} \tag{14.50}$$

where SD is the standard deviation of the error:

$$\text{SD}(v_i) = \sqrt{\text{Var}(v_i)} \tag{14.51}$$

FSD or CV is often expressed as a percentage, i.e., the percentage FSD or percentage CV, by multiplying SD($v_i$)/$z_i$ in Eq. (14.50) by 100. We have considered the case where the variance is known, Eq. (14.49). However, one can also easily handle the case where the variance is known up to a proportionality


constant, i.e., Var($v_i$) = $b_i \sigma^2$ with $b_i$ known and $\sigma^2$ unknown. We shall not consider this case explicitly in the following, in order not to make the presentation too heavy. For more details, the reader is referred to Cobelli et al. [4]. Knowing the error structure of the data, how are the weights $w_i$ chosen? The natural choice is to weight each datum according to the inverse of its variance, i.e.,

$$w_i = \frac{1}{\sigma_i^2} \tag{14.52}$$

It can be shown that this natural choice of weights is optimal in the linear regression case. Therefore, it is very important to have correct knowledge of the error of the data and to weight each datum according to this error. The problem now is how to estimate the error variance. Ideally, one would like to have a direct estimate of the variance of all sources of error. This is a difficult problem. For instance, measurement error is just one component of the error; it can be used as an estimate of the error only if the investigator believes that the major source of error arises after the sample is taken. To have a more precise estimate of the error, the investigator should have several independent replicates of the measurement $z_i$ at each sampling time $t_i$, from which the sample variance $\sigma_i^2$ at $t_i$ can be estimated. If there is a major error component before the measurement process, for instance an error related to drawing a plasma sample or preparing it for measurement, then it is not sufficient to repeat the measurement per se on the same sample several times. In theory, in this situation it would be necessary to repeat the experiment several times. Such repetition is not often easy to handle in practice. Finally, there is the possibility that the system itself can vary during different experiments. In any case, since the above-mentioned approach estimates the variance at each sampling time $t_i$, it requires several independent replicates of each measurement. An alternative and more practical approach consists of postulating a model for the error variance and estimating its unknown parameters from the experimental data. A flexible model that can be used for the error variance is

$$\sigma_i^2 = \alpha + \beta (y_i)^{\gamma} \tag{14.53}$$

which can be approximated in practice by

$$\sigma_i^2 = \alpha + \beta (z_i)^{\gamma} \tag{14.54}$$

where $\alpha$, $\beta$, and $\gamma$ are nonnegative model parameters relating the variance associated with an observation to the value of the observation itself. Values can usually be assigned to these parameters, or they can be estimated from the data themselves.


Linear Regression. Let us consider a linear model with $M$ parameters and put the regression problem in compact matrix-vector notation. The measurement equation can be written as

$$\mathbf{z} = \mathbf{y} + \mathbf{v} = G\,\mathbf{p} + \mathbf{v} \tag{14.55}$$

$$\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_N \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1M} \\ g_{21} & g_{22} & \cdots & g_{2M} \\ \cdots & \cdots & \cdots & \cdots \\ g_{N1} & g_{N2} & \cdots & g_{NM} \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_M \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_N \end{bmatrix} \tag{14.56}$$

with

$$\mathbf{p} = [p_1, p_2, \ldots, p_M]^T \tag{14.57}$$

$$G = \begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1M} \\ g_{21} & g_{22} & \cdots & g_{2M} \\ \cdots & \cdots & \cdots & \cdots \\ g_{N1} & g_{N2} & \cdots & g_{NM} \end{bmatrix} \tag{14.58}$$

The measurement error $\mathbf{v}$, assuming a second-order description, i.e., mean and ($N \times N$) covariance matrix, is

$$E[\mathbf{v}] = \mathbf{0} \tag{14.59}$$

$$E[\mathbf{v}\,\mathbf{v}^T] = \Sigma_v \tag{14.60}$$

and since independence is assumed, $\Sigma_v$ is diagonal:

$$\Sigma_v = \text{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2) \tag{14.61}$$

Now, if we define the residual vector $\mathbf{r}$,

$$\mathbf{r} = \mathbf{z} - G\,\mathbf{p} \tag{14.62}$$

the weighted residual sum of squares is

$$\text{WRSS}(\mathbf{p}) = \sum_{i=1}^{N} \frac{r_i^2}{\sigma_i^2} = \mathbf{r}^T \Sigma_v^{-1} \mathbf{r} = (\mathbf{z} - G\,\mathbf{p})^T \Sigma_v^{-1} (\mathbf{z} - G\,\mathbf{p}) \tag{14.63}$$


The WLS estimate of $\mathbf{p}$ is that which minimizes WRSS($\mathbf{p}$):

$$\hat{\mathbf{p}} = \arg\min_{\mathbf{p}} \text{WRSS}(\mathbf{p}) = \arg\min_{\mathbf{p}} (\mathbf{z} - G\,\mathbf{p})^T \Sigma_v^{-1} (\mathbf{z} - G\,\mathbf{p}) \tag{14.64}$$

After some calculations, one has

$$\hat{\mathbf{p}} = (G^T \Sigma_v^{-1} G)^{-1} G^T \Sigma_v^{-1} \mathbf{z} \tag{14.65}$$

It is also possible to obtain an expression for the precision of $\hat{\mathbf{p}}$. Since the data $\mathbf{z}$ are affected by a measurement error $\mathbf{v}$, one finds that $\hat{\mathbf{p}}$ is also affected by an error, which we call the estimation error. It can be defined as

$$\tilde{\mathbf{p}} = \mathbf{p} - \hat{\mathbf{p}} \tag{14.66}$$

$\tilde{\mathbf{p}}$ is a random variable because $\hat{\mathbf{p}}$ is random. The covariance matrix of $\tilde{\mathbf{p}}$ is

$$\Sigma_{\tilde{p}} = \text{cov}(\tilde{\mathbf{p}}) = E[\tilde{\mathbf{p}}\,\tilde{\mathbf{p}}^T] = \Sigma_{\hat{p}} \tag{14.67}$$

and one can show that

$$\Sigma_{\hat{p}} = (G^T \Sigma_v^{-1} G)^{-1} \tag{14.68}$$
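A minimal numerical sketch of Eqs. (14.65) and (14.68) follows; it is not from the original text, and the design matrix, data, and error standard deviations are illustrative assumptions.

```python
# Minimal sketch: WLS estimate (Eq. 14.65) and its covariance (Eq. 14.68).
# G, z, and the error SDs are illustrative assumptions.
import numpy as np

G = np.array([[1.0, 2.0], [1.0, 5.0], [1.0, 9.0], [1.0, 14.0]])  # N x M
z = np.array([3.1, 6.2, 10.1, 15.2])                             # data
sd = np.array([0.1, 0.1, 0.2, 0.2])                              # sigma_i
Sv_inv = np.diag(1.0 / sd**2)                                    # Sigma_v^-1

A = G.T @ Sv_inv @ G
p_hat = np.linalg.solve(A, G.T @ Sv_inv @ z)   # Eq. (14.65)
Sigma_p = np.linalg.inv(A)                     # Eq. (14.68)
cv = 100 * np.sqrt(np.diag(Sigma_p)) / np.abs(p_hat)  # percent CV, Eq. (14.70)
print(p_hat, cv)
```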

The precision of the estimate $\hat{p}_i$ of $p_i$ is often expressed in terms of the standard deviation, i.e., the square root of the variance Var($\hat{p}_i$):

$$\text{SD}(\hat{p}_i) = \sqrt{\text{Var}(\hat{p}_i)} \tag{14.69}$$

It can also be given in terms of the FSD or CV, which measures the relative precision of the estimate:

$$\text{FSD}(\hat{p}_i) = \text{CV}(\hat{p}_i) = \frac{\text{SD}(\hat{p}_i)}{\hat{p}_i} \tag{14.70}$$

As noted previously, FSD and CV can be expressed as a percentage by multiplying them by 100. From Eq. (14.65) and Eq. (14.68), one sees that both $\hat{\mathbf{p}}$ and $\Sigma_{\hat{p}}$ depend on the $\sigma_i^2$. This is why it is essential that the investigator appreciate the nature of the error in the data. Up to this point, the assumption has been made that the model is correct. In this case, from a comparison between the equation describing the data, $\mathbf{z} = G\,\mathbf{p} + \mathbf{v}$, and the definition of the residual, $\mathbf{r} = \mathbf{z} - G\,\hat{\mathbf{p}}$, one can immediately conclude that the residuals $\mathbf{r}$ must reflect the measurement errors $\mathbf{v}$. For this in fact to be true, two conditions must hold: (1) the correct model or functional description of the data has been selected, and (2) the parameter estimation procedure has converged to values close to the "true" values. The


sequence of residuals can thus be viewed as an approximation of the measurement error sequence. One can check whether the above two conditions hold by testing, on the sequence of residuals, the assumptions made regarding the measurement error. Usually the measurement error is assumed to be a zero-mean, independent random process having a known variance. These assumptions can be checked on the residuals by means of statistical tests. Independence of the residuals can be tested visually using a plot of residuals versus time. It is expected that the residuals will oscillate around their mean, which should be close to zero, in an unpredictable way. Systematic residuals, i.e., a long run of residuals above or below zero, suggest that the model is an inappropriate description of the system because it is unable to describe a nonrandom component of the data. A formal test of nonrandomness of residuals is the run test. A run is defined as a subsequence of residuals having the same sign (assuming the residuals have zero mean); intuitively, a very small or very large number of runs in the residual sequence is an indicator of nonrandomness, i.e., of systematic errors in the former case and of periodicity in the latter. For details and examples, we refer readers to Cobelli et al. [4]. In WLS estimation, a specific assumption about the variance of the measurement errors has been made. If the model is correct, the residuals must reflect this assumption. Since

$$\text{Var}\left(\frac{v_i}{\sigma_i}\right) = \frac{1}{\sigma_i^2} \text{Var}(v_i) = 1 \tag{14.71}$$

if we define the weighted residuals as

$$\text{wres}_i = \frac{\text{res}_i}{\sigma_i} \tag{14.72}$$

they should be a realization of a random process having unit variance. By plotting the weighted residuals versus time, it is thus possible to visually test the assumption about the variance of the measurement error: the weighted residuals should lie within a band from −1 to +1. A typical plot of weighted residuals is shown in Fig. 14.6. A pattern of residuals different from what was expected indicates either the presence of errors in the functional description of the data or that the model is correct but the measurement error model is not appropriate. In this case, it is necessary to modify the assumptions about the measurement error structure. Some suggestions can be derived by examining the plot. As an example, consider the case where the variance of the measurement error is assumed to be constant. The residuals are expected to be confined to the region between −1 and +1. If their amplitude tends to increase in absolute value with respect to the observed value, a possible explanation is that the variance of the measurement


Figure 14.6 Plot of weighted residuals versus time.

error is not constant, thus suggesting a modification of the assumption about the measurement error variance.

Nonlinear Regression. Let us now turn to the nonlinear model of equations:

$$y(t) = g(t, \mathbf{p}) \tag{14.73}$$

$$z_i = y_i + v_i = g(t_i, \mathbf{p}) + v_i, \qquad i = 1, 2, \ldots, N \tag{14.74}$$

Let us put the model in compact matrix-vector notation as we did with the linear model. One has

$$\mathbf{y} = G(\mathbf{p}) \tag{14.75}$$

$$\mathbf{z} = \mathbf{y} + \mathbf{v} = G(\mathbf{p}) + \mathbf{v} \tag{14.76}$$

where

$$\mathbf{z} = [z_1, z_2, \ldots, z_N]^T \tag{14.77}$$

$$\mathbf{y} = [y_1, y_2, \ldots, y_N]^T \tag{14.78}$$

$$\mathbf{p} = [p_1, p_2, \ldots, p_M]^T \tag{14.79}$$

$$G(\mathbf{p}) = [g(t_1, \mathbf{p}), g(t_2, \mathbf{p}), \ldots, g(t_N, \mathbf{p})]^T \tag{14.80}$$

$$\mathbf{v} = [v_1, v_2, \ldots, v_N]^T \quad \text{with } \Sigma_v = \text{cov}(\mathbf{v}) \tag{14.81}$$


The WLS estimate of $\mathbf{p}$ is the one that minimizes

$$\text{WRSS}(\mathbf{p}) = [\mathbf{z} - G(\mathbf{p})]^T \Sigma_v^{-1} [\mathbf{z} - G(\mathbf{p})] \tag{14.82}$$

It can easily be shown, e.g., using the simple nonlinear model $y(t) = A e^{-\alpha t}$, that an explicit analytical solution for $\mathbf{p}$ analogous to Eq. (14.65) is not possible. To arrive at an estimate of $\mathbf{p}$, one possible strategy is based on iterative linearization of the model, i.e., the Gauss-Newton method. Let us go back to the model of Eq. (14.73) and consider the expression of $y(t)$ obtainable through its Taylor series expansion around a specific value of $\mathbf{p}$, say

$$\mathbf{p}^0 = [p_1^0, p_2^0, \ldots, p_M^0]^T \tag{14.83}$$

by neglecting the terms that contain derivatives of second order and higher:

$$y_i = g(t_i, \mathbf{p}) \cong g(t_i, \mathbf{p}^0) + \left[\frac{\partial g(t_i, \mathbf{p}^0)}{\partial p_1} \;\; \frac{\partial g(t_i, \mathbf{p}^0)}{\partial p_2} \;\; \cdots \;\; \frac{\partial g(t_i, \mathbf{p}^0)}{\partial p_M}\right] \begin{bmatrix} p_1 - p_1^0 \\ p_2 - p_2^0 \\ \vdots \\ p_M - p_M^0 \end{bmatrix} \tag{14.84}$$

where the derivatives are evaluated at $\mathbf{p} = \mathbf{p}^0$. Notice that this equation is now linear in $\mathbf{p}$. The data relate to the $y_i$ as in Eq. (14.74). Thus, using Eq. (14.84) and moving to vector notation, one has

$$\begin{bmatrix} z_1 - g(t_1, \mathbf{p}^0) \\ z_2 - g(t_2, \mathbf{p}^0) \\ \vdots \\ z_N - g(t_N, \mathbf{p}^0) \end{bmatrix} = \begin{bmatrix} \dfrac{\partial g(t_1, \mathbf{p}^0)}{\partial p_1} & \dfrac{\partial g(t_1, \mathbf{p}^0)}{\partial p_2} & \cdots & \dfrac{\partial g(t_1, \mathbf{p}^0)}{\partial p_M} \\ \dfrac{\partial g(t_2, \mathbf{p}^0)}{\partial p_1} & \dfrac{\partial g(t_2, \mathbf{p}^0)}{\partial p_2} & \cdots & \dfrac{\partial g(t_2, \mathbf{p}^0)}{\partial p_M} \\ \cdots & \cdots & \cdots & \cdots \\ \dfrac{\partial g(t_N, \mathbf{p}^0)}{\partial p_1} & \dfrac{\partial g(t_N, \mathbf{p}^0)}{\partial p_2} & \cdots & \dfrac{\partial g(t_N, \mathbf{p}^0)}{\partial p_M} \end{bmatrix} \begin{bmatrix} p_1 - p_1^0 \\ p_2 - p_2^0 \\ \vdots \\ p_M - p_M^0 \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_N \end{bmatrix} \tag{14.85}$$


and thus

$$\Delta\mathbf{z} = S\, \Delta\mathbf{p} + \mathbf{v} \tag{14.86}$$

with the obvious definitions of $\Delta\mathbf{z}$, $S$, and $\Delta\mathbf{p}$ from Eq. (14.85). Now, since $\Delta\mathbf{z}$ is known ($\mathbf{p}^0$ is given, $\mathbf{z}$ is measured) and $S$ can be computed, one can use WLS to estimate $\Delta\mathbf{p}$ with the linear machinery by using Eq. (14.65) with the correspondences $\mathbf{z} \leftrightarrow \Delta\mathbf{z}$, $\mathbf{p} \leftrightarrow \Delta\mathbf{p}$, and $G \leftrightarrow S$:

$$\Delta\hat{\mathbf{p}} = (S^T \Sigma_v^{-1} S)^{-1} S^T \Sigma_v^{-1} \Delta\mathbf{z} \tag{14.87}$$

Hence, a new estimate of $\mathbf{p}$ can be obtained as

$$\mathbf{p}^1 = \mathbf{p}^0 + \Delta\hat{\mathbf{p}} \tag{14.88}$$

Now, with $\mathbf{p}^1$, which is by definition a better estimate than $\mathbf{p}^0$ because WRSS($\mathbf{p}^1$) < WRSS($\mathbf{p}^0$), the process can restart: the model is linearized around $\mathbf{p}^1$, a new estimate $\mathbf{p}^2$ is obtained, and so on until the cost function stops decreasing significantly, e.g., when two consecutive values of WRSS($\mathbf{p}$) are within a prescribed tolerance. Once $\hat{\mathbf{p}}$ has been obtained, by paralleling the linear case, one can obtain the covariance of the parameter estimates as

$$\Sigma_{\hat{p}} \cong (S^T \Sigma_v^{-1} S)^{-1} \tag{14.89}$$

with

$$S = \begin{bmatrix} \dfrac{\partial g(t_1, \hat{\mathbf{p}})}{\partial p_1} & \dfrac{\partial g(t_1, \hat{\mathbf{p}})}{\partial p_2} & \cdots & \dfrac{\partial g(t_1, \hat{\mathbf{p}})}{\partial p_M} \\ \dfrac{\partial g(t_2, \hat{\mathbf{p}})}{\partial p_1} & \dfrac{\partial g(t_2, \hat{\mathbf{p}})}{\partial p_2} & \cdots & \dfrac{\partial g(t_2, \hat{\mathbf{p}})}{\partial p_M} \\ \cdots & \cdots & \cdots & \cdots \\ \dfrac{\partial g(t_N, \hat{\mathbf{p}})}{\partial p_1} & \dfrac{\partial g(t_N, \hat{\mathbf{p}})}{\partial p_2} & \cdots & \dfrac{\partial g(t_N, \hat{\mathbf{p}})}{\partial p_M} \end{bmatrix} \tag{14.90}$$
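The following minimal sketch, not from the original text, implements the Gauss-Newton iteration of Eqs. (14.86)-(14.88) for the monoexponential model $y(t) = A e^{-\lambda t}$; the data, error model, and starting values are illustrative assumptions.

```python
# Minimal sketch: Gauss-Newton WLS iteration (Eqs. 14.86-14.88) for
# y(t) = A * exp(-lam * t). Data and starting values are assumptions.
import numpy as np

t = np.array([2.0, 5.0, 10.0, 20.0, 40.0, 80.0])
z = np.array([3600.0, 3300.0, 2900.0, 2300.0, 1500.0, 600.0])
sd = 0.02 * z + 20.0                      # assumed error model for SD(v_i)
W = np.diag(1.0 / sd**2)                  # Sigma_v^{-1}

p = np.array([4000.0, 0.02])              # p0 = [A, lam], initial guess
for _ in range(20):
    A, lam = p
    g = A * np.exp(-lam * t)              # model prediction g(t, p)
    S = np.column_stack([np.exp(-lam * t),            # dg/dA
                         -A * t * np.exp(-lam * t)])  # dg/dlam
    dp = np.linalg.solve(S.T @ W @ S, S.T @ W @ (z - g))  # Eq. (14.87)
    p = p + dp                            # Eq. (14.88)
    if np.max(np.abs(dp / p)) < 1e-8:     # stop when updates are negligible
        break

Sigma_p = np.linalg.inv(S.T @ W @ S)      # Eq. (14.89)
print(p, 100 * np.sqrt(np.diag(Sigma_p)) / np.abs(p))  # estimates, %CV
```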

14.3.3.1 Residuals and weighted residuals

Residuals and weighted residuals are defined as for the aforementioned linear case. The linear machinery has been used to solve the nonlinear case. However, it is worth remarking that the nonlinear case is more complex to handle than the linear case. This is true not only from a computational point of view but also conceptually, due to the


Figure 14.7 WRSS as a function of p

presence of local minima of WRSS($\mathbf{p}$) and the necessity of specifying an initial estimate $\mathbf{p}^0$ of $\mathbf{p}$. To illustrate this additional complexity graphically, let us consider the scalar case with WRSS($p$) as a function of $p$, shown in Fig. 14.7. There is more than one minimum for WRSS, and this is distinctly different from the linear case, where there is only one (unique) minimum. The minima shown in Fig. 14.7 are called local minima. The difference, then, between the linear and nonlinear cases is that in linear regression there is a unique minimum for WRSS, while in the nonlinear case there may be several local minima. Among the local minima, the smallest is called the global minimum. This has obvious implications for the choice of $\mathbf{p}^0$. Generally, to be sure one is not ending up at a local minimum, several tentative values of $\mathbf{p}^0$ are used as starting points. The steps of nonlinear least squares estimation have been illustrated using the Gauss-Newton iterative scheme. This outlines the principles of the class of algorithms that requires the computation of the derivatives contained in matrix $S$. This is usually done numerically (e.g., using central difference methods), albeit other strategies are also available, e.g., the sensitivity system [4]. This class is referred to as gradient-type (derivative) algorithms. Numerically refined and efficient algorithms, e.g., the Levenberg-Marquardt technique, based on the Gauss-Newton principle, are available and implemented in many software tools. Another category of algorithms for minimizing WRSS that has been applied in physiological model parameter estimation is one that does not require computation of the derivatives. These algorithms are known as direct search methods, and both deterministic and random search algorithms are available and implemented in software tools. An efficient deterministic direct search algorithm is the simplex method. It is worth emphasizing that with a direct search method, computation of derivatives is not required. Albeit a direct comparison of gradient versus direct search methods is difficult and may be problem dependent, available experience in physiological model parameter estimation tends to favor the gradient-type methods.
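As a minimal sketch of the multistart strategy, not from the original text, one can minimize WRSS from several tentative starting points and keep the solution with the smallest cost; the model, data, and starting grid below are illustrative assumptions.

```python
# Minimal sketch: multistart minimization of WRSS(p) to avoid local minima.
# Model, data, and starting points are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

t = np.array([2.0, 5.0, 10.0, 20.0, 40.0, 80.0])
z = np.array([3600.0, 3300.0, 2900.0, 2300.0, 1500.0, 600.0])
w = 1.0 / (0.02 * z + 20.0) ** 2          # weights 1/sigma_i^2

def wrss(p):
    A, lam = p
    return np.sum(w * (z - A * np.exp(-lam * t)) ** 2)

starts = [(3000.0, 0.005), (4000.0, 0.02), (5000.0, 0.1)]
fits = [minimize(wrss, p0, method='BFGS') for p0 in starts]
best = min(fits, key=lambda f: f.fun)     # keep the smallest WRSS
print(best.x, best.fun)
```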


14.3.3.2 Test of model order

Up to this point, only the problem of testing whether a specific model is an appropriate description of a set of data has been examined. Consider now the case where different candidate models are available, and the problem is to select the model that provides the best description of the data. For example, consider multiexponential modeling of a decay curve,

$$y(t) = \sum_{i=1}^{n} A_i e^{-\lambda_i t} \tag{14.91}$$

where the model order, that is, the number $n$ of exponentials, is not known a priori. Mono-, bi-, and triexponential models are usually fitted to the data, and the results of parameter estimation are evaluated in order to select the optimum order, i.e., the "best" value of $n$. Relying solely on WRSS and an examination of the weighted residuals to determine the optimum model order is not appropriate, since as the model order increases, WRSS will decrease. For example, in dealing with a tracer decay curve following a bolus injection, each additional exponential term added to the sum of exponentials will decrease WRSS. Similarly, the pattern of residuals will become more random. However, each time an exponential term is added, two parameters are added and the degrees of freedom are decreased by two. Thus, intuitively, when comparing different model structures, both WRSS and the degrees of freedom should be evaluated, in order to check whether the reduction of WRSS truly reflects a more accurate representation of the data or is merely the result of the increased number of parameters. Hence, additional tests are required. The two tests most frequently used to compare model structures are the F-test and tests based on the principle of parsimony. In the following, we briefly describe only the latter and refer the reader to Cobelli et al. [4] for an illustration of the F-test. The tests most commonly employed to implement the principle of parsimony, i.e., to choose the model best able to fit the data with the minimum number of parameters, are the Akaike information criterion (AIC) and the Schwarz criterion (SC). More than two models can be compared, and the model with the smallest criterion value is chosen as the best. If one assumes that the errors in the data are uncorrelated and Gaussian, with a known measurement error variance, then the criteria are

$$\text{AIC} = \text{WRSS} + 2M \tag{14.92}$$

$$\text{SC} = \text{WRSS} + M \ln N \tag{14.93}$$


where $M$ is the number of parameters in the model and $N$ is the number of data points. While having different derivations, AIC and SC are similar, as they are made up of a goodness-of-fit measure plus a penalty function proportional to the number of parameters $M$ in the model. Note that in SC, $M$ is weighted by $\ln N$; i.e., with large $N$, this penalty may become important.
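A minimal sketch of such a comparison follows; it is not from the original text, and the WRSS and $M$ values are illustrative assumptions.

```python
# Minimal sketch: compare candidate model orders with AIC (Eq. 14.92)
# and SC (Eq. 14.93). WRSS values are illustrative assumptions.
import numpy as np

N = 33                                   # number of data points
candidates = {1: (130.0, 2),             # model order: (WRSS, M), assumed
              2: (29.6, 4),
              3: (27.2, 6)}

for order, (wrss, M) in candidates.items():
    aic = wrss + 2 * M                   # Eq. (14.92)
    sc = wrss + M * np.log(N)            # Eq. (14.93)
    print(order, round(aic, 1), round(sc, 1))
# The model with the smallest criterion value is selected.
```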

14.4 Model validation

It is not difficult to build models of systems; the difficulty lies in making them accurate and reliable in answering the question asked. For the model to be useful, one must have confidence in the results and the predictions inferred from it. Such confidence can be obtained by model validation. Validation involves the assessment of whether a compartmental model is adequate for its purpose. This is a difficult and highly subjective task when modeling physiological systems because intuition and an understanding of the system, among other factors, play an important role in this assessment. It is also difficult to formalize related issues such as model credibility or the use of the model outside its established validity range. Some efforts have been made, however, to provide formal aids for assessing the value of models of physiological systems. Validity criteria and validation strategies for models of physiological systems are available [7] that account for both the complexity of the model structure and the extent of available experimental data. Of particular importance is the ability to validate a model-based measurement of a system parameter by an independent experimental technique. A model that is valid is not necessarily a true one; all models have a limited domain of validity, and it is hazardous to use a model outside the area for which it has been validated.

14.4.1 Simulation

Suppose one wishes to see how the system behaves under certain stimuli, but it is inappropriate or impossible to carry out the required experiment. If a valid model of the system is available, one can perform the experiment on the model instead, using a computer, to see how the system would have reacted. This is called simulation. Simulation is thus an inexpensive and safe way to experiment with the system. Clearly, the value of simulation results depends completely on the quality or validity of the model of the system. Having derived a complete model, including estimating all unknown parameters and checking its validity in relation to its intended domain of application, it is now possible to use it as a simulation tool. Computer simulation involves solving the model (i.e., the equations that are the realization of the model) in order to examine its output behavior. This might typically be the time course of one or more of the system variables. In other words, we are performing computer experiments on the model. In fact, simulation can be used either during the process of model building or with a complete model. During


model building, simulation can be performed in order to clarify some aspects of behavior of the system or part of it in order to determine whether a proposed model representation seems appropriate. This would be done by a comparison of model response with experimental data from the same situation. Simulation, when performed on a complete, validated model, yields output responses that provide information regarding system behavior; information that, depending on the modeling purpose, assists in describing the system, predicting behavior, or yielding additional insights (i.e., explanation). Why carry out computer simulation? The answer is that it might not be possible, appropriate, convenient, or desirable to perform particular experiments on the system (e.g., it cannot be done at all, is too difficult, is too expensive, is too dangerous, is not ethical, or would take too long to obtain results). Therefore, we need an alternative way of experimenting with the system. Simulation offers an alternative that overcomes the just-mentioned limitations. Such experimenting can provide information that is useful in relation to our modeling purpose. In order to perform computer simulation, we first need a mathematical model that is complete in terms of all the parameters being specified and has initial conditions specified for all variables. If the model is not complete in the sense that some parameter values remain unspecified, formal parameter estimation techniques must be employed to obtain such estimates. The model is then implemented on the computer. This assumes that the model equations cannot be, or are not being, solved analytically and that a numerical solution of the system is needed. The model is solved on the computer, with the solution process producing the time course of the system variables. In technical terms, the computer implementation is done either using a standard programming language (e.g., FORTRAN, C) or using a specialist simulation package (e.g., MATLAB).

14.5 Case study

To illustrate the methodological points we have been making, consider the following set of data that will be described by a sum of exponentials. In our previous discussion, we have focused on compartmental models. However, when one is starting "from scratch," it is often wise to fit the data to a sum of exponentials, because this gives a clue as to how many compartments will be required in the model. Consider the data given in Table 14.1; these data are radioactive tracer glucose concentrations measured in plasma following an injection of tracer at time zero. The time measurements are in minutes, and the plasma measurements are in dpm/mL. The experiment was performed in a normal subject in the basal state [8]. In order to select the order of the multiexponential model that is best able to describe these data, one-, two-, and three-exponential models can be considered:

$$y(t) = A_1 e^{-\lambda_1 t} \tag{14.94}$$


Table 14.1 Plasma data from a tracer experiment. Time is in minutes; Plasma is the measured concentration (dpm/mL); SD is the standard deviation of the measurement error, Eq. (14.98).

Time   Plasma    SD      |  Time   Plasma    SD
2      3993.50   99.87   |  28     2252.00   65.04
4      3316.50   86.33   |  31     2169.50   63.39
5      3409.50   88.19   |  34     2128.50   62.57
6      3177.50   83.55   |  37     2085.00   61.70
7      3218.50   84.37   |  40     2004.00   60.08
8      3145.00   82.90   |  50     1879.00   57.58
9      3105.00   82.10   |  60     1670.00   53.40
10     3117.00   82.34   |  70     1416.50   48.33
11     2984.50   79.69   |  80     1333.50   46.67
13     2890.00   77.80   |  90     1152.00   43.04
14     2692.00   73.84   |  100    1080.50   41.61
15     2603.00   72.06   |  110    1043.00   40.86
17     2533.50   70.67   |  120    883.50    37.67
19     2536.00   70.72   |  130    832.50    36.65
21     2545.50   70.91   |  140    776.00    35.52
23     2374.00   67.48   |  150    707.00    34.14
25     2379.00   67.58   |

$$y(t) = A_1 e^{-\lambda_1 t} + A_2 e^{-\lambda_2 t} \tag{14.95}$$

$$y(t) = A_1 e^{-\lambda_1 t} + A_2 e^{-\lambda_2 t} + A_3 e^{-\lambda_3 t} \tag{14.96}$$

The measurement error is assumed to be additive:

$$z_i = y_i + v_i \tag{14.97}$$

where the errors $v_i$ are assumed to be independent and Gaussian, with zero mean and an experimentally determined standard deviation of

$$\text{SD}(v_i) = 0.02\, z_i + 20 \tag{14.98}$$
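A minimal sketch of this weighted nonlinear fit for the two-exponential model follows; it is not from the original text, it uses only a subset of the data in Table 14.1 for brevity, and the initial guess is an assumption.

```python
# Minimal sketch: weighted nonlinear least squares fit of the
# two-exponential model (Eq. 14.95) with the error model of Eq. (14.98).
import numpy as np
from scipy.optimize import curve_fit

t = np.array([2, 4, 5, 6, 7, 8, 9, 10, 25, 50, 100, 150], dtype=float)
z = np.array([3993.5, 3316.5, 3409.5, 3177.5, 3218.5, 3145.0,
              3105.0, 3117.0, 2379.0, 1879.0, 1080.5, 707.0])
sd = 0.02 * z + 20.0                       # SD(v_i), Eq. (14.98)

def two_exp(t, A1, l1, A2, l2):
    return A1 * np.exp(-l1 * t) + A2 * np.exp(-l2 * t)

p0 = [1000.0, 0.1, 3000.0, 0.01]           # initial guess (assumed)
p_hat, Sigma_p = curve_fit(two_exp, t, z, p0=p0, sigma=sd,
                           absolute_sigma=True)
cv = 100 * np.sqrt(np.diag(Sigma_p)) / np.abs(p_hat)   # percent CV
print(p_hat, cv)
```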

These values are shown, associated with each datum, in Table 14.1. The three models are fitted to the data by weighted nonlinear regression, with the weights chosen equal to the inverse of the variance. The plots of the data and the model predictions, together with the corresponding weighted residuals, are shown in Fig. 14.8, and the model parameters are given in Table 14.2. Examining Table 14.2, we can see that all parameters can be estimated with acceptable precision in the one- and two-exponential models, while some parameters of the three-exponential model are quite uncertain. This means that the three-exponential model cannot be resolved with precision from the data. In fact, the first exponential is so


Figure 14.8 The best fit of the data given in Table 14.1 to single, two-, and three-exponential models together with a plot of the weighted residuals for each case. The exponential coefficients and eigenvalues for each model are given in Table 14.2.

Table 14.2 One-, two-, and three-exponential model parameter estimates (see text for explanation).

            1 exponential    2 exponentials    3 exponentials
A1          3288 (1%)        1202 (10%)
λ1          0.0111 (1%)      0.1383 (17%)
A2                           2950 (2%)
λ2                           0.0098 (3%)
A3
λ3

Run test: Z value      5.13
Run test: 5% region    [-1.96, 1.96]
Run test: P value      20%
χ² test / WRSS         29.59
F test: 5% region      [0, 3.33]