Machine Learning and Deep Learning in Computational Toxicology 3031207297, 9783031207297

This book is a collection of machine learning and deep learning algorithms, methods, architectures, and software tools t

English Pages 653 [654] Year 2023

Table of contents:
Preface
Contents
Editor and Contributors
1 Machine Learning and Deep Learning Promote Computational Toxicology for Risk Assessment of Chemicals
1.1 Risk Assessment of Chemicals
1.2 Computational Toxicology
1.3 Machine Learning in Computational Toxicology
1.4 Deep Learning in Toxicology
1.5 Perspectives
References
Part I Machine Learning and Deep Learning Methods for Computational Toxicology
2 Assessment of the Xenobiotics Toxicity Taking into Account Their Metabolism
2.1 Introduction
2.2 Computational Methods of Studying Metabolism
2.2.1 Databases Containing Xenobiotic Metabolism Information
2.2.2 Descriptors/Notation Used for Metabolism Prediction
2.2.3 Prediction of Biotransformation Sites
2.2.4 Generation of the Structures of Probable Metabolites
2.2.5 Reactive Metabolite Formation Prediction
2.3 Integral Computational Assessment of Xenobiotic Toxicity
2.4 Future Directions in Xenobiotic Toxicity Assessment
References
3 Emerging Machine Learning Techniques in Predicting Adverse Drug Reactions
3.1 Introduction
3.2 Feature Generation for Machine Learning
3.2.1 Structure-Based Features
3.2.2 Interactions and Associations
3.2.3 Data Sources for Feature Generation
3.3 Conventional Methods for ADR Prediction
3.4 Emerging Methods for ADR Prediction
3.4.1 Molecule-Based Methods
3.4.2 Similarity-Based Methods
3.4.3 Network- and Graph-Based Methods
3.5 ADR Prediction Future Directions
References
4 Drug Effect Deep Learner Based on Graphical Convolutional Network
4.1 Introduction
4.2 Results
4.2.1 Gene Vector: Generation and Evaluation
4.2.2 Molecular Feature and Vector Generation
4.2.3 Cell Vector: Generation and Evaluation
4.2.4 Deep Drug Effect Predictor: Training and Validation
4.2.5 Application of DDEP to Predict the Effects of Anti-cancer Drugs Against Breast Adenocarcinoma
4.2.6 Insights into Drug Classification
4.3 Discussion
4.4 Methods
4.4.1 Capture Contextual Information of Genes from Their Interaction Networks
4.4.2 Generating Gene Vectors and Cell Vectors
4.4.3 GCN-Based Pre-models
4.4.4 Deep Drug Effect Predictor
References
5 AOP-Based Machine Learning for Toxicity Prediction
5.1 Introduction
5.2 Research Status and Existing Problems for ML
5.3 General Overview of AOP
5.3.1 The Generation of AOP
5.3.2 The Framework of AOP
5.3.3 Qualitative AOP and Quantitative AOP
5.4 Research Progress of Toxicity Prediction by AOP and ML
5.5 Perspectives and Future Prospects of AOP
References
6 Graph Kernel Learning for Predictive Toxicity Models
6.1 Introduction
6.2 A Brief Introduction of Graph Concepts
6.2.1 Graph Theory Definitions
6.2.2 Graph Kernels Fundamentals
6.3 Graph Kernel Learning for Molecular Representations
6.4 Applications of GKL Methods on Chemical Toxicity
6.4.1 Benchmark Data Sets and Methods About Chemical Toxicity
6.4.2 Applications of Graph Kernel-Based Methods
6.4.3 Applications of Graph Neural Networks
6.4.4 Applications of Learnable Graph Embeddings
6.4.5 Applications of Learnable Graph Kernels
6.5 Challenges and Perspectives of Graph Kernel Learning on Toxicity-Related Problems
6.6 Conclusion
References
7 Optimize and Strengthen Machine Learning Models Based on in Vitro Assays with Mechanistic Knowledge and Real-World Data
7.1 Introduction
7.2 Incorporating AOPs to Construct Parsimonious Machine Learning Models
7.2.1 AOPs and AOP Networks
7.2.2 Using AOPs to Facilitate Building Parsimonious Machine Learning Models
7.3 Utilize Spontaneous Reporting Databases to Corroborate Findings of Machine Learning Models
7.3.1 Statistical Methods for Safety Signal Mining Using Spontaneous Reporting Databases
7.3.2 Obtain Data from FAERS
7.3.3 Poisson Regression Model for Report Counts
7.3.4 Incorporating Host Factors in Testing
7.3.5 Utilize FAERS Data to Corroborate Models Based on in Vitro Assays
7.4 Conclusions
References
8 Multitask Learning for Quantitative Structure–Activity Relationships: A Tutorial
8.1 Introduction
8.2 QSAR and Multitask Learning
8.2.1 Definition of MTL Problem
8.2.2 Task Relatedness
8.2.3 Multitask Neural Networks
8.2.4 Performance Evaluation
8.3 Case Study: NURA Dataset
8.4 Hands-On Tutorial
8.4.1 Getting Started
8.5 Conclusions
References
Part II Tools and Approaches Facilitating Machine Learning and Deep Learning Methods in Computational Toxicology
9 Isalos Predictive Analytics Platform: Cheminformatics, Nanoinformatics, and Data Mining Applications
9.1 Introduction
9.2 Isalos Platform
9.3 Data Input
9.4 Data Transformation
9.4.1 Normalizers
9.4.2 Data Manipulation
9.4.3 Dataset Splitting
9.5 Analytics
9.5.1 Modelling Methodologies
9.5.2 Feature Selection
9.5.3 Existing Model Utilization
9.6 Statistics
9.6.1 Domain—APD
9.6.2 Model Metrics
9.7 Development of Predictive Models with Isalos
9.7.1 Ecotox Models
9.7.2 Molecular, Size, and Surface-Based Safe by Design (MS3bD, MSzeta) Model
9.7.3 Cell Viability Model
9.8 Conclusions
References
10 ED Profiler: Machine Learning Tool for Screening Potential Endocrine-Disrupting Chemicals
10.1 Introduction
10.2 Materials and Methods
10.2.1 Data Sets
10.2.2 Molecular Descriptor Calculation
10.2.3 (Q)SAR Modeling
10.2.4 Applicability Domain and Reliability Evaluation
10.2.5 Software Development
10.3 Development of Predictive Models
10.3.1 Proposed Predictive Model System
10.3.2 Development of SAR Models
10.3.3 Development of QSAR Models
10.4 Development of Software
10.4.1 Features and Overview of the Software
10.4.2 Examples
10.5 Conclusions
References
11 Quantitative Target-specific Toxicity Prediction Modeling (QTTPM): Coupling Machine Learning with Dynamic Protein–Ligand Interaction Descriptors (DyPLIDs) to Predict Androgen Receptor-mediated Toxicity
11.1 Introduction
11.2 Materials and Methods
11.2.1 Study Design
11.2.2 Dataset Curation, Preprocessing, and Chemical Preparation
11.2.3 Molecular Docking
11.2.4 Molecular Dynamics (MD) Simulations
11.2.5 Dynamic Protein–Ligand Interaction Descriptors (DyPLIDs) Calculation
11.2.6 Feature Selection: Down-Selection of Descriptors
11.2.7 Dataset Splitting
11.2.8 Machine Learning for Quantitative AR Activity Prediction Modeling
11.3 Results and Discussion
11.3.1 Conformational Ensemble of AR-Ligand Interactions
11.3.2 Comparison of 6 ns Versus 100 ns Simulations
11.3.3 Fingerprint Chemical Diversity
11.3.4 Predictive QSAR Model
11.3.5 Feature Importance
11.3.6 Model Limitation
11.4 Conclusion
References
12 Mold2 Descriptors Facilitate Development of Machine Learning and Deep Learning Models for Predicting Toxicity of Chemicals
12.1 Introduction
12.2 Mold2 Descriptors
12.2.1 The Descriptors
12.2.2 The Software
12.3 Information Content of Mold2 Descriptors
12.4 Applications in Machine Learning
12.4.1 Predicting Estrogenic Activity
12.4.2 Predicting Androgenic Activity
12.4.3 Predicting Kinase Inhibitors
12.5 Applications in Deep Learning
12.5.1 Predicting DILI
12.5.2 Predicting Drug-Likeness
12.6 Summary
References
13 Applicability Domain Characterization for Machine Learning QSAR Models
13.1 An Outline of Quantitative Structure–Activity Relationship (QSAR) Models
13.1.1 Core Elements of QSAR Models
13.1.2 Validity of QSAR Models
13.2 Concepts and Understandings of AD
13.2.1 Physicochemical, Structural, Mechanistic, and Metabolic Aspects
13.2.2 Interpolation, Distance/Similarity, and Boundary
13.2.3 AD Metrics Evaluating Prediction Performance on Individual Chemicals
13.3 AD Characterization Methods
13.3.1 Descriptor Domain
13.3.2 Structural (Similarity) Domain
13.3.3 Clustering-Based Methods
13.3.4 First-Class AD Metrics
13.3.5 Second-Class AD Metrics
13.3.6 Visualization of AD
13.4 Impacts on the QSAR Modeling Scenario from Machine Learning Algorithms
13.5 Toward Broader AD for Machine Learning QSAR Models
References
14 Controlling for Confounding in Complex Survey Machine Learning Models to Assess Drug Safety and Risk
14.1 Introduction and Background
14.2 The Propensity Score Method
14.3 Software and Assessment of Balance
14.4 A Drug Safety Model for Prescription NSAIDs
14.5 Propensity Score Weighting
14.5.1 Adjusting for Confounding with Propensity Score ATE Weighting
14.5.2 Adjusting for Confounding with Propensity Score ATT Weighting
14.6 Propensity Score Stratification
14.6.1 Adjusting for Confounding with Propensity Score Stratification
14.7 Conclusion
References
15 Multivariate Curve Resolution for Analysis of Heterogeneous System in Toxicogenomics
15.1 Introduction
15.2 Basic Conceptions and Application Scenarios
15.2.1 The Definition of MCR in Heterogeneous Systems
15.2.2 The Ultimate Goal of Applying MCR in TGx
15.3 Method Categories
15.3.1 Determining or Estimating k
15.3.2 Determining or Estimating E and W
15.3.3 Jointly Utilization and Alternate Estimation
15.4 Available Resources
15.4.1 Databases of TGx Data
15.4.2 Functions of MCR
15.4.3 Tools of Deconvolution by Using MCR
15.5 Perspective and Future Directions
References
Part III Machine Learning and Deep Learning for Chemical Toxicity Prediction
16 The Use of Machine Learning to Support Drug Safety Prediction
16.1 Introduction
16.2 Chemical-Based Safety Machine Learning
16.2.1 Overview
16.2.2 Databases
16.2.3 Machine Learning Algorithms
16.3 Case Study—Assessment of Pharmaceutical Impurities
16.4 Conclusions
References
17 Machine Learning-Based QSAR Models and Structural Alerts for Prediction of Mitochondrial Dysfunction
17.1 Introduction
17.2 Datasets and Methods
17.2.1 Data on Mitochondrial Dysfunction
17.2.2 Machine Learning Methods Used for Model Construction
17.2.3 Model Evaluation
17.2.4 Methods to Identify Structural Alerts
17.3 Mitochondrial Dysfunction QSAR Models and Structural Alerts
17.3.1 Mitochondrial Dysfunction QSAR Models
17.3.2 Structural Alerts for Mitochondrial Dysfunction
17.4 Conclusions and Future Directions
References
18 Machine Learning and Deep Learning Applications to Evaluate Mutagenicity
18.1 In Silico Methods to Predict Bacterial Mutagenicity
18.2 Data for Modeling Mutagenicity
18.3 Traditional Machine Learning for Mutagenicity Prediction
18.4 Deep Learning for Mutagenicity Prediction
18.5 Discussion and Perspective
References
19 Modeling Tox21 Data for Toxicity Prediction and Mechanism Deconvolution
19.1 Introduction
19.2 Tox21 10K Compound Library and Assay Data
19.2.1 Tox21 Compound Collection
19.2.2 Tox21 qHTS Process
19.3 Modeling Tox21 Data for Toxicity Prediction
19.3.1 Multiple Species In Vivo Toxicity
19.3.2 Human In Vivo Toxicity
19.3.3 In Vitro Toxicity
19.4 Toxicity Pathways and Mechanisms
19.5 Conclusions and Moving Forward
References
20 Identification of Structural Alerts by Machine Learning and Their Applications in Toxicology
20.1 Introduction
20.2 Approaches for Identification of Structural Alerts
20.2.1 Expert Systems
20.2.2 Computational Approaches
20.2.3 Comparison of Data-Driven Structural Alerts with Expert Systems
20.3 Application of Structural Alerts in Toxicology
20.3.1 Toxicity Prediction
20.3.2 Explanation of QSAR Models
20.3.3 Molecular Optimization
20.3.4 Exploring New Mechanisms
20.4 Perspectives and Outlook
References
21 Machine Learning in Prediction of Nanotoxicology
21.1 Introduction
21.2 Toxicity of Nanomaterials
21.2.1 Toxicity of Carbon Nanomaterials
21.2.2 Toxicity of Transition Metal Dichalcogenides
21.2.3 Toxicity of MOFs
21.3 Prediction of Nanotoxicity by Machine Learning
21.3.1 Prediction of Carbon Nanomaterials Toxicity by Machine Learning
21.3.2 Prediction of Nanometal Toxicity by Machine Learning
21.3.3 Prediction of Nanometal Oxide Toxicity by Machine Learning
21.3.4 Prediction of Other Nanomaterials Toxicity by Machine Learning
21.4 Future Directions of Machine Learning in Nanotoxicology Prediction
References
22 Machine Learning for Predicting Organ Toxicity
22.1 Introduction
22.2 Machine Learning Algorithms
22.2.1 Classification and Regression Tree
22.2.2 k-Nearest Neighbors (kNN)
22.2.3 Naïve Bayes (NB)
22.2.4 Random Forest
22.2.5 Support Vector Machine
22.3 Organ Toxicity Prediction
22.3.1 Liver Toxicity
22.3.2 Kidney Toxicity
22.3.3 Heart Toxicity
22.4 A Case Study for Organ Toxicity Prediction
22.4.1 Data Sources
22.4.2 Supervised Machine Learning
22.4.3 Results
22.5 Summary
References
Part IV The Progress of Machine Learning and Deep Learning in New Areas
23 Computational Modeling for the Prediction of Hepatotoxicity Caused by Drugs and Chemicals
23.1 Introduction
23.2 Machine Learning Methods for Predicting Hepatotoxicity
23.2.1 Toxicity Dataset for Machine Learning
23.2.2 Metrics for Evaluating Model Performance
23.2.3 Machine Learning Algorithms
23.3 A Case Study: Machine Learning Modeling for Hepatotoxicity Prediction
23.3.1 Data Sources
23.3.2 Modeling by Machine Learning Approaches
23.3.3 Results
23.4 Summary and Future Direction
References
24 Artificial Intelligence for Risk Assessment of Cancer Therapy-Related Cardiotoxicity and Precision Cardio-Oncology
24.1 Introduction
24.2 Methods and Materials
24.2.1 Data Resources
24.2.2 Molecular Feature and Vector Generation
24.2.3 Defining Biological Endpoints and Clinical Outcomes
24.2.4 AI/ML Algorithm and Model Selection
24.2.5 Evaluating Model Performance
24.3 Variable Network Construction
24.4 Case Studies
24.4.1 In Silico Pharmacoepidemiologic Evaluation of Drug-Induced Cardiovascular Complications Using Combined Classifiers
24.4.2 Machine Learning-Based Risk Assessment for Cancer Therapy-Related Cardiac Dysfunction in 4300 Longitudinal Oncology Patients
24.4.3 Cardiac Risk Stratification in Cancer Patients: A Longitudinal Patient-Patient Network Analysis
24.5 Future Directions and Conclusion
References
25 Deep Learning Model for Prediction of Compound Activities Over a Panel of Major Toxicity-Related Proteins
25.1 Introduction
25.2 Methods
25.2.1 Dataset
25.2.2 Chemical Diversity Analysis
25.2.3 Prediction Models
25.2.4 Evaluation Metrics
25.3 Results and Discussion
25.3.1 Data Collection and Analysis
25.3.2 Drug and Target Representations Selection
25.3.3 Model Performance
25.3.4 Comparison with Conventional per Protein Models
25.3.5 External Validation
25.4 Conclusions
References
26 Machine Learning for Analyzing Drug Safety in Electronic Health Records
26.1 Introduction
26.2 Drug Safety Problems to Solve with ML
26.2.1 Prescription Error
26.2.2 Medication Misuse
26.2.3 Drug-Drug Interactions
26.3 Recent Trends of NLP and ML Methods in Pharmacovigilance
26.3.1 The Existing of NLP Approaches
26.3.2 Machine Learning Methods
26.4 Discussions
References
27 Powering Toxicogenomic Studies by Applying Machine Learning to Genomic Sequencing and Variant Detection
27.1 Introduction
27.2 Machine Learning in Genomic Variant Detections
27.2.1 Machine Learning Algorithms in Germline Variant Detection
27.2.2 Challenges in Somatic Mutation Calling
27.2.3 Machine Learning to Improve Accuracy of Somatic Mutation Detection
27.3 Training Data for Machine Learning-Based Variant Callers
27.4 Conclusion
References
28 Machine Learning for Predicting Gas Adsorption Capacities of Metal Organic Framework
28.1 Introduction
28.2 Data Sources
28.3 Descriptors of MOFs
28.4 ML Algorithms
28.5 ML Models for Predicting Gas Adsorption of MOFs
28.5.1 ML Models for CH4 Adsorption
28.5.2 ML Models for H2 Adsorption
28.5.3 ML Models for CO2 Adsorption
28.5.4 ML Models for Xe/Kr Selective Adsorption
28.6 Conclusion Remarks and Future Perspective
References

Computational Methods in Engineering & the Sciences

Huixiao Hong   Editor

Machine Learning and Deep Learning in Computational Toxicology

Computational Methods in Engineering & the Sciences

Series Editor
Klaus-Jürgen Bathe, Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA

This Series publishes books on all aspects of computational methods used in engineering and the sciences. With emphasis on simulation through mathematical modelling, the Series accepts high-quality books across different domains of engineering, materials, and other applied sciences. The Series publishes monographs, contributed volumes, professional books, and handbooks, spanning cutting-edge research as well as the basics of professional practice. The topics of interest include the development and applications of computational simulations in the broad fields of Solid & Structural Mechanics, Fluid Dynamics, Heat Transfer, Electromagnetics, Multiphysics, Optimization, and Stochastics, with simulations in and for Structural Health Monitoring, Energy Systems, Aerospace Systems, Machines and Turbines, Climate Prediction, Effects of Earthquakes, Geotechnical Systems, Chemical and Biomolecular Systems, Molecular Biology, Nano and Microfluidics, Materials Science, Nanotechnology, Manufacturing and 3D printing, Artificial Intelligence, and Internet-of-Things.

Editor
Huixiao Hong
Division of Bioinformatics and Biostatistics
National Center for Toxicological Research
U.S. Food and Drug Administration
Jefferson, AR, USA

ISSN 2662-4869
ISSN 2662-4877 (electronic)
Computational Methods in Engineering & the Sciences
ISBN 978-3-031-20729-7
ISBN 978-3-031-20730-3 (eBook)
https://doi.org/10.1007/978-3-031-20730-3

This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2023

All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Owing to the recent explosive growth of the chemical industry, compounded by improved methods to rapidly synthesize compounds with more diverse chemical structures, it is imperative to efficiently assess the risk that chemicals pose to public health and to evaluate the safety profile of chemical-containing products. Safety evaluation and risk assessment are needed to protect not only the human species, but also other living organisms and our ecosystem. Beyond assessing the chemicals themselves, the field must also assess their byproducts and their interactions with other chemicals that can occur in living organisms or the environment.

Until recently, the gold standard for toxicology hazard assessment was animal models. However, animal models have numerous fundamental limitations, including, but not limited to, high cost, low throughput, low accuracy, and ethical concerns. Thus, the scientific community as a whole has declared the importance of the three “Rs”: Replace, Reduce, and Refine animal use. To meet any of these three R goals, alternative methods are required to assess the safety and hazard of chemicals. Fortunately, the toxicology community is beginning to fully realize the issues associated with animal models at the same time that the field of computer science is advancing at an incredible pace. Thus, computational toxicologists are collaborating with computer scientists to spark the dawn of the computational toxicology era. This field has the goal of predicting and understanding the hazards of chemicals in a cost- and time-efficient manner with very low risk of ethical issues. As new computational methods are developed, they can be applied to toxicology challenges, which will allow for continual iteration and improvement.

Currently, artificial intelligence, mainly machine learning and deep learning, is in a Cambrian era, with new advanced algorithms being developed and deployed across a number of fields. This rapid advance has quickly transformed industry, jobs, and even society. While machine learning and deep learning were developed in the context of other fields, both have been successfully applied to computational toxicology.

This textbook will survey the landscape of the deployment of various advanced algorithms to solve important questions around the hazard and toxicity of chemicals. As many of these algorithms come from other fields, this work will also cover important best practices to be aware of when deploying such algorithms to solve critical challenges. To give well-deserved attention to the many facets of computational toxicology, this book is divided into four parts. First, the text will detail the machine learning and deep learning algorithms most relevant to the field of computational toxicology and why they are relevant. Next, this will be expanded by describing tools and approaches to enable efficient application of advanced algorithms to toxicology. As examples are important to illustrate relevance, this textbook will then focus on the application of machine learning and deep learning to chemical toxicity prediction. This textbook will also delve into the future of nascent approaches and new progress in the area.

This textbook was authored with a wide audience in mind. For the classical toxicologist looking to dip a toe into computational approaches, computational concepts are thoroughly introduced and explained through clarifying examples. For the computer scientist looking to apply machine learning and deep learning to other fields, this textbook showcases numerous examples of how others have successfully applied advanced computation to solve toxicology problems. Lastly, for students and other trainees eager to further their career, this body of work will survey a variety of computational toxicology topics and could provide a spark of inspiration for a career direction. While the following chapters review a variety of computational toxicology topics, they do not contain computer code samples, chapter quizzes, or practice exams. For instructional purposes, developing quizzes and exams is left to the course instructor. The authors would humbly appreciate comments, feedback, and corrections from readers so future work can be improved.

The quantity, speed, and diversity of data relevant to computational toxicology have grown substantially in recent years. These increasingly rich datasets have fueled the growth of machine learning and deep learning approaches, which are hungry for large and high-quality datasets. As the fields of toxicology and computer science continue to intersect and synchronize in new ways, the world will benefit from improved prediction and understanding of the toxicity and hazard potential of chemicals.

This preface reflects the views of the authors and does not necessarily reflect those of the U.S. Food and Drug Administration.

Rebecca Kusko, Ph.D.
Head of Business Development and Corporate Affairs
Immuneering Corporation
Cambridge, MA, USA

Huixiao Hong, Ph.D.
SBRBPAS Expert
Chief, Bioinformatics Branch
National Center for Toxicological Research
U.S. Food and Drug Administration
Jefferson, AR, USA

Contents

1 Machine Learning and Deep Learning Promote Computational Toxicology for Risk Assessment of Chemicals
Rebecca Kusko and Huixiao Hong

Part I Machine Learning and Deep Learning Methods for Computational Toxicology

2 Assessment of the Xenobiotics Toxicity Taking into Account Their Metabolism
Dmitry Filimonov, Alexander Dmitriev, Anastassia Rudik, and Vladimir Poroikov

3 Emerging Machine Learning Techniques in Predicting Adverse Drug Reactions
Yi Zhong, Shanshan Wang, Gaozheng Li, Ji Yang, Zuquan Weng, and Heng Luo

4 Drug Effect Deep Learner Based on Graphical Convolutional Network
Yunyi Wu, Shenghui Guan, and Guanyu Wang

5 AOP-Based Machine Learning for Toxicity Prediction
Wei Shi, Rong Zhang, and Haoyue Tan

6 Graph Kernel Learning for Predictive Toxicity Models
Youjun Xu, Chia-Han Chou, Ningsheng Han, Jianfeng Pei, and Luhua Lai

7 Optimize and Strengthen Machine Learning Models Based on in Vitro Assays with Mechanistic Knowledge and Real-World Data
Thilini V. Mahanama, Arpan Biswas, and Dong Wang

8 Multitask Learning for Quantitative Structure–Activity Relationships: A Tutorial
Cecile Valsecchi, Francesca Grisoni, Viviana Consonni, Davide Ballabio, and Roberto Todeschini

Part II Tools and Approaches Facilitating Machine Learning and Deep Learning Methods in Computational Toxicology

9 Isalos Predictive Analytics Platform: Cheminformatics, Nanoinformatics, and Data Mining Applications
Dimitra-Danai Varsou, Andreas Tsoumanis, Anastasios G. Papadiamantis, Georgia Melagraki, and Antreas Afantitis

10 ED Profiler: Machine Learning Tool for Screening Potential Endocrine-Disrupting Chemicals
Xianhai Yang, Huihui Liu, Rebecca Kusko, and Huixiao Hong

11 Quantitative Target-specific Toxicity Prediction Modeling (QTTPM): Coupling Machine Learning with Dynamic Protein–Ligand Interaction Descriptors (DyPLIDs) to Predict Androgen Receptor-mediated Toxicity
Sundar Thangapandian, Gabriel Idakwo, Joseph Luttrell, Huixiao Hong, Chaoyang Zhang, and Ping Gong

12 Mold2 Descriptors Facilitate Development of Machine Learning and Deep Learning Models for Predicting Toxicity of Chemicals
Huixiao Hong, Jie Liu, Weigong Ge, Sugunadevi Sakkiah, Wenjing Guo, Gokhan Yavas, Chaoyang Zhang, Ping Gong, Weida Tong, and Tucker A. Patterson

13 Applicability Domain Characterization for Machine Learning QSAR Models
Zhongyu Wang and Jingwen Chen

14 Controlling for Confounding in Complex Survey Machine Learning Models to Assess Drug Safety and Risk
Paul Rogers

15 Multivariate Curve Resolution for Analysis of Heterogeneous System in Toxicogenomics
Yuan Liu, Jinzhu Lin, Menglong Li, and Zhining Wen

Part III Machine Learning and Deep Learning for Chemical Toxicity Prediction

16 The Use of Machine Learning to Support Drug Safety Prediction
Kevin P. Cross and Glenn J. Myatt

17 Machine Learning-Based QSAR Models and Structural Alerts for Prediction of Mitochondrial Dysfunction
Weihao Tang, Willie J. G. M. Peijnenburg, and Jingwen Chen

18 Machine Learning and Deep Learning Applications to Evaluate Mutagenicity
Linlin Zhao and Catrin Hasselgren

19 Modeling Tox21 Data for Toxicity Prediction and Mechanism Deconvolution
Tuan Xu, Menghang Xia, and Ruili Huang

20 Identification of Structural Alerts by Machine Learning and Their Applications in Toxicology
Chaofeng Lou, Yaxin Gu, and Yun Tang

21 Machine Learning in Prediction of Nanotoxicology
Li Mu, Fubo Yu, Yuying Jia, Shan Sun, Xiaokang Li, Xiaolin Zhang, and Xiangang Hu

22 Machine Learning for Predicting Organ Toxicity
Jie Liu, Wenjing Guo, Fan Dong, Tucker A. Patterson, and Huixiao Hong

Part IV The Progress of Machine Learning and Deep Learning in New Areas

23 Computational Modeling for the Prediction of Hepatotoxicity Caused by Drugs and Chemicals
Minjun Chen, Jie Liu, Tsung-Jen Liao, Kristin Ashby, Yue Wu, Leihong Wu, Weida Tong, and Huixiao Hong

24 Artificial Intelligence for Risk Assessment of Cancer Therapy-Related Cardiotoxicity and Precision Cardio-Oncology
Jessica Castrillon Lal and Feixiong Cheng

25 Deep Learning Model for Prediction of Compound Activities Over a Panel of Major Toxicity-Related Proteins
Mariia Radaeva, Mohit Pandey, Hazem MsLati, and Artem Cherkasov

26 Machine Learning for Analyzing Drug Safety in Electronic Health Records
Meijian Guan

27 Powering Toxicogenomic Studies by Applying Machine Learning to Genomic Sequencing and Variant Detection
Li Tai Fang

28 Machine Learning for Predicting Gas Adsorption Capacities of Metal Organic Framework
Wenjing Guo, Jie Liu, Fan Dong, Tucker A. Patterson, and Huixiao Hong

Editor and Contributors

About the Editor

Huixiao Hong is a Senior Biomedical Research and Biomedical Product Assessment Service (SBRBPAS) expert and the chief of the Bioinformatics Branch, Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration (FDA), working on the scientific bases for regulatory applications of bioinformatics, cheminformatics, artificial intelligence, and genomics. Before joining the FDA, he was the manager of the Bioinformatics Division of Z-Tech, an ICFI company. He held a research scientist position at Sumitomo Chemical Company in Japan and was a visiting scientist at the National Cancer Institute at the National Institutes of Health. He was also an associate professor and the director of the Laboratory of Computational Chemistry at Nanjing University in China. Dr. Hong is a member of the steering committee of OpenTox, a member of the board of directors of the US MidSouth Computational Biology and Bioinformatics Society, and in the leadership circle of the US FDA modeling and simulation working group. He has published more than 240 scientific papers and has a Google Scholar h-index of 60. He serves as an associate editor for Experimental Biology and Medicine and an editorial board member for multiple peer-reviewed journals. He received his Ph.D. from Nanjing University in China and conducted research at Leeds University in England.

Contributors

Antreas Afantitis, NovaMechanics Ltd, Nicosia, Cyprus
Kristin Ashby, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
Davide Ballabio, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milano, Italy


Arpan Biswas, Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA; Center for Nanophase Materials Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, USA
Jingwen Chen, Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian, China; School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
Minjun Chen, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
Feixiong Cheng, Cleveland Clinic, Genomic Medicine Institute, Lerner Research Institute, Cleveland, OH, USA; Department of Molecular Medicine, Cleveland Clinic, Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA; School of Medicine, Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, OH, USA
Artem Cherkasov, Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada
Chia-Han Chou, Infinite Intelligence Pharma, Beijing, China
Viviana Consonni, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milano, Italy
Kevin P. Cross, Instem, Columbus, OH, USA
Alexander Dmitriev, Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, Moscow, Russia
Fan Dong, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
Li Tai Fang, Freenome, South San Francisco, CA, USA
Dmitry Filimonov, Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, Moscow, Russia
Weigong Ge, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
Ping Gong, Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, USA; U.S. Army Engineer Research and Development Center, Vicksburg, MS, USA
Francesca Grisoni, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, the Netherlands


Yaxin Gu, Laboratory of Molecular Modeling and Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
Meijian Guan, Janssen Research and Development, LLC, Spring House, PA, USA
Shenghui Guan, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
Wenjing Guo, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
Ningsheng Han, Infinite Intelligence Pharma, Beijing, China
Catrin Hasselgren, Genentech, Inc, South San Francisco, CA, USA
Huixiao Hong, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
Xiangang Hu, Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education)/Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin, China
Ruili Huang, Division of Preclinical Innovation (DPI), National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD, USA
Gabriel Idakwo, School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, USA
Yuying Jia, Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education)/Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin, China
Rebecca Kusko, Immuneering Corporation, Cambridge, MA, USA
Luhua Lai, Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China; Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, BNLMS, Peking University, Beijing, China; Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
Jessica Castrillon Lal, Cleveland Clinic, Genomic Medicine Institute, Lerner Research Institute, Cleveland, OH, USA; Department of Molecular Medicine, Cleveland Clinic, Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
Tsung-Jen Liao, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA


Gaozheng Li, The Centre for Big Data Research in Burns and Trauma, College of Computer and Data Science, Fuzhou University, Fuzhou, Fujian, China
Menglong Li, College of Chemistry, Sichuan University, Chengdu, China
Xiaokang Li, School of Environmental and Material Engineering, Yantai University, Yantai, China
Jinzhu Lin, College of Chemistry, Sichuan University, Chengdu, China
Huihui Liu, Jiangsu Key Laboratory of Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing, China
Jie Liu, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
Yuan Liu, College of Chemistry, Sichuan University, Chengdu, China; Medical Big Data Center, Sichuan University, Chengdu, China
Chaofeng Lou, Laboratory of Molecular Modeling and Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
Heng Luo, MetaNovas Biotech Inc, Millbrae, CA, USA
Joseph Luttrell, School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, USA
Thilini V. Mahanama, Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA; Department of Industrial Management, Faculty of Science, University of Kelaniya, Gampaha, Sri Lanka
Georgia Melagraki, NovaMechanics Ltd, Nicosia, Cyprus
Hazem MsLati, Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada
Li Mu, Tianjin Key Laboratory of Agro-environment and Safe-Product, Key Laboratory for Environmental Factors Control of Agro-product Quality Safety (Ministry of Agriculture and Rural Affairs), Institute of Agro-environmental Protection, Ministry of Agriculture and Rural Affairs, Tianjin, China
Glenn J. Myatt, Instem, Columbus, OH, USA
Mohit Pandey, Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada
Anastasios G. Papadiamantis, NovaMechanics Ltd, Nicosia, Cyprus
Tucker A. Patterson, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA


Jianfeng Pei, Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
Willie J. G. M. Peijnenburg, Institute of Environmental Sciences, Leiden University, Leiden, The Netherlands; Center for Safety of Products and Substances of the National Institute of Public Health and the Environment (RIVM), Bilthoven, The Netherlands
Vladimir Poroikov, Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, Moscow, Russia
Mariia Radaeva, Vancouver Prostate Centre, University of British Columbia, Vancouver, BC, Canada
Paul Rogers, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
Anastassia Rudik, Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, Moscow, Russia
Sugunadevi Sakkiah, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
Wei Shi, State Key Laboratory of Pollution Control and Resources Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, China
Shan Sun, Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education)/Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin, China
Haoyue Tan, State Key Laboratory of Pollution Control and Resources Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, China
Weihao Tang, School of Environmental Science and Technology, Dalian University of Technology, Dalian, China; Guangdong Key Laboratory of Integrated Agro-environmental Pollution Control and Management, National-Regional Joint Engineering Research Center for Soil Pollution Control and Remediation in South China, Institute of Eco-environmental and Soil Sciences, Guangdong Academy of Sciences, Guangzhou, China
Yun Tang, Laboratory of Molecular Modeling and Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
Sundar Thangapandian, Oak Ridge Institute for Science and Education, Oak Ridge, TN, USA; Hotspot Therapeutics, Inc., Boston, MA, USA
Roberto Todeschini, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milano, Italy


Weida Tong, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
Andreas Tsoumanis, NovaMechanics Ltd, Nicosia, Cyprus
Cecile Valsecchi, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milano, Italy
Dimitra-Danai Varsou, NovaMechanics Ltd, Nicosia, Cyprus
Dong Wang, Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
Guanyu Wang, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China; School of Medicine Life, and Health Sciences, Chinese University of Hong Kong, Shenzhen, China
Shanshan Wang, The Centre for Big Data Research in Burns and Trauma, College of Computer and Data Science, Fuzhou University, Fuzhou, Fujian, China
Zhongyu Wang, Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian, China
Zhining Wen, College of Chemistry, Sichuan University, Chengdu, China; Medical Big Data Center, Sichuan University, Chengdu, China
Zuquan Weng, The Centre for Big Data Research in Burns and Trauma, College of Computer and Data Science, Fuzhou University, Fuzhou, Fujian, China; College of Biological Science and Engineering, Fuzhou University, Fuzhou, Fujian, China
Leihong Wu, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
Yue Wu, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
Yunyi Wu, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
Menghang Xia, Division of Preclinical Innovation (DPI), National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD, USA
Tuan Xu, Division of Preclinical Innovation (DPI), National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD, USA


Youjun Xu, Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, BNLMS, Peking University, Beijing, China; Infinite Intelligence Pharma, Beijing, China
Ji Yang, The Centre for Big Data Research in Burns and Trauma, College of Computer and Data Science, Fuzhou University, Fuzhou, Fujian, China
Xianhai Yang, Jiangsu Key Laboratory of Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing, China
Gokhan Yavas, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
Fubo Yu, Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education)/Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin, China
Chaoyang Zhang, School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, USA
Rong Zhang, State Key Laboratory of Pollution Control and Resources Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, China
Xiaolin Zhang, Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education)/Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin, China
Linlin Zhao, Genentech, Inc, South San Francisco, CA, USA
Yi Zhong, The Centre for Big Data Research in Burns and Trauma, College of Computer and Data Science, Fuzhou University, Fuzhou, Fujian, China

Chapter 1

Machine Learning and Deep Learning Promote Computational Toxicology for Risk Assessment of Chemicals

Rebecca Kusko and Huixiao Hong

1.1 Risk Assessment of Chemicals

Rigorous management of chemical risk requires rigorous assessment of the predicted toxicological, environmental, and physical effects of an ever-growing panoply of chemicals. Current state-of-the-art experimental systems have not kept pace with the scale and scope of the expanding universe of chemicals in use. As synthetic chemistry approaches have matured, ever larger chemical libraries have become commonplace, creating the potential for a myriad of new chemicals to enter various systems via agriculture, environmental exposure, waste, and pollution, as well as through human use of pharmaceutical drugs and food products. From a chemical risk assessment perspective, a system-level approach is needed to study exposure pathways, outcomes, and populations, including potential transport and mixtures of chemicals into key organs across multiple species including humans (Kavlock et al. 2018). Risk assessment of food products, food supplements, and natural medicines demands a thorough understanding of digestion and delivery via the gastrointestinal tract, breakdown of metabolites, route of excretion, and context of product use. Finally, the pharmaceutical and biotech industry must focus on rigorous toxicity evaluation criteria such as dose-dependent characteristics, organ exposure, safe first-in-human dosages, reproductive toxicity, carcinogenicity, metabolite breakdown, and route of excretion.

R. Kusko
Immuneering Corporation, Cambridge, MA 02142, USA

H. Hong (B)
National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA

This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2023
H. Hong (ed.), Machine Learning and Deep Learning in Computational Toxicology, Computational Methods in Engineering & the Sciences, https://doi.org/10.1007/978-3-031-20730-3_1


The second key task of chemical risk assessment is to identify adverse effects posed by exposure to chemicals (Gan 2019). Many traditional methods exist for chemical hazard identification. In vitro assays can include cytotoxicity, DNA damage, oxidative stress, endocrine disruption, immune cell activation, and more. Unlike two-dimensional (2D) culture, the human body is a complex mixture of cell types interacting in various local microenvironments. Thus, in vitro assays have limited translational utility to the true human condition. Moreover, the scale of in vitro assays fundamentally limits how many chemical substances and combinations of chemical substances can be tested. Moving closer to the human context, mice and rats are the standard for in vivo toxicology testing. In addition to being many orders of magnitude less efficient at rapidly screening chemical matter, in vivo experiments introduce economic and ethical concerns.

1.2 Computational Toxicology

To overcome the numerous fundamental limitations of laboratory-based toxicology, computational science has been applied with increasing success to various aspects of toxicology. In the field of computational toxicology, computer programs are used to design, build, test, and improve models aimed at predicting the hazards and toxicological effects of chemicals found in pharmaceuticals, agricultural products, foodstuffs, and the environment. Specifically, these computer models predict mechanism of action, absorption, breakdown, metabolism, pharmacokinetics, pharmacodynamics, excretion, organ penetration, interactions, and even various experimental parameters based on chemical structure and other chemical features.

Various computational methods have been developed in toxicological research. Based on the data they use, these methods can be divided into five classes, as illustrated in Fig. 1.1. When a toxicity mechanism acts through a protein target with a known structure, the toxicity of a chemical through interaction with the target protein can be estimated using a variety of methods such as docking and molecular dynamics simulations (Ng et al. 2015; Sakkiah et al. 2019). However, most toxicity effects are not caused by a single protein target. Such toxicity effects can be predicted using computational models developed from the structures of chemicals whose toxicity data are available (Hong et al. 1998; Shi et al. 2002; Sakamuru et al. 2021; Yang et al. 2021). The toxicity of a chemical can also be estimated using genomic and genetic data (Li et al. 2019; Ji et al. 2020), proteomic data (Bruno et al. 2016), and metabolomic data (Cheng et al. 2017) when such data are available.

Fig. 1.1 Illustration of computational toxicology

The first computational toxicology approaches to show real promise were physics based. These models interrogated the chemical structure of the potential toxicant and used structural alerts (Tan et al. 2020, 2021), docking, and/or molecular dynamics simulation to predict a specific endpoint. For proteins and other molecules with well-detailed three-dimensional (3D) structure, docking can be used to predict the predominant pose and/or binding mode of a ligand (potential toxicant). In other words, docking models reveal how a potential toxicant and a biomolecule may or may not interact at the atomic level. Molecular dynamics simulation models calculate the motion of particles from a starting position across a time frame. Particle interactions throughout the system are calculated by considering physical quantities including previous position, velocity, and acceleration. Like docking, molecular dynamics simulation is used across many fields. In computational toxicology, it is employed to connect structural biology to chemical toxicity (Ng et al. 2014; Selvaraj et al. 2018; Sakkiah et al. 2021a, b).

When the 3D structure of a toxicity target protein is unknown or the toxicity is caused by interaction of chemicals with multiple targets, a different approach is used to study the relationship between molecular descriptors of the chemicals and their toxicity. Formally, this approach is known as quantitative structure–activity relationship (QSAR) modeling. Generally, a QSAR model is used to predict the toxicity of chemicals based on their chemical structures. A core underlying assumption of this approach is the similarity principle, under which modelers assume that similar structures and/or descriptors will cause similar toxicity. QSAR models have been developed for predicting various toxicological endpoints (Hong et al. 2017; Huang et al. 2020; Tang et al. 2020; Wang et al. 2021).

With the advent of the genomic era and the ever-decreasing cost of genomic sequencing, genomics methods have been taken up by the computational toxicology field, a discipline termed toxicogenomics. Toxicogenomics seeks to correlate toxicity patterns of chemicals with their genomic, transcriptomic, and genetic data. Thus, genome-wide transcriptomic responses to potential toxicants are studied in a systematic way, enabling molecular mechanism prediction. A large amount of toxicogenomic data has been generated (Davis et al. 2019), facilitating applications of toxicogenomics. Quality control efforts in genomic and genetic data generation and processing have been made to enable high-quality data to be utilized in toxicogenomics (Hong et al. 2008, 2012). Toxicogenomic approaches show great promise in capturing the many mechanisms of toxicity response that cells can display (Lauschke 2021).

Following the advent of the genomic era, nascent whole-proteome approaches are beginning to emerge for studying toxicology, a branch of proteomics termed toxicoproteomics (George et al. 2010; Suman et al. 2016). Toxicoproteomics identifies and characterizes proteins expressed at the whole-proteome level of a biological system exposed to a toxicant chemical. While toxicoproteomic approaches are applied in many fields, in computational toxicology they can be used in two ways. Firstly, toxicoproteomic data can be used to identify protein biomarkers using computational techniques for predicting toxicity of chemicals (Tomonaga et al. 2021). Secondly, toxicoproteomic data can be used to reveal underlying toxicology mechanisms of action, including molecular pathways (Pizzatti et al. 2020). The liver and liver cells are often, though not always, the focus of toxicoproteomic studies (Kumagai et al. 2006).
In parallel to proteomics, the art of studying hundreds of metabolites in a living system, termed metabolomics, has come of age. Cellular metabolism drives changes in carbohydrates, nucleotides, phospholipids, fatty acids, steroids, and more. Thus, metabolomics can be leveraged to characterize healthy cellular metabolism and to query the impact of a potential toxicant. Toxicometabolomics is a discipline which applies metabolomics to characterize the metabolic responses and biological pathways in biological systems caused by toxicant chemicals (Araújo et al. 2021). Toxicometabolomics is poised to pinpoint not only the extent of toxicity impact, but also the mechanism of toxicity of a chemical substance. Compellingly, metabolomics can be performed on human biofluids (blood, urine, etc.), which allows for toxicity monitoring in a clinical setting. A variety of analytical techniques are used in toxicometabolomics (Miggiels et al. 2019). Table 1.1 lists some commonly used techniques and their advantages.

Table 1.1 Commonly used analytical techniques in toxicometabolomics

Analytical technique | Advantages
Nuclear magnetic resonance | Fast, simultaneously measures all types of metabolites, high accuracy and technical reproducibility, works on both liquid and solid matrices, easy sample preparation, structural information, quantification of all metabolites, high-throughput capability
Direct infusion-mass spectrometry | Fast, high reproducibility, small amount of sample, no loss of metabolites in sample preparation, high-throughput capability, simple data processing
Capillary electrophoresis-mass spectrometry | Small sample volume, fast analysis, minimal sample preparation, variety of molecules can be analyzed
Liquid chromatography-mass spectrometry | Very high sensitivity (< μM), robust, enables analysis of thermolabile metabolites, simple sample preparation, suitable for the study of lipids and other macromolecules
Gas chromatography-mass spectrometry | Very high sensitivity (< μM), wide linear range, enables simultaneous analysis of different types of metabolites, compound identification facilitated by libraries
High-resolution mass spectrometry | Enables the determination of accurate mass and isotopic distribution, high resolution, selectivity, and specificity, useful in metabolite identification, method development for a quantification assay can be faster

Computational toxicology uses a variety of computational methods to study toxicology, including traditional computational methods for data preprocessing and statistical methods such as univariate and multivariate analysis. In addition, computational toxicology includes the development of databases (Shen et al. 2013). Data quality is vital to the reliability of computational toxicology findings. There are a variety of efforts in quality control of data for computational toxicology, especially for omics data such as proteomic data (Hong et al. 2005a).
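As a small illustration of the molecular dynamics idea described earlier in this section (new positions computed from the previous position, velocity, and acceleration), the following sketch integrates a single particle on a harmonic spring with the velocity Verlet scheme. It is only a toy under stated assumptions: the spring force stands in for a real force field, and the function names, unit mass, unit spring constant, and time step are all hypothetical choices made for this example rather than anything taken from the cited studies.

```python
def acceleration(position, spring_constant=1.0, mass=1.0):
    # Hooke's law, F = -k * x, divided by mass (toy stand-in for a force field).
    return -spring_constant * position / mass


def velocity_verlet(position, velocity, time_step, n_steps):
    """Advance one particle by n_steps and return the list of (position, velocity)."""
    trajectory = [(position, velocity)]
    accel = acceleration(position)
    for _ in range(n_steps):
        # New position from the previous position, velocity, and acceleration.
        position = position + velocity * time_step + 0.5 * accel * time_step ** 2
        new_accel = acceleration(position)
        # New velocity from the average of the old and new accelerations.
        velocity = velocity + 0.5 * (accel + new_accel) * time_step
        accel = new_accel
        trajectory.append((position, velocity))
    return trajectory


def total_energy(position, velocity, spring_constant=1.0, mass=1.0):
    # Kinetic plus potential energy; should stay nearly constant over time.
    return 0.5 * mass * velocity ** 2 + 0.5 * spring_constant * position ** 2


trajectory = velocity_verlet(position=1.0, velocity=0.0, time_step=0.01, n_steps=1000)
print("initial energy:", total_energy(*trajectory[0]))
print("final energy:  ", total_energy(*trajectory[-1]))
```

Checking that the total energy stays close to its starting value is a common sanity test for an integrator of this kind; production molecular dynamics codes apply the same update to many interacting particles with far more elaborate force calculations.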

1.3 Machine Learning in Computational Toxicology

Recently, machine learning, especially deep learning, has gained traction in computational toxicology. Machine learning, a subset of artificial intelligence, differs from other statistical methods in that algorithms are developed and subsequently improved through data. While machine learning is currently very popular, it is not a new approach; it first emerged in the 1950s. Machine learning, especially deep learning, relies heavily on large amounts of high-quality training data. The current era of big data has driven a surge in the use of machine learning, because plentiful data allow these algorithms to deliver their full potential. After rigorous and robust data cleaning, machine learning approaches all begin with the observation of initial data, in which an algorithm can detect a pattern. This pattern is learned from the data rather than specified by a human inputting all of the parameters. Machine learning has been successfully applied to a panoply of other fields, including medicine (Rajkomar et al. 2019; Sidey-Gibbons and Sidey-Gibbons 2019; Goecks et al. 2020), drug discovery (Elbadawi et al. 2021), food safety (Deng et al. 2021), chemistry (Meuwly 2021), genomics (Schreiber and Singh 2021), protein structure prediction (AlQuraishi 2021), earth sciences (Beroza et al. 2021), microbiology (Nami et al. 2021), and immunotoxicity (Luo et al. 2015a).

The field of machine learning offers numerous algorithms that are highly relevant to computational toxicology. Matrix factorization, which emerged from linear algebra, assumes that endpoints arise from a combination of latent processes and aims to identify the drivers (components) that describe those processes (Cantini et al. 2019). In purely technical terms, a matrix is factorized into lower dimensional components which, when multiplied together, re-create the original matrix, at least approximately. Tensor factorization is a widely applied machine learning algorithm that involves creating a multi-dimensional array (tensor) and obtaining a compact representation of it (Luo et al. 2017). Another relevant approach, group factor analysis, works especially well when applied to multiple datasets for which a joint representation is created (Bunte et al. 2016). This can reveal common patterns across data.

Decision tree is a simple machine learning algorithm. It recursively splits a set of samples into subsets by selecting the best independent variables (Quinlan 1986). One of its advantages is the integration of variable selection into model construction. The k-nearest neighbors (kNN) algorithm is another simple machine learning algorithm (Altman 1992). A kNN model predicts the dependent variable value of a sample using the dependent variable values of the k samples nearest to it, as measured by their distances to the sample.
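The kNN prediction rule can be sketched in a few lines. This is a minimal illustration, assuming a classification setting with majority voting and Euclidean distance; the function names, the two-descriptor toy data, and the "toxic"/"non-toxic" labels are hypothetical and chosen only for this example.

```python
from collections import Counter
import math


def euclidean(a, b):
    # Straight-line distance between two descriptor vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3, distance=euclidean):
    """Predict the label of `query` by majority vote among its k nearest training samples."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: distance(pair[0], query))
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-descriptor data: label 1 = "toxic", label 0 = "non-toxic".
train_x = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.3), (0.9, 0.8), (0.8, 0.9), (1.0, 0.7)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_x, train_y, (0.15, 0.15)))  # query close to the label-0 cluster
print(knn_predict(train_x, train_y, (0.85, 0.85)))  # query close to the label-1 cluster
```

Both `k` and `distance` are exposed as arguments because they are the two choices that most directly shape a kNN model: a different distance function (for example, one suited to binary fingerprints) can be passed in without changing the rest of the code.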
For kNN, learning consists of determining the parameter k based on a training set. Since kNN uses distance to select the samples used for prediction, the performance of a kNN model is affected by the distance metric used.

Some machine learning algorithms involve much more complicated mathematical reasoning, including artificial neural network (ANN) and support vector machine (SVM). ANN is a machine learning architecture that mimics the biological neural system, as illustrated in Fig. 1.2. An ANN model consists of three layers: an input layer (independent variables), a hidden layer (neurons), and an output layer (dependent variable). Mathematically, ANN learns from training samples to determine the weights for combining input data to the neurons and for combining the neurons to output using a decision function by either a feedforward or a backpropagation strategy. SVM is another machine learning algorithm with complicated mathematical reasoning; it was first developed by Vladimir Vapnik and his colleagues (Cortes and Vapnik 1995). SVM learns from training samples to determine a hyperplane in a high-dimensional space which is mapped from a lower dimensional space defined by the original input training data using a kernel function. SVM modeling decides the parameters used in the kernel function by determining the hyperplane.

Fig. 1.2 Illustration of ANN architecture

Ensemble learning is an attractive machine learning strategy. Random forest (Breiman 2001) and decision forest (Hong et al. 2005b) are two examples of ensemble learning algorithms. Both random forest and decision forest combine decision trees. However, the construction of decision trees is distinct between these two approaches. Random forest uses a large number of shallow trees which are generated using a set of samples based on a subset of the original independent variables, while decision forest uses a small number of deep trees which are generated using all samples and all independent variables after removing some independent variables used by other trees. Other machine learning algorithms, including linear regression, logistic regression, naive Bayes, k-means clustering, hierarchical clustering, expectation maximization, self-organizing map, least absolute shrinkage and selection operator, and linear discriminant analysis, are also used in computational toxicology.

When applying machine learning to computational toxicology, a one-size-fits-all approach is ill advised. Instead, it is best to first clearly define the specific question or problem, and only then select the optimal machine learning tool for that task. Each machine learning approach has different advantages and disadvantages and thus should be used in a targeted way rather than as a panacea. For example, the question in mind may determine how chemical structures are represented (molecular fingerprints, molecular descriptors) as features to be processed by machine learning. One important task in the application of machine learning to computational toxicology is to select a set of optimal independent variables for the construction of machine learning models (Idakwo et al. 2019). For prediction of chemical toxicity using machine learning models trained with small molecules and some sort of molecular fingerprint, many algorithms have been utilized, including matrix factorization, SVM, classification and regression tree (CART), random forest, tensor factorization, and kNN. A previous study used the “caret” package in R to compare these methods and found that random forest outperforms the other machine learning algorithms in classification of toxins and nontoxins using a dataset of 2849 known toxic small molecules (Sharma et al. 2017).
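To make the random forest description above more concrete, the following sketch builds a small ensemble in which each tree sees a bootstrap sample of the training set and only a random subset of the independent variables, and the ensemble predicts by majority vote. It is a deliberately simplified stand-in, not Breiman's full algorithm and not the comparison reported in the cited study: each tree is a one-split "stump" (the shallowest possible tree), and the function names, the synthetic descriptors, and all parameter values are assumptions made for this illustration.

```python
import numpy as np


def fit_stump(X, y, feature_ids):
    """Find the one-split tree (feature, threshold, labels) with the fewest training errors."""
    best_errors, best_stump = np.inf, None
    for f in feature_ids:
        # Dropping the largest value keeps both sides of every split non-empty.
        for threshold in np.unique(X[:, f])[:-1]:
            left = X[:, f] <= threshold
            left_label = int(y[left].mean() >= 0.5)     # majority class on the left
            right_label = int(y[~left].mean() >= 0.5)   # majority class on the right
            errors = np.sum(y[left] != left_label) + np.sum(y[~left] != right_label)
            if errors < best_errors:
                best_errors, best_stump = errors, (f, threshold, left_label, right_label)
    return best_stump


def fit_forest(X, y, n_trees=50, n_features=2, seed=0):
    rng = np.random.default_rng(seed)
    n_samples, n_total = X.shape
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n_samples, size=n_samples)            # bootstrap sample
        cols = rng.choice(n_total, size=n_features, replace=False)   # random variable subset
        forest.append(fit_stump(X[rows], y[rows], cols))
    return forest


def predict_forest(forest, X):
    votes = np.zeros(len(X))
    for f, threshold, left_label, right_label in forest:
        votes += np.where(X[:, f] <= threshold, left_label, right_label)
    return (votes / len(forest) >= 0.5).astype(int)   # majority vote over all trees


# Hypothetical descriptors: the class depends on the sum of five continuous variables.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = (X.sum(axis=1) > 0).astype(int)

forest = fit_forest(X, y)
accuracy = float(np.mean(predict_forest(forest, X) == y))
print(f"training accuracy of the ensemble: {accuracy:.2f}")
```

Because every stump looks at a single variable, no individual tree can capture the class rule on its own; the vote across trees built on different variable subsets is what recovers it, which is the intuition behind combining many weak trees.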
Consistent with the strong performance of random forest reported by Sharma et al. (2017), another group that used SVM and random forest to predict toxicity endpoints from circular descriptors achieved an area under the receiver operating characteristic curve (ROC-AUC) equal to or higher than 0.7 (Koutsoukas et al. 2016). Non-negative matrix factorization was used for analysis of multiple toxicogenomic datasets, and the results demonstrated that this algorithm is able not only to distinguish between tissue types, but also to discriminate dosage levels of the chemical used (Lee et al. 2012). The unsupervised machine learning algorithm non-negative tensor factorization was developed and successfully applied to the identification of chemical contaminant sources in groundwater (Vesselinov et al. 2019).

Of the huge number of successful applications of machine learning in computational toxicology, a handful of examples are mentioned here: predicting inhibition of mitochondrial fusion and fission (Tang et al. 2021), predicting inhibitors of the mitochondrial electron transport chain (Tang et al. 2022), classifying peroxisome proliferator-activated receptor gamma (PPARγ) agonists (Wang et al. 2020), predicting nicotinic acetylcholine receptor (nAChR) α7 binding activity of chemicals in tobacco and smoke (Sakkiah et al. 2020), predicting neurotoxicity (Monzel et al. 2020), evaluating drug-induced liver toxicity (Chen et al. 2013), facilitating identification of persistent organic pollutants (Guo et al. 2019, 2021), predicting human leukocyte antigen (HLA)-peptide binding activity (Luo et al. 2015b), and evaluating genotoxicity of metal oxide nanoparticles (Sizochenko et al. 2019).

For application of machine learning in toxicogenomics, matrix or tensor factorization methods have been leveraged to identify latent patterns linking transcriptomic toxicity data to structural compound fingerprints. The advent of the NCI-60 dataset has opened up the opportunity for applying machine learning to toxicogenomics. This dataset of 59 human cell lines provides GI50 (50% growth inhibition), total growth inhibition (TGI), and LC50 (50% lethal concentration) data (Shoemaker 2006).
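The factorization idea behind these toxicogenomic analyses can be illustrated with a short sketch of non-negative matrix factorization, in which a non-negative data matrix V is approximated by the product of two smaller non-negative matrices W and H. The sketch below uses the classic multiplicative update rules for the squared-error objective, which is one common way to fit such a model and is an assumption here, not necessarily the procedure used in the cited studies. The function name, the synthetic "expression-like" matrix, and the parameter values are hypothetical.

```python
import numpy as np


def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (samples x features) as W @ H.

    W (samples x rank) holds each sample's loading on the latent components;
    H (rank x features) holds the components themselves.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry of W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical matrix generated from two hidden non-negative patterns.
rng = np.random.default_rng(1)
true_W = rng.random((20, 2))
true_H = rng.random((2, 8))
V = true_W @ true_H

W, H = nmf(V, rank=2)
relative_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {relative_error:.4f}")
```

In a toxicogenomic setting the rows of V might be treated samples and the columns genes, so that each row of H is a candidate latent expression pattern; choosing the rank is a modeling decision that this sketch does not address.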
Additionally, the Connectivity Map (CMap) dataset contains gene expression data from human cell lines treated with thousands of chemicals, which provides a resource for training machine learning models (Lamb et al. 2006). Researchers have reported using latent Dirichlet allocation to identify toxicogenomic patterns linking gene expression changes and toxicity scores (Kohonen et al. 2017). These authors used the identified gene expression pattern to understand the mechanism of drug-induced liver injury. Machine learning approaches work best with larger datasets, so as more transcriptomic datasets with toxicology endpoints become available, these advanced approaches will increase in prevalence and utility.
Macromolecules have very different physicochemical features from small molecules and thus have merited their own focused study. In this field of toxicoproteomics, the features, properties, and structural alerts of toxic proteins are studied in an attempt to predict the toxicity of chemicals. The relatively recent availability of proteome data, including the Protein Information Resource (PIR) (Barker et al. 2000), the CATH database (Das et al. 2015), the Protein Data Bank (PDB) (Berman et al. 2000), and UniProt (UniProt Consortium 2019), in addition to macromolecule toxin resources in the Toxin and Toxin Target Database (T3DB) (Wishart et al. 2015), the Structural Classification of Proteins-2 (SCOP2) (Andreeva et al. 2014), the Dictionary of Secondary Structures of Proteins (DSSP) (Kabsch and Sander 1983), and Stride (Lowe et al. 2009), has allowed machine learning approaches to be applied to toxicoproteomics. For example, researchers applied SVM, gradient boosted machine, and generalized linear model classifiers to the UniProt database and were able to discriminate toxic sequences from non-toxic sequences with >99% accuracy (Gacesa et al. 2016). Similarly, boosted-stump classification has been employed to search for toxic sequences within macromolecules (Naamati et al. 2009).
Machine learning ensemble methods, such as random forest, are especially applicable to metabolomics. Unlike classical linear regression, nonlinear machine learning algorithms, such as random forest and SVM, do not assume linear relationships. This property is critical for metabolite analysis and enables discovery of unexpected interactions and co-dependencies of metabolites. For example, a previous study trained a random forest model on metabolomic and lipidomic data to predict liver weight and other clinical chemistry parameters (Acharjee et al. 2016). The authors’ success suggests that machine learning applied to toxicometabolomics could be used to predict other types of toxicity beyond hepatotoxicity. Based on metabolomic data obtained from Q-TOF MS (quadrupole time-of-flight mass spectrometry), metabolomic biomarkers in rat serum for predicting kidney toxicity were successfully identified using SVM (Song et al. 2020).
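Several studies in this section report performance as ROC-AUC. As a minimal sketch of what that number measures, the function below computes it directly from its rank interpretation: the probability that a randomly chosen positive (for example, toxic) compound receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the example labels and scores are hypothetical and are not taken from any cited study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from binary labels (1/0) and model scores.

    Uses the pairwise (Mann-Whitney) formulation: the fraction of
    positive/negative pairs in which the positive is ranked higher,
    counting ties as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predictions for four compounds (1 = toxic, 0 = non-toxic).
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
example_auc = roc_auc(example_labels, example_scores)  # 3 of 4 pairs ordered correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the thresholds quoted above (for example, 0.7) in context. The pairwise loop is quadratic in the number of compounds, so a sort-based version would be preferable for large datasets.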

1.4 Deep Learning in Toxicology

Deep learning, an evolution of machine learning, constructs algorithms in layers, creating an artificial network that can learn and make “intelligent” decisions with minimum human supervision. The simplified architecture of a deep neural network, the most often applied deep learning approach, is illustrated in Fig. 1.3. When a conventional machine learning model predicts poorly, improvement typically depends on a practitioner revising the hand-engineered features or the model itself. A deep learning model, in contrast, learns its feature representations from the data and adjusts its internal parameters automatically during training. One of the main strengths of deep learning is identifying structures in high-dimensional, noisy data. This power comes at the cost of needing very large and well-curated datasets to function properly (LeCun et al. 2015). Given the explosion of such data in recent years, deep learning has been successfully leveraged in various fields including, but not limited to, neuroimaging (Abrol et al. 2021), drug repositioning (Liu et al. 2021), chemistry (Korshunova et al. 2021), oncology (Kim et al. 2021), pharmacology (Baptista et al. 2021), virology (Mock et al. 2021), embryo transfer (Tran et al. 2019), nuclei segmentation (Yang et al. 2020), sequence analysis (Jing et al. 2020), and medicine (Bora et al. 2021).

Fig. 1.3 Illustration of deep neural network architecture

There are several types of deep learning models, each of which can be applied to various aspects of computational toxicology. Supervised deep learning is similar to supervised machine learning in that an error function between the model output and the truth is calculated. Parameters internal to the algorithm are adjusted to improve the performance of the deep learning model based on this error score. A deep learning model might have a large number of such parameters. There are numerous approaches, but stochastic gradient descent is one of the most commonly used methods to minimize the error function. With this method, the algorithm is shown many small sets of data and iteratively adjusts the internal parameters accordingly (Bottou 2012). This procedure is repeated until the improvement is no longer larger than a pre-determined value. Based on the chain rule for derivatives, deep learning approaches use backpropagation to calculate the gradient of the objective with respect to the input of a layer, starting from the gradient with respect to the output of that layer (Baldi et al. 2018). This procedure can be repeated through all layers, starting from the output (the prediction made by the model) all the way back to the input (where data is fed in). In a feedforward network, each layer computes a weighted sum of the inputs from the previous layer and passes the result to the next layer through a nonlinear function (Telgarsky 2015). The often-used term “hidden layers” simply refers to layers that are neither the input nor the output (Fig. 1.3) (Schilling et al. 2019). Other deep learning approaches include convolutional neural networks (Chen et al. 2017) and recurrent neural networks (Zaremba et al. 2015).
To date, deep learning has been applied to answer computational toxicology challenges in numerous cases. For example, training datasets of endocrine disrupting chemicals have been getting larger and larger. A group reported that their deep learning QSAR method outperformed other statistical methods, including conventional machine learning. Their reported squared correlation coefficient (R²) was 0.80 and cross-validated coefficient (Q²) was 0.86 (Heo et al. 2019). As described above, deep learning has been shown to excel at image analysis. Inspired by this, a group of researchers applied deep learning to analyze images of DAPI (4′,6-diamidino-2-phenylindole) stained cells treated with various toxicants (Jimenez-Carretero et al. 2018). The authors leveraged a convolutional neural network (Simonyan and Zisserman 2015) not only to detect nuclei but also to classify them by health status. Convolutional neural networks have also been employed to screen for potential persistent organic pollutants and/or persistent, bioaccumulative, and toxic substances (Sun et al. 2020). The authors reported an average prediction accuracy of 95.3%. In a third convolutional neural network example, researchers used Tox21 data to predict twelve toxicological endpoints and achieved an average ROC-AUC score of 0.757 (Chen et al. 2021). Given the limited data in the Tox21 dataset, the authors used a previously described semi-supervised learning approach called “Mean Teacher” to enhance model performance (Tarvainen and Valpola 2018).
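The training procedure described in this section (a feedforward pass, an error function, backpropagation by the chain rule, and stochastic gradient descent on small batches until the improvement falls below a threshold) can be sketched in a few dozen lines. The following is a toy illustration under stated assumptions, not an implementation from any cited work: one hidden layer, sigmoid units, squared error, and the XOR problem as hypothetical data. All names and hyperparameter values are chosen here for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_params(n_in, n_hidden, rng):
    """Small random weights for one hidden layer and a single output unit."""
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def forward(p, x):
    """Feedforward pass: weighted sum of the previous layer, then a nonlinearity."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W1"], p["b1"])]
    y = sigmoid(sum(w * hi for w, hi in zip(p["W2"], h)) + p["b2"])
    return h, y

def loss(p, data):
    """Mean squared error between the model output and the known label."""
    return sum((forward(p, x)[1] - t) ** 2 for x, t in data) / len(data)

def backprop(p, x, t):
    """Gradient of (y - t)^2 for one example, propagated from output to input."""
    h, y = forward(p, x)
    d_out = 2.0 * (y - t) * y * (1.0 - y)        # gradient at the output pre-activation
    d_hid = [d_out * w * hi * (1.0 - hi)         # chain rule back into the hidden layer
             for w, hi in zip(p["W2"], h)]
    return {
        "W1": [[d * xi for xi in x] for d in d_hid],
        "b1": d_hid,
        "W2": [d_out * hi for hi in h],
        "b2": d_out,
    }

def sgd_step(p, batch, lr):
    """One stochastic gradient descent update, averaged over a small batch."""
    grads = [backprop(p, x, t) for x, t in batch]
    n = len(batch)
    for j, row in enumerate(p["W1"]):
        for i in range(len(row)):
            row[i] -= lr * sum(g["W1"][j][i] for g in grads) / n
        p["b1"][j] -= lr * sum(g["b1"][j] for g in grads) / n
        p["W2"][j] -= lr * sum(g["W2"][j] for g in grads) / n
    p["b2"] -= lr * sum(g["b2"] for g in grads) / n

def train(data, n_hidden=4, lr=0.5, batch_size=2, tol=1e-6, max_epochs=5000, seed=0):
    """Repeat passes over shuffled mini-batches until the loss change drops below tol."""
    rng = random.Random(seed)
    p = init_params(len(data[0][0]), n_hidden, rng)
    history = [loss(p, data)]
    for _ in range(max_epochs):
        order = data[:]
        rng.shuffle(order)
        for k in range(0, len(order), batch_size):
            sgd_step(p, order[k:k + batch_size], lr)
        history.append(loss(p, data))
        if abs(history[-2] - history[-1]) < tol:   # pre-determined stopping value
            break
    return p, history

# Hypothetical toy data: XOR, which a model without a hidden layer cannot fit.
XOR = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
```

Calling `train(XOR)` returns the fitted parameters and the per-epoch loss history. A useful sanity check for any hand-written backpropagation is to compare its gradients against finite differences of the loss; practical toxicology models would rely on an established deep learning framework rather than code of this kind.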

1.5 Perspectives

Advanced techniques such as deep learning and machine learning show great promise as more data becomes available, but they are subject to certain limitations. Deep learning, in particular, has a voracious appetite for computing power. Many smaller-scale efforts with lower budgets or limited computational resources would struggle to deploy an appropriately powered deep learning model. For example, the expense of deploying a convolutional neural network scales with the product of the number of parameters and the number of data points. If the number of parameters grows in proportion to the dataset, the computational need therefore scales roughly quadratically with the data.
The success of artificial intelligence, more specifically machine learning and deep learning, in the computational toxicology space to date has been greatly enabled by ever-increasing dataset size. While simple linear models can be deployed on small datasets, advanced machine learning and deep learning algorithms generally require large datasets. There are nascent approaches to apply deep learning to smaller datasets, including few-shot learning (Snell et al. 2017). For the real potential of machine learning and deep learning to be achieved in computational toxicology, dataset size will need to continue to increase and, in parallel, algorithms that work better with smaller datasets must be developed. Many machine learning and deep learning approaches are developed in data-rich spaces, such as image analysis. Toxicology datasets are not yet at this size.
To date, chemists have created many different ways to model or represent molecular properties with some amount of abstraction. Molecules can be represented as two-dimensional molecular graphs, three-dimensional molecular graphs, point clouds, meshes, transformers, or string notation, with the most famous being simplified molecular-input line-entry system (SMILES) strings (Hanson 2016).
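To make the difference between a string notation and a molecular graph concrete, the sketch below converts a very restricted subset of SMILES into a graph. It is a toy under loud assumptions: it accepts only unbranched chains of single-letter atoms joined by implicit single bonds (no rings, branches, charges, bond symbols, aromatic atoms, or two-letter elements), and the function names are hypothetical. Real work would use a cheminformatics toolkit with a full SMILES parser.

```python
def linear_smiles_to_graph(smiles):
    """Parse an unbranched SMILES chain into (atoms, bonds).

    Toy subset only: each character must be a single-letter element symbol,
    and neighbouring atoms are joined by an implicit single bond.
    Hydrogens stay implicit, as in SMILES itself.
    """
    allowed = set("BCNOPSFI")
    atoms = []
    for ch in smiles:
        if ch not in allowed:
            raise ValueError(f"unsupported SMILES character for this toy parser: {ch!r}")
        atoms.append(ch)
    bonds = [(i, i + 1) for i in range(len(atoms) - 1)]
    return atoms, bonds

def adjacency_matrix(atoms, bonds):
    """Symmetric 0/1 matrix marking which heavy atoms are bonded."""
    n = len(atoms)
    A = [[0] * n for _ in range(n)]
    for i, j in bonds:
        A[i][j] = A[j][i] = 1
    return A

# Ethanol written as a SMILES string, then as a two-dimensional molecular graph.
ethanol_atoms, ethanol_bonds = linear_smiles_to_graph("CCO")
ethanol_adjacency = adjacency_matrix(ethanol_atoms, ethanol_bonds)
```

The string form is compact and easy to store, while the atom list and adjacency matrix expose the connectivity that graph-based models operate on; both describe the same molecule at different levels of abstraction.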
As the fields of molecular descriptors and machine learning/deep learning both advance, more sophisticated queries will become possible in computational toxicology.

Disclaimer: This chapter reflects the views of the authors and does not necessarily reflect those of the U.S. Food and Drug Administration.

References

Abrol A, Fu Z, Salman M, Silva R, Du Y, Plis S, Calhoun V (2021) Deep learning encodes robust discriminative neuroimaging representations to outperform standard machine learning. Nat Commun 12:353 Acharjee A, Ament Z, West JA, Stanley E, Griffin JL (2016) Integration of metabolomics, lipidomics and clinical data using a machine learning method. BMC Bioinform 17:440 AlQuraishi M (2021) Machine learning in protein structure prediction. Curr Opin Chem Biol 65:1–8 Altman NS (1992) An introduction to kernel and nearest-neighbor nonparametric regression. Am Stat 46(3):175–185 Andreeva A, Howorth D, Chothia C, Kulesha E, Murzin AG (2014) SCOP2 prototype: a new approach to protein structure mining. Nucleic Acids Res 42:D310–D314 Araújo AM, Carvalho F, Guedes de Pinho P, Carvalho M (2021) Toxicometabolomics: small molecules to answer big toxicological questions. Metabolites 11(10):692 Baldi P, Sadowski P, Lu Z (2018) Learning in the machine: random backpropagation and the deep learning channel. Artif Intell 260:1–35 Baptista D, Ferreira PG, Rocha M (2021) Deep learning for drug response prediction in cancer. Brief Bioinform 22:360–379 Barker WC, Garavelli JS, Huang H, McGarvey PB, Orcutt BC, Srinivasarao GY, Xiao C, Yeh LS, Ledley RS, Janda JF, Pfeiffer F, Mewes HW, Tsugita A, Wu C (2000) The protein information resource (PIR). Nucleic Acids Res 28:41–44 Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The protein data bank. Nucleic Acids Res 28:235–242 Beroza GC, Segou M, Mostafa Mousavi S (2021) Machine learning and earthquake forecasting—next steps. Nat Commun 12:4761 Bora A, Balasubramanian S, Babenko B, Virmani S, Venugopalan S, Mitani A, de Oliveira MG, Cuadros J, Ruamviboonsuk P, Corrado GS, Peng L, Webster DR, Varadarajan AV, Hammel N, Liu Y, Bavishi P (2021) Predicting the risk of developing diabetic retinopathy using deep learning. 
Lancet Digit Health 3:e10–e19 Bottou L (2012) Stochastic Gradient Descent Tricks. In: Montavon G, Orr GB, Müller K-R (eds) Neural networks: tricks of the trade, 2nd edn. Springer, Berlin, pp 421–436 Breiman L (2001) Random forests. Mach Learn 45(1):5–32 Bruno M, Ross J, Ge Y (2016) Proteomic responses of BEAS-2B cells to nontoxic and toxic chromium: protein indicators of cytotoxicity conversion. Toxicol Lett 264:59–70 Bunte K, Leppäaho E, Saarinen I, Kaski S (2016) Sparse group factor analysis for biclustering of multiple data sources. Bioinformatics 32(16):2457–2463 Cantini L, Kairov U, de Reyniès A, Barillot E, Radvanyi F, Zinovyev A (2019) Assessing reproducibility of matrix factorization methods in independent transcriptomes. Bioinformatics 35(21):4307–4313 Chen J, Si Y-W, Un C-W, Siu SWI (2021) Chemical toxicity prediction based on semi-supervised learning and graph convolutional neural network. J Cheminform 13:93 Chen M, Hong H, Fang H, Kelly R, Zhou G, Borlak J, Tong W (2013) Quantitative structureactivity relationship models for predicting drug-induced liver injury based on FDA-approved drug labeling annotation and using a large collection of drugs. Toxicol Sci 136(1):242–249 Chen M, Lin Z, Cho K (2017) Graph convolutional networks for classification with a structured label space. ArXiv171004908. https://arxiv.org/abs/1710.04908. Accessed on 8 Jan 2022 Cheng F, Hong H, Yang S, Wei Y (2017) Individualized network-based drug repositioning infrastructure for precision oncology in the panomics era. Brief Bioinform 18(4):682–697 Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297 Gan SL (2019) Importance of hazard identification in risk management. Ind Health 57(3):281–282 Das S, Lee D, Sillitoe I, Dawson NL, Lees JG, Orengo CA (2015) Functional classification of CATH superfamilies: a domain-based approach for protein function annotation. Bioinformatics 31:3460–3467

Davis AP, Grondin CJ, Johnson RJ, Sciaky D, McMorran R, Wiegers J, Wiegers TC, Mattingly CJ (2019) The comparative toxicogenomics database: update 2019. Nucleic Acids Res 47(D1):D948–D954 Deng X, Cao S, Horn AL (2021) Emerging applications of machine learning in food safety. Annu Rev Food Sci Technol 12:513–538 Elbadawi M, Gaisford S, Basit AW (2021) Advanced machine-learning techniques in drug discovery. Drug Discov Today 26:769–777 Gacesa R, Barlow DJ, Long PF (2016) Machine learning can differentiate venom tox-ins from other proteins having non-toxic physiological functions. PeerJ Comput Sci 2:e90 George J, Singh R, Mahmood Z, Shukla Y (2010) Toxicoproteomics: new paradigms in toxicology research. Toxicol Mech Methods 20(7):415–423 Goecks J, Jalili V, Heiser LM, Gray JW (2020) How machine learning will trans-form biomedicine. Cell 181:92–101 Guo W, Archer J, Moore M, Bruce J, McLain M, Shojaee S, Zou W, Benjamin LA, Adeuya A, Fairchild R, Hong H (2019) QUICK: quality and usability investigation and control kit for mass spectrometric data from detection of persistent organic pollutants. Int J Environ Res Public Health 16(21):4203 Guo W, Archer J, Moore M, Shojaee S, Zou W, Ge W, Benjamin L, Adeuya A, Fairchild R, Hong H (2021) Software-assisted pattern recognition of persistent organic pollutants in contaminated human and animal food. Molecules 26(3):685 Hanson RM (2016) Jmol SMILES and Jmol SMARTS: specifications and applications. J Cheminform 8:50 Heo S, Safder U, Yoo C (2019) Deep learning driven QSAR model for environmental toxicology: effects of endocrine disrupting chemicals on human health. Environ Pollut 253:29–38 Hong H, Neamati N, Winslow HE, Christensen JL, Orr A, Pommier Y, Milne GWA (1998) Identification of HIV-1 integrase inhibitors based on a four-point pharmacophore. 
Antivir Chem Chemother 9(6):461–472 Hong H, Dragan Y, Epstein J, Teitel C, Chen B, Xie Q, Fang H, Shi L, Perkins R, Tong W (2005a) Quality control and quality assessment of data from surface-enhanced laser desorption/ionization (SELDI) time-of flight (TOF) mass spectrometry (MS). BMC Bioinform 6(Suppl 2):S5 Hong H, Tong W, Xie Q, Fang H, Perkins R (2005b) An in silico ensemble method for lead dis-covery: decision forest. SAR/QSAR Environ Res 16(4):339–347 Hong H, Su Z, Ge W, Shi L, Perkins R, Fang H, Xu J, Chen JJ, Han T, Kaput J, Fuscoe JC, Tong W (2008) Assessing batch effects of genotype calling algorithm BRLMM for the affymetrix GeneChip human mapping 500 K array set using 270 HapMap samples. BMC Bioinform 9(Suppl 9):S17 Hong H, Xu L, Liu J, Jones WD, Su Z, Ning B, Perkins R, Ge W, Miclaus K, Zhang L, Park K, Green B, Han T, Fang H, Lambert CG, Vega SC, Lin SM, Jafari N, Czika W, Wolfinger RD, Goodsaid F, Tong W, Shi L (2012) Technical reproducibility of genotyping SNP arrays used in genome-wide association studies. PLoS ONE 7(9):e44483 Hong H, Thakkar S, Chen M, Tong W (2017) Development of decision forest models for prediction of drug-induced liver injury in humans using a large set of FDA-approved drugs. Sci Rep 7(1):17311 Huang Y, Li X, Xu S, Zheng H, Zhang L, Chen J, Hong H, Kusko R, Li R (2020) Quantitative structure-activity relationship models for predicting inflammatory potential of metal oxide nanoparticles. Environ Health Perspect 128(6):67010 Idakwo G, Luttrell J, Chen M, Hong H, Gong P, Zhang C (2019) A review of feature reduction methods for QSAR-based toxicity prediction. In: Hong H (ed) Advances in computational toxicology: methodologies and applications in regulatory science. Springer, New York, pp 119–139 Ji X, Li P, Fuscoe JC, Chen G, Xiao W, Shi L, Ning B, Liu Z, Hong H, Wu J, Liu J, Guo L, Kreil DP, Łabaj PP, Zhong L, Bao W, Huang Y, He J, Zhao Y, Tong W, Shi T (2020) A comprehensive rat

transcriptome built from large scale RNA-seq-based annotation. Nucleic Acids Res 48(15):8320– 8331 Jimenez-Carretero D, Abrishami V, Fernández-de-Manuel L, Palacios I, Quílez-Álvarez A, DíezSánchez A, Del Pozo MA, Montoya MC (2018) Tox_(R)CNN: deep learning-based nuclei profiling tool for drug toxicity screening. PLOS Comput Biol 14:e1006238 Jing R, Li Y, Xue L, Liu F, Li M, Luo J (2020) AutoBioSeqpy: a deep learning tool for the classification of biological sequences. J Chem Inf Model 60:3755–3764 Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22:2577–2637 Kavlock RJ, Bahadori T, Barton-Maclaren TS, Gwinn MR, Rasenberg M, Thomas RS (2018) Accelerating the pace of chemical risk assessment. Chem Res Toxicol 31(5):287–290 Kim J, Kusko R, Zeskind B, Zhang J, Escalante-Chong R (2021) A primer on applying AI synergistically with domain expertise to oncology. Biochim Biophys Acta BBA Rev Cancer 1876:188548 Kohonen P, Parkkinen JA, Willighagen EL, Ceder R, Wennerberg K, Kaski S et al (2017) A transcriptomics data-driven gene space accurately predicts liver cytopathology and drug-induced liver injury. Nat Commun 8:15932 Korshunova M, Ginsburg B, Tropsha A, Isayev O (2021) OpenChem: a deep learning toolkit for computational chemistry and drug design. J Chem Inf Model 61:7–13 Koutsoukas A, St Amand J, Mishra M, Huan J (2016) Predictive toxicology: modeling chemical induced toxicological response combining circular finger-prints with random forest and support vector machine. Front Environ Sci 4:11 Kumagai K, Ando Y, Kiyosawa N, Ito K, Kawai R, Yamoto T, Manabe S, Teranishi M (2006) Toxicoproteomic investigation of the molecular mechanisms of cycloheximide-induced hepatocellular apoptosis in rat liver. Toxicology 228(2–3):299–309 Lauschke VM (2021) Toxicogenomics of drug induced liver injury—from mechanistic understanding to early prediction. 
Drug Metab Rev 53(2):245–252 Lamb J, Crawford ED, Peck D, Modell JW, Blat IC, Wrobel MJ et al (2006) The connectivity map: using gene-expression signatures to connect small molecules, genes, and disease. Science 313:1929–1935 LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444 Lee CM, Mudaliar MA, Haggart DR, Wolf CR, Miele G, Vass JK, Higham DJ, Crowther D (2012) Simultaneous non-negative matrix factorization for multiple large scale gene expression datasets in toxicology. PLoS ONE 7(12):e48238 Li Y, Netherland MD, Zhang C, Hong H, Gong P (2019) In silico identification of genetic mutations conferring resistance to acetohydroxyacid synthase inhibitors: a case study of Kochia scoparia. PLoS ONE 14(5):e0216116 Liu R, Wei L, Zhang P (2021) A deep learning framework for drug repurposing via emulating clinical trials on real-world patient data. Nat Mach Intell 3:68–75 Lowe HJ, Ferris TA, Hernandez PM, Weber SC (2009) STRIDE—an integrated standards-based translational research informatics platform. AMIA Annu Symp Proc 2009:391–395 Luo H, Ye H, Ng H, Shi L, Tong W, Mattes W, Mendrick D, Hong H (2015a) Understanding and predicting binding between human leukocyte antigens (HLAs) and peptides by network analysis. BMC Bioinform 16(Suppl 13):S9 Luo H, Ye H, Ng HW, Shi L, Tong W, Mendrick DL, Hong H (2015b) Machine learning methods for predicting HLA-peptide binding activity. Bioinform Biol Insights 9(Suppl 3):21–29 Luo Y, Wang F, Szolovits P (2017) Tensor factorization toward precision medicine. Brief Bioinform 18(3):511–514 Meuwly M (2021) Machine learning for chemical reactions. Chem Rev 121:10218–10239 Miggiels P, Wouters B, van Westen GJP, Dubbelman AC, Hankemeier T (2019) Novel technologies for metabolomics: more for less. Trends Analyt Chem 120:115323 Mock F, Viehweger A, Barth E, Marz M (2021) VIDHOP, viral host prediction with deep learning. Bioinformatics 37:318–325

Monzel AS, Hemmer K, Kaoma T, Smits LM, Bolognin S, Lucarelli P, Rosety I, Zagare A, Antony P, Nickels SL, Krueger R, Azuaje F, Schwamborn JC (2020) Machine learning-assisted neurotoxicity prediction in human midbrain organoids. Parkinsonism Relat Disord 75:105–109 Naamati G, Askenazi M, Linial M (2009) ClanTox: a classifier of short animal tox-ins. Nucleic Acids Res 37(suppl_2):W363–W368 Nami Y, Imeni N, Panahi B (2021) Application of machine learning in bacteriophage research. BMC Microbiol 21:193 Ng HW, Zhang W, Shu M, Luo H, Ge W, Perkins R, Tong W, Hong H (2014) Competitive molecular docking approach for predicting estrogen receptor subtype α agonists and antagonists. BMC Bioinform 15(Suppl 11):S4 Ng HW, Shu M, Luo H, Ye H, Ge W, Perkins R, Tong W, Hong H (2015) Estrogenic activity data extraction and in silico prediction show the endocrine disruption potential of bisphenol A replacement compounds. Chem Res Toxicol 28(9):1784–1795 Pizzatti L, Kawassaki ACB, Fadel B, Nogueira FCS, Evaristo JAM, Woldmar N, Teixeira GT, Da Silva JC, Scandolara TB, Rech D, Candiotto LPZ, Silveira GF, Pavanelli WR, Panis C (2020) Toxicoproteomics disclose pesticides as downregulators of TNF-α, IL-1β and estrogen receptor pathways in breast cancer women chronically exposed. Front Oncol 10:1698 Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106 Rajkomar A, Dean J, Kohane I (2019) Machine learning in medicine. N Engl J Med 380:1347–1358 Sakamuru S, Zhao J, Xia M, Hong H, Simeonov A, Vaisman I, Huang R (2021) Predictive models to identify small molecule activators and inhibitors of opioid receptors. J Chem Inf Model 61(6):2675–2685 Sakkiah S, Kusko R, Tong W, Hong H (2019) Applications of molecular dynamics simulations in computational toxicology. In: Hong H (ed) Advances in computational toxicology: methodologies and applications in regulatory science. 
Springer, New York, pp 119–139 Sakkiah S, Leggett C, Pan B, Guo W, Valerio LG Jr, Hong H (2020) Development of a nicotinic acetylcholine receptor nAChR α7 binding activity prediction model. J Chem Inf Model 60(4):2396–2404 Sakkiah S, Guo W, Pan B, Ji Z, Yavas G, Azevedo M, Hawes J, Patterson TA, Hong H (2021a) Elucidating interactions between SARS-CoV-2 trimeric spike protein and ACE2 using homology modeling and molecular dynamics simulations. Front Chem 8:622632 Sakkiah S, Selvaraj C, Guo W, Liu J, Ge W, Patterson TA, Hong H (2021b) Elucidation of agonist and antagonist dynamic binding patterns in ER-α by integration of molecular docking, molecular dynamics simulations and quantum mechanical calculations. Int J Mol Sci 22(17):9371 Schilling A, Metzner C, Rietsch J, Gerum R, Schulze H, Krauss P (2019) How deep is deep enough?—Quantifying class separability in the hidden layers of deep neural networks. ArXiv181101753. https://arxiv.org/abs/1811.01753. Accessed on 8 Jan 2022 Schreiber J, Singh R (2021) Machine learning for profile prediction in genomics. Curr Opin Chem Biol 65:35–41 Selvaraj C, Sakkiah S, Tong W, Hong H (2018) Molecular dynamics simulations and applications in computational toxicology and nanotoxicology. Food Chem Toxicol 112:495–506 Sharma AK, Srivastava GN, Roy A, Sharma VK (2017) ToxiM: a toxicity prediction tool for small molecules developed using machine learning and chemoinformatics approaches. Front Pharmacol 8:880 Shen J, Xu L, Fang H, Richard AM, Bray JD, Judson RS, Zhou G, Colatsky TJ, Aungst JL, Teng C, Harris SC, Ge W, Dai SY, Su Z, Jacobs AC, Harrouk W, Perkins R, Tong W, Hong H (2013) EADB: an estrogenic activity database for assessing potential endocrine activity. 
Toxicol Sci 135(2):277–291 Shi L, Tong W, Fang H, Xie Q, Hong H, Perkins R, Wu J, Tu M, Blair RM, Branham WS, Wal-ler C, Walker J, Sheehan DM (2002) An integrated “4-phase” approach for setting endocrine disruption screening priorities–phase I and II predictions of estrogen receptor binding affinity. SAR/QSAR Environ Res 13(1):69–88

Shoemaker RH (2006) The NCI60 human tumour cell line anticancer drug screen. Nat Rev Cancer 6:813–823 Sidey-Gibbons JAM, Sidey-Gibbons CJ (2019) Machine learning in medicine: a practical introduction. BMC Med Res Methodol 19(1):64 Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. ArXiv14091556. Accessed on 8 Jan 2022 Sizochenko N, Syzochenko M, Fjodorova N, Rasulev B, Leszczynski J (2019) Evaluating genotoxicity of metal oxide nanoparticles: application of advanced supervised and unsupervised machine learning techniques. Ecotoxicol Environ Saf 185:109733 Snell J, Swersky K, Zemel RS (2017) Prototypical networks for few-shot learning. ArXiv170305175. https://arxiv.org/abs/1703.05175. Accessed on 8 Jan 2022 Song L, Yin Q, Kang M, Ma N, Li X, Yang Z, Jin H, Lin M, Zhuang P, Zhang Y (2020) Untargeted metabolomics reveals novel serum biomarker of renal damage in rheumatoid arthritis. J Pharm Biomed Anal 180:113068 Suman S, Mishra S, Shukla Y (2016) Toxicoproteomics in human health and disease: an update. Expert Rev Proteomics 13(12):1073–1089 Sun X, Zhang X, Muir DCG, Zeng EY (2020) Identification of potential PBT/POP-like chemicals by a deep learning approach based on 2D structural features. Environ Sci Technol 54:8221–8231 Tan H, Wang X, Hong H, Benfenati E, Giesy JP, Gini GC, Kusko R, Zhang X, Yu H, Shi W (2020) Structures of endocrine-disrupting chemicals determine binding to and activation of the estrogen receptor α and androgen receptor. Environ Sci Technol 54(18):11424–11433 Tan H, Chen Q, Hong H, Benfenati E, Gini GC, Zhang X, Yu H, Shi W (2021) Structures of endocrine-disrupting chemicals correlate with the activation of 12 classic nuclear receptors. Environ Sci Technol 55(24):16552–16562. https://doi.org/10.1021/acs.est.1c04997 Tang W, Chen J, Hong H (2020) Discriminant models on mitochondrial toxicity improved by consensus modeling and resolving imbalance in training. 
Chemosphere 253:126768 Tang W, Chen J, Hong H (2021) Development of classification models for predicting inhibition of mitochondrial fusion and fission using machine learning methods. Chemosphere 273:128567 Tang W, Liu W, Wang Z, Hong H, Chen J (2022) Machine learning models on chemical inhibitors of mitochondrial electron transport chain. J Hazard Mater 426:128067 Tarvainen A, Valpola H (2018) Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. ArXiv170301780. https://arxiv.org/abs/ 1703.01780. Accessed on 8 Jan 2022 Telgarsky M (2015) Representation benefits of deep feedforward networks. ArXiv:1509.08101v2. https://arxiv.org/abs/1509.08101v2. Accessed on 8 Jan 2022 Tomonaga T, Izumi H, Yoshiura Y, Nishida C, Yatera K, Morimoto Y (2021) Examination of surfactant protein D as a biomarker for evaluating pulmonary toxicity of nanomaterials in rat. Int J Mol Sci 22(9):4635 Tran D, Cooke S, Illingworth PJ, Gardner DK (2019) Deep learning as a predictive tool for fetal heart pregnancy following time-lapse incubation and blastocyst transfer. Hum Reprod Oxf Engl 34:1011–1018 UniProt Consortium (2019) UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res 47(D1):D506–D515 Vesselinov VV, Alexandrov BS, O’Malley D (2019) Nonnegative tensor factorization for contaminant source identification. J Contam Hydrol 220:66–97 Wang Z, Chen J, Hong H (2020) Applicability domains enhance application of PPARγ agonist classifiers trained by drug-like compounds to environmental chemicals. Chem Res Toxicol 33(6):1382–1388 Wang Z, Chen J, Hong H (2021) Developing QSAR models with defined applicability domains on PPARγ binding affinity using large data sets and machine learning algorithms. Environ Sci Technol 55(10):6857–6866

Wishart D, Arndt D, Pon A, Sajed T, Guo AC, Djoumbou Y, Knox C, Wilson M, Liang Y, Grant J, Liu Y, Goldansaz SA, Rappaport SM (2015) T3DB: the toxic exposome database. Nucleic Acids Res 43(Database issue):D928–D934 Yang L, Ghosh RP, Franklin JM, Chen S, You C, Narayan RR, Melcher ML, Liphardt JT (2020) NuSeT: a deep learning tool for reliably separating and analyzing crowded cells. PLOS Comput Biol 16:e1008193 Yang X, Ou W, Zhao S, Wang L, Chen J, Kusko R, Hong H, Liu H (2021) Human transthyretin binding affinity of halogenated thiophenols and halogenated phenols: an in vitro and in silico study. Chemosphere 280:130627 Zaremba W, Sutskever I, Vinyals O (2015) Recurrent neural network regularization. ArXiv14092329. https://arxiv.org/abs/1409.2329. Accessed on 8 Jan 2022

Part I

Machine Learning and Deep Learning Methods for Computational Toxicology

Chapter 2

Assessment of the Xenobiotics Toxicity Taking into Account Their Metabolism

Dmitry Filimonov, Alexander Dmitriev, Anastassia Rudik, and Vladimir Poroikov

2.1 Introduction

Xenobiotics are defined as chemicals foreign to humans, including but not limited to drugs, food additives, pesticides, fragrances, and industrial chemicals (Patterson et al. 2010). Among the 1–3 million xenobiotics each person will come into contact with during their lifetime (Idle and Gonzalez 2007), pharmaceuticals play a noteworthy role because the body is intentionally exposed to these potent bioactive substances. About three billion prescriptions for an estimated, 500 unique active ingredients have been written annually in the USA over the past decade, with about nine prescriptions for every US citizen (https://clincalc.com/DrugStats). The liver is the main organ for xenobiotic metabolism, although almost every human body tissue has some ability to metabolize xenobiotics. The metabolism of xenobiotics is often divided into biotransformation phases I and II (Macherey and Dansette 2008). During phase I (modification), molecules are usually converted into more polar, hydrophilic metabolites, since these compounds are more easily excreted from the body. The main phase I enzymes are isoforms of the cytochrome P450 superfamily (CYP3A4, CYP2C9, CYP2C19, CYP2D6, and CYP1A2), which perform most biotransformation reactions (Guengerich 2008). Phase II reactions include conjugation reactions, which are catalyzed by a large group of broad-specificity transferases. As a result, metabolites have higher molecular weight and are generally less active than their substrates.

D. Filimonov · A. Dmitriev · A. Rudik · V. Poroikov (✉)
Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, 10 Bldg. 8, Pogodinskaya Str., Moscow 119121, Russia
e-mail: [email protected]
D. Filimonov
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Hong (ed.), Machine Learning and Deep Learning in Computational Toxicology, Computational Methods in Engineering & the Sciences, https://doi.org/10.1007/978-3-031-20730-3_2


D. Filimonov et al.

Fig. 2.1 Drug bioactivation and detoxification (Bezhentsev et al. 2016). Reproduced with the permission of RCR and IOPP

In many cases, biotransformation results in detoxification of xenobiotics and gives rise to chemically stable metabolites, which possess neither pharmacological activity nor toxicity (Lyubimov 2012). Nevertheless, in some cases, drug biotransformation products are either chemically reactive or pharmacologically active. Transformations of this type are called “metabolic activation” or “bioactivation” (Fig. 2.1). Bioactivation may occur in both the first and the second phases of drug metabolism. A pharmacologically active or toxic product can arise either in a single reaction or in a series of enzymatic transformations. Theoretically, all enzymes involved in drug metabolism can also perform bioactivation. However, cytochrome P450 plays a leading role, being involved not only in drug metabolism but also in metabolic activation (Ioannides and Lewis 2004; Guengerich 2008, 2021; Lyubimov 2012).

During the early stages of research and development of a new pharmaceutical substance, direct assessment of the toxicity of drugs and their metabolites in the human body is impossible for ethical reasons. Moreover, investigations in experimental animals or cellular systems require the synthesis and experimental testing of many chemical compounds and their metabolites; thus, computational methods are preferable (Wang et al. 2007).

2 Assessment of the Xenobiotics Toxicity Taking into Account Their …


2.2 Computational Methods of Studying Metabolism

During the past two decades, several reviews have been published considering both experimental (Kulkarni et al. 2005; Kirchmair et al. 2012) and computational (Peach et al. 2012; Wang et al. 2007; Kulkarni et al. 2005; Kirchmair et al. 2012, 2015; Veselovskii et al. 2010; Olsen et al. 2015) methods for the evaluation of xenobiotic metabolism. Recently, Alqahtani provided an overview of the available in silico models used to predict the ADME-Tox properties of compounds, particularly metabolism (Alqahtani 2017). Tyzack and Kirchmair presented a detailed review of the computational estimation of cytochrome P450 (CYP) metabolism in the context of drug discovery (Tyzack and Kirchmair 2019).

Computational approaches for xenobiotic metabolism prediction can be subdivided into three groups according to the application area (Bezhentsev et al. 2016): (1) prediction of the interaction of xenobiotics with metabolic enzymes; (2) prediction of the sites of metabolism (SOMs) in the chemical structure of a compound to be metabolized; (3) generation of the potential metabolite structures for the subsequent evaluation of their properties. The generated structures can then be used to estimate the properties of the metabolites and to carry out an integral assessment based on their predicted biological activity spectra (Rudik et al. 2021).

2.2.1 Databases Containing Xenobiotic Metabolism Information

Publicly available databases (DBs) that contain information about drug metabolism are described below.

DrugBank (Wishart et al. 2018a) (https://go.drugbank.com/) is a DB created and maintained by the Canadian Institutes of Health Research (CIHR) and the Metabolomics Innovation Centre (TMIC). Initially, it was developed as a DB containing information on drugs, drug targets, and mechanisms of action, and it has since grown into a comprehensive, freely available online resource. The latest release of DrugBank Online (version 5.1.9, released 2022-01-03) contains over 14,000 drugs; for about 770 of these compounds, information about biotransformation is available, including the structures of metabolites. Metabolite structures and parent compound structures may be downloaded in SDF format. The full content of DrugBank is available in XML format.

ChEMBL (Mendez et al. 2019) (https://www.ebi.ac.uk/chembl/) is a freely available DB of bioactive drug-like small molecules. The ChEMBL database (version 29) contains information about the biotransformation of over 250 drug-like compounds. It is accessed through a web interface that supports searching for small organic compounds using particular terms or structural formulas as a query. ChEMBL can also be downloaded for further use in different database management systems, including Oracle, MySQL, SQLite, and PostgreSQL.

MetXBioDB (Djoumbou-Feunang et al. 2019) is a biotransformation database that describes the metabolism of small molecules in the human body; it was established in the context of developing BioTransformer (Djoumbou-Feunang et al. 2019). The database contains information on the biotransformation of more than 1000 compounds, and each entry is mainly represented by a single “parent compound-metabolite” pair. The MetXBioDB database can be downloaded in CSV format.

The Human Metabolome Database (HMDB) (http://www.hmdb.ca/) is a freely available electronic database containing detailed information about small molecule metabolites found in the human body (Wishart et al. 2018b). HMDB version 5.0 contains data on more than 200 thousand metabolites. The DB is accessed through a web interface and allows the identification of metabolites from mixtures via LC–MS/MS. The sequences of all drug-metabolizing enzymes are available in FASTA format, metabolite structures in SDF or XML format, and all spectra files in XML format.

Recon3D (http://vmh.life) is a computational resource that includes three-dimensional (3D) metabolite and protein structure data and enables integrated analyses of metabolic functions in humans (Brunk et al. 2018). Recon3D contains data on more than 13 thousand metabolic reactions involving more than 4 thousand unique metabolites. It has been assembled using multiple data sources, including HMR 2.00 (Pornputtapong et al. 2015) (2478 reactions), metabolomics data sets (1865 reactions), a drug module (Sahoo et al. 2015) (721 reactions), a transport module (51 reactions), host-microbe reactions (24 reactions), absorption and metabolism of dietary compounds (20 reactions), and others (1004 reactions). The “others” category includes reactions that capture metabolism in specific human organs (e.g., kidneys), as well as metabolic pathways of lipoproteins, bile acids, and sphingolipids.

In addition to these freely available databases, some commercially available resources contain information about drug metabolism. One example is Cortellis Drug Discovery Intelligence (www.cortellis.com/drugdiscovery/), which contains 1.3 million data points in the pharmacokinetics area. They include data from experimental and clinical studies describing a drug’s absorption, distribution, metabolism, and excretion (ADME) profile. In particular, the structures of parent compounds and metabolites are provided.
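Several of the resources above distribute structures as SDF files. As a rough illustration of how such a download might be read without a cheminformatics toolkit, the sketch below splits SDF text into records (each record ends with a `$$$$` line) and collects the `> <FIELD>` data items of each record. The two-record sample and the field names `PARENT_ID` and `NAME` are invented for this example and are not the field names used by DrugBank or HMDB; the atom and bond blocks are ignored, and splitting on every occurrence of `$$$$` is a simplification.

```python
# Hypothetical two-record SDF text (empty molfile bodies, invented data fields).
SAMPLE_SDF = """\
parent-1
  hypothetical record

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <PARENT_ID>
P001

> <NAME>
example parent

$$$$
metabolite-1
  hypothetical record

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <PARENT_ID>
P001

$$$$
"""


def read_sdf_records(text):
    """Return one dict per SDF record: the title line plus its data fields."""
    records = []
    for block in text.split("$$$$"):
        if not block.strip():
            continue  # skip the empty remainder after the last delimiter
        if block.startswith("\n"):
            block = block[1:]  # drop the line break that followed "$$$$"
        lines = block.splitlines()
        record = {"title": lines[0].strip(), "fields": {}}
        i = 0
        while i < len(lines):
            line = lines[i]
            if line.startswith(">") and "<" in line:
                # Data header of the form "> <FIELD_NAME>"
                name = line[line.index("<") + 1:line.rindex(">")]
                i += 1
                values = []
                # Field values run until the next blank line
                while i < len(lines) and lines[i].strip():
                    values.append(lines[i].strip())
                    i += 1
                record["fields"][name] = "\n".join(values)
            i += 1
        records.append(record)
    return records


records = read_sdf_records(SAMPLE_SDF)
# Group metabolite titles under the (hypothetical) parent identifier
by_parent = {}
for rec in records:
    by_parent.setdefault(rec["fields"]["PARENT_ID"], []).append(rec["title"])
```

For real work, a dedicated toolkit that also parses the atom and bond blocks would normally be preferred; the sketch only shows the record and data-field layout.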


2.2.2 Descriptors/Notation Used for Metabolism Prediction

For metabolism prediction, it is important to know which atoms and/or bonds change during a reaction. Usually, only the structure of the parent compound is considered, and transformation rules are then applied to this structure. The most popular notations used for metabolism prediction are considered below.

The Simplified Molecular Input Line Entry System (SMILES) specification (Weininger 1988) encodes a molecular structure as a string. It is widely used in cheminformatics software since it is more compact than most other chemical notations and is human-readable. A SMILES string is obtained by printing the symbols of the vertices of the molecular graph in the order corresponding to their depth-first traversal.

The SMILES ARbitrary Target Specification (SMARTS) describes substructural patterns in molecules and is used to search for substructures. For a substructure search, SMILES and SMARTS strings are converted to graph representations, and a subgraph isomorphism search is then performed. SMARTS notation contains labels for atoms and bonds that allow one to describe a group of structures. For example, [R] means “atom in any ring”. Logical operators can also be used in SMARTS; for example, “,” (comma) denotes logical disjunction.

SMILES and SMARTS can be used to represent reactions, using the “>” symbol to separate the reactants, agents, and products. However, the most suitable notation for describing reactions is the SMIRKS language, which was created to describe groups of reactions (reaction types) that undergo the same set of atom and bond changes (Leach et al. 1999). SMIRKS imposes rules that guarantee that a reaction can be represented as a graph from which the atom and bond changes can be read. SMIRKS requires that all hydrogens directly involved in a transform be explicitly expressed.
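To make the depth-first construction of a SMILES string concrete, the following minimal sketch writes a SMILES-like string for an acyclic molecular graph with single bonds only. It is a toy under stated assumptions, not a SMILES writer: ring closures, bond orders, aromaticity, charges, and canonical atom ordering are all ignored, and the function name `tree_smiles` is introduced here for illustration.

```python
def tree_smiles(atoms, bonds, start=0):
    """Depth-first SMILES-like string for an acyclic, single-bonded graph.

    atoms: list of element symbols (heavy atoms only)
    bonds: list of (i, j) index pairs
    """
    adjacency = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    def visit(node, parent):
        children = [n for n in adjacency[node] if n != parent]
        text = atoms[node]
        # All branches except the last are enclosed in parentheses
        for child in children[:-1]:
            text += "(" + visit(child, node) + ")"
        # The last branch continues the main chain
        if children:
            text += visit(children[-1], node)
        return text

    return visit(start, None)


ethanol = tree_smiles(["C", "C", "O"], [(0, 1), (1, 2)])
isobutane = tree_smiles(["C", "C", "C", "C"], [(0, 1), (1, 2), (1, 3)])
ethanol_from_c2 = tree_smiles(["C", "C", "O"], [(0, 1), (1, 2)], start=1)
```

Starting the traversal from a different atom gives a different but equally valid string for the same molecule, which is why practical tools add a canonicalization step.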
Any reaction can be represented as a list of changes undergone by the edges and atoms of a molecule during the transformation. If the reaction rules are designated as “edge elimination”, “decrease in the bond order by 1”, “decrease in the atomic charge by 1”, particular atom and bond labels play no part. This simple idea forms the basis of SMIRKS. In this format, the description of a generalized reaction consists of SMARTS encodings of the substrate and the product, with a mapping specified between the substrate and product atoms.

To apply machine learning methods for the prediction of biological activity and toxicity, thousands of different structural descriptors have been proposed (Todeschini and Consonni 2000). Here, we consider only the descriptors developed and applied for the prediction of metabolism. Chemical structure descriptors can be used to predict the interaction of xenobiotics with metabolic enzymes, sites of metabolism, and metabolite structures. Descriptors designed primarily for metabolism prediction are discussed in more detail below.

The Labeled Multilevel Neighborhoods of Atom (LMNA) descriptors are similar to MNA descriptors (Filimonov et al. 2014) and consist of a set of structural descriptors, each of which describes an atom and its neighbors. The zero-level LMNA descriptor for atom A is the atom’s symbol with a mark indicating whether the atom is labeled in the structure with one labeled atom (SoLA). If the atom is a site of metabolism (SOM), it receives the label “*”; for example, the zero-level LMNA descriptor for atom A is

$D_0(A) = [*]A$

where “*” is the sign for a labeled atom. An iterative procedure is then performed, and the Nth-level LMNA descriptor has the following structure:

$D_N(A) = [*][-]A\,(D_{N-1}(B_1)\,D_{N-1}(B_2) \ldots D_{N-1}(B_k))$

where “-” is a mark added to non-ring atoms (added at the second level of LMNA) and $D_{N-1}(B_i)$ is the (N − 1)-level LMNA descriptor for the ith immediate neighbor $B_i$ of atom A. An example of LMNA descriptors for four atoms (pos. 18–21) in the SoLA that represents amitriptyline with labeled atom no. 21 is given in Fig. 2.2 (Rudik et al. 2014).

Fig. 2.2 Example of LMNA descriptors for four atoms (pos. 18–21) in the SoLA, which represents amitriptyline with labeled atom no. 21. Reprinted with permission from Rudik et al. (2014). Copyright 2014 American Chemical Society

Quantitative Neighborhoods of Atom (QNA) descriptors are pairs of P and Q values calculated for each atom of a molecule based on its connectivity matrix and the standard values of the ionization potential (IP) and electron affinity (EA) of the atoms (Filimonov et al. 2009). They describe each atom in a molecule, and, at the same time, each of the P and Q values depends on the whole composition and structure of the molecule. Previously developed ligand-based approaches (Tyzack et al. 2014; Rudik et al. 2014) were mainly based on the use of either fragment descriptors (substructures) or fingerprints, generated for each atom of the chemical structure and specifically tagged if a particular atom corresponded to the SOM. In most cases, the SOM corresponds to a single atom, so the atom-centered QNA descriptors seem best suited for representing SOMs. Labels of SOM and non-SOM for each atom of a molecule can be generated automatically based on the SDF file used as input.

RMNA descriptors (Borodina et al. 2004) were created on the basis of MNA descriptors to represent chemical reactions. The principal difference between the RMNA and MNA/LMNA descriptors is the additional labeling of atoms and the simultaneous consideration of two structures, namely the substrate and product structures.

Sheridan et al. (2007) examined several descriptors that describe the local environment around each non-hydrogen atom in each molecule. Among them is the SPAN descriptor, a ratio that measures whether the candidate oxidation site is at the end or in the middle of a molecule in a topological sense. It is defined as the longest bond path distance from a given atom divided by the longest bond path distance in the whole molecule. The other descriptors examined by Sheridan et al. (2007) describe the atom’s local chemical environment, including the physicochemical environment and the solvent accessible surface area, which require the 3D structure of a molecule.

SMARTCyp uses a reactivity descriptor (E), an accessibility descriptor (A), and a Solvent Accessible Surface Area (SASA) descriptor. The atomic SASA values for non-hydrogen atoms are computed based on circular fingerprints (Rydberg et al. 2013). The reactivity descriptor estimates the energy required for a CYP to react at a given position and is calculated for each atom by matching SMARTS patterns to a lookup table of energies in kJ/mol. The activation energy for the reaction of a given fragment with the cytochrome is calculated beforehand by density functional theory (DFT) (Parr and Yang 1989). If several such fragments match, the one characterized by the lowest energy is chosen. Atoms not matching any pattern are not considered to be reactive. An example of a fragment and its activation energy is shown in the scheme adapted from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4055970/figure/fig1/.
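The SPAN ratio of Sheridan et al. (2007) described above is simple enough to sketch directly from its definition: the longest bond-path distance from an atom divided by the longest bond-path distance in the molecule. The code below is an illustrative sketch that assumes a connected graph of heavy atoms given as an adjacency dictionary; the function names are introduced here and do not come from any published implementation.

```python
from collections import deque


def bond_distances(adjacency, source):
    """Breadth-first search: number of bonds from `source` to every atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def span_ratios(adjacency):
    """Longest path from each atom divided by the longest path in the molecule."""
    longest_from = {
        atom: max(bond_distances(adjacency, atom).values()) for atom in adjacency
    }
    longest_overall = max(longest_from.values())
    if longest_overall == 0:  # single-atom molecule
        return {atom: 1.0 for atom in adjacency}
    return {atom: d / longest_overall for atom, d in longest_from.items()}


# Heavy-atom graph of n-pentane: C0-C1-C2-C3-C4
pentane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
ratios = span_ratios(pentane)
```

Terminal atoms obtain a ratio of 1.0 and the central atom 0.5, which matches the intended reading of the descriptor as an "end versus middle" indicator.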

The score function S, by which the atoms are ranked according to the probability of being CYP2C sites of metabolism, has the following form:

$S = E - 8A - 0.04\,\mathrm{SASA}$

where E is the reactivity descriptor and A is the accessibility descriptor. The descriptor A is defined as the longest path distance to the given atom divided by the longest path distance in the whole molecule. This descriptor is similar to the SPAN descriptor used by Sheridan et al. (2007).

Singh et al. (2003) developed a statistical trend vector model used to estimate the AM1 abstraction energy of a hydrogen atom from its local atomic environment. The authors carried out AM1 and trend vector calculations on 50 CYP3A4 substrates whose major sites of metabolism have been described in the literature. A plot of the lowest hydrogen radical formation energy versus its sterically accessible surface area exposure for these 50 substrates shows that only those hydrogen atoms with a solvent accessible surface area exposure of more than 8.0 Å² are susceptible to CYP3A4-mediated metabolism. The authors used descriptors that capture the local topological environment of the hydrogen. The descriptors are H-AT1, H-AT1-AT2, H-AT1-AT2-AT3, and H-AT1-AT2-AT3-AT4, where AT1 is the type of the atom to which the hydrogen is directly bonded, AT2 is the type of the atom two bonds away, etc. The atom type includes the element, the number of non-hydrogen neighbors, and the hybridization. The authors used the PATTY methodology (Bush and Sheridan 1993) to assign descriptors for all the hydrogens.

In XenoSite (Zaretzki et al. 2013), SOM prediction used a combination of descriptors. They included topological and quantum chemical descriptors, a SMARTCyp reactivity (SCR) descriptor, as well as molecule-level and fingerprint similarity descriptors.
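As an illustration of how the SMARTCyp-type score combines its three descriptors, the sketch below reads the score as S = E − 8·A − 0.04·SASA and ranks atoms by ascending S, on the assumption that a lower score (lower effective activation energy) indicates a more likely site of metabolism. All atom names and descriptor values are invented for the example; in practice E would come from the SMARTS lookup table and A and SASA from the structure.

```python
def smartcyp_like_score(energy, accessibility, sasa):
    """S = E - 8*A - 0.04*SASA (E in kJ/mol, A between 0 and 1, SASA in square angstroms)."""
    return energy - 8.0 * accessibility - 0.04 * sasa


def rank_atoms(descriptors):
    """Return atom names sorted by ascending score, plus the scores themselves.

    descriptors: {atom_name: (E, A, SASA)}
    Assumption: the lowest score is the most likely site of metabolism.
    """
    scores = {
        atom: smartcyp_like_score(*values) for atom, values in descriptors.items()
    }
    return sorted(scores, key=scores.get), scores


# Hypothetical descriptor values for three candidate atoms
example = {
    "C1": (60.0, 1.00, 30.0),
    "C2": (75.0, 0.50, 10.0),
    "N3": (40.0, 0.75, 20.0),
}
ranking, scores = rank_atoms(example)
```

In this toy case the energy term dominates: the accessibility and SASA terms shift each score by a few kJ/mol at most, so they mainly act as tie-breakers between atoms of similar reactivity.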

2.2.3 Prediction of Biotransformation Sites

Currently, there are different experimental and computational approaches to SOM prediction (Kulkarni et al. 2005; Zheng et al. 2009; Kirchmair et al. 2012; Lounnas et al. 2013; Bezhentsev et al. 2016; Sridhar et al. 2017). Various methods are considered in Sect. 2.2.5. Typically, computational approaches are based on: (1) studies of enzyme-ligand interactions (structure-based methods); and (2) computational modeling of the ligands’ properties based on training sets of known ligands and non-ligands (ligand-based methods).

Structure-based approaches are powerful and widely used methods for estimating intermolecular interactions. However, when the three-dimensional structure of the target protein is unknown or has been determined only at low resolution, the application of a structure-based computational approach might be practically impossible (Lounnas et al. 2013). Another limitation of structure-based design is the multitude of ligand-enzyme conformations that appear in computer-based models, which makes it very difficult to correctly predict several SOMs in one molecule.

Ligand-based approaches are widely used for predicting the sites of metabolism. There is a large body of data on ligands that are metabolized by cytochrome P450 enzymes and on experimentally investigated SOMs in a diverse set of ligands. The information on SOMs accumulated in several databases has made it possible to develop models for SOM prediction. Recently, several different approaches to ligand-based SOM prediction have been developed and deployed, including machine learning techniques.

Tyzack et al. reported an approach to SOM prediction based on probabilistic classifiers such as the naïve Bayes classifier and the kernel-based Random Attribute Subsampling Classification Algorithm (RASCAL), which was developed by the authors (Tyzack et al. 2014). The Parzen-Rosenblatt Window (PRW) was the function used to obtain the kernels employed in RASCAL. A full description of the methods is provided in a previous publication (Tyzack et al. 2014). The developed algorithms were applied to predict the SOMs for the modeling sets of chemical compounds metabolized by CYP2C9, CYP2D6, and CYP3A4. The authors compared their approach to those developed previously, such as XenoSite (Zaretzki et al. 2013), RS-predictor (Zaretzki et al. 2011), and SMARTCyp (Rydberg et al. 2010). The comparison was performed using the Top-2 metric: the percentage of structures in the test set for which a SOM was identified among the top-two predictions. The accuracy of prediction was comparable to that of the reference methods.

Tarasova et al. (2017) designed a simple approach for SOM/non-SOM classification based on the application of QNA descriptors (Filimonov et al. 2009) and several machine learning algorithms. The authors (Tarasova et al. 2017) proposed to use the data on SOM and non-SOM heavy atoms only. Each SOM atom was assigned the value “1”, and each non-SOM atom the value “0”. These values were used as the target variable in the models, while the P and Q values (QNA descriptors) were the independent variables. A comparison of the machine learning methods allowed the authors to conclude that the random forest classifier gave the best performance for both the balanced and imbalanced datasets. Tarasova et al. (2017) demonstrated that the QNA-based approach using machine learning yielded performance comparable to that reported earlier by Tyzack et al. (2014). Therefore, QNA descriptors may be applied in machine learning methods for predicting the site of metabolism.

Metabolic rainbow (Dang et al. 2020) is a system for simultaneously labeling SOMs and reaction types by classifying them into five key reaction classes: stable and unstable oxidations, dehydrogenation, hydrolysis, or reduction. These classes unambiguously identify 21 types of phase I reactions, covering 92.3% of known reactions.
The Rainbow XenoSite method was able to identify reaction-type-specific sites of metabolism with a tenfold cross-validated AUC (area under the receiver operating characteristic curve) of 0.97. Rainbow XenoSite with five-color and combined output is freely available online through the following web server: http://swami.wustl.edu/xenosite/p/phase1_rainbow.

The concept called the “bond of metabolism” (BOM) extends the traditional “site of metabolism” (SOM) by specifying the set of chemical bonds that are modified or formed as a result of a metabolic reaction, rather than a specific atom (Tian et al. 2021). As chemical reactions always involve either breaking or forming bonds between a pair of atoms, this concept explicitly describes the location where a chemical reaction occurs in terms of the bonds involved and provides other information about the reaction.
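The Top-2 metric used earlier in this section to compare SOM predictors can be written down in a few lines. The sketch below is a generic top-k version that assumes each atom carries a predicted score for which higher means "more likely a SOM"; the molecule identifiers, atom indices, and scores are invented for illustration.

```python
def top_k_accuracy(predictions, true_soms, k=2):
    """Fraction of molecules with at least one true SOM among the k top-ranked atoms.

    predictions: {molecule_id: {atom_index: score}}, higher score = more likely SOM
    true_soms:   {molecule_id: set of atom indices observed as SOMs}
    """
    hits = 0
    for mol_id, atom_scores in predictions.items():
        top_atoms = sorted(atom_scores, key=atom_scores.get, reverse=True)[:k]
        if any(atom in true_soms[mol_id] for atom in top_atoms):
            hits += 1
    return hits / len(predictions)


# Hypothetical predictions for three molecules with three heavy atoms each
predictions = {
    "mol_a": {1: 0.9, 2: 0.3, 3: 0.1},
    "mol_b": {1: 0.2, 2: 0.5, 3: 0.4},
    "mol_c": {1: 0.6, 2: 0.3, 3: 0.1},
}
true_soms = {"mol_a": {1}, "mol_b": {3}, "mol_c": {3}}

top2 = top_k_accuracy(predictions, true_soms, k=2)
top1 = top_k_accuracy(predictions, true_soms, k=1)
```

Because a molecule counts as a hit when any one of its SOMs is recovered, the metric is per molecule rather than per atom, which makes it insensitive to the strong imbalance between SOM and non-SOM atoms.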

2.2.4 Generation of the Structures of Probable Metabolites

SOMs are merely a proxy for metabolite structures: knowledge of a SOM does not explicitly provide the actual metabolite structure. Without an explicit metabolite structure, computational systems cannot evaluate the new molecule’s properties. For example, the metabolite’s reactivity cannot be automatically predicted. This is a crucial limitation because reactive drug metabolites are a key driver of adverse drug reactions (ADRs). Additionally, some other metabolic events cannot be predicted, even though the metabolic path of the majority of substrates includes two or more sequential steps.

MetabolExpert (Darvas 1987) was the first attempt to create an expert system for simulating the metabolic fate of xenobiotics in humans, exemplified by the metabolism of chloramphenicol analogues. MetabolExpert used the Prolog programming language and empirical knowledge of metabolic pathways from Testa and Jenner’s “Drug Metabolism: Chemical and Biochemical Aspects” (Testa and Jenner 1976) to create deductive rules called “production rules”. The concept of reaction rules is still used today.

The program META (Klopman et al. 1994) used the same principles as MetabolExpert; it was an expert system for predicting the chemicals formed by metabolic transformations. It operated on dictionaries of reaction rules created by experts to represent known metabolic paths. Compared to MetabolExpert, META used a much more extensive set of reaction rules based on more extensive biotransformation dictionaries. Each dictionary represents a particular metabolic model, including descriptions for various organs and animal species; together, the dictionaries cover mammalian metabolism, aerobic and anaerobic degradation of xenobiotics by bacteria, and photodegradation. META was able not only to review the sequential formation of all conceivable metabolites from a given parent compound but also to assess the relative importance of the various metabolites on the basis of priority rankings.

Metabolizer was ChemAxon’s commercially available module included in the JChem program package for the early prediction of xenobiotic metabolites and the identification of major metabolites.
The Metabolizer program generated possible metabolites for an entered compound and predicted not only metabolic pathways, with an indication of the major metabolites, but also the stability of the metabolites. The input data for Metabolizer are a biotransformation dictionary based on a manually curated knowledge base that can be further extended or even replaced for alternative purposes.

The TIMES metabolism simulator (Mekenyan et al. 2004) is a hybrid method that preserves the weighting of generated metabolic reactions and the accounting of biochemical interactions due to the rigid fixation of the role of enzymes. It is able to create “maps” of several pathways and to calculate the possibility of using various data sources to train a metabolic simulator. The ability of TIMES to integrate the metabolic activation of chemicals and the prediction of the toxicity of metabolites in one platform is considered an important advantage of the method.

The MetaDrug software (Ekins et al. 2005) comprises an extensive database of human protein–xenobiotic interactions and various rule-based metabolite prediction and QSAR models for cytochromes P450, the major drug-metabolizing enzymes. The prediction is based on binary classifiers for substrates and for inhibitors (IC50 < 10 µM) of the main cytochrome P450 isoforms (1A2, 2B6, 2D6, and 3A4). These models for substrates and inhibitors are applicable both to the parent compounds (substrates) and to predicted metabolites. MetaDrug can be used for visualizing phase I and II metabolic pathways, as well as for interpreting high-throughput data derived from microarrays as networks of interacting objects.

Lhasa Limited developed the Meteor software (Marchant et al. 2008) in 1997 (Ridings et al. 1996). A Meteor prediction consists of successive applications of the reaction rules to the input compounds. To avoid a combinatorial explosion during metabolite generation, each reaction is provided with a qualitative probability evaluation (probable, plausible, equivocal, doubted, or improbable). Relative reasoning rules estimate the comparative likelihood of two different biotransformations; an example of such a rule might be “N-demethylation is more likely than O-demethylation”. The prediction stops after the generation of a metabolite with a LogP value sufficient for excretion from the body (Boobis et al. 2002).

The EAWAG-PPS Pathway Prediction System (formerly called UM-PPS, Gao et al. 2010) is an expert system designed for predicting potential biodegradation pathways of organic compounds. It is based on the Biocatalysis/Biodegradation Database (BBD), which contains information on more than 1300 compounds, 900 enzymes, 1500 reactions, and 500 microorganisms and includes about 260 reaction rules. EAWAG-PPS includes measures to prevent a combinatorial explosion based on the prioritization of the transformation rules. For some pairs of rules, the relative priority calculated from more than 1000 reactions retrieved from BBD is indicated in the transformation base. To increase prediction quality, variable probability scores for specific types of biotransformations were introduced. For this purpose, regular expressions were associated with some transformation rules.
If, during the prediction, a substrate structure matches the regular expression associated with a given transformation, this transformation is assigned a probability score based on the identified structural feature of the substrate. So-called super rules then provide an additional reduction of the combinatorial explosion. The application of super rules compresses metabolic pathways in the tree of metabolites and reduces the number of predicted unlikely metabolites (Fenner et al. 2008).

Systematic Generation of potential Metabolites (SyGMa) uses SMIRKS rules that were developed by experts using a large database of experimentally observed metabolic reactions. Each generated metabolite is assigned the probability of the rule by which it was formed (Ridder and Wagener 2008). SyGMa is available as a Python library at https://github.com/3D-e-Chem/sygma.

BioTransformer (Djoumbou-Feunang et al. 2019) is a freely available software package for in silico metabolism prediction and compound identification. BioTransformer combines a machine learning approach with a knowledge-based approach to predict small molecule metabolism in human tissues (e.g., liver tissue), the human gut, and the environment (soil and water microbiota) via its metabolism prediction tool. Using mass spectrometry data obtained from a rat experimental study, BioTransformer could correctly identify metabolites via its metabolism identification tool and suggest potential metabolites. BioTransformer can be used as an open access command-line tool or a software library. It is freely available at https://bitbucket.org/djoumbou/biotransformerjar/. Moreover, it is also freely available as an open access RESTful application at www.biotransformer.ca, which allows users to manually or programmatically submit queries and retrieve metabolism predictions or compound identification data.

Prediction of drug metabolites can also be performed using a rule-free neural machine translation approach. This is an end-to-end learning-based method for predicting possible human metabolites of small molecules, including drugs (Litsa et al. 2020). Here, the metabolite prediction task is approached as a sequence translation problem, with chemical compounds represented in SMILES notation; transfer learning is applied to a deep learning transformer model for sequence translation, trained initially on chemical reaction data, to predict the outcome of human metabolic reactions. The authors (Litsa et al. 2020) additionally created an ensemble model consisting of multiple fine-tuned models. Each model takes as input the SMILES sequence of the input molecule and predicts the SMILES sequences of possible metabolites. This method generalizes well to different enzyme families: it can correctly predict metabolites formed in phase I and phase II drug metabolism as well as by other enzymes, and it can provide a comprehensive study of drug metabolism that is not restricted to the major enzyme families and does not require the extraction of transformation rules.

Prediction of diverse structures of drug metabolites can be performed using the metabolic forest approach (Hughes et al. 2020). The method was validated with the substrate and product structures from a large, chemically diverse, literature-derived dataset of 20,736 records. The metabolic forest finds a pathway linking each substrate and product for 79.42% of these records.
By performing a breadth-first search of depth two or three, authors improve performance to 88.43% and 88.77%, respectively. The metabolic forest includes a specialized algorithm for producing accurate quinone structures, the most common type of reactive metabolite. Another approach is XenoNet (Flynn et al. 2020), a metabolic network predictor that can take a pair of a substrate and a target product as input and (1) enumerate pathways, or sequences of intermediate metabolite structures, between the pair and (2) compute the likelihood of those pathways and intermediate metabolites. Each metabolic network has a defined substrate molecule that has been experimentally observed to undergo metabolism into a defined product metabolite. XenoNet can predict experimentally observed pathways and intermediate metabolites linking the input substrate and product pair with 0.88 and 0.46 recall, respectively. Using likelihood scoring, XenoNet also achieves a top-one pathway and intermediate metabolite accuracy of 0.94 and 0.52, respectively. XenoNet is available at https://swami.wustl. edu/xenonet. Hughes and co-authors modeled four of the most common metabolic transformations that result in bioactivation: quinone formation, epoxidation, thiophene sulfuroxidation, and nitroaromatic reduction by synthesizing models of metabolism and reactivity (Hughes et al. 2021). First, the metabolism models predict the formation probabilities of all possible metabolites among the pathways studied. Second, the exact structures of these metabolites are enumerated. Third, using these structures,

2 Assessment of the Xenobiotics Toxicity Taking into Account Their …

33

the reactivity model predicts the reactivity of each metabolite. Finally, a feedforward neural network converts the metabolism and reactivity predictions to a bioactivation prediction for each possible metabolite. These bioactivation predictions represent the joint probability that a metabolite forms and that this metabolite subsequently conjugates to protein or glutathione. Hughes et al. predicted the correct pathway with an AUC accuracy of 0.90 among molecules bioactivated by these pathways. Furthermore, the model predicts whether molecules will be bioactivated, distinguishing bioactivated and non-bioactivated molecules with a 0.81 AUC. This bioactivation model—the first of its kind that jointly considers both metabolism and reactivity—enables drug candidates to be quickly evaluated for a toxicity risk that often evades detection during preclinical trials. The XenoSite bioactivation model is available at http://swami.wustl.edu/xenosite/p/bioactivation. Software GLORY (de Bruyn Kops et al. 2019) predicts chemical structures of metabolites due to the combination of SoM prediction with a new collection of rules for metabolic reactions mediated by the cytochrome P450 enzyme family. GLORY has two modes: MaxEfficiency and MaxCoverage. In MaxEfficiency mode, the use of predicted SoMs to restrict the locations in the molecule at which the reaction rules can be applied was explored. In MaxCoverage mode, the predicted SoM probabilities were instead used to develop a new scoring approach for the predicted metabolites. With this scoring approach, GLORY achieves a recall of 0.83 and can predict at least one known metabolite within the top three ranked positions for 76% of the molecules of a new, manually curated test set. GLORY is freely available as a web server at https://acm.zbh.uni-hamburg.de/glory/. CyProduct (Tian et al. 
2021) is another example of a machine learning in silico combined approach for cytochrome P450 metabolism prediction that predicts metabolic byproducts for a specified molecule and a human CYP450 isoform. It includes three modules: (1) CypReact, a tool that predicts if the query compound reacts with a given CYP450 enzyme, (2) CypBOM, a tool that accurately predicts the “bond site” of the reaction (i.e., which specific bonds within the query molecule react with the CYP isoform), and (3) MetaboGen, a tool that generates the metabolic byproducts based on CypBOM’s bond-site prediction. CyProduct predicts metabolic biotransformation products for each of the nine most important human CYP450 enzymes. CypBOM uses the BOM database for 1845 CYP450-mediated phase I reactions in the training set of the CypBOM Predictor to predict the reactive bond locations on substrate molecules. CypBOM Predictor’s cross-validated Jaccard score for reactive bond prediction ranged from 0.380 to 0.452 over the nine CYP450 enzymes. The CyProduct suite and the data sets are freely available at https://bitbucket.org/wishartlab/cyproduct/src/master/.
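
To make the Jaccard score reported for reactive bond prediction concrete, the following minimal sketch compares a predicted and an observed set of reactive bonds. The representation of a bond as a pair of atom indices and the example values are illustrative assumptions; they are not CypBOM's actual data format or output.

```python
def normalize(bond):
    """Store a bond as an ordered pair so that (1, 2) and (2, 1) are the same bond."""
    a, b = bond
    return (a, b) if a <= b else (b, a)


def jaccard_score(predicted, observed):
    """Jaccard index: size of the intersection divided by size of the union."""
    predicted = {normalize(b) for b in predicted}
    observed = {normalize(b) for b in observed}
    union = predicted | observed
    if not union:
        return 1.0  # both sets empty: treated as perfect agreement (a convention)
    return len(predicted & observed) / len(union)


# Hypothetical atom-index pairs for one substrate and one CYP450 isoform.
predicted_bonds = [(1, 2), (4, 3), (7, 8)]
observed_bonds = [(2, 1), (3, 4), (5, 6)]

score = jaccard_score(predicted_bonds, observed_bonds)
print(f"Jaccard score: {score:.2f}")
```

In this toy case two of the four distinct bonds are shared, so the score is 0.50. A score in the range of 0.380 to 0.452 therefore means that somewhat less than half of the bonds in the union of predicted and observed sets are common to both.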

2.2.5 Reactive Metabolite Formation Prediction

Reactive metabolite formation is thought to be one of the primary causes of idiosyncratic adverse drug reactions, often associated with drug-induced skin, liver, and hematopoietic toxicities. Reactive metabolites (RMs), formed via drug metabolism in the body, are electrophilic species that can bind covalently to macromolecules such as proteins and DNA, affecting their function and potentially leading to toxic effects (Thompson et al. 2016). For example, N-acetyl-p-benzoquinone-imine (NAPQI) is a toxic metabolite, produced during the oxidation of paracetamol by CYP1A2, CYP3A4, and CYP2E1 (Alves et al. 2006), that causes liver toxicity. Covalent binding of metabolites to DNA or other macromolecules may occur, for example, for bioactivated sulfamethoxazole, dapsone, procainamide, chloramphenicol, nimesulide, dantrolene, nilutamide, phenytoin, thalidomide, furosemide, norethindrone, and more (Lyubimov 2012). Another well-studied example is hepatotoxic tetrachloromethane, which is metabolized to give genotoxic DNA-alkylating compounds (alkylators) [some other instances may be found in the paper of Ortiz de Montellano (2013)]. There are diverse classes of RMs, including epoxides, quinones, quinone imines, quinone methides, free radicals, hydrazides, and many reactive oxygen species (Naisbitt et al. 2001; Park et al. 2011). Unfortunately, reactive metabolites are not reliably detected by experiments. Moreover, testing lead compounds for the formation of DNA or protein adducts during the early stages of drug development is costly and time-consuming. In contrast, computational methods can quickly screen for covalent binding potential, thereby flagging problematic molecules and reducing the total number of necessary experiments (Hughes et al. 2016). 
One of the main directions for assessing the formation of reactive metabolites is computational model creation that could assess the most common metabolic bioactivation transformations: epoxidation, quinone formation, thiophene sulfur-oxidation, and nitroaromatic reduction. Epoxides are compounds that contain cyclic ether with a three-atom ring (epoxide functional group). This ring approximates an equilateral triangle, which makes it strained, and hence highly reactive. The example of biotransformation of Carbamazepine into toxic epoxide metabolites is shown in Fig. 2.3. The specific location in a molecule where the epoxide functional group is formed represents the so-called site of epoxidation (SOE). Epoxides are metabolites often formed by cytochromes P450 acting on aromatic or double bonds. There are several methods for the prediction of SOE. One of them is the calculation of the activation energies for epoxidation (Rydberg et al. 2014; Zhang et al. 2015). The most common is using the density functional theory (DFT) method with the B3LYP (Kim and Jordan 1994; Stephens et al. 1994) exchange–correlation functional with and without dispersion correction. Rydberg et al. (2014) have performed the comparison of the energy barriers for cytochrome P450 mediated epoxidation of alkenes to the barriers for the hydroxylation of an aliphatic carbon atom next to a

Fig. 2.3 Biotransformation of Carbamazepine into toxic epoxide metabolite. The reaction was taken from DrugBank

double bond. The results show that B3LYP tends to underestimate the barriers for hydroxylation relative to epoxidation. Zhang et al. (2015) calculated the activation energies for epoxidation by the active species of P450 enzymes. They show that B3LYP-D3 single-point energy calculations on B3LYP-optimized geometries, with zero-point energy and solvation corrections, can predict the experimentally observed regioselectivity between P450 epoxidation and hydroxylation. Hughes et al. (2015) used hundreds of epoxidation reactions to construct their deep convolutional network. The final epoxidation model identified SOEs with 0.95 AUC performance and separated epoxidized and non-epoxidized molecules with 0.79 AUC. Their epoxidation model is available at http://swami.wustl.edu/xenosite and uses 214 numerical descriptors for each bond of a molecule, including atom-level descriptors previously developed for the XenoSite metabolism model (Zaretzki et al. 2013) and the XenoSite reactivity model (Hughes et al. 2015). Rudik et al. (2017) used a training set consisting of 355 molecules and 615 reactions and a combination of LMNA descriptors and a Bayesian-like algorithm implemented in PASS software (Filimonov et al. 2014) for SOE estimation. The AUC calculated in leave-one-out and 20-fold cross-validation procedures was 0.9. Prediction of epoxide formation based on the created SAR model is included as a component of the MetaTox web service (http://www.way2drug.com/mg). Hu et al. (2020) used a training set of 829 unique SOEs collected from different sources. Three types of fingerprints, namely atom-pair (AP) (Carhart et al. 1985), Morgan2 (Morgan 1965), and topological-torsion (TT) (Nilakantan et al. 1987), were applied to describe the SOEs. 
Six machine learning methods were employed to build the classification models, including support vector machine (SVM), decision tree (DT), random forest (RF), k-nearest neighbors (k-NN), logistic regression (LR), and neural network (NN). Sarullo et al. (2020) presented the TLQC method (Transfer Learning from Quantum Chemistry). They constructed two simple neural network architectures to perform transfer learning from quantum chemistry (TLQC) for both the epoxidation and reactivity data sets. The first architecture is a graph-based deep learning model of quantum chemical properties using training data extracted from PubChemQC (Nakata and Shimazaki 2017). They constructed and trained a message passing neural

network (MPNN) (Gilmer et al. 2017) to predict quantum chemical properties. The atomic state output from this network was further processed by other networks, which were trained to predict up to three molecule-level properties (total energy, highest occupied molecular orbital (HOMO) energy, and excitation gap energy), four atom-level properties (Mulliken charge and population, total and bonded valence), and two bond-level properties (Mayer order and bond length). The model was constructed and trained using TensorFlow, DeepMind GraphNets, and the tflon deep learning toolkit. After computing graph-based encodings of the quantum state for each atom, they trained a neural network to predict bond-level epoxidation labels for each bond in the epoxidation data set. In addition, they trained a neural network to predict atom-level reactivity labels for each atom in the reactivity data set. Each of the four outputs of this network corresponds to a reactivity prediction for cyanide, DNA, GSH, or protein. This neural network was trained via gradient descent with the OpenOpt L-BFGS optimizer. The authors obtained top-two accuracies in leave-one-out cross-validation (LOO CV) experiments of 0.85, 0.8, 0.7, and 0.83 for epoxidation and for cyanide, GSH, and protein reactivity, respectively. Hughes et al. (2016) extracted 2489 reactions from the Accelrys Metabolite Database (AMD); molecules were selected based on reported reactions with cyanide, DNA, GSH, or protein. The atoms across the whole data set were labeled as reactive or nonreactive for each of the four reactive nucleophilic targets. Then, they labeled the conjugation site on each molecule, known as its site of reactivity (SOR), and used a deep convolutional neural network to accurately predict these SORs in cross-validated experiments. Further, they transformed SOR scores into molecule-level electrophilic reactivity scores that accurately predict whether molecules will conjugate to DNA or protein. 
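
The top-two accuracy quoted for these site-level models can be illustrated with a short sketch. It assumes one common definition of the metric: a molecule counts as correctly predicted if at least one experimentally observed site is among its two highest-scoring atoms. The atom scores and observed sites below are hypothetical and are not taken from the cited studies.

```python
def top_k_hit(atom_scores, observed_sites, k=2):
    """True if any observed site is among the k highest-scoring atoms."""
    ranked = sorted(atom_scores, key=atom_scores.get, reverse=True)
    return any(atom in observed_sites for atom in ranked[:k])


def top_k_accuracy(molecules, k=2):
    """Fraction of molecules with at least one observed site in the top k."""
    hits = sum(top_k_hit(scores, sites, k) for scores, sites in molecules)
    return hits / len(molecules)


# Each entry: (atom index -> predicted reactivity score, set of observed sites).
molecules = [
    ({0: 0.91, 1: 0.15, 2: 0.40}, {0}),  # observed site ranked first: hit
    ({0: 0.30, 1: 0.55, 2: 0.60}, {1}),  # observed site ranked second: hit
    ({0: 0.80, 1: 0.70, 2: 0.10}, {2}),  # observed site ranked third: miss
]

accuracy = top_k_accuracy(molecules, k=2)
print(f"Top-two accuracy: {accuracy:.2f}")
```

Because the metric only asks for one observed site near the top of the ranking, it is more forgiving than a per-atom measure such as the site AUC, which is one reason the two kinds of figures are reported side by side.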
Additionally, they applied the molecule reactivity scores to calculate DNA and protein selectivity scores to estimate the fraction of molecules that are reactive to DNA and protein but not cyanide or GSH. The molecule reactivity scores (MRS) were reasonably accurate in separating reactive and nonreactive molecules (90.3%, 78.7%, 77.7%, and 79.8% for cyanide, DNA, GSH, and protein, respectively). XenoSite’s cross-validated atom reactivity scores (ARS) predicted reactive atoms with average site AUCs of 0.97, 0.90, 0.93, and 0.94, and top-two accuracies of 0.84, 0.81, 0.81, and 0.84, for cyanide, DNA, GSH, and protein, respectively. Wu et al. (2021) used reactive metabolite prediction to estimate the risk of drug-induced autoimmune disease (AD). They developed a workflow to examine the association between structural alerts and drug-induced ADs in order to improve toxicological prescreening tools. Considering reactive metabolite (RM) formation as a well-documented mechanism for drug-induced ADs, they investigated whether the presence of specific RM-related structural alerts was predictive of the risk of drug-induced AD. They used binary classification models generated by CatBoost to evaluate 171 published structural alerts for reactive metabolite formation on 407 drugs and to identify structural alerts that could be used to flag the potential risk of drug-induced ADs. Feature importance and contribution were explored using Shapley additive explanations (SHAP) values. Text mining software from Linguamatics

(IQVIA, Marlborough, MA) was used to conduct a full-text search for the 26 AD-related MedDRA terms in the Drug Label database (https://dailymed.nlm.nih.gov/dailymed). Fifty drugs were determined to be AD-positive, supported by both drug labeling and literature. High daily dose has been identified as a contributing factor to ADRs (Chen et al. 2013; Alves et al. 2016; Stepan et al. 2011), and, as suggested in the literature, the cutoff daily dose was set to ≥ 100 mg. As a result, the authors constructed a library of structural alerts for reactive metabolite formation and examined the association of the alerts with drug-induced ADs. They obtained a 0.70 AUC and 0.67 balanced accuracy (Wu et al. 2021).
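
Since AUC values are quoted throughout this section, a minimal sketch of how such a value can be computed may be helpful. It uses the pairwise interpretation of the ROC AUC: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The scores and labels are hypothetical, and the cited studies may have used different software and tie handling.

```python
def roc_auc(scores, labels):
    """ROC AUC as the fraction of positive/negative pairs ranked correctly.

    A tie between a positive and a negative counts as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical model scores; label 1 marks a positive (e.g., AD-positive) drug.
scores = [0.90, 0.80, 0.35, 0.60, 0.40, 0.20]
labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(scores, labels)
print(f"AUC: {auc:.2f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value of 0.70 indicates that about seven out of ten positive/negative pairs are ordered correctly.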

2.3 Integral Computational Assessment of Xenobiotic Toxicity

There are many different approaches to estimate the toxicity of a drug-like compound, which can be used for an integrated assessment of its toxicity, taking into account the metabolites. They include modern machine learning and deep learning approaches, traditional (quantitative) structure–activity relationships ((Q)SAR) modeling, etc. (see, e.g., Hong et al. 2016; Li et al. 2018). Many of the general methods are described in the other chapters of this book. Here we present some particular computational approaches to assessing the toxicity of xenobiotics. For many years, such estimation was performed by experts using so-called toxophores or toxicity alerts (Williams 2006; Nelson 2001), which were also implemented in some computational tools. Examples include thiophenes and other sulfur-containing heterocycles (via S-oxidation), furans (via epoxidation), anilines (via N- or C-oxidation), nitrobenzenes (via nitro reduction), hydrazines (via oxidation to free radical species), and some carboxylic acid derivatives (via acyl glucuronide or acyl-coenzyme A thioester formation) (Baillie and Rettie 2011). Toxophore patterns can be a specific substructure, a combination of substructures, or a Markush structure with variable features such as R groups or atom lists (Yang et al. 2020). John Ashby first proposed the concept of structural alerts in 1985 in the context of structural analysis of chemical carcinogens (Ashby 1985). Structural alerts, as well as QSAR models, are still an effective method for the detection of potentially toxic chemicals (Arce et al. 1990; Moore and Harrington-Brock 2000; Macherey and Dansette 2008; Guengerich 2008; Guengerich 2021; Wu 2009; Lyubimov 2012; Ioannides and Lewis 2004; Ortiz de Montellano 2013; Chen et al. 1998; Hansten and Horn 2010). 
Figure 2.4 shows some well-studied structural alerts in environmental toxicology and drug discovery collected from the literature; some of these alerts are common to different endpoints. Commercially available software such as Derek (LHASA, Nexus) (Ridings et al. 1996), CASE Ultra (MultiCASE Inc.) (Saiakhov and Klopman 2008), and LeadScope (Roberts et al. 2020) are the best-known expert systems which

Fig. 2.4 Some examples of well-studied structural alerts for hepatotoxicity, mutagenicity, and skin sensitization. R means any substituent or a restricted substituent, and two Rs in the same structural alert are not necessarily the same. X means a halogen, including Cl, Br, and I. Adapted with permission from Yang et al. (2020). Copyright 2020 American Chemical Society

have collected many structural alerts for different endpoints. In addition, there is some freely available software such as ToxAlerts (Sushko et al. 2012) and ChemoTyper (Yang et al. 2015) which can also be used to identify structural alerts. ToxAlerts (http://ochem.eu/alerts) (Sushko et al. 2012) is an open Web-based platform for uploading and storing structural alerts published in the scientific literature in a structured manner. This tool includes a capability to virtually screen compound libraries against these alerts to flag toxic chemicals and compounds with potential ADRs. ChemoTyper (Yang et al. 2015) (https://chemotyper.org/) is a tool that allows for searching and highlighting chemical chemotypes (chemical substructures or subgraphs) in datasets of molecules.

Most alerts represent functional groups or substructures found in many compounds, both toxic and non-toxic, leading to predictions with high sensitivity but low specificity. Hewitt et al. (2013) developed new structural alerts for hepatotoxicity but also found that many alerts were likewise present in non-hepatotoxic drugs. Alves et al. (2016) employed structural alerts to analyze the difference between safe and unsafe drugs. Their dataset included thirteen withdrawn drugs and seven drugs currently on the market. They found that all 20 drugs contained at least one alert that categorized them as unsafe. These results show that it is impossible to distinguish the “safe” marketed drugs from the “unsafe” withdrawn drugs using only established toxicity alerts, providing yet another illustration of the weakness of structural alerts as reliable drug safety predictors. Therefore, it is necessary to combine structural alerts with machine learning methods. Researchers frequently create QSAR models for toxicity assessment (Chen et al. 2014; Li et al. 2014; Lei et al. 2016; Alves et al. 2016). Structural alerts are used to explain the models and demonstrate which fragments or functional groups may lead to specific toxicity (Yang et al. 2020). GUSAR software was developed to create QSAR/QSPR models based on appropriate training sets of chemical structures and quantitative endpoint values. It is based on QNA descriptors and a self-consistent regression algorithm (Filimonov et al. 2009). Predictions of acute toxicity for rats and mice (LD50 values for four routes of administration) are freely available on the website http://www.way2drug.com/gusar (Lagunin et al. 2011). Additionally, predictions for a set of 32 antitarget (off-target) activities, based on IC50 (half-maximal inhibitory concentration), Ki (inhibition constant), or Kact (activation constant) values, are available on the same website (Zakharov et al. 2012). 
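
The result of Alves et al. (2016) discussed above can be restated in terms of sensitivity and specificity. The sketch below treats the thirteen withdrawn drugs as the “unsafe” (positive) class and the seven marketed drugs as the “safe” (negative) class, and uses the fact that every one of the 20 drugs triggered at least one alert. The function name and the encoding of the data as boolean lists are illustrative choices.

```python
def sensitivity_specificity(flagged, unsafe):
    """Sensitivity and specificity of a binary alert flag against a binary label.

    Assumes that both classes are present, so neither denominator is zero.
    """
    pairs = list(zip(flagged, unsafe))
    true_pos = sum(1 for f, u in pairs if f and u)
    false_neg = sum(1 for f, u in pairs if not f and u)
    true_neg = sum(1 for f, u in pairs if not f and not u)
    false_pos = sum(1 for f, u in pairs if f and not u)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


unsafe = [True] * 13 + [False] * 7  # 13 withdrawn drugs, 7 marketed drugs
flagged = [True] * 20               # every drug contained at least one alert

sensitivity, specificity = sensitivity_specificity(flagged, unsafe)
balanced_accuracy = (sensitivity + specificity) / 2
print(sensitivity, specificity, balanced_accuracy)
```

All withdrawn drugs are flagged (sensitivity 1.0), but so are all marketed drugs (specificity 0.0), which gives a balanced accuracy of 0.5, the same as a classifier that labels everything as unsafe.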
It is now generally accepted that most pharmaceutical agents interact with several, sometimes many, biological targets (Muratov et al. 2020). This often generates beneficial therapeutic activities, but drugs can also interact with undesired molecular targets and cause adverse or toxic effects. Multi-target profiling of compounds has led to the concept of the biological activity spectrum (Lagunin et al. 2000; Filimonov et al. 2014), defined as the set of different biological activities resulting from the compound interacting with different biological systems. It therefore represents an “intrinsic” property of the compound that depends only on its chemical structure. One of the earliest approaches for multi-target modeling was the computer program Prediction of Activity Spectra for Substances (PASS), developed by Filimonov et al. almost 30 years ago (Filimonov et al. 2014). PASS estimates the probable biological activity profile of a drug-like organic compound from its structural formula. The estimation is based on the analysis of a training set of compounds with known biological activities, each given qualitatively as “active” or “inactive”. PASS employs a uniform set of MNA molecular descriptors and a naïve Bayes classifier to model structure–activity relationships across a wide variety of biological assays. Currently (PASS 2020 version), the entire list of activities includes 9942 activity types. The list of 1945 selected activities includes 21 toxic and adverse effects (e.g., carcinogenic, cytotoxic, embryotoxic, genotoxic, mutagenic,

sedative, ulceration, etc.), and 144 antitargets (e.g., agonist of various 5-hydroxytryptamine and acetylcholine receptors; thyroid hormone agonist; agonist/antagonist of different adrenoreceptors and histamine receptors; phosphodiesterase inhibitor; blocker of calcium, HERG, and sodium channels; inhibitor of various cytochromes, etc.). Thus, PASS provides opportunities to evaluate a wide range of toxic and adverse biological effects of drug-like compounds and their metabolites. The PASS prediction is freely available on the website http://www.way2drug.com/PASSOnline/. An integrated assessment of the toxicity of xenobiotics, taking into account their metabolites, should include an assessment of the toxicity of the mixture of the parent compound and its metabolites across a wide range of toxic effects, side effects, and possible actions on undesirable molecular targets. In addition to the generation of metabolites, MetaTox (Rudik et al. 2019) includes the optional calculation of integral toxicity. Dmitriev et al. (2017) compared four different methods of estimating the integral toxicity based on the prediction of rat acute intravenous toxicity of the parent compound and all its metabolites. The best correspondence between the predicted and published data was found for the method that combines the estimates for the parent compound and its most toxic metabolite as a harmonic mean of the estimated LD50 values. MetaPASS (Rudik et al. 2021) integrates the possibility of analyzing biological activity probability profiles. For any compound, an aggregated prediction of the biological activity profile can be obtained, taking into account the prediction for all its metabolites. Predicted biological activities are divided into seven categories: mechanisms of action, pharmacological effects, toxicity and side effects, antitargets, transporter-related interactions, gene expression regulation, and metabolic terms. 
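
The harmonic-mean combination attributed to Dmitriev et al. (2017) above can be sketched as follows. This is one reading of the description, not the MetaTox implementation: the function name, the fallback for compounds without predicted metabolites, and the numerical values are assumptions, and the original work may use different units or additional terms.

```python
from statistics import harmonic_mean


def integral_ld50(parent_ld50, metabolite_ld50s):
    """Harmonic mean of the parent LD50 and the LD50 of the most toxic metabolite.

    A lower LD50 means higher acute toxicity, so the most toxic metabolite is
    the one with the smallest estimated LD50.
    """
    if not metabolite_ld50s:
        return float(parent_ld50)  # assumption: no metabolites, use the parent value
    most_toxic = min(metabolite_ld50s)
    return harmonic_mean([parent_ld50, most_toxic])


# Hypothetical LD50 estimates in mg/kg.
parent = 300.0
metabolites = [600.0, 100.0, 450.0]

combined = integral_ld50(parent, metabolites)
print(f"Integral LD50 estimate: {combined:.1f} mg/kg")
```

With these values the estimate is 150 mg/kg, noticeably below the arithmetic mean of 200 mg/kg. The harmonic mean is dominated by the smaller of the two values, so a single highly toxic metabolite pulls the integral estimate toward its own LD50.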
For any activity, it is possible to predict its presence simultaneously for all chemical structures in the metabolic graph. This makes it possible to assess the contribution of the parent substance and its metabolites to the manifestation of the activity. The web application MetaPASS (http://www.way2drug.com/metapass) uses the PASS software to estimate biological activity spectra for the parent compounds and their metabolites. Two values (Pa and Pi) are calculated for each type of biological activity of a compound. Pa (probability “to be active”) and Pi (probability “to be inactive”) estimate the membership of the compound in the fuzzy classes of active and inactive compounds, respectively (see Filimonov et al. 2014). To take the metabolites of the parent compound into account, the Pmax value is calculated in addition to the initial estimates of biological activity (Pa and Pi); it is the maximal Pa value among all Pa values calculated for the compound and its metabolites. MetaPASS uses a color scheme for the prediction result. A brown cell is used when Pmax > Pa for the biological activity in that cell. A gray cell is used when the activity is not predicted for the parent compound (Pa < Pi) but is predicted for one of its metabolites (Pa > Pi). The main interface of the MetaPASS web application is composed of two parts: the metabolism graph is presented on the left side (arrow 1 in Fig. 2.5), and the aggregated prediction for the selected compound is on the right (arrow 2 in Fig. 2.5).
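
The Pmax aggregation and the color rules described above can be expressed in a few lines. The sketch below is an illustration, not MetaPASS code: the function name, the data layout, and the Pa/Pi values are hypothetical, and the choice to test the gray condition before the brown one is an assumption, because the text does not say which rule takes precedence when both apply.

```python
def aggregate_activity(parent, metabolites):
    """Return (Pmax, color) for one biological activity.

    `parent` and every item of `metabolites` are (Pa, Pi) pairs.
    """
    parent_pa, parent_pi = parent
    p_max = max([parent_pa] + [pa for pa, _ in metabolites])

    parent_predicted = parent_pa > parent_pi
    metabolite_predicted = any(pa > pi for pa, pi in metabolites)

    if not parent_predicted and metabolite_predicted:
        color = "gray"   # activity predicted only for a metabolite
    elif p_max > parent_pa:
        color = "brown"  # a metabolite has a higher Pa than the parent
    else:
        color = None     # the parent already carries the maximal Pa
    return p_max, color


# Hypothetical (Pa, Pi) values for three activities of one compound.
examples = {
    "activity A": ((0.20, 0.10), [(0.70, 0.02), (0.30, 0.10)]),
    "activity B": ((0.10, 0.30), [(0.60, 0.05)]),
    "activity C": ((0.80, 0.01), [(0.50, 0.05)]),
}
results = {name: aggregate_activity(p, m) for name, (p, m) in examples.items()}
for name, (p_max, color) in results.items():
    print(name, p_max, color)
```

Here activity A is shown in brown (a metabolite raises the estimate from 0.20 to 0.70), activity B in gray (only the metabolite is predicted to be active), and activity C is left unhighlighted.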

Fig. 2.5 Main MetaPASS web application interface. Brown is used to highlight biological activity for which Pmax > Pa. Reprinted with permission from Rudik et al. (2021). Copyright 2021 John Wiley and Sons

To view the metabolism graph of any of these compounds, the user should go to the section “Biotransformation” and then click the “Select Metabolism Graph” button (arrow 4 in Fig. 2.5). To view the prediction result for a particular activity, e.g., for “Antineoplastic (colorectal cancer)”, the user should enter the activity name in the corresponding field (arrow 6 in Fig. 2.5), set a threshold for Pa in the range from 0 to 1 (arrow 5 in Fig. 2.5), and press the “Search” button. The diagram in Fig. 2.6 shows the known activities and the corresponding calculated Pa (black) and Pmax (gray) values. As one may see, in all cases the probability estimates of the toxic effects are higher when the metabolism of fluorouracil is taken into account. For instance, the Pmax value for the mutagenic effect is about 0.7, while the Pa value (probability for the parent compound) is only 0.2. The MetaPASS web application, using the embedded or a user-created metabolism graph, will be beneficial for: (a) launching repurposed drugs; (b) optimizing the biological activity profile of new pharmaceutical agents; (c) assessing the safety of investigational drugs.

2.4 Future Directions in Xenobiotic Toxicity Assessment

To improve the quality of drug toxicity prediction by taking metabolism in the body into account, better structure-based and ligand-based computational approaches to both metabolite identification and toxicity estimation are necessary. The significant limitation of structure-based methods is the need for a three-dimensional structure of the target, determined mainly by X-ray analysis of macromolecular crystals. Many biotransformation enzymes are membrane proteins that are

Fig. 2.6 Prediction results for fluorouracil. Pa value shown in gray color, Pmax—in black color. Reprinted with permission from Rudik et al. (2021). Copyright 2021 John Wiley and Sons

poorly crystallized; thus, alternative experimental and computational methods must be developed. New prospects have been opened by cryo-electron microscopy, which was recognized with the Nobel Prize in Chemistry in 2017 (Danelius and Gonen 2021). This structural biology method may enable obtaining the 3D structures of proteins with high resolution and, in combination with AI-based image processing and interpretation, studying the cellular proteomes in situ (Bäuerlein and Baumeister 2021). Among the advanced computational methods of recent years is AlphaFold2, a machine learning algorithm for protein structure prediction that allows the user to obtain a hundred thousand protein models (Jumper et al. 2021). In the 14th Critical Assessment of protein Structure Prediction (CASP14) challenge, this method demonstrated accuracy comparable with experimental structures in most cases and significantly outperformed other in silico methods (Lupas et al. 2021). The application of this method to the enzymes involved in drug metabolism has not yet been described; however, the progress achieved for other proteins looks promising. Ligand-based methods require more reliable training sets and robust in silico models of structure–toxicity relationships. Jain et al. (2021) published a recent study aimed at collecting, curating, and integrating publicly available acute toxicity data for different species and at developing predictive models using the created training sets. As a result, information about over 80,000 compounds with data on 59 acute systemic toxicity endpoints became available for the scientific

community (https://cactus.nci.nih.gov/download/acute-toxicity-db). In addition, the recently developed multitask models are publicly available at https://github.com/ncats/ld50-multitask and https://predictor.ncats.io/. Further integration of these approaches with metabolism prediction may be the next step in integral toxicity assessment by in silico methods. In addition, we should keep in mind that xenobiotics are metabolized in the human body not only by human cells. The gut microbiota has both direct and indirect effects on drug metabolism, which can have consequences for efficacy and toxicity (Wilson and Nicholson 2017). Over 180 drugs are now recognized as substrates for gut bacterial enzymes and thus vulnerable to direct enzymatic transformation in vivo (McCoubrey et al. 2021; Hatton et al. 2019; Zimmermann et al. 2019). The microbiome may be a significant cause of variability in patients’ medication responses (Vinarov et al. 2021). Analysis of drug-microbiome interactions may guide research into the pharmacokinetic and toxicological profile of a drug and may also support repurposing a drug for microbiome medicine (Ghyselinck et al. 2021; Khan et al. 2021). A considerable amount of data on the microbiome is now accessible in online databases such as the NIH Human Microbiome Project Data Portal, MicrobiomeDB, and China National GeneBank (Chen et al. 2020) (https://db.cngb.org/about/). Sharma et al. (2017) reported a novel methodology that integrates chemoinformatics and machine learning methods to predict the metabolic enzyme and the corresponding bacterial species capable of metabolizing a given xenobiotic/drug molecule at the first step. The database of metabolic enzymes was constructed from 491 human gut bacterial genomes, which contained 324,697 metabolic enzymes assigned with EC numbers. For EC class, the random forest (RF) model displayed an accuracy of 97.19% in tenfold cross-validation. 
The authors of this approach have developed a web server, DrugBug (http://metagenomics.iiserb.ac.in/drugbug/), by implementing predictive RF modules and a similarity search module. Zimmermann et al. (2019) used a clustering algorithm to identify how drug structure can increase susceptibility to enzymatic transformation in the gut. They noted that the presence of lactone, urea, azo, and nitro functional groups increases the chance of bacterial metabolism. McCoubrey et al. (2021) have successfully developed a classification algorithm that can predict adverse drug effects on the growth of 40 gut bacterial strains. Currently, high-throughput ex vivo studies (Javdan et al. 2020; Zimmermann et al. 2019) or general observational microbiome studies (Everett et al. 2021; Huttenhower et al. 2012; Proctor et al. 2019) are the best data sources for machine learning methods. A few databases have also been created that collect data on disease-microbiome or drug-microbiome interactions in one place (Janssens et al. 2018; Sun et al. 2018; McCoubrey et al. 2021). The probable termination of drug development at late phase clinical trials due to unfavorable properties of drug candidates, such as unacceptable metabolism rate, the toxicity of metabolites, etc., stimulates the application of computer-aided methods

to predict these properties as early as possible at the stage of theoretical research and planning the synthesis of lead compounds (Bezhentsev et al. 2016). In silico methods help to prioritize in vitro and in vivo studies and to decrease the risk of failure for potential drugs. In silico methods for metabolism assessment are constantly improving, but biotransformation depends on many factors, including microbiota composition, genetic, phenotypic, and ontogenetic individual factors, which, ideally, should be considered all together.

Acknowledgements

This study was performed with the support of the Russian Science Foundation (Project No. 19-15-00396).

References

Alqahtani S (2017) In silico ADME-Tox modeling: progress and prospects. Expert Opin Drug Metab Toxicol 13(11):1147–1158. https://doi.org/10.1080/17425255.2017.1389897
Alves C, Borges R, Da Silva A (2006) Density functional theory study of metabolic derivatives of the oxidation of paracetamol. Int J Quantum Chem 106(13):2617–2623. https://doi.org/10.1002/qua.20992
Alves V, Muratov E, Capuzzi S, Politi R, Low Y, Braga R, Zakharov AV, Sedykh A, Mokshyna E, Farag S, Andrade C, Kuz’min V, Fourches D, Tropsha A (2016) Alarms about structural alerts. Green Chem 18(16):4348–4360. https://doi.org/10.1039/C6GC01492E
Arce GT, Vincent DR, Cunningham MJ, Choy WN, Sarrif AM (1990) In vitro and in vivo genotoxicity of 1,3-butadiene and metabolites. Environ Health Perspect 86:75. https://doi.org/10.1289/ehp.908675
Ashby J (1985) Fundamental structural alerts to potential carcinogenicity or noncarcinogenicity. Environ Mutagen 7(6):919–921. https://doi.org/10.1002/em.2860070613
Baillie TA, Rettie AE (2011) Role of biotransformation in drug-induced toxicity: influence of intra- and inter-species differences in drug metabolism. Drug Metab Pharmacokinet 26(1):15–29. https://doi.org/10.2133/dmpk.dmpk-10-rv-089
Bäuerlein FJB, Baumeister W (2021) Towards visual proteomics at high resolution. J Mol Biol 433(20):167187. https://doi.org/10.1016/j.jmb.2021.167187
Bezhentsev VM, Tarasova OA, Dmitriev AV, Rudik AV, Lagunin AA, Filimonov DA, Poroikov VV (2016) Computer-aided prediction of xenobiotic metabolism in humans. Russ Chem Rev 85:854–879. https://doi.org/10.1070/RCR4614
Boobis A, Gundert-Remy U, Kremers P, Macheras P, Pelkonen O (2002) In silico prediction of ADME and pharmacokinetics. Report of an expert meeting organised by COST B15. Eur J Pharm Sci 17:183. https://doi.org/10.1016/s0928-0987(02)00185-9
Borodina Yu, Rudik A, Filimonov D, Kharchevnikova N, Dmitriev A, Blinova V, Poroikov V (2004) A new statistical approach to predicting aromatic hydroxylation sites. Comparison with model-based approaches. J Chem Inf Comput Sci 44:1998. https://doi.org/10.1021/ci049834h
Brunk E, Sahoo S, Zielinski DC, Altunkaya A, Dräger A, Mih N, Gatto F, Nilsson A, Preciat Gonzalez GA, Aurich MK, Prlić A, Sastry A, Danielsdottir AD, Heinken A, Noronha A, Rose PW, Burley SK, Fleming R, Nielsen J, Thiele I, Palsson BO (2018) Recon3D enables a three-dimensional view of gene variation in human metabolism. Nat Biotechnol 36(3):272–281. https://doi.org/10.1038/nbt.4072
Bush BL, Sheridan RP (1993) PATTY: a programmable atom typer and language for automatic classification of atoms in molecular databases. J Chem Inf Comput Sci 33:756–762. https://doi.org/10.1021/ci00015a015

2 Assessment of the Xenobiotics Toxicity Taking into Account Their …

45

Carhart RE, Smith DH, Venkataraghavan R (1985) Atom pairs as molecular features in structureactivity studies: definition and applications. J Chem Inf Comput Sci 25:64–73. https://doi.org/ 10.1021/ci00046a002 Chen W, Koenigs LL, Thompson SJ, Peter RM, Rettie AE, Trager WF, Nelson SD (1998) Oxidation of acetaminophen to its toxic quinone imine and nontoxic catechol metabolites by baculovirusexpressed and purified human cytochromes P450 2E1 and 2A6. Chem Res Toxicol 11:295. https:// doi.org/10.1021/tx9701687 Chen M, Borlak J, Tong W (2013) High lipophilicity and high daily dose of oral medications are associated with significant risk for drug-induced liver injury. Hepatology (Baltimore, Md) 58(1):388–396. https://doi.org/10.1002/hep.26208 Chen YJ, Cheng FX, Sun L, Li WH, Liu GX, Tang Y (2014) Computational models to predict endocrinedisrupting chemical binding with androgen or oestrogen receptors. Ecotoxicol Environ Saf 110:280–287. https://doi.org/10.1016/j.ecoenv.2014.08.026 Chen FZ, You LJ, Yang F, Wang LN, Guo XQ, Gao F, Hua C, Tan C, Fang L, Shan RQ, Zeng WJ, Wang B, Wang R, Xu X, Wei XF (2020) CNGBdb: China National GeneBank DataBase. Yi Chuan 42(8):799–809. https://doi.org/10.16288/j.yczz.20-080. PMID: 32952115 Dang NL, Matlock MK, Hughes TB, Swamidass SJ (2020) The metabolic rainbow: deep learning phase I metabolism in five colors. J Chem Inf Model 60(3):1146–1164. https://doi.org/10.1021/ acs.jcim.9b00836 Danelius E, Gonen T (2021) Protein and small molecule structure determination by the cryo-EM method MicroED. Methods Mol Biol 2305:323–342. https://doi.org/10.1007/978-1-0716-14068_16 Darvas F (1987) Metabolexpert: an expert system for predicting metabolism of substances. In: Kaiser KLE (eds) QSAR in environmental toxicology-II. Springer, Dordrecht, pp 71–81. 
https:// doi.org/10.1007/978-94-009-3937-0_7 de Bruyn Kops C, Conrad Stork C, Šícho M, Kochev N, Svozil D, Jeliazkova N, Kirchmair J (2019) GLORY: generator of the structures of likely cytochrome P450 metabolites based on predicted sites of metabolism. Front Chem 7:402. https://doi.org/10.3389/fchem.2019.00402 Djoumbou-Feunang Y, Fiamoncini J, Gil-de-la-Fuente A, Greiner R, Manach C, Wishart DS (2019) BioTransformer: a comprehensive computational tool for small molecule metabolism prediction and metabolite identification. J Cheminform 11(1):2. https://doi.org/10.1186/s13321-018-0324-5 Dmitriev A, Rudik A, Filimonov D, Lagunin A, Pogodin P, Dubovskaja V, Bezhentsev V, Ivanov S, Druzhilovsky D, Tarasova O, Poroikov V (2017) Integral estimation of xenobiotics’ toxicity with regard to their metabolism in human organism. Pure Appl Chem 89(10):1449–1458. https:// doi.org/10.1515/pac-2016-1205 Ekins S, Andreyev S, Ryabov A, Kirillov E, Rakhmatulin EA, Bugrim A, Nikolskaya T (2005) Computational prediction of human drug metabolism. Expert Opin Drug Metab Toxicol 1(2):303– 324. https://doi.org/10.1517/17425255.1.2.303 Everett C, Li C, Wilkinson JE, Nguyen LH, McIver LJ, Ivey K, Izard J, Palacios N, Eliassen AH, Willett WC, Ascherio A, Sun Q, Tworoger SS, Chan AT, Garrett WS, Huttenhower C, Rimm EB, Song M (2021) Overview of the microbiome among nurses study (micro-N) as an example of prospective characterization of the microbiome within cohort studies. Nat Protoc 16:2724–2731. https://doi.org/10.1038/s41596-021-00519-z Fenner K, Gao J, Kramer S, Ellis L, Wackett L (2008) Data-driven extraction of relative reasoning rules to limit combinatorial explosion in biodegradation pathway prediction. Bioinformatics 24(18):2079–2085. https://doi.org/10.1093/bioinformatics/btn378 Filimonov D, Zakharov A, Lagunin A, Poroikov V (2009) QNA-based ‘Star Track’ QSAR approach. SAR QSAR Environ Res 20(7–8):679–709. 
https://doi.org/10.1080/10629360903438370 Filimonov DA, Lagunin AA, Gloriozova TA, Rudik AV, Druzhilovskii DS, Pogodin PV, Poroikov VV (2014) Prediction of the biological activity spectra of organic compounds using the PASS online web resource. Chem Heterocycl Comp 50(3):444–457. https://doi.org/10.1007/s10593014-1496-1

46

D. Filimonov et al.

Flynn NR, Dang NL, Ward MD, Swamidass SJ (2020) XenoNet: inference and likelihood of intermediate metabolite formation. J Chem Inf Model 60(7):3431–3449. https://doi.org/10.1021/acs. jcim.0c00361 Gao J, Ellis LBM, Wackett LP (2010) The university of Minnesota biocatalysis/biodegradation database: improving public access. Nucleic Acids Res 38:D488–D491. https://doi.org/10.1093/ nar/gkp771 Ghyselinck J, Verstrepen L, Moens F, Van Den Abbeele P, Bruggeman A, Said J, Smith B, Barker LA, Jordan C, Leta V, Chaudhuri KR, Basit AW, Gaisford S (2021) Influence of probiotic bacteria on gut microbiota composition and gut wall function in an in-vitro model in patients with Parkinson’s disease. Int J Pharm X 3:100087. https://doi.org/10.1016/j.ijpx.2021.100087 Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE (2017) Neural message passing for quantum chemistry. arXiv:1704.01212 Guengerich FP (2008) Cytochrome p450 and chemical toxicology. Chem Res Toxicol 21(1):70–83. https://doi.org/10.1021/tx700079z Guengerich FP (2021) A history of the roles of cytochrome P450 enzymes in the toxicity of drugs. Toxicol Res 37:1–23. https://doi.org/10.1007/s43188-020-00056-z Hansten PD, Horn JR (2010) The top 100 drug interactions: a guide to patient management. H&H Publications, Washington, p 171 Hatton GB, Madla CM, Rabbie SC, Basit AW (2019) Gut reaction: impact of systemic diseases on gastrointestinal physiology and drug absorption. Drug Discov Today 24(2):417–427. https://doi. org/10.1016/j.drudis.2018.11.009 Hewitt M, Enoch SJ, Madden JC, Przybylak KR, Cronin MT (2013) Hepatotoxicity: a scheme for generating chemical categories for read-across, structural alerts and insights into mechanism(s) of action. Crit Rev Toxicol 43(7):537–558. https://doi.org/10.3109/10408444.2013.811215 Hong H, Chen M, Ng HW, Tong W (2016) QSAR models at the US FDA/NCTR. In: Benfenati E (eds) In silico methods for predicting drug toxicity, methods in molecular biology vol 1425, pp 431–459. 
https://doi.org/10.1007/978-1-4939-3609-0_18 Hu J, Cai Y, Li W, Liu G, Tang Y (2020) In silico prediction of metabolic epoxidation for drug-like molecules via machine learning methods. Mol Inform 39(8):e1900178. https://doi.org/10.1002/ minf.201900178 Hughes TB, Miller GP, Swamidass SJ (2015) Modeling epoxidation of drug-like molecules with a deep machine learning network. ACS Cent Sci 1(4):168–180. https://doi.org/10.1021/acscentsci. 5b00131 Hughes TB, Dang NL, Miller GP, Swamidass SJ (2016) Modeling reactivity to biological macromolecules with a deep multitask network. ACS Cent Sci 2(8):529–537. https://doi.org/10.1021/ acscentsci.6b00162 Hughes TB, Dang NL, Kumar A, Flynn NR, Swamidass SJ (2020) Metabolic forest: predicting the diverse structures of drug metabolites. J Chem Inf Model 60(10):4702–4716. https://doi.org/10. 1021/acs.jcim.0c00360 Hughes TB, Flynn N, Dang NL, Swamidass SJ (2021) Modeling the bioactivation and subsequent reactivity of drugs. Chem Res Toxicol 34(2):584–600. https://doi.org/10.1021/acs.chemrestox. 0c00417 Huttenhower C, Gevers D, Knight R et al (2012) The human microbiome project, C. Structure, function and diversity of the healthy human microbiome. Nature 486:207–214. https://doi.org/ 10.1038/nature11234 Idle JR, Gonzalez FJ (2007) Metabolomics. Cell Metab 6(5):348–351. https://doi.org/10.1016/j. cmet.2007.10.005 Ioannides C, Lewis DF (2004) Cytochromes P450 in the bioactivation of chemicals. Curr Top Med Chem 4(16):1767–1788. https://doi.org/10.2174/1568026043387188 Jain S, Siramshetty VB, Alves VM, Muratov EN, Kleinstreuer N, Tropsha A, Nicklaus MC, Simeonov A, Zakharov AV (2021) Large-scale modeling of multispecies acute toxicity end points using consensus of multitask deep learning methods. J Chem Inf Model 61(2):653–663. https:// doi.org/10.1021/acs.jcim.0c01164

2 Assessment of the Xenobiotics Toxicity Taking into Account Their …

47

Janssens Y, Nielandt J, Bronselaer A, Debunne N, Verbeke F, Wynendaele E, Van Immerseel F, Vandewynckel YP, De Tré G, De Spiegeleer B (2018) Disbiome database: linking the microbiome to disease. BMC Microbiol 18(1):50. https://doi.org/10.1186/s12866-018-1197-5 Javdan B, Lopez JG, Chankhamjon P, Lee YJ, Hull R, Wu Q, Wang X, Chatterjee S, Donia MS (2020) Personalized mapping of drug metabolism by the human gut microbiome. Cell 181(7):1661–1679 (e1622). https://doi.org/10.1016/j.cell.2020.05.001 Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D (2021) Applying and improving AlphaFold at CASP14. Proteins 89(12):1711–1721. https://doi.org/10.1002/prot.26257 Khan S, Hauptman R, Kelly L (2021) Engineering the microbiome to prevent adverse events: challenges and opportunities. Annu Rev Pharmacol Toxicol 61(1):159–179. https://doi.org/10. 1146/annurev-pharmtox-031620-031509 Kim K, Jordan KD (1994) Comparison of density functional and MP2 calculations on the water monomer and dimer. J Phys Chem 98(40):10089–10094. https://doi.org/10.1021/j100091a024 Kirchmair J, Williamson MJ, Tyzack JD, Tan L, Bond PJ, Bender A, Glen RC (2012) Computational prediction of metabolism: sites, products, SAR, P450 enzyme dynamics, and mechanisms. J Chem Inf Model 52(3):617–648. https://doi.org/10.1021/ci200542m Kirchmair J, Goller AH, Lang D, Kunze J, Testa B, Wilson ID, Glen RC, Schneider G (2015) Predicting drug metabolism: experiment and/or computation? Nat Rev Drug Discov 14:387–404. https://doi.org/10.1038/nrd4581 Klopman G, Dimayuga M, Talafous J (1994) META. 1. A program for the evaluation of metabolic transformation of chemicals. 
J Chem Inf Comput Sci 34(6):1320–1325. https://doi.org/10.1021/ ci00022a014 Kulkarni SF, Zhu J, Blechinger S (2005) In silico techniques for the study and prediction of xenobiotic metabolism: a review. Xenobiotica 35:955–973. https://doi.org/10.1080/004982505003 54402 Lagunin A, Stepanchikova A, Filimonov D, Poroikov V (2000) PASS: prediction of activity spectra for biologically active substances. Bioinformatics 16(8):747–748. https://doi.org/10.1093/bioinf ormatics/16.8.747 Lagunin A, Zakharov A, Filimonov D, Poroikov V (2011) QSAR modelling of rat acute toxicity on the basis of pass prediction. Mol Inf 30(2–3):241–250. https://doi.org/10.1002/minf.201000151 Leach AR, Bradshaw J, Green DVS, Hann MM, Delany JJ (1999) Implementation of a system for reagent selection and library enumeration, profiling, and design. J Chem Inf Comput Sci 39(6):1161–1172. https://doi.org/10.1021/ci9904259 Lei T, Li Y, Song Y, Li D, Sun H, Hou T (2016) ADMET evaluation in drug discovery: 15. Accurate prediction of rat oral acute toxicity using relevance vector machine and consensus modeling. J Cheminform 8:6. https://doi.org/10.1186/s13321-016-0117-7 Li X, Chen L, Cheng F, Wu Z, Bian H, Xu C, Li W, Liu G, Shen X, Tang Y (2014) In silico prediction of chemical acute oral toxicity using multi-classification methods. J Chem Inf Model 54(4):1061–1069. https://doi.org/10.1021/ci5000467 Li Y, Idakwo G, Thangapandian S, Chen M, Hong H, Zhang C, Gong P (2018) Target-specific toxicity knowledgebase (TsTKb): a novel toolkit for in silico predictive toxicology. J Environ Sci Health Part C 36(4):219–236. https://doi.org/10.1080/10590501.2018.1537148 Litsa EE, Das P, Kavraki LE (2020) Prediction of drug metabolites using neural machine translation. Chem Sci 11(47):12777–12788. https://doi.org/10.1039/d0sc02639e Lounnas V, Ritschel T, Kelder J, McGuire R, Bywater RP, Foloppe N (2013) Current progress in structure-based rational drug design marks a new mindset in drug discovery. 
Comput Struct Biotechnol J 5(6):e201302011. https://doi.org/10.5936/csbj.201302011 Lupas AN, Pereira J, Alva V, Merino F, Coles M, Hartmann MD (2021) The breakthrough in protein structure prediction. Biochem J 478(10):1885–1890. https://doi.org/10.1042/BCJ20200963

48

D. Filimonov et al.

Lyubimov AV (ed) (2012) Encyclopedia of drug metabolism and interactions. Wiley, p 764. https:// doi.org/10.1002/9780470921920 Macherey A-C, Dansette PM (2008) Biotransformations leading to toxic metabolites: chemical aspect. In: Wermuth CG (ed) The practice of medicinal chemistry. Academic Press, Amsterdam, pp 674–696 Marchant CA, Briggs KA, Long A (2008) In silico tools for sharing data and knowledge on toxicity and metabolism: derek for windows, meteor, and vitic. Toxicol Mech Methods 18:177–187. https://doi.org/10.1080/15376510701857320 McCoubrey LE, Gaisford S, Orlu M, Basit AW (2021) Predicting drug-microbiome interactions with machine learning. Biotechnol Adv 107797. https://doi.org/10.1016/j.biotechadv.2021.107797 Mekenyan OG, Dimitrov SD, Pavlov TS, Veith GD (2004) A systematic approach to simulating metabolism in computational toxicology. I. The TIMES heuristic modelling framework. Curr Pharm Des 10(11):1273–1293. https://doi.org/10.2174/1381612043452596 Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, Félix E, Magariños MP, Mosquera JF, Mutowo P, Nowotka M, Gordillo-Marañón M, Hunter F, Junco L, Mugumbate G, RodriguezLopez M, Atkinson F, Bosc N, Radoux CJ, Segura-Cabrera A, Hersey A, Leach AR (2019) ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 47(D1):D930–D940. https://doi.org/10.1093/nar/gky1075 Moore MM, Harrington-Brock K (2000) Mutagenicity of trichloroethylene and its metabolites: implications for the risk assessment of trichloroethylene. Environ Health Perspect 108(Suppl. 2):215–223. https://doi.org/10.1289/ehp.00108s2215 Morgan HL (1965) The generation of a unique machine description for chemical structures-a technique developed at chemical abstracts service. J Chem Doc 5(2):107–113. https://doi.org/10. 
1021/c160017a018 Muratov EN, Bajorath J, Sheridan RP, Tetko IV, Filimonov D, Poroikov V, Oprea TI, Baskin II, Varnek A, Roitberg A, Isayev O, Curtarolo S, Fourches D, Cohen Y, Aspuru-Guzik A, Winkler DA, Agrafiotis D, Cherkasov A, Tropsha A (2020) QSAR without borders. Chem Soc Rev 49(11):3525–3564. https://doi.org/10.1039/d0cs00098a Naisbitt DJ, Williams DP, Pirmohamed M, Kitteringham NR, Park BK (2001) Reactive metabolites and their role in drug reactions. Curr Opin Allergy Clin Immunol 1(4):317–325. https://doi.org/ 10.1097/01.all.0000011033.64625.5a Nakata M, Shimazaki T (2017) PubChemQC project: a large-scale first-principles electronic structure database for data-driven chemistry. J Chem Inf Model 57(6):1300–1308. https://doi.org/10. 1021/acs.jcim.7b00083 Nelson SD (2001) Structure toxicity relationships—how useful are they in predicting toxicities of new drugs? In: Dansette PM et al (eds) Biological reactive intermediates VI. Advances in experimental medicine and biology, vol 500, pp 33–43. https://doi.org/10.1007/978-1-4615-066 7-6_4 Nilakantan R, Bauman N, Dixon JS, Venkataraghavan R (1987) Topological torsion: a new molecular descriptor for SAR applications. Comparison with other descriptors. J Chem Inf Comput Sci 27(2):82–85. https://doi.org/10.1021/ci00054a008 Olsen L, Oostenbrink C, Jorgensen FS (2015) Prediction of cytochrome P450 mediated metabolism. Adv Drug Deliv Rev 86:61–71. https://doi.org/10.1016/j.addr.2015.04.020 Ortiz de Montellano PR (2013) Cytochrome P450-activated prodrugs. Future Med Chem 5(2):213. https://doi.org/10.4155/fmc.12.197 Park BK, Boobis A, Clarke S, Goldring CEP, Jones D, Kenna JG, Lambert C, Laverty HG, Naisbitt DJ, Nelson S et al (2011) Managing the challenge of chemically reactive metabolites in drug development. Nat Rev Drug Discov 10(4):292–306. https://doi.org/10.1038/nrd3408 Parr RG, Yang W (1989) Density-functional theory of atoms and molecules. Oxford University Press, New York, p 333. 
https://doi.org/10.1002/qua.560470107, https://doi.org/10.1093/oso/978 0195092769.001.0001 Patterson AD, Gonzalez FJ, Idle JR (2010) Xenobiotic metabolism: a view through the metabolometer. Chem Res Toxicol 23(5):851–860. https://doi.org/10.1021/tx100020p

2 Assessment of the Xenobiotics Toxicity Taking into Account Their …

49

Peach ML, Zakharov AV, Liu R, Pugliese A, Tawa G, Wallqwist A, Nicklaus MC (2012) Computational tools and resources for metabolism-related property predictions. 1. Overview of publicly available (free and commercial) databases and software. Future Med Chem 4(15):1907. https:// doi.org/10.4155/fmc.12.150 Pornputtapong N, Nookaew I, Nielsen J (2015) Human metabolic atlas: an online resource for human metabolism. Database 2015:bav068. https://doi.org/10.1093/database/bav068 Proctor LM, Creasy HH, Fettweis JM, Lloyd-Price J, Mahurkar A, Zhou W, Buck GA, Snyder MP, Strauss JF, Weinstock GM, White O, Huttenhower C (2019) The integrative HMP (iHMP) research network consortium. The integrative human microbiome project. Nature 569:641–648. https://doi.org/10.1038/s41586-019-1238-8 Ridder L, Wagener M (2008) SyGMa: combining expert knowledge and empirical scoring in the prediction of metabolites. ChemMedChem 3(5):821–832. https://doi.org/10.1002/cmdc.200 700312 Ridings JE, Barratt MD, Cary R, Earnshaw CG, Eggington CE, Ellis MK, Judson PN, Langowski JJ, Marchant CA, Payne MP, Watson WP, Yih TD (1996) Computer prediction of possible toxic action from chemical structure: an update on the DEREK system. Toxicology 106(1–3):267–279. https://doi.org/10.1016/0300-483x(95)03190-q Roberts G, Myatt GJ, Johnson WP, Cross KP, Blower PE Jr (2000) LeadScope: software for exploring large sets of screening data. J Chem Inf Comput Sci 40(6):1302–1314. https://doi.org/10.1021/ ci0000631 Rudik AV, Dmitriev AV, Lagunin AA, Filimonov DA, Poroikov VV (2014) Metabolism site prediction based on xenobiotic structural formulas and PASS prediction algorithm. J Chem Inf Model 54(2):498–507. https://doi.org/10.1021/ci400472j Rudik AV, Dmitriev AV, Bezhentsev VM, Lagunin AA, Filimonov DA, Poroikov VV (2017) Prediction of metabolites of epoxidation reaction in MetaTox. SAR QSAR Environ Res 28(10):833–842. 
https://doi.org/10.1080/1062936X.2017.1399165 Rudik A, Bezhentsev V, Dmitriev A, Lagunin A, Filimonov D, Poroikov V (2019) Metatox—web application for generation of metabolic pathways and toxicity estimation. J Bioinform Comput Biol 17(1):1940001. https://doi.org/10.1142/S0219720019400018 Rudik A, Dmitriev A, Lagunin A, Filimonov D, Poroikov V (2021) MetaPASS: a web application for analyzing the biological activity spectrum of organic compounds taking into account their biotransformation. Mol Inform 40(4):e2000231. https://doi.org/10.1002/minf.202000231 Rydberg P, Gloriam DE, Zaretzki J, Breneman C, Olsen L (2010) SMARTCyp: A 2D method for prediction of cytochrome P450-mediated drug metabolism. ACS Med Chem Lett 1(3):96–100. https://doi.org/10.1021/ml100016x Rydberg P, Rostkowski M, Gloriam DE, Olsen L (2013) The contribution of atom accessibility to site of metabolism models for cytochromes P450. Mol Pharm 10(4):1216–1223. https://doi.org/ 10.1021/mp3005116 Rydberg P, Lonsdale R, Harvey J, Mulholland A, Olsen L (2014) Trends in predicted chemoselectivity of cytochrome P450 oxidation: B3LYP barrier heights for epoxidation and hydroxylation reactions. J Mol Graph Model 52:30–35. https://doi.org/10.1016/j.jmgm.2014.06.002 Sahoo S, Haraldsdóttir HS, Fleming RM, Thiele I (2015) Modeling the effects of commonly used drugs on human metabolism. FEBS J 282(2):297–317. https://doi.org/10.1111/febs.13128 Saiakhov RD, Klopman G (2008) MultiCASE expert systems and the REACH initiative. Toxicol Mech Methods 18(2–3):159–175. https://doi.org/10.1080/15376510701857460 Sarullo K, Matlock MK, Swamidass SJ (2020) Site-level bioactivity of small-molecules from deeplearned representations of quantum chemistry. J Phys Chem A 124(44):9194–9202. https://doi. org/10.1021/acs.jpca.0c06231 Sharma AK, Jaiswal SK, Chaudhary N, Sharma VK (2017) A novel approach for the prediction of species-specific biotransformation of xenobiotic/drug molecules by the human gut microbiota. Sci Rep 7(1):9751. 
https://doi.org/10.1038/s41598-017-10203-6

50

D. Filimonov et al.

Sheridan RP, Korzekwa KR, Torres RA, Walker MJ (2007) Empirical regioselectivity models for human cytochromes P450 3A4, 2D6, and 2C9. J Med Chem 50(14):3173–3184. https://doi.org/ 10.1021/jm0613471 Singh SB, Shen LQ, Walker MJ, Sheridan RP (2003) A model for predicting likely sites of CYP3A4mediated metabolism on drug-like molecules. J Med Chem 46(8):1330–1336. https://doi.org/10. 1021/jm020400s Sridhar J, Goyal N, Liu J, Foroozesh M (2017) Review of ligand specificity factors for CYP1A subfamily enzymes from molecular modeling studies reported to-date. Molecules 22(7):1143. https://doi.org/10.3390/molecules22071143 Stepan AF, Walker DP, Bauman J, Price DA, Baillie TA, Kalgutkar AS, Aleo MD (2011) Structural alert/reactive metabolite concept as applied in medicinal chemistry to mitigate the risk of idiosyncratic drug toxicity: a perspective based on the critical examination of trends in the top 200 drugs marketed in the United States. Chem Res Toxicol 24(9):1345–1410. https://doi.org/ 10.1021/tx200168d Stephens PJ, Devlin FJ, Chabalowski CF, Frisch MJ (1994) Ab initio calculation of vibrational absorption and circular dichroism spectra using density functional force fields. J Phys Chem 98(45):11623–11627. https://doi.org/10.1021/j100096a001 Sun Y-Z, Zhang D-H, Cai S-B, Ming Z, Li J-Q, Chen X (2018) MDAD: a special resource for microbe-drug associations. Front Cell Infect Microbiol 8:424. https://doi.org/10.3389/fcimb. 2018.00424 Sushko I, Salmina E, Potemkin VA, Poda G, Tetko IV (2012) ToxAlerts: a web server of structural alerts for toxic chemicals and compounds with potential adverse reactions. J Chem Inf Model 52(8):2310–2316. https://doi.org/10.1021/ci300245q Tarasova O, Rudik A, Dmitriev A, Lagunin A, Filimonov D, Poroikov V (2017) QNA-based prediction of sites of metabolism. Molecules 22(12):2123. https://doi.org/10.3390/molecules221 22123 Testa B, Jenner P (1976) Drug metabolism: chemical and biochemical aspects. 
Marcel Dekker, New York, p 500 Thompson R, Isin E, Ogese M, Mettetal J, Williams D (2016) Reactive metabolites: current and emerging risk and hazard assessments. Chem Res Toxicol 29(4):505–533. https://doi.org/10. 1021/acs.chemrestox.5b00410 Tian S, Cao X, Greiner R, Li C, Guo AC, Wishart DS (2021) CyProduct: a software tool for accurately predicting the byproducts of human cytochrome P450 metabolism. J Chem Inf Model 61(6):3128–3140. https://doi.org/10.1021/acs.jcim.1c00144 Todeschini R, Consonni V (2000) In: Mannhold R, Kubinyi H, Timmerman H (eds) Handbook of molecular descriptors. WILEY-VCH Verlag GmbH, Germany. https://doi.org/10.1002/978352 7613106 Tyzack JD, Kirchmair J (2019) Computational methods and tools to predict cytochrome P450 metabolism for drug discovery. Chem Biol Drug Des 93(4):377–386. https://doi.org/10.1111/ cbdd.13445 Tyzack JD, Mussa HY, Williamson MJ, Kirchmair J, Glen RC (2014) Cytochrome P450 site of metabolism prediction from 2D topological fingerprints using GPU accelerated probabilistic classifiers. J Cheminform 6:29. https://doi.org/10.1186/1758-2946-6-29 Veselovskii AV, Sobolev BN, Zharkova MS, Archakov AI (2010) Computer-based substrate specifity prediction for cytochrome P450. Biomed Khim 56(1):90–100. https://doi.org/10.18097/pbmc20 105601090 Vinarov Z, Abdallah M, Agundez JAG, Allegaert K, Basit AW, Braeckmans M, Ceulemans J, Corsetti M, Griffin BT, Grimm M, Keszthelyi D, Koziolek M, Madla CM, Matthys C, McCoubrey LE, Mitra A, Reppas C, Stappaerts J, Steenackers N, Trevaskis NL, Vanuytsel T, Vertzoni M, Weitschies W, Wilson C, Augustijns P (2021) Impact of gastrointestinal tract variability on oral drug absorption and pharmacokinetics: an UNGAP review. Eur J Pharm Sci 162:105812. https:// doi.org/10.1016/j.ejps.2021.105812

2 Assessment of the Xenobiotics Toxicity Taking into Account Their …

51

Wang J, Urban L, Bojanic D (2007) Maximising use of in vitro ADMET tools to predict in vivo bioavailability and safety. Expert Opin Drug Metab Toxicol 3(5):641–665. https://doi.org/10. 1517/17425255.3.5.641 Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28(1):31–36. https://doi.org/10.1021/ ci00057a005 Williams DP (2006) Toxicophores: investigations in drug safety. Toxicology 226(1):1–11. https:// doi.org/10.1016/j.tox.2006.05.101 Wilson ID, Nicholson JK (2017) Gut microbiome interactions with drug metabolism, efficacy, and toxicity. Transl Res 179:204–222. https://doi.org/10.1016/j.trsl.2016.08.002 Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, Sajed T, Johnson D, Li C, Sayeeda Z, Assempour N, Iynkkaran I, Liu Y, Maciejewski A, Gale N, Wilson A, Chin L, Cummings R, Le D, Pon A, Knox C, Wilson M (2018a) DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res 46(D1):D1074–D1082. https://doi.org/10.1093/nar/gkx1037 Wishart DS, Feunang YD, Marcu A, Guo AC, Liang K, Vázquez-Fresno R, Sajed T, Johnson D, Li C, Karu N, Sayeeda Z, Lo E, Assempour N, Berjanskii M, Singhal S, Arndt D, Liang Y, Badran H, Grant J, Serra-Cayuela A, Scalbert A (2018b) HMDB 4.0: the human metabolome database for 2018. Nucleic Acids Res 46(D1):D608–D617. https://doi.org/10.1093/nar/gkx1089 Wu K-M (2009) A new classification of prodrugs: regulatory perspectives. Pharmaceuticals 2(3):77– 81. https://doi.org/10.3390/ph2030077 Wu Y, Zhu J, Fu P, Tong W, Hong H, Chen M (2021) Machine learning for predicting risk of drug-induced autoimmune diseases by structural alerts and daily dose. Int J Environ Res Public Health 18(13):7139. 
https://doi.org/10.3390/ijerph18137139 Yang C, Tarkhov A, Marusczyk J, Bienfait B, Gasteiger J, Kleinoeder T, Magdziarz T, Sacher O, Schwab CH, Schwoebel J, Terfloth L, Arvidson K, Richard A, Worth A, Rathman J (2015) New publicly available chemical query language, CSRML, to support chemotype representations for application to data mining and modeling. J Chem Inf Model 55(3):510–528. https://doi.org/10. 1021/ci500667v Yang H, Lou C, Li W, Liu G, Tang Y (2020) Computational approaches to identify structural alerts and their applications in environmental toxicology and drug discovery. Chem Res Toxicol 33(6):1312–1322. https://doi.org/10.1021/acs.chemrestox.0c00006 Zakharov AV, Lagunin AA, Filimonov DA, Poroikov VV (2012) Quantitative prediction of antitarget interaction profiles for chemical compounds. Chem Res Toxicol 25(11):2378–2385. https://doi. org/10.1021/tx300247r Zaretzki J, Bergeron C, Rydberg P, Huang T-W, Bennett KP, Breneman CM (2011) RS-predictor: a new tool for predicting sites of cytochrome P450-mediated metabolism applied to CYP 3A4. J Chem Inf Model 51(7):1667–1689. https://doi.org/10.1021/ci2000488 Zaretzki J, Matlock M, Swamidass SJ (2013) XenoSite: accurately predicting CYP-mediated sites of metabolism with neural networks. J Chem Inf Model 53(12):3373–3383. https://doi.org/10. 1021/ci400518g Zhang J, Ji L, Liu W (2015) In silico prediction of cytochrome P450-mediated biotransformations of xenobiotics: a case study of epoxidation. Chem Res Toxicol 28(8):1522–1531. https://doi.org/ 10.1021/acs.chemrestox.5b00232 Zheng M, Luo X, Shen Q, Wang Y, Du Y, Zhu W, Jiang H (2009) Site of metabolism prediction for six biotransformations mediated by cytochromes P450. Bioinformatics 25(10):1251–1258. https://doi.org/10.1093/bioinformatics/btp140 Zimmermann M, Zimmermann-Kogadeeva M, Wegmann R, Goodman AL (2019) Mapping human microbiome drug metabolism by gut bacteria and their genes. Nature 570(7762):462–467. https:// doi.org/10.1038/s41586-019-1291-3

Chapter 3

Emerging Machine Learning Techniques in Predicting Adverse Drug Reactions

Yi Zhong, Shanshan Wang, Gaozheng Li, Ji Yang, Zuquan Weng, and Heng Luo

Y. Zhong · S. Wang · G. Li · J. Yang · Z. Weng: The Centre for Big Data Research in Burns and Trauma, College of Computer and Data Science, Fuzhou University, Fuzhou, Fujian, China
Z. Weng: College of Biological Science and Engineering, Fuzhou University, Fuzhou, Fujian, China
H. Luo (B): MetaNovas Biotech Inc, Millbrae, CA, USA

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. H. Hong (ed.), Machine Learning and Deep Learning in Computational Toxicology, Computational Methods in Engineering & the Sciences, https://doi.org/10.1007/978-3-031-20730-3_3

3.1 Introduction

Adverse drug reactions (ADRs) are one of the major causes of hospitalization and patient death. Moreover, ADRs are a leading cause of clinical trial failures and drug withdrawals (Takeda et al. 2017; Bennett et al. 2021). Thus, ADRs pose a significant risk to patient health (Bennett et al. 2021). Identifying and understanding potential ADRs early in drug development can improve drug safety, reduce costs and improve patient health outcomes. Although ADR assessments take place at various stages of the drug development process, the major stages for identifying ADRs are preclinical animal studies and clinical trials. However, most ADR reports, especially those on drug–drug interactions (DDIs), come from postmarket surveillance or social platform reporting. To intercept ADRs and DDIs earlier in the development process, computational frameworks including machine learning models have been developed.

The mechanisms of ADRs are complex and often idiosyncratic among individuals (Uetrecht 2007). Although more and more data are being generated and becoming available, the challenge of effectively extracting, processing and combining features from heterogeneous and complex data types from diverse resources remains unsolved (Lee and Chen 2021). With the recent paradigm shift from conventional machine learning models to representation learning models, the latest developments in the field have proved effective at extracting and processing features accurately for the systematic assessment of ADRs (Lee and Chen 2021). While conventional machine learning methods rely heavily on features engineered by experts (Zheng et al. 2019b), representation learning models such as end-to-end deep learning can extract features automatically to make accurate predictions. Additionally, representation learning can process large amounts of heterogeneous and complex data from diverse sources.

In this chapter, we summarize the resources, types of features and machine learning models that either have been used or have the potential to be used for ADR prediction, with an emphasis on the latest and nascent technology developments. We also discuss potential future directions for improving model robustness and scalability.

3.2 Feature Generation for Machine Learning

3.2.1 Structure-Based Features

Machine learning algorithms take inputs in the format of vectors or matrices; therefore, raw data are often transformed into a numerical representation, called features, that carries important underlying information for the purpose of prediction (Wu et al. 2018; Lavecchia 2019; Pattanaik and Coley 2020). For example, chemical structures can be transformed into two-dimensional (2D) molecular graphs as inputs for deep learning models (Lavecchia 2019). Figure 3.1 summarizes some common data types for ADR prediction and how the corresponding features are generated.

In the early stage of drug discovery, the information available for drug candidates is mainly their chemical structures (Dey et al. 2018). In order to predict drug properties and ADRs based on structure, an important step is to represent molecules as model-readable features (Dey et al. 2018), such as molecular descriptors (e.g., molecular weight and logP), chemical fingerprints (fixed-length vectors encoding molecular structures), SMILES in the format of text-based strings or one-hot encoding vectors, molecular graphs and molecular images (Fig. 3.1a).

Molecular descriptors are numerical physicochemical features extracted or calculated from the molecules, which are commonly used for structure–toxicity relationship (STR) modeling (Lémery et al. 2015; Istratoaie et al. 2018). Both one-dimensional (1D) descriptors and two-dimensional (2D) descriptors are useful for ADR prediction. 1D descriptors encode physicochemical properties such as molecular weight, logP and fragment counts, which do not represent the topology of molecules. 2D descriptors describe the topological information of molecules, i.e., how the atoms are connected. They are calculated from the molecular graphs and are insensitive to the renumbering of graph nodes (Isakwo et al. 2019). The two descriptor types are often used in combination. For example, Liu et al. (2020) integrated 1D and 2D descriptors and other drug information as multiple features to predict the hepatotoxicity of oral drugs.

3 Emerging Machine Learning Techniques in Predicting Adverse Drug …

Fig. 3.1 Data types and feature representations for ADR prediction. a Feature representations for molecular structures. A molecule can be represented as a text-based SMILES string, a one-hot encoding vector, descriptors, fingerprints, a molecular graph or an image (using acetaminophen as an example). b Feature representations for interaction and association data. The drug-ADR, drug–drug, drug-RNA and drug–protein associations can be converted into association matrices and similarity profiles as machine learning features. The similarity profiles include structural similarity profiles (SSP), target similarity profiles (TSP) and gene ontology similarity profiles (GSP). c Networks and graphs. The interaction and associations among drugs, proteins, ADRs and other entities can be presented as bipartite networks, tripartite networks, multi-scale interactome or knowledge graphs

Molecular fingerprints encode the structure of a molecule as a fixed-length vector. Typically, the vector is made up of binary digits (bits) representing whether particular substructures are present in the molecule or not (Cereto-Massagué et al. 2015).
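As a toy illustration of such a bit vector, the sketch below builds a key-based fingerprint in plain Python. The key list `KEYS`, the function `key_fingerprint` and the substructure labels are hypothetical names chosen for this example; it also assumes the substructures of each molecule have already been detected by a cheminformatics toolkit, which is not shown here.

```python
# Hypothetical substructure keys; real key sets such as MACCS are much larger.
KEYS = ["aromatic_ring", "carboxylic_acid", "amide", "phenol", "nitro"]


def key_fingerprint(substructures, keys=KEYS):
    """Return a fixed-length list of bits, one per key (1 = key is present)."""
    present = set(substructures)
    return [1 if key in present else 0 for key in keys]


# Assumption: these labels come from an upstream substructure search.
acetaminophen = ["aromatic_ring", "amide", "phenol"]
aspirin = ["aromatic_ring", "carboxylic_acid", "ester"]  # "ester" is not a key

fp_acetaminophen = key_fingerprint(acetaminophen)
fp_aspirin = key_fingerprint(aspirin)
print(fp_acetaminophen)
print(fp_aspirin)
```

Both molecules map to vectors of the same length regardless of how many substructures they contain, and any substructure that is not among the predefined keys is simply lost, which illustrates why a single fingerprint only partially covers a molecule.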


Y. Zhong et al.

For instance, the extended-connectivity fingerprints (ECFP) can recognize the presence of specific circular substructures around each atom by a refinement of the Morgan algorithm (Rogers and Hahn 2010). Other fingerprints, including the molecular access system (MACCS) (Durant et al. 2002) and PubChem fingerprints (Li et al. 2009), are also commonly used to represent molecules. Though descriptors or fingerprints can represent molecules, they only partially cover molecular properties, which may not be enough for predicting clinical endpoints (Duvenaud et al. 2015). One possible solution is to combine multiple descriptors or fingerprints into a high-dimensional multiple-fingerprint feature (MFF) vector (Sandfort et al. 2020). Additionally, with the help of deep neural network (DNN) algorithms, combined descriptors or fingerprints can be used for complex prediction tasks with improved performance (Xu et al. 2018). However, predefined and manually engineered descriptors or fingerprints still have limited adaptability when predicting challenging endpoints such as ADRs (Cadow et al. 2020).

SMILES codes are commonly used to describe molecular structures in the format of strings (Öztürk et al. 2016). SMILES strings describe structures according to specified rules or “grammar”, using symbols such as C, N and O for atoms and “-”, “=” and “#” for bonds (Öztürk et al. 2016; Subhasish and Mriganka 2020). Since SMILES strings are sequential and textual representations, natural language processing (NLP) algorithms can be used to learn the “grammar” and molecular information from them (Schwalbe-Koda and Gómez-Bombarelli 2020). SMILES strings are usually encoded by one-hot encoding or word embedding as inputs for NLP models (Chen et al. 2021a). One-hot encoding is a simple method to encode SMILES strings at the character level, as illustrated in Fig. 3.1a (using acetaminophen as an example). The rows of the converted matrix represent the characters in acetaminophen’s SMILES string, while the columns represent the unique characters in the SMILES string lexicon (Kwon and Yoon 2017). The other encoding approach, word embedding, was initially developed to convert words into learnable vectors so that related words are nearby in the vector space (Jaeger et al. 2018). Word embeddings can be applied directly to SMILES strings, converting each character into a learnable vector. Notably, most studies utilized this type of character-level SMILES representation as model inputs, while in the NLP field both word-level and character-level encodings were common (Chen et al. 2021a).

The challenge of using word-level embeddings on SMILES is tokenization, i.e., the conversion of SMILES strings into chemically meaningful substrings. Li et al. (2021) proposed an algorithm called SMILES Pair Encoding (SPE) to tokenize SMILES into substrings, which was shown to have superior performance on molecular generation and quantitative structure–activity relationship (QSAR) modeling. Huang et al. (2020a) utilized a data-driven chemical Sequential Pattern Mining (SPM) algorithm to find substructures in SMILES strings, which achieved state-of-the-art performance in the prediction of drug–drug interactions (DDI). Both methods extracted frequent substrings from the SMILES strings of a large chemical dataset and used them as tokens. However, since a molecule can be represented by more than one SMILES string, this may lead to inconsistent feature representations of the same molecule (Li and Fourches 2021). SMILES enumeration, as a data augmentation approach, may help to address this issue (Tetko et al. 2019).
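The character-level one-hot encoding described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the cited studies: the function name `one_hot_smiles` is hypothetical, the lexicon is built from the single input string rather than from a full dataset, and two-letter atoms such as Cl would need a proper tokenizer.

```python
def one_hot_smiles(smiles, vocabulary=None):
    """Encode a SMILES string as a list of one-hot rows, one row per character."""
    if vocabulary is None:
        # Assumption: the lexicon is just the unique characters of this string.
        vocabulary = sorted(set(smiles))
    index = {ch: i for i, ch in enumerate(vocabulary)}
    matrix = []
    for ch in smiles:
        row = [0] * len(vocabulary)
        row[index[ch]] = 1  # exactly one column is switched on per character
        matrix.append(row)
    return matrix, vocabulary


smiles = "CC(=O)Nc1ccc(O)cc1"  # one valid SMILES string for acetaminophen
matrix, vocabulary = one_hot_smiles(smiles)
print(len(matrix), "rows x", len(vocabulary), "columns")
for ch, row in zip(smiles, matrix):
    print(ch, row)
```

Each row corresponds to one character of the string and each column to one lexicon entry, matching the row and column convention described in the text.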


While SMILES representations have been successfully adopted in ADR prediction (Huang et al. 2020a; Kwon and Yoon 2017), recent studies have introduced molecular graphs for modeling, which convert a molecule into a graph where atoms are nodes and bonds are edges (Kwon et al. 2020). Given a molecular graph G = (V, E), V is a set of attributed nodes represented by node feature vectors of atom properties and E is a set of attributed edges represented by bond feature vectors of bond properties (Jiang et al. 2021; Chen et al. 2021b). Molecular graphs can then be used as the inputs for graph-based machine learning models.

Images are not a popular type of molecular representation. However, images can be direct inputs for computer vision models such as convolutional neural networks (CNNs) (Gu et al. 2018). Since many well-developed computer vision models are available for image recognition, molecular images may be used as features for ADR prediction. Recent studies have shown that models based on 2D molecule images outperformed those trained on molecular fingerprints for property prediction (Goh et al. 2017). Additionally, Dhami et al. (2019) developed a DDI prediction model based on molecular images and reported state-of-the-art performance on certain evaluation metrics.
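The molecular graph definition G = (V, E) given above can be made concrete with a small data structure. The sketch below is an illustration under simplifying assumptions: hydrogens are left implicit, the only atom feature is the atomic number, the only bond feature is the bond order, and the names `build_graph` and `ATOMIC_NUMBER` are hypothetical.

```python
ATOMIC_NUMBER = {"C": 6, "N": 7, "O": 8}


def build_graph(atoms, bonds):
    """Build node features V, edge features E and a neighbor list for a molecule."""
    nodes = [[ATOMIC_NUMBER[symbol]] for symbol in atoms]  # V: one vector per atom
    edges = {}                                             # E: (i, j) -> bond vector
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j, order in bonds:
        # Bonds are undirected, so each one is stored in both directions.
        edges[(i, j)] = [order]
        edges[(j, i)] = [order]
        neighbors[i].append(j)
        neighbors[j].append(i)
    return nodes, edges, neighbors


# Ethanol (SMILES "CCO"): heavy atoms C0-C1-O2 joined by two single bonds.
atoms = ["C", "C", "O"]
bonds = [(0, 1, 1.0), (1, 2, 1.0)]
nodes, edges, neighbors = build_graph(atoms, bonds)
print(nodes)
print(neighbors)
```

Graph-based models typically consume exactly these three ingredients: per-atom feature vectors, per-bond feature vectors and the connectivity between atoms.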

3.2.2 Interactions and Associations

In addition to chemical structures, the pharmacological effects of drugs, including the interactions and associations between drugs and biological molecules such as proteins and RNAs, can also be used as features. This is critical because unexpected drug-related interactions may lead to ADRs (Pahikkala et al. 2015; Bahar et al. 2017; Edwards and Aronson 2000). The knowledge of drug-related interactions and associations may come from experimental data, including in vitro and in vivo results, as well as clinical trials (Zhou et al. 2019). For example, DrugBank (Wishart et al. 2018) collected drug targets, carriers, transporters and enzymes, which can be converted to drug–protein interactions. The Connectivity Map (Lamb et al. 2006) and LINCS projects (Subramanian et al. 2017) collected a large number of transcriptomic profiles of cell lines treated or perturbed with drug molecules and other chemicals. Figure 3.1b shows a few types of drug-related associations, including drug–drug, drug–protein, drug-RNA and drug-ADR associations. As more data become available, drug-related knowledge is getting more complex, and how to combine and convert it into features remains a challenge.

Here, we summarize two types of non-graphical feature representations for these associations: (i) association matrices and (ii) drug similarity profiles (Fig. 3.1b). Drug-related association data can be converted into matrices of binary values, where ‘1’ and ‘0’ represent the presence and absence of an association, respectively, or of continuous values, which represent quantitative measurements of an association. In these matrices, the rows represent different drugs and the columns can be different drugs, genes, proteins, RNAs, ADRs or other types of entities. Matrix factorization models can make direct use of these matrices to infer new links (Luo et al. 2016; Yang et al. 2021). Furthermore, the matrices can serve as an intermediate step for calculating similarities among drugs with measures such as the Jaccard index, Euclidean distance and cosine similarity. The similarity profiles calculated from drug structures and related associations are useful when predicting pharmacological outcomes (Vilar et al. 2013). Ryu et al. (2018) proposed a method called DeepDDI that calculates structural similarity profiles (SSP) of drug structures and feeds them to a feed-forward deep neural network for DDI prediction. Lee et al. (2019) extended the SSP approach by including target similarity profiles (TSP) and Gene Ontology (GO) term similarity profiles (GSP) as inputs for DDI prediction and observed improved DDI classification performance.

As the number of data types and the complexity of the data keep increasing, heterogeneous graphs were developed to store and represent multi-dimensional drug-related data (Fig. 3.1c). A heterogeneous graph can be defined as a network that contains interconnected nodes of different relation types (Pio et al. 2018). The interconnected structure can be learned for new link prediction between nodes. Depending on the network complexity, we classify heterogeneous graph methods into bipartite networks, tripartite networks, multi-scale interactomes and knowledge graphs. Below are two typical graph definitions.

• Definition 1 Given a graph G = (V, E), V denotes the set of vertices (nodes, or entities) and E denotes the set of edges (relations) between vertices. The graph can be an undirected network or a directed network based on the relations, and may contain multiple types of nodes and edges (Li and Pi 2020).

Taking Fig. 3.1c as an example, the bipartite network contains two types of nodes, drugs and ADRs, with the associations between them as edges. In the tripartite network, proteins are added as intermediate nodes to bridge drugs and ADRs. The multi-scale interactome contains different types of nodes and edges relevant to ADR prediction; as its complexity increases, such a network is also called a knowledge graph.

• Definition 2 A directed knowledge graph can be defined as a set of triplets in the format of (head, relation, tail), denoted as (h, r, t), where the heads and tails are nodes and the relations are edges. The relation r denotes a relationship from node h to node t.

Similar to molecular graphs, knowledge graphs can be inputs for graph-based models such as graph convolutional networks (Zitnik et al. 2018), which will be introduced in the following sections.
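The (head, relation, tail) format of Definition 2 can be illustrated with a short sketch in plain Python. The entity and relation names (`drug_A`, `protein_P`, `binds` and so on) are placeholders invented for this example and do not come from any database; `index_by_head` and `two_hop_tails` are hypothetical helper names.

```python
from collections import defaultdict

# A directed knowledge graph stored as (head, relation, tail) triplets.
triplets = [
    ("drug_A", "binds", "protein_P"),
    ("protein_P", "associated_with", "adr_X"),
    ("drug_A", "causes", "adr_Y"),
    ("drug_B", "binds", "protein_P"),
]


def index_by_head(triplets):
    """Group outgoing (relation, tail) pairs by head node."""
    outgoing = defaultdict(list)
    for head, relation, tail in triplets:
        outgoing[head].append((relation, tail))
    return outgoing


def two_hop_tails(outgoing, head):
    """Return the nodes reachable from head in exactly two directed steps."""
    reached = set()
    for _, middle in outgoing.get(head, []):
        for _, tail in outgoing.get(middle, []):
            reached.add(tail)
    return reached


outgoing = index_by_head(triplets)
print(two_hop_tails(outgoing, "drug_B"))
```

In this toy graph, drug_B has no direct edge to any ADR, but the path through protein_P links it to adr_X, which is the kind of indirect evidence that graph-based models try to exploit for link prediction.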

3.2.3 Data Sources for Feature Generation

The recent development of high-throughput experimental technologies has enabled the generation of large-scale and high-dimensional data. Many freely available public databases contain drug and ADR-related data that can potentially be used for ADR prediction. Table 3.1 lists some example data sources that can be used for ADR relevant research. Some of these databases, such as DrugBank and PubChem, collect drug-related information that can be used for feature generation when predicting ADRs. On the other hand, FAERS and SIDER can be data sources for ADR labels. Note that there are many databases that collect, curate or process drug and ADR-related information, and this table is not an exhaustive list.

3.3 Conventional Methods for ADR Prediction

After data collection and feature preparation, machine learning algorithms need to be chosen to develop an ADR prediction model. Conventional machine learning algorithms usually take manually engineered features derived from the original data as inputs (Lavecchia 2019), such as molecular descriptors and chemical fingerprints calculated from the drug structures (Luo et al. 2011; Liu et al. 2020). The task for ADR prediction is mostly either a binary classification or a multi-label classification. Binary classification aims to identify whether any ADR is possibly linked to a given drug, while multi-label classification treats each individual ADR as a label, so that each drug may have multiple ADR labels (Luo et al. 2011; Liu et al. 2020; Nguyen et al. 2021).

Different types of features and classifiers have been used to identify potential drug-ADR associations. For example, Liu et al. (2020) collected features of oral drugs from multiple databases and trained two classifiers, logistic regression and random forest, to predict multiple endpoints of drug-induced liver injury (DILI). Additionally, based on the assumption that ADRs may be caused by unexpected drug interactions with off-targets (Luo et al. 2011, 2015; Yang et al. 2009, 2011), a group of researchers developed DRAR-CPI (Luo et al. 2011) and DDI-CPI (Luo et al. 2014) to identify potential ADRs and DDIs, respectively, based on computational binding profiles between drugs and targets. The prediction was made based on a similarity search and a trained logistic regression model. Zhang et al. (2015a) deployed multiple machine learning algorithms to predict DDIs by integrating multiple drug similarity features. Similarly, Shi et al. (2016) proposed a classification-based method to predict DDIs based on structure and side effect similarities. Since machine learning was introduced into this field, such examples have become numerous. However, the latest studies have moved toward combining multi-dimensional complex data with recently proposed approaches that are based on graphs and deep learning. The remainder of this chapter focuses on the latest research and emerging methods.
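The difference between the binary and the multi-label formulation described in this section shows up directly in how the training labels are encoded. The sketch below is only an illustration: the ADR names in `ADR_LABELS`, the drug names and the helper functions are hypothetical.

```python
# Hypothetical label set; real studies use hundreds or thousands of ADR terms.
ADR_LABELS = ["hepatotoxicity", "nausea", "rash"]


def binary_label(drug_adrs):
    """Binary task: 1 if the drug is linked to any ADR, otherwise 0."""
    return 1 if drug_adrs else 0


def multi_label_vector(drug_adrs, labels=ADR_LABELS):
    """Multi-label task: one 0/1 entry per individual ADR."""
    return [1 if adr in drug_adrs else 0 for adr in labels]


known_adrs = {
    "drug_A": {"nausea", "rash"},
    "drug_B": set(),
}

for drug, adrs in known_adrs.items():
    print(drug, binary_label(adrs), multi_label_vector(adrs))
```

A binary classifier is trained on the single value per drug, whereas a multi-label classifier is trained on the whole vector and can therefore report which specific ADRs are predicted.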

Table 3.1 Example data sources for ADR relevant research

| Database | Description | Website | References |
| --- | --- | --- | --- |
| AEOLUS | A curated and standardized ADR data source from FDA’s Adverse Event Reporting System (FAERS) | https://datadryad.org/stash/dataset/doi:10.5061/dryad.8q0s4 | (Banda et al. 2016) |
| Bio2RDF | A bioinformatics knowledge base that collected semantic relationships of biological entities | http://bio2rdf.org/ | (Belleau et al. 2008) |
| BindingDB | A database that contains experimental interactions between drugs and proteins | https://www.bindingdb.org/ | (Liu et al. 2007) |
| ChEMBL | A manually curated database of molecules with bioactivity data | https://www.ebi.ac.uk/chembl/ | (Gaulton et al. 2017) |
| Connectivity Map | Transcriptomic profiles of drug or chemical-treated cell lines | https://www.broadinstitute.org/connectivity-map-cmap | (Lamb et al. 2006) |
| CTD | Comparative toxicogenomics database for chemical–gene–disease networks | https://ctdbase.org/ | (Davis et al. 2009) |
| DrugBank | Comprehensive drug information database including drug-target annotations and drug–drug interactions | https://go.drugbank.com/ | (Wishart et al. 2018) |
| FAERS | A surveillance database of adverse drug reports from FDA | https://www.fda.gov/drugs/surveillance/questions-and-answers-fdas-adverse-event-reporting-system-faers | (Hoffman et al. 2014) |
| LINCS | A large collection of cellular gene expression and other data based on perturbing agents | https://lincsproject.org/ | (Subramanian et al. 2017) |
| OFFSIDES | Drug side effects extracted from FAERS | http://tatonettilab.org/offsides/ | (Tatonetti et al. 2012) |
| PubChem | A comprehensive chemical database including chemical structures, properties, activities and toxicity data | https://pubchem.ncbi.nlm.nih.gov/ | (Kim et al. 2021) |
| TWOSIDES | Drug–drug interactions mined from FAERS | http://tatonettilab.org/offsides/ | (Tatonetti et al. 2012) |
| SuperTarget | A data source for drug-target relationships | https://bioinformatics.charite.de/ | (Günther et al. 2007) |
| SIDER | A side effect database curated from drug labels | http://sideeffects.embl.de | (Kuhn et al. 2016) |
| Tox21 | A collection of drug toxicity experimental data | https://tox21.gov/ | (Tice et al. 2013) |

3.4 Emerging Methods for ADR Prediction

3.4.1 Molecule-Based Methods

In previous sections, we introduced methods to represent data and generate features that can meet the requirements of learning algorithms. For drug candidates in the early development stage, the molecular structure is often the only information guaranteed to be available. In this section, different emerging methods based on molecular structures are discussed (Fig. 3.2).

3.4.1.1 Vanilla Deep Neural Networks (DNN)

It is a challenge to extract meaningful features and generate predictive abstractions from complex data (Lavecchia 2019; Aguayo-Ortiz and Fernández-de Gortari 2016). As a potential solution, deep learning (DL) is a technology that uses multiple layers of nonlinear processing units, or neurons, to learn high-level abstractions of data (Lavecchia 2019). The basic structure of a deep neural network (DNN) contains an input layer, hidden layers and an output layer (Fig. 3.2a) (Lavecchia 2019). In a vanilla DNN, the hidden layers consist of fully connected neurons, which enable powerful learning capacity. As input data get more complex, conventional machine learning algorithms may reach a plateau in performance, while DNNs have the potential to achieve more accurate prediction. However, overfitting may occur when the network is trained to memorize the data. To control overfitting, dropout is widely used to randomly shut down some of the neurons during training. It was observed that using dropout improved the overall predictive performance (Mendenhall and Meiler 2016). Just like conventional machine learning models, vanilla DNNs often require engineered features as inputs, such as molecular descriptors and chemical fingerprints. Since DNNs are better at learning from complicated data, recent studies have trained them on multi-dimensional data, which has yielded improved prediction performance. For example, Xie et al. (2020) combined MACCS keys and ECFP fingerprints as features (2214 bits in total) to predict molecular properties.


Fig. 3.2 Deep learning models based on molecular structures. a A deep neural network that uses multiple layers of nonlinear processing units to extract high-level abstractions. b A convolutional neural network (CNN) based on SMILES that can process SMILES strings into features. c A recurrent neural network (RNN) based on SMILES input. d An illustration of a CNN model based on molecular images. e A graph convolutional neural network (GCN) that generates neural fingerprints based on molecular graphs

The DNNs outperformed conventional algorithms such as random forest. It was also shown that the combined fingerprints achieved better performance than single sets of fingerprints. Though vanilla DNNs are powerful, the number of parameters can become extremely large as the depth of the network increases, which may impair training and prediction performance. As a result, variants of DNNs were developed to enable better and more efficient learning, even from less engineered features.
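The dropout idea mentioned in this section, randomly shutting down neurons during training, can be sketched as follows. This is a minimal illustration in plain Python rather than a framework implementation; the rescaling of the surviving activations by 1 / (1 - drop_prob), often called inverted dropout, is a common convention assumed here and is not taken from the cited studies. The function name and the example values are hypothetical.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Zero each activation with probability drop_prob during training.

    Assumes 0 <= drop_prob < 1. Surviving activations are rescaled so that
    their expected value matches the activations used at prediction time.
    """
    if not training or drop_prob == 0.0:
        return list(activations)  # at prediction time every neuron is kept
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)  # fixed seed so the example is repeatable
hidden = [0.5, 1.0, 1.5, 2.0]
train_out = dropout(hidden, 0.5, rng, training=True)
eval_out = dropout(hidden, 0.5, rng, training=False)
print(train_out)
print(eval_out)
```

Because a different random subset of neurons is silenced at every training step, the network cannot rely on any single neuron to memorize the training data.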

3.4.1.2 Convolutional Neural Networks (CNNs)

As a commonly used variant of DNNs, convolutional neural networks (CNNs) have very important applications in the field of computer vision, including object recognition, image generation and segmentation (Li et al. 2021; Ciresan et al. 2011). Different from vanilla DNNs, CNNs are made of convolutional layers with a series of filters. The layers perform convolution operations to create a feature map for the input data, and then a pooling layer gathers the final features and feeds them into a fully connected layer for prediction (Ciresan et al. 2011). This specific structure is effective for the feature extraction of grid-structured data such as sequences and images (Pham et al. 2016; LeCun and Bengio 1995). Though recurrent neural networks (RNNs, introduced later) are more often used to process sequential data, several studies have applied CNNs in natural language processing (NLP) and showed comparable performance in certain tasks, especially sentence-level tasks such as sentiment analysis and text classification (Xue and Li 2018; Zhang et al. 2015b). Since the neurons within the same layer of a CNN can be updated in parallel, CNNs have advantages over RNNs in large-scale computations (Yin et al. 2017; Pham et al. 2016).

Kim (2014) first introduced CNNs into the NLP field by representing each sentence as an n × k matrix, where n denotes the number of characters in the sentence and k is the feature vector size of each character. The matrix was then used as the input of the convolutional operations. Despite its efficiency, the model has two weaknesses: (i) the inability to capture long-distance relationships between characters and (ii) the neglect of the positional information of characters. The reason can be attributed to the fixed stride size in the convolution layers. As the sliding window was limited to several consecutive characters at each step, it is hard for the model to learn relationships across distant characters. Fortunately, several studies have shown that increasing the receptive field, deepening the network and adding sequence positional information to the model are promising ways to capture long-distance relationships and improve the prediction performance (Wang et al. 2020a; Gehring et al. 2017).

CNNs can capture hidden features for molecules in the format of either SMILES strings or molecular images (Fig. 3.2b, d). For SMILES inputs, the network encodes each character by one-hot encoding or word embeddings. Then the inputs are fed to a CNN model to learn hidden representations for prediction tasks. For example, Kwon and Yoon (2017) developed a CNN model to predict chemical–chemical interactions (CCI) based on SMILES. The inputs for the model were the one-hot encoding matrices of two chemicals, which were fed into shared 1D-CNNs and fully connected layers to learn hidden representations and predict CCIs. Another study developed a CNN-based network using molecular images as inputs and showed that the molecular images contained informative features to predict DDIs (Dhami et al. 2019).
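To make the convolution and pooling steps on a SMILES input concrete, the sketch below slides a single filter over a one-hot encoded string and then applies global max pooling. It is a toy example in plain Python under stated assumptions: the three-character lexicon, the hand-set filter weights (a trained CNN would learn them) and the function names are all hypothetical.

```python
def conv1d(matrix, kernel):
    """Slide a filter over consecutive rows (stride 1, no padding) and apply ReLU."""
    width = len(kernel)
    feature_map = []
    for start in range(len(matrix) - width + 1):
        total = 0.0
        for offset in range(width):
            row = matrix[start + offset]
            weights = kernel[offset]
            total += sum(x * w for x, w in zip(row, weights))
        feature_map.append(max(0.0, total))  # ReLU nonlinearity
    return feature_map


def global_max_pool(feature_map):
    """Keep the strongest response, giving one value per filter."""
    return max(feature_map)


# Toy lexicon with columns ["C", "O", "="]; rows encode the string "CC=O".
one_hot = [
    [1, 0, 0],  # C
    [1, 0, 0],  # C
    [0, 0, 1],  # =
    [0, 1, 0],  # O
]
# A width-2 filter set by hand to respond to the pattern "=O".
carbonyl_filter = [
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]

feature_map = conv1d(one_hot, carbonyl_filter)
pooled = global_max_pool(feature_map)
print(feature_map, pooled)
```

The filter only ever sees two consecutive characters, which also illustrates the limited receptive field discussed above: a pattern whose parts are far apart in the string cannot be detected by this single layer.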

3.4.1.3 Recurrent Neural Networks (RNN)

Recurrent neural networks (RNNs) consist of a series of neurons that pass information sequentially from one to another, and were originally designed to extract information from sequential data (Fig. 3.2c). As researchers found that vanilla RNNs may lack the capability of handling long-term dependencies and suffer from vanishing and exploding gradients during training (Ribeiro et al. 2020), more complicated variants including long short-term memory networks (LSTM) (Malhotra et al. 2015) and the Gated Recurrent Unit (GRU) (Chung et al. 2014) were proposed to address the problems. Compared to vanilla RNNs, LSTM networks pass on cell states and control the information flow with structures called gates. An LSTM cell includes an input gate, a forget gate and an output gate. The input gate determines which values from the input should be used to update the cell state; the forget gate decides what information from previous values should be kept in the cell state; and the output gate computes the output values based on the hidden state and the cell state. As a simpler alternative, GRUs combine the input gates and forget gates into update gates and merge the cell states with the hidden states.

Bidirectional LSTM (BiLSTM) is another modification of the RNN architecture, which learns the sequential data in both directions for better accuracy (Graves et al. 2005). The BiLSTM network includes two LSTM layers of opposite directions, which can learn sequential information both forward and backward. A max pooling layer and fully connected layers are often attached to the output of the BiLSTM to generate a fixed-length output for final prediction (Lee and Chen 2021). In recent years, the concept of attention mechanisms has been introduced into neural networks (Komodakis and Zagoruyko 2017), which enables the networks to weight and adjust latent representations of the input data. Studies found that BiLSTM combined with attention (Att-BiLSTM) performed significantly better than BiLSTM alone, as the attention layer enabled better learning and captured important information when processing long sentences (Liu and Guo 2019). Additionally, word embeddings as learned textual representations were developed to help with natural language processing (NLP) (Wang et al. 2019), and showed superior predictive performance compared to hand-crafted features when using RNN classifiers (Lee and Chen 2021; Yepes 2017).

Studies have utilized a variety of RNNs and their variants to predict molecular properties and ADRs based on SMILES inputs. For example, Chakravarti and Alla (2019) utilized LSTM with the attention mechanism to predict molecular properties based on SMILES, which showed better performance compared to models based on molecular descriptors. Zheng et al. (2019a) developed a BiLSTM with self-attention mechanisms to predict drug toxicity. Since the attention layer has the ability to weight the inputs, substructures within the molecules were highlighted by the model as potential ADR structural alerts.
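The roles of the input, forget and output gates described earlier in this section can be followed step by step in a small sketch. To stay readable it uses scalar inputs and states and untrained, hand-picked parameters; the function `lstm_step`, the parameter layout and the example sequence are assumptions made for illustration, following the commonly used LSTM formulation rather than any specific cited model.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step with scalar input x, hidden state h and cell state c.

    params maps each gate name to (input weight, recurrent weight, bias).
    """
    def gate(name, activation):
        w_x, w_h, bias = params[name]
        return activation(w_x * x + w_h * h_prev + bias)

    i = gate("input", sigmoid)         # how much new information to write
    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # proposed new content
    c = f * c_prev + i * g             # updated cell state
    h = o * math.tanh(c)               # updated hidden state
    return h, c


# Hand-picked (not trained) parameters, for illustration only.
params = {
    "input": (0.5, 0.1, 0.0),
    "forget": (0.2, 0.1, 1.0),
    "output": (0.3, 0.1, 0.0),
    "candidate": (1.0, 0.5, 0.0),
}

h, c = 0.0, 0.0
for x in [1.0, 0.0, -1.0]:  # a toy input sequence
    h, c = lstm_step(x, h, c, params)
    print(round(h, 4), round(c, 4))
```

The cell state c is carried from step to step and is only modified through the forget and input gates, which is what lets an LSTM retain information over longer sequences than a vanilla RNN.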

3.4.1.4 Graph Convolutional Neural Networks (GCN)

Although SMILES strings can be converted into machine learning model inputs, it is preferable for models to learn from the molecular structures directly. However, this is challenging since molecular structures have various sizes and shapes, while neural networks such as DNNs and CNNs normally take fixed-size inputs. Therefore, researchers developed graph convolutional neural networks (GCN, also called GCNN) to take molecular graphs as inputs and process them into fixed-length vectors (Shen and Nicolaou 2019; Duvenaud et al. 2015). In a molecular graph, the atoms are considered as nodes and the bonds as edges. Each atom type and bond type within a molecule can be represented by corresponding feature vectors that represent atom-specific and bond-specific properties (Yu et al. 2021). Different from predefined fingerprints such as ECFP4 and MACCS keys, molecular structures in the format of molecular graphs can be converted into fixed-length and high-level features called “neural fingerprints” via convolutions on the graph (Fig. 3.2e) (Duvenaud et al. 2015). Compared to other machine learning models, GCNs have two advantages: (1) they are end-to-end models that can efficiently convert molecules of diverse sizes into lower-dimensional fixed-length fingerprints without manual feature engineering (Shen and Nicolaou 2019); and (2) by incorporating the attention mechanism, they are interpretable models that can associate molecular substructures with properties such as clinical endpoints (Duvenaud et al. 2015; Dey et al. 2018; Jiang et al. 2021).

As a variant of GCNs, graph attention networks (GATs) (Veličković et al. 2018) use the attention mechanism to weight neighbor features for aggregation instead of using the same weight during the convolution process. Studies showed that GATs achieved better performance than GCNs in some prediction tasks (Veličković et al. 2018; Wang et al. 2020b). One common disadvantage of GCNs and GATs is their limited network depth due to gradient vanishing and over-smoothing (Xiong et al. 2019; Huang et al. 2020b). To address this problem, attentive fingerprints were proposed, which use RNNs to aggregate both local and distant information and were shown to outperform GCNs and GATs in molecular property prediction (Xiong et al. 2019).

Recent studies have used GCNs and their derivatives for ADR or DDI prediction. Dey et al. (2018) modified vanilla GCNs by adding attention mechanisms and used a softmax function to transform and pool substructures into fixed-sized vectors. The model has an advantage in highlighting substructures within the drugs that may be associated with ADRs such as back pain. Xu et al. (2019) proposed a DDI framework called MR-GNN, which extracted chemical features of DDI pairs from multiple layers of substructures with different receptive fields. Since MR-GNN only considered local information of atoms and atom-level similarities, Wang et al. (2020c) further improved the algorithm and proposed GoGNN. GoGNN extracted features from both atom-level and molecular graph-level connections, which improved the prediction performance. Nyamabo et al. (2021) took advantage of GATs and the co-attention mechanism to construct a DDI prediction model, which enabled potential explanations of DDI-associated drug substructures.
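How a graph convolution turns molecules of different sizes into vectors of the same length can be sketched with a heavily simplified message-passing step. This is not the neural fingerprint algorithm of Duvenaud et al. (2015): it ignores bond features, uses sum aggregation followed by sum pooling, and takes a fixed identity weight matrix in place of trained weights. All names and the two-element atom encoding are assumptions for illustration.

```python
def graph_conv_fingerprint(node_features, neighbors, weights, num_layers=1):
    """Aggregate neighbor features, transform them, then sum-pool over atoms."""
    h = [list(features) for features in node_features]
    for _ in range(num_layers):
        updated = []
        for v, features in enumerate(h):
            # Message passing: add the feature vectors of all bonded atoms.
            aggregated = list(features)
            for u in neighbors[v]:
                aggregated = [a + b for a, b in zip(aggregated, h[u])]
            # Linear transform followed by ReLU.
            updated.append([max(0.0, sum(w * a for w, a in zip(row, aggregated)))
                            for row in weights])
        h = updated
    # Sum pooling gives a vector whose length does not depend on the atom count.
    return [sum(node[d] for node in h) for d in range(len(weights))]


# Atom features: one-hot over [carbon, oxygen]; hydrogens are left implicit.
C, O = [1, 0], [0, 1]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for trained weights

ethanol_nodes = [C, C, O]                     # C0-C1-O2
ethanol_neighbors = {0: [1], 1: [0, 2], 2: [1]}
methanol_nodes = [C, O]                       # C0-O1
methanol_neighbors = {0: [1], 1: [0]}

fp_ethanol = graph_conv_fingerprint(ethanol_nodes, ethanol_neighbors, IDENTITY)
fp_methanol = graph_conv_fingerprint(methanol_nodes, methanol_neighbors, IDENTITY)
print(fp_ethanol, fp_methanol)
```

Ethanol has three heavy atoms and methanol two, yet both are mapped to two-dimensional vectors, which is the property that lets graph-based models accept molecules of arbitrary size.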

3.4.2 Similarity-Based Methods

Drugs that share similarities in structures or interactions may have similar properties, for example, the possibility to induce unexpected outcomes such as diabetes, liver dysfunction, hepatotoxicity and/or depression (Deng et al. 2020; Ibrahim et al. 2021). As a result, similarity inference models were proposed to predict drug properties and ADRs based on drug similarity profiles. Structural similarity profiles (SSP) (Ryu et al. 2018) were defined as an i-dimensional raw vector filled with structural similarity scores, where i is the number of reference drugs. This vector of similarity values can serve as features for property prediction. Some studies generated a similarity matrix of i rows by i columns across a library of collected drug molecules to make predictions (Deng et al. 2020). There are several ways to compute drug similarities, including Jaccard similarity, cosine similarity and Gaussian-based similarity (Zhang et al. 2018). For example, when calculating drug similarities from binary structural fingerprints, Jaccard similarity is commonly used (Deng et al. 2020). Moreover, after generating similarity profiles, DNNs can be leveraged for in-depth training.

Compared to the structure-based methods, similarity-based methods can take different types of inputs in addition to molecular structures. Ryu et al. (2018) calculated 2159-dimensional SSPs based on ECFP4 fingerprints. The SSP dimensionality was then reduced to 50 via PCA, and DNNs were utilized to predict pharmacological effects of DDIs. Lee et al. (2019) extended this work by integrating more similarity profiles, including target similarity profiles (TSP) and gene ontology similarity profiles (GSP). They reduced the dimensions of these profiles via autoencoders instead of PCA. Zhang and Zang (2020) proposed a model named CNN-DDI that took similarity matrices of drug categories, targets, pathways and enzymes as inputs and used CNNs to predict DDIs. However, CNN-DDI has only four convolutional layers, which may not be sufficient to process the similarity features for DDI prediction. As an improvement, DDIMDL (Deng et al. 2020) was developed as a deeper CNN model inspired by the computer vision model VGG16 (Simonyan and Zisserman 2015). DDIMDL outperformed the existing state-of-the-art DDI prediction methods, which showed that a multi-modal deep learning framework is promising for DDI prediction.
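A structural similarity profile of the kind described in this section can be sketched with the Jaccard similarity on binary fingerprints. In this minimal example each fingerprint is stored as the set of indices of its "on" bits; the reference fingerprints, the query and the function names are made up for illustration, whereas the cited studies used ECFP4 fingerprints over thousands of reference drugs.

```python
def jaccard(bits_a, bits_b):
    """Jaccard similarity between two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_profile(query_bits, reference_fingerprints):
    """One similarity score per reference drug, in a fixed order."""
    return [jaccard(query_bits, reference) for reference in reference_fingerprints]


# Hypothetical fingerprints of three reference drugs.
reference_fingerprints = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
query = {1, 2, 3}

ssp = similarity_profile(query, reference_fingerprints)
print(ssp)
```

The profile has one entry per reference drug, so every query drug is described by a vector of the same length i, which can then be reduced in dimension and passed to a DNN as described above.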

3.4.3 Network- and Graph-Based Methods

The interactions and associations among biological entities can be represented as networks, which are valuable resources for understanding the biological processes and mechanisms of complex diseases and ADRs (Jin et al. 2021; Cheng et al. 2019). However, due to the complexity and variety of the data and structures, analyzing them remains difficult (de Anda-Jáuregui et al. 2019). Models that can process a comprehensive biological network require multiple layers to capture the complex relationships, including hierarchical relations and heterogeneous structures (de Anda-Jáuregui et al. 2019). Once developed, such models have been shown to be effective in ADR prediction (Kwak et al. 2020; Zhang et al. 2021). The following sections summarize different types of network-based inference models.

3.4.3.1 Non-embedding Inference Methods

Many network inference methods are based on embeddings, which encode nodes and edges within the network as low-dimensional vectors (Li and Pi 2020). The embeddings or vectors contain useful information from the original network and can be used for property and link prediction (Aguayo-Ortiz and Fernández-de Gortari 2016). As a simplification, non-embedding inference methods make use of the topology of networks without generating embeddings.

Random walk with restart (RWR) is a ranking algorithm that measures the closeness of two nodes in a network. It has been successfully used to identify drug–drug interactions and protein-ADR associations (Li and Patra 2010; Chen et al. 2016; Zhang et al. 2017). Taking protein-ADR association as an example, a simulated random walker starts from an ADR node or a set of ADR nodes in the network, visits a randomly chosen immediate neighbor (protein or ADR) at each step, and restarts from the seed node(s) with a certain probability (Chen et al. 2016; Li and Patra 2010). The visiting probability of each node in the network is then used for ranking and prediction (Chen et al. 2016). Liu et al. (2016) extended RWR by running two instances on a heterogeneous network, a drug-centered RWR and a disease-centered RWR. The average probability of the two RWRs was defined as the confidence score, which effectively balanced both entities in drug-disease associations. Although RWR can help to estimate associations, the probability of visiting each neighbor node is equal even if the nodes are of different types in a biological network (Ruiz et al. 2021). Thus, Ruiz et al. (2021) developed a biased random walk algorithm to capture the association of drugs treating diseases in the context of a multi-scale interactome. The authors defined a set of edge weights for different types of nodes. When the walker moved through the network, it chose the next neighbor according to the predefined weights.

As discussed in previous sections, an interaction or association network can also be represented as a matrix. Matrix factorization (MF) can be used to complete missing values and infer new values in such a matrix (Shtar et al. 2019; Zhou et al. 2019). MF learns latent features from connections by factorizing the matrix into lower-dimensional submatrices (Yue et al. 2020; Rohani et al. 2020). Various studies have successfully deployed MF in drug-related prediction. For example, Zhang et al. (2018) introduced manifold regularized matrix factorization for DDI prediction based on integrated drug interactions and drug-feature association matrices. Rohani et al. (2020) proposed integrated similarity-constrained matrix factorization (ISCMF) to predict DDIs from latent matrices decomposed from known DDIs.
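The random walk with restart procedure described in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five-node path network, the restart probability of 0.3 and the function name are hypothetical, and the update rule is the standard iterative form of RWR rather than the exact implementation of any cited study.

```python
import numpy as np


def random_walk_with_restart(adjacency, seeds, restart=0.3,
                             tol=1e-10, max_iter=1000):
    """Return the stationary visiting probability of every node."""
    adjacency = np.asarray(adjacency, dtype=float)
    column_sums = adjacency.sum(axis=0)
    # Column-normalise: column j holds the transition probabilities out of j.
    transition = adjacency / np.where(column_sums == 0, 1.0, column_sums)
    p0 = np.zeros(adjacency.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)          # walker starts on the seeds
    p = p0.copy()
    for _ in range(max_iter):
        # Move to a neighbour with prob. (1 - restart), else jump to a seed.
        p_next = (1.0 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Hypothetical path network 0 - 1 - 2 - 3 - 4; node 0 is the seed (e.g. an ADR).
toy_adjacency = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])

visiting_probability = random_walk_with_restart(toy_adjacency, seeds=[0])
ranking = np.argsort(-visiting_probability)     # closest nodes first
```

In this toy network the visiting probability decreases with distance from the seed, which is the property used to rank candidate associations.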

3.4.3.2 Random Walk with Embeddings

Though non-embedding inference methods are easy to use, they mostly focus on the topology of the connections and may fail to capture complex information within the network such as node- and edge-specific properties. Therefore, embedding-based inference methods were developed. Inspired by NLP, random walks can be used as a node sampling strategy to generate sequences, which can then be encoded as embeddings (Perozzi et al. 2014). For example, DeepWalk conducts truncated random walks on a network to obtain a set of node sequences (Perozzi et al. 2014). Since each node in a sequence is analogous to a word in a sentence, the skip-gram model (Mikolov et al. 2013) is used to learn the embedding of each node (Jin et al. 2021). DeepWalk is one of the early embedding-based algorithms that combined random walks with deep learning. Compared with DeepWalk, Node2vec is a more flexible random walk algorithm, which combines breadth-first and depth-first search strategies (Grover and Leskovec 2016). In addition, the time complexity of Node2vec scales linearly with the number of nodes in a network, which makes it suitable for learning large networks (Li and Pi 2020).

Several recent studies used Node2vec for network inference. For example, Zhao et al. (2020) proposed a compact feature learning framework for drug-target interaction prediction. They constructed a drug–protein network and used the Node2vec algorithm to learn the topology information. The topology embeddings were then combined with other information such as chemical structures to form the final embeddings. Chen et al. (2020b) constructed a more complex network including drugs, proteins, diseases, lncRNAs and miRNAs. Node2vec was used to learn the node embeddings. The drug embeddings were then fed to a random forest classifier for drug-target prediction, which performed better than models using traditional features such as chemical fingerprints.
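To make the node-sampling step concrete, the sketch below generates truncated random walks with the return parameter p and in-out parameter q of Node2vec; setting p = q = 1 gives uniform walks of the kind DeepWalk uses. The toy drug–protein network and all names are hypothetical, and the skip-gram training that would follow is deliberately omitted.

```python
import random


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Sample one walk of `length` nodes from an unweighted graph.

    `graph` maps each node to the set of its neighbours.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = sorted(graph[current])
        if not neighbours:
            break                                # dead end: stop early
        if len(walk) == 1:
            walk.append(rng.choice(neighbours))  # first step is uniform
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbours:
            if candidate == previous:            # step back to previous node
                weights.append(1.0 / p)
            elif candidate in graph[previous]:   # stay close (breadth-first)
                weights.append(1.0)
            else:                                # move away (depth-first)
                weights.append(1.0 / q)
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Hypothetical drug-protein network (illustration only).
toy_graph = {
    "drugA": {"prot1", "prot2"},
    "drugB": {"prot2"},
    "prot1": {"drugA"},
    "prot2": {"drugA", "drugB"},
}

rng = random.Random(0)
walks = [
    node2vec_walk(toy_graph, node, length=5, p=1.0, q=0.5, rng=rng)
    for node in sorted(toy_graph)
    for _ in range(2)                            # two walks per start node
]
```

Each walk is a "sentence" of node names; a skip-gram model trained on `walks` would then yield one embedding per node.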

3.4.3.3 Knowledge Graph and Translational Distance Models

A knowledge graph (KG) is usually a heterogeneous graph that contains different types of nodes (entities) and edges (relations between entities). In a biomedical knowledge graph, nodes can be biomedical entities such as drugs, genes and diseases, while edges are relations between them such as “treatment” and “binding” (Percha and Altman 2018; Zheng et al. 2021; Himmelstein et al. 2017). Knowledge graphs have been widely used in drug repurposing (Richardson et al. 2020), drug-target discovery (Mohamed et al. 2019) and ADR prediction (Bean et al. 2017). A simple schema of a biomedical knowledge graph is illustrated in Fig. 3.3a. The knowledge in a graph can be represented as triplets (head, relation, tail). For example, a drug (“A”) binding to a target (“X”) can be represented as (drug A, binds_to, target X). When a knowledge graph is used to make predictions, models usually convert the entities and relations into embeddings that represent their properties and connectivity. Scoring functions then calculate a score for each triplet (head, relation, tail) from its embeddings. A higher score indicates a higher probability that the triplet is true. Different types of models can be used to learn and optimize the embeddings, including translational distance models, semantic matching models and neural network models (Fig. 3.3a).

Fig. 3.3 Knowledge graph inference models. (a) The process of embedding learning and link prediction in a knowledge graph. In the example knowledge graph, blue nodes represent drugs and yellow nodes represent proteins. Different models can be used to learn embeddings from the knowledge graph and make link predictions. This figure shows how a model learns from the embedding of a triplet (drug A, synergy, drug C). (b) An illustration of the TransE model. (c) The workflow of a RESCAL model. (d) A step-by-step breakdown of the ConvE algorithm

For translational distance models, given a triplet (head, relation, tail), the model treats the relation as a translation from the head to the tail in the embedding space. For example, one translational distance model, TransE (Fig. 3.3b) (Bordes et al. 2013), assumes:

$$h + r \approx t \tag{3.1}$$

where h, r and t denote the embeddings of the head, the relation and the tail, respectively. For each triplet, the model optimizes the embeddings so that the two sides of Eq. (3.1) are as close as possible. However, since an entity may appear in more than one triplet and connect to different entities, the two sides cannot always be equal. The negative distance between them is defined as the scoring function:

$$S = -\lVert h + r - t \rVert_{1/2} \tag{3.2}$$

where the subscript 1/2 indicates that either the L1 or the L2 norm can be used.
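Equations (3.1) and (3.2) can be illustrated numerically. The sketch below scores a triplet that satisfies the translation assumption and a corrupted one, and adds the margin-based ranking loss commonly used to train TransE. The two-dimensional embeddings, the margin value and the function names are hypothetical values chosen for illustration only.

```python
import numpy as np


def transe_score(h, r, t, norm=1):
    """Eq. (3.2): negative L1 (norm=1) or L2 (norm=2) distance of h + r from t."""
    return -float(np.linalg.norm(h + r - t, ord=norm))


def margin_ranking_loss(positive_score, negative_score, margin=1.0):
    """Zero once the true triplet outscores the corrupted one by `margin`."""
    return max(0.0, margin - positive_score + negative_score)


# Hypothetical 2-dimensional embeddings (illustration only).
head = np.array([1.0, 2.0])         # e.g. drug A
relation = np.array([3.0, -1.0])    # e.g. binds_to
true_tail = np.array([4.0, 1.0])    # satisfies h + r = t exactly
wrong_tail = np.array([-5.0, 9.0])  # corrupted tail

score_true = transe_score(head, relation, true_tail)
score_wrong = transe_score(head, relation, wrong_tail)
loss = margin_ranking_loss(score_true, score_wrong)
```

The true triplet receives the maximum possible score of zero, the corrupted triplet a strongly negative one, so the loss vanishes and no further update would be needed for this pair.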

By optimizing the scoring function, the model learns the embeddings from the graph. However, the assumption of TransE is too simple to capture complicated connections among many different entities in the graph. Therefore, variants were proposed. For example, TransH (Wang et al. 2014) represents each relation with two vectors, the norm vector of a relation-specific hyperplane and the translation vector on that hyperplane; TransR (Lin et al. 2015) models entities and relations in different spaces, namely an entity space and multiple relation spaces; and TransD (Ji et al. 2015) uses two vectors to represent each entity and relation. All these variants use more embeddings to improve the representation of nodes and edges in the heterogeneous graph.

Studies have utilized translational distance models to learn embeddings from biological networks and make predictions. For example, Celebi et al. (2019) extracted drug-related knowledge from Bio2RDF (Callahan et al. 2013) to construct a knowledge graph, and used TransE and TransD to learn the embeddings of entities. The embeddings of a drug pair were then concatenated as input features for logistic regression, naive Bayes and random forest models to predict the DDI probability of the pair. Abdelaziz et al. (2017) used TransH and HolE (Nickel et al. 2016) to learn the embeddings of drugs in a knowledge graph and proposed a method called Tiresias to predict DDIs based on drug similarities from the learned embeddings and other resources such as chemical structures and drug targets. The study showed that adding the learned embeddings to the drug similarity calculation clearly improved prediction performance.

In addition to the above translational distance models, RotatE (Sun et al. 2018) and HAKE (Zhang et al. 2020) were proposed for learning more complex information in a graph. RotatE defines each relation as a rotation from the head to the tail in the complex vector space. It can model three types of relation patterns: symmetry (e.g., synergy between two drugs), inversion (e.g., hypernym and hyponym) and composition (e.g., a drug cures a disease by affecting a protein), which are common and important in a knowledge graph but not modeled well by previous methods (Sun et al. 2018). HAKE maps entities to a polar coordinate system that consists of a modulus part and a phase part, which represent the hierarchy across different levels and the differences within the same level, respectively. Therefore, it can better learn hierarchical information between entities (e.g., one disease is a subclass of another).
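The rotation idea behind RotatE can be checked with a small numerical sketch. Here each relation is a vector of phases, the head is rotated element-wise in the complex plane, and the score is the negative distance to the tail. The embeddings and phases are hypothetical; the example only illustrates why phases of 0 or π give a symmetric relation and why negating the phases gives the inverse relation.

```python
import numpy as np


def rotate_score(h, phase, t):
    """Negative L1 distance between the rotated head and the tail."""
    rotation = np.exp(1j * phase)          # unit-modulus complex numbers
    return -float(np.linalg.norm(h * rotation - t, ord=1))


# Hypothetical complex embedding of a head entity (illustration only).
head = np.array([1.0 + 2.0j, -0.5 + 0.5j])

# Symmetric relation (e.g. synergy): every phase is 0 or pi.
symmetric_phase = np.array([np.pi, 0.0])
tail_sym = head * np.exp(1j * symmetric_phase)
forward_sym = rotate_score(head, symmetric_phase, tail_sym)
backward_sym = rotate_score(tail_sym, symmetric_phase, head)

# Asymmetric relation: other phases; its inverse uses the negated phases.
asymmetric_phase = np.array([np.pi / 2, np.pi / 3])
tail_asym = head * np.exp(1j * asymmetric_phase)
forward_asym = rotate_score(head, asymmetric_phase, tail_asym)
backward_asym = rotate_score(tail_asym, asymmetric_phase, head)
inverse_asym = rotate_score(tail_asym, -asymmetric_phase, head)
```

For the symmetric relation both directions score (numerically) zero, whereas for the asymmetric relation only the forward direction and the explicit inverse relation do.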

3.4.3.4 Semantic Matching Models

Semantic matching models are based on learning similarities of knowledge representations in a knowledge graph. They assume that entities connected by the same relation have similar embeddings in the embedding space. A typical semantic matching model is RESCAL (Fig. 3.3c) (Nickel et al. 2011). It represents each entity as a vector and each relation as a matrix. The function below is used to calculate the triplet score:

$$f(h, t) = h^{\top} M_r t \tag{3.3}$$

In this formula, h and t denote the embeddings of the head and the tail, respectively, and $M_r$ denotes the matrix of the relation. Because RESCAL represents each relation as a full matrix and uses a bilinear function to calculate the triplet score, it has a high computational complexity. As a simplification, Yang et al. (2015) proposed the DistMult approach, which restricts the relation matrix to a diagonal matrix and thereby reduces the computational complexity. As a result, however, DistMult can only model symmetric relations. To solve this problem, Trouillon et al. (2016) proposed ComplEx, which represents entities and relations as complex vectors. Since the Hermitian product of two complex vectors depends on the input order, ComplEx can model both symmetric and asymmetric relations (Trouillon et al. 2016).

Semantic matching models have been widely used to learn the semantics of entities and relations within a knowledge graph. For example, Malone et al. (2018) found that the symmetric relationship modeling in DistMult was well suited to the study of polypharmacy side effects, since the drug–drug relationship is synergetic and symmetric. They used DistMult to learn representations of entities and relations in the knowledge graph compiled by Zitnik et al. (2018), which contains 4,649,411 DDIs and 11,501 drug-target interactions. Compared with Zitnik et al. (2018), DistMult improved the area under the receiver operating characteristic (AUROC) curve from 0.872 to 0.923 and the area under the precision-recall (AUPR) curve from 0.832 to 0.898 for the prediction of adverse side effects of polypharmacy. Paliwal et al. (2020) constructed a knowledge graph for target discovery that contained drugs, genes, diseases, pathways and a large number of asymmetric relations between them. The authors compared ComplEx with DistMult and found that ComplEx was better in knowledge representation (recall@200 for ComplEx and DistMult was 65.3% and 59.63%, respectively). Nováček and Mohamed (2020) proposed TriVec, based on ComplEx and DistMult, to predict the side effects of polypharmacy. In this study, each entity and relation was represented by three vectors instead of just one, which enabled the model to learn both asymmetric and symmetric relationships. Test results based on the data of Zitnik et al. (2018) showed that TriVec performed better than DistMult or ComplEx.
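The relationship between the three scoring functions discussed in this subsection can be verified with a short sketch. It shows that DistMult equals RESCAL with a diagonal relation matrix, that DistMult scores are unchanged when head and tail are swapped, and that ComplEx scores generally are not. All embeddings below are hypothetical toy values, and the function names are chosen for illustration.

```python
import numpy as np


def rescal_score(h, relation_matrix, t):
    """Eq. (3.3): bilinear score with a full relation matrix."""
    return float(h @ relation_matrix @ t)


def distmult_score(h, r, t):
    """RESCAL restricted to a diagonal relation matrix diag(r)."""
    return float(np.sum(h * r * t))


def complex_score(h, r, t):
    """Real part of the Hermitian product used by ComplEx."""
    return float(np.real(np.sum(h * r * np.conj(t))))


# Hypothetical real-valued embeddings (illustration only).
h_real = np.array([1.0, -2.0, 0.5])
t_real = np.array([0.3, 1.0, -1.5])
r_diag = np.array([2.0, 0.5, -1.0])

# Hypothetical complex-valued embeddings (illustration only).
h_cplx = np.array([1.0 + 1.0j, 2.0 - 1.0j])
t_cplx = np.array([-1.0 + 0.5j, 1.0 + 3.0j])
r_cplx = np.array([0.5 + 2.0j, -1.0 + 1.0j])

distmult_forward = distmult_score(h_real, r_diag, t_real)
distmult_backward = distmult_score(t_real, r_diag, h_real)
complex_forward = complex_score(h_cplx, r_cplx, t_cplx)
complex_backward = complex_score(t_cplx, r_cplx, h_cplx)
```

If the relation vector in ComplEx has no imaginary part, the score becomes symmetric again, which is how the model covers both kinds of relations.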

3.4.3.5 CNN and Its Derivatives

In addition to images and molecular graphs, convolutional neural networks (CNNs) can also be applied to knowledge graph inference. Dettmers et al. (2018) proposed ConvE, which leverages two-dimensional convolutions during knowledge graph representation learning. The convolution kernels extract interaction features from the head and relation embeddings to generate a series of feature maps, which are then flattened into a vector and mapped to the same dimension as the tail vector through fully connected layers for the calculation of a triplet score (Fig. 3.3d). The ConvE model was shown to perform better than ComplEx or DistMult on the benchmark dataset FB15K-237 (Toutanova and Chen 2015). Based on ConvE, Wang et al. (2020d) proposed a method to predict the side effects of polypharmacy. The authors swapped the positions of the relation embeddings and the tail entity embeddings and found this arrangement more suitable for DDI prediction than the original ConvE (Wang et al. 2020d). Jiang et al. (2019) pointed out that the convolution kernel used to combine the head and the relation matrices only captured the interaction information of two adjacent dimensions, which was insufficient. In response to this challenge, they proposed ConvR, which segments and reshapes the relation embeddings into several convolution kernels to strengthen the interactions. Tests on the WN18RR (Dettmers et al. 2018) and FB15K-237 datasets showed that the knowledge representation of ConvR was superior to that of ConvE.

Besides ConvE, graph convolutional neural networks (GCNs) can be used to learn embeddings of knowledge graphs in the same way as for molecular graphs. Nathani et al. (2019) utilized GCNs to learn knowledge representation in a knowledge graph, and then combined the trained embeddings with CNNs to compute the triplet score. Since GCNs focus on local information and are often too shallow to capture the global connectivity, DGCN (Zhuang and Ma 2018) was developed to extend GCNs with dual graph convolutional networks. It contains a graph adjacency matrix-based convolution and a positive pointwise mutual information (PPMI) matrix-based convolution to capture both local and global information. Another solution, Cluster-GCN, was proposed by Chiang et al. (2019) to efficiently train deep and large GCNs. It leverages a graph clustering algorithm to identify dense subgraphs associated with a set of sampled nodes and restricts the neighbor search to the subgraph. Cluster-GCN is capable of sampling meaningful substructures of the network with improved memory usage and computational efficiency. Zhang et al. (2019) proposed a heterogeneous graph neural network called HetGNN, which incorporates the random walk with restart (RWR) strategy to sample, for each node, a fixed-size set of strongly correlated heterogeneous neighbors and groups them according to their node types. Two deep GCN modules are then used to aggregate the feature information of the sampled nodes. HetGNN was observed to outperform graph attention networks (GAT) in many different link prediction tests. As an emerging technology for learning both molecular graphs and knowledge graphs, GCNs have been shown in a growing number of studies to extract information effectively from drug-related networks (Lin et al. 2020; Kwak et al. 2020; Jiang et al. 2020; Fu et al. 2020).
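The remark that GCNs focus on local information can be illustrated with a single graph convolution layer. The sketch below uses a commonly used propagation rule (symmetric normalization of the adjacency matrix with self-loops, followed by ReLU); this rule is an assumption here, since the text does not specify which GCN variant each cited study used. The four-node path graph and the identity features and weights are hypothetical.

```python
import numpy as np


def gcn_layer(adjacency, features, weight):
    """One graph convolution: normalised neighbourhood averaging plus ReLU."""
    a_hat = adjacency + np.eye(adjacency.shape[0])     # add self-loops
    degree = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(degree))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt           # symmetric normalisation
    return np.maximum(a_norm @ features @ weight, 0.0)  # ReLU activation


# Hypothetical path graph 0 - 1 - 2 - 3 (illustration only).
toy_adjacency = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
one_hot_features = np.eye(4)   # feature k marks "information from node k"
identity_weight = np.eye(4)    # untrained weights, so only propagation shows

after_one_layer = gcn_layer(toy_adjacency, one_hot_features, identity_weight)
after_two_layers = gcn_layer(toy_adjacency, after_one_layer, identity_weight)
```

After one layer, node 0 has received information only from itself and node 1; after two layers it also sees node 2, but still nothing from node 3. Reaching distant nodes therefore requires deeper stacks or global components such as the PPMI-based convolution of DGCN.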

3.4.3.6 Generative Models

Generative models were developed to automatically discover patterns in the input data and generate novel samples given certain conditional probabilities. They have been used to generate images, videos (Güera and Delp 2018), texts (Floridi and Chiriatti 2020) and even molecular structures (Marques et al. 2021). Recent studies introduced various deep generative frameworks into network embedding learning, such as autoencoders (Wang et al. 2016; Yang et al. 2019) and generative adversarial networks (GANs) (Goodfellow et al. 2014; Chen et al. 2020a). Structural deep network embedding (SDNE) (Wang et al. 2016) and deep neural networks for graph representations (DNGR) (Wang et al. 2016) are two examples based on autoencoders. Similar to LINE (Tang et al. 2015), SDNE optimizes the connection weights of nodes (first-order proximity) and the context similarities of nodes (second-order proximity) to learn both local and global network structures. Specifically, it uses a deep autoencoder to preserve high-order proximity (Li and Pi 2020). DNGR captures network structure information with a positive pointwise mutual information (PPMI) matrix and obtains the node embeddings with a denoising autoencoder. GraphGAN (Wang et al. 2018) is a network embedding method that inherits the structure of a GAN. A GAN generally includes two parts, a generator and a discriminator. The generator learns to generate a data distribution of interest from a latent space, while the discriminator tries to distinguish the generated data from the true distribution of inputs (Goodfellow et al. 2014). In GraphGAN, for a given vertex in the network, the generator tries to fit its true connectivity distribution over all vertices and generates fake connections to fool the discriminator, while the discriminator tries to distinguish the ground-truth vertices from the fake ones produced by the generator.

Generative models have been used to learn drug-related knowledge graph representations. For example, Zeng et al. (2020) developed deepDTnet for new target identification and drug repurposing, which uses DNGR to learn vector representations for both drugs and targets from an integrated heterogeneous drug–gene–disease network. Karimi et al. (2020) introduced hierarchical variational graph autoencoders to learn embeddings from gene–gene, gene–disease and disease–disease networks, which were then used to propose new drug combinations.
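Since the PPMI matrix appears in both DGCN and DNGR, a small sketch of how it can be computed from node co-occurrence counts may be helpful. The count matrix below is hypothetical (in practice it would be collected from random walks or random surfing over the network), and the function name is chosen for illustration; the denoising autoencoder that DNGR applies afterwards is not shown.

```python
import numpy as np


def ppmi_matrix(counts):
    """Positive pointwise mutual information of a co-occurrence count matrix."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row_sums = counts.sum(axis=1, keepdims=True)
    col_sums = counts.sum(axis=0, keepdims=True)
    expected = row_sums @ col_sums / total      # counts expected by chance
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0                # pairs never seen together
    return np.maximum(pmi, 0.0)                 # keep positive values only


# Hypothetical co-occurrence counts between three nodes (illustration only).
toy_counts = np.array([
    [0, 4, 0],
    [4, 0, 1],
    [0, 1, 0],
])

ppmi = ppmi_matrix(toy_counts)
```

Node pairs that co-occur more often than expected by chance receive positive entries, and all other pairs are set to zero, giving a sparse matrix that an autoencoder can compress into node embeddings.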

3.5 ADR Prediction Future Directions

Based on the current literature and the latest developments in emerging technologies, several future directions for identifying and preventing ADRs can be outlined:

• Big data integration. Biological relationship prediction is a nascent research field (Pan et al. 2021), and the future will see the integration of more large-scale data such as multi-omics. However, since useful information may be buried in noise, the trade-off between the quality and the quantity of data is likely to persist. Additionally, since different databases have diverse data structures, terms and contents (Pan et al. 2021), data integration, cleaning and quality control remain challenging. As a solution, machine learning may help to extract useful information from uncurated data or unstructured text (Thirumuruganathan et al. 2020) and to cross boundaries among data sources (Pan et al. 2021). By integrating different dimensions of sources, we may be able to make better and more confident predictions supported by multiple levels of evidence.
• Transfer learning. Though the amount of data is increasing rapidly, the quantity of high-quality labeled data may still be a bottleneck. For example, the public data available for some pharmacokinetics endpoints are still not comparable to the proprietary information held by large pharmaceutical companies in terms of data size and quality. Learning from the available small-scale data is a challenge, and one solution is to inherit the experience and knowledge from other similar datasets and models (transfer learning). For example, researchers can pretrain a GCN on a large number of molecules to learn structural representations before using it to predict molecular properties. However, the choice of input representations and models for pretraining remains a challenge (Yu et al. 2021).
• Negative sampling. While knowledge graphs are valuable learning sources for link prediction, true negative samples are not always available in the data. As a result, researchers often train models on randomly generated negatives. However, negative sampling is a key factor in prediction accuracy, as the quality of the generated samples directly affects the performance of downstream applications (Qian 2021; Kanojia et al. 2017). Since proper sampling strategies are necessary for good predictive performance, the reporting of true negative samples and findings remains important for the scientific community and publishers to consider.
• Model interpretability. As many deep learning models are considered black boxes, model interpretability is important for users to understand how a prediction is made. Models with attention mechanisms such as GATs can offer interpretability by weighting nodes or edges in a graph (Molnar 2020). The interpretability of models can also be improved by modifying the feature representation. For example, Huang et al. (2020a) used SMILES substrings instead of the original SMILES strings to predict DDIs. The model then weighted the substrings, which are meaningful substructures that can be used for structural explanations, thus providing interpretability.
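The negative sampling issue raised above can be made concrete with a sketch of the random strategy: each known triplet is corrupted by replacing its tail with another entity, and candidates that are already known positives are skipped. The triplets, entity names and function name are hypothetical. Note that the resulting samples are only presumed negatives, because an unobserved triplet may still be true, which is exactly why reported true negatives are valuable.

```python
import random


def corrupt_tails(triplets, tail_entities, n_negatives=1, rng=None):
    """Create presumed-negative triplets by swapping in a different tail."""
    rng = rng or random.Random()
    known = set(triplets)
    negatives = []
    for head, relation, tail in triplets:
        # Exclude the true tail and any tail that forms a known positive.
        candidates = [
            entity for entity in tail_entities
            if entity != tail and (head, relation, entity) not in known
        ]
        n_draws = min(n_negatives, len(candidates))
        for entity in rng.sample(candidates, n_draws):
            negatives.append((head, relation, entity))
    return negatives


# Hypothetical known triplets (illustration only).
positive_triplets = [
    ("drugA", "binds_to", "targetX"),
    ("drugB", "binds_to", "targetY"),
    ("drugA", "binds_to", "targetY"),
]
candidate_tails = ["targetX", "targetY", "targetZ"]

negative_triplets = corrupt_tails(
    positive_triplets, candidate_tails, n_negatives=1, rng=random.Random(0)
)
```

More careful strategies would weight the candidates, for example by entity type or by how plausible the corrupted triplet is, rather than sampling them uniformly.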

References

Abdelaziz I, Fokoue A, Hassanzadeh O, Zhang P, Sadoghi M (2017) Large-scale structural and textual similarity-based mining of knowledge graph to predict drug–drug interactions. J Web Semant 44:104–117 Aguayo-Ortiz R, Fernández-de Gortari E (2016) Overview of computer-aided drug design for epigenetic targets. Epi-informatics. Elsevier, Cambridge, pp 21–52 Bahar MA, Setiawan D, Hak E, Wilffert B (2017) Pharmacogenetics of drug–drug interaction and drug–drug–gene interaction: a systematic review on CYP2C9, CYP2C19 and CYP2D6. Pharmacogenomics 18(7):701–739 Banda JM, Evans L, Vanguri RS, Tatonetti NP, Ryan PB, Shah NH (2016) A curated and standardized adverse drug event resource to accelerate drug safety research. Sci Data 3(1):1–11 Bean DM, Wu H, Iqbal E, Dzahini O, Ibrahim ZM, Broadbent M, Stewart R, Dobson RJ (2017) Knowledge graph prediction of unknown adverse drug reactions and validation in electronic health records. Sci Rep 7(1):1–11 Belleau F, Nolin M-A, Tourigny N, Rigault P, Morissette J (2008) Bio2RDF: towards a mashup to build bioinformatics knowledge systems. J Biomed Inform 41(5):706–716 Bennett CL, Hoque S, Olivieri N, Taylor MA, Aboulafia D, Lubaczewski C, Bennett AC, Vemula J, Schooley B, Witherspoon BJ (2021) Consequences to patients, clinicians, and manufacturers when very serious adverse drug reactions are identified (1997–2019): a qualitative analysis from the Southern network on adverse reactions (SONAR). EClinicalMedicine 31:100693

Bordes A, Usunier N, Garcia-Durán A, Weston J, Yakhnenko O (2013) Translating embeddings for modeling multi-relational data. In: Proceedings of the 26th international conference on neural information processing systems, vol 2. Curran Associates Inc., Red Hook, pp 2787–2795 Cadow J, Born J, Manica M, Oskooei A, Rodríguez Martínez M (2020) PaccMann: a web service for interpretable anticancer compound sensitivity prediction. Nucl Acids Res 48(W1):W502–W508 Callahan A, Cruz-Toledo J, Ansell P, Dumontier M (2013) Bio2RDF release 2: improved coverage, interoperability and provenance of life science linked data. The semantic web: semantics and big data. Springer, Berlin, pp 200–212 Celebi R, Uyar H, Yasar E, Gumus O, Dikenelli O, Dumontier M (2019) Evaluation of knowledge graph embedding approaches for drug–drug interaction prediction in realistic settings. BMC Bioinform 20(1):1–14 Cereto-Massagué A, Ojeda MJ, Valls C, Mulero M, Garcia-Vallvé S, Pujadas G (2015) Molecular fingerprint similarity search in virtual screening. Methods 71:58–63 Chakravarti SK, Alla SRM (2019) Descriptor free QSAR modeling using deep learning with long short-term memory neural networks. Front Artif Intell 2:17 Chen X, Shi H, Yang F, Yang L, Lv Y, Wang S, Dai E, Sun D, Jiang W (2016) Large-scale identification of adverse drug reaction-related proteins through a random walk model. Sci Rep 6(1):1–10 Chen J, Zhang D, Lin X (2020a) Adaptive adversarial attack on graph embedding via GAN. In: International symposium on security and privacy in social networks and big data. Springer, Singapore, pp 72–84 Chen Z-H, You Z-H, Guo Z-H, Yi H-C, Luo G-X, Wang Y-B (2020b) Predicting drug-target interactions by Node2vec node embedding in molecular associations network. International conference on intelligent computing. Springer, Cham, pp 348–358 Chen G, Tao L, Li Y (2021a) Predicting polymers’ glass transition temperature by a chemical language processing model. 
Polymers 13(11):1898 Chen J, Zheng S, Song Y, Rao J, Yang Y (2021b) Learning attributed graph representation with communicative message passing transformer. In: Proceedings of the thirtieth international joint conference on artificial intelligence, pp 2242–2248 Cheng F, Kovács IA, Barabási A-L (2019) Network-based prediction of drug combinations. Nat Commun 10(1):1–11 Chiang W-L, Liu X, Si S, Li Y, Bengio S, Hsieh C-J (2019) Cluster-GCN: an efficient algorithm for training deep and large graph convolutional networks. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining, pp 257–266 Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. In: NIPS 2014 workshop on deep learning, pp 1–9 Ciresan DC, Meier U, Masci J, Gambardella LM, Schmidhuber J (2011) Flexible, high performance convolutional neural networks for image classification. In: Twenty-second international joint conference on artificial intelligence, pp 1237–1242 Davis AP, Murphy CG, Saraceni-Richards CA, Rosenstein MC, Wiegers TC, Mattingly CJ (2009) Comparative toxicogenomics database: a knowledgebase and discovery tool for chemical–gene– disease networks. Nucl Acids Res 37(suppl_1):D786–D792 de Anda-Jáuregui G, Guo K, Hur J (2019) Network-based assessment of adverse drug reaction risk in polypharmacy using high-throughput screening data. Int J Mol Sci 20(2):386 Deng Y, Xu X, Qiu Y, Xia J, Zhang W, Liu S (2020) A multimodal deep learning framework for predicting drug–drug interaction events. Bioinformatics 36(15):4316–4322 Dettmers T, Minervini P, Stenetorp P, Riedel S (2018) Convolutional 2D knowledge graph embeddings. In: Proceedings of the AAAI conference on artificial intelligence, vol 32(1), pp 1–9 Dey S, Luo H, Fokoue A, Hu J, Zhang P (2018) Predicting adverse drug reactions through interpretable deep learning framework. BMC Bioinform 19(21):1–13

Dhami DS, Yan S, Kunapuli G, Page D, Natarajan S (2019) Beyond textual data: predicting drug–drug interactions from molecular structure images using Siamese neural networks. arXiv: 191106356 Durant JL, Leland BA, Henry DR, Nourse JG (2002) Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comput Sci 42(6):1273–1280 Duvenaud DK, Maclaurin D, Iparraguirre J, Bombarell R, Hirzel T, Aspuru-Guzik A, Adams RP (2015) Convolutional networks on graphs for learning molecular fingerprints. Adv Neural Inf Process Syst 28:2224–2232 Edwards IR, Aronson JK (2000) Adverse drug reactions: definitions, diagnosis, and management. Lancet 356(9237):1255–1259 Floridi L, Chiriatti M (2020) GPT-3: its nature, scope, limits, and consequences. Mind Mach 30(4):681–694 Fu S, Liu W, Tao D, Zhou Y, Nie L (2020) HesGCN: Hessian graph convolutional networks for semi-supervised classification. Inf Sci 514:484–498 Gaulton A, Hersey A, Nowotka M, Bento AP, Chambers J, Mendez D, Mutowo P, Atkinson F, Bellis LJ, Cibrian-Uhalte E, Davies M, Dedman N, Karlsson A, Magarinos MP, Overington JP, Papadatos G, Smit I, Leach AR (2017) The ChEMBL database in 2017. Nucl Acids Res 45(D1):D945–D954 Gehring J, Auli M, Grangier D, Yarats D, Dauphin YN (2017) Convolutional sequence to sequence learning. In: International conference on machine learning, pp 1243–1252 Goh GB, Siegel C, Vishnu A, Hodas NO, Baker N (2017) Chemception: a deep neural network with minimal chemistry knowledge matches the performance of expert-developed QSAR/QSPR models. arXiv:170606689 Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial networks. Adv Neural Inf Process Syst 3:2672–2680 Graves A, Fernández S, Schmidhuber J (2005) Bidirectional LSTM networks for improved phoneme classification and recognition. International conference on artificial neural networks. 
Springer, Berlin, pp 799–804 Grover A, Leskovec J (2016) Node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 855–864 Gu J, Wang Z, Kuen J, Ma L, Shahroudy A, Shuai B, Liu T, Wang X, Wang G, Cai J (2018) Recent advances in convolutional neural networks. Pattern Recogn 77:354–377 Güera D, Delp EJ (2018) Deepfake video detection using recurrent neural networks. In: IEEE international conference on advanced video and signal based surveillance, pp 1–6 Günther S, Kuhn M, Dunkel M, Campillos M, Senger C, Petsalaki E, Ahmed J, Urdiales EG, Gewiess A, Jensen LJ (2007) SuperTarget and matador: resources for exploring drug-target relationships. Nucl Acids Res 36(Suppl_1):D919–D922 Himmelstein DS, Lizee A, Hessler C, Brueggeman L, Chen SL, Hadley D, Green A, Khankhanian P, Baranzini SE (2017) Systematic integration of biomedical knowledge prioritizes drugs for repurposing. Elife 6:e26726 Hoffman KB, Dimbil M, Erdman CB, Tatonetti NP, Overstreet BM (2014) The Weber effect and the United States Food and Drug Administration’s adverse event reporting system (FAERS): analysis of sixty-two drugs approved from 2006 to 2010. Drug Saf 37(4):283–294 Huang K, Xiao C, Hoang T, Glass L, Sun J (2020a) Caster: predicting drug interactions with chemical substructure representation. In: Proceedings of the AAAI conference on artificial intelligence, pp 702–709 Huang W, Rong Y, Xu T, Sun F, Huang J (2020b) Tackling over-smoothing for general graph convolutional networks. arXiv:200809864 Ibrahim H, El Kerdawy AM, Abdo A, Eldin AS (2021) Similarity-based machine learning framework for predicting safety signals of adverse drug–drug interactions. Inform Med Unlock 26:100699












Chapter 4

Drug Effect Deep Learner Based on Graphical Convolutional Network

Yunyi Wu, Shenghui Guan, and Guanyu Wang

Y. Wu · S. Guan · G. Wang (B)
Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China

G. Wang
School of Medicine Life, and Health Sciences, Chinese University of Hong Kong, Shenzhen, China

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Hong (ed.), Machine Learning and Deep Learning in Computational Toxicology, Computational Methods in Engineering & the Sciences, https://doi.org/10.1007/978-3-031-20730-3_4

4.1 Introduction

Drug discovery has long been afflicted by high costs and low success rates. In general, the approval of a drug entails four stages of development: studies of molecular mechanisms, cell experiments, animal trials, and clinical trials. The molecular-level studies include the design of small-molecule drugs that target the disease-causing protein and the analysis of the protein’s functions in the context of signaling pathways. Drug efficacy is then tested by treating cells with different concentrations of the drug and measuring the resulting cell viability and cytotoxicity. These cell experiments not only provide rapid in vitro drug screening and rapid inspection of cellular effects but also yield a great deal of subcellular data, which remain largely unexplored and call for deeper mining. Animal trials can provide some insight into in vivo drug efficacy, but their clinical validity can hardly be established because of the large differences between animals and humans. Drugs that are effective in animals are not necessarily effective in humans and may even be unsafe, and vice versa (McKim 2010). Because of these inherent weaknesses, a new drug may still fail even after passing all preclinical trials (Fogel 2018). According to a recent survey, the overall success rate of new drugs is as low as 13.8%, and it is only 3.4% for anticancer drugs (Wong et al. 2019).


Because the development of a new drug costs about 2.6 billion dollars (Mullard 2014), a 13.8% success rate implies that about 2.24 billion dollars (2.6 × (1 − 0.138)) is wasted on average in the development of a single drug. To reduce costs and save time, it is necessary to reduce the number of wet-lab experiments and clinical trials, which is possible only once in silico models with high predictive power have been developed.

Deep learning may offer hitherto unimagined possibilities for achieving these goals. Benefiting from the big data and high-performance computing that characterize our era, deep learning has developed rapidly and has been widely applied in fields such as image recognition (Krizhevsky et al. 2017), natural language processing (Luong et al. 2013; Sutskever et al. 2014), and biology (Wu and Wang 2018; Kuzminykh et al. 2018; Li et al. 2019; Mao et al. 2021). In language processing, words are converted into numerical vectors that computers can readily process (Mikolov et al. 2013a, b; Pennington et al. 2014). This method has been extended to biology through the adaptation of word2vec to node2vec, which achieves low-dimensional representations of protein–protein interaction networks (Aditya and Leskovec 2016). The powerful AlphaFold2 is a culmination of deep learning in biology; it incorporates physical and biological knowledge to predict protein structures with remarkable speed and precision (John and Evans 2020; Senior et al. 2020).

Deep learning architectures for drug toxicity prediction vary with the way drug molecules are represented and toxicity labels are used (Chakravarti and Alla 2019; Matsuzaka and Uesawa 2019; Chen et al. 2020; Webel et al. 2020). In other words, the structure of a deep learning model reflects how the researchers perceive and parse drug molecules, toxicity labels, and the relationship between them. The architecture has a great impact on performance, which can be very high when the model architecture, drug representation, and application context are well matched. The more fully the chemical and biological information is utilized, the better the performance. For example, tailored descriptors and algorithms were found to be important in Bayer’s absorption, distribution, metabolism, excretion, and toxicity (ADME/T) predictive model (Göller et al. 2020).

In this chapter, we developed a deep learning model based on a graph convolutional network (GCN) to predict the effects of drugs on cells. The model integrates multisource information such as the gene interaction networks of human cells, the structures of drug molecules, and drug-induced gene expression. Using the word2vec algorithm, each gene is represented by a list of numbers (a 1024-dimensional gene vector) so that the distance between any two gene vectors quantifies the relatedness of the two genes. That is, if two genes have a high rate of contextual co-occurrence, then the distance between them in the 1024-dimensional Euclidean space should be short. To determine the genes’ relatedness, we searched the human gene interaction pathways reported in the literature for the genes’ contextual relationships. Based on the gene vectors, a cell vector can be constructed for a given cell by summing the products of every gene vector and its expression level and then dividing by the number of genes. In this way, cells are also embedded in the 1024-dimensional Euclidean space, where the relationships between cells can be quantified. For example, cells that cluster together in the space may belong to the same type or share similar functions. As such, the effect of a drug can be evaluated by calculating the drug effect vector, namely the change in the cell vector caused by the drug.

Based on these preparations, we developed a GCN that encodes the effects of a drug, called the deep drug effect predictor (DDEP), which takes the chemical structure of the drug as input. We found that drug treatment significantly alters cancer cell vectors, which, interestingly, shift toward normal cell vectors. Moreover, a cell shifts to roughly the same location in the space after being treated with drugs of similar function. Therefore, the 1024-dimensional cell vector captures the comprehensive cell state well, and on this basis drug efficacy can be predicted with an accuracy far better than that achieved by simple drug/target classification.
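The construction of the cell vector and the drug effect vector described above can be illustrated with a minimal sketch. It assumes that the drug effect vector is the treated cell vector minus the untreated one; the function names and the toy two-dimensional gene vectors are hypothetical and stand in for the 1024-dimensional vectors of the chapter.

```python
import math


def cell_vector(gene_vectors, expression):
    """Sum of (gene vector x expression level) over all genes, divided by the number of genes."""
    n_genes = len(gene_vectors)
    dim = len(gene_vectors[0])
    cell = [0.0] * dim
    for g, e in zip(gene_vectors, expression):
        for d in range(dim):
            cell[d] += g[d] * e
    return [x / n_genes for x in cell]


def drug_effect_vector(gene_vectors, expr_untreated, expr_treated):
    """Change in the cell vector caused by the drug (assumed here: treated minus untreated)."""
    before = cell_vector(gene_vectors, expr_untreated)
    after = cell_vector(gene_vectors, expr_treated)
    return [a - b for a, b in zip(after, before)]


# Toy data (hypothetical): three genes embedded in 2-D instead of 1024-D.
gene_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
expr_untreated = [2.0, 0.0, 1.0]
expr_treated = [0.0, 2.0, 1.0]

effect = drug_effect_vector(gene_vectors, expr_untreated, expr_treated)
print(effect)  # the drug moves the cell along (-2/3, +2/3)
```

In this toy case the drug silences the first gene and induces the second, so the cell vector moves away from the first gene's direction and toward the second's.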

4.2 Results

4.2.1 Gene Vector: Generation and Evaluation

We obtained a gene vector for each of the 8808 landmark human genes. We first retrieved all the gene interaction networks from the STRING database (von Mering et al. 2005). We then used Dijkstra’s algorithm (Dijkstra 1959) to capture the contextual relationships between the genes in the interaction networks. If two genes often appear in the same context, we say that they have a high degree of relatedness. With the word2vec algorithm (Mikolov et al. 2013a, b), each of the 8808 genes is represented by a list of 1024 numbers, namely a gene vector in a 1024-dimensional Euclidean space (hereafter, the SPACE). word2vec positions the gene vectors in the SPACE so that genes sharing a common context are located closer to one another. That is, the distance between any two gene vectors indicates the relatedness of the corresponding genes. In this way, the 8808 genes are embedded into the SPACE as 8808 gene vectors $G_i$ ($i = 1, 2, \ldots, 8808$).

We then tested the validity of the gene vectors by annotating them with MyGene (Wu et al. 2013; Xin et al. 2016), a Gene Ontology (GO) web service. If the vector representation is valid, a given GO term should be enriched in certain regions of the SPACE, because highly related genes have a high probability of being associated with the same GO term. Conversely, if the GO term is dispersed over the SPACE without enrichment, then the vectors do not represent the genes well. Enrichment is quantified as follows. In the SPACE, the set of the 50 gene vectors closest to $G_i$ is called the $i$-th neighborhood. For a given GO term, let $E_i$ denote the enrichment of the GO term in the $i$-th neighborhood, namely the number of genes in the $i$-th neighborhood that are associated with the GO term. The overall enrichment $E$ is the average of $E_i$ over all 8808 neighborhoods:

$$E = \frac{1}{8808} \sum_{i=1}^{8808} E_i$$
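The enrichment score defined above can be computed directly from the gene vectors and a per-gene GO annotation flag. The following is a minimal sketch, not the authors' implementation. It assumes Euclidean distance and that a gene is excluded from its own neighborhood, a detail the text leaves open; the function name and toy data are hypothetical.

```python
import math


def neighborhood_enrichment(vectors, has_term, k=50):
    """Return E: the mean, over all genes, of the number of term-associated genes
    among the k nearest neighbors of each gene.

    vectors  : list of equal-length lists of floats, one per gene
    has_term : list of bools; has_term[j] is True if gene j is associated with the GO term
    k        : neighborhood size (50 in the text)
    """
    n = len(vectors)
    total = 0
    for i in range(n):
        # Distances from gene i to every other gene (gene i itself excluded: an assumption).
        dists = sorted((math.dist(vectors[i], vectors[j]), j) for j in range(n) if j != i)
        neighbors = [j for _, j in dists[:k]]
        total += sum(1 for j in neighbors if has_term[j])  # this is E_i
    return total / n


# Toy data (hypothetical): two tight clusters in 2-D; the term annotates only the first cluster.
toy_vectors = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
toy_has_term = [True, True, True, False, False, False]

E = neighborhood_enrichment(toy_vectors, toy_has_term, k=2)
print(E)  # 1.0
```

The brute-force neighbor search is quadratic in the number of genes, which is acceptable for a few thousand genes; a spatial index would be a natural replacement at larger scale.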
We calculated the E values for all 7840 GO terms associated with the 8808 landmark genes. Among them, 3655 GO terms are very rare: they are associated with very few genes (Fig. 4.1b), and consequently their E values are nearly zero. Of the remaining 4185 GO terms, 2830, 1200, 121, and 34 have E values within the ranges (0, 1), (1, 5), (5, 10), and (10, 30), respectively (Fig. 4.1a). These data validate our gene vector representation.

Fig. 4.1 Statistics of the 4185 frequent GO terms and the 3655 rare GO terms. a The histogram of the E values of the 4185 frequent GO terms, of which 2830, 1200, 121, and 34 have E values within the ranges 0–1, 1–5, 5–10, and 10–30, respectively. b The histogram of the number of genes associated with the 3655 rare GO terms, of which 3059, 461, 113, 18, 2, and 2 are associated with 0–5, 5–10, 10–15, 15–20, 20–30, and 30–50 genes, respectively
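The context-extraction step at the start of this section (Dijkstra's algorithm on the interaction networks, followed by word2vec) is not specified in detail. One plausible reading, sketched below, treats each shortest path between two genes as a "sentence" whose neighboring genes share context. The graph format, the edge costs, the gene labels, and the idea of using whole paths as sentences are all assumptions made for illustration.

```python
import heapq
from itertools import combinations


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a graph given as {node: {neighbor: non-negative cost}}.
    Returns the list of nodes on a cheapest path, or None if target is unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy interaction network (hypothetical genes A-D and hypothetical edge costs).
toy_graph = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 1.0, "D": 5.0},
    "C": {"A": 4.0, "B": 1.0, "D": 1.0},
    "D": {"B": 5.0, "C": 1.0},
}

# One "sentence" per gene pair: the genes along the cheapest connecting path.
sentences = []
for s, t in combinations(sorted(toy_graph), 2):
    path = shortest_path(toy_graph, s, t)
    if path is not None:
        sentences.append(path)

print(shortest_path(toy_graph, "A", "D"))  # ['A', 'B', 'C', 'D']
```

Lists of this kind could then be passed as the training corpus to a word2vec implementation (for example, gensim's `Word2Vec` with `vector_size=1024`); that step is omitted here to keep the sketch dependency-free.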

4.2.2 Molecular Feature and Vector Generation It is vital to choose clinically relevant variables that are significantly associated with CTRCD and further contributed to the high performance of ML models. To do this, the weights of the logistic regression model for each outcome and variable pair can be evaluated. Logistic regression (LR) applies a weight for each feature, and the prediction is the summation of all of the products of the weight and feature pairs. Clinically relevant variables can be based on two criteria: (1) the absolute coefficient of variation (the ratio of standard deviation (SD) and mean) is low to ensure small fluctuation of the weight in the 100 repeats; (2) the absolute associated weight compared with the extremum weight for that outcome is high (relative weight). Additionally, it is advised to test the hazard ratios (95% Confidence Interval (CIs) of the clinically relevant variables to the model outcome to verify its utility. The Wald χ 2 test was used to evaluate the variables with statistically significant coefficients. The following R packages can be used to test hazard ratios on outcomes survival (v2.44-1.1) and survminer (v0.4.6) packages on R 3.6.1. To further visualize the enrichment, we projected the 1024-dimensional gene vectors onto some two-dimensional plane to see if the enrichment of a GO term manifests as the clustering of the corresponding genes on the plane. For a GO term of small E, a projection plane can be easily chosen to cluster the corresponding genes. For example, negative regulation of insulin secretion (GO:0061179) has E = 0.033, which is very small; thus, the corresponding genes, illustrated by the red


Fig. 4.2 Projection of gene vectors onto linear planes to distinguish a given GO term. The red (yellow) dots represent the genes associated (not associated) with the GO term. The GO terms are: a negative regulation of insulin secretion (GO:0061179), b PERK-mediated unfolded protein response (GO:0036499), c artery development (GO:0060840), d osteoclast differentiation (GO:0030316)

dots in Fig. 4.2a, are well separated from the genes not associated with GO:0061179 (the yellow dots). The linear projection was also successful for PERK-mediated unfolded protein response (GO:0036499) and artery development (GO:0060840), which both have E = 0.5 (Fig. 4.2b, c). The linear projection was still acceptable even for osteoclast differentiation (GO:0030316), which has E = 3.0 (Fig. 4.2d). For larger E values, the linear projection was insufficient, so we projected the genes onto a three-dimensional space by t-distributed stochastic neighbor embedding (t-SNE), a nonlinear dimension reduction method, with an efficient prospect found by TensorBoard (Luus et al. 2019). This method worked well for cell differentiation (GO:0030154; E = 15.8), cell migration (GO:0016477; E = 12.45), DNA replication (GO:0006260; E = 10.09), and development (GO number not assigned) (Fig. 4.3a–d). In conclusion, the gene vectors can be clustered based on their biological functions by linear/nonlinear dimension reduction. The easy visualization of gene–function relationship will be useful to test drug efficacy.

4.2.3 Cell Vector: Generation and Evaluation

A cell vector C is defined to be


Fig. 4.3 Projection of gene vectors onto 3-D spaces to distinguish a given GO term. The blue (orange) dots represent the genes associated (not associated) with the GO term. The GO terms are: a cell differentiation (GO:0030154), b cell migration (GO:0016477), c DNA replication (GO:0006260), d development

$$C = \frac{\sum_{i=0}^{N} G_i \times e_i}{N} \tag{4.1}$$

where N is the number of genes considered for the cell, Gi is the i-th gene vector, and ei is the expression level of the i-th gene. To construct realistic cell vectors, we used the GSE92742 dataset of CMap database (Subramanian et al. 2017; Senior et al. 2020)


deposited in GEO (Edgar et al. 2002; Barrett et al. 2013), which records the gene expression of 1,319,138 cells. For each of the 1,319,138 cells, the expression levels of some 12,000 genes in the transcriptome are recorded in GSE92742. The intersection of these 12,000 genes and the 8808 landmark genes is a set containing 7042 genes; we therefore set N = 7042 in Eq. (4.1). In brief, a cell vector is constructed by multiplying each of the 7042 gene vectors with its expression level, adding them together, and then dividing by 7042. In this way, all the 1,319,138 cells are embedded into the SPACE. Because 1,319,138 cells are too many to process, we randomly selected 10,000 cells from them for the initial study. Just as the gene vectors allow for the quantification/visualization of the enrichment of a GO term among the genes, the cell vectors allow for the quantification/visualization of the enrichment of a cell type among the cells. To see if cells of the same type are clustered in the SPACE, we calculated the overall enrichment E of a given cell type in the set of 10,000 cells. In total we studied 69 cell types. After statistical analysis, we found that the cell-type enrichment (Fig. 4.4) is much more marked than the GO term enrichment in the genes (Fig. 4.1a). To visualize the enrichment, we tested three cell types: normal lung cells (HCC515; E = 4.58), tumor cells primarily in skin (A375; E = 20.67), and prostate tumor cells (VCAP; E = 39.83). For each of the three cell types, the 10,000 cells were projected onto a three-dimensional space by using t-SNE (Fig. 4.5).
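The construction in Eq. (4.1) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function name `cell_vector` and the toy gene vectors and expression levels are assumptions made for the example, and the real setting uses 7042 genes in a 1024-dimensional space.

```python
def cell_vector(gene_vectors, expression):
    """Expression-weighted average of gene vectors, in the spirit of Eq. (4.1).

    gene_vectors: list of N gene vectors (each a list of floats, same length)
    expression:   list of N expression levels, one per gene
    """
    if len(gene_vectors) != len(expression):
        raise ValueError("need one expression level per gene vector")
    n = len(gene_vectors)
    dim = len(gene_vectors[0])
    cell = [0.0] * dim
    # Multiply each gene vector by its expression level and accumulate.
    for vec, level in zip(gene_vectors, expression):
        for d in range(dim):
            cell[d] += vec[d] * level
    # Divide by the number of genes N.
    return [x / n for x in cell]


# Toy example (hypothetical values): 3 genes in a 2-dimensional space.
genes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
levels = [2.0, 4.0, 3.0]
c = cell_vector(genes, levels)
```

Because the result is a plain average of weighted gene vectors, a cell vector lives in the same space as the gene vectors, which is what allows cells and genes to be compared and visualized with the same tools.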

Fig. 4.4 Statistics of the 69 cell types: The histogram of the E values of the 69 cell types, of which 12, 31, 8, 13, and 5 have E values within the ranges 2–5, 5–10, 10–20, 20–30, and 30–40, respectively


Fig. 4.5 Projection of the 10,000 cells onto 3-D spaces to distinguish a target cell type. The blue (orange) dots represent the cells of (not of) the target cell type. The target cell types are: a HCC515, b A375, c VCAP

One sees that cells of the targeted cell type (the blue dots) are clustered and well separated from the other cells (the orange dots). To further test the clustering, we selected another set of 10,000 cells consisting of primary cells (e.g., NPC, NPC.CAS9, NPC.TAK, ASC), normal cell line cells (e.g., HA1E, HA1E.101, HCC515, HEK293T), and tumor cell line cells (e.g., U2OS, WSUDLCL2, YAPC.311), recorded in GSE92742. By using t-SNE, we projected the 10,000 cells onto a three-dimensional space (Fig. 4.6), where the primary cells (green dots), the normal cell line cells (red dots), and the tumor cell line cells (blue dots) are clustered and are well separated from each other. Note that the blue dots significantly outnumber the other dots because the tumor cells occupy a large portion of the database. The validity of the cell vector was demonstrated by testing its change under drug perturbation (see below). In particular, we showed several cases in which a cancer cell vector that is far away from the normal cell vector becomes much closer to the normal cell vector.


Fig. 4.6 Projection of the 10,000 cells onto a 3-D space so that the cells are clustered according to the three types they belong to: normal cells (red), primary cells (green), and tumor cells (blue)

4.2.4 Deep Drug Effect Predictor: Training and Validation

Before the development of DDEP, we developed four GCN-based pre-models (Fig. 4.7) to respectively learn the following four drug properties: oil–water partition coefficient (logP), synthetic accessibility score (SAS), quantitative estimate of drug-likeness (QED), and topological polar surface area (TPSA). The pre-models were trained on approximately 360,000 drug structures from the ZINC database (Irwin and Shoichet 2005). Training minimized the training loss (Eq. (4.6)), namely the differences between the computed drug properties and the actual drug properties. The minimized training losses are given in Table 4.1. To validate the trained pre-models, we drew another set of 50,000 drug structures from ZINC, which were fed to the pre-models to obtain the predicted drug properties and their differences from the actual drug properties, namely the validation losses (Table 4.1). We judged the training successful because both the training losses and the validation losses were small and close to each other. The pre-models now “memorize” many drug structures. Based on the four post-training pre-models, the DDEP was constructed by concatenating the output of the


Fig. 4.7 Deep learning models based on GCN. In the pre-models (left), chemical structure information enters the input layer (gray), passes through the GCLs + attention + gate (purple) and the dense layers (yellow), and results in the output. In the DDEP (right), the pre-models are transferred and serve as input 1. Together with input 2 (the dosage and perturbation time of the drugs), the information passes through dense layers to calculate drug effect vectors

Table 4.1 Training, validation, and test losses of the DDEP

                  logP     QED      SAS      TPSA     Drug effect
Training loss     0.6178   0.1383   0.2091   0.7489   0.01786683875
Validation loss   0.6219   0.1316   0.2101   0.6988   0.020866077
Test loss         -        -        -        -        0.020866339

pre-models with new input, namely the drug dosage and the duration of drug treatment (Fig. 4.7). The information passes through dense layers to generate the output, namely the 1024-dimensional drug effect vector. To train the DDEP, we again used the GSE92742 dataset. Among the 1,319,138 cells, 672,128 cells were treated with drugs, but only 645,319 cells were usable (some cells were not usable for various reasons, e.g., treated by drugs of overly large size). For each of the 645,319 cells, the corresponding control cell was chosen from a total of 1,319,138, whereby its drug effect vector was obtained by subtracting the control cell vector from the cell vector. These 645,319 cells were divided into the training, validation, and testing sets, consisting of 625,200, 10,000, and 10,119 cells, respectively. The DDEP was then trained sequentially by each of the 625,200 cells.


During the training on a given cell, the weight values of DDEP were iteratively updated to minimize the training loss, namely the least-squares error between the computed drug effect vector and the actual drug effect vector. The converged training loss was 0.0178668. The trained DDEP was then validated on the 10,000 cells in the validation set. During the validation, tuning of DDEP was not allowed; thus the computed drug effect vector can be called the predicted drug effect vector. The converged validation loss was 0.0208661. We then used the 10,119 remaining cells to further test the validated DDEP and obtained a converged test loss of 0.0208663. The test process had no essential difference from the validation process.
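The two quantities used in this training procedure, the drug effect vector (treated cell vector minus control cell vector) and the squared-error loss of Eq. (4.6), can be illustrated with a minimal sketch. The function names and the toy three-dimensional vectors below are assumptions for illustration only; the chapter works with 1024-dimensional vectors, and whether the reported losses are additionally averaged over dimensions or cells is not stated in the text.

```python
def drug_effect(treated, control):
    """Drug effect vector: treated cell vector minus its control cell vector."""
    return [y - x for y, x in zip(treated, control)]


def squared_error_loss(predicted, actual):
    """Half the sum of squared differences, following the form of Eq. (4.6)."""
    return 0.5 * sum((y - p) ** 2 for y, p in zip(actual, predicted))


# Toy 3-dimensional vectors; all values are made up for the example.
control = [0.10, 0.40, 0.30]
treated = [0.50, 0.20, 0.30]
actual_effect = drug_effect(treated, control)   # approximately [0.4, -0.2, 0.0]
predicted_effect = [0.30, -0.20, 0.10]          # stand-in for a model output
loss = squared_error_loss(predicted_effect, actual_effect)
```

During training this loss would be minimized with respect to the model weights; during validation and testing the same quantity is only evaluated, with the weights held fixed.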

4.2.5 Application of DDEP to Predict the Effects of Anti-cancer Drugs Against Breast Adenocarcinoma

Our working example was based on the MCF7 cells from the GSE92742 dataset, which represent a specific subtype of breast adenocarcinoma and originate from a 69-year-old female donor (Soule et al. 1973). The MCF7 cells were either treated with one of the four drugs at 10 µM concentration for 6 h or not treated with any drug. The untreated cells served as controls because they had been exposed only to culture medium containing dimethyl sulfoxide (DMSO). The four FDA-approved drugs were: palbociclib, lapatinib, bosutinib, and sunitinib (Fig. 4.8, the left column). While the former two were used to train DDEP, the latter two were not used. Palbociclib is a cyclin-dependent protein kinase inhibitor. Because it inhibits DNA replication and reduces tumor cell proliferation, it is used to treat postmenopausal women with metastatic breast cancer (National Center for Biotechnology Information 2021c). Lapatinib is a tyrosine kinase receptor inhibitor used to treat advanced or metastatic breast cancer overexpressing HER2 (National Center for Biotechnology Information 2021d). Bosutinib is a dual kinase inhibitor (for both BCR-ABL and Src tyrosine kinases). It has shown effectiveness in treating breast cancer patients in phase II clinical trials (Campone et al. 2012; National Center for Biotechnology Information 2021a). Sunitinib is a multispecific tyrosine kinase receptor inhibitor with some effectiveness in treating breast cancer patients (Burstein et al. 2008). Although it was not better than other drugs according to some phase III clinical trials (Mayer et al. 2010; Crown et al. 2013), these trials did not use molecular marker-based patient stratification and outcome evaluation (Elgebaly et al. 2016; National Center for Biotechnology Information 2021a). The drug effect vectors of the four drugs were predicted by DDEP and then compared with the experimentally obtained drug effect vectors.

For palbociclib, the predicted drug effect vector is shown in Fig. 4.9 as the red-colored piecewise linear function of the vector dimension (ranging from 1 to 1024); the experimental drug effect is presented in terms of the mean value (blue lines), the median value (gray lines), and the range from the 25th to the 75th percentile (pink areas). The multiple presentations were to cover the multiple experiments of treating MCF7 cells with


Fig. 4.8 Effects of drugs on cancer cells as revealed by drug effector vector and DDEP computations. Left column: the four FDA approved drugs against breast adenocarcinoma. Middle column: the distances between the three cell vectors as computed from experimental data. Right column: the distances between the three cell vectors as computed from DDEP

palbociclib, as recorded in GSE92742. For lapatinib, bosutinib, and sunitinib, the results are presented in Figs. 4.10, 4.11, and 4.12, respectively. The representations were successful because the predicted values and experimental values are consistent, and a large fraction of the predicted values fall within the range from the 25th to the 75th percentile (54.2%, 93.7%, 64.3%, and 44.6% for palbociclib, lapatinib, bosutinib, and sunitinib, respectively). Because bosutinib and sunitinib were not used in the training phase, these results indicate that our DDEP can predict the effects of previously unseen chemicals on cells with good accuracy.

After drug treatment, did the cancer cells become healthier? This can be determined by using a normal cell line as a reference. We used MCF10A, an immortalized normal cell line derived from normal human breast epithelium. In total there were three kinds of cell vectors, namely the MCF10A cell vector, the untreated MCF7 cell vector, and the treated MCF7 cell vector; the latter can be further classified according to which drug had been used. The concentration and duration of the drug treatment


Fig. 4.9 Full 1024-dimension variations of Palbociclib drug effect vector in MCF7 cell. The red line is the predicted values. The blue line represents the mean values of experimental data, and the gray line represents the median values. The pink area shows the range of 25–75th percentile of experimental data


Fig. 4.10 Full 1024-dimension variations of Lapatinib drug effect vector in MCF7 cell. The red line is the predicted values. The blue line represents the mean values of experimental data, and the gray line represents the median values. The pink area shows the range of 25–75th percentile of experimental data


Fig. 4.11 Full 1024-dimension variations of Bosutinib drug effect vector in MCF7 cell. The red line is the predicted values. The blue line represents the mean values of experimental data, and the gray line represents the median values. The pink area shows the range of 25–75th percentile of experimental data


Fig. 4.12 Full 1024-dimension variations of Sunitinib drug effect vector in MCF7 cell. The red line is the predicted values. The blue line represents the mean values of experimental data, and the gray line represents the median values. The pink area shows the range of 25–75th percentile of experimental data


had been 10 µM and 6 h, respectively. Even before the application of DDEP, our cell vector characterization has already revealed that, for all the four drugs, the MCF7 cells become much closer to the normal cells after the drug treatment than before the drug treatment (Fig. 4.8, the middle column); the changes are all dramatic because the treated MCF7 cells are now closer to the normal cells than to the untreated MCF7 cells. For example, the distance between the normal cell MCF10A and the untreated cancer cell MCF7 was 0.2820 in the SPACE. After treating the cancer cell with palbociclib, the distance between the treated cancer cell and normal cell reduced markedly to 0.1933, which was even smaller than the distance between the treated cancer cell and the untreated cancer cell. Therefore, our results demonstrate that the four drugs all have good curative effects, which is consistent with the fact that they are all FDA approved. Importantly, the drug effects were accurately predicted by our DDEP because the predicted distances were all very close to the actual distances (Fig. 4.8, the right column). The largest prediction error was with the lapatinib treatment, where the predicted and actual distances between the treated cancer cell and the normal cell were 0.1914 and 0.2026, respectively, with a 5.5% error. The accurate DDEP results not only corroborate our cell vector analysis but also demonstrate the power of our DDEP.
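As a quick check of the figures quoted above, the arithmetic can be written out explicitly. The sketch assumes that the 5.5% error is taken relative to the actual (experimental) distance, which is the choice that reproduces the quoted value; the second line expresses the palbociclib result as a relative reduction of the distance to the normal cell, a derived quantity that the text itself does not report.

```latex
\[
\frac{\lvert 0.1914 - 0.2026 \rvert}{0.2026} = \frac{0.0112}{0.2026} \approx 0.055 \quad (5.5\%)
\]
\[
\frac{0.2820 - 0.1933}{0.2820} = \frac{0.0887}{0.2820} \approx 0.31
\]
```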

4.2.6 Insights into Drug Classification

Through association with drug effect vectors, drugs can also be embedded in the SPACE; after proper dimension reduction, they become visualizable (Fig. 4.13) and may reveal interesting patterns. For example, the dots clustered together correspond to the drugs that have similar effects on the cells; they perturb cell signaling pathways and gene expressions in a similar manner, and thus they should belong to the same class. As this may represent a new method of drug classification, we compared it with an existing drug classification system to check their conformity. We used the anatomical therapeutic chemical (ATC) classification system (WHO Collaborating Centre for Drug Statistics Methodology 2014), which classifies the active ingredients of drugs into 14 types (A, B, C, D, G, H, J, L, M, N, P, R, S, V) according to the organ or system on which they act and their therapeutic, pharmacological and chemical properties. For every drug in the system, its molecular structure, concentration (fixed at 10 µM), and treatment time (fixed at 6 h) were input into the DDEP, which then calculated the drug effect vector. By t-SNE, the vector was projected onto a three-dimensional space and shown as either a blue dot (if the drug belongs to a targeted type) or a yellow dot (if the drug belongs to any of the other types). Figure 4.13a–d show the results when the target types are A (alimentary tract and metabolism), C (cardiovascular system), M (musculo-skeletal system), and A10 (obesity and diabetes), respectively. One sees that the blue dots (i.e., the drugs of the target type) are well separated from the other drugs, demonstrating the validity of our vector-based drug classification.


Fig. 4.13 Drug classification based on the clustering of drug effect vectors. Each dot represents a drug effect vector, namely the difference between the cell vector after and before the drug treatment. The blue dots correspond to the targeted drug categories: a alimentary tract and metabolism, b cardiovascular system, c musculo-skeletal system, d obesity and diabetes. The orange dots correspond to the other drug categories. In e, the red, blue, and green dots represent the drugs targeting the MAPK, JAK-STAT, and ERBB pathways, respectively

We then used DDEP to classify drugs targeting different pathways, namely the MAPK pathway (Seger and Krebs 1995; Kim and Choi 2010), the JAK-STAT (Janus kinase/signal transducers and activators of transcription) pathway (Rawlings et al. 2004), and the ERBB pathway, which consists of ErbB family of proteins (ErbB1– ErbB4). These pathways are all important in the regulation of cell proliferation, migration, motility, and apoptosis, the deregulation of which may lead to complex diseases such as cancer. Because they are highly interconnected, it would be difficult


to distinguish their respective drugs. Nevertheless, our DDEP was effective in distinguishing these pathway-targeted drugs because the drugs were projected to different areas of a three-dimensional space (Fig. 4.13e).

4.3 Discussion

To predict the effects of drugs on cells, we have developed a GCN-based deep learning model, which integrates multi-source information such as the gene interaction networks of human cells, the structure of drug molecules, and the gene expressions induced by drugs. In the model, each gene is represented by a 1024-dimensional gene vector, and the distance between any two gene vectors quantifies the two genes’ relatedness, which was discovered by searching for the genes’ contextual relationship in the literature; a cell vector is then constructed by integrating the vectors of 8808 landmark human genes, as well as their expression levels. The effect of a drug can also be represented by a vector, namely the change in the cell vector caused by the drug. Based on the vector representations, we developed DDEP, which is essentially a GCN encoding the effects of a drug, with the drug chemical structure as the input. We found that DDEP can predict drug efficacy with accuracy far better than that achieved by simple drug/target classification, and that the vector representations capture well the comprehensive states of a cell. The present model is similar to DeepDTnet (Zeng et al. 2020), which is also a deep learning model for target recognition using small molecule drugs from DrugBank and ATC code labels, and integrating information from chemical, phenotypic, genomic and cellular networks. However, the present model is unique in that it extensively uses vectors to represent genes, cells, and drug effects. Against the backdrop of fast development of omics technologies and machine learning methods, and with the accumulation of multi-level data (genome, transcriptome, proteome, drug molecular structure, drug properties), our vector representation will come increasingly close to the realistic cell states, and our DDEP will become increasingly powerful in predicting drug efficacy.

In combination with virtual drug screening, we hope our evolving DDEP will play an important role in drug discovery.

4.4 Methods

We used various multi-source data such as gene interaction network data from the STRING database (von Mering et al. 2005), gene expression data before and after chemical treatment from the CMap database (Subramanian et al. 2017), and chemical structure and properties data from the ZINC database (Irwin and Shoichet 2005). To generate gene vectors, we captured the context information in large human gene interaction pathways with Dijkstra’s Algorithm (Dijkstra 1959) and then processed the context


Fig. 4.14 Overall methodology

information with word2vec algorithm (Mikolov et al. 2013a, b) (Fig. 4.14a). By integrating the vectors of many landmark genes in the cell and taking their expression levels into account, a cell vector can be generated (Fig. 4.14b). A GCN-based deep learning model, called DDEP, was then constructed that extracts the hidden information of drug structures and properties (Fig. 4.14c) so that the drug efficacy can be predicted (Fig. 4.14d). The DDEP was trained by many cells’ gene expression data both before and after drug treatment.

4.4.1 Capture Contextual Information of Genes from Their Interaction Networks

We used Dijkstra’s algorithm, which is widely used to find the shortest path in a graph with weighted edges, to capture gene context information from human gene interaction pathways retrieved from the PathCards database (Belinky et al. 2015). For a given pathway, the gene–gene interaction networks were converted from protein–protein interaction data in the STRING database (von Mering et al. 2005). In total, we retrieved from the STRING database 8923 genes and 880,588 gene interaction pairs of 3839 pathways that play vital roles in different human cells. We used the combined score (denoted by c), a number between 0 and 1, to quantify the closeness between any two genes. It takes into account seven different aspects of gene correlation: conserved neighborhood, gene fusions, phylogenetic co-occurrence, coexpression, large-scale experiments, literature co-occurrence, and database imports (von Mering et al. 2005). The combined scores were taken from STRING. The distance between two genes i and j was defined to be d_ij = 1 − c_ij. If genes are nodes of a network, then d_ij naturally serves as the weight of the edge between nodes i and j. To apply Dijkstra’s algorithm to a network of n genes, all the n² − n possible paths


of information flow need to be traversed. We have captured all the main context information involving 3,824,504 paths of 3839 pathways in total.
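The path-finding step described above can be sketched with a standard heap-based Dijkstra implementation. This is a minimal sketch under stated assumptions, not the authors' code: the gene names `G1`–`G3` and their combined scores are made up, the network is treated as undirected, and the helper names `shortest_paths` and `path_to` are hypothetical.

```python
import heapq


def shortest_paths(scores, source):
    """Dijkstra's algorithm on a gene network.

    scores: dict mapping (gene_a, gene_b) -> combined score c in [0, 1];
            each pair becomes an undirected edge of weight d = 1 - c.
    Returns (dist, prev): shortest distances from source and predecessor links.
    """
    graph = {}
    for (a, b), c in scores.items():
        d = 1.0 - c
        graph.setdefault(a, []).append((b, d))
        graph.setdefault(b, []).append((a, d))

    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d_u, u = heapq.heappop(heap)
        if d_u > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d_u + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def path_to(prev, source, target):
    """Rebuild the gene sequence along the shortest path from source to target."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy network with made-up combined scores (not STRING values).
toy_scores = {("G1", "G2"): 0.9, ("G2", "G3"): 0.8, ("G1", "G3"): 0.5}
dist, prev = shortest_paths(toy_scores, "G1")
path = path_to(prev, "G1", "G3")
```

In this toy case the indirect route through `G2` (total distance about 0.3) beats the direct edge (0.5), so the recovered path is the gene sequence `G1`, `G2`, `G3`. Sequences of this kind are what serve as "context" for the embedding step.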

4.4.2 Generating Gene Vectors and Cell Vectors

Word2vec is a traditional algorithm for natural language processing (NLP) (Mikolov et al. 2013a, b). It can embed one-hot vectors by constructing a model for the prediction of the contextual words of a center word, or vice versa. Similarly, for a given gene, we created its gene vector by using its contextual genes in the information propagation path of the gene interaction networks, for which word2vec was used for the embedding and the skip-gram model with negative sampling was used as an approximate training method. Before training, we performed subsampling to balance the sampling rate between high-frequency genes and low-frequency genes. After training, we obtained 1024-dimensional vectors for all 8808 genes that play crucial roles in human gene interaction pathways. Based on the generated gene vectors, cell vectors were generated as described in Results. Note that the gene expression data were in the form of the L1000 array, which was designed for generating low-cost and high-throughput data and is formed by 978 probes of landmark transcripts and 80 probes of control transcripts (Subramanian et al. 2017).
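To make the analogy with text concrete, the following sketch shows how a gene path can be turned into (center, context) training pairs for a skip-gram model, treating the path as a "sentence" of genes. It is an assumption-laden illustration: the window size, the gene names, and the function name `skipgram_pairs` are not taken from the chapter, and the actual embedding training (negative sampling, subsampling) is not shown.

```python
def skipgram_pairs(path, window=2):
    """Generate (center, context) pairs from one gene path.

    path:   ordered list of gene identifiers along an information-flow path
    window: how many neighbors on each side count as context (assumed value)
    """
    pairs = []
    for i, center in enumerate(path):
        lo = max(0, i - window)
        hi = min(len(path), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, path[j]))
    return pairs


# Hypothetical path of four genes, with a context window of one gene per side.
toy_path = ["G1", "G2", "G3", "G4"]
pairs = skipgram_pairs(toy_path, window=1)
```

Each pair asks the model to predict a contextual gene from a center gene, so genes that frequently share paths end up with nearby vectors.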

4.4.3 GCN-Based Pre-models

To expedite the convergence of DDEP, we developed four GCN-based pre-models (Fig. 4.7 (left)) to respectively learn the following four drug properties: logP, TPSA, SAS, and QED. The inputs to the pre-models were chemical structure data, including the atomic adjacency matrix and the feature matrix (Ryu et al. 2018). Only small molecules with fewer than 50 atoms were kept; the adjacency matrix of each small molecule was built according to its topological graph structure, and the feature matrix was composed of atom type, atomic connectivity degree, number of connected hydrogen atoms (both explicit and implicit), number of implicit hydrogen atoms, and aromaticity. The pre-models were based on the GCN of Ryu et al. (2018), which is characterized by multi-head attention and gate mechanisms. To decrease the number of weights and improve model stability, the weights between the second and third layers were shared (Wu et al. 2013; Xin et al. 2016; Hu 2018). The forward calculation from input layer to output layer involves feeding chemical structure information into the first graph convolutional layer (GCL) with attention, passing the results into the second and third GCLs with attention and gate, and then calculating the prediction values in the last three dense layers as the output (Eq. (4.2)):


$$H_i^{(l+1)} = \sigma\left( \frac{1}{K} \sum_{k=1}^{K} \sum_{j \in N(i)} \alpha_{ij,k}^{(l)} H_j^{(l)} W^{(l)} \right) \tag{4.2}$$

where $H_i^{(l)}$ denotes the state of node $i$ at layer $l$, $\sigma$ is the sigmoid activation function, $K$ is the total number of attention heads, $N(i)$ is the set of node $i$'s neighbors, $W^{(l)}$ is the convolution weight matrix of layer $l$, and $\alpha_{ij,k}^{(l)}$ is the attention coefficient between nodes $i$ and $j$ for the $k$-th attention head at the $l$-th layer. More specifically,

$$\alpha_{ij}^{(l)} = \left( H_i^{(l)} W^{(l)} \right)^{T} C^{(l)} H_j^{(l)} W^{(l)} \tag{4.3}$$

where $C^{(l)}$ is the coupling matrix of layer $l$. Note that the mechanism of gated skip-connection (gsc) was applied:

$$H_{i,\mathrm{gsc}}^{(l+1)} = z_i \odot H_i^{(l+1)} + (1 - z_i) \odot H_i^{(l)} \tag{4.4}$$

where $\odot$ is element-wise matrix multiplication (Hadamard product) and

$$z_i = \sigma\left( W_{z,1} H_i^{(l+1)} + W_{z,2} H_i^{(l)} + b_z \right) \tag{4.5}$$

where $W_{z,1}$, $W_{z,2}$, and $b_z$ are trainable parameters. For the implementation, we used MXNet GPU version 1.2.1 to build our models. We randomly selected 360,000 records as the training set and 50,000 records as the validation set from the small chemical molecules dataset. The loss function is defined in Eq. (4.6):

$$\mathrm{Loss} = \frac{1}{2} \sum_{i} (Y_i - P_i)^2 \tag{4.6}$$

where P and Y represent the predicted and true values, respectively. Distributed data-parallel training was performed with the Adam optimization algorithm on a High-Performance Cluster (HPC) with four GPU nodes. The total batch size on each GPU node was 12 × 100 = 1200, where 12 is the number of cards of a GPU node. The learning rate, weight decay, learning rate decay, and learning rate decay period were 0.01, 0.95, 0.5, and 20 (30), respectively. The logP, SAS, and QED models were trained for 100 epochs each, while the TPSA model was trained for 150 epochs.
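The layer update of Eqs. (4.2)–(4.5) can be traced on a tiny graph in plain Python. This is a didactic sketch under several assumptions, not the MXNet implementation: each attention head is represented only by its own coupling matrix C with a shared W, the attention score of Eq. (4.3) is used without any further normalization, neighbor lists exclude self-loops, and all function names, matrices, and feature values are made up for the example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def vec_mat(v, M):
    """Row vector v (length m) times matrix M (m x n)."""
    return [sum(v[r] * M[r][c] for r in range(len(v))) for c in range(len(M[0]))]


def mat_vec(M, v):
    """Matrix M (m x n) times column vector v (length n)."""
    return [sum(row[c] * v[c] for c in range(len(v))) for row in M]


def attention(h_i, h_j, W, C):
    """Scalar score (H_i W)^T C (H_j W), following the form of Eq. (4.3)."""
    u_i, u_j = vec_mat(h_i, W), vec_mat(h_j, W)
    return sum(a * b for a, b in zip(u_i, mat_vec(C, u_j)))


def gcl_update(H, neighbors, W, heads):
    """Eq. (4.2): attention-weighted neighbor sum, averaged over K heads, then sigmoid."""
    K = len(heads)
    out = []
    for i, h_i in enumerate(H):
        acc = [0.0] * len(W[0])
        for C in heads:
            for j in neighbors[i]:
                a = attention(h_i, H[j], W, C)
                u_j = vec_mat(H[j], W)
                acc = [s + a * x for s, x in zip(acc, u_j)]
        out.append([sigmoid(s / K) for s in acc])
    return out


def gated_skip(h_new, h_old, Wz1, Wz2, bz):
    """Eqs. (4.4)-(4.5): gate z mixes the updated state with the previous state."""
    pre = [a + b + c for a, b, c in zip(mat_vec(Wz1, h_new), mat_vec(Wz2, h_old), bz)]
    z = [sigmoid(p) for p in pre]
    return [zi * n + (1.0 - zi) * o for zi, n, o in zip(z, h_new, h_old)]


# Toy molecule graph: 3 atoms in a chain 0-1-2, 2 features per atom.
I2 = [[1.0, 0.0], [0.0, 1.0]]
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
H_next = gcl_update(H, neighbors, W=I2, heads=[I2])
H_gsc = [gated_skip(n, o, I2, I2, [0.0, 0.0]) for n, o in zip(H_next, H)]
```

Because the gate values lie strictly between 0 and 1, each component of the gated output is an interpolation between the previous state and the updated state, which is how the skip-connection lets information bypass a layer when the update is not useful.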

4.4.4 Deep Drug Effect Predictor

The pre-models, which had stored information about the physical/chemical properties of the drugs in the GCL weight values, were concatenated to form the basis of the full DDEP (Fig. 4.7 (right)). The output of the pre-models, together with the data of different drug dosages (0.08, 0.4, 2, 10 µM) and perturbation times (6 h and 24 h), was sent to new dense layers, which finally produced the 1024-dimensional drug effect vector Effect = Y − X, where X and Y are the cell vectors before and after the drug treatment, respectively. The DDEP was trained using the 645,319 drug-treated cells and the corresponding control cells in the GSE92742 dataset. We used 551 small molecule drugs (Table 4.2) whose ATC category data were from DrugBank (Law et al. 2014; Wishart et al. 2018), among which 64 drugs target single pathways (Table 4.3). The t-SNE algorithm was used to visualize the drug effect vectors in three-dimensional space.

[H][C@@]12OC3=C(O)C=CC4=C3[C@@]11CCN(C)[C@]([H])(C4)[C@]1([H])C=C[C@@H]2O

COC1=CC2=C(C=C1)N=C(N2)S(=O)CC1=NC=C(C)C(OC)=C1C

DB00284

DB00295

DB00338

6

7

8

[H][C@@]12CC[C@](O)(C(=O)CO)[C@@]1(C)CC(=O)[C@@]1([H])[C@@]2([H])CCC2=CC(=O)C=C[C@]12C

NCCS

DB00847

21

CC(C)[N+](C)(CCOC(=O)C1C2=CC=CC=C2OC2=CC=CC=C12)C(C)C

CN(C)C(=O)C(CCN1CCC(O)(CC1)C1=CC=C(Cl)C=C1)(C1=CC=CC=C1)C1=CC=CC=C1

DB00782

DB00836

19

20

CNC[C@H](O)C1=CC(O)=C(O)C=C1

COC1=CC2=C(NC(=N2)[S@@](=O)CC2=NC=C(C)C(OC)=C2C)C=C1

DB00668

DB00736

17

[H][C@]12[C@H](C[C@@H](C)C=C1C=C[C@H](C)[C@@H]2CC[C@@H]1C[C@@H](O)CC(=O)O1)OC(=O)C(C)(C)CC

18

DB00635

DB00641

15

16

CN\C(NCCSCC1=C(C)NC=N1)=N\C#N

OC[C@H]1O[C@](O)(CO)[C@@H](O)[C@@H]1O[C@@H]1O[C@H](CO)[C@H](O)[C@H](O)[C@H]1O

DB00501

DB00581

13

14

CN1[C@H]2CC[C@@H]1C[C@@H](C2)OC(=O)[C@H](CO)C1=CC=CC=C1

CC1=C(OCC(F)(F)F)C=CN=C1CS(=O)C1=NC2=CC=CC=C2N1

DB00424

DB00448

11

12

CN1CCCN=C1COC(=O)C(O)(C1CCCCC1)C1=CC=CC=C1

CN(CCOC1=CC=C(CC2SC(=O)NC2=O)C=C1)C1=CC=CC=N1

DB00383

DB00412

9

10

[H]C(=O)[C@H](O)[C@@H](O)[C@]([H])(O[C@@]1([H])O[C@H](CO)[C@@]([H])(O[C@H]2O[C@H](C)[C@@H](N[C@@] 3([H])C=C(CO)[C@@H](O)[C@H](O)[C@H]3O)[C@H](O)[C@H]2O)[C@H](O)[C@H]1O)[C@H](O)CO

CC[N+](C)(CC)CCOC(=O)C(O)(C1CCCCC1)C1=CC=CC=C1

CCC1=C(C)CN(C(=O)NCCC2=CC=C(C=C2)S(=O)(=O)NC(=O)N[C@H]2CC[C@H](C)CC2)C1=O

DB00219

DB00222

4

5

[H][C@]12CC[C@]([H])(C[C@@H](C1)OC(=O)C(O)(C1=CC=CC=C1)C1=CC=CC=C1)[N+]21CCCC1

COC1=C(OC)C(CS(=O)C2=NC3=C(N2)C=C(OC(F)F)C=C3)=NC=C1

DB00209

DB00213

2

3

Canonical SMILES

CC1=C(C)C=C2N(C[C@H](O)[C@H](O)[C@H](O)CO)C3=NC(=O)NC(=O)C3=NC2=C1

DrugBankID

DB00140

No.

1

Table 4.2 Canonical SMILES of 551 drugs from DrugBank with labels (ATC codes) ATC

(continued)

A16

A07

A03

A02

A01

A10

A07

A06

A02

A02

A03

A10

A03

A02

A07

A10

A10

A03

A02

A03

A11

106 Y. Wu et al.

OC1=CC=C2C[C@H]3 N(CC=C)CC[C@@]45[C@@H](OC1=C24)C(=O)CC[C@@]35O

DB02329

36

[H][C@@]12C[C@]1([H])N([C@@H](C2)C#N)C(=O)[C@@H](N)C12CC3CC(CC(O)(C3)C1)C2

[H][C@]1(O[C@H](CO)[C@@H](O)[C@H](O)[C@H]1O)C1=CC=C(C)C(CC2=CC=C(S2)C2=CC=C(F)C=C2)=C1

DB06335

DB08907

41

42

OC12CC3CC(C1)CC(C3)(C2)NCC(=O)N1CCC[C@H]1C#N

CCOC1=CC=C(CC2=C(Cl)C=CC(=C2)[C@@H]2O[C@H](CO)[C@@H](O)[C@H](O)[C@H]2O)C=C1

DB04876

DB06292

39

40

[H][C@@]12CC[C@H](O)[C@@]1(C)CC[C@@]1([H])[C@@]2([H])CC[C@@]2([H])CC(=O)CC[C@]12C

OC1=CC=CC2=C1C(=O)C1=C(C=CC=C1O)C2=O

DB02901

DB04816

37

38

[H][C@@]1(CC[C@@]2(C)[C@@]([H])(CC[C@]3(C)[C@]2([H])C(=O)C=C2[C@]4([H])C[C@](C)(CC[C@]4(C)CC[C@@]32C) C(O)=O)C1(C)C)OC(=O)CCC(O)=O

CC(C)CCC[C@@H](C)[C@@]1([H])CC[C@@]2([H])\C(CCC[C@]12C)=C\C=C1\C[C@@H](O)C[C@H](O)C1=C

CC(C)[N+](C)(CCC(C(N)=O)(C1=CC=CC=C1)C1=CC=CC=C1)C(C)C

DB01436

DB01625

34

COCCOC1=CN=C(NS(=O)(=O)C2=CC=CC=C2)N=C1

35

DB01183

DB01382

32

33

CCC1=CN=C(CCOC2=CC=C(CC3SC(=O)NC3=O)C=C2)C=C1

CC(NC(C)(C)C)C(=O)C1=CC(Cl)=CC=C1

DB01132

DB01156

30

31

[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2NC(=O)[C@H](N)C1=CC=C(O)C=C1)C(O)=O

COCCCOC1=C(C)C(CS(=O)C2=NC3=CC=CC=C3N2)=NC=C1

DB01060

DB01129

28

29

C[N+]1(C)CCC(C1)OC(=O)C(O)(C1CCCC1)C1=CC=CC=C1

CC(C)C1=CC2=C(OC3=NC(N)=C(C=C3C2=O)C(O)=O)C=C1

DB00986

DB01025

26

27

CC1=NC=C(N1CCO)[N+]([O-])=O

NC(N)=NC1=NC(CSCCC(N)=NS(N)(=O)=O)=CS1

DB00916

DB00927

NC(=N)NC(=N)NCCC1=CC=CC=C1

DB00914

23

24

CCOC1=C(C=CC(CC(=O)N[C@@H](CC(C)C)C2=CC=CC=C2N2CCCCC2)=C1)C(O)=O

DB00912

25

Canonical SMILES

DrugBankID

No.

22

Table 4.2 (continued)

A10

A10

A10

A10

A06

A14

A02

A03

A11

A10

A06

A08

A10

A02

A02

A01

A03

A02

A02

A10

A10

ATC

4 Drug Effect Deep Learner Based on Graphical Convolutional Network 107

[H][C@@]12CC[C@](C)(O)[C@@]1(C)CC[C@@]1([H])[C@@]2([H])CCC2=CC(=O)C=C[C@]12C

OC[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO

DB01638

63

OC[C@@H](O)[C@@H](O)[C@H](O)[C@H](O)CO

CC(=O)OC1=CC=CC=C1C(O)=O

DB00742

DB00945

61

62

CC[N+]1(C)CCCC(C1)OC(=O)C(O)(C1=CC=CC=C1)C1=CC=CC=C1

NC1=NC(=O)C2=NC(CNC3=CC=C(C=C3)C(=O)N[C@@H](CCC(O)=O)C(O)=O)=CN=C2N1

DB13844

DB00158

59

[I-].C[N+]1(CCC(O)(C2=CC=CS2)C2=CC=CC=C2)CCOCC1

60

DB13586

DB13666

57

58

C[N+]1(C)CCCCC1COC(=O)C(O)(C1=CC=CC=C1)C1=CC=CC=C1

CCCCCCCCCCCC[N+](CCO)(CCO)CC1=CC=CC=C1

DB13542

DB13565

55

56

CNCC(=O)C1=CC(O)=C(O)C=C1

CCCCCCCCOC1=CC=CC=C1C(=O)NC1=CC=C(C=C1)C(=O)OCC[N+](C)(CC)CC

DB13394

DB13500

53

54

CCOC1=CC=C(CC2=CC(=CC=C2Cl)[C@]23OC[C@](CO)(O2)[C@@H](O)[C@H](O)[C@H]3O)C=C1

CC(C)(C)OC[C@H]1N(CCNC1=O)C(=O)C[C@H](N)CC1=CC(F)=C(F)C=C1F

DB11827

DB12625

51

52

C[N+]1(C)[C@H]2C[C@@H](C[C@@H]1[C@H]1O[C@@H]21)OC(=O)[C@H](CO)C1=CC=CC=C1

CC(C)(O)C(Cl)(Cl)Cl

DB11315

DB11386

49

50

[Ca++].[H][C@@](O)(CO)[C@@]([H])(O)[C@]([H])(O)[C@@]([H])(O)C([O-])=O.[H][C@@](O)(CO)[C@@]([H])(O)[C@]([H])(O)[C@@]([H])(O)C([O-])=O

[Ca++].[Ca++].[Ca++].OC(CC([O-])=O)(CC([O-])=O)C([O-])=O.OC(CC([O-])=O)(CC([O-])=O)C([O-])=O

DB11093

DB11126

47

48

CCCC[N+]1(C)[C@H]2C[C@@H](C[C@@H]1[C@H]1O[C@@H]21)OC(=O)[C@H](CO)C1=CC=CC=C1

OCC(O)CO

DB09300

DB09462

OS(=O)(=O)OC1=CC=C(C=C1)C(C1=CC=C(OS(O)(=O)=O)C=C1)C1=CC=CC=N1

DB09268

44

45

OC[C@H]1O[C@H]([C@H](O)[C@@H](O)[C@@H]1O)C1=CC=C(Cl)C(CC2=CC=C(O[C@H]3CCOC3)C=C2)=C1

DB09038

46

Canonical SMILES

DrugBankID

No.

43

Table 4.2 (continued)

B05

B01

B05

B03

A03

A03

A14

A01

A03

A03

A01

A10

A10

A04

A03

A12

A12

A06

A03

A06

A10

ATC

[H][C@@]12CCC[C@]1([H])N([C@@H](C2)C(O)=O)C(=O)[C@H](C)N[C@@H](CCC1=CC=CC=C1)C(=O)OCC

CC(C)NCC(O)C1=CC=C(NS(C)(=O)=O)C=C1

CCC(=O)O[C@@H](O[P@](=O)(CCCCC1=CC=CC=C1)CC(=O)N1C[C@@H](C[C@H]1C(O)=O)C1CCCCC1)C(C)C

DB00489

DB00492

82

83

NS(=O)(=O)C1=CC2=C(NC(CC3=CC=CC=C3)NS2(=O)=O)C=C1C(F)(F)F

DB00381

DB00436

80

81

CCOC(=O)C1=C(COCCN)NC(C)=C(C1C1=CC=CC=C1Cl)C(=O)OC

COC1=CC=C(C=C1)[C@@H]1SC2=C(C=CC=C2)N(CCN(C)C)C(=O)[C@@H]1OC(C)=O

[H][C@](O)(CNC(C)(C)C)COC1=NSN=C1N1CCOCC1

DB00343

DB00373

78

79

CCCC1=NC(=C(N1CC1=CC=C(C=C1)C1=C(C=CC=C1)C1=NN=NN1)C(O)=O)C(C)(C)O

CC(C)NCC(O)COC1=CC=C(CC(N)=O)C=C1

DB00275

DB00335

76

77

CN1C(CCl)NC2=CC(Cl)=C(C=C2S1(=O)=O)S(N)(=O)=O

COCCC1=CC=C(OCC(O)CNC(C)C)C=C1

DB00232

DB00264

74

75

CC(C)NC(=O)NS(=O)(=O)C1=C(NC2=CC=CC(C)=C2)C=CN=C1

[H][C@]12[C@H](C[C@@H](C)C=C1C=C[C@H](C)[C@@H]2CC[C@@H]1C[C@@H](O)CC(=O)O1)OC(=O)[C@@H](C)CC

DB00214

DB00227

72

[H][C@]12C[C@@H](OC(=O)C3=CC(OC)=C(OC)C(OC)=C3)[C@H](OC)[C@@H](C(=O)OC)[C@@]1([H])C[C@@]1([H])N(CCC3=C1NC1=C3C=CC(OC)=C1)C2

73

DB00178

DB00206

70

71

[H][C@]12[C@H](C[C@H](O)C=C1C=C[C@H](C)[C@@H]2CC[C@@H](O)C[C@@H](O)CC(O)=O)OC(=O)[C@@H](C)CC

CCCCC(=O)N(CC1=CC=C(C=C1)C1=CC=CC=C1C1=NNN=N1)[C@@H](C(C)C)C(O)=O

DB00175

DB00177

68

69

[Mg++].[Mg++].[Mg++].OC(CC([O-])=O)(CC([O-])=O)C([O-])=O.OC(CC([O-])=O)(CC([O-])=O)C([O-])=O

OC(C(O)=O)C1=CC=CC=C1

DB11110

DB13218

O.O.O.[OH-].[O-].[O-].[O-].[O-].[O-].[O-].[O-].[O-].[Na+].[Na+].[Fe+3].[Fe+3].[Fe+3].[Fe+3].[Fe+3].OC[C@H]1O[C@@](CO)(O[C@H]2O[C@H](CO)[C@@H](O)[C@H](O)[C@H]2O)[C@@H](O)[C@@H]1O

DB09146

65

66

NC(CO)(CO)CO

DB03754

67

Canonical SMILES

DrugBankID

No.

64

Table 4.2 (continued)

C09

C07

C03

C09

C07

C08

C07

C09

C07

C03

C10

C03

C02

C09

C09

C10

B05

B05

B03

B05

ATC

NS(=O)(=O)C1=CC2=C(NCNS2(=O)=O)C=C1C(F)(F)F

CCOC(=O)[C@H](CCC1=CC=CC=C1)N[C@@H](C)C(=O)N1CC2=CC=CC=C2C[C@H]1C(O)=O

DB00881

104

CCCCC1=NC=C(\C=C(/CC2=CC=CS2)C(O)=O)N1CC1=CC=C(C=C1)C(O)=O

NS(=O)(=O)C1=C(Cl)C=C2NC=NS(=O)(=O)C2=C1

DB00876

DB00880

102

103

CN1C=NC2=C1C(=O)N(CCCCC(C)=O)C(=O)N2C

CC1CC2=CC=CC=C2N1NC(=O)C1=CC(=C(Cl)C=C1)S(N)(=O)=O

DB00806

DB00808

100

[H][C@]12C[C@H](N(C(=O)[C@H](C)N[C@@H](CCC)C(=O)OCC)[C@@]1([H])CCCC2)C(O)=O

101

DB00774

DB00790

98

99

NCCCC[C@H](N[C@@H](CCC1=CC=CC=C1)C(O)=O)C(=O)N1CCC[C@H]1C(O)=O

CCCCC[C@H](O)\C=C\[C@H]1[C@H](O)CC(=O)[C@@H]1CCCCCCC(O)=O

DB00722

DB00770

96

97

CCOC(=O)[C@H](CCC1=CC=CC=C1)N[C@@H](C)C(=O)N1CC2=CC(OC)=C(OC)C=C2C[C@H]1C(O)=O

NS(=O)(=O)C1=C(Cl)C=C(NCC2=CC=CO2)C(=C1)C(O)=O

DB00691

DB00695

94

95

O[C@@H]1CO[C@@H](O[C@@H]2CO[C@@H](O)[C@H](OS(O)(=O)=O)[C@H]2OS(O)(=O)=O)[C@H](OS(O)(=O)=O)[C@H]1OS(O)(=O)=O

CCCCC1=NC(Cl)=C(CO)N1CC1=CC=C(C=C1)C1=CC=CC=C1C1=NNN=N1

DB00678

DB00686

92

93

CC(C)NCC(O)COC1=CC=C(COCCOC(C)C)C=C1

OC(=O)C1=CN=CC=C1

DB00612

DB00627

90

91

CCOC(=O)[C@H](CCC1=CC=CC=C1)N[C@@H](C)C(=O)N1CCC[C@H]1C(O)=O

CC(CCC1=CC=CC=C1)NCC(O)C1=CC(C(N)=O)=C(O)C=C1

DB00584

DB00598

88

89

[H][C@@]1(CCC2=CC=CC=C2N(CC(O)=O)C1=O)N[C@@H](CCC1=CC=CC=C1)C(=O)OCC

CC(C)NCC(O)COC1=CC=CC2=C1C=CC=C2

DB00542

DB00571

CC1NC2=CC(Cl)=C(C=C2C(=O)N1C1=CC=CC=C1C)S(N)(=O)=O

DB00524

85

86

[H][C@@]12C[C@H](N(C(=O)[C@H](C)N[C@@H](CCC3=CC=CC=C3)C(=O)OCC)[C@@]1([H])CCCC2)C(O)=O

DB00519

87

Canonical SMILES

DrugBankID

No.

84

Table 4.2 (continued)

C09

C03

C09

C03

C04

C09

C03

C01

C09

C03

C09

C05

C09

C10

C07

C07

C09

C07

C09

C03

C09

ATC

[H][C@]12C[C@@H](OC(=O)C=CC3=CC(OC)=C(OC)C(OC)=C3)[C@H](OC)[C@@H](C(=O)OC)[C@@]1([H])C[C@@]1([H])N(CCC3=C1NC1=C3C=CC(OC)=C1)C2

CCCC(=O)NC1=CC=C(OCC(O)CNC(C)C)C(=C1)C(C)=O

DB01180

118

CC(C)C1=NC(=NC(C2=CC=C(F)C=C2)=C1\C=C\[C@@H](O)C[C@@H](O)CC(=O)O)N(C)S(C)(=O)=O

COC1=C(OC)C=C(CCNCC(O)COC2=CC=CC(C)=C2)C=C1

DB01295

123

CC(C)(C)NCC(O)COC1=CC=CC2=C1C[C@H](O)[C@H](O)C2

NNC1=NN=CC2=CC=CC=C12

DB01203

DB01275

121

122

C[C@H](CS)C(=O)N1CCC[C@H]1C(O)=O

DB01193

DB01197

119

120

COC1=CC=CC=C1OCCNCC(O)COC1=CC=CC2=C1C1=CC=CC=C1N2

DB01098

DB01136

116

[H][C@]12C[C@@H](OC(=O)C3=CC(OC)=C(OC)C(OC)=C3)[C@H](OC)[C@@H](C(=O)OC)[C@@]1([H])C[C@@]1([H])N(CCC3=C1NC1=CC=CC=C31)C2

DB01089

115

117

CC(C)NCC(O)C1=CC(O)=C(O)C=C1

CC(C)C1=C(C(=O)NC2=CC=CC=C2)C(=C(N1CC[C@@H](O)C[C@@H](O)CC(O)=O)C1=CC=C(F)C=C1)C1=CC=CC=C1

DB01064

DB01076

113

114

CCCCC1=NC2(CCCC2)C(=O)N1CC1=CC=C(C=C1)C1=CC=CC=C1C1=NNN=N1

[H][C@@]12C[C@@]3([H])[C@]4([H])C[C@H](F)C5=CC(=O)C=C[C@]5(C)[C@@]4(F)[C@@H](O)C[C@]3(C)[C@@]1(OC(C)(C)O2)C(=O)COC(C)=O

DB01029

DB01047

111

112

NS(=O)(=O)C1=C(Cl)C=C2NCNS(=O)(=O)C2=C1

CCOC(=O)C1=C(C)NC(C)=C(C1C1=C(Cl)C(Cl)=CC=C1)C(=O)OC

DB00999

DB01023

109

110

CCCC1=NC2=C(C=C(C=C2C)C2=NC3=CC=CC=C3N2C)N1CC1=CC=C(C=C1)C1=CC=CC=C1C(O)=O

[H][C@]1(CC[C@H](O)C2=CC=C(F)C=C2)C(=O)N(C2=CC=C(F)C=C2)[C@]1([H])C1=CC=C(O)C=C1

DB00966

DB00973

CC(C)NCC(O)COC1=CC=CC2=C1C=CN2

DB00960

106

107

[H][C@@]12CCN(C[C@@H]1C=C)[C@]([H])(C2)[C@@H](O)C1=C2C=C(OC)C=CC2=NC=C1

DB00908

108

Canonical SMILES

DrugBankID

No.

105

Table 4.2 (continued)

C07

C02

C07

C09

C07

C02

C07

C10

C02

C10

C01

C05

C09

C07

C09

C10

C09

C07

C01

ATC

CCCCC1=NC(C)=C(CC(=S)N(C)C)C(=O)N1CC1=CC=C(C=C1)C1=C(C=CC=C1)C1=NN=NN1

DB09279

143

COCCCOC1=C(OC)C=CC(C[C@@H](C[C@H](N)[C@@H](O)C[C@@H](C(C)C)C(=O)NCC(C)(C)C(N)=O)C(C)C)=C1

COC(=O)C1=C(C)NC(C)=C(C1C1=CC(=CC=C1)N(=O)=O)C(=O)OCCN1CCN(CC1)C(C1=CC=CC=C1)C1=CC=CC=C1

DB09026

DB09238

COC1=C(O)C=C(C=C1)C1=CC(=O)C2=C(O)C=C(O[C@@H]3O[C@H](CO[C@@H]4O[C@@H](C)[C@H](O)[C@@H](O)[C@H]4O)[C@@H](O)[C@H](O)[C@H]3O)C=C2O1

141

DB08995

140

CCOC1=NC2=C(N1CC1=CC=C(C=C1)C1=CC=CC=C1C1=NOC(=O)N1)C(=CC=C2)C(=O)OCC1=C(C)OC(=O)O1

CCNCC(O)C1=CC(O)=CC=C1

142

DB08822

DB08985

138

139

CC(N\C(NC#N)=N\C1=CC=NC=C1)C(C)(C)C

CC1=CC2=C(N1)C=CC=C2OCC(CNC(C)(C)C)OC(=O)C1=CC=CC=C1

DB06762

DB08807

136

137

CC(CCC1=CC=CC=C1)NC(C)C(O)C1=CC=C(O)C=C1

COC([C@H](OC1=NC(C)=CC(C)=N1)C(O)=O)(C1=CC=CC=C1)C1=CC=CC=C1

DB06152

DB06403

134

135

OCC1=CC=CN=C1

OC(CNCC(O)C1CCC2=C(O1)C=CC(F)=C2)C1CCC2=C(O1)C=CC(F)=C2

DB04145

DB04861

132

133

NS(=O)(=O)C1=CC(=CC(N2CCCC2)=C1OC1=CC=CC=C1)C(O)=O

OC1=CC=CC=C1

DB02925

DB03255

130

131

C[C@@H]1O[C@@H](OC[C@H]2O[C@@H](OC3=C(OC4=CC(O)=CC(O)=C4C3=O)C3=CC(O)=C(O)C=C3)[C@H](O)[C@@H](O)[C@@H]2O)[C@H](O)[C@H](O)[C@H]1O

CC(C)NCC(O)COC1=CC=CC=C1OCC=C

DB01580

DB01698

128

129

CC(C)(C)NC[C@H](O)COC1=CC=CC=C1C1CCCC1

CN1C=NC2=C1C(=O)NC(=O)N2C

DB01359

DB01412

CCOC(=O)[C@H](CCC1=CC=CC=C1)N[C@H]1CCCN2CCC[C@H](N2C1=O)C(O)=O

DB01340

125

126

CN1C(CSCC(F)(F)F)NC2=CC(Cl)=C(C=C2S1(=O)=O)S(N)(=O)=O

DB01324

127

Canonical SMILES

DrugBankID

No.

124

Table 4.2 (continued)

C09

C08

C09

C05

C01

C09

C07

C02

C02

C04

C07

C10

C05

C03

C05

C07

C03

C07

C09

C03

ATC

DB00240

161

[H][C@@]12C[C@@H](C)[C@](O)(C(=O)CO)[C@@]1(C)C[C@H](O)[C@@]1([H])[C@@]2([H])[C@H](Cl)CC2=CC(=O)C=C[C@]12C

C[C@H](CCCC(C)(C)O)[C@@]1([H])CC[C@@]2([H])\C(CCC[C@]12C)=C\C=C1\C[C@@H](O)C[C@H](O)C1=C

COC1=C(C=C(C=C1)C1=CC2=C(C=C1)C=C(C=C2)C(O)=O)C12CC3CC(CC(C3)C1)C2

DB00136

DB00210

159

160

C[C@H]1CCC[C@@H](C)N1NC(=O)C1=CC=C(Cl)C(=C1)S(N)(=O)=O

CCOC1=NC2=CC=CC(C(O)=O)=C2N1CC1=CC=C(C=C1)C1=CC=CC=C1C1=NN=NN1

DB13792

DB13919

157

158

CO[C@H]1[C@@H](C[C@@H]2CN3CCC4=C(NC5=CC=C(OC)C=C45)[C@H]3C[C@@H]2[C@@H]1C(=O)OC)OC(=O)C1=CC(OC)=C(OC)C(OC)=C1

NS(=O)(=O)C1=C(Cl)C=C2CN(C3CCCCC3)C(=O)C2=C1

DB13617

DB13631

155

156

OC1=CC(=C(O)C=C1)S(O)(=O)=O

DB13430

DB13529

153

154

CCC(C)C(C)C1NC2=CC(Cl)=C(C=C2S(=O)(=O)N1)S(N)(=O)=O

COC1=CC(=CC=C1)C(=O)CCN[C@@H](C)[C@H](O)C1=CC=CC=C1

CN(CC1(C)CCCO1)S(=O)(=O)C1=CC(=C(Cl)C=C1)S(N)(=O)=O

DB13398

DB13405

151

152

CCOC(=O)[C@H](CCC1=CC=CC=C1)N[C@@H](C)C(=O)N(CC(O)=O)C1CC2=C(C1)C=CC=C2

[H][C@@]12CCCN1C(=O)[C@H](CC1=CC=CC=C1)N1C(=O)[C@](NC(=O)[C@H]3CN(C)[C@]4([H])CC5=CNC6=CC=CC(=C56)[C@@]4([H])C3)(O[C@@]21O)C(C)C

DB13312

DB13345

149

150

[H][C@@]1(CC[C@]2(O)[C@]3([H])CCC4=C[C@]([H])(CC[C@]4(C)[C@@]3([H])CC[C@]12C)O[C@]1([H])O[C@@]([H])(C)[C@]([H])(O)[C@@]([H])(O)[C@@]1([H])O)C1=COC(=O)C=C1

CCC1=C(C(=O)C2=CC(I)=C(O)C(I)=C2)C2=CC=CC=C2O1

DB13277

DB13307

147

148

C[C@H](CSC(=O)C1=CC=CC=C1)C(=O)N1C[C@H](C[C@H]1C(O)=O)SC1=CC=CC=C1

CCOC1O[C@H]([C@@H](COCC2=CC=CC=C2)OCC2=CC=CC=C2)[C@H](OCC2=CC=CC=C2)[C@H]1O

DB13166

DB13227

145

146

[Zn++].[H][C@@](O)(CO)[C@@]([H])(O)[C@]([H])(O)[C@@]([H])(O)C([O-])=O.[H][C@@](O)(CO)[C@@]([H])(O)[C@]([H])(O)[C@@]([H])(O)C([O-])=O

DB11248

Canonical SMILES

DrugBankID

No.

144

Table 4.2 (continued)

D07

D10

D05

C09

C03

C02

C03

C05

C03

C03

C01

C04

C09

C01

C01

C05

C09

C05

ATC

CC1(C)O[C@@H]2C[C@H]3[C@@H]4C[C@H](F)C5=CC(=O)CC[C@]5(C)[C@H]4[C@@H](O)C[C@]3(C)[C@@]2(O1)C(=O)CO

[H][C@@]12CC[C@H](C(=O)NC(C)(C)C)[C@@]1(C)CC[C@@]1([H])[C@@]2([H])CC[C@@]2([H])NC(=O)C=C[C@]12C

DB01216

182

CC1=CC(=O)N(O)C(=C1)C1CCCCC1

CCC[C@@H]1C[C@H](N(C)C1)C(=O)NC(C(C)Cl)[C@H]1O[C@H](SC)[C@H](O)[C@@H](O)[C@H]1O

DB01188

DB01190

180

181

OC(=O)C1=CC=CC=C1O

[H][C@@]12CC[C@](O)(C(=O)CO)[C@@]1(C)C[C@H](O)[C@@]1([H])[C@@]2([H])C[C@H](C)C2=CC(=O)C=C[C@]12C

DB00936

DB00959

178

[H][C@@]12CC[C@](O)(C(=O)CO)[C@@]1(C)C[C@H](O)[C@@]1([H])[C@@]2([H])CCC2=CC(=O)C=C[C@]12C

179

DB00846

DB00860

176

177

[H][C@@]12C[C@@H](C)[C@](O)(C(=O)CO)[C@@]1(C)C[C@H](O)[C@@]1(F)[C@@]2([H])C[C@H](F)C2=CC(=O)C=C[C@]12C

N1C2=CC=CC=C2N=C1C1=CSC=N1

DB00663

DB00730

174

175

[H][C@@]12C[C@@H](O)[C@](O)(C(=O)CO)[C@@]1(C)C[C@H](O)[C@@]1(F)[C@@]2([H])CCC2=CC(=O)C=C[C@]12C

CC(=O)NS(=O)(=O)C1=CC=C(N)C=C1

DB00620

DB00634

172

173

CCCCOC1=NC2=CC=CC=C2C(=C1)C(=O)NCCN(CC)CC

[H][C@@]12C[C@@H](C)[C@H](C(=O)CO)[C@@]1(C)C[C@H](O)[C@@]1(F)[C@@]2([H])CCC2=CC(=O)C=C[C@]12C

DB00527

DB00547

170

171

OC[C@@H](NC(=O)C(Cl)Cl)[C@H](O)C1=CC=C(C=C1)[N+]([O-])=O

NC[C@@H]1O[C@H](O[C@@H]2[C@@H](CO)O[C@@H](O[C@@H]3[C@@H](O)[C@H](N)C[C@H](N)[C@H]3O[C@H]3O[C@H](CN)[C@@H](O)[C@H](O)[C@H]3N)[C@@H]2O)[C@H](N)[C@@H](O)[C@@H]1O

DB00446

DB00452

168

169

[H][C@@]12C[C@H](C)[C@](OC(=O)CC)(C(=O)COC(=O)CC)[C@@]1(C)C[C@H](O)[C@@]1(Cl)[C@@]2([H])CCC2=CC(=O)C=C[C@]12C

COC1=CC(OC)=C(Cl)C2=C1C(=O)[C@]1(O2)[C@H](C)CC(=O)C=C1OC

DB00394

DB00400

166

167

NC(=O)N\N=C\C1=CC=C(O1)[N+]([O-])=O

NC1=CC(=NC(N)=[N+]1[O-])N1CCCCC1

DB00336

DB00350

[H][C@@]12CC[C@](O)(C(C)=O)[C@@]1(C)C[C@H](O)[C@@]1(F)[C@@]2([H])C[C@H](C)C2=CC(=O)C=C[C@]12C

DB00324

163

164

NC1=NC(=O)C2=C(N1)N(CCC(CO)CO)C=N2

DB00299

165

Canonical SMILES

DrugBankID

No.

162

Table 4.2 (continued)

D11

D10

D01

D10

D01

D07

D07

D01

D07

D10

D07

D07

D04

D09

D06

D01

D07

D11

D08

D07

D06

ATC

[H][C@]1(C)C[C@@]2([H])[C@]3([H])CCC4=CC(=O)C=C[C@]4(C)[C@@]3(F)C(=O)C[C@]2(C)[C@@]1(O)C(=O)CCl

[H][C@@]12C[C@@H](C)[C@](OC(=O)C3=CC=CO3)(C(=O)CCl)[C@@]1(C)C[C@H](O)[C@@]1(Cl)[C@@]2([H])CCC2=CC(=O)C=C[C@]12C

DB13158

DB14512

202

203

O.[OH-].[OH-].[OH-].[OH-].[OH-].[Al+3].[Al+3].[Cl-]

[H][C@@]12C[C@H](C)[C@](O)(C(=O)CCl)[C@@]1(C)C[C@H](O)[C@@]1(F)[C@@]2([H])CCC2=CC(=O)C=C[C@]12C

DB11573

DB11750

200

201

CC(C)(C)CC(C)(C)C1=CC=C(OCCOCC[N+](C)(C)CC2=CC=CC=C2)C=C1

OC1=CC=CC2=C1C(=O)C1=C(O)C=CC=C1C2

DB11125

DB11157

198

OC1=CC(O)=CC=C1

199

DB09357

DB11085

196

197

CC(C)(CO)[C@@H](O)C(=O)NCCCO

[H][C@@]12C[C@@]3([H])C(=C(O)[C@]1(O)C(=O)C(C(N)=O)=C(O)[C@H]2N(C)C)C(=O)C1=C(O)C=CC(Cl)=C1[C@@]3(C)O

[H][C@@]12C[C@@H](C)[C@H](C(=O)CO)[C@@]1(C)C[C@H](O)[C@@]1(F)[C@@]2([H])C[C@H](F)C2=CC(=O)C=C[C@]12C

DB09093

DB09095

194

[H][C@@]1(C)C[C@@]2([H])[C@]3([H])C[C@]([H])(F)C4=CC(=O)C=C[C@]4(C)[C@@]3([H])[C@@]([H])(O)C[C@]2(C)[C@@]1([H])C(=O)CO

DB08971

193

195

OC1=CC(Cl)=CC=C1OC1=C(Cl)C=C(Cl)C=C1

[H][C@@]12CC(=C)[C@](O)(C(=O)CO)[C@@]1(C)C[C@H](O)[C@@]1(F)[C@@]2([H])CCC2=CC(=O)C=C[C@]12C

DB08604

DB08970

191

192

OC1=C(SC2=C(O)C(Cl)=CC(Cl)=C2)C=C(Cl)C=C1Cl

[Ag+].NC1=CC=C(C=C1)S(=O)(=O)[N-]C1=NC=CC=N1

DB04813

DB05245

189

190

CCCO

NC(N)=O

DB03175

DB03904

187

188

O[C@H](\C=C\[C@@H](C)[C@@]1([H])CC[C@@]2([H])\C(CCC[C@]12C)=C\C=C1\C[C@@H](O)C[C@H](O)C1=C)C1CC1

NC1=CC=C(C=C1)C(O)=O

DB02300

DB02362

CC(C)(CO)[C@@H](O)C(=O)NCCC(O)=O

DB01783

184

185

[H][C@@]12C[C@H]3OC(CCC)O[C@@]3(C(=O)CO)[C@@]1(C)C[C@H](O)[C@@]1([H])[C@@]2([H])CCC2=CC(=O)C=C[C@]12C

DB01222

186

Canonical SMILES

DrugBankID

No.

183

Table 4.2 (continued)

D07

D07

D07

D09

D05

D08

D10

D03

D07

D06

D07

D07

D08

D06

D10

D02

D08

D02

D05

D03

D07

ATC

[H][C@@]12CC[C@H](C(C)=O)[C@@]1(C)CC[C@]1([H])[C@@]2([H])C=CC2=CC(=O)CC[C@@]12C

CCOC1=CC=CC=C1OCCN[C@H](C)CC1=CC(=C(OC)C=C1)S(N)(=O)=O

DB00706

224

COC1=NC=CN=C1NS(=O)(=O)C1=CC=C(N)C=C1

CN1N=C(SC1=NC(C)=O)S(N)(=O)=O

DB00664

DB00703

222

223

[H][C@@]12CC[C@H](O)[C@@]1(C)CC[C@@]1([H])[C@@]2([H])CCC2=CC(=O)CC[C@]12C

[H][C@@]12CCC(=O)[C@@]1(C)CC[C@]1([H])C3=C(CC[C@@]21[H])C=C(O)C=C3

DB00624

DB00655

220

221

NS(=O)(=O)C1=C(Cl)C=C2NC(NS(=O)(=O)C2=C1)C1CC2CC1C=C2

[H][C@@]12CC[C@](OC(C)=O)(C(C)=O)[C@@]1(C)CC[C@@]1([H])[C@@]2([H])C[C@H](C)C2=CC(=O)CC[C@]12C

DB00603

DB00606

218

219

COC1=CC=CC=C1OC1=C(NS(=O)(=O)C2=CC=C(C=C2)C(C)(C)C)N=C(N=C1OCCO)C1=NC=CC=N1

[H][C@@]12CC3=CNC4=CC=CC(=C34)C1=C[C@@H](CN2C)NC(=O)N(CC)CC

DB00559

DB00589

216

[H][C@@]12CC[C@H](C(C)=O)[C@@]1(C)CC[C@@]1([H])[C@@]2([H])CCC2=CC(=O)CC[C@]12C

217

DB00378

DB00396

214

215

[H][C@@]12CC3=CNC4=CC=CC(=C34)C1=C[C@H](CN2C)C(=O)N[C@@H](CC)CO

[H][C@@]12CC[C@@](O)(C#C)[C@@]1(CC)CC[C@]1([H])[C@@]3([H])CCC(=O)C=C3CC[C@@]21[H]

DB00353

DB00367

212

213

COC1=CC2=C(C=C1OC)C(N)=NC(=N2)N(C)CCCNC(=O)C1CCCO1

[H][C@@]12CC[C@](OC(C)=O)(C(C)=O)[C@@]1(C)CC[C@@]1([H])[C@@]2([H])C=C(C)C2=CC(=O)CC[C@]12C

DB00346

DB00351

210

211

[H][C@@]12CC[C@@](O)(C#C)[C@@]1(CC)CC(=C)[C@]1([H])[C@@]3([H])CCCC=C3CC[C@@]21[H]

NS(=O)(=O)C1=C(Cl)C=CC(=C1)C1(O)NC(=O)C2=CC=CC=C12

DB00304

DB00310

208

209

CCCC1=NN(C)C2=C1N=C(NC2=O)C1=CC(=CC=C1OCC)S(=O)(=O)N1CCN(C)CC1

NC1=CC=C(C=C1)S(N)(=O)=O

DB00203

DB00259

[H][C@@]1(OC(=O)C(O)=C1O)[C@@H](O)CO

DB00126

205

206

[H][C@@]12CC[C@](OC(=O)CCC)(C(=O)CO)[C@@]1(C)C[C@H](O)[C@@]1([H])[C@@]2([H])CCC2=CC(=O)CC[C@]12C

DB14540

207

Canonical SMILES

DrugBankID

No.

204

Table 4.2 (continued)

G04

G01

G01

G03

G03

G01

G03

G02

G01

G03

G03

G03

G02

G03

G04

G01

G03

G01

G01

G01

D07

ATC

[H][C@@]12CC[C@@](OC(C)=O)(C#C)[C@@]1(CC)CC[C@]1([H])[C@@]3([H])CC\C(C=C3CC[C@@]21[H])=N/O

NS(=O)(=O)C1=C(Cl)C=C2NC(NS(=O)(=O)C2=C1)C(Cl)Cl

CC1=NS(=O)(=O)C2=C(N1)C=CC(Cl)=C2

[H][C@@]1(CC[C@@]2([H])[C@]3([H])CC[C@@]4([H])NC(=O)C=C[C@]4(C)[C@@]3([H])CC[C@]12C)C(=O)NC1=CC(=CC=C1C(F)(F)F)C(F)(F)F

CCC1NC(=O)C2=CC(=C(Cl)C=C2N1)S(N)(=O)=O

DB01021

DB01119

DB01126

DB01325

241

243

244

[H][C@@]12CC[C@@](O)(C#C)[C@@]1(C)CC[C@]1([H])C3=C(CC[C@@]21[H])C=C(O)C=C3

242

DB00957

DB00977

239

240

NC1=CC=C(C=C1)S(=O)(=O)NC1=CC=CC=N1

CCCCC(C)(O)C\C=C\[C@H]1[C@H](O)CC(=O)[C@@H]1CCCCCCC(=O)OC

DB00891

DB00929

237

238

CCCCNC1=C(OC2=CC=CC=C2)C(=CC(=C1)C(O)=O)S(N)(=O)=O

[H]\C(C)=C(/C(=C(\[H])C)/C1=CC=C(O)C=C1)\C1=CC=C(O)C=C1

DB00887

DB00890

235

236

[H][C@@]12CC[C@@](O)(C#CC)[C@@]1(C)C[C@H](C1=CC=C(C=C1)N(C)C)C1=C3CCC(=O)C=C3CC[C@@]21[H]

CCN[C@H]1C[C@H](C)S(=O)(=O)C2=C1C=C(S2)S(N)(=O)=O

DB00834

DB00869

233

234

[H][C@]12CC3=C(NC4=CC=CC=C34)[C@H](N1C(=O)CN(C)C2=O)C1=CC2=C(OCO2)C=C1

[H][C@@]12C[C@H](O)C[C@]3(O)C[C@H](O)[C@@H](C(O)=O)[C@]([H])(C[C@@H](O[C@]4([H])O[C@H](C)[C@@H](O)[C@H](N)[C@@H]4O)\C=C\C=C\C=C\C=C\C[C@@H](C)OC(=O)\C=C\[C@@]1([H])O2)O3

DB00820

DB00826

231

232

OC(=O)C1=CC(=CC=C1O)\N=N\C1=CC=C(C=C1)S(=O)(=O)NC1=NC=CC=C1

CC(=O)NC1=NN=C(S1)S(N)(=O)=O

DB00795

DB00819

229

230

[H][C@@]12CC[C@H](O)[C@@]1(C)CC[C@]1([H])C3=C(CC[C@@]21[H])C=C(O)C=C3

COC1=CC2=C(C=C1)C=C(C=C2)[C@H](C)C(O)=O

DB00783

DB00788

[H][C@@]12CC[C@@](O)(C#C)[C@@]1(C)CC[C@]1([H])[C@@]3([H])CCC(=O)C=C3CC[C@@]21[H]

DB00717

226

227

[H][C@]12CC3=C(C(O)=C(O)C=C3)C3=CC=CC(CCN1C)=C23

DB00714

228

Canonical SMILES

DrugBankID

No.

225

Table 4.2 (continued)

G01

G04

G01

G01

G03

G03

G02

G01

G03

G01

G01

G03

G01

G04

G01

G01

G02

G03

G03

G04

ATC

CN(C)C1=CC=C(C=C1)[C@H]1C[C@@]2(C)[C@@H](CC[C@]2(O)C(C)=O)[C@@H]2CCC3=CC(=O)CCC3=C12

OC(=O)C1=CC=CC=C1C(=O)NC1=CC=C(C=C1)S(=O)(=O)NC1=NC=CS1

DB13248

265

C[C@H](O)C(=O)[C@@]1(C)CC[C@H]2[C@@H]3CCC4=CC(=O)CCC4=C3CC[C@]12C

CC[C@H](C1=CC=C2C=C(OC)C=CC2=C1)C(C)(C)C(O)=O

DB13129

DB13143

263

264

C[C@]12CC[C@H]3[C@@H](CCC4=CCCC[C@H]34)[C@@H]1CC[C@@]2(O)C#C

CC1=NC=C(N1CC(O)CCl)N(=O)=O

DB12474

DB13026

261

262

CC(=O)[C@@]1(O)CC[C@H]2[C@@H]3C=C(C)C4=CC(=O)CC[C@@H]4[C@H]3CC[C@]12C

CCC12CCC3C(CCC4=CC(=O)CCC34)C1CCC2(O)C#C

DB09389

DB11636

259

260

[H][C@@]12CC[C@@](O)(CC#N)[C@@]1(C)CCC1=C3CCC(=O)C=C3CC[C@@]21[H]

[H][C@@]12CC[C@](C)(C(C)=O)[C@@]1(C)CC[C@@]1([H])[C@@]2([H])C=C(C)C2=CC(=O)CC[C@]12C

DB09123

DB09124

257

CCCS(=O)(=O)NC1=C(F)C(C(=O)C2=CNC3=NC=C(C=C23)C2=CC=C(Cl)C=C2)=C(F)C=C1

258

DB08867

DB08881

255

256

CC[C@]12CC[C@H]3[C@@H](CCC4=CC(=O)CC[C@H]34)[C@@H]1C=C[C@@]2(O)C#C

NCC1=CC=C(C=C1)S(N)(=O)=O

DB06730

DB06795

253

254

CC1=C(N(CC2=CC=C(OCCN3CCCCCC3)C=C2)C2=C1C=C(O)C=C2)C1=CC=C(O)C=C1

[H][C@@]12CC[C@](C)(O)[C@@]1(C)CC[C@@]1([H])[C@@]2([H])CCC2=CC(=O)CC[C@]12C

DB06401

DB06710

251

252

[H][C@@]12C[C@]1([H])[C@@]1(C)C(=CC2=O)C(Cl)=C[C@@]2([H])[C@]3([H])CC[C@](OC(C)=O)(C(C)=O)[C@@]3(C)CC[C@]12[H]

NC1=CC=C(C=C1)S(=O)(=O)NC1=NC=CS1

DB04839

DB06147

249

250

[H][C@@]12C[C@@H](O)[C@H](O)[C@@]1(C)CC[C@]1([H])C3=C(CC[C@@]21[H])C=C(O)C=C3

OC1=C(I)C=C(Cl)C2=C1N=CC=C2

DB04573

DB04815

OC[C@H]1O[C@H]([C@H](O)[C@@H]1O)N1C=NC2=C1NC=NC2=O

DB04335

246

247

[H][C@@]12CCC(=O)[C@@]1(C)CC[C@@]1([H])[C@@]2([H])CC=C2C[C@@]([H])(O)CC[C@]12C

DB01708

248

Canonical SMILES

DrugBankID

No.

245

Table 4.2 (continued)

G01

G03

G03

G01

G03

G03

G03

G03

G03

G01

G03

G01

G03

G03

G03

G01

G03

G01

G03

G01

G03

ATC

[H][C@@]12C[C@@H](C)[C@](C)(C(=O)CC)[C@@]1(C)C[C@H](O)[C@@]1([H])[C@@]2([H])CCC2=CC(=O)C=C[C@]12C

DB00256

286

[H][C@@]12C[C@@]3([H])C(C(=O)C4=C(O)C=CC=C4[C@@]3(C)O)=C(O)[C@]1(O)C(=O)C(C(=O)NCNCCCC[C@H](N)C(O)=O)=C(O)[C@H]2N(C)C

[H][C@@]12[C@@H](C)C3=CC=CC(O)=C3C(=O)C1=C(O)[C@]1(O)C(=O)C(C(N)=O)=C(O)[C@@H](N(C)C)[C@]1([H])[C@H]2O

OC[C@H]1O[C@H](C[C@@H]1O)N1C=C(I)C(=O)NC1=O

DB00249

DB00254

284

285

NC1=CC(O)=C(C=C1)C(O)=O

CC1=C2NC(=O)C3=C(N=CC=C3)N(C3CC3)C2=NC=C1

DB00233

DB00238

282

283

OC(CN1C=NC=N1)(CN1C=NC=N1)C1=C(F)C=C(F)C=C1

NC1=NC=NC2=C1N=CN2[C@@H]1O[C@H](CO)[C@@H](O)[C@@H]1O

DB00194

DB00196

280

281

[H][C@@]12CC[C@](O)(C(=O)CO)[C@@]1(C)CC(=O)[C@@]1([H])[C@@]2([H])CCC2=CC(=O)CC[C@]12C

CC1=C(O)C(CO)=C(CO)C=N1

DB14681

DB00165

278

OC(=O)CC1=CC(I)=C(OC2=CC=C(O)C(I)=C2)C(I)=C1

279

DB00896

DB03604

276

277

N[C@@H](CC1=CC(I)=C(OC2=CC(I)=C(O)C=C2)C(I)=C1)C(O)=O

CN1C=CNC1=S

DB00279

DB00763

274

275

[H][C@@]12CC[C@@](O)(C#C)[C@@]1(C)CC[C@]1([H])[C@@]3([H])CC[C@H](O)C=C3CC[C@@]21[H]

[H][C@@]12CC[C@](O)(C(C)=O)[C@@]1(C)CC[C@@]1([H])[C@@]2([H])CCC2=CC(=O)CC[C@]12C

DB13866

DB14570

272

273

NS(=O)(=O)C1=CC2=C(NC(CC3CCCC3)NS2(=O)=O)C=C1Cl

C[C@]12CC[C@H]3[C@@H](CC=C4C=C(CC[C@H]34)OC3CCCC3)[C@@H]1CC[C@@]2(O)C#C

DB13532

DB13685

270

271

CN1C(\C=C\C2=CC=NC(N)=N2)=NC=C1[N+]([O-])=O

[H][C@@]12CC[C@](O)(C(C)=O)[C@@]1(C)CC[C@@]1([H])[C@@]2([H])C=C(Cl)C2=CC(=O)CC[C@]12C

DB13350

DB13528

CCOC(=O)C(=C\C1=NC=C(N1C)[N+]([O-])=O)\C(C)=O

DB13319

267

268

CC(=O)NC1=CC(=CC=C1O)[As](O)(O)=O

DB13268

269

Canonical SMILES

DrugBankID

No.

266

Table 4.2 (continued)

J01

J01

J05

J05

J04

J01

J05

J04

H02

H03

H02

H03

H03

G03

G03

G03

G01

G03

G01

G01

G01

ATC

COC1=CC(CC2=CN=C(N)N=C2N)=CC(OC)=C1OC

[H][C@]12C[C@@]3([H])[C@H](N(C)C)C(O)=C(C(N)=O)C(=O)[C@@]3(O)C(O)=C1C(=O)C1=C([C@H]2O)C(Cl)=CC=C1O

DB00618

307

[H][C@@]12[C@@H](O)[C@@]3([H])C(C(=O)C4=C(C=CC=C4O)[C@@]3(C)O)=C(O)[C@]1(O)C(=O)C(C(N)=O)=C(O)[C@H]2N(C)C

[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2NC(=O)C1=C(OCC)C=CC2=CC=CC=C12)C(O)=O

DB00595

DB00607

305

306

CC1=NN=C(NS(=O)(=O)C2=CC=C(N)C=C2)S1

[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2NC(=O)C(C(O)=O)C1=CC=CC=C1)C(O)=O

DB00576

DB00578

303

304

[H][C@@]12CC3=C(C(O)=C(NC(=O)CNC(C)(C)C)C=C3N(C)C)C(=O)C1=C(O)[C@]1(O)C(=O)C(C(N)=O)=C(O)[C@@H](N(C)C)[C@]1([H])C2

J01

J01

J01

J01

J01

J01

J01

J05

J05

J01

J04

J05

J01

J01

J01

J04

J04

J01

J01

J05

J01

ATC

OC(=O)C1=CN(C2CC2)C2=CC(N3CCNCC3)=C(F)C=C2C1=O

DB00537

DB00560

301

302

CC1=CN([C@H]2C[C@H](N=[N+]=[N-])[C@@H](CO)O2)C(=O)NC1=O

CC(C)[C@H](NC(=O)N(C)CC1=CSC(=N1)C(C)C)C(=O)N[C@H](C[C@H](O)[C@H](CC1=CC=CC=C1)NC(=O)OCC1=CN=CS1)CC1=CC=CC=C1

DB00495

DB00503

299

[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2NC(=O)C1=C(C)ON=C1C1=C(Cl)C=CC=C1Cl)C(O)=O

300

DB00440

DB00485

297

298

[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2NC(=O)COC1=CC=CC=C1)C(O)=O

CC(=O)OCC(CCN1C=NC2=CN=C(N)N=C12)COC(C)=O

DB00417

DB00426

295

296

NC1=CC=C(C=C1)S(=O)(=O)NC1=NC=CC=N1

[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2NC(=O)[C@H](N)C1=CC=CC=C1)C(O)=O

DB00359

DB00415

293

294

CC[C@@H](CO)NCCN[C@@H](CC)CO

NC(=O)C1=NC=CN=C1

DB00330

DB00339

291

292

[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2NC(=O)C1=C(C)ON=C1C1=C(Cl)C=CC=C1F)C(O)=O

[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2NC(=O)[C@H](NC(=O)N1CCN(CC)C(=O)C1=O)C1=CC=CC=C1)C(O)=O

DB00301

DB00319

[H][C@@](C)(CN1C=NC2=C(N)N=CN=C12)OCP(=O)(OCOC(=O)OC(C)C)OCOC(=O)OC(C)C

DB00300

288

289

CC1=NOC(NS(=O)(=O)C2=CC=C(N)C=C2)=C1C

DB00263

290

Canonical SMILES

DrugBankID

No.

287

Table 4.2 (continued)

[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2NC(=O)[C@H](NC(=O)N1CCN(C1=O)S(C)(=O)=O)C1=CC=CC=C1)C(O)=O

NC1=C(F)C=NC(=O)N1

DB01099

328

[H][C@](NC(=O)N1CCNC1=O)(C(=O)N[C@@H]1C(=O)N2[C@@H](C(O)=O)C(C)(C)S[C@]12[H])C1=CC=CC=C1

CN[C@H]1[C@H](O)[C@@H](O)[C@H](CO)O[C@H]1O[C@H]1[C@H](O[C@H]2[C@H](O)[C@@H](O)[C@H](NC(N)=N)[C@@H](O)[C@@H]2NC(N)=N)O[C@@H](C)[C@]1(O)C=O

DB01061

DB01082

326

327

[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2NC(=O)CC1=CC=CC=C1)C(O)=O

CCN1C=C(C(O)=O)C(=O)C2=CC(F)=C(C=C12)N1CCNCC1

DB01053

DB01059

324

325

NC1=NC2=C(N=CN2[C@@H]2C[C@H](CO)C=C2)C(NC2CC2)=N1

[H][C@@]12CC3=C(C(O)=CC=C3N(C)C)C(=O)C1=C(O)[C@]1(O)C(=O)C(C(N)=O)=C(O)[C@@H](N(C)C)[C@]1([H])C2

DB01017

DB01048

322

323

NC1=NC2=C(N=CN2COC(CO)CO)C(=O)N1

CC1=CC(NS(=O)(=O)C2=CC=C(N)C=C2)=NO1

DB01004

DB01015

320

NNC(=O)C1=CC=NC=C1

321

DB00948

DB00951

318

319

NC1=NC(=O)N(C=C1F)[C@@H]1CS[C@H](CO)O1

[H][C@@]12[C@@H](O)[C@]3([H])C(=C)C4=C(C(O)=CC=C4)C(=O)C3=C(O)[C@]1(O)C(=O)C(C(N)=O)=C(O)[C@H]2N(C)C

DB00879

DB00931

316

317

[H][C@]1([C@@H](C)O)C(=O)N2C(C(O)=O)=C(S[C@@H]3CN[C@@H](C3)C(=O)N(C)C)[C@H](C)[C@]12[H]

NC1=NC(=O)C2=C(N1)N(COCCO)C=N2

DB00760

DB00787

314

315

[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2NC(=O)C1=C(C)ON=C1C1=CC=CC=C1)C(O)=O

[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2N1C(=O)[C@H](NC1(C)C)C1=CC=CC=C1)C(O)=O

DB00713

DB00739

312

313

[O-][N+](=O)C1=CC=C(O1)\C=N\N1CC(=O)NC1=O

NC1=NC(=O)N(C=C1)[C@@H]1CS[C@H](CO)O1

DB00698

DB00709

CC1=CN([C@@H]2O[C@H](CO)C=C2)C(=O)NC1=O

DB00649

309

310

FC(F)(F)[C@]1(OC(=O)NC2=C1C=C(Cl)C=C2)C#CC1CC1

DB00625

311

Canonical SMILES

DrugBankID

No.

308

Table 4.2 (continued)

J02

J04

J01

J01

J01

J05

J01

J04

J05

J04

J01

J01

J05

J05

J01

J01

J01

J05

J01

J05

J05

ATC

[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2NC(=O)C1=C(OC)C=CC=C1OC)C(O)=O

CC(C)OC(=O)[C@H](C)N[P@](=O)(OC[C@H]1O[C@@H](N2C=CC(=O)NC2=O)[C@](C)(F)[C@@H]1O)OC1=CC=CC=C1

DB08934

349

CC1=CC(\C=C\C#N)=CC(C)=C1NC1=CC=NC(NC2=CC=C(C=C2)C#N)=N1

[H][C@]12CN3C=C(C(=O)NCC4=CC=C(F)C=C4F)C(=O)C(O)=C3C(=O)N1[C@H](C)CCO2

DB08864

DB08930

347

348

[H][C@](CO)(NC(=O)C(Cl)Cl)[C@]([H])(O)C1=CC=C(C=C1)S(C)(=O)=O

CC1=C(C)N=C(NS(=O)(=O)C2=CC=C(N)C=C2)O1

DB08621

DB08798

345

346

COC1=CN=C(NS(=O)(=O)C2=CC=C(N)C=C2)N=C1

COC1=NC(OC)=NC(NS(=O)(=O)C2=CC=C(N)C=C2)=C1

DB06150

DB06821

343

344

[H]C(=N[C@@H]1C(=O)N2[C@@H](C(=O)OCOC(=O)C(C)(C)C)C(C)(C)S[C@]12[H])N1CCCCCC1

[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2NC(=O)[C@H](C(O)=O)C1=CSC=C1)C(O)=O

DB01605

DB01607

341

[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2NC(=O)[C@H](N)C1=CC=CC=C1)C(=O)OCOC(=O)C(C)(C)C

342

DB01603

DB01604

339

340

CC1=CC(C)=NC(NS(=O)(=O)C2=CC=C(N)C=C2)=N1

[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2NC(=O)[C@H](N)C1=CC=CC=C1)C(=O)OC(C)OC(=O)OCC

DB01582

DB01602

337

338

CO\N=C(/C(=O)N[C@@H]1C(=O)N2[C@]1([H])SCC(C[N+]1(C)CCCC1)=C2C([O-])=O)C1=CSC(N)=N1

CC1=NC(NS(=O)(=O)C2=CC=C(N)C=C2)=NC=C1

DB01413

DB01581

335

336

[H][C@@]12CCO[C@]1([H])OC[C@@H]2OC(=O)N[C@@H](CC1=CC=CC=C1)[C@H](O)CN(CC(C)C)S(=O)(=O)C1=CC=C(N)C=C1

[H][C@@]12C[C@@]3([H])C(C(=O)C4=C(O)C=CC=C4[C@@]3(C)O)=C(O)[C@]1(O)C(=O)C(C(=O)NCN1CCCC1)=C(O)[C@H]2N(C)C

DB01264

DB01301

333

334

CC1COC2=C3N1C=C(C(O)=O)C(=O)C3=CC(F)=C2N1CCN(C)CC1

[H][C@]12SCC(CSC3=NC(=O)C(=O)NN3C)=C(N1C(=O)[C@H]2NC(=O)C(=N/OC)\C1=CSC(N)=N1)C(O)=O

DB01165

DB01212

[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2NC(=O)C1=C(C)ON=C1C1=CC=CC=C1Cl)C(O)=O

DB01147

330

331

C[C@H]1COC2=C3N1C=C(C(O)=O)C(=O)C3=CC(F)=C2N1CCN(C)CC1

DB01137

332

Canonical SMILES

DrugBankID

No.

329

Table 4.2 (continued)

J05

J05

J05

J01

J01

J01

J01

J01

J01

J01

J01

J01

J01

J01

J01

J01

J05

J01

J01

J01

J01

ATC

[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2NC(=O)[C@H](N)C1=CC=CC=C1)C(=O)OCOC(=O)[C@@H]1N2C(=O)C[C@@]2([H])S(=O)(=O)C1(C)C

CN1C(=O)NN=C1CN1C=CC(=C(OC2=CC(=CC(Cl)=C2)C#N)C1=O)C(F)(F)F

DB12127

356

CCN(CC)CCOC(=O)C1=CC=C(N)C=C1.[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2NC(=O)CC1=CC=CC=C1)C(O)=O

CN(C)CCOCC(=O)NC12CC3CC(CC(C3)C1)C2

NC1=NC(=O)N(C=C1)[C@@H]1O[C@H](CO)[C@@H](O)[C@@H]1O

DB00987

369

CN(CC1=CN=C2N=C(N)N=C(N)C2=N1)C1=CC=C(C=C1)C(=O)N[C@@H](CCC(O)=O)C(O)=O

COC1=CC=CC2=C1C(=O)C1=C(O)C3=C(C[C@](O)(C[C@@H]3O[C@H]3C[C@H](N)[C@H](O)[C@H](C)O3)C(C)=O)C(O)=C1C2=O

DB00563

DB00694

367

368

OC[C@H]1O[C@H](C[C@@H]1O)N1C=C(C(=O)NC1=O)C(F)(F)F

FC1=CNC(=O)NC1=O

DB00432

DB00544

365

366

NC1=C2N=CN([C@H]3C[C@H](O)[C@@H](CO)O3)C2=NC(Cl)=N1

CC\C(=C(\CC)C1=CC=C(O)C=C1)C1=CC=C(O)C=C1

DB00242

DB00255

363

364

[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2NC(=O)C(C)OC1=CC=CC=C1)C(O)=O

DB13288

DB13337

361

362

CO[C@]1(NC(=O)C(C(O)=O)C2=CSC=C2)[C@H]2SC(C)(C)[C@@H](N2C1=O)C(O)=O

CC(=O)NC1=CC=C(\C=N\NC(N)=S)C=C1

DB12343

DB12829

359

360

[H][C@@]12CC3=C(F)C=C(NC(=O)CN4CCCC4)C(O)=C3C(=O)C1=C(O)[C@]1(O)C(=O)C(C(N)=O)=C(O)[C@@H](N(C)C)[C@]1([H])C2

DB12301

DB12329

357

358

CO[C@H]1C[C@H](O[C@H]2[C@H](C)[C@@H](O[C@@H]3O[C@H](C)C[C@@H]([C@H]3O)N(C)C)[C@@H](C)C[C@@]3(CO3)C(=O)[C@H](C)[C@@H](O)[C@@H](C)[C@@H](C)OC(=O)[C@@H]2C)O[C@@H](C)[C@@H]1O

DB09320

DB11442

354

355

CC(C)OC(=O)[C@H](C)N[P@](=O)(CO[C@H](C)CN1C=NC2=C(N)N=CN=C12)OC1=CC=CC=C1

[H][C@]12SC(C)(C)[C@@H](N1C(=O)[C@H]2NC(=O)C(C(=O)OC1=CC2=C(CCC2)C=C1)C1=CC=CC=C1)C(O)=O

DB09299

DB09319

COC1=C(C=C(C=C1C1=CC2=CC=C(NS(C)(=O)=O)C=C2C=C1)N1C=CC(=O)NC1=O)C(C)(C)C

DB09183

351

352

[H][C@@](CO)(C(C)C)N1C=C(C(O)=O)C(=O)C2=C1C=C(OC)C(CC1=C(F)C(Cl)=CC=C1)=C2

DB09101

353

Canonical SMILES

DrugBankID

No.

350

Table 4.2 (continued)

L01

L01

L04

L01

L01

L02

L04

J01

J05

J04

J01

J01

J05

J01

J01

J01

J01

J05

J05

J05

ATC

OC(=O)C1CCN2C1=CC=C2C(=O)C1=CC=CC=C1

CC(C(O)=O)C1=CC(=CC=C1)C(=O)C1=CC=CC=C1

DB01009

390

OC(CC1=CN=CC=C1)(P(O)(O)=O)P(O)(O)=O

CC1=C(Cl)C(NC2=CC=CC=C2C(O)=O)=C(Cl)C=C1

DB00884

DB00939

388

389

CCCCCN(C)CCC(O)(P(O)(O)=O)P(O)(O)=O

CN1C(C(=O)NC2=NC=C(C)S2)=C(O)C2=C(C=CC=C2)S1(=O)=O

DB00710

DB00814

386

387

NCCCC(O)(P(O)(O)=O)P(O)(O)=O

CN1C(C(=O)NC2=NC=CC=C2)=C(O)C2=C(C=CC=C2)S1(=O)=O

DB00554

DB00630

384

385

CC1=CC=C(C=C1)C1=CC(=NN1C1=CC=C(C=C1)S(N)(=O)=O)C(F)(F)F

CN1C(CC(O)=O)=CC=C1C(=O)C1=CC=C(C)C=C1

DB00482

DB00500

382

[H][C@]1(C[C@@H]2CC[N@]1C[C@@H]2C=C)[C@H](O)C1=CC=NC2=CC=C(OC)C=C12

383

DB00465

DB00468

380

381

COC1=C(OCC(O)COC(N)=O)C=CC=C1

OC1=NC=NC2=C1C=NN2

DB00423

DB00437

378

379

ClC1=CC2=C(OC(=O)N2)C=C1

OC(CN1C=CN=C1)(P(O)(O)=O)P(O)(O)=O

DB00356

DB00399

376

377

[H][C@@]12CC[C@](O)(C(C)=O)[C@@]1(C)CC[C@]1([H])[C@@]3([H])CCC(=O)C=C3CC[C@@]21[H]

CC(C)CCC[C@@H](C)[C@@]1([H])CC[C@@]2([H])\C(CCC[C@]12C)=C\C=C1\C[C@@H](O)CCC1=C

DB13230

DB00169

374

375

NCCC1=CNC=N1

FC1=CN(C2CCCO2)C(=O)NC1=O

DB05381

DB09256

CC(O)(CS(=O)(=O)C1=CC=C(F)C=C1)C(=O)NC1=CC(=C(C=C1)C#N)C(F)(F)F

DB01128

371

372

S=C1N=CNC2=C1NC=N2

DB01033

373

Canonical SMILES

DrugBankID

No.

370

Table 4.2 (continued)

M01

M01

M05

M01

M05

M05

M01

M01

M01

M09

M01

M04

M03

M05

M03

M05

L02

L01

L03

L02

L01

ATC

OC(=O)CC1=C(N=C(S1)C1=CC=CC=C1)C1=CC=C(Cl)C=C1

CCCCN1CCCCC1C(=O)NC1=C(C)C=CC=C1C

DB00297

411

[H][C@@]12CC3=CNC4=CC=CC(=C34)[C@@]1([H])C[C@H](CN2CC=C)C(=O)N(CCCN(C)C)C(=O)NCC

O=C1NC(=O)C(N1)(C1=CC=CC=C1)C1=CC=CC=C1

DB00248

DB00252

409

410

OC1N=C(C2=CC=CC=C2Cl)C2=C(NC1=O)C=CC(Cl)=C2

COC1=CC=CC(=C1)[C@@]1(O)CCCC[C@@H]1CN(C)C

DB00186

DB00193

407

408

OC(=O)COC1=NN(CC2=CC=CC=C2)C2=CC=CC=C12

CN1N(C(=O)C(NC(=O)C2=CC=CN=C2)=C1C)C1=CC=CC=C1

DB13407

DB13501

405

406

CCCCOC1=CC=C(CC(=O)NO)C=C1

[H][C@]1(O)CN(C(C)=O)[C@@]([H])(C1)C(O)=O

DB13346

DB13363

403

C[C@H](C(O)=O)C1=CC=C2OC(=NC2=C1)C1=CC=C(F)C=C1

404

DB13217

DB13317

401

402

[H][C@@](C)(C(O)=O)C1=CC(=CC=C1)C(=O)C1=CC=CC=C1

COC1=C(O[C@@H]2O[C@H](CO)[C@@H](O)[C@H](O)[C@H]2O)C=C2CC[C@H](NC(C)=O)C3=CC(=O)C(SC)=CC=C3C2=C1OC

DB09214

DB11582

399

400

OC(=O)COC(=O)CC1=CC=CC=C1NC1=C(Cl)C=CC=C1Cl

COC1=C(O)C=CC(CNC(=O)CCCC\C=C\C(C)C)=C1

DB06736

DB06774

397

398

CS(=O)(=O)NC1=C(OC2=CC=CC=C2)C=C(C=C1)[N+]([O-])=O

CC(CN1CCCCC1)C(=O)C1=CC=C(C)C=C1

DB04743

DB06264

395

396

CCCCC1C(=O)N(N(C1=O)C1=CC=C(O)C=C1)C1=CC=CC=C1

OC(=O)C1=C(NC2=CC=CC(=C2)C(F)(F)F)N=CC=C1

DB03585

DB04552

CN1C(C2=CC=C(Cl)C=C2)S(=O)(=O)CCC1=O

DB01178

392

393

CC(O)(P(O)(O)=O)P(O)(O)=O

DB01077

394

Canonical SMILES

DrugBankID

No.

391

Table 4.2 (continued)


N01

N03

N04

N02

N05

M02

M02

M01

M01

M01

M01

M03

M02

M02

M01

M03

M02

M02

M01

M03

M05

ATC


CNS(=O)(=O)CC1=CC=C2NC=C(CCN(C)C)C2=C1

CO[C@]12CC[C@@]3(C[C@@H]1[C@](C)(O)C(C)(C)C)[C@H]1CC4=C5C(O[C@@H]2[C@@]35CCN1CC1CC1)=C(O)C=C4

DB00921

431

COC1=C(OC)C=C2C(=O)C(CC3CCN(CC4=CC=CC=C4)CC3)CC2=C1

NS(=O)(=O)CC1=NOC2=CC=CC=C12

DB00843

DB00909

429

430

CCCNC(C)C(=O)NC1=CC=CC=C1C

CCC1(C(=O)NCNC1=O)C1=CC=CC=C1

DB00750

DB00794

427

428

[H][C@@]12OC3=C(O)C=CC4=C3[C@@]11CCN(CC3CC3)[C@]([H])(C4)[C@]1(O)CCC2=O

CN1[C@H]2C[C@@H](C[C@@H]1[C@H]1O[C@@H]21)OC(=O)[C@H](CO)C1=CC=CC=C1

DB00704

DB00747

425

426

[H][C@@]12CCCN1C(=O)[C@H](CC1=CC=CC=C1)N1C(=O)[C@](C)(NC(=O)[C@H]3CN(C)[C@]4([H])CC5=CNC6=CC=CC(=C56)C4=C3)O[C@@]21O

DB00669

DB00696

423

424

CCCCOC1=CC=C(C=C1)C(=O)CCN1CCCCC1

DB00599

DB00645

421

422

CCCC(C)C1(CC)C(=O)NC(=S)NC1=O

OCCOCCN1CCN(CC1)C(C1=CC=CC=C1)C1=CC=C(Cl)C=C1

CCC1(C)CC(=O)NC1=O

DB00557

DB00593

419

420

CCC1(NC(=O)N(C)C1=O)C1=CC=CC=C1

DB00497

DB00532

417

418

COC1=C2O[C@H]3C(=O)CC[C@@]4(O)[C@H]5CC(C=C1)=C2[C@@]34CCN5C

[H][C@@]12OC3=C(O)C=CC4=C3[C@@]11CCN(C)[C@]([H])(C4)[C@]1([H])CCC2=O

CCC(=O)C(CC(C)N(C)C)(C1=CC=CC=C1)C1=CC=CC=C1

DB00327

[H][C@@]12CCCN1C(=O)[C@H](CC1=CC=CC=C1)N1C(=O)[C@](C)(NC(=O)[C@H]3CN(C)[C@]4([H])CC5=CNC6=CC=CC(=C56)[C@@]4([H])C3)O[C@@]21O

DB00320

414

DB00333

[H][C@]12C=C[C@H](O)[C@@H]3OC4=C5C(C[C@H]1N(C)CC[C@@]235)=CC=C4OC

DB00318

413

415

CC(=O)NC1=CC=C(O)C=C1

DB00316

416

Canonical SMILES

DrugBankID

No.

412

Table 4.2 (continued)


N07

N03

N06

N05

N01

N05

N02

N02

N02

N01

N01

N03

N05

N03

N02

N07

N02

N02

N02

N02

ATC


CCOC1=C(C=CC=C1)C(N)=O

CCN(CC)CC1=C(O)C=CC(NC2=C3C=CC(Cl)=CC3=NC=C2)=C1

DB00613

452

CCC1=C(C(N)=NC(N)=N1)C1=CC=C(Cl)C=C1

OC(C1CCCCN1)C1=CC(=NC2=C1C=CC=C2C(F)(F)F)C(F)(F)F

DB00205

DB00358

450

451

CCC(C)(O)C#C

CC(=O)OC1=CC=CC(C(O)=O)=C1OC(C)=O

DB13733

DB13839

448

[Ca++].NC(N)=O.CC(=O)OC1=CC=CC=C1C([O-])=O.CC(=O)OC1=CC=CC=C1C([O-])=O

449

DB13544

DB13612

446

447

COC(=O)[C@@H](N)CC1=CC=C(O)C(O)=C1

[O-].[O-].[O-].[Al+3].[Al+3].CC(=O)OC1=CC=CC=C1C(O)=O

DB13313

DB13509

444

445

[Na+].OCCCC([O-])=O

CCOC1=CC=C(NC(=O)CC(C)O)C=C1

DB09072

DB13278

442

443

CCCN(CC)C(CC)C(=O)NC1=C(C)C=CC=C1C

CCCNC(C)C(=O)NC1=C(SC=C1C)C(=O)OC

DB08987

DB09009

440

441

CCC(=O)C1(CCN(C)CC1)C1=CC(O)=CC=C1

NC(=O)C1=CC=CC=C1O

DB06738

DB08797

438

439

CCOC1=CC=C(NC(C)=O)C=C1

CN(CS(O)(=O)=O)C1=C(C)N(C)N(C1=O)C1=CC=CC=C1

DB03783

DB04817

436

437

N[C@@H](CC1=CC(O)=C(O)C=C1)C(O)=O

[H][C@@]12OC3=C4C(C[C@H]5N(C)CC[C@@]14[C@@]5([H])CC[C@@H]2O)=CC=C3OC

DB01235

DB01551

[H][C@@]12CCCN1C(=O)[C@H](CC(C)C)N1C(=O)[C@](NC(=O)[C@H]3CN(C)[C@]4([H])CC5=C(Br)NC6=CC=CC(=C56)C4=C3)(O[C@@]21O)C(C)C

DB01200

433

434

CN1CCCCC1C(=O)NC1=C(C)C=CC=C1C

DB00961

435

Canonical SMILES

DrugBankID

No.

432

Table 4.2 (continued)


P01

P01

P01

N02

N05

N02

N02

N02

N04

N02

N01

N01

N01

N02

N02

N02

N02

N02

N04

N04

N01

ATC


[H][C@@](C)(N)[C@]([H])(O)C1=CC=CC=C1

COC1=CC=C(CC(C)NCC(O)C2=CC(NC=O)=C(O)C=C2)C=C1

DB00983

473

CC(C)(C)NCC(O)C1=CC(O)=CC(O)=C1

OCC1=C(O)C=CC(=C1)C(O)CNCCCCCCOCCCCC1=CC=CC=C1

DB00871

DB00938

471

472

CC(C)NCC(O)C1=CC(O)=CC(O)=C1

CN[C@@H](C)[C@@H](O)C1=CC=CC=C1

DB00816

DB00852

469

470

CCCC1=C2N(CC)C(=CC(=O)C2=CC2=C1OC(=CC2=O)C(O)=O)C(O)=O

CN1C2=C(N(CC(O)CO)C=N2)C(=O)N(C)C1=O

DB00651

DB00716

467

468

[H][C@@]12C[C@H](C)[C@](O)(C(=O)CO)[C@@]1(C)C[C@H](O)[C@@]1(F)[C@@]2([H])CCC2=CC(=O)C=C[C@]12C

OC(=O)CC1(CC1)CS[C@H](CCC1=CC=CC=C1C(O)(C)C)C1=CC=CC(\C=C\C2=NC3=C(C=CC(Cl)=C3)C=C2)=C1

DB00443

DB00471

465

C[C@H](O)[C@H](C)[C@@H]1O[C@H]1C[C@H]1CO[C@@H](C\C(C)=C\C(=O)OCCCCCCCCC(O)=O)[C@H](O)[C@@H]1O

466

DB00397

DB00410

463

464

CN1C2=C(NC=N2)C(=O)N(C)C1=O

[H][C@]12CC[C@]([H])(C[C@@H](C1)OC(=O)C(CO)C1=CC=CC=C1)[N+]2(C)C(C)C

DB00277

DB00332

461

462

R03

R03

R03

R01

R03

R03

R03

R03

R01

R01

R01

R03

R03

R01

P01

P01

P01

P01

P01

P01

P02

ATC


OC1=C2N=CC=CC2=C(Br)C=C1Br

[H][C@@]12C[C@@]3([H])[C@]4([H])C[C@H](F)C5=CC(=O)C=C[C@]5(C)[C@@]4([H])[C@@H](O)C[C@]3(C)[C@@]1(OC(C)(C)O2)C(=O)CO

DB13536

DB00180

459

460

CC(O)CN1C(C)=NC=C1N(=O)=O

CC1=NC2=C(C=C1)C(Cl)=CC(Cl)=C2O

DB12834

DB13306

457

458

[H][C@@]12CC[C@@H](C)[C@]3([H])CC[C@@]4(C)OO[C@@]13[C@]([H])(O[C@@H](OC(=O)CCC(O)=O)[C@@H]2C)O4

[H][C@@]1(C)CC[C@@]2([H])[C@@]([H])(C)C([H])(O)O[C@]3([H])O[C@@]4(C)CC[C@]1([H])[C@@]23OO4

DB09274

DB11638

NCCCC(N)(C(F)F)C(O)=O

DB06243

454

455

COC(=O)NC1=NC2=C(N1)C=C(C=C2)C(=O)C1=CC=CC=C1

DB00643

456

Canonical SMILES

DrugBankID

No.

453

Table 4.2 (continued)


CC(C)(C)C1=CC(=C(O)C=C1NC(=O)C1=CNC2=CC=CC=C2C1=O)C(C)(C)C

OC(CNCCCCCCNCC(O)C1=CC(O)=C(O)C=C1)C1=CC(O)=C(O)C=C1

O=C1NCC2(CCN(CCC3=CC=CC=C3)CC2)O1

DB08957

DB08979

492

493

OC(C(=O)O[C@H]1C[N+]2(CCCOC3=CC=CC=C3)CCC1CC2)(C1=CC=CS1)C1=CC=CS1

[H][C@@]12C[C@@H](C)[C@](OC(=O)C3=CC=CO3)(C(=O)SCF)[C@@]1(C)C[C@H](O)[C@@]1(F)[C@@]2([H])C[C@H](F)C2=CC(=O)C=C[C@]12C

DB08897

DB08906

490

CC(=O)N[C@@H](CC(=O)N[C@@H](CCC(O)=O)C(O)=O)C(O)=O

491

DB08820

DB08835

488

489

NC1=C(Br)C=C(Br)C=C1CN[C@H]1CC[C@H](O)CC1

CC(C1=C(CCN(C)C)CC2=CC=CC=C12)C1=CC=CC=N1

DB06742

DB08801

486

487

CCC1=C(CC)C=C2CC(CC2=C1)NC[C@H](O)C1=C2C=CC(=O)NC2=C(O)C=C1

CC(=O)N[C@@H](CS)C(O)=O

DB05039

DB06151

484

485

[H][C@@]12C[C@@]3([H])[C@]4([H])CCC5=CC(=O)C=C[C@]5(C)[C@@]4([H])[C@@H](O)C[C@]3(C)[C@@]1(O[C@@H](O2)C1CCCCC1)C(=O)COC(=O)C(C)C

CC(C)(C)NCC(O)C1=CC(Cl)=C(N)C(Cl)=C1

DB01407

DB01410

482

483

C[N+](C)(C)CCO.CN1C2=C([N-]C=N2)C(=O)N(C)C1=O

CC[C@H](NC(C)C)[C@H](O)C1=C2C=CC(=O)NC2=C(O)C=C1

DB01303

DB01366

480

481

CC(CC1=CC=C(O)C=C1)NCC(O)C1=CC(O)=CC(O)=C1

CC(C)(C)NCC(O)C1=NC(CO)=C(O)C=C1

DB01288

DB01291

478

479

C[C@@H]1CN(CC[C@]1(C(O)=O)C1=CC=CC=C1)[C@H]1CC[C@](CC1)(C#N)C1=CC=C(F)C=C1

[H][C@@]12C[C@@H](C)[C@](O)(C(=O)CO)[C@@]1(C)C[C@H](O)[C@@]1(F)[C@@]2([H])CCC2=CC(=O)C=C[C@]12C

DB01106

DB01234

OC(COC1=CC=CC2=C1C(=O)C=C(O2)C(O)=O)COC1=CC=CC2=C1C(=O)C=C(O2)C(O)=O

DB01003

475

476

CC(C)(C)NCC(O)C1=CC(CO)=C(O)C=C1

DB01001

477

Canonical SMILES

DrugBankID

No.

474

Table 4.2 (continued)


R03

R03

R03

R03

R01

R07

R06

R03

R05

R03

R03

R03

R03

R03

R03

R03

R01

R01

R03

R03

ATC


COC1=CC(C[C@@H]2NCCC3=CC(O)=C(O)C=C23)=CC(OC)=C1OC

CC(C)(C)NCC(O)COC1=CC=CC2=C1CCC(=O)N2

DB00521

514

CNC[C@H](O)C1=CC(O)=CC=C1

NCC[C@H](O)C(=O)N[C@@H]1C[C@H](N)[C@@H](O[C@H]2O[C@H](CN)[C@@H](O)[C@H](O)[C@H]2O)[C@H](O)[C@H]1O[C@H]1O[C@H](CO)[C@@H](O)[C@H](N)[C@H]1O

DB00388

DB00479

512

513

CCN(CC)CC(=O)NC1=C(C)C=CC=C1C

COC1=CC2=C(C=C1)N(C(=O)C1=CC=C(Cl)C=C1)C(C)=C2CC(O)=O

DB00281

DB00328

510

511

[H][C@]12CN(C[C@@]1([H])NCCC2)C1=C(F)C=C2C(=O)C(=CN(C3CC3)C2=C1OC)C(O)=O

CC(C)NCC(O)COC1=CC=C(CCOCC2CC2)C=C1

DB00195

DB00218

508

509

[H][C@@]12C[C@@H](C)[C@](O)(C(=O)SCF)[C@@]1(C)C[C@H](O)[C@@]1(F)[C@@]2([H])C[C@H](F)C2=CC(=O)C=C[C@]12C

C\C(=C/CO)\C=C\C=C(/C)\C=C\C1=C(C)CCCC1(C)C

DB13867

DB00162

506

CC(CN1C2=CC=CC=C2SC2=C1C=CC=C2)[N+](C)(C)CCO

507

DB13692

DB13840

504

505

CC(O)CN1C=NC2=C1C(=O)N(C)C(=O)N2C

CC(C)(C)NCC(O)C1=CC(NC(N)=O)=C(O)C=C1

DB13449

DB13625

502

503

CC(C)(C)NCC(O)C1=CC=CC=C1Cl

CN1C2=C(N(CCCNCC(O)C3=CC(O)=CC(O)=C3)C=N2)C(=O)N(C)C1=O

DB12248

DB12846

500

501

OS(=O)(=O)CCS

OC1=CC=CC2=C1N=CC=C2

DB09110

DB11145

498

499

OCC1=C(O)C=CC(=C1)[C@@H](O)CNCCCCCCOCCOCC1=C(Cl)C=CC=C1Cl

[H][C@@]12CC[C@](O)(C(=O)CS)[C@@]1(C)C[C@H](O)[C@@]1([H])[C@@]2([H])CCC2=CC(=O)CC[C@]12C

DB09082

DB09091

COC1=CC=C(CC(C)(C)NC[C@H](O)C2=C3OCC(=O)NC3=CC(O)=C2)C=C1

DB09080

495

496

OC(C1=CC=CC=C1)(C1=CC=CC=C1)C12CC[N+](CCOCC3=CC=CC=C3)(CC1)CC2

DB09076

497

Canonical SMILES

DrugBankID

No.

494

Table 4.2 (continued)


S01

S01

S01

S01

S01

S01

S01

S01

R03

R06

R03

R03

R03

R03

R03

R02

R05

R01

R03

R03

R03

ATC


CCN[C@@H]1C[C@H](N)[C@@H](O[C@H]2OC(CN)=CC[C@H]2N)[C@H](O)[C@H]1O[C@H]1OC[C@](C)(O)[C@H](NC)[C@H]1O

DB01044

DB01112

533

534

[H][C@]12SCC(COC(N)=O)=C(N1C(=O)[C@H]2NC(=O)C(=N/OC)\C1=CC=CO1)C(O)=O

COC1=C2N(C=C(C(O)=O)C(=O)C2=CC(F)=C1N1CCNC(C)C1)C1CC1

CN(C)CCOC(=O)C(C1=CC=CC=C1)C1(O)CCCC1

[H][C@]12N(C)CC[C@@]1(C)C1=C(C=CC(OC(=O)NC)=C1)N2C

DB00979

DB00981

531

CCN1C=C(C(O)=O)C(=O)C2=CC(F)=C(N3CCNC(C)C3)C(F)=C12

532

DB00955

DB00978

529

530

CN1CCC(CC1)=C1C2=C(SC=C2)C(=O)CC2=CC=CC=C12

CC1=CC(=C(O)C(C)=C1CC1=NCCN1)C(C)(C)C

DB00920

DB00935

527

528

CCN(CC1=CC=NC=C1)C(=O)C(CO)C1=CC=CC=C1

CCCCOC1=CC(=CC=C1N)C(=O)OCCN(CC)CC

DB00809

DB00892

525

526

CN(C)CC\C=C1\C2=CC=CC=C2COC2=C1C=C(CC(O)=O)C=C2

CNC(C)C1CCC(N)C(OC2C(N)CC(N)C(OC3OCC(C)(O)C(NC)C3O)C2O)O1

DB00768

DB00798

523

524

[H][C@@]12CC[C@](O)(C(=O)CO)[C@@]1(C)C[C@H](O)[C@@]1([H])[C@@]2([H])CCC2=CC(=O)CC[C@]12C

[H][C@@]12C[C@@]3([H])C(=C(O)[C@]1(O)C(=O)C(C(N)=O)=C(O)[C@H]2N(C)C)C(=O)C1=C(O)C=CC=C1[C@@]3(C)O

DB00741

DB00759

521

522

[H][C@@]12CC[C@](O)(C(=O)CO)[C@@]1(C)C[C@H](O)[C@@]1(F)[C@@]2([H])CCC2=CC(=O)CC[C@]12C

OC1=CC=C2C(OC3=CC(O)=CC=C3C22OC(=O)C3=C2C=CC=C3)=C1

DB00687

DB00693

519

520

NC[C@H]1O[C@H](O[C@@H]2[C@@H](N)C[C@@H](N)[C@H](O[C@H]3O[C@H](CO)[C@@H](O)[C@H](N)[C@H]3O)[C@H]2O)[C@H](N)C[C@@H]1O

[H][C@@]12C[C@@]3([H])[C@]4([H])C[C@H](F)C5=CC(=O)C=C[C@]5(C)[C@@]4(F)[C@@H](O)C[C@]3(C)[C@@]1(OC(C)(C)O2)C(=O)CO

DB00591

DB00684

OC(=O)CC1=CC=CC=C1NC1=C(Cl)C=CC=C1Cl

DB00586

516

517

CN1[C@H]2CC[C@@H]1C[C@@H](C2)OC(=O)C(CO)C1=CC=CC=C1

DB00572

518

Canonical SMILES

DrugBankID

No.

515

Table 4.2 (continued)


S01

S01

S01

S01

S01

S01

S01

S01

S01

S01

S01

S01

S02

S01

S01

S02

S01

S01

S01

S01

ATC


[H]C(=O)[C@H](O)[C@@H](O)[C@@H](O)[C@H](O)CO

DB11735

551

CN(C)C=NC1=C(I)C=C(I)C(CCC(O)=O)=C1I

NC1=NC(=O)C2=C(NC[C@H](CNC3=CC=C(C=C3)C(=O)N[C@@H](CCC(O)=O)C(O)=O)N2C=O)N1

DB09333

DB11596

549

550

CC1=CC=C(C=C1)N(CC1=NCCN1)C1=CC(O)=CC=C1

CCO

DB00692

DB00898

547

548

[H][C@@]12CC[C@H](O)[C@@]1(C)CC[C@]1([H])[C@@]3([H])CCC(=O)C=C3CC[C@@]21[H]

[H]C(=O)N1C(CNC2=CC=C(C=C2)C(=O)N[C@@H](CCC(O)=O)C(O)=O)CNC2=C1C(=O)NC(N)=N2

DB13169

DB00650

545

CO[C@H]1\C=C\O[C@@]2(C)OC3=C(C)C(O)=C4C(O)=C(NC(=O)\C(C)=C/C=C/[C@H](C)[C@H](O)[C@@H](C)[C@@H](O)[C@@H](C)[C@H](OC(C)=O)[C@@H]1C)C=C(O)C4=C3C2=O

546

DB11753

544

CNCC(O)C1=CC=C(O)C=C1

NC1=CC=C(C=C1)S(=O)(=O)NC1=CC=NN1C1=CC=CC=C1

DB06729

DB09203

542

543

[H][C@@]12C[C@@H](O)[C@@]3([H])[C@@]4(C)CC[C@@H](O)[C@@H](C)[C@]4([H])CC[C@]3(C)[C@@]1(C)C[C@H](OC(C)=O)\C2=C(\CCC=C(C)C)C(O)=O

DB01466

DB02703

540

541

[H][C@@]12OC3=C(OCC)C=CC4=C3[C@@]11CCN(C)[C@]([H])(C4)[C@]1([H])C=C[C@@H]2O

[H][C@@]12C[C@H]3OC(C)(C)O[C@@]3(C(=O)CO)[C@@]1(C)C[C@H](O)[C@@]1([H])[C@@]2([H])CCC2=CC(=O)C=C[C@]12C

CN[C@@H](C)[C@H](O)C1=CC=CC=C1

DB01260

DB01364

538

539

CCN[C@H]1CN(CCCOC)S(=O)(=O)C2=C1C=C(S2)S(N)(=O)=O

CC(C)NCC(O)COC1=C(C)C(C)=C(OC(C)=O)C(C)=C1

DB01194

DB01214

536

537

NC[C@H]1O[C@H](O[C@@H]2[C@@H](N)C[C@@H](N)[C@H](O[C@H]3O[C@H](CO)[C@@H](O)[C@H](N)[C@H]3O)[C@H]2O)[C@H](O)[C@@H](O)[C@@H]1O

DB01172

Canonical SMILES

DrugBankID

No.

535

Table 4.2 (continued)

V08

V03

V08

V03

V03

V03

S01

S01

S01

S01

S01

S01

S01

S01

S01

S01

S01

ATC


CCC1=CC2=C(C=C1N1CCC(CC1)N1CCOCC1)C(C)(C)C1=C(C3=CC=C(C=C3N1)C#N)C2=O

DB11363

21

C[C@@H](OC1=CC(=CN=C1N)C1=CN(N=C1)C1CCNCC1)C1=C(Cl)C=CC(F)=C1Cl

CC(C)OC1=C(NC2=NC=C(Cl)C(N2)=NC2=CC=CC=C2S(=O)(=O)C(C)C)C=C(C)C(=C1)C1CCNCC1

DB08865

DB09063

19

20

CN1C=NC2=C1C=C(C(=C2F)NC3=C(C=C(C=C3)Br)F)C(=O)NOCCO

[H][C@@]12CC3=C(C(O)=CC=C3N(C)C)C(=O)C1=C(O)[C@]1(O)C(=O)C(C(N)=O)=C(O)[C@@H](N(C)C)[C@]1([H])C2

DB11967

DB01017

17

18

COC1=CC(NC2=NC=C(F)C(NC3=NC4=C(OC(C)(C)C(=O)N4COP(O)(O)=O)C=C3)=N2)=CC(OC)=C1OC

CC\C(=C(/C1=CC=CC=C1)C1=CC=C(OCCN(C)C)C=C1)C1=CC=CC=C1

DB00675

DB12010

15

16

CC1=C(C2=C(C1=CC3=CC=C(C=C3)S(=O)C)C=CC(=C2)F)CC(=O)O

DB02733

DB00605

13

14

[H][C@@](CO)(NC1=NC2=C(N=CN2C(C)C)C(NC2=CC=C(C(O)=O)C(Cl)=C2)=N1)C(C)C

CN1C=NC2=C(F)C(NC3=C(Cl)C=C(Br)C=C3)=C(C=C12)C(=O)NOCCO

[H]N([C@H](CO)C1=CC(Cl)=CC=C1)C(=O)C1=CC(=CN1[H])C1=CC(=NC=C1Cl)N([H])C(C)C

DB11689

DB13930

11

12

C1CCNC(C1)C2(CN(C2)C(=O)C3=C(C(=C(C=C3)F)F)NC4=C(C=C(C=C4)I)F)O

CC1=C2C(=C(N(C1=O)C)NC3=C(C=C(C=C3)I)F)C(=O)N(C(=O)N2C4=CC=CC(=C4)NC(=O)C)C5CC5

DB05239

DB08911

9

CC(C)N1C=C(C(=N1)C2=C(C(=CC(=C2)Cl)NS(=O)(=O)C)F)C3=NC(=NC=C3)NCC(C)NC(=O)OC

CCN1C2=CC(=NC=C2C=C(C1=O)C3=CC(=C(C=C3Br)F)NC(=O)NC4=CC=CC=C4)NC

DB11718

DB14840

7

8

10

CNC(=O)C1=NC=CC(=C1)OC2=CC(=C(C=C2)NC(=O)NC3=CC(=C(C=C3)Cl)C(F)(F)F)F

CC(C)(C)C1=NC(=C(S1)C2=NC(=NC=C2)N)C3=C(C(=CC=C3)NS(=O)(=O)C4=C(C=CC=C4F)F)F

DB08896

DB08912

5

CCCS(=O)(=O)NC1=C(C(=C(C=C1)F)C(=O)C2=CNC3=C2C=C(C=N3)C4=CC=C(C=C4)Cl)F

CNC(=O)C1=NC=CC(=C1)OC2=CC=C(C=C2)NC(=O)NC3=CC(=C(C=C3)Cl)C(F)(F)F

C[C@H]1CN(CCN1C2=NC(=O)N(C3=NC(=C(C=C32)F)C4=C(C=CC=C4F)O)C5=C(C=CN=C5C(C)C)C)C(=O)C=C

CN1CCC[C@H]1COC2=NC3=C(CCN(C3)C4=CC=CC5=C4C(=CC=C5)Cl)C(=N2)N6CCN([C@H](C6)CC#N)C(=O)C(=C)F

Canonical SMILES

6

DB00398

DB08881

3

2

4

DB15568

DB15569

1

DrugBankID

No.

Table 4.3 Canonical SMILES of 64 drugs from DrugBank with labels (targeted pathways)


MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

Targeted pathway


CC(=O)OC1=CC=CC=C1C(O)=O

DB00945

42

CC1=NC(NC2=NC=C(S2)C(=O)NC2=C(C)C=CC=C2Cl)=CC(=N1)N1CCN(CCO)CC1

CC(C)CC1=CC=C(C=C1)[C@H](C)C(O)=O

DB01254

DB09213

40

41

CO[C@]1(CC[C@@H](CC1)C1=NC(NC2=NNC(C)=C2)=CC(C)=N1)C(=O)N[C@@H](C)C1=CC=C(N=C1)N1C=C(F)C=N1

CN1CCN(CC1)C1=CC=C(C(=O)NC2=NNC3=CC=C(CC4=CC(F)=CC(F)=C4)C=C23)C(NC2CCOCC2)=C1

DB15822

DB11986

38

39

CC[C@@H]1CN(C[C@@H]1C1=CN=C2C=NC3=C(C=CN3)N12)C(=O)NCC(F)(F)F

CC1=CN=C(NC2=CC=C(OCCN3CCCC3)C=C2)N=C1NC1=CC=CC(=C1)S(=O)(=O)NC(C)(C)C

DB12500

DB15091

36

37

CCS(=O)(=O)N1CC(CC#N)(C1)N1C=C(C=N1)C1=C2C=CNC2=NC=N1

DB08895

DB11817

34

35

[H][C@@]1(C)CCN(C[C@]1([H])N(C)C1=NC=NC2=C1C=CN2)C(=O)CC#N

NS(=O)(=O)C1=CC=C(C=C1)N1N=C(C=C1C1=CC=C(Cl)C=C1)C(F)(F)F

N#CC[C@H](C1CCCC1)N1C=C(C=N1)C1=C2C=CNC2=NC=N1

DB14059

DB08877

32

33

FC(F)(F)C1=CC=C(CNC2=NC=C(CC3=CNC4=NC=C(Cl)C=C34)C=C2)C=N1

COC1=C(C=C(C=C1)C1=CC2=C(C=C1)C=C(C=C2)C(O)=O)C12CC3CC(CC(C3)C1)C2

DB12978

DB00210

30

31

COC1=NC=C(CN2C3CC2CN(C3)C2=CC=C(C=N2)C2=CC(OCC(C)(C)O)=CN3N=CC(C#N)=C23)C=C1

CO[C@@H]1[C@@H](C[C@H]2O[C@]1(C)N1C3=C(C=CC=C3)C3=C1C1=C(C4=C(C=CC=C4)N21)C1=C3CNC1=O)N(C)C(=O)C1=CC=CC=C1

DB15685

DB06595

28

29

CNC(=O)C1=C(SC2=CC=C3C(NN=C3\C=C\C3=CC=CC=N3)=C2)C=CC=C1

COC1=C(C=C2C(OC3=CC=C(NC(=O)NC4CC4)C(Cl)=C3)=CC=NC2=C1)C(N)=O

DB06626

DB09078

26

CN(C1=CC2=NN(C)C(C)=C2C=C1)C1=CC=NC(NC2=CC=C(C)C(=C2)S(N)(=O)=O)=N1

CCN(CC)CCNC(=O)C1=C(C)NC(\C=C2/C(=O)NC3=C2C=C(F)C=C3)=C1C

CCC1=C(NC2CCOCC2)N=C(NC2=CC=C(N3CCC(CC3)N3CCN(C)CC3)C(OC)=C2)C(=N1)C(N)=O

C[C@H]1OC2=C(N)N=CC(=C2)C2=C(C#N)N(C)N=C2CN(C)C(=O)C2=C1C=C(F)C=C2

Canonical SMILES

27

DB01268

DB06589

24

23

25

DB12130

DB12141

22

DrugBankID

No.

Table 4.3 (continued)


JAK-STAT

JAK-STAT

JAK-STAT

JAK-STAT

JAK-STAT

JAK-STAT

JAK-STAT

JAK-STAT

JAK-STAT

JAK-STAT

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

MAPK

Targeted pathway


CN1CCN(CC2=CC=C(NC(=O)C3=CC(C#CC4=CN=C5C=CC=NN45)=C(C)C=C3)C=C2C(F)(F)F)CC1

DB08901

63

NC(=O)C1=C2NCC[C@@H](C3CCN(CC3)C(=O)C=C)N2N=C1C1=CC=C(OC2=CC=CC=C2)C=C1

COC1=CC(NC2=C(C=NC3=CC(OCCCN4CCN(C)CC4)=C(OC)C=C23)C#N)=C(Cl)C=C1Cl

DB15035

DB06616

61

62

CCOC1=C(NC(=O)\C=C\CN(C)C)C=C2C(NC3=CC=C(OCC4=CC=CC=N4)C(Cl)=C3)=C(C=NC2=C1)C#N

COC1=C(NC(=O)\C=C\CN2CCCCC2)C=C2C(NC3=CC(Cl)=C(F)C=C3)=NC=NC2=C1

DB11828

DB11963

59

60

CN1CCN(CC1)C1=CC=C(NC2=NC(OC3=CC=CC(NC(=O)C=C)=C3)=C3SC=CC3=N2)C=C1

C#CC1=CC=CC(NC2=NC=NC3=CC4=C(OCCOCCOCCO4)C=C23)=C1

DB11737

DB13164

57

58

COC1=C(NC2=NC=CC(=N2)C2=CN(C)C3=C2C=CC=C3)C=C(NC(=O)C=C)C(=C1)N(C)CCN(C)C

DB05294

DB09330

55

56

COC1=C(OCC2CCN(C)CC2)C=C2N=CN=C(NC3=C(F)C=C(Br)C=C3)C2=C1

CS(=O)(=O)CCNCC1=CC=C(O1)C1=CC2=C(C=C1)N=CN=C2NC1=CC(Cl)=C(OCC2=CC(F)=CC=C2)C=C1

CCN(CC)CC(=O)NC1=C(C)C=CC=C1C

DB01259

DB00281

53

54

COC1=C(OCCCN2CCOCC2)C=C2C(NC3=CC(Cl)=C(F)C=C3)=NC=NC2=C1

COCCOC1=CC2=C(C=C1OCCOC)C(NC1=CC(=CC=C1)C#C)=NC=N2

DB00317

DB00530

51

52

COC1=CC(=CC=C1NC1=NC=C(Cl)C(NC2=CC=CC=C2P(C)(C)=O)=N1)N1CCC(CC1)N1CCN(C)CC1

CC1=CC(NC2=NC=NC3=CC=C(NC4=NC(C)(C)CO4)C=C23)=CC=C1OC1=CC2=NC=NN2C=C1

DB12267

DB11652

49

50

NC(=O)C1=NC(F)=CN=C1O

CN(C)C\C=C\C(=O)NC1=C(O[C@H]2CCOC2)C=C2N=CN=C(NC3=CC(Cl)=C(F)C=C3)C2=C1

DB12466

DB08916

47

CN(C)CCC=C1C2=CC=CC=C2C=CC2=CC=CC=C12

OC1=CC=C(C=C1)C1=C(C(=O)C2=CC=C(OCCN3CCCCC3)C=C2)C2=C(S1)C=C(O)C=C2

CC1=CC(=O)C2=CC=CC=C2C1=O

[O-][N+](=O)C1=CC2=C(NC3=C2CC(=O)NC2=C3C=CC=C2)C=C1

Canonical SMILES

48

DB00481

DB00924

45

44

46

DB04014

DB00170

43

DrugBankID

No.

Table 4.3 (continued)


ERBB

ERBB

ERBB

ERBB

ERBB

ERBB

ERBB

ERBB

ERBB

ERBB

ERBB

ERBB

ERBB

ERBB

ERBB

ERBB

JAK-STAT

JAK-STAT

JAK-STAT

JAK-STAT

JAK-STAT

Targeted pathway


Table 4.3 (continued)

No. | DrugBankID | Canonical SMILES | Targeted pathway
64 | DB09079 | COC(=O)C1=CC=C2C(NC(=O)\C2=C(/NC2=CC=C(C=C2)N(C)C(=O)CN2CCN(C)CC2)C2=CC=CC=C2)=C1 | ERBB



Chapter 5

AOP-Based Machine Learning for Toxicity Prediction

Wei Shi, Rong Zhang, and Haoyue Tan

W. Shi (B) · R. Zhang · H. Tan
State Key Laboratory of Pollution Control and Resources Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, China
e-mail: [email protected]
H. Tan
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Hong (ed.), Machine Learning and Deep Learning in Computational Toxicology, Computational Methods in Engineering & the Sciences, https://doi.org/10.1007/978-3-031-20730-3_5

5.1 Introduction

Many studies have used machine learning (ML) algorithms to build in silico models (Hoeng and Peitsch 2015), but as ML has been applied more widely to toxicity prediction, problems have begun to appear (Ahearn 2020; Miller et al. 2018; Zhong et al. 2021). There is a trade-off: as predictive performance increases, the interpretability of in silico models decreases. Moreover, it is difficult to understand the underlying mechanism through those models (Russo et al. 2019), so the biological mechanisms behind a prediction cannot be elucidated. This drawback of many ML models is known as the “black box” effect (Idakwo et al. 2018). In addition, many studies have shown a lack of consistency between predicted and measured data (Grenet et al. 2018). New and improved ML methods for toxicity prediction are therefore urgently needed.

The adverse outcome pathway (AOP) framework offers a new solution for applying ML to toxicity prediction. An AOP provides a framework for environmental responses in the context of the organism(s). The framework diagram does not describe the entire biological process in detail; instead, it employs known toxicity data to make reasonable predictions of toxic events, which can improve the performance of ML-based prediction models.

This chapter discusses some advantages of building ML models on the AOP framework and summarizes examples of toxicity prediction that combine ML algorithms with the AOP framework. These examples illustrate the practical applicability of AOP-based ML. Finally, the chapter explains the current challenges facing the AOP framework and possible directions for future development.

5.2 Research Status and Existing Problems for ML

There are three problems when using ML to predict toxicity. The first is the ML “black box” effect: the unclear mechanism of toxicity hinders improvements in prediction accuracy. The second is that ML performs well in in vitro toxicity prediction, where the toxicity mechanism is relatively simple, but its predictive accuracy is insufficient for in vivo toxicity endpoints with more complex mechanisms (Wang et al. 2021a). The third is that ML algorithms themselves have certain inherent limitations (Wang et al. 2021b).

ML collects data and builds models for prediction, but when the mechanism is unclear, ML-based toxicity prediction ultimately shows low predictive performance. Bhhatarai et al. (2016) used in vitro data from three species to evaluate OASIS, a QSAR platform that covers estrogen receptor (ER) and androgen receptor (AR) binding. The study concluded that the model shows good predictive performance only when the toxicity mechanism of the compound is clear. For compounds with obscure toxicity mechanisms, further in vivo experiments are required to reach more accurate conclusions.

Mansouri et al. (2021) established the Collaborative Acute Toxicity Modeling Suite (CATMoS). CATMoS used different ML algorithms for model building, such as associative artificial neural networks (ANNs), k-nearest neighbors (KNNs), and support vector machines (SVMs), and built 139 predictive models in total. The models were evaluated quantitatively and qualitatively and then combined into the CATMoS consensus model using a weight of evidence (WoE) approach to take advantage of each method. However, the toxicity endpoint of CATMoS is acute oral toxicity in rats, and this individual-level endpoint lacks a complete AOP framework.
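
The weight-of-evidence idea behind a consensus model can be illustrated with a weighted vote over the binary calls of several models. The following is a minimal sketch and not the CATMoS implementation: the function name, the model names, and the weights are hypothetical, and it assumes that each weight reflects the validation performance of the corresponding model.

```python
def weighted_consensus(predictions, weights, threshold=0.5):
    """Combine binary calls (1 = toxic, 0 = non-toxic) by a weighted vote.

    Models that made no call (None), for example because the compound lies
    outside their applicability domain, are skipped.
    """
    total = 0.0
    toxic = 0.0
    for model, call in predictions.items():
        if call is None:
            continue
        w = weights[model]
        total += w
        toxic += w * call
    if total == 0.0:
        return None, None              # no model could make a call
    score = toxic / total
    return int(score >= threshold), score


# Hypothetical calls and weights for one compound.
predictions = {"ann": 1, "knn": 0, "svm": 1, "rf": None}
weights = {"ann": 0.80, "knn": 0.65, "svm": 0.75, "rf": 0.70}
label, score = weighted_consensus(predictions, weights)
```

In this sketch the consensus score is the weighted fraction of models voting "toxic", so a single well-validated model can outweigh several weak ones.
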
No AOP was available to guide model building in CATMoS, which may explain why 50% of its predictive models have a predictive accuracy below 70%.

Because the toxicity mechanism in in vitro experiments is relatively simple, ML models currently perform well for in vitro toxicity; however, because in vivo toxicity mechanisms are not yet clear, ML cannot accurately model in vivo toxicity against a complex biological background. Schwarzman et al. (2015) pointed out that because some breast carcinogens act on different target organs in different species (Cardona and Rudel 2021), and because the in vivo and in vitro test environments differ greatly in biological background, some predictions from models built on in vitro or animal data show obvious errors. Huang et al. (2016) pointed out that because the mechanisms of toxic effects and the species differences between experimental animals and humans are not fully understood, models based only on in vivo experimental data usually do not perform satisfactorily. The authors also indicated that combining structure- and activity-based models gave better predictions.

Integrating biological mechanism information into prediction models based on chemical structure has great research potential. For example, for endocrine-disrupting effects, Chen et al. (2018) systematically discussed the molecular mechanism of action of steroid hormone receptors (SHRs). The study pointed out that applying the mechanisms of key steps in SHR activation to in silico virtual screening has great research potential. Inspired by this study, Tan et al. (2020) explored in more depth the key features of nuclear receptor (NR)-mediated endocrine disruptors and their mechanisms of action with NRs. The study used molecular docking and molecular dynamics (MD) simulation to explore the relationship between the hierarchical structure and the activity of endocrine-disrupting chemicals (EDCs). It pointed out that the secondary fragments of EDCs determine active binding, and the tertiary fragments determine the type of interference activity. Building on this work, Tan et al. (2021) examined the mechanisms of action and the structure–activity relationships between EDCs and twelve classical nuclear receptors. The study concluded that the similarity of nuclear receptor structures determines the structural similarity of the EDCs that cause interference activity. Integrating mechanistic information into computational models can therefore increase their accuracy in toxicity prediction.
It can be seen that adding mechanistic information to ML modeling can effectively address the problem of prediction accuracy.

The ML algorithms themselves also have certain limitations. Current nonlinear ML algorithms, such as SVMs, random forest (RF), and neural networks, are far superior to traditional linear algorithms and can be applied more effectively to nonlinear biological information. However, the relationships within the data are often vague, so only knowledge-driven ML modeling can deliver excellent predictive ability. Currently, ML modeling in predictive toxicology mainly targets endpoints with clear toxic mechanisms, such as liver toxicity, carcinogenicity, and cardiotoxicity. The prediction of endpoints whose mechanisms are not yet clear needs improvement (Wang et al. 2020). For example, for drug-induced rhabdomyolysis (DIR), Zhou et al. (2021) used the RF algorithm to build a QSAR model to determine the DIR severity of marketed drugs. However, because DIR lacks a clear toxicity mechanism, the predictive ability of the model leaves room for improvement.

One study (Huang et al. 2016) pointed out that read-across depends heavily on chemical structure similarity and therefore faces problems such as compounds with special chemical structures and “activity cliffs”. The study therefore established a clustering model based on chemical fragments to cluster PubChem assay data, obtained biological mechanism information from the model, and finally integrated the toxic mechanism into read-across to improve predictive ability. The study showed that models incorporating mechanisms can better predict the toxicity of compounds.
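
The combination of structural similarity and mechanistic grouping in read-across can be sketched as follows. This is an illustrative simplification, not the method of Huang et al. (2016): compounds are represented as hypothetical sets of fragment names, the `cluster` label stands in for the mechanism information, and the thresholds are arbitrary assumptions.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of structural fragments."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def read_across(query, analogues, k=2, min_similarity=0.3):
    """Predict activity as the similarity-weighted mean of the k nearest
    analogues that share the query's mechanism cluster."""
    candidates = []
    for analogue in analogues:
        if analogue["cluster"] != query["cluster"]:
            continue                   # mechanism filter
        sim = tanimoto(query["fragments"], analogue["fragments"])
        if sim >= min_similarity:
            candidates.append((sim, analogue["activity"]))
    if not candidates:
        return None                    # no suitable analogue
    candidates.sort(reverse=True)
    top = candidates[:k]
    return sum(s * a for s, a in top) / sum(s for s, _ in top)


# Hypothetical data: analogue "C" is structurally identical to the query but
# belongs to another mechanism cluster, so it is excluded.
query = {"fragments": {"phenol", "bromo", "ether"}, "cluster": "TR-binding"}
analogues = [
    {"fragments": {"phenol", "bromo", "ether", "methyl"},
     "cluster": "TR-binding", "activity": 1.0},
    {"fragments": {"phenol", "ether"},
     "cluster": "TR-binding", "activity": 0.0},
    {"fragments": {"phenol", "bromo", "ether"},
     "cluster": "other", "activity": 0.0},
]
prediction = read_across(query, analogues)
```

Filtering by cluster before ranking by similarity is the design choice of interest here: it keeps a structurally close analogue with a different mechanism from dominating the prediction.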

5.3 General Overview of AOP

AOP is a conceptual framework that integrates existing information and links molecular initiating events with adverse outcomes (Fig. 5.1). The framework has a clear logic and sequence, emphasizes the context of toxic events, and is characterized by modularity (Knapen et al. 2018). It is not intended to describe complex biological events completely but instead provides a summary of toxic events from the cellular to the individual level (Garcia-Reyero 2015).

5.3.1 The Generation of AOP

In 2001, the International Program on Chemical Safety (IPCS) proposed using the mode of action (MOA) to extrapolate toxicity data from animals to humans (Sonich-Mullin et al. 2001). As IPCS refined the MOA framework, in 2007 the U.S. National Research Council published the report “Toxicity Testing in the Twenty-first Century: A Vision and a Strategy” (Krewski et al. 2010). The report describes a future vision of toxicity testing and details the feasibility of changes in toxicity testing methods. After its publication, the ecotoxicology community embraced this view and discussed it extensively (Gutsell and Russell 2013). As the discussion deepened, Ankley et al. (2010) proposed the concept of the AOP framework. The AOP framework did not produce a new model but integrated existing knowledge into a conceptual framework for toxicity prediction (Groh et al. 2015). To regulate the development and application of AOPs, the Organization for Economic Co-operation and Development (OECD) established AOP guidelines in 2013 to steer the development and evaluation of the AOP framework (OECD 2013).

Fig. 5.1 Schematic diagrams of the adverse outcome pathway (AOP) and AOP-based machine learning (ML) algorithms for chemical toxicity prediction. Abbreviations: molecular initiating event (MIE), key event (KE), adverse outcome (AO)

5.3.2 The Framework of AOP

An AOP has three main parts: molecular initiating events (MIEs), key events (KEs), and adverse outcomes (AOs) (Fig. 5.1), which are connected through key event relationships (KERs). The MIE is the initial event in an AOP: the interaction between an exogenous substance and a specific biomolecule or biological system. It is the foundation of the AOP framework (Allen et al. 2014, 2016). Compared with the other parts, the connection between the compound and the MIE is the strongest. KEs are the intermediate events connecting the MIE and the AO; they should be measurable, observable, and repeatable. The AO is the adverse biological effect that occurs at the individual or population level and can sometimes refer to tissue or organ effects (Chen et al. 2017). In general, the MIE is the defined starting point of the AOP framework, and the AO is its defined endpoint. The two anchor points are connected by KEs, with at least one KE between the MIE and the AO (Edwards et al. 2016).
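
The structure described above maps naturally onto a small directed graph in which the nodes are events and the edges are KERs. The sketch below is one possible representation under that assumption; the class name `AOP`, its methods, and the placeholder event "adverse outcome" are hypothetical and do not correspond to any AOP-Wiki or OECD data format.

```python
from dataclasses import dataclass, field


@dataclass
class AOP:
    mie: str                                   # molecular initiating event
    ao: str                                    # adverse outcome
    kers: dict = field(default_factory=dict)   # upstream -> downstream events

    def add_ker(self, upstream, downstream):
        """Register a key event relationship (a directed edge)."""
        self.kers.setdefault(upstream, []).append(downstream)

    def paths(self):
        """Return every event chain leading from the MIE to the AO."""
        found, stack = [], [[self.mie]]
        while stack:
            path = stack.pop()
            if path[-1] == self.ao:
                found.append(path)
                continue
            for nxt in self.kers.get(path[-1], []):
                if nxt not in path:            # guard against cycles
                    stack.append(path + [nxt])
        return found

    def is_valid(self):
        """True if the AO is reachable and every path contains >= 1 KE."""
        paths = self.paths()
        return bool(paths) and all(len(p) >= 3 for p in paths)


# Illustrative receptor-mediated pathway with three intermediate KEs.
aop = AOP(mie="receptor binding", ao="adverse outcome")
aop.add_ker("receptor binding", "receptor dimerization")
aop.add_ker("receptor dimerization", "DNA binding")
aop.add_ker("DNA binding", "transactivation")
aop.add_ker("transactivation", "adverse outcome")
```

Because KEs are shared nodes rather than parts of a fixed chain, several such objects could reuse the same event names, which is one way to reflect the modularity of the framework.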

5.3.3 Qualitative AOP and Quantitative AOP

A qualitative AOP integrates existing knowledge to build a systematic conceptual framework between the MIE and the AO. However, because it describes each KE only qualitatively, it cannot be used to assess chemical risks accurately, nor to predict the dose or concentration of a chemical that will lead to an AO at the individual or population level. The quantitative AOP (qAOP), which quantitatively describes the relationships between the MIE, KEs, and AO, was developed to fill this gap. A qAOP consists of one or several biological computational models, which not only describe the relationships between KEs mathematically but also provide dose–response and time–response information (Ying et al. 2021). Researchers can therefore use qAOPs to obtain risk estimates without long-term animal experiments (Conolly et al. 2017).
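
To make the idea of quantified key event relationships concrete, the sketch below chains three response–response functions from dose to AO and then inverts the chain to find the dose that produces a given AO response. It is a toy example under stated assumptions: the Hill-type form of each relationship and all parameter values are hypothetical and are not taken from any published qAOP.

```python
def hill(x, top, ec50, slope):
    """Hill-type response that rises from 0 toward `top` as x increases."""
    if x <= 0:
        return 0.0
    return top * x**slope / (ec50**slope + x**slope)


# Hypothetical relationships along the chain dose -> MIE -> KE -> AO.
def mie_response(dose):
    return hill(dose, top=1.0, ec50=10.0, slope=1.0)


def ke_response(mie):
    return hill(mie, top=1.0, ec50=0.5, slope=2.0)


def ao_response(ke):
    return hill(ke, top=1.0, ec50=0.5, slope=2.0)


def qaop(dose):
    """AO response predicted for a dose by composing the three relationships."""
    return ao_response(ke_response(mie_response(dose)))


def dose_for_effect(target, low=1e-6, high=1e6, iterations=200):
    """Find the dose giving `target` AO response by bisection on a log scale.

    Relies on qaop() increasing monotonically with dose.
    """
    if not qaop(low) < target <= qaop(high):
        return None                    # target not reachable in this range
    for _ in range(iterations):
        mid = (low * high) ** 0.5      # geometric midpoint
        if qaop(mid) < target:
            low = mid
        else:
            high = mid
    return high


dose_10 = dose_for_effect(0.10)        # dose predicted to give a 10% AO response
```

The forward direction (`qaop`) answers "what response does this dose cause", and the inverse (`dose_for_effect`) answers the risk assessment question of which dose leads to a given level of AO, which a qualitative AOP cannot do.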

5.4 Research Progress of Toxicity Prediction by AOP and ML

There are few modeling cases combining AOP and ML for toxicity prediction. Based on existing cases, this chapter summarizes the different types of AOP modeling combined with ML (Fig. 5.2). Existing research mainly uses ML algorithms to model and predict different parts of the AOP framework. In the AOP framework, the MIE and the AO are the two defined anchor points. The MIE is the initial molecular interaction between the compound and the organism; studies have built ML models for the MIE, using the structural characteristics of compounds combined with toxic mechanisms to screen out compounds with potentially toxic effects (Benigni 2017; Benigni et al. 2017). The other anchor point is the AO. Studies have used in vitro experimental data to classify compounds by MOA; ML modeling and prediction based on MOA can provide more AO data with clear toxicity mechanisms. For toxicity endpoints with a complete AOP framework, studies have applied ML models to the entire AOP framework to quantify the activity of compounds on the toxicity pathway or to provide a clear toxicity mechanism for modeling and prediction. Finally, the latest research combining the AOP framework with ML is moving toward qAOP, which quantifies the relationships between key events and provides a feasible solution for compound risk assessment.

The MIE is the first step of an AOP and directly determines whether a compound can lead to the AO. MIE-based ML toxicity prediction has therefore become a key first step in screening for potentially toxic compounds (Wedlake et al. 2019). It has been applied to the prediction of liver toxicity, pulmonary toxicity, endocrine-disrupting effects, and mitochondrial toxicity (Gadaleta et al. 2018; Seo et al. 2021; Chen et al. 2016; Troger et al. 2020). By combining structural and mechanistic information, MIE-based ML toxicity prediction allows researchers to screen potentially toxic drugs effectively (Yang et al. 2018; Wu and Wang 2018).

For example, many studies have addressed the prediction of endocrine-disrupting effects. At present, this prediction still suffers from a lack of effective in vivo data and an unclear MOA. Research on the mechanism of the MIE will further promote the progress of ML in toxicity prediction. Chen et al. (2016) pointed out that using a QSAR model alone to predict the interaction between hydroxylated polybrominated diphenyl ethers (HO-PBDEs) and thyroid hormone receptors (TRs) is insufficient, because QSAR prediction suffers from “activity cliffs” (Maggiora 2006), whereas MD-based simulations can incorporate the mechanism of action to obtain more accurate predictions. This study used MD simulations based on the mechanism of coregulators to reveal the relationship between HO-PBDEs, TRs, and coregulators. The mechanistic information from this research can open the way for combining MIE and ML technology to predict endocrine-disrupting effects.
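
An activity cliff is a pair of compounds that are structurally very similar yet differ strongly in activity. One common way to flag such pairs is the structure-activity landscape index (SALI), the activity difference divided by one minus the structural similarity. The sketch below assumes that pairwise similarities have already been computed; the compound names, activity values, similarities, and cutoff are hypothetical.

```python
from itertools import combinations


def sali(activity_i, activity_j, similarity):
    """Structure-activity landscape index for one compound pair."""
    if similarity >= 1.0:
        return float("inf") if activity_i != activity_j else 0.0
    return abs(activity_i - activity_j) / (1.0 - similarity)


def activity_cliffs(activities, similarities, cutoff):
    """Return the compound pairs whose SALI exceeds the cutoff."""
    cliffs = []
    for i, j in combinations(sorted(activities), 2):
        value = sali(activities[i], activities[j], similarities[(i, j)])
        if value > cutoff:
            cliffs.append((i, j))
    return cliffs


# Hypothetical potencies (log units) and precomputed pairwise similarities.
activities = {"a": 7.2, "b": 4.1, "c": 7.0}
similarities = {("a", "b"): 0.90, ("a", "c"): 0.85, ("b", "c"): 0.40}
cliffs = activity_cliffs(activities, similarities, cutoff=10.0)
```

Pairs flagged in this way are the cases where a purely structure-based model is most likely to fail and where mechanistic information about the MIE is most useful.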

Fig. 5.2 Deficits of ML for toxicity prediction include the toxic mechanism “black box”, low predictive performance for in vivo toxicity, and the limitations of ML algorithms. To solve these problems, many studies have used ML models based on the AOP framework to predict the toxicity of compounds. The advantages of AOP for toxicity prediction include using molecular initiating events as the first in silico screening step, classifying compounds by mode of action, providing a framework to describe the complex biological background, and quantifying adverse effects at the AOP level. Finally, the figure summarizes the perspectives of AOP-based ML for toxicity prediction

As the mechanisms of endocrine-disrupting effects have been progressively clarified, MIE-based ML toxicity prediction has been used more widely in screening EDCs. For the MIE of NR-mediated toxicity of EDCs, Wang et al. (2013) used in vitro experiments and MD simulations with the particle mesh Ewald (PME) algorithm to explore the binding modes of hydroxylated and methoxylated polybrominated diphenyl ethers (HO-/MeO-PBDEs) with the AR. The study concluded that predicting androgenic activity only from the substituent groups and the number of bromine atoms introduces a certain error. It also pointed out that the repositioning of helix 12 plays an important role in AR-mediated toxic effects, and concluded that knowledge of the MIE can effectively assist structure-based computational toxicity prediction. Similarly, Chen et al. (2019) conducted MD simulations using the PME algorithm to analyze the MIE of bisphenols (BPs) interfering with the AR. This study combined the results of in vitro experiments and MD simulation to explain the mechanism of the MIE and to analyze, qualitatively and quantitatively, the potential of BPs to disrupt AR-mediated pathways. Also for NR-mediated toxicity of EDCs, Wang et al. (2017) used MD simulation to model and predict the MIE of the NR-mediated AOP framework. The researchers developed a program called the system for predicting potential effective nuclear receptors (SPEN). The approach is based on the AutoDock Vina platform and uses a database of 39 human NRs. The study showed that combining MD simulation with the MIE can effectively predict potentially effective NRs from mechanistic and structural information.

In addition to endocrine-disrupting effects, similar studies have focused on liver toxicity prediction. For an apical adverse effect such as nonalcoholic fatty liver disease, which is caused by multiple mechanisms, the predictive performance of QSAR is strongly affected. Gadaleta et al. (2018) therefore used QSAR models to predict the MIEs that lead to hepatic steatosis. The study collected information on the MIEs of hepatic steatosis from the AOP-Wiki; there was a clear pathway relationship between these MIEs and the AO of hepatic steatosis. The study identified six cellular targets as potential MIEs and modeled them with the balanced random forest (BRF) algorithm; the minimum prediction accuracy of most models was greater than 75%. Using these AOP-based predictions, researchers can screen out chemicals that potentially cause hepatic steatosis and then verify them experimentally.

Troger et al. (2020) applied MIE-based ML to mitochondrial toxicity. Drugs that regulate mitochondria usually cause strong side effects.
However, it is difficult to detect whether drugs induce mitochondrial toxicity in traditional animal experiments. This study modeled the MIE of the AOP in which inhibition of mitochondrial respiratory complex I (CI) by the pesticide rotenone leads to the motor deficits of Parkinson’s disease. The study first constructed a pharmacophore model using a structure-based method and then used an ensemble of gradient boosting, RF, and deep learning models to screen public databases for compounds likely to cause mitochondrial toxicity. Finally, three in vitro test methods were used to further filter the screened compounds. The mitochondrial toxicity AOP framework combined the predictive power of ML models with the interpretability of structure-based methods and provides a model prediction network for exploring mitochondrial toxicity in complex in vivo environments.

A recent study compared several MIE-based modeling techniques for predicting the potential pulmonary toxicity of compounds (Seo et al. 2021). Pulmonary fibrosis is a structural abnormality caused by scarring of normal alveolar tissue when damage is followed by abnormal repair. Some fungicides currently on the market can cause the AO of pulmonary fibrosis. To predict the pulmonary fibrosis toxicity of a large number of fungicides, the study applied QSAR, molecular dynamics simulation, and pharmacophore modeling to the MIE of the pulmonary fibrosis AOP and found that the QSAR model using the SVM algorithm performed best. The study showed that MIE-based ML toxicity prediction can effectively screen fungicides with the potential to cause pulmonary fibrosis.

The AOP framework provides a clear toxicity mechanism for the AO of compounds. Classifying compounds by MOA based on the AOP and then building ML models for each MOA therefore provides an effective way of supplying toxicity data. Theoretically, compounds with similar MOAs lead to similar adverse effects. Lichtenstein et al. (2020) used a nonalcoholic fatty liver disease AOP framework to evaluate the mixture effects of binary and ternary mixtures of three target pesticides. The test compounds were selected from chemicals grouped by MOA based on the nonalcoholic fatty liver disease AOP, and the in vitro tests used to evaluate the mixture effects also incorporated important MOA information. The study showed that AOPs have good research potential for assessing the effects of combined exposures.

The development of ML can also help researchers better understand the toxicity mechanisms of compounds, thereby promoting the development of the AOP framework. For example, Xu et al. (2020) integrated in vitro high-throughput data, literature data, and text mining data with five ML algorithms (Naive Bayes, neural network, SVM, RF, and extreme gradient boosting) to establish predictive models for in vivo endpoints at different organ levels. This research produced a systematic and comprehensive analysis of different in vivo endpoints. The predicted results can reveal the toxicity pathways related to the endpoints and ultimately expand and supplement the existing AOP framework.

The theoretical framework of AOP builds a model prediction network for predicting in vivo toxicity under a complex biological background (Hsieh et al. 2020). Ankley et al. (2010) first proposed the concept of AOP.
After that, the OECD developed the first AOP framework, which was successfully used for skin sensitization (Edwards et al. 2016). Skin sensitization events are inflammatory skin reactions in individuals. In the past, researchers usually used animal experiments to predict whether chemicals can trigger skin sensitization (Kim et al. 2019). However, with the growing adoption of the 3R principle, the EU has banned the sale of cosmetics and ingredients tested on animals since 2009. In this context, the OECD developed the skin sensitization AOP, which is also the first AOP framework to be successfully applied in regulation. Within the OECD AOP framework for skin sensitization, Borba et al. (2020) constructed the Pred-Skin application. Pred-Skin takes advantage of the modularity of the AOP framework: it uses a new database of local lymph node assay data containing 1000 compounds, a set of 138 compounds with human skin sensitization data, and a set of in vitro and chemical data to establish QSAR models, which are then integrated into a consensus Naive Bayes model that predicts human skin sensitization. The skin sensitization AOP framework developed by the OECD has a clear and complete structure, and its KEs occur independently (Sakuratani et al. 2018). Building on these characteristics, Pred-Skin uses ML algorithms to combine biological data at the molecular, cellular, and organ levels for comprehensive prediction. The resulting consensus Naive Bayes model achieves a correct classification rate of 89%. Based on the skin sensitization AOP framework, Pred-Skin can serve as a reliable alternative to animal testing for predicting in vivo human skin sensitization.

In addition to skin sensitization, the combination of AOP and ML algorithms has also been applied to other toxicity endpoints with a complete AOP framework, such as liver toxicity and endocrine-disrupting toxicity. Nonalcoholic fatty liver disease is a common liver disease, and numerous studies have linked endocrine disruptor exposure to it. It is therefore very important to predict the binding potential of compounds to transcription factors. Jain et al. (2020) extracted high-quality in vivo experimental data from public databases based on the hepatic steatosis AOP, used the RF algorithm as a base classifier, and applied stratified undersampling and Mondrian conformal prediction to model the in vivo data; the resulting model can predict the potential of a compound to cause liver steatosis in human tissues. The study pointed out that the hepatic steatosis AOP integrates the toxicity mechanism of the compound into the model, which uses in vivo data for ML modeling.

AOP makes it possible to quantify the adverse effects of compounds at the pathway level (Judson et al. 2015). The AOP framework combined with ML algorithms has also been widely used to predict endocrine-disrupting toxicity endpoints. Browne et al. (2015) selected 18 high-throughput in vitro assays based on the ER-mediated AOP. These 18 assays covered not only KEs in the AOP, such as receptor dimerization, DNA binding, and transactivation, but also the MIE, receptor binding. The 18 high-throughput screening results were then integrated into a computational model that predicts the ER bioactivity of chemicals. Kleinstreuer et al. (2017) generated high-throughput screening data for 1855 chemicals in 11 Tox21/ToxCast assays.
Next, they mapped the high-throughput screening data to key biological events in the AR pathway and integrated the data for each chemical through a computational model, from which the overall AR pathway activity can be derived. The quantified pathway activity also greatly improved the toxicity prediction ability of ML. Mansouri et al. (2016) developed the CERAPP prediction model based on the above research data on ER bioactivity. CERAPP used different QSAR and structure-based methods to establish 40 classification models and 8 continuous models for predicting agonist, antagonist, and binding activity toward the ER, and the scores of the individual models ranged from 0.69 to 0.85. In addition, Mansouri et al. (2020) used quantitative AR pathway activity data from the above AOP-based research to develop the CoMPARA prediction model. CoMPARA used ML algorithms such as artificial neural networks (ANNs), k-nearest neighbors (KNNs), and linear discriminant analysis (LDA) to construct 91 predictive models for agonist, antagonist, and binding activity toward the AR. Finally, these predictions were combined into a consensus model, which showed great predictive performance.

The most recent work combines advanced deep learning algorithms with the AOP framework and has achieved good results for endocrine-disrupting toxicity. Ciallella et al. (2021) developed a knowledge-based deep neural network (k-DNN). They used a virtual AOP (vAOP) framework to simulate the toxicity pathway of ERα and ERβ agonists. The training data came from a data set of 42 compounds with known in vivo rodent uterotrophic bioactivity. This vAOP-based deep neural network model accurately simulated the combined effects of in vitro bioassays related to the KEs of toxicity pathways and predicted in vivo estrogenic activity by incorporating high-quality biological data during model training. These few examples show that quantifying compound pathway activity through AOPs gives ML higher predictive reliability.

qAOP provides a solution for quantitatively predicting the potential risks of chemicals along an AOP (Perkins et al. 2019). By quantifying the MIE-KE, KE-KE, and KE-AO relationships, researchers can use computational models to predict toxicity for various exposure scenarios and compounds, so that the activity of a compound can be extrapolated with evidence. The inhibition of aromatase leads to a decrease in fish vitellogenin (VTG) levels, which ultimately causes a decline in fish reproductive capacity (Reynaud and Deschaux 2006). This AOP is a typical qAOP: it is composed of eight KEs and contains three independent computational models, the hypothalamic-pituitary-gonadal (HPG) model, the oocyte growth dynamics model (OGDM), and a population model (Conolly et al. 2017), which are coupled together. The HPG model uses data from an experiment in which Pimephales promelas was exposed to the aromatase inhibitor fadrozole; it quantitatively describes the aromatase-catalyzed conversion of testosterone to 17β-estradiol and the stimulation of VTG production, including mechanisms related to VTG transport and endocrine feedback. The OGDM quantitatively describes the relationship between circulating plasma VTG, oocyte development, and egg production, and the influence of input values on vital rates.
The qAOP in this study combined the three models to quantify the effectiveness of chemicals as aromatase inhibitors in reducing fish fertility.

Another qAOP study quantified the dose or concentration of compounds that leads to harmful outcomes at the individual and population levels, providing quantitative toxicity data for compound risk assessment. Dioxin-like compounds (DLCs) are a class of substances with similar structures and high affinities for the aryl hydrocarbon receptor (AHR). DLCs can activate the AHR and cause AOs, including the death of individual fish and birds. Doering et al. (2018) took activation of the AHR by DLCs as the MIE and increased early-life mortality as the AO, proposed an indirect quantitative link between them, and constructed a qAOP based on this link. The study collected representative concentration–response data for DLCs from published embryo exposure tests and in vitro AHR experiments to construct the qAOP, and used measured concentration–response curves of DLCs and mortality to verify it. The qAOP uses the AOP pathway activity of a reference compound as the unit of activity to quantify the dose or concentration of a compound that causes the AO at the individual and population levels. This can provide quantitative toxicity data for compound risk assessment (Table 5.1).
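
Expressing the activity of a compound in units of a reference compound can be sketched as a relative potency calculation. The sketch below is a deliberate simplification and not the model of Doering et al. (2018): it assumes parallel concentration–response curves, so that one relative potency derived from an MIE assay can rescale the in vivo effect concentration of the reference compound. The function names and all numeric values are hypothetical.

```python
def relative_potency(ec50_reference, ec50_compound):
    """Potency of a compound at the MIE relative to the reference (= 1)."""
    return ec50_reference / ec50_compound


def predicted_effect_concentration(lc50_reference, rep):
    """Scale the reference compound's in vivo effect concentration by the
    relative potency: a less potent compound needs a higher concentration."""
    return lc50_reference / rep


# Hypothetical in vitro EC50 values for receptor activation (same units).
rep = relative_potency(ec50_reference=0.1, ec50_compound=2.0)

# Hypothetical in vivo LC50 of the reference compound for the AO.
lc50_compound = predicted_effect_concentration(lc50_reference=1.0, rep=rep)
```

Under these assumptions, a compound that is 20 times less potent than the reference at the MIE is predicted to need a 20 times higher concentration to cause the same AO; comparison with measured concentration–response curves would be needed to check whether such a scaling holds.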

Table 5.1 Example summary of AOP combined with ML for toxicity prediction

Research case | Toxicology endpoint | Used methods | References
MIE | Androgen endocrine disruption | Particle mesh Ewald (PME) | Wang et al. (2013)
MIE | Androgen endocrine disruption | PME | Chen et al. (2019)
MIE | Nuclear receptor-mediated endocrine disruption | AutoDock Vina | Wang et al. (2017)
MIE | Hepatic steatosis | Balanced random forest (BRF) | Gadaleta et al. (2018)
MIE | Mitochondrial toxicity | Gradient boosting, random forest (RF), deep learning | Troger et al. (2020)
MIE | Pulmonary fibrosis | Support vector machines (SVMs), genetic algorithm (GA) | Seo et al. (2021)
AO | Nonalcoholic fatty liver disease | Concentration–response model | Lichtenstein et al. (2020)
AO | Carcinogenicity, cardiotoxicity, developmental toxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, reproductive toxicity, skin toxicity | Naive Bayes, neural network, SVM, RF, extreme gradient boosting | Xu et al. (2020)
AOP framework | Skin sensitization | Naive Bayes | Borba et al. (2020)
AOP framework | Nonalcoholic fatty liver disease | RF, stratified undersampling, Mondrian conformal prediction | Jain et al. (2020)
AOP framework | Estrogen endocrine disruption | SVM, RF, k-nearest neighbors (KNNs), decision forest (DF), … | Browne et al. (2015) and Mansouri et al. (2016)
AOP framework | Androgen endocrine disruption | Artificial neural networks (ANNs), KNNs, linear discriminant analysis (LDA), … | Kleinstreuer et al. (2017) and Mansouri et al. (2020)
AOP framework | Estrogen endocrine disruption | Knowledge-based deep neural network (k-DNN) | Ciallella et al. (2021)
Quantitative AOP (qAOP) | The decline in fish reproductive capacity | Hypothalamic-pituitary-gonadal (HPG), oocyte growth dynamics (OGDM), population model | Conolly et al. (2017)
Quantitative AOP (qAOP) | Increase in individual mortality in early life | Concentration–response model | Doering et al. (2018)


5.5 Perspectives and Future Prospects of AOP

The development of ML models based on the complete AOP framework is limited by a severe lack of information on toxicity mechanisms and of related data. Many studies have shown that compounds can interfere with thyroid hormone (TH) signals and cause abnormal brain development and other AOs in animals (Bernal and Nunez 1995). A thyroid AOP network has been proposed, but because relevant research data are lacking, most KEs have no test experiments (Noyes et al. 2019). To solve this problem, future research should focus on developing high-throughput test experiments based on clear AOPs, such as ToxCast™/Tox21, to increase the amount of biological activity data.

The AOP framework also suffers from incompleteness: the toxicity mechanisms of some endpoints remain unclear. With the advent of the big data era, researchers can use ML and large-scale biological activity data to reverse engineer possible AOP frameworks and then verify them experimentally.

In practical applications of toxicity prediction with the AOP framework, it cannot be ignored that the compound MOA remains a "black box." The most critical issue is that the MIE of most chemical substances is still unknown, which severely limits the application of AOPs in the risk assessment of chemical substances. Future AOP research can address each event by building predictive models that predict, in batches, whether compounds activate or inhibit multiple MIEs.

Finally, existing AOPs do not yet consider the exposure process of chemical substances. The difference between in vitro and in vivo exposure concentrations greatly increases the uncertainty of the prediction. To solve this problem, researchers can integrate PBPK models into the quantitative predictions of qAOPs, taking full account of the active concentration at the target in the body.

In summary, AOP-based ML toxicity prediction is still in its infancy and examples remain scarce. However, with the development of alternative test methods, the "explosive" growth of biological activity data gives AOP-based toxicity prediction great development potential.

Acknowledgements

This work was supported by the Natural Science Foundation of China (21922603) and Identification and evaluation of anti-androgen active substances in water based on bioconcentration transformation process and biomimetic technology (21577058).

References

Ahearn A (2020) The art of the algorithm: machine learning in environmental health research, with Nicole Kleinstreuer. Environ Health Perspect Res Perspect 1. https://doi.org/10.1289/EHP6874
Allen TE, Goodman JM, Gutsell S, Russell PJ (2014) Defining molecular initiating events in the adverse outcome pathway framework for risk assessment. Chem Res Toxicol 27(12):2100–2112
Allen TE, Goodman JM, Gutsell S, Russell PJ (2016) A history of the molecular initiating event. Chem Res Toxicol 29(12):2060–2070
Ankley GT, Bennett RS, Erickson RJ, Hoff DJ, Hornung MW, Johnson RD, Mount DR, Nichols JW, Russom CL, Schmieder PK (2010) Adverse outcome pathways: a conceptual framework to support ecotoxicology research and risk assessment. Environ Toxicol Chem 29(3):730–741
Benigni R (2017) Building predictive adverse outcome pathway models: role of molecular initiating events and structure–activity relationships. Appl Vitro Toxicol 3(3):265–270
Benigni R, Battistelli CL, Bossa C, Giuliani A, Tcheremenskaia O (2017) Endocrine disruptors: data-based survey of in vivo tests, predictive models and the adverse outcome pathway. Regul Toxicol Pharmacol 86:18–24
Bernal J, Nunez J (1995) Thyroid hormones and brain development. Eur J Endocrinol 133(4):390–398
Bhhatarai B, Wilson DM, Price PS, Marty S, Parks AK, Carney E (2016) Evaluation of oasis QSAR models using toxcast™ in vitro estrogen and androgen receptor binding data and application in an integrated endocrine screening approach. Environ Health Perspect 124(9):1453–1461
Borba JV, Braga RC, Alves VM, Muratov EN, Kleinstreuer N, Tropsha A, Andrade CH (2020) Pred-skin: a web portal for accurate prediction of human skin sensitizers. Chem Res Toxicol 34(2):258–267
Browne P, Judson RS, Casey WM, Kleinstreuer NC, Thomas RS (2015) Screening chemicals for estrogen receptor bioactivity using a computational model. Environ Sci Technol 49(14):8804–8814
Cardona B, Rudel RA (2021) Application of an in vitro assay to identify chemicals that increase estradiol and progesterone synthesis and are potential breast cancer risk factors. Environ Health Perspect 129(7):077003
Chen Q, Wang X, Shi W, Yu H, Zhang X, Giesy JP (2016) Identification of thyroid hormone disruptors among HO-PBDEs: in vitro investigations and coregulator involved simulations. Environ Sci Technol 50(22):12429–12438
Chen Q, Tan H, Wei S, Yu H (2017) Application and prospect of computational toxicology in screening of endocrine disrupting chemicals. Asian J Ecotoxicol
Chen Q, Tan H, Yu H, Shi W (2018) Activation of steroid hormone receptors: shed light on the in silico evaluation of endocrine disrupting chemicals. Sci Total Environ 631:27–39
Chen Q, Wang X, Tan H, Shi W, Zhang X, Wei S, Giesy JP, Yu H (2019) Molecular initiating events of bisphenols on androgen receptor-mediated pathways provide guidelines for in silico screening and design of substitute compounds. Environ Sci Technol Lett 6(4):205–210
Ciallella HL, Russo DP, Aleksunes LM, Grimm FA, Zhu H (2021) Revealing adverse outcome pathways from public high-throughput screening data to evaluate new toxicants by a knowledge-based deep neural network approach. Environ Sci Technol 55(15):10875–10887
Conolly RB, Ankley GT, Cheng W, Mayo ML, Miller DH, Perkins EJ, Villeneuve DL, Watanabe KH (2017) Quantitative adverse outcome pathways and their application to predictive toxicology. Environ Sci Technol 51(8):4661–4672
Doering JA, Wiseman S, Giesy JP, Hecker M (2018) A cross-species quantitative adverse outcome pathway for activation of the aryl hydrocarbon receptor leading to early life stage mortality in birds and fishes. Environ Sci Technol 52(13):7524–7533
Edwards SW, Tan Y-M, Villeneuve DL, Meek M, Mcqueen CA (2016) Adverse outcome pathways: organizing toxicological information to improve decision making. J Pharmacol Exp Ther 356(1):170–181
Gadaleta D, Manganelli S, Roncaglioni A, Toma C, Benfenati E, Mombelli E (2018) QSAR modeling of toxcast assays relevant to the molecular initiating events of AOPs leading to hepatic steatosis. J Chem Inf Model 58(8):1501–1517
Garcia-Reyero N (2015) Are adverse outcome pathways here to stay? Environ Sci Technol 49(1):3–9
Grenet I, Yin Y, Comet JP, Gelenbe E (2018) Machine learning to predict toxicity of compounds. In: Proceedings of the ICANN 2018: artificial neural networks and machine learning. Rhodes, Greece, 4–7 Oct 2018. Springer, Berlin, pp 335–345
Groh KJ, Carvalho RN, Chipman JK, Denslow ND, Halder M, Murphy CA, Roelofs D, Rolaki A, Schirmer K, Watanabe KH (2015) Development and application of the adverse outcome pathway framework for understanding and predicting chronic toxicity: I. Challenges and research needs in ecotoxicology. Chemosphere 120:764–777
Gutsell S, Russell P (2013) The role of chemistry in developing understanding of adverse outcome pathways and their application in risk assessment. Toxicol Res 2(5):299–307
Hoeng J, Peitsch MC (2015) Computational systems toxicology. Springer, New York
Hsieh J-H, Sedykh A, Mutlu E, Germolec DR, Auerbach SS, Rider CV (2020) Harnessing in silico, in vitro, and in vivo data to understand the toxicity landscape of polycyclic aromatic compounds (PACS). Chem Res Toxicol 34(2):268–285
Huang R, Xia M, Sakamuru S, Zhao J, Shahane S, Attene-Ramos M, Zhao T, Austin C, Simeonov A (2016) Modelling the Tox21 10 K chemical profiles for in vivo toxicity prediction and mechanism characterization. Nat Commun 7:10425
Idakwo G, Luttrell J, Chen M, Hong H, Zhou Z, Gong P, Zhang C (2018) A review on machine learning methods for in silico toxicity prediction. J Environ Sci Health C 36(4):169–191
Jain S, Norinder U, Escher SE, Zdrazil B (2020) Combining in vivo data with in silico predictions for modeling hepatic steatosis by using stratified bagging and conformal prediction. Chem Res Toxicol 34(2):656–668
Judson RS, Magpantay FM, Chickarmane V, Haskell C, Tania N, Taylor J, Xia M, Huang R, Rotroff DM, Filer DL (2015) Integrated model of chemical perturbations of a biological pathway using 18 in vitro high-throughput screening assays for the estrogen receptor. Toxicol Sci 148(1):137–154
Kim JY, Kim MK, Kim K-B, Kim HS, Lee B-M (2019) Quantitative structure–activity and quantitative structure–property relationship approaches as alternative skin sensitization risk assessment methods. J Environ Sci Health A 82(7):447–472
Kleinstreuer NC, Ceger P, Watt ED, Martin M, Houck K, Browne P, Thomas RS, Casey WM, Dix DJ, Allen D (2017) Development and validation of a computational model for androgen receptor activity. Chem Res Toxicol 30(4):946–964
Knapen D, Angrish MM, Fortin MC, Katsiadaki I, Leonard M, Margiotta-Casaluci L, Munn S, O’Brien JM, Pollesch N, Smith LC (2018) Adverse outcome pathway networks i: development and applications. Environ Toxicol Chem 37(6):1723–1733
Krewski D, Acosta D Jr, Andersen M, Anderson H, Bailar Iii JC, Boekelheide K, Brent R, Charnley G, Cheung VG, Green Jr S (2010) Toxicity testing in the 21st century: a vision and a strategy. J Toxicol Environ Health Part B 13(2–4):51–138
Lichtenstein D, Luckert C, Alarcan J, De Sousa G, Gioutlakis M, Katsanou ES, Konstantinidou P, Machera K, Milani ES, Peijnenburg A (2020) An adverse outcome pathway-based approach to assess steatotic mixture effects of hepatotoxic pesticides in vitro. Food Chem Toxicol 139:111283
Maggiora GM (2006) On outliers and activity cliffs why QSAR often disappoints. J Chem Inf Model 46(4):1535
Mansouri K, Abdelaziz A, Rybacka A, Roncaglioni A, Tropsha A, Varnek A, Zakharov A, Worth A, Richard AM, Grulke CM (2016) CERAPP: collaborative estrogen receptor activity prediction project. Environ Health Perspect 124(7):1023–1033
Mansouri K, Kleinstreuer N, Abdelaziz AM, Alberga D, Alves VM, Andersson PL, Andrade CH, Bai F, Balabin I, Ballabio D (2020) Compara: collaborative modeling project for androgen receptor activity. Environ Health Perspect 128(2):027002
Mansouri K, Karmaus AL, Fitzpatrick J, Patlewicz G, Pradeep P, Alberga D, Alepee N, Allen TE, Allen D, Alves VM (2021) Catmos: collaborative acute toxicity modeling suite. Environ Health Perspect 129(4):047013
Miller TH, Gallidabino MD, MacRae JI, Hogstrand C, Bury NR, Barron LP, Snape JR, Owen SF (2018) Machine learning for environmental toxicology: a call for integration and innovation. Environ Sci Technol 52(22):12953–12955
Noyes PD, Friedman KP, Browne P, Haselman JT, Gilbert ME, Hornung MW, Barone S Jr, Crofton KM, Laws SC, Stoker TE (2019) Evaluating chemicals for thyroid disruption: opportunities and challenges with in vitro testing and adverse outcome pathway approaches. Environ Health Perspect 127(9):095001
OECD (2013) Guidance document on developing and assessing adverse outcome pathways. In: Series on testing and assessment, vol 184. Available http://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=env/jm/mono(2013)6&doclanguage=en
Perkins EJ, Ashauer R, Burgoon L, Conolly R, Landesmann B, Mackay C, Murphy CA, Pollesch N, Wheeler JR, Zupanic A (2019) Building and applying quantitative adverse outcome pathway models for chemical hazard and risk assessment. Environ Toxicol Chem 38(9):1850–1865
Reynaud S, Deschaux P (2006) The effects of polycyclic aromatic hydrocarbons on the immune system of fish: a review. Aquat Toxicol 77(2):229–238
Russo DP, Strickland J, Karmaus AL, Wang W, Shende S, Hartung T, Aleksunes LM, Zhu H (2019) Nonanimal models for acute toxicity evaluations: applying data-driven profiling and read-across. Environ Health Perspect 127(4):047001
Sakuratani Y, Horie M, Leinala E (2018) Integrated approaches to testing and assessment: OECD activities on the development and use of adverse outcome pathways and case studies. Basic Clin Pharmacol Toxicol 123:20–28
Schwarzman MR, Ackerman JM, Dairkee SH, Fenton SE, Johnson D, Navarro KM, Osborne G, Rudel RA, Solomon GM, Zeise L (2015) Screening for chemical contributions to breast cancer risk: a case study for chemical safety evaluation. Environ Health Perspect 123(12):1255–1264
Seo M, Chae CH, Lee Y, Kim HR, Kim J (2021) Novel QSAR models for molecular initiating event modeling in two intersecting adverse outcome pathways based pulmonary fibrosis prediction for biocidal mixtures. Toxics 9(3):59
Sonich-Mullin C, Fielder R, Wiltse J, Baetcke K, Dempsey J, Fenner-Crisp P, Grant D, Hartley M, Knaap A, Kroese D (2001) IPCS conceptual framework for evaluating a mode of action for chemical carcinogenesis. Regul Toxicol Pharmacol 34(2):146–152
Tan H, Wang X, Hong H, Benfenati E, Giesy JP, Gini GC, Kusko R, Zhang X, Yu H, Shi W (2020) Structures of endocrine-disrupting chemicals determine binding to and activation of the estrogen receptor α and androgen receptor. Environ Sci Technol 54(18):11424–11433
Tan H, Chen Q, Hong H, Benfenati E, Gini GC, Zhang X, Yu H, Shi W (2021) Structures of endocrine-disrupting chemicals correlate with the activation of 12 classic nuclear receptors. Environ Sci Technol 55(24):16552–16562
Troger F, Delp J, Funke M, Van Der Stel W, Colas C, Leist M, Van De Water B, Ecker GF (2020) Identification of mitochondrial toxicants by combined in silico and in vitro studies: a structure-based view on the adverse outcome pathway. Comput Toxicol 14:100123
Wang X, Yang H, Hu X, Zhang X, Zhang Q, Jiang H, Shi W, Yu H (2013) Effects of ho-/meo-pbdes on androgen receptor: in vitro investigation and helix 12-involved md simulation. Environ Sci Technol 47(20):11802–11809
Wang X, Zhang X, Xia P, Zhang J, Wang Y, Zhang R, Giesy JP, Shi W, Yu H (2017) A high-throughput, computational system to predict if environmental contaminants can bind to human nuclear receptors. Sci Total Environ 576:609–616
Wang MW, Goodman JM, Allen TE (2020) Machine learning in predictive toxicology: Recent applications and future directions for classification models. Chem Res Toxicol 34(2):217–239
Wang L, Zhao L, Liu X, Fu J, Zhang A (2021a) Seppcnet: deeping learning on a 3d surface electrostatic potential point cloud for enhanced toxicity classification and its application to suspected environmental estrogens. Environ Sci Technol 55(14):9958–9967
Wang Z, Chen J, Hong H (2021b) Developing QSAR models with defined applicability domains on PPARγ binding affinity using large data sets and machine learning algorithms. Environ Sci Technol 55(10):6857–6866
Wedlake AJ, Folia M, Piechota S, Allen TE, Goodman JM, Gutsell S, Russell PJ (2019) Structural alerts and random forest models in a consensus approach for receptor binding molecular initiating events. Chem Res Toxicol 33(2):388–401
Wu Y, Wang G (2018) Machine learning based toxicity prediction: from chemical structural description to transcriptome analysis. Int J Mol Sci 19(8):2358
Xu T, Wu L, Xia M, Simeonov A, Huang R (2020) Systematic identification of molecular targets and pathways related to human organ level toxicity. Chem Res Toxicol 34(2):412–421
Yang H, Sun L, Li W, Liu G, Tang Y (2018) In silico prediction of chemical toxicity for drug design using machine learning methods and structural alerts. Front Chem 6:30
Ying P, Hanxin Z, Xiaowei Z (2021) Research advance of quantitative adverse outcome pathways (qAOPs) in environmental chemicals toxicity assessment I: model building and application cases. Asian J Ecotoxicol 3:1–13
Zhong S, Zhang K, Bagheri M, Burken JG, Gu A, Li B, Ma X, Marrone BL, Ren ZJ, Schrier J (2021) Machine learning: new ideas and tools in environmental science and engineering. Environ Sci Technol 55(19):12741–12754
Zhou Y, Li S, Zhao Y, Guo M, Liu Y, Li M, Wen Z (2021) Quantitative structure–activity relationship (QSAR) model for the severity prediction of drug-induced rhabdomyolysis by using random forest. Chem Res Toxicol 34(2):514–521

Chapter 6

Graph Kernel Learning for Predictive Toxicity Models

Youjun Xu, Chia-Han Chou, Ningsheng Han, Jianfeng Pei, and Luhua Lai

Y. Xu (corresponding author): Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, BNLMS, Peking University, Beijing 100871, China
Y. Xu · C.-H. Chou · N. Han: Infinite Intelligence Pharma, Beijing 100083, China
J. Pei · L. Lai: Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
L. Lai: Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, BNLMS, Peking University, Beijing 100871, China; Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. H. Hong (ed.), Machine Learning and Deep Learning in Computational Toxicology, Computational Methods in Engineering & the Sciences, https://doi.org/10.1007/978-3-031-20730-3_6

6.1 Introduction

When a candidate drug enters preclinical studies or clinical trials, it carries a very high risk of failure. Many expensive experiments must be carried out to evaluate the safety and efficacy of a drug candidate, and systematic toxicity testing is crucial for drug safety assessment. Owing to the complexity and diversity of organisms, drug exposure can induce various toxicity endpoints through complicated or unknown mechanisms. Accordingly, identifying potentially toxic compounds is of great value for avoiding unwanted toxicity, which can lead to adverse drug reactions and even the failure of drug development. Many drug development efforts have been devoted to pursuing low-toxicity drugs.

In this chapter, we focus on in silico prediction of chemical toxicity via machine learning and deep learning algorithms. With the upsurge of artificial intelligence, a large number of machine learning and deep learning techniques have emerged to predict different toxic endpoints and to reduce the toxicity risk of investigated drugs. These learning techniques aim to build a mapping from molecular structures to the probability of toxicity based on labeled data sets. Many toxicity-related data sources have been collected in a review paper (Yang et al. 2018). Using these public data sources, researchers collect and curate the data of interest to develop predictive services. Popular Web servers that help predict multiple toxicity endpoints include ProTox-II (Banerjee et al. 2018), AdmetSAR 2.0 (Yang et al. 2019a), and ADMETLab (Dong et al. 2018). All these servers are based on machine learning algorithms. Over the past ten years, deep learning techniques have gradually been applied to toxicity prediction tasks. Several deep learning toolkits have been implemented to help users develop and evaluate predictive models of interest, including DeepChem (Ramsundar et al. 2019) and ChemProp (Yang et al. 2019b). MoleculeNet (Wu et al. 2018) provides a benchmark for chemical property prediction tasks and has published a set of toxicity-related data sets that are frequently used to test new algorithms. We divide these models into two categories: (i) traditional machine learning-based models, such as support vector machines (SVMs); (ii) deep learning-based models, such as graph neural networks (GNNs).
The traditional machine learning-based models usually use selected, predefined molecular descriptors or fingerprints as feature vectors to construct specific predictive models of interest. Developers also pay attention to the applicability domain and interpretability of these predictive models, which bear directly on real applications. As tasks become more complex, deep learning techniques focus on extracting key features from raw data to build a pipeline for automatic molecular representations (MRs). Without manual intervention, such a pipeline can learn task-specific MRs via supervised learning and obtain expressive prediction models. However, because the learning process is a "black box," the predictive behavior of these models lacks explanation. The generalization capability and real-world applications of deep learning-based toxicity models need to be explored further.

To address these problems, MR learning techniques have been proposed and applied in chemoinformatics. Ideally, we would like to design a set of MRs with three characteristics: predictability, generalizability, and explainability. Since a molecular structure is graph-structured data, graph-driven learning techniques are preferable for learning representations directly from molecular graphs. These techniques include GNNs, graph embeddings (GEs), and graph kernels (GKs). GNNs aim at simultaneously learning task-specific representations and making task predictions, and GNN models usually have great predictive power. Inspired by word embeddings, graph embedding techniques focus on learning a general MR in a separate stage, and the general representation is then fine-tuned for the downstream tasks. Based on the kernel trick (Schölkopf et al. 1997), GKs are intended to learn a kernel matrix that measures the similarity between graphs; this kernel matrix can be used for graph classification, clustering, and similarity searching. The learned kernel matrix can also be used for dimensionality reduction to visualize representations explainably on a two- or three-dimensional feature map. The goal of these graph-driven learning techniques is to produce a set of expressive representations that yield better predictive performance. For chemical toxicity evaluation, it is necessary to develop fully data-driven models with guiding significance based on these learning techniques.

Here, we propose a new concept of graph kernel learning (GKL), which aims to utilize the graph-driven techniques of GNNs, GEs, and GKs to learn meaningful MRs in a data-driven manner. In this chapter, we focus on GKL techniques for predicting chemical toxicity. Section 6.2 gives a brief introduction to graph theory and graph kernels so that readers can understand the fundamentals of graphs. Section 6.3 offers a systematic summary of graph kernels in chemoinformatics, aiming to describe the role of graph kernels. Section 6.4 presents representative GKL methods and their applications to toxicity prediction. Section 6.5 describes current challenges of GKL methods on chemical toxicity problems. In the final section, conclusions and perspectives for future developments and applications of GKL techniques are given.

6.2 A Brief Introduction of Graph Concepts

6.2.1 Graph Theory Definitions

Graph. A general directed graph $G$ is a pair $(V, E)$, where $V$ is the set of nodes (or vertices) and $E \subseteq \{(u, v) \in V \times V\}$ is the set of edges, where each $(u, v) \in E$ represents a directed connection from vertex $u$ to vertex $v$ in $V$. In particular, a non-loop undirected graph is one in which no two edges have identical endpoints and no self-loops exist. Mathematically, the set of edges $E$ satisfies the following two conditions:

• For all $(u, v) \in E(G)$, $(u, v) = (v, u)$ (undirected edges)
• If $(u, v) \in E(G)$, then $u \neq v$ (non-loops).

Labeled graph. A labeled graph is a triple $G(V, E, L)$, where $(V, E)$ is a graph as defined previously, and $L: V \cup E \to Z$ is a mapping from the set of nodes $V$ and edges $E$ to the set of node and edge labels $Z$. $L$ can also be defined only on nodes ($L: V \to Z$) or only on edges ($L: E \to Z$). Labels and labeled graphs are sometimes also referred to as attributes and attributed graphs. Let $n$ be the total number of nodes in $V$ (i.e., $n := |V|$, the cardinality of the set $V$). The adjacency matrix $A_{n \times n}$ of the graph $G = (V, E)$ is defined by

$$A_{ij} = \begin{cases} 1, & (v_i, v_j) \in E \\ 0, & \text{otherwise} \end{cases}$$

where $v_i$ and $v_j$ are nodes in $V$. Note that if $G$ is a non-loop undirected graph, its adjacency matrix $A$ is symmetric with diagonal entries equal to zero.

Node degree. The degree $d_G(u)$ of a node $u$ in $G = (V, E)$ is the number of edges at $u$, that is, $d_G(u) := |\{v : (u, v) \in E\}|$. The numbers $\Delta_{\min}(G) := \min\{d_G(v), v \in V\}$ and $\Delta_{\max}(G) := \max\{d_G(v), v \in V\}$ are the minimum degree and maximum degree of $G$, respectively. Note that $\Delta_{\min}(G) = 0$ implies the existence of isolated vertices, and $\Delta_{\max}(G) = n$ implies that at least one vertex is connected to all other nodes including itself. The degree of the graph $G$ is defined by

$$d_G(G) := \frac{1}{|V|} \sum_{v \in V} d_G(v),$$

the average degree of all the nodes in $G$.

Walk, path, and cycle. A walk $\omega$ in a graph $G$ is a non-empty alternating sequence $(v_1, e_1, v_2, e_2, \ldots, e_{k-1}, v_k)$ of nodes and edges in $G$, where $\{v_i\} \subseteq V$ and $\{e_i\} \subseteq E$, such that $e_i = (v_i, v_{i+1})$ for all $i = 1, \ldots, k - 1$. If $v_1 = v_k$, the walk is closed. If the nodes in $\omega$ are all distinct, it defines a path $p$ in $G$, denoted $(v_1, v_2, \ldots, v_k)$. If $v_1 = v_k$ and all other nodes are distinct, the closed walk is a cycle.

Subgraph. A graph $G' = (V', E')$ is a subgraph of a graph $G = (V, E)$ if $V' \subseteq V$ and $E' \subseteq (V' \times V') \cap E$. If, additionally, $E' = (V' \times V') \cap E$, we call $G'$ an induced subgraph of $G$. A subgraph is called a clique if it is complete. Recall that a graph $G = (V, E)$ is complete if all vertices are pairwise connected (for all distinct $u, v \in V$, $(u, v) \in E$).

Isomorphism. Two graphs $G = (V, E)$ and $G' = (V', E')$ are isomorphic (structurally identical) if there exists a one-to-one correspondence between their vertices that preserves edges. We denote this by $G \simeq G'$. In mathematical terms, there exists a bijection $f: V \to V'$ such that $(v, v') \in E$ if and only if $(f(v), f(v')) \in E'$. The map $f$ is called an isomorphism.

Graph comparison, the problem of measuring similarity between graphs, has been an active topic in many areas. Given two graphs $G$ and $G'$ from a space of graphs $\mathcal{G}$, the graph comparison problem aims to find a map $s: \mathcal{G} \times \mathcal{G} \to \mathbb{R}$ such that $s(G, G')$ quantifies the similarity between $G$ and $G'$. A few traditional approaches have been reviewed in Ghosh et al. (2018).
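The definitions above can be made concrete with a short sketch in plain Python. The helper names (`adjacency_matrix`, `degrees`, `average_degree`, `are_isomorphic`) are illustrative and not taken from the chapter; nodes are assumed to be numbered from 0 to n − 1. The isomorphism test simply tries every bijection, so it is only practical for very small graphs.

```python
from itertools import permutations


def adjacency_matrix(n, edges, directed=False):
    """Build the n x n matrix A with A[i][j] = 1 iff (v_i, v_j) is an edge."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        if not directed:
            A[j][i] = 1  # undirected: (u, v) and (v, u) are the same edge
    return A


def degrees(A):
    """d_G(u) = |{v : (u, v) in E}|, i.e. the row sums of A."""
    return [sum(row) for row in A]


def average_degree(A):
    """Degree of the graph: mean of all node degrees."""
    return sum(degrees(A)) / len(A)


def are_isomorphic(A, B):
    """Brute-force test: search for a bijection f that preserves adjacency."""
    n = len(A)
    if n != len(B) or sorted(degrees(A)) != sorted(degrees(B)):
        return False
    for f in permutations(range(n)):
        if all(A[i][j] == B[f[i]][f[j]] for i in range(n) for j in range(n)):
            return True
    return False


# The same 4-node path written with two different node numberings.
path_a = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
path_b = adjacency_matrix(4, [(2, 0), (0, 3), (3, 1)])
# A 4-node star: same number of edges, different structure.
star = adjacency_matrix(4, [(0, 1), (0, 2), (0, 3)])

print(degrees(path_a), average_degree(path_a))
print(are_isomorphic(path_a, path_b), are_isomorphic(path_a, star))
```

The factorial cost of the brute-force search is one motivation for the similarity functions discussed next, which trade exact structural identity for a score that is cheaper to compute.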


6.2.2 Graph Kernels Fundamentals

6.2.2.1 Kernel Methods

Kernel methods refer to machine learning algorithms that learn by comparing pairs of data points using particular similarity measures, that is, kernels. Let $\mathcal{X}$ be a non-empty set of data points, such as $\mathbb{R}^d$ or a space of graphs. We call a map $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ a kernel if there is a Hilbert space $\mathcal{H}_k$ and a map $\phi: \mathcal{X} \to \mathcal{H}_k$ such that $k(x, y) = \langle \phi(x), \phi(y) \rangle$, where $\langle \cdot, \cdot \rangle: \mathcal{H}_k \times \mathcal{H}_k \to \mathbb{R}$ is the inner product of $\mathcal{H}_k$. Such a feature map exists if and only if $k$ is a positive-semidefinite function. A Hilbert space is a complete inner product space; the norm (magnitude) of its elements is induced by the inner product, $\|x\| = \sqrt{\langle x, x \rangle}$. A matrix $M$ is called positive-semidefinite if $z^T M z \geq 0$ for all vectors $z$. A simple example of this notion is to take $\mathcal{X} = \mathbb{R}^d$, let $\phi: \mathbb{R}^d \to \mathbb{R}^d$ be the identity map, and define the kernel $k(x, y) = \langle \phi(x), \phi(y) \rangle = \langle x, y \rangle = x^T y$ for all $x, y \in \mathbb{R}^d$. Note that the inner product in the Hilbert space $\mathbb{R}^d$ is the dot product of vectors.

The Gram matrix $K$ of a kernel $k$ with respect to a finite set of data points $x_1, \ldots, x_m \in \mathcal{X}$ is the $m \times m$ matrix with entries $K_{ij} := k(x_i, x_j)$ for $i, j \in \{1, \ldots, m\}$. Kernel methods therefore have the desirable property that they do not rely on explicitly characterizing the vector representation $\phi(x)$ of data points but access data only via the Gram matrix $K$. The benefit of this is often illustrated using the Gaussian radial basis function (RBF) kernel on $\mathbb{R}^d$, $d \in \mathbb{N}$, defined as

$$k_{\mathrm{RBF}}(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right),$$

where $\sigma$ is a bandwidth parameter. The Hilbert space associated with the RBF kernel is infinite dimensional, so its feature map $\phi$ cannot be enumerated explicitly. However, the kernel may be readily computed for any pair of points $x, y \in \mathbb{R}^d$.

In the past two decades, kernel methods have been developed for most machine learning paradigms, such as support vector machines (SVMs) for classification (Cortes and Vapnik 1995), Gaussian processes for regression (Rasmussen 2003), and kernel principal component analysis for unsupervised learning and clustering (Schölkopf et al. 1997). If $\mathcal{X} = \mathcal{G}$, a space of graphs, a kernel $k: \mathcal{G} \times \mathcal{G} \to \mathbb{R}$ is called a graph kernel; this is explained in Sect. 6.2.2.2.
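As a minimal sketch of these definitions, the following plain-Python code builds the Gram matrix of the Gaussian RBF kernel for a handful of points and spot-checks positive-semidefiniteness by evaluating the quadratic form $z^T K z$ for random vectors $z$. The function names and the sample points are illustrative assumptions, and the random check is only a sanity test, not a proof.

```python
import math
import random


def linear_kernel(x, y):
    """k(x, y) = x^T y, the kernel induced by the identity feature map."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel with bandwidth sigma."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gram_matrix(points, kernel):
    """K[i][j] = k(x_i, x_j) for a finite list of data points."""
    return [[kernel(p, q) for q in points] for p in points]


def quadratic_form(K, z):
    """Evaluate z^T K z."""
    m = len(z)
    return sum(z[i] * K[i][j] * z[j] for i in range(m) for j in range(m))


# Illustrative data points in R^2.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
K = gram_matrix(points, rbf_kernel)

rng = random.Random(0)  # fixed seed so the check is reproducible
min_form = min(
    quadratic_form(K, [rng.uniform(-1, 1) for _ in points])
    for _ in range(1000)
)
print("smallest sampled z^T K z:", min_form)
```

Note that the code never constructs $\phi(x)$ for the RBF kernel; everything a kernel method needs is contained in the 4 × 4 matrix `K`.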

6.2.2.2 Graph Kernels

The concept of graph kernels was first proposed in 2003 for solving graph comparison problems. In the past 20 years, graph kernels have become an established and widely used technique for solving classification tasks on graph data. In fact, earlier techniques similar to graph kernels can be traced back to fingerprint similarities (Rogers and Hahn 2010) from the 1970s (Adamson and Bush 1973). Later, in the 2000s, several kernels were designed specifically for graphs, including random walk kernels (Kashima et al. 2003), tree pattern kernels (Mahé and Vert 2009), and neighborhood hash kernels (Hido and Kashima 2009). Recently, deep learning-based kernel methods were proposed for graph label and link prediction, including graph convolutional networks (GCN, Kipf and Welling 2017), neural fingerprints (NeuralFP, Duvenaud et al. 2015), and message passing neural networks (MPNN, Gilmer et al. 2017).

Recall that a kernel $k: \mathcal{G} \times \mathcal{G} \to \mathbb{R}$ is called a graph kernel, where $\mathcal{G}$ is a space of graphs, following the notation of the previous section. The majority of graph kernels are convolution kernels: they decompose two graphs into substructures, e.g., vertices or subgraphs, and then evaluate a kernel between each pair of substructures. Mathematically, the convolution kernel is defined as follows. Let $\mathcal{R} = \mathcal{R}_1 \times \cdots \times \mathcal{R}_d$ denote a space of components such that a composite object (graph) $G \in \mathcal{G}$ decomposes into elements of $\mathcal{R}$. Further, let $R: \mathcal{R} \to \mathcal{G}$ be the mapping from the set of components $\mathcal{R}$ to the corresponding graph, $R(x) = G$, and let $R^{-1}(G) = \{x \in \mathcal{R} : R(x) = G\}$ be the set of all components of the graph $G$. Then, the R-convolution kernel is

$$k_{CV}(G, G') = \sum_{x \in R^{-1}(G)} \sum_{y \in R^{-1}(G')} \prod_{i=1}^{d} k_i(x_i, y_i),$$

where $x = (x_1, \ldots, x_d)$, $y = (y_1, \ldots, y_d)$, and $k_i$ is a kernel on $\mathcal{R}_i$ for $i \in \{1, \ldots, d\}$. Similar to kernels defined on vector spaces, graph kernels can be calculated either explicitly (by computing the mapping $\phi$) or implicitly (by computing only the base kernel $k$). Generally, learning with implicit kernel representations means that the value of the chosen kernel between every pair of graphs must be computed and stored. Explicit kernel representation, on the other hand, means that we compute a finite-dimensional feature vector for each graph; the values of the kernel can then be computed during learning as inner products of feature vectors. Indeed, if explicit representation is possible and the dimensionality of the resulting feature vectors is not too high, or the vectors are sparse, then it is usually faster and more memory efficient than implicit computation (Kriege et al. 2019). The traditional computational approaches have been comprehensively reviewed in Kriege et al. (2020). Similarly, GE techniques are used to learn explicit low-dimensional feature vectors, which benefits the explicit computation of graph kernel representations.
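The difference between implicit and explicit computation can be illustrated with the simplest possible R-convolution kernel. The sketch below assumes that each graph is decomposed into its individual vertices ($d = 1$) and that the base kernel is a delta kernel on vertex labels; this choice, the function names, and the two example molecules are illustrative assumptions rather than a method from the chapter. The kernel ignores edges entirely, so it is far weaker than the random walk or tree pattern kernels cited above.

```python
from collections import Counter


def delta_kernel(a, b):
    """Base kernel on parts: 1 if the two vertex labels match, else 0."""
    return 1 if a == b else 0


def vertex_label_kernel_implicit(labels_g, labels_h):
    """R-convolution: sum the base kernel over all pairs of parts."""
    return sum(delta_kernel(x, y) for x in labels_g for y in labels_h)


def label_histogram(labels, alphabet):
    """Explicit feature map phi(G): count of each label in a fixed order."""
    counts = Counter(labels)
    return [counts[symbol] for symbol in alphabet]


def vertex_label_kernel_explicit(labels_g, labels_h, alphabet):
    """Same kernel computed as the inner product of explicit feature vectors."""
    phi_g = label_histogram(labels_g, alphabet)
    phi_h = label_histogram(labels_h, alphabet)
    return sum(a * b for a, b in zip(phi_g, phi_h))


# Heavy-atom labels of two small molecules (hydrogens omitted).
ethanol = ["C", "C", "O"]
acetic_acid = ["C", "C", "O", "O"]
alphabet = ["C", "N", "O"]

k_implicit = vertex_label_kernel_implicit(ethanol, acetic_acid)
k_explicit = vertex_label_kernel_explicit(ethanol, acetic_acid, alphabet)
print(k_implicit, k_explicit)
```

Both routes give the same value, but the implicit version touches every pair of vertices, whereas the explicit version computes one short feature vector per graph and reuses it for every comparison.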

6 Graph Kernel Learning for Predictive Toxicity Models


6.3 Graph Kernel Learning for Molecular Representations

To develop high-quality predictive models, it is crucial to represent compounds as efficient feature vectors. Because the mechanisms behind toxicity endpoints are complicated, a large number of molecular features are usually explored to construct reliable models for specific endpoints of interest. For example, Mayr et al. (2016) used more than 10 thousand feature dimensions to develop the DeepTox models against 12 toxic targets from the Tox21 data set Richard et al. (2020). Deep learning-driven MR is an option that helps to avoid complex and uncertain feature engineering and can even achieve state-of-the-art results on some specific problems. It can also be combined with traditional molecular descriptors to provide reasonable predictions with less overfitting Stokes et al. (2020). Deep learning-driven MR means automatically learning and extracting features via supervised or unsupervised learning. One popular approach is to learn features directly from the molecular structure. The structure is naturally graph-structured data: it is well represented by a graph in which vertices take the places of atoms and edges those of bonds. The graph is a pair {V, E} with a finite set of vertices V and a set of edges E ⊆ {{u, v} ⊆ V | u ≠ v}; detailed definitions are given in Sect. 6.2.1. The chemical properties of the atoms and bonds may be represented as vertex and edge attributes. Typically, a molecular graph has fewer than 100 vertices, each with a degree of less than 5. The atom symbol (vertex attribute) often has fewer than 10 types, and the bond order (edge attribute) often has four types: single, double, triple, and aromatic. A single bond may have chirality, which can be represented as edge direction.
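The pair {V, E} with vertex and edge attributes can be written down as a small data structure. The sketch below is an illustration only: the class name `MolGraph`, its methods, and the hand-built acetic acid example are assumptions made here, and real cheminformatics toolkits offer much richer molecule objects.

```python
from dataclasses import dataclass, field

@dataclass
class MolGraph:
    """Hypothetical container for a molecular graph {V, E} with attributes."""
    atoms: dict = field(default_factory=dict)  # vertex id -> atom symbol
    bonds: dict = field(default_factory=dict)  # frozenset({u, v}) -> bond order

    def add_atom(self, idx, symbol):
        self.atoms[idx] = symbol

    def add_bond(self, u, v, order="single"):
        # Edges are unordered pairs of distinct vertices, as in the definition of E.
        if u == v:
            raise ValueError("self-loops are not allowed (u must differ from v)")
        if u not in self.atoms or v not in self.atoms:
            raise KeyError("both end points must be existing vertices")
        self.bonds[frozenset((u, v))] = order

    def degree(self, idx):
        """Number of bonds incident to a vertex."""
        return sum(1 for edge in self.bonds if idx in edge)

# Heavy-atom graph of acetic acid: CH3-C(=O)-OH
acetic_acid = MolGraph()
for idx, symbol in enumerate(["C", "C", "O", "O"]):
    acetic_acid.add_atom(idx, symbol)
acetic_acid.add_bond(0, 1, "single")
acetic_acid.add_bond(1, 2, "double")
acetic_acid.add_bond(1, 3, "single")

print(acetic_acid.degree(1))
```

Storing each edge as a `frozenset` makes {u, v} and {v, u} the same key, which matches the undirected definition; a directed variant would be needed to encode chirality as edge direction.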
Graph kernels are often viewed in the form k(G1, G2) = ⟨φ(G1), φ(G2)⟩, determined by the mapping function φ and an inner product ⟨·, ·⟩. For GKL on molecular graphs, an obvious challenge is to build or learn effective MRs from molecular graphs. Another challenge is how to learn the pairwise relationships without using an inner product operation. Fingerprint techniques have long been used to address these problems. The molecular fingerprint is a well-established classical technique that represents molecules by feature vectors. Commonly, the feature vectors are obtained in one of three ways. (i) Enumeration of all substructures of a certain class (path, circle, tree, as defined in Sect. 6.2.1). The two most popular examples are path-based Daylight fingerprints and circle-based extended connectivity fingerprints. Tree-based fingerprints are usually constructed from local chemical properties, capturing spatial volume, ring membership, or pharmacophore profile. (ii) Selection from a predefined dictionary of relevant substructures compiled by experts with domain-specific knowledge. For example, MACCS fingerprints and PubChem fingerprints are built on 166-bit and 881-bit predefined structural keys, respectively. (iii) Generation in a data-mining phase, where the key feature vectors are obtained through statistical analysis for a specific task. These fingerprints encode either the number of occurrences of a feature or only its presence or absence, using a single bit per feature. Often, hashed fingerprints are used


Y. Xu et al.

to reduce the fingerprint length to a fixed size at the cost of information loss. To offset the information compression caused by molecular fingerprints, differentiable and learnable techniques have been proposed and developed to enhance the expressiveness of MRs. We classify these techniques into three categories:
• GNNs offer an end-to-end approach, in which representation learning and target learning are conducted jointly to predict the label of a molecular graph. Representation learning is an encoder that encodes raw graph data into a node-level or graph-level feature vector. Target learning is a decoder that predicts the label from these learned features. The encoder architecture was initially inspired by traditional graph kernels such as tree-based and circle-based kernels. The encoder and decoder are formulated as message passing networks and readout networks Gilmer et al. (2017).
• Instead of working end-to-end, learnable GEs focus on learning node-level or graph-level representations in an isolated stage, and the learned representations are then used for the downstream tasks. This technique directly converts raw graph-structured data into a low-dimensional vector representation that preserves the intrinsic properties of the molecular graph. In other words, it learns a suitable mapping φ that represents a molecular graph as an explicit vector, usually trained via self-supervised or semi-supervised learning.
• Following the idea of GKs, learnable GKs aim to automatically learn kernel functions in a new explicit or implicit feature space to capture the similarity between molecular graphs. Both the mapping and the relationship function (similar to an inner product) must capture intrinsic and extrinsic properties.
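The information loss caused by hashed fingerprints, mentioned at the start of this passage, comes from folding an open-ended set of substructure identifiers into a fixed number of bits. The sketch below is a toy illustration under stated assumptions: substructures are given as plain strings, SHA-1 from the standard library stands in for whatever hash a real fingerprint uses, and `fold_fingerprint` is a hypothetical name. It is not the Daylight or extended connectivity algorithm.

```python
import hashlib

def fold_fingerprint(substructures, n_bits=16):
    """Map substructure identifiers onto a fixed-length presence/absence vector."""
    bits = [0] * n_bits
    for s in substructures:
        # A stable hash is used so that the same substructure always sets the same bit.
        digest = hashlib.sha1(s.encode("utf-8")).digest()
        position = int.from_bytes(digest[:4], "big") % n_bits
        bits[position] = 1
    return bits

# 40 distinct (made-up) substructure identifiers folded into 16 bits:
# by the pigeonhole principle several of them must share a bit.
substructures = [f"path-{i}" for i in range(40)]
fingerprint = fold_fingerprint(substructures, n_bits=16)
print(len(fingerprint), sum(fingerprint))
```

Because 40 identifiers cannot occupy more than 16 positions, distinct substructures necessarily collide, and the original set cannot be recovered from the bit vector. This is the compression that the learnable representations above try to avoid.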
Compared with GNNs and GEs, deep GKs pay more attention to learning the similarity or dissimilarity between the learned representations of paired molecular graphs, which benefits explainable analysis. Although deep GKs are based on the concept of traditional graph kernels, these learnable kernels are actually derivatives of GNNs and GEs with designed regularizations that capture the relationships between molecular graphs. A well-designed regularization integrated into kernel learning can significantly benefit predictive performance on specific tasks, and it usually implies key potential features. The additional kernel matrix can also provide explainable visualizations with t-SNE or UMAP. Here, the proposed GKL is a general term for GNN, GE, and deep GK techniques. It aims to produce practicable MRs by data-driven learning. Focusing on GKL, we discuss some representative applications to compound-toxicity problems in the next section.


6.4 Applications of GKL Methods on Chemical Toxicity

6.4.1 Benchmark Data Sets and Methods About Chemical Toxicity

Most public data sources about chemical toxicity have been collected in Yang et al. (2018). ToxCast and Tox21 are open-access data sets from high-throughput screening for chemical toxicity Richard et al. (2020), and they are popular for testing new algorithms. admetSAR has curated many data sources and integrated them into a comprehensive data set that includes various toxicity endpoints Cheng et al. (2012). The open-science platform Therapeutics Data Commons has integrated almost all of the open-access data sets and provides some toxicity data sets, aiming at a fair comparison on a public leaderboard Huang et al. (2021). In the past five years, researchers, especially in computer science, have devoted themselves to developing new graph-based algorithms and models for molecular property problems. These algorithms were tested on the frequently used data sets. We summarize the toxicity-related data sets in Table 6.1. The data sets are limited in the number of compounds and toxicity endpoints, which is far from real-world toxicity testing. However, comparing results on the same data sets is still worthwhile for analyzing and evaluating the potential of different algorithms. It helps to better understand the characteristics of the algorithms and to push them toward real applications.

Table 6.1 Summary of the toxicity-related data sets which are frequently used for algorithm comparison

Data set   Number of compounds   Number of tasks   Avg. number of nodes   Avg. number of edges   Source
MUTAG      188                   1                 17.93                  19.79                  Debnath et al. (1991)
NCI1       4110                  1                 29.87                  32.30                  Shervashidze et al. (2011)
NCI109     4127                  1                 29.68                  32.13                  Shervashidze et al. (2011)
PTC-MR     344                   1                 14.29                  14.69                  Helma et al. (2001)
Tox21      7831                  12                18.57                  19.29                  Wu et al. (2018)
ToxCast    8575                  617               18.78                  19.26                  Wu et al. (2018)
Sider      1427                  27                33.64                  35.36                  Wu et al. (2018)
ClinTox    1478                  2                 26.16                  27.88                  Wu et al. (2018)

To explore the potential of the published methods, we collected the developed graph-driven algorithms and their reported performance on the 8 data sets, summarized in Table 6.2. Four types of methods are reviewed: (1) Graph kernels, functions that measure the similarity between two molecular graphs, are plugged into machine learning algorithms such as SVM. (2) Graph neural networks are based on an end-to-end framework for automatically learning MRs and specific tasks simultaneously. (3) Learnable graph embeddings, inspired by word embedding techniques from natural language processing, focus on learning graph-level or node-level representations via self-supervised or semi-supervised learning. (4) Learnable graph kernels use kernel concepts to simultaneously learn implicit or explicit graph-level representations and their relationships. When the representations are explicit, learnable graph kernels are similar to graph-level embeddings but additionally capture the relationships in the feature space. These relationships can be used to project the embeddings into a low-dimensional space, making graph-level embeddings meaningful and explainable. All of these graph-based methods aim to represent a molecular graph as an optimal feature vector that contributes to specific chemical toxicity tasks. Here, we give a brief overview of four representative methods and their applications on the public toxicity data sets, then discuss current general graph-driven methods based on Table 6.2.

6.4.2 Applications of Graph Kernel-Based Methods

Definitions of graph kernels and related concepts are provided in Sect. 6.2.2. One of the most popular graph kernels used in toxicity problems is the Weisfeiler-Lehman (WL) kernel Shervashidze et al. (2011). The WL algorithm is often used to test graph isomorphism via iterative vertex relabeling, as shown in Fig. 6.1. Kriege et al. developed comprehensive machine learning models based on GKs plugged into SVM models Kriege et al. (2020). Among many kernels, the WL kernel achieves state-of-the-art performance, especially in toxicity classification problems, suggesting that it is a very efficient kernel for representing molecular graphs. Inspired by the WL kernel, Xu et al. designed an expressive GNN architecture, the graph isomorphism network (GIN) Xu et al. (2018), which is discussed in Sect. 6.3.3. Kondor et al. and Jiang et al. applied multiscale graph kernel learning to toxicity problems Kondor and Pan (2016); Jiang et al. (2021). Theoretically, these kernels provide a multi-view insight that makes the feature space richer and more informative. As shown in Table 6.2, multiscale Laplacian graph kernel-based SVM models give two better and two worse predictive results than WL kernel-based SVM models. Jiang et al. used the multiscale weighted colored graph (MWCG) kernel, plugged into the gradient boosting decision tree (GBDT) algorithm, specifically to evaluate the Tox21 data sets Jiang et al. (2021). The MWCG kernel was proposed and validated for protein flexibility analysis and protein–ligand binding prediction, where it outperformed other approaches in the D3R Grand Challenges Gathiaka et al. (2016). For a molecular graph, this multiscale kernel focuses primarily on capturing pairwise non-covalent interactions between the subgraphs rather than covalent interactions. It thus provides a new way to extract features from molecular graphs.
This kernel has been shown to be very efficient on the 12 toxicity data sets of Tox21, where it achieves state-of-the-art performance in toxicity classification. The MWCG kernel still needs to be validated on more toxicity data sets.

Table 6.2 Summary of the new graph-based algorithms and their performance. An asterisk (*) marks results that were implemented and reported by other authors rather than in the original paper. "n.a." means that no standard deviation was reported. Bold text marks the best result for a given task among the corresponding methods.

Data sets (columns): MUTAG, PTC-MR, NCI1, NCI109, Tox21, ToxCast, SIDER, ClinTox

Methods (rows, with reference):
Baseline: DNN + Molecular fingerprint Ramsundar et al. (2019)*
Traditional graph kernels: Random walk Gärtner et al. (2003)*; Shortest-path kernel Yanardag and Vishwanathan (2015)*; Graphlet kernels Shervashidze et al. (2009)*; WL kernels Shervashidze et al. (2011)*; WL + OA kernels Kriege et al. (2020); Multiscale Laplacian graph kernel Kondor and Pan (2016)*
Graph neural networks: GCN Kipf and Welling (2017)*; GAT Veličković et al. (2017)*; GIN Xu et al. (2018)*; GraphSage Hamilton et al. (2017)*; Weave Kearnes et al. (2016)*; MPNN Gilmer et al. (2017)*; SchNet Schütt et al. (2017)*; MGCN Lu et al. (2019)*; D-MPNN Yang et al. (2019b)*; AttentiveFP Xiong et al. (2019)*
Learnable graph embeddings and learnable graph kernels: Node2vec Grover and Leskovec (2016)*; Sub2vec Adhikari et al. (2018); Graph2vec Narayanan et al. (2017); Infomax Veličković et al. (2019)*; Supervised + Infomax Veličković et al. (2019)*; EdgePred Hamilton et al. (2017)*; Supervised + EdgePred Hamilton et al. (2017)*; AttrMasking Hu et al. (2019); Supervised + AttrMasking Hu et al. (2019); ContextPred Hu et al. (2019); Supervised + ContextPred Hu et al. (2019); InfoGraph Sun et al. (2019); N-GRAM Liu et al. (2018)*; GROVER Rong et al. (2020); GraphTrans Jain et al. (2021); ChemRL-GEM Fang et al. (2021a); Deep graphlet kernel Yanardag and Vishwanathan (2015); Deep shortest-path kernel Yanardag and Vishwanathan (2015); Deep WL kernel Yanardag and Vishwanathan (2015); Contrastive multi-view Hassani and Khasahmadi (2020); GraphCL You et al. (2020); CoMPT Chen et al. (2021); Pairwise half-graph discrimination Li et al. (2021a); MolGNet Li et al. (2021b); MoCL Sun et al. (2021); GroupCL Xu et al. (2021b); MGSSL Zhang et al. (2021); MolCLE Wang et al. (2021); GraphMVP Liu et al. (2021); GraphLoG Xu et al. (2021a); CKGNN Fang et al. (2021b)

Fig. 6.1 Overview of the WL vertex relabeling algorithm. Two iterations of WL vertex relabeling for a graph with discrete labels (A, B). A vertex can represent an atom, a subtree, or a shortest path
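The iterative vertex relabeling behind the WL kernel, described in this section, can be sketched in a few lines. This is a minimal illustration under stated assumptions: graphs are given as adjacency dictionaries with string labels, the usual label compression (hashing each signature to a new short label) is replaced by keeping the full signature string, and the function names are hypothetical.

```python
from collections import Counter

def wl_features(adjacency, labels, iterations=2):
    """Count every vertex label seen at iteration 0 and after each relabeling."""
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        relabeled = {}
        for v, neighbours in adjacency.items():
            # New label = own label plus the sorted multiset of neighbour labels.
            neighbour_labels = sorted(current[u] for u in neighbours)
            relabeled[v] = current[v] + "(" + ",".join(neighbour_labels) + ")"
        current = relabeled
        features.update(current.values())
    return features

def wl_subtree_kernel(g, h, iterations=2):
    """Inner product of the label-count vectors of two graphs."""
    fg = wl_features(*g, iterations)
    fh = wl_features(*h, iterations)
    return sum(count * fh[label] for label, count in fg.items())

# Each graph is a pair (adjacency, labels) with discrete labels A and B.
path3 = ({0: [1], 1: [0, 2], 2: [1]}, {0: "A", 1: "B", 2: "A"})  # A-B-A
path2 = ({0: [1], 1: [0]}, {0: "A", 1: "B"})                     # A-B

print(wl_subtree_kernel(path3, path3))
print(wl_subtree_kernel(path3, path2))
```

For these two toy graphs the kernel values are 15 and 5: the graphs share labels at iteration 0 and the label "A(B)" at iteration 1, but no label at iteration 2, because the second relabeling already encodes the different neighbourhoods of the B vertex.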

6.4.3 Applications of Graph Neural Networks

To avoid feature engineering on graph-structured data as much as possible, GNNs have been proposed as an end-to-end framework for automatically extracting features from graph-structured data and predicting molecular properties. To explore automatic MRs for the endpoint of drug-induced liver injury (DILI), we applied a tree-like GNN model (the undirected graph recursive neural network, UGRNN, Lusci et al. (2013)) to DILI prediction Xu et al. (2015). DILI is one of the main causes of drug candidate attrition in (pre-)clinical studies, drug withdrawal from the market, and drug labeling with a black box warning Regev (2014). Numerous molecular descriptors and fingerprints have been explored and validated for DILI prediction. The UGRNN is an end-to-end architecture that contains an encoder for representing a molecule tree as feature vectors and a decoder for making a decision on the vectors. As shown in Fig. 6.2, the encoder and decoder are both multi-layer neural networks that can be optimized by gradient descent algorithms. We demonstrated that the UGRNN-based models achieved comparable performance on the public data sets without manual intervention, which opens up a new direction for learnable toxicity representations extracted from molecular graphs. To explore molecular representations that are both predictive and interpretable, we developed an end-to-end framework for acute oral toxicity (AOT) Xu et al. (2017). This framework is described in Fig. 6.3 and contains circle-like GCNs (similar to extended connectivity fingerprints) for learning molecular representations and a multi-layer neural network head for classification and


Fig. 6.2 Overview of UGRNN architecture for drug-induced liver injury

regression outputs. The framework was shown not only to build highly predictive classification and regression models but also to automatically extract key toxicity features from the learned molecular representations. This work suggested that the learned low-dimensional representations are information-rich and interpretable for specific tasks. In the past five years, many GNNs and their toolkits have been developed for molecular property prediction, including NeuralFP Duvenaud et al. (2015), Weave Kearnes et al. (2016), MPNN Gilmer et al. (2017), directed-MPNN (D-MPNN, Yang et al. (2019b)), PotentialNet Feinberg et al. (2018), and AttentiveFP Xiong et al. (2019). They focus primarily on general molecular property prediction in chemoinformatics, not specifically on molecular toxicity. These methods are often simply evaluated on the public toxicity-related data sets such as Tox21, ToxCast, Sider, and ClinTox, shown in Table 6.1. PotentialNet and D-MPNN were further validated for ADMET predictions and antibiotic discovery, respectively Feinberg et al. (2020), Stokes et al. (2020). PotentialNet was used by Merck to systematically evaluate in-house data sets including 31 ADMET-related assays, suggesting that learnable molecular representations better enable molecular predictors to interpolate and extrapolate to new regions of chemical space than random forests with molecular descriptors and fingerprints Feinberg et al. (2020). D-MPNN was also validated by identifying Halicin as an antibacterial molecule from the Drug Repurposing Hub Stokes et al. (2020). Recently, a comparative study explained that GNN methods


Fig. 6.3 Overview of graph convolution network for acute oral toxicity

were actually comparable to GKs on the 11 publicly available data sets, and the first two principal components of the learned MRs were also similar Xiang et al. (2021). Therefore, GNN techniques provide a new choice for developing expressive predictive models for chemical toxicity problems.
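The encoder and readout structure shared by the GNNs in this section can be sketched without any deep learning library. The example below is a simplified illustration, not any of the cited architectures: it uses plain sum aggregation with no learned weights or nonlinearities, and the function names and the one-hot atom features are assumptions made here.

```python
def message_passing(adjacency, features, steps=2):
    """Add the sum of neighbour states to each vertex state, `steps` times."""
    state = {v: list(f) for v, f in features.items()}
    for _ in range(steps):
        updated = {}
        for v, neighbours in adjacency.items():
            message = [0.0] * len(state[v])
            for u in neighbours:
                message = [m + x for m, x in zip(message, state[u])]
            updated[v] = [s + m for s, m in zip(state[v], message)]
        state = updated
    return state

def readout(state):
    """Sum all vertex states into one graph-level vector (order-independent)."""
    dim = len(next(iter(state.values())))
    return [sum(vec[i] for vec in state.values()) for i in range(dim)]

# Heavy-atom graph C-C-O with one-hot features [is_carbon, is_oxygen].
adjacency = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [1, 0], 1: [1, 0], 2: [0, 1]}

graph_vector = readout(message_passing(adjacency, features, steps=1))
print(graph_vector)
```

In a trained GNN the update and readout steps would contain learnable parameters optimized by gradient descent, and a decoder network would map the graph-level vector to a toxicity label. The sum readout is what makes the result independent of the order in which atoms are numbered.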

6.4.4 Applications of Learnable Graph Embeddings

GNN models focus only on learning task-specific MRs, which often transfer poorly. When labeled data are scarce, GNNs may generalize poorly. To overcome these issues, an effective approach is to pre-train models in a self-supervised manner where data are abundant and then fine-tune them on the downstream tasks of interest. This approach is one of the transfer learning techniques summarized in Cai et al. (2020). An open question is how to pre-train effectively on molecular graph data with unsupervised learning. Many GE techniques have been studied in terms of node-level, graph-level, subgraph-level, and knowledge-level representations. These methods aim to preserve more meaningful properties (e.g., local, global, substructure, and domain knowledge) to produce a set of informative node-level or graph-level


Fig. 6.4 Overview of the node-level embeddings. For context prediction, K-hop subgraphs are used to predict center atoms and context atoms, like word embedding technique. For attribute masking, node/edge attributes are randomly masked and need to be predicted by GNN

representations. These representations can then be easily plugged into GNN models, which achieve better predictive performance with them. Hu et al. implemented an effective pre-training strategy for node-level and graph-level embeddings. To capture semantic local information, two million unlabeled molecular graphs from ZINC15 Sterling and Irwin (2015) were used for node-level learning in a new self-supervised manner, as shown in Fig. 6.4. The preprocessed ChEMBL data set with bioassay activities was used for multi-task supervised pre-training to incorporate domain knowledge (such as biological activities) into the graph-level embeddings. A combined strategy of node-level and graph-level embeddings was proposed and demonstrated empirically to efficiently produce transferable graph representations and robustly improve downstream performance without expert selection of supervised pre-training tasks Hu et al. (2019). Recently, Rong et al. proposed a new framework that uses GNN and transformer architectures to learn more informative molecular representations. As shown in Fig. 6.5, it also adopts K-hop neighbors to learn node-level and edge-level embeddings and uses motif prediction to learn graph-level embeddings with semantic domain knowledge. These carefully designed embeddings are fused and then plugged into GNN models, which achieve state-of-the-art performance, as shown in Table 6.2.
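The data preparation step of attribute masking can be illustrated as follows. This is a minimal sketch, not the implementation of Hu et al. (2019): the function name, the `[MASK]` token, the default masking rate, and the fixed seed are all assumptions chosen for illustration, and the GNN that would be trained to recover the hidden labels is omitted.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token for a hidden atom label

def mask_node_attributes(atom_labels, mask_rate=0.15, seed=0):
    """Hide a random subset of atom labels and return the prediction targets.

    Expects a non-empty list of atom labels; at least one atom is always masked.
    """
    rng = random.Random(seed)
    n_masked = max(1, round(mask_rate * len(atom_labels)))
    masked_ids = sorted(rng.sample(range(len(atom_labels)), n_masked))
    corrupted = list(atom_labels)
    targets = {}
    for i in masked_ids:
        targets[i] = corrupted[i]   # label the model must recover
        corrupted[i] = MASK         # what the model actually sees
    return corrupted, targets

atoms = ["C", "C", "O", "N", "C", "C", "O", "C"]
corrupted, targets = mask_node_attributes(atoms, mask_rate=0.25)
print(corrupted)
print(targets)
```

During pre-training, the corrupted graph is fed to the encoder and the loss compares its predictions at the masked positions with `targets`. No toxicity labels are needed, which is why large unlabeled collections can be used.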

6.4.5 Applications of Learnable Graph Kernels

To examine the explainability of graph-level embeddings, the learned embeddings need to take pairwise distances into consideration. These pairwise distances are similar to a kernel matrix that can be applied to graph clustering and similarity searching. KNN techniques can help to give an instance-level


Fig. 6.5 Overview of the node-, edge- and motif-level embeddings

explanation. If the k nearest neighbor instances in the embedding space are labeled positive, the predicted label is positive with high probability, and vice versa. Diagonal dominance often occurs in the toxicity data sets: a given graph is similar to itself but not to any other graph in the data set, as shown in Fig. 6.6 (left). To alleviate this issue, Yanardag et al. first developed a framework that learns latent MRs with the dependency information between substructures. This framework learns the similarity between two molecular graphs based on the imported dependency relationships, see Fig. 6.6 (right). The dependency regularization can enhance the correlation of graph-level embeddings. This framework has been successfully extended to the WL, SP, and graphlet kernels. Although the improvements in predictive performance over traditional GK-based models are not obvious, the framework still shows potential for explanation. Inspired by the pre-training strategy of Hu et al. (2019), Li et al. combined pairwise half-graph discrimination with node attribute masking to pre-train node-level representations. Similar to kernel tricks, half-graph discrimination uses contrastive learning to capture better node-level representations with the regularization of subgraphs. This means that the learned node-level representations imply

Fig. 6.6 Visualization of kernel matrix for WL kernel (left) and deep WL kernel (right). The color map encodes the degree of the similarity (darker color indicates higher similarity)


Fig. 6.7 Visualization of the learned representations by UMAP

subgraph-level information. These representations are transferred to GNN models for the downstream prediction tasks. In the toxicity prediction evaluation, this model achieved state-of-the-art performance on four data sets, as shown in Table 6.2. Moreover, UMAP visualization McInnes et al. (2018) indicated that the learned representations are well separated by scaffold structure, as shown in Fig. 6.7. This explainability is worth exploring further in toxicity models. Table 6.2 lists several GKL-based works. Most of them have not obtained better predictive performance despite the increase in computational complexity. Therefore, the design of GKL frameworks still needs more exploration, especially knowledge-level kernel learning; one ingenious design could lead to a significant improvement. Moreover, the reported GKL algorithms are not specifically designed for toxicity prediction and can be further improved for toxicity problems.
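Two ideas from this section, diagonal dominance of a kernel matrix and instance-level explanation by nearest neighbours, can be made concrete with a small sketch. The kernel matrix and labels below are invented toy values, the mean off-diagonal similarity is only one possible way to quantify diagonal dominance, and all function names are hypothetical.

```python
import math

def normalize(K):
    """Cosine-normalise a kernel matrix so that every diagonal entry is 1."""
    n = len(K)
    return [[K[i][j] / math.sqrt(K[i][i] * K[j][j]) for j in range(n)]
            for i in range(n)]

def mean_off_diagonal(K):
    """Mean normalised similarity between different graphs.

    Values near 0 indicate diagonal dominance: each graph resembles only itself.
    """
    n = len(K)
    Kn = normalize(K)
    off = [Kn[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(off) / len(off)

def knn_predict(K, labels, query, k=3):
    """Majority vote among the k graphs most similar to `query`.

    Returns the predicted label and the neighbours that explain it.
    """
    others = [j for j in range(len(K)) if j != query]
    nearest = sorted(others, key=lambda j: K[query][j], reverse=True)[:k]
    votes = sum(labels[j] for j in nearest)
    return (1 if 2 * votes > k else 0), nearest

# Toy kernel matrix for five graphs; 1 = toxic, 0 = non-toxic (made-up values).
K = [[4, 3, 3, 1, 0],
     [3, 4, 2, 1, 0],
     [3, 2, 4, 0, 1],
     [1, 1, 0, 4, 3],
     [0, 0, 1, 3, 4]]
labels = [1, 1, 1, 0, 0]

print(mean_off_diagonal(K))
print(knn_predict(K, labels, query=0, k=3))
```

The returned neighbour indices are the instance-level explanation: the prediction for graph 0 is justified by pointing at graphs 1, 2 and 3. With a diagonally dominant matrix the off-diagonal entries are close to zero, so the ranking of neighbours becomes unreliable, which is the problem that dependency-aware deep kernels try to alleviate.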

6.5 Challenges and Perspectives of Graph Kernel Learning on Toxicity-Related Problems

To reduce the risk of chemical toxicity, a large number of graph-driven models and algorithms have been developed to improve prediction performance on the public data sets. These methods provide learnable molecular representations, a simple and practical option for enhancing predictive power. However, many challenges and limitations for real-world applications remain to be solved.

Quality and quantity of toxicity data. Data quality is a big challenge. The available public data sets only contain hundreds or thousands of labeled compounds, obtained from high-throughput in vitro assays or in vivo tests on animals, such as the popular


Tox21 and ToxCast data. There is still a big gap between preclinical data from animal models and clinical data from humans. Metric learning Hoffer and Ailon (2015) and multimodal learning Ramachandram and Taylor (2017) may be used to fill this gap. For example, Iwata et al. (2021) applied multimodal learning to improve the prediction of drug clearance in humans using both chemical structures and clearance in rats. Furthermore, more data from clinical trials and applications are in high demand.

Multivariate data sets. Given the data limitations, one achievable solution is to integrate multivariate data (e.g., drug targets, metabonomics, side effects, adverse events, and clinical indications) to bridge the gap between chemical structures and toxicity endpoints. It is quite common to integrate multivariate data sets to perform network-based drug repurposing, e.g., Duran-Frigola et al. (2020). This strategy can also be transferred to build compound-toxicity networks, which can provide a degree of mechanism-level inference and a better understanding of specific chemical toxicity. Although plenty of work has started to focus on network-based toxicity inference, there are still limitations in data integration and model generalization. Learning techniques driven by knowledge graphs Wang et al. (2017) could be used to deal with these limitations.

Multi-level graph kernel learning. Multiscale embeddings (e.g., node-level, graph-level, subgraph-level, and knowledge-level) have been successfully fused via a series of elaborate designs and used to address chemical toxicity problems. These embeddings are all implemented by differentiable computations without manual intervention. As embedding expressiveness improves, computational complexity also increases. In particular, GKL methods need to take pairwise distances between data points into consideration. Specially designed approximation algorithms are necessary to reduce computational complexity. Currently, contrastive learning techniques with different auxiliary losses are used to implicitly learn the relationships in the feature space. Methods for learning embeddings from multivariate toxicity-related data sets and for fusing these multivariate embeddings need to be developed. It remains challenging for GKL to produce robust and meaningful embeddings in the Hilbert feature space.

Besides chemical structures, the exposure dose is vital for drug toxicity assessment. The drug metabolism and pharmacokinetics (DMPK) properties of a drug product determine the exposure dose and duration in the target areas. The absorption, distribution, metabolism, and excretion (ADME) properties of a drug product are key factors in DMPK studies. In silico ADME prediction (such as ADMET Predictor®) has been developed as an important toolkit to support comprehensive assessments of molecular properties and to help conduct DMPK studies. All of these models are built on machine learning algorithms with molecular descriptors and fingerprints. To the best of our knowledge, there are only a few cases of applying these advanced GKL techniques to ADME problems Feinberg et al. (2020); Xiong et al. (2021). The core issue is data accessibility. Given these data barriers, federated learning may be a good solution: models learn privately from more data sets and share only model weights. This strategy can be used to fairly test

6 Graph Kernel Learning for Predictive Toxicity Models

179

various machine learning or deep learning algorithms for implementing the most powerful predictive ADMET models. Moreover, the GKL techniques also can be tried to explore more challenging problems of DMPK properties, such as bioavailability, clearance, Cmax, Tmax, and drug–drug interaction. Consequently, the GKL is allowed to be integrated with other learning techniques like multimodal learning, transferable metric learning, and contrastive learning. It is promising for such integrated techniques to overcome the challenges of species differences as continuous learning of wet experimental data.

6.6 Conclusion

In this chapter, we focused on graph-driven methods for predictive toxicity models. Considering the complexity of chemical toxicity, we proposed a new concept, GKL, which aims to use the graph-driven methods of GNNs, GEs, and GKs to learn meaningful MRs in a fully data-driven manner. We gave an overview of the fundamentals, purposes, and applications of GKL techniques in chemoinformatics, especially in toxicity prediction problems. Subsequently, we summarized the current and future challenges of GKL techniques and gave perspectives on further intersections between GKL techniques and drug evaluation. We hope this chapter helps readers to better understand the characteristics and purposes of the GKL methods applied to toxicity-related problems. We also hope that fully data-driven GKL methods will be explored in more depth in real-world application scenarios.

References

Adamson GW, Bush JA (1973) A method for the automatic classification of chemical structures. Inf Stor Retri 9:561–568 Adhikari B, Zhang Y, Ramakrishnan N, Prakash BA (2018) Sub2vec: feature learning for subgraphs. In: Pacific-Asia conference on knowledge discovery and data mining, Springer, pp 170–182 Banerjee P, Eckert AO, Schrey AK, Preissner R (2018) ProTox-II: a webserver for the prediction of toxicity of chemicals. Nucleic Acids Res 46:W257–W263 Cai C, Wang S, Xu Y et al (2020) Transfer learning for drug discovery. J Med Chem 63:8683–8694 Chen J, Zheng S, Song Y et al. (2021) Learning attributed graph representations with communicative message passing transformer. In: Proceedings of the 13th international joint conference on artificial intelligence. arXiv preprint arXiv:1809.10341 Cheng F, Li W, Zhou Y et al (2012) admetSAR: a comprehensive source and free tool for assessment of chemical ADMET properties. J Chem Inf Model 52:3099–3105 Cortes C, Vapnik V (1995) Support-Vector Networks. Mach Learn 20:273–297 Debnath AK, Lopez de Compadre RL, Debnath G et al (1991) Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity. J Med Chem 34:786–797 Dong J, Wang N-N, Yao Z-J et al (2018) ADMETlab: a platform for systematic ADMET evaluation based on a comprehensively collected ADMET database. J Cheminf 10:1–11

Duran-Frigola M, Pauls E, Guitart-Pla O et al (2020) Extending the small-molecule similarity principle to all levels of biology with the chemical checker. Nat Biotechnol 38:1087–1096 Duvenaud D, Maclaurin D, Aguilera-Iparraguirre J et al. (2015) Convolutional networks on graphs for learning molecular fingerprints. arXiv preprint arXiv:150909292 Fang X, Liu L, Lei J et al. (2021a) ChemRL-GEM: geometry enhanced molecular representation learning for property prediction. arXiv preprint arXiv:210606130 Fang Y, Yang H, Zhuang X et al. (2021b) Knowledge-aware contrastive molecular graph learning. arXiv preprint arXiv:210313047 Feinberg EN, Joshi E, Pande VS, Cheng AC (2020) Improvement in ADMET prediction with multitask deep featurization. J Med Chem 63:8835–8848 Feinberg EN, Sur D, Wu Z et al (2018) PotentialNet for molecular property prediction. ACS Cent Sci 4:1520–1530 Gärtner T, Flach P, Wrobel S (2003) On graph kernels: hardness results and efficient alternatives. In: Learning theory and kernel machines, Springer, pp 129–143 Gathiaka S, Liu S, Chiu M et al (2016) D3R grand challenge 2015: evaluation of protein–ligand pose and affinity predictions. J Comput Aided Mol Des 30:651–668 Ghosh S, Das N, Gonçalves T et al. (2018) The journey of graph kernels through two decades. Comput Sci Rev 27:88–111 Gilmer J, Schoenholz SS, Riley PF et al. (2017) Neural message passing for quantum chemistry. In: International conference on machine learning, PMLR, pp 1263–1272 Grover A, Leskovec J (2016) node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 855–864 Hamilton WL, Ying R, Leskovec J (2017) Inductive representation learning on large graphs. In: Proceedings of the 31st international conference on neural information processing systems, pp 1025–1035 Hassani K, Khasahmadi AH (2020) Contrastive multi-view representation learning on graphs. 
In: International conference on machine learning, PMLR, pp 4116–4126 Helma C, King RD, Kramer S, Srinivasan A (2001) The predictive toxicology challenge 2000–2001. Bioinformatics 17:107–108 Hido S, Kashima H (2009) A linear-time graph kernel. In: 2009 9th IEEE international conference on data mining, IEEE, pp 179–188 Hoffer E, Ailon N (2015) Deep metric learning using triplet network. In: International workshop on similarity-based pattern recognition, Springer, pp 84–92 Hu W, Liu B, Gomes J et al. (2019) Strategies for pre-training graph neural networks. arXiv preprint arXiv:190512265 Huang K, Fu T, Gao W et al. (2021) Therapeutics data commons: machine learning datasets and tasks for therapeutics. arXiv preprint arXiv:210209548 Iwata H, Matsuo T, Mamada H et al (2021) Prediction of total drug clearance in humans using animal data: proposal of a multimodal learning method based on deep learning. J Pharm Sci 110:1834–1841 Jain P, Wu Z, Wright M et al. (2021) Representing long-range context for graph neural networks with global attention. Adv Neural Inf Process Syst 34. https://github.com/ucbrise/graphtrans. Accessed 20 Jan 2022 Jiang J, Wang R, Wei G-W (2021) GGL-tox: geometric graph learning for toxicity prediction. J Chem Inf Model 61:1691–1700 Kashima H, Tsuda K, Inokuchi A (2003) Marginalized kernels between labeled graphs. In: Proceedings of the 20th international conference on machine learning (ICML-03), pp 321–328 Kearnes S, McCloskey K, Berndl M et al (2016) Molecular graph convolutions: moving beyond fingerprints. J Comput Aided Mol Des 30:595–608 Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. In: International conference on learning representations. arXiv preprint arXiv:1609.02907

Kondor R, Pan H (2016) The multiscale Laplacian graph kernel. Adv Neural Inf Process Syst 29:2990–2998 Kriege NM, Johansson FD, Morris C (2020) A survey on graph kernels. Appl Netw Sci 5:6 Kriege NM, Neumann M, Morris C et al (2019) A unifying view of explicit and implicit feature maps of graph kernels. Data Min Knowl Disc 33:1505–1547 Li P, Wang J, Li Z et al. (2021a) Pairwise half-graph discrimination: a simple graph-level self-supervised strategy for pre-training graph neural networks. In: Proceedings of the thirtieth international joint conference on artificial intelligence, pp 2694–2700 Li P, Wang J, Qiao Y et al. (2021b) An effective self-supervised framework for learning expressive molecular global representations to drug discovery. Brief Bioinf 22. https://doi.org/10.1093/bib/bbab109 Liu S, Demirel MF, Liang Y (2018) N-gram graph: simple unsupervised representation for graphs, with applications to molecules. arXiv preprint arXiv:180609206 Liu S, Wang H, Liu W et al. (2021) Pre-training molecular graph representation with 3D geometry. arXiv preprint arXiv:211007728 Lu C, Liu Q, Wang C et al. (2019) Molecular property prediction: a multilevel quantum interactions modeling perspective. In: Proceedings of the AAAI conference on artificial intelligence, pp 1052–1060 Lusci A, Pollastri G, Baldi P (2013) Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules. J Chem Inf Model 53:1563–1575 Mahé P, Vert J-P (2009) Graph kernels based on tree patterns for molecules. Mach Learn 75:3–35 Mayr A, Klambauer G, Unterthiner T, Hochreiter S (2016) DeepTox: toxicity prediction using deep learning. Front Environ Sci 3:80 McInnes L, Healy J, Melville J (2018) Umap: uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:180203426 Narayanan A, Chandramohan M, Venkatesan R et al. (2017) graph2vec: learning distributed representations of graphs. 
arXiv preprint arXiv:170705005 Ramachandram D, Taylor GW (2017) Deep multimodal learning: a survey on recent advances and trends. IEEE Signal Process Mag 34:96–108 Ramsundar B, Eastman P, Feinberg E et al. (2019) DeepChem: democratizing deep-learning for drug discovery, quantum chemistry. Mater Sci Biol. https://github.com/deepchem/deepchem. Accessed 1 Dec 2021 Rasmussen CE (2003) Gaussian processes in machine learning. In: Summer school on machine learning, Springer, pp 63–71 Regev A (2014) Drug-induced liver injury and drug development: Industry perspective. In: Seminars in liver disease, Thieme Medical Publishers, pp 227–239 Richard AM, Huang R, Waidyanatha S et al (2020) The Tox21 10K compound library: Collaborative chemistry advancing toxicology. Chem Res Toxicol 34:189–216 Rogers D, Hahn M (2010) Extended-connectivity fingerprints. J Chem Inf Model 50:742–754 Rong Y, Bian Y, Xu T et al. (2020) Self-supervised graph transformer on large-scale molecular data. arXiv preprint arXiv:200702835 Schölkopf B, Smola A, Müller K-R (1997) Kernel principal component analysis. In: International conference on artificial neural networks, Springer, pp 583–588 Schütt KT, Kindermans P-J, Sauceda HE et al. (2017) Schnet: a continuous-filter convolutional neural network for modeling quantum interactions. arXiv preprint arXiv:170608566 Shervashidze N, Schweitzer P, Van Leeuwen EJ et al (2011) Weisfeiler-lehman graph kernels. J Mach Learn Res 12:2539–2561 Shervashidze N, Vishwanathan S, Petri T et al. (2009) Efficient graphlet kernels for large graph comparison. In: Artificial intelligence and statistics, PMLR, pp 488–495 Sterling T, Irwin JJ (2015) ZINC 15–ligand discovery for everyone. J Chem Inf Model 55:2324–2337 Stokes JM, Yang K, Swanson K et al (2020) A deep learning approach to antibiotic discovery. Cell 180:688–702

Sun F-Y, Hoffmann J, Verma V, Tang J (2019) Infograph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. arXiv preprint arXiv:190801000 Sun M, Xing J, Wang H et al. (2021) MoCL: Data-driven molecular fingerprint via knowledge-aware contrastive learning from molecular graph. In: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery and data mining, pp 3585–3594 Veličković P, Cucurull G, Casanova A et al. (2017) Graph attention networks. arXiv preprint arXiv:171010903 Veličković P, Fedus W, Hamilton WL et al. (2019) Deep graph infomax. In: International conference on learning representations. arXiv preprint arXiv:1809.10341 Wang Q, Mao Z, Wang B, Guo L (2017) Knowledge graph embedding: a survey of approaches and applications. IEEE Trans Knowl Data Eng 29:2724–2743 Wang Y, Min Y, Shao E, Wu J (2021) Molecular graph contrastive learning with parameterized explainable augmentations. bioRxiv Wu Z, Ramsundar B, Feinberg EN et al (2018) MoleculeNet: a benchmark for molecular machine learning. Chem Sci 9:513–530 Xiang Y, Tang Y-H, Lin G, Sun H (2021) A comparative study of marginalized graph kernel and message-passing neural network. J Chem Inf Model 61:5414–5424. https://doi.org/10.1021/acs.jcim.1c01118 Xiong G, Wu Z, Yi J et al (2021) ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties. Nucleic Acids Res 49:W5–W14. https://doi.org/10.1093/nar/gkab255 Xiong Z, Wang D, Liu X et al (2019) Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism. J Med Chem 63:8749–8760 Xu K, Hu W, Leskovec J, Jegelka S (2018) How powerful are graph neural networks? arXiv preprint arXiv:181000826 Xu M, Wang H, Ni B et al (2021a) Self-supervised graph-level representation learning with local and global structure. 
arXiv preprint arXiv:210604113 Xu X, Deng C, Xie Y, Ji S (2021b) Group contrastive self-supervised learning on graphs. arXiv preprint arXiv:210709787 Xu Y, Dai Z, Chen F et al (2015) Deep learning for drug-induced liver injury. J Chem Inf Model 55:2085–2093 Xu Y, Pei J, Lai L (2017) Deep learning based regression and multiclass models for acute oral toxicity prediction with automatic chemical feature extraction. J Chem Inf Model 57:2672–2685 Yanardag P, Vishwanathan S (2015) Deep graph kernels. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1365–1374 Yang H, Lou C, Sun L et al (2019a) admetSAR 2.0: Web-service for prediction and optimization of chemical ADMET properties. Bioinformatics 35:1067–1069 Yang H, Sun L, Li W et al (2018) In silico prediction of chemical toxicity for drug design using machine learning methods and structural alerts. Front Chem 6:30 Yang K, Swanson K, Jin W et al (2019b) Analyzing learned molecular representations for property prediction. J Chem Inf Model 59:3370–3388 You Y, Chen T, Sui Y et al (2020) Graph contrastive learning with augmentations. Adv Neural Inf Process Syst 33:5812–5823 Zhang Z, Liu Q, Wang H et al. (2021) Motif-based graph self-supervised learning for molecular property prediction. arXiv preprint arXiv:211000987

Chapter 7

Optimize and Strengthen Machine Learning Models Based on in Vitro Assays with Mechanistic Knowledge and Real-World Data

Thilini V. Mahanama, Arpan Biswas, and Dong Wang

T. V. Mahanama · A. Biswas · D. Wang (✉)
Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR 72079, USA

T. V. Mahanama
Department of Industrial Management, Faculty of Science, University of Kelaniya, Gampaha, Sri Lanka

A. Biswas
Center for Nanophase Materials Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA

This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2023
H. Hong (ed.), Machine Learning and Deep Learning in Computational Toxicology, Computational Methods in Engineering & the Sciences, https://doi.org/10.1007/978-3-031-20730-3_7

7.1 Introduction

Chronic animal testing has been the mainstay of toxicity evaluations for both environmental chemicals and drug candidates. However, the shortcomings of animal-based evaluations are also well known. In the drug discovery process in particular, animal testing often fails to predict the human toxicity of drug candidates. Even when toxicity is detected, insight into the biological processes underpinning it is usually quite limited. Testing based on in vitro assays has been proposed as a promising approach to overcome these limitations (Kavlock and Dix 2010; Tice et al. 2013). Because in vitro assays for toxicity response can be anchored directly in human biology, the results are potentially more relevant to toxicity assessment and have clearer interpretations. In vitro assays primarily based on cell lines can also reduce the cost of toxicity assessment and achieve time savings with the high-throughput setup. Partially replacing animal-based testing with in vitro assays can also reduce the number of animals used each year for toxicity assessment, which is increasingly demanded by a broad section of society. The potential for increased use of in vitro assays in toxicity testing has been outlined in the Tox21 vision by the National Academy of Sciences (NRC 2007; NRC 2017) as well as in the EU Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) program (Rudén and Hansson 2010; Locke et al. 2017).

Several large-scale projects have been carried out to develop and test a range of in vitro assays for toxicity assessment. Important examples include the Tox21 program (Collins et al. 2008; Tice et al. 2013), EPA's ToxCast (Judson et al. 2014; Richard et al. 2016), Open TG-GATEs (Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System, Igarashi et al. 2015), and the L1000 project (Subramanian et al. 2017), among others. It is thus possible to draw on many in vitro assays to assess gene expression signatures, nuclear receptor binding, cell vitality, and mitochondrial function to inform toxicity at the cellular level.

There is now a significant literature on the use of machine learning models to predict toxicity from a wide range of in vitro assay data (e.g., Liu et al. 2015; Huang et al. 2016; Kleinstreuer et al. 2017). Some papers reported predictive performance similar to that of animal-based evaluations. However, fully realizing the potential of predictive models based on in vitro assays in actual toxicity assessment still faces some important challenges. Though current machine learning approaches can achieve excellent prediction accuracy with a large number of predictors, the high complexity of the model incurs a real cost in applications.
Various research groups have published excellent results for machine learning models based on hundreds or even thousands of in vitro assays to predict the potential for toxicity. However, performing all these assays for each application can be costly or impractical. Using a large number of assays also makes the model opaque, which defeats an important purpose of using in vitro assays, i.e., to provide mechanistic insight into toxicity.

Another challenge for building predictive models based on in vitro assays is the difficulty of obtaining human data for validation. Although most in vitro assays are based on human cell lines and are potentially more relevant to human biology than animal models, the performance of these models should ideally be compared to actual human toxicology data. However, it is generally not feasible to perform clinical trials for this purpose. On the other hand, recent advances have been made in using real-world data (RWD), i.e., data relating to patient health status and/or the delivery of health care that are routinely collected from a variety of sources, to address important research questions in drug development and public health (FDA 2018). We think there is a unique opportunity to strengthen predictive modeling for toxicity with RWD.

In this chapter, we discuss our experiences in developing new approaches to address these two problems. For model complexity, we found that adverse outcome pathways (AOPs) can be leveraged to filter the large number of in vitro assays and lead to more parsimonious machine learning models that still retain excellent predictive performance. For comparison with human toxicity data, we developed new approaches to analyze spontaneous reporting databases for drug adverse events to provide a new source of data to corroborate the findings of machine learning algorithms. We use drug-induced liver injury (DILI) as an example in discussing these two topics in subsequent sections.

7.2 Incorporating AOPs to Construct Parsimonious Machine Learning Models

7.2.1 AOPs and AOP Networks

Mechanistic knowledge is critical for toxicity assessment. This importance will not diminish in the era of high-throughput experiments. Concrete mechanistic knowledge of a chemical's effect on biological systems can be used to screen out irrelevant assays and greatly facilitate the modeling task. For this purpose, the AOP framework provides an efficient way to store and access mechanistic knowledge of toxicity (Ankley et al. 2010; Krewski et al. 2010). AOPs are represented as linear constructs consisting of a sequence of biologically significant events. The main components are the molecular initiating event (MIE), key events (KE), and the adverse outcome (AO). These constructs in turn describe toxicity pathways with the associated causal information (Allen et al. 2014; Villeneuve et al. 2014; Burden et al. 2015). AOPs are under active development in multiple directions. Readers can consult the relevant literature for more information (e.g., van der Veen et al. 2014; Labib et al. 2016; Bell et al. 2016; Nymark et al. 2018).

In this chapter, we focus on AOPs related to DILI. DILI is an important reason for drug withdrawals from the market after approval as well as for the termination of development of promising drug candidates (Chen et al. 2014). As a result, predictive modeling of DILI potential with in vitro assays has been an active area of interest for drug development. Regarding mechanistic knowledge, there has been significant effort in constructing AOPs related to liver toxicity. Khadka et al. (2020) used drug properties extracted from DrugDB to query AOPwiki (https://aopwiki.org/), the primary repository of AOPs, for AOPs potentially relevant to DILI. The authors demonstrated that 10 AOPs related to liver toxicity can be recovered, with differential hits on these pathways by most-DILI-concern and no-DILI-concern drugs.
As some AOPs share common elements (events), these pathways form seven distinct AOP networks. They are related to biological processes pertaining to steatosis, cholestasis, fibrosis, and liver cancer. As AOPs are under active development, we updated the AOP networks for steatosis, cholestasis, and fibrosis, which are shown in Fig. 7.1. To simplify the presentation, we only show the nodes closely related to molecular entities (genes, nuclear receptors) in each pathway. Complete information for each AOP can be found in AOPwiki. Readers can refer to Fig. 1 of Khadka et al. (2020) for networks related to liver cancer. These networks provide a comprehensive summary of important molecular entities relevant to liver toxicity. They can be used to identify the target for future development of bioassays as well as to guide model building using machine learning approaches, which we will discuss in the subsequent subsection.

Fig. 7.1 AOP networks for steatosis, fibrosis, and cholestasis. Only nodes (events) related to molecular entities are shown. AOPs were extracted from AOPwiki. The areas shaded in yellow indicate nodes in the Markov blanket of the organ level event

7.2.2 Using AOPs to Facilitate Building Parsimonious Machine Learning Models

As liver toxicity is a major component of toxicity assessment, there has been a notable effort in building predictive models using in vitro assays. Although some of these models have achieved good accuracy using a large number of assays, a parsimonious model is more transparent and practical in real applications. In Khadka et al. (2020), we reported a novel approach using AOPs to filter in vitro assays before constructing a predictive model for DILI. Using AOP networks for liver toxicity, we extracted measurable molecular entities (gene expression and nuclear receptor binding) that are relevant to liver toxicity. We then searched datasets from the Tox21 and L1000 projects for corresponding entries, which resulted in 41 molecular predictors. These predictors were then combined with three drug properties extracted from the Liver Toxicity Knowledge Base (LTKB; Chen et al. 2016): daily dose, logP, and reactive metabolite (RM) formation. A logistic model with the elastic net penalty was then constructed with these predictors:

$$\log \frac{P(y = 1 \mid X = x)}{P(y = 0 \mid X = x)} = \beta_0 + \beta^T x,$$

where $y$ is the binary indicator for no-DILI-concern (coded as 0) or most-DILI-concern (coded as 1) and $x$ is the vector of predictors. We use $\beta_0$ and $\beta$ to denote the regression coefficients. The elastic net penalty (Zou and Hastie 2005) is used to regularize the model fitting and achieve better predictive performance. Specifically, the regression coefficients were estimated as

$$(\hat{\beta}_0, \hat{\beta}) = \underset{(\beta_0, \beta)}{\operatorname{argmin}} \; -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \left(\beta_0 + x_i^T \beta\right) - \log\left(1 + e^{\beta_0 + x_i^T \beta}\right) \right] + \lambda \left[ \frac{(1 - \alpha)\|\beta\|_2^2}{2} + \alpha \|\beta\|_1 \right].$$

Here, $N$ is the number of drugs, and $\|\beta\|_2^2$ and $\|\beta\|_1$ represent the ridge and Lasso penalties, respectively. The parameter $\alpha$ is set to 0.9, and $\lambda$ is selected by leave-one-out cross-validation.

This relatively simple machine learning model achieved a prediction accuracy of 0.91 (sensitivity = 0.96, specificity = 0.83), which is competitive with other models from the literature while using only a moderate number of assays, an important consideration in practice. Because logistic regression is used, the relative contribution of each assay to the prediction can be easily compared. Interestingly, without the AOP filtering step, blindly including all assays in Tox21 and L1000 in the model leads to prediction results that are essentially random. This demonstrates that AOPs contain relevant mechanistic information that can be used to select in vitro assays and to construct more interpretable and applicable models for the practice of risk assessment. As the model takes the logistic form, we also obtain a linear predictor,

$$s = \beta_0 + \beta^T x \tag{7.1}$$

in the process of model fitting, which can be used as a continuous risk score for DILI (see Fig. 7.2). We later discuss a novel method to corroborate the utility of this risk score using real-world human data.

On the other hand, the model can be shrunk even further with the guidance of AOPs. As evident in Fig. 7.1, the AOP networks take the form of directed acyclic graphs (DAGs), which makes it natural to consider AOP networks as Bayesian networks (Edwards 2000). The nodes shaded in yellow in Fig. 7.1 form what are called Markov blankets for the end nodes of steatosis, cholestasis, and fibrosis in each network.
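The elastic net criterion and the linear predictor of Eq. (7.1) can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain proximal-gradient solver written with the Python standard library only, the toy data are invented, and the names `fit`, `objective`, and `linear_predictor` as well as the fixed value of λ are assumptions made for illustration (the chapter selects λ by leave-one-out cross-validation).

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear_predictor(b0, beta, x):
    """Risk score s = beta_0 + beta^T x, as in Eq. (7.1)."""
    return b0 + sum(b * v for b, v in zip(beta, x))

def objective(b0, beta, X, y, lam, alpha):
    """Elastic net criterion: mean negative log-likelihood plus penalty."""
    nll = 0.0
    for xi, yi in zip(X, y):
        s = linear_predictor(b0, beta, xi)
        log1pexp = max(s, 0.0) + math.log1p(math.exp(-abs(s)))  # log(1 + e^s)
        nll -= yi * s - log1pexp
    ridge = sum(b * b for b in beta)
    lasso = sum(abs(b) for b in beta)
    return nll / len(y) + lam * ((1 - alpha) * ridge / 2 + alpha * lasso)

def soft_threshold(v, t):
    """Proximal operator of t * |v| (handles the Lasso part)."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit(X, y, lam, alpha=0.9, step=0.1, n_iter=2000):
    """Proximal gradient descent; the intercept is not penalized."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        resid = [sigmoid(linear_predictor(b0, beta, xi)) - yi
                 for xi, yi in zip(X, y)]
        grad0 = sum(resid) / n
        grad = [sum(r * xi[j] for r, xi in zip(resid, X)) / n
                + lam * (1 - alpha) * beta[j] for j in range(p)]
        b0 -= step * grad0
        beta = [soft_threshold(beta[j] - step * grad[j], step * lam * alpha)
                for j in range(p)]
    return b0, beta

# Invented toy data: column 0 is informative, column 1 is noise.
X = [[-2.0, 0.3], [-1.0, -0.5], [-0.5, 0.8], [0.2, -0.2],
     [0.5, 0.4], [1.0, -0.7], [1.5, 0.1], [2.0, -0.3]]
y = [0, 0, 1, 0, 1, 1, 1, 1]  # 0 = no-DILI-concern, 1 = most-DILI-concern

LAM = 0.05  # hypothetical value, fixed here instead of cross-validated
b0_hat, beta_hat = fit(X, y, lam=LAM)
scores = [linear_predictor(b0_hat, beta_hat, xi) for xi in X]
```

In practice a tested library routine would normally replace the hand-written solver; the sketch only makes explicit how the ridge and Lasso terms enter the fit and how the risk score is read off the fitted coefficients.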

Fig. 7.2 Linear predictor s differentiates the potential for DILI risk of different drugs. The DILI concern classification is derived from LTKB. This figure is reproduced from Fig. 3 of Khadka et al. (2020)

By the theory of Bayesian networks, the nodes in the Markov blankets provide essentially all the information for liver toxicity from in vitro assays. Motivated by this observation, we fit a Bayesian probit model with Gaussian priors using only nodes in the Markov blanket as predictors. This further reduced model achieved an accuracy of 0.86, which is only a slight decrease from the model in Khadka et al. (2020) but with a further reduced set of assays. In real applications, one must balance accuracy with cost and convenience. Our results show that AOP networks can be used to select different sizes of models for the set of assays depending on specific needs.
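The notion of a Markov blanket used above can be sketched in a few lines. In a DAG, the Markov blanket of a node consists of its parents, its children, and the other parents of its children. The network below is a made-up toy graph, not one of the AOP networks in Fig. 7.1, and the node names and the `assay_to_node` mapping are hypothetical.

```python
def markov_blanket(edges, target):
    """Parents, children, and co-parents of `target` in a DAG given as (parent, child) pairs."""
    parents = {u for u, v in edges if v == target}
    children = {v for u, v in edges if u == target}
    co_parents = {u for u, v in edges if v in children and u != target}
    return (parents | children | co_parents) - {target}

# Hypothetical toy pathway network: MIE -> KE -> adverse outcome (AO).
edges = [
    ("MIE_A", "KE_1"),
    ("MIE_B", "KE_1"),
    ("KE_1", "KE_2"),
    ("MIE_C", "KE_3"),
    ("KE_2", "AO"),
    ("KE_3", "AO"),
]

blanket_of_outcome = markov_blanket(edges, "AO")

# Hypothetical mapping from assay names to the network node they measure.
assay_to_node = {
    "assay_1": "MIE_A",
    "assay_2": "KE_2",
    "assay_3": "KE_3",
    "assay_4": "MIE_C",
}

# Keep only assays that measure a node in the Markov blanket of the outcome.
selected_assays = sorted(a for a, node in assay_to_node.items()
                         if node in blanket_of_outcome)
```

For an end node with no children, the blanket reduces to its direct parents, which is why restricting predictors to the blanket shrinks the assay set.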

7.3 Utilize Spontaneous Reporting Databases to Corroborate Findings of Machine Learning Models

Though in vitro assays based on human cell lines can, in theory, be more relevant to human biology than animal models, it is challenging to corroborate the results from machine learning models based on in vitro assays with human toxicology data because relevant data sources are scarce. However, one source of RWD is underutilized and warrants more attention: spontaneous reporting databases for drug adverse events. By necessity, clinical trials have relatively short follow-up times and can only include a limited number of patients. They usually also have strict inclusion criteria. As a result, some adverse effects can only be discovered by monitoring reports of adverse events after the drug has been marketed to the larger population. Spontaneous adverse event (AE) reporting databases provide postmarket surveillance by continuously collecting reports of AEs. Prominent examples of these databases are the FDA Adverse Event Reporting System (FAERS); VigiBase, an international database from the World Health Organization; and the Vaccine Adverse Event Reporting System (VAERS), run by the U.S. Food and Drug Administration (FDA) and the Centers for Disease Control and Prevention (CDC). These databases have long been used to mine drug safety signals by both regulators and the industry. The analysis of these data poses some special challenges, which we briefly discuss below. On the other hand, spontaneous reporting databases are a rich resource of information that can be used to answer some important questions related to drug safety and development. In this section, we discuss a statistical framework for analyzing data in spontaneous reporting databases to corroborate the results obtained with machine learning models using in vitro assays.

Table 7.1 2 × 2 table for the event reporting counts of drug j and AE i

            Drug j                  No drug j                                   Total
AE i        $n_{ij}$                $n_{i.} - n_{ij}$                           $n_{i.}$
No AE i     $n_{.j} - n_{ij}$       $n_{..} - n_{i.} - n_{.j} + n_{ij}$         $n_{..} - n_{i.}$
Total       $n_{.j}$                $n_{..} - n_{.j}$                           $n_{..}$

7.3.1 Statistical Methods for Safety Signal Mining Using Spontaneous Reporting Databases

Due to the need for postmarket surveillance, there is a significant literature on mining safety signals from spontaneous reporting databases. Analyzing report count data from these databases is challenging in several respects, including issues related to data quality, case duplication, missing data, inconsistent vocabulary, and the lack of verification of causality. But the most significant difficulty for statistical modeling is the absence of the total number of patients taking each drug in the population. Thus, although the count for a specific drug-AE combination can be obtained from the database, it is not feasible to calculate the traditional incidence rate, which in turn precludes the application of common statistical models. A number of approaches have been proposed to bypass this problem. Some prominent examples are proportional reporting ratios (Evans et al. 2001), reporting odds ratios (Rothman et al. 2004), likelihood ratio tests (Huang et al. 2011, 2013, 2014; Nam et al. 2017; Xu et al. 2020; Zhao et al. 2018), and Bayesian methods (Bate et al. 1998; DuMouchel 1999; Hu et al. 2015). Among these methods, one commonly used approach is to compare the AE report counts to a baseline under the null hypothesis that the AE and drug have no association. Consider the contingency table in Table 7.1. The expected value of the report count $n_{ij}$ associated with AE $i$ and drug $j$ under the null hypothesis can be computed as

$$E_{ij} = \frac{n_{i.}\, n_{.j}}{n_{..}}. \tag{7.2}$$

This quantity can serve as the baseline value under the null. Assuming that $n_{ij}$ follows a Poisson($E_{ij}$) distribution under the null hypothesis, a value much larger than $E_{ij}$ indicates evidence for a drug-AE association. Huang et al. (2011), DuMouchel (1999), and others proposed different methods to derive the threshold for declaring a significant association between the drug and AE.
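As a small worked illustration of Eq. (7.2), the sketch below computes the baseline $E_{ij}$ from the marginal totals of Table 7.1 and the Poisson upper-tail probability of the observed count. The counts are invented, the function names are assumptions, and the naive tail probability is not one of the calibrated thresholds of Huang et al. (2011) or DuMouchel (1999); in particular, it makes no adjustment for screening many drug-AE pairs at once.

```python
import math

def expected_count(n_i_dot, n_dot_j, n_dot_dot):
    """Baseline E_ij = n_i. * n_.j / n_.. under no drug-AE association (Eq. 7.2)."""
    return n_i_dot * n_dot_j / n_dot_dot

def poisson_upper_tail(k, mu):
    """P(N >= k) for N ~ Poisson(mu), by summing the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)  # P(N = 0)
    cdf = term
    for x in range(1, k):
        term *= mu / x    # P(N = x) from P(N = x - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

# Invented counts for one drug-AE pair.
n_ij = 30            # reports with both drug j and AE i
n_i_dot = 200        # all reports with AE i
n_dot_j = 1000       # all reports with drug j
n_dot_dot = 100000   # all reports in the database

e_ij = expected_count(n_i_dot, n_dot_j, n_dot_dot)
relative_reporting = n_ij / e_ij
tail_probability = poisson_upper_tail(n_ij, e_ij)
```

Here the observed count is fifteen times the baseline, so the tail probability under the null is very small.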

7.3.2 Obtain Data from FAERS

Before we describe a new method for analyzing data in spontaneous reporting databases, we give a brief introduction to one of these databases, FAERS, which will be used to demonstrate the proposed approach, though the general principles of our method are applicable to other spontaneous reporting databases as well. FAERS continuously collects spontaneous adverse event reports submitted to the FDA, and it is an important component of the FDA's postmarket safety surveillance program for drug and therapeutic biologic products. Over one million adverse events associated with the use of drugs or biological products are entered into the database each year (https://www.fda.gov/drugs/questions-and-answers-fdas-adverse-event-reporting-system-faers/fda-adverse-event-reporting-system-faers-public-dashboard). It constitutes a rich source of information on the risk of adverse events related to drugs and biological products on the market. The FAERS data are publicly available from the FDA's website as quarterly downloads consisting of seven tables. Each event report contains information on the drug and adverse event as well as patient demographics, therapy, and patient outcomes. As any patient, doctor, or drug company can submit reports to FAERS, the quality of reports varies notably. Well-known challenges in analyzing FAERS data include missing values, inconsistent drug names, duplicated reports, and differing terminology. Analyzing FAERS data therefore usually involves extensive preprocessing (Banda et al. 2016). But as various researchers have demonstrated, spontaneous reporting databases can provide valuable information when handled appropriately. For our analysis, we follow the method of Banda et al. (2016) for preprocessing. After preprocessing, the report counts for each AE can be tabulated. As DILI might be reported as a range of different AEs (defined as Medical Dictionary for Regulatory Activities (MedDRA) terms), we created a composite AE for DILI by combining 53 MedDRA preferred terms as in George et al. (2018) and Suzuki et al. (2015).
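The tabulation step for a composite AE can be sketched as follows. This is only an illustration under strong simplifying assumptions: the records are invented, each report mentions a single drug, drug names are normalized by trimming and lower-casing only (the preprocessing of Banda et al. (2016) is far more extensive), and `DILI_TERMS` is a three-term stand-in for the 53 MedDRA preferred terms, which are not reproduced here.

```python
from collections import defaultdict

# Hypothetical stand-in for the composite DILI term list.
DILI_TERMS = {"hepatitis", "jaundice", "hepatic failure"}

# Invented records: (report id, drug name as reported, preferred term).
records = [
    ("r1", "DrugA ", "Jaundice"),
    ("r1", "druga", "jaundice"),        # duplicate row after normalization
    ("r2", "DrugA", "Nausea"),
    ("r3", "DrugB", "Hepatitis"),
    ("r4", "DrugB", "Headache"),
    ("r5", "DrugB", "Hepatic failure"),
    ("r5", "DrugB", "Jaundice"),        # same report, second DILI term
    ("r6", "DrugC", "Rash"),
]

def tabulate_counts(records, ae_terms):
    """Count distinct reports per drug, overall and for the composite AE."""
    reports_by_drug = defaultdict(set)
    ae_reports_by_drug = defaultdict(set)
    for report_id, drug, term in records:
        drug_key = drug.strip().lower()
        reports_by_drug[drug_key].add(report_id)
        if term.strip().lower() in ae_terms:
            # A report counts once, however many composite terms it lists.
            ae_reports_by_drug[drug_key].add(report_id)
    n_dot_j = {d: len(ids) for d, ids in reports_by_drug.items()}
    n_ij = {d: len(ae_reports_by_drug.get(d, ())) for d in reports_by_drug}
    return n_ij, n_dot_j

n_ij, n_dot_j = tabulate_counts(records, DILI_TERMS)
n_i_dot = sum(n_ij.values())       # reports with the composite AE
n_dot_dot = sum(n_dot_j.values())  # all reports (one drug per report here)
```

Counting distinct reports rather than rows is a design choice of this sketch; it prevents a single report that lists several liver-related terms from inflating the composite count.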

7.3.3 Poisson Regression Model for Report Counts

For mining safety signals in spontaneous reporting databases, the interest is in detecting drug-AE combinations with unusually high report counts (e.g., much higher than $E_{ij}$). Drug properties are not considered in that modeling. In our case, we are interested in testing whether a predictor of DILI potential, the linear predictor $s$ in Eq. (7.1), is associated with a high report count for DILI in FAERS. A strong positive association would corroborate the utility of the linear predictor for predicting DILI potential from in vitro assays. However, the report counts in FAERS are confounded by the total number of people taking the drug. To address this problem, we use the same approach as described for safety signal detection, i.e., we use $E_{ij}$ in Eq. (7.2) as the baseline for the report count under the null hypothesis of no association. We assume that the report count $n_{ij}$ for drug $j$ and AE $i$ follows a Poisson($\mu_{ij}$) distribution, where

$$\mu_{ij} = \lambda_{ij} E_{ij}, \qquad \log(\lambda_{ij}) = \beta_0 + \beta_1 s,$$


T. V. Mahanama et al.

with $s$ being the linear predictor from the penalized logistic model. This is a Poisson regression model with offset $\log E_{ij}$ (that is, $E_{ij}$ enters as a multiplicative baseline), which is included to adjust for the different baseline report counts of different drugs. Under the global null hypothesis that there is no drug-AE association, the linear predictor $s$ is not associated with the mean report count of the drug-AE combination, and $\beta_1$ is zero. Thus, testing whether the linear predictor $s$ is associated with higher report counts for DILI is equivalent to testing whether $\beta_1 = 0$. Here, we use the likelihood ratio test for inference. The likelihood ratio statistic can be obtained by fitting two Poisson regression models, one with $\beta_1 = 0$ and the other with $\beta_1$ unrestricted (see, e.g., Dunn and Smyth 2018). This can be accomplished with common software such as the glm() function in R.

For the calibration of p-values, we use a Monte Carlo simulation approach similar to that in Huang et al. (2011). For a given AE $i$, the number of reports for each drug $j$ is simulated under $H_0$. Given the marginal totals $n_{.1}, \ldots, n_{.J}$, the counts $n_{i1}, \ldots, n_{iJ}$ have independent Poisson distributions under the null hypothesis. The joint distribution, conditioning on $n_{i.}; n_{.1}, \ldots, n_{.J}$, is

$$(n_{i1}, \ldots, n_{iJ}) \mid n_{i.};\, n_{.1}, \ldots, n_{.J} \sim \text{multinomial}\left(n_{i.};\, \frac{n_{.1}}{n_{..}}, \ldots, \frac{n_{.J}}{n_{..}}\right).$$

Using the marginal totals of the data, one can generate a large number of datasets under $H_0$ and calculate the likelihood ratio statistic for the Poisson regression on each of them. The likelihood ratio statistic from the real dataset is then compared with the simulated values to determine the p-value.

For the linear predictor $s$ that we derived from the penalized logistic model for DILI, fitting the Poisson regression model gives $\hat{\beta}_1 = 0.26$ and correspondingly $e^{\hat{\beta}_1} = 1.30$. This means that when the linear predictor $s$ increases by 1, the report count increases on average by 30% relative to the baseline level.
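The likelihood ratio test and its Monte Carlo calibration can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names, the hand-written Newton-Raphson fit (standing in for R's glm()), and the toy data are hypothetical, and the expected counts $E_{ij}$ are assumed to be proportional to the drug totals $n_{.j}$ so that they can serve as multinomial weights.

```python
import math
import random


def poisson_loglik(counts, mu):
    """Poisson log-likelihood up to an additive constant (the log n! terms)."""
    return sum(n * math.log(m) - m for n, m in zip(counts, mu))


def lr_statistic(counts, expected, s, n_iter=50):
    """LR statistic for H0: beta1 = 0 in log(mu_j) = log(E_j) + beta0 + beta1*s_j.

    Returns (LR statistic, beta0, beta1) of the unrestricted fit.
    """
    # Null model (beta1 = 0) has the closed-form MLE beta0 = log(sum n / sum E).
    b0_null = math.log(sum(counts) / sum(expected))
    ll_null = poisson_loglik(counts, [e * math.exp(b0_null) for e in expected])

    # Full model: Newton-Raphson on (beta0, beta1), started at the null fit.
    b0, b1 = b0_null, 0.0
    for _ in range(n_iter):
        mu = [e * math.exp(b0 + b1 * x) for e, x in zip(expected, s)]
        g0 = sum(n - m for n, m in zip(counts, mu))               # score, beta0
        g1 = sum((n - m) * x for n, m, x in zip(counts, mu, s))   # score, beta1
        h00 = sum(mu)                                             # Fisher information
        h01 = sum(m * x for m, x in zip(mu, s))
        h11 = sum(m * x * x for m, x in zip(mu, s))
        det = h00 * h11 - h01 * h01   # zero if all s_j are equal (not handled here)
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    mu = [e * math.exp(b0 + b1 * x) for e, x in zip(expected, s)]
    return 2.0 * (poisson_loglik(counts, mu) - ll_null), b0, b1


def mc_p_value(counts, expected, s, n_sim=999, seed=0):
    """Monte Carlo p-value: redraw the counts from a multinomial under H0."""
    rng = random.Random(seed)
    total = sum(counts)                      # n_i. is held fixed
    observed = lr_statistic(counts, expected, s)[0]
    drugs = range(len(counts))
    exceed = 0
    for _ in range(n_sim):
        sim = [0] * len(counts)
        # Assumption: cell probabilities n_.j / n_.. are proportional to E_ij.
        for j in rng.choices(drugs, weights=expected, k=total):
            sim[j] += 1
        if lr_statistic(sim, expected, s)[0] >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


# Toy data (hypothetical): six drugs, one composite AE.
s = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]     # linear predictor per drug
expected = [30, 20, 40, 25, 35, 50]      # baseline counts E_ij
counts = [23, 18, 40, 28, 45, 74]        # observed report counts n_ij
lr, beta0, beta1 = lr_statistic(counts, expected, s)
p = mc_p_value(counts, expected, s, n_sim=200)
print(f"beta1={beta1:.3f} rate ratio={math.exp(beta1):.2f} LR={lr:.2f} p={p:.3f}")
```

The rate ratio printed at the end corresponds to $e^{\hat{\beta}_1}$ in the text; with a dedicated statistics package, the same fit would normally be obtained from a Poisson GLM with a log-offset.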
The corresponding p-value is highly significant (d<

and standard deviation σ of the distances included in the subset of training instances that have a lower distance than D_av are calculated. Z is an empirical cut-off value with a default of 0.5. In Isalos, users can calculate the APD threshold value from their training instances by selecting Statistics → Domain—APD. In the configuration window, users may alter the Z value and select whether the calculations run on the CPU or the GPU (Fig. 9.8). Later, in a new tab, users can import the test instances and, by selecting

9 Isalos Predictive Analytics Platform: Cheminformatics, …


Fig. 9.9 Applicability domain configuration window. a Z value numeric input. b Selection of CPU/GPU for computations execution

the APD threshold from Analytics → Existing Model Utilization, the reliability of the predictions is evaluated according to the method described above (Fig. 9.9).
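A minimal Python sketch of this reliability check follows. It assumes the commonly used definition APD = ⟨d⟩ + Zσ (Zhang et al. 2006), where ⟨d⟩ and σ are the mean and standard deviation of the pairwise Euclidean training distances that fall below the overall average distance D_av. The function names, the use of the population standard deviation, and the toy data are illustrative assumptions, not the Isalos implementation.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def apd_threshold(train, z=0.5):
    """APD = <d> + Z*sigma over the pairwise distances below the average distance."""
    dists = [math.dist(a, b) for a, b in combinations(train, 2)]
    d_av = mean(dists)
    subset = [d for d in dists if d < d_av]
    return mean(subset) + z * pstdev(subset)


def is_reliable(query, train, threshold):
    """A prediction is flagged reliable if the nearest training instance
    lies within the APD threshold."""
    return min(math.dist(query, t) for t in train) <= threshold


# Toy training set (hypothetical, already normalized descriptors).
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
threshold = apd_threshold(train, z=0.5)
inside = is_reliable((0.5, 0.5), train, threshold)
outside = is_reliable((10.0, 10.0), train, threshold)
```

Raising Z widens the domain and flags more predictions as reliable; lowering it makes the check stricter.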

9.6.2 Model Metrics

A model’s performance can be assessed with statistical metrics that quantify its accuracy (how closely each prediction approximates the real value) and reveal possible under-fitting or over-fitting phenomena.

Regression Metrics: For regression models, the goodness-of-fit on the training and test data can be measured by selecting Statistics → Model Metrics → Regression Metrics. In the configuration window, users only need to define the columns with the actual and predicted values of the response variable. The values of mean squared error (MSE, Eq. 9.3), root mean squared error (RMSE, Eq. 9.4), mean absolute error (MAE, Eq. 9.5), and the squared correlation coefficient R² (Eq. 9.6) are calculated. MSE, RMSE, and MAE values closer to 0, as well as R² values closer to 1, correspond to better-fitting models (Tropsha 2010; Faulon and Bender 2010; Witten et al. 2016).

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 \tag{9.3}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2} \tag{9.4}$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right| \tag{9.5}$$


D.-D. Varsou et al.

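Table 9.1 Sample of confusion matrix for a two-class classification model

                  | Actual class: TRUE | Actual class: FALSE
Predicted: TRUE   | TP                 | FP
Predicted: FALSE  | FN                 | TN

$$R^2 = \left( \frac{\sum_{i=1}^{N} \left( y_i - \bar{y} \right) \left( \hat{y}_i - \bar{\hat{y}} \right)}{\sqrt{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2 \sum_{i=1}^{N} \left( \hat{y}_i - \bar{\hat{y}} \right)^2}} \right)^2 \tag{9.6}$$

where $y_i$ and $\hat{y}_i$ are the original and predicted endpoint values over the N training samples, and $\bar{y}$ and $\bar{\hat{y}}$ are the averages of the original and predicted values, respectively.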
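The four regression metrics (MSE, RMSE, MAE, and the squared correlation coefficient R²) can be computed directly from two columns of actual and predicted values. The following Python sketch is only an illustration of the formulas; the function name and the toy values are hypothetical and do not reproduce any Isalos output.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and the squared Pearson correlation R^2."""
    n = len(y_true)
    errors = [a - p for a, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    cov = sum((a - mean_true) * (p - mean_pred) for a, p in zip(y_true, y_pred))
    var_true = sum((a - mean_true) ** 2 for a in y_true)
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    # Undefined (division by zero) if either column is constant.
    r2 = cov ** 2 / (var_true * var_pred)

    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}


# Toy example: every prediction is off by a constant 0.5.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```

Because R² is defined here as a squared correlation, a constant offset in the predictions leaves R² at 1 while MSE, RMSE, and MAE are non-zero, which is one reason to report the error metrics alongside it.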

Classification Metrics: For classification models, from the Statistics → Model Metrics → Classification Metrics menu, users define the columns with the actual and predicted values of the response variable and the β value of the F-score. The confusion matrix (Table 9.1) is presented, along with the classification accuracy (Eq. 9.7), the precision (Eq. 9.8), the recall (or sensitivity, Eq. 9.9), the specificity (Eq. 9.10), and the F-score values (Eqs. 9.11 and 9.12) (OECD 2007; Witten et al. 2016; Tharwat 2018).

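$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{9.7}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{9.8}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{9.9}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{9.10}$$

$$F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} = \frac{2TP}{2TP + FN + FP} \tag{9.11}$$

$$F_\beta = \left( 1 + \beta^2 \right) \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\beta^2 \cdot \mathrm{precision} + \mathrm{recall}} = \frac{\left( 1 + \beta^2 \right) TP}{\left( 1 + \beta^2 \right) TP + \beta^2 FN + FP} \tag{9.12}$$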

where TP (true positive) is the frequency of class TRUE instances correctly classified as “TRUE”, TN (true negative) is the frequency of class FALSE instances correctly classified as “FALSE”, FP (false positive, Type I error) is the frequency of class FALSE instances incorrectly classified as “TRUE”, and FN (false negative, Type II error) is the frequency of class TRUE instances incorrectly classified as “FALSE”. β is a factor that weights recall relative to precision: values of β greater than 1 favour recall, and values below 1 favour precision.
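The classification metrics can be reproduced from the four confusion-matrix counts. The Python sketch below is illustrative only; the function name, the dictionary layout, and the toy labels are assumptions rather than part of Isalos.

```python
def classification_metrics(actual, predicted, beta=1.0):
    """Confusion-matrix counts and derived metrics for boolean class labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    b2 = beta ** 2
    # Ratios are undefined (division by zero) when a denominator count is zero.
    return {
        "confusion": {"TP": tp, "FP": fp, "FN": fn, "TN": tn},
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fn + fp),
        "f_beta": (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp),
    }


# Toy labels (hypothetical): True = class TRUE, False = class FALSE.
actual = [True, True, True, False, False, False, False, True]
predicted = [True, True, False, False, False, True, False, True]
result = classification_metrics(actual, predicted, beta=2.0)
```

With β = 1 the `f_beta` entry coincides with `f1`; larger β values penalize false negatives more heavily than false positives.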

9.7 Development of Predictive Models with Isalos

In the next paragraphs, three case studies from the field of nanoinformatics are presented. The models are built and validated within Isalos and estimate different nano-related endpoints (property/response variable) based on a set of relevant properties (descriptors). All predictive workflows were built following the steps of data partitioning, normalization, variable selection, modelling, validation, and reliability assessment. The data used to build the models are publicly available through the NanoPharos database (https://db.nanopharos.eu/Queries/Datasets.zul) (NanoPharos 2022).

9.7.1 Ecotox Models

The ecotox models were developed within Isalos to predict the toxicological effects on Daphnia magna of freshly dispersed and 2-year aged nanomaterials (NMs). The original dataset consisted of dose–response data from 11 Ag and TiO2 NMs with different surface functionalization, dispersed in three different media (a high hardness medium and two representative river waters) (Varsou et al. 2021a). Considering the different combinations of ageing, media, and concentration levels, the dataset included 353 instances and 14 variables. A separation threshold of a 40% decrease of the initial Daphnia magna population at 48 h (EC40) was used to produce two classes with relatively balanced distributions: “toxic” (population decreased by more than 40% relative to the untreated controls) and “non-toxic” (population decreased by less than 40% relative to the untreated controls). In total, 150 instances were characterized as “toxic” and 203 as “non-toxic”. The toxicity class was chosen as the dependent (endpoint) variable to be predicted (Varsou et al. 2021a).

The NMs of the dataset were grouped according to ECHA’s read-across guidance into two categories, freshly dispersed and aged, and a separate classification model was developed for each category. For each model, an external validation scheme was applied: data were uploaded to Isalos and divided randomly into training and test subsets in a ratio of 75%:25% using the Data Transformation → Split → Random Partitioning functionality. The Stratified sampling option was selected so that the two classes of the dependent toxicity variable (toxic versus non-toxic) were represented in the same proportions in the two subsets. The training set was first used for variable selection and then to define the modelling parameters.
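A stratified random partition of this kind can be sketched as follows. The code is a hypothetical illustration of the idea (shuffle within each class, then cut each class at the same fraction); it is not the Isalos Random Partitioning implementation, and the function name, seed, and label list are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Return (train_indices, test_indices) with each class split at the same ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                       # random order within the class
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Toy labels with the class sizes quoted in the text (150 toxic, 203 non-toxic).
labels = ["toxic"] * 150 + ["non-toxic"] * 203
train_idx, test_idx = stratified_split(labels, train_fraction=0.75)
```

Splitting class by class keeps the toxic to non-toxic ratio nearly identical in the training and test subsets, which a plain random split does not guarantee for small datasets.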

Table 9.2 Set of variables used for the development of the two models

Freshly dispersed NMs ecotox model | Aged NMs ecotox model
Core                               | Core
Tested media                       | Tested media
TEM size                           | TEM size
Surface charge                     | Zeta potential
DLS size                           | Conductivity
Concentration                      | Concentration

The training samples were copied to a new tab and normalized using the Data Transformation → Normalizers → Z Score function so that all variables contribute equally to the analysis. Similarly, the test samples were moved to a new tab, and the same normalization was applied by selecting Analytics → Existing Model Utilization and choosing the tab that holds the normalization of the training samples. The BestFirst variable selection method with the CFS evaluator was applied to the training data to select the most critical of the available descriptors. However, because the type of core, the tested media, and the concentration were fundamental parameters of the experimental evaluation (they defined the experimental conditions), these descriptors were excluded from variable selection and included directly in the subset of modelling variables. The variables used for the development of each of the two models are presented in Table 9.2. The reduced set of training data was moved to a new tab, and Analytics → Classification → kNN was selected to apply the kNN machine learning algorithm to correlate the toxicity endpoint with the selected variables. In both cases, a value of k = 3 was selected. Following model training, the output spreadsheet presents the k nearest training neighbours of each instance, along with the corresponding Euclidean distance of each neighbour. Similar results are presented when the model is applied to the test set (Fig. 9.10). To perform the external validation, the normalized test data were moved to a new tab and filtered according to the selected variables using Data Transformation → Data Manipulation → Select Column(s). The filtered test data were moved to a new tab, and the kNN model was applied to them by choosing it from Analytics → Existing Model Utilization.
The predictions were moved to a new tab, and the performance of the model in external data was measured by selecting Statistics → Model Metrics → Classification Metrics. Screenshots of the classification statistics for the two models are presented in Fig. 9.11. Finally, the applicability domain of the models was defined. Training data after variable selection were imported to a new tab, and Statistics → Domain—APD was selected to calculate the APD threshold (Eq. 9.2). Later, in a new tab the filtered test data were imported and Analytics → Existing Model Utilization was selected to inspect if the test samples fall into the domain limits and the predictions should be considered reliable or not (Fig. 9.12).
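The core of this workflow (Z-score normalization fitted on the training set, reuse of the same parameters on the test set, and kNN classification with k = 3 and Euclidean distance) can be sketched in Python. All names and data below are hypothetical, and majority voting among the neighbours is an assumption about how the class is assigned.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def fit_zscore(rows):
    """Per-column (mean, standard deviation) estimated on the training rows."""
    return [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]


def apply_zscore(rows, params):
    """Normalize rows with previously fitted parameters (train or test)."""
    return [[(v - m) / sd for v, (m, sd) in zip(row, params)] for row in rows]


def knn_classify(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training instances.

    Returns the predicted class and the (distance, class) pairs of the neighbours.
    """
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0], neighbours


# Toy data (hypothetical descriptors, e.g. size and concentration).
train_raw = [[1.0, 10.0], [2.0, 12.0], [1.5, 11.0],
             [8.0, 40.0], [9.0, 42.0], [8.5, 41.0]]
train_y = ["non-toxic"] * 3 + ["toxic"] * 3
test_raw = [[8.2, 39.0], [1.2, 10.5]]

params = fit_zscore(train_raw)                 # fitted on training data only
train_x = apply_zscore(train_raw, params)
test_x = apply_zscore(test_raw, params)        # same parameters reused
predictions = [knn_classify(train_x, train_y, q, k=3)[0] for q in test_x]
```

Fitting the normalization on the training rows only, and then reusing those parameters for the test rows, mirrors the Existing Model Utilization step and avoids leaking test-set information into the model.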


Fig. 9.10 Results of ecotox-freshly dispersed model on the test set. For each instance, its prediction is presented, along with the three closest training neighbours and their corresponding Euclidean distance from each test instance

Fig. 9.11 Validation results of the two ecotox models: pristine NMs model (left-hand side) and aged NMs model (right-hand side)

9.7.2 Molecular, Size, and Surface-Based Safe by Design (MS³bD, MSzeta) Model

The MSzeta model was developed using the Isalos platform and a library of 69 engineered NMs, with the aim of predicting zeta potential, a key physicochemical factor


Fig. 9.12 Application of the APD function on a test set. a Test instances column. b Distance of each test instance from its nearest train instance. c APD threshold. d Reliability of predictions

from the NMs regulatory context. Zeta potential can directly affect NM behaviour and stability in different solutions, as well as NM interaction with biological organisms. The original dataset included seven physicochemical descriptors and was enriched with 13 molecular descriptors (Papadiamantis et al. 2021). An external validation scheme was followed for model development. The initial dataset was imported to Isalos and randomly divided into training and test sets in a ratio of 75%:25% via the Data Transformation → Split → Random Partitioning functionality. The descriptors of the training set were normalized using Data Transformation → Normalizers → Z Score, and the normalization parameters were applied to the test set in a new tab by selecting Analytics → Existing Model Utilization and the corresponding normalization model. The BestFirst variable selection method combined with the CFS evaluator was applied to the training data to select the most critical of the available descriptors. Five descriptors (two physicochemical and three molecular) were identified as the most significant; they are presented in Table 9.3. The reduced set of training data was moved to a new tab, and Analytics → Regression → kNN was selected to apply the kNN machine learning algorithm, with k = 7, to correlate the zeta potential endpoint with the selected variables.
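For a continuous endpoint such as zeta potential, a kNN prediction is typically an average over the endpoint values of the nearest training instances. The sketch below assumes an unweighted mean; whether Isalos weights neighbours by distance is not stated here, so that choice, the function name, and the toy data are assumptions.

```python
import math


def knn_regress(train_x, train_y, query, k=7):
    """Predict a continuous endpoint as the mean over the k nearest neighbours."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Toy data (hypothetical): one normalized descriptor and an endpoint value.
train_x = [[0.0], [1.0], [2.0], [3.0], [10.0]]
train_y = [0.0, 1.0, 2.0, 3.0, 10.0]
prediction = knn_regress(train_x, train_y, [1.4], k=3)
```

With small libraries such as the 69 NMs used here, the choice of k trades off noise (small k) against over-smoothing (large k), which is why it is usually tuned on the training set only.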

Table 9.3 Variables used for the development of the MSzeta model

MSzeta model
Coating/functionalization (CT)
Core size (SZ)
Metal ionic radius (r_ion)
Sum of metal electronegativity divided by the number of oxygen atoms present in a particular metal oxide (Σχ/n_O)
The absolute electronegativity (Mulliken electronegativity, χ_abs)

Fig. 9.13 Validation results of the MSzeta model

The normalized test data were moved in a new tab and were filtered according to the selected variables of Table 9.3 using Data Transformation → Data Manipulation → Select Column(s). The filtered test data were moved to a new tab, and the kNN model was applied to them by choosing it from Analytics → Existing Model Utilization. The predictions were moved to a new tab, and the performance of the model in external data was measured by selecting Statistics → Model Metrics → Regression Metrics. A screenshot of the regression statistics is presented in Fig. 9.13. Finally, the applicability domain of the model was defined as in the previous case: using Statistics → Domain—APD functionality with the train data and Analytics → Existing Model Utilization to assess the reliability of the predictions with the test data.

9.7.3 Cell Viability Model

This regression model was built in Isalos using a dataset of 24 distinct metal oxide NMs with 77 descriptors: 15 physicochemical, structural, and assay-related descriptors and 62 atomistic computational descriptors. The model can be used to predict the cytotoxicity (log-transformed % cell viability, at 24 h post-exposure) of metal oxide NMs based on the colorimetric lactate dehydrogenase (LDH) assay and the luminometric adenosine triphosphate (ATP) assay, both of which quantify irreversible cell membrane damage (Papadiamantis et al. 2020). The initial dataset was imported to Isalos, and Data Transformation → Data Manipulation → Remove Column(s) was used to remove columns in which the same value occurs at a rate equal to or higher than 0.25.
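One plausible reading of this filtering step is sketched below: a column is dropped when its most frequent value accounts for at least the given fraction of the rows. This interpretation of the 0.25 threshold, the function name, and the toy table are assumptions and may differ from the behaviour of Remove Column(s) in Isalos.

```python
from collections import Counter


def drop_dominated_columns(table, threshold=0.25):
    """Keep only columns whose most frequent value covers < threshold of the rows.

    `table` maps a column name to its list of values.
    """
    kept = {}
    for name, values in table.items():
        top_count = Counter(values).most_common(1)[0][1]
        if top_count / len(values) < threshold:
            kept[name] = values
    return kept


# Toy table (hypothetical descriptors).
table = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],   # every value unique: kept
    "b": [1, 1, 1, 2, 3, 4, 5, 6],   # value 1 covers 3/8 of rows: dropped
    "c": [1, 1, 2, 3, 4, 5, 6, 7],   # value 1 covers exactly 2/8: dropped
}
filtered = drop_dominated_columns(table, threshold=0.25)
```

Removing such near-constant columns before variable selection avoids descriptors that carry little information about differences between NMs.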

Table 9.4 Variables used for the development of the cell viability model

Cell viability model
NM core size
NM hydrodynamic size
Type of assay
NM exposure dose
Conduction band energy (E_c)
Average coordination number of metal atoms in the surface region of the NM
Average length of surface normal component of force vector of atoms in the surface region of the NM

An external validation scheme was followed for model development by dividing the dataset into training and test sets in a ratio of 70%:30% using the Data Transformation → Split → Random Partitioning functionality. The descriptors of the training set were normalized using Data Transformation → Normalizers → Z Score, and the normalization parameters were applied to the test set by selecting Analytics → Existing Model Utilization and the corresponding normalization model. The BestFirst variable selection method combined with the CFS evaluator was applied to the training data to select the most critical of the available descriptors. Seven descriptors were identified as the most significant; they are presented in Table 9.4. The reduced set of training data was moved to a new tab, and Analytics → Regression → kNN was selected to apply the kNN machine learning algorithm, with k = 2, to correlate the log-transformed cell viability endpoint with the selected variables. Following the validation procedure described for the previous model, the normalized test data were filtered according to the selected variables of Table 9.4 using Data Transformation → Data Manipulation → Select Column(s). The kNN model was applied to the filtered test subset by choosing it from Analytics → Existing Model Utilization. The performance of the model on external data was measured by selecting Statistics → Model Metrics → Regression Metrics. A screenshot of the regression statistics is presented in Fig. 9.14. Finally, the applicability domain of the model was defined using the Statistics → Domain—APD functionality with the training data, and Analytics → Existing Model Utilization was used to assess the reliability of the predictions for the test data.
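The variable selection step can be illustrated with the correlation-based feature selection (CFS) merit, which rewards subsets whose descriptors correlate with the endpoint but not with each other: merit = k·r_cf / sqrt(k + k(k−1)·r_ff), with r_cf the mean descriptor-endpoint correlation and r_ff the mean descriptor-descriptor correlation. The sketch below uses absolute Pearson correlations and a simple greedy forward search as a stand-in for BestFirst; both simplifications, and all names and data, are assumptions rather than the Isalos implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation (undefined for constant columns)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(columns, target, subset):
    """CFS merit of the descriptor subset (list of column indices)."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[j], target)) for j in subset) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(columns, target):
    """Add the descriptor that most improves the merit until no gain remains."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        score, j = max((cfs_merit(columns, target, selected + [j]), j)
                       for j in remaining)
        if score <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best


# Toy data (hypothetical): two uncorrelated descriptors that both drive the endpoint.
columns = [[1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0]]
target = [2.0, 1.0, 2.0, 5.0]          # equals columns[0] + columns[1]
selected, merit = greedy_cfs(columns, target)
```

In this toy case both descriptors are retained because they are mutually uncorrelated; a redundant copy of either one would lower the merit and be left out.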

Fig. 9.14 Validation results of the cell viability model


9.8 Conclusions

The abundance of created data, in combination with the need of different research and analysis areas to extract valuable information from them and solve real-life problems, widens the narrow circle of scientists/professionals who would normally benefit from such analyses. Researchers (e.g. experimentalists) who do not have a strong background in data science, statistics, or programming may feel intimidated by programming environments or have time constraints that prevent them from learning to implement their own scripts. Such users are therefore excluded from taking direct advantage of their own data, or are obliged to turn to informatics specialists, who may not be able to interpret the results extracted from the data. The Isalos Analytics platform addresses these needs for rapid data handling and modelling, as it includes functionalities for data transformation, filtering, analysis, and validation. Use of the platform does not require any prior computational or programming knowledge, thanks to its user-friendly environment of menus, buttons, and spreadsheets. This renders Isalos suitable for any interested user, whether informatics professional or not, including academics and researchers of any field. Therefore, stakeholders with any background are given, through Isalos, the same access as informatics experts to machine learning tools and can directly manipulate their data and develop custom-made models.

Acknowledgements This work was funded by the EU H2020 projects NanoSolveIT (Grant Agreement No. 814572), SABYDOMA (Grant Agreement No. 862296), DIAGONAL (Grant Agreement No. 953152), CompSafeNano (Grant Agreement No. 101008099), the EU H2020 research infrastructure NanoCommons (Grant Agreement No. 731032), the Green Deal Horizon 2020 Project SCENARIOS (Grant Agreement No. 101037509), and the POST-DOC/0718/0070 project, co-funded by the European Regional Development Fund and the Republic of Cyprus through the Research and Innovation Foundation.

References

Abadi M, Agarwal A, Barham P et al (2015) TensorFlow: large-scale machine learning on heterogeneous systems
Berthold MR, Cebron N, Dill F et al (2009) KNIME - the Konstanz information miner. ACM SIGKDD Explor Newsl 11:26–31. https://doi.org/10.1145/1656274.1656280
Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, NY, USA, pp 785–794
Daszykowski M, Walczak B, Massart DL (2002) Representative subset selection. Anal Chim Acta. https://doi.org/10.1016/S0003-2670(02)00651-7
Demšar J, Curk T, Erjavec A et al (2013) Orange: data mining toolbox in python. J Mach Learn Res 14:2349–2353
Faulon J-L, Bender A (2010) Handbook of chemoinformatics algorithms. Chapman and Hall/CRC
Gadaleta D, Mangiatordi GF, Catto M et al (2016) Applicability domain for QSAR models: where theory meets reality. Int J Quant Struct Relat 1:45–63. https://doi.org/10.4018/IJQSPR.2016010102
Goodfellow I, Bengio Y, Courville A (2016) Deep Learning. MIT Press
Gulli A, Pal S (2017) Deep learning with Keras. Packt Publishing Ltd
Higham DJ, Higham NJ (2017) MATLAB Guide. Soc Indust Appl Math
Kennard RW, Stone LA (1969) Computer Aided Design of Experiments 11:137–148
Leach AR, Gillet VJ (2007) An introduction to chemoinformatics
Mierswa I, Wurst M, Klinkenberg R et al (2006) YALE: rapid prototyping for complex data mining tasks. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining - KDD ’06. ACM Press, New York, New York, USA, p 935
NanoPharos (2022) NanoPharos Database. db.nanopharos.eu. Accessed 21 Jan 2022
OECD (2007) Guidance document on the validation of (Quantitative) structure-activity relationship [(Q)SAR] models
Papadiamantis AG, Afantitis A, Tsoumanis A et al (2021) Computational enrichment of physicochemical data for the development of a ζ-potential read-across predictive model with Isalos analytics platform. NanoImpact 22:100308. https://doi.org/10.1016/j.impact.2021.100308
Papadiamantis AG, Jänes J, Voyiatzis E et al (2020) Predicting cytotoxicity of metal oxide nanoparticles using isalos analytics platform. Nanomaterials 10:1–19. https://doi.org/10.3390/nano10102017
Paszke A, Gross S, Massa F et al (2019) PyTorch: an imperative style, high-performance deep learning library. In: Advances in neural information processing systems 32. Curran Associates, Inc., pp 8024–8035
Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825–2830
R core team (2021) R: A language and environment for statistical computing. Vienna, Austria
Tharwat A (2018) Classification Assessment Methods. Appl Comput Inf 17:168–192. https://doi.org/10.1016/j.aci.2018.08.003
The pandas development team (2020) pandas-dev/pandas: Pandas
The University of Waikato (2021) Weka 3: machine learning software in java. https://www.cs.waikato.ac.nz/ml/weka/index.html. Accessed 15 Jul 2019
Tropsha A (2010) Best practices for QSAR model development, validation, and exploitation. Mol Inform 29:476–488. https://doi.org/10.1002/minf.201000061
Van der Aalst W (2016) Process mining: data science in action
Van Hulle MM (2012) Self-organizing Maps. Handbook of natural computing. Springer, Berlin Heidelberg, Berlin, Heidelberg, pp 585–622
Varsou D-D, Ellis LJA, Afantitis A et al (2021a) Ecotoxicological read-across models for predicting acute toxicity of freshly dispersed versus medium-aged NMs to Daphnia magna. Chemosphere 285:131452. https://doi.org/10.1016/j.chemosphere.2021.131452
Varsou DD, Afantitis A, Tsoumanis A et al (2020) Zeta-potential read-across model utilizing nanodescriptors extracted via the nanoXtract image analysis tool available on the enalos nanoinformatics cloud platform. Small 16. https://doi.org/10.1002/smll.201906588
Varsou DD, Koutroumpa NM, Sarimveis H (2021b) Automated grouping of nanomaterials and read-across prediction of their adverse effects based on mathematical optimization. J Chem Inf Model 61:2766–2779. https://doi.org/10.1021/acs.jcim.1c00199
Witten IH, Frank E, Hall MA, Pal CJ (2016) Data mining: practical machine learning tools and techniques, 4th edn. Morgan Kaufmann
Zhang S, Golbraikh A, Oloff S et al (2006) A novel automated lazy learning QSAR (ALL-QSAR) approach: method development, applications, and virtual screening of chemical databases using validated ALL-QSAR models. J Chem Inf Model 46:1984–1995. https://doi.org/10.1021/ci060132x
Zhang T (2004) Solving large scale linear prediction problems using stochastic. In: ICML 2004 Proceedings of the 21st international conference on machine learning. Omnipress, pp 919–926

Chapter 10

ED Profiler: Machine Learning Tool for Screening Potential Endocrine-Disrupting Chemicals

Xianhai Yang, Huihui Liu, Rebecca Kusko, and Huixiao Hong

X. Yang (B) · H. Liu: Jiangsu Key Laboratory of Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
R. Kusko: Immuneering Corporation, Cambridge, MA 02142, USA
H. Hong: National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR 72079, USA

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. H. Hong (ed.), Machine Learning and Deep Learning in Computational Toxicology, Computational Methods in Engineering & the Sciences, https://doi.org/10.1007/978-3-031-20730-3_10

10.1 Introduction

Since the term “endocrine-disrupting chemicals (EDCs)” was first put forward in 1993 (Colborn et al. 1993), it has been well documented that EDCs can evoke harmful endocrine-related effects in humans and wildlife (Diamanti-Kandarakis et al. 2009; Gore et al. 2015; Kwiatkowski et al. 2016; Matthiessen et al. 2018; United Nations Environment Programme/World Health Organization 2013). The identified detrimental effects include, but are not limited to: alteration of the expression level of endocrine-related genes (Lu et al. 2018), interference with hormone homeostasis (Kim et al. 2011), dysfunction of endocrine glands (Lu et al. 2020), onset of hormone-dependent cancers (Alsen et al. 2021; Bokobza et al. 2021), metabolic disease (Papalou et al. 2019), and feminization or demasculinization (Gimeno et al. 1996; Harris et al. 2011; Shenoy et al. 2011). To minimize these potential deleterious effects of EDCs on the endocrine system of humans and wildlife, it is an urgent task to identify/screen potential EDCs from commercially used chemicals


X. Yang et al.

and design new substances without such endocrine disrupting effects (La Merrill et al. 2020; Kassotis et al. 2020). In this regard, a key prerequisite for the successful identification of potential EDCs is the development of appropriate methods. To date, a panoply of validated EDC screening assays has been developed or is under revision. For example, 14 testing methods were published by the United States Environmental Protection Agency (EPA) under the Endocrine Disruptor Screening Program (EDSP) (Browne et al. 2017). In the conceptual framework for testing and assessment of endocrine disrupters issued by the Organization for Economic Co-Operation and Development, more than 46 Test Guidelines and standardized test methods were recommended (Organization for Economic Co-Operation and Development 2018). However, practice has shown that screening potential EDCs with in vivo-based toxicity testing is time-consuming and costly (~1 million USD per substance) (Mansouri et al. 2020; Marty 2014). Considering that more than 350,000 substances are or have been used globally (Tang et al. 2020; Wang et al. 2020), fast and cost-effective methods were and still are vital. As a result, computational models and molecular-based in vitro high-throughput screening assays were proposed to prioritize and screen potential EDCs (Browne et al. 2017). Information on the in vitro high-throughput screening assays for EDCs can be found in other reviews or reports (Browne et al. 2020; LeBaron et al. 2014; Murk et al. 2013; Organization for Economic Co-Operation and Development 2018; Rotroff et al. 2013; Zhu et al. 2014). This chapter focuses on the computational models of endocrine disrupting effects. Since the 1990s, numerous qualitative and quantitative models have been developed for endocrine system biomacromolecule endpoints (AbdulHameed et al. 2021; Banerjee et al. 2018; Browne et al. 2015; Chen et al. 2018; Dimitrov et al. 2016; Garcia de Lomana et al. 2021; Guo et al. 2017; Hong et al. 2015; Kolšek et al. 2014; Li et al. 2010; Sakkiah et al. 2021; Tan et al. 2021; Vedani et al. 2009; Wang et al. 2021b, d; Yang et al. 2013, 2019; Yin et al. 2017). However, the number of available predictive models for nonreceptor-mediated targets (e.g. hormone synthesis-related enzymes, hormone transport proteins, and hormone metabolism-related enzymes) is far smaller than that for nuclear receptors (Sakkiah et al. 2018). The same trend is seen in the available software for predicting the potential endocrine disrupting effects of a given compound (Table 10.1): 22 out of 28 endpoints predicted by those tools are nuclear receptors. Notably, tools that integrate in silico models for hormone transport proteins have been scarce up until now. These observations highlight that more models for nonreceptor-mediated targets should be derived and that additional high-throughput virtual screening tools incorporating such models should be developed. In this chapter, we introduce a high-throughput virtual screening tool named “ED Profiler,” which integrates (quantitative) structure–activity relationship ((Q)SAR) models for some nonreceptor-mediated targets (e.g. human and fish hormone transporters) and can be employed to predict the potential disrupting effects of EDCs on such targets. Generally speaking, we first proposed qualitative and/or quantitative models for hormone carriers (i.e. human transthyretin (hTTR), human and fish sex hormone-binding globulin (hSHBG and


Table 10.1 Publicly available software for predicting potential endocrine disrupting effects

| Name of tool | Predictive endpoint | Reference/Link |
|---|---|---|
| VEGA (V1.1.5) | Human estrogen receptor (hER), androgen receptor (AR), thyroid receptor alpha (hTRα), thyroid receptor beta (hTRβ), aromatase | https://www.vegahub.eu |
| ToxProfiler | Estrogen receptor alpha (ERα), estrogen receptor beta (ERβ), AR, bile acid receptor (BAR), pregnane X receptor (PXR), constitutive androstane receptor (CAR), glucocorticoid receptor (GR), progesterone receptor (PR), peroxisome proliferator-activated receptor alpha (PPARα), peroxisome proliferator-activated receptor beta (PPARβ), peroxisome proliferator-activated receptor gamma (PPARγ), retinoic acid receptor alpha (RXRα), vitamin D3 receptor (VD3R), aryl hydrocarbon receptor (AhR), aromatase | AbdulHameed et al. (2021) https://toxpro.bhsai.org |
| Endocrine disruptome | ERα, ERβ, AR, GR, liver X receptor alpha (LXRα), liver X receptor beta (LXRβ), mineralocorticoid receptor (MR), PPARα, PPARβ, PPARγ, PR, RXRα, TRα, TRβ | Kolšek et al. (2014) http://endocrinedisruptome.ki.si |
| Danish (Q)SAR models | hER, hERα, hAR, AR, thyroperoxidase (TPO), hPXR, hAhR, CAR | https://qsarmodels.food.dtu.dk |
| QSAR Toolbox (V4.4.1) | hERα, hAR, hPXR, hTRα, hTRβ | Dimitrov et al. (2016) https://qsartoolbox.org |
| VirtualToxLab™ (V5.8) | AR, AhR, ERα, ERβ, GR, LXR, MR, PPARγ, PR, TRα, TRβ | Vedani et al. (2009) www.virtualtoxlab.org |
| ProTox-II | AhR, AR, aromatase, ERα, PPARγ | Banerjee et al. (2018) https://tox-new.charite.de/protox_II |
| admetSAR (V2.0) | hERα, hAR, hTR | Yang et al. (2019) http://lmmd.ecust.edu.cn/admetsar2/ |

(continued)


X. Yang et al.

Table 10.1 (continued)

| Name of tool | Predictive endpoint | Reference/Link |
|---|---|---|
| ChemBioSim | Deiodinase 1 (DIO1), deiodinase 2 (DIO2), deiodinase 3 (DIO3), TPO, TR, human sodium iodide symporter (hNIS), thyrotropin releasing hormone receptor (TRHR), thyroid stimulating hormone receptor (TSHR) | Garcia de Lomana et al. (2021) https://zenodo.org/record/4761226#.YQTZ7eSP6Uk |
| SepPCNET | ER | Wang et al. (2021b) https://pubs.acs.org/doi/10.1021/acs.est.1c01228 |
| CPTP (Chemicals Predictive Toxicology Platform) | Human transthyretin (hTTR) | http://cptp.dlut.edu.cn/ |
| ED Profiler (V1.0) | hER, hPXR, hTTR, human sex hormone-binding globulin (hSHBG), hNIS, fish estrogen receptor (fish ER), fish sex hormone-binding globulin (fish SHBG) | http://jszy.njust.edu.cn/hjsw/yxh_en/list.psp |

fish SHBG)), human sodium iodide symporter (hNIS), and some nuclear receptors using typical machine learning methods. ED Profiler was then developed in Python on the basis of these qualitative and/or quantitative models.

10.2 Materials and Methods

10.2.1 Data Sets

The qualitative and quantitative data for hTTR (Yang et al. 2021), hSHBG (Liu et al. 2016), hPXR (Yin et al. 2017), fish ER (He et al. 2018; Wang et al. 2019b), and fish SHBG (Liu et al. 2017) were obtained from our previous studies. In addition, the qualitative data for hER (Roncaglioni et al. 2008) and hNIS (Buckalew et al. 2020; Hallinger et al. 2017; Wang et al. 2018, 2019a, 2021a) were collected from previous studies. The number of qualitative and quantitative data points for each endpoint is listed in Table 10.2. In the SAR modeling of all the biomacromolecules, active and inactive substances were labeled A and I, respectively. The endpoint used to derive the QSAR models for hTTR and hSHBG was the logarithm of the relative binding affinity (log RBA), defined as follows (Hong et al. 2015; Liu et al. 2016):


Table 10.2 Number of endocrine disrupting effects data for each endpoint and the data sources

| Endpoint | Data type | n (Total) | n (Active) | n (Inactive) | References |
|---|---|---|---|---|---|
| hTTR | Qualitative | 445 | 229 | 216 | Yang et al. (2021) |
| | Quantitative | 88 | 88 | – | |
| | Quantitative | 41 | 41 | – | |
| hSHBG | Qualitative | 125 | 87 | 38 | Liu et al. (2016) |
| | Quantitative | 87 | 87 | – | |
| hPXR | Qualitative | 2693 | 739 | 1954 | Yin et al. (2017) |
| hER | Qualitative | 806 | 288 | 518 | Roncaglioni et al. (2008) |
| hNIS | Qualitative | 1771 | 112 | 1659 | Buckalew et al. (2020), Hallinger et al. (2017), Wang et al. (2018, 2019a, 2021a) |
| fish ER | Qualitative | 62 | 39 | 23 | He et al. (2018), Wang et al. (2019a, b) |
| fish SHBG | Qualitative | 70 | 52 | 18 | Liu et al. (2017) |

$$\log RBA = \log \frac{IC_{50,\,\text{reference compound}}}{IC_{50,\,\text{ligand}}} \tag{10.1}$$

where $IC_{50,\,\text{reference compound}}$ and $IC_{50,\,\text{ligand}}$ are the half-maximal inhibitory concentrations of the reference compound (i.e. thyroxine for hTTR and testosterone for hSHBG) and of the model substance, respectively.
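As a small worked illustration of Eq. 10.1, the sketch below computes log RBA from two IC50 values. The chapter does not state the base of the logarithm, so base 10 is an assumption here, and the IC50 values and the function name `log_rba` are hypothetical.

```python
import math


def log_rba(ic50_reference: float, ic50_ligand: float) -> float:
    """Return log RBA = log10(IC50_reference / IC50_ligand), cf. Eq. 10.1.

    Assumption: base-10 logarithm; both IC50 values use the same unit.
    """
    if ic50_reference <= 0 or ic50_ligand <= 0:
        raise ValueError("IC50 values must be positive")
    return math.log10(ic50_reference / ic50_ligand)


# Hypothetical IC50 values (same concentration unit for both compounds).
stronger = log_rba(ic50_reference=100.0, ic50_ligand=10.0)
weaker = log_rba(ic50_reference=100.0, ic50_ligand=1000.0)
print(f"ligand with lower IC50 than the reference:  log RBA = {stronger:+.2f}")
print(f"ligand with higher IC50 than the reference: log RBA = {weaker:+.2f}")
```

A positive value means the ligand displaces the probe at a lower concentration than the reference compound; a negative value means it binds more weakly.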

10.2.2 Molecular Descriptor Calculation

The PaDEL descriptors were employed to derive the qualitative and quantitative models (Yap 2011). Because geometry optimization of a substance is time-consuming and complicates software development, only descriptors that are fast to calculate and do not require geometry optimization were considered here. Specifically, we used the 1D, 2D, and PubChem fingerprint descriptors for (Q)SAR modeling and model quality evaluation. All of those descriptors can be calculated directly from the Simplified Molecular-Input Line-Entry System (SMILES) code of a given compound.

10.2.3 (Q)SAR Modeling

For each endpoint, the corresponding data set was first divided at random into a training set (75%) and a validation set (25%). Then, binary classification models or QSAR models were constructed for all the endpoints except


for the human sodium iodide symporter (hNIS) using an in-house Euclidean distance-based k-nearest neighbor (kNN) program, which had been used to derive numerous qualitative and quantitative models in our previous studies (Ding et al. 2019; Lin et al. 2019; Liu et al. 2016, 2017; Xi et al. 2020; Yang et al. 2021; Yin et al. 2017). Because no acceptable SAR model for hNIS could be derived with kNN, other machine learning algorithms (e.g. logistic regression, support vector machine, decision tree, and random forest) were further used to derive its classification models. These machine learning algorithms have been widely used in constructing predictive models for various toxicity endpoints (Chierici et al. 2018; Idakwo et al. 2018; Tang et al. 2018; Wang et al. 2021c; Zhong et al. 2021). The scikit-learn library was employed to perform those machine learning analyses (Pedregosa et al. 2011).

In SAR evaluation, the predictive performance of the binary qualitative models was evaluated in terms of predictive accuracy ($Q$), specificity ($S_p$), sensitivity ($S_n$), Matthews correlation coefficient (MCC), and the area under the receiver operating characteristic (ROC) curve (AUC). The first four were defined as follows (Ding et al. 2019):

$$Q = \frac{TP + TN}{TP + TN + FN + FP} \tag{10.2}$$

$$S_p = \frac{TN}{TN + FP} \tag{10.3}$$

$$S_n = \frac{TP}{TP + FN} \tag{10.4}$$

$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{10.5}$$

where TP (true positive) and TN (true negative) are the numbers of chemicals correctly classified as toxic and non-toxic, respectively, and FN (false negative) and FP (false positive) are the numbers of chemicals incorrectly classified as non-toxic and toxic, respectively.

For QSAR, the internal quality (i.e. goodness-of-fit and robustness) and the external predictive quality of the proposed models were evaluated with the following parameters: $R^2_{\mathrm{Train}}$ (squared correlation coefficient) for goodness-of-fit; $Q^2_{\mathrm{LOO}}$ (leave-one-out cross-validation $Q^2$), $Q^2_{\mathrm{LMO}}$ (leave-many-out cross-validation $Q^2$), and $Q^2_{\mathrm{BOOT}}$ (bootstrapping coefficient) for robustness; and $Q^2_{\mathrm{EXT}}$ (externally explained variance), CCC (concordance correlation coefficient), $r^2_m$ (external validation metric), and $\Delta r^2_m$ (absolute difference of $r^2_m$) for external predictive ability (Lin et al. 2019; Yang et al. 2021). In addition, RMSE (root mean square error), $s$ (standard error), and MAE (mean absolute error) for both the training and validation sets were used to characterize the predictive ability of the optimum models (Lin et al. 2019; Yang et al. 2021). The best SAR and QSAR models were selected from the population of derived qualitative and quantitative models by maximizing the values of statistical parameters such as predictive accuracy ($Q$),


specificity ($S_p$), and sensitivity ($S_n$) for SAR models, and goodness-of-fit, robustness, and external predictive ability for QSAR models.
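The classification statistics in Eqs. 10.2 to 10.5 can be computed directly from the four confusion-matrix counts. The following is a minimal sketch; the function name and the example counts are hypothetical and are not taken from any model in this chapter.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute Q, Sp, Sn and MCC (Eqs. 10.2-10.5) from confusion-matrix counts."""
    q = (tp + tn) / (tp + tn + fn + fp)   # predictive accuracy, Eq. 10.2
    sp = tn / (tn + fp)                   # specificity, Eq. 10.3
    sn = tp / (tp + fn)                   # sensitivity, Eq. 10.4
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Q": q, "Sp": sp, "Sn": sn, "MCC": mcc}


# Hypothetical counts: 50 active and 50 inactive chemicals.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The sketch assumes that both classes are present in the evaluated set; with no active or no inactive chemicals, $S_n$ or $S_p$ is undefined and the division would fail.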

10.2.4 Applicability Domain and Reliability Evaluation

The applicability domain (AD) of the developed SAR and QSAR models was defined by a Euclidean distance-based method, as follows:

$$D_{E\text{-}AD} = \sqrt{\sum_{i=1}^{n} \left(x_i - x_{i\text{-aver}}\right)^2} \tag{10.6}$$

where $D_{E\text{-}AD}$ is the Euclidean distance used to evaluate the AD, and $x_i$ and $x_{i\text{-aver}}$ are the i-th descriptor of a compound and the average value of the i-th descriptor, respectively. The reliability of the predicted value for a given substance was assessed by the Tanimoto similarity index ($T_S$), defined as follows (Yang et al. 2021):

$$T_S = \frac{\sum_{i=1}^{n} A_{X\text{-}i} \cap B_{X\text{-}i}}{\sum_{i=1}^{n} A_{X\text{-}i} + \sum_{i=1}^{n} B_{X\text{-}i} - \sum_{i=1}^{n} A_{X\text{-}i} \cap B_{X\text{-}i}} \tag{10.7}$$

where $A_{X\text{-}i}$ and $B_{X\text{-}i}$ are the i-th descriptors of substances A and B, respectively. We calculated the $T_S$ value for a given compound from the PubChem fingerprint descriptors of the compounds in the training set and of the target substance. If the $T_S$ value of a substance is ≥ 0.900, its prediction is labeled as high reliability; if 0.750 ≤ $T_S$ < 0.900, as moderate reliability; otherwise, as low reliability.
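The two checks defined by Eqs. 10.6 and 10.7 can be sketched as follows. Several details are assumptions rather than statements from the chapter: the descriptor average is taken over the training set, a compound is in domain when its distance does not exceed the largest training-set distance, and the reported $T_S$ is the maximum similarity to any training compound. All names and numbers except the 0.900 and 0.750 cut-offs are hypothetical.

```python
import math
from typing import Sequence


def distance_to_average(x: Sequence[float], average: Sequence[float]) -> float:
    """Euclidean distance between a descriptor vector and the average vector (Eq. 10.6)."""
    return math.sqrt(sum((xi - ai) ** 2 for xi, ai in zip(x, average)))


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto similarity of two binary fingerprints (Eq. 10.7)."""
    common = sum(1 for ai, bi in zip(a, b) if ai and bi)
    denom = sum(a) + sum(b) - common
    return common / denom if denom else 0.0


def reliability_label(ts: float) -> str:
    """Map a Tanimoto similarity to the reliability classes used in the text."""
    if ts >= 0.900:
        return "high"
    if ts >= 0.750:
        return "moderate"
    return "low"


# Hypothetical descriptor values and a hypothetical maximum training-set distance.
average = [0.5, 0.2, 0.1]
max_training_distance = 0.9
query = [0.8, 0.6, 0.1]
d = distance_to_average(query, average)
print(f"distance = {d:.3f}, in domain: {d <= max_training_distance}")

# Hypothetical 8-bit fingerprints (real PubChem fingerprints are much longer).
target_fp = [1, 1, 0, 1, 0, 0, 1, 1]
training_fps = [[1, 1, 0, 1, 0, 0, 1, 0], [0, 0, 1, 0, 1, 1, 0, 0]]
ts = max(tanimoto(target_fp, fp) for fp in training_fps)
print(f"T_S = {ts:.3f}, reliability: {reliability_label(ts)}")
```

Whether descriptors are scaled before the distance is computed is not stated in the text, so the sketch uses the values as given.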

10.2.5 Software Development

The conceptual framework of ED Profiler is illustrated in Fig. 10.1. There are two core layers: the user interface section and the software background section. In the user interface section, users can input target compounds, select models, view basic information, and submit the task. In the software background section, the software automatically checks whether the input for a given compound is valid. When a valid target compound is input and a model is selected, the software automatically calculates the required descriptors, fills the data gap, performs the applicability domain (AD) and reliability evaluation, and produces the final report. ED Profiler was developed in Python 3.9.4 (https://www.python.org/). The graphical user interface was constructed using the Python tkinter module. The 1D, 2D, and PubChem fingerprint descriptors required by the developed models for a given


Fig. 10.1 Conceptual framework of ED profiler

compound were derived by the Python padelpy module (https://github.com/ECRL/PaDELPy), a publicly available Python library for the PaDEL descriptor program (Yap 2011). Other Python modules such as os, re, csv, sys, pandas, sklearn, base64, and unicodecsv were also employed. The final ED Profiler package was bundled with PyInstaller 4.3 (http://www.pyinstaller.org).

10.3 Development of Predictive Models

10.3.1 Proposed Predictive Model System

The SAR models for all endpoints were developed to determine whether a target substance within the AD is an active disruptor of a given endpoint. For compounds classified as active hTTR or hSHBG disruptors, the quantitative disrupting data are further predicted by QSAR models within the AD. The reliability of both the qualitative and quantitative results derived from the SAR and QSAR models is assessed by the Tanimoto similarity index (Eq. 10.7). Figure 10.2 depicts the predictive system based on the SAR and QSAR models.
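The decision flow described above can be sketched as a small function that chains the AD check, the SAR model, and the optional QSAR model. This is an illustrative outline under stated assumptions, not the ED Profiler source code: every name is hypothetical, an out-of-domain compound is assumed to receive no prediction, and a single reliability label is attached for brevity although the tool reports one per model.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Descriptors = Sequence[float]


@dataclass
class Prediction:
    in_domain: bool
    activity: Optional[str] = None      # "A" (active) or "I" (inactive)
    log_rba: Optional[float] = None     # filled only for active compounds
    reliability: Optional[str] = None   # "high", "moderate" or "low"


def predict_endpoint(
    x: Descriptors,
    sar_in_domain: Callable[[Descriptors], bool],
    sar_model: Callable[[Descriptors], str],
    reliability: Callable[[Descriptors], str],
    qsar_in_domain: Optional[Callable[[Descriptors], bool]] = None,
    qsar_model: Optional[Callable[[Descriptors], float]] = None,
) -> Prediction:
    """Run SAR within its AD, then QSAR only for actives within the QSAR AD."""
    if not sar_in_domain(x):
        return Prediction(in_domain=False)
    result = Prediction(in_domain=True, activity=sar_model(x), reliability=reliability(x))
    has_qsar = qsar_model is not None and qsar_in_domain is not None
    if result.activity == "A" and has_qsar and qsar_in_domain(x):
        result.log_rba = qsar_model(x)
    return result


# Toy stand-ins for trained models (purely illustrative).
toy = predict_endpoint(
    [0.2, 0.4],
    sar_in_domain=lambda x: True,
    sar_model=lambda x: "A" if sum(x) > 0.5 else "I",
    reliability=lambda x: "moderate",
    qsar_in_domain=lambda x: True,
    qsar_model=lambda x: 0.5,
)
print(toy)
```

Endpoints without a QSAR model (all except hTTR and hSHBG in this chapter) would simply be called without the two `qsar_*` arguments.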


Fig. 10.2 Schematic diagram of the predicting system based on (quantitative) structure–activity relationship ((Q)SAR) model. AD: applicability domain; RA: reliability assessment

10.3.2 Development of SAR Models

The optimal SAR model for each endpoint is listed in Table 10.3. All SAR models in Table 10.3 were developed by the kNN method and contained three predictive variables, except for hNIS. No acceptable SAR model for hNIS could be derived with kNN, possibly because the hNIS data were imbalanced (a large excess of inactive substances) and the kNN method did not handle imbalanced data well. Thus, other machine learning algorithms (e.g. logistic regression, support vector machine, decision tree, and random forest) that could overcome this problem were also applied, and the predictive ability of the resulting SAR models was compared. The SAR model derived by the decision tree method showed better classification performance than those derived by the other algorithms. Thus, the final optimal SAR model for hNIS was developed by the decision tree method and had ten predictive variables.

The values of the statistical parameters $S_n$, $S_p$, and $Q$ for all SAR models were > 0.700, and the MCC values were > 0.470. The AUC values for the training and validation sets, obtained from the ROC graphs of the models, were > 0.800 for all SAR models. The ROC graph of the hTTR SAR model is illustrated in Fig. 10.3 as an example. Taken as a whole, these statistical parameters indicate that the final optimum SAR models had good classification performance.

The maximum Euclidean distance used to evaluate the AD, calculated according to Eq. 10.6, is also presented in Table 10.3. For example, the maximum Euclidean distance obtained from the training set of the hTTR SAR model was 0.910. A new compound with a Euclidean distance ≤ 0.910 was considered to be within the applicability domain of the hTTR SAR model; otherwise, it was considered to be outside the applicability domain of that model.
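To illustrate why class imbalance matters for hNIS, the sketch below evaluates a trivial classifier that labels every chemical as inactive, using the hNIS counts from Table 10.2 (112 active, 1659 inactive). The trivial classifier is a hypothetical baseline introduced only for this illustration; it is not one of the models in this chapter.

```python
import math

# hNIS class counts from Table 10.2.
n_active, n_inactive = 112, 1659

# Hypothetical baseline: predict "inactive" for every chemical.
tp, fn = 0, n_active       # every active chemical is missed
tn, fp = n_inactive, 0     # every inactive chemical is correct

q = (tp + tn) / (tp + tn + fn + fp)
sn = tp / (tp + fn)
sp = tn / (tn + fp)
denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # reported as 0 when undefined

print(f"Q = {q:.3f}, Sn = {sn:.3f}, Sp = {sp:.3f}, MCC = {mcc:.3f}")
```

The accuracy of this baseline is close to 0.94 even though it detects no active chemical, which is why $S_n$ and MCC are reported alongside $Q$ for such data.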

Table 10.3 Statistical parameters of the optimum SAR model for each endpoint

| SAR model (descriptors) | Data set^a | n | Sn | Sp | Q | MCC | AUC | ED |
|---|---|---|---|---|---|---|---|---|
| hTTR SAR Model (MATS2e, FMF, maxHBa)^b | T | 333 | 0.867 | 0.868 | 0.868 | 0.736 | 0.937 | 0.91 |
| | V | 112 | 0.841 | 0.898 | 0.866 | 0.734 | 0.887 | |
| hSHBG SAR Model (AATS5v, SpMax_Dzi, AATS2i, piPC9) | T | 93 | 0.955 | 0.926 | 0.946 | 0.871 | 0.977 | 0.971 |
| | V | 32 | 0.952 | 1 | 0.969 | 0.934 | 0.981 | |
| hPXR SAR Model (VAdjMat, WTPT-4, FMF) | T | 2019 | 0.806 | 0.924 | 0.892 | 0.729 | 0.952 | 1.02 |
| | V | 674 | 0.718 | 0.827 | 0.797 | 0.522 | 0.807 | |
| hER SAR Model (GGI7, maxHBd, nHBAcc) | T | 604 | 0.82 | 0.906 | 0.874 | 0.728 | 0.937 | 0.916 |
| | V | 202 | 0.773 | 0.912 | 0.866 | 0.693 | 0.867 | |
| hNIS SAR Model (TopoPSA, PubchemFP182, PubchemFP697, nHBAcc_Lipinski, PubchemFP581, PubchemFP374, nFRing, PubchemFP38, PubchemFP713, PubchemFP95) | T | 1328 | 1 | 0.942 | 0.946 | 0.729 | 0.993 | 1.89 |
| | V | 443 | 0.895 | 0.903 | 0.903 | 0.479 | 0.896 | |
| Fish ER SAR Model (nHBAcc, GATS2m, nAtomP) | T | 46 | 0.962 | 0.95 | 0.957 | 0.912 | 0.985 | 0.744 |
| | V | 16 | 1 | 1 | 1 | 1 | 1 | |
| Fish SHBG SAR Model (nT7Ring, BIC5, hmax) | T | 52 | 0.972 | 1 | 0.981 | 0.957 | 0.994 | 1.09 |
| | V | 18 | 1 | 1 | 1 | 1 | 1 | |

a T and V are Training set and Validation set, respectively

b MATS2e is Moran autocorrelation - lag 2 / weighted by Sanderson electronegativities, FMF is complexity of a molecule, maxHBa is maximum E-States for (strong) hydrogen bond acceptors, AATS5v is average Broto-Moreau autocorrelation - lag 5 / weighted by van der Waals volumes, SpMax_Dzi is leading eigenvalue from Barysz matrix / weighted by first ionization potential, AATS2i is average Broto-Moreau autocorrelation - lag 2 / weighted by first ionization potential, piPC9 is conventional bond order ID number of order 9 (ln(1 + x)), VAdjMat is vertex adjacency information (magnitude), WTPT-4 is sum of path lengths starting from oxygens, GGI7 is topological charge index of order 7, maxHBd is maximum E-States for (strong) hydrogen bond donors, nHBAcc is the number of hydrogen bond acceptors (using CDK HBond Acceptor Count Descriptor algorithm), SRW4 is self-returning walk count of order 4 (ln(1 + x)), MDEO-11 is molecular distance edge between all primary oxygens, MWC5 is molecular walk count of order 5 (ln(1 + x)), piPC4 is conventional bond order ID number of order 4 (ln(1 + x)), TopoPSA is topological polar surface area, PubchemFP182 is PubChem fingerprint (unsaturated non-aromatic carbon-only ring size 6), PubchemFP697 is PubChem fingerprint (C-C-C-C-C-C(C)-C), nHBAcc_Lipinski is number of hydrogen bond acceptors (using Lipinski’s definition: any nitrogen; any oxygen), PubchemFP581 is PubChem fingerprint (O=C-C-C-O), PubchemFP374 is PubChem fingerprint (C(~H)(~H)(~H)), nFRing is number of fused rings, PubchemFP38 is PubChem fingerprint (≥ 2 Cl), PubchemFP713 is PubChem fingerprint (Cc1ccc(C)cc1), PubchemFP95 is PubChem fingerprint (≥ 1 Hg), GATS2m is Geary autocorrelation - lag 2 / weighted by mass, nAtomP is number of atoms in the largest pi system, nT7Ring is number of 7-membered rings (includes counts from fused rings), BIC5 is bond information content index (neighborhood symmetry of 5-order), hmax is maximum H E-State


Fig. 10.3 ROC graph of the hTTR classification model

10.3.3 Development of QSAR Models

For hTTR and hSHBG, three kNN-QSAR models were constructed. Two kNN-QSAR models were derived from the hTTR disrupting data obtained by the radiolabeled ligand displacement method (referred to as kNN-QSAR (Radio)) and by the ANSA-based competitive fluorescence displacement assay (referred to as kNN-QSAR (ANSA)), respectively. The optimum hTTR kNN-QSAR (Radio), hTTR kNN-QSAR (ANSA), and hSHBG kNN-QSAR models were constructed with the following predictive variables (m):

hTTR kNN-QSAR (Radio), two variables: GATS4v (Geary autocorrelation - lag 4 / weighted by van der Waals volumes) and minwHBa (minimum E-States for weak hydrogen bond acceptors).

hTTR kNN-QSAR (ANSA), two variables: ATSC3c (centered Broto-Moreau autocorrelation - lag 3 / weighted by charges) and GATS1s (Geary autocorrelation - lag 1 / weighted by I-state).

hSHBG kNN-QSAR, four variables: ASP-2 (average simple path, order 2), SsssCH (sum of atom-type E-State: >CH-), SpMax1_Bhi (largest absolute eigenvalue of Burden modified matrix - n 1 / weighted by relative), and SHtCH (sum of atom-type H E-State: #CH).

The value of k for all three optimum QSAR models was 3. The calculated statistical parameters are listed in Table 10.4. As shown, all statistical parameters met the acceptable threshold values (i.e. $Q^2_{\mathrm{LOO}}$, $Q^2_{\mathrm{LMO}}$, and $Q^2_{\mathrm{BOOT}}$ > 0.600, $R^2_{\mathrm{Train}}$ > 0.700, $Q^2_{\mathrm{EXT}}$ > 0.700, CCC > 0.850, $r^2_m$ > 0.500, and $\Delta r^2_m$ < 0.200) (Chirico and Gramatica 2012), indicating that those kNN-QSAR models had satisfactory goodness-of-fit, robustness, and external predictive performance. Figure 10.4 illustrates the relationship between observed and predicted values for the three kNN-QSAR models. The calculated maximum Euclidean distance used to evaluate the AD for the three kNN-QSAR models is presented in Table 10.4.
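For readers unfamiliar with kNN regression, the sketch below shows one common form of it with k = 3 and Euclidean distance: the prediction is the unweighted mean of the responses of the three nearest training compounds. The in-house program used in this chapter is not described in detail, so the averaging rule is an assumption, and the training data and names are hypothetical.

```python
import math
from typing import Sequence


def knn_predict(
    x: Sequence[float],
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[float],
    k: int = 3,
) -> float:
    """Predict a response as the mean of the k nearest neighbours (Euclidean distance)."""
    if k < 1 or k > len(train_x):
        raise ValueError("k must be between 1 and the number of training compounds")
    ranked = sorted(zip((math.dist(x, xi) for xi in train_x), train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical two-descriptor training set with hypothetical log RBA responses.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [0.2, 0.4, 0.6, 2.0]

prediction = knn_predict((0.1, 0.1), train_x, train_y, k=3)
print(f"predicted log RBA = {prediction:.3f}")
```

Because the prediction is an average of observed training values, a kNN-QSAR model cannot extrapolate beyond the range of its training responses, which is one reason the applicability domain check matters.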


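Table 10.4 Statistical parameters of the kNN-QSAR models

| Set | Parameter | Acceptable threshold value | hTTR kNN-QSAR (Radio) | hTTR kNN-QSAR (ANSA) | hSHBG kNN-QSAR |
|---|---|---|---|---|---|
| | m | – | 2 | 2 | 4 |
| | k | – | 3 | 3 | 3 |
| Training set | n_Train | – | 66 | 30 | 65 |
| | R²_Train | > 0.700 | 0.927 | 0.910 | 0.887 |
| | Q²_LOO | > 0.600 | 0.839 | 0.852 | 0.745 |
| | Q²_LMO | > 0.600 | 0.801 | 0.810 | 0.724 |
| | Q²_BOOT | > 0.600 | 0.810 | 0.760 | 0.695 |
| | RMSE_Train | – | 0.354 | 0.336 | 0.447 |
| | s_Train | – | 0.362 | 0.354 | 0.457 |
| | MAE_Train | – | 0.254 | 0.264 | 0.361 |
| Validation set | n_EXT | – | 22 | 11 | 22 |
| | Q²_EXT | > 0.700 | 0.833 | 0.870 | 0.825 |
| | CCC | > 0.850 | 0.910 | 0.930 | 0.905 |
| | r²_m | > 0.500 | 0.791 | 0.816 | 0.710 |
| | Δr²_m | < 0.200 | 0.0833 | 0.110 | 0.107 |
| | RMSE_EXT | – | 0.560 | 0.390 | 0.631 |
| | s_EXT | – | 0.603 | 0.457 | 0.679 |
| | MAE_EXT | – | 0.396 | 0.321 | 0.525 |
| | ED | – | 0.766 | 0.648 | 1.19 |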

10.4 Development of Software

10.4.1 Features and Overview of the Software

Figure 10.5 illustrates the main interface of ED Profiler, which comprises the target molecule input, model selection, key information display, and submission sections. ED Profiler is a simple, user-friendly tool that can currently be employed to predict the disrupting effects of EDCs on seven biomacromolecules. ED Profiler supports SMILES codes and CAS numbers as input formats in both single- and multi-molecule modes. We recommend that users input SMILES codes directly, for a single substance or for multiple compounds, because the required descriptors can be derived from a valid SMILES code. For a CAS number, the program searches its own chemical information database to ascertain whether there is a corresponding SMILES code for the compound. If the chemical information database contains the target compound, the corresponding SMILES code is extracted automatically and employed


Fig. 10.4 Plots of the observed vs predicted logRP values for hTTR kNN-QSAR (Radio) model (a), hTTR kNN-QSAR (ANSA) model (b) and hSHBG kNN-QSAR model (c)

to calculate the relevant descriptors; otherwise, the target cannot be predicted by ED Profiler. In the model selection section, the user can select one or more models. After entering the target molecule and selecting the appropriate models, the user clicks the submission button. The tool then fills the data gap and evaluates the AD and the reliability of the prediction results based on the calculated descriptors. Finally, a report is generated that contains critical information, such as basic information on the input target compounds, the observed (if available) and predicted qualitative and/or quantitative (if available) values for the given endpoint, and the AD and reliability evaluation results for the corresponding predicted values.
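The input handling described above can be sketched as follows: an input that looks like a CAS number is validated with the standard CAS check-digit rule and looked up in a local table, and anything else is passed through as a SMILES string. This is an assumption-laden outline, not the ED Profiler code: the lookup table, its single ethanol entry, and the function names are hypothetical, and SMILES strings are not validated here because that would require a cheminformatics toolkit.

```python
import re
from typing import Dict, Optional

CAS_PATTERN = re.compile(r"^(\d{2,8})-(\d{2})-(\d)$")


def is_valid_cas(text: str) -> bool:
    """Check the CAS number format and its check digit."""
    match = CAS_PATTERN.match(text.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weighted sum of digits, counted from the right, modulo 10.
    checksum = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return checksum % 10 == int(match.group(3))


def resolve_to_smiles(text: str, database: Dict[str, str]) -> Optional[str]:
    """Return a SMILES string for the input, or None if a CAS number is unknown."""
    text = text.strip()
    if CAS_PATTERN.match(text):
        if not is_valid_cas(text):
            return None
        return database.get(text.lstrip("0"))  # ignore leading zeros in the key
    return text  # assumed to be a SMILES code; not validated in this sketch


# Hypothetical one-entry lookup table (ethanol).
toy_database = {"64-17-5": "CCO"}

for query in ["64-17-5", "039635-79-5", "64-17-6", "c1ccccc1O"]:
    print(query, "->", resolve_to_smiles(query, toy_database))
```

In this toy run the second query is a well-formed CAS number that is absent from the table, so it resolves to `None`, mirroring the case in which the tool cannot make a prediction from a CAS number alone.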

10.4.2 Examples

Here, substances not included in our software were employed as examples. Ma et al. (2018) probed the interactions of three compounds (i.e. 4,4'-diHO-3,3',5,5'-tetrabromo-diphenylsulfon, 4-HO-2,2',3,4',5,5',6-heptabrominated diphenyl ether, and 6-HO-2,2',3,4,4',5,5'-heptabromodiphenyl ether) with hTTR by native electrospray ionization mass spectrometry. Their results indicated


Fig. 10.5 Main interface of ED profiler

that all three compounds were hTTR binders. Among those compounds, only 4-HO-2,2',3,4',5,5',6-heptabrominated diphenyl ether had been compiled into ED Profiler. Thus, the potential hTTR binding affinity of 4,4'-diHO-3,3',5,5'-tetrabromo-diphenylsulfon (CAS number: 039635-79-5; SMILES code: O=S(=O)(c(cc(c(O)c1Br)Br)c1)c(cc(c(O)c2Br)Br)c2) and 6-HO-2,2',3,4,4',5,5'-heptabromodiphenyl ether (CAS number: 1350848-48-4; SMILES code: c1(Br)c(Br)c(Br)c(Br)c(O)c1Oc1c(Br)cc(Br)c(Br)c1) was predicted by ED Profiler using the corresponding SMILES codes (Table 10.5). Our previous study highlighted that it is critically important to evaluate predicted data in terms of both applicability domain and reliability (Yang et al. 2021). Only results marked as within the applicability domain and of high reliability can be considered high-quality predicted values. As shown in Table 10.5, the two compounds were within the applicability domain of all the (Q)SAR models for which a prediction was made. However, the reliability assessment implied that only the qualitative predicted value of the hTTR SAR model and the quantitative predicted value of the hTTR QSAR (Radio) model for 6-HO-2,2',3,4,4',5,5'-heptabromodiphenyl ether were of high reliability.


Table 10.5 Predicted binding affinity of selected model compounds with hTTR and their evaluation results

| Compound | Model | Predicted value | AD^a | RA^b |
|---|---|---|---|---|
| 4,4'-di-HO-3,3',5,5'-tetrabromo-diphenylsulfon | hTTR SAR | Inactive | In domain | M |
| | hTTR QSAR (Radio) | – | – | – |
| | hTTR QSAR (ANSA) | – | – | – |
| | Experimental value | Active (Ma et al. 2018; Huang et al. 2020); Inactive (Šauer et al. 2021) | | |
| 6-HO-2,2',3,4,4',5,5'-heptabromodiphenyl ether | hTTR SAR | Active | In domain | H |
| | hTTR QSAR (Radio) | 0.713^c | In domain | H |
| | hTTR QSAR (ANSA) | −0.780^c | In domain | L |
| | Experimental value | Active (Ma et al. 2018) | | |

a AD: applicability domain

b RA: reliability assessment; H: high reliability; M: moderate reliability; L: low reliability

c logRP value

10.5 Conclusions

In this chapter, a simple and user-friendly high-throughput virtual screening tool referred to as ED Profiler was detailed, which is characterized by predicting the potential disrupting effects of EDCs on nonreceptor-mediated targets, especially hormone transport proteins. We will continue to improve ED Profiler in the following aspects. First, more models related to nonreceptor-mediated targets will be developed gradually. Second, additional predictive qualitative and/or quantitative models will be constructed by employing different machine learning methods for each biomacromolecule. Third, ED Profiler will be updated with newly constructed models and other functions.

Disclaimer: This chapter reflects the views of the authors and does not necessarily represent the official views of the U.S. Food and Drug Administration.

Acknowledgements The study was supported by the National Natural Science Foundation of China (No. 22176097), the China Postdoctoral Science Foundation (2020T130301, 2020M671502), and the Jiangsu Planned Projects for Postdoctoral Research Funds (2020Z288).

References

AbdulHameed MDM, Liu RF, Schyman P, Sachs D, Xu Z, Desai V, Wallqvist A (2021) ToxProfiler: Toxicity-target profiler based on chemical similarity. Comput Toxicol 18:100162

Alsen M, Sinclair C, Cooke P, Ziadkhanpour K, Genden E, van Gerwen M (2021) Endocrine disrupting chemicals and thyroid cancer: an overview. Toxics 9(1):14


Banerjee P, Eckert AO, Schrey AK, Preissner R (2018) ProTox-II: a webserver for the prediction of toxicity of chemicals. Nucleic Acids Res 46(W1):W257–W263

Browne P, Judson RS, Casey WM, Kleinstreuer NC, Thomas RS (2015) Screening chemicals for estrogen receptor bioactivity using a computational model. Environ Sci Technol 49(14):8804–8814

Browne P, Noyes PD, Casey WM, Dix DJ (2017) Application of adverse outcome pathways to U.S. EPA’s endocrine disruptor screening program. Environ Health Perspect 125(9):096001

Browne P, Van Der Wal L, Gourmelon A (2020) OECD approaches and considerations for regulatory evaluation of endocrine disruptors. Mol Cell Endocrinol 504:110675

Bokobza E, Hinault C, Tiroille V, Clavel S, Bost F, Chevalier N (2021) The adipose tissue at the crosstalk between EDCs and cancer development. Front Endocrinol (Lausanne) 12:691658

Buckalew AR, Wang J, Murr AS, Deisenroth C, Stewart WM, Stoker TE, Laws SC (2020) Evaluation of potential sodium-iodide symporter (NIS) inhibitors using a secondary Fischer rat thyroid follicular cell (FRTL-5) radioactive iodide uptake (RAIU) assay. Arch Toxicol 94(3):873–885

Chen Q, Tan H, Yu H, Shi W (2018) Activation of steroid hormone receptors: shed light on the in silico evaluation of endocrine disrupting chemicals. Sci Total Environ 631–632:27–39

Chierici M, Giulini M, Bussola N, Jurman G, Furlanello C (2018) Machine learning models for predicting endocrine disruption potential of environmental chemicals. J Environ Sci Health C Environ Carcinog Ecotoxicol Rev 36(4):237–251

Chirico N, Gramatica P (2012) Real external predictivity of QSAR models. Part 2. New intercomparable thresholds for different validation criteria and the need for scatter plot inspection. J Chem Inf Model 52(8):2044–2058

Colborn T, vom Saal FS, Soto AM (1993) Developmental effects of endocrine-disrupting chemicals in wildlife and humans. Environ Health Perspect 101(5):378–384

Diamanti-Kandarakis E, Bourguignon JP, Giudice LC, Hauser R, Prins GS, Soto AM, Zoeller RT, Gore AC (2009) Endocrine-disrupting chemicals: an endocrine society scientific statement. Endocr Rev 30(4):293–342

Ding F, Wang Z, Yang XH, Shi LL, Liu JN, Chen GS (2019) Development of classification models for predicting chronic toxicity of chemicals to Daphnia magna and Pseudokirchneriella subcapitata. SAR QSAR Environ Res 30(1):39–50

Dimitrov SD, Diderich R, Sobanski T, Pavlov TS, Chankov GV, Chapkanov AS, Karakolev YH, Temelkov SG, Vasilev RA, Gerova KD, Kuseva CD, Todorova ND, Mehmed AM, Rasenberg M, Mekenyan OG (2016) QSAR Toolbox – workflow and major functionalities. SAR QSAR Environ Res 27(3):203–219

Hallinger DR, Murr AS, Buckalew AR, Simmons SO, Stoker TE, Laws SC (2017) Development of a screening approach to detect thyroid disrupting chemicals that inhibit the human sodium iodide symporter (NIS). Toxicol in Vitro 40:66–78

Harris CA, Hamilton PB, Runnalls TJ, Vinciotti V, Henshaw A, Hodgson D, Coe TS, Jobling S, Tyler CR, Sumpter JP (2011) The consequences of feminization in breeding groups of wild fish. Environ Health Perspect 119:306–311

He JY, Peng T, Yang XH, Liu HH (2018) Development of QSAR models for predicting the binding affinity of endocrine disrupting chemicals to eight fish estrogen receptor. Ecotoxicol Environ Saf 148:211–219

Hong H, Branham WS, Ng HW, Moland CL, Dial SL, Fang H, Perkins R, Sheehan D, Tong W (2015) Human sex hormone-binding globulin binding affinities of 125 structurally diverse chemicals and comparison with their binding to androgen receptor, estrogen receptor, and α-fetoprotein. Toxicol Sci 143(2):333–348

Huang K, Wang X, Zhang H, Zeng L, Zhang X, Wang B, Zhou Y, Jing T (2020) Structure-directed screening and analysis of thyroid-disrupting chemicals targeting transthyretin based on molecular recognition and chromatographic separation. Environ Sci Technol 54(9):5437–5445

Gimeno S, Gerritsen A, Bowmer T, Komen H (1996) Feminization of male carp. Nature 384:221–222


Gore AC, Chappell VA, Fenton SE, Flaws JA, Nadal A, Prins GS, Toppari J, Zoeller RT (2015) EDC-2: the endocrine society’s second scientific statement on endocrine-disrupting chemicals. Endocr Rev 36(6):E1–E150

Garcia de Lomana M, Morger A, Norinder U, Buesen R, Landsiedel R, Volkamer A, Kirchmair J, Mathea M (2021) ChemBioSim: enhancing conformal prediction of in vivo toxicity by use of predicted bioactivities. J Chem Inf Model 61(7):3255–3272

Guo J, Shi W, Chen Q, Deng D, Zhang X, Wei S, Yu N, Giesy JP, Yu H (2017) Extended virtual screening strategies to link antiandrogenic activities and detected organic contaminants in soils. Environ Sci Technol 51(21):12528–12536

Idakwo G, Luttrell J, Chen M, Hong H, Zhou Z, Gong P, Zhang C (2018) A review on machine learning methods for in silico toxicity prediction. J Environ Sci Health C Environ Carcinog Ecotoxicol Rev 36(4):169–191

Kassotis CD, Vandenberg LN, Demeneix BA, Porta M, Slama R, Trasande L (2020) Endocrine-disrupting chemicals: economic, regulatory, and policy implications. Lancet Diabetes Endocrinol 8(8):719–730

Kim S, Choi K, Ji K, Seo J, Kho Y, Park J, Kim S, Park S, Hwang I, Jeon J, Yang H, Giesy JP (2011) Trans-placental transfer of thirteen perfluorinated compounds and relations with fetal thyroid hormones. Environ Sci Technol 45(17):7465–7472

Kolšek K, Mavri J, Sollner Dolenc M, Gobec S, Turk S (2014) Endocrine disruptome–an open source prediction tool for assessing endocrine disruption potential through nuclear receptor binding. J Chem Inf Model 54(4):1254–1267

Kwiatkowski CF, Bolden AL, Liroff RA, Rochester JR, Vandenbergh JG (2016) Twenty-Five Years of endocrine disruption science: remembering Theo Colborn. Environ Health Perspect 124(9):A151–A154

La Merrill MA, Vandenberg LN, Smith MT, Goodson W, Browne P, Patisaul HB, Guyton KZ, Kortenkamp A, Cogliano VJ, Woodruff TJ, Rieswijk L, Sone H, Korach KS, Gore AC, Zeise L, Zoeller RT (2020) Consensus on the key characteristics of endocrine-disrupting chemicals as a basis for hazard identification. Nat Rev Endocrinol 16(1):45–57

LeBaron MJ, Coady KK, O’Connor JC, Nabb DL, Markell LK, Snajdr S, Sue Marty M (2014) Key learnings from performance of the U.S. EPA endocrine disruptor screening program (EDSP) Tier 1 in vitro assays. Birth Defects Res B Dev Reprod Toxicol 101(1):23–42

Li F, Xie Q, Li X, Li N, Chi P, Chen J, Wang Z, Hao C (2010) Hormone activity of hydroxylated polybrominated diphenyl ethers on human thyroid receptor-beta: in vitro and in silico investigations. Environ Health Perspect 118(5):602–606

Lin SY, Yang XH, Liu HH (2019) Development of liposome/water partition coefficients predictive models for neutral and ionogenic organic chemicals. Ecotox Environ Safe 179:40–49

Liu HH, Yang XH, Lu R (2016) Development of classification model and QSAR model for predicting binding affinity of endocrine disrupting chemicals to human sex hormone-binding globulin. Chemosphere 156:1–7

Liu HH, Yang XH, Yin C, Wei MB, He X (2017) Development of predictive models for predicting binding affinity of endocrine disrupting chemicals to fish sex hormone-binding globulin. Ecotoxicol Environ Saf 136:46–54

Lu L, Zhan T, Ma M, Xu C, Wang J, Zhang C, Liu W, Zhuang S (2018) Thyroid disruption by bisphenol S analogues via thyroid hormone receptor β: in vitro, in vivo, and molecular dynamics simulation study. Environ Sci Technol 52(11):6617–6625

Lu L, Wu H, Cui S, Zhan T, Zhang C, Lu S, Liu W, Zhuang S (2020) Pentabromoethylbenzene exposure induces transcriptome aberration and thyroid dysfunction: in vitro, in silico, and in vivo investigations. Environ Sci Technol 54(19):12335–12344

Mansouri K, Kleinstreuer N, Abdelaziz AM, Alberga D, Alves VM, Andersson PL, Andrade CH, Bai F, Balabin I, Ballabio D, Benfenati E, Bhhatarai B, Boyer S, Chen J, Consonni V, Farag S, Fourches D, García-Sosa AT, Gramatica P, Grisoni F, Grulke CM, Hong H, Horvath D, Hu X, Huang R, Jeliazkova N, Li J, Li X, Liu H, Manganelli S, Mangiatordi GF, Maran U, Marcou G, Martin T, Muratov E, Nguyen DT, Nicolotti O, Nikolov NG, Norinder U, Papa E, Petitjean M, Piir G, Pogodin P, Poroikov V, Qiao X, Richard AM, Roncaglioni A, Ruiz P, Rupakheti C, Sakkiah S, Sangion A, Schramm KW, Selvaraj C, Shah I, Sild S, Sun L, Taboureau O, Tang Y, Tetko IV, Todeschini R, Tong W, Trisciuzzi D, Tropsha A, Van Den Driessche G, Varnek A, Wang Z, Wedebye EB, Williams AJ, Xie H, Zakharov AV, Zheng Z, Judson RS (2020) CoMPARA: collaborative modeling project for androgen receptor activity. Environ Health Perspect 128(2):27002

Marty S (2014) Introduction to screening for endocrine activity-experiences with the US EPA’s endocrine disruptor screening program and future considerations. Birth Defects Res B Dev Reprod Toxicol 101(1):1–2

Matthiessen P, Wheeler JR, Weltje L (2018) A review of the evidence for endocrine disrupting effects of current-use chemicals on wildlife populations. Crit Rev Toxicol 48(3):195–216

Ma JY, Li YW, Mei N, Tian HD, Wang X (2018) Study on interaction between brominated flame retardants and transthyretin by native electrospray ionization mass spectrometry (in Chinese). J Instrum Anal 37(5):525–531

Murk AJ, Rijntjes E, Blaauboer BJ, Clewell R, Crofton KM, Dingemans MM, Furlow JD, Kavlock R, Köhrle J, Opitz R, Traas T, Visser TJ, Xia M, Gutleb AC (2013) Mechanism-based testing strategy using in vitro approaches for identification of thyroid hormone disrupting chemicals. Toxicol in Vitro 27(4):1320–1346

Organization for Economic Co-Operation and Development (2018) Revised Guidance Document 150 on Standardised Test Guidelines for Evaluating Chemicals for Endocrine Disruption, Organization for Economic Co-Operation and Development (OECD) Series on Testing and Assessment, OECD Publishing, Paris, pp 20–21, https://doi.org/10.1787/9789264304741-en

Papalou O, Kandaraki EA, Papadakis G, Diamanti-Kandarakis E (2019) Endocrine disrupting chemicals: an occult mediator of metabolic disease.
Front Endocrinol (lausanne) 10:112 Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay É (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830 Roncaglioni A, Piclin N, Pintore M, Benfenati E (2008) Binary classification models for endocrine disrupter effects mediated through the estrogen receptor. SAR QSAR Environ Res 19(7–8):697– 733 Rotroff DM, Dix DJ, Houck KA, Knudsen TB, Martin MT, McLaurin KW, Reif DM, Crofton KM, Singh AV, Xia M, Huang R, Judson RS (2013) Using in vitro high throughput screening assays to identify potential endocrine-disrupting chemicals. Environ Health Perspect 121(1):7–14 Sakkiah S, Guo WJ, Pan BH, Kusko R, Tong WD, Hong HX (2018) Computational prediction models for assessing endocrine disrupting potential of chemicals. J Environ Sci Health C Environ Carcinog Ecotoxicol Rev 36(4):192–218 Sakkiah S, Selvaraj C, Guo W, Liu J, Ge W, Patterson TA, Hong H (2021) Elucidation of agonist and antagonist dynamic binding patterns in ER-α by integration of molecular docking, molecular dynamics simulations and quantum mechanical calculations. Int J Mol Sci 22(17):9371 Šauer P, Švecová H, Grabicová K, Gönül Aydın F, Mackuˇlak T, Kodeš V, Blytt LD, Henninge LB, Grabic R, Kocour Kroupová H (2021) Bisphenols emerging in Norwegian and Czech aquatic environments show transthyretin binding potency and other less-studied endocrine-disrupting activities. Sci Total Environ 751:141801 Shenoy K, Crowley PH (2011) Endocrine disruption of male mating signals: ecological and evolutionary implications. Funct Ecol 25:433–448 Tan H, Chen Q, Hong H, Benfenati E, Gini GC, Zhang X, Yu H, Shi W (2021) Structures of endocrine-disrupting chemicals correlate with the activation of 12 classic nuclear receptors. 
Environ Sci Technol 55(24):16552–16562 Tang W, Chen J, Hong H (2020) Development of classification models for predicting inhibition of mitochondrial fusion and fission using machine learning methods. Chemosphere 273:128567 Tang W, Chen J, Wang Z, Xie H, Hong H (2018) Deep learning for predicting toxicity of chemicals: a mini review. J Environ Sci Health C Environ Carcinog Ecotoxicol Rev 36(4):252–271

262

X. Yang et al.

United Nations Environment Programme/World Health Organization (2013) State of the science of endocrine disrupting chemicals. United Nations Environment Programme/World Health Organization (UNEP/WHO), Geneva, pp 23–188 Vedani A, Smiesko M, Spreafico M, Peristera O, Dobler M (2009) VirtualToxLab—in silico prediction of the toxic (endocrine-disrupting) potential of drugs, chemicals and natural products. Two years and 2000 compounds of experience: a progress report. ALTEX 26(3):167–176 Wang J, Hallinger DR, Murr AS, Buckalew AR, Simmons SO, Laws SC, Stoker TE (2018) HighThroughput screening and quantitative chemical ranking for sodium-iodide symporter Inhibitors in ToxCast phase I chemical library. Environ Sci Technol 52(9):5417–5426 Wang J, Hallinger DR, Murr AS, Buckalew AR, Lougee RR, Richard AM, Laws SC, Stoker TE (2019a) High-throughput screening and chemotype-enrichment analysis of ToxCast phase II chemicals evaluated for human sodium-iodide symporter (NIS) inhibition. Environ Int 126:377– 386 Wang J, Richard AM, Murr AS, Buckalew AR, Lougee RR, Shobair M, Hallinger DR, Laws SC, Stoker TE (2021a) Expanded high-throughput screening and chemotype-enrichment analysis of the phase II: e1k ToxCast library for human sodium-iodide symporter (NIS) inhibition. Arch Toxicol 95(5):1723–1737 Wang L, Zhao L, Liu X, Fu J, Zhang A (2021b) SepPCNET: deeping learning on a 3D surface electrostatic potential point cloud for enhanced toxicity classification and its application to suspected environmental estrogens. Environ Sci Technol 55(14):9958–9967 Wang MWH, Goodman JM, Allen THE (2021c) Machine learning in predictive toxicology: recent applications and future directions for classification models. Chem Res Toxicol 34(2):217–239 Wang YN, Liu HH, Yang XH (2019b) Development of binary classification models for predicting estrogenic activity of organic compounds on zebrafish. 
Asian J Ecotoxicol 14(4):163–169 (in Chinese) Wang Z, Chen J, Hong H (2021d) Developing QSAR models with defined applicability domains on PPARγ binding affinity using large data sets and machine learning algorithms. Environ Sci Technol 55(10):6857–6866 Wang Z, Walker GW, Muir DCG, Nagatani-Yoshida K (2020) Toward a global understanding of chemical pollution: a first comprehensive analysis of national and regional chemical inventories. Environ Sci Technol 54(5):2575–2584 Xi Y, Yang X, Zhang H, Liu H, Watson P, Yang F (2020) Binding interactions of halo-benzoic acids, halo-benzenesulfonic acids and halo-phenylboronic acids with human transthyretin. Chemosphere 242:125135 Yap CW (2011) PaDEL-Descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem 32(7):1466–1474 Yang H, Lou C, Sun L, Li J, Cai Y, Wang Z, Li W, Liu G, Tang Y (2019) admetSAR 2.0: web-service for prediction and optimization of chemical ADMET properties. Bioinformatics 35(6):1067–1069 Yang XH, Ou W, Zhao SS, Xi Y, Wang LJ, Liu HH (2021) Rapid screening of human transthyretin disruptors through a tiered in silico approach. ACS Sustain Chem Eng 9(16):5661–5672 Yang X, Xie H, Chen J, Li X (2013) Anionic phenolic compounds bind stronger with transthyretin than their neutral forms: nonnegligible mechanisms in virtual screening of endocrine disrupting chemicals. Chem Res Toxicol 26(9):1340–1347 Yin C, Yang XH, Wei MB, Liu HH (2017) Predictive models for identifying the binding activity of structurally diverse chemicals to human pregnane X receptor. Environ Sci Pollut Res 24(24):20063–20071 Zhong S, Zhang K, Bagheri M, Burken JG, Gu A, Li B, Ma X, Marrone BL, Ren ZJ, Schrier J, Shi W, Tan H, Wang T, Wang X, Wong BM, Xiao X, Yu X, Zhu JJ, Zhang H (2021) Machine learning: new ideas and tools in environmental science and engineering. 
Environ Sci Technol 55(19):12741–12754 Zhu H, Zhang J, Kim MT, Boison A, Sedykh A, Moran K (2014) Big data in chemical toxicity research: the use of high-throughput screening assays to identify potential toxicants. Chem Res Toxicol 27(10):1643–1651

Chapter 11
Quantitative Target-specific Toxicity Prediction Modeling (QTTPM): Coupling Machine Learning with Dynamic Protein–Ligand Interaction Descriptors (DyPLIDs) to Predict Androgen Receptor-mediated Toxicity

Sundar Thangapandian, Gabriel Idakwo, Joseph Luttrell, Huixiao Hong, Chaoyang Zhang, and Ping Gong

S. Thangapandian: Oak Ridge Institute for Science and Education, Oak Ridge, TN, USA; Hotspot Therapeutics, Inc., 50 Milk St, Boston, MA 02109, USA
G. Idakwo · J. Luttrell · C. Zhang: School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, USA
H. Hong: National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
P. Gong (corresponding author): Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, USA

This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2023. H. Hong (ed.), Machine Learning and Deep Learning in Computational Toxicology, Computational Methods in Engineering & the Sciences, https://doi.org/10.1007/978-3-031-20730-3_11


11.1 Introduction

Interactions between a biomacromolecule and a small molecule (ligand) are key in the quantitative assessment of ligand-induced target-specific toxicity (Li et al. 2018). An active chemical binds its target biomacromolecule with high affinity to inhibit (as an antagonist) or activate (as an agonist) the biological activity or function associated with the target (e.g., enzymes, transcription factors). Key molecular interactions occurring at the target-ligand interface are responsible for the specific conformational changes that dictate further downstream structural transition and signaling. These molecular interactions include non-bonded interactions such as Coulombic and van der Waals (vdW) forces, and shape complementarity (Charrette et al. 2016; Stone 2008). Existing ligand-based quantitative structure–activity relationship (QSAR) approaches typically use chemical descriptors generated from the chemical structures of ligands and do not take into account the target-ligand interactions (Drwal et al. 2015; Hao et al. 2016; Hughes et al. 2015).

Only a few previous studies attempted to use molecular interaction-based information in drug design, and they ignored the dynamic nature of these molecular interactions. Deng et al. introduced the pioneering method to calculate the structural interaction fingerprints (SIFt) from target-ligand-binding interactions and successfully demonstrated the SIFt as effective filters in structure-based virtual screening (Deng et al. 2004). The SIFt encoded three-dimensional target-ligand interactions in a seven-bit, one-dimensional binary string, where each bit represented the presence or absence of a particular interaction. Kelly and Mancera expanded the method and provided a better representation of strength, accessibility, and geometric arrangement of hydrogen bonding groups within the ligand-binding site (Kelly and Mancera 2004). Another extension, profile-SIFt, incorporated the conservation of a specific interaction profile within a family of target–ligand interactions (Chuaqui et al. 2005). Mpamhanga et al. developed interaction fingerprint scoring that combined both similarity-based scoring and binding knowledge, based on known co-crystallized X-ray structures, to penalize the compounds that cannot assume the X-ray structure binding mode (Mpamhanga et al. 2006). Nandigam et al. introduced weighted-SIFt to capture the relative importance of different binding interactions based on an empirically determined weight fit from inhibitor potency data (Nandigam et al. 2009).

Other SIFt variations considered relative positions or distances between interacting target–ligand atom pairs. For instance, atom-pairs-based interaction fingerprints (APIFs) encode the range of distances between interaction points in target and ligand, an idea similar to structure-based pharmacophore design and fragment-based methods (Pérez-Nueno et al. 2009). Coordinate frame-invariant interaction pattern descriptor (TIFP) fingerprints use pseudo-atoms based on the identified pharmacophoric properties for protein and ligand atoms (Desaphy et al. 2013). Structural protein–ligand interaction fingerprints (SPLIF) implicitly encode the target-ligand interactions using extended-connectivity fingerprints (ECFPs) and handle all types of local interactions (Da and Kireev 2014). More recently, the protein per atom score contributions derived interaction fingerprint (PADIF) incorporated the strengths of different interactions and the presence of unfavorable interactions, based on the per-atom score contributions of the protein atoms generated by the Genetic Optimization for Ligand Docking (GOLD) program (Jasper et al. 2018).

The original SIFt and all of its variations were successfully used as post-docking filters in structure-based virtual screening, fragment docking, and scaffold hopping (Marcou and Rognan 2007; Venhorst et al. 2008). However, these fingerprints use a similarity-based ranking in comparison to a known target-ligand complex, which may not find potential chemical leads that interact with the target via different interactions. Also, they were developed from static crystal structures with little or no consideration of the dynamic nature of these interactions. Ash and Fourches recently used molecular dynamics (MD)-generated ligand-based descriptors to discriminate ERK2 kinase binders from non-binders (Ash and Fourches 2017). However, they did not consider the protein–ligand interactions.

In the present proof-of-concept case study, we defined dynamic protein–ligand interaction descriptors (dyPLIDs) and applied them to the development of interaction-based quantitative toxicity prediction models with androgen receptor (AR) as the toxicity target. The motivation of this study was to develop a new approach for building dyPLIDs-based machine learning models to predict the in vitro biological response of a biomacromolecule to chemicals. We applied this approach to quantitative prediction of chemical-induced AR agonism and antagonism, i.e., the androgen-disrupting property of chemicals. The AR is an important biomacromolecule involved in the development and maintenance of male reproductive functions (Darbre 2015; Luccio-Camelo and Prins 2011; Milla et al. 2011). Models developed using this novel approach outperformed those developed using conventional molecular descriptors. We also identified a list of contributing dyPLIDs that explained the key molecular interactions modulating the molecular mechanism of AR agonism and antagonism. The new quantitative target-specific toxicity prediction modeling (QTTPM) approach developed in this work can be potentially applied to other toxicity and therapeutic targets.
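To make the static fingerprint idea from the introduction concrete, the following is a minimal Python sketch of a SIFt-style encoding (seven bits per binding-site residue, concatenated into one binary string) and the similarity-based comparison against a reference complex. The bit names and their order, the residue labels R1 to R3, and the contact sets are illustrative assumptions, not values from this chapter or from the original SIFt implementation.

```python
# Seven interaction bits per residue; names and order are an illustrative assumption.
BIT_NAMES = [
    "any_contact", "backbone", "sidechain", "polar",
    "hydrophobic", "hbond_acceptor", "hbond_donor",
]


def encode_sift(residues, contacts):
    """Concatenate seven presence/absence bits per residue into one binary string."""
    bits = []
    for residue in residues:
        present = contacts.get(residue, set())
        bits.extend("1" if name in present else "0" for name in BIT_NAMES)
    return "".join(bits)


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two equal-length binary strings."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(a == "1" and b == "1" for a, b in zip(fp_a, fp_b))
    either = sum(a == "1" or b == "1" for a, b in zip(fp_a, fp_b))
    return both / either if either else 1.0


# Hypothetical binding-site residues and contacts (made up for illustration).
site_residues = ["R1", "R2", "R3"]
reference_contacts = {
    "R1": {"any_contact", "sidechain", "polar", "hbond_donor"},
    "R2": {"any_contact", "sidechain", "hydrophobic"},
}
query_contacts = {
    "R1": {"any_contact", "sidechain", "polar"},
    "R3": {"any_contact", "backbone"},
}

reference_fp = encode_sift(site_residues, reference_contacts)
query_fp = encode_sift(site_residues, query_contacts)
similarity = tanimoto(reference_fp, query_fp)
print(reference_fp, query_fp, round(similarity, 3))
```

Because the score only rewards overlap with the reference pattern, a ligand that binds through a different set of residues scores low even if it binds well, which is the limitation of similarity-based ranking noted above.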

11.2 Materials and Methods

11.2.1 Study Design

As depicted in Fig. 11.1, this study began with curation of chemicals with quantitative in vitro bioassay data. Then, the curated chemicals were docked to the ligand-binding domain of AR (AR-LBD) followed by MD simulations to generate dyPLIDs. Conventional ligand-based descriptors were also calculated for these chemicals. Prior to machine learning-based quantitative prediction modeling, these chemicals were separated into training and test sets by stratified and random splitting. Four machine learning algorithms in combination with three cross-validation approaches were employed to build QSAR models using the training set. The models were externally validated using the holdout test dataset. The random forest algorithm was selected for further studies based on test set prediction performance. The QSAR model with the highest correlation coefficient was considered the best model and was further evaluated for statistical significance against chance correlation by y-randomization (also called response randomization) (Rücker et al. 2007). Further structural analyses were performed, and the descriptors of high importance were investigated using the selected best model.

Fig. 11.1 Workflow of study design. The dashed box highlights the two dataset-splitting strategies used in this study that resulted in 200 unique data splits. These data splits were either combined with dyPLIDs or conventional descriptors. The algorithm selection part of the study was performed only with chemically stratified datasets containing dyPLIDs. All models were validated using unique holdout test datasets. Data preprocessing and chemical preparation steps are further detailed in Sect. 11.2.2. Acronyms: dyPLID, dynamic protein–ligand interaction descriptor; AR, androgen receptor; LBD, ligand-binding domain; TS, Tanimoto similarity; RF, random forest; SVM, support vector machine; kNN, k-nearest neighbor; GBM, gradient boosting machine
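The y-randomization check named in the study design can be illustrated with a small, self-contained Python sketch. It is only a sketch under stated assumptions: a one-descriptor least-squares fit stands in for the random forest model used in the study, the data are synthetic, and the number of shuffling rounds (200) and the empirical p-value formula are illustrative choices rather than the authors' settings.

```python
import random
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a one-descriptor least-squares fit."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def y_randomization(x, y, n_rounds=200, seed=0):
    """Compare the observed fit with fits obtained after shuffling the response."""
    rng = random.Random(seed)
    observed = r_squared(x, y)
    null_scores = []
    for _ in range(n_rounds):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break the descriptor-response pairing
        null_scores.append(r_squared(x, shuffled))
    # Empirical p-value: share of shuffled fits at least as good as the observed one.
    exceed = sum(score >= observed for score in null_scores)
    p_value = (exceed + 1) / (n_rounds + 1)
    return observed, null_scores, p_value


# Synthetic descriptor and activity values (made up for illustration).
data_rng = random.Random(42)
descriptor = [i / 10 for i in range(40)]
activity = [0.8 * d + data_rng.gauss(0.0, 0.3) for d in descriptor]

observed_r2, null_r2, p_value = y_randomization(descriptor, activity)
print(f"observed R^2 = {observed_r2:.3f}")
print(f"mean shuffled R^2 = {mean(null_r2):.3f}, p = {p_value:.4f}")
```

With a more complex learner, the whole modeling procedure (including any cross-validation and tuning) would be repeated on each shuffled response, so that the null distribution reflects the same flexibility as the real model.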

11.2.2 Dataset Curation, Preprocessing, and Chemical Preparation

Toxicology in the 21st Century (Tox21) is a U.S. Federal research program aiming to advance toxicology testing methods for prioritizing chemicals for further extensive evaluation and identification of chemical toxicity mechanisms (Thomas et al. 2018). The Tox21 program has produced quantitative High-Throughput Screening (qHTS) in vitro assay data for over 10,000 chemicals against 29 signaling pathways and receptor targets, including the AR (Attene-Ramos et al. 2013; Huang et al. 2016; Kavlock et al. 2009; Thomas et al. 2018). We downloaded four qHTS assay datasets from the Tox21 website (https://tripod.nih.gov/tox21/assays/). These qHTS assays were performed to identify small molecular agonists or antagonists of the AR signaling pathway using two different cell lines: MDA-kb2 AR-luc (MDA) and AR-UAS-bla-GripTite (BLA). Each cell line was assayed in both agonistic and antagonistic modes, generating two qHTS assay datasets per cell line. A total of 10,496 chemicals (with redundancy) were tested (Huang et al. 2016; Wilson et al. 2002). These test chemicals were filtered to remove inactive, inconclusive, and redundant chemicals. After further clean-up of compounds containing uncommon atoms, a final set of 273 active and non-redundant chemicals was retained for further investigation in this study (Table 11.1). This set of chemicals included 154 agonists and 119 antagonists, each of which had a unique PubChem Compound ID (CID) and a consensus AR activity value. Measured in vitro activity values were averaged for a chemical in case of multiple assay outcomes (up to 4 assays) or repeated measurements of the same chemical/CID. All activity data were normalized using the z-score method. Normalized activity data of the 273 chemicals are presented in Tables 11.2 and 11.3.

Table 11.1 Breakdown of Tox21 compounds based on their in vitro AR activity assay outcomes and preprocessing filters. See Sect. 11.2.2 for details of Tox21 dataset curation and preprocessing

Filter              Agonists  Antagonists  Inconclusive  Inactive  Total
Assay outcome       241       143          3560          6552      10,496
Unique CID          155       124          N/A           N/A       279
Uncommon atoms (a)  154       119          N/A           N/A       273

N/A = Not applicable as the chemicals were removed by the first filter.
(a) See Sect. 2.3 for respective atoms and chemicals.

To prepare the chemicals for docking, the Open Babel program (O’Boyle et al. 2011) was used to remove salts, add hydrogens, and generate 3D conformations of the chemicals. Geometries and atomic charges were applied using the Electronegativity Equalization Method at the B3LYP/6-31G* level (Bultinck et al. 2002), followed by 300 steps of steepest descent minimization using the general Amber force field (GAFF) (Wang et al. 2004).
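The activity consolidation described in this section (averaging repeated measurements per CID, then z-score normalization) can be sketched in a few lines of Python. The sketch makes several assumptions that the text does not specify: the CIDs and activity values are made up, the z-score is computed over all consolidated chemicals at once, and the population standard deviation is used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def consensus_activity(records):
    """Average all activity measurements that share the same CID."""
    by_cid = defaultdict(list)
    for cid, value in records:
        by_cid[cid].append(value)
    return {cid: mean(values) for cid, values in by_cid.items()}


def z_score(values_by_cid):
    """Normalize consensus activities to zero mean and unit standard deviation."""
    values = list(values_by_cid.values())
    mu = mean(values)
    sigma = pstdev(values)  # population SD; the chapter does not state which SD was used
    if sigma == 0:
        raise ValueError("cannot z-score constant activities")
    return {cid: (value - mu) / sigma for cid, value in values_by_cid.items()}


# Hypothetical (CID, activity) records; CID 1 was measured in two assays.
records = [(1, 2.0), (1, 4.0), (2, 1.0), (3, 5.0)]

consensus = consensus_activity(records)
normalized = z_score(consensus)
for cid in sorted(normalized):
    print(cid, round(consensus[cid], 3), round(normalized[cid], 3))
```

Averaging before normalization means each chemical contributes one value to the mean and standard deviation, so chemicals measured in several assays do not dominate the scaling.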

11.2.3 Molecular Docking

11.2.3.1 AR Function and Structure

As a ligand-dependent nuclear transcription factor and nuclear receptor (NR), the AR is expressed in a diverse range of tissues and regulates the development and maintenance of the reproductive, musculoskeletal, cardiovascular, immune, neural, and haemopoietic systems (Davey and Grossmann 2016). In addition, it is well-established that the AR plays a crucial role in the pathogenesis of prostate cancer (Dehm and Tindall 2007) and androgen insensitivity syndrome (Brinkmann et al. 1989; Nadal et al. 2017). The AR is structurally similar to other steroid hormone NRs, such as estrogen receptor and glucocorticoid receptor, and consists of three major functional domains: N-terminal domain (NTD), DNA-binding domain (DBD), and C-terminal ligand-binding domain (LBD) (Brinkmann et al. 1989; de Jésus-Tran Karine et al. 2006; Gao et al. 2005). Several X-ray structures are available for the LBD, providing key structural information (de Jésus-Tran Karine et al. 2006), while the NTD and DBD have very limited or no structural information. The AR dimerization characteristics are available from the X-ray structures of dimeric human LBD (PDB ID: 5JJM) (Nadal et al. 2017) (Fig. 11.2a) and dimeric DNA-bound rat DBD (PDB ID: 1R4I) (Shaffer et al. 2004) (Fig. 11.2b). The DBD and LBD are strictly conserved in all NRs, whereas the NTD is diverse with different lengths (de Jésus-Tran Karine et al. 2006; Gao et al. 2005; Nadal et al. 2017). The DBD is smaller than the NTD and LBD. The 3D structures of the NTD and the hinge region of AR are unknown. The structurally well-characterized LBD is composed predominantly of helices arranged as a three-layered anti-parallel helical sandwich. The N-termini of helices H3, H5, and H11 form the majority of the ligand-binding site (Fig. 11.2a). Upon binding to its natural agonist, testosterone, the AR undergoes specific conformational changes that prepare the receptor for dimerization, which forms a homodimer in a head-to-head fashion (Nadal et al. 2017; van Royen et al. 2012).

Table 11.2 An example of dataset splitting showing a training set of 200 Tox21 compounds with their unique chemical identities (CIDs) and experimentally determined AR activities

CID       Actual  Predicted  Residual    CID       Actual  Predicted  Residual
6526396    2.458      1.286     1.172    11876263  −0.001      0.097    −0.098
91670      2.132      1.504     0.628    7050      −0.022     −0.206     0.184
443936     2.101      1.295     0.806    6010      −0.031      0.061    −0.092
9878       2.083      1.559     0.524    6238      −0.051      0.008    −0.060
71386      2.051      1.768     0.283    5284587   −0.112      0.057    −0.169
5311051    2.047      1.605     0.442    443884    −0.117     −0.222     0.105
236702     1.988      1.682     0.306    60605     −0.132      0.200    −0.332
60196346   1.972      1.661     0.311    5865      −0.167     −0.218     0.050
71470      1.929      1.533     0.396    5284486   −0.182     −0.080    −0.102
40000      1.918      1.361     0.558    23671691  −0.263     −0.100    −0.163
5311067    1.909      1.523     0.386    4993      −0.281      0.004    −0.285
16961      1.887      1.554     0.333    65067     −0.283     −0.340     0.057
443928     1.885      1.576     0.309    222786    −0.320      0.072    −0.392
23672582   1.870      1.303     0.567    54682468  −0.675     −0.074    −0.601
5755       1.870      1.129     0.741    8813      −0.707     −0.673    −0.034
16158      1.867      1.542     0.326    1345      −0.708     −0.281    −0.427
656804     1.835      1.370     0.465    5318517   −0.726     −0.343    −0.382
63049      1.793      1.491     0.302    86102     −0.734     −0.728    −0.006
5282494    1.781      0.911     0.870    5284515   −0.737     −0.481    −0.255
5743       1.774      1.508     0.267    5284558   −0.743     −0.719    −0.024
63047      1.758      1.201     0.557    39327     −0.754     −0.547    −0.208
21700      1.747      1.390     0.357    56846451  −0.757     −0.699    −0.057
23680530   1.735      1.630     0.106    4169      −0.760     −0.779     0.019
5834       1.716      1.238     0.479    23667299  −0.765     −0.679    −0.087
6714002    1.715      1.450     0.265    213031    −0.769     −0.470    −0.299
5876       1.704      1.267     0.437    9849616   −0.770     −0.718    −0.052
6918178    1.702      1.357     0.345    4890      −0.772     −0.534    −0.239
6741       1.690      1.408     0.282    5756      −0.774     −0.324    −0.449
16533      1.680      1.505     0.175    8227      −0.788     −0.777    −0.011
9782       1.677      1.372     0.304    4933      −0.792     −0.701    −0.091
20469      1.662      1.505     0.157    3391107   −0.799     −0.373    −0.426
224246     1.658      1.333     0.325    74990     −0.804     −0.554    −0.250
5282493    1.626      1.330     0.296    65494     −0.806     −0.669    −0.137
31307      1.579      1.416     0.163    68297     −0.806     −0.791    −0.015
636398     1.567      1.185     0.383    108938    −0.810     −0.758    −0.052
23694214   1.566      1.401     0.165    62306     −0.827     −0.825    −0.002
31378      1.491      0.847     0.644    2999413   −0.829     −0.144    −0.685
441406     1.487      1.360     0.127    2812      −0.829     −0.650    −0.179
32798      1.437      1.001     0.436    2078      −0.832     −0.717    −0.115
5754       1.408      1.094     0.314    31100     −0.848     −0.667    −0.181
247839     1.271      1.052     0.219    5284623   −0.854     −0.689    −0.165
6279       1.080      0.622     0.458    12472902  −0.868     −0.710    −0.157
5952       0.902      0.486     0.416    38884     −0.870     −0.781    −0.088
10631      0.747      0.510     0.236    73672     −0.870     −0.800    −0.070
5753       0.729      0.692     0.037    5881      −0.871     −0.409    −0.462
5284533    0.676      0.567     0.109    21330     −0.878     −0.578    −0.300
6917715    0.540      0.718    −0.178    104850    −0.882     −0.586    −0.296
10041070   0.497      0.341     0.155    107720    −0.888     −0.719    −0.169
3033968    0.485      0.398     0.087    4615      −0.890     −0.382    −0.507
6446       0.443      0.595    −0.152    9395      −0.891     −0.838    −0.054
6166       0.422      0.401     0.021    20054868  −0.906     −0.797    −0.109
111332     0.421      0.390     0.031    9831581   −0.906     −0.504    −0.402
8225       0.418      0.017     0.401    44460585  −0.910     −0.739    −0.170
443947     0.342      0.217     0.125    6432394   −0.917     −0.387    −0.530
9403       0.310      0.516    −0.206    108150    −0.918     −0.788    −0.129
4235       0.297     −0.161     0.459    93118     −0.935     −0.712    −0.223
108013     0.289     −0.135     0.424    12922658  −0.936     −0.815    −0.121
68952      0.286      0.559    −0.273    60699     −0.936     −0.237    −0.700
229455     0.286      0.432    −0.146    4115      −0.943     −0.829    −0.114
441351     0.260     −0.330     0.590    11556711  −0.950     −0.811    −0.138
21070      0.260      0.330    −0.070    2734229   −0.950     −0.921    −0.029
68570      0.259      0.339    −0.080    84703     −0.955     −0.660    −0.294
5757       0.226      0.224     0.003    5284610   −0.956     −0.789    −0.167
66359      0.225      0.244    −0.020    31677     −0.957     −0.799    −0.159
23663985   0.223      0.004     0.219    62558     −0.958     −0.677    −0.282
13791      0.223      0.429    −0.206    53316406  −0.960     −0.687    −0.273
168088     0.222      0.016     0.206    17693     −0.962     −0.816    −0.145
155143     0.221      0.159     0.062    60196379  −0.965     −0.616    −0.349
68947      0.203      0.159     0.044    60196375  −0.968     −0.699    −0.270
65947      0.188      0.429    −0.241    4499      −0.969     −0.728    −0.241
637511     0.179     −0.260     0.438    6868      −0.969     −0.798    −0.171
11273      0.163      0.227    −0.064    62379     −0.971     −0.864    −0.108
5362376    0.161     −0.078     0.239    18506491  −0.972     −0.839    −0.134
6128       0.157      0.167    −0.010    31101     −0.979     −0.773    −0.206
14743      0.154      0.253    −0.099    12449     −0.980     −0.738    −0.242
16760141   0.140     −0.193     0.333    3316      −0.984     −0.842    −0.142
7946       0.134     −0.050     0.184    457193    −0.988     −0.834    −0.154
65359      0.129     −0.016     0.145    6505803   −0.994     −0.801    −0.193
6230       0.128      0.074     0.054    16211417  −0.996     −0.851    −0.145
444008     0.116      0.068     0.048    443939    −1.003     −0.887    −0.117
10204      0.113      0.231    −0.118    60196435  −1.007     −0.755    −0.252
9818306    0.111      0.300    −0.189    10250129  −1.010     −0.904    −0.106
5284557    0.107      0.220    −0.114    5280899   −1.011     −0.911    −0.100
3345       0.103     −0.140     0.244    23160059  −1.014     −0.804    −0.210
222757     0.102      0.239    −0.136    31200     −1.016     −0.655    −0.360
9904       0.098      0.055     0.042    27295     −1.022     −0.818    −0.204
192826     0.097      0.002     0.094    5280453   −1.022     −0.538    −0.484
5994       0.094     −0.024     0.118    13765     −1.025     −0.597    −0.428
7592       0.093     −0.017     0.110    6439929   −1.028     −0.497    −0.531
5281034    0.091      0.158    −0.067    11057     −1.028     −0.943    −0.085
5832       0.086      0.260    −0.174    115157    −1.028     −0.712    −0.316
518605     0.080     −0.213     0.293    46226173  −1.028     −0.732    −0.296
10635      0.079      0.095    −0.016    448537    −1.037     −0.485    −0.552
11472813   0.077      0.005     0.072    5388983   −1.045     −0.850    −0.195
6540478    0.073      0.407    −0.334    235227    −1.051     −0.846    −0.205
10633      0.061      0.014     0.047    16722125  −1.052     −0.596    −0.456
6013       0.060      0.043     0.018    11296583  −1.071     −0.661    −0.410
8111       0.050      0.032     0.018    4493      −1.100     −0.552    −0.548
10634      0.036     −0.082     0.118    17038     −1.101     −0.886    −0.215
12038941   0.003     −0.184     0.187    180494    −1.126     −0.751    −0.375

See Materials and Methods for more info about how the experimental AR activity values were obtained.

Table 11.3 An example of dataset splitting showing a test set of 73 Tox21 compounds with their unique chemical identities (CIDs), experimentally determined AR activities (Actual), predicted AR activities (Predicted), and the difference between the actual and the predicted activities (Residual)

CID       Actual  Predicted  Residual    CID       Actual  Predicted  Residual
71414      1.876      1.233     0.642    6975516   −0.025     −0.343     0.318
16490      1.862      1.349     0.514    229295    −0.120     −0.354     0.234
636374     1.838      1.024     0.814    13789     −0.274     −0.190    −0.084
65478      1.815      1.292     0.523    9051      −0.335      0.293    −0.627
5877       1.739      1.328     0.412    5870      −0.673      0.053    −0.727
56649463   1.737      1.504     0.234    25015677  −0.693     −0.707     0.014
5702068    1.663      1.356     0.306    11080     −0.704     −0.738     0.034
26133      1.556      1.182     0.374    39676     −0.708     −0.017    −0.691
2554       1.544      1.090     0.454    7564      −0.739     −0.069    −0.670
443753     1.533      0.912     0.621    107770    −0.759     −0.826     0.067
444025     1.530      1.345     0.185    454216    −0.772     −0.516    −0.256
440707     0.649      0.783    −0.134    16560     −0.784     −0.741    −0.043
9324       0.458      0.915    −0.457    60196289  −0.808     −0.733    −0.076
19571      0.324      0.292     0.032    66067     −0.808     −0.720    −0.088
231084     0.301     −0.215     0.516    84677     −0.810     −0.602    −0.208
11033      0.289     −0.544     0.833    460612    −0.815     −0.722    −0.094
68289      0.271     −0.581     0.852    5991      −0.819     −0.793    −0.026
5995       0.260     −0.361     0.621    59086459  −0.853     −0.661    −0.192
27812      0.239      0.129     0.110    56069     −0.857     −0.658    −0.199
9677       0.223      0.080     0.143    1988      −0.899     −0.726    −0.173
25015      0.219      0.375    −0.157    6450278   −0.913     −0.507    −0.406
248271     0.215      0.416    −0.202    31315     −0.949     −0.826    −0.123
443935     0.207      0.063     0.143    61253     −0.960     −0.600    −0.359
16051930   0.190      0.584    −0.394    36242     −0.960     −0.489    −0.471
65784      0.189      0.573    −0.385    5283731   −0.966     −0.523    −0.443
667493     0.186      0.675    −0.490    60196297  −0.989     −0.769    −0.221
6389       0.167      0.247    −0.079    3003141   −0.999     −0.851    −0.148
6537       0.154      0.232    −0.078    1967      −1.000     −0.834    −0.165
13109      0.144     −0.203     0.347    8816      −1.007     −0.635    −0.372
60198      0.144     −0.286     0.430    3397      −1.020     −0.188    −0.833
6300       0.120      0.052     0.068    5364713   −1.035     −0.344    −0.691
14708      0.066     −0.082     0.148    26177     −1.038     −0.344    −0.694
6231       0.064      0.085    −0.021    91649     −1.038     −0.273    −0.765
23706212   0.020     −0.261     0.281    55245     −1.045     −0.742    −0.303
656583     0.013     −0.317     0.330    16115     −1.051     −0.626    −0.425
8846       0.003      0.161    −0.164    2734231   −1.082     −0.696    −0.386
28417     −0.013     −0.175     0.162

See Materials and Methods for more info about how the experimental and predicted AR activity values were obtained.

11.2.3.2 Ligand Docking to AR-LBD

The X-ray structure of AR-LBD (PDB ID. 2AM9) (de Jésus-Tran Karine et al. 2006) bound with testosterone was used to dock the 273 Tox21 compounds. All docking calculations were performed with the AutoDock Vina 1.1.2 program (Trott and Olson 2010). Prior to docking, all ligands, including the bound testosterone, and water molecules were removed from the X-ray structure, and a cubic box (16 × 16 × 16 Å) was generated to enclose the testosterone binding region. For every compound, 100 docking poses were generated, and a pose with a favorable estimated binding free energy was selected for further study. Before the Tox21 compounds were docked, a validation study was performed in which a testosterone molecule was docked de novo to the binding site. The predicted favorable binding mode was very similar to the conformation of testosterone identified in the X-ray structure, with a root-mean-square deviation (RMSD) of 0.4 Å (Fig. 11.3), which validated the docking system and method for AR.
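The redocking check described above compares two poses of the same ligand atom by atom. The following is a minimal sketch of that comparison, assuming both poses list the same atoms in the same order and already share the receptor coordinate frame, so no superposition is applied. The function name and the three-atom coordinates are illustrative placeholders, not testosterone data from the study.

```python
import math

def pose_rmsd(ref, pose):
    """Root-mean-square deviation between two poses.

    Both poses are equal-length lists of (x, y, z) tuples with matching
    atom order; the result is in the same length unit as the input.
    """
    if len(ref) != len(pose) or not ref:
        raise ValueError("poses must be non-empty and have matching atoms")
    squared = sum(
        (a - b) ** 2
        for atom_ref, atom_pose in zip(ref, pose)
        for a, b in zip(atom_ref, atom_pose)
    )
    return math.sqrt(squared / len(ref))

# Hypothetical example: the docked pose is shifted by 0.4 along x.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(0.4, 0.0, 0.0), (1.9, 0.0, 0.0), (1.9, 1.5, 0.0)]
rmsd = pose_rmsd(crystal, docked)
print(f"RMSD = {rmsd:.2f}")
```

A rigid shift of every atom by the same distance gives an RMSD equal to that distance, which makes the toy case easy to check by hand.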


Fig. 11.2 Structural details of the full-length androgen receptor (AR). a Dimeric form of the ligand-binding domain (LBD; PDB ID: 5JJM) with dihydrotestosterone (DHT, orange color) bound at the ligand-binding sites of both protomers. Protomer-A and protomer-B are shown in gray and white cartoon representations, respectively. b Dimeric form of the DNA binding domain (DBD) bound to androgen response element (ARE; PDB ID: 1R4I). DBDs are shown in green cartoon representation. The two ARE strands are shown in red and blue cartoon representations, respectively. Thin dashed lines represent the structurally unknown hinge region between DBD and LBD. Thick dashed lines represent the region that connects to the structurally unknown NTDs


Fig. 11.3 Molecular docking validation experiment using the AR-LBD-ligand system. a X-ray structure of AR-LBD (PDB ID. 2AM9) with co-crystallized testosterone (white sticks) and docking-predicted testosterone (orange sticks) within the binding pocket (dashed black box). b Two different views of the co-crystallized (white) and the docking-predicted (orange) conformations of testosterone

11.2.4 Molecular Dynamics (MD) Simulations

All MD simulations were performed using the Amber 16 simulation package with the ff99SB force field (Case et al. 2016; Lindorff-Larsen et al. 2010). The docked AR-ligand complexes obtained from the molecular docking calculations were solvated in a cubic box of TIP3P water molecules with a padding distance of 10 Å from the edges of the protein. The system was neutralized by adding Na+ or Cl− monovalent ions, and additional ions were added to reach a 0.15 M NaCl concentration, mimicking a physiological buffer. The SHAKE algorithm was used to constrain covalent bonds involving hydrogen atoms. Each AR-chemical complex was minimized for 5000 steps using the conjugate gradient method followed by the steepest descent method. The system was then equilibrated for 100 ps with a 1 kcal/mol/Å² restraint on protein atoms. Finally, the production MD runs were performed with no restraints.

Four representative Tox21 chemicals (i.e., 2 agonists and 2 antagonists with considerable structural diversity, see Fig. 11.4 for their CIDs and structures) were selected and docked to AR-LBD. MD simulations were run for 100 ns on the four prepared AR-ligand complexes (Table 11.4). Their conformational changes were analyzed using the Visual Molecular Dynamics (VMD) program (Humphrey et al. 1996) at the following time points of the trajectories: 1, 2, 4, 6, 10, 20, 40, 60, 80, and 100 ns. We performed 6-ns simulations for the remaining 269 Tox21 compounds (Table 11.4) because an analysis of the four 100-ns simulations suggested that the first 6-ns segment captured most of the conformational changes that occurred over the entire 100-ns simulation (see Sects. 11.3.1 and 11.3.2 below for details).
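As a rough illustration of what a 0.15 M NaCl concentration means for a simulation box, the number of salt ion pairs can be estimated as concentration × Avogadro's number × volume. The sketch below assumes the whole box volume is solvent and uses a hypothetical 80 Å box edge; neither the box size nor this counting convention is taken from the chapter, and setup tools may count ions differently. Neutralizing counter-ions would be added on top of this number.

```python
AVOGADRO = 6.02214076e23           # particles per mole
LITERS_PER_CUBIC_ANGSTROM = 1e-27  # 1 A^3 = 1e-30 m^3 = 1e-27 L

def salt_ion_pairs(conc_molar, box_edge_angstrom):
    """Estimate the Na+/Cl- pairs needed for conc_molar in a cubic box.

    Simplification: the full box volume is treated as solvent volume.
    """
    volume_liters = box_edge_angstrom ** 3 * LITERS_PER_CUBIC_ANGSTROM
    return round(conc_molar * AVOGADRO * volume_liters)

# Hypothetical box edge of 80 A (not a value reported in the text).
pairs = salt_ion_pairs(0.15, 80.0)
print(f"{pairs} Na+/Cl- pairs in addition to neutralizing ions")
```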


Fig. 11.4 Four representative Tox21 chemicals with diverse activity and structure. Two compounds are agonists (CIDs 10041070 and 11033) and the other two are antagonists (CIDs 104850 and 108938). These four compounds were selected for 100 ns MD simulations

Table 11.4 Information of MD simulations for the 273 AR-active compounds from Tox21

Target | Ligand (CID) | Activity type | Simulation time (ns)
AR-LBD (PDB ID. 2AM9) | 10041070 | Agonist | 100
AR-LBD (PDB ID. 2AM9) | 11033 | Agonist | 100
AR-LBD (PDB ID. 2AM9) | 104850 | Antagonist | 100
AR-LBD (PDB ID. 2AM9) | 108938 | Antagonist | 100
AR-LBD (PDB ID. 2AM9) | Other 269 ligands | Agonist/Antagonist | 269 × 6 ns = 1614

11.2.5 Dynamic Protein–Ligand Interaction Descriptors (DyPLIDs) Calculation

Three types of dyPLIDs (Table 11.5, see their definitions below) were calculated for each ligand–receptor complex using the MD trajectories and the VMD program. Protein atoms located within a radius of 10 Å centered on the docked chemical (ligand) were used for the calculation of the first two types of dyPLIDs. The 10-Å cutoff radius was chosen based on the assumption that most electrostatic and vdW interactions would occur within this distance, a cutoff that has also been adopted in most MD programs (Piana et al. 2012).

Type 1: Proximity-based descriptors (PBDs) represent available chemical contacts within the 10 Å cutoff by accounting for the number of certain atoms, amino acid residues, and types of residues, such as acidic and basic residues, that are present in this sphere. There were 73 PBDs derived for the specific system under investigation in this study.

Type 2: Pharmacophore-based descriptors (PPBDs) include two subclasses of descriptors. The first subclass calculates the distance between the ligand and each amino acid residue located within the 10 Å radius. The other subclass calculates the angle between any pair of residues with respect to the ligand. PPBDs depict the arrangement and dynamics of binding site residues surrounding the ligand in MD trajectories. A total of 4560 PPBDs were derived in this study.

Type 3: Dynamics-based descriptors (DyBDs) include root-mean-squared deviation (RMSD) or fluctuation (RMSF), and radius of gyration (Rg) of all the residues in the AR-LBD domain with no distance cutoff. A structural and conformational analysis of the 100-ns trajectories displayed significant conformational changes of


Table 11.5 Breakdown of calculated dyPLIDs for the AR-LBD-ligand complex

Proximity(a) | #Descriptors | Pharmacophore(a) | #Descriptors | Dynamics | #Descriptors
nTypeAtoms | 4 | Distance | 95 | RMSD_AA | 250
nTypeAA | 20 | Angle | 4465 | RMSD_Lig | 1
nAcidic | 1 | | | RMSF_AA | 250
nBasic | 1 | | | RMSF_Lig | 1
nCharged | 1 | | | Rg_AA | 250
nAromatic | 1 | | | Rg_Lig | 1
nPolar | 1 | | | Vina score | 1
nHydrophobic | 1 | | | Vina rescore | 1
nHelical | 1 | | | Lig_water | 1
nSheet | 1 | | | Lig_intraHB | 1
nTurn | 1 | | | |
nCoil | 1 | | | |
nHBProt | 1 | | | |
nHBProt4A | 36 | | | |
LigSASA | 1 | | | |
nRes6A | 1 | | | |
Subtotal | 73 | | 4560 | | 757
Total | 5390 | | | |

(a) Both proximity and pharmacophore descriptors were calculated using amino acids within 10 Å from the center of mass of ligand. AA amino acid, Lig ligand, Rg radius of gyration. Vina scores were calculated before and after (rescore) MD simulations

residues located as far as 20 Å from the center of the ligand, and thus all the residues in the AR-LBD domain were considered in this calculation. We also re-scored the estimated binding free energy using AutoDock Vina for all 273 complexes after the MD simulations. A total of 757 DyBDs were calculated, representing the dynamics of the AR-LBD in response to ligand binding. In addition to dyPLIDs, 1875 conventional ligand-based molecular descriptors (i.e., 1444 1D/2D descriptors and 431 3D descriptors) were calculated using the PaDEL program (Yap 2011).
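The pharmacophore counts in Table 11.5 are consistent with one distance per residue and one angle per unordered pair of residues: 95 residues give 95 × 94 / 2 = 4465 angles, and 95 + 4465 = 4560. The sketch below illustrates one plausible reading of the angle descriptor, the angle at the ligand between two residue positions. It assumes each residue and the ligand are reduced to a single point (for example a center of mass); the function name, residue labels, and coordinates are hypothetical.

```python
import math
from itertools import combinations

def angle_at_ligand(ligand, res_a, res_b):
    """Angle in degrees between the vectors ligand->res_a and ligand->res_b."""
    u = [a - l for a, l in zip(res_a, ligand)]
    v = [b - l for b, l in zip(res_b, ligand)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))

# Number of unordered residue pairs for 95 residues.
n_angles = sum(1 for _ in combinations(range(95), 2))

# Hypothetical single-frame positions (angstrom).
ligand_pos = (0.0, 0.0, 0.0)
residues = {"RES_A": (5.0, 0.0, 0.0), "RES_B": (0.0, 7.0, 0.0)}
angle = angle_at_ligand(ligand_pos, residues["RES_A"], residues["RES_B"])
print(n_angles, round(angle, 1))
```

Evaluating such a function for every frame of a trajectory would give a time series per residue pair, which is one way descriptors of this kind could capture binding-site dynamics.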

11.2.6 Feature Selection: Down-Selection of Descriptors

Both the 5390 dyPLIDs and the 1875 conventional descriptors were separately down-selected following a two-step feature selection process (Table 11.6 and Fig. 11.1). This process aimed to facilitate faster learning and more stable models by removing non-informative and redundant descriptors, namely those with zero or near-zero variance and those displaying high collinearity (Kuhn and Johnson 2013). The first step removed the descriptors with near-zero variance, which were defined as those containing one

Table 11.6 Number of descriptors remaining during the two-step feature selection process

Filter | dyPLID | Conventional
No filter | 5390 | 1875
Near-zero variance | 5371 | 431
Collinearity | 303 | 53

unique value across all chemicals or those either containing a few unique values relative to the number of chemicals (uniqueCut) or displaying a large ratio of the frequency of the most common value to the frequency of the second most common value (freqCut). The default cutoff values for uniqueCut and freqCut were set to 10 and 20, respectively. This step reduced the number of dyPLIDs to 5371 and the number of conventional descriptors to 431. The second step removed highly collinear descriptors, i.e., those with a Pearson correlation greater than 0.7 with other descriptors. Such collinear descriptors not only add complexity to the model but may also lead to numerical errors and degrade predictive performance (Kuhn and Johnson 2013). After the removal of collinear descriptors, 303 dyPLIDs and 53 conventional descriptors remained. All descriptor values were normalized using Z-score scaling.
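The two-step down-selection and the final scaling can be sketched in a few lines. This is a simplified illustration and not the authors' R code. Three choices are assumptions of the sketch: a descriptor is flagged as near-zero variance only when both the frequency-ratio and the percent-unique criteria are met, the collinearity step is a greedy pass that keeps a descriptor only if its absolute Pearson correlation with every already-kept descriptor is at most 0.7, and the Z-score uses the sample standard deviation. All descriptor names and values are invented.

```python
from collections import Counter
from statistics import mean, stdev

def is_near_zero_variance(col, freq_cut=20.0, unique_cut=10.0):
    counts = [n for _, n in Counter(col).most_common()]
    if len(counts) == 1:
        return True  # a single unique value means zero variance
    freq_ratio = counts[0] / counts[1]           # most common / second most common
    pct_unique = 100.0 * len(counts) / len(col)  # unique values per 100 samples
    # Assumption of this sketch: both criteria must hold.
    return freq_ratio > freq_cut and pct_unique <= unique_cut

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5  # undefined for constant columns

def drop_collinear(columns, cutoff=0.7):
    """Greedy filter: keep a column only if it is not highly correlated
    with any column kept so far (a simplification of the usual procedure)."""
    kept = {}
    for name, col in columns.items():
        if all(abs(pearson(col, other)) <= cutoff for other in kept.values()):
            kept[name] = col
    return kept

def z_score(col):
    mu, sd = mean(col), stdev(col)
    return [(v - mu) / sd for v in col]

# Hypothetical descriptor table: four descriptors for five chemicals.
descriptors = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.1, 3.9, 6.2, 8.0, 9.9],  # almost 2 * d1, hence collinear
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],
    "d4": [7.0, 7.0, 7.0, 7.0, 7.0],  # constant
}
informative = {k: v for k, v in descriptors.items() if not is_near_zero_variance(v)}
selected = drop_collinear(informative)
scaled = {k: z_score(v) for k, v in selected.items()}
print(list(scaled))
```

Constant columns are removed before the correlation step, which also avoids a division by zero in the Pearson calculation.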

11.2.7 Dataset Splitting

Two strategies (i.e., stratified and random splitting) were employed to divide the 273 chemicals into training and test sets. Stratified splitting divided the chemicals according to their chemical diversity, which yielded chemically diverse training and test sets. To achieve this, the pairwise Tanimoto similarity (TS) (Bajusz et al. 2015) was determined for all possible pairs among the 273 chemicals. The TS was determined using five different types of fingerprints (E-state, MACCS, Pubchem, Standard, and Shortest Path) calculated with Rcpi, an R package. The fingerprint-based similarity matrices served as input to the Divisive Analysis (DIANA) method (implemented in Caret, an R package) to compute hierarchical clusters (Kaufman and Rousseeuw 2005; Kuhn 2008). The dendrograms obtained from the cluster analysis were cut to produce 73 clusters, and one chemical was drawn from each cluster to make up the test set, whereas the rest made up the training set. Subsequently, 100 unique test sets were generated from 50 iterations in each of two selected fingerprint spaces, and the remaining chemicals in each case formed the 100 corresponding training sets. For random splitting, 100 unique training and test sets were generated by randomly drawing chemicals in 100 iterations.
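Two pieces of the stratified splitting procedure lend themselves to a short sketch: the Tanimoto similarity between two binary fingerprints and the draw of one test chemical per cluster. The clustering step itself (DIANA) is not reproduced here, so the clusters are supplied as a hypothetical input. Fingerprints are represented as sets of on-bit indices, and all names and values are illustrative.

```python
import random

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def stratified_split(clusters, seed):
    """Draw one chemical per cluster for the test set; all others train."""
    rng = random.Random(seed)
    test = [rng.choice(members) for members in clusters]
    test_set = set(test)
    train = [c for members in clusters for c in members if c not in test_set]
    return train, test

# Hypothetical fingerprints sharing 2 of 6 distinct on-bits.
similarity = tanimoto({1, 2, 3, 4}, {3, 4, 5, 6})

# Hypothetical clusters; the study cut its dendrograms into 73 clusters.
clusters = [["A", "B", "C"], ["D"], ["E", "F"]]
train, test = stratified_split(clusters, seed=1)
print(round(similarity, 3), len(train), len(test))
```

Repeating the draw with different seeds gives different test sets from the same clusters, which mirrors how multiple splits can be produced from a single dendrogram cut.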


11.2.8 Machine Learning for Quantitative AR Activity Prediction Modeling

The complete machine learning workflow is shown in Fig. 11.1. Briefly, we applied four machine learning algorithms, namely random forest (RF), support vector machines (SVM), k-nearest neighbors (kNN), and gradient boosting machines (GBM), along with three cross-validation (CV) methods (repeated k-fold, bootstrapping, and leave-one-out) to build quantitative AR activity prediction models using the 100 stratified datasets and the down-selected dyPLID descriptors (this dataset is referred to as DS). These algorithms were executed in R 3.5.2 with tenfold cross-validation and 10 repeats. A total of 1200 models (100 training sets × 4 algorithms × 3 cross-validation methods) were trained on the training sets and tested on the holdout test datasets. From these, we selected RF with repeated k-fold CV as the best-performing algorithm, as evaluated by the correlation coefficient of the test datasets: $R^2_{\mathrm{obs,pred}} = \mathrm{cov}(\mathrm{obs},\mathrm{pred}) / (\sigma_{\mathrm{obs}} \times \sigma_{\mathrm{pred}})$, where $\mathrm{cov}(\mathrm{obs},\mathrm{pred})$ is the covariance, and $\sigma_{\mathrm{obs}}$ and $\sigma_{\mathrm{pred}}$ are the standard deviations of the observed and predicted AR activity of the test set chemicals, respectively (Fig. 11.5). The top 10, 20, and 30 descriptors were identified based on the calculated percentage increase in mean-squared error (%IncMSE) using RF and subsequently used in model development. Models built with the top 20 descriptors generally produced high test set predictions and favorable paired t-test statistics (Fig. 11.6) in both DS and the stratified dataset with conventional descriptors (CS). These 20 selected descriptors differed in every run, as they were identified by RF from each training set. The statistics for this descriptor selection step were obtained from the 600 RF models generated ((100 DS datasets + 100 CS datasets) × 3 descriptor counts of 10, 20, and 30), and the significance of their means was confirmed by paired t-tests.
After selecting RF as the algorithm of choice and 20 as the best number of descriptors to use, 200 more QSAR models were generated for the randomly split training/test sets with dyPLIDs (DR) or conventional descriptors (CR). The correlation coefficient R (mean ± standard deviation, n = 100) was employed to evaluate the degree of linear relationship between the observed and the predicted AR activity. An R ≥ 0.305 would be statistically significant (α = 0.01), as the critical value of the Pearson correlation coefficient r is 0.305 when the sample size (i.e., the number of chemicals) equals 70. The final selected best-performing model was also subjected to a y-randomization test to further validate its statistical significance, i.e., that the correlation did not occur by chance (Rücker et al. 2007). The AR activity values (y) were permuted (i.e., assigned randomly to the training set chemicals) 100 times to generate a set of 100 y-randomized pseudo-models using the same descriptors and following the same model building procedure as described above. The correlation coefficients of the best-performing model were then compared with those of the pseudo-models. The y-randomization test was implemented using a custom-built script running in R. We ran the above machine learning protocol (Fig. 11.1)


Fig. 11.5 Box plots for the test set R² values of all 1200 models that were trained using 100 unique DS training/test sets. Four machine learning algorithms (i.e., RF, random forest; SVM, support vector machine; kNN, k-nearest neighbor; GBM, gradient boosting machines) were employed for model training with three cross-validation methods (i.e., a rcv, repeated cross-validation; b boot, bootstrapping; c loo, leave-one-out). Each box plot shows the statistics of the R² values obtained using 100 distinct test sets with their mean values at the top of each plot. The p-values calculated from ANOVA with Tukey post hoc test for all combinations were < 0.001 (***) except for RF-GBM (0.759)

using the down-selected dyPLIDs and the down-selected conventional descriptors to investigate if dyPLIDs could improve QSAR modeling.
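The correlation measure and the y-randomization test described in Sect. 11.2.8 can be illustrated with a small sketch. The quantity cov(obs,pred) / (σ_obs × σ_pred) is the Pearson correlation coefficient r as conventionally defined, and the sketch squares it to obtain R². To stay self-contained, a one-descriptor least-squares line stands in for the random forest (for such a fit the training R² equals the squared Pearson r between descriptor and activity); this substitution, the function names, and the synthetic data are all assumptions of the sketch.

```python
import random
from statistics import mean

def pearson_r(obs, pred):
    """cov(obs, pred) / (sd_obs * sd_pred), the Pearson correlation."""
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    sd_obs = sum((o - mo) ** 2 for o in obs) ** 0.5
    sd_pred = sum((p - mp) ** 2 for p in pred) ** 0.5
    return cov / (sd_obs * sd_pred)

def y_randomization(x, y, n_perm=100, seed=0):
    """Training R^2 of pseudo-models fitted to permuted activities.

    Stand-in model: a one-descriptor least-squares line, whose training
    R^2 is the squared Pearson r between x and the (permuted) y.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        scores.append(pearson_r(x, y_perm) ** 2)
    return scores

# Synthetic data: one descriptor and an activity that tracks it closely.
x = [float(i) for i in range(20)]
y = [2.0 * v + (-1) ** i for i, v in enumerate(x)]

real_r2 = pearson_r(x, y) ** 2
null_r2 = y_randomization(x, y)
n_better = sum(score >= real_r2 for score in null_r2)
print(f"real R2 = {real_r2:.3f}, pseudo-models at least as good: {n_better}")
```

If the real model's score sits well above every pseudo-model score, chance correlation is an unlikely explanation for its performance, which is the logic of the test applied in the chapter.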

11.3 Results and Discussion

11.3.1 Conformational Ensemble of AR-Ligand Interactions

We performed 154 AR-agonist- and 119 AR-antagonist-bound independent all-atom MD simulations. Although these simulations (except for 4) were short (6 ns), they collectively provided insights into conformational changes specific to agonists and antagonists. Longer simulations of four structurally diverse representative Tox21 compounds (Table 11.4 and Fig. 11.4) revealed that both agonists (CIDs


Fig. 11.6 Selection of number of descriptors in model development. Top 10, 20, and 30 descriptors based on their percentage increase in mean-squared error (%IncMSE) were selected and experimented. Top 20 descriptors have shown better overall predictions both in a dyPLIDs and b Convs datasets and significant paired t-test characteristics. 100 training/test set splits of each category were used in the calculations, and their mean values are shown at the top. The p-values of paired t-test for each combination in a are, 10–20: 0.1827, 10–30: 0.013, 20–30: 0.02 and in b are, 10–20: 0.1827, 10–30: 0.013, 20–30: 0.02

10041070 and 11033) exhibited comparable RMSD and residue-wise RMSF profiles (Fig. 11.7). In contrast, the antagonists (CIDs 104850 and 108938) displayed highly diverse conformational changes. This may stem from something as simple as the molecular size of the compounds, as CID108938 is considerably smaller than CID104850. Owing to its larger size and fewer stable H-bonds (Fig. 11.8), binding of CID104850 might have induced larger conformational changes than binding of CID108938, a smaller chemical with a larger number of stable H-bonds. The same was true of CID10041070, one of the agonists similar to testosterone, which also formed fewer H-bonds with AR than the smaller CID11033. RMSF profiles calculated from the 100 ns simulations showed a few distinct structural fluctuations in the ligand-bound AR complexes. The N-terminus of H2 (a long loop in AR) displayed slightly higher fluctuations (~0.7 Å higher) in antagonist complexes than in agonist complexes. The inter-helical loop formed by amino acids 726–730 between helices H4 and H5 (Fig. 11.7b) fluctuated more in the larger antagonist complex (AR-104850) than in the other systems. Conformational changes in H5 upon binding of an agonist are known to prepare AR for dimerization (Nadal et al. 2017). Residues forming H6 also displayed similar fluctuations in agonist-bound systems. The C-terminal part of the inter-helical region between H8 and H9, along with the N-terminal part of H9, showed fluctuations in all systems, particularly in AR-CID104850, the larger antagonist system. Finally, H11 and H12 displayed large conformational fluctuations in CID104850 and moderate fluctuations in both agonist systems, whereas the


Fig. 11.7 Conformational changes in the 100 ns MD simulations of four representative Tox21 compounds with diverse AR activities and structures. Two compounds (CIDs 10041070 and 11033) are agonists, whereas the other two (CIDs 104850 and 108938) are antagonists. a RMSD of Cα atoms in the 100 ns simulations shown along with their population densities. b RMSF of Cα atoms in the 100 ns simulations. c Structures of agonist- or antagonist-bound AR-LBDs with every residue colored to indicate its corresponding RMSF value matching the color scale

smaller CID108938 system did not show any considerable fluctuations. Thus, any deviation from key agonist-induced conformational changes may result in aberrant AR function, since both CIDs 104850 and 108938 behaved differently from the agonists. In the AR-CID104850 complex, unique fluctuations were observed at H4 (G724-L728), the N-terminal region of H5 (H729-D732), H6 (N771-S778), and H11 to H12 (M886-S908). Binding of antagonists induced additional fluctuations in different


Fig. 11.8 Number of hydrogen bonds formed by the four representative Tox21 compounds as a function of time in the 100 ns MD simulations

parts of the protein, indicating that these conformational changes prevented downstream AR signaling, which may include (i) formation of homodimers, (ii) binding of DBD to ARE inside the nucleus, (iii) binding to a chaperoning protein that facilitates the nuclear translocation of AR, or (iv) conformational changes in the NTD of AR. Based on the human LBD-LBD homodimeric X-ray structure (PDB ID. 5JJM) (Nadal et al. 2017), the C-termini of H1 and H5 and the N-terminus of H8 form the central core of the LBD-LBD interface (Fig. 11.2a). The N-termini of H3 and H7, along with the C-terminus of H11, form the bottom of the binding site. RMSF profiles of these regions showed high fluctuations in the MD simulations, and this part of AR may act as an entry gate for ligand binding (Fig. 11.2a). In addition, the N-terminus of H10, connected to H11, could form an interface with the DBD of the same protomer, but further experimental evidence is necessary since the structure of the hinge region between DBD and LBD is unknown (Fig. 11.2). Based on this structural information, we surmise that the observed conformational fluctuations at H5 and H8 and the N-terminus of H11 may directly interfere with the conformational changes required for AR homodimerization and LBD-DBD interactions, thereby interfering with DNA binding. Since a reliable 3D structure of the NTD of AR is not available, ligand-binding-induced conformational changes in the NTD were not studied (Fig. 11.2b).

11.3.2 Comparison of 6 ns Versus 100 ns Simulations

A few structural analyses, including the RMSD of the key helices (H3, H5, H11, H12) forming the ligand-binding site, were performed and compared between the 6 ns and 100 ns simulations (Fig. 11.9). The average Cα RMSD value of helix H3 was 0.96 ± 0.27 Å for the 6 ns simulation, compared to 1.32 ± 0.27 Å for the 100 ns simulation (Fig. 11.9b). H3 is a 26-residue-long helix, and the average Cα RMSD difference between the 6 and 100 ns data was only 0.36 Å. Helices H5, H11, and H12 were


30, 19, and 17 residues long, respectively. The average RMSD differences between their 6 ns and 100 ns simulations were 0.21, 0.27, and 0.47 Å, respectively. H5 forms part of the dimeric interface, while H12 is the key helix that serves as the lid of the ligand-binding pocket and is generally repositioned to further stabilize the ligand (Gao et al. 2005; Tan et al. 2015a, b). The relatively high average RMSD difference in H12 (0.47 Å) also confirms the dynamic nature of this region. The average RMSF profiles calculated from the four 100 ns AR-docked complexes also displayed similar results between the 6 and 100 ns data (Fig. 11.9c). Once again, the H12 helical region showed relatively higher RMSF values than the others (H1 to H11), confirming its conformational dynamics. Nevertheless, the difference (0.32 Å) in the average RMSF values of H12 between the 6 ns and 100 ns data was minimal compared to that of other helical regions. These observations indicate that most of the ligand-induced conformational changes in AR-LBD were captured in the 6 ns simulations.
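The 6 ns versus 100 ns comparison above amounts to computing the same per-atom statistic over two windows of a trajectory. A minimal sketch of a per-atom RMSF calculation follows, assuming the frames have already been aligned to a common reference and that each frame lists one position per Cα atom. The toy trajectory and the window length are invented for illustration and do not come from the study.

```python
import math

def rmsf_per_atom(frames):
    """Root-mean-square fluctuation of each atom about its mean position.

    frames: list of frames, each a list of (x, y, z) tuples, one per atom,
    assumed to be pre-aligned to a common reference structure.
    """
    n_frames, n_atoms = len(frames), len(frames[0])
    result = []
    for i in range(n_atoms):
        avg = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        msd = sum(
            sum((f[i][k] - avg[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result

# Toy trajectory: atom 0 is rigid, atom 1 oscillates by +/-1 along x.
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
]
rmsf_full = rmsf_per_atom(trajectory)       # analogous to the long run
rmsf_short = rmsf_per_atom(trajectory[:2])  # analogous to an early window
differences = [abs(a - b) for a, b in zip(rmsf_full, rmsf_short)]
print(rmsf_full, rmsf_short, differences)
```

Small differences between the short-window and full-trajectory values are the kind of evidence the chapter uses to argue that the early segment captures most of the motion.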

Fig. 11.9 Structural stability analysis of four 100 ns MD simulations of two agonists and two antagonists (cf. Figure 11.3a for their chemical structures). a Structure of LBD highlighting the helices that form the ligand-binding pocket in different colors and are labeled. b Average RMSD (and standard deviation, n = 4) of the key helices calculated using 6 ns (gray) and 100 ns (black) MD simulation data are compared. c RMSF (and standard deviation, n = 4) of LBD residues using 6 ns (gray) and 100 ns (black) MD simulation data. The LBD of AR is formed by residues between 669 and 920. The black bars at the top mark the locations of 12 helices of LBD


Fig. 11.10 Chemical similarity-based clustering of the 273 Tox21 AR-active chemicals used in this study. a Similarity matrices of pairwise Tanimoto similarity in the chemical space of five types of fingerprints. Clustering dendrograms derived from b the standard fingerprints and c the shortest path fingerprints. These dendrograms were used in stratified splitting of the dataset

11.3.3 Fingerprint Chemical Diversity

The chemical diversity of the 273 chemicals in the dataset was analyzed using 5 different fingerprints, namely E-state, MACCS, Pubchem, Standard, and Shortest Path (Fig. 11.10). Similarity matrices were generated using each fingerprint with the pairwise TS score for all the chemicals. Out of the 5 fingerprints, the shortest path and standard fingerprints explained more than 95% of the dataset compounds with a TS score of 0.7 or less, i.e., 70% or less pairwise similarity to other compounds in the dataset. In fact, the majority of the compounds within the standard and shortest path fingerprint spaces showed a TS score of less than 0.4 (40% or less pairwise similarity). The similarity matrices generated using these two fingerprints facilitated the hierarchical clustering to pick the most diverse clusters. The other fingerprint spaces failed to explain the diverse nature of the dataset, particularly within the agonist compounds (Fig. 11.10a), which stems from the fact that most agonists contain the cyclopentanophenanthrene (steroidal) scaffold, as rational molecular design evolved from testosterone, the natural agonist of AR. Antagonists in the dataset were mostly chemically diverse, as there was no natural restriction in designing compounds with new scaffolds. From the clusters, 100 unique training and holdout test datasets (i.e., 50 each from two fingerprints) were generated representing the chemical diversity observed within these fingerprint spaces and used in QSAR modeling along with 100 randomly generated training and holdout test sets.

11.3.4 Predictive QSAR Model

Four different machine learning algorithms (RF, SVM, kNN, and GBM) were used in generating quantitative prediction models with three cross-validation (CV) methods using the 100 DS training/test sets. From the 1200 generated models (4 algorithms


× 3 CV × 100 DS), the RF algorithm with the repeated CV method generated models with higher overall holdout test set prediction R² values than the other algorithms (paired t-test: p < 0.05 versus SVM, kNN, and GBM), with a mean of 0.533 ± 0.092 and a maximum/minimum value of 0.818/0.213 across the 100 models (Fig. 11.5). Next to RF, the GBM-generated models displayed a test set R² distribution with an average value of 0.515 ± 0.098 and a maximum/minimum value of 0.749/0.199. The other two algorithms (SVM and kNN) performed poorly, with average test set R² values of 0.218 ± 0.125 and 0.282 ± 0.084, respectively. This trend of algorithm performance (i.e., RF > GBM > kNN and SVM) held for all three CV methods (Fig. 11.5). The repeated CV method outperformed the other CV methods. Thus, the model generated using dyPLID descriptors with the chemically stratified dataset (DS) predicted new compounds with a higher performance than the models generated using conventional descriptors. These dyPLID models also provided valuable insights into key protein–ligand interactions that are difficult to obtain from conventional descriptors. In addition to the holdout test set prediction, the y-randomization test also supported the statistical robustness of the dyPLID-based model, as none of the 100 randomized models displayed a higher correlation with the activity profiles of the dataset (Fig. 11.11). The highest training and test set R² values of the randomized models were 0.841 and 0.164, respectively, compared to 0.950 and 0.818 for the dyPLIDs-based model (Fig. 11.12a). Hence, the selected DS model indeed explained the activity profile in the dataset and was not generated by chance correlation. The predictive performance of CS improved when both dyPLID and conventional descriptors were used together in model development (Fig. 11.12c). The combined model (DCS) showed maximum R² values of 0.974 (0.968 ± 0.002) and 0.646 (0.492 ± 0.09) for the training and test sets, respectively (Fig. 11.13).
Due to the observed performance, RF with repeated k-fold CV was selected as the method of choice for generating further models using dyPLIDs and conventional

Fig. 11.11 Y-randomization test for the best model selected from dyPLIDs dataset with the best performance in test set prediction. Shown are the training set (x-axis) and the test set (y-axis) correlation coefficient (R²) values derived from the best model (black triangle) and 100 y-randomized models (open circles)


Fig. 11.12 Influence of descriptors and data-splitting strategies on the performance of the best-performing random forest (RF) models. Correlation coefficients (R²) were derived between experimental and predicted AR activities for the training (black) and the test set (red) combination with the highest holdout test set prediction. Panels show the a DS, b CS, c DR, and d CR datasets; the corresponding residual plots are shown in Fig. 11.14. Models built using dyPLIDs and chemically stratified data splits displayed the highest holdout test dataset prediction compared to the models using conventional descriptors and randomly split data

Fig. 11.13 Comparison of test set correlation coefficients (R²) obtained for five different combinations of 100 split datasets and descriptors. DR and CR represent randomly split data with dyPLIDs and conventional descriptors, respectively; DS and CS represent chemically stratified splits with dyPLIDs and conventional descriptors, respectively. DCS represents the combination of both DS and CS. Corresponding mean values of each category are shown at the top of the plot. Statistical significance in means of different datasets is shown in Table 11.7


descriptors along with uniquely split datasets. In addition, we generated 600 RF models using the top 10, 20, and 30 descriptors of the DS and CS datasets to determine the optimal number of descriptors for model generation. Models generated using 20 descriptors were statistically significant when compared to the others (Fig. 11.6), and thus, the top 20 descriptors were used in further model generation. One of the models generated by the RF algorithm using dyPLID descriptors and the chemically stratified data split (DS) displayed the highest prediction performance compared to other algorithms, with a training set correlation coefficient (R²) of 0.950 (mean ± standard deviation = 0.962 ± 0.003) and a holdout test set R² of 0.818 (mean ± standard deviation = 0.529 ± 0.092) (Figs. 11.12a and 11.13). The best model generated using the same algorithm with conventional descriptors and the chemically stratified dataset (CS) showed a training set R² of 0.947 (0.943 ± 0.003) but a holdout test set R² of only 0.523 (0.339 ± 0.102) (Figs. 11.12b and 11.13). Conventional descriptors with the randomly split dataset (CR) generated a model with better test set prediction than CS, with training and test set R² values of 0.945 (0.944 ± 0.003) and 0.615 (0.439 ± 0.099), respectively (Figs. 11.12b, d and 11.13). The statistical significance of the means of these models is given in Table 11.7. The residual values, i.e., the differences between experimental and predicted biological activities, for the best models from the DS, CS, DR, and CR datasets are given in Fig. 11.14.

Fig. 11.14 Residuals between experimental and predicted activity values. Residual values of test set chemicals present in the a DS, b CS, c DR, and d CR datasets. Agonists, gray; antagonists, red. The dotted lines across the plots mark one standard deviation from the mean of the data, and any chemical with a residual value above 1 is considered an outlier in terms of the activity prediction


Table 11.7 Significance testing between test set prediction (R²) among different types of data sets used in this study (cf. Fig. 11.6). P-values from paired t-test for within each type (i.e., random or stratified splits) and unpaired t-test for between types are given

Type | N | Mean | SD | Std. Error
DR | 100 | 0.505 | 0.084 | 0.008
CR | 100 | 0.439 | 0.099 | 0.009
DS | 100 | 0.533 | 0.092 | 0.009
CS | 100 | 0.339 | 0.102 | 0.010
DCS | 100 | 0.492 | 0.091 | 0.009

Paired t-test
Comparison | t | df | 95% C.I. Lower | 95% C.I. Upper | P value
DR ~ CR | 5.227 | 99 | 0.041 | 0.091 | ***
DS ~ CS | 14.106 | 99 | 0.163 | 0.217 | ***
DS ~ DCS | 3.458 | 99 | 0.016 | 0.059 | ***
CS ~ DCS | −33.199 | 99 | −0.162 | −0.144 | ***

Two-Sample t-test
Comparison | t | df | 95% C.I. Lower | 95% C.I. Upper | P value
DR ~ DS | −1.925 | 198 | −0.049 | 0.0006 | 0.056
DR ~ CS | 12.494 | 198 | 0.140 | 0.192 | ***
DR ~ DCS | 1.056 | 198 | −0.011 | 0.038 | 0.293
CR ~ DS | −6.671 | 198 | −0.117 | −0.063 | ***
CR ~ CS | 7.031 | 198 | 0.072 | 0.128 | ***
CR ~ DCS | −3.928 | 198 | −0.079 | −0.026 | ***

*p 0.95 were removed from the dataset. Descriptors/fingerprints with many missing values were also removed.


W. Tang et al.

The model constructed with an extra tree achieved an AUC of 0.95 for predicting disruptors of the mitochondrial membrane potential. Hemmerich et al. (2020) developed machine learning models for predicting mitochondrial dysfunction by merging data from Zhang et al. (2009), Tox21 (Attene-Ramos et al. 2015), ChEMBL (Mendez et al. 2019) and PubChem (Kim et al. 2019). After data preparation, the dataset contained 5761 chemicals, with 824 mitochondrial toxicants and 4937 non-mitochondrial toxicants. Models generated with random forest, gradient boosting and deep learning achieved balanced accuracies of 0.866, 0.894 and 0.895, respectively.

These mitochondrial dysfunction QSAR models were constructed from imbalanced datasets, in which the number of chemicals in one class is markedly higher than that in the other class (Li et al. 2016). Previous studies showed that QSAR models trained on imbalanced datasets perform poorly (Guo et al. 2017). Therefore, methods to address this data imbalance are needed. Tang et al. (2020) developed mitochondrial dysfunction QSAR models based on the imbalanced Tox21 mitochondrial dysfunction data using machine learning methods (i.e., support vector machine, random forest, classification and regression tree, logistic regression and naive Bayes) and 12 types of fingerprints. The dataset for model construction included 1284 active and 3527 inactive chemicals. To address the imbalance in the dataset, a threshold-moving method was employed, in which the proportion of active chemicals was used as the classification threshold for the discriminant models. Chemicals were classified as active if their classification probabilities were greater than 0.267 (1284/4811 = 0.267) and as inactive otherwise. Consensus models were also constructed by averaging the classification probabilities of the individual models.
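The threshold-moving and probability-averaging steps described above can be sketched as follows. This is a minimal illustration, not the code of Tang et al. (2020): the function names are hypothetical and the three model probabilities are invented for the example; only the class counts (1284 active, 3527 inactive) come from the text.

```python
def moved_threshold(n_active, n_inactive):
    """Classification threshold set to the proportion of active chemicals."""
    return n_active / (n_active + n_inactive)


def consensus_probability(model_probs):
    """Consensus by averaging the active-class probabilities of individual models."""
    return sum(model_probs) / len(model_probs)


def classify(prob, threshold):
    """Label a chemical as active when its probability exceeds the threshold."""
    return "active" if prob > threshold else "inactive"


threshold = moved_threshold(1284, 3527)

# Hypothetical active-class probabilities from three fingerprint-specific models.
probs = [0.22, 0.35, 0.31]
consensus = consensus_probability(probs)

print(round(threshold, 3), round(consensus, 3), classify(consensus, threshold))
```

With the conventional threshold of 0.5, the same chemical would be labelled inactive, which illustrates why moving the threshold toward the minority-class proportion raises sensitivity on imbalanced data.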
A total of 4083 consensus models were constructed for each machine learning algorithm by combining different numbers (2–12) of the 12 individual models. The results of Tang et al. (2020) demonstrated that consensus models outperformed individual models for mitochondrial dysfunction. The best consensus model was constructed with the random forest algorithm and achieved BA, AUC, SE and SP of 81.8%, 89.9%, 82.9% and 80.7%, respectively, in tenfold cross-validation. The performance of the best consensus model is similar to that of the deep learning model for mitochondrial dysfunction in the Tox21 challenge. The small size of the dataset may be the reason why the deep learning model did not outperform the RF model. Compared with deep learning algorithms, which require considerable computational time, traditional machine learning algorithms such as RF may be a better choice for constructing QSAR models from limited training data.

The mitochondrial dysfunction datasets in the Tox21 project were produced by a mitochondrial membrane potential assay (Attene-Ramos et al. 2015). The mitochondrial membrane potential can be altered by different toxicity mechanisms, but models trained on these data cannot be used to predict chemicals that act through other mechanisms of mitochondrial dysfunction, such as inhibition of mitochondrial fusion and fission (Meyer et al. 2017). Mitochondrial fusion and fission are critical for maintaining mitochondrial function (Westermann 2010; Youle and van der Bliek 2012). Inhibition of mitochondrial fusion and fission can lead to detrimental outcomes such as neurodegenerative disorders.
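The figure of 4083 consensus models per algorithm quoted earlier in this section is consistent with counting every subset of 2 to 12 models drawn from the 12 individual models. The short check below assumes that each consensus model corresponds to one such unordered subset; the source does not spell out the enumeration.

```python
from math import comb

n_models = 12

# Number of ways to choose k of the 12 individual models, summed over k = 2..12.
n_consensus = sum(comb(n_models, k) for k in range(2, n_models + 1))
print(n_consensus)
```

Equivalently, this is all 2^12 subsets minus the empty set and the 12 single-model subsets.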

17 Machine Learning-Based QSAR Models and Structural Alerts …


Tang et al. (2021) developed QSAR models for predicting the inhibition of mitochondrial fusion and fission using machine learning algorithms (random forest, deep neural network, logistic regression and Bernoulli naive Bayes) based on data collected from the PubChem database and additional literature. For inhibition of both mitochondrial fusion and fission, the best models were constructed with the logistic regression algorithm. One hundred repetitions of fivefold cross-validation showed that the best model for predicting inhibitors of mitochondrial fission yielded AUC, BA, SE and SP of 84.3%, 76.9%, 76.6% and 77.3%, respectively, and the best model for predicting inhibitors of mitochondrial fusion had AUC, BA, SE and SP of 82.8%, 75.4%, 76.5% and 74.3%, respectively. External validation showed that the best model for mitochondrial fission yielded AUC, BA, SE and SP of 97.5%, 82.5%, 65.0% and 100.0%, respectively, and the best model for mitochondrial fusion had AUC, BA, SE and SP of 78.1%, 69.7%, 54.2% and 85.2%, respectively. Table 17.1 shows the existing machine learning-based QSAR models for predicting mitochondrial dysfunction.

Table 17.1 Machine learning QSAR models for predicting mitochondrial dysfunction

| References | Endpoints | Algorithms | Data size | Model performance |
|---|---|---|---|---|
| Abdelaziz et al. (2016) | Disruption of MMP | Associative neural network | 5941 | BA = 90.4%, AUC = 95.0% |
| Mayr et al. (2016) | Disruption of MMP | Deep neural network | 5941 | BA = 71.4%, AUC = 94.1% |
| Barta (2016) | Disruption of MMP | Random forest | 5941 | BA = 69.2%, AUC = 94.6% |
| Hemmerich et al. (2020) | Disruption of MMP | Deep neural network | 5761 | BA = 89.5% |
| Tang et al. (2020) | Disruption of MMP | Random forest | 4811 | AUC = 89.9%, BA = 81.8%, SE = 82.9%, SP = 80.7% |
| Tang et al. (2021) | Inhibition of mitochondrial fusion | Logistic regression | 1146 | AUC = 82.8%, BA = 75.4%, SE = 76.5%, SP = 74.3% |
| Tang et al. (2021) | Inhibition of mitochondrial fission | Logistic regression | 1473 | AUC = 84.3%, BA = 76.9%, SE = 76.6%, SP = 77.3% |
| Zhang et al. (2009) | Mitochondrial dysfunction | Support vector machine | 288 | Acc_cv = 84.6%, Acc_ext = 77.1% |
| Zhang et al. (2017) | Mitochondrial dysfunction | Naive Bayes | 288 | Acc_cv = 95.0%, Acc_ext = 81.0% |
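The balanced accuracies reported for the models of Tang et al. (2020, 2021) can be cross-checked against the reported sensitivity (SE) and specificity (SP). The sketch below uses the usual definition for binary classification, BA = (SE + SP) / 2; the chapter does not state the formula, so treating it as the one used by the authors is an assumption. The dictionary keys are descriptive labels written for this example.

```python
def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy as the mean of sensitivity and specificity (in %)."""
    return (sensitivity + specificity) / 2.0


# (SE, SP, reported BA) in percent, as quoted in the text and Table 17.1.
reported = {
    "Tang 2020, MMP, tenfold CV": (82.9, 80.7, 81.8),
    "Tang 2021, fission, fivefold CV": (76.6, 77.3, 76.9),
    "Tang 2021, fusion, fivefold CV": (76.5, 74.3, 75.4),
    "Tang 2021, fission, external": (65.0, 100.0, 82.5),
    "Tang 2021, fusion, external": (54.2, 85.2, 69.7),
}

differences = {}
for name, (se, sp, ba) in reported.items():
    differences[name] = abs(balanced_accuracy(se, sp) - ba)
    print(f"{name}: computed {balanced_accuracy(se, sp):.2f}, reported {ba}")
```

All five computed values agree with the reported ones to within rounding. The external validation figures also show how a high BA can coexist with modest sensitivity when specificity is very high.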


17.3.2 Structural Alerts for Mitochondrial Dysfunction

Most of the mitochondrial dysfunction structural alerts published in the literature are associated with uncoupling of oxidative phosphorylation. Naven et al. (2013) identified 11 structural alerts for uncoupling of oxidative phosphorylation based on assay data generated by respiratory screening technology, which measured oxygen consumption in isolated mitochondria. The 11 structural alerts achieved a sensitivity of 68% and a specificity of over 99%. Naven et al. (2013) found that lipophilicity and acidity are key factors affecting the uncoupling activity of chemicals. Based on the lipophilicity and acidity of compounds, Enoch et al. (2018) developed a decision tree model that assigns compounds to three levels of concern (i.e., no concern, moderate concern and high concern) for uncoupling of oxidative phosphorylation. The model was employed to screen 31,778 chemicals collected from the OECD QSAR Toolbox. From the 432 chemicals regarded as being of high concern, 12 new structural alerts related to uncoupling of oxidative phosphorylation were identified. Tang et al. (2020) identified structural alerts related to mitochondrial membrane potential disruption with three substructure analysis methods, i.e., information gain, substructure frequency analysis, and MoSS (a node in KNIME software), based on the Tox21 dataset of mitochondrial dysfunction. The results of the substructure analysis methods showed that phenolic, aromatic, nitro, aryl chloride, carboxylic acid and carboxylic acid derivative substructures are significant for classifying compounds with mitochondrial dysfunction.

Some studies reported mitochondrial dysfunction structural alerts for other toxicity mechanisms. Nelms et al. (2015) identified mitochondrial dysfunction structural alerts based on data from 288 chemicals collected from Zhang et al. (2009). The chemicals were divided into eight categories based on structural similarity. Mitochondrial dysfunction structural alerts were identified from these categories, covering five toxicity mechanisms: inhibition of the electron transport chain, initiation of the death receptor pathway, alternative electron acceptance, induction of the membrane permeability transition and uncoupling of oxidative phosphorylation.

Tang et al. (2021) identified structural alerts related to inhibition of mitochondrial fusion and fission using the SARpy method. SARpy can extract key substructures associated with the toxicity of organic chemicals from SMILES codes. For inhibition of mitochondrial fusion, seven of the 45 substructures extracted by SARpy were found to be good structural alerts. The 45 substructures for inhibition of mitochondrial fusion included nitrogen-containing derivatives such as pyrimidine, amino, pyrazole and triazine. For inhibition of mitochondrial fission, 56 substructures were extracted, including nitrogen-containing derivatives, phenolic chemicals and halogenated benzenes. These substructures can achieve balanced accuracies of 75.4% and 81.0%, respectively, indicating that they can serve as structural alerts for identifying inhibitors of mitochondrial fusion and fission.
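Information gain, one of the substructure analysis methods mentioned in this section, scores a substructure by how much knowing its presence reduces the uncertainty about the activity class. The sketch below shows the generic calculation for a single binary substructure flag; it is not the workflow of Tang et al. (2020), and the eight compounds are invented for the example.

```python
import math


def entropy(labels):
    """Shannon entropy (bits) of a list of 0/1 activity labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_active = sum(labels) / n
    return -sum(q * math.log2(q) for q in (p_active, 1.0 - p_active) if q > 0)


def information_gain(has_substructure, is_active):
    """Entropy of the activity labels minus the entropy after splitting on the substructure."""
    with_sub = [a for h, a in zip(has_substructure, is_active) if h]
    without_sub = [a for h, a in zip(has_substructure, is_active) if not h]
    n = len(is_active)
    conditional = (len(with_sub) / n) * entropy(with_sub) \
        + (len(without_sub) / n) * entropy(without_sub)
    return entropy(is_active) - conditional


# Hypothetical data: 1 = substructure present / compound active, 0 = absent / inactive.
has_substructure = [1, 1, 1, 1, 0, 0, 0, 0]
is_active = [1, 1, 1, 0, 0, 0, 0, 1]

gain = information_gain(has_substructure, is_active)
print(round(gain, 4))
```

A substructure that separates actives from inactives perfectly reaches the full label entropy (1 bit for a balanced set), while one that is unrelated to activity scores zero, so ranking substructures by this value highlights candidate structural alerts.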


17.4 Conclusions and Future Directions

Benefiting from the advancement of high-throughput screening, a large quantity of experimental data on mitochondrial dysfunction has been produced and shared in public databases. QSAR models for predicting mitochondrial dysfunction have been developed from these data using machine learning methods. The models are likely to be useful for toxicologists and regulatory agencies in generating early assessments of chemical toxicity. Although the models have satisfactory performance, they are classification models that only discriminate whether compounds are mitochondrial toxicants. Classification models cannot provide a quantitative prediction of mitochondrial dysfunction activity. Therefore, regression models that predict mitochondrial dysfunction activity as continuous values, such as 50% effective concentrations, should be developed in further studies. The existing models of mitochondrial dysfunction are single-task models that can only predict one endpoint. As mitochondrial dysfunction can be caused through multiple toxicity pathways, it would be attractive to construct multi-task QSAR models that simultaneously predict multiple endpoints of mitochondrial dysfunction.

References

Abdelaziz A, Spahn-Langguth H, Schramm K, Tetko IV (2016) Consensus modeling for HTS assays using in silico descriptors calculates the best balanced accuracy in Tox21 challenge. Front Environ Sci 4:2
Attene-Ramos MS, Huang RL, Michael S, Witt KL, Richard A, Tice RR, Simeonov A, Austin CP, Xia MH (2015) Profiling of the Tox21 chemical collection for mitochondrial function to identify compounds that acutely decrease mitochondrial membrane potential. Environ Health Perspect 123:49–56
Balaz S, Sturdik E, Durcova E, Antalik M, Sulo P (1986) Quantitative structure-activity relationship of carbonylcyanide phenylhydrazones as uncouplers of mitochondrial oxidative-phosphorylation. Biochim Biophys Acta 851:93–98
Barta G (2016) Identifying biological pathway interrupting toxins using multi-tree ensembles. Front Environ Sci 4:52
Basile AO, Yahi A, Tatonetti NP (2019) Artificial intelligence for drug toxicity and safety. Trends Pharmacol Sci 40:624–635
Breiman L (2001) Random forests. Mach Learn 45:5–32
Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, Dearden J, Gramatica P, Martin YC, Todeschini R, Consonni V, Kuz’min VE, Cramer R, Benigni R, Yang C, Rathman J, Terfloth L, Gasteiger J, Richard A, Tropsha A (2014) QSAR modeling: where have you been? Where are you going to? J Med Chem 57:4977–5010
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297
Dietterich TG (2000) An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Mach Learn 40:139–157
Enoch SJ, Schultz TW, Popova IG, Vasilev KG, Mekenyan OG (2018) Development of a decision tree for mitochondrial dysfunction: uncoupling of oxidative phosphorylation. Chem Res Toxicol 31:814–820


Ferrari T, Cattaneo D, Gini G, Bakhtyari NG, Manganaro A, Benfenati E (2013) Automatic knowledge extraction from chemical structures: the case of mutagenicity prediction. SAR QSAR Environ Res 24:631–649
Fetterman JL, Sammy MJ, Ballinger SW (2017) Mitochondrial toxicity of tobacco smoke and air pollution. Toxicology 391:18–33
Guo H, Li Y, Jennifer S, Gu M, Huang Y, Gong B (2017) Learning from class-imbalanced data: review of methods and applications. Expert Syst Appl 73:220–239
Hemmerich J, Troger F, Fuezi B, Ecker GF (2020) Using machine learning methods and structural alerts for prediction of mitochondrial toxicity. Mol Inform 39(5):e2000005
Huang R, Xia M (2017) Editorial: Tox21 challenge to build predictive models of nuclear receptor and stress response pathways as mediated by exposure to environmental toxicants and drugs. Front Environ Sci 5:3
Inglese J, Auld DS, Jadhav A, Johnson RL, Simeonov A, Yasgar A, Zheng W, Austin CP (2006) Quantitative high-throughput screening: a titration-based approach that efficiently identifies biological activities in large chemical libraries. Proc Natl Acad Sci U S A 103:11473–11478
Kavlock R, Dix D (2010) Computational toxicology as implemented by the US EPA: providing high throughput decision support tools for screening and assessing chemical exposure, hazard and risk. J Toxicol Env Health-Pt b-Crit Rev 13:197–217
Kim S, Chen J, Cheng TJ, Gindulyte A, He J, He SQ, Li QL, Shoemaker BA, Thiessen PA, Yu B, Zaslavsky L, Zhang J, Bolton EE (2019) PubChem 2019 update: improved access to chemical data. Nucleic Acids Res 47:D1102–D1109
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444
Lemon SC, Roy J, Clark MA, Friedmann PD, Rakowski W (2003) Classification and regression tree analysis in public health: methodological review and comparison with logistic regression. Ann Behav Med 26:172–181
Li YJ, Guo HX, Liu X, Li YA, Li JL (2016) Adapted ensemble classification algorithm based on multiple classifier system and feature selection for classifying multi-class imbalanced data. Knowl-Based Syst 94:88–104
Lim W, Yang C, Jeong M, Bazer FW, Song G (2017) Coumestrol induces mitochondrial dysfunction by stimulating ROS production and calcium ion influx into mitochondria in human placental choriocarcinoma cells. Mol Hum Reprod 23:786–802
Lowe CN, Williams AJ (2021) Enabling high-throughput searches for multiple chemical data using the US-EPA CompTox chemicals dashboard. J Chem Inf Model 61:565–570
Mayr A, Klambauer G, Unterthiner T, Hochreiter S (2016) DeepTox: toxicity prediction using deep learning. Front Environ Sci 3:80
Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, Felix E, Magarinos MP, Mosquera JF, Mutowo P, Nowotka M, Gordillo-Maranon M, Hunter F, Junco L, Mugumbate G, Rodriguez-Lopez M, Atkinson F, Bosc N, Radoux C, Segura-Cabrera A, Hersey A, Leach AR (2019) ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 47:D930–D940
Meyer JN, Leuthner TC, Luz AL (2017) Mitochondrial fusion, fission, and mitochondrial toxicity. Toxicology 391:42–53
Miyoshi H, Nishioka T, Fujita T (1987) Quantitative relationship between protonophoric and uncoupling activities of substituted phenols. Biochim Biophys Acta 891:194–204
Naven RT, Swiss R, Klug-McLeod J, Will Y, Greene N (2013) The development of structure-activity relationships for mitochondrial dysfunction: uncoupling of oxidative phosphorylation. Toxicol Sci 131:271–278
Nelms MD, Mellor CL, Cronin MT, Madden JC, Enoch SJ (2015) Development of an in silico profiler for mitochondrial toxicity. Chem Res Toxicol 28:1891–1902
Picard M, Wallace DC, Burelle Y (2016) The rise of mitochondria in medicine. Mitochondrion 30:105–116
Polishchuk P (2017) Interpretation of quantitative structure-activity relationship models: past, present, and future. J Chem Inf Model 57:2618–2639


Raies AB, Bajic VB (2016) In silico toxicology: computational methods for the prediction of chemical toxicity. Wiley Interdiscip Rev-Comput Mol Sci 6:147–172
Ren YY, Zhou LC, Yang L, Liu PY, Zhao BW, Liu HX (2016) Predicting the aquatic toxicity mode of action using logistic regression and linear discriminant analysis. SAR QSAR Environ Res 27:721–746
Rhee SG (2006) H2O2, a necessary evil for cell signaling. Science 5782:1882–1883
Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117
Spycher S, Smejtek P, Netzeva TI, Escher BI (2008) Toward a class-independent quantitative structure-activity relationship model for uncouplers of oxidative phosphorylation. Chem Res Toxicol 21:911–927
Sun HM (2005) A naive Bayes classifier for prediction of multidrug resistance reversal activity on the basis of atom typing. J Med Chem 48:4031–4039
Tang W, Chen J, Wang Z, Xie H, Hong H (2018) Deep learning for predicting toxicity of chemicals: a mini review. J Environ Sci Health Pt C-Environ Carcinog Ecotoxicol Rev 36:252–271
Tang W, Chen J, Hong H (2020) Discriminant models on mitochondrial toxicity improved by consensus modeling and resolving imbalance in training. Chemosphere 253:126768
Tang W, Chen J, Hong H (2021) Development of classification models for predicting inhibition of mitochondrial fusion and fission using machine learning methods. Chemosphere 273:128567
Thomas RS, Paules RS, Simeonov A, Fitzpatrick SC, Crofton KM, Casey WM, Mendrick DL (2018) The US federal Tox21 program: a strategic and operational plan for continued leadership. ALTEX-Altern Anim Exp 35:163–168
UNEP (2019) Global chemicals outlook II. From legacies to innovative solutions: implementing the 2030 agenda for sustainable development – synthesis report. United Nations Environment Programme, 2019. https://www.unenvironment.org/resources/report/global-chemicals-outlook-ii-legacies-innovative-solutions. (Accessed 2021-12-3)
Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, Li B, Madabhushi A, Shah P, Spitzer M, Zhao SR (2019) Applications of machine learning in drug discovery and development. Nat Rev Drug Discov 18:463–477
Vo AH, Van Vleet TR, Gupta RR, Liguori MJ, Rao MS (2020) An overview of machine learning and big data for drug toxicity evaluation. Chem Res Toxicol 33:20–37
Wallace DC (2011) Bioenergetic origins of complexity and disease. Metab Dis 76:1–16
Wallace KB, Starkov AA (2000) Mitochondrial targets of drug toxicity. Annu Rev Pharmacol Toxicol 40:353–388
Wang Z, Chen J, Qiao X, Li X, Xie H (2016) Computational toxicology: oriented for chemicals risk assessment. Sci Sin Chim 46:222–240
Wang Z, Walker GW, Muir DCG, Nagatani-Yoshida K (2020a) Toward a global understanding of chemical pollution: a first comprehensive analysis of national and regional chemical inventories. Environ Sci Technol 54(5):2575–2584
Wang Z, Chen J, Hong H (2020b) Applicability domains enhance application of PPAR gamma agonist classifiers trained by drug-like compounds to environmental chemicals. Chem Res Toxicol 33:1382–1388
Wang Z, Chen J, Hong H (2021) Developing QSAR models with defined applicability domains on PPAR gamma binding affinity using large data sets and machine learning algorithms. Environ Sci Technol 55:6857–6866
Westermann B (2010) Mitochondrial fusion and fission in cell life and death. Nat Rev Mol Cell Biol 11:872–884
Wills LP, Beeson GC, Hoover DB, Schnellmann RG, Beeson CC (2015) Assessment of ToxCast phase II for mitochondrial liabilities using a high-throughput respirometric assay. Toxicol Sci 146:226–234
Yang HB, Lou CF, Li WH, Liu GX, Tang Y (2020) Computational approaches to identify structural alerts and their applications in environmental toxicology and drug discovery. Chem Res Toxicol 33:1312–1322


Youle RJ, van der Bliek AM (2012) Mitochondrial fission, fusion, and stress. Science 337:1062–1065
Zhang H, Chen QY, Xiang ML, Ma CY, Huang Q, Yang SY (2009) In silico prediction of mitochondrial toxicity by using GA-CG-SVM approach. Toxicol Vitro 23:134–140
Zhang H, Yu P, Ren JX, Li XB, Wang HL, Ding L, Kong WB (2017) Development of novel prediction model for drug-induced mitochondrial toxicity by using naive Bayes classifier method. Food Chem Toxicol 110:122–129
Zolkipli-Cunningham Z, Falk MJ (2017) Clinical effects of chemical exposures on mitochondrial function. Toxicology 391:90–99

Chapter 18

Machine Learning and Deep Learning Applications to Evaluate Mutagenicity

Linlin Zhao and Catrin Hasselgren

18.1 In Silico Methods to Predict Bacterial Mutagenicity

Mutagens are chemical agents that can induce permanent, transmissible changes in DNA by producing DNA adducts, insertions, inversions, and small deletions (Ames et al. 1975; Hasselgren et al. 2019). Mutagenicity evaluation of chemical agents is an important aspect of risk assessment, for example for industrial chemical registration (Eastmond et al. 2009). To complement in vitro and in vivo testing, in silico methods, including knowledge-based methods (rule-based models), machine learning (ML), and more recently, deep learning (DL) methods, have been used to predict mutagenicity. Computational methods are now becoming more broadly accepted by regulatory agencies, starting with Registration, Evaluation, Authorization, and Restriction of Chemicals (REACH) (2016), followed by the ICH M7 guideline (2018a) and others (Ji et al. 2017; Tcheremenskaia et al. 2019; Hasselgren et al. 2019). This chapter outlines the state of the art in applying ML and DL methods to predict mutagenicity and their role in chemical risk assessment.

L. Zhao · C. Hasselgren
Genentech, Inc, 1 DNA Way, South San Francisco, CA 94080, USA

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Hong (ed.), Machine Learning and Deep Learning in Computational Toxicology, Computational Methods in Engineering & the Sciences, https://doi.org/10.1007/978-3-031-20730-3_18

18.2 Data for Modeling Mutagenicity

There are various assays that generate mutagenicity data suitable for ML and DL applications, including bacterial mutation assays and mammalian cell gene mutation assays. The bacterial reverse mutation (Ames) test (Ames et al. 1975; Maron and Ames 1983; OECD 2020) has been used as the gold standard for mutagenicity assessment. It is performed as part of the regulatory genotoxicity testing package to fulfill the requirements of new drug submissions (2018b, p. 2, 2020; Hasselgren et al. 2019) and often for the registration of chemicals and/or to assess worker safety. Please refer to the OECD guideline for the experimental protocol (OECD 2020). As a result, a large amount of bacterial mutagenicity data has been generated over the last 40+ years that can be utilized for the development of in silico models. It should be noted that the experimental protocols have changed over time, and some historical data would today not be considered compliant and would be deemed low quality. Assessing data quality is an important aspect of model building that is sometimes overlooked but can have a significant impact on the quality of the models.

Table 18.1 lists some popular mutagenicity datasets that can be used for ML and DL modeling. Since Ashby and Tennant published their papers discussing the structure–activity relationship (SAR) relating chemical structure to bacterial mutagenicity in 1988 (Ashby and Tennant 1988) and 1991 (Ashby and Tennant 1991), many mutagenicity datasets have been constructed and published. Sawatari et al. (2001) published a database containing 4224 chemicals constructed from the Japan Industrial Safety and Health Law (ISHL) database (Ji et al. 2017) and the National Toxicology Program (NTP) database (2022). The database includes 863 compounds classified as positive and 3201 classified as negative, and SARs relating 44 substructures to bacterial mutagenicity were discussed. Kazius et al. (2005) constructed a mutagenicity dataset from the Chemical Carcinogenicity Research Information System (CCRIS) database (2021a) (see Footnote 1) and several other sources, including NTP, the Comparative Toxicogenomics Database (CTD), the Genetic Activity Profile (GAP) database, and the Carcinogenic Potency Database (CPDB) (Waters et al. 1991, 2021b, c, 2022) (see Footnote 1); it contains 4337 molecular structures with corresponding bacterial mutagenicity data (2401 mutagens and 1936 non-mutagens). Around the same time, Contrera et al. (2005) published software for mutagenicity prediction using a dataset generated from the United States Environmental Protection Agency (EPA) database. The dataset contains 3338 compounds and comprises Salmonella and Escherichia coli (E. coli) reverse mutation data and data from the Bacillus subtilis rec spot test (Takigami et al. 2002). Hansen et al. (2009) published an Ames Salmonella mutagenicity benchmark dataset, which includes 6512 compounds (3503 mutagens and 3009 non-mutagens). This benchmark dataset includes mutagenicity data collected prior to 2009 and was derived from CCRIS (2021a), Helma et al. 2004, Kazius et al. 2005, Feng et al. 2003, VITIC (Judson et al. 2005), and GENE-TOX (2021d) (see Footnote 1).

Footnote 1: CCRIS, CTD, CPDB, and GENE-TOX have been retired; see https://www.nlm.nih.gov/toxnet/index.html for more information. The compounds of CCRIS and GENE-TOX have been archived in the PubChem database.

Several commercial mutagenicity databases were also developed, such as those from Leadscope® (now part of Instem) (2021e, f), MultiCASE (2019a) and Lhasa Limited (2021g). These databases overlap to some extent with Hansen's benchmark dataset, according to Hillebrecht et al. (2011). In addition, some mutagenicity datasets were compiled using additional sources (Xu et al. 2012; Kumar et al. 2021), and Hansen's benchmark data were used for validation purposes in those studies. In 2019, a large bacterial mutagenicity database containing 12,140 chemicals was partially released by the Division of Genetics and Mutagenesis, National Institute of Health Sciences (DGM/NIHS) of Japan, to help QSAR vendors validate and improve their QSAR tools for mutagenicity prediction (Honma et al. 2019).
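Before modeling, it is useful to check the class balance of a dataset and whether the quoted class counts account for every compound. The sketch below does this for three datasets using only the counts given in this section; the dictionary layout and function name are choices made for the example.

```python
# Counts as quoted in the text for each dataset.
datasets = {
    "Sawatari et al. 2001": {"total": 4224, "positive": 863, "negative": 3201},
    "Kazius et al. 2005": {"total": 4337, "positive": 2401, "negative": 1936},
    "Hansen et al. 2009": {"total": 6512, "positive": 3503, "negative": 3009},
}


def summarize(entry):
    """Fraction of positives among labelled compounds and compounds not covered by either class."""
    labelled = entry["positive"] + entry["negative"]
    return {
        "positive_fraction": entry["positive"] / labelled,
        "unaccounted": entry["total"] - labelled,
    }


summary = {name: summarize(entry) for name, entry in datasets.items()}
for name, stats in summary.items():
    print(f"{name}: {stats['positive_fraction']:.3f} positive, "
          f"{stats['unaccounted']} compounds without a quoted class")
```

The Kazius and Hansen datasets are roughly balanced and their class counts add up to the stated totals. For the Sawatari database the quoted positive and negative counts sum to 4064, which is 160 fewer than the 4224 chemicals stated; the text does not say how the remaining compounds were classified.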

18.3 Traditional Machine Learning for Mutagenicity Prediction The SAR study published in 1988 (Ashby and Tennant 1988) indicated the potential of applying computational methods to understand and predict mutagenicity and the relative wealth of bacterial mutagenicity data available accelerated the development of methods. In addition, the use of quantitative structure activity relationship (QSAR) methods, which includes ML models, to predict mutagenicity and highlight toxicity for risk assessment purposes, contributed to the interest scientific advancements in this field. Modeling efforts using other types of genotoxicity data have also been developed albeit less extensively, mainly due to fewer data points being available, and these are not further discussed in this chapter. A large number of models using different bacterial mutagenicity datasets (often with considerable overlap) and various algorithms have been developed and published, both by academic groups as well as by software vendors. Some of the more well-known and commonly used models are listed in Table 18.2. The Leadscope® Genetox Statistical Model (2019b), Sarah Nexus (2021i), the statistical models from MultiCASE (2019a), CAESAR (2021j), and SciQSAR (2013) are all examples of ML applications, or QSAR models, using mainly traditional structural features and physicochemical property descriptors. Some examples of the knowledge-based models include Leadscope® Genetox Expert Alerts (2021 k), Derek Nexus (2021 l), and Toxtree (Contrera 2013). In addition, ML models developed using less common features/algorithm combinations (Webb et al. 2014; Zhang et al. 2017; Kuhnke et al. 2019; Norinder et al. 2019; Hao et al. 2019) such as models using quantum chemistry descriptors (Hao et al. 2019), mechanistic reactivity descriptors (Kuhnke et al. 2019), feature combination networks (Webb et al. 2014), and novel naïve Bayes classification models (Zhang et al. 2017) have been described. 
The predictivity of commercial and freeware models has been evaluated in several studies (Hayashi et al. 2005; Snyder 2009; Hansen et al. 2009; European Commission. Joint Research Centre. Institute for Health and Consumer Protection 2010; Hillebrecht et al. 2011; Naven et al. 2012; Ford et al. 2017; Van Bossuyt et al. 2018). For example, in the study published by Hansen et al. 2009 in 2009, QSAR

450

L. Zhao and C. Hasselgren

Table 18.1 Representative mutagenicity datasets Dataset

Size

Ames information

SAR dataset 1 by Ashby, J., and Tennant, R. W

222

SAR dataset 2 by Ashby, J., and Tennant, R. W

Data source

References

115 carcinogens, 1988 the 24 equivocal carcinogens and the 83 non-carcinogens

United States National Cancer Institute (NCI) (1980)/NTP (2022)

Ashby and Tennant 1988

301

154 alerting chemicals and 147 non-alerting chemicals

United States NTP (2022)

Ashby and Tennant 1991

SAR dataset by Sawatari et al

4224

863 mutagenicity 2001 positive and 3201 mutagenicity negative compounds

Japan ISHL (Ji Sawatari et al. et al. 2017) and 2001 United States NTP (2022)

Dataset by Kazius et al

4337

2401 mutagens and 1936 non-mutagens

2005

CCRIS (2021a), NTP (2022), CTD (2021b), GAP (Waters et al. 1991), and CPDB (2021c)4

Kazius et al. 2005

SciQSAR (formerly MDL-QSA) dataset by Contrera, Joseph F., et al

3338

Combining all Salmonella, E. coli, and the Bacillus subtilis rec spot test study results

2005

United States Environmental Protection Agency (EPA) (2021 h)

Contrera et al. 2005

3503 mutagens and 3009 non-mutagens

2009

CCRIS Hansen et al. (2021a), Helma 2009 et al. 2004, Kazius et al. 2005, Feng et al. 2003, VITIC (Judson et al. 2005), and GENE-TOX (2021d)5

Benchmark dataset 6512 by Hansen, Katja, et al

Year

1991

(continued)

4 5

See Footnote 1. See Footnote 1.

18 Machine Learning and Deep Learning Applications to Evaluate …

451

Table 18.1 (continued) Dataset

Size

Ames information

Year

Data source

References

DGM/NIHS dataset

12,140

1757 positive compounds (672 compounds are strong positives) and 10,383 negative compounds

2019

Industrial Honma et al. safety and 2019 health act (ANEI-HOU) of Japan, and The Division of Genetics and Mutagenesis, National Institute of Health Sciences (DGM/NIHS)

Leadscope® SAR Genetox database

11,028

33 bacterial mutagenesis endpoints

Queried in 2021

N/A

Leadscope ® (2021e)

Leadscope® drugs Genetox database

594 drugs and drug products

16,419 detailed genetic toxicity test data

Queried in 2021

N/A

Leadscope ® (2021f)

MultiCASE database

N/A

N/A

Queried in 2021

N/A

MultiCASE (2019a)

Dataset by Kumar, R. et al

4053

2293 mutagens and 1760 non-mutagens

2021

Kazius et al. Kumar et al. 2005; Hsu et al. 2021 2016; Bhagat et al. 2018; Guan et al. 2018; Xu et al. 2012

models using the benchmark dataset (Tropsha 2010; Cherkasov et al. 2014) with four ML algorithms—support vector machines (SVM) (Hearst et al. 1998), random forests (RF) (Breiman 2001), k-nearest neighbors (kNN) (Kramer 2013), and Gaussian processes (GP) (Rasmussen 2004) were compared with three commercial tools: DEREK (not a ML method) (Sanderson and Earnshaw 1991, 2007a), the statistical model from MultiCASE (Klopman 1992, 2007b), and a commercial Bayesian machine learner in Pipeline Pilot (2009). The five non-commercial ML models— SVM, GP, RF, kNN, and the commercial Pipeline Pilot model all produced good and comparable results on the benchmark data set (Hansen et al. 2009). Using the same benchmark dataset, two other tools, Toxtree (Patlewicz et al. 2008) and SciQSAR (formerly MDL-QSAR) (2013), were validated by Contrera 2013. Additionally, the predictive power of DEREK (Sanderson and Earnshaw 1991; Greene et al. 1999; Judson 2006, 2007a), Toxtree (Lahl and Gundert-Remy 2008; Pavan and Worth 2008, 2010a), MultiCASE model (MC4PC) (Klopman 1984, 1992, 2010b), and


L. Zhao and C. Hasselgren

Table 18.2 Representative in silico tools for Ames mutagenicity

| Tool | Vendor/modeler information | Model type | References (websites accessed in Dec 2021) |
|---|---|---|---|
| Leadscope® Genetox expert alerts | Leadscope Inc | Structural alert/knowledge-based model | Website (2021k) |
| Leadscope® Genetox statistical models | Leadscope Inc | QSAR model | Website (2019b) |
| Derek Nexus | Lhasa Limited | Structural alert/knowledge-based model | Website (2021l) |
| Sarah Nexus | Lhasa Limited | QSAR model | Website (2021i) |
| MultiCASE | MultiCASE Inc | Structural alert/knowledge-based and QSAR models | Website (2019a) |
| Toxtree | Commissioned by JRC Computational Toxicology and Modeling and developed by Ideaconsult Ltd | Structural alert/knowledge-based model | Patlewicz et al. 2008 |
| CAESAR (CAE) mutagenicity model | Istituto di Ricerche Farmacologiche Mario Negri | QSAR model | Website (2021j) |
| SciQSAR (formerly MDL-QSAR) | SciVision | QSAR model | Validation by Contrera 2013 |
| Lazar mutagenicity | Maunz, Andreas, et al | QSAR model | Maunz et al. 2013 |
| OASIS/TIMES | LMC - Bourgas University | QSAR model and liver metabolic simulator | Website (2021m) |
| OECD Toolbox | OECD | QSAR model | Website (2021n) |
| ADMEWorks | Fujitsu | QSAR model | Website (2021o) |

Leadscope® Model Applier (LSMA, the QSAR type module) (2010c) was evaluated using a large and high-quality data set by Hillebrecht et al. 2011. Since most of these QSAR models were initially developed using the majority of the publicly available data, the models assessed in the studies mentioned above generally show high sensitivity when validated with public datasets. This is further discussed in the more recent review by Benigni and Bossa 2019, which provides a detailed evaluation of QSAR methods for mutagenicity. In an effort to validate and improve available QSAR models using external (new) data, the Division of Genetics and Mutagenesis, National Institute of Health Sciences (DGM/NIHS) of Japan partially released a bacterial mutagenicity database containing more than 10,000 chemicals. The majority of these had not previously


been used for the development of QSAR models (Honma et al. 2019). In the initial validation, 17 models were evaluated. Predictivity varied, with overall accuracy ranging from 64 to 84%. Some of the more commonly used tools, such as Derek Nexus/Sarah Nexus, the MultiCASE models (2019a), and Leadscope® Genetox Expert Alerts and Genetox Statistical Models (2019b), demonstrated a high overall accuracy (> 80%), with roughly 58–69% sensitivity and 75–86% specificity. These numbers are very similar to what the European Food Safety Authority (EFSA) (Benigni et al. 2020) found in its evaluation, which stated that all the models generated statistically significant predictions with an accuracy of about 80%. This is comparable with the experimental variability of the test. It is important to note that most of the evaluations mentioned here address predictivity on public data, which often originate from the chemical industry. It has been noted that predictivity for proprietary compounds may not be as high and that chemical coverage may be low (Brigo and Muster 2016).
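The relationship between the accuracy, sensitivity, and specificity figures quoted above can be illustrated with a short calculation. The sketch below is only an illustration: the confusion-matrix counts are hypothetical (they are not taken from Honma et al. 2019), and the function name `classification_metrics` is introduced here for convenience. It shows how, on a test set dominated by Ames-negative compounds, overall accuracy can exceed 80% while sensitivity remains considerably lower.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # fraction of true mutagens predicted positive
    specificity = tn / (tn + fp)          # fraction of true non-mutagens predicted negative
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical, imbalanced test set: 200 Ames-positive and 1000 Ames-negative compounds.
metrics = classification_metrics(tp=130, fn=70, tn=850, fp=150)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, sensitivity is 0.65 and specificity is 0.85, yet overall accuracy is about 0.82 because the negatives outnumber the positives five to one. Balanced accuracy (0.75) is less affected by the class ratio, which is one reason it is often reported alongside overall accuracy for imbalanced mutagenicity data.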

18.4 Deep Learning for Mutagenicity Prediction

With recent technological advances, such as faster computing hardware and the development of novel optimization methods, DL techniques have been applied successfully in many areas. Examples include object detection and recognition in image research, autonomous driving, and text generation in natural language processing (LeCun et al. 2015; Zhao et al. 2019). DL has also attracted massive attention from the biomedical field (LeCun et al. 2015; Khan and Yairi 2018; Miotto et al. 2018; Zhao et al. 2019; Zhu 2020), and with the increased generation of large amounts of data, state-of-the-art DL approaches have shown the potential to have a significant impact on chemical toxicology as well as on drug development in general (Unterthiner et al. 2015; Baskin 2018; Jing et al. 2018; Tang et al. 2018; Lavecchia 2019; Gini and Zanoli 2020; Hemmerich and Ecker 2020; Chary et al. 2020; Jiao et al. 2020; Kleinstreuer et al. 2021). DL methods learn from the input by constructing multiple hidden layers, each of which is a representation of the previous (lower-level) layer, and, using highly optimized algorithms, progressively extract higher-level features from the raw input (LeCun et al. 2015). DL methods have recently been applied to predict bacterial mutagenicity (Chakravarti and Alla 2019; Li et al. 2021; Hung and Gini 2021; Moon et al. 2022). For example, Li et al. 2021 used an advanced graph convolutional neural network (GCNN) (Zhang et al. 2019b) architecture to learn the molecular representation and develop predictive models for mutagenicity. Because their model took advantage of the GCNN architecture, it can not only predict the mutagenicity of compounds but also identify the alerting structural features in compounds (Li et al. 2021). This methodology was also used by Hung and Gini 2021, who developed GCNN-based models for mutagenicity prediction.
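To make the idea of a graph convolution on a molecule more concrete, the following minimal sketch applies one generic graph convolution step to a small molecular graph. It is an assumption-laden illustration, not the architecture of Li et al. 2021 or Hung and Gini 2021: the propagation rule (symmetric normalization of the adjacency matrix with self-loops, followed by a linear map and ReLU) is one commonly used formulation, the function name `gcn_layer` is hypothetical, and the weights are fixed rather than learned.

```python
import math


def gcn_layer(adjacency, features, weights):
    """Apply one graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adjacency)
    # Add self-loops so that each atom keeps its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetrically normalize by the square roots of the node degrees.
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    n_in, n_out = len(weights), len(weights[0])
    # Aggregate features over each atom's neighborhood (norm @ features).
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n)) for f in range(n_in)]
           for i in range(n)]
    # Linear transform followed by ReLU (agg @ weights, clipped at zero).
    return [[max(0.0, sum(agg[i][f] * weights[f][o] for f in range(n_in)))
             for o in range(n_out)] for i in range(n)]


# Heavy-atom graph of ethanol (C-C-O); atom features are one-hot [is_C, is_O].
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # identity, chosen only to keep the numbers readable

atom_embeddings = gcn_layer(adjacency, features, weights)
# Sum-pool the atom embeddings into a single molecule-level vector.
molecule_vector = [sum(column) for column in zip(*atom_embeddings)]
print(atom_embeddings)
print(molecule_vector)
```

After one step, the central carbon already carries a nonzero value in the oxygen channel, because it has aggregated information from its oxygen neighbor. Stacking several such layers lets each atom's embedding reflect a progressively larger substructure, which is the property that makes it possible, in principle, to trace a prediction back to contributing atoms.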
In addition, the long short-term memory (LSTM) neural network architecture (Staudemeyer and Morris 2019) has been applied to mutagenicity data for predictive modeling by Chakravarti and Alla


2019. Their models were trained using either traditional SMILES (Weininger 1988), a line notation for describing the structure of chemicals, or a new linear molecular notation developed in the study. The prediction accuracy of the LSTM models was on par with traditional QSAR models, but the LSTM models were superior in predicting test chemicals that are dissimilar to the training set compounds.

Regarding the often-discussed question of what advantage DL has over traditional ML methods, many studies have evaluated the performance of DL methods versus traditional ML methods (Xu et al. 2012; Gini et al. 2019; Zhang et al. 2019a; Kumar et al. 2021). For example, Kumar et al. 2021 compared models for mutagenicity prediction built using a deep neural network (DNN), SVM (Hearst et al. 1998), kNN (Kramer 2013), and RF (Breiman 2001). They concluded that DNN-based models not only fit the data better than the traditional ML algorithms but also yield better performance metrics, such as prediction accuracy, area under the receiver operating characteristic curve (AUC ROC), and precision-recall curve (PR curve) values. However, in the study of Xu et al. 2012, the three-layer model built using the artificial neural network (ANN) architecture did not outperform the traditional ML algorithms, which included SVM, C4.5 Decision Tree (C4.5 DT) (Hssina et al. 2014, p. 5), kNN, and naïve Bayes (NB) (Kaur and Oberai 2014). Thus, similar to the evaluation of traditional ML models, there is no clear answer as to which method yields better models. The fact that many different methodologies yield similar results indicates that the limitations in predictivity are related more to the availability of data than to deficiencies in technology. Additional experimental data would be useful both for further validation of the predictive power and for increased chemical coverage of both traditional QSAR tools and DL methods.
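The SMILES-based input to sequence models such as the LSTM networks discussed above can be illustrated with a small preprocessing sketch. It shows one possible way to split a SMILES string into tokens and map them to integer indices of fixed length, which is the form a recurrent network typically consumes. The regular expression is a common simplification and does not cover the full SMILES grammar, the helper names (`tokenize_smiles`, `build_vocab`, `encode`) are hypothetical, and this is not the preprocessing used by Chakravarti and Alla 2019.

```python
import re

# Simplified token pattern (an assumption, not the full SMILES grammar):
# bracket atoms, two-letter halogens, two-digit ring closures,
# single-letter atoms, ring-closure digits, and bond/branch symbols.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-\+\(\)/\\@\.:~\*\$]")


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; fail loudly if any character is skipped."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(smiles_list):
    """Assign an integer index to every token; index 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for smiles in smiles_list:
        for token in tokenize_smiles(smiles):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(smiles, vocab, length):
    """Convert a SMILES string into a fixed-length list of token indices."""
    ids = [vocab[token] for token in tokenize_smiles(smiles)][:length]
    return ids + [0] * (length - len(ids))


# Nitrobenzene (carries the aromatic nitro group, a well-known Ames alert) and ethanol.
training_smiles = ["c1ccccc1[N+](=O)[O-]", "CCO"]
vocab = build_vocab(training_smiles)
print(tokenize_smiles(training_smiles[0]))
print(encode("CCO", vocab, length=6))  # [9, 9, 6, 0, 0, 0]
```

Treating bracket atoms such as `[N+]` and two-letter halogens as single tokens keeps chemically meaningful units intact. A model trained on such sequences needs no precomputed descriptors, which is the sense in which such approaches are described as descriptor-free.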

18.5 Discussion and Perspective

In silico mutagenicity models are generally divided into two types: (1) knowledge-based systems, including rule-based methods (e.g., Leadscope® Genetox Expert Alerts (2021k) and Derek Nexus (2021l)), and (2) ML and, more recently, DL models. Other models, such as DNA docking models (Snyder et al. 2005, 2013), have been proposed but are less frequently used. Additionally, read-across approaches (Gini et al. 2014, 2021p), which are sometimes considered in silico approaches, are often used for risk assessment. ML and DL approaches are typically built and applied as classification models that relate molecular properties (e.g., lipophilicity and polarizability) and/or the molecular structure representation to the mutagenic activity of compounds. From a risk assessment perspective, understanding which feature(s) drive the activity is often of critical importance, both for potentially modifying the molecule in a molecular design setting and for performing expert review and understanding the mechanistic foundation of a prediction in a regulatory application. Thus, it is useful if the model is transparent and interpretable in a manner that allows for identification of potentially problematic


scaffolds or structural features. Since it can be hard to gain a comprehensive understanding of the inner workings of models after they have been trained, many ML models, especially deep neural networks, are considered “black boxes” (2021q). Approaches for overcoming this for methods like SVM and RF have been suggested (Carlsson et al. 2009), but in general it can be difficult to determine directly whether a substructure/scaffold is related to mutagenicity by analyzing model predictions, especially for DL models. Another challenge when applying models in an industrial setting is the lack of refined SAR information on how activity is influenced by small changes in structure. Often this results in all compounds with a particular core scaffold being predicted in the same category, and a lot of work is ongoing to overcome this. One such effort is to include the chemical space of proprietary compounds when defining structural alerts, so that the alerts can be defined in more detail (Ahlberg et al. 2016; Amberg et al. 2019). Additionally, data sharing and collaborative efforts are having an impact in this area (Patel et al. 2018). Applications within research (as part of a screening cascade, for example) as well as regulatory applications are widespread within industry and are part of everyday work. The successful development and use of in silico methods, both ML and DL, is highly dependent on the amount and quality of experimental data available (Zhao and Zhu 2018; Idakwo et al. 2018; Hasselgren et al. 2019). As discussed in the DGM/NIHS publication (Honma et al. 2019), historically tested compounds may have been tested under conditions that would not be considered compliant with current testing protocols, such as testing in an insufficient number of strains or not testing up to the required concentrations. Also, for older data, the purity of the test article may not have been high enough. In some cases, this may have led to incorrect experimental results and, consequently, to inaccurate model predictions. A significant improvement in model accuracy and model coverage has been achieved in the past few years through efforts both to increase data availability and to ascertain data quality. Additionally, with developments in the availability of ML and DL methods and the use of multiple methods in combination, it is reasonable to assume that models will continue to improve over time.
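As a simple illustration of using multiple methods in combination, the sketch below merges the output of a knowledge-based (expert rule) model with that of a statistical (ML or DL) model. The combination rule is an assumption made for this example: it is a deliberately conservative scheme in which any positive call wins and an overall negative requires both models to agree. It is not a rule prescribed in this chapter, and the function name and call labels are hypothetical. In practice such a combined call would still be subject to expert review.

```python
VALID_CALLS = {"positive", "negative", "out_of_domain"}


def consensus_call(expert_call, statistical_call):
    """Combine an expert-rule call and a statistical-model call conservatively.

    Assumed rule (illustrative only):
      - any positive call            -> "positive"
      - both calls negative          -> "negative"
      - anything else (e.g. one model
        out of its applicability domain) -> "inconclusive"
    """
    calls = {expert_call, statistical_call}
    if not calls <= VALID_CALLS:
        raise ValueError(f"Unexpected call(s): {sorted(calls - VALID_CALLS)}")
    if "positive" in calls:
        return "positive"
    if calls == {"negative"}:
        return "negative"
    return "inconclusive"


# Hypothetical predictions for three compounds: (expert rule model, statistical model).
predictions = {
    "compound_A": ("negative", "positive"),
    "compound_B": ("negative", "negative"),
    "compound_C": ("negative", "out_of_domain"),
}
for name, (expert, statistical) in predictions.items():
    print(name, consensus_call(expert, statistical))
```

The design choice here favors sensitivity over specificity: a disagreement between the two models is resolved toward the positive call, and missing coverage from either model prevents a confident negative. Other weighting schemes are possible and may be more appropriate depending on the application.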

References

ADMEWORKS series: Fujitsu global (2021o). https://www.fujitsu.com/global/solutions/businesstechnology/tc/sol/admeworks/. Accessed 20 Dec 2021
Ahlberg E, Amberg A, Beilke LD et al (2016) Extending (Q)SARs to incorporate proprietary knowledge for regulatory purposes: a case study using aromatic amine mutagenicity. Regul Toxicol Pharmacol 77:1–12. https://doi.org/10.1016/j.yrtph.2016.02.003
Amberg A, Anger LT, Bercu J et al (2019) Extending (Q)SARs to incorporate proprietary knowledge for regulatory purposes: is aromatic N-oxide a structural alert for predicting DNA-reactive mutagenicity? Mutagenesis 34:67–82. https://doi.org/10.1093/mutage/gey020
Ames BN, McCann J, Yamasaki E (1975) Methods for detecting carcinogens and mutagens with the salmonella/mammalian-microsome mutagenicity test. Mutat Res Mutagen Relat Subj 31:347–363. https://doi.org/10.1016/0165-1161(75)90046-1


Ashby J, Tennant RW (1988) Chemical structure, Salmonella mutagenicity and extent of carcinogenicity as indicators of genotoxic carcinogenesis among 222 chemicals tested in rodents by the U.S. NCI/NTP. Mutat Res 204:17–115. https://doi.org/10.1016/0165-1218(88)90114-0
Ashby J, Tennant RW (1991) Definitive relationships among chemical structure, carcinogenicity and mutagenicity for 301 chemicals tested by the U.S. NTP. Mutat Res Genet Toxicol 257:229–306. https://doi.org/10.1016/0165-1110(91)90003-E
Baskin II (2018) Machine learning methods in computational toxicology. In: Nicolotti O (ed) Computational toxicology: methods and protocols. Springer, New York, NY, pp 119–139
Benigni R, Bossa C (2019) Data-based review of QSARs for predicting genotoxicity: the state of the art. Mutagenesis 34:17–23. https://doi.org/10.1093/mutage/gey028
Benigni R, Serafimova R, Parra Morte JM et al (2020) Evaluation of the applicability of existing (Q)SAR models for predicting the genotoxicity of pesticides and similarity analysis related with genotoxicity of pesticides for facilitating of grouping and read across: an EFSA funded project. Regul Toxicol Pharmacol 114:104658. https://doi.org/10.1016/j.yrtph.2020.104658
Bhagat HA, Compton SA, Musso DL et al (2018) N-substituted phenylbenzamides of the niclosamide chemotype attenuate obesity related changes in high fat diet fed mice. PLoS ONE 13:e0204605. https://doi.org/10.1371/journal.pone.0204605
Breiman L (2001) Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324
Brigo A, Muster W (2016) The use of in silico models within a large pharmaceutical company. Methods Mol Biol Clifton NJ 1425:475–510. https://doi.org/10.1007/978-1-4939-3609-0_20
CAESAR (2021j). http://www.caesar-project.eu/. Accessed 20 Dec 2021
Carcinogenic potency database (CPDB) data (2021c). Download carcinogenic potency database CPDB data. https://www.nlm.nih.gov/databases/download/cpdb.html. Accessed 15 Dec 2021
Carlsson L, Helgee EA, Boyer S (2009) Interpretation of nonlinear QSAR models applied to Ames mutagenicity data. J Chem Inf Model 49:2551–2558. https://doi.org/10.1021/ci9002206
Chakravarti SK, Alla SRM (2019) Descriptor free QSAR modeling using deep learning with long short-term memory neural networks. Front Artif Intell 2:17. https://doi.org/10.3389/frai.2019.00017
Chary MA, Manini AF, Boyer EW, Burns M (2020) The role and promise of artificial intelligence in medical toxicology. J Med Toxicol 16:458–464. https://doi.org/10.1007/s13181-020-00769-5
Cherkasov A, Muratov EN, Fourches D et al (2014) QSAR modeling: where have you been? Where are you going to? J Med Chem 57:4977–5010. https://doi.org/10.1021/jm4004285
Comprehensive cancer information - National Cancer Institute (1980). https://www.cancer.gov/. Accessed 16 Dec 2021
Contrera JF (2013) Validation of Toxtree and SciQSAR in silico predictive software using a publicly available benchmark mutagenicity database and their applicability for the qualification of impurities in pharmaceuticals. Regul Toxicol Pharmacol 67:285–293. https://doi.org/10.1016/j.yrtph.2013.08.008
Contrera JF, Matthews EJ, Kruhlak NL, Benz RD (2005) In silico screening of chemicals for bacterial mutagenicity using electrotopological E-state indices and MDL QSAR software. Regul Toxicol Pharmacol 43:313–323. https://doi.org/10.1016/j.yrtph.2005.09.001
DEREK for Windows (2007a). Version 10.0.2 service pack 3, knowledge base release DfW 10.0.0_25_07_2007a. Lhasa Ltd., Leeds, UK
Derek Nexus (2021l). https://www.lhasalimited.org/products/derek-nexus.htm. Accessed 20 Dec 2021
Drugs genetox database: Leadscope - chemoinformatics platform for drug discovery (2021f). https://www.leadscope.com/drugs_genetox_database/. Accessed 8 Dec 2021
Dynatrace engineering - understanding black-box ML models with explainable AI (2021q). https://engineering.dynatrace.com/blog/understanding-black-box-ml-models-with-explainable-ai/. Accessed 21 Dec 2021


Eastmond DA, Hartwig A, Anderson D et al (2009) Mutagenicity testing for chemical risk assessment: update of the WHO/IPCS harmonized scheme. Mutagenesis 24:341–349. https://doi.org/10.1093/mutage/gep014
European Commission. Joint Research Centre. Institute for Health and Consumer Protection (2010) Review of QSAR models and software tools predicting genotoxicity and carcinogenicity. Publications Office, LU
Feng J, Lurati L, Ouyang H et al (2003) Predictive toxicology: benchmarking molecular descriptors and statistical methods. J Chem Inf Comput Sci 43:1463–1470. https://doi.org/10.1021/ci034032s
Ford KA, Ryslik G, Chan BK et al (2017) Comparative evaluation of 11 in silico models for the prediction of small molecule mutagenicity: role of steric hindrance and electron-withdrawing groups. Toxicol Mech Methods 27:24–35. https://doi.org/10.1080/15376516.2016.1174761
Genetic toxicology data bank (GENE-TOX) - PubChem substance - NCBI (2021d). https://www.ncbi.nlm.nih.gov/pcsubstance?term=%22Genetic%20Toxicology%20Data%20Bank%20(GENE-TOX)%22%5BSourceName%5D%20AND%20hasnohold%5Bfilt%5D. Accessed 16 Dec 2021
Genetox expert alerts suite: Leadscope - chemoinformatics platform for drug discovery (2021k). https://www.leadscope.com/genetox_expert_alerts/. Accessed 20 Dec 2021
Gini G, Zanoli F (2020) Machine learning and deep learning methods in ecotoxicological QSAR modeling. In: Roy K (ed) Ecotoxicological QSARs. Springer, US, New York, NY, pp 111–149
Gini G, Franchi AM, Manganaro A et al (2014) ToxRead: a tool to assist in read across and its use to assess mutagenicity of chemicals. SAR QSAR Environ Res 25:999–1011. https://doi.org/10.1080/1062936X.2014.976267
Gini G, Zanoli F, Gamba A et al (2019) Could deep learning in neural networks improve the QSAR models? SAR QSAR Environ Res 30:617–642. https://doi.org/10.1080/1062936X.2019.1650827
Greene N, Judson PN, Langowski JJ, Marchant CA (1999) Knowledge-based expert systems for toxicity and metabolism prediction: DEREK, StAR and METEOR. SAR QSAR Environ Res 10:299–314. https://doi.org/10.1080/10629369908039182
Guan D, Fan K, Spence I, Matthews S (2018) QSAR ligand dataset for modelling mutagenicity, genotoxicity, and rodent carcinogenicity. Data Brief 17:876–884. https://doi.org/10.1016/j.dib.2018.01.077
Guidance on information requirements and chemical safety assessment - ECHA (2016)
Hansen K, Mika S, Schroeter T et al (2009) Benchmark data set for in silico prediction of Ames mutagenicity. J Chem Inf Model 49:2077–2081. https://doi.org/10.1021/ci900161g
Hao Y, Sun G, Fan T et al (2019) Prediction on the mutagenicity of nitroaromatic compounds using quantum chemistry descriptors based QSAR and machine learning derived classification methods. Ecotoxicol Environ Saf 186:109822. https://doi.org/10.1016/j.ecoenv.2019.109822
Hasselgren C, Ahlberg E, Akahori Y et al (2019) Genetic toxicology in silico protocol. Regul Toxicol Pharmacol RTP 107:104403. https://doi.org/10.1016/j.yrtph.2019.104403
Hayashi M, Kamata E, Hirose A et al (2005) In silico assessment of chemical mutagenesis in comparison with results of Salmonella microsome assay on 909 chemicals. Mutat Res 588:129–135. https://doi.org/10.1016/j.mrgentox.2005.09.009
Hearst MA, Dumais ST, Osuna E et al (1998) Support vector machines. IEEE Intell Syst Their Appl 13:18–28. https://doi.org/10.1109/5254.708428
Helma C, Cramer T, Kramer S, De Raedt L (2004) Data mining and machine learning techniques for the identification of mutagenicity inducing substructures and structure activity relationships of noncongeneric compounds. J Chem Inf Comput Sci 44:1402–1411. https://doi.org/10.1021/ci034254q
Hemmerich J, Ecker GF (2020) In silico toxicology: from structure–activity relationships towards deep learning and adverse outcome pathways. Wires Comput Mol Sci 10:e1475. https://doi.org/10.1002/wcms.1475
Hillebrecht A, Muster W, Brigo A et al (2011) Comparative evaluation of in silico systems for Ames test mutagenicity prediction: scope and limitations. Chem Res Toxicol 24:843–854. https://doi.org/10.1021/tx2000398


Honma M, Kitazawa A, Cayley A et al (2019) Improvement of quantitative structure–activity relationship (QSAR) tools for predicting Ames mutagenicity: outcomes of the Ames/QSAR international challenge project. Mutagenesis 34:3–16. https://doi.org/10.1093/mutage/gey031
Hssina B, Merbouha A, Ezzikouri H, Erritali M (2014) A comparative study of decision tree ID3 and C4.5. Int J Adv Comput Sci Appl 8
Hsu K-H, Su B-H, Tu Y-S et al (2016) Mutagenicity in a molecule: identification of core structural features of mutagenicity using a scaffold analysis. PLoS ONE 11:e0148900. https://doi.org/10.1371/journal.pone.0148900
Hung C, Gini G (2021) QSAR modeling without descriptors using graph convolutional neural networks: the case of mutagenicity prediction. Mol Divers 25:1283–1299. https://doi.org/10.1007/s11030-021-10250-2
ICH M7 assessment and control of DNA reactive (mutagenic) impurities in pharmaceuticals to limit potential carcinogenic risk (2018a)
ICH S2 (R1) Genotoxicity testing and data interpretation for pharmaceuticals intended for human use (2018b)
Idakwo G, Luttrell J, Chen M et al (2018) A review on machine learning methods for in silico toxicity prediction. J Environ Sci Health Part C 36:169–191. https://doi.org/10.1080/10590501.2018.1537118
Ji Z, Ball NS, LeBaron MJ (2017) Global regulatory requirements for mutagenicity assessment in the registration of industrial chemicals. Environ Mol Mutagen 58:345–353. https://doi.org/10.1002/em.22096
Jiao Z, Hu P, Xu H, Wang Q (2020) Machine learning and deep learning in chemical health and safety: a systematic review of techniques and applications. ACS Chem Health Saf 27:316–334. https://doi.org/10.1021/acs.chas.0c00075
Jing Y, Bian Y, Hu Z et al (2018) Deep learning for drug design: an artificial intelligence paradigm for drug discovery in the big data era. AAPS J 20:58. https://doi.org/10.1208/s12248-018-0210-0
Judson PN, Cooke PA, Doerrer NG et al (2005) Towards the creation of an international toxicology information centre. Toxicology 213:117–128. https://doi.org/10.1016/j.tox.2005.05.014
Judson P (2006) Using computer reasoning about qualitative and quantitative information to predict metabolism and toxicity. In: Pharmacokinetic profiling in drug research. Wiley, pp 417–429
Kaur G, Oberai EN (2014) A review article on naive Bayes classifier with various smoothing techniques. 6
Kazius J, McGuire R, Bursi R (2005) Derivation and validation of toxicophores for mutagenicity prediction. J Med Chem 48:312–320. https://doi.org/10.1021/jm040835a
Khan S, Yairi T (2018) A review on the application of deep learning in system health management. Mech Syst Signal Process 107:241–265. https://doi.org/10.1016/j.ymssp.2017.11.024
Kleinstreuer NC, Tetko IV, Tong W (2021) Introduction to special issue: computational toxicology. Chem Res Toxicol 34:171–175. https://doi.org/10.1021/acs.chemrestox.1c00032
Klopman G (1984) Artificial intelligence approach to structure-activity studies. Computer automated structure evaluation of biological activity of organic molecules. J Am Chem Soc 106:7315–7321. https://doi.org/10.1021/ja00336a004
Klopman G (1992) MULTICASE 1. A hierarchical computer automated structure evaluation program. Quant Struct-Act Relatsh 11:176–184. https://doi.org/10.1002/qsar.19920110208
Kramer O (2013) K-Nearest neighbors. In: Kramer O (ed) Dimensionality reduction with unsupervised nearest neighbors. Springer, Berlin, Heidelberg, pp 13–23
Kuhnke L, ter Laak A, Göller AH (2019) Mechanistic reactivity descriptors for the prediction of Ames mutagenicity of primary aromatic amines. J Chem Inf Model 59:668–672. https://doi.org/10.1021/acs.jcim.8b00758
Kumar R, Khan FU, Sharma A et al (2021) A deep neural network–based approach for prediction of mutagenicity of compounds. Environ Sci Pollut Res 28:47641–47650. https://doi.org/10.1007/s11356-021-14028-9
Lahl U, Gundert-Remy U (2008) The use of (Q)SAR methods in the context of REACH. Toxicol Mech Methods 18:149–158. https://doi.org/10.1080/15376510701857288


Lavecchia A (2019) Deep learning in drug discovery: opportunities, challenges and future prospects. Drug Discov Today 24:2017–2032. https://doi.org/10.1016/j.drudis.2019.07.006
Leadscope model applier (2010c). Version 1.2. Leadscope Inc., Columbus, OH. http://www.leadscope.com
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444. https://doi.org/10.1038/nature14539
Lhasa Limited. Welcome Lhasa Ltd (2021g). https://www.lhasalimited.org/. Accessed 27 Jan 2022
Li S, Zhang L, Feng H et al (2021) MutagenPred-GCNNs: a graph convolutional neural network-based classification model for mutagenicity prediction with data-driven molecular fingerprints. Interdiscip Sci Comput Life Sci 13:25–33. https://doi.org/10.1007/s12539-020-00407-2
Maron DM, Ames BN (1983) Revised methods for the Salmonella mutagenicity test. Mutat Res Mutagen Relat Subj 113:173–215. https://doi.org/10.1016/0165-1161(83)90010-9
Maunz A, Gütlein M, Rautenberg M et al (2013) lazar: a modular predictive toxicology framework. Front Pharmacol 4:38. https://doi.org/10.3389/fphar.2013.00038
Miotto R, Wang F, Wang S et al (2018) Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform 19:1236–1246. https://doi.org/10.1093/bib/bbx044
Moon H-J, Bu S-J, Cho S-B (2022) Mutagenic prediction for chemical compound discovery with partitioned graph convolution network. In: Sanjurjo González H, Pastor López I, García Bringas P et al (eds) 16th International conference on soft computing models in industrial and environmental applications (SOCO 2021). Springer International Publishing, Cham, pp 578–587
MultiCASE (2007b). Version 2.1. Multicase Inc., Beachwood, OH, U.S.A
MultiCASE (2010b). Version 2.1.0.99. Multicase Inc., Beachwood, OH, U.S.A. http://www.multicase.com
MultiCASE high quality software for in-silico ICH M7 safety assessment (2019a). http://www.multicase.com/. Accessed 16 Dec 2021
National toxicology program. Download NTP study data - national toxicology program (2022). https://ntp.niehs.nih.gov/. Accessed 15 Dec 2021
Naven RT, Greene N, Williams RV (2012) Latest advances in computational genotoxicity prediction. Expert Opin Drug Metab Toxicol 8:1579–1587. https://doi.org/10.1517/17425255.2012.724059
NCBI Chemical carcinogenesis research information system (CCRIS) - PubChem substance (2021a). https://www.ncbi.nlm.nih.gov/pcsubstance?term=%22Chemical%20Carcinogenesis%20Research%20Information%20System%20(CCRIS)%22%5BSourceName%5D%20AND%20hasnohold%5Bfilt%5D. Accessed 15 Dec 2021
Non-human genetic toxicity suite: Leadscope - Chemoinformatics platform for drug discovery (2019b). https://www.leadscope.com/product_info.php?products_id=67
Norinder U, Ahlberg E, Carlsson L (2019) Predicting Ames mutagenicity using conformal prediction in the Ames/QSAR international challenge project. Mutagenesis 34:33–40. https://doi.org/10.1093/mutage/gey038
OECD (2020) Test No. 471: bacterial reverse mutation test. Organisation for Economic Co-operation and Development, Paris
Patel M, Kranz M, Munoz-Muriedas J et al (2018) A pharma-wide approach to address the genotoxicity prediction of primary aromatic amines. Comput Toxicol 7:27–35. https://doi.org/10.1016/j.comtox.2018.06.002
Patlewicz G, Jeliazkova N, Safford RJ et al (2008) An evaluation of the implementation of the Cramer classification scheme in the Toxtree software. SAR QSAR Environ Res 19:495–524. https://doi.org/10.1080/10629360802083871
Pavan M, Worth AP (2008) Publicly-accessible QSAR software tools developed by the Joint Research Centre. SAR QSAR Environ Res 19:785–799. https://doi.org/10.1080/10629360802550390
Rasmussen CE (2004) Gaussian processes in machine learning. In: Bousquet O, von Luxburg U, Rätsch G (eds) Advanced lectures on machine learning: ML Summer Schools 2003, Canberra, Australia, Feb 2–14, 2003, Tübingen, Germany, Aug 4–16, 2003, Revised lectures. Springer, Berlin, Heidelberg, pp 63–71


Read-across - toxit (2021p). https://www.toxit.it/en/services/read-across. Accessed 21 Dec 2021
S2 (R1) Genotoxicity testing and data interpretation for pharmaceuticals intended for human use (2020)
Sanderson DM, Earnshaw CG (1991) Computer prediction of possible toxic action from chemical structure; the DEREK system. Hum Exp Toxicol 10:261–273. https://doi.org/10.1177/096032719101000405
SAR genetox database: Leadscope - chemoinformatics platform for drug discovery (2021e). https://www.leadscope.com/sar_genetox_database/. Accessed 16 Dec 2021
Sarah Nexus (2021i). https://www.lhasalimited.org/products/sarah-nexus.htm. Accessed 20 Dec 2021
Sawatari K, Nakanishi Y, Matsushima T (2001) Relationships between chemical structures and mutagenicity: a preliminary survey for a database of mutagenicity test results of new work place chemicals. Ind Health 39:341–345. https://doi.org/10.2486/indhealth.39.341
SciQSAR 2D (2013). https://www.pharmaceuticalonline.com/doc/sciqsar-2d-0001
SciTegic Pipeline Pilot (2009). Version 7.0. http://accelrys.com/products/scitegic/
Snyder RD (2009) An update on the genotoxicity and carcinogenicity of marketed pharmaceuticals with reference to in silico predictivity. Environ Mol Mutagen 50:435–450. https://doi.org/10.1002/em.20485
Snyder RD, McNulty J, Zairov G et al (2005) The influence of N-dialkyl and other cationic substituents on DNA intercalation and genotoxicity. Mutat Res 578:88–99. https://doi.org/10.1016/j.mrfmmm.2005.03.022
Snyder RD, Holt PA, Maguire JM, Trent JO (2013) Prediction of noncovalent Drug/DNA interaction using computational docking models: studies with over 1350 launched drugs. Environ Mol Mutagen 54:668–681. https://doi.org/10.1002/em.21796
Staudemeyer RC, Morris ER (2019) Understanding LSTM - a tutorial into long short-term memory recurrent neural networks. arXiv:1909.09586v1 (Cs)
Takigami H, Matsui S, Matsuda T, Shimizu Y (2002) The Bacillus subtilis rec-assay: a powerful tool for the detection of genotoxic substances in the water environment. Prospect for assessing potential impact of pollutants from stabilized wastes. Waste Manag 22:209–213. https://doi.org/10.1016/s0956-053x(01)00071-x
Tang W, Chen J, Wang Z et al (2018) Deep learning for predicting toxicity of chemicals: a mini review. J Environ Sci Health Part C 36:252–271. https://doi.org/10.1080/10590501.2018.1537563
Tcheremenskaia O, Battistelli CL, Giuliani A et al (2019) In silico approaches for prediction of genotoxic and carcinogenic potential of cosmetic ingredients. Comput Toxicol 11:91–100. https://doi.org/10.1016/j.comtox.2019.03.005
The comparative toxicogenomics database - CTD (2021b). The comparative toxicogenomics database CTD. http://ctdbase.org/. Accessed 15 Dec 2021
The OECD QSAR toolbox - OECD (2021n). https://www.oecd.org/chemicalsafety/risk-assessment/oecd-qsar-toolbox.htm#Guidance_Documents_and_Training_Materials_for_Using_the_Toolbox. Accessed 20 Dec 2021
TIMES software - predicting toxicity of chemicals resulting from their metabolic activation (2021m). http://oasis-lmc.org/products/software/times.aspx. Accessed 20 Dec 2021
Toxtree (2010a). Version 1.60. European Commission Research Centre Computational Toxicology Group. http://ecb.jrc.ec.europa.eu/qsar/qsar-tools/index.php?c=TOXTREE
Tropsha A (2010) Best practices for QSAR model development, validation, and exploitation. Mol Inform 29:476–488. https://doi.org/10.1002/minf.201000061
U.S. Environmental protection agency (2021h). U.S. Environmental protection agency US EPA. https://www.epa.gov/. Accessed 16 Dec 2021
Unterthiner T, Mayr A, Klambauer G, Hochreiter S (2015) Toxicity prediction using deep learning. arXiv:1503.01445v1 (Cs Q-Bio Stat)


Van Bossuyt M, Van Hoeck E, Raitano G et al (2018) Performance of in silico models for mutagenicity prediction of food contact materials. Toxicol Sci off J Soc Toxicol 163:632–638. https://doi.org/10.1093/toxsci/kfy057
Waters MD, Stack HF, Garrett NE, Jackson MA (1991) The genetic activity profile database. Environ Health Perspect 96:41–45
Webb SJ, Hanser T, Howlin B et al (2014) Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity. 21
Weininger D (1988) SMILES, a chemical language and information system. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28:31–36. https://doi.org/10.1021/ci00057a005
Xu C, Cheng F, Chen L et al (2012) In silico prediction of chemical Ames mutagenicity. J Chem Inf Model 52:2840–2847. https://doi.org/10.1021/ci300400a
Zhang H, Kang Y-L, Zhu Y-Y et al (2017) Novel naïve Bayes classification models for predicting the chemical Ames mutagenicity. Toxicol Vitro 41:56–63. https://doi.org/10.1016/j.tiv.2017.02.016
Zhang J, Mucs D, Norinder U, Svensson F (2019a) LightGBM: an effective and scalable algorithm for prediction of chemical toxicity-application to the Tox21 and mutagenicity data sets. J Chem Inf Model 59:4150–4158. https://doi.org/10.1021/acs.jcim.9b00633
Zhang S, Tong H, Xu J, Maciejewski R (2019b) Graph convolutional networks: a comprehensive review. Comput Soc Netw 6:11. https://doi.org/10.1186/s40649-019-0069-y
Zhao Z-Q, Zheng P, Xu S-T, Wu X (2019) Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst 30:3212–3232. https://doi.org/10.1109/TNNLS.2018.2876865
Zhao L, Zhu H (2018) Big data in computational toxicology: challenges and opportunities. In: Computational Toxicology. Wiley, pp 291–312
Zhu H (2020) Big data and artificial intelligence modeling for drug discovery. Annu Rev Pharmacol Toxicol 60:573–589. https://doi.org/10.1146/annurev-pharmtox-010919-023324

Chapter 19

Modeling Tox21 Data for Toxicity Prediction and Mechanism Deconvolution

Tuan Xu, Menghang Xia, and Ruili Huang

T. Xu · M. Xia · R. Huang (corresponding author), Division of Preclinical Innovation (DPI), National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD, USA

This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2023. H. Hong (ed.), Machine Learning and Deep Learning in Computational Toxicology, Computational Methods in Engineering & the Sciences, https://doi.org/10.1007/978-3-031-20730-3_19

19.1 Introduction

As of November 2021, more than 191 million organic and inorganic substances, including alloys, coordination compounds, minerals, mixtures, polymers, and salts, were registered in the Chemical Abstracts Service (CAS) database of the American Chemical Society, with thousands of new substances added per day (CAS 2021). The US National Toxicology Program (NTP), a world leader in toxicology research, has evaluated ~2800 environmental substances for potential human health effects (NTP 2021). For the vast majority of substances, however, information on human health effects is lacking. This gap creates considerable uncertainty in human health risk assessment and poses a major challenge to drug development and public health. Traditional toxicity testing in in vivo animal models provides a chemical safety reference for human health, but these methods have limitations, including high cost, low throughput, poor reproducibility, ethical issues, and difficulty in extrapolating results to humans. Alternative strategies are therefore needed to complement traditional chemical risk assessment.

As an alternative to traditional toxicity testing, in vitro assays can be valid substitutes for initial screens of cyto- and genotoxicity, as well as for investigating the molecular mechanisms underlying the potential toxicity of chemicals. One of the most successful examples is the US Toxicology in the 21st Century (Tox21) program (Collins et al. 2008; Kavlock et al. 2009; NRC 2007; Tice et al. 2013), a US federal research collaboration involving the National Center for Advancing Translational Sciences (NCATS), the US Environmental Protection Agency (EPA), the National Toxicology Program (NTP), and the US Food and Drug Administration (FDA). The Tox21 program has developed a panel of in vitro assays based on toxicity-related targets and pathways to test a collection of approximately 10,000 environmental chemicals and drugs (Tox21 10K) in a quantitative high-throughput screening (qHTS) format. Using triplicate 15-dose titrations, the program has generated over 100 million data points to date (Richard et al. 2020). The goals of the Tox21 program include identifying toxicity pathways and compound mechanisms of toxicity, prioritizing compounds for further extensive toxicological evaluation, and developing predictive models for biological response in humans. In this chapter, we review applications of the Tox21 assay data in building toxicity prediction models and in toxicity mechanism studies.

19.2 Tox21 10K Compound Library and Assay Data

19.2.1 Tox21 Compound Collection

The selection of Tox21 compounds was based primarily on the following criteria: physicochemical properties (e.g., solubility, volatility, molecular weight, and logP) relevant to qHTS amenability, known or perceived environmental hazards or exposure concerns, cost, and commercial availability. The full Tox21 screening library comprises approximately 10,000 chemicals (8947 unique chemical entities) spanning a wide range of structural diversity. Often referred to as the “Tox21 10K compound library,” it was contributed primarily by three Tox21 partners (i.e., NCATS, NTP, and EPA) (NCATS 2016; PubChem 2013). Each partner brought a separately sourced and approximately equal-sized compound library (Richard et al. 2020). NCATS contributed a total of 3764 unique compounds, consisting of approved and investigational drugs (including unapproved substances tested in humans) (Huang et al. 2011, 2019a, b). NTP contributed a total of 3115 unique compounds, spanning many areas of programmatic environmental and toxicological concern. EPA contributed a total of 4078 unique compounds, including the complete set of procured ToxCast compounds (Richard et al. 2016) that were deemed suitable for screening. After removing duplicates, the composition of the collected compound library is pharmaceutical (34.4%), pesticide (8.1%), pharmaceutical aid (7.0%), food additive (5.2%), consumer product (5.1%), industrial chemical (3.0%), cosmetics (2.1%), household (0.7%), herbicide (0.4%), and others (33.5%) (Fig. 19.1).

Each compound in the Tox21 10K compound library has been subjected to analytical chemistry quality control (QC), which provides information on the purity and identity of each sample. The QC results and other annotations for each individual compound have been made publicly available at https://tripod.nih.gov/tox/samples. The compounds were serially diluted in dimethylsulfoxide (DMSO) to 15 concentrations in 1536-well plates, covering a concentration range of up to four orders of magnitude. To assess data reproducibility, three physical copies of the library were prepared in three different formats, in which the same compounds were plated in a different well location in each copy. In addition, a set of 88 diverse compounds serving as internal controls was placed in duplicate in every screening plate to assess assay reproducibility and determine positional plate effects (Attene-Ramos et al. 2013).
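The titration arithmetic described above can be illustrated with a short sketch. It assumes that the 15 concentrations form a geometric series spanning exactly four orders of magnitude, which implies a constant step of 10^(4/14), or roughly 1.9-fold, between adjacent concentrations. The top concentration used here is a hypothetical value chosen only for illustration; the actual plate concentrations are not given in this chapter.

```python
def dilution_series(top_conc, n_points=15, orders_of_magnitude=4.0):
    """Return a geometric dilution series, highest concentration first.

    Assumption: the series is geometric and spans exactly
    `orders_of_magnitude` decades between the first and last point.
    """
    # Constant fold-change between adjacent concentrations.
    step = 10 ** (orders_of_magnitude / (n_points - 1))
    return [top_conc / step ** i for i in range(n_points)]


# Hypothetical top concentration in arbitrary units (not from the chapter).
series = dilution_series(top_conc=10_000.0)
fold_step = series[0] / series[1]
```

Under these assumptions, the ratio between the highest and lowest concentration is 10^4 and every adjacent pair differs by the same factor.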

Fig. 19.1 Composition of the Tox21 10K compound library. The collection comprises 8947 structurally diverse compounds including drugs, consumer products, cosmetics, household products, food additives, industrial chemicals, pesticides, etc. Compound annotations including analytical QC results are publicly available at https://tripod.nih.gov/tox/samples


19.2.2 Tox21 qHTS Process

The Tox21 assay screens are performed on a dedicated robotic system, which consists of a six-axis robotic arm with a specifically designed gripper and barcode reader surrounded by compound plate storage units, assay plate incubators, liquid handlers, and microplate readers (Attene-Ramos et al. 2013). The robotic system is capable of storing compound collections and assay plates and performing qHTS in a fully integrated and automated manner. Screening of the Tox21 10K compound library has yielded a large amount of raw data, and new data analysis methods were required to integrate these data and characterize the activities observed in the assays. NCATS has developed a standardized qHTS data analysis process, including raw data processing, concentration response curve fitting and classification, data reproducibility evaluation, and assignment of activity to compounds (Huang 2016). As soon as the initial data parsing and assessment at NCATS are complete, the assay results (e.g., the concentration response data, curve fitting results, raw plate reads, assay conditions, and sample mapping information) are shared with the Tox21 partners through a suite of databases and software tools (http://tripod.nih.gov/tox/). After further review for quality and utility, the assay data are released to the public through a variety of public databases, such as PubChem (http://pubchem.ncbi.nlm.nih.gov/), the NCATS Tox21 Browser (https://tripod.nih.gov/tox21/pubdata/), the NIEHS Chemical Effects in Biological Systems (CEBS) database (http://tools.niehs.nih.gov/cebs3/ui/), and EPA’s CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard) (Sakamuru et al. 2019).

Most of the qHTS assays were performed using human cell lines (80.9%), followed by murine embryo fibroblasts (7.4%), Chinese hamster ovary cell lines (5.9%), and others (5.9%) (Fig. 19.2). These assays measure pathway activities including nuclear receptor (NR) signaling (55.9%), stress response (SR) pathway (11.8%), direct cytotoxicity (8.8%), and other molecular targets or pathways related to toxicity (23.5%) (Fig. 19.2). These data can be used to prioritize compounds for more extensive toxicological evaluation and serve as a source of big data in many machine learning and data science projects for virtual screening, computational toxicology, and compound mechanism of toxicity studies. A few example applications of Tox21 data are reviewed in the following sections.
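To make the concentration response curve fitting step more concrete, the following minimal sketch fits a half-maximal activity concentration (AC50) to a 15-point titration. The chapter does not specify the curve model; the four-parameter Hill equation is a common choice for concentration response data and is assumed here. The function names, the fixed bottom, top, and slope values, and the synthetic data are all illustrative, and the coarse grid search merely stands in for a proper nonlinear least-squares routine.

```python
import math


def hill(conc, bottom, top, ac50, slope):
    """Four-parameter Hill response at concentration `conc` (same units as ac50)."""
    return bottom + (top - bottom) / (1.0 + (ac50 / conc) ** slope)


def fit_ac50(concs, responses, bottom=0.0, top=100.0, slope=1.0):
    """Grid-search the AC50 on a log scale, minimizing the sum of squared errors.

    Simplification: bottom, top, and slope are held fixed; a real pipeline
    would estimate all four parameters.
    """
    lo, hi = math.log10(min(concs)), math.log10(max(concs))
    candidates = [10 ** (lo + (hi - lo) * k / 400) for k in range(401)]

    def sse(ac50):
        return sum(
            (hill(c, bottom, top, ac50, slope) - r) ** 2
            for c, r in zip(concs, responses)
        )

    return min(candidates, key=sse)


# Synthetic, noise-free titration generated from a known AC50 (illustrative only).
concs = [0.001 * 1.93 ** i for i in range(15)]
responses = [hill(c, 0.0, 100.0, 0.5, 1.0) for c in concs]
estimated_ac50 = fit_ac50(concs, responses)
```

With noise-free synthetic data the estimate lands on the grid point closest to the true AC50; real assay data would additionally call for curve classification and reproducibility checks, as described above.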

Fig. 19.2 Tox21 assay panel. The figure shows the cell types, cell lines, and biological targets/pathways covered by the Tox21 qHTS assays

19.3 Modeling Tox21 Data for Toxicity Prediction

19.3.1 Multiple Species In Vivo Toxicity

In 2015, the Tox21 activity profiles generated from screening the Tox21 10K compound library against a panel of 30 cell-based assays (i.e., NR signaling and SR pathway assays) were evaluated for their utility in predicting in vivo toxicity (Huang et al. 2016a, b). In vivo toxicity data for multiple species were retrieved from the Registry of Toxic Effects of Chemical Substances (RTECS) database compiled by Leadscope (Leadscope, Inc.). Ultimately, 72 in vivo toxicity endpoints were selected, covering 13 different species, such as human, rodents, primates, and birds (Fig. 19.3). Models were built to predict these in vivo toxicity endpoints based on compound assay activity profiles (activity-based models), chemical structure (structure-based models), or both (combined models), using the self-organizing map (SOM) clustering approach. These models rest on the premise that compounds with similar in vitro and/or structural characteristics are likely to exhibit similar in vivo effects. The area under the receiver operating characteristic curve (AUC-ROC) was used as the measure of model performance.

Fig. 19.3 Performance range of multispecies in vivo toxicity prediction models measured by AUC-ROC. In vivo toxicity data including human and animal data were obtained from the Registry of Toxic Effects of Chemical Substances database. The prediction models for in vivo toxicity endpoints were constructed based on either the in vitro assay data (activity model) or chemical structure (structure model), or both (combined model)

The activity-based models built for the human endpoints performed significantly better than the models for the mouse/rat/rabbit toxicity endpoints (Fig. 19.3). This may be explained by species differences: all of the screening data used in this analysis were derived from cell-based assays using human cells or cell lines (Huang et al. 2016a, b). In addition, the models built with the Tox21 assay activity profiles showed reasonable but less than ideal performance for most in vivo toxicity endpoints. This may be because the Tox21 assays focused primarily on NR signaling and SR pathways and did not adequately cover all biological aspects involved in toxic responses. It suggests the need to expand the coverage of the biological response space by including assays that target additional toxicity-relevant pathways in the continuation of the Tox21 program (Huang et al. 2016a, b). Models built based on chemical structures exhibited generally better performance than those built based on assay activity, and did not show any species selectivity (Fig. 19.3). The combination of compound structure and activity data led to significantly better models for most in vivo endpoints than models built with structure or activity data alone (Fig. 19.3), demonstrating the value of in vitro assay data in predicting in vivo toxicity (Huang et al. 2016a, b). Like the structure-based models, the combined models did not show any species dependence in performance (Fig. 19.3). In a follow-up study, models were built to predict adverse drug effects using Tox21 assay data (Huang et al. 2018). The results showed that model performance was significantly improved by adding drug target annotations that represent the biological response space not covered by the current Tox21 assays.
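Because AUC-ROC is the performance measure used throughout this section, a minimal sketch of how it can be computed may be helpful. The function below uses the rank interpretation of AUC-ROC: the probability that a randomly chosen toxic compound receives a higher score than a randomly chosen non-toxic one, with ties counted as one half. The labels and scores are made-up values for illustration and are not taken from any of the cited studies.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toxicity labels (1 = toxic) and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
auc = auc_roc(labels, scores)
```

In this toy example, eight of the nine positive-negative pairs are ranked correctly, so the AUC-ROC is 8/9. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.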

19.3.2 Human In Vivo Toxicity

To further evaluate the utility of the Tox21 assay data in predicting human in vivo toxicity, machine learning models were built using qHTS data generated from screening the Tox21 10K library against 47 cell-based assays in 2019 (Xu et al. 2020a, b). In vivo human toxicity data were retrieved from the ChemIDPlus Advanced database of the US National Library of Medicine (available at http://chem.sis.nlm.nih.gov/chemidplus/), which contains a large number of toxicity reports from multiple laboratories drawn from the published literature. A total of 14 human in vivo toxicity endpoints were collected for modeling purposes, which include behavioral, blood, brain and coverings, cardiac, endocrine, gastrointestinal, kidney ureter and bladder, liver, lungs thorax or respiration, musculoskeletal, peripheral nerve and sensation, sense organs and special senses, skin and appendages skin, and vascular toxicities (Fig. 19.4) (Xu et al. 2020a, b). Five machine learning methods [i.e., Naïve Bayes (NB), random forests (RF), support vector machines (SVM), neural networks (NNET), and extreme gradient boosting (XGBoost)] were applied to build predictive models for the 14 human in vivo toxicity endpoints using chemical structure, in vitro assay data, and a combination of both types of data. Three different metrics [i.e., AUC-ROC, balanced accuracy (BA), and Matthews correlation coefficient (MCC)] were used to evaluate model performance.

Fig. 19.4 Performance of human organ level in vivo toxicity prediction models measured by AUC-ROC. In this study, 47 Tox21 assays with 147 readouts were used, and the 1024-bit ECFP4 fingerprints were generated using the CDK package in the KNIME software. Human in vivo toxicity data were collected by manually querying each chemical in the ChemIDPlus Advanced database of the US National Library of Medicine. Optimal models were obtained from the best combination of five machine learning models [i.e., Naïve Bayes (NB), random forests (RF), support vector machines (SVM), neural networks (NNET), and extreme gradient boosting (XGBoost)] and feature selection methods (i.e., Fisher’s exact test with p value, importance scores from XGBoost, and the RF algorithm)

AUC-ROC was the primary performance metric in this study, and its value for each endpoint varied with the specific machine learning method, type of data, and feature selection method used to build the model (Xu et al. 2020a, b). The top four performing models, with AUC-ROC values > 0.8, were those for endocrine (0.90 ± 0.00), musculoskeletal (0.88 ± 0.02), peripheral nerve and sensation (0.85 ± 0.01), and brain and coverings (0.83 ± 0.02) toxicities, whereas the best model AUC-ROC values were > 0.7 for the remaining 10 toxicity endpoints (Fig. 19.4). In addition, chemical structure and assay data contributed to model performance to different degrees for different in vivo toxicity endpoints. Although in vitro assay data, when combined with chemical structure, slightly improved the predictive accuracy for most endpoints (11 out of 14), a noteworthy finding was the nearly equal success of the structure-only models and the relatively poor performance of the assay-only models (Fig. 19.4). Thus, the top performing models from this study could be applied for hazard screening of large sets of chemicals for potential human toxicity, whereas the top-contributing structural features could serve as structural alerts for toxicity. In addition, the assay-only models could, in principle, be applied to chemicals for which a unique structure is unavailable, such as the non-structurable substances (e.g., mixtures and natural product extracts) in the Tox21 library. Moreover, the assay features (i.e., cellular targets) shown to be influential in model performance can potentially illuminate mechanistic elements of an in vivo toxicity endpoint.
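The two threshold-based metrics named above, BA and MCC, follow directly from the confusion matrix of a binary classifier. The sketch below uses their standard definitions; the example labels and predictions are hypothetical and are not results from the study.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 1 (toxic) and 0 (non-toxic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity (assumes both classes are present)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical labels and predictions for eight compounds.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
ba = balanced_accuracy(*counts)
mcc = matthews_corrcoef(*counts)
```

Both metrics are less sensitive to class imbalance than plain accuracy, which matters for toxicity endpoints where toxic compounds are usually the minority class.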

19.3.3 In Vitro Toxicity

In addition to serving as in vitro signatures to predict in vivo toxicity, the Tox21 assay data provide a rich training dataset for the quantitative structure–activity relationship (QSAR) modeling community. For example, the “Tox21 Data Challenge 2014” was launched by NCATS to “crowd-source” data and build predictive toxicity models using chemical structure (Huang et al. 2016a, b; NCATS 2014). The Tox21 assay data generated from 12 in vitro assays served as the training set for this modeling challenge. These assays included seven NR assays [i.e., aryl hydrocarbon receptor (AhR), androgen receptor full length (AR), androgen receptor ligand binding domain (LBD) (AR-LBD), aromatase, estrogen receptor alpha full length (ER), estrogen receptor alpha LBD (ER-LBD), and peroxisome proliferator-activated receptor gamma (PPAR-gamma)] and five SR pathway assays [i.e., nuclear factor (erythroid-derived 2)-like 2/antioxidant responsive element (ARE), ATAD5, heat shock factor response element (HSE), mitochondrial membrane potential (MMP), and p53] (Fig. 19.5). The competition attracted participants from 18 different countries to develop computational models aimed at better predicting chemical toxicity.

Fig. 19.5 Performance of in vitro toxicity prediction models measured by AUC-ROC. The high-quality concentration response data generated by the Tox21 program provide a knowledge base to correlate chemical structures to their biological activities for quantitative structure–activity relationship (QSAR) modeling

The winning models out of nearly 400 submissions all had good model performance (AUC-ROC > 0.8) (Fig. 19.5). In addition to traditional machine learning methods (e.g., associative neural networks and random forest), deep learning techniques, which use a cascade of many layers of nonlinear processing units to extract features and create transformations, accounted for 50% of the winning models (Fig. 19.5). Beyond the Tox21 Challenge models, 3D-SDAR models were developed to predict the human Ether-à-go-go related gene (hERG) inhibition and phospholipidosis (PLD) induction activities of compounds in the Tox21 collection, and both achieved strong performance (AUC-ROC > 0.87) (Fig. 19.5) (Slavov et al. 2017). Wu et al. constructed over 5000 models based on the qHTS data from screening the Tox21 10K compound library against 65 assays and found that more complex models such as (LS-)SVM and RF performed only marginally better than simpler models such as linear regression and k-nearest neighbors (KNN) (Wu et al. 2021).

In the Tox21 Data Challenge, we found that consensus models generated by combining individual models from all participating teams also yielded excellent model performance, and in some cases even outperformed the winning models (Huang et al. 2016a, b). Consistent with this finding, the US EPA led two worldwide consortia that used consensus models to virtually screen chemicals for their potential estrogenic and androgenic activities [i.e., the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP) and the Collaborative Modeling Project for Androgen Receptor Activity (CoMPARA)]. These projects likewise indicated that consensus models tend to outperform individual models, reinforcing the wisdom of the crowd (Mansouri et al. 2016, 2020). In addition, a new deep learning consensus architecture (DLCA), which combines consensus and multitask deep learning approaches to generate large-scale QSAR models, demonstrated improved prediction accuracy for both regression and classification tasks for Tox21 assay targets compared to other consensus approaches (Zakharov et al. 2019).
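The idea behind a consensus model can be sketched in a few lines. The chapter does not describe how the individual predictions were combined in the studies cited above, so the example below assumes the simplest scheme, an unweighted average of predicted activity probabilities followed by a hypothetical 0.5 activity threshold. The prediction values are invented for illustration.

```python
def consensus_scores(model_predictions):
    """Average per-compound activity probabilities across models (unweighted).

    `model_predictions` is a list with one list of probabilities per model,
    all ordered by the same compounds. Unweighted averaging is an assumption,
    not the combination rule used in the cited studies.
    """
    n_models = len(model_predictions)
    n_compounds = len(model_predictions[0])
    if any(len(p) != n_compounds for p in model_predictions):
        raise ValueError("all models must score the same compounds")
    return [
        sum(p[i] for p in model_predictions) / n_models
        for i in range(n_compounds)
    ]


# Hypothetical predictions from three models for three compounds.
preds = [
    [0.9, 0.2, 0.6],
    [0.7, 0.4, 0.3],
    [0.8, 0.1, 0.7],
]
consensus = consensus_scores(preds)
calls = [score >= 0.5 for score in consensus]  # hypothetical activity threshold
```

Here the second model alone would call the third compound inactive, while the consensus calls it active; averaging dampens the errors of any single model, which is one intuition for why consensus models tend to perform well.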

19.4 Toxicity Pathways and Mechanisms

The Tox21 qHTS data provide a rich and robust set of compound activity profiles that could help in understanding compound mechanisms of toxicity. For example, the data generated from screening the Tox21 Phase I collection of approximately 3000 compounds against a panel of cell viability and caspase-3/7 assays have shown that these compound activity profiles or signatures are useful for hypothesis generation on compound mode of action (MoA) (Huang et al. 2008). Furthermore, the 10K compounds were clustered by their activity profile similarity using the SOM algorithm. Each cluster was evaluated for the enrichment of Medical Subject Headings (MeSH) (NCBI 2013) pharmacological action (PA) terms using Fisher’s exact test. These results indicated that compounds with similar activity profiles tend to share similar MoAs. For example, bisphenol type compounds, which are known estrogenic compounds (Rogers et al. 2012), such as bisphenols A, B, Z, and AF, were found in a cluster neighboring the estrogens, with similar activity profiles. Moreover, among compounds that fall in the same cluster, some may have MeSH PA annotations while others may have none. In general, compounds in the same cluster tend to exhibit similar MoAs, and this information can be applied to infer the MoAs of the unannotated compounds. For example, fludioxonil is a pesticide that is not assigned a MeSH PA and was clustered together with the estrogenic compounds. This finding suggests that fludioxonil could have endocrine disrupting activities, which is confirmed by literature reports (Medjakovic et al. 2013). These results show that the Tox21 10K compound activity profiles are useful for compound MoA hypothesis generation. If compound A without a known MoA has an activity signature similar to that of compound B with a known MoA, this MoA can be prioritized and tested for compound A.
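The enrichment test applied to each cluster can be illustrated with a minimal sketch. It computes a one-sided Fisher’s exact test p value as the upper tail of the hypergeometric distribution, that is, the probability of observing at least k annotated compounds in a cluster by chance. Whether the original analysis was one- or two-sided is not stated, so the one-sided form is an assumption, and the counts below are hypothetical.

```python
from math import comb


def enrichment_p_value(k, cluster_size, annotated_total, library_size):
    """One-sided Fisher's exact test: P(X >= k) under the hypergeometric null.

    k: annotated compounds observed in the cluster
    cluster_size: compounds in the cluster
    annotated_total: compounds in the whole library carrying the annotation
    library_size: compounds in the whole library
    """
    max_k = min(cluster_size, annotated_total)
    total = comb(library_size, cluster_size)
    tail = sum(
        comb(annotated_total, i)
        * comb(library_size - annotated_total, cluster_size - i)
        for i in range(k, max_k + 1)
    )
    return tail / total


# Hypothetical counts: 3 of 4 cluster members carry an annotation that
# only 5 of 20 library compounds have.
p = enrichment_p_value(k=3, cluster_size=4, annotated_total=5, library_size=20)
```

For these hypothetical counts the p value is 155/4845, about 0.032, so the annotation would be considered enriched in the cluster at the 0.05 level. In practice, testing many clusters and many terms also calls for a multiple testing correction.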
While building in vivo toxicity prediction models based on the Tox21 in vitro assay data, it is also possible to identify the molecular targets and biological pathways covered by the assays that contribute the most to model performance (Huang et al. 2016a, b, 2018; Xu et al. 2020a, b). For example, one of the significant contributors to the vascular toxicity prediction model was the hypoxia response element (HRE) assay (i.e., tox21_hre_bla_agonist_p1, p value = 1.84 × 10⁻²) (Xu et al. 2020a, b). In mammalian cells, hypoxia-inducible factor 1 (HIF1) is a transcription factor that binds to HREs in the promoters of target genes to promote their transcription. One such target is the vascular endothelial growth factor (VEGF), which serves as an important growth factor for angiogenesis and vascularization (Burroughs et al. 2013).

To enable systematic and comprehensive identification of molecular targets and biological pathways associated with various in vivo toxicity endpoints, an integrative approach was used to extract molecular targets and biological pathways of chemical-induced toxicity at the human organ level. Eight common endpoints were covered, namely carcinogenicity, cardiotoxicity, developmental toxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, reproductive toxicity, and skin toxicity (including skin irritation and skin sensitization). The combination of molecular targets identified by the models and literature mining results for each toxicity endpoint was subjected to pathway enrichment analysis using the NCATS BioPlanet database (Version 1.0, available at https://tripod.nih.gov/bioplanet/), which is a comprehensive public resource containing 1652 curated human pathways (Huang et al. 2019a, b). A total of 1516 genes associated with toxicity were identified and subsequently analyzed for enrichment of biological pathways, yielding 206 significant pathways (Xu et al. 2020a, b). The top five most significant toxicity genes, selected by XGBoost importance gain scores, and the top five pathways for each in vivo toxicity endpoint are illustrated in Fig. 19.6. For example, the nuclear factor erythroid 2-related factor 2 (NRF2), a transcription factor encoded by the nuclear factor erythroid-derived 2-like 2 (NFE2L2) gene, has been reported to link the oxidative stress response to memory, spatial learning, and neuroinflammation by regulating antioxidant response elements (Sharma et al. 2020). The NFE2L2 gene was found to be significantly associated with neurotoxicity in our study (Fig. 19.6a). Several significant pathways have also been reported to be closely linked to their respective human organs/tissues, which can serve as a validation of our method.
MicroRNAs are a class of small (~22 nt) endogenous non-coding RNAs that can be involved in the epigenetic regulation of gene expression, for example by modulating proteins that are components of the DNA damage response (Hu and Gatti 2011). Genomic instability due to DNA damage caused by carcinogens plays an important role in the initiation and development of cancer. MicroRNA regulation of DNA damage response pathways was also identified as one of the significant pathways related to carcinogenicity (Fig. 19.6b).

The adverse outcome pathway (AOP) framework was developed as a tool for mapping mechanisms of toxic events associated with chemical risk assessment. The molecular targets and biological pathways identified in this study can not only complement the existing AOP framework but can also facilitate the development of new AOPs. For example, the androgen receptor (AR) was found to be involved in carcinogenicity, cardiotoxicity, and nephrotoxicity (Fig. 19.6a) (Xu et al. 2020a, b). However, AR has been documented only as a key event (KE) (e.g., AR antagonism leading to short anogenital distance in male (mammalian) offspring, https://aopwiki.org/aops/306) or a molecular initiating event (MIE) (e.g., AR agonism leading to male-biased sex ratio, https://aopwiki.org/aops/376) in a few reproduction-related adverse outcomes (AOs). The linkage of the MIE and the KE in the AOP framework typically relies on the toxicity pathway (Browne et al. 2017). For example, the inhibition of acetylcholinesterase (AChE) is an MIE in the AOPs developed for neurodegeneration (https://aopwiki.org/aops/281), acute mortality via impaired coordination and movement (https://aopwiki.org/aops/312), and impaired cognitive function (https://aopwiki.org/aops/405). The pathway “neuronal system,” which includes the AChE gene, was found to have a significant association with neurotoxicity (Fig. 19.6b) based on our analysis (Xu et al. 2020a, b).

19.5 Conclusions and Moving Forward

To date, the Tox21 10K compound library has been screened against > 70 in vitro assays in qHTS format, generating > 100 million data points. The big data produced through the Tox21 program have been used to build predictive models of in vivo toxicity. However, the models built on in vitro assay data yielded reasonable but less than ideal performance for most in vivo toxicity endpoints, which may be due to insufficient coverage of the biological response space by the assays screened. Currently, the Tox21 partners are working on incorporating assays that probe previously underrepresented target space, such as developmental toxicity pathways and G-protein coupled receptor signaling, through cross-partner projects (https://tox21.gov/projects/). On the other hand, the QSAR models developed from the Tox21 assay data exhibited robust and excellent performance, confirming the quality and reliability of the Tox21 qHTS data. Furthermore, these QSAR models can be applied to virtually screen large compound libraries to identify putatively active compounds and prioritize them for follow-up toxicological testing. In phase III of the program, Tox21 aims to add medium- to high-throughput transcriptomics assays to test for effects of chemicals on gene expression, providing new and more diverse data resources to better identify chemicals that may cause adverse human health effects. The molecular targets and biological pathways identified from these studies could provide clues to the mechanisms of chemical-induced toxicity and testable hypotheses for experimental verification.

Fig. 19.6 Top five significant molecular targets (a) selected by XGBoost importance gain scores and biological pathways (b) for each human in vivo toxicity endpoint. Model-identified molecular targets were combined with literature mining results for each toxicity endpoint and analyzed for pathway enrichment using the NCATS BioPlanet database. Fisher’s exact test was used to calculate the significance of enrichment of genes for each toxicity endpoint in a particular pathway. Pathways with p values < 0.05 were considered statistically significant


Acknowledgements This work was supported by the Intramural Research Programs of the National Toxicology Program (Interagency Agreement #Y2-ES-7020-01), National Institute of Environmental Health Sciences, the US Environmental Protection Agency (Interagency Agreement #Y3-HG-7026-03), and the Intramural Research Program of the National Center for Advancing Translational Sciences, National Institutes of Health. We also thank Srilatha Sakamuru, Jinghua Zhao, Caitlin Lynch, Li Zhang, Shuaizhang Li, Samuel Michael, and Carleen Klumpp-Thomas for assisting with the screens; Paul Shinn, Misha Itkin, and Danielle Bougie for compound management; and William Leister, Christopher LeClair, and Dingyin Tao for the Tox21 10K library quality control.

References

Attene-Ramos MS, Miller N, Huang R, Michael S, Itkin M, Kavlock RJ et al (2013) The Tox21 robotic platform for the assessment of environmental chemicals – from vision to reality. Drug Discov Today 18:716–723
Browne P, Noyes PD, Casey WM, Dix DJ (2017) Application of adverse outcome pathways to U.S. EPA’s endocrine disruptor screening program. Environ Health Perspect 125:096001
Burroughs SK, Kaluz S, Wang D, Wang K, Van Meir EG, Wang B (2013) Hypoxia inducible factor pathway inhibitors as anticancer therapeutics. Future Med Chem 5(5):553–572
CAS content: substances. https://www.cas.org/cas-data/cas-registry. Retrieved 18 Nov 2021
Collins FS, Gray GM, Bucher JR (2008) Toxicology. Transforming environmental health protection. Science 319:906–907
Hu H, Gatti RA (2011) MicroRNAs: new players in the DNA damage response. J Mol Cell Biol 3(3):151–158
Huang R, Southall N, Cho MH, Xia M, Inglese J, Austin CP (2008) Characterization of diversity in toxicity mechanism using in vitro cytotoxicity assays in quantitative high throughput screening. Chem Res Toxicol 21:659–667
Huang R, Xia M, Nguyen D-T, Zhao T, Sakamuru S, Zhao J et al (2016a) Tox21 challenge to build predictive models of nuclear receptor and stress response pathways as mediated by exposure to environmental chemicals and drugs. Front Environ Sci 3:1–9
Huang R, Xia M, Sakamuru S, Zhao J, Shahane SA, Attene-Ramos M et al (2016b) Modelling the Tox21 10K chemical profiles for in vivo toxicity prediction and mechanism characterization. Nat Commun 7:10425
Huang R, Xia M, Sakamuru S, Zhao J, Lynch C, Zhao T et al (2018) Expanding biological space coverage enhances the prediction of drug adverse effects in human using in vitro activity profiles. Sci Rep 8:3783
Huang R, Grishagin I, Wang Y, Zhao T, Greene J, Obenauer JC et al (2019a) The NCATS BioPlanet – an integrated platform for exploring the universe of cellular signaling pathways for toxicology, systems biology, and chemical genomics. Front Pharmacol 10:445
Huang R, Zhu H, Shinn P, Ngan D, Ye L, Thakur A et al (2019b) The NCATS pharmaceutical collection: a 10-year update. Drug Discov Today 24:2341–2349
Huang R, Southall N, Wang Y, Yasgar A, Shinn P, Jadhav A et al (2011) The NCGC pharmaceutical collection: a comprehensive resource of clinically approved drugs enabling repurposing and chemical genomics. Sci Transl Med 3:80ps16
Huang R (2016) A quantitative high-throughput screening data analysis pipeline for activity profiling. In: Zhu H, Xia M (eds) High-throughput screening assays in toxicology, vol 1473, Part 1. Humana Press
Kavlock RJ, Austin CP, Tice RR (2009) Toxicity testing in the 21st century: implications for human health risk assessment. Risk Anal 29:485–487
Mansouri K, Abdelaziz A, Rybacka A, Roncaglioni A, Tropsha A, Varnek A et al (2016) CERAPP: collaborative estrogen receptor activity prediction project. Environ Health Perspect 124:1023–1033
Mansouri K, Kleinstreuer N, Abdelaziz AM, Alberga D, Alves VM, Andersson PL et al (2020) CoMPARA: collaborative modeling project for androgen receptor activity. Environ Health Perspect 128:027002
Medjakovic S, Zoechling A, Gerster P, Ivanova MM, Teng Y, Klinge CM et al (2013) Effect of nonpersistent pesticides on estrogen receptor, androgen receptor, and aryl hydrocarbon receptor. Environ Toxicol 29(10):1201–1216
NCATS (2014) Tox21 data challenge. Available at: https://tripod.nih.gov/tox21/challenge/
NCATS (2016) Tox21 Data Browser
NCBI (2013) MeSH, medical subject headings. Available at: http://www.ncbi.nlm.nih.gov/mesh
NRC (2007) Toxicity testing in the 21st century: a vision and a strategy. The National Academies Press, Washington, DC
NTP (2021) https://ntp.niehs.nih.gov/
PubChem (2013) Tox21 Phase II compound collection
Richard AM, Judson RS, Houck KA, Grulke CM, Volarath P, Thillainadarajah I et al (2016) ToxCast chemical landscape: paving the road to 21st century toxicology. Chem Res Toxicol 29:1225–1251
Richard AM, Huang R, Waidyanatha S, Shinn P, Collins BJ, Thillainadarajah I et al (2020) The Tox21 10K compound library: collaborative chemistry advancing toxicology. Chem Res Toxicol 34:189–216
Rogers JA, Metz L, Yong VW (2012) Review: Endocrine disrupting chemicals and immune responses: a focus on bisphenol-A and its potential mechanisms. Mol Immunol 53:421–430
Sakamuru S, Zhu H, Xia M, Simeonov A, Huang R (2019) Profiling the Tox21 chemical library for environmental hazards: applications in prioritisation, predictive modelling, and mechanism of toxicity characterisation. In: Big data in predictive toxicology, pp 242–263
Sharma A, Chunduri A, Gopu A, Shatrowsky C, Crusio WE, Delprato A (2020) Common genetic signatures of Alzheimer’s disease in Down Syndrome.
F1000Research 9:1299 Slavov S, Stoyanova-Slavova I, Li S, Zhao J, Huang R, Xia M et al (2017) Why are most phospholipidosis inducers also hERG blockers? Arch Toxicol 91:3885–3895 Tice RR, Austin CP, Kavlock RJ, Bucher JR (2013) Improving the human hazard characterization of chemicals: a Tox21 update. Environ Health Perspect 121:756–765 Wu L, Huang R, Tetko IV, Xia Z, Xu J, Tong W (2021) Trade-off predictivity and explainability for machine-learning powered predictive toxicology: an in-depth investigation with Tox21 data sets. Chem Res Toxicol 34:541–549 Xu T, Ngan DK, Ye L, Xia M, Xie HQ, Zhao B et al (2020a) Predictive models for human organ toxicity based on in vitro bioactivity data and chemical structure. Chem Res Toxicol 33:731–741 Xu T, Wu L, Xia M, Simeonov A, Huang R (2020b) Systematic identification of molecular targets and pathways related to human organ level toxicity. Chem Res Toxicol 34:412–421 Zakharov AV, Zhao T, Nguyen D-T, Peryea T, Sheils T, Yasgar A et al (2019) Novel consensus architecture to improve performance of large-scale multitask deep learning QSAR models. J Chem Inf Model 59:4613–4624

Chapter 20

Identification of Structural Alerts by Machine Learning and Their Applications in Toxicology

Chaofeng Lou, Yaxin Gu, and Yun Tang

C. Lou · Y. Gu · Y. Tang (✉), Laboratory of Molecular Modeling and Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. H. Hong (ed.), Machine Learning and Deep Learning in Computational Toxicology, Computational Methods in Engineering & the Sciences, https://doi.org/10.1007/978-3-031-20730-3_20

20.1 Introduction

Structural alerts, also called “toxicophores” or “toxic fragments,” are functional groups or special structural patterns of compounds that have the potential to cause toxicity (Raies and Bajic 2016). A pattern can be a specific substructure, a combination of substructures, or a Markush structure with variable features such as R groups or atom lists. The concept of structural alerts was first proposed by Ashby (1985) in the context of structural analysis of chemical carcinogens. In that study, he listed a set of substructures, one or more of which are present in a large majority of organic chemical carcinogens and mutagens, indicating that chemical structure can contribute to evaluating new chemical entities for potential carcinogenicity. Afterward, toxicologists in different fields codified a series of chemical rules, which were helpful in ranking and prioritizing new chemical entities. When the toxicity mechanism is not clear, the chances of success are better if medicinal chemists avoid introducing structural alerts in drug design. Structural alerts therefore provide a simple and intuitive way to quickly identify potentially toxic compounds and have been widely applied in hazard and risk assessment in toxicology.

Structural alerts either have high chemical reactivity themselves or can be transformed, via bioactivation by human enzymes, into highly reactive fragments (Limban et al. 2018). Cumulative research over several decades has implicated structural alerts in the generation of reactive metabolites, which can covalently bind to proteins or form adducts and cause idiosyncratic adverse drug reactions (Kalgutkar 2019). According to a survey of 68 withdrawn or black box warning-labeled drugs (Stepan et al. 2011), a significant proportion (55 out of 68, 80.9%) contained structural alerts, and reactive metabolites were demonstrated to be formed for 36 of these drugs (65%). Consequently, avoiding structural alerts, or at least reducing their chemical reactivity, has become a widely accepted strategy in new drug design (Fig. 20.1).

Fig. 20.1 Some examples of knowledge-based structural alerts in carcinogenicity, skin sensitization, and idiosyncratic toxicity

Notably, despite the universality and applicability of structural alerts in chemical toxicology, there is growing concern about false positives, because structural alerts flag a disproportionately large number of chemicals as toxic. If one treats structural alerts as dogma, these knowledge-based rules can do more harm than good for toxicity assessment. We still have a long way to go in exploring structural alerts, but we can see further if we stand on the shoulders of giants.

20.2 Approaches for Identification of Structural Alerts

Given the important role of structural alerts in toxicity research and new drug development, methods for identifying structural alerts have received great attention and made great progress. In the early stages of toxicity research, structural alerts were derived by toxicologists from in vivo/in vitro experiments and summarized into toxicity rules with simple statistical analysis. Afterward, as the volume of bioactivity data grew, many novel computational algorithms were proposed. These removed the dependence on expert curation for extracting structural alerts and greatly enriched the structural alert library. More importantly, these data-driven structural alerts performed better than empirical structural alerts in terms of overall predictive accuracy for some endpoints, such as Ames mutagenicity. In general, current methods for identification of structural alerts can be divided into expert systems and computational approaches. The former rely on the experience and knowledge of experts, while the latter use algorithms to automatically extract toxicity-related substructures from large datasets. Examples of these two approaches are summarized in Table 20.1, in which the computational approaches are further divided into frequency analysis and interpretable machine learning methods.

Table 20.1 Methods for identification of structural alerts

Approach | Subclass | Example
Expert systems | Commercial software | Derek Nexus (1996), LeadScope (2000)
Expert systems | Free tools | ChemoTyper (2021), ToxTree (2021)
Expert systems | Web servers | ToxAlerts (2012), AGDH
Computational approaches | Frequency analysis, graph-based | MoSS (2002), Gaston (2004)
Computational approaches | Frequency analysis, fragment-based | SARpy (2013), CASE (2012, 1984), Emerging pattern (2014)
Computational approaches | Frequency analysis, fingerprint-based | Bioalerts (2016), FP (2017)
Computational approaches | Machine learning, QSTR | CNN (2017), NN for assessment (2018)
Computational approaches | Machine learning, others | SOM prediction (2017)

20.2.1 Expert Systems

In past decades, before the wide application of computer technology, toxicologists used controlled experiments to study the causes of toxicity in terms of chemical structure, i.e., quantitative structure–activity relationship (QSAR). Series of toxicity rules were summarized by toxicologists through in vivo/in vitro toxicology experiments, mainly from small datasets. For a certain class of compounds with a specific toxicity, toxicologists would first come up with a series of conjectures about the impact of each substructure on toxicity. They would then design experiments that record how the toxicity changes when each substructure in the compound is modified. If the toxicity of the compound disappears when a specific substructure is modified, that substructure is a structural alert or toxicophore. For instance, Li et al. (2016) conducted in-depth research on the cause of hepatotoxicity of Diosbulbin B (DIOB) and concluded that the furan moiety is essential for DIOB-induced hepatotoxicity, i.e., that it is a structural alert. They replaced the furan of DIOB with a tetrahydrofuran group by chemical hydrogenation of the furan ring, and no injury was observed in animals given the same doses of tetrahydro-DIOB (see Fig. 20.2). Further research revealed that reactive metabolites of DIOB were generated by cytochrome P450 3A enzymes and depleted hepatic GSH, which could account, at least in part, for the hepatic injury. Thus, the furan ring was regarded as a structural alert for hepatotoxicity, which calls for caution in the use of furan rings in drug development.

Fig. 20.2 Structure of DIOB and tetrahydro-DIOB

To date, toxicologists have conducted research on many toxicity endpoints, including genotoxicity (Maron and Ames 1983), mutagenicity (Coquin et al. 2015), and endocrine disruption (Nendza et al. 2016), and have summarized many meaningful chemical rules. Commercial software such as Derek Nexus (Ridings et al. 1996) and LeadScope (Roberts et al. 2000) are known as “expert systems,” which have collected many structural alerts for different endpoints. Furthermore, some free tools or web servers, including ToxAlerts (Sushko et al. 2012), Australian Government Department of Health (AGDH) (2019), and ChemoTyper (2021), can also be used to check an orphan compound for structural alerts previously collected from the literature (Table 20.1). There is no doubt that the structural alerts from these expert system-based methods greatly promote research on toxicity mechanisms and toxicity prediction.


20.2.2 Computational Approaches

Despite the prevalence and usefulness of expert systems (especially for nontechnical users), an expert system alone cannot uncover the chemical rules hidden in large and complex chemical and biological datasets (Yang et al. 2020). Therefore, researchers have proposed many computational algorithms that automate the identification of structural alerts from large datasets (e.g., Bertrand et al. 2012; Ferrari et al. 2013; Sherhod et al. 2014). In general, computational methods for identification of structural alerts can be roughly divided into two approaches: conventional statistical analysis and interpretable machine learning models. The basic principle of the former is frequency analysis, the general idea of which is to find substructures that occur more frequently in toxic compounds than in nontoxic ones. The latter uses more complex algorithms to explore the impact of different substructural patterns on the toxicity endpoints and labels the influential fragments as structural alerts.

20.2.2.1 Conventional Statistical Analysis Approach

Conventional statistical analysis approaches first need to obtain all possible substructures and then perform statistical analysis on each substructure. In general, three methods can help derive the substructure set:

(i) Fragment-based approaches utilize cheminformatics tools, such as Pipeline Pilot (2021), to identify all possible fragments from the training set. SARpy, a free standalone tool that extracts key structural rules according to their occurrence in a training set, uses a SMILES-based algorithm to break all possible bonds except for the bonds in rings (Ferrari et al. 2013).

(ii) Fingerprint-based approaches leverage well-defined fingerprints of various lengths as the source of substructures. For example, PubChem defines an 881-bit fingerprint in which each bit indicates a unique substructure or a topological feature, whereas the MACCS fingerprint contains a subset of 166 keys. Both are implemented in open-source cheminformatics software packages, including RDKit (2021), CDK (Ayed et al. 2019), etc. These fingerprints were originally designed to represent molecular features for QSAR studies, but they can also be used to derive structural alerts.

(iii) Graph-based approaches are more efficient and faster for deriving substructure sets and identifying highly frequent substructures. Tools including the Molecular Substructure miner (MoSS) (Borgelt and Berthold 2002) and the GrAph/Sequence/Tree extractiON tool (Gaston) (Nijssen and Kok 2004) regard a molecule as a graph whose vertices are atoms and whose edges are bonds. The former uses depth-first-search rules, while the latter is more flexible because a vertex can be defined as a generalized atom; for instance, “[N,O]” means an atom that can be a nitrogen or an oxygen. Gaston has been used by Kazius et al. (2006) and Wang et al. (2012) for genotoxicity research.
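The fingerprint-based route can be sketched in a few lines of plain Python. The sketch below treats each compound as the set of fingerprint bits that are switched on and tallies, for every bit, how many compounds contain it and how many of those are toxic. The bit indices, labels, and the function name `count_substructures` are invented for illustration; in practice the bit sets would be produced by a cheminformatics toolkit.

```python
from collections import defaultdict

def count_substructures(fingerprints, labels):
    """Return {bit: (m, k)}: m compounds contain the bit, k of them are toxic."""
    counts = defaultdict(lambda: [0, 0])
    for bits, toxic in zip(fingerprints, labels):
        for bit in bits:
            counts[bit][0] += 1           # compounds containing the substructure
            counts[bit][1] += int(toxic)  # toxic compounds containing it
    return {bit: tuple(mk) for bit, mk in counts.items()}

# Hypothetical toy data: each set holds the indices of the bits that are on.
fingerprints = [{1, 5}, {1, 7}, {5}, {1, 5, 7}, {7}]
labels = [1, 1, 0, 1, 0]  # 1 = toxic, 0 = nontoxic

table = count_substructures(fingerprints, labels)
for bit in sorted(table):
    m, k = table[bit]
    print(f"bit {bit}: present in {m} compounds, {k} of them toxic")
```

The per-bit counts are the raw material for any frequency-based score: a bit whose toxic count is close to its total count is a candidate alert, subject to the statistical checks a given study applies.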
After derivation of the substructure set, statistical metrics such as precision, enrichment factor, likelihood ratio, and p-value are commonly used to assess the quality of structural alerts. Precision is defined as the ratio between true positive samples and all samples predicted to be positive. In structural alert studies, we can treat a compound with the structural alert (t) as a positive prediction (Eq. 20.1), and the structure–substructure association matrix can be calculated via substructure matching. The enrichment factor evaluates how much the proportion of toxic compounds among those containing the substructure exceeds the proportion in the whole dataset. Compared to precision, the enrichment factor used by Du et al. (2017) is normalized by the ratio of toxic compounds in the dataset (Eq. 20.2). Similarly, the likelihood ratio evaluates a relative increase in toxic compounds, but it compares the toxic/nontoxic ratio rather than the toxic/total ratio used in the enrichment factor, and hence it can be infinite if no nontoxic compound contains the substructure. The p-value (Eq. 20.3) evaluates how significant, with respect to random chance, the association between the presence of a given substructure and toxicity is. Most studies use Fisher’s exact test of independence, in which a hypergeometric distribution of the probability of k successes (toxic compounds) in random draws without replacement is used to evaluate significance.

$$\text{Precision} = P(X = 1 \mid t) = \frac{k}{m} \tag{20.1}$$

$$\text{Enrichment factor} = \frac{P(X = 1 \mid t)}{P(X = 1)} = \frac{k/m}{n/N} \tag{20.2}$$

$$P\text{-value} = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}} \tag{20.3}$$

$$\text{Coverage rate} = \frac{m}{N} \tag{20.4}$$

$$E(p) = -p \log(p) - (1 - p)\log(1 - p) \tag{20.5}$$

$$IG = E(P(X = 1)) - P(t)\,E(P(X = 1 \mid t)) - P(\bar{t})\,E(P(X = 0 \mid \bar{t})) \tag{20.6}$$

where X represents a binary property with two possible values (0 means nontoxicity and 1 means toxicity). For a potential structural alert T, $t$ denotes the compounds that contain the substructure and $\bar{t}$ the compounds that do not. E(p) is the information entropy, in which p is the probability of the molecules in one category. P(A) is the probability of observing A, and P(A|B) is the probability of observing event A given the condition B. N is the total number of compounds in the dataset, n is the number of toxic compounds, m is the number of compounds containing the substructure, and k is the number of toxic compounds containing the substructure.
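As a worked illustration of Eqs. 20.1 to 20.6, the following sketch computes each metric from the four counts N, n, m, and k. Several choices are assumptions rather than part of the text: the counts are invented, the entropy uses base-2 logarithms (the equations do not fix the base), and a one-sided tail sum is reported next to the single hypergeometric term of Eq. 20.3, since a tail probability is what Fisher's exact test usually reports.

```python
from math import comb, log2

def entropy(p):
    """Binary information entropy E(p) of Eq. 20.5, in bits (base 2 is an assumption)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # limit of p*log(p) as p approaches 0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def alert_metrics(N, n, m, k):
    """Score one candidate substructure from the counts N, n, m, k defined in the text."""
    precision = k / m                    # Eq. 20.1
    enrichment = precision / (n / N)     # Eq. 20.2
    # Eq. 20.3: probability of exactly k toxic compounds among the m matches.
    p_point = comb(m, k) * comb(N - m, n - k) / comb(N, n)
    # Assumption beyond the text: one-sided tail, k or more toxic matches.
    p_tail = sum(comb(m, i) * comb(N - m, n - i)
                 for i in range(k, min(m, n) + 1)) / comb(N, n)
    coverage = m / N                     # Eq. 20.4
    # Eq. 20.6; E is symmetric, so E(P(X=0|not t)) equals E(P(X=1|not t)).
    p_tox_without = (n - k) / (N - m) if N > m else 0.0
    ig = (entropy(n / N)
          - coverage * entropy(precision)
          - (1 - coverage) * entropy(p_tox_without))
    return {"precision": precision, "enrichment_factor": enrichment,
            "p_point": p_point, "p_tail": p_tail,
            "coverage": coverage, "information_gain": ig}

# Hypothetical counts: 100 compounds, 30 toxic; 10 contain the substructure, 8 of them toxic.
result = alert_metrics(N=100, n=30, m=10, k=8)
for name, value in result.items():
    print(f"{name}: {value:.4g}")
```

With these counts the substructure is precise (8 of 10 matches are toxic) but covers only a tenth of the dataset, which is the situation the coverage rate and information gain are meant to expose.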


The four metrics mentioned above are similar and mainly focus on how precisely a substructure predicts the toxicity of compounds. However, if a substructure is strongly associated with a toxicity, it should also have a high coverage rate (Eq. 20.4). Information gain (IG), a comprehensive metric that assesses a substructure in terms of both precision and coverage rate, is widely used in machine learning for ranking feature importance. It is plausible to use IG to assess substructures, as we can regard structural alerts as key features of compounds in toxicity prediction models. In the field of computational toxicology, some researchers have labeled the substructures with the highest IG values as structural alerts.

All the conventional statistical analyses mentioned above share several disadvantages. First, redundancy is inevitable: substructures with only slight differences are all treated as unique structural patterns yet share similar metric values, resulting in a set of similar or partially repeated structural alerts. It is not easy to select the most meaningful one without considering the chemical and biological processes associated with the substructures. Second, current cheminformatics tools cannot handle tautomerism precisely. In general, only the most stable isomer is considered during calculation, so structural alerts hidden in other isomers may be missed. Moreover, none of these approaches considers the interaction between substructures; they assume by default that toxicity is caused by a single substructure in isolation. In reality, the toxicity of a compound might depend on more than one substructure, as some compounds with structural alerts are nontoxic.

20.2.2.2 Interpretable Machine Learning Approach

Unlike conventional statistical analysis approaches, interpretable machine learning methods explore the relationship between substructures and toxicity through more complex algorithms. Substructures with great impact on toxicity are labeled as structural alerts (Kim and Nam 2020). In this chapter, we provide an overview of two important categories of interpretable machine learning methods: tree-based algorithms and neural network-based algorithms.

Tree-based algorithms, such as random forest and extreme gradient boosting, can calculate an importance value for each input substructure feature in classification models through Gini impurity or information gain/information entropy. All these metrics measure the difference between the original data set and the two data sets obtained by splitting on a substructure feature. Substructures with high feature importance values are critical for model prediction and can be regarded as potential structural alerts. In other words, the feature importance value reveals the correlation between substructures and toxicity and provides access to model interpretation.

Neural network-based algorithms, another important class of interpretable machine learning methods, have made great progress in recent years, especially in the field of deep learning. Notably, not all neural network-based algorithms are interpretable. The original artificial neural network models for toxicity prediction are generally criticized as being “black box” in nature with questionable reliability, despite their ability in some cases to achieve high performance. To break the shackles of the uninterpretable black box model, many novel deep network architectures and algorithms have been proposed in the past decade that focus on increasing the ability of neural networks to automatically learn hidden knowledge within molecules and extract important structural features (e.g., Wu et al. 2021; Xiong et al. 2020; Zhang et al. 2021). For instance, Xu et al. (2017) developed an improved molecular graph encoding convolutional neural network architecture for acute oral toxicity (AOT) prediction, which can automatically learn from the dataset and successfully derive AOT-related chemical substructures by reverse mining of the features. Xiong et al. (2020) introduced a new graph neural network architecture for molecular representation with a graph attention mechanism and used atom attention weights to visualize model prediction results, which intuitively reveals the contribution of each atom in each molecule to toxicity. Compared with other machine learning methods, deep neural network algorithms show great superiority in feature generation and model performance since they avoid the subjectivity of feature selection.

In addition to the interpretability of the machine learning algorithm itself, several interpretation strategies (e.g., Balfer and Bajorath 2015; Polishchuk 2017; Rodríguez-Pérez et al. 2017) have been proposed to reduce the black box nature of machine learning models. These strategies, such as feature weighting and sensitivity analysis (So and Richards 1992), can be utilized as new approaches to identify potential structural alerts. Feature weighting is a model-specific method to evaluate the importance of features and has been applied in support vector machine and random forest models. As a model-independent method, sensitivity analysis observes how systematic perturbation of a feature influences the model output, which is convenient for confirming the most relevant features of the model. However, these approaches have certain limitations and cannot cope with increasingly complex machine learning models. The SHapley Additive exPlanations (SHAP) method offers an alternative. In simple terms, SHAP quantifies the contribution of each structural feature to the model output by turning features on and off, and it visualizes the structural patterns that determine model predictions. This provides new directions for machine learning-driven identification of structural alerts.
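The idea of attributing a prediction to features by turning them on and off can be illustrated with exact Shapley values on a toy model. This sketch is not the SHAP library, which relies on approximations and a background data distribution. It enumerates every subset of features, which is feasible only for a handful of them. The substructure names and the scoring function `toy_model` are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, features):
    """Exact Shapley value of each feature; model maps a set of 'on' features to a score."""
    n = len(features)
    values = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                on = frozenset(subset)
                # Marginal contribution of switching feature f on.
                total += weight * (model(on | {f}) - model(on))
        values[f] = total
    return values

def toy_model(on):
    """Hypothetical toxicity score with an interaction between two substructures."""
    score = 0.0
    if "nitro" in on:
        score += 0.5
    if "furan" in on:
        score += 0.2
    if "nitro" in on and "furan" in on:
        score += 0.2  # extra risk when both are present
    return score

phi = shapley_values(toy_model, ["nitro", "furan", "hydroxyl"])
for name, value in phi.items():
    print(f"{name}: {value:.3f}")
```

The interaction term is shared equally between the two substructures involved, the irrelevant feature receives zero, and the attributions sum to the difference between the score with all features on and the score with none.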

20.2.3 Comparison of Data-Driven Structural Alerts with Expert Systems

Expert systems are rule-based models, in which the rules are designed by experts according to in vivo/in vitro toxicology experiments. Structural alerts from expert systems are therefore more reliable, because the relevant mechanisms are well explored and validated. Computational approaches for deriving structural alerts, in contrast, depend on mathematical algorithms and lack direct experimental evidence, which makes mechanism exploration difficult. In most circumstances, data-driven structural alerts need further confirmation by comparison with expert system rules to prove the reliability of the algorithms. In addition, computational approaches may overlook structural alerts that fail to reach statistical significance for lack of data, although in many cases these alerts should not be discarded.

In terms of toxicity prediction, computational approaches, especially machine learning models, perform better than expert systems. Frequency-based methods can be regarded as a special branch of machine learning, in which the features are all possible fragments/substructures and the algorithm is constructed with IF–THEN logic. Importantly, there are differences at the algorithm level. For example, most machine learning methods optimize their internal parameters to obtain the best average score for a batch of compounds, while frequency-based methods focus on the frequency of each substructure in related compounds without considering the predictive performance of the whole model. Similarly, because experts focus on the mechanism by which a structural alert is linked to an adverse event and predict the toxicity of a new compound by whether it contains structural alerts, expert systems cannot optimize model performance as computational methods do, resulting in poorer predictive ability.
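The IF–THEN logic shared by expert systems and frequency-based methods can be sketched as follows: a compound is predicted toxic if it contains any alert, and the rule set is then scored against known labels. The substructure names, the alert set, and the labels are all invented for illustration, and compounds are represented simply as sets of named substructures rather than real chemical structures.

```python
def predict_with_alerts(substructures, alerts):
    """IF the compound contains any alert THEN predict toxic (1), ELSE nontoxic (0)."""
    return int(bool(substructures & alerts))

def confusion_counts(predictions, labels):
    """Count true/false positives and negatives for binary predictions."""
    pairs = list(zip(predictions, labels))
    return {
        "tp": sum(p == 1 and y == 1 for p, y in pairs),
        "fp": sum(p == 1 and y == 0 for p, y in pairs),
        "tn": sum(p == 0 and y == 0 for p, y in pairs),
        "fn": sum(p == 0 and y == 1 for p, y in pairs),
    }

# Hypothetical alert set and toy compounds.
alerts = {"nitroaromatic", "epoxide"}
compounds = [
    {"nitroaromatic", "phenyl"},
    {"epoxide"},
    {"phenyl", "ester"},
    {"nitroaromatic", "carboxylic acid"},
    {"ester"},
]
labels = [1, 1, 0, 0, 1]  # 1 = toxic, 0 = nontoxic

predictions = [predict_with_alerts(c, alerts) for c in compounds]
counts = confusion_counts(predictions, labels)
print(predictions, counts)
```

Nothing in this rule set is tuned against the labels, which is the contrast drawn above: a machine learning model would adjust its parameters to reduce the false positive and the false negative, whereas the rule set stays fixed.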

20.3 Application of Structural Alerts in Toxicology

20.3.1 Toxicity Prediction

In toxicological studies, structural alerts have a wide range of applications, as shown in Fig. 20.3. In the field of toxicity prediction, researchers can use known structural alerts to assess the potential risk that a compound causes a given adverse effect. For example, the presence of a structural alert is marked as a high-risk signal (Allen et al. 2018). Some web servers or tools have added structural alerts as plug-ins for screening molecules, such as ToxAlerts (Sushko et al. 2012), FAF-Drugs (Lagorce et al. 2015), and ChEMBL (Gaulton et al. 2017). In addition to endpoints of widespread interest, like organ level toxicity (Cronin et al. 2017) and genotoxicity (Snyder 2009), structural alerts are convenient for assessing environmental indicators such as the bioconcentration factor (BCF), which is required by legislation including the European Registration, Evaluation, Authorization, and Restriction of Chemicals (REACH) framework (Valsecchi et al. 2019). With the help of structural alerts, regulators can conduct risk assessments of chemicals for humans and ecosystems more efficiently.

Fig. 20.3 Schematic diagram of the application of structural alerts in toxicology

However, using structural alerts alone to predict toxicity will inevitably lead to false positives: many nontoxic compounds could be labeled as potentially toxic, which would greatly reduce the credibility of a structural alert model (Schüürmann et al. 2016). Another criticism concerns the potential abuse of structural alerts, which leads to an unnecessary narrowing of the chemical space available for drug discovery (Kalgutkar and Dalvie 2015). Hence, when using a structural alert model for toxicity prediction, it is critical to have a measure of the applicability domain (Chakravarti et al. 2012). For instance, Wedlake et al. (2020) proposed a method for assigning confidence in both active and inactive predictions from structural alerts for protein binding molecular initiating events (MIEs). This approach relies on the Tanimoto similarity between the Morgan fingerprints of new chemicals and those of relevant chemicals in the training set and assigns different confidence levels based on cutoff values. Furthermore, several studies have demonstrated that combining structural alerts with machine learning models can greatly improve the accuracy of toxicity prediction (e.g., Chakravarti and Saiakhov 2019; Pizzo et al. 2016).
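A similarity-based confidence measure of this kind can be sketched with set arithmetic. The code below computes the Tanimoto similarity between sets of on-bits and grades a query by its nearest training-set neighbor. It is a minimal illustration, not the procedure of Wedlake et al. (2020): the cutoff values 0.7 and 0.4, the three confidence labels, and the bit sets are assumptions, and real Morgan fingerprints would come from a cheminformatics toolkit.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of on-bits: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def prediction_confidence(query_bits, training_bits, high=0.7, low=0.4):
    """Grade a query by its similarity to the nearest training compound.

    The cutoffs are hypothetical placeholders, not published values.
    """
    best = max(tanimoto(query_bits, t) for t in training_bits)
    if best >= high:
        level = "high"
    elif best >= low:
        level = "medium"
    else:
        level = "low"
    return best, level

# Hypothetical fingerprints of training compounds, as sets of on-bit indices.
training = [{1, 2, 3, 4}, {2, 3, 5}]

for query in ({1, 2, 3}, {2, 3, 5, 6, 9}, {7, 8}):
    similarity, level = prediction_confidence(query, training)
    print(sorted(query), f"{similarity:.2f}", level)
```

A query that resembles nothing in the training set receives low confidence whichever way the alert fires, which is how an applicability-domain check tempers both active and inactive calls.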

20.3.2 Explanation of QSAR Models

Compared with structural alert models, machine learning models show much better predictive performance, owing to the development of molecular representation methods such as molecular fingerprints and molecular graphs. However, these high-performance machine learning models are also criticized for their lack of interpretability: the predictions cannot be explained, and the model applicability domain may not be clear enough (Cherkasov et al. 2014). The problem arises from the complex relationship between molecular features and chemical toxicity captured by the statistical learning algorithms that QSAR models use, which is why they are commonly considered “black boxes.” For example, a neural network transforms molecular features into hidden-neuron values, each of which is the activation of a weighted sum of input features. When the network becomes complex, it is difficult to explain the relationship between the values in hidden layers and the input features or output values.

Given the interpretability of structural alerts in terms of toxicity mechanism, many researchers (e.g., Chen et al. 2014; Lei et al. 2016; Li et al. 2014) proposed overcoming the “black box” character of QSAR models by combining them with structural alerts. This strategy first constructs QSAR models and then derives structural alerts from the same training set through machine learning algorithms or conventional statistical analysis. The structural alerts represent the fragments or functional groups that may lead to a specific toxicity, and they explain the predictions of the QSAR models through their presence or absence in target compounds. Alves et al. (2016) demonstrated that blind reliance on structural alerts in chemical read-across and toxicity prediction could lead researchers astray and might be harmful to drug discovery and risk assessment. Importantly, combining structural alerts with QSAR models could compensate for their individual shortcomings and provide a comprehensive analysis for potential toxicity risk assessment. Since then, structural alerts and QSAR models have become golden partners in the field of toxicity prediction. A good example is the feature combination network proposed by Webb et al. (2014), which integrates “black box” QSAR models and “interpretable” structural alerts to predict Ames mutagenicity. However, this concept of integration has drawn criticism (Rudin 2019), where it was argued that it is not reliable to explain current machine learning models with structural alerts from other sources. The author instead recommended interpretable models, as they can not only predict the property of a compound but also provide chemical information or structural alerts showing why such a prediction was made.

In the past decade, many interpretable machine learning algorithms have been proposed, such as the attention mechanism-based graph neural network (Xiong et al. 2020), the knowledge-based deep neural network (Ciallella et al. 2021), and gradient-weighted class activation mapping (Mukherjee et al. 2021). As mentioned in the computational approach section, structural alerts can be derived from machine learning models, avoiding inconsistency between predictions and interpretations. However, these machine learning-based QSAR models are constructed from a mathematical perspective, so data-driven structural alerts still need further verification by in vivo/in vitro experiments.

20.3.3 Molecular Optimization

Apart from toxicity prediction, structural alerts can provide important guidance for molecular design. As mentioned above, structural alerts have the potential to cause certain toxicities, so one can reduce the risk of adverse effects associated with exposure to a chemical by avoiding structural alerts or screening them out in lead discovery (Kalgutkar 2019). However, such avoidance strategies could unnecessarily induce drug attrition in the early stage of drug development, so molecular optimization through structural alerts has received great attention. When structural alerts related to toxicity are identified, two molecular optimization schemes are often considered: (i) replacing the structural alert, partly or fully, in the lead compounds and (ii) changing the local chemical environment of the structural alert by adding another substructure, which can help reduce potential adverse effects.

For the former, bioisosteric replacement and scaffold hopping are promising methods for substructure replacement, since the replacement substructure is similar to the original one, which avoids dramatic changes in the bioactivity of the compound (Hu et al. 2017; Seddon et al. 2018; Vainio et al. 2013). Some web servers, such as ADMETopt, can achieve scaffold hopping by automatically generating several similar compounds that differ slightly in the scaffold (Yang et al. 2018a, b). However, in some special circumstances the structural alert is itself the pharmacophore responsible for drug efficacy, so direct modification of the structural alert may destroy the core structure of the compound, resulting in significant changes in chemical properties and bioactivity.

The latter strategy changes the local chemical environment of structural alerts by adding certain functional groups or structural patterns to reduce the risk of structural alerts while maintaining bioactivity. For instance, Yang et al. (2018a, b) proposed the concept of nontoxic substructures, whose presence may affect the function of the corresponding structural alerts and reduce the risk of toxicity. Nevertheless, molecular optimization through structural alerts is still a nascent field. More methodologies need to be developed to meet the requirements of rational modification of structural alerts.

20.3.4 Exploring New Mechanisms

Toxicity mechanism exploration is an important part of toxicology research, and structural alerts can help toxicologists understand the underlying mechanisms of toxicity. Because structural alerts are closely related to toxicity, a newly detected structural alert may point to a new toxicity mechanism. Some structural alerts may trigger molecular initiating events (MIEs) of adverse outcome pathways (AOPs, Vinken 2013), while others may undergo metabolism and generate reactive metabolites (RMs) that, for example, can covalently bind to proteins or DNA, interact with cytochromes P450 (CYP450), or lead to cellular damage or pharmacokinetic drug–drug interactions. For example, Chen et al. (2014) found that benzimidazole could bind to the farnesoid X receptor (FXR) and that such binding may trigger the MIE in the AOP of endocrine disruption. In this way, analysis of benzimidazole explained an important mechanism of endocrine disruption. Another example is the piperazine ring, which Paludetto et al. (2019) showed to be a structural alert that can be metabolized by CYP450 enzymes to generate iminium carbonyl RM, inducing liver toxicity.

In the field of toxicity mechanism exploration, knowledge-based structural alerts perform better than data-driven structural alerts. This is because data-driven structural alerts are not necessarily linked to the mechanism of action (MOA), as much biochemical information is not considered in computational approaches. In some genotoxicity and skin sensitization research, data-driven approaches have been shown to identify structural alerts that are closely related to mechanisms reported in the literature (Yang et al. 2017). Nevertheless, many unconvincing structural alerts are detected at the same time because of dataset noise, which makes it harder for toxicologists to explore new mechanisms. From another perspective, although current methods cannot directly detect or validate potential MOAs, these structural alerts are still helpful for exploring new mechanisms of toxicity when combined with additional effort and other resources.

20.4 Perspectives and Outlook

Over the past decade, considerable progress has been made in the identification and application of structural alerts. As noted above, both knowledge-based expert systems and statistics-based computational approaches have greatly promoted the development of structural alerts, providing important guidance for toxicity risk assessment. However, structural alerts must be interpreted with a sound scientific understanding. First, compounds with structural alerts are potentially toxic but not necessarily toxic, as demonstrated by many approved drugs. Secondly, multiple structural alerts may coexist in the same molecule, and their toxic effect may change because of interactions between substructures, such as additive, synergistic, and antagonistic effects. Thirdly, structural alert models for toxicity prediction have a relatively narrow applicability domain, resulting in inferior performance compared to machine learning models. Nevertheless, structural alert models have great advantages in model interpretation and visualization. Finally, rather than blindly following a structural alert avoidance strategy, when in doubt we need to let the data speak, and one thing is clear: structural alerts are merely alerts.

Computational approaches focus on the relationship between the occurrence of structural alerts and toxicity and use statistical algorithms to summarize toxicity rules from big data. Expert systems, on the other hand, rely solely on prior knowledge of toxicity mechanisms to identify structural alerts. Therefore, the former have made great achievements in toxicity prediction, while the latter are more prominent in toxicity interpretation. This situation suggests that integrating chemical rules from expert systems into the training of machine learning models may improve the identification of structural alerts, and thus a novel framework that uses machine learning to explore toxicity mechanisms based on MOA is urgently needed. Furthermore, the development of molecular optimization methods based on structural alerts should not be ignored. Current chemical rules for structural alert optimization are mainly summarized from expert systems, and very few computational studies focus on this field. In conclusion, the identification of structural alerts and their application in toxicology have advanced greatly, from conventional expert systems to artificial intelligence algorithms, but one thing remains true: the important position of structural alerts in toxicology research has never changed.
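How a statistics-based approach relates alert occurrence to toxicity can be illustrated with a small calculation. The Python sketch below scores one candidate substructure by its precision, its enrichment over the baseline toxic fraction, and a one-sided hypergeometric (Fisher-type) p-value. The function name and argument names are illustrative assumptions, and the sketch is not the scoring scheme of any specific tool discussed in this chapter.

```python
from math import comb


def alert_statistics(n_alert_toxic, n_alert_total, n_toxic, n_total):
    """Summarize how strongly one substructure is associated with toxicity.

    n_alert_toxic: toxic compounds that contain the substructure
    n_alert_total: all compounds that contain the substructure
    n_toxic:       toxic compounds in the whole data set
    n_total:       all compounds in the data set
    """
    precision = n_alert_toxic / n_alert_total  # toxic fraction among carriers
    baseline = n_toxic / n_total               # toxic fraction overall
    # Hypergeometric upper tail: probability of at least n_alert_toxic toxic
    # compounds among the carriers if the substructure were unrelated to toxicity.
    upper = min(n_alert_total, n_toxic)
    tail = sum(
        comb(n_toxic, k) * comb(n_total - n_toxic, n_alert_total - k)
        for k in range(n_alert_toxic, upper + 1)
    )
    p_value = tail / comb(n_total, n_alert_total)
    return {
        "precision": precision,
        "enrichment": precision / baseline,
        "p_value": p_value,
    }
```

For a hypothetical set of 10 compounds of which 5 are toxic, a substructure present in 4 compounds that are all toxic gives `alert_statistics(4, 4, 5, 10)` a precision of 1.0, an enrichment of 2.0, and a p-value of 5/210 (about 0.024). Such a score expresses association only and says nothing about mechanism, which is one reason why structural alerts are merely alerts.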

References

Allen TEH, Goodman JM, Gutsell S, Russell PJ (2018) Using 2D structural alerts to define chemical categories for molecular initiating events. Toxicol Sci 165:213–223
Alves V, Muratov E, Capuzzi S, Politi R, Low Y, Braga R, Zakharov AV, Sedykh A, Mokshyna E, Farag S, Andrade C, Kuz’min V, Fourches D, Tropsha A (2016) Alarms about structural alerts. Green Chem 18:4348–4360
Ashby J (1985) Fundamental structural alerts to potential carcinogenicity or noncarcinogenicity. Environ Mutagen 7:919–921
Australian Government Department of Health. https://www.nicnas.gov.au. Accessed 7 Jan 2019
Ayed M, Lim H, Xie H (2019) Biological representation of chemicals using latent target interaction profile. BMC Bioinform 20(Suppl 24):674
Balfer J, Bajorath J (2015) Visualization and interpretation of support vector machine activity predictions. J Chem Inf Model 55(6):1136–1147
Bertrand C, Guillaume P, Bruno C, Alban L, Ronan B (2012) Emerging patterns as structural alerts for computational toxicology. In: Contrast data mining. Chapman and Hall/CRC, pp 259–272
BIOVIA. Pipeline Pilot. https://www.3dsbiovia.com/products/collaborative-science/biovia-pipeline-pilot/. Accessed 11 Jan 2021
Borgelt C, Berthold MR (2002) Mining molecular fragments: finding relevant substructures of molecules. In: 2002 IEEE international conference on data mining, 2002. ICDM 2003. Proceedings. IEEE, pp 51–58
Chakravarti SK, Saiakhov RD (2019) Computing similarity between structural environments of mutagenicity alerts. Mutagenesis 34:55–65
Chakravarti SK, Saiakhov RD, Klopman G (2012) Optimizing predictive performance of CASE Ultra expert system models using the applicability domains of individual toxicity alerts. J Chem Inf Model 52(10):2609–2618
ChemoTyper Community Website. https://chemotyper.org/. Accessed 11 Jan 2021
Chen Y, Cheng F, Sun L, Li W, Liu G, Tang Y (2014) Computational models to predict endocrine-disrupting chemical binding with androgen or oestrogen receptors. Ecotoxicol Environ Saf 110:280–287
Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, Dearden J, Gramatica P, Martin YC, Todeschini R, Consonni V, Kuz’min VE, Cramer R, Benigni R, Yang C, Rathman J, Terfloth L, Gasteiger J, Richard A, Tropsha A (2014) QSAR modeling: where have you been? Where are you going to? J Med Chem 57:4977–5010
Ciallella HL, Russo DP, Aleksunes LM, Grimm FA, Zhu H (2021) Revealing adverse outcome pathways from public high-throughput screening data to evaluate new toxicants by a knowledge-based deep neural network approach. Environ Sci Technol 55(15):10875–10887
Coquin L, Canipa SJ, Drewe WC, Fisk L, Gillet VJ, Patel M, Plante J, Sherhod RJ, Vessey JD (2015) New structural alerts for Ames mutagenicity discovered using emerging pattern mining techniques. Toxicol Res 4(1):46–56
Cronin MTD, Enoch SJ, Mellor CL, Przybylak KR, Richarz AN, Madden JC (2017) In silico prediction of organ level toxicity: linking chemistry to adverse effects. Toxicol Res 33:173–182
Du H, Cai Y, Yang H, Zhang H, Xue Y, Liu G, Tang Y, Li W (2017) In silico prediction of chemicals binding to aromatase with machine learning methods. Chem Res Toxicol 30:1209–1218
Ferrari T, Cattaneo D, Gini G, Golbamaki Bakhtyari N, Manganaro A, Benfenati E (2013) Automatic knowledge extraction from chemical structures: the case of mutagenicity prediction. SAR QSAR Environ Res 24:631–649
Gaulton A, Hersey A, Nowotka M, Bento AP, Chambers J, Mendez D, Mutowo P, Atkinson F, Bellis LJ, Cibrián-Uhalte E, Davies M, Dedman N, Karlsson A, Magariños MP, Overington JP, Papadatos G, Smit I, Leach AR (2017) The ChEMBL database in 2017. Nucleic Acids Res 45:D945–D954
Hu YD, Stumpfe D, Bajorath J (2017) Recent advances in scaffold hopping. J Med Chem 60:1238–1246
Kalgutkar AS (2019) Designing around structural alerts in drug discovery. J Med Chem 63(12):6276–6302
Kalgutkar AS, Dalvie D (2015) Predicting toxicities of reactive metabolite-positive drug candidates. Annu Rev Pharmacol Toxicol 55:35–54
Kazius J, Nijssen S, Kok J, Bäck T, Ijzerman AP (2006) Substructure mining using elaborate chemical representation. J Chem Inf Model 46:597–605
Kim H, Nam H (2020) hERG-Att: self-attention-based deep neural network for predicting hERG blockers. Comput Biol Chem 87:107286
Lagorce D, Sperandio O, Baell JB, Miteva MA, Villoutreix BO (2015) FAF-Drugs3: a web server for compound property calculation and chemical library design. Nucleic Acids Res 43:W200–W207
Lei T, Li Y, Song Y, Li D, Sun H, Hou T (2016) ADMET evaluation in drug discovery: 15. Accurate prediction of rat oral acute toxicity using relevance vector machine and consensus modeling. J Cheminform 8:6
Li W, Lin D, Gao H, Xu Y, Meng D, Smith CV, Peng Y, Zheng J (2016) Metabolic activation of furan moiety makes Diosbulbin B hepatotoxic. Arch Toxicol 90(4):863–872
Li X, Chen L, Cheng F, Wu Z, Bian H, Xu C, Li W, Liu G, Shen X, Tang Y (2014) In silico prediction of chemical acute oral toxicity using multi-classification methods. J Chem Inf Model 54:1061–1069
Limban C, Nuță DC, Chiriță C, Negreș S, Arsene AL, Goumenou M, Karakitsios SP, Tsatsakis AM, Sarigiannis DA (2018) The use of structural alerts to avoid the toxicity of pharmaceuticals. Toxicol Rep 5:943–953
Maron DM, Ames BN (1983) Revised methods for the Salmonella mutagenicity test. Mutat Res 113:173–215
Mukherjee A, Su A, Rajan K (2021) Deep learning model for identifying critical structural motifs in potential endocrine disruptors. J Chem Inf Model 61(5):2187–2197
Nendza M, Wenzel A, Müller M, Lewin G, Simetska N, Stock F, Arning J (2016) Screening for potential endocrine disruptors in fish: evidence from structural alerts and in vitro and in vivo toxicological assays. Environ Sci Eur 28(1):26
Nijssen S, Kok JN (2004) A quickstart in frequent structure mining can make a difference. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 647–652
Paludetto MN, Puisset F, Chatelut E, Arellano C (2019) Identifying the reactive metabolites of tyrosine kinase inhibitors in a comprehensive approach: implications for drug–drug interactions and hepatotoxicity. Med Res Rev 39(6):2105–2152
Pizzo F, Lombardo A, Manganaro A, Benfenati E (2016) A new structure-activity relationship (SAR) model for predicting drug-induced liver injury, based on statistical and expert-based structural alerts. Front Pharmacol 7:442
Polishchuk P (2017) Interpretation of quantitative structure–activity relationship models: past, present, and future. J Chem Inf Model 57(11):2618–2639
PubChem. PubChem Substructure Fingerprint. ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt. Accessed 11 Jan 2021
Raies AB, Bajic VB (2016) In silico toxicology: computational methods for the prediction of chemical toxicity. Wiley Interdiscip Rev Comput Mol Sci 6(2):147–172
RDKit: Open-Source Cheminformatics Software. http://www.rdkit.org. Accessed 11 Jan 2021
Ridings JE, Barratt MD, Cary R, Earnshaw CG, Eggington CE, Ellis MK, Judson PN, Langowski JJ, Marchant CA, Payne MP, Watson WP, Yih TD (1996) Computer prediction of possible toxic action from chemical structure: an update on the DEREK system. Toxicology 106:267–279
Roberts G, Myatt GJ, Johnson WP, Cross KP, Blower PE Jr (2000) LeadScope: software for exploring large sets of screening data. J Chem Inf Comput Sci 40:1302–1314
Rodríguez-Pérez R, Vogt M, Bajorath J (2017) Support vector machine classification and regression prioritize different structural features for binary compound activity and potency value prediction. ACS Omega 2(10):6371–6379
Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1:206–215
Schüürmann G, Ebert RU, Tluczkiewicz I, Escher SE, Kühne R (2016) Inhalation threshold of toxicological concern (TTC)—structural alerts discriminate high from low repeated-dose inhalation toxicity. Environ Int 88:123–132
Seddon MP, Cosgrove DA, Gillet VJ (2018) Bioisosteric replacements extracted from high-quality structures in the protein databank. ChemMedChem 13:607–613
Sherhod R, Judson PN, Hanser T, Vessey JD, Webb SJ, Gillet VJ (2014) Emerging pattern mining to aid toxicological knowledge discovery. J Chem Inf Model 54:1864–1879
Snyder RD (2009) An update on the genotoxicity and carcinogenicity of marketed pharmaceuticals with reference to in silico predictivity. Environ Mol Mutagen 50:435–450
So SS, Richards WG (1992) Application of neural networks: quantitative structure-activity relationships of the derivatives of 2,4-diamino-5-(substituted-benzyl)pyrimidines as DHFR inhibitors. J Med Chem 35:3201–3207
Stepan AF, Walker DP, Bauman J, Price DA, Baillie TA, Kalgutkar AS, Aleo MD (2011) Structural alert/reactive metabolite concept as applied in medicinal chemistry to mitigate the risk of idiosyncratic drug toxicity: a perspective based on the critical examination of trends in the top 200 drugs marketed in the United States. Chem Res Toxicol 24(9):1345–1410
ToxTree website. http://toxtree.sourceforge.net/. Accessed 11 Jan 2021
Sushko I, Salmina E, Potemkin VA, Poda G, Tetko IV (2012) ToxAlerts: a Web server of structural alerts for toxic chemicals and compounds with potential adverse reactions. J Chem Inf Model 52:2310–2316
Vainio MJ, Kogej T, Raubacher F, Sadowski J (2013) Scaffold hopping by fragment replacement. J Chem Inf Model 53:1825–1835
Valsecchi C, Grisoni F, Consonni V, Ballabio D (2019) Structural alerts for the identification of bioaccumulative compounds. Integr Environ Assess Manag 15:19–28
Vinken M (2013) The adverse outcome pathway concept: a pragmatic tool in toxicology. Toxicology 312:158–165
Wang Y, Lu J, Wang F, Shen Q, Zheng M, Luo X, Zhu W, Jiang H, Chen K (2012) Estimation of carcinogenicity using molecular fragments tree. J Chem Inf Model 52:1994–2003
Webb SJ, Hanser T, Howlin B, Krause P, Vessey JD (2014) Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity. J Cheminform 6(1):8
Wedlake AJ, Allen TEH, Goodman JM, Gutsell S, Kukic P, Russell PJ (2020) Confidence in inactive and active predictions from structural alerts. Chem Res Toxicol 33(12):3010–3022
Wu Z, Jiang D, Wang J, Hsieh CY, Cao D, Hou T (2021) Mining toxicity information from large amounts of toxicity data. J Med Chem 64(10):6924–6936
Xiong Z, Wang D, Liu X, Zhong F, Wan X, Li X, Li Z, Luo X, Chen K, Jiang H, Zheng M (2020) Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism. J Med Chem 63(16):8749–8760
Xu Y, Pei J, Lai L (2017) Deep learning based regression and multiclass models for acute oral toxicity prediction with automatic chemical feature extraction. J Chem Inf Model 57:2672–2685
Yang H, Li J, Wu Z, Li W, Liu G, Tang Y (2017) Evaluation of different methods for identification of structural alerts using chemical Ames mutagenicity data set as a benchmark. Chem Res Toxicol 30(6):1355–1364
Yang H, Lou C, Li W, Liu G, Tang Y (2020) Computational approaches to identify structural alerts and their applications in environmental toxicology and drug discovery. Chem Res Toxicol 33(6):1312–1322
Yang H, Sun L, Li W, Liu G, Tang Y (2018a) Identification of nontoxic substructures: a new strategy to avoid potential toxicity risk. Toxicol Sci 165(2):396–407
Yang H, Sun L, Wang Z, Li W, Liu G, Tang Y (2018b) ADMETopt: a web server for ADMET optimization in drug design via scaffold hopping. J Chem Inf Model 58:2051–2056
Zhang XC, Wu CK, Yang ZJ, Wu ZX, Yi JC, Hsieh CY, Hou TJ, Cao DS (2021) MG-BERT: leveraging unsupervised atomic representation learning for molecular property prediction. Brief Bioinform 22(6):bbab152

Chapter 21

Machine Learning in Prediction of Nanotoxicology

Li Mu, Fubo Yu, Yuying Jia, Shan Sun, Xiaokang Li, Xiaolin Zhang, and Xiangang Hu

L. Mu
Tianjin Key Laboratory of Agro-environment and Safe-Product, Key Laboratory for Environmental Factors Control of Agro-product Quality Safety (Ministry of Agriculture and Rural Affairs), Institute of Agro-environmental Protection, Ministry of Agriculture and Rural Affairs, Tianjin 300191, China

F. Yu · Y. Jia · S. Sun · X. Zhang · X. Hu (✉)
Key Laboratory of Pollution Processes and Environmental Criteria (Ministry of Education)/Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China

X. Li
School of Environmental and Material Engineering, Yantai University, Yantai 264005, China

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Hong (ed.), Machine Learning and Deep Learning in Computational Toxicology, Computational Methods in Engineering & the Sciences, https://doi.org/10.1007/978-3-031-20730-3_21

21.1 Introduction

In recent years, nanomaterials have been widely used in various fields. Nanomaterials are defined as materials with at least one dimension in the nanometer range (approximately 1 to 100 nm). According to their dimensionality, nanomaterials are commonly classified as zero-dimensional (0D) (e.g., quantum dots), one-dimensional (1D) (e.g., nanotubes and nanofibers), two-dimensional (2D) (e.g., nanosheets and nanofilms), or three-dimensional (3D) (e.g., bulk materials composed of nanoparticles). Materials at the nanoscale have unique physical and chemical properties and are therefore widely used in fields such as environmental protection, energy science, and life science. With the increasing use and disposal of nanomaterials, their toxicity and hazards to the environment and human health have attracted growing attention. Many studies have shown that some nanomaterials are harmful to cells and organisms. For example, TiO2 nanoparticles are among the most widely used nanomaterials in consumer products, agriculture, and the energy sector but are toxic to multiple taxa of microorganisms, algae, plants, invertebrates, and vertebrates (Hou et al. 2019). Graphene nanoribbons were shown to be genotoxic to human mesenchymal stem cells (Akhavan et al. 2013). CdTe nanocrystals, a promising type of quantum dot, showed reproductive toxicity in mice (Akhavan et al. 2016). Understanding the toxic behavior of nanomaterials is therefore very important for preventing their risks and hazards to the environment and human health.

Traditional biological experiments for evaluating the toxicity of nanomaterials are typically time-consuming, laborious, and expensive. Moreover, the structures and properties of newly produced nanomaterials are becoming more and more complex, which poses a great challenge to traditional toxicological research based on biological experiments. The rapid recent development of machine learning techniques is expected to reduce the time and cost of nanosafety predictions. Machine learning is a subfield of artificial intelligence that uses data and algorithms to imitate the way humans learn, gradually improving its accuracy. It can automatically learn from existing empirical data (training data) and rapidly predict specific toxic effects of nanomaterials. Many studies have used machine learning methods to predict the toxicity of nanomaterials. Widely used algorithms for this purpose include random forest (RF), support vector machine (SVM), artificial neural network (ANN), and Bayesian network (BN). Based on the large amount of existing toxicity data, machine learning methods can accurately predict the toxicity of new nanomaterials under different experimental conditions, significantly reducing costs and animal testing and providing insight into the design of safe nanomaterials (Fig. 21.1).
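The dimensionality classes introduced at the start of this section follow a simple counting rule: the label gives the number of external dimensions that are not confined to the nanoscale. The Python sketch below encodes that rule. It rests on two assumptions that are not stated in the text: the upper bound is taken as exactly 100 nm, and any dimension at or below that bound counts as nanoscale, so that sub-nanometre monolayer sheets are still treated as confined in thickness.

```python
NANO_MAX_NM = 100.0  # assumed upper bound of the nanoscale, in nanometres


def classify_dimensionality(dims_nm):
    """Return '0D', '1D', '2D' or '3D' from three external dimensions in nm."""
    if len(dims_nm) != 3:
        raise ValueError("expected exactly three dimensions")
    if any(d <= 0 for d in dims_nm):
        raise ValueError("dimensions must be positive")
    confined = sum(d <= NANO_MAX_NM for d in dims_nm)
    # The label counts the dimensions that are NOT confined to the nanoscale.
    return f"{3 - confined}D"
```

With hypothetical sizes, a 5 nm dot `(5, 5, 5)` is classified as 0D, a tube `(2, 2, 5000)` as 1D, and a sheet `(0.8, 2000, 2000)` as 2D. A bulk solid assembled from nanoparticles has no confined external dimension and is returned as 3D, even though its building blocks are nanoscale.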

Fig. 21.1 Prediction of immune interaction of nanomaterials by machine learning (Feng et al. 2021)


21.2 Toxicity of Nanomaterials

Numerous studies have shown that the toxicity of nanomaterials is determined by their properties (e.g., type, size, and surface properties) and dosage. The type of nanomaterial is a critical factor in determining nanotoxicity because some materials are themselves toxic (Teo et al. 2014). Widely used nanomaterials include carbon nanomaterials (CNMs), metal nanomaterials, metal oxide nanomaterials, transition metal dichalcogenides (TMDs), metal organic frameworks (MOFs), and covalent organic frameworks (COFs). The properties of these nanomaterials vary greatly and significantly affect their toxicity. Their nanoscale size allows them to cross biological barriers, after which they are transported to and accumulate in organs, tissues, and cells and cause toxic effects. The surface properties of nanomaterials also affect their recognition by organisms. Nanotoxicity, including reproductive toxicity, immunotoxicity, cytotoxicity, and genotoxicity, has been studied at the individual, tissue, cellular, and molecular levels.
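Because toxicity depends on material properties and dosage, these quantities are natural input descriptors for a predictive model. As a minimal sketch, the Python code below uses a k-nearest-neighbour vote, chosen here only because it fits in a few lines of standard-library code; it stands in for the RF, SVM, ANN, or BN models named in this chapter. The descriptor set, the six training records, and their labels are hypothetical and serve only to show the data flow.

```python
from collections import Counter
from math import dist


def scale(rows):
    """Min-max scale each descriptor column to [0, 1].

    Returns the scaled rows and a function that scales new rows the same way.
    """
    lo = [min(col) for col in zip(*rows)]
    hi = [max(col) for col in zip(*rows)]

    def apply(row):
        return [(v - l) / (h - l) if h > l else 0.0
                for v, l, h in zip(row, lo, hi)]

    return [apply(r) for r in rows], apply


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples closest to the query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


# Hypothetical records: (size in nm, zeta potential in mV, dose in mg/L)
X = [(10, -30, 50), (15, -25, 80), (20, -35, 100),
     (200, -5, 5), (150, -10, 10), (300, 0, 1)]
y = ["toxic", "toxic", "toxic", "non-toxic", "non-toxic", "non-toxic"]

X_scaled, scaler = scale(X)
prediction = knn_predict(X_scaled, y, scaler((12, -28, 60)))
```

Scaling matters because the descriptors have different units; without it, the descriptor with the largest numeric range would dominate the distance. A real model would also need categorical descriptors such as material type, a far larger data set, and validation on held-out data.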

21.2.1 Toxicity of Carbon Nanomaterials

CNMs are an extensive and varied family of carbon allotropes. The most common CNMs are 0D quantum dots and fullerenes, 1D carbon nanotubes (CNTs), and 2D graphene and its derivatives. Their excellent strength, elasticity, electrical conductivity, and thermal conductivity give CNMs great application potential, and they have attracted increasing attention in biological fields such as healthcare and in vivo and in vitro biosensing. The potential hazards of CNMs include cell membrane damage, apoptosis/necrosis, mitochondrial DNA damage, lipid peroxidation, oxidative stress, inflammation, and others. The hazards depend mainly on the exposure pathway and the type of CNM. For example, CNTs inhaled by humans cause pulmonary immunotoxicity and induce pulmonary fibrosis and granuloma. Graphene has been found to induce a dose-dependent pulmonary inflammatory response but does not cause fibrosis. Of two typical types of CNMs, multi-walled carbon nanotubes (MWCNTs) induced toxicity through oxidative stress, while single-walled carbon nanotubes (SWCNTs) damaged cells directly (Lee et al. 2010).

CNMs in the environment can contact biological tissues and be adsorbed onto biological surfaces. Carbon nanomaterials such as graphene oxide (GO), SWCNTs, and graphene oxide quantum dots (GOQDs) in aqueous medium were found to cover the surfaces of plant roots, algal cells, and zebrafish embryos (Hu et al. 2014b; Ouyang et al. 2015; Chen et al. 2016b). The envelopment of biological surfaces by nanomaterials induced cellular structure damage, increased cell membrane permeability (Hu et al. 2015a), loss of root integrity (Du et al. 2020), and hypoxia in embryos (Ouyang et al. 2015). Furthermore, CNMs on the surface of organisms could enter biological tissue and damage cell ultrastructure (Hu et al. 2014b). For example, GO was taken up by zebrafish and translocated to the mouth, yolk sac, cardiac section, and tail blood, causing developmental toxicity (e.g., increased embryo heart rates, malformations, mortality rates, severe hatching delay, inhibited spontaneous movements of embryos, and shortened body length) (Chen et al. 2016b; Zhang et al. 2017). This developmental toxicity was attributed to DNA modification, protein carbonylation, and excessive generation of reactive oxygen species (ROS) (Zhang et al. 2017; Zou et al. 2018).

Studies also demonstrated that CNMs (e.g., GO, GOQD, SWCNT, and graphene) in aqueous medium could enter algal and plant root cells through increased permeability and spontaneous membrane penetration (Hu et al. 2014a, 2015a; Ouyang et al. 2015; Du et al. 2020), resulting in remarkable plasmolysis, destruction of organelle structures (e.g., vacuoles, chloroplasts, and thylakoids), increased oxidative stress, photosynthetic toxicity (e.g., decreases in chlorophyll biosynthesis and net photosynthetic rate), inhibition of plant growth, and decreases in cell wall thickness and lignin content (Hu et al. 2014c, 2015a; Ouyang et al. 2015; Zhou and Hu 2017; Du et al. 2020; Li et al. 2021). Nanomaterials (e.g., graphene and CuO nanoparticles) in plant root cells migrated further to plant shoots, resulting in inhibition of seedling growth, cell shrinkage, chloroplast cross-linking in shoot cells, and alteration of shoot morphology (Wang et al. 2012b; Hu et al. 2014c).

Studying the influence of CNMs on the edible parts of plants is vital to food safety. Soil cultivation experiments confirmed that CNMs (e.g., GO, rGO, and GOQD) could be transported from stems to the edible parts of plants (e.g., pepper fruits and wheat grains) (Rico et al. 2013; Li et al. 2018b, 2021). At the same time, physiological disorders (e.g., increased H2O2 and malondialdehyde (MDA) levels, decreased catalase (CAT) activity, and cell wall damage) were observed in pepper fruits (Li et al. 2021).
In addition, CNMs (e.g., GO, rGO, and GOQD) were detrimental to grain quality (Rico et al. 2013; Li et al. 2018b): they reduced the levels of amylose, amylopectin, globulin, prolamin, and mineral elements and changed protein secondary structure (Li et al. 2018b). Beyond toxicity to the parent organism, CNMs could translocate from parent to offspring and induce toxicity in the offspring. For example, GO was able to translocate from parent fish to the brain of offspring and induced neurotoxicity (e.g., loss of dopaminergic neurons and acetylcholinesterase activity) in the offspring (Ren et al. 2016; Hu et al. 2017).

CNMs also affected microbial community structure in soil environments and organisms. Microbial species richness and community diversity in soils exposed to CNMs (e.g., GO and rGO-Pd) increased compared with the control (Du et al. 2015b; Zhou et al. 2021). At the same time, CNMs (e.g., rGO-Pd) altered the utilization of carbon sources by soil microbes, including reductions for sugars, polymers, carboxylic acids, amino acids, phenolic acids, and amines (Zhou et al. 2021). Beyond microbial communities in soils, CNMs entered plant root cells and altered the endophytic bacterial communities of rice roots (Du et al. 2020).

In addition to their direct toxicity, CNMs affect the toxicity of other pollutants (e.g., organic pollutants and heavy metals). For example, GO enhanced the uptake of arsenic (As) and the transformation of As(V) to the more toxic As(III), greatly amplifying the phytotoxicity of As in wheat, including decreases in biomass and root numbers and an increase in oxidative stress (Hu et al. 2014c).


GO also significantly enhanced the accumulation of polycyclic aromatic hydrocarbons in rice roots and increased aryl hydrocarbon receptor and cytochrome P450 levels (Li et al. 2018a). GO promoted the bioaccumulation of triphenyl phosphate (TPhP) in zebrafish and amplified the TPhP-induced inhibition of embryo growth (malformation, mortality, heartbeat, and spontaneous movement) (Zhang et al. 2020b). By contrast, GO significantly reduced the mortality and malformation rates of zebrafish induced by tris(1,3-dichloro-2-propyl) phosphate (TDCIPP) (Zou et al. 2020b).

The toxicity of CNMs is also affected by their physicochemical properties (e.g., type, size, shape, phase, nanoholes, concentration, surface area, and functionalization with small biomolecules) (Mu et al. 2015; Tong et al. 2019; Li et al. 2020, 2021; Zhang et al. 2020a). For example, GO with a smaller size and lower oxidation content triggered stronger toxicity (e.g., stronger oxidative stress, lower mineral element and chlorophyll content, and higher inhibition of cell division) (Ouyang et al. 2015; Li et al. 2018b, 2021; Zou et al. 2020a). Moreover, the physicochemical properties of CNMs can be influenced by environmental factors (e.g., light irradiation, hydration, natural organic matter, biological secretions, biological media, and temperature) (Hu et al. 2014c, 2015b; Mu et al. 2016; Li et al. 2017b), which in turn changes their nanotoxicity. For example, GO combined with zebrafish secretions exhibited smaller lateral sizes, more negative surface charges, and a lower aggregation state than pristine GO (Mu et al. 2016). GO integrated with root exudates exhibited increased thickness, reduced transparency and size, more unpaired electrons, and disordered structures (Li et al. 2017b). Toxicity studies showed that GO with biological secretions (e.g., root exudates and zebrafish secretions) triggered higher toxicity than GO alone, such as malformation, death, loss of mitochondrial membrane potential, and inhibited oxygen exchange in zebrafish embryos (Du et al. 2015a; Mu et al. 2016). Moreover, hydration and irradiation resulted in greater disorder in the graphene structure because of the introduction of water molecules and modification of the functional groups. These property alterations mitigated the nanotoxicity of graphene to algal cells by reducing ROS levels, protein carbonylation, and tail DNA (Hu et al. 2015b).

21.2.2 Toxicity of Transition Metal Dichalcogenides

2D materials have been widely explored for their unique electronic, mechanical, and catalytic properties (Lorchat et al. 2020; Sun et al. 2020; Ding et al. 2021). Transition metal dichalcogenides (TMDs) are a class of 2D materials that gained significant interest in the wake of graphene because of their broadly tunable chemical and physical properties (Liu et al. 2014; Loan et al. 2014; Chen et al. 2018). TMDs have the general formula MX2, in which M represents a transition metal element from group IV, V, or VI of the periodic table (e.g., Ti, Zr, V, Nb, Mo, and W) and X represents a chalcogen (e.g., S, Se, or Te). They form a family of around 60 materials, some of which exist as naturally occurring minerals (Makovicky 2006; Wang et al. 2012a). Unlike graphene, TMDs do not consist of a single atomic layer; the basic unit comprises three atomic layers, X–M–X, and these units are held together in the third dimension by weak van der Waals forces (Eng et al. 2014; Ambrosi et al. 2015). The relatively strong intralayer bonding is covalent, whereas the interlayer interactions are dictated by van der Waals forces. This weak interlayer bonding allows relatively straightforward exfoliation of the bulk material into few-layer or monolayer form (Butler et al. 2013).

Because of their high surface area and chemical reactivity, considerable effort has gone into exploiting the unique properties of TMDs in applications (Sarkar et al. 2014; Feng et al. 2015; Li et al. 2017a; Zhu et al. 2019). For example, biosensing platforms based on TMDs have been fabricated for the detection of biological molecules (Kalantar-zadeh et al. 2015). The high near-infrared absorbance of selected TMDs, such as molybdenum disulfide (MoS2), tungsten disulfide (WS2), and titanium disulfide (TiS2), has made them ideal agents for photothermal therapy (Chou et al. 2013; Cheng et al. 2015). In particular, different combinations of transition metals and chalcogens, as well as their various arrangements in the 2D crystals, lead to a substantial range of properties, making TMD materials interesting for applications (Hao et al. 2017). TMD nanosheets offer a large, flat surface for protein adsorption and spreading, so they can adsorb proteins and alter their structures more easily than spherical or rod-like nanoparticles, whose surfaces are curved (Luo et al. 2017). Considering the great potential of TMDs, it is critically important to examine the toxicity that TMDs might present in a biological system as well as the degree of safety with regard to their use.
Chemical composition is one of the most important considerations for determining the biological interactions and fate of TMD materials in vivo (Guiney et al. 2018). The surface chemistry and the dissolution of the material are determined by its chemical composition, which in turn affects cellular interactions, uptake, and biodistribution (Nel et al. 2013; Zhu et al. 2013). In the past few years, a number of reports have examined the toxicity of TMD nanomaterials in vitro and in vivo (Chng et al. 2014; Chen et al. 2015; Yong et al. 2015; Gu et al. 2016). For 2D materials, several studies have assessed the cytotoxicity of TMDs according to composition (Teo et al. 2014; Latiff et al. 2017; Shang et al. 2017; Chia et al. 2018). For example, SLMoS2 (10–100 mg/L) was toxic to human epithelial kidney cells (HEK293f) and to bacteria but did not induce mutation or malformation (Appel et al. 2016). On the other hand, the studied TMDs (MoS2, WS2, and WSe2) exhibited significantly lower toxicity than GO tested on the same cells under the same conditions (Teo et al. 2014). Biological cells are the basic building blocks of life (Lee et al. 2018). Cell plasma membranes serve as communication interfaces with the environment, and they are also the first barriers to the interaction of cells with nanomaterials (Jin et al. 2018). Cell morphology has been identified as a potential indicator of cell response to biomaterials (Jin et al. 2018). Machine learning models, as data-driven models, exhibit advantages in the analysis of complex relationships, but the interpretability of the models is poor (Liu et al. 2018). Structural equation models revealed the quantitative relationships among various variables and improved the

21 Machine Learning in Prediction of Nanotoxicology


interpretability of machine learning models (Schnell et al. 2020). Machine learning can be used to identify the facial structure and skeletal structure of cells exposed to TMDs at an early stage to predict the cell fate (Chen et al. 2016a; Sun et al. 2021). A systematic analysis and prediction of the cell morphological response to the physical and biochemical properties of their surrounding microenvironment could be carried out using machine learning.

21.2.3 Toxicity of MOFs

MOFs are porous crystalline materials composed of inorganic nodes connected by organic bridges and have highly diverse structures, large specific surface area, high porosity, and customizable chemical properties. A comprehensive assessment of potential environmental and health risks associated with the release of MOFs into the environment during their life cycle (including their manufacture, use, and disposal) is required before and after they enter the market. The degradation of MOFs in solution produces metal ions, which strongly determine the toxicity of these nanomaterials. Other factors, such as the formation of other species during degradation and the metal cores, may also affect the toxicity of MOFs (Ruyra et al. 2015). Embryos and adult zebrafish were sensitive to Cu-MOF materials, and the toxicity was largely attributed to copper release from the material structure into solution (Abramenko et al. 2021). Iron (III) polycarboxylate structure-based MOFs were found to accumulate in acidic cells and then slowly decompose, reaching 10–15% decomposition over 24 h, although the toxicity of the degraded products remains unclear (Durymanov et al. 2019). In contrast, chromium (III)-trimer cross-linked MOFs did not present significant acute or subacute toxicity in male and female mice (Liu et al. 2019). Zeolitic imidazolate frameworks (ZIFs) constitute a subfamily of typical MOFs and are porous crystalline materials that consist of tetrahedral clusters of MN4 linked by imidazolate ligands. ZIFs have larger surface areas and pore volumes than traditional MOFs, as well as relatively high chemical stability and thermal stability. As nanozymes, ZIFs exhibit antibacterial, anti-inflammatory, and antioxidative activities by quenching ROS in cells or in vivo. Nevertheless, ZIFs have been reported to cause significant physiological defects in different model organisms (e.g., zebrafish).
The inhibitory effect of ZIF-8 on the growth of Microcystis aeruginosa was also observed (Fan et al. 2019). In contrast to studies of immediate toxicity, examining the persistence and recovery of phytotoxicity after exposure to ZIFs provides a more in-depth understanding of their toxicology. Some biological endpoints (e.g., ROS levels) were persistent, while some (e.g., growth inhibition, membrane permeability, and chlorophyll biosynthesis) were recoverable (Zhang et al. 2021). Most of the indicators of toxicity were not significantly different from those of the control group after recovery.


21.3 Prediction of Nanotoxicity by Machine Learning

21.3.1 Prediction of Carbon Nanomaterials Toxicity by Machine Learning

Given the fast renewal of CNMs, it is necessary to assess their potential impact on human health quickly through non-biological methods. An increasing number of studies have used machine learning to predict the toxicity of nanomaterials (Fourches et al. 2010). In the early stage of machine learning research on CNMs, scientists usually focused on the toxicity of CNTs, the representative 1D nanomaterial. Datasets assembled from CNT inhalation studies contained various CNTs with different synthesis methods, quality, and impurities, and an RF model was selected for the regression analysis (Gernand and Casman 2014). The pulmonary toxicity endpoints predicted were the number of polymorphonuclear neutrophils, the number of macrophages, and the lactate dehydrogenase and total protein concentrations. The R² values of the machine learning models for pulmonary toxicity were between 0.88 and 0.96. This study proved the feasibility of exploring the relationship between the properties and toxicity of CNTs through machine learning, indicating that RF could produce the expected exponential-shaped dose–response curve without any prior assumptions. In addition, the importance analysis of the RF model showed that the dose of CNTs, the length of the recovery duration, the dose of metallic impurities, and the CNT diameter significantly affected the toxicity of the CNTs. A decision tree model was used to explore the cytotoxicity of nanomaterials and further demonstrated an aspect ratio dependency of the cytotoxicity of CNMs (Labouta et al. 2019). A study based on association rule mining (a type of machine learning approach) also confirmed the important role of aspect ratio in the toxicity of CNMs, and aspect ratios between 10 and 100 were found to be more favorable for the safe design of CNMs (Gul et al. 2021).
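To make the association rule mining idea concrete, the following minimal sketch computes the three usual rule metrics (support, confidence, and lift) for a rule of the form "aspect ratio range → viability class". The records, item names, and the function `rule_metrics` are hypothetical illustrations, not data or code from Gul et al. (2021).

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Each record is a set of categorical items; antecedent and consequent are
    sets of items that must all be present for a record to match.
    """
    n = len(records)
    has_a = [antecedent <= r for r in records]
    has_c = [consequent <= r for r in records]
    n_a = sum(has_a)
    n_c = sum(has_c)
    n_ac = sum(a and c for a, c in zip(has_a, has_c))
    support = n_ac / n                           # P(A and C)
    confidence = n_ac / n_a if n_a else 0.0      # P(C | A)
    lift = confidence / (n_c / n) if n_c else 0.0  # P(C | A) / P(C)
    return support, confidence, lift


# Hypothetical toy records (not experimental data).
records = [
    {"aspect_ratio=10-100", "viability=high"},
    {"aspect_ratio=10-100", "viability=high"},
    {"aspect_ratio=10-100", "viability=low"},
    {"aspect_ratio>100", "viability=low"},
    {"aspect_ratio>100", "viability=low"},
    {"aspect_ratio<10", "viability=high"},
]

support, confidence, lift = rule_metrics(
    records, {"aspect_ratio=10-100"}, {"viability=high"}
)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the consequent occurs more often when the antecedent holds than it does overall, which is the kind of evidence a rule-mining study would use to flag a property range as relevant.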
Another representative CNM is graphene, which opened the way for research on 2D nanomaterials. The toxicity of graphene is worthy of in-depth study because it is regarded as the fundamental building block for graphitic carbon-based nanomaterials of all other dimensions, such as fullerenes, CNTs, and 3D graphite (Choudhary et al. 2014). However, graphene is usually included in the family of CNMs for macroscopic machine learning studies of nanomaterial toxicity, and only a few studies have applied machine learning specifically to graphene toxicity. Machine learning models including RF, SVM, LASSO (least absolute shrinkage and selection operator) regression, and elastic net were constructed based on 10 graphene properties (diameter, surface modification, oxidation state, exposure dose, exposure time, detection method, organ type, cell morphology, cell source, and cell line) to explore the cytotoxicity of graphene, and the predictive models had R² values of 0.805, 0.903, and 0.986 for cell viability, IC50, and lactate dehydrogenase release, respectively (Ma et al. 2021). Graphene diameter was identified as an important factor in controlling its toxicity.
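The R² values quoted in this section are coefficients of determination. As a reminder of what the number measures, the sketch below computes R² from observed and predicted values. The function name `r_squared` and the numbers are hypothetical and are not taken from any of the cited studies.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation
    return 1.0 - ss_res / ss_tot


# Hypothetical cell viability values (%) and model predictions.
observed = [90.0, 75.0, 60.0, 40.0, 20.0]
predicted = [88.0, 78.0, 58.0, 43.0, 18.0]

r2 = r_squared(observed, predicted)
print(f"R2 = {r2:.3f}")
```

An R² of 1 means the predictions reproduce the observations exactly, while a model that always predicts the mean of the observations scores 0.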


21.3.2 Prediction of Nanometal Toxicity by Machine Learning

Metal-based nanomaterials are an important class of engineered nanomaterials and are widely used in many fields. For example, metals and metal hydrides have excellent ability to store hydrogen (Luo et al. 2020). MOF composites were used for efficient oxygen electro-catalysis (Guo et al. 2019). As electrode materials for supercapacitors, MOFs offered high capacitances (Wu et al. 2019). Silver nanoclusters encapsulated into MOFs can rapidly remove heavy metal ions from water (Zhuang et al. 2019). Metal-based nanomaterials are, however, also among the main subjects of nanotoxicology research. Metal nanomaterials enter the human body from the surrounding environment and can translocate to different tissues, where they accumulate and give rise to distinct pathologies (Lachowicz et al. 2021). Exposure to metal-based nanomaterials has been found to interfere with the homeostasis of essential elements and to cause adverse and related long-term effects on human health (Wang et al. 2021). Nanometal materials have significant toxicity to organisms, and the toxicity is closely related to the size and concentration of the materials. Constructing quantitative structure–activity relationship (QSAR) models is a common method for studying the toxicity of nanoparticles. For example, QSAR models were developed using machine learning approaches such as SVM-based classification and kNN-based regression (Fourches et al. 2010). These models were used to predict the in vitro cytotoxicity of 51 manufactured nanoparticles with diverse metal cores. There is a large and rapidly growing body of literature on the toxicity of nanometal materials. However, the diversity and heterogeneity of the nanometal toxicity data, as well as missing or unreported information, make it difficult to build reliable nanostructure–activity relationship models.
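As an illustration of the kNN-based regression mentioned above, the following minimal sketch predicts a response for a query nanoparticle as the mean response of its k nearest neighbours in descriptor space. The descriptors, values, and the function `knn_predict` are hypothetical and do not reproduce the models of Fourches et al. (2010). In practice the descriptors would be scaled before distances are computed.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict the response for `query` as the mean of its k nearest neighbours.

    Distances are Euclidean in descriptor space.
    """
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )[:k]
    return sum(y for _, y in neighbours) / len(neighbours)


# Hypothetical descriptors: (core size in nm, zeta potential in mV).
train_X = [(10, -30), (12, -28), (50, 5), (55, 10), (100, 20)]
# Hypothetical normalized cell viability for each training particle.
train_y = [0.9, 0.8, 0.5, 0.4, 0.1]

prediction = knn_predict(train_X, train_y, query=(11, -29), k=2)
print(f"predicted viability: {prediction:.2f}")
```

Because the prediction is an average over similar, already-tested particles, kNN models of this kind are only as reliable as the coverage of the training set around the query.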
Nanosafety datasets of metallic nanoparticles with 2005 rows and 31 columns were extracted from 63 published articles through literature search, data curation, and meta-analysis (Trinh et al. 2018). Missing values in these datasets were then filled using various gap-filling methods that adapted data from manufacturer specifications or from references on the same nanomaterials. Five datasets with different qualities and degrees of completeness were generated by using PChem scores based on physicochemical data quality and completeness. The datasets were used to develop SVM and RF models to classify the toxicity of metallic nanoparticles. The datasets with higher quality and completeness (i.e., higher PChem scores) produced better-performing machine learning models than those with lower PChem scores. Further analysis of relative attribute importance showed that two physicochemical properties (core size and surface charge) and two experimental conditions of the toxicity assays (dose and cell line) were the four most important attributes for the toxicity of metallic nanoparticles. Heavy metal nanomaterials have high toxicity and persistence and are related to several human cardiovascular diseases and neoplasms. Machine learning methods were used to study the link among cardiovascular diseases severity, heavy metal


concentrations, and single nucleotide polymorphisms. The dataset was collected by the DD Clinic foundation of Caserta in Italy between September 2014 and February 2020 and includes information on 90 patients diagnosed with cardiovascular diseases. Concentrations of 27 heavy metals were obtained from human scalp hair analysis tests. Three machine learning methods (general linear model, RF, and ANN) were fed with features of heavy metal concentrations and single nucleotide polymorphisms to predict patients’ risk index (i.e., moderate, high, and very high risk). RF was the best-performing model in terms of accuracy (0.61 ± 0.03) and area under the receiver operating characteristic curve (AUC) (0.69 ± 0.03) (Monaco et al. 2021). The study showed that the severity of cardiovascular diseases could be predicted by machine learning methods with heavy metal concentrations as features. Furthermore, the selection of important features by RF can help to explain the complex biological and genetic pathways through which heavy metals affect cardiovascular disease. Lead, mercury, and cadmium are common heavy metal pollutants with high toxicity to organisms. The prevalence of hypercholesterolemia associated with exposure to lead, mercury, and cadmium was predicted using machine learning models (Park and Kim 2019). This study used a dataset of 10,089 samples and compared the predictive performance of five machine learning algorithms: logistic regression (LR), k-nearest neighbor (KNN), decision tree (DT), RF, and SVM. The SVM model had the highest prediction accuracy, and the LR model had the highest AUC of 0.718. A growing number of studies have shown solid prediction capabilities of machine learning algorithms in various application domains, including public health and environmental science. Machine learning applied to sensors can realize the real-time detection of toxic heavy metals such as mercury (Lim et al. 2021).
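The accuracy and AUC figures reported above can be read more easily with their definitions in hand. The sketch below computes both for a binary problem. The labels, scores, and function names are hypothetical, and the three-class risk index of Monaco et al. (2021) would need a multi-class extension (for example one-vs-rest averaging) that is not shown here.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = high risk) and model scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.2f} AUC={auc:.2f}")
```

Accuracy depends on the chosen decision threshold, whereas AUC summarizes ranking quality over all thresholds, which is why different models can lead on each metric, as in the study by Park and Kim (2019).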
Combined with image recognition and spectral technology, machine learning models can also be used to identify heavy metal pollution of aquatic organisms (Ji et al. 2017; Petrea et al. 2020; Singh et al. 2021). Machine learning combined with other advanced techniques can be used to detect highly toxic nano-heavy metal materials. For example, a fluorescence-based biosensor coupled to machine learning methodologies was designed with the advantage of predicting mercury concentration levels without the use of classical reader devices (Pennacchio et al. 2022). A cost-effective spark emission spectroscopy system combined with machine learning methods was developed to quantify the concentration of toxic metals (Davari and Wexler 2020). In this study, an unsupervised learning technique was employed to detect outlier spectra. The cleaned spectra set was fed into LASSO to predict the concentration of heavy metals. A combination of LASSO feature detection with univariate regression improved the detection limits.
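LASSO performs feature detection because its L1 penalty drives the coefficients of weakly informative features exactly to zero. The sketch below shows this with a plain coordinate descent solver for the objective (1/(2n))·||y − Xw||² + α·||w||₁, without an intercept. The data are a hypothetical orthogonal toy design, and the solver is an illustration rather than the pipeline of Davari and Wexler (2020).

```python
def soft_threshold(rho, alpha):
    """Shrink rho towards zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_cd(X, y, alpha, n_iter=100):
    """Coordinate descent for LASSO without an intercept (assumes centred data)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


# Hypothetical toy design: three mutually orthogonal "spectral line" features.
X = [
    [1.0, 1.0, 1.0],
    [-1.0, 1.0, -1.0],
    [1.0, -1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
# Response depends strongly on feature 0, weakly on feature 1, not on feature 2.
y = [2.05, -1.95, 1.95, -2.05]

weights = lasso_cd(X, y, alpha=0.1)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print(weights, selected)
```

Only the strongly informative feature survives the penalty here. Refitting an unpenalized univariate regression on a selected feature, as the study describes, removes the shrinkage that LASSO applies to the surviving coefficient.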


21.3.3 Prediction of Nanometal Oxide Toxicity by Machine Learning

Among nanoparticle-based products, approximately 80% consist of metal oxides, making the toxicity of nanometal oxides a problem that cannot be ignored. For example, TiO2 nanoparticles have been added to various products such as paints, cosmetics, food additives, paper, and plastics, causing a wide range of human exposure, including occupational, consumer, and environmental exposure to TiO2 nanoparticles (Chen et al. 2021). Many studies have shown that exposure to TiO2 nanoparticles causes a series of toxic effects, which were more serious than those of bulk TiO2 (Sizochenko et al. 2019; Gomes et al. 2021). Therefore, the prediction of the toxicity of nanometal oxide materials is very important for their safe application. The toxic effects of eight metal oxide nanoparticles on Escherichia coli were evaluated experimentally, and the cytotoxicity ranking of these metal oxide nanoparticles was obtained: Er2O3, Gd2O3, CeO2, Co2O3, Mn2O3, Co3O4, Fe3O4/WO3 (in descending order) (Kar et al. 2021). Seven machine learning algorithms, including linear discriminant analysis (LDA), naïve Bayes, multinomial logistic regression, sequential minimal optimization, AdaBoost, J48, and RF, were employed to identify the mechanism of toxicity of these eight metal oxide nanoparticles. The LDA model performed the best, and the results showed that the core environment of the metal, defined by the ratio of the number of core electrons to the number of valence electrons, and the electronegativity count of oxygen had a positive impact on toxicity. Machine learning methods were also used to predict the toxicity of metal oxide nanoparticles to other organisms, including the lethal effects on embryonic zebrafish and the nanotoxicity to Daphnia magna and Caenorhabditis elegans (Gonzalez-Moragas et al. 2017; Shin et al. 2018; Robinson et al. 2021).
The cytotoxicity of metal oxides in different types of in vitro systems, including Escherichia coli, rat alveolar macrophages, human bronchial epithelial cells, Daphnia magna, and Aliivibrio fischeri, was predicted using a naïve Bayes classifier (Simeone and Costa 2019). Four fundamental physical–chemical parameters (oxidation number, ionic potential of the cation, surface reducibility, and redox reactivity of the oxide) were used to build the models. Importantly, the values of these four parameters can be easily deduced from the chemical formula of the metal oxide nanoparticle with the help of a periodic table, making it possible to predict the level of toxicity of a nanoparticle given its composition. Nanoparticles can easily penetrate biological systems due to their small sizes and cause higher toxicity to organisms. Machine learning QSAR models were developed for the prediction of the inflammatory potential of metal oxide nanoparticles (Huang et al. 2020). The authors built a comprehensive dataset of 30 metal oxide nanoparticles to establish QSAR models and validated the models using seven new metal oxide nanoparticles with a predictive accuracy of 86%. In addition, a quasi-QSAR model was developed to predict the cell viability of human lung (BEAS-2B) and skin (HaCaT) cells exposed to 21 types of metal oxide nanoparticles (Choi et al. 2019).
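A naïve Bayes classifier of the kind mentioned above treats each descriptor as independent given the class and multiplies the per-descriptor likelihoods by the class prior. The sketch below implements a categorical version with Laplace smoothing. The binned descriptor values and class labels are hypothetical toy data, not values from Simeone and Costa (2019), and the functions `train_nb` and `predict_nb` are illustrative names.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels, alpha=1.0):
    """Count classes and per-class feature values for a categorical naive Bayes model."""
    class_counts = Counter(labels)
    feat_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    values = defaultdict(set)           # feature index -> set of observed values
    for row, label in zip(rows, labels):
        for j, v in enumerate(row):
            feat_counts[(label, j)][v] += 1
            values[j].add(v)
    return class_counts, feat_counts, values, alpha


def predict_nb(model, row):
    """Return the class with the highest posterior log-probability for `row`."""
    class_counts, feat_counts, values, alpha = model
    total = sum(class_counts.values())
    best_class, best_lp = None, -math.inf
    for c, n_c in class_counts.items():
        lp = math.log(n_c / total)  # log prior
        for j, v in enumerate(row):
            count = feat_counts[(c, j)][v]
            # Laplace-smoothed likelihood of value v for feature j in class c.
            lp += math.log((count + alpha) / (n_c + alpha * len(values[j])))
        if lp > best_lp:
            best_class, best_lp = c, lp
    return best_class


# Hypothetical binned descriptors: (oxidation number, ionic potential bin).
rows = [("2", "low"), ("2", "low"), ("2", "high"),
        ("3", "high"), ("4", "high"), ("4", "high")]
labels = ["toxic", "toxic", "toxic", "nontoxic", "nontoxic", "nontoxic"]

model = train_nb(rows, labels)
print(predict_nb(model, ("2", "low")), predict_nb(model, ("4", "high")))
```

The smoothing term keeps a descriptor value that was never seen in one class from forcing that class's probability to zero, which matters for the small datasets typical of nanotoxicology.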


21.3.4 Prediction of Other Nanomaterials Toxicity by Machine Learning

New nanomaterials are being developed extremely quickly, but the lack of toxicological data for emerging nanomaterials has become a huge challenge in constructing the corresponding machine learning models. Therefore, beyond the most common metals, metal oxides, and carbon nanomaterials, machine learning research on other types of nanomaterials (e.g., quantum dots, 2D nanomaterials, and MOFs) is still rare. Quantum dots with particle sizes between 1.5 and 10 nm have been widely used in the medical and health field, for example as multifunctional nanoprobes, due to their unique optical properties and in vitro/in vivo optical trackability (Maxwell et al. 2020). To explore the cytotoxicity of cadmium-containing semiconductor quantum dots comprehensively, 1741 cell viability-related data samples were obtained, each with 24 qualitative and quantitative attributes describing the material properties and experimental conditions, and the data were then analyzed using RF regression models (Oh et al. 2016). These RF models achieved R² = 0.68 for cell viability and R² = 0.77 for IC50, suggesting that meta-analysis combined with machine learning can help develop methods for predicting the toxicity of nanomaterials. The RF importance analysis showed that the toxicity response induced by quantum dots correlated primarily with quantum dot diameter, surface ligand, shell, and surface modification. The cellular toxicity of Cd-containing quantum dots was also explored using a Bayesian network (BN) based on a dataset compiled from 517 publications comprising 3028 cell viability data samples and 837 IC50 values (Bilal et al. 2019). Quantum dot diameter, exposure time, surface ligand, shell, assay type, surface modification, and surface charge were identified by the BN-QDTox models as the most relevant factors correlating with IC50.
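Importance analyses like those reported for the RF models rank attributes by how much the model relies on them. One model-agnostic way to do this is permutation importance: shuffle one attribute at a time and measure how much the prediction error grows. The sketch below uses this swapped-in technique, which is not necessarily the importance measure used in the cited studies. The stand-in model, attribute names, and data are hypothetical.

```python
import random


def mse(model, X, y):
    """Mean squared error of `model` (a callable row -> prediction) on (X, y)."""
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by its shuffled values.
            X_perm = [row[:j] + (v,) + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def stand_in_model(row):
    """Hypothetical fitted model that uses only the first attribute."""
    return 100.0 - 5.0 * row[0]


# Hypothetical attributes: (diameter in nm, assay type code).
X = [(2.0, 0), (3.0, 1), (4.0, 0), (5.0, 1), (6.0, 0), (8.0, 1)]
y = [stand_in_model(row) for row in X]

importances = permutation_importance(stand_in_model, X, y)
print(importances)
```

An attribute the model ignores gets an importance of zero, while shuffling an attribute the model depends on raises the error, so the resulting ranking reflects the model's reliance on each attribute rather than a causal effect.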
Because the BN is an interpretable model, association rules and specific conditional dependences could be extracted from its web-based graphical versions. 2D nanomaterials are sheet-shaped: two dimensions lie outside the nanoscale, while the third is only a single or a few atomic layers thick. The atomic-level thickness gives unique electronic properties to 2D nanomaterials, and representative examples include graphene, MXene, TMDs, hexagonal boron nitride, and black phosphorus. Since 2D nanomaterials are defined by size and are not a class of chemically similar nanomaterials, the primary factor that determines the toxicity of such materials is their chemical composition. Existing studies have used machine learning to evaluate the toxicity of particular 2D nanomaterials, such as the graphene work mentioned above, rather than the whole family. Machine learning models were constructed based on experimental and theoretical sets to predict MXene-induced cytotoxicity (Marchwiany et al. 2020). The results indicated that the surface modification and the divided surface characteristics were the crucial factors in the toxicity of MXene. In fact, most studies seem to prefer to build a comprehensive machine learning model based on a variety of common nanomaterials for a certain biological effect (e.g., cytotoxicity, reproductive toxicity, and immunotoxicity). Combining the idea


of meta-analysis, a nanomaterial reproductive toxicity database was constructed, containing more than 18 different nanomaterials with 10 factors (Ban et al. 2018). The exposure method and the type of nanomaterial were screened as the two top-priority factors for nanomaterial accumulation, while toxicity indicators and the type of nanomaterial were the two top-priority factors for reproductive toxicity. Moreover, nanomaterial-induced pulmonary immunotoxicity was predicted by constructing RF models based on a nano-immunotoxicity dataset assembled from publications and containing more than 1600 immunotoxicity samples of 57 nanomaterials (Fig. 21.2) (Yu et al. 2021). To overcome the obstacles that the low interpretability of machine learning poses to research on toxicological mechanisms, the authors proposed a tree-based RF feature importance and feature interaction network analysis framework (TBRFA). TBRFA overcame the feature importance bias brought by small datasets through a multiway importance analysis and revealed that dose, recovery duration, and specific surface area dominated the immunotoxicity of nanomaterials. TBRFA also built feature interaction networks, boosted model interpretability, and revealed hidden interactional factors (e.g., specific surface area and zeta potential were mutually restrictive and affected the biocompatibility and toxicity of nanomaterials). In addition to the properties of nanomaterials, another important factor that determines their toxicity is the protein corona. Protein corona is a layer of dynamic protein complex consisting of bound or adsorbed proteins around nanomaterials, which is

Fig. 21.2 Predicting nano-immunotoxicity using machine learning and revealing hidden interactional factors by feature interaction networks (Yu et al. 2021)


Fig. 21.3 Functional composition of protein corona predicted by machine learning (Ban et al. 2020)

common in biological fluid (Nguyen and Lee 2017). Protein corona can modify the surface properties, hence changing the biocompatibility and toxicity of nanomaterials. The formation of protein corona is expected to be closely related to the properties of the nanomaterials and the body fluid environment in which they are located. Therefore, as with the biological effects of nanomaterials, machine learning is also an effective tool for exploring the rules governing protein corona formation. RF was used to learn the complex relationships between the properties of nanomaterials and corona composition, and the learned relationships were then used to predict the formation of protein coronas and the related cell responses comprehensively and quantitatively (Fig. 21.3) (Ban et al. 2020). Unmodified nanomaterials and surface modification were identified as the most important factors dominating the formation of the protein corona. Overall, the work above showed that the development of machine learning provides solutions for predicting the complicated biological responses induced by nanomaterials.

21.4 Future Directions of Machine Learning in Nanotoxicology Prediction

The following future research directions are anticipated to improve the prediction of nanotoxicity based on machine learning. Details on other issues are provided in recent work, as organized in Fig. 21.4 (Jia et al. 2021).

• Although there is a growing interest in rapid identification and assessment of nanotoxicity, the current number and scale of nanomaterials databases still lag far behind the scientific and decision-making needs for machine learning, especially for new nanomaterials.


Fig. 21.4 Several insights regarding the development of machine learning related to nanomaterials (Jia et al. 2021)

• A set of standard and universal nanomaterial characterization and testing protocols is urgently needed, which will improve the comparability of literature results and make nanotoxicity data more suitable for machine learning.
• The diversity of nanomaterials makes it difficult to construct universal descriptors that make various materials (e.g., CNMs and TMDs) comparable, and it is also difficult to generalize machine learning models to new materials (e.g., MOFs). Constructing a set of universal nanodescriptors would greatly facilitate rapid prediction of the toxicity of new materials.
• Toxicology usually involves mechanistic information, but machine learning models with low interpretability can hardly meet the needs of nanotoxicological exploration. Improving the interpretability of machine learning, or using interpretable machine learning to explore the mechanisms of nanotoxicity, is a trend for future study.


References Abramenko N, Deyko G, Abkhalimov E, Isaeva V, Pelgunova L, Krysanov E, Kustov L (2021) Acute toxicity of Cu-MOF nanoparticles (nanoHKUST-1) towards embryos and adult zebrafish. Int J Mol Sci 22(11):5568 Akhavan O, Ghaderi E, Emamy H, Akhavan F (2013) Genotoxicity of graphene nanoribbons in human mesenchymal stem cells. Carbon 54:419–431 Akhavan O, Hashemi E, Zare H, Shamsara M, Taghavinia N, Heidari F (2016) Influence of heavy nanocrystals on spermatozoa and fertility of mammals. Mater Sci Eng C-Mater 69:52–59 Ambrosi A, Sofer Z, Pumera M (2015) Lithium intercalation compound dramatically influences the electrochemical properties of exfoliated MoS2. Small 11(5):605–612 Appel JH, Li DO, Podlevsky JD, Debnath A, Green AA, Wang QH, Chae J (2016) Low cytotoxicity and genotoxicity of two-dimensional MoS2 and WS2. ACS Biomater Sci Eng 2(3):361–367 Ban Z, Zhou QX, Sun AQ, Mu L, Hu XG (2018) Screening priority factors determining and predicting the reproductive toxicity of various nanoparticles. Environ Sci Technol 52(17):9666– 9676 Ban Z, Yuan P, Yu FB, Peng T, Zhou QX, Hu XG (2020) Machine learning predicts the functional composition of the protein corona and the cellular recognition of nanoparticles. Proc Natl Acad Sci USA 117(19):10492–10499 Bilal M, Oh E, Liu R, Breger JC, Medintz IL, Cohen Y (2019) Bayesian network resource for meta-analysis: cellular toxicity of quantum dots. Small 15(34):1900510 Butler SZ, Hollen SM, Cao LY, Cui Y, Gupta JA, Gutierrez HR, Heinz TF, Hong SS, Huang JX, Ismach AF, Johnston-Halperin E, Kuno M, Plashnitsa VV, Robinson RD, Ruoff RS, Salahuddin S, Shan J, Shi L, Spencer MG, Terrones M, Windl W, Goldberger JE (2013) Progress, challenges, and opportunities in two-dimensional materials beyond graphene. ACS Nano 7(4):2898–2926 Chen Y, Tan CL, Zhang H, Wang LZ (2015) Two-dimensional graphene analogues for biomedical applications. 
Chem Soc Rev 44(9):2681–2701 Chen D, Sarkar S, Candia J, Florczyk SJ, Bodhak S, Driscoll MK, Simon CG, Dunkers JP, Losert W (2016a) Machine learning based methodology to identify cell shape phenotypes associated with microenvironmental cues. Biomaterials 104:104–118 Chen YM, Hu XG, Sun J, Zhou QX (2016b) Specific nanotoxicity of graphene oxide during zebrafish embryogenesis. Nanotoxicology 10(1):42–52 Chen TM, Zou H, Wu XJ, Liu CC, Situ B, Zheng L, Yang GW (2018) Nanozymatic antioxidant system based on MoS2 nanosheets. ACS Appl Mater Interfaces 10(15):12453–12462 Chen ZJ, Han S, Zhang JH, Zheng P, Liu XD, Zhang YY, Jia G (2021) Exploring urine biomarkers of early health effects for occupational exposure to titanium dioxide nanoparticles using metabolomics. Nanoscale 13(7):4122–4132 Cheng L, Yuan C, Shen SD, Yi X, Gong H, Yang K, Liu Z (2015) Bottom-up synthesis of metalion-doped WS2 nanoflakes for cancer theranostics. ACS Nano 9(11):11090–11101 Chia HL, Latiff NM, Sofer Z, Pumera M (2018) Cytotoxicity of group 5 transition metal ditellurides (MTe2; M=V, Nb, Ta). Chem Eur J 24(1):206–211 Chng ELK, Sofer Z, Pumera M (2014) MoS2 exhibits stronger toxicity with increased exfoliation. Nanoscale 6(23):14412–14418 Choi JS, Trinh TX, Yoon TH, Kim J, Byun HG (2019) Quasi-QSAR for predicting the cell viability of human lung and skin cells exposed to different metal oxide nanomaterials. Chemosphere 217:243–249 Chou SS, Kaehr B, Kim J, Foley BM, De M, Hopkins PE, Huang J, Brinker CJ, Dravid VP (2013) Chemically exfoliated MoS2 as near-infrared photothermal agents. Angew Chem Int Ed 52(15):4160–4164 Choudhary N, Hwang S, Choi W (2014) In: Bhushan B, Luo D, Schricker SR, Sigmund W, Zauscher S (eds) Handbook of nanomaterials properties. Springer, Berlin, Heidelberg, pp 709–769 Davari SA, Wexler AS (2020) Quantification of toxic metals using machine learning techniques and spark emission spectroscopy. Atmos Meas Tech 13(10):5369–5377



Chapter 22

Machine Learning for Predicting Organ Toxicity

Jie Liu, Wenjing Guo, Fan Dong, Tucker A. Patterson, and Huixiao Hong

J. Liu · W. Guo · F. Dong · T. A. Patterson · H. Hong, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA

This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2023. H. Hong (ed.), Machine Learning and Deep Learning in Computational Toxicology, Computational Methods in Engineering & the Sciences, https://doi.org/10.1007/978-3-031-20730-3_22

22.1 Introduction

Organ toxicity results from chemical exposure. It plays an important role in safety assessment and is a major cause of compound attrition during all stages of drug development (Lin and Will 2012). Drug-induced organ toxicity (DIOT) that results in the removal of marketed drugs or termination of candidate drugs is a major concern for regulatory agencies and pharmaceutical companies (Lu and Chen 2015). Liver toxicity, kidney toxicity, and heart toxicity are the primary organ toxicities during drug development (Fig. 22.1) (Cook et al. 2014; Schuster et al. 2005; Wilke et al. 2007).

Fig. 22.1 Drug-induced toxicities associated with withdrawal from the US market (1976–2005) (Wilke et al. 2007)

A huge number of compounds require safety assessment. However, animal testing, the traditional toxicity testing method, is expensive and time consuming. It is difficult to perform animal toxicity testing for all chemicals in the environment (Ashburn and Thor 2004; Paul et al. 2010; Tornqvist et al. 2014). Furthermore, it is important to detect potential organ toxicity during the early stages of drug development to reduce cost and time. Therefore, it is crucial and urgent for regulatory agencies and pharmaceutical companies to develop alternative methods for efficient toxicity prediction (Raschi and De Ponti 2017). With the continuously increasing amount of data and computational power, machine learning has become a promising approach for predicting organ toxicity to improve the drug development and safety assessment process. Regulatory agencies and pharmaceutical companies have invested considerable effort in developing and applying in silico approaches, including machine learning, for safety assessment.

In this chapter, we summarize the computational models that have been developed for organ level toxicity prediction using different machine learning algorithms such as random forest, decision tree, and support vector machine. Some case studies on machine learning models for predicting organ level toxicities such as liver toxicity (Liu et al. 2015; Low et al. 2011), kidney toxicity (Kim and Shin 2014), and heart toxicity (Xu et al. 2020) are also discussed.

22.2 Machine Learning Algorithms

Machine learning is an approach that extracts or identifies patterns from a collection of data in order to make predictions. The development of machine learning models for toxicity prediction has attracted strong interest from, and growing acceptance by, regulatory agencies and pharmaceutical companies (Patel et al. 2020). This section gives a brief introduction to some machine learning algorithms that have been used, and are currently being used, in organ toxicity prediction.


22.2.1 Classification and Regression Tree

Classification and regression tree (CART) is a decision tree algorithm that can be used for both classification and regression modeling. Each internal node of the tree tests one independent variable, and each leaf holds the predicted class probabilities (for classification) or the predicted value (for regression) for the samples that reach it. It is a popular algorithm for supervised machine learning and is simple to interpret and visualize. In addition, data preparation for decision trees is easy and does not require data normalization (Breiman et al. 1984; Loh 2011). Models built using decision trees have shown good performance in organ toxicity prediction. In the prediction of liver toxicity, CART models achieved an accuracy of 0.80–0.84 in tenfold cross-validations and outperformed models constructed using other machine learning algorithms [linear discriminant analysis (LDA), k-nearest neighbors (kNN), and Naïve Bayes (NB)] (Liu et al. 2015). CART models for predicting 35 target organ outcomes achieved a mean F1 score of 0.71 in fivefold cross-validations (Liu et al. 2017).
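To make the splitting step concrete, the following minimal sketch shows how a classification tree might choose a threshold on a single descriptor by minimizing the weighted Gini impurity, the criterion commonly associated with CART. The descriptor values, labels, and function names are hypothetical and are not taken from any of the cited studies.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one descriptor.

    Returns (None, parent_impurity) if no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical descriptor values and toxicity labels (1 = toxic), for illustration only.
descriptor = [0.5, 1.2, 1.9, 3.1, 3.8, 4.4]
toxic = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(descriptor, toxic)
print(threshold, impurity)
```

A full tree would apply this search over every descriptor and then recurse on the two resulting subsets. Because the split depends only on the ordering of values, rescaling a descriptor does not change the partition, which is one way to see why normalization is not needed.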

22.2.2 k-Nearest Neighbors (kNN)

The kNN algorithm is a nonparametric machine learning method that assigns the label (for classification) or calculates the dependent variable value (for regression) for a sample based on the k samples that are nearest to it. The distances between samples can be measured in a variety of ways, and Euclidean distance is the most commonly used distance measure in kNN. Several algorithmic parameters require optimization when training kNN models, including the number of nearest neighbors, k, and the distance measure (Baskin 2018). The models built for liver toxicity prediction using kNN and toxicogenomics data achieved a correct classification rate (CCR) of 70% in fivefold cross-validations (Low et al. 2011). The kNN models for liver hypertrophy prediction had a balanced accuracy of 75% in tenfold cross-validations (Liu et al. 2015). The kNN models for predicting kidney pathological endpoints performed well, with a geometric mean accuracy of 0.80 in threefold cross-validations (Kim and Shin 2014).
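The classification rule described above can be sketched in a few lines. This is an illustrative implementation using Euclidean distance and majority voting; the two-dimensional descriptor vectors and the labels are invented for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Label a query sample by majority vote among its k nearest training samples."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already-normalized descriptors for six compounds.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.9, 0.8), (0.8, 0.95), (0.85, 0.7)]
train_y = ["nontoxic", "nontoxic", "nontoxic", "toxic", "toxic", "toxic"]

print(knn_predict(train_X, train_y, (0.2, 0.2), k=3))
print(knn_predict(train_X, train_y, (0.8, 0.8), k=3))
```

In practice, k and the distance measure would be tuned, for example by cross-validation, and an odd k avoids ties in two-class problems.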

22.2.3 Naïve Bayes (NB)

NB is a probabilistic machine learning algorithm based on Bayes' theorem, with the assumption that the independent variables are conditionally independent of one another given the class. It is a simple and fast machine learning algorithm. NB has been widely used in toxicity prediction (Bender 2011; Ekins et al. 2010; Williams et al. 2020). The liver toxicity prediction models constructed using NB and the combination of chemical structural descriptors and in vitro high-throughput assay data yielded a balanced accuracy of 75% in tenfold cross-validations (Liu et al. 2015). In a study using clinical data, the models constructed with NB outperformed models built with other algorithms for eight of the 14 studied organ toxicity endpoints (Xu et al. 2020). The NB models built for reproductive toxicity prediction achieved an area under the receiver operating characteristic curve (AUC) of 0.884 and an accuracy of 91.8% in training, as well as an AUC of 0.888 and an accuracy of 83% in external validation (Zhang et al. 2020).
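As an illustration of how the independence assumption simplifies the computation, the sketch below implements a Bernoulli variant of NB over binary fingerprint bits with Laplace smoothing. The choice of the Bernoulli variant, the smoothing constant, and the toy fingerprints are assumptions made for the example; the cited studies may have used different NB formulations.

```python
import math


def train_bernoulli_nb(X, y, alpha=1.0):
    """Estimate log priors and smoothed per-bit probabilities for each class."""
    n_features = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        # Laplace-smoothed probability that bit j is set among samples of class c
        p_on = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                for j in range(n_features)]
        model[c] = (math.log(prior), p_on)
    return model


def predict_nb(model, x):
    """Return the class with the highest log posterior (up to a shared constant)."""
    def log_posterior(c):
        log_prior, p_on = model[c]
        # Independence assumption: per-bit log likelihoods simply add up
        return log_prior + sum(math.log(p if bit else 1.0 - p)
                               for bit, p in zip(x, p_on))
    return max(model, key=log_posterior)


# Hypothetical 3-bit fingerprints; label 1 = toxic, 0 = nontoxic.
X = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 0]
model = train_bernoulli_nb(X, y)
print(predict_nb(model, [1, 1, 0]), predict_nb(model, [0, 0, 1]))
```

Training reduces to counting bits per class, which is why NB is fast even with thousands of descriptors.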

22.2.4 Random Forest

Random forest (RF) is an ensemble of decision trees, each trained on a bootstrap sample of the data, and it predicts a sample by combining the predictions for that sample from the individual trees. At each split, a tree considers only a randomly selected subset of the independent variables. One advantage is that data normalization is not required for RF model development. RF is widely employed in various fields (Breiman 2001; Cutler et al. 2007; Strobl et al. 2008). The RF liver toxicity prediction models built using toxicogenomics data achieved a CCR of 76% in fivefold cross-validations (Low et al. 2011). In a drug-induced cardiotoxicity prediction study, RF models built with a combination of transcriptional and molecular features yielded an average AUC of 0.79 and an average Matthews correlation coefficient (MCC) of 0.38 in hold-out validations (Mamoshina et al. 2020).
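The two sources of randomness in RF, bootstrap resampling and random feature selection, can be sketched as follows. To keep the example short, each tree here is a single-split "stump"; this is a simplification for illustration, since practical RF implementations typically grow much deeper trees. All names and data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def fit_stump(X, y, features):
    """Best single-split tree over the candidate features (Gini criterion)."""
    majority = Counter(y).most_common(1)[0][0]
    best = {"score": gini(y), "feature": None, "left": majority, "right": majority}
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best["score"]:
                best = {"score": score, "feature": f, "threshold": t,
                        "left": Counter(left).most_common(1)[0][0],
                        "right": Counter(right).most_common(1)[0][0]}
    return best


def predict_stump(stump, row):
    if stump["feature"] is None:  # bootstrap sample contained a single class
        return stump["left"]
    side = "left" if row[stump["feature"]] <= stump["threshold"] else "right"
    return stump[side]


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feats = rng.sample(range(len(X[0])), max_features)    # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Combine the individual tree predictions by majority vote."""
    return Counter(predict_stump(t, row) for t in forest).most_common(1)[0][0]


# Hypothetical two-descriptor data set; label 1 = toxic.
X = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.15, 0.25),
     (0.7, 0.9), (0.8, 0.7), (0.9, 0.8), (0.75, 0.85)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, (0.2, 0.2)), predict_forest(forest, (0.8, 0.8)))
```

Because each tree sees a different resample and a different feature subset, the trees make partly independent errors, and the vote tends to be more stable than any single tree.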

22.2.5 Support Vector Machine

Support vector machine (SVM) is a machine learning algorithm that has been widely used to construct both classification and regression models. To construct an SVM classifier, the algorithm searches for a hyperplane that separates the positive and negative groups in a higher-dimensional space, which is generated by mapping the original lower-dimensional space with a kernel function. Data normalization and parameter selection are usually used for SVM model optimization (Kim and Nam 2017; Liu et al. 2015; Zhang et al. 2016). The SVM models built using combined chemical structural descriptors and in vitro high-throughput screening assay data achieved a balanced accuracy of 0.77 for liver hypertrophy endpoint prediction in tenfold cross-validations (Liu et al. 2015). SVM was also applied to develop models for kidney toxicity prediction, and the models achieved accuracies of 0.83 for tubulo-interstitial nephritis (TIN), 0.85 for interstitial nephritis (IN), and 0.84 for tubular necrosis (TN) in external validations (Lee et al. 2013).
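The following sketch illustrates two ingredients mentioned above: data normalization and a kernel function. It uses min-max scaling and the radial basis function (RBF) kernel, both chosen here as common examples rather than as the settings of the cited studies; the descriptor values and the gamma parameter are hypothetical. It does not train an SVM; it only shows the quantities an SVM would work with.

```python
import math


def min_max_scale(X):
    """Rescale every descriptor column to the range [0, 1]."""
    columns = list(zip(*X))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]  # avoid division by zero
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)] for row in X]


def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel: similarity in (0, 1] that decays with squared Euclidean distance."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical descriptors on very different scales (e.g. a mass-like and a logP-like value).
raw = [[200.0, 0.5], [450.0, 0.7], [210.0, 4.5]]
scaled = min_max_scale(raw)

# Without scaling, the large-valued first column dominates the distance,
# so every pair of distinct compounds looks almost completely dissimilar.
print(rbf_kernel(raw[0], raw[2]), rbf_kernel(scaled[0], scaled[2]))
```

The kernel value plays the role of an inner product in the implicit higher-dimensional space, so the mapping itself never has to be computed explicitly. Parameters such as gamma are among those usually tuned during model optimization.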


22.3 Organ Toxicity Prediction

Organ toxicity is an adverse effect on, or disease of, a specific organ caused by chemical exposure. The same chemical may affect different organs. Therefore, toxicity to multiple organs such as the liver, kidney, heart, and lung must be assessed in the safety evaluation of compounds contained in regulated products. Because of its power and efficiency, machine learning has been widely used to develop models for organ toxicity prediction (Patel et al. 2020). This section introduces the machine learning models that have been developed for predicting toxicity of three major organs: liver, kidney, and heart (Table 22.1).

22.3.1 Liver Toxicity

Liver is the most vulnerable organ to perturbations from chemicals due to its key role in transforming and clearing xenobiotics. Hepatotoxicity is the most common reason for drug attrition (Bourhia et al. 2020; Matthews et al. 2009; Regev 2014; Russmann et al. 2009). Therefore, much effort has been expended on liver toxicity prediction.

Table 22.1 Machine learning models for organ toxicity prediction

| Organ | Descriptors | Algorithm | Performance (a) | References |
|---|---|---|---|---|
| Liver | Structures, HTS assay data | CART, kNN, LDA, NB, and SVM | Accuracy: 0.67–0.84 | (Liu et al. 2015) |
| Liver | Chemical structure, HTS assay data | NB, SVM, RF, XGBoost, and NNET | AUC: 0.68–0.78 | (Xu et al. 2020) |
| Liver | Structures, genomics data | RF, kNN, SVM, DWD | Accuracy: 0.61–0.77 | (Low et al. 2011) |
| Kidney | Structures | SVM | Accuracy: 0.69–0.83 | (Lee et al. 2013) |
| Kidney | Genomics data | kNN | Accuracy: 0.80–0.88 | (Kim and Shin 2014) |
| Kidney | Structures, HTS assay data | NB, SVM, RF, NNET, XGBoost | AUC: 0.72–0.77 | (Xu et al. 2020) |
| Heart | Structures, genomics data | RF | AUC: 0.79 | (Mamoshina et al. 2020) |
| Heart | Structures, HTS assay data | NB, SVC, RF, CART, kNN | F1 score: 0.74–0.75 | (Liu et al. 2017) |
| Heart | Chemical structure, HTS assay data | NB, SVM, RF, NNET, XGBoost | Balanced accuracy: 0.63–0.70 | (Xu et al. 2020) |

(a) From cross-validations

Quantitative structure–activity relationship (QSAR) models are constructed based on the hypothesis that compounds with similar structures are likely to have similar biological activities and properties. QSAR models have been widely used for liver toxicity prediction (Chen et al. 2013; Hong et al. 2016; Matthews et al. 2009; Zhu et al. 2016). For example, Matthews et al. (2009) constructed QSAR models using four programs with different machine learning algorithms for drug-induced liver toxicity using the data from post-marketing adverse effects in humans.

In vitro bioactivity data have been routinely used for toxicity prediction. To develop the predictive models for liver toxicity in rodents, Liu et al. combined chemical structural descriptors and in vitro high-throughput screening (HTS) assay data (Liu et al. 2015). In their study, the hepatic histopathology endpoints were extracted from in vivo animal studies in ToxRefDB (Martin et al. 2009) and were grouped into three broad liver toxicity categories (hypertrophy, injury, and proliferative lesions) based on domain terminology (Thoolen et al. 2010). Nine data sets were generated for supervised machine learning from three types of descriptors (chemical descriptors only, HTS assay data only, and both descriptors) and the three liver toxicity categories. Five machine learning algorithms, including LDA, NB, SVM, CART, and kNN, were applied to each data set individually for model development. As shown in Fig. 22.2, the models from chemical descriptors only outperformed the models from in vitro bioactivity data only, indicating chemical structure features are key to liver toxicity. Overall, the models built from the combination of chemical and bioactivity descriptors achieved a slightly better performance, suggesting that in vitro bioactivity data could provide additional information for characterizing the three liver toxicity categories in rodents.
This demonstrates the performance improvement in liver toxicity prediction for models constructed using multiple types of data such as chemical structures and in vitro bioactivity data.

Fig. 22.2 Performance of machine learning models for prediction of liver toxicity

For predicting liver toxicity in humans, Xu et al. (2020) also integrated chemical structure and HTS bioactivity data to build machine learning models. Five machine learning algorithms [NB, SVM, RF, Neural Network (NNET), and Extreme Gradient Boosting (XGBoost)] were applied, and the models built from the integrated chemical structure and bioactivity data, the chemical structure data only, and the bioactivity data only achieved best AUC values of 0.78, 0.77, and 0.68, respectively. Consistent with the study in rodents (Liu et al. 2015), the models built with integration of multiple types of descriptors performed better than the models built with individual types of descriptors.

Genomics data have been widely used in toxicological research to develop models via machine learning algorithms for organ toxicity prediction. Low et al. built models for liver toxicity prediction using genomics data only, chemical structural descriptors only, and the combination of genomics data and structural descriptors (Low et al. 2011). In their study, liver organ level histopathology data were assessed for toxicity endpoint classification. A compound was considered toxic if histopathology changes were observed for the compound; otherwise, the compound was considered nontoxic. Models for liver toxicity prediction were then developed using four machine learning algorithms, including RF, kNN, SVM, and distance weighted discrimination (DWD), based on genomics data, chemical structural descriptors, and the combination of both. The models were evaluated using fivefold cross-validations. The best models built using chemical descriptors only and genomics data only had CCR values of 61% and 76%, respectively. The best model constructed using both chemical structural descriptors and genomics data achieved a CCR of 77%, a performance similar to that of the best model based on genomics data (76%). The results revealed the usefulness of genomics data in the development of liver toxicity prediction models using machine learning algorithms.
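The k-fold cross-validation procedure used throughout these studies can be sketched as follows. The example partitions the samples into k folds, trains on k − 1 folds, and scores the held-out fold by the fraction of correct predictions. The 1-nearest-neighbor model, the single descriptor, and all names are hypothetical stand-ins, and plain accuracy is used here in place of whichever metric a given study reports.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Return the held-out accuracy for each of the k folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Stand-in model: 1-nearest neighbor on a single hypothetical descriptor.
def fit_1nn(train_X, train_y):
    return list(zip(train_X, train_y))


def predict_1nn(model, x):
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


X = [0.10, 0.15, 0.20, 0.25, 0.30, 0.70, 0.75, 0.80, 0.85, 0.90]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = cross_validate(X, y, fit_1nn, predict_1nn, k=5)
print(scores, sum(scores) / len(scores))
```

Every sample is held out exactly once, so the averaged score estimates performance on unseen compounds. With imbalanced toxicity data, a stratified split that preserves the class ratio in each fold is often preferred.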

22.3.2 Kidney Toxicity

The kidney plays an important role in metabolism, which makes it vulnerable to drug-induced organ toxicity. Drug-induced nephrotoxicity accounts for 19–25% of all cases of acute kidney injury (Bonventre et al. 2010; Faria et al. 2019). Therefore, machine learning models have been developed to assist in the evaluation of kidney toxicity.

Chemical structural descriptors have been applied in the development of QSAR models for kidney toxicity prediction (Lee et al. 2013; Matthews et al. 2009; Myshkin et al. 2012; Pizzo et al. 2015; Sakuratani et al. 2013). QSAR models were developed using four software packages [MC4PC (www.multicase.com), BioEpisteme (www.prousscience.com), MDL-QSAR (www.mdli.com), and Leadscope Predictive Data Miner (www.leadscope.com)] for six types of drug-induced urinary tract injury in humans from 1600 chemical structures, using the adverse event data curated from the post-market surveillance database of the US Food and Drug Administration (FDA) and the published literature (Matthews et al. 2009).


The best models showed a high specificity (86.5%) and low sensitivity (39.3%), indicating that predicting kidney toxicity in humans by QSAR is a challenging task and that improvement in model performance is needed.

QSAR models have also been built for predicting nephrotoxicity of chemicals (Lee et al. 2013). The nephrotoxicity data were curated from the PharmaPendium database, which contains clinical trial data and reports of post-marketing surveillance. The classification models for predicting three endpoints of kidney injury (TIN, IN, and TN) were developed using SVM based on eight types of chemical fingerprints of 387 pharmacological compounds and their 233 metabolites. For the models using the 387 pharmacological compounds, classification accuracies in the tenfold cross-validations were 0.71–0.77, 0.69–0.76, and 0.75–0.83 for TN, IN, and TIN, respectively, while the external validations resulted in accuracies of 0.54–0.72, 0.64–0.79, and 0.71–0.89 for TN, IN, and TIN, respectively. Interestingly, the performance of the models based on the 233 metabolites was improved, with overall accuracy values of 0.70–0.83 for TN, 0.72–0.84 for IN, and 0.79–0.84 for TIN from tenfold cross-validations and 0.68–0.84 for TN, 0.73–0.85 for IN, and 0.72–0.83 for TIN from external validations. Their results demonstrated that nephrotoxicity could be reasonably predicted using QSAR models built with machine learning algorithms, and that metabolites of compounds provide useful information that should be included in the development of machine learning models to improve nephrotoxicity prediction accuracy.

HTS in vitro bioactivity data have been used in the development of machine learning models for predicting nephrotoxicity. Xu et al. built machine learning models for predicting the nephrotoxicity endpoints in kidney, ureter, and bladder using HTS in vitro bioactivity data and chemical structures (Xu et al. 2020).
In their study, the nephrotoxicity endpoint kidney ureter and bladder data were collected by manually searching on the ChemIDPlus database (chem.sis.nlm.nih.gov/chemidplus/). The nephrotoxicity prediction models were constructed using five machine learning algorithms XGBoost, NB, RF, SVM, and NNET based on chemical structural descriptors only, HTS in vitro bioactivity data only, and the combination of both structural descriptors and HTS in vitro bioactivity data. The models were evaluated using fivefold cross-validations. The best model based on chemical structures was obtained using NNET and had an AUC of 0.77. The best model based on HTS in vitro bioactivity data was built with SVM and showed a slightly smaller AUC of 0.72. Interestingly, the best model based on the combination of chemical structures and HTS in vitro bioactivity data yielded an AUC of 0.77, the same from the best model based on chemical structures only. The results showed that HTS in vitro bioactivity data may not convey more information than chemical structural features for predicting nephrotoxicity. Further investigations are needed to improve applications of HTS in vitro bioactivity data in nephrotoxicity prediction. Genomics data have also been applied in the development of machine learning models for kidney toxicity prediction. For example, models for drug-induced kidney toxicity prediction were developed using gene-expression data (Kim and Shin 2014). In this study, the 10 pathological findings in rat toxicity testing were selected from

22 Machine Learning for Predicting Organ Toxicity


Open TG-GATEs (http://toxico.nibio.go.jp) as indications of nephrotoxicity. Models for predicting the 10 pathological findings were built using kNN based on 3,708 rat gene expression profiles of 41 drugs. The performance of the models was evaluated using threefold cross-validations, resulting in overall prediction accuracy, sensitivity, and specificity of 0.56–0.92 (geometric mean 0.80), 0.39–0.88 (geometric mean 0.80), and 0.66–0.96 (geometric mean 0.88), respectively. Although the results demonstrated the usefulness of gene expression profiles in the development of machine learning models for predicting nephrotoxicity, it is worth noting that the number of drugs was small, which limits the generalization of the models. It should also be noted that the performance of the models varied widely across the pathological findings, indicating that the assessment of nephrotoxicity of chemicals using pathological findings remains a challenging task.
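The accuracy, sensitivity, and specificity values quoted in this section, and the geometric means used to summarize them across endpoints, can be illustrated with a minimal sketch. The confusion-matrix counts below are hypothetical and are not taken from any of the cited studies; the function names are chosen here for illustration only.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # fraction of toxic compounds detected
    specificity = tn / (tn + fp)          # fraction of non-toxic compounds cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical counts for a single endpoint (not from the cited studies)
acc, sens, spec = confusion_metrics(tp=30, fn=20, tn=45, fp=5)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")

# Summarizing a metric across several endpoints (hypothetical values)
print(f"geometric mean={geometric_mean([0.56, 0.80, 0.92]):.2f}")
```

A model with high specificity but low sensitivity, as reported for the QSAR models above, rarely flags safe compounds but misses many toxic ones, which is why both values are reported alongside accuracy.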

22.3.3 Heart Toxicity

The heart is also a common target organ of toxicity. Cardiotoxicity refers to damage to the heart muscle caused by exposure to compounds. Thus, evaluation of the cardiotoxicity of chemicals in marketed products and in the environment is vital for the protection of public health. Although many experimental methods have been developed for detecting cardiotoxicity, it is not practically feasible to test all chemicals contained in marketed products and in the environment. Various alternative methods, including computational approaches, have been proposed to assist cardiotoxicity assessment.

Machine learning has been widely used in the development of models for predicting cardiotoxicity. Machine learning models were developed using chemical structural descriptors and transcriptional profiles for predicting six drug-induced cardiotoxicity types (cardiac arrhythmias, myocardial disorders, heart failures, coronary artery diseases, cardiac disorders, and pericardial disorders) (Mamoshina et al. 2020). Of 357 drugs with cardiotoxicity data and transcriptional profiles collected from public databases such as DrugBank (www.drugbank.com), 66 were held out for testing and the remaining 291 were used for training. In addition, 654 gene expression profiles of 51 drugs were downloaded from the Drug Toxicity Signature Generation Center website (https://martip03.u.hpc.mssm.edu/index.php) as an external validation set to further validate the performance of the models. The best models were generated using RF and had an average AUC of 0.79 across the cardiotoxicity types in the hold-out validation and an average AUC of 0.66 in the external validation. The results suggest that combining gene expression profiles and molecular descriptors in the development of machine learning models may improve the prediction of cardiotoxicity.
Evidence showed that integrating different types of data in the development of machine learning models for cardiotoxicity prediction could improve model performance. For example, chemical descriptors, ToxPrint Chemotypes, in vitro HTS assay


bioactivity data from ToxCast, and in vivo animal organ level histopathology data were integrated to build machine learning models for predicting animal organ level cardiotoxicity (Liu et al. 2017). The models were built using the algorithms NB, SVM, RF, CART, and kNN and were evaluated using fivefold cross-validations. The optimal machine learning models based on the down-sampled data (50 positive and 50 negative chemicals) were generated with kNN using the combination of bioactivity data and chemical structural descriptors and the combination of bioactivity data and ToxPrint Chemotypes. These models achieved average F1 scores of 0.75 and 0.74 for predicting chronic and subchronic heart toxicity, respectively, in the fivefold cross-validations. When using the full data set (chronic study: 125 positive and 125 negative chemicals; subchronic study: 134 positive and 134 negative chemicals), the models constructed using SVM performed best, with average F1 scores of 0.86 and 0.84 for predicting chronic and subchronic heart toxicity, respectively, in the fivefold cross-validations. These results further demonstrated that combining different types of data can improve machine learning models for predicting cardiotoxicity. Moreover, this study revealed that HTS in vitro assay data provide useful information for cardiotoxicity prediction via machine learning. As the models developed based on the full data set performed better than the models built with the down-sampled data set, the number of chemicals with cardiotoxicity data needs to be increased to develop robust and accurate machine learning models for cardiotoxicity prediction.

Tox21 in vitro HTS bioactivity assay data were also utilized in the development of machine learning models for predicting cardiotoxicity (Xu et al. 2020). Five machine learning algorithms (NB, SVM, RF, NNET, and XGBoost) were used to develop models for cardiotoxicity prediction based on chemical structures, in vitro HTS bioactivity assay data, and the combination of both. The models were evaluated using fivefold cross-validations. The optimal cardiac toxicity prediction models constructed using SVM with chemical structural descriptors, XGBoost with in vitro HTS bioactivity assay data, and SVM with the combination of both had average balanced accuracy values of 0.70, 0.63, and 0.67 in the fivefold cross-validation, respectively. The optimal models for vascular toxicity based on chemical structural descriptors only, in vitro HTS bioactivity assay data only, and the combination of both were built with NB, XGBoost, and NB and had average balanced accuracy values of 0.77, 0.62, and 0.66 in the fivefold cross-validation, respectively. Surprisingly, for both cardiac and vascular toxicity prediction, the optimal models constructed by combining chemical structures and in vitro HTS bioactivity assay data performed worse than the optimal models built with chemical structures only. The results not only indicate that caution is needed when using in vitro HTS bioactivity assay data in machine learning model development, but also underline the importance of good practices in model development, as the observation could be due to inappropriate procedures used in model construction by some machine learning algorithms.
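The two ranking-independent and threshold-dependent metrics used in the cardiotoxicity studies above, AUC and balanced accuracy, can be sketched as follows. This is an illustrative implementation under simple assumptions: AUC is computed as the fraction of positive-negative pairs ranked correctly (ties counted as half), and the labels and scores are hypothetical rather than data from the cited studies.

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity; robust to class imbalance."""
    return (sensitivity + specificity) / 2.0


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = cardiotoxic) and model scores
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.1]
auc = roc_auc(labels, scores)
bal_acc = balanced_accuracy(sensitivity=0.6, specificity=0.9)
print(f"AUC={auc:.3f} balanced accuracy={bal_acc:.2f}")
```

The pairwise definition is quadratic in the number of chemicals, which is acceptable for data sets of a few hundred compounds; a rank-based formulation would be preferable for larger sets.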


22.4 A Case Study for Organ Toxicity Prediction

The mechanisms of organ level toxicity are complicated. Machine learning provides an approach for in vitro to in vivo extrapolation. Liu et al. developed models for multiple organ level toxicity prediction using chemical structural descriptors, ToxPrint Chemotypes, HTS bioactivity descriptors, and histopathology toxicity endpoints from in vivo guideline animal studies (Liu et al. 2017). This section uses that study as a case study to show applications of machine learning techniques in organ toxicity prediction.

22.4.1 Data Sources

The animal toxicity data used (Table 22.2) were curated from ToxRefDB (www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data). The in vivo toxicity data in ToxRefDB are generated from different guideline studies, species, and target organs (Martin et al. 2009) and are grouped by study type and target site of toxicity effects, such as CHR (chronic study): Kidney, MGR (multigenerational study): Kidney, and SUB (subchronic study): Kidney. Thirty-five in vivo toxicity outcomes across target organs and study types with at least 100 chemicals (at least 50 positives and 50 negatives) were obtained from ToxRefDB and were used for developing machine learning models for predicting organ toxicity.

Three types of data were used to describe the chemicals: HTS bioactivity data, chemical structure data, and chemotype data. HTS in vitro assay data were extracted from the ToxCast database (https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data, version Nov 2014) (Judson et al. 2010). The chemical structures were obtained from the DSSTox database (Richard et al. 2016) and the structural descriptors were calculated using Morgan fingerprints (Rogers and Hahn 2010). The 729 ToxPrint chemotype (structural fragment) descriptors were generated for each chemical using the ChemoTyper software (Yang et al. 2015). In addition, two hybrid descriptor sets (bioactivity plus chemical structural descriptors, and bioactivity plus chemotype descriptors) were generated for machine learning model development (Table 22.3).

For each of the five sets of independent variables (the three single descriptor types and the two hybrids), the compounds have 35 target organ outcomes across three study types. In total, 175 data sets were generated for model development. From each of the 175 data sets, the balanced data subset with the maximal number of active and inactive chemicals was termed the full data set. From each of the 175 full data sets, a smaller data set containing 50 active and 50 inactive chemicals was generated and termed the minimal data set.
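The construction of the full and minimal data sets, and of the hybrid descriptors, can be sketched as below. This is a minimal illustration, not the procedure of Liu et al. (2017): the source does not state how chemicals were chosen during down-sampling, so uniform random sampling is an assumption here, and the chemical identifiers and zero-filled descriptor vectors are hypothetical placeholders. The class sizes follow the "Chronic study: kidney" row of Table 22.2 and the descriptor lengths follow Table 22.3.

```python
import random


def balanced_subset(positives, negatives, n_per_class=None, seed=0):
    """Down-sample both classes to the same size.

    n_per_class=None keeps as many chemicals as the smaller class allows
    (the "full" data set); n_per_class=50 gives the "minimal" data set.
    Random sampling is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    largest = min(len(positives), len(negatives))
    size = largest if n_per_class is None else n_per_class
    if size > largest:
        raise ValueError("not enough chemicals in the smaller class")
    return rng.sample(positives, size), rng.sample(negatives, size)


# Hypothetical identifiers sized like "Chronic study: kidney" (324 / 215)
positives = [f"pos_{i}" for i in range(324)]
negatives = [f"neg_{i}" for i in range(215)]

full_pos, full_neg = balanced_subset(positives, negatives)
min_pos, min_neg = balanced_subset(full_pos, full_neg, n_per_class=50)
print(len(full_pos), len(full_neg), len(min_pos), len(min_neg))

# Hybrid descriptors are formed by concatenating per-chemical vectors
bio = [0] * 821    # placeholder bioactivity descriptors
chm = [0] * 2048   # placeholder chemical structure descriptors
hybrid_bc = bio + chm
print(len(hybrid_bc))
```

Fixing the seed keeps the subsets reproducible, which matters when the same minimal data set is reused across algorithms and descriptor types.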

Table 22.2 Chemical distribution across thirty-five organ toxicity endpoints

Organ toxicity endpoint              Positives  Negatives
Chronic study: adrenal gland         188        351
Chronic study: bone marrow           78         461
Chronic study: brain                 136        403
Chronic study: eye                   93         446
Chronic study: heart                 128        411
Chronic study: kidney                324        215
Chronic study: liver                 414        125
Chronic study: lung                  183        356
Chronic study: lymph node            73         466
Chronic study: mammary gland         64         475
Chronic study: pancreas              66         473
Chronic study: pituitary gland       76         463
Chronic study: spleen                205        334
Chronic study: stomach               134        405
Chronic study: testes                164        375
Chronic study: thymus                72         467
Chronic study: thyroid gland         169        370
Chronic study: urinary bladder       53         486
Chronic study: uterus                86         453
Multigenerational study: brain       51         388
Multigenerational study: kidney      147        292
Multigenerational study: ovary       65         374
Multigenerational study: testes      91         348
Subchronic study: adrenal gland      126        401
Subchronic study: bone marrow        82         445
Subchronic study: brain              117        410
Subchronic study: heart              134        393
Subchronic study: kidney             313        214
Subchronic study: liver              387        140
Subchronic study: lung               87         440
Subchronic study: spleen             176        351
Subchronic study: stomach            88         439
Subchronic study: testes             159        368
Subchronic study: thymus             100        427
Subchronic study: thyroid gland      88         439

Table 22.3 Descriptors across thirty-five organ toxicity endpoints

Descriptor type                      Number of descriptors
Bioactivity                          821
Chemical structure                   2048
ToxPrint chemotype                   729
Bioactivity and chemical structure   2869
Bioactivity and ToxPrint chemotype   1550

22.4.2 Supervised Machine Learning

Predictive models were built for each of the 175 full and 175 minimal data sets using five supervised machine learning algorithms (NB, SVM, CART, kNN, and RF). Figure 22.3 summarizes the workflow of predictive model development for organ toxicity.

Fig. 22.3 Workflow for model development. chm: chemical structural descriptors; ct: ToxPrint chemotype descriptors; bio: bioactivity data; bc: bioactivity data and chemical structural descriptors; and bct: bioactivity data and ToxPrint chemotype descriptors


For each pair of data set and machine learning algorithm, five numbers (5, 10, 15, 20, and 25) of independent variables were selected by ranking the F-values from ANOVA analysis on the independent variables in the in-loop training set to build models. Ten iterations of fivefold cross-validation were applied to evaluate performance of the 8750 constructed models. Classification performance was measured using F1 score, sensitivity, and specificity.
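The key point of the procedure above is that the ANOVA F-value ranking is recomputed on the in-loop training set of every fold, so that the held-out chemicals never influence which variables are selected. The following sketch illustrates this with the standard library only. It is an assumption-laden illustration rather than the authors' implementation: the fold assignment, the toy data, and all function names are hypothetical, and model fitting is omitted. The last lines check that 350 data sets, five algorithms, and five feature-set sizes are consistent with the 8750 models mentioned in the text.

```python
import random


def anova_f(column, labels):
    """One-way ANOVA F-value of a single feature across the classes."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(column), len(groups)
    if k < 2:
        return 0.0
    grand = sum(column) / n
    means = {c: sum(v) / len(v) for c, v in groups.items()}
    ss_between = sum(len(v) * (means[c] - grand) ** 2 for c, v in groups.items())
    ss_within = sum((x - means[c]) ** 2 for c, v in groups.items() for x in v)
    if ss_within == 0.0:
        return float("inf") if ss_between > 0.0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_k_features(X, y, k):
    """Indices of the k features with the largest F-values."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def stratified_folds(y, n_splits, rng):
    """Split sample indices into folds with similar class proportions."""
    folds = [[] for _ in range(n_splits)]
    for cls in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_splits].append(i)
    return folds


def repeated_cv_selection(X, y, k, n_splits=5, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx, selected) for each fold of each repeat."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        folds = stratified_folds(y, n_splits, rng)
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            # Feature ranking uses the in-loop training set only
            selected = top_k_features([X[i] for i in train_idx],
                                      [y[i] for i in train_idx], k)
            yield train_idx, test_idx, selected


# Toy data: 20 chemicals, 4 features; only feature 2 separates the classes
y = [i % 2 for i in range(20)]
X = [[(2 * i) % 5, (3 * i) % 4, 10 * (i % 2) + (i % 3), (5 * i) % 6]
     for i in range(20)]
results = list(repeated_cv_selection(X, y, k=1))

# (175 full + 175 minimal data sets) x 5 algorithms x 5 feature-set sizes
n_models = (175 + 175) * 5 * 5
print(len(results), n_models)
```

In practice a classifier would be fitted on the selected columns of each training fold and scored on the matching test fold; selecting features once on the whole data set before cross-validation would leak information and inflate the reported performance.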

22.4.3 Results

For the minimal data sets (50 positive and 50 negative chemicals), the models for predicting multigenerational brain toxicity had the highest F1 score (0.85 ± 0.09) across all the models. The mean F1 scores across models for predicting all target organ toxicity endpoints were 0.73 for kNN (k = 3), 0.72 for kNN (k = 5), 0.70 for RF, 0.69 for SVM (linear kernel), 0.70 for SVM (radial basis kernel), 0.69 for CART, and 0.61 for NB. The mean F1 scores across all target organ toxicity endpoints for the five types of independent variables were 0.70 for structural descriptors and bioactivity data, 0.70 for chemotype descriptors and bioactivity data, 0.70 for bioactivity data, 0.68 for chemotype descriptors, and 0.67 for structural descriptors.

Of the models from the full data sets (maximal positive and negative chemicals), the models for predicting subchronic bone marrow toxicity had the highest F1 score (0.88 ± 0.06). For the full data sets, the mean F1 scores across models for predicting all target organ toxicity endpoints were 0.73 for kNN, 0.72 for RF, 0.70 for SVM, 0.71 for CART, and 0.62 for NB. The mean F1 scores across models for predicting all target organ toxicity endpoints for the five types of independent variables were 0.72 for bioactivity and structural descriptors, 0.72 for bioactivity and chemotype descriptors, 0.72 for bioactivity descriptors, 0.69 for chemotype descriptors, and 0.68 for structural descriptors.

To examine the impact of data size, the best F1 scores of the models for each of the 35 organ toxicity endpoints developed using different types of descriptors, animal study types, and machine learning algorithms are given in Fig. 22.4. The blue circles and orange triangles are for the models built using the full data sets and the minimal data sets, respectively. Compared with the minimal data sets, the full data sets have more chemicals, and the models built on the full data sets performed better than the models constructed with the minimal data sets.

Examination of the impact of different modeling factors on the performance of the organ toxicity prediction models revealed that the target organ toxicity endpoint had the largest impact on model performance, followed by the type of independent variables, while the machine learning algorithm had the smallest impact. Furthermore, the models built from a hybrid of two types of descriptors had better performance than the models generated from one type of descriptor. Among the models built from the minimal data sets, the kNN models performed best. However, no single algorithm stood out among the models constructed from the full data sets.


Fig. 22.4 F1 score of the best model constructed for each of the 35 organ toxicity endpoints

This research demonstrated the utility of bioactivity data and chemical structural descriptors in the development of machine learning models for predicting target organ toxicity observed in animal guideline studies.
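The F1 scores reported in this section, given as a mean with a standard deviation over repeated cross-validation, can be computed along the following lines. The per-fold counts are hypothetical and serve only to show the calculation; they do not reproduce any value from the case study.

```python
import statistics


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical (TP, FP, FN) counts from five cross-validation folds
fold_counts = [(8, 2, 2), (9, 1, 1), (7, 3, 3), (8, 2, 1), (9, 2, 1)]
fold_f1 = [f1_score(tp, fp, fn) for tp, fp, fn in fold_counts]

mean_f1 = statistics.mean(fold_f1)
sd_f1 = statistics.stdev(fold_f1)   # sample standard deviation
print(f"F1 = {mean_f1:.2f} ± {sd_f1:.2f}")
```

Because F1 ignores true negatives, it is usually reported together with sensitivity and specificity, as was done in the case study.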

22.5 Summary

Toxicity to target organs such as the liver, kidney, heart, and lung is a common concern in safety assessment and a major cause of drug attrition. Therefore, it is important for regulatory agencies and pharmaceutical companies to evaluate the potential target organ toxicity of chemicals as early as possible, even though the mechanisms underlying organ toxicity are still not clear. Currently, the safety assessment of chemicals primarily relies on animal testing. However, animal testing is expensive and time consuming, which makes it impractical to test thousands of compounds using animal models. With the proposed 3Rs (replace, reduce, and refine) principle, more and more efforts have focused on alternative methods for toxicity testing. Many computational toxicology methods have been developed and used in drug discovery, risk assessment of chemicals, and safety evaluation of products (Cheng et al. 2017; Hong et al. 1998, 2018; Huang et al. 2020; Ji et al. 2021; Ng et al. 2015, 2014; Sakkiah et al. 2020a, b, 2021; Shen et al.


2013; Yang et al. 2021). Among the computational methods, machine learning is particularly promising because of its computational power and efficiency (Hong et al. 2017, 2005; Idakwo et al. 2019; Luo et al. 2015; Sakamuru et al. 2021; Sakkiah et al. 2018, 2017; Tang et al. 2020; Wang et al. 2020, 2021). The induction of organ toxicity is complicated and influenced by various factors. Therefore, it is a challenge to predict in vivo organ toxicity. The studies reviewed here indicate that the target organ, the type of data descriptors, and the machine learning algorithm all affect the performance of target organ toxicity prediction (Liu et al. 2017; Xu et al. 2020). The recently developed computational models for human in vivo organ level toxicity prediction provide a promising view of the connections between genes and organ level toxicity (Xu et al. 2021). With the development of new technology and the increasing amount of available data, there will be more ways to optimize and validate computational models for screening chemicals with potential organ level toxicity.

Disclaimer The chapter reflects the views of the authors and does not necessarily reflect those of the U.S. Food and Drug Administration.

References

Ashburn TT, Thor KB (2004) Drug repositioning: identifying and developing new uses for existing drugs. Nat Rev Drug Discov 3(8):673–683
Baskin II (2018) Machine learning methods in computational toxicology. In: Nicolotti O (ed) Computational toxicology. Methods in molecular biology, vol 1800. Humana Press, New York, NY
Bender A (2011) Bayesian methods in virtual screening and chemical biology. Methods Mol Biol 672:175–196
Bonventre JV, Vaidya VS, Schmouder R, Feig P, Dieterle F (2010) Next-generation biomarkers for detecting kidney toxicity. Nat Biotechnol 28(5):436–440
Bourhia M, Ullah R, A SA, Ibenmoussa S (2020) Evidence of drug-induced hepatotoxicity in the Maghrebian population. Drug Chem Toxicol 1–5
Breiman L (2001) Random forests. Mach Learn 45:5–32
Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. Taylor & Francis
Chen M, Hong H, Fang H, Kelly R, Zhou G, Borlak J, Tong W (2013) Quantitative structure-activity relationship models for predicting drug-induced liver injury based on FDA-approved drug labeling annotation and using a large collection of drugs. Toxicol Sci 136(1):242–249
Cheng F, Hong H, Yang S, Wei Y (2017) Individualized network-based drug repositioning infrastructure for precision oncology in the panomics era. Brief Bioinform 18(4):682–697
Cook D, Brown D, Alexander R, March R, Morgan P, Satterthwaite G, Pangalos MN (2014) Lessons learned from the fate of AstraZeneca’s drug pipeline: a five-dimensional framework. Nat Rev Drug Discov 13(6):419–431
Cutler DR, Edwards TC Jr, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ (2007) Random forests for classification in ecology. Ecology 88(11):2783–2792
Ekins S, Williams AJ, Xu JJ (2010) A predictive ligand-based Bayesian model for human drug-induced liver injury. Drug Metab Dispos 38(12):2302–2308
Faria J, Ahmed S, Gerritsen KGF, Mihaila SM, Masereeuw R (2019) Kidney-based in vitro models for drug-induced toxicity testing. Arch Toxicol 93(12):3397–3418
Hong H, Chen M, Ng HW, Tong W (2016) QSAR models at the US FDA/NCTR. Methods Mol Biol 1425:431–459
Hong H, Neamati N, Winslow HE, Christensen JL, Orr A, Pommier Y, Milne GW (1998) Identification of HIV-1 integrase inhibitors based on a four-point pharmacophore. Antivir Chem Chemother 9(6):461–472
Hong H, Thakkar S, Chen M, Tong W (2017) Development of decision forest models for prediction of drug-induced liver injury in humans using a large set of FDA-approved drugs. Sci Rep 7(1):17311
Hong H, Tong W, Xie Q, Fang H, Perkins R (2005) An in silico ensemble method for lead discovery: decision forest. SAR QSAR Environ Res 16(4):339–347
Hong H, Zhu J, Chen M, Gong P, Zhang C, Tong W (2018) Quantitative structure–activity relationship models for predicting risk of drug-induced liver injury in humans. In: Chen M, Will Y (eds) Drug-induced liver toxicity. Methods in pharmacology and toxicology. Humana, New York, NY, pp 77–100
Huang Y, Li X, Xu S, Zheng H, Zhang L, Chen J, Hong H, Kusko R, Li R (2020) Quantitative structure-activity relationship models for predicting inflammatory potential of metal oxide nanoparticles. Environ Health Perspect 128(6):67010
Idakwo G, Luttrell IV J, Chen M, Hong H, Gong P, Zhang C (2019) A review of feature reduction methods for QSAR-based toxicity prediction. In: Hong H (ed) Advances in computational toxicology. Challenges and advances in computational chemistry and physics, vol 30. Springer, Cham, pp 119–139
Ji Z, Guo W, Sakkiah S, Liu J, Patterson TA, Hong H (2021) Nanomaterial databases: data sources for promoting design and risk assessment of nanomaterials. Nanomaterials (Basel) 11(6):1599
Judson RS, Houck KA, Kavlock RJ, Knudsen TB, Martin MT, Mortensen HM, Reif DM, Rotroff DM, Shah I, Richard AM, Dix DJ (2010) In vitro screening of environmental chemicals for targeted testing prioritization: the ToxCast project. Environ Health Perspect 118(4):485–492
Kim E, Nam H (2017) Prediction models for drug-induced hepatotoxicity by using weighted molecular fingerprints. BMC Bioinform 18(Suppl 7):227
Kim J, Shin M (2014) An integrative model of multi-organ drug-induced toxicity prediction using gene-expression data. BMC Bioinform 15(Suppl 16):S2
Lee S, Kang YM, Park H, Dong MS, Shin JM, No KT (2013) Human nephrotoxicity prediction models for three types of kidney injury based on data sets of pharmacological compounds and their metabolites. Chem Res Toxicol 26(11):1652–1659
Lin Z, Will Y (2012) Evaluation of drugs with specific organ toxicities in organ-specific cell lines. Toxicol Sci 126(1):114–127
Liu J, Mansouri K, Judson RS, Martin MT, Hong H, Chen M, Xu X, Thomas RS, Shah I (2015) Predicting hepatotoxicity using ToxCast in vitro bioactivity and chemical structure. Chem Res Toxicol 28(4):738–751
Liu J, Patlewicz G, Williams AJ, Thomas RS, Shah I (2017) Predicting organ toxicity using in vitro bioactivity data and chemical structure. Chem Res Toxicol 30(11):2046–2059
Loh WY (2011) Classification and regression trees. Wiley Interdiscip Rev: Data Min Knowl Discov 1:14–23
Low Y, Uehara T, Minowa Y, Yamada H, Ohno Y, Urushidani T, Sedykh A, Muratov E, Kuz’min V, Fourches D, Zhu H, Rusyn I, Tropsha A (2011) Predicting drug-induced hepatotoxicity using QSAR and toxicogenomics approaches. Chem Res Toxicol 24(8):1251–1262
Lu TP, Chen JJ (2015) Identification of drug-induced toxicity biomarkers for treatment determination. Pharm Stat 14(4):284–293
Luo H, Ye H, Ng HW, Shi L, Tong W, Mendrick DL, Hong H (2015) Machine learning methods for predicting HLA-peptide binding activity. Bioinform Biol Insights 9(Suppl 3):21–29
Mamoshina P, Bueno-Orovio A, Rodriguez B (2020) Dual transcriptomic and molecular machine learning predicts all major clinical forms of drug cardiotoxicity. Front Pharmacol 11:639
Martin MT, Judson RS, Reif DM, Kavlock RJ, Dix DJ (2009) Profiling chemicals based on chronic toxicity results from the U.S. EPA ToxRef Database. Environ Health Perspect 117(3):392–399
Matthews EJ, Ursem CJ, Kruhlak NL, Benz RD, Sabate DA, Yang C, Klopman G, Contrera JF (2009) Identification of structure-activity relationships for adverse effects of pharmaceuticals in humans: Part B. Use of (Q)SAR systems for early detection of drug-induced hepatobiliary and urinary tract toxicities. Regul Toxicol Pharmacol 54(1):23–42
Myshkin E, Brennan R, Khasanova T, Sitnik T, Serebriyskaya T, Litvinova E, Guryanov A, Nikolsky Y, Nikolskaya T, Bureeva S (2012) Prediction of organ toxicity endpoints by QSAR modeling based on precise chemical-histopathology annotations. Chem Biol Drug Des 80(3):406–416
Ng HW, Shu M, Luo H, Ye H, Ge W, Perkins R, Tong W, Hong H (2015) Estrogenic activity data extraction and in silico prediction show the endocrine disruption potential of bisphenol A replacement compounds. Chem Res Toxicol 28(9):1784–1795
Ng HW, Zhang W, Shu M, Luo H, Ge W, Perkins R, Tong W, Hong H (2014) Competitive molecular docking approach for predicting estrogen receptor subtype alpha agonists and antagonists. BMC Bioinform 15(Suppl 11):S4
Patel CN, Kumar SP, Rawal RM, Patel DP, Gonzalez FJ, Pandya HA (2020) A multiparametric organ toxicity predictor for drug discovery. Toxicol Mech Methods 30(3):159–166
Paul SM, Mytelka DS, Dunwiddie CT, Persinger CC, Munos BH, Lindborg SR, Schacht AL (2010) How to improve R&D productivity: the pharmaceutical industry’s grand challenge. Nat Rev Drug Discov 9(3):203–214
Pizzo F, Gadaleta D, Lombardo A, Nicolotti O, Benfenati E (2015) Identification of structural alerts for liver and kidney toxicity using repeated dose toxicity data. Chem Cent J 9:62
Raschi E, De Ponti F (2017) Drug-induced liver injury: towards early prediction and risk stratification. World J Hepatol 9(1):30–37
Regev A (2014) Drug-induced liver injury and drug development: industry perspective. Semin Liver Dis 34(2):227–239
Richard AM, Judson RS, Houck KA, Grulke CM, Volarath P, Thillainadarajah I, Yang C, Rathman J, Martin MT, Wambaugh JF, Knudsen TB, Kancherla J, Mansouri K, Patlewicz G, Williams AJ, Little SB, Crofton KM, Thomas RS (2016) ToxCast chemical landscape: paving the road to 21st century toxicology. Chem Res Toxicol 29(8):1225–1251
Rogers D, Hahn M (2010) Extended-connectivity fingerprints. J Chem Inf Model 50(5):742–754
Russmann S, Kullak-Ublick GA, Grattagliano I (2009) Current concepts of mechanisms in drug-induced hepatotoxicity. Curr Med Chem 16(23):3041–3053
Sakamuru S, Zhao J, Xia M, Hong H, Simeonov A, Vaisman I, Huang R (2021) Predictive models to identify small molecule activators and inhibitors of opioid receptors. J Chem Inf Model 61(6):2675–2685
Sakkiah S, Guo W, Pan B, Kusko R, Tong W, Hong H (2018) Computational prediction models for assessing endocrine disrupting potential of chemicals. J Environ Sci Health C Environ Carcinog Ecotoxicol Rev 36(4):192–218
Sakkiah S, Guo W, Pan B, Ji Z, Yavas G, Azevedo M, Hawes J, Patterson TA, Hong H (2020a) Elucidating interactions between SARS-CoV-2 trimeric spike protein and ACE2 using homology modeling and molecular dynamics simulations. Front Chem 8:622632
Sakkiah S, Leggett C, Pan B, Guo W, Valerio LG Jr, Hong H (2020b) Development of a nicotinic acetylcholine receptor nAChR alpha7 binding activity prediction model. J Chem Inf Model 60(4):2396–2404
Sakkiah S, Selvaraj C, Gong P, Zhang C, Tong W, Hong H (2017) Development of estrogen receptor beta binding prediction model using large sets of chemicals. Oncotarget 8(54):92989–93000
Sakkiah S, Selvaraj C, Guo W, Liu J, Ge W, Patterson TA, Hong H (2021) Elucidation of agonist and antagonist dynamic binding patterns in ER-alpha by integration of molecular docking, molecular dynamics simulations and quantum mechanical calculations. Int J Mol Sci 22(17)
Sakuratani Y, Zhang HQ, Nishikawa S, Yamazaki K, Yamada T, Yamada J, Gerova K, Chankov G, Mekenyan O, Hayashi M (2013) Hazard Evaluation Support System (HESS) for predicting repeated dose toxicity using toxicological categories. SAR QSAR Environ Res 24(5):351–363
Schuster D, Laggner C, Langer T (2005) Why drugs fail - a study on side effects in new chemical entities. Curr Pharm Des 11(27):3545–3559
Shen J, Xu L, Fang H, Richard AM, Bray JD, Judson RS, Zhou G, Colatsky TJ, Aungst JL, Teng C, Harris SC, Ge W, Dai SY, Su Z, Jacobs AC, Harrouk W, Perkins R, Tong W, Hong H (2013) EADB: an estrogenic activity database for assessing potential endocrine activity. Toxicol Sci 135(2):277–291
Strobl C, Boulesteix AL, Kneib T, Augustin T, Zeileis A (2008) Conditional variable importance for random forests. BMC Bioinform 9:307
Tang W, Chen J, Hong H (2020) Discriminant models on mitochondrial toxicity improved by consensus modeling and resolving imbalance in training. Chemosphere 253:126768
Thoolen B, Maronpot RR, Harada T, Nyska A, Rousseaux C, Nolte T, Malarkey DE, Kaufmann W, Küttler K, Deschl U (2010) Proliferative and nonproliferative lesions of the rat and mouse hepatobiliary system. Toxicol Pathol 38(Suppl 7):5S-81S
Tornqvist E, Annas A, Granath B, Jalkesten E, Cotgreave I, Oberg M (2014) Strategic focus on 3R principles reveals major reductions in the use of animals in pharmaceutical toxicity testing. PLoS ONE 9(7):e101638
Wang Z, Chen J, Hong H (2020) Applicability domains enhance application of PPARgamma agonist classifiers trained by drug-like compounds to environmental chemicals. Chem Res Toxicol 33(6):1382–1388
Wang Z, Chen J, Hong H (2021) Developing QSAR models with defined applicability domains on PPARgamma binding affinity using large data sets and machine learning algorithms. Environ Sci Technol 55(10):6857–6866
Wilke RA, Lin DW, Roden DM, Watkins PB, Flockhart D, Zineh I, Giacomini KM, Krauss RM (2007) Identifying genetic risk factors for serious adverse drug reactions: current progress and challenges. Nat Rev Drug Discov 6(11):904–916
Williams DP, Lazic SE, Foster AJ, Semenova E, Morgan P (2020) Predicting drug-induced liver injury with Bayesian machine learning. Chem Res Toxicol 33(1):239–248
Xu T, Ngan DK, Ye L, Xia M, Xie HQ, Zhao B, Simeonov A, Huang R (2020) Predictive models for human organ toxicity based on in vitro bioactivity data and chemical structure. Chem Res Toxicol 33(3):731–741
Xu T, Wu L, Xia M, Simeonov A, Huang R (2021) Systematic identification of molecular targets and pathways related to human organ level toxicity. Chem Res Toxicol 34(2):412–421
Yang C, Tarkhov A, Marusczyk J, Bienfait B, Gasteiger J, Kleinoeder T, Magdziarz T, Sacher O, Schwab CH, Schwoebel J, Terfloth L, Arvidson K, Richard A, Worth A, Rathman J (2015) New publicly available chemical query language, CSRML, to support chemotype representations for application to data mining and modeling. J Chem Inf Model 55(3):510–528
Yang X, Ou W, Zhao S, Wang L, Chen J, Kusko R, Hong H, Liu H (2021) Human transthyretin binding affinity of halogenated thiophenols and halogenated phenols: an in vitro and in silico study. Chemosphere 280:130627
Zhang C, Cheng F, Li W, Liu G, Lee PW, Tang Y (2016) In silico prediction of drug induced liver toxicity using substructure pattern recognition method. Mol Inform 35(3–4):136–144
Zhang H, Shen C, Liu RZ, Mao J, Liu CT, Mu B (2020) Developing novel in silico prediction models for assessing chemical reproductive toxicity using the naive Bayes classifier method. J Appl Toxicol 40(9):1198–1209
Zhu XW, Xin YJ, Chen QH (2016) Chemical and in vitro biological information to predict mouse liver toxicity using recursive random forests. SAR QSAR Environ Res 27(7):559–572

Part IV

The Progress of Machine Learning and Deep Learning in New Areas

Chapter 23

Computational Modeling for the Prediction of Hepatotoxicity Caused by Drugs and Chemicals

Minjun Chen, Jie Liu, Tsung-Jen Liao, Kristin Ashby, Yue Wu, Leihong Wu, Weida Tong, and Huixiao Hong

M. Chen (B) · J. Liu · T.-J. Liao · K. Ashby · Y. Wu · L. Wu · W. Tong · H. Hong
National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA

This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2023
H. Hong (ed.), Machine Learning and Deep Learning in Computational Toxicology, Computational Methods in Engineering & the Sciences, https://doi.org/10.1007/978-3-031-20730-3_23

23.1 Introduction

As the primary metabolic organ, the human liver is vulnerable to drugs and chemicals. Drug-induced liver injury (DILI) is one of the main reasons for halting drug development processes and has caused over 50 drugs to be withdrawn from the market after approval (Chen et al. 2011). Moreover, even when a drug is deemed safe and is approved for public use, a relatively small fraction of the population taking the drug may experience liver damage. It is therefore important to develop predictive models for preventing and reducing the liability of DILI (Chen et al. 2014).

The waves of current digitization technologies will continue generating ever-growing volumes of data. In drug development, the advancement of in silico, high-throughput, and toxicogenomic assays has created huge amounts of toxicity data. Consequently, questions such as how to extract hidden patterns and how to interpret the results from data analysis and transform them into scientific knowledge pose new challenges for toxicologists and bioinformaticians.

Artificial intelligence and machine learning have emerged as potentially valuable tools within the computational toxicological field. Machine learning is a branch of artificial intelligence that uses statistics-based computational methods with in-house or commercial software, enabling the discovery of hidden and non-obvious patterns in toxicity data and making reliable statistical predictions when encountering similar new data. Specifically, the ability of machine learning to automatically identify patterns in data is particularly important when expert knowledge is incomplete or inaccurate, when the amount of available data is too large to be handled manually, or when there are exceptions to the general cases (Chicco 2017).

In this chapter, we will introduce the basic concepts and technologies associated with machine learning. We will also review current studies in the literature to explain specific machine learning-based approaches. Finally, we will present a state-of-the-art scientific case study in which machine learning was used to resolve real-world hepatotoxicity questions.

23.2 Machine Learning Methods for Predicting Hepatotoxicity

Machine learning is a branch of artificial intelligence which focuses on the use of data and algorithms to simulate the way that humans learn. Using statistical methods and software, machine learning can extract knowledge from data, discover hidden or non-obvious patterns, and make predictions on new data with minimal human intervention. A machine learning model for predicting hepatotoxicity usually consists of the following three components:

(1) A toxicity dataset: The input toxicity data may or may not carry labels assigned by humans; learning from labeled data is known as supervised machine learning, and learning from unlabeled data as unsupervised machine learning. To develop a supervised machine learning model, a toxicity dataset is usually divided into three subsets: a training dataset (used for developing a machine learning model), a validation dataset (used for measuring the machine learning model's performance), and a test dataset (used to examine and determine how the machine learning model will perform in real-world conditions).

(2) A machine learning algorithm for optimization and decision-making: Based on the input data, a candidate model produces an estimation about a pattern found in the training dataset. If the estimate does not fit the labeled data well, the model weights are adjusted to reduce the discrepancy between the labeled data and the model estimate. The algorithm repeats this evaluation and optimization process, updating weights autonomously until a threshold of measured performance is met.

(3) An error metric: An error metric serves to measure and evaluate the prediction performance of the candidate models during the learning process and to determine the final model. If data labeled by humans are available, the error function can compare the labels with the predictions to measure the predictive performance of a model.

Machine learning has been widely applied to predicting hepatotoxicity (Table 23.1) and can be used to extract the underlying patterns and information encoded in molecular or bioactivity descriptors and predict the toxicity profile of new compounds. The basic flowchart for developing machine learning models is illustrated in Fig. 23.1. Several machine learning algorithms have been used to infer the patterns in chemical descriptors and their relationship with toxicity, including Logistic Regression, Naive Bayes, k-Nearest Neighbors, Decision Trees, Support Vector Machines, and Neural Networks. In the next sections, we will discuss some commonly used machine learning techniques for predicting hepatotoxicity.
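The three components above can be illustrated with a minimal sketch in plain Python. It is not taken from any of the cited studies: the toy data, the 70/15/15 split, the perceptron-style update rule, and the function names (`split_dataset`, `error_rate`, `train_until_threshold`) are all assumptions chosen only to show a dataset split, an iterative weight update, and an error metric with a stopping threshold.

```python
import random

def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle the samples and split them into training, validation, and test subsets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

def error_rate(weight, bias, data):
    """Error metric: fraction of samples misclassified by 'predict 1 if weight*x + bias > 0'."""
    wrong = sum(1 for x, y in data if (1 if weight * x + bias > 0 else 0) != y)
    return wrong / len(data)

def train_until_threshold(train, val, threshold=0.1, max_epochs=100, lr=0.1):
    """Adjust weights on the training set; stop once the validation error meets the threshold."""
    weight, bias = 0.0, 0.0
    for _ in range(max_epochs):
        for x, y in train:
            pred = 1 if weight * x + bias > 0 else 0
            weight += lr * (y - pred) * x   # no change when the prediction is correct
            bias += lr * (y - pred)
        if error_rate(weight, bias, val) <= threshold:
            break
    return weight, bias

# Hypothetical toy data: one descriptor value x, labeled 1 ("toxic") when x > 0.5
data = [(i / 100, 1 if i > 50 else 0) for i in range(100)]
train, val, test = split_dataset(data)
w, b = train_until_threshold(train, val)
test_error = error_rate(w, b, test)   # performance estimate on data unseen during training
```

The test subset is touched only once, after training has stopped, so that `test_error` is not influenced by the tuning loop.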

23.2.1 Toxicity Dataset for Machine Learning

A toxicity dataset for machine learning usually is a collection of instances or samples, each of which contains a panel of observations with or without labeled outcome data. The outcome data usually are study-specific, while the observation data are more general and can be shared by different types of studies. Molecular descriptors are most frequently used in machine learning for predicting hepatotoxicity, followed by biological activity data collected from experiments, especially from high-throughput screening assays and toxicogenomic studies. Developing a machine learning model for hepatotoxicity requires a high-quality labeled dataset with many samples or instances. Such a dataset is often difficult and costly to generate.

23.2.1.1 Chemical Descriptors

Chemical descriptors are characterized as mathematical representations of molecular structures and physicochemical properties and are usually generated by calculations from chemical structures using specially designed algorithms. The descriptors can be qualitative, representing the existence of a specific property, or quantitative, describing the value of the molecules’ physical and chemical information. For example, logP is a quantitative representation of the lipophilicity of the molecules, which can be measured through the partitioning of the molecule between an aqueous phase and a lipophilic phase or calculated via commercial or open-source tools.

Table 23.1 Selected studies for predicting hepatotoxicity using machine learning approaches

Algorithm | Descriptors | Cross-validation (a) | External test (a) | References
k-nearest neighbor | Topological indices of molecule | | 84% (n = 37) | (Rodgers et al. 2010)
Linear discriminant analysis | Radial distribution function | 78% (n = 74) | | (Cruz-Monteagudo et al. 2008)
Linear discriminant analysis | Extended connectivity functional fingerprint and physicochemical descriptors | 59% (n = 295) | 60% (n = 237) | (Ekins et al. 2010)
Decision forest | Mold2 chemical descriptor | 70% (n = 197) | 61–79% (n = 190–328) | (Chen et al. 2013a, b)
Random forest | Toxicogenomic descriptors | 76% (n = 127) | | (Low et al. 2011)
Ensemble recursive partitioning | 2D molecular descriptor | 76% (n = 382) | 81% (n = 98) | (Cheng and Dixon 2003)
Recursive random forest | Chemical and in vitro descriptors | 74% (n = 238) | 71% (n = 111) | (Zhu et al. 2016)
Decision forest algorithm | Bioassay activity data and Mold2 descriptors | 70% (n = 222) | 66% (b) (n = 424) | (Wu et al. 2017)
SVM | 2D fragment descriptor | 62–68% (n = 531) | 78% (n = 18) | (Fourches et al. 2010)
SVM | ISIDA fragment descriptors | 65% (n = 285) | | (Muller et al. 2015)
SVM | MACCS and FP4 fingerprints | 66% (n = 1317) | 75% (n = 88) | (Zhang et al. 2016a, b)
SVM | Amgen chemical descriptors | 70% (n = 319) | | (Liu et al. 2020)
SVM with a genetic algorithm | 2D and 3D physicochemical descriptors | 75% (n = 3712) | 62–70% (n = 185–320) | (Mulliner et al. 2016)
Deep learning | Encoding layers based on SMILES, PaDEL descriptors | | 70–80% (n = 190–197) | (Xu et al. 2015)
Deep graph learning | Graph-based chemical structure | 79% (n = 479) | | (Ma et al. 2021)
Ensemble classifier | Physicochemical descriptors and fingerprints | 81% (b) (n = 677) | | (Liu et al. 2015)
Ensemble of mixed learning | PaDEL molecular descriptor | 68% (n = 1087) | 75% (n = 120) | (Liew et al. 2011)
Ensemble of mixed learning | PaDEL molecular descriptors | 71% (n = 1241) | 84% (n = 285) | (Ai et al. 2018)

(a) Accuracy with number of samples in parentheses
(b) Balanced accuracy

Fig. 23.1 The basic flowchart for developing machine learning models (flowchart elements: Training set, Validation set, Model training, Model tuning, Error < threshold, Final predictive model, Test set)

To construct a quantitative structure–activity relationship (QSAR) model, chemical structures should be transformed into numerical descriptors which represent the compounds' structural characteristics. With the assumption that compounds with similar chemical structures can have similar biological activities, chemical descriptors have been widely used in developing QSAR models for classification and regression. QSAR models for hepatotoxicity prediction have been built using chemical descriptors and different machine learning algorithms, such as linear discriminant analysis, decision tree, neural network, and deep learning.

Several algorithms have been developed for calculating chemical descriptors, which can be categorized by origin or by dimensionality. The origin-based descriptors encompass topological (e.g., graph theory-based), geometrical (e.g., distances, valence angles, and surfaces), constitutional (e.g., functional group count), thermodynamic (e.g., heat of formation, entropy), and quantum-chemical (e.g., charge distribution-related) descriptors. Meanwhile, the main classes of dimensionality-based descriptors are as follows: 0D descriptors (e.g., constitutional descriptors, count descriptors), 1D descriptors (e.g., structural fragments, fingerprints), 2D descriptors (i.e., topological indices), 3D descriptors (e.g., 3D-MoRSE descriptors, quantum-chemical descriptors, size, steric, surface, and volume descriptors), and 4D descriptors (e.g., those derived from GRID or CoMFA methods, VolSurf).

Chemical descriptors can be calculated by commercial or open-source software packages such as Mold2 (Hong et al. 2008, 2018), PaDEL (Yap 2011), DRAGON (www.talete.mi.it), and RDKit (www.rdkit.org). For example, Mold2 is a software package developed by the FDA's National Center for Toxicological Research (NCTR) that calculates a total of 777 chemical descriptors, mostly derived from well-documented molecular descriptors reported in the literature (Hong et al. 2008). In Mold2, 107 descriptors are 1D descriptors, which are calculated directly from molecular formulas (e.g., counts of different atoms, molecular weights). Most descriptors are calculated from 2D chemical structures, including counts of different types of atoms, bonds, functional groups, and chemical-physical properties. Comparison analysis between the chemical descriptors generated by Mold2 and by commercial software, e.g., Cerius2, Dragon, and Molconn-Z, demonstrated that Mold2 descriptors convey sufficient structural information. Tests on multiple datasets showed that models built with Mold2 descriptors performed better than models built with descriptors from the compared commercial software packages (Hong et al. 2008). Mold2 was developed using C/C++ and enables rapid calculations by using multiple CPU cores in parallel. It can calculate descriptors for small or large datasets in a single run with all molecules in a single SDF file, which can be exported from other databases. Information about the Mold2 package is publicly available (https://www.fda.gov/science-research/bioinformatics-tools/mold2).
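As an illustration of descriptors that depend only on the molecular formula, the following sketch counts atoms and sums atomic masses in plain Python. It is not Mold2 code and does not reproduce any Mold2 descriptor definition; the function names and the small mass table are assumptions, and the parser handles only simple formulas without parentheses or charges.

```python
import re

# Standard atomic masses (g/mol) for a few common elements; extend as needed
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06, "Cl": 35.45}

def atom_counts(formula):
    """Count atoms per element in a simple formula such as 'C8H9NO2'."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts

def molecular_weight(formula):
    """Sum the atomic masses of all atoms in the formula."""
    return sum(ATOMIC_MASS[symbol] * n for symbol, n in atom_counts(formula).items())

# Acetaminophen (paracetamol), a drug with well-known hepatotoxic potential at overdose
counts = atom_counts("C8H9NO2")
weight = molecular_weight("C8H9NO2")
```

Descriptors that need connectivity, such as topological indices or fragment counts, require the 2D structure and a cheminformatics toolkit rather than a formula parser.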

23.2.1.2 Bioactivity Data

Bioactivity data play an important role in drug safety studies and are another toxicity data type frequently used in machine learning for hepatotoxicity. Some assays, such as high-throughput screening assays, which measure multiple toxicological endpoints simultaneously, provide a promising approach for assessing liver toxicity associated with the use of drugs and chemicals.

In vitro high-throughput screening data are frequently used for machine learning. Some government-sponsored initiatives, such as the Toxicology in the 21st Century (Tox21) program (a collaboration between the National Institute of Environmental Health Sciences (NIEHS) of the National Institutes of Health (NIH), the National Center for Advancing Translational Sciences (NCATS), the US Food and Drug Administration (FDA), and the US Environmental Protection Agency (EPA)) and the Toxicity Forecaster (ToxCast) project of the EPA, offer large public-domain datasets with thousands of compounds screened by high-throughput biological assays. These datasets have been used to develop models for toxicity prediction, such as endocrine toxicity (Judson et al. 2015), reproductive and developmental toxicity (Sipes et al. 2011), and organ toxicity (Xu et al. 2020).

Liu et al. (2015) developed computational models to predict in vivo liver toxicity in animals by combining high-throughput in vitro assay data from ToxCast and chemical structures. In their study, hypertrophy, injury, and proliferative lesions retrieved from the hepatic histopathologic reports in rat chronic studies were used as liver toxicity endpoints. Meanwhile, 125 ToxCast in vitro bioassays, 726 chemical descriptors, and combinations of both were used as model input data. A panel of machine learning algorithms, including linear discriminant analysis, Naive Bayes, support vector machine (SVM), classification and regression tree, and k-nearest neighbors, were used to develop predictive models.


Another type of bioactivity dataset uses toxicogenomics to develop machine learning models for hepatotoxicity. Specifically, the changes of transcriptome expression in cell lines after exposure to a drug with DILI potential could provide mechanistic insight into the development of DILI and improve predictive performance. Low et al. (2011) developed predictive models for hepatotoxicity using toxicogenomic data and structural descriptors, alone and in combination. They developed the models by using four machine learning approaches, i.e., distance-weighted discrimination, k-nearest neighbors, random forest, and SVM. Feng et al. (2019) used gene expression data associated with DILI collected from ArrayExpress to develop SVM and deep learning models for predicting DILI.

23.2.2 Metrics for Evaluating Model Performance

Statistical metrics such as accuracy, precision, recall, the Matthews correlation coefficient (MCC), the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC) are used for assessing the performance of machine learning models.

23.2.2.1 Sensitivity, Specificity, and Accuracy

Prediction accuracy of a machine learning model is defined by Eq. (23.1) as the ratio of correctly classified samples to the total number of samples. Sensitivity is defined by Eq. (23.2) as the proportion of actual positives that are predicted as positives. Specificity is defined by Eq. (23.3) as the proportion of actual negatives that are predicted as negatives.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{23.1}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{23.2}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{23.3}$$

Here, TP is the number of actual positives which are predicted as positives, TN is the number of actual negatives which are predicted as negatives, FP is the number of actual negatives which are predicted as positives, and FN is the number of actual positives which are predicted as negatives.
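Equations (23.1) to (23.3) can be checked on a small example. The sketch below is illustrative only: the labels and predictions are made up, and the function names are assumptions rather than part of any cited software.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 = positive and 0 = negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)      # Eq. (23.1)

def sensitivity(tp, fn):
    return tp / (tp + fn)                       # Eq. (23.2)

def specificity(tn, fp):
    return tn / (tn + fp)                       # Eq. (23.3)

# Hypothetical labels for ten compounds (1 = hepatotoxic) and a model's predictions
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)   # 3, 4, 2, 1
acc = accuracy(tp, tn, fp, fn)                      # 7 of 10 correct
sens = sensitivity(tp, fn)                          # 3 of 4 positives found
spec = specificity(tn, fp)                          # 4 of 6 negatives found
```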

23.2.2.2 MCC and Balanced Accuracy

MCC is an alternative to accuracy that is more robust to imbalanced datasets. It is computed from the contingency matrix and corresponds to the Pearson product-moment correlation coefficient between actual and predicted values. MCC is calculated using Eq. (23.4).

$$\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{23.4}$$

MCC ranges in the interval [−1, +1], with the extreme values −1 and +1 reached in cases of perfect misclassification and perfect classification, respectively, while MCC = 0 is the expected value for a classifier that predicts by chance. Balanced accuracy is another metric that adjusts for imbalanced datasets and is calculated using Eq. (23.5).

$$\text{Balanced accuracy} = \frac{\text{Sensitivity} + \text{Specificity}}{2} \tag{23.5}$$

23.2.2.3 AUC

A ROC curve is a graphical plot that illustrates the diagnostic ability of a machine learning model as its discrimination threshold is varied. AUC provides a quantitative metric. The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The TPR is also known as sensitivity, recall, or probability of detection in machine learning and is calculated using Eq. (23.6). The FPR is also known as the fall-out or probability of false alarm and is calculated using Eq. (23.7).

$$\text{TPR} = \frac{TP}{TP + FN} \tag{23.6}$$

$$\text{FPR} = \frac{FP}{FP + TN} \tag{23.7}$$
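The following sketch shows why MCC and balanced accuracy are preferred over accuracy on imbalanced data, and how an AUC value follows from Eqs. (23.6) and (23.7). The counts and scores are hypothetical. Returning 0 for MCC when the denominator is zero is a common convention assumed here, and the trapezoidal rule is one standard way to integrate the ROC curve.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, Eq. (23.4); 0.0 by convention if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity, Eq. (23.5)."""
    return (tp / (tp + fn) + tn / (tn + fp)) / 2

def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold through each distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))     # Eqs. (23.7) and (23.6)
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

# A classifier that calls every compound negative on a 5:95 imbalanced set
tp, tn, fp, fn = 0, 95, 0, 5
plain_accuracy = (tp + tn) / (tp + tn + fp + fn)    # looks good: 0.95
imbalanced_mcc = mcc(tp, tn, fp, fn)                # 0.0, no better than chance
imbalanced_ba = balanced_accuracy(tp, tn, fp, fn)   # 0.5, no better than chance

# Hypothetical predicted scores for six compounds (1 = positive)
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_points(y_true, scores))              # 8 of 9 positive/negative pairs ranked correctly
```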

23.2.3 Machine Learning Algorithms

Machine learning algorithms are mathematical and logical procedures that discover patterns in data, which are then used to make predictions or categorize information. They adjust themselves to perform better as they are exposed to more data, and their parameters are derived from the training data. Here, we briefly introduce some machine learning algorithms that have been widely used in predicting hepatotoxicity.

23.2.3.1 k-Nearest Neighbors (k-NN)

The k-NN algorithm is a non-parametric statistical method for classification and regression. In k-NN, an instance is classified by a plurality vote of its neighbors, i.e., the prediction for a test instance is the most prevalent class among its k nearest neighbors in the training samples. The parameter k is usually predetermined or can be learned from the training data. If k = 1, then the instance is assigned to the class of the single nearest neighbor. Euclidean distance is most often used to judge the nearest neighbors. Since the k-NN algorithm relies on distance metrics, normalization of the training data is recommended and can improve the classification accuracy if the features come in vastly different scales or represent different physical units.

Rodgers et al. (2010) used k-NN QSAR modeling for liver-related adverse effects of drugs. An FDA spontaneous reporting database including approximately 500 approved drugs was used to study human liver adverse effects, mostly defined by elevations of serum liver enzymes. Two hundred compounds with balanced active/inactive ratios were selected for modeling and randomly divided into training and validation sets. The k-NN method was used to develop QSAR models, which reported high sensitivity (> 73%) and specificity (> 94%) for the prediction of liver adverse effects in external validation sets.
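The effect of normalization on k-NN can be seen in a minimal sketch. The six training compounds, their two descriptors (a logP-like value and a molecular-weight-like value), and the labels are invented for illustration, and min-max scaling is only one of several possible normalization schemes.

```python
import math
from collections import Counter

def min_max_scale(rows):
    """Scale each feature to [0, 1] using the ranges in `rows`; also return the scaler."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    def scale(row):
        return [(v - lo) / sp for v, lo, sp in zip(row, lows, spans)]
    return [scale(r) for r in rows], scale

def knn_predict(train_x, train_y, query, k=3):
    """Plurality vote among the k training samples closest to `query` (Euclidean distance)."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

# Hypothetical training data: (logP-like value, molecular-weight-like value), label 1 = toxic
train_x = [(1.0, 350), (1.2, 360), (1.1, 340), (4.0, 200), (4.2, 500), (4.1, 420)]
train_y = [0, 0, 0, 1, 1, 1]
query = (3.9, 352)

# Without scaling, the second feature dominates the distance
raw_prediction = knn_predict(train_x, train_y, query)

# With scaling, both features contribute comparably
scaled_x, scale = min_max_scale(train_x)
scaled_prediction = knn_predict(scaled_x, train_y, scale(query))
```

In this toy case the unscaled model follows the large-valued feature and predicts class 0, while the scaled model predicts class 1, the class that shares the query's first descriptor.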

23.2.3.2 Random Forest and Decision Trees

Random forest is a tree-based ensemble learning method for generating predictions. It comprises a panel or forest of decision trees, which are constructed using the recursive partitioning algorithm. Each tree is trained on a random sample of the training data, and a subset of randomly selected descriptors is used for growing the tree. After the "forest" of decision trees is created, a test instance is predicted by averaging the results across all decision trees in the forest. The parameters, such as the number of trees and features, are optimized in a grid search by using cross-validation. Random forest models have multiple advantages compared to the conventional decision tree, such as embedded feature selection. They also are less prone to overfitting and are better at handling unbalanced datasets.

A machine learning model for predicting mouse liver toxicity was developed using recursive random forest together with chemical and in vitro bioactivity data (Zhu et al. 2016). Liver toxicity in mouse was defined by a composite of four in vivo toxicity endpoints (i.e., non-proliferative, neoplastic, proliferative, and gross pathology). The input data included 293 biological in vitro descriptors from ToxCast high-throughput screening assays and three sets of chemical descriptors, i.e., 137 CDK descriptors, 848 Dragon descriptors, and 376 Mold2 descriptors. The dataset consisted of 233 chemicals, 67 toxic and 167 non-toxic, making it an unbalanced dataset. A random forest algorithm was used to develop the model, and the prediction was based on the fraction of positive predictions among the 500 trees. Two approaches, i.e., multiple under-sampling and a shifted classification threshold, were used to deal with the unbalanced dataset. Correct classification rate (CCR) from fivefold cross-validation was used to evaluate the performance of the models. The best random forest model, derived from the hybrid in vitro and chemical descriptors, generated 74% CCR, 82% sensitivity, and 66% specificity. In comparison, the models built from chemical descriptors only and from in vitro descriptors only had CCRs of 73% and 66%, respectively. Chemical descriptors related to logP and molecular weight and in vitro assays related to CYP2A2 activity were found to be vital for model performance. The study showed that the random forest model is efficient in variable selection and development of predictive in silico models.

Low et al. (2011) used chemical descriptors and toxicogenomic data to develop hepatotoxicity prediction models. CCR from fivefold cross-validation was used to evaluate the performance of the models built by four machine learning approaches, i.e., k-nearest neighbors, SVM, random forest, and distance-weighted discrimination. The models based on chemical descriptors and toxicogenomic data achieved 61% and 76% CCR, respectively. Moreover, the random forest models were superior or equivalent to the models built by k-NN, SVM, and distance-weighted discrimination. The findings demonstrated the utility of toxicogenomic data for hepatotoxicity prediction and the robust predictive performance of models developed with the random forest algorithm. Kim and Nam (2017) trained random forest and SVM DILI models with weighted fingerprints, which produced accuracies of approximately 60%.
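The two imbalance-handling ideas mentioned above, repeated under-sampling of the majority class and a shifted threshold on the fraction of positive tree votes, can be sketched as follows. This is not the procedure of Zhu et al. (2016): the function names, the number of subsets, the vote counts, and the threshold of 0.3 are assumptions for illustration. Only the class sizes (67 and 167) and the forest size (500 trees) are taken from the text.

```python
import random

def undersample(samples, seed=0):
    """Keep all minority-class samples and draw an equal number from the majority class."""
    rng = random.Random(seed)
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + rng.sample(majority, len(minority))

def multiple_undersampling(samples, n_subsets=5, seed=0):
    """Build several balanced subsets, each with a different random draw of the majority class."""
    return [undersample(samples, seed + i) for i in range(n_subsets)]

def classify_from_votes(tree_votes, threshold=0.5):
    """Predict positive when the fraction of positive tree votes reaches the threshold."""
    fraction = sum(tree_votes) / len(tree_votes)
    return (1 if fraction >= threshold else 0), fraction

# Placeholder samples: (compound id, label) with 67 toxic and 167 non-toxic compounds
samples = [(f"tox{i}", 1) for i in range(67)] + [(f"non{i}", 0) for i in range(167)]
subsets = multiple_undersampling(samples)

# Hypothetical forest output: 200 of 500 trees vote "toxic" for one compound
votes = [1] * 200 + [0] * 300
default_call, fraction = classify_from_votes(votes)               # threshold 0.5
shifted_call, _ = classify_from_votes(votes, threshold=0.3)       # threshold lowered toward the minority class
```

Lowering the threshold trades specificity for sensitivity, so in practice the shifted value would be chosen by cross-validation rather than fixed in advance.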

23.2.3.3 SVM

SVM is one of the most popular supervised machine learning algorithms. In the context of toxicity prediction, the classification problem is to find a hyperplane in a mapped space that separates active molecules from inactive molecules in the training set. Developing an SVM model means optimizing the algorithmic parameters to find the hyperplane that maximizes the margin between the training samples of the different classes; the samples that lie on the margin are called "support vectors", as illustrated in Fig. 23.2. It is hypothesized that the larger the margin between classes, the higher the probability that the model will correctly classify new molecules it was not exposed to during training. If the classes cannot be separated by a linear hyperplane in the original descriptor space, the SVM algorithm uses kernel functions, such as polynomial, sigmoid, and radial basis functions, to transform the original space into a higher dimensional kernel space. Therefore, SVM methods are versatile and can handle both linear and nonlinear classification problems.

Fig. 23.2 SVM is trained with samples from two classes to select the maximum-margin hyperplane. Samples on the margin are called support vectors. Hyperplane H1 does not separate the classes. Hyperplane H2 does, but only with a small margin. Hyperplane H3 separates them with the maximal margin

Zhang et al. (2016a, b) developed in silico prediction models for DILI. The dataset had 1317 diverse molecules collected from publications, and QSAR models were built using five machine learning algorithms with the MACCS and FP4 chemical fingerprints after evaluation by a substructure pattern recognition method. The SVM model produced the highest accuracies of 79.5% and 64.5% for the training and validation sets, respectively. Furthermore, the model generated 75.0% accuracy for the external test set of 88 compounds collected from the FDA's Liver Toxicity Knowledge Base (Chen et al. 2013a, b). Additionally, some key substructure patterns which correlated well with DILI were identified as structural alerts.

Fourches et al. (2010) used the SVM algorithm to develop a QSAR model for predicting DILI. The dataset contained 531 chemicals, 248 with liver effects in humans and 283 non-DILI compounds. The authors used the SVM algorithm with two types of chemical descriptors: 2D fragments and Dragon molecular descriptors. Five-fold cross-validation was used to evaluate model performance. Accuracies reported by the models with fragment descriptors and Dragon descriptors were 64.2–72.6% and 55.7–70.8%, respectively. The model was tested on 18 compounds reported as liver toxicants for animals, but not humans, and 14 compounds were correctly predicted, corresponding to an accuracy of 77.8%.
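The margin idea behind SVM can be made concrete with a few lines of plain Python. The sketch does not train an SVM; it only evaluates three hand-picked hyperplanes (named after H1, H2, and H3 in Fig. 23.2) on made-up two-dimensional points, and shows a radial basis function kernel. All data, weights, and function names are assumptions.

```python
import math

def decision_value(w, b, x):
    """Signed score w·x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def separates(w, b, data):
    """True if every sample lies strictly on the side matching its label (+1 or -1)."""
    return all(label * decision_value(w, b, x) > 0 for x, label in data)

def geometric_margin(w, b, data):
    """Smallest distance from any sample to the hyperplane (negative if misclassified)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(label * decision_value(w, b, x) / norm for x, label in data)

def rbf_kernel(x, z, gamma=1.0):
    """Radial basis function kernel, one common choice for nonlinear SVMs."""
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(x, z)))

# Hypothetical two-descriptor samples: +1 = active, -1 = inactive
data = [((3, 3), 1), ((4, 3), 1), ((3, 4), 1),
        ((1, 1), -1), ((0, 1), -1), ((1, 0), -1)]

h1 = ((1, -1), 0.0)    # x - y = 0: does not separate the classes
h2 = ((1, 0), -2.5)    # x = 2.5: separates, but with a small margin
h3 = ((1, 1), -4.0)    # x + y = 4: separates with a larger margin

margin_h2 = geometric_margin(*h2, data)   # 0.5
margin_h3 = geometric_margin(*h3, data)   # sqrt(2)
```

For H3 the closest samples, (3, 3) and (1, 1), sit at equal distance on either side of the hyperplane, which is the role the support vectors play in a trained SVM.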

23.2.3.4 Neural Networks and Deep Learning

The artificial neural network (ANN) was inspired by information processing and distributed communication nodes in biological systems. An ANN algorithm is based on a collection of connected units called artificial neurons combined as layers, with consecutive layers connected by weights. When trained for toxicity prediction, this parallel computational structure (i.e., neural network) maps the molecular/bioactivity descriptors received in the first/input layer to the toxicity endpoints in the last/output layer. Usually, there is also an intermediate set of hidden layers, each of which receives the data modified by the previous layer.

Deep neural networks, a deep learning architecture, refer to a family of neural networks with multiple hidden layers. They have been popular in the past few years, with demonstrated success in multiple fields. Several variants of the deep learning algorithm have been proposed for different tasks, while feedforward neural networks remain the most popular algorithm. Convolutional neural networks work better for data presented in multiple arrays and have been commonly applied to analyzing visual imagery. Recurrent neural networks have an internal state (memory) to process variable-length sequences of inputs and are successful at tasks such as speech, language, and connected handwriting recognition. Despite its success, however, deep learning presents two common challenges: overfitting and computation time (LeCun et al. 2015).

Xu et al. (2015) developed the first deep learning models for predicting DILI. In this study, the deep neural network model was evaluated using the training and testing drugs, with the DILI dataset annotated using FDA-approved labeling documents (Chen et al. 2011). An undirected graph recursive neural network method was used for molecular structure encoding to construct DILI prediction models. The tenfold cross-validation on the training set produced 80.5% accuracy, 70.3% sensitivity, and 88.2% specificity. Deep learning models were also applied to the test set and generated 70.3% accuracy, slightly better than the 68.9% accuracy from the previously reported decision forest QSAR model. Furthermore, in comparison with the decision forest model, the deep learning model achieved slightly better accuracy (64.7% vs. 61.6%) in the second test set but did not outperform it in the third dataset (61.9% vs. 63.1%). Therefore, the conclusion that the deep neural network model is superior to the decision forest model should be cautiously applied.

A deep graph learning model with property augmentation was developed for predicting DILI (Ma et al. 2021). The authors used a graph neural network to learn a molecular vector representation based only on the graph structure and the underlying atom/bond level features. However, this deep learning strategy required a large amount of input data to generate a more accurate molecular representation. A larger training dataset, combining more drugs with other toxic properties that measure the organism-level toxicity of compounds, such as phospholipidosis, was created and used to train the deep learning model. Results suggested that the proposed method significantly outperforms all existing models on the DILI dataset by obtaining an 81.4% accuracy using leave-one-out cross-validation with random splitting. Li et al. (2020) developed a deep learning model using Mold2 descriptors, which outperformed 5 conventional machine learning algorithms with a Matthews correlation coefficient value of 0.331.
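The layered structure of a feedforward network described above can be sketched as a forward pass in plain Python. The network size, the sigmoid activation, and all weights are arbitrary assumptions for illustration; none of the cited models is reproduced, and training (weight updates by backpropagation) is omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases):
    """One fully connected layer: each neuron applies sigmoid(w·inputs + b)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(descriptors, layers):
    """Pass descriptor values through the layers in order; return the output activations."""
    activations = descriptors
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

# Hypothetical network: 3 input descriptors -> 2 hidden neurons -> 1 output neuron
layers = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1]),   # hidden layer weights, biases
    ([[1.2, -0.7]], [0.05]),                               # output layer weights, bias
]
descriptors = [0.9, 0.1, 0.4]                # scaled descriptor values for one compound
probability = forward(descriptors, layers)[0]   # value in (0, 1), read as a toxicity score

# With all weights and biases at zero, every neuron outputs sigmoid(0) = 0.5
zero_layers = [([[0.0] * 3, [0.0] * 3], [0.0, 0.0]), ([[0.0, 0.0]], [0.0])]
baseline = forward(descriptors, zero_layers)[0]
```

Adding hidden layers to `layers` turns the same function into a deeper network, which is the sense in which deep neural networks extend the basic ANN.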

23.2.3.5 Ensemble Learning

Ensemble learning uses voting, averaging, or other ensemble techniques to combine results from the constituent learning algorithms to achieve better predictive performance. Ensemble methods reportedly tend to produce better results when the constituent models are significantly diverse. They have been widely used in many fields and have shown promising performance in toxicity prediction. Ai et al. (2018) worked on a dataset of 1241 compounds and generated thirty-six base classifiers, built using three machine learning algorithms (i.e., random forest, XGBoost, and SVM) and 12 molecular fingerprints (including CDK, PubChem, and FP4), to create an ensemble model. The ensemble models were evaluated by 100 repeats of fivefold cross-validation, resulting in an average accuracy of 71.1 ± 2.6%, sensitivity of 79.9 ± 3.6%, specificity of 60.3 ± 4.8%, and AUC of 0.764 ± 0.026. Testing the ensemble model on an external validation dataset of 286 compounds collected from the Liver Toxicity Knowledge Base (Chen et al. 2011) resulted in an accuracy of 84.3%, sensitivity of 86.9%, specificity of 75.4%, and AUC of 0.904. Compared with the hepatotoxicity prediction models reported in the literature, the ensemble model achieved a relatively higher accuracy. Ancuceanu et al. (2020) developed 267 models using various machine learning algorithms and chemical descriptors of 694 drugs from the DILIrank dataset (Chen et al. 2016), and selected 79 of these models for the ensemble. The ensemble model achieved a balanced accuracy of 74.6%, sensitivity of 76.0%, and specificity of 73.2%. He et al. (2019) used 1254 compounds with DILI data and produced an ensemble method using Marvin descriptors. The authors employed eight machine learning algorithms, including k-nearest neighbors, Naive Bayes, Kstar, Bagging, J48, AdaBoostM1, Dl4j, and random forest, to build the ensemble model, which generated 78.3% accuracy, 81.8% sensitivity, and 74.8% specificity. Furthermore, the model was externally validated on the Zhang et al. (2016a, b), Ai et al. (2018), and Kotsampasakou et al. (2017) datasets, and the results showed a balanced accuracy of 71.6%, sensitivity of 77.3%, and specificity of 65.8%; this outperformed the originally reported DILI models using conventional machine learning algorithms (He et al. 2019).
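The two combination rules named above, voting and averaging, can be sketched in a few lines. This is a minimal illustration only, not the procedure of any of the cited studies: the classifier names, the toy predictions, and the 0.5 threshold are assumptions made for the example.

```python
from collections import Counter


def majority_vote(votes):
    """Return the class label predicted by most base classifiers."""
    # With an odd number of binary voters there are no ties; otherwise
    # Counter breaks ties by first-seen order, which is an arbitrary choice.
    return Counter(votes).most_common(1)[0][0]


def average_probability(probabilities, threshold=0.5):
    """Average the predicted probabilities, then apply a decision threshold."""
    mean_p = sum(probabilities) / len(probabilities)
    return int(mean_p >= threshold), mean_p


# Hypothetical labels (1 = hepatotoxic) from three base classifiers
# for four compounds; the values are made up for illustration.
base_predictions = {
    "random_forest": [1, 0, 1, 0],
    "xgboost": [1, 1, 0, 0],
    "svm": [0, 1, 1, 0],
}

# zip(*...) regroups the predictions compound by compound.
ensemble_labels = [
    majority_vote(votes) for votes in zip(*base_predictions.values())
]
print(ensemble_labels)
```

Voting only needs class labels, whereas averaging needs calibrated probabilities from every base model, which is one reason the two rules can disagree on borderline compounds.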

23 Computational Modeling for the Prediction of Hepatotoxicity Caused …

23.3 A Case Study: Machine Learning Modeling for Hepatotoxicity Prediction

Animal-based methodologies for assessing the health risks of chemicals are time-consuming, resource-intensive, and impractical for screening all compounds in the environment. This highlights the urgent need to develop more efficient and informative toxicity determination tools to characterize the bioactivity profiles of environmental chemicals. Advances in high-throughput assays have allowed chemicals to be rapidly screened for a specific type of bioactivity at the molecular or cellular level. This approach can help identify compounds that may modulate specific biological pathways, which helps in understanding the possible role of a chemical within a given biological process. In vitro high-throughput screening (HTS) assays, combined with computational models, could provide an alternative to traditional animal testing studies. The Tox21 and EPA ToxCast programs employed hundreds of high-throughput in vitro assays and screened thousands of environmental chemicals. Together with increasing computational power, these bioactivity data provided the opportunity to build predictive models of toxicity (Krewski et al. 2020), and several models built on both chemical and in vitro bioactivity descriptors outperformed the models constructed from chemical or bioactivity descriptors only (Liu et al. 2015, 2017; Xu et al. 2020). Here, we introduce the computational models (Liu et al. 2015) developed for predicting in vivo liver toxicity using chemical descriptors only, in vitro high-throughput assay data only, and a combination of chemical and ToxCast assay descriptors.

23.3.1 Data Sources

Hepatotoxicity data were retrieved from hepatic histopathologic endpoints observed in rat chronic studies selected from ToxRefDB (Martin et al. 2011), an EPA database collecting hundreds of in vivo endpoints observed across different tissues in guideline animal testing studies. The in vivo liver toxicity endpoints we used were assigned to three hepatotoxicity categories (hypertrophy, injury, and proliferative lesions) based on domain terminology (Thoolen et al. 2010). The high-throughput in vitro assay data were obtained from the ToxCast and Tox21 projects and are publicly available. Chemical descriptors were generated from multiple sources, including QikProp software (Schrödinger, version 3.2), Open Babel (O’Boyle et al. 2011), PaDEL (Yap 2011), and PubChem. The chemicals included in both ToxRefDB and ToxCastDB received their binary classification labels for each of the three toxicity endpoints. A set of 677 chemicals was represented by 711 in vitro bioactivity descriptors (from ToxCast assays), 4376 chemical structure descriptors (from QikProp, Open Babel, PaDEL, and PubChem), and three hepatotoxicity categories (from animal studies). After data reduction, 125 ToxCast bioassays and 726 chemical descriptors were used. Combined with three toxicity endpoints (i.e., hypertrophy, injury, and proliferative lesions), a total of nine datasets were obtained for model development (Table 23.2).


Table 23.2 Hepatotoxicity endpoints and descriptor sets for model development (Liu et al. 2015)

| Endpoint | Compounds | Chemical descriptors | Bioactivity descriptors | Chemical + bioactivity descriptors |
|---|---|---|---|---|
| Hypertrophy | 624 | 726 | 125 | 726 + 125 |
| Injury | 564 | 726 | 125 | 726 + 125 |
| Proliferative lesions | 562 | 726 | 125 | 726 + 125 |

23.3.2 Modeling by Machine Learning Approaches

To check the influence of the machine learning algorithm, six models were built for each dataset using different algorithms: linear discriminant analysis (LDA), Naive Bayes, SVM, classification and regression tree (CART), k-NN, and an ensemble of all classifiers (ENSMB). The number of descriptors also has an important impact on predictive performance. Thus, for each machine learning algorithm, multiple models were built using different numbers of the best descriptors (10, 15, 20, 25, 30, 35, 40, 45, 50, 55, and 60), filtered using an in-loop t-test. Performance over 100 iterations of tenfold cross-validation was evaluated using mean balanced accuracy (BA), sensitivity, and specificity. Figure 23.3 illustrates the overall workflow of model generation and evaluation.

Fig. 23.3 Workflow for model generation and evaluation (Liu et al. 2015)
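The descriptor filtering and the evaluation metrics described above can be sketched as follows. We read "in-loop" as meaning that the t-test ranking is recomputed on the training portion of every cross-validation fold, so that held-out compounds never influence which descriptors are kept; that reading, the use of Welch's form of the t-statistic, and all data values are assumptions of this sketch rather than details confirmed by Liu et al. (2015).

```python
import math
from statistics import mean, variance


def t_statistic(pos, neg):
    """Welch's t-statistic between two groups of descriptor values."""
    se = math.sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    return 0.0 if se == 0 else (mean(pos) - mean(neg)) / se


def top_k_descriptors(X, y, k):
    """Rank descriptor columns by |t| using the supplied (training) rows only."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(t_statistic(pos, neg)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def balanced_accuracy(y_true, y_pred):
    """Return (balanced accuracy, sensitivity, specificity).

    Assumes both classes occur in y_true; otherwise a division by zero occurs.
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2, sensitivity, specificity


# Toy training fold: 6 compounds x 3 descriptors (made-up values).
# Only the second descriptor (index 1) clearly separates the classes.
X_train = [
    [0.1, 5.0, 1.0],
    [0.2, 5.2, 0.0],
    [0.3, 4.9, 1.0],
    [0.2, 1.0, 0.0],
    [0.1, 1.2, 1.0],
    [0.3, 0.9, 0.0],
]
y_train = [1, 1, 1, 0, 0, 0]

selected = top_k_descriptors(X_train, y_train, k=2)
print(selected)
print(balanced_accuracy([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a full workflow, `top_k_descriptors` would be called once per training fold and the classifier fitted only on the selected columns; balanced accuracy is the mean of sensitivity and specificity, which makes it less sensitive to class imbalance than plain accuracy.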


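Table 23.3 Balanced accuracy of top three models for hepatotoxicity endpoints (Liu et al. 2015)

| Endpoint | Classifier | Mean balanced accuracy: chemical | Mean balanced accuracy: bioactivity | Mean balanced accuracy: chemical + bioactivity |
|---|---|---|---|---|
| Hypertrophy | CART | 0.82 | 0.79 | 0.84 |
| Hypertrophy | ENSMB | 0.74 | 0.76 | 0.76 |
| Hypertrophy | SVM | 0.76 | 0.76 | 0.77 |
| Injury | CART | 0.81 | 0.75 | 0.80 |
| Injury | SVM | 0.76 | 0.75 | 0.76 |
| Injury | ENSMB | 0.72 | 0.71 | 0.73 |
| Proliferative lesions | CART | 0.79 | 0.75 | 0.80 |
| Proliferative lesions | SVM | 0.73 | 0.75 | 0.76 |
| Proliferative lesions | ENSMB | 0.69 | 0.71 | 0.71 |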

23.3.3 Results

Machine learning modeling for three in vivo liver toxicity endpoints was reported using three descriptor sets: (i) high-throughput bioactivity descriptors only, (ii) chemical descriptors only, and (iii) bioactivity and chemical descriptors in combination. The balanced accuracy of the top three models is listed in Table 23.3. Specifically, the models using chemical descriptors only had a highest balanced accuracy of 82% for hypertrophy, 81% for injury, and 79% for proliferative lesions. The models from in vitro ToxCast assay data only obtained a highest balanced accuracy of 79%, 75%, and 75% for hypertrophy, injury, and proliferative lesions, respectively. The similar predictive performance from chemical descriptors only and biological descriptors only demonstrated the utility of in vitro ToxCast assay data for in vivo hepatotoxicity prediction. For hypertrophy, the models built using the combination of chemical and bioactivity descriptors achieved a higher balanced accuracy (84%) than those built from either descriptor type alone; for injury (80%) and proliferative lesions (80%), the combined models exceeded the bioactivity-only models and were within one percentage point of the chemical-only models. The models constructed using CART, SVM, and ENSMB outperformed the models built using other algorithms. Nuclear receptor activation and mitochondrial functions were frequently identified as highly predictive descriptors for hepatotoxicity. This study demonstrates that machine learning models developed from a hybrid of bioactivity and chemical structure data can predict in vivo hepatotoxicity.

23.4 Summary and Future Direction

Despite many efforts to eliminate hepatotoxic drugs before they are tested in humans, these drugs often escape preclinical toxicity testing and are not identified as hepatotoxic until a later stage of drug development, and sometimes even after approval. Therefore, the development of robust predictive models for evaluating, as early as possible, the potential of drug candidates and chemicals to cause liver injury in humans is critical. In vivo animal testing is still the primary approach for predicting liver toxicity in humans. Because animal studies are expensive and time-consuming, machine learning methods using in silico or high-throughput in vitro assays have become attractive. Notably, the recent development of deep learning technologies has introduced another promising avenue for developing computational models of liver toxicity. With the advances in computer technologies (e.g., the use of graphical processing units (GPUs) and deep neural network algorithms), deep learning has brought about breakthroughs in speech recognition, image classification, drug discovery, and toxicology. In several public scientific competitions, including the ImageNet LSVRC-2010 contest (Krizhevsky et al. 2017) and the Tox21 Data Challenge in 2015 (Mayr et al. 2016), deep learning algorithms demonstrated predictive performance superior to that of conventional machine learning algorithms. However, machine learning approaches, especially deep learning, are data thirsty, requiring large amounts of labeled toxicity data to develop robust predictive models. The research community has invested tremendous effort in curating labeled liver toxicity data, and several databases are publicly available, including the FDA’s Liver Toxicity Knowledge Base (Chen et al. 2013a, b), the DILIrank dataset (Chen et al. 2011, 2016), and NIH’s LiverTox database (Hoofnagle et al. 2013). Meanwhile, the use of toxicity data is growing within the scientific community. For example, the Tox21 program has tested ~8500 chemicals in more than 70 high-throughput assays, generating over 100 million data points (Richard et al. 2021). The EPA’s ToxCast project has also provided high-throughput assays testing thousands of environmental chemicals and drugs (Richard et al. 2016). In the future, high-quality toxicity data will remain the cornerstone for the development of more powerful machine learning models for hepatotoxicity. In conclusion, the advances of machine learning and deep learning, together with high-throughput experimental technologies, are promising for developing models to predict hepatotoxicity and will thus continue to play an increasingly important role in drug development by de-risking liver liabilities caused by drugs and chemicals.

Disclaimer: This chapter reflects the views of the authors and does not necessarily reflect those of the U.S. Food and Drug Administration.

Acknowledgements The authors thank Joanne Berger, FDA Library, for manuscript editing assistance.


References

Ai H, Chen W, Zhang L, Huang L, Yin Z, Hu H, Zhao Q, Zhao J, Liu H (2018) Predicting drug-induced liver injury using ensemble learning methods and molecular fingerprints. Toxicol Sci 165(1):100–107
Ancuceanu R, Hovanet MV, Anghel AI, Furtunescu F, Neagu M, Constantin C, Dinu M (2020) Computational models using multiple machine learning algorithms for predicting drug hepatotoxicity with the DILIrank dataset. Int J Mol Sci 21(6):2114
Chen M, Vijay V, Shi Q, Liu Z, Fang H, Tong W (2011) FDA-approved drug labeling for the study of drug-induced liver injury. Drug Discov Today 16(15–16):697–703
Chen M, Zhang J, Wang Y, Liu Z, Kelly R, Zhou G, Fang H, Borlak J, Tong W (2013a) The liver toxicity knowledge base: a systems approach to a complex end point. Clin Pharmacol Ther 93(5):409–412
Chen M, Hong H, Fang H, Kelly R, Zhou G, Borlak J, Tong W (2013b) Quantitative structure-activity relationship models for predicting drug-induced liver injury based on FDA-approved drug labeling annotation and using a large collection of drugs. Toxicol Sci 136(1):242–249
Chen M, Bisgin H, Tong L, Hong H, Fang H, Borlak J, Tong W (2014) Toward predictive models for drug-induced liver injury in humans: are we there yet? Biomark Med 8(2):201–213
Chen M, Suzuki A, Thakkar S, Yu K, Hu C, Tong W (2016) DILIrank: the largest reference drug list ranked by the risk for developing drug-induced liver injury in humans. Drug Discov Today 21(4):648–653
Cheng A, Dixon SL (2003) In silico models for the prediction of dose-dependent human hepatotoxicity. J Comput Aided Mol Des 17(12):811–823
Chicco D (2017) Ten quick tips for machine learning in computational biology. BioData Min 10:35
Cruz-Monteagudo M, Cordeiro MN, Borges F (2008) Computational chemistry approach for the early detection of drug-induced idiosyncratic liver toxicity. J Comput Chem 29(4):533–549
Ekins S, Williams AJ, Xu JJ (2010) A predictive ligand-based Bayesian model for human drug-induced liver injury. Drug Metab Dispos 38(12):2302–2308
Feng C, Chen H, Yuan X, Sun M, Chu K, Liu H, Rui M (2019) Gene expression data based deep learning model for accurate prediction of drug-induced liver injury in advance. J Chem Inf Model 59(7):3240–3250
Fourches D, Barnes JC, Day NC, Bradley P, Reed JZ, Tropsha A (2010) Cheminformatics analysis of assertions mined from literature that describe drug-induced liver injury in different species. Chem Res Toxicol 23(1):171–183
He S, Ye T, Wang R, Zhang C, Zhang X, Sun G, Sun X (2019) An in silico model for predicting drug-induced hepatotoxicity. Int J Mol Sci 20(8):1897
Hong H, Xie Q, Ge W, Qian F, Fang H, Shi L, Su Z, Perkins R, Tong W (2008) Mold(2), molecular descriptors from 2D structures for chemoinformatics and toxicoinformatics. J Chem Inf Model 48(7):1337–1344
Hong H, Zhu J, Chen M, Gong P, Zhang C, Tong W (2018) Quantitative structure–activity relationship models for predicting risk of drug-induced liver injury in humans. In: Chen M, Will Y (eds) Drug-induced liver toxicity. Methods in pharmacology and toxicology. Humana, New York, pp 77–100
Hoofnagle JH, Serrano J, Knoben JE, Navarro VJ (2013) LiverTox: a website on drug-induced liver injury. Hepatology 57(3):873–874
Judson RS, Magpantay FM, Chickarmane V, Haskell C, Tania N, Taylor J, Xia M, Huang R, Rotroff DM, Filer DL, Houck KA, Martin MT, Sipes N, Richard AM, Mansouri K, Setzer RW, Knudsen TB, Crofton KM, Thomas RS (2015) Integrated model of chemical perturbations of a biological pathway using 18 in vitro high-throughput screening assays for the estrogen receptor. Toxicol Sci 148(1):137–154
Kim E, Nam H (2017) Prediction models for drug-induced hepatotoxicity by using weighted molecular fingerprints. BMC Bioinformatics 18(Suppl 7):227


Kotsampasakou E, Montanari F, Ecker GF (2017) Predicting drug-induced liver injury: the importance of data curation. Toxicology 389:139–145
Krewski D, Andersen ME, Tyshenko MG, Krishnan K, Hartung T, Boekelheide K, Wambaugh JF, Jones D, Whelan M, Thomas R, Yauk C, Barton-Maclaren T, Cote I (2020) Toxicity testing in the 21st century: progress in the past decade and future perspectives. Arch Toxicol 94(1):1–58
Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84–90
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Li T, Tong W, Roberts R, Liu Z, Thakkar S (2020) DeepDILI: deep learning-powered drug-induced liver injury prediction using model-level representation. Chem Res Toxicol 34(2):550–565
Liew CY, Lim YC, Yap CW (2011) Mixed learning algorithms and features ensemble in hepatotoxicity prediction. J Comput Aided Mol Des 25(9):855–871
Liu J, Mansouri K, Judson RS, Martin MT, Hong H, Chen M, Xu X, Thomas RS, Shah I (2015) Predicting hepatotoxicity using ToxCast in vitro bioactivity and chemical structure. Chem Res Toxicol 28(4):738–751
Liu J, Patlewicz G, Williams AJ, Thomas RS, Shah I (2017) Predicting organ toxicity using in vitro bioactivity data and chemical structure. Chem Res Toxicol 30(11):2046–2059
Liu Y, Gao H, He YD (2020) A compound attributes-based predictive model for drug induced liver injury in humans. PLoS ONE 15:e0231252
Low Y, Uehara T, Minowa Y, Yamada H, Ohno Y, Urushidani T, Sedykh A, Muratov E, Kuz’min V, Fourches D, Zhu H, Rusyn I, Tropsha A (2011) Predicting drug-induced hepatotoxicity using QSAR and toxicogenomics approaches. Chem Res Toxicol 24(8):1251–1262
Ma H, An W, Wang Y, Sun H, Huang R, Huang J (2021) Deep graph learning with property augmentation for predicting drug-induced liver injury. Chem Res Toxicol 34(2):495–506
Martin MT, Knudsen TB, Reif DM, Houck KA, Judson RS, Kavlock RJ, Dix DJ (2011) Predictive model of rat reproductive toxicity from ToxCast high throughput screening. Biol Reprod 85(2):327–339
Mayr A, Klambauer G, Unterthiner T, Hochreiter S (2016) DeepTox: toxicity prediction using deep learning. Front Environ Sci 3:80
Muller C, Pekthong D, Alexandre E, Marcou G, Horvath D, Richert L, Varnek A (2015) Prediction of drug induced liver injury using molecular and biological descriptors. Comb Chem High Throughput Screen 18(3):315–322
Mulliner D, Schmidt F, Stolte M, Spirkl HP, Czich A, Amberg A (2016) Computational models for human and animal hepatotoxicity with a global application scope. Chem Res Toxicol 29(5):757–767
O’Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011) Open Babel: an open chemical toolbox. J Cheminform 3:33
Richard AM, Judson RS, Houck KA, Grulke CM, Volarath P, Thillainadarajah I, Yang C, Rathman J, Martin MT, Wambaugh JF, Knudsen TB, Kancherla J, Mansouri K, Patlewicz G, Williams AJ, Little SB, Crofton KM, Thomas RS (2016) ToxCast chemical landscape: paving the road to 21st century toxicology. Chem Res Toxicol 29(8):1225–1251
Richard AM, Huang R, Waidyanatha S, Shinn P, Collins BJ, Thillainadarajah I, Grulke CM, Williams AJ, Lougee RR, Judson RS, Houck KA, Shobair M, Yang C, Rathman JF, Yasgar A, Fitzpatrick SC, Simeonov A, Thomas RS, Crofton KM, Paules RS, Bucher JR, Austin CP, Kavlock RJ, Tice RR (2021) The Tox21 10K compound library: collaborative chemistry advancing toxicology. Chem Res Toxicol 34(2):189–216
Rodgers AD, Zhu H, Fourches D, Rusyn I, Tropsha A (2010) Modeling liver-related adverse effects of drugs using k-nearest neighbor quantitative structure-activity relationship method. Chem Res Toxicol 23(4):724–732
Sipes NS, Martin MT, Reif DM, Kleinstreuer NC, Judson RS, Singh AV, Chandler KJ, Dix DJ, Kavlock RJ, Knudsen TB (2011) Predictive models of prenatal developmental toxicity from ToxCast high-throughput screening data. Toxicol Sci 124(1):109–127


Thoolen B, Maronpot RR, Harada T, Nyska A, Rousseaux C, Nolte T, Malarkey DE, Kaufmann W, Küttler K, Deschl U, Nakae D, Gregson R, Vinlove MP, Brix AE, Singh B, Belpoggi F, Ward JM (2010) Proliferative and nonproliferative lesions of the rat and mouse hepatobiliary system. Toxicol Pathol 38(7 Suppl):5S–81S
Wu L, Liu Z, Auerbach S, Huang R, Chen M, McEuen K, Xu J, Fang H, Tong W (2017) Integrating drug’s mode of action into quantitative structure-activity relationships for improved prediction of drug-induced liver injury. J Chem Inf Model 57(4):1000–1006
Xu Y, Dai Z, Chen F, Gao S, Pei J, Lai L (2015) Deep learning for drug-induced liver injury. J Chem Inf Model 55(10):2085–2093
Xu T, Ngan DK, Ye L, Xia M, Xie HQ, Zhao B, Simeonov A, Huang R (2020) Predictive models for human organ toxicity based on in vitro bioactivity data and chemical structure. Chem Res Toxicol 33(3):731–741
Yap CW (2011) PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem 32(7):1466–1474
Zhang C, Cheng F, Li W, Liu G, Lee PW, Tang Y (2016a) In silico prediction of drug induced liver toxicity using substructure pattern recognition method. Mol Inform 35(3–4):136–144
Zhang H, Ding L, Zou Y, Hu SQ, Huang HG, Kong WB, Zhang J (2016b) Predicting drug-induced liver injury in human with Naïve Bayes classifier approach. J Comput Aided Mol Des 30(10):889–898
Zhu XW, Xin YJ, Chen QH (2016) Chemical and in vitro biological information to predict mouse liver toxicity using recursive random forests. SAR QSAR Environ Res 27(7):559–572

Chapter 24

Artificial Intelligence for Risk Assessment of Cancer Therapy-Related Cardiotoxicity and Precision Cardio-Oncology

Jessica Castrillon Lal and Feixiong Cheng

J. C. Lal · F. Cheng (✉)
Cleveland Clinic, Genomic Medicine Institute, Lerner Research Institute, Cleveland, OH 44195, USA

Department of Molecular Medicine, Cleveland Clinic, Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA

F. Cheng
School of Medicine, Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, OH 44106, USA

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Hong (ed.), Machine Learning and Deep Learning in Computational Toxicology, Computational Methods in Engineering & the Sciences, https://doi.org/10.1007/978-3-031-20730-3_24

24.1 Introduction

The number of cancer survivors has increased dramatically in the past decades. In 2019, 16.9 million cancer survivors were reported, and this number is estimated to increase to more than 22 million by 2030 (Miller et al. 2019). Although treatment for cancer patients has improved in the past few years, the cancer patient population is experiencing other types of health complications (Banke et al. 2016; Sturgeon et al. 2019), such as cancer therapy-related cardiac dysfunction (CTRCD) (Hou et al. 2021b). Cardiovascular disease is the second leading cause of death in cancer survivors (11.3%), following cancer recurrence (38%) (Sturgeon et al. 2019). Various cancer treatments, such as radiation, anthracyclines, targeted therapies, and immunotherapies, have been linked to adverse cardiovascular outcomes (Zamorano et al. 2016), including CTRCD. The goal of the field of cardio-oncology is to enhance survivorship care by understanding the mechanisms of cardiotoxicity for each cancer drug class, defining robust and personalized predictors of cardiotoxicity, and enhancing monitoring and surveillance of cancer patients currently on known cardiotoxic drugs (Brown et al. 2015; Campia et al. 2019; Lenihan et al. 2019; Lenneman and Sawyer 2016). The field is becoming aware that the standard blunt categorizations for defining cardiotoxicity should be improved. As next-generation sequencing and big data science become more integral to risk stratification, machine learning models will become increasingly important.

Broadly speaking, machine learning involves using statistical models to fit data to labeled or unlabeled outcomes. Artificial intelligence (AI) and machine learning (ML) approaches are becoming popular for analyzing large and complex datasets. In cardiology, ML approaches have been utilized to predict cardiovascular complications (Cai et al. 2018b, 2019), cardiac resynchronization therapy response (Feeny et al. 2020), cardiovascular events after acute myocardial infarction (Wang et al. 2018), and claims data-based mortality risk (Krumholz et al. 2019). In the future, these models have the potential to offer precision-based risk assessment of CTRCD events in cardio-oncology clinical practice (Fig. 24.1). This chapter discusses essential elements of designing an AI/ML model for CTRCD assessment during the drug discovery process and post-marketing drug surveillance. We showcase two case studies that applied these methods for CTRCD risk assessment using large-scale electronic health record (EHR) and clinical data. Finally, we discuss existing limitations of these approaches and future directions for AI/ML applications in precision cardio-oncology.

Fig. 24.1 A diagram illustrating a machine learning (ML) framework for cardiac risk assessment of anticancer therapy during drug development and post-market surveillance

24.2 Methods and Materials

24.2.1 Data Resources

The patient population used for analysis should represent oncology patients who are seen by an oncology specialist and referred for cardiology evaluation. Several factors should be considered when designing the experiment around the available data, including cancer subtype, type of cardiotoxic therapy, polypharmacy, disease stage, age, and race. CTRCD events are often rare, which can limit studies restricted to a single cancer subtype, a single therapy, or a single outcome. There are multiple databases (Table 24.1), including the FDA Adverse Events Reporting System, VigiBase, and the IBM MarketScan Research Database.

24.2.2 Molecular Feature and Vector Generation

It is vital to choose clinically relevant variables that are significantly associated with CTRCD and that contribute to the high performance of ML models. To do this, the weights of the logistic regression model for each outcome and variable pair can be evaluated. Logistic regression (LR) applies a weight to each feature, and the prediction is based on the sum of the products of the weight and feature pairs. Clinically relevant variables can be identified using two criteria: (1) the absolute coefficient of variation (the ratio of the standard deviation (SD) to the mean) of the weight is low, ensuring that the weight fluctuates little across the 100 repeats; and (2) the absolute weight relative to the extremum weight for that outcome (the relative weight) is high. Additionally, it is advisable to test the hazard ratios (with 95% confidence intervals (CIs)) of the clinically relevant variables for the model outcome to verify their utility. The Wald χ² test can be used to evaluate which variables have statistically significant coefficients. The R packages survival (v2.44-1.1) and survminer (v0.4.6), run on R 3.6.1, can be used to test hazard ratios on outcomes.
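The two selection criteria can be made concrete with a small sketch. It assumes that the logistic regression weights from the repeated runs have already been collected per variable; the variable names, the weight values, and both cutoffs (`max_cv`, `min_relative_weight`) are hypothetical, since the text does not state numeric thresholds.

```python
from statistics import mean, stdev


def weight_stability(weights_by_variable):
    """Summarize repeated LR weights: absolute CV and relative weight."""
    mean_weight = {v: mean(w) for v, w in weights_by_variable.items()}
    # Extremum weight for this outcome: the largest absolute mean weight.
    extremum = max(abs(m) for m in mean_weight.values())
    summary = {}
    for variable, weights in weights_by_variable.items():
        m = mean_weight[variable]
        abs_cv = abs(stdev(weights) / m) if m != 0 else float("inf")
        summary[variable] = {
            "abs_cv": abs_cv,
            "relative_weight": abs(m) / extremum,
        }
    return summary


def clinically_relevant(summary, max_cv=0.25, min_relative_weight=0.4):
    """Keep variables with a stable (low CV) and large (high relative) weight."""
    return sorted(
        v
        for v, s in summary.items()
        if s["abs_cv"] <= max_cv and s["relative_weight"] >= min_relative_weight
    )


# Hypothetical weights from three repeats (the chapter describes 100).
weights = {
    "prior_heart_failure": [1.0, 1.1, 0.9],
    "age": [0.5, 0.6, 0.4],
    "noise_variable": [0.3, -0.3, 0.1],
}
summary = weight_stability(weights)
print(clinically_relevant(summary))
```

A variable whose weight changes sign between repeats, like `noise_variable` here, has a mean close to zero and therefore a very large coefficient of variation, so it is filtered out by the first criterion.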

24.2.3 Defining Biological Endpoints and Clinical Outcomes

Cancer therapy-related cardiac dysfunction (CTRCD) can range from subacute to long-term adverse cardiovascular effects, including heart failure, arrhythmia, coronary artery disease, stroke, or a combination of these. The choice of primary and secondary outcomes for the model depends on the question being addressed and the data available. Many of the clinical trials to date use changes in left ventricular ejection fraction, or time to disease, as their study endpoint. It is important to highlight that cardio-oncology is a new field and is limited to observational cohort studies with mixed patient populations and study endpoints. Heterogeneity in these studies can therefore affect model performance and sensitivity. For this reason, several studies that have utilized machine learning and statistical methods in cardio-oncology have focused on a binary qualitative variable as the model outcome, for example, subacute cardiovascular outcomes defined by ICD 9/10 codes, hazard of all de novo CTRCD, or all-cause mortality. As the field advances, model outcomes may change to describe events or biomarkers specific to CTRCD.


Table 24.1 Available data resources

| Database | Type | Description | Website |
|---|---|---|---|
| DrugBank | Drug targets | 8250 drug entries including 2016 FDA-approved small molecule drugs and over 6000 experimental drugs | http://www.drugbank.ca/ |
| BindingDB | Drug targets | 565,136 compounds against 6612 proteins and 1,279,670 binding affinity data | http://www.bindingdb.org |
| ChEMBL | Drug targets | 2,036,512 compounds against 11,224 targets and drug-like molecules | https://www.ebi.ac.uk/chembldb |
| FDA Adverse Events Reporting System | Clinical data | 23 million reports since 1969 on adverse drug reaction and patient outcomes | https://www.fda.gov/drugs/questions-and-answers-fdas-adverse-event-reporting-system-faers/fda-adverse-event-reporting-system-faers-public-dashboard |
| VigiBase | Clinical data | 30 million reports since 2015 from the World Health Organization (WHO) global database of reported potential side effects of medicinal products | http://www.vigiaccess.org/ |
| IBM MarketScan Research Database | Clinical data | 273 million commercial claims and encounters from IBM Watson Health®; contains de-identified, patient-specific health data of reimbursed healthcare claims for employees, retirees, and their dependents of over 250 medium and large employers and health plans | https://www.ibm.com/products/marketscan-research-databases |
| Million Veterans Program | Clinical data | 825,000 Veteran reports since 2011 on genetics and health | https://www.research.va.gov/mvp/ |
| Geisinger MyCode Database | Clinical data | More than 90,000 participant reports, including molecular data, genotype, and EHR-derived phenotype | https://www.geisinger.org/precision-health/mycode |
| Kaiser Permanente Research Bank | Clinical data | Over 12.5 million well-characterized population reports, with more than 12 years of electronic medical records | https://researchbank.kaiserpermanente.org/our-research/for-researchers/ |

24.2.4 AI/ML Algorithm and Model Selection

Supervised Versus Unsupervised Learning

Most classical machine learning methods fall into supervised or unsupervised learning. Supervised learning corresponds to training a model on a dataset to learn the relationship between features and targets that have been pre-assigned by experimental labels or by humans. This is the most popular use of ML methods to date and is often applied to image classification, language translation, or complex data integration for group stratification. Unsupervised learning models group unlabeled data based on “learned” patterns that the algorithm identifies. Some common applications of unsupervised learning are dimensionality reduction and clustering. Before choosing a model, it is important to test several statistical models and identify which provides the best performance for the dataset. The type of variables used as features in the model will also determine which model fits best. Starting with the simplest model is often the first step.

Regression Versus Classification Models

Whether to use a regression or a classification model is determined by the variables involved: qualitative, quantitative, or both. Linear regression is used to determine outcomes of quantitative variables such as age, BMI, and annual income. Classification models are used to determine outcomes of qualitative variables (sex, race, tumor type, etc.) or a mix (Fig. 24.2).

Fig. 24.2 A diagram illustrating theory frameworks of several machine learning approaches: a regression; b classification; and c decision tree-based approaches

Simple Linear Regression

Simple linear regression is an example of a supervised model that is best used to relate a single quantitative predictor to a single outcome. Biological outcomes are complex, and therefore, this method is not often applied.

$$Y = \beta_0 + \beta_1 X$$

Multiple Linear Regression

For biological applications, we often have more than one variable contributing to an outcome. For example, several patient demographic factors affect the risk of stroke. To accommodate more than one predictor, multiple linear regression is used. Here, each variable or predictor is given its own slope coefficient or weight.

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$$
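As a worked illustration of the two linear models, the sketch below estimates the intercept and slope of a simple linear regression by ordinary least squares and evaluates a multiple linear regression for given coefficients. The least-squares estimator is a standard choice that we assume here (the text does not name a fitting method), and the data and coefficient values are made up.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Ordinary least squares estimates (beta_0, beta_1) for Y = b0 + b1 * X."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_1 = sxy / sxx
    beta_0 = y_bar - beta_1 * x_bar
    return beta_0, beta_1


def predict_multiple(features, beta_0, betas):
    """Evaluate Y = b0 + b1 * X1 + ... + bp * Xp for one observation."""
    return beta_0 + sum(b * x for b, x in zip(betas, features))


# Toy data lying exactly on the line Y = 1 + 2X.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
beta_0, beta_1 = fit_simple_linear_regression(x, y)
print(beta_0, beta_1)

# Hypothetical coefficients for two predictors.
print(predict_multiple([2.0, 3.0], beta_0=1.0, betas=[0.5, -1.0]))
```

Each coefficient in the multiple model is the expected change in the outcome for a one-unit change in its predictor with the other predictors held fixed, which is what "its own slope coefficient or weight" refers to.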

Multiple Logistic Regression

Multiple logistic regression estimates the probability of an outcome (disease/no disease, alive/dead, pass/fail, etc.) based on a set of quantitative and qualitative variables.

$$\log\left[\frac{Y}{1 - Y}\right] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$$
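Reading Y in the equation above as the probability of the outcome, the log-odds on the left-hand side can be inverted with the logistic (sigmoid) function to obtain that probability. The following sketch shows the round trip; the coefficient values and feature meanings are hypothetical.

```python
import math


def log_odds(features, beta_0, betas):
    """Linear predictor: b0 + b1 * X1 + ... + bp * Xp."""
    return beta_0 + sum(b * x for b, x in zip(betas, features))


def predict_probability(features, beta_0, betas):
    """Invert log(Y / (1 - Y)) = linear predictor to get the probability Y."""
    return 1.0 / (1.0 + math.exp(-log_odds(features, beta_0, betas)))


# Hypothetical model with one quantitative predictor (age in decades)
# and one binary qualitative predictor (prior anthracycline exposure).
beta_0, betas = -3.0, [0.4, 1.2]
patient = [6.5, 1]

p = predict_probability(patient, beta_0, betas)
print(round(p, 3))

# Applying the logit to the probability recovers the linear predictor.
print(math.log(p / (1 - p)), log_odds(patient, beta_0, betas))
```

Qualitative variables enter the model through indicator (0/1) coding, as with the exposure flag above, which is how a single equation can mix quantitative and qualitative predictors.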


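K-Nearest Neighbor (k-NN) Approach

The k-NN approach is a non-parametric supervised ML approach used for classification. An object is assigned to a group based on its proximity to its neighbors:

$$\Pr(Y = j \mid X = x_0) = \frac{1}{K} \sum_{i \in N_0} I(y_i = j)$$

where $N_0$ is the set of the K training observations closest to $x_0$, and $I(y_i = j)$ equals 1 if observation i belongs to class j and 0 otherwise.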
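The k-NN probability estimate can be computed directly from its definition. The sketch below assumes Euclidean distance as the proximity measure (the text does not specify one) and uses made-up two-dimensional training points.

```python
import math
from collections import Counter


def knn_class_probabilities(x0, X_train, y_train, k):
    """Estimate Pr(Y = j | X = x0) as the fraction of the k nearest
    training points (Euclidean distance) that carry label j."""
    by_distance = sorted(
        (math.dist(x0, x), label) for x, label in zip(X_train, y_train)
    )
    neighbor_labels = [label for _, label in by_distance[:k]]
    counts = Counter(neighbor_labels)
    return {label: counts[label] / k for label in set(y_train)}


# Toy training set: class 0 near the origin, class 1 near (5, 5).
X_train = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5)]
y_train = [0, 0, 0, 1, 1]

print(knn_class_probabilities((0.5, 0.5), X_train, y_train, k=3))
print(knn_class_probabilities((4, 4), X_train, y_train, k=3))
```

The predicted class is the label with the highest estimated probability; because distances depend on the scale of each variable, features are usually standardized before k-NN is applied.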

K-means Clustering

The K-means approach is an unsupervised method that assigns each object to one of k groups based on its distance to the cluster means (Fig. 24.2). Each cluster centroid is computed by taking the mean vector of the objects in that cluster. This method differs from the k-NN approach in that the algorithm learns the patterns in the dataset without labels and assigns clusters accordingly. It is often used to group demographic information, marketing trends, or sub-groups of patients within a disease phenotype.

Support Vector Machine

Support Vector Machine (SVM) is a supervised classification method that can apply nonlinear boundaries for distinguishing groups by using a kernel function (Fig. 24.2). The data are mapped into a high- or infinite-dimensional space, and the groups are distinguished by the boundary with the greatest separation, or margin, between two classes. This defines class boundaries in multiple dimensions, allowing for more complex discrimination within a dataset.

Random Forest

Random forest models are a supervised classification method, also commonly used in ML, that generates groups based on decision trees (Fig. 24.2). The algorithm classifies the data by iteratively sub-sampling the dataset to generate the decision trees and aggregating their predictions (by averaging or voting) to improve the prediction accuracy. A larger number of trees generally provides a more robust and accurate model.

Gradient Tree Boosting

Like the random forest model, gradient tree boosting uses decision trees for classifying the data (Fig. 24.2). However, this method builds one tree at a time and fits each new tree to the residual values of the previous step to improve the model. The final model represents the aggregation of all steps and, therefore, can often perform better than the random forest model. However, if the data are noisy, this method can be prone to overfitting, making it difficult to generalize to a new dataset.
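The residual-fitting idea behind gradient tree boosting can be shown with a deliberately small sketch. To keep it short it makes several simplifying assumptions that are not from the text: a regression target with squared-error loss (for which the residuals are the quantity each new tree fits), a single feature, one-split trees ("stumps"), and arbitrary values for `n_trees` and `learning_rate`.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression tree to the residuals (squared error)."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # requires >= 2 distinct x values
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def gradient_boost(x, y, n_trees=20, learning_rate=0.5):
    """Build trees one at a time, each fitted to the current residuals."""
    base = sum(y) / len(y)  # initial prediction: the mean outcome
    predictions = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        tree = fit_stump(x, residuals)
        trees.append(tree)
        predictions = [
            pi + learning_rate * tree(xi) for pi, xi in zip(predictions, x)
        ]
    # The final model aggregates the base value and every tree's contribution.
    return lambda xi: base + learning_rate * sum(tree(xi) for tree in trees)


def mse(model, x, y):
    return sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y)) / len(y)


# Toy data with a step between x = 3 and x = 4 (made-up values).
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

baseline = lambda xi: sum(y) / len(y)
model = gradient_boost(x, y)
print(mse(baseline, x, y), mse(model, x, y))
```

The learning rate shrinks each tree's contribution; smaller values with more trees are a common way to limit the overfitting on noisy data mentioned above, at the cost of longer training.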
Patient-Patient Similarity Matrix
For unbiased patient stratification, a patient-patient similarity network-based risk assessment has been developed (Hou et al. 2021b). This method is used specifically for a heterogeneous cancer patient population with cancer therapy-related cardiac dysfunction (CTRCD). It involves three major steps: (1) data processing, (2) patient similarity calculation, and (3) network construction and visualization. To calculate patient similarity for all pairs of patients, the cosine similarity of patients A and B is calculated as:

$$\mathrm{cosine}_{AB} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$

where n represents the number of variables, and $A_i$ and $B_i$ indicate the ith variable of patients A and B, respectively. A cosine cutoff is used to determine whether two patients should be connected in the network for visualization. To construct the patient-patient similarity network, K-means clustering analysis is used to group patients by their cosine similarity profiles. The sum of squared error (SSE) is used to identify the optimal number of clusters for the dataset:

$$\mathrm{SSE} = \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2$$

where $X_i$ indicates each patient and $\bar{X}$ is the average of the patients within the cluster. As a secondary method to choose the best number of clusters, the adjusted Rand index and adjusted mutual information can be computed. Both scores have an upper bound of 1, which indicates perfect agreement, while values near 0 indicate agreement no better than chance.
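The similarity and SSE calculations described above can be sketched as follows. This is a minimal illustration, not the published pipeline: the function names, the toy patient vectors, and the cutoff value of 0.9 are assumptions, and the K-means step itself is omitted.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length variable vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return dot / (norm_a * norm_b)

def similarity_edges(patients, cutoff):
    """Return (i, j, similarity) for each patient pair at or above the cutoff."""
    edges = []
    for i in range(len(patients)):
        for j in range(i + 1, len(patients)):
            s = cosine_similarity(patients[i], patients[j])
            if s >= cutoff:
                edges.append((i, j, s))
    return edges

def cluster_sse(points):
    """Sum of squared Euclidean distances from each point to the cluster mean."""
    n = len(points)
    centroid = [sum(column) / n for column in zip(*points)]
    return sum(math.dist(p, centroid) ** 2 for p in points)

# Toy patients with three variables each (invented values).
patients = [(1.0, 0.0, 1.0), (2.0, 0.0, 2.0), (0.0, 3.0, 0.0)]
edges = similarity_edges(patients, cutoff=0.9)  # 0.9 is a hypothetical cutoff
print(edges)
```

To compare candidate cluster numbers, the per-cluster SSE values would be summed for each k and plotted against k; the point where the curve flattens is commonly taken as a reasonable choice.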

24.2.5 Evaluating Model Performance
To evaluate model performance, the data should be split into three parts: training, validation, and testing. The model learns from the training data and is evaluated on the validation data. After the model parameters have been tuned, the model is tested once more on the held-out test data. Regression and classification models have distinct methods of testing model performance.

Regression Methods
The performance of a regression model is quantified by how closely its predicted responses match the true responses for the test data, measured by the mean squared error (MSE):

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{f}(x_i)\right)^2$$

where $\hat{f}(x_i)$ is the prediction that $\hat{f}$ gives for the ith observation and $y_i$ is the corresponding true response. The smaller the MSE, the better the model performs.
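As a rough illustration of the three-way split and the MSE defined above, the following sketch uses only the standard library. The 70/15/15 proportions and the fixed random seed are assumptions; the text does not prescribe particular split sizes.

```python
import random

def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle the items and cut them into training, validation and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted responses."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

The test part is touched only once, after tuning on the validation part is finished, so that the reported error is not biased by the tuning process.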


K-fold Cross-Validation
Cross-validation is used to estimate model accuracy by repeatedly splitting the n observations into training and held-out data. The dataset is partitioned into k groups; in each iteration, the model is fit on k − 1 groups and the MSE is calculated on the remaining held-out group. The k MSE values are averaged to give an estimate of the model performance.

AUROC and AUPR
To evaluate classification models, the area under the receiver operating characteristic curve (AUROC) or under the precision-recall curve (AUPR) can be assessed. The ROC curve shows the false-positive rate (FPR) on the x-axis. This metric gives the proportion of the negative class that is incorrectly classified as positive:

$$\mathrm{FPR} = \frac{FP}{FP + TN}$$

where FP represents the number of false positives and TN the number of true negatives. The y-axis shows the true-positive rate (TPR), also known as recall or sensitivity, which gives the proportion of the positive class that is correctly classified:

$$\mathrm{TPR} = \frac{TP}{TP + FN}$$

where TP represents the number of true positives and FN the number of false negatives. The closer the AUROC or AUPR is to 1, the better the model performs. These scores can be computed using the scikit-learn Python package.
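The k-fold partitioning and the ROC construction described above can be sketched without external dependencies. The code below is an illustrative approximation: the fold assignment scheme, the function names, and the toy labels and scores are assumptions. In practice, a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, held_out_idx); each sample is held out exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out

def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples tied at the same score are processed together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))  # (FPR, TPR)
    return points

def auroc(y_true, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(y_true, scores)
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Toy labels (1 = positive) and classifier scores, invented for illustration.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))
```

For k-fold evaluation, a model would be refit on each `train` index set and scored on the matching held-out set, and the k scores would then be averaged.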

24.3 Variable Network Construction
Several advanced machine learning models, such as neural networks and convolutional neural networks, are limited by their black-box nature. More specifically, it is difficult to determine which variables carry the greatest weight in the model. A strength of the patient-patient similarity network is the ability to retrospectively identify a clinical variable network for each patient subgroup. For each cluster, the Pearson correlation coefficient (PCC) is calculated for all pairs of noncategorical variables using their distributions across the patients within that subgroup. To obtain sparse networks, a top-K% strategy can be applied, in which only the K% of connections with the highest PCC are used to construct the network (e.g., 5%, 10%, 15%, or 20%). The value of K can be chosen by balancing network density against a suitable number of significant correlations (P < 0.05). Networks can be visualized using Cytoscape (v3.8.2, http://www.cytoscape.org/) and Gephi (v0.9.2, https://gephi.org/).
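A minimal sketch of the top-K% strategy described above follows. Several choices here are assumptions rather than details from the cited study: pairs are ranked by signed PCC (ranking by absolute value is an equally plausible reading), at least one edge is always kept, the significance filter (P < 0.05) is omitted, and the variable names and values are invented.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("PCC is undefined for a constant variable")
    return cov / (sd_x * sd_y)

def top_k_percent_edges(variables, k_percent):
    """Keep the k_percent% of variable pairs with the highest PCC.

    `variables` maps a variable name to its values across the patients
    of one cluster."""
    scored = [(pearson(variables[a], variables[b]), a, b)
              for a, b in combinations(sorted(variables), 2)]
    scored.sort(reverse=True)
    n_keep = max(1, round(len(scored) * k_percent / 100))
    return [(a, b, r) for r, a, b in scored[:n_keep]]

# Hypothetical variables measured in four patients of one cluster.
cluster = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8.5],
    "c": [4, 3, 2, 1],
    "d": [1, 3, 2, 4],
}
print(top_k_percent_edges(cluster, k_percent=20))
```

The resulting edge list (variable, variable, PCC) can be exported as a table and loaded into a network visualization tool.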


24.4 Case Studies

24.4.1 In Silico Pharmacoepidemiologic Evaluation of Drug-Induced Cardiovascular Complications Using Combined Classifiers
Currently, cardio-oncologists use pharmacovigilance data to identify cancer drugs that lead to adverse cardiovascular outcomes. Although large-scale real-world data are valuable, it can take 5–10 years to establish and validate adverse drug effects. Alternatively, in silico models can be used to predict cardiac adverse effects of drugs with good sensitivity and accuracy (Cai et al. 2018a). In 2018, a research group generated a combined classifier to predict five CV complications (hypertension, arrhythmia, heart block, cardiac failure, and myocardial infarction) using five machine learning algorithms: logistic regression, random forest, k-nearest neighbor, support vector machine, and neural network (NN) (Cai et al. 2018a). Here, drug-induced complications were collected from MetaADEDB (Cai et al. 2018a). Subsequently, physicochemical properties (i.e., molecular descriptors, MD) and established molecular fingerprints (MF) were identified for each drug, and Pearson correlation analysis was used to match MD/MF to CV complications. The numbers of drugs tested were as follows: hypertension (n = 1162), heart block (n = 544), arrhythmia (n = 1450), cardiac failure (n = 630), and myocardial infarction (n = 638). The selected MD/MF were used as features for the ML models for hypertension (n = 36), heart block (n = 36), arrhythmia (n = 26), cardiac failure (n = 26), and myocardial infarction (n = 37). All classification models were validated using fivefold cross-validation and external validation. In general, the combined classifier performed better than the single classifiers for heart block (AUC = 0.842), arrhythmia (AUC = 0.784), myocardial infarction (AUC = 0.790), and cardiac failure (AUC = 0.785). Furthermore, this model was tested against data from human pluripotent stem cell-derived cardiomyocytes and on diverse anticancer drugs.

The combined classifier was able to predict reported CV complications related to several tyrosine kinase inhibitors identified in the hiPSC-CM toxicity assay (Sharma et al. 2017). When 63 anticancer drugs with CV complications according to the Drugs@FDA database were assessed, the combined classifier performed with high accuracy (87%). In addition to identifying known cardiotoxic anticancer agents such as erlotinib, gefitinib, and anthracyclines, it also identified a previously unknown cardiotoxic agent, tandutinib (MLN-518), a PDGFR inhibitor. This case study suggests that combined classifiers have valid post-marketing utility to speed the detection of drug-toxicity relationships.


24.4.2 Machine Learning-Based Risk Assessment for Cancer Therapy-Related Cardiac Dysfunction in 4300 Longitudinal Oncology Patients
Early prevention and surveillance are central themes in the cardio-oncology field. Therefore, optimizing “smart” methods to complement cancer management will be vital for the future of cardio-oncology medicine. In this case study (Zhou et al. 2020), electronic health record data, including laboratory test and cardiovascular echocardiography variables, were collected from 4309 cancer patients between 1997 and 2018. Five ML classification models were used for risk assessment of cancer therapy-related cardiac dysfunction (CTRCD): k-nearest neighbors, logistic regression, support vector machine, random forest, and gradient tree boosting. Five types of cardiovascular outcomes (heart failure [HF], atrial fibrillation [AF], coronary artery disease [CAD], myocardial infarction [MI], and stroke) were extracted using ICD 9/10 codes. CTRCD served as an additional outcome that was manually checked using EPIC. The models incorporated 45 clinical variables covering patient demographics, laboratory tests, and longitudinal echocardiographic parameters. For this dataset, logistic regression, random forest, and gradient tree boosting achieved the best performance. The AUROC values of the logistic regression model for the six outcomes were 0.882 (95% CI, 0.878–0.887) for HF, 0.787 (95% CI, 0.782–0.792) for AF, 0.821 (95% CI, 0.815–0.826) for CAD, 0.807 (95% CI, 0.799–0.816) for MI, 0.660 (95% CI, 0.650–0.670) for stroke, and 0.802 (95% CI, 0.797–0.807) for de novo CTRCD. To assess the generalizability of the model, the patients were split by cancer therapy start date into those who started before January 1, 2017 and those who started afterwards. The AUROC ranged from 0.913 for HF to 0.656 for MI, indicating the generalizability of ML models for CTRCD risk assessment. This case study suggests that ML has the power to provide robust CTRCD risk assessment using large-scale, longitudinal patient data (Zhou et al. 2020).

24.4.3 Cardiac Risk Stratification in Cancer Patients: A Longitudinal Patient-Patient Network Analysis
To better classify similarity in patients with known cardiovascular outcomes, this case study used topology-based K-means clustering to build a patient-patient network in 4632 cancer patients between 1997 and 2019 (Hou et al. 2021a). Five cardiovascular events (atrial fibrillation, coronary artery disease, heart failure, myocardial infarction, or stroke) and overall survival were chosen as outcomes using ICD 9/10 codes following a cancer diagnosis. This study used 112 clinical variables collected from the Care Everywhere Network, which includes 373 institutions across 48 states. The features comprised 43 demographic variables, 24 laboratory testing variables, 7 cardiac variables, and 38 echocardiographic variables.


The patient-patient similarity model described above was used to generate patient clusters with similar clinical phenotypes, which were grouped by K-means clustering of their cosine similarity network profiles. Using this framework, four clinically relevant clusters that were highly correlated with overall survival and de novo CTRCD were identified. Among these, cluster 1 (C1) had the highest cumulative hazard of de novo CTRCD, with an HR of 3.05 (95% CI 2.51–3.72), followed by cluster 3 (C3) and cluster 4 (C4). Cluster 2 (C2) had the lowest cumulative hazard of de novo HF, AF, CAD, and MI (p < 0.001), as well as the best overall survival probability (Hou et al. 2021a).

The inability to extract the features that contribute most to classification is a major limitation of several artificial intelligence models. The strength of this study is the use of network-based methods to perform variable-variable network analysis on each cluster. The authors demonstrated the interactions among individual variables and among the four classes of variables: cardiac (red), echocardiographic (blue), laboratory test results (green), and general demographics (gray). Cardiac variables (troponin T and N-terminal pro-B-type natriuretic peptide [NT-proBNP]) had the strongest connectivity with de novo CTRCD in C1 compared to the other clinical variable networks. Creatinine showed strong connectivity with cardiac variables in C1, and this connection was lost in C4. These observations suggest a complementary clinical role of creatinine in the risk assessment of CTRCD. Overall, this case study underscores the utility of unbiased ML models for CTRCD risk classification and feature extraction (Hou et al. 2021a). These methodologies can have a great impact on cardiotoxicity surveillance in cancer patients by leveraging large-scale data and machine learning models.

24.5 Future Directions and Conclusion
This chapter has described essential steps for designing machine learning and network-based models for cardio-oncology studies. Following careful evaluation, these methodologies can be generalized to risk assessment of drug-disease interactions outside of cardio-oncology. Here, we have highlighted two prominent case studies that have harnessed large-scale longitudinal data for cardiotoxicity risk assessment. Cardio-oncology is a multidisciplinary field, which requires communication between biologists, physicians, technology developers, data scientists, patient advocates, and stakeholders. Concerted efforts for bio-sample collection, analysis, and application were laid out at the 2019 Global Cardio-Oncology Summit and are now being recognized by other societies such as the American Heart Association, the American Association for Cancer Research, and the European Society of Cardiology. To date, there are over 50 clinical trials evaluating biomarkers in cardio-oncology (Zaha et al. 2021). Several high-quality papers have used risk-assessment models for monitoring LV dysfunction after chemotherapy and HER2 inhibitor therapy using classical cardiac biomarkers such as NT-proBNP and troponin T (Chaix et al. 2020; Michel et al. 2020). However, multi-omics and machine learning models will be instrumental in designing rigorous and robust approaches for biomarker discovery. It will require a concerted effort across institutions to design studies that involve thousands of samples rather than hundreds.

An additional system-wide effort to improve AI/ML models will be to integrate clinical data that go beyond cardiac biomarkers alone. Expanding the scope of variables fed into machine learning models will lead to more novel discoveries for understanding mechanisms of cardiotoxicity. Recent work has highlighted the cross talk between cancer progression and heart failure (Koelwyn et al. 2020; Meijers et al. 2018; Moslehi et al. 2020). Koelwyn et al. showed that murine breast cancer models that experienced a myocardial infarction had higher levels of inflammatory monocytes, resulting in accelerated tumor growth. Furthermore, peripheral murine blood samples showed that these mice exhibited higher levels of circulating monocytes in the long term, explained by epigenetic reprogramming after the ischemic event (Koelwyn et al. 2020). Similar outcomes were observed in a mouse model of colon cancer (Meijers et al. 2018). Furthermore, cardiac and cancer cells have similar stress responses and metabolic alterations. Indeed, cancer cells can promote metabolic remodeling of the heart via nutrient redirection, promotion of oxidative stress, and altered energy utilization pathways. Ample basic science data underscore the cross talk between cancer, cardiovascular, and metabolic disease, which suggests that more comprehensive clinical variables should be included in risk-calculator models. Digital health technologies and wearable devices are emerging as powerful supplemental tools for cardiovascular surveillance and have subsequently begun to attract public interest.
Numerous current wearable technologies can measure several physiological parameters such as pulse, cardiac output, blood pressure, heart rhythm, and sympathetic nerve activity (Nam et al. 2016; Perez et al. 2019). However, many of these devices are still in the early stages of evaluation. The CHAMPION study tested the CardioMEMS system (Abbott), an implantable pulmonary artery pressure sensor, for evaluating subsequent hospitalization rates. The outcomes showed a significant decrease in HF hospitalization (HR = 0.55, 95% CI 0.49–0.61, p < 0.001) and a subsequent reduction in healthcare cost of $7433 per patient compared to standard of care after 6 months (Abraham et al. 2016; Desai et al. 2017). The Apple Heart study is another example of emerging medical device data availability. This multicenter prospective study evaluated PPG-enabled devices (the Apple smartwatch) for detecting AF in individuals with no medical history of arrhythmia over 8 months. In a subset cohort that wore both the Apple watch and ECG patches, AF was present in 34% of participants. The sensitivity of the Apple watch relative to the standard ECG patch used in the clinic was 84% (95% CI 76–92%) (Perez et al. 2019). This device, as well as the KardiaMobile 6L (AliveCor) device, has received FDA clearance for rhythm monitoring. Additional clinical studies are thoroughly summarized elsewhere (Krittanawong et al. 2021). These two examples provide a window into the potential use of novel clinical variables that will be integrated into machine learning models for risk assessment in future cardio-oncology management.

Challenges remain before machine learning models can be applied in the clinic, such as data heterogeneity and quality, small sample sizes, data security, and interpretability of the models. We expect that future successful ML models for risk assessment will generalize better, handle diverse variable formats, and be robust to noise and adversarial attacks. Furthermore, we expect that the evolution of data availability and biomarker discovery will in turn also optimize machine learning models for cardio-oncology surveillance. Imaging data is already incorporated into cardio-oncology differential diagnosis. Incorporating unbiased models with clinical parameters, while maintaining clinician oversight, will improve cardio-oncology management.

Acknowledgements This work was supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health under award numbers K99 HL138272 and R00 HL138272 to F.C., and by Howard Hughes Medical Institute Gilliam Fellowship GT14941.

Competing Interests The authors declare that there are no competing interests.

References
Abraham WT, Stevenson LW, Bourge RC, Lindenfeld JA, Bauman JG, Adamson PB (2016) Sustained efficacy of pulmonary artery pressure to guide adjustment of chronic heart failure therapy: complete follow-up results from the CHAMPION randomised trial. Lancet 387:453–461
Banke A, Schou M, Videbaek L, Møller JE, Torp-Pedersen C, Gustafsson F, Dahl JS, Køber L, Hildebrandt PR, Gislason GH (2016) Incidence of cancer in patients with chronic heart failure: a long-term follow-up study. Eur J Heart Fail 18:260–266
Brown SA, Sandhu N, Herrmann J (2015) Systems biology approaches to adverse drug effects: the example of cardio-oncology. Nat Rev Clin Oncol 12:718–731
Cai C, Fang J, Guo P, Wang Q, Hong H, Moslehi J, Cheng F (2018a) In silico pharmacoepidemiologic evaluation of drug-induced cardiovascular complications using combined classifiers. J Chem Inf Model 58:943–956
Cai Y, Yang H, Li W, Liu G, Lee PW, Tang Y (2018b) Multiclassification prediction of enzymatic reactions for oxidoreductases and hydrolases using reaction fingerprints and machine learning methods. J Chem Inf Model 58:1169–1181
Cai C, Guo P, Zhou Y, Zhou J, Wang Q, Zhang F, Fang J, Cheng F (2019) Deep learning-based prediction of drug-induced cardiotoxicity. J Chem Inf Model 59:1073–1084
Campia U, Moslehi JJ, Amiri-Kordestani L, Barac A, Beckman JA, Chism DD, Cohen P, Groarke JD, Herrmann J, Reilly CM, Weintraub NL (2019) Cardio-oncology: vascular and metabolic perspectives: a scientific statement from the American Heart Association. Circulation 139:e579–e602
Chaix M-A, Parmar N, Kinnear C, Lafreniere-Roula M, Akinrinade O, Yao R, Miron A, Lam E, Meng G, Christie A, Manickaraj AK, Marjerrison S, Dillenburg R, Bassal M, Lougheed J, Zelcer S, Rosenberg H, Hodgson D, Sender L, Kantor P et al (2020) Machine learning identifies clinical and genetic factors associated with anthracycline cardiotoxicity in pediatric cancer survivors. JACC Cardio Oncol 2:690–706
Desai AS, Bhimaraj A, Bharmi R, Jermyn R, Bhatt K, Shavelle D, Redfield MM, Hull R, Pelzel J, Davis K, Dalal N, Adamson PB, Heywood JT (2017) Ambulatory hemodynamic monitoring reduces heart failure hospitalizations in “real-world” clinical practice. J Am Coll Cardiol 69:2357–2365
Feeny AK, Chung MK, Madabhushi A, Attia ZI, Cikes M, Firouznia M, Friedman PA, Kalscheur MM, Kapa S, Narayan SM, Noseworthy PA, Passman RS, Perez MV, Peters NS, Piccini JP, Tarakji KG, Thomas SA, Trayanova NA, Turakhia MP, Wang PJ (2020) Artificial intelligence and machine learning in arrhythmias and cardiac electrophysiology. Circ Arrhythmia Electrophysiol 13:e007952
Hou Y, Zhou Y, Hussain M, Budd GT, Tang WHW, Abraham J, Xu B, Shah C, Moudgil R, Popovic Z, Watson C, Cho L, Chung M, Kanj M, Kapadia S, Griffin B, Svensson L, Collier P, Cheng F (2021a) Cardiac risk stratification in cancer patients: a longitudinal patient-patient network analysis. PLoS Med 18:e1003736
Hou Y, Zhou Y, Hussain M, Budd GT, Tang WHW, Abraham J, Xu B, Shah C, Moudgil R, Popovic Z, Watson C, Cho L, Chung M, Kanj M, Kapadia S, Griffin B, Svensson L, Collier P, Cheng F (2021b) Cardiac risk stratification in cancer patients: a longitudinal patient–patient network analysis. PLoS Med 18:e1003736
Koelwyn GJ, Newman AAC, Afonso MS, van Solingen C, Corr EM, Brown EJ, Albers KB, Yamaguchi N, Narke D, Schlegel M, Sharma M, Shanley LC, Barrett TJ, Rahman K, Mezzano V, Fisher EA, Park DS, Newman JD, Quail DF, Nelson ER et al (2020) Myocardial infarction accelerates breast cancer via innate immune reprogramming. Nat Med 26:1452–1458
Krittanawong C, Rogers AJ, Johnson KW, Wang Z, Turakhia MP, Halperin JL, Narayan SM (2021) Integration of novel monitoring devices with machine learning technology for scalable cardiovascular management. Nat Rev Cardiol 18:75–91
Krumholz HM, Normand S-LT, Wang Y (2019) Twenty-year trends in outcomes for older adults with acute myocardial infarction in the United States. JAMA Netw Open 2:e191938–e191938
Lenihan DJ, Fradley MG, Dent S, Brezden-Masley C, Carver J, Filho RK, Neilan TG, Blaes A, Melloni C, Herrmann J, Armenian S, Thavendiranathan P, Armstrong GT, Ky B, Hajjar L (2019) Proceedings from the global cardio-oncology summit. JACC Cardio Oncol 1:256–272
Lenneman CG, Sawyer DB (2016) Cardio-oncology: an update on cardiotoxicity of cancer-related treatment. Circ Res 118:1008–1020
Meijers WC, Maglione M, Bakker SJL, Oberhuber R, Kieneker LM, de Jong S, Haubner BJ, Nagengast WB, Lyon AR, van der Vegt B, van Veldhuisen DJ, Westenbrink BD, van der Meer P, Silljé HHW, de Boer RA (2018) Heart failure stimulates tumor growth by circulating factors. Circulation 138:678–691
Michel L, Mincu RI, Mahabadi AA, Settelmeier S, Al-Rashid F, Rassaf T, Totzeck M (2020) Troponins and brain natriuretic peptides for the prediction of cardiotoxicity in cancer patients: a meta-analysis. Eur J Heart Fail 22:350–361
Miller KD, Nogueira L, Mariotto AB, Rowland JH, Yabroff KR, Alfano CM, Jemal A, Kramer JL, Siegel RL (2019) Cancer treatment and survivorship statistics, 2019. CA Cancer J Clin 69:363–385
Moslehi J, Zhang Q, Moore KJ (2020) Crosstalk between the heart and cancer. Circulation 142:684–687
Nam Y, Kong Y, Reyes B, Reljin N, Chon KH (2016) Monitoring of heart and breathing rates using dual cameras on a smartphone. PLoS One 11:e0151013
Perez MV, Mahaffey KW, Hedlin H, Rumsfeld JS, Garcia A, Ferris T, Balasubramanian V, Russo AM, Rajmane A, Cheung L, Hung G, Lee J, Kowey P, Talati N, Nag D, Gummidipundi SE, Beatty A, Hills MT, Desai S, Granger CB et al (2019) Large-scale assessment of a smartwatch to identify atrial fibrillation. N Engl J Med 381:1909–1917
Sharma A, Burridge PW, McKeithan WL, Serrano R, Shukla P, Sayed N, Churko JM, Kitani T, Wu H, Holmstrom A, Matsa E, Zhang Y, Kumar A, Fan AC, Del Alamo JC, Wu SM, Moslehi JJ, Mercola M, Wu JC (2017) High-throughput screening of tyrosine kinase inhibitor cardiotoxicity with human induced pluripotent stem cells. Sci Transl Med 9
Sturgeon KM, Deng L, Bluethmann SM, Zhou S, Trifiletti DM, Jiang C, Kelly SP, Zaorsky NG (2019) A population-based study of cardiovascular disease mortality risk in US cancer patients. Eur Heart J 40:3889–3897
Wang Y, Li J, Zheng X, Jiang Z, Hu S, Wadhera RK, Bai X, Lu J, Wang Q, Li Y, Wu C, Xing C, Normand S-L, Krumholz HM, Jiang L (2018) Risk factors associated with major cardiovascular events 1 year after acute myocardial infarction. JAMA Netw Open 1:e181079–e181079

Zaha VG, Hayek SS, Alexander KM, Beckie TM, Hundley WG, Kondapalli L, Ky B, Leger KJ, Meijers WC, Moslehi JJ, Shah SH (2021) Future perspectives of cardiovascular biomarker utilization in cancer survivors: a scientific statement from the American Heart Association. Circulation 144:e551–e563
Zamorano JL, Lancellotti P, Rodriguez Muñoz D, Aboyans V, Asteggiano R, Galderisi M, Habib G, Lenihan DJ, Lip GYH, Lyon AR, Lopez Fernandez T, Mohty D, Piepoli MF, Tamargo J, Torbicki A, Suter TM (2016) 2016 ESC position paper on cancer treatments and cardiovascular toxicity developed under the auspices of the ESC Committee for practice guidelines: the task force for cancer treatments and cardiovascular toxicity of the European Society of Cardiology (ESC). Eur Heart J 37:2768–2801
Zhou Y, Hou Y, Hussain M, Brown SA, Budd T, Tang WHW, Abraham J, Xu B, Shah C, Moudgil R, Popovic Z, Cho L, Kanj M, Watson C, Griffin B, Chung MK, Kapadia S, Svensson L, Collier P, Cheng F (2020) Machine learning-based risk assessment for cancer therapy-related cardiac dysfunction in 4300 longitudinal oncology patients. J Am Heart Assoc 9:e019628

Chapter 25

Deep Learning Model for Prediction of Compound Activities Over a Panel of Major Toxicity-Related Proteins

Mariia Radaeva, Mohit Pandey, Hazem MsLati, and Artem Cherkasov

M. Radaeva · M. Pandey · H. MsLati · A. Cherkasov (corresponding author)
Vancouver Prostate Centre, University of British Columbia, 2660 Oak Street, Vancouver, BC V6H 3Z6, Canada
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. H. Hong (ed.), Machine Learning and Deep Learning in Computational Toxicology, Computational Methods in Engineering & the Sciences, https://doi.org/10.1007/978-3-031-20730-3_25

25.1 Introduction
The process of developing a new drug takes on average up to 10 years and costs up to 2.6 billion dollars (Mullard et al. 2020), which can be largely attributed to an attrition rate of up to 96% for drug candidates (Paul et al. 2010). Drug toxicity is one of the most common reasons for the termination of preclinical and clinical trials (Harvey 2014); moreover, adverse drug reactions can lead to the withdrawal of already marketed drugs (Park et al. 2005; Weiss et al. 2018). Hence, there is an urgent need for cost-effective and accelerated methods to spot drug toxicity early in the development process. Conventionally, cell-based and animal-based assays are performed to evaluate the toxicity of small molecules (DiMasi 2001). Although these methods are accurate, they are labour-intensive, expensive and, in the case of animal studies, ethically problematic. On the other hand, recent advancements in computational modelling provide an alternative solution with clear advantages, including greatly diminished costs and time to perform the evaluation. Modern studies present robust toxicology prediction models based on advanced representations of the chemical space paired with elaborate deep learning architectures (Baskin 2018; Idakwo et al. 2018; Peng et al. 2019; Wang et al. 2020). The performance of such models also improves as the chemical databases grow by the day. For example, the ChEMBL database currently contains around 18 million known chemical activities and is widely used for in silico modelling (Mendez et al. 2019).

There are numerous endpoints for toxicity prediction, including cardiotoxicity, hepatotoxicity, mutagenicity, immunotoxicity, lethal dose in rats, drug-induced liver injury and many others. Those mentioned above represent more holistic endpoints that are based on cell-based or animal-based data. Another type of toxicity endpoint data is activity measured on particular toxicity-associated protein targets such as hERG and nuclear receptors (AR, ER, PR and GR). As knowledge about mechanisms of toxicity has accumulated, more proteins have been added to the list of off-targets that mediate side effects. For example, there is vast evidence showing that drug binding to hERG might cause severe cardiac complications; thus, hERG is one of the top toxicity targets used in in vitro toxicity assessments (Rampe and Brown 2013; Garrido et al. 2020). Importantly, it has been shown that around 75% of adverse drug reactions are dose-dependent and could be explained by certain target profiling (Redfern et al. 2002; Smith and Schmid 2006). Therefore, a model that predicts off-target endpoints could well be applied to approximate toxicity risks.

In recent years, Bowes et al. published an overview of off-target pharmacological profiling employed at four big pharma companies: AstraZeneca, GlaxoSmithKline, Novartis and Pfizer. They stress the importance of extensive preclinical profiling of drugs against numerous proteins that mediate drug toxicity. The authors report a minimal panel of targets essential to test for off-target effects. These proteins were selected based on several considerations: their functional role and essentiality, evidence that chemical binding translates into biological effects, and reports on the adverse drug reactions associated with the targeting.
The panel includes 44 proteins from five major families: GPCRs, ion channels, enzymes, transporters and nuclear receptors. The GPCR class is the most abundant and is further divided into adenosine, adrenergic, cannabinoid, dopamine, histamine, muscarinic, opioid, cholecystokinin, endothelin, vasopressin and serotonin receptors. The side effects associated with off-target modulation of GPCRs are broad, ranging from various impairments of neural systems to cardiac complications and arrest. Ion channels play an important role in the central and peripheral neural systems as they are responsible for cellular excitation; thus, undesired pharmacological modulation of ion channels might induce severe health complications. Seven enzymes are present in the panel: COX-1, COX-2, monoamine oxidase (MAO), tyrosine-protein kinase Lck, acetylcholinesterase and two phosphodiesterases (3 and 4D). For all these enzymes, there is extensive evidence that off-target binding might impair vital functions of an organism. For example, modulation of cyclooxygenases in extreme cases leads to gastrointestinal erosions and renal and hepatic insufficiency (Suleyman et al. 2007). Unintended drug binding to nuclear receptors, primarily androgen and glucocorticoid receptors, is undesirable owing to its immunosuppressant and carcinogenic effects (Mcmaster and Ray 2008; Kronenberger et al. 2015). Finally, three neurotransmitter transporters (dopamine, noradrenaline and serotonin transporters) were also selected for the minimal panel because they play a significant role in drug secretion and uptake (Sherman and Writer 2007).


In this work, we developed a deep learning model that predicts pharmacological activities of compounds across a panel of those 44 protein targets implicated in drug toxicity. The model is built on a dataset collected from the ChEMBL and BindingDB databases. In contrast to conventional QSAR models, which are developed to predict one endpoint, we employ a combined approach and train a single model that predicts across the entire panel. We demonstrate that the developed approach achieves superior performance compared to single-protein models. The predictive power of the model could be leveraged for future low-cost and high-efficiency estimation of the toxicity of candidate drugs.

25.2 Methods

25.2.1 Dataset

We collected activity data for the 44 protein targets listed by Bendels et al. (2019) from two databases: ChEMBL (Mendez et al. 2019) and BindingDB (Gilson et al. 2016). To suit the data to the classification task, we converted continuous measures of activity to binary labels. For the IC50 data, we selected a threshold of 10 μM to separate actives from inactives because this concentration is routinely used to screen for off-target binding to these targets in a commercial panel, CEREP44 (www.eurofinsdiscoveryservices.com). For Ki/Kd data, we first applied a correction factor of 2 as suggested by Kalliokoski et al. (2013) and then applied the 10 μM cutoff for the binary transformation. Before the binary conversion, we also removed data points that fell outside of the typical activity range (usually explained by faulty unit conversions). Finally, we performed data cleaning based on SMILES, i.e. removed inorganic compounds and mixtures, stripped salts, treated tautomers as the same compound and canonicalized the remaining SMILES. Molecular handling (i.e. salt removal and SMILES canonicalization) was done using the RDKit (http://www.rdkit.org/) chemoinformatics package.
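The labelling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the text does not say whether the factor of 2 multiplies or divides the Ki/Kd value, whether the cutoff is inclusive, or which bounds define the "typical activity range", so the multiplication, the `<=` comparison and the `min_um`/`max_um` bounds below are all hypothetical choices.

```python
from typing import Optional

THRESHOLD_UM = 10.0  # activity cutoff stated in the text, in micromolar
KI_KD_FACTOR = 2.0   # correction factor from the text; its direction is an assumption


def to_ic50_equivalent_um(value_um: float, measure: str) -> float:
    """Put Ki/Kd values on an IC50-like scale (assumed here: multiply by the factor)."""
    measure = measure.upper()
    if measure == "IC50":
        return value_um
    if measure in ("KI", "KD"):
        return value_um * KI_KD_FACTOR
    raise ValueError(f"unsupported activity measure: {measure}")


def binarize(value_um: float, measure: str,
             min_um: float = 1e-6, max_um: float = 1e5) -> Optional[int]:
    """Return 1 (active), 0 (inactive), or None for implausible values.

    min_um and max_um are hypothetical bounds standing in for the
    'typical activity range' filter; the chapter does not give them.
    """
    if not (min_um <= value_um <= max_um):
        return None  # likely a unit-conversion error, so the record is dropped
    return 1 if to_ic50_equivalent_um(value_um, measure) <= THRESHOLD_UM else 0
```

With these assumptions, a Ki of 6 μM is treated as an IC50-equivalent of 12 μM and labelled inactive, whereas an IC50 of 6 μM is labelled active.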

25.2.2 Chemical Diversity Analysis

Molecular properties were calculated using RDKit. Dimensionality reduction of the ECFP fingerprints was performed using the t-SNE method, which groups local neighbourhoods of molecules and thereby reveals any structural clusters. The t-SNE components were computed using the ChemPlot package (Sorkun et al. 2022). Chemical scaffolds were generated using Murcko-type decomposition (Bemis and Murcko 1996).

25.2.3 Prediction Models

All data curation and modelling procedures were performed in the Python programming language. The machine learning module scikit-learn (http://www.scikit-learn.org) was used for early-stage data modelling and investigation. The global model architecture is based on a fusion approach in which learned representations of the protein and the drug molecule are combined and projected onto a shared latent space before being fed into the prediction head. We describe in detail the selection of protein and drug representations in the “Drug and Target Representations Selection” section. The prediction head for the final estimation is a three-layer multilayer perceptron (MLP) with layers containing 1024, 1024 and 512 neurons. We use ReLU activation in the model. We trained our models for 50 epochs using the Adam optimizer and a learning rate of 0.001. Early stopping was adopted to avoid overfitting. The final global model was implemented using the DeepPurpose package (Huang et al. 2020). Individual models for per protein predictions were developed using Morgan fingerprints as molecular representations. An MLP architecture with the same settings as for the global model was employed to provide a fair comparison. These individual models were designed using the code published in the pychembl package (Wilbraham et al. 2019).
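The early-stopping rule mentioned above is not specified further in the text. A minimal, framework-independent sketch of one common variant (patience on the validation loss) is given below; the `patience` and `min_delta` values are hypothetical, and the function only decides when to stop given a sequence of per-epoch validation losses.

```python
MAX_EPOCHS = 50  # epoch budget stated in the text


def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training halts once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs. patience and min_delta
    are illustrative assumptions, not values from the chapter.
    """
    best_loss = float("inf")
    best_epoch = 0
    losses = list(val_losses)[:MAX_EPOCHS]
    for epoch, loss in enumerate(losses, start=1):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch  # stop early and keep the best checkpoint
    return len(losses), best_epoch  # ran through the whole epoch budget
```

In practice the weights saved at `best_epoch` would be restored, so the reported model is the one with the lowest validation loss rather than the last one trained.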

25.2.4 Evaluation Metrics

To assess the performance of the models, we employed four different metrics. The formulas for the metrics are given below:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]

\[ \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN} \]

ROC AUC is the area under the true positive rate (TPR) versus false positive rate (FPR) curve, where

\[ \text{TPR} = \text{Recall}, \qquad \text{FPR} = 1 - \frac{TN}{FP + TN} \]

and TP = true positive, TN = true negative, FP = false positive and FN = false negative.
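For illustration, the four metrics can be computed directly from the definitions above. The sketch below uses plain Python and evaluates ROC AUC through the equivalent pairwise-ranking formulation (the probability that a randomly chosen active is scored above a randomly chosen inactive, with ties counted as one half); the function names are ours, and in practice a library such as scikit-learn would normally be used instead.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of actives and inactives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one active and one inactive")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 probability cut-off
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
```

Note that a model that assigns the same score to every compound obtains an ROC AUC of exactly 0.5 under this definition, which is the degenerate case reported for some single-protein models in Table 25.1.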

25.3 Results and Discussion

25.3.1 Data Collection and Analysis

In total, we collected 234,155 drug-target associations from ChEMBL and BindingDB. The targets are the 44 proteins suggested by Bendels et al. (2019) as a minimal panel of important off-targets. The measures of activity, Ki/Kd and IC50, were converted to a binary active/inactive format. Combining different activity measures can introduce considerable noise into the data; however, as concluded in a comprehensive analysis of the ChEMBL database by Kalliokoski et al. (2013), such an approach is acceptable for large-scale modelling. Furthermore, they state that mixing IC50 and Ki data from various assays adds a tolerable level of noise when used for big datasets, as the variabilities partially cancel each other out. The correction factor of 2 for the Ki/Kd to IC50 conversion appeared to be reasonable, as we observed typically higher Kd/Ki values compared to IC50 for measurements on the same drug-target pairs. In addition, the general distribution of Kd/Ki values is shifted to the right of the IC50 values, confirming the trend and the need for correction. The total number of molecules and the number of actives for each protein are reported in Table 25.1. The dataset as a whole is reasonably balanced, as it contains 63% active compounds; however, for some of the targets, this ratio is shifted in one direction or the other. This variation reflects both the natural differences in the ranges of protein affinities for small molecules and differences in the assays performed.

To assess the chemical space of the dataset, we first analysed the distribution of molecular weight and logP (Fig. 25.1a). The majority of data points fall into the drug-like ranges of logP from −0.4 to 5.6 and molecular weight of 200–600 Daltons. Only a small fraction of compounds appeared to be outliers, i.e. 1.15% of molecules are outside of the molecular weight range and 1.25% are outside of the logP range.
We also performed an analysis of chemical diversity based on dimensionality reduction of the ECFP fingerprints (Fig. 25.1b). Such analysis would reveal any abnormal clustering of the data. Here, we see an equal distribution of molecular clusters across the two main t-SNE components. Both decomposition approaches show that active and inactive compounds cover chemical space evenly. Thus, we conclude that the collected data covers a broad spectrum of chemical space without any abnormal distributions and is well suited for machine learning modelling.
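The outlier fractions quoted for molecular weight and logP amount to a simple range check. A minimal sketch follows, assuming the property values have already been computed (for example with RDKit) and that the range boundaries are inclusive; both the function name and the inclusive boundaries are our assumptions.

```python
LOGP_RANGE = (-0.4, 5.6)   # drug-like logP range quoted in the text
MW_RANGE = (200.0, 600.0)  # drug-like molecular weight range in Daltons


def percent_outside(values, low, high):
    """Percentage of values falling outside the closed interval [low, high]."""
    if not values:
        raise ValueError("no values supplied")
    outside = sum(1 for v in values if v < low or v > high)
    return 100.0 * outside / len(values)
```

For example, `percent_outside(molecular_weights, *MW_RANGE)` would give the share of compounds outside the molecular weight range, where `molecular_weights` is a hypothetical list of precomputed values.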

25.3.2 Drug and Target Representations Selection

We designed a deep learning model that predicts over all 44 selected targets. The model takes as inputs both the target information and the molecular information. Protein sequences are converted into one-hot encoded vectors that are passed into a convolutional neural network (CNN). One-hot encoding was chosen, as it is a conventional method used for sequence data vectorization. It is well adapted for

Table 25.1 Per target statistics

| Family | UniProt ID | Protein name | Molecules | Active molecules | Unique scaffold % | Global model AUC | Single model AUC |
|---|---|---|---|---|---|---|---|
| GPCR | P08908 | 5-hydroxytryptamine receptor 1A | 8280 | 6708 | 47 | 0.75 | 0.62 |
| GPCR | P28222 | 5-hydroxytryptamine receptor 1B | 2705 | 1902 | 60 | 0.78 | 0.75 |
| GPCR | P28223 | 5-hydroxytryptamine receptor 2A | 9035 | 6517 | 46 | 0.80 | 0.73 |
| GPCR | P41595 | 5-hydroxytryptamine receptor 2B | 3121 | 1904 | 50 | 0.62 | 0.68 |
| GPCR | P29274 | Adenosine receptor A2a | 10,169 | 6363 | 39 | 0.87 | 0.69 |
| GPCR | P35348 | Adrenergic receptor α-1A | 2594 | 2173 | 47 | 0.70 | 0.50 |
| GPCR | P08913 | Adrenergic receptor α-2 | 2835 | 1502 | 54 | 0.79 | 0.68 |
| GPCR | P08588 | Adrenergic receptor β-1 | 1126 | 785 | 49 | 0.79 | 0.73 |
| GPCR | P07550 | Adrenergic receptor β-2 | 2447 | 1471 | 51 | 0.76 | 0.69 |
| GPCR | P21554 | Cannabinoid receptor 1 | 9550 | 4849 | 41 | 0.83 | 0.81 |
| GPCR | P34972 | Cannabinoid receptor 2 | 9048 | 6632 | 40 | 0.80 | 0.73 |
| GPCR | P32238 | Cholecystokinin receptor | 3171 | 2155 | 48 | 0.79 | 0.80 |
| GPCR | P21728 | Dopamine receptor D(1A) | 3579 | 1847 | 51 | 0.77 | 0.73 |
| GPCR | P14416 | Dopamine receptor D(2) | 12,384 | 8942 | 45 | 0.67 | 0.55 |
| GPCR | P25101 | Endothelin-1 receptor | 2571 | 1860 | 43 | 0.73 | 0.74 |
| GPCR | P35367 | Histamine H1 receptor | 3615 | 2164 | 54 | 0.72 | 0.77 |
| GPCR | P25021 | Histamine H2 receptor | 2189 | 509 | 58 | 0.65 | 0.50 |
| GPCR | P11229 | Muscarinic acetylcholine receptor M1 | 4060 | 2406 | 53 | 0.77 | 0.67 |
| GPCR | P08172 | Muscarinic acetylcholine receptor M2 | 3838 | 2475 | 52 | 0.70 | 0.65 |
| GPCR | P20309 | Muscarinic acetylcholine receptor M3 | 4573 | 3010 | 49 | 0.69 | 0.67 |
| GPCR | P41143 | Opioid receptor delta-type | 8019 | 4891 | 47 | 0.80 | 0.79 |
| GPCR | P41145 | Opioid receptor kappa-type | 8465 | 6300 | 46 | 0.76 | 0.62 |
| GPCR | P35372 | Opioid receptor mu-type | 10,312 | 6658 | 44 | 0.83 | 0.84 |
| GPCR | P37288 | Vasopressin V1a receptor | 2388 | 1535 | 53 | 0.77 | 0.70 |
| Ion channel | P46098 | 5-hydroxytryptamine receptor 3A | 2495 | 1410 | 58 | 0.77 | 0.75 |
| Ion channel | P63138 | GABA-A receptor | 540 | 294 | 56 | 0.64 | 0.62 |
| Ion channel | P35439 | Glutamate receptor ionotropic | 4171 | 2407 | 35 | 0.73 | 0.75 |
| Ion channel | P43681 | Neuronal acetylcholine receptor α-4/β-2 | 2129 | 1378 | 49 | 0.68 | 0.78 |
| Ion channel | Q12809 | Potassium voltage-gated channel | 17,936 | 8927 | 49 | 0.76 | 0.60 |
| Ion channel | Q13936 | Voltage-gated L-type calcium channel | 1237 | 707 | 62 | 0.73 | 0.77 |
| Ion channel | Q6I9B6 | Voltage-gated potassium channel | 1037 | 788 | 53 | 0.76 | 0.79 |
| Ion channel | Q62968 | Voltage-gated sodium channel | 6982 | 5288 | 38 | 0.78 | 0.74 |
| K | P06239 | Tyrosine-protein kinase Lck | 6193 | 2880 | 54 | 0.79 | 0.68 |
| LP | P23219 | Cyclooxygenase-1 | 4756 | 1373 | 35 | 0.68 | 0.50 |
| LP | P35354 | Cyclooxygenase-2 | 7467 | 4487 | 32 | 0.69 | 0.68 |
| NRTM | P22303 | Acetylcholinesterase | 8480 | 4717 | 40 | 0.86 | 0.75 |
| NRTM | P21397 | Monoamine oxidase type A | 6676 | 2557 | 36 | 0.76 | 0.50 |
| NHR | P10275 | Androgen receptor | 4119 | 3346 | 37 | 0.74 | 0.68 |
| NHR | P04150 | Glucocorticoid receptor | 4436 | 3248 | 38 | 0.91 | 0.90 |
| PDE | Q14432 | Phosphodiesterase 3 | 2774 | 1349 | 47 | 0.76 | 0.67 |
| PDE | Q08499 | Phosphodiesterase 4D | 3330 | 2911 | 37 | 0.79 | 0.62 |
| Transporter | Q01959 | Sodium-dependent dopamine transporter | 5614 | 4114 | 37 | 0.69 | 0.65 |
| Transporter | P23975 | Sodium-dependent noradrenaline transporter | 5880 | 4404 | 35 | 0.77 | 0.69 |
| Transporter | P31645 | Sodium-dependent serotonin transporter | 7829 | 6237 | 0.36 | 0.74 | 0.61 |

Total number of molecules, number of active molecules and percentage of unique scaffolds within each target dataset. The ‘Global model AUC’ column reports AUC scores computed per target from the unified model. The ‘Single model AUC’ column reports the performance of the individual models. Protein families: GPCR, G-protein coupled receptors; Ion channel, ion channels; K, kinase; LP, lipid metabolism; NRTM, neurotransmitter metabolism; NHR, nuclear hormone receptors; PDE, phosphodiesterase; Transporter, transporters

protein sequence encodings as each of the 20 amino acids could be characterized with a real number (Lin et al. 2002). The choice of a CNN over more complex networks such as transformers (Vaswani et al. 2017) was motivated by its simplicity and its speed of training and inference. Furthermore, one-dimensional CNNs can capture fold information of proteins from their sequences, as shown by Hou et al. (2018). To identify the best performing representations of small molecules, we trained four MLP models with different descriptors: Morgan, RDKit, ECFP and Daylight. The highest fivefold CV ROC AUC scores were achieved with Morgan and ECFP fingerprints (Table 25.2). We decided to proceed with Morgan and further tested different combinations of radius and number of bits. We skipped the 512-bit implementation of the Morgan fingerprint because it would fail to capture the large and chemically diverse space of the dataset. We tested 1024- and 2048-bit settings coupled with radii of two and three. The best fivefold CV AUC scores were achieved with the 1024-bit implementation and a radius of two.
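To make the protein input concrete, a minimal sketch of one-hot encoding a sequence over the 20 standard amino acids is shown below. It is not the encoder used by DeepPurpose; the alphabet ordering, the fixed `max_len` with zero padding and the all-zero rows for non-standard residues are assumptions made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_len):
    """Return a max_len x 20 matrix of 0/1 values.

    Sequences longer than max_len are truncated; shorter ones are padded
    with all-zero rows. Residues outside the 20-letter alphabet are also
    encoded as all-zero rows (an illustrative choice).
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(sequence.upper()[:max_len]):
        idx = AA_INDEX.get(aa)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix


encoded = one_hot_encode("MKX", max_len=5)
```

The resulting matrix has one row per sequence position and one column per residue type, which is the layout a one-dimensional CNN would convolve over along the sequence axis.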

25.3.3 Model Performance

The global model performance was evaluated with fivefold CV; the results are reported in Table 25.2. The mean ROC AUC score equals 0.76 and falls within the range typical of models built on noisy biological data. Importantly, the performance did not change significantly between folds, with a standard deviation of only 0.018 (Fig. 25.2). This indicates that random sampling of the train and test data did

Fig. 25.1 Chemical space distribution of the collected data. Green dots indicate active compounds, red inactive. a Chemical space defined by molecular weight and logP. b t-SNE decomposition of the ECFP fingerprints of the molecules

not bias the model performance. The low variance between CV folds shows that the model generalizes well (Jiang and Wang 2017). We then assessed the model’s predictions for each protein target. The performance ranged from 0.61 AUC for 5-hydroxytryptamine receptor 2B to 0.91 AUC for glucocorticoid receptor. We visualized the performance for each target using

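Table 25.2 Evaluation metrics for the global model tested with different fingerprints and averaged scores for the per protein models

| Metric | Global model (ECFP) | Global model (Daylight) | Global model (RDKit) | Global model (Morgan) | Per protein model average (Morgan) |
|---|---|---|---|---|---|
| Accuracy | 0.75 | 0.75 | 0.72 | 0.78 | 0.71 |
| AUC | 0.76 | 0.74 | 0.70 | 0.76 | 0.69 |
| Precision | 0.82 | 0.81 | 0.76 | 0.82 | 0.76 |
| Recall | 0.84 | 0.76 | 0.80 | 0.84 | 0.80 |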

Fig. 25.2 Fivefold cross-validated receiver operating characteristic curve of the global model. Red line is a reference and represents random prediction

precision-recall curves because some of the proteins’ datasets are imbalanced and ROC curves might present those results too optimistically (Fig. 25.3). For each target, the precision-recall curve is above the random line, meaning that for all targets the model distinguishes actives from inactives to some extent. As expected, the performance was lower for targets with more unbalanced classes. For instance, histamine H2 and COX-1 had only 23 and 28% of active molecules, respectively, and the AUC scores for those proteins were at the lower boundary, 0.64 and 0.68. We further investigated how molecular diversity within each protein target affects performance. To do so, we estimated the molecular diversity within each dataset by calculating the percentage of unique scaffolds. Thus, in a protein dataset where every molecule represents a unique scaffold, this percentage would equal 100. We observed that the proportion of unique scaffolds varied from 30% (COX-2) to 62%

Fig. 25.3 Precision-recall curves plotted for each individual target. Each graph contains 22 curves (44 targets in total). Red line is a reference and represents random prediction

(calcium channel) with a mean of 46%, i.e. within the whole dataset 46% of molecules represent a unique scaffold and the rest have at least one close analogue (Table 25.1). We analysed the relationship between performance and chemical diversity within each protein group by correlating AUC scores with the unique scaffold ratio. We found a weak negative correlation (Pearson coefficient of −0.21) between the two measures. Such a weak negative correlation could be explained by the fact that a model built and tested on a dataset with relatively low chemical diversity (i.e. a small percentage of unique scaffolds) predicts better because fewer molecules are outside of the model’s applicability domain. For such models, there is a higher chance that a certain chemical scaffold from the test set has been seen in the training set.
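The diversity analysis above reduces to two small computations, sketched here in plain Python. The sketch assumes that a Murcko scaffold string has already been generated for every molecule (for example with RDKit), reads "percentage of unique scaffolds" as the number of distinct scaffolds divided by the number of molecules, and uses the standard Pearson formula; the function names are ours.

```python
from math import sqrt


def unique_scaffold_percent(scaffolds):
    """Distinct scaffolds as a percentage of the molecules in one target dataset."""
    if not scaffolds:
        raise ValueError("empty dataset")
    return 100.0 * len(set(scaffolds)) / len(scaffolds)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

Passing the 'Unique scaffold %' and 'Global model AUC' columns of Table 25.1 to `pearson` would correspond to the per-target correlation described in the text.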

25.3.4 Comparison with Conventional per Protein Models

To evaluate the advantages of a single model that predicts activities across all of the selected proteins, we also built individual models for each protein target. These models did not utilize any protein information but were otherwise built similarly to the unified model. The only input to each model was the Morgan fingerprints of the small molecules. To make a fair comparison, we used the same neural network architecture, i.e. an MLP with the same number of layers and neurons. For each model, we ran fivefold cross-validation, keeping the folds constant between the unified model and the per protein models. The average ROC AUC score across the individual models is 0.68, which is substantially lower than that of the global model, 0.76. Most importantly, some of the individual models failed to distinguish between the classes and assigned all predictions to the same class (AUC 0.5, Table 25.1). This happened for proteins with highly unbalanced classes. For instance, the COX-1, H2 receptor and monoamine oxidase datasets contained 28%, 23% and 38% of actives, respectively, and the corresponding models failed to predict any positives, i.e. recall equals 0. In contrast, the global model’s recalls for the corresponding targets were 0.49, 0.39 and 0.66, respectively. Thus, we conclude that a larger target feature space enables the model to generalize across different targets, enabling better performance on otherwise poorly predicted targets.
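One way to keep the folds constant between the unified model and the per protein models is to assign fold labels once over the full dataset and reuse them when slicing out a single target. The sketch below illustrates this idea only; the record layout (`target`, `smiles` keys), the seeded random assignment and the absence of stratification are all assumptions, since the text does not describe how the folds were generated.

```python
import random


def assign_folds(n_records, k=5, seed=0):
    """Assign every record index to one of k folds once, reproducibly."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [0] * n_records
    for rank, idx in enumerate(indices):
        folds[idx] = rank % k
    return folds


def split_for_target(records, folds, target, test_fold):
    """Train/test split for one target that reuses the global fold labels."""
    train = [r for r, f in zip(records, folds)
             if r["target"] == target and f != test_fold]
    test = [r for r, f in zip(records, folds)
            if r["target"] == target and f == test_fold]
    return train, test


# Hypothetical toy dataset with two targets (UniProt IDs taken from Table 25.1)
records = [{"target": "P23219" if i % 2 else "P35354", "smiles": f"mol{i}"}
           for i in range(20)]
folds = assign_folds(len(records), k=5, seed=42)
train, test = split_for_target(records, folds, "P23219", test_fold=0)
```

Because every drug-target pair keeps the same fold label in both settings, a pair that is held out for the global model is also held out for the corresponding single-protein model.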

25.3.5 External Validation

To validate our model on an external dataset, we used a set of novel drug-target interactions reported by Lounkine et al. (2012). They predicted the activities of 656 marketed drugs over 73 adverse-reaction-associated targets using structural similarity to known ligands reported in the ChEMBL database (Mendez et al. 2019). They then validated their predictions through in vitro testing, in which they confirmed 175 novel drug-target interactions with IC50 values below 10 μM. Out of these 175 new interactions, 128 belong to 35 proteins from our list. The reported drug-target interactions do not appear in our dataset. We ran our global model over the 35 targets and found that 68% of the interactions were predicted to be positive. Moreover, we observed that half of the positive predictions (41 out of 82) received a high probability score (>0.9), meaning that the model was highly confident in those predictions (Fig. 25.4). We hypothesized that the drug-target pairs missed by the model are likely to belong to completely unexplored chemotype-target interactions. Indeed, we found that failed predictions had lower Tanimoto similarity to the corresponding protein dataset than those that were captured by the model. These results highlight that the applicability domain of the model should be taken into account when assessing the reliability of predictions.
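The similarity check used in this applicability-domain argument can be illustrated with a short sketch. Fingerprints are represented here as sets of on-bit indices, and the similarity of a query compound to a target's dataset is summarized by its nearest neighbour; the text does not state whether the maximum or the mean similarity was used, so the nearest-neighbour summary is an assumption.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_similarity_to_dataset(query, dataset):
    """Similarity of a query fingerprint to its nearest neighbour in a dataset."""
    return max((tanimoto(query, fp) for fp in dataset), default=0.0)
```

A low value of `max_similarity_to_dataset` for a drug against the training compounds of a given target would suggest that the prediction lies outside the model's applicability domain and should be treated with more caution.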

Fig. 25.4 Histogram of predicted probabilities across the 128 drug-target pairs from an external dataset. The green line indicates the 0.5 cut-off used to separate actives from inactives

25.4 Conclusions

Herein, we presented a machine learning classification model that predicts interactions of small molecules with a panel of 44 major off-target proteins. These protein targets are routinely used for in vitro testing of new candidate drugs at pharmaceutical companies such as Pfizer and Roche (Bowes et al. 2012; Bendels et al. 2019). The model offers an efficient alternative for large-scale screening of compounds for potential toxicity mediated through binding to these proteins. The use of machine learning to assess toxicity is highly convenient because, in contrast to in vitro testing, it does not require any human labour, cell cultures or animals and takes just a few minutes to predict hundreds of compounds. The advantages of in silico predictions are especially prominent in cases where in vitro testing is complicated. These include assays with voltage-gated ion channels (the sodium, calcium and potassium channels from the panel) that require specific electrophysiological techniques not readily available in every laboratory. The proposed model outputs probabilities of small molecule interactions with each of the 44 protein targets. Such predictions could be useful not only at the early stages of virtual screening, to filter out potentially toxic compounds, but also at the later stages of lead optimization. For example, a predicted activity profile might give insights into the biological underpinnings of particular toxicity phenotypes observed in cell culture or animal models. An understanding of the toxicity mechanisms of action opens a possibility for structure-guided optimization of the compound to achieve reduced off-target effects. Instead of building separate models for each target, we employed a universal model approach where one model predicts across all 44 targets. Thus, apart from chemical structure descriptors, we also fed protein sequence information into the model.
We show that such an approach outperforms conventional per protein models where only chemical representations are utilized. The superiority of the global model

could be explained as a special case of transfer learning where the knowledge about one protein’s activity profile helps to predict another. Such an explanation is especially probable in the case of the selected protein panel because it contains several classes of related proteins, such as three muscarinic acetylcholine receptors and four adrenergic receptors. The better performance of multiclass predictions over single models on toxicity endpoints was also demonstrated in the DeepTox paper (Mayr et al. 2016). Similar to the authors of DeepTox, we found that the global model can predict for targets with severely unbalanced classes while single models failed to do so. One of the major limitations of toxicity modelling remains data availability and quality. Here, we utilized activity data measured in different assays and under different conditions. Thus, there is an inherent variability that hinders model performance. Moreover, we augmented conventional IC50 measurements with Kd values to increase the dataset size. The noise associated with the assay variability as well as IC50/Kd mixing is not detrimental to the overall model performance due to the abundance of the data. The general patterns of activity profiles are captured despite the limitations of the data quality. We expect that the ever-increasing data on molecular activities will produce more reliable and generalizable in silico toxicological models.

References

Baskin II (2018) Computational toxicology. Springer, Berlin, pp 119–139
Bemis GW, Murcko MA (1996) The properties of known drugs. 1. Molecular frameworks. J Med Chem 39(15):2887–2893
Bendels S, Bissantz C, Fasching B, Gerebtzoff G, Guba W, Kansy M, Migeon J, Mohr S, Peters J-U, Tillier F (2019) Safety screening in early drug discovery: an optimized assay panel. J Pharmacol Toxicol Methods 99:106609
Bowes J, Brown AJ, Hamon J, Jarolimek W, Sridhar A, Waldron G, Whitebread S (2012) Reducing safety-related drug attrition: the use of in vitro pharmacological profiling. Nat Rev Drug Discov 11(12):909–922
Dimasi JA (2001) Risks in new drug development: approval success rates for investigational drugs. Clin Pharmacol Ther 69(5):297–307
Garrido A, Lepailleur A, Mignani SM, Dallemagne P, Rochais C (2020) hERG toxicity assessment: useful guidelines for drug design. Eur J Med Chem 195:112290
Gilson MK, Liu T, Baitaluk M, Nicola G, Hwang L, Chong J (2016) BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res 44(D1):D1045–D1053
Harvey AL (2014) Toxins and drug discovery. Toxicon 92:193–200
Hou J, Adhikari B, Cheng J (2018) DeepSF: deep convolutional neural network for mapping protein sequences to folds. Bioinformatics 34(8):1295–1303
Huang K, Fu T, Glass LM, Zitnik M, Xiao C, Sun J (2020) DeepPurpose: a deep learning library for drug–target interaction prediction. Bioinformatics 36(22–23):5545–5547
Idakwo G, Luttrell J, Chen M, Hong H, Zhou Z, Gong P, Zhang C (2018) A review on machine learning methods for in silico toxicity prediction. J Environ Sci Health C 36(4):169–191
Jiang G, Wang W (2017) Error estimation based on variance analysis of k-fold cross-validation. Pattern Recogn 69:94–106

Kalliokoski T, Kramer C, Vulpetti A, Gedeck P (2013) Comparability of mixed IC50 data – a statistical analysis. PLoS One 8(4):e61007
Kronenberger T, Keminer O, Wrenger C, Windshügel B (2015) Nuclear receptor modulators – current approaches and future perspectives. In: Drug discovery and development-from molecules to medicine
Lin K, May AC, Taylor WR (2002) Amino acid encoding schemes from protein structure alignments: multi-dimensional vectors to describe residue types. J Theor Biol 216(3):361–365
Lounkine E, Keiser MJ, Whitebread S, Mikhailov D, Hamon J, Jenkins JL, Lavan P, Weber E, Doak AK, Côté S (2012) Large-scale prediction and testing of drug activity on side-effect targets. Nature 486(7403):361–367
Mayr A, Klambauer G, Unterthiner T, Hochreiter S (2016) DeepTox: toxicity prediction using deep learning. Front Environ Sci 3:80
Mcmaster A, Ray DW (2008) Drug insight: selective agonists and antagonists of the glucocorticoid receptor. Nat Clin Pract Endocrinol Metab 4(2):91–101
Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, Félix E, Magariños MP, Mosquera JF, Mutowo P, Nowotka M (2019) ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 47(D1):D930–D940
Mullard M, Cadé M, Morice S, Dupuy M, Danieau G, Amiaud J, Renault S, Lézot F, Brion R, Thepault RA (2020) Sonic hedgehog signature in pediatric primary bone tumors: effects of the GLI antagonist GANT61 on Ewing’s sarcoma tumor growth. Cancers 12(11):3438
Park K, Williams DP, Naisbitt DJ, Kitteringham NR, Pirmohamed M (2005) Investigation of toxic metabolites during drug development. Toxicol Appl Pharmacol 207(2):425–434
Paul SM, Mytelka DS, Dunwiddie CT, Persinger CC, Munos BH, Lindborg SR, Schacht AL (2010) How to improve R&D productivity: the pharmaceutical industry’s grand challenge. Nat Rev Drug Discov 9(3):203–214
Peng Y, Zhang Z, Jiang Q, Guan J, Zhou S (2019) Top: towards better toxicity prediction by deep molecular representation learning. IEEE, pp 318–325
Rampe D, Brown AM (2013) A history of the role of the hERG channel in cardiac risk assessment. J Pharmacol Toxicol Methods 68(1):13–22
Redfern WS, Wakefield ID, Prior H, Pollard CE, Hammond TG, Valentin JP (2002) Safety pharmacology – a progressive approach. Fundam Clin Pharmacol 16(3):161–173
Sherman C, Writer NN (2007) The defining features of drug intoxication and addiction can be traced to disruptions in cell-to-cell signaling. NIDA Notes NIH, NIDA 21(4)
Smith DA, Schmid EF (2006) Drug withdrawals and the lessons within. Curr Opin Drug Discov Dev 9(1):38–46
Sorkun MC, Mullaj D, Koelman JVA, Er S (2022) Chemplot, a python library for chemical space visualization
Suleyman H, Demircan B, Karagoz Y (2007) Anti-inflammatory and side effects of cyclo-oxygenase inhibitors. Pharmacol Rep 59(3):247
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, vol 30
Wang MW, Goodman JM, Allen TE (2020) Machine learning in predictive toxicology: recent applications and future directions for classification models. Chem Res Toxicol 34(2):217–239
Weiss A, Freeman W, Heslin K, Barrett M (2018) Adverse drug events in US hospitals, 2010 versus 2014. HCUP Statistical Brief 234
Wilbraham L, Sprick RS, Jelfs KE, Zwijnenburg MA (2019) Mapping binary copolymer property space with neural networks. Chem Sci 10(19):4973–4984

Chapter 26

Machine Learning for Analyzing Drug Safety in Electronic Health Records

Meijian Guan

M. Guan, Janssen Research and Development, LLC, Spring House, PA, USA. e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. H. Hong (ed.), Machine Learning and Deep Learning in Computational Toxicology, Computational Methods in Engineering & the Sciences, https://doi.org/10.1007/978-3-031-20730-3_26

26.1 Introduction

Adverse drug events (ADEs) are unintended, harmful events that are related to the use of medicines. ADEs are estimated to be the fourth leading cause of death in the United States and Canada, and the sixth leading cause of death worldwide (Hacker 2009). The definition of ADE has evolved over time; an ADE can result from drug-drug interactions, prescription errors, misuse or abuse, or withdrawal of the product. It is estimated that between 5 and 10% of patients may suffer from an ADE at admission, during admission or at discharge, despite various preventative efforts (Coleman and Pontefract 2016). As widespread ADEs can be a significant cause of morbidity and mortality, effective and accurate drug surveillance plays a key role in the protection of public health and the reduction of healthcare costs due to ADE-related hospital complications (Hakkarainen et al. 2012; Bates et al. 1997). Spontaneous reporting systems (SRS) (Polepalli et al. 2014) have traditionally been used for pharmacovigilance; however, significant and widespread underreporting of ADEs to these systems has been documented (Hazell and Shakir 2006). In addition, SRS databases do not always have consistent or complete medical history, comorbidity, and drug exposure information, and may therefore lead to inaccurate incidence reporting (Coloma et al. 2013).

Electronic health records (EHRs) have been considered a key component of the new and more proactive paradigm of pharmacovigilance because of their rich collection of detailed clinical information. These potential resources include electronic medical records with detailed clinical information such as patients’ symptoms, physical examination findings, diagnostic test results and prescribed medications or other interventions (Coloma et al. 2013). EHR databases have been used to monitor patient outcomes, carry out formal pharmacoepidemiologic studies, and confirm or refute potential drug safety signals detected initially by SRS (García Rodríguez and Pérez Gutthann 1998; Suissa and Garbe 2007; Kramarz et al. 2001).

One challenge of detecting safety signals in EHR databases is that a large fraction of the medical information in EHRs exists only in unstructured format, such as clinical notes (Turchin et al. 2009). According to one study, only 28.9% of the patients with documented statin side effects had the relevant ADE records in a structured format (Skentzos et al. 2011). Therefore, advanced machine learning (ML) and natural language processing (NLP) methods are needed to extract drug safety signals efficiently and accurately from unstructured EHRs. Several NLP systems, such as MetaMap (Aronson 2001), have been developed to map biomedical text to the unified medical language system (UMLS) and thereby generate ADE predictions. However, such approaches may miss other important drug information. Another obstacle is that different NLP systems have been evaluated against different standards, making it difficult to identify reproducible NLP methods. More recently, the advancement of ML algorithms such as deep neural networks has provided an opportunity to improve the performance of current medical NLP systems. Deep learning algorithms, for instance long short-term memory (LSTM) networks, have demonstrated promising results in a number of ADE detection projects that utilize medical narratives in EHR datasets (Jagannatha and Yu 2016; Tutubalina and Nikolenko 2017; Huang et al. 2015; Christopoulou et al. 2020; Jouffroy et al. 2021; Alfattni et al. 2021). While further validation is required, the rapidly evolving field of artificial intelligence may hold promise for a more effective pharmacovigilance system.
In this article, we will go through some examples of how the research community identifies key issues that cause ADEs using EHR datasets and a variety of ML and NLP methods. In addition, existing and emerging EHR-based ML approaches will be surveyed.

26.2 Drug Safety Problems to Solve with ML

ADEs are related to many factors. Here, we discuss the opportunities for using ML and EHRs to solve key tasks in pharmacovigilance: identifying prescription errors, medication misuse, and drug-drug interactions.

26.2.1 Prescription Error

Drug prescription errors are a common type of ADE that causes substantial morbidity, mortality, and economic cost, estimated at more than $20 billion annually in the United States (Aspden 2007; Andel et al. 2012). ML-based alerting systems have been developed to help identify prescription outliers and irregularities with patient’s

26 Machine Learning for Analyzing Drug Safety in Electronic Health Records

clinical records. MedAware (Raanana, Israel) is a commercial software system that analyzes historical EHRs and, for each medication, builds a computational model that captures the population likely to be prescribed the medication and the clinical environment in which it is likely to be prescribed. The model can then be used to identify significant statistical outliers given a patient’s clinical situation. The system demonstrated high accuracy and a low false-positive rate in previous studies (Schiff et al. 2017; Segal et al. 2019). Qi Li and colleagues developed a hybrid ML and NLP algorithm for automated medication discrepancy detection. The method comprises three steps: (1) identification of medication entities in clinical notes with an ML algorithm, (2) linkage of medication attributes with a rule-based method, and (3) medication-prescription matching with an NLP-based method for discrepancy discovery. The method was tested on an annotated gold-standard set of medication reconciliation data and achieved promising precision, recall, and F-value (Li et al. 2015). Recent studies showed that up to 50% of ADEs are related to high-alert drugs (HADs). Moreover, ADEs arising from HADs are associated with high mortality because of the narrow therapeutic window of these drugs (Lee et al. 2014). Multiple studies have focused on developing new ML-based approaches to screen for prescription errors involving HADs such as warfarin and digoxin (Hu et al. 2012; Roche-Lima et al. 2020; Corny et al. 2020; Wongyikul et al. 2021). Corny et al. tested a hybrid ML and rule-based expert system in a typical hospital setting to identify prescriptions with a high risk of medication error. An independent validation by a pharmacist suggested that the hybrid system was more accurate than a regular clinical decision support (CDS) system (Corny et al. 2020). Prescription errors also include dosage errors, which can cause life-threatening ADEs or diminished therapeutic effects.
As extreme dosage errors are very rare, Nagata et al. developed an unsupervised method based on a one-class support vector machine (OCSVM) to detect abnormal overdose or underdose prescriptions. Prescription data were extracted from the EHRs of Kyushu University Hospital to construct the overdose and underdose prescriptions. OCSVM detected the majority of the dosage errors (27/31) and outperformed other anomaly detection algorithms (Nagata et al. 2021). In summary, ML-based methods might play an essential role in detecting prescription errors and thus reduce the frequency of ADEs.
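To make the idea of unsupervised dose screening concrete, the following minimal sketch flags doses that lie far from the typical prescription for a drug. It is not the OCSVM of Nagata et al.; it substitutes a much simpler robust z-score on log-transformed doses (median and median absolute deviation) as a stand-in. The function name, the threshold of 3.5, and the dose values are all illustrative assumptions.

```python
import math
from statistics import median


def flag_dose_outliers(doses, threshold=3.5):
    """Return indices of doses whose robust z-score (on a log10 scale) exceeds threshold.

    Illustrative stand-in for an anomaly detector; not the OCSVM from the cited study.
    """
    # Work on a log scale so a 10x overdose and a 10x underdose are equally far away.
    logs = [math.log10(d) for d in doses]
    med = median(logs)
    # Median absolute deviation: a spread estimate that extreme values barely affect.
    mad = median(abs(x - med) for x in logs)
    if mad == 0:
        return []  # more than half the doses are identical; this sketch cannot score them
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [i for i, x in enumerate(logs) if abs(0.6745 * (x - med) / mad) > threshold]


# Hypothetical daily doses (mg) of one drug; the last two are 10x too high and 10x too low.
doses_mg = [5.0, 5.0, 7.5, 5.0, 2.5, 5.0, 7.5, 5.0, 50.0, 0.5]
outliers = flag_dose_outliers(doses_mg)
print(outliers)
```

A one-class model such as OCSVM can use several features at once (for example dose, age, and weight), whereas this sketch looks at a single variable. The shared idea is that only typical prescriptions are needed to define what counts as abnormal.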

26.2.2 Medication Misuse

Medication overdoses, and specifically opioid-related overdoses, remain a major public health problem in the United States. As a result of the opioid epidemic, more than 400,000 opioid overdose deaths were documented between 1999 and 2018, with 46,000 deaths in 2018 alone (Hedegaard et al. 2020; Wilson et al. 2020). Studies that develop risk prediction tools for opioid overdose have emerged in the last few years. Published

M. Guan

models were developed from a variety of data sources, including Veterans Health Administration (VHA) data, Medicare data, commercial insurance data, and county-level datasets (Lo-Ciganic et al. 2019; Sun et al. 2020; Palumbo et al. 2020; Segal et al. 2020; Hur et al. 2021; Marks et al. 2021; Ward et al. 2021). Compared to the general population, Veterans suffer a disproportionate impact from the opioid epidemic, including overdose, suicide, and death (Seal et al. 2012; Vowles et al. 2020). Using National VHA Corporate Data Warehouse (CDW) data, Ward et al. proposed a multivariate generalized linear mixed modeling (mGLMM) approach to predict overdose and suicide-related events (SRE) separately. It performed better in terms of AUC (84% vs. 77%) and sensitivity (71% vs. 66%) than the Stratification Tool for Opioid Risk Mitigation (STORM) implemented by the VHA (Ward et al. 2021). Lo-Ciganic et al. conducted a prognostic study to predict incident opioid use disorder (OUD) in 361,527 fee-for-service Medicare beneficiaries without cancer who filled more than one opioid prescription from 2011 to 2016 (Lo-Ciganic et al. 2020). They applied elastic net, random forest, gradient boosting machine, and deep neural network models to predict OUD. All four approaches achieved similar prediction performance in the validation cohort (C-statistics: 0.874–0.882), while the elastic net required the fewest predictors. The top two risk deciles of the elastic net showed the highest OUD rate and accounted for 69% (248 of 360) of all OUD cases (Lo-Ciganic et al. 2020). Recently, two studies used insurance claims data to predict OUD in patients (Segal et al. 2020; Hur et al. 2021). Commercial claims data are another valuable source for opioid overdose research, as they contain medical insurance claims, International Classification of Diseases (ICD) 9 and 10 diagnosis codes, details of pharmacy purchases, and patient demographics (Segal et al. 2020).
Segal et al. implemented a Word2Vec and gradient boosting trees algorithm to predict OUD from 10 million medical insurance claims. The model achieved a high C-statistic of 0.959 and identified a list of key variables that contribute to OUD risk, including intervertebral disk disorder-related complaints per year, post-laminectomy syndrome diagnoses per year, and pain disorder diagnoses per year (Segal et al. 2020). Hur et al. focused on predicting postoperative refills and persistent opioid use from preoperative insurance claims data in 112,898 opioid-naïve patients (Hur et al. 2021). The best model achieved an area under the receiver operating characteristic curve (AUROC) of 0.67 for predicting refills and 0.66 for predicting persistent use. In addition, they found that undergoing major surgery, opioid prescriptions within 30 days prior to surgery, and abdominal pain were useful in predicting refills, while back, joint, and head pain were the most important features in predicting new persistent opioid use (Hur et al. 2021). In the latest waves of the opioid crisis in the US, mortality has been concentrated within specific regions, primarily the Midwest, Appalachia, and New England (Ciccarone 2019; Centers for Disease Control and Prevention and National Center for Health Statistics 2013). Marks et al. developed a negative binomial modeling approach to identify counties at the highest risk of high overdose mortality in subsequent years (Marks et al. 2021). They validated the predicted annual county-level overdose death rates across the USA against observed overdose mortality data

collected between 2013 and 2018 and found that their model outperformed the benchmark model. This study showed that ML methods may help to identify high-mortality counties effectively in future emerging drug misuse epidemics (Marks et al. 2021). In summary, NLP and ML methods have the potential to improve clinical decision support for early intervention and prevention to combat the opioid epidemic.
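Several of the studies above report a C-statistic or AUROC; for a binary outcome the two are the same quantity. The sketch below computes it directly from its definition: the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. The labels and scores are invented for illustration and do not come from any cited study.

```python
def c_statistic(labels, scores):
    """Concordance statistic (AUROC) for binary labels (1 = case, 0 = non-case)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    # Compare every case with every non-case.
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0   # case ranked above non-case
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(pos) * len(neg))


# Hypothetical predicted risks for six patients, three of whom developed the outcome.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = c_statistic(labels, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of patients, so for cohorts of the size discussed above a rank-based formula or a library routine would be used instead. The definition is unchanged.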

26.2.3 Drug-Drug Interactions

Drug-drug interactions (DDIs) account for up to 30% of ADEs (Datta et al. 2021; Iyer et al. 2014). Combining multiple drugs often enhances therapeutic effect and selectivity; however, it can also lead to serious ADEs such as synergistic toxicity (García-Fuente et al. 2018; Datta et al. 2021). More and more patients are treated with multiple drugs simultaneously. Between 2009 and 2012, 38.1% of U.S. adults aged 18–44 years used three or more prescription drugs during a 30-day time window (Nahta et al. 2004; Chou 2010). Therefore, there is an urgent need to develop state-of-the-art methods to identify potential DDIs. The widespread adoption of EHRs provides a unique opportunity to identify previously known, as well as potentially unknown, ADEs (Wang et al. 2009; Banda et al. 2016). For example, Banda et al. investigated the feasibility of prioritizing drug-drug-event associations derived from EHRs using four different information sources (Banda et al. 2016). Iyer et al. used standard methods that measure the disproportionality of ADE mentions to detect DDIs from 50 million clinical notes (Iyer et al. 2014). They demonstrated that clinical notes can achieve performance as good as that of established methods on SRSs (Iyer et al. 2014). The increasing prevalence of EHRs has made it possible to study multiple medication exposures simultaneously. However, it is still challenging to avoid systematic bias across many results and to improve the signal-to-noise ratio (Vajravelu et al. 2018). Vajravelu et al. developed a novel method, medication class enrichment analysis (MCEA), to identify biologically relevant findings while analyzing multiple pharmacologic exposures simultaneously (Vajravelu et al. 2018). They applied MCEA to The Health Improvement Network database to analyze medications associated with Clostridium difficile infection (CDI).
With the help of MCEA, they were able to narrow down 47 pharmacologic classes that were associated with CDI to only fluoroquinolones, a class of antibiotics with a biologically confirmed association with CDI (Vajravelu et al. 2018). Although the results showed the promise of MCEA, additional studies are necessary to confirm its superior performance. In a 2021 study, Datta et al. investigated drug interactions with non-steroidal anti-inflammatory drugs (NSAIDs) that result in drug-induced liver injury (DILI) (Datta et al. 2021). They applied an ML algorithm to an EHR dataset of about 400,000 hospitalizations to detect several known interactions. The proposed method successfully identified 87.5% of the positive controls that were known to interact with diclofenac and cause increased risk of DILI (Datta et al.

2021). Moreover, they identified a novel and potentially hepatotoxic interaction between meloxicam and esomeprazole, which are commonly prescribed together for NSAID-induced gastrointestinal (GI) bleeding (Datta et al. 2021). Their approach also outperformed all the compared prior methods across most metrics (Datta et al. 2021). Patrick et al. focused on predicting DDIs between drugs used to treat psoriasis and its comorbidities by combining molecular data and medical claims (Patrick et al. 2021). The proposed pipeline integrated 984 transcriptomic datasets, the molecular structures of 2159 drugs, and medical claims from 150 million patients to predict interactions between 37,611 drug pairs used to treat psoriasis and its comorbidities (Patrick et al. 2021). Their method achieved an area under the receiver operating characteristic curve (AUROC) above 0.9 for differentiating 11,861 known DDIs from 25,750 non-DDI drug pairs. The novel DDIs that they identified were confirmed through independent data sources and supported by EHRs (Patrick et al. 2021). In addition to traditional EHR datasets, the extended EHR, such as social media, provides a new channel for identifying ADEs in patients. According to a survey, 72% of Internet users went online to seek health information (White et al. 2016). White et al. performed a large-scale study on Web search logs to detect specific DDIs (White et al. 2013). Yang and Yang proposed discovering DDIs from MedHelp.org, a popular online health community (Yang and Yang 2016). In addition, social media platforms with a large volume of users, for example Twitter and Instagram, have shown great potential in detecting DDIs (Carbonell et al. 2015; Hamed et al. 2015; Correia et al. 2016). Although social media offers new possibilities for studying ADEs and DDIs, its data are usually noisier and it has had limited real-world application so far. Additional efforts are necessary to establish and understand the role of social media in pharmacovigilance.
Predicting DDIs is a difficult task, but by applying ML algorithms to EHR datasets combined with other data sources, such as pharmacologic databases, the literature, molecular data, and social media, we might be able to detect DDI events more accurately and efficiently in the future.
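Disproportionality measures of the kind mentioned above for mining clinical notes compare how often an event is mentioned with and without a given exposure. As a minimal sketch, assuming a simple 2x2 contingency table, the code below computes an odds ratio with an approximate 95% confidence interval (Woolf's logit method). The counts are hypothetical, and this is one common disproportionality statistic, not necessarily the one used in the cited studies.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate confidence interval from a 2x2 table.

    a: exposed to the drug pair, event mentioned
    b: exposed to the drug pair, event not mentioned
    c: not exposed, event mentioned
    d: not exposed, event not mentioned
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this simple estimate")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts of patients whose notes mention the drug pair and the adverse event.
or_, lo, hi = odds_ratio_with_ci(a=40, b=960, c=100, d=8900)
is_signal = lo > 1.0  # a common screening rule: lower confidence bound above 1
print(round(or_, 2), round(lo, 2), round(hi, 2), is_signal)
```

A raw odds ratio ignores confounding by indication and co-medication, which is one reason the studies above combine such statistics with additional information sources before prioritizing a signal.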

26.3 Recent Trends of NLP and ML Methods in Pharmacovigilance

The abundance of information in the unstructured EHR datasets makes NLP a vital tool in detecting ADEs. The main subtasks include annotating unstructured text with named medical entities (e.g., drugs and diseases) and identifying the relations between these annotated entities (relation identification) (Fig. 26.1). Biomedical named entity recognition (NER), relation identification (RI), and the combined task have been active research areas.

Fig. 26.1 Common workflow in an NLP project for ADE identification. This diagram demonstrates common subtasks in an EHR-based NLP project, including free-text data cleaning and processing, medical entity recognition, and drug-ADE relation identification
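The two subtasks in this workflow can be illustrated with a small sketch. It assumes that an upstream tagger has already labeled each token with BIO tags (B = beginning, I = inside, O = outside an entity); here the tags are written by hand. The label names DRUG and ADE, the example sentence, and the function names are illustrative assumptions. The code decodes the tags into entity spans and then enumerates candidate drug-ADE pairs, which a relation identification model would go on to accept or reject.

```python
def decode_bio(tokens, tags):
    """Turn token-level BIO tags into a list of (label, text) entities."""
    entities, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(current)
            current = (tag[2:], [tok])          # start a new entity
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(tok)              # extend the open entity
        else:
            if current:
                entities.append(current)        # close the open entity
            current = None
    if current:
        entities.append(current)
    return [(label, " ".join(words)) for label, words in entities]


def candidate_pairs(entities):
    """Pair every drug with every ADE mention; a classifier would filter these."""
    drugs = [text for label, text in entities if label == "DRUG"]
    ades = [text for label, text in entities if label == "ADE"]
    return [(drug, ade) for drug in drugs for ade in ades]


# Hypothetical tagged sentence; in practice the tags come from an NER model.
tokens = ["Patient", "developed", "muscle", "pain", "after", "starting", "atorvastatin", "."]
tags = ["O", "O", "B-ADE", "I-ADE", "O", "O", "B-DRUG", "O"]
entities = decode_bio(tokens, tags)
pairs = candidate_pairs(entities)
print(entities, pairs)
```

Enumerating all pairs within a sentence is the simplest candidate generator; relations that span sentences need a wider window.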

26.3.1 Existing NLP Approaches

Existing NLP methods for identifying ADEs in EHRs can be roughly grouped into rule-based, lexicon-based, supervised machine learning, and hybrid approaches. Rule-based methods, such as keyword and trigger phrase search, usually rely on an expert-generated list of keywords and trigger phrases and are especially efficient for a specific disease or pharmaceutical agent. For example, Murff et al. developed a tool that searched free-text discharge summaries for trigger words representing possible ADEs based on expert opinion (Murff et al. 2003). Cantor et al. studied the problem of identifying ADEs from ambulatory care notes for outpatients and identified a list of trigger phrases that correlated highly with the occurrence of ADEs (Cantor et al. 2007). The keyword- and trigger phrase-based approach has been limited to specific instances because of the difficulty of manually constructing and maintaining the keyword and trigger phrase collection. Another widely used method is to extend an existing biomedical NLP system to identify ADEs; the most commonly used systems include MedLEE (Friedman et al. 2004), MetaMap (Aronson 2001), and cTAKES (Savova et al. 2010). For instance, several studies used MedLEE to recognize named entities such as medications and potential adverse events in unstructured data from hospital records (Wang et al. 2010; Friedman 2009; Haerian et al. 2012). In addition, existing NLP systems have often been used together with medical ontology databases such as UMLS to normalize the identified entities (Humphreys et al. 1998; Duke and Friedlin 2010). Although extending existing medical NLP systems has had some success in detecting ADEs, this approach inherits the limitations of those systems, such as a lack of annotation or the need for additional assertion classification tools (Luo et al. 2017).
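A trigger phrase search of the kind described above can be sketched in a few lines. The phrase list below is invented for illustration and is not taken from Murff et al. or Cantor et al.; the function name and the context window size are likewise assumptions. The sketch returns each matched phrase with some surrounding text so that a reviewer can judge whether it describes a true ADE.

```python
import re

# Hypothetical trigger phrases; a real list would be curated by clinical experts.
TRIGGER_PHRASES = [
    "adverse reaction to",
    "allergic to",
    "discontinued due to",
    "side effect of",
]
# One case-insensitive pattern that matches any of the phrases literally.
PATTERN = re.compile("|".join(re.escape(p) for p in TRIGGER_PHRASES), re.IGNORECASE)


def find_triggers(note, window=40):
    """Return (phrase, context) for every trigger phrase found in a clinical note."""
    hits = []
    for match in PATTERN.finditer(note):
        start = max(0, match.start() - window)
        end = min(len(note), match.end() + window)
        hits.append((match.group(0).lower(), note[start:end]))
    return hits


# Invented example note.
note = "Simvastatin was discontinued due to myalgia. Patient is allergic to penicillin."
hits = find_triggers(note)
for phrase, context in hits:
    print(phrase, "->", context)
```

The sketch also shows the maintenance problem noted above: every new phrasing, misspelling, or negation ("not allergic to") has to be added or handled by hand.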

26.3.2 Machine Learning Methods

ML-centered approaches have gained traction over the last decade. One benefit of ML approaches is that they can quantitatively assess the likelihood that a candidate drug-disease or drug-symptom pair is a true ADE (Luo et al. 2017). Numerous ML algorithms have been applied in this area, ranging from traditional statistical models to deep neural networks. Visweswaran et al. annotated terms in the EHR with UMLS and implemented a Bayesian model to evaluate whether a patient had ADEs (Visweswaran et al. 2003). In addition, linear models, hidden Markov models (HMMs), and support vector machines (SVMs) have all been used to extract information from EHRs and recognize named entities (Sampathkumar et al. 2014; Ramesh et al. 2014; Ward et al. 2021). Conditional random fields (CRFs) are probabilistic graphical models that have been recognized as reliable, high-performance algorithms for labeling entities in biomedical text because of their ability to segment and label sequence data (Lafferty et al. 2001). For example, Aramaki et al. (2010) performed NER for

drugs and symptoms with CRFs. They then used SVMs with rule-based patterns as features to classify relations between drugs and symptoms and identify true ADEs (Aramaki et al. 2010). Nikfarjam et al. applied word embeddings and CRFs to predict ADEs from user posts on DailyStrength and Twitter (Nikfarjam et al. 2015). Henriksson et al. used CRFs to identify relevant named entities (disorders, symptoms, and drugs) and labeled the attributes of the recognized entities (negation, speculation, and temporality) using random forests (Henriksson et al. 2015). More recently, deep learning models, especially recurrent neural network (RNN) models, have demonstrated promising results in sequence tagging and NER tasks because of their ability to learn from the context surrounding the words in a sequence (Lipton et al. 2015). RNNs are neural networks with additional weights that create cycles in the network, allowing them to model time dependencies and sequential events (Williams and Zipser 1995). Long short-term memory (LSTM) networks are variants of RNNs that are effective at learning long-term dependencies between words in a sequence (Hochreiter and Schmidhuber 1997). Bidirectional LSTM (BiLSTM), a modified version of LSTM that processes sequential data in both directions, has been used to process medical text and has outperformed non-deep learning tools in ADE detection tasks (Roberts et al. 2015; Wang et al. 2016; Wunnava et al. 2019; Christopoulou et al. 2020). Furthermore, the attention mechanism is a technique that enables neural networks to focus selectively on specific information, and it has been applied to relation classification (Bahdanau et al. 2014; Zhou et al. 2016). BiLSTM with an attention mechanism has achieved promising results in NER and ADE relation classification in clinical narratives (Dandala et al. 2019; Christopoulou et al. 2020).
The combination of RNN and CRF has also been applied to NER tasks and found to be effective (Jagannatha and Yu 2016; Tutubalina and Nikolenko 2017; Huang et al. 2015; Christopoulou et al. 2020; Jouffroy et al. 2021; Alfattni et al. 2021). Most of the deep learning models developed for NER tasks use word embeddings to represent text data. A word embedding technique takes a large corpus of text as its input and produces a high-dimensional vector space in which each unique word in the corpus is assigned a corresponding vector. These word embeddings can be trained either on domain-specific text (e.g., EHR notes and PubMed articles) (Choi et al. 2016) or on a wide variety of general text (e.g., Wikipedia articles) (Pennington et al. 2014). Although a growing number of NLP systems have been created for EHR-based pharmacovigilance, most of them have been assessed on different datasets and standards, making it challenging to compare their performance. Therefore, two NLP challenges, Medication and Adverse Drug Events from Electronic Health Records (MADE 1.0) (Jagannatha et al. 2019) and the 2018 National NLP Clinical Challenges (n2c2) Shared Task Track 2 (Henry et al. 2020), were organized to evaluate the detection of medical entities, the classification of relations between named entities, and the joint task of both on expert-annotated free-text EHR data. The results from these two challenges suggested that deep neural networks outperform traditional machine learning techniques for the NLP tasks, especially for NER and RI (Table 26.1). However, further improvement is still necessary for the joint NER-RI task. For

Table 26.1 Top-performing solutions in the MADE 1.0 and n2c2 Shared Task Track 2 challenges for joint named entity recognition (NER) and relation identification (RI)

| Challenge | Team name | Solution description |
| --- | --- | --- |
| MADE 1.0 | IBMResearch-dandala (Dandala et al. 2019) | Two-step model with BiLSTM-CRF for NER and a combined BiLSTM and attention network for RI |
| MADE 1.0 | UArizonaIschool-Xu (Xu et al. 2018) | Two-step model with BiLSTM-CRF for NER and an SVM-based model for RI |
| MADE 1.0 | UofUtah-Patterson (Chapman et al. 2019) | Two-step model with CRF for NER and a random forest model for RI |
| MADE 1.0 | ASU-BMI (Magge et al. 2018) | Two-step model with BiLSTM-CRF for NER and random forest models for RI |
| n2c2 | UTHealth/Dalian (UTH) (Xu et al. 2017) | ADDRESS, a BiLSTM-CRF-based joint NER and RI system |
| n2c2 | University of Florida (UFL) | Two-step model with BiLSTM-CRF for NER and SVM for RI |
| n2c2 | NaCTeM at University of Manchester/Toyota Technological Institute/AIST (NaCT) (Christopoulou et al. 2020) | BiLSTM-CRF for NER and an ensemble method with BiLSTM and attention mechanism/transformer network for RI |
| n2c2 | Medical University of South Carolina (MSC) (Lafferty et al. 2001) | Two-step model with stacked generalization ensemble for NER and SVM for RI |
| n2c2 | VA Salt Lake City/University of Utah (VA) | Two-step model with BiLSTM-CRF and externally trained CRF for NER and two stages of random forests for RI |

As indicated in the table, the combined model with bidirectional long short-term memory (BiLSTM) neural network and conditional random field (CRF) dominated the NER subtask, while multiple ML algorithms achieved encouraging performance in RI subtask

example, incorporating the larger context or outside knowledge that captures long-distance or implicit drug-ADE relations will likely improve the performance of future systems.
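The attention mechanism described in this section can be illustrated with a small numerical sketch. It assumes that a sequence encoder such as a BiLSTM has already produced one hidden vector per token. It scores each token against a query vector with a dot product, normalizes the scores with a softmax, and returns the weighted sum as a context vector. Dot-product scoring is used here for brevity; the additive attention of Bahdanau et al. scores tokens with a small feed-forward network instead. The vectors and the query are invented values, and in a trained model they would be learned.

```python
import math


def attention_pool(hidden_states, query):
    """Softmax attention over token vectors; returns (weights, context vector)."""
    # Relevance score of each token: dot product with the query vector.
    scores = [sum(h_k * q_k for h_k, q_k in zip(h, query)) for h in hidden_states]
    # Numerically stable softmax: subtract the maximum before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted average of the token vectors.
    dim = len(query)
    context = [sum(w * h[k] for w, h in zip(weights, hidden_states)) for k in range(dim)]
    return weights, context


# Hypothetical 2-dimensional hidden states for three tokens and a hypothetical query.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
weights, context = attention_pool(hidden, query)
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

In a relation classifier, the context vector would be passed to an output layer that decides whether a drug and an ADE mention are related, and the weights indicate which tokens influenced that decision most.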

26.4 Discussions

Early detection and prevention of ADEs is a key component of safer and higher-quality health care. EHRs contain a rich collection of detailed clinical information that allows real-time and accurate monitoring of drug safety. As discussed in this article, a wide variety of ML and NLP approaches have been explored to discover ADEs such as medication errors, medication misuse, and DDIs in the unstructured narratives of EHRs. Powerful ML algorithms, especially those based on deep learning, have shown great results in their ability to detect medical entities and capture the relationships

between them. Despite the exciting advancements in the methods, there are significant challenges in deploying and utilizing those approaches in pharmacovigilance systems.

1. Although mining EHRs with advanced ML algorithms can yield results equivalent to or even better than those of traditional SRS systems, clinical narratives continue to be an underutilized source of data for identifying unreported ADEs. In fact, EHR databases are most often used to validate ADE signals that were initially detected in SRS databases (Coloma et al. 2013; Vilar et al. 2018). Increasing adoption of EHR-based pharmacovigilance systems would help to further validate and advance their abilities.
2. Healthcare organizations often lack the computational infrastructure needed to implement the latest text processing techniques, which limits their adoption of more advanced EHR- and ML-based pharmacovigilance systems (Ching et al. 2018). Therefore, it is essential to develop lightweight NLP systems that are functional and effective with limited computational resources.
3. Despite rapid growth in the number of EHR-based ML systems, universal standards for evaluating system performance and reproducibility are still lacking. In addition, language variability and local environmental differences between clinical organizations limit the adoption of NLP solutions across organizational boundaries (Carrell et al. 2017). Increased adoption of the new systems, as well as more collaborative efforts from the research community such as MADE 1.0 (Jagannatha et al. 2019) and the 2018 National NLP Clinical Challenges (n2c2) Shared Task Track 2 (Henry et al. 2020), will help to address this issue.
4. Finally, the rapid development of systems will outpace regulatory activities. Validation of advanced technologies is critical to ensure that these systems remain reproducible and fit for purpose (OHDSI 2017). Therefore, developing systems that support both regulatory and clinical use cases will be necessary.

EHR-based ML pharmacovigilance systems have shown great potential to improve drug safety and thus help to build a better healthcare system for the future. However, significant efforts from the research communities, health organizations, and regulatory agencies are necessary to achieve these goals.

References

Alfattni G, Belousov M, Peek N, Nenadic G (2021) Extracting drug names and associated attributes from discharge summaries: text mining study. JMIR Med Inform 9(5):e24678
Andel C, Davidow SL, Hollander M, Moreno DA (2012) The economics of health care quality and medical errors. J Health Care Finance 39(1):39–50. PMID:23155743
Aramaki E, Miura Y, Tonoike M, Ohkuma T, Masuichi H, Waki K, Ohe K (2010) Extraction of adverse drug effects from clinical records. Stud Health Technol Inform 160(Pt 1):739–743
Aronson AR (2001) Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In: Proceedings of the AMIA symposium, pp 17–21

Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473
Banda JM, Callahan A, Winnenburg R, Strasberg HR, Cami A, Reis BY, Vilar S, Hripcsak G, Dumontier M, Shah NH (2016) Feasibility of prioritizing drug-drug-event associations found in electronic health records. Drug Saf 39(1):45–57
Bates DW, Spell N, Cullen DJ, Burdick E, Laird N, Petersen LA, Small SD, Sweitzer BJ, Leape LL (1997) The costs of adverse drug events in hospitalized patients. JAMA 277(4):307–311
Cantor MN, Feldman HJ, Triola MM (2007) Using trigger phrases to detect adverse drug reactions in ambulatory care notes. Qual Saf Health Care 16(2):132–134
Carbonell P, Mayer MA, Bravo À (2015) Exploring brand-name drug mentions on Twitter for pharmacovigilance. Stud Health Technol Inform 210:55–59
Carrell DS, Schoen RE, Leffler DA, Morris M, Rose S, Baer A, Crockett SD, Gourevitch RA, Dean KM, Mehrotra A (2017) Challenges in adapting existing clinical natural language processing systems to multiple, diverse health care settings. J Am Med Inform Assoc 24(5):986–991
Centers for Disease Control and Prevention, National Center for Health Statistics (2013) Underlying cause of death 1999–2013 on CDC WONDER online database, released 2015. Data are from the multiple cause of death files
Chapman AB, Peterson KS, Alba PR, DuVall SL, Patterson OV (2019) Detecting adverse drug events with rapidly trained classification models. Drug Saf 42(1):147–156
Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow PM, Zietz M, Hoffman MM, Xie W, Rosen GL, Lengerich BJ, Israeli J, Lanchantin J, Woloszynek S, Carpenter AE, Shrikumar A, Xu J, Cofer EM, Lavender CA, Turaga SC, Alexandari AM, Lu Z, Harris DJ, DeCaprio D, Qi Y, Kundaje A, Peng Y, Wiley LK, Segler MHS, Boca SM, Swamidass SJ, Huang A, Gitter A, Greene CS (2018) Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface 15(141):20170387
Choi Y, Chiu CY, Sontag D (2016) Learning low-dimensional representations of medical concepts. AMIA joint summits on translational science proceedings, pp 41–50
Chou TC (2010) Drug combination studies and their synergy quantification using the Chou-Talalay method. Cancer Res 70(2):440–446
Christopoulou F, Tran TT, Sahu SK, Miwa M, Ananiadou S (2020) Adverse drug events and medication relation extraction in electronic health records with ensemble deep learning methods. J Am Med Inform Assoc 27(1):39–46
Ciccarone D (2019) The triple wave epidemic: supply and demand drivers of the US opioid overdose crisis. Int J Drug Policy 71:183–188
Coleman JJ, Pontefract SK (2016) Adverse drug reactions. Clin Med (Lond) 16(5):481–485
Coloma PM, Trifirò G, Patadia V, Sturkenboom M (2013) Postmarketing safety surveillance: where does signal detection using electronic healthcare records fit into the big picture? Drug Saf 36(3):183–197
Corny J, Rajkumar A, Martin O, Dode X, Lajonchère JP, Billuart O, Bézie Y, Buronfosse A (2020) A machine learning-based clinical decision support system to identify prescriptions with a high risk of medication error. J Am Med Inform Assoc 27(11):1688–1694
Correia RB, Li L, Rocha LM (2016) Monitoring potential drug interactions and reactions via network analysis of Instagram user timelines. In: Biocomputing 2016: proceedings of the Pacific symposium, vol 21, pp 492–503
Dandala B, Joopudi V, Devarakonda M (2019) Adverse drug events detection in clinical notes by jointly modeling entities and relations using neural networks. Drug Saf 42(1):135–146
Datta A, Flynn NR, Barnette DA, Woeltje KF, Miller GP, Swamidass SJ (2021) Machine learning liver-injuring drug interactions with non-steroidal anti-inflammatory drugs (NSAIDs) from a retrospective electronic health record (EHR) cohort. PLoS Comput Biol 17(7):e1009053
Duke JD, Friedlin J (2010) ADESSA: a real-time decision support service for delivery of semantically coded adverse drug event data. In: AMIA annual symposium proceedings, pp 177–181

Friedman C (2009) Discovering novel adverse drug events using natural language processing and mining of the electronic health record. J Biomed Inform 5651:1–5
Friedman C, Shagina L, Lussier Y, Hripcsak G (2004) Automated encoding of clinical documents based on natural language processing. J Am Med Inform Assoc 11(5):392–402
García-Fuente A, Vázquez F, Viéitez JM, Alonso FJG, Martín JI, Ferrer J (2018) CISNE: an accurate description of dose-effect and synergism in combination therapies. Sci Rep 8(1):1–9
García Rodríguez LA, Pérez Gutthann S (1998) Use of the UK general practice research database for pharmacoepidemiology. Br J Clin Pharmacol 45(5):419–425
Hacker M (2009) Chapter 13: adverse drug reactions. In: Pharmacology. Academic Press, Cambridge
Haerian K, Varn D, Vaidya S, Ena L, Chase HS, Friedman C (2012) Detection of pharmacovigilance-related adverse events using electronic health records and automated methods. Clin Pharmacol Ther 92(2):228–234
Hakkarainen KM, Hedna K, Petzold M, Hägg S (2012) Percentage of patients with preventable adverse drug reactions and preventability of adverse drug reactions: a meta-analysis. PLoS One 7(3):e33236
Hamed AA, Wu X, Erickson R, Fandy T (2015) Twitter K-H networks in action: advancing biomedical literature for drug search. J Biomed Inform 56:157–168
Hazell L, Shakir SA (2006) Under-reporting of adverse drug reactions: a systematic review. Drug Saf 29(5):385–396
Hedegaard H, Miniño AM, Warner M (2020) Drug overdose deaths in the United States, 1999–2018
Henriksson A, Kvist M, Dalianis H, Duneld M (2015) Identifying adverse drug event information in clinical notes with distributional semantic representations of context. J Biomed Inform 57:333–349
Henry S, Buchan K, Filannino M, Stubbs A, Uzuner O (2020) 2018 n2c2 shared task on adverse drug events and medication extraction in electronic health records. J Am Med Inform Assoc 27(1):3–12
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Hu YH, Wu F, Lo CL, Tai CT (2012) Predicting warfarin dosage from clinical data: a supervised learning approach. Artif Intell Med 56(1):27–34
Huang Z, Xu W, Yu K (2015) Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991
Humphreys BL, Lindberg DA, Schoolman HM, Barnett GO (1998) The unified medical language system: an informatics research collaboration. J Am Med Inform Assoc 5(1):1–11
Hur J, Tang S, Gunaseelan V, Vu J, Brummett CM, Englesbe M, Waljee J, Wiens J (2021) Predicting postoperative opioid use with machine learning and insurance claims in opioid-naïve patients. Am J Surg 222(3):659–665
Iyer SV, Harpaz R, LePendu P, Bauer-Mehren A, Shah NH (2014) Mining clinical text for signals of adverse drug-drug interactions. J Am Med Inform Assoc 21(2):353–362
Jagannatha AN, Yu H (2016) Structured prediction models for RNN based sequence labeling in clinical text. In: Proceedings of the conference on empirical methods in natural language processing, pp 856–865
Jagannatha A, Liu F, Liu W, Yu H (2019) Overview of the first natural language processing challenge for extracting medication, indication, and adverse drug events from electronic health record notes (MADE 1.0). Drug Saf 42(1):99–111
Jouffroy J, Feldman SF, Lerner I, Rance B, Burgun A, Neuraz A (2021) Hybrid deep learning for medication-related information extraction from clinical texts in French: MedExt algorithm development study. JMIR Med Inform 9(3):e17934
Kramarz P, France EK, Destefano F, Black SB, Shinefield H, Ward JI, Chang EJ, Chen RT, Shatin D, Hill J, Lieu T, Ogren JM (2001) Population-based study of rotavirus vaccination and intussusception. Pediatr Infect Dis J 20(4):410–416

608

M. Guan

Lafferty JD, McCallum A, Fernando CNP (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the eighteenth international conference on machine learning, pp 282–289 Lee J, Han H, Ock M, Lee SI, Lee S, Jo MW (2014) Impact of a clinical decision support system for high-alert medications on the prevention of prescription errors. Int J Med Inform 83(12):929–940 Li Q, Spooner SA, Kaiser M, Lingren N, Robbins J, Lingren T, Tang H, Solti I, Ni Y (2015) An end-to-end hybrid algorithm for automated medication discrepancy detection. BMC Med Inform Decis Mak 15:37 Lipton ZC, Berkowitz J, Elkan C (2015) A critical review of recurrent neural networks for sequence learning. arXiv preprint: arXiv:1506.00019 Lo-Ciganic WH, Huang JL, Zhang HH, Weiss JC, Wu Y, Kwoh CK, Donohue JM, Cochran G, Gordon AJ, Malone DC, Kuza CC, Gellad WF (2019) Evaluation of machine-learning algorithms for predicting opioid overdose risk among medicare beneficiaries with opioid prescriptions. JAMA Netw Open 2(3):e190968 Lo-Ciganic WH, Huang JL, Zhang HH, Weiss JC, Kwoh CK, Donohue JM, Gordon AJ, Cochran G, Malone DC, Kuza CC, Gellad WF (2020) Using machine learning to predict risk of incident opioid use disorder among fee-for-service Medicare beneficiaries: a prognostic study. PLoS One 15(7):e0235981 Luo Y, Thompson WK, Herr TM, Zeng Z, Berendsen MA, Jonnalagadda SR, Carson MB, Starren J (2017) Natural language processing for EHR-based pharmacovigilance: a structured review. Drug Saf 40(11):1075–1089 Magge A, Scotch M, Gonzalez-Hernandez G (2018) Clinical NER and relation extraction using bichar-LSTMs and random forest classifiers. 
In: International workshop on medication and adverse drug event detection, pp 25–30 Marks C, Abramovitz D, Donnelly CA, Carrasco-Escobar G, Carrasco-Hernández R, Ciccarone D, González-Izquierdo A, Martin NK, Strathdee SA, Smith DM, Bórquez A (2021) Identifying counties at risk of high overdose mortality burden during the emerging fentanyl epidemic in the USA: a predictive statistical modelling study. Lancet Public Health 6(10):e720–e728 Murff HJ, Forster AJ, Peterson JF, Fiskio JM, Heiman HL, Bates DW (2003) Electronically screening discharge summaries for adverse medical events. J Am Med Inform Assoc 10(4):339–350 Nagata K, Tsuji T, Suetsugu K, Muraoka K, Watanabe H, Kanaya A, Egashira N, Ieiri I (2021) Detection of overdose and underdose prescriptions—an unsupervised machine learning approach. PLoS One 16(11):e0260315 Nahta R, Hung MC, Esteva FJ (2004) The HER-2-targeting antibodies trastuzumab and pertuzumab synergistically inhibit the survival of breast cancer cells. Cancer Res 64(7):2343–2346 Nikfarjam A, Sarker A, O’Connor K, Ginn R, Gonzalez G (2015) Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features. J Am Med Inform Assoc 22(3):671–681 OHDSI (2017) Large-scale adverse effects related to treatment evidence standardization (LAERTES): an open scalable system for linking pharmacovigilance evidence sources with clinical data. J Biomed Semantics 8(1):11 Palumbo SA, Adamson KM, Krishnamurthy S, Manoharan S, Beiler D, Seiwell A, Young C, Metpally R, Crist RC, Doyle GA, Ferraro TN, Li M, Berrettini WH, Robishaw JD, Troiani V (2020) Assessment of probable opioid use disorder using electronic health record documentation. JAMA Netw Open 3(9):e2015909 Patrick MT, Bardhi R, Raja K, He K, Tsoi LC (2021) Advancement in predicting interactions between drugs used to treat psoriasis and its comorbidities by integrating molecular and clinical resources. 
J Am Med Inform Assoc 28(6):1159–1167 Pennington J, Socher R, Manning C (2014) Glove: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543

26 Machine Learning for Analyzing Drug Safety in Electronic Health Records

609

Polepalli RB, Belknap SM, Li Z, Frid N, West DP, Yu H (2014) Automatically recognizing medication and adverse event information from food and drug administration’s adverse event reporting system narratives. JMIR Med Inform 2(1):e10 Ramesh BP, Belknap SM, Li Z, Frid N, West DP, Yu H (2014) Automatically recognizing medication and adverse event information from food and drug administration’s adverse event reporting system narratives. JMIR Med Inform 2(1):e3022 Roberts K, Simpson MS, Voorhees EM, Hersh WR (2015) Overview of the TREC 2015 clinical decision support track. In: Proceedings of the annual text retrieval conference Roche-Lima A, Roman-Santiago A, Feliu-Maldonado R, Rodriguez-Maldonado J, NievesRodriguez BG, Carrasquillo-Carrion K, Ramos CM, da Luz SI, Massey SE, Duconge J (2020) Machine learning algorithm for predicting warfarin dose in Caribbean Hispanics using pharmacogenetic data. Front Pharmacol 10:1550 Sampathkumar H, Chen XW, Luo B (2014) Mining adverse drug reactions from online healthcare forums using hidden Markov model. BMC Med Inform Decis Mak 14:91 Savova GK, Masanz JJ, Ogren PV, Zheng J, Sohn S, Kipper-Schuler KC, Chute CG (2010) Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc 17(5):507–513 Schiff GD, Volk LA, Volodarskaya M, Williams DH, Walsh L, Myers SG, Bates DW, Rozenblum R (2017) Screening for medication errors using an outlier detection system. J Am Med Inform Assoc 24(2):281–287 Seal KH, Shi Y, Cohen G, Cohen BE, Maguen S, Krebs EE, Neylan TC (2012) Association of mental health disorders with prescription opioids and high-risk opioid use in US veterans of Iraq and Afghanistan. JAMA 307(9):940–947 Segal G, Segev A, Brom A, Lifshitz Y, Wasserstrum Y, Zimlichman E (2019) Reducing drug prescription errors and adverse drug events by application of a probabilistic, machine-learning based clinical decision support system in an inpatient setting. 
J Am Med Inform Assoc 26(12):1560–1565 Segal Z, Radinsky K, Elad G, Marom G, Beladev M, Lewis M, Ehrenberg B, Gillis P, Korn L, Koren G (2020) Development of a machine learning algorithm for early detection of opioid use disorder. Pharmacol Res Perspect 8(6):e00669 Skentzos S, Shubina M, Plutzky J, Turchin A (2011) Structured vs. unstructured: factors affecting adverse drug reaction documentation in an EMR repository. In: AMIA annual symposium proceedings, pp 1270–1279 Suissa S, Garbe E (2007) Primer: administrative health databases in observational studies of drug effects—advantages and disadvantages. Nat Clin Pract Rheumatol 3(12):725–732 Sun JW, Franklin JM, Rough K, Desai RJ, Hernández-Díaz S, Huybrechts KF, Bateman BT (2020) Predicting overdose among individuals prescribed opioids using routinely collected healthcare utilization data. PLoS One 15(10):e0241083 Turchin A, Shubina M, Breydo E, Pendergrass ML, Einbinder JS (2009) Comparison of information content of structured and narrative text data sources on the example of medication intensification. J Am Med Inform Assoc 16(3):362–370 Tutubalina E, Nikolenko S (2017) Combination of deep recurrent neural networks and conditional random fields for extracting adverse drug reactions from user reviews. J Healthc Eng 2017:9451342 Vajravelu RK, Scott FI, Mamtani R, Li H, Moore JH, Lewis JD (2018) Medication class enrichment analysis: a novel algorithm to analyze multiple pharmacologic exposures simultaneously using electronic health record data. J Am Med Inform Assoc 25(7):780–789 Vilar S, Friedman C, Hripcsak G (2018) Detection of drug-drug interactions through data mining studies using clinical sources, scientific literature and social media. Brief Bioinform 19(5):863– 877 Visweswaran S, Hanbury P, Saul M, Cooper GF (2003) Detecting adverse drug events in discharge summaries using variations on the simple Bayes model. In: AMIA annual symposium proceedings, pp 689–693

610

M. Guan

Vowles KE, Witkiewitz K, Cusack KJ, Gilliam WP, Cardon KE, Bowen S, Edwards KA, McEntee ML, Bailey RW (2020) Integrated behavioral treatment for veterans with co-morbid chronic pain and hazardous opioid use: a randomized controlled pilot trial. J Pain 21(7–8):798–807 Wang X, Hripcsak G, Markatou M, Friedman C (2009) Active computerized pharmacovigilance using natural language processing, statistics, and electronic health records: a feasibility study. J Am Med Inform Assoc 16(3):328–337 Wang X, Chase H, Markatou M, Hripcsak G, Friedman C (2010) Selecting information in electronic health records for knowledge acquisition. J Biomed Inform 43(4):595–601 Wang Y, Rastegar MM, Elayavilli RK, Liu S, Liu H (2016) An ensemble model of clinical information extraction and information retrieval for clinical decision support. TREC Ward R, Weeda E, Taber DJ, Axon RN, Gebregziabher M (2021) Advanced models for improved prediction of opioid-related overdose and suicide events among veterans using administrative healthcare data. Health Serv Outcomes Res Methodol 1–21 White RW, Tatonetti NP, Shah NH, Altman RB, Horvitz E (2013) Web-scale pharmacovigilance: listening to signals from the crowd. J Am Med Inform Assoc 20(3):404–408 White RW, Wang S, Pant A, Harpaz R, Shukla P, Sun W, DuMouchel W, Horvitz E (2016) Early identification of adverse drug reactions from search log data. J Biomed Inform 59:42–48 Williams RJ, Zipser D (1995) Gradient-based learning algorithms for recurrent networks and their computational complexity. In: Zipser D, Williams RJ (eds) Backpropagation: theory, architectures, and applications. L. Erlbaum Associates Inc., Hillsdale, pp 433–486 Wilson N, Kariisa M, Seth P, Smith H IV, Davis NL (2020) Drug and opioid-involved overdose deaths—United States, 2017–2018. Morb Mortal Wkly Rep 69(11):290 Wongyikul P, Thongyot N, Tantrakoolcharoen P, Seephueng P, Khumrin P (2021) High alert drugs screening using gradient boosting classifier. 
Sci Rep 11(1):20132 Wunnava S, Qin X, Kakar T, Sen C, Rundensteiner EA, Kong X (2019) Adverse drug event detection from electronic health records using hierarchical recurrent neural networks with dual-level embedding. Drug Saf 42(1):113–122 Xu J, Lee HJ, Ji Z, Wang J, Wei Q, Xu H (2017) UTH_CCB system for adverse drug reaction extraction from drug labels at TAC-ADR 2017. In: Proceedings of the text analysis conference Xu D, Yadav V, Bethard S (2018) UArizona at the MADE1. 0 NLP challenge. In: International workshop on medication and adverse drug event detection, pp 57–65 Yang H, Yang CC (2016) Discovering drug-drug interactions and associated adverse drug reactions with triad prediction in heterogeneous healthcare networks. In: 2016 IEEE international conference on healthcare informatics (ICHI) Chicago, IEEE, pp 244–254 Zhou P, Shi W, Tian J, Qi Z, Li B, Hao H, Xu B (2016) Attention-based bidirectional long short-term memory networks for relation classification. In: Proceedings of the 54th annual meeting of the association for computational linguistics, vol 2, pp 207–212

Chapter 27

Powering Toxicogenomic Studies by Applying Machine Learning to Genomic Sequencing and Variant Detection

Li Tai Fang

27.1 Introduction

Many diseases have underlying genomic components. While hereditary diseases are usually driven by germline variants, cancers are primarily caused by somatic mutations that allow cells to grow and divide uncontrollably. After cancer cells have gained the ability to keep dividing without ever undergoing apoptosis, cancer evolution through continuous somatic mutations also enables them to evade anticancer therapies. Many toxins and mutagens in the environment induce mutations in DNA, thus increasing the chances that a healthy cell will eventually acquire the right combination of driver mutations for tumorigenesis. Identifying mutations in cancer is a critical step in understanding cancer biology and the environmental factors that play a role in causing cancer.

The advent of next-generation sequencing (NGS) has allowed entire human genomes to be sequenced quickly and cheaply. As of the writing of this chapter, it costs less than 10 cents to sequence one million base pairs (bps) of DNA using NGS. Thus, human whole-genome sequencing (at 30-fold per-base coverage) can be achieved for well under $1000 (NHGRI). It has thus become increasingly common for cancer researchers to sequence the tumor and the matched normal sample (typically blood or adjacent healthy tissue) from the same patient to find all the somatic mutations in the tumor. In paired tumor-normal sequencing, somatic mutations are pinpointed at genomic coordinates that have variant base(s) in the tumor and only reference base(s) in the matched normal.

Second-generation sequencing technologies (Slatko et al. 2018) are marked by a high throughput (billions) of short reads (hundreds of bps). Second-generation

L. T. Fang (B) Freenome, 279 East Grand Ave, South San Francisco, CA 94080, USA e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Hong (ed.), Machine Learning and Deep Learning in Computational Toxicology, Computational Methods in Engineering & the Sciences, https://doi.org/10.1007/978-3-031-20730-3_27


sequencing technologies are currently the primary NGS platforms used in translational research and clinical settings. For simplicity, NGS in this chapter refers to second-generation sequencing unless otherwise specified. The sequencing libraries determine what is actually sequenced, e.g.:

1. Whole-genome sequencing (WGS), where the genomic DNA is fragmented using sonication or restriction enzymes into the optimal lengths for sequencing, i.e., hundreds of bps long. The sequencing library consists of DNA fragments with universal adapters attached; which fragments end up being sequenced is mostly random. To ensure that the entirety of the genome is sequenced, a standard practice is to sequence the whole genome to an average depth of at least 30×, so that the vast majority of the genome will have > 10× sequencing depth to enable confident genotyping across the whole genome (Meynert et al. 2014).

2. Targeted sequencing, where some genomic regions of interest are selected and enriched in the sequencing library. Whole-exome sequencing (WES) libraries are a popular choice; they select only for known coding regions of the genome, such that the sequencing will not be "wasted" on less valuable "junk" DNA and introns. WES libraries are usually sequenced at > 100× to account for additional biases that come with target capture. Clinical sequencing for cancer often targets genomic regions that are consequential for diagnosis, prognosis, or treatment, and these regions are often sequenced at multiples of 1000× to capture variants with low clonality. Targeted sequencing can be performed with two methods:

(a) Hybrid capture, which uses probes with sequences complementary to the regions of interest to physically capture those DNA fragments. Fragments captured even by the same probe will have different lengths and sequences due to the randomness of the fragmentation process.

(b) PCR enrichment, which uses specific primer sequences to replicate and amplify the regions of interest. The enriched DNA fragments from the same region will usually have identical sequences due to the nature of PCR replication.

3. Sequencing strategies to measure RNA expression, methylation, protein-DNA interaction, etc., are out of scope for this chapter.

Due to its low cost and high throughput, NGS is an excellent technology to detect small variants, i.e., single nucleotide variations (SNV) or small indels. Large-scale aberrations like copy number variation and chromosomal rearrangement can also be detected with NGS, but good accuracy is difficult to achieve for genomic aberrations that are orders of magnitude larger than the lengths of the sequencing reads (Sedlazeck et al. 2018; Gong et al. 2021). In this chapter, we focus on the application of machine learning methods to detect small variants.

Today, labs that have adopted current best practices in NGS can achieve very high accuracy in germline variant calling in the vast majority of the human genome, i.e., excluding low-complexity, low-mappability, or highly polymorphic regions (Krusche et al. 2019). Since germline variants are mostly a matter of heredity, they are much


easier to detect than somatic mutations because reads supporting germline variants are expected to make up either 50% (heterozygous) or 100% (homozygous) of all the reads covering the variant candidate position. Thus, genomic positions with a low fraction of variant-supporting reads can usually be discarded as noise. On the other hand, somatic mutations are relatively rare (Kandoth et al. 2013), and the heterogeneity and complex copy number profiles of tumors (Storchova and Kuffer 2008) give rise to variant allele frequencies (VAF) that range from nearly 0 to 100%, so low-VAF calls cannot simply be discarded as noise. As a result, the false positives in a somatic mutation call set often outnumber actual somatic mutations by orders of magnitude.

The state-of-the-art sequencing error rate for the widely adopted Illumina platform is approximately 0.1% (Stoler and Nekrutenko 2021). A 30× whole-genome sequencing (WGS) run has approximately 9 × 10^10 total bases sequenced, resulting in approximately 90 million base call errors that may lead to false positive mutation calls based on sequencing errors alone. Nor do we completely understand the sequencing error profiles of different sequence contexts, e.g., low GC content, homopolymers, or repetitive sequences. Informatic challenges such as mapping short reads to an incomplete reference genome, especially for the above-mentioned challenging sequences, make the accurate detection of somatic mutations particularly difficult on a genomic scale.
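The error estimate above follows from simple arithmetic, which the short sketch below reproduces. The genome size of 3 billion bp is an assumed round number (it is implied by, but not stated in, the 9 × 10^10 figure), and the function name is purely illustrative.

```python
# Back-of-the-envelope estimate of base-call errors in one 30x WGS run.
GENOME_SIZE_BP = 3_000_000_000  # assumption: approximate size of the human genome
MEAN_DEPTH = 30                 # 30x whole-genome sequencing
ERROR_RATE = 0.001              # ~0.1% per-base error rate quoted in the text


def expected_base_errors(genome_size, depth, error_rate):
    """Return (total bases sequenced, expected number of erroneous base calls)."""
    total_bases = genome_size * depth
    return total_bases, round(total_bases * error_rate)


total_bases, expected_errors = expected_base_errors(GENOME_SIZE_BP, MEAN_DEPTH, ERROR_RATE)
print(f"{total_bases:.1e} bases sequenced, ~{expected_errors:,} expected base-call errors")
```

Only a small fraction of these errors will pile up at the same position and mimic a variant, but with tens of millions of erroneous base calls even a tiny fraction can rival the number of true somatic mutations.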

27.2 Machine Learning in Genomic Variant Detection

In the context of genomic variant detection, machine learning (ML) classifiers are computer-aided pattern recognition tools that predict whether a non-reference base call is a real variant or a false positive based on the biological (e.g., common variant position), genomic (e.g., mappability and complexity of the genomic region), sequencing (e.g., read quality), and bioinformatic (e.g., mapping confidence) features of each variant candidate. To build machine learning classifiers for variant detection, a training data set must be assembled. The training data is a list of variant candidates that contains both real variants and false positives, along with the features associated with each candidate. The ML algorithm learns to distinguish true positives from false positives based on the features associated with each class.

For "classic" machine learning algorithms, the features are chosen manually based on researchers' experience and expertise. Some features highly predictive of false positives are mapping quality scores, alignment edit distances, base quality scores, common variant positions, and the statistical differences in those metrics between variant-supporting and reference-supporting reads (Fang et al. 2015). More recently, there has been much active research on adapting deep learning algorithms that directly use read alignment images to build variant classifiers.

Table 27.1 lists a number of open-source machine learning-based variant detection algorithms for NGS. In this chapter, we will not attempt to exhaustively review all available machine learning software in this space or their technical details. Rather, we will


describe representative machine learning software tools, each of which has taken a different general approach, and discuss their performances and limitations.

Table 27.1 Partial list of open-source machine learning-based variant calling software (ordered by the time of publication)

Software: SomaticSeq
Brief description: Uses XGBoost implemented in Python to filter out false positives from combined call sets collected from an ensemble of somatic mutation callers. The model uses manually curated genomic and sequencing features
References: Fang et al. (2015)
Source repo: http://github.com/bioinform/somaticseq

Software: DeepVariant
Brief description: Built on top of TensorFlow to classify haplotypes in germline variant detection. Inputs are pileup images of reads covering the variant candidates, with extra genomic and sequencing metrics encoded as additional channels. Has been used to build germline variant detection in multiple platforms, but does not currently support somatic mutation detection
References: Poplin et al. (2018)
Source repo: https://github.com/google/deepvariant

Software: Cerebro
Brief description: Uses random forest implemented in scikit-learn. Model uses manually curated genomic and sequencing features to remove false positives from somatic mutation candidates that are captured by two aligners and survived some pre-determined hard filters
References: Wood et al. (2018)
Source repo: https://github.com/PGDX/cerebro-paper

Software: SMuRF
Brief description: Uses random forest implemented in R to filter out false positives from combined call sets collected from an ensemble of somatic mutation callers. Model uses features that are directly output by the callers
References: Huang et al. (2019)
Source repo: https://github.com/skandlab/SMuRF

Software: NeuSomatic
Brief description: Uses deep-CNN implemented in PyTorch to classify somatic mutations, where the input image is a summary of base call tables within a specified window, with extra channels describing additional genomic and sequencing features. Focused on somatic mutations in Illumina sequencing data, but the algorithm has been modified to handle germline variants with other long-read platforms
References: Sahraeian et al. (2019)
Source repo: https://github.com/bioinform/neusomatic

Software: DeepSSV
Brief description: Uses deep-CNN implemented in TensorFlow, where the inputs are mixed pileup images from both tumor and normal aligned reads, with extra sequencing and alignment quality metrics encoded as additional channels
References: Meng et al. (2021)
Source repo: https://github.com/jingmeng-bioinformatics/DeepSSV

Software: SICaRiO
Brief description: Uses XGBoost with genomic context as features to remove false positive indel calls from any arbitrary call sets
References: Bhuyan et al. (2021)
Source repo: https://bitbucket.org/islam2059/sicario

Software: Octopus
Brief description: Primarily a genotyper but accounts for arbitrary and variable ploidy, takes into account biological priors. Additionally applies a random forest model to remove false positives. Detects both germline and somatic mutations
References: Cooke et al. (2021)
Source repo: https://github.com/luntergroup/octopus

27.2.1 Machine Learning Algorithms in Germline Variant Detection

One of the most widely used variant calling bioinformatic pipelines in NGS is the Broad Institute's Genome Analysis Toolkit (GATK). The series of bioinformatic tasks that make up the germline variant detection workflow are (1) adapter and low-quality base trimming, (2) read alignment, (3) PCR and optical duplicate removal,¹ (4) base quality score recalibration (BQSR), and (5) variant quality score recalibration (VQSR). The workflow is widely known as the GATK Best Practice (Auwera et al. 2013). Even though the "GATK Best Practice" as a whole is not generally regarded as a machine learning algorithm, BQSR and VQSR are two machine learning procedures that improve the accuracy and estimate the confidence of the variant calls (i.e., assign a probability estimate that a variant call is a true positive).

The base quality (BQ) score is a Phred-scaled estimate, emitted by the sequencer, of the probability that a base is identified incorrectly. The majority of Illumina sequencers' base quality scores range between Q30 (i.e., implying the base call is 99.9% correct) and Q40 (i.e., implying 99.99% correct). BQSR is a machine learning-based process that recalibrates the base quality scores to better reflect their true error rates in different sequencing contexts. More accurate base error estimation will improve variant quality scores (i.e., a false positive variant call can be a result of a base call error). The BQSR process treats reference calls and variant base calls at common single nucleotide polymorphisms (SNP), such as those in the dbSNP (Sherry 2001) or ExAC (Lek et al. 2016) databases, as correct base calls (i.e., labeled true positives), whereas variant base calls elsewhere are treated as mostly errors (i.e., labeled false positives). Then, for all the base calls in the sequencing data, it will recalibrate each base quality score based on the empirical error rate estimated from those inputs, with input features

¹ PCR duplicate removal is based on the assumption that DNA fragments with the same start and end positions in the genome are rare events due to the randomness of fragmentation, so they must be PCR duplicates and should be counted only once. However, PCR enrichment-based targeted sequencing uses PCR to duplicate reads in order to increase signal, so this step cannot be used for such libraries.


such as the sequencing context, its position on the sequencing reads, and the original base quality scores. VQSR follows the same concept to calibrate the scores of variant calls, where users can define certain input features such as depth of coverage, depth of variant allele coverage, mapping quality scores, position on the reads, etc. The support vector machine (SVM) estimator recalibrates BQ and assigns variant quality scores for each data set.

One of the best known machine learning-based variant detectors in high-throughput sequencing today is DeepVariant, a deep convolutional neural network (CNN)-based algorithm developed by Google. Its initial publication used eight WGS data sets created by Illumina short-read sequencers from the Genome in a Bottle Consortium (GIAB) to create a very successful classifier (Poplin et al. 2018). Since then, Google has incorporated additional training data sets and improved its algorithm to create models that are more accurate and more widely applicable to different data types, including PacBio and Oxford Nanopore sequencing data (DeepVariant GitHub).

Before their use in genomic sequencing, CNN-based models had been widely adopted for image classification. To adjudicate difficult variant candidates, scientists would often look at aligned reads in a genome browser and decide manually, based on their expertise, whether a variant candidate is a real variant or a false positive. Thus, a genomic variant problem can be transformed into an image classification problem, making CNNs well-suited to classify genomic variants in NGS data. Instead of manually selecting genomic and sequencing features associated with each variant candidate, DeepVariant takes as input the pileup image of all the reads spanning the variant candidate, where each base can be represented as a pixel (Fig. 27.1). Based on labeled training data (e.g., GIAB), DeepVariant is able to learn from features encoded in those images and classify them into three classes: homozygous variant, heterozygous variant, or false positive variant. DeepVariant and its subsequent versions have demonstrated outstanding accuracy by winning four out of 12 categories in the precisionFDA v2 Truth Challenge, i.e., (1) all benchmarking regions with PacBio, (2) difficult-to-map regions with Oxford Nanopore, (3) all benchmarking regions with Oxford Nanopore, and (4) all benchmarking regions with Illumina, PacBio, and Oxford Nanopore (Olson et al. 2021).

A major advantage of image-based deep learning approaches like DeepVariant is that they can be applied to new sequencing technologies with relative ease. DeepVariant was originally developed with data from Illumina sequencing platforms, yet it was applied to data from the PacBio and Oxford Nanopore sequencing platforms without a large amount of additional research input. Essentially, the only extra work needed to create DeepVariant classifiers for PacBio and/or Oxford Nanopore data was to sequence the GIAB reference samples on these platforms, and then use the same high-confidence call set as labels to train new models. Thus, comprehensive and well-characterized reference data sets are critical for machine learning models like DeepVariant.
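To make the Phred scale behind the base quality scores in this section concrete, the sketch below converts between quality scores and error probabilities, and computes an empirical quality for a bin of base calls from its observed mismatch count. It illustrates the general idea of recalibration only and is not GATK's implementation; the function names and the pseudo-count smoothing are assumptions made for this example.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability P to a Phred-scaled quality score, Q = -10 log10(P)."""
    return -10 * math.log10(p)


def empirical_quality(mismatches, total):
    """Empirical Phred quality for a bin of base calls.

    `mismatches` counts non-reference calls at sites not known to be variant.
    The +1/+2 pseudo-counts are an illustrative smoothing choice (an assumption),
    used here so that a bin with zero mismatches does not get infinite quality.
    """
    return error_prob_to_phred((mismatches + 1) / (total + 2))


print(phred_to_error_prob(30))  # error probability implied by Q30
print(phred_to_error_prob(40))  # error probability implied by Q40

# A hypothetical bin of 10,000 bases reported as Q30 but showing 50 mismatches
# is behaving more like Q23, so its scores would be lowered.
print(round(empirical_quality(50, 10_000), 1))
```

In a real recalibration the bins would be defined by covariates such as the reported quality, the position on the read, and the sequence context, as described above.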

Fig. 27.1 IGV snapshot representing the pileup image for DeepVariant, where a window 221 bp in length (i.e., 110 bps to the left and 110 bps to the right of the variant position, represented by 221 pixels in width) and 100 reads in sequencing depth (represented by 100 pixels in height) is created for each variant candidate. Additional channels representing the base call quality (BQ) of each base, the mapping quality (MQ) of each read, the strand of each read (forward or reverse), the base calls at the variant position, and all non-reference bases in the window are also part of the features. Note that DeepVariant's actual pileup image is a matrix where each base or genomic position is represented by a single pixel value. Different bases are represented by different matrix values that can be visualized as four different intensities on a gray scale image. This IGV snapshot is not the actual input matrix, but simply a human-friendly visual representation
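The window geometry described in the caption of Fig. 27.1 can be sketched in a few lines. This is a simplified illustration, not DeepVariant's actual encoder: the intensity values, the `encode_pileup` function, and the restriction to gap-free alignments are all assumptions made for this example, and only two of the channels (bases and base qualities) are built.

```python
WINDOW_FLANK = 110             # 110 bp on each side of the variant position
WIDTH = 2 * WINDOW_FLANK + 1   # 221 columns (pixels) per row
HEIGHT = 100                   # up to 100 reads, one row per read

# Hypothetical gray-scale intensities for the four bases; 0 means "no base here".
BASE_INTENSITY = {"A": 250, "C": 180, "G": 100, "T": 30}


def encode_pileup(reads, variant_pos):
    """Build the base channel and base-quality channel for one variant candidate.

    reads: list of (start, sequence, base_qualities) with 0-based start
           coordinates and gap-free alignments (a simplification).
    Returns two HEIGHT x WIDTH matrices (lists of lists), zero-padded.
    """
    window_start = variant_pos - WINDOW_FLANK
    base_channel = [[0] * WIDTH for _ in range(HEIGHT)]
    qual_channel = [[0] * WIDTH for _ in range(HEIGHT)]
    for row, (start, seq, quals) in enumerate(reads[:HEIGHT]):
        for offset, (base, qual) in enumerate(zip(seq, quals)):
            col = start + offset - window_start
            if 0 <= col < WIDTH:
                base_channel[row][col] = BASE_INTENSITY.get(base, 0)
                qual_channel[row][col] = qual
    return base_channel, qual_channel


# Two toy reads overlapping a candidate at position 1000 (center column 110).
reads = [
    (995, "ACGTACGTAC", [30] * 10),
    (998, "TAGGTTCGTA", [35] * 10),
]
bases, quals = encode_pileup(reads, variant_pos=1000)
print(len(bases), len(bases[0]))        # image height and width
print(bases[0][110], bases[1][110])     # pixel values at the candidate position
```

Stacking such matrices as channels yields a fixed-size tensor per candidate, which is what lets a standard image-classification CNN be trained on variant candidates.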


27.2.2 Challenges in Somatic Mutation Calling

A germline variant may occur in either one or both alleles in the diploid human genome, corresponding to variant allele frequencies (VAF) of either 50% or 100% in the sequencing reads (except in relatively rare cases of copy number aberrations or aneuploidy). However, the same assumptions cannot be made for somatic mutations. A tumor sample has different clonal populations and far more complex copy number aberration profiles (Storchova and Kuffer 2008), resulting in a full range of possible 0 < VAF ≤ 1, i.e., from a mutation that occurred in a single cell up to a homozygous mutation. Thus, detecting somatic mutations is not simply finding genomic positions where the predicted genotypes differ between a tumor sample and its matched normal sample. Long gone are the early days when "naive subtraction" was used to detect somatic mutations in cancer NGS data, i.e., re-purposing germline variant callers to find mutation candidates in the cancer genome, and then filtering out those variants that are also called in the matched normal (Pleasance et al. 2010).

More recently, due to the greater challenges of somatic mutation detection, a plethora of software has become available. Each tool was designed specifically to call somatic mutations in paired tumor-normal NGS data, but was often optimized for different data characteristics (e.g., different sequencing depths, WGS vs. target capture vs. PCR enrichment) and sample types (e.g., tumors of different mutation burdens, tumor purities, tumor-normal cross-contamination, etc.). Their performances do not translate universally. Widely used somatic mutation callers include but are not limited to: MuTect2 (Benjamin et al. 2019), VarScan2 (Koboldt et al. 2012), SomaticSniper (Larson et al. 2012), VarDict (Lai et al. 2016), Strelka2 (Kim et al. 2018), MuSE (Fan et al. 2016), TNscope (Freed et al. 2018), and Lancet (Narzisi et al. 2018). Each caller takes into account challenges specific to somatic mutations, i.e., (1) widely ranging variant allele frequencies (VAF) due to tumor heterogeneity or tumor purity, (2) vastly different somatic mutation rates depending on cancer type (Alexandrov et al. 2013), which are usually lower than the sequencing error rate of current technologies, and (3) the expected level of tumor-normal cross-contamination.
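The limitation of "naive subtraction" can be illustrated with a toy example. The read counts, the position labels, and the 25% genotyping threshold below are all hypothetical and do not come from any real caller; the point is only that a germline-style VAF cutoff silently drops a low-VAF subclonal mutation.

```python
def vaf(alt_reads, total_reads):
    """Variant allele frequency: fraction of reads supporting the variant."""
    return alt_reads / total_reads if total_reads else 0.0


# Hypothetical (variant-supporting reads, total depth) at three positions.
tumor = {"chr1:1000": (15, 30), "chr2:2000": (3, 60), "chr3:3000": (30, 60)}
normal = {"chr1:1000": (14, 28), "chr2:2000": (0, 40), "chr3:3000": (0, 35)}

GERMLINE_MIN_VAF = 0.25  # illustrative germline-style cutoff (an assumption)


def naive_subtraction(tumor_counts, normal_counts, min_vaf=GERMLINE_MIN_VAF):
    """Call variants in each sample with a germline-style VAF cutoff, then subtract."""
    tumor_calls = {p for p, (alt, depth) in tumor_counts.items() if vaf(alt, depth) >= min_vaf}
    normal_calls = {p for p, (alt, depth) in normal_counts.items() if vaf(alt, depth) >= min_vaf}
    return tumor_calls - normal_calls


somatic = naive_subtraction(tumor, normal)
print(sorted(somatic))
print(vaf(*tumor["chr2:2000"]))  # tumor-only variant at 5% VAF that was not called
```

Here chr1:1000 is correctly removed as germline and chr3:3000 is kept, but chr2:2000, which is supported by tumor reads only, falls below the cutoff and is lost. Lowering the cutoff would recover it at the price of admitting many sequencing errors, which is the trade-off that dedicated somatic callers are designed to manage.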

27.2.3 Machine Learning to Improve Accuracy of Somatic Mutation Detection

A logistic regression approach combined the degree of consensus among three somatic mutation callers with a small number of genomic and sequencing features (i.e., sequencing depth, substitution types, and VAF) and metrics from the callers themselves (e.g., mutation scores) to improve the accuracy of somatic mutation calls (Kim et al. 2014). Inspired by this approach, SomaticSeq maximizes sensitivity by incorporating up to 11 current somatic mutation callers, and then uses over 100 biological, genomic, sequencing, and bioinformatic features to filter out false positives and obtain much better overall accuracy (Fang et al. 2015). SomaticSeq placed #1 in indel and #2 in SNV in Stage 5 of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge by training on sequencing data and truth sets from the previous stages (DREAM Challenge). SomaticSeq has been incrementally improved since then, e.g., by adding more predictive features such as linguistic sequence complexity (Troyanskaya et al. 2002) and by replacing AdaBoost with the better computationally optimized XGBoost. One limitation of “classic” machine learning approaches like SomaticSeq is that they still require human expertise to manually determine which genomic and sequencing features should be included in the model. Each feature’s predictive value is not universal across different data types. For example, in Illumina short-read data, a read with a large number of mismatch and indel calls in addition to the variant candidate is likely a poor-quality or mismapped read. This feature can be summarized as the edit distance of a read, and reads with a large edit distance can be discounted or ignored entirely. However, in error-prone long-read platforms such as PacBio and Oxford Nanopore, multiple mismatches or indel alignments in a single long read represent expected behavior. Features that may be predictive in long-read data types were not implemented in SomaticSeq because it was designed with short-read data types in mind. Building new SomaticSeq models for new platforms will require new expertise and additional research input. In addition, some read-level information that may be obvious in a pileup image representation (e.g., a cluster of mismatches and/or indel events) can be very difficult to summarize as manually curated features.
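The ensemble idea described above (take the union of candidates from several callers, then describe each candidate with consensus and sequencing features for a downstream classifier) can be sketched as follows. This is a minimal illustration and not SomaticSeq's actual code or feature set: the caller names, the feature names, and all values are hypothetical, and the classifier training step is omitted.

```python
def build_feature_rows(calls_by_caller, site_features):
    """Merge per-caller call sets into one feature row per candidate site.

    calls_by_caller: dict mapping a caller name to a set of sites, where a
        site is a (chrom, pos, ref, alt) tuple.
    site_features: dict mapping a site to extra hand-picked features.
    """
    callers = sorted(calls_by_caller)
    # Union of all candidates: keeps sensitivity high before any filtering.
    candidates = set().union(*calls_by_caller.values())
    rows = {}
    for site in sorted(candidates):
        # One 0/1 flag per caller records which callers reported the site.
        row = {f"called_by_{name}": int(site in calls_by_caller[name])
               for name in callers}
        # Degree of consensus, computed before other features are added.
        row["num_callers"] = sum(row.values())
        row.update(site_features.get(site, {}))
        rows[site] = row
    return rows


# Hypothetical toy inputs (not real caller output).
calls = {
    "caller_a": {("chr1", 1000, "C", "T"), ("chr1", 2000, "G", "A")},
    "caller_b": {("chr1", 1000, "C", "T")},
    "caller_c": {("chr1", 3000, "A", "G")},
}
features = {
    ("chr1", 1000, "C", "T"): {"tumor_vaf": 0.21, "tumor_depth": 85},
    ("chr1", 2000, "G", "A"): {"tumor_vaf": 0.03, "tumor_depth": 90},
    ("chr1", 3000, "A", "G"): {"tumor_vaf": 0.02, "tumor_depth": 40},
}
rows = build_feature_rows(calls, features)
```

Each row would then be labeled true or false against a truth set and passed to a classifier such as the boosted trees mentioned in the text. Which columns to include remains a manual decision, which is the limitation the paragraph above points out.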
Most recently, with the development and increasing availability of deep learning software (e.g., TensorFlow by Google, PyTorch by Facebook, etc.), using deep learning approaches to detect somatic mutations in genomic sequencing data has become an active area of research. In theory, the developers of DeepVariant should be able to extend the algorithm to somatic mutations with modest modifications. Nevertheless, developing deep learning models for somatic mutations poses additional challenges. Somatic mutation detection requires at least double the input data because of tumor-normal paired sequencing. In addition, sequencing for the purpose of detecting somatic mutations is often done at considerably higher depth because of the wide range of somatic VAFs. Each of these demands greater computing resources, potentially making the approach inaccessible to all but the most tech-capable organizations. However, a more fundamental problem may be the lack of training data for somatic mutation detection. The initial release of DeepVariant was trained on eight GIAB WGS data sets. Each WGS data set has nearly 4 million high-confidence germline variants that can be used as training data, so eight WGS data sets provide over 30 million germline variants, ensuring training data of great technical and biological diversity for a highly complex machine learning algorithm. In contrast, no similar resource existed for somatic mutation data sets until recently, when the Sequencing Quality Control Phase 2 Consortium (SEQC2) released over 20 multi-center and multi-platform WGS data sets with corresponding high-confidence somatic mutations for one pair of tumor-normal breast cancer cell

lines (Fang et al. 2021). However, even SEQC2’s WGS somatic reference samples have only about 40,000 unique somatic variants because of the relative rarity of somatic mutations versus germline variants in a genome. A complex deep learning algorithm with large input data matrices requires a very large training data set to capture all the diverse variations of somatic mutations in their genomic and sequencing contexts. With somatic mutations having a greater range of VAFs than germline variants, 40,000 somatic mutations may still be insufficiently diverse in biological and genomic contexts to train a robust model for highly complex image classifiers like DeepVariant. While highly complex models like DeepVariant require large amounts of computing resources and training data to build robust somatic mutation models, NeuSomatic, also a CNN-based algorithm, developed at Roche Sequencing Solutions, explored a less complex and less demanding model (Sahraeian et al. 2019). Instead of using all individual reads within an alignment window (akin to a pileup image) as the input matrices, NeuSomatic compresses this information into a summary table (Fig. 27.2). This strategy has the advantage of smaller matrices as input features, which require less complex training data and can be trained with a simplified network architecture, enabling a substantially more efficient implementation. It is also highly scalable to high sequencing depth data. In the DeepVariant scheme, high sequencing depth data have to be down-sampled to fit the window height (i.e., 100 pixels

Fig. 27.2 Example of the input matrices for the NeuSomatic model. First, the reads are mapped to the reference genome, then the total number of base calls at each genomic position within the window is compressed into the summary tables shown on the right. The model will classify whether the genomic position in the middle of the summary table is a somatic variant or not. The matrices represent the number of gaps (i.e., the “-” on top), A, C, G, and T at each genomic coordinate within the window. Image was adapted from Sahraeian et al. (2019)

corresponds to a maximum of 100× in depth), which causes loss of potentially relevant sequencing information. Alternatively, the software needs to be reconfigured to accept much larger images, which drastically increases the requirement for computational resources. The NeuSomatic scheme, in contrast, does not require a larger matrix for higher sequencing depth, as demonstrated in Fig. 27.2. The trade-off is that some of the detailed read-level information shown in Fig. 27.1 is lost (Fig. 27.3). To recover some of the read-level information without using the entirety of a pileup image, NeuSomatic includes a number of additional input channels that contain summary metrics related to individual read alignment, e.g., edit distances in both variant- and reference-supporting reads. For the example in Fig. 27.3, the first alignment results in average edit distances of 3 for variant-supporting reads and 0 for reference-supporting reads. The second alignment results in average edit distances of 1 for variant-supporting reads and 1.33 for reference-supporting reads. Despite some inevitable loss of information in the NeuSomatic data input strategy compared with full pileup input, this approach has proven to be highly capable by achieving the best accuracies in two out of 12 categories in the precisionFDA v2 Truth Challenge: (1) difficult-to-map regions with Illumina, PacBio, and Oxford Nanopore data and (2) tied for all benchmark regions with Illumina, PacBio, and Oxford Nanopore data (Olson et al. 2021). Since NeuSomatic’s initial publication, in which the model was trained on in silico spike-in data, the developers have trained the model on a diverse set of sequencing data types from SEQC2’s tumor-normal reference data and call sets.
Various models were trained and tested from a total of 119 sequencing replicates with millions of somatic mutation instances from eight sequencing centers of WGS and WES with different sequencing depths, fresh and formalin-fixed paraffin-embedded (FFPE) DNA, admixtures of tumor-normal DNA representing different levels of tumor-normal cross contamination (5–100% purities for tumor, and 95–100% purities for normal), different sonication durations resulting in different DNA fragment lengths, and different DNA input amounts (1 ng to 1 µg) to create classifiers representing each, some, and all of those experimental conditions (Sahraeian et al. 2022). In general, classifiers trained with a variety of data are more robust across a variety of data, whereas classifiers trained with one specific data type are more accurate, but only on testing data of the same type. In other words, if you want the best accuracy for somatic mutation detection in FFPE samples, you should train only with FFPE data. Conversely, a NeuSomatic classifier trained with all of the data specified above will be quite robust and accurate in tumor-normal somatic mutation detection across most short-read sequencing data.

Fig. 27.3 Toy example showing two different alignments with different pileup images representing two different situations, but the same summary table used by NeuSomatic. In each of the two alignments, the top three reads are reference-supporting reads. The bottom two reads are variant-supporting reads. The dashed vertical lines indicate the variant candidate. In the first alignment, all the mismatches in this window other than the variant in question are carried by the two variant-supporting reads, i.e., both variant-supporting reads have two more mismatches in addition to the variant candidate (i.e., edit distances of 3), whereas the three reference-supporting reads are perfectly matched to the reference (edit distances of 0). In the second alignment, the variant-supporting reads have no mismatch other than the variant candidate (edit distances of 1). The additional mismatches in the window are carried by the three reference-supporting reads (edit distances of 1, 1, and 2). The two scenarios represent two different issues that cannot be distinguished from the base call summary table. However, some of the missing information can be encoded in alignment metrics such as edit distances
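The situation in Fig. 27.3 can be reproduced with a small toy example. The sketch below is not NeuSomatic code; the reference sequence, read contents, and function names are hypothetical and chosen only so that the two alignments yield identical base-count summary tables while giving the average edit distances quoted in the text (3 and 0 versus 1 and 1.33). Reads are assumed to be gap-free and already aligned to the full window.

```python
REF = "ACGTACGTAC"   # hypothetical reference window
VARIANT_POS = 5      # 0-based column of the variant candidate (C>T)
ALT = "T"


def mutate(seq, changes):
    """Return seq with the bases in {position: base} substituted."""
    bases = list(seq)
    for pos, base in changes.items():
        bases[pos] = base
    return "".join(bases)


def summary_table(reads):
    """Count gap/A/C/G/T per column: 5 rows x window width, at any depth."""
    return {b: [sum(r[i] == b for r in reads) for i in range(len(REF))]
            for b in "-ACGT"}


def edit_distance(read):
    """Mismatch count against REF (toy reads contain no indels)."""
    return sum(a != b for a, b in zip(read, REF))


def mean_edit_distances(reads):
    """Average edit distance of (variant-supporting, reference-supporting) reads."""
    var = [edit_distance(r) for r in reads if r[VARIANT_POS] == ALT]
    ref = [edit_distance(r) for r in reads if r[VARIANT_POS] != ALT]
    return sum(var) / len(var), sum(ref) / len(ref)


# Alignment 1: extra mismatches ride on the two variant-supporting reads.
alignment_1 = [REF, REF, REF,
               mutate(REF, {5: "T", 1: "A", 8: "G"}),
               mutate(REF, {5: "T", 2: "T", 8: "G"})]
# Alignment 2: the same mismatches ride on the reference-supporting reads.
alignment_2 = [mutate(REF, {8: "G"}), mutate(REF, {8: "G"}),
               mutate(REF, {1: "A", 2: "T"}),
               mutate(REF, {5: "T"}), mutate(REF, {5: "T"})]

same_table = summary_table(alignment_1) == summary_table(alignment_2)
dist_1 = mean_edit_distances(alignment_1)   # (3.0, 0.0)
dist_2 = mean_edit_distances(alignment_2)   # (1.0, 1.33...)
```

The count table alone cannot separate the two cases, while the pair of mean edit distances does, which is the role of the additional input channels. Note also that the table stays at five rows by the window width however many reads are piled up.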

27.3 Training Data for Machine Learning-Based Variant Callers

All machine learning algorithms require training data from which patterns can be learned and then applied to future data sets. To build an effective machine learning variant classifier, the training data must represent the sequencing samples and data to which the classifier is expected to be applied. Therefore, finding the right training data is arguably the most critical step in developing effective machine learning models. The Genome in a Bottle Consortium (GIAB), hosted by the National Institute of Standards and Technology (NIST), has developed a number of reference human cell lines. Based on these cell lines, they have released multi-center and multi-platform sequencing data and high-confidence germline variant reference call sets that can serve as excellent training data for the development of machine learning algorithms

for germline variant detection (Zook et al. 2014, 2019). Each human genome contains millions of germline variants that can be used as training data. On the other hand, high-quality training data for somatic mutations have been lacking in the community. Between 2014 and 2015, the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas Program (TCGA) jointly sponsored the DREAM Somatic Mutation Calling Challenge, in which somatic mutations were spiked in silico into actual aligned WGS data sets to create synthetic tumors, i.e., base calls in aligned reads were computationally changed and labeled as somatic mutations (Ewing et al. 2015). Those synthetic data sets were leveraged to benchmark different algorithms. The BAMSurgeon method can also be used to create training data, but a synthetic tumor certainly has limitations, e.g., it relies on the assumption that the reads were mapped and aligned correctly in the sequencing data into which the in silico somatic mutations are spiked (Ewing et al. 2015). A number of previously established reference samples and data sets lack a GIAB-like comprehensive high-confidence somatic mutation call set that can be used to label both true positives and false positives (Alioto et al. 2015; Craig et al. 2016; MDIC 2019). In 2021, the SEQC2 Consortium published numerous WGS and WES replicates from multiple sequencing centers with different sample volumes, sample preparations, library preparation kits, and sequencing platforms for a single pair of tumor-normal cell lines, along with the high-confidence somatic mutation reference call set (Fang et al. 2021). This diverse set of publicly available sequencing data contains a more representative range of sequencing metrics and qualities in terms of the variables stated above, and will therefore facilitate the development of machine learning models in the realm of WGS and WES somatic mutation detection (Sahraeian et al. 2022).
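The in silico spike-in idea described above can be illustrated with a toy sketch. This is not BAMSurgeon's implementation; the function, its parameters, and the reads are hypothetical, reads are plain strings with known start coordinates, and only a single-base substitution is handled.

```python
import random


def spike_in_snv(reads, read_starts, position, alt_base, target_vaf, seed=0):
    """Change the base at `position` in a random subset of covering reads.

    Returns the modified reads and a truth label for the spiked mutation.
    The mutation is placed using the existing alignment coordinates, so any
    mapping error in the input is inherited by the synthetic tumor.
    """
    rng = random.Random(seed)
    covering = [i for i, (read, start) in enumerate(zip(reads, read_starts))
                if start <= position < start + len(read)]
    n_to_mutate = round(target_vaf * len(covering))
    chosen = set(rng.sample(covering, n_to_mutate))
    spiked = []
    for i, (read, start) in enumerate(zip(reads, read_starts)):
        if i in chosen:
            offset = position - start
            read = read[:offset] + alt_base + read[offset + 1:]
        spiked.append(read)
    truth_label = {"position": position, "alt": alt_base,
                   "n_alt_reads": len(chosen)}
    return spiked, truth_label


# Hypothetical input: 20 identical reads starting at coordinate 100.
reads = ["ACGTACGT"] * 20
starts = [100] * 20
spiked_reads, truth = spike_in_snv(reads, starts, position=103,
                                   alt_base="G", target_vaf=0.25)
observed_vaf = sum(r[3] == "G" for r in spiked_reads) / len(spiked_reads)
```

Because the truth label is known by construction, such data are convenient for benchmarking and training. The dependence on the input alignment being correct is the limitation noted in the text.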
Nevertheless, SEQC2’s tumor-normal reference samples were derived from a single breast cancer cell line with approximately 40,000 somatic mutations. With over 100 technical replicates of these tumor-normal reference samples, millions of somatic mutation instances are generated across multiple sequencing centers and platforms to capture a wide spectrum of technical diversity. Even so, 40,000 unique somatic variants pale in comparison with the millions of unique germline variants characterized by GIAB, and may lack the full spectrum of genomic and biological diversity desired by the most complex deep learning models. While SEQC2’s reference samples and call sets represent a great milestone in the development of somatic mutation reference data sets, more such efforts in a variety of cancer types are necessary to push the field forward.
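The scale difference described above can be checked with simple arithmetic. The figures below are the rounded numbers quoted in this chapter (about 40,000 unique somatic variants, over 100 replicates, nearly 4 million germline variants per GIAB genome), so the results are order-of-magnitude estimates only; the variable names are illustrative.

```python
# Rounded figures taken from the text; treat results as rough estimates.
unique_somatic_variants = 40_000        # SEQC2 tumor-normal pair
technical_replicates = 100              # "over 100" replicates
germline_variants_per_genome = 4_000_000  # "nearly 4 million" per GIAB WGS

# Training instances grow with replicates, but unique variants do not.
somatic_instances = unique_somatic_variants * technical_replicates

# Unique-variant gap between one germline genome and the somatic set.
unique_ratio = germline_variants_per_genome // unique_somatic_variants
```

Replication multiplies the number of training instances into the millions and adds technical diversity, but the set of distinct genomic contexts stays roughly a hundred times smaller than that of a single germline genome.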

27.4 Conclusion

Machine learning is an excellent approach to accurately detect mutations in high-throughput sequencing data, given the complexity of sequencing data sets and the continuous development and evolution of sequencing technologies. The diversity of sequencing platforms, sample types and preparations, and sequencing strategies, in addition to complex but not necessarily well-calibrated metrics associated with base call, mapping, and alignment accuracies in different parts of the genome, makes it all but impossible to develop accurate classical statistical models that robustly predict mutations in such a cornucopia of sequencing data. The ongoing development of new sequencing methods makes the whole endeavor more challenging still. The machine learning approach provides a “shortcut” to create classifiers that can accurately detect mutations as long as adequate and appropriate training data can be found. This point is demonstrated by the successes of DeepVariant and NeuSomatic in the relatively new and underdeveloped space of variant detection in PacBio and Oxford Nanopore sequencing data in the precisionFDA v2 Truth Challenge. While “classic” variant callers like the GATK or MuTect2 do well on mature data types on Illumina platforms, machine learning does not require an equal investment of resources and effort to create accurate callers for brand new technologies, as long as the developers can find well-characterized, high-quality training data from which the classifiers can be trained. Having high-quality training data is as important as having good algorithms. While GIAB has released a great number of resources with multiple genomes sequenced on many different platforms and carefully curated truth sets, the efforts for somatic mutations have lagged behind because of the relative rarity of somatic mutations in a cancer genome (relative to germline variants) and the challenges of establishing a robust truth set. However, this line of research has been picking up recently. The SEQC2 Consortium has recently published a pair of tumor-normal whole-genome reference samples for somatic mutations. GIAB is also currently planning to produce somatic mutation reference samples. This lack of training data should eventually be resolved, allowing machine learning approaches to become more widespread and robust.
Acknowledgements The author thanks Sayed Mohammad Ebrahim Sahraeian of Roche Sequencing Solutions and Rebecca Kusko of Immuneering for their editing and advice on the technical accuracy of this chapter.

References

Alexandrov LB, Nik-Zainal S, Wedge DC et al (2013) Signatures of mutational processes in human cancer. Nature 500:415–421. https://doi.org/10.1038/nature12477
Alioto TS, Buchhalter I, Derdak S et al (2015) A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing. Nat Commun 6:10001. https://doi.org/10.1038/ncomms10001
Auwera GA, Carneiro MO, Hartl C et al (2013) From FastQ data to high-confidence variant calls: the genome analysis toolkit best practices pipeline. Curr Protoc Bioinf 43. https://doi.org/10.1002/0471250953.bi1110s43
Benjamin D, Sato T, Cibulskis K et al (2019) Calling somatic SNVs and indels with Mutect2. http://doi.org/10.1101/861054
Bhuyan MSI, Pe’er I, Rahman MS (2021) SICaRiO: short indel call filtering with boosting. Brief Bioinform 22. https://doi.org/10.1093/bib/bbaa238
Cooke DP, Wedge DC, Lunter G (2021) A unified haplotype-based method for accurate and comprehensive variant calling. Nat Biotechnol 39:885–892. https://doi.org/10.1038/s41587-021-00861-3
Craig DW, Nasser S, Corbett R et al (2016) A somatic reference standard for cancer genome sequencing. Sci Rep 6:24607. https://doi.org/10.1038/srep24607
DeepVariant Repo. https://github.com/google/deepvariant. Accessed 30 Dec 2021
DREAM challenge. https://www.synapse.org/#!Synapse:syn312572/wiki/70726. Accessed 30 Dec 2021
Ewing AD, Houlahan KE, Hu Y et al (2015) Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection. Nat Methods 12:623–630. https://doi.org/10.1038/nmeth.3407
Fan Y, Xi L, Hughes DST et al (2016) MuSE: accounting for tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling from sequencing data. Genome Biol 17:178. https://doi.org/10.1186/s13059-016-1029-6
Fang LT, Afshar PT, Chhibber A et al (2015) An ensemble approach to accurately detect somatic mutations using SomaticSeq. Genome Biol 16:197. https://doi.org/10.1186/s13059-015-0758-2
Fang LT, Zhu B, Zhao Y et al (2021) Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing. Nat Biotechnol 39:1151–1160. https://doi.org/10.1038/s41587-021-00993-6
Freed D, Pan R, Aldana R (2018) TNscope: accurate detection of somatic mutations with haplotype-based variant candidate detection and machine learning filtering. http://doi.org/10.1101/250647
Gong T, Hayes VM, Chan EKF (2021) Detection of somatic structural variants from short-read next-generation sequencing data. Brief Bioinform 22. https://doi.org/10.1093/bib/bbaa056
Huang W, Guo YA, Muthukumar K et al (2019) SMuRF: portable and accurate ensemble prediction of somatic mutations. Bioinformatics 35:3157–3159. https://doi.org/10.1093/bioinformatics/btz018
Kandoth C, McLellan MD, Vandin F et al (2013) Mutational landscape and significance across 12 major cancer types. Nature 502:333–339. https://doi.org/10.1038/nature12634
Kim SY, Jacob L, Speed TP (2014) Combining calls from multiple somatic mutation-callers. BMC Bioinformatics 15:154. https://doi.org/10.1186/1471-2105-15-154
Kim S, Scheffler K, Halpern AL et al (2018) Strelka2: fast and accurate calling of germline and somatic variants. Nat Methods 15:591–594. https://doi.org/10.1038/s41592-018-0051-x
Koboldt DC, Zhang Q, Larson DE et al (2012) VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res 22:568–576. https://doi.org/10.1101/gr.129684.111
Krusche P, Trigg L, Boutros PC et al (2019) Best practices for benchmarking germline small-variant calls in human genomes. Nat Biotechnol 37:555–560. https://doi.org/10.1038/s41587-019-0054-x
Lai Z, Markovets A, Ahdesmaki M et al (2016) VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Res 44:e108. https://doi.org/10.1093/nar/gkw227
Larson DE, Harris CC, Chen K et al (2012) SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics 28:311–317. https://doi.org/10.1093/bioinformatics/btr665
Lek M, Karczewski KJ, Minikel EV et al (2016) Analysis of protein-coding genetic variation in 60,706 humans. Nature 536:285–291. https://doi.org/10.1038/nature19057
MDIC (2019) MDIC SRS report: somatic variant reference samples for NGS landscape of available reference samples
Meng J, Victor B, He Z et al (2021) DeepSSV: detecting somatic small variants in paired tumor and normal sequencing data with convolutional neural network. Brief Bioinform 22. https://doi.org/10.1093/bib/bbaa272
Meynert AM, Ansari M, FitzPatrick DR, Taylor MS (2014) Variant detection sensitivity and biases in whole genome and exome sequencing. BMC Bioinformatics 15:247. https://doi.org/10.1186/1471-2105-15-247
Narzisi G, Corvelo A, Arora K et al (2018) Genome-wide somatic variant calling using localized colored de Bruijn graphs. Commun Biol 1:20. https://doi.org/10.1038/s42003-018-0023-9
NHGRI DNA sequencing costs. https://www.genome.gov/about-genomics/fact-sheets/DNA-Sequencing-Costs-Data. Accessed 30 Dec 2021
Olson ND, Wagner J, McDaniel J et al (2021) PrecisionFDA truth challenge V2: calling variants from short- and long-reads in difficult-to-map regions. Cell Genomics 2022;2(5):100129. http://doi.org/10.1016/j.xgen.2022.100129
Pleasance ED, Cheetham RK, Stephens PJ et al (2010) A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463:191–196. https://doi.org/10.1038/nature08658
Poplin R, Chang P-C, Alexander D et al (2018) A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol 36:983–987. https://doi.org/10.1038/nbt.4235
Sahraeian SME, Liu R, Lau B et al (2019) Deep convolutional neural networks for accurate somatic mutation detection. Nat Commun 10:1041. https://doi.org/10.1038/s41467-019-09027-x
Sahraeian SME, Fang LT, Karagiannis K et al (2022) Achieving robust somatic mutation detection with deep learning models derived from reference data sets of a cancer sample. Genome Biol 23(1):1–20
Sedlazeck FJ, Lee H, Darby CA, Schatz MC (2018) Piercing the dark matter: bioinformatics of long-range sequencing and mapping. Nat Rev Genet 19:329–346. https://doi.org/10.1038/s41576-018-0003-4
Sherry ST (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29:308–311. https://doi.org/10.1093/nar/29.1.308
Slatko BE, Gardner AF, Ausubel FM (2018) Overview of next-generation sequencing technologies. Curr Protoc Mol Biol 122. https://doi.org/10.1002/cpmb.59
Stoler N, Nekrutenko A (2021) Sequencing error profiles of Illumina sequencing instruments. NAR Genomics Bioinf 3. https://doi.org/10.1093/nargab/lqab019
Storchova Z, Kuffer C (2008) The consequences of tetraploidy and aneuploidy. J Cell Sci 121:3859–3866. https://doi.org/10.1242/jcs.039537
Troyanskaya OG, Arbell O, Koren Y et al (2002) Sequence complexity profiles of prokaryotic genomic sequences: a fast algorithm for calculating linguistic complexity. Bioinformatics 18:679–688. https://doi.org/10.1093/bioinformatics/18.5.679
Wood DE, White JR, Georgiadis A et al (2018) A machine learning approach for somatic mutation discovery. Sci Transl Med 10. https://doi.org/10.1126/scitranslmed.aar7939
Zook JM, Chapman B, Wang J et al (2014) Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat Biotechnol 32:246–251. https://doi.org/10.1038/nbt.2835
Zook JM, McDaniel J, Olson ND et al (2019) An open resource for accurately benchmarking small variant and reference calls. Nat Biotechnol 37:561–566. https://doi.org/10.1038/s41587-019-0074-6

Chapter 28

Machine Learning for Predicting Gas Adsorption Capacities of Metal Organic Framework

Wenjing Guo, Jie Liu, Fan Dong, Tucker A. Patterson, and Huixiao Hong

28.1 Introduction

Nanomaterials have been widely studied in various fields such as food science, energy, electronics, and drugs due to their physical, chemical, optical, and electrical properties (Yu et al. 2012; Chellaram et al. 2014; Hofmann-Amtenbrink et al. 2015; Pomerantseva et al. 2019; Begum et al. 2020). In recent decades, a class of porous nanomaterials, metal organic frameworks (MOFs), which are formed by the self-assembly of metal clusters and organic linkers, have been made and applied in various fields (Batten et al. 2013). The characteristics of MOFs include high thermal stabilities, various porosities, large surface areas, and modifiable physicochemical properties. The large number of metal nodes and organic linkers that can be used to synthesize MOFs leads to nearly limitless combinations. All these properties make MOFs one of the top ten emerging technologies in chemistry (Gomollón-Bel 2019).

W. Guo · J. Liu · F. Dong · T. A. Patterson · H. Hong (B)
National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA

This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2023
H. Hong (ed.), Machine Learning and Deep Learning in Computational Toxicology, Computational Methods in Engineering & the Sciences, https://doi.org/10.1007/978-3-031-20730-3_28

The high porosities, large surface areas, and open metal sites make MOFs ideal candidates for gas adsorption-based applications including gas storage (Alezi et al. 2015; Ahmed et al. 2019) and gas separation (Tan et al. 2015). To find well-performing materials for these applications, adsorption data are typically used to screen diverse MOF databases. However, the limitless combinations of metal clusters and organic linkers lead to enormous numbers of MOFs, and it is not practical to characterize all of them experimentally. Conventional molecular simulation techniques like density functional theory, molecular dynamics, and Grand Canonical Monte Carlo simulations have been used as alternative methods to compute gas adsorption properties (Evans et al. 2017; Anderson et al. 2018; Chong et al. 2020; Altintas et al. 2021). These computational methods have significantly reduced the time needed for processing a MOF, from days to minutes (Chong et al. 2020). However, for a large dataset of MOFs, these brute-force approaches would still take a large amount of time. A variety of computational methods have been developed and used for predicting biological activities and physicochemical properties of chemicals (Hong et al. 1998; Shi et al. 2002; Shen et al. 2013; Ng et al. 2014; Cheng et al. 2017; Selvaraj et al. 2018; Tan et al. 2020; Sakkiah et al. 2021). Machine learning (ML) is becoming the most attractive computational technique to provide alternative methods for estimation of physicochemical properties and toxicological activities of chemicals (Hong et al. 2005, 2017, 2018; Luo et al. 2015; Ng et al. 2015a, b; Sakkiah et al. 2017; Huang et al. 2020). ML methods have been used to rapidly predict the gas adsorption properties of MOFs. ML techniques use computers to identify the hidden patterns and relationships in data.
With large numbers of structures and limited gas adsorption data, there is no doubt that incorporation of ML methods could help design and develop MOFs. Recent studies have proven that ML can effectively reveal the underlying structure– property relationships and hence accurately predict the gas adsorption properties of MOFs (Chong et al. 2020; Jablonka et al. 2020; Altintas et al. 2021; Yan et al. 2021). This chapter reviews the current development and applications of ML models for predicting gas adsorption capacities of MOFs. Those models were developed using different ML algorithms such as support vector machine (SVM), decision trees (DT), random forest (RF), and neural network (NN) based on various types of descriptors such as geometric descriptors, topological descriptors, chemical descriptors, and energy descriptors.
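The general workflow behind such models (describe each MOF with numeric descriptors, then learn a mapping to an adsorption value) can be sketched without any external library. The example below deliberately swaps in k-nearest-neighbour regression as a dependency-free stand-in for the SVM, RF, and NN models reviewed in this chapter. The descriptor choices, the descriptor values, and the uptake values are all made up for illustration and are not taken from any MOF database.

```python
import math

# Made-up toy data, NOT real MOF measurements. Each entry is
# ((void fraction, surface area in m^2/g, pore diameter in angstrom), uptake).
TRAINING_SET = [
    ((0.45, 1200.0, 6.0), 80.0),
    ((0.55, 1900.0, 8.0), 120.0),
    ((0.60, 2500.0, 9.0), 140.0),
    ((0.75, 4000.0, 13.0), 190.0),
    ((0.82, 5200.0, 17.0), 210.0),
]


def column_stats(rows):
    """Per-descriptor mean and standard deviation (std of 0 replaced by 1)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def standardize(row, means, stds):
    """Put descriptors on a common scale so no single unit dominates."""
    return tuple((x - m) / s for x, m, s in zip(row, means, stds))


def knn_predict(query, training_set, k=2):
    """Predict uptake as the mean label of the k nearest training MOFs."""
    means, stds = column_stats([d for d, _ in training_set])
    q = standardize(query, means, stds)
    scored = sorted((math.dist(q, standardize(d, means, stds)), y)
                    for d, y in training_set)
    nearest = scored[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Prediction for a hypothetical, unseen structure.
prediction = knn_predict((0.70, 3500.0, 12.0), TRAINING_SET, k=2)
```

Once descriptors are available, a prediction of this kind takes a negligible amount of time compared with running a molecular simulation for each structure, which is the motivation given in the text for using ML in screening.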

28.2 Data Sources

Large databases of MOFs have been developed to explore the hidden structure–property relationships. Based on types of MOFs, there are two kinds of databases available: experimental MOF databases and hypothetical MOF databases. Table 28.1 summarizes the frequently used MOF databases.

Table 28.1 Frequently used MOF databases

| Database | Type | Compounds | Description |
| --- | --- | --- | --- |
| CoRE MOF-2019 | Experimental | ~14,000 | Experimental MOFs initially developed in 2014 (Chung et al. 2014) and later updated in 2019 (Chung et al. 2019) |
| CoRE-DDEC | Experimental | 2932 | The refined CoRE MOF database with density derived electrostatic and chemical charges (Nazarian et al. 2016) |
| CoRE-DFT optimized | Experimental | 832 | The refined CoRE MOF database with MOF structures optimized using density functional theory (DFT) (Nazarian et al. 2017) |
| Moghadam database | Experimental | 69,666 | Experimental MOFs from CCDC (Moghadam et al. 2017) |
| Northwestern hMOF database | Hypothetical | 137,953 | Hypothetical MOFs generated using the “Tinkertoy” algorithm (Wilmer et al. 2012) |
| Boyd database | Hypothetical | ~300,000 | Hypothetical MOFs generated using a graph theoretical approach (Boyd and Woo 2016) |
| ToBaCCo | Hypothetical | 13,512 | Hypothetical MOFs generated using ToBaCCo software (Colón et al. 2017) |

Experimental MOF databases like the Cambridge Structural Database (CSD) collect MOFs that have been experimentally synthesized. CSD is a repository for small molecules, including a wide range of MOFs and other organic and organometallic molecules. Since MOFs in CSD often have solvent molecules in the pores, some processing work is required before these structures can be used. In 2014, Chung et al. developed a computation-ready, experimental MOF (CoRE MOF) database by removing the free/bound solvents from the structures in CSD (Chung et al. 2014). This database was updated in 2019 to include MOF structures obtained from CSD updates and a Web of Science search, contributed by CoRE MOF users, and derived using a topology-based crystal generator (Chung et al. 2019). The updated 2019 CoRE MOF database has nearly 14,000 structures, almost three times the number in the 2014 CoRE MOF database. Some refined subsets of the CoRE MOF database have also been developed since 2014, including CoRE MOF with Density Derived Electrostatic and Chemical charges (DDEC) (Nazarian et al. 2016) and the database of CoRE MOF structures optimized using density functional theory (DFT) (Nazarian et al. 2017). However, these databases are hard to maintain because a manual update is needed whenever new MOFs are deposited into CSD. With increasing numbers of MOFs, it is difficult to keep the CoRE MOF database and its related databases up to date. To overcome this problem, Moghadam et al. built a regularly updated MOF database, which is a subset of CSD, by implementing a number of criteria via a CSD Python Application Programming Interface (Moghadam et al. 2017). There were 69,666 MOFs in the database when it was first established in 2017, and the number has kept growing over the past few years.


In addition to experimentally synthesized MOF databases, hypothetical MOF (hMOF) databases have been developed to include MOFs that are computationally designed. Wilmer and coworkers were the first to construct an hMOF database (Wilmer et al. 2012). In their work, 102 building blocks and 15 functional groups were geometrically assembled using a bottom-up construction algorithm, much like snapping together Tinkertoy or Lego bricks. Using this “Tinkertoy” algorithm, 137,953 hMOFs were generated. This hMOF dataset (referred to as the Northwestern hMOF database hereafter) is the most widely used database in ML studies. Later, using a similar approach, Fernandez et al. built a database of 32,450 hMOFs by combining 66 building blocks and 19 functional groups (Fernandez et al. 2014). Similarly, Aghaji et al. generated 324,500 structures from 70 building units and 20 functional groups (Aghaji et al. 2016).

An alternative method for building hMOFs is the topologically based crystal constructor (ToBaCCo) algorithm (Colón et al. 2017), which uses various framework topologies as templates to position building blocks. Colón et al. used 41 different topologies and built 13,512 structures in their database. The ToBaCCo software was also used in several other studies to develop hMOF structures (Anderson et al. 2018; Dureckova et al. 2019; Ma et al. 2020; Li et al. 2021). In another interesting work, Boyd and coworkers used a graph theoretical approach to build over 300,000 hMOFs in their database (Boyd and Woo 2016).

Among the above-mentioned databases, only the Northwestern hMOF database, available at http://hmofs.northwestern.edu, contains gas storage capacities. The other databases contain only the structures of MOFs. The gas adsorption properties of these MOFs are usually calculated using computational tools such as molecular dynamics and Grand Canonical Monte Carlo (GCMC) simulations. However, because of the computational resources needed for such a large number of structures, alternative methods such as ML have been used to speed up the estimation of gas adsorption properties of MOFs.

28.3 Descriptors of MOFs

To build a reliable ML model for predicting gas adsorption capacities of MOFs, the descriptors, that is, the input variables used to train the model, should correlate strongly with the gas adsorption capacities. They should capture the features of the frameworks and also carry sufficient information about gas adsorption. The highly diverse structures of MOFs make the selection of informative descriptors challenging. Geometric descriptors such as pore volume, surface area, and pore size are the most common descriptors used to train ML models. However, for MOFs with complex chemical environments and pore structures, these simple geometric descriptors cannot provide enough information on the characteristics of MOFs. To enhance the


description of MOFs, topological, chemical, and energy descriptors have been developed. In this section, we give a brief introduction to the descriptors that have been used in the development of ML models for predicting gas adsorption capacities of MOFs.

Since pore structures are closely related to the gas adsorption capacities of MOFs, various geometric descriptors have been developed to describe the pore environment. Examples include density, void fraction (VF), pore volume (PV), gravimetric surface area (GSA), volumetric surface area (VSA), pore limiting diameter (PLD), maximum pore diameter (MPD), largest cavity diameter (LCD), dominant pore diameter (DPD), and global cavity diameter (GCD). These descriptors are usually calculated using software packages such as Zeo++ (Willems et al. 2012) and Poreblazer (Sarkisov and Harrison 2011). In recent studies, they have been widely used to train ML models. For example, Ma et al. used five geometric descriptors (VF, VSA, GSA, PLD, and LCD) to study hydrogen adsorption in hMOFs (Ma et al. 2020). Aghaji et al. used three geometric descriptors (PV, VF, and surface area (SA)) to predict CO2 capture in hMOFs (Aghaji et al. 2016). Fernandez and Barnard used six geometric descriptors (density, VF, GSA, VSA, MPD, and LCD) to predict CH4 and CO2 adsorption in hMOFs (Fernandez and Barnard 2016). Liang et al. used seven structural descriptors (LCD, PLD, GCD, PV, density, SA, and porosity) to study Kr/Xe selective adsorption in the Material Genomic MOFs Database (Liang et al. 2021).

Geometric descriptors are easy to obtain, but they describe only localized geometric features of MOFs. For MOFs with complex pore environments, more comprehensive descriptors are needed. Thus, topological descriptors, another type of structural descriptor, have been developed to enhance the description of MOFs. Lee et al. developed persistent homology barcodes, topological descriptors based on persistent homology (Edelsbrunner and Harer 2008) from topological data analysis, to provide information on pore connectivity, morphology, and cavity size (Lee et al. 2017). This persistent homology barcode descriptor was also used by Zhang et al. to predict CH4 uptake capacity (Zhang et al. 2019). In another work, Krishnapriyan et al. used persistent homology as the topological descriptor to provide information on the channels and pores (Krishnapriyan et al. 2021).

In addition to structural descriptors, chemical descriptors are widely used to provide information on the chemical environment within pores. The various metal nodes and organic linkers in MOFs lead to highly diverse chemical environments, and chemical descriptors can account for this diversity. One commonly used class of chemical descriptors is atomic properties such as atomic number and atom type. Fanourgakis et al. showed that models combining chemical descriptors such as atom number density (the number of atoms in a unit cell of a MOF divided by the unit cell volume) with geometric descriptors (density, VF, GSA, DPD, MPD) outperformed all other models (Fanourgakis et al. 2020). Another class of chemical descriptors is atomic property-weighted radial distribution functions (AP-RDF). Fernandez et al. used AP-RDF to predict CH4 storage and found that, at low pressures, chemical descriptors perform better than geometric descriptors (Fernandez et al. 2013a). Dureckova et al. also showed that the best ML models


were built from a combination of geometric descriptors and AP-RDF descriptors (Dureckova et al. 2019). Several other chemical descriptors have also been used in ML studies, including the Henry coefficient (Wu et al. 2019), crystal structure, total degree of unsaturation, metal type, metallic percentage, electronegativity ratios, and nitrogen-to-oxygen ratios (Gülsoy et al. 2019; Beauregard et al. 2021).

Since the interaction energy between the adsorbate gas and the MOF correlates strongly with the adsorption properties, energy-based descriptors have been developed to improve prediction performance. For example, Voronoi energy descriptors were developed by Simon and coworkers to predict the adsorption of Xe (Simon et al. 2015). The Voronoi energy descriptor is the average energy of a gas atom at the gas-accessible Voronoi nodes that represent the pore topology. These descriptors can be calculated using computational geometry approaches. Thornton et al. developed adsorption energy descriptors and used a combination of energy descriptors and geometric descriptors (density ρ, VF, GSA, VSA, and LCD) to predict H2 adsorption capacities (Thornton et al. 2017). In another study, descriptors based on the energetics of MOF–guest interactions were developed to capture the interactions between MOFs and H2 (Bucior et al. 2019). Similar descriptors, energy histograms, were developed and used to predict the selective adsorption of Xe/Kr (Li et al. 2021).
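As a rough illustration of the AP-RDF descriptor mentioned above, the sketch below evaluates a property-weighted radial distribution function on a small grid of distances. It assumes the commonly quoted form RDF_P(R) = f Σ P_i P_j exp(−B (r_ij − R)²) summed over atom pairs. The function name `ap_rdf`, the values of the smoothing parameter B and scaling factor f, and the toy coordinates are illustrative assumptions, and periodic boundary conditions, which a real MOF unit cell would require, are deliberately ignored.

```python
import math
from itertools import combinations

def ap_rdf(coords, props, radii, smoothing=10.0, scale=1.0):
    """Property-weighted RDF sampled at each distance in `radii` (non-periodic sketch).

    coords    : list of (x, y, z) atom positions in angstrom
    props     : one atomic property per atom (e.g. electronegativity)
    smoothing : Gaussian smoothing parameter B (assumed value)
    scale     : normalisation factor f (assumed value)
    """
    # Pre-compute the distance and property product of every atom pair.
    pairs = [
        (math.dist(coords[i], coords[j]), props[i] * props[j])
        for i, j in combinations(range(len(coords)), 2)
    ]
    return [
        scale * sum(w * math.exp(-smoothing * (r_ij - R) ** 2) for r_ij, w in pairs)
        for R in radii
    ]

# Toy three-atom fragment; positions and property values are made up.
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
electronegativity = [1.65, 3.44, 2.55]
radii = [2.0, 3.0, 5.0]
descriptor = ap_rdf(coords, electronegativity, radii)
```

Each entry of `descriptor` is one input feature for an ML model; entries are large at distances where strongly weighted atom pairs occur and close to zero elsewhere.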

28.4 ML Algorithms

Many ML algorithms have been used to establish relationships between descriptors and gas adsorption properties of MOFs. In this section, we briefly introduce some commonly used ML algorithms: Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Artificial Neural Network (ANN).

SVM is a supervised machine learning algorithm introduced by Cortes and Vapnik (1995) and based on the structural risk minimization principle. In SVM, kernel functions are used to map input descriptors into a high-dimensional space where the classes of samples can be separated by a hyperplane. Commonly used kernel functions for nonlinear feature spaces include Gaussian radial basis functions and polynomial kernel functions. Empirical and experimental analyses are used to decide which kernel function to use. During training, the SVM finds the hyperplane that maximizes the distance (margin) between the classes, and its hyperparameters are tuned to obtain the best-performing model. Since SVM can handle correlated descriptors and has good generalization performance, it is one of the ML algorithms most frequently used to develop predictive models for gas adsorption of MOFs, such as CO2/CH4 selectivity (Aghaji et al. 2016), CO2 capture (Fernandez et al. 2014; Aghaji et al. 2016; Fernandez and Barnard 2016; Anderson et al. 2018), H2 storage (Borboudakis et al. 2017; Ahmed and Siegel 2021), CH4 storage (Fernandez et al. 2013a; Aghaji et al. 2016; Ohno and Mukae 2016; Pardakhti et al. 2017), and Xe/Kr separation (Liang et al. 2021).

DT is an upside-down tree-like classification/regression algorithm. A DT has a root at the top, several layers of internal nodes in the middle, and leaf nodes at the bottom. A branch is a sequence of connected nodes from the root to a leaf. The root and


internal nodes can each be split into two nodes using descriptors. The path from the root to a leaf represents a series of decision rules used to predict the label of a sample. Since the decision rules can be easily retrieved from a model, tree-based models are often used to find the optimal design parameters of high-performing MOFs. For example, Anderson et al. applied DT to CO2 capture and found that a change in pore chemistry is the most relevant factor for CO2 capture metrics (Anderson et al. 2018). Using DT, Fernandez et al. found that the optimal MOFs should have densities greater than 0.43 g/cm³ and 0.33 g/cm³ and void fractions larger than 0.52 and 0.62 for CH4 storage at 35 and 100 bar, respectively (Fernandez et al. 2013b). DT has also been used to predict CO2/CH4 selective adsorption (Aghaji et al. 2016), H2 storage (Ahmed and Siegel 2021), CO2 capture (Deng et al. 2020), CH4 uptake (Fernandez et al. 2013b; Fernandez and Barnard 2016; Pardakhti et al. 2017; Fanourgakis et al. 2019; Gülsoy et al. 2019), and Xe/Kr selective adsorption (Simon et al. 2015).

RF (Breiman 2001) is a supervised machine learning algorithm that uses multiple DTs to make consensus decisions. To construct each DT, samples are drawn at random with replacement from the original training samples, and only a randomly selected subset of features is considered. The prediction of an RF model for a sample combines the predictions of its DTs, using a majority vote for classification and the average prediction for regression. The randomness of RF reduces overfitting and improves prediction performance. RF has proven to be a very successful ML algorithm for developing models to predict gas adsorption capacities of MOFs (Aghaji et al. 2016; Fernandez and Barnard 2016; Borboudakis et al. 2017; Pardakhti et al. 2017; Anderson et al. 2018; Zhang et al. 2019; Burns et al. 2020; Deng et al. 2020; Ahmed and Siegel 2021; Beauregard et al. 2021).
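The sampling and aggregation steps of RF described above can be sketched in a few lines. The example below is a deliberately simplified toy, not the implementation used in any of the cited studies: each "tree" is a single-split regression stump, the random descriptor subset is drawn once per tree rather than at every split, and the descriptor values and uptakes are made up for illustration.

```python
import random
from statistics import mean

def fit_stump(X, y, feature_ids):
    """Fit a one-split regression tree (stump) using only the features in `feature_ids`."""
    best = None
    for f in feature_ids:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between two observed values
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            m_left, m_right = mean(left), mean(right)
            sse = (sum((v - m_left) ** 2 for v in left)
                   + sum((v - m_right) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, f, t, m_left, m_right)
    if best is None:  # no split possible: always predict the sample mean
        m = mean(y)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    f, t, m_left, m_right = stump
    return m_left if row[f] <= t else m_right

def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (with replacement)
        feats = rng.sample(range(d), n_features)     # random subset of descriptors
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Regression: average the predictions of the individual trees."""
    return mean(predict_stump(s, row) for s in forest)

# Hypothetical descriptors (void fraction, density in g/cm3) and uptakes.
X = [[0.2, 1.5], [0.3, 1.2], [0.4, 1.0], [0.6, 0.7], [0.7, 0.5], [0.8, 0.4]]
y = [50.0, 80.0, 110.0, 170.0, 190.0, 200.0]
forest = fit_forest(X, y)
high = predict_forest(forest, [0.8, 0.4])
low = predict_forest(forest, [0.2, 1.5])
```

For classification, `predict_forest` would return the most common class label among the trees instead of the mean.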
ANNs are a set of algorithms used to recognize underlying relationships in data through a process that mimics the function of biological neural networks. An ANN has three types of layers: an input layer, one or more hidden layers, and an output layer. Each layer consists of neurons, and each neuron is connected to all the neurons in the next layer by weights. The weights are chosen randomly at the beginning of training and are then updated through backpropagation to minimize the error between the values predicted by the output layer and the actual values. Each neuron has an activation function that transforms its input values into an output signal. Common activation functions include linear, sigmoid, Gaussian, Elliot, and rectified linear unit (ReLU) functions. ReLU is the most frequently used activation function because networks that use it tend to converge quickly.

Many ANN models have been developed to predict gas adsorption capacities of MOFs. For example, Deng et al. used a backpropagation ANN to study CO2 capture from air (Deng et al. 2020). Fernandez and Barnard used ANN to predict CO2 and N2 adsorption at low pressures (Fernandez and Barnard 2016). Gülsoy et al. used ANN to predict CH4 storage capacity (Gülsoy et al. 2019). Similar work was done by Lee et al., focusing on screening high-performance MOFs for CH4 storage (Lee et al. 2021). Anderson et al. used ANN to predict volumetric adsorption of H2 (Anderson et al. 2018). A deep neural network (DNN) is a neural network that contains more than one hidden layer. With a large number of hidden layers, a DNN can handle complex problems. Anderson et al. trained deep learning models to predict the full adsorption isotherm for molecules like CH4,


N2, Kr, and Xe using both geometric and chemical descriptors (Anderson et al. 2020). Ma et al. trained a DNN model on H2 adsorption data and tested the transferability of the model to other operating conditions and different gases (Ma et al. 2020).

Another commonly used ML algorithm is multiple linear regression (MLR). MLR builds a linear relationship between descriptors and the modeled property by minimizing the sum of squared errors between predictions and actual values. MLR attracts interest because of its simplicity and interpretability. However, since it is prone to overfitting, very few studies have used MLR to develop models for predicting gas adsorption capacities of MOFs (Fernandez and Barnard 2016). To address this, least absolute shrinkage and selection operator (LASSO) regression (Tibshirani 1996) adds a penalty term to the objective function to reduce overfitting. Recent studies showed that LASSO is suitable for developing models for predicting gas adsorption of MOFs (Bucior et al. 2019; Li et al. 2021; Liang et al. 2021).
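To make the difference between MLR and LASSO concrete, the sketch below minimizes the penalized objective (1/2n) Σ (y_i − x_i·w)² + α Σ |w_j| by cyclic coordinate descent with soft-thresholding, a standard way to fit LASSO. Setting α = 0 recovers the ordinary least-squares (MLR) solution. The function names, the absence of an intercept (the toy data are centred), and the data themselves are illustrative assumptions.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become exactly 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_sweeps=100):
    """Fit w for (1/2n)*sum((y - Xw)^2) + alpha*sum(|w|); no intercept, centred data assumed."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_sweeps):
        for j in range(d):
            # Residual with the current contribution of descriptor j removed.
            partial = [
                y[i] - sum(X[i][k] * w[k] for k in range(d) if k != j)
                for i in range(n)
            ]
            rho = sum(X[i][j] * partial[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w

# Toy data: the target depends strongly on descriptor 0 and weakly on descriptor 1.
X = [[-1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [1.0, 1.0]]
y = [-1.7, -2.3, 1.7, 2.3]          # y = 2.0 * x0 + 0.3 * x1
w_mlr = lasso_coordinate_descent(X, y, alpha=0.0)
w_lasso = lasso_coordinate_descent(X, y, alpha=0.5)
```

With the penalty switched on, the weak descriptor's coefficient is set exactly to zero and the strong one is shrunk, which is why LASSO also acts as a descriptor selection method.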

28.5 ML Models for Predicting Gas Adsorption of MOFs

Recent ML studies on gas storage and separation mainly have three goals: (1) to select the best descriptors and to develop new descriptors that could improve prediction performance, (2) to estimate the gas storage and gas separation potentials of MOFs, and (3) to identify high-performing MOFs and to design novel MOFs. Various ML models have been developed for different adsorbate gases.

The methods most frequently used to assess model performance are k-fold cross validation and hold-out validation. In k-fold cross validation, the training dataset is randomly divided into k groups. k − 1 groups are used to build a model, and the remaining group is used to evaluate the model. This process is iterated k times so that each of the k groups is used as the testing set once. In hold-out validation, the whole dataset is split into two sets: a training dataset and a testing dataset. The training dataset is used to train a model, and the testing dataset is used to evaluate the model.

The metrics frequently used to assess gas adsorption models include the area under the receiver operating characteristic curve (AUC), the squared Pearson correlation coefficient (r²), the coefficient of determination (R²), the root mean squared error (RMSE), and the absolute mean error (AME). AUC is calculated from the curve obtained by plotting the true-positive rate against the false-positive rate for a given model. For a gas adsorption model, the true-positive rate is the ratio of correctly predicted high-performing MOFs to all high-performing MOFs. The false-positive rate is the ratio of low-performing MOFs incorrectly predicted as high-performing to all low-performing MOFs. The parameters r², R², RMSE, and AME are computed using Eqs. (28.1)–(28.4).

$$r^2 = \frac{\left[\sum_{i=1}^{n} (y_i - \bar{y})(u_i - \bar{u})\right]^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2 \sum_{i=1}^{n} (u_i - \bar{u})^2} \tag{28.1}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - u_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{28.2}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - u_i)^2}{n}} \tag{28.3}$$

$$\mathrm{AME} = \frac{1}{n} \sum_{i=1}^{n} |y_i - u_i| \tag{28.4}$$

where $y_i$ is the observed gas adsorption value for the ith MOF, $\bar{y}$ is the average observed gas adsorption value, $u_i$ is the predicted value, $\bar{u}$ is the average predicted value, and $n$ is the number of samples in the dataset. Table 28.2 summarizes the ML models developed for CH4 adsorption, H2 adsorption, CO2 adsorption and selective adsorption, and Xe/Kr selective adsorption.
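The validation scheme and the four regression metrics of Eqs. (28.1)–(28.4) can be written out directly. The sketch below is a plain-Python illustration under stated assumptions: the function names are hypothetical, the fold assignment is one simple way of splitting shuffled indices, and the observed and predicted values are made-up numbers.

```python
import math
import random

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is used for testing exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

def _mean(values):
    return sum(values) / len(values)

def squared_pearson(y, u):
    """Eq. (28.1): squared Pearson correlation between observed y and predicted u."""
    y_bar, u_bar = _mean(y), _mean(u)
    cov = sum((yi - y_bar) * (ui - u_bar) for yi, ui in zip(y, u))
    var_y = sum((yi - y_bar) ** 2 for yi in y)
    var_u = sum((ui - u_bar) ** 2 for ui in u)
    return cov ** 2 / (var_y * var_u)

def r_squared(y, u):
    """Eq. (28.2): coefficient of determination."""
    y_bar = _mean(y)
    ss_res = sum((yi - ui) ** 2 for yi, ui in zip(y, u))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def rmse(y, u):
    """Eq. (28.3): root mean squared error."""
    return math.sqrt(sum((yi - ui) ** 2 for yi, ui in zip(y, u)) / len(y))

def ame(y, u):
    """Eq. (28.4): absolute mean error."""
    return sum(abs(yi - ui) for yi, ui in zip(y, u)) / len(y)

# Made-up observed and predicted uptakes for four MOFs.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
scores = {
    "r2_pearson": squared_pearson(observed, predicted),
    "R2": r_squared(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "AME": ame(observed, predicted),
}
splits = list(k_fold_splits(n_samples=10, k=5))
```

Note that r² and R² are not the same quantity: r² measures linear association between predictions and observations, whereas R² penalizes any systematic offset, so for the toy values above r² is slightly higher than R².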

28.5.1 ML Models for CH4 Adsorption

Natural gas remains one of the most cost-effective sources of energy. CH4, the primary component of natural gas, is an attractive fuel due to its low CO2 emissions and has been considered as a replacement for gasoline, especially in the transportation industry. The challenge is that CH4 usually has to be stored at high pressures. Adsorption-based CH4 storage has attracted a lot of attention since it uses porous materials to store CH4 at relatively low pressures. MOFs with high surface areas and pore volumes have proven to be promising candidates for high-density CH4 adsorption (Peng et al. 2013). ML models for predicting CH4 adsorption have focused on finding the best combinations of descriptors to improve prediction accuracy. New chemical descriptors like AP-RDF have been developed to improve prediction performance at low pressures.

Fernandez et al. used six geometric descriptors as the input to build MLR, DT, and SVM models to predict CH4 storage at 1, 35, and 100 bar at 298 K for 137,953 hMOFs from the Northwestern hMOF database (Fernandez et al. 2013b). First, all possible combinations of the six geometric descriptors were used to train MLR models. The results showed that the combination of DPD, VF, and GSA had the best performance, with coefficients of determination (R²) of 0.795 and 0.917 for predicting CH4 storage at 35 and 100 bar, respectively. These three descriptors were then used to train DT and SVM models. The results showed that the SVM model had the best prediction accuracy, with R² of 0.851 and 0.941 for predicting CH4 storage at 35 and 100 bar, respectively. Since DT provides the paths that lead to its predictions, the rules for obtaining optimal MOFs for CH4 storage were also established from the DT model. For CH4 storage at 35 and 100 bar, MOFs should have densities greater than 0.43 g/cm³ and 0.33 g/cm³ and void fractions larger than 0.52 and 0.62, respectively.
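Design rules of this kind translate directly into a screening filter. The sketch below encodes the two DT-derived thresholds quoted above and applies them to a few candidates. The dictionary layout, the function name, and the candidate MOFs with their descriptor values are hypothetical and serve only to illustrate how such rules can be applied.

```python
# Thresholds quoted in the text for CH4 storage (Fernandez et al. 2013b).
# Keys are storage pressures in bar; density is in g/cm3, void fraction is dimensionless.
DESIGN_RULES = {
    35: {"min_density": 0.43, "min_void_fraction": 0.52},
    100: {"min_density": 0.33, "min_void_fraction": 0.62},
}

def passes_rule(mof, pressure_bar):
    """Return True if the MOF satisfies both thresholds for the given pressure."""
    rule = DESIGN_RULES[pressure_bar]
    return (mof["density"] > rule["min_density"]
            and mof["void_fraction"] > rule["min_void_fraction"])

# Hypothetical candidates with made-up descriptor values.
candidates = [
    {"name": "hMOF-A", "density": 0.50, "void_fraction": 0.70},
    {"name": "hMOF-B", "density": 0.40, "void_fraction": 0.65},
    {"name": "hMOF-C", "density": 0.90, "void_fraction": 0.45},
]

selected_35 = [m["name"] for m in candidates if passes_rule(m, 35)]
selected_100 = [m["name"] for m in candidates if passes_rule(m, 100)]
```

Because the thresholds differ between the two pressures, a candidate can be retained for one storage condition and rejected for the other.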
SVM models were also developed with all combinations of two descriptors and the

CO2

70,605

Structural

324,500 58,400 70,433

Structural Structural + Chemical Structural + Chemical

DT

SVM

GBRT

RF

1–10*

40

2.5

Structural

137,953

324,500

137,953

1–10*

Chemical

SVM

SVM

1

0.5

4764 10,608

Structural + Energy Structural + Chemical

RF

RF

1

1

137,953 137,953

Chemical Chemical

SVM

SVM

28,417

Structural + Chemical

4.5

RF

5.8

130,398 4764

Structural + Chemical Structural + Energy

2.5

SVM

RF

35

5.8

137,953

Structural + Chemical Structural

RF

SVM

35

35

130,398 130,398

Structural + Chemical

Poisson

RF

35 Structural + Chemical

130,397 130,398

130,397

Structural + Chemical Structural + Chemical

130,397

Structural + Chemical

Structural + Chemical

4764 27,151

Structural + Energy Structural + Chemical

137,953

Data size

Structural

Type of descriptors

35

SVM

DT

RF

38

38

GBRT

38

35

RF

RF

65

SVM

100

CH4

65

Algorithm

Pressure/bar

Gas

Table 28.2 ML models for predicting gas adsorption of MOFs Refs.

4

0.9423a

0.92

5

0.919b

0.954

3

8

7 7

0.908b

0.944

6

1

3

2

6

6

3

2

5

1

3

5

0.77

0.721

0.886

0.928

0.82

0.83

0.94

0.932

0.9

0.851

0.962

0.97

5

0.9533a

0.84

4 4

0.9608a

3

2

1

0.965

0.955

0.941

Performance (R2 )


CO2 /H2

CO2 /CH4

Gas

Structural

SVM

MLR

NN

RF

SVM

20

20

20

20

DT

GBM

20

20

GBRT

1–10*

40

DT

Structural Structural + Chemical

SVM

RF

0.1

0.05

1–10*

81,679 70,433

Chemical

RF

SVM

0.1

81,679

Structural

Structural

Structural

Structural

Structural

Structural

400

400

400

400

400

400

324,500 58,400

Structural

324,500

137,593

Structural + Chemical

Structural

Structural

81,679 81,679

0.1

Structural

KNN

MLR

Structural

81,679 81,679

0.1

Structural

137,593 32,450

0.1

ANN

DT

0.1

0.1

Chemical

Chemical

SVM

SVM

0.5

0.15

70,433

Structural + Chemical

0.5

137,593

RF

SVM

1

Chemical

Data size

32,450

SVM

2.5

Type of descriptors

Chemical

Algorithm

Pressure/bar

Table 28.2 (continued)

7

0.575

0.734

0.776

0.768

0.779

0.855

11

11

11

11

11

11

8

0.953b 0.872

7

0.948b

10 3

0.752

0.921b

10 6

0.938b 0.69

10 10

0.928b 0.9b

10 10

0.931b 0.907b

6 9

0.75 0.979b

3

9

0.933

0.978b

Refs. 6

0.78

Performance (R2 )


Xe

Kr

H2

Structural

RF

1

LASSO

LASSO

RF

1

1

1

Bayesian

Elastic Net

1

1

RF

ANN

10

1

LASSO

LASSO

1

10

LASSO

RF

10

DNN

SVM

100

1

10

ANN

ANN

1–100*

1–100*

SVM

LASSO

1

2–100*

Structural

ANN

RF

1

1

Structural

Energy

Structural

Energy

Structural

Structural

Structural

Energy

Energy

Energy

Energy

Energy

Energy

13,512

303,791

13,512

303,791

303,791

303,791

13,512

13,512

13,512

13,512

13,512

13,512

13,506 100

Structural

2400 85,000

Structural + Chemical Structural + Energy Structural + Chemical

54,776

400

400

400

400

400

400

Data size

Energy

Structural

Structural

GBM

MLR

Structural

1

DT

1

CO2 /N2

Type of descriptors

1

Algorithm

Pressure/bar

Gas

Table 28.2 (continued) Refs.

0.83

0.393

0.85

0.392

0.392

0.56

0.96

0.96

0.96

0.95

0.96

0.96

17

18

17

18

18

18

17

17

17

17

17

17

15 16

0.998

14

13

12

11

11

11

11

11

11

0.804c

0.88

0.998

0.96

0.347

0.507

0.905

0.394

0.905

0.409

Performance (R2 )


Structural

RF

Ridge

SVM

XGBoost

1

1

1

1

LASSO

RF

1

1

Bayesian

Elastic Net

1

1

RF

ANN

1.01

1

XGBoost

RF

1

10

Structural

Ridge

SVM

1

1

670,000

Structural

Structural

Structural

Structural

Structural

Energy

Structural

Structural

303,791

303,791

303,791

303,791

13,512

303,791

303,791

303,791

303,791

Structural + Energy Structural

13,512

303,791

303,791

303,791

303,791

Data size

Energy

Structural

Structural

RF

1

Type of descriptors

Algorithm

Pressure/bar

Refs.

19

0.687

0.973

0.66

0.688

0.933

0.15

0.686

0.687

18

18

18

18

17

18

18

18

18

2.21d 0.831

17

18

18

18

18

0.58

0.951

0.334

0.393

0.883

Performance (R2 )

Abbreviations: ANN, artificial neural network; DT, decision tree; DNN, deep neural network; GBRT, gradient boosting regression tree; GBM, gradient boosting machine; kNN, k-nearest neighbors; LASSO, least absolute shrinkage and selection operator; MLR, multiple linear regression; RF, random forest; SVM, support vector machine; R², coefficient of determination
* Studied the difference in adsorption capacities between the two pressures
a Squared Pearson correlation coefficient
b AUC
c Accuracy
d RMSE
References: 1: Fernandez et al. (2013b); 2: Fanourgakis et al. (2019); 3: Fanourgakis et al. (2020); 4: Wu et al. (2019); 5: Pardakhti et al. (2017); 6: Fernandez et al. (2013a); 7: Aghaji et al. (2016); 8: Dureckova et al. (2019); 9: Fernandez et al. (2014); 10: Fernandez and Barnard (2016); 11: Anderson et al. (2018); 12: Bucior et al. (2019); 13: Anderson et al. (2019); 14: Thornton et al. (2017); 15: Ma et al. (2020); 16: Borboudakis et al. (2017); 17: Li et al. (2021); 18: Liang et al. (2021); 19: Simon et al. (2015)

Xe/Kr

Gas

Table 28.2 (continued)


SVM models built with void fraction and dominant pore diameter outperformed other models, with cross-validation R² of 0.82 and 0.93 for predicting CH4 storage at 35 and 100 bar, respectively. It was also noticed that the prediction accuracy of the models decreased as the pressure for CH4 storage dropped. The R² of the SVM model for predicting CH4 storage at 1 bar was lower, at 0.72. The univariate correlation analysis in this study also showed that at low pressures the structure–property relationships were diffuse, suggesting very poor correlation between structures and properties. A possible reason for this finding is that at low pressure a large fraction of the pores are empty, so geometric descriptors of the pores cannot provide sufficient information on the gas adsorption properties. At high pressure, in contrast, most pores of MOFs are filled with gas, so there is a stronger correlation between the gas adsorption properties and MOF structures.

To improve prediction of CH4 storage at low pressures, a new chemical descriptor, AP-RDF, was introduced (Fernandez et al. 2013a). The coefficient of determination R² of SVM models constructed with AP-RDF descriptors reached 0.83 and 0.88 for predicting CH4 storage at 35 bar and 100 bar, respectively, which is comparable with the results from the models built using only geometric descriptors. For predicting CH4 storage at low pressures, R² for the SVM models built with AP-RDF reached 0.83, 0.82, and 0.77 at 4.5 bar, 2.5 bar, and 0.5 bar, respectively, while R² for the models constructed with only geometric descriptors was 0.46, 0.52, and 0.61, respectively. The results suggest that the use of chemical descriptors can significantly improve ML models for predicting CH4 storage at low pressures.

The usefulness of chemical descriptors has also been investigated in several other studies (Pardakhti et al. 2017; Fanourgakis et al. 2019, 2020; Wu et al. 2019). Pardakhti et al. investigated CH4 adsorption at 35 bar and 298 K on 130,398 hMOFs from the Northwestern hMOF database (Pardakhti et al. 2017). In their work, in addition to conventional geometric descriptors, several new chemical descriptors were introduced to account for chemical interactions, such as the number of atoms per unit cell, saturation in terms of carbon, metallic percentage, oxygen-to-metal ratio, ratio of electronegative atoms to total atoms, weighted electronegativity per atom, and metal type. Different ML algorithms, including RF, Poisson regression, SVM, and DT, were used to train models based on 10,433 structures. The RF model with both structural and chemical descriptors had the best performance, with a coefficient of determination R² of 0.97 and a mean absolute percentage error (MAPE) of 8.75%. To validate the model, tenfold cross validation was also carried out, and R² and MAPE improved slightly to 0.98 and 7.18%, respectively. They also found that the key descriptors for the model were density, void fraction, surface area, pore diameter, metallic percentage, and degree of carbon unsaturation.

In another work, Wu et al. applied SVM, RF, and gradient boosting regression tree (GBRT) to predict CH4 gravimetric uptake at 38 bar and 298 K in 130,397 hMOFs from the Northwestern hMOF database (Wu et al. 2019). They found that chemical descriptors such as Henry’s coefficient, functional group number density, and atomic number density greatly improved the prediction accuracy. When the descriptors were changed from 8 geometric features to a combined set of 37 features, the squared Pearson correlation coefficient r² of the RF model increased from 0.9407 to 0.9883 and the root mean squared


error (RMSE) decreased from 23.88 to 10.62. Similarly, r² for the GBRT model increased from 0.9373 to 0.9908 and RMSE decreased from 24.55 to 9.4. The VF, SA, and the number of some linkers and corner atoms per volume were found to be the key factors impacting CH4 adsorption, and the optimal hMOFs for volumetric capacity had a VF of around 0.65–0.88, a VSA of 2250 m²/cm³, and a largest cavity diameter (LCD) of around 7.5 Å.

Another chemical descriptor used in recent work is “atom types”, which covers the types of atoms and the atom number density in MOFs (Fanourgakis et al. 2020). In this work, atom types were identified using the LAMMPS package (Plimpton 1995), and MOFs with unidentified atoms were removed from the hMOF datasets. The RF model was trained on the processed dataset. Compared to using geometric descriptors only, the combination of chemical and geometric descriptors increased the coefficient of determination R² from 0.766 to 0.886 at 1 bar, 0.82 to 0.94 at 5.8 bar, 0.868 to 0.962 at 35 bar, and 0.911 to 0.965 at 65 bar.

Energy descriptors have also been used in CH4 storage prediction models. Fanourgakis et al. introduced into the system a small number of particles, called probe atoms, with appropriately defined van der Waals radii and calculated a new set of probe atom descriptors (Fanourgakis et al. 2019). Since probe atom descriptors are based on the potential energy surface of MOFs, they can be classified as energy descriptors. A small dataset of CoRE MOFs with around 4700 structures was used to train RF models for predicting CH4 adsorption at 280 and 298 K and at 1, 5.8, and 65 bar. The RF models with the combination of geometric and energy descriptors had the best performance, while RF models with only energy descriptors performed better than those with only structural descriptors. For example, for predicting CH4 storage at 1 bar and 298 K, the coefficient of determination R² obtained from tenfold cross validation for models with combined descriptors, probe descriptors only, and structural descriptors only was 0.928, 0.892, and 0.686, respectively. As shown in Fig. 28.1, a similar trend can be found for models predicting CH4 storage at other pressures. The results suggest that although energy descriptors are more useful for models at low pressures, structural descriptors can still provide some additional information on gas adsorption.

28.5.2 ML Models for H2 Adsorption

H2 is a clean energy carrier that has been considered as a replacement for fossil fuels in vehicles. However, the storage of H2 is a big challenge due to its low volumetric energy density. MOFs have been demonstrated to be potential candidates for H2 storage due to their high porosity and ultra-high surface areas. Neither experimental characterization nor conventional molecular simulation is practical for finding appropriate MOFs for H2 storage among the enormous number of available MOFs. Therefore, ML models have been used to rapidly identify high-performing MOFs from large-scale databases and to find the optimal design rules for MOFs.


Fig. 28.1 Comparison of random forest models with different types of descriptors: structural descriptors only (yellow bars), energy descriptors only (red bars), and combined structural and energy descriptors (blue bars). The RF models were trained at 1, 5.8, and 65 bar (x-axis) and 298 K. The accuracy of each model is evaluated by the coefficient of determination R² (y-axis)

Thornton et al. used ANN to screen 850,000 nanomaterials, including MOFs and other nanoporous materials, based on their H2 adsorption capacities (Thornton et al. 2017). The gas adsorption properties generated from GCMC simulations were used to train the ANN models. MOFs such as MOF-210, PCN-68, NOTT-400, and ZIF-8 were identified by the ANN model as optimal materials for H2 storage, and these findings are consistent with experimental results (Hirscher 2011; Ibarra et al. 2011; Yan et al. 2018). The optimal design of hMOFs for H2 adsorption was established as a void fraction of around 0.5, a pore diameter of around 10 Å, and a surface area larger than 500 m²/g. This work demonstrates that ML can dramatically reduce the computational cost of large-scale screening of hMOFs for H2 adsorption.

Anderson et al. also used ANN models to explore the theoretical limits of the volumetric H2 storage capacities of hMOFs over a range of temperatures and pressures (Anderson et al. 2019). ANN models were trained on the gas adsorption properties obtained from GCMC simulations. To enable the models to predict the full isotherm,

28 Machine Learning for Predicting Gas Adsorption Capacities of Metal …


the temperatures and pressures were also used as input variables. The highest volumetric deliverable capacity of 62 g/L was found when hydrogen was adsorbed in MOFs at 100 bar/77 K and desorbed at 5 bar/160 K.

Bucior et al. introduced an energy histogram of the interactions between H2 and the framework as a new descriptor and used least absolute shrinkage and selection operator (LASSO) regression models to investigate the relationship between geometric structures and H2 storage (Bucior et al. 2019). The LASSO model was trained on 1000 hMOFs and was used to predict the H2 adsorption of 1250 hMOFs. The model achieved a coefficient of determination R² = 0.96, suggesting the new descriptors can effectively describe the interaction between MOFs and H2. The transferability of the model was also studied in this work by applying it to a different MOF dataset: the experimental MOFs from the Cambridge Crystallographic Data Centre (CCDC). Fifty-one MOFs in the CCDC dataset were identified as high-performing MOFs, which was consistent with GCMC results. One of the top candidates, MFU-4l, was synthesized in the Bucior lab, and its high H2 storage capacity was confirmed.

Ma et al. also investigated the transferability of ML models (Ma et al. 2020). Their goal was to test whether transfer learning could improve the accuracy of prediction models trained on a small dataset. In their work, MOFs were constructed using the ToBaCCo package. A DNN model with two hidden layers was trained on 13,506 MOFs to predict H2 adsorption at 100 bar and 243 K. The coefficient of determination R² obtained was 0.998, suggesting that DNN is a very promising technique for studying the H2 adsorption properties of MOFs. The model was then transferred to predict H2 adsorption at the same pressure but at a temperature lower than 130 K. The R² for the model built with transfer learning was 0.991, while the model trained directly on the small dataset without transfer learning had an R² of only 0.96.
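The fine-tuning idea behind this comparison can be illustrated with a deliberately small sketch. It is not the DNN of Ma et al.: a two-weight linear model stands in for the network, and the "source" and "target" datasets are synthetic values invented for illustration. All function and variable names are hypothetical. The sketch only shows why starting from weights learned on a related task can beat starting from scratch when the target task allows little training.

```python
# Toy warm-start (fine-tuning) sketch. Assumptions: a linear model replaces the
# DNN, and both datasets are synthetic, not MOF adsorption data.

def predict(weights, features):
    return sum(w * x for w, x in zip(weights, features))

def mse(weights, X, y):
    return sum((predict(weights, x) - t) ** 2 for x, t in zip(X, y)) / len(y)

def gradient_descent(initial_weights, X, y, learning_rate=0.1, steps=5):
    """Plain batch gradient descent on the mean squared error."""
    weights = list(initial_weights)
    n = len(y)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, t in zip(X, y):
            error = predict(weights, x) - t
            for j, xj in enumerate(x):
                grad[j] += 2.0 * error * xj / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights

# Source task: larger dataset, trained to convergence.
X_source = [(1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (1, -1), (-1, 1), (-1, -1)]
y_source = [2.0 * a + 1.0 * b for a, b in X_source]

# Target task: related relationship, small dataset, only a few training steps.
X_target = [(1, 0), (0, 1), (1, 1), (1, -1)]
y_target = [2.2 * a + 0.9 * b for a, b in X_target]

w_source = gradient_descent([0.0, 0.0], X_source, y_source, steps=200)
w_finetuned = gradient_descent(w_source, X_target, y_target, steps=5)   # transfer
w_scratch = gradient_descent([0.0, 0.0], X_target, y_target, steps=5)   # no transfer

mse_finetuned = mse(w_finetuned, X_target, y_target)
mse_scratch = mse(w_scratch, X_target, y_target)
print(f"fine-tuned MSE: {mse_finetuned:.4f}, from-scratch MSE: {mse_scratch:.4f}")
```

The advantage in this toy case comes entirely from the source and target relationships being similar; when they are not, the warm start offers no such benefit.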
Similar transfer learning was conducted for CH4 at 298 K and 100 bar. The weights were fine-tuned using CH4 data, and the R² on the 1000 test samples was 0.98 with transfer learning, while direct deep learning with weights trained from scratch had an R² of only 0.935. The results showed that the knowledge learned from H2 adsorption models is useful for developing models that predict CH4 adsorption or H2 adsorption at different conditions. However, the transferred model failed when applied to screening MOFs for Xe/Kr separation, with an R² of −0.092. A possible explanation for this dramatic drop is the lack of common descriptors for the adsorption of gases as different as H2 and Xe/Kr. The key descriptors used for H2 adsorption could not provide sufficient information on Xe/Kr adsorption, suggesting the knowledge is not transferable between H2 and Xe/Kr.
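A negative coefficient of determination, such as the value reported for the Xe/Kr case, means the model predicts worse than simply returning the mean of the observed values. The short sketch below computes R² from its usual definition (one minus the ratio of the residual sum of squares to the total sum of squares) on made-up numbers; the function name and data are illustrative assumptions, not values from the cited studies.

```python
# Coefficient of determination from its definition: R^2 = 1 - SS_res / SS_tot.
# The data below are invented for illustration only.

def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]

r2_good = r_squared(y_true, [1.1, 1.9, 3.2, 3.8])   # close predictions
r2_mean = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])   # always predict the mean
r2_poor = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])   # reversed trend

print(r2_good, r2_mean, r2_poor)
```

Predicting the mean gives exactly zero, so any value below zero indicates that the descriptors carry no usable signal for the new task.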

28.5.3 ML Models for CO2 Adsorption

CO2 is a greenhouse gas, and its capture has been studied extensively because of global warming and related environmental issues. MOFs have several advantages for CO2 capture, including their high porosity and open metal sites, which can greatly enhance CO2 adsorption


strength and increase the CO2 capture capacities of MOFs. The electrostatic interaction between CO2 and the metal sites of MOFs makes conventional modeling of CO2 capture computationally expensive and time-consuming. Thus, ML models have been widely used to study CO2 adsorption and selective adsorption in mixtures such as CO2/CH4 for natural gas purification, and CO2/H2 and CO2/N2 for pre- and post-combustion CO2 capture.

ML models of CO2 adsorption have been used to identify MOFs with the desired CO2 adsorption performance. Fernandez et al. examined CO2 adsorption at pressures of 1 bar and 0.15 bar for 32,450 hypothetical MOFs (Fernandez et al. 2014). SVM models were trained with AP-RDF descriptors to classify MOFs as having high or low CO2 adsorption capacities. The threshold for high CO2 adsorption was set to 1 mmol/g at a pressure of 0.15 bar and 4 mmol/g at a pressure of 1 bar. The classification model successfully recovered 945 and 905 of the top 1000 high-performing MOFs at 0.15 bar and 1 bar, respectively, suggesting that this model could be used to pre-screen MOFs for CO2 uptake.

In a later work, Fernandez et al. studied CO2 capture at 0.1 bar and 298 K and N2 capture at 0.9 bar and 298 K using only geometric descriptors (Fernandez and Barnard 2016). Only 81,679 hypothetical MOFs that have unique frameworks, pore sizes larger than 1.4 Å, and surface areas larger than 100 Å² were included. MLR, DT, KNN, RF, SVM, and ANN models were built to classify MOFs. The threshold separating high- and low-performing MOFs was set to 1 mmol/g for CO2 and 0.5 mmol/g for N2. The RF model had the best performance, with sensitivities of 0.741 and 0.631 and specificities of 0.944 and 0.991 for CO2 and N2, respectively. Pore size, void fraction, and surface area were found to be the three key features affecting the gas adsorption performance of MOFs.
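The classification metrics quoted here follow directly from thresholding uptakes and counting agreements. The sketch below is a minimal illustration with invented uptake values and predicted labels; the helper names are hypothetical, and the threshold is simply expressed in the same units as the uptakes.

```python
# Sensitivity and specificity for a high/low-performing classification.
# Uptake values and predicted labels are invented for illustration.

def label_high_performers(uptakes, threshold):
    """True where the simulated uptake reaches the threshold."""
    return [u >= threshold for u in uptakes]

def sensitivity_specificity(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)

simulated_uptakes = [0.2, 1.5, 0.8, 2.4, 0.1, 1.1, 0.4, 0.9]
true_labels = label_high_performers(simulated_uptakes, threshold=1.0)
predicted_labels = [False, True, False, True, False, False, True, False]

sensitivity, specificity = sensitivity_specificity(true_labels, predicted_labels)
print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")
```

A high specificity paired with a lower sensitivity, as reported for the RF model, means few low performers are wrongly promoted while some true high performers are missed.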
The DT model identified the optimal combination of pore size, void fraction, and surface area for differentiating high- and low-performing MOFs.

In another work, Fanourgakis et al. compared RF models built using different types of descriptors for predicting the adsorption of CO2 at low pressures (0.02, 0.5, and 2.5 bar) and 298 K (Fanourgakis et al. 2020). In this work, the hMOF database was preprocessed to retain structures with identified atom types. The models combining chemical descriptors (atom types) with geometric descriptors had improved accuracy (R² = 0.752, 0.933, and 0.954 for predicting CO2 adsorption at 0.02, 0.5, and 2.5 bar, respectively) compared with the models built using geometric descriptors only (R² = 0.574, 0.843, and 0.954 at the same pressures).

Selective adsorption of CO2 is also studied because of the need to capture CO2 from precombustion gas and post-combustion flue gas and to purify landfill gas and natural gas. MOFs can selectively adsorb a certain gas from a mixture of gases. The selectivity of a MOF for gas i from a mixture of two gases i and j is defined by Eq. (28.5).

$$S = \frac{q_i / q_j}{p_i / p_j} \tag{28.5}$$


where $q_i$ and $q_j$ are the mole fractions of gases i and j adsorbed by the MOF, and $p_i$ and $p_j$ are the mole fractions of gases i and j in the mixture. Many studies used ML to identify the optimal design rules for MOFs with highly selective adsorption of CO2.

For example, Aghaji et al. used geometric descriptors to study the CO2 adsorption capacity and CO2/CH4 adsorption selectivity of 324,500 hMOFs at natural gas purification conditions (Aghaji et al. 2016). The working capacity of a MOF in this work is the difference between the CO2 adsorption at 10 bar and that at 1 bar. DT and SVM models were trained on 32,450 MOFs to classify MOFs as high- or low-performing. The DT models suggested that MOFs with selectivity larger than 10 need void fractions lower than 0.27 and pore sizes smaller than 6.6 Å. For MOFs with CO2 working capacities larger than 4 mmol/g, pore sizes should be smaller than 8.5 Å and surface areas greater than 2300 m²/g. The SVM classifier performed better than the DT model, with AUC values of 0.953 and 0.936 for CO2/CH4 selectivity thresholds of 5 and 10, respectively. From the 292,050 MOFs in the testing dataset, the SVM model identified 900 of the top 1000 high-performing MOFs, which have working capacities larger than 2 mmol/g and selectivities larger than 5.

Anderson et al. studied the selective adsorption of CO2 over N2 and H2 for 400 MOFs constructed using the ToBaCCo software (Anderson et al. 2018). The ML algorithms DT, RF, NN, SVM, and gradient boosting machines (GBM), together with simple chemistry and topology descriptors, were used to develop models for predicting the selectivity. The NN and GBM models had the best performance for CO2 selectivity over N2 at 1 bar. The NN model had an R² of 0.905 and a Spearman rank correlation coefficient (SRCC) of 0.823, while the GBM model yielded an R² of 0.905 and an SRCC of 0.921. For CO2 selectivity over H2, the GBM model had the best performance, with an R² of 0.855 and an SRCC of 0.938.
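As a worked illustration of Eq. (28.5) and of the working capacity used by Aghaji et al., the sketch below evaluates both quantities for a single hypothetical MOF and checks them against the screening cutoffs quoted above (selectivity larger than 5, working capacity larger than 2 mmol/g). The compositions and uptakes are invented numbers, and the function names are assumptions made for this example.

```python
# Selectivity (Eq. 28.5) and working capacity for one hypothetical MOF.
# All numerical inputs are invented for illustration.

def selectivity(q_i, q_j, p_i, p_j):
    """Ratio of the adsorbed-phase composition ratio to the bulk-phase ratio."""
    return (q_i / q_j) / (p_i / p_j)

def working_capacity(uptake_adsorption, uptake_desorption):
    """Difference between uptake at the adsorption and desorption pressures."""
    return uptake_adsorption - uptake_desorption

# Adsorbed phase: 80% CO2 / 20% CH4; bulk mixture: 10% CO2 / 90% CH4.
s_co2_ch4 = selectivity(q_i=0.8, q_j=0.2, p_i=0.1, p_j=0.9)

# CO2 uptake (mmol/g) at 10 bar and at 1 bar.
wc = working_capacity(6.5, 2.0)

passes_screen = s_co2_ch4 > 5 and wc > 2
print(f"S = {s_co2_ch4:.1f}, working capacity = {wc:.1f} mmol/g, passes: {passes_screen}")
```

A selectivity of 1 would mean the adsorbed phase has the same composition as the bulk mixture; values well above 1 indicate preferential adsorption of gas i.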
In another work, Dureckova et al. studied the CO2 adsorption capacity at 40 and 1 bar and the CO2/H2 selectivity at 40 bar and 313 K of 358,400 hMOFs generated using the ToBaCCo software (Dureckova et al. 2019). Geometric descriptors, alone and in combination with AP-RDFs, were used to train gradient-boosted regression tree (GBRT) models. The model combining six geometric descriptors with three AP-RDFs performed the best, with a test R² of 0.944 for predicting CO2 adsorption capacity and an R² of 0.872 for predicting CO2/H2 adsorption selectivity. In addition, the 3000 top-performing MOFs identified by the GBRT models included 997 of the top 1000 MOFs identified from GCMC simulations.
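Statements such as "the top 3000 predicted MOFs included 997 of the top 1000 from GCMC" describe an overlap between two ranked lists. A minimal sketch of that calculation follows; the MOF identifiers and scores are invented, and the function names are hypothetical.

```python
# Overlap between the reference top-k (e.g., GCMC) and the predicted top-k (ML).
# Identifiers and scores are invented for illustration.

def top_k_ids(scores, k):
    """Identifiers of the k highest-scoring entries in a {id: score} mapping."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def recovered_top(reference, predicted, k_reference, k_predicted):
    """How many of the reference top-k appear in the predicted top-k."""
    return len(top_k_ids(reference, k_reference) & top_k_ids(predicted, k_predicted))

gcmc_uptake = {"A": 9.0, "B": 8.0, "C": 7.0, "D": 3.0, "E": 2.0, "F": 1.0}
ml_uptake = {"A": 8.5, "B": 7.0, "C": 7.5, "D": 6.5, "E": 2.5, "F": 1.5}

same_size = recovered_top(gcmc_uptake, ml_uptake, k_reference=2, k_predicted=2)
wider_net = recovered_top(gcmc_uptake, ml_uptake, k_reference=2, k_predicted=3)
print(same_size, wider_net)
```

Taking a predicted shortlist larger than the reference list, as in the 3000-versus-1000 comparison, trades some extra simulation effort for a higher chance of retaining every true top performer.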

28.5.4 ML Models for Xe/Kr Selective Adsorption

The noble gases Xe and Kr have been widely used in industry [e.g., anesthesia (Cullen and Gross 1951; Franks et al. 1998), medical imaging (Albert et al. 1994), and gas lasers (Bridges and Chester 1965; Hoff et al. 1973)]. Since Xe and Kr both exist in the air, the conventional extraction method is to purify Xe and Kr from the binary mixtures obtained as by-products of the air separation process. The generally used method is a cryogenic distillation process based on the different boiling points of Xe


and Kr (164.9 K for Xe and 119.8 K for Kr at 1 atm). However, this process is very energy- and capital-intensive. An alternative, more environmentally friendly and cost-saving method is to use nanoporous materials as adsorbents to selectively adsorb Xe or Kr. MOFs have emerged as promising candidates because of their high porosity and surface areas. Several studies have demonstrated the use of MOFs in Xe/Kr separation (Ryan et al. 2011; Sikora et al. 2012; Banerjee et al. 2016). However, only a few ML models have been developed to study Xe/Kr selective adsorption, and most of them focused on the identification of informative descriptors and the comparison of ML models (Simon et al. 2015; Li et al. 2021; Liang et al. 2021).

Simon et al. utilized RF and DT to predict the adsorption of Xe at low pressure for 670,000 structures in the Nanoporous Materials Genome, which includes experimental MOFs, hypothetical MOFs, zeolites, and covalent organic frameworks (Simon et al. 2015). Both geometric and Voronoi energy descriptors were used to train the models. Voronoi energy descriptors, which are calculated using Voronoi tessellation, a computational geometry approach, describe the interaction energy between Xe and the binding sites of MOFs. The RF model was trained on 15,000 MOFs, and an RMSE of 2.21 for Xe/Kr selectivity was obtained from hold-out validation using the remaining 4066 structures. The Voronoi energy was found to be the most important descriptor, followed by the geometric descriptor of void fraction.

Li et al. (2021) made a similar observation on the importance of energy descriptors when using host/guest interaction energy histograms as the energetic descriptors on the ToBaCCo database. In their work, both LASSO and RF were used to predict the adsorption of Kr/Xe. The results showed that for small, spherical molecules such as Kr and Xe, the linear LASSO models are comparable to the RF models.

Liang et al. developed models using eight ML algorithms (ridge regression, LASSO, Elastic Net, SVM, Bayesian regression, ANN, RF, and XGBoost) to predict the adsorption and separation properties for Xe/Kr in 303,991 structures in the Material Genomic MOFs Database (Liang et al. 2021). The XGBoost model with seven structural descriptors had the best performance on the testing set, with an R² of 0.951 and an RMSE of 0.055 for Xe adsorption and an R² of 0.973 and an RMSE of 0.025 for Xe/Kr selectivity. The density, porosity, pore volume, and PLD of MOFs were found to be the key geometric features affecting the selective adsorption of Xe/Kr. Of the top 1000 MOFs identified by GCMC simulations, 888 and 896 were also among the top-performing MOFs identified by XGBoost for Xe adsorption and Xe/Kr selective adsorption, respectively. The consistency between the XGBoost and GCMC results suggests that the model could be used to find promising candidate MOFs.
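The energy-histogram descriptors mentioned in this section can be pictured as follows: guest-framework interaction energies sampled at many points in a structure are binned, and the normalized bin populations become the feature vector fed to a model such as LASSO or RF. The sketch below is a simplified illustration under stated assumptions: the energies are invented, the bin range and count are arbitrary choices, and values outside the range are clamped into the end bins. It does not reproduce the exact procedure of the cited studies.

```python
# Turning sampled interaction energies into a normalized histogram descriptor.
# Energies (kJ/mol), bin range, and bin count are invented for illustration.

def energy_histogram(energies, e_min, e_max, n_bins):
    """Fraction of sampled points falling in each equal-width energy bin."""
    width = (e_max - e_min) / n_bins
    counts = [0] * n_bins
    for energy in energies:
        index = int((energy - e_min) / width)
        index = min(max(index, 0), n_bins - 1)  # clamp out-of-range values
        counts[index] += 1
    return [count / len(energies) for count in counts]

sampled_energies = [-9.5, -7.2, -6.8, -3.1, -2.9, -0.4, 0.0, 5.0]
descriptor = energy_histogram(sampled_energies, e_min=-10.0, e_max=0.0, n_bins=5)
print(descriptor)
```

Each structure then contributes one fixed-length vector regardless of its unit-cell size, which is what makes such histograms convenient inputs for linear models.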

28.6 Conclusion Remarks and Future Perspective

Recently, MOFs have attracted considerable attention in gas adsorption-based applications because of their high porosity and chemical tunability. As summarized in this chapter, gas adsorption-based applications have greatly benefited from the incorporation


of ML. ML techniques have shown good prediction performance and can be used to identify high-performing MOFs and to design novel MOFs. Although many efforts have been made, the adoption of ML in the identification and design of MOFs with high gas adsorption capacities is only beginning. With increasing numbers of MOFs emerging every day, more investigations are needed to improve the performance of ML models so that their use in MOF gas adsorption applications can be expanded.

Selection of descriptors is important to the development of reliable ML models for predicting the gas adsorption of MOFs. Many descriptors have been developed for MOFs, ranging from structural descriptors (geometric and topological descriptors) to chemical descriptors and energy descriptors. ML models with structural descriptors performed poorly in predicting the gas adsorption of MOFs at low pressures. A possible reason is that at low pressure most pores in MOFs are empty, and therefore pore structural descriptors correlate poorly with gas adsorption properties. This problem is overcome by the development of chemical and energy descriptors. However, depending on the adsorbate gas, the MOF database, and the operating conditions, the performance of ML models constructed with combinations of different types of descriptors varies significantly. Therefore, continued work is required to develop new descriptors and to identify widely applicable descriptors that yield ML models with improved performance in predicting the gas adsorption of MOFs.

A clear trend in recent studies on predicting the gas adsorption capacity of MOFs is that most of the predictive models were developed using traditional ML algorithms. The continuously increasing sizes of MOF datasets and the complicated relationships between MOFs and their gas adsorption capacities make deep learning a suitable approach for exploring the hidden patterns.
We expect that more deep learning models will be developed to improve the prediction of gas adsorption of MOFs in the future.

Disclaimer: This chapter reflects the views of the authors and does not necessarily reflect those of the U.S. Food and Drug Administration.

References

Aghaji MZ, Fernandez M, Boyd PG, Daff TD, Woo TK (2016) Quantitative structure–property relationship models for recognizing metal organic frameworks (MOFs) with high CO2 working capacity and CO2/CH4 selectivity for methane purification. Eur J Inorg Chem 2016(27):4505–4511
Ahmed A, Siegel DJ (2021) Predicting hydrogen storage in MOFs via machine learning. Patterns 2(7):100291
Ahmed A, Seth S, Purewal J, Wong-Foy AG, Veenstra M, Matzger AJ, Siegel DJ (2019) Exceptional hydrogen storage achieved by screening nearly half a million metal-organic frameworks. Nat Commun 10(1):1568


Albert MS, Cates GD, Driehuys B, Happer W, Saam B, Springer CS, Wishnia A (1994) Biological magnetic resonance imaging using laser-polarized 129Xe. Nature 370(6486):199–201
Alezi D, Belmabkhout Y, Suyetin M, Bhatt PM, Weseliński ŁJ, Solovyeva V, Adil K, Spanopoulos I, Trikalitis PN, Emwas A-H, Eddaoudi M (2015) MOF crystal chemistry paving the way to gas storage needs: aluminum-based soc-MOF for CH4, O2, and CO2 storage. J Am Chem Soc 137(41):13308–13318
Altintas C, Altundal OF, Keskin S, Yildirim R (2021) Machine learning meets with metal organic frameworks for gas storage and separation. J Chem Inf Model 61(5):2131–2146
Anderson R, Rodgers J, Argueta E, Biong A, Gómez-Gualdrón DA (2018) Role of pore chemistry and topology in the CO2 capture capabilities of MOFs: from molecular simulation to machine learning. Chem Mater 30(18):6325–6337
Anderson G, Schweitzer B, Anderson R, Gómez-Gualdrón DA (2019) Attainable volumetric targets for adsorption-based hydrogen storage in porous crystals: molecular simulation and machine learning. J Phys Chem C 123(1):120–130
Anderson R, Biong A, Gómez-Gualdrón DA (2020) Adsorption isotherm predictions for multiple molecules in MOFs using the same deep learning model. J Chem Theor Comput 16(2):1271–1283
Banerjee D, Simon CM, Plonka AM, Motkuri RK, Liu J, Chen X, Smit B, Parise JB, Haranczyk M, Thallapally PK (2016) Metal–organic framework with optimally selective xenon adsorption and separation. Nat Commun 7(1)
Batten SR, Champness NR, Chen X-M, Garcia-Martinez J, Kitagawa S, Öhrström L, O’Keeffe M, Paik Suh M, Reedijk J (2013) Terminology of metal–organic frameworks and coordination polymers (IUPAC recommendations 2013). Pure Appl Chem 85(8):1715–1724
Beauregard N, Pardakhti M, Srivastava R (2021) In silico evolution of high-performing metal organic frameworks for methane adsorption. J Chem Inf Model 61(7):3232–3239
Begum S, Karim ANM, Ansari MNM, Hashmi MSJ (2020) In: Hashmi S, Choudhury IA (eds) Encyclopedia of renewable and sustainable materials. Elsevier, Oxford, pp 515–539
Borboudakis G, Stergiannakos T, Frysali M, Klontzas E, Tsamardinos I, Froudakis GE (2017) Chemically intuited, large-scale screening of MOFs by machine learning techniques. npj Comput Mater 3(1):40
Boyd PG, Woo TK (2016) A generalized method for constructing hypothetical nanoporous materials of any net topology from graph theory. CrystEngComm 18(21):3777–3792
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Bridges WB, Chester AN (1965) Visible and UV laser oscillation at 118 wavelengths in ionized neon, argon, krypton, xenon, oxygen, and other gases. Appl Opt 4(5):573–580
Bucior BJ, Bobbitt NS, Islamoglu T, Goswami S, Gopalan A, Yildirim T, Farha OK, Bagheri N, Snurr RQ (2019) Energy-based descriptors to rapidly predict hydrogen storage in metal–organic frameworks. Mol Syst Des Eng 4(1):162–174
Burns TD, Pai KN, Subraveti SG, Collins SP, Krykunov M, Rajendran A, Woo TK (2020) Prediction of MOF performance in vacuum swing adsorption systems for postcombustion CO2 capture based on integrated molecular simulations, process optimizations, and machine learning models. Environ Sci Technol 54(7):4536–4544
Chellaram C, Murugaboopathi G, John AA, Sivakumar R, Ganesan S, Krithika S, Priya G (2014) Significance of nanotechnology in food industry. APCBEE Proc 8:109–113
Cheng F, Hong H, Yang S, Wei Y (2017) Individualized network-based drug repositioning infrastructure for precision oncology in the panomics era. Brief Bioinform 18(4):682–697
Chong S, Lee S, Kim B, Kim J (2020) Applications of machine learning in metal-organic frameworks. Coord Chem Rev 423:213487
Chung YG, Camp J, Haranczyk M, Sikora BJ, Bury W, Krungleviciute V, Yildirim T, Farha OK, Sholl DS, Snurr RQ (2014) Computation-ready, experimental metal–organic frameworks: a tool to enable high-throughput screening of nanoporous crystals. Chem Mater 26(21):6185–6192
Chung YG, Haldoupis E, Bucior BJ, Haranczyk M, Lee S, Zhang H, Vogiatzis KD, Milisavljevic M, Ling S, Camp JS, Slater B, Siepmann JI, Sholl DS, Snurr RQ (2019) Advances, updates, and analytics for the computation-ready, experimental metal–organic framework database: CoRE MOF 2019. J Chem Eng Data 64(12):5985–5998
Colón YJ, Gómez-Gualdrón DA, Snurr RQ (2017) Topologically guided, automated construction of metal–organic frameworks and their evaluation for energy-related applications. Cryst Growth Des 17(11):5801–5810
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Cullen SC, Gross EG (1951) The anesthetic properties of xenon in animals and human beings, with additional observations on krypton. Science 113(2942):580–582
Deng X, Yang W, Li S, Liang H, Shi Z, Qiao Z (2020) Large-scale screening and machine learning to predict the computation-ready, experimental metal-organic frameworks for CO2 capture from air. Appl Sci 10(2):569
Dureckova H, Krykunov M, Aghaji MZ, Woo TK (2019) Robust machine learning models for predicting high CO2 working capacity and CO2/H2 selectivity of gas adsorption in metal organic frameworks for precombustion carbon capture. J Phys Chem C 123(7):4133–4139
Edelsbrunner H, Harer J (2008) Persistent homology—a survey. Discrete Comput Geom 453
Evans JD, Fraux G, Gaillac R, Kohen D, Trousselet F, Vanson J-M, Coudert F-X (2017) Computational chemistry methods for nanoporous materials. Chem Mater 29(1):199–212
Fanourgakis GS, Gkagkas K, Tylianakis E, Klontzas E, Froudakis G (2019) A robust machine learning algorithm for the prediction of methane adsorption in nanoporous materials. J Phys Chem A 123(28):6080–6087
Fanourgakis GS, Gkagkas K, Tylianakis E, Froudakis GE (2020) A universal machine learning algorithm for large-scale screening of materials. J Am Chem Soc 142(8):3814–3822
Fernandez M, Barnard AS (2016) Geometrical properties can predict CO2 and N2 adsorption performance of metal–organic frameworks (MOFs) at low pressure. ACS Comb Sci 18(5):243–252
Fernandez M, Trefiak NR, Woo TK (2013a) Atomic property weighted radial distribution functions descriptors of metal–organic frameworks for the prediction of gas uptake capacity. J Phys Chem C 117(27):14095–14105
Fernandez M, Woo TK, Wilmer CE, Snurr RQ (2013b) Large-scale quantitative structure–property relationship (QSPR) analysis of methane storage in metal–organic frameworks. J Phys Chem C 117(15):7681–7689
Fernandez M, Boyd PG, Daff TD, Aghaji MZ, Woo TK (2014) Rapid and accurate machine learning recognition of high performing metal organic frameworks for CO2 capture. J Phys Chem Lett 5(17):3056–3060
Franks NP, Dickinson R, De Sousa SL, Hall AC, Lieb WR (1998) How does xenon produce anaesthesia? Nature 396(6709):324
Gomollón-Bel F (2019) Ten chemical innovations that will change our world: IUPAC identifies emerging technologies in chemistry with potential to make our planet more sustainable. Chem Int 41(2):12–17
Gülsoy Z, Sezginel KB, Uzun A, Keskin S, Yıldırım R (2019) Analysis of CH4 uptake over metal–organic frameworks using data-mining tools. ACS Comb Sci 21(4):257–268
Hirscher M (2011) Hydrogen storage by cryoadsorption in ultrahigh-porosity metal–organic frameworks. Angew Chem Int Ed 50(3):581–582
Hoff PW, Swingle JC, Rhodes CK (1973) Observations of stimulated emission from high-pressure krypton and argon/xenon mixtures. Appl Phys Lett 23(5):245–246
Hofmann-Amtenbrink M, Grainger DW, Hofmann H (2015) Nanoparticles in medicine: current challenges facing inorganic nanoparticle toxicity assessments and standardizations. Nanomedicine 11(7):1689–1694
Hong H, Neamati N, Winslow HE, Christensen JL, Orr A, Pommier Y, Milne GW (1998) Identification of HIV-1 integrase inhibitors based on a four-point pharmacophore. Antivir Chem Chemother 9(6):461–472
Hong H, Tong W, Xie Q, Fang H, Perkins R (2005) An in silico ensemble method for lead discovery: decision forest. SAR QSAR Environ Res 16(4):339–347


Hong H, Thakkar S, Chen M, Tong W (2017) Development of decision forest models for prediction of drug-induced liver injury in humans using a large set of FDA-approved drugs. Sci Rep 7(1):17311
Hong H, Zhu J, Chen M, Gong P, Zhang C, Tong W (2018) In: Chen M, Will Y (eds) Drug-induced liver toxicity. Springer, New York, pp 77–100
Huang Y, Li X, Xu S, Zheng H, Zhang L, Chen J, Hong H, Kusko R, Li R (2020) Quantitative structure–activity relationship models for predicting inflammatory potential of metal oxide nanoparticles. Environ Health Perspect 128(6):067010
Ibarra IA, Yang S, Lin X, Blake AJ, Rizkallah PJ, Nowell H, Allan DR, Champness NR, Hubberstey P, Schröder M (2011) Highly porous and robust scandium-based metal–organic frameworks for hydrogen storage. Chem Commun 47(29):8304–8306
Jablonka KM, Ongari D, Moosavi SM, Smit B (2020) Big-data science in porous materials: materials genomics and machine learning. Chem Rev 120(16):8066–8129
Krishnapriyan AS, Montoya J, Haranczyk M, Hummelshøj J, Morozov D (2021) Machine learning with persistent homology and chemical word embeddings improves prediction accuracy and interpretability in metal-organic frameworks. Sci Rep 11(1):8888
Lee Y, Barthel SD, Dłotko P, Moosavi SM, Hess K, Smit B (2017) Quantifying similarity of pore-geometry in nanoporous materials. Nat Commun 8(1):15396
Lee S, Kim B, Cho H, Lee H, Lee SY, Cho ES, Kim J (2021) Computational screening of trillions of metal–organic frameworks for high-performance methane storage. ACS Appl Mater Interfaces 13(20):23647–23654
Li Z, Bucior BJ, Chen H, Haranczyk M, Siepmann JI, Snurr RQ (2021) Machine learning using host/guest energy histograms to predict adsorption in metal–organic frameworks: application to short alkanes and Xe/Kr mixtures. J Chem Phys 155(1):014701
Liang H, Jiang K, Yan T-A, Chen G-H (2021) XGBoost: an optimal machine learning model with just structural features to discover MOF adsorbents of Xe/Kr. ACS Omega 6(13):9066–9076
Luo H, Ye H, Ng HW, Shi L, Tong W, Mendrick DL, Hong H (2015) Machine learning methods for predicting HLA-peptide binding activity. Bioinform Biol Insights 9(Suppl 3):21–29
Ma R, Colón YJ, Luo T (2020) Transfer learning study of gas adsorption in metal–organic frameworks. ACS Appl Mater Interfaces 12(30):34041–34048
Moghadam PZ, Li A, Wiggin SB, Tao A, Maloney AGP, Wood PA, Ward SC, Fairen-Jimenez D (2017) Development of a Cambridge structural database subset: a collection of metal–organic frameworks for past, present, and future. Chem Mater 29(7):2618–2625
Nazarian D, Camp JS, Sholl DS (2016) A comprehensive set of high-quality point charges for simulations of metal–organic frameworks. Chem Mater 28(3):785–793
Nazarian D, Camp JS, Chung YG, Snurr RQ, Sholl DS (2017) Large-scale refinement of metal–organic framework structures using density functional theory. Chem Mater 29(6):2521–2528
Ng HW, Zhang W, Shu M, Luo H, Ge W, Perkins R, Tong W, Hong H (2014) Competitive molecular docking approach for predicting estrogen receptor subtype α agonists and antagonists. BMC Bioinform 15(11):S4
Ng HW, Doughty SW, Luo H, Ye H, Ge W, Tong W, Hong H (2015a) Development and validation of decision forest model for estrogen receptor binding prediction of chemicals using large data sets. Chem Res Toxicol 28(12):2343–2351
Ng HW, Shu M, Luo H, Ye H, Ge W, Perkins R, Tong W, Hong H (2015b) Estrogenic activity data extraction and in silico prediction show the endocrine disruption potential of bisphenol A replacement compounds. Chem Res Toxicol 28(9):1784–1795
Ohno H, Mukae Y (2016) Machine learning approach for prediction and search: application to methane storage in a metal–organic framework. J Phys Chem C 120(42):23963–23968
Pardakhti M, Moharreri E, Wanik D, Suib SL, Srivastava R (2017) Machine learning using combined structural and chemical descriptors for prediction of methane adsorption performance of metal organic frameworks (MOFs). ACS Comb Sci 19(10):640–645


Peng Y, Krungleviciute V, Eryazici I, Hupp JT, Farha OK, Yildirim T (2013) Methane storage in metal–organic frameworks: current records, surprise findings, and challenges. J Am Chem Soc 135(32):11887–11894
Plimpton S (1995) Fast parallel algorithms for short-range molecular dynamics. J Comput Phys 117(1):1–19
Pomerantseva E, Bonaccorso F, Feng X, Cui Y, Gogotsi Y (2019) Energy storage: the future enabled by nanomaterials. Science 366(6468)
Ryan P, Farha OK, Broadbelt LJ, Snurr RQ (2011) Computational screening of metal-organic frameworks for xenon/krypton separation. AIChE J 57(7):1759–1766
Sakkiah S, Selvaraj C, Gong P, Zhang C, Tong W, Hong H (2017) Development of estrogen receptor beta binding prediction model using large sets of chemicals. Oncotarget 8(54):92989–93000
Sakkiah S, Guo W, Pan B, Ji Z, Yavas G, Azevedo M, Hawes J, Patterson TA, Hong H (2021) Elucidating interactions between SARS-CoV-2 trimeric spike protein and ACE2 using homology modeling and molecular dynamics simulations. Front Chem 8(1247)
Sarkisov L, Harrison A (2011) Computational structure characterisation tools in application to ordered and disordered porous materials. Mol Simul 37(15):1248–1257
Selvaraj C, Sakkiah S, Tong W, Hong H (2018) Molecular dynamics simulations and applications in computational toxicology and nanotoxicology. Food Chem Toxicol 112:495–506
Shen J, Xu L, Fang H, Richard AM, Bray JD, Judson RS, Zhou G, Colatsky TJ, Aungst JL, Teng C, Harris SC, Ge W, Dai SY, Su Z, Jacobs AC, Harrouk W, Perkins R, Tong W, Hong H (2013) EADB: an estrogenic activity database for assessing potential endocrine activity. Toxicol Sci 135(2):277–291
Shi L, Tong W, Fang H, Xie Q, Hong H, Perkins R, Wu J, Tu M, Blair RM, Branham WS, Waller C, Walker J, Sheehan DM (2002) An integrated “4-phase” approach for setting endocrine disruption screening priorities-phase I and II predictions of estrogen receptor binding affinity. SAR QSAR Environ Res 13(1):69–88
Sikora BJ, Wilmer CE, Greenfield ML, Snurr RQ (2012) Thermodynamic analysis of Xe/Kr selectivity in over 137 000 hypothetical metal–organic frameworks. Chem Sci 3(7):2217–2223
Simon CM, Mercado R, Schnell SK, Smit B, Haranczyk M (2015) What are the best materials to separate a xenon/krypton mixture? Chem Mater 27(12):4459–4475
Tan K, Zuluaga S, Gong Q, Gao Y, Nijem N, Li J, Thonhauser T, Chabal YJ (2015) Competitive coadsorption of CO2 with H2O, NH3, SO2, NO, NO2, N2, O2, and CH4 in M-MOF-74 (M = Mg, Co, Ni): the role of hydrogen bonding. Chem Mater 27(6):2203–2217
Tan H, Wang X, Hong H, Benfenati E, Giesy JP, Gini GC, Kusko R, Zhang X, Yu H, Shi W (2020) Structures of endocrine-disrupting chemicals determine binding to and activation of the estrogen receptor α and androgen receptor. Environ Sci Technol 54(18):11424–11433
Thornton AW, Simon CM, Kim J, Kwon O, Deeg KS, Konstas K, Pas SJ, Hill MR, Winkler DA, Haranczyk M, Smit B (2017) Materials genome in action: identifying the performance limits of physical hydrogen storage. Chem Mater 29(7):2844–2854
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc Ser B (Methodol) 58(1):267–288
Willems TF, Rycroft CH, Kazi M, Meza JC, Haranczyk M (2012) Algorithms and tools for high-throughput geometry-based analysis of crystalline porous materials. Microporous Mesoporous Mater 149(1):134–141
Wilmer CE, Leaf M, Lee CY, Farha OK, Hauser BG, Hupp JT, Snurr RQ (2012) Large-scale screening of hypothetical metal–organic frameworks. Nat Chem 4(2):83–89
Wu X, Xiang S, Su J, Cai W (2019) Understanding quantitative relationship between methane storage capacities and characteristic properties of metal–organic frameworks based on machine learning. J Phys Chem C 123(14):8550–8559
Yan Y, Da Silva I, Blake AJ, Dailly A, Manuel P, Yang S, Schröder M (2018) High volumetric hydrogen adsorption in a porous anthracene-decorated metal-organic framework. Inorg Chem 57(19):12050–12055


Yan Y, Zhang L, Li S, Liang H, Qiao Z (2021) Adsorption behavior of metal-organic frameworks: from single simulation, high-throughput computational screening to machine learning. Comput Mater Sci 193:110383
Yu H, Li L, Zhang Y (2012) Silver nanoparticle-based thermal interface materials with ultra-low thermal resistance for power electronics applications. Scr Mater 66(11):931–934
Zhang X, Cui J, Zhang K, Wu J, Lee Y (2019) Machine learning prediction on properties of nanoporous materials utilizing pore geometry barcodes. J Chem Inf Model 59(11):4636–4644