In Silico Chemistry and Biology: Current and Future Prospects 9783110495171

In Silico Chemistry and Biology: Current and Future Prospects provides a compact overview on recent advances in this hig


English Pages 209 [210] Year 2022


In Silico Chemistry and Biology: Current and Future Prospects

Table of contents :
Half Title
Also of interest
In Silico Chemistry and Biology: Current and Future Prospects
List of contributing authors
1. Pharmaceutical interest of in-silico approaches
1.1 Introduction
1.1.1 Target recognition
1.1.2 Target confirmation
1.1.3 Lead discovery
1.1.4 Lead optimization
1.1.5 Preclinical studies
1.1.6 Clinical trials
1.2 Approaches
1.2.1 Homology modeling (HM)
Applications
1.2.2 Molecular docking (Interaction Networks)
Application
1.2.3 Virtual high-throughput screening
Application
1.2.4 Quantitative structure-activity relationship (QSAR)
Application
1.2.5 Hologram quantitative structure-activity relationship (HQSAR)
1.2.6 Comparative molecular similarity indices analysis (CoMSIA)
1.2.7 3D-pharmacophore mapping
Application
1.2.8 De novo design based on 3D-pharmacophore mapping
1.2.9 Microarray analysis
Application
1.2.10 Conformational analysis
1.2.11 Monte Carlo simulation
Application
1.2.12 Molecular dynamic (MD) simulation
Application
2. Novel drug design and bioinformatics: an introduction
2.1 Introduction
2.2 Structure-based drug design
2.2.1 Homology modelling
2.2.2 Ligand docking
2.2.3 Fragment-based drug design
2.2.4 Molecular dynamics
2.3 Ligand-based drug design
2.3.1 Similarity search
2.3.2 Pharmacophore mapping
2.3.3 Quantitative structure-activity relationship
2.4 Quantum mechanics/molecular mechanics
2.5 Proteochemometrics modelling
2.6 Deep learning approach
2.7 Summary and outlook
3. In silico drug design: application and success
3.1 Introduction to in silico drug design
3.1.1 Introduction
3.1.2 Classification
3.1.3 Structure-based drug design (SBDD)
3.1.4 Molecular docking
3.1.5 Pharmacophore generation
3.1.6 Virtual screening (VS)
3.2 SBDD and applications
3.2.1 Introduction
3.2.2 The successful drugs developed using in silico approaches
Similarities between carboxypeptidase A and ACE enzymes
Dorzolamide
MK-927 – structural features crucial for identifying the difference in the potency
Aliskiren
Saquinavir
Zanamivir
LY-517717
3.3 Ligand based drug design (LBDD) and applications
3.3.1 Introduction
3.3.2 Molecular descriptors-role in LBDD
3.3.3 In silico applications of QSAR analysis
3.3.4 Extended applications of QSAR combined with SBDD techniques
3.3.5 Applications of mt-QSAR and mtk-QSAR models
3.3.6 Success in the field of LBDD
3.4 In silico approaches-application to predict pharmacokinetic parameters and toxicity (ADMET)
3.4.1 In silico tools to predict absorption
Introduction
Available databases
Absorption parameters
Percentage absorption (%HIA)
Caco2 (cancer coli-2) permeability
Parallel artificial membrane permeability (PAMPA)
Madin-Darby canine kidney (MDCK) epithelial cells used to model absorption
Kp skin permeability coefficient
3.4.2 In silico tools to predict the distribution
Introduction
Human serum albumin and other proteins: influence on the distribution
Application of QSAR studies to predict the relationship between structure and distribution
Case studies
Plasma clearance prediction
An overview of blood-brain barrier (BBB) penetration and in silico prediction
Introduction
Permeability-glycoprotein/P-glycoprotein/P-gp
Role of Pgp-in drug absorption
Breast cancer resistance protein (ABCG2 or BCRP)
Successful applications
The relation between molecular descriptors and BBB permeation
3.4.3 In silico tools to predict metabolism
3.4.4 In silico tools to predict toxicity
hERG prediction & cardiotoxicity
Cardiotoxicity
Drugs with hERG-related cardiotoxicity
Terfenadine, Astemizole
Cisapride
Chloroquine & quinidine
Amantadine
Antipsychotics
Class III antiarrhythmic agents
Antihypertensive agents
In silico methods
Ligand-based methods
Structure-based methods
Structural details of hERG
3D-structural details of hERG (KCNH2 or Kv 11.1) channel
Important in silico tools to predict hERG toxicity
Application of in silico tools for the prediction of toxicity
3.5 Conclusion
4. Protein modeling
4.1 Proteins
4.2 Bioinformatics and the importance of computational tools
4.3 Homologous structures and de novo protein design
4.4 Protein data bank
4.5 Molecular modeling
4.5.1 Comparative modeling
4.5.2 Free modeling
4.6 Selected computational tools
4.6.1 PyMOL
4.6.2 Pfam
4.7 SWISS-MODEL
4.8 Critical assessment of protein structure prediction (CASP)
4.9 Conclusion
5. Fragment based drug design
5.1 Introduction
5.1.1 Fragment
5.1.2 Design of library
5.1.3 Identification of appropriate fragment to develop (biophysical or biochemical techniques, which interrogate the ligand–target binding) [1, 14, 15]
5.1.4 Elaborating its chemical structure to generate a useful lead compound
5.1.5 Growing
5.1.6 Merging
5.1.7 Linking
5.2 Conclusion
6. An overview of in silico methods used in the design of VEGFR-2 inhibitors as anticancer agents
6.1 Introduction
6.2 History of earlier FDA-approved VEGFR inhibitors and the recent development
6.3 Structure of VEGFR-2
6.4 Applications of in silico studies in the exploration of VEGFR-2 inhibitors
6.4.1 Design of novel piperazine–chalcone hybrids as VEGFR-2 kinase inhibitors
6.4.2 Docking model of 1-piperazinyl-phthalazines as potential VEGFR-2 inhibitors
6.4.3 Identification of BAW2881 as a potent VEGFR-2 inhibitor: a success story
6.4.4 Molecular modeling studies on thienopyrimidine scaffold as VEGFR-2 inhibitors
6.4.5 Identification of new VEGFR-2 kinase inhibitors: pharmacophore modeling and virtual screening
6.4.6 Molecular modeling of quinazoline containing 1,3,4-oxadiazole scaffold as VEGFR-2 inhibitor
6.4.7 Identification of covalently binding, irreversible VEGFR-2 kinase domain inhibitors
6.4.8 Molecular docking study of novel N-(2-carbamoyl-6-methoxyphenyl)-3,4,5-trimethoxybenzamide derivative as VEGFR-2 tyrosine kinase inhibitor
6.5 Conclusions
7. Molecular docking and MD: mimicking the real biological process
7.1 Introduction
7.2 AutoDock: docking of flexible ligands to receptors
7.3 AutoDock: coordinate file preparation
7.4 Autogrid calculation
7.5 Docking performed using AutoDock
7.6 Analysis performed using AutoDock tools
7.7 AutoDock result
7.8 Molecular dynamic simulations and history
7.9 PDB structure and need of 3D conformation study
7.10 Conformational changes are a common part of an enzyme’s catalytic cycle
7.11 The overview of calculating MD simulation
7.12 GPU and high computation power in MD simulations
7.13 World’s fastest computer and MD simulations
7.14 Force field: need and selection
7.15 Benefits/outcomes of MD simulations
7.16 Limitations and future prospects of MD simulations
8. Molecular docking studies of tea (Thea sinensis Linn.) polyphenols inhibition pattern with Rat P-glycoprotein
8.1 Introduction
8.2 Materials and methods
8.3 Results
8.4 Discussion
8.5 Conclusion
9. Statistical methods for in silico tools used for risk assessment and toxicology
9.1 Background
9.2 Risk assessment comprises four processes
9.2.1 Hazard identification
9.2.2 Exposure assessment
9.2.3 Effect assessment
9.2.4 Risk characterization
9.3 Risk management
9.3.1 In silico tools used for risk assessment
Structure-activity relationships (SARs)
QSARs
Read-across
Expert opinion
9.3.2 Statistical methods for in silico risk assessment
Regression models
Classification models
Model relevance
10. Systems biology–the transformative approach to integrate sciences across disciplines
10.1 Introduction
10.2 Transforming biology-insights from the systems biology approach
10.2.1 Systems and systems biology
10.2.2 Network modelling in systems biology
10.2.3 From systems biology to synthetic biology
10.2.4 Applications of synthetic biology
10.3 Challenges and future directions
10.4 Conclusions

Citation preview

Girish Kumar Gupta, Mohammad Hassan Baig (Eds.) In Silico Chemistry and Biology


In Silico Chemistry and Biology Current and Future Prospects Edited by Girish Kumar Gupta and Mohammad Hassan Baig

Editors
Dr. Girish Kumar Gupta, Department of Pharmaceutical Chemistry, Sri Sai College of Pharmacy, Badhani, Pathankot, Punjab, India; and Research and Development, Sri Sai Group of Institutes, Badhani, Pathankot 145001, Punjab, India
Dr. Mohammad Hassan Baig, Department of Family Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Republic of Korea

ISBN 978-3-11-049517-1
e-ISBN (PDF) 978-3-11-049395-5
e-ISBN (EPUB) 978-3-11-049245-3
Library of Congress Control Number: 2022931619
Bibliographic information published by the Deutsche Nationalbibliothek: The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at
© 2022 Walter de Gruyter GmbH, Berlin/Boston
Cover image: snowflock/iStock/Getty Images Plus
Typesetting: TNQ Technologies Pvt. Ltd.
Printing and binding: CPI books GmbH, Leck

Contents

List of contributing authors

Dinesh Kumar, Pooja Sharma, Ayush Mahajan, Ravi Dhawan, and Kamal Dua
1 Pharmaceutical interest of in-silico approaches
1.1 Introduction
1.1.1 Target recognition
1.1.2 Target confirmation
1.1.3 Lead discovery
1.1.4 Lead optimization
1.1.5 Preclinical studies
1.1.6 Clinical trials
1.2 Approaches
1.2.1 Homology modeling (HM)
1.2.2 Molecular docking (Interaction Networks)
1.2.3 Virtual high-throughput screening
1.2.4 Quantitative structure-activity relationship (QSAR)
1.2.5 Hologram quantitative structure-activity relationship (HQSAR)
1.2.6 Comparative molecular similarity indices analysis (CoMSIA)
1.2.7 3D-pharmacophore mapping
1.2.8 De novo design based on 3D-pharmacophore mapping
1.2.9 Microarray analysis
1.2.10 Conformational analysis
1.2.11 Monte Carlo simulation
1.2.12 Molecular dynamic (MD) simulation
References

Mohammad Kalim Ahmad Khan and Salman Akhtar
2 Novel drug design and bioinformatics: an introduction
2.1 Introduction
2.2 Structure-based drug design
2.2.1 Homology modelling
2.2.2 Ligand docking
2.2.3 Fragment-based drug design
2.2.4 Molecular dynamics
2.3 Ligand-based drug design
2.3.1 Similarity search
2.3.2 Pharmacophore mapping
2.3.3 Quantitative structure-activity relationship
2.4 Quantum mechanics/molecular mechanics
2.5 Proteochemometrics modelling
2.6 Deep learning approach
2.7 Summary and outlook
References

Shaheen Begum, Mohammad Zubair Shareef and Koganti Bharathi
3 In silico drug design: application and success
3.1 Introduction to in silico drug design
3.1.1 Introduction
3.1.2 Classification
3.1.3 Structure-based drug design (SBDD)
3.1.4 Molecular docking
3.1.5 Pharmacophore generation
3.1.6 Virtual screening (VS)
3.2 SBDD and applications
3.2.1 Introduction
3.2.2 The successful drugs developed using in silico approaches
3.3 Ligand based drug design (LBDD) and applications
3.3.1 Introduction
3.3.2 Molecular descriptors-role in LBDD
3.3.3 In silico applications of QSAR analysis
3.3.4 Extended applications of QSAR combined with SBDD techniques
3.3.5 Applications of mt-QSAR and mtk-QSAR models
3.3.6 Success in the field of LBDD
3.4 In silico approaches-application to predict pharmacokinetic parameters and toxicity (ADMET)
3.4.1 In silico tools to predict absorption
3.4.2 In silico tools to predict the distribution
3.4.3 In silico tools to predict metabolism
3.4.4 In silico tools to predict toxicity
3.5 Conclusion
References

Rodrigo S. A. de Araújo, Francisco J. B. Mendonça, Jr., Marcus T. Scotti and Luciana Scotti
4 Protein modeling
4.1 Proteins
4.2 Bioinformatics and the importance of computational tools
4.3 Homologous structures and de novo protein design
4.4 Protein data bank
4.5 Molecular modeling
4.5.1 Comparative modeling
4.5.2 Free modeling
4.6 Selected computational tools
4.6.1 PyMOL
4.6.2 Pfam
4.7 SWISS-MODEL
4.8 Critical assessment of protein structure prediction (CASP)
4.9 Conclusion
References

Rahul Ashok Sachdeo, Tulika Anthwal and Sumitra Nain
5 Fragment based drug design
5.1 Introduction
5.1.1 Fragment
5.1.2 Design of library
5.1.3 Identification of appropriate fragment to develop (biophysical or biochemical techniques, which interrogate the ligand–target binding) [1, 14, 15]
5.1.4 Elaborating its chemical structure to generate a useful lead compound
5.1.5 Growing
5.1.6 Merging
5.1.7 Linking
5.2 Conclusion
References

Richie R. Bhandare, Bulti Bakchi, Dilep Kumar Sigalapalli and Afzal B. Shaik
6 An overview of in silico methods used in the design of VEGFR-2 inhibitors as anticancer agents
6.1 Introduction
6.2 History of earlier FDA-approved VEGFR kinase inhibitors and the recent development
6.3 Structure of VEGFR-2
6.4 Applications of in silico studies in the exploration of VEGFR-2 inhibitors
6.4.1 Design of novel piperazine–chalcone hybrids as VEGFR-2 kinase inhibitors
6.4.2 Docking model of 1-piperazinyl-phthalazines as potential VEGFR-2 inhibitors
6.4.3 Identification of BAW2881 as a potent VEGFR-2 inhibitor: a success story
6.4.4 Molecular modeling studies on thienopyrimidine scaffold as VEGFR-2 inhibitors
6.4.5 Identification of new VEGFR-2 kinase inhibitors: pharmacophore modeling and virtual screening
6.4.6 Molecular modeling of quinazoline containing 1,3,4-oxadiazole scaffold as VEGFR-2 inhibitor
6.4.7 Identification of covalently binding, irreversible VEGFR-2 kinase domain inhibitors
6.4.8 Molecular docking study of novel N-(2-carbamoyl-6-methoxyphenyl)-3,4,5-trimethoxybenzamide derivative as VEGFR-2 tyrosine kinase inhibitor
6.5 Conclusions
References

Varruchi Sharma, Anil Panwar, Girish Kumar Gupta and Anil K. Sharma
7 Molecular docking and MD: mimicking the real biological process
7.1 Introduction
7.2 AutoDock: docking of flexible ligands to receptors
7.3 AutoDock: coordinate file preparation
7.4 Autogrid calculation
7.5 Docking performed using AutoDock
7.6 Analysis performed using AutoDock tools
7.7 AutoDock result
7.8 Molecular dynamic simulations and history
7.9 PDB structure and need of 3D conformation study
7.10 Conformational changes are a common part of an enzyme’s catalytic cycle
7.11 The overview of calculating MD simulation
7.12 GPU and high computation power in MD simulations
7.13 World’s fastest computer and MD simulations
7.14 Force field: need and selection
7.15 Benefits/outcomes of MD simulations
7.16 Limitations and future prospects of MD simulations
References

Babar Ali, Qazi Mohammad Sajid Jamal, Showkat R. Mir, Saiba Shams, and Mohammad Amjad Kamal
8 Molecular docking studies of tea (Thea sinensis Linn.) polyphenols inhibition pattern with Rat P-glycoprotein
8.1 Introduction
8.2 Materials and methods
8.2.1 3D modeling of Rat P-gp receptor
8.2.2 Template search
8.2.3 Template selection
8.2.4 Model building
8.2.5 Model quality estimation
8.2.6 Model validation
8.2.7 Preparation of receptor molecule
8.2.8 Ligand optimization
8.2.9 Docking studies
8.3 Results
8.4 Discussion
8.5 Conclusion
Abbreviations
References

Nermin A. Osman
9 Statistical methods for in silico tools used for risk assessment and toxicology
9.1 Background
9.2 Risk assessment comprises four processes
9.2.1 Hazard identification
9.2.2 Exposure assessment
9.2.3 Effect assessment
9.2.4 Risk characterization
9.3 Risk management
9.3.1 In silico tools used for risk assessment
9.3.2 Statistical methods for in silico risk assessment
References

Maya Madhavan and Sabeena Mustafa
10 Systems biology–the transformative approach to integrate sciences across disciplines
10.1 Introduction
10.2 Transforming biology-insights from the systems biology approach
10.2.1 Systems and systems biology
10.2.2 Network modelling in systems biology
10.2.3 From systems biology to synthetic biology
10.2.4 Applications of synthetic biology
10.3 Challenges and future directions
10.4 Conclusions
References

Index


List of contributing authors

Mohammad Kalim Ahmad Khan, Department of Bioengineering, Faculty of Engineering, Integral University, Lucknow, Uttar Pradesh, 226026, India
Salman Akhtar, Department of Bioengineering, Faculty of Engineering, Integral University, Lucknow, Uttar Pradesh, 226026, India
Babar Ali, College of Pharmacy and Dentistry, Buraydah Colleges, Buraydah, Al-Qassim, Kingdom of Saudi Arabia
Tulika Anthwal, Department of Pharmacy, Banasthali Vidyapith, Banasthali, Rajasthan, 304022, India
Rodrigo S. A. de Araújo, Biological Science Department, Laboratory of Synthesis and Drug Delivery, State University of Paraiba, 58070-450, João Pessoa, PB, Brazil
Bulti Bakchi, Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research (NIPER), Hyderabad 500037, India
Shaheen Begum, Institute of Pharmaceutical Technology, Sri Padmavati Mahila Visvavidyalayam, 517501 Tirupati, Andhra Pradesh, India
Richie R. Bhandare, College of Pharmacy & Health Sciences, Ajman University, P.O. Box 340, Ajman, United Arab Emirates; and Center of Medical and Bio-allied Health Sciences Research, Ajman University, Ajman, United Arab Emirates
Koganti Bharathi, Institute of Pharmaceutical Technology, Sri Padmavati Mahila Visvavidyalayam, 517501 Tirupati, Andhra Pradesh, India
Ravi Dhawan, Khalsa College of Pharmacy, Amritsar 143001, Punjab, India
Kamal Dua, Discipline of Pharmacy, Graduate School of Health, University of Technology Sydney, Ultimo 2007, NSW, Australia; and Faculty of Health, Australian Research Centre in Complementary and Integrative Medicine, University of Technology Sydney, Ultimo 2007, New South Wales, Australia
Girish Kumar Gupta, Department of Pharmaceutical Chemistry, Sri Sai College of Pharmacy, Badhani, Pathankot, Punjab, 145001, India
Qazi Mohammad Sajid Jamal, Department of Health Informatics, College of Public Health and Health Informatics, Qassim University, Al Bukayriyah, Saudi Arabia; and Novel Global Community Educational Foundation, Hebersham, Australia
Mohammad Amjad Kamal, King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia; and West China School of Nursing / Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu 610041, Sichuan, China; and Enzymoics, 7 Peterlee Place, Hebersham, NSW 2770, Novel Global Community Educational Foundation, Australia
Dinesh Kumar, Sri Sai College of Pharmacy, Manawala, Amritsar 143001, Punjab, India
Maya Madhavan, Department of Biochemistry, Government College for Women, Thiruvananthapuram, Kerala, India
Ayush Mahajan, Sri Sai College of Pharmacy, Manawala, Amritsar 143001, Punjab, India
Francisco J. B. Mendonça, Jr., Biological Science Department, Laboratory of Synthesis and Drug Delivery, State University of Paraiba, 58070-450, João Pessoa, PB, Brazil
Showkat R. Mir, Department of Pharmacognosy and Phytochemistry, Faculty of Pharmacy, Jamia Hamdard (Hamdard University), New Delhi 110062, India
Sabeena Mustafa, Department of Biostatistics and Bioinformatics, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences, King Abdulaziz Medical City, Ministry of National Guard Health Affairs (MNGHA), Riyadh, Kingdom of Saudi Arabia
Sumitra Nain, Department of Pharmacy, Banasthali Vidyapith, Banasthali, Rajasthan, 304022, India
Nermin A. Osman, Department of Biomedical Informatics and Medical Statistics, Alexandria University Medical Research Institute, 165 El-Horria Avenue, Alexandria, 21561, Egypt
Anil Panwar, Department of Molecular Biology, Biotechnology and Bioinformatics, College of Basic Sciences and Humanities, CCS Haryana Agriculture University, Hisar, 125001, India
Rahul Ashok Sachdeo, Department of Pharmaceutical Chemistry, Government College of Pharmacy, Karad, Maharashtra, 415124, India
Luciana Scotti, Health Center, Federal University of Paraíba, 50670-910, João Pessoa, PB, Brazil
Marcus T. Scotti, Health Center, Federal University of Paraíba, 50670-910, João Pessoa, PB, Brazil
Afzal B. Shaik, Department of Pharmaceutical Chemistry, Vignan Pharmacy College, Jawaharlal Nehru Technological University, Vadlamudi 522213, Andhra Pradesh, India
Saiba Shams, Siddhartha Institute of Pharmacy, Dehra Dun 248001, Uttarakhand, India
Mohammad Zubair Shareef, Institute of Pharmaceutical Technology, Sri Padmavati Mahila Visvavidyalayam, 517501 Tirupati, Andhra Pradesh, India
Anil K. Sharma, Department of Biotechnology, Maharishi Markandeshwar (Deemed to be University), Mullana-Ambala, Haryana, 133207, India
Pooja Sharma, Department of Pharmaceutical Sciences and Drug Research, Punjabi University, Patiala 147002, Punjab, India; and Khalsa College of Pharmacy, Amritsar 143001, Punjab, India
Varruchi Sharma, Department of Biotechnology and Bioinformatics, Sri Guru Gobind Singh College, Sector-26, Chandigarh, 160019, India
Dilep Kumar Sigalapalli, Department of Pharmaceutical Chemistry, Vignan Pharmacy College, Jawaharlal Nehru Technological University, Vadlamudi 522213, Andhra Pradesh, India

Dinesh Kumar*, Pooja Sharma, Ayush Mahajan, Ravi Dhawan and Kamal Dua

1 Pharmaceutical interest of in-silico approaches

Abstract: In-silico studies are studies performed on a computer, in a virtual environment created with software. Drug design software plays a vital role in discovering new drugs in the pharmaceutical field. These programs are employed in gene sequencing, molecular modeling, and assessing the three-dimensional structure of molecules, which can then be used in drug design and development. Drug discovery and development is a powerful, extensive, and interdisciplinary process, but also a complex and time-consuming one. This chapter focuses on the different types of in-silico approaches and their pharmaceutical applications in numerous diseases.

Keywords: in-silico approaches; lead optimization; molecular docking; pharmaceutical application.

*Corresponding author: Dinesh Kumar, Sri Sai College of Pharmacy, Manawala, Amritsar 143001, Punjab, India
Pooja Sharma, Department of Pharmaceutical Sciences and Drug Research, Punjabi University, Patiala 147002, Punjab, India; and Khalsa College of Pharmacy, Amritsar 143001, Punjab, India
Ayush Mahajan, Sri Sai College of Pharmacy, Manawala, Amritsar 143001, Punjab, India
Ravi Dhawan, Khalsa College of Pharmacy, Amritsar 143001, Punjab, India
Kamal Dua, Discipline of Pharmacy, Graduate School of Health, University of Technology Sydney, Ultimo 2007, NSW, Australia; and Faculty of Health, Australian Research Centre in Complementary and Integrative Medicine, University of Technology Sydney, Ultimo 2007, New South Wales, Australia
This article has previously been published in the journal Physical Sciences Reviews. Please cite as: D. Kumar, P. Sharma, A. Mahajan, R. Dhawan and K. Dua “Pharmaceutical interest of in-silico approaches” Physical Sciences Reviews [Online] 2022. DOI: 10.1515/psr-2018-0157

1.1 Introduction

The term in-silico means “performed on the computer”, that is, in a virtual environment created with software. Drug design software plays a significant role in designing new drugs in the pharmaceutical field. These programs are used in gene sequencing, molecular modeling, and assessing the three-dimensional (3D) structure of drug candidates, which can then be used in drug design, formulation, and development. Drug discovery and development is an extensive, powerful, and interdisciplinary process, but also a complex and time-consuming one. Interest in in-silico studies and molecular docking for computer-aided drug design has grown substantially [1]. Drug candidates can fail for many reasons, including side effects, low efficacy, or poor pharmacokinetics. In-silico drug design requires skills from different fields such as structural and molecular biology, nanotechnology, biochemistry, biophysics, and computational science. The major objective is to carry out research projects that discover new compounds with the desired therapeutic effect [2, 3]. The process of drug discovery involves the following steps:

1.1.1 Target recognition

Target recognition (identification) involves detecting and isolating specific targets to examine their association with disease.

1.1.2 Target confirmation

Target confirmation (validation) is the characterization of the pharmacological response that results from altering the activity of a target protein.

1.1.3 Lead discovery

Lead discovery identifies a chemical compound that shows some specificity and potency against the biological target and is expected to yield a drug that can treat the specific disease.

1.1.4 Lead optimization

Lead optimization is the modification of the chemical structure to improve target selectivity and specificity. It also helps to improve the potency and the pharmacodynamic and pharmacokinetic properties of the drug. Both in-vitro and in-vivo experiments are performed to screen the lead compound and to develop an efficient and safe drug.

1.1.5 Preclinical studies

Preclinical studies take place before the clinical trials. In this stage, important tests are conducted on animals to check the potency and the toxicity of the drug, and drug safety data are collected. The major aim is to establish a safe dose for the clinical trials in humans.

1.1.6 Clinical trials

In clinical trials, a drug that has passed preclinical testing is tested in human volunteers to check for harmful side effects and to determine its effectiveness and its pharmacodynamic and pharmacokinetic properties. Clinical trials generate data on the safety and efficacy of the drug. When the drug passes all three phases and the qualitative evaluations, it is approved and marketed.

Computational modeling is divided into ligand-based and structure-based methods. Structure-based drug design relies on the 3D structure of an enzyme or receptor; the protein structures are generated by means of homology modeling, nuclear magnetic resonance, molecular dynamics, cryo-electron microscopy, etc. Ligand-based methods start from a group of molecules with various structures and known potency and use them to build computational, theoretically predictive models. These models are further optimized to enhance potency and to identify new chemical entities using virtual screening [4–6].

1.2 Approaches

1.2.1 Homology modeling (HM)

Homology modeling is a well-established technique for predicting and building protein (receptor) structures on the basis of related protein structures. Fully automated servers streamline the process, so that users with no computational expertise can also generate new models and check the results through visualization and interpretation. Homology modeling is also termed comparative modeling. At present, the most up-to-date automated modeling server of the past 25 years is SWISS-MODEL, which is still under continuous development [7–9].

Applications

The applications of homology modeling are presented in Table 1.1.
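Because homology modeling builds on related protein structures, a common first step is to compare the target sequence with candidate templates. The following is a minimal sketch of that idea, assuming the sequences have already been aligned to equal length with "-" marking gaps. The function names, the toy sequences, and the use of percent identity as the only ranking criterion are illustrative assumptions; this is not the procedure used by SWISS-MODEL or any other server.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def rank_templates(target: str, templates: dict[str, str]) -> list[tuple[str, float]]:
    """Sort hypothetical templates by identity to the target, best first."""
    scored = [(name, percent_identity(target, seq)) for name, seq in templates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy, pre-aligned sequences (hypothetical data for illustration only).
    target = "MKT-AYIAKQR"
    templates = {"template_A": "MKTWAYIGKQR", "template_B": "MRT-AFLAKHR"}
    for name, identity in rank_templates(target, templates):
        print(f"{name}: {identity:.1f}% identity")
    # prints:
    # template_A: 90.0% identity
    # template_B: 60.0% identity
```

In practice, template choice also depends on alignment coverage and on the quality of the experimental structure, which this sketch ignores.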

Table 1.1: The application of homology modeling.

Target | Application
Lunatic fringe (L-fng) | Clarify the binding mode of O-linked fucose
Peptide CGP in complex with Src kinase, renin, GPRA, and PKC theta | Compound or lead optimization
Protein tyrosine phosphatase SHP | Evaluation of anticancer lead discovery depending on structure-based virtual screening
 | Identification and evaluation of inhibitors for Alzheimer’s disease on the basis of structure-based virtual estimation
 | Structure-based lead discovery for antiobesity drugs
Alpha glucosidase | Structure-based virtual screening for antidiabetes lead discovery
-Phosphofructo--kinase (PFKFB) | Structure-based virtual assessment of inhibitors for tumor growth and glycolytic flux
Cdc phosphatases | Structure-based virtual screening for anticancer lead discovery
BC (part of the enolase superfamily) | Protein function studies
RDH | Assessment of mutations at the binding interaction site and their consequences on the role of RDH
Prothrombinase (FXa–FVA) | Comprehend the identification and function of prothrombinase in drug pockets
Cannabinoid receptor , alpha--A adrenoreceptor, human adenosine A--A receptor, adenosine-- receptor | Binding mode prediction and elucidation of ligand:protein interactions
ECE- | Understand the loss of catalytic action of ECE- by means of mutagenesis
Nod-like receptors | Understand the mechanism of proteins involved in the immune response
M antigen | Evaluation of protein function
UreF | Study of protein mechanism
Glut  transport receptor | Insight into the transport mechanism
NHE | Understand the access mechanism of Na+/H+ exchange
RSK- | Lead discovery using docking for prostate and breast cancer

1.2.2 Molecular docking (Interaction Networks)

Molecular docking predicts the preferred orientation of one molecule relative to another when the two interact to form a stable complex. However, there is no reason to expect docking to predict the binding affinities of many dissimilar molecules. The strategy of using compounds available in databases has had a great impact on docking. Novel ligands are also discovered by docking [29].

1.2 Approaches

5 Applications –

In the designing of anticancer molecules. This technique is used to study the binding interaction of compounds with amino acids for the designing of anticancer molecules, anti-AIDS and tubulin polymerization inhibitors, etc. [30–35]. Hussain et al. reported the docking studies of coumarin-based molecule binds with the pocket of ER-α. Compounds also exhibit the binding interaction of amino acids such as Met421, Met340, Leu298, Ile373, and Phe356 (Figure 1.1) are important for ER-β binding interactions [36]. Similarly, Kumar et al. established the docking studies reveal that compound (2) interrupt tubulin assemblage and engaged at colchicine binding cavity of tubulin depicted in (Figure 1.2) [37]. In another study, molecular docking presented the binding interaction of compounds (4) and (5) with the active site of IN CCD. Moreover, compound (6) depicts the binding interaction of the V3 loop of gp120 (Figure 1.3) along with a docking score of 6.27 and exhibits the significant effects in the management of AIDS [38–40]. O






Figure 1.1: Binding interactions of compound (1) with amino acid residues. Labelled in the figure: the coumarin ring and the residues Leu354, Thr347, Asp351, Leu298, Leu536, Gly472, Phe356, Met340, and Ile373.



Figure 1.2: The binding of compounds (2) and (3) along with amino acid residues in the cavity of tubulin.

Figure 1.3: Binding interaction of compound (4–6) with amino acid residues.

1.2.3 Virtual high-throughput screening

High-throughput screening is a method of systematic experimentation applied especially in drug discovery, chemistry, and biology. By means of robotics, data-processing and control software, sensitive detectors, and liquid-handling devices, it allows an investigator to rapidly conduct millions of pharmacological, genetic, and chemical tests. Virtual screening is described as “automatically assessing compounds in very large libraries” by means of computer software [41].

Application

Applications of virtual screening are presented in Table 1.2.



Table .: Applications of virtual screening. S. Target No.



Pipeline Pilot Spectrophotometric Power MV assay



. . . .



Mitochondrial enzyme NADH: quinone oxidoreductase (PfNDH) Hydroxysteroid Dehydrogenase Type  β-HSD -HTHRs

Soluble epoxide hydrolase (sEH) CdcB Phoshatase Cytochrome P aromatase (CYP) Microsomal prostaglandin E synthase (mPGES- )


Virtual screening method LB-VS combined with molecular fingerprints and chemoinformatics methods Combined pharmacophore /SB-VS




β-HSD cell-free assay


Radio labeled competi- Retrospective virtual tive binding assay screening


Fluorescence assay



Fluorescence assay



Fluorescence assay

SB-VS/substructure search SBVS

Catalyst In-vitro inhibitory activPipeline Pilot ity in a cell-free assay GLIDE Induced Fit Prolyl oligopeptiFITTED Recombinant human dase (POP/PREP) prolyl oligopeptidase assay on intact living cell Human ATP binding ICM cGMP uptake into cassette (ABC) inside-out vesicles transporter (IOV)



Combined Pharmacophore/SB-VS


SBVS with scoring function manually modified


Combined LBVS/SBVS
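The ligand-based screening with molecular fingerprints mentioned in Table 1.2 can be illustrated with a minimal sketch. It assumes that each compound has already been reduced to a set of integer feature identifiers; the compound names, the fingerprints, and the 0.5 threshold are hypothetical, and real workflows would derive fingerprints with a chemoinformatics toolkit.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| of two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def screen(query, library, threshold=0.5):
    """Return (name, score) pairs scoring at or above threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints: each integer stands for one structural feature.
query_fp = frozenset({1, 2, 3, 4})
library_fps = {
    "cmpd_A": frozenset({1, 2, 3, 4, 5}),  # 4 shared features out of 5
    "cmpd_B": frozenset({1, 2, 7, 8}),     # 2 shared features out of 6
    "cmpd_C": frozenset({1, 2, 3}),        # 3 shared features out of 4
}
hits = screen(query_fp, library_fps)
```

Only the library members that resemble the known active are kept and ranked, which is the sense in which a large library is assessed automatically.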


1.2.4 Quantitative structure-activity relationship (QSAR)

This technique relates the structural descriptors of molecules to their biological activities. Descriptors capture molecular parameters such as topological, hydrophobic, steric, and electronic effects, and are determined for many compounds by computational and empirical methods.

Application

QSAR equations predict the biological activities of novel molecules before their synthesis [51]. Some examples of machine learning tools for QSAR modelling are presented in Table 1.3.



Table .: List of software and learning algorithms used in QSAR. S. No.



Learning algorithms

. . . . .

TreeNet AZOrange Elki SciTegic Pipeline Pilot Weka

Commercial Open source Open source Commercial Open source

RF SVM, RT, RF, and ANN k-NN Naive Bayes, DT, and SVM RF, SVM, and Naive Bayes
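As a minimal illustration of what a QSAR equation is, the sketch below fits a one-descriptor linear model by ordinary least squares and uses it to predict the activity of an unsynthesised compound. The descriptor (logP) and the activity values are invented for illustration only; the tools in Table 1.3 fit far richer models over many descriptors.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training set: one hydrophobic descriptor and measured activity.
logp = [1.0, 2.0, 3.0, 4.0]
pic50 = [5.1, 5.9, 7.1, 7.9]

slope, intercept = fit_line(logp, pic50)

# Predicted activity of a new, not yet synthesised compound with logP = 2.5.
predicted = slope * 2.5 + intercept
```

The fitted slope and intercept form the QSAR equation; a model used in practice would also be validated on compounds held out from the fit.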

1.2.5 Hologram quantitative structure-activity relationship (HQSAR)

This technique produces a hologram of a chemical compound by generating its linear and branched fragments. It does not require any 3D data for receptors or ligands. The compound is broken down into a molecular fingerprint that encodes the frequency of occurrence of the different molecular fragments [52].
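The idea of a fragment hologram can be sketched as follows. This is a simplified illustration, not the HQSAR implementation: it assumes a molecule given as a list of atom symbols and bonds, enumerates linear fragments only (branched fragments are omitted), and hashes each fragment into a fixed-length count vector with CRC32, which is an arbitrary choice of hash.

```python
import zlib


def linear_fragments(atoms, bonds, max_atoms=4):
    """List the linear fragments (atom paths) containing 2..max_atoms atoms."""
    neighbours = {i: set() for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)

    paths = set()

    def walk(path):
        if len(path) >= 2:
            # Store each path once, whichever end it was walked from.
            paths.add(min(path, path[::-1]))
        if len(path) < max_atoms:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    walk(path + (nxt,))

    for start in range(len(atoms)):
        walk((start,))

    labels = []
    for path in sorted(paths):
        label = "".join(atoms[i] for i in path)
        labels.append(min(label, label[::-1]))  # direction-independent label
    return labels


def hologram(fragments, length=16):
    """Count fragments per bin; the bin index is a hash of the fragment label."""
    bins = [0] * length
    for frag in fragments:
        bins[zlib.crc32(frag.encode("ascii")) % length] += 1
    return bins


# Ethanol heavy atoms C-C-O as a toy molecule.
fragments = linear_fragments(["C", "C", "O"], [(0, 1), (1, 2)])
holo = hologram(fragments)
```

The resulting count vector needs no 3D information and can serve directly as the descriptor set of a QSAR model.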

1.2.6 Comparative molecular similarity indices analysis (CoMSIA)

This approach is used to locate the common features (electronic, steric, hydrogen bond acceptor and donor, and hydrophobic) that are important for proper drug-receptor binding, and it is therefore important for the drug discovery process. It also provides electronic, steric, hydrophobic, and ClogP values of the ligands [53].

1.2.7 3D-pharmacophore mapping

It is an important and simple technique for rapidly identifying lead molecules for a chosen target. A pharmacophore is the specific 3D arrangement of functional groups on a molecular structure that is necessary for binding to a macromolecule or to the active site of an enzyme. Describing the pharmacophore is important for understanding the interaction of a drug with its receptor [54].

Applications

– Pharmacophore mapping in virtual screening: Pharmacophore models are usually built from a wide range of theoretical and experimental data. The process consists of three main phases: identification of the features required for therapeutic activity, determination of the bioactive conformation (conformational properties), and development of an alignment or superposition rule for a class of molecules. This method is mostly used to develop potential therapeutic pharmacophores for the management of numerous diseases [55–57].
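A minimal sketch of pharmacophore matching is given below. It assumes that a pharmacophore and a molecule's conformer are both described as lists of typed feature points with 3D coordinates, and it tests whether some assignment of molecular features reproduces the feature types and all pairwise distances within a tolerance. The feature names, coordinates, and the 1.0 Å tolerance are hypothetical.

```python
import itertools
import math


def matches(pharmacophore, features, tol=1.0):
    """True if typed features reproduce the pharmacophore's pairwise distances.

    Both arguments are lists of (feature_type, (x, y, z)) tuples.
    """
    k = len(pharmacophore)
    index_pairs = list(itertools.combinations(range(k), 2))
    for combo in itertools.permutations(features, k):
        # Feature types must agree point by point.
        if any(c[0] != p[0] for c, p in zip(combo, pharmacophore)):
            continue
        # Every inter-feature distance must agree within the tolerance.
        if all(
            abs(math.dist(combo[i][1], combo[j][1])
                - math.dist(pharmacophore[i][1], pharmacophore[j][1])) <= tol
            for i, j in index_pairs
        ):
            return True
    return False


# Hypothetical three-point pharmacophore (distances 3, 4 and 5 angstroms).
model = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("hydrophobe", (0, 4, 0))]

good = [("donor", (1, 1, 0)), ("acceptor", (4.2, 1, 0)),
        ("hydrophobe", (1, 4.8, 0)), ("acceptor", (9, 9, 9))]
bad = [("donor", (0, 0, 0)), ("acceptor", (8, 0, 0)), ("hydrophobe", (0, 4, 0))]

good_match = matches(model, good)
bad_match = matches(model, bad)
```

Because only distances between features are compared, the test is independent of how the molecule is positioned in space; applying it to every conformer of every library member gives a simple pharmacophore-based virtual screen.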

1.2.8 De novo design based on 3D-pharmacophore mapping

This technique is widely used to design novel potential scaffolds. To overcome its limitations, it is widely replaced by the pharmacophore-dependent de novo design technique (PhDD). PhDD can automatically generate drug-like molecules that satisfy the requirements of an input pharmacophore hypothesis [58].

1.2.9 Microarray analysis

Microarray analysis, also known as DNA microarray technology, plays an important role in biotechnology. Microarrays are generally ordered sets of DNA molecules of known sequence. The identity of each DNA molecule is tied to a fixed feature on the array, so that scientists can use this information to interpret their experimental results. The technique allows scientists to observe many genes in a tiny sample simultaneously and to examine gene expression [59].

Application

– Microarray analysis is extensively used in the management of cancer, in the determination of cancer biomarkers, and in the identification of genes in healthy individuals and in patients suffering from cancer [60]. It is also widely used in antibiotic treatment against numerous bacterial and fungal infections [61].

1.2.10 Conformational analysis

Conformational analysis is a computer-based method in which constraints are applied so that the chemical compound adopts a conformation resembling that of a rigid template molecule. It is a tough challenge, since even a simple chemical compound has a huge number of conformational isomers. The common approach is to use a search algorithm to generate a series of initial conformations. Conformational analysis must handle flexible chemical compounds and locate their lowest-energy conformations through several calculation methods [62].
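The search step can be illustrated with a systematic scan of a single torsion angle. The sketch assumes a simple threefold torsional potential with a hypothetical barrier height; real conformational searches scan or sample many torsions at once and use a full force field.

```python
import math


def torsion_energy(phi_deg, v3=3.0):
    """Threefold torsional potential; v3 is a hypothetical barrier height."""
    return 0.5 * v3 * (1.0 + math.cos(3.0 * math.radians(phi_deg)))


def scan_minima(energy, step_deg=10):
    """Systematic search: evaluate the energy on a grid and keep local minima."""
    angles = list(range(0, 360, step_deg))
    energies = [energy(a) for a in angles]
    n = len(angles)
    minima = []
    for i in range(n):
        # The torsion angle is periodic, so the neighbours wrap around.
        if energies[i] < energies[i - 1] and energies[i] < energies[(i + 1) % n]:
            minima.append(angles[i])
    return minima


minima = scan_minima(torsion_energy)
```

For this potential the scan finds the three staggered conformations. With several rotatable bonds the grid grows exponentially, which is why stochastic search algorithms are commonly preferred for larger molecules.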



1.2.11 Monte Carlo simulation

This technique applies principles of statistical sampling: a computer simulation generates a suitable set of different conformations of a structure, and the desired structural, numerical, and thermodynamic properties are then obtained as weighted averages over these conformations. Monte Carlo sampling at variable temperatures can improve the placement of ligands in active sites [63, 64].

Application

– Monte Carlo methods cover diverse applications in aerodynamics, radiation dosimetry, and quantum chromodynamics calculations.
– They are also used in physics to examine many-body problems and to study biological systems such as proteins and genomes [65–67].
– Monte Carlo approaches are capable of solving the coupled integro-differential equations of radiation and energy transport. Moreover, these methods are also used in video games, computer-generated films, photo-realistic rendering, design, architecture, and the creation of cinematic special effects [68].
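The averaging over sampled conformations described in this section can be sketched with the Metropolis algorithm, one common Monte Carlo scheme (the text does not name a specific one). The example assumes a one-dimensional coordinate with the toy energy E(x) = x² and kT = 1; all parameter values are illustrative.

```python
import math
import random


def metropolis(energy, x0, kT, step, n_steps, seed=0):
    """Sample configurations with Boltzmann weights using Metropolis moves."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)  # random trial move
        e_new = energy(x_new)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
        samples.append(x)  # a rejected move counts the old state again
    return samples


samples = metropolis(lambda x: x * x, x0=0.0, kT=1.0, step=1.0, n_steps=50000)

# Thermodynamic average of x^2 over the sampled configurations.
mean_x2 = sum(s * s for s in samples) / len(samples)
```

For this energy function the Boltzmann distribution is a Gaussian with variance kT/2, so the computed average should approach 0.5 as the number of steps grows. Raising kT makes uphill moves more likely, which is how variable-temperature schemes help a ligand escape poor placements.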

1.2.12 Molecular dynamic (MD) simulation

It is a valuable technique in which the movement of a molecule is simulated by solving Newton's equations of motion for every atom, advancing each atom's position and velocity by a small time step. These approaches are generally employed in molecular modelling studies of genes, gene expression, and gene sequences by examining the 3D structures of proteins [69–74].

Applications

– It is used in molecular docking, drug design, and drug discovery.
– It is also used in quantum mechanics simulations.
– In refining structure predictions.
– In the determination of protein structures and their functions.
– To study molecules such as proteins and nucleic acids and their conformational changes.
– To view the dynamic evolution of biological systems.
– To understand allostery in biochemistry and protein-drug interactions.
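The time-stepping of Newton's equations described in this section can be sketched with the velocity Verlet integrator, a scheme widely used in MD codes (the text does not specify an integrator). The example assumes a single particle on a harmonic spring in reduced units; the force function and all parameter values are illustrative.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance position x and velocity v by n_steps time steps of size dt."""
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt  # update the position
        a_new = force(x) / mass             # force at the new position
        v = v + 0.5 * (a + a_new) * dt      # update the velocity
        a = a_new
    return x, v


def spring_force(x, k=1.0):
    """Hooke's law for a hypothetical unit spring constant."""
    return -k * x


# One particle of unit mass, released at x = 1 and followed for about one period.
x_end, v_end = velocity_verlet(1.0, 0.0, spring_force, mass=1.0, dt=0.01, n_steps=628)

# Total energy (kinetic + potential); it starts at 0.5 in these units.
energy_end = 0.5 * v_end ** 2 + 0.5 * x_end ** 2
```

After roughly one oscillation period the particle returns close to its starting point and the total energy remains near its initial value, which is the behaviour expected from a stable integrator with a sufficiently small time step.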

Acknowledgments: The authors are thankful to the Vice-chancellor of Punjabi University Patiala, India for their encouragement. The authors are also thankful to Er. S. K. Punj, Chairman, Sri Sai Group of Institutes and Smt. Tripta Punj, Managing Director, Sri Sai Group of Institutes for their constant moral support.

Author Contributions: Conceptualization and methodology: DK, PS and AM; writing (original draft preparation): Ravi Dhawan (RD) and Kamal Dua (KD); software, data curation, writing (review and editing), visualization, supervision: DK and RD; project administration: DK, RD, and KD. All authors have read and agreed to the published version of the manuscript.

Research funding: None declared.

Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

References 1. Salo-Ahen OMH, Alanko I, Bhadane R, Bonvin AMJJ, Honorato RV, Hossain S, et al. Molecular dynamics simulations in drug discovery and pharmaceutical development. Processes 2021;9: 71–8. 2. Park DS, Kim JM, Lee YB, Ahn CH. QSID Tool: a new three-dimensional QSAR environmental tool. J Comput Aided Drug Des 2008;22:873–83. 3. McGregor MJ, Muskal SM. Pharmacophore finger printing: application to QSAR and focused library design. J Chem Inf Comput Sci 1999;39:569–74. 4. Macalino SJ, Gosu V, Hong S, Choi S. Role of computer-aided drug design in modern drug discovery. Arch Pharm Res 2015;38:1686–701. 5. Wang T, Wu MB, Lin JP, Yang LR. Quantitative structure-activity relationship: promising advances in drug discovery platforms. Expet Opin Drug Discov 2015;10:1283–300. 6. Geromichalos GD. Importance of molecular computer modeling in anticancer drug development. J BUON 2007;12:101–18. 7. Yang SY. Pharmacophore modeling and applications in drug discovery: challenges and recent advances. Drug Discov Today 2010;15:444–50. 8. Joseph-McCarthy D, Baber JC, Feyfant E, Thompson DC, Humblet C. Lead optimization via highthroughput molecular docking. Curr Opin Drug Discov Dev 2007;10:264–74. 9. Waterhouse A, Bertoni M, Bienert S, Studer G, Tauriello G, Gumienny R, et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res 2018;46:296–303. 10. Luther KB, Haltiwanger RS. Role of unusual O-glycans in intercellular signalling. Int J Biochem Cell Biol 2009;41:1011–24. 11. Cohen NC. Structure-based drug design and the discovery of aliskiren (Tekturna): perseverance and creativity to overcome a R&D pipeline challenge. Chem Biol Drug Des 2007;70:557–65. 12. Hellmuth K, Grosskopf S, Lum CT, Wurtele M, Roder N, Von Kries JP, et al. Specific inhibitors of the protein tyrosine phosphatase Shp2 identified by high-throughput docking. Proc Natl Acad Sci USA 2008;105:7275–80. 13. Cozza G, Gianoncelli A, Montopoli M, Laura C, Venerando A, Meggio F, et al. 
Identification of novel protein kinase CK1 delta (CK1δ) inhibitors through structure-based virtual screening. Bioorg Med Chem Lett 2008;18:5672–5. 14. Claudio NC. Discovery of novel chemotypes to a G-protein-coupled receptor through ligandsteered homology modeling and structure-based virtual screening. J Med Chem 2008;51:581–8.



15. Park H, Hwang KY, Kim YH, Hwan K, Lee JY, Kim K. Discovery and biological evaluation of novel alpha-glucosidase inhibitors with in vivo antidiabetic effect. Bioorg Med Chem Lett 2008;18: 3711–5. 16. Clem B, Telang S, Clem A, Yalcin A, Meier J, Simmons A, et al. Small-molecule inhibition of 6-phosphofructo-2-kinase activity suppresses glycolytic flux and tumor growth. Mol Cancer Therapeut 2008;7:110–20. 17. Park H, Bahn YJ, Jung SH, Jeong DG, Lee SH, Seo I, et al. Discovery of novel Cdc25 phosphatase inhibitors with micromolar activity based on the structure based virtual screening. J Med Chem 2008;51:5533–41. 18. Song L, Kalyanaraman C, Fedorov A, Fedorov EV, Glasner ME, Brown S, et al. Prediction and assignment of function for a divergent N-succinyl amino acid racemase. Nat Chem Biol 2007;3: 486–91. 19. Sun W, Gerth C, Maeda A, Lodowski DT, Van Der Kraak L, Saperstein DA, et al. Novel RDH12 mutations associated with Leber congenital amaurosis and cone-rod dystrophy: biochemical and clinical evaluations. Vis Res 2007;47:2055–66. 20. Autin L, Steen M, Dahlback B, Villoutreix BO. Proposed structural models of the prothrombinase (FXa-FVa) complex. Proteins 2006;63:440–50. 21. Navarrete F, Garcia-Gutierrez MS, Gasparyan A, Austrich-Olivares A, Manzanares J. Role of cannabidiol in the therapeutic intervention for substance use disorders. Front Pharmacol 2021;12: 626010. 22. Gagnidze K, Rozenfeld R, Mezei M, Zhou MM, Devi LA. Homology modeling and site-directed mutagenesis to identify selective inhibitors of endothelin-converting enzyme-2. J Med Chem 2008; 51:3378–87. 23. Proell M, Riedl SJ, Fritz JH, Rojas AM, Schwarzenbacher R. The nod-like receptor (NLR) family: a tale of similarities and differences. PLoS One 2008;3:2119–25. 24. Guimaraes AJ, Hamilton AJ, Guedes HL, Nosanchuk JD, Zancope-Oliveira RM. Biological function and molecular mapping of M antigen in yeast phase of histoplasma capsulatum. PLoS One 2008;3: 3449–57. 25. 
Salomone-Stagni M, Zambelli B, Musiani F, Ciurli S. A model-based proposal for the role of UreF as a GTPase-activating protein in the urease active site biosynthesis. Proteins 2007;68:749–61. 26. Mueckler M, Thorens B. The SLC2 (GLUT) family of membrane transporters. Mol Aspect Med 2013; 34:121–38. 27. Landau M, Herz K, Padan E, Ben-Tal N. Model structure of the Na+/H+ exchanger 1 (NHE1): functional and clinical implications. J Biol Chem 2007;282:37854–63. 28. Nguyen TL, Gussio R, Smith JA, Lannigan DA, Hecht SM, Scudiero DA, et al. Homology model of RSK2 N-terminal kinase domain, structure-based identification of novel RSK2 inhibitors, and preliminary common pharmacophore. Bioorg Med Chem 2006;14:6097–7105. 29. Shoichet BK, McGovern SL, Wei B, Irwin JJ. Lead discovery using molecular docking. Curr Opin Chem Biol 2002;6:439–46. 30. Kumar D, Jain SK. A comprehensive review of N-heterocycles as cytotoxic agents. Curr Med Chem 2016;23:4338–94. 31. Sharma P, Sharma R, Rao HS, Kumar D. Phytochemistry and medicinal attributes of A. Scholaris: a review. Int J Pharmaceut Sci Res 2015;6:505–13. 32. Kumar D, Sharma P, Singh H, Nepali K, Gupta GK, Jain SK, et al. The value of pyrans as anticancer scaffolds in medicinal chemistry. RSC Adv 2017;7:36977–99. 33. Kaur T, Sharma P, Gupta G, Ntie-Kang F, Kumar D. Treatment of tuberculosis by natural drugs: a review. Plant Arch 2019;19:2168–76.



34. Kumar D, Singh G, Sharma P, Qayum A, Mahajan G, Mintoo MJ, et al. 4-aryl/heteroaryl-4H-fused pyrans as anti-proliferative agents: design, synthesis and biological evaluation. Anti Cancer Agents Med Chem 2018;18:57–73. 35. Sharma P, Shri R, Ntie-Kang F, Kumar S. Phytochemical and ethnopharmacological perspectives of Ehretia laevis. Molecules 2021;26:3489. 36. Hussain H, Krohn K, Uddin VU, Miana GA, Greend IR. Lapachol: an overview. Arkivoc 2007;2: 145–71. 37. Kumar PP, Siva B, Rao BV, Dileep Kumar G, Lakshma Nayak V, Nishant Jain S, et al. Synthesis and biological evaluation of bergenin-1,2,3-triazole hybrids as novel class of anti-mitotic agents. Bioorg Chem 2019;91:103161–8. 38. Kaur R, Sharma P, Gupta GK, Ntie-Kang F, Kumar D. Structure activity relationship and mechanistic insights for anti-HIV natural products. Molecules 2020;25:1–49. 39. Kumar D, Sharma P, Shabu Kaur R, Lobe MMM, Gupta GK, Ntie-Kang F. In search of therapeutic candidates for HIV/AIDS: rational approaches, design strategies, structure–activity relationship and mechanistic insights. RSC Adv 2021;11:17936–64. 40. Pawar R, Das T, Mishra S, Nutan Pancholi B, Gupta SK, Bhat SV. Synthesis, anti-HIV activity, integrase enzyme inhibition and molecular modeling of cetchol, hydroquinone and quinol labdane analogs. Bioorg Med Chem Lett 2014;24:302–7. 41. Regine S, Bohacek, Colin MM, Wayne CG. The art and practice of structure-based drug design: a molecular modeling perspective. Med Res Rev 1996;16:3–50. 42. Biagini GA, Fisher N, Shone AE, Mubaraki MA, Srivastava A, Hill A, et al. Generation of quinolone antimalarials targeting the Plasmodium falciparum mitochondrial respiratory chain for the treatment and prophylaxis of malaria. Proc Natl Acad Sci USA 2012;109:8298–303. 43. Spadaro A, Negri M, Marchais-Oberwinkler S, Bey E, Frotscher M. Hydroxybenzothiazoles as new nonsteroidal inhibitors of 17β-hydroxysteroid dehydrogenase type 1 (17β-HSD1). PLoS One 2012;7: 292–302. 44. 
Lin X, Huang XP, Chen G, Whaley R, Peng S, Wang Y, et al. Life beyond kinases: structure-based discovery of sorafenib as nanomolar antagonist of 5-HT receptors. J Med Chem 2012;55:5749–59. 45. Xing L, McDonald JJ, Kolodziej SA, Kurumbail RG, Williams JM, Warren CJ, et al. Discovery of potent inhibitors of soluble epoxide hydrolase by combinatorial library design and structure-based virtual screening. J Med Chem 2011;54:1211–22. 46. Lavecchia A, Giovanni C, Pesapane A, Montuori N, Ragno P, Martucci NM, et al. Discovery of new inhibitors of Cdc25B dual specificity phosphatases by structure-based virtual screening. J Med Chem 2012;55:4142–58. 47. Caporuscio F, Rastelli G, Imbriano C, Del RA. Structure-based design of potent aromatase inhibitors by high-throughput docking. J Med Chem 2011;54:4006–17. 48. Birgit W, Katja W, Julia B, Markt P, Noha SM, Wolber G, et al. Pharmacophore modeling and virtual screening for novel acidic inhibitors of microsomal prostaglandin E2 synthase-1 (mPGES-1). J Med Chem 2011;54:3163–74. 49. Stephane DC, Sebastien DE, Eric T, Levan D, Cueto M, Schmidt R, et al. Virtual screening and computational optimization for the discovery of covalent prolyl oligopeptidase inhibitors with activity in human cells. J Med Chem 2012;55:6306–15. 50. Sager G, Orvoll EO, Lysaa RA, Kufareva I, Abagyan R, Ravna AW. Novel cGMP efflux inhibitors identified by virtual ligand screening (VLS) and confirmed by experimental studies. J Med Chem 2012;55:3049–57. 51. Lavecchia A, Cerchia C. In silico methods to address polypharmacology: current status, applications and future perspectives. Drug Discov Today 2016;21:288–98. 52. Suh ME, Park SY, Lee HJ. Comparison of QSAR methods (CoMFA, CoMSIA, HQSAR) of anticancer 1-Nsubstituted imidazoquinoline-4,9-dione derivatives. Bull Kor Chem Soc 2002;23:417–22.



53. Kurogi Y, Guner OF. Pharmacophore modeling and three-dimensional database searching for drug design using catalyst. Curr Med Chem 2001;8:1035–55. 54. Wade RC, Salo-Ahen O. Molecular modeling in drug design. Molecules 2019;24:321–7. 55. Xiao-Qiang D, Hui-Yuan W, Ying-Lan Z, Xiang ML, Jiang PD, Cao ZX, et al. Pharmacophore modelling and virtual screening for identification of new aurora-A kinase inhibitors. Chem Biol Drug Des 2008;71:533–9. 56. Xie HZ, Lin LL, Xia J, Zou J, Yang L, Wei YQ, et al. Pharmacophore modeling study based on known Spleen tyrosine kinase inhibitors together with virtual screening for identifying novel inhibitors. Bioorg Med Chem Lett 2009;19:1944–9. 57. Ji-Xia R, Lin LL, Zou LY, Jin-Liang Y, Sheng-Yong Y. Pharmacophore modeling and virtual screening for the discovery of new transforming growth factor-β type I receptor (ALK5) inhibitors. Eur J Med Chem 2009;44:4259–65. 58. Li R, Fan W, Tian G, Zhu H, He L, Cai J, et al. The sequence and de novo assembly of the giant panda genome. Nature 2010;463:311–7. 59. Nessling M, Solinas-Toldo S, Lichter P, Reifenberger G, Wolter M, Moller P, et al. Genomic imbalances are rare in hairy cell leukemia. Genes Chromosomes Cancer 1999;26:182–3. 60. Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A, Williams CF, et al. Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat Genet 1999;23:41–6. 61. Macgregor PF, Jeremy A. Application of microarrays to the analysis of gene expression in cancer. Clin Chem 2002;48:1170–7. 62. Uthuppan J, Soni K. Conformational analysis: a review. Int J Pharmaceut Sci Res 2013;4:34–41. 63. Cheung DL, Alessandro T. Modelling charge transport in organic semiconductors: from quantum dynamics to soft matter. Phys Chem Chem Phys 2008;10:5941–52. 64. Cheung DL. Molecular simulation of nanoparticle diffusion at fluid interfaces. Chem Phys Lett 2010;495:55–9. 65. Moller W, Eckstein W. Tridyn – a TRIM simulation code including dynamic composition changes. 
Nucl Instrum Methods Phys Res B 1984;2:814–8. 66. Hemert FJ, Amons R, Wim JMP, Hans VO, Moller W. The primary structure of elongation factor EF-lac from the brine shrimp Artemia. EMBO J 1984;3:1109–13. 67. Milik M, Skolnick J. Insertion of peptide chains into lipid membranes: an off-lattice Monte Carlo dynamics model. Proteins Struct Funct Genet 1993;15:10–25. 68. Chaslot GMJB, Winands MHM, Szita I, van den Herik HJ. Cross entropy for Monte Carlo tree search. ICGA J (Int Comput Games Assoc) 2008;31:145–56. 69. Hansson T, Chris O, Gunsteren WF. Molecular dynamics simulations. Curr Opin Struct Biol 2002;12: 190–6. 70. Friesner RA, Banks JL, Murphy RB, Halgren TA, Klicic JJ, Mainz DT, et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J Med Chem 2004;47:1739–49. 71. Methe BA, Nelson KE, Deming JW, Momen B, Melamud E, Zhang X, et al. The psychrophilic lifestyle as revealed by the genome sequence of Colwellia psychrerythraea 34H through genomic and proteomic analyses. Proc Natl Acad Sci USA 2005;102:10913–8. 72. Peng Y, Li Z, John M. Loss of protein structure stability as a major causative factor in monogenic disease. J Mol Biol 2005;353:459–73. 73. Jacques MJ, Pierre C, Jacob F. Allosteric proteins and cellular control systems. J Mol Biol 1963;6: 306–29. 74. Edmunds NS, McGuffin LJ. Computational methods for the elucidation of protein structure and interactions. Methods Mol Biol 2021;2305:23–52.

Mohammad Kalim Ahmad Khan* and Salman Akhtar

2 Novel drug design and bioinformatics: an introduction

Abstract: In the current era of high-throughput technology, enormous amounts of biological data are generated daily by various sequencing projects, and a staggering number of biological targets have thereby been deciphered. The discovery of new chemical entities and bioisosteres of relatively low molecular weight has been gaining momentum in the pharmacopoeia, alongside traditional combinatorial design, in which a chemical structure serves as an initial template for enhancing efficacy, pharmacokinetic, and selectivity properties. Once a compound is identified, it undergoes ADMET filtering to determine whether it has toxic or mutagenic properties. A compound with neither toxicity nor mutagenicity is considered a potential lead molecule. Understanding the mechanism by which lead molecules act on various biological targets is imperative for advancing drug discovery and development. Although drug discovery is a tedious and costly process, taking around 10–15 years and costing around $4 billion, the cascaded approaches of bioinformatics and computational biology, viz. structure-based drug design (SBDD) and the cognate ligand-based drug design (LBDD), which are used respectively when the 3D structure of the target biomacromolecule is available and when it is not, have made this process easier and more approachable. SBDD encompasses homology modelling, ligand docking, fragment-based drug design and molecular dynamics, while LBDD deals with pharmacophore mapping, QSAR, and similarity search. All the computational methods discussed herein, whether for target identification or novel ligand discovery, continue to evolve and facilitate cost-effective and reliable outcomes in an era of overwhelming data.

Keywords: bioinformatics, homology modelling, LBDD, MD simulation, molecular docking, SBDD

2.1 Introduction

Nowadays, highly sophisticated tools and techniques are being developed and used to tackle the big data generated by various genomics, proteomics, and allied projects

*Corresponding author: Mohammad Kalim Ahmad Khan, Department of Bioengineering, Faculty of Engineering, Integral University, Lucknow, Uttar Pradesh, 226026, India.
Salman Akhtar, Department of Bioengineering, Faculty of Engineering, Integral University, Lucknow, Uttar Pradesh, 226026, India.
This article has previously been published in the journal Physical Sciences Reviews. Please cite as: M. K. A. Khan and S. Akhtar “Novel drug design and bioinformatics: an introduction” Physical Sciences Reviews [Online] 2021. DOI: 10.1515/psr-2018-0158



Figure 2.1: The genesis of bioinformatics.

exploring complicated biological systems, thereby helping to understand the genetic changes affecting health and disease. However, the scientific community worldwide finds it a Herculean task to manage such enormous data quickly and to deduce meaningful findings from it in a convenient and cost-effective way. To meet this need, the highly interdisciplinary science of bioinformatics was developed. It draws on quantitative sciences such as biostatistics, mathematics, computer science, chemical sciences, biophysics, imaging, computational biology, biometrics and cybernetics, as well as biological sciences such as genomics, transcriptomics, proteomics, metabolomics, glycomics, phenomics, structural biology, systems biology, evolutionary biology, population genetics, and cellular and molecular biology, to uncover patterns and associations within and between all sets of biological data (Figure 2.1). With the advent of bioinformatics, there are two approaches to tackling biological problems. One is the wet-lab experimental approach, which is the conventional strategy of biological scientists. The other is biomolecular modelling and simulation, often known as the computational, in silico, or dry-lab method. Wet-lab biology is used to develop better models that describe our understanding of biology, while in silico results require validation through wet-lab experimentation. In other words, in silico biology depends on experimental science to produce raw data for analysis and, in turn, provides valuable information, clues and meaningful interpretations for further research and development [1, 2]. Thus, this interdisciplinary science is about improving and facilitating the methods and technologies for the acquisition, processing, storage, distribution, analysis, interpretation and display of all biological information used to answer biological questions that would otherwise be unattainable using conventional strategies. However, the term bioinformatics did not originally mean what it means today. In the early 1970s, Paulien Hogeweg and Ben Hesper coined the term to describe the study of information processes in biotic systems [2–4].



Figure 2.2: The balance between biological activity and drug-like properties.

Bioinformatics and its allied approaches have revolutionised and accelerated the entire process of virtual screening of lead molecules and their subsequent development into drug molecules. There is a gap between the rate at which compounds are initially screened and the rate at which they are converted into drug-like molecules that reach the market, for several reasons, including poor biopharmaceutical properties, lack of efficacy, toxicity, and market response. According to recent data published by the Tufts Center for the Study of Drug Development, the cost of discovering and developing a new molecule lies somewhere between $2 billion and $3 billion, which is considered prohibitively expensive [5]. This overall cost of bringing a new drug molecule to market spans the initial drug discovery and design process, the various preclinical studies and clinical trials, and registration and regulatory approval. Such gaps could be narrowed to a certain extent by improving the accuracy and efficiency of lead optimisation techniques through in silico methods, thereby enhancing the balance between the activity and the drug-likeness of lead molecules (Figure 2.2). Different tools and techniques in the drug discovery process play an essential role in optimising newly identified small bioactive molecules. Once a molecule is established in the initial phase of the discovery process, its desirable biological characteristics need to be refined. The structure-activity relationship (SAR), QSAR, CADD, SBDD, and de novo drug design are widely used to optimise lead molecules. There are numerous tools for the characterisation of binding cavities, e.g., estimation of charge distribution, calculation of pKa values or lipophilicity, and identification of H-bond donors and acceptors; moreover, various docking tools are used along with 3D structure databases of bioactive molecules, with different scoring parameters that attempt to describe the binding propensity of the designed molecules. To be considered for further improvement, lead structures should be amenable to chemical optimisation and have desirable drug-like properties.



Figure 2.3: Computational drug discovery approaches are applied in various stages of the drug discovery pipeline.

Historically, many drugs were discovered by serendipity. However, a deeper understanding of cell biology and genetics, together with computational tools, target identification methods, and the determination of target 3D structures, has moved researchers towards a more rational approach to drug design. The exponential increase in information on biological macromolecules, small molecules and biologics in various databases has increased the application of computational drug discovery, which is now applied to almost every stage of the drug design workflow, including target identification and validation, lead discovery and optimisation, and preclinical tests (Figure 2.3). CADD uses a more targeted search to improve the hit rate for novel drug compounds, which is not possible with traditional high-throughput screening and combinatorial chemistry. In novel drug design, CADD is mainly used for three primary purposes: (1) filtering large compound libraries into smaller sets; (2) improving ADMET properties by guiding the optimisation of lead compounds; and (3) designing novel lead compounds by growing starting molecules one functional group at a time or by piecing together fragments into novel chemotypes [6]. Drug design, or rational drug design, is an inventive process of finding new medications based on knowledge of biological targets [7]. Drug design can be classified into two categories: structure-based drug design and ligand-based drug design. In structure-based drug design, the 3D structure and functional role of the target are known, and molecules with desirable characteristics are developed against the target, which can be a protein or a nucleic acid. Ligand-based drug design is used when the 3D structure of the target is not known, and small molecules with the desired properties towards the target are developed [8]. Different techniques used in CADD are summarised in Figure 2.4.



Figure 2.4: Various techniques used in CADD.

2.2 Structure-based drug design

When the target structure is available, a structure-based drug design approach is used. X-ray crystallography and NMR techniques determine the 3D structures of targets, and the data are stored in the Protein Data Bank (PDB). If the target structure is not known, it can be predicted by homology modelling. The approach rests on the hypothesis that a molecule's ability to interact with the target and exert the desired effect is due to its binding at a specific site on the biological target. Molecules that share the same kind of interaction with the binding site exert the same biological effect. Hence, novel compounds can be found by exploiting the interactions with the binding site. Some of the molecules developed using structure-based drug design are listed in Table 2.1.

2.2.1 Homology modelling

If the target's 3D structure is unknown, homology modelling can be used to predict it from the solved structure of a template; the method is also known as comparative


2 Novel drug design and bioinformatics

Table .: Drug molecules developed using Bioinformatics interdisciplinary approaches. Year

Generic Name

 Captopril  Zanamivir  Dorzolamide  Saquinavir  Nelfinavir  Raltitrexed  Amprenavir  Isoniazid          

Dasatinib Raltegravir STX- Boceprevir Pim- Kinase Inhibitors Epalrestat Flurbiprofen STX- Steglatro Herzuma

 Rozlytrek  Pemazyre  Lupkynis


Drug Target


Bristol Myers Squibb Glaxo Smith Kline Merck & Co., Inc.

ACE inhibitor SBDD Neuraminidase SBDD Carbonic anhydrase Fragment-based screening Hoffman- La Roche HIV- protease SBDD Hoffman- La Roche HIV- protease SBDD AstraZeneca Thymidylate SBDD synthase GlaxoSmithKline HIV- protease Protein modelling & MD simulation Amsal Chem Inhibin, alpha (InhA) SBVS & pharmacophore modelling Bristol Myers Squibb Tyrosine kinase SBDD Merck & Co., Inc. HIV- integrase SBDD Sigma Aldrich STAT SBVS Schering-Plough Serine protease SBDD Tocris Bioscience Pim- Kinase Hierarchical multistage VS Ono Pharmaceutical Co., Ltd. Aldose Reductase SBVS & MD simulation Abbott COX- Molecular docking Sigma Aldrich STAT SBVS Merck SGLT inhibitor SBVS Celltrion, Inc. and Teva HER/neu receptor SBVS & pharmacoPharmaceutical Industries Ltd. antagonist phore modelling Genentech, Inc. Tyrosine kinase SBVS inhibitor Incyte Corporation FGFR SBVS Aurinia Pharmaceuticals Inc. CalcineurinSBVS inhibitor

modelling of proteins. The 3D structure can be predicted by several approaches, depending on the availability of template sequences with significant sequence identity. If no template with significant sequence identity to the target sequence is available, de novo or ab initio methods are used [9–11]. If the similarity between the query sequence and template is low ([...] 25% suggests that the template and target have similar 3D structures, and hence that the template is suitable for modelling [27]. A sequence identity of >60% means that the resulting homology model is accurate and similar to experimentally derived structures, because the fold of a protein is more highly conserved than its amino acid sequence [27]. After model generation, the model is optimised and energy-minimised to remove or reduce unfavourable interactions between non-covalently bonded atoms. After energy minimisation, molecular dynamics simulations using force fields are recommended, with the calculations restrained to avoid deviation from the original template and loss of similarity to the experimental model. The constructed model is then validated. There are many ways to validate a model, but the primary methods are based on stereochemical analysis, in the same way as for experimental structures. The stereochemistry of the model can be verified with software such as PROCHECK [28], WHAT CHECK [29], PROSA [30] and MolProbity [31, 32].

Phylogenetically related proteins have similar sequences, and homologous proteins have similar structures owing to their conserved sequences. Sequence alignment and the template structure help generate a structural model of the target protein [33]. Popular modelling software includes SWISS-MODEL [34] and MODELLER [35]. MODELLER [15] can construct transmembrane proteins more efficiently, and SWISS-MODEL [36] is used for polar proteins. Homology modelling has been instrumental in drug design, and there are many examples of its use.
In one case, the crystal structure of CXCR4 was used as a template to develop a model of chemo-attractant receptor OXE-R [37]. The binding mode of antihypertensive drugs to angiotensin II receptor type 1 was predicted in another example [38]. Later, crystal structure determination led to validating the homology model and analysing the binding mode of active compounds using MD and



Figure 2.5: A general protocol for homology modelling.

pharmacophore modelling [39]. The general protocol of homology modelling is shown in Figure 2.5 [40].
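The template-selection step can be illustrated with a small Python sketch. It computes the percent sequence identity over the aligned, non-gap columns of a pairwise alignment and maps it onto the two identity levels quoted in the text (25% and 60%). The function names, the gap character "-" and the way the cutoffs are turned into labels are assumptions made for illustration; real workflows obtain the alignment from dedicated tools and apply more nuanced criteria.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


def template_assessment(identity: float) -> str:
    """Map identity onto the levels quoted in the text (illustrative labels only)."""
    if identity > 60.0:
        return "high-confidence template"
    if identity > 25.0:
        return "usable template"
    return "consider de novo / ab initio methods"
```

For example, `percent_identity("ACDEFG", "ACDKFG")` compares six aligned positions, five of which match.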

2.2.2 Ligand docking

Docking is one of the most widely used methods in structure-based drug design and an efficient method in the design and discovery of therapeutic drugs. The molecular docking approach models the ligand and target at the atomic level to characterise the ligand and describe fundamental processes [41]. Docking is performed in two steps: sampling conformations of the ligand in the target's active site, followed by ranking these conformations with a scoring function. Monte Carlo [42] and genetic algorithm [43] methods are typically used for sampling. In the Monte Carlo method, several ligand poses are generated through bond rotation and rigid-body translation or rotation. Each conformation is tested against an energy-based selection criterion; if it passes, it is saved and modified further to generate the next conformation, and the process is iterated until a predefined number of conformations has been collected. Earlier versions of AutoDock [44], ICM [45] and QXP [46] use the Monte Carlo method. Another class of well-known stochastic methods is the genetic algorithm (GA) [43], which was inspired by Darwin's theory of evolution. Genes, which are binary



strings, encode the degrees of freedom of the ligand. Chromosomes, made up of genes, represent the ligand pose. Mutation and crossover are the two genetic operators in a GA: crossover exchanges genes between two chromosomes, and mutation causes a sudden random change in a gene. A new ligand structure is formed when the genetic operators act on the genes. New structures are assessed by the scoring function, and structures that cross the threshold are carried into the next generation [47]. AutoDock [43], GOLD [48], DIVALI [49], and DARWIN [50] use genetic algorithms.

The scoring function helps to separate correct poses from incorrect ones and binders from inactive molecules. Scoring functions are of three types: force field-based [51], empirical [52], and knowledge-based [53]. Classical force field-based scoring functions assess the binding energy by calculating non-bonded interactions [40, 50, 51]. Extended force field-based scoring functions also consider hydrogen bonds, solvation and entropy contributions. DOCK [54], GOLD [48], and AutoDock [43] use such functions. In empirical scoring functions [52], the binding energy is broken down into several energy components: hydrogen bonds, ionic interactions, the hydrophobic effect and binding entropy. LUDI [55], PLP [52] and ChemScore [56] are examples of empirical scoring functions. In knowledge-based scoring functions [53], statistical analysis of the crystal structures of ligand-protein complexes is used to obtain the interatomic contact frequencies and distances between the protein and ligand. Examples of knowledge-based scoring functions are PMF [57], DrugScore [58] and Bleep [59].

Three types of docking methodology are conventionally used:
1. Induced fit docking: Both the ligand and the receptor are considered flexible. The ligand binds flexibly to the active site of the receptor protein to maximise the bonding forces between them.
2. Lock and key docking: Both the receptor and the ligand are treated as rigid, and they show tight binding with each other.
3. Ensemble docking: This approach accounts for the complexity and flexibility of the conformational states of proteins. Multiple protein structures are used as an ensemble for docking with the ligand.

Molecular docking is widely used in the drug discovery process to find novel compounds against drug targets. The flow chart of molecular docking is shown in Figure 2.6.
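The genetic-algorithm search described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by AutoDock, GOLD or any other program: the 8-bit gene length, the three degrees of freedom, the TARGET pose and the toy score function are hypothetical stand-ins for a real docking scoring function, and selection keeps the better-scoring half of each generation instead of applying a fixed score threshold.

```python
import random

BITS = 8                      # bits per gene (assumed encoding length)
N_GENES = 3                   # hypothetical degrees of freedom, e.g. two shifts and one torsion
TARGET = [0.25, 0.60, 0.90]   # hypothetical optimum pose; stands in for a real scoring function


def decode(chromosome):
    """Turn each binary gene into a value in [0, 1]."""
    values = []
    for g in range(N_GENES):
        bits = chromosome[g * BITS:(g + 1) * BITS]
        values.append(int("".join(map(str, bits)), 2) / (2 ** BITS - 1))
    return values


def score(chromosome):
    """Toy score: squared distance to TARGET (lower is better)."""
    return sum((v - t) ** 2 for v, t in zip(decode(chromosome), TARGET))


def crossover(a, b, rng):
    """Single-point crossover: exchange genes between two chromosomes."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(chromosome, rng, rate=0.02):
    """Flip each bit with a small probability."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in chromosome]


def run_ga(pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(BITS * N_GENES)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        survivors = population[: pop_size // 2]   # selection: keep the best half
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = survivors + children
    best = min(population, key=score)
    return decode(best), score(best)
```

Because the best-scoring half survives unchanged into the next generation, the best score never gets worse from one generation to the next; production docking codes use more elaborate selection and local optimisation steps.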

2.2.3 Fragment-based drug design

The fragment-based technique is a promising drug design approach for identifying low-molecular-weight chemical compounds that can bind effectively to the



Figure 2.6: Basic steps of molecular docking.

molecular targets [60]. Most therapeutic targets are proteins, but there are a few examples of other molecular targets, e.g., nucleic acids, nucleoproteins, lipoproteins and glycoproteins [61, 62]. One of the key principles underlying fragment-based drug design (FBDD) is that screening small chemical compounds increases the likelihood of finding hits relative to screening large and complex molecules [63]. FBDD is an emerging in silico lead discovery method and a variant of virtual screening. A low-molecular-weight fragment of the complete compound is introduced into the binding pocket of the receptor, and a lead candidate is grown using these fragments as starting material. New leads are formed by sequentially joining fragments together. There are three sources of fragments: natural products, biologically active drugs and compounds with novel scaffolds [64]. Fragments usually have a molecular weight of [...] Da

pKa > [...] (acidic)
Gombar-Polli Molecular E-state (MolES) rule: molecules with MolES > [...] have a high probability of being substrates; neutral or basic molecules showing a MW > [...] and a log P value > [...] are more likely to be transported by ABCB
Number of H-bond acceptors (N + O) ≥ [...]; molecular weight (MW) < [...] Da; pKa < [...] (acidic); molecules with MolES < [...] have a high probability of being nonsubstrates


methods to predict the extent of drug penetration into the brain. Lipophilicity is related to BBB permeability, and higher lipophilicity increases CNS tissue binding. The effect of molecular weight depends on the transport mechanism (active transport or passive diffusion). For transmembrane diffusion, absorption of non-peptides is usually favored when the molecular weight is between 400 and 600 Da, whereas for peptide and protein drugs absorption is sufficient even when the molecular weight is higher than 600 Da (enkephalins: higher than 600 Da; cytokine-induced neutrophil chemoattractant-1: 7800 Da) [92]. Polar surface area (PSA) is the surface area (Å²) occupied by nitrogen and oxygen atoms and by the polar hydrogens bonded to these heteroatoms. Kelder et al. studied the relationship between PSA (Å²) and BBB permeability, log(Cbrain/Cblood), for 45 CNS drugs. Higher PSA values were unfavorable for penetration, and a value of 70 Å² was found to be optimal in their study [69] (Table 3.9). The majority of in silico models are based on the assumption that drug transport to the brain takes place via passive transport. However, several other factors play a pivotal role in the transport of a drug molecule, such as plasma protein binding, P-gp and other active transporters, and metabolic enzymes, and these must also be considered. These factors limit the modeling of the BBB (Table 3.10).
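The BBB permeability measure used above, log(Cbrain/Cblood), is simple to compute once brain and blood concentrations are available. The sketch below is illustrative only: the function names are hypothetical, the base-10 logarithm is the usual convention for this ratio, and treating the reported 70 Å² value as a simple comparison point for PSA is an assumption made here for demonstration, not a validated classification rule.

```python
import math


def log_bb(c_brain: float, c_blood: float) -> float:
    """log10 of the brain/blood concentration ratio (both in the same units)."""
    if c_brain <= 0 or c_blood <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(c_brain / c_blood)


def psa_above_reported_optimum(psa: float, optimum: float = 70.0) -> bool:
    """True when PSA (in Å²) exceeds the value reported as optimal in the text."""
    return psa > optimum
```

A compound that is ten times more concentrated in the brain than in blood has `log_bb(10.0, 1.0)` equal to 1, while equal concentrations give 0.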

3.4.3 In silico tools to predict metabolism

Ninety percent of drug molecules undergo metabolism by CYP450 enzymes, in particular 1–7. The activity of these enzymes is affected by age, sex, disease conditions, genetic polymorphism, and the activity of different hormones. Phase I reactions introduce polar groups on the drug molecule, and phase II reactions introduce conjugate groups


3 In silico drug design

Table .: QSAR studies applied to predict BBB penetration. Property/ activity


BBB penetration

Lipophilicity and hydrogen bonding Lipophilicity and PSA Polar surface area (PSA) Size, hydrogen-bonding molecular weight Hydrophobicity, stereo parameters, and descriptors based on molecular and quantum mechanics Total polar surface area, ALOGP logP, pKa Kappa molecular shape indices, topological and electrotopological state indices, differential connectivity indices, graph’s radius and diameter, Wiener and Platt indices, Shannon and Bonchev-Trinajsti’c information indices, counts of different vertices E-State indices, chi indices and kappa shape indices Octanol/water partition coefficient (logP), the topological polar surface area (TPSA) and the total number of acidic and basic atoms


[] [] [] [] [] [] [] [] []

[] []

to eliminate molecules from the blood circulation [93]. Metabolism can be predicted using either ligand-based or structure-based methods, depending on the available information. In some cases, hybrid methodologies provide efficient results. QSAR, pharmacophore approaches, and shape-based approaches constitute the ligand-based methods, while molecular docking and molecular dynamics are structure-based approaches. The majority of in silico tools concentrate on identifying the sites of metabolism of a compound. 3,4-Methylenedioxybenzoyl-2-thienylhydrazone (LASSBio-294) is used to prevent myocardial infarction and is metabolized primarily by CYP2C9. In silico approaches were successfully used to study the metabolism of LASSBio-294. A molecular docking study of the CYP2C9-LASSBio-294 complex revealed significant interactions between the sulfur atom of the thiophene ring and the heme iron of the enzyme. Sulfoxidation was predicted as the major metabolic pathway, and these results correlated well with the in vitro studies. Similarly, the metabolism of 1-[1-(4-Chlorophenyl)-1H-4-pyrazolylmethyl] phenylhexahydropiperazine (LASSBio-579), a drug used to treat schizophrenia, was predicted using molecular docking and simulation studies. In this case, the major CYP450 enzyme is CYP1A2. The proximity between the benzene ring of the drug molecule and the heme iron revealed the metabolic site and reaction, i.e. aromatic hydroxylation [94] (Table 3.11).
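The proximity argument used in these docking studies can be sketched as a distance ranking: ligand atoms closest to the heme iron in a docked pose are the most plausible sites of metabolism. The snippet below is a minimal illustration under stated assumptions; the function name, the atom labels and any coordinates passed to it are hypothetical, and real site-of-metabolism prediction also weighs atom reactivity and accessibility.

```python
import math


def rank_atoms_by_iron_distance(ligand_atoms, heme_iron):
    """Return (label, distance) pairs sorted by distance to the heme iron.

    ligand_atoms: dict mapping an atom label to its (x, y, z) coordinates in Å.
    heme_iron: (x, y, z) coordinates of the heme iron in Å.
    """
    distances = {
        label: math.dist(xyz, heme_iron) for label, xyz in ligand_atoms.items()
    }
    return sorted(distances.items(), key=lambda item: item[1])
```

Given a docked pose, the first entry of the returned list is the atom nearest the iron, which would be the first candidate to inspect as a metabolic site.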

3.4 In silico approaches-application to predict pharmacokinetic parameters and toxicity


Table .: Local models for hERG toxicity. Model used

Data set

Pharmacophoric features

D-QSAR derived pharmacophore model Five-point pharmacophore hypotheses

Antipsychotic drugs

One ring aromatic and three hydrophobic features and hydrophobicity Pharmacophores contain three hydrophobe/aromatic features and two hydrogen bond acceptors * ClogP (ClogP > : molecule is not a hERG bloacker) The best model showed an average deviation of . (pIC) for the  hERG blockers in the test set

Pharmacophore hypotheses

 uncharged and neutral hERG blocking agents

 hERG blockers

Based on a training set of  molecules, whereas the most representative models with high accuracy Seven complementary The best Catalyst model and six pharmacophore LigandScout models models Three major features contributing to the hERG blockage other features were characterized like compounds with a quinolinol group that were found to be hERG blockers. hERG Five pharmacophore models

Reference []





*positively charged nitrogen atom *high lipophilicity * the absence of negatively charged oxygen atom


3.4.4 In silico tools to predict toxicity

hERG prediction & cardiotoxicity

The contractility and rhythmic action of the heart are controlled by the proper functioning of several voltage-gated ion channels (sodium, potassium, and calcium) expressed in the heart musculature. Among them, voltage-gated potassium channels contribute to action potential repolarization: Kv4.3 and Kv1.4 contribute to the fast and slow components of the transient outward current (Ito), Kv1.5 is responsible for the ultrarapid delayed-rectifier current (IKur), Kv11.1 (hERG) mediates the rapid delayed-rectifier current (IKr), and Kv7.1 (KvLQT1) conducts the slow delayed-rectifier current (IKs). The human ether-a-go-go-related gene (hERG) channel is a voltage-gated K+ channel (Kv11.1)



expressed not only in the heart but also in the brain, smooth muscle cells, endocrine cells, and tumor cell lines [95].

Cardiotoxicity

hERG channels regulate the repolarization of the cardiac action potential in ventricular and atrial myocytes, and inhibition of this channel can lead to long QT syndrome (LQTS), arrhythmia, Torsade de Pointes (TdP), and sudden death. Drugs from classes such as antiarrhythmics, antihistamines, antifungals, antipsychotics, and antitussives have been withdrawn from the market due to hERG-related cardiotoxicity, which is considered an off-target interaction with the hERG channel. Therefore, assessment of hERG inhibition has become an essential step in decreasing cardiotoxicity risk [96]. In vitro assays and in vivo and in silico studies are all useful for assessing hERG inhibition. The in vitro methods (patch-clamp technique, flux assays, fluorescence-based assays, and radio-labeled binding assays) are costly. Therefore, in silico methods have been developed to screen for hERG activity during the early stages of drug discovery [97].

Drugs with hERG-related cardiotoxicity

Terfenadine, Astemizole, Cisapride, Vardenafil, Ziprasidone, Droperidol, Dofetilide, Ibutilide, Mibefradil, Grepafloxacin, Terodiline, Levomethadyl, Bepridil, Sotalol, Moxifloxacin, Sulfamethoxazole, Erythromycin, Ketoconazole, Spironolactone, Ampicillin, Losartan, Ritonavir, Saquinavir, Clotrimazole, etc. are some of the drugs associated with hERG-related cardiotoxicity, several of which were withdrawn from the market.

Terfenadine, Astemizole

Terfenadine, an H1 receptor antagonist, was withdrawn from the market because of its cardiotoxicity. Astemizole was a second-generation antihistamine that was withdrawn from the market because of its side effects (QTc interval prolongation and related arrhythmias due to hERG channel blockade). Terfenadine inhibits the hERG channel through interactions with the crucial amino acids Thr-623, Ser-624, Tyr-652 and Phe-656 located in the channel [98].

Cisapride

Cisapride is used to treat nocturnal heartburn and other gastrointestinal disorders. It was also withdrawn from the market due to its affinity for the hERG channel. Cisapride interacts with essential amino acids such as Thr-623, Ser-624, Tyr-652, and Phe-656 located in the S6 domain of the hERG channel [98].

Chloroquine & quinidine

Quinoline-containing antimalarial agents also block the hERG channel.


Amantadine

Antipsychotics

Several antipsychotics, including aripiprazole, clozapine, droperidol, mesoridazine, and olanzapine, have hERG inhibition potential.

Class III antiarrhythmic agents

Antihypertensive agents

In silico methods

Ligand-based methods

Ligand-based methods, which start from the structures of known hERG channel blockers, include 2D quantitative structure-activity relationship (QSAR), 3D QSAR, 3D pharmacophores, and classification models, which aid in understanding the structure-activity relationship of hERG blockers.

Structure-based methods

Structure-based methods use homology models and the crystal structure of the hERG ion channel to explore the molecular interactions and binding modes of compounds. Several studies have reported the usefulness of these models for identifying blockers and for studying molecular interactions, state-dependent inhibition profiles, and binding modes (Table 3.12).

Table .: Online tools to predict hERG affinity of ligands.

| Web server for hERG prediction | URL | Availability |
| Pred-HERG |  | Free |
| ACD-/I-Lab | (/index.php) | Commercial |
| HitPick | hitpick. | Free |
| PASS | http://ibmc.p.ru/PASS/// | Free |
| SuperPred | – | License required |
| admetSAR |  | Free |
| SwissADME |  | Free |
| hERG |  | Free |
| OpenVirtualTox lab |  | Commercial |


Structural details of hERG

The hERG channel contains a large active site that can accommodate large and diverse chemicals. Earlier studies, based on several classes of drugs such as class III antiarrhythmics, antihistaminics, and analgesics, highlighted hydrophobicity and the presence of a charged nitrogen as major structural determinants of hERG inhibition. The majority of these drugs possess hydrophobic centers and at least one positively charged nitrogen. The active site of the hERG channel has a positively charged center, which has a considerable affinity for positively charged ligands. The binding pattern differs between drugs: some bind to the open channel, while others bind to the channel in a different state. Mostly, π-π interactions between the aromatic rings of drugs and F557 and F656 of the hERG active site lead to strong affinity. Haloperidol, terfenadine, cisapride, and dofetilide interact with residues T623 and S624. Nicotine, which does not have any positively charged nitrogen, binds effectively to the hERG channel and blocks its activity. Spironolactone also blocks the channel without having any hydrophobic group or positive nitrogen. These findings indicate that other structural determinants or pharmacophoric features may contribute to binding.

3D-structural details of hERG (KCNH2 or Kv11.1) channel

The hERG channel is organized as a homotetramer, and each monomer contains 1159 amino acids. Each monomer is composed of four domains, and together the monomers form a central pore responsible for the passage of potassium ions.
i) N-terminal Per-Arnt-Sim (PAS) domain
ii) Voltage sensor domain (VSD): functions as a voltage sensor
iii) Pore domain (PD): the ligand-binding site. The crucial amino acids Tyr-652 and Phe-656 are located in this region
iv) Cyclic nucleotide-binding domain (cNBD) [99] (Table 3.13)

Important in silico tools to predict hERG toxicity

Target prediction tools such as SEA and SwissTargetPrediction also predict hERG affinity [97].

Application of in silico tools for the prediction of toxicity

For QSAR analysis, IC50/pIC50 values are correlated against various descriptors, whereas in QSTR, LD50 values (the dose lethal to 50% of a given population) are used to derive a predictive model. ADME parameters greatly influence LD50 values, and because of this complexity, QSAR/QSTR models are derived for specific classes of chemicals, e.g., organophosphorus pesticides, anilines, pyrines, aliphatic amines, and phenyl urea derivatives [100] (Table 3.14).
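The correlation step in a QSAR analysis can be illustrated with a one-descriptor linear model. The sketch below converts IC50 values to pIC50 (the negative base-10 logarithm of the molar IC50) and fits pIC50 against a single descriptor by ordinary least squares. The function names are hypothetical and the example values are invented purely for illustration; published QSAR/QSTR models use several descriptors and proper validation.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); the input is given in nanomolar."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept for one descriptor."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented example values, for illustration only (not taken from any study).
example_logp = [1.0, 2.0, 3.0, 4.0]
example_ic50_nm = [10000.0, 1000.0, 100.0, 10.0]
example_pic50 = [pic50_from_ic50_nm(v) for v in example_ic50_nm]
example_slope, example_intercept = fit_line(example_logp, example_pic50)
```

The same fitting routine would apply to a QSTR model if the activity values were replaced by a transformed LD50.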



Table .: Important descriptors related to toxicity. Chemical class of compounds

Type of toxicity studied

Twenty six aliphatic & thirty three Acute oral aromatic amines toxicity Twenty phenyl urea derivatives Acute toxicity Pyrines, N-oxide derivatives, hy- Acute droxy, alkyl, pyridyl derivatives toxicity Substituted anilines Acute oral toxicity

Sixty organophosphorus compounds

Acute oral toxicity

Influencing descriptors


Log-P, molecular volume, LUMO


Log-P and electronic parameters


Log-P and LUMO


Fifth order valence cluster molecular connectivity, atomic charge weighted fractional positive surface area, total molecular surface area Lipophilicity, MR, Hydrogen Bonding Acceptor (HBA) and Hydrogen Bonding Donor (HBD)



*LD values were retrieved from Register of Toxicology Effects of Chemical Substances (RTECS) while performing QSAR studies.

Table .: In silico tools available to predict toxicity end points. Software



ADMET Predictor http://www.simulations-plus. com/ ACD ToxSuite (ToxBoxes) products/admet/tox/


Qualitative and quantitative prediction of oestrogen receptor toxicity in rats ER binding affinity prediction. Identify and visualize specific structural toxicophores

Commercial Free web application: http:// webboxes/ Free

CAESAR Derek Commercial Leadscope Commercial MolCode Toolbox Commercial OSIRIS property explorer Freely org/prog/peo/ PASS Commercial Institute of Biomedical Chemistry of the Russian Academy of Medical Sciences, Moscow http://ibmc.p.ru/PASS//

To assess developmental toxicity Developmental toxicity Developmental toxicity in the rodent fetus Quantitative prediction of rat ER binding affinity and AhR binding affinity To predict “undesirable” effects mutagenicity, tumorigenicity, irritating effects and reproductive effects Predicts embryo toxicity


3 In silico drug design

Table .: (continued) Software



T.E.S.T.: The Toxicity Estimation Software Tool std/cppb/qsar/index. html#TEST) TIMES (COREPA) Laboratory of Mathematical Chemistry, Bourgas University TOPKAT (Accelrys) ProTox-II II/ ToxPredict toxpredict-tool QMPRPlus https://www.simulations-plus. com/about/ MedChem Designer https://www.simulations-plus. com/software/medchemdesigner/ eMolTox MouseTox MouseTox/ vNNADMET vnnadmet/login.xhtml admetSAR admetsar pkCSM pkcsm/prediction vegaQSAR Lazar predict


Developmental toxicity estimation


Classification models for the prediction of estrogen, androgen and aryl hydrocarbon binding



Developmental toxicity of pesticides, industrial chemicals Predicts acute toxicity, organ toxicity, mutagenesity, hepatotoxicity Predicts toxic effects


Predicts ADMET properties


Predicts ADMET properties


Predicts toxicity end points


Predicts cytotoxicity


Predicts cytotoxicity, mutagenicity, cardiotoxicity


Predicts AMES toxicity, Acute oral toxicity, fish toxicity


Predicts AMES toxicity, hepatotoxicity


Predicts chemical toxicity


Carcinogenicity, mutagenesity


Available software programs for predicting toxicity end points (Reproduced and modified from Elena Lo Piparo and Andrew Worth, Joint Research Centre reports ).



3.5 Conclusion

The drug discovery and development paradigm is a time-consuming (approximately 10–15 years) and costly process. In silico approaches have imparted significant momentum to all stages of rational drug design. In silico tools are now well established as non-animal alternatives for evaluating the efficacy and toxicity of a chemical or drug molecule. Earlier, these approaches were limited to the pharmacodynamic domain, but their applicability has since been extended to pharmacokinetics. In vitro and in vivo methods are relatively costly and time-consuming compared with in silico approaches. These approaches have revolutionized target identification, the prediction of biological activity for novel targets, and the optimization of affinity and of absorption, distribution, metabolism, excretion, and toxicity parameters.

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

References

1. Aparoy P, Kumar Reddy K, Reddanna P. Structure and ligand based drug design strategies in the development of novel 5-LOX inhibitors. Curr Med Chem 2012;19:3763–78.
2. Sliwoski G, Kothiwale S, Meiler J, Lowe EW. Computational methods in drug discovery. Pharmacol Rev 2014;66:334–95.
3. Mandal S, Moudgil M, Mandal SK. Review on rational drug design. Eur J Pharm 2009;625:90–100.
4. Stepniewska-Dziubinska MM, Zielenkiewicz P, Siedlecki P. Improving detection of protein-ligand binding sites with 3D segmentation. Sci Rep 2020;10:5035.
5. Vakser IA. Protein-protein docking: from interaction to interactome. Biophys J 2014;107:1785–93.
6. Bissantz C, Kuhn B, Stahl M. A medicinal chemist's guide to molecular interactions. J Med Chem 2010;53:5061–84.
7. Anderson AC. The process of structure-based drug design. Chem Biol 2003;10:787–97.
8. Ferreira LG, Dos Santos RN, Oliva G, Andricopulo AD. Molecular docking and structure-based drug design strategies. Molecules 2015;20:13384–421.
9. Podlaski F, Filipovic Z, Kong N, Kammlott U, Lukacs C, et al. In vivo activation of the p53 pathway by small-molecule antagonists of MDM2. Science 2004;303:844–8.
10. Wang S, Zhao Y, Aguilar A, Bernard D, Yang CY. Targeting the MDM2-p53 protein-protein interaction for new cancer therapy: progress and challenges. Cold Spring Harb Perspect Med 2017;7.
11. Scholten DJ, Canals M, Maussang D, Roumen L, Smit MJ, Wijtmans MS, et al. Pharmacological modulation of chemokine receptor function. Br J Pharmacol 2012;165:1611–43.
12. Garcia-Perez J, Rueda P, Staropoli I, Kellenberger E, Alcami J, Arenzana-Seisdedos F, et al. New insights into the mechanisms whereby low molecular weight CCR5 ligands inhibit HIV-1 infection. J Biol Chem 2011;286:4978–90.



13. Cavasotto CN, Orry AJ. Ligand docking and structure-based virtual screening in drug discovery. Curr Top Med Chem 2007;7:1006–14.
14. Yang SY. Pharmacophore modelling and applications in drug discovery: challenges and recent advances. Drug Discov Today 2010;15:11–2.
15. Rodolpho BC, Alves VM, Silva AC, Nascimento MN, Silva FC. Virtual screening strategies in medicinal chemistry: the state of the art and current challenges. Curr Top Med Chem 2014;14:1899–912.
16. Evanthia L, George S, Vassilatis DK, Cournia Z. Structure-based virtual screening for drug discovery: principles, applications and recent advances. Curr Top Med Chem 2014;14:1923–38.
17. Yusuf T, Kruger B, Proschak E. The holistic integration of virtual screening in drug discovery. Drug Discov Today 2013;18:358–64.
18. Talele TT, Khedkar SA, Rigby AC. Successful applications of computer aided drug discovery: moving drugs from concept to the clinic. Curr Top Med Chem 2010;10:127–41.
19. Hirooka K, Shiraga F. Potential role for angiotensin-converting enzyme inhibitors in the treatment of glaucoma. Clin Ophthalmol 2007;1:217–23. PMID: 19668475.
20. Doreen S, Leopold S, Gerhard G, Alina PC. Pharmacotherapy of Glaucoma. J Ocul Pharmacol Ther 2015;31(2):63–67.
21. Randal Kipp D, Hirschi JS, Wakata A, Glodstein H, Schramm VL. Transition states of native and drug-resistant HIV-1 protease are the same. PNAS 2012;1–6. 10.1073/pnas.1202808109.
22. Taha MO, Qandil AM, Al-Haraznah T, Khalaf RA, Zalloum H, Al-Bakri AG. Discovery of new antifungal leads via pharmacophore modeling and QSAR analysis of fungal N-myristoyl transferase inhibitors followed by in silico screening. Chem Biol Drug Des 2011;78:391–407. [Epub 2011 Jul 13].
23. Zaheer U-H, Usmani S, Shamshad H, Mahmood U, Halim SA. A combined 3D-QSAR and docking studies for the in-silico prediction of HIV-protease inhibitors. Chem Cent J 2013;7:88–100.
24. Politi A, Durdagi S, Moutevelis-Minakakis P, Kokotos G, Papadopoulos MG, Mavromoustakos T. Application of 3D QSAR CoMFA/CoMSIA and in-silico docking studies on novel renin inhibitors against cardiovascular diseases. Eur J Med Chem 2009;44:3703–11.
25. Badhani B, Kakkar R. In-silico studies on potential MCF-7 inhibitors: a combination of pharmacophore and 3D-QSAR modeling, virtual screening, molecular docking, and pharmacokinetic analysis. J Biomol Struct Dyn 2016;35:1950–67.
26. Zuo K, Liang L, Du W, Sun X, Liu W, Gou X, et al. 3D-QSAR, molecular docking and molecular dynamics simulation of Pseudomonas aeruginosa LpxC inhibitors. Int J Mol Sci 2017;18:761.
27. Ding L, Wang ZZ, Sun XD, Yang J, Ma CY, Li W, et al. 3D-QSAR (CoMFA, CoMSIA), molecular docking and molecular dynamics simulations study of 6-aryl-5-cyano-pyrimidine derivatives to explore the structure requirements of LSD1 inhibitors. Bioorg Med Chem Lett 2017;27:3521–8.
28. Ming H, Li Y, Wang Y, Yan Y, Zhang S. Combined 3D-QSAR, molecular docking, and molecular dynamics study on piperazinyl-glutamate-pyridines/pyrimidines as potent P2Y12 antagonists for inhibition of platelet aggregation. J Chem Inf Model 2011;51:2560–72.
29. Zhou S, Zhou L, Cui R, Tian Y, Li X, You R, et al. Pharmacophore-based 3D-QSAR modeling, virtual screening and molecular docking analysis for the detection of MERTK inhibitors with novel scaffold. Comb Chem High Throughput Screen 2016;19:73–96.
30. Luo PH, Zhang XR, Huang L, Yuan L, Zhou XZ, Gao X, et al. 3D-QSAR pharmacophore-based virtual screening, molecular docking and molecular dynamics simulation toward identifying lead compounds for NS2B-NS3 protease inhibitors. J Recept Signal Transduct 2017;37:481–92.
31. Singh U, Gangwal R, Prajapati R, Dhoke G, Sangamwar A. 3D QSAR pharmacophore-based virtual screening and molecular docking studies to identify novel matrix metalloproteinase 12 inhibitors. Mol Simulat 2013;39:385–96.



32. Debnath T, Majumdar S, Kalle AM, Aparna V, Debnath S. Identification of potent histone deacetylase 8 inhibitors using pharmacophore-based virtual screening, three-dimensional quantitative structure-activity relationship, and docking study. Res Rep Med Chem 2015;5:21–39. 33. Madhulatha K, Chandra S, Tiwari N, Subbarao N. 3D QSAR, pharmacophore and molecular docking studies of known inhibitors and designing of novel inhibitors for M18 aspartyl aminopeptidase of Plasmodium falciparum. BMC Struct Biol 2016;16:12. 34. Mahiwal K, Kumar P, Narasimhan B. Synthesis, antimicrobial evaluation, ot-QSAR and mt-QSAR studies of 2-amino benzoic acid derivatives. Med Chem Res 2012;21:293–307. 35. Antanasijević D, Antanasijević J, Trišović N, Ušćumlić G, Pocajt V. From classification to regression multi-tasking QSAR modeling using a novel modular neural network: simultaneous prediction of anticonvulsant activity and neurotoxicity of succinimides. Mol Pharm 2017;14:4476–84. 36. Speck-Planche A, Natália M, SoeiroCordeiro D. Chemoinformatics for medicinal chemistry: insilico model to enable the discovery of potent and safer anti-cocci agents. Future Med Chem 2014; 6:2013–28. 37. Speck-Planche A, Natália M, SoeiroCordeiro D. Fragment-based in-silico modeling of multi-target inhibitors against breast cancer-related proteins. Mol Divers 2017;21:511–23. 38. Koga H, Itoh A, Murayama S, Suzue S, Irikura T. Structure-activity relationships of antibacterial 6,7and 7,8-disubstituted 1-A1kyl-1,4-dihydro-4-oxoquinoline-3-carboxylic acids. J Med Chem 1980; 23:1358–63. 39. John VD, Andrew TC, David JC, George BG, Alexander LJ, William AP, et al. The discovery of potent nonpeptide angiotensin II receptor antagonists: a new class of potent antihypertensives. J Med Chem 1990;33:1312–29. 40. Subramanian G, Kitchen DB. Computational approaches for modeling human intestinal absorption and permeability. J Mol Model 2006;12:577–89. 41. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ. 
Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev 2001;46:3–26. 42. Castillo-Garit JA, Cañizares-Carmenate Y, Marrero-Ponce TF, Abad C. Prediction of ADME properties, part 1: classification models to predict Caco-2 cell permeability using atom-based bilinear indices. AFINIDAD 2014;71:129–38. 43. Wang N-N, Huang C, Dong J, Yao Z-J, Zhu M-F, Deng Z-K, et al. Predicting human intestinal absorption with modified random forest approach: a comprehensive evaluation of molecular representation, unbalanced data, and applicability domain issues. RSC Adv 2017;7:19007–18. 44. Avdeef A. The rise of PAMPA. Expet Opin Drug Metabol Toxicol 2005;1:325–42. 45. Irvine J, Kahashi L, Lockhart K, Cheong J, Tolan J, Selick HE, et al. MDCK (Madin−Darby canine kidney) cells: a tool for membrane permeability screening. J Pharmaceut Sci 1999;88:28–33. 46. Korinth G, Schaller KH, Drexler H. Is the permeability coefficient Kp a reliable tool in percutaneous absorption studies? Arch Toxicol 2005;79:155–9. 47. Kaliszan R, Noctor TAG, Wainer IW. Quantitative structure-enantioselective retention relationships for the chromatography of 1,4-benzodiazepines on a human serum albumin based HPLC chiral stationary phase: an approach to the computational prediction of retention and enantioselectivity. Chromatographia 1992;33:546–50. 48. Andrisano V, Bertucci C, Cavrini V, Recanatini M, Cavalli A, Varoli L, et al. Stereoselective binding of 2,3-substituted 3-hydroxypropionic acids on an immobilised human serum albumin chiral stationary phase: sereochemical characterisation and quantitative structure-retention relationship study. J Chromatogr A 2000;876:75–86. 49. Aureli L, Cruciani G, Cesta MC, Anacardio R, De Simone L, Moriconi A. Predicting human serum albumin affinity of interleukin-8 (CXCL8) inhibitors by 3D-QSPR approach. J Med Chem 2005;48: 2469–79. Submitted for publication.


3 In silico drug design

50. Colmenarejo G, Alvarez-Pedraglio A, Lavandera JL. Cheminformatic models to predict binding affinities to human serum albumin. J Med Chem 2001;44:4370–8. 51. Mao H, Hadduk PJ, Craig R, Bell R, Borre T, Fesik SWJ. Rational design of diflunisal analogues with reduced affinity for human serum albumin. J Am Chem Soc 2001;123:10429–35. 52. Valko K, Nunhuck S, Bevan C, Abraham MH, Reynolds DP. Fast gradient HPLC method to determine compounds binding to human serum albumin. Relationships with octanol/water and immobilized artificial membrane lipophilicity. J Pharmaceut Sci 2003;92:2236–48. 53. Markuszewski M, Kaliszan R. Quantitative structure–retention relationships in affinity highperformance liquid chromatography. J Chromatogr B 2002;768:55–66. 54. Ashton DS, Beddell C, Ray AD, Valkó K. Quantitative structure-retention relationships of acyclovir esters using immobilised albumin high-performance liquid chromatography and reversed-phase high-performance liquid chromatography. J Chromatogr A 1995;707:367–72. 55. Ashton DS, Beddell CR, Cockerill GS, Gohil K, Gowrie C, Robinson JE, et al. Binding measurements of indolocarbazole derivatives to immobilised human serum albumin by high-performance liquid chromatography. J Chromatogr B Biomed Sci Appl 1996;677:194–219. 56. Deeb O, Hemmateenejad B. ANN-QSAR model of drug-binding to human serum albumin. Chem Biol Drug Des 2007;70:19–29. 57. Vallianatou T, George L, Tsantili-Kakoulidou A. In-silico prediction of human serum albumin binding for drug leads. Expet Opin Drug Discov 2013;8:583–95. 58. Ghafourian T, Amin Z. QSAR models for the prediction of plasma protein binding. Bioimpacts 2013; 3:21–7. [Epub 2013 Feb 21]. 59. Berellini G, Waters NJ, Lombardo F. In silico prediction of total human plasma clearance. J Chem Inf Model 2012;52:2069–78. 60. Serlin Y, Shelef I, Knyazer B, Friedman A. Anatomy and physiology of the blood-brain barrier. Semin Cell Dev Biol 2015;38:2–6. 61. 
Gwen Mc Caffrey WDS, Sanchez Covarrubias L, Finch JD, De Marco K, Li Laracuente M, Ronaldson PT, et al. P-glycoprotein trafficking at the blood–brain barrier altered by peripheral inflammatory hyperalgesia. J Neurochem 2012;122:962–75. 62. Kim RB. Drugs as P-glycoprotein substrates, inhibitors and inducers. Drug Metabol Rev 2002;34: 47–54. 63. Didziapetris R, Japertas P, Avdeef A, Petrauskas A. Classification analysis of P-glycoprotein substrate specificity. J Drug Target 2003;11:391–406. 64. Gombar VK, Polli JW, Humphreys JE, Wring SA, Serabjit-Sing CS. Predicting P-glycoprotein substrates by a quantitative structure–activity relationship model. J Pharmaceut Sci 2004;93: 957–68. 65. Gleeson MP. Generation of a set of simple, interpretable ADMET rules of thumb computational & structural chemistry. J Med Chem 2008;51:817–34. 66. Young RC, Mitchell RC, Brown TH, Ganellin CR, Griffiths R, Jones M, et al. Development of a new physico-chemical model for brain penetration and its application to the design of centrally acting H2 receptor histamine antagonists. J Med Chem 1988;31:656–71. 67. Subramanian G, Kitchen DB. Computational models to predict blood-brain barrier permeation and CNS activity. J Comput Aided Mol Des 2003;17:643–64. 68. Clark DE. Rapid calculation of polar molecular surface area and its application to the prediction of transport phenomena. 2. Prediction of blood-brain barrier penetration. J Pharmaceut Sci 1999;88: 815–21. 69. Kelder J, Grootenhuis PD, Bayada DM, Delbressine LP, Ploemen JP. Polar molecular surface as a dominating determinant for oral absorption and brain penetration of drugs. Pharmaceut Res 1999; 16:1514–9.



70. Hou TJ, Xu XJ. ADME evaluation in drug discovery. 3. Modeling blood-brain barrier partitioning using simple molecular descriptors. J Chem Inf Comput Sci 2003;43:2137–52. 71. Ma XL, Chen C, Yang J. Predictive model of blood-brain barrier penetration of organic compounds. Acta Pharmacol Sin 2005;26:500–12. 72. Konovalov DA, Sim N, Deconinck E, Vander Heyden Y, Coomans D. Statistical confidence for variable selection via Monte Carlo cross-validation. J Chem Inf Model 2008;48:370–83. 73. Lanevskij K, Dapkunas J, Juska L, Japertas P, Didziapetris R. QSAR analysis of blood–brain distribution: the influence of plasma and brain tissue binding. J Pharmaceut Sci 2011;100: 2147–59. 74. Zhang L, Zhu H, Oprea IT, Golbraikh A, Tropsha A. QSAR modeling of the blood-brain barrier permeability for diverse organic compounds. Pharmaceut Res 2008;25:1902–14. 75. Rose K, Hall LH, Kier LB. Modeling blood-brain barrier partitioning using the electrotopological state. J Chem Inf Model 2002;42:651–66. 76. Vilar S, Chakrabarti M, Costanzi S. Prediction of passive blood-brain partitioning: straightforward and effective classification models based on in-silico derived physicochemical descriptors. J Mol Graph Model 2010;28:899–903. 77. Ekins S, Crumb WJ, Dustan Sarazan R, Wikel JH, Wrighton SA. Three-dimensional quantitative structure-activity relationship for inhibition of human ether-A-go-go-related gene potassium channel. J Pharmacol Exp Therapeut 2002;301:427–34. 78. Aronov AM. Common pharmacophores for uncharged human ether-a-go-go-related gene (hERG) blockers. J Med Chem 2006;49:6917–21. 79. Durdagi S, Subbotina J, Lees-Miller J, Guo J, Duff HJ, Noskov SY. Insights into the molecular mechanism of hERG1 channel activation and blockade by drugs. Curr Med Chem 2010;17:3514–32. 80. Tan Y, Chen Y, You Q, Sun H, Li M. Predicting the potency of hERG K+ channel inhibition by combining 3D-QSAR pharmacophore and 2D-QSAR models. J Mol Model 2012;18:1023–36. 81. 
Kratz JM, Grienke U, Scheel O, Mann SA, M Rollinger J. Natural products modulating the hERG channel: heartaches and hope. Nat Prod Rep 2017;34:957–80. 82. Doddareddy M, Klaasse E, Adriaan IJzerman S, Bender A. Prospective validation of a comprehensive in-silico hERG model and its applications to commercial compound and drug databases. ChemMedChem 2010;5:716–29. 83. Jäckel H, Klein W. Prediction of mammalian toxicity by quantitative structure activity relationships: aliphatic amines and anilines. Quant Struct-Act Relat 1991;10:198–204. 84. Nendza M, Dittrich B, Wenzel A, Klein W. Predictive QSAR models for estimating ecotoxic hazard of plant-protecting agents: target and non-target toxicity. Sci Total Environ 1991;109–110:527–35. 85. Cronin MTD, Dearden JC, Duffy JC, Edwards R, Manga N, Worth AP, et al. The importance of hydrophobicity and electrophilicity descriptors in mechanistically-based QSARs for toxicological endpoints. SAR QSAR Environ Res 2002;13:167–76. 86. Johnson SR, Jurs PC, van de Waterbeemd H, Testa B, olkers GF. Computer-assisted lead finding and optimization: current tools for medicinal chemistry. Basel: Verlag-Helvetica Chimica Acta; 1997: 190–208 pp. 87. De villers J, De villers H. Prediction of acute mammalian toxicity from QSARs and interspecies correlations. SAR QSAR Environ Res 2004;15:501–10. 88. Prajapati R, Singh U, Patil A, Khomane K, Bagul P, Bansal A, et al. In-silico model for P-glycoprotein substrate prediction: insights from molecular dynamics and in-vitro studies. J Comput Aided Mol Des 2013;27:347–63. 89. Li D, Chen L, Li Y, Tian S, Sun H, Hou T. ADMET evaluation in drug discovery & development of in silico prediction models for P-glycoprotein substrates. Mol Pharm 2014;11:716–26. 90. Semple G, Andersson BM, Chhajlani V, Georgsson J, Johansson MJ, Rosenquist A, et al. Synthesis and biological activity of?? Opioid receptor agonists. Part 2. Preparation of 3-aryl-2-pyridone


3 In silico drug design

analogues generated by solution- and solid-phase parallel synthesis methods. Bioorg Med Chem Lett 2003;13:1141–5. 91. Taeyoung Y, Stéphane DL, Robbin B, Michael G, James E, Alan H, et al. 2-Arylpyrimidines: novel CRF-1 receptor antagonists. Bioorganic Med Chem Lett 2008;18:4486–90. 92. Banks WA, Lynch JL, Price TO. Cytokines and the blood–brain barrier. In: Siegel A, Zalcman SS, editors. The neuroimmunological basis of behavior and mental disorders. Boston, MA: Springer; 2009. Submitted for publication. 93. Kazmi SR, Jun R, Myeong-Sang Yu, Chanjin J, Dokyun Na. In silico approaches and tools for the prediction of drug metabolism and fate: a review. Comput Biol Med 2109;106:54–64. 94. Braga RC, Alves VM, Silva MF, Muratov E, Fourches D, Tropsha A, et al. Tuning HERG out: antitarget QSAR models for drug development. Curr Top Med Chem 2014;14:1399–415. 95. Wang W, Mac Kinnon R. Cryo-EM structure of the open human ether-a`-go-go-related K+ channel hERG. Cell 2017;169:422–30. 96. Chemi G, Gemma S, Campiani G, Brogi S, Butini S, Brindisi M. Computational tool for fast in-silico evaluation of hERG K+ channel affinity. Front Chem 2017;5:1–9. 97. Villoutreix BO, Olivier T. Computational investigations of hERG channel blockers: new insights and current predictive models. Adv Drug Deliv Rev 2015;86:72–82. 98. Kamiya K, Niwa R, Morishima M, Honjo H, Sanguinetti M. Molecular determinants of hERG channel block by terfenadine and cisapride. J Pharmaceut Sci 2008;108:301–7. 99. Dalibalta S, Mitcheson JS. hERG channel physiology and drug-binding structure–activity relationships. In: Vaz RJ, Klabunde T, editors. Antitargets. Weinheim: WILEY-VCH Verlag GmbH & Co. KGaA; 2008:89–108 pp. 100. Tsakovska I, Lessigiarska I, Netzeva T, Worth A. A mini review of mammalian toxicity (Q) SAR models. QSAR Comb Sci 2008;27:41–8.

Rodrigo S. A. de Araújo, Francisco J. B. Mendonça, Jr., Marcus T. Scotti and Luciana Scotti*

4 Protein modeling

Abstract: Proteins are essential and versatile polymers of sequenced amino acids. They often adopt an organized three-dimensional arrangement, a result of their monomeric composition, which determines their biological role in cellular function. Proteins are involved in enzymatic catalysis; they participate in the decoding and transmission of genetic information, in cell recognition, in signaling, in the transport of substances, in the regulation of intra- and extracellular conditions, and in other functions.

Keywords: CADD, enzyme, interactions, model, protein, target

*Corresponding author: Luciana Scotti, Health Center, Federal University of Paraíba, 50670-910, João Pessoa, PB, Brazil, E-mail: [email protected].
Rodrigo S. A. de Araújo and Francisco J. B. Mendonça, Jr., Biological Science Department, Laboratory of Synthesis and Drug Delivery, State University of Paraiba, 58070-450, João Pessoa, PB, Brazil
Marcus T. Scotti, Health Center, Federal University of Paraíba, 50670-910, João Pessoa, PB, Brazil
This article has previously been published in the journal Physical Sciences Reviews. Please cite as: R. S. A. de Araújo, F. J. B. Mendonça, M. T. Scotti and L. Scotti “Protein modeling” Physical Sciences Reviews [Online] 2021. DOI: 10.1515/psr-20180161

4.1 Proteins

Proteins are versatile macromolecules that are essential to cellular function. They are polymers of amino acids in sequence that often possess an organized three-dimensional (3-D) arrangement, a consequence of their monomeric composition. This arrangement in turn determines their biological functions, which include enzymatic catalysis, participation in the decoding and transmission of genetic information, cell recognition, signaling, transport of substances, and regulation of intra- and extracellular conditions, among others [1–3]. Organizationally, these macromolecules can be described at four levels: primary, secondary, tertiary and quaternary structure [1–3].

– Primary structure: consists of the amino acid sequence, which forms the backbone through peptide bonds between residues. The individual characteristics of each amino acid residue (hydrophilicity and hydrophobicity, for example) modulate the general properties of the peptide backbone;
– Secondary structure: represents the first level of 3-D arrangement, describing the local conformation of the backbone without regard to the spatial arrangement of the side chains. Here the first folding patterns appear as characteristic structures such as α-helices, β-pleated sheets and loops, normally formed through electrostatic interactions and hydrogen bonds between nearby residues;
– Tertiary structure: constitutes the 3-D arrangement of the polypeptide as a whole, now also taking the side chains into account. This level is shaped by the characteristics of the individual residues and by the medium surrounding the protein in the cell, and it is also strongly influenced by the secondary structure, through the electrostatic interactions and hydrogen bonds formed between neighboring, fold-related amino acids;
– Quaternary structure: found in proteins with two or more tertiary-structure subunits, whose complete spatial arrangement represents the final level of organization. The subunits interact with each other to form large protein complexes, in which contacts between their amino acids allow stabilized packing.

The characteristics of the amino acids in a protein sequence determine the spatial distribution of the polypeptide, and the resulting 3-D shapes correspond to low-energy states that remain stable in physiological medium. Non-covalent and electrostatic influences on conformation provide greater stability in the biological medium and give rise to the native, functional form. The factors responsible for stabilization are hydrophobic effects, electrostatic interactions, and chemical cross-links.

Hydrophobic effects reduce contact between apolar and polar substances. Apolar amino acids are more stable away from the aqueous medium and are normally buried within the protein, while the polar amino acids of the structure remain in direct contact with that medium. The opposite is seen in transmembrane proteins: because they are inserted in a lipophilic medium, they tend to keep their apolar amino acids at the protein surface and fill their interior with polar residues. This effect is the main determinant of 3-D protein structure in physiological media [1–3].

The different functional groups of the amino acids in the polypeptide chain create electrostatic poles within the protein structure that draw parts of the chain together, through hydrogen bonds, van der Waals forces, or weaker interactions. These contacts are important for a stable, persistent 3-D structure under physiological conditions [1–3].
As a consequence of folding, disulfide bonds (chemical cross-links) can form between cysteine residues that are brought close together. They occur in tertiary and quaternary structures and are important for maintaining conformation. Certain metal ions can also take part in cross-links of this type in protein structures [1–3]. Thus, the simple amino acid sequence that forms the primary structure is a direct determinant of 3-D protein conformation, organizing the higher 3-D levels, which adapt to this stable and functional conformation. The search for a better understanding of intra- and extracellular processes highlights the central role of these macromolecules in cellular function [4].
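The tendency of apolar residues to be buried, and the reversed pattern in transmembrane proteins, can be illustrated with a sliding-window hydropathy profile. The following is a minimal sketch, not a method taken from this chapter: it assumes the Kyte-Doolittle hydropathy scale as the measure of apolar character, and the function name `hydropathy_profile`, the window length of 9, and the two example sequences are hypothetical choices made for illustration.

```python
# Kyte-Doolittle hydropathy values (assumed scale, not named in the text);
# positive values mark apolar residues, negative values polar ones.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the mean hydropathy of every full window along the sequence."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        return []
    # A KeyError here signals a residue outside the 20 standard amino acids.
    values = [KYTE_DOOLITTLE[res] for res in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical sequences: one membrane-like, one solvent-exposed.
    examples = {
        "apolar": "LIVALLVIAGLLIVA",
        "polar": "DKRESNQDKRESNQD",
    }
    for name, seq in examples.items():
        profile = hydropathy_profile(seq)
        print(name, [round(v, 2) for v in profile])
```

Windows with clearly positive means suggest apolar stretches, such as those buried in a protein core or exposed to the lipid in a transmembrane segment. The interpretation depends on the scale and window length chosen, so the numbers are indicative only.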

4.2 Bioinformatics and the importance of computational tools

In the post-genomic era, the elucidation of biologically important structures has advanced considerably, and the amino acid sequences (primary structures) of a very large number of proteins have become known. However, certain steps that connect the primary amino acid sequence to the 3-D conformation of a protein are still not fully clarified and need further exploration for a complete understanding of the process [5–10]. Advances in computational tools have accelerated detailed predictions of the molecular characteristics involved in protein folding and unfolding [11].

Bioinformatics refers to the use of computational tools applied to biological questions. In the structural elucidation of proteins, it can be used both for the analysis of primary amino acid sequences (classical bioinformatics) and for simulations and predictions of 3-D structure using molecular modeling techniques (structural bioinformatics). These tools are increasingly used in the 3-D design of proteins and peptide chains, mainly those of therapeutic interest [12].

Being able to go from a simple amino acid sequence, via a template, to a plausible 3-D model of a protein represents a considerable advance in protein structure design. It reflects the recent and significant success of computational tools in predicting protein structures [5], and it complements our knowledge of the connection between amino acid sequence, 3-D structure, and biological function [4, 13–15].
Thus, the main objective of quaternary protein structure prediction is to determine a protein's complete 3-D structure, principally from the information in its amino acid sequence [5]. It also draws on phylogenetic information about the effect of sequence variations on the 3-D model and on interactions with other cell components [14–16], and it considers intrinsic entropy, enthalpy, and free energy factors, which are essential for a more complete understanding of cellular processes and for drug design based on therapeutic targets [17].

Proteins are principal participants in cellular processes, actively interacting with other macromolecules and with the products of their metabolism. The network of biological molecules that interact with each other can therefore be considered collectively as an interactome. This reinforces the importance, for rational drug design, of a fundamental understanding of 3-D protein–protein structures in cell and molecular biology. Rational drug design relies on detailed knowledge of both function and the metabolic changes that disease causes in the normal interactome [17], and aims to elucidate the complexes formed from experimentally determined or predicted individual structures [18–20].

Predicting 3-D protein structures depends on databases of other available protein structures, which can help to simulate the formation of quaternary structures that are not yet understood. Homologous structures in current protein databases reveal a backbone common to similar structures and thereby help to predict the 3-D structure of the molecule under investigation [21, 22]. The growing number of elucidated 3-D protein structures can be used as templates for the prediction of new peptide structures, which represents a great advance in biotechnology, biochemistry, and drug discovery and development [5].

4.3 Homologous structures and de novo protein design

Modeling protein structures does not follow a single routine applicable to every structure yet to be elucidated. Certain protein types are more amenable to prediction than others. Many small sequences shared between proteins can indicate repetitive structural conformations. The modeling process should be guided by the information contained in primary and secondary structures, or in other key features intrinsic to a peptide sequence [23].

Homologous structures are similar enough to allow structural comparison. Well-described homologous structures in a current database can accelerate 3-D predictions for proteins not yet clarified, through alignment and comparison of similar portions [24]. Used as templates for the quaternary structure prediction of their not yet clarified homologs [23], homologous structures, through the information in their primary and secondary structures, help to build a common backbone and to derive the individual 3-D predictions [21, 22]. For the exploration of protein structure space and of the proteomes of studied organisms, homology currently represents the most accurate approach for obtaining protein models [25–27], which might be complementary [28, 29].

In contrast, proteins without well-known homologs are more difficult to model, both in simulating their folding and in arriving at their final 3-D conformations. In the absence of such homology, modeling becomes “intuition-guided”: it uses small sequences and similarities with other groups of described proteins, assembles a “puzzle” of possible 3-D structures, and extracts information about the intrinsic functionalities of the residue sequences held in common. Similar sequences within the subunits can also help such predictions, since individual domains can behave in similar and repetitive ways in the complete quaternary structure [23].



A complete absence of described models in the databases represents an even greater challenge. The 3-D prediction for the protein of interest must then start from theory, in an ab initio model, in which the information contained in the residue sequence and the possible conformational energies are taken into account to assemble a plausible model as near as possible to the native, functional structure. Features intrinsic to the amino acids of the polypeptide chain, such as their electrostatic properties and conformational energies, are essential for predicting the structure and stability of the resulting models, whether these are built from well-described information in protein databases or developed as initial models.

4.4 Protein data bank

In 1971, the Protein Data Bank (PDB), a crystallographic database for 3-D structural data of biological macromolecules, was founded with a total of seven described structures, marking the start of progress in the understanding of protein structure [28, 30–32]. Initial use was limited to a small group of researchers, but since the 1980s, with advances in crystallography and molecular characterization techniques, the number of deposited structures has grown dramatically. Today the PDB is used widely, by researchers in biology and chemistry and by educators and students at all levels; its most active depositors are generally specialists in X-ray crystallography, nuclear magnetic resonance (NMR), electron microscopy, and theoretical modeling techniques. The influx of data and structures has increased significantly with the advance of the genome project [30].

The database provides digital access to 3-D protein structures [33], including information such as polymer sequences, chemical bonds and interactions, 3-D macromolecular folds, and the experimental data used to describe the 3-D structures. Further information can be accessed from the more than 113,000 3-D structures currently deposited, which encompass not only proteins but also other types of macromolecules such as nucleic acids and carbohydrates [34–36]. Although the PDB was created for structural biology researchers, its exponential growth has opened it to other biologists, to software developers (who have contributed additional computational and bioinformatics tools), to researchers in more diverse areas, and to the general public [36].
Today, the generation and submission of new PDB files involves the collaboration of scientists from all over the world. The worldwide PDB (wwPDB) was established in 2003 to handle the standardization, recognition, processing, validation and distribution of submitted PDB files, under the responsibility of its founding members: the Research Collaboratory for Structural Bioinformatics (RCSB PDB, USA) [37], the Macromolecular Structure Database at the European Bioinformatics Institute (MSD-EBI) [38], and the Protein Data Bank of Japan (PDBj) [30]. Open access to PDB files thus provides a rich source of detailed information on molecules of biological interest, their interactions, structural studies and functional concepts [36]. The files can be viewed and analyzed with dedicated programs such as Chimera [39] and PyMol [40], as well as many other computational tools and databases [41], with the aim of incrementally improving modeling methods and predictive information on the structures of biomolecular complexes [17].

Available in

Although the wwPDB is widely used for access to experimentally determined macromolecular structures, purely theoretical models may not be deposited in the crystallographic structure database [26]. This limitation encouraged the creation of the Protein Model Portal (PMP), whose specific purpose is to give access to strictly theoretical protein models built from known and available homologous structures. Such models have proved essential for predicting the quaternary structures of proteins that are still unknown yet of particular biological interest. The use and growth of the PMP will provide a basis for establishing validation and archiving standards for theoretical models that can be associated with PDB templates [20].
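The kind of information stored in a PDB file, such as atom names, residues, chains and 3-D coordinates, can be read with a few lines of code. The sketch below assumes the legacy fixed-column PDB text format, in which `ATOM` and `HETATM` records keep each field at fixed column positions. The `Atom` tuple, the `parse_atoms` function and the two sample records are hypothetical and made up for illustration; they are not taken from a deposited entry.

```python
import math
from typing import Iterable, List, NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines: Iterable[str]) -> List[Atom]:
    """Extract coordinate records from legacy PDB-format text lines."""
    atoms = []
    for line in lines:
        # Only ATOM and HETATM records carry atomic coordinates.
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        atoms.append(Atom(
            serial=int(line[6:11]),        # columns 7-11
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain_id=line[21],             # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


# Hypothetical records, written to respect the fixed column layout.
SAMPLE = [
    "HEADER    HYPOTHETICAL EXAMPLE",
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C",
]

if __name__ == "__main__":
    n_atom, ca_atom = parse_atoms(SAMPLE)
    distance = math.dist(n_atom[5:], ca_atom[5:])  # x, y, z of each atom
    print(n_atom.res_name, n_atom.chain_id, n_atom.res_seq)
    print(f"N-CA distance: {distance:.3f} Å")
```

For real work, an established parser is a safer choice than slicing columns by hand, since deposited files also contain alternate locations, insertion codes and multiple models that this sketch ignores.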

4.5 Molecular modeling

The 3-D structures of proteins deposited in the Protein Data Bank are obtained chiefly by X-ray diffraction crystallography, from the collection and processing of data on the crystallized structures to the derivation of specific information on the 3-D crystal. NMR techniques have also proved useful in the structural elucidation of proteins, without the need to obtain the proteins in crystalline form. NMR can reveal certain details not observed in X-ray crystallography, such as information on dynamic behavior and interactions with other molecules. Unfortunately, besides their high cost, these classic characterization tools (NMR spectroscopy and X-ray crystallography) have limitations in describing the 3-D structures of biological macromolecules, since they cannot capture with sufficient accuracy intrinsic characteristics such as flexibility, size and interactions with other biomolecules. To help in the search for information on 3-D structures and related functions, complementary high-throughput methods are needed, together with the bioinformatics tools available for molecular modeling of the structures deposited in current databases [23, 42].

During the last few decades, refined computational methods have been developed to build valid 3-D models that predict protein structures, commonly through simulations of energetically favorable protein folds. 3-D protein prediction involves several stages: knowledge of the amino acid sequence, prediction of secondary structures (α-helices, β-pleated sheets, and loops) and spatial conformations, selection of probable templates, and finally refinement toward more trustworthy (though not always reliable) structures [43]. The difficulty of building a reliable protein model depends largely on whether similar structures have been deposited in the PDB that can be compared by homology to accelerate the prediction [5, 37, 44].

4.5.1 Comparative modeling

Modeling based on structural similarity relies on the alignment and comparison of primary structures and on the observation of evolutionary relationships [45]. This supports reliable prediction of secondary and tertiary structure, including in-sequence contacts between amino acid residues, 3-D structure, and topology [5]. An advantage of comparative modeling is that it can aid the development of statistical techniques for extracting information on the evolutionary relationships between homologous structures. The rapid growth in the number of sequenced and predicted structures increases the information available for new prediction studies and the capacity to work with large databases [46].

The initial step is to search current databases for similar amino acid sequences and compare homologous structures. Aligning the residue sequences and identifying conserved portions yields a similarity-based template from which a 3-D model of the protein of interest can be built. When several similar structures are found, the correct template must be selected through further observations. The choice can be narrowed by grouping the candidate templates into families of interest, by their functional proximity, and by the presence or absence of ligands in their structures, all of which allow more reliable comparisons and a better prediction of the structure of interest. Alignments with similarity higher than 40% allow the generation of more trustworthy 3-D models, and specific algorithms transfer the template's structural information to assemble the new predicted 3-D model. The models generated must be of high quality and able to be validated; if not, other methods become necessary.
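The alignment and similarity check described above can be sketched as follows. This is a minimal illustration, not the algorithm used by any particular modeling package: it assumes a Needleman-Wunsch global alignment with simple match, mismatch and gap scores (real tools use substitution matrices and affine gap penalties), and it measures similarity as percent identity over the aligned length, which is only one of several conventions. The function names, scores and sequences are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical aligned residues as a percentage of the alignment length."""
    if not aligned_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    target, template = "ACDEFG", "ACDFG"  # hypothetical sequences
    aln_target, aln_template = needleman_wunsch(target, template)
    identity = percent_identity(aln_target, aln_template)
    print(aln_target)
    print(aln_template)
    # The 40% cut-off follows the figure quoted in the text.
    print(f"identity {identity:.1f}%, usable template: {identity > 40.0}")
```

A template passing the threshold is only a candidate; as the text notes, family membership, function and bound ligands should still inform the final choice.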

4.5.2 Free modeling

The absence of homologous structures, or the presence of largely dissimilar regions, makes the modeling process difficult. These are the principal challenges to prediction of


4 Protein modeling

3-D protein structures [5, 37, 44]. In this case, free modeling [5], also known and categorized as ab initio and/or de novo prediction, is applied [45]. The de novo method uses information from empirically obtained protein templates stored in databases. However, when these are not highly similar to the protein of interest, the method relies on prediction algorithms that draw on similarities in secondary structures, protein fragments, and folding patterns ranked according to their energy levels. In the first case, grouping the common and possible secondary structures in each spatial region guides the assembly of an initial folding template; by taking into account the effects of the residues present in the amino acid sequence and the spatial proximity between them, this allows a reliable prediction in many cases. The increasing use of this technique has led to the growth of database algorithms, which can provide information on possible folding patterns for a given protein structure [45]. The absence of well-described homologous sequences in protein databases can make building a 3-D template for a target protein difficult, but not impossible. In some cases, the absence of similar structures is reflected not only in the absence of homologous primary sequences, but also in the absence of similar folding patterns that might have helped, as stated above, in predicting a valid model. Certain methods fragment the complete structure and use local rather than global similarity searches, producing smaller fragments that can be aligned and compared with fragments of various elucidated proteins. This allows the assembly of small 3-D structures, which may contribute information toward a global theoretical framework [5, 37, 44]. The amino acid residues of a peptide sequence involved in 3-D folding interact among themselves to generate diverse conformations with varying energy levels.
Because it is stable in physiological medium, the configuration closest to the native conformation should have the minimal global energy. Thus, when selecting possible 3-D models, attention to the coherence and proximity patterns between residues is necessary, so that the stability of the target template is preserved and the conformation is faithful to that found in the native protein. In the search for ideal conformation predictions, energies can be compared between existing conformations and the target protein, in which the patterns of interactions between residues may repeat and thus help in modeling the unknown structure [5, 37, 44]. Because the information used for 3-D folding of a protein is intrinsic to the characteristics of its amino acid residues, and folding must take into account minimum energy levels, the ab initio method is performed without utilizing previously known information. The search for the most thermodynamically stable 3-D conformation of a given sequence, corresponding to the native functional structure in physiological medium, makes this method a viable alternative for predicting not yet known structures. The absence of any previously known additional information in this method represents a great challenge for 3-D resolution of biomolecular structures, although an

4.5 Molecular modeling


alternative, its success has been limited to the prediction of small structures, with somewhat studies introduces the advantage of reducing the unwanted usage of animal models in pharmacological research. In addition, medicinal chemists find these in silico or computational methods to be of great help in every aspect of the drug discovery process, whether during the initial stages of


6 In silico methods used in the design of VEGFR-2 inhibitors

optimizing a promising rationale for novel drug design or in the identification of a lead molecule for an expected target of the required pharmacological activity. In an attempt to discover a newer class of VEGFR-2 inhibitors, several structural modifications to pre-existing bioactive molecules were carried out, which resulted in the generation of a huge library of small molecules. Simultaneously, active and potential VEGFR-2 inhibitors have been investigated with in silico techniques such as molecular docking, QSAR, molecular dynamics simulation, and ADME/T prediction. In the following sections, we highlight a few of these discoveries and the molecular interactions of active VEGFR-2 inhibitors revealed by molecular modeling studies.

6.4.1 Design of novel piperazine–chalcone hybrids as VEGFR-2 kinase inhibitors

Ahmed et al. [35] reported novel chalcone-based piperazine and pyrazoline analogs as potent VEGFR-2 inhibitors. The synthesized hybrids were evaluated as antiproliferative agents at the National Cancer Institute against a panel of 60 human cancer cell lines, and the results indicated promising anticancer activity. VEGFR-2 inhibition studies identified compound 12a as the most potent of the synthesized analogs, with an IC50 value of 0.57 μM (Figure 6.3A). Computational studies were also performed with MOE 2015.10 to observe the molecular interactions of the designed analogs with the binding site of VEGFR-2 (PDB ID: 4ASD). Compound 12a, which showed the highest activity in the in vitro studies, also gave the highest docking score toward the active binding site. The molecular interaction diagram in Figure 6.3B shows the direct interactions of compound 12a with the active site of the enzyme. The carbonyl oxygen of the central chalcone moiety formed a hydrogen bond with the active site residue Lys868. Thus, the computational studies indicate that the chalcone-based piperazine derivative 12a targets the VEGFR-2 enzyme, in agreement with the in vitro studies.

Figure 6.3A: Structure of 12a.

6.4 Applications of in silico studies in the exploration of VEGFR-2 inhibitors


Figure 6.3B: Molecular docking model of compound 12a at the binding site of VEGFR-2 (Ahmed et al. [35]).

6.4.2 Docking model of 1-piperazinyl-phthalazines as potential VEGFR-2 inhibitors

A series of novel N-(aryl)-2-[4-(4-phenylphthalazine-1-yl)piperazin-1-yl]acetamide derivatives was reported as active VEGFR-2 inhibitors by Abou-Seri et al. [36]. Among them, compound 13a inhibited VEGFR-2 in the sub-micromolar range, with an IC50 of 0.35 ± 0.03 μM (Figure 6.4). In addition, in silico studies were performed to explore the binding affinity of the synthesized analogs for the active binding site of the enzyme, using MOE (Molecular Operating Environment) version 2010.10 and the VEGFR-2 structure with PDB ID 4ASD. Docking simulations showed that compound 13a fits the active binding site in a manner similar to the co-crystallized ligand sorafenib. The docking scores were also comparable (sorafenib: −15.19 kcal/mol; compound 13a: −16.95 kcal/mol). Several molecular interactions between compound 13a and the active binding site can be seen in the interaction diagram (Figure 6.5). One is the hydrogen bond between the carboxylate group of residue Glu885 and the N–H group of the acetamide moiety, which serves as the linker connecting the piperazine ring to the phenyl carboxyethyl ester part. The carbonyl oxygen of the amide group also formed a hydrogen bond with residue Asp1046. A few pi–pi interactions were also observed between other parts of the active compound and the active binding site: between the phenyl ring of the phthalazine ring and residue Asp1046; between the piperazine ring and residue Phe1047; and two more between the phenylphthalazine ring and residues Gly922 and Leu840.
The in silico studies on the synthesized analogs indicated that introducing a bulky lipophilic group on the aryl moiety of the 1-piperazinylphthalazine derivatives is necessary for improving the activity of the series.



Figure 6.4: Structure of 13a.

Figure 6.5: (A) Docking model of the best-scored compound 13a (yellow) as a 3D generated pose overlaid on sorafenib (orange); (B) 2D interaction diagram of compound 13a with the active site (Abou-Seri et al. [36]).

6.4.3 Identification of BAW2881 as a potent VEGFR-2 inhibitor: a success story

A new series of 6-(pyrimidin-4-yloxy)-naphthalene-1-carboxamide derivatives was identified by the research group of Bold et al. [37] as potent anticancer agents that act as selective VEGFR-2 inhibitors with single-digit nanomolar potency (Figure 6.6). Computational studies were performed on the VEGFR-2 kinase in combination with the co-crystallized ligand AAL993, which has anthranilic acid as its core moiety. The binding interactions of the co-crystallized ligand with the active site, i.e., hydrogen bonding with residues C919, E885, and D1046 and a hydrophobic interaction with residue V916 (the “gatekeeper” residue), are depicted in Figure 6.7. AAL993 was also found to bind the “DFG out” inactive conformation of the VEGFR-2 kinase.



Figure 6.6: A comparison study between the former compound PTK787, co-crystal AAL993, and later generation compounds AST487 and BAW2881.

Figure 6.7: Docking of BAW2881 (yellow) against the ATP pocket VEGFR-2 kinase (PDB ID: 5EW3) bound with AAL993 (green). The dashed lines represent hydrogen bonds (Bold et al. [37]).



AST487 includes an aminopyrimidine ring in place of the pyridine core of AAL993 and is more likely to be selective toward the FLT3 enzyme than toward VEGFR-2. When molecular docking was used to study the binding interactions of AST487 with VEGFR-2, its aminopyrimidine core was found to be the main motif binding to the active site. Furthermore, pharmacophore modeling showed that replacing the anthranilic acid core of AAL993 with a naphthyl moiety does not hinder the favorable interactions needed for binding in the pocket. Based on this conclusion, BAW2881, a newer analog with a naphthyl core, was synthesized. Compared with AST487, BAW2881 was found to be more potent and more selective for VEGFR-2 over FLT3. BAW2881 was similar in activity to AAL993 in inhibiting autophosphorylation in CHO cells, with an IC50 of 4 nM. It was also more potent than the earlier molecule PTK787, which showed an IC50 of 0.034 µM. Thus, the structure-based morphing and modeling studies conducted by Bold et al. on the FLT3 inhibitor AST487 led to a new series of 6-(pyrimidin-4-yloxy)naphthalene-1-carboxamide derivatives, including BAW2881, that selectively inhibit the VEGFR-2 enzyme.
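The potency comparison above mixes units (4 nM versus 0.034 µM), so it helps to put both values on one scale. The sketch below converts IC50 values to molar units, derives pIC50 (the negative base-10 logarithm of the molar IC50), and computes the fold difference. The helper names are hypothetical, and the comparison is only indicative, since the text does not state whether both values come from the same assay.

```python
import math

# Conversion factors to mol/L for the units used in this chapter.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def to_molar(value: float, unit: str) -> float:
    """Convert an IC50 value with a unit label to mol/L."""
    return value * UNIT_TO_MOLAR[unit]


def pic50(value: float, unit: str) -> float:
    """pIC50 = -log10(IC50 in mol/L); larger means more potent."""
    return -math.log10(to_molar(value, unit))


def fold_more_potent(ic50_a_molar: float, ic50_b_molar: float) -> float:
    """How many times lower the IC50 of compound A is than that of compound B."""
    return ic50_b_molar / ic50_a_molar


baw2881 = to_molar(4, "nM")      # value quoted in the text
ptk787 = to_molar(0.034, "uM")   # value quoted in the text

print(round(pic50(4, "nM"), 2), round(pic50(0.034, "uM"), 2))
print(round(fold_more_potent(baw2881, ptk787), 1))
```

On these figures the IC50 of BAW2881 is several-fold lower than that of PTK787, which is the sense in which the text calls it more potent.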

6.4.4 Molecular modeling studies on thienopyrimidine scaffold as VEGFR-2 inhibitors

A novel series of thieno[2,3-d]pyrimidine derivatives was developed and screened for in vitro anticancer properties by Ghith et al. [38]. Among them, compounds 18a, 19a, and 20a exhibited good IC50 values of 2.5, 5.48, and 2.27 μM, respectively, making them potential leads as anticancer agents (Figure 6.8). The research group used the C-Docker protocol of Discovery Studio version 4.2 to perform molecular docking simulations and interpret the binding orientations and interactions of the target molecules with the active site of the VEGFR-2 enzyme (PDB ID: 3VHE). The urea-based part of the thienopyrimidine derivatives interacted with residues Cys919, Glu885, and Asp1046. Notably, the analogs adopted orientations and occupied a volume of the binding site similar to those of the co-crystallized ligand. Thus, the computational studies helped clarify the importance of the interactions of the urea derivatives with the active site for VEGFR-2 inhibitory activity. The pharmacokinetic properties of the compounds were predicted with Accelrys Discovery Studio 2.5. Ghith et al. compared the predicted pharmacokinetic properties of the synthesized analogs with those of the co-crystallized ligand of the protein and found improved physicochemical properties, with better aqueous solubility and drug absorption.



Figure 6.8: Representative structures of thienopyrimidine scaffold containing VEGFR-2 inhibitors.

6.4.5 Identification of new VEGFR-2 kinase inhibitors: pharmacophore modeling and virtual screening

Among the several computational chemistry techniques that advance drug discovery, virtual screening identifies hits within large databases or collections of small molecules. A hit obtained from virtual screening is likely to show good binding affinity toward a well-studied drug target, receptor, or enzyme. Lee et al. [39] applied virtual screening to a database of around 820,000 commercial compounds to identify hit candidates likely to target the VEGFR-2 kinase. Through pharmacophore modeling and molecular docking simulations, around 100 candidates were selected and tested in vitro for inhibitory action toward VEGFR-2. The results revealed that the top 10 compounds inhibit the VEGFR-2 enzyme with IC50 values ranging from 1 to 10 µM. Compound 21a, which contains a triazinoindole ring, exhibited potency with an IC50 of 1.6 µM (Figure 6.9). Thus, by applying virtual screening to a large database, Lee et al. identified a lead molecule suitable as a VEGFR-2 kinase inhibitor and confirmed it with in vitro biological tests.
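The funnel described above (a large library, a pharmacophore filter, docking-based ranking, then a short list for testing) can be sketched generically. This is not the protocol of Lee et al.: the filter, the scoring function, the compound records, and all names below are hypothetical placeholders, and the sketch assumes that a lower (more negative) docking score is better.

```python
from typing import Callable, Dict, Iterable, List

Compound = Dict[str, float]  # hypothetical record: precomputed descriptors and scores


def screening_funnel(
    library: Iterable[Compound],
    matches_pharmacophore: Callable[[Compound], bool],
    docking_score: Callable[[Compound], float],
    n_select: int,
) -> List[Compound]:
    """Filter by a pharmacophore test, rank by docking score, keep the best n."""
    matched = [c for c in library if matches_pharmacophore(c)]
    ranked = sorted(matched, key=docking_score)  # lower score ranks first
    return ranked[:n_select]


def hit_rate(n_active: int, n_tested: int) -> float:
    """Fraction of experimentally tested candidates that turned out active."""
    return n_active / n_tested


# Toy library with made-up values, for illustration only.
library = [
    {"id": 1, "h_bond_donors": 2, "score": -7.1},
    {"id": 2, "h_bond_donors": 0, "score": -9.0},
    {"id": 3, "h_bond_donors": 1, "score": -8.2},
    {"id": 4, "h_bond_donors": 3, "score": -6.0},
]
selected = screening_funnel(
    library,
    matches_pharmacophore=lambda c: c["h_bond_donors"] >= 1,
    docking_score=lambda c: c["score"],
    n_select=2,
)
print([c["id"] for c in selected])
print(hit_rate(10, 100))  # 10 active compounds out of about 100 tested, as in the text
```

The cheap filter runs first so that the more expensive docking step only sees compounds that already satisfy the pharmacophore.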

6.4.6 Molecular modeling of quinazoline containing 1,3,4-oxadiazole scaffold as VEGFR-2 inhibitor

Qiao et al. synthesized a complete series of 4-alkoxyquinazoline-1,3,4-oxadiazole derivatives and tested them against three human cancer cell lines, A549, MCF-7, and HeLa [40]. The synthesized derivatives were found to show inhibitory activities



Figure 6.9: Structure of 21a.

against the cancer cell lines. Compound 22a showed IC50 values of 0.2, 0.38, and 0.32 μM against the MCF-7, A549, and HeLa cancer cell lines, respectively. Compound 22a also inhibited the VEGFR-2 enzyme in vitro with significant potency, with an IC50 value of 2.32 nM (Figure 6.10). Docking simulations against the active binding site of VEGFR-2 (PDB ID: 4ASE) showed derivative 22a interacting with residues Lys868, Cys919, His1026, and Asp1046. The Ligand Fit Dock protocol of Discovery Studio version 3.5 was employed. It involves the graphical interface DS-CDOCKER protocol, an in silico technique based on CHARMM.

Figure 6.10: Structure of 22a.

6.4.7 Identification of covalently binding, irreversible VEGFR-2 kinase domain inhibitors

The research group of Wissner et al. [41] was able to develop potent irreversibly binding EGFR kinase inhibitors that function by binding covalently to the conserved Cys773 residue of the active site. The availability of the residue Cys773 at the active site as observed from the X-ray crystal structure of the VEGFR-2 catalytic domain responsible



Figure 6.11: Development of quinazolinylamino-benzoquinone derivative (24a) by selecting the quinazoline ring containing VEGFR-2 inhibitor (ZD-4190, 23a).

for the covalent binding was used as the basis for designing newer binding inhibitors for the enzyme. They were able to synthesize a series of 2-(quinazolin-4-ylamino)-[1,4]-benzoquinone derivatives by selecting the quinazoline ring as the main moiety for the covalent binding with the VEGFR enzyme (Figure 6.11). They performed docking studies on ZD-4190, a VEGFR-2 inhibitor reported by the Zeneca group. The interaction diagram of the docked inhibitor shows the orientation required at the active site of the enzyme for binding to the kinase (Figure 6.12). The 4-anilino group of the inhibitor was located near the Cys1045 residue. Thus, Wissner et al. replaced the 4-anilino group on the quinazoline core to give compound 24a, with the aim of resembling this orientation and obtaining the same interaction with the cysteine residue.

Figure 6.12: Binding model for ZD-4190 (A) and 24a (B) at the active site of VEGFR-2 (Wissner et al. [41]).



6.4.8 Molecular docking study of novel N-(2-carbamoyl-6-methoxyphenyl)-3,4,5-trimethoxybenzamide derivative as VEGFR-2 tyrosine kinase inhibitor

In the same manner, Altamimi et al. reported compound 25a as a potent VEGFR-2 inhibitor on the basis of in vitro and computational studies [42]. The research group synthesized a series of 8-methoxy-2-methoxyphenyl-3-quinazoline-4(3)-one analogs and screened them against several cancer cell lines; the most potent one inhibited VEGFR-2 with an IC50 of 98.1 nM (Figure 6.13). Using the AutoDock Vina program, the docking energy score of this derivative against the ATP binding site (PDB ID: 4ASD) was −7.3 kcal/mol. Compound 25a exhibited hydrophobic interactions with residue L1019 and two hydrogen bonds: one between the nitrogen atom of the benzamide moiety and E885 of the αC helix, and the other between the oxygen atom of the benzamide group and residue D1046 of the DFG region of the enzyme. Molecular dynamics simulations of about 20 ns were also performed with GROMACS 2018.1. The simulations indicate stable binding of the compound, attributed to the hydrogen bonds formed with the residues in the binding pocket of the VEGFR-2 enzyme.
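To give a docking score such as −7.3 kcal/mol a more intuitive scale, it is sometimes read as a binding free energy and converted to a dissociation constant through ΔG = RT ln Kd. The sketch below does this under two loud assumptions: that the score can be treated as a free energy at all (docking scores are empirical estimates, so the result is a rough order of magnitude at best), and that the temperature is 298.15 K. The function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def score_to_kd(delta_g_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Estimate Kd (mol/L) from a binding free energy via Kd = exp(dG / (R*T))."""
    return math.exp(delta_g_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))


kd = score_to_kd(-7.3)  # docking score quoted in the text, treated as a free energy
print(f"estimated Kd = {kd * 1e6:.1f} uM")
```

Under these assumptions the estimate falls in the low micromolar range, whereas the measured IC50 quoted in the text is 98.1 nM. The two quantities are not the same thing, and the gap is a reminder that docking scores rank poses and compounds rather than reproduce measured potencies.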

6.5 Conclusions

The burden of cancer is increasing globally. The approach of cancer treatment by inhibiting the VEGFR-2 enzyme has already become the most promising approach for

Figure 6.13: Structure of 25a.



cancer drug therapy. In recent years, in silico computational approaches have proved their importance in drug discovery by revealing the crucial requirements a drug molecule must meet to bind its protein target and exhibit pharmacological activity. These approaches are usually faster than experimental drug discovery techniques, and they allow lead candidates to be identified cost-effectively. Once the protein structure has been elucidated, knowledge of the docking poses, molecular interactions, and binding mechanisms obtained from computational studies eases the process of drug discovery. One of the most important advantages of in silico methods is the ability to introduce changes to established molecules and to predict in silico their binding affinities toward the protein target. This chapter summarizes the importance of in silico techniques for the discovery of novel VEGFR-2 inhibitors.

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

References

1. International Agency for Research on Cancer. The global cancer observatory. [Accessed 30 July 2021].
2. Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics, 2021. CA Cancer J Clin 2021;71:7–33.
3. Baudino AT. Targeted cancer therapy: the next generation of cancer treatment. Curr Drug Discov Technol 2015;12:3–20.
4. (a) Mohammad HB, Mohd A, Rosina K, Surendar D, Khurshid A, Gulam R, et al. Enzyme targeting strategies for prevention and treatment of cancer: implications for cancer therapy. Semin Canc Biol 2019;56:1–11. (b) Jonas C, Egle Z, Amos B, Pascale G. Kinases and cancer. Cancers (Basel) 2018;10:63.
5. (a) Yamaoka T, Kusumoto S, Ando K, Ohba M, Ohmori T. Receptor tyrosine kinase-targeted cancer therapy. Int J Mol Sci 2018;19:3491. (b) Vouri M, Hafizi S. TAM receptor tyrosine kinases in cancer drug resistance. Cancer Res 2017;77:2775–8.
6. Madhusudan S, Ganesan TS. Tyrosine kinase inhibitors in cancer therapy. Clin Biochem 2004;37:618–35.
7. Shibuya M. Vascular endothelial growth factor and its receptor system: physiological functions in angiogenesis and pathological roles in various diseases. J Biochem 2013;153:13–9.
8. Lemmon MA, Schlessinger J. Cell signaling by receptor tyrosine kinases. Cell 2010;141:1117–34.
9. Cudmore MJ, Hewett PW, Ahmad S, Wang KQ, Cai M, Al-Ani B, et al. The role of heterodimerization between VEGFR-1 and VEGFR-2 in the regulation of endothelial cell homeostasis. Nat Commun 2012;3:1–12.
10. Wang X, Bove AM, Simone G, Ma B. Molecular bases of VEGFR-2-mediated physiological function and pathological role. Front Cell Dev Biol 2020;8:1314.



11. Bahram F, Claesson-Welsh L. VEGF-mediated signal transduction in lymphatic endothelial cells. Pathophysiology 2010;17:253–61.
12. Cohen P. Protein kinases – the major drug targets of the twenty-first century? Nat Rev Drug Discov 2002;1:309–15.
13. Cohen P, Cross D, Jänne PA. Kinase drug discovery 20 years after imatinib: progress and future directions. Nat Rev Drug Discov 2021:1–19.
14. Witte ON, Dasgupta A, Baltimore D. Abelson murine leukaemia virus protein is phosphorylated in vitro to form phosphotyrosine. Nature 1980;283:826–31.
15. Veale D, Ashcroft T, Marsh C, Gibson GJ, Harris AL. Epidermal growth factor receptors in non-small cell lung cancer. Br J Cancer 1987;55:513–6.
16. Breccia M, Alimena G. Nilotinib: a second-generation tyrosine kinase inhibitor for chronic myeloid leukemia. Leuk Res 2010;34:129–34.
17. Kantarjian H, Jabbour E, Grimley J, Kirkpatrick P. Dasatinib. Nat Rev Drug Discov 2006;5:717–9.
18. Cortes JE, Muresan B, Mamolo C, Cappelleri JC, Crescenzo RJ, Su Y, et al. Matching-adjusted indirect comparison of bosutinib, dasatinib and nilotinib effect on survival and major cytogenetic response in treatment of second-line chronic phase chronic myeloid leukemia. Curr Med Res Opin 2019;35:1615–22.
19. Redaelli S, Piazza R, Rostagno R, Magistroni V, Perini P, Marega M, et al. Activity of bosutinib, dasatinib, and nilotinib against 18 imatinib-resistant BCR/ABL mutants. J Clin Oncol 2009;27:469–71.
20. Cortes JE, Gambacorti-Passerini C, Deininger MW, Mauro MJ, Charles C, Dong-Wook K, et al. Bosutinib versus imatinib for newly diagnosed chronic myeloid leukemia: results from the randomized BFORE trial. J Clin Oncol 2018;36:231–7.
21. Noronha G, Cao J, Chow CP, Elena D, Fine RM, Hood J, et al. Inhibitors of ABL and the ABL-T315I mutation. Curr Top Med Chem 2008;8:905–21.
22. O’Hare T, Zabriskie MS, Eiring AM, Deininger MW. Pushing the limits of targeted therapy in chronic myeloid leukaemia. Nat Rev Cancer 2012;12:513–26.
23. Solca F, Dahl G, Zoephel A, Gerd B, Michael S, Christian K, et al. Target binding properties and cellular activity of Afatinib (BIBW 2992), an irreversible ErbB family blocker. J Pharmacol Exp Ther 2012;343:342–50.
24. Engelman JA, Zejnullahu K, Gale C-M, Eugene L, Gonzales AJ, Takeshi S, et al. PF00299804, an irreversible pan-ERBB inhibitor, is effective in lung cancer models with EGFR and ERBB2 mutations that are resistant to gefitinib. Cancer Res 2007;67:11924–32.
25. Cross DAE, Ashton SE, Ghiorghiu S, Eberlein C, Nebhan CA, Spitzler PJ, et al. AZD9291, an irreversible EGFR TKI, overcomes T790M-mediated resistance to EGFR inhibitors in lung cancer. Cancer Discov 2014;4:1046–61.
26. Bhargava P, Robinson MO. Development of second-generation VEGFR tyrosine kinase inhibitors: current status. Curr Oncol Rep 2011;13:103–11.
27. Lee K, Jeong K-W, Lee Y, Song JY, Kim MS, Lee GS, et al. Pharmacophore modeling and virtual screening studies for new VEGFR-2 kinase inhibitors. Eur J Med Chem 2010;45:5420–7.
28. Nakamura H, Sasaki Y, Uno M, Yoshikawa T, Asano T, Ban HS, et al. Synthesis and biological evaluation of benzamides and benzamidines as selective inhibitors of VEGFR tyrosine kinases. Bioorg Med Chem Lett 2006;16:5127–31.
29. Abou-Seri SM, Eldehna WM, Ali MM, Abou El Ella DA. 1-Piperazinylphthalazines as potential VEGFR-2 inhibitors and anticancer agents: synthesis and in vitro biological evaluation. Eur J Med Chem 2016;107:165–79.
30. Sana S, Reddy VG, Bhandari S, Reddy TS, Tokala R, Sakla AP, et al. Exploration of carbamide derived pyrimidine-thioindole conjugates as potential VEGFR-2 inhibitors with anti-angiogenesis effect. Eur J Med Chem 2020;200:112457.



31. Fuh G, Li B, Crowley C, Cunningham B, Wells JA. Requirements for binding and signaling of the kinase domain receptor for vascular endothelial growth factor. J Biol Chem 1998;273:11197–204.
32. Shinkai A, Ito M, Anazawa H, Yamaguchi S, Shitara K, Shibuya M. Mapping of the sites involved in ligand association and dissociation at the extracellular domain of the kinase insert domain-containing receptor for vascular endothelial growth factor. J Biol Chem 1998;273:31283–8.
33. Modi SJ, Kulkarni VM. Vascular endothelial growth factor receptor (VEGFR-2)/KDR inhibitors: medicinal chemistry perspective. Med Drug Discov 2019;2:100009.
34. (a) McTigue M, Murray BW, Chen JH, Deng Y-L, Solowiej J, Kania RS. Molecular conformations, interactions, and properties associated with drug efficiency and clinical performance among VEGFR TK inhibitors. Proc Natl Acad Sci USA 2012;109:18281–9. (b) Oguro Y, Miyamoto N, Okada K, Takagi T, Iwata H, Awazu Y, et al. Design, synthesis, and evaluation of 5-methyl-4-phenoxy-5H-pyrrolo[3,2-d]pyrimidine derivatives: novel VEGFR2 kinase inhibitors binding to inactive kinase conformation. Bioorg Med Chem 2010;18:7260–73. (c) Hasegawa M, Nishigaki N, Washio Y, Kano K, Harris PA, Sato H, et al. Discovery of novel benzimidazoles as potent inhibitors of TIE-2 and VEGFR-2 tyrosine kinase receptors. J Med Chem 2007;50:4453–70. (d) Harris PA, Boloor A, Cheung M, Kumar R, Crosby RM, Davis-Ward RG, et al. Discovery of 5-[[4-[(2,3-dimethyl-2H-indazol-6-yl)methylamino]-2-pyrimidinyl]amino]-2-methyl-benzenesulfonamide (Pazopanib), a novel and potent vascular endothelial growth factor receptor inhibitor. J Med Chem 2008;51:4632–40. (e) Weiss MM, Harmange JC, Polverino AJ, Bauer D, Berry L, Berry V, et al. Evaluation of a series of naphthamides as potent, orally active vascular endothelial growth factor receptor-2 tyrosine kinase inhibitors. J Med Chem 2008;51:1668–80. (f) Potashman MH, Bready J, Coxon A, Demelfi TM, Dipietro L, Doerr N, et al. Design, synthesis, and evaluation of orally active benzimidazoles and benzoxazoles as vascular endothelial growth factor-2 receptor tyrosine kinase inhibitors. J Med Chem 2007;0:4351–73. (g) Cee VJ, Cheng AC, Romero K, Bellon S, Mohr C, Whittington DA, et al. Pyridyl-pyrimidine benzimidazole derivatives as potent, selective, and orally bioavailable inhibitors of Tie-2 kinase. Bioorg Med Chem Lett 2009;19:424–7.
35. Ahmed MF, Santali EY, El-Haggar R. Novel piperazine-chalcone hybrids and related pyrazoline analogues targeting VEGFR-2 kinase; design, synthesis, molecular docking studies, and anticancer evaluation. J Enzym Inhib Med Chem 2021;36:307–8.
36. Abou-Seri SM, Eldehna WM, Ali MM, Abou El Ella DA. 1-Piperazinylphthalazines as potential VEGFR-2 inhibitors and anticancer agents: synthesis and in vitro biological evaluation. Eur J Med Chem 2016;107:165–79.
37. Bold G, Schnell C, Furet P, McSheehy P, Josef B, Jürgen M, et al. A novel potent oral series of VEGFR2 inhibitors abrogate tumor growth by inhibiting angiogenesis. J Med Chem 2016;59:132–46.
38. Ghith A, Youssef KM, Ismail NS, Abouzid KA. Design, synthesis and molecular modeling study of certain VEGFR-2 inhibitors based on thienopyrimidine scaffold as cancer targeting agents. Bioorg Chem 10;2:111–28.
39. Lee K, Jeong K-W, Lee Y, Song JY, Kim MS, Lee GS, et al. Pharmacophore modeling and virtual screening studies for new VEGFR-2 kinase inhibitors. Eur J Med Chem 2010;45:5420–7.
40. Qiao F, Yin Y, Shen Y-N, Wang S-F, Sha S, Wu X, et al. Synthesis, molecular modeling, and biological evaluation of quinazoline derivatives containing the 1,3,4-oxadiazole scaffold as novel inhibitors of VEGFR2. RSC Adv 2015;5:19914–23.
41. Wissner A, Floyd MB, Johnson BD, Fraser H, Ingalls C, Nittoli T, et al. 2-(Quinazolin-4-ylamino)-[1,4]benzoquinones as covalent-binding, irreversible inhibitors of the kinase domain of vascular endothelial growth factor receptor-2. J Med Chem 2005;48:7560–81.
42. Altamimi AS, El-Azab AS, Abdelhamid SG, Alamri MA, Bayoumi AH, Alqahtani SM, et al. Synthesis, anticancer, screening of some novel trimethoxy quinazolines and VEGFR2, EGFR tyrosine kinase inhibitors assay; molecular docking studies. Molecules 2021;26:2992.

Varruchi Sharma, Anil Panwar, Girish Kumar Gupta and Anil K. Sharma*

7 Molecular docking and MD: mimicking the real biological process

Abstract: Computational techniques such as molecular docking and simulation play a vast and significant role in the drug discovery process. A rigid view of the binding of target and ligand is the basis of the modeling strategy. The evolution of these methods over time has opened the way to understanding the dynamic nature of binding processes. In this chapter we focus on molecular docking, along with dynamics studies, in reference to biological processes.

Keywords: biological processes; drug discovery; modeling strategy; molecular docking; simulation.

7.1 Introduction

Molecular docking is the process of fitting two molecular structures together; in simpler words, it is a technique used to predict how an enzyme interacts with one or more ligands. Molecular docking has become a significant component of the drug discovery process because of its low cost and few complications. Its ease of use has also made the tool very popular among academic communities [1]. In structure-based drug design, molecular docking is used to predict, with considerable accuracy, the conformation of ligands of varying size within the appropriate binding site of the target [1, 2]. Docking algorithms work mainly on quantitative predictions of binding energetics and provide the docked receptor–ligand complexes in ranked order. The conformations of the bound state are explored in two steps: (a) recognition of the most suitable binding modes, and (b) prediction of binding conformations, which are then ranked using a scoring function [3].

*Corresponding author: Anil K. Sharma, Department of Biotechnology, Maharishi Markandeshwar (Deemed to be University), Mullana-Ambala, Haryana, 133207, India
Varruchi Sharma, Department of Biotechnology and Bioinformatics, Sri Guru Gobind Singh College Sector-26, Chandigarh, 160019, India
Anil Panwar, Department of Molecular Biology, Biotechnology and Bioinformatics, College of Basic Sciences and Humanities, CCS Haryana Agriculture University, Hisar, 125001, India
Girish Kumar Gupta, Department of Pharmaceutical Chemistry, Sri Sai College of Pharmacy, Badhani, Pathankot, Punjab, 145001, India

This article has previously been published in the journal Physical Sciences Reviews. Please cite as: V. Sharma, A. Panwar, G. K. Gupta and A. K. Sharma “Molecular docking and MD: mimicking the real biological process” Physical Sciences Reviews [Online] 2022. DOI: 10.1515/psr-2018-0164



Some of the sampling algorithms:

S. No. | Name of algorithm | Features of the algorithm
1 | Matching algorithms | Geometry-based; suitable for virtual screening (VS) and database enrichment because of its high speed
2 | MCSS | Fragment-based method for de novo design
3 | Incremental construction | Fragment-based; docks the ligand incrementally
4 | LUDI | Fragment-based method for de novo design
5 | Monte Carlo | Stochastic search
6 | Genetic algorithms | Stochastic search
7 | Molecular dynamics | Used for further refinement after docking

Search algorithms play an important role in molecular docking studies. Algorithms that evaluate the interaction between a protein and a ligand from the shapes of the molecules and their geometrical complementarity are known as shape-matching algorithms; this approach is convenient in the drug design process. Docking then proceeds with a conformational search, in which systematic methods (small stepwise variations of structural parameters) and stochastic methods modify the translational, rotational and torsional degrees of freedom of the ligand [4].

7.2 AutoDock: docking of flexible ligands to receptors AutoDock is a molecular modeling suite for computational protein–ligand docking. It estimates the interaction between protein and ligand with an empirical free-energy force field and searches for binding poses with a Lamarckian genetic algorithm, which optimizes the ligand conformation while the free energy of binding is calculated. The energies are evaluated on precalculated 3D potential grids [5]. The suite consists of two main programs: AutoGrid precalculates the grids, and AutoDock docks the ligand against them. The atomic affinity grids can also be visualized, which supports structural and functional analysis of the molecules of interest [6]. AutoDock was originally designed to characterize the catalytic properties and binding sites of proteins and DNA, but over time it has been extended to screen entire compound libraries against pharmaceutically relevant targets [7]. A docking run reports the best binding site and the best binding mode of the ligand on the receptor. Binding is characterized in terms of the optimal physical configuration and the interaction energy of the two docked molecules, so the binding energy of each pose can be evaluated readily.



Docking tasks fall into two groups: (a) protein–protein docking, which follows a lock-and-key mechanism in which both molecules are treated as rigid; and (b) protein–ligand docking, also called flexible docking, in which the interaction between receptor and ligand induces conformational changes [7, 8]. AutoDock provides two search methods: a global search and a genetic algorithm. The global search is performed by simulated annealing: the ligand starts in an initial state, which can be random or user-defined, and the temperature of the system is lowered as the run proceeds. After each move of the atoms, the energy of the new state is compared with that of the previous state; a move that lowers the energy is accepted, and a move that raises it is accepted with a probability that depends on the temperature [9]. These steps are repeated until a final solution is reached. The temperature determines the character of the annealing: a global search starts at a high temperature, whereas a local search starts in a lower temperature range [8, 9]. The genetic algorithm (GA) is based on Charles Darwin’s theory of evolution. Each conformation is represented by a set of 64 possible rotation angles for every rotatable bond in the ligand, with the dihedral angles encoded in binary (0 or 1); the conformations are ordered as 4 × 6 bits [10]. For local search, AutoDock uses the Lamarckian GA, a hybrid global–local method (GA + LS). The “Lamarckian” feature explores the local conformational space to find local minima and passes this information on to later generations, which makes the calculation better and more precise [10].
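The simulated annealing procedure described above can be illustrated with a minimal sketch. It is not AutoDock's implementation: the one-dimensional `toy_energy` landscape, the cooling schedule and all parameter values are assumptions chosen only to show the Metropolis acceptance rule, in which an uphill move of size ΔE is accepted with probability exp(−ΔE/T).

```python
import math
import random


def anneal(energy, start, step=0.5, t_start=10.0, t_end=0.01,
           cooling=0.95, moves_per_temp=50, seed=0):
    """Minimise `energy` over a single variable by simulated annealing."""
    rng = random.Random(seed)
    x, e = start, energy(start)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            x_new = x + rng.uniform(-step, step)  # random trial move
            e_new = energy(x_new)
            delta = e_new - e
            # Metropolis criterion: downhill moves are always accepted,
            # uphill moves with probability exp(-delta / T).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                x, e = x_new, e_new
                if e < best_e:
                    best_x, best_e = x, e
        t *= cooling  # lower the temperature before the next round
    return best_x, best_e


def toy_energy(x):
    # Hypothetical landscape with several local minima (not a docking score).
    return 0.1 * (x - 2.0) ** 2 + math.sin(3.0 * x)


best_x, best_e = anneal(toy_energy, start=-4.0)
print(f"best x = {best_x:.3f}, energy = {best_e:.3f}")
```

At high temperature almost every move is accepted, so the search roams across barriers; as the temperature falls the walk settles into a minimum, which mirrors the global-then-local behaviour described in the text.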

7.3 AutoDock: coordinate file preparation AutoDock uses a protein model to which polar hydrogen atoms have been added. Coordinate files are stored in PDBQT format, an extension of the PDB format that also records atomic partial charges and atom types. The force field distinguishes several atom types, for example aliphatic and aromatic carbon atoms, and polar atoms that form hydrogen bonds as opposed to those that do not. If side chains of the protein are treated as flexible, an additional PDBQT file is created for the side-chain coordinates.

7.4 AutoGrid calculation Atomic affinity potentials are precalculated for each atom type. In this procedure the protein is embedded in a 3D grid, and a probe atom is placed at each grid point, where its interaction energy with the protein is calculated. AutoGrid computes an affinity grid for every atom type in the ligand, typically carbon, oxygen, nitrogen and hydrogen.
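The idea of precalculating a probe-atom energy at every grid point can be sketched as follows. This is an illustration under stated assumptions, not AutoGrid's scoring function: the 12-6 Lennard-Jones parameters, the two protein atom positions, the small `npts` values and the 0.375 Å spacing are hypothetical, and the convention that each axis has `npts + 1` points (an even number of intervals plus the central point) reflects our understanding of AutoGrid's grid definition. The grid center is borrowed from the example in Section 7.7.

```python
import math


def lj_energy(r, epsilon=0.15, r_min=4.0):
    """12-6 Lennard-Jones energy (kcal/mol) at distance r (angstrom).

    epsilon (well depth) and r_min (position of the minimum) are
    illustrative values, not AutoGrid parameters.
    """
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)


def affinity_grid(protein_atoms, center, npts, spacing):
    """Return {(i, j, k): energy} for a probe atom at every grid point."""
    grid = {}
    for i in range(npts[0] + 1):
        for j in range(npts[1] + 1):
            for k in range(npts[2] + 1):
                # Grid points are laid out symmetrically around the center.
                point = tuple(
                    center[d] + (idx - npts[d] / 2) * spacing
                    for d, idx in enumerate((i, j, k))
                )
                energy = 0.0
                for atom in protein_atoms:
                    # Clamp very short distances to avoid huge repulsions.
                    r = max(math.dist(point, atom), 0.5)
                    energy += lj_energy(r)
                grid[(i, j, k)] = energy
    return grid


# Two hypothetical protein atoms near the grid center.
atoms = [(65.0, 69.5, 46.5), (66.0, 70.5, 47.5)]
grid = affinity_grid(atoms, center=(65.217, 69.728, 46.872),
                     npts=(4, 4, 6), spacing=0.375)
best_point = min(grid, key=grid.get)
print(len(grid), "grid points; most favourable index:", best_point)
```

Because the protein is fixed, this table is computed once per atom type; during docking the energy of a ligand atom is then obtained by looking up (and interpolating) the grid rather than summing over all protein atoms again.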

7.5 Docking performed using AutoDock Docking is performed with the Lamarckian genetic algorithm (LGA). The run is repeated a number of times to obtain multiple docked conformations, and the predicted energy and the consistency of the results across runs are analyzed to identify the best solution.
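A toy version of a Lamarckian genetic algorithm is sketched below to show the hybrid of global (GA) and local search. Everything here is an assumption made for illustration: the `energy` function (lowest when every torsion equals 60°), the population size, the mutation rate and the random hill-climbing local search are hypothetical and do not reproduce AutoDock's operators or parameters. The defining Lamarckian step is that the locally optimized torsions overwrite the individual's genes, so the improvement is inherited.

```python
import math
import random


def energy(torsions):
    # Hypothetical score: minimal (zero) when every torsion is 60 degrees.
    return sum(1.0 - math.cos(math.radians(t - 60.0)) for t in torsions)


def local_search(torsions, rng, tries=20, step=10.0):
    """Random hill-climb; returns torsions that are never worse than the input."""
    best, best_e = list(torsions), energy(torsions)
    for _ in range(tries):
        trial = [t + rng.uniform(-step, step) for t in best]
        trial_e = energy(trial)
        if trial_e < best_e:
            best, best_e = trial, trial_e
    return best


def lamarckian_ga(n_torsions=4, pop_size=20, generations=30, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(-180.0, 180.0) for _ in range(n_torsions)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Lamarckian step: the locally optimised result replaces the genes.
        pop = [local_search(ind, rng) for ind in pop]
        pop.sort(key=energy)
        parents = pop[: pop_size // 2]  # selection of the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_torsions)  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:  # occasional random mutation
                child[rng.randrange(n_torsions)] = rng.uniform(-180.0, 180.0)
            children.append(child)
        pop = parents + children
    return min(pop, key=energy)


best = lamarckian_ga()
print([round(t, 1) for t in best], round(energy(best), 4))
```

Repeating such a run with different seeds and comparing the resulting energies and conformations corresponds to the consistency check across runs mentioned in the text.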

7.6 Analysis performed using AutoDock tools AutoDockTools includes various methods for analyzing the results of docking simulations, including tools for clustering results by conformational similarity and for visualizing the interactions between ligands and proteins. The affinity potentials generated by AutoGrid can also be visualized.

7.7 AutoDock result To demonstrate the process, we ran a docking experiment with a designed ligand and mTOR as the target protein. Each run yielded a Gibbs free energy of binding and a final binding pose. A more negative binding free energy indicates more favorable binding of the molecule of interest to the target protein. The docking calculations used the genetic algorithm with local search (GALS) strategy with a grid of 40, 40, 60 points (NPTS) and a grid center at X = 65.217, Y = 69.728, Z = 46.872, followed by the default Lamarckian genetic algorithm parameters.

Table 7.1: Ligands run with AutoDock, showing conformational values, binding energies, efficiency of ligands, inhibitory constant values etc. Columns: Ligand no.; Binding energy; Ligand efficiency; Inhibitory constant.

Figure 7.1: (a) Docked state of ligand, (b) docked ligand interactions reported with Thr at 768 position.

Table 7.1 lists the conformations obtained for the ligand (only a single ligand is shown), indicating the run with the best binding energy at the target binding site. In this example, conformation 5 of ligand 1 had a binding free energy of −5.59 kcal/mol, the best among all the conformational states obtained, which makes it the most probable lead for further investigation. We also performed a pose analysis to reveal the range of bonds formed. The docked conformation with the lowest binding energy (run 5) showed significant bond formation with the desired amino acids, as shown in Figure 7.1 together with its docked state.
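The relation between the binding free energy and the inhibitory constant listed in such result tables can be made explicit with a short calculation. The sketch below assumes the standard thermodynamic relation ΔG = RT ln Ki at T = 298.15 K; the heavy-atom count used for the ligand efficiency is a hypothetical value, since the chapter does not give it.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def inhibition_constant(delta_g, temperature=298.15):
    """Ki in mol/L from a binding free energy in kcal/mol: Ki = exp(dG / RT)."""
    return math.exp(delta_g / (R_KCAL * temperature))


def ligand_efficiency(delta_g, heavy_atoms):
    """Binding free energy per non-hydrogen atom (kcal/mol per atom)."""
    return delta_g / heavy_atoms


ki = inhibition_constant(-5.59)  # best conformation reported in the text
le = ligand_efficiency(-5.59, heavy_atoms=25)  # 25 is an assumed atom count
print(f"Ki is roughly {ki * 1e6:.0f} micromolar; ligand efficiency {le:.3f}")
```

A binding free energy of −5.59 kcal/mol therefore corresponds to an inhibitory constant in the tens-of-micromolar range, which is typical of an early lead rather than an optimized inhibitor.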

Figure 7.2: Fundamental steps of MD simulations.



7.8 Molecular dynamic simulations and history Molecular dynamics (MD) is a computer simulation method, based on a mathematical model, that predicts the physical movements of atoms and molecules (Figure 7.2). The atoms and molecules are allowed to interact for a set amount of time, providing a view of the system’s dynamic “evolution.” Most commonly, the trajectories of the atoms and molecules are computed by numerically solving Newton’s equations of motion for a system of interacting particles; the forces between the particles and their potential energy are determined from interatomic potentials or molecular mechanics force fields. The approach is used in chemical physics, materials science and biophysics, among other fields. MD simulations were first carried out in 1957 by Alder and Wainwright on a hard-sphere fluid [11]; Rahman simulated the first fluid with soft interactions in 1964 [12], and Rahman and Stillinger simulated the first complex fluid (water) in 1971 [13]. The first MD simulation of a protein was reported in 1977 by McCammon, Gelin and Karplus [14].
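Numerically solving Newton's equations of motion can be illustrated with the velocity Verlet integrator, which is widely used in MD codes. The sketch below is a minimal one-dimensional example under stated assumptions: a single particle on a harmonic spring with unit mass and unit force constant (both hypothetical), chosen because its exact trajectory x(t) = cos(t) is known and the integrator can be checked against it.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equation m * a = F(x) with the velocity Verlet scheme."""
    trajectory = [x]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append(x)
    return x, v, trajectory


K_SPRING = 1.0  # hypothetical force constant
MASS = 1.0      # hypothetical mass


def spring_force(x):
    return -K_SPRING * x


def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * K_SPRING * x * x


x_end, v_end, traj = velocity_verlet(1.0, 0.0, spring_force, MASS,
                                     dt=0.01, n_steps=1000)
print(f"x(10) = {x_end:.4f}, exact = {math.cos(10.0):.4f}")
print(f"energy drift = {total_energy(x_end, v_end) - 0.5:.2e}")
```

In a real simulation the force comes from the force field and is evaluated for every atom at every step, which is why the force calculation dominates the cost of MD.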

7.9 PDB structure and the need for 3D conformation study The study of macromolecular structure is crucial to a better understanding of biology, because biological processes arise from interactions between macromolecules. The Protein Data Bank (PDB) now has over 175,000 entries, including over 150,000 proteins and 9000 protein–nucleic acid complexes. Despite their immense utility, however, the structures stored in the PDB provide only a limited view of 3D structure. Proteins and nucleic acids are flexible molecules, and their function can be influenced by their dynamics. Proteins undergo major structural changes while performing their function, and in general any complex formed by a protein entails some structural change. This can be verified by comparing a set of PDB entries that differ only in the small ligand bound to a specific protein: the overall fold is unchanged and the structural variations are minor, yet these differences are significant enough to mislead ligand-docking algorithms.

7.10 Conformational changes are a common part of an enzyme’s catalytic cycle Allosteric regulation depends entirely on a protein’s ability to exist in two or more stable conformations. Furthermore, some aspects of protein function can only be understood when dynamic aspects are included. Recent improvements in the performance of simulation algorithms, including specific efforts to enhance conformational sampling, have promoted the idea of the “conformational ensemble” as an alternative to the analysis of single PDB structures.

7.11 Overview of calculating an MD simulation In the definition of a system, the representation of the solvent is a critical issue. Several approaches have been tested, but the simplest one, explicit modelling of the solvent molecules, has proven to be the most effective, albeit at the expense of increasing the size of the simulated system. With explicit solvent, most solvation effects of the real solvent can be recovered, including those of entropic origin such as the hydrophobic effect. Once the system is assembled, the forces acting on each atom are obtained from equations called force fields, which describe the potential energy as a function of the molecular structure. Force-field equations involve many terms, but each term is simple to evaluate: springs (Hooke’s law) for bond lengths and angles, periodic functions for bond rotations, and Lennard–Jones potentials and Coulomb’s law for van der Waals and electrostatic interactions, respectively. This ensures that energy and force calculations are exceedingly quick even for very large systems. The force fields currently used in atomistic molecular simulations differ in the way they are parameterized. Although parameters are not always interchangeable and not every force field can represent every type of molecule, simulations with modern force fields usually give equivalent results. Once the forces acting on the individual atoms are known, Newton’s law of motion is used to determine accelerations and velocities and to update the atom coordinates.
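The individual force-field terms named above are simple enough to write down directly. The functions below follow one common convention (harmonic terms with a factor of one half, a cosine torsion term, a 12-6 Lennard-Jones term and a Coulomb term); actual force fields differ in such conventions and in their parameters, so the functional forms and the numerical values used in the example are assumptions for illustration only.

```python
import math

# Approximate Coulomb constant in kcal*angstrom/(mol*e^2); the exact value
# differs slightly between simulation packages.
COULOMB_CONST = 332.06


def bond_energy(r, r0, k):
    """Harmonic spring for a bond length r around its equilibrium value r0."""
    return 0.5 * k * (r - r0) ** 2


def angle_energy(theta, theta0, k):
    """Harmonic spring for a bond angle (radians)."""
    return 0.5 * k * (theta - theta0) ** 2


def torsion_energy(phi, k, n, delta):
    """Periodic term for rotation about a bond (angles in radians)."""
    return k * (1.0 + math.cos(n * phi - delta))


def lennard_jones(r, epsilon, sigma):
    """12-6 van der Waals term; the minimum of -epsilon lies at 2**(1/6) * sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q1, q2):
    """Electrostatic energy of two point charges (in units of e) at distance r."""
    return COULOMB_CONST * q1 * q2 / r


# Hypothetical parameters for a single atom pair at 3.5 angstrom.
pair_energy = lennard_jones(3.5, epsilon=0.1, sigma=3.4) + coulomb(3.5, 0.4, -0.4)
print(f"non-bonded pair energy: {pair_energy:.3f} kcal/mol")
```

The total potential energy is the sum of such terms over all bonds, angles, torsions and atom pairs, and the force on each atom is the negative gradient of that sum.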

7.12 GPU and high computation power in MD simulations When a more simplified representation of the system is adopted, substantially larger time steps are allowed, which dramatically increases the effective length of the simulation. This comes, of course, at the cost of the accuracy of the simulated ensemble. The performance of MD simulations has also been greatly increased by algorithmic developments such as fine-tuning of energy calculations, parallelization, and the use of graphics processing units (GPUs). The current generation of computers relies on parallelism and accelerators to speed up the calculation. The most prominent simulation programs (AMBER [15], CHARMM [16], GROMACS and NAMD [17]) have long supported the Message Passing Interface (MPI). MPI can considerably cut computation time when a large number of computer cores can be used simultaneously. The general technique is to distribute the simulated system among the processors in a way that enhances the locality of interactions. This scheme is known as spatial decomposition: each processor needs to simulate only a small portion of the system.

As previously stated, the introduction of accelerators, particularly GPUs, has proven to be a significant development in simulation codes. GPUs, which were originally built to handle computer graphics, have grown into general-purpose, fully programmable, high-performance processors that represent a significant advance in the ability to run atomistic MD. Most major MD codes have already been optimized for GPUs, and MD codes designed specifically for GPUs have even been created (ACEMD). At the moment, the default technique for high-throughput MD simulations is to use GPUs alone or in combination with MPI. Notably, while simulations have long been the most popular application of high-performance computing (HPC) in the life sciences, the increasing power and sophistication of GPUs is leading to wider use of personal workstations with comparable performance.

Ever since the first molecular mechanics simulations of biological molecules became feasible, scientists have wished to examine all complicated biological phenomena in silico, avoiding the immense experimental obstacles and expenses. Two intrinsic requirements must be met to do this. First, simulations must be able to reach time scales in the millisecond range or longer. Second, the computer model must precisely reproduce what is measured experimentally. Despite some recent breakthroughs, the overall perception in the field is that neither of these requirements has yet been met, and that the dream will only be realised in the far future, if at all. As increasingly powerful computers have become available, research on MD simulations has grown strongly over the last decade (Table 7.2).
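The spatial decomposition scheme described in this section can be sketched in a few lines. The example assigns atoms to rectangular domains of a periodic box, one domain per processor; the box size, the domain layout and the atom positions are hypothetical, and real MD codes add communication of boundary ("halo") atoms and dynamic load balancing, which are omitted here.

```python
from collections import defaultdict


def assign_to_domains(positions, box, domains_per_axis):
    """Map each atom index to the domain (i, j, k) whose region contains it."""
    cell = [box[d] / domains_per_axis[d] for d in range(3)]
    owners = defaultdict(list)
    for index, pos in enumerate(positions):
        key = tuple(
            # Wrap the coordinate into the periodic box, then find its cell;
            # min() guards against rounding at the upper box edge.
            min(int((pos[d] % box[d]) / cell[d]), domains_per_axis[d] - 1)
            for d in range(3)
        )
        owners[key].append(index)
    return owners


# Hypothetical 4 x 4 x 4 box split into 2 x 2 x 1 domains.
positions = [
    (0.5, 0.5, 0.5),
    (3.0, 0.5, 2.0),
    (1.0, 3.9, 1.0),
    (4.5, 1.0, 1.0),  # outside the box; wraps back to x = 0.5
]
owners = assign_to_domains(positions, box=(4.0, 4.0, 4.0),
                           domains_per_axis=(2, 2, 1))
for domain, atom_indices in sorted(owners.items()):
    print(domain, atom_indices)
```

Because atoms that are close in space end up on the same or neighbouring processors, most short-range interactions can be computed without exchanging data, which is the locality the text refers to.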

Table 7.2: Plot of the number of publications on molecular dynamics simulations research in PubMed per year (x-axis: year of publication, starting at 2010; y-axis: number of publications, 0 to 9000).



7.13 World’s fastest computer and MD simulations Scientists in Japan are using Fugaku, the world’s most powerful supercomputer at the time of writing, to develop novel personalised treatments and medication regimens. Fugaku, ranked No. 1 on the Top 500 list of the world’s most powerful supercomputers, has provided insights into the novel coronavirus, for example by assisting scientists in identifying prospective treatment drugs and by modelling the behaviour of airborne virus particles. Researchers can use Fugaku to run simulations that study the interactions of cells with possible therapeutic compounds. Such complex simulations help scientists understand how proteins interact with other molecules, and Fugaku can drastically reduce the time they take. It is a crucial tool for evaluating simulation data and identifying novel drug candidates. With molecular dynamics we can observe how a molecule moves in the body, so our understanding becomes more precise.

Steps in molecular dynamics:
1. File preparation (PDB to gmx).
2. Choose a force field.
3. Cubic box is created, dimensions are defined.
4. Biomolecule is placed at the center of the box.
5. Box is solvated.
6. Ions are added.
7. Energy minimization.
8. NVT/NPT equilibration.
9. Running MD simulations.
10. Analysis such as RMSD, RMSF and radius of gyration.

First, the PDB file is prepared for the MD simulation, and water molecules are removed. The purpose of this preparation is to generate the topology of the molecule, a position restraint file and a post-processed structure file. A force field is then selected; it contains the information that will be written to the topology. After that, a simple aqueous system is built. It is possible to simulate proteins and other molecules in different solvents, provided that good parameters are available for all species involved. The box dimensions are defined and the box is filled with water. Ions are added to neutralize the system. Before the production run, the potential energy of the system is minimized and the system is equilibrated under NVT and NPT conditions. After equilibration, the MD simulation is run. Non-bonded interactions and PME are calculated on the GPU, with only bonded forces calculated on the CPU cores. Finally, the trajectory is analyzed in terms of RMSD, RMSF, radius of gyration and a heat map of residues [18].
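The ten steps listed above map onto a sequence of GROMACS command-line calls, which the sketch below assembles as argument lists without executing them. It assumes a GROMACS installation that provides the `gmx` wrapper; every file name (for example `protein_no_water.pdb`, `ions.mdp`, `md.xtc`), the chosen force field and water model, and the 1.0 nm box margin are hypothetical placeholders. The `.mdp` parameter files must be written separately, and `genion` and the analysis tools will prompt interactively for an atom group when actually run.

```python
def gromacs_plan(pdb="protein_no_water.pdb", force_field="oplsaa", water="spce"):
    """Return the command lines (argument lists) for a basic GROMACS workflow."""

    def grompp(mdp, coords, out, restraints=None, checkpoint=None):
        cmd = ["gmx", "grompp", "-f", mdp, "-c", coords, "-p", "topol.top", "-o", out]
        if restraints:
            cmd += ["-r", restraints]   # reference coordinates for restraints
        if checkpoint:
            cmd += ["-t", checkpoint]   # continue from a previous state
        return cmd

    def mdrun(name):
        return ["gmx", "mdrun", "-deffnm", name]

    return [
        # Steps 1-2: topology generation with the chosen force field.
        ["gmx", "pdb2gmx", "-f", pdb, "-o", "processed.gro", "-p", "topol.top",
         "-ff", force_field, "-water", water],
        # Steps 3-4: cubic box with the molecule centred, 1.0 nm from the edges.
        ["gmx", "editconf", "-f", "processed.gro", "-o", "boxed.gro",
         "-c", "-d", "1.0", "-bt", "cubic"],
        # Step 5: solvation.
        ["gmx", "solvate", "-cp", "boxed.gro", "-cs", "spc216.gro",
         "-o", "solvated.gro", "-p", "topol.top"],
        # Step 6: counter-ions to neutralise the system.
        grompp("ions.mdp", "solvated.gro", "ions.tpr"),
        ["gmx", "genion", "-s", "ions.tpr", "-o", "ions.gro", "-p", "topol.top",
         "-pname", "NA", "-nname", "CL", "-neutral"],
        # Step 7: energy minimisation.
        grompp("minim.mdp", "ions.gro", "em.tpr"), mdrun("em"),
        # Step 8: NVT then NPT equilibration with position restraints.
        grompp("nvt.mdp", "em.gro", "nvt.tpr", restraints="em.gro"), mdrun("nvt"),
        grompp("npt.mdp", "nvt.gro", "npt.tpr", restraints="nvt.gro",
               checkpoint="nvt.cpt"), mdrun("npt"),
        # Step 9: production run.
        grompp("md.mdp", "npt.gro", "md.tpr", checkpoint="npt.cpt"), mdrun("md"),
        # Step 10: analysis.
        ["gmx", "rms", "-s", "md.tpr", "-f", "md.xtc", "-o", "rmsd.xvg"],
        ["gmx", "rmsf", "-s", "md.tpr", "-f", "md.xtc", "-o", "rmsf.xvg", "-res"],
        ["gmx", "gyrate", "-s", "md.tpr", "-f", "md.xtc", "-o", "gyrate.xvg"],
    ]


plan = gromacs_plan()
for command in plan:
    print(" ".join(command))
```

Keeping the plan as data rather than running it directly makes it easy to inspect or log each step before passing the lists to a process runner.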



7.14 Force field: need and selection A force field is a collection of equations and associated constants designed to reproduce the molecular geometry and selected properties of tested structures. In the context of molecular dynamics simulations of proteins, the term “force field” refers to the combination of a mathematical formula and associated parameters that describe the energy of the protein as a function of its atomic coordinates. A force field is used to describe the time evolution of bond lengths, bond angles and torsions, as well as the non-bonded van der Waals and electrostatic interactions between atoms. Different research groups develop different force fields using different levels of quantum chemical calculation. The force fields are also parameterized against different experimental data, which is why they differ from one another. For example, GROMOS force fields are often used for proteins and AMBER for nucleic acids; AMBER and CHARMM parameters cover the backbone and side-chain dihedrals of amino acids; and different force fields treat the water model in different ways. The OPLS force field is also widely used. The 43A1-S3 force field (FF) is suited to lipid bilayers. The 53a5 and 53a6 FFs are applied to calculate the solvation free energy of amino acid side chains and proteins. The 56a_CARBO4GROMACS FF is applied to carbohydrate simulations and ff99bsc0_chiOL3 to RNA.

7.15 Benefits/outcomes of MD simulations The identification of cryptic or allosteric binding sites, the augmentation of classic virtual-screening approaches, and the direct prediction of ligand binding energies are all roles that molecular dynamics simulations can play in drug discovery. Because ligand binding and the crucial macromolecular motions associated with it are microscopic processes that occur in millionths of a second, present experimental approaches make a thorough knowledge of the atomistic energetics and mechanics of binding impossible. Molecular dynamics simulations can help fill in the gaps that experimental approaches can’t.

7.16 Limitations and future prospects of MD simulations Despite these achievements, the utility of molecular dynamics simulations is still limited by two major challenges: the force fields used need to be refined, and high computational demands prevent routine simulations longer than a microsecond, resulting in insufficient sampling of conformational states in many cases. The future of computer-aided drug design looks bright, thanks to continual advancements in both computer power and algorithm design; molecular dynamics simulations are likely to play an increasingly crucial role in the development of innovative pharmacological treatments.

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

References
1. Berry M, Fielding B, Gamieldien J. Chapter 27 - Practical considerations in virtual screening and molecular docking. In: Tran QN, Arabnia H, editors. Emerging trends in computational biology, bioinformatics, and systems biology. Boston: Morgan Kaufmann; 2015:487–502.
2. Ferreira LG, Santos RD, Oliva G, Andricopulo D. Molecular docking and structure-based drug design strategies. Molecules (Basel, Switzerland) 2015;20:13384–421.
3. Kapetanovic IM. Computer-aided drug discovery and development (CADDD): in silico-chemico-biological approach. Chem Biol Interact 2008;171:165–76.
4. Yuriev E, Agostino M, Ramsland PA. Challenges and advances in computational docking: 2009 in review. J Mol Recogn 2011;24:149–64.
5. Forli S, Huey R, Pique M, Sanner M, Goodsell D, Olson AJ. Computational protein–ligand docking and virtual drug screening with the AutoDock suite. Nat Protoc 2016;11:905–19.
6. Morris GM, Huey R, Lindstrom W, Sanner MF, Belew RK, Goodsell DS, et al. AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility. J Comput Chem 2009;30:2785–91.
7. Cosconati S, Forli S, Perryman AL, Harris R, Goodsell DS, Olson AJ. Virtual screening with AutoDock: theory and practice. Exp Opin Drug Discov 2010;5:597–607.
8. Rizvi SMD, Shakil S, Haneef M. A simple click by click protocol to perform docking: AutoDock 4.2 made easy for non-bioinformaticians. EXCLI J 2013;12:831.
9. Umamaheswari M, Madeswaran A, Asokkumar K, Sivashanmugam T, Subhadradevi V, Jagannath P. Study of potential xanthine oxidase inhibitors: in silico and in vitro biological activity. Bangladesh J Pharmacol 2011;6:117–23.
10. Vistoli G, Pedretti A, Mazzolari A, Testa B. Homology modeling and metabolism prediction of human carboxylesterase-2 using docking analyses by GriDock: a parallelized tool based on AutoDock 4.0. J Comput Aided Mol Des 2010;24:771–87.
11. Alder BJ, Wainwright TE. Phase transition for a hard sphere system. J Chem Phys 1957;27:1208–9.
12. Rahman A. Correlations in the motion of atoms in liquid argon. Phys Rev 1964;136:A405.
13. Rahman A, Stillinger FH. Molecular dynamics study of liquid water. J Chem Phys 1971;55:3336–59.
14. McCammon JA, Gelin BR, Karplus M. Dynamics of folded proteins. Nature 1977;267:585–90.
15. Case D, Darden TA, Cheatham TE, Simmerling CL, Wang J, Duke RE, et al. AMBER 12. San Francisco: University of California; 2012.
16. Brooks BR, Brooks CL, MacKerell AD, Nilsson L, Petrella RJ, Roux B, et al. CHARMM: the biomolecular simulation program. J Comput Chem 2009;30:1545–614.
17. Nelson MT, Humphrey W, Gursoy A, Dalke A, Kalé LV, Skeel RD, et al. NAMD: a parallel, object-oriented molecular dynamics program. Int J Supercomput Appl High Perform Comput 1996;10:251–68.
18. Chodera JD, Mobley DL, Shirts MR, Dixon RW, Branson K, Pandey VS. Alchemical free energy methods for drug discovery: progress and challenges. Curr Opin Struct Biol 2011;21:150–60.

Babar Ali, Qazi Mohammad Sajid Jamal*, Showkat R. Mir, Saiba Shams and Mohammad Amjad Kamal

8 Molecular docking studies of tea (Thea sinensis Linn.) polyphenols inhibition pattern with Rat P-glycoprotein Abstract: Since 3000 B.C., the evergreen plant Thea sinensis (Theaceae) has been used both as a social and as a medicinal beverage. Leaves of T. sinensis contain amino acids, vitamins, caffeine, polysaccharides and polyphenols. Most of the natural medicinal actions of tea are due to the availability and abundance of polyphenols, mainly catechins. It has also been reported that some catechins are absorbed more rapidly than other compounds after oral administration of tea and could increase the bio-enhancing activities of anticancer drugs by inhibiting P-glycoprotein (P-gp). The results of the molecular docking showed that the polyphenols bind readily to the active site of P-gp. The binding energies of the compounds ranged from −11.67 to −8.36 kcal/mol, and theaflavin had the lowest binding energy (−11.67 kcal/mol). The data support the conclusion that all the selected polyphenols inhibit P-gp and may therefore enhance the bioavailability of drugs. This study may play a vital role in finding hotspots in P-gp and may eventually prove useful in designing compounds with high affinity and specificity for the protein. Keywords: molecular docking; polyphenols; Thea sinensis Linn.; P-glycoprotein.

8.1 Introduction Thea sinensis (syn. Camellia sinensis or C. thea) is the commonly known evergreen tea plant [1]. Tea leaves contain over 700 chemical components, including flavonoids, amino acids, polysaccharides, caffeine and vitamins, which are essential for human health [2]. The main polyphenols present in tea are flavanols, also known as catechins. Six catechins are most commonly found in tea: catechin (C), catechin gallate (GC), epicatechin gallate (ECG), epigallocatechin (EGC), epigallocatechin gallate (EGCG) and epicatechin (EC) [3]. Other reported compounds include theaflavins (theaflavin-3,3′-digallate, theaflavin-3-gallate and theaflavin-3′-gallate), theogallin, proanthocyanidin and thearubigins, as well as flavonols such as quercetin, kaempferol and rutin [4].

P-gp, a member of the ATP-binding cassette (ABC) superfamily, is present in the plasma membranes of cancer cells and of specific tissues [5–7]. P-gp may affect the bioavailability of various drugs by regulating their intestinal absorption, increasing their secretion into bile, and controlling their passage across the blood–brain barrier [8, 9]. Some studies have shown that P-gp is a potential source of drug interactions during intestinal absorption [10]. Studies of P-gp–drug interactions have identified a variety of therapeutically relevant P-gp inhibitors, such as verapamil and components of orange juice and grapefruit juice [11]. In recent years, further P-gp inhibitors have been reported, such as ginsenoside from Ginseng radix, Zanthoxyli fructus extract and paclitaxel from Taxus brevifolia [12, 13]. Many researchers have recently emphasized that known inhibitors increase the absorbed fraction of drug substances by restraining P-gp in intestinal, renal, liver and brain tissues [14]. Most inhibitors act by blocking the drug-binding site of P-gp, but P-gp can also be inhibited by interfering with ATP hydrolysis or by modifying the integrity of cell membrane lipids [15]. The existence of several binding sites makes it difficult to understand the exact mechanism and the structure–activity relationships of substrates and inhibitors. This raises the question of how the interaction sites of substrates and inhibitors can be distinguished if P-gp transport and inhibition are governed by the same molecular sites [16].

Molecular docking can predict the binding position and orientation of drug compounds at a protein target of known 3D structure. It therefore plays a significant role in exploring the structural basis of drug action [17, 18]. Certain polyphenols, such as catechins and their derivatives, are reported to be absorbed more quickly after oral intake. A recent study suggests that (−)-epigallocatechin gallate could enhance the bioavailability of chemotherapeutic drugs by blocking P-gp function [19]. Other natural compounds isolated from plants, such as piperine, gingerol, niaziridin, glycyrrhizin, 3′,5-dihydroxyflavone-7-O-β-D-galactouronide-4′-β-O-D-glucopyranoside, allicin, lysergol, curcumin, capsaicin, sinomenine, genistein, quercetin and naringin, as well as plant extracts from Ammannia multiflora, Aloe vera, Carum carvi and Stevia rebaudiana, have shown bio-enhancing activity when given together with, or before, various drug compounds and nutraceuticals [20, 21]. However, these studies did not establish the exact mechanism of action. There is increasing interest in investigating the molecular mechanisms by which natural compounds present in tea enhance the bioavailability of drugs through P-gp. We therefore used docking analysis to understand the mechanism and possible modes of action by which tea polyphenols inhibit the multidrug resistance transporter P-gp and increase the bioavailability of drugs.

*Corresponding author: Qazi Mohammad Sajid Jamal, Department of Health Informatics, College of Public Health and Health Informatics, Qassim University, Al Bukayriyah, Saudi Arabia; and Novel Global Community Educational Foundation, Hebersham, Australia, E-mail: [email protected]
Babar Ali, College of Pharmacy and Dentistry, Buraydah Colleges, Buraydah, Al-Qassim, Kingdom of Saudi Arabia
Showkat R. Mir, Department of Pharmacognosy and Phytochemistry, Faculty of Pharmacy, Jamia Hamdard (Hamdard University), New Delhi 110062, India
Saiba Shams, Siddhartha Institute of Pharmacy, Dehra Dun 248001, Uttarakhand, India
Mohammad Amjad Kamal, King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia; West China School of Nursing / Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu 610041, Sichuan, China; and Enzymoics, 7 Peterlee Place, Hebersham, NSW 2770, Novel Global Community Educational Foundation, Australia
This article has previously been published in the journal Physical Sciences Reviews. Please cite as: B. Ali, Q. M. S. Jamal, S. R. Mir, S. Shams and M. A. Kamal “Molecular docking studies of tea (Thea sinensis Linn.) polyphenols inhibition pattern with Rat P-glycoprotein” Physical Sciences Reviews [Online] 2020. DOI: 10.1515/psr-2018-0165 | 9783110493955-008

8.2 Materials and methods 8.2.1 3D modeling of Rat P-gp receptor The 3D structure of rat P-gp was not available in the Protein Data Bank, so we modeled it using homology methods [22].

8.2.2 Template search To generate the 3D structure of rat P-gp, templates were identified with the BLAST and HHblits search tools within the SWISS-MODEL template library [23]. The target sequence, P-gp of Rattus norvegicus (UniProt ID: P43245), was obtained by searching the main protein sequence databases [24]. The BLAST and HHblits searches found 137 and 3270 templates, respectively [25].

8.2.3 Template selection A multiple sequence alignment was performed to find the best template (Figure 8.1). On the basis of percentage sequence similarity, we selected PDB ID: 3G60 (ABCB1 A of Mus musculus) as the template for building the 3D model of rat P-gp.
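Ranking templates by how closely they match the target can be illustrated with a simple percent-identity calculation over an existing alignment. This is a simplified stand-in for the similarity score mentioned above, not the measure used by SWISS-MODEL: identity counts exact matches only, several definitions of the denominator exist (here, aligned positions without gaps), and the two short aligned sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over aligned, gap-free positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical aligned fragments of a target and a candidate template.
target = "MKT-AYIAK"
template = "MRTQAY-AK"
print(f"{percent_identity(target, template):.1f}% identity")
```

Applying the same function to the target aligned against each candidate template gives a simple ranking, with the highest-scoring structure chosen as the modeling template.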

Figure 8.1: Figure showing the color pattern of multiple sequence alignment with PDB ID: 3G60 (ABCB1 A of Mus musculus) as a template for the 3D model building of Rat P-gp.


8 Molecular docking studies of tea polyphenols

8.2.4 Model building ProMod-II, a modeling program incorporated in SWISS-MODEL, was used for 3D model generation following target sequence matching [26, 27].

8.2.5 Model quality estimation SWISS-MODEL used the QMEAN calculation method to analyze the complete modeled 3D structure quality [28].

8.2.6 Model validation RAMPAGE (∼rapper/rampage.php), an online resource for Ramachandran plot analysis, was used to validate the 3D structure (Figure 8.2).

8.2.7 Preparation of receptor molecule Before starting the molecular interaction analyses the 3D structures were energy minimized using Chimera version 1.10 [29–31].

Figure 8.2: A, B, C showing Z score obtained from a model quality tool from SWISS-MODEL and D showing Ramachandran plot validation of modeled Rat P-gp.



8.2.8 Ligand optimization The 2D files of the tea polyphenols catechin, epicatechin, epigallocatechin-3-gallate and theaflavin were obtained from the ChemSpider database (Table 8.1). Because 2D files cannot be used for molecular interaction analysis, we converted them to .pdb format using the Discovery Studio Visualizer tool. Gasteiger charges were then added to the ligand molecules, and the ligands' energy was minimized using Chimera version 1.10 [29].

8.2.9 Docking studies Molecular docking interaction analysis was performed with AutoDock 4.2 [32, 33]. Docking was followed by a search for the best conformation of the complex between P-gp and the natural tea components, evaluated by interaction energy. The Lamarckian Genetic Algorithm (LGA) [34] was used as the search method for docking the compounds to P-gp. Default values were set for the LGA parameters required for building the ligand–protein complex. After docking, the 10 conformations obtained for each P-gp–compound complex were evaluated on the basis of the interaction energy of the docked complex and visualized using the Discovery Studio Visualizer tool.

8.3 Results The results of the molecular interaction analysis are provided in Table 8.2. The docking results showed that the natural compounds interact with P-gp. The three catechins were found to bind in the same region with minor variation (Figure 8.3). Theaflavin had the lowest (most favorable) binding energy, about −11.67 kcal/mol, followed by epigallocatechin 3 gallate (−10.17 kcal/mol), epicatechin (−8.78 kcal/mol) and catechin (−8.36 kcal/mol) (Table 8.2). All compounds showed binding energies ranging from −11.67 to −8.36 kcal/mol. The amino acids involved in complex formation with P-gp were G202, S221, P222, I224, G225, S228, Y302, V305, Y306, Y309, I337, L338, G340 and T341 for catechin; G202, I217, L218, V220, S221, P222, I224, G225, S228, Y302, Y309, I337, L338, G340 and T341 for epicatechin; T198, F199, G202, F203, S221, I224, G225, S228, Y302, V305, Y306, Y309, I337, L338, G340, T341 and I344 for epigallocatechin-3-gallate; and S221, P222, I224, G225, L226, S228, A229, Y302, V305, Y306, Y309, L338, T341, F342, S343, I344, G345 and H346 for theaflavin. Some amino acids were common to all compounds: S221, I224, G225, S228, Y302, Y309, L338 and T341 (Figure 8.3). The active site of P-gp described in the previous study by Su et al. (2012) was considered [24]. In our study, the amino acids involved in hydrophobic interactions (G345, F342, Y302, I344, T341, S228, G225, Y306, I224 and Y309) were the same as those reported earlier (Figure 8.4) [25]. Inhibition constants were also determined for the polyphenols, which together with the energy values provide additional details. The

Table 8.1: Detailed information of polyphenols from Thea sinensis Linn. Columns: S. No.; Compounds; ChemSpider ID; 2D Structure; Chemical Formula; Average mass (Da); SMILES. (The ChemSpider ID, 2D structure, chemical formula and average mass entries are not recoverable from the source.)

1. Catechin: c1cc(c(cc1[C@@H]2[C@H](Cc3c(cc(cc3O2)O)O)O)O)O

2. Epicatechin: C1[C@H]([C@H](OC2=CC(=CC(=C21)O)O)C3=CC(=C(C=C3)O)O)O

3. Epigallocatechin-3-gallate: C1[C@H]([C@H](OC2=CC(=CC(=C21)O)O)C3=CC(=C(C(=C3)O)O)O)OC(=O)C4=CC(=C(C(=C4)O)O)O

4. Theaflavin: C1[C@H]([C@H](OC2=CC(=CC(=C21)O)O)C3=CC(=O)C(=C4C(=C3)C(=CC(=C4O)O)[C@@H]5[C@@H](CC6=C(C=C(C=C6O5)O)O)O)O)O



Table .: Docking analysis results of binding of polyphenols to Rat P-gp. Polyphenols

Hydrogen bonds information



A:TYR:HH – :UNK:O :UNK:H – A:SER:OG :UNK:H – A:THR:OG :UNK:H – A:ILE:O A:SER:HG – :UNK:O :UNK:H – A:SER:O :UNK:H – A:SER:OG :UNK:H – A:THR:OG :UNK:H – A:ILE:O :UNK:H – A:THR:O :UNK:H – A:THR:O :UNK:H – A:SER:OG





Hydrogen Amino acid residues details in hydrophobic Bonds length region (Å)

Observed binding energy

Observed inhibition constant

−. kcal/mol

. μM

. G, I, L, V, S, P, I, G, −. kcal/mol S, Y, Y, I, L, G, T .

. μM

. G, S, P, I, G, S, Y, V, Y, Y, I, L, G, T . . .

. . . . T, F, G, F, S, I, G, S, −. kcal/mol Y, V, Y, Y, I, L, G, T, . I

. μM

8.3 Results

S. No.




Table .: (continued) Polyphenols

Hydrogen bonds information



A:THR:HG – :UNK:O :UNK:H – A:PHE:O :UNK:H – A:TYR:O :UNK:H – A:PRO:O

Hydrogen Amino acid residues details in hydrophobic Bonds length region (Å)

Observed binding energy

. S, P, I, G, L, S, A, −. kcal/mol Y, V, Y, Y, L, T, F, S, . I, G, H . .

Observed inhibition constant . nM



Figure 8.3: The docked complex of Rat P-gp with polyphenols [catechin (in blue color), epicatechin (in green color), epigallocatechin 3 gallate (in magenta color), theaflavin (in purple color)] showing the amino acid residues present in the hydrophobic pocket of Rat P-gp. The Discovery Studio Visualizer tool was used to make the 3D graphical representation of the interaction.

inhibition constants for catechin, epicatechin, epigallocatechin 3 gallate and theaflavin were 15.34 μM, 12.42 μM, 14.78 μM, and 715.17 nM, respectively. Theaflavin showed the lowest inhibition constant (715.17 nM), which corresponds to the strongest predicted inhibition.
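For orientation, an inhibition constant can be estimated from a binding free energy with the standard thermodynamic relation Ki = exp(ΔG / RT). The sketch below assumes T = 298.15 K and uses a hypothetical ΔG value; the function names are ours. It is not expected to reproduce the values reported above, since those depend on which energy term and which docked conformation were used.

```python
import math

GAS_CONSTANT_KCAL = 1.9872e-3  # gas constant in kcal/(mol*K)


def inhibition_constant(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Return Ki in mol/L from a binding free energy in kcal/mol, using Ki = exp(dG / (R*T))."""
    return math.exp(delta_g_kcal / (GAS_CONSTANT_KCAL * temperature_k))


def format_ki(ki_molar: float) -> str:
    """Format Ki in micromolar or nanomolar units, whichever reads more naturally."""
    if ki_molar >= 1e-6:
        return f"{ki_molar * 1e6:.2f} uM"
    return f"{ki_molar * 1e9:.2f} nM"


example_dg = -8.0  # hypothetical binding free energy in kcal/mol
ki = inhibition_constant(example_dg)
print(format_ki(ki))
```

A more negative binding energy gives a smaller Ki, which is why the compound with the lowest energy is also expected to have the lowest inhibition constant.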

Figure 8.4: Active site interaction, visualization of selected polyphenols (in multicolor) with Rat P-gp (in gray color), 3D graphical representation was made by discovery studio visualizer.
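The residues shared by all four docked complexes can be checked with a simple set intersection. The sketch below uses the contact residues listed in the Results; the variable names are ours, and sorting by residue number is only a presentation choice.

```python
# Contact residues per polyphenol, as listed in the Results section.
contacts = {
    "catechin": "G202 S221 P222 I224 G225 S228 Y302 V305 Y306 Y309 I337 L338 G340 T341",
    "epicatechin": "G202 I217 L218 V220 S221 P222 I224 G225 S228 Y302 Y309 I337 L338 G340 T341",
    "epigallocatechin-3-gallate": (
        "T198 F199 G202 F203 S221 I224 G225 S228 Y302 V305 Y306 Y309 "
        "I337 L338 G340 T341 I344"
    ),
    "theaflavin": (
        "S221 P222 I224 G225 L226 S228 A229 Y302 V305 Y306 Y309 L338 "
        "T341 F342 S343 I344 G345 H346"
    ),
}

# Turn each space-separated string into a set of residue labels.
residue_sets = {name: set(residues.split()) for name, residues in contacts.items()}

# Residues present in every complex, ordered by residue number.
common = set.intersection(*residue_sets.values())
common_sorted = sorted(common, key=lambda label: int(label[1:]))
print(common_sorted)
```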



8.4 Discussion To explore the molecular interaction pattern of tea polyphenols with P-gp, catechin, epicatechin, epigallocatechin-3-gallate and theaflavin were docked using AutoDock 4.2 protocols based on the LGA. Because the crystal structure of rat P-gp was unavailable in the Protein Data Bank, we modeled the structure by homology modeling on the SWISS-MODEL server. The most suitable templates were searched using the BLAST program against the PDB (Protein Data Bank). We selected PDB ID: 3G60 (ABCB1 A of M. musculus) as the template for the 3D model building of Rat P-gp. Model quality was estimated with the QMEAN scoring function. The obtained QMEAN Z-score was −8.46 (Figure 8.2A, B and C). The model was further assessed with the RAMPAGE (Ramachandran plot) analysis server: 88.5% (1097) of the residues were in the favored region, 8.5% (105) in the allowed region and 3.0% (37) in the outlier region (Figure 8.2D). Identifying the active site of the receptor molecule helps researchers understand the molecular binding interactions of small molecules with the receptor. The important parameters were the binding energy, the inhibition constant and the intermolecular energy. The docking evidence suggests that all selected tea polyphenols can act as P-gp inhibitors, thereby enhancing the activity of drugs in different biological processes.
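The Ramachandran percentages quoted above follow directly from the residue counts. The short check below recomputes them; the dictionary keys are our own labels for the favored, allowed and outlier (outer) regions.

```python
# Residue counts per Ramachandran region, as reported in the Discussion.
counts = {"favored": 1097, "allowed": 105, "outlier": 37}

total = sum(counts.values())

# Percentage of residues in each region, rounded to one decimal place.
percentages = {region: round(100.0 * n / total, 1) for region, n in counts.items()}

for region, pct in percentages.items():
    print(f"{region}: {counts[region]} of {total} residues ({pct}%)")
```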

8.5 Conclusion Although it is difficult to understand the complete efflux mechanisms of P-gp, advanced screening methods and structure–activity relationship studies suggest an opportunity to improve the bioavailability and pharmacokinetics of anticancer drugs [35]. This manuscript provides details of the interactions of P-gp with polyphenols that vary in their chemical nature and affinity, which has helped in understanding the precise sites and functional groups involved in inhibitor recognition.

Abbreviations
P-gp: P-glycoprotein
μM: Micromolar
nM: Nanomolar
Kcal/mol: Kilocalorie per mole

Acknowledgment: Our team is thankful to Qassim University and Buraydah Colleges, Al Qassim, Saudi Arabia, for providing the facilities required to conduct this study.



Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission. Research funding: None declared. Conflict of interest statement: None.

References
1. Duke JA. The Handbook of medicinal herbs. Boca Raton, New York, Washington, D.C: CRC Press; 2001:93–4 p.
2. Christiane J, Edward RF. A review of latest research findings on the health promotion properties of tea. J Nutr Biochem 2001;12:404–21.
3. Chu DC, Juneja LR, Yamamoto T, Kim M. Chemistry and applications of green tea. New York: CRC Press; 1997:13–22 pp.
4. Tsung OC. All teas are not created equal: the Chinese green tea and cardiovascular health. Int J Cardiol 2006;108:301–8.
5. Ambudkar SV, Dey S, Hrycyna CA, Ramachandra M, Pastan I, Gottesman MM. Biochemical, cellular and pharmacological aspects of the multidrug transporter. Annu Rev Pharmacol Toxicol 1999;39:361–98.
6. Doige CA, Ames GF. ATP-dependent transport systems in bacteria and humans: relevance to cystic fibrosis and multidrug resistance. Ann Rev Microbiol 1993;47:291–319.
7. Gatmaitan ZC, Arias IM. Structure and function of P-glycoprotein in normal liver and small intestine. Adv Pharmacol 1993;24:77–97.
8. Preusch PC. Equilibrative and concentrative drug transport mechanisms. In: Atkinson AJ, Daniels CE, Dedrick RL, Grudzinskas CV, Markey SP, editors. Principles of clinical pharmacology. London: Academic Press; 2001:201–24 p.
9. Dietrich CG, Geier A, Oude Elferink RP. ABC of oral bioavailability: transporters as gatekeepers in the gut. Gut 2003;52:1788–95.
10. Benet LZ, Izumi T, Zhang Y, Silverman JA, Wacher VJ. Intestinal MDR transport proteins and P-450 enzymes as barriers to oral drug delivery. J Contr Release 1999;62:25–31.
11. DiMarco MP, Edwards DJ, Wainer IW, Ducharme MP. The effect of grapefruit juice and Seville orange juice on the pharmacokinetics of dextromethorphan: the role of gut CYP3A and P-glycoprotein. Life Sci 2002;71:1149–60.
12. Yoshida N, Takagi A, Kitazawa H, Kawakami J, Adachi I. Inhibition of P-glycoprotein-mediated transport by extracts of and monoterpenoids contained in Zanthoxyli Fructus. Toxicol Appl Pharmacol 2005;209:167–73.
13. Kim SW, Kwon HY, Chi DW, Shim JH, Park JD, Lee YH, et al. Reversal of P-glycoprotein-mediated multidrug resistance by ginsenoside Rg3. Biochem Pharmacol 2003;65:75–82.
14. Tatiraju DV, Bagade VB, Karambelkar PJ, Jadhav VM, Kadam V. Natural bioenhancers: an overview. J Pharmacogn Phytochem 2013;2:55–60.
15. Shapiro AB, Ling V. Effect of quercetin on Hoechst 33342 transport by purified and reconstituted P-glycoprotein. Biochem Pharmacol 1997;53:587–96.
16. Eytan GD, Regev R, Oren G, Assaraf YG. The role of passive transbilayer drug movement in multidrug resistance and its modulation. J Biol Chem 1996;271:897–902.
17. Lengauer T, Rarey M. Computational methods for biomolecular docking. Curr Opin Struct Biol 1996;6:402–6.
18. Kitchen DB, Decornez H, Furr JR, Bajorath J. Docking and scoring in virtual screening for drug discovery: methods and applications. Nat Rev Drug Discov 2004;3:935–49.
19. Julie J, Michel D, Richard B. Inhibition of the multidrug resistance P-glycoprotein activity by green tea polyphenols. Biochim Biophys Acta 2002;1542:149–59.
20. Patel HB, Dudhatra GB, Mody SK, Awale MM, Modi CM, Kumar A, et al. A comprehensive review on pharmacotherapeutics of herbal bioenhancers. Sci World J 2012;2012:33.
21. Jhanwar B, Gupta S. Biopotentiation using herbs: novel technique for poor bioavailable drugs. Int J PharmTech Res 2014;6:443–54.
22. Feng Z, Pearce L, Xu X, et al. Structural insight into tetrameric hTRPV1 from homology modeling, molecular docking, molecular dynamics simulation, virtual screening, and bioassay validations. J Chem Inf Model 2015;55:572–88.
23. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997;25:3389–402.
24. Su L, Jenardhanan P, Mruk DD, Mathur PP, Cheng YH, Mok KW, et al. Role of P-Glycoprotein at the blood-testis barrier on adjudin distribution in the Testis: a revisit of recent data. Adv Exp Med Biol 2012;763:318–33.
25. Remmert M, Biegert A, Hauser A, Soding J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods 2012;9:173–5.
26. Guex N, Peitsch MC, Schwede T. Automated comparative protein structure modeling with SWISS-MODEL and Swiss-PdbViewer: a historical perspective. Electrophoresis 2009;30:S162–73.
27. Sali A, Blundell TL. Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 1993;234:779–815.
28. Benkert P, Biasini M, Schwede T. Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics 2011;27:343–50.
29. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, et al. UCSF Chimera - a visualization system for exploratory research and analysis. J Comput Chem 2004;25:1605–12.
30. Wang J, Wang W, Kollman PA, Case DA. Automatic atom type and bond type perception in molecular mechanical calculations. J Mol Graph Model 2006;25:247–60.
31. Wang J, Wolf RM, Caldwell JW, Kollman PA, Case DA. Development and testing of general amber force field. J Comput Chem 2004;25:1157–74.
32. Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE, Belew RK, et al. Automated docking using Lamarckian genetic algorithm and an empirical binding free energy function. J Comput Chem 1998;19:1639–62.
33. Rarey M, Kramer B, Lengauer T, Klebe G. A fast flexible docking method using an incremental construction algorithm. J Mol Biol 1996;261:470–89.
34. Goodsell DS, Morris GM, Olson AJ. Automated docking of flexible ligands: applications of AutoDock. J Mol Recogn 1996;9:1–5.
35. Manthena VS, Varma A, Yasvanth A, Chinmoy S, Dey B, Ramesh P. P-glycoprotein inhibitors and their screening: a perspective from bioavailability enhancement. Pharmacol Res 2003;48:347–59.

Nermin A. Osman*

9 Statistical methods for in silico tools used for risk assessment and toxicology

Abstract: In silico toxicology is a type of toxicity assessment that uses computational methods to visualize, analyze, simulate, and predict the toxicity of chemicals. It is also one of the main steps in drug design. Animal models have long been used for toxicity testing. However, animal studies that provide the toxicological information needed are both expensive and time-consuming, and they raise ethical considerations. Many different in silico methods have been developed to characterize the toxicity of chemical materials and predict their catastrophic consequences to humans and the environment. In light of European legislation such as Registration, Evaluation, Authorization, and Restriction of Chemicals (REACH) and the Cosmetics Regulation, in silico methods for predicting chemical toxicity have become increasingly important and are used extensively worldwide, e.g., in the USA, Canada, Japan, and Australia. A common problem with these methods is the lack of the data necessary for assessing hazards. REACH has called for increased use of in silico tools for non-testing data, such as structure-activity relationships, quantitative structure-activity relationships, and read-across. The main objective of this review is to refine the use of in silico tools in the risk assessment of industrial chemicals.

Keywords: chemical compound; Evaluation; in silico tools; risk assessment; statistical modeling; toxicity.

*Corresponding author: Nermin A. Osman, Department of Biomedical Informatics and Medical Statistics, Alexandria University Medical Research Institute, 165 El-Horria Avenue, Alexandria, 21561, Egypt. This article has previously been published in the journal Physical Sciences Reviews. Please cite as: N. A. Osman “Statistical methods for in silico tools used for risk assessment and toxicology” Physical Sciences Reviews [Online] 2022. DOI: 10.1515/psr2018-0166

9.1 Background The term ‘in silico’ refers to experiments, otherwise conducted in vivo or in vitro, that are performed on a computer or through computer simulation. Christopher Langton, an American computer scientist who coined the term in silico, hypothesized a model of artificial life at a workshop at the Center for Nonlinear Studies at the Los Alamos National Laboratory in 1987 [1]. The in silico revolution promises to boost pharmaceutical and medical innovations and to provide solutions so that every person can live longer and healthier at an affordable cost. Studies carried out under the supervision of the United States Environmental Protection Agency evaluated the impact of introducing in silico methods into industrial technology. In silico is defined as “the incorporation of modern computational approaches and information technology into molecular biology to enhance the prioritization of chemicals and data requirements for risk assessment” [2]. Furthermore, Hartung and Hoffmann highlighted the importance of verifying the relevance of in silico methodologies and their implementation in industrial technology [3]. The literature revealed that hazard assessment methods are comparatively similar for drug design, industrial chemicals, and pesticides with required management plans [4–6]. Indeed, the methodologies for new chemical risk assessment to determine human toxicity endpoints often emerge from those of preclinical studies. Risk assessment and management of chemical exposure is a comprehensive, stepwise, continual approach (Figures 9.1a and 9.1b), in which the risk-benefit assessment of using a chemical substance is carefully considered. Risk management also determines the degree or level of the needed actions [7]. In toxicology, numerous industrial factors drive the use of in silico methods. Initially, the in silico tools developed by the pharmaceutical industry focused on drug design, to model pharmacodynamics, pharmacokinetics, and toxicity in biological systems. These were coupled with different kinds of statistical algorithms (e.g., physiologically based pharmacokinetic models, Quantitative Structure-Activity Relationship [QSAR], homology, and other molecular models) as well as machine learning, data mining, artificial intelligence, and network analysis tools. Evidence-based predictions calculate the relative values of the received and compiled information by using a formalized procedure or expert judgment. The values given to the available evidence are contextualized by the nature of the dataset,

Figure 9.1a: Steps of risk assessment and management.



Figure 9.1b: Outline of the risk management methodology.

quality of the information, value of the predictors, validity, and reproducibility of the results, and adequacy of the endpoints [8–11].

9.2 Risk assessment comprises four processes 9.2.1 Hazard identification Hazard identification determines the toxicological profile of chemical substances, their adverse effects, and risk contribution.

9.2.2 Exposure assessment Exposure assessment by statistical models estimates the pharmacokinetics and pharmacodynamics of chemical substances and their processing or degradation to calculate the exposure doses.

9.2.3 Effect assessment Effect assessment estimates comparable concentrations with varying results.


9 Statistical methods for in silico tools

9.2.4 Risk characterization Risk characterization aims to identify the risk significance using the information derived from exposure and effect assessments combined with the data drawn from the in silico tools. Risk significance could be expressed as exposure/effect ratios that are important in relative risk classification. This will help identify and mitigate the use of high-risk chemicals, and ultimately seek other alternatives [12].
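The exposure/effect ratio mentioned above can be illustrated with a minimal sketch. The chemical names, concentrations and the cut-off of 1.0 are hypothetical assumptions chosen for illustration; they are not values from this chapter or from any regulation.

```python
def risk_quotient(exposure: float, effect_threshold: float) -> float:
    """Ratio of an estimated exposure to the concentration at which an effect is expected."""
    if effect_threshold <= 0:
        raise ValueError("effect threshold must be positive")
    return exposure / effect_threshold


# Hypothetical (exposure, effect threshold) pairs in the same concentration unit.
chemicals = {
    "chemical_A": (0.5, 2.0),
    "chemical_B": (3.0, 1.5),
    "chemical_C": (0.9, 1.0),
}

quotients = {name: risk_quotient(*values) for name, values in chemicals.items()}

# Rank from highest to lowest relative risk.
ranked = sorted(quotients, key=quotients.get, reverse=True)

# Assumed cut-off: a ratio of 1.0 or more flags the chemical for closer scrutiny.
flagged = [name for name in ranked if quotients[name] >= 1.0]
print(ranked, flagged)
```

Ranking by the ratio gives a relative classification, so chemicals can be prioritized even when the absolute cut-off is debated.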

9.3 Risk management A risk is an uncertain event or a condition that affects the outcome. Prevention and minimization of risks could be achieved by frequent monitoring, identification, and analysis of the potential determinants and covariates that might influence the outcome.

9.3.1 In silico tools used for risk assessment The lack of data and tools that address health hazards to humans and the environment is a key barrier to growing industrial technology and has raised awareness of the need to establish risk assessment. Animal models used to conduct such assessments are expensive and tedious, and may raise ethical concerns. Therefore, introducing other evidence-based methods is essential. In silico tools are the most recent methods; they encompass Structure-Activity Relationships (SARs), QSARs, Read-Across, and expert systems. These tools create a database for risk assessment by linking chemicals and their molecular biology with statistical tools [13].

Structure-activity relationships (SARs) SAR describes the relationship between the structure of a molecule (simple or three-dimensional [3D]) and its biological activity. SAR analysis serves mainly to identify the chemical properties of the unique fragments in the molecular structure that are responsible for the observed effect. Moreover, SAR explains to what extent modifications of the chemical structure change the biological activity [14]. Through chemical synthesis, pharmaceutical companies incorporate new fragments into medicinal compounds, monitor their biological effects, and evaluate the modification. Water, air, soil, and microbiota are commonly exposed to chemical substances. Reaching an equilibrium often depends on the properties of both the media and the chemicals. Biodegradation is one of the most significant transformation processes that influence the pathways and integration of chemical substances into the surrounding ecosystem. Figure 9.2 illustrates a scheme of the steps involved in the equilibrium



Figure 9.2: Flowchart illustrates the steps of the integration and biodegradation of a chemical by a biological organism.

process. The success of a specific SAR in predicting toxicity relies on the sequence of the biodegradation reactions and on how well the selected SAR descriptor captures this phase. Depending on the sequence, a specific SAR descriptor may correlate well with one substance and poorly with another. These discrepancies are attributed to the physicochemical, thermodynamic, molecular, or quantum chemical properties of the chemical substances. Experiments revealed that related compounds tend to have significant relationships with all descriptors, whereas diverse groups of chemicals correlate only with certain descriptors. Therefore, it is important to address the discrepancy at the molecular level, because each activity (e.g., reaction and biotransformation abilities, solubility, and target behavior) depends on different molecular features. The paradox of SAR implies that not all similar molecules exhibit similar activities [15, 16]. Such qualitative relationships have facilitated the synthesis and production of new compounds that might be safe for the environment, although this requires further validation. This highlights the need for a mathematical correlation analysis, known as “QSARs”, between the structure of chemical compounds and their biological activities.

QSARs The Registration, Evaluation, Authorization, and Restriction of Chemicals (REACH) regulations recommend QSAR models. They are quite similar to the regression modeling used in both the chemical and biological sciences. QSAR models link a set of predictor (independent) variables (X) (e.g., the physicochemical properties or theoretical molecular descriptors) to the potency of the response (dependent) variable (Y). They can also predict a categorical value of the response variable (i.e., biological activity), which is similar to logistic regression. Additionally, QSAR models forecast the potency of novel chemical compounds. A QSAR has the following mathematical form [17]:

Activity = f(physicochemical properties and/or structural properties) + error

The error consists of bias error and measurable variability (i.e., the variation in the measured dataset on a correct model). QSAR modeling produces predictive tools that statistically quantify the association between the biological activity of the chemical substances under study and descriptors of their molecular structure [18]. QSAR models have wide applications in risk assessment, prediction of toxicity, and drug design. The quality of a QSAR model depends on the quality of the dataset, the precision with which the descriptors are determined, and validation of the model assumptions.
Proper QSAR modeling should ultimately yield statistically robust models that give reliable, rigorous, and precise predictions of the modeled response for novel molecules. Validation is the process of ensuring the relevance and reproducibility of a procedure for a particular objective. The following steps should be considered when submitting the model [19]:
1. Cross-validation (internal validation) should be performed to ensure that all important predictors are included.
2. Consistency should be measured by splitting the available dataset into a training set to develop the model and a prediction set to check the assumptions.
3. Generalizability should be checked by applying the model to a new external dataset.
4. Data randomization should be used to verify the lack of chance association between the modeling descriptors and the response.
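The validation steps listed above can be sketched on synthetic data. The example below is a minimal illustration, assuming a linear QSAR fitted by ordinary least squares with NumPy; the descriptors, coefficients, split sizes and helper names are all hypothetical. It reports a leave-one-out cross-validated R² (often written Q²), an R² on a held-out set, and the mean Q² after shuffling the response (data randomization).

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y = b0 + X @ b; returns coefficients with the intercept first."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def q2_leave_one_out(X, y):
    """Cross-validated R^2: each compound is predicted by a model fitted without it."""
    preds = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        preds[i] = predict(fit_ols(X[keep], y[keep]), X[i:i + 1])[0]
    return r_squared(y, preds)


rng = np.random.default_rng(0)
n = 60
X = rng.normal(size=(n, 2))  # two synthetic molecular descriptors
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.3, size=n)  # synthetic activity

train = np.arange(n) < 45  # training set
test = ~train              # held-out prediction set

coef = fit_ols(X[train], y[train])
q2 = q2_leave_one_out(X[train], y[train])            # internal validation
r2_external = r_squared(y[test], predict(coef, X[test]))  # external check
# Data randomization: shuffle the response and confirm the fit collapses.
q2_random = np.mean(
    [q2_leave_one_out(X[train], rng.permutation(y[train])) for _ in range(20)]
)
print(f"Q2={q2:.2f}  external R2={r2_external:.2f}  randomized Q2={q2_random:.2f}")
```

A model that keeps a high Q² after the response is shuffled is fitting chance correlations, which is what the randomization step is meant to expose.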



These models demonstrate good classification performance and proper prediction for new chemicals. Adding recent statistically robust software to the analysis enables 3D data visualization by applying machine learning and force field calculations. The three classification modeling techniques are Partial Least Squares Discriminant Analysis, k-Nearest Neighbors, and Support Vector Machine. Consensus analysis combines the knowledge and predictions of the three separate modeling techniques [20–23]. The consensus approach fosters the efficacy of the models by increasing the reliability of their predictions. Moreover, consensus modeling mitigates the impact of noisy data. Implementing such mathematical techniques results in better exploration of the chemical space and balances the potential biases associated with each modeling algorithm, as shown in Figure 9.3.
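One simple way to form a consensus is a majority vote over the class labels predicted by the individual models. The sketch below assumes binary labels (1 = toxic, 0 = nontoxic) and hypothetical predictions from the three techniques; majority voting is our illustrative choice and may differ from the consensus scheme used in the cited studies.

```python
import numpy as np


def consensus_vote(predictions):
    """Majority vote over binary labels; rows are models, columns are chemicals."""
    predictions = np.asarray(predictions)
    votes_for_toxic = predictions.sum(axis=0)
    # A chemical is labeled toxic when more than half of the models say so.
    return (2 * votes_for_toxic > predictions.shape[0]).astype(int)


# Hypothetical predictions for five chemicals from the three classifiers.
plsda = [1, 0, 1, 1, 0]
knn = [1, 1, 0, 1, 0]
svm = [0, 0, 1, 1, 0]

consensus = consensus_vote([plsda, knn, svm])

# Chemicals on which all three models agree can be treated as more reliable.
unanimous = [i for i in range(5) if len({plsda[i], knn[i], svm[i]}) == 1]
print(consensus.tolist(), unanimous)
```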

Figure 9.3: The three classification modeling techniques applied to the QSAR data.

Read-across Read-across, or read-across and grouping, uses relevant information from analogous substances to forecast the properties of the target compounds. It is a widely used method to fill data gaps under the REACH regulation [24, 25]. Using QSAR, one chemical or a group of chemicals with a property of interest (e.g., physicochemical characteristic, biological activity, or environmental fate) can be used to predict the same property for other related chemicals. This sets guidelines to evaluate the scientific aspects of the read-across scenario, resulting in appropriate outputs for regulatory consideration. Nevertheless, it does not answer all research questions, and experts must exercise judgment when implementing the read-across approach.

Expert opinion Figure 9.4 illustrates the systematic approach to risk evaluation based on expert opinion, according to the following steps [26–28]: (1) profiling of the molecular structures and physicochemical properties with molecular descriptors, in addition to chemical mapping (Paper I); (2) evidence-based selection of test substances (Paper II); (3) exploring the association between the chemical properties, the molecular descriptors, and the results of QSAR modeling (Paper III and IV); and (4) checking the predictive potential of the model by internal and/or external validation methods. A well-validated model can predict the effects and properties of untested substances. Holistic integration of the four tools is essential to establish a robust database for the toxicological assessment of a drug or chemical.

Figure 9.4: Systematic approach of risk evaluation based on expert opinion.
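The read-across idea described earlier in this section, predicting a property of a target from its closest analogues, can be sketched as follows. The structural features, property values and the choice of a similarity-weighted mean over the two nearest analogues are hypothetical assumptions for illustration, not a regulatory procedure.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    a, b = set(features_a), set(features_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def read_across(target_features, analogues, k=2):
    """Similarity-weighted mean of a property over the k most similar analogues."""
    scored = sorted(
        ((tanimoto(target_features, features), value) for features, value in analogues),
        reverse=True,
    )[:k]
    total_similarity = sum(similarity for similarity, _ in scored)
    if total_similarity == 0:
        raise ValueError("no analogue shares any feature with the target")
    return sum(similarity * value for similarity, value in scored) / total_similarity


# Hypothetical analogues: (structural features, measured property value).
analogues = [
    ({"OH", "aromatic", "ester"}, 2.0),
    ({"OH", "aromatic"}, 1.0),
    ({"Cl", "alkyl"}, 5.0),
]
target = {"OH", "aromatic", "ether"}

prediction = read_across(target, analogues, k=2)
print(round(prediction, 3))
```

Expert judgment still decides whether the analogues are close enough for the prediction to be defensible; the similarity score only supports that decision.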

9.3.2 Statistical methods for in silico risk assessment The statistical analyses applied to in silico models for continuous risk assessment address endpoints on ordinal and nominal scales (discontinuous, and commonly coded as simple integer values) [29, 30]. Here, the intervals between the scores along the scale are undefined in size. Toxicities recorded as ‘1 = little/none’, ‘2 = moderate’, or ‘3 = high’ are ordinal data because they form an ordered sequence, but the steps between the values are undefined. When two outcomes are recorded (e.g., nontoxic and toxic), the data are reduced to a simple categorization and described as ‘nominal’ or ‘categorical’ outcomes. This type of analysis is called “discriminant analysis”; it establishes a cut-off value beyond which a chemical is considered ‘safe’. Factor analysis can determine the factors contributing to the toxicity of a certain chemical. Factor analysis is a statistical method used to explore the data structure by testing the correlations among the selected variables. It summarizes data into a few dimensions by condensing numerous variables into a smaller set of variables/factors using methods of cluster classification or logistic regression.

Regression models The following equation is referred to as ‘logistic regression’ because it uses the Logit transform. A regression equation is set up in the usual form:

Logit(P) = constant + coefficient * descriptor

To assess how well the regression equation fits a given dataset, each chemical is taken and the Logit of its probability is calculated from its descriptor value. Since a value of zero equates to the point of exact balance, any positive value of the Logit indicates a probability of >50% of being “toxic”, whereas a negative value indicates that the chemical is more likely to be “nontoxic”. Optimization means adjusting the values in the regression equation; the fitting does not use the usual ‘least squares’ approach but a criterion called ‘maximum likelihood’ [31, 32]. The following multiple logistic regression is used to assess many descriptors simultaneously:

Logit(P) = constant + coefficient 1 * descriptor 1 + coefficient 2 * descriptor 2 ... etc.

When there are more than two ordered categories, logistic regression produces a series of equations that calculate the likelihood that a chemical falls in a given toxicological category. The first equation calculates the likelihood of membership of the lowest category, the second calculates the likelihood of belonging to either of the two lowest categories, the next does the same for membership within the three lowest, and so on. Because these predictions are cumulative, the likelihood always increases working up through the series. Before submitting the regression model, the following prerequisites should be considered [33]:
– Is it the chemical of interest?
– Is it proved to be scientifically valid and feasible?
– Is the result appropriate for the regulatory purpose?
– Are the documents collected sufficient to yield valid results?
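The Logit relationships described in this section can be illustrated numerically. The sketch below converts a Logit into a probability and builds the cumulative series for an ordinal endpoint; all constants, coefficients and descriptor values are hypothetical, and no fitting by maximum likelihood is attempted.

```python
import math


def inverse_logit(logit: float) -> float:
    """Convert a Logit value into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-logit))


def cumulative_probabilities(descriptor, cut_constants, coefficient):
    """P(category <= k) for each cut-point constant, given in increasing order."""
    return [inverse_logit(c + coefficient * descriptor) for c in cut_constants]


def category_probabilities(cumulative):
    """Probability of each individual category, obtained by differencing the cumulative series."""
    bounds = [0.0] + list(cumulative) + [1.0]
    return [upper - lower for lower, upper in zip(bounds, bounds[1:])]


# Binary case: hypothetical constant, coefficient and descriptor value.
p_toxic = inverse_logit(-2.0 + 1.5 * 2.0)  # Logit = 1.0, so the probability exceeds 0.5

# Ordinal case: three categories (little/none, moderate, high) need two cut points.
cumulative = cumulative_probabilities(descriptor=2.0, cut_constants=[-1.0, 1.0], coefficient=-0.5)
per_category = category_probabilities(cumulative)
print(round(p_toxic, 3), [round(p, 3) for p in per_category])
```

The cumulative values rise through the series because each equation includes all lower categories, which matches the behavior described above.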
The significance of a model can be determined by comparing its F-value with the critical value for the given degrees of freedom: the further the measured F-value exceeds the critical value, the more significant the model. A statistically significant model with the minimum number of predictors gives the most reliable predictions, and the predictors should preferably be uncorrelated. A pairwise post hoc analysis should be conducted to determine the statistical significance of each regression coefficient. This helps identify the specific predictors that contribute significantly to explaining the outcome variable [34].
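The comparison of an F-value with its critical value can be sketched as follows. The snippet assumes an ordinary multiple regression with an intercept, for which the overall F statistic can be computed from R2, the number of chemicals n, and the number of predictors k. The function names and the example numbers are hypothetical, and the critical value has to be looked up in F tables or statistical software for the chosen significance level.

```python
def f_statistic(r_squared, n_samples, n_predictors):
    """Overall F statistic: (R2 / k) / ((1 - R2) / (n - k - 1))."""
    df_model = n_predictors
    df_residual = n_samples - n_predictors - 1
    if df_residual <= 0:
        raise ValueError("need more chemicals than predictors + 1")
    return (r_squared / df_model) / ((1.0 - r_squared) / df_residual)

def is_significant(f_value, f_critical):
    """The model is taken as significant when F exceeds the critical value."""
    return f_value > f_critical

# Illustrative numbers: R2 = 0.8 from 30 chemicals and 4 predictors.
example_f = f_statistic(0.8, n_samples=30, n_predictors=4)

# 2.76 is the commonly tabulated 5% critical value for (4, 25) degrees
# of freedom; check it against your own tables before relying on it.
example_significant = is_significant(example_f, f_critical=2.76)
```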


9 Statistical methods for in silico tools

It is mandatory to evaluate the validity and reliability of a particular prediction with reference to a specific regulatory intent, taking into account any other available knowledge as part of a weight-of-evidence assessment. That is, the question is whether the collected data are enough to draw a regulatory inference and, if not, what strategies are required to minimize the degree of uncertainty and increase confidence in the conclusion [35]. A further consideration is the relevance of the model endpoint for the intended evidence-based purpose. For example, relevance is obvious if the model predicts the regulatory outcome directly, such as a value for acute toxicity. However, many models, especially newly generated ones, focus on predicting lower-level mechanistic endpoints (e.g., nucleophilic reactivity to DNA/proteins); in that case, further extrapolation is required to relate the model endpoint to the regulatory endpoint. This should take into account the severity of the decision and the possible consequences of reaching a wrong or irrelevant conclusion. Finally, the quantity and quality of information required depend on the data uncertainty, the severity of the regulatory decision, and the probability of errors [36]. Therefore, it is unreasonable to establish absolute criteria (common to all regulatory decision-making models) for evaluating the adequacy of a risk assessment.

One should consider the possibility of overfitting when assessing the validity of statistically based models [37]. The goodness of fit of a regression model reflects the amount of response variability explained by the predictors in the dataset, in addition to the model’s significance. The ideal model balances complexity, relevance of the applied predictors, and applicability. Moreover, it should rely on the minimum necessary available information.
Otherwise, the model might be overfitted (i.e., very complex and noisy) or underfitted (i.e., very simple and lacking essential information) [38]. Therefore, the model should not be used for predictive purposes if it is not statistically significant. In contrast, evidence-based rather than statistically based models do not account for these considerations. There are two main reasons for overfitting, namely improper selection of independent variables and the choice of modeling technique; both add model complexity without improving performance. On the one hand, improper selection of independent variables occurs by including unnecessary descriptors to capture the variance of the response, by using intercorrelated predictors (i.e., multicollinearity), or by using nonmeaningful predictors that are correlated with the response by chance [39]. On the other hand, overfitting occurs when the chosen modeling technique is more sophisticated than needed to assess the outcome descriptors, or when particular dependencies are difficult to describe. Consequently, statisticians introduced a more specific pattern of modeling, known as “classification models.”

Classification models

For a classification model, the goodness of fit is evaluated for sensitivity, specificity, accuracy, positive or negative predictive values (PPVs or NPVs), criterion, and false



positive/negative rate [40, 41]. Furthermore, for individual classification models, the relevant statistics should be considerably larger than a predefined threshold of 50%, although a lower proportion could be acceptable if the focus is on PPVs or NPVs instead of sensitivity or specificity. Receiver operating characteristic (ROC) curve analysis, which plots sensitivity versus 1 – specificity, is usually used to compare the performance of different classification models. Whether a model is overfitted can be checked with statistical software packages (e.g., Minitab and MedCalc). In contrast, a biostatistician will request sufficient evidence-based data about the feasible diagnostic rules and the underlying methods to interpret the results [42]. Apart from the statistical methods for assessing goodness of fit, which require proper knowledge and experience, the following checks can be applied based on the available information:
(a) for goodness of fit, statistical significance, and internal predictivity, statistics such as n, r2 (R2), q2 (Q2), R2adj, s, and F statistics including p values should be available;
(b) the ratio of the number of chemicals (n) to predictors should be at least 5:1 [43];
(c) the mechanistic interpretation of the model should be transparent and comprehensive, and the maximum number of predictors should be 5–6 [44];
(d) the standard error of the estimate should not be significantly under the experimental error 0.05 for the predicted endpoint.

Model relevance

The model applicability domain (AD) is a multifaceted concept that can be expressed in terms of descriptor, mechanistic, structural fragment, and metabolic domains.
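The descriptor facet of the applicability domain can be illustrated with a simple range check, sketched below under strong simplifying assumptions: the domain is taken to be the minimum-to-maximum range of each descriptor in the training set, and the descriptor names and values are hypothetical. The mechanistic, structural fragment, and metabolic domains are not covered by this sketch.

```python
def descriptor_ranges(training_set):
    """Return {descriptor: (min, max)} over all training chemicals."""
    names = training_set[0].keys()
    return {
        name: (
            min(chem[name] for chem in training_set),
            max(chem[name] for chem in training_set),
        )
        for name in names
    }

def outside_domain(chemical, ranges):
    """List the descriptors whose value falls outside the training range."""
    return [
        name
        for name, (low, high) in ranges.items()
        if not low <= chemical[name] <= high
    ]

# Hypothetical training chemicals described by two descriptors.
TRAINING_SET = [
    {"logP": 1.2, "mol_weight": 150.0},
    {"logP": 3.5, "mol_weight": 320.0},
    {"logP": 2.1, "mol_weight": 210.0},
]
RANGES = descriptor_ranges(TRAINING_SET)

inside_query = outside_domain({"logP": 2.0, "mol_weight": 200.0}, RANGES)
outside_query = outside_domain({"logP": 5.0, "mol_weight": 250.0}, RANGES)
```

An empty list means the query chemical lies within the descriptor ranges; a non-empty list names the descriptors that would make a prediction an extrapolation.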
The dependability of a prediction is assessed by whether the chemical of interest has descriptor values within the predefined ranges, contains structural fragments known to the model, acts through its predefined mode and/or mechanism of action, and is likely to undergo transformation or metabolism, as well as by the characteristics of any resulting products [45]. However, there is no standard measure of model reliability that could be used as evidence-based regulatory guidance. Model reliability should be considered a literature-based or subjective expert-based concept that depends on the context in which the model is implemented. Thus, a greater or lesser degree of reliability might be adequate for a certain regulatory application, implying that the AD should fit the regulatory context. To evaluate whether a given model is applicable to a given chemical, the following questions should be answered by “yes” [29]:
1. Does the chemical in question fit well within the scope of the model?
2. Is the predetermined domain suitable for the regulatory objective?
3. Does the model reliably predict chemicals that resemble the target chemical substance?
4. Considering other confounders, is the model estimate reasonable and relevant?
A clear scientific definition of the model domain has become crucial when tackling a research hypothesis. Data about the descriptors, metabolic or



molecular, and structural fragments domains, are scarce. The currently available models have not been customized for current regulatory needs and unavoidably include biases that depend on the context of prediction. A model may be biased toward certain categories of chemicals or a specific prediction; this may not affect the model’s validity, but it may influence its pertinence for specific purposes. Data concerning these biases can help the user confirm the suitability of the model. Model acceptability can be supported by a read-across argument that strengthens the reliability of the prediction. This can be determined from the predictive capability of the model for one or more analogs that are similar to the chemical of interest and for which measured values exist. Based on the available information, additional generic verification can determine the reasonableness of the predicted value. Judging a model’s applicability is not simple and requires specialized expertise. Software applications that generate model estimates differ in the manner and extent to which they integrate and report AD considerations.

Author contributions: The author has accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Conflict of interest statement: The author declares no conflicts of interest regarding this article.

References 1. Koruga D. Ultimate computing: biomolecular consciousness and nanotechnology. Biosystems 1988;22:83–4. 2. Breville M. US environmental protection agency tribal environmental health research program. Epidemiology 2011;22:S115. 3. Hartung T, Hoffmann S. Food for thought on in silico methods in toxicology. ALTEX 2009;36: 155–66. 4. Ekins S, Mestres J, Testa B. In silico pharmacology for drug discovery: methods for virtual ligand screening and profiling. Br J Pharmacol 2007;152:9–20. 5. Ekins S, Mestres J, Testa B. In silico pharmacology for drug discovery: applications to targets and beyond. Br J Pharmacol 2007;152:21–37. 6. Muster W, Breidenbach A, Fischer H, Kirchner S, Müller L, Pähler A. Computational toxicology in drug development. Drug Discov Today 2008;13:303–10. 7. Tennekes H. Novel approaches to chemical risk assessment. Environ Risk Assess Remediat 2017;3: S1. 8. Kortagere S, Krasowski M, Ekins S. The importance of discerning shape in molecular pharmacology. Trends Pharmacol Sci 2009;30:138–47. 9. Valerio L Jr. In silico toxicology for the pharmaceutical sciences. Toxicol Appl Pharmacol 2009;241: 356–70. 10. Merlot C. Computational toxicology—a tool for early safety evaluation. Drug Discov Today 2010;15: 16–22. 11. Worth A. The future of in silico chemical safety … and beyond. Comput Toxicol 2019;10:60–2.



12. Stenner R, Kees Van Leeuwen, Theo Vermeire (Eds.): Risk assessment of chemicals—an introduction. Environ Sci Pollut Res 2008;15:450–1. 13. Myatt G, Bower D, Cross K, Hasselgren C, Miller S, Quigley D. In silico toxicology protocols and software platforms. Toxicol Lett 2017;280:S286. 14. Ma J, Tong C, Liaw A, Sheridan R, Szumiloski J, Svetnik V. Generating hypotheses about molecular structure-activity relationships (SARs) by solving an optimization problem. Stat Anal Data Min: ASA Data Sci J 2009;2:161–74. 15. Gao G. Statistical modeling of SAR images: a survey. Sensors 2010;10:775–95. 16. Gupta-Ostermann D, Shanmugasundaram V, Bajorath J. Neighborhood-based prediction of novel active compounds from SAR matrices. J Chem Inf Model 2014;54:801–9. 17. Roy K, Kar S, Das R. Understanding the basics of QSAR for applications in pharmaceutical sciences and risk assessment, 2nd ed. Amsterdam: Academic Press, an imprint of Elsevier; 2015. 18. Gramatica P. Principles of QSAR modeling. Int J Quant Struct-Property Relat 2020;5:61–97. 19. Gramatica P. Principles of QSAR models validation: internal and external. QSAR Comb Sci 2007;26: 694–701. 20. Migut M, Worring M. Visual exploration of classification models for various data types in risk assessment. Inf Visual 2012;11:237–51. 21. Brereton R, Lloyd G. Partial least squares discriminant analysis: taking the magic away. J Chemom 2014;28:213–25. 22. Salvador-Meneses J, Ruiz-Chavez Z, Garcia-Rodriguez J. Compressed kNN: K-nearest neighbors with data compression. Entropy 2019;21:234. 23. Kumar R. Signature verification using support vector machine (SVM). Int J Sci Res Manag 2017;5: 5327–30. 24. Kovari A, Andersson N, Bell D, Cartlidge G, Fedtke N, Kojo A, et al. Read-across in REACH and the read-across assessment framework (RAAF). Toxicol Lett 2018;295:S9. 25. Benfenati E, Chaudhry Q, Gini G, Dorne J. Integrating in silico models and read-across methods for predicting toxicity of chemicals: a step-wise strategy. 
Environ Int 2019;131:105060. 26. Cherkasov A, Muratov E, Fourches D, Varnek A, Baskin I, Cronin M, et al. QSAR modeling: where have you been? Where are you going to? J Med Chem 2014;57:4977–5010. 27. Muratov E, Bajorath J, Sheridan R, Tetko I, Filimonov D, Poroikov V, et al. QSAR without borders. Chem Soc Rev 2020;49:3525–64. 28. Matthews E, Contrera J. In silico approaches to explore toxicity end points: issues and concerns for estimating human health effects. Expet Opin Drug Metabol Toxicol 2007;3:125–34. 29. Raies AB, Bajic VB. In silico toxicology: computational methods for the prediction of chemical toxicity. Wiley Interdiscip Rev Comput Mol Sci 2016;6:147–72. 30. Raunio H. In silico toxicology – non-testing methods. Front Pharmacol 2011;2:33. 31. Basilevsky A. The ratio estimator and maximum-likelihood weighted least squares regression. Qual Quantity 1980;14:377–95. 32. Fletcher J. Multiple linear regression. BMJ 2009;338:b167. 33. Nagy G. Sector based linear regression, a new robust method for the multiple linear regression. Acta Cybern 2018;23:1017–38. 34. Beran R. Prediction in random coefficient regression. J Stat Plann Inference 1995;43:205–13. 35. Stoltzfus J. Logistic regression: a brief primer. Acad Emerg Med 2011;18:1099–104. 36. Zhang Z. Residuals and regression diagnostics: focusing on logistic regression. Ann Transl Med 2016;4:195–6. 37. Rynkiewicz J. General bound of overfitting for MLP regression models. Neurocomputing 2012;90: 106–10. 38. Kumar R. Errors in use of multivariable regression analysis. Indian J Pharmacol 2015;47:571–2.



39. Senaviratna NA, Cooray T. Diagnosing multicollinearity of logistic regression model. Asian J Probab Stat 2019;2:1–9. 40. Hansen M, Cai L, Monroe S, Li Z. Limited-information goodness-of-fit testing of diagnostic classification item response models. Br J Math Stat Psychol 2016;69:225–52. 41. Kartoun U. A glimpse of the difference between predictive modeling and classification modeling. J Clin Epidemiol 2019;109:142. 42. Krupinski E. Receiver operating characteristic (ROC) analysis. Frontline Learn Res 2017;5:31–42. 43. Topliss J, Edwards R. Chance factors in studies of quantitative structure-activity relationships. J Med Chem 1979;22:1238–44. 44. Heinze G, Wallisch C, Dunkler D. Variable selection – a review and recommendations for the practicing statistician. Biom J 2018;60:431–49. 45. Cheng F, Ikenaga Y, Zhou Y, Yu Y, Li W, Shen J, et al. In silico assessment of chemical biodegradability. J Chem Inf Model 2012;52:655–69.

Maya Madhavan* and Sabeena Mustafa

10 Systems biology–the transformative approach to integrate sciences across disciplines

Systems Biology: Integrating Biological Sciences

Abstract: Life science is the study of living organisms, including bacteria, plants, and animals. Given the importance of biology, chemistry, and bioinformatics, we anticipate that this chapter may contribute to a better understanding of the interdisciplinary connections in life science. Research in applied biological sciences has changed the paradigm of basic and applied research. Biology is the study of life and living organisms, whereas science as a whole is a dynamic subject in which new fields constantly emerge as a result of ongoing research. Some fields come and go, whereas others develop into new, well-recognized entities. Chemistry is the study of the composition of matter and its properties, of how substances combine or separate, and of how substances interact with energy. Advances in biology and chemistry provide another means to understand biological systems using many interdisciplinary approaches. Bioinformatics is a multidisciplinary, or rather transdisciplinary, field that encourages the use of computer tools and methodologies for qualitative and quantitative analysis. There are many instances where the two fields of biology and chemistry intersect. In this chapter, we explain how current knowledge in biology, chemistry, and bioinformatics, as well as their various interdisciplinary domains, is merged into the life sciences, and we describe its applications in biological research.

Keywords: biosensors; biofuels; metabolic engineering; modelling; synthetic biology; systems biology.

*Corresponding author: Maya Madhavan, Department of Biochemistry, Government College for Women, Thiruvananthapuram, Kerala, India, E-mail: [email protected]
Sabeena Mustafa, Department of Biostatistics and Bioinformatics, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences, King Abdulaziz Medical City, Ministry of National Guard Health Affairs (MNGHA), Riyadh, Kingdom of Saudi Arabia
This article has previously been published in the journal Physical Sciences Reviews. Please cite as: M. Madhavan and S. Mustafa “Systems biology–the transformative approach to integrate sciences across disciplines” Physical Sciences Reviews [Online] 2022. DOI: 10.1515/psr-2021-0102

10.1 Introduction

Spectacular progress has been made in the biological sciences thanks to the efforts of scientists across various disciplines of the life sciences. Applications of newly developed tools and methods have led to the fast advancement of the biotechnology and pharma industries. The biological sciences are evolving at an exponential rate, and interdisciplinary research has become an important tool in developing suitable treatments for diseases. Interdisciplinarity has been widely accepted as an increasingly important component of the life sciences since the human genome project [1]. A majority of biological research relies on molecular biology techniques such as chromatography, polymerase chain reaction, microarrays, next generation sequencing (NGS) etc. [2]. Transdisciplinarity inspires us to ponder a unity of knowledge beyond traditionally accepted disciplines. Figure 10.1 shows an overview of the interdisciplinary connections in biological sciences.

Figure 10.1: Interdisciplinary connections in biological sciences.

Chemical biology is the most recent of the fields to emerge from the interface of chemistry and biology. In recent years, various computational biology methods such as bioinformatics and cheminformatics applications have changed the paradigm of research in the field of life sciences. Bioinformatics, computational biology, genomics, transcriptomics, proteomics, metabolomics, and other fields of life science research arose in the 1970s from the shotgun marriage of molecular biology with computer science and engineering [3]. Bioinformatics is at the crossroads of many traditional and new-generation disciplines in this new era of transdisciplinary science. Advances in bioinformatics helped the mapping of the human genome and the genomes of many other organisms [4]. Computational prediction of drug-target interactions aims to exploit these results for improved biomedical applications and the development of new drugs. The revolution in high-throughput molecular measurement systems has brought about an abundance of biological data at the molecular level, which requires innovative computational methods to process the data and answer questions. This led to the emergence of a new field called systems biology, which provides a framework for assembling and manipulating models of biological systems.
The roots of this branch can be traced back to Francois Jacob who stated that every object studied in biology consists of many systems, and a biological system consists of interconnected components which exist in mutual dependency and thus comprise a unified whole [5]. The aim of this chapter is to present information about the newly evolving discipline of systems biology which focuses on complex interactions within biological systems.



10.2 Transforming biology-insights from the systems biology approach

10.2.1 Systems and systems biology

Systems biology is the study of biological systems. The systems approach to the biological sciences has been propelled by the successful completion of the human genome project [6]. The development of systems biology has been driven mainly by three technological advances: high-throughput, automated genetic manipulation techniques; the availability of complete genomic sequences; and technologies for disrupting genes in trans. Thus, the evolution of systems biology occurred concomitantly with the rise of omics technologies such as transcriptomics, proteomics, and metabolomics. Systems biology is also known as “Integrative Biology”, with the goal of being able to predict de novo biological consequences from a given list of components [7]. It aims to explore biochemical and biological systems holistically, with the goal of transforming biology. This area of science takes a holistic approach to analyzing the complexity of biological systems, based on the understanding that the networks of a whole organism are more than the sum of their parts [8]. It is collaborative, bringing together scientists from several fields such as biology, computer science, engineering, bioinformatics, chemistry, physics, and others to forecast how complex systems will behave. Biological information has certain characteristics which enable it to be represented as networks (Figure 10.2).

Figure 10.2: Characteristics of biological information.



Systems biology uses computational methods to understand interactions and dynamics in systems. These methods enable many advanced studies, especially in systems pharmacology and precision medicine for complex diseases [9]. The terms “top–down” and “bottom–up” are widely used to describe systems biology approaches [10, 11]. Furthermore, practitioners of systems biology can be split into two types (which are not mutually exclusive): pragmatic and systems-oriented [12]. Collaboration between theory and experiment is common in systems biology. As in traditional biological experiments, the chosen model system must be suitable for experimental research as well as complicated enough to capture the actual phenomenon of interest. For various reasons, microorganisms are useful models for studying metabolic networks, gene regulatory networks, and protein–protein interaction networks in systems [13]. These reasons include: (i) the availability of sophisticated molecular biology techniques for manipulating experiments, (ii) the availability of quick and inexpensive culture processes that provide adequate material for controlled studies, (iii) the medicinal and environmental relevance of pathogenic species, and (iv) industrial applications. Escherichia coli and other simple bacteria are frequently employed as model organisms to study the organization and behavior of prokaryotic systems. The fruit fly Drosophila and the worm Caenorhabditis

Figure 10.3: Systems biology: An integrative approach [15].



elegans are also employed as models to better comprehend increasingly complex multicellular animals [14]. Metabolic pathways, signal transduction pathways, and regulatory pathways have been investigated in a number of organisms, yielding a wide range of biological findings. Systems biology is considered a multidisciplinary area of science in which biologists and chemists carry out biological studies, informaticians provide complex data analysis and synthesis, and systems modelers develop and operate in-silico models of the biological system. The three main areas of systems biology are biological experiments, data synthesis, and systems modeling (Figure 10.3). Data deconstruction and synthesis statistically define the main components and, more critically, the interactions that drive the biological function or phenotype. In the

Figure 10.4: Top–down (left) and bottom–up modeling (right) approaches in systems biology [16].



post-analysis data integration approach, the different data sets are networked together to create an overall computational model which can be calibrated and validated. Bottom–up and top–down techniques are critical in systems biology for assembling data from all levels of biological pathways that must coordinate physiological functions [17]. The former includes the creation of automated tools and the execution of mathematical models, while the latter includes data processing from the ‘omics’ level to pathways and/or specific gene levels of an organism (Figure 10.4) [18]. These techniques were illustrated by Oltvai and Barabasi in the form of a pyramid, with two levels of “organism specificity” and “universality.” They demonstrated that a cell can be approached equally from the bottom up (universality) or the top down (organism specificity), i.e., from molecules to scale-free networks or modules, or from a network’s scale-free and hierarchical nature to organism-specific modules [19]. The bottom–up technique integrates all organism-specific information into a comprehensive genome-scale model to provide an integrative perspective of the biological interactions that occur inside living systems. Experimental data and information are utilized to recreate metabolic models in the top–down approach via ‘omics’ data. In systems biology, determining the appropriate “level of description” is a constant challenge. Another problem is deciding on the model’s boundaries. The cell boundary is a useful system boundary in genome-scale investigations of microbial organisms. In most other circumstances, determining the proper system boundary is more opaque and must be specified based on existing knowledge about components and the coupling between these components. A cell is an “embedded system” in which one or more custom computers interact with physical systems [20]. 
This is in line with the characteristics of modern embedded systems, which include the following: (a) interaction with the environment, (b) liveness, (c) concurrency, (d) robustness, (e) reactivity, and (f) heterogeneity. The different modelling formalisms for embedded systems that can be applied to a cell are:
(1) Control System Modelling – This is based on control theory and uses ordinary differential equation-based models to investigate and predict the transient and steady-state behavior of physical systems.
(2) Process Modelling – This is based on algorithmic processes that proceed in steps and are independent of time elements to characterize behavior.
(3) Actor Modelling – When a distributed chemical algorithm is formed by interactions between distinct types of protein populations, process algorithms may not operate. However, the actor model uses simulation that characterizes transient processes to generate a theoretical computational framework.
Although a model of a single biological feature is useful, integrating numerous models to offer a more complete description of behavior is more useful from the standpoint of systems biology. Tools such as the Systems Biology Markup Language (SBML) and Ptolemy can aid model exchange and aggregation in this regard.
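The ordinary differential equation-based formalism in item (1) can be illustrated with a deliberately small sketch. It assumes a hypothetical one-species model in which a molecule is produced at a constant rate and degraded in proportion to its amount, dx/dt = k_prod − k_deg·x, and integrates it with the explicit Euler method. The rate constants are arbitrary illustrative values, and a real study would normally use a dedicated ODE solver.

```python
def simulate(k_prod, k_deg, x0=0.0, dt=0.01, t_end=20.0):
    """Euler integration of dx/dt = k_prod - k_deg * x.

    Returns the list of x values at t = 0, dt, 2*dt, ..., t_end.
    """
    steps = int(round(t_end / dt))
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x += dt * (k_prod - k_deg * x)
        trajectory.append(x)
    return trajectory

# Hypothetical rate constants (arbitrary units).
K_PROD = 2.0
K_DEG = 0.5

trajectory = simulate(K_PROD, K_DEG)

# Setting dx/dt = 0 gives the analytical steady state k_prod / k_deg.
steady_state = K_PROD / K_DEG
```

The early part of `trajectory` shows the transient rise from the initial state, and the final values approach `steady_state`, which is the kind of transient and steady-state behavior the control system formalism is meant to predict.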



10.2.2 Network modelling in systems biology

One of the keystones of systems biology is network modelling, in which cells are represented as a complex interplay of networks at different levels. A “network” is defined as a collection of “nodes” and “edges” that connect pairs of nodes. Molecular components are often represented as nodes in network representations of biological molecular systems, with their interactions or linkages represented as edges. Genes, proteins, metabolites, medications, and even diseases and phenotypes can be molecular components in biological networks, and connections can include direct physical contacts, metabolic coupling, and transcriptional activation. Protein–protein interaction networks, cellular signaling networks, gene regulatory networks, disease gene interaction networks, and drug interaction networks are examples of biological networks that can be built [21].

Dynamical models, also known as mechanistic models, can be thought of as mathematical translations of pathway maps [22]. The optimal mathematical form for a dynamical model is determined by the attributes of the system under investigation and the goals of the modeling effort. There are two types of dynamical systems: deterministic and stochastic. A deterministic dynamical system has a trajectory that is fully determined by the initial state and a set of parameters, whereas a stochastic dynamical model can reach several states with different probabilities even from the same initial condition. There are four steps in creating a dynamical model: (1) model design, (2) model construction, (3) model calibration, also known as model regression, and (4) model validation. Dynamic models have become a widely used tool for understanding cellular regulatory processes. However, because a comprehensive set of benchmark problems is lacking, many of these models are not tested properly in application settings [23].
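The distinction between deterministic and stochastic dynamical models can be made concrete with a small sketch. It assumes a hypothetical birth–death process (constant production, first-order degradation) that is simulated once as an Euler-integrated ODE and once with Gillespie's stochastic simulation algorithm, a standard method chosen here for illustration and not one named in the text. All rate constants are arbitrary.

```python
import random

def deterministic_final(k_prod, k_deg, n0, t_end, dt=0.001):
    """Euler integration of dn/dt = k_prod - k_deg * n; returns n(t_end)."""
    n = float(n0)
    for _ in range(int(round(t_end / dt))):
        n += dt * (k_prod - k_deg * n)
    return n

def stochastic_final(k_prod, k_deg, n0, t_end, seed):
    """Gillespie simulation of the same process; returns the copy number
    at t_end for one random realization (seeded for reproducibility)."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    while True:
        rate_prod = k_prod          # propensity of a production event
        rate_deg = k_deg * n        # propensity of a degradation event
        total = rate_prod + rate_deg
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            return n
        if rng.random() * total < rate_prod:
            n += 1
        else:
            n -= 1

# Hypothetical parameters, identical for both model types.
K_PROD, K_DEG, N0, T_END = 10.0, 0.5, 0, 20.0

deterministic_runs = [deterministic_final(K_PROD, K_DEG, N0, T_END) for _ in range(3)]
stochastic_runs = [stochastic_final(K_PROD, K_DEG, N0, T_END, seed) for seed in range(50)]
```

Every entry of `deterministic_runs` is identical because the trajectory is fixed by the initial state and the parameters, whereas `stochastic_runs` contains a spread of end states scattered around the deterministic value although every run starts from the same initial condition.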

10.2.3 From systems biology to synthetic biology

The evolution of synthetic biology has created a paradigm shift in using engineering principles to redesign existing natural biological systems for a given purpose [24]. The term synthetic biology was coined as early as 1912 by the French biophysicist Stéphane Leduc. The rapid advances in recombinant DNA technology inspired the oncologist Wacław Szybalski to popularize the new field of synthetic biology in the 1970s. Currently, synthetic biology is an interdisciplinary field with an inclusive theoretical framework which applies concepts of electronic circuitry and mechanical manufacturing to cell-free systems, organs, and tissues so that novel biological systems can be created. A few of the exciting applications of this field extend to bioremediation, bioproduction, biosensing, probiotic delivery, living therapeutics etc. Synthetic biology is a field that uses biological engineering to build a molecular understanding of biological systems in order to make or redesign microorganisms, plants, animals, and algae. It is based on the fundamental principle that biological systems



can be considered as composed of individual functional elements which can then be recombined in new configurations so as to create a new set of properties [25]. Each of these biological parts is conceived as a genetic blueprint in the DNA, which means that they can be manufactured and reassembled into a larger system by manipulating the DNA-encoded parts. Such repurposing and reengineering of existing biological parts enables the creation of biological systems which can algorithmically process information, with high-value applications such as manufacturing food [26] and fuels, creating medicines [27], and developing diagnostics [28]. Though we can say that synthetic biology is a discipline where engineering marries biology, there are some characteristics of living systems which act as the engineering principles that formed the basis for synthetic biology. They are listed below:
(1) Modularity–Modularity is an essential concept for synthetic biology which stems from the modular composition of the genome, which is composed of units such as genes, operons, epigenetic and other regulatory elements, transposons etc. Although these modules of biological systems are comparable to modular electronic devices such as switches, amplifiers etc., manufacturing and assembling biological modules into complex functional systems is challenging. The first and foremost problem is that it is not easy to characterize and comprehend these genetic elements [29]. Another problem is that transcription factors in transcriptional networks can drive too many target promoters, which can violate the concept of modularity [30]. The several types of modules identified by network analysis, such as feedback and feed-forward loops, are another bottleneck because visualization and representation of large-scale networks is a stumbling block [31]. In short, modular design of biological systems necessitates a careful study of such constraints.
(2) Stochasticity–Stochastic gene expression, also known as noise, was earlier considered one of the biggest landmarks of synthetic biology. Noise is basically generated by accumulating mutations (PubMed ID 18724274), and the stability of genetic circuits needs to be addressed in this regard. Noise dependence is also important in the way living organisms respond to changes in their environment. The development of single-cell measurement techniques such as fluorescence microscopy has permitted observation of the stochastic nature of biology [33]. Though bulk measurements of cell populations can be adversely affected by stochasticity, genetic circuits can be designed to exploit this effect using features such as noise-induced bistability, ultrasensitive response, and linearization [34].

(3) Evolvability and robustness–Directed evolution is a technology that uses the qualities of evolvable systems to accelerate evolution and identify new biological traits. It is a subset of synthetic biology that focuses on novel methodologies and applications at the pathway and genome scale [35]. Synthetic biologists have been looking for ways to improve the robustness of circuits.

(4) Analog computation–The idea of building computing and signal-processing capabilities into living cells is one of the most fascinating aspects of synthetic biology

10.2 Transforming biology-insights from the systems biology approach


[36]. Until now, the most common method for designing biological circuits has been digital computation. The toggle switch, which consists of a two-gene network in which each gene product inhibits the expression of the other, was the first and simplest to be invented. Novel designs of intelligentized gene circuits based on CRISPR technology, such as logic gates, analog computing circuits and memory devices, help to expand the gene-editing toolbox [37].

(5) Control–To interface our own synthetic systems with natural systems, we need to understand the principles of control in natural systems. The presence of complex networks of feedback and feedforward control is a distinguishing trait of living systems. Synthetic biology can be used to examine the feedback processes that are important in the emergence of complex networks.

Synthetic biology has a major role in developing methods to speed up and scale up genetic engineering. The following developments have had a large impact in this direction.

1. Abstraction of genetic functions into “parts.” There has been a focus on developing genetic components that control gene expression, such as promoters [38]. This also includes the creation of vast libraries of well-characterized parts as well as the development of biophysical/bioinformatics models to predict component behavior [39].
2. Large-scale construction technologies. Over the last decade, DNA synthesis capacity has expanded, and it is now straightforward to synthesize the 20–100 kb required for a big gene cluster [40].
3. Design automation. New computer-aided design approaches and work environments make building a genetic system and analyzing –omics datasets faster and easier [41].
4. Synthetic regulation. Logic, clocks, switches, and oscillators have all been built using genetic circuits [42].
5. Genome editing for host design. Methods like CRISPR-Cas9 can target almost any part of the genome, have been shown to operate in a number of species, and can be used to produce many genetic changes at once [43].
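The mutual repression at the heart of the toggle switch described above can be illustrated with a short simulation. The following is a minimal sketch and not the model of any work cited in this chapter: it assumes a commonly used dimensionless two-variable description in which each repressor is synthesized at a rate that falls with the level of the other and is lost at unit rate. The function names, the parameter values (alpha, n) and the forward-Euler integration are illustrative assumptions.

```python
def toggle_switch_step(u, v, alpha=10.0, n=2.0, dt=0.01):
    """Advance both repressor levels by one forward-Euler step.

    Assumed model (illustrative, dimensionless):
        du/dt = alpha / (1 + v**n) - u
        dv/dt = alpha / (1 + u**n) - v
    """
    du = alpha / (1.0 + v ** n) - u  # repressor 1: synthesis repressed by v, first-order loss
    dv = alpha / (1.0 + u ** n) - v  # repressor 2: synthesis repressed by u, first-order loss
    return u + dt * du, v + dt * dv


def simulate(u0, v0, steps=5000, **params):
    """Integrate from the initial levels (u0, v0) and return the final levels."""
    u, v = u0, v0
    for _ in range(steps):
        u, v = toggle_switch_step(u, v, **params)
    return u, v


if __name__ == "__main__":
    # Two initial conditions that differ only in which repressor starts ahead.
    print("start with more of repressor 1:", simulate(2.0, 1.0))
    print("start with more of repressor 2:", simulate(1.0, 2.0))
```

With these assumed parameters the circuit is bistable: whichever repressor starts ahead ends up dominating while the other is held low, which is the memory-like behaviour that makes the toggle switch a useful building block. Other parameter choices (for example a small alpha) may leave only a single steady state.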

10.2.4 Applications of synthetic biology

Biosensors

One of the major applications of synthetic biology lies in engineering organisms to create biosensors [44]. The ability of animals to sense danger and give warning signals has long been known and applied to environmental sensing. With the advance of genetic engineering, engineering microorganisms that carry purpose-built analyte sensors and signal output components became


Figure 10.5: Key components of a biosensor [46].

practically feasible. Biosensors are molecular sensors consisting of a genetic element such as a promoter that can create an output such as gene expression, after detecting and responding to specific targets [45]. Each biosensor is made up of two modular

parts: a sensor and a reporter. The key components of a biosensor are illustrated in Figure 10.5. The sensor is activated upon recognition of a target of interest. This is followed by production of the reporter, whose output can be measured in the form of a color change or fluorescence. The construction of a biosensor involves placing the genetic elements encoding the sensor and the reporter in a platform, such as a cell or a cell-free system, that has the machinery needed to carry out transcription and translation. The simplest whole-cell biosensors are microorganisms, either natural (e.g., Vibrio fischeri, which is naturally luminescent [47]) or engineered, that are used to identify the presence of toxic molecules through a change in the expression of a reporter protein [48]. Compact and portable whole-cell sensors mounted directly onto integrated circuits have been used to carry out bioluminescent quantification outside of the laboratory [44]. In the age of synthetic biology, advances in state-of-the-art technologies such as DNA sequencing and gene synthesis have made the development of whole-cell biosensors quick and easy.

Whole-cell biosensors offer many advantages: easy and inexpensive cultivation, the possibility of manifold assays, and no requirement for sophisticated analytical techniques. Being alive, they are self-renewing and hence do not need purification of components such as enzymes, which in turn leads to low reagent costs. Also, they can be grown in any desired quantity from even a single cell, which also means that they are portable. The sensitivity of these whole-cell biosensors is often very high because their working is based on genetic operons [49]. Yet the application of whole-cell biosensors in complex media is at a disadvantage, since it is marked by interference from other compounds in the mixture and low signal-to-noise ratios.
Based on their biomolecular make-up and mechanism, intracellular biosensors can be classified into three categories: enzyme-based, protein-based, or RNA-based. Special mention should be made of CRISPR systems, which are hybrid protein–RNA biosensors. Here, special guide RNAs (crRNAs) are used that can interfere with invading pathogenic nucleic acids. The system comprises short repetitive sequences separated by spacers and flanked by a set of cas genes that code for Cas proteins [50]. CRISPR/Cas biosensing systems are nucleic acid detection tools that can be employed in a variety of applications such as the detection of single nucleotide polymorphisms (SNPs), pathogens, cancer etc. [51]. A new class of RNA-based sensors called toehold switches, which are prokaryotic riboregulators that enable precise control of gene expression in response to environmental stimuli, has opened up revolutionary scope for diagnostic applications of sensing [52]. Toehold switches can be engineered not only into live cells but also into synthetic gene circuits on a variety of materials including paper, where they can be developed into diagnostic devices for pathogen detection. Also, fluorescence resonance energy transfer biosensors with wide applications in mechanobiology have been developed that allow the dynamics of signal transduction to be studied in live cells [53].
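The sensor-and-reporter logic described above can be made concrete with a simple dose-response calculation. The sketch below is illustrative only: it assumes that reporter output rises with analyte concentration according to a Hill function on top of a basal (leaky) signal, which is a common phenomenological choice and is not taken from the works cited here. All function names and parameter values (basal, maximal, k_half, hill) are hypothetical.

```python
def reporter_output(analyte, basal=50.0, maximal=5000.0, k_half=10.0, hill=2.0):
    """Reporter signal (arbitrary units) for a given analyte concentration.

    Assumed Hill-type response: the signal rises from `basal` towards
    `maximal`, reaching the midpoint when analyte == k_half.
    """
    if analyte < 0:
        raise ValueError("analyte concentration must be non-negative")
    activation = analyte ** hill / (k_half ** hill + analyte ** hill)
    return basal + (maximal - basal) * activation


def fold_induction(analyte, **params):
    """Signal relative to the analyte-free background (requires basal > 0)."""
    return reporter_output(analyte, **params) / reporter_output(0.0, **params)


if __name__ == "__main__":
    for conc in (0.0, 1.0, 10.0, 100.0):
        print(conc, round(reporter_output(conc), 1), round(fold_induction(conc), 2))
```

In this picture the basal term stands in for the background signal that limits the signal-to-noise ratio in complex media, while the Hill coefficient controls how sharply the reporter switches on around the half-maximal concentration.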


Biofuels

Biofuel production is an alternative approach to developing energy resources. Synthetic biology provides techniques for producing bio-based fuels from waste materials, microbes, and other sources. Jagadevan et al. published a study on how synthetic biology techniques can be used in microalgae for biofuel production (Figure 10.6) [54]. Specially engineered microorganisms and crops can serve as feedstock material in many cases. A biofuel production system depends on efficient integration of foreign genes and pathways into central metabolism. Synthetic biologists pick different organisms to engineer and choose the elements. Microbes are widely chosen since they are relatively easy to grow in the laboratory. Systems biology employs in silico models that use biological data together with various physicochemical constraints in order to identify targets and optimize the system for biotechnological approaches [55]. In short, synthetic biology offers advanced strategies to shift the bio-energy sector to the next generation of bio-based forms of energy, which are affordable, renewable, and eco-friendly.

Design of in vitro synthetic pathways

Synthetic biology applications can be broadly divided into two areas: in vitro and in vivo. In vitro synthetic biology focuses on the construction of synthetic enzymatic pathways to derive desired products. The history of in vitro synthetic biology dates back to Eduard Buchner’s paradigm-shifting discovery of cell-free ethanol fermentation by yeast lysate. The basic bottom–up design principles of in vitro synthetic pathways include (i) pathway design and reconstruction, (ii) enzyme selection, and (iii) coenzyme management. The essential point of an in vitro synthetic biology platform is the

Figure 10.6: Use of synthetic biology techniques using microalgae in biofuel production [54].

design and reconstruction of an enzymatic pathway from basic building blocks called BioBricks to building modules such as enzyme complexes. These can then form a complex synthetic pathway for the purpose of biomanufacturing [56]. Pathway design usually starts from natural metabolic pathways, with necessary modifications. Significant attention has to be paid to the selection of stable enzymes, for which three major strategies can be applied: enzyme mining, protein engineering and enzyme immobilization [57]. Synthetic biology also provides opportunities for de novo protein design, whereby protein nanostructures of different architectures and functions can be generated. Computational modelling accompanied by experimental validation and screening helps in assembling the protein secondary structure blocks. De novo protein cages can also serve as delivery containers for siRNA, enzymes, drugs etc., with potential applications in drug delivery, diagnosis etc. [58].

Bottom-up approaches have also been used by synthetic biologists to build novel genetic circuits by combining multiple genetic modules that interact with each other to perform a defined function in cells. Synthetic biology efforts have yielded a number of genetic circuits that can sense and remember environmental stimuli [59–61]. The expression of transcription factors has been kept under tight control in stem cells using genetic circuits, leading to the generation of desired lineages of stem cells. Another area where synthetic biology has great potential for personalized medicine is engineered synthetic tissues and organoids [62]. Synthetic biology also contributes to the development of genetic circuits that offer innovative approaches in precision medicine and gene therapy. Designing gene switches involves linking a sensor part that detects the ligand to an actuator part that controls gene expression [63].
A wide range of synthetic biology applications also exists in engineering DNA-based cellular recording devices, which provide a powerful framework for following intracellular and extracellular biological events.

Metabolic engineering

Metabolic engineering relies on the improvement of cellular functions by manipulation of genetic, enzymatic and regulatory processes in cells to achieve desired goals such as enhanced production of metabolites. Traditional approaches in metabolic engineering require considerable time to design and implement [64]. However, systems metabolic engineering uses mathematical models to simulate and predict behaviors emerging in complex systems, and these models can be extensively applied to improve microbial production [65]. Fine control over protein expression is a basic requirement for metabolic engineering. The expression levels of the protein-coding sequences in a genetic system are controlled by varying the sequences of genetic regulatory elements, which optimizes the function of the system. Protein expression can be controlled by modulating each rate-limiting step in gene expression. Ribosome binding sites (RBS) present in mRNA


control the accuracy and efficiency of translation initiation, and hence they are commonly mutated to optimize metabolic pathways and genetic circuits [66]. Advanced synthetic biology circuits and pathways depend on robust genetic tools such as efficient and stable promoters that can express genes of interest [67]. Promoter elements based on a bicistronic design have been developed for E. coli that exhibit much more consistent performance, regardless of the downstream gene sequence being expressed [68]. Another significant focus is the development of RNA switches (riboswitches), which are useful tools for engineering cells to perform desired functions in a ligand-responsive manner by manipulating secondary structure within an mRNA transcript [69]. A riboswitch is made up of two parts: an “aptamer”, which acts as the biosensor and is present in the 5′-UTR of the regulated mRNA, and an “expression platform”. When a ligand binds the aptamer, a conformational change is induced in the expression platform, leading to regulation of gene expression. Though intensive research has been carried out on riboswitches in prokaryotes, mammalian protein-dependent riboswitches (PDRs) have only recently been characterized, and understanding their molecular mechanism may help in designing better drugs and in reducing the side effects of drugs. The existing challenges in this area include the lack of information regarding the genome/transcriptome of organisms, the design and construction of robust bacterial chassis for the technique, and the production of toxic intermediates while modifying the metabolic pathway and metabolic flux.

Toxicology studies

Toxicology involves the study of adverse events triggered by physical, chemical or biological agents at the level of a biological system, which requires interdisciplinary approaches. Traditional toxicology approaches usually miss out on understanding the effects of chemicals on biological pathways.
Advances in genomics and the systems-oriented perspective in biology have opened up a new branch called systems toxicology. Data-rich omics technologies such as transcriptomics, proteomics and metabolomics, which employ techniques such as microarray analysis and spectrometric techniques, generate abundant quantitative data. Toxicogenomics is a rapidly evolving field in which genome studies are used not only to evaluate the response of a biological system to toxin exposure, but also to yield predictive toxicology data that can be used to predict pathogenesis, assess risk, and prevent human disease [70]. The Comparative Toxicogenomics Database is a robust, publicly available database that aims to advance understanding of how environmental exposures affect human health. Proteomic approaches can also be used to study the effect of toxicants on cellular proteins, leading to the identification of biomarkers and toxicity signatures. In the newly evolving field of toxicoproteomics, the proteome profile is investigated to identify and quantitatively evaluate the changes that occur on chemical exposure. In line with

this, there is another concept called the exposome, which takes into consideration the measure of all exposures of an individual in a lifetime and their adverse effects on human health. The basic principle behind this concept is that phenotype is the result of the interaction of genes with the environment. Other than biomarker measurements, exposure tools also include sensors, geographic information systems and conventional tools such as survey instruments. The inherent value of exposomic data in epidemiologic studies is that they can provide greater understanding of the relationships among a broad range of chemical and other risk factors and diseases, and ultimately lead to more effective and efficient prevention and control [71].

Network medicine, an extension of network biology, is an emerging area dealing with genetic and molecular interactions, network biomarkers of disease and therapeutic target discovery. Network medicine considers cells as a multilayer network composed of three independent layers: the regulatory network, the protein interaction network and the metabolic network. The basic paradigms for combining biological networks with biomedical big data are (a) a network-based approach to human disease using interactome studies, (b) identification of important genes through coexpression-based network modeling and (c) construction of phenotype-specific gene regulatory networks [72]. A significant limitation of synthetic biology from a toxicology perspective is that no synthetic structure has yet been constructed that is biologically detailed enough to represent how a toxicant or group of toxicants might interact with a whole organ system [70].

Medical applications

The application of systems biology within the sphere of present-day medical research can be defined as systems medicine, a concept dating back to 1992 [73].
The exponential development of highly advanced analytical technologies for scientific and medical research over the past years has led to the establishment of systems medicine, which uses vast amounts of clinical data generated by computer models to analyze the health of individual patients. The important applications of systems biology approaches in medicine are summarized below:

– Human Disease Networks–Though tremendous amounts of genomic data have been made available by advances in high-throughput technologies and computational approaches, very limited progress has been made in understanding the functional consequences of personal sequence variations. Systems biology approaches using network-based tools can help in distilling these large data sets into actionable knowledge about pathogenesis that enables the development of better strategies for disease management. Network medicine, one of the steps towards personalized medicine, sets the stage for exploring disease complexity at cellular and molecular levels and also for studying relationships between apparently different pathophenotypes. Goh et al. have used the collected gene-disease associations to build the first human disease network by linking diseases that share one or more disease


genes [74]. The mechanistic links between various diseases can be summarized as a diseasome, which is a comprehensive network of disease–disease relationships and clusters. The first version of the diseasome was based on the OMIM database’s collection of human diseases and disease genes [75].
– Treatment response prediction–An important area in which systems biology approaches have been applied is biomarker discovery. The integrative approach of systems biology to the analysis of large-scale omics data has been used to develop predictive markers of the effectiveness of drug and combination therapy. The subnetwork-based analysis of gene expression profiles is a more successful and reliable tool to predict disease severity and patient outcomes. Research on biomarkers using the systems medicine approach [76] has yielded many promising results in diseases such as cancer [77], neuropsychiatric conditions [78], kidney disease [79], and heart diseases [80].
– Investigation of disease mechanisms–Systems biology approaches can generate disease maps, which are conceptual maps that allow multicontextual visual representation of disease mechanisms. This is large-scale collaborative work that requires the expertise of disease expert groups (clinicians and experimental biologists) and pathway expert groups [81].
– Disease associated gene prediction–Biological networks are responsible for defining genotype–phenotype relationships. Hence a study of the network properties should enable prediction of disease-associated genes, even for uncharacterized diseases with no known molecular basis. Studying dysregulation at the biological network level can also predict new diseases associated with a given gene.

Therapeutic applications

Advances in synthetic biology have provided solutions to many biomedical challenges such as the emergence of new infectious diseases and drug resistance.
Many medical applications, such as immunotherapies using bioengineered cells (T cells with tumor receptors, chimeric antigen receptor (CAR)-T cells etc.), have also been made possible [82]. Synthetic biology is a vital component of drug discovery, and its main applications in therapeutics can be summarized as follows:

– Drug-target networks–The concept of polypharmacology has transformed drug design from “one drug one target” to “one drug multiple targets”. The rich pattern of interactions among drugs and their targets is illustrated by the analysis of drug-target networks [83]. Integrating systems biology and polypharmacology holds the promise of expanding the current opportunities to improve clinical efficacy and decrease side effects and toxicity. Advances in these areas have led to the establishment of network pharmacology, which is an emerging frontier research area in the era of artificial intelligence and big data.

– Prediction of drug-target interactions–This is one of the most important steps in the genomic drug discovery pipeline. The pharmacological profile of drugs generated by computational tools enables a good understanding of the actions of a drug and its interactions with its target [84].
– Investigations of drug adverse effects–Prediction of the safety and toxicology of drugs is one of the early stages of the drug development pipeline. Immense progress has been made in this respect by integrating systems biology and biological data. Adverse effects of drugs can also be successfully predicted by combining systems biology with chemoinformatics analysis [85].
– Drug repositioning–Drug repositioning is a powerful technique that can aid conventional drug discovery and radically reduce the risks of drug development, so that such drugs can enter clinical trials more rapidly. There are two main classical drug repositioning methods: ligand-based approaches [84] and structure-based approaches [86]. Ligand-based approaches are based on the principle that similar compounds are most likely to have similar biological properties. Structure-based approaches, in contrast, work on the principle that proteins with similar structures tend to have similar functions and also bind to similar compounds. Advances in data mining, network analysis and machine learning are leading to an explosive growth of data in the form of microarray gene expression signatures, pharmaceutical databases etc. The different mechanisms by which drugs function can be studied in comparison with existing treatments, but the identification of disease-related candidate drugs remains a critical issue that needs to be addressed.
– Predictions of drug combinations–It is known that greater therapeutic benefit can be achieved by using drug combinations rather than a single drug. Numerous effective and potential drug combinations have been predicted by applying systems biology approaches [87].
Also, the effect of such drug combinations can be simulated by dynamical modelling using computational approaches. Exploring drug-target interactions could lead to the discovery of new efficacious drug combinations. Novel synergistic drug combinations can be identified by developing a large-scale drug combination network (DCN) [87].

Infectious diseases studies

Infectious diseases are a continuing threat and a major health concern worldwide. Systems biology approaches are being continuously implemented to address major challenges in infectious diseases research. A deeper understanding of host–pathogen interactions can be reached by employing systems biology tools that deal with complex biological systems [88]. The NIAID/DMID Systems Biology Consortium for Infectious Diseases deals with infectious diseases caused by a variety of pathogens and with antimicrobial resistance. The consortium consists of teams from various disciplines such as microbiology, immunology,


machine learning, bioinformatics, biostatistics etc., which work together to gain insight into the molecular mechanisms within infectious organisms and their interaction with the host.

Cancer research

Cancer is a class of diseases or disorders characterized by the uncontrolled division of cells. Chemical biology plays a vital role in cancer prevention and management studies [89]. It facilitates the use and design of drug components that can achieve a desired biological function. Medicinal chemistry and pharmaceutical chemistry deal with the chemical synthesis and development of drug agents. Proteomics has tremendous potential in cancer diagnosis and treatment, since it helps to identify novel biomarkers, leading to a proper understanding of the dynamics of cancer biology [90]. Although a synthetic tool kit for cancer is being developed, various issues in cancer research still need to be explored, such as deciphering disease mechanisms and creating novel diagnostic tools and treatment modalities. Synthetic biology toolkits using DNA, RNA, and protein bioparts have been used to address drug target identification, drug discovery, and therapeutic treatment in cancer research, bringing oncology research into a new dimension [91].
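Several of the applications in this section rest on the same network construction. As one example, the human disease network described under medical applications links two diseases whenever they share one or more disease genes [74]. The sketch below illustrates only that linking rule, under stated assumptions: the disease and gene names are hypothetical placeholders rather than real associations, and the function name and data layout are not taken from the cited work.

```python
from collections import defaultdict
from itertools import combinations


def build_disease_network(associations):
    """Link every pair of diseases that shares at least one associated gene.

    `associations` is an iterable of (disease, gene) pairs. The result maps
    each linked pair of diseases (in sorted order) to their set of shared genes.
    """
    genes_by_disease = defaultdict(set)
    for disease, gene in associations:
        genes_by_disease[disease].add(gene)

    edges = {}
    for first, second in combinations(sorted(genes_by_disease), 2):
        shared = genes_by_disease[first] & genes_by_disease[second]
        if shared:  # an edge exists only if the gene sets overlap
            edges[(first, second)] = shared
    return edges


if __name__ == "__main__":
    # Hypothetical placeholder data, not real gene-disease associations.
    toy_associations = [
        ("disease_A", "GENE1"),
        ("disease_A", "GENE2"),
        ("disease_B", "GENE2"),
        ("disease_B", "GENE3"),
        ("disease_C", "GENE3"),
    ]
    for pair, shared in build_disease_network(toy_associations).items():
        print(pair, sorted(shared))
```

The same pattern, with drugs and targets in place of diseases and genes, gives a simple projection of a drug-target network. Real analyses would start from a curated resource such as OMIM and would usually weight each edge by the number of shared genes.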

10.3 Challenges and future directions

One of the major challenges associated with systems biology is the management of the huge amount of data generated by high-throughput experiments. The job of structuring this information is formidable, since validation by ‘wet’ laboratories involves many experiments and strong collaboration between laboratories. Also, although significant advances have been made in quantitative measurements of multicellular systems, consistent recording of qualitative measurements, including annotation of critical cell–cell interactions, is still a big challenge. Sharing of software to read and interpret data from multiple modelling techniques and interoperability of computational tools are also areas of concern. These challenges, some of which are technical and others cultural, call for community investment.

Taking into consideration the vast possibilities of systems biology, we need to address the challenges and issues presently associated with it. More algorithms should be developed that can extract biological data elements. High-throughput simulators could be developed that employ machine learning techniques to analyze time series of multicellular data. Community-curated public data libraries and social media could be used to transform raw image data into standardized data sets.



10.4 Conclusions

Scientific research is faced with more complicated questions these days, most of which are of a transdisciplinary nature. Systems biology has enormous potential to address the wide range of newly emerging global challenges such as healthcare, environmental issues and the development of more sustainable processes. Though steps are being taken at many levels of research to integrate research mindsets and toolsets across disciplines, systems biology still faces the problem of fragmentation, as reflected in the tension between basic insights and clinical translational applications.

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

References

1. Lewis J, Bartlett A, Atkinson P. Hidden in the middle: culture, value and reward in bioinformatics. Minerva 2016;54:471–90.
2. Gasperskaja E, Kučinskas V. The most common technologies and tools for functional genome analysis. Acta Med Litu 2017;24:1–11.
3. Bartlett A, Lewis J, Williams ML. Generations of interdisciplinarity in bioinformatics. New Genet Soc 2016;35:186–209.
4. Bayat A. Science, medicine, and the future: bioinformatics. BMJ 2002;324:1018–22.
5. Trewavas A. A brief history of systems biology. “Every object that biology studies is a system of systems.” Francois Jacob (1974). Plant Cell 2006;18:2420–30.
6. Hood L, Rowen L. The human genome project: big science transforms biology and medicine. Genome Med 2013;5:79.
7. Wake MH. Integrative biology: science for the 21st Century. Bioscience 2008;58:349–53.
8. Gupta MK, Misra K. A holistic approach for integration of biological systems and usage in drug discovery. Netw Model Anal Health Inform Bioinform 2016;5:4.
9. Tavassoly I, Goldfarb J, Iyengar R. Systems biology primer: the basic methods and approaches. Essays Biochem 2018;62:487–500.
10. Stelling J. Mathematical models in microbial systems biology. Curr Opin Microbiol 2004;7:513–8.
11. Ideker T, Lauffenburger D. Building with a scaffold: emerging strategies for high- to low-level cellular modeling. Trends Biotechnol 2003;21:255–62.
12. Likić VA, McConville MJ, Lithgow T, Bacic A. Systems biology: the next frontier for bioinformatics. Adv Bioinf 2010;2010:268925.
13. Sevimoglu T, Arga KY. The role of protein interaction networks in systems biomedicine. Comput Struct Biotechnol J 2014;11:22–7.
14. Pandey UB, Nichols CD. Human disease models in Drosophila melanogaster and the role of the fly in therapeutic drug discovery. Pharmacol Rev 2011;63:411–36.


15. Doni Jayavelu N. Integrative systems biology approaches for analyzing high-throughput data: applications to modeling of gene regulatory networks. Trondheim, Norway: Norwegian University of Science and Technology; 2015.
16. Sobie EA, Lee Y-S, Jenkins SL, Iyengar R. Systems biology—biomedical modeling. Sci Signal 2011;4:tr2.
17. Shahzad K, Loor JJ. Application of top-down and bottom-up systems approaches in ruminant physiology and metabolism. Curr Genom 2012;13:379–94.
18. Bruggeman FJ, Westerhoff HV. The nature of systems biology. Trends Microbiol 2007;15:45–50.
19. Oltvai ZN, Barabási A-L. Systems biology. Life’s complexity pyramid. Science 2002;298:763–4.
20. Reeves GT, Hrischuk CE. Survey of engineering models for systems biology. Comput Biol J 2016;2016:4106329.
21. Emmert-Streib F, Dehmer M, Haibe-Kains B. Gene regulatory networks and their applications: understanding biological and medical problems in terms of networks. Front Cell Dev Biol 2014;2:38.
22. Aguda BD, Goryachev AB. From pathways databases to network models of switching behavior. PLoS Comput Biol 2007;3:1674–8.
23. Hass H, Loos C, Raimúndez-Álvarez E, Timmer J, Hasenauer J, Kreutz C. Benchmark problems for dynamic modeling of intracellular processes. Bioinformatics 2019;35:3073–82.
24. Benner SA, Sismour AM. Synthetic biology. Nat Rev Genet 2005;6:533–43.
25. Lucks JB, Qi L, Whitaker WR, Arkin AP. Toward scalable parts families for predictable design of biological circuits. Curr Opin Microbiol 2008;11:567–73.
26. Tyagi A, Kumar A, Aparna SV, Mallappa RH, Grover S, Batish VK. Synthetic biology: applications in the food sector. Crit Rev Food Sci Nutr 2016;56:1777–89.
27. Mortimer JC. Plant synthetic biology could drive a revolution in biofuels and medicine. Exp Biol Med 2018;244:323–31.
28. Slomovic S, Pardee K, Collins JJ. Synthetic biology devices for in vitro and in vivo diagnostics. Proc Natl Acad Sci USA 2015;112:14429–35.
29. Cameron DE, Bashor CJ, Collins JJ. A brief history of synthetic biology. Nat Rev Microbiol 2014;12:381–90.
30. Del Vecchio D, Ninfa AJ, Sontag ED. Modular cell biology: retroactivity and insulation. Mol Syst Biol 2008;4:161.
31. Mishra D, Rivera PM, Lin A, Del Vecchio D, Weiss R. A load driver device for engineering modularity in biological networks. Nat Biotechnol 2014;32:1268–75.
32. de Lorenzo V, Danchin A. Synthetic biology: discovering new worlds and new words. EMBO Rep 2008;9:822–7.
33. Kumar RM, Cahan P, Shalek AK, Satija R, DaleyKeyser AJ, Li H, et al. Deconstructing transcriptional heterogeneity in pluripotent stem cells. Nature 2014;516:56–61.
34. Kim KH, Qian H, Sauro HM. Nonlinear biochemical signal processing via noise propagation. J Chem Phys 2013;139:144108.
35. Cobb RE, Si T, Zhao H. Directed evolution: an evolving and enabling synthetic biology tool. Curr Opin Chem Biol 2012;16:285–91.
36. Burgos-Morales O, Gueye M, Lacombe L, Nowak C, Schmachtenberg R, Hörner M, et al. Synthetic biology as driver for the biologization of materials sciences. Mater Today Bio 2021;11:100115.
37. Zhou Q, Zhan H, Liao X, Fang L, Liu Y, Xie H, et al. A revolutionary tool: CRISPR technology plays an important role in construction of intelligentized gene circuits. Cell Prolif 2019;52:e12552.
38. Nielsen AAK, Segall-Shapiro TH, Voigt CA. Advances in genetic circuit design: novel biochemistries, deep part mining, and precision gene expression. Curr Opin Chem Biol 2013;17:878–92.



39. Chen Y-J, Liu P, Nielsen AAK, Brophy JAN, Clancy K, Peterson T, et al. Characterization of 582 natural and synthetic terminators and quantification of their design constraints. Nat Methods 2013;10: 659–64. 40. Gibson DG, Glass JI, Lartigue C, Noskov VN, Chuang R-Y, Algire MA, et al. Creation of a bacterial cell controlled by a chemically synthesized genome. Science 2010;329:52–6. 41. Hillson NJ, Rosengarten RD, Keasling JD. j5 DNA assembly design automation software. ACS Synth Biol 2012;1:14–21. 42. Brophy JAN, Voigt CA. Principles of genetic circuit design. Nat Methods 2014;11:508–20. 43. Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N, et al. Multiplex genome engineering using CRISPR/Cas systems. Science 2013;339:819–23. 44. Khalil AS, Collins JJ. Synthetic biology: applications come of age. Nat Rev Genet 2010;11:367–79. 45. Mehrotra P. Biosensors and their applications – a review. J Oral Biol Craniofacial Res 2016;6: 153–9. 46. Thavarajah W, Verosloff MS, Jung JK, Alam KK, Miller JD, Jewett MC, et al. A primer on emerging field-deployable synthetic biology tools for global water quality monitoring. Npj Clean Water 2020; 3:18. 47. Bulich AA, Isenberg DL. Use of the luminescent bacterial system for the rapid assessment of aquatic toxicity. ISA Trans 1981;20:29–33. 48. Belkin S, Smulski DR, Vollmer AC, Van Dyk TK, LaRossa RA. Oxidative stress detection with Escherichia coli harboring a katG’::lux fusion. Appl Environ Microbiol 1996;62:2252–6. 49. Gui Q, Lawson T, Shan S, Yan L, Liu Y. The application of whole cell-based biosensors for use in environmental analysis and in medical diagnostics. Sensors 2017;17:1623. 50. Ishino Y, Krupovic M, Forterre P. History of CRISPR-Cas from encounter with a mysterious repeated sequence to genome editing technology. J Bacteriol 2018;200:e00580–17. 51. Li Y, Li S, Wang J, Liu G. CRISPR/Cas systems towards next-generation biosensing. Trends Biotechnol 2019;37:730–43. 52. Green AA, Silver PA, Collins JJ, Yin P. 
Toehold switches: de-novo-designed regulators of gene expression. Cell 2014;159:925–39. 53. Liu L, He F, Yu Y, Wang Y. Application of FRET biosensors in mechanobiology and mechanopharmacological screening. Front Bioeng Biotechnol 2020;8:1299. 54. Jagadevan S, Banerjee A, Banerjee C, Guria C, Tiwari R, Baweja M, et al. Recent developments in synthetic biology and metabolic engineering in microalgae towards biofuel production. Biotechnol Biofuels 2018;11:185. 55. Rupprecht J. From systems biology to fuel–Chlamydomonas reinhardtii as a model for a systems biology approach to improve biohydrogen production. J Biotechnol 2009;142:10–20. 56. Zhang Y-HP. Production of biofuels and biochemicals by in vitro synthetic biosystems: opportunities and challenges. Biotechnol Adv 2015;33:1467–83. 57. Shi T, Han P, You C, Zhang Y-HPJ. An in vitro synthetic biology platform for emerging industrial biomanufacturing: bottom-up pathway design. Synth Syst Biotechnol 2018;3:186–95. 58. Zhou W, Šmidlehner T, Jerala R. Synthetic biology principles for the design of protein with novel structures and functions. FEBS Lett 2020;594:2199–212. 59. Burrill DR, Inniss MC, Boyle PM, Silver PA. Synthetic memory circuits for tracking human cell fate. Genes Dev 2012;26:1486–97. 60. Friedland AE, Lu TK, Wang X, Shi D, Church G, Collins JJ. Synthetic gene networks that count. Science 2009;324:1199–202. 61. Siuti P, Yazbek J, Lu TK. Synthetic circuits integrating logic and memory in living cells. Nat Biotechnol 2013;31:448–52. 62. Healy CP, Deans TL. Genetic circuits to engineer tissues with alternative functions. J Biol Eng 2019; 13:39.


10 Systems biology–the transformative approach

63. Re A. Synthetic gene expression circuits for designing precision tools in oncology. Front Cell Dev Biol 2017;5:77. 64. Kumar RR, Prasad S. Metabolic engineering of bacteria. Indian J Microbiol 2011;51:403–9. 65. Lee SY, Mattanovich D, Villaverde A. Systems metabolic engineering, industrial biotechnology and microbial cell factories. Microb Cell Factories 2012;11:156. 66. Salis HM, Mirsky EA, Voigt CA. Automated design of synthetic ribosome binding sites to control protein expression. Nat Biotechnol 2009;27:946–50. 67. Shetty RP, Endy D, Knight TFJ. Engineering BioBrick vectors from BioBrick parts. J Biol Eng 2008;2:5. 68. Mutalik VK, Guimaraes JC, Cambray G, Lam C, Christoffersen MJ, Mai Q-A, et al. Precise and reliable gene expression via standard transcription and translation initiation elements. Nat Methods 2013; 10:354–60. 69. Schmidt CM, Smolke CD. RNA switches for synthetic biology. Cold Spring Harb Perspect Biol 2019;11. 70. Mc Auley MT, Choi H, Mooney K, Paul E, Miller VM. Systems biology and synthetic biology: a new epoch for toxicology research. Adv Toxicol 2015;2015:575403. 71. DeBord DG, Carreón T, Lentz TJ, Middendorf PJ, Hoover MD, Schulte PA. Use of the “Exposome” in the practice of epidemiology: a primer on -Omic Technologies. Am J Epidemiol 2016;184:302–14. 72. Sonawane AR, Weiss ST, Glass K, Sharma A. Network medicine in the age of biomedical big data. Front Genet 2019;10:294. 73. Kamada T. System biomedicine: a new paradigm in biomedical engineering. Front Med Biol Eng 1992;4:1–2. 74. Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, Barabási A-L. The human disease network. Proc Natl Acad Sci U S A 2007;104:8685–90. 75. Goh K-I, Choi I-G. Exploring the human diseasome: the human disease network. Brief Funct Genomics 2012;11:533–42. 76. Wang K, Lee I, Carlson G, Hood L, Galas D. Systems biology and the discovery of diagnostic biomarkers. Dis Markers 2010;28:199–207. 77. 
Generalov E, Clarke T, Iddamalgoda L, Sundararajan VS, Suravajhala P, Goltsov A. Chapter 3 – Systems biology in biomarker development for cancer signaling therapy. In: Jørgensen JTBT-C, editor. Companion and complementary diagnostics. Academic Press; 2019:27–51 pp. 78. Alawieh A, Zaraket F, Li J-L, Nokkari A, Razafsha M, Fadlallah B, et al. Systems biology, bioinformatics, and biomarkers in neuropsychiatry. Front Neurosci 2012;6:187. 79. Schaub JA, Hamidi H, Subramanian L, Kretzler M. Systems biology and kidney disease. Clin J Am Soc Nephrol 2020;15:695–703. 80. Louridas GE, Kanonidis IE, Lourida KG. Systems biology in heart diseases. Hippokratia 2010;14: 10–6. 81. Mazein A, Ostaszewski M, Kuperstein I, Watterson S, Le Novère N, Lefaudeux D, et al. Systems medicine disease maps: community-driven comprehensive representation of disease mechanisms. Npj Syst Biol Appl 2018;4:21. 82. Caliendo F, Dukhinova M, Siciliano V. Engineered cell-based therapeutics: synthetic biology meets immunology. Front Bioeng Biotechnol 2019;7:43. 83. Yildirim MA, Goh K-I, Cusick ME, Barabási A-L, Vidal M. Drug-target network. Nat Biotechnol 2007; 25:1119–26. 84. Keiser MJ, Setola V, Irwin JJ, Laggner C, Abbas AI, Hufeisen SJ, et al. Predicting new molecular targets for known drugs. Nature 2009;462:175–81. 85. Chang RL, Xie L, Xie L, Bourne PE, Palsson BØ. Drug off-target effects predicted using structural analysis in the context of a metabolic network model. PLoS Comput Biol 2010;6:e1000938.



86. Kinnings SL, Liu N, Buchmeier N, Tonge PJ, Xie L, Bourne PE. Drug discovery using chemical systems biology: repositioning the safe medicine Comtan to treat multi-drug and extensively drug resistant tuberculosis. PLoS Comput Biol 2009;5:e1000423. 87. Fitzgerald JB, Schoeberl B, Nielsen UB, Sorger PK. Systems biology and combination therapy in the quest for clinical efficacy. Nat Chem Biol 2006;2:458–66. 88. Aderem A, Adkins JN, Ansong C, Galagan J, Kaiser S, Korth MJ, et al. A systems biology approach to infectious disease research: innovating the pathogen-host research paradigm. mBio 2011;2: e00325–10. 89. Sturla SJ, Irwin JJ, Loeppky RN, Mulvihill MJ, Searcey M. Chemistry in cancer research: a vital partnership. Cancer Res 2007;67:6539–43. 90. Shruthi BS, Vinodhkumar P, Selvamani. Proteomics: a new perspective for cancer. Adv Biomed Res 2016;5:67. 91. Shankar S, Pillai MR. Translating cancer research by synthetic biology. Mol Biosyst 2011;7: 1802–10.
