Structural Biology in Drug Discovery: Methods, Techniques, and Practices [1 ed.] 1118681010, 9781118681015

With the most comprehensive and up-to-date overview of structure-based drug discovery covering both experimental and com


English Pages 688 [668] Year 2020

Author / Uploaded
Jean-Paul Renaud (editor)

Structural Biology in Drug Discovery

Structural Biology in Drug Discovery Methods, Techniques, and Practices

Edited by

Jean‐Paul Renaud
President & CSO at Urania Therapeutics, Ostwald, France
Cofounder at NovAliX, Illkirch, France
CNRS Research Director at IGBMC, Illkirch, France

This edition first published 2020
© 2020 John Wiley & Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Jean‐Paul Renaud to be identified as the author of the editorial material in this work has been asserted in accordance with law.

Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office
111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products, visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty
In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging‐in‐Publication Data
Names: Renaud, Jean-Paul, 1960– editor.
Title: Structural biology in drug discovery : methods, techniques, and practices / edited by Jean-Paul Renaud.
Description: First edition. | Hoboken, NJ : Wiley, 2020. | Includes bibliographical references and index.
Identifiers: LCCN 2019015619 (print) | LCCN 2019017015 (ebook) | ISBN 9781118900406 (Adobe PDF) | ISBN 9781118900505 (ePub) | ISBN 9781118681015 (hardback)
Subjects: | MESH: Drug Discovery–methods | Molecular Biology–methods | Genetic Techniques | Drug Design
Classification: LCC RS420 (ebook) | LCC RS420 (print) | NLM QV 745 | DDC 615.1/9–dc23
LC record available at https://lccn.loc.gov/2019015619

Cover Design: Wiley
Cover Image: © adempercem/iStock.com

Set in 10/12pt Warnock by SPi Global, Pondicherry, India

Printed in United States of America

10 9 8 7 6 5 4 3 2 1

I dedicate this book to the memory of Guy Dodson (13 January 1937–24 December 2012), a pioneer of biotherapeutics with his contribution to the structural study of insulin along with Dorothy Hodgkin, who thought it was most important that science, and in particular structural biology, should help improve human health, as shown by his dedication to research projects on proteins from pathogens such as Mycobacterium tuberculosis. He will remain my scientific mentor and an inspiring model of a wonderful human being.

Jean‐Paul Renaud

(photograph kindly provided by Brian Matthews)

Contents

List of Contributors xxii
Preface xxvii

Part I Overview, Concepts, and Approaches 1

1 The Evolving Role of Structural Biology in Drug Discovery 3
Jean‐Paul Renaud

1.1 Introduction 3
1.2 The Expanding Toolbox of Structural Biology for Drug Discovery 5
1.3 The Various Uses of Structural Biology in Drug Discovery 10
1.4 Evolving Drugs and Targets 12
1.5 Current Trends and Perspectives 13
References 14

2 A Structural View on Druggability: Experimental and Computational Approaches 23
Ursula Egner and Roman C. Hillig

2.1 Introduction 23
2.2 Views on Target Druggability 24
2.3 In Silico Methods for Druggability Assessment of Targets with Well‐defined Pockets 25
2.3.1 Binding Site Identification 26
2.3.2 Selection of Descriptors and Datasets for Method Development 28
2.3.3 Development of Druggability Models 29
2.3.4 Druggability Prediction via Alternative Methods 33
2.4 Experimental Methods for Druggability Assessment 34
2.5 A Challenge for Druggability Predictions: Protein–Protein Interactions 37
2.5.1 Identification of Binding Sites for Protein–Protein Interaction Targets 39
2.5.1.1 Computational Solvent Mapping 39
2.5.1.2 Molecular Dynamics and Monte Carlo Simulations 40
2.5.1.3 Hot Spot Identification 42
2.5.2 Druggability Assessment of Protein–Protein Interaction Targets 43
2.6 Perspective 45
References 46

3 Structural Chemogenomics: Profiling Protein–Ligand Interactions in Polypharmacological Space 53
Babs Briels, Chris de Graaf, and Andreas Bender

3.1 Introduction 53
3.2 Simultaneously Targeting Multiple Proteins Can Be More Efficient in Disrupting Disease Mechanism 54
3.2.1 Disease Mechanisms Call for the Need of A Systematic Targeting Approach 54
3.2.2 Multiple Ways to Adopt A Multi‐target Approach with Low‐Affinity Binders 55
3.2.3 Polypharmacology Allows for Repurposing of Marketed Drugs and Drug Rescue 56
3.2.4 Optimization of Multiple Structure–Activity Relationships Simultaneously Is Difficult 57
3.3 Computer‐Aided Approaches for Profiling Bioactivities of Ligands 57
3.3.1 Ligand‐Based Methods for Target Prediction 58
3.3.1.1 In Silico Target Prediction Using Chemical Similarity Methods and Data Mining 58
3.3.1.2 Ligand‐Based Pharmacophore‐Based Models for Target Prediction 58
3.3.1.3 Identification of Privileged Ligand Scaffolds and Structure–Selectivity Cliffs 59
3.3.2 Protein–Ligand‐Based Methods to Predict Bioactivities of Ligands Against Targets 60
3.3.2.1 Chemogenomics for All‐Against‐All Bioactivity Prediction 60
3.3.2.2 Proteochemometric Modeling Combines Protein–Ligand Space in a Predictive Statistical Method 61
3.3.3 Structure‐Based Tools for Prospective Protein–Ligand Interaction Prediction 62
3.3.3.1 Cavity Detection Methods Locate Probable Ligand Binding Sites 62
3.3.3.2 Cavity Description Methods Simplify Protein Structure Description for Fast Binding Site Comparison 63
3.3.3.3 Similarity Search Methods 64
3.3.3.4 Scoring Protein Cavity Similarity by Counting Overlapping Occurrences 65
3.3.3.5 Protein‐Based Pharmacophore Modeling 65
3.3.3.6 Target Fishing Using Proteome‐Scale Docking 66
3.3.3.7 Post‐processing Docking Data to Complement Scoring Functions 66
3.3.3.8 Knowledge‐Based Structural Databases to Navigate Protein–Ligand Interaction Space 67
3.4 Applications 68
3.4.1 Designing Ligands with Desired D2 and D4 Dopamine Receptor Activities 69
3.4.2 Similarity Ensemble Approach (SEA) to Predict Off‐Target Binding of FDA‐Approved Drugs 69
3.4.3 Target Prediction in Chemical Genetics Using SEA 69
3.4.4 Ligand‐Based Prediction of Target Space for Malaria Drug Development Using PredictFX 70
3.4.5 Chemically Similar Compounds of Multi‐target SAR‐Tested Ligands Inhibit Cell Proliferation and Tumor Growth 70
3.4.6 Prospective Prediction of Cross‐Reaction Protein Kinase Inhibitors Using Binding Site Comparison Method SiteAlign 70
3.4.7 Explaining the Mode of Action of Traditional Medicine Using A Chemogenomic Target Prediction Tool 71
3.4.8 Combining Ligand‐Based and Structure‐Based Pharmacophore Screenings to Determine Novel Bioactive GPCR Ligands and Multi‐target Binding 71
3.4.9 Profiling Bioactive Compounds Using a Combined Ligand‐Based and Protein‐Based Workflow 71
3.5 Conclusion 71
References 72

4 Fragment‐Based Ligand Discovery 79
Ben J. Davis and Roderick E. Hubbard

4.1 Introduction 79
4.2 The Evolution of FBLD 79
4.3 The FBLD Process 81
4.4 Fragment Libraries 82
4.4.1 Step A: Chemoinformatics Selection 82
4.4.2 Step B: Focused Libraries? 83
4.4.3 Step C: Characterizing Diversity 83
4.4.4 Step D: Visual Inspection 83
4.5 Maintaining a Fragment Library 83
4.6 Fragment Screening 84
4.6.1 Nuclear Magnetic Resonance (NMR) 84
4.6.2 Surface Plasmon Resonance (SPR) 87
4.6.3 Thermal Shift Assay (TSA) 88
4.6.4 Biochemical Assay 88
4.6.5 Crystallography 88
4.6.6 Mass Spectrometry 88
4.6.7 Isothermal Titration Calorimetry (ITC) 88
4.6.8 Other Ideas and Approaches 89
4.7 Integrating Fragments with Other Compounds 89
4.8 Validating Fragment Hits: Comparing Methods 90
4.9 Fragment Hit Rates 91
4.10 Determining Structures of Fragment: Protein Complexes 91
4.11 Fragment Evolution 92
4.12 Concluding Remarks 96
Acknowledgments 96
References 96

5 Combining Structural, Thermodynamic, and Kinetic Information to Drive Hit‐to‐Lead Progression 99
Geoffrey A. Holdgate and Christopher Phillips

5.1 Introduction 99
5.1.1 Hit Identification 99
5.2 The Role of Thermodynamics in Hit to Lead 99
5.2.1 Isothermal Titration Calorimetry (ITC) Technology and Measurements 99
5.2.1.1 Introduction 99
5.2.1.2 Commercial ITC Instrument History 99
5.2.1.3 ITC Instrumentation 100
5.2.1.4 The ITC Experiment 100
5.2.1.5 Measurement of Thermodynamic Parameters 101
5.2.1.6 Benefits of Applying Thermodynamic Measurements in Hit to Lead 103
5.2.1.7 Exploitation of Measured Thermodynamics 104
5.3 The Role of Kinetics in Hit to Lead 107
5.3.1 Surface Plasmon Resonance (SPR) Technology and Measurements 107
5.3.1.1 Introduction 107
5.3.1.2 Commercial SPR Instrument History 108
5.3.1.3 General SPR Instrumentation 108
5.3.1.4 The SPR Experiment 109
5.3.1.5 Other Methods to Determine Binding Kinetics 113
5.3.1.6 Biolayer Interferometry (BLI) 113
5.3.1.7 Radioligand Binding 113
5.3.1.8 Residence Times 114
5.3.1.9 Mechanism of Action 116
5.3.1.10 Protein Structure and Dynamics 116
5.3.1.11 Correlating Kinetic and Structural Data in Hit‐to‐Lead Programs 117
5.3.1.12 Correlating Structural and Thermodynamic Data in Hit‐to‐Lead Programs 118
5.4 Summary 121
References 122

6 Allostery as Structure‐Encoded Collective Dynamics: Significance in Drug Design 125
Indira H. Shrivastava, Chang Liu, Anindita Dutta, Ahmet Bakan, and Ivet Bahar

6.1 Introduction 125
6.1.1 Experimental Data Highlight the Complexity of Allosteric Events, Beyond MWC or KNF Models: Bacterial Chaperonin GroEL as a Paradigm 125
6.1.2 New Views: Allostery as Population Shift Between Preexisting Conformers or Reconfiguration Along Preexisting Soft Path 127
6.1.3 Dynamics and Drug Discovery 128
6.2 Computational Methods 128
6.2.1 Gaussian Network Model (GNM): Simplest ENM 129
6.2.2 Anisotropic Network Model (ANM) Theory and Assumptions 131
6.3 Applications 132
6.3.1 Conformational Sampling by ANM and Comparison with Experimental Data and Molecular Simulations 132
6.3.1.1 p38 Protein Kinase 132
6.3.1.2 Leucine Transporter (LeuT) 132
6.3.2 Allosteric Drug Binding Sites Inferred from ENM Analysis: Application to HIV‐1 Reverse Transcriptase 133
6.3.3 Pharmacophore Modeling from Druggability Simulations 134
6.4 Future Directions and Conclusion 136
6.4.1 Allosteric Cooperativity and Cellular Effects 136
6.4.2 Future Directions and Concluding Remarks 138
References 138

Part II Tools 143

7 Biophysical Assessment of Target Protein Quality in Structure‐Based Drug Discovery 145
Arne Christian Rufer and Michael Hennig

7.1 Biophysical Methods in Drug Discovery 145
7.2 Case Study I: Micro‐inhomogeneity 146
7.2.1 Recovering Active Enzyme for Ligand Co‐crystallization: The Lysosomal Cysteine Protease Cathepsin S 146
7.3 The Role of Biophysical Methods in the Optimization of Protein Crystallization 150
7.3.1 Biophysical Methods Performed in Solution (Liquid Sample) 151
7.3.2 Biophysical Methods That Require Immobilization on Surfaces or Transfer of the Protein to the Gas Phase 154
7.4 Case Study II: Minimizing Macro‐inhomogeneity 154
7.4.1 Optimizing Detergent Conditions for Membrane Proteins: The Mitochondrial β‐Oxidation Pacemaker Carnitine Palmitoyltransferase 2 154
7.4.2 Comparison of AUC to SEC for the Characterization of CPT‐2 158
7.5 Outlook and Concluding Remarks: Requirements for Upcoming Biophysical Methods 159
Acknowledgment 160
References 160

8 An Industrial Perspective on Protein–Ligand Complex Crystallization 165
Carien Dekker and Arnaud Goepfert

8.1 Introduction to Co‐crystal Structures and Drug Development 165
8.2 Basics of X‐Ray Analysis of Co‐crystals 165
8.2.1 Protein Crystallography in a Nutshell 165
8.2.2 Application of X‐Ray Analysis to the Study of Ligand Binding 166
8.3 Ligands 167
8.3.1 Detection of Ligand Binding 167
8.3.2 Validation of Ligand Binding 167
8.3.3 Ligand Binding Sites: Pockets, Grooves, and Allosteric Sites 168
8.3.4 Crystal Packing Sites and Other Artifacts 168
8.3.5 Complementary Techniques: NMR, SPR, ITC, DSF, SAXS, and EM 168
8.4 Preparing Protein for Successful Crystallization 169
8.4.1 Construct Design and the Importance of Boundary Predictions 170
8.4.2 Protein Production 170
8.4.3 Protein Purification and Quality Assessment 171
8.5 Crystallization 172
8.5.1 The Crystallization Process 172
8.5.2 Crystallization Screening 172
8.5.3 Automation of Crystallization Setup and Imaging 173
8.5.4 Optimization of Crystallization 174
8.5.5 What If the Protein Does Not Crystallize? 174
8.6 Selecting Ligands for Crystallization 175
8.6.1 Introduction to Ligands and Compound Selection 175
8.6.2 Finding the Best Starting Point for Hit to Lead 176
8.6.3 Hot Spots, Potency, and IC50’s 176
8.6.4 The Importance of Solubility 177
8.6.5 The Effects of Ligands on Protein Stabilization 177
8.6.6 Pitfalls of Detecting Ligand Binding in a Protein Crystal 177
8.7 Methods for Obtaining Co‐Crystals: Soaking and Co‐crystallization 178
8.7.1 Concept of Ligand Soaking 178
8.7.2 Advantages and Potential of Soaking Ligands 178
8.7.3 Ligand Concentrations 179
8.7.4 Exemplary Soaking Can Guide Drug Discovery 180
8.7.5 Concept of Co‐crystallization 182
8.7.6 Advantages and Potential of Co‐Crystallization 182
8.7.7 Case Study: Complex Structures of HdmX with p53 Peptide Analogs 182
8.8 Future Perspective 184
Acknowledgment 184
References 184

9 Membrane Protein Crystallization 187
Ching‐Ju Tsai and Gebhard F. X. Schertler

9.1 Introduction 187
9.1.1 A Brief History 187
9.1.2 Membrane Protein Biochemistry 187
9.1.3 Overview from Protein Preparation to Crystallization 189
9.2 Membrane Protein Production 189
9.2.1 Membrane Proteins from Natural Sources 191
9.2.2 Bacterial and Yeast Expression Systems 191
9.2.3 Insect Cell Expression Systems 191
9.2.4 Mammalian Cell Expression Systems 192
9.2.5 Cell‐Free Expression Systems 192
9.3 Amphipathic Manipulation of Membrane Proteins 192
9.3.1 Detergent in Membrane Protein Biochemistry 192
9.3.2 Classical Detergents 194
9.3.3 Maltose‐neopentyl Glycol Amphiphiles 194
9.3.4 Amphipols 194
9.3.5 Lipopeptide Detergents and β‐Strand Peptides 195
9.3.6 Nanodiscs 195
9.4 Preparing Protein Samples for Crystallization 195
9.4.1 Overview of Protein Sample Preparation 195
9.4.2 Criteria for Selecting Suitable Protein Targets 196
9.4.3 Quality Control of Purified MPs 197
9.4.4 Protein Engineering 198
9.5 Membrane Protein Crystallization 198
9.5.1 In Surfo Crystallization 199
9.5.2 In Meso Crystallization 200
9.5.3 Bicelle Crystallization 200
9.5.4 2D Crystallization 201
9.5.5 Effect of Detergent and Lipid in Crystallization 201
9.5.6 Complex with Fusion and Binding Partners 202
9.5.7 Optimizing Crystallization Conditions 202
9.5.8 Evaluating Crystallization Conditions 202
9.6 Methods for Determining Membrane Protein Structures 203
9.6.1 X‐Ray Crystallography 203
9.6.2 X‐Ray Free‐Electron Laser 203
9.6.3 Cryo‐electron Microscopy 204
9.6.4 Nuclear Magnetic Resonance Spectroscopy 204
9.7 Conclusion and Outlooks 204
References 205

10 High‐Throughput Macromolecular Crystallography in Drug Discovery: Evolving in the Midst of Revolutions 211
Gérard Bricogne

10.1 Introduction 211
10.2 Setting the Scene for Evolution and Revolutions 212
10.2.1 MX, SBDD, and SG and Beyond 212
10.2.2 MX for SBDD in a Nutshell: A Vast Combinatorial Search 212
10.2.3 The High‐Throughput (HT) Imperative in MX‐for‐SBDD Workflows 213
10.3 Baseline: The “Human, All Too Human” Workflow of Early MX for SBDD 214
10.4 First Wave of Automation Toward High‐Throughput Operation: Robotics Without Refactoring 215
10.5 Second Wave of Automation: The Twin Tracks of In Situ Crystallography and Microcrystallography 217
10.5.1 In Situ Crystallography 218
10.5.1.1 Initial Concept 218
10.5.1.2 Immediate Screening for Diffraction Quality 218
10.5.1.3 Data Collection from Multiple Small Crystals 218
10.5.1.4 Robotics Unbound: Take 1 219
10.5.1.5 Related Instrumental Developments 220
10.5.2 Microcrystallography 221
10.5.2.1 Initial Concept 221
10.5.2.2 Hard Limits on MX Diffraction Measurements from Fundamental Principles 221
10.5.2.3 Aiming for the Highest Possible S/N Ratio in MX Experiments 222
10.5.2.4 Related Instrumental Developments 228
10.5.3 Convergence Toward Multi‐crystal Data Collection 228
10.5.3.1 Microcrystallography in Action: A Chronicle of Early GPCR Structure Determinations 229
10.5.3.2 In Situ Crystallography in Action: Two Recent Studies on Membrane Proteins 231
10.5.3.3 Evolution of Processing Methods for Multi‐crystal Datasets 231
10.6 An Emerging Third Wave of Automation: Serial Microcrystallography 232
10.6.1 XFELs and SFX: A Very Brief Introduction 232
10.6.1.1 Sample Delivery Issues 232
10.6.1.2 Data Analysis Issues 233
10.6.2 Serial Crystallography as the Convergence of SMX and SFX 233
10.6.3 Pros and Cons of SMX vs. SFX 233
10.6.4 The Heart of the Matter for Serial Crystallography: The Humpty Dumpty Problem 234
10.6.5 A Neglected Niche: Club Class Data Collection on Macrocrystals 235
10.7 From Diffraction Data to Structural Results: Evolving Best Practices 235
10.7.1 Setting Up the Framework for Ligand Screening and Binding Mode Characterization by HTMX 235
10.7.1.1 Crystal System 235
10.7.1.2 High‐Quality Reference Model 235
10.7.1.3 Bookkeeping of Variants of Crystal Forms and Reference Models 236
10.7.1.4 Ligand Chemoinformatics and Molecular Geometry 236
10.7.2 Operating the High‐Throughput Ligand Screening Pipeline 236
10.7.2.1 Processing Diffraction Images and Identification of Crystal Form 236
10.7.2.2 Use of Reference Atomic Models and mtz Files 236
10.7.2.3 Eliciting Difference Density for Ligand Detection 236
10.7.2.4 Ligand Electron Density Analysis and Automated Fitting 237
10.7.2.5 Final Refinement of Ligand–Target Complex 237
10.7.2.6 Validation of the Ligand–Target Co‐structure 238
10.7.3 Post‐analysis of Ensembles of Ligand Co‐structures 238
10.7.4 Auto‐processing at Synchrotrons: Toward Specialized Cloud Computing? 238
10.8 Conclusions and Outlook: Whither HTMX for Drug Discovery? 239
Acknowledgments 240
References 240

11 Assessment of Crystallographic Structure Quality and Protein–Ligand Complex Structure Validation 253
Karolina A. Majorek, Matthew D. Zimmerman, Marek Grabowski, Ivan G. Shabalin, Heping Zheng, and Wladek Minor

11.1 Introduction 253
11.2 Quality Parameters 256
11.3 Dataset Quality 256
11.3.1 Dataset Quality: Resolution 256
11.3.2 Dataset Quality: Other Quality Indicators of Diffraction Data 257
11.3.3 Agreement of the Model to the Experimental Data 258
11.3.3.1 Agreement of the Model to the Experimental Data: R and Rfree Factors 258
11.3.3.2 Agreement of the Model with the Experimental Data: B‐Factors and Occupancy 258
11.3.3.3 Agreement of the Model to the Experimental Data: Electron Density Fit 259
11.3.4 Stereochemical Accuracy of the Model 261
11.3.4.1 Stereochemical Accuracy of the Model: Bond Length and Angle r.m.s.d. 261
11.3.4.2 Stereochemical Accuracy of the Model: Ramachandran Plot and Peptide Planarity 261
11.3.4.3 Stereochemical Accuracy of the Model: van der Waals Clashes and Hydrogen Bonding Networks 262
11.3.4.4 Stereochemical Accuracy of the Model: Metal Ions 263
11.3.5 Coordinate Uncertainty 263
11.4 Low‐Resolution Structures 263
11.5 Possible Influence of Crystal Packing 264
11.6 Software Tools 265
11.7 Analysis of Quality Metrics 268
11.8 Conclusions 271
Acknowledgments 271
References 271

12 Complementary Information from Neutron Crystallography Studies 277
Motoyasu Adachi and Ryota Kuroki

12.1 Introduction 277
12.2 Differences in Characteristics of X‐Rays and Neutrons 277
12.3 A Brief History of Protein Neutron Crystallography 278
12.4 Facility, Neutron Source, and Detector 278
12.5 Current Status of Neutron Protein Crystallography 278
12.6 Method for Neutron Crystallography of Proteins 279
12.6.1 Sample Preparation 279
12.6.1.1 Sample Requirement for Neutron Crystallography 279
12.6.1.2 Protein Overproduction and Purification for Neutron Protein Crystallography 280
12.6.1.3 Perdeuteration of Protein 280
12.6.1.4 Consideration of Chemical Instability During the Crystallization Experiment 280
12.6.1.5 Crystal Lattice Engineering to Control Crystal Packing 281
12.6.2 Crystal Growth for Neutron Protein Crystallography 281
12.6.2.1 Crystallization Screening to Obtain a Large Crystal Volume 281
12.6.2.2 Macroseeding to Grow Large Crystals 281
12.6.2.3 Periodic Addition of Protein to Promote Large Crystal Growth 281
12.6.3 Neutron Diffraction Experiment 281
12.6.3.1 Mounting a Deuterated Protein Crystal 281
12.6.3.2 Neutron Diffraction Data Collection 281
12.6.3.3 Data Processing of the Neutron Diffraction Dataset 282
12.6.3.4 Model Building Along with the Neutron Scattering Length Map 283
12.7 General Information Obtained from Neutron Crystallography 283
12.7.1 Hydrogen Atoms Contributing to the Fold of a Protein 283
12.7.1.1 Hydrogen Bonding Interaction 283
12.7.1.2 Structure of Aliphatic Groups 284
12.7.2 Hydration Structure 284
12.7.3 Ionization Status of Side Chains 285
12.7.3.1 Ionization Status of Active Site Residues 285
12.7.3.2 Hydrogen Atoms of Residues at Protein–Protein Interfaces 285
12.8 Use of Neutron Crystallography for the Structure Analysis of Protein Drug Targets 285
12.8.1 Acid Proteases 285
12.8.1.1 Acid Proteases as a Drug Target 285
12.8.1.2 Neutron Structure Analysis of HIV‐1 Protease 285
12.8.2 Serine Proteases 286
12.8.2.1 Serine Proteases as a Drug Target Protein 286
12.8.2.2 Neutron Structure Analysis of Porcine Pancreatic Elastase with Its Inhibitor 287
12.9 Use of Neutron Protein Crystallography for Drug Design 287
12.9.1 Determination of the Ionization State of Active Site Residues 287
12.9.2 Characteristics of the Hydration of Water 288
12.9.3 Discrimination of Atoms, Water Molecules, and Other Bound Inorganic Ions 288
12.9.4 Improvement of Protein–Protein Association 288
12.10 Future Perspectives of the Use of Neutron Crystallography for Drug Design 289
References 289

13 Determination of Protein Structure and Dynamics by NMR: State of the Art and Application to the Characterization of Biotherapeutics 295
Marc‐André Delsuc, Marc Vitorino, and Bruno Kieffer

13.1 Introduction 295
13.2 Solution Structure Determination of Macromolecules by NMR 296
13.2.1 Introduction 296
13.2.2 The NMR Observables 297
13.2.3 From NMR Observables to 3D Models 299
13.2.4 Current Developments 300
13.3 Assignment and Labeling Strategies 301
13.3.1 Spectral Assignment 301
13.3.1.1 Homonuclear Spectroscopy 301
13.3.1.2 Labeling with 15N, 13C, and 2H 301
13.3.2 Partial or Full Assignment 301
13.3.2.1 Backbone Assignment 301
13.3.2.2 Side‐Chain Assignment 302
13.3.2.3 Site‐Directed Mutagenesis 303
13.3.3 Isotopic Labeling Methods 303
13.3.3.1 Bacterial 303
13.3.3.2 Cell‐Free 303
13.3.3.3 Yeast 303
13.3.3.4 Insect or Mammalian Cells 303
13.3.3.5 Chemical Synthesis 303
13.3.4 Mixed Strategies 303
13.3.4.1 Native Unlabeled Proteins 303
13.3.4.2 Classical 15N, 15N─13C, and 15N─13C─2H Approaches 303
13.3.4.3 Methyl Groups 304
13.3.4.4 SAIL 304
13.3.4.5 Selective Amino Acid Labeling/Unlabeling 304
13.3.4.6 Subunit Labeling/Segmental Labeling 304
13.3.4.7 Chemical Labeling 304
13.4 NMR and Dynamic Aspects 304
13.5 Biomolecular Dynamics by NMR 307
13.5.1 Introduction 307
13.5.2 Probing Disorder: Relaxation Measurements and Interpretation 308
13.5.3 Beyond the Rotational Diffusion Limit 309
13.6 Intrinsically Disordered Proteins 310
13.6.1 Introduction 310
13.6.2 General Presentation of IDPs 310
13.6.3 NMR Techniques for IDP 311
13.6.4 Interpreting IDP Spectra 312
13.6.5 IDPs as Drug Targets 313
13.7 Alternative Approaches for Non‐soluble Proteins 313
13.7.1 Solid‐State NMR 313
13.7.2 Membrane Proteins 314
13.8 Optimized Strategies for the Study of Biomolecules in Solution 315
13.8.1 Instrumental Setup 315
13.8.2 Sample Preparation 315
13.8.2.1 Unlabeled Proteins in Buffer 316
13.8.2.2 Labeled Proteins 316
13.8.3 NMR Experiments 316
13.8.3.1 Quality Control 316
13.8.3.2 1D Spectroscopy on Unlabeled Samples 317
13.8.3.3 2D Spectroscopy on Unlabeled Samples 317
13.8.3.4 2D and 3D Spectroscopy on Labeled Samples 318
13.9 Conclusion 318
References 319

14 NMR Studies of Protein–Small Molecule Interactions for Drug Discovery 325
Daniel F. Wyss and Edward R. Zartler

14.1 Introduction 325
14.2 Early‐Stage Discovery 327
14.2.1 Target Validation 327
14.2.2 Hit Identification and Validation 328
14.2.3 Hit‐to‐Lead Optimization 332
14.3 Lead Optimization 333
14.4 Emerging Applications/Fields 335
14.4.1 Membrane Proteins 335
14.4.2 In‐Cell NMR Spectroscopy 337
14.4.3 Intrinsically Disordered Proteins 337
14.5 Outlook 340
References 340

15 Computational Structural Biology for Drug Discovery: Power and Limitations 347
Andrey V. Ilatovskiy and Ruben Abagyan

15.1 Introduction 347
15.2 Converting PDB Entries into Full‐Atom Models 347
15.2.1 Extent of the Construct 347
15.2.2 Electron Density Fit 348
15.2.3 PDB Files 349
15.2.4 Pocket Environment and Functional State 349
15.2.5 Proteome Coverage 349
15.3 Homology Modeling 351
15.3.1 Homology Modeling Process 351
15.3.2 Target Difficulty 352
15.3.3 Model Quality 352
15.3.4 Assessments 353
15.3.5 Limitations 353
15.3.5.1 Sequence–Structure Relationship and Template Selection 353
15.3.5.2 Target‐Template Alignment 353
15.3.5.3 Backbone Deviations 353
15.3.5.4 Loops 354
15.3.5.5 Ligand Binding 354
15.3.5.6 Model Selection and Ranking 354
15.4 Prediction of Protein–Ligand Interactions 354
15.4.1 Prediction of Transition State Analogs 357
15.4.2 Compound Profiling 357
15.4.3 Prediction of Compound Metabolism 357
15.5 Future Perspective 357
Acknowledgments 358
References 358

Part III Structure-Based Discovery in Some Important or Promising Targets and Therapeutic Families 363

16 The Role of Structural Biology in Kinase Inhibitor Drug Discovery Success 365
Mauro Angiolini

16.1 Protein Kinases and Their Structural Elements: A Dynamic Landscape for Drug Discovery 365
16.2 The Challenge of Selectivity and Drug Resistance: Design and Discovery of Afatinib, the First Irreversible Protein Kinase Inhibitor Approved for Cancer Treatment 370
16.3 When Structural Biology Drives Chemistry to Therapeutic Breakthrough: The Vemurafenib Case History 375
16.4 Second‐Generation Anaplastic Lymphoma Kinase Inhibitors: The Discovery of the First‐in‐Class Drug Ceritinib 377
16.5 The Discovery of Type II Inhibitor Ponatinib: A Milestone in the Struggle Against the ABL Gatekeeper Resistant Mutation T315I 380
16.6 Protein Kinase Inhibitor Drug Discovery and Structural Biology: Future Perspective 383
References 384

17 Serine Proteinases from the Blood Coagulation Cascade 395
Herman Schreuder and Hans Matter

17.1 Introduction 395
17.1.1 Some Background of Heart Disease 395
17.1.2 The Blood Coagulation Cascade 395
17.1.3 General Background About Serine Proteases 396
17.1.4 Indirect Coagulation Inhibitors 397
17.2 Thrombin 398
17.2.1 General Structure and Active Site 398
17.2.2 Exosite‐I 398
17.2.3 Exosite‐II 400
17.2.4 Discovery of Direct Thrombin Inhibitors 400
17.2.5 Hirudin 400
17.2.6 Hirudin‐Derived Thrombin Inhibitors 400
17.2.7 Thrombin Active Site Inhibitors 401
17.2.8 Melagatran 402
17.2.9 Dabigatran 402
17.2.10 The Quest for Orally Available, Non‐prodrug Direct Thrombin Inhibitors 402
17.3 Factor Xa (FXa) 404
17.3.1 Factor Xa Crystal Structure and Active Site 404
17.3.2 Heparin‐Binding Exosite 405
17.3.3 Discovery of Direct FXa Inhibitors 405
17.3.4 First‐Generation FXa Inhibitors Containing Benzamidines 405
17.3.4.1 DX‐9065a 405
17.3.4.2 Otamixaban 407
17.3.5 Second‐Generation FXa Inhibitors with Neutral S1 Substituents 407
17.3.5.1 Reversed Binding Mode of Chlorobenzothiophene Substituents 407
17.3.5.2 Apixaban 408
17.3.5.3 Rivaroxaban 409
17.3.5.4 The Secret Behind Chloro‐aryl Binding in the S1 Pocket 409
17.3.5.5 Dual Thrombin/FXa Inhibitors 410
17.4 Factor VIIa (FVIIa) 411
17.4.1 The FVIIa Active Site 412
17.4.2 Exosites 412
17.4.3 Discovery of Direct FVIIa Inhibitors 412
17.5 Further Developments 414
17.6 Factor IXa 414
17.7 Factor XIa 415
17.8 Impact of Structure‐Based Drug Design 415
References 416

18 Epigenetic Proteins as Emerging Drug Targets 423
P. Ann Boriack‐Sjodin

18.1 Introduction 423 18.2 Acetylation/Deacetylation 424 18.2.1 Histone Deacetylases 425 18.2.1.1 Metal Ion‐Dependent HDACs 425 18.2.1.2 Sirtuins 427 18.2.2 Bromodomains 429 18.2.3 Histone Acetyltransferases 430 18.3 Methylation/Demethylation 430 18.3.1 Protein Methyltransferases 431 18.3.1.1 Inhibition via SAM Competitive Binding: DOT1L 432 18.3.1.2 Inhibition via Substrate Competitive Binding: EHMT2 432 18.3.1.3 Inhibition Through an Allosteric Mechanism: PRMT3 435 18.3.1.4 Success Without Structure: EZH2 436 18.3.2 Demethylases 436 18.3.2.1 LSD Demethylases 437 18.3.2.2 JmjC Demethylases 437 18.3.2.3 Arginine Deiminases 438 18.3.3 Methyl‐lysine Reader Domains 438 18.3.3.1 PHD Domains 438 18.3.3.2 The “Royal Family” 438 18.3.3.3 WD40 Repeat Domains 440 18.4 Other Epigenetic Modifications 440 18.5 The Future of Epigenetic Drug Discovery 440 Acknowledgments 441 References 441


19 Impact of Recently Determined Crystallographic Structures of GPCRs on Drug Discovery
Francesca Deflorian, Jonathan S. Mason, Andrea Bortolato, and Benjamin G. Tehan
19.1 G Protein‐Coupled Receptors as Pharmaceutical Targets
19.2 Topology and Classes of GPCRs
19.3 GPCR X‐Ray Crystal Structures
19.4 Class A GPCR Small Ligand Binding Sites and Druggability
19.5 Class B GPCR X‐Ray Crystal Structures
19.6 Class C GPCRs and Allosteric Modulators
19.7 GPCR Activation
19.8 Structure‐Based Approaches to GPCR Drug Discovery
19.8.1 Virtual Screening
19.8.2 Fragment‐Based Drug Design
19.8.3 Biophysical Mapping™
19.8.4 A Case History of Full SBDD for GPCRs: “High‐End” Design
19.9 Conclusion
References

20 Targeting Protein–Protein Interactions Perspective
Chun‐wa Chung and Michael M. Hann
20.1 Introduction
20.2 Detection and Analysis of PPIs
20.3 PPI Screening
20.3.1 Screening Technologies
20.3.2 Fragment‐Based Screening Methods
20.4 Examples
20.4.1 Inhibitors of MDM2–p53: A Breakthrough in PPI Targeting
20.4.1.1 Structure‐Based Design of Spirooxindoles as MDM2 Inhibitors
20.4.1.2 Dimer Stabilization: A Mechanism of Dual Inhibition of MDM2/X
20.4.1.3 Other Opportunities for Inhibition
20.4.2 Mimicry of Smac Peptide, IAP Antagonists
20.4.3 Fragment‐Based Approaches for BCL Antagonists
20.4.4 Epigenetic Reader Proteins
20.4.4.1 Bromodomains
20.4.4.2 Methyl‐lysine Reader Domains
20.4.5 Competitive Antagonists: IL‐2 Receptor
20.5 Stapled Peptides
20.6 Alternatives to Small‐Molecule Orthosteric Inhibition
20.6.1 Protein Degradation
20.6.2 PPI Stabilizers
20.7 Conclusion and Perspectives
References

21 Mass Spectrometry‐Based Strategies for Therapeutic Antibodies Extensive Characterization and Optimization (OptimAbs)
Amandine Boeuf, François Debaene, Daniel Ayoub, Hélène Diemer, Anthony Ehkirch, Elsa Wagner‐Rousset, Alain Van Dorsselaer, Sarah Cianférani, and Alain Beck
21.1 Introduction
21.2 Intact mAb Analysis
21.2.1 Denaturing MS
21.2.2 Native MS
21.2.3 Intact mAb Analyses of Hz6F4 Isotypes
21.3 Middle‐Up mAb Analysis
21.4 Bottom‐Up Peptide Mapping for Primary Structure Assessment
21.5 Top‐Down Approaches for mAb Sequencing


21.5.1 Top‐Down mAb Analysis
21.5.2 Middle‐Down mAb Analysis
21.6 Hydrogen/Deuterium Exchange Mass Spectrometry
21.7 Native MS and Ion Mobility–Mass Spectrometry (IM‐MS) for the Characterization of mAb/Ag Binding Stoichiometry and of Protein Conformation
21.8 From Optimized Antibodies (OptimAbs) to Optimized Antibody–Drug Conjugates (OptimADCs)
21.9 Concluding Remarks
Acknowledgments
References

Part IV Challenges and New Frontiers

22 Integrating Evolution of Drug Resistance into Drug Discovery: Lessons from the Viral Proteases of HIV‐1 and HCV
Ayşegül Özen and Celia A. Schiffer
22.1 Evolution of Antiviral Drug Resistance
22.1.1 Human Immunodeficiency Virus
22.1.2 Structure and Function of HIV‐1 Protease
22.1.3 HIV‐1 Protease Inhibitors as Antivirals
22.1.4 Viral Resistance to HIV‐1 Protease Inhibitors
22.2 Substrate Envelope in Drug Design
22.2.1 Substrate Envelope in HIV Protease
22.2.2 Another Quickly Evolving Virus: Hepatitis C
22.2.3 NS3/4A Protease as an Antiviral Target
22.2.4 Generality of the Substrate Envelope Hypothesis
22.3 Protein Dynamics Is Key to Molecular Recognition
22.3.1 Change in Dynamics as a Drug Resistance Mechanism
22.3.2 Protein Dynamics Is Often Neglected in Drug Design
22.4 Robust Drug Design: Hitting Multiple Targets at a Time
22.4.1 Drug Target’s Function and Protein Dynamics
22.4.2 Integrating Dynamics into Drug Discovery While Avoiding Resistance: Dynamic Substrate Envelope
22.4.3 Predicting Substrate Coevolution Using the Dynamic Substrate Envelope
22.5 Future Perspective: Integrating Evolution and Conformational Dynamics into Drug Design
Acknowledgments
References

23 A Comprehensive Review on Mycobacterium tuberculosis Targets and Drug Development from a Structural Perspective
Jean‐Denis Pedelacq, Minh Chau Nguyen, Thomas C. Terwilliger, and Lionel Mourey
23.1 Introduction
23.2 Tuberculosis
23.2.1 The Disease
23.2.2 Epidemiologic Data and Treatment
23.2.3 Current Problems and Risks: The Challenge of a Tuberculosis‐Free World
23.3 Mycobacterium tuberculosis
23.3.1 Mycobacterium Species
23.3.2 The Mycobacterium tuberculosis Genome and Proteome
23.3.3 The Structural Proteome of Mycobacterium tuberculosis
23.4 Tuberculosis Drug Discovery and Development
23.4.1 Target Identification
23.4.2 Current Drug Development Pipeline
23.4.3 Structural Studies of Pertinent Enzyme–Drug Complexes


23.4.3.1 Isoniazid and Ethionamide
23.4.3.2 Pyrazinamide Activation Pathway and Target
23.4.3.3 RNA Polymerase and Regulatory Proteins
23.4.3.4 Fluoroquinolones Target DNA Gyrase
23.4.3.5 Examples of TB Drugs in Development and Their Targets
23.5 Conclusion and Perspectives
References

24 Using Crystal Structures of Drug‐Metabolizing Enzymes in Mechanism‐Based Modeling for Drug Design
Hao Sun and Dennis Scott
24.1 Structure‐Based Modeling of Cytochrome P450s
24.1.1 CYP3A4
24.1.2 CYP2C8, CYP2C9, and CYP2C19
24.1.3 CYP2D6
24.1.4 CYP1A2
24.2 Other Phase I Drug‐Metabolizing Enzymes
24.3 Phase II Drug‐Metabolizing Enzymes
24.4 Interplay Between Metabolism and Inhibition
24.5 Future Directions
Acknowledgment
References

25 Intrinsically Disordered Proteins: Targets for the Future?
Vladimir N. Uversky
25.1 Introduction
25.1.1 Introducing Intrinsically Disordered Proteins
25.1.2 Techniques for Structural Characterization of IDPs and IDPRs
25.1.3 Abundance of IDPs and IDPRs and Their Biological Functions
25.1.4 IDPs/IDPRs in Human Diseases
25.2 IDPs as Novel Drug Targets
25.2.1 Why IDPs? A Brief Overview
25.2.2 Druggability of Ordered Proteins
25.2.2.1 Major Categories of Protein‐Directed Drugs
25.3 Molecular Mechanisms of Drugs Targeting Ordered Proteins
25.3.1 Targeting Active Site
25.3.2 Targeting Protein–Protein Interaction Interfaces
25.3.3 Drugs for IDPs and IDPRs
25.4 Disorder‐Based Rational Drug Design
25.5 Direct Targeting of IDPs/IDPRs
25.6 Targeting Functionally Misfolded IDPs: A Hypothesis
25.7 Targeting Intrinsically Disordered Structural Ensembles: α‐Synuclein as an Illustration
25.8 Targeting Aggregating IDPs
25.9 Conclusions
Acknowledgments
References

26 Cryo‐electron Microscopy as a Tool for Drug Discovery in the Context of Integrative Structural Biology
Sacha De Carlo and Hervé‐William Rémigy
26.1 Introduction
26.2 The Resolution Revolution
26.3 What Is Cryo‐EM?
26.4 The Cryo‐EM Single‐Particle Analysis Workflow
26.4.1 Biochemistry


26.4.2 Screening the Conditions to Obtain an Optimal Sample
26.4.2.1 Differential Scanning Fluorimetry and High‐Resolution EM
26.4.2.2 ProteoPlex
26.4.3 Sample Vitrification
26.4.4 High‐Resolution Cryo‐EM
26.4.5 Data Collection
26.4.6 Increasing the Image Contrast
26.4.6.1 Choosing a Suitable Detector
26.4.6.2 Phase Plates to Address Smaller Macromolecular Complexes
26.4.6.3 Beam‐Induced Motion Correction and Dose Optimization
26.4.7 Three‐Dimensional Reconstruction and Map Interpretation
26.5 Selected Case Studies
26.5.1 The Glutamate Receptor: A Membrane Protein
26.5.2 Visualizing Ligands in High‐Resolution Cryo‐EM Maps of TRPV1
26.5.3 Visualizing Large Complexes Without Symmetry: The Ribosome
26.5.4 Visualizing Antigen–Antibody Complexes by Cryo‐EM
26.5.5 Cryo‐EM and SBDD: Antimalarial Mefloquine Derivatives
26.6 Summary and Conclusion
References

27 Application of Hard‐X‐Ray Free‐Electron Lasers for Static and Dynamic Processes in Structural Biology
Shibom Basu, Petra Fromme, and Raimund Fromme
27.1 Introduction: Overview of X‐Ray Free‐Electron Lasers
27.2 Comparison with Conventional X‐Ray Crystallography
27.3 XFEL Structures: Successes Since 2009
27.3.1 Photosystem I Structure at 8.5 Å: Proof of Concept
27.3.2 Lysozyme Structure at 1.9 Å: Proof of High Resolution
27.3.3 Single‐Virus‐Particle Imaging: Mimivirus at 30 nm
27.3.4 Cathepsin B Structure at 2.1 Å: First Unknown Structural Insight and Use of In Vivo Grown Crystals
27.3.5 Serotonin Receptor Structure at 2.8 Å: Room‐Temperature Conformation Different from the Cryogenic Structure Solved at Synchrotron
27.3.6 Phycocyanin: A Model Protein to Compare Different Crystal Delivery Methods to Obtain High‐Resolution Structures
27.3.7 Photosystem I and Ferredoxin: First Proof of Principle for Time‐Resolved Studies
27.3.8 Photosystem II: Unraveling the Water Oxidation Process – An Attempt to Make a Molecular Movie
27.4 Challenges in XFELs
27.5 Future Outlook
27.5.1 Comparison of Existing XFELs
27.6 Conclusion
Acknowledgments
References

Index


List of Contributors

Ruben Abagyan
Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA

Motoyasu Adachi
National Institutes for Quantum and Radiological Science and Technology, Japan Atomic Energy Agency, Tokai, Japan

Mauro Angiolini
Eudendron, Varese, Italy

Daniel Ayoub
Centre d’Immunologie Pierre‐Fabre, Saint‐Julien en Genevois, France

Ivet Bahar
Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA

Ahmet Bakan
Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA

Shibom Basu
Center of Applied Structural Discovery at Biodesign Institute, School of Molecular Sciences, Arizona State University, Tempe, AZ, USA and European Molecular Biology Laboratory, Grenoble Outstation, Grenoble, France

Alain Beck
Centre d’Immunologie Pierre‐Fabre, Saint‐Julien en Genevois, France

Andreas Bender
Department of Chemistry, University of Cambridge, Cambridge, UK

Amandine Boeuf
Centre d’Immunologie Pierre‐Fabre, Saint‐Julien en Genevois, France

P. Ann Boriack‐Sjodin
Epizyme, Inc., Cambridge, MA, USA and Accent Therapeutics, Inc., Lexington, USA

Andrea Bortolato
Sosei Heptares, Cambridge, UK

Gérard Bricogne
Global Phasing Limited, Cambridge, UK

Babs Briels
Division of Medicinal Chemistry, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands

Chun‐wa Chung
Medicinal Sciences and Technology, GlaxoSmithKline R&D, Stevenage, UK

Sarah Cianférani
Laboratoire de Spectrométrie de Masse BioOrganique, Institut Pluridisciplinaire Hubert Curien, Université de Strasbourg, CNRS, Strasbourg, France

Ben J. Davis
Vernalis Research, Cambridge, UK

François Debaene
Laboratoire de Spectrométrie de Masse BioOrganique, Institut Pluridisciplinaire Hubert Curien, Université de Strasbourg, CNRS, Strasbourg, France

Sacha De Carlo
DECTRIS Ltd., Baden‐Dättwil, Switzerland

Francesca Deflorian
Sosei Heptares, Cambridge, UK

Chris de Graaf
Division of Medicinal Chemistry, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands

Carien Dekker
Protein Sciences, Chemical Biology & Therapeutics, Novartis Institutes for Biomedical Research, Novartis Pharma AG, Basel, Switzerland

Marc‐André Delsuc
Institut de Génétique et de Biologie Moléculaire et Cellulaire, Illkirch, France

Hélène Diemer
Laboratoire de Spectrométrie de Masse BioOrganique, Institut Pluridisciplinaire Hubert Curien, Université de Strasbourg, CNRS, Strasbourg, France

Anindita Dutta
Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA

Ursula Egner
Bayer AG, Pharmaceuticals, Research & Development, Berlin, Germany

Anthony Ehkirch
Laboratoire de Spectrométrie de Masse BioOrganique, Institut Pluridisciplinaire Hubert Curien, Université de Strasbourg, CNRS, Strasbourg, France

Petra Fromme
Center of Applied Structural Discovery at Biodesign Institute, School of Molecular Sciences, Arizona State University, Tempe, AZ, USA

Raimund Fromme
Center of Applied Structural Discovery at Biodesign Institute, School of Molecular Sciences, Arizona State University, Tempe, AZ, USA

Arnaud Goepfert
Galapagos SASU, Romainville, France


Marek Grabowski
Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA

Michael M. Hann
Medicinal Sciences and Technology, GlaxoSmithKline R&D, Stevenage, UK

Michael Hennig
leadXpro AG, Villigen, Switzerland

Roman C. Hillig
Bayer AG, Pharmaceuticals, Research & Development, Berlin, Germany

Geoffrey A. Holdgate
Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Alderley Park, UK

Roderick E. Hubbard
Vernalis Research, Cambridge, UK and York Structural Biology Laboratory, University of York, York, UK

Andrey V. Ilatovskiy
Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA and Division of Molecular and Radiation Biophysics, Konstantinov Petersburg Nuclear Physics Institute, NRC Kurchatov Institute, Gatchina, Russia

Bruno Kieffer
Institut de Génétique et de Biologie Moléculaire et Cellulaire, Illkirch, France

Ryota Kuroki
Japan Atomic Energy Agency, Tokai, Japan

Chang Liu
Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA

Karolina A. Majorek
Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA and CRUK Beatson Institute, Glasgow, UK

Jonathan S. Mason
Sosei Heptares, Cambridge, UK

Hans Matter
Sanofi‐Aventis Pharma Deutschland GmbH, Frankfurt am Main, Germany

Wladek Minor
Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA

Lionel Mourey
Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, UPS, Toulouse, France

Minh Chau Nguyen
Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, UPS, Toulouse, France

Ayşegül Özen
Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA, USA and Blueprint Medicines, Cambridge, USA

Jean‐Denis Pedelacq
Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, UPS, Toulouse, France

Christopher Phillips
Structure, Biophysics and FBLG, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK

Hervé‐William Rémigy
ThermoFisher Scientific, Eindhoven, The Netherlands

Jean‐Paul Renaud
Urania Therapeutics, Ostwald, France

Arne Christian Rufer
Lead Discovery, pRED, Pharma Research & Early Development, F. Hoffmann‐La Roche Ltd, Basel, Switzerland

Gebhard F. X. Schertler
Division of Biology and Chemistry‐Laboratory of Biomolecular Research, Paul Scherrer Institute, Villigen, Switzerland and Department of Biology, ETH Zürich, Zürich, Switzerland

Celia A. Schiffer
Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA, USA

Herman Schreuder
Sanofi‐Aventis Pharma Deutschland GmbH, Frankfurt am Main, Germany

Dennis Scott
Pfizer, Groton, CT, USA

Ivan G. Shabalin
Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA

Indira H. Shrivastava
Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA

Hao Sun
Pfizer, Groton, CT, USA

Benjamin G. Tehan
Sosei Heptares, Cambridge, UK


Thomas C. Terwilliger
Los Alamos National Laboratory, Los Alamos, NM, USA

Ching‐Ju Tsai
Division of Biology and Chemistry‐Laboratory of Biomolecular Research, Paul Scherrer Institute, Villigen, Switzerland

Vladimir N. Uversky
Department of Molecular Medicine and USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA and Laboratory of New Methods in Biology, Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Russia

Alain Van Dorsselaer
Laboratoire de Spectrométrie de Masse BioOrganique, Institut Pluridisciplinaire Hubert Curien, Université de Strasbourg, CNRS, Strasbourg, France

Marc Vitorino
NovAliX, Illkirch, France

Elsa Wagner‐Rousset
Centre d’Immunologie Pierre‐Fabre, Saint‐Julien en Genevois, France

Daniel F. Wyss
Screening, Target and Compound Profiling, Merck Research Laboratories, Kenilworth, NJ, USA

Edward R. Zartler
Quantum Tessera Consulting, Collegeville, PA, USA

Heping Zheng
Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA

Matthew D. Zimmerman
Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA and Commonwealth Computer Research, Inc., Charlottesville, VA, USA


Preface

Compared to the age of the Earth, the presence of human life on it represents the last four seconds of a day. The same probably holds true for rational drug design compared to the immemorial quest of humans for medicines. Science has been an organized human endeavor since the seventeenth century, and it is only since the second half of the twentieth century that the combined progress of physics, chemistry, biology, and computer science has given birth to molecular structural biology, the basis for rational drug design. Not only is structural biology a young discipline, but it is also undergoing tremendous developments, which makes this a very exciting time for practitioners who apply it to drug discovery, where spectacular technical advances are meeting increasingly diverse and innovative approaches to therapeutic intervention.

On the technological side, we have moved from standard macromolecular X‐ray crystallography at third‐generation synchrotron radiation facilities to burgeoning new developments. Serial femtosecond crystallography at X‐ray free‐electron laser (XFEL) facilities has enabled macromolecular structure determination at room temperature using nanocrystals, yielding physiologically more relevant structures while overcoming both radiation damage and the need for large crystals, which can be game‐changing for challenging targets such as membrane proteins. XFELs also bring new insights into dynamics through time‐resolved studies leading to experimental “molecular movies,” in particular on ligand binding, that will certainly be useful for drug design. In addition, classical X‐ray crystallography has been reinventing itself with serial microcrystallography at synchrotron micro‐focus beamlines, in situ diffraction, automated crystal harvesting, multi‐axis goniometers, etc.

NMR is also constantly evolving with innovative methods for both ligand‐observed NMR to monitor target–ligand interactions at high throughput and protein‐observed NMR to solve structures and obtain detailed information on binding sites and on the dynamics of binding, with the help of higher magnetic fields, more sensitive probes, and residue‐specific labeling schemes.

Recently, however, the most spectacular advances have come from single‐particle cryo‐electron microscopy (cryo‐EM): the convergence of many technical improvements in sample preparation, in microscopes, in detectors, and in software has revolutionized the use of this imaging technique, leading to a wealth of high‐resolution structures of large and/or dynamic macromolecular assemblies, including many that could not be crystallized. In particular, single‐particle cryo‐EM will certainly play a major role in the study of membrane protein complexes. Another recently developed cryo‐EM technique, micro‐electron diffraction (MicroED), has been shown to yield high‐resolution structures from well‐ordered nanocrystals, but its wide applicability to macromolecular assemblies needs to be further assessed.

Structure‐based drug design nowadays relies more and more on dynamic information, both from experiments and from simulations, and on the integration of complementary information provided by a variety of biophysical techniques. This includes the thermodynamics of binding, even though thermodynamic data cannot yet be used in a predictive fashion and need to be discussed in correlation with very‐high‐resolution structures to be meaningful, and the kinetics of binding, although the concept of residence time should not be considered alone but in relation to pharmacokinetics. Finally, progress in structural bio‐/chemo‐informatics has been essential to complement experimental techniques with virtual screening and dynamics simulations and for the search for optimized chemical scaffolds, now taking into account the flexibility of both the target and the ligand and the energetics of bound water molecules.

On the therapeutic side, the repertoire of drug modalities and targets has widened considerably in recent years. Besides small molecules and biologics, new territories of drug chemical space are being explored, including hybrids such as antibody–drug conjugates (ADCs), macrocycles, peptides and peptidomimetics, RNA, DNA, genome editing systems, and cells. Targets once thought to be undruggable or highly challenging, such as membrane proteins and protein–protein interactions, are now actively pursued. For more established targets, new drug candidates are designed to act not only at active sites but also at allosteric sites to achieve higher selectivity. The classical single‐target approach has been extended to the multi‐target approach (polypharmacology) to tackle complex diseases. A protein of interest can now be targeted either directly or at the level of its encoding gene (gene editing), its mRNA (gene silencing, exon skipping, etc.), its transcriptional or translational regulation (regulatory factor modulation), or even its degradation (PROTAC™s). In all of these new therapeutic strategies, there is a thrilling game to play for structural biology in the frame of a renewed and extended partnership with medicinal chemistry, for instance, to design specific probes, PROTAC™s, or engineered CRISPR/Cas variants for improved genome editing.

The book gathers contributions from expert practitioners in the field and is divided into four sections:
Part I: Overview, concepts, and approaches.
Part II: Tools, with a strong focus on experimental techniques.
Part III: Case studies of structure‐based discovery on important, established but still challenging therapeutic target families, as well as on monoclonal antibodies as an example of biotherapeutics.
Part IV: Some present‐day frontiers such as intrinsically disordered proteins, neglected diseases, or viral resistance, and promising techniques for the future (XFELs and cryo‐EM). These techniques have recently experienced spectacular developments but are not yet ready to be used on a routine basis in drug discovery because of some remaining technical bottlenecks and issues such as throughput, access to high‐end equipment, and availability of experienced practitioners. However, progress in these fields is rapid and should be carefully monitored for drug discovery to benefit as soon as possible from the latest technological advances.

We hope this book will spark interest and communicate enthusiasm for the exciting current developments in the structural biology aspects of drug discovery. In this complex field, it is crucial that structural biologists work hand in hand with medicinal chemists, computational scientists, pharmacologists, and others in order to synergize their respective efforts and boost the development of new drugs. The future lies in the combined use of all available techniques, according to the French saying “faire feu de tout bois,” to make the best of the expanding structural biology and biophysics toolbox in order to accelerate drug discovery. Early‐stage drug discovery is probably at the dawn of a new era that will see a burst of new biological knowledge and innovative therapeutic approaches. Let us hope these new advances will translate into efficient and safe new therapeutic treatments for mankind.

Because this book encountered a number of difficulties along the way, I would like to wholeheartedly thank my editor Jonathan Rose for his extraordinary patience and my wife Christiane for her support and resilience. And last but not least, I am grateful to the many coauthors; the success of this book will be theirs.

Jean-Paul Renaud


Part I Overview, Concepts, and Approaches

1 The Evolving Role of Structural Biology in Drug Discovery
Jean‐Paul Renaud
Urania Therapeutics, Ostwald, France

Corpora non agunt nisi fixata.
Paul Ehrlich (1854–1915)

In science truth always wins.
Max Perutz (1914–2002)

A picture is worth a thousand words.

1.1 Introduction

Life is based on “informed matter” (matter + information) shaped by thousands of chemical reactions and intricate interaction networks, and biology primarily relies on the observation of structures, historically from organisms to organs, tissues, cells, organelles and, more recently, macromolecules and macromolecular assemblies, following the progress of observation instruments and techniques. In this book, “structural biology” refers to molecular structural biology, starting in 1953 with the discovery of the double‐helical nature of DNA, crucially based on unacknowledged experimental data from Rosalind Franklin, the true hero behind this discovery [1, 2]. However, the real breakthrough, achieved the same year, was the solution to the phase problem by Max Perutz [3], which led to the determination in the late 1950s of the first crystallographic structures of proteins, that of myoglobin at 2 Å [4, 5] and that of hemoglobin at 5.5 Å [6].

Modern drug discovery is a highly complex and multidisciplinary activity that began at the turn of the twentieth century [7], and structural biology started to have an impact in the 1970s [8]. A founding father in the field and a visionary, Max Perutz not only opened the way to macromolecular crystallography by solving the phase problem but also understood very early the huge potential of structural biology for drug discovery. In 1968, he was able to relate the clinical symptoms and abnormal properties of mutant hemoglobins to the probable structural alterations caused by the corresponding mutations, based on the crystal structure of hemoglobin [9]; in 1976, he reviewed the relevance of molecular biology, and particularly of molecular structural biology, to medicine, predicting that the three‐dimensional (3D) structures of enzymes could be used to design “tailor‐made” inhibitors to treat diseases [10]. Indeed, the first rational ligand design study, based on the crystal structure of deoxyhemoglobin in complex with 2,3‐diphosphoglycerate, appeared the same year [11].

The first drug derived from a structure‐based approach, the antihypertensive captopril [12], an angiotensin‐converting enzyme inhibitor, was approved in 1981, followed by dorzolamide [13], a carbonic anhydrase inhibitor used for the treatment of glaucoma, in 1995. The concept of structure‐based drug design (SBDD) became popular in the 1980s, but technical bottlenecks hampered its early adoption by the pharmaceutical industry due to the high costs and long timelines involved. The first important breakthrough had been recombinant DNA technology [14], which enabled the production in bacteria of large amounts of proteins for structural studies thanks to heterologous gene expression systems, but early expression systems often failed to produce soluble proteins from mammalian cDNAs, and few protein structures of therapeutic interest were available at the time.

In the late 1990s, a series of technical improvements and breakthroughs at all levels, including:

● new recombinant gene expression systems, including affinity tags for efficient protein or multiprotein complex purification,
● parallelization, miniaturization, and automation of gene expression tests; protein purification; and protein crystallization,

Structural Biology in Drug Discovery: Methods, Techniques, and Practices, First Edition. Edited by Jean-Paul Renaud. © 2020 John Wiley & Sons, Inc. Published 2020 by John Wiley & Sons, Inc.


● advances in X‐ray crystallography (cryo‐cooling, sample changers, micro‐focus beamlines, single‐photon‐counting detectors, selenomethionine derivatives for phasing, etc.) and nuclear magnetic resonance (NMR) (high‐field magnets, sample changers, cryo‐probes, site‐specific labeling, etc.),
● progress in software integration, as well as the massive increase in the speed of computers and in data storage capacities,

altogether leading to high‐throughput automated data collection and processing, made this adoption possible. Most of these developments were driven by the structural genomics (or, more correctly, structural proteomics) initiatives [15–17] that followed the genome sequencing projects, particularly the Human Genome Project [18] (see also the “Structural Genomics Supplement” of Nature Structural Biology, November 2000 issue). Structural genomics consortia initially worked on bacterial targets because they were more amenable to a high‐throughput approach, but later some of them, such as Structural Proteomics IN Europe (SPINE) and the Structural Genomics Consortium (SGC), focused on human targets, which are more difficult to tackle but of high potential value for drug discovery programs [19, 20].

In the late 1990s, another important approach, fragment‐based drug design (FBDD), reviewed in Chapter 4, was developed in industry, by both large pharmaceutical companies and small biotechs [16, 21–23], even though the seminal concept by William Jencks in 1981 [24] and the first proposal of fragment cocktail crystallography by Wim Hol in 1990 [25] came from academia. It emerged as an alternative to the then dominant high‐throughput screening (HTS) approach, based on massive combinatorial chemistry and looking for hits with nanomolar binding affinities, which in many cases failed to deliver easily optimizable hits. Through the screening of much smaller libraries (10²–10³) of smaller compounds called fragments (carefully selected compounds with a molecular mass around 200 Da) compared with the huge libraries (10⁵–10⁶) of “lead‐like” and “drug‐like” compounds with a molecular mass around 500 Da built for HTS, the chemical space is more efficiently sampled [26], and better starting points for the design of lead compounds are identified [27, 28].
Since the idea is to look for fragment hits that bind to “hot spots” on the target, even with millimolar affinities, and that can then be grown, linked, or merged into leads with much higher affinities, the rise of FBDD was made possible by the development of sensitive, high‐throughput biophysical techniques for fragment screening [29] and the careful design of synthetically useful fragment libraries [30]. FBDD has proved to be effective in accelerating the hit‐to‐lead process: two fragment‐derived drugs have already been marketed (the selective B‐Raf V600E inhibitor vemurafenib in 2011 for the treatment of metastatic melanoma, only six years from project inception to market, and the Bcl‐2 inhibitor venetoclax in 2016 for the treatment of chronic lymphocytic leukemia), and dozens of compounds are currently in clinical trials [31].

Structural biology is now fully integrated in the pharmaceutical industry drug discovery pipeline, and many companies use both HTS and fragment screening to identify starting points for their small‐molecule R&D programs. As HTS is still widely used [32] even though it is labor intensive and costly, combining HTS with FBDD makes it possible to get the best of both worlds, for instance, through deconstruction/reconstruction strategies [33] that can help in lead optimization, in particular by suggesting ideas for scaffold hopping. This has to be considered in a larger context where drug discovery in big pharma companies tends to be more and more highly automated [34]. Hence the requirement, for structural biology to significantly impact a drug discovery program, that the structure of a newly validated target be available right from the start, with a robust system (mostly based on X‐ray crystallography up to now) already set up to allow the high‐throughput determination of hundreds of target–ligand complex structures.

For an optimal therapeutic effect, a drug should have a known mechanism of action. In the old days, drugs were developed from “black‐box” phenotypic screenings using biological assays, mainly cellular or animal models related to pathological states, but the attrition rate was high due to poor knowledge of the human biology underlying diseases. Progress in biochemistry and molecular biology has triggered a shift toward target‐based drug discovery, boosted by the advent of high‐throughput genomics and proteomics technologies and the completion of human genome sequencing [35].
The target‐based approach, to which SBDD applied to small molecules is ideally suited, has been dominant for the past three decades. However, this approach also failed to significantly decrease attrition in clinical phases, mostly due to lack of efficiency, which can be explained by the reductionist character of the “one disease–one target–one drug” paradigm and often by insufficient validation of the target [36]. This is why recent years have seen a renaissance of phenotypic drug discovery [37, 38] with powerful new tools such as high‐content screening, ultimately leading to the identification of new targets, for instance, through the use of chemogenomics libraries for which both the target and the mechanism of action are known [39].

Classically, proteins have essentially been the targets of small‐molecule drugs, but, with the advent of biologics (also called biotherapeutics or biopharmaceuticals), they have also become drugs themselves, including hormones, enzymes, and antibodies, and the structural biology methods were already available to characterize and optimize them. Already in 1972, the biological properties of chemically modified insulins were discussed in relation to structure [40]. Insulin started to be used as a drug in the early 1920s, but it was purified from porcine or bovine pancreas. Recombinant human insulin, the first “biologic” in the modern sense, was marketed in 1982 by Eli Lilly. Since then, the number of biologics has been steadily increasing, reaching 38% of the R&D pipeline drugs in 2017, mostly antibodies and other recombinant proteins [41]. In 2017, among the 10 top‐selling drugs worldwide, 8 were biologics, including 6 antibodies, 1 protein conjugate vaccine, and 1 recombinant insulin analog [42]. However, among the newly approved therapies over the 2010–2017 period, the small‐molecules‐to‐biologics ratio has remained fairly constant (~75%/25%) [41], suggesting that the importance of biotherapeutics will certainly continue to increase but without putting an end to small‐molecule drugs, at least in the near future. Biologics are now diversifying, involving other types of macromolecules such as RNA [43] or DNA (for gene therapy), and are becoming more complex, e.g. CRISPR/Cas genome editing systems [44] and cell‐based therapies.

Molecular recognition between a target and a ligand is a central theme of structural biology in drug discovery, starting from Paul Ehrlich’s statement “Corpora non agunt nisi fixata.” Modern SBDD relies not only on structural information but also on information on the thermodynamics, kinetics, and dynamics of target–ligand interactions, hence the inclusion of complementary information from various biophysical techniques as well as molecular dynamics simulations. Chapter 5 summarizes how thermodynamic and kinetic information can be used in SBDD.
For a long time, SBDD relied mostly on the essentially static information from crystal structures and on dynamic information from NMR. Target–ligand interaction models have evolved from a static to a dynamic vision, from lock‐and‐key to induced fit to conformational selection and ligand binding ensembles. Protein flexibility is now recognized as a crucial parameter for protein function and, like ligand flexibility, is increasingly taken into account in drug design thanks to ever‐increasing computing power. Allostery [45–47] has become an important concept in drug design and is discussed in Chapter 6.

1.2 The Expanding Toolbox of Structural Biology for Drug Discovery

The precise knowledge of the structure of cellular macromolecules (proteins and nucleic acids), their stable or transient assemblies (multi‐protein complexes and nucleoprotein complexes), and their complexes with lipids, cofactors, and ligands (endogenous, dietary, or xenobiotic) is essential to understand their function, in particular their dysfunction in disease, and to design therapeutic strategies aiming at directly or indirectly modulating their activity by drugs. X‐ray crystallography and NMR have long been the most used structural biology techniques in drug discovery, but the toolbox has expanded with the increasing role of biophysics [29], the use of a whole range of computational techniques ranging from docking to molecular dynamics ([48] and Chapter 15), and the recent burst of cryo‐electron microscopy (cryo‐EM) thanks to the so‐called resolution revolution [49, 50]. Structural biology has now become instrumental in target‐based/mechanism‐based drug discovery, and Part II of the book reviews the main techniques used in structural biology in a drug discovery context. The direct imaging techniques used in structural biology are shown in Table 1.1.

To obtain atomic‐scale information on target–ligand complexes, even though atomic resolution (a term to be used with caution [51, 52]) is rarely achieved, the main direct imaging technique that has been used is single‐crystal X‐ray crystallography, mostly at synchrotrons. It is not the purpose of this book to describe in detail the basics and various aspects of macromolecular X‐ray crystallography, a huge field in itself, and readers who want more comprehensive information on this topic are encouraged to refer to a selection of outstanding books [53–55]. High‐throughput crystallography, made possible by extensive automation at third‐generation synchrotron radiation sources [56–60] optimized for high brilliance, has matched the needs of industrial programs [15, 16, 61]. However, a number of limitations remained, and some of them have been addressed by recent developments in the field.
For instance, in situ diffraction [62–65] and automated crystal harvesting [66–68] have alleviated the burden of handling and mounting crystals. Microcrystallography involving micro‐focus beams 20 μm or less in diameter [69] allowed microcrystals (10–50 μm) or imperfect crystals to be used for data collection, circumventing the problem of growing large crystals for challenging targets including membrane proteins. Even so, radiation damage remains an important issue: it is significantly reduced by cryo‐cooling [70, 71], but it can be eliminated with serial femtosecond crystallography (SFX), a radically new approach that uses an X‐ray free‐electron laser (XFEL) facility as a source of femtosecond X‐ray pulses [72–75]. Nanocrystals are brought to the beam either via the continuous stream of a viscous medium, thanks to an injection device, or on a fixed target that is rastered by the beam. They are never shot more than once by an X‐ray pulse, leading to room‐temperature, radiation‐damage‐free structures.


1 The Evolving Role of Structural Biology in Drug Discovery

Table 1.1 Direct imaging techniques used in drug discovery.

Sample | Probe | Technique | Interaction | Outcome | Maximum resolution obtained with macromolecules
Solid (crystal) | Photons (X‐rays) | X‐ray crystallography | Electrons | Electron density map | ~0.5 Å
Solid (crystal) | Electrons | Electron crystallography (including micro‐electron diffraction [MicroED]) | Electrons and protons | Electrostatic potential map | ~1 Å
Solid (crystal) | Neutrons | Neutron crystallography | Nuclei | “Nuclear density” map | ~1 Å
Solid (cryo‐cooled solution) | Electrons | Cryo‐electron microscopy (cryo‐EM) | Electrons and protons | Electrostatic potential map | ~1.5 Å
Solution | Photons (X‐rays) | Small‐angle X‐ray scattering | Electrons | Molecular envelope | ~1 nm
Solution | Neutrons | Small‐angle neutron scattering | Nuclei | Molecular envelope | ~1 nm
Cryo‐cooled cell suspension | Electrons | Electron tomography | Electrons and protons | Electrostatic potential map | ~5 nm
Cryo‐cooled cell suspension | Photons (X‐rays) | Soft X‐ray tomography | Electrons | Electron density map | ~25 nm

The “diffract before destroy” concept behind XFEL was put forward in 1986 [76], simulated in 2000 [77], and experimentally established in 2011 [72]. SFX has been most successful for structure determination of membrane proteins grown in a lipidic cubic phase (LCP), used as both the crystallization and sample delivery medium [74, 75], and also enables ultrafast time‐resolved studies (“molecular movies”) [78–80]. SFX also provides more native structures, since working at room temperature avoids the bias of conformational trapping at very low temperature that comes with cryo‐cooling. Pioneers in the field describe in Chapter 27 the historical landmarks of the use of XFELs in structural biology and the remaining challenges. Finally, the spectacular success of SFX prompted synchrotron scientists to combine the use of a micro‐focus beamline with a rapid detector and microcrystals, either on a fixed target or in the continuous flow of a microcrystal suspension, to perform multi‐crystal data collection, leading to serial microcrystallography (SMX) at synchrotron facilities, also called serial synchrotron crystallography (SSX) [81, 82]. In this way, radiation damage is minimized, even at room temperature [83]. Chapter 10 provides an exhaustive account of these recent technical advances in X‐ray crystallography as well as an outstanding integrated vision of the whole process, from crystals to diffraction data to electron density map to model, and how it should be optimally implemented in the future to address the high‐throughput imperative of the pharmaceutical industry.

Neutron crystallography (Chapter 12) has also been used, albeit marginally, because of the paucity of neutron sources and the need for large crystals (~mm^3) to compensate for the low flux of available neutron beams, even though the new spallation sources produce much more intense beams, allowing for much shorter collection times and the use of smaller crystals [84, 85]. However, neutron crystallography brings complementary, sometimes crucial information on the position of hydrogen atoms, in particular the protonation state of the ligand, of specific amino acids of the target, and of water molecules in the active site. Indeed, the reversible binding of protons plays a key role in the catalytic mechanism of many enzymes and in allosteric regulation. Neutrons interact with nuclei, and the scattering lengths of hydrogen (and deuterium) nuclei are of the same order of magnitude as those of other atoms; hence the H/D atoms are clearly seen in “nuclear density” maps (although it is slightly more complicated for hydrogen due to its negative scattering length, hence the widespread use of H/D exchange). On the other hand, X‐rays are scattered by electrons, and thus X‐ray scattering depends on the atomic number Z, and hydrogen atoms are hardly seen in X‐ray crystallographic structures, except (partially) at ultrahigh resolution. Even though the position of most hydrogen atoms can be inferred from the positions of the other atoms from stereochemical considerations (sometimes only statistically due to the rotational degrees of freedom), some hydrogen atoms cannot be unambiguously located, for instance on histidine, aspartate, and glutamate residues. It is sometimes essential to have a complete knowledge of an enzymatic catalytic mechanism in order to design an efficient inhibitor, and in such cases neutron crystallography is a powerful complement to X‐ray crystallography, as shown by recent joint X‐ray/neutron crystallographic studies for drug design (see, for instance, [86–88]). Another advantage of neutron crystallography is the lack of radiation damage to crystals, even at room temperature, with the beams used for diffraction experiments.

The third crystallographic technique used in structural biology is electron crystallography. It was used in the early days of electron microscopy to obtain structures from 2D crystals [89–92], but 2D crystals are not always available, and its use declined with the development of single‐particle analysis (SPA) cryo‐EM (see below). Electron crystallography has recently reemerged thanks to a new electron diffraction technique called MicroED that uses 3D microcrystals [93, 94].

For all crystallographic studies (using X‐rays, neutrons, or electrons), the first and obligatory step is to obtain crystals, hence the importance of the science of macromolecular crystallization (see part 4 of [54] as well as [95–99]). Industrial aspects of target–ligand crystallization campaigns are described in Chapter 8. Crystallization of membrane proteins is especially challenging and is described separately in Chapter 9; as it will certainly remain a real bottleneck for crystallographic structure determination, cryo‐EM will certainly play a very important complementary role in this domain.
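The contrast drawn above between X‐ray scattering (which depends on the atomic number Z) and neutron scattering (which depends on the nucleus) can be made concrete with a few numbers. The sketch below is illustrative only: it takes Z as a crude proxy for X‐ray scattering power at low angle, and uses rounded literature values of bound coherent neutron scattering lengths that are supplied here as an assumption and are not quoted in the text; the helper name is hypothetical.

```python
# Per atom: (atomic number Z, bound coherent neutron scattering length in fm).
# Z stands in for X-ray scattering power at low scattering angle; the neutron
# values are rounded literature values used here for illustration only.
ATOMS = {
    "H": (1, -3.74),
    "D": (1, 6.67),
    "C": (6, 6.65),
    "N": (7, 9.36),
    "O": (8, 5.80),
}


def relative_to_carbon(symbol):
    """Return (X-ray, neutron) scattering of an atom relative to carbon."""
    z, b = ATOMS[symbol]
    z_carbon, b_carbon = ATOMS["C"]
    return z / z_carbon, b / b_carbon


if __name__ == "__main__":
    print("atom  X-ray (Z/Z_C)  neutron (b/b_C)")
    for symbol in ATOMS:
        xray, neutron = relative_to_carbon(symbol)
        print(f"{symbol:>4}  {xray:13.2f}  {neutron:15.2f}")
```

With these values, hydrogen is a weak X‐ray scatterer compared with carbon, whereas for neutrons deuterium scatters about as strongly as carbon and hydrogen has a negative scattering length, which is consistent with the use of H/D exchange mentioned above.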
It is of utmost importance that all macromolecules or molecular complexes that are to be used for crystallization trials, as well as for almost all structural or biophysical studies, be of the highest possible purity and that their quality (identity, conformational homogeneity, oligomeric state, stability in terms of oxidation and aggregation, characterization of contaminants, etc.) be biophysically assessed through a thorough quality control protocol, as described in Chapter 7. This is even more crucial when the macromolecule is a biotherapeutic drug candidate, as illustrated in Chapter 21 for the characterization of therapeutic antibodies by mass spectrometry techniques. When the macromolecule or molecular complex is the target, it has been shown that the use of biophysical methods increases the success in obtaining target–ligand complex crystal structures [100].

A major recent evolution in structural biology has been the advent of cryo‐EM as another major tool for the high‐resolution structure determination of macromolecular complexes. Like neutron crystallography, electron microscopy is an outgrowth of the wave–particle duality hypothesis formulated in 1924 by Louis de Broglie in his PhD thesis [101], a reminder of the need to support basic science. The first electron microscope was built by Ernst Ruska and Max Knoll in 1931, and since then the limit in resolution has been regularly pushed back by progress in technology (accelerating voltage, electron optical lenses, vacuum systems, etc.). But radiation damage was a bottleneck until the invention of cryo‐EM in the early 1980s [102]. However, for a long time, its use was restricted to the study of large complexes that could not be crystallized and usually led to low‐resolution models. The most significant breakthrough in recent years has certainly been the spectacular progress of SPA cryo‐EM. Recent improvements in microscopes, in sample preparation and vitrification, and in software (object classification, projection image alignment), as well as the introduction of single‐electron detectors and Volta phase plates, have made it possible to cross the 3‐Å resolution limit and to study smaller objects (see, for instance, [103–105] and Chapter 26). SPA cryo‐EM has already made and will continue to make a huge impact in biology by enabling the structural study of a number of biologically important, large, and/or dynamic assemblies that cannot be crystallized, including complexes that undergo large conformational changes such as the spliceosome. The role of cryo‐EM in drug discovery has also been discussed recently [106–111]; although there is clearly a huge potential ahead, it appears that at present cryo‐EM is not yet ready to be used as a routine tool for SBDD. Important barriers have already started to be broken down, namely, in terms of resolution achieved and size of study objects ([103] and table 1 of [111]), but major bottlenecks such as sample vitrification, throughput of data collection, and access to microscopes have to be overcome for cryo‐EM to compete with high‐throughput X‐ray crystallography. Robust automated workflows with rigorous criteria to assess the quality of cryo‐EM maps and structures still need to be set up.
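The link between de Broglie's hypothesis and the accelerating voltage mentioned above can be illustrated with the relativistic expression for the electron wavelength, λ = h / sqrt(2 m0 eV (1 + eV / (2 m0 c^2))). The following is a minimal sketch: the function name and the chosen voltages (100, 200, and 300 kV, typical of commercial instruments) are illustrative assumptions, and the constants are the SI/CODATA values.

```python
import math

PLANCK = 6.62607015e-34            # Planck constant, J*s
ELECTRON_MASS = 9.1093837015e-31   # electron rest mass, kg
ELEMENTARY_CHARGE = 1.602176634e-19  # C
LIGHT_SPEED = 299792458.0          # m/s


def electron_wavelength_m(voltage_v):
    """Relativistic de Broglie wavelength (m) of an electron accelerated
    through a potential difference voltage_v (V). Hypothetical helper."""
    if voltage_v <= 0:
        raise ValueError("accelerating voltage must be positive")
    kinetic = ELEMENTARY_CHARGE * voltage_v  # kinetic energy in joules
    rest = ELECTRON_MASS * LIGHT_SPEED ** 2  # rest energy in joules
    momentum = math.sqrt(2 * ELECTRON_MASS * kinetic * (1 + kinetic / (2 * rest)))
    return PLANCK / momentum


if __name__ == "__main__":
    for kilovolts in (100, 200, 300):
        wavelength_pm = electron_wavelength_m(kilovolts * 1e3) * 1e12
        print(f"{kilovolts} kV: wavelength = {wavelength_pm:.3f} pm")
```

The wavelengths obtained are a few picometres, far shorter than the ångström‐scale resolutions discussed in this chapter, which is consistent with the point made above that the attainable resolution has been set by technology and radiation damage rather than by the wavelength itself.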
The other main experimental technique to build atomic models of macromolecules and their complexes is NMR, although it is not a direct imaging technique but rather a spectroscopic technique. NMR yields detailed structural information from the chemical environment of atomic nuclei and the interactions between them. Moreover, NMR gives access to the dynamics of molecular systems in solution. It is also a versatile and sensitive tool to study target–ligand interactions. Due to its importance, two chapters are dedicated to NMR: Chapter 13 describes its use for the determination of protein structure and dynamics, which is crucial for the characterization of biotherapeutics and for the study of challenging targets such as intrinsically disordered proteins (IDPs) or disordered protein domains that cannot be crystallized (see also Chapter 25), and Chapter 14 describes the use of NMR studies of target–small molecule interactions in drug discovery.

Table 1.2 recapitulates key events in the development of the three main techniques (X‐ray crystallography, NMR, and cryo‐EM) used for 3D structure determination of macromolecules and their adoption for drug discovery by the pharmaceutical industry (adapted from [111]). Concerning X‐ray crystallography, Tom Blundell, another pioneer in the field, has given a vivid account of the exchanges between academia and industry over the whole period [172].

Table 1.2 Selected key events in the development of the three main techniques (X‐ray crystallography, NMR, and cryo‐EM) used for 3D structure determination of macromolecules and their adoption for drug discovery by the pharmaceutical industry.

Year | Event | Associated Nobel Prize

Field: X‐ray crystallography
1953 | Max Perutz solved the phase problem [3] |
1958–1960 | Crystal structures of myoglobin (Kendrew) [4, 5] and hemoglobin (Perutz) [6] | Max Ferdinand Perutz and John Cowdery Kendrew (1962)
1969 | Structure of insulin [112] | Dorothy Crowfoot Hodgkin (1964)
1970s | Structures of globins, enzymes, and polypeptide hormones |
1971 | Protein Data Bank [113] |
1979 | Collaborative Computational Project No. 4 (CCP4): Software for Macromolecular X‐Ray Crystallography [114] |
1981 | Approval of Capoten™ (captopril), first drug derived from a structure‐based approach (hypothetical model of the target, ACE, based on the crystal structure of carboxypeptidase A, a mechanistic homolog) [12] |
1985 | First X‐ray crystallographic determination of a membrane protein structure [115] | Johann Deisenhofer, Robert Huber, and Hartmut Michel (1988)
1989 | Crystal structures of viral proteases [116–120] |
1994 | Structure of F1‐ATPase [121] | John Walker (1997)
1995 | Approval of Trusopt™ (dorzolamide), first drug unequivocally derived from structure‐based drug design (SBDD) [13] |
1998 | First structure of an ion channel [122] | Roderick MacKinnon (2003)
2000 | First structure of a mammalian GPCR [123] |
2000s | High‐throughput X‐ray crystallography at synchrotron beamlines: sample changers, automated operation, and remote data collection [56–60] |
2003 | 40+ compounds derived from SBDD have entered clinical trials; 7 of them have been approved [124] |
2011 | Refined structure of the yeast ribosome, largest asymmetric macromolecular assembly for which a crystal structure was determined so far [125] |
2011 | Structure of a GPCR–G protein complex [126] | Brian K. Kobilka (2012)
2011 | Approval of vemurafenib, first drug derived from FBDD [28, 127] |

Field: NMR
1946 | Invention of nuclear magnetic resonance [128, 129] | Felix Bloch and Edward Mills Purcell (1952)
1957 | First spectrum of bovine pancreatic ribonuclease at 40 MHz [130] |
1966 | Application of Fourier transform spectroscopy to magnetic resonance [131, 132] | Richard Ernst (1991)
1971 | 2D NMR introduction [133–135] |
1985 | Proteinase inhibitor IIA solution structure by 1H NMR [136] | Kurt Wüthrich (2002)
1988 | 3D NMR introduction [137, 138] |
1996 | First proof of concept for FBDD: “SAR by NMR” [21] |
1999 | Saturation transfer difference (STD) NMR [139] |
2000 | Screening by NMR against proteins containing 13C‐labeled methyl groups [140] |
2000 | Screening by NMR based on magnetization transfer from bulk water (WaterLOGSY) [141] |
2000–2002 | Spin labels to identify protein–ligand interactions by NMR [142] |
2002 | “NMR reporter screening” for the detection of high‐affinity ligands [143] |
2003 | High‐throughput NMR screening using fluorine (FAXS) [144] |
2004 | Affinity‐tag‐based NMR screening of protein–protein interactions [145] |
2010 | NMR sample changer for medium‐throughput automation [146] |

Field: Cryo‐EM
1968 | 3D reconstruction of the helical tail of bacteriophage T4 by electron crystallography [89] | Aaron Klug (1982)
1981 | Cryo‐electron microscopy of thin vitrified specimens [102, 147, 148] | Jacques Dubochet (2017)
1981 | Use of multivariate statistics in analyzing the images of biological macromolecules [149] | Joachim Frank (2017)
1990 | Structure of bacteriorhodopsin by electron crystallography [91] | Richard Henderson (2017)
1995 | In theory, 10 000 images of particles are sufficient to provide an atomic‐resolution structure of a 100‐kDa complex with EM [150] | Richard Henderson (2017)
1996 | Refined structure of bacteriorhodopsin by electron crystallography [151] | Richard Henderson (2017)
2002 | Electron Microscopy Data Bank (EMDB) [152] |
2005 | Structure of aquaporin 0 at 1.9 Å by electron crystallography reveals lipid–protein interactions [153] |
2007–2008 | First structures of high‐symmetry viral particles at near‐atomic resolution [154–156] |
2008 | First structure of a non‐icosahedral particle at near‐atomic resolution [157] |
2009 | Fully automated 12‐sample changer [158, 159] |
2011–2013 | Direct electron detector introduced, with “movie” mode [160, 161] |
2012 | Launching of Collaborative Computational Project for Electron Microscopy (CCP‐EM) [162] |
2012 | Proteasome: first structure of a mega‐complex solved by a combination of cryo‐EM and X‐ray crystallography [163] |
2013–2014 | Structures at near‐atomic resolution by cryo‐EM [161, 164–166] |
2014 | Volta phase plates introduced [167] |
2017 | Structures of GPCR–G protein complexes at near‐atomic resolution [168, 169] |
2017 | Rabbit muscle aldolase structure at 2.6 Å solved by single‐particle cryo‐EM at 200 keV [170] |

Field: Orthogonal techniques
1969 | Transfection technology [171] |
1973–1974 | Recombinant DNA [14] |
1980s | Structural bioinformatics (molecular modeling and computational chemistry) |
1981 | Fragment‐based drug design (FBDD) conceptualized by William Jencks [24] |

Source: Adapted from figure 2 of [111].

As an experimental structure is always a model built from experimental data, it always relies in part on interpretation, and structural information (in practice, mostly derived from X‐ray crystallography) should always be analyzed with a critical eye and used with caution in drug design [173, 174]; hence the crucial need for a thorough validation process, as explained in detail in Chapter 11 [175]. X‐ray crystallography maps show electron densities and thus to some extent represent the chemical identity of atoms (although this begins to appear only at ultrahigh resolution). However, these maps may be biased, since they are computed from both measured intensities (including experimental errors) and estimated or calculated phases that also include errors; moreover, a frequent source of problems is density overfitting. Concerning cryo‐EM, one should bear in mind that it provides electrostatic potential maps (thus not directly comparable with electron density maps derived from X‐ray diffraction data) that should also be interpreted with caution [176]. Furthermore, concerns have recently been raised about the lack of a thorough validation protocol for cryo‐EM structures compared with X‐ray crystallography, emphasizing the need for more validation tools [177, 178] and stricter cryo‐EM structure deposition standards [179].

Chapter 15 serves as an introduction to computational structural biology/structural bioinformatics, a huge domain that is beyond the scope of this book but is nevertheless an essential component of structural biology for drug discovery.
Finally, some techniques are not described in detail in this book but have been reviewed elsewhere, such as small‐angle X‐ray scattering (SAXS) and small‐angle neutron scattering (SANS) [180, 181] and solid‐state NMR [182]. Indeed, both X‐rays and neutrons can also be used to study macromolecules in solution (Table 1.1). SAXS and SANS are low‐resolution techniques, but they usefully complement high‐resolution structure analysis by bringing information on the shape and dynamics of large complexes and of IDPs.

A number of established and emerging complementary biophysical techniques are used to study target–ligand interactions (see tables 1 and 2 in [29] and references therein), either in a high‐throughput fashion to screen large libraries or with a lower throughput to obtain detailed information on the binding of selected compounds. For screening purposes, the most popular techniques these days are surface plasmon resonance (SPR); differential scanning fluorimetry (DSF), also called “thermal shift assay”; and microscale thermophoresis (MST). Techniques with a lower throughput but a higher information content include isothermal titration calorimetry (ITC) and non‐covalent/native mass spectrometry. Some techniques are able to probe conformational changes, such as second‐harmonic generation (SHG), surface acoustic waves, electrically switchable nanolevers, and, more recently, attenuated total reflection Fourier transform infrared (ATR‐FTIR) spectroscopy [183]. Some recent books [184, 185], review articles [29, 186], and a specific conference [187] are dedicated to the growing use of biophysics in drug discovery.

In summary, the increasing use of integrative structural biology in drug discovery implies the need for a comprehensive toolbox: in general, one technique cannot bring all answers, and thus a combination of complementary techniques must be used, each having its own limitations and shortcomings. It is also very important to use orthogonal techniques based on different physical principles in order to cross‐validate results, which considerably reduces the number of false positives and thus helps to save time and effort.
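The cross‐validation idea in the summary above can be sketched as a simple set operation: a compound is retained only if it is reported as a hit by several assays that rely on different physical principles. The example below is a minimal illustration; the function name, the compound identifiers, and the hit lists are entirely hypothetical, and real campaigns would also weigh affinities, assay quality, and structural confirmation.

```python
from collections import Counter


def cross_validated_hits(hits_by_assay, min_assays=2):
    """Return compounds reported as hits by at least min_assays assays.

    hits_by_assay maps an assay name to an iterable of compound identifiers.
    """
    counts = Counter()
    for hits in hits_by_assay.values():
        counts.update(set(hits))  # count each compound once per assay
    return sorted(c for c, n in counts.items() if n >= min_assays)


if __name__ == "__main__":
    # Hypothetical screening results from three orthogonal techniques.
    example = {
        "SPR": {"cpd-001", "cpd-002", "cpd-005"},
        "DSF": {"cpd-002", "cpd-003", "cpd-005"},
        "MST": {"cpd-002", "cpd-004"},
    }
    print(cross_validated_hits(example))                 # ['cpd-002', 'cpd-005']
    print(cross_validated_hits(example, min_assays=3))   # ['cpd-002']
```

Raising `min_assays` trades sensitivity for confidence: compounds seen by a single technique (possible false positives) are dropped first, at the risk of discarding weak but genuine binders.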

1.3 The Various Uses of Structural Biology in Drug Discovery

The basic task of molecular structural biology has been to provide information on the 3D structures and conformational dynamics of the main chemical players of life: macromolecules (proteins and nucleic acids), alone or in complex with natural or synthetic ligands or in more complex assemblies. The Protein Data Bank (PDB) has been a repository for such structures [188], and, since it was established in 1971, ~156 000 structures have been deposited in the PDB, including ~139 000 by X‐ray crystallography (~89.1%), ~12 800 by NMR (~8.2%), and ~3 800 by electron microscopy (~2.4%) (figures as of September 2019; for an updated breakdown, see https://www.rcsb.org/pdb/statistics/holdings.do).

This structural knowledge provides the basis to elucidate biological mechanisms at the molecular level, to propose new therapeutic modes of intervention/modulation, and for rational drug design. Indeed, a deep understanding of biological mechanisms underlying pathologies is key to successful drug discovery [189]. Today, structural biology is useful throughout the early‐stage drug discovery pipeline (from target identification to lead optimization), but, in the early days, its role was essentially confined to the last stage, lead optimization, which is the classical SBDD approach to improve compound potency and selectivity. Figure 1.1 summarizes the various uses of structural biology in drug discovery.

Figure 1.1 Various uses of structural biology in the drug discovery pipeline. Cryo‐EM, cryo‐electron microscopy; NMR, nuclear magnetic resonance.
[Figure panels, as recoverable from the extracted labels:
Target identification and validation: discovery biology/basic research (in complement to genetic and genomic approaches, phenotypic screening, and chemical biology); X‐ray crystallography, cryo‐EM.
Target evaluation: ligandability assessment; X‐ray crystallography, NMR, other biophysical techniques, and structural bioinformatics.
Hit identification: structure‐based screening (binding assays), FBDD, SBDD; ligand‐ and protein‐observed NMR, X‐ray crystallography, various other biophysical techniques, and structural bioinformatics.
Hit‐to‐lead and lead optimization: X‐ray crystallography, protein‐observed NMR, and other biophysical techniques with the help of structural bioinformatics and chemoinformatics.]

Indeed, in the 1980s, crystal structures of the target (or of a homolog when the structure of the target itself was not available) started to be used for lead optimization, but, because of the time needed to solve crystal structures, structural biology could not be used at the upstream stages. One of the first reviews on the early attempts to use structural biology for rational drug design dates back to 1986 [190]. Since then, SBDD has become a very powerful strategy to assist medicinal chemists in delivering optimized leads from selected validated hits, essentially by improving the affinity and selectivity of a ligand for a target from the visualization of target–ligand interactions, guided by the progress in biological activity, through iterative (structure → design → synthesis → test) cycles until some desired criteria (e.g. affinity/efficacy/potency and various pharmacological parameters) are reached. There are two major requirements for this approach to be successfully applied:

1) The 3D structure of the target must be available at the start of the project, meaning that the target is known (target‐centric drug discovery).
2) A robust crystallographic pipeline must have been set up to progress rapidly.

Indeed, the principles of SBDD were already set in the 1980s, but it was only after several bottlenecks in throughput were overcome thanks to:

● the streamlining of the gene‐to‐crystal pipeline,
● automated data collection at synchrotron (the crystal‐to‐diffraction data pipeline),
● automated structure determination and ligand building (the diffraction data‐to‐structure pipeline),
● progress in computers (computing speed and data storage capacities),

that SBDD could be implemented in pharma companies. High‐throughput X‐ray crystallography could then be used not only for lead optimization but also for the upstream stages of the drug discovery pipeline (Figure 1.1):

i) Target identification and validation (discovery biology): as a major tool of structural proteomics and in complement to genetic and genomic approaches, phenotypic screenings (a beautiful example in the case of PCSK9, a challenging target, is given in [191, 192]), and chemical biology, all of which generate data and hypotheses that need to be validated and can lead to new ideas for therapeutic intervention. Chemical probes [193, 194] are a great example of how closely structural biology and medicinal chemistry are intertwined in drug discovery: structural biology can help to design highly potent and selective chemical probes that in turn are used for target validation. Once a target is validated, structural biologists and medicinal chemists continue to work hand in hand to deliver even more potent and selective compounds with drug‐like properties. It also happens that a higher‐quality chemical probe (thanks to structure‐based optimization) leads to the de‐validation of a target [195]. Availability of crystal structures also enables structure‐based filtering or virtual screening for target deorphanization, e.g. in the case of nuclear receptors [196] or G protein‐coupled receptors (GPCRs) [197, 198]. Hits can then be validated for receptor binding by X‐ray crystallography or other biophysical methods, optimized through structural studies,


and the resulting chemical tools used to explore biology. SPA cryo-EM has also become an invaluable technique for unveiling new structures of challenging targets and large/dynamic macromolecular complexes, as seen from the recent deluge of exciting publications (see, for instance, table 1 of [111]; since 2002, cryo-EM structures are deposited in the Electron Microscopy Data Bank [EMDB] [152, 199]), uncovering new biology and potentially leading to new therapeutic strategies (see, for instance, [200–202]).

ii) Target evaluation (ligandability/druggability assessment), where experimental methods complement sequence-related computational methods and structural bioinformatics [203], as described in detail in Chapter 2, in order to assess at a very early stage whether a target can be modulated by small molecules with drug-like properties, which is key for target prioritization, and to increase the success rate of drug discovery projects.

iii) Hit identification (e.g. fragment screening using X-ray crystallography [22, 204]). Numerous biophysical techniques are used to probe target–ligand interactions (see, for instance, table 1 of [29]), but the great advantage of X-ray crystallography screening is that seeing a ligand in the electron density map not only confirms binding and the identity of the ligand (which minimizes false positives and allows direct deconvolution when compound cocktails are used) but also reveals the precise binding site and the details of the interaction, enabling hit prioritization and constituting the starting point for hit-to-lead optimization. Crystallography has even been used as an affinity technique to isolate and identify strong binders from partially purified natural extracts [205].

iv) Hit-to-lead (typically the FBDD approach), where high-throughput determination of target–ligand complex crystal structures greatly accelerates the fragment-to-lead process.

Structural bioinformatics (Chapter 15) is an essential complement to experimental methods and can be used, for instance:

● to help introduce selectivity by discriminating the target from close relatives that should not be targeted [206],
● to predict off-targets that may cause adverse side effects [207, 208],
● to design covalent drugs from high-affinity non-covalent ligands [209].

When the protein of interest is no longer the TARGET, as in the case of small-molecule drugs, but the DRUG itself, as in the case of biotherapeutics (including monoclonal antibodies, recombinant proteins, peptides, etc.), structural biology techniques and other biophysical techniques are also used all along the R&D pipeline but with different goals: discovery biology, biotherapeutics selection, biotherapeutics structure determination (see, for instance, the use of NMR in Chapter 13), ideally in complex with its target [210], biotherapeutics characterization and optimization (see, for instance, the use of mass spectrometry for the characterization of therapeutic antibodies in Chapter 21), biotherapeutics quality control (essentially to assess its stability [211] and the presence of contaminants), and biosimilar analysis.

1.4 Evolving Drugs and Targets

In the old days, drugs were mostly derived from ancestral knowledge of plants and, to a lesser extent, minerals. With the development of organic chemistry in the second half of the nineteenth century, natural products started to be synthesized, and many new synthetic compounds became available for pharmaceutical investigations. However, precise mechanisms of action could be revealed only with the advent of biochemistry and later molecular biology and structural biology. For instance, the industrial production of aspirin was started by Bayer at the end of the nineteenth century, but its target, cyclooxygenase (COX), was only isolated in 1976, its cDNA cloned in 1988, and its crystal structure solved in 1994. But in 1991, an alternative form of COX, COX-2, was identified, and in the following years, it was shown that inhibition of COX-2 led to the anti-inflammatory effect of aspirin, while inhibition of COX-1 caused its unwanted side effects. The determination of the crystal structure of COX-2 in 1996 paved the way to the design of COX-2 selective inhibitors with reduced side effects [212].

Today’s drugs and drug candidates have become much more diverse and complex, ranging from small molecules to macrocycles, peptides or peptidomimetics, recombinant proteins (including antibodies and their numerous modified versions, such as antibody–drug conjugates [ADCs]), RNAs and their chemically modified versions (aptamers, small interfering RNAs [siRNAs], antisense oligonucleotides [called ASOs or AONs], or mRNAs), DNAs (for gene therapy), and even cells (cell therapy) in order to address more and more challenging targets [213, 214]. In parallel, the range of drug targets has recently widened considerably. New therapeutic candidates are developed that not only act directly on a protein (active-site-directed or allosteric inhibitors, activators, or modulators) or a protein–protein interface (protein–protein interaction inhibitors or stabilizers) but also modulate the expression of a protein target’s encoding gene at the level of DNA (genome editing) or mRNA (gene silencing, exon skipping, complementation, inhibition of translation, etc.) or even at the level of its transcriptional or translational regulation (modulation of gene expression regulatory factors). A recent strategy, targeted protein degradation, consists in specifically inducing the degradation of an unwanted protein thanks to bifunctional ligands often called PROTACs (proteolysis-targeting chimeras) that simultaneously recruit the target protein and a ubiquitin E3 ligase, resulting in the ubiquitination of the target protein, tagging it for proteasome degradation [215].

SBDD is inherently linked to the target-centric approach. At the beginning, SBDD projects focused on single pharmacological targets, essentially proteins, that were known or thought to be involved in diseases. The number of existing drug targets has been a recurrent question [216–218]. Two important criteria for pursuing a drug target are its druggability (Chapter 2) and the possibility to design selective drugs that do not bind closely related proteins to avoid potential adverse effects (indeed many protein targets belong to multimember families). Another obvious criterion is target tractability, and the increase in the effective number of drug targets in drug discovery projects using SBDD has paralleled technical progress in protein biochemistry and protein crystallization. Before the 2000s, SBDD was applied mostly to soluble proteins or protein domains. The first crystal structure of a membrane protein appeared in 1985 [115], but the number of membrane protein crystal structures began to grow significantly only after 2000 [219, 220], especially in the therapeutically important GPCR family. Recently, cryo-EM has significantly contributed to the determination of new structures of challenging membrane proteins, including ion channels, and of large membrane protein complexes [111, 221].
The use of structural biology in drug discovery is illustrated in Part III of the book for a selection of important therapeutic target families that have already been intensively studied but still remain challenging:

● Kinases (Chapter 16)
● Serine proteinases (Chapter 17)
● Epigenetic targets (Chapter 18)
● GPCRs (Chapter 19)
● Protein–protein interfaces (Chapter 20)

Finally, the idea that modulating a single target is sometimes insufficient to treat complex diseases opened the way to polypharmacology, also called systems pharmacology or network pharmacology [222, 223]. Chapter 3 describes the in silico methods of structural chemogenomics used in the multi-target approach.

1.5 Current Trends and Perspectives

In the swirl around the spectacular progress in cryo-EM, a news feature entitled “The revolution will not be crystallized” suggested crystallography would soon be ousted by cryo-EM [224], but this statement was certainly premature. Indeed, X-ray crystallography is far from obsolete [225] and, on the contrary, is constantly reinventing itself, with in situ crystallography, automated crystal harvesting, SFX, and SMX, as brilliantly presented in Chapter 10, and will certainly remain, for a couple of years at least, the most suitable technique for high-throughput SBDD, especially with small-size proteins or protein domains and with fragments. Most probably, both techniques will remain complementary for a long time, and researchers will simply have to choose the one best suited to bring them the information they need at a given stage.

As crystallization is still the limiting step for crystallographic studies, further progress is still needed in the automation [226] and miniaturization of the process [227] and in the follow-up (crystal recognition) [228]. Innovative methods are always needed [74, 229], and future developments should help to address the current and upcoming needs ranging from nanocrystals for SFX and MicroED to large crystals for neutron crystallography [74, 230]. Some recent studies use in vivo grown microcrystals for SFX [231, 232] or SMX [81, 233], in particular for direct in cellulo diffraction [232, 233].

Cryo-EM will be a most useful complementary technique to answer biological questions, mostly by addressing large and/or dynamic macromolecular complexes and challenging targets such as membrane proteins. Meanwhile, it will undoubtedly continue to make the necessary progress in terms of resolution and throughput needed to become a routine tool in drug discovery. One important issue is the definition of widely accepted, rigorous deposition standards.

MicroED apparently holds great potential [234, 235] but still has to demonstrate its wide applicability to macromolecule structure determination.

Structural biology is becoming more and more integrated, from atoms to organisms and with the addition of the time dimension (4D). As a first step, bridging the gap between molecular and cellular structural biology [236] will not only advance basic knowledge in biology but also give some hints to develop new therapeutic approaches. Besides continuing to provide 3D – and more and more 4D – information to advance knowledge in biology, structural biology is certainly heading toward very exciting future developments, in particular thanks to the almost infinite possibilities opened by the so-called new modalities [213, 214]. More than ever, structural biologists will work hand in hand with medicinal chemists (see, for instance, [237, 238] in the case of PROTACs). As always, new breakthroughs will mostly be dictated by technological developments. For instance, it remains to be seen whether and when cryo-EM will become a routine, high-resolution, and high-throughput tool for


drug discovery. It will also depend on broader access for the community [239], as has happened with X-ray crystallography at synchrotron facilities. In this regard, it is also essential that a sufficient number of highly skilled scientists be trained in both academic and industrial environments. In the same way, SFX holds huge potential and could become a significant player provided several technical bottlenecks are overcome, but many more facilities are needed to provide enough beam time to users. Maybe the very compact XFEL under construction at Arizona State University (ASU Compact XFEL) will help address this issue.

We are entering the Big Data era. Structural biology is shifting from 3D to 4D (space + time), i.e. from static to dynamic. Huge progress in our understanding of biology is expected from a combination of 4D structural data with “omics” (genomics, proteomics, metabolomics, etc.), thermodynamics and kinetics studies, molecular dynamics, systems biology, etc. No doubt all this progress will one day enhance drug discovery, and structural biology will surely still have a crucial part to play in this fascinating evolution.

Structural biology will be instrumental in the fight against mutations that cause drug resistance in cancer cells, for instance, in the case of kinases ([240–242] and Chapter 16), and in bacterial [243] and viral (Chapter 22) pathogens, and also in precision medicine [244] by helping understand how natural mutations in human targets alter drug response (see, for instance, [245] in the case of GPCRs). It will also be instrumental in the study of other challenging targets such as proteins from pathogenic organisms causing neglected diseases such as Mycobacterium tuberculosis (Chapter 23), enzymes that metabolize drugs such as P450s (Chapter 24), IDPs (Chapter 25), and exciting new targets such as RNAs [246].

Drug discovery is such a complex and difficult endeavor, involving so many different aspects, disciplines, and skills, that it is a wonder that quite a number of effective drugs eventually reach the market to treat various conditions. Structural biology is just another piece of the puzzle, a very powerful one, but unable on its own to guide the medicinal chemist to the optimal drug compound. This is a very exciting period for this fascinating part of the drug discovery toolbox, and probably much more lies in the future.

References

1 Franklin, R.E. and Gosling, R.G. (1953). Molecular configuration in sodium thymonucleate. Nature 171: 740–741.
2 Watson, J.D. and Crick, F.H. (1953). Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature 171: 737–738.
3 Eisenberg, D. (1994). Max Perutz’s achievements: how did he do it? Protein Sci. 3: 1625–1628.
4 Kendrew, J.C., Bodo, G., Dintzis, H.M. et al. (1958). A three-dimensional model of the myoglobin molecule obtained by x-ray analysis. Nature 181: 662–666.
5 Kendrew, J.C., Dickerson, R.E., Strandberg, B.E. et al. (1960). Structure of myoglobin: a three-dimensional Fourier synthesis at 2 Å resolution. Nature 185: 422–427.
6 Perutz, M.F., Rossmann, M.G., Cullis, A.F. et al. (1960). Structure of haemoglobin: a three-dimensional Fourier synthesis at 5.5-Å resolution, obtained by X-ray analysis. Nature 185: 416–422.
7 Drews, J. (2000). Drug discovery: a historical perspective. Science 287: 1960–1964.
8 Congreve, M., Murray, C.W., and Blundell, T.L. (2005). Structural biology and drug discovery. Drug Discov. Today 10: 895–907.
9 Perutz, M.F. and Lehmann, H. (1968). Molecular pathology of human haemoglobin. Nature 219: 902–909.
10 Perutz, M.F. (1976). Fundamental research in molecular biology: relevance to medicine. Nature 262: 449–453.

11 Beddell, C.R., Goodford, P.J., Norrington, F.E. et al. (1976). Compounds designed to fit a site of known structure in human haemoglobin. Br. J. Pharmacol. 57: 201–209.
12 Ondetti, M.A., Rubin, B., and Cushman, D.W. (1977). Design of specific inhibitors of angiotensin-converting enzyme: new class of orally active antihypertensive agents. Science 196: 441–444.
13 Baldwin, J.J., Ponticello, G.S., Anderson, P.S. et al. (1989). Thienothiopyran-2-sulfonamides: novel topically active carbonic anhydrase inhibitors for the treatment of glaucoma. J. Med. Chem. 32: 2510–2513.
14 Committee on Recombinant DNA Molecules (1974). Potential biohazards of recombinant DNA molecules. Proc. Natl. Acad. Sci. U. S. A. 71: 2593–2594.
15 Jhoti, H. (2001). High-throughput structural proteomics using x-rays. Trends Biotechnol. 19: S67–S71.
16 Blundell, T.L., Jhoti, H., and Abell, C. (2002). High-throughput crystallography for lead discovery in drug design. Nat. Rev. Drug Discov. 1: 45–54.
17 Terwilliger, T.C., Stuart, D., and Yokoyama, S. (2009). Lessons from structural genomics. Annu. Rev. Biophys. 38: 371–383.
18 International Human Genome Sequencing Consortium (2001). Initial sequencing and analysis of the human genome. Nature 409: 860–921.

19 Banci, L., Bertini, I., Cusack, S. et al. (2006). First steps towards effective methods in exploiting high-throughput technologies for the determination of human protein structures of high biomedical value. Acta Crystallogr. D Biol. Crystallogr. 62: 1208–1217.
20 Gileadi, O., Knapp, S., Lee, W.H. et al. (2007). The scientific impact of the Structural Genomics Consortium: a protein family and ligand-centered approach to medically-relevant human proteins. J. Struct. Funct. Genomics 8: 107–119.
21 Shuker, S.B., Hajduk, P.J., Meadows, R.P., and Fesik, S.W. (1996). Discovering high-affinity ligands for proteins: SAR by NMR. Science 274: 1531–1534.
22 Nienaber, V.L., Richardson, P.L., Klighofer, V. et al. (2000). Discovering novel ligands for macromolecules using X-ray crystallographic screening. Nat. Biotechnol. 18: 1105–1108.
23 Lesuisse, D., Lange, G., Deprez, P. et al. (2002). SAR and X-ray. A new approach combining fragment-based screening and rational drug design: application to the discovery of nanomolar inhibitors of Src SH2. J. Med. Chem. 45: 2379–2387.
24 Jencks, W.P. (1981). On the attribution and additivity of binding energies. Proc. Natl. Acad. Sci. U. S. A. 78: 4046–4050.
25 Verlinde, C.L., Fan, E., Shibata, S. et al. (2009). Fragment-based cocktail crystallography by the medical structural genomics of pathogenic protozoa consortium. Curr. Top. Med. Chem. 9: 1678–1687.
26 Hall, R.J., Mortenson, P.N., and Murray, C.W. (2014). Efficient exploration of chemical space by fragment-based screening. Prog. Biophys. Mol. Biol. 116: 82–91.
27 Erlanson, D.A. and Jahnke, W. (2016). Fragment-based Drug Discovery: Lessons and Outlook. Wiley-VCH.
28 Erlanson, D.A., Fesik, S.W., Hubbard, R.E. et al. (2016). Twenty years on: the impact of fragments on drug discovery. Nat. Rev. Drug Discov. 15: 605–619.
29 Renaud, J.P., Chung, C.W., Danielson, U.H. et al. (2016). Biophysics in drug discovery: impact, challenges and opportunities. Nat. Rev. Drug Discov. 15: 679–698.
30 Keseru, G.M., Erlanson, D.A., Ferenczy, G.G. et al. (2016). Design principles for fragment libraries: maximizing the value of learnings from Pharma Fragment-Based Drug Discovery (FBDD) Programs for use in academia. J. Med. Chem. 59: 8189–8206.
31 Erlanson, D. (2018). Fragments in the clinic: 2018 edition. http://practicalfragments.blogspot.com/2018/10/fragments-in-clinic-2018-edition.html (accessed 6 October 2018).
32 Macarron, R., Banks, M.N., Bojanic, D. et al. (2011). Impact of high-throughput screening in biomedical research. Nat. Rev. Drug Discov. 10: 188–195.

33 Chen, H., Zhou, X., Wang, A. et al. (2015). Evolutions in fragment-based drug design: the deconstruction-reconstruction approach. Drug Discov. Today 20: 105–113.
34 Schneider, G. (2018). Automating drug discovery. Nat. Rev. Drug Discov. 17: 97–113.
35 Egner, U., Kratzschmar, J., Kreft, B. et al. (2005). The target discovery process. Chembiochem 6: 468–479.
36 Sams-Dodd, F. (2005). Target-based drug discovery: is something wrong? Drug Discov. Today 10: 139–147.
37 Moffat, J.G., Vincent, F., Lee, J.A. et al. (2017). Opportunities and challenges in phenotypic drug discovery: an industry perspective. Nat. Rev. Drug Discov. 16: 531–543.
38 Haasen, D., Schopfer, U., Antczak, C. et al. (2017). How phenotypic screening influenced drug discovery: lessons from five years of practice. Assay Drug Dev. Technol. 15: 239–246.
39 Jones, L.H. and Bunnage, M.E. (2017). Applications of chemogenomic library screening in drug discovery. Nat. Rev. Drug Discov. 16: 285–296.
40 Blundell, T., Dodson, G., Hodgkin, D., and Mercola, D. (1972). Insulin: the structure in the crystal and its reflection in chemistry and biology. Adv. Protein Chem. 26: 279–402.
41 Informa UK Ltd. (2017). Pharma R&D annual review 2017. https://pharmaintelligence.informa.com/~/media/Informa-Shop-Window/Pharma/Files/PDFs/whitepapers/RD-Review-2017.pdf (accessed 6 October 2018).
42 Urquhart, L. (2018). Market watch: Top drugs and companies by sales in 2017. Nat. Rev. Drug Discov. 17: 232.
43 Sullenger, B.A. and Nair, S. (2016). From the RNA world to the clinic. Science 352: 1417–1420.
44 Fellmann, C., Gowen, B.G., Lin, P.C. et al. (2017). Cornerstones of CRISPR-Cas in drug discovery and therapy. Nat. Rev. Drug Discov. 16: 89–100.
45 Changeux, J.P. (2013). The concept of allosteric modulation: an overview. Drug Discov. Today Technol. 10: e223–e228.
46 Motlagh, H.N., Wrabl, J.O., Li, J., and Hilser, V.J. (2014). The ensemble nature of allostery. Nature 508: 331–339.
47 Changeux, J.P. and Christopoulos, A. (2016). Allosteric modulation as a unifying mechanism for receptor function and regulation. Cell 166: 1084–1102.
48 Sledz, P. and Caflisch, A. (2018). Protein structure-based drug design: from docking to molecular dynamics. Curr. Opin. Struct. Biol. 48: 93–102.
49 Kuhlbrandt, W. (2014). Biochemistry. The resolution revolution. Science 343: 1443–1444.
50 Bai, X.C., McMullan, G., and Scheres, S.H. (2015). How cryo-EM is revolutionizing structural biology. Trends Biochem. Sci. 40: 49–57.


51 Wlodawer, A. and Dauter, Z. (2017). ‘Atomic resolution’: a badly abused term in structural biology. Acta Crystallogr. D Struct. Biol. 73: 379–380.
52 Chiu, W., Holton, J., Langan, P. et al. (2017). Responses to ‘Atomic resolution’: a badly abused term in structural biology. Acta Crystallogr. D Struct. Biol. 73: 381–383.
53 Rupp, B. (2010). Biomolecular Crystallography: Principles, Practice, and Applications to Structural Biology. Garland Science, Taylor and Francis Group.
54 Arnold, E., Himmel, D.M., and Rossmann, M.G. (eds.) (2012). International Tables for Crystallography, Crystallography of Biological Macromolecules, 2e, vol. F. Wiley.
55 Wlodawer, A., Dauter, Z., and Jaskolski, M. (eds.) (2017). Protein Crystallography – Methods and Protocols. Humana Press.
56 Abola, E., Kuhn, P., Earnest, T., and Stevens, R.C. (2000). Automation of X-ray crystallography. Nat. Struct. Biol. 7 (Suppl): 973–977.
57 Muchmore, S.W., Olson, J., Jones, R. et al. (2000). Automated crystal mounting and data collection for protein crystallography. Structure 8: R243–R246.
58 Cohen, A.E., Ellis, P.J., Miller, M.D. et al. (2002). An automated system to mount cryo-cooled protein crystals on a synchrotron beam line, using compact sample cassettes and a small-scale robot. J. Appl. Cryst. 35: 720–726.
59 Beteva, A., Cipriani, F., Cusack, S. et al. (2006). High-throughput sample handling and data collection at synchrotrons: embedding the ESRF into the high-throughput gene-to-structure pipeline. Acta Crystallogr. D Biol. Crystallogr. 62: 1162–1169.
60 Cipriani, F., Felisaz, F., Launer, L. et al. (2006). Automation of sample mounting for macromolecular crystallography. Acta Crystallogr. D Biol. Crystallogr. 62: 1251–1259.
61 Wasserman, S.R., Koss, J.W., Sojitra, S.T. et al. (2012). Rapid-access, high-throughput synchrotron crystallography for drug discovery. Trends Pharmacol. Sci. 33: 261–267.
62 Jacquamet, L., Ohana, J., Joly, J. et al. (2004). Automated analysis of vapor diffusion crystallization drops with an X-ray beam. Structure 12: 1219–1225.
63 Bingel-Erlenmeyer, R., Olieric, V., Grimshaw, J.P.A. et al. (2011). SLS crystallization platform at beamline X06DA – a fully automated pipeline enabling in situ X-ray diffraction screening. Cryst. Growth Des. 11: 916–923.
64 le Maire, A., Gelin, M., Pochet, S. et al. (2011). In-plate protein crystallization, in situ ligand soaking and X-ray diffraction. Acta Crystallogr. D Biol. Crystallogr. 67: 747–755.
65 Gelin, M., Delfosse, V., Allemand, F. et al. (2015). Combining ‘dry’ co-crystallization and in situ diffraction to facilitate ligand screening by X-ray crystallography. Acta Crystallogr. D Biol. Crystallogr. 71: 1777–1787.
66 Cipriani, F., Rower, M., Landret, C. et al. (2012). CrystalDirect: a new method for automated crystal harvesting based on laser-induced photoablation of thin films. Acta Crystallogr. D Biol. Crystallogr. 68: 1393–1399.
67 Deller, M.C. and Rupp, B. (2014). Approaches to automated protein crystal harvesting. Acta Crystallogr. F Struct. Biol. Commun. 70: 133–155.
68 Yin, X., Scalia, A., Leroy, L. et al. (2014). Hitting the target: fragment screening with acoustic in situ co-crystallization of proteins plus fragment libraries on pin-mounted data-collection micromeshes. Acta Crystallogr. D Biol. Crystallogr. 70: 1177–1189.
69 Smith, J.L., Fischetti, R.F., and Yamamoto, M. (2012). Micro-crystallography comes of age. Curr. Opin. Struct. Biol. 22: 602–612.
70 Garman, E.F. and Owen, R.L. (2006). Cryocooling and radiation damage in macromolecular crystallography. Acta Crystallogr. D Biol. Crystallogr. 62: 32–47.
71 Garman, E.F. (2010). Radiation damage in macromolecular crystallography: what is it and why should we care? Acta Crystallogr. D Biol. Crystallogr. 66: 339–351.
72 Chapman, H.N., Fromme, P., Barty, A. et al. (2011). Femtosecond X-ray protein nanocrystallography. Nature 470: 73–77.
73 Spence, J.C.H. (2017). XFELs for structure and dynamics in biology. IUCrJ 4: 322–339.
74 Johansson, L.C., Stauch, B., Ishchenko, A., and Cherezov, V. (2017). A bright future for serial femtosecond crystallography with XFELs. Trends Biochem. Sci. 42: 749–762.
75 Cheng, R.K.Y., Abela, R., and Hennig, M. (2017). X-ray free electron laser: opportunities for drug discovery. Essays Biochem. 61: 529–542.
76 Solem, J.C. (1986). Imaging biological specimens with high-intensity soft x rays. J. Opt. Soc. Am. B 3: 1551–1565.
77 Neutze, R., Wouts, R., van der Spoel, D. et al. (2000). Potential for biomolecular imaging with femtosecond X-ray pulses. Nature 406: 752–757.
78 Kupitz, C., Basu, S., Grotjohann, I. et al. (2014). Serial time-resolved crystallography of photosystem II using a femtosecond X-ray laser. Nature 513: 261–265.
79 Nango, E., Royant, A., Kubo, M. et al. (2016). A three-dimensional movie of structural changes in bacteriorhodopsin. Science 354: 1552–1557.
80 Stagno, J.R., Liu, Y., Bhandari, Y.R. et al. (2017). Structures of riboswitch RNA reaction states by mix-and-inject XFEL serial crystallography. Nature 541: 242–246.

81 Gati, C., Bourenkov, G., Klinge, M. et al. (2014). Serial crystallography on in vivo grown microcrystals using synchrotron radiation. IUCrJ 1: 87–94.
82 Yamamoto, M., Hirata, K., Yamashita, K. et al. (2017). Protein microcrystallography using synchrotron radiation. IUCrJ 4: 529–539.
83 Weinert, T., Olieric, N., Cheng, R. et al. (2017). Serial millisecond crystallography for routine room-temperature structure determination at synchrotrons. Nat. Commun. 8: 542.
84 Blakeley, M.P., Hasnain, S.S., and Antonyuk, S.V. (2015). Sub-atomic resolution X-ray crystallography and neutron crystallography: promise, challenges and potential. IUCrJ 2: 464–474.
85 Chen, J.C. and Unkefer, C.J. (2017). Fifteen years of the protein crystallography station: the coming of age of macromolecular neutron crystallography. IUCrJ 4: 72–86.
86 Vandavasi, V.G., Weiss, K.L., Cooper, J.B. et al. (2016). Exploring the mechanism of beta-lactam ring protonation in the class A beta-lactamase acylation mechanism using neutron and X-ray crystallography. J. Med. Chem. 59: 474–479.
87 Aggarwal, M., Kovalevsky, A.Y., Velazquez, H. et al. (2016). Neutron structure of human carbonic anhydrase II in complex with methazolamide: mapping the solvent and hydrogen-bonding patterns of an effective clinical drug. IUCrJ 3: 319–325.
88 Gerlits, O., Keen, D.A., Blakeley, M.P. et al. (2017). Room temperature neutron crystallography of drug resistant HIV-1 protease uncovers limitations of X-ray structural analysis at 100 K. J. Med. Chem. 60: 2018–2025.
89 De Rosier, D.J. and Klug, A. (1968). Reconstruction of three dimensional structures from electron micrographs. Nature 217: 130–134.
90 Henderson, R. and Unwin, P.N. (1975). Three-dimensional model of purple membrane obtained by electron microscopy. Nature 257: 28–32.
91 Henderson, R., Baldwin, J.M., Ceska, T.A. et al. (1990). Model for the structure of bacteriorhodopsin based on high-resolution electron cryo-microscopy. J. Mol. Biol. 213: 899–929.
92 Glaeser, R.M. (1999). Review: electron crystallography: present excitement, a nod to the past, anticipating the future. J. Struct. Biol. 128: 3–14.
93 Shi, D., Nannenga, B.L., Iadanza, M.G., and Gonen, T. (2013). Three-dimensional electron crystallography of protein microcrystals. Elife 2: e01345.
94 Nannenga, B.L., Shi, D., Leslie, A.G.W., and Gonen, T. (2014). High-resolution structure determination by continuous-rotation data collection in MicroED. Nat. Methods 11: 927–930.

95 McPherson, A. (1999). Crystallization of Biological Macromolecules. New York: Cold Spring Harbor Laboratory Press.
96 Bergfors, T. (2009). Protein Crystallization: Second Edition. La Jolla: International University Line.
97 McPherson, A. and Gavira, J.A. (2014). Introduction to protein crystallization. Acta Crystallogr. F Struct. Biol. Commun. 70: 2–20.
98 McPherson, A. and Kuznetsov, Y.G. (2014). Mechanisms, kinetics, impurities and defects: consequences in macromolecular crystallization. Acta Crystallogr. F Struct. Biol. Commun. 70: 384–403.
99 McPherson, A. and Cudney, B. (2014). Optimization of crystallization conditions for biological macromolecules. Acta Crystallogr. F Struct. Biol. Commun. 70: 1445–1467.
100 Chung, C.W. (2007). The use of biophysical methods increases success in obtaining liganded crystal structures. Acta Crystallogr. D Biol. Crystallogr. 63: 62–71.
101 de Broglie, L. (1924). Recherches sur la théorie des quanta. Université de Paris.
102 Dubochet, J. and McDowall, A.W. (1981). Vitrification of pure water for electron microscopy. J. Microsc. 124: RP3–RP4.
103 Merk, A., Bartesaghi, A., Banerjee, S. et al. (2016). Breaking cryo-EM resolution barriers to facilitate drug discovery. Cell 165: 1698–1707.
104 Frank, J. (2017). Advances in the field of single-particle cryo-electron microscopy over the last decade. Nat. Protoc. 12: 209–212.
105 Cheng, Y. (2018). Single-particle cryo-EM-How did it get here and where will it go. Science 361: 876–880.
106 Subramaniam, S., Earl, L.A., Falconieri, V. et al. (2016). Resolution advances in cryo-EM enable application to drug discovery. Curr. Opin. Struct. Biol. 41: 194–202.
107 Merino, F. and Raunser, S. (2017). Electron cryo-microscopy as a tool for structure-based drug development. Angew. Chem. Int. Ed. Engl. 56: 2846–2860.
108 Venien-Bryan, C., Li, Z., Vuillard, L., and Boutin, J.A. (2017). Cryo-electron microscopy and X-ray crystallography: complementary approaches to structural biology and drug discovery. Acta Crystallogr. F Struct. Biol. Commun. 73: 174–183.
109 Rawson, S., McPhillie, M.J., Johnson, R.M. et al. (2017). The potential use of single-particle electron microscopy as a tool for structure-based inhibitor design. Acta Crystallogr. D Struct. Biol. 73: 534–540.
110 Boland, A., Chang, L., and Barford, D. (2017). The potential of cryo-electron microscopy for structure-based drug design. Essays Biochem. 61: 543–560.

17

18

1 The Evolving Role of Structural Biology in Drug Discovery

111 Renaud, J.P., Chari, A., Ciferri, C. et al. (2018).

127 Bollag, G., Tsai, J., Zhang, J. et al. (2012). Vemurafenib:

112

128

113 114

115

116

117

118

119

120

121

122

123

124

125

126

Cryo‐EM in drug discovery: achievements, limitations and prospects. Nat. Rev. Drug Discov. 17: 471–492. Adams, M.J., Blundell, T.L., Dodson, E.J. et al. (1969). Structure of Rhombohedral 2 zinc insulin crystals. Nature 224: 491. Protein Data Bank (1971). Nat. New Biol. 233: 223. Collaborative Computational Project, Number 4 (1994). The CCP4 suite: programs for protein crystallography. Acta Crystallogr. D Biol. Crystallogr. 50: 760–763. Deisenhofer, J., Epp, O., Miki, K. et al. (1985). Structure of the protein subunits in the photosynthetic reaction centre of Rhodopseudomonas viridis at 3A resolution. Nature 318: 618–624. Miller, M., Jaskolski, M., Rao, J.K. et al. (1989). Crystal structure of a retroviral protease proves relationship to aspartic protease family. Nature 337: 576–579. Navia, M.A., Fitzgerald, P.M., McKeever, B.M. et al. (1989). Three‐dimensional structure of aspartyl protease from human immunodeficiency virus HIV‐1. Nature 337: 615–620. Lapatto, R., Blundell, T., Hemmings, A. et al. (1989). X‐ray analysis of HIV‐1 proteinase at 2.7 A resolution confirms structural homology among retroviral enzymes. Nature 342: 299–302. Wlodawer, A., Miller, M., Jaskolski, M. et al. (1989). Conserved folding in retroviral proteases: crystal structure of a synthetic HIV‐1 protease. Science 245: 616–621. Miller, M., Schneider, J., Sathyanarayana, B.K. et al. (1989). Structure of complex of synthetic HIV‐1 protease with a substrate‐based inhibitor at 2.3 A resolution. Science 246: 1149–1152. Abrahams, J.P., Leslie, A.G., Lutter, R., and Walker, J.E. (1994). Structure at 2.8 A resolution of F1‐ATPase from bovine heart mitochondria. Nature 370: 621–628. Doyle, D.A., Morais Cabral, J., Pfuetzner, R.A. et al. (1998). The structure of the potassium channel: molecular basis of K+ conduction and selectivity. Science 280: 69–77. Palczewski, K., Kumasaka, T., Hori, T. et al. (2000). Crystal structure of rhodopsin: a G protein‐coupled receptor. Science 289: 739–745. Hardy, L.W. 
and Malikayil, A. (2003). The impact of structure‐guided drug design on clinical agents. Curr. Drug Discov.: 15–20. Ben‐Shem, A., Garreau de Loubresse, N., Melnikov, S. et al. (2011). The structure of the eukaryotic ribosome at 3.0 A resolution. Science 334: 1524–1529. Rasmussen, S.G., DeVree, B.T., Zou, Y. et al. (2011). Crystal structure of the beta2 adrenergic receptor‐Gs protein complex. Nature 477: 549–555.

129 130

131 132

133 134

135

136

137

138

139

140

141

142

143

the first drug approved for BRAF‐mutant cancer. Nat. Rev. Drug Discov. 11: 873–886. Purcell, E.M., Torrey, H.C., and Pound, R.V. (1946). Resonance absorption by nuclear magnetic moments in a solid. Phys. Rev. 69: 37–38. Bloch, F., Hansen, W.W., and Packard, M. (1946). Nuclear induction. Phys. Rev. 69: 127. Saunders, M., Wishnia, A., and Kirkwood, J.G. (1957). The nuclear magnetic resonance spectrum of ribonuclease. J. Am. Chem. Soc. 79: 3289–3290. Ernst, R.R. (1966). Sensitivity enhancement in magnetic resonance. Adv. Magn. Reson. 2: 1–135. Ernst, R.R. and Anderson, W.A. (1966). Application of Fourier transform spectroscopy to magnetic resonance. Rev. Sci. Instrum. 37: 93. Jeener, J. (1971). Unpublished lecture. Ampère Summer Scholl, Basko Polje, Yugoslavia. Aue, W.P., Bartholdi, E., and Ernst, R.R. (1976). Two‐dimensional spectroscopy. Application to nuclear magnetic resonance. J. Chem. Phys. 64: 2229–2246. Bax, A. and Grzesiek, S. (1993). Methodological advances in protein NMR. Acc. Chem. Res. 26: 131–138. Williamson, M.P., Havel, T.F., and Wuthrich, K. (1985). Solution conformation of proteinase inhibitor IIA from bull seminal plasma by 1H nuclear magnetic resonance and distance geometry. J. Mol. Biol. 182: 295–315. Oschkinat, H., Griesinger, C., Kraulis, P.J. et al. (1988). Three‐dimensional NMR spectroscopy of a protein in solution. Nature 332: 374–376. Vuister, G.W., Boelens, R., and Kaptein, R. (1988). Nonselective three‐dimensional NMR spectroscopy. The 3D NOE‐HOHAHA experiment. J. Magn. Reson. 80: 176–185. Mayer, M. and Meyer, B. (1999). Characterization of ligand binding by saturation transfer difference NMR spectroscopy. Angew. Chem. Int. Ed. 38: 1784–1788. Hajduk, P.J., Augeri, D.J., Mack, J. et al. (2000). NMR‐based screening of proteins containing 13C‐ labeled methyl groups. J. Am. Chem. Soc. 122: 7898–7904. Dalvit, C., Pevarello, P., Tato, M. et al. (2000). 
Identification of compounds with binding affinity to proteins via magnetization transfer from bulk water. J. Biomol. NMR 18: 65–68. Jahnke, W. (2002). Spin labels as a tool to identify and characterize protein‐ligand interactions by NMR spectroscopy. Chembiochem 3: 167–173. Jahnke, W., Floersheim, P., Ostermeier, C. et al. (2002). NMR reporter screening for the detection of high‐

References

144

145

146

147

148

149

150

151

152

153

154

155

156

157

affinity ligands. Angew. Chem. Int. Ed. Engl. 41: 3420–3423. Dalvit, C., Fagerness, P.E., Hadden, D.T. et al. (2003). Fluorine‐NMR experiments for high‐throughput screening: theoretical aspects, practical considerations, and range of applicability. J. Am. Chem. Soc. 125: 7696–7703. Ludwiczek, M.L., Baminger, B., and Konrat, R. (2004). NMR probing of protein‐protein interactions using reporter ligands and affinity tags. J. Am. Chem. Soc. 126: 1636–1637. Bruker (2010). Bruker introduces SampleXpress(TM), a new and easy‐to‐use autosampler that delivers increased efficiency for NMR. https://ir.bruker.com/ press‐releases/press‐release‐details/2010/Bruker‐ Introduces‐SampleXpressTM‐a‐New‐and‐Easy‐to‐ Use‐Autosampler‐that‐Delivers‐Increased‐Efficiency‐ for‐NMR/default.aspx (accessed 26 April 2019). Adrian, M., Dubochet, J., Lepault, J., and McDowall, A.W. (1984). Cryo‐electron microscopy of viruses. Nature 308: 32–36. Dubochet, J., Adrian, M., Chang, J.J. et al. (1988). Cryo‐electron microscopy of vitrified specimens. Q. Rev. Biophys. 21: 129–228. van Heel, M. and Frank, J. (1981). Use of multivariate statistics in analysing the images of biological macromolecules. Ultramicroscopy 6: 187–194. Henderson, R. (1995). The potential and limitations of neutrons, electrons and X‐rays for atomic resolution microscopy of unstained biological molecules. Q. Rev. Biophys. 28: 171–193. Grigorieff, N., Ceska, T.A., Downing, K.H. et al. (1996). Electron‐crystallographic refinement of the structure of bacteriorhodopsin. J. Mol. Biol. 259: 393–421. Tagari, M., Newman, R., Chagoyen, M. et al. (2002). New electron microscopy database and deposition system. Trends Biochem. Sci. 27: 589. Gonen, T., Cheng, Y., Sliz, P. et al. (2005). Lipid‐ protein interactions in double‐layered two‐ dimensional AQP0 crystals. Nature 438: 633–638. Yu, X., Jin, L., and Zhou, Z.H. (2008). 3.88 A structure of cytoplasmic polyhedrosis virus by cryo‐electron microscopy. Nature 453: 415–419. 
Zhang, X., Settembre, E., Xu, C. et al. (2008). Near‐ atomic resolution using electron cryomicroscopy and single‐particle reconstruction. Proc. Natl. Acad. Sci. U. S. A. 105: 1867–1872. Jiang, W., Baker, M.L., Jakana, J. et al. (2008). Backbone structure of the infectious epsilon15 virus capsid revealed by electron cryomicroscopy. Nature 451: 1130–1134. Ludtke, S.J., Baker, M.L., Chen, D.H. et al. (2008). De novo backbone trace of GroEL from single particle electron cryomicroscopy. Structure 16: 441–448.

158 European Commission (2009). Specific Targeted

159

160

161

162

163

164

165

166

167

168

169

170

Research Project “High‐throughput three‐ dimensional electron microscopy” (HT‐3DEM), EU 6th Framework Programme Project no. LSHG‐ CT‐2005‐018811, Publishable Final Activity Report (30 September 2009). https://cordis.europa.eu/ project/rcn/78402/factsheet/en (accessed 26 April 2019). Coudray, N., Hermann, G., Caujolle‐Bert, D. et al. (2011). Automated screening of 2D crystallization trials using transmission electron microscopy: a high‐throughput tool‐chain for sample preparation and microscopic analysis. J. Struct. Biol. 173: 365–374. Milazzo, A.C., Cheng, A., Moeller, A. et al. (2011). Initial evaluation of a direct detection device detector for single particle cryo‐electron microscopy. J. Struct. Biol. 176: 404–408. Li, X., Mooney, P., Zheng, S. et al. (2013). Electron counting and beam‐induced motion correction enable near‐atomic‐resolution single‐particle cryo‐EM. Nat. Methods 10: 584–590. Wood, C., Burnley, T., Patwardhan, A. et al. (2015). Collaborative computational project for electron cryo‐microscopy. Acta Crystallogr. D Biol. Crystallogr. 71: 123–126. Beck, F., Unverdorben, P., Bohn, S. et al. (2012). Near‐atomic resolution structural model of the yeast 26S proteasome. Proc. Natl. Acad. Sci. U. S. A. 109: 14870–14875. Liao, M., Cao, E., Julius, D., and Cheng, Y. (2013). Structure of the TRPV1 ion channel determined by electron cryo‐microscopy. Nature 504: 107–112. Allegretti, M., Mills, D.J., McMullan, G. et al. (2014). Atomic model of the F420‐reducing [NiFe] hydrogenase by electron cryo‐microscopy using a direct electron detector. Elife 3: e01963. Amunts, A., Brown, A., Bai, X.C. et al. (2014). Structure of the yeast mitochondrial large ribosomal subunit. Science 343: 1485–1489. Danev, R., Buijsse, B., Khoshouei, M. et al. (2014). Volta potential phase plate for in‐focus phase contrast transmission electron microscopy. Proc. Natl. Acad. Sci. U. S. A. 111: 15635–15640. Liang, Y.L., Khoshouei, M., Radjainia, M. et al. (2017). 
Phase‐plate cryo‐EM structure of a class B GPCR‐G‐ protein complex. Nature 546: 118–123. Zhang, Y., Sun, B., Feng, D. et al. (2017). Cryo‐EM structure of the activated GLP‐1 receptor in complex with a G protein. Nature 546: 248–253. Herzik, M.A. Jr., Wu, M., and Lander, G.C. (2017). Achieving better‐than‐3‐A resolution by single‐ particle cryo‐EM at 200 keV. Nat. Methods 14: 1075–1078.

19

20

1 The Evolving Role of Structural Biology in Drug Discovery

171 Pagano, J.S. (1969). Assay of infectious DNA. In:

172

173

174

175

176

177

178

179

180

181

182

183

184

185

Fundamental Techniques in Virology (eds. K. Habel and N.P. Salzman), 184–197. New York: Academic Press. Blundell, T.L. (2017). Protein crystallography and drug discovery: recollections of knowledge exchange between academia and industry. IUCrJ 4: 308–321. Wlodawer, A., Minor, W., Dauter, Z., and Jaskolski, M. (2008). Protein crystallography for non‐ crystallographers, or how to get the best (but not more) from published macromolecular structures. FEBS J. 275: 1–21. Davis, A.M., St‐Gallay, S.A., and Kleywegt, G.J. (2008). Limitations and lessons in the use of X‐ray structural information in drug design. Drug Discov. Today 13: 831–841. Zheng, H., Hou, J., Zimmerman, M.D. et al. (2014). The future of crystallography in drug discovery. Expert Opin. Drug Discovery 9: 125–137. Wang, J. and Moore, P.B. (2017). On the interpretation of electron microscopic maps of biological macromolecules. Protein Sci. 26: 122–129. Neumann, P., Dickmanns, A., and Ficner, R. (2018). Validating resolution revolution. Structure 26: 785–795 e784. Afonine, P.V., Klaholz, B.P., Moriarty, N.W. et al. (2018). New tools for the analysis and validation of cryo‐EM maps and atomic models. Acta Crystallogr. D Struct. Biol. 74: 814–840. Wlodawer, A., Li, M., and Dauter, Z. (2017). High‐ resolution cryo‐EM maps and models: a crystallographer’s perspective. Structure 25: 1589– 1597 e1581. Svergun, D.I., Koch, M.H.J., Timmins, P.A., and May, R.P. (2013). Small‐Angle X‐ray and Neutron Scattering from Solutions of Biological Macromolecules. Oxford University Press. Vestergaard, B. (2016). Analysis of biostructural changes, dynamics, and interactions ‐ Small‐angle X‐ray scattering to the rescue. Arch. Biochem. Biophys. 602: 69–79. van der Wel, P.C.A. (2018). New applications of solid‐state NMR in structural biology. Emerg. Top. Life Sci. 2: 57–67. Guldenhaupt, J., Amaral, M., Kotting, C. et al. (2018). 
Ligand‐induced conformational changes in hsp90 monitored time resolved and label free‐towards a conformational activity screening for drug discovery. Angew. Chem. Int. Ed. Engl. 57: 9955–9960. Huddler, D. and Zartler, E.R. (2017). Applied Biophysics for Drug Discovery. John Wiley & Sons Ltd. Canales, A. (2017). Biophysical Techniques in Drug Discovery. Royal Society of Chemistry.

186 Genick, C.C. and Wright, S.K. (2017). Biophysics: for

187

188

189

190

191

192

193

194

195

196

197

198

199

200

HTS hit validation, chemical lead optimization, and beyond. Expert Opin. Drug Discovery 12: 897–907. NovAliX Conferences «Biophysics in Drug Discovery». https://www.novalix‐conferences.org (accessed 26 April 2019). Burley, S.K., Berman, H.M., Christie, C. et al. (2018). RCSB Protein Data Bank: Sustaining a living digital data resource that enables breakthroughs in scientific research and biomedical education. Protein Sci. 27: 316–330. Shih, H.P., Zhang, X., and Aronov, A.M. (2018). Drug discovery effectiveness from the standpoint of therapeutic mechanisms and indications. Nat. Rev. Drug Discov. 17: 19–33. Hol, W.G.J. (1986). Protein crystallography and computer graphics – toward rational drug design. Angew. Chem. Int. Ed. Engl. 25: 767–778. Petersen, D.N., Hawkins, J., Ruangsiriluk, W. et al. (2016). A small‐molecule anti‐secretagogue of PCSK9 targets the 80S ribosome to inhibit PCSK9 protein translation. Cell Chem. Biol. 23: 1362–1371. Li, W., Ward, F.R., McClure, K.F. et al. (2019). Structural basis for selective stalling of human ribosome nascent chain complexes by a drug-like molecule. Nat. Struct. Mol. Biol. 26: 501–509. Bunnage, M.E., Chekler, E.L., and Jones, L.H. (2013). Target validation using chemical probes. Nat. Chem. Biol. 9: 195–199. Arrowsmith, C.H., Audia, J.E., Austin, C. et al. (2015). The promise and peril of chemical probes. Nat. Chem. Biol. 11: 536–541. Ellermann, M., Eheim, A., Rahm, F. et al. (2017). Novel class of potent and cellularly active inhibitors devalidates MTH1 as broad‐spectrum cancer target. ACS Chem. Biol. 12: 1986–1992. Renaud, J.P. (2004). Orphan receptors: targets for the future. In: SCREENING - Trends in Drug Discovery, 02/2004, 35–37. GIT VERLAG GmbH & Co. KG. Ngo, T., Kufareva, I., Coleman, J. et al. (2016). Identifying ligands at orphan GPCRs: current status using structure‐based approaches. Br. J. Pharmacol. 173: 2934–2951. Roth, B.L., Irwin, J.J., and Shoichet, B.K. (2017). 
Discovery of new GPCR ligands to illuminate new biology. Nat. Chem. Biol. 13: 1143–1151. EMBL‐EBI. Electron Microscopy Data Bank. https://www.ebi.ac.uk/pdbe/emdb (accessed 26 April 2019). Liu, Q., Acharya, P., Dolan, M.A. et al. (2017). Quaternary contact in the initial interaction of CD4 with the HIV‐1 envelope trimer. Nat. Struct. Mol. Biol. 24: 370–378.

References

201 Fitzpatrick, A.W.P., Falcon, B., He, S. et al. (2017).

217 Overington, J.P., Al‐Lazikani, B., and Hopkins, A.L.

202

218

203

204

205

206

207

208

209

210

211

212 213

214

215

216

Cryo‐EM structures of tau filaments from Alzheimer’s disease. Nature 547: 185–190. Ho, C.M., Beck, J.R., Lai, M. et al. (2018). Malaria parasite translocon structure and mechanism of effector export. Nature 561: 70–75. Egner, U. and Hillig, R.C. (2008). A structural biology view of target drugability. Expert Opin. Drug Discovery 3: 391–401. Patel, D., Bauman, J.D., and Arnold, E. (2014). Advantages of crystallographic fragment screening: functional and mechanistic insights from a powerful platform for efficient drug discovery. Prog. Biophys. Mol. Biol. 116: 92–100. Aguda, A.H., Lavallee, V., Cheng, P. et al. (2016). Affinity crystallography: a new approach to extracting high‐affinity enzyme inhibitors from natural extracts. J. Nat. Prod. 79: 1962–1970. Huggins, D.J., Sherman, W., and Tidor, B. (2012). Rational approaches to improving selectivity in drug design. J. Med. Chem. 55: 1424–1444. Kellenberger, E., Foata, N., and Rognan, D. (2008). Ranking targets in structure‐based virtual screening of three‐dimensional protein libraries: methods and problems. J. Chem. Inf. Model. 48: 1014–1025. Barelier, S., Sterling, T., O’Meara, M.J., and Shoichet, B.K. (2015). The recognition of identical ligands by unrelated proteins. ACS Chem. Biol. 10: 2772–2784. Singh, J., Petter, R.C., Baillie, T.A., and Whitty, A. (2011). The resurgence of covalent drugs. Nat. Rev. Drug Discov. 10: 307–317. Scapin, G., Dandey, V.P., Zhang, Z. et al. (2018). Structure of the insulin receptor‐insulin complex by single‐particle cryo‐EM analysis. Nature 556: 122–125. Razinkov, V.I., Treuheit, M.J., and Becker, G.W. (2013). Methods of high throughput biophysical characterization in biopharmaceutical development. Curr. Drug Discov. Technol. 10: 59–70. Flower, R.J. (2003). The development of COX2 inhibitors. Nat. Rev. Drug Discov. 2: 179–191. Valeur, E., Gueret, S.M., Adihou, H. et al. (2017). New modalities for challenging targets in drug discovery. Angew. Chem. Int. Ed. Engl. 56: 10294–10323. 
Valeur, E. and Jimonet, P. (2018). New modalities, technologies, and partnerships in probe and lead generation: enabling a mode‐of‐action centric paradigm. J. Med. Chem. 61: 9004–9029. Lai, A.C. and Crews, C.M. (2017). Induced protein degradation: an emerging drug discovery paradigm. Nat. Rev. Drug Discov. 16: 101–114. Hopkins, A.L. and Groom, C.R. (2002). The druggable genome. Nat. Rev. Drug Discov. 1: 727–730.

219 220

221

222

223

224

225

226

227

228

229

230 231

232

233

(2006). How many drug targets are there? Nat. Rev. Drug Discov. 5: 993–996. Santos, R., Ursu, O., Gaulton, A. et al. (2017). A comprehensive map of molecular drug targets. Nat. Rev. Drug Discov. 16: 19–34. White, S.H. (2009). Biophysical dissection of membrane proteins. Nature 459: 344–346. Hendrickson, W.A. (2016). Atomic‐level analysis of membrane‐protein structure. Nat. Struct. Mol. Biol. 23: 464–467. Cheng, Y. (2018). Membrane protein structural biology in the era of single particle cryo‐EM. Curr. Opin. Struct. Biol. 52: 58–63. Hopkins, A.L. (2008). Network pharmacology: the next paradigm in drug discovery. Nat. Chem. Biol. 4: 682–690. Anighoro, A., Bajorath, J., and Rastelli, G. (2014). Polypharmacology: challenges and opportunities in drug discovery. J. Med. Chem. 57: 7874–7887. Callaway, E. (2015). The revolution will not be crystallized: a new method sweeps through structural biology. Nature 525: 172–174. Grimes, J.M., Hall, D.R., Ashton, A.W. et al. (2018). Where is crystallography going? Acta Crystallogr. D Struct. Biol. 74: 152–166. Shaw Stewart, P.D. and Mueller‐Dieckmann, J. (2014). Automation in biological crystallization. Acta Crystallogr. F Struct. Biol. Commun. 70: 686–696. Gorrec, F. (2014). Progress in macromolecular crystallography depends on further miniaturization of crystallization experiments. Drug Discov. Today 19: 1505–1507. Bruno, A.E., Charbonneau, P., Newman, J. et al. (2018). Classification of crystallization outcomes using deep convolutional neural networks. PLoS One 13: e0198883. Giege, R. (2017). What macromolecular crystallogenesis tells us – what is needed in the future. IUCrJ 4: 340–349. Gavira, J.A. (2016). Current trends in protein crystallization. Arch. Biochem. Biophys. 602: 3–11. Redecke, L., Nass, K., DePonte, D.P. et al. (2013). Natively inhibited Trypanosoma brucei cathepsin B structure determined by using an X‐ray laser. Science 339: 227–230. Sawaya, M.R., Cascio, D., Gingery, M. et al. (2014). 
Protein crystal structure obtained at 2.9 A resolution from injecting bacterial cells into an X‐ ray free‐electron laser beam. Proc. Natl. Acad. Sci. U. S. A. 111: 12769–12774. Boudes, M., Garriga, D., Fryga, A. et al. (2016). A pipeline for structure determination of in vivo‐grown

21

22

1 The Evolving Role of Structural Biology in Drug Discovery

234

235

236

237

238

239

240

crystals using in cellulo diffraction. Acta Crystallogr. D Struct. Biol. 72: 576–585. Rodriguez, J.A., Eisenberg, D.S., and Gonen, T. (2017). Taking the measure of MicroED. Curr. Opin. Struct. Biol. 46: 79–86. Nannenga, B.L. and Gonen, T. (2019). The cryo-EM method microcrystal electron diffraction (MicroED). Nat. Methods 16: 369–379. Patwardhan, A., Brandt, R., Butcher, S.J. et al. (2017). Building bridges between cellular and molecular structural biology. Elife 6. Nowak, R.P., DeAngelo, S.L., Buckley, D. et al. (2018). Plasticity in binding confers selectivity in ligand‐induced protein degradation. Nat. Chem. Biol. 14: 706–714. Maniaci, C. and Ciulli, A. (2019). Bifunctional chemical probes inducing protein-protein interactions. Curr. Opin. Chem. Biol. 52: 145–156. Stuart, D.I., Subramaniam, S., and Abrescia, N.G. (2016). The democratization of cryo‐EM. Nat. Methods 13: 607–608. Cowan‐Jacob, S.W., Jahnke, W., and Knapp, S. (2014). Novel approaches for targeting kinases: allosteric

241

242 243

244

245

246

inhibition, allosteric activation and pseudokinases. Future Med. Chem. 6: 541–561. Fabbro, D., Cowan‐Jacob, S.W., and Moebitz, H. (2015). Ten things you should know about protein kinases: IUPHAR Review 14. Br. J. Pharmacol. 172: 2675–2700. Ferguson, F.M. and Gray, N.S. (2018). Kinase inhibitors: the road ahead. Nat. Rev. Drug Discov. 17: 353–377. Pandurangan, A.P., Ascher, D.B., Thomas, S.E., and Blundell, T.L. (2017). Genomes, structural biology and drug discovery: combating the impacts of mutations in genetic disease and antibiotic resistance. Biochem. Soc. Trans. 45: 303–311. Dugger, S.A., Platt, A., and Goldstein, D.B. (2018). Drug development in the era of precision medicine. Nat. Rev. Drug Discov. 17: 183–196. Hauser, A.S., Chavali, S., Masuho, I. et al. (2018). Pharmacogenomics of GPCR drug targets. Cell 172: 41–54 e19. Warner, K.D., Hajdin, C.E., and Weeks, K.M. (2018). Principles for targeting RNA with drug‐like small molecules. Nat. Rev. Drug Discov. 17: 547–558.

23

2 A Structural View on Druggability
Experimental and Computational Approaches
Ursula Egner and Roman C. Hillig
Bayer AG, Pharmaceuticals, Research & Development, Berlin, Germany

Structural Biology in Drug Discovery: Methods, Techniques, and Practices, First Edition. Edited by Jean-Paul Renaud. © 2020 John Wiley & Sons, Inc. Published 2020 by John Wiley & Sons, Inc.

2.1 Introduction

Innovative and druggable targets are needed to fill the pipelines of the pharmaceutical industry and to reduce failure rates in clinical trials [1, 2]. The requirements for a good drug target depend on the therapeutic area, as specific indications represent diverse medical needs and require varying safety profiles. For example, the safety and tolerability specifications for targets in women’s health are more demanding than those for kinase inhibitors used to treat life‐threatening diseases such as cancer. Key properties of a good drug target include its ability to selectively bind drug‐like molecules in a suitable pocket, its disease‐modifying potential, its role under physiological conditions, the availability of biomarkers, the intellectual property situation, and many more [1–6]. Druggability assessments have been integrated into the target assessment process in many pharmaceutical companies as well as in drug discovery units of universities [1–5, 7–9].

The majority of current drug targets are G protein‐coupled receptors (GPCRs), nuclear receptors, ion channels, or enzymes such as kinases and proteases [10]. However, with the shift toward biology‐ and genomics‐based target identification, new types of targets such as nonenzymes and protein–protein interaction (PPI) targets have emerged in recent years [11–13]. Several initiatives across the scientific community aim to enrich and expand our knowledge about druggable targets in the human genome [14–19]. One of these is the “Illuminating the Druggable Genome” initiative of the US National Institutes of Health, in which data from different resources (genomic, proteomic, chemical, disease‐related) are systematically collected and processed to provide an in‐depth view of what is known and not known about targets, which may encourage research on less explored targets [14]. Another is the development of a collection of potent, selective, and broadly characterized small‐molecule modulators of protein function (chemical probes), an initiative of the Structural Genomics Consortium (SGC) together with partners in academia and the pharmaceutical industry [15, 16, 20]. Such probes, each paired with an inactive but structurally related compound as a control, make it possible to validate targets for drug discovery thoroughly.

A number of approaches and web servers have been published in recent years to predict the druggability of targets with well‐defined pockets. With nonenzymatic targets starting to enter the pipelines of pharmaceutical companies, druggability assessment approaches need to be expanded to include targets with shallow and often transient pockets. In our company, we aim to predict the molecular druggability of a target based on several aspects: (i) an assessment of the three‐dimensional (3D) structure of the target, including available co‐complex structures, (ii) an assessment of the availability of low‐molecular‐mass modulators for the target and of their drug‐likeness and selectivity profile, and (iii) an assessment of the internal experience with the target class and the target itself. These aspects, together with the envisioned timelines for lead generation, are summarized in a prediction of high, medium, or low druggability.

In our opinion, ligandability or bindability best describes what can be achieved by an automated procedure that predicts whether a pocket of a protein structure is suited to accommodate a low‐molecular‐weight compound. As already suggested by Edfeldt et al., ligandability of a target is a necessary prerequisite for its druggability [21]. However, since the term “druggability” is more intuitive, we will use it in this review instead of ligandability or bindability of a target. We find druggability scores very useful for a qualitative classification into more druggable and less druggable targets. These assessments find their way into portfolio prioritizations and may guide lead‐finding strategies. For less druggable targets, multiple parallel lead‐finding strategies may be initiated to increase the likelihood of success, resulting in potentially longer timelines and increased resource requirements for lead generation.

In this review we focus on literature published since 2008 to address recent developments in druggability assessment. We start with an overview of the varying interpretations of the term “druggability” in the literature, followed by a description of computational concepts that use structural information on proteins, as well as of experimental approaches to assess druggability for classical targets (e.g. enzymes and receptors) and for targets involved in PPIs.

2.2 Views on Target Druggability

The pioneering publication by Hopkins and Groom in 2002, which introduced the concept of the “druggable genome,” inspired scientists in both academia and the pharmaceutical industry to focus on research into druggable proteins and to explore the characteristics of drug targets. In their view, druggable targets comprise all proteins for which a drug‐like compound is known [22]. Extending this set to all known homologous proteins enabled an estimate of the druggable genome. Potential drug targets were then those druggable proteins that, in addition, could be linked to disease.

In recent years the term “druggability” has become very popular, but its meaning varies considerably between authors. Interpretations range from the likelihood of obtaining a safe, efficacious therapeutic agent for a protein target to the likelihood that a protein pocket can accommodate a (drug‐like) ligand or belongs to a protein class with known (drug‐like) ligands. Accordingly, related terms such as ligandability, bindability, and chemical tractability can be found in publications. Ligandability and bindability reflect the possibility of designing small‐molecule ligands that interact with the protein binding site. This is a necessary but not sufficient condition for druggability [21, 23, 24]. Druggability may also include an assessment of the position of a target in cellular pathways or of the behavior of proteins in interaction networks, which needs to be analyzed by a systems biology approach, or it may refer to the likelihood of finding orally bioavailable small molecules that bind to a particular target in a disease‐modifying way [21, 23–25]. Chemical tractability is described as a measure of the probability of achieving a useful level of potency for chemical modulation of the therapeutically relevant biochemical activity in vitro [26, 27]. Instead of a binary yes/no assessment, druggability is considered to be a continuum of chemical tractability [26]. Other studies emphasize that target druggability primarily addresses the ability of a protein pocket to accommodate a drug‐like ligand [28]. In our understanding, target druggability is the likelihood of finding a selective low‐molecular‐weight molecule that can bind with high affinity to the target and modify its activity [3].

Some researchers additionally analyze whether inhibitors are already known, as this shows that the target in question has a pocket that is accessible to and suitable for the binding of a small molecule. Moreover, if such known inhibitors bind primarily via enthalpy rather than entropy, the target is considered to have a higher druggability, as the binding pocket is able to establish specific interactions with the ligand [29]. Beyond a predicted suitable binding pocket, further categories of evidence indicate whether a target can be inhibited by a small molecule. A critical assessment should also cover whether endogenous small‐molecule ligands bind to the protein, whether tool compounds have been published, and whether compounds are already in clinical development. Such data increase the confidence that a druggable target has been identified [4, 7].

Along these lines, Bickerton et al. described compounds by a quantitative estimate of drug‐likeness (QED) and analyzed and ranked targets according to the mean QEDs of their associated ligands [26]. The majority of drug targets (70%) were found in the top 50% of the prioritized target list. In an analysis of in vitro bioactivity data from literature and patents, 1654 human targets were identified and ranked by assayed compounds and molecular scaffolds. The target list included non‐targets, failed targets, and cross‐screening targets. The top 278 most actively pursued targets (17%) contained 90% of the compounds [27]. These targets were associated with high chemical tractability. In addition, the authors associated druggability with the likelihood of developing compounds with appropriate in vivo properties (bioavailability, efficacy, and safety profiles) and showed that these two characteristics were related: a high chemical tractability of a target in vitro may translate to a higher druggability, i.e. a higher probability of obtaining compounds with in vivo activity [27].

In a detailed review of applications and perspectives of druggability, Abi Hussein et al. propose a protocol that combines different assessments [9]. First, the importance of the targeted protein is validated with a systems biology approach; next, relevant pockets are characterized by machine learning methods; finally, druggability is assessed by virtual screening methods. The authors emphasize that the complementarity of the three approaches gives a better overall assessment of the druggability of a target protein. They also provide a graphical overview of the advancement of druggability prediction methods in relation to the evolution, over time, of the datasets used for method development and of the criteria for drug‐likeness of inhibitors.
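The ranking of targets by the mean QED of their associated ligands, described above for Bickerton et al., can be illustrated with a minimal Python sketch. It assumes that a QED value has already been computed for every ligand (for instance with a cheminformatics toolkit). The target names, QED values, and set of known drug targets are invented for illustration, and the function names are hypothetical rather than taken from the cited work.

```python
from statistics import mean

# Hypothetical QED values (0 = least drug-like, 1 = most drug-like) for the
# ligands associated with each target. All names and numbers are invented.
ligand_qeds = {
    "target_A": [0.81, 0.74, 0.69],
    "target_B": [0.35, 0.42],
    "target_C": [0.66, 0.71, 0.58, 0.62],
    "target_D": [0.22, 0.31, 0.27],
}
# Hypothetical set of targets assumed to be established drug targets.
known_drug_targets = {"target_A", "target_B", "target_C"}


def rank_targets_by_mean_qed(qeds_by_target):
    """Return (target, mean QED) pairs sorted from highest to lowest mean."""
    scored = [(t, mean(v)) for t, v in qeds_by_target.items() if v]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def share_in_top_fraction(ranking, subset, fraction=0.5):
    """Fraction of `subset` that falls within the top `fraction` of the ranking."""
    n_top = max(1, round(len(ranking) * fraction))
    top = {target for target, _ in ranking[:n_top]}
    return len(top & subset) / len(subset)


ranking = rank_targets_by_mean_qed(ligand_qeds)
for target, score in ranking:
    print(f"{target}: mean QED = {score:.2f}")
print(f"{share_in_top_fraction(ranking, known_drug_targets):.0%} "
      "of the assumed drug targets are in the top half of the list")
```

Applied to real ligand sets, the same two steps would show whether established drug targets concentrate in the upper half of the prioritized list, which is the kind of enrichment reported in [26].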

2.3 In Silico Methods for Druggability Assessment of Targets with Well‐defined Pockets

[Figure 2.1 graphic. Panel titles: (a) Site identification (numerous algorithms available); (b) Description of properties, e.g. shape, hydrophobicity (color scale for hydrophobicity from 3.00 to –3.00); (c) Dataset of druggable and undruggable targets, with example columns Protein (cyclooxygenase 1, c-abl kinase, factor Xa, progesterone receptor, phosphodiesterase 5, beta2-adrenoreceptor), Drug (aspirin, imatinib, rivaroxaban, mifepristone, sildenafil, salbutamol), and PDB entry (1eqg, 1iep, 2w26, 2w8y, 1tbf, 2rh1); (d) Algorithm for druggability description, dominated by machine learning techniques using sequence or structure-related properties, shown as a plot of druggable and undruggable targets along the 1st, 2nd, and 3rd principal components.]
Figure 2.1 General approach for druggability predictions based on structural information of the target. (a) Potential binding sites on the surface of a given protein structure, shown in green, are identified using available cavity detection algorithms (left panel: factor Xa from co‐crystal structure with rivaroxaban, PDB entry 2w26; right panel: protein kinase B‐Raf from co‐crystal structure with sorafenib, PDB entry 1uwh). (b) Descriptors for molecular properties of the detected binding sites such as volume, depth, hydrophobicity, etc. are set up. As an example, the binding site of sorafenib in B‐Raf kinase (PDB entry 1uwh) is depicted as molecular surface and color‐coded by hydrophobicity. (c) A dataset of crystal structures of druggable and undruggable proteins is compiled and split into a training set and a test set (see, e.g. Perola et al. [30] for an example dataset, which was also used for this figure). (d) A druggability prediction algorithm is developed by using the training set to analyze which combination of descriptors is most suitable to correctly assign druggable versus undruggable targets. This algorithm is then tested for its predictive power using the test set. (See insert for color representation of the figure.)
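Steps (c) and (d) of Figure 2.1 can be illustrated with a small Python sketch under stated assumptions. The descriptor values and labels are invented, and the nearest‐centroid classifier is used here only as a simple stand‐in; it is not the method of any of the programs discussed in this chapter.

```python
from statistics import mean, pstdev

# Hypothetical descriptor vectors: (volume in Å^3, enclosure, hydrophobicity).
# All numbers are invented; label 1 = druggable, 0 = undruggable.
DATASET = [
    ((900.0, 0.80, 0.5), 1), ((750.0, 0.75, 0.8), 1),
    ((1000.0, 0.85, 0.3), 1), ((820.0, 0.70, 0.6), 1),
    ((300.0, 0.40, -0.9), 0), ((250.0, 0.35, -1.2), 0),
    ((400.0, 0.50, -0.5), 0), ((350.0, 0.45, -1.0), 0),
]

def split(dataset, test_every=4):
    """Deterministic split: every `test_every`-th entry goes to the test set."""
    train = [d for i, d in enumerate(dataset) if (i + 1) % test_every != 0]
    test = [d for i, d in enumerate(dataset) if (i + 1) % test_every == 0]
    return train, test

def fit(train):
    """Learn per-descriptor scaling and one centroid per class from the training set."""
    columns = list(zip(*(x for x, _ in train)))
    mu = [mean(c) for c in columns]
    sigma = [pstdev(c) for c in columns]

    def scale(x):
        return [(v - m) / s for v, m, s in zip(x, mu, sigma)]

    centroids = {}
    for label in {y for _, y in train}:
        rows = [scale(x) for x, y in train if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return scale, centroids

def predict(x, scale, centroids):
    """Assign the class whose centroid is closest in scaled descriptor space."""
    z = scale(x)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(z, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))

train, test = split(DATASET)
scale, centroids = fit(train)
accuracy = sum(predict(x, scale, centroids) == y for x, y in test) / len(test)
print(f"test-set accuracy: {accuracy:.2f}")
```

The scaling parameters are derived from the training set only, so that no information from the test set enters model development.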

Since 2008, research relating to druggability predictions has focused on the exploration of 3D structural information of proteins to characterize binding pockets by geometrical and physicochemical criteria and assess their ability to accommodate low‐molecular‐weight compounds. In principle, there are two different in silico approaches to develop druggability models: (i) use of pocket descriptors in scoring functions to discriminate druggable from undruggable targets (see Table 2.1 for prediction programs and references therein) and (ii) a prediction of the maximal affinity that a drug‐like molecule could attain for a binding site [31, 41]. There are a number of recent reviews in relation to druggability predictions emphasizing the significance of druggability assessments in research [8, 9, 23, 30, 42–46]. The development of models for druggability prediction crucially depends on the evaluation of binding sites and the identification of appropriate pocket descriptors. We will therefore describe these methods in more detail. In general, the principal steps in druggability predictions on the basis of pocket descriptors are (i) identification of possible binding sites, (ii) selection of a set of property descriptors to characterize these sites, (iii) generation of a dataset of druggable and undruggable proteins necessary


2 A Structural View on Druggability

Table 2.1 Overview of druggability assessment methods of targets with well‐defined pockets since 2008 according to their year of publication.

| Reference/software | Pocket identification | Pocket descriptors | Training set of protein structures | Scoring method |
| --- | --- | --- | --- | --- |
| Halgren, SiteMap/Dscore [31] | Geometry and energy based | Hydrophilicity, volume/size, enclosure | 17 druggable, 6 difficult, 4 undruggable targets | Weighted sum of pocket descriptors |
| Weisel et al. [28] | Geometry based | Shape descriptors | 721 crystal structures | Self‐organizing maps |
| Schmidtke and Barril, Fpocket [32] | Geometry based | Hydrophobicity, polarity | NRDD: 45 druggable, 5 prodrug, and 20 undruggable | Logistic model using pocket descriptors |
| Sheridan et al., DLID [33] | Geometry based | Hydrophobicity, volume, buriedness | ~290 000 pockets from ~42 000 crystal structures | Linear regression of pocket descriptors |
| Krasowski et al., DrugPred [8] | Ligand binding site from PDB entry | 56 descriptors capturing polarity, size, compactness, and amino acid composition | NRDLD: 71 druggable targets, 44 less druggable targets | Discriminant analysis |
| Volkamer et al., DoGSiteScorer [34] | Geometry based | Mainly hydrophobicity, volume, shape | NRDD | (1) SimpleScore: linear regression of pocket descriptors; (2) SVM |
| Desaphy et al., VolSite/Shaper [35] | Geometry based | 73 descriptors including hydrophobicity, volume, etc. | NRDD | SVM |
| Perola et al. [30] | Ligand binding site from PDB entry | 5 property ranges: volume, depth, enclosure, % charged, hydrophobicity | 60 drug targets, reference set: 440 targets | Comparison with druggable property ranges |
| DrugEBIlity (ChEMBL resource) [36] | Geometry based | Hydrophobicity, polarity, volume, enclosure | 400 nonredundant proteins | Ensemble of decision tree and SVM |
| Yuan et al., CavityDrugScore [23] | Geometry based | Hydrophobicity, H‐bonding, volume, enclosure | NRDD | Weighted sum of pocket descriptors |
| Liu and Altman, DrugFEATURE [37] | Geometry based | 80 descriptors for microenvironments relating to geometry and physicochemistry | Training sets from Hajduk et al. [93] and Cheng et al. [81] | Frequency of microenvironments matched in druggable subsites normalized by size of query site |
| Hussein et al., PockDrug [40] | Geometry based | 52 descriptors relating to geometry and physicochemistry | NRDLD, NRDD | Combination of 7 models from linear discriminant analysis of pocket descriptors based on 4 different pocket identification methods |

for model generation and validation, and (iv) development of druggability models based on the identified descriptors and the datasets and analysis of their ability to differentiate between druggable and undruggable targets (Figure 2.1). Table 2.1 gives an overview of the approaches available since 2008 for druggability assessments of targets with well‐defined binding sites. Besides these approaches, other methodologies have been developed to identify druggable pockets, such as reverse docking techniques, computational solvent mapping, molecular dynamics (MD) calculations, or hot spot identification. These topics will be introduced in more detail in later sections, as some of these techniques find their application especially in the field of PPIs with their flat, shallow, and in some cases adaptable binding sites. First, we would like to introduce the general approach for druggability assessments of targets with well‐defined pockets, following the individual steps in such an assessment.

2.3.1 Binding Site Identification

A large number of different approaches are available for pocket identification, from geometry‐ or energy‐based methods to computational solvent mapping of small chemical probes or fragment‐like compounds or sequence‐based methods, many of which were developed before 2008. There are a number of recent reviews on the methodology of the different algorithms and their accessibility, including software developed before 2008, as well as the assessment of binding site


prediction methods, and the reader is referred to this literature for detailed discussions [23, 44, 46–53]. Ideally the binding site can readily be identified in available co‐crystal structures. Otherwise, at least for enzymes, the largest cavity is most often identical with the active site and can be readily determined from the structures. Geometry‐based pocket identification methods utilize the fact that pockets are shaped as concave cavities and therefore apply criteria such as size, volume, depth, enclosure, buriedness, etc. for pocket detection. Energy‐based methods seek to detect pockets by computing interaction energies between protein atoms and chemical probes based on the assumption that the energetic properties of binding sites can be readily distinguished from the rest of the protein. A comparison of geometry‐ and energy‐based methods showed that both methods correctly identified a similar amount of binding sites in apo and holo structures (around 95%) within the five best ranked binding pockets [54]. Larger variations among geometry‐based pocket prediction methods were observed in another study by Volkamer et al. [46]. The authors found variations between 78 and 94% for true active site predictions from holo structures for pocket identification within the three largest predicted pockets. The advantages of geometry‐based prediction methods over energy‐based ones are faster calculation times and robustness against missing atoms in Protein Data Bank (PDB) structures [32]. Another approach of binding site identification is solvent mapping by computational means. The idea stems from the observation that fragment‐sized compounds bind to preferred positions in the protein structure as identified by NMR or X‐ray crystallography [55–57]. Such energetically favorable positions on the protein surface are identified in silico by testing organic solvent molecules of different size and shape. 
Regions where several probes are predicted to bind often coincide with hot spots and often represent part of a ligand binding site [39, 58–67]. Computational solvent mapping has often been applied for binding site identification of PPI targets but can as well be used for classical targets [43, 49, 68, 69]. An approach to identify pockets that open up in the presence of ligands was introduced by Cimermancic et al. [70]. From a nonredundant set of pairs of ligand‐free and ligand‐bound protein structures, the authors derived a set of 92 binding pockets, 84 so‐called cryptic binding sites, and 705 concave surface patches. Initially, a set of 58 feature descriptors relating to physicochemical properties, geometry, and sequence conservation and protein dynamics scores relating to pocket size were selected to compare the characteristics of the individual pockets, followed by machine learning approaches to identify the most predictive model. The three features most relevant for the identification of cryptic pockets were pocket

formation in MD calculations, sequence conservation, and the likelihood of binding small‐molecule fragments derived from docking calculations. Cryptic pockets were identified in proteins such as protein tyrosine phosphatase 1B (PTP1B) and TEM beta‐lactamase but also in PPI sites such as interleukin 2 (IL‐2), Bcl‐XL, ZipA, and others. In the case of PTP1B, the authors experimentally validated a known allosteric pocket by NMR spectroscopy. A previously reported ligand, ABDF, attaches covalently to a cysteine residue near the allosteric site. More such applications would be beneficial to demonstrate the predictivity of the software (https://modbase.compbio.ucsf.edu/cryptosite). The identification of cryptic sites such as allosteric pockets or transient pockets in protein–protein interfaces may lead to additional opportunities for drug discovery. For a review on the progress in computational methods to predict and identify cryptic and allosteric binding sites, the reader is referred to recent literature [52, 70–74]. However, whether these sites are relevant for disturbing protein function needs to be proven experimentally. An annotated archive of druggable binding sites extracted from the PDB is provided with sc‐PDB [75, 76]. In the sc‐PDB, cofactors, peptides, and putative ligands were considered as ligands, whereas buffer compounds, crystallization agents, and sugars were discarded. Other filters included the presence of a certain number of amino acid residues in the binding site, molecular mass of the ligands, buried surface area, and pocket size. The sc‐PDB database release 2017 contained 16 034 entries corresponding to 4782 different proteins and 6326 different ligands. 
Binding site entries in another resource of druggable cavities, the “Pocketome” database, are listed as druggable pockets from the PDB when they correspond to a small‐molecule binding site in a protein that (i) has an entry in the reviewed part of the UniProt Knowledgebase (http://www.uniprot.org/help/uniprotkb), (ii) has been co‐crystallized in complex with at least one drug‐like small molecule, and (iii) is represented in at least two PDB entries [77]. The “Pocketome” collection currently comprises 3243 entries (http://pocketome.org, last updated 29 December 2015). Other databases related to cavities on proteins have been assembled, but they do not feature druggability assignments [44, 45, 53]. Binding site identification methods for druggability assessments cover a broad range, from extracting the ligand binding site from co‐crystal structures to the application of geometry‐ and energy‐based methods (Table 2.1). This already points to one of the caveats in druggability prediction: different algorithms recognize and judge pockets differently. These differences propagate into different pocket descriptors and will subsequently influence prediction results.
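The geometric notion of buriedness used by geometry‐based pocket detection in this section can be made concrete with a toy Python sketch. It is not the algorithm of any program listed in Table 2.1: a probe point is scored by the fraction of 26 ray directions that are blocked by an atom within a cutoff distance. The atom radius, the cutoff, and the artificial cup‐shaped set of atoms are assumptions of this example.

```python
import math
from itertools import product

# 26 ray directions: all combinations of -1/0/1 per axis except the zero vector.
DIRECTIONS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]

def ray_hits_atom(origin, direction, atom, radius, max_dist):
    """True if the ray from `origin` along `direction` passes within `radius`
    of the atom centre at a distance of at most `max_dist` from the origin."""
    norm = math.sqrt(sum(c * c for c in direction))
    unit = [c / norm for c in direction]
    rel = [a - o for a, o in zip(atom, origin)]
    t = sum(r * u for r, u in zip(rel, unit))       # projection onto the ray
    if t <= 0 or t > max_dist:
        return False
    closest_sq = sum(r * r for r in rel) - t * t    # squared ray-to-centre distance
    return closest_sq <= radius * radius

def buriedness(point, atoms, radius=1.8, max_dist=8.0):
    """Fraction of ray directions from `point` that are blocked by any atom."""
    blocked = sum(
        any(ray_hits_atom(point, d, atom, radius, max_dist) for atom in atoms)
        for d in DIRECTIONS
    )
    return blocked / len(DIRECTIONS)

# Artificial "cup" of atoms around the origin that is open toward +z.
cup = [tuple(4.0 * c for c in d) for d in DIRECTIONS if d[2] <= 0]

inside = buriedness((0.0, 0.0, 0.0), cup)     # probe sitting in the cup
outside = buriedness((0.0, 0.0, 20.0), cup)   # probe far out in the solvent
print(f"inside: {inside:.2f}, outside: {outside:.2f}")
```

Changing the radius, the cutoff, or the number of ray directions changes which points count as buried. This mirrors the caveat above that different algorithms delineate the same pocket differently.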


2.3.2 Selection of Descriptors and Datasets for Method Development

A series of datasets of druggable and less druggable targets have been generated to develop druggability prediction methods. They differ in size and composition (see Table 2.1) and have been used to deduce and test properties (descriptors) that characterize the binding sites, e.g. in terms of geometrical and physicochemical considerations. Ideally a dataset for druggability prediction should cover targets from as many different protein families as possible. The dataset should be of reasonable size so that it can be split up into an adequate training and validation set. The data should be carefully selected to prevent overfitting. The selection of a suitable set of druggable targets is a straightforward task making use of positive reports from successful screening campaigns in the pharmaceutical industry and exploring databases such as DrugBank, PDBbind, bioassayR, DGIdb, or the Astex diverse set of complex structures with drug‐like ligands [17, 38, 78–80]. As the PDB is the source of structural coordinates, the selection of targets is biased to those that are experimentally amenable to structure determination to a resolution of 2.5 Å or better. Other targets like GPCRs or ion channels are more challenging to crystallize and are underrepresented in the datasets of druggable targets. As the number of GPCR and other membrane protein structures with sufficient resolution has been increasing in recent years, it would be worthwhile to analyze how the inclusion of membrane protein binding sites in the training sets for the druggability predictions would alter the assessments. Assembling a dataset of less druggable or undruggable targets is much more difficult. Various approaches have been followed: a target may be deemed undruggable after the target has been explored by several lead identification campaigns in the pharmaceutical industry without identifying any hits suitable for lead optimization. 
Low druggability is also associated with a low hit rate in an experimental fragment screen, but here the hit rate is influenced by the biophysical screening method and the fragment library [57]. However, advances in library composition and/or screening methods might prompt a change in classification from less druggable targets to druggable ones. This was the case in recent years for PPI targets that were deemed undruggable 10 years ago. As of today compounds in clinical trials were reported for at least six PPI targets [12]. Although inhibitors might have been identified for such targets, from a pharmaceutical point of view, they will remain difficult targets as increased resources and timelines will be needed to identify alternative lead compounds. It has to be emphasized that there is no metric available that conclusively proves a target to be undruggable.
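The dataset requirements stated at the beginning of this section (coverage of many protein families, a split into training and validation sets, and protection against overfitting) suggest keeping whole protein families together when splitting. The following Python sketch shows one way to do this. The records, family names, and labels are placeholders, and the function `family_split` is hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical (target, protein family, label) records; label 1 = druggable,
# 0 = less druggable. Names and labels are not taken from any published dataset.
RECORDS = [
    ("kinase_1", "kinase", 1), ("kinase_2", "kinase", 1), ("kinase_3", "kinase", 1),
    ("nhr_1", "nuclear receptor", 1), ("nhr_2", "nuclear receptor", 1),
    ("protease_1", "protease", 1), ("protease_2", "protease", 0),
    ("ppi_1", "PPI", 0), ("ppi_2", "PPI", 0),
    ("phosphatase_1", "phosphatase", 0),
]

def family_split(records, validation_fraction=0.3, seed=0):
    """Assign whole protein families to either the training or the validation set."""
    by_family = defaultdict(list)
    for record in records:
        by_family[record[1]].append(record)
    families = sorted(by_family)
    random.Random(seed).shuffle(families)
    train, validation = [], []
    target_size = validation_fraction * len(records)
    for family in families:
        # Fill the validation set first, then send the remaining families to training.
        bucket = validation if len(validation) < target_size else train
        bucket.extend(by_family[family])
    return train, validation

train_set, validation_set = family_split(RECORDS)
print("training families:  ", sorted({r[1] for r in train_set}))
print("validation families:", sorted({r[1] for r in validation_set}))
```

Because close homologs never appear on both sides of the split, the validation result is less likely to reflect memorized family‐specific pocket features.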

Some of the authors of druggability models in Table 2.1 enriched their datasets of undruggable targets by decoy pockets. Decoy pockets are pockets of a certain size, e.g. greater than 100 Å³, which contain no ligands. Training of druggability models using decoy pockets introduces an unwanted structural bias, as the pockets are separated by size rather than by their ability to bind a drug molecule. Overall, the size of the dataset of druggable targets varies between approximately 40 and 70 targets between the approaches listed in Table 2.1 as compared with the considerably smaller size of the undruggable targets of 4 to approximately 20. The diverse composition and size of the datasets will result in differences in pocket descriptor space and thus may give rise to diverging assessments in druggability predictions. In one of the first model generations for druggability predictions, Halgren [31] used the relatively small dataset of 17 druggable, 6 difficult/less druggable, and 4 undruggable targets of Cheng et al. [81]. Since then, authors based their analyses and developments on larger datasets. Weisel et al. filtered their dataset from the refined set from PDBbind with a preference for monomeric structures or those that could be converted to monomers (Set A: 909 structures, Set B: 98 complexes from the PDBbind core set of unrelated entries) [28]. Sheridan et al. analyzed the entire set of entries in the PDB (version 09/2008), considering all protein structures with a resolution ≤3.0 Å, resulting in ~42 000 protein structures and ~290 000 pockets [33]. Two datasets, the nonredundant druggability dataset (NRDD) and the nonredundant set of druggable and less druggable (NRDLD) binding sites, were developed in recent years by Schmidtke and Barril [32] and Krasowski et al. [8] and were selected as the basis for the development of druggability models by other researchers. 
In the NRDD of Schmidtke and Barril, druggable targets were selected when an X‐ray structure in complex with a marketed oral drug was available. In addition, visual inspection was applied to classify targets as druggable, difficult (e.g. in the case of prodrugs), or undruggable (e.g. in case of ligands with non‐drug‐like properties). This selection resulted in 45 druggable targets, 5 prodrug targets, and 20 undruggable targets (for which only non‐drug‐like ligands were known). The NRDLD dataset of Krasowski et al. was compiled by evaluating literature, DrugBank, the dataset of Cheng et al., and structures from the Astex diverse set [8, 79, 81]. The dataset is composed of 115 targets of which 44 targets were classified as less druggable. These targets were filtered out from the PDBbind, DrugBank, and ChEMBL databases [36, 80, 82–84]. Perola et al. mined the PDB and a number of related databases (MDDR, DrugBank, Integrity, and others) for drug targets ([30] and references therein). Sixty targets with a single ligand in the drug binding pocket were


selected. A reference dataset of 440 structures was compiled using the PDBbind database. A further dataset of 11 structurally known targets was generated with targets with compounds in clinical phase III trials, 9 of which were associated with orally administered drugs. Not only composition and size vary between the published datasets but also the assignments of targets as druggable or undruggable. Due to the different interpretation of druggability, the classification of targets may differ between the various datasets. For example, in the NRDD dataset of Schmidtke and Barril, β‐lactamases are considered to be druggable, whereas Krasowski et al. classify them as less druggable as reported drugs for this target bind covalently to the protein and no drug‐like inhibitors were reported from high‐throughput screening (HTS) campaigns [8]. In general, the composition of all drug target datasets is biased toward kinases and nuclear hormone receptors, while GPCRs and ion channels are underrepresented. Astonishingly, the derived conclusions from these recent datasets regarding pocket descriptors do not differ considerably from those established before 2008 ([85] and references in [3]). The new data confirms former analyses that characteristics which distinguish druggable pockets from undruggable sites are often dominated by properties like hydrophobicity and enclosure whereas polar interactions do not contribute significantly to the description of druggable pockets [30, 31, 44, 50, 86]. In some cases, very high numbers of pocket descriptors were initially proposed for model development, while the descriptors identified as significant at the end of the studies were often only very few. A case in point is the study of Desaphy et al. who used 73 binding site descriptors reflecting properties such as volume, hydrophobicity, aromaticity, polarity, and accessibility as input values for a binary classification model using support vector machines (SVMs) [35]. 
Up to now, no single pocket descriptor could be identified that was able to distinguish druggable from undruggable pockets. Polar interactions play an important role in binding affinity and specificity, and in some of the models, polar descriptors contributed to method development. The most recent systematic analysis of Perola et al. [30] identified the following property ranges for druggable pockets (obeyed by 90% of the drug target set): volume ≥500 Å³, depth ≥10.4 Å, enclosure ≥0.28, percentage of charged residues ≤26.3, and hydrophobicity ≥–1.12. Only 57% of the binding pockets in their diverse set fall into this property range. Volkamer et al. [86] identified average volumes of approximately 900 Å³ for druggable pockets, whereas undruggable ones have an average size of approximately 300 Å³. Depth was another distinguishing descriptor as well as enclosure with undruggable pockets being more solvent exposed. When analyzing a

slightly modified extended druggability dataset of 901 entries from Schmidtke and Barril (649 druggable, 208 difficult, 44 undruggable), Volkamer et al. observed that the size and shape of difficult pockets resembled those of druggable ones, whereas the physicochemical properties show a higher similarity to undruggable cavities. This is in agreement with an earlier analysis of Halgren, who showed that size and enclosure are characteristics that allow separating druggable from undruggable pockets, while hydrophobicity is suitable to distinguish between difficult and undruggable ones [31]. Pockets suitable to bind small‐molecule compounds tend to be larger and more buried than undruggable or empty pockets [28, 86]. The sequence composition of active site pockets is more strongly conserved than in other areas, with Tyr, Trp, and Gly particularly abundant in conserved sites [87].

2.3.3 Development of Druggability Models

The pocket descriptors derived from the druggable and undruggable datasets form the basis for the development of a prediction model. The methods to derive a scoring metric for druggability assessment are manifold, from simple regression equations and comparison with property ranges to applications of SVMs. In publications on new methods for druggability prediction, the authors often compare their approach with previously published ones to demonstrate the superiority of the newly developed method. One of the earlier methods was published in 2009 by Halgren and employed an enhanced version of the earlier developed SiteScore equation for identifying binding sites as implemented in the SiteMap software [31]. Using a dataset of 528 proteins from PDBbind complexed to ligands with drug‐like properties, Halgren showed that with the SiteMap software the success rate of identifying the binding site as the top‐ranked site was 86%, which increased to 98% when only binding sites with ligands binding at less than 1 nM were considered [31]. 
Based on these results, Halgren advanced SiteScore into Dscore, which classifies druggability by applying an equation that includes positive terms for volume and enclosure and a negative contribution of hydrophilicity. When used on the dataset of Cheng et al. (27 proteins, 22 of which had marketed drugs or drug‐like candidates published and 5 targets had none and were classified undruggable), Dscore correlated well with the earlier classification of druggable, difficult, and undruggable targets (Dscore values 0.98 for druggable ones and values in between for difficult targets) [31, 81]. Loving et al. selected in their study protein structures above a minimal Dscore threshold subjecting them to flexible protein docking and modeling accounting for light protein backbone movement and side‐chain flexibility [67].
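The sign structure of such a weighted‐sum score (positive terms for size and enclosure, a negative contribution of hydrophilicity) can be sketched in Python as follows. The weights and descriptor values are hypothetical and chosen only for illustration; they are not the published Dscore coefficients, and the linear size term is an assumption of this sketch.

```python
# Hypothetical weights that only reproduce the sign structure described in the
# text. They are NOT the coefficients of the published Dscore equation.
W_SIZE = 0.1
W_ENCLOSURE = 0.6
W_HYDROPHILICITY = 0.3

def weighted_sum_score(size, enclosure, hydrophilicity):
    """Weighted sum of three pocket descriptors (illustrative form only)."""
    return (W_SIZE * size
            + W_ENCLOSURE * enclosure
            - W_HYDROPHILICITY * hydrophilicity)

# Invented descriptor values for two contrasting pockets.
buried_lipophilic = weighted_sum_score(size=10.0, enclosure=0.8, hydrophilicity=0.9)
shallow_polar = weighted_sum_score(size=4.0, enclosure=0.5, hydrophilicity=1.5)
print(f"buried, lipophilic pocket: {buried_lipophilic:.2f}")
print(f"shallow, polar pocket:     {shallow_polar:.2f}")
```

With any weights of this sign pattern, a larger and more enclosed pocket scores higher, and increasing hydrophilicity lowers the score.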


The authors validated their approach with the test set from Cheng et al. [81] as well as six PPI targets depicted by Wells et al. [88] and extended their analysis to mammalian structures in the PDB (~18 000 structures relating to ~5500 targets). When flexibility modeling was applied, a third of the structures (~5700 corresponding to ~1100 targets) were found to be potentially druggable with a Dscore of greater than 1.7. Although accurate modeling of protein flexibility is still a subject of research, this approach may provide additional hypotheses of potentially druggable pockets that need to be experimentally validated. Using SiteMap/Dscore+ for protein–drug complexes in the PDB, Lucas and Günther analyzed potential correlations of the calculated druggability of these structures with the characteristics of the corresponding drugs [89]. The authors demonstrated that the molecular complexity of drugs, i.e. the number of chiral carbon atoms in a compound, is increased in protein cavities with lower druggability. This is not unexpected as polar cavities of lower druggability impose geometric constraints that can only be satisfied when appropriately decorated compounds bind to such sites. The larger NRDD dataset of Schmidtke and Barril was extended by 354 decoy pockets and used for training and validation purposes applying a logistic model to develop the scoring function [32]. When a druggability score threshold of 0.7 was applied, Schmidtke and Barril could identify 28 out of 45 druggable targets in the NRDD as being druggable, and 4 out of 20 undruggable targets were false positives with a mean druggability score of ≥0.7. Schmidtke and Barril compared their Fpocket algorithm with that of SiteMap (Dscore) and concluded that both methods are similarly capable of retrieving druggable pockets from structurally known proteins when using the NRDD dataset. 
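The counts quoted above for Fpocket at a score threshold of 0.7 (28 of 45 druggable targets recovered, 4 of 20 undruggable targets scored as druggable) can be converted into standard classification metrics. The short Python sketch below does this arithmetic; leaving out the five prodrug targets of the NRDD is an assumption of this example.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard metrics derived from the four cells of a confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),   # fraction of druggable targets recovered
        "specificity": tn / (tn + fp),   # fraction of undruggable targets rejected
        "precision": tp / (tp + fp),     # fraction of positive calls that are correct
        "accuracy": (tp + tn) / total,
    }

# Counts quoted in the text; the 5 prodrug targets are not included here.
metrics = classification_metrics(tp=28, fn=45 - 28, fp=4, tn=20 - 4)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

On these numbers the sensitivity is about 0.62 and the specificity 0.80, so at this threshold the score misses druggable targets more often than it accepts undruggable ones.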
An even larger dataset of approximately 5700 pockets containing a drug‐like ligand out of the approximately 290 000 pockets in the PDB was used by Sheridan et al. to define a drug‐like density (DLID) metric [33]. For a given cavity this metric reflects how many similar binding pockets are available that bind drug‐like ligands. This is set in relation to the total number of similar pockets. In addition, simple pocket descriptors could be linked to DLID values via an equation containing volume, buriedness, and hydrophobicity. Sheridan et al. calculated a median DLID value for all targets in the original dataset of Cheng et al. taking into account all available structures of a protein at a cutoff value of 90% sequence identity [33]. DLID values compare well with the results obtained earlier by Cheng et al. with a correlation coefficient of 0.84. The correlation coefficient for Dscore and Fpocket is less pronounced (0.67 and 0.65, respectively).
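The idea behind the DLID metric, relating the number of similar pockets that bind drug‐like ligands to the total number of similar pockets, can be sketched as follows. The exact published definition is not reproduced here: the comparison against the database‐wide fraction and the log‐ratio form are assumptions of this sketch, and the neighbor counts in the example are invented. Only the two database‐wide counts are taken from the text.

```python
import math

# Approximate database-wide counts quoted in the text.
N_POCKETS_TOTAL = 290_000
N_POCKETS_DRUGLIKE = 5_700

def druglike_fraction(n_similar_druglike, n_similar_total):
    """Fraction of pockets similar to the query that hold a drug-like ligand."""
    return n_similar_druglike / n_similar_total

def log_enrichment(n_similar_druglike, n_similar_total):
    """log10 of the local drug-like fraction over the database-wide fraction.
    The log-ratio form is an assumption of this sketch."""
    if n_similar_druglike == 0:
        return float("-inf")
    background = N_POCKETS_DRUGLIKE / N_POCKETS_TOTAL
    local = druglike_fraction(n_similar_druglike, n_similar_total)
    return math.log10(local / background)

# Invented example: 50 of 200 pockets similar to the query bind drug-like ligands.
print(f"local fraction: {druglike_fraction(50, 200):.2f}")
print(f"log enrichment: {log_enrichment(50, 200):.2f}")
```

A value above zero means that drug‐like ligands are found more often among the neighbors of the query pocket than in the database as a whole.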

Krasowski et al. developed the DrugPred druggability prediction by using the NRDLD dataset and 56 descriptors, which were fed into a model derived by partial least squares projection to latent structures discriminant analysis (PLS‐DA) [8]. Relevant descriptors for the model were those relating to size, polarity, and hydrophobicity. The authors decided against the use of decoy pockets as their use decreased the predictivity of the model. When applying DrugPred to the NRDLD structures and those of Cheng et al., the prediction accuracy was reported to be 0.89 and 0.81, respectively. When using Fpocket for druggability prediction of the NRDLD dataset, the accuracy was found to be lower (0.69 instead of 0.89). When applying DrugPred to the smaller dataset of NRDD of Schmidtke and Barril, the predictions became worse, an indication that the size of the training set and its composition are important in developing druggability prediction models. A different approach was taken by Yuan et al., who selected 210 complexes from the PDBbind and plotted the affinities of the associated ligands against a CavityScore value, a term that is related to cavity volume and surface area [23]. From the linear relationship between both parameters, Yuan et al. defined a CavityDrugScore value covering parameters for volume, hydrophobicity, and hydrophilicity as well as enclosure. Druggable proteins in the NRDLD dataset were successfully separated by using a CavityDrugScore of −180. The prediction accuracy of CavityDrugScore was described by the authors as worse than the accuracy from DrugPred, but better than that of Fpocket. For the validation set, accuracies of 0.82, 0.89, and 0.69 were obtained for CavityDrugScore, DrugPred, and Fpocket, respectively. By training an SVM on the basis of the NRDLD dataset, Desaphy et al. calculated the accuracy of their method as being 0.80 [35]. 
Using 37 novel druggable and undruggable cavities, the authors calculated the accuracy of their method as 0.89, which is comparable with DrugPred (0.89), but higher than SiteMap and Fpocket (0.65 and 0.73, respectively). Two false negatives were detected in the druggable dataset: (i) DNA gyrase, most likely due to the polar yet buried binding site, and (ii) thymidine phosphorylase, which all methods failed to predict as druggable. The training of global and local pocket descriptors on SVMs was also reported for DoGSiteScorer [34, 90]. Volkamer et al. compared their prediction method with that of Fpocket and SiteMap and observed that all three prediction models have similar accuracies when applied to the NRDD datasets including decoy pockets. A second scoring function, SimpleScore, is a linear combination of volume, enclosure, and lipophilicity, which gave similar results as compared with Dscore in SiteMap. The SVM‐based score showed the highest performance when


subpockets were considered. By applying DoGSiteScorer to the pocketome of human protein kinases with the DFG‐in conformation, Volkamer et al. could identify target kinases with known approved drugs as top scorer [91]. For those protein kinases without experimental structures, homology models were generated and used in the druggability calculations. The identification of less explored kinases with a high druggability score like MLK1, CK1G2, and various CLKs may inspire the target selection process in early drug discovery. Patel et al. obtained druggability predictions for 46 protein targets in cancer from the ChEMBL DrugEBIlity database (https://www.ebi.ac.uk/chembl/drugebility), part of the ChEMBL resource [7, 36, 92]. The algorithm of the DrugEBIlity server applies strict and relaxed criteria for classification of targets as druggable versus undruggable depending on whether the cavities are consistent with physicochemical properties compliant with binding of ligands with “rule of five” (Ro5)‐consistent properties (strict criteria) or not (relaxed criteria). In the analysis of Patel et al., a variety of different functional classes including bromodomains and GTPases were predicted as druggable prior to publication of compound data. Other validated druggable families such as histone demethylases were also predicted as druggable despite not being in the training set. In addition to the ChEMBL DrugEBIlity server, the authors calculated druggability scores from SiteMap (Dscore) with a target being druggable with a Dscore greater than 0.9. For the 46 proteins analyzed, structures predicted to be druggable adopting the strict criteria using ChEMBL DrugEBIlity had at least one cavity that scored 0.9 or higher in SiteMap. Perola et al. derived a simple set of guidelines to estimate target druggability from their drug target set. About 90% of these validated drug targets follow the guidelines in contrast to 57% of the diverse set [30]. 
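The property ranges of Perola et al. quoted in Section 2.3.2 (volume ≥500 Å³, depth ≥10.4 Å, enclosure ≥0.28, percentage of charged residues ≤26.3, hydrophobicity ≥–1.12) lend themselves to a simple rule check. In the following Python sketch the dictionary keys, the function name `range_violations`, and the two example pockets are hypothetical, and the descriptors would have to be computed in the same way as in the original study for the thresholds to be meaningful.

```python
import operator

# Thresholds as quoted in the text from Perola et al. [30]; key names are made up.
DRUGGABLE_RANGES = {
    "volume_A3": (operator.ge, 500.0),
    "depth_A": (operator.ge, 10.4),
    "enclosure": (operator.ge, 0.28),
    "percent_charged": (operator.le, 26.3),
    "hydrophobicity": (operator.ge, -1.12),
}

def range_violations(pocket):
    """Return the names of all descriptors that fall outside the druggable ranges."""
    return [
        name
        for name, (compare, limit) in DRUGGABLE_RANGES.items()
        if not compare(pocket[name], limit)
    ]

# Two invented pockets for illustration.
deep_pocket = {"volume_A3": 650.0, "depth_A": 12.0, "enclosure": 0.35,
               "percent_charged": 18.0, "hydrophobicity": -0.4}
shallow_pocket = {"volume_A3": 320.0, "depth_A": 7.5, "enclosure": 0.30,
                  "percent_charged": 31.0, "hydrophobicity": -0.9}

print("deep pocket violations:   ", range_violations(deep_pocket))
print("shallow pocket violations:", range_violations(shallow_pocket))
```

Returning the list of violated descriptors, rather than a single yes/no answer, shows which property makes a pocket fall outside the guidelines.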
In the dataset of targets with compounds in clinical phase III trials, 10 of the 11 targets were identified as druggable. Compared with SiteMap (Dscore) and Fpocket, the authors demonstrated that their guidelines could retrieve a larger set of druggable targets from the “phase III set” (10 with property guidelines, 7 with SiteMap [Dscore], and 4 with Fpocket). However, this small‐scale evaluation might not necessarily hold true for a larger dataset. Despite different algorithms for pocket detection, different descriptors, and varying models used for scoring, the accuracy of druggability predictions is remarkably consistent across the different software packages for structures with well‐defined pockets. For targets with less druggable pockets, however, the programs and servers differ in their results. Several factors influence the individual predictions: (i) The use of different pocket prediction algorithms will result in differences in the pocket size and shape for the ligand binding pocket as well as for decoy pockets. This in turn will have an influence on the pocket descriptors with which the individual models are trained. The binding site of coagulation factor X is identified as one pocket in DoGSiteScorer with an average druggability score of 0.6 (0 being undruggable, 1.0 druggable), while Fpocket splits the site into two pockets and ranks it poorly. In cases where depth is one of the descriptors used in training, shallow pockets like those from HIV reverse transcriptase or HIV integrase may not be identified as relevant drug binding cavities. (ii) The classification of targets as druggable or less druggable differs in the literature, especially for difficult targets. The target β‐lactamase is classified as druggable by Halgren [31] and Schmidtke and Barril [32], while Krasowski et al. [8] assign the label “less druggable” to this protein. Other targets of approved drugs with well‐defined pockets (GST, P4M, cytochrome P450 105A1, glucoamylase) were classified as undruggable upon detailed (and comprehensive) analysis by Schmidtke and Barril. For example, P4M was originally classified as druggable as it was assigned as the target of the drug levodopa in DrugBank. According to Schmidtke and Barril, this assignment could not be confirmed, and the authors classified P4M in their analysis as undruggable [32]. (iii) Most druggability prediction methods score pockets as druggable by volume, enclosure, and hydrophobicity. Consequently, they are likely to fail in predicting targets with hydrophilic cavities such as carbonic anhydrase or 3‐hydroxy‐3‐methylglutaryl‐coenzyme A (HMG‐CoA) reductase as druggable [86]. Both proteins feature binding sites where ligands bind either via metal‐mediated interactions or via hydrogen bonding and ionic interactions.
Figure 2.2 shows some examples of structures from the published datasets where druggability predictions or classifications of druggability vary in the literature, although co‐complex crystal structures with approved drug molecules have been determined for all of these targets. Liu and Altman propose that drug binding sites resemble sites with favorable physicochemical properties that can be translated into druggable microenvironments [37]. A single drug binding site comprises approximately 10–20 such microenvironments. In the DrugFEATURE software, a druggability score is calculated by counting the frequency of observed druggable microenvironments in a query site normalized to the size of the site. The dataset from Hajduk et al. was used to determine the value of DrugFEATURE scores for druggable targets. Using a score value of greater than 1.9, DrugFEATURE identified all 10 druggable and 13 out of the 14 undruggable sites of this dataset [93]. The DrugFEATURE calculations correlate well with the results from Cheng et al., with all but one of the druggable targets having scores between 1.9 and 4.7 [81]. A comparison with other druggability assessment methods based on the relatively small dataset from Cheng


2 A Structural View on Druggability

Figure 2.2 Difficult targets with known drugs, a challenge for druggability assessment. Four targets are shown with bound drugs that represent a challenge in druggability classification and prediction. The classification and predictions vary for factor Xa and HMG‐CoA reductase (a, b), the shallow binding pocket is difficult to identify for HIV reverse transcriptase (c), or a prodrug approach is classified as less druggable (neuraminidase, d). (a) Factor Xa in complex with rivaroxaban (PDB 2w26), (b) HMG‐CoA reductase in complex with mevastatin (PDB 1hw8), (c) HIV reverse transcriptase in complex with tenofovir (PDB 1t03), and (d) neuraminidase in complex with zanamivir (PDB 1nnc).

et al. showed a better performance of DrugFEATURE in comparison with Fpocket, SiteMap, and DLID [31–33]. Hussein et al. developed the PockDrug web server for predicting druggability of pockets on holo and apo proteins [40, 94]. In their approach, the authors identified pockets with 4 different pocket identification methods and characterized them with 52 physicochemical pocket descriptors. A linear discriminant analysis was performed on the best combinations of pocket descriptors, resulting in nine relevant pocket descriptors relating to geometry, hydrophobicity, and aromaticity. The seven most stable and efficient models were combined to predict the pocket druggability. The NRDLD dataset from Krasowski et al. was used to construct the PockDrug druggability model and the NRDD dataset of Schmidtke and Barril for model validation [8, 32]. Based on these models, druggable pockets could be distinguished from less druggable pockets with mean probabilities of 0.87 ± 0.15 for the druggable pockets and 0.18 ± 0.15 for the less druggable pockets. One advantage of the server is that it accepts pockets estimated by the user or identified within a protein structure, which allows structures obtained by homology modeling to be submitted.
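The idea of combining several discriminant models into a single druggability probability can be illustrated with a minimal sketch. It is not the PockDrug implementation: the function name `consensus_druggability`, the 0.5 decision threshold, and the seven example probabilities are assumptions chosen for illustration only, and the spread across models is reported merely as a rough indicator of agreement.

```python
from statistics import mean, stdev

def consensus_druggability(model_probabilities, threshold=0.5):
    """Average per-model probabilities and label the pocket.

    The threshold of 0.5 is an assumption of this sketch.
    """
    if len(model_probabilities) < 2:
        raise ValueError("need at least two model outputs")
    avg = mean(model_probabilities)
    spread = stdev(model_probabilities)  # agreement between the models
    label = "druggable" if avg >= threshold else "less druggable"
    return avg, spread, label

# Hypothetical outputs of seven models for one pocket (illustrative values).
example = [0.91, 0.85, 0.88, 0.79, 0.93, 0.90, 0.83]
avg, spread, label = consensus_druggability(example)
print(f"{avg:.2f} +/- {spread:.2f} -> {label}")
```

A pocket on which the individual models disagree strongly (large spread) would be a natural candidate for visual inspection.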

Several publications from different laboratories analyze the free energy of water molecules within potential ligand binding sites because druggable targets show an accumulation of sites with such thermodynamically unstable water molecules [95–97]. A quantitative metric for target ligandability based on the thermodynamics of water networks on the protein surface was proposed by Vukovic and Huggins [97]. An MD simulation of the water network in a 10‐Å box around a protein was calculated, and clusters of weakly bound water molecules were identified. Such sites were interpreted as ligandable hot spots and correlated with an experimental metric derived from the BindingDB database. The experimental metric correlates the number of reported compounds for a target with an affinity cutoff of 100 nM with the total number of compounds tested for the target thus mirroring the effort to develop potent compounds for a given target. However, as the authors point out in their discussion, the number of compounds synthesized for a given target does not necessarily reflect the effort to generate such compounds (screening of large libraries versus synthesis of few selected compounds versus effort of chemical synthesis). The sum of the free energies of


water molecules in identified hot spots (so‐called displacement scores) was then scaled to the experimentally found ligandability range of 0.1–0.77. The correlation of the experimental and predicted ligandabilities of the targets in the test set is relatively low (0.42), indicating only a trend as to which targets might be druggable or more challenging. For example, HIV‐1 protease and PDE5A were identified as druggable targets, whereas cdk2 was flagged as challenging. Improvements of the metric are needed to enable a more reliable prediction. Similarly, Beuming et al. studied whether a correlation can be observed between hydration sites of weakly bound water molecules (obtained from WaterMap MD simulations) and target druggability [95]. In the majority of druggable targets of the dataset of Cheng et al., the binding sites were strongly enriched in high‐energy hydration sites [81]. For 12 out of 17 druggable targets, a top‐ranked cluster of hydration sites was located in the proximity of the binding site. When all hydration site energies greater than 2.0 kcal mol−1 were summed into a hot spot value, this index was able to separate targets into druggable and challenging targets. To date, many computational approaches to assess target druggability have been developed (see Table 2.1); however, an unbiased comparison of all methods is difficult to obtain. In the end the user needs to decide which software to consider for druggability assessment and will most likely add visual inspection of the target structures as well as selectivity considerations in the final assessment. We fully agree with Henrich et al. that “the human expert outperforms the computational tools in assessing protein druggability” as one's own experience with the target or target class will influence the assessment [49].
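The hot spot value described above for hydration sites amounts to a thresholded sum. The following minimal sketch shows the arithmetic only; the function name and the example energies are hypothetical, and the 2.0 kcal mol−1 cutoff is the one quoted in the text.

```python
def hot_spot_value(site_energies_kcal, cutoff=2.0):
    """Sum the free energies (kcal/mol) of hydration sites above the cutoff."""
    return sum(e for e in site_energies_kcal if e > cutoff)

# Hypothetical hydration-site free energies near a binding site (kcal/mol).
energies = [0.4, 2.5, 3.1, 1.9, 4.0, -0.7]
print(hot_spot_value(energies))  # only 2.5, 3.1, and 4.0 contribute
```

Sites at or below the cutoff contribute nothing, so the index rewards binding sites that accumulate several strongly unfavorable water positions.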
2.3.4 Druggability Prediction via Alternative Methods

Alternative methods for druggability prediction include MD simulations with varying solvent molecules as well as reverse docking and in silico fragment screening approaches. Seco et al. performed MD simulations using a solvent mixture of isopropanol and water [41]. They could show that a high occupancy of isopropanol molecules in a binding site was correlated to a high binding free energy and thus to the maximal affinity of potential inhibitors. Maximal affinity predictions in the sub‐nM range indicated a high druggability of the binding site. The authors compared the predicted isopropanol sites with crystallographically determined ones in thermolysin, the p53 core domain, and bovine pancreatic elastase. For thermolysin and the p53 core domain, the binding positions predicted from the MD calculations were in agreement with the experimentally observed ones. For elastase, two

of the three crystallographically observed isopropanol positions could be reproduced in the simulations. Based on the maximal predicted affinity of potential inhibitors, the sites in the p53 core domain were predicted as undruggable and those of thermolysin and elastase as druggable, which is in agreement with known inhibitors. In an analysis of five targets of pharmacological interest (MDM2, LFA‐1, PTP1B, p38, AR), druggable sites could be clearly distinguished from undruggable ones. Similarly, Bakan et al. performed MD simulations with isopropanol as probe and with an enlarged probe set of four small organic fragments and calculated the maximal binding affinity on a similar target set (MDM2, LFA‐1, PTP1B, p38, Eg5) [98]. The probe set (isopropanol, isopropylamine, acetamide, acetate) was extracted from a substructure analysis of a set of approximately 1300 approved small‐molecule drugs. From a final relative binding free energy map of the probes, interaction spots of all probes were identified and subsequently clustered into binding sites. A predicted binding site was considered druggable when the corresponding calculated maximal binding affinity was 10 μM or better and the site comprised at least six interaction spots. With this method the location and binding affinities for known drug molecules could be captured correctly, especially when the probe set was used instead of a single probe molecule, as this consistently yielded better predictions of achievable affinities. For example, the simulations correctly located the allosteric pocket of Eg5, which is approximately 12 Å away from the catalytic site. Achievable affinities were calculated to be 0.3 nM for the probe mixture and 27 nM when using isopropanol as solvent. The best IC50 reported for an Eg5 inhibitor is 0.2 nM, which is in good agreement with the predictions. PTP1B is a challenging target with a highly basic catalytic site. Only the use of charged probes allowed the identification of a probe interaction spot at the catalytic site.
The predicted maximal achievable affinities for the PTP1B catalytic site were 0.3–0.9 nM, which is in agreement with the best experimental inhibitor with an IC50 value of 2.2 nM [98]. In a reverse docking approach, Lee et al. docked 35 drug molecules into the crystal structures of target proteins [99]. In the druggable protein set, the average of the docking scores was greater than those in the less druggable set. Ward docked 950 fragments into protein structures to allow an in silico druggability prediction [100]. The mean docking score of the ensemble of fragments was used as an indicator for druggability. In total, data for 13 proteins with 1–6 structures per protein were presented. Druggability scores were dependent on the protein conformation analyzed, e.g. the inactive conformations of p38 and epidermal growth factor receptor (EGFR) scored slightly better than the active conformations. This approach may help to identify the X‐ray


structures with the most tractable pockets in a set of different structures of the same target. By applying virtual screening campaigns with a library of 11 000 fragment‐like compounds against 152 protein binding sites, Huang et al. could demonstrate that the hit rate correlated well with that from experimental fragment screening for targets with well‐defined pockets as well as for PPI targets [101]. After selecting the best docking poses, the complexes were submitted to multi‐scale energy minimizations, and the binding energies were calculated. An energy cutoff value was chosen empirically to maximally differentiate druggable and undruggable binding sites. The “druggability score” was defined as the decimal logarithm of the hit rate, with a score greater than 0.36 indicating druggable binding sites. The hit rates were calculated for 15 well‐known drug targets, for some of which multiple co‐complex structures have been published. Example targets included ACE, cdk2, FXa, and p38. All targets were predicted to be druggable with scores between 0.6 and 1.8, with ACE and neuraminidase at the lower end of the druggability scores. The binding sites of both targets have very polar cavities, which might be the reason for the low predicted druggability. With respect to different conformations of proteins, no striking dependency was observed between the druggability score and the crystal structure considered in the calculations.
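The relation between hit rate and the druggability score of Huang et al. can be sketched in a few lines. The text does not state the unit of the hit rate; the sketch assumes it is expressed in percent, which is consistent with reported scores of 0.6–1.8. The function names and the example counts are hypothetical.

```python
import math

DRUGGABLE_CUTOFF = 0.36  # score threshold quoted in the text

def druggability_score(n_hits, n_screened):
    """Decimal logarithm of the hit rate (assumed here to be in percent)."""
    if n_hits <= 0 or n_screened <= 0:
        raise ValueError("score is undefined without at least one hit")
    hit_rate_percent = 100.0 * n_hits / n_screened
    return math.log10(hit_rate_percent)

def is_druggable(score):
    return score > DRUGGABLE_CUTOFF

# Hypothetical campaign: 1100 of 11 000 fragments pass the energy cutoff.
score = druggability_score(1100, 11000)
print(score, is_druggable(score))
```

Under this assumption the cutoff of 0.36 corresponds to a hit rate of roughly 2.3%.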

2.4 Experimental Methods for Druggability Assessment

In recent years fragment‐based screening (FBS) has developed into an established approach both in the pharmaceutical industry and in academia as an additional lead‐finding strategy besides HTS. Fragments are smaller than compounds used in HTS; their physicochemical properties obey the “rule of three” (Ro3) rather than the Ro5 [102]. Fragments following the Ro3 guidelines feature a lower molecular mass (150–250 Da) with few functionalities (

λ < 3 Å) are in thermal equilibrium at around room temperature with the cooling water near a reactor core, and cold neutrons (2.5 Å < λ < 20 Å) are in thermal equilibrium at the temperature of liquid H2. At a pulsed neutron facility, a coupled moderator gives a higher flux than a decoupled one, but a broader pulse. For time‐of‐flight experiments, the narrower neutron pulse from a decoupled moderator separates diffraction spots better, but in practice a high flux is needed to detect diffraction signals in macromolecular crystallography.
b L2 is the crystal‐to‐detector distance.
c Maximum value of a cell constant in the crystal for measurement.

Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB), whereas over 125 000 X‐ray structures have been deposited. These neutron structures, however, do not comprise uniquely different proteins and include sets of related proteins; thus, the analysis of hydrogens and hydration in protein structures by neutron diffraction is at a nascent stage. An important recent development is the collaborative use of neutron and X‐ray diffraction data for a comprehensive determination of biomolecular structure [18], with specific utility in drug discovery. Although the number of drug target proteins whose structure has been solved by neutron diffraction is still limited, recent improvements in neutron beamlines and detectors, as well as developments of sample preparation techniques (i.e. efficient perdeuteration), will make it possible to obtain critical information regarding the location of hydrogen atoms of catalytic residues and the stereochemistry of protein‐bound water molecules for effective drug discovery programs.

12.6 Method for Neutron Crystallography of Proteins

12.6.1 Sample Preparation

12.6.1.1 Sample Requirement for Neutron Crystallography

For neutron diffraction experiments, a relatively large amount of protein sample is needed: single crystals having a volume greater than 1 mm3 are commonly used for recent structure analysis with reactor and pulsed neutron


12 Complementary Information from Neutron Crystallography Studies

beams [19–63]. A 1‐mm3 crystal contains 0.5–1.0 mg of protein; thus, typical screening experiments to identify suitable crystal growth conditions for large crystals can consume substantial amounts of purified protein. The size of crystals required for neutron crystallography is expected to decrease upon planned improvements in neutron beam intensity, as well as the use of efficiently perdeuterated protein samples [64–70]. To obtain better quality neutron diffraction data, the hydrogen atoms in a protein are usually replaced with deuterium to reduce the incoherent scattering produced by hydrogen. There are at least two approaches to replacing hydrogen atoms with deuterium. One approach is to soak the protein crystal in a buffer prepared using D2O and deuterated reagents. In this treatment, most of the exchangeable hydrogen atoms (i.e. those of amine, amide, carboxyl, and hydroxyl groups), but not aliphatic and aromatic hydrogen atoms, are replaced with deuterium atoms. Further replacement of aliphatic and aromatic hydrogen atoms requires the complete replacement of the common cultivation medium (e.g. lysogeny broth [LB]) with a perdeuterated medium (e.g. BioExpress cell growth media (U‐D, 98%): CGM‐1000‐D, SPECTRA 9: CGM‐3030‐D, Cambridge Isotope Laboratories, Inc.) and minimal medium containing a deuterated carbon source (e.g. glycerol) [71].

12.6.1.2 Protein Overproduction and Purification for Neutron Protein Crystallography

To obtain large amounts of protein sample, it is convenient to overproduce the protein by genetic engineering. There are several systems for effective large‐scale production of recombinant proteins for neutron diffraction studies. The Escherichia coli expression system is one of the most common systems for expressing perdeuterated proteins, although this host lacks the ability to perform several posttranslational modifications. For example, additional treatments such as refolding under redox conditions may be needed to produce secreted proteins with disulfide bridges. Mammalian and yeast secretion systems are effective for the large‐scale overproduction of secreted proteins. Recent reports of antibody expression using mammalian secretion systems have achieved greater than gram‐level yields [72]. The yeast methanol‐induced secretion expression system is notably effective for the production of perdeuterated proteins because methanol as a deuterium source is relatively inexpensive.

12.6.1.3 Perdeuteration of Protein

The use of perdeuteration has allowed a reduction in the crystal‐size requirement for neutron diffraction by an order of magnitude. While neutron diffraction commonly utilizes crystals having greater than 1 mm3 volume, neutron diffraction studies of perdeuterated human aldose reductase (2.2 Å resolution) were achieved with a crystal volume of 0.15 mm3 [64], antifreeze protein (1.85 Å resolution) with a crystal volume of 0.13 mm3 [65], HIV‐1 protease (HIV‐PR) (about 2.0 Å resolution) with crystal volumes of 0.2 and 0.5 mm3 [66–68], and T4 lysozyme WT and mutant (about 2.1 Å resolution) with crystal volumes of 0.5 and 0.9 mm3, respectively [69, 70]. Cultivation of the expression host in deuterated medium usually takes much longer (roughly a few days) to reach the same cell density since deuterium appears somewhat toxic to E. coli. In this case, it is important to maintain the selective marker for the expression plasmid, i.e. to replenish antibiotic as needed over the longer culture period. However, once appropriate cell density has been achieved, protein induction usually takes a few hours to overnight cultivation (3–18 hours). Purification of perdeuterated protein expressed in deuterated medium is typically accomplished using H2O‐based buffers (for reasons of cost); thus, the resulting purified proteins are dialyzed against deuterated buffer to ensure that readily exchangeable hydrogen atoms are returned to the deuterated state before crystallization. The perdeuteration ratio of the sample is determined by measuring the molecular mass by mass spectrometry. The exchange for deuteration is usually rapid (within a few days); even so, some fraction of hydrogen atoms in the protein can remain even after a few months of deuteration dialysis. Such hydrogens are typically protected in the native structure; thus, if the protein sample exhibits reversible folding after denaturation, it may be an effective strategy to expose the sample to the deuterated solution under denaturing conditions and then renature.

12.6.1.4 Consideration of Chemical Instability During the Crystallization Experiment

Crystal growth to a size large enough for neutron crystallography usually takes longer (i.e. a few months) than growth of crystals useful for X‐ray crystallography. Chemical instability of protein samples over such long periods sometimes results in reduced diffraction quality, or suppression of crystal growth, because of the formation of aggregates or insoluble precipitates. To improve the chemical stability of the sample, additional mutations can be introduced into the protein. One such mutation is the substitution of free cysteine residues with other amino acids (e.g. Ala, Ser, Val) [73]. Nonspecific oxidation of free cysteine residues sometimes results in large molecular aggregates, which may decrease the amount of protein available to be crystallized.


Another chemical instability is the deamidation of asparagine (Asn) residues located at the surface of the target protein. Deamidation of asparagine proceeds through a five‐membered succinimide intermediate and yields a mixture of negatively charged aspartic acid and isoaspartic acid [74]. Such deamidation is particularly favored if the asparagine precedes a glycine residue, as often occurs in surface turns. In such cases, mutation of Asn to Gln can effectively prevent deamidation; however, protein function should be confirmed after the introduction of such mutations.

12.6.1.5 Crystal Lattice Engineering to Control Crystal Packing

The unit cell volume of the crystal lattice, the number of asymmetric units in the cell (Z), and the lattice type (P, I, F, C) strongly affect the strength of diffraction [1]. Specifically, if the unit cell volume of the crystal lattice can be reduced, the intensity and separation of diffraction spots can be improved. It is known that surface mutations of a protein can induce crystallization by promoting interactions for specific intermolecular crystal contacts [75, 76]. Thus, mutations that cause a protein to crystallize in a space group having a smaller unit cell volume are particularly attractive for neutron diffraction studies. However, such mutational effects are largely unpredictable and are often identified by serendipity. A surface mutation introduced into β‐lactamase Toho‐1 suppressed the formation of twinned crystals [77], and the mutant structure was finally analyzed by neutron crystallography [34, 48].

12.6.2 Crystal Growth for Neutron Protein Crystallography

12.6.2.1 Crystallization Screening to Obtain a Large Crystal Volume

In order to grow large crystals suitable for neutron diffraction, once nucleation has occurred, crystallization conditions need to be kept within the metastable region, where nucleation is suppressed but the crystal continues growing. Crystal growth depletes the protein concentration in solution; however, simply increasing the protein and precipitant concentrations can shift the conditions in the crystallization phase diagram back into the nucleation zone. Thus, the growth of large crystals suitable for neutron diffraction can present technical challenges.

12.6.2.2 Macroseeding to Grow Large Crystals

After finding suitable conditions to grow crystals, it is typically necessary to scale up the crystallization experiment for neutron diffraction studies. In general, a 1‐mm3 crystal volume contains approximately 1 mg of protein. Therefore, at least 1 mg of protein is required in each crystal growth well, necessitating a “sitting‐drop” or “batch” approach, as the traditional “hanging‐drop” crystallization methodology cannot support the necessary large volume of solution. To obtain a sufficiently large volume of crystal, a continuous supply of protein to the growing crystal is typically necessary (i.e. to maintain metastable conditions in the face of reduced protein concentration in solution). Macroseeding is one effective way to grow a large crystal volume [78–80].

12.6.2.3 Periodic Addition of Protein to Promote Large Crystal Growth

In addition to macroseeding, it is often effective to periodically add protein to the solution of the growing crystal without simultaneously altering the precipitant concentration. There are several successful examples of growing large volume crystals using macroseeding followed by periodic protein addition to the crystal solution [78, 80]. Since crystal growth can require a precise range of protein and precipitant concentrations, both concentrations should be strictly controlled. Figure 12.2a shows an example of crystal growth of ADPRase [80]. Periodic addition may be suitable for automated execution of crystal growth (i.e. programmed robot manipulation of crystal solutions in sitting‐drop experiments).

12.6.3 Neutron Diffraction Experiment

12.6.3.1 Mounting a Deuterated Protein Crystal

Since a neutron beam does not damage protein crystals, there is no need to freeze crystals, and they are typically mounted within a quartz capillary as with traditional room‐temperature X‐ray data collection methods. Due to its large volume, a deuterated protein crystal is typically mounted within a 2–3‐mm‐diameter quartz capillary. An NMR tube is also commonly used for mounting the large crystal for neutron diffraction experiments. After sealing the capillary or the NMR tube, the sample is exposed to the neutron beam at room temperature for the length of time required to obtain a dataset with adequate completeness (currently, typically more than 40 days, as in the case of data collection of HIV‐PR [30]). Although radiation damage is negligible in a neutron diffraction experiment, cryo‐neutron diffraction at 100 K is occasionally performed [81]. Sample cooling is advantageous to collect higher‐resolution neutron diffraction data under conditions where such cooling leads to a significant decrease of thermal factors. Furthermore, such cooling permits both neutron and subsequent X‐ray diffraction data to be collected under identical conditions (see below).

12.6.3.2 Neutron Diffraction Data Collection

Neutron diffraction experiments usually require long exposures for one frame when using a reactor neutron beam. A full neutron diffraction dataset for an HIV‐PR

[Figure 12.2, panels (a)–(d). Panel (a): crystal length (mm) plotted against elapsed days (0–60), with inset crystal images numbered (1)–(8) and scale bars of 0.2 and 1.0 mm. Panel annotations: 2.0 Å; exposure time = 4 h; slit 6/3 mm, Si(111), 1 plate.]

Figure 12.2 (a) The process of crystal growth of ADP‐ribose pyrophosphatase (ADPRase). Growth of the crystal after periodic addition of ADPRase solution is shown. Actual images of crystals during this process are included in panel a. (b) Cylindrical neutron diffractometer, BIX‐4, installed at research reactor JRR‐3 in Japan Atomic Energy Agency. (c) Diffraction image from an HIV‐1 protease crystal recorded on neutron imaging plate. (d) Structure of HIV‐1 protease/KNI‐272 complex determined by neutron crystallography with (left panel) and without (right panel) hydrogen atom and water molecule information. (See insert for color representation of the figure.)

crystal required more than 46 days’ beam time with an exposure of 360 minutes per frame using BIX‐4 (Figure 12.2b) at JRR‐3 in JAEA (0.3° oscillation step, scanning 54.3°, requiring 181 frames in total). An image of the neutron diffraction pattern from HIV‐PR collected by BIX‐4 at JRR‐3 is shown in Figure 12.2c. Data collection using a 2‐mm3 volume crystal on BioDIFF at FRM‐II in TUM can cover a full diffraction dataset within 10 days. Both of these diffractometers are equipped with a cryo system to keep the sample crystal at 100 K.
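The BIX‐4 figures quoted above can be checked with simple arithmetic. The sketch below derives the number of frames from the scan range and oscillation step and converts the per‐frame exposure into days; the function names are hypothetical. It counts exposure time only, so any additional beam time (for example detector readout between frames, which is an assumption here and not stated in the text) would come on top.

```python
def frame_count(scan_range_deg, step_deg):
    """Number of oscillation frames needed to cover the scan range."""
    return round(scan_range_deg / step_deg)

def exposure_days(n_frames, minutes_per_frame):
    """Total exposure time in days, ignoring readout and other overhead."""
    return n_frames * minutes_per_frame / (60 * 24)

frames = frame_count(54.3, 0.3)     # values quoted for the HIV-PR dataset
days = exposure_days(frames, 360)   # 360 minutes per frame
print(frames, days)
```

With these inputs the exposure alone amounts to about 45 days for 181 frames, in line with the stated beam time of more than 46 days.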

12.6.3.3 Data Processing of the Neutron Diffraction Dataset

Ideally, an X‐ray diffraction dataset should subsequently be collected from the same crystal used for neutron diffraction. The resolution of X‐ray diffraction data from a synchrotron facility is usually approximately 0.5 Å higher (i.e. typically a resolution limit of 1.2–1.4 Å) than the resolution of neutron diffraction data (i.e. a resolution limit of 1.5–1.9 Å) owing to the stronger X‐ray beam of the synchrotron. Since the neutron beam does not appreciably


damage the crystal, an X‐ray dataset can be collected afterward from the same crystal used for collection of neutron diffraction data. The higher‐resolution X‐ray data enable a non‐hydrogen atomic model to be refined for use as ideal phase information for the neutron diffraction data. If X‐ray diffraction data are not available from the same crystal, crystals obtained from the same mother liquor can be used as a replacement. Thus, neutron diffraction studies are best approached by sequential neutron/X‐ray diffraction studies using the same crystal and data collection conditions (i.e. cryo‐cooling). Neutron diffraction spots on images are integrated, scaled, and combined into the intensity dataset including structure factors by using the programs DENZO and SCALEPACK [82] and programs developed for the respective diffractometer.

12.6.3.4 Model Building Along with the Neutron Scattering Length Map

The process to build a molecular model into a neutron density map is fundamentally the same as building a molecular structure into an electron density map from X‐ray crystallography. The structural refinement, including both neutron and X‐ray diffraction data (“joint refinement”), is conveniently performed by the computer program PHENIX [83]. Currently, several structures determined by joint refinement have been reported [84]. Representative structures solved by neutron and X‐ray crystallography are shown in Figure 12.2d. The neutron scattering length maps are drawn by taking into account neutron scattering length and thermal factor for each atom. If hydrogen atoms bonded to carbon atoms still remain in the protein crystal, nuclear density for the side chain of an amino acid will appear atypically “skinny” because of the negative contribution of adjacent hydrogen atoms. Conversely, if such hydrogens have been replaced by deuterium, they will be clearly visible. Hydrogen atoms bonded to polar atoms (O, N) that are replaced with deuterium are similarly clearly observable. The use of X‐ray diffraction data can improve the positional refinement of non‐hydrogen atoms that are bonded to hydrogen/deuterium, as well as atoms having a small neutron scattering length such as sulfur.
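The “skinny” appearance of hydrogenated side chains follows from the signs of the coherent neutron scattering lengths. The short sketch below sums the scattering lengths of a methylene group with hydrogen and with deuterium. The numerical values are standard tabulated coherent scattering lengths in femtometers and are not taken from this chapter; the function name is hypothetical.

```python
# Coherent neutron scattering lengths in fm (standard tabulated values,
# rounded; not quoted in the chapter itself).
SCATTERING_LENGTH_FM = {
    "H": -3.74, "D": 6.67, "C": 6.65, "N": 9.36, "O": 5.80, "S": 2.85,
}

def group_scattering_length(composition):
    """Sum scattering lengths for a group given as {element: count}."""
    return sum(SCATTERING_LENGTH_FM[el] * n for el, n in composition.items())

ch2 = group_scattering_length({"C": 1, "H": 2})  # hydrogenated methylene
cd2 = group_scattering_length({"C": 1, "D": 2})  # deuterated methylene
print(round(ch2, 2), round(cd2, 2))
```

The hydrogenated group nearly cancels (slightly negative sum), whereas the deuterated group adds up to roughly three times the value of carbon alone. The small value for sulfur likewise illustrates why X‐ray data help in positioning such atoms.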

12.7 General Information Obtained from Neutron Crystallography

12.7.1 Hydrogen Atoms Contributing to the Fold of a Protein

12.7.1.1 Hydrogen Bonding Interaction

One of the most important nonbonded interactions responsible for the structure, catalysis, molecular communication, and information storage properties of biomolecules is the hydrogen bond. The hydrogen bond is the electrostatic interaction created between a hydrogen atom (partial positive charge) and a lone‐pair electron acceptor (partial negative charge). If the specific location of a hydrogen atom is visible, the characteristics of a hydrogen bond interaction contributing to protein function and stability can be unambiguously determined. The characteristics of hydrogen bonding are affected by the circumstances created by the protein structure, such as the local dielectric, electrostatic field, local charge/H‐bond partner(s), and solvation. One way to identify a hydrogen bond formed under “extraordinary circumstances” is the direct observation of hydrogen atoms whose stereochemistry distinguishes them from typical hydrogen bonds. For this purpose, the “Hydrogen and Hydration Database” (HHDB) has been prepared, and the character of hydrogen bonding interactions can be quantified using a utility available in HHDB. For example, if the geometry of hydrogen bonding interactions is plotted so that the bond vector from the donor atom to the hydrogen atom lies along the horizontal axis, the locations of the acceptor atoms can be plotted relative to it in the X–Y plane. In such a plot, acceptor groups typically cluster in several characteristic regions, which strongly reflect the secondary structure associated with the interacting group. For example, α‐helices are known to have a unique hydrogen bonding interaction in which the C═O at position n is hydrogen bonded to the N─H at position n + 4, which is identifiable as a characteristic region in this plot. Structural dynamics and strain associated with function may be identified from H‐bond interactions that are inconsistent with the local secondary structure.
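The plot described above amounts to a change of reference frame. The following is a minimal sketch of that transformation, assuming Cartesian coordinates in ångströms for the donor, hydrogen, and acceptor atoms; the function names are hypothetical and this is not the HHDB utility itself.

```python
import math


def acceptor_in_donor_frame(donor, hydrogen, acceptor):
    """Return (x, y) of the acceptor in a frame with the donor at the origin
    and the donor->hydrogen bond along +x; y is the distance from that axis."""
    dh = [h - d for h, d in zip(hydrogen, donor)]
    da = [a - d for a, d in zip(acceptor, donor)]
    dh_len = math.sqrt(sum(c * c for c in dh))
    # Projection of the donor->acceptor vector onto the donor->hydrogen axis.
    x = sum(p * q for p, q in zip(dh, da)) / dh_len
    da_len2 = sum(c * c for c in da)
    # Perpendicular distance from the axis (clamped against rounding error).
    y = math.sqrt(max(da_len2 - x * x, 0.0))
    return x, y


def dha_angle_deg(donor, hydrogen, acceptor):
    """Donor-H...acceptor angle in degrees (180 means a perfectly linear bond)."""
    hd = [d - h for d, h in zip(donor, hydrogen)]
    ha = [a - h for a, h in zip(acceptor, hydrogen)]
    norm = math.sqrt(sum(c * c for c in hd)) * math.sqrt(sum(c * c for c in ha))
    cos_t = sum(p * q for p, q in zip(hd, ha)) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


# Hypothetical coordinates: donor at the origin, H 1.0 A away along x.
x, y = acceptor_in_donor_frame((0, 0, 0), (1.0, 0, 0), (2.0, 1.0, 0))
angle = dha_angle_deg((0, 0, 0), (1.0, 0, 0), (2.0, 1.0, 0))
print(x, y, angle)
```

Collecting the (x, y) pairs for every hydrogen bond in a structure gives the kind of scatter plot in which the characteristic clusters would appear.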
As with stereochemical libraries derived from X‐ray structures, knowledge of hydrogen bonding stereochemistry will be significantly improved as more neutron diffraction structures are solved at high resolution and deposited in the HHDB. The results of neutron crystallography also show that there are some unique hydrogen bonding interactions in α‐helices. One such interaction is the bifurcated hydrogen bond [20], in which two acceptor atoms (O) in C═O groups at the n and n + 1 positions share one hydrogen atom belonging to the donor (N) at the n + 4 position (Figure 12.3a). Another unique and important type of hydrogen bond interaction is the low‐barrier hydrogen bond (LBHB), newly observed, for example, in the crystal structure of photoactive yellow protein (PYP) determined to 1.5 Å resolution by neutron crystallography [29]. The LBHB is a unique hydrogen bond that has a very short donor–acceptor distance (s


Figure 13.6 Molecular motion timescales, energy landscapes, and NMR measurements. Top: motions in the pico‐ to nanosecond timescale result from rapid jumps between an infinite number of micro‐states. These motions affect the relaxation rates of various spin states, which may be measured by monitoring the signal intensity as a function of time. Middle: motions in the micro‐ to millisecond timescale are due to transitions between states separated by an activation energy barrier of a few kcal∙mol−1. These motions result in resonance line broadening. Quantitative information may be obtained on the populations of states and on the transition rate by measuring transverse relaxation rates under different effective radio‐frequency fields (indicated here as the CPMG frequency). Bottom: real‐time kinetics may be observed in the timeframe of seconds by recording fast 2D spectra shortly after a perturbation of the system. These spectra provide a view of molecular transformations at atomic resolution.

difference between relaxation times of a magnetization that is perpendicular (transverse) or collinear (longitudinal) to the applied magnetic field gets larger. Thus, for large macromolecules, the longitudinal relaxation time is on the order of seconds, while the transverse relaxation time lies in the range of milliseconds. The definition of these multiple timescales is unique to NMR spectroscopy and underlies its invaluable contribution to our understanding of molecular recognition mechanisms. However, it also requires that measurement schemes be specifically designed and adapted to probe the functionally relevant dynamical features of the molecular system of interest. The following paragraphs aim at providing some guidelines along these lines based on recent methodologies and their applications.

13.5.2 Probing Disorder: Relaxation Measurements and Interpretation

Very soon after its application to facilitate the assignment of protein resonances, the possibility of using 15N nuclei as a probe for global and internal protein motions was established [83]. The success of this method is based on several key elements: (i) the large dispersion of 1H─15N correlation peaks in 1H─15N HSQC spectra provides a means to measure relaxation rates for individual backbone amide nitrogen atoms; (ii) the magnetization state of 15N nuclei is read indirectly through the covalently bound amide proton, rendering these experiments very sensitive; and (iii) within the protein backbone, each nitrogen and its bound proton can be considered as an isolated pair of spins, facilitating the interpretation of 15N relaxation rates in terms of molecular motions. A plot of these relaxation rates along the protein sequence provides a first insight into the local fluctuations of the protein backbone as well as into its global dynamical behavior. The ratio of transverse to longitudinal relaxation rates is mostly sensitive to the global rotational diffusion of the protein for 1H─15N pairs of nuclei located in its folded parts, while the profile of heteronuclear 1H─15N NOE values reports on the presence of high‐frequency motions that may reveal local disorder. Alternatively, the product of the two relaxation rates has been shown to be a sensitive probe for self‐oligomerization. Numerous studies have used these approaches to describe the redistribution of protein motions upon interaction with a small ligand or with larger molecules [84]. It was shown that ligand binding generally decreases the amplitude of backbone high‐frequency motions, although the reverse has been observed in a number of cases, highlighting the ability of proteins to redistribute the molecular fluctuations over the protein backbone to fine‐tune the protein–ligand interaction energies. More quantitative analysis of heteronuclear relaxation rates involves the choice of a specific model for the molecular motions. Due to the reduced number of observables per amino acid (three for a complete set of relaxation rates at a given magnetic field), the choice is restricted to models with a limited number of parameters.
For this reason, the “model‐free” approach proposed by Lipari and Szabo was adopted as a standard model to describe protein motions from heteronuclear relaxation data [85]. In the simplest application of this model, the internal high‐frequency motions are described by an order parameter (S2), whose value between zero (unrestricted motion) and one (fully rigid bond vector) provides a measure of the amplitude of these motions. The rotational diffusion of the whole protein is described by a correlation time. The inability of the simplest form of this model to reproduce the experimental relaxation rates is taken as evidence for more complex fluctuations, and the model may be extended to account for additional motions. The most impressive application of this approach is its ability to measure the anisotropy of the protein rotational diffusion thanks to a relation linking the orientations of the individual N─H vectors within the protein to the measured relaxation rates of the nitrogen atom [86]. In contrast with their importance in molecular recognition mechanisms, the dynamics of amino acid side chains prove more challenging to describe using heteronuclear relaxation measurements. Protein enrichment with the 13C isotope enables carbon relaxation measurements; however, both the measurements and their interpretation as motional parameters prove difficult due to the coupling with neighboring carbons and protons. Quantitative information about side‐chain dynamics is obtained by measuring 13C and 2H (D) relaxation rates with protein samples in which the methyl groups of isoleucine (δ1), valine, and leucine are selectively labeled with 13C and 2H (13CHD2) in an otherwise uniformly deuterated background. Such labeled samples, which enable the study of the dynamical properties of high‐molecular‐mass proteins or complexes, are obtained using appropriately labeled α‐ketoacid precursors in E. coli cells grown in minimal medium [87]. Measurements of the relaxation rates of both the 13C and 2H nuclei allow a robust and accurate description of the methyl dynamical behavior, reported through an order parameter of the random fluctuations around the methyl axis (S2axis) [18]. This approach is applicable to the study of proteins or protein complexes as large as the proteasome, provided that the methyl group assignments can be obtained [70]. As the technology of methyl labeling is still evolving, novel probes are becoming available, such as the alanine or methionine methyl groups [71]. The order parameters derived from NMR relaxation rate measurements provide unique access to the conformational entropy of the protein on a per‐residue basis [88, 89]. The comparison between backbone or side‐chain conformational entropies measured for a protein in its free or ligand‐bound state reveals how proteins take advantage of this energy to achieve optimal selectivity and affinity for a particular ligand.
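To make the model‐free description of backbone 15N relaxation more tangible, the sketch below evaluates the simple Lipari–Szabo spectral density and the standard dipolar plus chemical shift anisotropy expressions for the 15N R1 and R2 rates of an isolated N─H pair. The N─H bond length, the 15N chemical shift anisotropy, and the example field and motional parameters are assumptions chosen for illustration, and the function names are hypothetical. The last function uses the common slow‐tumbling approximation that recovers the rotational correlation time from the R2/R1 ratio.

```python
import math

MU0_OVER_4PI = 1e-7        # T m / A
HBAR = 1.054571817e-34     # J s
GAMMA_H = 2.6752218744e8   # 1H gyromagnetic ratio, rad s^-1 T^-1
GAMMA_N = -2.7126e7        # 15N gyromagnetic ratio (negative), rad s^-1 T^-1
R_NH = 1.02e-10            # assumed N-H bond length, m
CSA_N = -160e-6            # assumed 15N chemical shift anisotropy (dimensionless)


def j_model_free(omega, s2, tau_m, tau_e=0.0):
    """Simple Lipari-Szabo spectral density for isotropic tumbling.

    s2: squared order parameter, tau_m: global correlation time (s),
    tau_e: effective internal correlation time (s); 0 drops the fast term.
    """
    j = s2 * tau_m / (1.0 + (omega * tau_m) ** 2)
    if tau_e > 0.0:
        tau = tau_m * tau_e / (tau_m + tau_e)
        j += (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * j


def r1_r2(b0, s2, tau_m, tau_e=0.0):
    """15N longitudinal and transverse rates (s^-1) at field b0 (tesla)."""
    w_h = GAMMA_H * b0
    w_n = abs(GAMMA_N) * b0
    d2 = (MU0_OVER_4PI * HBAR * GAMMA_H * GAMMA_N / R_NH ** 3) ** 2  # dipolar
    c2 = (w_n * CSA_N) ** 2 / 3.0                                    # CSA

    def j(w):
        return j_model_free(w, s2, tau_m, tau_e)

    r1 = d2 / 4.0 * (j(w_h - w_n) + 3.0 * j(w_n) + 6.0 * j(w_h + w_n)) + c2 * j(w_n)
    r2 = (d2 / 8.0 * (4.0 * j(0.0) + j(w_h - w_n) + 3.0 * j(w_n)
                      + 6.0 * j(w_h) + 6.0 * j(w_h + w_n))
          + c2 / 6.0 * (4.0 * j(0.0) + 3.0 * j(w_n)))
    return r1, r2


def tau_c_from_ratio(r1, r2, b0):
    """Estimate the correlation time from R2/R1 (slow tumbling, rigid residue)."""
    w_n = abs(GAMMA_N) * b0
    return math.sqrt(6.0 * r2 / r1 - 7.0) / (2.0 * w_n)


# Example: 14.1 T (600 MHz 1H), S2 = 0.85, tau_m = 10 ns, tau_e = 50 ps.
r1, r2 = r1_r2(14.1, 0.85, 1e-8, 5e-11)
print(r1, r2, tau_c_from_ratio(r1, r2, 14.1))
```

Because the R2/R1 ratio is nearly independent of S2 when the internal motion is fast, the estimated correlation time comes out close to the input value, which illustrates why this ratio mainly reports on global tumbling.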
Several recent examples have highlighted the power of such approaches to further our understanding of molecular recognition mechanisms, revealing the role of conformational entropy in protein–ligand binding energy [90], the persistence of ligand entropy when bound to their receptor [91], or the importance of entropy in achieving specific protein–DNA sequence recognition [92].

13.5.3 Beyond the Rotational Diffusion Limit

The information provided by NMR relaxation studies is restricted to motions occurring in a time window with an upper limit defined by the global rotational correlation time of the molecule (10−9–10−8 seconds). Molecular motions occurring on a slower timescale (10−5–10−2 seconds) affect the linewidth of resonance peaks and may be analyzed quantitatively using relaxation dispersion measurements [93]. These experiments effectively measure the relaxation rate in the presence of a weak radio‐frequency irradiation of variable intensity. The way relaxation rates depend on the field intensity


13 Determination of Protein Structure and Dynamics by NMR

provides information on the exchange rate between the different states populated by the protein, the difference between the corresponding chemical shifts, and the population of these states, provided their number is limited to two or three. It has been recently recognized that a non‐detectable low‐populated state of a protein can be structurally characterized using relaxation dispersion approaches, providing a way to describe excited states of proteins [94]. Combining these approaches with crystallographic data has provided invaluable insights into enzymatic mechanisms [95–97], residues involved in ligand release [98], or intermediate states in protein folding mechanisms [99]. Of particular importance is the possibility of studying the correlation between multiple ligand binding pockets to elucidate allosteric regulation mechanisms [100]. The development of NMR methods enabling the description of low‐populated states of biomolecules is currently a field of intense research, and novel approaches, such as chemical exchange saturation transfer (CEST), dark‐state exchange saturation transfer (DEST), or PRE, have recently been made available to detect and characterize transient low‐populated states [101]. The use of extensive RDC measurements in different anisotropic media provides a way to explore conformational fluctuations with timescales longer than the rotational correlation time of the molecule, filling the gap between 10−8 and 10−2 seconds [102, 103]. The development of fast acquisition methods extends the range of dynamic timescales to a few seconds, the time required to record heteronuclear correlation spectra. These methods prove useful for the measurement of hydrogen/deuterium exchange rates, the characterization of protein folding intermediates, or the description of the assembly mechanisms of large complexes [104, 105].
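As an illustration of how a relaxation dispersion profile encodes exchange parameters, the sketch below uses the two‐site fast‐exchange (Luz–Meiboom) expression, which is one of several models used for such data and is valid only when the exchange rate is much larger than the chemical shift difference. The parameter values are invented, and the CPMG frequency is taken as 1/(2τcp), with τcp the spacing between successive refocusing pulses; both are assumptions of this example.

```python
import math


def r2_eff_fast_exchange(nu_cpmg, r2_0, p_b, delta_omega, k_ex):
    """Effective transverse relaxation rate (s^-1) for two-site fast exchange.

    nu_cpmg: CPMG frequency in Hz, r2_0: exchange-free rate in s^-1,
    p_b: population of the minor state, delta_omega: shift difference in rad/s,
    k_ex: exchange rate in s^-1.
    """
    p_a = 1.0 - p_b
    phi = p_a * p_b * delta_omega ** 2
    x = k_ex / (4.0 * nu_cpmg)
    return r2_0 + phi / k_ex * (1.0 - math.tanh(x) / x)


# Hypothetical residue: 5% minor state, 100 Hz shift difference, k_ex = 5000 s^-1.
params = dict(r2_0=10.0, p_b=0.05, delta_omega=2.0 * math.pi * 100.0, k_ex=5000.0)
profile = [(nu, r2_eff_fast_exchange(nu, **params)) for nu in (50, 100, 200, 500, 1000)]
for nu, r2 in profile:
    print(f"{nu:5d} Hz  {r2:6.2f} s^-1")
```

The exchange contribution is largest at low CPMG frequency and is progressively quenched as the frequency increases; fitting such a curve is what yields the exchange rate and the product of populations and squared shift difference. In this fast‐exchange limit the populations and the shift difference cannot be separated from a single profile.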
Since many diseases, such as Alzheimer's disease or cancer, involve altered protein dynamics, it is of fundamental importance to take this reality into account in future therapeutic strategies. The unique ability of NMR spectroscopy to provide structural and dynamic information on transient conformations of biomolecules offers new opportunities to design ligands that specifically target these states, which are very often the functionally relevant ones.

13.6 Intrinsically Disordered Proteins

13.6.1 Introduction

The discovery that some proteins may natively exist in an unstructured state is quite recent. For decades it was thought that the function of a protein, or of any biological macromolecule, is determined by its spatial structure. Since the middle of the twentieth century, many protein structures have been determined, and the structural details have continuously sharpened. This has opened a window on the molecular details of protein function, which has had a profound impact on the community of biologists. It became possible to understand the specificity of enzymes for their substrates or of antibodies for their antigens. The concordance of this approach with the lock‐and‐key model hypothesized by Fischer at the end of the nineteenth century led to the conclusion that all interesting proteins ought to be structured. Correspondingly, unstructured objects could not bear any function and, if occasionally found in biological systems, were taken as the sign of some pathological malfunction. However, since the beginning of the twenty‐first century, experimental evidence has been accumulating, indicating that some proteins are natively present in a state lacking the characteristic features of a structured protein. These proteins are nowadays called intrinsically disordered proteins (IDPs), using the word disordered rather than unstructured to stress the fact that they do not lack something but are just different [106]. These systems have now been extensively studied, and there is a growing corpus of literature describing their properties and the techniques for studying them. They were found to be quite abundant, in particular in higher eukaryotes, where IDPs are said to represent 30% of the protein material. Surprisingly, IDPs are frequently involved in the regulation of important biological mechanisms such as the cell cycle, kinase cascades, and the regulation of gene expression. In consequence, they constitute important potential therapeutic targets and should not be overlooked. We will focus on what information NMR can provide on IDPs, particularly those involved in important biological functions and therefore candidates for drug targeting.
In consequence, low‐complexity domains (polyQ, polyG, low‐complexity sequences, and repetitive patterns) will be excluded. Additionally, for the sake of simplicity, we will use the term IDP indiscriminately for complete proteins in the disordered state and for disordered domains of proteins that also contain structured domains.

13.6.2 General Presentation of IDPs

Structured proteins are usually organized around a central hydrophobic core, with characteristic secondary structures solidly connecting the different parts together. In consequence, hydrophobic residues are well represented and organized into “patches” along the primary sequence. In contrast, IDP primary sequences are characterized by a biased amino acid composition, in which small, charged, and generally hydrophilic residues are more abundant than larger hydrophobic residues. Many programs, built from statistical analysis of databases, are able to predict the disordered state of a primary sequence, and it is considered relatively easy to predict whether a given protein domain will show IDP properties or not. It is also generally recognized that these domains can be found in physical states alternating between different, more or less unfolded, structural states. For instance, a transitory recognition site may be formed depending on the presence of a partner or upon PTMs. The same IDP may fold only upon binding to a given partner but may also present an alternate fold when bound to a second, independent partner. This explains the role observed for IDPs in the regulation of biological networks, where they act as integration platforms on which many partners interact transiently, not only with the IDP itself but also with each other. IDPs are also commonly subject to many PTMs, such as phosphorylation, acylation, ADP ribosylation, glycosylation, methylation, ubiquitination, or SUMOylation [82, 107]. Many IDPs contain multiple different PTMs that modulate their molecular interactions. A striking example of this is the presence in the DNA‐packing histone proteins of a disordered tail that experiences an extraordinary variety of modifications, thus modulating the histone interactions not only with other histones but also with the transcription machinery, in a way that controls chromatin packing and the down‐ or upregulation of gene expression. Additionally, it was demonstrated that many sites of protein phosphorylation are located within intrinsically disordered regions. For instance, it is known that the SH3 domains found in kinases have a propensity to bind to unstructured, extended proline‐rich tracts. All these properties have to be weighed when considering a disordered protein as a potential drug target.
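The compositional bias that makes disorder predictable from sequence can be illustrated with a very simple heuristic. The sketch below is not one of the database‐trained predictors alluded to above; it assumes the charge–hydropathy rule associated with Uversky and co‐workers, the Kyte–Doolittle hydropathy scale rescaled to the range 0–1, and two invented test sequences. All function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (original scale, -4.5 to +4.5).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Mean hydropathy with the Kyte-Doolittle scale rescaled to 0..1."""
    return sum((KYTE_DOOLITTLE[a] + 4.5) / 9.0 for a in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute mean net charge per residue, counting K/R as +1 and D/E as -1."""
    positive = sum(seq.count(a) for a in "KR")
    negative = sum(seq.count(a) for a in "DE")
    return abs(positive - negative) / len(seq)


def looks_disordered(seq):
    """Charge-hydropathy rule: low hydropathy combined with high net charge."""
    boundary = (mean_net_charge(seq) + 1.151) / 2.785
    return mean_hydropathy(seq) < boundary


# Invented sequences: one acidic and polar, one dominated by hydrophobic residues.
acidic_tail = "SEEDEEPSEEDEEPSEEDEE"
hydrophobic_segment = "LIVAVLFAGIVKALLEAVIG"
print(looks_disordered(acidic_tail), looks_disordered(hydrophobic_segment))
```

Such a two‐parameter rule only separates extended, highly charged chains from compact hydrophobic ones; it says nothing about transient structure, binding‐induced folding, or PTMs, which is where the experimental methods discussed in this chapter come in.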
In particular, because signaling pathways are transient and because IDP domains interact with many possible partners, one expects the affinities for the various partners to be rather weak, in the micromolar range, corresponding to high koff values. Also, most interactions will be protein–protein interactions (PPIs) with small surface contacts, involving only a few key residues. By definition IDPs lack a definite structure, so how can we describe such objects? Following Uversky [108], IDPs can be categorized into three main classes. The random coil (RC) state is observed when all energetically allowed states are equivalent, and no particular bias toward certain turns, loops, or secondary structures can be found. The molten globule (MG) state, on the contrary, presents stable secondary structures, possibly separated by RC tracts. Some hydrophobic van der Waals contacts are present, but the chain lacks a well‐packed core and presents no defined tertiary fold, so that it remains fully dynamic. Finally, the pre‐molten globule (PMG) state is an intermediate state in which some hydrophobic clusters are formed, with some residual secondary structure. There is of course a continuum of possibilities from a fully RC state to a fully ordered structure, and this classification is only indicative. Depending on the purpose of the study and the state of the protein, describing the conformational states of an IDP more precisely requires complex modeling, in which ensembles of conformers are built so that the ensemble, as a whole, describes the behavior of the IDP. The methods used to achieve this goal are, however, beyond the scope of this chapter.

13.6.3 NMR Techniques for IDP

Disordered proteins do not crystallize or, if they do, possibly only in one of their many structural states, which might or might not be relevant to the problem under study. Obviously, IDPs should be studied in solution as much as possible, where conformational equilibrium can freely take place. Because of its ability to identify structural elements in solution, combined with the possibility of analyzing dynamic objects and transient states, NMR is probably the best available spectroscopy for studying IDPs in solution. For a recent review, see Jensen et al. [109]. IDPs are characterized by the rapid interconversion between different conformational states. From an NMR point of view, this has two important impacts on the spectrum: long relaxation times and averaged spectral parameters. First, because of the fast tumbling of residues not embedded in a rigid structural network, spectral lines present long relaxation times, resulting in narrow linewidths. The presence of sharp lines, well separated from the noise level, gives the spectra high sensitivity and makes it possible to work at rather low concentrations. The long relaxation times make it possible to acquire long transients and to resolve even small J‐couplings.
In consequence, magnetization transfer based on coherent transfers through J‐couplings, as found in multidimensional spectroscopy, will work particularly well. On the other hand, due to the absence of structural contacts, NOESY will not provide any useful information. Furthermore, as IDPs are not compactly folded, with each local domain relatively free to move, the overall size of the protein is not reflected in the linewidths, and very large IDPs can easily be observed. Second, the rapid exploration of the conformational space results in an averaging of all spectral parameters, such as chemical shifts, J‐couplings, or RDCs. Observed chemical shifts are in consequence very close, if not equal, to RC values, leading to a considerably reduced


spreading of chemical shifts, compared with what is observed for ordered proteins. In particular, 1H chemical shifts of the HN of the backbone, which are usually dominated by static internal H‐bonds and structural effects in folded proteins, appear clustered in a narrow chemical shift range around 8 ppm. This has a drastic effect not only on 1H 1D spectra but also on 1H─15N 2D HSQC spectra, which appear compact and crowded. Because of these specific properties, the assignment procedures for IDP spectra are different from the standard ones. Due to the reduced range of chemical shifts, 1D or 2D spectra are rapidly overcrowded and can be difficult to interpret due to the presence of many resonance superimpositions. However, if double 15N─13C labeling can be performed, the high quality and sensitivity of the 3D spectra more than compensate for this difficulty. The fact that a complete set of 3D experiments can be easily acquired, thanks to the ease of magnetization transfers, allows gathering a comprehensive set of information, quite adapted to automatic assignment procedures as shown for the Tau protein [110]. Thanks to the efficiency of magnetization transfers and to the narrow linewidths, larger experiments such as 4D or even 5D can be considered, in reasonable times, with the help of non-uniform sampling [111]. The large dispersion of carbonyl carbon frequencies is exploited using the 15N─13C correlation, and a set of carbon‐detected 3D spectra has been developed to get protein backbone assignments [112]. This approach requires highly sensitive carbon channels provided by cryogenic probes. If double labeling is not manageable, an alternative method has also been used, based on the analysis of synthetic peptides of intermediate length, spanning the primary sequence of the protein [113]. Peptides are constructed so that they can be assigned using regular 1D and 2D 1H techniques and are chosen to overlap each other along the primary sequence. 
Because the protein chain is not folded and chemical shifts are mostly determined by the local environment, it can be assumed that the shifts in the different peptides are close enough to those in the complete protein. The assigned peptides can then be used to assign the complete disordered chain. Additionally, this second approach, using synthetic peptides that can be easily modified, opens the possibility of exploring the effect of PTMs or mutations. One remaining difficulty in the study of IDPs by NMR that should not be overlooked is proteolysis. IDP chains are quite fragile and highly susceptible to degradation. Not only may small catalytic amounts of proteases still be present in the solution after protein purification, but spontaneous proteolysis of the chain is also not unknown. The difficulty lies in the fact that, in the absence of a definite fold, the NMR spectrum will

be only slightly perturbed, so such degradation will be difficult to detect and monitor.

13.6.4 Interpreting IDP Spectra

The averaging effect created by the constant conformational exchange implies that the measured spectral parameters carry little information; in consequence, special procedures have to be used to extract as much information as possible from IDP NMR spectra. Because NOESY spectra cannot be relied upon, the classical analysis techniques are unavailable, and two alternative types of approaches are commonly used: chemical shift analysis and RDCs. The 1H and 13C chemical shifts are quite sensitive to secondary structure, in particular for the atoms located on the protein backbone, and we have seen above that, in ordered proteins, chemical shifts carry information about the secondary structure of the protein chain. For instance, in the CHα moiety, 1Hα chemical shift values above the RC value are indicative of a β‐sheet structure, and values below the RC value are indicative of an α‐helix structure. In IDPs, the effect is less pronounced but can nevertheless be analyzed, provided corrections for temperature and pH are performed and precise referencing is ensured. In this respect, the use of 4,4‐dimethyl‐4‐silapentane‐1‐sulfonic acid (DSS) as a reference is strongly recommended over trimethylsilyl propionate (TSP), as the latter is more prone to be affected by pH. Several scales and methods for analyzing chemical shifts can be found in the literature, and chemical shifts are classically analyzed in terms of secondary structural propensity or transient secondary structures using dedicated chemical shift databases [114]. On the other hand, the RDC signal is sensitive to the orientation of the chemical bond with respect to the main magnetic field (see above). In a pure solution, where all orientations are equiprobable, dipolar couplings are averaged to zero, and no information can be extracted.
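The averaging of dipolar couplings can be checked numerically. The sketch below assumes the usual angular factor of a dipolar coupling, (3cos²θ − 1)/2, with θ the angle between the internuclear vector and the magnetic field, and averages it over an orientation distribution by simple numerical integration. The weak orientational bias used in the second case is an arbitrary assumption meant only to mimic a small degree of alignment.

```python
import math


def p2(cos_theta):
    """Second Legendre polynomial, the angular factor of a dipolar coupling."""
    return 0.5 * (3.0 * cos_theta ** 2 - 1.0)


def average_p2(weight, n=20000):
    """Average P2(cos theta) over orientations with probability density
    proportional to weight(theta) * sin(theta), using the midpoint rule."""
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        theta = math.pi * (i + 0.5) / n
        w = weight(theta) * math.sin(theta)
        numerator += w * p2(math.cos(theta))
        denominator += w
    return numerator / denominator


# All orientations equally probable: the coupling averages to zero.
isotropic = average_p2(lambda theta: 1.0)
# A 1% bias along the field direction leaves a small residual coupling.
aligned = average_p2(lambda theta: 1.0 + 0.01 * p2(math.cos(theta)))
print(isotropic, aligned)
```

The isotropic average vanishes within numerical error, whereas the biased distribution leaves a residual factor of about two parts per thousand of the static coupling. A small departure from isotropy of this kind is what makes a residual coupling measurable.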
However, if some partial alignment is imposed, for instance, by a dilute liquid crystal, RDCs are observed. For an ordered protein, the shape of the protein and its interactions with the alignment media determine this partial alignment. For an IDP, one would think that the absence of a determined shape would preclude any alignment effect. On the contrary, it was found that in sterically aligning media, the RDCs of a completely disordered chain present a typical bell shape, explained by the nearly perpendicular orientation of the N─HN bond vector relative to the principal alignment axis, and an increased angular averaging toward the termini of the chain [115]. Additionally, it is observed that, in partially folded IDPs, RDCs are extremely sensitive to elements of secondary structure. For instance, a transient helix, organized in an


otherwise fully extended disordered chain, results in a characteristic perturbation of the bell shape, with a local inversion of the sign of the RDC value. Together, information gathered from chemical shift and RDC measurements allows the determination of partial or transient structuring of the protein chain. When measured in the absence of any interacting partner, this is the signature of a natural propensity of the disordered chain and of a bias toward specific conformations in the sampling of the conformational space, as illustrated by an exemplary analysis on the Sendai virus [116]. When an IDP interacts with a folded protein, it is generally observed that the unfolded chain acquires a transient fold, characteristic of the interaction. Any interaction of the IDP chain with a cognate partner will impact the measurement and will thus be easily detectable. Depending on the strength of the interaction and its kinetic parameters, several phenomena may be observed. If the IDP chain binds and folds with high affinity, the lines of the bound domain will acquire the spectral parameters of the folded partner, so a large frequency shift from the free form and increased linewidths are usually observed. If an excess of IDP is present, both bound and unbound forms will be visible in the spectra. On the other hand, a weak interaction, characterized by a fast exchange between the bound and free forms, will be observed as changes in both the position and width of the spectral lines that scale with the fraction of bound form. In that case, only one line will be present in the spectra, with a position and a width sensitive to the concentrations of the bound and free forms. In the case of intermediate kinetics, with a characteristic exchange time on the order of the NMR timescale (see above), the exchange phenomenon brings an additional relaxation mechanism that may broaden the NMR signals beyond detection.
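For the fast‐exchange case, the position of the single observed line can be sketched as a population‐weighted average. The example below assumes simple 1:1 binding, for which the bound fraction follows from the total concentrations and the dissociation constant. The shifts and concentrations are invented, and the function names are hypothetical.

```python
import math


def fraction_bound(observed_total, partner_total, kd):
    """Fraction of the observed species that is bound, for 1:1 binding.

    All three arguments must be in the same concentration units.
    """
    b = observed_total + partner_total + kd
    complex_conc = 0.5 * (b - math.sqrt(b * b - 4.0 * observed_total * partner_total))
    return complex_conc / observed_total


def observed_shift(delta_free, delta_bound, f_bound):
    """Fast-exchange chemical shift: population-weighted average of both forms."""
    return (1.0 - f_bound) * delta_free + f_bound * delta_bound


# Hypothetical titration: 100 uM labeled IDP, Kd = 100 uM, increasing partner.
for partner in (0.0, 50.0, 100.0, 200.0, 400.0):
    f = fraction_bound(100.0, partner, 100.0)
    print(f"{partner:6.1f} uM  bound {f:.3f}  shift {observed_shift(8.20, 8.60, f):.3f} ppm")
```

Following the peak position along such a titration and fitting it to this expression is a common way to estimate a dissociation constant in the weak‐binding regime. The approach no longer applies once exchange becomes intermediate or slow on the chemical shift timescale.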
When this is the case, lines will simply disappear, and no information can be extracted. Different experimental conditions then have to be explored in the hope of either slowing down or speeding up the exchange to bring it out of the NMR chemical shift timescale.

13.6.5 IDPs as Drug Targets

In vivo, IDPs are usually involved in many interactions with many protein partners and act as regulating agents [117]. In consequence, targeting IDPs for drug development is closely related to targeting PPIs. If the structure of the protein complex is known, the design of a small interacting molecule is feasible. However, the weak affinity and poor specificity of these molecules reduce their potential therapeutic use. For these reasons, the

design of new molecules efficiently targeting IDPs requires specific approaches and new developments. The search for new IDP targets is a very active field, and the druggability of an IDP should be evaluated in light of the specific features that make such molecular objects different from globular proteins. The latter usually present a large hydrophilic surface, with a characteristic hydrophobic surface or even a binding pocket with a determined geometry. Specific interactions with IDPs are better described by a series of compact and patterned interaction zones, found on a flexible backbone, that adopt a single defined conformation only upon binding. These characteristic patterns, sometimes called short linear motifs (SLiMs) [118] or molecular recognition elements (MoREs) [119], have been intensively studied and categorized. When targeting PPIs involving IDPs, one may choose to mimic the IDP itself, thus displacing the interaction, or to target the protein interface, and many examples of both approaches are found in the literature (see, for instance, two early examples of these approaches: [120, 121]).

13.7 Alternative Approaches for Non‐soluble Proteins

All the developments presented so far have been devoted to NMR studies performed in vitro in solution. However, there are many alternatives to this standard scheme. Proteins can be studied by NMR spectroscopy in many different nonstandard conditions, and these are increasingly becoming an important source of information for drug discovery and development. In this final part, we will briefly look at the methods available to study proteins in the cell membrane or in the solid phase.

13.7.1 Solid‐State NMR

NMR spectroscopy is inherently anisotropic, and nearly all NMR parameters are strongly dependent on the orientation of the molecule relative to the main magnetic field applied to the sample. In particular, dipolar couplings or chemical shifts depend strongly on this orientation and can produce effects several orders of magnitude larger than the usual chemical shift range observed for solution conditions. Sharp resonance lines are observed in liquid‐state NMR thanks to the averaging effect brought by the rapid tumbling of the protein in the solvent, which ensures that all orientations are equiprobable and in rapid exchange. This rapid reorientation reduces all anisotropic interactions to their mean value. For dipolar couplings it corresponds to zero, while for chemical shifts it is the isotropic value, as usually observed in liquid‐state NMR. This averaging is not available for



solid‐state samples, and in consequence, for such samples, one observes NMR spectra displaying very broad patterns whose shape reflects all possible orientations of the molecule, the so‐called “powder pattern.” For a general review of solid‐state NMR (ssNMR), see, for instance, [122–124]. There is an elegant solution to this problem for spin ½ nuclei, which fortunately are the most abundant spins studied in biological NMR: 1H, 13C, 15N, and 31P. The effects of anisotropy on these spins are described only by rank‐2 tensors, whose angular dependence crosses zero at the so‐called magic angle of 54.7°. In consequence, rapid rotation of the sample about an axis oriented at this angle with respect to the magnetic field effectively reduces the effect of the anisotropy to zero and restores sharp isotropic lines in the NMR spectrum of the solid‐phase sample. This technique, called MAS (for magic angle spinning), requires specific equipment but, under the right conditions, leads to highly resolved spectra, which can be used to determine molecular structures and interactions precisely at the atomic level [125]. In contrast to liquid‐state NMR, the linewidth of MAS ssNMR signals no longer depends on the size of the molecule. Several parameters dominate the line shapes, in particular the spinning speed and the angle alignment, but, above all, a highly homogeneous sample is paramount to ensure sharp lines. A very regular solid, possibly microcrystalline, leads to sharp lines that are amenable to assignment and structural studies. Owing to the inherent difficulty of obtaining sharp 1H signals, protein assignments had been limited to the 13C and 15N atoms until the recent development of ultrafast MAS rotors, with spinning frequencies above 100 kHz, enabled the detection of protons in fully protonated proteins with improved sensitivity [126].
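The magic‐angle value quoted above can be checked numerically. The sketch below assumes the standard form of the angular dependence of a rank‐2 interaction under sample rotation, the second Legendre polynomial P2(cos θ) = (3cos²θ − 1)/2, which is not written out in the text; the function names are chosen for illustration only.

```python
import math


def p2(theta_deg: float) -> float:
    """Second Legendre polynomial P2(cos(theta)), with theta in degrees.

    Assumed here as the scaling factor of a rank-2 anisotropic interaction
    when the sample rotates about an axis tilted by theta from the field.
    """
    c = math.cos(math.radians(theta_deg))
    return 0.5 * (3.0 * c * c - 1.0)


def magic_angle_deg() -> float:
    """Angle at which P2 vanishes, i.e. cos^2(theta) = 1/3."""
    return math.degrees(math.acos(1.0 / math.sqrt(3.0)))


if __name__ == "__main__":
    theta_m = magic_angle_deg()
    print(f"magic angle: {theta_m:.2f} degrees")  # about 54.74
    for angle in (0.0, theta_m, 90.0):
        # Scaling is 1 along the field, vanishes at the magic angle,
        # and is -1/2 perpendicular to the field.
        print(f"theta = {angle:6.2f}  P2 = {p2(angle):+.3f}")
```

Spinning exactly at this angle is what makes the scaling factor vanish; any mis-setting of the rotor axis leaves a residual anisotropic contribution, which is why the text lists angle alignment among the parameters that control the line shape.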
Furthermore, the reduced size of the rotors allows full backbone and side‐chain assignment using submilligram quantities of protein. As the size of the system does not affect the width of the NMR lines, large structured objects can be studied with this approach. ssNMR can provide atomic details on large biological supramolecular assemblies such as large molecular machines, helical filaments, cross‐β fibrils, or membrane‐embedded oligomeric proteins. The inherent insolubility and large size of these multicomponent assemblies restrict the use of solution NMR, and their noncrystallinity and elongated shapes present obstacles to X‐ray crystallography studies. Provided that the complexity of the NMR spectra is kept at a level that still permits their analysis, atomic structural details can be obtained on systems that would be difficult to study otherwise. Several reviews can be found in the literature, covering fields such as the structure of large repetitive assemblies [127],

amyloids [128], membrane proteins [129], or even nucleic acid–protein interactions [130]. An alternative to conventional MAS consists of spinning the sample relatively slowly at the magic angle. In this approach, called HR‐MAS (short for high‐resolution MAS), the spinning speed is not high enough to sharpen the lines of the molecules in the solid state. However, solutes and other mobile molecules present in the solvent surrounding the solid phase display sharp lines because their internal mobility, combined with the spinning, is sufficient to average out the remaining second‐rank interactions. In consequence, only the mobile parts are visible in the spectrum, every other signal being broadened into a wide, weak signal. This approach has been used to study the interaction of a PDZ domain, immobilized on affinity beads, with a series of cognate peptides [131]. On the PDZ domain, several methyl signals from the unstructured C‐terminal tail, which is involved in the binding of the PDZ domain to the partner peptides, were observed and measured.

13.7.2 Membrane Proteins

The structural study of integral membrane proteins is still in its infancy, as attested by the paucity of membrane protein structures in the PDB. Many techniques have been used to study these systems, including ssNMR. More than 90 structures have already been determined using ssNMR, either alone or in combination with liquid‐state NMR or X‐ray crystallography (see [132, 133]). One difficulty in the study of membrane proteins by MAS ssNMR is the risk that the centrifugal forces generated by the high rotation speed disrupt the fragile sample. One alternative is to work on a sample in which all the membrane normal vectors are aligned and oriented with the magnetic field. In this case, the second‐rank tensor interactions are not averaged to zero, but rather reduced to a single value because of the common orientation of all the molecular frames.
Perfect alignment cannot be achieved in practice, and some orientational spread remains, which in consequence broadens the spectral lines. For this reason, complete assignment is not feasible; however, the study of such static, uniaxially oriented samples yields angular restraints, which provide information for the detailed analysis of polypeptide structures [134]. Because of the large 15N chemical shift anisotropy of the peptide bond, the orientation and detailed conformation of membrane‐embedded helices can be precisely determined by combining measurements of the dipolar coupling and the chemical shift in 15N spectra. This approach has been successfully applied to toxins and membrane‐disrupting peptides. Integral membrane proteins can also be studied in solution. Here the protein has to be protected in a soluble lipid environment. Typically, this is performed by a


careful choice of detergents so that stable and well‐defined micelles are generated. It has been demonstrated that matching the hydrophobic diameter of the micelles to the hydrophobic thickness of the membrane protein leads to spectra that are often adequate for high‐resolution structural investigations. An alternative is to use bicelles, a kind of mixed phospholipid micelle, which present less curvature than micelles and thus mimic the real membrane environment more adequately [135]. Nanodiscs (i.e. small membrane patches stabilized by apolipoproteins) have also been used, for both liquid‐state NMR and ssNMR (after precipitation by centrifugation) [136]. All these techniques have been developed over the last few years, and striking results shedding light on the molecular mechanisms of signal transduction by GPCRs have been obtained (see [137, 138] for reviews).

13.8 Optimized Strategies for the Study of Biomolecules in Solution

It has been shown that NMR is an extremely potent tool for the structure‐based drug discovery process. Not only can NMR be used to determine target structures at the molecular level, but molecular details such as target–ligand interactions, dynamic changes in molecular complexes, or even unstructured domains can also be closely monitored, and all these parameters can be used to drive the drug discovery process. NMR also stands apart from most other analytical methods in that it is a scalable technique: rapid, simple, and cheap experiments bring early information on the system, while more involved experiments bring detailed information on the same system. NMR will thus be used in the early stage of the study for target quality control. It can also be used to validate hits obtained from high‐ or medium‐throughput screening studies. NMR can further be used to quantify hits, for instance, in terms of binding affinity or in terms of competition with native ligands during the hit‐to‐lead process. Finally, the geometry of binding can be determined and used to optimize the binding molecule. While the former experiments can usually be conducted with a few nanomoles of target, the latter generally require a larger amount of material, usually isotopically labeled, expensive to produce, and laborious to purify. In consequence, it is important to develop an overall strategy that benefits from the scalability of NMR and accumulates results throughout the study, starting with early results obtained on small amounts of sample and generating fundamental results at more advanced stages.

This strategy will be applied at three levels: NMR instruments, samples, and NMR experiments. One should plan for samples of increasing complexity, starting with simple‐to‐produce samples and finishing with the fully labeled samples presented above.

13.8.1 Instrumental Setup

An NMR laboratory should be equipped with various types of instruments, which together cover all the requirements for structural and interaction studies. However, considering the price of equipment and maintenance, the instruments have to be chosen with care. An NMR instrument is constructed around three more or less independent parts: the magnet, the probe, and the electronic console. The magnet is characterized by the strength of its magnetic field, which determines the ultimate overall sensitivity and spectral resolution. As all frequencies are proportional to the magnetic field intensity, the magnet is traditionally characterized by the frequency of the 1H spins: the larger the better (and the more expensive). The available experimental volume (the magnet bore) is an additional parameter that might be important, in particular for an ssNMR machine. The probe is the apparatus that actually excites the spins and detects the signals and therefore determines the kind of signals that can be detected. In a structural biology laboratory, 1H and 13C spins should be detectable on virtually all probes, which should also be equipped with high‐quality magnetic field gradients. However, the presence of additional channels (15N, 19F, 31P, tunable) is a matter of choice and determines which experiments will be possible. These days, probes come in two flavors: regular probes and probes equipped with electronics cooled to a few kelvin. These cold probes are quite expensive to purchase and maintain (they can be as expensive as the magnet), but they bring a gain in sensitivity that allows faster and/or more sensitive experiments.
Nearly an order of magnitude in sample amount can be gained using cold probes. Finally, the console contains the electronics and presents no particular parameters, other than having to be adapted to the actual magnet and probes. Table 13.3 summarizes the different types of NMR equipment that are typically found in a structural biology laboratory. Having several instruments with different features in the laboratory allows a large panel of possible experiments and guards against obsolescence, since each machine can be upgraded independently.

13.8.2 Sample Preparation

It was shown above that many different kinds of samples can be studied by NMR: unlabeled, partially labeled, or fully isotopically labeled, at physiological concentration


Table 13.3 The different types of NMR instruments found in a structural biology laboratory.

Instrument type | Probe | Sensitivity limit (μM) | Cost | Typical use
500–600 MHz | Regular | 200 | − | Quality control
500–600 MHz | Cold | 25 | + | Quality control, ligand screening, small protein structure
700–800 MHz | Regular | 100 | + | Ligand screening, medium protein structure
700–800 MHz | Cold | 10 | ++ | Ligand screening, medium protein structure, protein–protein interactions
900 MHz and above | Regular | 50 | ++ | Large protein structure, large complexes, IDPs
900 MHz and above | Cold | 5 | +++ | Large protein structure, large complexes, IDPs

All instruments should be able to measure 1H, 13C, and 15N nuclei.
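As a reading aid for Table 13.3, the sketch below transcribes the table into a small lookup and lists the instrument classes whose sensitivity limit is at or below a given sample concentration, cheapest first. The numeric cost rank (0 for −, up to 3 for +++), the type and function names, and the reading of "sensitivity limit" as the lowest usable sample concentration are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Instrument(NamedTuple):
    field: str        # 1H frequency class of the magnet
    probe: str        # "regular" or "cold"
    limit_um: float   # sensitivity limit in micromolar (Table 13.3)
    cost_rank: int    # assumed encoding: "-" = 0, "+" = 1, "++" = 2, "+++" = 3


# Values transcribed from Table 13.3.
TABLE_13_3 = [
    Instrument("500-600 MHz", "regular", 200, 0),
    Instrument("500-600 MHz", "cold", 25, 1),
    Instrument("700-800 MHz", "regular", 100, 1),
    Instrument("700-800 MHz", "cold", 10, 2),
    Instrument("900 MHz and above", "regular", 50, 2),
    Instrument("900 MHz and above", "cold", 5, 3),
]


def usable_instruments(conc_um: float) -> List[Instrument]:
    """Instruments whose sensitivity limit does not exceed conc_um, cheapest first."""
    if conc_um <= 0:
        raise ValueError("concentration must be positive")
    hits = [inst for inst in TABLE_13_3 if inst.limit_um <= conc_um]
    # Sort by cost first, then by sensitivity limit for a stable, readable order.
    return sorted(hits, key=lambda inst: (inst.cost_rank, inst.limit_um))


if __name__ == "__main__":
    for inst in usable_instruments(30.0):
        print(inst.field, inst.probe, inst.limit_um)
```

For a 30 μM sample, only the cold‐probe configurations pass the filter, which illustrates the gain of roughly one order of magnitude attributed to cold probes in the text.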

Table 13.4 The different types of protein samples amenable to NMR studies, along with the different experiments that can be realized.

Sample type | Cost | Quality control | Screening | Binding information | Structure
Unlabeled | — | Possible | Ligand monitoring | Poor | —
15N labeled | — | Easy | Protein monitoring | Good | Difficult
15N─13C | + | — | Side chain | Good | Possible
2H─15N─13C | ++ | — | | Good | Ideal
1H‐specific | +++ | | | Ideal | Detailed
Methylated | +++ | | | Ideal | Partial

or high concentration, and in a simple buffer or in complex media. All these different states determine which kind of NMR measurement will be convenient, and each sample should be prepared with a given analysis in mind. Table 13.4 summarizes the various cases that will be found during a drug discovery study, ranging from small‐molecule screening to biological target study. Different levels of isotopic labeling can be useful for protein studies, and the level chosen determines the type of NMR experiment that can be performed.

13.8.2.1 Unlabeled Proteins in Buffer

This is usually the simplest sample available in the laboratory, typically coming directly from the production unit. Most production groups are accustomed to producing samples for screening or structural biology and will provide the protein at a given mass concentration (expressed in mg ml−1) in a suitable buffer. NMR is sensitive to the molar concentration rather than to the mass concentration, so the conversion has to be made. As a rule of thumb, a protein concentration of 1 mM corresponds to 1 mg ml−1 of sample per kDa of molecular mass. In other words, a 1‐mM solution of a 20‐kDa protein is 20 mg ml−1, and conversely, a 1‐mg ml−1 sample of the same 20‐kDa protein is 50 μM.

13.8.2.2 Labeled Proteins

However, for further studies, the full analytical power of NMR can only be exploited with isotopically labeled proteins. Several levels of labeling are possible, as presented in Table 13.4, and they determine which type of experiment will be possible with a given sample, as described below. For large projects, it is common to have several preparations of the same system, with different labeling schemes, allowing the full range of experiments presented below.

13.8.3 NMR Experiments

Table 13.5 summarizes the different types of NMR studies, along with the different experiments that can be realized (concentrations refer to a 700–800‐MHz spectrometer equipped with a cold probe).

13.8.3.1 Quality Control

The very first use of NMR in the study of a target is to check the quality of the protein sample itself. Many techniques are available for this control (SDS‐PAGE, size exclusion chromatography, dynamic light scattering, absorbance, mass spectrometry, etc.), and these techniques are certainly useful and should not be overlooked. However, NMR is probably the only technique able to rapidly probe the folding of a protein and to tell whether the conformation of the protein under study is correct and active. This control is paramount, since overlooking denaturation, or even misfolding, of the target can lead to nothing but meaningless results.


Table 13.5 The different types of NMR studies.

Experiment type | Required label | Minimum concentration (μM) | Type of information | Binding information | Structural information on binding
1D 1H | — | 10 | Folding, purity | Poor | —
DOSY | — | 50 | Purity | None | —
NOESY | — | 200 | Side‐chain contacts | Poor | Possible
Natural abundance 13C‐HSQC | — | 500 | Folding, hydrophobic core | Good | Possible
13C‐HSQC | 13C | 100 | All | | Detailed
15N‐HSQC | 15N | 20 | Backbone | Very good | Localization
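To connect the mass‐to‐molar rule of thumb of Section 13.8.2.1 with the minimum concentrations of Table 13.5, the following sketch converts a mass concentration into a molar one and lists the experiments whose concentration threshold is met. It is only an illustration: the function and dictionary names are hypothetical, the thresholds are simply transcribed from the table (700–800 MHz spectrometer, cold probe), and the labeling requirement of each experiment is not checked.

```python
from typing import List

# Minimum concentrations in micromolar, transcribed from Table 13.5.
# The last two entries additionally require 13C- and 15N-labeled protein,
# which this sketch does not verify.
MIN_CONC_UM = {
    "1D 1H": 10,
    "DOSY": 50,
    "NOESY": 200,
    "Natural abundance 13C-HSQC": 500,
    "13C-HSQC": 100,
    "15N-HSQC": 20,
}


def molar_conc_um(mass_conc_mg_per_ml: float, mw_kda: float) -> float:
    """Convert mg/ml to micromolar for a protein of mw_kda kilodaltons.

    mg/ml equals g/l, and 1 kDa equals 1000 g/mol, so
    c [mM] = mass [mg/ml] / MW [kDa], and c [uM] is 1000 times that.
    """
    if mw_kda <= 0 or mass_conc_mg_per_ml < 0:
        raise ValueError("molecular mass must be positive, concentration non-negative")
    return mass_conc_mg_per_ml * 1000.0 / mw_kda


def feasible_experiments(mass_conc_mg_per_ml: float, mw_kda: float) -> List[str]:
    """Experiments of Table 13.5 whose minimum concentration is reached."""
    conc = molar_conc_um(mass_conc_mg_per_ml, mw_kda)
    return [name for name, limit in MIN_CONC_UM.items() if conc >= limit]


if __name__ == "__main__":
    # Example from the text: 1 mg/ml of a 20-kDa protein is 50 uM.
    print(molar_conc_um(1.0, 20.0))
    print(feasible_experiments(1.0, 20.0))
```

With the example from the text (1 mg ml−1 of a 20‐kDa protein, i.e. 50 μM), the concentration is sufficient for 1D 1H, DOSY, and, if the sample is 15N labeled, 15N‐HSQC, but not for NOESY or natural abundance 13C‐HSQC.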

13.8.3.2 1D Spectroscopy on Unlabeled Samples

Due to the extreme sensitivity of chemical shifts to the surroundings, the NMR signals of the side‐chain and backbone atoms are responsive to the overall fold of the protein. Of course, structural information is lost through the complex ways in which interactions add up; overall, however, an NMR spectrum acts as a faithful fingerprint of the global and local folding of the protein. In a 1D 1H spectrum, two types of signals, found in two well‐separated spectral zones, are particularly sensitive to the fold: signals from the amide protons (HN) and signals from the methyl protons (Met). Because the NMR signal is sensitive to the engagement of the amide hydrogens in hydrogen bonds and to the secondary and tertiary structures surrounding the backbone, the HN signals of a folded protein spread over the 6.5–10 ppm region, while signals from RC regions cluster in the 8–8.5 ppm range, leading to quite a different spectral pattern. On the other hand, many methyl groups are buried in the hydrophobic core of a folded protein, and the ring current of the aromatic moieties in this core has a strong impact on chemical shifts, either up‐ or downfield, with a high sensitivity to molecular stacking. Methyl signals from aliphatic side chains are thus expected in the 0.7–1.5 ppm range, but can be found between −2 and +3 ppm when engaged in tertiary structures. In particular, the appearance of signals below, say, 0.6 ppm is the signature of a structured protein, and shifts in this region of the spectrum are the sign of a change in the structure. The HN spectral region cannot be used with the same confidence because of the higher sensitivity of its signals to pH, salt concentration, or other solvent effects. It should be noted that the methyl signals are observable even in deuterated water, whereas the amide signals require a sample in H2O. In summary, a simple 1D NMR experiment, which can be performed on a mid‐range instrument with a small amount of material, directly informs on the overall quality of the sample.

13.8.3.3 2D Spectroscopy on Unlabeled Samples

The full analytical power of NMR really unfolds with multidimensional (2D–3D) spectroscopy. While the most common 2D NMR experiments rely on isotopic labeling, some 2D experiments on unlabeled samples are available and can bring valuable information at low operating cost. Diffusion‐ordered NMR spectroscopy (DOSY) differs from other 2D experiments in that the vertical axis does not present chemical shifts, but rather the translational diffusion coefficients of the different species present in solution. This is obtained with a pulse sequence related to imaging techniques, which relies on gradient modulations, and a complex analysis of the gradient dependence of the signal. This experiment is able to separate all constituents in a mixture and can reveal unwanted contaminants, in particular small organic molecules, even at very low concentrations. Of course, all the atoms of a given molecule diffuse with the same coefficient. In consequence, if the protein under study presents some irregularity along the diffusion axis, this might be the sign of sample polydispersity, a situation that has to be avoided for high‐quality measurements. In this sense, DOSY provides information comparable to dynamic light scattering, although with less sensitivity for large objects and more resolution for small molecules. NOESY experiments are also easy to perform on an unlabeled sample at medium concentration. They provide information on atomic contacts in the tertiary structure of the protein. Certain characteristic patterns in the amide region and in the Hα region may be used to determine the presence of helices and sheets; however, in the absence of assignment, this information remains limited. The recent BEST and SOFAST spectroscopic developments [139] minimize the loss of signal from spin relaxation through a clever use of magnetization transfer.
The gain in sensitivity makes it possible to measure 2D 13C HSQC spectra at the natural abundance of the 13C isotope, for higher sample concentrations (500 μM and above). This


experiment makes it possible to “zoom in” on the hydrophobic core of the protein and is exquisitely sensitive to any minor event in the protein, either a rearrangement of the backbone tertiary structure or any ligand binding event. This approach has been used to control the quality of therapeutic proteins in their final packaging, without any additional preparation step [140]. It has recently been shown [141] that natural abundance 2D 13C HSQC can successfully be used to follow the binding of a small ligand to a large protein. In the published case, two binding sites of different affinities were observed. The fact that both sites are independent but nevertheless share common residues could be demonstrated even in the absence of an assignment of the protein spectrum.

13.8.3.4 2D and 3D Spectroscopy on Labeled Samples

Production of a labeled sample is more demanding and more expensive. However, the gains in resolution and sensitivity brought by the additional observable nuclei are substantial, and many experiments require labeling, in particular for structural analysis. Some labeling schemes are more expensive than others, and it is possible to optimize a production sample for particular requirements.

15N labeling is relatively cheap to perform. It allows a direct look at the individual amino acids of a protein (except prolines) by acquiring a 2D 15N HSQC spectrum. This experiment is very sensitive and rapid. It displays a single peak per residue originating from the H─N amide moiety of the peptide backbone. Any interaction or modification of the overall structure of the protein will be echoed by a displacement of the peaks of the residues involved in the process. It is thus the technique of choice for monitoring folding/unfolding events and protein–ligand or protein–protein interactions. Assignment is required for a complete analysis of the interaction data. However, on 15N samples, it relies on 1H─1H magnetization transfers (typically in 15N HSQC‐TOCSY and 15N HSQC‐NOESY 2D or 3D experiments), which can be burdensome to analyze for anything but small proteins. Nevertheless, when an assignment is available, complete structural information is accessible and can be used to analyze all interactions in terms of spatial positioning. Because of all these qualities and the small price overhead of 15N isotopes, some groups choose to produce all their protein samples directly as 15N labeled, bypassing the unlabeled production altogether.

15N─13C double labeling is the technique of choice for the study of medium‐sized protein samples. Standard sets of 2D, 3D, and possibly 4D NMR experiments allow the assignment of all the active atoms of the backbone (excluding oxygen) in a straightforward manner, potentially amenable to automation. The fact that magnetization transfers do not occur between hydrogens, but involve covalently connected heavy atoms, ensures strong transfers that are less affected by the linewidth and thus by the size of the system. The assignment provides the chemical shift values for all the observed atoms in the protein. Departure of these values from the mean tabulated RC values provides strong information on the local secondary structure for each amino acid and can be readily used as a rough estimate of the fold of the protein. If a homologous structure is already known (from the PDB, for instance), it can be used as a template to compute, with quite strong confidence, a homology structure. Going from this partial information to a detailed 3D structure requires additional information, which has to be brought by additional experiments, such as 1H─1H distances extracted from NOESY spectra or RDCs obtained from experiments in partially aligned liquid crystals.

For larger systems, NMR lines broaden because of spin relaxation, and this standard approach can no longer be used. Higher‐level labeling has to be considered, in which most of the hydrogens (the major relaxing spins) are replaced by deuterium. This leads to 2H─15N─13C triple labeling. However, hydrogens are needed for the NMR measurement and thus have to be reintroduced in the final sample. This can easily be done for the amide hydrogens, which are exchangeable with the solvent, by simply dissolving the protein in protonated water. However, this approach is limited, and usually one has to resort to the complex preparation protocols presented above.
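The use of chemical shift departures from tabulated RC values, mentioned above for doubly labeled samples, can be sketched as follows. The snippet computes the secondary shift (observed minus random coil) of Cα atoms and applies a simple sign rule: positive Cα deviations are typical of helices and negative ones of β‐strands. The 0.7 ppm cutoff, the function names, and the numbers in the usage example are illustrative assumptions rather than values taken from this chapter; a real analysis would use tabulated RC shifts for each residue type and consider several nuclei over consecutive residues.

```python
from typing import Dict


def secondary_shifts(observed: Dict[str, float],
                     random_coil: Dict[str, float]) -> Dict[str, float]:
    """Observed minus random-coil shift (ppm) for residues present in both maps."""
    return {res: observed[res] - random_coil[res]
            for res in observed if res in random_coil}


def classify_ca(delta_ppm: float, cutoff: float = 0.7) -> str:
    """Rough per-residue label from a C-alpha secondary shift.

    cutoff is an assumed threshold in ppm, used only for illustration.
    """
    if delta_ppm >= cutoff:
        return "helix-like"
    if delta_ppm <= -cutoff:
        return "strand-like"
    return "coil-like"


if __name__ == "__main__":
    # Hypothetical C-alpha shifts in ppm, keyed by residue label.
    observed = {"A12": 55.1, "V13": 60.4, "G14": 45.3}
    random_coil = {"A12": 52.5, "V13": 62.3, "G14": 45.1}
    for res, delta in secondary_shifts(observed, random_coil).items():
        print(f"{res}: {delta:+.1f} ppm -> {classify_ca(delta)}")
```

A single residue above or below the cutoff carries little weight on its own; it is the run of consistent deviations along the sequence that gives the rough estimate of the fold described in the text.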

13.9 Conclusion

NMR plays a particular role in biology. Indeed, more than just a method providing structural information, NMR offers a complete and rather complex view of the molecular system under study, including its structural and dynamic aspects. The latter have gained importance in recent years, since it is now widely accepted that molecular structures only partially account for the functional properties of molecular systems and that molecular dynamics should also be considered. This evolution was made possible by the development of numerous spectroscopic methods enabling the observation of a molecular system’s dynamics at various timescales. Today, NMR spectroscopy offers a very large toolbox dedicated to the study of biomolecular systems. The available experiments can be quick or lengthy, partial or comprehensive, cheap or expensive, and very easy to run or requiring a complex setup. Whether you are interested in the detailed atomic structure of a protein–ligand interface, in the analysis of the dynamics of a protein–protein complex, or simply in sample quality assessment, NMR will be of invaluable help. However, the variety of available experiments and their complexity may restrict its use. Experimental setups have to be carefully designed to properly address the specific questions raised by a particular system. It was our aim to facilitate this process, and we hope we have met this ambitious objective.

References

1 Kühlbrandt, W. (2014). Biochemistry. The resolution revolution. Science 343: 1443–1444.
2 Wüthrich, K. (2003). NMR studies of structure and function of biological macromolecules (Nobel Lecture). J. Biomol. NMR 27: 13–39.
3 Barrett, P.J., Chen, J., Cho, M.K. et al. (2013). The quiet renaissance of protein nuclear magnetic resonance. Biochemistry 52: 1303–1320.
4 Kay, L.E. (2016). New views of functionally dynamic proteins by solution NMR spectroscopy. J. Mol. Biol. 428: 323–331.
5 Shen, Y., Delaglio, F., Cornilescu, G., and Bax, A. (2009). TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts. J. Biomol. NMR 44: 213–223.
6 Wishart, D.S. (2011). Interpreting protein chemical shift data. Prog. Nucl. Magn. Reson. Spectrosc. 58: 62–87.
7 Doreleijers, J.F., Vranken, W.F., Schulte, C. et al. (2009). The NMR restraints grid at BMRB for 5,266 protein and nucleic acid PDB entries. J. Biomol. NMR 45: 389–396.
8 Han, B., Liu, Y., Ginzinger, S.W., and Wishart, D.S. (2011). SHIFTX2: significantly improved protein chemical shift prediction. J. Biomol. NMR 50: 43–57.
9 Shen, Y. and Bax, A. (2010). SPARTA+: a modest improvement in empirical NMR chemical shift prediction by means of an artificial neural network. J. Biomol. NMR 48: 13–22.
10 Lange, O.F., Rossi, P., Sgourakis, N.G. et al. (2012). Determination of solution structures of proteins up to 40 kDa using CS‐Rosetta with sparse NMR data from deuterated samples. Proc. Natl. Acad. Sci. U. S. A. 109: 10873–10878.
11 Zweckstetter, M. (2008). NMR: prediction of molecular alignment from structure using the PALES software. Nat. Protoc. 3: 679–690.
12 Schmidt, C., Irausquin, S.J., and Valafar, H. (2013). Advances in the REDCAT software package. BMC Bioinformatics 14: 302.
13 Morin, A., Eisenbraun, B., Key, J. et al. (2013). Collaboration gets the most out of software. Elife 2: e01456.
14 Wassenaar, T.A., van Dijk, M., Loureiro‐Ferreira, N. et al. (2012). WeNMR: structural biology on the grid. J. Grid Comput. 10: 743–767.

15 Maciejewski, M.W., Schuyler, A.D., Gryk, M.R. et al. (2017). NMRbox: a resource for biomolecular NMR computation. Biophys. J. 112: 1529–1534.
16 Cavalli, A., Salvatella, X., Dobson, C.M., and Vendruscolo, M. (2007). Protein structure determination from NMR chemical shifts. Proc. Natl. Acad. Sci. U. S. A. 104: 9615–9620.
17 Shen, Y., Lange, O., Delaglio, F. et al. (2008). Consistent blind protein structure generation from NMR chemical shift data. Proc. Natl. Acad. Sci. U. S. A. 105: 4685–4690.
18 Tugarinov, V. and Kay, L.E. (2005). Quantitative 13C and 2H NMR relaxation studies of the 723‐residue enzyme malate synthase G reveal a dynamic binding interface. Biochemistry 44: 15970–15977.
19 Xu, Y., Long, D., and Yang, D. (2007). Rapid data collection for protein structure determination by NMR spectroscopy. J. Am. Chem. Soc. 129: 7722–7723.
20 Vuister, G.W., Tessari, M., Karimi‐Nejad, Y., and Whitehead, B. (1999). Pulse sequences for measuring coupling constants. In: Biological Magnetic Resonance, vol. 16 (ed. R. Krishna and L. Berliner), 195–257. Dordrecht: Kluwer.
21 Tolman, J.R., Flanagan, J.M., Kennedy, M.A., and Prestegard, J.H. (1995). Nuclear magnetic dipole interactions in field‐oriented proteins: information for structure determination in solution. Proc. Natl. Acad. Sci. U. S. A. 92: 9279–9283.
22 Tjandra, N. and Bax, A. (1997). Direct measurement of distances and angles in biomolecules by NMR in a dilute liquid crystalline medium. Science 278: 1111–1114.
23 Li, G.W., Liu, H., Qiu, F. et al. (2018). Residual dipolar couplings in structure determination of natural products. Nat. Prod. Bioprospect. 8: 279–295.
24 Bax, A. and Grishaev, A. (2005). Weak alignment NMR: a hawk‐eyed view of biomolecular structure. Curr. Opin. Struct. Biol. 15: 563–570.
25 Cornilescu, G., Marquardt, J.L., Ottiger, M., and Bax, A. (1998). Validation of protein structure from anisotropic carbonyl chemical shifts in a dilute liquid crystalline phase. J. Am. Chem. Soc. 120: 6836–6837.
26 Cordier, F. and Grzesiek, S. (1999). Direct observation of hydrogen bonds in proteins by interresidue (3h)J(NC′) scalar couplings. J. Am. Chem. Soc. 121: 1601–1602.


27 Battiste, J.L. and Wagner, G. (2000). Utilization of site‐directed spin labeling and high‐resolution heteronuclear nuclear magnetic resonance for global fold determination of large proteins with limited nuclear Overhauser effect data. Biochemistry 39: 5355–5365.
28 Su, X.C. and Otting, G. (2011). Paramagnetic labelling of proteins and oligonucleotides for NMR (vol. 46, p. 101, 2010). J. Biomol. NMR 50: 99–100.
29 Lindorff‐Larsen, K., Best, R.B., Depristo, M.A. et al. (2005). Simultaneous determination of protein structure and dynamics. Nature 433: 128–132.
30 Clore, G.M., Nilges, M., Brunger, A.T. et al. (1987). A comparison of the restrained molecular dynamics and distance geometry methods for determining three‐dimensional structures of proteins on the basis of interproton distances. FEBS Lett. 213: 269–277.
31 Rieping, W., Habeck, M., and Nilges, M. (2005). Inferential structure determination. Science 309: 303–306.
32 Nilges, M. (1995). Calculation of protein structures with ambiguous distance restraints – automated assignment of ambiguous NOE crosspeaks and disulfide connectivities. J. Mol. Biol. 245: 645–660.
33 Schmidt, E. and Güntert, P. (2015). Automated structure determination from NMR spectra. Methods Mol. Biol. 1261: 303–329.
34 Bardiaux, B., Malliavin, T., and Nilges, M. (2012). ARIA for solution and solid‐state NMR. Methods Mol. Biol. 831: 453–483.
35 Herrmann, T., Güntert, P., and Wüthrich, K. (2002). Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. J. Mol. Biol. 319: 209–227.
36 Montelione, G.T., Nilges, M., Bax, A. et al. (2013). Recommendations of the wwPDB NMR Validation Task Force. Structure 21: 1563–1570.
37 Vuister, G.W., Fogh, R.H., Hendrickx, P.M. et al. (2014). An overview of tools for the validation of protein NMR structures. J. Biomol. NMR 58: 259–285.
38 Linge, J.P., Williams, M.A., Spronk, C.A. et al. (2003). Refinement of protein structures in explicit solvent. Proteins 50: 496–506.
39 Schwieters, C.D., Bermejo, G.A., and Clore, G.M. (2018). Xplor‐NIH for molecular structure determination from NMR and other data sources. Protein Sci. 27: 26–40.
40 Güntert, P. and Buchner, L. (2015). Combined automated NOE assignment and structure calculation with CYANA. J. Biomol. NMR 62: 453–471.
41 Guerry, P., Duong, V.D., and Herrmann, T. (2015). CASD‐NMR 2: robust and accurate unsupervised analysis of raw NOESY spectra and protein structure determination with UNIO. J. Biomol. NMR 62: 473–480.
42 Mareuil, F., Malliavin, T.E., Nilges, M., and Bardiaux, B. (2015). Improved reliability, accuracy and quality in automated NMR structure calculation with ARIA. J. Biomol. NMR 62: 425–438.
43 Dominguez, C., Boelens, R., and Bonvin, A.M. (2003). HADDOCK: a protein‐protein docking approach based on biochemical or biophysical information. J. Am. Chem. Soc. 125: 1731–1737.
44 Hooft, R.W.W., Vriend, G., Sander, C., and Abola, E.E. (1996). Errors in protein structures. Nature 381: 272.
45 Davis, I.W., Leaver‐Fay, A., Chen, V.B. et al. (2007). MolProbity: all‐atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res. 35: W375–W383.
46 Bhattacharya, A., Tejero, R., and Montelione, G.T. (2007). Evaluating protein structures determined by structural genomics consortia. Proteins 66: 778–795.
47 Berjanskii, M., Liang, Y.J., Zhou, J.J. et al. (2010). PROSESS: a protein structure evaluation suite and server. Nucleic Acids Res. 38: W633–W640.
48 Mechelke, M. and Habeck, M. (2014). Bayesian weighting of statistical potentials in NMR structure calculation. PLoS One 9: e100197.
49 Cuniasse, P., Tavares, P., Orlova, E.V., and Zinn‐Justin, S. (2017). Structures of biomolecular complexes by combination of NMR and cryoEM methods. Curr. Opin. Struct. Biol. 43: 104–113.
50 Byeon, I.J., Meng, X., Jung, J. et al. (2009). Structural convergence between Cryo‐EM and NMR reveals intersubunit interactions critical for HIV‐1 capsid function. Cell 139: 780–790.
51 Huang, R., Ripstein, Z.A., Augustyniak, R. et al. (2016). Unfolding the mechanism of the AAA+ unfoldase VAT by a combined cryo‐EM, solution NMR study. Proc. Natl. Acad. Sci. U. S. A. 113: E4190–E4199.
52 Vahidi, S., Ripstein, Z.A., Bonomi, M. et al. (2018). Reversible inhibition of the ClpP protease via an N‐terminal conformational switch. Proc. Natl. Acad. Sci. U. S. A. 115: E6447–E6456.
53 Schwieters, C.D., Suh, J.Y., Grishaev, A. et al. (2010).
Solution structure of the 128 kDa enzyme I dimer from Escherichia coli and its 146 kDa complex with HPr using residual dipolar couplings and small‐ and wide‐angle X‐ray scattering. J. Am. Chem. Soc. 132: 13026–13045. Marion, D. (2013). An introduction to biological NMR spectroscopy. Mol. Cell. Proteomics 12: 3006–3025. Frueh, D.P. (2014). Practical aspects of NMR signal assignment in larger and challenging proteins. Prog. Nucl. Magn. Reson. Spectrosc. 78: 47–75. Hiroaki, H. (2013). Recent applications of isotopic labeling for protein NMR in drug discovery. Expert Opin. Drug Discov. 8: 523–536.

References

57 Kazimierczuk, K., Stanek, J., Zawadzka‐Kazimierczuk,

58

59

60

61

62

63 64

65

66

67

68

69

70

71

A., and Kozminski, W. (2013). High‐dimensional NMR spectra for structural studies of biomolecules. Chemphyschem 14: 3015–3025. Amero, C., Asuncion Dura, M., Noirclerc‐Savoye, M. et al. (2011). A systematic mutagenesis‐driven strategy for site‐resolved NMR studies of supramolecular assemblies. J. Biomol. NMR 50: 229–236. Yokoyama, S. (2003). Protein expression systems for structural genomics and proteomics. Curr. Opin. Chem. Biol. 7: 39–43. Takeda, M. and Kainosho, M. (2012). Cell‐free protein production for NMR studies. Methods Mol. Biol. 831: 71–84. Hoffmann, B., Lohr, F., Laguerre, A. et al. (2018). Protein labeling strategies for liquid‐state NMR spectroscopy using cell‐free synthesis. Prog. Nucl. Magn. Reson. Spectrosc. 105: 1–22. Urbanek, A., Morato, A., Allemand, F. et al. (2018). A general strategy to access structural information at atomic resolution in polyglutamine homorepeats. Angew. Chem. Int. Ed. Engl. 57: 3598–3601. Gossert, A.D. and Jahnke, W. (2012). Isotope labeling in insect cells. Adv. Exp. Med. Biol. 992: 179–196. Sastry, M., Bewley, C.A., and Kwong, P.D. (2012). Mammalian expression of isotopically labeled proteins for NMR spectroscopy. Adv. Exp. Med. Biol. 992: 197–211. Sitarska, A., Skora, L., Klopp, J. et al. (2015). Affordable uniform isotope labeling with (2)H, (13)C and (15)N in insect cells. J. Biomol. NMR 62: 191–197. Meola, A., Deville, C., Jeffers, S.A. et al. (2014). Robust and low cost uniform (15)N‐labeling of proteins expressed in Drosophila S2 cells and Spodoptera frugiperda Sf9 cells for NMR applications. J. Struct. Biol. 188: 71–78. Franke, B., Opitz, C., Isogai, S. et al. (2018). Production of isotope‐labeled proteins in insect cells for NMR. J. Biomol. NMR 71: 173–184. Gardner, K.H. and Kay, L.E. (1998). The use of 2H, 13C, 15 N multidimensional NMR to study the structure and dynamics of proteins. Annu. Rev. Biophys. Biomol. Struct. 27: 357–406. Tugarinov, V., Hwang, P.M., and Kay, L.E. (2004). 
Nuclear magnetic resonance spectroscopy of high‐ molecular‐weight proteins. Annu. Rev. Biochem. 73: 107–146. Kay, L.E. (2011). Solution NMR spectroscopy of supra‐molecular systems, why bother? A methyl‐ TROSY view. J. Magn. Reson. 210: 159–170. Kerfah, R., Plevin, M.J., Sounier, R. et al. (2015). Methyl‐specific isotopic labeling: a molecular tool box for solution NMR studies of large proteins. Curr. Opin. Struct. Biol. 32: 113–122.

72 Kainosho, M., Torizawa, T., Iwashita, Y. et al. (2006).

73

74

75

76

77

78

79

80 81

82

83

84

85

86

Optimal isotope labelling for NMR protein structure determinations. Nature 440: 52–57. Kainosho, M., Miyanoiri, Y., Terauchi, T., and Takeda, M. (2018). Perspective: next generation isotope‐aided methods for protein NMR spectroscopy. J. Biomol. NMR 71: 119–127. Rasia, R.M., Brutscher, B., and Plevin, M.J. (2012). Selective isotopic unlabeling of proteins using metabolic precursors: application to NMR assignment of intrinsically disordered proteins. Chembiochem 13: 732–739. Jaipuria, G., Krishnarjuna, B., Mondal, S. et al. (2012). Amino acid selective labeling and unlabeling for protein resonance assignments. Adv. Exp. Med. Biol. 992: 95–118. Xue, J., Burz, D.S., and Shekhtman, A. (2012). Segmental labeling to study multidomain proteins. Adv. Exp. Med. Biol. 992: 17–33. Mikula, K.M., Krumwiede, L., Pluckthun, A., and Iwai, H. (2018). Segmental isotopic labeling by asparaginyl endopeptidase‐mediated protein ligation. J. Biomol. NMR 71: 225–235. Gobl, C., Madl, T., Simon, B., and Sattler, M. (2014). NMR approaches for structural analysis of multidomain proteins and complexes in solution. Prog. Nucl. Magn. Reson. Spectrosc. 80: 26–63. Kitevski‐LeBlanc, J.L. and Prosser, R.S. (2012). Current applications of 19F NMR to studies of protein structure and dynamics. Prog. Nucl. Magn. Reson. Spectrosc. 62: 1–33. Wright, P.E. and Dyson, H.J. (2009). Linking folding and binding. Curr. Opin. Struct. Biol. 19: 31–38. Tzeng, S.‐R. and Kalodimos, C.G. (2011). Protein dynamics and allostery: an NMR view. Curr. Opin. Struct. Biol. 21: 62–67. Bah, A. and Forman‐Kay, J.D. (2016). Modulation of intrinsically disordered protein function by post‐ translational modifications. J. Biol. Chem. 291: 6696–6705. Barbato, G., Ikura, M., Kay, L.E. et al. (1992). Backbone dynamics of calmodulin studied by 15N relaxation using inverse detected two‐dimensional NMR spectroscopy: the central helix is flexible. Biochemistry 31: 5269–5278. Kieffer, B. and Atkinson, R. (2004). 
The role of protein motions in molecular recognition: insights from heteronuclear NMR relaxation measurements. Prog. Nucl. Magn. Reson. Spectrosc. 44: 141–187. Lipari, G. and Szabo, A. (1982). Model‐free approach to the interpretation of nuclear magnetic resonance relaxation in macromolecules. 1. Theory and range of validity. J. Am. Chem. Soc. 100: 4546–4559. Tjandra, N., Wingfield, P., Stahl, S., and Bax, A. (1996). Anisotropic rotational diffusion of perdeuterated HIV protease from 15N NMR relaxation measurements at two magnetic fields. J. Biomol. NMR 8: 273–284.

321

322

13 Determination of Protein Structure and Dynamics by NMR

87 Sheppard, D., Sprangers, R., and Tugarinov, V. (2010).

Experimental approaches for NMR studies of side‐ chain dynamics in high‐molecular‐weight proteins. Prog. Nucl. Magn. Reson. Spectrosc. 56: 1–45. 88 Yang, D.W. and Kay, L.E. (1996). Contributions to conformational entropy arising from bond vector fluctuations measured from NMR‐derived order parameters: application to protein folding. J. Mol. Biol. 263: 369–382. 89 Akke, M. (2012). Conformational dynamics and thermodynamics of protein‐ligand binding studied by NMR relaxation. Biochem. Soc. Trans. 40: 419–423. 90 Wand, A.J., Moorman, V.R., and Harpole, K.W. (2013). A surprising role for conformational entropy in protein function. Top. Curr. Chem. 337: 69–94. 91 Stockmann, H., Bronowska, A., Syme, N.R. et al. (2008). Residual ligand entropy in the binding of p‐substituted benzenesulfonamide ligands to bovine carbonic anhydrase II. J. Am. Chem. Soc. 130: 12420–12426. 92 Tzeng, S.R. and Kalodimos, C.G. (2012). Protein activity regulation by conformational entropy. Nature 488: 236–240. 93 Palmer, A.G. 3rd. (2014). Chemical exchange in biomacromolecules: past, present, and future. J. Magn. Reson. 241: 3–17. 94 Vallurupalli, P., Bouvignies, G., and Kay, L.E. (2012). Studying “invisible” excited protein states in slow exchange with a major state conformation. J. Am. Chem. Soc. 134: 8148–8161. 95 Fraser, J.S., Clarkson, M.W., Degnan, S.C. et al. (2009). Hidden alternative structures of proline isomerase essential for catalysis. Nature 462: 669–673. 96 Bhabha, G., Lee, J., Ekiert, D.C. et al. (2011). A dynamic knockout reveals that conformational fluctuations influence the chemical step of enzyme catalysis. Science 332: 234–238. 97 Otten, R., Liu, L., Kenner, L.R. et al. (2018). Rescue of conformational dynamics in enzyme catalysis by directed evolution. Nat. Commun. 9: 1314. 98 Carroll, M.J., Mauldin, R.V., Gromova, A.V. et al. (2012). Evidence for dynamics in proteins as a mechanism for ligand dissociation. Nat. Chem. Biol. 8: 246–252. 99 Meinhold, D.W. and Wright, P.E. 
(2011). Measurement of protein unfolding/refolding kinetics and structural characterization of hidden intermediates by NMR relaxation dispersion. Proc. Natl. Acad. Sci. U. S. A. 108: 9078–9083. 100 Bruschweiler, S., Schanda, P., Kloiber, K. et al. (2009). Direct observation of the dynamic process underlying allosteric signal transmission. J. Am. Chem. Soc. 131: 3063–3068.

101 Anthis, N.J. and Clore, G.M. (2015). Visualizing

102

103

104

105

106

107

108

109

110

111

112

113

114

transient dark states by NMR spectroscopy. Q. Rev. Biophys. 48: 35–116. Bouvignies, G., Markwick, P.R., and Blackledge, M. (2007). Simultaneous definition of high resolution protein structure and backbone conformational dynamics using NMR residual dipolar couplings. Chemphyschem 8: 1901–1909. Ban, D., Sabo, T.M., Griesinger, C., and Lee, D. (2013). Measuring dynamic and kinetic information in the previously inaccessible supra‐tau(c) window of nanoseconds to microseconds by solution NMR spectroscopy. Molecules 18: 11904–11937. Gal, M., Schanda, P., Brutscher, B., and Frydman, L. (2007). UltraSOFAST HMQC NMR and the repetitive acquisition of 2D protein spectra at Hz rates. J. Am. Chem. Soc. 129: 1372–1377. Amero, C., Schanda, P., Dura, M.A. et al. (2009). Fast two‐dimensional NMR spectroscopy of high molecular weight protein assemblies. J. Am. Chem. Soc. 131: 3448–3449. Habchi, J., Tompa, P., Longhi, S., and Uversky, V.N. (2014). Introducing protein intrinsic disorder. Chem. Rev. 114: 6561–6588. Zhou, J., Zhao, S., and Dunker, A.K. (2018). Intrinsically disordered proteins link alternative splicing and post‐ translational modifications to complex cell signaling and regulation. J. Mol. Biol. 430: 2342–2359. Uversky, V.N. (2002). Natively unfolded proteins: a point where biology waits for physics. Protein Sci. 11: 739–756. Jensen, M.R., Zweckstetter, M., Huang, J.R., and Blackledge, M. (2014). Exploring free‐energy landscapes of intrinsically disordered proteins at atomic resolution using NMR spectroscopy. Chem. Rev. 114: 6632–6660. Narayanan, R.L., Durr, U.H., Bibow, S. et al. (2010). Automatic assignment of the intrinsically disordered protein Tau with 441‐residues. J. Am. Chem. Soc. 132: 11906–11907. Zerko, S., Byrski, P., Wlodarczyk‐Pruszynski, P. et al. (2016). Five and four dimensional experiments for robust backbone resonance assignment of large intrinsically disordered proteins: application to Tau3x protein. J. Biomol. NMR 65: 193–203. Felli, I.C. 
and Pierattelli, R. (2014). Novel methods based on (13)C detection to study intrinsically disordered proteins. J. Magn. Reson. 241: 115–125. Smet, C., Leroy, A., Sillen, A. et al. (2004). Accepting its random coil nature allows a partial NMR assignment of the neuronal Tau protein. Chembiochem 5: 1639–1646. Nielsen, J.T. and Mulder, F.A.A. (2018). POTENCI: prediction of temperature, neighbor and pH‐corrected

References

115

116

117

118

119

120

121

122

123

124 125

126

127

128

chemical shifts for intrinsically disordered proteins. J. Biomol. NMR 70: 141–165. Bernado, P., Blanchard, L., Timmins, P. et al. (2005). A structural model for unfolded proteins from residual dipolar couplings and small‐angle x‐ray scattering. Proc. Natl. Acad. Sci. U. S. A. 102: 17002–17007. Schneider, R., Maurin, D., Communie, G. et al. (2015). Visualizing the molecular recognition trajectory of an intrinsically disordered protein using multinuclear relaxation dispersion NMR. J. Am. Chem. Soc. 137: 1220–1229. Fuxreiter, M., Toth‐Petroczy, A., Kraut, D.A. et al. (2014). Disordered proteinaceous machines. Chem. Rev. 114: 6806–6843. Neduva, V. and Russell, R.B. (2005). Linear motifs: evolutionary interaction switches. FEBS Lett. 579: 3342–3345. Oldfield, C.J., Cheng, Y., Cortese, M.S. et al. (2005). Coupled folding and binding with alpha‐helix‐forming molecular recognition elements. Biochemistry 44: 12454–12470. Vassilev, L.T., Vu, B.T., Graves, B. et al. (2004). In vivo activation of the p53 pathway by small‐molecule antagonists of MDM2. Science 303: 844–848. Lamarre, D., Anderson, P.C., Bailey, M. et al. (2003). An NS3 protease inhibitor with antiviral effects in humans infected with hepatitis C virus. Nature 426: 186–189. Habenstein, B. and Loquet, A. (2016). Solid‐state NMR: an emerging technique in structural biology of self‐assemblies. Biophys. Chem. 210: 14–26. Kaplan, M., Pinto, C., Houben, K., and Baldus, M. (2016). Nuclear magnetic resonance (NMR) applied to membrane‐protein complexes. Q. Rev. Biophys. 49: e15. Bai, S., Wang, W., and Dybowski, C. (2010). Solid state NMR spectroscopy. Anal. Chem. 82: 4917–4924. McDermott, A. (2009). Structure and dynamics of membrane proteins by magic angle spinning solid‐ state NMR. Annu. Rev. Biophys. 38 (1): 385–403. Stanek, J., Andreas, L.B., Jaudzems, K. et al. (2016). 
NMR spectroscopic assignment of backbone and side‐chain protons in fully protonated proteins: microcrystals, sedimented assemblies, and amyloid fibrils. Angew. Chem. Int. Ed. Engl. 55: 15504–15509. Loquet, A., Habenstein, B., and Lange, A. (2013). Structural investigations of molecular machines by solid‐state NMR. Acc. Chem. Res. 46: 2070–2079. Tycko, R. and Wickner, R.B. (2013). Molecular structures of amyloid and prion fibrils: consensus versus controversy. Acc. Chem. Res. 46: 1487–1496.

129 Judge, P., Taylor, G., Dannatt, H.W., and Watts, A.

130

131

132

133

134

135

136

137 138

139

140

141

(2015). Solid‐state nuclear magnetic resonance spectroscopy for membrane protein structure determination. Methods Mol. Biol. 1261: 331–347. Marchanka, A., Simon, B., and Carlomagno, T. (2013). A suite of solid‐state NMR experiments for RNA intranucleotide resonance assignment in a 21 kDa protein–RNA complex. Angew. Chem. 125: 10180–10185. Viéville, J.M.P., Charbonnier, S., Eberling, P. et al. (2014). A new NMR technique to probe protein‐ ligand interaction. J. Pharm. Biomed. Anal. 89: 18–23. Habenstein, B., Loquet, A., Hwang, S. et al. (2015). Hybrid structure of the type 1 pilus of uropathogenic Escherichia coli. Angew. Chem. Int. Ed. Engl. 54: 11691–11695. Wang, S. and Ladizhansky, V. (2014). Recent advances in magic angle spinning solid state NMR of membrane proteins. Prog. Nucl. Magn. Reson. Spectrosc. 82: 1–26. Bechinger, B., Resende, J.M., and Aisenbrey, C. (2011). The structural and topological analysis of membrane‐ associated polypeptides by oriented solid‐state NMR spectroscopy: established concepts and novel developments. Biophys. Chem. 153: 115–125. Marcotte, I. and Auger, M. (2005). Bicelles as model membranes for solid‐ and solution‐state NMR studies of membrane peptides and proteins. Concepts Magn. Reson. A 24A: 17–37. Hagn, F., Nasr, M.L., and Wagner, G. (2018). Assembly of phospholipid nanodiscs of controlled size for structural studies of membrane proteins by NMR. Nat. Protoc. 13: 79–98. Baldus, M. (2018). GPCR: lock and key become flexible. Nat. Chem. Biol. 14: 201–202. Goncalves, J.A., Ahuja, S., Erfani, S. et al. (2010). Structure and function of G protein‐coupled receptors using NMR spectroscopy. Prog. Nucl. Magn. Reson. Spectrosc. 57: 159–180. Felli, I.C. and Brutscher, B. (2009). Recent advances in solution NMR: fast methods and heteronuclear direct detection. Chemphyschem 10: 1356–1368. Quinternet, M., Starck, J.P., Delsuc, M.A., and Kieffer, B. (2013). Heteronuclear NMR provides an accurate assessment of therapeutic insulin’s quality. J. 
Pharm. Biomed. Anal. 78–79: 252–254. Quinternet, M., Starck, J.P., Delsuc, M.A., and Kieffer, B. (2012). Unraveling complex small‐molecule binding mechanisms by using simple NMR spectroscopy. Chemistry 18: 3969–3974.

323

325

14 NMR Studies of Protein–Small Molecule Interactions for Drug Discovery

Daniel F. Wyss1 and Edward R. Zartler2

1 Screening, Target and Compound Profiling, Merck Research Laboratories, Kenilworth, NJ, USA
2 Quantum Tessera Consulting, Collegeville, PA, USA

14.1 Introduction

The study of small molecule (low‐molecular‐mass [MM] compound)–target (mostly proteins) interactions is central to small‐molecule drug discovery. Biomolecular NMR (bioNMR) provides a wide array of approaches to study such interactions, ranging from functional readouts to biophysical measurements to atomic‐level structural models.

The earliest step in drug discovery is the validation of the target as druggable or, more appropriately, ligandable. Ligandability is defined as the ability of a target to bind exogenous ligands. Validation is followed by screening the target against a library of compounds. Confirmation of the active compounds leads to the development of structure–activity relationships (SAR), and the compound enters the “hit‐to‐lead” (H2L) stage. At this point, the ligand–target interaction is extensively characterized. The key to impact at the early H2L stage is rapid, easy‐to‐interpret data that inform medicinal chemistry (medchem) decisions. In the later phase of H2L, and on entering lead selection, the data become increasingly atomic in resolution. Often, full structure calculations are not performed using NMR data; instead, atomic‐level interactions are determined and then used to either filter or score docking poses.

During the course of a drug discovery program (Figure 14.1), the affinity of a small‐molecule ligand for a drug target typically increases from weak affinity (Kd > μM) for validated screening hits to high affinity (Kd < μM). NMR is a major tool for revealing the physical mechanisms underlying molecular recognition. As such, it is currently used at various stages of the drug discovery process: target ligandability assessment, hit identification by compound (and in particular, fragment) screening, hit validation, pharmacophore identification, H2L optimization, and lead optimization (LO).
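The idea of using NMR‐derived atomic‐level interactions to filter or score docking poses, mentioned above, can be illustrated with a minimal sketch. It is not taken from the chapter: the function names, the 5 Å cutoff, the 0.75 acceptance threshold, the residue labels, and all coordinates are hypothetical, and the score (the fraction of NMR‐implicated residues that lie close to the ligand in a given pose) is only one simple choice among many.

```python
import math


def min_distance(ligand_atoms, residue_atoms):
    """Smallest distance (in angstroms) between any ligand atom and any residue atom."""
    return min(math.dist(a, b) for a in ligand_atoms for b in residue_atoms)


def score_pose(ligand_atoms, protein, nmr_contact_residues, cutoff=5.0):
    """Fraction of NMR-implicated residues within `cutoff` angstroms of the ligand.

    `cutoff` is an assumed value chosen for illustration only.
    """
    if not nmr_contact_residues:
        raise ValueError("need at least one NMR-derived contact residue")
    satisfied = sum(
        1
        for res in nmr_contact_residues
        if min_distance(ligand_atoms, protein[res]) <= cutoff
    )
    return satisfied / len(nmr_contact_residues)


def filter_poses(poses, protein, nmr_contact_residues, min_score=0.75, cutoff=5.0):
    """Keep poses whose score reaches `min_score` (assumed threshold), best first."""
    scored = [
        (name, score_pose(atoms, protein, nmr_contact_residues, cutoff))
        for name, atoms in poses.items()
    ]
    kept = [(name, score) for name, score in scored if score >= min_score]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one representative atom per residue, coordinates in angstroms.
protein = {
    "RES_A": [(0.0, 0.0, 0.0)],
    "RES_B": [(4.0, 0.0, 0.0)],
    "RES_C": [(0.0, 4.0, 0.0)],
    "RES_D": [(20.0, 20.0, 20.0)],
}
# Residues assumed to show ligand-induced NMR effects (for example, shift perturbations).
nmr_contact_residues = ["RES_A", "RES_B", "RES_C"]
poses = {
    "pose_1": [(1.5, 1.5, 0.0), (2.0, 1.0, 1.0)],  # sits among RES_A, RES_B and RES_C
    "pose_2": [(18.0, 19.0, 20.0)],                # sits near RES_D only
}

ranked = filter_poses(poses, protein, nmr_contact_residues)
print(ranked)
```

In this toy case only `pose_1` satisfies the assumed NMR contacts and survives the filter. In practice the same score could be combined with a docking program's own scoring function rather than used as a hard filter.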
In the past decade, significant instrumental and methodological improvements have been made, resulting in an overall gain in sensitivity and opening new perspectives for drug

Structural Biology in Drug Discovery: Methods, Techniques, and Practices, First Edition. Edited by Jean‐Paul Renaud. © 2020 John Wiley & Sons, Inc. Published 2020 by John Wiley & Sons, Inc.


Figure 14.1 NMR and the drug discovery path. Low‐resolution to high‐resolution NMR experiments can be applied at all stages. [Figure not reproduced; stage labels: Target val., Screening/hit expan., H2L, LO and beyond; experiment labels: Low res, High res.]

Figure 14.2 Target‐ and ligand‐based NMR experiments.

BioNMR structural Ligand-detected

Target-detected 13C,15N-labeled protein Atomic level – complex (targets