Computer Simulations of Aggregation of Proteins and Peptides (Methods in Molecular Biology, 2340) 1071615459, 9781071615454

This volume provides computational methods and reviews various aspects of computational studies of protein aggregation.

English Pages 491 [483] Year 2022

Table of contents:
Preface
Contents
Contributors
Chapter 1: Bioinformatics Methods in Predicting Amyloid Propensity of Peptides and Proteins
1 Introduction
2 Amyloid Databases for the Development of New Bioinformatics Predictors
2.1 AMYPdb
2.2 Waltz-DB and Waltz-DB 2.0
2.3 AmyLoad
2.4 CPAD
2.5 AmyPro
2.6 ZipperDB
2.7 AmyCo
3 Methods for Predicting Amyloid Sequences
3.1 Tango
3.2 3D Profiles
3.3 PASTA and PASTA 2.0
3.4 Aggrescan
3.5 Aggrescan3D
3.6 NetCSSP
3.7 Pafig
3.8 AmylPred and AmylPred2
3.9 FoldAmyloid
3.10 Waltz
3.11 AmyloidMutants
3.12 MetAmyl
3.13 FISH Amyloid
3.14 ArchCandy
3.15 BetaSerpentine
3.16 Fibrilizer
3.17 APPNN
3.18 AmyloGram
3.19 AggScore
3.20 RFAmyloid
3.21 AgMata
3.22 PATH
4 Summary
References
Chapter 2: Protocols for Rational Design of Protein Solubility and Aggregation Properties Using Aggrescan3D Standalone
1 Introduction
2 A3D Standalone Package
2.1 Installation Instructions and Requirements
2.2 A3D Input Commands
2.3 Mutation Commands
2.4 Dynamic Mode Commands
2.5 Output Commands
3 Examples
3.1 Basic Run in Static Mode
3.2 Automated Mutations Run
3.3 Dynamic Mode Run
3.4 Managing Projects
4 Case Studies
4.1 Rational Design of Soluble Variants of the Green Fluorescent Protein
4.2 Rational Design of Soluble Variants of a Single-Domain VH Antibody
4.3 Rational Design of the Fab Domain of the Therapeutic Monoclonal Antibody
5 Notes
References
Chapter 3: Using Surface Hydrophobicity Together with Empirical Potentials to Identify Protein-Protein Binding Sites: Applicat...
1 Introduction
2 Hydrophobicity
3 Cadherins
4 Cadherin Interactions
5 Method for Evaluating Various Interacting Conformations Between Two Rigid Protein Structures
6 Some Simple Questions About Protein Aggregation
References
Chapter 4: Computational Models for the Study of Protein Aggregation
1 Introduction
2 Application of All-Atom Models to Study Protein Aggregation
2.1 Typical FFs Used for Studies of Systems Involving Peptides or Proteins
2.2 Applications of All-Atom Models
3 Coarse-Grained Models
3.1 Typical FFs Used for Studies of Systems Involving Peptides or Proteins
3.2 Applications of General-Purpose Coarse-Grained FFs
3.3 OPEP Coarse-Grained FF
3.4 UNRES Coarse-Grained FF
4 Other Off-Lattice Coarse-Grained FFs
4.1 The Vacha-Frenkel Model
4.2 The Barz-Urbanc Model
4.3 The Hoang-Trovato-Seno-Banavar-Maritan Model
4.4 The Mioduszewski-Cieplak Model
4.5 The Ilie-den Otter-Briels Model
4.6 The Pellarin-Caflisch Model
4.7 The Bellesia-Shea Model
5 On-Lattice Models for the Study of Protein Aggregation
5.1 The Irbäck-Jónsson-Linnemann-Linse-Wallin Model
5.2 The Zhang-Muthukumar Model
5.3 The Abeln-Vendruscolo-Dobson-Frenkel Model
5.4 The Li-Klimov-Straub-Thirumalai Model
6 Conclusions
References
Chapter 5: Probing Protein Aggregation Using the Coarse-Grained UNRES Force Field
1 Introduction
2 Methods
2.1 UNRES Model and Force Field
2.2 Molecular Dynamics and Replica Exchange Molecular Dynamics with UNRES
2.2.1 Equations of Motion and Their Integration
2.2.2 Temperature Control
2.2.3 Confining the System
2.2.4 Replica Exchange and Multiplexed Replica Exchange Molecular Dynamics
2.3 Principal Component Analysis
3 Software Description and Availability
4 Simulation Protocols
4.1 Simulating the Formation of Peptide Aggregates from Isolated Monomers
4.1.1 Analysis of Aggregation Simulations
4.2 Adding New Monomers to a Pre-existing Template
4.2.1 System Setup
4.2.2 Running the MD Simulations
4.2.3 Analysis of Results
5 Applications
5.1 Important Sites and the Mechanism for Amyloid Fibril Formation
5.2 The Formation of Tau- and Aβ-Peptide-Mixed Aggregates
6 Conclusions
References
Chapter 6: Contact-Based Analysis of Aggregation of Intrinsically Disordered Proteins
1 Introduction
2 Methods
2.1 Self-Assembly of Model Proteins into a Virus Capsid
2.2 Self-Assembly of IDPs in a Coarse-Grained Model
2.3 Self-Assembly of IDPs in an All-Atom Model
References
Chapter 7: Molecular Insights into the Effect of Metals on Amyloid Aggregation
1 Introduction
2 Amylin
2.1 Zn2+ Ion-Binding Sites in Amylin Fibril-Like Oligomers
2.2 The Effect of the Concentrations of Zn2+ Ions on Amylin Aggregation
2.3 The Effect of Zn2+ Ions on the Conformations of Amylin Fibril-Like Oligomers
3 Amyloid β
3.1 Metal-Binding Sites in Aβ Aggregates
3.2 The Effect of Zn2+ Concentrations on Aβ Aggregation and Polymorphism
4 α-Synuclein (AS)
4.1 The Specific Cu2+-Binding Sites in AS Fibrillary Aggregates
4.2 The Effect of Cu2+ Concentrations on Metal-Binding Sites
4.3 The Effect of Cu2+ Concentrations on Conformational Change and Polymorphism
5 Challenges and Future Perspectives
References
Chapter 8: From Quantum Mechanics, Classical Mechanics, and Bioinformatics to Artificial Intelligence Studies in Neurodegenera...
1 Introduction
2 DFT Calculations
3 CPMD Simulations
4 T-REMD Simulations
5 Bioinformatics Analysis of Intrinsic Disorder Predisposition
6 Deep Learning Studies Coupled with DFT Calculations
7 Appendix
References
Chapter 9: Computer Simulations Aimed at Exploring Protein Aggregation and Dissociation
1 Introduction
2 Computer Simulations of Amyloid Protein Aggregation
2.1 Monte Carlo Simulations
2.2 Activated Methods
2.3 MD Simulations
2.4 Guided MD Simulations
2.5 REMD
2.6 Simulated Tempering
3 Non-equilibrium MD Simulations Toward Understanding Amyloid Fibril Dissociation and Virus Dissociation
4 Conclusions
References
Chapter 10: All-Atom Molecular Dynamics Simulation Methods for the Aggregation of Protein and Peptides: Replica Exchange/Permu...
1 Introduction
2 Molecular Dynamics Simulations in Physical Ensembles
2.1 Methods
2.2 Applications to Aggregates of Aβ Peptides
2.2.1 Aggregation of Aβ Fragments at Air-Water Interfaces
2.2.2 Structure of Aβ Amyloid Fibril in Bulk Water
3 Replica-Exchange and Replica-Permutation Molecular Dynamics Method
3.1 Methods
3.1.1 Metropolis Algorithm
3.1.2 Suwa-Todo Algorithm
3.1.3 Replica-Exchange Method
3.1.4 Replica-Permutation Method
3.1.5 Hamiltonian Replica-Permutation Method
3.2 Applications to Aβ Fragments
4 Nonequilibrium Molecular Dynamics Method
4.1 Methods
4.2 Application to Amyloid Fibril Disruption by Supersonic Wave
5 Conclusions
References
Chapter 11: Determination of the Most Stable Packing of Peptides from Ribosomal S1 Protein, Protein Bgl2p, and Aβ peptide in β...
1 Introduction
2 Materials
3 Methods
3.1 Construction of the β-layers
3.2 Construction of the Oligomers
3.3 Method of Molecular Dynamics Simulations
3.4 Molecular Dynamics Simulations of β-layers
3.5 Molecular Dynamics Simulations of Oligomers
4 Notes
References
Chapter 12: Molecular Dynamics Simulations of Protein Aggregation: Protocols for Simulation Setup and Analysis with Markov Sta...
1 Introduction
2 Simulation and Analysis Protocols
2.1 MD Simulations and Basic Analysis
2.1.1 Preparation of the Simulation Box Containing Six Peptides
2.1.2 Creation of Directories for the Different Simulation Steps
2.1.3 Topology Building
2.1.4 Energy Minimization
2.1.5 NVT Equilibration
2.1.6 NPT Equilibration
2.1.7 MD Production Run
2.1.8 MD Analysis: Oligomer Size and Contact Maps
2.2 Markov State Models for the Analysis of Protein Aggregation
2.2.1 Feature Selection
2.2.2 Dimension Reduction and Discretization
2.2.3 Construction of the Markov State Model
2.2.4 Hidden Markov Model
2.3 Transition Networks for the Analysis of Protein Aggregation
2.3.1 Running the Transition Network Analysis Tcl Script
2.3.2 The State Attributes
2.3.3 The Transition Matrix
2.3.4 Visualizing the Transition Network
3 Summary
4 Appendix 1: Input Files for the MD Simulation
5 Appendix 2: Python Script for the Calculation of the Oligomerization State and Contact Maps
6 Appendix 3: Tcl Scripts for the Transition Network Analysis
References
Chapter 13: Challenges in Experimental Methods
1 Introduction
2 Peptide Synthesis
3 Experimental Methods for Identifying Amyloids
3.1 Chemical Methods
3.2 Analytical Methods
3.3 High-Resolution Methods
4 Conclusions
References
Chapter 14: Aggregates Sealed by Ions
1 Introduction
2 Ions: Atmosphere and Condensation
2.1 A Short Experimental Appendix: Small-Angle X-ray Scattering
3 Ions: Correlation
4 Metal Binding to Proteins
4.1 A Wider Experimental Appendix: X-ray Absorption Spectroscopy
4.1.1 The Physics of X-ray Absorption
4.2 Using XAS to Study Metal-Protein Complex
4.2.1 Tetanus Neurotoxin
4.2.2 The Aβ Peptide
4.2.3 The Prion Protein
4.3 A Short Experimental Appendix: Double Electron-Electron Resonance
5 Ions: Coordination, Bridges, and Transport
6 Conclusions
References
Chapter 15: Modifying Amyloid Motif Aggregation Through Local Structure
1 Introduction
2 Materials
2.1 Buffers, Dyes, and Solvents
2.2 Peptides
2.3 Evaporating Peptides
2.4 Measurement of Peptide Concentration
2.5 Thioflavin T Fluorescence Aggregation Assay Materials and Instrumentation
2.6 High-Performance Computing Cluster and Software
3 Methods
3.1 Peptide Selection
3.2 Peptide Disaggregation
3.3 Determination of Peptide Concentration
3.4 Thioflavin T Fluorescence Aggregation Assay
3.5 Molecular Dynamics Simulation
3.6 Molecular Dynamics Analysis
4 Notes
References
Chapter 16: Assessing the Stability of Biological Fibrils by Molecular-Scale Simulations
1 Introduction
2 Methodologies for the Characterization of the Mechanical Stability
2.1 The Continuum Approach
2.1.1 Hertzian Mechanics
2.1.2 The Ensemble Tip and Biological Fibrils
2.2 All-Atom MD Approach for the Study of Filament Response to Force
2.2.1 Preparing the Initial Coordinates for All-Atom MD Simulation
2.2.2 Steered All-Atom MD for Assessing Filament Force Response
2.3 Structure-Based Coarse Grained Model for the Nanomechanical Characterization
2.3.1 Contact Maps
2.3.2 Go-MARTINI
3 Notes
3.1 Numerical Simulations and the Breakdown of Continuum
3.2 Bacterial Filaments
3.3 Peptide and β-Amyloid Fibrils
References
Chapter 17: Predictive Modeling of Neurotoxic α-Synuclein Polymorphs
1 Introduction
2 Model Building, Computational Methodology, and Analysis
2.1 Construction, Refinement, and Characterization of Stable Helical αS Monomers
2.1.1 Generating Starting Helical αS Monomer Structures
2.1.2 Relaxation and Characterization of Helical αS Monomers
2.2 Construction and Validation of Helical αS Multimers
2.2.1 Protocol for Construction of Stable Helical αS Tetramers
2.2.2 Testing the Applicability of the Design Protocol
2.2.3 Validation of the Protocol Used to Design αS Multimers
2.2.4 Comparison with NMR Experimental Observables
2.3 Construction of Membrane-Bound αS Tetramers
2.3.1 Broken vs. Extended αS Tetramers
2.3.2 Preparation of Membrane-Bound αS Tetramers
Criteria for Model Selection
Generation of Lipid Bilayers with Different Compositions
Protocol for Relaxation of Tetramer-Membrane Systems
Implications from Simulations of Tetramer-Membrane Interactions
3 Conclusion
References
Chapter 18: Characterization of Amyloidogenic Peptide Aggregability in Helical Subspace
1 Introduction
2 Materials and Methods
2.1 Rationale for Selection of Starting Helical Monomer Models
2.1.1 Role of N-Terminus in Amyloid-β Aggregation
2.1.2 Role of C-Terminus in α-Synuclein Aggregation
2.2 Modeling the Helical Subspace
2.2.1 Employing the Predictive Power of Physical Models
2.2.2 Model Relaxation, Dynamics, and Test for Convergence
Atomistic Equilibrium Molecular Dynamics Simulations
Hamiltonian Replica Exchange Simulations with Solute Scaling
Assessment of Convergence in Simulations
Bootstrapped Block Covariance Overlap Method
Cumulative Averages of Secondary Structure Content
2.2.3 Characterisation of the Degree of Helical Folding/Unfolding
2.2.4 Conformational Distribution of Helical Intermediates
2.3 Identifying the Driving Force Behind Helix-Helix Associations
2.3.1 Mapping the Population Shifts of Partially Folded Helical States
Free Energy Surfaces
Effective Dimension D
2.3.2 Dynamic Cross-Correlation Network Analysis
Generating Network Model
Path Analysis
Strategy for Cross-Correlated Network Analysis of Amyloidogenic Peptides
Mapping the Long-Range Regulation of Partially Folded States by the Termini
2.4 Modeling the Optimized Helical Stabilities That Resist Helical Oligomerization
2.4.1 Mapping the Short-Range Tertiary Interactions
Interaction Maps
Energetic Assessment of Interactions
Computation of Conformational Energies
Computation of Interaction Energies
Relative Contribution of Domain-Domain Interactions To Overall Energies
2.4.2 Predicting the Effect of Terminal Deletions
2.5 Benchmarking Against NMR Observables
2.6 Estimation of Error
3 Conclusion
References
Chapter 19: Exploration of Protein Aggregations in Parkinson’s Disease Through Computational Approaches and Big Data Analytics
1 Introduction
2 Protein Misfolding and PD
3 Structure of α-Synuclein
3.1 Structures of Lewy Bodies
4 Misfolding and Aggregation of α-Synuclein
5 α-Synuclein and Simulation Analysis
6 Membrane Effects in α-Synuclein Aggregation
7 Other Key Players Involved in PD
8 Presenilin 1
9 Presenilin 2
10 PARKN and PINK1
10.1 DJ-1
11 Leucine-Rich Repeat Kinase 2
12 HTRA2
13 PARK2
14 Big Data and PD
15 Predictive Analysis
16 Big Data Approaches to Understand PD
17 Analysis of PD Using Apriori Algorithm
18 Big Data and Its Use in Drug Discovery
19 Management of Clinical Trial and Omics Data Analysis
20 Selection of Chemical Scaffolds from Databases
21 Orphan Drugs and Drug Repurposing
22 Resources Useful for PD Research
22.1 Parkinson’s Disease Research Database (PDRD)
22.2 Accelerating Medicines Partnership: Parkinson’s Disease
22.3 The Parkinson’s Progression Markers Initiative (PPMI)
22.4 ParkDB
22.5 PDbase
22.6 Parkinson’s Disease Data and Organizing Center
22.7 BioGPS and ProADD
22.8 WALTZ-DB 2.0 and CPAD 2.0
23 Conclusion
References
Chapter 20: Final Remarks
Index

Methods in Molecular Biology 2340

Mai Suan Li, Andrzej Kloczkowski, Marek Cieplak, and Maksim Kouza, Editors

Computer Simulations of Aggregation of Proteins and Peptides

METHODS IN MOLECULAR BIOLOGY

Series Editor: John M. Walker, School of Life and Medical Sciences, University of Hertfordshire, Hatfield, Hertfordshire, UK

For further volumes: http://www.springer.com/series/7651

For over 35 years, biological scientists have come to rely on the research protocols and methodologies in the critically acclaimed Methods in Molecular Biology series. The series was the first to introduce the step-by-step protocols approach that has become the standard in all biomedical protocol publishing. Each protocol is provided in readily reproducible step-by-step fashion, opening with an introductory overview and a list of the materials and reagents needed to complete the experiment, followed by a detailed procedure that is supported with a helpful notes section offering tips and tricks of the trade as well as troubleshooting advice. These hallmark features were introduced by series editor Dr. John Walker and constitute the key ingredient in each and every volume of the Methods in Molecular Biology series. Tested and trusted, comprehensive and reliable, all protocols from the series are indexed in PubMed.

Computer Simulations of Aggregation of Proteins and Peptides Edited by

Mai Suan Li, Division of Theoretical Physics, Polish Academy of Sciences, Warsaw, Poland

Andrzej Kloczkowski, Battelle Center for Mathematical Medicine, Nationwide Children’s Hospital, Columbus, OH, USA

Marek Cieplak, Institute of Physics, Polish Academy of Sciences, Warsaw, Poland

Maksim Kouza, Battelle Center for Mathematical Medicine, Nationwide Children’s Hospital, Columbus, OH, USA

ISSN 1064-3745 ISSN 1940-6029 (electronic)
Methods in Molecular Biology
ISBN 978-1-0716-1545-4 ISBN 978-1-0716-1546-1 (eBook)
https://doi.org/10.1007/978-1-0716-1546-1

© Springer Science+Business Media, LLC, part of Springer Nature 2022

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Humana imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer Nature. The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.

Preface

Protein aggregation is a hallmark of a large group of major human diseases, including Alzheimer’s disease, type 2 diabetes, amyotrophic lateral sclerosis, Parkinson’s disease, and (as has recently been found) preeclampsia, among many others. The aim of this volume of “Methods in Molecular Biology” was to cover, as broadly as possible, all computational methods and problems related to protein aggregation and to combine biophysical and biomedical viewpoints. We have assembled a large group of scientists working in this field and tried as much as possible to diversify the group of authors with respect to gender, geography, and age. In addition to many well-known and scientifically established leaders in this field, we invited several highly promising young scientists to write chapters. The authors come from a wide range of countries: the USA, Poland, Russia, Spain, Italy, France, Germany, Japan, Ireland, Turkey, Georgia, Israel, Mexico, Peru, China, India, Vietnam, and Pakistan.

The following chapters contribute to this volume of “Methods in Molecular Biology”:

Małgorzata Kotulska and Jakub W. Wojciechowski reviewed and compared various popular methods used to computationally predict the amyloid propensity of a protein or peptide, such as Tango, 3D Profiles, PASTA and PASTA 2.0, Aggrescan and Aggrescan3D, and many others. Since the accuracy of bioinformatics methods may depend strongly on the reference data used to develop and evaluate the predictors, they also reviewed the most popular amyloid databases used for developing bioinformatics tools for aggregation studies, such as AMYPdb, Waltz-DB and Waltz-DB 2.0, AmyLoad, CPAD, AmyPro, ZipperDB, and AmyCo.
Sebastian Kmiecik, Salvador Ventura, and their colleagues reviewed the architecture and functionalities of the Aggrescan3D (A3D) standalone package for the rational design of protein solubility and aggregation properties based on three-dimensional protein structures. Aggrescan3D is a highly efficient method that these authors had previously implemented as a web server. They presented case studies showing how the A3D standalone tool can be used to improve protein solubility through predicted non-destabilizing amino acid mutations on the protein surface.

Robert L. Jernigan and collaborators discussed the application of empirical (statistical) contact potentials to protein-protein interactions and argued that properly formulated potentials with a properly defined reference state are needed. Since the comparisons are made between different bound structures, the preferred reference state should be defined with respect to a given residue type interacting with an average residue rather than with solvent. They described a two-stage procedure for generating and evaluating interacting protein pairs and applied it to the interactions of E-cadherins.

Mai Suan Li and his collaborators discussed different versions of the four most popular all-atom force fields, AMBER, CHARMM, GROMOS, and OPLS, which are used in simulations of folded and/or intrinsically disordered proteins. They also reviewed both continuous (off-lattice) and discrete (on-lattice) coarse-grained models that have been used to study the kinetics of aggregation.
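
The reference-state argument in Jernigan and collaborators’ chapter can be made concrete with a generic log-odds contact potential. The sketch below is our own illustration, not the chapter’s two-stage procedure: it assumes a symmetric matrix of observed inter-residue contact counts (the input and the function name `log_odds_contact_potential` are hypothetical) and uses as reference the pair frequency expected if each residue type contacted an “average” partner drawn from the overall contact composition, instead of solvent.

```python
import numpy as np


def log_odds_contact_potential(counts, kT=1.0, pseudocount=0.0):
    """Return e_ij = -kT * ln(f_ij / (f_i * f_j)) for a contact count matrix.

    Illustrative sketch only (hypothetical helper, not the chapter's method).
    counts: symmetric (n_types x n_types) array of observed contact counts.
    The reference f_i * f_j is the pair frequency expected if a residue of
    type i contacted an "average" residue drawn from the overall contact
    composition, rather than solvent.
    """
    counts = np.asarray(counts, dtype=float) + pseudocount
    if not np.allclose(counts, counts.T):
        raise ValueError("contact count matrix must be symmetric")
    f_ij = counts / counts.sum()      # observed pair frequencies
    f_i = f_ij.sum(axis=1)            # marginal contact frequency per type
    expected = np.outer(f_i, f_i)     # reference-state pair frequencies
    # Zero counts give +inf energies; pass a positive pseudocount for sparse data.
    with np.errstate(divide="ignore"):
        return -kT * np.log(f_ij / expected)


# Toy counts (not real data) in which like-like contacts are enriched:
# e[0, 0] comes out negative (favorable) and e[0, 1] positive (unfavorable).
e = log_odds_contact_potential([[10, 1], [1, 10]])
print(np.round(e, 3))
```

Because the reference is built from the contact marginals, a count matrix that looks like random mixing of residue types yields energies of zero everywhere, so any nonzero entry measures preference relative to an average partner.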

Adam Liwo, the late Harold A. Scheraga, and their colleagues described the application of the physics-based UNited RESidue (UNRES) force field to the study of protein aggregation, both in template-free simulations and in simulations of aggregation propagating from an existing template (seed). They illustrated their methodology with the examples of Aβ-peptide aggregation and Aβ-peptide-assisted aggregation of peptides derived from the repeat domains of tau.

Marek Cieplak and collaborators wrote a chapter on the contact-based analysis of the aggregation of intrinsically disordered proteins. They reviewed both coarse-grained and all-atom models of aggregation and focused on the aggregation of polyglutamines and polyalanines, as well as on the association of two chains of α-synuclein and of up to 20 chains of a 12-residue-long segment of the protein tau.

Yifat Miller described the important roles of divalent metal ions, which are essential nutrients, in key biological processes and the disruption of their homeostasis with aging. Free metal ions released from brain cells interact with Aβ (in Alzheimer’s disease, AD) and α-synuclein (AS, in Parkinson’s disease, PD) to form metal–amyloid aggregate complexes that promote amyloid aggregation and cause neurodegenerative diseases. The author discussed recent computational studies of the effect of metal ions on amylin, Aβ, and AS.

Vladimir N. Uversky and collaborators described various software packages that can be used to determine the three-dimensional structures of amyloid-β proteins and their smaller bioactive fragments, which are important in the pathogenesis of Alzheimer’s disease. They described density functional theory, Car-Parrinello molecular dynamics simulations, temperature-dependent replica exchange molecular dynamics simulations, bioinformatics-based disorder predictors, and neural-network deep learning, together with their applications to the study of neurodegenerative diseases.

Philippe Derreumaux and Phuong H. Nguyen reviewed current computer simulation methods for studying aggregation, together with atomistic and coarse-grained results aimed at better characterizing early-formed oligomers and amyloid fibril formation. They also discussed applications of nonequilibrium molecular dynamics simulations to better understand the dissociation of protein assemblies.

Satoru G. Itoh and Hisashi Okumura reviewed all-atom molecular dynamics (MD) simulation methods for protein aggregation and the use of physical statistical ensembles, such as the canonical and isothermal-isobaric ensembles. They then reviewed generalized-ensemble algorithms, such as the replica-exchange and replica-permutation MD methods, which were developed to overcome the problem of getting trapped in local-minimum free-energy states, as well as the nonequilibrium MD simulation method.

Oxana V. Galzitskaya and collaborators sought to determine the most stable packing of peptides in β-layers in order to construct an oligomer structure for fibril growth. They built β-layers consisting of eight short peptides with the amino acid sequences IVRGVVVAID, VDSWNVLVAG (VESWNVLVAG), KLVFFAEDVG, and IIGLMVGGVV, corresponding to the amyloidogenic regions of ribosomal S1 protein from E. coli, the glucantransferase Bgl2p from the yeast cell wall, and the Aβ peptide. The amyloidogenic regions were first predicted theoretically and then confirmed experimentally.

Birgit Strodel and collaborators discussed atomistic MD simulations for capturing key steps during the formation of amyloid oligomers. They provided a step-by-step guide for setting up, running, and analyzing MD simulations of aggregating peptides using GROMACS, and explained and provided the tools for deriving Markov state models and transition networks from MD data of peptide aggregation.
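
The replica-exchange method summarized for Itoh and Okumura’s chapter rests on a Metropolis test for swapping configurations between neighboring temperatures. The following is a minimal, generic temperature-REMD sketch, not code from the chapter; the function names, the kcal/(mol K) units, and the even/odd pairing scheme are our assumptions. A swap between replica i (potential energy E_i, at temperature T_m) and replica j (potential energy E_j, at T_n) is accepted with probability min(1, exp(−Δ)), where Δ = (β_m − β_n)(E_j − E_i) and β = 1/(k_B T).

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K); assumed energy units


def exchange_probability(E_i, T_m, E_j, T_n, k_B=K_B):
    """Metropolis probability of swapping two replicas.

    Replica i (potential energy E_i) currently sits at temperature T_m,
    replica j (potential energy E_j) currently sits at temperature T_n.
    """
    beta_m = 1.0 / (k_B * T_m)
    beta_n = 1.0 / (k_B * T_n)
    delta = (beta_m - beta_n) * (E_j - E_i)
    return 1.0 if delta <= 0.0 else math.exp(-delta)


def attempt_swaps(energies, temps, offset, rng=random):
    """Attempt exchanges for disjoint neighbor pairs (k, k+1), k = offset, offset+2, ...

    energies[k] is the potential energy of the configuration currently at
    temps[k] (temps sorted ascending). Returns the energies reordered by
    temperature slot after the attempts, and the number of accepted swaps.
    """
    energies = list(energies)
    accepted = 0
    for k in range(offset, len(temps) - 1, 2):
        p = exchange_probability(energies[k], temps[k], energies[k + 1], temps[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
            accepted += 1
    return energies, accepted


# A low-energy configuration stuck at the higher temperature is always moved down.
print(attempt_swaps([-100.0, -120.0, -80.0], [300.0, 310.0, 320.0], offset=0))
```

Alternating `offset` between 0 and 1 on successive attempts lets every neighboring pair be tested while keeping the pairs within one attempt disjoint. The replica-permutation method that the chapter also covers considers permutations among more than two replicas at once, which this pairwise sketch does not implement.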

Marlena E. Gąsior-Głogowska and colleagues reviewed various challenges encountered in experimental studies of amyloids. Many methods are available for studying amyloids, ranging from simple staining techniques to a variety of highly complex methods for their detailed biochemical and structural characterization. Moreover, the results obtained by different experimental methods often differ significantly, which may lead to the misclassification of amyloid peptides. Because such experiments are costly, time-consuming, and can give ambiguous results, bioinformatics methods are becoming widely used for studying amyloids and for interpreting experimental data.

Giovanni La Penna and Silvia Morante reviewed recent results showing that ions play an essential role in sealing more or less pre-organized assemblies of macromolecules. They discussed the role of multivalent ions in assisting the assembly process, shifting populations among polymorphic states, and sealing protein aggregates.

Lukasz A. Joachimiak and his student presented a protocol that correlates the aggregation propensities of peptides, measured experimentally by the Thioflavin T (ThT) fluorescence aggregation assay, with their conformational ensembles obtained from molecular dynamics simulations. They showed that integrating experiment and simulation makes it possible to uncover structural rules that govern protein aggregation.

Adolfo B. Poma and his collaborators discussed the nanomechanical characterization of biological fibrils by molecular dynamics simulations and its relation to atomic force microscopy experiments. These computational methodologies range from atomistic simulations with molecular force fields to coarse-grained simulations with statistical potentials. The authors discussed several methodologies for calculating mechanical parameters, such as the elastic constants of relevant biological systems, and their applications to bacterial filaments and β-amyloid systems.
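
For the elastic constants mentioned in connection with Poma and collaborators’ chapter, the classic Hertz contact model (listed in that chapter’s outline) relates the force F on a spherical tip of radius R to the indentation depth d as F = (4/3) E* √R d^(3/2), where E* is the reduced modulus. The sketch below fits E* to force-indentation pairs by least squares through the origin and, assuming a rigid tip, converts it to a Young’s modulus via E = E*(1 − ν²). It is a generic illustration with hypothetical function names, not the chapter’s ensemble-tip procedure, and it is only meaningful for small indentations (d much smaller than R).

```python
import numpy as np


def hertz_reduced_modulus(indentation, force, tip_radius):
    """Fit E* in F = (4/3) * E* * sqrt(R) * d**1.5 by least squares.

    Units must be consistent: with d and R in nm and F in nN,
    E* comes out in nN/nm^2, i.e. GPa.
    """
    d = np.asarray(indentation, dtype=float)
    f = np.asarray(force, dtype=float)
    x = d ** 1.5
    slope = np.dot(f, x) / np.dot(x, x)  # best-fit slope of F vs d^(3/2), zero intercept
    return 3.0 * slope / (4.0 * np.sqrt(tip_radius))


def youngs_modulus_rigid_tip(reduced_modulus, poisson_ratio):
    """For a rigid indenter, 1/E* = (1 - nu^2)/E, hence E = E* (1 - nu^2)."""
    return reduced_modulus * (1.0 - poisson_ratio ** 2)


# Self-check on synthetic, noise-free data generated from the model itself
# (E* = 2 GPa, R = 10 nm); the fit should recover E* = 2.
d = np.linspace(0.1, 1.0, 10)
f = (4.0 / 3.0) * 2.0 * np.sqrt(10.0) * d ** 1.5
e_star = hertz_reduced_modulus(d, f, tip_radius=10.0)
print(e_star, youngs_modulus_rigid_tip(e_star, poisson_ratio=0.3))
```

The Poisson ratio of 0.3 in the usage line is an arbitrary placeholder; with real force curves from simulation or AFM, the fit should be restricted to the small-indentation regime where the Hertz assumptions hold.
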
Damien Thompson and his colleagues discussed computational models, methods, design rules, and strategies for identifying the driving force behind helical self-assembly and the molecular origin of aggregation resistance in helical intermediates of Aβ42 and αS. Through extensive multiscale mapping of intrapeptide interactions, they showed that their computational models can capture features that are otherwise imperceptible to experiments and allow control of the initial pathogenic aggregability of amyloidogenic peptides.

Another chapter by Damien Thompson and collaborators discussed the assembly of monomeric α-synuclein (αS) into aggregation-resistant, helically folded tetramers and related multimers that are key targets for Parkinson’s disease (PD). They applied computational modeling and simulations to obtain high-resolution structural information on the assembly of αS and its interactions with biological surfaces. They described a useful protocol for designing helical αS multimers, especially tetramers, and for scanning the peptide–membrane interface of cell-bound αS tetramers. These computationally modeled structures were validated by comparison with available experimental data.

Mubashir Hassan and collaborators reviewed computational methods for studying the misfolding and aggregation of α-synuclein that are responsible for PD. They also reviewed big data and machine learning methods used to better understand the molecular mechanisms involved in PD and in the computer-aided development of drugs for PD treatment. Finally, they surveyed genes involved in PD and various databases and websites useful for PD research.

Overall, our book covers most of the currently available models and methods for the computational characterization of protein and peptide aggregation and should be useful to both the biophysical and biomedical communities.

Mai Suan Li, Warsaw, Poland
Andrzej Kloczkowski, Columbus, OH, USA
Marek Cieplak, Warsaw, Poland
Maksim Kouza, Columbus, OH, USA

Contents

Preface

Contributors

1 Bioinformatics Methods in Predicting Amyloid Propensity of Peptides and Proteins
Małgorzata Kotulska and Jakub W. Wojciechowski

2 Protocols for Rational Design of Protein Solubility and Aggregation Properties Using Aggrescan3D Standalone
Aleksander Kuriata, Aleksandra E. Badaczewska-Dawid, Jordi Pujols, Salvador Ventura, and Sebastian Kmiecik

3 Using Surface Hydrophobicity Together with Empirical Potentials to Identify Protein–Protein Binding Sites: Application to the Interactions of E-cadherins
Robert L. Jernigan, Pranav Khade, Ambuj Kumar, and Andrzej Kloczkowski

4 Computational Models for the Study of Protein Aggregation
Nguyen Truong Co, Mai Suan Li, and Pawel Krupa

5 Probing Protein Aggregation Using the Coarse-Grained UNRES Force Field
Ana V. Rojas, Gia G. Maisuradze, Harold A. Scheraga, and Adam Liwo

6 Contact-Based Analysis of Aggregation of Intrinsically Disordered Proteins
Marek Cieplak, Łukasz Mioduszewski, and Mateusz Chwastyk

7 Molecular Insights into the Effect of Metals on Amyloid Aggregation
Yifat Miller

8 From Quantum Mechanics, Classical Mechanics, and Bioinformatics to Artificial Intelligence Studies in Neurodegenerative Diseases
Orkid Coskuner-Weber, M. Gokhan Habiboglu, David Teplow, and Vladimir N. Uversky

9 Computer Simulations Aimed at Exploring Protein Aggregation and Dissociation
Phuong H. Nguyen and Philippe Derreumaux

10 All-Atom Molecular Dynamics Simulation Methods for the Aggregation of Protein and Peptides: Replica Exchange/Permutation and Nonequilibrium Simulations
Satoru G. Itoh and Hisashi Okumura

11 Determination of the Most Stable Packing of Peptides from Ribosomal S1 Protein, Protein Bgl2p, and Aβ peptide in β-layers During Molecular Dynamics Simulations
Anna V. Glyakina, Nikolai K. Balabaev, and Oxana V. Galzitskaya

12 Molecular Dynamics Simulations of Protein Aggregation: Protocols for Simulation Setup and Analysis with Markov State Models and Transition Networks
Suman Samantray, Wibke Schumann, Alexander-Maurice Illig, Martin Carballo-Pacheco, Arghadwip Paul, Bogdan Barz, and Birgit Strodel

13 Challenges in Experimental Methods
Marlena E. Gąsior-Głogowska, Natalia Szulc, and Monika Szefczyk

14 Aggregates Sealed by Ions
Giovanni La Penna and Silvia Morante

15 Modifying Amyloid Motif Aggregation Through Local Structure
Sofia Bali and Lukasz A. Joachimiak

16 Assessing the Stability of Biological Fibrils by Molecular-Scale Simulations
Rodrigo A. Moreira, Joseph L. Baker, Horacio V. Guzman, and Adolfo B. Poma

17 Predictive Modeling of Neurotoxic α-Synuclein Polymorphs
Liang Xu, Shayon Bhattacharya, and Damien Thompson

18 Characterization of Amyloidogenic Peptide Aggregability in Helical Subspace
Shayon Bhattacharya, Liang Xu, and Damien Thompson

19 Exploration of Protein Aggregations in Parkinson’s Disease Through Computational Approaches and Big Data Analytics
Saba Shahzadi, Muhammad Yasir, Bisma Aftab, Sumbal Babar, and Mubashir Hassan

20 Final Remarks
Mai Suan Li, Andrzej Kloczkowski, Marek Cieplak, and Maksim Kouza

Index

235

281 309 343

357

379

401

449

469

471

Contributors

BISMA AFTAB • Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, Pakistan
SUMBAL BABAR • Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, Pakistan
ALEKSANDRA E. BADACZEWSKA-DAWID • Department of Chemistry, Iowa State University, Ames, IA, USA
JOSEPH L. BAKER • Department of Chemistry, The College of New Jersey, Ewing, NJ, USA
NIKOLAI K. BALABAEV • Institute of Mathematical Problems of Biology RAS, Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, Pushchino, Russia
SOFIA BALI • Molecular Biophysics Graduate Program, University of Texas Southwestern Medical Center, Dallas, TX, USA
BOGDAN BARZ • Institute of Biological Information Processing: Structural Biochemistry (IBI-7), Forschungszentrum Jülich, Jülich, Germany; Institute of Physical Biology, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
SHAYON BHATTACHARYA • Department of Physics, Bernal Institute, University of Limerick, Limerick, Ireland
MATEUSZ CHWASTYK • Institute of Physics, Polish Academy of Sciences, Warsaw, Poland
MAREK CIEPLAK • Institute of Physics, Polish Academy of Sciences, Warsaw, Poland
NGUYEN TRUONG CO • Institute of Physics, Polish Academy of Sciences, Warsaw, Poland
ORKID COSKUNER-WEBER • Molecular Biotechnology, Turkish-German University, Istanbul, Turkey
PHILIPPE DERREUMAUX • Laboratoire de Biochimie Théorique, UPR 9080, CNRS, Université de Paris, Paris, France; Institut de Biologie Physico-Chimique, Fondation Edmond de Rothschild, PSL Research University, Paris, France
OXANA V. GALZITSKAYA • Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region, Russia; Institute of Theoretical and Experimental Biophysics, Russian Academy of Sciences, Pushchino, Moscow Region, Russia
MARLENA E. GĄSIOR-GŁOGOWSKA • Department of Biomedical Engineering, Wroclaw University of Science and Technology, Wrocław, Poland
ANNA V. GLYAKINA • Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region, Russia; Institute of Mathematical Problems of Biology RAS, Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, Pushchino, Russia
HORACIO V. GUZMAN • Jožef Stefan Institute, Ljubljana, Slovenia
M. GOKHAN HABIBOGLU • Electrical and Electronics Engineering, Turkish-German University, Istanbul, Turkey
MUBASHIR HASSAN • Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, Pakistan
ALEXANDER-MAURICE ILLIG • Institute of Biological Information Processing: Structural Biochemistry (IBI-7), Forschungszentrum Jülich, Jülich, Germany
SATORU G. ITOH • Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences, Okazaki, Aichi, Japan; Institute for Molecular Science, National Institutes of Natural Sciences, Okazaki, Aichi, Japan; Department of Structural Molecular Science, SOKENDAI (The Graduate University for Advanced Studies), Okazaki, Aichi, Japan
ROBERT L. JERNIGAN • Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA, USA
LUKASZ A. JOACHIMIAK • Center for Alzheimer's and Neurodegenerative Diseases, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, USA
PRANAV KHADE • Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA, USA
ANDRZEJ KLOCZKOWSKI • Battelle Center for Mathematical Medicine, Nationwide Children's Hospital, Columbus, OH, USA; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
SEBASTIAN KMIECIK • Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
MAŁGORZATA KOTULSKA • Department of Biomedical Engineering, Wroclaw University of Science and Technology, Wroclaw, Poland
MAKSIM KOUZA • Battelle Center for Mathematical Medicine, Nationwide Children's Hospital, Columbus, OH, USA; Faculty of Chemistry, University of Warsaw, Warsaw, Poland
PAWEL KRUPA • Institute of Physics, Polish Academy of Sciences, Warsaw, Poland
AMBUJ KUMAR • Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA, USA
ALEKSANDER KURIATA • Faculty of Chemistry, Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
GIOVANNI LA PENNA • Institute for Chemistry of Organo-Metallic Compounds, National Research Council of Italy, Florence, Italy
MAI SUAN LI • Institute of Physics, Polish Academy of Sciences, Warsaw, Poland; Institute for Computational Science and Technology, Ho Chi Minh City, Vietnam
ADAM LIWO • Faculty of Chemistry, University of Gdańsk, Gdańsk, Poland
GIA G. MAISURADZE • Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY, USA
YIFAT MILLER • Department of Chemistry, Ben-Gurion University of the Negev, Be'er-Sheva, Israel; Ilse Katz Institute for Nanoscale Science and Technology, Ben-Gurion University of the Negev, Be'er-Sheva, Israel
ŁUKASZ MIODUSZEWSKI • Institute of Physics, Polish Academy of Sciences, Warsaw, Poland
SILVIA MORANTE • Department of Physics, University of Roma Tor Vergata, Roma, Italy
RODRIGO A. MOREIRA • Soft Matter and Biosystems, Institute of Fundamental Technological Research, Polish Academy of Sciences, Warsaw, Poland
PHUONG H. NGUYEN • Laboratoire de Biochimie Théorique, UPR 9080, CNRS, Université de Paris, Paris, France; Institut de Biologie Physico-Chimique, Fondation Edmond de Rothschild, PSL Research University, Paris, France
HISASHI OKUMURA • Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences, Okazaki, Aichi, Japan; Institute for Molecular Science, National Institutes of Natural Sciences, Okazaki, Aichi, Japan; Department of Structural Molecular Science, SOKENDAI (The Graduate University for Advanced Studies), Okazaki, Aichi, Japan
ARGHADWIP PAUL • Institute of Biological Information Processing: Structural Biochemistry (IBI-7), Forschungszentrum Jülich, Jülich, Germany
ADOLFO B. POMA • Soft Matter and Biosystems, Institute of Fundamental Technological Research, Polish Academy of Sciences, Warsaw, Poland
JORDI PUJOLS • Institut de Biotecnologia i de Biomedicina (IBB), Universitat Autònoma de Barcelona, Bellaterra, Spain; Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Bellaterra, Spain
ANA V. ROJAS • Schrodinger Inc., New York, NY, USA
SUMAN SAMANTRAY • Institute of Biological Information Processing: Structural Biochemistry (IBI-7), Forschungszentrum Jülich, Jülich, Germany
HAROLD A. SCHERAGA • Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY, USA
WIBKE SCHUMANN • Institute of Biological Information Processing: Structural Biochemistry (IBI-7), Forschungszentrum Jülich, Jülich, Germany; Institute of Theoretical and Computational Chemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
SABA SHAHZADI • Institute of Molecular Sciences and Bioinformatics, Lahore, Pakistan
BIRGIT STRODEL • Institute of Biological Information Processing: Structural Biochemistry (IBI-7), Forschungszentrum Jülich, Jülich, Germany; Institute of Theoretical and Computational Chemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
MONIKA SZEFCZYK • Department of Bioorganic Chemistry, Wroclaw University of Science and Technology, Wrocław, Poland
NATALIA SZULC • Department of Biomedical Engineering, Wroclaw University of Science and Technology, Wrocław, Poland
DAVID TEPLOW • Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
DAMIEN THOMPSON • Department of Physics, Bernal Institute, University of Limerick, Limerick, Ireland
VLADIMIR N. UVERSKY • Department of Molecular Medicine and USF Health Byrd Alzheimer Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA; Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Moscow Region, Russia
SALVADOR VENTURA • Institut de Biotecnologia i de Biomedicina (IBB), Universitat Autònoma de Barcelona, Bellaterra, Spain; Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Bellaterra, Spain
JAKUB W. WOJCIECHOWSKI • Department of Biomedical Engineering, Wroclaw University of Science and Technology, Wroclaw, Poland
LIANG XU • Department of Physics, Bernal Institute, University of Limerick, Limerick, Ireland
MUHAMMAD YASIR • Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, Pakistan

Chapter 1

Bioinformatics Methods in Predicting Amyloid Propensity of Peptides and Proteins

Małgorzata Kotulska and Jakub W. Wojciechowski

Abstract

Several computational methods have been developed to predict the amyloid propensity of a protein or peptide. These bioinformatics tools are time- and cost-saving alternatives to the expensive and laborious experimental methods used to confirm self-aggregation of a protein. Computational approaches do more than preselect reliable amyloid candidates. Most importantly, they can perform a thorough and informative analysis of a protein, indicating the sequence determinants of its aggregation, identifying potential causal mutations, and suggesting likely mechanisms. Bioinformatics modeling uses several different approaches, most typically physicochemical or structure-based modeling, machine learning, or statistics-based modeling. Bioinformatics methods typically take the amino acid sequence of a protein as input; some also use additional information, such as an available structure. This chapter describes the methods currently used to predict the amyloid propensity of a protein or peptide computationally. The accuracy of bioinformatics methods may depend heavily on the reference data used to develop and evaluate the predictors. We therefore also briefly present the main amyloid databases used by the authors of bioinformatics tools.

Key words Amyloid, Misfolding, Computational prediction, Machine learning, Classification

1 Introduction

Research on amyloid proteins has intensified in recent years, following evidence that they contribute to a significant number of diseases. Fast and reliable methods are needed to identify which proteins are capable of self-aggregation or of cross-seeding the amyloid aggregation of other proteins. Most importantly, it is essential to recognize which mutations may drive a protein toward amyloid aggregation and which may inhibit the process. A thorough and informative analysis of a protein that identifies its amyloidogenic hot spots is therefore highly desirable. Experiments to determine amyloid propensity are laborious and costly, and some are inconclusive or even misleading. Consequently, large-scale experimental studies are currently not possible without support from bioinformatics.

Mai Suan Li et al. (eds.), Computer Simulations of Aggregation of Proteins and Peptides, Methods in Molecular Biology, vol. 2340, https://doi.org/10.1007/978-1-0716-1546-1_1, © Springer Science+Business Media, LLC, part of Springer Nature 2022


Małgorzata Kotulska and Jakub W. Wojciechowski

Several bioinformatics methods have been proposed for the in silico identification of amyloids. These methods are based on different approaches adopted from classical protein bioinformatics, including physicochemical, statistical, and machine learning methods, or a combination of these. Computational approaches that use amino acid sequences allow reliable identification of amyloid candidates. It has been shown [1] that protein aggregation usually results from the amyloidogenic propensity of short segments. Many predictors therefore search for potential amyloid seeds, identifying the parts of the protein that drive its aggregation. The computational process is fast and usually reliable. To increase identification accuracy, one may use several methods based on different approaches and look for a consensus in their results. The growing interest in developing new modeling methods for amyloid analysis requires well-organized reference data, which underlie the modeling process and enable subsequent validation of new methods. Early modeling attempts were limited to scarce data collected in simple datasheets. As more data became available, elaborate dedicated databases were compiled, and these could be used to develop new prediction methods for amyloids.

2 Amyloid Databases for the Development of New Bioinformatics Predictors

Bioinformatics approaches, like all types of modeling, need initial reference data to develop and then validate a new method. Since the discovery of amyloid proteins, researchers have collected and organized the data in various ways, producing different types of databases. Some of these are disease oriented, while others focus on the aggregation profiles of proteins. The latter type of database is of special interest to bioinformaticians developing amyloid predictors. To facilitate the analysis of amyloids, studies included minimal-length peptides that may be responsible for the amyloid propensity of the whole protein. The first data collection of amyloidogenic fragments was published in [1]. It was based on the hexapeptide STVIIE, which was methodically mutated and tested, finally providing 56 amyloid-positive and 38 amyloid-negative fragments. A strong bias in this set, resulting from its single underlying template, was partially removed when new peptides became available. Two collections released by Eisenberg's group [2] had an immense impact on early bioinformatics methods: AmylHex (67 amyloid-positive and 91 amyloid-negative hexapeptides) and AmylFrag (28 amyloid fragments of variable length, between 6 and 51 residues). The collections of amyloid fragments, peptides, and proteins have since grown and are now organized in professional databases. Below, we briefly describe the currently available data sources that support the development of new bioinformatics predictors, in chronological order of their first release.

Amyloid Predictors

2.1 AMYPdb

AMYPdb is the oldest amyloid database that is still active. It was first published in 2004 [3], and its current beta version was released in 2018. AMYPdb is dedicated to amyloid precursor families. It collects full-length proteins confirmed as amyloids, presenting their sequences, structures, and gene locations for all available isoforms. A unique feature of the database is the availability of patterns within protein families, computed for match parameters selected by the user. It is available as a web server: http://amypdb.genouest.org/e107_plugins/amypdb_project/project.php.

2.2 Waltz-DB and Waltz-DB 2.0

Waltz-DB was first released in 2015 [4]. In 2020, the database was extended and enhanced with new functionalities [5]. The database contains hexapeptides tested for their amyloid propensity. Its original version contained 1100 hexapeptides. As of April 2020, it had grown to 1416 hexapeptides, of which 515 are amyloidogenic. Where available, the entries contain the exact results of the amyloid studies. They also include some physicochemical measures and the prediction output of a few bioinformatics tools. A unique feature of the database is the availability of computed model structures for each entry. It is available as a web server: http://waltzdb.switchlab.org/.

2.3 AmyLoad

AmyLoad [6] was released in 2015. It is a service dedicated to amyloid-forming peptides. As of April 2020, it includes 1481 peptides (444 positive) from different sources, ranging from 4 to 83 amino acids in length. The entries contain information on the analytical methods applied in the amyloid study, the precursor protein, and cross-references to the PubMed database. The service also includes several amyloid predictors, each either implemented on site or linked to its original site. It is available as a web server: http://comprec-lin.iiar.pwr.edu.pl/amyload/.

2.4 CPAD

CPAD was first released in 2016 [7], and its updated version appeared in 2020 [8]. The database collects amyloid-forming and non-amyloid peptides. It includes 268 unique amyloid proteins and 2031 fragments of different lengths (917 amyloid-positive), collected from different sources. Users can find specific information on each amyloid, including its hot spots, the structure of the protein or peptide, and aggregating complexes. A unique feature of the service is its integrated database of aggregation kinetics. It is available as a web server: https://web.iitm.ac.in/bioinfo2/cpad2/index.html.

2.5 AmyPro

AmyPro was published in 2018 [9] and contains validated amyloid and prion proteins, also classified by function. The entries indicate the boundaries of experimentally validated amyloidogenic segments. They also describe the functional relevance of the protein, including in its amyloid form, and list the experimental techniques used to evaluate that form. The database also presents information from UniProt on function and observed mutations, as well as PDB structures categorized by protein form. The literature references are linked to the PubMed database. A unique feature of this database is its clear separation of functional amyloids and prions, both of which deserve more extensive bioinformatics study. It is available as a web server: http://amypro.net/.

The two databases described below are slightly different, as they do not present experimental aggregation profiles of proteins or their fragments. However, we believe they may also provide useful information for the development of new predictors.

2.6 ZipperDB

ZipperDB was started in 2010, at an early stage of computational research on amyloids [10]. It differs from the previously presented databases in that its entries contain computational predictions of fibril-forming segments. The fragments were identified by the 3D Profiles method proposed by the authors of ZipperDB [2]. As of April 2020, the database includes computationally predicted amyloidogenic hexapeptides from proteins of 112 different genomes. The majority of these have not yet been experimentally confirmed. It is available as a web server: https://services.mbi.ucla.edu/zipperdb/.

2.7 AmyCo

AmyCo [11], released in 2019, is a disease-oriented database mostly dedicated to amyloidoses. However, it also presents detailed information on the proteins underlying these diseases. The entries are annotated with the precursor and related proteins and are linked to the ICD-10, MeSH, OMIM, PubMed, AmyPro, and UniProtKB databases. It is available as a web server: http://bioinformatics.biol.uoa.gr/amyco.

3 Methods for Predicting Amyloid Sequences

Amyloid predictors can be based on very different approaches. Some methods rely on the physicochemical properties of a protein chain, such as its total hydrophobicity, charge, and helical and β-sheet propensities, which are combined into functions known as statistical potentials. More sophisticated methods in this group consider these properties in the context of a potentially stable tertiary protein structure. Physicochemical methods do not learn from training data. Their development therefore usually does not require a high-quality reference dataset with correct annotation of instances into amyloid and non-amyloid classes. They are based on classical methods of protein modeling, testing whether a query sequence can form and maintain an amyloid-type structure.

Another approach is based on a statistical profile of amino acid sequences, often encoded by their physicochemical properties. Machine learning techniques form a more advanced group of statistical methods. These methods may depend heavily on experimentally confirmed data (supervised machine learning), representing peptides annotated as amyloidogenic or non-amyloidogenic. The advantage of this approach is that it may discover combinations of physicochemical characteristics that determine amyloid propensity. Such combinations might not have been obvious to researchers who select properties for a model explicitly. The critical point of the statistical approach, however, is access to large and reliable reference data. The data need to represent the whole universe of proteins, and their annotation should be very reliable, ideally confirmed by several different experimental methods. Most of these methods analyze protein sequences using a sliding window that spans a few amino acids, representing a potentially amyloidogenic fragment. For historical reasons related to the development of amyloid databases, the window often spans six amino acids. Most computational tools combine physicochemical and statistical approaches, revealing different aspects of the amyloid process.

Currently available bioinformatics predictors and modelers, grouped by their main computational approach, are presented in Fig. 1. Only methods currently available as web servers or downloadable software are included. We describe them in chronological order of their first release, which shows the historical course of development in this field. Notably, depending on when they were released, the methods were developed with different reference data and evaluated on different data.

3.1 Tango

Tango was released in 2004 [12] and is based on statistical physics. It was one of the earliest methods for predicting β-aggregation propensity, and it additionally assumes that the core regions of an aggregate are fully buried. However, as its authors emphasize, β-aggregation propensity is not necessarily equivalent to amyloidogenicity [13]. Tango considers the possible conformational states of a given fragment and weights them using the Boltzmann distribution. To obtain an aggregation propensity, it calculates the partition function over the conformational space of the fragment. Tango enables aggregation prediction at different temperatures, pH values, and ionic strengths. It also takes into account modifications of the N- and C-termini of peptides. It is available for download as well as a web server: http://tango.crg.es/.
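The partition-function idea behind Tango can be illustrated with a minimal sketch. It assumes a fragment has only four hypothetical states with invented free energies; these are not Tango's states or energy terms. Each state is weighted by its Boltzmann factor, the weights are normalized by the partition function Z, and the calculation is repeated at two temperatures.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 0.0019872


def state_populations(energies, temperature=298.0):
    """Return Boltzmann populations of conformational states.

    energies: dict mapping a state name to its free energy in kcal/mol.
    The populations are normalized by the partition function Z.
    """
    kt = R_KCAL * temperature
    weights = {state: math.exp(-e / kt) for state, e in energies.items()}
    z = sum(weights.values())  # partition function over the listed states
    return {state: w / z for state, w in weights.items()}


# Hypothetical free energies for one fragment (NOT Tango's energy terms).
toy_energies = {"coil": 0.0, "helix": 0.8, "turn": 1.5, "beta_aggregate": -0.5}

for temperature in (298.0, 350.0):
    pops = state_populations(toy_energies, temperature)
    print(f"T={temperature:.0f} K, beta-aggregate population: {pops['beta_aggregate']:.2f}")
```

In this toy case the low-energy aggregated state dominates, and its population falls at the higher temperature as the distribution flattens. This hints at why Tango's predictions depend on temperature.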

Fig. 1 The main groups of bioinformatics approaches for modeling and prediction of amyloids and their representative methods

3.2 3D Profiles

3D Profiles, published in 2006 [2], was the first physicochemical and structure-based approach to predicting the aggregation of protein fragments. As a template for modeling amyloid fibers, it uses the cross-β spine structure of the NNQQNY peptide, derived from the Sup35 yeast prion. A query sequence is analyzed with a sliding window spanning six amino acids, and each hexapeptide is threaded onto the template structure. The template consists of two β-sheets; one sheet is shifted relative to the other, and the resulting model is scored. If the lowest-energy conformation of such a system is below an assumed threshold, the fragment is considered amyloidogenic. After registering, users can search the ZipperDB database for modeling results for their query sequences. They obtain information on potential amyloidogenicity and a model structure. The service is available as a web server: http://services.mbi.ucla.edu/zipperdb/.

3.3 PASTA and PASTA 2.0

PASTA, first published in 2006 [14], and its current version, PASTA 2.0, published in 2014 [15], use statistical potentials. PASTA 2.0 predicts the amyloidogenicity of a protein fragment from pairwise potentials, which estimate the interaction energy between pairs of amino acids. This energy is calculated for a fragment paired with itself in both parallel and antiparallel orientations. This allows not only the identification of amyloidogenic regions in the protein but also inference about the type of β-sheet a given fragment forms. In principle, the same approach can be applied to predict cross-aggregation between two different fragments, including fragments from different proteins, and this option is available.

Apart from amyloidogenicity, PASTA 2.0 predicts secondary structure and disordered regions using machine learning methods. It is available as a web server: http://protein.bio.unipd.it/pasta2/.

3.4 Aggrescan

Aggrescan, published in 2007 [16], is a server for predicting amyloidogenic hot spots. It assigns each residue in the sequence an aggregation-propensity value from an experimentally derived scale, based on the effects of mutations on amyloid beta aggregation [17]. The prediction averages this propensity over a sliding window and assigns the average to the central position of the window. A region is considered amyloidogenic if five or more consecutive positions have an average propensity above a given threshold and the region contains no proline residues. It is available as a web server: http://bioinf.uab.es/aggrescan/.
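The Aggrescan hot-spot rule can be sketched in a few lines. Several details are assumptions for illustration: the per-residue scale is invented (it is not the published Aggrescan scale), and the window of 5 and the threshold of 0.0 are hypothetical defaults. The sketch averages the scale over each window and assigns the average to the central position. It then reports runs of at least five consecutive above-threshold positions that contain no proline.

```python
# Hypothetical per-residue aggregation propensities (arbitrary units).
# These are NOT the published Aggrescan values.
TOY_SCALE = {aa: -0.5 for aa in "ACDEFGHIKLMNPQRSTVWY"}
TOY_SCALE.update({"I": 1.8, "F": 1.7, "V": 1.6, "L": 1.4, "Y": 1.2, "W": 1.0,
                  "M": 0.9, "C": 0.9, "A": 0.1, "D": -1.8, "E": -1.4,
                  "K": -1.0, "R": -1.0, "P": -1.2})


def window_profile(seq, scale, window=5):
    """Average propensity over each window, assigned to the window's central position.

    Positions too close to the termini for a full window get None.
    """
    half = window // 2
    profile = [None] * len(seq)
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        profile[start + half] = sum(scale[aa] for aa in segment) / window
    return profile


def hot_spots(seq, scale, window=5, threshold=0.0, min_len=5):
    """Return half-open (start, end) ranges of >= min_len consecutive positions
    whose averaged propensity exceeds threshold and that contain no proline."""
    profile = window_profile(seq, scale, window)
    spots, start = [], None
    for i, value in enumerate(profile + [None]):  # trailing None closes the last run
        above = value is not None and value > threshold
        if above and start is None:
            start = i
        elif not above and start is not None:
            if i - start >= min_len and "P" not in seq[start:i]:
                spots.append((start, i))
            start = None
    return spots


print(hot_spots("DEKRDEIVFVLIVFDEKRDE", TOY_SCALE))
print(hot_spots("DEKRDEIVFPLIVFDEKRDE", TOY_SCALE))  # same region with a proline
```

In the second call, the single proline inserted into the hydrophobic stretch is enough to veto the region, even though its averaged profile stays above the threshold.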

3.5 Aggrescan3D

Aggrescan3D is a set of physicochemical methods, first introduced in 2015, that extend Aggrescan with additional functionality. The web server [18] and the standalone Python package [19] perform structure-based prediction of aggregation properties. A new version of the web server, Aggrescan3D 2.0, was published in 2019 [20]. The method uses the experimental aggregation propensity scale of the previously developed sequence-based predictor Aggrescan [16]. However, unlike their sequence-based predecessor, these tools take a protein structure as input. Methods that rely only on sequence have difficulty predicting the aggregation propensity of folded globular proteins. In such proteins, aggregation-prone sites are often not contiguous in sequence or are buried inside the native structure. The Aggrescan3D (A3D) server overcomes these limitations by combining the protein structure with the experimental aggregation propensity scale from Aggrescan. The method first identifies solvent-accessible amino acids. For each of them, it then evaluates the aggregation propensity within a spherical region centered on that residue. The sum of scores for a given structural region is then used for the prediction. Prediction is also available in a dynamic mode, which accounts for protein flexibility probed with CABS-flex simulations [21, 22]. Finally, the tool can be used to rapidly test the amyloidogenicity of mutants. In Aggrescan3D 2.0, the authors added three features: dynamic-mode prediction for larger, multichain proteins; prediction of mutation effects on overall protein stability; and automatic optimization of protein solubility. Standalone packages are linked from the web server sites. Aggrescan3D: http://biocomp.chem.uw.edu.pl/A3D/. Aggrescan3D 2.0: http://biocomp.chem.uw.edu.pl/A3D2.
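To make the structure-based idea concrete, here is a deliberately simplified sketch. It assumes that each residue has one representative coordinate and a relative solvent exposure. It scores each exposed residue by summing propensity times exposure over all residues within a 10 Å sphere. The scale, the exposure cutoff, and this exact formula are illustrative assumptions; the published A3D scoring function is more elaborate.

```python
import math
from dataclasses import dataclass

# Hypothetical intrinsic propensities (NOT the published Aggrescan scale).
TOY_SCALE = {"I": 1.8, "V": 1.6, "L": 1.4, "D": -1.8}


@dataclass
class Residue:
    aa: str          # one-letter amino acid code
    xyz: tuple       # coordinates of a representative atom, in angstroms
    exposure: float  # relative solvent accessibility, 0 (buried) to 1


def structure_scores(residues, scale, radius=10.0, min_exposure=0.25):
    """Toy structure-aware score: for each exposed residue, sum propensity times
    exposure over all residues within `radius` of it. Buried residues get None.
    This is an illustrative assumption, NOT the published A3D formula."""
    scores = []
    for center in residues:
        if center.exposure < min_exposure:
            scores.append(None)
            continue
        total = 0.0
        for other in residues:
            if math.dist(center.xyz, other.xyz) <= radius:
                total += scale[other.aa] * other.exposure
        scores.append(total)
    return scores


toy_structure = [
    Residue("I", (0.0, 0.0, 0.0), 0.8),
    Residue("V", (5.0, 0.0, 0.0), 0.6),
    Residue("L", (3.0, 0.0, 0.0), 0.0),   # buried: contributes nothing
    Residue("D", (30.0, 0.0, 0.0), 0.9),  # far from the hydrophobic patch
]
print(structure_scores(toy_structure, TOY_SCALE))
```

The buried leucine neither receives a score nor raises its neighbors' scores. This mirrors the point made above that sequence-only methods can overestimate residues hidden inside a folded structure.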

3.6 NetCSSP

NetCSSP, released in 2009 [23], is a machine learning method that predicts the propensity of a protein for non-native secondary structure. It estimates whether a protein fragment can switch to a non-native secondary structure, in particular a β-strand, which is typical of amyloidogenic peptides. The method is based on two artificial neural networks that scan the sequence with a window of seven amino acids. The query sequence therefore needs to be at least seven amino acids long. This could be a limiting factor, since many known amyloid hot spots span only six amino acids. It is available as a web server: http://cssp2.sookmyung.ac.kr/.
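The seven-residue window constraint can be seen in a small sketch of how a window-based neural network might receive its input. The one-hot encoding below is an assumption for illustration; NetCSSP's actual input features may differ. The sketch shows that a six-residue hot spot cannot fill a single seven-residue window.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WINDOW = 7  # NetCSSP scans the sequence with seven-residue windows


def one_hot_windows(seq, window=WINDOW):
    """Encode each window as a flat one-hot vector of length window * 20.

    One-hot encoding is an illustrative assumption; NetCSSP's actual
    network inputs may differ.
    """
    if len(seq) < window:
        raise ValueError(f"query must be at least {window} residues long, got {len(seq)}")
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    vectors = []
    for start in range(len(seq) - window + 1):
        vector = [0] * (window * len(AMINO_ACIDS))
        for offset, aa in enumerate(seq[start:start + window]):
            vector[offset * len(AMINO_ACIDS) + index[aa]] = 1
        vectors.append(vector)
    return vectors


print(len(one_hot_windows("GNNQQNYG")))  # two seven-residue windows
try:
    one_hot_windows("NNQQNY")  # a six-residue hot spot is too short
except ValueError as err:
    print(err)
```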

3.7 Pafig

Pafig, published in 2009 [24], is a machine learning predictor of amyloidogenicity. It is based on a support vector machine (SVM) classifier that uses selected physicochemical and biological properties of amino acids from the AAindex database [25]. Like many other methods, it scans a query sequence with a window of six amino acids. There are concerns about the quality of its artificially built training dataset [26]. Nevertheless, it is still used both as a standalone method and as part of meta-predictors. Pafig executable modules can be downloaded from: http://www.mobioinfor.cn/pafig/.
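A window-based SVM predictor of this kind applies a trained decision function to features of each hexapeptide. The sketch below uses a single real AAindex property, the Kyte-Doolittle hydropathy index (entry KYTJ820101), and a linear decision function w · x + b. The weights and bias are hypothetical stand-ins for a trained model. Pafig's actual property set, kernel, and parameters are not reproduced here.

```python
# Kyte-Doolittle hydropathy index (AAindex entry KYTJ820101), used here as a
# single example property; Pafig selects its own set of AAindex properties.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hypothetical weights and bias of an already trained linear SVM (one weight per position).
WEIGHTS = [0.2] * 6
BIAS = -1.0


def features(hexapeptide):
    """Per-position property values of a hexapeptide."""
    return [KYTE_DOOLITTLE[aa] for aa in hexapeptide]


def decision_value(hexapeptide, weights=WEIGHTS, bias=BIAS):
    """Linear SVM decision function w . x + b; a positive value means the amyloid class."""
    return sum(w * x for w, x in zip(weights, features(hexapeptide))) + bias


def scan(seq, window=6):
    """Classify every six-residue window of the query sequence."""
    return [(i, seq[i:i + window], decision_value(seq[i:i + window]) > 0)
            for i in range(len(seq) - window + 1)]


for start, hexa, is_amyloid in scan("DEKVIVIVIKDE"):
    print(start, hexa, is_amyloid)
```

A real model would use several properties per position and weights learned from annotated hexapeptides. This is why the quality of the training set matters so much for this class of predictors.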

3.8 AmylPred and AmylPred2

AmylPred was first developed in 2009 [27], and its newer version, AmylPred2, in 2013 [28]. They are meta-predictors that compute consensus results from several different methods. The first version of AmylPred was based on five methods:
- the amyloidogenic pattern discovered in the dataset of Lopez de la Paz et al. [1];
- Tango [12];
- SecStr, by the same authors, which models conformational switches [29];
- Average Packing Density (later implemented as FoldAmyloid) [30, 31];
- hexapeptide conformational energy [32].
The current version, AmylPred2, uses 11 external methods; the authors added NetCSSP [23], Pafig [24], AmyloidMutants [33], Salsa [34], Waltz [35], and Aggrescan [16]. A protein fragment is considered amyloidogenic if at least half of the selected methods agree that it is amyloidogenic. Both are available as web servers: AmylPred, http://aias.biol.uoa.gr/AMYLPRED/; AmylPred2, http://aias.biol.uoa.gr/AMYLPRED2/.
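The consensus rule of a meta-predictor like AmylPred can be sketched as a per-residue vote. Here a residue is flagged when at least half of the selected methods flag it. The method names and their outputs are hypothetical, and the 0.5 fraction is exposed as a parameter, so stricter or looser consensus levels can be tried.

```python
import math


def consensus(predictions, fraction=0.5):
    """Per-residue consensus of several binary predictors.

    predictions: dict mapping a method name to a list of booleans (one per residue).
    A residue is called amyloidogenic when at least `fraction` of the methods flag it.
    """
    lengths = {len(calls) for calls in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictors must cover the same sequence")
    (length,) = lengths
    needed = math.ceil(fraction * len(predictions))
    return [sum(calls[i] for calls in predictions.values()) >= needed
            for i in range(length)]


# Hypothetical outputs of three predictors for a five-residue stretch.
calls = {
    "method_a": [True, True, False, False, False],
    "method_b": [False, True, True, False, False],
    "method_c": [False, True, False, False, True],
}
print(consensus(calls))
```

With three methods, two votes are required. Adding a fourth method keeps the requirement at two, so the consensus can widen even though the rule itself is unchanged.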

3.9 FoldAmyloid

FoldAmyloid, published in 2010 [31], is a statistics-based method. It uses several scoring functions based on the expected number of residue–residue contacts for a given amino acid type and the expected number of hydrogen bond donors and acceptors in a protein [30, 31]. Scores are averaged in a sliding window, producing a “smoothed” profile. If the score is above a chosen threshold at all positions of a given fragment, the fragment is classified as amyloidogenic. The user can choose the scoring function, the length of the averaging window, and the threshold value. Only a single protein sequence is allowed per query. It is available as a web server: http://bioinfo.protres.ru/fold-amyloid/.
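The smoothing-and-threshold step can be sketched as follows. The per-residue scale is passed in as a parameter because FoldAmyloid's actual scales (expected contacts, hydrogen bond donors and acceptors) are not listed here. The truncated window at the sequence ends and the `min_length` filter are assumptions of this sketch, not documented FoldAmyloid behavior.

```python
def smoothed_profile(sequence, scale, window=5):
    """Average per-residue scale values over a window centred on each residue.

    The window is truncated at the sequence ends (an assumption of this sketch).
    """
    values = [scale[residue] for residue in sequence]
    half = window // 2
    profile = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        profile.append(sum(values[lo:hi]) / (hi - lo))
    return profile


def regions_above(profile, threshold, min_length=1):
    """Return (start, end) pairs, end exclusive, of runs in which every position
    scores above `threshold` and the run is at least `min_length` long."""
    regions, start = [], None
    for i, score in enumerate(profile + [float("-inf")]):  # sentinel closes a final run
        if score > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    return regions
```

Keeping smoothing and region extraction separate mirrors the two user-adjustable settings described above: window length affects only the profile, and the threshold affects only the region calls.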

Amyloid Predictors

3.10 Waltz

Waltz was released in 2010 [35]. It is a statistics-based method and one of the earliest to address the importance of an exact sequence pattern in amyloidogenic fragments. To develop it, the authors significantly extended the existing AmylHex dataset [2] of experimentally validated amyloidogenic hexapeptides. Based on the enlarged dataset, they built a position-specific scoring matrix, which gave rise to a new scoring function. The scoring takes into consideration three components:

- position-specific amino acid frequencies;
- physicochemical properties of amino acids;
- a position-specific energy function derived from a set of structural models of amyloid fibers.

Amyloid propensity is predicted using a six-residue window sliding along the query sequence. The score obtained for each window is compared with a threshold, which can be modified by the user. It is available as a web server: https://waltz.switchlab.org/.
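The core of a position-specific scoring matrix scan is small. The sketch below scores every six-residue window against a PSSM and keeps windows that reach a user-set threshold. It is an assumption-laden illustration. The real Waltz score also adds physicochemical and structure-derived energy terms, and `TOY_PSSM` is a hypothetical matrix, not the published Waltz matrix.

```python
def pssm_score(fragment, pssm):
    """Sum position-specific scores; residues missing from a column score 0."""
    return sum(column.get(residue, 0.0) for column, residue in zip(pssm, fragment))


def scan(sequence, pssm, threshold):
    """Return (start, score) for every window whose PSSM score reaches `threshold`."""
    size = len(pssm)
    hits = []
    for start in range(len(sequence) - size + 1):
        score = pssm_score(sequence[start:start + size], pssm)
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical toy matrix: six identical columns favouring V, I and F.
TOY_PSSM = [{"V": 1.0, "I": 1.0, "F": 1.0, "K": -1.0} for _ in range(6)]

print(scan("KKVIVFIVKK", TOY_PSSM, 5.0))  # [(2, 6.0)]
```

Lowering the threshold to 4.0 in this toy example would also report the two overlapping windows that start at positions 1 and 3. This shows how the user-adjustable threshold trades sensitivity for specificity.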

3.11 AmyloidMutants

AmyloidMutants [33] is a method based on statistical physics and on structural patterns commonly occurring in known structures of amyloid fibers. For a given amino acid sequence, the method assesses whether it can adopt each of these structural states. It does so using a statistical potential derived from residue–residue contact frequencies. Using energy criteria, AmyloidMutants quantifies the effects of mutations on fibril conformation and stability. The web server allows users to adjust a large number of settings and to assess the influence of mutations. It is available at: http://amyloid.csail.mit.edu/index.html.
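Statistical potentials of the kind mentioned above are commonly derived by inverting contact frequencies (the inverse Boltzmann approach). Pairs observed in contact more often than expected from residue frequencies alone receive favorable (negative) energies. The sketch below implements that generic technique and a mutation energy difference on a fixed contact map. It is not AmyloidMutants' own potential or structural model. The input formats and function names are hypothetical.

```python
import math
from collections import Counter


def pair_key(a, b):
    """Unordered residue pair key."""
    return tuple(sorted((a, b)))


def contact_potential(contacts):
    """Inverse-Boltzmann pair energies E(a, b) = -ln(f_obs / f_exp), in kT units.

    contacts: list of (residue_a, residue_b) pairs observed in contact.
    f_exp assumes residues pair at random according to their overall frequencies.
    """
    pair_counts = Counter(pair_key(a, b) for a, b in contacts)
    residue_counts = Counter(res for pair in contacts for res in pair)
    n_pairs, n_residues = len(contacts), 2 * len(contacts)
    energies = {}
    for (a, b), count in pair_counts.items():
        f_a = residue_counts[a] / n_residues
        f_b = residue_counts[b] / n_residues
        f_exp = f_a * f_b if a == b else 2 * f_a * f_b
        energies[(a, b)] = -math.log((count / n_pairs) / f_exp)
    return energies


def template_energy(sequence, contact_map, energies):
    """Sum pair energies over residue index pairs (i, j) of a fixed template."""
    return sum(energies.get(pair_key(sequence[i], sequence[j]), 0.0)
               for i, j in contact_map)


def mutation_effect(wild_type, mutant, contact_map, energies):
    """Energy change on mutation; negative means the mutant fits the template better."""
    return (template_energy(mutant, contact_map, energies)
            - template_energy(wild_type, contact_map, energies))
```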

3.12 MetAmyl

MetAmyl, available since 2013 [36], is a meta-predictor that combines four methods to predict the amyloidogenicity of a hexapeptide fragment: FoldAmyloid [31], Pafig [24], Salsa [34], and Waltz [35]. However, the final response is not a simple consensus of the classifiers. Instead, the results of the constituent predictors are used as inputs to a machine learning method based on logistic regression. The procedure is performed for each six-residue window in a query sequence. The authors indicate that the score from the logistic regression model can be interpreted as the probability of forming an amyloid fiber. To increase the speed of the method, the authors precomputed results for all 64 million (20^6) possible hexapeptides. The tool is available as a web server: http://metamyl.genouest.org/e107_plugins/metamyl_aggregation/db_prediction_meta.php.
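The combination step can be pictured as a logistic function over the constituent scores. In the sketch below, the predictor names, weights, and bias are hypothetical placeholders, since MetAmyl's fitted coefficients are not given here. The final constant checks the hexapeptide count mentioned in the text.

```python
import math


def logistic_probability(scores, weights, bias):
    """Logistic combination of constituent predictor scores.

    scores, weights: dicts keyed by predictor name (hypothetical placeholders).
    Returns a value in (0, 1) that can be read as a probability.
    """
    z = bias + sum(weights[name] * value for name, value in scores.items())
    # Numerically stable sigmoid: avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Number of distinct hexapeptides over the 20 standard amino acids.
N_HEXAPEPTIDES = 20 ** 6
```

Because the space of hexapeptides is finite (64,000,000 entries), the probability for every possible window can be computed once and looked up later. This is the speed-up described above.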

3.13 FISH Amyloid

FISH Amyloid, released in 2014 [37], is an original machine learning method based on the identification of co-occurrence patterns. The correlation between position-specific occurrences of specific pairs of amino acids is calculated within a window of specified length, which is shifted along all training sequences. An iterative procedure automatically finds the most relevant segments in amyloid-positive sequences, namely those carrying a strongly recurrent classification pattern. These segments are treated as the amyloid-positive part of the training data. All generated negative segments contribute equally to their representative pattern. The distance between patterns from positive and negative segments is represented in the co-occurrence matrix of positives. Only patterns that are distant enough from the averaged pattern of negative segments and close to other positive sequences are selected. The training dataset used to develop the method consisted of 436 experimentally tested fragments (146 amyloid-positive) available at the date of the method’s release. A query sequence is compared with the resulting classification patterns. The tool is available as a web server: http://comprec-lin.iiar.pwr.edu.pl/fishInput/.

Małgorzata Kotulska and Jakub W. Wojciechowski

3.14 ArchCandy

ArchCandy, published in 2015 [38], is a method based on a physicochemical approach. It was one of the first methods to assume that minimal amyloidogenic fragments may be longer than a few amino acids. ArchCandy assumes that amyloid-forming fragments adopt a β-arch conformation, which consists of two β-strands joined by a turn. First, short candidate fragments are identified in a query sequence and then gradually extended on both sides. The elongating β-arch is evaluated during the procedure using a custom scoring function. This function takes into account electrostatic interactions, the potential for salt bridge formation, solvent-accessible surface area, and numerous other terms. The major limitation of this method is that it considers only parallel packing of the β-arches. The Java software can be requested upon registration at the web site: https://bioinfo.crbm.cnrs.fr/index.php?route=tools&tool=7.

3.15 BetaSerpentine

BetaSerpentine, released in 2018 [39], builds upon ideas introduced in ArchCandy [38]. It models longer amyloid fragments that form β-serpentines, which are defined as multiple interconnected β-arches. To do so, it predicts β-arches with the ArchCandy algorithm, ranks them in order of preference, and then attempts to merge the predicted fragments into longer segments. Unlike ArchCandy, BetaSerpentine was not designed to detect amyloidogenic regions. Its purpose is to reconstruct all possible β-serpentine arrangements in a query protein sequence. As the method was designed to work with longer protein fragments, it cannot be used for sequences shorter than 18 amino acids. The tool is available as a web server: https://bioinfo.crbm.cnrs.fr/index.php?route=tools&tool=25. Its source code is also available: github.com/stanislavspbgu/BetaSerpentine.

3.16 Fibrilizer

Fibrilizer is intended for the prediction of amyloid structures. It consists of four modules, released in 2014 and 2015:

- CreateFibril builds models of protein fibrils at atomic resolution, based on previously published studies by its authors [40, 41].
- Fibril Mutant evaluates the stability of an amyloid based on a provided .pdb structure.
- MAPOR analyzes the mutational landscape of a protein using an energy function that accounts for Lennard-Jones, Coulomb, and solvation energies.
- SEMBA uses the same measures to analyze the binding affinity of amyloid proteins.

The programs require GROMACS, SCWRL4, and PDB2PQR to be installed. The modules are available for download from: http://amyloid.cs.mcgill.ca/.

3.17 APPNN

APPNN (Amyloidogenicity Propensity Prediction Neural Network) was published in 2015 [42]. It was developed in response to the growing number of experimentally validated amyloidogenic fragments. A larger amount of training data enables the use of more complex machine learning models, including artificial neural networks. The core of the method is a neural network using biochemical and physicochemical properties of amino acids. The analysis uses a sliding window of six amino acids. The input sequence is scanned and annotated as amyloidogenic if at least one such fragment is classified as amyloidogenic. APPNN is available as an R package from: http://cran.r-project.org/web/packages/appnn/index.html.
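In the sequence-level decision rule described above, a single positive window is enough to label the whole sequence. This rule can be written as a short reduction over windows. The window classifier is passed in as a function. The toy classifier in the usage example is purely hypothetical and stands in for APPNN's neural network.

```python
from typing import Callable, List


def positive_windows(sequence: str, classify_window: Callable[[str], bool],
                     size: int = 6) -> List[int]:
    """Return start indices of windows the classifier labels as amyloidogenic."""
    return [start for start in range(len(sequence) - size + 1)
            if classify_window(sequence[start:start + size])]


def is_amyloidogenic(sequence: str, classify_window: Callable[[str], bool],
                     size: int = 6) -> bool:
    """A sequence is positive if at least one window is positive."""
    return bool(positive_windows(sequence, classify_window, size))


def toy(window: str) -> bool:
    """Hypothetical window classifier used only for illustration."""
    return window.count("F") >= 2


print(positive_windows("AAAAAAFF", toy))  # [2]
```

Returning the positive window positions, not just a yes/no answer, keeps the information needed to locate the putative amyloidogenic region within the sequence.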

3.18 AmyloGram

AmyloGram, available since 2015 [43], combines an n-gram-based method with machine learning (Fig. 2). It was trained on 1465 sequences (421 amyloid-positive). Each sequence was represented by a wide range of features that could be relevant to amyloidogenicity. The feature set included the full amino acid alphabet, two standard physicochemical representations, and 18,535 unique encodings obtained automatically from clustering training sequences. This was followed by the generation of n-grams and the selection of the most informative alphabet using a method called Quick Permutation Test. Finally, a random forest classifier was employed to select the alphabet most representative of amyloid fragments. The method discovered the most informative encoding for amyloidogenicity, consisting of six subgroups of amino acids. Interestingly, the alphabet identified as the most efficient encoding did not belong to the set of classical physicochemical representations. Nevertheless, it proved meaningful with regard to amyloid propensity. The tool is available as a web server: http://www.smorfland.uni.wroc.pl/shiny/AmyloGram/.
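The n-gram representation can be illustrated by mapping residues onto a reduced alphabet and counting contiguous n-grams. The six-group alphabet below is a hypothetical example chosen for readability. It is not the encoding selected by AmyloGram. AmyloGram also uses gapped n-grams, which this sketch omits.

```python
from collections import Counter

# Hypothetical six-group reduced alphabet (NOT the encoding selected by AmyloGram).
GROUPS = {
    "1": "G",
    "2": "AVLIMC",
    "3": "FWY",
    "4": "STNQ",
    "5": "DEKRH",
    "6": "P",
}
RESIDUE_TO_GROUP = {aa: label for label, members in GROUPS.items() for aa in members}


def to_reduced_alphabet(sequence):
    """Replace each residue by the label of its group."""
    return "".join(RESIDUE_TO_GROUP[aa] for aa in sequence.upper())


def ngram_counts(sequence, n=2):
    """Count contiguous n-grams of the reduced sequence (gapped n-grams omitted)."""
    reduced = to_reduced_alphabet(sequence)
    return Counter(reduced[i:i + n] for i in range(len(reduced) - n + 1))


print(to_reduced_alphabet("KLVFFA"))  # 522332
```

Reducing the alphabet shrinks the n-gram feature space (6^2 = 36 possible bigrams instead of 20^2 = 400). This is one reason reduced encodings are attractive when training data are limited.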

Fig. 2 The main stages of classification by AmyloGram

3.19 AggScore

AggScore, published in 2018 [44], is another example of a structure-based method. It predicts aggregation-prone regions by identifying hydrophobic and charged patches on the protein surface. Based on these patches, a score for each surface region is calculated as a linear combination of hydrophobic and electrostatic terms. Higher values of the scoring function indicate higher aggregation propensity. The scoring function was trained on a benchmark set of 31 adnectin proteins. The method was designed for globular proteins. Nevertheless, the authors showed that it can also predict hexapeptide aggregation from predicted hexapeptide structures. AggScore is implemented as part of Schrödinger’s commercially available BioLuminate suite.
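A linear patch score of this kind reduces to a weighted sum per surface region. The sketch assumes that the hydrophobic and electrostatic terms for each patch have already been computed. The weights are placeholders, not the coefficients fitted on the adnectin set. Patch detection itself, the harder part of AggScore, is not shown. `SurfacePatch` and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SurfacePatch:
    name: str
    hydrophobic_term: float    # precomputed hydrophobic contribution of the patch
    electrostatic_term: float  # precomputed electrostatic contribution of the patch


def patch_score(patch: SurfacePatch, w_hydrophobic: float, w_electrostatic: float) -> float:
    """Linear combination of the two terms; higher means more aggregation-prone."""
    return w_hydrophobic * patch.hydrophobic_term + w_electrostatic * patch.electrostatic_term


def rank_patches(patches: List[SurfacePatch], w_hydrophobic: float,
                 w_electrostatic: float) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from the most to the least aggregation-prone."""
    scored = [(p.name, patch_score(p, w_hydrophobic, w_electrostatic)) for p in patches]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Keeping the weights as explicit arguments makes it clear that the ranking depends entirely on how the two terms are balanced. That balance is what a fitted model such as AggScore determines from training data.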

3.20 RFAmyloid

RFAmyloid, available since 2018 [45], is a machine learning method that uses protein fragments. It applies a random forest classifier that takes as input a set of sequence descriptors: physicochemical properties of amino acids, predicted structural features such as secondary structure or solvent-accessible area, and features derived from amino acid profiles. The tool is available as a web server: http://server.malab.cn/RFAmyloid/.

3.21 AgMata

AgMata, released in 2020 [46], is a statistical method for the identification of amyloidogenic regions using unsupervised machine learning. AgMata calculates the aggregation propensity by evaluating which regions of a given protein are capable of forming strong β-sheet-like interactions. It is based on statistical potentials which include biophysical information, encoding each position of a sequence with a set of features. Based on this description, it estimates the probability of pairing between given residues in a parallel and antiparallel manner. The interaction between positions is not explicitly dependent on the amino acid type but rather on the local environment of the amino acids. As an unsupervised method, it is independent of any training dataset. The source code in Python 2.7 is available from: https://bitbucket.org/bio2byte/agmata.

3.22 PATH

PATH (Prediction of Amyloidogenicity by Threading) [47] is a tool developed in 2020 that combines molecular modeling and machine learning techniques. Using comparative structure modeling, the query sequence of a short peptide is threaded onto amyloid structures representing seven experimentally confirmed steric zipper classes. The models are scored using several statistics-based and physics-based scoring functions, and these scores are then used as inputs to a logistic regression classifier. Apart from classifying fragments as amyloids or non-amyloids, the method also returns predicted structures of the aggregates formed by the given fragment. PATH is a computational pipeline based on Python and bash scripts, and it requires Modeller [48] and PyRosetta [49] to be installed. The package can be downloaded from: https://github.com/KubaWojciechowski/PATH.

4 Summary

Amyloid aggregation of proteins, which may lead to various amyloidoses, is still largely unexplored. Several questions still require extensive, large-scale research:

- the mechanisms underlying protein aggregation;
- the identification of mutations that are decisive in the process;
- the aggregation kinetics;
- most importantly, potential measures to control aggregation.

With currently available experimental methods alone, studies on this scale would not yet be feasible and would be excessively time-consuming and expensive. However, researchers have already shown that amyloid sequences have enough regularities to be modeled successfully in silico. Bioinformatics modeling has led to the development of effective computational tools and web services. These include amyloid databases with different profiles and amyloid predictors that either classify the potential amyloid propensity of amino acid sequences or model their structures. Such tools allow fast and effective selection of candidate amyloid peptides and proteins. More importantly, they can indicate sequence determinants of amyloidogenicity and help explain the mechanisms underlying aggregation. This may help to design mutations that steer amyloid fibrillation. The wide variety of modeling approaches reveals different aspects of amyloid processes, and the large number of currently available tools provides the means for effective consensus methods. Building on these results should yield better methods for understanding and controlling amyloids.

Acknowledgment

This work was partially supported by the National Science Centre, Poland, Grant 2019/35/B/NZ2/03997.

References

1. Lopez de la Paz M, Serrano L (2004) Sequence determinants of amyloid fibril formation. Proc Natl Acad Sci U S A 101:87–92
2. Thompson MJ, Sievers SA, Karanicolas J et al (2006) The 3D profile method for identifying fibril-forming segments of proteins. Proc Natl Acad Sci U S A 103:4074–4078
3. Pawlicki S, Le Bechec A, Delamarche C (2008) AMYPdb: a database dedicated to amyloid precursor proteins. BMC Bioinformatics 9:273
4. Beerten J, Van Durme J, Gallardo R et al (2015) WALTZ-DB: a benchmark database of amyloidogenic hexapeptides. Bioinformatics 31:1698–1700
5. Louros N, Konstantoulea K, De Vleeschouwer M et al (2020) WALTZ-DB 2.0: an updated database containing structural information of experimentally determined amyloid-forming peptides. Nucleic Acids Res 48(D1):D389–D393
6. Wozniak PP, Kotulska M (2015) AmyLoad: website dedicated to amyloidogenic protein fragments. Bioinformatics 31:3395–3397
7. Thangakani AM, Nagarajan R, Kumar S et al (2016) CPAD, curated protein aggregation database: a repository of manually curated experimental data on protein and peptide aggregation. PLoS One 11:e0152949
8. Rawat P, Prabakaran R, Sakthivel R, Mary Thangakani A, Kumar S, Gromiha MM (2020) CPAD 2.0: a repository of curated experimental data on aggregating proteins and peptides. Amyloid 27:128
9. Varadi M, De Baets G, Vranken WF et al (2018) AmyPro: a database of proteins with validated amyloidogenic regions. Nucleic Acids Res 46(D1):D387–D392
10. Goldschmidt L, Teng PK, Riek R et al (2010) Identifying the amylome, proteins capable of forming amyloid-like fibrils. Proc Natl Acad Sci U S A 107:3487–3492
11. Nastou KC, Tsiolaki PL, Iconomidou VA (2019) The amyloid interactome: mapping protein aggregation. Amyloid 26(Suppl 1):142–143
12. Fernandez-Escamilla AM, Rousseau F, Schymkowitz J, Serrano L (2004) Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat Biotechnol 22(10):1302–1306
13. Rousseau F, Schymkowitz J, Serrano L (2006) Protein aggregation and amyloidosis: confusion of the kinds? Curr Opin Struct Biol 16(1):118–126
14. Trovato A, Chiti F, Maritan A, Seno F (2006) Insight into the structure of amyloid fibrils from the analysis of globular proteins. PLoS Comput Biol 2(12):e170
15. Walsh I, Seno F, Tosatto SC, Trovato A (2014) PASTA 2.0: an improved server for protein aggregation prediction. Nucleic Acids Res 42:W301–W307
16. Conchillo-Solé O, de Groot NS, Avilés FX et al (2007) AGGRESCAN: a server for the prediction and evaluation of “hot spots” of aggregation in polypeptides. BMC Bioinformatics 8:65
17. de Groot NS, Aviles FX, Vendrell J, Ventura S (2006) Mutagenesis of the central hydrophobic cluster in Abeta42 Alzheimer’s peptide. Side-chain properties correlate with aggregation propensities. FEBS J 273(3):658–668
18. Zambrano R, Jamroz M, Szczasiuk A et al (2015) AGGRESCAN3D (A3D): server for prediction of aggregation properties of protein structures. Nucleic Acids Res 43(W1):W306–W313
19. Kuriata A, Iglesias V, Kurcinski M et al (2019) Aggrescan3D standalone package for structure-based prediction of protein aggregation properties. Bioinformatics 35(19):3834–3835
20. Kuriata A, Iglesias V, Pujols J et al (2019) Aggrescan3D (A3D) 2.0: prediction and engineering of protein solubility. Nucleic Acids Res 47(W1):W300–W307
21. Jamroz M, Kolinski A, Kmiecik S (2013) CABS-flex: server for fast simulation of protein structure fluctuations. Nucleic Acids Res 41:W427–W431
22. Kuriata A, Gierut AM, Oleniecki T et al (2018) CABS-flex 2.0: a web server for fast simulations of flexibility of protein structures. Nucleic Acids Res 46(W1):W338–W343
23. Kim C, Choi J, Lee SJ et al (2009) NetCSSP: web application for predicting chameleon sequences and amyloid fibril formation. Nucleic Acids Res 37:W469–W473
24. Tian J, Wu N, Guo J, Fan Y (2009) Prediction of amyloid fibril-forming segments based on a support vector machine. BMC Bioinformatics 10(Suppl 1):S45
25. Kawashima S, Pokarowski P, Pokarowska M et al (2008) AAindex: amino acid index database, progress report 2008. Nucleic Acids Res 36:D202–D205
26. Kotulska M, Unold O (2013) On the amyloid datasets used for training PAFIG - how (not) to extend the experimental dataset of hexapeptides. BMC Bioinformatics 14:351
27. Frousios KK, Iconomidou VA, Karletidi CM, Hamodrakas SJ (2009) Amyloidogenic determinants are usually not buried. BMC Struct Biol 9:44
28. Tsolis AC, Papandreou NC, Iconomidou VA, Hamodrakas SJ (2013) A consensus method for the prediction of ‘aggregation-prone’ peptides in globular proteins. PLoS One 8(1):e54175
29. Hamodrakas SJ, Liappa C, Iconomidou VA (2007) Consensus prediction of amyloidogenic determinants in amyloid fibril-forming proteins. Int J Biol Macromol 41(3):295–300
30. Galzitskaya OV, Garbuzynskiy SO, Lobanov MY (2006) Prediction of amyloidogenic and disordered regions in protein chains. PLoS Comput Biol 2(12):e177
31. Garbuzynskiy SO, Lobanov MY, Galzitskaya OV (2010) FoldAmyloid: a method of prediction of amyloidogenic regions from protein sequence. Bioinformatics 26(3):326–332
32. Zhang Z, Chen H, Lai L (2007) Identification of amyloid fibril-forming segments based on structure and residue-based statistical potential. Bioinformatics 23(17):2218–2225
33. O’Donnell CW, Waldispühl J, Lis M et al (2011) A method for probing the mutational landscape of amyloid structure. Bioinformatics 27(13):i34–i42
34. Zibaee S, Makin OS, Goedert M, Serpell LC (2007) A simple algorithm locates beta-strands in the amyloid fibril core of alpha-synuclein, Abeta, and tau using the amino acid sequence alone. Protein Sci 16(5):906–918. Erratum in: Protein Sci (2007) 16(6):1242
35. Maurer-Stroh S, Debulpaep M, Kuemmerer N et al (2010) Exploring the sequence determinants of amyloid structure using position-specific scoring matrices. Nat Methods 7(3):237–242
36. Emily M, Talvas A, Delamarche C (2013) MetAmyl: a METa-predictor for AMYLoid proteins. PLoS One 8(11):e79722
37. Gasior P, Kotulska M (2014) FISH Amyloid - a new method for finding amyloidogenic segments in proteins based on site specific co-occurrence of aminoacids. BMC Bioinformatics 15(1):54
38. Ahmed AB, Znassi N, Château MT, Kajava AV (2015) A structure-based approach to predict predisposition to amyloidosis. Alzheimers Dement 11(6):681–690
39. Bondarev SA, Bondareva OV, Zhouravleva GA, Kajava AV (2018) BetaSerpentine: a bioinformatics tool for reconstruction of amyloid structures. Bioinformatics 34(4):599–608
40. Smaoui MR, Orland H, Waldispühl J (2015) Probing the binding affinity of amyloids to reduce toxicity of oligomers in diabetes. Bioinformatics 31(14):2294–2302
41. Smaoui MR, Poitevin F, Delarue M et al (2013) Computational assembly of polymorphic amyloid fibrils reveals stable aggregates. Biophys J 104(3):683–693
42. Família C, Dennison SR, Quintas A, Phoenix DA (2015) Prediction of peptide and protein propensity for amyloid formation. PLoS One 10(8):e0134679
43. Burdukiewicz M, Sobczyk P, Rödiger S et al (2017) Amyloidogenic motifs revealed by n-gram analysis. Sci Rep 7(1):12961
44. Sankar K, Krystek SR Jr, Carl SM et al (2018) AggScore: prediction of aggregation-prone regions in proteins based on the distribution of surface patches. Proteins 86(11):1147–1156
45. Niu M, Li Y, Wang C, Han K (2018) RFAmyloid: a web server for predicting amyloid proteins. Int J Mol Sci 19(7):2071
46. Orlando G, Silva A, Macedo-Ribeiro S et al (2020) Accurate prediction of protein beta-aggregation with generalized statistical potentials. Bioinformatics 36(7):2076–2081
47. Wojciechowski JW, Kotulska M (2020) PATH prediction of amyloidogenicity by threading and machine learning. Sci Rep 10(1):7721
48. Sali A, Blundell TL (1994) Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 234(3):779–815
49. Chaudhury S, Lyskov S, Gray JJ (2010) PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics 26(5):689–691

Chapter 2 Protocols for Rational Design of Protein Solubility and Aggregation Properties Using Aggrescan3D Standalone

Aleksander Kuriata, Aleksandra E. Badaczewska-Dawid, Jordi Pujols, Salvador Ventura, and Sebastian Kmiecik

Abstract

Protein aggregation is a major hurdle in the development and manufacturing of protein-based therapeutics. The development of aggregation-resistant and stable protein variants can be guided by rational redesign using computational tools. Here, we describe the architecture and functionalities of the Aggrescan3D (A3D) standalone package for the rational design of protein solubility and aggregation properties based on three-dimensional protein structures. We present case studies of three therapeutic proteins, including antibodies, exploring the practical use of the A3D standalone tool. The case studies demonstrate that protein solubility can be easily improved by A3D prediction of non-destabilizing amino acid mutations at the protein surface.

Key words Protein aggregation, Structure-based design, Aggregation prediction, Protein design, Aggrescan3D, Computational modeling

1 Introduction

Protein-based therapeutics, such as antibodies, receptor decoys, and replacement enzymes, have already transformed the drug discovery field and have the potential to yield new therapies. Perhaps the most difficult challenge facing the development and manufacture of protein therapeutics is protein aggregation [1]. The experimental identification of protein regions responsible for aggregation can be expensive and time-consuming. Therefore, there is an increasing need for computational tools that can support these efforts and aid in the design of soluble protein variants [1]. In 2015, we introduced the Aggrescan3D (A3D) web server for the prediction of aggregation propensity in protein structures [2]. The method relies on an experimentally derived aggregation propensity scale for each natural amino acid [3], which has been successfully used in the sequence-based prediction method Aggrescan [4]. In addition, A3D uses structural information and allows for the identification of aggregation-prone residues on the protein surface. Through virtual mutations, A3D serves as an effective tool both for designing protein variants with increased solubility and for testing the impact of pathogenic mutations. The A3D server can also take into account the dynamic flexibility of protein structures, which may influence aggregation propensity. This is possible in the A3D dynamic mode, which uses the CABS-flex method for fast simulations of the flexibility of globular proteins [5, 6].

Since 2015, A3D has been successfully used in many studies aimed at enhancing protein solubility [7–10]. In 2019, following users’ requests, we introduced the updated 2.0 version of the A3D web server [11] and the A3D standalone application package [12]. The A3D 2.0 web server offers several new functionalities:

- simultaneous prediction of changes in protein solubility and stability upon mutation;
- an automated mutations tool that identifies high-scoring residues and suggests protein variants with optimized solubility;
- dynamic mode calculations for large and multimeric proteins;
- a RESTful service to incorporate A3D calculations into automatic pipelines;
- a new, enhanced web server interface.

The A3D standalone application is a fully functional implementation of the A3D 2.0 web server that is intended to work locally and introduces command line utilities. A3D standalone therefore addresses the important aspects of data privacy and user control over every stage of the modeling process. In this work, we briefly describe the A3D standalone functionalities (see Subheading 2), provide some basic examples of its usage (see Subheading 3), and demonstrate A3D applications for the rational design of antibodies and other therapeutic proteins (see Subheading 4). As demonstrated, A3D standalone is a flexible tool that can be adapted to a variety of needs and easily incorporated by users into their own pipelines.

Mai Suan Li et al. (eds.), Computer Simulations of Aggregation of Proteins and Peptides, Methods in Molecular Biology, vol. 2340, https://doi.org/10.1007/978-1-0716-1546-1_2, © Springer Science+Business Media, LLC, part of Springer Nature 2022

Aleksander Kuriata et al.

2 A3D Standalone Package

2.1 Installation Instructions and Requirements

Local installation of A3D standalone requires Python 2.7. We recommend using Anaconda, a scientific Python distribution that comes with pre-installed packages and the conda package manager. Conda makes installation easier and less error-prone. Installation of Anaconda is straightforward: download the installer for Python 2.7 from the Anaconda web page and follow the instructions below. A3D can also be installed in other ways; further instructions are provided in the A3D online repository (see Note 1). The base A3D functionality can be installed with one command (note: Windows users should use the Anaconda Prompt):

Design of Protein Solubility and Aggregation Properties


$ conda install -c lcbio aggrescan3d

Fig. 1 Aggrescan3D pipeline

To verify the installation, run:

$ aggrescan -i 2gb1 -w test_run -v 4

A3D comes with a built-in server-like app that can be operated through a web browser. We recommend Google Chrome and advise against Internet Explorer, which does not support the full functionality of the app. To verify the app installation, type:

$ a3d_server

On the first run, you will be prompted to install the packages needed for the server. Type “y” to install them; this is mandatory for the app to run. Re-run the command, and you will then be prompted for the FoldX location. We recommend downloading FoldX and providing its location at this point, although it can also be set up later via the UI. If the installation succeeds, the server should start and be available at localhost:5000 in your web browser. Note that the A3D program must keep running for the web server to function.

A3D has two main dependencies needed for its basic functionalities: FoldX [13, 14] and the CABS-flex package [5] (see the A3D pipeline in Fig. 1). FoldX is used for energy minimization of input structures, creation of mutant structures, and stability calculations. The CABS-flex standalone package is required in the dynamic mode to generate a set of models reflecting protein flexibility. Both packages can be downloaded free of charge by academic users (for download instructions, see Note 1).

2.2 A3D Input Commands

The only required input to A3D is a protein structure in PDB format. Table 1 introduces the basic input and setup commands. Additional technical tips are provided in Note 2.

2.3 Mutation Commands

A3D allows testing the effect of mutations on the input structure. Mutations can be introduced (1) manually, where the user specifies a residue and a mutation type, or (2) automatically, on the most aggregation-prone residues identified by the A3D analysis. Mutants are generated using the FoldX program (the --foldx option is required). A3D analysis is then performed on each mutant, including calculation of the energetic effect of the mutation. Table 2 introduces the mutation commands.

2.4 Dynamic Mode Commands

In the dynamic mode, A3D uses the CABS-flex tool [5] for fast simulations of protein flexibility. CABS-flex generates several protein models reflecting protein flexibility [15, 16], and a full static-mode A3D analysis is performed on each of the models. A3D takes full advantage of the CABS-flex package and allows users to tailor the simulation to their needs. Table 3 describes the dynamic mode commands.

2.5 Output Commands

Output commands mostly concern logging and debugging, and the calculated results remain largely the same regardless of these settings. We recommend using a high verbosity value to gain insight into how the program operates. Table 4 lists the output-related commands.

3 Examples

3.1 Basic Run in Static Mode

Once the installation is done, A3D can be used in two ways: through the command line or through the app. Note that on Windows all commands need to be run in the Anaconda Prompt. Below, we describe how to run a simple job using both methods. Using the command line, run (for the command syntax, see Tables 1 and 4):

$ aggrescan -v 4 -w a3d_simple -i 2gb1
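When A3D runs are scripted, it can help to assemble the command line programmatically. The sketch below builds `aggrescan` argument lists using only the flags listed in Tables 1–4 (`-i`, `-w`, `-v`, `-f`, `-m`, `-am`, `-d`, `--n_models`). It is a hypothetical helper, not part of the A3D package, and it makes several assumptions:

- Python 3.8 or later, for `shlex.join`;
- `aggrescan` is on the PATH, for example inside the conda environment used for installation;
- FoldX has already been configured when mutations are requested.

```python
import shlex
import subprocess


def build_a3d_command(protein, work_dir, verbosity=4, foldx=False,
                      mutation=None, auto_mutation=None,
                      dynamic=False, n_models=None):
    """Assemble an `aggrescan` argument list from flags documented in Tables 1-4."""
    cmd = ["aggrescan", "-i", protein, "-w", work_dir, "-v", str(verbosity)]
    # FoldX repair (-f) is required whenever a mutation option is used.
    if foldx or mutation or auto_mutation:
        cmd.append("-f")
    if mutation:
        cmd += ["-m", mutation]
    if auto_mutation:
        # e.g. "2 2 1A 2B 3C"; passed as one argument, like the quoted shell form
        cmd += ["-am", auto_mutation]
    if dynamic:
        cmd.append("-d")
        if n_models is not None:
            cmd += ["--n_models", str(n_models)]
    return cmd


def run_a3d(cmd):
    """Run A3D and raise CalledProcessError if it exits with an error."""
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


print(shlex.join(build_a3d_command("2gb1", "a3d_simple")))
# aggrescan -i 2gb1 -w a3d_simple -v 4
```

Calling `run_a3d(build_a3d_command("2gb1", "a3d_simple"))` would reproduce the basic run above, with the flags in a different order. `build_a3d_command("2gb1", "a3d_dyn", dynamic=True, n_models=8)` would request a dynamic-mode run with eight CABS-flex models. Passing arguments as a list rather than a single string avoids shell quoting problems with the space-separated `-am` value.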


Table 1 A3D input and setup commands

Command | Description | Example
-i, --protein PDB | Allows providing the input protein structure as a PDB code (fetched from www.rcsb.org) or as a text file in pdb or pdb.gz format | -i 2GB1; -i filepath
-C, --chain CHAIN_ID | Limits the A3D analysis to the specific protein chain(s) provided as valid chain identifiers. All other chains will be ignored | -C A
-c, --config_file CONFIG_FILE | Allows providing a configuration file that sets up the program’s parameters. Note that commands from the command line override the config file | -c filepath
-D, --distance VALUE | Sets up the distance value used in A3D score calculations (see Subheading 3.1). Default: 10 | -D 10
-f, --foldx | Introduces a structure repair by the FoldX program before the A3D stage. The flag is also required when mutation options are used | -f filepath; -f
-n, --naccess | Uses naccess instead of FreeSASA for solvent-accessible surface area calculation | -n

Table 2 A3D mutation setup commands

Command | Description | Example
-m, --mutate VALUE | Provides the mutation code(s) applied to the input protein. The syntax for the option: -m. See Note 2 for additional tips | -m MW1A
-am, --auto_mutation OPTIONS: N M [] | Predicts more soluble mutants (into Glu, Asp, Arg, Lys) for N automatically selected residues. The second argument, M, is the number of cores used. An optional argument is a list of residues for which mutations are excluded. The syntax for the option: -am ‘N M [list_of_excluded_residues]’. See Note 2 for additional tips | -am ‘2 2 1A 2B 3C’

This command keeps the job in an a3d_simple folder, where one can check all the output files provided by A3D: l

A3D.csv—text file with A3D score results consists of columns: (1) Protein—an internal name used by the program, (2) Chain—one-letter chain ID, (3) Residue—residue’s index, (4) Residue_name—one-letter code identifying the amino acid, (5) Score—Aggrescan3D score.

l

Score plot for each chain in PNG format, chainID.png.

22

Aleksander Kuriata et al.

Table 3 A3D dynamic mode commands

Command

Description

Example

-d, --dynamic

Uses the CABS-flex tool for the protein flexibility simulation and aggregation propensity analysis on generated models

-d

--n_models VALUE

Chooses the number of models that CABS-flex will generate for the analysis

--n_models 8

--cabs_dir PATH

Specifies a path for the CABS-flex program. This is mostly intended as a developer option

--cabs_dir /PATH/CABS

--cabs_config

Specifies a path for a CABS-flex configuration file that gives the user more control of the CABS-flex simulation

--cabs_config /PATH/CABS_config.ini

Table 4 A3D output commands

Command

Description

Example

-v, --verbose VALUE

Decides how much information will be reported during the program run. Choose between: 0 (Critical), 1 (Warning), 2 (Info, default), 3 (Log files), 4 (Debug). See Note 2 for additional information

-v 4

-r, --remote

Disables the display of logging to the console and stores the logs in the working directory in the Aggrescan.log file instead

-r

-M, --movie FORMAT

If PyMOL is installed, a short movie of the protein is saved. The supported formats are “webm” and “mp4”

-M webm

-w, --work_dir PATH

Saves all the results and temporary files to the specified PATH (note: this will overwrite existing results if the same path is specified multiple times)

-w test_run

• PDB files: input.pdb and output.pdb (the result of the simulation, with the B-factor field replaced by the A3D score).

• config.ini: configuration file that allows the user to re-run the simulation with the same settings or with some options changed (see Note 2).
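The A3D.csv file can also be screened outside the app. The following Python snippet is a minimal sketch, not part of the A3D package: it assumes a comma-separated file whose header row uses the five column names listed above (Protein, Chain, Residue, Residue_name, Score), and it lists, per chain, the residues with a positive A3D score, which is the criterion for aggregation-prone residues used in the case studies of this chapter. The function name and sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def aggregation_prone_residues(csv_text, threshold=0.0):
    """Group residues whose A3D score exceeds `threshold` by chain.

    Returns {chain: [(residue, residue_name, score), ...]}, each list sorted
    from the most to the least aggregation-prone residue. Assumes a header row
    named Protein, Chain, Residue, Residue_name, Score.
    """
    hits = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        score = float(row["Score"])
        if score > threshold:
            hits[row["Chain"]].append((row["Residue"], row["Residue_name"], score))
    for residues in hits.values():
        residues.sort(key=lambda item: item[2], reverse=True)
    return dict(hits)


# Hypothetical A3D.csv excerpt: the G10 and V11 scores are taken from the
# fr-GFP profile in Table 5; the third row is invented for illustration.
sample = (
    "Protein,Chain,Residue,Residue_name,Score\n"
    "input,A,10,G,0.527\n"
    "input,A,11,V,1.359\n"
    "input,A,5,E,-1.500\n"
)

print(aggregation_prone_residues(sample))
# {'A': [('11', 'V', 1.359), ('10', 'G', 0.527)]}
```

For a real run, the text of a3d_simple/A3D.csv can be passed instead of the sample string, provided its header matches the assumption above; raising `threshold` narrows the list to the strongest hot spots.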

Design of Protein Solubility and Aggregation Properties

Fig. 2 Web interface of the A3D standalone. The customization of the simulation run may be set within the “Input structure” and “Options” panels marked in red

To run the same simulation in the app, start the server (a3d_server) and do not close that terminal window. Go to localhost:5000 in your browser and use the interface to set up the parameters of the run. Click the “Run” button to start the program. We provide a sample setup in Fig. 2. After the run ends, A3D results will be displayed in several tabs, providing the details of the prediction and several interactive tools to inspect A3D scores and visualize the protein structure. The user’s experience when running the A3D app should be similar to that of our A3D 2.0 web server. For the analysis of the results:

• Plot tab presents the interactive per-chain plots of A3D scores for each residue.

• Score tab provides score statistics and a score overview for each residue in tabular format.

• Structure tab provides a visualization of the protein structure with coloring based on the A3D score. Residues can be labeled, and snapshots can be taken, which are all available in the Gallery tab. The example results are presented in Fig. 3.

3.2 Automated Mutations Run

Fig. 3 The example protein structure analyzed in the Structure tab of the web interface of the A3D standalone. Use the Show labels button to select labels for aggregation-prone residues. Click Take snapshot to save the current view in the Gallery tab. The protein structure can be freely rotated by holding the left mouse button and zoomed in and out with the mouse wheel

The A3D automated mutations feature automatically invokes two consecutive modeling steps. In the first, A3D analyzes the input structure in the static mode and identifies the most aggregation-prone residues (two by default). In the second step, A3D creates mutants in which the most aggregation-prone residues are mutated to each of the four charged amino acids: D, E, K, and R (if two aggregation-prone residues are identified in the first step, then 2 × 4 = 8 mutants are created). It has been demonstrated that the presence of charged residues attenuates the aggregation propensity of hydrophobic stretches by provoking electrostatic repulsions, thus acting as gatekeepers of aggregation [17]. The automated mutations feature (-am option) creates a ranked list of point mutations, where both the solubilizing (AvgScore, AvgScoreDiff) and energetic (EnergyDiff, ΔΔG) effects are considered. More negative A3D average scores indicate higher predicted solubility, while negative energetic effects suggest thermodynamically stabilizing mutations. This feature requires providing a path to the FoldX force field algorithm. Using the command line (the command syntax is described in Table 2):

$ aggrescan -v 4 -w a3d_enhance -i 2gb1 -am -f /path/to/foldx
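The 2 × 4 = 8 count follows directly from pairing each selected residue with each of the four charged amino acids. The Python sketch below reproduces only this combinatorics, writing each variant in the mutation code format illustrated in Note 3 (wild-type residue, new residue, residue index, chain ID, as in “VM1A”); it is not A3D's internal implementation, and the function name and input residues are hypothetical.

```python
# The four charged amino acids tried by the automated mutations feature.
CHARGED = ("D", "E", "K", "R")


def candidate_mutations(residues):
    """Return one mutation code per (residue, charged amino acid) pair.

    `residues` is a list of (wild_type, index, chain) tuples. Each code is built
    as wild-type residue + new residue + residue index + chain ID, following the
    "VM1A" example in Note 3.
    """
    return [
        f"{wild_type}{new}{index}{chain}"
        for wild_type, index, chain in residues
        for new in CHARGED
    ]


# Two hypothetical aggregation-prone residues give 2 x 4 = 8 mutants.
mutants = candidate_mutations([("V", 11, "A"), ("Y", 39, "A")])
print(len(mutants))  # 8
print(mutants[:4])   # ['VD11A', 'VE11A', 'VK11A', 'VR11A']
```

Assuming the Note 3 format also applies to -m, the same codes could be supplied as user-indicated mutations to check a specific subset instead of the full automated screen.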

The automated mutations run will generate several additional result files (the format of mutation codes is explained in Note 3):

• Mutations_summary.csv: CSV file containing the energetic and solubility effect for each created mutant.

• .csv: CSV file with A3D scores of each residue for the specific mutant.

• .pdb: PDB output file with the mutant’s structure.

• .png: plot of A3D score for each mutant of the specific residue with the wild type as a baseline (PNG format).

• .svg: plot of A3D score for each mutant of the specific residue with the wild type as a baseline (SVG format).

Fig. 4 The Automated mutations tab provides similar functionality to the Structure tab but for all the mutants. The additional interactive Score profile plot shows the differences between the mutants. Click on the legend to show/hide a selected mutant, mouse over the plot to see the details of each residue, and use the tool at the bottom to save the plot

To run the same simulation in the app, select the Enhance protein solubility and Stability calculation options in the app interface (see Fig. 2; the interface also allows providing a path to the FoldX force field). After clicking the “Run” button, a new window will appear where certain amino acids can be frozen, i.e., excluded from the mutation procedure. To run the program, click “Save changes and submit.” A3D models multiple mutants with solubilizing substitutions in the aggregation-prone regions of the initial structure. These structures are re-analyzed, and the predictions are reported in the new Automated mutations tab. There, users can compare the impact on the aggregation propensity (A3D score) and protein stability (ΔΔG) for each of the mutants and visualize their 3D structures (see Fig. 4). Finally, we provide illustrative examples of using the automated mutation feature in “Case Studies” (Subheadings 4.1, 4.2, and 4.3).

3.3 Dynamic Mode Run

To gain insights into the effect of protein flexibility on the aggregation propensity, the dynamic mode can be used. As mentioned earlier, CABS-flex needs to be installed for this option to work. For the command line, add -d to the basic run example provided in Subheading 3.1 (for the command syntax see Table 3):

$ aggrescan -v 4 -w a3d_dynamic -i 2gb1 -d


Please note that the dynamic mode and automated protein enhancement are incompatible. For the app, tick the Dynamic mode box. A new tab will appear on the app’s results page, offering functionality very similar to the automated mutations output; instead of mutants, each flexible model can be analyzed in detail and compared. A plot of the RMSF (the measure of protein structure fluctuations during the CABS-flex simulation) will also appear at the bottom of the Dynamic tab, and additional files will be generated in the working directory:

• averages: JSON-like formatted file with the Aggrescan3D score for each of the CABS-flex generated models.

• CABSflex_rmsf.csv: tab-separated file with two columns, the residue ID (chain letter + residue index, for example, A13) and the RMSF score calculated by CABS-flex.

• CABSflex_rmsf.png: a plot of the RMSF.

• models.tar.gz or model_x.pdb: all the models generated by CABS-flex (if the GUI is used, the tar.gz archive is extracted).

• stats.tar.gz or model_x.csv: Aggrescan3D analysis for each of the CABS-flex models. The file is formatted similarly to A3D.csv. If the GUI is used, the .tar.gz archive is extracted.
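The most flexible residues can be pulled from CABSflex_rmsf.csv with a few lines of Python. This is a sketch based only on the description above (tab-separated columns holding a residue ID such as A13 and its RMSF); it additionally assumes a one-character chain ID and simply skips any line it cannot parse, such as a possible header. The function name and sample values are hypothetical.

```python
def top_flexible_residues(tsv_text, n=3):
    """Return the n most flexible residues as (chain, index, rmsf) tuples.

    Expects tab-separated lines holding a residue ID (e.g. A13) and an RMSF
    value. Assumes a one-character chain ID; lines that cannot be parsed
    (such as a header) are skipped.
    """
    records = []
    for line in tsv_text.splitlines():
        fields = line.strip().split("\t")
        if len(fields) != 2:
            continue  # blank or malformed line
        residue_id, value = fields
        try:
            record = (residue_id[0], int(residue_id[1:]), float(value))
        except (ValueError, IndexError):
            continue  # header or non-numeric line
        records.append(record)
    records.sort(key=lambda rec: rec[2], reverse=True)
    return records[:n]


# Hypothetical CABSflex_rmsf.csv content (values invented for illustration).
sample = "A1\t2.31\nA2\t1.05\nA13\t0.42\nA14\t3.87\n"
print(top_flexible_residues(sample, n=2))  # [('A', 14, 3.87), ('A', 1, 2.31)]
```

Cross-referencing this list with the aggregation-prone residues of each model helps separate residues that are exposed only in flexible regions from those exposed in the static structure.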

An example of using the dynamic mode is provided in Subheading 4.3.

3.4 Managing Projects

All the projects run with the A3D app can be found in the Manage project tab, which allows users to delete or re-run projects (see Fig. 5). There is also an option to add projects that were run manually from the command line to the Project Manager and benefit from its visualization tools. To do so, click on the “Add a project” button and select the config.ini file that was generated by the command-line run.

Fig. 5 A3D project manager. It contains a list of projects, for which results can be accessed by clicking on a project name. If a project was run manually, it can still be added to the manager interface

4 Case Studies

Aberrant self-assembly into nonfunctional aggregates is an intricate molecular process with toxic consequences associated with the onset of different human pathologies [18]. Protein aggregation is also a major and expensive drawback in the manufacturing and storage of protein-based therapies, draining production pipelines and restricting the development of otherwise promising biotherapies [19, 20]. Therefore, a large effort has been made to develop efficient strategies for the rational design of soluble and stable biotherapeutics, particularly antibodies [21], and for the early-stage detection of aggregation-prone regions (APRs). In the case studies below, we discuss the use of the structure-based A3D automated mutation protocol to design variants with improved solubility and/or stability without compromising their activity. The case studies employed in this chapter include the green fluorescent protein (GFP) (see Subheading 4.1); the heavy chain variable domain (VH) of the human DP47 antibody germline (see Subheading 4.2); and the Fab domain of the monoclonal antibody bevacizumab (Avastin®, Genentech) (see Subheading 4.3). As demonstrated, comparison of theoretically calculated scores with experimental measurements for selected mutants shows that A3D can be an effective and low-cost tool, supporting the rational design of protein-based drugs free from noxious aggregation processes.

4.1 Rational Design of Soluble Variants of the Green Fluorescent Protein

For the analysis of GFP, we used a crystal structure of the folding-optimized GFP (fr-GFP, PDB: 2B3Q [22]). The A3D predictions were obtained using the A3D static mode with automated mutation (-am option) for the ten most aggregation-prone residues identified by the tool:

$ aggrescan -i 2B3Q.pdb -w GFP_static -v 4 -f ~/PATH/FoldX/foldx4 -D 10 -am '10 4'

The A3D analysis for fr-GFP (see left column of Fig. 6) indicated several residues with significant aggregation propensity, reported by positive A3D scores, incorporated in a single aggregation-prone patch at the protein surface (V11, G10, Y39, L221) and a lesser APR formed by the isolated valine V176 (see protein surface in Fig. 6). All these residues, and the others that constituted the set of the ten residues with the highest A3D scores, were virtually substituted with charged amino acids. The energetic and solubilizing effects of these mutations are summarized in Table 5. Out of all possible variations (saved in the Mutations_summary.csv file), the tool automatically selected several of the most stabilizing mutations with optimized solubility (highlighted in Table 5). Green fluorescent protein is an example of a protein that contains a modest aggregation-prone region formed by only a few exposed residues. For this reason, when the user requests the A3D screening of virtual mutations on a larger number of residues, only some of them will have a significant aggregation-prone score, while others may be automatically selected as the most beneficial, in terms of solubility and stability, despite not being involved in the structural APRs. That is the case here. As presented in Table 5, among the ten selected residues, the highest ranked virtual mutants (in bold) correspond to residues with the lowest A3D profile scores (T62, A206, V176, T38). Moreover, T62 and V68 do not contribute to an exposed APR, as they are part of the α-helix that runs through the center of the β-barrel. F223 and G10 are also important for maintaining the stability of the protein’s spatial structure, which is reflected in the adverse energetic effect (ΔΔG) of all their virtual mutations. Without excluding such residues from the analysis, the user may turn a neutral amino acid into an extra-soluble residue while leaving other, genuinely aggregation-prone exposed amino acids unchanged.

Fig. 6 A3D prediction of residue aggregation properties in fr-GFP protein. The left column shows the A3D aggregation profile plot. The right column shows the crystal structure of fr-GFP (PDB: 2B3Q). The protein surface is colored according to A3D score, where blue indicates soluble residues and predicted aggregation-prone residues are indicated in red shades


Table 5 A3D static mode prediction of the automatically generated mutants (-am option) in fr-GFP protein. The values in the table correspond to the protein stability (EnergyDiff, stabilizing for ΔΔG < –0.5, marked in green) and solubility (AvgScore, AvgScoreDiff) parameters. The residues are ordered in descending A3D score, and the highest ranked virtual mutants are bolded

EnergyDiff of mutation to X [kcal/mol] by FoldX; AvgScore / AvgScoreDiff of mutation to X by A3D

Residue id | A3D profile score | EnergyDiff D | EnergyDiff E | EnergyDiff K | EnergyDiff R | AvgScore / AvgScoreDiff D | AvgScore / AvgScoreDiff E | AvgScore / AvgScoreDiff K | AvgScore / AvgScoreDiff R
V11 | 1.359 | 1.079 | 0.615 | –0.065 | 0.182 | –1.090 / –0.062 | –1.089 / –0.061 | –1.080 / –0.051 | –1.082 / –0.054
Y39 | 0.901 | 0.971 | 0.462 | 0.160 | 0.313 | –1.070 / –0.042 | –1.072 / –0.044 | –1.072 / –0.043 | –1.060 / –0.032
G10 | 0.527 | 0.025 | 2.448 | 0.968 | 0.836 | –1.060 / –0.032 | –1.066 / –0.038 | –1.063 / –0.035 | –1.068 / –0.040
L221 | 0.351 | 1.468 | 0.745 | 0.033 | 0.457 | –1.049 / –0.021 | –1.069 / –0.041 | –1.056 / –0.027 | –1.063 / –0.035
T38 | 0.176 | 0.186 | 0.123 | –0.317 | –0.882 | –1.058 / –0.030 | –1.055 / –0.027 | –1.046 / –0.018 | –1.056 / –0.028
V176 | 0.141 | 0.597 | 0.672 | –0.157 | –0.176 | –1.101 / –0.073 | –1.076 / –0.048 | –1.076 / –0.048 | –1.082 / –0.053
F223 | 0.016 | 2.344 | 1.517 | 1.098 | 0.805 | –1.070 / –0.042 | –1.071 / –0.043 | –1.065 / –0.036 | –1.061 / –0.033
A206 | 0.008 | –0.149 | –0.361 | –0.919 | –0.730 | –1.065 / –0.037 | –1.063 / –0.035 | –1.067 / –0.039 | –1.074 / –0.046
V68 | 0.003 | 2.463 | 1.374 | 1.352 | 1.692 | –1.038 / –0.010 | –1.037 / –0.009 | –1.035 / –0.007 | –1.033 / –0.004
T62 | 0.000 | –0.208 | –0.345 | 0.961 | 0.478 | –1.036 / –0.008 | –1.036 / –0.008 | –1.029 / –0.001 | –1.025 / –0.003
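For the three most aggregation-prone exposed residues, Table 5 can be read with a simple rule: keep the substitutions that A3D predicts to be more soluble (negative AvgScoreDiff) and, among them, take the one with the lowest FoldX EnergyDiff. The Python sketch below applies this rule to the Table 5 values for V11, Y39, and L221. The rule is our illustrative heuristic, not the ranking implemented in A3D; with these numbers it selects Lys at all three positions.

```python
# (EnergyDiff [kcal/mol], AvgScoreDiff) per substitution, copied from Table 5
# for the three most aggregation-prone exposed fr-GFP residues.
TABLE5 = {
    "V11": {"D": (1.079, -0.062), "E": (0.615, -0.061), "K": (-0.065, -0.051), "R": (0.182, -0.054)},
    "Y39": {"D": (0.971, -0.042), "E": (0.462, -0.044), "K": (0.160, -0.043), "R": (0.313, -0.032)},
    "L221": {"D": (1.468, -0.021), "E": (0.745, -0.041), "K": (0.033, -0.027), "R": (0.457, -0.035)},
}


def pick_substitution(options):
    """Illustrative rule: among substitutions predicted to be more soluble
    (AvgScoreDiff < 0), return the one with the lowest EnergyDiff.

    Raises ValueError if no substitution is predicted to be more soluble.
    """
    soluble = {aa: values for aa, values in options.items() if values[1] < 0}
    return min(soluble, key=lambda aa: soluble[aa][0])


choices = {residue: pick_substitution(options) for residue, options in TABLE5.items()}
print(choices)  # {'V11': 'K', 'Y39': 'K', 'L221': 'K'}
```

Because every substitution in these three rows is predicted to be more soluble, the choice here is driven entirely by the energetic term, which is why the positively charged, least destabilizing option wins.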

Since the intention in this study is to attenuate the aggregation propensity of the protein, we need to target only the most aggregation-prone residues (V11, Y39, L221) first and then identify the amino acid variations that maximize protein solubility and stability. Note that these residues would also be among those selected if the user limited the A3D automatic screening to the 3–5 top-scored residues (default: 2). For these residues, A3D scores indicate that mutations to any of the charged amino acids increase the protein’s solubility, while their energetic effects may be dramatically different. The introduction of negatively charged residues may have a destabilizing effect (especially for Asp mutants), which calls for special attention to the thermodynamic stability of the protein structure. The positively charged mutants may be significantly more soluble and similarly stable (especially Lys variants) compared to wild-type GFP. Other studies show that the desired effect of increased solubility and stabilization can often only be achieved by the simultaneous introduction of several point mutations [23–25]. Such combinations can be assessed using A3D with user-indicated mutations (-m option). Such exercises were presented in our previous work [26], where the effects of cumulative mutations at aggregation-prone residues detected by A3D were analyzed and confirmed experimentally, identifying the triple mutant GFP/KKK as a folded, stable, and the most soluble variant (PDB: 6FWW).

4.2 Rational Design of Soluble Variants of a Single-Domain VH Antibody

For the analysis of DP47 VH, we used a homology model of DP47 VH that was created by Swiss-Model [27] based on the 6GHG [28] template with 98.21% sequence identity, a GMQE (Global Model Quality Estimation) of 0.99, and a QMEAN Z-score of 1.05. More soluble variants of the protein were designed using the A3D default static mode with automated mutation (-am option) for the ten most aggregation-prone residues identified by the tool:

$ aggrescan -i model.pdb -w DP47_static -v 4 -f ~/PATH/FoldX/foldx4 -D 10 -am '10 4'

The A3D score profile for DP47 VH (see left column of Fig. 7) indicated two main groups of aggregation-prone residues: four tyrosines (Y30, Y57, Y98, Y103) and four leucines (L3, L9, L43, and L109). In addition to these, G8, V91, and S97 also scored significantly. The protein surface (see right column of Fig. 7), colored as a function of A3D score, shows that all the indicated residues are exposed and contribute to structural aggregation-prone regions. The strongest signal is visible on the APR patch formed by G8, L9, V91, and L109, while L3 forms an APR consisting of an isolated amino acid. Finally, the ten residues ranked highest by A3D were virtually mutated to charged amino acids. The most stabilizing point substitutions with increased protein solubility are presented in Table 6.


Fig. 7 A3D prediction of residue aggregation properties in DP47 VH antibody. The left column shows the A3D aggregation profile plot. The right column shows the structure of the DP47 model (obtained from Swiss-Model based on the homology template PDB: 6GHG). The protein surface is colored according to A3D score, where blue indicates soluble residues and predicted aggregation-prone residues are indicated in red shades

As presented in Table 6, each of the virtual mutants is predicted to significantly improve the antibody solubility. Despite this generalized protection against aggregation, substitutions to negatively charged amino acids have a destabilizing impact. In contrast, substitutions to either Lys or Arg are predicted to be neutral or stabilizing, with the exception of G8, L43, and W104. While any of the substitutions of L109, L9, or S97 to a positively charged amino acid might be further considered as a reliable candidate for redesigning DP47 VH solubility, we would not include Y57, Y98, or Y103 in the study, although A3D suggests some valid modifications. This is because the aggregation-prone regions containing these tyrosines overlap to a large extent with all three CDRs [26] (CDR1: A22–S33, CDR2: I49–Y57, CDR3: A96–Y104) of the VH, the regions of antibodies responsible for binding antigens. Protein regions involved in protein–protein interfaces tend to be enriched in hydrophobic residues; they are therefore usually predicted as APRs by A3D and may be automatically selected by A3D for solubilizing mutations. Attenuating the hydrophobicity of such important regions might lead to decreased affinity, poor specificity, and ultimately a loss of protein function, all of them undesired outcomes of reshaping the DP47 VH surface. This is a second point of special concern when redesigning protein surfaces: avoiding the creation of variants mutated in functionally involved regions. Aggregation-prone residues that are part of the complementarity-determining regions (CDRs) of an antibody (especially functionally relevant ones) may be manually excluded from the virtual mutation study by using the optional argument to the -am option, provided as a list of residues that will not be considered for mutations:

$ aggrescan -i model.pdb -w DP47_static -v 4 -f ~/PATH/FoldX/foldx4 -D 10 -am '10 4 30A 57A 98A 103A'

Table 6 A3D static mode prediction of the automatically generated mutants (-am option) in DP47 VH antibody. The values in the table correspond to the protein stability (EnergyDiff, stabilizing for ΔΔG < –0.5; none observed) and solubility (AvgScore, AvgScoreDiff) parameters. The residues are ordered in descending A3D score, and the highest ranked virtual mutants are bolded. Residues located in CDRs are marked with an asterisk

EnergyDiff of mutation to X [kcal/mol] by FoldX; AvgScore / AvgScoreDiff of mutation to X by A3D

Residue id | A3D profile score | EnergyDiff D | EnergyDiff E | EnergyDiff K | EnergyDiff R | AvgScore / AvgScoreDiff D | AvgScore / AvgScoreDiff E | AvgScore / AvgScoreDiff K | AvgScore / AvgScoreDiff R
L109 | 1.492 | 1.036 | 0.466 | –0.056 | –0.264 | –0.557 / –0.126 | –0.562 / –0.130 | –0.556 / –0.124 | –0.555 / –0.123
L9 | 1.434 | 0.550 | 0.350 | –0.119 | 0.09211 | –0.535 / –0.103 | –0.536 / –0.104 | –0.532 / –0.100 | –0.536 / –0.104
*Y98 | 0.998 | 0.118 | 0.268 | –0.053 | 0.490 | –0.536 / –0.104 | –0.537 / –0.105 | –0.532 / –0.100 | –0.537 / –0.105
*Y57 | 0.848 | 1.070 | 0.576 | 0.038 | 0.026 | –0.529 / –0.097 | –0.527 / –0.095 | –0.528 / –0.096 | –0.525 / –0.093
V91 | 0.838 | 2.419 | 1.143 | 0.136 | 0.235 | –0.473 / –0.041 | –0.469 / –0.037 | –0.476 / –0.044 | –0.483 / –0.051
G8 | 0.831 | 2.042 | 2.840 | 2.0624 | 2.257 | –0.482 / –0.050 | –0.486 / –0.054 | –0.483 / –0.051 | –0.487 / –0.055
S97 | 0.705 | 1.811 | 0.038 | 0.014 | 0.077 | –0.446 / –0.014 | –0.453 / –0.021 | –0.457 / –0.025 | –0.470 / –0.039
W104 | 0.575 | 4.822 | 3.769 | 2.813 | 2.502 | –0.469 / –0.038 | –0.486 / –0.054 | –0.479 / –0.047 | –0.509 / –0.077
*Y103 | 0.565 | 0.495 | 0.201 | –0.066 | 0.162 | –0.512 / –0.080 | –0.501 / –0.070 | –0.504 / –0.072 | –0.535 / –0.103
L43 | 0.553 | 2.211 | 1.395 | 0.853 | 1.088 | –0.529 / –0.097 | –0.534 / –0.102 | –0.527 / –0.095 | –0.541 / –0.109
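The exclusion list in the DP47 command with frozen residues can be derived from the CDR boundaries given in the text (CDR1: A22–S33, CDR2: I49–Y57, CDR3: A96–Y104). The Python helper below is hypothetical and not part of A3D: it keeps the candidate residues that fall inside a CDR and formats the -am argument as N, M, and then residue index plus chain ID, following the examples in this chapter.

```python
# CDR boundaries of DP47 VH as given in the text (inclusive residue indices).
CDRS = {"CDR1": (22, 33), "CDR2": (49, 57), "CDR3": (96, 104)}


def in_cdr(index):
    """True if the residue index lies inside any CDR."""
    return any(start <= index <= end for start, end in CDRS.values())


def am_argument(n_residues, n_cores, candidates, chain="A"):
    """Format an -am argument that freezes the candidate residues lying in CDRs.

    `candidates` are residue indices the user considers for exclusion;
    only those inside a CDR are written to the exclusion list.
    """
    excluded = [f"{index}{chain}" for index in candidates if in_cdr(index)]
    return " ".join([str(n_residues), str(n_cores), *excluded])


# The four tyrosines highlighted by A3D; all of them fall inside a CDR.
print(am_argument(10, 4, [30, 57, 98, 103]))  # 10 4 30A 57A 98A 103A
```

The returned string is what goes between the quotes after -am; residues outside the CDRs (for example, L9 or L109) stay available for solubilizing mutations.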

4.3 Rational Design of the Fab Domain of the Therapeutic Monoclonal Antibody

For the A3D analysis of the Fab domain of the therapeutic monoclonal antibody, we used the crystal structure PDB: 1BJ1 [29], removing all chains except H and L. First, we carried out the static mode analysis (stage 1) with automated mutations to identify the top-ranked aggregation-prone residues of the light (L) and heavy (H) chains:

$ aggrescan -i 1BJ1-HL.pdb -w Fab_static -v 4 -f ~/PATH/FoldX/foldx4 -D 10 -am '10 4 32L 94L 41H 54H 103H 108W'

Next, to study the effect of protein flexibility on aggregation properties, we ran the analysis in the dynamic mode (stage 2):

$ aggrescan -i 1BJ1-HL.pdb -w Fab_dynamic -v 4 -f ~/PATH/FoldX/foldx4 -D 10 -d

Accordingly, we obtained a set of models, generated with CABS-flex, reflecting the prevailing structural fluctuations of the Fab domain. The highest A3D scoring model was selected as the most aggregation-prone conformer in solution and was used as input to the static mode analysis (stage 3) with automated mutations to identify alternative mutants that solubilize the protein:

$ aggrescan -i model-HL.pdb -w Fab_static_alter -v 4 -f ~/PATH/FoldX/foldx4 -D 10 -am '10 4 32L 94L 41H 54H 103H 108W'
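Stage 3 starts from the highest-scoring CABS-flex model. Assuming the per-model model_x.csv files have been unpacked (as the GUI does) and share the A3D.csv columns, a short Python sketch can rank the models. Ranking by the mean per-residue score is our assumption, since the exact selection criterion is not detailed here, and the helper names and sample values are hypothetical.

```python
import csv
import io
from pathlib import Path
from statistics import mean


def mean_score(csv_text):
    """Mean A3D score over all residues in one A3D.csv-formatted table."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return mean(float(row["Score"]) for row in rows)


def best_model(tables):
    """Return (model name, mean score) for the highest-scoring model.

    `tables` maps a model name to the text of its per-residue CSV file.
    """
    scores = {name: mean_score(text) for name, text in tables.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


def load_models(directory):
    """Collect the model_*.csv files of an unpacked dynamic-mode run."""
    return {p.stem: p.read_text() for p in sorted(Path(directory).glob("model_*.csv"))}


# Hypothetical two-model example (values invented for illustration).
HEADER = "Protein,Chain,Residue,Residue_name,Score\n"
tables = {
    "model_0": HEADER + "m,H,5,V,0.5\nm,H,6,E,-1.5\n",
    "model_1": HEADER + "m,H,5,V,1.5\nm,H,6,E,-0.5\n",
}
print(best_model(tables))  # ('model_1', 0.5)
```

With a real run, best_model(load_models("Fab_dynamic")) would point to the candidate input for stage 3, provided the unpacked model files sit directly in that working directory.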

The crystal structure (output of stage 1) and the most aggregation-prone model (output of stage 2) reflecting conformational fluctuations are compared in Fig. 8. The surfaces of both structures are colored according to the A3D score, which facilitates the identification of structural APRs (red shades highlight residues with A3D score > 0). As expected, the higher A3D average score for the CABS-flex-generated model (–0.52 compared with –0.71 for 1BJ1) stems from an increased number of exposed aggregation-prone residues that in the static structure would be sheltered from solvent. The lower row in Fig. 8 shows the RMSF profile (distances between corresponding residues of the superimposed structures) on the left and the superimposed backbones of the structures (1BJ1 in red and the CABS-flex model in blue) on the right.
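For reference, the root-mean-square fluctuation of residue i over M superimposed structures is commonly defined as follows, where r_i^(m) is the position of the residue (for example, its Cα atom) in structure m. This is the textbook definition; the atom selection and superposition used by CABS-flex may differ.

```latex
\mathrm{RMSF}_i = \sqrt{\frac{1}{M}\sum_{m=1}^{M}\left\lVert \mathbf{r}_i^{(m)} - \bar{\mathbf{r}}_i \right\rVert^{2}},
\qquad
\bar{\mathbf{r}}_i = \frac{1}{M}\sum_{m=1}^{M}\mathbf{r}_i^{(m)}
```

With only two structures (M = 2), as in the crystal/model comparison of Fig. 8, both deviations equal half the distance d_i between the two positions, so RMSF_i = d_i/2; the per-residue distance profile and the RMSF profile then differ only by a constant factor.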


Fig. 8 The protein surface colored as a function of A3D score for crystallographic structure of 1BJ1 (top left) and CABS-flex model reflecting conformational fluctuations (top right) of the Fab domain. The red shades highlight significant structural APRs, while the blue ones correspond to soluble residues. The RMSF profile (bottom left) and superimposed backbones (bottom right) of 1BJ1 (red) and CABS-flex model (blue) are provided as a measure of structural flexibility of the Fab domain

The aggregation-prone residues for both the static structure and the top-ranked CABS-flex model are indicated in Fig. 9, where the upper row shows the protein surface and the lower row contains the A3D profiles for the H and L chains on the left and right, respectively. Both A3D approaches identified several residues with significant aggregation tendencies. Despite being broadly distributed along the protein sequence, most of these residues colocalize in a specific region of the protein surface and constitute a major structural APR. Based on the available studies [30] and following an additional visual inspection, we identified this region as the CDRs of the Fab, comprising CDR1, CDR2, and CDR3 of the H and L chains. Since the amino acid composition of the CDRs is essential for antibodies to perform their function and bind their targets, we decided to exclude all the residues in this region from the analysis, even though they are scored as highly aggregation prone by A3D (see Y32, Y54, and Y103 from the H chain and Y32, F50, L54, and V94 from the L chain). Of note, the CDR residues (shaded in gray in the Aggrescan3D profiles) concentrate most of the aggregation propensity of the structure, suggesting again that the molecular determinants behind aggregation and intermolecular interactions are governed by the same physicochemical properties, such as hydrophobicity.

Additional residues, not related to the CDRs, were also predicted as aggregation prone by A3D, making them ideal targets for redesigning the Fab surface. Accordingly, we analyzed solubilizing mutations of both structures with the automated mutations tool of A3D. Table 7 contains the stability (EnergyDiff) and solubility (AvgScore, AvgScoreDiff) values for the A3D-suggested mutants. Four of them, V5, V179, and L180 from chain H and L154 from chain L, were detected by both approaches, which might be explained by a sustained exposure of these amino acids to solvent. In contrast, F70 and V173 from chain H and V110 from chain L were detected exclusively in the most aggregation-prone CABS-flex model, probably because they become exposed to solvent only transiently, as a result of structural fluctuations. All computationally created mutants are more soluble than the wild-type Fab; however, substitutions of the V5, L180, V110, and L154 residues are predicted to have a greater impact on the Fab’s solubility. The effects on structural stability vary considerably, although substitutions to positively charged residues are again favored as a general trend, similarly to the previous examples. A recent experimental study performed by Courtois et al. [30] showed that the same mutations proposed by the A3D automated mutation protocol reduce the aggregation propensity of the Fab domain when compared to the wild type. A3D successfully identified V5, L180, V110, and L154 as the major drivers of aggregation and suggested mutation to K to improve protein solubility while minimizing the impact on protein stability (see Table 8). The implementation of the dynamic mode (stage 2) allowed us to spot V110 as an aggregation-prone residue, which would otherwise have gone unnoticed in the static mode analysis.

Fig. 9 A3D prediction of residue aggregation properties in the Fab domain of the monoclonal antibody. The upper row shows, on the left, the crystal structure of the Fab domain (PDB: 1BJ1), where the H and L chains are indicated in cyan and green, respectively, and, on the right, the surface of the CABS-flex model generated in the dynamic mode, colored according to A3D score and labeled with highly aggregation-prone residues. The lower row shows the aggregation profile plots for the H and L chains on the left and right, respectively. Aggregation-prone residues predicted for 1BJ1 are indicated in blue, those for the model reflecting fluctuations in red, and those common to both structures in black

Table 7 Results for mutations provided by A3D static mode with automated mutations (-am option) for the Fab domain. The values in the table correspond to the energy effect (EnergyDiff, stabilizing for ΔΔG < –0.5, marked in green) of the mutation, measured in kcal/mol and calculated by FoldX (embedded in A3D), and to the solubility effect described by AvgScore predicted by A3D. The gray font corresponds to mutations from outside the highest ranked set

Residue id | A3D auto score (a) | Mutation | 1BJ1 crystal EnergyDiff | 1BJ1 crystal AvgScore | 1BJ1 crystal AvgScoreDiff | CABS-flex model EnergyDiff | CABS-flex model AvgScore | CABS-flex model AvgScoreDiff
V5H | 0.970 (1.299) | D | 1.300 | –0.724 | –0.013 | –0.719 | –0.587 | –0.026
 | | E | 0.603 | –0.724 | –0.013 | –0.789 | –0.587 | –0.026
 | | K | –0.257 | –0.723 | –0.012 | –0.879 | –0.586 | –0.025
 | | R | –0.053 | –0.724 | –0.013 | –0.734 | –0.589 | –0.029
F70H | 0.000 (0.626) | R | n/a | n/a | n/a | –1.004 | –0.581 | –0.020
V173H | 0.129 (1.031) | D | 2.997 | –0.711 | 0.000 | –1.591 | –0.591 | –0.03
 | | K | 0.635 | –0.710 | 0.001 | –0.739 | –0.586 | –0.025
V179H | 0.543 (0.708) | K | 0.679 | –0.714 | –0.003 | –0.525 | –0.577 | –0.016
 | | R | 0.474 | –0.705 | 0.006 | –0.887 | –0.575 | –0.014
L180H | 1.336 (1.256) | E | 0.439 | –0.736 | –0.025 | –0.153 | –0.598 | –0.037
 | | K | 0.154 | –0.733 | –0.022 | 0.279 | –0.595 | –0.034
 | | R | 0.330 | –0.733 | –0.022 | 0.350 | –0.598 | –0.037
V110L | –0.060 (1.470) | D | 1.020 | –0.732 | –0.021 | –1.468 | –0.592 | –0.032
 | | E | 0.550 | –0.731 | –0.020 | –1.607 | –0.593 | –0.032
 | | K | 0.547 | –0.717 | –0.006 | –1.951 | –0.594 | –0.033
 | | R | 0.871 | –0.718 | –0.007 | –2.164 | –0.592 | –0.031
L154L | 0.591 (0.622) | K | 0.399 | –0.728 | –0.017 | –0.055 | –0.594 | –0.033
 | | R | 0.752 | –0.732 | –0.021 | n/a | n/a | n/a

(a) The first value is the A3D score for selected residues of the crystallographic structure of 1BJ1, while the second value (in parentheses) corresponds to the A3D score for the CABS-flex model generated in the dynamic mode
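The static/dynamic comparison above can be expressed as a set operation on the A3D scores of Table 7. In the Python sketch below, the cutoff of 0.5 is an illustrative assumption rather than an A3D parameter (A3D selects residues by rank through -am N); with this cutoff, the split matches the grouping described in the text.

```python
# A3D scores from Table 7: (1BJ1 crystal, CABS-flex model).
SCORES = {
    "V5H": (0.970, 1.299),
    "F70H": (0.000, 0.626),
    "V173H": (0.129, 1.031),
    "V179H": (0.543, 0.708),
    "L180H": (1.336, 1.256),
    "V110L": (-0.060, 1.470),
    "L154L": (0.591, 0.622),
}

CUTOFF = 0.5  # illustrative threshold, not an A3D parameter


def compare(scores, cutoff=CUTOFF):
    """Split residues into (both, model only, crystal only) above `cutoff`."""
    static = {res for res, (crystal, _) in scores.items() if crystal > cutoff}
    dynamic = {res for res, (_, model) in scores.items() if model > cutoff}
    return sorted(static & dynamic), sorted(dynamic - static), sorted(static - dynamic)


both, model_only, crystal_only = compare(SCORES)
print(both)          # ['L154L', 'L180H', 'V179H', 'V5H']
print(model_only)    # ['F70H', 'V110L', 'V173H']
print(crystal_only)  # []
```

Lowering the cutoff to 0 would move V173H into the "both" group, which illustrates how sensitive such a split is to the chosen threshold compared with the rank-based selection used by A3D.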

5 Notes

1. The Aggrescan3D standalone repository (which contains source code, wiki, installation instructions, and issue tracker) is available at https://bitbucket.org/lcbio/aggrescan3d. The FoldX suite is available through academic and commercial licenses; details are provided at http://foldxsuite.crg.eu. CABS-flex standalone is distributed under the MIT license, which is free for academic and nonprofit users. CABS-flex source code, wiki, and installation instructions are available at the repository: https://bitbucket.org/lcbio/cabsflex.

2. Technical hints for A3D options:

-f, --foldx: Once the option has been used with the path (e.g., /home/user/foldx), it can be omitted in future runs.

-n, --naccess: This option exists mainly for compatibility with the original A3D server, which used naccess.

-m, --mutate: Multiple mutations can be added with a single -m line, and the -m option can be used multiple times. Be careful not to mutate a single residue multiple times; this will cause undefined behavior. The mutant is built using the FoldX software; alternatively, a mutant can be built outside the program and supplied as regular input.

-am, --auto_mutation: The automated mutation option chooses N residues and attempts four mutations on each (into glutamic acid, arginine, lysine, and aspartic acid); selecting a large N may result in a long simulation. The second argument, M, is the number of cores used concurrently to perform the calculations; we advise using no more than the number of physical cores of your PC. An optional argument to -am is a list of residues that will be "fixed", that is, not considered for mutations. Automated mutations are not compatible with the dynamic mode, but the -m option for user-indicated mutations is available there.

-c, --config_file: The parenthesis should be omitted in a config file. Detailed instructions on how to create a config file can be found in the configReade.ini file downloaded with the package. Command-line options override those specified in the config file, so

    $ aggrescan -c config.ini -w /other/dir

will re-run the simulation with a different working directory. CABS config: details on how to prepare a CABS config file can be found in the CABS program wiki; this gives the user nearly full control of the CABS simulation.

-v, --verbose:
    0: only critical messages are displayed (usually this means that the simulation failed or encountered a serious issue).
    1: warning messages are displayed as well; these are usually minor issues or situations that might need the user's attention.
    2: the default Info level displays messages about program progress.
    3: the log-file verbosity level also displays messages concerning file management.
    4: the Debug level reports as much, and as accurately, as it can about what the program is doing. Exceptions are raised to provide a Python traceback, and all temporary files are kept in the "tmp" folder after the simulation finishes; these are mostly useful for debugging.
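The precedence rule for -c (command-line options override config-file values) can be illustrated with Python's standard argparse module. This is a hypothetical sketch, not A3D's own parser: the program name and the long option name --work_dir are assumptions (only -w appears above), and reading config.ini is left out, so the config is passed in as an already-parsed dict.

```python
import argparse


def parse_with_config(argv, config):
    """Parse argv so that explicit command-line options override config values.

    Hypothetical sketch of the precedence rule described for -c; this is not
    A3D's own parser. `config` maps option names (argparse dest names) to
    values already read from a config file.
    """
    parser = argparse.ArgumentParser(prog="a3d-precedence-sketch")
    parser.add_argument("-c", "--config_file")  # accepted; file reading is out of scope
    parser.add_argument("-w", "--work_dir", default=".")  # long name is an assumption
    parser.add_argument("-v", "--verbose", type=int, default=2)  # 2 = default Info level
    # Parser-level defaults take precedence over the argument-level defaults
    # above, so config values replace built-in defaults, while any option
    # actually given on the command line still wins over both.
    parser.set_defaults(**config)
    return parser.parse_args(argv)


config = {"work_dir": "/original/dir", "verbose": 3}
args = parse_with_config(["-c", "config.ini", "-w", "/other/dir"], config)
print(args.work_dir, args.verbose)  # /other/dir 3
```

The design relies on argparse's documented behavior: defaults set with set_defaults override argument-level defaults, and values actually supplied on the command line override both.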

3. The mutation code has the format <wild-type residue><mutant residue><residue number><chain ID>. For example, the mutation code "VM1A" means that valine, residue 1 of chain A, was mutated into methionine.
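A small Python helper can split such codes into their fields and flag residues that would be mutated more than once (the undefined-behavior case noted for -m). It is a sketch under stated assumptions: the names parse_mutation_code and repeated_sites are hypothetical, and the pattern assumes one-letter amino acid codes, an integer residue number without insertion codes, and a single-letter chain ID.

```python
import re
from typing import List, NamedTuple, Tuple

# The 20 standard amino acids in one-letter code.
_AA = "ACDEFGHIKLMNPQRSTVWY"
# Assumed layout: wild-type letter, mutant letter, residue number, one-letter chain ID.
_MUTATION_RE = re.compile(rf"([{_AA}])([{_AA}])(-?\d+)([A-Za-z])")


class Mutation(NamedTuple):
    wild_type: str       # residue before mutation
    mutant: str          # residue after mutation
    residue_number: int  # position in the chain
    chain: str           # chain identifier


def parse_mutation_code(code: str) -> Mutation:
    """Split a code such as 'VM1A' into its four fields."""
    match = _MUTATION_RE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a valid mutation code: {code!r}")
    wild_type, mutant, number, chain = match.groups()
    return Mutation(wild_type, mutant, int(number), chain)


def repeated_sites(codes: List[str]) -> List[Tuple[str, int]]:
    """Return (chain, residue number) sites that appear in more than one code.

    Mutating the same residue twice causes undefined behavior (see the -m
    tips), so such sites are worth catching before a run.
    """
    seen, repeated = set(), []
    for code in codes:
        m = parse_mutation_code(code)
        site = (m.chain, m.residue_number)
        if site in seen and site not in repeated:
            repeated.append(site)
        seen.add(site)
    return repeated


print(parse_mutation_code("VM1A"))
# Mutation(wild_type='V', mutant='M', residue_number=1, chain='A')
print(repeated_sites(["VM1A", "KE10A", "VK1A"]))  # [('A', 1)]
```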

Acknowledgments

S.K. acknowledges support from the National Science Centre of Poland, grant MAESTRO no. 2014/14/A/ST6/00088. S.V. acknowledges support from the Spanish Ministry of Economy and Competitiveness (no. BIO2016-78310-R) and from ICREA (ICREA Academia 2015). J.P. acknowledges support from the Spanish Ministry of Science and Innovation via a doctoral grant (FPU14/07161). A.E.B-D. acknowledges financial support from the Roy J. Carver Charitable Trust through the Iowa State University Bioscience Innovation Postdoctoral Fellowship.

Conflict of Interest

None declared.

References

1. Ebo JS, Guthertz N, Radford SE, Brockwell DJ (2020) Using protein engineering to understand and modulate aggregation. Curr Opin Struct Biol 60:157–166. https://doi.org/10.1016/j.sbi.2020.01.005

2. Zambrano R, Jamroz M, Szczasiuk A et al (2015) AGGRESCAN3D (A3D): server for prediction of aggregation properties of protein structures. Nucleic Acids Res 43:W306–W313. https://doi.org/10.1093/nar/gkv359

3. De Groot NS, Aviles FX, Vendrell J, Ventura S (2006) Mutagenesis of the central hydrophobic cluster in Aβ42 Alzheimer's peptide: side-chain properties correlate with aggregation propensities. FEBS J 273:658–668. https://doi.org/10.1111/j.1742-4658.2005.05102.x

4. Conchillo-Solé O, de Groot NS, Avilés FX et al (2007) AGGRESCAN: a server for the prediction and evaluation of "hot spots" of aggregation in polypeptides. BMC Bioinformatics 8:65. https://doi.org/10.1186/1471-2105-8-65

5. Kurcinski M, Oleniecki T, Ciemny MP et al (2019) CABS-flex standalone: a simulation environment for fast modeling of protein flexibility. Bioinformatics 35:694–695. https://doi.org/10.1093/bioinformatics/bty685

6. Kuriata A, Gierut AM, Oleniecki T et al (2018) CABS-flex 2.0: a web server for fast simulations of flexibility of protein structures. Nucleic Acids Res 46:W338–W343. https://doi.org/10.1093/nar/gky356

7. Pulido P, Llamas E, Llorente B et al (2016) Specific Hsp100 chaperones determine the fate of the first enzyme of the plastidial isoprenoid pathway for either refolding or degradation by the stromal Clp protease in Arabidopsis. PLoS Genet 12:e1005824. https://doi.org/10.1371/journal.pgen.1005824

8. Žerovnik E (2017) Putative alternative functions of human stefin B (cystatin B): binding to amyloid-beta, membranes, and copper. J Mol Recognit 30:e2562. https://doi.org/10.1002/jmr.2562

9. Patel P, Parmar K, Vyas VK et al (2017) Combined in silico approaches for the identification of novel inhibitors of human islet amyloid polypeptide (hIAPP) fibrillation. J Mol Graph Model 77:295–310. https://doi.org/10.1016/j.jmgm.2017.09.004

10. Bhandare VV, Ramaswamy A (2018) The proteinopathy of D169G and K263E mutants at the RNA Recognition Motif (RRM) domain of tar DNA-binding protein (tdp43) causing neurological disorders: a computational study. J Biomol Struct Dyn 36:1075–1093. https://doi.org/10.1080/07391102.2017.1310670

11. Kuriata A, Iglesias V, Pujols J et al (2019) Aggrescan3D (A3D) 2.0: prediction and engineering of protein solubility. Nucleic Acids Res 47:W300. https://doi.org/10.1093/nar/gkz321

12. Kuriata A, Iglesias V, Kurcinski M et al (2019) Aggrescan3D standalone package for structure-based prediction of protein aggregation properties. Bioinformatics 35:3834. https://doi.org/10.1093/bioinformatics/btz143

13. Schymkowitz J, Borg J, Stricher F et al (2005) The FoldX web server: an online force field. Nucleic Acids Res 33:W382. https://doi.org/10.1093/nar/gki387

14. Buß O, Rudat J, Ochsenreither K (2018) FoldX as protein engineering tool: better than random based approaches? Comput Struct Biotechnol J 16:25

15. Jamroz M, Kolinski A, Kmiecik S (2014) CABS-flex predictions of protein flexibility compared with NMR ensembles. Bioinformatics 30:2150–2154. https://doi.org/10.1093/bioinformatics/btu184

16. Kmiecik S, Kouza M, Badaczewska-Dawid AE et al (2018) Modeling of protein structural flexibility and large-scale dynamics: coarse-grained simulations and elastic network models. Int J Mol Sci 19:E3496. https://doi.org/10.3390/ijms19113496

17. Beerten J, Schymkowitz J, Rousseau F (2013) Aggregation prone regions and gatekeeping residues in protein sequences. Curr Top Med Chem 12:2470–2478. https://doi.org/10.2174/1568026611212220003

18. Chiti F, Dobson CM (2017) Protein misfolding, amyloid formation, and human disease: a summary of progress over the last decade. Annu Rev Biochem 86:27. https://doi.org/10.1146/annurev-biochem-061516-045115

19. Moussa EM, Panchal JP, Moorthy BS et al (2016) Immunogenicity of therapeutic protein aggregates. J Pharm Sci 105:417

20. Roberts CJ (2014) Therapeutic protein aggregation: mechanisms, design, and control. Trends Biotechnol 32:372

21. Lu RM, Hwang YC, Liu IJ et al (2020) Development of therapeutic antibodies for the treatment of diseases. J Biomed Sci 27:1

22. Pédelacq JD, Cabantous S, Tran T et al (2006) Engineering and characterization of a superfolder green fluorescent protein. Nat Biotechnol 24:79. https://doi.org/10.1038/nbt1172

23. Schimmele B, Plückthun A (2008) Engineering proteins for stability and efficient folding. In: Fersht AR (ed) Protein science encyclopedia. KGaA, Weinheim

24. Cabrita LD, Gilis D, Robertson AL et al (2007) Enhancing the stability and solubility of TEV protease using in silico design. Protein Sci 16:2360–2367

25. Kumar S, Roffi K, Tomar DS et al (2018) Rational optimization of a monoclonal antibody for simultaneous improvements in its solution properties and biological activity. Protein Eng Des Sel 31:313–325. https://doi.org/10.1093/protein/gzy020

26. Gil-Garcia M, Baño-Polo M, Varejão N et al (2018) Combining structural aggregation propensity and stability predictions to redesign protein solubility. Mol Pharm 15:3846–3859. https://doi.org/10.1021/acs.molpharmaceut.8b00341

27. Waterhouse A, Bertoni M, Bienert S et al (2018) SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res 46:W296. https://doi.org/10.1093/nar/gky427

28. Regula JT, Imhof-Jung S, Mølhøj M et al (2018) Variable heavy-variable light domain and Fab-arm CrossMabs with charged residue exchanges to enforce correct light chain assembly. Protein Eng Des Sel 31:289. https://doi.org/10.1093/protein/gzy021

29. Muller YA, Chen Y, Christinger HW et al (1998) VEGF and the Fab fragment of a humanized neutralizing antibody: crystal structure of the complex at 2.4 Å resolution and mutational analysis of the interface. Structure 6:1153. https://doi.org/10.1016/S0969-2126(98)00116-6

30. Courtois F, Agrawal NJ, Lauer TM, Trout BL (2016) Rational design of therapeutic mAbs against aggregation through protein engineering and incorporation of glycosylation motifs applied to bevacizumab. MAbs 8:99. https://doi.org/10.1080/19420862.2015.1112477

Chapter 3

Using Surface Hydrophobicity Together with Empirical Potentials to Identify Protein–Protein Binding Sites: Application to the Interactions of E-cadherins

Robert L. Jernigan, Pranav Khade, Ambuj Kumar, and Andrzej Kloczkowski

Abstract

Studying the interactions within protein structures can reveal details of how proteins of various types interact and aggregate. Empirical contact potentials have proven to be extremely important in the evaluation of individual modeled protein structures but have found few applications to protein–protein interactions. In part, this is caused by a lack of properly formulated potentials with an appropriate reference state. Since the comparisons are made between different bound structures, the proper reference state should take other contacts into account. Therefore, a preferred reference state should be defined with respect to a given residue type interacting with an average residue instead of with solvent, as is typically done in the derivation of statistical contact potentials. Here, a two-stage procedure for generating and evaluating interacting protein pairs is described, and an example of E-cadherin interactions is shown.

Key words E-cadherins, Protein–protein interactions, Hydrophobicity, Transmembrane proteins, Hydrophobic cluster, Contact potentials, Protein aggregation, Calcium-dependent adhesion

1

Introduction

Uncovering the underlying geometric and chemical complementarity for protein–protein binding has a long history, with a wide variety of approaches having been pursued. In the early days, regions of high surface shape complementarity [1] were used to identify the best molecular geometry for docking, and surface curvature was even proposed as a useful tool for locating binding sites [2, 3]. Regions with a high density of hydrogen-bonding sites [4, 5] and areas with favorable electrostatic properties [6] were also used to optimize chemical complementarity and to search selected surfaces for potential binding sites. Molecular dynamics simulations [7, 8] were also applied to better understand the principal characteristics of molecular associations [9], and there are many newer papers on this subject [10–12].

Protein–protein interactions play a critical role in protein function. The completion of many genomes is being followed rapidly by major efforts to identify interacting protein pairs experimentally in order to decipher the networks of interacting, coordinated-in-action proteins. Identification of protein–protein interaction sites and detection of the specific amino acid residues that contribute to the specificity and strength of protein interactions is an important problem with broad applications, ranging from rational drug design to the analysis of metabolic and signal transduction networks. Experimental identification of residues in protein–protein interaction surfaces must come either from the determination of the structures of protein–protein complexes or from functional assays, such as yeast two-hybrid experiments. The rapidly growing body of experimental data on protein structures (although the number of solved protein–protein complexes is still limited) and on protein–protein interactions, together with improved methods of protein structure prediction, has led over the last 20 years to the development of computational methods for identifying and predicting the amino acid residues that participate in protein–protein interactions. Integrating information from various sources (sequence information, structural information, interactome information, and pathway databases), identifying conserved residues and co-evolution couplings, and mining the literature all help to identify protein–protein interactions and to better understand their molecular mechanisms. This accumulating information is being assembled in several public databases such as DIP [13], BioGRID [14], and STRING [15].

Mai Suan Li et al. (eds.), Computer Simulations of Aggregation of Proteins and Peptides, Methods in Molecular Biology, vol. 2340, https://doi.org/10.1007/978-1-0716-1546-1_3, © Springer Science+Business Media, LLC, part of Springer Nature 2022

2

Hydrophobicity

Hydrophobicity is important for protein–protein interactions and for the binding of other molecules. Hydrophobic residues usually occur in clusters, and binding sites are often located at one of the strongest hydrophobic clusters on the surface of the acceptor protein. Young et al. [16] found that in 25 of 38 cases the ligand bound to the strongest hydrophobic cluster, which coincided with more than one-third of the surface buried by the bound ligand; in the remaining 13 cases, it bound to one of the top six hydrophobic clusters. These results suggest that surface hydrophobicity can be used to identify the regions on a protein surface most likely to bind any ligand [16].


Fig. 1 Contact potential for pairs of amino acids in different classes. H is for hydrophobic residues, P for polar residues, A for negatively charged amino acids, and B for positively charged amino acids. It can be clearly seen that the strongest interactions by far are for pairs of hydrophobic amino acids in contact in structures

Why is this? It is primarily because hydrophobic pairs form the strongest interactions, so matching hydrophobic residues is the most important factor for gaining strength (low energy). Interactions between two proteins have been shown to follow the same solvent-mediated contact potentials as interactions between amino acids within a single protein [17]. And, just as for contacts between amino acids within a single protein, the specificity of the interactions comes from the interactions between polar and charged residues. That work showed that the reference state for these potentials, in which each amino acid is exposed to solvent, holds for both intermolecular and intramolecular interactions. The relative values of this potential function are shown in Fig. 1; principally, they reflect hydrophobicity. Another hydrophobicity-based scale was developed by Chris Dobson's group to identify the aggregation tendency of each amino acid [18]. Both studies point out the importance of hydrophobicity for the binding strength between proteins.

The importance of the hydrophobicity of interacting residues for protein contact potentials (CPs) was revealed by a detailed statistical analysis of 29 different contact potentials published in the literature [19]. All analyzed CPs could be divided into two groups despite their completely different derivations, and most of these knowledge-based statistical potentials could be well approximated by appropriate combinations of one-body components. The CP matrices of the first class can be approximated, with a correlation of about 0.9, by

$$e_{ij} = h_i + h_j, \qquad 1 \le i, j \le 20,$$

where the residue-type-dependent factor $h$ is highly correlated with the frequency of occurrence of a given amino acid type inside proteins. Potentials belonging to the second class can be approximated, with a correlation of 0.9, by

$$e_{ij} = c_0 - h_i h_j + q_i q_j,$$

where $c_0$ is a constant, $h$ is highly correlated with the Kyte–Doolittle hydrophobicity scale, and a new, less dominant, residue-type-dependent factor $q$ is correlated (~0.9) with amino acid isoelectric points (pI) [19].
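The first-class approximation can be checked numerically. The Python sketch below fits $h$ by ordinary least squares, for which the minimizer of $\sum_{ij}(e_{ij} - h_i - h_j)^2$ over a symmetric matrix is $h_i$ = (row mean of $i$) minus half the grand mean, and then reports the correlation between $e_{ij}$ and $h_i + h_j$. It is an illustration only: the 3 × 3 matrix is hypothetical, and the fitting procedure used in [19] may differ.

```python
import math


def fit_one_body(e):
    """Least-squares fit of a symmetric n x n matrix by e_ij ~ h_i + h_j.

    Minimizing sum_ij (e_ij - h_i - h_j)^2 over all ordered pairs gives
    h_i = (row mean of i) - (grand mean) / 2 for a symmetric matrix.
    """
    n = len(e)
    row_means = [sum(row) / n for row in e]
    grand_mean = sum(row_means) / n
    return [r - grand_mean / 2 for r in row_means]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def one_body_correlation(e):
    """Correlation between e and its one-body approximation h_i + h_j."""
    h = fit_one_body(e)
    n = len(e)
    observed = [e[i][j] for i in range(n) for j in range(n)]
    fitted = [h[i] + h[j] for i in range(n) for j in range(n)]
    return pearson(observed, fitted)


# Hypothetical 3-type matrix: nearly additive, with small pair-specific deviations.
e = [[-2.0, -1.5, -0.8],
     [-1.5, -1.0, -0.2],
     [-0.8, -0.2, 0.5]]
print([round(x, 3) for x in fit_one_body(e)])
print(round(one_body_correlation(e), 3))
```

A correlation close to 1 indicates that a single residue-type factor captures most of the matrix; for the second class, the analogous check would fit the product form $c_0 - h_i h_j + q_i q_j$ instead.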

3

Cadherins

Cadherins are a class of type-1 transmembrane proteins that play an important role in cell adhesion. Their name comes from the phrase "calcium-dependent adhesion." They play a critical role in the formation of adherens junctions that bind cells together. Cadherins mediate homotypic adhesion between cells, whereas another type of protein, the integrins, mediates adhesion between the cell and the extracellular matrix. By regulating contact formation and stability, cadherins play a crucial role in development, including cell adhesion, sorting, and tissue morphogenesis. Cadherins are also involved in signaling in cancer and are important therapeutic targets in oncological studies. The extracellular part of a cadherin is composed of multiple repeated domains of the same protein chain, each of which can bind calcium. Calcium binding makes the whole chain rigid enough to connect with a chain from another cell. The role of cadherins in molecular biology and medicine has been reviewed in [20].

4

Cadherin Interactions

One of the most important cases of protein–protein interactions is that between two identical cadherins (as seen in Fig. 2), whose interactions control cell–cell interactions [21]. This case serves as an interesting example of protein–protein interactions.

5

Method for Evaluating Various Interacting Conformations Between Two Rigid Protein Structures

This is a two-stage procedure: (1) identify the strong potential hydrophobic patches and (2) rotate and translate pairs of hydrophobic patches with respect to each other to achieve the most favorable non-hydrophobic interactions. The first step is premised on the fact that hydrophobic interactions are quite nonspecific: side chains are often able to rearrange to accommodate one another and maintain strong interactions. The second step is critical for obtaining the optimal interaction position. For this purpose, potentials that evaluate specificity should be used, namely those with a reference state of contact with an average residue. Many empirical contact potentials have proven to be among the most effective types of potentials for evaluating predicted protein structures in the CASP competitions [22]. One of the earliest was that of Miyazawa and Jernigan from 1985 [23], which was subsequently refined using more structures in 1996 [24]. This latter potential is the one we describe here.

Fig. 2 E-cadherin domain showing the residues involved in dimeric interactions when attached to an identical domain from an opposing cell. The structure shown is a single chain of PDB structure 3q2v. The closely interacting residues are shown in color, with the hydrophobic residues colored magenta and the interacting residues on the edge of the hydrophobic patch in yellow. In addition, tryptophan (strongly hydrophobic, in red) interacts strongly by fitting into a pocket on the opposing chain. The residues in yellow are mostly polar but in this case also include several prolines, which could also have been considered part of the hydrophobic patch

Stage 1: Identify the strongest hydrophobic patches. Score all possible patches separately using the values in Table 1, and then choose a set of the strongest ones to consider in Stage 2.

Stage 2: Evaluate the interactions between all pairs of interacting non-hydrophobic residues outside the hydrophobic patches identified in Stage 1, using contact potentials converted from the solvent-exposure reference state to a reference state of contact with an average residue (Table 2). These values were not published in the 1996 paper [24] but have been computed here according to

$$E_{ij} = e_{ij} + e_{rr} - e_{ir} - e_{jr},$$

where $e_{ij}$ is the contact energy between residue types $i$ and $j$ relative to solvent exposure and $r$ denotes an average residue. Since placing intermolecular hydrophobic patches together will simultaneously bring polar residues on the outside fringe into contact, it is important to evaluate these interactions with this different reference state of contact with an average residue; these values are provided in Table 2. At the same time, the hydrophobic residue pairs should be evaluated in a similar way.
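The two stages can be sketched in Python as follows. This is a hypothetical illustration, not the authors' code: patches are reduced to strings of one-letter codes, Stage 1 ranks them by the mean Table 1 value (using the mean rather than, e.g., the sum is an assumption), and Stage 2 sums $E_{ij}$ over a given list of contacting non-hydrophobic pairs instead of searching rotations and translations. The Table 1 values are transcribed from this chapter; the $e_{ij}$ values, including those for the average residue r, are invented for the example, with negative values taken as favorable.

```python
# Table 1 values (contact potential with an average residue), transcribed from
# this chapter; larger values indicate more hydrophobic residue types.
TABLE1 = {
    "C": 3.57, "M": 3.92, "F": 4.76, "I": 4.42, "L": 4.81,
    "V": 3.89, "W": 3.81, "Y": 3.41, "A": 2.57, "G": 2.19,
    "T": 2.29, "S": 1.98, "N": 1.92, "Q": 2.00, "D": 1.84,
    "E": 1.79, "H": 2.56, "R": 2.11, "K": 1.52, "P": 2.09,
}
# The eight most hydrophobic types, which are excluded from Table 2.
HYDROPHOBIC = set("CMFILVWY")


def patch_score(patch):
    """Stage 1 (assumed scoring): mean Table 1 value of the residues in a patch."""
    return sum(TABLE1[aa] for aa in patch) / len(patch)


def strongest_patches(patches, top):
    """Return the `top` highest-scoring patches, strongest first."""
    return sorted(patches, key=patch_score, reverse=True)[:top]


def specificity(e, i, j, ref="r"):
    """Stage 2: E_ij = e_ij + e_rr - e_ir - e_jr, with `ref` the average residue."""
    return e[i][j] + e[ref][ref] - e[i][ref] - e[j][ref]


def fringe_score(contacts, e):
    """Sum E_ij over contacting pairs in which neither residue is hydrophobic."""
    return sum(
        specificity(e, i, j)
        for i, j in contacts
        if i not in HYDROPHOBIC and j not in HYDROPHOBIC
    )


# Patches are given as strings of one-letter codes (a simplification).
print(strongest_patches(["KDESN", "LFIVA", "AGTLP"], top=2))  # ['LFIVA', 'AGTLP']

# Hypothetical solvent-referenced energies e_ij, including the average residue "r".
e = {
    "K": {"K": 0.20, "D": -1.00, "r": -0.50},
    "D": {"K": -1.00, "D": 0.10, "r": -0.40},
    "r": {"K": -0.50, "D": -0.40, "r": -0.60},
}
# The pair involving L is skipped because L is one of the hydrophobic types.
print(fringe_score([("K", "D"), ("L", "K")], e))
```

Because $E_{ir} = 0$ by construction, a contact with an average residue contributes nothing, so the Stage 2 score measures only how much better or worse each fringe contact is than an average one.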

46

Robert L. Jernigan et al.

Table 1 Contact potential between residue types and an average residue [24]. These values can also be used as a hydrophobicity scale and are similar to other hydrophobicity scales

Residue  C     M     F     I     L     V     W     Y     A     G
Value    3.57  3.92  4.76  4.42  4.81  3.89  3.81  3.41  2.57  2.19

Residue  T     S     N     Q     D     E     H     R     K     P
Value    2.29  1.98  1.92  2.00  1.84  1.79  2.56  2.11  1.52  2.09

6

Some Simple Questions About Protein Aggregation

Protein aggregation is a widespread problem, ranging from the improperly folded aggregates that form inclusion bodies during protein synthesis to amyloid deposits in the brain. The characteristics of amino acid interactions are important considerations for this problem. In general, hydrophobic interactions are the strongest and provide substantial stability for protein binding, while the remaining residue types provide specificity, favoring particular cases among those having hydrophobic interactions. In an earlier study, we observed that the most hydrophobic patches on protein surfaces are often the sites not only where proteins interact with one another but also where small-molecule ligands bind. Thirty-eight co-molecule crystal structures were considered, and we observed that a small set of the most hydrophobic patches were the actual binding sites. This also clearly holds for cadherin, as can be seen in Fig. 3 for E-cadherin dimers. These were static structures, and it is also possible that protein dynamics expose some essential hydrophobic binding sites.

Table 2 Specificity contact potentials for pairs of residue types. The eight residue types (C, M, F, I, L, V, W, Y) with the highest hydrophobicity values in Table 1 were excluded. Because the potentials are symmetric, only the upper half is shown

   A     G     T     S     N     Q     D     E     H     R     K     P
A  1.07  0.84  0.93  0.88  0.92  1.08  0.94  1.18  1.15  1.22  1.13  1.12
G        0.27  0.53  0.43  0.38  0.67  0.41  0.83  0.77  0.69  0.65  0.64
T              0.59  0.39  0.34  0.53  0.30  0.41  0.60  0.61  0.59  0.71
S                    0.32  0.28  0.58  0.11  0.31  0.55  0.53  0.49  0.68
N                          0.05  0.23  0.07  0.15  0.45  0.38  0.20  0.59
Q                                0.61  0.36  0.45  0.76  0.43  0.33  0.60
D                                      0.28  0.52  0.09  0.39  0.39  0.67
E                                            0.68  0.31  0.32  0.46  0.79
H                                                  0.28  0.66  0.86  0.67
R                                                        0.76  1.11  0.71
K                                                              0.97  0.83
P                                                                    0.76

Fig. 3 High-hydrophobicity tessellation patches (at or above the 95th percentile of all tessellations in the nrPDB, shown in magenta) at the interface of the cadherin trans-dimer (PDB: 3q2v). The interaction between the monomers is stabilized by the large magenta patch in the center and further supported by the geometric fit of tryptophan 2 near the high-hydrophobicity patch. See Fig. 2 for a closer view of the hydrophobic binding patch at the interaction site

The sequences of globular proteins are characterized by a balance between hydrophobic residues, which provide the strength of interactions, and polar and charged residues. The polar and charged residues impart specificity, yielding a more limited set of specific favorable structures. The polar non-charged residues are especially important since they can form hydrogen bonds. Many multimeric proteins have strong hydrophobic patches at their interfaces, whereas some other structures, such as hemoglobin, have more unusual distributions of hydrophobic patches on their surfaces (Fig. 4). Clearly, the hemes bind at hydrophobic patches, but the distribution of the remaining hydrophobic patches is somewhat puzzling. Does avoiding these relatively small hydrophobic patches help to guide oxygen along the more polar surface to reach its binding site? Or could these patches guide oxygen to the entrances of some of the important pathways to the hemes [25]?

Fig. 4 Tetrameric hemoglobin showing its hydrophobic surface patches in magenta, with the hemes shown as sticks. The four subunits are shown in yellow, beige, gray, and turquoise

There are significant advantages to investigating protein aggregation with coarse-grained models such as the one proposed here. What has been discussed here is appropriate for globular proteins, with the dynamics ignored; further studies should include a treatment of the dynamics. There are also other types of proteins for which aggregation is important, such as membrane proteins and fibrous proteins. In addition, aggregation of intrinsically disordered proteins is an important problem and an important aspect of their structures. The hallmark of disordered protein sequences is a high fraction of charged amino acids, which has implications for their amino acid composition in general. The remaining hydrophobic residues are likely to aggregate, with some of the polar residues determining more precisely how they interact. This is an undeveloped research area. Would it be possible to predict which protein sequences are most likely to aggregate, and even to develop molecular models of possible interacting structures, no matter how evanescent they may be? Could sequentially close hydrophobic residues induce partial intramolecular folding, establishing a balance between such folding and intermolecular aggregation? All these problems need further study to shed more light on the molecular mechanisms of protein aggregation.

Acknowledgments

We acknowledge support from NSF grant DBI 1661391 and NIH grant R01 GM127701.

References

1. Shoichet BK, Bodian DL, Kuntz ID (1992) Molecular docking using shape descriptors. J Comput Chem 3:1–18

2. Kuntz ID, Blaney JM, Oatley SJ, Langridge R, Ferrin TE (1982) A geometric approach to macromolecular-ligand interactions. J Mol Biol 161:269–288

3. Nicholls A, Sharp KA, Honig B (1991) Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins 8:281–296

4. Danziger DJ, Dean PM (1989) Automated site-directed drug design: a general algorithm for knowledge acquisition about hydrogen-bonding regions at protein surfaces. Proc R Soc Lond B 236:101–113

5. Danziger DJ, Dean PM (1989) Automated site-directed drug design: the prediction and observation of ligand point positions at hydrogen-bonding regions on protein surfaces. Proc R Soc Lond B 236:115–124

6. Goodford P (1985) A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J Med Chem 28:849–857

7. Brooks CL III, Karplus M, Pettit M (1988) Proteins: a theoretical perspective of dynamics, structure and thermodynamics. In: Advances in chemical physics, vol LXXI. John Wiley and Sons, New York, NY

8. Capon DJ, Ward RHR (1991) The CD4-gp120 interaction and AIDS pathogenesis. Annu Rev Immunol 9:649–678

9. Janin J, Chothia C (1990) The structure of protein-protein recognition sites. J Biol Chem 265:16027–16030

10. Jones S, Thornton JM (1996) Principles of protein-protein interactions. Proc Natl Acad Sci U S A 93:13–20

11. Keskin O, Gursoy A, Ma B, Nussinov R (2008) Principles of protein-protein interactions: what are the preferred ways for proteins to interact? Chem Rev 108:1225–1244

12. Keskin O, Tuncbag N, Gursoy A (2016) Predicting protein–protein interactions from the molecular to the proteome level. Chem Rev 116:4884–4909

13. Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D (2004) The database of interacting proteins: 2004 update. Nucleic Acids Res 32:D449–D451

14. Oughtred R, Stark C, Breitkreutz BJ, Rust J, Boucher L, Chang C, Kolas N, O'Donnell L, Leung G, McAdam R, Zhang F, Dolma S, Willems A, Coulombe-Huntington J, Chatr-Aryamontri A, Dolinski K, Tyers M (2019) The BioGRID interaction database: 2019 update. Nucleic Acids Res 47:D529–D541

15. Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, Simonovic M, Doncheva NT, Morris JH, Bork P, Jensen LJ, Mering CV (2019) STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res 47:D607–D613

16. Young L, Jernigan RL, Covell DG (1994) A role for surface hydrophobicity in protein-protein recognition. Protein Sci 3:717–729

17. Keskin O, Bahar I, Badretdinov AY, Ptitsyn OB, Jernigan RL (1998) Empirical solvent-mediated potentials hold for both intra-molecular and inter-molecular inter-residue interactions. Protein Sci 7:2578–2586

18. Pawar AP, duBay KF, Zurdo J, Chiti F, Vendruscolo M, Dobson CM (2005) Prediction of "aggregation-prone" and "aggregation-susceptible" regions in proteins associated with neurodegenerative diseases. J Mol Biol 350:379–392

19. Pokarowski P, Kloczkowski A, Jernigan RL, Kothari NS, Pokarowska M, Kolinski A (2005) Inferring ideal amino acid interaction forms from statistical protein contact potentials. Proteins 59:49–57

20. Van Roy F (2013) The molecular biology of cadherins. In: Progress in molecular biology and translational science, vol 116. Academic Press, Oxford

21. Manibog K, Sankar K, Kim SA, Zhang Y, Jernigan RL, Sivasankar S (2016) Molecular determinants of cadherin ideal bond formation: conformation-dependent unbinding on a multidimensional landscape. Proc Natl Acad Sci U S A 113:E5711–E5720

22. Feng Y, Kloczkowski A, Jernigan RL (2010) Potentials 'R' Us web-server for protein energy estimations with coarse-grained knowledge-based potentials. BMC Bioinformatics 11:92

23. Miyazawa S, Jernigan RL (1985) Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules 18:534–552

24. Miyazawa S, Jernigan RL (1996) Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term for simulation and threading. J Mol Biol 256:623–644

25. Lin T-L, Song G (2011) Efficient mapping of ligand migration channel networks in dynamic protein. Proteins 79:2475–2490

Chapter 4

Computational Models for the Study of Protein Aggregation

Nguyen Truong Co, Mai Suan Li, and Pawel Krupa

Abstract

Protein aggregation has been studied by many groups around the world for many years because it can cause a number of neurodegenerative diseases that have no effective treatment. Obtaining the structures of the relevant fibrils and toxic oligomers, and describing the pathways and main factors that govern the self-organization process, is of paramount importance but also very difficult. To tackle this problem, experimental and computational methods are often combined to get the most out of each. The effectiveness of the computational approach largely depends on the construction of a reasonable molecular model. Here we discuss different versions of the four most popular all-atom force fields, AMBER, CHARMM, GROMOS, and OPLS, which have been developed for folded proteins, intrinsically disordered proteins, or both. Continuous and discrete coarse-grained models, which have mainly been used to study the kinetics of aggregation, are also summarized.

Key words Protein aggregation, Coarse-grained model, Lattice model, AMBER, CHARMM, GROMOS, OPLS

1

Introduction After synthesis in the ribosome, the protein can fold into a native state and is likely to become functional. However, under the influence of various factors, such as changes in the translation rate of codons, sequence, crowded environment, it can aggregate (Fig. 1), which can cause a number of neurodegenerative diseases [1]. Therefore, problem of protein aggregation has attracted the attention of many researchers in recent decades. Appearance of plaques from amyloid beta (Aβ) peptides and tau-protein in the brain is considered as a hallmark of Alzheimer’s disease [2], while accumulation of α-synuclein is believed to cause Parkinson’s disease [3]. In total, there are about 20 different neurodegenerative diseases associated with the self-assembly of various proteins, although fibrillar structures are applicable in some cases. The study of protein aggregation includes the determination of the structure of aggregates and mechanisms of their formation. Depending on the conditions, the aggregate can be amorphous

Mai Suan Li et al. (eds.), Computer Simulations of Aggregation of Proteins and Peptides, Methods in Molecular Biology, vol. 2340, https://doi.org/10.1007/978-1-0716-1546-1_4, © Springer Science+Business Media, LLC, part of Springer Nature 2022


Nguyen Truong Co et al.

Fig. 1 Schematic graph showing that a protein can either fold into its native state in order to be functional or misfold to aggregate. There are three possible scenarios for aggregation: the formation of toxic off-pathway oligomers, amorphous aggregate, and fibrillar structure through on-pathway oligomers

(e.g., at a high concentration of metal ions) or fibrillar with a cross-β structure, as shown experimentally by solid-state NMR, solution NMR, and cryo-EM. It was previously thought that only fibrils are toxic to the nervous system, but recent experimental studies have revealed that off-pathway oligomers are also highly toxic [4] (Fig. 1). It is therefore necessary to determine the structure of oligomers, but because of their transient nature, experimental methods alone cannot resolve it. In this context, computational tools such as molecular dynamics simulations are helpful. Amyloid fibril formation has been shown experimentally to follow a nucleation-growth mechanism [5, 6], in which the aggregated mass increases with time along a sigmoid curve (Fig. 2) with three characteristic stages: the lag phase, the elongation (growth) phase, and the equilibrium (saturation) phase [1, 7]. During the lag phase, soluble monomers randomly form unstable oligomers; its end is marked by the formation of a primary critical nucleus, which acts as a stable template onto which other peptides are favorably incorporated, contributing to fibril elongation. In the elongation phase, the growth of protofibrils from the template obeys the “dock-lock” mechanism [8]. The accumulation of fibril mass also benefits from secondary nucleation, in which existing small fibrils catalyze the formation of oligomers on their surface, resulting in β-sheet-rich species [9]. Finally, once the attachment of monomers to mature fibrils is balanced by their detachment, the system enters the saturation phase. Protein folding is known to be computationally challenging because of its rough free energy landscape, but the numerical

Computational Models for the Study of Protein Aggregation


Fig. 2 Sigmoid kinetics of the fibril formation process typically observed in fluorescence experiments. Blue and orange represent species favoring disordered and ordered configurations, respectively. 1SN refers to one-step nucleation kinetics, in which highly amyloidogenic peptides associate to form β-rich oligomers and then a critical nucleus. 2SN stands for two-step nucleation kinetics, in which poorly amyloidogenic peptides first form small amorphous oligomers that evolve into a disordered but stable critical nucleus, which gradually transforms into a β-rich template

study of fibril formation is even more difficult because fibril formation (hours to days) is about four orders of magnitude slower than folding (μs to s). Therefore, all-atom molecular simulations are usually limited to the fibril formation of short amyloid peptides or to the early stage of aggregation of longer proteins [10]. Coarse-grained models have often been used to study aggregation kinetics, but not to obtain fibrillar structures unless they are hybridized with all-atom models. There are many good reviews of coarse-grained [11] and all-atom force fields (FFs) [12, 13] and their application to the self-assembly of proteins. Our goal is therefore to provide a short survey of the latest developments in this area, focusing on coarse-grained models and our own contributions. Our review begins with a description of the main all-atom FFs developed for proteins with a well-defined native state, for intrinsically disordered proteins, or for both. We restrict ourselves to results obtained with these FFs for the aggregation of amyloid proteins and peptides. Most of the current off- and on-lattice models and their application to the study of the thermodynamics and kinetics of oligomerization and


fibrillation will be briefly discussed. Future directions in the development of computational models for the study of the aggregation of biomolecules will be outlined.
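The sigmoid curve and the roles of primary nucleation, elongation, and secondary nucleation described in this Introduction can be illustrated numerically. The minimal Python sketch below integrates the widely used moment equations for nucleated polymerization with secondary nucleation (a kinetic framework from the literature, e.g., Knowles, Cohen, and coworkers, not a model from this chapter). Here P is the fibril number concentration, M the fibril mass concentration, and m the free monomer concentration. All parameter values are dimensionless assumptions chosen only to make the three phases visible.

```python
def simulate_aggregation(m_tot=1.0, k_n=1e-4, n_c=2, k_plus=1.0,
                         k_2=1.0, n_2=2, t_end=20.0, dt=0.01):
    """Integrate fibril number P and fibril mass M with a fixed-step RK4 scheme.

    dP/dt = k_n * m**n_c + k_2 * m**n_2 * M   (primary + secondary nucleation)
    dM/dt = 2 * k_plus * m * P                 (elongation at both fibril ends)
    m = m_tot - M is the free-monomer concentration (mass conservation).
    All quantities are dimensionless and purely illustrative.
    """
    def rates(P, M):
        m = m_tot - M
        return (k_n * m**n_c + k_2 * m**n_2 * M,
                2.0 * k_plus * m * P)

    P, M = 0.0, 0.0
    times, masses = [0.0], [0.0]
    for step in range(1, int(round(t_end / dt)) + 1):
        a = rates(P, M)
        b = rates(P + 0.5 * dt * a[0], M + 0.5 * dt * a[1])
        c = rates(P + 0.5 * dt * b[0], M + 0.5 * dt * b[1])
        d = rates(P + dt * c[0], M + dt * c[1])
        P += dt * (a[0] + 2 * b[0] + 2 * c[0] + d[0]) / 6.0
        M += dt * (a[1] + 2 * b[1] + 2 * c[1] + d[1]) / 6.0
        times.append(step * dt)
        masses.append(M)
    return times, masses


def half_time(times, masses, m_tot=1.0):
    """First time at which half of the monomer mass is in fibrils (or None)."""
    for t, M in zip(times, masses):
        if M >= 0.5 * m_tot:
            return t
    return None


times, masses = simulate_aggregation()
for t, M in zip(times[::200], masses[::200]):  # every 2 time units
    print(f"t = {t:5.1f}   fibril mass fraction = {M:.3f}")
print(f"half time = {half_time(times, masses):.2f}")
```

With these assumed parameters, the fibril mass fraction stays close to zero for the first few time units (lag phase), rises steeply (growth phase), and levels off near 1 (saturation). Setting `k_2=0` switches off secondary nucleation, and the lag phase then becomes much longer, which mirrors the catalytic role of existing fibrils.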

2 Application of All-Atom Models to Study Protein Aggregation

All-atom FFs are the most natural choice for theoretical studies of many biological phenomena involving proteins, nucleic acids, lipids, and sugars. Most protein FFs are based on Anfinsen’s dogma, which states that, at least for small globular proteins, the native structure is determined only by the amino acid sequence [14]. However, protein folding is so complicated that it is impossible to sample all possible protein conformations [15]. Therefore, one should focus on a dominant folding pathway that leads to the native structure [16]. Although this holds for many proteins, protein aggregates and fibrils have been found to possess very stable, low-energy structures that are kinetically preferable [17]. The high stability of fibrils is one of the reasons why diseases such as Alzheimer’s disease, Parkinson’s disease, or dementia with Lewy bodies are so difficult to treat [18, 19].

2.1 Typical FFs Used for Studies of Systems Involving Peptides or Proteins

The four most popular all-atom FFs used to study proteins are probably Amber [20], CHARMM [21], GROMOS [22], and OPLS [23]. Strictly speaking, only GROMOS and OPLS-UA are united-atom FFs, in which some hydrogen atoms are implicitly included in the parameters of the heavy atoms to which they are covalently bound; however, the other FFs also use tricks to speed up simulations by constraining the motion of some or all hydrogen atoms, such as the SHAKE [24] and RATTLE [25] algorithms, or SETTLE [26] for rigid water molecules. All of these FFs are non-polarizable and belong to the first generation of FFs, which share a very similar energy function [27]. Therefore, the most significant differences between them stem mostly from the parameterization methods used to obtain their parameters [28]. All popular all-atom FFs exist in many versions and variants developed over the years, and although modern FFs are, in general, much more accurate than old ones, new versions are not always universally better. For instance, they may overestimate the α-helical [29] or β-sheet [30] content. Thus, one should always be aware of the possible bias of a method [31]. Popular FF versions used to study protein systems include Amber ff99sb [32], ff14sb [33], and the new ff19sb [34], which is recommended for use with a more advanced water model, e.g., the three-charge, four-point rigid water model OPC [35], instead of the regular three-site water model TIP3P [36]; CHARMM C22/CMAP (C27) [37, 38], C36 [39], and the new C36m [40], designed with improvements for both regular and disordered proteins; GROMOS


54A7 and 54B7, which improved the stability of proteins compared to 53A6 [41], and the newer 54A8 [42], which recalibrated the interactions of charged residues [43]; OPLS-AA [44], OPLS-AA-L [45], and other modifications, such as L-OPLS [46] and OPLS-AA/M [47, 48]. An additional issue concerns peptides and proteins without stable secondary and tertiary structures, such as intrinsically disordered proteins (IDPs) [49], for which FFs often overestimate structural stability; therefore, variants specially designed for these cases have been developed [50], such as Amber ff14IDPs [51] and the further improved ff14IDPSFF [52] and CHARMM36IDPSFF [53], which significantly improve the agreement of simulations with experimental observables [54]. Because water models can significantly change the properties of disordered proteins [55, 56] owing to the strong interaction of such molecules with water [57, 58], Table 1 summarizes popular state-of-the-art FFs together with the water models recommended for them.

2.2 Applications of All-Atom Models

Due to computational limitations, for many years most applications of all-atom FFs focused on single-peptide or single-protein systems [69]. Klimov and Thirumalai [70] were the first to apply an all-atom model to study the aggregation of three short Aβ16–22 peptides (KLVFFAE), which stimulated much work in this area. An interesting approach to overcoming the limited simulation time available for large aggregating systems is to study the addition of a monomeric chain to oligomers and fibrils, which revealed a two-stage dock-and-lock mechanism when the oligomer is large enough [8]. In some cases, aggregation can be studied with only two chains, or with many heavily truncated chains, provided that such systems still describe the process well [71]. Good examples are studies in which the aggregation propensity was assessed from the population of fibril-prone structures [72], such as the work of Viet and Li, who demonstrated that the addition of Aβ40 inhibits Aβ42 aggregation [73]. Thanks to truncation, the aggregation of dimers, trimers, and larger oligomers [74] can be investigated with enhanced sampling methods such as replica exchange molecular dynamics [75–77], which helps overcome energy barriers and therefore explores the conformational space more thoroughly. On the other hand, we found that for monomeric IDPs described with modern FFs, conventional MD simulations, which can be accelerated with GP-GPU calculations [78, 79], may be sufficient because the energy barriers between conformations are small [58]. This is especially useful because the β-content of monomers may be enough to predict their aggregation rates [80]; however, the aggregation rate alone provides no information about the conformation and toxicity of the aggregates, and small changes in the sequence may, for example, induce the formation of nontoxic ellipse-like aggregates


Table 1 Summary of modern all-atom FFs for studying structured proteins and IDPs. Official distributions of the FF versions are highlighted by gray background. In the “Systems” column, S and D indicate that the FF is suitable for structured and/or disordered proteins, respectively

FF | Parameters | Systems | Notes | Ref. | Recommended water model | Ref.
AMBER | ff99sb | S | Old and very popular FF | [32] | TIP3P | [36]
AMBER | a99SB-disp | S+D | Optimized a99SB-ILDN FF | [59] | TIP4P-D | [55]
AMBER | ff99SBnmr2 | S+D | Improved ff99SBnmr1 FF | [60] | TIP3P for folded and TIP4P-D for IDPs | [36, 55]
AMBER | ff03CMAP | S+D | Correction map (CMAP)-optimized ff03sb FF | [61] | TIP4P-Ew for folded and TIP4P-D for IDPs | [55, 62]
AMBER | ff14sb | S | Improved ff99sb by tuning dihedral potentials | [33] | TIP3P | [36]
AMBER | ff14IDPs | D | ff14sb FF with modifications for 8 residues | [51] | TIP3P | [36]
AMBER | ff14IDPSFF | D | ff14sb FF with modifications for 8 residues | [52] | TIP3P | [36]
AMBER | ESFF1 | S+D | Extended ff14sb FF with 71 backbone CMAP energy terms | [63] | TIP4P-D | [55]
AMBER | ff19sb | S+D | Improved backbone profiles from ff14sb | [34] | OPC | [35]
CHARMM | C22 | S | Very old and popular version of the FF | [37] | TIP3P | [36]
CHARMM | C22/CMAP | S | C22 with inclusion of an energy correction map (CMAP) | [38] | Modified TIP3P | [64]
CHARMM | C36 | S | Recent version of the FF for regular proteins | [39] | Modified TIP3P (εH = −0.046 kcal/mol) | [39]
CHARMM | C36m | S+D | Optimized C36 FF with emphasis on disordered proteins | [40] | Modified TIP3P (εH = −0.100 kcal/mol) | [40]
CHARMM | CHARMM36IDPSFF | D | C36m with modified CMAP parameters for 20 residues | [53] | Modified TIP3P (εH = −0.100 kcal/mol) | [40]
GROMOS | 54A7 | S | Popular FF version | [41] | SPC | [65]
GROMOS | 54A7_β | S | Optimization of β-structures in 54A7 FF | [66] | SPC | [65]
GROMOS | 54A8 | S | 54A7 with recalibrated nonbonded interactions of charged residues | [42] | SPC | [65]
OPLS | OPLS-AA | S | All-atom version of OPLS FF | [44] | |
OPLS | OPLS-AA-L | S | OPLS-AA with optimized key Fourier torsional coefficients | [45] | |
OPLS | OPLSIDPSFF | S | Residue-specific variant of OPLS-AA-L FF | [67] | TIP4P-D | [55]
OPLS | L-OPLS | S | Improved treatment of long hydrocarbons (e.g., lipid bilayers) | [46] | TIP3P-MOD | [68]
OPLS | OPLS-AA/M | S+D | Improved OPLS-AA FF for proteins with additional parameters for nucleic acids | [47, 48] | TIP3P | [36]

[81]. Another important caveat is that truncated systems should be studied with care, because removing even one or two amino acid residues can significantly change properties such as the aggregation rate, especially for IDPs [82]. The influence of lipid bilayers on amyloid peptide aggregation, and of the peptides on bilayer stability, has also been investigated for monomers [83], dimers [84], and tetramers [85], including structures manually inserted into the membrane that form β-barrels [86], which have also been found experimentally [87]. For amyloid beta, studying the inhibition of aggregation by compounds such as fullerenes [88] and their derivatives [89], curcumin [90], or small peptides [91] is as important as studying the aggregation itself. Many groups pursue such studies to provide a theoretical background for fibril-related diseases [10]. Computational studies of aggregates of amyloidogenic polypeptides such as Aβ, α-synuclein, islet amyloid polypeptide, tau protein, and prion protein have recently been reviewed by Ilie and Caflisch [92].
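The replica exchange step mentioned above can be sketched compactly. In temperature REMD, copies (replicas) of the system are simulated in parallel at different temperatures and periodically attempt to swap configurations. A swap between replicas at temperatures T_i and T_j with potential energies E_i and E_j is accepted with the Metropolis probability min{1, exp[(β_i − β_j)(E_i − E_j)]}, where β = 1/(k_B T). The Python sketch below implements only this acceptance step with an alternating even/odd neighbor scheme. The geometric temperature ladder and the energies are illustrative assumptions, and the MD propagation between exchanges, which a real MD package performs, is omitted.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant, kcal/(mol*K)


def geometric_ladder(t_min, t_max, n):
    """n temperatures (K) spaced geometrically between t_min and t_max."""
    ratio = (t_max / t_min) ** (1.0 / (n - 1))
    return [t_min * ratio**k for k in range(n)]


def swap_probability(e_i, e_j, t_i, t_j):
    """Metropolis probability of exchanging configurations between two replicas.

    e_i, e_j: potential energies (kcal/mol) of the configurations currently
    simulated at temperatures t_i, t_j (K).
    """
    delta = (1.0 / (K_B * t_i) - 1.0 / (K_B * t_j)) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_neighbor_swaps(energies, temps, offset, rng):
    """Try swaps between neighbors (k, k+1) for k = offset, offset + 2, ...

    Accepted swaps exchange the configurations, represented here only by
    their energies; returns the list of accepted k.
    """
    accepted = []
    for k in range(offset, len(temps) - 1, 2):
        if rng.random() < swap_probability(energies[k], energies[k + 1],
                                           temps[k], temps[k + 1]):
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
            accepted.append(k)
    return accepted


rng = random.Random(0)
temps = geometric_ladder(300.0, 400.0, 6)
energies = [-1250.0, -1240.0, -1228.0, -1222.0, -1209.0, -1200.0]  # made-up values
for sweep in range(4):  # alternate even and odd neighbor pairs
    print(sweep, attempt_neighbor_swaps(energies, temps, sweep % 2, rng))
```

Swapping low-energy configurations down the ladder is what lets high-temperature replicas, which cross barriers easily, supply new conformations to the low-temperature ones.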

3 Coarse-Grained Models

In general, coarse-graining rests on the assumption that reducing the number of interaction centers decreases the computational time required for every MC or MD step, and that the conformational space can be searched more thoroughly thanks to a smoother energy landscape (Fig. 3) and better sampling. Simplifying the system representation always carries the risk that important details are lost, negatively impacting the accuracy of the method [93, 94]; therefore, users should be even more careful to check whether a given approach provides satisfactory results for the phenomenon under investigation [95, 96]. On the positive side, coarse-grained FFs can provide a much wider view than all-atom methods because a system or phenomenon can be studied much more extensively: a larger number of longer trajectories yields better statistics and average properties than single-trajectory studies and can reveal secondary pathways of some processes [97]. All types of coarse-grained FFs, including structure-based, knowledge-based, and dynamics-based models, are currently under intensive development to enable reliable simulations of macromolecular complexes [98]. There are also efforts to develop and use multiscale coarse-grained simulations of biological systems [99, 100] and to include polarization [101] and reactivity [102]; however, the latter are not yet advanced enough for complex biomacromolecular systems.
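The reduction of interaction centers that motivates coarse-graining can be made concrete with a small sketch. It maps atoms onto one bead per residue placed at the residue’s center of mass and compares the number of nonbonded pairs before and after mapping. The one-bead-per-residue mapping, the toy coordinates, and the assumed average of 16 atoms per residue are illustrative assumptions; the FFs discussed below use their own, often multi-bead, representations.

```python
def residue_beads(atoms):
    """Map atoms onto one bead per residue, placed at the residue's center of mass.

    atoms: iterable of (residue_id, mass, (x, y, z)).
    Returns a dict residue_id -> (x, y, z), in order of first appearance.
    """
    totals = {}
    for res_id, mass, (x, y, z) in atoms:
        m, mx, my, mz = totals.get(res_id, (0.0, 0.0, 0.0, 0.0))
        totals[res_id] = (m + mass, mx + mass * x, my + mass * y, mz + mass * z)
    return {res_id: (mx / m, my / m, mz / m)
            for res_id, (m, mx, my, mz) in totals.items()}


def pair_count(n_sites):
    """Number of distinct site pairs, i.e. naive nonbonded interaction terms."""
    return n_sites * (n_sites - 1) // 2


# Toy fragment: two carbon-like atoms in residue 1, one oxygen-like atom in residue 2.
atoms = [(1, 12.0, (0.0, 0.0, 0.0)), (1, 12.0, (2.0, 0.0, 0.0)),
         (2, 16.0, (5.0, 1.0, 0.0))]
print(residue_beads(atoms))

# Hypothetical 100-residue chain with an assumed average of 16 atoms per residue.
print(pair_count(100 * 16), "atom pairs vs", pair_count(100), "bead pairs")
```

Because the pair count grows quadratically with the number of sites, even this crude mapping cuts the naive nonbonded work by more than two orders of magnitude.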

3.1 Typical FFs Used for Studies of Systems Involving Peptides or Proteins

Although there are many in-house coarse-grained FFs dedicated to studies of a very limited number of systems, or even single cases [103, 104], there are also general-purpose coarse-grained FFs, such as AWSEM [105], CABS [106], MARTINI [107, 108],

Fig. 3 Schematic representation of the all-atom, coarse-grained, and lattice representations and respective energy landscapes


OPEP [109, 110], PaLaCe [111], PRIMO [112], SIRAH [113, 114], and UNRES [115–117]. Unlike all-atom FFs, coarse-grained FFs differ significantly not only in the parameters of their interaction centers but also in system representation (level of reduction) and energy functions [93, 118]. Although such universal FFs in principle allow the study of a plethora of phenomena, such as protein folding, conformational changes, and aggregation, the complexity of these processes means that the performance of every method should be verified before application. One good example is the MARTINI FF, which was found to cause excessive, irreversible, and non-selective aggregation of membrane proteins [119], one of the problems targeted for the next version of the FF [120]. Another issue comes from the unstructured character of IDPs, which are poorly described by most FFs, mostly because they overestimate secondary structure stability. In some cases, this can simply be fixed by tuning the energy terms responsible for secondary structure, as in AWSEM-IDP [121], or by increasing the strength of protein–water interactions [122]; sometimes, however, designing a completely new method is more convenient than modifying an existing one, as in the case of FRAGFOLD-IDP [123] or the redesigned CABS-flex [124]. One should keep in mind that although an IDP-specific version of a method should improve performance for these proteins, it may corrupt results for structured systems. Another problem may arise because the majority of coarse-grained FFs use knowledge-based potentials. Their use to study the aggregation of IDPs, such as Aβ42, may be problematic because they are strongly biased by the structures deposited in the Protein Data Bank [125] and used for parametrization, which for such proteins are available only for fibrils, not for monomeric or oligomeric forms in water.

3.2 Applications of General-Purpose Coarse-Grained FFs

Despite these limitations, every popular coarse-grained FF has at some point been used successfully to study aggregation- or oligomerization-related effects. This is unavoidable, as more than 80% of proteins are predicted to exist in the cell not alone but in complexes [126]. A few examples are presented below. The AWSEM Amylometer is a useful and powerful tool for predicting amyloidogenic segments from the sequence, which provides additional information on the thermodynamic and kinetic roles of these segments in folding and aggregation based on the AWSEM FF [127]. Very recently, the same FF was used to study the nucleation of two fibrils derived from patients with Pick’s and Alzheimer’s diseases, showing the importance of oligomeric structure for fibrillization: oligomers with parallel in-register β-strands lead to fibril formation, whereas disordered β-strand stacking leads to amorphous structures [128]. Even though in the


MARTINI model the secondary structure is fixed during simulation, it has still been used successfully in some structural studies of the aggregation dynamics of self-assembling systems, such as self-assembling peptides [129], dipeptides [130], and other short peptides [131], and even of the effect of a lipid bilayer on the Aβ peptide [132]. Newer FFs, such as PRIMO and PaLaCe, have not yet been used extensively, but there are examples of association studies even for them [133]. Various coarse-grained FFs have also been used to study the aggregation of other molecules, such as α-synuclein, for which the formation of a β-hairpin in region 38–53 was found to be necessary for aggregation [134] and mutations in the non-amyloid-β component (NAC) were found to disturb aggregation [135]. The influence of molecules such as trehalose, which promotes an alternative aggregation pathway leading to amorphous aggregates, has also been studied as a possible way to treat Parkinson’s disease [136]. Another protein commonly studied for its aggregation is tau, for which the nucleation kinetics of hexapeptide fragments involved in fibril formation was extensively studied with a coarse-grained FF [137], as was the effect of temperature on fibril formation [138].

3.3 OPEP Coarse-Grained FF

The coarse-grained FF most extensively used for studies of peptide and protein aggregation is OPEP (Optimized Potential for Efficient Protein Structure Prediction), developed by the group of Philippe Derreumaux for more than 20 years [109]. In OPEP, the protein chain representation is reduced so that the backbone keeps all its atoms (N, Cα, C, O, and H), while each side chain is simplified to a single bead (except for proline, which is described by all heavy atoms), a good compromise between accuracy and speed [139]. The authors not only carefully designed the FF and model but also extensively tested its protein folding capabilities [140] and pH dependence [141], and investigated the influence of the simulation temperature [142] and thermostat on the results [143]. Using ART-OPEP simulations, they demonstrated that the formation of metastable oligomeric structures is an important step in the fibrillization process and provided a possible explanation of the pH dependence of β-sheet formation [144]. Later, using OPEP and an all-atom FF, they showed that the formation of stable β-barrel structures of NHVTLSQ oligomers [145] may be an important early step in fibril formation [146]. Very recently, the same group demonstrated with all-atom methods that similar structures may be formed by truncated Aβ11–40 in dipalmitoylphosphatidylcholine membrane models [147]. In other recent studies, OPEP was used to study 1000


chains of the heavily truncated Aβ16–22 fragment to show the prefibril elongation mechanism, including pore and branch formation during aggregation [148]. Thanks to the addition of hydrodynamic effects, the OPEP FF can better capture the kinetics of aggregation and association processes, speeding up the collapse of molecules by about 40% [149].

3.4 UNRES Coarse-Grained FF

UNRES (UNited RESidue) is a coarse-grained physics-based FF in which the polypeptide chain is reduced to two types of interaction centers: united peptide groups and united side chains. It has been developed for more than 20 years by Liwo, Scheraga, and coworkers to allow realistic studies of peptides and proteins. It enables ab initio studies of the folding of many single-chain [95, 150, 151] and multi-chain proteins [152], including their aggregation for two [153], four [154, 155], and more chains, with good accuracy if sufficient simulation time can be reached. UNRES can work purely as an ab initio method, without any information from databases, or can use information such as predicted contacts between residues [156], SAXS data [157], or domain [158] and protein fragment [159] structures. In the past, UNRES was used to study the growth mechanism of Aβ fibrils by adding a monomer to an existing fibril template without any additional bias, which confirmed the dock-lock mechanism with two distinct locking stages and the importance of hydrophobic contacts between chains [160]. These studies were recently extended to determine the most probable pathways of fibril elongation and the residues necessary for the process to occur [161]. Molecular dynamics simulations with the UNRES FF also showed that an α-to-β transition of the 17–21 residue fragment is important to allow propagation along the 28-residue amino-terminal fragment of Aβ chains [162]. It was demonstrated that in the presence of an extended 16–21 residue fragment of Aβ (as in fibrils, but not in monomers), Aβ can bind to the repeat domain of tau, forming Aβ40 fibril–tau aggregates [163]. In our most recent studies, we showed using multi-scale simulations that tetrameric structures of Aβ differ significantly from Aβ fibrils [155] and that Aβ–water interactions are key to the stabilization of monomers [58] and small oligomers.
We also showed that tetramers form mostly through the association of two dimers rather than of a trimer and a monomer (Fig. 4), as also suggested by other studies using the OPLS/AA FF [164].
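A transition network such as the one in Fig. 4 is built by assigning each snapshot an oligomeric state and counting transitions between consecutive states. The Python sketch below is an assumed, simplified version of that analysis. Chains count as bound when they share at least one contact (given here as precomputed chain-index pairs), clusters are found with a union-find structure, and the state of a frame is its sorted tuple of cluster sizes. The contact criterion and the toy trajectory are hypothetical, and the published analysis [155] may differ.

```python
from collections import Counter


def cluster_sizes(n_chains, contact_pairs):
    """Sorted tuple of cluster sizes, using union-find over inter-chain contacts."""
    parent = list(range(n_chains))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in contact_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    sizes = Counter(find(i) for i in range(n_chains))
    return tuple(sorted(sizes.values(), reverse=True))


def transition_counts(states):
    """Count transitions between consecutive, different oligomeric states."""
    return Counter((s, t) for s, t in zip(states, states[1:]) if s != t)


# Toy trajectory of four chains: contacts listed per frame as chain-index pairs.
frames = [[], [(0, 1)], [(0, 1), (2, 3)], [(0, 1), (2, 3), (1, 2)]]
states = [cluster_sizes(4, contacts) for contacts in frames]
for (src, dst), n in transition_counts(states).items():
    print(src, "->", dst, n)
```

Counting, for example, how often (4,) is reached from (2, 2) rather than from (3, 1) is the kind of statistic behind the dimer-plus-dimer conclusion.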

[Fig. 4 graphic: nodes labeled with oligomer sizes 1 (1.5 %), 2 (13.9 %), 3 (8.0 %), and 4 (76.6 %), connected by arrows labeled with transition percentages]

Fig. 4 Transition network from UNRES REMD simulation showing oligomer size (label on each node) and transitions between different forms (arrows) [155]

4 Other Off-Lattice Coarse-Grained FFs

In the following paragraphs, we briefly describe some off-lattice coarse-grained models designed and applied to study peptide aggregation. Since there are many models of this type, for convenience we name them after the authors who developed them. The representation of amino acids in most of the models described below is more simplified than in general-purpose coarse-grained FFs.

4.1 The Vacha–Frenkel Model

Vacha and Frenkel constructed a generic CG model in which the peptides are represented as patchy sphero-cylinders (PSCs) with an attractive stripe on their side representing their binding ability. Using Monte Carlo simulations, the authors showed that this model can predict the existence of oligomeric states and capture a two-filament amyloid-like structure [165]. Then, in a modified two-state model specifically designed to describe Aβ peptides, the patchy particles can switch between α- and β-states, corresponding to the soluble peptide and to the extended peptide conformation of the Aβ amyloid structure, respectively. By performing 2 μs of Dynamic Monte Carlo (DMC) simulation for a system of 600 PSCs, the authors were able to observe


fibrillar species with a morphology similar to experimental observations. The kinetics of patchy self-assembly is consistent with Oosawa’s theory, and the critical nucleus size was estimated to be about 3.8 chains [166]. Based on the PSC model, the nucleation-dependent kinetics of fibril formation was studied further [167–169], and the effect of various surfaces on the rate of amyloid formation was systematically investigated [170].

4.2 The Barz–Urbanc Model

Barz and Urbanc defined the unit of their minimal self-assembly model as a tetrahedron with two attractive (hydrophobic) and two repulsive (hydrophilic) beads at its vertices [171]. The model employs a discrete MD algorithm with periodic boundary conditions and an implicit solvent. By varying the hydropathic parameter η, the ratio between repulsive and attractive interactions, the authors obtained various aggregate morphologies, such as quasi-spherical oligomers, curved tubules, curvilinear protofibrils, and multi-domain aggregates. The mechanisms of monomer addition, assembly fusion, and breakdown were also reported [171].
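The geometry of the Barz–Urbanc building block can be sketched as a regular tetrahedron with two hydrophobic (H) and two hydrophilic (P) beads at its vertices. The energy function below is a hypothetical square-well stand-in, not the published potential: H–H bead pairs within a cutoff contribute −ε, and any pair involving a P bead contributes +ηε, so that η plays the role of the repulsive-to-attractive ratio described above. The edge length, cutoff, and ε are arbitrary assumptions.

```python
import math

# Vertices of a regular tetrahedron (edge length 2*sqrt(2) before scaling).
_TETRA = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
BEAD_TYPES = ("H", "H", "P", "P")  # two hydrophobic and two hydrophilic beads


def make_unit(center, edge=1.0):
    """Bead coordinates of one tetrahedral unit centered at `center`."""
    scale = edge / (2.0 * math.sqrt(2.0))
    cx, cy, cz = center
    return [(cx + scale * x, cy + scale * y, cz + scale * z) for x, y, z in _TETRA]


def unit_pair_energy(unit_a, unit_b, eta, eps=1.0, cutoff=1.5):
    """Hypothetical square-well energy between two units (not the published potential).

    Bead pairs closer than `cutoff` contribute -eps if both are hydrophobic and
    +eta*eps otherwise, so eta sets the repulsion-to-attraction ratio.
    """
    energy = 0.0
    for pos_a, type_a in zip(unit_a, BEAD_TYPES):
        for pos_b, type_b in zip(unit_b, BEAD_TYPES):
            if math.dist(pos_a, pos_b) < cutoff:
                energy += -eps if type_a == type_b == "H" else eta * eps
    return energy


a = make_unit((0.0, 0.0, 0.0))
b = make_unit((1.0, 0.0, 0.0))  # a neighboring unit at center-to-center distance 1
for eta in (0.1, 0.5, 1.0):
    print(f"eta = {eta}: E = {unit_pair_energy(a, b, eta):+.2f}")
```

For a fixed geometry, increasing η can only raise the energy, which is the lever the authors used to move between aggregate morphologies.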

4.3 The Hoang–Trovato–Seno–Banavar–Maritan Model

Auer et al. extended the Hoang–Trovato–Seno–Banavar–Maritan tube model [172], in which each residue is reduced to its Cα atom and the chain is represented as a flexible tube, to study the nucleation and growth mechanism of peptide fibrillation [173] and to shed light on the kinetics of conversion from disordered oligomers into protofilaments [174]. Hoang et al. applied the model to study the sequence dependence of the aggregation process, showing that a fibril template created by highly fibril-prone sequences can help poorly amyloidogenic sequences form fibril-like structures [175].
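A key ingredient of the tube model is a three-body constraint: the radius of the circle through any three Cα beads must not be smaller than the tube thickness Δ. The Python sketch below checks this constraint for a list of Cα coordinates. The values Δ = 2.5 Å and a bond length of 3.8 Å used in the examples are assumptions for illustration, and the hydrogen-bond and hydrophobic terms of the full model are omitted.

```python
import itertools
import math


def circumradius(p, q, r):
    """Radius of the circle through three points; inf for collinear points."""
    a, b, c = math.dist(q, r), math.dist(p, r), math.dist(p, q)
    # Twice the triangle area is the norm of the cross product of two edges.
    u = [q[k] - p[k] for k in range(3)]
    v = [r[k] - p[k] for k in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    twice_area = math.sqrt(sum(x * x for x in cross))
    if twice_area < 1e-12:
        return math.inf
    return a * b * c / (2.0 * twice_area)  # R = abc / (4 * area)


def tube_violations(coords, delta=2.5):
    """All bead triplets (i, j, k) whose circumradius is smaller than delta."""
    return [
        (i, j, k)
        for i, j, k in itertools.combinations(range(len(coords)), 3)
        if circumradius(coords[i], coords[j], coords[k]) < delta
    ]


# Two 3.8 Å bonds meeting at 90 degrees: circumradius = 3.8*sqrt(2)/2, about 2.69 Å.
bend = [(3.8, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 3.8, 0.0)]
print(round(circumradius(*bend), 2), tube_violations(bend))
```

Checking all triplets, not only consecutive ones, is what makes the constraint act as excluded volume between distant parts of the tube as well as a limit on local curvature.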

4.4 The Mioduszewski–Cieplak Model

Cieplak and Mioduszewski developed a one-bead-per-residue Cα model to investigate intrinsically disordered proteins, with a unique design in which contacts between beads can form and disappear during MD simulation [176]. Using their model, the authors constructed phase diagrams of polyglutamine and polyalanine that not only confirmed the existence of a liquid–gas coexistence curve at room temperature but also revealed a novel amyloid-glass phase corresponding to fibril-like structures of the proteins at low temperatures [177]. Recently, the model was updated with nonradial multibody pseudo-improper-dihedral potentials, which capture protein and protein assembly properties more accurately in MD simulations [178].

4.5 The Ilie–den Otter–Briels Model

Ilie and coworkers developed a highly CG polymorph patchy particle model [179] to study α-synuclein and its self-assembly. In their model, each protein is treated as a single particle with a changeable internal state that characterizes the structural adaptability of this intrinsically disordered protein.


The disordered state is described as a soft sphere and the β-sheet state as a solid sphero-cylinder with a long attractive stripe. The probability of switching between the two states was set to favor the ordered state for bound particles. The authors matched the particle parameters to experimental data on α-synuclein and performed Brownian dynamics simulations. They found that the kinetics of fibril formation conformed to either the nucleation-and-growth mechanism or a two-step mechanism. Furthermore, preformed fibrils promoted the conversion of oligomers to fibrils. The authors also introduced a higher-resolution version of α-synuclein built from connected polymorph particles and examined the kinetics of peptide incorporation at the fibril end [180].

4.6 The Pellarin–Caflisch Model

Pellarin and Caflisch developed a two-state CG phenomenological model of a simple amphipathic peptide consisting of ten beads, six of which represent the side chains and the remaining four the backbone [181]. The peptide can rotate around its internal dihedral, the only such degree of freedom in the system, to switch between the amyloid-competent (β) and amyloid-protected (π) states, and the energy barrier between the two states modulates the amyloidogenicity of the peptide. For highly amyloid-prone peptides (HAPs), aggregation occurred even at concentrations lower than the critical micelle concentration (CMC), and the fibril mass accumulated directly through a single pathway with a small nucleus, without micelles or intermediate protofibrils, with a growth rate strongly dependent on peptide concentration. Poorly amyloid-prone peptides (PAPs) nucleated through multiple pathways with a large nucleus at concentrations above the CMC, and fibril formation proceeded slowly through different metastable intermediates; in this case, the concentration had little impact on the fibril growth rate [181, 182]. The simulations also indicated that HAPs preferentially formed the most stable fibril morphology, whereas for PAPs the fibril morphology was regulated by kinetics [183]. Additionally, the self-assembly of PAPs was accelerated by crowding and retarded by membranes as well as surfactants, but it played no clear role in membrane leakage. In contrast, the fibril formation of HAPs promoted membrane leakage, was slightly enhanced by membranes, was insensitive to crowders, and was only marginally influenced by surfactants [184–186].
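How a barrier between two conformational states controls switching, the ingredient used in the Pellarin–Caflisch model to tune amyloidogenicity, can be illustrated with Metropolis Monte Carlo on a single dihedral angle. The double-well potential V(φ) = E_b sin²φ, with wells at φ = 0 and φ = π separated by barriers of height E_b (in units of k_BT), is a generic assumption rather than the model’s actual torsional potential. The step size and number of steps are also arbitrary.

```python
import math
import random


def count_switches(barrier, n_steps=100_000, step=0.3, seed=1):
    """Metropolis MC on one dihedral angle with V(phi) = barrier * sin(phi)**2.

    Energies are in units of kT. Wells sit at phi = 0 and phi = pi (mod 2*pi);
    the current well is identified by the sign of cos(phi), and every change of
    that sign is counted as one crossing of the barrier top.
    """
    rng = random.Random(seed)
    phi = 0.0
    energy = 0.0
    in_first_well = True  # cos(0) > 0
    switches = 0
    for _ in range(n_steps):
        trial = phi + rng.uniform(-step, step)
        trial_energy = barrier * math.sin(trial) ** 2
        if trial_energy <= energy or rng.random() < math.exp(energy - trial_energy):
            phi, energy = trial, trial_energy
        now_first_well = math.cos(phi) > 0.0
        if now_first_well != in_first_well:
            switches += 1
            in_first_well = now_first_well
    return switches


for barrier in (1.0, 4.0, 8.0):
    print(f"barrier = {barrier:.0f} kT: {count_switches(barrier)} barrier crossings")
```

The number of crossings drops roughly exponentially with the barrier height, which is why a barrier parameter alone can separate fast-switching, highly amyloid-prone chains from slowly switching, poorly amyloid-prone ones.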

4.7 The Bellesia–Shea Model

Each amino acid residue in the Bellesia–Shea model [187] is represented by three beads: two for the backbone and one for the side chain. The authors introduced four types of side chains (hydrophobic, polar, positively charged, and negatively charged), as well as end groups capping both termini of the peptide to prevent edge-to-edge aggregation. The dihedral potential of the backbone was used to tune the β-sheet propensity of peptides. The model shared a


similar self-assembly kinetics with the Pellarin–Caflisch CG model [188]: for HAPs, the aggregation pathway included a small ordered β-strand nucleus, which subsequently acted as a template for fibril growth, while PAPs initially formed amorphous clusters that gradually transformed into ordered structures. The Bellesia–Shea model can capture diverse structures such as disordered oligomers, β-barrels, and multilayer fibrils. Simulations in the presence of adsorbing solid surfaces and lipid bilayer membranes showed that both environments promote the formation of the β-sheet motif near the surfaces [189].

5 On-Lattice Models for the Study of Protein Aggregation

This section discusses the most simplified, on-lattice coarse-grained models used to study protein aggregation. Such models are usually designed to solve a specific problem and focus on general physical principles rather than on biology. In particular, we pay more attention to the lattice model originally constructed by Li, Klimov, and Thirumalai [190] and later developed and extended by Li and Co [191–195] to various systems.

5.1 The Irbäck–Jonsson–Linnemann–Linse–Wallin Model

Irbäck et al. designed a model [196] in which peptides are represented by unit-length sticks located on lattice sites. Each peptide is described by three vectors characterizing the backbone, hydrogen-bond, and side-chain orientations (Fig. 5a). The sticks move according to an MC algorithm. MC simulations of more than 10⁵ monomers captured the sigmoidal kinetics of fibril growth. In addition, the interplay between length and width during fibril nucleation indicated that longitudinal growth of fibril-like structures occurs only after their width reaches a threshold value [196].

Fig. 5 Monomers and fibril-like structures in several lattice models. (a) Stick model [196], (b) cuboid model [197], (c) one-bead lattice model, (d) multi-bead lattice model [198] (the small part represents a fragment of peptide)

5.2 The Zhang–Muthukumar Model

To study the nucleation and elongation of amyloid fibrils, Zhang and Muthukumar developed a lattice model in which monomers are represented by cuboid unit cells in a cuboidal simulation box (Fig. 5b) [197]. A unit cell can represent an extended peptide, a folded peptide, or a pair of peptides, and it moves by MC random walks. The simulations showed that monomer aggregation is nucleation-dependent, with the lag phase and elongation stage following the Ostwald ripening mechanism [197].
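The random-walk moves of lattice units can be sketched as follows. This is a minimal illustration assuming a simple cubic box with periodic boundaries and pure excluded volume. It omits the interaction energies and acceptance rules of the actual Zhang–Muthukumar model, and `random_walk_step` is a hypothetical helper, not code from [197].

```python
import random

# Six nearest-neighbour moves on a simple cubic lattice.
MOVES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def random_walk_step(positions, box, rng):
    """Attempt one random-walk move for a randomly chosen unit cell.

    positions: list of (x, y, z) lattice sites, one per unit cell (modified in place)
    box: (Lx, Ly, Lz) box size; periodic boundary conditions are assumed
    Returns True if the move was accepted (target site empty), otherwise False.
    """
    occupied = set(positions)
    i = rng.randrange(len(positions))
    dx, dy, dz = rng.choice(MOVES)
    x, y, z = positions[i]
    new = ((x + dx) % box[0], (y + dy) % box[1], (z + dz) % box[2])
    if new in occupied:  # excluded volume: reject moves onto occupied sites
        return False
    positions[i] = new
    return True


if __name__ == "__main__":
    rng = random.Random(1)
    cells = [(0, 0, 0), (2, 2, 2), (1, 3, 0)]
    accepted = sum(random_walk_step(cells, (4, 4, 4), rng) for _ in range(1000))
    print(f"accepted {accepted} of 1000 moves; final positions {cells}")
```

In a full model, the occupancy test would be followed by an energy-based acceptance step, for example a Metropolis criterion. Rebuilding the `occupied` set on every call keeps the sketch short but scales poorly for large systems.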

5.3 The Abeln–Vendruscolo–Dobson–Frenkel Model

Abeln et al. developed a multi-bead lattice model that allows the interplay between fibrillization and folding to be studied (Fig. 5d) [198, 199]. In this model, each residue comprises one bead located at a lattice site and a vector representing the direction of the side chain. Vacant lattice sites represent the surrounding solvent and can interact with both the backbone and the side chains. Classical MC simulations showed that a predefined folded structure can be obtained by careful design of the peptide sequence. The authors also observed the β-strand motif in both the folded structure and the fibrillar species. In addition, Tran and coworkers implemented the OPEP FF in the Abeln–Vendruscolo–Dobson–Frenkel model and performed simulations of the aggregation of truncated Aβ peptides to estimate their critical nucleus size [200].
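Treating vacant lattice sites as solvent means that bead–solvent energies can be computed by counting empty neighbours. The sketch below does this on a simple cubic lattice. It assumes one contact energy per bead type, `eps_solvent`, with hypothetical values. It ignores the side-chain direction vectors of the actual model, and `solvent_energy` is an illustrative helper, not part of any published code.

```python
from typing import Dict, Tuple

Site = Tuple[int, int, int]
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def solvent_energy(beads: Dict[Site, str], eps_solvent: Dict[str, float]) -> float:
    """Sum of bead-solvent contact energies on a simple cubic lattice.

    beads maps each occupied lattice site to a bead type. Every vacant neighbouring
    site is treated as solvent, and eps_solvent[type] is the (hypothetical) energy
    of one bead-solvent contact.
    """
    energy = 0.0
    for (x, y, z), bead_type in beads.items():
        n_solvent = sum(
            1 for dx, dy, dz in NEIGHBOURS if (x + dx, y + dy, z + dz) not in beads
        )
        energy += eps_solvent[bead_type] * n_solvent
    return energy


if __name__ == "__main__":
    eps = {"H": 1.0, "P": -0.5}  # hydrophobic beads dislike solvent, polar beads like it
    dimer = {(0, 0, 0): "H", (1, 0, 0): "P"}
    print(solvent_energy(dimer, eps))
```

Because burying a bead removes solvent contacts, the same bookkeeping automatically favours compact arrangements of hydrophobic beads, which is the physical effect the solvent sites are meant to represent.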

5.4 The Li–Klimov–Straub–Thirumalai Model

Inspired by the oligomerization of Aβ16–22 fragments, Li et al. developed a simple on-lattice model in which a polypeptide chain has eight beads, +HHPPHH−. Here + and − stand for positively and negatively charged beads, respectively, while P and H refer to polar and hydrophobic residues (Fig. 5c) [190]. The interactions between pairs of residues were chosen to roughly mimic those of real amino acids. In Monte Carlo simulations, the peptides changed their configurations through random local and global moves combined with the classical Metropolis algorithm. The fibrillar structure (Fig. 5c) corresponds to the lowest-energy state. The model reproduces the typical sigmoidal dependence of the fibril mass on simulation time, with lag, growth, and saturation phases (Fig. 2). Furthermore, three commonly discussed kinetic mechanisms were observed at various stages of fibril formation: nucleation and growth, templated assembly, and nucleated conformational conversion [190].
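The Metropolis step used in such lattice simulations can be sketched as follows. This is a generic illustration assuming energies in units of kT. The energy change of a trial local or global move would come from the model's pairwise residue interactions, which are not reproduced here. The names `metropolis_accept` and `SEQUENCE` are hypothetical.

```python
import math
import random

# Eight-bead chain of the Li-Klimov-Straub-Thirumalai model; the ASCII "-" stands
# for the negatively charged bead.
SEQUENCE = "+HHPPHH-"


def metropolis_accept(delta_e: float, kT: float, rng: random.Random) -> bool:
    """Classical Metropolis criterion: accept with probability min(1, exp(-dE/kT))."""
    if delta_e <= 0.0:
        return True  # downhill or neutral moves are always accepted
    return rng.random() < math.exp(-delta_e / kT)


if __name__ == "__main__":
    rng = random.Random(42)
    trials = 100_000
    accepted = sum(metropolis_accept(1.0, 1.0, rng) for _ in range(trials))
    print(f"acceptance for dE = 1 kT: {accepted / trials:.4f} "
          f"(expected {math.exp(-1.0):.4f})")
```

Estimating the acceptance frequency for a fixed uphill move, as in the `__main__` block, is a quick sanity check that the criterion reproduces the Boltzmann factor exp(−ΔE/kT).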

The Li–Klimov–Straub–Thirumalai model has been used to systematically study the factors that govern the kinetics of protein aggregation. It was found [191] that the stronger the attractive electrostatic and hydrophobic interactions between the polypeptide chains, the faster the fibril formation, which is consistent with experiments [201, 202]. Interestingly, the fibril formation time $\tau_{\mathrm{fib}}$ depends exponentially on the population $P_{N^*}$ of the fibril-prone monomeric state $N^*$ (Fig. 6a):

$$\tau_{\mathrm{fib}} = \tau^{0}_{\mathrm{fib}} \exp\left(-c\,P^{\max}_{N^*}\right),$$

where $\tau^{0}_{\mathrm{fib}}$ and $c$ are fitting constants. Our studies using all-atom models supported this conclusion [80]. Similar to experimental results [203] and to simulation results obtained with the Pellarin–Caflisch model [181], for HAPs we observed direct association of peptides into β-strand-rich clusters at an early stage of aggregation. The fibril growth phase appears only after the formation of a critical nucleus, after which monomers preferentially incorporate into the template. Based on this argument, we designed systems consisting of a β-rich preformed template and one separate peptide (Fig. 6b). We then measured the adding time $\tau_{\mathrm{add}}$, defined as the time required for the peptide to assemble onto the template. Simulations showed that $\tau_{\mathrm{add}}$ increased until the size of the preformed template reached the critical nucleus size $N_c$, above which $\tau_{\mathrm{add}}$ became independent of the template size. The resulting $N_c$ agreed well with the critical nucleus size estimated by the free energy scaling method [193, 204].

Fig. 6 The factors governing the self-assembly rate uncovered by using the Li–Klimov–Straub–Thirumalai lattice model: (a) fibril-prone state N*, (b) preformed template and adding monomer, (c) snapshot of polypeptide chains surrounded by cubic crowders, (d) six peptides in a confined box, (e) fibril structure of six chains on a hydrophilic smooth surface

In addition, using all-atom simulations, we showed that pathways of fibril formation starting from an immobilized template (with the monomers in the template kept fixed) must cross more intermediate states than pathways starting from a fluctuating template. Consequently, a fluctuating template assists fibril formation more effectively than an immobile one. Template fluctuation can therefore be regarded as a factor controlling the aggregation process, and the slow formation of mature fibril structures during the saturation phase is probably due to the rigidity of the preformed template [194]. To study the impact of the environment on fibril growth kinetics, cubic crowders were added to the Li–Klimov–Straub–Thirumalai model (Fig. 6c). The crowders perform self-avoiding random walks and interact neither with each other nor with the peptides. By carrying out simulations at various crowder concentrations and sizes, we captured the excluded volume effect of crowding particles [205, 206]. For a given crowder size, the crowders restricted the space available to the peptides, which accelerated self-assembly of the chains. This observation agrees well with previous theoretical [207] and experimental [185, 208] work. However, when the crowders were sufficiently small, they hindered aggregation, and this dual effect is consistent with the experiments of Cabaleiro-Lago et al. [209]. Fibril formation in confined space was also studied by switching off periodic boundary conditions and varying the size of the simulation box (Fig. 6d). The balance between energy and entropy resulted in a U-shaped dependence of $\tau_{\mathrm{fib}}$ on the size of the confining box [192].
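The exponential dependence of the fibril formation time on the population of the fibril-prone state N* becomes a straight line after taking logarithms, ln τfib = ln τ⁰fib − c·P. The following minimal sketch therefore recovers τ⁰fib and c by ordinary least squares on ln τfib. It assumes you already have pairs of population and formation time from simulations. The data in the `__main__` block are synthetic, generated from assumed parameters, and are not simulation results. The helper name `fit_exponential_decay` is hypothetical.

```python
import math


def fit_exponential_decay(populations, times):
    """Fit tau = tau0 * exp(-c * p) by ordinary least squares on ln(tau).

    Returns (tau0, c). Requires at least two distinct population values.
    """
    n = len(populations)
    if n != len(times) or n < 2:
        raise ValueError("need at least two (population, time) pairs")
    ys = [math.log(t) for t in times]
    p_mean = sum(populations) / n
    y_mean = sum(ys) / n
    sxx = sum((p - p_mean) ** 2 for p in populations)
    if sxx == 0.0:
        raise ValueError("populations must not all be equal")
    sxy = sum((p - p_mean) * (y - y_mean) for p, y in zip(populations, ys))
    slope = sxy / sxx  # slope of ln(tau) versus p equals -c
    intercept = y_mean - slope * p_mean
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Synthetic, noise-free data generated with assumed tau0 = 1e6 and c = 12.
    ps = [0.05, 0.10, 0.20, 0.30, 0.40]
    taus = [1e6 * math.exp(-12.0 * p) for p in ps]
    tau0, c = fit_exponential_decay(ps, taus)
    print(f"tau0 = {tau0:.3g}, c = {c:.3g}")
```

With real simulation data the points scatter. Before trusting the fitted c, it is worth checking by eye that ln τfib is in fact linear in the population.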
Combining the Li–Klimov–Straub–Thirumalai model with all-atom models, we showed that the higher the mechanical stability of the fibril state, the faster the fibril formation [195]. This result is partly consistent with experiment, but the hypothesis requires further computational and theoretical support. Finally, we used our simplified model as a guide to develop a phenomenological theory explaining the mechanism of heat-induced degradation of fibrils and to explore the effect of different surfaces on peptide assembly [210]. We have shown that the time dependence of the fibril content, which can be measured by the thioflavin T (ThT) fluorescence assay, obeys a bi-exponential function. By contrast, the number of unbound chains, which can be probed by tryptophan fluorescence, follows logistic kinetics.
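The two kinetic forms mentioned above can be written out explicitly. The sketch below evaluates one common parameterization of each: a bi-exponential function for the fibril content and a logistic curve for the number of unbound chains. The parameter values are hypothetical, and so is the choice of a decaying fibril content paired with a rising unbound population. The exact functional forms used in [210] may differ.

```python
import math


def bi_exponential(t, a1, k1, a2, k2, offset=0.0):
    """Sum of two exponential terms: offset + a1*exp(-k1*t) + a2*exp(-k2*t)."""
    return offset + a1 * math.exp(-k1 * t) + a2 * math.exp(-k2 * t)


def logistic(t, amplitude, k, t_half, baseline=0.0):
    """Logistic (sigmoidal) curve with midpoint t_half and steepness k."""
    return baseline + amplitude / (1.0 + math.exp(-k * (t - t_half)))


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show the shapes of the two curves.
    for t in (0.0, 1.0, 5.0, 10.0, 20.0):
        fibril = bi_exponential(t, a1=0.6, k1=0.5, a2=0.4, k2=0.05)
        unbound = logistic(t, amplitude=1.0, k=0.8, t_half=8.0)
        print(f"t = {t:5.1f}  fibril content = {fibril:.3f}  unbound fraction = {unbound:.3f}")
```

The qualitative difference is what an experiment can detect. A bi-exponential curve changes fastest at t = 0 and has no lag, whereas a logistic curve has a lag and its steepest change at t_half.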

6 Conclusions

In conclusion, we highlight future directions in the development of computational models to study protein folding and aggregation.

A good example of using an all-atom FF to study enormous systems is the work of Sugita, Feig, and coauthors, who simulated a bacterial cytoplasm containing more than 100 million atoms on a nanosecond timescale [211]. These simulations revealed possible problems with the FF that could not be identified in simulations of smaller systems. They also presented a view of proteins and other biomacromolecules in a crowded environment, where their behavior can differ significantly from that in bulk water [211]. The difference arises mostly from confinement and from nonspecific interactions between molecules, which can affect protein stability [212], folding, and aggregation [213]. Although the limited timescale of the simulations did not allow protein aggregation to be studied explicitly, the results strongly suggest that this phenomenon should be studied in the presence of other molecules, which compete for interactions, destabilize the structure, and promote association [214]. Owing to constant improvements in both computational resources and methods [215, 216], such large-scale studies should become more affordable in the near future and may even allow the use of polarizable FFs [217, 218] to better capture effects related to charge distribution. It is well known that metal ions can change not only the aggregation rate but also the morphology of the aggregate [219]. For example, at high concentrations of Cu²⁺, Aβ peptides form amorphous aggregates. The development of FFs that adequately describe free transition metal ions and their interactions with fibril-prone proteins remains a challenge. Finally, machine learning is emerging as a useful tool for constructing coarse-grained FFs from large ab initio databases [220], and important advances in this direction can be expected.

Acknowledgments

We appreciate our collaboration with Hoang Linh Nguyen. This research has been supported by Narodowe Centrum Nauki in Poland (Grant 2019/35/B/ST4/02086), the Department of Science and Technology at Ho Chi Minh City (Grant 07/2019/HÐ-KHCNTT), and the Vietnam Ministry of Education and Training (Project B2019.SPD.03).

References

1. Chiti F, Dobson CM (2006) Protein misfolding, functional amyloid, and human disease. Annu Rev Biochem 75:333–366
2. Hardy JA, Higgins GA (1992) Alzheimer’s disease: the amyloid cascade hypothesis. Science 256:184–185
3. Spillantini MG, Crowther RA, Jakes R et al (1998) α-Synuclein in filamentous inclusions of Lewy bodies from Parkinson’s disease and dementia with Lewy bodies. Proc Natl Acad Sci U S A 95:6469–6473
4. Bernstein SL, Dupuis NF, Lazo ND et al (2009) Amyloid-β protein oligomerization and the importance of tetramers and dodecamers in the aetiology of Alzheimer’s disease. Nat Chem 1:326–331
5. Jarrett JT, Lansbury PT (1993) Seeding “one-dimensional crystallization” of amyloid: a pathogenic mechanism in Alzheimer’s disease and scrapie? Cell 73:1055–1058
6. Harper JD, Lansbury PT (1997) Models of amyloid seeding in Alzheimer’s disease and scrapie: mechanistic truths and physiological consequences of the time-dependent solubility of amyloid proteins. Annu Rev Biochem 66:385–407
7. Hofrichter J, Ross PD, Eaton WA (1974) Kinetics and mechanism of deoxyhemoglobin S gelation: a new approach to understanding sickle cell disease. Proc Natl Acad Sci U S A 71:4864–4868
8. Nguyen PH, Li MS, Stock G et al (2007) Monomer adds to preformed structured oligomers of Aβ-peptides by a two-stage dock-lock mechanism. Proc Natl Acad Sci U S A 104:111–116
9. Cohen SIA, Linse S, Luheshi LM et al (2013) Proliferation of amyloid-β42 aggregates occurs through a secondary nucleation mechanism. Proc Natl Acad Sci U S A 110:9758–9763
10. Nasica-Labouze J, Nguyen PH, Sterpone F et al (2015) Amyloid β protein and Alzheimer’s disease: when computer simulations complement experimental studies. Chem Rev 115:3518–3563
11. Wu C, Shea JE (2011) Coarse-grained models for protein aggregation. Curr Opin Struct Biol 21:209
12. Morriss-Andrews A, Shea J-E (2015) Computational studies of protein aggregation: methods and applications. Annu Rev Phys Chem 66:643–666
13. Chong S-H, Chatterjee P, Ham S (2017) Computer simulations of intrinsically disordered proteins. Annu Rev Phys Chem 68:117–134
14. Anfinsen CB (1973) Principles that govern the folding of protein chains. Science 181:223–230
15. Levinthal C (1969) How to fold graciously. In: Debrunner P, Tsibris J, Munck E (eds) Mossbauer spectroscopy in biological systems: Proceedings of a Meeting Held at Allerton House, Monticello, Illinois. University of Illinois Press, Champaign, IL, pp 22–24
16. Levinthal C (1968) Are there pathways for protein folding? J Chim Phys 65:44–45
17. Hartl FU, Hayer-Hartl M (2009) Converging concepts of protein folding in vitro and in vivo. Nat Struct Mol Biol 16:574–581
18. Frozza RL, Lourenco MV, Felice FG (2018) Challenges for Alzheimer’s disease therapy: insights from novel mechanisms beyond memory defects. Front Neurosci 12:37
19. Viña J, Sanz-Ros J (2018) Alzheimer’s disease: only prevention makes sense. Eur J Clin Investig 48:e13005
20. Weiner PK, Kollman PA (1981) AMBER: assisted model building with energy refinement. A general program for modeling molecules and their interactions. J Comput Chem 2:287–303
21. Brooks BR, Bruccoleri RE, Olafson BD (1983) CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem 4:187–217
22. Hermans J, Berendsen HJC, Gunsteren WFV et al (1984) A consistent empirical potential for water-protein interactions. Biopolymers 23:1513–1518
23. Jorgensen WL, Tirado-Rives J (1988) The OPLS [optimized potentials for liquid simulations] potential functions for proteins, energy minimizations for crystals of cyclic peptides and crambin. J Am Chem Soc 110:1657–1666
24. Gunsteren WFV, Berendsen HJC (1977) Algorithms for macromolecular dynamics and constraint dynamics. Mol Phys 34:1311–1327
25. Andersen HC (1983) Rattle: a “velocity” version of the shake algorithm for molecular dynamics calculations. J Comput Phys 52:24–34
26. Miyamoto S, Kollman PA (1992) Settle: an analytical version of the SHAKE and RATTLE algorithm for rigid water models. J Comput Chem 13:952–962
27. González MA (2011) Force fields and molecular dynamics simulations. Collect SFN 12:169–200
28. Lopes PEM, Guvench O, MacKerell AD (2015) Current status of protein force fields for molecular dynamics simulations. Methods Mol Biol (Clifton, NJ) 1215:47–71
29. Garcia AE, Sanbonmatsu KY (2002) α-helical stabilization by side chain shielding of backbone hydrogen bonds. Proc Natl Acad Sci U S A 99:2782–2787
30. Ono S, Nakajima N, Higo J et al (2000) Peptide free-energy profile is strongly dependent on the force field: comparison of C96 and AMBER95. J Comput Chem 21:748–762
31. Freddolino PL, Park S, Roux B et al (2009) Force field bias in protein folding simulations. Biophys J 96:3772–3780
32. Hornak V, Abel R, Okur A et al (2006) Comparison of multiple Amber force fields and development of improved protein backbone parameters. Proteins Struct Funct Bioinf 65:712–725
33. Maier JA, Martinez C, Kasavajhala K et al (2015) ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99SB. J Chem Theory Comput 11:3696–3713
34. Tian C, Kasavajhala K, Belfon KAA et al (2020) ff19SB: amino-acid-specific protein backbone parameters trained against quantum mechanics energy surfaces in solution. J Chem Theory Comput 16:528–552
35. Izadi S, Anandakrishnan R, Onufriev AV (2014) Building water models: a different approach. J Phys Chem Lett 5:3863–3871
36. Jorgensen WL, Chandrasekhar J, Madura JD et al (1983) Comparison of simple potential functions for simulating liquid water. J Chem Phys 79:926–935
37. MacKerell AD, Bashford D, Bellott M et al (1998) All-atom empirical potential for molecular modeling and dynamics studies of proteins. J Phys Chem B 102:3586–3616
38. Mackerell AD, Feig M, Brooks CL (2004) Extending the treatment of backbone energetics in protein force fields: limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulation. J Comput Chem 25:1400–1415
39. Huang J, MacKerell AD (2013) CHARMM36 all-atom additive protein force field: validation based on comparison to NMR data. J Comput Chem 34:2135–2145
40. Huang J, Rauscher S, Nawrocki G et al (2017) CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat Methods 14:71–73
41. Schmid N, Eichenberger AP, Choutko A et al (2011) Definition and testing of the GROMOS force-field versions 54A7 and 54B7. Eur Biophys J 40:843–856
42. Reif MM, Hünenberger PH, Oostenbrink C (2012) New interaction parameters for charged amino acid side chains in the GROMOS force field. J Chem Theory Comput 8:3705–3723
43. Reif MM, Winger M, Oostenbrink C (2013) Testing of the GROMOS force-field parameter set 54A8: structural properties of electrolyte solutions, lipid bilayers, and proteins. J Chem Theory Comput 9:1247–1264
44. Jorgensen WL, Maxwell DS, Tirado-Rives J (1996) Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. J Am Chem Soc 118:11225–11236
45. Kaminski GA, Friesner RA, Tirado-Rives J et al (2001) Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J Phys Chem B 105:6474–6487
46. Siu SWI, Pluhackova K, Böckmann RA (2012) Optimization of the OPLS-AA force field for long hydrocarbons. J Chem Theory Comput 8:1459–1470
47. Robertson MJ, Tirado-Rives J, Jorgensen WL (2015) Improved peptide and protein torsional energetics with the OPLS-AA force field. J Chem Theory Comput 11:3499–3509
48. Robertson MJ, Tirado-Rives J, Jorgensen WL (2016) Performance of protein-ligand force fields for the flavodoxin-flavin mononucleotide system. J Phys Chem Lett 7:3032–3036
49. Wright PE, Dyson HJ (1999) Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. J Mol Biol 293:321–331
50. Huang J, MacKerell AD (2018) Force field development and simulations of intrinsically disordered proteins. Curr Opin Struct Biol 48:40–48
51. Song D, Wang W, Ye W et al (2017) ff14IDPs force field improving the conformation sampling of intrinsically disordered proteins. Chem Biol Drug Des 89:5
52. Song D, Luo R, Chen H-F (2017) The IDP-specific force field ff14IDPSFF improves the conformer sampling of intrinsically disordered proteins. J Chem Inf Model 57:1166–1178
53. Liu H, Song D, Lu H et al (2018) Intrinsically disordered protein-specific force field CHARMM36IDPSFF. Chem Biol Drug Des 92:1722–1735
54. Liu H, Song D, Zhang Y et al (2019) Extensive tests and evaluation of the CHARMM36IDPSFF force field for intrinsically disordered proteins and folded proteins. Phys Chem Chem Phys 21:21918–21931
55. Piana S, Donchev AG, Robustelli P et al (2015) Water dispersion interactions strongly influence simulated structural properties of disordered protein states. J Phys Chem B 119:5113–5123
56. Best RB, Zheng W, Mittal J (2014) Balanced protein-water interactions improve properties of disordered proteins and non-specific protein association. J Chem Theory Comput 10:5113–5124
57. Gallat FX, Laganowsky A, Wood K et al (2012) Dynamical coupling of intrinsically disordered proteins and their hydration water: comparison with folded soluble and membrane proteins. Biophys J 103:129–136
58. Krupa P, Quoc Huy PD, Li MS (2019) Properties of monomeric Aβ42 probed by different sampling methods and force fields: role of energy components. J Chem Phys 151:055101
59. Robustelli P, Piana S, Shaw DE (2018) Developing a molecular dynamics force field for both folded and disordered protein states. Proc Natl Acad Sci U S A 115:E4758–E4766
60. Yu L, Li DW, Brüschweiler R (2020) Balanced amino-acid-specific molecular dynamics force field for the realistic simulation of both folded and disordered proteins. J Chem Theory Comput 16:1311–1318
61. Zhang Y, Liu H, Yang S et al (2019) Well-balanced force field ff03CMAP for folded and disordered proteins. J Chem Theory Comput 15:6769–6780
62. Horn HW, Swope WC, Pitera JW et al (2004) Development of an improved four-site water model for biomolecular simulations: TIP4P-Ew. J Chem Phys 120:9665–9678
63. Song D, Liu H, Luo R et al (2020) Environment-specific force field for intrinsically disordered and ordered proteins. J Chem Inf Model 60:2257
64. MacKerell AD Jr, Brooks B, Brooks CL III et al (1998) CHARMM: the energy function and its parametrization with an overview of the program. Encyclop Computat Chem 1:271–277
65. Berendsen HJC, Postma JPM, Gunsteren WFV et al (1981) Interaction models for water in relation to protein hydration. In: Pullman B (ed) Intermolecular forces. Springer, Dordrecht, pp 331–342
66. Lin Z, Gunsteren WFV (2013) Refinement of the application of the GROMOS 54A7 force field to β-peptides. J Comput Chem 34:2796–2805
67. Yang S, Liu H, Zhang Y et al (2019) Residue-specific force field improving the sample of intrinsically disordered proteins and folded proteins. J Chem Inf Model 59:4793–4805
68. Sun Y, Kollman PA (1995) Hydrophobic solvation of methane and nonbond parameters of the TIP3P water model. J Comput Chem 16:1164–1169
69. Scheraga HA (1983) Recent progress in the theoretical treatment of protein folding. Biopolymers 22:1–14
70. Klimov DK, Thirumalai D (2003) Dissecting the assembly of Aβ16–22 amyloid peptides into antiparallel β sheets. Structure 11:295–307
71. Nguyen PH, Li MS, Derreumaux P (2014) Amyloid oligomer structure characterization from simulations: a general method. J Chem Phys 140:094105
72. Nam HB, Kouza M, Zung H et al (2010) Relationship between population of the fibril-prone conformation in the monomeric state and oligomer formation times of peptides: insights from all-atom simulations. J Chem Phys 132:165104
73. Viet MH, Li MS (2012) Amyloid peptide Aβ40 inhibits aggregation of Aβ42: evidence from molecular dynamics simulations. J Chem Phys 136:245105
74. Nguyen PH, Li MS, Derreumaux P (2011) Effects of all-atom force fields on amyloid oligomerization: replica exchange molecular dynamics simulations of the Aβ16–22 dimer and trimer. Phys Chem Chem Phys 13:9778
75. Hansmann UHE (1997) Parallel tempering algorithm for conformational studies of biological molecules. Chem Phys Lett 281:140–150
76. Hansmann UHE, Okamoto Y (1993) Prediction of peptide conformation by multicanonical algorithm: new approach to the multiple-minima problem. J Comput Chem 14:1333–1338
77. Sugita Y, Okamoto Y (1999) Replica-exchange molecular dynamics method for protein folding. Chem Phys Lett 314:141–151
78. Götz AW, Williamson MJ, Xu D et al (2012) Routine microsecond molecular dynamics simulations with AMBER on GPUs. 1. Generalized Born. J Chem Theory Comput 8:1542–1555
79. Salomon-Ferrer R, Götz AW, Poole D et al (2013) Routine microsecond molecular dynamics simulations with AMBER on GPUs. 2. Explicit solvent particle mesh Ewald. J Chem Theory Comput 9:3878–3888
80. Thu TTM, Co NT, Tu LA et al (2019) Aggregation rate of amyloid beta peptide is controlled by beta-content in monomeric state. J Chem Phys 150:225101
81. Minh Thu TT, Huang SH, Tu LA et al (2019) G37V mutation of Aβ42 induces a nontoxic ellipse-like aggregate: an in vitro and in silico study. Neurochem Int 129:104512
82. Nguyen HL, Thi Minh Thu T, Truong PM et al (2016) Aβ41 aggregates more like Aβ40 than like Aβ42: in silico and in vitro study. J Phys Chem B 120:7371–7379
83. Huy Pham DQ, Krupa P, Nguyen HL et al (2020) Computational model to unravel the function of amyloid-β peptides in contact with a phospholipid membrane. J Phys Chem B 124:3300–3314
84. Manna M, Mukhopadhyay C (2013) Binding, conformational transition and dimerization of amyloid-β peptide on GM1-containing ternary membrane: insights from molecular dynamics simulation. PLoS One 8:e71308
85. Brown AM, Bevan DR (2016) Molecular dynamics simulations of amyloid β-peptide (1-42): tetramer formation and membrane interactions. Biophys J 111:937–949
86. Nguyen PH, Campanera JM, Ngo ST et al (2019) Tetrameric Aβ40 and Aβ42 β-barrel structures by extensive atomistic simulations. I. In a bilayer mimicking a neuronal membrane. J Phys Chem B 123:3643–3648
87. Serra-Batiste M, Ninot-Pedrosa M, Bayoumi M et al (2016) Aβ42 assembles into specific β-barrel pore-forming oligomers in membrane-mimicking environments. Proc Natl Acad Sci U S A 113:10866–10871
88. Huy PDQ, Li MS (2014) Binding of fullerenes to amyloid beta fibrils: size matters. Phys Chem Chem Phys 16:20030–20040
89. Bednarikova Z, Huy PDQ, Mocanu MM et al (2016) Fullerenol C60(OH)16 prevents amyloid fibrillization of Aβ40: in vitro and in silico approach. Phys Chem Chem Phys 18:18855–18867
90. Ngo ST, Li MS (2012) Curcumin binds to Aβ1-40 peptides and fibrils stronger than ibuprofen and naproxen. J Phys Chem B 116:10165–10175
91. Viet MH, Ngo ST, Lam NS et al (2011) Inhibition of aggregation of amyloid peptides by beta-sheet breaker peptides and their binding affinity. J Phys Chem B 115:7433–7446
92. Ilie IM, Caflisch A (2019) Simulation studies of amyloidogenic polypeptides and their aggregates. Chem Rev 119:6956
93. Kmiecik S, Gront D, Kolinski M et al (2016) Coarse-grained protein models and their applications. Chem Rev 116:7898–7936
94. Oprzeska-Zingrebe EA, Smiatek J (2019) Some notes on the thermodynamic accuracy of coarse-grained models. Front Mol Biosci 6:87
95. Krupa P, Mozolewska MA, Wiśniewska M et al (2016) Performance of protein-structure predictions with the physics-based UNRES force field in CASP11. Bioinformatics 32:3270–3278
96. Kynast P, Derreumaux P, Strodel B (2016) Evaluation of the coarse-grained OPEP force field for protein-protein docking. BMC Biophys 9:4
97. Maisuradze G, Medina J, Kachlishvili K et al (2015) Preventing fibril formation of a protein by selective mutation. Proc Natl Acad Sci U S A 112:13549–13554
98. Pak AJ, Voth GA (2018) Advances in coarse-grained modeling of macromolecular complexes. Curr Opin Struct Biol 52:119–126
99. Izvekov S, Voth GA (2005) A multiscale coarse-graining method for biomolecular systems. J Phys Chem B 109:2469–2473
100. Wang Y, Noid WG, Liu P et al (2009) Effective force coarse-graining. Phys Chem Chem Phys 11:2002–2015
101. Uhlig F, Zeman J, Smiatek J et al (2018) First-principles parametrization of polarizable coarse-grained force fields for ionic liquids. J Chem Theory Comput 14:1471–1486
102. Dannenhoffer-Lafage T, Voth GA (2020) Reactive coarse-grained molecular dynamics. J Chem Theory Comput 16:2541–2549
103. Rao S, Li X, Liang H (2007) Developing coarse-grained force fields for polystyrene with different chain lengths from atomistic simulation. Macromol Res 15:610–616
104. Botan V, Ustach VD, Leonhard K et al (2017) Development and application of a coarse-grained model for PNIPAM by iterative Boltzmann inversion and its combination with lattice Boltzmann hydrodynamics. J Phys Chem B 121:10394–10406
105. Davtyan A, Schafer NP, Zheng W et al (2012) AWSEM-MD: protein structure prediction using coarse-grained physical potentials and bioinformatically based local structure biasing. J Phys Chem B 116:8494–8503
106. Kmiecik S, Kolinski A (2007) Characterization of protein-folding pathways by reduced-space modeling. Proc Natl Acad Sci U S A 104:12330–12335
107. Marrink S, Risselada H (2007) The MARTINI force field: coarse grained model for biomolecular simulations. J Phys Chem B 111:7812–7824
108. Monticelli L, Kandasamy SK, Periole X et al (2008) The MARTINI coarse-grained force field: extension to proteins. J Chem Theory Comput 4:819–834
109. Derreumaux P (1999) From polypeptide sequences to structures using Monte Carlo simulations and an optimized potential. J Chem Phys 111:2301–2310
110. Maupetit J, Tuffery P, Derreumaux P (2007) A coarse-grained protein force field for folding and structure prediction. Proteins 69:394–408
111. Pasi M, Lavery R, Ceres N (2013) PaLaCe: a coarse-grain protein model for studying mechanical properties. J Chem Theory Comput 9:785–793
112. Kar P, Gopal SM, Cheng Y-M et al (2013) PRIMO: a transferable coarse-grained force field for proteins. J Chem Theory Comput 9:3769–3788
113. Darré L, Machado MR, Dans PD et al (2010) Another coarse grain model for aqueous solvation: WAT FOUR? J Chem Theory Comput 6:3793–3807
114. Machado MR, Barrera EE, Klein F et al (2019) The SIRAH 2.0 force field: altius, fortius, citius. J Chem Theory Comput 15:2719–2733
115. Liwo A, Ołdziej S, Pincus MR et al (1997) A united-residue force field for off-lattice protein-structure simulations. I. Functional forms and parameters of long-range side-chain interaction potentials from protein crystal data. J Comput Chem 18:849–873
116. Khalili M, Liwo A, Jagielska A et al (2005) Molecular dynamics with the united-residue model of polypeptide chains. II. Langevin and Berendsen-bath dynamics and tests on model alpha-helical systems. J Phys Chem B 109:13798–13810
117. Liwo A, Baranowski M, Czaplewski C et al (2014) A unified coarse-grained model of biological macromolecules based on mean-field multipole-multipole interactions. J Mol Model 20:2306
118. Singh N, Li W (2019) Recent advances in coarse-grained models for biomolecules and their applications. Int J Mol Sci 20:3774
119. Javanainen M, Martinez-Seara H, Vattulainen I (2017) Excessive aggregation of membrane proteins in the Martini model. PLoS One 12:e0187936
120. Marrink SJ, Corradi V, Souza PCT et al (2019) Computational modeling of realistic cell membranes. Chem Rev 119:6184
121. Wu H, Wolynes PG, Papoian GA (2018) AWSEM-IDP: a coarse-grained force field for intrinsically disordered proteins. J Phys Chem B 122:11115–11125
122. Ramis R, Ortega-Castro J, Casasnovas R et al (2019) A coarse-grained molecular dynamics approach to the study of the intrinsically disordered protein α-synuclein. J Chem Inf Model 59:1458–1471
123. Kosciolek T, Buchan DWA, Jones DT (2017) Predictions of backbone dynamics in intrinsically disordered proteins using de novo fragment-based protein structure predictions. Sci Rep 7:1–12
124. Jamroz M, Orozco M, Kolinski A et al (2013) Consistent view of protein fluctuations from all-atom molecular dynamics and coarse-grained dynamics with knowledge-based force-field. J Chem Theory Comput 9:119–125
125. Berman HM (2000) The protein data bank. Nucleic Acids Res 28:235–242
126. Berggård T, Linse S, James P (2007) Methods for the detection and analysis of protein-protein interactions. Proteomics 7:2833–2842
127. Chen M, Schafer NP, Zheng W et al (2018) The associative memory, water mediated, structure and energy model (AWSEM)-Amylometer: predicting amyloid propensity and fibril topology using an optimized folding landscape model. ACS Chem Neurosci 9:1027–1039
128. Chen X, Chen M, Schafer NP et al (2020) Exploring the interplay between fibrillization and amorphous aggregation channels on the energy landscapes of tau repeat isoforms. Proc Natl Acad Sci U S A 117:4125–4130
129. Saracino GAA, Fontana F, Jekhmane S et al (2018) Elucidating self-assembling peptide aggregation via morphoscanner: a new tool for protein-peptide structural characterization. Adv Sci 5:1800471
130. Frederix PWJM, Ulijn RV, Hunt NT et al (2011) Virtual screening for dipeptide aggregation: toward predictive tools for peptide self-assembly. J Phys Chem Lett 2:2380–2384
131. Szała B, Molski A (2019) Aggregation kinetics of short peptides: all-atom and coarse-grained molecular dynamics study. Biophys Chem 253:106219
132. Pannuzzo M, Milardi D, Raudino A et al (2013) Analytical model and multiscale simulations of Aβ peptide aggregation in lipid membranes: towards a unifying description of conformational transitions, oligomerization and membrane damage. Phys Chem Chem Phys 15:8940–8951
133. Kar P, Gopal SM, Cheng YM et al (2014) Transferring the PRIMO coarse-grained force field to the membrane environment: simulations of membrane proteins and helix-helix association. J Chem Theory Comput 10:3459–3472
134. Yu H, Han W, Ma W et al (2015) Transient β-hairpin formation in α-synuclein monomer revealed by coarse-grained molecular dynamics simulation. J Chem Phys 143:243142
135. Brodie NI, Popov KI, Petrotchenko EV et al (2019) Conformational ensemble of native α-synuclein in solution as determined by short-distance crosslinking constraint-guided discrete molecular dynamics simulations. PLoS Comput Biol 15:e1006859
136. Katyal N, Agarwal M, Sen R et al (2018) Paradoxical effect of trehalose on the aggregation of α-synuclein: expedites onset of aggregation yet reduces fibril load. ACS Chem Neurosci 9:1477–1491
137. Smit FX, Luiken JA, Bolhuis PG (2017) Primary fibril nucleation of aggregation prone tau fragments PHF6 and PHF6*. J Phys Chem B 121:3250–3261
138. Cheon M, Chang I, Hall CK (2012) Influence of temperature on formation of perfect tau fragment fibrils using PRIME20/DMD simulations. Protein Sci 21:1514–1527
139. Chebaro Y, Pasquali S, Derreumaux P (2012) The coarse-grained OPEP force field for non-amyloid and amyloid proteins. J Phys Chem B 116:8741–8752
140. Sterpone F, Nguyen PH, Kalimeri M et al (2013) Importance of the ion-pair interactions in the OPEP coarse-grained force field: parametrization and validation. J Chem Theory Comput 9:4574–4584
141. Barroso Da Silva FL, Sterpone F, Derreumaux P (2019) OPEP6: a new constant-pH molecular dynamics simulation scheme with OPEP coarse-grained force field. J Chem Theory Comput 15:3875–3888
142. Kalimeri M, Derreumaux P, Sterpone F (2015) Are coarse-grained models apt to detect protein thermal stability? The case of OPEP force field. J Non-Cryst Solids 407:494–501
143. Spill YG, Pasquali S, Derreumaux P (2011) Impact of thermostats on folding and aggregation properties of peptides using the optimized potential for efficient structure prediction coarse-grained model. J Chem Theory Comput 7:1502–1510
144.
Mousseau N, Derreumaux P (2005) Exploring the early steps of amyloid peptide aggregation by computers. Acc Chem Res 38:885–891 145. Song W, Wei G, Mousseau N et al (2008) Self-assembly of the β2-microglobulin NHVTLSQ peptide using a coarse-grained protein model reveals a β-barrel species. J Phys Chem B 112:4410–4418

75

146. Simone AD, Derreumaux P (2010) Low molecular weight oligomers of amyloid peptides display B-barrel conformations: a replica exchange molecular dynamics study in explicit solvent. J Chem Phys 132:165103 147. Ngo ST, Nguyen PH, Derreumaux P (2020) Stability of Aβ11-40 trimers with parallel and antiparallel β-sheet organizations in a membrane-mimicking environment by replica exchange molecular dynamics simulation. J Phys Chem B 124:617–626 148. Chiricotto M, Melchionna S, Derreumaux P et al (2019) Multiscale aggregation of the amyloid Aβ 16-22 peptide: from disordered coagulation and lateral branching to amorphous prefibrils. J Phys Chem Lett 10:1594–1599 149. Sterpone F, Derreumaux P, Melchionna S (2015) Protein simulations in fluids: coupling the OPEP coarse-grained force field with hydrodynamics. J Chem Theory Comput 11:1843–1853 150. Lee J, Liwo A, Scheraga HA (1999) Energybased de novo protein folding by conformational space annealing and an off-lattice united-residue force field: application to the 10-55 fragment of staphylococcal protein A and to apo calbindin D9K. Proc Natl Acad Sci U S A 96:2025–2030 151. He Y, Mozolewska MA, Krupa P et al (2013) Lessons from application of the UNRES force field to predictions of structures of CASP10 targets. Proc Natl Acad Sci U S A 110:14936 152. Lensink MF, Brysbaert G, Nadzirin N et al (2019) Blind prediction of homo‐ and hetero‐ protein complexes: the CASP13‐CAPRI experiment. Proteins 87:1200–1221 153. Rojas AV, Liwo A, Scheraga HA (2007) Molecular dynamics with the United-residue force field: ab initio folding simulations of multichain proteins. J Phys Chem B 111:293–309 154. Yas¸ar F, Sieradzan AK, Hansmann UHE (2014) Folding and self-assembly of a small heterotetramer. J Chem Phys 140:105103 155. Nguyen HL, Krupa P, Hai NM et al (2019) Structure and physicochemical properties of the Aβ42 tetramer: multiscale molecular dynamics simulations. J Phys Chem B 123:7253–7269 156. 
Khoury GA, Liwo A, Khatib F et al (2014) WeFold: a coopetition for protein structure prediction. Proteins 82:1850–1868 157. Karczyn´ska AS, Mozolewska MA, Krupa P et al (2017) Prediction of protein structure with the coarse-grained UNRES force field assisted by small X-ray scattering data and

76

Nguyen Truong Co et al.

knowledge-based information. Proteins Struct Funct Bioinf 86:228 158. Krupa P, Mozolewska MA, Joo K et al (2015) Prediction of protein structure by templatebased modeling combined with the UNRES force field. J Chem Inf Model 55:1271–1281 159. Mozolewska MA, Krupa P, Zaborowski B et al (2016) Use of restraints from consensus fragments of multiple server models to enhance protein-structure prediction capability of the UNRES force field. J Chem Inf Model 56:2263–2279 160. Rojas A, Liwo A, Browne D et al (2010) Mechanism of fiber assembly: treatment of Aβ peptide aggregation with a coarse-grained united-residue force field. J Mol Biol 404:537–552 161. Rojas A, Maisuradze N, Kachlishvili K et al (2017) Elucidating important sites and the mechanism for amyloid fibril formation by coarse-grained molecular dynamics. ACS Chem Neurosci 8:201–209 162. Rojas AV, Liwo A, Scheraga HA (2011) A study of the α-helical intermediate preceding the aggregation of the amino-terminal fragment of the β amyloid peptide (Aβ 1-28). J Phys Chem B 115:12978–12983 163. Rojas AV, Maisuradze GG, Scheraga HA (2018) Dependence of the formation of Tau and Aβ peptide mixed aggregates on the secondary structure of the N-terminal region of Aβ. J Phys Chem B 122:7049–7056 164. Barz B, Liao Q, Strodel B (2018) Pathways of amyloid-β aggregation depend on oligomer shape. J Am Chem Soc 140:319–327 165. Va´cha R, Frenkel D (2011) Relation between molecular shape and the morphology of selfassembling aggregates: a simulation study. Biophys J 101:1432–1439 166. Bieler NS, Knowles TPJ, Frenkel D et al (2012) Connecting macroscopic observables and microscopic assembly events in amyloid formation using coarse grained simulations. PLoS Comput Biol 8:e1002692 167. Sˇaric´ A, Chebaro YC, Knowles TPJ et al (2014) Crucial role of nonspecific interactions in amyloid nucleation. Proc Natl Acad Sci 111:17869 168. 
Michaels TCT, Liu LX, Curk S et al (2018) Reaction rate theory for supramolecular kinetics: application to protein aggregation. Mol Phys 116:3055–3065 169. Sˇaric´ A, Buell AK, Meisl G et al (2016) Physical determinants of the self-replication of protein fibrils. Nat Phys 12:874–880 170. Va´cha R, Linse S, Lund M (2014) Surface effects on aggregation kinetics of

amyloidogenic peptides. J Am Chem Soc 136:11776–11782 171. Barz B, Urbanc B (2014) Minimal model of self-assembly: emergence of diversity and complexity. J Phys Chem B 118:3761–3770 172. Hoang TX, Trovato A, Seno F et al (2004) Geometry and symmetry presculpt the freeenergy landscape of proteins. Proc Natl Acad Sci U S A 101:7960–7964 173. Auer S, Dobson CM, Vendruscolo M (2007) Characterization of the nucleation barriers for protein aggregation and amyloid formation. HFSP J 1:137–146 174. Auer S, Meersman F, Dobson CM et al (2008) A generic mechanism of emergence of amyloid protofilaments from disordered oligomeric aggregates. PLoS Comput Biol 4: e1000222 175. Hung NB, Le D-M, Hoang TX (2017) Sequence dependent aggregation of peptides and fibril formation. J Chem Phys 147:105102 176. Mioduszewski Ł, Cieplak M (2018) Disordered peptide chains in an α-C-based coarsegrained model. Phys Chem Chem Phys 20:19057–19070 177. Mioduszewski Ł, Cieplak M (2020) Protein droplets in systems of disordered homopeptides and the amyloid glass phase. Phys Chem Chem Phys 22:15592–15599 178. Mioduszewski Ł, Ro´z˙ycki B, Cieplak M (2020) Pseudo-improper-dihedral model for intrinsically disordered proteins. J Chem Theory Comput 16:4726–4733 179. Ilie IM, Otter WK, Briels WJ (2016) A coarse grained protein model with internal degrees of freedom. Application to α-synuclein aggregation. J Chem Phys 144:85103 180. Ilie IM, Otter WK, Briels WJ (2017) The attachment of α-synuclein to a fiber: a coarse-grain approach. J Chem Phys 146:115102 181. Pellarin R, Caflisch A (2006) Interpreting the aggregation kinetics of amyloid peptides. J Mol Biol 360:882–892 182. Pellarin R, Guarnera E, Caflisch A (2007) Pathways and intermediates of amyloid fibril formation. J Mol Biol 374:917–924 183. Pellarin R, Schuetz P, Guarnera E et al (2010) Amyloid fibril polymorphism is under kinetic control. J Am Chem Soc 132:14960–14970 184. 
Friedman R, Caflisch A (2011) Surfactant effects on amyloid aggregation kinetics. J Mol Biol 414:303–312 185. Magno A, Caflisch A, Pellarin R (2010) Crowding effects on amyloid aggregation kinetics. J Phys Chem Lett 1:3027–3032

Computational Models for the Study of Protein Aggregation 186. Friedman R, Pellarin R, Caflisch A (2009) Amyloid aggregation on lipid bilayers and its impact on membrane permeability. J Mol Biol 387:407–415 187. Bellesia G, Shea J-E (2007) Self-assembly of β-sheet forming peptides into chiral fibrillar aggregates. J Chem Phys 126:245104 188. Bellesia G, Shea J-E (2009) Diversity of kinetic pathways in amyloid fibril formation. J Chem Phys 131:111102 189. Morriss-Andrews A, Brown FLH, Shea J-E (2014) A coarse-grained model for peptide aggregation on a membrane surface. J Phys Chem B 118:8420–8432 190. Li MS, Klimov DK, Straub JE et al (2008) Probing the mechanisms of fibril formation using lattice models. J Chem Phys 129:175101 191. Li MS, Co NT, Reddy G et al (2010) Factors governing fibrillogenesis of polypeptide chains revealed by lattice models. Phys Rev Lett 105:218101 192. Co NT, Hu C-K, Li MS (2013) Dual effect of crowders on fibrillation kinetics of polypeptide chains revealed by lattice models. J Chem Phys 138:185101 193. Co NT, Li MS (2012) New method for determining size of critical nucleus of fibril formation of polypeptide chains. J Chem Phys 137:95101 194. Kouza M, Co NT, Nguyen PH et al (2015) Preformed template fluctuations promote fibril formation: insights from lattice and all-atom models. J Chem Phys 142:145104 195. Kouza M, Co NT, Li MS et al (2018) Kinetics and mechanical stability of the fibril state control fibril formation time of polypeptide chains: a computational study. J Chem Phys 148:215106 196. Irb€ack A, Jo´nsson S, Linnemann N et al (2013) Aggregate geometry in amyloid fibril nucleation. Phys Rev Lett 110:058101 197. Zhang J, Muthukumar M (2009) Simulations of nucleation and elongation of amyloid fibrils. J Chem Phys 130:35102 198. Abeln S, Vendruscolo M, Dobson CM et al (2014) A simple lattice model that captures protein folding, aggregation and amyloid formation. PLoS One 9:e85185 199. 
Ni R, Abeln S, Schor M et al (2013) Interplay between folding and assembly of fibrilforming polypeptides. Phys Rev Lett 111:58101 200. Tran TT, Nguyen PH, Derreumaux P (2016) Lattice model for amyloid peptides: OPEP force field parametrization and applications

77

to the nucleus size of Alzheimer’s peptides. J Chem Phys 144:205103 201. Chiti F, Stefani M, Taddei N et al (2003) Rationalization of the effects of mutations on peptide andprotein aggregation rates. Nature 424:805–808 202. Bowerman CJ, Ryan DM, Nissan DA et al (2009) The effect of increasing hydrophobicity on the self-assembly of amphipathic β-sheet peptides. Mol BioSyst 5:1058–1069 203. Fowler DM, Koulov AV, Alory-Jost C et al (2005) Functional amyloid formation within mammalian tissue. PLoS Biol 4:e6 204. Ferrone F (1999) Analysis of protein aggregation kinetics. Methods Enzymol 309:256–274 205. Asakura S, Oosawa F (1954) On interaction between two bodies immersed in a solution of macromolecules. J Chem Phys 22:1255–1256 206. Zhou H-X, Rivas G, Minton AP (2008) Macromolecular crowding and confinement: biochemical, biophysical, and potential physiological consequences. Annu Rev Biophys 37:375–397 207. Obrien EP, Straub JE, Brooks BR et al (2011) Influence of nanoparticle size and shape on oligomer formation of an amyloidogenic peptide. J Phys Chem Lett 2:1171–1177 208. Wu WH, Sun X, Yu YP et al (2008) TiO2 nanoparticles promote β-amyloid fibrillation in vitro. Biochem Biophys Res Commun 373:315–318 209. Cabaleiro-Lago C, Quinlan-Pluck F, Lynch I et al (2010) Dual effect of amino modified polystyrene nanoparticles on amyloid β protein fibrillation. ACS Chem Neurosci 1:279–287 210. Co NT, Lan PD, Quoc Huy PD et al (2020) Heat-induced degradation of fibrils: exponential vs logistic kinetics. J Chem Phys 152:115101 211. Yu I, Mori T, Ando T et al (2016) Biomolecular interactions modulate macromolecular structure and dynamics in atomistic model of a bacterial cytoplasm. elife 5:e19274 212. Harada R, Tochio N, Kigawa T et al (2013) Reduced native state stability in crowded cellular environment due to protein-protein interactions. J Am Chem Soc 135:3696–3701 213. 
Ignatova Z, Krishnan B, Bombardier JP et al (2007) From the test tube to the cell: exploring the folding and aggregation of a β-clam protein. Biopolymers 88:157–163 214. Zhou HX (2013) Influence of crowded cellular environments on protein folding, binding, and oligomerization: biological consequences

78

Nguyen Truong Co et al.

and potentials of atomistic modeling. FEBS Lett 587:1053–1061 215. Ng W (2015) What drives computational chemistry forward: theory or computational power? Preprint 216. Aldeghi M, Biggin PC (2017) Advances in molecular simulation. In: Comprehensive medicinal chemistry III. Elsevier Inc, Amsterdam, pp 14–33 217. Ren P, Ponder JW (2002) Consistent treatment of inter- and intramolecular polarization in molecular mechanics calculations. J Comput Chem 23:1497–1506

218. Ponder JW, Wu C, Ren P et al (2010) Current status of the AMOEBA polarizable force field. J Phys Chem B 114:2549–2564 219. Faller P, Hureau C, Penna GL (2014) Metal ions and intrinsically disordered proteins and peptides: from Cu/Zn amyloid-β to general principles. Acc Chem Res 47:2252–2259 220. Gkeka P, Stoltz G, Barati Farimani A et al (2020) Machine learning force fields and coarse-grained variables in molecular dynamics: application to materials and biological systems. J Chem Theory Comput 16:4775

Chapter 5

Probing Protein Aggregation Using the Coarse-Grained UNRES Force Field

Ana V. Rojas, Gia G. Maisuradze, Harold A. Scheraga, and Adam Liwo

Abstract

Protein aggregation is the cause of many, often lethal, diseases, including Alzheimer’s, Parkinson’s, and Huntington’s diseases and familial amyloidosis. Theoretical investigation of the mechanism of this process, including the structures of the oligomeric intermediates, which are the most toxic species, is difficult because of the long time scale of aggregation. Coarse-grained models, which extend the accessible simulation time scale by three or more orders of magnitude, are therefore of great advantage in such studies. In this chapter, we describe the application of the physics-based UNited RESidue (UNRES) force field developed in our laboratory to the study of protein aggregation, in both free simulations and simulations of aggregation propagation from an existing template (seed), and illustrate it with examples of Aβ-peptide aggregation and Aβ-peptide-assisted aggregation of peptides derived from the repeat domains of tau (TauRD).

Key words Protein aggregation, Coarse graining, UNRES force field, Molecular dynamics

1 Introduction

Protein and peptide aggregation is a well-known phenomenon that underlies many neurodegenerative diseases, such as Alzheimer’s, Parkinson’s, and Huntington’s disease, familial amyloidosis, and many others. The formation of amyloid plaques is the last stage of this process, and plaques are less harmful than their oligomeric precursors, which can pierce and eventually destroy neurons [1]. Both entire proteins (e.g., human cystatin C [2]) and peptides cleaved from proteins [e.g., the amyloid-β peptide Aβ42, cleaved from the amyloid precursor protein [3]; polyglutamine chains cleaved from huntingtin [1]; and the repeat domains of tau (TauRD) [4]] can form aggregates. Although the structures of several mature amyloids have recently been elucidated [5, 6], the structures of the intermediates and the pathways along which these intermediates grow into amyloids are not well understood.

Harold A. Scheraga was deceased at the time of publication. Mai Suan Li et al. (eds.), Computer Simulations of Aggregation of Proteins and Peptides, Methods in Molecular Biology, vol. 2340, https://doi.org/10.1007/978-1-0716-1546-1_5, © Springer Science+Business Media, LLC, part of Springer Nature 2022


Ana V. Rojas et al.

Molecular simulations remain the only way to gain insight into these structures and phenomena. Because oligomerization and aggregation occur on long time scales, coarse-grained approaches, which extend the accessible time scale by several orders of magnitude with respect to all-atom simulations [7], are a reasonable choice. In this chapter, we describe the use of the physics-based coarse-grained UNRES model of polypeptide chains [8–14] to study the oligomerization and aggregation of Aβ peptides, alone and together with peptides from TauRD. In Sect. 2, we briefly describe the UNRES model, the implementation of molecular dynamics and its extensions with this model, and the tools for analyzing simulation results. A brief description of the UNRES software is given in Sect. 3. The protocols used to simulate oligomerization, as well as amyloid growth starting from a pre-formed template, are described in Sect. 4. Examples of applications of the UNRES-based methodology to the study of aggregation mechanisms are presented in Sect. 5.

2 Methods

2.1 UNRES Model and Force Field

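$$U = w_{\mathrm{SC}} \sum_{i<j} U_{\mathrm{SC}_i \mathrm{SC}_j} + \ldots$$

[…] $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_{2(2N-5)}$, and the corresponding eigenvectors $\mathbf{e}_k = \left(\mathbf{e}_{k1}, \mathbf{e}_{k2}, \ldots, \mathbf{e}_{k,2N-5}\right)^{T}$, where $T$ denotes the transpose of a matrix, with $\mathbf{e}_{ki} = \left(e_{ki}(1), e_{ki}(2)\right)$, where $e_{ki}(1)$ and $e_{ki}(2)$ are the components of the projection of the eigenvector $\mathbf{e}_k$ on the $i$th internal coordinate corresponding to the cosine and the sine of $\theta$ or $\gamma$, respectively (cf. Eqs. 6 and 7). The contribution of angle $i$ to mode $k$ is the so-called influence:

$$\nu_{ki} = \left[e_{ki}(1)\right]^2 + \left[e_{ki}(2)\right]^2 \qquad (8)$$

The total mean square fluctuation (MSF) of the coarse-grained internal coordinates can be decomposed into modes:

$$\mathrm{MSF} = \sum_i \left\langle \left(u_i - \langle u_i \rangle\right)^2 \right\rangle = \sum_k \sum_i \lambda_k \nu_{ki} \qquad (9)$$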
The eigenmodes with the largest eigenvalues $\lambda_k$ are the collective modes that contribute most to the MSF of the protein (see Eq. 9). The projection of the trajectory on the eigenvector $\mathbf{e}_k$ is called the principal component PC$_k$. A few PCs are usually sufficient to describe a large part of the fluctuations of a protein.
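The PCA steps above can be made concrete with a short sketch. It assumes the trajectory is already available as NumPy arrays of virtual-bond angles θ (N − 2 per frame) and dihedrals γ (N − 3 per frame), in radians; the function name `angular_pca` and this array layout are illustrative assumptions, not part of the UNRES package. Each angle is mapped to its (cos, sin) pair, the covariance matrix is diagonalized, and the influence of Eq. 8 and the MSF decomposition of Eq. 9 are computed.

```python
import numpy as np


def angular_pca(theta, gamma):
    """PCA of a coarse-grained trajectory in (cos, sin) angular coordinates.

    theta: array (n_frames, N - 2) of virtual-bond angles, in radians
    gamma: array (n_frames, N - 3) of virtual-bond dihedrals, in radians
    Returns eigenvalues (descending), influences nu[k, i] (Eq. 8),
    principal components (n_frames, n_modes), and the covariance matrix.
    """
    angles = np.hstack([theta, gamma])              # (n_frames, 2N - 5)
    n_frames, n_angles = angles.shape
    # Coordinates 2i and 2i + 1 hold the cosine and sine of angle i.
    u = np.empty((n_frames, 2 * n_angles))
    u[:, 0::2] = np.cos(angles)
    u[:, 1::2] = np.sin(angles)
    du = u - u.mean(axis=0)
    cov = du.T @ du / n_frames                      # covariance matrix
    lam, vecs = np.linalg.eigh(cov)                 # ascending eigenvalues
    order = np.argsort(lam)[::-1]                   # sort largest first
    lam, vecs = lam[order], vecs[:, order]
    # Influence of angle i on mode k (Eq. 8): e_ki(1)^2 + e_ki(2)^2
    nu = (vecs[0::2, :] ** 2 + vecs[1::2, :] ** 2).T
    pcs = du @ vecs                                 # projections PC_k
    return lam, nu, pcs, cov


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_res = 10                                      # N residues (synthetic)
    theta = rng.uniform(1.5, 2.5, size=(500, n_res - 2))
    gamma = rng.uniform(-np.pi, np.pi, size=(500, n_res - 3))
    lam, nu, pcs, cov = angular_pca(theta, gamma)
    msf = np.trace(cov)                             # left side of Eq. 9
    print(msf, np.sum(lam[:, None] * nu))           # right side of Eq. 9
```

Because each eigenvector is normalized, the influences of one mode sum to one over all angles, so summing λk νki over modes and angles recovers the trace of the covariance matrix, i.e., the MSF of Eq. 9.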

3 Software Description and Availability

The UNRES package is available from www.unres.pl; this page also contains installation instructions, instructions on how to run the programs constituting the package, and a description of the input and output files. The package includes the standard version of UNRES (with which the simulations reported in this chapter were performed) and the latest additions, including the extensions to the lipid-membrane environment [42], phosphorylated proteins [43], and protein–nucleic acid interactions [44], as well as the latest scale-consistent version of UNRES [13, 14]. Both single-chain and oligomeric proteins can be processed. The package consists of the UNRES (production simulations), WHAM (post-processing of replica exchange simulation results with the weighted histogram analysis method), CLUSTER (clustering), and XDRFPDB (coordinate-format conversion) sections. Both the original FORTRAN 77 and the newer FORTRAN 90 [45] codes are available. For storage of trajectory files, the freely available Europort Data Compression XDRF library (http://hpcv100.rc.rug.nl/xdrfman.html) has been implemented, and the BLAS routines [46] are used for matrix diagonalization. The package can be installed on any platform running UNIX. For systems exceeding 1000 residues, the minimum memory requirement is 1 GB RAM per core, and a parallel platform with a Message Passing Interface (MPI) library is required to carry out REMD and MREMD calculations. For less demanding calculations, the UNRES server can be accessed at http://unres-server.chem.ug.edu.pl [47]. Both canonical and (M)REMD simulations can be run on this server. Instructions on how to prepare and run jobs, and other documentation, are available from the server web page; the server itself is described in ref. 47.

4 Simulation Protocols

In this section, we show how the methodology described in Sect. 2 has been implemented to study two phases of the aggregation process: initial oligomer formation from isolated monomers and the addition of a new monomer to a pre-existing template, with the examples of simulations of the oligomerization of Aβ1–28 and the aggregation of Aβ1–40 peptides, respectively.

4.1 Simulating the Formation of Peptide Aggregates from Isolated Monomers

When designing MD simulations, running multiple trajectories is recommended. This strategy explores different regions of the conformational space, resulting in better coverage. It is particularly important for disordered proteins, whose conformational diversity cannot be captured by a single trajectory within a reasonable time. This disordered nature also carries over to the aggregation process. For example, in aggregation simulations of the Aβ1–28 peptide (a truncated version of the amyloid β peptide), we noticed that the oligomeric states from different trajectories were also quite diverse [31]. Therefore, when studying the aggregation of peptides, we recommend launching multiple (ideally 100) independent trajectories if resources are available. In UNRES, the number of independent trajectories is determined by the number of processors requested in the launching script, and by default, each trajectory starts with a different set of initial velocities.

4.1.1 Analysis of Aggregation Simulations

When analyzing aggregation trajectories, it is desirable to identify the presence of aggregates without having to inspect the trajectories visually. Because amyloid aggregates are characterized by intermolecular β-sheets, the presence of such structures can be used to assess the aggregation state of the system throughout a trajectory. As an example, we discuss the results of our work on the aggregation of Aβ1–28 [31]. Figure 2 shows the fractions of residues in the α-helix (blue) and β-sheet (red) conformations, as well as the fraction of chains forming a dimer or higher order oligomer (green). The data were collected from 40 MD trajectories, each 100 ns long, of a system of five monomers. The monomers were initially either in the extended (Fig. 2a) or in the helical (Fig. 2b) conformation and were separated by a distance of 30 Å. Aβ1–28 shows a certain tendency to adopt α-helical conformations, as evidenced by the fast growth of the α-helical content in the trajectories started from the extended conformation (Fig. 2a). Regardless of the initial conformation, as the trajectory progresses, the α-helical content slowly decreases while the β-content increases (Fig. 2a, b). The increase in β-sheet content is due to the formation of intermolecular β-sheets: as more chains join the initial aggregates, the percentage of chains in an oligomeric state increases, and because the chains associate by forming β-sheets, their helical structure is lost. Although the conformations and aggregation states explored by the different trajectories yield ensemble averages that can be related to experimental measurements such as circular dichroism, one of the main advantages of MD simulations is that individual trajectories can be inspected to better understand the process.
For instance, a representative example of aggregation progress is shown in Fig. 3. The sequence shows how an initial dimer is later joined by other chains to form a trimer and finally a tetramer, and how the helical structures disappear as the β-sheets form.
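The two quantities plotted in Fig. 2 can be computed per snapshot with a sketch like the one below. It assumes that secondary structure has already been assigned per residue as a string using 'H' (helix) and 'E' (strand), and that a list of chain pairs in contact (for example, chains sharing at least one intermolecular β-sheet hydrogen bond) is available; both the input format and the contact criterion are assumptions, not the criteria used in ref. 31. Chains are grouped into aggregates with a union–find structure, and a chain counts as oligomeric if its aggregate contains at least two chains.

```python
from collections import Counter


def ss_fractions(ss_per_chain):
    """Fractions of residues in helix ('H') and strand ('E') in one snapshot.

    ss_per_chain: list of per-chain secondary-structure strings (assumed format).
    """
    counts = Counter("".join(ss_per_chain))
    total = sum(counts.values())
    return counts["H"] / total, counts["E"] / total


def oligomer_fraction(n_chains, contacts):
    """Fraction of chains that belong to an aggregate of two or more chains.

    contacts: iterable of (a, b) chain-index pairs judged to be in contact
    (the contact criterion itself is an assumption of this sketch).
    """
    parent = list(range(n_chains))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in contacts:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b               # merge the two aggregates

    sizes = Counter(find(c) for c in range(n_chains))
    in_oligomers = sum(size for size in sizes.values() if size >= 2)
    return in_oligomers / n_chains


if __name__ == "__main__":
    ss = ["CHHHHC", "CEEEEC", "CEEECC", "HHHHHH", "CCCCCC"]
    print(ss_fractions(ss))                       # helix and strand fractions
    print(oligomer_fraction(5, [(0, 1), (1, 2)]))   # 0.6
```

For example, with five chains and contacts only between chains 0 and 1 and between chains 1 and 2, three of the five chains form a trimer, so `oligomer_fraction` returns 0.6. Averaging both functions over snapshots and trajectories gives curves of the kind shown in Fig. 2.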


Fig. 2 Fractions of residues with α-helical (blue) and β-sheet (red) conformations, and the fraction of chains that have formed dimers or higher order oligomers (green). The values were obtained by averaging over 40 canonical MD trajectories of five-chain systems with the chains initially in the extended (a) or helical (b) conformation. The inset on Panel (a) shows the fast growth of the α-helical content during the first 4 ns of simulation. (Reprinted with permission from J. Phys. Chem. B 115, 12978 (2011). Copyright (2011) American Chemical Society)

Fig. 3 Snapshots from a representative trajectory illustrating the process of aggregation. Different colors are used for different chains. Oligomerization starts with the formation of a dimer, which acts as a seed and triggers further aggregation. (Adapted with permission from J. Phys. Chem. B 115, 12978 (2011). Copyright (2011) American Chemical Society)


4.2 Adding New Monomers to a Pre-existing Template

4.2.1 System Setup

Fibril formation is characterized by an initial phase, during which a critical nucleus (seed) is formed, followed by an elongation phase, during which new monomers join the seed [48]. The size of the critical nucleus can range from tens to hundreds of peptides. Even with coarse-grained models, simulating the growth of a real-size fibril nucleus would be prohibitive, at least for a peptide of the size of Aβ40. Therefore, a template that replaces a real nucleus is often used. We illustrate this approach by describing our work [49] on the growth of fibrils of the Aβ40 peptide. The fibrils formed by Aβ40 exhibit the characteristic cross-β structure, in which the peptide chains adopt an extended conformation perpendicular to the fibril axis and are held together by intermolecular hydrogen bonds parallel to that axis [50] (see Fig. 4a, b). In this work, we used the structure determined by Petkova et al. [50], in which each Aβ40 peptide adopts a structure that resembles a β-hairpin (Fig. 4a). Two chains lie in a plane, and the chains stack on top of each other, connected by a network of hydrogen bonds parallel to the fibril axis (Fig. 4b). In the work described here [49], we used a two-layer fibril to model the nucleus. We termed this construct a template, onto which the monomers could bind. The incoming monomer was placed 20 Å from the surface of the fibril template, as shown in Fig. 4c. Because an incoming monomer binds only to the ends of a fibril, a two-layer template was sufficient to capture the interactions between the monomer and a real fibril [49]. The initial conformation was generated with PyMOL by manually placing the monomer at the desired distance; it was saved in PDB format and defined the initial geometry of the system. To give the template the stability of a real fibril, distance restraints were applied.
This was done by adding an additional term, $U_{\mathrm{Restr}}$, to the UNRES potential energy function, given by Eq. 10:

$$U_{\mathrm{Restr}} = w_{\mathrm{Restr}} \sum_{l} \left[Q(l) - 1\right]^2 \qquad (10)$$

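where the index $l$ runs over all the segments being restrained, $w_{\mathrm{Restr}}$ is the weight of the term, set at $5 \times 10^4$ kcal/mol, and $Q(l)$ is given by Eq. 11:

$$Q(l) = \frac{1}{N_{\mathrm{dist}_l}} \sum_{i,j} \exp\left[-\frac{1}{2}\left(d_{i,j} - d_{i,j}^{\mathrm{nat}}\right)^2\right] \qquad (11)$$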

where $d_{i,j}$ and $d_{i,j}^{\mathrm{nat}}$ are the current and initial distances between the Cα atoms of residues $i$ and $j$, respectively, and $N_{\mathrm{dist}_l}$ is the total number of distances in segment $l$. The restraints were applied to the distances between all residues within a chain, as well as to those between residues belonging to different chains, but only if those chains were in contact. These restraints gave the template some flexibility and resulted in a higher rate of native binding than harmonic restraints did.
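Equations 10 and 11 translate directly into code. The sketch below assumes that the restrained Cα–Cα distances of each segment l have already been collected into paired arrays of current and initial values (assumed to be in Å); the function names `q_segment` and `restraint_energy` are hypothetical, and the default weight is the 5 × 10^4 kcal/mol value given above.

```python
import numpy as np


def q_segment(d, d_nat):
    """Eq. 11: Q(l) for one restrained segment.

    d, d_nat: current and initial C-alpha distances (assumed in Angstrom)
    for the N_dist_l restrained pairs of segment l.
    """
    d = np.asarray(d, dtype=float)
    d_nat = np.asarray(d_nat, dtype=float)
    # Mean over pairs of exp[-(d - d_nat)^2 / 2]
    return float(np.mean(np.exp(-0.5 * (d - d_nat) ** 2)))


def restraint_energy(segments, w_restr=5.0e4):
    """Eq. 10: U_Restr = w_Restr * sum_l [Q(l) - 1]^2, in kcal/mol.

    segments: list of (d, d_nat) array pairs, one per restrained segment l.
    """
    return w_restr * sum((q_segment(d, d_nat) - 1.0) ** 2 for d, d_nat in segments)


if __name__ == "__main__":
    d_nat = np.array([4.8, 9.6, 10.2])             # illustrative distances
    print(restraint_energy([(d_nat, d_nat)]))        # 0.0 at the initial geometry
    print(restraint_energy([(d_nat + 0.5, d_nat)]))  # positive when deformed
```

When every distance equals its initial value, Q(l) = 1 and the restraint energy vanishes. Because [Q(l) − 1]² cannot exceed 1, each segment contributes at most w_Restr, so this form stays bounded for large deformations, unlike a harmonic restraint.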


Fig. 4 Backbone representation of an Aβ9–40 monomer based on the model with striated ribbon morphology by Petkova et al. [50] (a) and structural model of an Aβ9–40 fibril in the same model (b). Residues 1–8 are omitted from the diagram because they were conformationally disordered in the NMR model. (c) A two-layer fibril (four chains) acts as a template for the free monomer to bind. Restraining forces are used to stabilize the chains in the template, but no restraints are applied to the free monomer, which is initially placed 20 Å away from the concave end of the template. (Adapted with permission from A. Rojas et al., ACS Chem. Neurosci. 8, 201–209 (2017). Copyright (2017) American Chemical Society)

4.2.2 Running the MD Simulations

Our canonical simulations showed that the binding of a free Aβ monomer to a fibril template was characterized by a series of non-native binding and full or partial unbinding events before native binding could be seen. During this process, the monomer might become trapped in a metastable state and never progress to native or near-native binding during the course of a simulation. Therefore, to speed up escape from these states, we used replica exchange molecular dynamics (REMD) with a narrow temperature range, between 280 and 320 K. A total of 120 trajectories were run: 40 at 300 K and 20 at each of 280, 290, 310, and 320 K. This moderate temperature range accelerated binding without disrupting the stability of the template.
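The temperature ladder above and the standard Metropolis criterion for exchanging configurations between two replicas can be sketched as follows. The sketch assumes a temperature-independent potential energy; UNRES force fields can carry temperature-dependent terms, in which case the exchange criterion must use energies evaluated at both temperatures, so this illustrates the generic REMD step rather than the UNRES implementation. Energies are in kcal/mol and temperatures in K; the function names are hypothetical.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant, kcal/(mol K)


def replica_temperatures():
    """Ladder used in the text: 40 replicas at 300 K, 20 at each other T."""
    counts = {280.0: 20, 290.0: 20, 300.0: 40, 310.0: 20, 320.0: 20}
    return [t for t, n in sorted(counts.items()) for _ in range(n)]


def exchange_probability(e_i, t_i, e_j, t_j):
    """Metropolis acceptance for swapping replicas at temperatures t_i and t_j.

    P = min(1, exp[(beta_i - beta_j) * (E_i - E_j)]), with beta = 1 / (k_B T).
    Assumes the potential energy does not depend on temperature.
    """
    delta = (1.0 / (K_B * t_i) - 1.0 / (K_B * t_j)) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(energies, temps, i, j, rng=random):
    """Exchange the temperatures of replicas i and j if the test passes."""
    p = exchange_probability(energies[i], temps[i], energies[j], temps[j])
    if rng.random() < p:
        temps[i], temps[j] = temps[j], temps[i]
        return True
    return False


if __name__ == "__main__":
    temps = replica_temperatures()
    print(len(temps), temps.count(300.0))                     # 120 40
    print(exchange_probability(-100.0, 280.0, -110.0, 320.0))  # 1.0
```

A swap is always accepted when the colder replica holds the higher-energy configuration; otherwise it is accepted with a probability that falls off with the energy gap and with the temperature spacing, which is one reason a narrow ladder such as 280–320 K gives frequent exchanges.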


4.2.3 Analysis of Results

For a monomer to join the template and follow its pattern, each residue i of the monomer (i = 1, 2, …, N, where N is the number of residues in the peptide) must form hydrogen bonds with residue i of one of the outer template chains. Therefore, to quantify the degree of binding of a monomer to the template, we tracked the hydrogen bonds between the monomer and each of the chains in the template. Native hydrogen bonds (NHBs) are those formed between peptide groups with the same indices (often referred to as “in-register” HBs [50]); all other hydrogen bonds are considered nonnative (nNHBs). By computing the numbers of native and nonnative hydrogen bonds (NHB and nNHB, respectively), we could identify the trajectories exhibiting native binding. We also used this information to understand the types of conformations and binding modes present in the ensemble explored by the free monomers, even those that did not result in native binding. Using NHB and nNHB as reaction coordinates, we constructed a two-dimensional potential of mean force (PMF), w(NHB, nNHB), defined by Eq. 12:

$$w(\mathrm{NHB}, \mathrm{nNHB}) = -k_B T \ln\left\langle f(\mathrm{NHB}, \mathrm{nNHB})\right\rangle \qquad (12)$$

where $k_B$ is the Boltzmann constant, $T$ is the temperature (300 K), and $\langle f(\mathrm{NHB}, \mathrm{nNHB})\rangle$ is the fraction of structures with a given number of NHBs and nNHBs, averaged over snapshots. To avoid bias from the REMD simulations, we used only the conformations at 300 K. The PMF and representative structures at different points are shown in Fig. 5. Several basins can be identified. A small basin at NHB = nNHB = 0 contains conformations in which the monomer has not docked onto the fibril (snapshot A). The basin at NHB ≤ 2 and 5 ≤ nNHB ≤ 16 corresponds to conformations in which the monomer is bound as an antiparallel β-sheet (snapshots B–D). Two more basins can be seen at NHB = 11, one with nNHB = 2 (snapshot F) and one with nNHB = 7 (snapshots G and H); they are populated by conformations in which the monomer has formed native hydrogen bonds along either the N- or the C-terminal strand, while the other end of the peptide still forms a few nonnative hydrogen bonds. Finally, a smaller basin can be seen at 25 ≤ NHB ≤ 27 and nNHB = 0. This last basin corresponds to native-like conformations (snapshot E). Note also the scarcely populated region at 20 ≤ NHB ≤ 25, which separates the native basin from the rest of the map and indicates a free-energy barrier.
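The classification of hydrogen bonds and the PMF of Eq. 12 can be sketched as follows, assuming that for each 300 K snapshot the monomer–template hydrogen bonds are available as (monomer residue, template residue) index pairs. How the hydrogen bonds are detected is not specified here, and the function names `count_hbonds` and `pmf_2d` are hypothetical. Bins that are never visited receive an infinite free energy.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant, kcal/(mol K)


def count_hbonds(hbonds):
    """Split monomer-template H bonds into native (NHB) and nonnative (nNHB).

    hbonds: iterable of (i_monomer, j_template) residue-index pairs; a bond
    is native (in-register) when both residues have the same index.
    """
    hbonds = list(hbonds)
    nhb = sum(1 for i, j in hbonds if i == j)
    return nhb, len(hbonds) - nhb


def pmf_2d(nhb, nnhb, temperature=300.0):
    """Eq. 12: w(NHB, nNHB) = -k_B T ln <f(NHB, nNHB)>, in kcal/mol.

    nhb, nnhb: per-snapshot counts. Returns an array indexed [NHB, nNHB];
    bins never visited are +inf.
    """
    nhb = np.asarray(nhb, dtype=int)
    nnhb = np.asarray(nnhb, dtype=int)
    hist = np.zeros((nhb.max() + 1, nnhb.max() + 1))
    np.add.at(hist, (nhb, nnhb), 1.0)         # count snapshots per bin
    frac = hist / hist.sum()                  # fraction of structures
    with np.errstate(divide="ignore"):        # log(0) -> -inf for empty bins
        return -K_B * temperature * np.log(frac)


if __name__ == "__main__":
    snapshots = [[(3, 3), (4, 4), (5, 9)], [], [(2, 2)], []]
    counts = [count_hbonds(s) for s in snapshots]
    w = pmf_2d([c[0] for c in counts], [c[1] for c in counts])
    print(w[0, 0], w[2, 1])
```

Because f ≤ 1, all finite values of w are non-negative; maps such as Fig. 5 are often shifted so that the deepest basin lies at zero, which only changes the reference point.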

5 Applications

In this section, we discuss further applications of the UNRES force field to the study of peptide and protein aggregation. The brains of patients with Alzheimer’s disease (AD) exhibit extracellular

Probing Protein Aggregation Using the Coarse-Grained UNRES Force Field


Fig. 5 A two-dimensional PMF as a function of the numbers of native and nonnative hydrogen bonds (NHB and nNHB), obtained from REMD simulations of a free monomer interacting with a four-chain fibril template. Snapshots of representative conformations in each basin are shown. The chains in the template are colored from light blue, while the free monomer is colored from orange to red. The small basin at NHB = nNHB = 0 corresponds to unbound conformations (snapshot A). A long basin at NHB ≤ 2 and 5 < nNHB ≤ 16 corresponds to conformations with nonnative binding (antiparallel, or off-register by more than three amino acids) (snapshots B–D). Two basins at NHB = 11 correspond to conformations with one strand locked in the fibrillar conformation (snapshots F–H). The small basin at 25 ≤ NHB ≤ 27 and nNHB = 0 is the native basin (snapshot E). (Reprinted from JMB, 404, A. Rojas, A. Liwo, D. Browne, H. A. Scheraga, Mechanism of Fiber Assembly: Treatment of Aβ-Peptide Aggregation with a Coarse-Grained United-Residue Force Field, 537–552, Copyright (2010), with permission from Elsevier)

plaques formed by fibrils of the amyloid-β (Aβ) peptide and intracellular tangles of a different fibrillar material, often referred to as paired helical filaments (PHFs), formed by the tau protein. The applications presented here address two important issues: (1) what governs the growth and stability of the fibrils formed by the Aβ peptide and (2) what triggers the aggregation of tau-protein peptides.

5.1 Important Sites and the Mechanism for Amyloid Fibril Formation

It is well known that the Aβ peptide rapidly aggregates to form oligomers, protofibrils, and fibrils en route to the deposition of amyloid plaques associated with AD. Because the oligomers are present at low concentration and are difficult to identify and follow over time experimentally, the fundamental microscopic reactions taking place during aggregation are hard to determine. Currently, only the experimental structures of the fibrils of the most common forms, Aβ(1–40) [6, 50] and Aβ(1–42) [51–54], determined by


Ana V. Rojas et al.

Fig. 6 The native hydrogen bonds (black) and the nonnative hydrogen bonds (red) between the free monomer and the fibril template formed during the entire trajectory. (Reprinted with permission from A. Rojas et al., ACS Chem. Neurosci. 8, 201–209 (2017). Copyright (2017) American Chemical Society)

Fig. 7 Free-energy landscape (in kcal/mol) along the first two PCs, μ(PC1, PC2) = −kBT ln P(PC1, PC2), with representative structures at the minima for the free-monomer aggregation trajectory. (Reprinted with permission from A. Rojas et al., ACS Chem. Neurosci. 8, 201–209 (2017). Copyright (2017) American Chemical Society)

solid-state NMR spectroscopy and cryo-electron microscopy, are available; therefore, it is of interest to model the entire process of Aβ-fibril elongation. Such modeling would enable us to identify not only the previously detected steps (dock and lock) for the Aβ monomer [55–57] but also the details of the microscopic reactions that take place during aggregation and are very difficult to observe experimentally.


Fig. 8 Thermal unfolding of the free monomer from the fibril template. The values on the right side of panels and corresponding colors indicate the probabilities that the native hydrogen bonds between the residues in the free monomer and the corresponding residues in one of the chains of the template remained formed over the entire trajectories. The probability at any time was calculated by counting the fraction of trajectories (over 16 trajectories) in which a particular hydrogen bond was still formed at that specific time. (Reprinted with permission from A. Rojas et al., ACS Chem. Neurosci. 8, 201–209 (2017). Copyright (2017) American Chemical Society)

For this study, the Aβ(1–40) fibril polymorph with twofold symmetry [50] (Fig. 4a, b) was selected, and 120 canonical and REMD simulations were carried out for 35 × 10^6 steps per trajectory (equivalent to 85.5 μs of effective simulation time per replica) [58]. Each REMD simulation started with the monomer in an extended conformation at a distance of 20 Å from the surface of the template. The monomer was placed facing the concave end of the fibril (Fig. 4c). To save computational time, the fibril template consisted of two layers (i.e., four chains), with restraining forces stabilizing its fibrillar conformation. To describe how a free monomer first docks onto an amyloid template and then locks, we kept track of the hydrogen bonds (HBs) between the monomer and each of the chains in the template and classified them into native and nonnative hydrogen bonds (NHBs and nNHBs; see Sect. 4.2.3). Several different binding scenarios were observed during the search for the fibrillar conformation by a free monomer, including trajectories in which the monomer binds to the template with HBs shifted by only one or two residues relative to the native register. The total number of HBs in these trajectories is almost as large as in the native conformation. This makes the binding quite stable because, to adopt the native binding, a long β-strand has to be unzipped. These shifted conformations were identified as intermediates preceding native binding [58–61]. Figure 6 illustrates how the free monomer binds to the fibril template and locks itself into the fibril, making NHBs along both β-strands (black lines correspond to NHBs and red lines to nNHBs). In particular, the first β-strand of the free monomer binds nonnatively to one of the chains in the template from the beginning of the trajectory (docking step). At time step 1 × 10^7, the second β-strand of the free monomer starts to make NHBs with the same chain in the template, and then, at time step 1.6 × 10^7, the loop connecting the first and second β-strands (except for residue Val16) joins the second β-strand in the native binding (locking step).
Although both the native and the nonnative binding persist for a long time [until (2.8–2.9) × 10^7 time steps], the free monomer eventually unbinds completely from the template; almost immediately afterward, residues Leu9, Val10, Phe11, and Phe12 of the first β-strand of the free monomer bind natively to the same chain in the template (docking step), followed by the rest of the monomer (locking step). Moreover, these residues remain natively bound to the template throughout the interval of (3.03–3.1) × 10^7 time steps, during which the rest of the first β-strand is unbound. These results indicate that the native binding of residues Leu9, Val10, Phe11, and Phe12 to the fibril template plays a crucial role in the aggregation of the free monomer. The pathway along which the free monomer binds to the fibril template and successfully propagates the fibrillar conformation can also be traced on the free-energy landscape (FEL) along the first two principal components (PCs) (Fig. 7). The aggregation pathway in Fig. 7, with two major basins containing several minima and representative structures, agrees with the results shown in Fig. 6. In particular, most of the conformations in which the free monomer (rainbow-colored chain) is partially and nonnatively bound to the fibril template, with the first β-strand shifted by one residue from the corresponding residues of the template chain, are found in the left basin (minimum 3 being the exception), whereas the conformations in which both β-strands of the free monomer are bound (nonnatively and natively) to the template are found in the right basin. To validate the findings regarding the residues that play a crucial role in the stability of Aβ fibrils, the thermal dissociation of an Aβ peptide from the fibril template was studied. In particular, 16 MD trajectories of 4 × 10^6 time steps each were run at eight different temperatures from 290 to 430 K; the four chains of the template were restrained to the fibrillar conformation, and the monomer was allowed to move freely. As can be seen from Fig. 8, the hydrogen bonds of most residues are very stable up to 370 K and start breaking at 390 K. The most stable hydrogen bonds remained formed, albeit for a short time, even at the highest temperature of 430 K. These are the hydrogen bonds formed by residues Lys8, Leu9, Val10, Phe11, and Phe12 (KLVFF). Thus, it can be concluded that the KLVFF hydrophobic core plays a crucial role in aggregation. This finding agrees with earlier and recent experimental [62–68] and computational results [55–57, 69–71]. In addition, correct binding of the free monomer to the fibril template, with native hydrogen bonds, at both stages is crucial for fibril elongation. In other words, if the monomer is incorrectly bound (with nonnative hydrogen bonds) to the fibril template during the first “docking” stage, it can remain attached for a long time before it dissociates and either attempts a different binding or allows another monomer to bind. This finding is consistent with the experimentally observed “stop-and-go” mechanism of fibril growth [65–67].
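As the caption of Fig. 8 explains, the probability plotted at each time is the fraction of trajectories in which a given native hydrogen bond is still formed at that time. A minimal sketch of that averaging follows, assuming (hypothetically) that each trajectory has been reduced to one set of residue indices with a formed native hydrogen bond per saved frame, and that all trajectories have the same number of frames; `native_hbond_fraction` and `rank_by_persistence` are illustrative names, not part of the UNRES package.

```python
from statistics import mean


def native_hbond_fraction(trajectories, residues):
    """Per-frame fraction of trajectories in which each residue's native H-bond is formed.

    trajectories: list of trajectories; each trajectory is a list of frames,
    and each frame is a set of residue indices whose native H-bond is formed.
    Returns {residue: [fraction at frame 0, fraction at frame 1, ...]}.
    """
    n_traj = len(trajectories)
    n_frames = len(trajectories[0])
    if any(len(traj) != n_frames for traj in trajectories):
        raise ValueError("all trajectories must have the same number of frames")
    return {
        res: [sum(res in traj[f] for traj in trajectories) / n_traj
              for f in range(n_frames)]
        for res in residues
    }


def rank_by_persistence(fractions):
    """Residues sorted from most to least persistent (time-averaged fraction)."""
    return sorted(fractions, key=lambda res: mean(fractions[res]), reverse=True)
```

Applied separately to the trajectories at each temperature, a ranking of this kind picks out the residues whose native contacts survive longest; in the results described above, these are the KLVFF residues.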
5.2 The Formation of Tau- and Aβ-Peptide-Mixed Aggregates

It is well known that the tau protein is highly soluble and does not normally aggregate [72, 73]. The widely accepted “amyloid hypothesis” [4] states that Aβ fibrils appear early in the disease and facilitate the aggregation of tau, although the details of this interaction, which are critical for the design of drugs aimed at preventing it, are not well understood. There are many discrepancies regarding the mechanism of formation of tau aggregates; it is therefore important to understand the structural principles governing the interaction between Aβ and tau [74]. Six different tau isoforms exist in the human brain. The C-terminal half of tau contains what is known as the repeat domain. Three of the six tau isoforms contain four repeats (R1–R4), whereas the other three contain only three of the repeats (R1, R3, and R4). The repeat domain of tau (TauRD) forms the core of tau fibrils [75] and


Fig. 9 Simulations of Tau3RD or Tau4RD and an Aβ40 fibril template. (a) The probability that each residue in the Aβ peptides forms a hydrogen bond with any of the residues in the tau monomer. The two regions along the Aβ peptides that adopt a β-sheet conformation are shaded in pink. Hydrogen bonding occurs mainly along the N-terminal strand of the Aβ peptides. The region with the highest probability, residues 16–21, is indicated by a horizontal black line, together with its sequence. (b) Representative snapshot showing the binding of Tau4RD to the Aβ40 template (gray). Different colors are used for each of the tau repeats: R1 in blue, R2 in yellow, R3 in green, and R4 in red. A purple box indicates the KLVFFA fragment of Aβ40. (Reprinted with permission from A. Rojas et al., J. Phys. Chem. B 122, 7049–7056 (2018). Copyright (2018) American Chemical Society)

promotes β-sheet formation [76, 77]. The study therefore focused on the three-repeat (Tau3RD) and four-repeat (Tau4RD) tau fragments, and MD simulations of a four-chain (two-layer) Aβ40 template [50] interacting with Tau3RD or Tau4RD were carried out. The simulation setup was the same as in the first application (Fig. 4c), the only difference being that the free Aβ monomer was replaced by the tau peptide. The simulations were carried out


Fig. 10 Free-energy landscape (in kcal/mol) along the first two PCs with representative structures at the minima (labeled 1–9) for the trajectory of Tau4RD binding to the Aβ40 template. (Reprinted with permission from A. Rojas et al., J. Phys. Chem. B 122, 7049–7056 (2018). Copyright (2018) American Chemical Society)

Fig. 11 The β-sheet content (in %) of Tau3RD and Tau4RD fragments in the absence (red) or presence (blue) of the Aβ40 template. The errors were obtained from fluctuations between the different trajectories. (Reprinted with permission from A. Rojas et al., J. Phys. Chem. B 122, 7049–7056 (2018). Copyright (2018) American Chemical Society)

by using REMD. A total of 120 REMD trajectories were run, with temperatures ranging from 280 to 320 K. Each trajectory was 72 × 10^5 steps long (equivalent to 17.6 μs of effective simulation time per replica). The tau peptides form hydrogen bonds with the template in 69% of the Tau4RD trajectories and in 67% of the Tau3RD trajectories. By tracking the regions of Aβ40 that participated in the


Fig. 12 Simulations of dimerization between Aβ40 and TauRD. (a) The probability that each residue along the Aβ40 sequence forms a hydrogen bond with any of the residues in Tau4RD (blue) or Tau3RD (red). The regions that are in a β-strand conformation in the fibrils are shaded in pink. (b) β-Sheet and α-helix content along the Aβ40 sequence. (c) Example of a dimer formed between Tau4RD (orange) and Aβ40 (purple), illustrating the higher tendency to form hydrogen bonds along the C-terminal region of Aβ40. (Reprinted with permission from A. Rojas et al., J. Phys. Chem. B 122, 7049–7056 (2018). Copyright (2018) American Chemical Society)

hydrogen bonds formed with Tau3RD or Tau4RD and averaging over the four chains in the template, a binding propensity along the Aβ40 sequence was calculated. The results show that the residues of Aβ40 responsible for binding Tau3RD and Tau4RD lie primarily in the N-terminal half of Aβ40 (Fig. 9a, b), with the highest binding propensity along the KLVFFA fragment [74]. The pathway along which the tau peptides bind to the fibril template can be traced on the FEL along the first two PCs (Fig. 10). It appears that, at an early stage of the trajectory, the GSTENLK fragment of Tau4RD (residues 18–24 of the R1 repeat)


Fig. 13 Free-energy landscape (in kcal/mol) along the first two PCs with representative structures at the minima (labeled 1–7) for the trajectory of tau protein binding to the Aβ40 monomer. (Reprinted with permission from A. Rojas et al., J. Phys. Chem. B 122, 7049–7056 (2018). Copyright (2018) American Chemical Society)

drives the peptide toward the KLVFFA fragment (residues 16–21) of one of the Aβ40 chains and binds this fragment, forming a β-sheet (representative structures of minima 2–6). The R2, R3, and R4 repeats do not interact with the Aβ40 template during this time interval. Only after the R1 repeat has formed a β-strand and bound to the Aβ40 template does the R4 repeat start moving toward the second semifilament of the template; the GSLDNIT fragment (residues 19–25 of the R4 repeat) then binds to the KLVFFA fragment of one of the chains in an antiparallel orientation and forms a β-strand (see representative structures of minima 7–9 in Figs. 9b and 10). Further evidence of the influence of the Aβ40 template on the tau peptides is shown in Fig. 11: the template promotes conformations with a higher β-sheet content in Tau3RD and Tau4RD, which suggests that the presence of Aβ40 might make tau more prone to aggregation. This work also addressed the question of why soluble Aβ42, but not Aβ40, can trigger the aggregation of tau-protein peptides. Many experimental studies have indicated that Aβ42, but not Aβ40, can induce tau-peptide aggregation [78, 79]. However, these studies often use the Aβ peptides in a non-aggregated state [78]. Because Aβ42 aggregates faster than Aβ40, the studies might in fact be comparing different aggregation states, a possibility pointed out by Rank et al. [80]. Therefore, to test whether the ability of Aβ40 to bind


to Tau3RD and Tau4RD fragments is restricted to its fibrillar conformation, simulations in which the tau fragments interacted with an Aβ40 monomer (i.e., non-aggregated Aβ40) were carried out [74]. The number of hydrogen bonds formed between the Tau4RD or Tau3RD fragments and the Aβ40 monomer (Fig. 12a) is considerably lower than with the Aβ40 template (Fig. 9a). This finding indicates that Tau4RD and Tau3RD have little affinity for non-aggregated Aβ40. Similar simulations carried out with Aβ42 show that the affinity of TauRD for Aβ42 monomers is also low (not shown). Taken together, these results indicate that neither Aβ40 nor Aβ42 monomers can form aggregates in complex with tau peptides. Most likely, when the mixture contains Aβ42, tau peptides bind to Aβ42 that has already started to aggregate. When Aβ40 and Tau4RD or Tau3RD do form dimers, they do so mostly through intermolecular β-sheets involving residues in the C-terminal portion of Aβ40 (Fig. 12a). The reason why Tau4RD and Tau3RD associate with the C-terminal half of Aβ40 instead of the N-terminal half [as was the case with the Aβ40 template (Fig. 9a)] lies in the secondary structure of the Aβ40 monomers. The KLVFFA region (residues 16–21) of Aβ40, which in the fibril is part of the N-terminal β-strand and so efficiently attracts Tau4RD and Tau3RD (Fig. 9a), is more often found in an α-helical conformation in the monomeric state (Fig. 12b). Because of this change in secondary structure, the tau fragments bind instead to the C-terminal strand, which is in a β-sheet conformation most of the time (see the example in Fig. 12c) but for which they seem to have lower affinity. Hence, Aβ40 or Aβ42 monomers are rarely found in complex with Tau3RD or Tau4RD. As an illustrative example, the FEL along the PCs was constructed for one of the MD trajectories to describe in detail the pathway along which TauRD and Aβ40 monomers form heterodimers (Fig. 13).
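The per-residue binding propensity behind Figs. 9a and 12a (the probability that an Aβ40 residue hydrogen-bonds to the tau fragment, averaged over the template chains) can be sketched as follows. The frame format (a set of (chain, residue) pairs of the Aβ template that are hydrogen-bonded to tau), the chain labels, and the helper `best_window`, which locates the contiguous stretch with the highest summed propensity (the kind of stretch highlighted as residues 16–21 in Fig. 9a), are assumptions made for illustration only.

```python
def binding_propensity(frames, residues, chains):
    """Per-residue probability of an H-bond to tau, averaged over template chains.

    frames: list of sets; each set holds the (chain_id, residue) pairs of the
    Aβ template that are hydrogen-bonded to the tau fragment in that frame.
    """
    n_frames = len(frames)
    return {
        res: sum(
            sum((chain, res) in frame for frame in frames) / n_frames
            for chain in chains
        ) / len(chains)
        for res in residues
    }


def best_window(propensity, width):
    """(first, last) residue of the window with the highest summed propensity.

    Assumes the residue numbers in `propensity` are consecutive integers.
    """
    res = sorted(propensity)
    start = max(
        range(len(res) - width + 1),
        key=lambda k: sum(propensity[r] for r in res[k:k + width]),
    )
    return res[start], res[start + width - 1]
```

For the monomer-dimerization runs (Fig. 12a), the same function applies with a single chain label in `chains`, so the two profiles stay on the same probability scale.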

6 Conclusions

The examples presented in this chapter demonstrate that coarse-grained modeling with UNRES is a useful tool for studying peptide and protein aggregation. Owing to the substantial reduction of the representation (only two sites per residue), which extends the time scale 1000-fold or more compared to all-atom simulations, time scales inaccessible to all-atom dynamics can be reached. At the same time, because the UNRES energy function is rigorously derived from the cluster-cumulant expansion of the potential of mean force [13, 15], the force field is predictive and can therefore be used to simulate structures for which experimental data are difficult or impossible to obtain, as is the case for oligomeric intermediates in amyloid formation.
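As a quick consistency check on the effective times quoted in Sect. 5 (our own arithmetic, not a value stated by the authors), dividing each effective simulation time by its step count gives the same effective time per UNRES step for the Aβ and the tau runs:

$$
\frac{85.5\ \mu\mathrm{s}}{35 \times 10^{6}\ \text{steps}} \approx 2.44\ \mathrm{ps\ per\ step},
\qquad
\frac{17.6\ \mu\mathrm{s}}{72 \times 10^{5}\ \text{steps}} \approx 2.44\ \mathrm{ps\ per\ step}.
$$

Because these are effective times, the figure presumably already includes the coarse-graining time-scale extension mentioned above rather than being the raw integration time step.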


Acknowledgments

This work was supported by grant UMO-2017/25/B/ST4/01026 from the National Science Center of Poland (Narodowe Centrum Nauki), by the National Institutes of Health (GM-14312), and by the National Science Foundation (MCB10-19767). Calculations were carried out using the computational resources provided by (a) our 588-processor Beowulf cluster at the Baker Laboratory of Chemistry and Chemical Biology, Cornell University, (b) the supercomputer resources at the Interdisciplinary Center of Mathematical and Computer Modeling (ICM), University of Warsaw (grant G10-05), and (c) the supercomputer resources at the Informatics Center of the Metropolitan Academic Network (CI TASK) in Gdańsk.

References

1. Williams AJ, Paulson HL (2008) Polyglutamine neurodegeneration: protein misfolding revisited. Trends Neurosci 31:521–528
2. Staniforth RA, Giannini S, Higgins LD, Conroy MJ, Hounslow AM, Jerala R, Craven CJ, Waltho JP (2001) Three-dimensional domain swapping in the folded and molten-globule states of cystatins, an amyloid-forming structural superfamily. EMBO J 20:4774–4781
3. Goate A, Chartier-Harlin MC, Mullan M, Brown J, Crawford F, Fidani L, Giuffra L, Haynes A, Irving N, James L (1991) Segregation of a missense mutation in the amyloid precursor protein gene with familial Alzheimer’s disease. Nature 349:704–706
4. Hardy J, Selkoe DJ (2002) The amyloid hypothesis of Alzheimer’s disease: progress and problems on the road to therapeutics. Science 297:353–356
5. Tycko R (2006) Molecular structure of amyloid fibrils: insights from solid-state NMR. Q Rev Biophys 39:1–55
6. Lu JX, Qiang W, Yau WM, Schwieters C, Meredith S, Tycko R (2013) Molecular structure of β-amyloid fibrils in Alzheimer’s disease brain tissue. Cell 154:1257–1268
7. Khalili M, Liwo A, Jagielska A, Scheraga HA (2005) Molecular dynamics with the united-residue model of polypeptide chains. II. Langevin and Berendsen-bath dynamics and tests on model α-helical systems. J Phys Chem B 109:13798–13810
8. Liwo A, Ołdziej S, Pincus MR, Wawak RJ, Rackovsky S, Scheraga HA (1997) A united-residue force field for off-lattice protein-structure simulations. I: Functional forms and parameters of long-range side-chain interaction potentials from protein crystal data. J Comput Chem 18:849–873
9. Liwo A, Pincus MR, Wawak RJ, Rackovsky S, Ołdziej S, Scheraga HA (1997) A united-residue force field for off-lattice protein-structure simulations. II: Parameterization of local interactions and determination of the weights of energy terms by Z-score optimization. J Comput Chem 18:874–887
10. Liwo A, Czaplewski C, Ołdziej S, Rojas AV, Kaźmierkiewicz R, Makowski M, Murarka RK, Scheraga HA (2008) Simulation of protein structure and dynamics with the coarse-grained UNRES force field. In: Voth GA (ed) Coarse-graining of condensed phase and biomolecular systems. CRC Press, Boca Raton, FL
11. Liwo A, Baranowski M, Czaplewski C, Gołaś E, He Y, Jagieła D, Krupa P, Maciejczyk M, Makowski M, Mozolewska MA, Niadzvedtski A, Ołdziej S, Scheraga HA, Sieradzan AK, Ślusarz R, Wirecki T, Yin Y, Zaborowski B (2014) A unified coarse-grained model of biological macromolecules based on mean-field multipole-multipole interactions. J Mol Model 20:2306
12. Sieradzan AK, Krupa P, Scheraga HA, Liwo A, Czaplewski C (2015) Physics-based potentials for the coupling between backbone- and side-chain-local conformational states in the united residue (UNRES) force field for protein simulations. J Chem Theory Comput 11:817–831
13. Sieradzan AK, Makowski M, Augustynowicz A, Liwo A (2017) A general method for the derivation of the functional forms of the effective energy terms in coarse-grained energy functions of polymers. I. Backbone potentials of coarse-grained polypeptide chains. J Chem Phys 146:124106
14. Liwo A, Sieradzan AK, Lipska AG, Czaplewski C, Joung I, Żmudzińska W, Hałabis A, Ołdziej S (2019) A general method for the derivation of the functional forms of the effective energy terms in coarse-grained energy functions of polymers. III. Determination of scale-consistent backbone-local and correlation potentials in the UNRES force field and force-field calibration and validation. J Chem Phys 150:155104
15. Liwo A, Czaplewski C, Pillardy J, Scheraga HA (2001) Cumulant-based expressions for the multibody terms for the correlation between local and electrostatic interactions in the united-residue force field. J Chem Phys 115:2323–2347
16. Krupa P, Sieradzan AK, Rackovsky S, Baranowski M, Ołdziej S, Scheraga HA, Liwo A, Czaplewski C (2013) Improvement of the treatment of loop structures in the UNRES force field by inclusion of coupling between backbone- and side-chain-local conformational states. J Chem Theory Comput 9:4620–4632
17. Liwo A, Khalili M, Czaplewski C, Kalinowski S, Ołdziej S, Wachucik K, Scheraga HA (2007) Modification and optimization of the united-residue (UNRES) potential energy function for canonical simulations. I. Temperature dependence of the effective energy function and tests of the optimization method with single training proteins. J Phys Chem B 111:260–285
18. He Y, Xiao Y, Liwo A, Scheraga HA (2009) Exploring the parameter space of the coarse-grained UNRES force field by random search: selecting a transferable medium-resolution force field. J Comput Chem 30:2127–2135
19. Krupa P, Hałabis A, Żmudzińska W, Ołdziej S, Scheraga HA, Liwo A (2017) Maximum likelihood calibration of the UNRES force field for simulation of protein structure and dynamics. J Chem Inf Model 57:2364–2377
20. Khalili M, Liwo A, Rakowski F, Grochowski P, Scheraga HA (2005) Molecular dynamics with the united-residue model of polypeptide chains. I. Lagrange equations of motion and tests of numerical stability in the microcanonical mode. J Phys Chem B 109:13785–13797
21. Rakowski F, Grochowski P, Lesyng B, Liwo A, Scheraga HA (2006) Implementation of a symplectic multiple-time-step molecular dynamics algorithm, based on the united-residue mesoscopic potential energy function. J Chem Phys 125:204107
22. Swope WC, Andersen HC, Berens PH, Wilson KR (1982) A computer simulation method for the calculation of equilibrium constants for the formation of physical clusters of molecules: application to small water clusters. J Chem Phys 76:637–649
23. Liwo A, Khalili M, Scheraga HA (2005) Ab initio simulations of protein-folding pathways by molecular dynamics with the united-residue model of polypeptide chains. Proc Natl Acad Sci U S A 102:2362–2367
24. Berendsen HJC, Postma JPM, van Gunsteren WF, DiNola A, Haak JR (1984) Molecular dynamics with coupling to an external bath. J Chem Phys 81:3684–3690
25. Kleinerman DS, Czaplewski C, Liwo A, Scheraga HA (2008) Implementations of Nose–Hoover and Nose–Poincare thermostats in mesoscopic dynamic simulations with the united-residue model of a polypeptide chain. J Chem Phys 128:245103
26. Nose S (1984) A unified formulation of the constant temperature molecular dynamics methods. J Chem Phys 81:511–517
27. Nose S (2001) An improved symplectic integrator for Nose-Poincare thermostat. J Phys Soc Jpn 70:75–77
28. Rojas AV, Liwo A, Scheraga HA (2007) Molecular dynamics with the united residue force field: Ab initio folding simulations of multichain proteins. J Phys Chem B 111:293–309
29. Sieradzan AK, Liwo A, Hansmann UHE (2012) Folding and self-assembly of a small protein complex. J Chem Theory Comput 8:3416–3422
30. Sieradzan AK (2015) Introduction of periodic boundary conditions into UNRES force field. J Comput Chem 36:940–946
31. Rojas A, Liwo A, Scheraga HA (2011) A study of the α-helical intermediate preceding the aggregation of the amino-terminal fragment of the Aβ-amyloid peptide (1-28). J Phys Chem B 115:12978–12983
32. Nanias M, Czaplewski C, Scheraga HA (2006) Replica exchange and multicanonical algorithms with the coarse-grained united-residue (UNRES) force field. J Chem Theory Comput 2:513–528
33. Czaplewski C, Kalinowski S, Liwo A, Scheraga HA (2009) Application of multiplexing replica exchange molecular dynamics method to the UNRES force field: tests with α and α + β proteins. J Chem Theory Comput 5:627–640
34. Hansmann UHE, Okamoto Y (1994) Comparative study of multicanonical and simulated annealing algorithms in the protein folding problem. Physica A 212:415–437
35. Rhee YM, Pande VS (2003) Multiplexed-replica exchange molecular dynamics method for protein folding simulation. Biophys J 84:775–786
36. Kumar S, Bouzida D, Swendsen RH, Kollman PA, Rosenberg JM (1992) The weighted histogram analysis method for free-energy calculations on biomolecules. I. The method. J Comput Chem 13:1011–1021
37. Murtagh F (1985) Multidimensional clustering algorithms. Physica-Verlag, Vienna
38. Kitao A, Hirata F, Go N (1991) The effects of solvent on the conformation and the collective motions of protein: normal mode analysis and molecular dynamics simulations of melittin in water and in vacuum. Chem Phys 158:447–472
39. Mu Y, Nguyen PH, Stock G (2005) Energy landscape of a small peptide revealed by dihedral angle principal component analysis. Proteins 58:45–52
40. Altis A, Nguyen PH, Hegger R, Stock G (2007) Dihedral angle principal component analysis of molecular dynamics simulations. J Chem Phys 126:244111
41. Maisuradze GG, Liwo A, Scheraga HA (2009) Principal component analysis for protein folding dynamics. J Mol Biol 385:312–329
42. Zięba K, Ślusarz M, Ślusarz R, Liwo A, Czaplewski C, Sieradzan AK (2019) Extension of the UNRES coarse-grained force field to membrane proteins in the lipid bilayer. J Phys Chem B 123:7829–7839
43. Sieradzan AK, Bogunia M, Mech P, Ganzynkowicz R, Giełdoń A, Liwo A (2019) Introduction of phosphorylated residues into the UNRES coarse-grained model: toward modeling of signaling processes. J Phys Chem B 119:8526–8534
44. Sieradzan AK, Giełdoń A, Yin Y, He Y, Scheraga HA, Liwo A (2018) A new protein nucleic-acid coarse-grained force field based on the UNRES and NARES-2P force fields. J Comput Chem 39:2360–2370
45. Lubecka EA, Liwo A (2016) New UNRES force field package with FORTRAN 90. TASK Quart 20:399–408
46. Blackford LS, Demmel J, Dongarra J, Duff I, Hammarling S, Henry G, Heroux M, Kaufman L, Lumsdaine A, Petitet A, Pozo R, Remington K, Whaley RC (2002) An updated set of basic linear algebra subprograms (BLAS). ACM Trans Math Softw 28(2):135–151
47. Czaplewski C, Karczyńska A, Sieradzan A, Liwo A (2018) UNRES server for physics-based coarse-grained simulations and prediction of protein structure, dynamics and thermodynamics. Nucleic Acids Res 46:W304–W309


48. Harper J, Lansbury P Jr (1997) Models of amyloid seeding in Alzheimer’s disease and scrapie: mechanistic truths and physiological consequences of the time-dependent solubility of amyloid proteins. Annu Rev Biochem 66:385–407
49. Rojas A, Liwo A, Browne D, Scheraga HA (2010) Mechanism of fiber assembly: treatment of Aβ-peptide aggregation with a coarse-grained united-residue force field. J Mol Biol 404:537–552
50. Petkova AT, Yau W-M, Tycko R (2006) Experimental constraints on quaternary structure in Alzheimer’s amyloid fibrils. Biochemistry 45:498–512
51. Xiao Y, Ma B, McElheny D, Parthasarathy S, Long F, Hoshi M, Nussinov R, Ishii Y (2015) Aβ-(1-42) fibril structure illuminates self-recognition and replication of amyloid in Alzheimer’s disease. Nat Struct Mol Biol 22:499–505
52. Colvin MT, Silvers R, Ni QZ, Can TV, Sergeyev I, Rosay M, Donovan KJ, Michael B, Wall J, Linse S, Griffin RG (2016) Atomic resolution structure of monomorphic β-amyloid fibrils. J Am Chem Soc 138:9663–9674
53. Wälti MA, Ravotti F, Arai H, Glabe CG, Wall JS, Böckmann A, Güntert P, Meier BH, Riek R (2016) Atomic-resolution structure of a disease-relevant Aβ(1-42) amyloid fibril. Proc Natl Acad Sci U S A 113:E4976–E4984
54. Gremer L, Schölzel D, Schenk C, Reinartz E, Labahn J, Ravelli RBG, Tusche M, Lopez-Iglesias C, Hoyer W, Heise H, Willbold D, Schröder GF (2017) Fibril structure of amyloid-β(1-42) by cryo-electron microscopy. Science 358:116–119
55. Nguyen PH, Li MS, Stock G, Straub JE, Thirumalai D (2007) Monomer adds to preformed structured oligomers of Aβ-peptides by a two-stage dock-lock mechanism. Proc Natl Acad Sci U S A 104:111–116
56. Reddy G, Straub JE, Thirumalai D (2009) Dynamics of locking of peptides onto growing amyloid fibrils. Proc Natl Acad Sci U S A 106:11948–11953
57. O’Brien E, Okamoto Y, Straub JE, Brooks B, Thirumalai D (2009) Thermodynamic perspective on the dock-lock growth mechanism of amyloid fibrils. J Phys Chem B 113:14421–14430
58. Rojas A, Maisuradze N, Kachlishvili K, Scheraga HA, Maisuradze GG (2017) Elucidating important sites and the mechanism for amyloid fibril formation by coarse-grained molecular dynamics. ACS Chem Neurosci 8:201–209


59. Takeda T, Klimov DK (2009) Replica exchange simulations of the thermodynamics of Aβ fibril growth. Biophys J 96:442–452
60. Wallace JA, Shen JK (2010) Probing the strand orientation and registry alignment in the propagation of amyloid fibrils. Biochemistry 49:5290–5298
61. Schor M, Vreede J, Bolhuis PG (2012) Elucidating the locking mechanism of peptides onto growing amyloid fibrils through transition path sampling. Biophys J 103:1296–1304
62. Tjernberg LO, Naslund J, Lindqvist F, Johansson J, Karlstrom AR, Thyberg J, Terenius L, Nordstedt C (1996) Arrest of beta-amyloid fibril formation by a pentapeptide ligand. J Biol Chem 271:8545–8548
63. Esler WP, Stimson ER, Jennings JM, Vinters HV, Ghilardi JR, Lee JP, Mantyh PW, Maggio JE (2000) Alzheimer’s disease amyloid propagation by a template-dependent dock-lock mechanism. Biochemistry 39:6288–6295
64. Cannon M, Williams A, Wetzel R, Myszka D (2004) Kinetic analysis of beta-amyloid fibril elongation. Anal Biochem 328:67–75
65. Ban T, Hoshino M, Takahashi S, Hamada D, Hasegawa K, Naiki H, Goto Y (2004) Direct observation of Aβ amyloid fibril growth and inhibition. J Mol Biol 344:757–767
66. Kellermayer MSZ, Karsa A, Benke M, Soos K, Penke B (2008) Stepwise dynamics of epitaxially growing single amyloid fibrils. Proc Natl Acad Sci U S A 105:141–144
67. Qiang W, Kelley K, Tycko R (2013) Polymorph-specific kinetics and thermodynamics of β-amyloid fibril growth. J Am Chem Soc 135:6860–6871
68. Sole-Domenech S, Rojas AV, Maisuradze GG, Scheraga HA, Lobel P, Maxfield FR (2018) Lysosomal enzyme tripeptidyl peptidase 1 destabilizes fibrillar Aβ by multiple endoproteolytic cleavages within the β-sheet domain. Proc Natl Acad Sci U S A 115:1493–1498
69. Massi F, Straub JE (2001) Energy landscape theory for Alzheimer’s amyloid-beta peptide fibril elongation. Proteins 42:217–229
70. Lam AR, Rodriguez JJ, Rojas A, Scheraga HA, Mukamel S (2013) Tracking the mechanism of fibril assembly by simulated two-dimensional ultraviolet spectroscopy. J Phys Chem A 117:342–350
71. Gurry T, Stultz CM (2014) Mechanism of amyloid-β fibril elongation. Biochemistry 53:6981–6991
72. Weingarten MD, Lockwood AH, Hwo SY, Kirschner MW (1975) A protein factor essential for microtubule assembly. Proc Natl Acad Sci U S A 72:1858–1862
73. Barghorn S, Mandelkow E (2002) Toward a unified scheme for the aggregation of tau into Alzheimer paired helical filaments. Biochemistry 41:14885–14896
74. Rojas AV, Maisuradze GG, Scheraga HA (2018) Dependence of the formation of tau and a beta peptide mixed aggregates on the secondary structure of the N-terminal region of A beta. J Phys Chem B 122:7049–7056
75. Wegmann S, Medalsy ID, Mandelkow E, Müller DJ (2013) The fuzzy coat of pathological human tau fibrils is a two-layered polyelectrolyte brush. Proc Natl Acad Sci U S A 110:E313–E321
76. von Bergen M, Friedhoff P, Biernat J, Heberle J, Mandelkow EM, Mandelkow E (2000) Assembly of tau protein into Alzheimer paired helical filaments depends on a local sequence motif ((306)VQIVYK(311)) forming β structure. Proc Natl Acad Sci U S A 97:5129–5134
77. Goux WJ, Kopplin L, Nguyen AD, Leak K, Rutkofsky M, Shanmuganandam VD, Sharma D, Inouye H, Kirschner DA (2004) The formation of straight and twisted filaments from short tau peptides. J Biol Chem 279:26868–26875
78. McGowan E, Pickford F, Kim J, Onstead L, Eriksen J, Yu C, Skipper L, Murphy MP, Beard J, Das P et al (2005) Aβ-42 is essential for parenchymal and vascular amyloid deposition in mice. Neuron 47:191–199
79. Hu X, Li X, Zhao M, Gottesdiener A, Luo W, Paul S (2014) Tau pathogenesis is promoted by Aβ-42 but not Aβ-40. Mol Neurodegener 9:52
80. Rank KB, Pauley AM, Bhattacharya K, Wang Z, Evans DB, Fleck TJ, Johnston JA, Sharma SK (2002) Direct interaction of soluble human recombinant tau protein with Aβ 1-42 results in tau aggregation and hyperphosphorylation by tau protein kinase II. FEBS Lett 514:263–268

Chapter 6

Contact-Based Analysis of Aggregation of Intrinsically Disordered Proteins

Marek Cieplak, Łukasz Mioduszewski, and Mateusz Chwastyk

Abstract

We review the contact-based description of aggregation of intrinsically disordered proteins in coarse-grained and all-atom models. We consider polyglutamines and polyalanines at various concentrations of the peptides. We also study associations of two chains of α-synuclein and of up to 20 chains of a 12-residue-long segment of protein tau. We demonstrate that the total number of two-chain association events (in an aggregate that comprises at least two chains) provides a useful measure of the propensity to aggregate. This measure is consistent, for instance, with previously reported mass spectrometry data. The distribution of the number of association events as a function of their duration is essentially a power law. The corresponding exponent depends on the protein and the temperature but not on the concentration of the proteins.

Key words Intrinsically disordered proteins, Aggregation of proteins, Coarse-grained models, Molecular dynamics, Contact map, PolyQ, Protein tau, α-Synuclein

1 Introduction

The homomeric and heteromeric assembly of structured proteins into quaternary architectures is vital for life processes. It has been shown recently [1] that it proceeds through three basic types of steps: dimerization, cyclization, and heteromeric subunit addition. Combining these steps (and assuming that the proteins are nearly rigid) leads to specific types of assembly topology, 14 of which are found in the Protein Data Bank (PDB). In the case of intrinsically disordered proteins (IDPs) [2–10], enumerating the assembly pathways appears to be difficult, since such proteins undergo structural transformations and readily flow between various basins of conformations (see, e.g., ref. 11). Thus, for instance, dimerization may result in different outcomes and may also compete with cyclization. Nevertheless, the aggregation of IDPs is known to often lead ultimately to well-defined fibrillar structures [12–15]. Understanding this process is important because most neurodegenerative diseases are associated with protein aggregation [16, 17], though the aggregates themselves need not necessarily be toxic [18, 19]. It is also important because membraneless organelles are proteinaceous liquid droplets made of aggregates of IDPs and, frequently, also of nucleic acids [20–22]. Insights into the mechanics of aggregation are expected to come from molecular dynamics simulations that use empirical potentials; the issue is how to describe aggregation quantitatively, and not just by making “movies” of the time evolution. We focus here on this task by identifying the relevant inter-residue couplings between the chains that associate.

There are essentially two kinds of molecular dynamics simulations: all-atom, involving effective interatomic interactions, and coarse-grained, involving effective inter-residue interactions. The latter can be either repulsive (preventing penetration of the backbone) or contact forming, i.e., corresponding to an attractive potential well. The contacts may be intrachain or interchain, and the distribution of the durations of the association events established through interchain contacts can be taken as a measure of the propensity to make associations. Naturally, coarse-grained models allow one to access longer time scales and to study larger systems than the all-atom approach. This is accomplished by shedding the degrees of freedom associated with the finer details of the description, which are likely unimportant for phenomena taking place at long time scales. In coarse-grained models, the contacts define the forces acting in the dynamics. In all-atom models, they can be used descriptively to identify the atomic groups, and then the residues, that are involved in the interactions.

Mai Suan Li et al. (eds.), Computer Simulations of Aggregation of Proteins and Peptides, Methods in Molecular Biology, vol. 2340, https://doi.org/10.1007/978-1-0716-1546-1_6, © Springer Science+Business Media, LLC, part of Springer Nature 2022
A convenient geometry-based method to identify the contacts (which we have used in a number of studies, including those on protein stretching [23, 24]) represents the heavy atoms by spheres and postulates that a contact exists if at least one pair of spheres overlaps. This can be applied both to instantaneous all-atom conformations and to the native state. We take the radii of the spheres to be proportional to the van der Waals radii of the atoms. The proportionality factor is 1.24 (the inflection point of the Lennard–Jones potential) to account for attraction [25]. Various calculated sets of van der Waals radii exist. For proteins, it is sensible to take the context-dependent values derived by Tsai et al. [26] (for instance, the radius of an α-C differs slightly from that of a β- or γ-C). Further possible improvements in the selection of contacts (such as adding the chemically motivated rCSU contacts) are discussed in ref. 27.

The use of contacts in the context of aggregation should start by considering successive snapshots and listing all inter-chain contacts in each snapshot. Two chains can associate through one contact or through many, and the number of ways to choose specific connecting contacts is “astronomical” even for peptide chains as short as 20 residues. Thus an exact enumeration of the contact patterns by which chains can associate is hard (unless it is restricted to a few trajectories). Nevertheless, the contacts that appear most frequently provide useful information about the nature of association and about how it changes as a result of, say, a mutation or a change in concentration. If we disregard the number and identity of the connecting contacts, we describe the process merely in terms of the presence of a two-chain association event. If more chains are involved in an interconnected cluster, we consider all connected pairs and count all of the events detected in the snapshot under consideration. Their combined number is denoted here as Y. The value of Y is time dependent.

In the following, we illustrate the use of contacts in three simulation approaches to aggregation. The first approach is valid for structured proteins, specifically those that combine into an icosahedral virus capsid with a known native structure. The intra- and inter-protein contacts are determined from this structure, and assembly, like folding of a single protein, means establishing all native contacts starting from a disordered state. Hence the distribution of the parameter Y in the aggregated state is sharply peaked: in the native state there is just one value, but thermal fluctuations may widen the distribution. In the case of IDPs, there are many energetically comparable ways to associate, and the distribution of Y is peaked at one event and then decays over a substantial range. We show, using coarse-grained and all-atom simulations of IDPs, that the equilibrated distribution of Y is a convenient measure of the propensity to aggregate.
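The overlap criterion, the per-snapshot count Y, and the ranking of frequent contacts described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: atoms are passed as hypothetical (chain, residue, coordinates, radius) tuples, the toy example uses a generic radius of 1.7 Å instead of the Tsai et al. values, and the pairwise search is brute force for clarity. The scale factor is computed as the inflection point of the Lennard–Jones potential, (26/7)^(1/6) ≈ 1.24, and residue pairs are ranked with i and j treated symmetrically.

```python
import math
from collections import Counter
from itertools import combinations

# Scale factor for the van der Waals radii: the inflection point of the
# Lennard-Jones potential lies at r = (26/7)**(1/6) * sigma, about 1.2445 sigma.
LJ_INFLECTION = (26.0 / 7.0) ** (1.0 / 6.0)


def interchain_contacts(atoms, factor=LJ_INFLECTION):
    """Residue-level inter-chain contacts in one snapshot (brute force).

    atoms: list of (chain_id, residue_index, (x, y, z), vdw_radius) tuples,
    heavy atoms only. Residues on different chains are in contact if at
    least one pair of their enlarged spheres overlaps. Each contact is
    stored as ((chain_a, res_a), (chain_b, res_b)) with chain_a < chain_b.
    """
    contacts = set()
    for (ca, ra, xa, rad_a), (cb, rb, xb, rad_b) in combinations(atoms, 2):
        if ca == cb:
            continue  # intra-chain pairs do not signal association
        if math.dist(xa, xb) < factor * (rad_a + rad_b):
            pair = ((ca, ra), (cb, rb)) if ca < cb else ((cb, rb), (ca, ra))
            contacts.add(pair)
    return contacts


def associated_pairs(contacts):
    """Chain pairs joined by at least one contact; the size of this set is Y."""
    return {(a[0], b[0]) for a, b in contacts}


def top_contacts(snapshots, k=20):
    """Rank residue pairs (i, j), i <= j, by the number of snapshots containing them."""
    counts = Counter()
    for contacts in snapshots:
        # i and j are treated symmetrically, as in the contact diagrams
        counts.update({tuple(sorted((a[1], b[1]))) for a, b in contacts})
    return counts.most_common(k)


# Toy snapshot with hypothetical coordinates (angstrom) and radii.
atoms = [
    ("A", 1, (0.0, 0.0, 0.0), 1.7),
    ("A", 2, (3.8, 0.0, 0.0), 1.7),
    ("B", 5, (0.0, 3.5, 0.0), 1.7),
    ("C", 9, (20.0, 0.0, 0.0), 1.7),
]
snapshot = interchain_contacts(atoms)
print(len(associated_pairs(snapshot)))  # Y for this snapshot: 1
print(top_contacts([snapshot]))         # [((1, 5), 1)]
```

For real trajectories, a cell list or neighbor list would replace the quadratic loop over atom pairs.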
The total number of association events in a simulation of fixed duration is a simple descriptor of the propensity to aggregate. It is convenient to normalize this descriptor by the number of chains (if the chains are sequentially identical) and to denote it by G. We show that the distribution of Y depends on the temperature, T. Using the coarse-grained model, we demonstrate that this dependence shows up primarily in the value of a power-law exponent that describes how Y depends on the duration of the association. The exponent differentiates between the low-temperature and the room-temperature (TR) phases. We also use implicit-solvent all-atom simulations of a segment of protein tau and of α-synuclein. We demonstrate that a G-enhancing mutation in protein tau is consistent with the enhanced aggregation propensity reported in the literature.
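One way to turn per-snapshot lists of associated chain pairs into event durations ta, the histogram Y(ta), and the normalized count G is sketched below. The event definition used here (a maximal run of consecutive snapshots in which a given pair of chains stays connected) is our assumption, as is the choice to keep runs that are still open at the end of the trajectory; the text does not spell out these details.

```python
from collections import Counter


def association_events(pair_series):
    """Split a time series of associated chain pairs into events.

    pair_series: one set per saved snapshot, holding the chain pairs that
    are connected by at least one contact in that snapshot. An event is a
    maximal run of consecutive snapshots in which a given pair stays
    connected; its duration is the run length in snapshots.
    Returns a list of (pair, duration) tuples.
    """
    events = []
    open_runs = {}  # pair -> consecutive snapshots connected so far
    for pairs in pair_series:
        for pair in list(open_runs):
            if pair not in pairs:  # the pair dissociated: close its event
                events.append((pair, open_runs.pop(pair)))
        for pair in pairs:
            open_runs[pair] = open_runs.get(pair, 0) + 1
    # runs still open at the end are truncated by the trajectory length
    events.extend(open_runs.items())
    return events


def duration_histogram(events):
    """Number of events Y observed for each duration t_a (in snapshots)."""
    return Counter(duration for _, duration in events)


def propensity_g(events, n_chains):
    """Total number of association events divided by the number of chains."""
    return len(events) / n_chains


series = [{("A", "B")}, {("A", "B")}, set(), {("A", "B"), ("B", "C")}, {("B", "C")}]
events = association_events(series)
print(sorted(duration_histogram(events).items()))  # [(1, 1), (2, 2)]
print(propensity_g(events, n_chains=3))            # 1.0
```

Multiplying the durations by the interval between saved snapshots converts them to physical time.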

2 Methods

2.1 Self-Assembly of Model Proteins into a Virus Capsid

One interesting example of self-assembly of structured proteins is the formation of virus capsids. The proteins involved are mostly structured, but they also have disordered tails that carry electric charges, as in the case of the CCMV (cowpea chlorotic mottle virus) capsid [28–31]. In vivo, the process is driven electrostatically by the attraction of the tails to the charged strands of nucleic acids. This is followed by a stage in which the proteins lock together into a tiling network. In vitro, however, the process can take place without any nucleic acids if the pH of the solvent is manipulated properly [32]. In this way, self-assembly reduces to the locking stage alone, and the role of the tails is minor. Theoretical studies of the locking mechanisms are performed not with flexible proteins but with their solid representations or, more commonly, with solid representations of capsomers, the structural units made of several proteins (three in the case of CCMV) [33–36]. These solid units interact as shaped single particles with symmetry-motivated potentials. This approach allows for simulations of a large number of particles and thus of a large number of capsids. We have proposed another method [37] in which the aggregating proteins have their own coarse-grained dynamics defined within the structure-based, or Go-like, description. In the context of virus capsids, this approach was first used to study nanoindentation of 35 types of capsids and to demonstrate a large variety of mechanical responses to squeezing and a broad range of effective elastic constants [38–40]. We have also argued that the elastic shell model [41] cannot describe the deformation field correctly, even qualitatively. In the case of CCMV, there are 180 sequentially identical protein molecules comprising 34,200 residues, of which 5580 belong to the tails, and 62,460 contacts (both within and between proteins).
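The fractions of native contacts within proteins (Qp) and between proteins (Qpp), which Fig. 1 uses to monitor capsid reassembly, amount to bookkeeping over the native contact map. A minimal sketch, assuming residues are numbered consecutively across the capsid and that the caller has already decided which native contacts are established in a snapshot (for example, by a distance cutoff; that criterion is a placeholder, not the model's definition):

```python
def native_fractions(native_contacts, present_contacts, protein_of):
    """Fractions of native intra-protein (Qp) and inter-protein (Qpp) contacts.

    native_contacts: iterable of residue pairs (i, j), i < j, from the
    native contact map, with residues numbered across the whole capsid.
    present_contacts: set of pairs (i, j), i < j, judged to be established
    in the current snapshot (the criterion is left to the caller).
    protein_of: function mapping a residue number to its protein number.
    Returns (Qp, Qpp).
    """
    counts = {"intra": [0, 0], "inter": [0, 0]}  # kind -> [present, total]
    for i, j in native_contacts:
        kind = "intra" if protein_of(i) == protein_of(j) else "inter"
        counts[kind][1] += 1
        if (i, j) in present_contacts:
            counts[kind][0] += 1

    def fraction(kind):
        present, total = counts[kind]
        return present / total if total else 0.0

    return fraction("intra"), fraction("inter")


# Hypothetical example: 190 residues per protein (34,200 / 180 for CCMV).
RESIDUES_PER_PROTEIN = 190
native = [(0, 5), (10, 20), (5, 200), (7, 300)]
present = {(0, 5), (10, 20), (5, 200)}
qp, qpp = native_fractions(native, present, lambda r: r // RESIDUES_PER_PROTEIN)
print(qp, qpp)  # 1.0 0.5
```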
In the aggregated native state, all of these contacts need to be established, and each of them contributes the well energy ε to the total energy. By considering protein stretching experiments [24] and by performing all-atom simulations [42], we have calibrated ε/Å to be about 110 pN, which corresponds to ε of about 1.6 kcal/mol. The time unit of the simulations, τ, is of order 1 ns. To study aggregation in a simplified manner, we considered a protocol in which we start with one native capsid, heat it at an elevated T for a certain time t_h, and then reduce T to its TR value of about 0.30–0.35 ε/kB. An example of the outcome is shown in Fig. 1. In this case, t_h is 2000 τ, and heating disrupts about 70% of the inter-protein and 40% of the intra-protein contacts. Cooling the system results in an almost perfect reestablishment of all contacts in about 8000 τ, as analyzed in more detail in ref. 37. A longer t_h results in an imperfect assembly, primarily because some of the proteins leave the company of most other proteins (the problem was considered in infinite space). It should be revisited under confinement and for many capsids. Nevertheless, it seems clear that, in this approach, the whole phenomenon relies on establishing the proper contacts. In our model, the interactions are defined by the native contact map, and it is immaterial whether the recombining proteins originally belong to the same or to different capsomers. We observe that satisfying the contacts leads to comparable formation rates of intra- and inter-capsomer dimers. It is not clear whether a more refined model would lead to similar conclusions. It should be pointed out that, experimentally [43], the CCMV proteins appear to first form dimers, which then combine to make hexameric capsomers; we do not see this effect in our model. Another virus capsid, HK97, appears to form hexameric and pentameric capsomers in single steps [44].

Fig. 1 The snapshot on the left shows the structure of the model CCMV virus capsid (considered without any strands of RNA) obtained by heating the native state of the capsid to a temperature of 0.94 ε/kB for 2000 τ. The snapshot on the right shows the structure obtained after subsequent cooling of the system at TR of 0.3 ε/kB for a time t of 8000 τ. The numbers in the top line indicate the fractions of the interprotein (Qpp) and intraprotein (Qp) native contacts present in the snapshots. The snapshot on the right is essentially native-like

2.2 Self-Assembly of IDPs in a Coarse-Grained Model

We have generalized the structure-based model to the case of IDPs [45] (see also ref. 11). The model is still defined in terms of the dynamics of the α-C atoms. However, instead of inferring the (fixed) contacts from one privileged state, we derive them dynamically from the instantaneous shape of the backbone, making use of the local Frenet system of coordinates. The direction of the β-C atom, and thus of the side chain, is approximately along the negative normal vector. The direction of the hydrogen bond involving either the O or the N backbone atom is approximately given by the binormal vector. Sidechain–sidechain, backbone–backbone, and sidechain–backbone contacts may form provided certain conditions on the local geometry are met [46–48], so that, for instance, the negative normal directions point roughly toward each other instead of away from each other. There are also specific conditions on the distances at which contacts form or break. In addition, there are limits on how many residues a single residue of a given kind can connect to. For instance, cysteine can form just one contact, which could be a disulfide bond, whereas phenylalanine can form up to six. The electrostatic interactions are incorporated within the Debye–Hückel approximation, and the backbone stiffness is included. The full description of the model, as well as benchmarking against experiments and all-atom simulations, is given in ref. 45. It should be noted that the parametrization of the model is valid around TR, because the PDB structures used to determine the characteristic distances were obtained at TR.

We have used this model to determine the phase diagram on the ρ–T plane, where ρ is the number density, for several systems of disordered polypeptides: polyalanines and polyglutamines [49]. Specifically, we have studied A20 and QN, where the sequence length, N, is equal to 20, 40, and 60. ρ is defined as the number of residues per 1 nm³. For each considered point on the ρ–T plane, the simulations involved at least 1800 residues (e.g., 90 chains for N = 20) and 1.5 ms runs. These phase diagrams depend on the system, but they all turned out to have the same topology, illustrated by the cases of Q20 and Q60 in Fig. 2. The topology involves three phases: gas (G), liquid (L), and amyloid glass (AG).
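The local frame used to orient the dynamical contacts, described at the start of this subsection, can be illustrated with a generic discrete construction from three consecutive α-C positions. This is a sketch under assumptions rather than the model's exact definition (see refs. 45–48): the tangent follows the chord from residue i−1 to i+1, the normal follows the discrete second difference (toward the inside of the bend), and the binormal is their cross product. The helper `sidechains_face_each_other` and its `min_cos` threshold are hypothetical stand-ins for the model's angular conditions.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(v):
    norm = math.sqrt(_dot(v, v))
    return tuple(x / norm for x in v)


def local_frame(prev, cur, nxt):
    """Unit (tangent, normal, binormal) at the C-alpha 'cur'.

    tangent  ~ nxt - prev
    normal   ~ (nxt - cur) - (cur - prev), i.e. toward the inside of the bend
    binormal = tangent x normal
    A perfectly straight segment has no normal and raises ZeroDivisionError.
    """
    t = _unit(_sub(nxt, prev))
    curv = _sub(_sub(nxt, cur), _sub(cur, prev))
    # drop any component along t so that n is orthogonal to t
    proj = _dot(curv, t)
    n = _unit(_sub(curv, tuple(proj * x for x in t)))
    return t, n, _cross(t, n)


def sidechains_face_each_other(pos_i, n_i, pos_j, n_j, min_cos=0.0):
    """True if -n_i points toward residue j and -n_j points toward residue i."""
    d_ij = _unit(_sub(pos_j, pos_i))
    toward_j = -_dot(n_i, d_ij)  # cosine between -n_i and the i->j direction
    toward_i = _dot(n_j, d_ij)   # cosine between -n_j and the j->i direction
    return toward_j > min_cos and toward_i > min_cos


# A bent fragment: the middle C-alpha sits above its neighbours, so the
# normal points down (-y) and the side-chain direction -n points up (+y).
t, n, b = local_frame((0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0))
print(t, n, b)
```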
The boundaries between the phases have been determined from the cluster distribution, which specifies the values of P(n), the probabilities of observing clusters of n aggregated chains in a nearly stationary state reached after an equilibration stage of 1 million τ. In an all-atom model, the existence of clusters could be assessed, for instance, from the connectivities generated by the overlap-based contacts. In the more empirical coarse-grained model, we distinguish between the dynamical contacts, which satisfy all of the conditions described above, and the distance-based contacts, for which only the interaction distances are checked. We use the latter contacts only for clustering purposes. The reason is that the directional conditions for the contacts have been derived from the structures of single proteins. Under dense conditions, the model backbones, which lack any true side-chain atoms, get distorted more than in real proteins, and the angle-related conditions are hard to satisfy, so we check only proximity.
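Cluster sizes, and from them an estimate of P(n), follow from the distance-based inter-chain links with a union-find pass over the chains. In this sketch, P(n) is taken to be the fraction of chains that belong to clusters of exactly n chains, averaged over snapshots; that normalization is an assumption, since the text defines P(n) only as the probability of observing clusters of n chains. Under this reading, the G-phase condition on P(1) refers to the share of chains that are monomers.

```python
from collections import Counter


def cluster_sizes(n_chains, linked_pairs):
    """Sizes of the clusters formed by chains 0..n_chains-1 (union-find).

    linked_pairs: iterable of (a, b) chain pairs joined by at least one
    distance-based contact in the snapshot.
    """
    parent = list(range(n_chains))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in linked_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return list(Counter(find(x) for x in range(n_chains)).values())


def cluster_distribution(snapshots, n_chains):
    """Fraction of chains found in clusters of size n, averaged over snapshots."""
    totals = Counter()
    for linked_pairs in snapshots:
        for size in cluster_sizes(n_chains, linked_pairs):
            totals[size] += size  # count chains, not clusters
    norm = n_chains * len(snapshots)
    return {n: count / norm for n, count in sorted(totals.items())}


# Four chains, two snapshots: chains 0 and 1 linked, then no links at all.
p = cluster_distribution([[(0, 1)], []], n_chains=4)
print(p)  # {1: 0.75, 2: 0.25}
```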


Fig. 2 The phase diagrams on the ρ–T plane for Q20 and Q60. The black squares delimit the G phase and the blue circles the L phase. The open blue circles correspond to the I line. The red line connecting the red stars delimits the low-T amyloid glass phase with anisotropic directional correlations between the chains. The sizes of the data points correspond to the uncertainties in the locations of the points. The dashed horizontal lines show the approximate location of TR

The G phase is defined by the condition P(1) ≥ 0.5, that is, the system consists mostly of monomers. The L phase corresponds to P(nmax) ≥ 0.95, where nmax is the size of the largest cluster available at the ρ considered. In the L phase, nmax is typically equal to the total number of chains or slightly smaller, since minor evaporation is possible. We also define the intermediate line I, corresponding to P(nmax) = 0.5. The density range between G and L corresponds to the coexistence region. However, multiple larger clusters, which we identify with droplets, are more likely to appear between the I and L lines. We cannot determine the critical point, above which there is no distinction between G and L, from P(nmax) alone. However, on lowering T, we encounter the AG phase, in which the values of P(n) depend strongly on the history of the simulations and in which one observes local structural patterns with the backbones tending to align in a parallel or antiparallel fashion. This indicates that the proteins in the low-T region do not behave like quasi-spherical particles but form platelet-like anisotropic objects. These platelets may represent randomly oriented fragments of amyloid fibrils, which is consistent with the finding of Sharma et al. [50] and may relate to the experiments on exon1 proteins (with “bright” and “dim” droplets) [51, 52]. It should be noted that the AG phase does not appear in the isotropic coarse-grained model of Dignon et al. [53–55], in which the residue–residue interactions depend only on the distance between the residues. However, the model of Dignon et al. is parametrized only by a hydropathy scale, which is considered to be T-independent, and it allows the critical point to be determined by a method that does not involve the cluster distribution.

We now focus on the distribution of the parameter Y and ask how many aggregation events correspond to situations that last for a time ta. Log–log plots of the dependence of Y on ta are shown in Figs. 3 and 4 for Q20 and Q60, respectively. The top panels correspond to T within the AG phase and the bottom panels to T = TR. For each of these cases, we consider two values of ρ: one near the I line and the other near the L line. In analogy to the result obtained by all-atom simulations for α-synuclein [56] (Fig. 9), we observe many short-lasting events and few long-lasting events. The statistics of the latter are too poor to infer their functional form, so we focus on the region with good statistics. Figures 3 and 4 suggest that the distributions in this region correspond to a power law

$$Y = a(\rho)\,\left(t_a/\tau\right)^{-b}$$

The results for the four systems studied are summarized in Fig. 5. They were obtained from 500,000 τ after equilibration. At TR, the exponent b depends on the system but not on ρ. In the AG phase, on the other hand, b is equal to 2 for each system studied.
Thus the transition to the AG phase is signaled not only by a maximum in the specific heat and a nonzero value of an order parameter related to the glassy directionality [49] but also by the switch in the value of b. G, and thus the amplitude a(ρ), is observed to increase linearly with ρ for each system and for each T. This is consistent with the expectation that associations become more likely as ρ increases; therefore, G is a good measure of the propensity to aggregate. Interestingly, each system has a different slope of the dependence on ρ in the region between the I and L lines. Our analysis has been based on association events assessed by the distance-based contacts. When we use the contacts with full attributes, the results are consistent with the simplified approach, but they are noisier and require larger statistics.
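The exponent b can be estimated by a straight-line fit of ln Y against ln(ta/τ) within the window where the statistics are good. The text does not say how the fits were done; the ordinary least-squares fit in log–log space below is one common choice (maximum-likelihood estimators are an alternative that avoids some biases of binned log–log fits), and the choice of window is left to the user.

```python
import math


def fit_power_law(durations, counts, tau=1.0):
    """Least-squares fit of ln Y = ln a - b ln(t_a / tau).

    durations: event durations t_a restricted to the power-law window.
    counts: number of events Y observed at each duration (all > 0).
    Returns (a, b).
    """
    xs = [math.log(t / tau) for t in durations]
    ys = [math.log(y) for y in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.exp(intercept), -slope  # the fitted slope is -b


# Synthetic data that follow Y = 1000 (t_a / tau)^-2 exactly.
durations = [1, 2, 4, 8, 16]
counts = [1000.0 * t ** -2 for t in durations]
a, b = fit_power_law(durations, counts)
print(round(a, 6), round(b, 6))  # 1000.0 2.0
```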


Fig. 3 The distribution of ln(Y) as a function of ln(ta/τ) for Q20 at the two temperatures and two densities as indicated. The dotted line shows the slope in the power law regime

Fig. 4 Similar to Fig. 3 but for Q60


Fig. 5 Summary of the aggregation data for the four systems studied and as a function of ρ. The top panel is for the power law exponent b for the two indicated temperatures. The bottom panel shows the ρ dependence of G for the same two temperatures

2.3 Self-Assembly of IDPs in an All-Atom Model

We now consider all-atom simulations. We have argued [56] that the properties of α-synuclein, and perhaps of all other IDPs, depend strongly on whether the solvent is explicit or implicit. For structured proteins, this dependence is much weaker because their behavior is dominated by the hydrophobic residues. Nevertheless, due to computational limitations, we could study aggregation only with the implicit solvent, for which the conformations tend to be less extended. Specifically, we use the NAMD code version CVS-2013-11-07 for Linux-x86_64-MPI [57] with the CHARMM36m force field [58], which has been designed for IDPs. The snapshots of the conformations were rendered with the VMD package [59]. The simulations were performed with the Generalized Born Implicit Solvent method [60]. We implemented periodic boundary conditions with a cubic box of size lx between 8 and 60 nm, depending on the system. The other details are as described in ref. 56. The two-chain results are based on 50 trajectories of 30 ns each. For larger numbers of chains, we consider five trajectories of 30 ns each. The number of chains is denoted by nc. All simulations are at TR = 300 K.

We start with the association of two chains of α-synuclein with lx = 60 nm and ρ = 0.0013 nm⁻³. Figure 6 shows the top 20 most likely contacts (occurring in single- and multiple-contact events combined) that appear within a single chain and between two chains. We observe that the two sets are quite distinct. In the single-chain case, there are nine important N-terminal contacts, while in the two-chain case there are only three. This demonstrates that the connectivity-limited single-chain dynamics and the two-chain dynamics are distinct when viewed from the perspective of the contacts involved. A snapshot of a two-chain association with one N-terminal contact (4–67) is shown in Fig. 7.

Fig. 6 Establishment of contacts in all-atom simulations within α-synuclein (two sequences shown in blue) and between two chains of this protein (one sequence shown in blue and the other in magenta). The top 20 most frequently occurring contacts are shown as tilted lines connecting the sequences. The thicker the line, the more probable the contact. We show contacts between locations i and j, where i < j; for simplicity, we assume symmetry between i and j. In the single-chain dynamics, 9454 contacts have been identified, and the contacts shown are: 4–7 (rank 1), 27–30 (rank 2), 102–140 (3), 19–22 (4), 22–25 (5), 76–140 (6), 118–140 (7), 5–8 (8), 128–140 (9), 18–21 (10), 11–14 (11), 96–140 (12), 33–36 (13), 93–140 (14), 53–140 (15), 8–11 (16), 4–8 (17), 69–140 (18), 96–99 (19), and 121–140 (20). In the two-chain dynamics, 8596 contacts have been identified, and the contacts shown are: 72–73 (rank 1), 65–89 (rank 2), 4–67 (3), 51–111 (4), 49–111 (5), 48–111 (6), 64–86 (7), 71–73 (8), 73–73 (9), 66–87 (10), 8–68 (11), 49–110 (12), 78–87 (13), 4–66 (14), 50–111 (15), 65–87 (16), 71–75 (17), 72–74 (18), 71–74 (19), and 66–81 (20)

Fig. 7 An example of a two-chain associated state of α-synuclein. One chain is shown in blue and the other in green. The interfacial residues are shown as spherical beads, in blue and green, respectively. Other residues are not highlighted. There are 40 inter-chain contacts in the snapshot. Seven of these contacts, marked as red lines, belong to the top ten most probable contacts. These are: 65–89 (rank 2), 4–67 (3), 51–111 (4), 49–111 (5), 48–111 (6), 64–86 (7), and 66–87 (10) (see Fig. 6)

Another example is the 273–284 segment of protein tau, denoted here as tau(273–284), which was studied by Larini et al. [61] by mass spectrometry and by replica exchange molecular dynamics simulations. Both methods indicated that the WT segment is stabilized by a salt bridge between residues K280 and D283. The deletion ΔK280 removes this bridge, thereby enhancing the population of more extended conformations and increasing the aggregation rate. In the two-chain simulations, we take lx of 8 nm (ρ of 0.047 nm⁻³ and 0.043 nm⁻³ for the WT and ΔK280 cases, respectively). The corresponding contact-connectivity diagram is shown in Fig. 8. It illustrates the difference between association through just one contact (the top panels) and through an arbitrary number of contacts; the two look quite distinct. In addition, the deletion introduces further differences in each situation.

Fig. 8 Establishment of contacts in tau(273–284) and between two chains of this protein, in the WT and ΔK280 versions, as indicated. One sequence is shown in blue and the other in magenta. The thicker the line, the more likely the emergence of the contact. The two top panels correspond to events in which the connection is established by only one contact. In the WT case, the most probable contacts are 274–283 (rank 1), 280–284 (2), 281–284 (3), 280–283 (4), 274–284 (5), and 275–284 (6). In the ΔK280 case: 281–284 (1), 274–283 (2), 281–283 (3), 277–278 (4), 274–284 (5), and 278–284 (6). The two bottom panels indicate the most frequent contacts regardless of the number of contacts present. In the WT case, these are: 276–278 (rank 1), 276–277 (2), 275–277 (3), 277–278 (4), 278–278 (5), 276–279 (6), 277–279 (7), and 275–279 (8). In the ΔK280 case, these are: 277–278 (rank 1), 276–277 (2), 275–277 (3), 276–278 (4), 278–279 (5), 275–276 (6), 277–277 (7), and 275–278 (8)

Figure 9 shows log–log plots of the dependence of Y on ta. For both α-synuclein and the short segments of protein tau, the power-law exponent b is about 1.6 (at TR). This value is not sensitive to nc, as evidenced at least for tau(273–284), for which 2, 10, and 20 chains are considered. For nc = 10 and 20, lx is 10 nm and 17.5 nm, respectively. The corresponding values of ρ are 0.12 and 0.045 nm⁻³ (and slightly less for ΔK280). Systematic studies as a function of ρ are computationally demanding. However, in each case, we can assess the effect of replacing WT by ΔK280: the deletion enhances the ability to aggregate. For nc = 20, G increases from 261.3 to 287.5, i.e., by about 10%. For nc = 10, G increases from 204.4 to 321.0, i.e., by 57%. For nc = 2, G increases from 2009.5 to 3012.5, i.e., by 50%.

Fig. 9 The distribution of ln(Y) as a function of ln(ta/τ0), where τ0 is the fundamental time scale used in the all-atom molecular dynamics simulations, of order 1 ps. The top panel is for WT α-synuclein. The middle and bottom panels are for different numbers of chains of tau(273–284). The dotted lines show the slope in the power-law regime. The bottom panel is for two chains of tau(273–284); the solid black line corresponds to WT and the blue line to ΔK280. The inset shows the same data normalized by the total number of events in each case. The middle panel is for ten chains (the blue line corresponds to WT and the green line to the sequences with the deletion) and 20 chains (the red line is for WT and the magenta line for ΔK280)
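The densities and percentage changes quoted for the all-atom systems follow from simple arithmetic, with ρ counted as residues per nm³ in a cubic box of edge lx. The sketch below recomputes them from the numbers given in the text, taking 140 residues for α-synuclein, 12 for tau(273–284), and 11 after the ΔK280 deletion.

```python
def residue_density(n_chains, chain_length, box_nm):
    """Residues per nm^3 in a cubic periodic box of edge box_nm."""
    return n_chains * chain_length / box_nm ** 3


def relative_increase(g_wt, g_mut):
    """Percentage change of G on going from WT to the mutant."""
    return 100.0 * (g_mut - g_wt) / g_wt


# (label, number of chains, residues per chain, box edge in nm)
systems = [("alpha-syn, 2 chains", 2, 140, 60.0),
           ("tau WT, 2 chains", 2, 12, 8.0),
           ("tau dK280, 2 chains", 2, 11, 8.0),
           ("tau WT, 10 chains", 10, 12, 10.0),
           ("tau WT, 20 chains", 20, 12, 17.5)]
for label, nc, n_res, box in systems:
    print(f"{label}: rho = {residue_density(nc, n_res, box):.4f} nm^-3")

# (number of chains, G for WT, G for dK280) as quoted in the text
for nc, g_wt, g_mut in [(20, 261.3, 287.5), (10, 204.4, 321.0), (2, 2009.5, 3012.5)]:
    print(f"nc = {nc}: G increases by {relative_increase(g_wt, g_mut):.1f}%")
```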


Marek Cieplak et al.

In summary, we have demonstrated that the contact perspective provides a quantitative description of the aggregation processes of proteins. In particular, the number of association events provides a quantitative measure of the propensity to aggregate. The results obtained with this measure are consistent with the findings of Larini et al. [61]. It would be worthwhile to study the dependence of this measure on the density and on the duration of the simulations.
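The association events and the power-law exponent b discussed in this section can be illustrated with a minimal Python sketch. It is an illustration, not the authors' exact protocol, and it rests on several assumptions:

- each trajectory frame yields a Boolean stating whether two chains share at least one contact;
- an association event is a maximal run of such frames, and ta is its duration in frames (in units of τ0 if frames are τ0 apart);
- Y(ta) is the number of events of that duration.

Under these assumptions, b is minus the least-squares slope of ln Y against ln ta.

```python
import math
from collections import Counter


def association_events(in_contact):
    """Durations (in frames) of association events.

    in_contact[i] is True when the two chains share at least one contact in
    frame i. An event is a maximal run of True frames (assumed definition).
    A run still open at the last frame is discarded as incomplete.
    """
    durations = []
    start = None
    for frame, bound in enumerate(in_contact):
        if bound and start is None:
            start = frame                      # contact established
        elif not bound and start is not None:
            durations.append(frame - start)    # contact lost
            start = None
    return durations


def fit_loglog_slope(xs, ys):
    """Ordinary least-squares slope of ln(y) against ln(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx


def power_law_exponent(durations, t_min=1, t_max=None):
    """Estimate b in Y(ta) ~ ta**(-b) from a histogram of event durations.

    Only durations in [t_min, t_max] (the power-law regime) enter the fit;
    at least two distinct durations are required.
    """
    counts = Counter(d for d in durations
                     if d >= t_min and (t_max is None or d <= t_max))
    ts = sorted(counts)
    return -fit_loglog_slope(ts, [counts[t] for t in ts])
```

In practice the fit would be restricted to the power-law regime through `t_min` and `t_max`, and logarithmic binning is usually preferable when the tail of the distribution is sparse. Both choices are left open in this sketch.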

Acknowledgments

We appreciate discussions with M. Carrion-Vazquez. This research has received support from the National Science Centre (NCN), Poland, under grant No. 2018/31/B/NZ1/00047. This project is a part of the European COST Action EUTOPIA.

References

1. Ahnert SE, Marsh JA, Hernandez H, Robinson CV, Teichmann SA (2015) Principles of assembly reveal a periodic table of protein complexes. Science 350:aaa2245
2. Uversky VN (2002) Natively unfolded proteins: a point where biology waits for physics. Protein Sci 11:739–756
3. Dyson HJ, Wright PE (2005) Intrinsically unstructured proteins and their functions. Nat Rev Mol Cell Biol 6:197–208
4. Fink AL (2005) Natively unfolded proteins. Curr Opin Struct Biol 15:35–41
5. Dunker AK, Silman I, Uversky VN, Sussman JL (2008) Function and structure of inherently disordered proteins. Curr Opin Struct Biol 18:756–764
6. Uversky VN, Dunker AK (2010) Understanding protein non-folding. Biochim Biophys Acta 1804:1231–1264
7. Ferreon ACM, Moran CR, Gambin Y, Deniz AA (2010) Single-molecule fluorescence studies of intrinsically disordered proteins. Methods Enzymol 472:179–204
8. Babu MM, van der Lee R, de Groot NS, Gsponer J (2011) Intrinsically disordered proteins: regulation and disease. Curr Opin Struct Biol 21:432–440
9. Wright PE, Dyson HJ (2015) Intrinsically disordered proteins in cellular signalling and regulation. Nat Rev Mol Cell Biol 16:18–29
10. Banani SF, Lee HO, Hyman AA, Rosen MK (2017) Biomolecular condensates: organizers of cellular biochemistry. Nat Rev Mol Cell Biol 18:285–298
11. de Aquino BRH, Chwastyk M, Mioduszewski Ł, Cieplak M (2020) The networks of the inter-basin traffic in intrinsically disordered proteins. Phys Rev Res 2:013242
12. Wu C, Shea JE (2011) Coarse-grained models for protein aggregation. Curr Opin Struct Biol 21:209–220
13. Nguyen P, Derreumaux P (2014) Understanding amyloid fibril nucleation and Aβ oligomer/drug interactions from computer simulations. Acc Chem Res 47:603–611
14. Knowles TP, Vendruscolo M, Dobson CM (2014) The amyloid state and its association with protein misfolding diseases. Nat Rev Mol Cell Biol 15:384–396
15. Ranganathan S, Maji SK, Padinhateeri R (2016) Defining a physical basis for diversity in protein self assemblies using a minimal model. J Am Chem Soc 138:13911–13922
16. Stefani M, Dobson CM (2003) Protein aggregation and aggregate toxicity: new insights into protein folding, misfolding diseases and biological evolution. J Mol Med 81:678–699
17. Chuang E, Hori AM, Hesketh CD, Shorter J (2018) Amyloid assembly and disassembly. J Cell Sci 131:189928
18. Booth DR, Sunde M, Bellotti V, Robinson CV, Hutchinson WL, Fraser PE, Hawkins PN, Dobson CM, Radford SE, Blake CCF, Pepys MB (1997) Instability, unfolding and aggregation of human lysozyme variants underlying amyloid fibrillogenesis. Nature 385:787–793
19. Pastore A, Raimondi F, Rajendran LP, Temussi PA (2020) Why does the Aβ peptide of Alzheimer share structural similarity with antimicrobial peptides? Commun Biol 3:135
20. Brangwynne CP, Eckmann CR, Courson DS, Rybarska A, Hoege C, Gharkhani J, Juelicher F, Hyman AA (2009) Germline P granules are liquid droplets that localize by controlled dissolution/condensation. Science 324:1729–1732
21. Brangwynne CP, Mitchison TJ, Hyman AA (2011) Active liquid-like behavior of nucleoli determines their size and shape in Xenopus laevis oocytes. Proc Natl Acad Sci U S A 108:4334–4339
22. Caragine CM, Haley SC, Zidovska A (2018) Surface fluctuation and coalescence of nucleolar droplets in the human cell nucleus. Phys Rev Lett 121:148101
23. Sułkowska JI, Cieplak M (2007) Mechanical stretching of proteins – a theoretical survey of the Protein Data Bank. J Phys Condens Matter 19:283201
24. Sikora M, Sułkowska JI, Cieplak M (2009) Mechanical strength of 17 134 model proteins and cysteine slipknots. PLoS Comput Biol 5:e1000547
25. Settanni G, Hoang TX, Micheletti C, Maritan A (2002) Folding pathways of prion and doppel. Biophys J 83:3533–3541
26. Tsai J, Taylor R, Chothia C, Gerstein M (1999) The packing density in proteins: standard radii and volumes. J Mol Biol 290:253–266
27. Wołek K, Gómez-Sicilia À, Cieplak M (2015) Determination of contact maps in proteins: a combination of structural and chemical approaches. J Chem Phys 143:243105
28. Bancroft JB, Hiebert E, Rees MW, Markham R (1968) Properties of cowpea chlorotic mottle virus, its protein and nucleic acid. Virology 34:224–239
29. Konecny R, Trylska J, Tama F, Zhang D, Baker NA, Brooks CL III, McCammon JA (2006) Electrostatic properties of cowpea chlorotic mottle virus and cucumber mosaic virus capsids. Biopolymers 82:106–120
30. Michel JP, Ivanovska IL, Gibbons MM, Klug WS, Knobler CM, Wuite GJL, Schmidt CF (2006) Nanoindentation studies of full and empty viral capsids and the effects of capsid protein mutations on elasticity and strength. Proc Natl Acad Sci U S A 103:6184–6189
31. Roos WH, Bruinsma R, Wuite GJL (2010) Physical virology. Nat Phys 6:733–743
32. Fraenkel-Conrat H, Williams RC (1955) Reconstitution of active tobacco mosaic virus from its inactive protein and nucleic acid components. Proc Natl Acad Sci U S A 41:690–698
33. Endres D, Zlotnick A (2002) Model-based analysis of assembly kinetics for virus capsids or other spherical polymers. Biophys J 83:1217–1230
34. Johnston IG, Louis AA, Doye JPK (2010) Modelling the self-assembly of virus capsids. J Phys Condens Matter 22:104101
35. Elrad OM, Hagan MF (2008) Mechanisms of size control and polymorphism in viral capsid assembly. Nano Lett 8:3850–3857
36. Elrad OM, Hagan MF (2010) Encapsulation of a polymer by an icosahedral virus. Phys Biol 7:045003
37. Wołek K, Cieplak M (2017) Self-assembly of model proteins into virus capsids. J Phys Condens Matter 29:4745003
38. Cieplak M, Robbins MO (2010) Nanoindentation of virus capsids in a molecular model. J Chem Phys 132:015101
39. Cieplak M, Robbins MO (2013) Nanoindentation of 35 virus capsids in a molecular model: relating mechanical properties to structure. PLoS One 8:e63630
40. Cieplak M (2014) Mechanostability of virus capsids and their proteins in structure-based models. In: Liwo A (ed) Computational methods to study the structure and dynamics of biomolecules and biomolecular processes from bioinformatics to molecular quantum mechanics. Springer, Heidelberg, pp 295–315
41. Gibbons MM, Klug WS (2008) Influence of nonuniform geometry on nanoindentation of viral capsids. Biophys J 95:3640–3649
42. Poma AB, Chwastyk M, Cieplak M (2015) Polysaccharide-protein complexes in a coarse-grained model. J Phys Chem B 119:12028–12041
43. Zlotnick A, Aldrich R, Johnson JM, Ceres P, Young MJ (2000) Mechanism of capsid assembly for an icosahedral plant virus. Virology 277:450–456
44. Xie Z, Hendrix RW (1995) Assembly in vitro of bacteriophage HK97 proheads. J Mol Biol 253:74–85
45. Mioduszewski Ł, Cieplak M (2018) Disordered peptide chains in an α-C-based coarse-grained model. Phys Chem Chem Phys 20:19057–19070
46. Buchete N-V, Straub JE, Thirumalai D (2009) On the development of coarse-grained protein models: importance of relative side-chain orientations and backbone interactions. In: Voth GA (ed) Coarse-graining of condensed phase and biomolecular systems. CRC Press, Boca Raton, FL, pp 141–156 (Chapter 10)
47. Hung NB, Le D-M, Hoang TX (2017) Sequence dependent aggregation of peptides and fibril formation. J Chem Phys 147:105102
48. Enciso M, Rey A (2010) A refined hydrogen bond potential for flexible protein models. J Chem Phys 132:235102
49. Mioduszewski Ł, Cieplak M (2020) Protein droplets in systems of disordered homopeptides and the amyloid glass phase. Phys Chem Chem Phys 22:15592
50. Sharma D, Shinchuk LM, Inouye H, Wetzel R, Kirschner DA (2005) Polyglutamine homopolymers having 8–45 residues form slablike β-crystallite assemblies. Proteins 61:398–411
51. Peskett TR, Rau F, O’Driscoll J, Patani R, Lowe AR, Saibil HR (2018) A liquid to solid phase transition underlying pathological huntingtin exon1 aggregation. Mol Cell 70:588–601
52. Elbaum-Garfinkle S (2019) Matter over mind: liquid phase separation and neurodegeneration. JBC Rev 294:7160–7168
53. Dignon GL, Zheng W, Kim YC, Best RB, Mittal J (2018) Sequence determinants of protein phase behavior from a coarse-grained model. PLoS Comput Biol 14:e1005941
54. Dignon GL, Zheng W, Best RB, Kim YC, Mittal J (2018) Relation between single-molecule properties and phase behavior of intrinsically disordered proteins. Proc Natl Acad Sci U S A 115:9929–9934
55. Dignon GL, Zheng W, Kim YC, Mittal J (2019) Temperature-controlled liquid-liquid phase separation of disordered proteins. ACS Central Sci 5:821–830
56. Chwastyk M, Cieplak M (2020) Conformational biases of α-synuclein and formation of transient knots. J Phys Chem B 124:11–19
57. Phillips JC, Braun R, Wang W, Gumbart J, Tajkhorshid E, Villa E, Chipot C, Skeel RD, Kale L, Schulten K (2005) Scalable molecular dynamics with NAMD. J Comput Chem 26:1781–1802
58. Huang J, Rauscher S, Nawrocki G, Ran T, Feig M, de Groot BL, Grubmueller H, MacKerell AD Jr (2017) CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat Methods 14:71–73
59. Humphrey W, Dalke A, Schulten K (1996) VMD: visual molecular dynamics. J Mol Graph 14:33–38
60. Tanner DE, Chan K-Y, Phillips JC, Schulten K (2011) Parallel generalized born implicit solvent calculations with NAMD. J Chem Theory Comput 7:3635–3642
61. Larini L, Gessel MM, LaPointe NE, Do TD, Bowers MT, Feinstein SC, Shea J-E (2013) Initiation of assembly of tau(273-284) and its ΔK280 mutant: an experimental and computational study. Phys Chem Chem Phys 15:8916–8928

Chapter 7

Molecular Insights into the Effect of Metals on Amyloid Aggregation

Yifat Miller

Abstract

Amyloid diseases such as Alzheimer’s disease (AD), Parkinson’s disease (PD), and type 2 diabetes (T2D) are characterized by the accumulation of misfolded protein species, e.g., oligomers and fibrils. These species form through the self-assembly of the misfolded proteins, a process termed “aggregation.” Essential divalent metal ions are known to initiate the aggregation of these misfolded proteins, and specific concentrations of these ions may be implicated in the pathology of amyloid diseases. This chapter focuses on the effects of Zn²⁺ and Cu²⁺, two of the most common divalent metal ions in the brain; Zn²⁺ is also known to be released from the pancreas. Specifically, the chapter highlights recent computational molecular modeling studies that investigate the effect of metal ion concentration on the aggregation of the misfolded proteins amylin, amyloid β, and α-synuclein. The challenges for computational molecular modeling and future perspectives are discussed.

Key words Metals, Amyloidogenic diseases, Molecular dynamics simulations, Amylin, Amyloid β, α-Synuclein

1 Introduction

The pathological self-assembly (or aggregation) of amyloid proteins into toxic aggregate species plays an important role in amyloidogenic diseases, e.g., Alzheimer’s disease (AD), Parkinson’s disease (PD), and type 2 diabetes (T2D). This chapter focuses on the amyloids that have been implicated in the development of these diseases: amyloid β (Aβ) in AD, α-synuclein (AS) in PD, and amylin in T2D. The monomeric forms of these amyloids are non-toxic and serve a variety of important biological functions, but it is currently held that the polymorphic early-stage oligomers contribute to disease pathology. One of the greatest challenges in investigating these diseases is therefore that the early-stage amyloid oligomers are polymorphic and are produced via a variety of molecular mechanisms. The lack of understanding of the various molecular mechanisms by which polymorphic toxic early-stage amyloid oligomers form is likely the main reason that effective drugs have not been developed for these diseases.

Significant evidence has accumulated linking the development of amyloidogenic diseases with exposure to metal ions (e.g., Al³⁺, Pb²⁺, Hg²⁺, Ni²⁺) or with an imbalance of essential nutrient divalent metal ions (e.g., Cu²⁺, Zn²⁺, Fe²⁺, Mn²⁺, Mg²⁺, Ca²⁺, and Co²⁺) [1]. This chapter focuses on the essential nutrient divalent metal ions, which play important roles in key biological functions, such as enzymatic catalysis (as cofactors for various ribozymes) and cell metabolism [2]. With increasing age and/or under conditions of oxidative stress, the homeostasis of these essential metal ions is disturbed, leading to increased concentrations of free metal ions in the brain [3]. In AD and PD, such free ions are released from brain cells and interact with Aβ (in AD) and AS (in PD) to form metal–amyloid aggregate complexes [4]. It has been claimed that such complexes promote amyloid aggregation and that specific concentrations of these essential divalent metal ions may be implicated in the pathology of neurodegenerative diseases. This chapter focuses on two of the most common divalent metal ions in the brain, Zn²⁺ and Cu²⁺, because they have a relatively high affinity for Aβ and AS. It also covers Zn²⁺ because this metal is released from the pancreas together with insulin and amylin. Specifically, the chapter reviews recent studies that apply molecular modeling tools to investigate the effect of metal ions on amylin, Aβ, and AS. In silico studies face specific challenges in investigating the effect of metal ions on amyloid aggregation [5]. The chapter does not discuss the oxidative pathways related to amyloid aggregation. The challenges in investigating the effect of metals on amyloid aggregation, and future perspectives, are discussed for each type of amyloid.

Mai Suan Li et al. (eds.), Computer Simulations of Aggregation of Proteins and Peptides, Methods in Molecular Biology, vol. 2340, https://doi.org/10.1007/978-1-0716-1546-1_7, © Springer Science+Business Media, LLC, part of Springer Nature 2022

2 Amylin

Amylin, or islet amyloid polypeptide (IAPP), is a 37-residue peptide hormone that is secreted together with insulin and Zn²⁺ ions from the pancreatic islet β-cells [6]. The physiological role of amylin has been suggested to involve a wide range of regulatory functions [7–10]. Amylin peptides self-assemble into aggregates that play an indirect role in the pathology of T2D by causing β-cell dysfunction and death [11–14]. In most T2D cases, amylin was found to be the major component of the amyloid deposits of islet β-cells [15, 16]. Amylin and Zn²⁺ ions together play a crucial role in glycemic regulation, but disruption of the balance between amylin and Zn²⁺ ions may cause amylin aggregation and, consequently, progression of T2D [17, 18]. Zn²⁺ ions may thus bind to amylin peptides and initiate their aggregation. Yet the atomic-resolution molecular mechanisms by which Zn²⁺ ions interact with amylin are not accessible to in vivo studies or in vitro experimental techniques. In silico studies, however, can contribute to understanding the molecular mechanisms by which Zn²⁺ ions bind to amylin and affect its aggregation. Subheading 2.1 describes studies of the specific Zn²⁺-binding sites in amylin aggregates, and Subheading 2.2 discusses the effect of Zn²⁺ concentration on amylin aggregation. Finally, Subheading 2.3 focuses on the effect of Zn²⁺ ions on the conformations of amylin aggregates.

2.1 Zn²⁺ Ion-Binding Sites in Amylin Fibril-Like Oligomers

It has been shown that Zn²⁺ ions in amylin solution initiate the formation of oligomers and fibrils by binding to amylin peptides [19, 20]. To investigate the Zn²⁺-binding sites in amylin fibril-like oligomers, the crucial first step is to study the conformations of these oligomers at atomic resolution in the absence of metal ions. Wineman-Fisher et al. previously demonstrated polymorphic states of amylin fibril-like oligomers that differ in the orientation of the residues along the fibril axis [21, 22]. Because Zn²⁺ ions have been proposed to bind to His18 in the amylin monomer and in aggregates [19, 23], Zn²⁺ ions were bound to His18 in single- and double-layer fibril-like oligomers of amylin [24]. Figure 1 shows four representative polymorphic states of the amylin fibril-like oligomers. In models A1 and A4, the His18 residues are located inside the core domain of the fibril. Simulations of the Zn²⁺–amylin fibril-like oligomers in which His18 is located inside the core domain showed that the Zn²⁺ ions escaped from His18, because steric effects inside the core domain prevent the ions from accommodating the binding site. It was therefore suggested that Zn²⁺ ions selectively bind to specific classes of amylin fibril-like oligomers, namely those in which His18 is located outside the core domain of the fibril. In summary, during the self-assembly and folding processes that form the Zn²⁺–amylin fibril, Zn²⁺ ions bind only when the His18 residues are exposed to the solution. The number of polymorphic states of amylin fibril-like oligomers is therefore reduced when Zn²⁺ ions are added.

An additional Zn²⁺-binding site, which so far has not been tested in vitro, was proposed by an in silico study [25]. The cysteine residues Cys2 and Cys7 of amylin are potential Zn²⁺-binding residues, but they form a disulfide bond and therefore cannot bind metal ions. T2D, like other amyloidogenic diseases such as AD and PD, is related to oxidative stress [26–28]. Cysteine residues are redox-active, and in a wide range of biological systems disulfide bonds are used to regulate oxidative stress: binding of Zn²⁺ ions to the cysteine residues causes disulfide bonds to form or break [29, 30]. Cysteines function as redox regulators and act as “redox switches” with a high tendency to bind and release Zn²⁺ ions in redox reactions [29, 30]. Furthermore, in biological systems involved in redox reactions, cysteine residues tend to bind metal ions in Cys-M-Cys motifs [29, 31]. To investigate this new binding site, the disulfide bonds in amylin fibril-like oligomers were first removed, and the effect on the structural features was investigated [32]. Interestingly, removing the disulfide bonds did not affect the structural features of the cross-β structure of the fibrils. Zn²⁺ ions were then bound to the cysteine residues. Three possible binding sites were considered (Fig. 2): (1) Cys2 of one peptide and Cys2 of a neighboring peptide, (2) Cys7 of one peptide and Cys7 of a neighboring peptide, and (3) Cys2 and Cys7 of one peptide and Cys2 and Cys7 of a neighboring peptide. The Zn²⁺ ions were shown to prefer interacting with four Cys residues rather than with two [25].

Fig. 1 Representation of the molecular models of fibril-like structures of amylin (A1–A4) that differ in the orientations of the residues along the β-arch. Residue colors: Arg11, red; His18, blue; Asn19, purple; and Asn31, green

Fig. 2 Three new possible Zn²⁺-binding sites (in addition to His18), suggested by molecular modeling tools, in the case where the disulfide bond between Cys2 and Cys7 in amylin is removed

2.2 The Effect of the Concentrations of Zn²⁺ Ions on Amylin Aggregation

Amylin fibril-like oligomers exhibit polymorphic states with similar conformational energies and populations [21, 22]. Zn²⁺ ions cannot bind to all polymorphic amylin fibril-like oligomers, only to those in which the His18 residues are located outside the core domain, so Zn²⁺ ions decrease the polymorphism of amylin fibril-like oligomers. It is therefore of great interest to investigate whether the Zn²⁺ concentration affects the polymorphism of amylin aggregates. Molecular modeling has shown that at low Zn²⁺ concentration the polymorphism is decreased, whereas at high Zn²⁺ concentration it is increased [24]. The effect of Zn²⁺ concentration on the polymorphism of amylin aggregates has not yet been investigated in vitro, and future experimental studies are needed to confirm the in silico findings.
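The argument above can be made concrete with a toy Boltzmann model. It links similar conformational energies to similar populations, and it shows how selective Zn²⁺ binding reduces the number of populated polymorphs. The energies are placeholders and the temperature of 310 K is an assumption. The model is not the scoring procedure of refs. 21, 22, or 24. The labels A1–A4 follow Fig. 1, where His18 is buried in A1 and A4.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)


def boltzmann_populations(energies_kcal, temperature_k=310.0):
    """Relative populations p_i proportional to exp(-E_i / RT)."""
    e_min = min(energies_kcal)  # shift energies for numerical stability
    rt = R_KCAL * temperature_k
    weights = [math.exp(-(e - e_min) / rt) for e in energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]


def restrict(populations, labels, allowed):
    """Renormalize populations over the states whose label is in `allowed`."""
    kept = {lab: p for lab, p in zip(labels, populations) if lab in allowed}
    total = sum(kept.values())
    return {lab: p / total for lab, p in kept.items()}


# Four polymorphs with (assumed) equal energies; only A2 and A3 are taken
# as Zn2+-compatible here because His18 is buried in A1 and A4 (Fig. 1).
labels = ["A1", "A2", "A3", "A4"]
pops = boltzmann_populations([0.0, 0.0, 0.0, 0.0])
print(restrict(pops, labels, allowed={"A2", "A3"}))
```

With equal energies each polymorph starts at a population of 0.25. Restricting the ensemble to the Zn²⁺-compatible subset leaves two states at 0.5 each, which is the sense in which selective binding "decreases polymorphism" in this toy picture.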

2.3 The Effect of Zn²⁺ Ions on the Conformations of Amylin Fibril-Like Oligomers

Studying the structural properties of amylin aggregates in the absence and presence of Zn²⁺ ions is vital for understanding the effect of Zn²⁺ ions on amylin aggregation, and in particular for determining whether Zn²⁺ ions affect the mechanisms of fibrillation. To this end, the structural properties of the amylin cross-β structure in the absence and presence of Zn²⁺ ions were investigated in silico [24]. Figure 3 shows three amylin fibril-like oligomers that were investigated by molecular dynamics simulations in the absence of Zn²⁺ ions (models M2, M2d, and M5 [21]) and five amylin fibril-like oligomers in the presence of Zn²⁺ ions (models R1, R2, R3, S1, and S3 [24]). Models R1 and R3 are equivalent to model M2, model R2 is equivalent to model M2d, and models S1 and S3 are equivalent to model M5. To characterize the structural properties of these oligomers, the secondary structure of the simulated models was analyzed. According to an analysis of the backbone dihedral angles φ and ψ, the secondary structure of the Zn²⁺–amylin fibril-like oligomers is similar to that of the equivalent amylin fibril-like oligomers (Fig. 4). The two β-strand domains (residues 8–17 and 26–37) were conserved in the Zn²⁺–amylin fibril-like oligomers, as in the equivalent amylin models. It was therefore proposed that Zn²⁺ ions do not change the formation of stable cross-β fibril structures of amylin. Interestingly, a previous experimental study showed that amylin peptides self-assemble into fibrillar structures, i.e., cross-β structures, both in the absence and in the presence of Zn²⁺ ions, and that the structural features are similar [20]. The in silico study thus supports this experimental observation, showing at atomic resolution that Zn²⁺ ions do not affect the fibrillation of amylin.

Fig. 3 Three amylin fibril-like oligomers in the absence of Zn²⁺ ions (models M2, M2d, and M5) and five amylin fibril-like oligomers in the presence of Zn²⁺ ions (models R1, R2, R3, S1, and S3). Models M2, R1, R2, and R3 present a similar fibrillary state; models M5, S1, and S3 present a similar fibrillary state
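The φ/ψ-based secondary-structure assignment mentioned above can be sketched as follows. The dihedral function uses the standard atan2 formula with the IUPAC sign convention. The rectangular β-region is a deliberately crude, hypothetical criterion; real analyses typically rely on tools such as DSSP or STRIDE. It is not necessarily the criterion used in ref. 24.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points (IUPAC sign convention).

    phi uses atoms C(i-1), N(i), CA(i), C(i); psi uses N(i), CA(i), C(i), N(i+1).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.degrees(math.atan2(y, x))


def in_beta_region(phi, psi):
    """Hypothetical rectangular beta-strand region of the Ramachandran plot."""
    return -180.0 <= phi <= -45.0 and (psi >= 90.0 or psi <= -150.0)


def strand_string(phi_psi):
    """'E' for residues whose (phi, psi) fall in the beta region, '-' otherwise."""
    return "".join("E" if in_beta_region(phi, psi) else "-" for phi, psi in phi_psi)
```

Applied frame by frame and averaged over a trajectory, such a per-residue string gives a crude β-propensity profile along the sequence. This is the kind of quantity used to check that residues 8–17 and 26–37 remain strands.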

Fig. 4 The secondary structure along the amylin sequence is shown for three groups of simulated models: the amylin oligomer models M2, M2d, and M5 (Fig. 3); the Zn²⁺–amylin oligomer models with a Zn²⁺:amylin ratio of 1:2 (R1, R2, and S1); and the Zn²⁺–amylin oligomer models with a Zn²⁺:amylin ratio of 1:4 (R3 and S3). The arrows indicate β-strand structure. (Reproduced from ref. 24 with permission from the PCCP Owner Societies)

3 Amyloid β

AD is the most common progressive neurodegenerative brain disorder in humans. The two pathological hallmarks observed in the brains of AD patients are senile plaques and neurofibrillary tangles. The senile plaques are composed of Aβ peptides, whereas the neurofibrillary tangles consist of tau protein. Both Aβ and tau self-assemble into aggregates, e.g., oligomers and fibrils. Tau, a 441-residue intrinsically disordered protein, stabilizes microtubules and promotes their self-assembly in neurons. Aβ is a peptide of 37–43 amino acids derived from the transmembrane amyloid precursor protein. With increasing age and/or oxidative stress, the blood–brain barrier (BBB), which tightly regulates the homeostasis of Zn²⁺ and Cu²⁺ ions in the brain [3], becomes impaired, allowing these metal ions to be transported into and out of the brain more easily [3, 33, 34]. The resulting metal ion dyshomeostasis usually leads to high levels of free metal ions in the extracellular matrix of brain neurons [35]. In AD, such free ions are released from brain cells and interact with Aβ and tau/p-tau to form metal–amyloid oligomer complexes. It has been claimed that such complexes promote amyloid aggregation and that specific concentrations of these essential divalent metal ions may be implicated in the pathology of AD [35]. A better understanding of the aggregation mechanisms in the presence and absence of metal ions will contribute to a unified picture of AD etiology and thereby facilitate the development of effective diagnostics and therapeutics. This chapter focuses mainly on in silico studies that investigated the effect of metals binding to Aβ aggregates rather than to Aβ monomers. Experimental studies have provided information on metal-binding sites in the Aβ monomer, but no experimental data on metal-binding sites in Aβ aggregates are available so far. Subheading 3.1 details in silico studies of the specific metal-binding sites in Aβ aggregates. Subheading 3.2 discusses the effect of metal ion concentration on Aβ aggregation and polymorphism.

3.1 Metal-Binding Sites in Aβ Aggregates

The Cu²⁺-binding sites in the Aβ monomer have been reviewed [36] and are summarized in Table 1. The Cu²⁺-binding site in a disordered Aβ dimer has been investigated by molecular dynamics simulations [37, 38], in which the Cu²⁺ was set to coordinate to the N and O atoms of Asp1, the Nδ atom of His6, and the Nε atom of His13. A further in silico study examined the Cu²⁺-binding site in the Aβ dimer using Hamiltonian replica exchange molecular dynamics (H-REMD) simulations [39]. In that study, the Cu²⁺ ion was coordinated by a pair of His13 and His14 residues from the two Aβ1–42 peptides. In the fibrillary state, several Cu²⁺-binding sites were examined by molecular dynamics simulations (Fig. 5) [40]. The simulations showed that in some of the fibrillary states, Cu²⁺ binding does not introduce major conformational changes except in the N- and C-terminal domains, where binding is likely to take place. In such cases, the C-terminus of Val40 participates in coordinating the Cu²⁺ ions. The Zn²⁺-binding sites in the Aβ monomer have been detailed previously [36] and are summarized in Table 2. So far, the Zn²⁺-binding sites in Aβ oligomers have been investigated neither in silico nor in vitro. However, various Zn²⁺-binding sites in the fibrillary state of Aβ have been extensively investigated in 12 models by molecular dynamics simulations [41], and these are summarized in Table 3. These simulations examined possible arrangements of the Zn²⁺–Aβ1–16 β-strands for the intrapeptide coordination mode within the fibrillary state. They revealed that two nearby Zn²⁺–Aβ1–16 peptides form complementary shapes with a tightly packed geometry. Thus, Zn²⁺ can coordinate both intra- and intermolecularly, bridging two peptides and forming a hydrophobic cluster in the N-terminal domains, which consequently promotes aggregation.
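Several simulations described in this chapter start from an ion coordinated to chosen atoms and then ask whether it stays bound, for example Zn²⁺ escaping from His18 or Cu²⁺ escaping from His50. A minimal distance-based check is sketched below. The 2.6 Å cutoff, the two-ligand threshold, the coordinates, and the atom labels are illustrative assumptions, not parameters taken from refs. 37–41.

```python
import math


def coordinating_atoms(ion, ligands, cutoff=2.6):
    """Labels of ligand atoms within `cutoff` angstroms of the ion.

    ion: (x, y, z); ligands: dict mapping an atom label to (x, y, z).
    """
    return [label for label, xyz in ligands.items()
            if math.dist(ion, xyz) <= cutoff]


def bound_fraction(frames, cutoff=2.6, min_ligands=2):
    """Fraction of frames in which at least `min_ligands` atoms coordinate the ion.

    frames: list of (ion_xyz, ligands) pairs, one per trajectory frame.
    """
    bound = sum(1 for ion, ligands in frames
                if len(coordinating_atoms(ion, ligands, cutoff)) >= min_ligands)
    return bound / len(frames)


# Hypothetical frame: a Zn2+ bridging His18 NE2 atoms of two neighboring chains.
ligands = {"A:His18:NE2": (2.0, 0.0, 0.0),
           "B:His18:NE2": (0.0, 2.1, 0.0),
           "A:Asn19:OD1": (5.0, 0.0, 0.0)}
frames = [((0.0, 0.0, 0.0), ligands),      # ion sits in the site
          ((10.0, 10.0, 10.0), ligands)]   # ion has drifted away
print(bound_fraction(frames))
```

In this two-frame example the ion is bound in the first frame only, so the reported fraction is 0.5. A bound fraction that decays towards zero over a trajectory is one simple signature of an ion "escaping" from its initial site.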

3.2 The Effect of Zn²⁺ Concentrations on Aβ Aggregation and Polymorphism

It is known that the Aβ energy landscape presents a broad range of aggregated conformational states [42]. Replica exchange molecular dynamics simulations have shown that at high Zn²⁺ concentration, the energy landscape changes remarkably compared with that at lower Zn²⁺ concentration. These results imply that Aβ aggregation does not increase linearly with zinc concentration. At a certain concentration, increasing Zn²⁺ not only prevents fibril formation but also decreases aggregation. Moreover, at high Zn²⁺ concentration, more disordered toxic Aβ oligomers form. The observed change of the aggregation energy landscape with zinc concentration suggests that toxicity may not correlate linearly with ion concentration. It has been suggested that the Zn²⁺ concentration in the human brain is not sufficiently high to promote amyloidosis [43]; however, the in silico study implies that trace amounts of free zinc should not be neglected. Finally, it is important to note that toxicity could also be related to the mechanism: the conformations and solubility of the species could be crucial for Zn²⁺-mediated toxicity. Different zinc concentrations may selectively stabilize different oligomeric forms, and oligomers of different sizes are expected to have different stabilities. This complexity arising from zinc coordination explains the selectivity of zinc in precipitating aggregation intermediates. Amyloids are highly polymorphic, and population shifts redistribute the polymorphic states.

Table 1 Proposed Cu²⁺-binding sites in the Aβ monomer, obtained from experimental data (references are detailed and reviewed in ref. 36)

Segment of Aβ | pH | Cu²⁺-binding site
Rat Aβ1–28 | 7.5 | Asp1, His6, His13
Aβ1–40/Aβ1–42 | 5.5–7.5 | His6, His13, His14, Tyr10
Aβ1–16/Aβ1–28 | 7.4 | Asp1, His6, His13, His14
Aβ1–16/Aβ1–40 | 7.4 | His6, Glu11, His13, His14
Aβ3–40/42 | 6.3–8 | Ala2, His6, His13, His14

Fig. 5 Constructed models of various Cu²⁺-binding sites in Aβ fibril-like oligomers, examined by molecular dynamics simulations. (Details are given in ref. 40)

Table 2 Proposed Zn²⁺-binding sites in the Aβ monomer, obtained from experimental data (references are detailed and reviewed in ref. 36)

Segment of Aβ | pH | Zn²⁺-binding site
Aβ1–16 | 6.5–7.4 | His6, Glu11, His13, His14
Rat Aβ1–28 | 7.5 | Asp1, His6, Glu11, His14
Human Aβ1–28 | 7.5 | Asp1, His6, Glu11, His13, His14
Two monomers of Aβ1–40 | 7.4 | His13 and His14 of two adjacent Aβ peptides

Table 3 Models of Zn²⁺–Aβ1–42 fibril-like oligomers at Zn²⁺:Aβ ratios of 1:1 and 1:2 (the structures are detailed in ref. 41)

Model | Residues coordinated to Zn²⁺ | Conformational organization
M1 | His6, Glu11, His13, His14 | Parallel
M2 | His6, Glu11, His13, His14 | Parallel
M3 | His6, Glu11, His13, His14 | Antiparallel
M4 | His6, Glu11, His13, His14 | Antiparallel
M5 | His6, Glu11, His13, His14 | Antiparallel
M6 | Asp1, His6, Glu11, His14 | Parallel
M7 | Asp1, His6, Glu11, His13, His14 | Parallel
M8 | Asp1, His6, Glu11, His13, His14 | Parallel
M9 | Asp1, His6, Glu11, His13, His14 | Parallel
M10 | Asp1, His6, Glu11, His13, His14 | Antiparallel
M11 | Asp1, His6, Glu11, His13, His14 | Antiparallel
M12 | Glu11, His13, His14 | Antiparallel

4 α-Synuclein (AS)

AS is a presynaptic neuronal protein that is linked genetically and neuropathologically to PD. AS is an intrinsically disordered protein that has attracted considerable interest in recent years. This 140-residue protein is divided into three main domains. The first domain (residues M1–K60) is an amphipathic lysine-rich amino terminus, which plays a crucial role in modulating the interactions of AS with metals and membranes. The second domain (residues E61–V95) is the central domain of AS, known as the “non-amyloid-β component” (NAC). It contains a highly hydrophobic motif that is indispensable for AS aggregation. The third domain (residues K96–A140) is an acidic carboxyl-terminal tail, which has been implicated in regulating the nuclear localization of AS and its interactions with metal ions, small molecules, and proteins [44, 45]. The C-terminal tail is highly acidic and devoid of any secondary structure. The primary cause of α-synuclein aggregation in PD is unknown, but growing evidence indicates that environmental factors such as long-term exposure to heavy metals may enhance AS aggregation [46–49]. In vitro studies demonstrate that 15 mono-, di-, and trivalent metal ions can significantly accelerate AS fibril formation [50], as reviewed by Santner and Uversky [1]. Another review, focused only on Cu²⁺, showed that Cu²⁺ levels are significantly increased in the cerebrospinal fluid of PD patients [51]. Furthermore, among the 15 metal ions that bind to AS, only Cu²⁺ binds α-synuclein in the micromolar range [52], indicating that this protein has a high affinity for Cu²⁺. Moreover, Cu²⁺–AS interactions can enhance the formation of amyloid fibrils [53, 54]. A detailed understanding of the structural features and the coordination environment of Cu²⁺ with respect to AS is therefore essential to elucidate the molecular-level interactions between AS and Cu²⁺. However, the unstructured nature of AS fibrillary aggregates makes these aspects difficult to characterize in both experimental and computational studies [55]. The structures of polymorphic fibrillary aggregates of AS have recently been proposed on the basis of experimental techniques [56–58] and molecular modeling tools [59].
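The three-domain partition described above is easy to encode as a lookup table. Such a table is convenient when assigning candidate metal-binding residues, such as those discussed in Subheading 4.1, to domains during a trajectory analysis. This is a small illustrative helper; the domain labels are descriptive names chosen for this sketch.

```python
# Alpha-synuclein domain boundaries as described in the text (1-based residues).
AS_DOMAINS = (
    (1, 60, "N-terminal amphipathic domain"),
    (61, 95, "NAC domain"),
    (96, 140, "C-terminal acidic tail"),
)


def as_domain(residue):
    """Return the domain label of an alpha-synuclein residue number."""
    for start, end, name in AS_DOMAINS:
        if start <= residue <= end:
            return name
    raise ValueError(f"residue {residue} is outside 1-140")


# Candidate Cu2+-binding residues mentioned in Subheading 4.1.
for label, number in [("Met1", 1), ("Asp2", 2), ("His50", 50), ("Asp121", 121)]:
    print(label, "->", as_domain(number))
```

Note that His50, although treated experimentally as a separate binding region, falls inside the N-terminal domain under this partition.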
Yet, identifying the specific Cu2+-binding sites in the fibrillary states is necessary to understand the mechanism of AS aggregation in the presence of Cu2+ ions. Therefore, Subheading 4.1 focuses on the specific Cu2+-binding sites that have been proposed by a combination of extensive experimental and in silico techniques. Subheading 4.2 discusses the effect of Cu2+ concentration on AS aggregation and polymorphism, and Subheading 4.3 details the effect of Cu2+ ions on the conformations of the fibrillary states.

4.1 The Specific Cu2+-Binding Sites in AS Fibrillary Aggregates

Electron paramagnetic resonance (EPR) and NMR measurements of Cu2+ binding to fragments of AS suggest that Cu2+ can bind to three regions of AS: (1) the N-terminal region, with the highest affinity; (2) His50; and (3) the C-terminal region (Asp119-Asp121-Asn122-Glu123), with lower affinity [60]. At the highest-affinity binding site in the N-terminus of AS, Cu2+ may be coordinated by Met1, Asp2, and a water molecule [54, 61]. Recently, it was suggested that Cu+ binds to N-terminally acetylated AS [62], whereas acetylation of the N-terminus of AS prevents the binding of Cu2+ to the N-terminal domain. Interestingly, a study of the NAC fragment showed that it can bind Cu2+, but the specific binding site in the NAC domain was not determined [63]. So far, data on Cu2+-binding sites in AS aggregates are lacking. Recently, the three possible Cu2+-binding sites were examined within two polymorphic fibril-like AS oligomers using molecular dynamics simulations (Fig. 6) [64]. The simulations showed that Cu2+ ions do not remain bound to the His50 residues within AS fibril-like oligomers: during the simulations, the ions escaped from His50. Complementary experiments showed that when Cu2+ ions were incubated with mature fibrils, the ions did not bind to His50. The well-ordered β-sheets in the fibrils prevent the His50 residues, which are located in these β-sheet domains, from accommodating Cu2+ ions. The other two possible Cu2+-binding sites in AS fibril-like oligomers were then examined: the amphipathic N-terminal domains and the acidic C-terminal domains. Molecular modeling showed that Cu2+ ions bind to both the N- and C-terminal domains in both polymorphic AS fibril-like oligomers. Interestingly, statistical calculations demonstrated that Cu2+ ions prefer to bind to the amphipathic N-terminal domains rather than to the acidic C-terminal domains within the polymorphic fibril-like oligomers. Extended X-ray absorption fine-structure (EXAFS) spectra and further experimental techniques confirmed that Cu2+ ions have high affinity for the amphipathic N-terminal domains in the fibrillary state.

Yifat Miller

Fig. 6 Three possible Cu2+-binding site domains that were proposed by experiments, examined in two polymorphic AS fibril-like oligomers. Left, top: the molecular modeling model (proposed by Bloch and Miller [59]). Right, bottom: the ssNMR model (proposed by Tuttle et al. [58]).

4.2 The Effect of Cu2+ Concentrations on Metal-Binding Sites

Although Met1 and Asp2 in the amphipathic domain of the AS monomer were suggested to bind Cu2+ ions, only the Asp2 residues were found to bind the ions in the fibril-like oligomers. Molecular dynamics simulations showed that at a Cu2+:AS ratio of 1:2, i.e., at low Cu2+ concentration, the Asp2 residues of two AS monomers bind one Cu2+ ion. In addition, the Glu139 residues of the C-termini act as Cu2+ ligands, and water molecules complete the coordination sphere. Increasing the Cu2+:AS ratio to 1:1 affects the fluctuations in the N-termini and allows water molecules to access the N-terminal domains; it is therefore reasonable to propose that a water molecule completes the coordination sphere. Interestingly, higher copper concentrations (raising the Cu2+:AS ratio from 1:2 to 1:1) induce hopping of Cu2+ ions between monomers within the fibril-like oligomers. At high copper concentrations, the N atom of Met1 does not bind Cu2+ ions; instead, other residues, mainly Glu and Asp located in the N- and C-termini, participate in the Cu2+-binding sites in the fibrils. The hopping of Cu2+ ions supports the experimental observation that fluctuations in the fibril termini change the copper coordination geometries [64]. At low concentrations, Cu2+ ions prefer to bind the amphipathic N-terminal domain, and Asp2 residues remain in the metal coordination sphere throughout all molecular dynamics trajectories. Because the effect of Cu2+ concentration may differ from one polymorphic fibril-like AS oligomer to another, it must be examined for each polymorphic state. The effect of Cu2+ concentration on two polymorphic AS fibril-like oligomers was examined by molecular dynamics simulations and is detailed by Bloch et al. [64].
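Statements such as “Asp2 remains in the coordination sphere” or “Cu2+ hops between monomers” come from tracking, frame by frame, which ligand atoms lie within a coordination cutoff of each ion. The sketch below shows that bookkeeping in plain Python, assuming the Cu2+ and candidate ligand-atom coordinates have already been extracted from the trajectory. The 2.6 Å cutoff, the data layout, the toy coordinates, and all function names are illustrative assumptions, not the analysis protocol of Bloch et al. [64].

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Vec = Tuple[float, float, float]
# A ligand atom is keyed by (monomer/chain id, residue label, atom name).
AtomKey = Tuple[str, str, str]
# One trajectory frame: Cu2+ position and positions of candidate ligand atoms.
Frame = Tuple[Vec, Dict[AtomKey, Vec]]

CUTOFF = 2.6  # Angstrom; ASSUMED first-shell cutoff, not taken from [64]


def coordinating_atoms(cu: Vec, ligands: Dict[AtomKey, Vec],
                       cutoff: float = CUTOFF) -> Set[AtomKey]:
    """Ligand atoms lying within `cutoff` of the Cu2+ ion in one frame."""
    return {key for key, pos in ligands.items() if math.dist(cu, pos) <= cutoff}


def monomers_bound(shell: Set[AtomKey]) -> Set[str]:
    """Monomers (chains) that donate at least one ligand atom."""
    return {chain for chain, _residue, _atom in shell}


def hopping_events(frames: List[Frame], cutoff: float = CUTOFF) -> List[int]:
    """Frame indices at which the set of coordinating monomers changes.

    A change in which monomers donate ligands is used as a simple proxy
    for a Cu2+ 'hop'; frames with an empty shell are skipped.
    """
    events: List[int] = []
    previous: Optional[Set[str]] = None
    for index, (cu, ligands) in enumerate(frames):
        current = monomers_bound(coordinating_atoms(cu, ligands, cutoff))
        if not current:
            continue
        if previous is not None and current != previous:
            events.append(index)
        previous = current
    return events


def residue_occupancy(frames: List[Frame], residue: str,
                      cutoff: float = CUTOFF) -> float:
    """Fraction of frames in which any atom of `residue` coordinates Cu2+."""
    if not frames:
        return 0.0
    hits = sum(
        any(res == residue for _chain, res, _atom in coordinating_atoms(cu, lig, cutoff))
        for cu, lig in frames
    )
    return hits / len(frames)


if __name__ == "__main__":
    # Toy, made-up coordinates: Cu2+ starts between the Asp2 side chains of
    # monomers A and B and ends next to a Glu139 of monomer C.
    ligands = {("A", "Asp2", "OD1"): (2.0, 0.0, 0.0),
               ("B", "Asp2", "OD1"): (-2.0, 0.0, 0.0),
               ("C", "Glu139", "OE1"): (8.0, 0.0, 0.0)}
    frames = [((0.0, 0.0, 0.0), ligands),
              ((0.1, 0.0, 0.0), ligands),
              ((6.0, 0.0, 0.0), ligands)]
    print(hopping_events(frames))             # [2]
    print(residue_occupancy(frames, "Asp2"))  # 0.6666666666666666
```

Keying ligands by monomer makes an inter-monomer hop visible as a change in the set of donating chains, which mirrors the distinction in the text between hopping between monomers and fluctuations within one binding site.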

4.3 The Effect of Cu2+ Concentrations on Conformational Change and Polymorphism

Although Cu2+ ions do not bind His50 in AS fibril-like oligomers, molecular dynamics simulations and experimental techniques identified two possible binding sites in the polymorphic AS fibril-like oligomers [64]: residue Asp2 in the amphipathic N-terminal domains, and residues Asp119, Asp121, Asn122, and Glu123 in the acidic C-terminal domains. It has therefore been proposed that Cu2+ ions increase the polymorphism of AS fibril-like oligomers. Furthermore, molecular dynamics simulations revealed hopping events among a variety of Cu2+-binding sites within the N- and C-terminal domains of the fibrillary states. This variety of Cu2+-binding sites thus dramatically increases the number of polymorphic states of AS fibrils. Moreover, molecular dynamics simulations suggest that the hopping of Cu2+ ions between the N- and C-termini leads to conformational changes in the Cu2+–AS coordination geometries. Interestingly, the simulations demonstrated that Cu2+ ions do not change the diameter of AS fibrils but distort the β-sheet structures within the N- and C-termini of the fibrils. This distortion may be due to the fluctuations of the N- and C-termini and to the hopping of Cu2+ ions within these domains.

5 Challenges and Future Perspectives

One of the major difficulties in experimental studies is obtaining high-resolution atomic structures of self-assembled amyloids, particularly when these assemblies are complexed with metals. High-resolution ssNMR or X-ray crystallography structures of metal–amyloid oligomers are not available, and experimental techniques for investigating the molecular mechanisms of metal binding to amyloid aggregates are lacking. However, recent in silico studies have begun to characterize the metal-binding sites in both amyloid monomers and aggregates. Combining experimental and computational techniques may therefore advance our understanding of the molecular mechanisms of metal binding to amyloid aggregates. This chapter focuses on the binding sites in amyloid aggregates and on the effect of specific metal-binding sites on the conformation of the aggregates. Other in silico studies have focused on oxidative pathways that are activated around metal ions [38, 39]. Yet promising future research on the effect of metals on amyloid aggregation should focus on amyloid oligomers: because oligomers are considered the toxic species, there is great interest in the early-stage oligomers of amyloids. In the coming years, the greatest impact is likely to come from developing computational techniques to study redox-active metal ions that can catalyze the production of reactive oxygen species when bound to amyloid oligomers. Addressing this challenge will require combining quantum and classical mechanical methods.


References
1. Santner A, Uversky VN (2010) Metalloproteomics and metal toxicology of alpha-synuclein. Metallomics 2(6):378–392
2. Wedd A, Maret W (2014) Binding, transport and storage of metal ions in biological cells. RSC, London
3. Tamano H, Takeda A (2011) Dynamic action of neurometals at the synapse. Metallomics 3(7):656–661
4. Solomonov I, Korkotian E, Born B, Feldman Y, Bitler A, Rahimi F, Li H, Bitan G, Sagi I (2012) Zn2+-Abeta40 complexes form metastable quasi-spherical oligomers that are cytotoxic to cultured hippocampal neurons. J Biol Chem 287(24):20555–20564
5. Wineman-Fisher V, Bloch DN, Miller Y (2016) Challenges in studying the structures of metal-amyloid oligomers related to type 2 diabetes, Parkinson’s disease, and Alzheimer’s disease. Coord Chem Rev 327:20–26
6. Lukinius A, Wilander E, Westermark GT, Engstrom U, Westermark P (1989) Co-localization of islet amyloid polypeptide and insulin in the B cell secretory granules of the human pancreatic islets. Diabetologia 32(4):240–244
7. Zhao HL, Lai FM, Tong PC, Zhong DR, Yang D, Tomlinson B, Chan JC (2003) Prevalence and clinicopathological characteristics of islet amyloid in Chinese patients with type 2 diabetes. Diabetes 52(11):2759–2766
8. Rushing PA, Hagan MM, Seeley RJ, Lutz TA, D’Alessio DA, Air EL, Woods SC (2001) Inhibition of central amylin signaling increases food intake and body adiposity in rats. Endocrinology 142(11):5035
9. Reda TK, Geliebter A, Pi-Sunyer FX (2002) Amylin, food intake, and obesity. Obes Res 10(10):1087–1091
10. Akesson B, Panagiotidis G, Westermark P, Lundquist I (2003) Islet amyloid polypeptide inhibits glucagon release and exerts a dual action on insulin release from isolated islets. Regul Pept 111(1–3):55–60
11. Hull RL, Westermark GT, Westermark P, Kahn SE (2004) Islet amyloid: a critical entity in the pathogenesis of type 2 diabetes. J Clin Endocrinol Metab 89(8):3629–3643
12. Maloy AL, Longnecker DS, Greenberg ER (1981) The relation of islet amyloid to the clinical type of diabetes. Hum Pathol 12(10):917–922
13. Kahn SE, Andrikopoulos S, Verchere CB (1999) Islet amyloid: a long-recognized but underappreciated pathological feature of type 2 diabetes. Diabetes 48(2):241–253
14. Jurgens CA, Toukatly MN, Fligner CL, Udayasankar J, Subramanian SL, Zraika S, Aston-Mourney K, Carr DB, Westermark P, Westermark GT, Kahn SE, Hull RL (2011) beta-cell loss and beta-cell apoptosis in human type 2 diabetes are related to islet amyloid deposition. Am J Pathol 178(6):2632–2640
15. Brender JR, Hartman K, Reid KR, Kennedy RT, Ramamoorthy A (2008) A single mutation in the nonamyloidogenic region of islet amyloid polypeptide greatly reduces toxicity. Biochemistry 47(48):12680–12688
16. Westermark P, Wernstedt C, Wilander E, Hayden DW, O’Brien TD, Johnson KH (1987) Amyloid fibrils in human insulinoma and islets of Langerhans of the diabetic cat are derived from a neuropeptide-like protein also present in normal islet cells. Proc Natl Acad Sci U S A 84(11):3881–3885
17. Janson J, Ashley RH, Harrison D, McIntyre S, Butler PC (1999) The mechanism of islet amyloid polypeptide toxicity is membrane disruption by intermediate-sized toxic amyloid particles. Diabetes 48(3):491–498
18. Ritzel RA, Meier JJ, Lin CY, Veldhuis JD, Butler PC (2007) Human islet amyloid polypeptide oligomers disrupt cell coupling, induce apoptosis, and impair insulin secretion in isolated human islets. Diabetes 56(1):65–71
19. Brender JR, Hartman K, Nanga RP, Popovych N, de la Salud Bea R, Vivekanandan S, Marsh EN, Ramamoorthy A (2010) Role of zinc in human islet amyloid polypeptide aggregation. J Am Chem Soc 132(26):8973–8983
20. Brender JR, Krishnamoorthy J, Messina GM, Deb A, Vivekanandan S, La Rosa C, Penner-Hahn JE, Ramamoorthy A (2013) Zinc stabilization of prefibrillar oligomers of human islet amyloid polypeptide. Chem Commun 49(32):3339–3341
21. Wineman-Fisher V, Atsmon-Raz Y, Miller Y (2015) Orientations of residues along the beta-arch of self-assembled amylin fibril-like structures lead to polymorphism. Biomacromolecules 16(1):156–165
22. Wineman-Fisher V, Miller Y (2016) Structural insights into the polymorphism of self-assembled Amylin oligomers. Isr J Chem 56(8):590–598
23. Wei L, Jiang P, Xu W, Li H, Zhang H, Yan L, Chan-Park MB, Liu XW, Tang K, Mu Y, Pervushin K (2011) The molecular basis of distinct aggregation pathways of islet amyloid polypeptide. J Biol Chem 286(8):6291–6300
24. Wineman-Fisher V, Miller Y (2016) Effect of Zn(2+) ions on the assembly of amylin oligomers: insight into the molecular mechanisms. Phys Chem Chem Phys 18(31):21590–21599
25. Wineman-Fisher V, Miller Y (2017) Insight into a new binding site of zinc ions in fibrillar amylin. ACS Chem Neurosci 8(9):2078–2087
26. Butterfield DA, Lauderback CM (2002) Lipid peroxidation and protein oxidation in Alzheimer’s disease brain: potential causes and consequences involving amyloid beta-peptide-associated free radical oxidative stress. Free Radic Biol Med 32(11):1050–1060
27. Markesbery WR (1997) Oxidative stress hypothesis in Alzheimer’s disease. Free Radic Biol Med 23(1):134–147
28. Pappolla MA, Chyan YJ, Omar RA, Hsiao K, Perry G, Smith MA, Bozner P (1998) Evidence of oxidative stress and in vivo neurotoxicity of beta-amyloid in a transgenic mouse model of Alzheimer’s disease: a chronic oxidative paradigm for testing antioxidant therapies in vivo. Am J Pathol 152(4):871–877
29. Maret W (2006) Zinc coordination environments in proteins as redox sensors and signal transducers. Antioxid Redox Signal 8(9–10):1419–1441
30. Giles NM, Watts AB, Giles GI, Fry FH, Littlechild JA, Jacob C (2003) Metal and redox modulation of cysteine protein function. Chem Biol 10(8):677–693
31. Miseta A, Csutora P (2000) Relationship between the occurrence of cysteine in proteins and the complexity of organisms. Mol Biol Evol 17(8):1232–1239
32. Wineman-Fisher V, Tudorachi L, Nissim E, Miller Y (2016) The removal of disulfide bonds in amylin oligomers leads to the conformational change of the ‘native’ amylin oligomers. Phys Chem Chem Phys 18(18):12438–12442
33. Bush AI, Tanzi RE (2008) Therapeutics for Alzheimer’s disease based on the metal hypothesis. Neurotherapeutics 5(3):421–432
34. Scott LE, Orvig C (2009) Medicinal inorganic chemistry approaches to passivation and removal of aberrant metal ions in disease. Chem Rev 109(10):4885–4910
35. Uttara B, Singh AV, Zamboni P, Mahajan RT (2009) Oxidative stress and neurodegenerative diseases: a review of upstream and downstream antioxidant therapeutic options. Curr Neuropharmacol 7(1):65–74
36. Miller Y, Ma B, Nussinov R (2012) Metal binding sites in amyloid oligomers: complexes and mechanisms. Coord Chem Rev 256:2245
37. Huy PD, Vuong QV, La Penna G, Faller P, Li MS (2016) Impact of Cu(II) binding on structures and dynamics of Abeta42 monomer and dimer: molecular dynamics study. ACS Chem Neurosci 7(10):1348–1363
38. La Penna G, Li MS (2019) Computational models explain how copper binding to amyloid-beta peptide oligomers enhances oxidative pathways. Phys Chem Chem Phys 21(17):8774–8784
39. Liao Q, Owen MC, Bali S, Barz B, Strodel B (2018) Abeta under stress: the effects of acidosis, Cu(2+)-binding, and oxidation on amyloid beta-peptide dimers. Chem Commun 54(56):7766–7769
40. Parthasarathy S, Long F, Miller Y, Xiao Y, McElheny D, Thurber K, Ma B, Nussinov R, Ishii Y (2011) Molecular-level examination of Cu2+ binding structure for amyloid fibrils of 40-residue Alzheimer’s beta by solid-state NMR spectroscopy. J Am Chem Soc 133(10):3390–3400
41. Miller Y, Ma B, Nussinov R (2010) Zinc ions promote Alzheimer Abeta aggregation via population shift of polymorphic states. Proc Natl Acad Sci U S A 107(21):9490–9495
42. Miller Y, Ma B, Nussinov R (2010) Polymorphism in Alzheimer Abeta amyloid organization reflects conformational selection in a rugged energy landscape. Chem Rev 110(8):4820–4838
43. Noy D, Solomonov I, Sinkevich O, Arad T, Kjaer K, Sagi I (2008) Zinc-amyloid beta interactions on a millisecond time-scale stabilize non-fibrillar Alzheimer-related species. J Am Chem Soc 130(4):1376–1383
44. Eliezer D, Kutluay E, Bussell R Jr, Browne G (2001) Conformational properties of alpha-synuclein in its free and lipid-associated states. J Mol Biol 307(4):1061–1073
45. Rao JN, Dua V, Ulmer TS (2008) Characterization of alpha-synuclein interactions with selected aggregation-inhibiting small molecules. Biochemistry 47(16):4651–4656
46. Barnham KJ, Bush AI (2008) Metals in Alzheimer’s and Parkinson’s diseases. Curr Opin Chem Biol 12(2):222–228
47. Bisaglia M, Tessari I, Mammi S, Bubacco L (2009) Interaction between alpha-synuclein and metal ions, still looking for a role in the pathogenesis of Parkinson’s disease. NeuroMolecular Med 11(4):239–251
48. Rybicki BA, Johnson CC, Uman J, Gorell JM (1993) Parkinson’s disease mortality and the industrial use of heavy metals in Michigan. Mov Disord 8(1):87–92
49. Singh C, Ahmad I, Kumar A (2007) Pesticides and metals induced Parkinson’s disease: involvement of free radicals and oxidative stress. Cell Mol Biol 53(5):19–28
50. Uversky VN, Li J, Fink AL (2001) Metal-triggered structural transformations, aggregation, and fibrillation of human alpha-synuclein. A possible molecular link between Parkinson’s disease and heavy metal exposure. J Biol Chem 276(47):44284–44296
51. Pall HS, Williams AC, Blake DR, Lunec J, Gutteridge JM, Hall M, Taylor A (1987) Raised cerebrospinal-fluid copper concentration in Parkinson’s disease. Lancet 2(8553):238–241
52. Binolfi A, Quintanar L, Bertoncini CW, Griesinger C, Fernández CO (2012) Bioinorganic chemistry of copper coordination to alpha-synuclein: relevance to Parkinson’s disease. Coord Chem Rev 256(19–20):2188–2201
53. Binolfi A, Rasia RM, Bertoncini CW, Ceolin M, Zweckstetter M, Griesinger C, Jovin TM, Fernandez CO (2006) Interaction of alpha-synuclein with divalent metal ions reveals key differences: a link between structure, binding specificity and fibrillation enhancement. J Am Chem Soc 128(30):9893–9901
54. Rasia RM, Bertoncini CW, Marsh D, Hoyer W, Cherny D, Zweckstetter M, Griesinger C, Jovin TM, Fernandez CO (2005) Structural characterization of copper(II) binding to alpha-synuclein: insights into the bioinorganic chemistry of Parkinson’s disease. Proc Natl Acad Sci U S A 102(12):4294–4299
55. Binolfi A, Rodriguez EE, Valensin D, D’Amelio N, Ippoliti E, Obal G, Duran R, Magistrato A, Pritsch O, Zweckstetter M, Valensin G, Carloni P, Quintanar L, Griesinger C, Fernandez CO (2010) Bioinorganic chemistry of Parkinson’s disease: structural determinants for the copper-mediated amyloid formation of alpha-synuclein. Inorg Chem 49(22):10668–10679
56. Guerrero-Ferreira R, Taylor NM, Mona D, Ringler P, Lauer ME, Riek R, Britschgi M, Stahlberg H (2018) Cryo-EM structure of alpha-synuclein fibrils. eLife 7:e36402
57. Li B, Ge P, Murray KA, Sheth P, Zhang M, Nair G, Sawaya MR, Shin WS, Boyer DR, Ye S, Eisenberg DS, Zhou ZH, Jiang L (2018) Cryo-EM of full-length alpha-synuclein reveals fibril polymorphs with a common structural kernel. Nat Commun 9(1):3609
58. Tuttle MD, Comellas G, Nieuwkoop AJ, Covell DJ, Berthold DA, Kloepper KD, Courtney JM, Kim JK, Barclay AM, Kendall A, Wan W, Stubbs G, Schwieters CD, Lee VM, George JM, Rienstra CM (2016) Solid-state NMR structure of a pathogenic fibril of full-length human alpha-synuclein. Nat Struct Mol Biol 23(5):409–415
59. Bloch DN, Miller Y (2017) Study of molecular mechanisms of alpha-synuclein assembly: insight into a cross-beta structure in the N-termini of new alpha-synuclein fibrils. ACS Omega 2(7):3363–3370
60. Bertoncini CW, Jung YS, Fernandez CO, Hoyer W, Griesinger C, Jovin TM, Zweckstetter M (2005) Release of long-range tertiary interactions potentiates aggregation of natively unstructured alpha-synuclein. Proc Natl Acad Sci U S A 102(5):1430–1435
61. Binolfi A, Valiente-Gabioud AA, Duran R, Zweckstetter M, Griesinger C, Fernandez CO (2011) Exploring the structural details of Cu(I) binding to alpha-synuclein by NMR spectroscopy. J Am Chem Soc 133(2):194–196
62. Miotto MC, Valiente-Gabioud AA, Rossetti G, Zweckstetter M, Carloni P, Selenko P, Griesinger C, Binolfi A, Fernandez CO (2015) Copper binding to the N-terminally acetylated, naturally occurring form of alpha-synuclein induces local helical folding. J Am Chem Soc 137(20):6444–6447
63. Khan A, Ashcroft AE, Higenell V, Korchazhkina OV, Exley C (2005) Metals accelerate the formation and direct the structure of amyloid fibrils of NAC. J Inorg Biochem 99(9):1920–1927
64. Bloch DN, Kolkowska P, Tessari I, Baratto MC, Sinicropi A, Bubacco L, Mangani S, Pozzi C, Valensin D, Miller Y (2019) Fibrils of alpha-synuclein abolish the affinity of Cu(2+)-binding site to His50 and induce hopping of Cu(2+) ions in the termini. Inorg Chem 58(16):10920–10927

Chapter 8

From Quantum Mechanics, Classical Mechanics, and Bioinformatics to Artificial Intelligence Studies in Neurodegenerative Diseases

Orkid Coskuner-Weber, M. Gokhan Habiboglu, David Teplow, and Vladimir N. Uversky

Abstract

The amyloid β-protein (Aβ) is an intrinsically disordered protein that has the potential to assemble into myriad structures, including oligomers and fibrils. These structures are neurotoxic and are thought to initiate a cascade of events leading to Alzheimer’s disease (AD). Understanding this pathogenetic process and identifying targets for drug therapy depend on elucidating the structural dynamics of Aβ assembly. In this chapter, we describe the work packages required to determine the three-dimensional structures of Aβ and of smaller bioactive fragments thereof, which may be important in AD pathogenesis. These packages include density functional theory, Car–Parrinello molecular dynamics simulations, temperature-dependent replica exchange molecular dynamics simulations, disorder predictors based on bioinformatics, and neural network deep learning.

Key words DFT, CPMD simulations, T-REMD simulations, Disorder predictors, Deep learning

1 Introduction

Amyloid-β (Aβ) is produced by sequential endoproteolytic cleavage of the amyloid precursor protein (APP) by β-secretase (N-terminus) and γ-secretase (C-terminus) [1]. APP is a type I transmembrane protein in which the Aβ peptide region exists partly outside the membrane (residues 1–28) and partly inside it [2]. The amino acid sequence of human Aβ42 is shown below in one-letter code.

DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA
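Residue numbering recurs throughout this chapter (e.g., the extracellular segment 1–28 and the serine residues, one of which carries the carbohydrate in the DFT example of Subheading 2), so it can help to index the sequence programmatically. This is a minimal Python sketch; the function and variable names are illustrative, and treating residues 29–42 as the membrane-side segment simply follows the 1–28 boundary stated above.

```python
from typing import List

# Human Abeta42 in one-letter code, as given in the text.
ABETA42 = "DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA"


def positions_of(sequence: str, residue: str) -> List[int]:
    """1-based positions of a one-letter residue code in `sequence`."""
    return [i for i, aa in enumerate(sequence, start=1) if aa == residue]


def segment(sequence: str, first: int, last: int) -> str:
    """Residues first..last (1-based, inclusive), as in biochemical numbering."""
    return sequence[first - 1:last]


extracellular = segment(ABETA42, 1, 28)          # outside the membrane in APP
membrane_side = segment(ABETA42, 29, len(ABETA42))

print(len(ABETA42))                # 42
print(positions_of(ABETA42, "S"))  # [8, 26]
print(extracellular)               # DAEFRHDSGYEVHHQKLVFFAEDVGSNK
print(membrane_side)               # GAIIGLMVGGVVIA
```

The `segment` helper converts from the 1-based, inclusive numbering used in the text to Python's 0-based, end-exclusive slices, which is the most common source of off-by-one errors when mapping residue numbers onto structures.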

Mai Suan Li et al. (eds.), Computer Simulations of Aggregation of Proteins and Peptides, Methods in Molecular Biology, vol. 2340, https://doi.org/10.1007/978-1-0716-1546-1_8, © Springer Science+Business Media, LLC, part of Springer Nature 2022

Orkid Coskuner-Weber et al.

Aβ is released as a monomer into the extracellular milieu of the brain, where it can undergo a complex assembly process producing a variety of oligomeric and fibrillar structures. Aβ fibrils are the primary protein component of amyloid plaques, the extracellular protein deposits that are pathognomonic for Alzheimer’s disease (AD) [3]. Fibrillar Aβ structures have been investigated experimentally, but current efforts to understand the etiology of AD have focused on oligomeric Aβ assemblies, which have been shown to be neurotoxic and to exist in the brains of AD patients [4]. Nuclear magnetic resonance (NMR) and circular dichroism (CD) spectroscopy have revealed that Aβ monomers exist as statistical coils in solution [5, 6]. Monomers exist in rapid equilibrium with dimers and higher-order assemblies, including protofibrils and fibrils [7], which makes it problematic to determine atomic-resolution structures of monomeric and oligomeric Aβ by NMR or X-ray methods. Population-averaged (CD) or relatively low-resolution methods, such as ion mobility mass spectrometry (IM-MS), electron microscopy (EM), transmission electron microscopy (TEM), and atomic force microscopy (AFM), have been used successfully [8–10]; however, only such low-resolution structural data exist in the current literature. The initial conformational changes of the Aβ monomer during its assembly include the formation of turns, α-helices, and β-strands [11, 12]. Some of these elements are transitory, and others persist throughout the fibril assembly process [13]. Small-molecule inhibitors of toxic folds, or stabilizers of non-toxic folds, are an attractive therapeutic strategy. This strategy was recently applied to α-synuclein, the amyloidogenic protein of Parkinson’s disease. Rather than developing drugs that directly block α-synuclein aggregation (which has not worked thus far), Marotta et al. [14] took advantage of cellular modifications of α-synuclein that have the potential to prevent aggregation.
They studied whether glycosylation could directly block aggregation. Defects in the glycosylation of APP, tau, and other proteins have been detected in AD pathology, and rescuing this process in the case of Aβ and tau may have merit. However, experimentally monitoring the conformations of disordered glycoproteins is challenging: highly flexible carbohydrates undergo fast conformational changes and exert steric effects, and intrinsically disordered proteins (IDPs) also change conformation rapidly and lack a stable 3D structure [15–17]. Computational methods, which complement and extend experimental studies, may provide a solution to this problem. Here, we discuss modern computational packages that yield structural information at atomic-scale resolution. These include density functional theory (DFT) calculations [18], Car–Parrinello molecular dynamics (CPMD) simulations [18], temperature replica exchange molecular dynamics (T-REMD) simulations [19], deep learning estimations [20] along with DFT calculations, and disorder predictors [21].

From Quantum Mechanics, Classical Mechanics, and Bioinformatics to. . .

DFT is a computational quantum mechanical modeling method used in various areas, including biology, medicine, chemistry, and physics. In this work package, we investigate the electronic structure of a many-body system using DFT calculations. In this theory, the properties of a many-electron system are determined by using functionals, that is, functions of another function, which in DFT is the spatially dependent electron density. CPMD is related to the more common Born–Oppenheimer molecular dynamics (BOMD) method, in which the quantum mechanical effect of the electrons is included in the calculation of energy and forces for the classical motion of the nuclei. Whereas BOMD solves the electronic structure problem with the time-independent Schrödinger equation at each nuclear configuration, CPMD explicitly includes the electrons as active degrees of freedom through fictitious dynamical variables. CPMD usually employs periodic boundary conditions, plane-wave basis sets, and DFT. Parallel tempering, also known as replica exchange sampling, is a simulation method aimed at improving the sampling efficiency of conventional classical simulations. Specifically, N copies (replicas) of the system are simulated at different temperatures, and configurations at different temperatures are periodically exchanged according to the Metropolis criterion. In this way, configurations generated at high temperature, where energy barriers are crossed more easily, can be sampled at low temperature, and vice versa. Disorder predictors comprise a variety of computational methods that exploit the signatures of disorder to predict, at the residue level, whether and to what extent a protein is disordered. Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning; learning can be supervised, semi-supervised, or unsupervised. Artificial neural networks (ANNs) were initially inspired by information processing and distributed communication nodes in biological systems.
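The exchange step of parallel tempering described above can be made concrete. For two replicas at temperatures T_i and T_j holding configurations with potential energies E_i and E_j, the standard Metropolis criterion accepts the swap with probability min{1, exp[(β_i − β_j)(E_i − E_j)]}, where β = 1/(k_B T). The Python sketch below implements that test together with a geometrically spaced temperature ladder. The geometric spacing, the kcal/mol energy units, and all function names are assumptions for illustration, not the settings used by the authors; in practice, swaps are usually attempted only between neighboring temperatures.

```python
import math
import random
from typing import List, Optional

# Boltzmann constant in kcal/(mol K); energies are ASSUMED to be in kcal/mol.
K_B = 0.0019872041


def temperature_ladder(t_min: float, t_max: float, n: int) -> List[float]:
    """n temperatures spaced geometrically between t_min and t_max (n >= 2)."""
    if n < 2 or t_min <= 0 or t_max <= t_min:
        raise ValueError("need n >= 2 and 0 < t_min < t_max")
    ratio = (t_max / t_min) ** (1.0 / (n - 1))
    return [t_min * ratio ** k for k in range(n)]


def exchange_probability(e_i: float, e_j: float, t_i: float, t_j: float) -> float:
    """Metropolis acceptance probability for swapping configurations i and j.

    P = min(1, exp[(beta_i - beta_j) * (E_i - E_j)]), with beta = 1 / (k_B T).
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(e_i: float, e_j: float, t_i: float, t_j: float,
                     rng: Optional[random.Random] = None) -> bool:
    """Accept the swap if a uniform random number falls below the probability."""
    rng = rng if rng is not None else random.Random()
    return rng.random() < exchange_probability(e_i, e_j, t_i, t_j)


if __name__ == "__main__":
    ladder = temperature_ladder(300.0, 400.0, 8)
    print([round(t, 1) for t in ladder])
    # Neighboring replicas where the colder one holds the higher-energy
    # configuration: the swap lowers its energy and is always accepted.
    print(exchange_probability(-990.0, -1000.0, ladder[0], ladder[1]))  # 1.0
```

Geometric spacing keeps the ratio of neighboring temperatures constant, which for systems with roughly constant heat capacity gives similar acceptance rates along the ladder; other spacings are equally valid if they give adequate overlap between neighbors.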

2 DFT Calculations

In this example, we employ the GAMESS software [22] to study a glycosylated Aβ fragment. The initial structure used in the calculation can be prepared with visualization programs such as Molden [23] or MacMolPlt [24], or taken from an experimentally determined structure, and then optimized at the DFT level of theory. Hybrid functionals are a class of approximations to the exchange-correlation energy functional in DFT that incorporate a portion of exact exchange from Hartree–Fock theory, with the rest of the exchange-correlation energy taken from ab initio or empirical sources. The exact exchange energy functional is expressed in terms of the Kohn–Sham orbitals rather than the density and is therefore termed an implicit density functional. One of the most widely used hybrid functionals is B3LYP (Becke, 3-parameter, Lee–Yang–Parr) [25]. Our system of interest in this example is β-D-glucopyranose-L-serine, which represents a carbohydrate bound to a serine residue of Aβ. Once the initial structure is prepared, follow the protocol below to perform a geometry optimization with the B3LYP functional and the cc-pVTZ basis set. Instead of an optimization, one can also run energy, Hessian, gradient, TRUDGE, Raman, and various other calculations.

First, create a directory called BDGluSer with the following Linux command:

mkdir BDGluSer

Using the vi (visual) editor, create and open an input file named BDGluSer.inp in the BDGluSer directory by typing vi BDGluSer.inp in the console. Type i to enter insert mode, and then enter the commands listed below. In our example, the self-consistent field type is restricted (SCFTYP=RHF, which together with DFTTYP gives a restricted Kohn–Sham calculation), the run type is OPTIMIZE for a geometry optimization (it can be replaced with other run types, such as ENERGY or HESSIAN), the DFT functional is B3LYP, the maximum number of SCF iterations is 80, and the spin multiplicity is 1. ISPHER=1 requests spherical-harmonic basis functions, DIRSCF=.TRUE. requests a direct SCF, and OPTTOL and NSTEP in $STATPT set the gradient convergence tolerance (0.0001 Hartree/Bohr) and the maximum number of optimization steps (300). MWORDS sets the maximum replicated memory that the job will use on every core, in units of 1,000,000 words, where a word is 64 bits (8 bytes). If you would like to use internal coordinates rather than Cartesian coordinates, for example a Z-matrix, the $ZMAT group (required if NZVAR is nonzero in $CONTRL) lets you define the internal coordinates in which the gradient geometry search is carried out; these may be Z-matrix, delocalized, or natural internal coordinates. The $BASIS group allows certain standard basis sets to be requested. Basis sets are specified by GBASIS (plus optional supplementation, such as NDFUNC), by BASNAM for reading custom basis sets from the input file, or by EXTFIL for reading custom bases from an external file. In our example, we use Dunning’s correlation-consistent polarized valence triple-zeta basis set, cc-pVTZ (GBASIS=CCT); the “T” in cc-pVnZ denotes the triple-zeta (n = 3) quality of the valence basis. Alternatively, one can use the widely used 6-31G(d,p) basis set by typing “$BASIS GBASIS=N31 NGAUSS=6 NDFUNC=1 NPFUNC=1 $END.” An implicit water model, such as COSMO (COSMO-RS), can be used as well. The COSab algorithm, as implemented in GAMESS, does not include non-electrostatic effects in the calculation. However, these contributions to the solvation energy can be calculated and added a posteriori by using the interface to the COSMO-RS program by COSMOlogic GmbH & Co. KG.
Once the printout routine is obtained and GAMESS is recompiled, the keywords “$COSGMS PRFCND=.TRUE. COSWRT=.TRUE. $END” activate the printout of .cosmo files.

From Quantum Mechanics, Classical Mechanics, and Bioinformatics to. . .


 $CONTRL SCFTYP=RHF RUNTYP=OPTIMIZE DFTTYP=B3LYP MAXIT=80 MULT=1 ISPHER=1 $END
 $SYSTEM MWORDS=80 $END
 $BASIS GBASIS=CCT $END
 $SCF DIRSCF=.TRUE. $END
 $STATPT OPTTOL=0.0001 NSTEP=300 $END
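The memory units described above can be worked through for MWORDS=80. The following Python sketch assumes, as stated in the text, that MWORDS counts millions of words and that one word is 64 bits (8 bytes). It reports decimal megabytes and gigabytes. The 16-core total is a hypothetical illustration, and any separately requested distributed memory is not included.

```python
# Sketch: convert the GAMESS $SYSTEM MWORDS value into memory.
# Assumptions (from the text): MWORDS is in units of 1,000,000 words,
# and one word is 64 bits (8 bytes). Decimal units: 1 MB = 10**6 bytes.

BYTES_PER_WORD = 8            # 64-bit word
WORDS_PER_MWORD = 1_000_000   # MWORDS counts millions of words


def mwords_to_megabytes(mwords: float) -> float:
    """Replicated memory used on each core, in MB."""
    return mwords * WORDS_PER_MWORD * BYTES_PER_WORD / 1_000_000


def total_gigabytes(mwords: float, cores: int) -> float:
    """Replicated memory summed over all cores, in GB (10**9 bytes)."""
    return mwords_to_megabytes(mwords) * cores / 1_000


print(mwords_to_megabytes(80))   # MWORDS=80 from the example deck: 640.0 MB per core
print(total_gigabytes(80, 16))   # hypothetical 16-core run
```

Because the memory is replicated, the total grows linearly with the number of cores. This is why MWORDS is usually chosen per core rather than per node.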

Next, we list the $DATA group. It describes the global molecular data, such as the point group symmetry, the nuclear coordinates, and possibly the basis set. See the $RELWFN group for detailed information on large and small component basis sets.

After $DATA, we give a title for the run (BLS_-060 in this example). On the next line, we give the point group of the molecule, which can be C1, Cs, Ci, Cn, S2n, Cnh, Dn, Dnh, Dnd, T, Th, Td, O, or Oh. For our glycopeptide it is C1. Depending on the choice of COORD in $CONTRL, the following cards follow the U, C, G, or M card sequence described in the GAMESS documentation.

Each atom is then given on one line, in this order:
- its symbol (for example, C for carbon, O for oxygen, or H for hydrogen)
- its nuclear charge
- its Cartesian coordinates (x, y, and z), which we obtained from experiments or from visualization programs (see below)

The input file ends with $END:

 $DATA
BLS_-060
C1
C 6.0 -0.97374 1.66271 0.22487
C 6.0 -0.26550 0.44366 0.79880
O 8.0 -0.61599 -0.74339 0.12895
...
 $END

To save the file and quit, press the Escape key, type :wq!, and press Enter. Make sure you are in the directory “BDGluSer” that you created. Use the cd command to change directories; rm -r removes a directory in a Linux/Unix terminal. To run GAMESS, type the following in a Linux terminal from within the “BDGluSer” directory:


Orkid Coskuner-Weber et al.

rungms input_file_name >& output_file_name.log

The optimized structure is shown in Fig. 1. In the current example, GAMESS writes an output file that lists, among other results, the following energies (in Hartree):

TOTAL ENERGY = -1009.1413950328
ELECTRON-ELECTRON POTENTIAL ENERGY = 1863.0761597302
NUCLEUS-ELECTRON POTENTIAL ENERGY = -5380.4681359797
NUCLEUS-NUCLEUS POTENTIAL ENERGY = 1507.2919666880
------------------
TOTAL POTENTIAL ENERGY = -2010.1000095615
TOTAL KINETIC ENERGY = 1000.958614528
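The printed energy components can be cross-checked with a few lines of Python. The three potential terms should add up to the total potential energy, and potential plus kinetic energy should reproduce the total energy. The ratio −V/T is also computed: by the virial theorem it should be close to 2 for a structure near a stationary point, although it is not exactly 2 for an approximate DFT calculation in a finite basis. The values are copied from the listing above, and the comparison uses a tolerance because the last digits may be truncated.

```python
# Energy components in Hartree, copied from the GAMESS output above.
ELECTRON_ELECTRON = 1863.0761597302
NUCLEUS_ELECTRON = -5380.4681359797
NUCLEUS_NUCLEUS = 1507.2919666880
KINETIC = 1000.958614528           # TOTAL KINETIC ENERGY as listed
TOTAL_REPORTED = -1009.1413950328  # TOTAL ENERGY as listed

potential = ELECTRON_ELECTRON + NUCLEUS_ELECTRON + NUCLEUS_NUCLEUS
total = potential + KINETIC
virial_ratio = -potential / KINETIC  # virial theorem: close to 2 near a stationary point

print(f"total potential energy = {potential:.10f} Eh")
print(f"potential + kinetic    = {total:.10f} Eh")
print(f"difference to reported = {total - TOTAL_REPORTED:.1e} Eh")
print(f"-V/T                   = {virial_ratio:.5f}")
```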

3 CPMD Simulations

The CPMD task in the NWChem software package [26] performs ab initio molecular dynamics (AIMD) using the scheme developed by Car and Parrinello [27]. In this AIMD scheme, the motion of the ion cores is coupled to a fictitious motion of the Kohn–Sham orbitals of DFT. Both constant-energy and constant-temperature simulations can be conducted with NWChem. CPMD simulations have been used to describe the interactions between Aβ and ligands, such as transition metals, in water at the electronic level, including their dynamics.

Continuing our example, we now investigate the structural properties of β-D-glucopyranose-(L)-serine in water. The solute can be solvated with various tools, including VMD (after preparing pdb and psf files) or a custom script; we use our own script. We solvated the glycopeptide in a cubic box with a 5.0 Å layer of water around the solute in each direction. Next, open a new input file for the CPMD simulation with the vi editor and enter insert mode as described above:

title "GlcLser"
start glcLser.md
geometry
C 0.756500 -1.894000 0.292400
C 0.015000 -0.323000 0.878000 0.668000 0.004000
O 0.501000 ...
...

Fig. 1 The representative optimized structure of β-D-glucopyranose-(L)-serine in the ψ = 180° (t) conformation, obtained from B3LYP/cc-pVTZ calculations with an implicit water model (COSMO)

...
end
pspw
  car-parrinello
    time_step 5.0   #Typically between 1 and 20
    fake_mass 600.0 #Typically between 300 and 1500
    loop 10 100
  end
  mult 1
end
set nwpw:minimizer 2
task pspw energy
task pspw car-parrinello

You may need to specify the amount of memory that NWChem can use for the simulation. For our sample job, this is done by typing memory 2480 mb stack 1120 mb heap 1120 mb. You can also define the charge of the system. Our system is neutral, so type charge 0 before the geometry information is listed. The simulation cell is described with the SIMULATION_CELL sub-block of the NWPW block:

NWPW
  ...
  SIMULATION_CELL [units]
    CELL_NAME
    BOUNDARY_CONDITIONS (periodic || aperiodic default periodic)
    LATTICE_VECTORS
    NGRID
  END
  ...
END

The Car–Parrinello sub-block can contain a great deal of input, including pointers to data as well as parameter input. In this example, we assign Troullier–Martins pseudopotentials, with one library entry per element, by typing:

pseudopotentials
O library TM
C library TM
H library TM
N library TM
end

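The temperature can be controlled with the Nosé–Hoover thermostat. In our example, we use the following input to simulate the system at room temperature:

Nose-Hoover 1201.0 298.15 1201.0 298.15

The first pair of values gives the period (in atomic units of time) and the target temperature (in K) of the thermostat acting on the fictitious electronic motion. The second pair gives the same quantities for the ionic thermostat.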

The exchange–correlation (XC) functional can be LDA, Vosko, PBE96, HF, or another functional such as PBE0. In our example, we could use PBE96 by typing the following in the input file:

XC PBE96

In our example, the Ewald summation cutoff is set with:

Ewald_ncut 2

At the end of the input file, we type:

...
end
task pspw energy
task pspw Car-Parrinello
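To relate the Car–Parrinello settings to physical time, the Python sketch below makes two assumptions that should be checked against the NWChem documentation for your version. First, it assumes NWChem reads time_step in atomic units of time (ħ/E_h ≈ 0.0242 fs). Second, it assumes that loop 10 100 requests 10 inner × 100 outer steps. Under these assumptions, the example input covers roughly 0.12 ps.

```python
# Assumptions (check against the NWChem documentation for your version):
#   * time_step is given in atomic units of time (hbar / E_h)
#   * "loop 10 100" means 10 inner steps x 100 outer steps
AU_TIME_FS = 0.024188843265857  # one atomic unit of time in femtoseconds

time_step_au = 5.0       # time_step in the car-parrinello sub-block
inner, outer = 10, 100   # loop 10 100

n_steps = inner * outer
dt_fs = time_step_au * AU_TIME_FS
simulated_fs = n_steps * dt_fs

print(f"{n_steps} steps of {dt_fs:.4f} fs = {simulated_fs:.1f} fs ({simulated_fs / 1000:.3f} ps)")
```

Longer trajectories are obtained by increasing the outer loop count. Changing time_step also requires keeping fake_mass within a range where the electrons follow the ions adiabatically.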


The job is typically run in the background with the following command:

nwchem input.nw >& input.out &

The output reports many calculated properties, including:
- the geometry (x, y, and z coordinates)
- the effective nuclear repulsion energy and the nuclear dipole moment (both in atomic units)
- internuclear distances and angles
- orbital energies and the total orbital energy
- the total PSPW energy
- spin contamination
- the center of charge
- the KE + energy values for each CPMD iteration
- the final positions and velocities of the ions
- the Hartree energy and the exchange–correlation energy
- the kinetic energies of the electrons and ions
- the thermostat energies of the electrons and ions

The simulation can be viewed in VMD or any other visualization program using the output.xyz file. Figure 2 shows representative structures of β-D-glucopyranose-(L)-serine obtained from CPMD simulations at 298.15 K.
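For analyses beyond visual inspection, the output.xyz trajectory can also be read directly. The Python sketch below assumes the common multi-frame XYZ layout: an atom count, a comment line, then one line per atom that begins with the element symbol and x, y, z. Any extra columns, such as velocities, are ignored. The function names read_xyz_frames and distance_series are hypothetical helpers, and the demo frames are made up rather than taken from NWChem output.

```python
import math


def read_xyz_frames(text):
    """Parse multi-frame XYZ text into a list of (comment, atoms) tuples.

    Each frame is: atom count, comment line, then one line per atom
    beginning with the element symbol and x, y, z. Extra columns are ignored.
    """
    lines = text.splitlines()
    frames, pos = [], 0
    while pos < len(lines):
        if not lines[pos].strip():          # skip blank separator lines
            pos += 1
            continue
        n_atoms = int(lines[pos].split()[0])
        comment = lines[pos + 1] if pos + 1 < len(lines) else ""
        atoms = []
        for line in lines[pos + 2:pos + 2 + n_atoms]:
            fields = line.split()
            atoms.append((fields[0], *(float(v) for v in fields[1:4])))
        if len(atoms) != n_atoms:
            raise ValueError("truncated frame in XYZ data")
        frames.append((comment, atoms))
        pos += 2 + n_atoms
    return frames


def distance_series(frames, i, j):
    """Distance between atoms i and j (0-based) in every frame."""
    return [math.dist(atoms[i][1:], atoms[j][1:]) for _, atoms in frames]


# Example with two tiny hypothetical frames (not NWChem output).
demo = """2
frame 1
C 0.0 0.0 0.0
O 1.40 0.0 0.0
2
frame 2
C 0.0 0.0 0.0
O 1.45 0.0 0.0 0.01 0.0 0.0
"""
frames = read_xyz_frames(demo)
print(distance_series(frames, 0, 1))
```

A distance series like this can follow, for example, the glycosidic C–O bond or a solute–water contact over the CPMD run.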

4 T-REMD Simulations

Here we use GROMACS 5.1.4 to simulate Aβ42 in water [28]. The initial structure was determined by solution NMR (PDB ID: 1Z0Q) [29]. Open this structure with VMD, Chimera, or PyMOL and save only one frame (model) as a pdb file.

Care must be taken with initial structures, since experimentally determined structures can lack certain atoms or residues. This is especially true for structures determined by X-ray crystallography, which usually lack hydrogen atoms. Missing hydrogen atoms can be added with GROMACS during the setup of the simulation system. Alternatively, the “complete.pl” script from the MMTSB tool set [27] can be used to add them. In some instances, entire residues are missing because they could not be resolved by X-ray or NMR experiments. Such missing residues must be added before the simulation is conducted, for example with MODELLER (https://salilab.org/modeller/wiki/Missing%20residues).

Once the initial structure is prepared, follow the protocol below to perform the simulation. We use the CHARMM36m parameters. In the directory containing the initial structure file (WTAB.pdb in our example) and the CHARMM36m parameter files, create two directories by typing mkdir Prep_all and mkdir Work. Next, open with vi a file named step1_pdb2gmx.sh and, in insert mode (see above), type the following command. The -ignh flag ignores any hydrogen atoms in the input file, and pdb2gmx adds them according to the force field:

gmx_mpi pdb2gmx -f WTAB.pdb -o WTAB_processed.gro -ignh

Fig. 2 Various structures of β-D-glucopyranose-(L)-serine obtained from our CPMD simulations

Save the file and exit as described above. Make the script executable and run it with ./step1_pdb2gmx.sh, for example using chmod +x step1_pdb2gmx.sh. The same applies to the later scripts. The pdb2gmx program will then ask which force field parameter set to use for the protein. Because the CHARMM36m parameter files are in the current directory, the CHARMM36 all-atom force field (March 2019) is listed as option 1. Type 1 to select the C36m force field for your protein.

Next, select the water model. In this example, we use the TIP3P water model, which is option 1. Then, using the vi editor, open a new file called step2_def-box.sh and type into it:

gmx_mpi editconf -f WTAB_processed.gro -o WTAB_newbox.gro -c -d 2.0 -bt cubic

This command centers the protein in the box (-c) and creates a cubic box (-bt cubic) with at least 2.0 nm between the protein and the box edges (-d 2.0). Note that 2.0 nm is the solute–box distance, not the edge length of the box. Next, run the script by typing ./step2_def-box.sh in the console.

Then, open a new file called step3_solv.sh with the vi editor and type into it “gmx_mpi solvate -cp WTAB_newbox.gro -cs spc216.gro -o WTAB_solv.gro -p topol.top”. Save and exit, and run it with ./step3_solv.sh. In the current example, 23,610 water molecules are added to the system. Note that the topology file topol.top is updated in this step as well.

For an REMD simulation with periodic boundary conditions, as applied in this example, the total charge of the system has to be zero. Therefore, we open a new file called “step4_genion.sh” and type in:

gmx_mpi grompp -f ions.mdp -c WTAB_solv.gro -p topol.top -o ions.tpr;

gmx_mpi genion -s ions.tpr -o WTAB_ions.gro -p topol.top -pname NA -nname CL -neutral

After saving and exiting, we type ./step4_genion.sh in the console to add the counterions that neutralize the system. The first command calls the GROMACS preprocessor (grompp), which:
- reads the molecular topology file and checks its validity
- expands the topology from a molecular to an atomic description
- reads an .mdp file
- translates all of these into GROMACS directives

The output of this step is a .tpr file, a portable binary run input file that contains the starting structure, the topology, and the simulation parameters. The -neutral flag ensures that the system is neutralized with the required number of counterions (Na+ and Cl− in our example). To have the ions replace water molecules, interactively choose group 13 (SOL) from the list of options that GROMACS prints when you execute step4_genion.sh. The group number can differ for other systems. After this step, the system topology is complete, and one can continue with MD simulations to equilibrate the system for the REMD simulations.

For the next step, open a new file called step5_grompp_emin.sh, type “gmx_mpi grompp -f em.mdp -c WTAB_ions.gro -p topol.top -o em.tpr”, then save and exit the vi editor. Then, the energy minimization can be conducted with:

gmx_mpi mdrun -v -deffnm em

The option -v (verbose) prints detailed progress information during the minimization, and -deffnm em sets the default base name of all input and output files to em. The energy minimization is followed by two equilibration runs. The first is usually a short MD simulation in the NVT ensemble with position restraints on the protein, which allows the solvent molecules and ions to relax:

gmx_mpi grompp -f nvt_eq.mdp -c em.gro -p topol.top -o nvteq.tpr -r em.gro

Next, execute the run by typing “gmx_mpi mdrun -v -deffnm nvteq”. The NVT equilibration is followed by MD simulations in the NPT ensemble, in which the restraint force constants are gradually reduced from 1000 to 10 kJ mol−1 nm−2, using the following steps:

gmx_mpi grompp -f npt_eq.mdp -c nvteq.gro -p topol.top -o npteq.tpr -r nvteq.gro

This is followed by “gmx_mpi mdrun -v -deffnm npteq” to run the MD simulation. To prepare the files for the T-REMD simulation, change into the corresponding directory:

cd Prep_all

Copy the topol.top and npteq.gro files from the previous directory into this directory. Open a file named tp with vi and list the temperatures needed for your T-REMD simulation. If you prefer not to calculate these temperatures yourself, an online temperature generator for REMD simulations is available (http://folding.bmc.uu.se/remd/). Type the temperatures into the tp file as in the following example:

280.84 281.68 282.53 ...
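The online generator computes temperatures from a target exchange probability. A simpler alternative is a geometric ladder, T_i = T_min (T_max/T_min)^(i/(n−1)), which keeps the ratio between neighboring temperatures constant. It is shown below as a Python sketch and is not the method used by the authors. For example, a three-replica geometric ladder between 280.84 K and 282.53 K places the middle replica at 281.68 K, matching the list above. The function names are hypothetical.

```python
def geometric_ladder(t_min, t_max, n_replicas):
    """Temperatures T_i = t_min * (t_max / t_min) ** (i / (n - 1)), i = 0..n-1."""
    if n_replicas < 2 or not 0 < t_min < t_max:
        raise ValueError("need n_replicas >= 2 and 0 < t_min < t_max")
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio**i for i in range(n_replicas)]


def format_tp(temperatures):
    """One line of space-separated temperatures, as in the tp file above."""
    return " ".join(f"{t:.2f}" for t in temperatures)


ladder = geometric_ladder(280.84, 282.53, 3)
print(format_tp(ladder))

# Writing the tp file (hypothetical T_MAX and N_REPLICAS chosen by the user):
# with open("tp", "w") as handle:
#     handle.write(format_tp(geometric_ladder(280.84, T_MAX, N_REPLICAS)) + "\n")
```

Geometric spacing gives roughly uniform exchange rates only when the heat capacity is nearly constant over the temperature range. The exchange-probability-based generator handles the general case more carefully.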

Using a vi editor, write the following script into a file named “gromp.sh”:

cpu=48
prev_i=0
gro_prev=`awk -v i=$prev_i 'BEGIN{printf "md%02d_", i}'`
let i=$prev_i+1
def_nm=`awk -v i=$i 'BEGIN{printf "md%02d_", i}'`
gro_fl=npteq.gro
top_fl=topol.top

for ((k=0;k

Determination of the Most Stable Packing of Peptides from Ribosomal S1. . .

$$
W_{\mathrm{VdW}}(r_{ij}) =
\begin{cases}
1, & r_{ij} < R_{\mathrm{on}},\\[4pt]
\dfrac{\left(R_{\mathrm{off}}^{2} - r_{ij}^{2}\right)^{2}\left(R_{\mathrm{off}}^{2} + 2r_{ij}^{2} - 3R_{\mathrm{on}}^{2}\right)}{\left(R_{\mathrm{off}}^{2} - R_{\mathrm{on}}^{2}\right)^{3}}, & R_{\mathrm{on}} \le r_{ij} \le R_{\mathrm{off}},\\[4pt]
0, & r_{ij} \ge R_{\mathrm{off}},
\end{cases}
$$

is the switching function with parameters $R_{\mathrm{on}}$ and $R_{\mathrm{off}}$. This switching function restricts the range of the Lennard–Jones potential and lets it decrease gradually to zero. $\varepsilon_{ij}$ and $\sigma_{ij}$ are the Lennard–Jones parameters for the pair of particles $i$ and $j$; they determine the depth of the potential well and the location of its minimum. For a pair of different atom types $i$ and $j$, $\varepsilon_{ij}$ and $\sigma_{ij}$ were calculated with the combination rules $\varepsilon_{ij} = \sqrt{\varepsilon_i \varepsilon_j}$ and $\sigma_{ij} = (\sigma_i + \sigma_j)/2$ [17]. Interactions between charged atoms are described by the electrostatic potential

$$
U_q = \sum_{i}\sum_{j} \frac{q_i q_j}{\varepsilon\, r_{ij}}\, W_q(r_{ij}),
\qquad
W_q(r_{ij}) =
\begin{cases}
\left(1 - \dfrac{r_{ij}^{2}}{R_q^{2}}\right)^{2}, & r_{ij} < R_q,\\[4pt]
0, & r_{ij} > R_q,
\end{cases}
$$

where $W_q(r_{ij})$ is the screening function with radius $R_q$, $\varepsilon$ is the dielectric constant of the medium, and $r_{ij}$ is the distance between particles $i$ and $j$ that are not valence bonded. The switching functions $W_{\mathrm{VdW}}(r_{ij})$ and $W_q(r_{ij})$ restrict the ranges of the Lennard–Jones and electrostatic potentials and let them decrease gradually to zero. Here $R_{\mathrm{on}} = 9$ Å and $R_{\mathrm{off}} = R_q = 10.5$ Å; these parameters do not depend on the atom types.

Each system with β-layers was surrounded by more than 1500 water molecules, and each oligomer by more than 6000 water molecules (see Fig. 4). Initially, the β-layers were relaxed in the NPT ensemble. After the density of the systems (β-layer + water) reached 1 g/cm³, further modeling was carried out in the NVT ensemble with periodic boundary conditions. The systems (oligomer + water) were modeled without periodic boundary conditions. The AMBER99 force field [17] was used for the atom interactions, together with the TIP3P water model [18]. The simulations were carried out at constant temperature (T = 300 K), maintained by the collision thermostat [19]. The mean collision frequency of atoms with virtual particles was 10 ps⁻¹, and the mass of the virtual particles was 1 atomic mass unit. Constant pressure was maintained by the Berendsen barostat [20]. The equations of motion were integrated numerically with the velocity Verlet algorithm [21] using a time step of 1 fs. For every system (β-layer + water and oligomer + water), four independent simulations that differ in their initial data (coordinates and velocities) were performed (see Note 2).

Anna V. Glyakina et al.

Fig. 5 Graphs of the changes of the β-structure in the β-layers during the simulation for peptide IVRGVVVAID. The straight green lines indicate the initial level of β-structure before modeling

3.4 Molecular Dynamics Simulations of β-layers

During the molecular dynamics simulations, we monitored how the percentage of secondary structure in the β-layers changed over time (see Fig. 5). In this work, the fraction of β-structure in the β-layers was taken as a measure of the stability of each β-layer (see Note 3). The fraction of secondary structure was calculated with the YASARA program [15]. The fraction of β-structure in the constructed β-layers was calculated after fitting the amino acid sequence into them. It was found that this fraction is about 90% in the anti_anti and anti_para β-layers, 78% in para_anti, and 65% in para_para.

This value decreased after 1 ns of molecular dynamics simulation in all β-layers. For the anti_anti and anti_para β-layers, the decrease was small compared with the para_anti and para_para β-layers (see Fig. 5). The decrease is also larger for β-layers at pH 5. The para_anti β-layer is especially notable: its fraction of β-structure decreased from 78% to 28%.

The β-layers with the amino acid sequences VDSWNVLVAG and VESWNVLVAG were modeled previously [22]. For these sequences, the initial fraction of β-structure in the anti_anti and anti_para β-layers is also about 90%. However, there is no significant difference in the decrease of β-structure between pH 3 and pH 5 (see Figs. 6 and 7). Thus, the β-layers in which the β-strands are antiparallel relative to each other (anti_anti and anti_para) were the most stable for the peptides IVRGVVVAID, VDSWNVLVAG, and VESWNVLVAG at pH 3 and pH 5.

Fig. 6 Graphs of the changes of the β-structure in β-layers during the simulation for peptide VDSWNVLVAG. Straight green lines indicate the initial level of β-structure before modeling

3.5 Molecular Dynamics Simulations of Oligomers

Using molecular dynamics simulations, we demonstrated that the antiparallel organization of β-strands is more stable than the parallel arrangement. Using electron microscopy, we showed that the amyloid fibrils and films of the amyloidogenic peptides VDSWNVLVAG, KLVFFAEDVG, and IIGLMVGGVV consist of ring oligomers with a diameter of about 6 nm. These oligomers are packed ring-to-ring or ring-on-ring with a slight shift [5, 6]. Thus, the oligomer is a building block for the construction of amyloid fibrils and films (see Note 4).

To construct a ring oligomer with a diameter of about 6 nm, the structure in PDB entry 4UZR was used as a template. In this structure, β-strands are arranged antiparallel to each other. Three ring-oligomer structures with the amino acid sequences VDSWNVLVAG, KLVFFAEDVG, and IIGLMVGGVV were constructed in this way. Each oligomer structure consists of 48 β-strands.

For the ring oligomer with the sequence VDSWNVLVAG, the fraction of β-structure decreased slightly after 3 ns of molecular dynamics simulation (from 70.6% to 67.7%). Moreover, in two of the four runs, some deformation of the initial ring structure was observed (see Fig. 8). For the ring oligomers with the sequences KLVFFAEDVG and IIGLMVGGVV, the fraction of β-structure remained unchanged after 6 ns of molecular dynamics simulation, and no significant deformation of the initial ring structure was observed (see Fig. 9).

Fig. 7 Graphs of the changes of β-structure in the β-layers during the simulation for peptide VESWNVLVAG. The straight green lines indicate the initial level of β-structure before modeling
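The switching function W_VdW, the screening function W_q, and the combination rules used in the force-field description of these simulations can be written compactly in Python. The sketch below uses R_on = 9 Å and R_off = R_q = 10.5 Å as stated above. Two further choices are assumptions for illustration. The Lennard–Jones pair energy uses the common 12-6 form 4ε[(σ/r)^12 − (σ/r)^6]. The Coulomb constant 332.0636 kcal Å mol⁻¹ e⁻² expresses the electrostatic term in kcal/mol for charges in units of e. The atom-type parameters in the usage lines are hypothetical.

```python
import math

R_ON, R_OFF, R_Q = 9.0, 10.5, 10.5   # Angstrom, as given in the text
COULOMB_KCAL = 332.0636               # kcal*Angstrom/(mol*e^2)


def w_vdw(r, r_on=R_ON, r_off=R_OFF):
    """Switching function: 1 below r_on, smooth decay to 0 at r_off."""
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    roff2, ron2, r2 = r_off**2, r_on**2, r**2
    return (roff2 - r2) ** 2 * (roff2 + 2.0 * r2 - 3.0 * ron2) / (roff2 - ron2) ** 3


def w_q(r, r_q=R_Q):
    """Screening function (1 - r^2/R_q^2)^2 inside R_q, zero outside."""
    return (1.0 - (r / r_q) ** 2) ** 2 if r < r_q else 0.0


def combine(eps_i, sigma_i, eps_j, sigma_j):
    """Combination rules: geometric mean for epsilon, arithmetic mean for sigma."""
    return math.sqrt(eps_i * eps_j), 0.5 * (sigma_i + sigma_j)


def lj_pair(r, eps, sigma):
    """Switched Lennard-Jones energy, assuming the 12-6 form 4*eps*[(s/r)^12 - (s/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6) * w_vdw(r)


def coulomb_pair(r, q_i, q_j, dielectric=1.0):
    """Screened Coulomb energy in kcal/mol for charges in e and r in Angstrom."""
    return COULOMB_KCAL * q_i * q_j / (dielectric * r) * w_q(r)


eps, sigma = combine(0.10, 3.4, 0.21, 3.0)   # hypothetical atom-type parameters
for r in (4.0, 9.0, 9.75, 10.5):
    print(r, round(w_vdw(r), 4), round(w_q(r), 4), round(lj_pair(r, eps, sigma), 6))
```

W_VdW equals 1 and has zero slope at R_on, so forces stay continuous where switching begins. W_q, by contrast, damps the Coulomb term over the whole range 0 < r < R_q.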

Fig. 8 Structures of the ring oligomer with the amino acid sequence VDSWNVLVAG from protein Bgl2p before and after 3 ns simulations

Fig. 9 Structures of the ring oligomers with the amino acid sequences KLVFFAEDVG and IIGLMVGGVV from Aβ peptide before and after simulation

4 Notes

1. Before modeling peptides, it is advisable to have some experimental data on the behavior of the molecules (proteins and peptides) [1].

2. To obtain more reliable data, it is better to run longer simulations (Figs. 6, 7, and 8).
3. It is better to consider more parameters when estimating structural stability (Figs. 1, 6, 7, and 8).
4. Reliable identification of β-aggregation sites in the protein chain is important in the search for therapeutic agents against diseases associated with protein deposition in the form of large aggregates. Therefore, it is important to know the structure of the oligomer for various amyloidogenic amino acid sequences. To do this, we needed to know which stacking of β-layers (parallel or antiparallel) is the most stable. Thus, in this chapter, we focused only on the stable stacking of β-layers in the oligomer structure.

Funding Information: Russian Science Foundation 18-14-00321 for AVG and OVG.

References
1. Selivanova OM, Surin AK, Marchenkov VV et al (2016) The mechanism underlying amyloid polymorphism is opened for Alzheimer’s disease amyloid-β peptide. J Alzheimers Dis 54:821–830
2. Galzitskaya OV, Selivanova OM (2017) Rosetta stone for amyloid fibrils: the key role of ring-like oligomers in amyloidogenesis. J Alzheimers Dis 59:785–795
3. Klimov DK, Thirumalai D (2003) Dissecting the assembly of Abeta16-22 amyloid peptides into antiparallel beta sheets. Structure 11(3):295–307
4. Rauscher S, Baud S, Miao M et al (2006) Proline and glycine control protein self-organization into elastomeric or amyloid fibrils. Structure 14(11):1667–1676
5. Selivanova OM, Glyakina AV, Gorbunova EY et al (2016) Structural model of amyloid fibrils for amyloidogenic peptide from Bgl2p–glucantransferase of S. cerevisiae cell wall and its modifying analog. New morphology of amyloid fibrils. Biochim Biophys Acta Proteins Proteomics 1864:1489–1499
6. Selivanova OM, Surin AK, Ryzhykau YL et al (2018) To be fibrils or to be nanofilms? Oligomers are building blocks for fibril and nanofilm formation of fragments of Aβ peptide. Langmuir 34:2332–2343
7. Grishin S, Deryusheva E, Machulin A et al (2020) Amyloidogenic propensities of ribosomal S1 proteins: bioinformatics screening and experimental checking. Int J Mol Sci 21(15):5199

8. Massi F, Peng JW, Lee JP et al (2001) Simulation study of the structure and dynamics of the Alzheimer’s amyloid peptide congener in solution. Biophys J 80:31–44
9. Fersht AR, Daggett V (2002) Protein folding and unfolding at atomic resolution. Cell 108:573–582
10. Balbach JJ, Ishii Y, Antzutkin ON et al (2000) Amyloid fibril formation by A beta 16-22, a seven-residue fragment of the Alzheimer’s beta-amyloid peptide, and structural characterization by solid state NMR. Biochemistry 39:13748–13759
11. Garbuzynskiy SO, Lobanov MY, Galzitskaya OV (2010) FoldAmyloid: a method of prediction of amyloidogenic regions from protein sequence. Bioinformatics 26:326–332
12. Maurer-Stroh S, Debulpaep M, Kuemmerer N et al (2010) Exploring the sequence determinants of amyloid structure using position-specific scoring matrices. Nat Methods 7:237–242
13. Conchillo-Solé O, Groot NS, Avilés FX et al (2007) AGGRESCAN: a server for the prediction and evaluation of “hot spots” of aggregation in polypeptides. BMC Bioinformatics 8:65
14. Walsh I, Seno F, Tosatto SCE et al (2014) PASTA 2.0: an improved server for protein aggregation prediction. Nucleic Acids Res 42:W301–W307
15. Krieger E, Koraimann G, Vriend G (2002) Increasing the precision of comparative models with YASARA NOVA--a self-parameterizing force field. Proteins 47:393–402

16. Glyakina AV, Balabaev NK, Galzitskaya OV (2010) Two-, three-, and four-state events occur in the mechanical unfolding of small protein L using molecular dynamics simulation. Protein Pept Lett 17:92–103
17. Wang J, Cieplak P, Kollman PA (2000) How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules? J Comput Chem 21:1049
18. Jorgensen WL, Chandrasekhar J, Madura JD et al (1983) Comparison of simple potential functions for simulating liquid water. J Chem Phys 79:926–935
19. Lemak AS, Balabaev NK (1995) A comparison between collisional dynamics and Brownian dynamics. Mol Simul 15:223–231
20. Berendsen HJC, Postma JPM, Gunsteren WFV et al (1984) Molecular dynamics with coupling to an external bath. J Chem Phys 81:3684–3690
21. Allen MP, Tildesley DJ (1996) Computer simulation of liquids. Clarendon Press, Oxford
22. Glyakina AV, Balabaev NK, Galzitskaya OV (2016) Dataset of the molecular dynamics simulations of bilayers consisting of short amyloidogenic peptide VDSWNVLVAG from Bgl2p–glucantransferase of S. cerevisiae cell wall. Data Brief 9:597–601

Chapter 12

Molecular Dynamics Simulations of Protein Aggregation: Protocols for Simulation Setup and Analysis with Markov State Models and Transition Networks

Suman Samantray, Wibke Schumann, Alexander-Maurice Illig, Martin Carballo-Pacheco, Arghadwip Paul, Bogdan Barz, and Birgit Strodel

Abstract

Protein disorder and aggregation play significant roles in the pathogenesis of numerous neurodegenerative diseases, such as Alzheimer’s and Parkinson’s diseases. The end products of the aggregation process in these diseases are highly structured amyloid fibrils. However, in most cases, the small, soluble oligomers formed during amyloid aggregation are the toxic species. A full understanding of the physicochemical forces that drive protein aggregation is thus required if one aims for the rational design of drugs targeting the formation of amyloid oligomers. Among the multitude of biophysical and biochemical techniques employed to study protein aggregation, molecular dynamics (MD) simulations at the atomic level provide the highest temporal and spatial resolution of this process, capturing key steps during the formation of amyloid oligomers. Here we provide a step-by-step guide for setting up, running, and analyzing MD simulations of aggregating peptides using GROMACS. For the analysis, we provide scripts developed in our lab that determine the oligomer size and the inter-peptide contacts that drive the aggregation process. Moreover, we explain and provide the tools to derive Markov state models and transition networks from MD data of peptide aggregation.

Key words Amyloid aggregation, Amyloid oligomers, MD simulations, Transition networks, Markov state models

1 Introduction

During protein aggregation, misfolded or intrinsically disordered proteins first assemble into oligomers, which can grow into highly ordered β-sheet aggregates called amyloid fibrils. Depending on the protein, this takes place in the intracellular or extracellular environment. This process is strongly associated with various, often neurodegenerative, diseases, such as Alzheimer’s and Parkinson’s diseases [1, 2]. Neurodegenerative diseases are debilitating conditions that result in progressive degeneration and/or death of nerve cells,

Mai Suan Li et al. (eds.), Computer Simulations of Aggregation of Proteins and Peptides, Methods in Molecular Biology, vol. 2340, https://doi.org/10.1007/978-1-0716-1546-1_12, © Springer Science+Business Media, LLC, part of Springer Nature 2022



Suman Samantray et al.

causing problems with movement (called ataxia) and/or mental functioning (called dementia). To our knowledge, none of these diseases linked to amyloid aggregation is currently curable, and finding a cure poses huge challenges [3]. Computer simulations, especially molecular dynamics (MD) simulations, have become essential tools for investigating the relationship between the conformational and structural properties of proteins and the intermolecular interactions that give rise to aggregation [2, 4, 5]. In the twenty-first century, powerful supercomputers have enabled us to simulate increasingly complex systems over longer time scales and larger length scales in order to approach experimental conditions. However, MD simulations produce large amounts of data, and extracting useful information from them requires sophisticated post-processing techniques.

One of these techniques is Markov state models (MSMs), which have recently gained traction in the computational biochemistry and biophysics communities as a technique that excels at elucidating the relevant states and processes hidden in MD data [6, 7]. MSMs are network models that encode the system dynamics in a states-and-rates format. At a given instant, the molecular system exists in one of many possible states, and it has a fixed probability of transitioning to each other state, including itself, within a particular time interval (the lag time). A fundamental assumption of MSMs is memorylessness, i.e., the probability of a transition from one state to another depends only on the current state and not on the history of the system. MSMs are kinetic maps of a system’s underlying free-energy landscape, from which essential information can be extracted. This has been demonstrated for a wide range of biological systems, including protein folding [8], protein–ligand binding [9], and allostery [10].
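The states-and-rates idea can be made concrete with a minimal Python sketch. It assumes the MD frames have already been assigned to discrete states, for example by clustering. Transitions i → j are counted at a lag time τ, and each row is normalized to obtain transition probabilities. The stationary distribution is then the left eigenvector of this matrix with eigenvalue 1. This simple estimator does not enforce detailed balance or test the Markov assumption; dedicated packages such as PyEMMA or deeptime provide those tools. The function names and the toy trajectory are hypothetical.

```python
from collections import Counter


def transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition counts i -> j observed at the given lag.

    dtraj is a sequence of integer state labels, one per saved frame.
    This is the simple (non-reversible) maximum-likelihood estimate.
    """
    counts = Counter(zip(dtraj[:-lag], dtraj[lag:]))
    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            raise ValueError(f"no transitions observed out of state {i}")
        matrix.append([counts[(i, j)] / row_total for j in range(n_states)])
    return matrix


def stationary_distribution(matrix, tol=1e-12, max_iter=100_000):
    """Left eigenvector with eigenvalue 1, found by power iteration."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")


dtraj = [0, 0, 1, 1, 0, 1]           # toy discrete trajectory
T = transition_matrix(dtraj, n_states=2)
print(T)                              # rows: 1/3, 2/3 and 1/2, 1/2
print(stationary_distribution(T))     # approximately 3/7, 4/7
```

Memorylessness is what allows the matrix to be reused: under the Markov assumption, the transition probabilities after k·τ are given by the k-th power of T.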
Our group recently extended the applicability of MSMs to molecular self-assembly by accounting for the degeneracy of aggregates during the aggregation process [11]. The power of this approach for elucidating kinetically relevant aggregation pathways has been demonstrated for the self-assembly of the amyloidogenic peptide KFFE [11].

An alternative network model for characterizing protein aggregation is provided by transition networks (TNs), which were also developed by the Strodel lab [12]. TNs are based on conformational clustering instead of the kinetic clustering used in MSMs. In TNs, the aggregation states are defined by the characteristics found to be most relevant for describing the aggregation process under study. These so-called descriptors always include the aggregate size. They are augmented by quantities relevant to amyloid aggregation, e.g., the number and type of interactions between the proteins in the aggregates, their shape, and their amount of β-sheet. Transforming the high-dimensional conformational space into this lower-dimensional TN space gives clear views of the structures and pathways of the aggregation process. We successfully applied this approach to the aggregation of the amyloid-β

Molecular Dynamics Simulations of Protein Aggregation: Protocols for. . .


peptide (Aβ42) connected to the development of Alzheimer’s disease [13–15], a segment of this peptide, Aβ16–22 [12, 16], as well as GNNQQNY, a polar peptide sequence from the yeast prion protein Sup35 [12]. In this chapter, we provide a guided manual for performing MD simulations of protein aggregation, and analyzing them either with Markov state models or with transition networks.
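A transition network can be sketched in the same spirit. Each frame is mapped to a descriptor tuple, such as (oligomer size, number of inter-peptide contacts). The nodes are the distinct descriptor values with their populations, and the edges count transitions between consecutive frames. The Python sketch below is a simplified illustration under that assumption. It tracks a single state label per frame, whereas a full TN analysis of a multi-peptide system can follow several aggregates per frame. The function names and the descriptor sequence are hypothetical.

```python
from collections import Counter


def transition_network(states):
    """Nodes (with populations) and directed edge counts of a transition network.

    states: one hashable descriptor per frame, e.g. (oligomer_size, n_contacts).
    Self-transitions are kept as edges from a node to itself.
    """
    nodes = Counter(states)
    edges = Counter(zip(states[:-1], states[1:]))
    return nodes, edges


def edge_probabilities(edges):
    """Normalize outgoing edge counts of each node into transition probabilities."""
    out_totals = Counter()
    for (src, _), count in edges.items():
        out_totals[src] += count
    return {(src, dst): count / out_totals[src] for (src, dst), count in edges.items()}


# Hypothetical descriptor sequence: (oligomer size, number of inter-peptide contacts)
descriptors = [(1, 0), (2, 3), (2, 3), (2, 5), (2, 3), (1, 0)]
nodes, edges = transition_network(descriptors)
for (src, dst), p in sorted(edge_probabilities(edges).items()):
    print(f"{src} -> {dst}: {p:.2f} (count {edges[(src, dst)]})")
```

The node populations and edge weights can then be passed to a graph visualization tool, with node size reflecting population and edge width reflecting transition counts.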

2 Simulation and Analysis Protocols

The basic prerequisite for performing MD simulations of proteins is an MD engine such as GROMACS [17], AMBER [18], or NAMD [19]. Here, we use GROMACS to illustrate the setup, execution, and analysis of protein aggregation simulations. A few more software packages are required for the following protocols:
(1) a protein visualization program, e.g., PyMOL [20] or VMD [21];
(2) Python [22] for general data analysis;
(3) Python libraries designed specifically for analyzing MD trajectories, i.e., MDAnalysis [23] and MDTraj [24];
(4) a molecule packing optimization program, i.e., PACKMOL [25].

2.1 MD Simulations and Basic Analysis

Most of the following protocols use Aβ16–22 as an example. It is capped at both ends and thus has the sequence ACE-KLVFFAE-NME. As the Protein Data Bank (PDB) does not contain a structure of this peptide, a starting structure for the following simulations can be taken from the coordinates of residues 16–22 of a PDB structure of Aβ42, such as PDB entry 1Z0Q [26]. The ACE and NME capping groups can then be added to the N- and C-terminus, respectively, with the Builder tool of PyMOL in Protein mode.

In this protocol, six copies of Aβ16–22 are simulated using GROMACS 2016.4 as the MD engine, CHARMM36m as the protein force field [27], and the TIP3P water model [28]. We use the CHARMM36m force field because it has been shown to be one of the best force fields for modeling Aβ [29]. It also performed best in our in-house peptide aggregation benchmark [16].

2.1.1 Preparation of the Simulation Box Containing Six Peptides

1. The first step is to produce a relaxed conformation of the Aβ16–22 monomer. This can be achieved with an MD simulation of the monomer following our MD protocol published in ref. 30. Alternatively, the MD online tutorial available on our group website can be used: http://www.strodel.info/index_files/lecture/html/tutorial.html. The length of this simulation depends on the size of the peptide under study; for Aβ16–22, a simulation length of 1 μs or longer is recommended. The most stable monomer structures can be determined using


Suman Samantray et al.

conformational clustering [31], and six of these structures are used to build the initial system of six Aβ16–22 monomers randomly placed in a simulation box. The monomer is simulated first to avoid aggregating artificial peptide structures in the next step, which would require additional simulation time to relax such aggregates or, even worse, might introduce artifacts into the simulation data.

2. To randomly position the six monomers in a simulation box, we use PACKMOL. The sample script below places six Aβ16–22 peptides at least 1.2 nm (12 Å in the script) apart in a simulation box of about 10 nm × 10 nm × 10 nm.

# Six monomers of abeta16-22 peptide
# minimum distance between two monomers
tolerance 12.0
seed -1
# The file type of input and output files is PDB
filetype pdb
# The name of the output file
output abeta16-22_hexamer.pdb
# add TER after each monomer chain
add_amber_ter
# distance from the edges of box
add_box_sides 1.0
# path to input structure file
# distances are given in Angstrom
# box size is 100 Å
structure abeta16-22.pdb
  number 6
  inside box 0. 0. 0. 100. 100. 100.
end structure

Alternatively, one can use GROMACS to achieve the same goal:


gmx insert-molecules -ci abeta16-22.pdb -nmol 6 -box 10 10 10 -o abeta16-22_hexamer.pdb
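Before moving on, it can be worth checking that the packed box actually respects the intended minimum separation. A minimal sketch using only the Python standard library is shown below. It assumes that the monomers in the output PDB file are separated by TER records, as PACKMOL writes them with add_amber_ter (this may not hold for the gmx insert-molecules output), and the helper names are illustrative, not part of any package.

```python
import math
from itertools import combinations


def read_molecules(pdb_text):
    """Split ATOM/HETATM coordinates (Angstrom) into molecules at TER records."""
    molecules, current = [], []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # PDB fixed columns (1-based): x = 31-38, y = 39-46, z = 47-54
            current.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
        elif line.startswith("TER") and current:
            molecules.append(current)
            current = []
    if current:  # last molecule if the file does not end with TER
        molecules.append(current)
    return molecules


def min_intermolecular_distance(molecules):
    """Smallest atom-atom distance between any two different molecules (Angstrom)."""
    best = math.inf
    for mol_a, mol_b in combinations(molecules, 2):
        for a in mol_a:
            for b in mol_b:
                best = min(best, math.dist(a, b))
    return best
```

For example, read_molecules(open("abeta16-22_hexamer.pdb").read()) should return six molecules, and their minimum intermolecular distance should be roughly 12 Å or more if the PACKMOL tolerance was satisfied.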

An example of the resulting simulation box is shown in Fig. 1.

2.1.2 Creation of Directories for the Different Simulation Steps

It is advisable to run each of the following MD steps in a separate directory, which avoids accidentally overwriting files. To this end, directories for the five major steps are created: topology building, energy minimization, NVT equilibration, NPT equilibration, and MD production run.

mkdir 1-topol 2-em 3-nvt 4-npt 5-md

Each step requires an .mdp file. The extension stands for "molecular dynamics parameters", as these files contain all the key parameters for setting up an MD simulation with GROMACS. The five .mdp files required are provided in Appendix 1 at the end of this chapter. Create a directory,

mkdir mdp

and copy these .mdp files to that directory.

2.1.3 Topology Building

In this step, the topology file is created. It contains information about the molecule types and the number of molecules to be simulated. The .pdb file from the previous step serves as input, and in addition to the topology file, a .gro file is produced which, like a .pdb file, contains the coordinates of the simulated system. The two differ mainly in their format; moreover, a .gro file can also hold velocities.

1. Download the Charmm36m force field from http://mackerell.umaryland.edu/download.php?filename=CHARMM_ff_params_files/charmm36-mar2019.ff.tgz and copy it to the 1-topol directory, where it has to be unpacked so that pdb2gmx can find the charmm36-mar2019.ff folder. Change to that directory:

cd 1-topol/


Fig. 1 Illustrative example of six Aβ16–22 peptides (shown as surfaces in different colors) randomly placed in a cubic box surrounded by solvent molecules (shown as grey dots)

2. Run the GROMACS pdb2gmx command to process the input structure file and create the topology file (.top extension), the topology include files (.itp extension), and the position restraint files (.itp extension).

gmx pdb2gmx -f ../abeta16-22_hexamer.pdb -o protein.gro -p topol.top -ignh -ter

=0} {
lappend transition [list $pl $o]
} } } }
incr n
} }
set prevolig $olig
set oldoligomer $oligomer
set a2 {}


# For each transition identify the aggregation states
foreach t1 [lsort -unique $transition] {
    set a1 {}
    set b1 {}
    set S {}
    foreach t2 $t1 {
        set frame [expr $j-$f]
        animate goto $frame
        mol ssrecalc top
        lappend a1 [llength $t2]
        set ss {}
        set O [llength $t2]
        #
        set be 0
        # Monomer, call procedures
        if {$O == 1} {
            set Sh [NPMI $t2 $frame]
            set be [beta $t2 $frame]
        } else {
            # Oligomer, call procedures
            set Sh [NPMI $t2 $frame]
            set be [beta $t2 $frame]
        }
        # Assign a state consisting of three order parameters
        set states [join $O|[expr round($be/$O)]|$Sh ""]
        lappend S $states
    }
    lappend a2 $a1
    set l [lindex $a1 0]
    set k [lindex $a1 1]
    if {$cnt!=0} {
        lappend TS $S
    }
}
} }
# Write transition matrix and states attributes to files
if {1} {
    set S_unique [lsort -unique [join $TS]]
    set bbins [llength $S_unique]
    for {set i 0} {$i < $bbins} {incr i} {
        for {set j 0} {$j < $bbins} {incr j} {
            set b($i,$j) 0
        }


    }
    foreach trans $TS {
        set i [lsearch $S_unique [lindex $trans 0]]
        set j [lsearch $S_unique [lindex $trans 1]]
        set b($i,$j) [expr $b($i,$j)+1]
    }
    set row2 {}
    for {set i 0} {$i < $bbins} {incr i} {
        set row2 {}
        for {set j 0} {$j < $bbins} {incr j} {
            lappend row2 $b($i,$j)
        }
        puts $fil1 $row2
        puts $row2
    }
    # Create attributes file
    set all_states {}
    set count 0
    foreach v $TS {
        if {$count < $pepno} {
            lappend all_states [lindex $v 0]
            lappend all_states [lindex $v 1]
        } else {
            lappend all_states [lindex $v 1]
        }
        incr count
    }
    set id 1
    puts $fil2 "id state oligomer beta-sheet compactness population"
    # split state labels on "|" so that multi-digit values are read correctly
    foreach val $S_unique {
        puts $fil2 "$id $val [lindex [split $val "|"] 0] [lindex [split $val "|"] 1] [lindex [split $val "|"] 2] [llength [lsearch -all $all_states $val]]"
        incr id
    }
}
close $fil1
close $fil2
exit
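The counting step of the Tcl script above (building a transition matrix from pairs of aggregation states and a table of state attributes) can also be expressed in Python, which some readers may find easier to adapt. This is a sketch, not the authors' script: it assumes the transitions have already been collected as (from_state, to_state) pairs of labels of the form "oligomer|β-sheet|compactness", as built by the Tcl code, and the function names are hypothetical. States are sorted as strings, which mirrors the ordering of lsort -unique.

```python
from collections import Counter


def transition_matrix(transitions):
    """Count transitions between aggregation states.

    transitions: iterable of (from_state, to_state) pairs, each state a
    string such as "2|3|1" (oligomer size | beta-sheet | compactness).
    Returns the sorted unique states and a square matrix of counts.
    """
    transitions = list(transitions)
    states = sorted({s for pair in transitions for s in pair})
    index = {s: i for i, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for src, dst in transitions:
        counts[index[src]][index[dst]] += 1
    return states, counts


def state_attributes(states, visits):
    """Split each state label into its three order parameters and add its population.

    visits: list of visited states; its occurrences define the population.
    """
    population = Counter(visits)
    rows = []
    for i, s in enumerate(states, start=1):
        oligomer, beta, compactness = s.split("|")
        rows.append((i, s, oligomer, beta, compactness, population[s]))
    return rows
```

The matrix rows can then be written with the csv module, or the attribute rows printed in the same column order as the Tcl attributes file.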


convert2csv.tcl: for converting the transition matrix to csv format

#!/usr/bin/tclsh
proc splitby { string spl_str } {
    set lst [split $string $spl_str]
    for { set cnt 0 } { $cnt < [llength $lst] } { incr cnt } {
        if { [lindex $lst $cnt] == "" } {
            set lst [lreplace $lst $cnt $cnt]
            incr cnt -1
        }
    }
    return $lst
}
set input [lindex $argv 0]
set output1 [lindex $argv 1]
set fil1 [open $input r]
set fil2 [open $output1 w]
array unset a
set i 1
set firstR {}
# Read input matrix into variable "a"
while {[gets $fil1 line1] >=0} {
    set firstR [concat $firstR ";$i"]
    set row [splitby $line1 " "]
    set j 1
    foreach r $row {
        set a($i,$j) $r
        incr j
    }
    incr i
}
set n [expr $i-1]
# Write output matrix to file
puts $fil2 $firstR
for {set i 1} {$i

2:1 were considered, and these values were normalized to the maximum amplitude in each condition (Fig. 1c).

• All statistics were calculated using GraphPad Prism 8.0. Three independent ThT experiments were run for each condition. The data were normalized to the highest amplitude, and averages and standard deviations were plotted. The curves were fitted with a nonlinear regression model, from which the t1/2 values were derived; the t1/2 error represents the 95% confidence interval (Fig. 1d).
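The t1/2 values above come from nonlinear regression in GraphPad Prism. As a rough, dependency-free cross-check, t1/2 can also be estimated by assuming a Boltzmann sigmoid and fitting the logit of the normalized ThT signal linearly in time. This logit linearization is a swapped-in technique, not the regression model used in this chapter; the function name and the 0.05–0.95 window are illustrative choices.

```python
import math


def t_half_logit(times, signal, lo=0.05, hi=0.95):
    """Estimate t1/2 of a sigmoidal ThT trace.

    Assumes y(t) = 1 / (1 + exp(-(t - t_half) / tau)) after min-max
    normalization, so logit(y) = (t - t_half) / tau is linear in t.
    Only points in the growth phase (lo < y < hi) are used.
    """
    y_min, y_max = min(signal), max(signal)
    norm = [(s - y_min) / (y_max - y_min) for s in signal]
    pts = [(t, math.log(y / (1.0 - y))) for t, y in zip(times, norm) if lo < y < hi]
    if len(pts) < 2:
        raise ValueError("need at least two points in the growth phase")
    n = len(pts)
    t_mean = sum(t for t, _ in pts) / n
    l_mean = sum(v for _, v in pts) / n
    # ordinary least squares: slope = 1/tau, intercept = -t_half/tau
    slope = sum((t - t_mean) * (v - l_mean) for t, v in pts) / sum(
        (t - t_mean) ** 2 for t, _ in pts
    )
    intercept = l_mean - slope * t_mean
    return -intercept / slope  # time at which logit(y) = 0, i.e., y = 0.5
```

Because the first and last points define the normalization, plateaus that are not fully reached will bias the estimate; the Prism fit should remain the reference.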

Modifying Amyloid Motif Aggregation Through Local Structure

3.5 Molecular Dynamics Simulation

• Build a linear structure for the sequence of interest using the “Build” function in PyMOL and save the structure as a “.pdb” file.

• Peptides were built for the following sequences:

R1R3 WT: TENLKHQPGGGKVQIVYK
R1R3 P270S: TENLKHQSGGGKVQIVYK
R2R3 WT: KDNIKHVPGGGSVQIVYK
R2R3 P301S: KDNIKHVSGGGSVQIVYK

• Build the MD system in GROMACS using the AMBER99sbildn force field [18] and explicit SPC/E water (#1; see Note 7). Use a dodecahedron box and add Na+ or Cl− ions to neutralize the system (#1–6). This prepares the system for minimization and equilibration.

• Create five copies of the system for five independent runs.

• For minimization, stop after 1009 steepest-descent steps of converged energy minimization (#7–8). This is followed by 10 ns of NVT equilibration (#9–10) and 20 ns of NPT equilibration: the first 10 ns with the Berendsen barostat [19] (#11–12) and the last 10 ns with the Parrinello–Rahman barostat [20] (#13–14).

• For production, use a 5 fs time step, enabled by hydrogen-only virtual sites [21]. The production trajectories are calculated in the NPT ensemble with the Parrinello–Rahman barostat, periodic boundary conditions, and Particle Mesh Ewald (PME) summation [22] for long-range electrostatics (#15–16).

• Run each trajectory for 1 μs, for a total of 5 μs split into five independent runs per peptide of interest (see Note 8).

## Create coordinate and topology file
1- gmx pdb2gmx -vsite h -f peptide.pdb -ignh -o peptide.gro -water spce
## center, form box, solvate, add ions
2- gmx editconf -f peptide.gro -o final_cen.gro -center 0 0 0
3- gmx editconf -f final_cen.gro -o newbox.gro -c -d 1.5 -bt dodecahedron
4- gmx solvate -cp newbox.gro -cs spc216.gro -p topol.top -o solvated.gro
5- gmx grompp -f ions.mdp -c solvated.gro -p topol.top -o ions.tpr
6- gmx genion -s ions.tpr -o solvated_ions.gro -p topol.top -pname NA -nname CL -neutral
## minimize, nvt, and then 2 different NPT equilibrations
7- gmx grompp -f minim.mdp -c solvated_ions.gro -p topol.top -o minimized.tpr
8- gmx mdrun -v -deffnm minimized
9- gmx grompp -maxwarn 2 -f nvt.mdp -c minimized.gro -p topol.top -o nvt.tpr
10- gmx mdrun -v -deffnm nvt
11- gmx grompp -maxwarn 2 -f berend.mdp -c nvt.gro -p topol.top -o berendsen.tpr
12- gmx mdrun -v -deffnm berendsen
13- gmx grompp -maxwarn 2 -f par.mdp -c berendsen.gro -p topol.top -o par.tpr
14- gmx mdrun -v -deffnm par
## production level NPT, with Parrinello-Rahman pressure coupling
15- gmx grompp -maxwarn 2 -c par.gro -f npt.mdp -t par.cpt -p topol.top -o npt.tpr
16- gmx mdrun -v -deffnm npt


Sofia Bali and Lukasz A. Joachimiak

A total of 20 μs of trajectories was generated (5 μs each for R1R3 WT, R2R3 WT, R1R3 P270S, and R2R3 P301S) over 20 independent runs. All simulations were performed on the University of Texas Southwestern Medical Center bioHPC computing cluster running Red Hat Enterprise Linux 7.4.

3.6 Molecular Dynamics Analysis

• For each replicate, prepare the trajectory for analysis using gmx trjconv with “-pbc cluster” and “-fit rot+trans” (#17–18). Then discard the first 10 ns, which contain the initial conformational changes away from the extended starting state, and keep one frame every 100 ps (#19).

• For each replicate, calculate the root mean square fluctuation (RMSF) using “gmx rmsf” with the “-res” flag to output the per-residue RMSF and the average structure (#20). Then calculate the 95% confidence interval of the RMSF to assess the spread across replicates (Fig. 2a, d).

• Use a paired t-test to determine whether two peptides differ significantly overall, and multiple comparisons to determine per-residue differences between two peptides. Our results were calculated with GraphPad Prism 8.4 (Fig. 2a, d).

• To create the contact maps across the five simulations, first combine the individual trajectories into one using “gmx trjcat” (#21). Then calculate the mean minimum distance using the “gmx mdmat” command with the “-mean” option (#22) (Fig. 2c–f).

• To determine the interactions between the amyloid motif (i.e., VQIVYK) and the preceding sequence, calculate the pairwise distances between residues. First, use “gmx make_ndx” to create an index file with two groups for the first and second halves of the peptide (e.g., r 1-9 and r 10-18) (#23). Then use “gmx pairdist” to calculate the distance between every pair of residues in the two groups (our output contained 81 columns, one per pair, with a distance for every time point); use a large distance cutoff (e.g., 3.2 nm) to capture all distances and improve performance (#24). Then use “gmx analyze” with the “-dist” flag to obtain the distribution of contact distances per residue pair (#25). Repeat for each replicate. For our sample peptides, this gave 81 distributions for each replicate of each peptide.

• Determine the pairwise % contacts using a cutoff of 0.45 nm, i.e., calculate from the pairwise distance distributions the percentage of distances below the cutoff. Repeat for each replicate and calculate the mean pairwise % contacts for each peptide. To compare contacts between peptides, perform a Kolmogorov–Smirnov (KS) test for each pairwise % contact to determine which contacts differ significantly across peptides. Then visualize the results by plotting the difference in pairwise % contacts (Fig. 3).

• Randomly select structures with significantly different contacts from the simulations using the time point/pairwise distance file produced by “gmx pairdist.” Use a text processor to select the rows in which the distance is below the cutoff (0.45 nm) in every column that represents a significant contact, and output those time points. Then randomly select one of these time points, output the structure at that time, and visualize the significant contacts by mapping them onto the structure (Fig. 3c). To output a *.pdb file for a specific time point, use the “gmx trjconv” command with the “-dump” flag (#26).

## Center and fix trajectory
17- gmx trjconv -s npt.tpr -f npt.xtc -center -pbc cluster -o npt_centered.xtc
18- gmx trjconv -f npt_centered.xtc -s npt.tpr -fit rot+trans -o npt_centered_fitted.xtc
19- gmx trjconv -f npt_centered_fitted.xtc -o npt_centered_fitted_clipped.xtc -b 10000 -skip 10
# Calculate RMSF, contact map, and pairwise %contact
20- gmx rmsf -f npt_centered_fitted_clipped.xtc -s npt.tpr -o rmsf_perres.xvg -ox average.pdb -oq bfactors-residue.pdb -res
21- gmx trjcat -f npt_centered_fitted_clipped* -o combined.xtc -cat -settime
22- gmx mdmat -f combined.xtc -s npt.tpr -mean contactmap.xpm
23- gmx make_ndx -f npt.tpr -o grps.ndx
24- gmx pairdist -f npt_centered_fitted.xtc -s npt.tpr -n grps.ndx -o PWdist.xvg -refgrouping res -selgrouping res -cutoff 3.2
25- gmx analyze -f PWdist.xvg -dist PWdist_distribution.xvg -bw 0.2 > PWdist_stats.dat
26- gmx trjconv -f npt_centered_fitted_clipped.xtc -s npt.tpr -dump ### -o sample_struct.pdb
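The statistics in this section were computed with GROMACS tools and GraphPad Prism. For readers who prefer to script them, the per-residue 95% confidence interval across replicates, the pairwise % contacts at the 0.45 nm cutoff, a two-sample KS comparison, and the random choice of a frame in which given contacts are formed can be sketched in Python as below. These are illustrative sketches under stated assumptions: the .xvg reader assumes GROMACS-style text in which lines starting with # or @ are headers and the first data column is time; the KS p-value uses the standard asymptotic Kolmogorov approximation; the KS test is assumed to compare the per-frame distances of one residue pair between two peptides; and all function names are hypothetical.

```python
import math
import random
import statistics

# two-sided 95% Student t critical values, keyed by degrees of freedom (n - 1)
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def read_xvg(text):
    """Parse GROMACS .xvg text into rows of floats, skipping '#' and '@' lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith(("#", "@")):
            rows.append([float(v) for v in line.split()])
    return rows


def mean_ci95(values):
    """Mean and 95% CI half-width across replicates (2 to 11 values supported)."""
    n = len(values)
    half = T_975[n - 1] * statistics.stdev(values) / math.sqrt(n)
    return statistics.mean(values), half


def percent_contact(distances, cutoff=0.45):
    """Percentage of frames in which a residue-pair distance (nm) is below cutoff."""
    return 100.0 * sum(d < cutoff for d in distances) / len(distances)


def ks_2samp(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D and its asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        # advance both empirical CDFs past all values equal to x
        while i < na and a[i] <= x:
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    n_eff = na * nb / (na + nb)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:  # Kolmogorov tail probability is 1 to machine precision here
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


def random_contact_frame(rows, columns, cutoff=0.45, rng=random):
    """Random time (column 0) at which all given distance columns are below cutoff."""
    times = [r[0] for r in rows if all(r[c] < cutoff for c in columns)]
    return rng.choice(times) if times else None
```

The time returned by random_contact_frame can be passed to the -dump option of gmx trjconv (#26).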

This protocol leverages the structural detail of molecular dynamics simulations to understand the aggregation properties of peptides containing amyloid motifs. Using peptides in the aggregation experiments makes it possible to observe how specific sequence modifications change aggregation behavior. These changes in behavior can then be correlated, using MD simulations, with specific contacts that either stabilize or disrupt local structure. MD simulations can also provide information on other properties, such as the backbone conformation and the solvent-accessible surface area of the amyloid motif, which can be used to assess how conformational changes affect the aggregation properties of the peptide. Studies of the structural behavior around amyloid motifs can reveal early steps in protein aggregation that could be used for the diagnosis and treatment of protein misfolding diseases.

4 Notes

1. Rather than purchasing highly purified peptides (>95%), crude peptide preparations can be purchased from peptide vendors at a lower cost. Crude peptide preparations can be purified to >95% purity by RP-HPLC using an Agilent ZORBAX StableBond 250 mm C8 column (Agilent) eluting with an acetonitrile gradient. The identity of the species must be confirmed by mass spectrometry. Mass-verified peptides can be lyophilized and used in subsequent ThT fluorescence aggregation experiments.

2. Peptide monomerization is essential to remove any preexisting seeding species from the lyophilized peptide samples. The extent of peptide solubilization can be confirmed from the recovery yields per mass by RP-HPLC. If TFA solubilization is insufficient, hexafluoroisopropanol (HFIP) can be used to monomerize the peptides [23]. Some peptides (e.g., amyloid-β) may also be dissolved in ammonium hydroxide.

3. Acidic buffer conditions (pH < 5) will interfere with the aggregation kinetics of the peptide. To avoid this, after disaggregation in TFA, blow off the acid with nitrogen gas in a fume hood, freeze in liquid nitrogen, and lyophilize for 1 h. Resuspend the peptide in 0.22 μm filtered Millipore water, then flash-freeze it in liquid nitrogen and lyophilize overnight to remove residual TFA.

4. For a more accurate concentration, calculate the E0.1% (1 mg/mL) value at 205 nm: the extinction coefficient at 205 nm [M⁻¹ cm⁻¹] divided by the molecular weight [g/mol]. Update the value on your small-volume spectrophotometer for each peptide. Sample calculator: spin.niddk.nih.gov/clore/Software/A205.html.

5. ThT-containing buffers should have a light yellow color.


6. Gentle shaking of the plate in the plate reader (if possible) during the incubation period can help accelerate peptide aggregation kinetics.

7. N- and C-terminal modifications (e.g., acetylation and amidation) may also be added in PyMOL for use with the AMBER99sbildn force field, but these modifications require changes to the AMBER atom definition file. If a non-AMBER force field is used, N- and C-terminal modifications can be added directly in GROMACS by adding the “-ter” flag to the “gmx pdb2gmx” command.

8. Simulation length and number of replicates should be determined empirically, as the sampling required for analysis depends on peptide length and sequence. Increasing the simulation length may enrich more stable conformations, while increasing the number of replicates may increase the number of conformations sampled.
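The arithmetic in Note 4 can be sketched as follows. The extinction coefficient at 205 nm has to come from a calculator such as the one linked in Note 4; the values in the usage line below are placeholders, not values for the peptides in this chapter, and the function names are illustrative.

```python
def e01_percent(eps205, mw):
    """E0.1%: absorbance of a 1 mg/mL solution (1 cm path) at 205 nm,
    from the molar extinction coefficient [M^-1 cm^-1] and molecular weight [g/mol]."""
    return eps205 / mw


def concentration_mg_per_ml(a205, eps205, mw, path_cm=1.0):
    """Beer-Lambert law with E0.1%: c [mg/mL] = A205 / (E0.1% * path length)."""
    return a205 / (e01_percent(eps205, mw) * path_cm)


def concentration_um(a205, eps205, path_cm=1.0):
    """Molar form of the Beer-Lambert law, c = A / (eps * l), converted from M to uM."""
    return a205 / (eps205 * path_cm) * 1e6
```

For example, with a placeholder ε205 of 50,000 M⁻¹ cm⁻¹ and a molecular weight of 2000 g/mol, E0.1% is 25, and an absorbance of 0.5 corresponds to 0.02 mg/mL, i.e., 10 μM.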

Acknowledgments

Work in the Joachimiak lab was supported by the Marie Effie Cain Endowed Scholarship, a Chan Zuckerberg Initiative Collaborative Science Award (2018-191983), and a Welch Foundation grant (I-1928-20200401). SB was supported by a Molecular Biophysics Training Grant (T32 GM008297). We thank Dr. Milo Lin and Dr. Levant Sari for their guidance in the molecular dynamics simulations. We thank members of the Joachimiak lab for discussions.

References

1. Geddes AJ et al (1968) “Cross-beta” conformation in proteins. J Mol Biol 32(2):343–358
2. Soto C (2003) Unfolding the role of protein misfolding in neurodegenerative diseases. Nat Rev Neurosci 4(1):49–60
3. Thompson MJ et al (2006) The 3D profile method for identifying fibril-forming segments of proteins. Proc Natl Acad Sci U S A 103(11):4074–4078
4. Walsh I et al (2014) PASTA 2.0: an improved server for protein aggregation prediction. Nucleic Acids Res 42(Web Server issue):W301–W307
5. Rousseau F, Schymkowitz J, Serrano L (2006) Protein aggregation and amyloidosis: confusion of the kinds? Curr Opin Struct Biol 16(1):118–126
6. Goldschmidt L et al (2010) Identifying the amylome, proteins capable of forming amyloid-like fibrils. Proc Natl Acad Sci U S A 107(8):3487–3492
7. Sabate R et al (2015) What makes a protein sequence a prion? PLoS Comput Biol 11(1):e1004013
8. Uversky VN, Oldfield CJ, Dunker AK (2008) Intrinsically disordered proteins in human diseases: introducing the D2 concept. Annu Rev Biophys 37:215–246
9. Chen D et al (2019) Tau local structure shields an amyloid-forming motif and controls aggregation propensity. Nat Commun 10(1):2493
10. Shida T et al (2020) Short disordered protein segment regulates cross-species transmission of a yeast prion. Nat Chem Biol 16(7):756–765
11. Mirbaha H et al (2018) Inert and seed-competent tau monomers suggest structural origins of aggregation. eLife 7:e36584
12. Schoch KM et al (2016) Increased 4R-tau induces pathological changes in a human-tau mouse model. Neuron 90(5):941–947
13. Tribello GA et al (2014) PLUMED 2: new feathers for an old bird. Comput Phys Commun 185(2):604–613
14. Louros N et al (2020) WALTZ-DB 2.0: an updated database containing structural information of experimentally determined amyloid-forming peptides. Nucleic Acids Res 48(D1):D389–D393
15. Buchan DWA, Jones DT (2019) The PSIPRED protein analysis workbench: 20 years on. Nucleic Acids Res 47(W1):W402–W407
16. Petersen B, Lundegaard C, Petersen TN (2010) NetTurnP – neural network prediction of beta-turns by use of evolutionary information and predicted protein sequence features. PLoS One 5(11):e15079
17. Wilkins MR et al (1999) Protein identification and analysis tools in the ExPASy server. Methods Mol Biol 112:531–552
18. Lindorff-Larsen K et al (2010) Improved side-chain torsion potentials for the Amber ff99SB protein force field. Proteins 78(8):1950–1958
19. Berendsen HJC et al (1984) Molecular dynamics with coupling to an external bath. J Chem Phys 81(8):3684–3690
20. Parrinello M, Rahman A (1981) Polymorphic transitions in single crystals: a new molecular dynamics method. J Appl Phys 52(12):7182–7190
21. Feenstra KA, Hess B, Berendsen HJC (1999) Improving efficiency of large time-scale molecular dynamics simulations of hydrogen-rich systems. J Comput Chem 20:786–798
22. Darden T, York D, Pedersen L (1993) Particle mesh Ewald: an N·log(N) method for Ewald sums in large systems. J Chem Phys 98(12):10089–10092
23. O’Nuallain B et al (2006) Kinetics and thermodynamics of amyloid assembly using a high-performance liquid chromatography-based sedimentation assay. Methods Enzymol 413:34–74

Chapter 16

Assessing the Stability of Biological Fibrils by Molecular-Scale Simulations

Rodrigo A. Moreira, Joseph L. Baker, Horacio V. Guzman, and Adolfo B. Poma

Abstract

The nanomechanical characterization of biological fibrils formed by protein aggregation is nowadays feasible with molecular dynamics simulations, which, together with atomic force microscopy experiments, have widened our understanding of forces in the pN–nN regime for systems of about hundreds of nanometers in size. Several methodologies have been developed to this end, ranging from atomistic representations based on molecular force fields to coarse-grained strategies that systematically provide results comparable with experiments. In this chapter, we discuss several methodologies for calculating mechanical parameters, such as the elastic constants, of relevant biological systems, together with details about their parameterization and current limitations. We then discuss applications of these methodologies to bacterial filaments and β-amyloid systems. Finally, the latest lines of development are discussed.

Key words Nanomechanics, Protein aggregate, Coarse graining, Bacterial filament, β-Amyloid, Young modulus, All atom MD, Gō-MARTINI, Steered MD, AFM, Indentation, Tensile, Shear

1 Introduction

The fast-paced development of atomic force microscopy (AFM) [1] has had a major impact on several fields of science and technology, such as biology, physics, chemistry, and biotechnology. The ability of AFM to record high-resolution images of different materials and its versatility in quantifying mechanical properties (fatigue, elastic parameters, hardness, etc.) from data collected in situ make it an ideal technique for assessing stability and for the codesign of novel biomaterials. In particular, AFM has become a routine tool for probing the mechanical stability of protein systems ranging from single protein modules to large protein assemblies, such as virus-like particles and long protein fibrils (e.g., bacterial filaments and β-amyloid systems [2, 3]). However, despite the many useful insights that can

Mai Suan Li et al. (eds.), Computer Simulations of Aggregation of Proteins and Peptides, Methods in Molecular Biology, vol. 2340, https://doi.org/10.1007/978-1-0716-1546-1_16, © Springer Science+Business Media, LLC, part of Springer Nature 2022


Rodrigo A. Moreira et al.

be obtained by AFM, a detailed molecular analysis of the mechanical response of protein filaments at the nanometer scale is still challenging. In this regard, several computational strategies, based on high-resolution methods such as atomistic simulations, on low-resolution ones such as coarse-grained approaches, and on continuum models, have been further extended; they are discussed in this chapter. Continuum models are broadly used among experimental scientists because they can be applied rapidly to estimate elastic properties from the response of the system. However, the underlying microscopic processes involved in the first stages of mechanical deformation occur at smaller length and time scales and are therefore in general not accounted for by continuum models. To further elucidate phenomena that take place in very high-resolution AFM and cannot be explained by purely continuum models [4], we consider atomistic simulations and coarse-grained methodologies. The latter are becoming an important complementary tool to validate atomistic predictions, as coarse-grained methodologies are less limited by the time and length scales of the underlying physical process under investigation. Coarse-grained models are useful because they remove unnecessary degrees of freedom of the system, which enables them to reach the experimental time and length scales needed to describe the relevant phenomena while maintaining a molecular-level description of the systems under consideration [5]. In the next section, we describe three computational approaches for the study of mechanical stability in protein aggregates.

2 Methodologies for the Characterization of the Mechanical Stability

2.1 The Continuum Approach

The mechanical interaction of pairs of contacting objects is a challenging research question across different scales and is hence tackled with different methods, such as continuum mechanics, molecular dynamics, and finite element analysis. The choice of method depends crucially on the required accuracy, the interacting length scales, and the available computational power. All-atom and coarse-grained MD are the most computationally demanding of these methods and are covered in Sects. 2.2 and 2.3. In this section, we formulate the mechanical stability of biological fibrils from a continuum perspective, specifically for a rigid indenter in contact with a biological fibril, which resembles the interaction between the atomic force microscopy tip and the fibril under study. While several analytical theories have been proposed to describe such mechanical responses, the most robust and widely used is still Hertzian mechanics.

Assessing the Mechanical Stability of Biological Fibrils


Fig. 1 (a) Before deformation, two elastic spheres touch at point 0, and two adjacent points can be defined with the functions wt(0) and ws(0). (b) Applying a concentrated force F0 brings the spheres into mechanical compression, whereby the relative penetrations are defined by wt(r) and ws(r). (c) Integration area for calculating the force distribution over the contact area as a function of θ and s

2.1.1 Hertzian Mechanics

In 1882, Hertz [6] first solved the problem of the pressure distribution between two elastic bodies in mechanical contact (see Fig. 1). The assumptions made by Hertz can be summarized as follows:

• The contacting bodies are solids made of isotropic and homogeneous materials in the linear elastic regime.

• The contact radius is very small compared with the radii of curvature of the undeformed bodies.

• Each sphere can be considered as an elastic half-space. In the case of a tube, this condition is generally not fulfilled and applies only to a one-dimensional surface sample.

• Only normal stress is present at the contact interface; no friction takes place.

Boussinesq’s solution [7] for a perpendicular force $F_0$ concentrated at the point $r = 0$ of a half-space determines a vertical displacement of the surface $w(r)$, which can be expressed as

$$ w(r) = \frac{F_0\left(1-\nu^2\right)}{\pi E r} \tag{17.1} $$

where $E$ and $\nu$ denote the Young’s modulus and Poisson’s ratio of the half-space, respectively, and $r$ is the radial distance from the point where the concentrated force is applied. Based on this solution, Hertz expressed the surface displacement of each of the two contacting spheres as

$$ w_i(r) = \frac{1-\nu_i^2}{\pi E_i}\int_0^{a_c}\int_0^{2\pi}\frac{P(s)\,s\,\mathrm{d}\theta\,\mathrm{d}s}{\left(r^2+s^2-2rs\cos\theta\right)^{1/2}} \tag{17.2} $$


where $P$ is the pressure acting over the contact area, and $s$ and $\theta$ are the coordinates defined in Fig. 1. In addition, by assuming that the profiles of the spheres are parabolic, $r^2/(2R)$,

$$ w_1 + w_2 = \delta - \frac{r^2}{2R}, \qquad (r < a_c) \tag{17.3} $$

$$ \frac{1}{\pi E^*}\int_0^{a_c}\int_0^{2\pi}\frac{P(s)\,s\,\mathrm{d}\theta\,\mathrm{d}s}{\left(r^2+s^2-2rs\cos\theta\right)^{1/2}} = \delta - \frac{r^2}{2R}, \qquad (r < a_c) \tag{17.4} $$

where the reduced (or combined) Young’s modulus of both spheres is $E^* = \left[(1-\nu_1^2)/E_1 + (1-\nu_2^2)/E_2\right]^{-1}$ and the reduced radius is $R = (1/R_1 + 1/R_2)^{-1}$. By assuming an elliptical pressure distribution and no lateral displacement of the elastic bodies, Hertz obtained the solution to Eq. (17.4) as

$$ P(r) = \frac{2a_c E^*}{\pi R}\left(1-\frac{r^2}{a_c^2}\right)^{1/2}, \qquad (r < a_c) \tag{17.5} $$

where the penetration is $\delta = a_c^2/R$. Furthermore, integrating $P(r)$ over the contact area yields the Hertzian force [7],

$$ F = \frac{4E^* a_c^3}{3R_t} = \frac{4E^*\sqrt{R_t}}{3}\,\delta^{3/2} \tag{17.6} $$

where $R \to R_t$. Hertzian mechanics has proven to provide a good description at macroscopic scales, down to a lower limit at the nanoscale, although it does not consider intermolecular adhesion forces between two solid surfaces. After Hertz’s pioneering work, Johnson, Kendall, and Roberts (JKR) [8] and Derjaguin, Muller, and Toporov (DMT) [9] introduced adhesion forces into the Hertzian force of Eq. (17.6), as explained elsewhere [7]. Interestingly, the DMT and JKR models predict different values of the adhesion force. In 1976, Tabor introduced the Tabor number ($\mu_T$) [10], which is built into a function that reproduces both the JKR and DMT models. The range of application of the different contact mechanics models is illustrated in Fig. 2. Nonetheless, this analytical framework relies on several assumptions: the deformations are small ($a_c \ll R_t$), viscosity is not considered, and the material is in the linear elastic regime. Moreover, no lateral deformations are induced and the contact geometry is axisymmetric. Within these limits, the continuum mechanics framework has provided good descriptions of experimental results at the nanoscale in specific cases [13–15]. Nonetheless, atomistic simulations have revealed some restrictions on the application of continuum mechanics [4].
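Equations (17.5) and (17.6) are easy to evaluate numerically, which is a useful check on units and orders of magnitude. The sketch below assumes SI units throughout and, by default, a flat sample ($R_2 \to \infty$, so that $R \to R_t$); the function names are illustrative, and the parameters in the usage note (a 5 nm tip radius, a 1 GPa sample with ν = 0.5, and a rigid tip) are arbitrary example values, not results from this chapter.

```python
import math


def reduced_modulus(e1, nu1, e2, nu2):
    """E* = 1 / [(1 - nu1^2)/E1 + (1 - nu2^2)/E2] in Pa; a rigid body has E = inf."""
    return 1.0 / ((1.0 - nu1**2) / e1 + (1.0 - nu2**2) / e2)


def reduced_radius(r1, r2=math.inf):
    """R = 1 / (1/R1 + 1/R2); a flat half-space corresponds to R2 = inf."""
    return 1.0 / (1.0 / r1 + 1.0 / r2)


def hertz(delta, e_star, radius):
    """Contact radius a_c, peak pressure P(0) and force F for an indentation delta (SI)."""
    a_c = math.sqrt(radius * delta)                                # delta = a_c^2 / R
    p0 = 2.0 * a_c * e_star / (math.pi * radius)                   # Eq. (17.5) at r = 0
    force = 4.0 * e_star * math.sqrt(radius) * delta**1.5 / 3.0    # Eq. (17.6)
    return a_c, p0, force
```

With these example values, hertz(1e-9, reduced_modulus(math.inf, 0.2, 1e9, 0.5), reduced_radius(5e-9)) gives a force of about 4.0 nN for a 1 nm indentation. Integrating the elliptical pressure of Eq. (17.5) over the contact disk gives $(2/3)\pi a_c^2 P(0)$, which reproduces the same force and is a convenient consistency check.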
In a liquid environment, the aforementioned adhesion forces are generally screened [16, 17], and hence the interaction forces sensed at the tip are determined by the electrical double-layer, viscoelastic, and solvation forces. Nevertheless, under high-resolution AFM imaging conditions, the latter forces are commonly neglected during characterization [18–20].

Fig. 2 Johnson and Greenwood continuum mechanics map as a function of the reduced Young modulus $E^*$. The parameters corresponding to the curves are: $R_t$ = 5 nm, surface tension $\gamma_s$ = 2 mJ/m², and $\sigma_{th} = 2\gamma_s/z_{eq}$. The Maugis–Dugdale contact model with the adhesive force is described in [11]. The figure has been adapted from [12]

2.1.2 The Ensemble Tip and Biological Fibrils

In Sect. 2.1.1, we described the general theoretical framework for obtaining the mechanical properties of matter from an exerted force. To properly describe biological fibrils, however, the modeling must be narrowed down to analytical theories rooted in Hertzian mechanics, such as Tatara’s model. In 1989, Tatara proposed a model that describes both vertical and lateral displacements during the compression of a soft body [21]. This soft body fulfills Boussinesq’s incompressibility criteria [7], and hence the same Hertzian theoretical framework can be used. Here, the total indentation due to vertical and lateral displacements of the sample is written as

$$ \delta_T = \frac{F^{2/3}}{n^{2/3}} - \frac{F\,a_{c,i}^{2}}{2R_i} + F K_i \tag{17.7} $$

where $n = (4/3)E^*\sqrt{R}$. The total indentation of Eq. (17.7) reflects only the tip–sample contact point and not the sample–substrate one, as depicted in Fig. 3a. However, the study is assumed not to focus on a half-space sample, because fibrils are commonly treated as cylindrical bodies, that is, finite samples.


Rodrigo A. Moreira et al.

Fig. 3 (a) Illustration of Tatara contact mechanics when a force is applied to a finite sample that undergoes deformation at two contacts (tip–sample and sample–substrate). (b) The force for a single point of contact, that is, when the sample is considered as a half-space (Rs > Rt). In addition, a simulation result of the indentation cycle in the elastic regime is shown, in terms of force (blue dashed line) and tip position (black line). (Adapted from [20])

Combining Johnson's method for cylindrical bodies (pp. 129–134 in [7]) with a series expansion of Eq. (7), the exerted force can be written as a sum of power laws:

$$F_{\mathrm{Tatara}} = n\,\delta_T^{3/2} + \frac{3n^{2}}{2n_c}\,\delta_T^{2} + \frac{15n^{3}}{8n_c^{2}}\,\delta_T^{5/2} \qquad (17.8)$$

where

$$n_c = \frac{4\pi E_t R_t E_s R_s}{6 + \nu_t - 2\nu_t^{2} + \nu_s - 2\nu_s^{2}} \qquad (17.9)$$

The subscripts t and s denote the tip and the sample, respectively. In the above equations, δT is the total indentation, νt and νs are the Poisson ratios, and Et and Es are the Young moduli. Figure 3a depicts the tip–finite sample ensemble. The forces described above can be applied to a general type of surface or to a specific one. So far, a widespread assumption has been to model these forces for a sample surface characterized as a half-space. There are, however, alternative models that can be applied to cylindrical bodies, such as the Tatara model [21]. In the next sections, we present two computational approaches that can complement the continuum model in the nanoscale regime.

2.2 All-Atom MD Approach for the Study of Filament Response to Force

All-atom molecular dynamics simulation is a powerful computational method in which the equations of classical (Newtonian) mechanics are used to propagate the motion of every atom in the molecular system over time [22–26]. By evaluating the motion of particles in a particular ensemble (NPT, NVT, etc.), one can study the underlying physics and chemistry of molecules and directly visualize molecular motion at a temporal and spatial resolution that is unachievable by experimental methods. In contrast to the continuum approach (described above) or to structure-based coarse-grained methods (described in a subsequent section), all-atom molecular dynamics methods provide insights into proteins and other biomolecular systems with a high degree of chemical specificity. This section provides a brief overview of some considerations for the application of all-atom molecular dynamics simulation [22–26], and in particular the steered molecular dynamics method [27–31], to the study of the properties of protein filaments. Steered molecular dynamics is a method in which external forces are applied to a molecular system in addition to the terms of the usual molecular dynamics potential; the approach can mimic experimental methods such as optical tweezers or atomic force microscopy, in which individual proteins or macromolecular complexes are extended under force [27–31].

2.2.1 Preparing the Initial Coordinates for All-Atom MD Simulation

Cryo-electron microscopy-based models of protein filaments with periodic structure are reaching increasingly high resolution [32, 33] and have been determined for systems such as pili, actin, and microtubules [34–37]. These models can be used as the initial atomic positions for molecular dynamics simulations of filament systems (see, for example, [2, 38, 39]). To prepare the coordinates for all-atom simulation, the filament system can be solvated with explicit water molecules, and monovalent ions can be added to neutralize the net protein charge (and to provide a specific ion concentration if desired), using a number of software packages [40–42]. A schematic example of a protein filament in a water box is depicted in Fig. 4a, in which the long dimension of the water box is aligned along the central axis of the filament to help reduce the overall system size. With explicit water molecules, the number of simulated atoms can increase quickly, and therefore limitations related to system size become an immediate consideration [40–42]. In a steered molecular dynamics simulation in which the protein filament will be extended beyond its initial length, additional water molecules must also be included along the direction of filament elongation (for example, along the filament axis in Fig. 4a). Therefore, a great deal of computational effort may be spent on the dynamics of these additional water molecules, and the computational cost of all-atom explicit solvent MD simulations can increase quickly, depending on the length of the filament one intends to model and on whether the effects of the steering force beyond the initial stages of filament elongation are of interest [2].


Fig. 4 (a) Depiction of an all-atom filament (image is derived from the PDB entry 5KUA [43], with residues 1–30 of each subunit removed). Each subunit is colored differently for clarity. Background shading represents the explicit solvent water environment in which the filament would be embedded in an MD simulation. The orange arrow depicts the direction along which an external force would be applied in steered MD to produce filament elongation. The protein structure was rendered using VMD [44]. (b) A cartoon example of a force vs. extension plot for an arbitrary system. Blue shaded region represents the linear ("elastic") region of the force vs. extension curve that would be used to determine Keff in the equation for Young's modulus, Eq. (12)

2.2.2 Steered All-Atom MD for Assessing Filament Force Response

While the details of preparing a protein filament for steered molecular dynamics vary widely with the specifics of the protein being simulated, a series of energy minimization, heating, and equilibration stages is typically performed to prepare the filament system for further investigation [2, 45, 46]. Each of these stages (or a subset of them) may include harmonic positional restraints on the backbone or alpha carbon atoms of the filament, which help maintain the overall filament structure during these initial simulation stages. Once the system has been adequately equilibrated, steered molecular dynamics simulations can be used to explore filament conformational change. Here we focus on some general attributes of the constant velocity steered molecular dynamics method as applied to filament simulations, while Sect. 3 discusses more specific details and parameter values used in simulations of bacterial filaments. To simulate filament elongation using constant velocity steered MD, one must choose a set of atoms to act as the pulled and/or fixed atoms, as well as the pulling direction, the pulling speed, and the stiffness of the spring that guides the steered atoms along the pulling direction [2, 27–31, 46]. For example, if the long axis of the protein filament is aligned with the z-axis of the coordinate system, filament extension can be simulated by pulling along the positive z-direction. It is advisable to take a trial-and-error approach, exploring several spring constants and pulling speeds to gauge how robust the results of a particular simulation are with respect to these choices. One must be mindful, however, that although pulling speeds as slow as possible are desirable to approach experimental conditions, even the slowest speeds attainable in all-atom steered MD simulations of large systems are typically many orders of magnitude faster than those used in experiments [29, 31, 47]. From the constant velocity steered MD simulations, the filament force response can be used to calculate the Young's modulus of the filament by performing a linear fit to the force versus extension data in the "elastic" region of the filament deformation [48, 49] (see Fig. 4b). Estimates of the cross-sectional area and the initial length of the filament model are also needed for the Young's modulus calculation; these can be obtained by measuring the radius and length of the filament directly from the constructed system coordinates. Furthermore, a specific advantage of using the all-atom steered MD method to investigate protein filaments is the ability to obtain information about specific atomistic interactions that cannot be monitored with a more coarse-grained approach. In aggregate, these interactions can be very important for filament stability, so monitoring how they change as a function of filament elongation under stress can provide important insights into the interactions most critical for filament integrity. For example, many prebuilt tools are available for measuring hydrophobic contacts, hydrogen bonds, salt-bridge interactions, solvent accessible surface area, protein secondary structure, and a wide variety of other features that are accessible at the all-atom level of resolution [44, 50–54]. Finally, if the free energy along the reaction path (in this case, the filament extension) is desired, the Jarzynski equality [55] relates the equilibrium free energy to the nonequilibrium work obtained in steered molecular dynamics simulations.
Therefore, in principle, the equilibrium free energy could be calculated from steered MD simulations; however, care must be taken in selecting initial snapshots from an equilibrated thermodynamic ensemble for use with the Jarzynski method, and the practical limitations of this approach would be particularly significant for large macromolecular assemblies such as protein filaments [56–58].

2.3 Structure-Based Coarse Grained Model for the Nanomechanical Characterization

The time scales of several biological processes, such as protein unfolding and protein aggregation toward the fibril-like state, range from a few 10⁻⁶ to 10⁻³ s and are thus orders of magnitude longer than the typical timescales captured in brute-force all-atom MD (e.g., 10⁻¹⁵ to 10⁻¹² s). The combination of advanced high performance computing (HPC) and enhanced sampling methods has enabled all-atom MD to reach the 10⁻⁶ s timescale, but access to HPC can be restricted to a few scientific communities, and its cost can be prohibitive for extensive sampling of rare events. Similarly, the length scales accessible to all-atom MD simulation are much smaller than those relevant for studying processes involving large conformational changes in large biological complexes. In the context of assessing the mechanical stability of various protein fibrils, for example, β-amyloids [59, 60] and collagen [61], all-atom models have been used to estimate the Young modulus from the response of the system, but only approximately. Still, molecular-level methods are necessary to understand the microscopic mechanisms of the mechanical response of biological fibrils. In this regard, structure-based coarse-grained (CG) models are suitable [62–66], because they remove several degrees of freedom of the system, which enables them to reach the experimental time and length scales required to describe the relevant phenomena while maintaining a molecular-level description of the systems under consideration. In particular, CG simulations can describe large structural changes in the context of fibril deformation, which would otherwise be impossible with all-atom MD models. CG models can also be used to infer the elastic parameter under ideal conditions, given by the Hertz model (Sect. 2.1.1), which is valid for isotropic materials. Because biological matter is anisotropic, it is not expected a priori to follow the simple Hertz relation (see Eq. (6)), $F \propto Y_T\,\delta^{3/2}$ (with YT the transverse Young modulus and δ the indentation depth). When it actually follows this relationship, the Young modulus can be obtained from the slope of the force vs. δ^{3/2} curve. Such an approach can be used to test the experimental estimation of an elastic property. Most importantly, the deformation mechanism that gives rise to the linear response can be characterized in the CG simulations. In the next sections, we provide more information about CG methodologies for modeling large fibril assemblies: Sect. 2.3.1 describes the determination of native contacts, the so-called contact map, and Sect. 2.3.2 describes the Gō-like approach [67–69] and its combination with the MARTINI approach [70], which allows the study of large conformational changes in proteins.

2.3.1 Contact Maps

A contact map is a topological description of a molecule that abstracts its complex geometry into a coordinate-free representation. Its construction, however, usually requires a set of coordinates, which can be obtained from a database such as the Protein Data Bank (PDB) [71]. There is no unique way to define a contact map, and many reasonable constructions are available in the literature. For example, a commonly used picture considers each amino acid as a cluster of spheres, with a contact being formed when two spheres overlap [68, 69]. Using the overlap (OV) of the van der Waals (VdW) spheres associated with the heavy atoms of the all-atom, well-folded structure of a protein is a commonly accepted procedure [69]. However, the standard VdW radii [72] must be enlarged to account for the attraction [69], which is in part due to hydrogen bonds. In practice, we increase the VdW radii by a factor of 1.24 [73], which is related to the radius of water molecules. The VdW OV contact map thus accounts for the variation of atomic sizes: the overlap among the surfaces defining each residue is used to decide whether a contact exists in the native state [74].

An alternative methodology that also considers the different volumes of the amino acids is the restricted Contacts of Structural Units (rCSU) method [73], which additionally uses the chemical properties of the atoms to create the contact map. First, the rCSU method determines the contacts between heavy atoms (that is, excluding hydrogen atoms) using the OV of the enlarged VdW radii. All overlapping spheres, even within the same residue, are candidates for an atom–atom contact. To select which contacts are valid, each atomic surface is discretized using a Fibonacci grid [75, 76], where the centers of the sections are given by

$$\Theta_k = \arccos\left(1 - \frac{2k}{F_n}\right), \qquad \varphi_k = 2\pi k\,\frac{F_{n-1}}{F_n} \qquad (17.10)$$

where Θk and φk are spherical coordinates and k runs from 1 to a Fibonacci number Fn, typically with n = 14 [73] (F14 = 377). A section is assigned to a contact with the closest atom among those whose enlarged spheres overlap that section. Each atom is then classified into one of ten major groups (I to X), as described in Table 1. In particular, atoms of mainly neutral character, namely, groups VI, VII, and VIII, can form only very weak hydrogen-bond and dispersion interactions and are considered unable to make contacts at all (see Fig. 5). A distinctive characteristic of rCSU is the existence of "destabilizing" contacts due to the repulsion between full or partial charges of the same sign [73].

Table 1 Atomic classification of rCSU methodology

Group   Atom characteristic
I       Hydrogen bond donor or acceptor
II      Hydrogen bond acceptors
III     Hydrogen bond donor
IV      Hydrophobic
V       C atoms on aromatic rings
VI      Neutral atoms
VII     Neutral-donor
VIII    Neutral-acceptor
IX      Positively charged
X       Negatively charged
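The grid of Eq. (10) and the idea of assigning surface sections to overlapping neighbors can be sketched in Python as follows. The helper names, the 1.7 Å carbon-like VdW radius, and the simplified ownership rule (each section goes to the nearest other atom whose enlarged sphere contains the section center) are assumptions made for illustration; this is not the rCSU reference implementation [73], and it omits the chemical classification of Table 1.

```python
import numpy as np


def fibonacci(n):
    """n-th Fibonacci number with F_1 = F_2 = 1 (so F_14 = 377)."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def fibonacci_grid(n=14):
    """Unit vectors at the section centers of Eq. (10), for k = 1..F_n."""
    f_n, f_prev = fibonacci(n), fibonacci(n - 1)
    k = np.arange(1, f_n + 1)
    theta = np.arccos(1.0 - 2.0 * k / f_n)
    phi = 2.0 * np.pi * k * f_prev / f_n
    return np.column_stack((np.sin(theta) * np.cos(phi),
                            np.sin(theta) * np.sin(phi),
                            np.cos(theta)))


def assign_sections(center, radius, other_centers, other_radii, grid, scale=1.24):
    """For each surface section of one atom, return the index of the closest
    other atom whose enlarged sphere covers it, or -1 if none does."""
    points = center + scale * radius * grid                        # (M, 3)
    dist = np.linalg.norm(points[:, None, :] - other_centers[None, :, :], axis=2)
    inside = dist < scale * other_radii[None, :]                   # (M, N)
    owner = np.argmin(np.where(inside, dist, np.inf), axis=1)
    owner[~inside.any(axis=1)] = -1                                # uncovered sections
    return owner


grid = fibonacci_grid(14)
r_c = 1.7                                    # Å, carbon-like VdW radius (illustrative)
neighbors = np.array([[3.0, 0.0, 0.0],       # close atom: enlarged spheres overlap
                      [10.0, 0.0, 0.0]])     # distant atom: no overlap
owner = assign_sections(np.zeros(3), r_c, neighbors, np.array([r_c, r_c]), grid)
print(len(grid), "sections;", int((owner == 0).sum()), "assigned to the close atom")
```

The golden-ratio-like step F(n−1)/F(n) in φk spreads the points nearly uniformly over the sphere, which is why such a grid is a convenient way to discretize atomic surfaces.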
When all atomic contacts are considered, a residue pair is regarded as forming a contact if the number of stabilizing (attractive) contacts is larger than the number of destabilizing (repulsive) contacts. Once the contact map has been obtained from a reliable geometric structure, it can be used to increase the accuracy of CG methods, as described in the next section.

Fig. 5 Diagram of atomic rCSU interactions between the groups of Table 1. Red (blue) edges represent the different mechanisms responsible for stabilizing (destabilizing) interactions

2.3.2 Gō-MARTINI

MARTINI CG models are based on an alternative mapping of the effective representation of the protein chain [77]. In this model, each CG bead represents, on average, four heavy atoms. The model defines four main types of particles (charged, nonpolar, polar, and apolar), which are subdivided according to their hydrogen-bonding capability into donor, acceptor, donor-and-acceptor, and a type that is not involved in hydrogen bonds. The model also incorporates a degree of polarity ranging from 1 to 5, indicating low to high polarity. All of these beads are the building blocks of the MARTINI methodology and are characterized by different nonbonded interactions, described by a Lennard–Jones (LJ) 12-6 potential, Eq. (11), and by a Coulomb interaction for charged beads. The MARTINI force field also includes small and tiny bead variants that are used to describe, for example, ring-like structures. We note that the set of MARTINI bead types has a structure similar to that of the rCSU atom classification discussed above.

$$U_{LJ} = 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r}\right)^{12} - \left(\frac{\sigma_{ij}}{r}\right)^{6}\right] \qquad (17.11)$$

In the case of proteins [78], the MARTINI model represents each amino acid with a backbone (BB) bead and additional beads for the side chain atoms. Fig. 6 shows an all-atom representation of two protein segments, together with the related MARTINI BB bead mapping. These beads are connected by harmonic potentials and also interact through nonbonded forces. A pure MARTINI CG force field is not able to preserve the native structure of a protein [79]. A common way to overcome this limitation is the ELNEDIN protein model [80], which uses an elastic network (EN) [81] to preserve the native structure: harmonic bonds with a given spring constant are added between BB beads according to a cutoff criterion. However, harmonic potentials cannot break, which precludes studies of systems involving protein unfolding.

Fig. 6 Diagrammatic illustration of the coarse-graining process using the protein Chignolin (PDB entry 1UAO). The all-atom representation is used to generate the contact map and the MARTINI coarse-graining

On the other hand, Gō-like models are based on CG beads, typically located on the Cα atoms, that often interact through an LJ potential [67–69]. The interacting pairs are selected using a contact map, as described earlier, constructed from the native structure of the protein. Unlike harmonic bonds, these LJ interactions can break apart. The Gō-MARTINI approach [79] introduces additional virtual CG beads on top of the MARTINI BB beads; these interact through LJ potentials between pairs selected from an OV + rCSU contact map, as described in Sect. 2.3.1. It also imposes two additional restrictions on the contact (BB) pairs: they must be at least three residues apart, and their distance must lie within a spherical shell, typically with an inner radius of 0.3 nm and an outer radius of 1.1 nm. These conditions prevent contacts between very close as well as very distant pairs: the former are accounted for by the explicit bonded potentials, and the latter are expected to be described by the MARTINI model. In the Gō-MARTINI model, the parameter εij is uniform throughout the system, with a typical value in the range of 6–9 kJ mol⁻¹ [79].
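The pair-selection rules of Gō-MARTINI can be sketched in a few lines of Python. Besides the sequence-separation rule, the 0.3–1.1 nm distance shell, and the uniform εij stated above, the sketch assumes that each LJ minimum is placed at the native BB–BB distance, that is, σij = r_native/2^(1/6), and reads "at least three residues apart" as |i − j| ≥ 3; these choices, the toy coordinates, and the helper names are illustrative assumptions rather than details given in this chapter.

```python
import numpy as np


def select_go_pairs(bb_xyz_nm, contact_map, min_seq_sep=3, r_inner=0.3, r_outer=1.1):
    """Keep native contacts that satisfy the sequence-separation and distance-shell
    rules; return (i, j, sigma) with the LJ minimum at the native distance (assumed)."""
    pairs = []
    for i, j in contact_map:
        if abs(i - j) < min_seq_sep:
            continue  # close in sequence: handled by bonded terms
        r_native = float(np.linalg.norm(bb_xyz_nm[i] - bb_xyz_nm[j]))
        if r_inner <= r_native <= r_outer:
            sigma = r_native / 2.0 ** (1.0 / 6.0)  # LJ minimum at r = 2^(1/6) sigma
            pairs.append((i, j, sigma))
    return pairs


def lj(r, sigma, eps):
    """12-6 Lennard-Jones potential of Eq. (11)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


bb = np.array([[0.00, 0.00, 0.00],
               [0.38, 0.00, 0.00],
               [0.76, 0.00, 0.00],
               [0.50, 0.50, 0.00],
               [0.20, 0.80, 0.00],
               [5.00, 5.00, 5.00]])  # toy BB coordinates, nm
native_contacts = [(0, 1), (0, 3), (1, 4), (0, 5), (2, 5)]  # e.g., from an OV + rCSU map
EPSILON = 7.5  # kJ/mol, within the 6-9 kJ/mol range quoted above

pairs = select_go_pairs(bb, native_contacts)
for i, j, sigma in pairs:
    r = np.linalg.norm(bb[i] - bb[j])
    # at the native distance the pair sits at the LJ minimum, i.e. energy = -EPSILON
    print(i, j, round(sigma, 3), round(lj(r, sigma, EPSILON), 3))
```

Because every selected pair starts at its LJ minimum, the native structure is stabilized, yet each contact can still be broken under load, which is what allows unfolding and fibril failure to be simulated.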

3 Notes

3.1 Numerical Simulations and the Breakdown of Continuum

Continuum models, like the ones described in Sects. 2.1.1 and 2.1.2, have provided profound insight into the ubiquitous physical phenomenon of mechanical contact and the deformation of matter that it induces. However, at smaller scales, for example, in advanced microscopy-based studies of proteins, emergent multiscale phenomena have been reported [4]. At the nanoscale, matter is composed of clusters of atoms and cannot be subdivided into infinitely smaller units. This picture can be visualized by considering the apex of an AFM tip, which at the micrometer scale looks mostly like a sharp cone, while at the nanoscale its discrete arrangement of particles becomes apparent. In fact, advanced molecular simulation techniques have been developed to model mechanical contact directly at the all-atom scale [82], and continuum and atomistic models have been compared [4, 83]. Nevertheless, no rule of thumb has emerged from such comparisons, especially for "softer" materials with Young's moduli below 1 GPa and repulsive forces between about 50 pN and a few nN. Hence, it is strongly recommended to study the mechanical properties of such materials at the nanoscale, which can be thought of as a "bottom-up" approach. Such "bottom-up" strategies start by performing, for example, all-atom molecular dynamics (MD) simulations and scale up by reducing the number of degrees of freedom toward the continuum scale [84–86]. Such an assessment helps to find the right compromise between time-demanding atomistic approaches and continuum-level simulations. In the following sections, we show results of this approach for biological fibers. Several computational methodologies have been developed to describe experimental results from nanomechanical studies of very long protein filaments. As described before, they can be divided into two groups: (a) methodologies based on all-atom MD simulation aided by steered molecular dynamics [28, 48] and (b) methodologies employing CG-MD simulation [63, 79]. The main idea is to probe mechanical properties [64] through local deformation (e.g., tensile, indentation, and shearing processes) (see Fig. 7). In this context, both types of methodologies deliver a dynamic picture of the whole deformation process by enabling molecular characterization during the early deformation stages.

Fig. 7 Schematic representation of Aβ40 (PDB entry 2M4J) under three deformation processes: tensile (left), shearing (middle), and indentation (right). The Cα atoms anchored in each process are shown in solid blue, the atoms moving at speed vpull in red, and the indenter bead, moving at speed vind, in green. Arrows indicate the direction of pulling. In the case of indentation, a potential $V(z) \sim z^{-10}$ has been used to model the basis plane, where z is the distance between the plane and the CG beads

The main advantage of all-atom MD is that it can describe the molecular mechanism underlying local deformations at very short length and time scales (i.e., nm and μs) [87, 88]. CG-MD simulations, on the other hand, can generally be used to study much larger systems under more severe deformation, at time and length scales more directly comparable to experiments [63, 64, 89, 90]. Moreover, CG-MD simulations can be employed as a tool to validate elastic theory. This is done by calculating the exponent n in the force vs. indentation depth curves, $F \propto \delta^{n}$. In particular, we found n = 3/2 (see Eq. (6)) in the linear regime, which corresponds to Hertzian theory [6]. In the next section, we discuss some successful studies of protein filaments under external forces using all-atom and coarse-grained MD approaches.

3.2 Bacterial Filaments

In this section, we discuss the use of steered MD to study the type IV pilus (T4P), using concrete examples from the literature in which this method was applied at the all-atom and coarse-grained scales to the T4P of the organism N. gonorrhoeae [2, 49]. Type IV pili are long protein filaments (i.e., biopolymers) that emanate from the surface of prokaryotic cells and are implicated in a variety of critical functions (surface attachment, biofilm formation, twitching motility, DNA binding, virulence factors, etc.) [91–94]. Full-length T4P can comprise thousands of copies of a "ladle-shaped" protein called pilin (the monomer subunit of T4P). The structure of pilin includes a predominantly helical domain that packs into the filament core and a globular domain that lines the filament exterior and functions in adhesion [91–94]. T4P are both highly flexible and very strong, which has led to their description as biomolecular "grappling hooks" [94]. This makes them instructive systems to study with steered MD methods [2, 49], an ongoing research effort in the Baker group [95–97]. As mentioned earlier in the chapter, the size of a simulated system can differ substantially between all-atom MD and CG-MD simulations. For example, in the all-atom steered MD simulations of the T4P filament, the simulated system included 18 pilin subunits and, with water and ions added, contained approximately 290,000 atoms [2]. In contrast, owing to the mapping of multiple atoms to a single coarse-grained bead in the MARTINI force field [77], CG T4P models could be built with 26 and 66 pilin subunits [49]. Although CG models can describe long filaments, there is, as discussed, an inherent trade-off between system size and chemical specificity when comparing all-atom and coarse-grained models for molecular simulations [98]. In particular, the inclusion of explicit water molecules in all-atom MD simulations can quickly increase the system size, especially in steered MD simulations, where additional water must be present for the protein filament to remain solvated while being elongated and to prevent the filament from interacting with its periodic image [2]. For example, extending a protein filament to twice its initial length using steered MD would require a water box more than twice the length of the filament system. Therefore, if we consider a segment of an arbitrary filament that just fits into a water box of size 200 Å × 100 Å × 100 Å, extending the filament to twice its original length would require increasing the water box size along the pulling direction by at least another 200 Å, in turn introducing nearly 67,000 additional water molecules into the simulated system. CG filament models are therefore very useful for simulating larger filament systems or for exploring more significant conformational changes than all-atom models allow.
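The estimate of nearly 67,000 additional water molecules follows from the number density of bulk water. A minimal Python sketch, assuming a density of 1.0 g/cm³ and a molar mass of 18.015 g/mol (about 0.0334 molecules per Å³); the helper names are hypothetical:

```python
AVOGADRO = 6.02214076e23   # molecules per mol
WATER_MOLAR_MASS = 18.015  # g/mol
WATER_DENSITY = 1.0        # g/cm^3, assumed bulk value near room temperature
A3_PER_CM3 = 1e24          # 1 cm^3 = 10^24 Å^3


def water_number_density():
    """Water molecules per Å^3 (about 0.0334)."""
    return WATER_DENSITY / WATER_MOLAR_MASS * AVOGADRO / A3_PER_CM3


def extra_waters(cross_section_a2, added_length_a):
    """Water molecules needed to fill a slab of the given cross-section and length."""
    return water_number_density() * cross_section_a2 * added_length_a


n_extra = extra_waters(100.0 * 100.0, 200.0)  # 100 Å x 100 Å box, extended by 200 Å
print(f"{water_number_density():.4f} per Å^3 -> about {n_extra:,.0f} extra water molecules")
```

With a three-site water model, these molecules alone add roughly 200,000 atoms to the simulated system.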
As mentioned previously, it is in general good practice, while developing the steered MD protocol for a simulated system, to carry out steered MD simulations using a range of values for parameters such as the pulling speed or the stiffness of the spring that connects the pulled atom selection to the moving steered MD restraint, in order to gauge the sensitivity of the simulation results to these choices. In the all-atom steered MD T4P study, several pulling velocities ranging from 10 to 1 Å/ns and a spring stiffness of ~7 kcal mol⁻¹ Å⁻² connecting the pulled selection to the moving restraint were used [2]. In the CG steered MD T4P study, velocities ranging from 0.05 to 10 Å/ns were used along with a spring stiffness of ~14 kcal mol⁻¹ Å⁻² [49]. Slower pulling speeds are more readily attainable at the coarse-grained scale than at the all-atom scale. The pulling speeds in these T4P simulations are also in the range of those used for other filament systems, for example, in all-atom cadherin simulations [46] or CG simulations of actin filaments [48].
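To make the role of the spring stiffness concrete, the following Python sketch converts the all-atom stiffness of ~7 kcal mol⁻¹ Å⁻² to pN/Å and evaluates the constant-velocity steered MD spring force, F = k (x0 + v t − x). The quasi-static helper assumes, purely for illustration, that the pulled group is attached to a linearly elastic filament of hypothetical stiffness k_filament, so that the spring and the filament act as springs in series; real filaments are neither linear nor quasi-static at these pulling speeds, and the helper names are hypothetical.

```python
# Unit conversion: 1 kcal/(mol·Å) expressed as a force per molecule, in pN.
KCAL_PER_MOL_A_TO_PN = 4184.0 / 6.02214076e23 / 1e-10 * 1e12  # about 69.48 pN


def smd_force(k, v, t, x, x0):
    """Constant-velocity SMD spring force on the pulled group.

    The restraint starts at x0 and moves at speed v; the pulled group sits at x.
    Units only need to be consistent (e.g., pN/Å, Å/ns, ns, Å -> pN).
    """
    return k * (x0 + v * t - x)


def quasi_static_force(k_spring, k_filament, v, t):
    """Spring force if the pulled group is tied to a purely elastic filament of
    stiffness k_filament and follows the restraint quasi-statically (springs in
    series). Hypothetical toy model, not the T4P simulations themselves."""
    k_series = k_spring * k_filament / (k_spring + k_filament)
    return k_series * v * t


k_spring = 7.0 * KCAL_PER_MOL_A_TO_PN  # ~7 kcal mol^-1 Å^-2 expressed in pN/Å
k_filament = 50.0                      # pN/Å, hypothetical filament stiffness
for v in (1.0, 10.0):                  # Å/ns, range used in the all-atom T4P study
    t = 15.0 / v                       # ns for the restraint to advance 15 Å
    f = quasi_static_force(k_spring, k_filament, v, t)
    print(f"v = {v:4.1f} Å/ns: {t:4.1f} ns, force about {f:.0f} pN")
```

In this idealized picture the force depends only on the restraint displacement v t, so both speeds give the same value; the speed dependence observed in real steered MD comes from friction and energy barriers that the toy model ignores, which is one reason to test several pulling speeds.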


At the all-atom scale of resolution, Baker et al. investigated the initial stages of T4P filament elongation and how the structure of the T4P filament changed over the course of constant velocity steered MD simulations [2]. In total, a filament extension of approximately 15 Å was observed in that study [2]. Importantly, these simulations provided molecular-scale insights into the separation between subunits in the filament, which opened new gaps on the filament surface during the initial stages of elongation [2]. More specifically, the steered MD simulations demonstrated that the amino acid sequence EYYLN, found at the end of the alpha-helix of each pilin subunit, exhibits a significant increase in its solvent exposed surface area as a result of filament elongation [2]. This observation was consistent with earlier experimental measurements that demonstrated EYYLN exposure via fluorescence in the elongated state of the T4P [99], and the steered MD simulations provided molecular-scale details about how this exposure occurs [2]. The all-atom steered MD simulations also predicted additional sequences that would become exposed, and these predictions led to a new round of experiments, which identified one of these regions as becoming more exposed [2]. In the work of Zhao et al., MARTINI CG simulations of the T4P filament were used to determine the Young's modulus of the filament upon elongation by steered MD [49]. The force versus extension data obtained from their simulations were used to determine the Young's modulus by first finding the slope of the force–extension curve in the linear ("elastic") region of extension and then converting it to Young's modulus by calculating

$$Y = K_{\mathrm{eff}}\,\frac{L_0}{A} \qquad (17.12)$$

where Y is the Young's modulus, Keff is the "effective" spring constant of the T4P filament during elastic extension, L0 is the initial filament length, and A is the cross-sectional area of the T4P filament [49]. The Young's modulus of the N. gonorrhoeae T4P filament was determined to be 0.80 ± 0.07 GPa. Upon further elongation in the CG steered MD simulations, complete breakage of the T4P was eventually observed [49].

3.3 Peptide and β-Amyloid Fibrils

Rodrigo A. Moreira et al.

Here we discuss the advantages and limitations of CG models for assessing the mechanical properties of fibrils at time and length scales larger than those readily accessible to all-atom MD. Figure 7 sketches three types of deformation for threefold amyloid fibrils. The structure-based CG representation places one bead at each Cα atom [100], and native contacts are determined by the overlap (OV) criterion described previously. The experimental tensile Young modulus, $Y_L$, has not been reported for this system. However, a twofold β-amyloid system was studied by Paparcone et al. [87, 88] using all-atom MD simulations, and they reported values in the range 2.3–12.4 GPa for longitudinal deformation at constant force. The CG methodology predicts 3.1 and 1.6 GPa for the threefold and twofold geometries, respectively [63]. This discrepancy may be related to the magnitude of the force applied to initiate tensile deformation in the all-atom MD simulations: a large force induces fast deformation and increases the mechanical response of the system. Moreover, the precise geometry of the fibril is important for obtaining accurate mechanical properties. The all-atom MD simulations were based on an in vitro twofold model, and they showed a large variation of $Y_L$ compared with the CG simulations. The CG methodology can also be tested against the transverse Young modulus, $Y_T$, which was measured experimentally as 3.3 GPa [3] for a fibril of unknown symmetry. Using a simple CG model, Poma et al. [101] reported 7.0 and 21.3 GPa for the twofold and threefold geometries, respectively, which is remarkable for such a simple CG model. The mechanical properties of Aβ40 and Aβ42 fibrils have also been compared [63]. Aβ42 showed higher mechanical stability than Aβ40, with $Y_L^{\mathrm{A}\beta42} = 1.14\,Y_L^{\mathrm{A}\beta40}$ and $Y_T^{\mathrm{A}\beta42} = 2.34\,Y_T^{\mathrm{A}\beta40}$, which is consistent with previous results [102, 103]. Thus, structure-based CG methods can be used to probe different mechanical deformations of single proteins, aggregates, and fibrils, provided that a reliable structure of the system exists.

The mechanical properties of the (LDLK)3 fibril were studied by Fontana and Gelain [64] using a CG approach based on the Gō-MARTINI methodology described in Sect. 2.3.2. The main drawback of the MARTINI force field is its description of protein secondary structure [104]. Gō-MARTINI improves this aspect of the MARTINI model by introducing a set of Lennard–Jones (LJ) potentials that allow the study of large conformational changes, including secondary structure transitions [79]. In that work, the Gō-MARTINI model was further validated against all-atom MD simulations. The results for (LDLK)3 and FAQ-(LDLK)3 indicate that the former has a higher shear modulus than the latter, in good agreement with previous rheological data [64]. The fibril failure mechanism could also be monitored at one tenth of the computational cost of all-atom MD simulation [64]. This illustrates the efficiency of the Gō-MARTINI approach for assessing the mechanical properties of protein complexes.
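The tensile moduli compared above are, in practice, the initial slope of a stress–strain curve: $Y_L = \sigma/\varepsilon$, with stress $\sigma = F/A$ and strain $\varepsilon = \Delta L/L_0$. The Python sketch below shows that arithmetic under stated assumptions:

- a force–length record from a pulling simulation is available;
- the cross-sectional area $A$ is estimated separately, for example from the fibril diameter;
- only the small-strain, linear regime is fitted, here using an arbitrary 5% cutoff.

The function name, the cutoff and the synthetic data are illustrative and are not taken from [63] or [87, 88].

```python
import numpy as np

# 1 pN/nm^2 = 1e-12 N / 1e-18 m^2 = 1e6 Pa = 1e-3 GPa
PN_PER_NM2_TO_GPA = 1e-3

def young_modulus_gpa(force_pn, length_nm, area_nm2, max_strain=0.05):
    """Estimate a tensile Young modulus (GPa) from a force-length record.

    force_pn   : axial force at each frame (pN)
    length_nm  : fibril length at each frame (nm); frame 0 defines L0
    area_nm2   : assumed cross-sectional area of the fibril (nm^2)
    max_strain : upper strain limit of the linear fit (assumed cutoff)
    """
    force = np.asarray(force_pn, dtype=float)
    length = np.asarray(length_nm, dtype=float)
    strain = (length - length[0]) / length[0]   # epsilon = dL / L0
    stress = force / area_nm2                     # sigma = F / A, in pN/nm^2
    linear = strain <= max_strain                 # keep the small-strain regime
    if linear.sum() < 2:
        raise ValueError("need at least two points in the linear regime")
    slope, _intercept = np.polyfit(strain[linear], stress[linear], 1)
    return slope * PN_PER_NM2_TO_GPA

if __name__ == "__main__":
    # Synthetic, noise-free response: Y = 3 GPa (3000 pN/nm^2), A = 30 nm^2, L0 = 20 nm.
    strain = np.linspace(0.0, 0.08, 41)
    length = 20.0 * (1.0 + strain)
    force = 3000.0 * 30.0 * strain
    print(round(young_modulus_gpa(force, length, area_nm2=30.0), 3))  # 3.0
```

Because the modulus is the force divided by an assumed area, any estimate of $Y$ scales inversely with the chosen $A$. When comparing moduli from different models, it helps to state the area convention explicitly.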

Assessing the Mechanical Stability of Biological Fibrils


Acknowledgments

A.B.P. and R.A.M. acknowledge financial support from the National Science Centre, Poland, under grant No. 2017/26/D/NZ1/00466. J.L.B. acknowledges support from the National Science Foundation, USA, under grant MCB-1817670. H.V.G. acknowledges core funding support from the Slovenian Research Agency under grant No. P1-0055.

References

1. Ando T (2012) High-speed atomic force microscopy coming of age. Nanotechnology 23:062001
2. Baker JL, Biais N, Tama F (2013) Steered molecular dynamics simulations of a type IV pilus probe initial stages of a force-induced conformational transition. PLoS Comput Biol 9:e1003032
3. Ruggeri FS, Adamcik J, Jeong JS, Lashuel HS, Mezzenga R, Dietler G (2015) Influence of the β-sheet content on the mechanical properties of aggregates during amyloid fibrillization. Angew Chem Int Ed 54:2462–2466
4. Luan B, Robbins MO (2005) The breakdown of continuum models for mechanical contacts. Nature 435:929–932
5. Kmiecik S, Gront D, Kolinski M, Wieteska L, Dawid AE, Kolinski A (2016) Coarse-grained protein models and their applications. Chem Rev 116:7898–7936
6. Hertz H (1881) Ueber die Berührung fester elastischer Körper. J Reine Angew Math 92:156–171
7. Johnson KL (1987) Contact mechanics. Cambridge University Press, Cambridge
8. Johnson KL, Kendall K, Roberts AD (1971) Surface energy and the contact of elastic solids. Proc R Soc Lond A 324:301–313
9. Derjaguin BV, Muller VM, Toporov YP (1975) Effect of contact deformations on the adhesion of particles. J Colloid Interface Sci 53:314–326
10. Tabor D (1977) Surface forces and surface interactions. J Colloid Interface Sci 58:2–13
11. Dugdale DS (1960) Yielding of steel sheets containing slits. J Mech Phys Solids 8:100–108
12. Unertl WN (1999) Implications of contact mechanics models for mechanical properties measurements using scanning force microscopy. J Vac Sci Technol A 17:1779–1786. In: 45th National Symposium of the American Vacuum Society, Nov 02–06, Baltimore, MD, Part 2
13. Lantz MA, O'Shea SJ, Welland ME, Johnson KL (1997) Atomic-force-microscope study of contact area and friction on NbSe2. Phys Rev B 55:10776
14. Carpick RW, Ogletree DF, Salmeron M (1997) Lateral stiffness: a new nanomechanical measurement for the determination of shear strengths with friction force microscopy. Appl Phys Lett 70:1548–1550
15. Carpick RW, Salmeron M (1997) Scratching the surface: fundamental investigations of tribology with atomic force microscopy. Chem Rev 97:1163–1194
16. Israelachvili J (2005) Intermolecular and surface forces, 2nd edn. Elsevier Academic Press, London
17. Butt HJ, Kappl M (2010) Surface and interfacial forces. Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, pp 120–125
18. Herruzo ET, Asakawa H, Fukuma T, Garcia R (2013) Three-dimensional quantitative force maps in liquid with 10 piconewton, angstrom and sub-minute resolutions. Nanoscale 5:2678–2685
19. Guzman HV, Perrino AP, Garcia R (2013) Peak forces in high-resolution imaging of soft matter in liquid. ACS Nano 7:3198–3204
20. Guzman HV, Garcia R (2013) Peak forces and lateral resolution in amplitude modulation force microscopy in liquid. Beilstein J Nanotechnol 4:852–859
21. Tatara Y (1989) Extensive theory of force-approach relations of elastic spheres in compression and in impact. J Eng Mater Technol 111:163–168
22. Karplus M, McCammon JA (2002) Molecular dynamics simulations of biomolecules. Nat Struct Biol 9:646–652
23. Adcock SA, McCammon JA (2006) Molecular dynamics: survey of methods for simulating the activity of proteins. Chem Rev 106:1589–1615
24. González MA (2011) Force fields and molecular dynamics simulations. JDN 12:169–200
25. Dror RO, Dirks RM, Grossman JP, Xu H, Shaw DE (2012) Biomolecular simulation: a computational microscope for molecular biology. Annu Rev Biophys 41:429–452
26. Hollingsworth SA, Dror RO (2018) Molecular dynamics simulation for all. Neuron 99:1129–1143
27. Izrailev S, Stepaniants S, Isralewitz B, Kosztin D, Lu H, Molnar D, Wriggers W, Schulten K (1999) Steered molecular dynamics. In: Computational molecular dynamics: challenges, methods, ideas. Springer, Berlin, pp 39–65
28. Isralewitz B, Gao M, Schulten K (2001) Steered molecular dynamics and mechanical functions of proteins. Curr Opin Struct Biol 11:224–230
29. Isralewitz B, Baudry J, Gullingsrud J, Kosztin D, Schulten K (2001) Steered molecular dynamics investigations of protein function. J Mol Graph Model 19:13–25
30. Sotomayor M, Schulten K (2007) Single-molecule experiments in vitro and in silico. Science 316:1144–1148
31. Sotomayor M (2015) Computational exploration of single-protein mechanics by steered molecular dynamics. AIP Conf Proc 1703:030001
32. Egelman EH (2016) The current revolution in Cryo-EM. Biophys J 110:1008–1012
33. Popp D, Koh F, Scipion CPM, Ghoshdastider U, Narita A, Holmes KC, Robinson RC (2018) Advances in structural biology and the application to biological filament systems. Bioessays 40:e1700213
34. Nogales E (2015) An electron microscopy journey in the study of microtubule structure and dynamics: EM studies of microtubule structure. Protein Sci 24:1912–1919
35. Egelman EH (2017) Cryo-EM of bacterial pili and archaeal flagellar filaments. Curr Opin Struct Biol 46:31–37
36. Chou SZ, Pollard TD (2019) Mechanism of actin polymerization revealed by cryo-EM structures of actin filaments with three different bound nucleotides. Proc Natl Acad Sci U S A 116:4265–4274
37. Ren Z, Zhang Y, Zhang Y, He Y, Du P, Wang Z, Sun F, Ren H (2019) Cryo-EM structure of actin filaments from Zea mays pollen. Plant Cell 31:2855–2867
38. Wells DB, Aksimentiev A (2010) Mechanical properties of a complete microtubule revealed through molecular dynamics simulation. Biophys J 99:629–637
39. Saunders MG, Voth GA (2012) Comparison between actin filament models: coarse graining reveals essential differences. Structure 20:641–653
40. Phillips JC, Braun R, Wang W, Gumbart J, Tajkhorshid E, Villa E, Chipot C, Skeel RD, Kale L, Schulten K (2005) Scalable molecular dynamics with NAMD. J Comput Chem 26:1781–1802
41. Van Der Spoel D, Lindahl E, Hess B, Groenhof G, Mark AE, Berendsen HJC (2005) GROMACS: fast, flexible, and free. J Comput Chem 26:1701–1718
42. Salomon-Ferrer R, Götz AW, Poole D, Le Grand S, Walker RC (2013) Routine microsecond molecular dynamics simulations with AMBER on GPUs. 2. Explicit solvent particle mesh Ewald. J Chem Theory Comput 9:3878–3888
43. Kolappan S, Coureuil M, Yu X, Nassif X, Egelman EH, Craig L (2016) Structure of the Neisseria meningitidis type IV pilus. Nat Commun 7:13015
44. Humphrey W, Dalke A, Schulten K (1996) VMD: visual molecular dynamics. J Mol Graph 14:33–38
45. Matsushita S, Inoue Y, Hojo M, Sokabe M, Adachi T (2011) Effect of tensile force on the mechanical behavior of actin filaments. J Biomech 44:1776–1781
46. Sotomayor M, Weihofen WA, Gaudet R, Corey DP (2012) Structure of a force-conveying cadherin bond essential for inner-ear mechanotransduction. Nature 492:128–132
47. Gao M, Wilmanns M, Schulten K (2002) Steered molecular dynamics studies of titin I1 domain unfolding. Biophys J 83:3435–3445
48. Kim JI, Kwon J, Baek I, Na S (2016) Steered molecular dynamics analysis of the role of cofilin in increasing the flexibility of actin filaments. Biophys Chem 218:27–35
49. Zhao X, Liu Y, Guo Z, Zhang Y, Li Y, Liu W (2017) Mechanical response and deformation mechanics of type IV pili investigated using steered coarse-grained molecular dynamics simulation. J Biomech 56:97–101
50. Michaud-Agrawal N, Denning EJ, Woolf TB, Beckstein O (2011) MDAnalysis: a toolkit for the analysis of molecular dynamics simulations. J Comput Chem 32:2319–2327
51. Roe DR, Cheatham TE III (2013) PTRAJ and CPPTRAJ: software for processing and analysis of molecular dynamics trajectory data. J Chem Theory Comput 9:3084–3095

52. Abraham MJ, Murtola T, Schulz R, Páll S, Smith JC, Hess B, Lindahl E (2015) GROMACS: high performance molecular simulations through multilevel parallelism from laptops to supercomputers. SoftwareX 1–2:19–25
53. McGibbon RT, Beauchamp KA, Harrigan MP, Klein C, Swails JM, Hernández CX, Schwantes CR, Wang LP, Lane TJ, Pande VS (2015) MDTraj: a modern open library for the analysis of molecular dynamics trajectories. Biophys J 109:1528–1532
54. Gowers R, Linke M, Barnoud J, Reddy T, Melo M, Seyler S, Domański J, Dotson D, Buchoux S, Kenney I, Beckstein O (2019) MDAnalysis: a Python package for the rapid analysis of molecular dynamics simulations. In: Proceedings of the 15th Python in Science Conference, pp 98–105. SciPy
55. Jarzynski C (1997) Nonequilibrium equality for free energy differences. Phys Rev Lett 78:2690–2693
56. Park S, Khalili-Araghi F, Tajkhorshid E, Schulten K (2003) Free energy calculation from steered molecular dynamics simulations using Jarzynski's equality. J Chem Phys 119:3559–3566
57. Cuendet MA, Michielin O (2008) Protein-protein interaction investigated by steered molecular dynamics: the TCR-pMHC complex. Biophys J 95:3575–3590
58. Dellago C, Hummer G (2013) Computing equilibrium free energies using nonequilibrium molecular dynamics. Entropy 16:41–61
59. Xu Z, Paparcone R, Buehler MJ (2010) Alzheimer's Aβ(1-40) amyloid fibrils feature size-dependent mechanical properties. Biophys J 98:2053–2062
60. Paparcone R, Buehler MJ (2011) Failure of Aβ(1-40) amyloid fibrils under tensile loading. Biomaterials 32:3367–3374
61. Gautieri A, Vesentini S, Redaelli A, Buehler MJ (2011) Hierarchical structure and nanomechanics of collagen microfibrils from the atomistic scale up. Nano Lett 11:757–766
62. Poma AB, Li MS, Theodorakis PE (2018) Generalization of the elastic network model for the study of large conformational changes in biomolecules. Phys Chem Chem Phys 20:17020–17028
63. Poma AB, Guzman HV, Li MS, Theodorakis PE (2019) Mechanical and thermodynamic properties of Aβ42, Aβ40, and α-synuclein fibrils: a coarse-grained method to complement experimental studies. Beilstein J Nanotechnol 10:500–513
64. Fontana F, Gelain F (2020) Probing mechanical properties and failure mechanisms of fibrils of self-assembling peptides. Nanoscale Adv 2:190–198
65. Martínez M, Cooper CD, Poma AB, Guzman HV (2020) Free energies of the disassembly of viral capsids from a multiscale molecular simulation approach. J Chem Inf Model 60:974–981
66. Senapati S, Poma AB, Cieplak M, Filipek S, Park PSH (2019) Differentiating between inactive and active states of rhodopsin by atomic force microscopy in native membranes. Anal Chem 91:7226–7235
67. Klimov DK, Thirumalai D (2000) Mechanisms and kinetics of β-hairpin formation. Proc Natl Acad Sci U S A 97:2544–2549
68. Settanni G, Hoang TX, Micheletti C, Maritan A (2002) Folding pathways of prion and doppel. Biophys J 83:3533–3541
69. Sułkowska JI, Cieplak M (2008) Selection of optimal variants of Gō-like models of proteins through studies of stretching. Biophys J 95:3174–3191
70. Marrink SJ, Risselada HJ, Yefimov S, Tieleman DP, De Vries AH (2007) The MARTINI force field: coarse grained model for biomolecular simulations. J Phys Chem B 111:7812–7824
71. Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, Feng Z, Gilliland GL, Iype L, Jain S, Fagan P, Marvin J, Padilla D, Ravichandran V, Schneider B, Thanki N, Weissig H, Westbrook JD, Zardecki C (2002) The Protein Data Bank. Acta Crystallogr Sect D Biol Crystallogr 58:899–907
72. Tsai J, Taylor R, Chothia C, Gerstein M (1999) The packing density in proteins: standard radii and volumes. J Mol Biol 290:253–266
73. Wołek K, Gómez-Sicilia A, Cieplak M (2015) Determination of contact maps in proteins: a combination of structural and chemical approaches. J Chem Phys 143:243105
74. Seeliger D, de Groot BL (2007) Atomic contacts in protein structures. A detailed analysis of atomic radii, packing, and overlaps. Proteins 68:595–601
75. Sobolev V, Wade RC, Vriend G, Edelman M (1996) Molecular docking using surface complementarity. Proteins 25:120–129
76. González A (2010) Measurement of areas on a sphere using Fibonacci and latitude–longitude lattices. Math Geosci 42:49
77. Monticelli L, Kandasamy SK, Periole X, Larson RG, Tieleman DP, Marrink SJ (2008) The MARTINI coarse-grained force field: extension to proteins. J Chem Theory Comput 4:819–834

78. Chwastyk M, Bernaola AP, Cieplak M (2015) Statistical radii associated with amino acids to determine the contact map: fixing the structure of a type I cohesin domain in the Clostridium thermocellum cellulosome. Phys Biol 12:046002
79. Poma AB, Cieplak M, Theodorakis PE (2017) Combining the MARTINI and structure-based coarse-grained approaches for the molecular dynamics studies of conformational transitions in proteins. J Chem Theory Comput 13:1366–1374
80. Periole X, Cavalli M, Marrink SJ, Ceruso MA (2009) Combining an elastic network with a coarse-grained molecular force field: structure, dynamics, and intermolecular recognition. J Chem Theory Comput 5:2531–2543
81. Tirion MM (1996) Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Phys Rev Lett 77:1905
82. Guzman HV, Tretyakov N, Kobayashi H, Fogarty AC, Kreis K, Krajniak J, Junghans C, Kremer K, Stuehn T (2019) ESPResSo++ 2.0: advanced methods for multiscale molecular simulation. Comput Phys Commun 238:66–76
83. Buehler MJ (2006) Atomistic and continuum modeling of mechanical properties of collagen: elasticity, fracture, and self-assembly. J Mater Res 21:1947–1961
84. Peter C, Kremer K (2009) Multiscale simulation of soft matter systems: from the atomistic to the coarse-grained level and back. Soft Matter 5:4357–4366
85. Saunders MG, Voth GA (2012) Coarse-graining of multiprotein assemblies. Curr Opin Struct Biol 22:144–150
86. Noid WG (2013) Perspective: coarse-grained models for biomolecular systems. J Chem Phys 139:09B201_1
87. Paparcone R, Keten S, Buehler MJ (2010) Atomistic simulation of nanomechanical properties of Alzheimer's Aβ(1-40) amyloid fibrils under compressive and tensile loading. J Biomech 43:1196–1201
88. Paparcone R, Buehler MJ (2011) Failure of Aβ(1-40) amyloid fibrils under tensile loading. Biomaterials 32:3367–3374
89. Schramm AC, Hocky GM, Voth GA, Blanchoin L, Martiel JL, De La Cruz EM (2017) Actin filament strain promotes severing and cofilin dissociation. Biophys J 112:2624–2633
90. Szatkowski L, Dale MR, Ross JL, Dima RI (2019) Mechanical stability of microtubule lattices: molecular dynamic indentation studies. Biophys J 116:257a

91. Craig L, Pique ME, Tainer JA (2004) Type IV pilus structure and bacterial pathogenicity. Nat Rev Microbiol 2:363–378
92. Craig L, Li J (2008) Type IV pili: paradoxes in form and function. Curr Opin Struct Biol 18:267–277
93. Giltner CL, Nguyen Y, Burrows LL (2012) Type IV pilin proteins: versatile molecular modules. Microbiol Mol Biol Rev 76:740–772
94. Maier B (2013) The bacterial type IV pilus system: a tunable molecular motor. Soft Matter 9:5667
95. Bogin BA, Lovenduski CA, Biais N, Baker JL (2019) Probing the polymorphic transition of type IV pilus filaments under force using coarse-grained molecular dynamics simulations. Biophys J 116:186a
96. Fairfield MN, Jones SJ, Biais N, Baker JL (2019) Investigating the response of type IV pilins and type IV pilus filaments to applied force using all-atom steered molecular dynamics simulations. Biophys J 116:185a–186a
97. Goncalves RB, Biais N, Baker JL (2019) Large-scale all-atom simulations of T4P filaments reveal critical interactions for T4P stability. Biophys J 116:188a
98. Riniker S, Allison JR, van Gunsteren WF (2010) On developing coarse-grained models for biomolecular simulation: a review. Phys Chem Chem Phys 14:12423–12430
99. Biais N, Higashi DL, Brujic J, So M, Sheetz MP (2010) Force-dependent polymorphism in type IV pili reveals hidden epitopes. Proc Natl Acad Sci U S A 107:11358–11363
100. Poma AB, Chwastyk M, Cieplak M (2015) Polysaccharide–protein complexes in a coarse-grained model. J Phys Chem B 119:12028–12041
101. Poma AB, Chwastyk M, Cieplak M (2017) Elastic moduli of biological fibers in a coarse-grained model: crystalline cellulose and β-amyloids. Phys Chem Chem Phys 19:28195–28206
102. Kouza M, Co NT, Li MS, Kmiecik S, Kolinski A, Kloczkowski A, Buhimschi IA (2018) Kinetics and mechanical stability of the fibril state control fibril formation time of polypeptide chains: a computational study. J Chem Phys 148:215106
103. Knowles TPJ, Buehler MJ (2011) Nanomechanics of functional and pathological amyloid materials. Nat Nanotechnol 6:469–479
104. Alessandri R, Souza PCT, Thallmair S, Melo MN, De Vries AH, Marrink SJ (2019) Pitfalls of the MARTINI model. J Chem Theory Comput 15:5448–5460

Chapter 17

Predictive Modeling of Neurotoxic α-Synuclein Polymorphs

Liang Xu, Shayon Bhattacharya, and Damien Thompson

Abstract

Assembly of monomeric α-synuclein (αS) into aggregation-resistant, helically folded tetramers and related multimers is a key target for Parkinson's disease (PD). Protein dynamics hampers experimental characterization of the polymorphism of these structures. Computational modeling and simulation therefore provide a complementary approach for obtaining high-resolution structural information on the assembly of αS and on its interactions with biological surfaces. These computational techniques are particularly valuable for intrinsically disordered proteins (IDPs) and for short-lived peptide and protein assemblies whose 3D structures have not yet been determined. Experimental observables such as NMR J-coupling constants and chemical shifts can be predicted directly from simulation data and compared with available experimental data to identify the most physically realistic atomic-resolution structure. For appropriately validated and benchmarked computational models, macroscopic aggregation properties can be related to thermodynamic properties calculated at the atomic level. In this chapter, we describe a protocol for designing helical αS multimers, especially tetramers, and for scanning the peptide–membrane interface of cell-bound αS tetramers. The computationally modeled structures are validated against the experimental parameters available at the time of writing (early 2020). They are then used to generate predictive design rules that can motivate and guide experiments.

Key words Intrinsically disordered proteins (IDP), Molecular simulation, Directed self-assembly, Neurodegeneration, Peptide–cell interactions

Mai Suan Li et al. (eds.), Computer Simulations of Aggregation of Proteins and Peptides, Methods in Molecular Biology, vol. 2340, https://doi.org/10.1007/978-1-0716-1546-1_17, © Springer Science+Business Media, LLC, part of Springer Nature 2022

1 Introduction

Parkinson's disease (PD) is the second most common neurodegenerative disease after Alzheimer's disease, and the most common movement disorder worldwide [1, 2]. One prominent neuropathological hallmark of PD is the presence in the patient's brain of Lewy bodies and Lewy neurites composed of α-synuclein (αS) fibrils [3]. Although the functional role of αS remains elusive, it has been shown to interact with synaptic vesicles, modulate dopamine metabolism, and regulate glucose activity [4]. The abnormal aggregation of αS monomers produces neurotoxic, β-sheet-rich oligomers that transform into insoluble amyloid fibrils [5]. Monomeric αS is disordered in aqueous solution and lacks secondary structure, but it adopts α-helical conformations when interacting with micelles. Two micelle-bound αS structures are available in the Protein Data Bank (PDB IDs: 1XQ8 and 2KKW) [6, 7], both showing two antiparallel helices. In contrast, an extended superhelical conformation of αS has been reported in the presence of physiologically relevant phospholipid bilayer membranes [8]. This conformation is markedly different from the structure found with nonvesicular lipids or with the commonly used membrane-mimetic detergent sodium dodecyl sulfate (SDS). Increasing evidence shows that β-sheet-rich αS oligomers play a key role in triggering neurotoxicity. They disrupt membrane integrity and impair protein degradation and the function of several organelles, such as the mitochondria and the endoplasmic reticulum [9]. Moreover, αS aggregates (oligomers and preformed fibrils) display prion-like seeded propagation and spreading in the brain [10]. αS fibril strains with different structures could lead to different levels of neurotoxicity, seeding ability, and propagation [11]. The full sequence of αS contains 140 residues organized into three distinct domains [5]: the lipid-binding N-terminal region (residues 1–60), the highly hydrophobic non-amyloid-β component (NAC) (residues 61–95), and the negatively charged C-terminal region (residues 96–140). The NAC region is essential for αS aggregation and for the fold preference of αS fibrils [12]. Besides the aggregation-prone disordered monomers, the physiological state of αS also features helically folded tetramers (and other multimers) that resist aggregation [13, 14]. A dynamic equilibrium likely exists between these two states, and many factors could disrupt it. Familial PD mutations, including E46K, H50Q, G51D, and A53T, have been observed to destabilize α-helical tetramers and thus induce neurotoxicity and inclusions [15].

Other mutations in the conserved KTKEGV repeat motifs of αS were also found to block tetramer formation and cause PD-like neurotoxicity [16]. In addition, αS E→K mutations (E35K, E46K, and E61K) could individually or collectively destabilize αS multimers and induce monomer aggregation at membranes to form vesicle-rich inclusions [17]. Recent studies also showed that glucocerebrosidase 1 (GBA1) deficiency may destabilize αS tetramers [18]. Given the polymorphism of αS aggregates, structural characterization of helical αS tetramers and related multimers by experimental techniques alone is extremely challenging. In this chapter, we describe a holistic computational approach that combines modeling, simulation, refinement, and validation. This approach provides atomic-level structural information on the assembly of helical tetramers.

2 Model Building, Computational Methodology, and Analysis

2.1 Construction, Refinement, and Characterization of Stable Helical αS Monomers

2.1.1 Generating Starting Helical αS Monomer Structures

The membrane-free helical conformation of αS has been observed in a recent experimental study [19]. It remains unclear whether this helical structure shares the same helices as the micelle-bound states (PDB IDs: 1XQ8 and 2KKW). To construct stable helical tetramers under physiological conditions, stable helical αS monomers must first be obtained. We constructed both broken [20] and extended helical αS monomers based on the available experimental evidence.

For the broken helical conformation, the loop linking the two antiparallel α-helices is thought to host most familial mutations, such as E46K, H50Q, G51D, and A53T [21]. We therefore designed a new helical monomer with two α-helices spanning Met1–Thr44 and Gly51–Phe94. This differs from the micelle-bound monomer (PDB ID: 1XQ8), whose two α-helices span Val3–Val37 and Lys45–Thr92.

For the extended helical conformation, we used the experimentally determined 11/3 helix of αS (3 helical turns per 11-residue span; residues 9–89) [8]. The full-length helical monomer was built by adding the proximal N-terminal residues (1–8) and the disordered C-terminal region (residues 90–140).

For comparison, we also included a helical monomer taken from a previous study of full-length αS tetramers [22], which showed tight hydrophobic packing among three monomers. Figure 1 shows the α-helical conformations of the different monomers.

Fig. 1 α-Helical conformations of different αS monomers. (a) Redesigned αS monomer [23]; (b) micelle-bound αS monomer (PDB ID: 1XQ8) [6]; (c) αS monomer taken from the study of Gurry et al. [24]; (d) extended αS monomer with 11/3 α-helix conformation [8]. The N-terminal region, NAC region, and C-terminal region are colored red, blue, and gray, respectively. Residues E46, H50, G51, and A53, which mark the positions of the familial mutations E46K, H50Q, G51D, and A53T, are highlighted in licorice representation. Residues at the start and end of helices are labeled

2.1.2 Relaxation and Characterization of Helical αS Monomers

All-atom molecular dynamics (MD) simulations were performed to relax each αS monomer in water. We used the CHARMM36m protein force field [25] with the corresponding modified TIP3P water model.

- Solvation: each monomer was placed in a large cubic water box, with all protein atoms at least 15 Å from the box edge.
- Ions: counterions were added to neutralize the net charge, and 0.15 M NaCl was added to represent physiological ionic strength.
- Thermodynamic conditions: MD simulations were run at 310 K and 1 atm. Temperature was maintained with a velocity-rescaling thermostat [26], and pressure was controlled with the Parrinello–Rahman barostat [27].
- Software: all MD simulations were performed with GROMACS 5.0.4 [28].
- Nonbonded interactions: short-range van der Waals and electrostatic interactions were truncated at a 12 Å cutoff, and long-range electrostatics were treated with the particle mesh Ewald (PME) method.
- Minimization and equilibration: each system was first energy-minimized. It was then equilibrated in two stages with positional restraints on the protein heavy atoms, 100 ps at constant volume followed by 100 ps at constant pressure, both at 310 K.
- Integration: a 2 fs time step was used, with bond constraints handled by the LINCS algorithm.
- Production: production dynamics were run at constant temperature and pressure without positional restraints, and structures were saved every 10 ps.

The fraction of native contacts, Q, was calculated using the following formula [29]:

$$Q(X) = \frac{1}{N}\sum_{(i,j)} \frac{1}{1 + \exp\left[\beta\left(r_{ij}(X) - \lambda\, r_{ij}^{0}\right)\right]} \qquad (18.1)$$

The sum runs over the $N$ native contact pairs $(i, j)$. Heavy atoms $i$ and $j$ in residues $\theta_i$ and $\theta_j$ are in contact if the distance between them is less than 5.0 Å. The remaining symbols are:

- $r_{ij}(X)$, the distance between $i$ and $j$ in conformation $X$;
- $r_{ij}^{0}$, the distance in the native state (the starting conformation);
- $\beta$, a smoothing parameter taken to be 5 Å⁻¹;
- $\lambda$, a factor that accounts for fluctuations when the contact is formed, taken to be 1.8 for the all-atom model, following [29].

Here we use Q to monitor the degree of relaxation of each helical αS monomer (Fig. 2). No significant intermolecular contacts were observed in any system, indicating that each αS monomer reached equilibrium after 200 ns of simulation. However, the structure of model C deviates markedly from its initial conformation, with Q dropping to 0.45 by the end of the simulation. Changes in secondary structure, especially helical structure, were calculated using the DSSP (Define Secondary Structure of Proteins) algorithm [30]. The calculated helix percentage (the sum of α-helix, 3₁₀-helix, and π-helix) for the αS monomers is shown in Fig. 3.

Fig. 2 The fraction of native contacts Q for αS monomers. A, B, C, and D correspond to the four starting conformations shown in Fig. 1

Fig. 3 Calculated helical percentage and conformational energy for the different monomeric αS models shown in Fig. 1. Values were averaged over the last 20 ns of the 200-ns simulations

The relative stabilities of the different helical αS monomers were estimated from their conformational energies, calculated using the following formula:

$$E(\text{complex}) = E_{\mathrm{MM}} + G_{\mathrm{solvation}} \qquad (18.2)$$

where $E_{\mathrm{MM}}$ is the molecular mechanics potential energy, calculated from the force field parameters as

$$E_{\mathrm{MM}} = E_{\mathrm{bonded}} + E_{\mathrm{vdW}} + E_{\mathrm{elec}} \qquad (18.3)$$

Here $E_{\mathrm{bonded}}$ is the bonded interaction energy, including bond, angle, and dihedral terms; $E_{\mathrm{vdW}}$ is the van der Waals interaction described by the Lennard–Jones potential; and $E_{\mathrm{elec}}$ is the electrostatic interaction described by the Coulomb potential. The solvation free energy $G_{\mathrm{solvation}}$ was calculated with the Generalized Born using Molecular Volume (GBMV) implicit solvent model implemented in CHARMM (v40b2) [31]. The single-point energy was calculated after a 200-step minimization of each conformation using the GBMV II algorithm [32–34]. The other energy terms (bonded, van der Waals, electrostatic, and solvation energies) were also obtained with the GB implicit solvent model. This method is similar to MM/GBSA [35], except that the solvation energy was calculated with GBMV. Block averaging was used to estimate the mean values and standard deviations. The calculated conformational energies of the different αS monomers are shown in Fig. 3. The redesigned helical αS model (Model A) has the most favorable conformational energy. This suggests that the design protocol can identify thermodynamically stable conformations of the helical αS monomer and distinguish them from less stable ones.

2.2 Construction and Validation of Helical αS Multimers

2.2.1 Protocol for Construction of Stable Helical αS Tetramers

Starting models for the helical αS tetramer can be generated in various ways. Four identical or different αS monomers can be assembled either simultaneously or hierarchically. Experimentally identified helical αS tetramers occur in mixtures with other helical multimers, including dimers, trimers, pentamers, hexamers, heptamers, and octamers, with tetramers being the most populous species [13]. Moreover, several familial mutations, in particular E46K, H50Q, G51D, and A53T, have been shown to destabilize αS tetramers and shift the monomer–tetramer equilibrium back toward free monomers [15].

From a thermodynamic point of view, these observations provide important clues for the rational design of helical αS tetramers. The computationally designed αS tetramers should be both thermodynamically favorable and more kinetically stable than other multimers, and the known familial mutations may be expected to create a much higher energy barrier to tetramerization. There is no golden rule for generating helical αS multimers that are consistent with the experimental data. We therefore relied on two observations:

- most homomers have nearly perfect structural symmetry, which simplifies the modeling and design of homomeric assemblies [36];
- cyclic symmetries appear to be the basic building blocks for de novo design of self-assembling proteins, such as water-soluble α-helical barrels [37] and helical bundles with high thermodynamic stability [38].

Based on this simple rule, we computationally designed αS multimers from dimer to octamer with Cn symmetry (n = 2–8) (Figs. 4 and 5) using the symmetric protein–protein docking method [39].

Fig. 4 Designed αS NAC multimers from dimer to octamer in side and top views at the start of the simulations (after minimization and equilibration). The protein structures are shown in cartoon representation to highlight their symmetries. (Reprinted with permission from the Royal Society of Chemistry [20])

Fig. 5 Designed full-length αS multimers from dimer to octamer in side and top views at the start of the simulations (after minimization and equilibration). The protein structures are shown in cartoon representation to highlight the symmetries. (Reprinted with permission from the Royal Society of Chemistry [20])

The following protocol describes the typical procedure for constructing a Cn-related αS tetramer; the same protocol also applies to other αS multimers [20].

1. Use the NAC region (residues 61–95) of the designed helical αS monomer as input for the M-DOCK program [39]. Select symmetry 4 for a tetramer, and skip the residue selection.
2. Select one tetrameric NAC model from the top ten predicted structures. The model can be selected randomly to avoid bias. In our experience (see also below), however, models with tight hydrophobic packing generally give better conformational energies, consistent with the known role of the NAC region as an essential molecular signature for αS aggregation [40].
3. Align the NAC region of the full-length αS monomer with each chain of the tetrameric NAC structure to determine the coordinates of the rest of the full-length αS tetramer.
4. For the mutated tetramer, generate the quadruple mutant (E46K + H50Q + G51D + A53T) using the Mutator plugin of VMD [41].
5. Relax the generated αS tetramer with C4 symmetry using short MD simulations (minimization, thermalization, and equilibration).

2.2.2 Testing the Applicability of the Design Protocol

To check whether the designed helical αS multimers are more stable than those constructed by other methods, we first compared the relative stabilities of an αS tetramer without initial perfect C4 symmetry and the tetramer designed with initial C4 symmetry [20]. Note that both tetramers have direct NAC contacts between neighboring monomers. We found statistically indistinguishable conformational energies of −11,115 ± 11 and −11,122 ± 3 kcal/mol, respectively [20]. Next, we designed an alternative octamer structure containing one pentamer with C5 symmetry and one trimer with C3 symmetry but no direct NAC interactions between them (Fig. 6). The time-averaged conformational energy suggests that the octamer without initial NAC contacts is penalized by +61 ± 19 kcal/mol, indicating that initial NAC contacts between neighboring monomers are important for stabilizing αS octamers [20, 23]. Therefore, cyclic symmetry combined with hydrophobic contacts between adjacent NAC regions provides a powerful way of rapidly generating stable αS multimers.
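The comparisons above set mean conformational energies side by side, each with its own uncertainty. As a rough plausibility check (an assumption for illustration, not the procedure stated by the authors), one can treat the two estimates as independent, combine their standard deviations in quadrature, and compare the difference with that combined spread:

```python
import math

def compare_energies(mean_a, sd_a, mean_b, sd_b):
    """Return (difference, combined SD, |difference| / combined SD).

    Assumes the two estimates are independent, so their standard deviations
    add in quadrature. A ratio well below ~2 means the two means cannot be
    told apart at this level of precision (a heuristic threshold).
    """
    diff = mean_b - mean_a
    combined_sd = math.hypot(sd_a, sd_b)
    return diff, combined_sd, abs(diff) / combined_sd

# Tetramer without vs. with initial C4 symmetry (kcal/mol, values quoted above).
diff, sd, ratio = compare_energies(-11115.0, 11.0, -11122.0, 3.0)
print(f"dE = {diff:.0f} +/- {sd:.1f} kcal/mol, ratio = {ratio:.2f}")
# prints: dE = -7 +/- 11.4 kcal/mol, ratio = 0.61
```

A ratio of about 0.6 is in line with "statistically indistinguishable", whereas the +61 ± 19 kcal/mol penalty for the octamer lacking NAC contacts sits roughly three standard deviations away from zero.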

2.2.3 Validation of the Protocol Used to Design αS Multimers

The calculated conformational energy is shown in Fig. 7a. An excellent linear relationship was found between the conformational energy of the multimer and its number of monomers, indicating that assembly of αS multimers should be thermodynamically favorable and should increase in direct proportion to monomer concentration [20]. The binding energy (BE) for assembly of αS multimers is calculated according to:

$$\mathrm{BE} = E(\mathrm{complex}) - \sum_{i=1}^{n} E_i(\mathrm{monomer}) \qquad (18.4)$$

where E(complex) is the conformational energy of the multimer, and E_i(monomer) is the conformational energy of monomer i within the multimer. The binding affinity also increases linearly from dimer to octamer (Fig. 7b), and no significant alterations were observed in the helical structures of the assembling monomers, suggesting that the formation of αS multimers is more likely to follow the conformational selection mechanism than the induced-fit model [42]. This means that preformed helical αS monomers are ready to assemble into helically folded tetramers/multimers, avoiding conformational transitions and subsequent aggregation. Developing conformation-specific molecules (small drugs or peptides) that can stabilize such helical αS tetramers could provide a promising approach for the treatment of PD.

Fig. 6 Designed αS NAC octamer (a) and full-length αS octamer (b) consisting of one NAC pentamer and one trimer with no direct contact between them, in side and top views

The above thermodynamic predictions indicate that assembly of helical αS multimers ought to proceed smoothly and should not stall at helical tetramers, as suggested by experimental data [13, 14]. To reconcile this apparent contradiction, we estimated the activation energy (AE), which reflects the average energy barrier for assembly [23], using the relation:

$$\mathrm{AE} = \left\langle \sum_{i=1}^{n} \left[ E_i(\mathrm{monomer}) - E_0(\mathrm{monomer}) \right] \right\rangle \qquad (18.5)$$

where ⟨⋯⟩ denotes the ensemble average, E_i(monomer) is the conformational energy of monomer i within the multimer, and E_0(monomer) is the conformational energy of the monomer sampled in the free state. The change in the calculated average activation energy with the number of monomers is shown in Fig. 7c. The helical tetramer has the smallest activation energy among all the multimers investigated, which explains why tetramers are the most populated species observed experimentally. Pentamers, hexamers, heptamers, and octamers are thermodynamically stable, but their higher energy barriers indicate that their assembly is kinetically less favorable than that of the tetramer. Figure 7c shows that the activation energy of wild-type αS overlaps with that of the quadruple mutant, implying that the formation of both species involves a very low energy barrier. To further show the impact of the same mutations on the assembly of polymorphic tetramers, we calculated the activation energy for the helical tetramer without initial perfect C4 symmetry. Activation energies of about 25 and 55 kcal/mol were predicted for the wild type and the mutant, respectively, indicating that the mutations increase the energy barrier for tetramerization and that this tetramer polymorph is more sensitive to the mutations. Taken together, these results suggest that the proposed protocol is applicable for the design of relatively stable helical αS multimers with predicted thermodynamic properties consistent with experimental findings.

Fig. 7 Conformational energy (a), binding energy calculated using a single trajectory (b), and activation energy (c). The horizontal axis number n indicates growth from monomer to octamer. Linear fits are shown in panels (a, b), and lines connecting activation energy values in panel (c) are guides to the eye. (Reprinted with permission from the Royal Society of Chemistry, [20])

2.2.4 Comparison with NMR Experimental Observables

J-coupling constants are useful NMR restraints for distinguishing secondary structures within a protein, and they can be calculated from dihedral angles using the Karplus equation [43]:

$$J = A\cos^2(\phi + \theta) + B\cos(\phi + \theta) + C \qquad (18.6)$$

where ϕ denotes the dihedral angle φ or ψ, θ is an offset, and A, B, and C are Karplus coefficients parameterized using experimental data or quantum chemical calculations. The GROMACS tool g_chi was used to back-calculate the 3JHNHα coupling constants, for which A = 6.51 Hz, B = −1.76 Hz, C = 1.6 Hz, and θ = −60° [44]. Chemical shifts are NMR parameters that report on the covalent and noncovalent environment of individual atoms, from which maps of molecular structure can be assembled, and they are a powerful tool for identifying protein secondary structure [45]. The calculated J-coupling constants and chemical shifts for the designed helical αS monomer are shown in Fig. 8. Tables 1, 2, 3, and 4 summarize the Pearson correlation coefficients (P.C.C.) of J-coupling constants and chemical shifts between the micelle-bound αS structure (PDB ID: 1XQ8) and the designed monomers in free and assembled states. The predicted 3JHNHα coupling constants and 1Hα and 15N chemical shifts of the designed αS monomers in the micelle-bound and free states could be probed experimentally.

2.3 Construction of Membrane-Bound αS Tetramers

2.3.1 Broken vs. Extended αS Tetramers

Two extended helical tetramers were generated using different protocols (Fig. 9). In model A, the hydrophobic core of the NAC region of each monomer (model D in Fig. 1) was aligned to maximize hydrophobic contacts, similar to the NAC arrangement in the broken tetramer (Fig. 5). In model B, the same helical monomers within the tetramer were intertwined to maximize peptide–peptide contacts without optimizing hydrophobic contacts in the NAC region. After 200-ns simulations, the calculated conformational energies predict that model B (−11,045 ± 62 kcal/mol) is more stable than model A (−10,907 ± 0 kcal/mol), but both are less stable than the designed broken tetramer (−11,122 ± 3 kcal/mol, Fig. 5) under physiological conditions (310 K, 1 atm, 0.15 M NaCl). Note that the negligible standard deviation for model A may also indicate that it is trapped in a local minimum.
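Earlier in this chapter, mean conformational energies and their standard deviations are stated to come from the block average method. A minimal NumPy sketch of block averaging follows, assuming equal-sized contiguous blocks; the chapter gives neither the number of blocks nor whether ± denotes a standard deviation or a standard error, so both are returned. The second usage line also shows why a near-zero spread, as for model A, can reflect a trajectory that barely moves rather than a precise estimate.

```python
import numpy as np

def block_average(series, n_blocks=5):
    """Block-average a 1D time series (e.g., per-frame conformational energy).

    The series is cut into n_blocks contiguous, equal-length blocks; trailing
    frames that do not fill a whole block are dropped. Returns the overall
    mean, the standard deviation of the block means, and the standard error
    of the mean estimated from those block means.
    """
    x = np.asarray(series, dtype=float)
    block_len = len(x) // n_blocks
    if block_len == 0:
        raise ValueError("series is shorter than the number of blocks")
    used = x[: block_len * n_blocks]
    block_means = used.reshape(n_blocks, block_len).mean(axis=1)
    sd = block_means.std(ddof=1)          # spread between block means
    return float(used.mean()), float(sd), float(sd / np.sqrt(n_blocks))

# Toy series 1..10 in 5 blocks: block means are 1.5, 3.5, 5.5, 7.5, 9.5.
mean, sd, sem = block_average(np.arange(1, 11), n_blocks=5)
# A hypothetical trace that never leaves its starting energy has zero spread.
frozen = block_average(np.full(1000, -10907.0))
```

Blocks should be longer than the correlation time of the energy series; otherwise the block means are not independent and the spread is underestimated.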

Fig. 8 Comparison of 3JHNHα coupling constants (a), and Hα (b) and N (c) chemical shifts between the simulation-optimized helical αS structures (model A shown in Fig. 1) and the micelle-bound structure (PDB ID: 1XQ8). MA and MA_M refer to the wild-type and mutated model A, respectively

Table 1 Pearson correlation coefficients (P.C.C.) of 3JHNHα J-coupling constants

| | 1XQ8 (sim.) | MA | MA_M | T | T_M |
|---|---|---|---|---|---|
| 1XQ8 (sim.) | 1.00 | 0.05 | 0.22 | 0.10 | 0.21 |
| MA | 0.05 | 1.00 | 0.35 | 0.55 | 0.55 |
| MA_M | 0.22 | 0.35 | 1.00 | 0.65 | 0.60 |
| T | 0.10 | 0.55 | 0.65 | 1.00 | 0.68 |
| T_M | 0.21 | 0.55 | 0.60 | 0.68 | 1.00 |

Table 2 Pearson correlation coefficients (P.C.C.) of Cα chemical shifts δCα

| | 1XQ8 (exp.) | MA | MA_M | T | T_M |
|---|---|---|---|---|---|
| 1XQ8 (exp.) | 1.00 | 0.99 | 0.97 | 0.98 | 0.97 |
| MA | 0.99 | 1.00 | 0.98 | 0.99 | 0.97 |
| MA_M | 0.97 | 0.98 | 1.00 | 0.97 | 0.99 |
| T | 0.98 | 0.99 | 0.97 | 1.00 | 0.97 |
| T_M | 0.97 | 0.97 | 0.99 | 0.97 | 1.00 |

Table 3 Pearson correlation coefficients (P.C.C.) of Hα chemical shifts δHα

| | 1XQ8 (exp.) | MA | MA_M | T | T_M |
|---|---|---|---|---|---|
| 1XQ8 (exp.) | 1.00 | 0.98 | 0.79 | 0.82 | 0.84 |
| MA | 0.98 | 1.00 | 0.81 | 0.84 | 0.87 |
| MA_M | 0.79 | 0.81 | 1.00 | 0.86 | 0.84 |
| T | 0.82 | 0.84 | 0.86 | 1.00 | 0.89 |
| T_M | 0.84 | 0.87 | 0.84 | 0.89 | 1.00 |

Table 4 Pearson correlation coefficients (P.C.C.) of N chemical shifts δN

| | 1XQ8 (exp.) | MA | MA_M | T | T_M |
|---|---|---|---|---|---|
| 1XQ8 (exp.) | 1.00 | 0.90 | 0.83 | 0.89 | 0.85 |
| MA | 0.90 | 1.00 | 0.91 | 0.95 | 0.91 |
| MA_M | 0.83 | 0.91 | 1.00 | 0.90 | 0.95 |
| T | 0.89 | 0.95 | 0.90 | 1.00 | 0.92 |
| T_M | 0.85 | 0.91 | 0.95 | 0.92 | 1.00 |

Notes: (1) The chemical shifts were calculated using the SHIFTX2 program [45] for conformations sampled during the final 20 ns of molecular dynamics. The NMR J-coupling constants were calculated using the Karplus equation (g_chi tool in GROMACS).
(2) 1XQ8 (sim.) refers to the calculated J-coupling constants for micelle-bound αS (PDB ID: 1XQ8), and 1XQ8 (exp.) refers to the experimental chemical shifts for the same micelle-bound αS structure.
(3) MA, MA_M, T, and T_M refer to the designed helical αS monomer (model A shown in Fig. 1), mutated model A, monomeric model A in the tetramer, and monomeric model A in the mutated tetramer, respectively.
(4) Compared with the experimental structure 1XQ8, the designed helical αS monomer (MA) shows the largest difference in 3JHNHα coupling constants, and the mutant (MA_M) shows the largest differences in Hα and N chemical shifts.
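The notes state that J-coupling constants were back-calculated with the Karplus relation (Eq. 18.6) and compared through Pearson correlation coefficients. A minimal NumPy sketch of both steps follows, using the Vuister–Bax coefficients for 3JHNHα as a function of φ [44]. Averaging J over frames rather than averaging φ first is an assumed convention, not a description of g_chi internals, and the φ values below are hypothetical, not data from this chapter.

```python
import numpy as np

# Vuister-Bax coefficients for 3J(HN,HA) [44]; angles in degrees, J in Hz.
A, B, C, THETA = 6.51, -1.76, 1.6, -60.0

def karplus_j(phi_deg, a=A, b=B, c=C, theta=THETA):
    """Eq. 18.6: J = A cos^2(phi + theta) + B cos(phi + theta) + C."""
    x = np.radians(np.asarray(phi_deg, dtype=float) + theta)
    return a * np.cos(x) ** 2 + b * np.cos(x) + c

def mean_j_per_residue(phi_traj_deg):
    """Average J over frames; input has shape (n_frames, n_residues)."""
    return karplus_j(phi_traj_deg).mean(axis=0)

def pcc(x, y):
    """Pearson correlation coefficient between two per-residue profiles."""
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical phi angles: 3 frames x 4 residues (third residue is extended).
phi = np.array([[-57.0, -60.0, -120.0, -65.0],
                [-60.0, -55.0, -115.0, -70.0],
                [-62.0, -58.0, -125.0, -63.0]])
j_sim = mean_j_per_residue(phi)
j_ref = karplus_j([-60.0, -60.0, -120.0, -60.0])   # hypothetical reference
print(np.round(j_sim, 2), round(pcc(j_sim, j_ref), 2))
```

With these coefficients a helical φ near −57° gives J of roughly 4 Hz, and an extended φ of −120° gives 9.87 Hz, which is why 3JHNHα discriminates between helix and strand.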

Fig. 9 Extended αS tetramers designed using different protocols. (a) The extended tetramer with maximum hydrophobic packing; and (b) the extended tetramer with maximum peptide–peptide contacts

2.3.2 Preparation of Membrane-Bound αS Tetramers

Criteria for Model Selection

Since it is not feasible to sample all possible orientations of αS tetramers relative to the membrane surface, several factors were taken into account to generate a computationally feasible subset of the most plausible initial orientations (Fig. 10):

1. The N-terminus of αS is well known to interact with membranes [46], so we excluded orientations in which the C-terminus interacts with the membrane (Fig. 10a).
2. The modeled αS tetramers are homomeric with an initial C4 symmetry, which further reduces the number of distinct orientations. For example, the orientations shown in Fig. 10b, c are equivalent.
3. The simulations are long enough to allow the tetramer to find favorable binding sites by weakly interacting with the membrane. Our simulation results show that both broken and extended αS tetramers bind weakly to the membrane head groups.

Hence, the orientation shown in Fig. 10d appears to be the best starting conformation to investigate the association of αS tetramers with different membranes.
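Both the Cn multimer designs (Figs. 4 and 5) and criterion 2 rest on the fact that a Cn homomer is unchanged by a rotation of 360°/n about its symmetry axis; this is why orientations related by such a rotation can be treated as one starting orientation. A minimal NumPy sketch, assuming the symmetry axis is the z-axis through the origin and coordinates are stored as (n_atoms, 3) arrays, builds an ideal Cn assembly and tests that invariance. It is an illustration of the symmetry only and does not reproduce M-ZDOCK docking, which also scores interfaces.

```python
import numpy as np

def rot_z(angle):
    """3x3 rotation matrix about the z-axis (angle in radians)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def build_cn(monomer_xyz, n):
    """Place n copies of a monomer at rotations of 2*pi*k/n about z.

    Coordinates are row vectors, so each copy is monomer_xyz @ R.T.
    Returns an array of shape (n, n_atoms, 3), one slice per chain.
    """
    return np.stack([monomer_xyz @ rot_z(2 * np.pi * k / n).T for k in range(n)])

def is_cn_symmetric(assembly, n, atol=1e-8):
    """True if rotating every chain by 2*pi/n maps chain k onto chain k+1."""
    rotated = assembly @ rot_z(2 * np.pi / n).T
    return bool(np.allclose(rotated, np.roll(assembly, -1, axis=0), atol=atol))

monomer = np.array([[5.0, 0.0, 0.0], [6.0, 1.0, 2.0]])   # hypothetical 2-atom "chain"
tetramer = build_cn(monomer, 4)
print(is_cn_symmetric(tetramer, 4), is_cn_symmetric(tetramer, 3))   # True False
```

Because a 90° rotation about the axis only relabels the chains of a C4 tetramer, any two starting orientations that differ by such a rotation sample the same tetramer–membrane geometry.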

Generation of Lipid Bilayers with Different Compositions

Two types of lipid bilayers were used to investigate tetramer–membrane interactions. One lipid bilayer is composed of 1000 negatively charged POPS (1-palmitoyl-2-oleoyl-sn-glycero-3-phospho-L-serine) lipids, with 500 POPS lipids in each leaflet. The second is composed of DOPC (1,2-dioleoyl-sn-glycero-3-phosphocholine), DOPE (1,2-dioleoyl-sn-glycero-3-phosphoethanolamine), and DOPS (1,2-dioleoyl-sn-glycero-3-phospho-L-serine) lipids in a 2:5:3 ratio, with 500 mixed lipids in each leaflet. Note that the net charge decreases in magnitude from −1000e in the POPS lipid bilayer to −300e in the DOPC–DOPE–DOPS lipid bilayer (each PS head group carries −1e, and the 2:5:3 mixture contains 150 DOPS lipids per leaflet). These two types of membranes are known αS-binding membranes containing the acidic head group PS (phosphatidylserine), and are good mimics of the inner plasma membrane and synaptic vesicles [47–50]. The starting structure of each lipid bilayer was generated using the CHARMM-GUI Membrane Builder [51, 52].

Fig. 10 Possible orientations of αS tetramers relative to the membrane surface. The orientations (a–c) are excluded according to the model selection criteria, and only the orientation in (d) is considered. N and C indicate the N- and C-termini of αS tetramers, with each chain colored differently

Protocol for Relaxation of Tetramer–Membrane Systems

The stable broken and extended tetramers were placed above the membrane surface (xy-plane) at a starting minimum distance of either 15 or 5 Å from the lipid bilayer (Fig. 11). The αS tetramer and lipid bilayer were represented by the CHARMM36m [25] and CHARMM36 [53] force field parameters, respectively. The tetramer–membrane complex was solvated with the modified TIP3P water model [25] along the z-axis, resulting in a ~20-Å thick water layer above the protein and below the membrane. Each simulation cell was then neutralized by adding counterions. After 5000 steps of minimization, each system was equilibrated over six consecutive steps (100 ps each), during which the force constants of the lipid position and dihedral restraints were reduced from 1000 to 0 (in units of kJ mol⁻¹ nm⁻² and kJ mol⁻¹ rad⁻², respectively), and those of the protein backbone and side-chain atoms were reduced from 4000 to 0 and from 2000 to 0, respectively. During equilibration, the Berendsen [54] thermostat and barostat were applied to maintain the temperature at 310 K and the pressure at 1 atm. Semi-isotropic pressure coupling was applied so that the lipid bilayer could fluctuate in the xy-plane independently of the z-axis. For the production runs, the Nosé–Hoover thermostat [55] and Parrinello–Rahman barostat [27] were applied. Long-range electrostatic interactions were treated using the particle-mesh Ewald (PME) method. The MD time step was 2 fs, and the trajectory was saved every 100 ps during 0.5 μs of production dynamics for each system. The fraction of native contacts Q [29] was calculated to assess the convergence of the MD simulations. MD simulations were performed using GROMACS 5.0.4 [28].
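Convergence above is assessed with the fraction of native contacts Q [29]. A minimal NumPy sketch of the smoothed definition of Best, Hummer, and Eaton follows, using their all-atom parameters (β = 5 Å⁻¹, λ = 1.8, heavy-atom contacts within 4.5 Å between residues with |i − j| > 3). Which structure serves as the native reference is not stated in the chapter; using the equilibrated starting frame is an assumption, and the toy "hairpin" coordinates are hypothetical.

```python
import numpy as np

# All-atom parameters from Best, Hummer & Eaton [29].
BETA = 5.0      # smoothing parameter, 1/Angstrom
LAMBDA = 1.8    # tolerance factor for fluctuations around native distances
CUTOFF = 4.5    # native contact cutoff between heavy atoms, Angstrom
MIN_SEP = 3     # only count atoms in residues i, j with |i - j| > 3

def native_contacts(ref_xyz, res_ids, cutoff=CUTOFF, min_sep=MIN_SEP):
    """Return (pairs, r0): heavy-atom pairs in contact in the reference
    structure and their reference distances. ref_xyz has shape (n_atoms, 3);
    res_ids gives the residue index of each atom."""
    d = np.linalg.norm(ref_xyz[:, None, :] - ref_xyz[None, :, :], axis=-1)
    sep = np.abs(res_ids[:, None] - res_ids[None, :])
    contact = np.triu((d < cutoff) & (sep > min_sep), k=1)   # count each pair once
    i, j = np.nonzero(contact)
    return np.column_stack([i, j]), d[i, j]

def fraction_native_contacts(xyz, pairs, r0, beta=BETA, lam=LAMBDA):
    """Smoothed Q for one frame: mean of 1 / (1 + exp(beta * (r - lam * r0)))."""
    r = np.linalg.norm(xyz[pairs[:, 0]] - xyz[pairs[:, 1]], axis=-1)
    return float(np.mean(1.0 / (1.0 + np.exp(beta * (r - lam * r0)))))

# Toy "hairpin": two 5-atom strands 4 Angstrom apart, one atom per residue.
x = np.arange(5) * 3.8
bottom = np.column_stack([x, np.zeros(5), np.zeros(5)])
top = np.column_stack([x[::-1], np.full(5, 4.0), np.zeros(5)])
ref = np.vstack([bottom, top])
pairs, r0 = native_contacts(ref, np.arange(10))
q_native = fraction_native_contacts(ref, pairs, r0)
apart = ref.copy()
apart[5:, 1] += 20.0            # pull the second strand 20 Angstrom away
q_apart = fraction_native_contacts(apart, pairs, r0)
```

Applied frame by frame, a Q time series that plateaus suggests the structure has stopped drifting from its reference, which is how it can serve as a convergence indicator.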


Fig. 11 Representative conformations of broken tetramer (a–c) and extended tetramer (d–f) in the presence of different membranes. For clarity, water molecules and background ions are not shown

No direct interactions were observed over the 500-ns MD simulations of tetramers in the presence of the POPS membrane when the initial minimum distance between the tetramer and membrane was set to 15 Å; the peptide behaved as in bulk water. We therefore decreased the initial minimum distance to 5 Å to allow the tetramer to interact with the membrane. Representative binding conformations of tetramers with the lipid bilayers (Fig. 11) suggest that the preformed helical tetramers interact weakly with different membranes. A closer view of the residues binding to the membrane is shown in Fig. 12. Residues in the loop region (residues 43–50) of the broken tetramer, especially the charged residues Lys43, Lys45, and Glu46, were found to interact with the lipid head groups. As the negative charge of the lipid head groups decreases from the POPS to the DOPC/DOPE/DOPS bilayer, the selectivity of interacting residues in the broken tetramer appears to decrease, as more hydrophobic (Val48 and Val49), negatively charged (Glu46 and Glu57), and polar (His50) residues interact with the lipid head groups. Unlike the broken tetramer, which displays a similar orientation on both membrane types, the extended tetramer adopts (and maintains) completely different orientations on the two membranes: a “lying down” orientation on the POPS membrane and a “standing up” orientation on the DOPC/DOPE/DOPS membrane (Fig. 11). By lying down on the POPS membrane, the extended tetramer increases the contacts between charged N-terminal residues and the POPS head groups. By contrast, only the proximal N-terminal residues up to Lys6 interact with the less-charged DOPC/DOPE/DOPS membrane (Fig. 12d).

Fig. 12 Tetramer residues involved in interactions with the lipid bilayer. (a, b) Interacting residues of the broken helical tetramer with the POPS and DOPC/DOPE/DOPS membranes; (c, d) interacting residues of the extended helical tetramer with the POPS and DOPC/DOPE/DOPS membranes. Residues are colored by type (basic: blue; acidic: red; polar: green; nonpolar: white)

Implications from Simulations of Tetramer–Membrane Interactions

Although both broken and extended helical tetramers interact weakly with the model membranes, the calculated conformational energies suggest that interaction with biological surfaces could shift the relative stability of the tetramer polymorphs (Fig. 13). Since no direct contact was observed when the initial minimum distance between the tetramer and the membrane was 15 Å, these simulations mimic bulk water and can serve as controls for comparison with the simulations of membrane-associated tetramers. No significant difference (a 0.5% change in conformational energy) was detected for the broken tetramer associating with either the POPS or the DOPC/DOPE/DOPS membrane. By contrast, a mild but detectable increase of 2% and 3% in the stability of the extended helical tetramer was found when it interacted with the POPS and the DOPC/DOPE/DOPS membrane, respectively. This suggests that membrane interaction can improve the stability of the extended helical tetramer and could potentially impede its degradation into lower molecular weight (and more neurotoxic) [56] monomers, dimers, and trimers. Another potentially useful design rule emerging from these predictive models is that the relative stability of broken vs. extended helical tetramers varies with environment. When associating with the POPS membrane, the broken helical tetramer is more stable than the extended helical conformation, with a significantly more favorable conformational energy (−11,199 ± 3 vs. −11,017 ± 31 kcal/mol). However, upon binding to the DOPC/DOPE/DOPS membrane, the extended helical tetramer becomes more stable than the broken one (−11,151 ± 2 vs. −11,104 ± 16 kcal/mol) (Fig. 13). This result suggests that αS tetramers could adopt different conformations as they interact with heterogeneous cellular membranes. Specifically, the broken tetramer conformation that predominates in solution could be replaced by extended conformations on moderately charged membrane surfaces such as DOPC/DOPE/DOPS, but not on highly charged surfaces such as POPS.

Fig. 13 Calculated conformational energy for the broken (denoted as B) and extended (denoted as E) helical tetramers in the presence of different membranes

3 Conclusion

In this chapter, we presented a computational protocol to design relatively stable helical αS multimers, especially tetramers in both broken and extended conformations. These structural models help explain the experimental findings that helical αS tetramers are the most commonly observed species and that familial mutations enhance αS aggregation by shifting the monomer–tetramer equilibrium toward monomers. We discussed how to construct realistic on-membrane structures of tetramer–lipid complexes based on available structural information, and showed that the relative populations of different conformations of helical tetramers could be shifted by altering the composition of the lipid bilayer. Our data provide a further demonstration of the power of predictive modeling for understanding IDPs.

References

1. Przedborski S (2017) The two-century journey of Parkinson disease research. Nat Rev Neurosci 18:251–259
2. Bhattacharya S, Xu L, Thompson D (2018) Revisiting the earliest signatures of amyloidogenesis: roadmaps emerging from computational modelling and experiment. Wiley Interdiscip Rev Comput Mol Sci 8:e1359
3. Chiti F, Dobson CM (2017) Protein misfolding, amyloid formation, and human disease: a summary of progress over the last decade. Annu Rev Biochem 86:27–68
4. Goedert M (2001) Alpha-synuclein and neurodegenerative diseases. Nat Rev Neurosci 2:492–501
5. Lashuel HA, Overk CR, Oueslati A, Masliah E (2013) The many faces of α-synuclein: from structure and toxicity to therapeutic target. Nat Rev Neurosci 14:38–48
6. Ulmer TS, Bax A, Cole NB, Nussbaum RL (2005) Structure and dynamics of micelle-bound human alpha-synuclein. J Biol Chem 280:9595–9603
7. Rao JN, Jao CC, Hegde BG, Langen R, Ulmer TS (2010) A combinatorial NMR and EPR approach for evaluating the structural ensemble of partially folded proteins. J Am Chem Soc 132:8657–8668
8. Jao CC, Hegde BG, Chen J, Haworth IS, Langen R (2008) Structure of membrane-bound α-synuclein from site-directed spin labeling and computational refinement. Proc Natl Acad Sci 105:19666–19671
9. Auluck PK, Caraveo G, Lindquist S (2010) α-Synuclein: membrane interactions and toxicity in Parkinson’s disease. Annu Rev Cell Dev Biol 26:211–233
10. Masuda-Suzukake M, Nonaka T, Hosokawa M, Oikawa T, Arai T, Akiyama H, Mann DMA, Hasegawa M (2013) Prion-like spreading of pathological α-synuclein in brain. Brain 136:1128–1138

11. Uchihara T, Giasson BI (2015) Propagation of alpha-synuclein pathology: hypotheses, discoveries, and yet unresolved questions from experimental and human brain studies. Acta Neuropathol 131:49–73
12. Xu L, Bhattacharya S, Thompson D (2018) The fold preference and thermodynamic stability of α-synuclein fibrils is encoded in the non-amyloid-β component region. Phys Chem Chem Phys 20:4502–4512
13. Bartels T, Choi JG, Selkoe DJ (2011) α-Synuclein occurs physiologically as a helically folded tetramer that resists aggregation. Nature 477:107–110
14. Wang W, Perovic I, Chittuluru J, Kaganovich A, Nguyen LTT, Liao J, Auclair JR, Johnson D, Landeru A, Simorellis AK, Ju S, Cookson MR, Asturias FJ, Agar JN, Webb BN, Kang C, Ringe D, Petsko GA, Pochapsky TC, Hoang QQ (2011) A soluble α-synuclein construct forms a dynamic tetramer. Proc Natl Acad Sci 108:17797–17802
15. Dettmer U, Newman AJ, Soldner F, Luth ES, Kim NC, von Saucken VE, Sanderson JB, Jaenisch R, Bartels T, Selkoe D (2015) Parkinson-causing α-synuclein missense mutations shift native tetramers to monomers as a mechanism for disease initiation. Nat Commun 6:7314
16. Dettmer U, Newman AJ, von Saucken VE, Bartels T, Selkoe D (2015) KTKEGV repeat motifs are key mediators of normal α-synuclein tetramerization: their mutation causes excess monomers and neurotoxicity. Proc Natl Acad Sci 112:9596–9601
17. Dettmer U, Ramalingam N, von Saucken VE, Kim T-E, Newman AJ, Terry-Kantor E, Nuber S, Ericsson M, Fanning S, Bartels T, Lindquist S, Levy OA, Selkoe D (2017) Loss of native α-synuclein multimerization by strategically mutating its amphipathic helix causes abnormal vesicle interactions in neuronal cells. Hum Mol Genet 26:3466–3481


18. Kim S, Yun SP, Lee S, Umanah GE, Bandaru VVR, Yin X, Rhee P, Karuppagounder SS, Kwon S-H, Lee H, Mao X, Kim D, Pandey A, Lee G, Dawson VL, Dawson TM, Ko HS (2018) GBA1 deficiency negatively affects physiological α-synuclein tetramers and related multimers. Proc Natl Acad Sci 115:798–803
19. Rovere M, Sanderson JB, Fonseca-Ornelas L, Patel DS, Bartels T (2018) Refolding of helical soluble α-synuclein through transient interaction with lipid interfaces. FEBS Lett 592:1464–1472
20. Xu L, Bhattacharya S, Thompson D (2019) On the ubiquity of helical α-synuclein tetramers. Phys Chem Chem Phys 21(22):12036–12043
21. Kara E, Lewis PA, Ling H, Proukakis C, Houlden H, Hardy J (2013) α-Synuclein mutations cluster around a putative protein loop. Neurosci Lett 546:67–70
22. Ullman O, Fisher CK, Stultz CM (2011) Explaining the structural plasticity of alpha-synuclein. J Am Chem Soc 133:19536–19546
23. Xu L, Bhattacharya S, Thompson D (2018) Re-designing the α-synuclein tetramer. Chem Commun 54:8080–8083
24. Gurry T, Ullman O, Fisher CK, Perovic I, Pochapsky T, Stultz CM (2013) The dynamic structure of α-synuclein multimers. J Am Chem Soc 135:3865–3872
25. Huang J, Rauscher S, Nawrocki G, Ran T, Feig M, de Groot BL, Grubmüller H, MacKerell AD (2016) CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat Methods 14:71–73
26. Bussi G, Donadio D, Parrinello M (2007) Canonical sampling through velocity rescaling. J Chem Phys 126:014101
27. Parrinello M, Rahman A (1981) Polymorphic transitions in single crystals: a new molecular dynamics method. J Appl Phys 52:7182–7190
28. Abraham MJ, Murtola T, Schulz R, Páll S, Smith JC, Hess B, Lindahl E (2015) GROMACS: high performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1–2:19–25
29. Best RB, Hummer G, Eaton WA (2013) Native contacts determine protein folding mechanisms in atomistic simulations. Proc Natl Acad Sci 110:17874–17879
30. Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22:2577–2637
31. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M (1983) CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem 4:187–217
32. Lee MS, Salsbury FR, Brooks CL (2002) Novel generalized Born methods. J Chem Phys 116:10606–10614
33. Lee MS, Feig M, Salsbury FR, Brooks CL (2003) New analytic approximation to the standard molecular volume definition and its application to generalized Born calculations. J Comput Chem 24:1348–1356
34. Feig M, Onufriev A, Lee MS, Im W, Case DA, Brooks CL (2004) Performance comparison of generalized born and Poisson methods in the calculation of electrostatic solvation energies for protein structures. J Comput Chem 25:265–284
35. Kollman PA, Massova I, Reyes C, Kuhn B, Huo S, Chong L, Lee M, Lee T, Duan Y, Wang W, Donini O, Cieplak P, Srinivasan J, Case DA, Cheatham TE (2000) Calculating structures and free energies of complex molecules: combining molecular mechanics and continuum models. Acc Chem Res 33:889–897
36. Norn CH, Andre I (2016) Computational design of protein self-assembly. Curr Opin Struct Biol 39:39–45
37. Thomson AR, Wood CW, Burton AJ, Bartlett GJ, Sessions RB, Brady RL, Woolfson DN (2014) Computational design of water-soluble alpha-helical barrels. Science 346:485–488
38. Huang PS, Oberdorfer G, Xu C, Pei XY, Nannenga BL, Rogers JM, DiMaio F, Gonen T, Luisi B, Baker D (2014) High thermodynamic stability of parametrically designed helical bundles. Science 346:481–485
39. Pierce B, Tong W, Weng Z (2005) M-ZDOCK: a grid-based approach for Cn symmetric multimer docking. Bioinformatics 21:1472–1478
40. Hashimoto M, Takenouchi T, Mallory M, Masliah E, Takeda A (2000) The role of NAC in amyloidogenesis in Alzheimer’s disease. Am J Pathol 156:734–736
41. Humphrey W, Dalke A, Schulten K (1996) VMD: visual molecular dynamics. J Mol Graph 14:33–38
42. Okazaki K, Takada S (2008) Dynamic energy landscape view of coupled binding and protein conformational change: induced-fit versus population-shift mechanisms. Proc Natl Acad Sci U S A 105:11182–11187
43. Minch MJ (1994) Orientational dependence of vicinal proton-proton NMR coupling constants: the Karplus relationship. Concepts Magn Reson 6:41–56

44. Vuister GW, Bax A (1993) Quantitative J correlation: a new approach for measuring homonuclear three-bond J(HNHα) coupling constants in 15N-enriched proteins. J Am Chem Soc 115:7772–7777
45. Han B, Liu Y, Ginzinger SW, Wishart DS (2011) SHIFTX2: significantly improved protein chemical shift prediction. J Biomol NMR 50:43–57
46. O’Leary EI, Lee JC (2019) Interplay between α-synuclein amyloid formation and membrane structure. Biochim Biophys Acta Proteins Proteom 1867:483–491
47. Fusco G, Chen SW, Williamson PTF, Cascella R, Perni M, Jarvis JA, Cecchi C, Vendruscolo M, Chiti F, Cremades N, Ying L, Dobson CM, De Simone A (2017) Structural basis of membrane disruption and cellular toxicity by α-synuclein oligomers. Science 358:1440–1443
48. Galvagnion C, Brown JWP, Ouberai MM, Flagmeier P, Vendruscolo M, Buell AK, Sparr E, Dobson CM (2016) Chemical properties of lipids strongly affect the kinetics of the membrane-induced aggregation of α-synuclein. Proc Natl Acad Sci 113:7065–7070
49. Ferreon ACM, Gambin Y, Lemke EA, Deniz AA (2009) Interplay of α-synuclein binding and conformational switching probed by single-molecule fluorescence. Proc Natl Acad Sci 106:5645–5650
50. Dong C, Hoffmann M, Li X, Wang M, Garen CR, Petersen NO, Woodside MT (2018) Structural characteristics and membrane interactions of tandem α-synuclein oligomers. Sci Rep 8:6755
51. Jo S, Lim JB, Klauda JB, Im W (2009) CHARMM-GUI membrane builder for mixed bilayers and its application to yeast membranes. Biophys J 97:50–58
52. Wu EL, Cheng X, Jo S, Rui H, Song KC, Dávila-Contreras EM, Qi Y, Lee J, Monje-Galvan V, Venable RM, Klauda JB, Im W (2014) CHARMM-GUI membrane builder toward realistic biological membrane simulations. J Comput Chem 35:1997–2004
53. Huang J, MacKerell AD (2013) CHARMM36 all-atom additive protein force field: validation based on comparison to NMR data. J Comput Chem 34:2135–2145
54. Berendsen HJC, Postma JPM, van Gunsteren WF, DiNola A, Haak JR (1984) Molecular dynamics with coupling to an external bath. J Chem Phys 81:3684–3690
55. Nosé S (1984) A unified formulation of the constant temperature molecular dynamics methods. J Chem Phys 81:511–519
56. Winner B, Jappelli R, Maji SK, Desplats PA, Boyer L, Aigner S, Hetzer C, Loher T, Vilar M, Campioni S, Tzitzilonis C, Soragni A, Jessberger S, Mira H, Consiglio A, Pham E, Masliah E, Gage FH, Riek R (2011) In vivo demonstration that α-synuclein oligomers are toxic. Proc Natl Acad Sci 108:4194–4199

Chapter 18

Characterization of Amyloidogenic Peptide Aggregability in Helical Subspace

Shayon Bhattacharya, Liang Xu, and Damien Thompson

Abstract

Prototypical amyloidogenic peptides amyloid-β (Aβ) and α-synuclein (αS) can undergo helix–helix associations via partially folded helical conformers, which may influence pathological progression to Alzheimer’s (AD) and Parkinson’s disease (PD), respectively. At the other extreme, stable folded helical conformers have been reported to resist self-assembly and amyloid formation. Experimental characterisation of such disparities in aggregation profiles, which arise from subtle differences in peptide stability, is precluded by the conformational heterogeneity of the helical subspace. The diverse physical models used in molecular simulations sample distinct regions of phase space and can capture the rich ensemble of the helical subspace. Robust and powerful computational predictive methods based on network theory and free energy mapping can model the origin of helical population shifts in amyloidogenic peptides, which highlight their inherent aggregability. In this chapter, we discuss computational models, methods, design rules, and strategies to identify the driving force behind helical self-assembly and the molecular origin of aggregation resistance in helical intermediates of Aβ42 and αS. By extensive multiscale mapping of intrapeptide interactions, we show that the computational models can capture features that are otherwise imperceptible to experiments. Our models predict that targeting terminal residues may allow modulation and control of the initial pathogenic aggregability of amyloidogenic peptides.

Key words Intrinsically disordered proteins, Peptide self-assembly, Neurodegenerative disease, Helical intermediates, Predictive molecular design, Cross-correlation network analyses, Molecular dynamics simulations, Central hydrophobic domain, Charged terminal groups

Mai Suan Li et al. (eds.), Computer Simulations of Aggregation of Proteins and Peptides, Methods in Molecular Biology, vol. 2340, https://doi.org/10.1007/978-1-0716-1546-1_18, © Springer Science+Business Media, LLC, part of Springer Nature 2022

1 Introduction

A central challenge in amyloid research is the identification and characterisation of key pathological intermediates formed during the early events of amyloidogenic self-assembly [1–3]. Although amyloidogenic monomers are natively unfolded (intrinsically disordered proteins, or IDPs) under physiological conditions, disease pathogenicity is now commonly associated with the assembly of toxic soluble oligomers [3, 4], including those of amyloid-β (Aβ) in Alzheimer's disease (AD) [5] and α-synuclein (αS) in Parkinson's disease (PD) [6]. Aβ exists in two major alloforms: the 40-residue Aβ40 and the more toxic and aggregation-prone 42-residue Aβ42 [7]. Aβ42 is made up of four distinct domains: a polar N-terminal region (NTR; residues 1–16), a central hydrophobic cluster (CHC; 17–21), a hydrophilic turn region (22–29), and a hydrophobic C-terminal region (CTR; 30–42). The larger 140-residue αS has three domains: the NTR (residues 1–60), a central hydrophobic non-amyloid-β component region (NAC; 61–95) [8], and an acidic CTR (96–140). Hydrophobic peptide–peptide interactions via the CHC in Aβ and the NAC in αS are critical for the self-assembly of these two amyloidogenic peptides [9, 10]. These hydrophobic domains are commonly referred to as the central hydrophobic domain (CHD), and we use this acronym throughout this chapter. The primordial phase of amyloidogenesis may present metastable partially folded intermediates [11, 12], commensurate with fibril formation [13, 14]. In particular, the propensity to aggregate into fibrils may well be encoded within the monomer dynamics and energetics [15–19]. A less-explored feature in this regard is the possible contribution of helical intermediates to modulating the aggregation pathway [20]. Previous experiments have reported the involvement of α-helical oligomeric intermediates en route to fibrils in early amyloidogenesis, including for Aβ and αS [20–28]. Partially folded helical monomers have also been reported to be aggregation prone and have been identified as crucial intermediates at the nascent stage of fibrillation for both Aβ and αS [27, 29]. It has further been proposed that intermediary helical states might populate the initial lag phase and subsequently convert to β-sheet oligomers and prefibrillar aggregates that accumulate downstream into mature fibrils [20, 28].
Both Aβ and αS display transiently populated, locally folded helical conformations in membrane-free aqueous solution [30, 31]. These can partake in helix–helix associations to form intermediary helical oligomers, events that may arise from conformational exchange between nonaggregating helically folded and completely unfolded (disordered) states. On the other hand, stable folded helical monomers are likely to resist the helix–helix associations that lead to helical oligomers [32]. As a logical consequence, optimizing the stability of predominantly folded helical monomers [27] may reduce the pool of toxic misfolded or partially folded intermediates for amyloidogenesis and consequently impede fibril formation. This hypothesis is supported by a number of experimental studies. For the Aβ42 monomer, stabilization of helicity in both the central hydrophobic region [33–35] and the N-terminal region [36] by small molecules has reduced pathogenic aggregation. For the αS monomer, the central hydrophobic non-amyloid-β component has been stabilized by small molecules into helical domains that resist aggregation on membranes [37] and in a membrane-free environment [38]. These studies strengthen the idea that monomers can exhibit transient helicity in solution. However, very little is known about the conformational distribution in the helical subspaces of Aβ42 and αS in aqueous solution, as it is extremely challenging to capture these elusive helices by experimental techniques alone [39, 40]. Modeling studies, from our group [41–45] and others [46–61], have helped map these subspaces. In this chapter, we delineate computational theories, strategies, methods, designs, and models to characterize the subtleties in the helical conformational subspaces of Aβ42 and αS that help answer two important questions: (1) What drives partially folded helices, but not fully folded or fully unfolded states, to aggregate? (2) Can the population of stabilized helical conformations be increased to make them more resistant to toxic oligomerization? In turn, we highlight the predictive power of diverse physical models (combinations of protein force field and explicit water model) with distinct helix–coil transition propensities, together with atomic-resolution information obtained from computational techniques including dynamic cross-correlation network analyses, free energy calculations, and networks of interaction maps. We use experimentally validated models [62–72] and benchmark our data against all available experimental data, in particular NMR [47, 73–77].

2 Materials and Methods

2.1 Rationale for Selection of Starting Helical Monomer Models

To generate an extensive distribution of helical subspaces, we monitored the unfolding of helices in Aβ42 and αS monomers, so the choice of starting helical models is critical for accurately sampling the phase space. The initial helical structures for equilibrium molecular dynamics (MD) simulations were obtained from PDB coordinates solved by solution NMR: Aβ42 (PDB ID: 1IYT [78]) solved in 20% water/80% deuterated hexafluoroisopropanol (HFIP) (see Fig. 1a(I)) and micelle-bound human αS (PDB ID: 1XQ8 [79]) solved in aqueous solution with sodium dodecyl sulphate (Fig. 1b(I)). Another NMR structure of Aβ42 (PDB ID: 1Z0Q [80]), solved in 70% water/30% HFIP, also shows α-helical propensity but has significantly less helical content than 1IYT (Fig. 1a(II)), and is thus more prone to α-helix to β-strand conversion as the water content increases. Since our main focus was on starting from a predominantly helical conformation, we chose the helix–kink–helix conformation captured in NMR structure 1IYT as our starting geometry for Aβ42. For αS, another broken helical structure (PDB ID: 2KKW [81]) was solved by NMR (Fig. 1b(II)); in this structure, αS is bound to a micelle of the detergent sodium lauroyl sarcosinate and displays helical regions similar to those of 1XQ8. The two helices of Aβ42 in model 1IYT span Ser8–Gly25 (Helix 1) and Lys28–Gly38 (Helix 2), while those of αS in model 1XQ8 span Val3–Val37 (Helix 1) and Lys45–Tyr92 (Helix 2). Figure 1 depicts all the available helical models of the two peptides, highlighting the helix–kink–helix models we used as starting geometries in our simulations.

Fig. 1 NMR structures of monomers. (a) Aβ42 solved in (I) 20% water/80% deuterated hexafluoroisopropanol (PDB code 1IYT [78]) and (II) 70% water/30% deuterated hexafluoroisopropanol (PDB code 1Z0Q [80]). (b) αS solved in (I) micelle-bound aqueous solution with sodium dodecyl sulphate (PDB code 1XQ8 [79]) and (II) sodium lauroyl sarcosinate micelle-bound aqueous solution (PDB code 2KKW [81]). The N-terminal, C-terminal, and central hydrophobic domains (CHC in Aβ42 and NAC in αS) of the peptides are colored red, green, and gray, respectively, while the additional turn region in Aβ42 is colored blue. The helix–kink–helix models selected for our simulations are shown inside red boxes, with the regions encompassing Helix 1 and Helix 2 marked. The full sequences of both peptides are given underneath the models


2.1.1 Role of N-Terminus in Amyloid-β Aggregation

Although the hydrophobic domains encompassing the central hydrophobic cluster (CHC) and the C-terminal region (CTR) are well documented for their aggregation-prone nature [9], the N-terminal region (NTR) remains largely unfolded in Aβ fibrils and does not form part of the fibril core [82, 83]. In stark contrast, the NTR of some fibril polymorphs can form structured β-strand motifs, as seen in recent NMR and cryo-EM studies [84, 85]. This is consistent with the more extended β-sheet structures observed for the NTR of Aβ42 than for that of Aβ40 [50, 86]. Sequence- and site-dependent shifts in β-hairpin populations in mutants with single substitutions in the NTR (such as A2T and A2V) further emphasize the role of the NTR in either promoting or inhibiting Aβ42 aggregation [87–90]. Interestingly, the NTR may also partake in allosteric stabilization of the β-hairpin of Aβ(17–34) [91].

2.1.2 Role of C-Terminus in α-Synuclein Aggregation

Amyloidogenic aggregation and cytotoxicity can be attributed mainly to the non-amyloid-β component (NAC) of αS [10]. However, the acidic C-terminal region (CTR) of αS may contribute significantly to regulating self-assembly [92, 93]. Previous studies report increased aggregation propensity for C-terminally truncated variants of αS compared with their full-length wild-type (WT) counterparts [94–96]. The CTR may participate in intrapeptide tertiary contacts that could resist pathogenic aggregation while preserving the native monomeric state [97, 98]. However, these contacts are also prevalent in PD-linked familial mutants [99]. Thus, the role of the CTR in modulating toxic αS aggregation requires further scrutiny.

2.2 Modeling the Helical Subspace

2.2.1 Employing the Predictive Power of Physical Models

A broad distribution of monomeric helical intermediates of Aβ42 and αS was obtained by extensive sampling across the helical subspace, employing the predictive power of ten physical models, i.e., combinations of force field and water model (FF/WM) from the CHARMM and Amber families. The FF/WM combinations employed are: CHARMM36 [62] with (1) CHARMM-modified TIP3P [63], (2) TIP4P [64], (3) TIP4P/2005 [65], (4) TIP4P-Ew [66], and (5) TIP4P-D [67]; (6) CHARMM22* [68] with TIP4P-D; CHARMM36m [69] with (7) CHARMM-modified TIP3P and (8) scaled CHARMM-modified TIP3P [69]; (9) Amber ff03 [70, 71] (A03) with TIP4P-D; and (10) Amber ff03ws [72] with scaled TIP4P/2005. The FF/WM combinations were chosen so that their distinct helix–coil transition propensities could identify not only short-lived helical states but also local helices that persist in water. Previous benchmarking studies have shown that the CHARMM22* protein force field represents model IDPs, including Aβ42 [53, 100], better than other force fields [101, 102], so here we combined CHARMM22* with the dispersion-corrected four-site TIP4P-D water model to favor extensive sampling of disordered states. On the other hand, we combined the more conventional CHARMM36 (mainly developed for folded/globular proteins) with three-site and four-site water models (including TIP4P-D) to sample more ordered states. The recently developed CHARMM36m, a protein force field designed to model both ordered and disordered protein states, combined with the two three-site water models, provides an additional useful model for sampling transitions between quasi-ordered states. Similar rules apply to the Amber FF variants: Amber ff03ws with scaled TIP4P/2005 is more suitable for enhancing helix unfolding to disordered states, whereas Amber ff03 is parameterized for ordered states, so we wanted to assess whether combining it with the TIP4P-D water model would expedite helix unfolding. The FF/WM combinations allowed us to generate different states of "foldedness": folded helical (F), partially folded helical (P), and unfolded (U) (discussed later in Sect. 2.2.3).

2.2.2 Model Relaxation, Dynamics, and Test for Convergence

Atomistic Equilibrium Molecular Dynamics Simulations

Equilibrium molecular dynamics (MD or EMD) simulation [103, 104] computationally mimics the natural "wiggling and jiggling" of atoms in order to calculate their equilibrium and dynamical properties and their interactions. MD simulations propagate atomic trajectories on a potential energy surface in a given ensemble (fixed temperature and fixed volume or pressure). The potential energy function (the force field) constitutes the so-called predictive model, as it estimates the energy of the model system from terms such as bond stretching, angle bending, and torsional rotations. For EMD simulations, the peptides were represented by the different FF/WM combinations and solvated in large water boxes, keeping a minimum distance of 20 Å between any protein atom and any box edge. EMD simulations were carried out using the Gromacs 5.1.4 [105] package with an integration time step of 2 fs and the velocity Verlet integrator [106], with bond lengths to hydrogen constrained using the LINCS [107] (protein) and SETTLE [108] (water) algorithms. Snapshots were saved every 20 ps, which ensured adequate sampling of conformations as assessed from the convergence of the simulations (discussed under "Assessment of Convergence in Simulations" below). Background ions were added to neutralize the protein formal charges and to mimic physiological ionic strength (0.15 M NaCl). Long-range electrostatics were treated by the particle mesh Ewald (PME) method [109] under periodic boundary conditions. Protein and nonprotein molecules (water and ions) were coupled separately to an external heat bath (310 K) with a coupling time constant of 1 ps using the velocity rescaling method [110]. All systems were minimized for 100 ps and equilibrated for 1 ns in the constant-volume NVT ensemble, followed by another 1 ns of NPT equilibration with a reference pressure of 1 bar and a time constant of 4 ps using the Berendsen barostat [111]. The production runs were carried out in the constant-pressure NPT ensemble. Simulations were run with each of the ten combinations of protein force fields and water models for full-length Aβ42 and αS, and for their terminal-deleted variants (on a subset of FF/WM combinations), amounting to a total run time of ~36 μs.

Hamiltonian Replica Exchange Simulations with Solute Scaling

In addition to EMD, we used Hamiltonian replica exchange simulations (HREX) with solute scaling (also known as REST2 [112]) for enhanced sampling of the conformational subspace. Standard temperature replica exchange MD (T-REMD, also known as parallel tempering) simulations have been used extensively to enhance the sampling of IDP conformations [113]. Multiple noninteracting replicas (N copies) of an IDP system are simulated concurrently at different temperatures (Ti), with periodic exchanges of conformations between replicas based on the Metropolis criterion [114, 115]. Simply put, the Metropolis criterion decides whether a proposed exchange is accepted or rejected, with an acceptance probability that depends only on the potential energies of the two replicas involved and the difference between their inverse temperatures. This allows structures that are accessible at higher temperatures, but would rarely be visited at physiological temperature (usually 310 K), to be sampled. The conformations at physiological temperature are then taken as the accurately sampled ensemble. However, the number of replicas required by standard T-REMD grows with system size as O(f^1/2), where f is the total number of degrees of freedom [116]. One way to bypass this poor scaling is to apply parallel tempering only to the solute (the protein in this case), keeping the solvent at physiological temperature. This method is known as Replica Exchange with Solute Tempering, or REST [117]. An improved version of REST, the more recent Hamiltonian scaling scheme or replica exchange with solute scaling, greatly enhances sampling efficiency [112]. This method scales the solute Hamiltonian across replicas rather than running the replicas at different temperatures. The Hamiltonian is the function giving the sum of the potential and kinetic energies of all the atoms in the simulation system.
Thus, the Lennard–Jones potential well depths (ε) of the protein atoms in the system can be scaled by different coupling parameters, or λ values, given by

$$\lambda = \frac{T[0]}{T[i]} \tag{19.1}$$

where T[0] is the initial temperature (the temperature of interest) and T[i] is each subsequent higher temperature, such that higher temperatures have lower λ values within the range 0–1. We used 24 replicas for Aβ42 and 48 replicas for αS between 310 and 500 K (with correspondingly scaled Hamiltonians).
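The λ ladder of Eq. 19.1 and the replica exchange test can be sketched in Python as below. This is a minimal illustration, not the GROMACS/PLUMED machinery used in practice: the geometric spacing of the effective temperatures is an assumption (the text gives only the 310–500 K range and the replica counts), and the helper names are hypothetical. Because every replica runs at T[0] and only the solute Hamiltonian is scaled, the Metropolis test compares the energies of the two configurations evaluated under both Hamiltonians at a single inverse temperature. In the usual REST2 formulation, solute–solute terms are scaled by λ and solute–solvent terms by √λ, but evaluating those energies is left to the MD engine, so here they are simply inputs.

```python
import math
import random

KB = 0.001987204  # Boltzmann constant in kcal/(mol K)


def geometric_ladder(t_min, t_max, n_replicas):
    """Effective temperatures spaced geometrically from t_min to t_max (assumed spacing)."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio**k for k in range(n_replicas)]


def solute_scaling(temps):
    """Eq. 19.1: lambda = T[0] / T[i]; equals 1 for the replica of interest."""
    return [temps[0] / t for t in temps]


def swap_probability(beta, u_i_xi, u_j_xj, u_i_xj, u_j_xi):
    """Metropolis acceptance for exchanging configurations x_i and x_j.

    u_i_xj is the potential energy of configuration x_j evaluated with
    Hamiltonian i. All replicas share beta = 1/(kB*T[0]) because only the
    solute Hamiltonian differs between replicas.
    """
    delta = beta * ((u_i_xj + u_j_xi) - (u_i_xi + u_j_xj))
    return 1.0 if delta <= 0.0 else math.exp(-delta)


def attempt_swap(prob, rng=random):
    """Accept the proposed swap with probability prob."""
    return rng.random() < prob


# 24 replicas between 310 and 500 K, as described for A-beta42.
temps = geometric_ladder(310.0, 500.0, 24)
lambdas = solute_scaling(temps)
beta = 1.0 / (KB * 310.0)

# Hypothetical energies (kcal/mol) for one neighbouring pair of replicas.
p = swap_probability(beta, -100.0, -98.0, -99.0, -98.5)
print(f"lambda range: {lambdas[0]:.2f} to {lambdas[-1]:.2f}; swap accepted: {attempt_swap(p)}")
```

Geometric spacing is a common default because, for a roughly constant heat capacity, it gives similar acceptance rates between neighbouring replicas; in a real setup the ladder would be tuned until the observed exchange rates are acceptable.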


Fig. 2 Comparison of the quality of helical conformational subspaces sampled by EMD and HREX through free energy mapping of % helicity vs. radius of gyration (Rg), for αS sampled with CHARMM22* in TIP4P-D water

The choice between EMD and HREX depends on the sampling requirements of the given system. For the studies of helical unfolding of amyloidogenic peptides presented here, we used EMD, since reaching thermodynamic equilibrium (to obtain equilibrium properties typical of a real system) provides adequate sampling when starting from a structured protein. However, for systems starting from a disordered IDP structure, in order to monitor transient helical folding and compare it with the unfolding simulations from a helical starting structure, we used HREX to validate the sampling space generated by conventional EMD, which requires extensive sampling over long timescales to reach convergence. Figure 2 shows that the free energy map of the helical subspace sampled by unfolding in EMD simulations matches that sampled by transient folding in HREX. The protocol for the HREX simulations is otherwise the same as that for EMD described in the previous section.

Assessment of Convergence in Simulations

The statistical quality of sampling in all-atom MD simulations is first assessed through convergence tests. Simply put, one has to determine how long the simulations must run before a certain property reaches a stable value. There are several methods to assess the convergence of simulation data and check for self-consistency [118]. In this work we used the following methods.
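Both convergence measures described below, the covariance overlap of Eq. 19.2 evaluated on bootstrapped blocks and the cumulative average of Eq. 19.3, can be sketched in Python with NumPy. This is not the LOOS [122] implementation used in this work: the function names are hypothetical, the coordinates are assumed to be backbone atoms already superposed on a common reference and stored as an array of shape (frames, atoms, 3), and drawing the frames of each block without replacement is an assumption about the bootstrap.

```python
import numpy as np


def pca(coords):
    """PCA of an (n_frames, n_atoms, 3) array of pre-aligned coordinates.

    Returns eigenvalues in descending order and eigenvectors as columns.
    """
    x = coords.reshape(len(coords), -1)
    cov = np.cov(x, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]


def covariance_overlap(evals_a, evecs_a, evals_b, evecs_b, n_modes):
    """Covariance overlap Omega_{A,B} (Eq. 19.2) over the top n_modes modes."""
    la = np.clip(evals_a[:n_modes], 0.0, None)
    lb = np.clip(evals_b[:n_modes], 0.0, None)
    va, vb = evecs_a[:, :n_modes], evecs_b[:, :n_modes]
    total = np.sum(la + lb)
    # sum_i sum_j sqrt(l_i^A l_j^B) (v_i^A . v_j^B)^2
    cross = np.sum(np.sqrt(np.outer(la, lb)) * (va.T @ vb) ** 2)
    return 1.0 - np.sqrt(max(total - 2.0 * cross, 0.0) / total)


def bootstrapped_block_overlap(coords, block_frames, n_modes, n_trials=20, seed=0):
    """Mean overlap between random, non-contiguous blocks and the full trajectory."""
    rng = np.random.default_rng(seed)
    full_vals, full_vecs = pca(coords)
    scores = []
    for _ in range(n_trials):
        idx = rng.choice(len(coords), size=block_frames, replace=False)
        vals, vecs = pca(coords[idx])
        scores.append(covariance_overlap(vals, vecs, full_vals, full_vecs, n_modes))
    return float(np.mean(scores))


def cumulative_average(values):
    """CA_n = (x_1 + ... + x_n) / n for every n (Eq. 19.3)."""
    values = np.asarray(values, dtype=float)
    return np.cumsum(values) / np.arange(1, len(values) + 1)


# Toy stand-in for 40 ns of aligned backbone frames saved every 20 ps.
rng = np.random.default_rng(1)
traj = rng.normal(size=(2000, 10, 3))
for block_frames in (500, 1000):  # 10 ns and 20 ns blocks (half the run)
    print(block_frames, round(bootstrapped_block_overlap(traj, block_frames, n_modes=10), 3))

helicity = [20.0, 30.0, 25.0, 25.0]  # toy per-frame helical content (%)
print(cumulative_average(helicity))  # [20. 25. 25. 25.]
```

With snapshots every 20 ps, a 10 ns block is 500 frames, so a real trajectory would be scanned in 500-frame increments up to half its length; a plateau of the mean overlap with increasing block size indicates self-consistent sampling.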

Bootstrapped Block Covariance Overlap Method

Adequate sampling of the helical IDPs by the different physical models was assessed by checking their self-consistency and convergence within the simulation time. In the Block Covariance Overlap Method [119] (BCOM), the trajectories are divided into equal-sized time blocks. Principal component analysis (PCA) is then performed on the conformational matrix (of backbone atoms) of each block, and the covariance overlap is estimated between the blocks and the full trajectory. The whole process is repeated for blocks of increasing size. Here, we used a variant of the original BCOM method, the Bootstrapped Block Covariance Overlap Method (BBCOM). BBCOM differs from BCOM in using a bootstrapping procedure in which the trajectory frames are selected randomly rather than contiguously, enabling a more robust assessment of convergence. The covariance overlap method [120, 121] assesses similarities in both eigenvectors and eigenvalues, which represent the directions of atomic motion and their relative significance, respectively. The overlap ranges from 0 to 1 according to the similarity of the fluctuations, with 0 being the least and 1 the most similar. The covariance overlap between the PCA of two ensembles A and B (Ω_A,B) can be expressed as [119]:

$$\Omega_{A,B} = 1 - \left[\frac{\sum_{i=1}^{N_{\mathrm{modes}}}\left(\lambda_i^A + \lambda_i^B\right) - 2\sum_{i=1}^{N_{\mathrm{modes}}}\sum_{j=1}^{N_{\mathrm{modes}}}\sqrt{\lambda_i^A \lambda_j^B}\left(\vec{v}_i^A \cdot \vec{v}_j^B\right)^2}{\sum_{i=1}^{N_{\mathrm{modes}}}\left(\lambda_i^A + \lambda_i^B\right)}\right]^{1/2} \tag{19.2}$$

where N_modes is the number of most significant modes of (backbone) atomic fluctuations obtained from PCA, λ_i^A is the ith eigenvalue from ensemble A, λ_i^B and λ_j^B are the ith and jth eigenvalues from ensemble B, v_i^A is the ith eigenvector from ensemble A, and v_j^B is the jth eigenvector from ensemble B. For this work, the Lightweight Object-Oriented Structure Library (LOOS) [122] was used to perform BBCOM. The convergence plots for the F, P, and U states of Aβ42 and αS obtained using BBCOM are shown in Fig. 3. Here, the trajectories were initially divided into equal blocks as small as 10 ns, and the block size was increased iteratively in 10 ns steps until half of the total trajectory/simulation time (two blocks) was reached.

Cumulative Averages of Secondary Structure Content

As opposed to block averages, which compute averages over a time interval or block, cumulative averages calculate the average of a property up to every time point of the simulation. In other words, the cumulative average is a running mean with equal weights for all measurements up to a given time, and can be expressed as

$$\mathrm{CA}_n = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} \tag{19.3}$$

where x_1, x_2, x_3, ..., x_n are the first n measurements of the property in question. We used this method to assess the temporal convergence of α-helical content (percentages) from both EMD and HREX simulations (Fig. 4).

Fig. 3 The temporal convergence or self-consistency of the conformational matrices (atom coordinates) of Aβ42 and αS in simulations, assessed by the Bootstrapped Block Covariance Overlap Method (BBCOM). All the folded (F), partially folded (P), and unfolded (U) states are shown. Conservative blocks of 10 ns were considered initially, with iterative increments of 10 ns up to half of the respective simulation time (which depends on the protein and the sampling run). Most datasets start to plateau after the first 200 ns of simulations

2.2.3 Characterisation of the Degree of Helical Folding/Unfolding

In order to generate extensive helical subspaces and characterize helical "foldedness" categorically, we adopted a strategy that monitors two key events: (a) unfolding and (b) preservation of helically folded states.

Fig. 4 Time convergence of the cumulative average helical content (%) for (a) equilibrium MD (EMD) with a subset of FF/WM combinations for Aβ42 and αS and (b) the replica at the Hamiltonian of interest (defined through the scaling of temperatures) in Hamiltonian replica exchange with solute scaling (HREX)

The fraction of native contacts, Q, measures how many of the contacts present in the native fold of a protein are retained as a function of time. A native contact is formed between nonconsecutive residues that have a pair of heavy atoms within a cutoff distance, typically 5 Å. Best et al. [123] showed that Q is a good folding coordinate for atomistic simulations. This measure is well suited to assessing the helical "foldedness" of these amyloidogenic peptides, as it maps their closeness not only to the folded helical state but also to the native IDP state. The fraction of native contacts for configuration X, Q(X), can be expressed as

$$Q(X) = \frac{1}{N}\sum_{(i,j)}\frac{1}{1 + \exp\left[\beta\left(r_{ij}(X) - \lambda r_{ij}^{0}\right)\right]} \tag{19.4}$$

where the sum runs over the N pairs of heavy atoms (i, j) that are in contact, i.e., whose distance is less than 5 Å and which are separated by at least 3 residues. r_ij(X) is the distance between i and j in configuration X, and r_ij^0 is their distance in the native state. β is a smoothing parameter, taken as 5 Å−1, and λ is a factor that accounts for fluctuations when the contact is formed, taken to be 1.8 for all-atom models. The details of this method are provided in [123]. Q(X) is a popular reaction coordinate for structured proteins [123, 124] and has been extensively employed as an order parameter to investigate coupled helical folding and binding in IDPs [125, 126]. Here we designed a metric for classifying the helical conformational subspace such that Q(X) ≥ 0.5 identifies folded (F) states, Q(X) = 0.1–0.5 partially folded (P) states, and Q(X) ≤ 0.1 unfolded (U) states [123], and discount random, very short-lived states with lifetimes 50% and are almost identical for both 5 and 4 Å interaction distance cutoffs

NTR interactions (marked by a red rectangle) in the F state (Fig. 13a). Hydrophobic interactions between the NTR (residue F4) and the CHC (L17 and A21) predominate, followed by salt bridges between the NTR and the turn region (E3–K28) and a minor population of hydrophobic contacts between the NTR and the CTR (F4–I31). These interactions are completely absent in the P and U states. The simultaneous involvement of the proximal NTR residue F4 with both the CHC and the CTR suggests that the NTR is collapsed in the F state. The E22/D23–K28 salt bridges can stabilize the loop region in Aβ42 fibrils [82]. It is thus expected that, with the formation of the E3–K28 salt bridge in the F state, the E22/D23–K28 salt bridge is disrupted, indicating that the F states tend to be less fibril-like and hence nonaggregating.

As for Aβ42, the tertiary contacts of the αS states were characterized (Fig. 13b). At the outset, the distribution of contacts seems quite diffuse. Given the structural heterogeneity of αS, with a long disordered CTR tail (45 residues), it is challenging to separate the dominant tertiary structural patterns according to fold propensity. However, two distinct patterns emerge that separate the F from the U states. First, the contact frequencies between the CTR and the rest of the domains (marked by a green rectangle) in the F states are truly antidiagonal, with densely populated NAC–CTR contacts (>60% probability) and no discernible NTR–NAC contacts. Second, the U states show a total absence of contacts between the CTR and either the NTR or the NAC, together with sparser NTR–NAC contacts. The prominence of CTR contacts is thus evident, with frequent interactions with the NAC region (green rectangle) seen for the F states (Fig. 13b). These interactions are predominantly a cluster of hydrophobic interactions between the NAC and a collapsed CTR (V74–I112/L113, V77–P108/I112, and A78–I112), with a secondary salt bridge between K80 and E110. The CTR interactions are completely absent in the P and U states, and their experimental characterisation is hampered by the propensity of αS to form helical multimers [26, 42, 58, 60, 156, 157].

Energetic Assessment of Interactions

In order to assess the impact of interdomain interactions on the overall folding and unfolding profiles, correlated changes were computed between domain–domain interaction energies and peptide conformational energies.
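A minimal sketch of this correlation analysis in Python is shown below. It assumes that the domain–domain interaction energies and E_MM/PBSA are available as paired series (for example per-frame values, or per-model averages such as those in Tables 3 and 4; which pairing underlies the reported coefficients is not spelled out here). The helper names and the toy numbers are hypothetical, not data from this chapter; `np.corrcoef(x, y)[0, 1]` would give the same coefficient.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two paired series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2)))


def domain_correlations(interaction_energies, conformational_energies):
    """PCC of each domain-pair interaction energy with E_MM/PBSA.

    interaction_energies: dict mapping a domain pair (e.g. "NTR-CHC") to a series
    conformational_energies: the matching series of E_MM/PBSA values
    """
    return {pair: round(pearson(e, conformational_energies), 2)
            for pair, e in interaction_energies.items()}


# Hypothetical toy series in kcal/mol (not results from this chapter).
e_conf = [-510.0, -500.0, -495.0, -480.0]
pairs = {
    "NTR-CHC": [-120.0, -100.0, -90.0, -60.0],
    "NTR-CTR": [-5.0, -12.0, -8.0, -10.0],
}
print(domain_correlations(pairs, e_conf))
```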

Computation of Conformational Energies

The conformational energy, E_MM/PBSA, was calculated by combining molecular mechanics energies with Poisson–Boltzmann continuum solvation (the MM/PBSA method [158]) using the Amber 12 [159, 160] package, and can be expressed as

$$E_{\mathrm{MM/PBSA}} = E_{\mathrm{MM}} + G_{\mathrm{solvation}} \tag{19.9}$$

where E_MM is the molecular mechanics potential energy in vacuum (gas phase), expressed as

$$E_{\mathrm{MM}} = E_{\mathrm{bonded}} + E_{\mathrm{vdW}} + E_{\mathrm{elec}} \tag{19.10}$$

where E_bonded is the covalent bonded interaction energy, including bond, angle, and dihedral terms, E_vdW is the van der Waals interaction energy described by the Lennard-Jones potential function, and E_elec is the electrostatic interaction energy described by the Coulomb potential function.


G_solvation in Eq. (19.9) is the solvation free energy, composed of two terms [161, 162]:

$$G_{\mathrm{solvation}} = G_{\mathrm{polar}} + G_{\mathrm{apolar}} \tag{19.11}$$

where G_polar is the electrostatic contribution to solvation, obtained by solving the Poisson–Boltzmann (PB) equation [163], and G_apolar is the nonpolar contribution, estimated from a linear relation to the solvent-accessible surface area (SASA) as

$$G_{\mathrm{apolar}} = \gamma \cdot \mathrm{SASA} + b \tag{19.12}$$

where γ is a coefficient set to the surface tension of the solvent and b is a fitting parameter.

Computation of Interaction Energies

The interaction energies comprise the nonbonded electrostatic and van der Waals energies between peptide domains, computed using the NAMD [164] energy plugin in VMD [165].
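The functional forms behind these nonbonded interaction energies can be sketched as below: a pairwise Coulomb term plus a CHARMM-style Lennard-Jones term (ε_ij = √(ε_i ε_j), R_min,ij = R_min/2,i + R_min/2,j) summed over all atom pairs between two domains. This is a simplified stand-in, not the NAMD energy plugin: there is no cutoff, no PME, and no exclusion of bonded pairs at domain boundaries; the Coulomb constant is approximate (force-field codes use slightly different values); and the `AtomGroup` container and function names are hypothetical.

```python
from dataclasses import dataclass

import numpy as np

COULOMB = 332.0636  # kcal*Angstrom/(mol*e^2), approximate


@dataclass
class AtomGroup:
    xyz: np.ndarray        # (n, 3) coordinates in Angstrom
    charge: np.ndarray     # (n,) partial charges in units of e
    epsilon: np.ndarray    # (n,) LJ well depths as positive numbers, kcal/mol
    rmin_half: np.ndarray  # (n,) Rmin/2 in Angstrom


def interaction_energy(a: AtomGroup, b: AtomGroup):
    """Coulomb and Lennard-Jones energies (kcal/mol) between two atom groups.

    Plain pairwise sums in vacuum: no cutoff, no PME, no bonded exclusions.
    """
    r = np.linalg.norm(a.xyz[:, None, :] - b.xyz[None, :, :], axis=-1)
    elec = COULOMB * np.sum(np.outer(a.charge, b.charge) / r)
    eps = np.sqrt(np.outer(a.epsilon, b.epsilon))      # geometric-mean well depth
    rmin = a.rmin_half[:, None] + b.rmin_half[None, :]  # arithmetic-sum Rmin
    ratio6 = (rmin / r) ** 6
    vdw = np.sum(eps * (ratio6**2 - 2.0 * ratio6))      # minimum of -eps at r = rmin
    return float(elec), float(vdw)


# Toy example: two opposite unit charges 4 Angstrom apart, at the LJ minimum.
grp_a = AtomGroup(np.array([[0.0, 0.0, 0.0]]), np.array([1.0]), np.array([0.1]), np.array([2.0]))
grp_b = AtomGroup(np.array([[4.0, 0.0, 0.0]]), np.array([-1.0]), np.array([0.1]), np.array([2.0]))
elec, vdw = interaction_energy(grp_a, grp_b)
print(f"E_elec = {elec:.2f} kcal/mol, E_vdW = {vdw:.2f} kcal/mol")
```

For real domains, each group would hold all atoms of one domain (for example NTR and CHC) taken from a trajectory frame, with charges and Lennard-Jones parameters from the force field used in that simulation.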

Relative Contribution of Domain–Domain Interactions to Overall Energies

The relative contributions of the domain–domain interactions were quantified by Pearson correlation coefficients (PCC) between the interaction energies and the conformational energies (E_MM/PBSA), separately for the F and P states and for the U states (see Tables 3 and 4). A positive correlation is recorded for all the F and P states of Aβ42 (Fig. 14a), with the interaction energies between the NTR and the amyloidogenic CHC showing the highest PCC (0.6, marked in red) with the conformational energies, in line with the relative populations estimated from the interaction maps (Fig. 13a). This is followed by modest correlations for CHC–turn (PCC = 0.4) and NTR–turn (PCC = 0.4), indicating a competition between the NTR and the amyloidogenic CHC to interact with the turn region. The population shift from amyloidogenic CHC–CTR interactions toward CHC–NTR interactions is confirmed by the negligible CHC–CTR correlation coefficient (0.1). Similarly, there is no significant correlation (PCC = 0.1) between the interaction energies of the termini (NTR–CTR) and the conformational energies. In summary, the order of decreasing correlation for the F and P states of Aβ42 is [NTR–CHC] > [CHC–turn] > [NTR–turn] > [turn–CTR] > [CHC–CTR] > [NTR–CTR]. By contrast, for the U states the interaction energies between the NTR and the amyloidogenic CHC show only a weak correlation (0.2) with the conformational energies, and all the interactions with the CTR show a negative correlation. The amyloidogenic CHC–CTR and the NTR–CTR interactions are highly anticorrelated, indicating that these interactions make no positive contribution to the overall structure, in part owing to the extension of the conformation during unfolding. The highest correlation among the U states, for NTR–turn (0.3), is weak compared with those of the F and P states, and the order decreases as [NTR–turn] > [NTR–CHC] > [CHC–turn] ≥ [turn–CTR] > [CHC–CTR] > [NTR–CTR].

Table 3 Average interaction energies, with standard errors in parentheses, between the interacting domains of the F/P and U states of Aβ42, and the Pearson correlation coefficient (PCC) of each with the total conformational energy (E_MM/PBSA), for the last 200 ns of simulations. Interaction energies were computed with the NAMD [164] energy plugin in VMD [165], retaining the residue charges of the respective force fields. E_MM/PBSA, calculated by the MM/PBSA [158] method, is composed of the molecular mechanics energy in vacuum and the solvation free energy. All energies are in kcal mol−1

FF/WM                      NTR–CHC   CHC–Turn  Turn–CTR  NTR–Turn  CHC–CTR  NTR–CTR  E_MM/PBSA
F or P states
Charmm36/mTIP3P            121 (0)   37 (0)    52 (10)   87 (2)    19 (6)   9 (3)    509 (3)
Charmm36/TIP4P             88 (17)   38 (3)    49 (2)    7 (7)     20 (0)   19 (0)   529 (6)
Charmm36/TIP4P/2005        83 (14)   19 (0)    22 (1)    3 (3)     10 (1)   13 (1)   497 (4)
Charmm36/TIP4P-Ew          67 (0)    35 (16)   27 (1)    32 (10)   24 (3)   12 (7)   488 (6)
Amber ff03/TIP4P-D         91 (1)    32 (4)    31 (0)    49 (12)   13 (3)   29 (11)  582 (16)
Charmm36/TIP4P-D           42 (1)    25 (1)    29 (5)    6 (3)     8 (13)   35 (0)   462 (2)
PCC                        0.6       0.4       0.2       0.3       0.1      0.1
U states
Charmm22*/TIP4P-D          53 (4)    17 (6)    69 (10)   12 (3)    17 (1)   18 (1)   453 (7)
Amber ff03ws/TIP4P/2005    81 (3)    16 (2)    41 (4)    9 (1)     2 (1)    2 (0)    594 (10)
Charmm36m/mTIP3P           88 (8)    17 (1)    44 (3)    2 (0)     30 (1)   11 (1)   449 (7)
Charmm36m/sTIP3P           88 (12)   9 (7)     28 (3)    2 (1)     9 (0)    7 (3)    472 (12)
PCC                        0.2       0.1       −0.3      0.3       −0.8     −0.8

For αS, on the other hand, the correlations between the interaction and conformational energies segregate the folded from the unfolded states more strongly (Fig. 14b). For the F and P states, the high correlation for the NAC–CTR interactions (PCC = 0.7) surpasses those of the weakly correlated NTR–NAC and NTR–CTR interactions, reflecting the high frequency of NAC interactions with a predominantly collapsed CTR. This is in accordance with the prominent hydrophobic interactions between the NAC and the CTR in the F states. However, the distant NTR–CTR interactions are slightly more favorable for αS (PCC = 0.3) than for Aβ42 (PCC = 0.1). The order of decreasing PCC for the F and P states is [NAC–CTR] > [NTR–CTR] > [NTR–NAC]. In the U states, the NAC–CTR interactions are negatively correlated (PCC = −0.1) with the conformational energies, showing that such interactions do not contribute toward the overall helical unfolding of αS.

Table 4 Average interaction energies, with standard errors in parentheses, between the interacting domains of the F/P and U states of αS, and the PCC of each with E_MM/PBSA, for the last 200 ns of simulations. All energies are in kcal mol−1

FF/WM                      NTR–NAC   NAC–CTR   NTR–CTR   E_MM/PBSA
F or P states
Charmm36/mTIP3P            47 (2)    138 (23)  3 (3)     1079 (9)
Charmm36/TIP4P             223 (9)   208 (2)   826 (34)  1195 (8)
Charmm36/TIP4P-Ew          160 (18)  165 (5)   271 (26)  1208 (14)
Charmm36m/mTIP3P           154 (2)   186 (23)  137 (35)  1219 (12)
Charmm36m/sTIP3P           202 (11)  70 (2)    430 (71)  1119 (16)
PCC                        0.2       0.7       0.3
U states
Charmm36/TIP4P/2005        130 (9)   58 (1)    240 (42)  966 (30)
Charmm36/TIP4P-D           73 (14)   49 (11)   135 (46)  943 (25)
Charmm22*/TIP4P-D          200 (12)  33 (4)    43 (8)    950 (5)
Amber ff03/TIP4P-D         150 (22)  70 (20)   143 (16)  2087 (116)
Amber ff03ws/TIP4P/2005    143 (3)   25 (6)    62 (51)   2376 (23)
PCC                        0.1       −0.1      0.3

2.4.2 Predicting the Effect of Terminal Deletions

To validate the finding that the terminal groups (NTR in Aβ42 and CTR in αS) affect the overall helical stability of the F states, we simulated terminal-deleted variants of both Aβ42 and αS (NTR deletion, NDEL, in Aβ42 and CTER deletion, CDEL, in αS; see Table 1) using a subset of physical models chosen according to the following criteria:
1. Two FF/WM models unfold helices completely in both peptides.
2. One FF/WM model retains helices in both peptides.
3. One FF/WM model completely unfolds helices in Aβ42 but preserves helices in αS.
We find that all four physical models completely unfold the helices of the terminal-deleted variants, and do so in a shorter time than the FF/WM model that unfolded the helices of the full-length peptides, indicating that terminal deletions may compromise the helical stability of the peptides. We further assessed the effect of terminal deletions on peptide stability by computing the conformational energies of the variants and comparing them with those of the full-length peptides. Figure 15 shows the sequence-length-normalized conformational energies of full-length Aβ42 and αS and their terminal-deleted variants for the chosen FF/WM combinations. For both NDEL Aβ42 (Fig. 15a) and CDEL αS (Fig. 15b), the energies become less favorable in three out of four models, indicating that the terminal groups protect the helical structures from unfolding. Taken together, we propose that the flexible terminal groups in both Aβ42 and αS optimize helical stability via primary hydrophobic interactions with the CHD and secondary salt-bridge formation with other domains, while screening the CHD from the surrounding solvent and potentially impeding aggregation. Such interactions were absent during complete helix unfolding, and the helix was destabilized in the terminal-deleted variants, suggesting a crucial role of the termini in protecting stable folded helices.

Fig. 14 Pearson correlation coefficient (PCC) between conformational energy and (a) NTR–CHC, CHC–Turn, Turn–CTR, NTR–Turn, CHC–CTR, and NTR–CTR interaction energies in Aβ42, and (b) NTR–NAC, NAC–CTR, and NTR–CTR interaction energies in αS. The standard error bars were estimated using block averages

2.5 Benchmarking Against NMR Observables

Comparing de novo MD simulation data of IDPs with the corresponding experimental NMR measurements is a standard procedure to validate the accuracy of the modelled ensemble. Since we assume ergodicity [166] (ensemble average = time average) in MD simulations, the trajectory-averaged NMR observables back-calculated from the simulations should correlate well with the averaged NMR observables obtained from experiments if the MD ensemble is physically realistic. We back-calculated these NMR observables from our simulations (of the terminal-deleted variants and their full-length counterparts) using the program SHIFTX2 [167] for chemical shifts (CS), MDTraj [168] with the Karplus equation [169, 170] for J-coupling constants, and the PALES [171] program for RDC constants. The errors in all back-calculated observables were estimated by block averaging [104, 172, 173]. For the helical peptide monomers investigated here, the secondary CS were calculated by subtracting the predicted random coil CS values (obtained with CamCoil [174]) from the back-calculated or experimental CS values:

$$\Delta\delta = \delta_{\text{observed}} - \delta_{\text{random coil}} \qquad (19.13)$$

where Δδ is the secondary CS of a given atom, δ_observed is the primary CS either calculated from simulations or obtained from experiments, and δ_random coil is the random coil CS predicted by CamCoil. The secondary CS is sensitive to IDP secondary structure: for example, a positive Cα secondary CS indicates α-helical structure, whereas a negative Cα secondary CS indicates β-strands; the opposite holds for Cβ atoms.
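A minimal Python sketch of Eq. 19.13 and of the sign rule above, together with the PCC and RMSD metrics reported in Tables 5 to 12. It assumes that per-residue shifts (in ppm) have already been exported from SHIFTX2 and CamCoil into plain arrays aligned residue by residue. The function names and the 0.5 ppm cutoff used to call a residue "coil" are hypothetical choices for illustration, not values taken from this chapter.

```python
import numpy as np

def secondary_shifts(observed, random_coil):
    """Eq. 19.13: per-residue secondary shift, delta_observed - delta_random_coil (ppm)."""
    return np.asarray(observed, dtype=float) - np.asarray(random_coil, dtype=float)

def classify(dd, atom="CA", cutoff=0.5):
    """Sign rule: positive CA (negative CB) secondary shifts -> helix 'H',
    negative CA (positive CB) -> strand 'E', |shift| <= cutoff -> coil 'C'.
    The 0.5 ppm cutoff is a hypothetical value chosen for illustration."""
    sign = {"CA": 1.0, "CB": -1.0}[atom]  # CB follows the opposite sign convention
    signed = np.asarray(dd, dtype=float) * sign
    labels = np.full(signed.shape, "C")
    labels[signed > cutoff] = "H"
    labels[signed < -cutoff] = "E"
    return "".join(labels)

def pcc_rmsd(calc, exp):
    """Pearson correlation coefficient and RMSD between back-calculated and experimental values."""
    calc = np.asarray(calc, dtype=float)
    exp = np.asarray(exp, dtype=float)
    pcc = np.corrcoef(calc, exp)[0, 1]
    rmsd = np.sqrt(np.mean((calc - exp) ** 2))
    return pcc, rmsd

# Toy numbers (not data from this chapter): C-alpha shifts of four residues.
dd_ca = secondary_shifts([58.9, 59.3, 52.0, 56.1], [56.0, 56.5, 55.0, 56.0])
print(classify(dd_ca))  # HHEC
print(pcc_rmsd([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

Because Pearson correlation is insensitive to a constant offset while RMSD is not, the two metrics are complementary, which is consistent with the observation that the CS PCCs are largely degenerate while the RMSDs separate the states.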


Fig. 15 Normalized conformational energies of the full-length (a) Aβ42 and (b) αS, and their respective terminal deleted variants

The ³J_HNHα coupling constants were back-calculated using the MDTraj [168] Python library, which uses the Karplus equation [170]:

$$J(\phi) = A\cos^{2}(\phi) + B\cos(\phi) + C \qquad (19.14)$$

where A = 6.51 Hz, B = −1.76 Hz, and C = 1.60 Hz are the parameters obtained from Vuister and Bax [175], and ϕ is the dihedral angle formed by HN, N, Cα, and Hα. The N–H RDC values were back-calculated from the simulations and best-fitted to the experimental RDCs using the PALES [171] program, after a global prediction of the alignment tensor from the experimental dipolar couplings. The fitted alignment tensor is therefore highly sensitive to the experimental conditions. RDCs are positive for ordered α-helices and negative for β-strands.
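The sketch below evaluates Eq. 19.14 with the Vuister and Bax coefficients quoted above and averages the coupling over trajectory frames, which is how an ensemble-averaged ³J_HNHα is compared with experiment under the ergodicity assumption. It takes backbone φ angles in degrees and applies the standard relation θ = φ − 60° to obtain the HN–N–Cα–Hα dihedral that the text calls ϕ. The array layout (frames × residues) and the function names are assumptions for illustration. MDTraj itself exposes `mdtraj.compute_J3_HN_HA`; its built-in `model` parameter sets should be checked against the coefficients one intends to use.

```python
import numpy as np

# Vuister & Bax coefficients quoted in the text, in Hz (B is negative).
A, B, C = 6.51, -1.76, 1.60

def karplus_j3(theta_deg):
    """Eq. 19.14 evaluated at the HN-N-CA-HA dihedral theta (degrees)."""
    c = np.cos(np.deg2rad(theta_deg))
    return A * c**2 + B * c + C

def ensemble_j3(phi_deg):
    """Frame-averaged 3J(HN,HA) per residue from backbone phi angles.

    phi_deg has shape (n_frames, n_residues); theta = phi - 60 degrees
    converts the backbone phi angle into the HN-N-CA-HA dihedral.
    """
    theta = np.asarray(phi_deg, dtype=float) - 60.0
    return karplus_j3(theta).mean(axis=0)

def rmsd(calc, exp):
    """Root-mean-square deviation in Hz, the metric used in Tables 13 and 14."""
    calc, exp = np.asarray(calc, dtype=float), np.asarray(exp, dtype=float)
    return float(np.sqrt(np.mean((calc - exp) ** 2)))

# Toy trajectory (not data from this chapter): residue 1 helical, residue 2 extended.
phi = np.array([[-60.0, -120.0],
                [-60.0, -120.0]])
print(ensemble_j3(phi))  # helical residue ~4.1 Hz, extended residue ~9.9 Hz
```

Averaging J over frames, rather than computing J from a frame-averaged angle, matters because the Karplus relation is nonlinear in the dihedral.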


For the calculation of CS and J-coupling constants, both the initial helically folded PDB structure and the experimental values for the predominantly disordered ensemble were used for comparison, in order to validate the relative helical fold propensities; for RDCs, only the experimental values for the disordered ensemble were used. Although the PCCs of the CS values are largely degenerate, we obtain a systematically lower RMSD of the U states from the predominantly disordered experimental ensemble, whereas the F states deviate less from the predominantly helical PDB structure (Tables 5–12). A much clearer distinction between the F and the U states is obtained from the RMSDs between the back-calculated three-bond scalar J-coupling constants and the experimental values (Tables 13 and 14). From the comparison of RDC values (Tables 15 and 16), we find that the U states are closer to the experimental values than the F states.

Table 5 Pearson correlation coefficient (PCC) and root mean square deviation (RMSD) of Cα chemical shifts between simulations (back-calculated with SHIFTX2 [167]) and experiments for the F and U states of Aβ42 that show the maximum difference. Two sets of experimental values were used for comparison: the fully helically folded PDB structure 1IYT [78] and the predominantly disordered ensemble obtained by Hou et al. [73]. The RMSD values showing the maximum distinction between states are shown in bold

δCα (ppm)                      1IYT (Exp.)          Hou et al. [73] (Exp.)
                               PCC    RMSD (ppm)    PCC    RMSD (ppm)
Charmm36/mTIP3P (F)            0.98   1.33          0.96   2.13
Charmm36m/mTIP3P (U)           0.97   2.20          0.98   1.09
Charmm22*/TIP4P-D (U)          0.98   2.49          0.99   0.59
Amber ff03ws/TIP4P/2005 (U)    0.96   2.25          0.97   1.24

Table 6 PCC and RMSD of Cβ chemical shifts between simulations and experiments for the F and U states of Aβ42 that show the maximum difference. The experimental values used for comparison are the helically folded PDB structure 1IYT [78] and the intrinsically disordered ensemble of Hou et al. [73]. The RMSD values showing the maximum distinction between states are shown in bold

δCβ (ppm)                      1IYT (Exp.)          Hou et al. [73] (Exp.)
                               PCC    RMSD (ppm)    PCC    RMSD (ppm)
Charmm36/mTIP3P (F)            1.00   0.94          1.00   1.24
Charmm36m/mTIP3P (U)           1.00   0.94          1.00   0.66
Charmm22*/TIP4P-D (U)          1.00   0.95          1.00   0.43
Amber ff03ws/TIP4P/2005 (U)    1.00   1.02          1.00   0.66

Table 7 PCC and RMSD of Hα chemical shifts between simulations and experiments for the F and U states of Aβ42

δHα (ppm)                      1IYT (Exp.)          Hou et al. [73] (Exp.)
                               PCC    RMSD (ppm)    PCC    RMSD (ppm)
Charmm36/mTIP3P (F)            0.89   0.15          0.72   0.24
Charmm36m/mTIP3P (U)           0.79   0.15          0.83   0.15
Charmm22*/TIP4P-D (U)          0.84   0.15          0.89   0.13
Amber ff03ws/TIP4P/2005 (U)    0.81   0.16          0.88   0.13

Table 8 PCC and RMSD of N chemical shifts between simulations and experiments for the F and U states of Aβ42

δN (ppm)                       1IYT (Exp.)          Hou et al. [73] (Exp.)
                               PCC    RMSD (ppm)    PCC    RMSD (ppm)
Charmm36/mTIP3P (F)            0.88   2.57          0.91   3.86
Charmm36m/mTIP3P (U)           0.89   2.88          0.93   2.60
Charmm22*/TIP4P-D (U)          0.94   2.67          0.96   1.90
Amber ff03ws/TIP4P/2005 (U)    0.92   2.86          0.93   2.38

Table 9 PCC and RMSD of Cα chemical shifts between simulations and experiments for the selected F and U states of αS. The two sets of experimental values used for comparison are the helically folded micelle-bound PDB structure 1XQ8 [79] and the predominantly disordered ensemble obtained by Porcari et al. [74]. The RMSD values showing the maximum distinction between states are shown in bold

δCα (ppm)                      1XQ8 (Exp.)          Porcari et al. [74] (Exp.)
                               PCC    RMSD (ppm)    PCC    RMSD (ppm)
Charmm36/mTIP3P (F)            0.98   1.53          0.97   1.53
Charmm36m/mTIP3P (U)           0.98   1.35          0.97   1.75
Charmm22*/TIP4P-D (U)          0.97   2.07          0.99   0.82
Amber ff03ws/TIP4P/2005 (U)    0.96   2.48          1.00   0.60

Table 10 PCC and RMSD of Cβ chemical shifts between simulations and experiments for the F and U states of αS that show the maximum difference. The experimental values used for comparison are the helically folded micelle-bound PDB structure 1XQ8 [79] and the predominantly disordered ensemble obtained by Porcari et al. [74]. The RMSD values showing the maximum distinction between states are shown in bold

δCβ (ppm)                      1XQ8 (Exp.)          Porcari et al. [74] (Exp.)
                               PCC    RMSD (ppm)    PCC    RMSD (ppm)
Charmm36/mTIP3P (F)            1.00   0.51          1.00   0.65
Charmm36m/mTIP3P (U)           1.00   0.37          1.00   0.60
Charmm22*/TIP4P-D (U)          1.00   0.67          1.00   0.60
Amber ff03ws/TIP4P/2005 (U)    1.00   1.00          1.00   0.49

Table 11 PCC and RMSD of Hα chemical shifts between simulations and experiments for the F and U states of αS

δHα (ppm)                      1XQ8 (Exp.)          Porcari et al. [74] (Exp.)
                               PCC    RMSD (ppm)    PCC    RMSD (ppm)
Charmm36/mTIP3P (F)            0.95   0.08          0.85   0.16
Charmm36m/mTIP3P (U)           0.92   0.10          0.80   0.18
Charmm22*/TIP4P-D (U)          0.67   0.18          0.75   0.17
Amber ff03ws/TIP4P/2005 (U)    0.83   0.19          0.90   0.07

Table 12 PCC and RMSD of N chemical shifts between simulations and experiments for the F and U states of αS

δN (ppm)                       1XQ8 (Exp.)          Porcari et al. [74] (Exp.)
                               PCC    RMSD (ppm)    PCC    RMSD (ppm)
Charmm36/mTIP3P (F)            0.94   1.83          0.93   3.01
Charmm36m/mTIP3P (U)           0.92   1.97          0.90   3.32
Charmm22*/TIP4P-D (U)          0.84   3.18          0.92   2.30
Amber ff03ws/TIP4P/2005 (U)    0.94   2.49          0.97   1.77

Table 13 PCC and RMSD between three-bond HN–Hα scalar J-coupling constants back-calculated from simulations (with MDTraj [168]) and experimental values for Aβ42. As for the chemical shifts, the back-calculated J-couplings were compared with two sets of experimental values: the fully folded PDB structure and the predominantly disordered ensemble obtained experimentally by Rosenman et al. [47]. The RMSD values showing the maximum distinction between states are shown in bold

³J_HNHα (Hz)                   1IYT (Exp.)          Rosenman et al. [47] (Exp.)
                               PCC    RMSD (Hz)     PCC    RMSD (Hz)
Charmm36/mTIP3P (F)            0.26   1.87          0.12   2.35
Charmm36m/mTIP3P (U)           0.05   2.69          0.22   1.36
Charmm22*/TIP4P-D (U)          0.22   2.87          0.53   1.10
Amber ff03ws/TIP4P/2005 (U)    0.07   2.65          0.31   1.25

Table 14 PCC and RMSD between back-calculated HN–Hα scalar J-coupling constants from simulations and experimental values for αS. The back-calculated J-couplings were compared with two sets of experimental values: the fully folded PDB structure and the predominantly disordered ensemble obtained experimentally by Schwalbe et al. [75]. The RMSD values showing the maximum distinction between states are shown in bold

³J_HNHα (Hz)                   1XQ8 (Exp.)          Schwalbe et al. [75] (Exp.)
                               PCC    RMSD (Hz)     PCC    RMSD (Hz)
Charmm36/mTIP3P (F)            0.31   1.80          0.06   2.37
Charmm36m/mTIP3P (U)           0.19   2.49          0.03   2.04
Charmm22*/TIP4P-D (U)          0.05   2.56          0.19   1.66
Amber ff03ws/TIP4P/2005 (U)    0.01   2.57          0.12   1.27

Table 15 PCC between residual dipolar coupling (RDC) constants back-calculated with the PALES [171] software and experimental values for Aβ42. Only the experimentally determined RDCs for the disordered ensemble were used for validation, because the alignment tensor used in RDC calculations depends heavily on the experimental conditions. Experimental RDC values were obtained from Yan et al. [76]. The highest PCC value (for the U states) is shown in bold

¹D_NH (Hz)                     Yan et al. [76] (Exp.) PCC
Charmm36/mTIP3P (F)            0.22
Charmm36m/mTIP3P (U)           0.33
Charmm22*/TIP4P-D (U)          0.30
Amber ff03ws/TIP4P/2005 (U)    0.35

Table 16 PCC between back-calculated RDC constants from simulations and experimental values for αS. Only the experimentally determined RDCs for the disordered ensemble were used for validation. Experimental RDC values were obtained from Narayanan et al. [77]. The highest PCC value (for the U states) is shown in bold

¹D_NH (Hz)                     Narayanan et al. [77] (Exp.) PCC
Charmm36/mTIP3P (F)            0.20
Charmm36m/mTIP3P (U)           0.33
Charmm22*/TIP4P-D (U)          0.72
Amber ff03ws/TIP4P/2005 (U)    0.74

2.6 Estimation of Error

The conventional procedure for estimating the standard error of the mean (Eq. 15) does not apply to data generated by MD simulations: these data are highly correlated and sparsely sample the population distribution, so treating them as independent would underestimate the actual error [176].

$$\sigma_{\bar{x}} = \left( \frac{\sum_{i=1}^{N} \left( x_i - \bar{x} \right)^{2}}{N(N-1)} \right)^{1/2} \qquad (19.15)$$

where σ_x̄ is the standard error of the mean, N is the number of conformations in the trajectory, and x_i is the value of the property for each simulated conformation, with mean x̄. In light of this shortcoming, we employed the block averaging procedure [104, 172, 173] for error estimation in our MD simulation data and in the back-calculated NMR observables. Block averaging is simple to implement for time-correlated trajectories (as in our case): first, the total simulation time is divided into N blocks of equal size τ_b such that

$$\tau_{\text{sim}} = N \tau_b \qquad (19.16)$$

where τ_sim is the total simulation time. The average of the measured property in each block can then be expressed as

$$\langle A \rangle_b = \frac{1}{\tau_b} \sum_{i=1}^{\tau_b} A_i \qquad (19.17)$$

where ⟨A⟩_b is the average for block b and A_i is the measured property at the ith step within that block. The block averages, which are correlated for small block sizes, become increasingly independent as the block size τ_b is increased. The idea is to find the threshold at which the block averages become uncorrelated, so that a suitable block size can be selected for computing errors. The variance of the block averages ⟨A⟩_b is then given by

$$\sigma^{2}_{\langle A \rangle_b} = \frac{1}{N} \sum_{b=1}^{N} \left( \langle A \rangle_b - \langle A \rangle_{\text{sim}} \right)^{2} \qquad (19.18)$$

where ⟨A⟩_sim is the average property of the entire simulated ensemble. The threshold value s for calculating errors of independent properties can then be obtained as

$$s = \lim_{\tau_b \to \infty} \frac{\tau_b \, \sigma^{2}_{\langle A \rangle_b}}{\sigma^{2}_{A}} \qquad (19.19)$$

where σ²_A is the variance of the property A. Finally, from this threshold value s (which is essentially the block size τ_b at which the block averages first become uncorrelated), the error σ_⟨A⟩sim can be estimated as

$$\sigma_{\langle A \rangle_{\text{sim}}} \approx \sigma_A \left( \frac{s}{M} \right)^{1/2} \qquad (19.20)$$

where M is the number of steps used in the simulation and σ_A is the standard deviation of the property A. Because M = Nτ_b, substituting s = τ_b σ²_⟨A⟩b / σ²_A (evaluated at a block size on the plateau) into Eq. 19.20 gives σ_⟨A⟩b / √N: the error calculated by the blocking procedure is the standard deviation of the block averages divided by the square root of the number of blocks, i.e., the standard error of the mean of the block averages.
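The blocking analysis of Eqs. 19.15 to 19.20 can be sketched in NumPy as follows. A synthetic first-order autoregressive (AR(1)) series stands in for a time-correlated MD observable; it is an illustrative assumption, not simulation data from this chapter, and the function names are hypothetical. For an AR(1) process with coefficient 0.9 the statistical inefficiency is (1 + 0.9)/(1 − 0.9) = 19, so the naive estimate of Eq. 19.15 should understate the error by roughly a factor of √19 ≈ 4.4 once the block size is large enough.

```python
import numpy as np

def naive_sem(x):
    """Eq. 19.15: standard error of the mean assuming independent samples."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return np.sqrt(np.sum((x - x.mean()) ** 2) / (n * (n - 1)))

def block_stats(x, block_size):
    """Eqs. 19.16-19.18: split x into N blocks of length tau_b (leftover frames
    are dropped), average each block, and return (N, variance of block averages)."""
    x = np.asarray(x, dtype=float)
    n_blocks = x.size // block_size
    block_means = x[: n_blocks * block_size].reshape(n_blocks, block_size).mean(axis=1)
    var_b = np.mean((block_means - block_means.mean()) ** 2)
    return n_blocks, var_b

def statistical_inefficiency(x, block_size):
    """Eq. 19.19 at a finite block size; it plateaus once block averages decorrelate."""
    _, var_b = block_stats(x, block_size)
    return block_size * var_b / np.var(x)

def blocked_error(x, block_size):
    """Eq. 19.20: sigma_A * sqrt(s / M), with M the number of samples."""
    s = statistical_inefficiency(x, block_size)
    return np.std(x) * np.sqrt(s / np.asarray(x).size)

# Synthetic AR(1) series as a stand-in for a correlated MD observable.
rng = np.random.default_rng(0)
phi, n = 0.9, 100_000
noise = rng.normal(size=n)
x = np.empty(n)
x[0] = noise[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + noise[t]

for tau_b in (1, 10, 100, 1000):
    print(tau_b, statistical_inefficiency(x, tau_b))
print("naive SEM:", naive_sem(x))
print("blocked error (tau_b = 1000):", blocked_error(x, 1000))
```

In practice one scans τ_b, as in the loop above, and reads off s where it stops growing; that plateau value is what enters Eq. 19.20. With M = Nτ_b, the blocked error equals the standard deviation of the block averages divided by √N.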

3 Conclusion

In this chapter, we have summarized the computational design strategies, methods, and predictive models used to characterize the aggregation propensity of amyloidogenic peptides (in particular amyloid-β42 and α-synuclein) in helical subspaces. By highlighting the predictive power of diverse physical models used in atomistic simulations, we identify possible long-range allosteric regulation of the aggregation of short-lived helical monomers, as well as the short-range intrapeptide contacts that stabilize long-lived helical conformations. These unprecedentedly detailed models suggest strategies to curb oligomerization of locally folded helices by targeting the flexible termini of amyloidogenic peptides, and further illustrate the potential of well-benchmarked predictive computational modelling to identify new leads for experiment.


References 1. Bhattacharya S, Xu L, Thompson D (2018) Revisiting the earliest signatures of amyloidogenesis: roadmaps emerging from computational modeling and experiment. Wiley Interdiscip Rev Comput Mol Sci 8:e1359 2. Ke PC et al (2017) Implications of peptide assemblies in amyloid diseases. Chem Soc Rev 46:6492–6531 3. Haass C, Selkoe DJ (2007) Soluble protein oligomers in neurodegeneration: lessons from the Alzheimer’s amyloid beta-peptide. Nat Rev Mol Cell Biol 8:101–112 4. Ono K, Condron MM, Teplow DB (2009) Structure-neurotoxicity relationships of amyloid beta-protein oligomers. Proc Natl Acad Sci U S A 106:14745–14750 5. Nasica-Labouze J et al (2015) Amyloid β protein and Alzheimer’s disease: when computer simulations complement experimental studies. Chem Rev 115:3518–3563 6. Maries E, Dass B, Collier TJ, Kordower JH, Steece-Collier K (2003) The role of alphasynuclein in Parkinson’s disease: insights from animal models. Nat Rev Neurosci 4:727–738 7. Yan Y, Wang C (2006) Aβ42 is more rigid than Aβ40 at the C terminus: implications for Aβ aggregation and toxicity. J Mol Biol 364:853–862 8. Hashimoto M, Takenouchi T, Mallory M, Masliah E, Takeda A (2000) The role of NAC in amyloidogenesis in Alzheimer’s disease. Am J Pathol 156:734–736 9. Liu R, McAllister C, Lyubchenko Y, Sierks MR (2004) Residues 17-20 and 30-35 of beta-amyloid play critical roles in aggregation. J Neurosci Res 75:162–171 10. Periquet M, Fulga T, Myllykangas L, Schlossmacher MG, Feany MB (2007) Aggregated alpha-synuclein mediates dopaminergic neurotoxicity in vivo. J Neurosci 27:3338–3346 11. Uversky VN, Lee HJ, Li J, Fink AL, Lee SJ (2001) Stabilization of partially folded conformation during alpha-synuclein oligomerization in both purified and cytosolic preparations. J Biol Chem 276:43495–43498 12. Ma B, Nussinov R (2006) The stability of monomeric intermediates controls amyloid formation: Abeta25-35 and its N27Q mutant. Biophys J 90:3365–3374 13. 
Uversky VN, Li J, Fink AL (2001) Evidence for a partially folded intermediate in alpha-

synuclein fibril formation. J Biol Chem 276:10737–10744 14. Khurana R et al (2001) Partially folded intermediates as critical precursors of light chain amyloid fibrils and amorphous aggregates. Biochemistry 40:3525–3535 15. Lapidus LJ (2013) Understanding protein aggregation from the view of monomer dynamics. Mol BioSyst 9:29–35 16. Acharya S, Srivastava KR, Nagarajan S, Lapidus LJ (2016) Monomer dynamics of Alzheimer peptides and kinetic control of early aggregation in Alzheimer’s disease. ChemPhysChem 17:3470–3479 17. Acharya S, Saha S, Ahmad B, Lapidus LJ (2015) Effects of mutations on the reconfiguration rate of alpha-synuclein. J Phys Chem B 119:15443–15450 18. Zhuravlev PI, Reddy G, Straub JE, Thirumalai D (2014) Propensity to form amyloid fibrils is encoded as excitations in the free energy landscape of monomeric proteins. J Mol Biol 426:2653–2666 19. Chakraborty D, Straub JE, Thirumalai D (2020) Differences in the free energies between the excited states of Aβ40 and Aβ42 monomers encode their distinct aggregation propensities. bioRxiv 20. Abedini A, Raleigh DP (2009) A role for helical intermediates in amyloid formation by natively unfolded polypeptides? Phys Biol 6:015005 21. Abedini A, Cao P, Raleigh DP (2016) Detection of helical intermediates during amyloid formation by intrinsically disordered polypeptides and proteins. Methods Mol Biol 1345:55–66 22. Kirkitadze MD, Condron MM, Teplow DB (2001) Identification and characterization of key kinetic intermediates in amyloid beta-protein fibrillogenesis. J Mol Biol 312:1103–1119 23. Misra P, Kodali R, Chemuru S, Kar K, Wetzel R (2016) Rapid alpha-oligomer formation mediated by the Abeta C terminus initiates an amyloid assembly pathway. Nat Commun 7:12419 24. Dammers C et al (2017) Pyroglutamatemodified amyloid-beta(3-42) shows alphahelical intermediates before amyloid formation. Biophys J 112:1621–1633

Characterization of Amyloidogenic Peptide Aggregability in Helical Subspace 25. Apetri MM, Maiti NC, Zagorski MG, Carey PR, Anderson VE (2006) Secondary structure of alpha-synuclein oligomers: characterization by raman and atomic force microscopy. J Mol Biol 355:63–71 26. Ghosh D et al (2015) Structure based aggregation studies reveal the presence of helix-rich intermediate during alpha-synuclein aggregation. Sci Rep 5:9228 27. Fezoui Y, Teplow DB (2002) Kinetic studies of amyloid beta-protein fibril assembly. Differential effects of alpha-helix stabilization. J Biol Chem 277:36948–36954 28. Liu G et al (2010) Mechanistic studies of peptide self-assembly: transient alpha-helices to stable beta-sheets. J Am Chem Soc 132:18223–18232 29. Anderson VL, Ramlall TF, Rospigliosi CC, Webb WW, Eliezer D (2010) Identification of a helical intermediate in trifluoroethanolinduced alpha-synuclein aggregation. Proc Natl Acad Sci U S A 107:18850–18855 30. Rovere M, Sanderson JB, Fonseca-Ornelas L, Patel DS, Bartels T (2018) Refolding of helical soluble α-synuclein through transient interaction with lipid interfaces. FEBS Lett 592(9):1464–1472 31. Vivekanandan S, Brender JR, Lee SY, Ramamoorthy A (2011) A partially folded structure of amyloid-beta(1-40) in an aqueous environment. Biochem Biophys Res Commun 411:312–316 32. Abedini A, Raleigh DP (2009) A critical assessment of the role of helical intermediates in amyloid formation by natively unfolded proteins and polypeptides. Protein Eng Des Sel 22:453–459 33. Nerelius C et al (2009) Alpha-helix targeting reduces amyloid-beta peptide toxicity. Proc Natl Acad Sci U S A 106:9191–9196 34. Kumar S, Hamilton AD (2017) alpha-Helix mimetics as modulators of Abeta selfassembly. J Am Chem Soc 139:5744–5755 35. Crespi GA, Hermans SJ, Parker MW, Miles LA (2015) Molecular basis for mid-region amyloid-beta capture by leading Alzheimer’s disease immunotherapies. Sci Rep 5:9649 36. 
Miles LA, Crespi GA, Doughty L, Parker MW (2013) Bapineuzumab captures the N-terminus of the Alzheimer’s disease amyloid-beta peptide in a helical conformation. Sci Rep 3:1302 37. Fonseca-Ornelas L et al (2014) Small molecule-mediated stabilization of vesicleassociated helical alpha-synuclein inhibits pathogenic misfolding and aggregation. Nat Commun 5:5857

443

38. Fonseca-Ornelas L et al (2017) Smallmolecule-induced soluble oligomers of alpha-synuclein with helical structure. Chemistry 23:13010–13014 39. Roche J, Shen Y, Lee JH, Ying J, Bax A (2016) Monomeric Abeta(1-40) and Abeta (1-42) peptides in solution adopt very similar ramachandran map distributions that closely resemble random coil. Biochemistry 55:762–775 40. Waudby CA et al (2013) In-cell NMR characterization of the secondary structure populations of a disordered conformation of alphasynuclein within E. coli cells. PLoS One 8: e72286 41. Bhattacharya S, Xu L, Thompson D (2019) Molecular simulations reveal terminal group mediated stabilization of helical conformers in both amyloid-β42 and α-synuclein. ACS Chem Neurosci 10(6):2830–2842 42. Xu L, Bhattacharya S, Thompson D (2018) Re-designing the alpha-synuclein tetramer. Chem Commun (Camb) 54:8080–8083 43. Xu L, Bhattacharya S, Thompson D (2019) On the ubiquity of helical alpha-synuclein tetramers. Phys Chem Chem Phys 21:12036–12043 44. Bhattacharya S, Xu L, Thompson D (2020) Long-range regulation of partially folded amyloidogenic peptides. Sci Rep 10:7597 45. Nirmalraj PN et al (2020) Complete aggregation pathway of amyloid β (1-40) and (1-42) resolved on an atomically clean interface. Sci Adv 6:eaaz6014 46. Riccardi L, Nguyen PH, Stock G (2012) Construction of the free energy landscape of peptide aggregation from molecular dynamics simulations. J Chem Theory Comput 8:1471–1479 47. Rosenman DJ, Connors CR, Chen W, Wang C, Garcia AE (2013) Abeta monomers transiently sample oligomer and fibril-like configurations: ensemble characterization using a combined MD/NMR approach. J Mol Biol 425:3338–3359 48. Zheng W, Tsai MY, Chen M, Wolynes PG (2016) Exploring the aggregation free energy landscape of the amyloid-beta protein (1-40). Proc Natl Acad Sci U S A 113:11835–11840 49. Zheng W, Tsai MY, Wolynes PG (2017) Comparing the aggregation free energy landscapes of amyloid beta(1-42) and amyloid beta(1-40). 
J Am Chem Soc 139:16666–16676 50. Yang M, Teplow DB (2008) Amyloid β-protein monomer folding: free energy

444

Shayon Bhattacharya et al.

surfaces reveal alloform specific differences. J Mol Biol 384:450–464 51. Lin YS, Bowman GR, Beauchamp KA, Pande VS (2012) Investigating how peptide length and a pathogenic mutation modify the structural ensemble of amyloid beta monomer. Biophys J 102:315–324 52. Sgourakis NG et al (2011) Atomic-level characterization of the ensemble of the Abeta (1-42) monomer in water using unbiased molecular dynamics simulations and spectral algorithms. J Mol Biol 405:570–583 53. Rosenman DJ, Wang C, Garcia AE (2016) Characterization of Abeta monomers through the convergence of ensemble properties among simulations with multiple force fields. J Phys Chem B 120:259–277 54. Allison JR, Rivers RC, Christodoulou JC, Vendruscolo M, Dobson CM (2014) A relationship between the transient structure in the monomeric state and the aggregation propensities of alpha-synuclein and beta-synuclein. Biochemistry 53:7170–7183 55. Allison JR, Varnai P, Dobson CM, Vendruscolo M (2009) Determination of the free energy landscape of alpha-synuclein using spin label nuclear magnetic resonance measurements. J Am Chem Soc 131:18314–18326 56. Dedmon MM, Lindorff-Larsen K, Christodoulou J, Vendruscolo M, Dobson CM (2005) Mapping long-range interactions in alpha-synuclein using spin-label NMR and ensemble molecular dynamics simulations. J Am Chem Soc 127:476–477 57. Wise-Scira O, Aloglu AK, Dunn A, Sakallioglu IT, Coskuner O (2013) Structures and free energy landscapes of the wild-type and A30P mutant-type alpha-synuclein proteins with dynamics. ACS Chem Neurosci 4:486–497 58. Gurry T et al (2013) The dynamic structure of alpha-synuclein multimers. J Am Chem Soc 135:3865–3872 59. Rossetti G et al (2016) Conformational ensemble of human alpha-synuclein physiological form predicted by molecular simulations. Phys Chem Chem Phys 18:5702–5706 60. Cote Y, Delarue P, Scheraga HA, Senet P, Maisuradze GG (2018) From a highly disordered to a metastable state: uncovering insights of alpha-synuclein. 
ACS Chem Neurosci 9:1051–1065 61. Kukharenko O, Sawade K, Steuer J, Peter C (2016) Using dimensionality reduction to systematically expand conformational

sampling of intrinsically disordered peptides. J Chem Theory Comput 12:4726–4734 62. Best RB et al (2012) Optimization of the additive CHARMM all-atom protein force field targeting improved sampling of the backbone phi, psi and side-chain chi(1) and chi (2) dihedral angles. J Chem Theory Comput 8:3257–3273 63. MacKerell AD et al (1998) All-atom empirical potential for molecular modeling and dynamics studies of proteins. J Phys Chem B 102:3586–3616 64. Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML (1983) Comparison of simple potential functions for simulating liquid water. J Chem Phys 79:926–935 65. Abascal JL, Vega C (2005) A general purpose model for the condensed phases of water: TIP4P/2005. J Chem Phys 123:234505 66. Horn HW et al (2004) Development of an improved four-site water model for biomolecular simulations: TIP4P-Ew. J Chem Phys 120:9665–9678 67. Piana S, Donchev AG, Robustelli P, Shaw DE (2015) Water dispersion interactions strongly influence simulated structural properties of disordered protein states. J Phys Chem B 119:5113–5123 68. Piana S, Lindorff-Larsen K, Shaw DE (2011) How robust are protein folding simulations with respect to force field parameterization? Biophys J 100:L47–L49 69. Huang J et al (2017) CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat Methods 14:71–73 70. Duan Y et al (2003) A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations. J Comput Chem 24:1999–2012 71. Hornak V et al (2006) Comparison of multiple Amber force fields and development of improved protein backbone parameters. Proteins 65:712–725 72. Best RB, Zheng W, Mittal J (2014) Balanced protein–water interactions improve properties of disordered proteins and non-specific protein association. J Chem Theory Comput 10:5113–5124 73. 
Hou L et al (2004) Solution NMR studies of the A beta(1-40) and A beta(1-42) peptides establish that the Met35 oxidation state affects the mechanism of amyloid formation. J Am Chem Soc 126:1992–2005 74. Porcari R et al (2015) The H50Q mutation induces a 10-fold decrease in the solubility of

Characterization of Amyloidogenic Peptide Aggregability in Helical Subspace alpha-synuclein. J Biol Chem 290:2395–2404 75. Schwalbe M et al (2014) Predictive atomic resolution descriptions of intrinsically disordered hTau40 and alpha-synuclein in solution from NMR and small angle scattering. Structure 22:238–249 76. Yan Y, McCallum SA, Wang C (2008) M35 oxidation induces Abeta40-like structural and dynamical changes in Abeta42. J Am Chem Soc 130:5394–5395 77. Narayanan C, Weinstock DS, Wu K-P, Baum J, Levy RM (2012) Investigation of the polymeric properties of α-synuclein and comparison with NMR experiments: a replica exchange molecular dynamics study. J Chem Theory Comput 8:3929–3942 78. Crescenzi O et al (2002) Solution structure of the Alzheimer amyloid beta-peptide (1-42) in an apolar microenvironment. Similarity with a virus fusion domain. Eur J Biochem 269:5642–5648 79. Ulmer TS, Bax A, Cole NB, Nussbaum RL (2005) Structure and dynamics of micellebound human alpha-synuclein. J Biol Chem 280:9595–9603 80. Tomaselli S et al (2006) The alpha-to-beta conformational transition of Alzheimer’s Abeta-(1-42) peptide in aqueous media is reversible: a step by step conformational analysis suggests the location of beta conformation seeding. Chembiochem 7:257–267 81. Rao JN, Jao CC, Hegde BG, Langen R, Ulmer TS (2010) A combinatorial NMR and EPR approach for evaluating the structural ensemble of partially folded proteins. J Am Chem Soc 132:8657–8668 82. Luhrs T et al (2005) 3D structure of Alzheimer’s amyloid-beta(1-42) fibrils. Proc Natl Acad Sci U S A 102:17342–17347 83. Colvin MT et al (2016) Atomic resolution structure of monomorphic Abeta42 amyloid fibrils. J Am Chem Soc 138:9663–9674 84. Walti MA et al (2016) Atomic-resolution structure of a disease-relevant Abeta(1-42) amyloid fibril. Proc Natl Acad Sci U S A 113:E4976–E4984 85. Gremer L et al (2017) Fibril structure of amyloid-beta(1-42) by cryo-electron microscopy. Science 358:116–119 86. 
Sgourakis NG, Yan Y, McCallum SA, Wang C, Garcia AE (2007) The Alzheimer’s peptides Abeta40 and 42 adopt distinct conformations in water: a combined MD/NMR study. J Mol Biol 368:1448–1457 87. Xu L, Chen Y, Wang X (2014) Dual effects of familial Alzheimer’s disease mutations (D7H, D7N, and H6R) on amyloid β peptide: correlation dynamics and zinc binding. Proteins 82:3286–3297 88. Das P, Murray B, Belfort G (2015) Alzheimer’s protective A2T mutation changes the conformational landscape of the Abeta(1-42) monomer differently than does the A2V mutation. Biophys J 108:738–747 89. Das P, Chacko AR, Belfort G (2017) Alzheimer’s protective cross-interaction between wild-type and A2T variants alters Abeta42 dimer structure. ACS Chem Neurosci 8:606–618 90. Chakraborty S, Das P (2017) Emergence of alternative structures in amyloid beta 1-42 monomeric landscape by N-terminal hexapeptide amyloid inhibitors. Sci Rep 7:9941 91. Xu L, Nussinov R, Ma B (2016) Allosteric stabilization of the amyloid-beta peptide hairpin by the fluctuating N-terminal. Chem Commun (Camb) 52:1733–1736 92. Levitan K et al (2011) Conserved C-terminal charge exerts a profound influence on the aggregation rate of alpha-synuclein. J Mol Biol 411:329–333 93. Afitska K, Fucikova A, Shvadchak VV, Yushchenko DA (2017) Modification of C terminus provides new insights into the mechanism of alpha-synuclein aggregation. Biophys J 113:2182–2191 94. Li W et al (2005) Aggregation promoting C-terminal truncation of alpha-synuclein is a normal cellular process and is enhanced by the familial Parkinson’s disease-linked mutations. Proc Natl Acad Sci U S A 102:2162–2167 95. Muntane G, Ferrer I, Martinez-Vicente M (2012) alpha-Synuclein phosphorylation and truncation are normal events in the adult human brain. Neuroscience 200:106–119 96. Iyer A et al (2017) C-terminal truncated alpha-synuclein fibrils contain strongly twisted beta-sheets. J Am Chem Soc 139:15392–15400 97. Bertoncini CW et al (2005) Release of long-range tertiary interactions potentiates aggregation of natively unstructured alpha-synuclein. Proc Natl Acad Sci U S A 102:1430–1435 98. 
Hong DP, Xiong W, Chang JY, Jiang C (2011) The role of the C-terminus of human alpha-synuclein: intra-disulfide bonds between the C-terminus and other regions stabilize non-fibrillar monomeric isomers. FEBS Lett 585:561–566 99. McClendon S, Rospigliosi CC, Eliezer D (2009) Charge neutralization and collapse of the C-terminal tail of alpha-synuclein at low pH. Protein Sci 18:1531–1540 100. Carballo-Pacheco M, Strodel B (2017) Comparison of force fields for Alzheimer’s Aβ42: a case study for intrinsically disordered proteins. Protein Sci 26:174–185 101. Rauscher S et al (2015) Structural ensembles of intrinsically disordered proteins depend strongly on force field: a comparison to experiment. J Chem Theory Comput 11:5513–5524 102. Hoffmann KQ, McGovern M, Chiu CC, de Pablo JJ (2015) Secondary structure of rat and human amylin across force fields. PLoS One 10:e0134091 103. Tuckerman M (2010) Statistical mechanics: theory and molecular simulation. Oxford University Press, Oxford 104. Leach AR (2001) Molecular modelling: principles and applications. Pearson Education, Harlow 105. Berendsen HJ, van der Spoel D, van Drunen R (1995) GROMACS: a message-passing parallel molecular dynamics implementation. Comput Phys Commun 91:43–56 106. Swope WC, Andersen HC, Berens PH, Wilson KR (1982) A computer simulation method for the calculation of equilibrium constants for the formation of physical clusters of molecules: application to small water clusters. J Chem Phys 76:637–649 107. Hess B, Bekker H, Berendsen HJC, Fraaije JGEM (1997) LINCS: a linear constraint solver for molecular simulations. J Comput Chem 18:1463–1472 108. Miyamoto S, Kollman PA (1992) SETTLE: an analytical version of the SHAKE and RATTLE algorithm for rigid water models. J Comput Chem 13:952–962 109. Darden T, York D, Pedersen L (1993) Particle mesh Ewald: an N·log(N) method for Ewald sums in large systems. J Chem Phys 98:10089–10092 110. Bussi G, Donadio D, Parrinello M (2007) Canonical sampling through velocity rescaling. J Chem Phys 126:014101 111. Berendsen HJ, Postma JV, van Gunsteren WF, DiNola A, Haak J (1984) Molecular dynamics with coupling to an external bath. J Chem Phys 81:3684–3690 112. 
Wang L, Friesner RA, Berne BJ (2011) Replica exchange with solute scaling: a more efficient version of replica exchange with solute tempering (REST2). J Phys Chem B 115:9431–9438

113. Chebaro Y, Ballard AJ, Chakraborty D, Wales DJ (2015) Intrinsically disordered energy landscapes. Sci Rep 5:10386 114. Hansmann UH (1997) Parallel tempering algorithm for conformational studies of biological molecules. Chem Phys Lett 281:140–150 115. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21:1087–1092 116. Fukunishi H, Watanabe O, Takada S (2002) On the Hamiltonian replica exchange method for efficient sampling of biomolecular systems: application to protein structure prediction. J Chem Phys 116:9058–9067 117. Liu P, Kim B, Friesner RA, Berne BJ (2005) Replica exchange with solute tempering: a method for sampling biological systems in explicit water. Proc Natl Acad Sci U S A 102:13749–13754 118. Sawle L, Ghosh K (2016) Convergence of molecular dynamics simulation of protein native states: feasibility vs self-consistency dilemma. J Chem Theory Comput 12:861–869 119. Romo TD, Grossfield A (2011) Block covariance overlap method and convergence in molecular dynamics simulation. J Chem Theory Comput 7:2464–2472 120. Hess B (2002) Convergence of sampling in protein simulations. Phys Rev E Stat Nonlinear Soft Matter Phys 65:031910 121. Grossfield A, Feller SE, Pitman MC (2007) Convergence of molecular dynamics simulations of membrane proteins. Proteins 67:31–40 122. Romo TD, Grossfield A (2013) LOOS: a tool for making new tools for analyzing molecular simulations. Biophys J 104:504a 123. Best RB, Hummer G, Eaton WA (2013) Native contacts determine protein folding mechanisms in atomistic simulations. Proc Natl Acad Sci U S A 110:17874–17879 124. Meshkin H, Zhu F (2017) Thermodynamics of protein folding studied by umbrella sampling along a reaction coordinate of native contacts. J Chem Theory Comput 13:2086–2097 125. Turjanski AG, Gutkind JS, Best RB, Hummer G (2008) Binding-induced folding of a natively unstructured transcription factor. 
PLoS Comput Biol 4:e1000060 126. Ithuralde RE, Roitberg AE, Turjanski AG (2016) Structured and Unstructured binding of an intrinsically disordered protein as revealed by atomistic simulations. J Am Chem Soc 138:8742–8751 127. Daura X et al (1999) Peptide folding: when simulation meets experiment. Angew Chem Int Ed 38:236–240 128. Tuttle MD et al (2016) Solid-state NMR structure of a pathogenic fibril of full-length human alpha-synuclein. Nat Struct Mol Biol 23:409–415 129. Xu L, Bhattacharya S, Thompson D (2018) The fold preference and thermodynamic stability of alpha-synuclein fibrils is encoded in the non-amyloid-beta component region. Phys Chem Chem Phys 20:4502–4512 130. Zangi R, Kovacs H, van Gunsteren WF, Johansson J, Mark AE (2001) Free energy barrier estimation of unfolding the alpha-helical surfactant-associated polypeptide C. Proteins 43:395–402 131. Pokhrel R et al (2018) Domain rearrangement and denaturation in Ebola virus protein VP40. AIP Adv 8:125129 132. Fritsche M, Pandey RB, Farmer BL, Heermann DW (2012) Conformational temperature-dependent behavior of a histone H2AX: a coarse-grained Monte Carlo approach via knowledge-based interaction potentials. PLoS One 7:e32075 133. Mirau P, Farmer BL, Pandey RB (2015) Structural variation of alpha-synuclein with temperature by a coarse-grained approach with knowledge-based interactions. AIP Adv 5:092504 134. Essmann U et al (1995) A smooth particle mesh Ewald method. J Chem Phys 103:8577–8593 135. Butts CT (2009) Revisiting the foundations of network analysis. Science 325:414–416 136. Schmitt FO (1956) Patterns of interaction of biological macromolecules in relation to cell function. Proc Natl Acad Sci U S A 42:806–810 137. Alon U (2006) An introduction to systems biology: design principles of biological circuits. Chapman and Hall/CRC, London 138. Kovacs IA et al (2019) Network-based prediction of protein interactions. Nat Commun 10:1240 139. Greene CS (2017) Tell me your neighbors, and I will tell you what you are. Sci Transl Med 9(376):eaam6058 140. 
Vinayagam A et al (2011) A directed protein interaction network for investigating intracellular signal transduction. Sci Signal 4:rs8 141. Bertolazzi P, Bock ME, Guerra C (2013) On the functional and structural characterization of hubs in protein-protein interaction networks. Biotechnol Adv 31:274–286 142. DasGupta B, Liang J (2016) Models and algorithms for biomolecules and molecular networks. Wiley, Hoboken 143. Doshi U, Holliday MJ, Eisenmesser EZ, Hamelberg D (2016) Dynamical network of residue-residue contacts reveals coupled allosteric effects in recognition, catalysis, and mutation. Proc Natl Acad Sci U S A 113:4735–4740 144. Papaleo E (2015) Integrating atomistic molecular dynamics simulations, experiments, and network analysis to study protein dynamics: strength in unity. Front Mol Biosci 2:28 145. Bode C et al (2007) Network analysis of protein dynamics. FEBS Lett 581:2776–2782 146. Holliday MJ, Camilloni C, Armstrong GS, Vendruscolo M, Eisenmesser EZ (2017) Networks of dynamic allostery regulate enzyme function. Structure 25:276–286 147. Yao XQ et al (2016) Dynamic coupling and allosteric networks in the alpha subunit of heterotrimeric G proteins. J Biol Chem 291:4742–4753 148. Sethi A, Eargle J, Black AA, Luthey-Schulten Z (2009) Dynamical networks in tRNA:protein complexes. Proc Natl Acad Sci U S A 106:6620–6625 149. Skjaerven L, Yao XQ, Scarabelli G, Grant BJ (2014) Integrating protein structural dynamics and evolutionary analysis with Bio3D. BMC Bioinform 15:399 150. Grant BJ, Rodrigues AP, ElSawy KM, McCammon JA, Caves LS (2006) Bio3d: an R package for the comparative analysis of protein structures. Bioinformatics 22:2695–2696 151. Ichiye T, Karplus M (1991) Collective motions in proteins: a covariance analysis of atomic fluctuations in molecular dynamics and normal mode simulations. Proteins 11:205–217 152. Yao XQ, Skjaerven L, Grant BJ (2016) Rapid characterization of allosteric networks with ensemble normal mode analysis. J Phys Chem B 120:8276–8288 153. Yen JY (1971) Finding the k shortest loopless paths in a network. Manage Sci 17:712–716 154. Lange OF, Grubmüller H (2006) Generalized correlation for biomolecular dynamics. Proteins 62:1053–1061 155. 
Mercadante D, Grater F, Daday C (2018) CONAN: a tool to decode dynamical information from molecular interaction maps. Biophys J 114:1267–1273

156. Wang W et al (2011) A soluble alpha-synuclein construct forms a dynamic tetramer. Proc Natl Acad Sci U S A 108:17797–17802 157. Bartels T, Choi JG, Selkoe DJ (2011) alpha-Synuclein occurs physiologically as a helically folded tetramer that resists aggregation. Nature 477:107–110 158. Kollman PA et al (2000) Calculating structures and free energies of complex molecules: combining molecular mechanics and continuum models. Acc Chem Res 33:889–897 159. Case DA et al (2012) University of California, San Francisco 160. Pearlman DA et al (1995) AMBER, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structural and energetic properties of molecules. Comput Phys Commun 91:1–41 161. Still WC, Tempczyk A, Hawley RC, Hendrickson T (1990) Semianalytical treatment of solvation for molecular mechanics and dynamics. J Am Chem Soc 112:6127–6129 162. Honig B, Nicholls A (1995) Classical electrostatics in biology and chemistry. Science 268:1144–1149 163. Srinivasan J, Miller J, Kollman PA, Case DA (1998) Continuum solvent studies of the stability of RNA hairpin loops and helices. J Biomol Struct Dyn 16:671–682 164. Phillips JC et al (2005) Scalable molecular dynamics with NAMD. J Comput Chem 26:1781–1802 165. Humphrey W, Dalke A, Schulten K (1996) VMD: visual molecular dynamics. J Mol Graph 14:33–38 166. Moore CC (2015) Ergodic theorem, ergodic theory, and statistical mechanics. Proc Natl Acad Sci U S A 112:1907–1911

167. Han B, Liu Y, Ginzinger SW, Wishart DS (2011) SHIFTX2: significantly improved protein chemical shift prediction. J Biomol NMR 50:43–57 168. McGibbon RT et al (2015) MDTraj: a modern open library for the analysis of molecular dynamics trajectories. Biophys J 109:1528–1532 169. Li F, Lee JH, Grishaev A, Ying J, Bax A (2015) High accuracy of Karplus equations for relating three-bond J couplings to protein backbone torsion angles. Chemphyschem 16:572–578 170. Karplus M (1959) Contact electron-spin coupling of nuclear magnetic moments. J Chem Phys 30:11–15 171. Zweckstetter M (2008) NMR: prediction of molecular alignment from structure using the PALES software. Nat Protoc 3:679–690 172. Flyvbjerg H, Petersen HG (1989) Error estimates on averages of correlated data. J Chem Phys 91:461–466 173. Kent DR 4th, Muller RP, Anderson AG, Goddard WA 3rd, Feldmann MT (2007) Efficient algorithm for "on-the-fly" error analysis of local or distributed serially correlated data. J Comput Chem 28:2309–2316 174. De Simone A, Cavalli A, Hsu ST, Vranken W, Vendruscolo M (2009) Accurate random coil chemical shifts from an analysis of loop regions in native states of proteins. J Am Chem Soc 131:16332–16333 175. Vuister GW, Bax A (1993) Quantitative J correlation: a new approach for measuring homonuclear three-bond J(HNHalpha) coupling constants in 15N-enriched proteins. J Am Chem Soc 115:7772–7777 176. Straatsma T, Berendsen H, Stam A (1986) Estimation of statistical errors in molecular simulation calculations. Mol Phys 57:89–95

Chapter 19

Exploration of Protein Aggregations in Parkinson’s Disease Through Computational Approaches and Big Data Analytics

Saba Shahzadi, Muhammad Yasir, Bisma Aftab, Sumbal Babar, and Mubashir Hassan

Abstract

Protein aggregation has been implicated in numerous neurodegenerative disorders whose etiologies are poorly understood and for which there are no effective treatments. Here we show how computational approaches may help us better understand the basis of Parkinson’s disease (PD). The high-resolution structural, dynamical, and mechanistic insights delivered by computational studies of protein aggregation have a unique potential to enable the rational manipulation of oligomer formation. Additionally, big data and machine learning methods may provide valuable insights into the nature of the proteins involved in PD and their aggregation behavior, with the aim of improving PD treatment.

Key words Parkinson’s disease, Big data, Protein aggregation

1 Introduction

Parkinson’s disease (PD) is the second most common neurodegenerative disorder after Alzheimer’s disease (AD). It is characterized by the loss of dopaminergic neurons in the substantia nigra, depletion of the neurotransmitter dopamine, and motor dysfunction, and it is frequently accompanied by Lewy body (LB) inclusions [1]. Many PD patients also show cognitive impairment (CI), the most common nonmotor complication of PD [2]. Although PD is predominantly a movement disorder, it is accompanied by various progressive impairments, including psychiatric issues such as depression and dementia [3]. The substantia nigra is the brain region most affected in PD; there, nerve cells lose their dopaminergic signaling and function [4]. Dopamine acts as a chemical messenger between the parts of the brain and nervous system that help control and coordinate movement. Autonomic disturbances and pain may additionally occur later, and the condition progresses to cause widespread disability and handicap that impair the lives of affected individuals [3].

Mai Suan Li et al. (eds.), Computer Simulations of Aggregation of Proteins and Peptides, Methods in Molecular Biology, vol. 2340, https://doi.org/10.1007/978-1-0716-1546-1_19, © Springer Science+Business Media, LLC, part of Springer Nature 2022

2 Protein Misfolding and PD

Misfolding is a common cause of protein aggregation in multiple neurodegenerative disorders such as PD and AD [5]. Mutations in multiple genes, such as SNCA, which encodes the α-synuclein protein (α-Syn), and PARK2, which encodes the parkin protein, may cause protein aggregation that leads to disease. Protein misfolding can also affect the function of the products of other key PD-associated genes, such as DJ-1, PTEN-induced kinase 1 (PINK1), and leucine-rich repeat kinase 2 (LRRK2) [6].

3 Structure of α-Synuclein

The unfolded α-Syn (kDa) is a highly conserved protein localized in different parts of the brain. α-Syn comprises 140 residues organized into three functional regions: an amphipathic N-terminal region (residues 1–60), a hydrophobic region (residues 61–95), and an acidic C-terminal region (residues 96–140) (Fig. 1). The amphipathic N-terminal region contains six imperfect repeats of a hexamer motif that are involved in binding micelles and liposomes. α-Syn regulates dopamine neurotransmission through vesicular modulation and, through its interaction with tubulin, can function like tau protein [7]. α-Syn is present in the cytoplasm in a soluble, unfolded form. However, it adopts an α-helical conformation when bound to lipid membranes, and it is found in a β-sheet conformation in Lewy bodies [7, 8].

3.1 Structures of Lewy Bodies

LBs are aggregated protein structures whose deposition is implicated in PD. LBs are 8–30 μm in diameter and consist of 10-nm amyloidogenic fibrils. They contain a variety of proteins, including α-Syn, neurofilaments, ubiquitinated proteins, and heat shock proteins (Hsp70 and Hsp90). Multiple factors can cause protein misfolding and aggregation, including oxidative stress, mitochondrial dysfunction, inflammation, impairment of the ubiquitin–proteasome system, pH, protein concentration, and high temperature [9, 10].

Fig. 1 Primary structure of human α-synuclein

4 Misfolding and Aggregation of α-Synuclein

α-Syn fibrillogenesis is the process by which fine fibrils develop; it is modulated by multiple promoting and inhibiting factors. A previous study showed that α-Syn mutations such as A53T and A30P, as well as overexpression of the wild-type protein, increase misfolding and aggregation [11]. In PD brains, α-Syn is phosphorylated at Ser87 and Ser129 by casein kinases 1 and 2 (CK1/2), which promotes aggregation. Multiple studies have reported the accumulation of phosphorylated α-Syn in animal models of synucleinopathies, indicating that this posttranslational modification plays a pathological role in α-synuclein fibrillation [12].
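To make the domain boundaries from Section 3 concrete, the short sketch below places the α-Syn variants and phosphorylation sites mentioned above (A30P, A53T, Ser87, Ser129) on the three regions. It is a minimal illustration that assumes the residue ranges quoted in the text (1–60, 61–95, 96–140) and one-letter site notation; `REGIONS`, `region_of`, and `annotate_site` are hypothetical helpers, not part of any published tool.

```python
import re

# Region boundaries of human alpha-synuclein as quoted in Section 3
# (1-based residue numbering, inclusive ranges). Taken from the text, not computed.
REGIONS = (
    ("amphipathic N-terminal region", 1, 60),
    ("hydrophobic region", 61, 95),
    ("acidic C-terminal region", 96, 140),
)

# One-letter notation: wild-type residue, position, optional substituted residue,
# e.g. "A53T" (missense variant) or "S129" (modification site).
SITE_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z]?)$")


def region_of(position):
    """Return the name of the alpha-synuclein region containing `position`."""
    for name, start, end in REGIONS:
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} is outside residues 1-140")


def annotate_site(site):
    """Split a site string into (residue, position, region)."""
    match = SITE_PATTERN.match(site)
    if match is None:
        raise ValueError(f"unrecognized site notation: {site!r}")
    residue, position = match.group(1), int(match.group(2))
    return residue, position, region_of(position)


if __name__ == "__main__":
    for site in ("A30P", "A53T", "S87", "S129"):
        residue, position, region = annotate_site(site)
        print(f"{site}: {residue}{position} lies in the {region}")
```

Both familial mutations fall in the N-terminal region, while the two phosphorylation sites fall in the hydrophobic and C-terminal regions, respectively.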

5 α-Synuclein and Simulation Analysis

Computer simulation approaches have been used to analyze α-synuclein structure and interactions, both in the native state and in misfolded pathological aggregates. MD simulations combined with NMR [13, 14] and FRET experiments [15] provided evidence that native α-synuclein is more compact than a random coil and forms transient oligomeric structures. Monte Carlo simulations have reported a dual behavior of α-synuclein in solution: it is structurally disordered yet retains significant β-strand content [16]. Previous data showed that the native state is usually destabilized by disease-associated mutations [17–19], and the structural changes caused by these mutations result in the aggregation of misfolded proteins [20]. Atomistic simulations have also shown that α-synuclein forms transient oligomers capable of interacting with lipid membranes [21]. Another MD study of aggregation kinetics [22] indicated a two-phase process of fibril formation, in which the formation of a contact interface by mostly disordered chains is followed by a structural transition and the accumulation of β-sheet content. That study also suggested a critical role for protein–solvent interactions in α-synuclein aggregation. Fibril structures have also been a focus of computer-assisted studies. Fibril rupture simulations agree with AFM experiments and indicate the presence of highly mechanically stable structures in α-synuclein fibrils with high β-sheet content [23].
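A two-phase picture like the one above is usually read off a trajectory by tracking secondary structure frame by frame. The sketch below assumes that each frame has already been reduced to a DSSP-style string (for instance with MDTraj's `compute_dssp`, which returns per-residue secondary-structure codes). The frames and the 0.3 threshold are invented for illustration and do not come from the cited studies [22, 23].

```python
# DSSP codes treated as beta structure: E (strand in a beta-ladder) and
# B (isolated beta-bridge). With MDTraj's simplified codes only "E" appears.
BETA_CODES = frozenset("EB")


def beta_fraction(dssp):
    """Fraction of residues in one frame assigned to beta structure."""
    if not dssp:
        raise ValueError("empty secondary-structure string")
    return sum(code in BETA_CODES for code in dssp) / len(dssp)


def beta_time_series(frames):
    """Per-frame beta fraction for a sequence of DSSP strings."""
    return [beta_fraction(frame) for frame in frames]


def first_frame_above(series, threshold):
    """Index of the first frame whose beta fraction exceeds `threshold`, or None."""
    for index, value in enumerate(series):
        if value > threshold:
            return index
    return None


if __name__ == "__main__":
    # Hypothetical 10-residue segment over four frames: coil first, then beta grows.
    frames = ["CCCCCCCCCC", "CCCCCTTCCC", "CCEECCCEEC", "CEEEECEEEE"]
    series = beta_time_series(frames)
    print(series)
    print("beta accumulation starts at frame", first_frame_above(series, 0.3))
```

In practice the threshold and the segment of interest would be chosen from the system at hand; a single cutoff is only a crude marker for the onset of the structural transition.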

6 Membrane Effects in α-Synuclein Aggregation

Self-assembly of α-synuclein produces aggregates of multiple morphologies that are associated with PD [24]. The aggregation process depends on membrane composition, and aggregation was most efficient on 1-palmitoyl-2-oleoyl-sn-glycero-3-phospho-L-serine (POPS) bilayers. Computational modeling also shows that α-synuclein dimers assemble rapidly from membrane-bound monomers on POPS bilayers because the bound protein adopts an aggregation-prone orientation. In contrast, the interaction of α-synuclein with 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) leads to a binding mode that resists dimer assembly [24]. α-Synuclein interacts with both synthetic and biological lipids; these interactions are linked to synaptic plasticity and to the initiation of its aggregation. α-Synuclein is found primarily in presynaptic nerve terminals [25, 26]. Depending on membrane binding, α-synuclein exists in two states: an unbound state and a membrane-bound α-helical conformation [27]. The balance between these states regulates synaptic plasticity and the kinetics of the pathogenic aggregation that causes synucleinopathies [28, 29]. Dobson et al. studied how membrane interactions trigger the transition of α-synuclein from a soluble to an aggregated state, a transition directly associated with neurodegeneration in diseases such as PD [30].
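The balance between the unbound and membrane-bound α-helical states can be pictured with the simplest possible model: a two-state equilibrium in which the population ratio follows a Boltzmann factor, K = [bound]/[unbound] = exp(−ΔG/RT). The sketch below assumes such a two-state model and hypothetical binding free energies; it is not the model used in the cited studies [24, 27–29].

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ mol^-1 K^-1


def fraction_bound(delta_g_bind, temperature=310.0):
    """Equilibrium fraction in the membrane-bound state of a two-state model.

    delta_g_bind is G(bound) - G(unbound) in kJ/mol; negative values favour binding.
    With x = delta_g / RT, the bound fraction is 1 / (1 + exp(x)); the two branches
    below evaluate that logistic form without overflowing for large |x|.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    x = delta_g_bind / (GAS_CONSTANT * temperature)
    if x >= 0:
        z = math.exp(-x)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(x))


if __name__ == "__main__":
    # Hypothetical free energies (kJ/mol), standing in for membranes of different composition.
    for delta_g in (-10.0, -2.0, 0.0, 5.0):
        print(f"dG = {delta_g:+.1f} kJ/mol -> fraction bound = {fraction_bound(delta_g):.3f}")
```

A shift of only a few kJ/mol in the binding free energy, for example from a change in lipid composition, moves the population substantially between the two states, which is one way to read the sensitivity to POPS versus POPC described above.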

7 Other Key Players Involved in PD

Several other proteins are involved in the development of PD through gene mutations and protein aggregation in downstream signaling pathways (see Table 1).

Table 1 List of proteins involved in PD

| Genes | Associated functions | References |
|---|---|---|
| PS1 and PS2 | A component of γ-secretase, which cleaves APP to yield Aβ | Kim et al. [31], Cai et al. [32] |
| α-Synuclein | The primary component of Lewy bodies | Sebastian et al. [5] |
| Parkin | A ubiquitin E3 ligase | Pickrell and Youle [33] |
| DJ-1 | Protects the cell against oxidant-induced cell death | Solti et al. [34] |
| PINK1 | A kinase localized to mitochondria. Function unknown. Seems to protect against cell death | Pickrell and Youle [33] |
| LRRK2 | A kinase. Function unknown | Zimprich et al. [35] |
| HTRA2 | A serine protease in the mitochondrial intermembrane space. Degrades denatured proteins within mitochondria. Degrades inhibitor of apoptosis proteins and promotes apoptosis if released into the cytosol | Faccio et al. [36], Jones et al. [37], Strauss et al. [38] |
| PARK2 | Slow progression, dystonia, or dyskinesia | LaVoie et al. [39] |

8 Presenilin 1

Presenilin 1 (PS1), encoded by the PSEN1 gene, is a component of gamma (γ)-secretase and is involved in proteolysis, the process of breaking proteins into smaller peptides [40]. The γ-secretase complex is located in the cell membrane, where it cleaves proteins; this activity is important for several signaling processes, such as the Notch pathway, which transmits signals from outside the cell into the nucleus [41]. Notch signaling is also involved in normal immune system function. Recently published data reported that patients carrying the PSEN1 mutations p.Glu120Lys and p.Tyr159Cys displayed memory impairment (MI), language impairments such as anomia, psychiatric symptoms, cerebellar ataxia, and parkinsonism [31]. Another study reported that mutant PSEN1 might play a significant role in the pathogenesis of both the aggregation of α-synuclein into Lewy bodies and the deposition of β-amyloid into cotton wool plaques, which may result in PD [42]. The three-dimensional structure of PS1 is shown in Fig. 2.
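The PSEN1 variants above are written in HGVS protein notation with three-letter amino acid codes (p.Glu120Lys), whereas other variants in this chapter, such as α-Syn A53T or LRRK2 G2019S, use one-letter codes. A minimal conversion sketch, assuming simple missense substitutions only (no deletions, frameshifts, or stop codons); `to_one_letter` is a hypothetical helper.

```python
import re

# Standard three-letter to one-letter amino acid codes (20 canonical residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Simple missense substitution in HGVS protein notation, e.g. "p.Glu120Lys".
MISSENSE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def to_one_letter(hgvs):
    """Convert e.g. 'p.Glu120Lys' to 'E120K'; reject anything else."""
    match = MISSENSE.match(hgvs)
    if match is None:
        raise ValueError(f"not a simple missense variant: {hgvs!r}")
    ref, position, alt = match.groups()
    try:
        return f"{THREE_TO_ONE[ref]}{position}{THREE_TO_ONE[alt]}"
    except KeyError as err:
        raise ValueError(f"unknown amino acid code in {hgvs!r}") from err


if __name__ == "__main__":
    for variant in ("p.Glu120Lys", "p.Tyr159Cys"):
        print(variant, "->", to_one_letter(variant))
```

Rejecting anything outside the simple missense pattern, rather than guessing, keeps the helper from silently mangling more complex HGVS descriptions.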

9 Presenilin 2

Presenilin 2 (PS2) is a protein associated with neurodegenerative disease, including PD [32]. PS2, encoded by the PSEN2 gene, is best known for its role in processing amyloid precursor protein (APP) in the brain. Research suggests that PS2 acts together with other enzymes to cut APP into smaller fragments, including soluble amyloid precursor protein (sAPP) and amyloid β peptide [43]. Mutational studies showed that PSEN2 is implicated not only in AD but also in other diseases, such as dementia with Lewy bodies and Parkinson’s disease with dementia [32]. A 3D structural model of PS2 is shown in Fig. 3.

Fig. 2 Crystal structure of Presenilin 1

Fig. 3 Structural model of Presenilin 2

10 PARKN and PINK1

Parkin is a protein encoded in humans by the PARK2 gene [44]. It plays a significant role in the cellular machinery that degrades damaged and excess proteins, which it marks by tagging them with a molecule called ubiquitin [45]. The ubiquitin–proteasome system acts as the cell’s quality control, eliminating damaged, misshapen, and excess proteins. Published research also suggests that Parkin may help trigger the destruction of mitochondria that are not working properly [46]. Parkin also works as a tumor suppressor, helping to keep cell growth and division under control. In addition, Parkin may regulate the supply and release of synaptic vesicles, the sacs in nerve cells that contain chemical messengers transmitting signals from one nerve cell to another. Mutational data have also associated Parkin with mitochondrial dysfunction and neuronal death in Parkinson’s disease [47]. PTEN-induced kinase 1 (PINK1) is a mitochondrial serine/threonine-protein kinase encoded by the PINK1 gene [48]. PINK1 protects cells from stress induced by mitochondrial dysfunction. PINK1 is processed by healthy mitochondria and released to trigger neuron differentiation [49]. Mutational studies have implicated the PINK1 gene in autosomal recessive early-onset Parkinson’s disease. PINK1 and Parkin act together in the same pathway to form a mitochondrial quality-control mechanism, providing evidence that mitochondrial damage is involved in PD [33]. The 3D structural models of the PARKN and PINK1 proteins are shown in Figs. 4 and 5, respectively.

10.1 DJ-1

DJ-1 is a small, highly conserved protein of 189 amino acids that is linked to PD [50]. DJ-1 is also involved in protecting brain cells from oxidative stress. One study showed that, under different physiological conditions, oxidation of its catalytic Cys106 residue causes DJ-1 to form β-sheet-rich soluble and fibrillar aggregates, leading to loss of its native glyoxalase function. Such aggregates have been observed in Lewy bodies, neurofibrillary tangles, and amyloid plaques in postmortem brain tissue from patients with PD and AD [34]. The structure of the DJ-1 protein is shown in Fig. 6.

Fig. 4 Crystal structure of PARKN

Fig. 5 Crystal structure of PINK1

Fig. 6 Crystal structure of DJ-1

11 Leucine-Rich Repeat Kinase 2

Leucine-rich repeat kinase 2 (LRRK2), also known as dardarin, is a kinase enzyme encoded in humans by the LRRK2 gene [51]. Genetic data show that LRRK2 variants are associated with an increased risk of PD and Crohn’s disease [35]. Past studies have identified various interaction partners of LRRK2, suggesting that it may play important roles in protein aggregation and neuronal degeneration. The mutations I1122V, R1441C, Y1699C, G2019S, and I2020T increase the propensity of LRRK2 to form inclusion bodies [52], suggesting that LRRK2 kinase activity may also contribute to protein aggregation [53]. LRRK2 kinase activity plays an important role in both neuronal degeneration and protein aggregation [54]. The 3D structure of the LRRK2 protein is shown in Fig. 7.

12 HTRA2

The mitochondrial serine protease HTRA2 is encoded in humans by the HTRA2 gene and has been implicated in apoptosis and PD [36, 38]. The HTRA family is evolutionarily conserved and involved in mitochondrial homeostasis, and inactivating mutations in its members have been associated with neurodegenerative disorders such as PD [37]. PINK1 and HTRA2 are known to be key modulators of mitochondrial molecular quality control. HTRA2 acts in a mitochondrial stress-control pathway that is regulated by p38 stress kinase signaling via PINK1 and cyclin-dependent kinase 5 (CDK5), and defects in this pathway may lead to PD [55]. The structure of the HTRA2 protein is shown in Fig. 8.

Fig. 7 Crystal structure of LRRK2

Fig. 8 Crystal structure of HTRA2

13 PARK2

The PARK2 gene encodes the parkin (PRKN) protein and is involved in familial early-onset PD with autosomal recessive inheritance. Mutations in this gene are known to cause Parkinson’s disease and autosomal recessive juvenile Parkinson’s disease (ARJP) [39]. The structure of the parkin (PRKN) protein is shown in Fig. 9.

Fig. 9 Crystal structure of PRKN

14 Big Data and PD

The widely used definition of big data in terms of three main aspects of information is usually attributed to Doug Laney, who introduced it in the early 2000s: (1) volume of information, (2) velocity of information, and (3) variety of information [56]. Today, huge amounts of data from mass-scale gene sequencing experiments are being collected from different sources for genes, gene variants, and gene-encoded proteins. Analyzing such huge data repositories requires highly efficient computing hardware and software. Another parameter is the velocity of information, that is, the rate at which data are collected in real time. Finally, the quality of the information being gathered is described by the amount of “noise” within the data. An excessively large amount of information can make it difficult to identify which facts are important and which are distractions. Despite its significance, big data cannot always be easily converted into good data. For example, beyond data based on binary definitions such as AD vs. control and PD vs. control, important insights may be derived on a smaller scale from studying rare genetic forms of disease, for which targeted therapeutic efforts may succeed [57].

15 Predictive Analysis

To make better use of big data and extract the most useful information, researchers have developed a technique referred to as predictive analytics [56]. Predictive analytics comprises various statistical techniques, such as predictive modeling, machine learning, and data mining, that can be used to extract value from data. Data mining is an important technique for analyzing retrieved sequences and better describing data outcomes [58]. It can also be used to describe new findings and predict future developments. Data mining methods include clustering models, decision trees, and neural networks [58]. Protein aggregation data are now found in an increasing number of repositories, because recent developments in computational biology have greatly influenced this field of research [59]. Predictive tools that can systematically analyze large collections of proteins have extended aggregation analysis to complete proteomes, deepening our understanding of the molecular determinants that govern protein aggregation while revealing crosstalk between protein evolution and aggregation [60]. An overview of PD and big data analytics is shown in Fig. 10.

Fig. 10 Overview of PD and big data analytics
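Clustering, one of the data mining methods listed above, groups proteins with similar feature profiles. The text does not specify an algorithm, so the sketch below uses plain k-means (Lloyd's algorithm) as one common choice. The two per-protein features (labelled here as a normalized hydrophobicity and a predicted aggregation score), the protein names, and the values are all hypothetical.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))
        for point in points
    ]


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's k-means: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            # An empty cluster keeps its previous centroid.
            updated.append(
                tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            )
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


if __name__ == "__main__":
    # Hypothetical (hydrophobicity, aggregation score) pairs, both scaled to 0-1.
    proteins = {"P1": (0.10, 0.20), "P2": (0.15, 0.10), "P3": (0.90, 0.80), "P4": (0.85, 0.95)}
    names = list(proteins)
    points = [proteins[name] for name in names]
    labels, centers = kmeans(points, initial_centroids=[points[0], points[2]])
    for name, label in zip(names, labels):
        print(name, "-> cluster", label)
```

Seeding the centroids explicitly keeps the example deterministic; real analyses usually try several random initializations and compare the resulting clusterings.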

16 Big Data Approaches to Understand PD

Computational predictive models, in combination with experimental approaches, are important for characterizing proteins and for studying their aggregation behavior in multiple diseases [61]. Big data and machine learning approaches are valuable for better understanding the protein sequences and structures involved in PD.

Protein Aggregation and Parkinson’s Disease

17 Analysis of PD Using Apriori Algorithm

Mass-scale genome sequencing generates enormous amounts of sequence data, whereas experimental approaches to annotate these genes are costly and slow. Therefore, an automatic annotation procedure is required to narrow the gap between the number of newly sequenced genomes and their reliable functional annotations [62]. Various algorithms are being developed to address this problem. For example, the apriori algorithm has been used to mine association rules between enzyme classes and significant InterPro entries, and the candidate rules are then evaluated for their classificatory capacity [62]. Mining genomic and proteomic sequences from different resources may help to predict protein structure and to design drugs that reduce the rate of protein aggregation.
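The apriori idea mentioned above can be sketched in a few lines: count itemsets that occur in at least a minimum fraction of records, grow them level by level while pruning any candidate with an infrequent subset, and then read off rules of the form {features} → class with a confidence score. This is a minimal textbook version, not the implementation used in [62]; the labels `domain_A`, `class_1`, etc. are hypothetical placeholders standing in for InterPro entries and enzyme classes.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support.

    transactions: list of sets; support is the fraction of transactions containing the itemset.
    """
    n = len(transactions)
    current = [frozenset([item]) for item in {i for t in transactions for i in t}]
    frequent = {}
    k = 1
    while current:
        # Count the fraction of transactions that contain each candidate itemset.
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        # Prune step: drop candidates that have an infrequent k-item subset.
        current = [c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k))]
        k += 1
    return frequent


def class_rules(frequent, class_labels, min_confidence):
    """Derive rules {features} -> class label with confidence >= min_confidence."""
    rules = []
    for itemset, support in frequent.items():
        labels = itemset & class_labels
        if len(labels) != 1 or len(itemset) < 2:
            continue
        antecedent = itemset - labels
        # Every subset of a frequent itemset is frequent, so the antecedent is present.
        confidence = support / frequent[antecedent]
        if confidence >= min_confidence:
            rules.append((set(antecedent), next(iter(labels)), support, confidence))
    return rules


if __name__ == "__main__":
    # Hypothetical annotations: each set holds domain labels plus one class label.
    proteins = [
        {"domain_A", "domain_B", "class_1"},
        {"domain_A", "domain_B", "class_1"},
        {"domain_A", "class_1"},
        {"domain_C", "class_2"},
        {"domain_B", "domain_C", "class_2"},
    ]
    freq = apriori(proteins, min_support=0.4)
    for lhs, label, sup, conf in class_rules(freq, {"class_1", "class_2"}, 0.8):
        print(sorted(lhs), "->", label, f"support={sup:.2f}", f"confidence={conf:.2f}")
```

The pruning step relies on the downward-closure property (every subset of a frequent itemset is itself frequent), which is what keeps the search tractable on large annotation tables.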

18 Big Data and Its Use in Drug Discovery

Big data-driven drug discovery has become an increasingly popular strategy in pharmaceutical companies. Currently, many therapeutics companies have integrated gene expression analysis and cellular screening systems with computational healthcare informatics software to identify chemical structures with attributes of interest for oncology drug discovery [63]. Moreover, pharmaceutical companies have used high-performance combination screening systems with cell-based phenotypic assays to combine existing FDA-approved compounds so as to attack multiple dysfunctional pathways in various diseases, e.g., inflammatory, respiratory, metabolic, and infectious diseases [63]. These examples demonstrate that big data plays a significant role in present-day drug discovery and development. The use of big data in drug discovery was reviewed in 2016 [64]; here, we discuss important perspectives in data-driven drug discovery and point to resources useful for the broad biomedical research community. Advances in medical imaging, molecular sequencing, and laboratory screening approaches have enhanced the drug development process [65]. Because much better tools and techniques are now available, it is a good time to design drug candidates with better efficacy and therapeutic value. Big data can help to aggregate information from diverse sources and provide intelligence support for drug design. There are several major areas in the pharmaceutical industry where big data can play a crucial role [66].

19 Management of Clinical Trial and Omics Data Analysis

Large-population clinical trials with diverse participants across multiple study sites are an outdated approach, because the data obtained from such study designs, from patient enrollment to data analysis and interpretation, are often not meaningful. Therefore, omics data management is frequently used to better interpret the results of data collection. The success of recent sequencing-based genomics approaches has increased their importance in clinical diagnosis and in research. The enormous proteomic, metabolomic, lipidomic, glycomic, and transcriptomic datasets generated by such studies can only be analyzed through big data analytics.

20 Selection of Chemical Scaffolds from Databases

Finding a compound suited to a target protein among millions of screened compounds is a difficult task that is feasible only with computational approaches. Using complex algorithms, large databases containing biological, chemical, and clinical information can be screened to obtain the best candidates in a short time. In fact, advanced analytics is being used to analyze clinical requirements and drug candidate profiles in order to create product development pipelines for pharmaceutical companies.
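One of the simplest algorithms behind database screening of this kind is similarity searching: each compound is encoded as a binary fingerprint, and library members are ranked by their Tanimoto similarity to a known active query. The sketch below assumes fingerprints are already available as sets of "on" bit indices; the compound IDs and bit patterns are hypothetical. In practice the fingerprints would come from a cheminformatics toolkit, and similarity ranking would be only one filter among several.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of 'on' bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_library(query_fp, library, top_k=3, min_similarity=0.0):
    """Rank {compound_id: fingerprint} entries by similarity to the query, best first."""
    scored = [(cid, tanimoto(query_fp, fp)) for cid, fp in library.items()]
    scored = [pair for pair in scored if pair[1] >= min_similarity]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical fingerprints for illustration only.
    query = {1, 4, 7, 9}
    library = {
        "cmpd_1": {1, 4, 7, 9},
        "cmpd_2": {1, 4, 8},
        "cmpd_3": {2, 3, 5},
    }
    print(rank_library(query, library, top_k=2))  # prints [('cmpd_1', 1.0), ('cmpd_2', 0.4)]
```

Set-based fingerprints keep the example dependency-free; with millions of compounds, bit-packed integer arrays and vectorized popcounts would be the usual choice.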

21 Orphan Drugs and Drug Repurposing

During clinical trials, some drug candidates turn out to be ineffective for the whole population but display some potential in particular subpopulations characterized by a set of demographic, genetic, or other commonalities. Big data analytics can help to identify the specific subpopulations for which a “failed” drug can still succeed. This practice of drug repurposing (the process of finding new applications for FDA-approved drugs [67, 68]) is beneficial to patients suffering from rare diseases or genetic disorders that may not be commercially attractive for dedicated product development. For example, amantadine, a drug originally developed for influenza, has been reported to be effective also for PD treatment [69].
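The subpopulation idea above can be illustrated with a deliberately simple calculation: group trial records by a candidate attribute, compute the response rate in each group, and flag groups that respond much better than the trial as a whole. This is a hypothetical sketch; the field names `genotype` and `responded`, the minimum group size, and the 2x lift cut-off are illustrative assumptions. A real analysis would also need formal statistical testing, correction for multiple comparisons, and confirmation in an independent cohort.

```python
from collections import defaultdict


def subgroup_response_rates(records, key):
    """Return {subgroup: (response_rate, n)} for trial records grouped by `key`."""
    totals = defaultdict(int)
    responders = defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        responders[group] += int(rec["responded"])
    return {g: (responders[g] / totals[g], totals[g]) for g in totals}


def candidate_subgroups(records, key, min_size=3, min_lift=2.0):
    """Subgroups with >= min_size patients whose response rate is >= min_lift x overall rate."""
    overall = sum(int(rec["responded"]) for rec in records) / len(records)
    if overall == 0:
        return []  # nobody responded, so no subgroup can show a relative benefit
    rates = subgroup_response_rates(records, key)
    return sorted(g for g, (rate, n) in rates.items()
                  if n >= min_size and rate >= min_lift * overall)


if __name__ == "__main__":
    # Hypothetical trial: a drug that fails overall but helps carriers of "variant_X".
    trial = (
        [{"genotype": "variant_X", "responded": True}] * 3
        + [{"genotype": "variant_X", "responded": False}]
        + [{"genotype": "other", "responded": True}]
        + [{"genotype": "other", "responded": False}] * 7
    )
    print(subgroup_response_rates(trial, "genotype"))
    print(candidate_subgroups(trial, "genotype"))
```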

22 Resources Useful for PD Research

There are various databases and resources in which PD data are being collected and which can be used to better understand the etiology of PD and to improve its treatment (see Table 2).

Table 2 List of PD resources and databases

Database | URL
Parkinson’s Disease Research Database | https://www.michaeljfox.org/grant/
Accelerating Medicines Partnership: Parkinson’s Disease | https://fnih.org/amp-pd
The Parkinson’s Progression Markers Initiative | https://www.ppmi-info.org/
ParkDB | http://www2.cancer.ucl.ac.uk/Parkinson_Db2/
PDbase | http://bioportal.kobic.re.kr/PDbase/
Parkinson’s Disease Data and Organizing Center | http://www.pd-doc.org/
Parkinson’s Disease Genetics Database | https://clinicaltrials.gov/ct2/show/NCT00855556
BioGPS | http://biogps.org/about/
ProADD: a database on protein aggregation diseases | http://bicmku.in/ProADD
WALTZ-DB 2.0: a database for characterizing short peptides for their amyloid fiber-forming capacities | http://waltzdb.switchlab.org/
CPAD 2.0: The Curated Protein Aggregation Database | https://web.iitm.ac.in/bioinfo2/cpad2/index.html

22.1 Parkinson’s Disease Research Database (PDRD)

The PDRD is a unique database that covers basic knowledge of PD: its causes, symptoms, medications, and treatment. The database also lists funding opportunities for PD-related research studies. It provides an excellent platform for multidisciplinary research collaborations and for seeking advanced knowledge of PD.

22.2 Accelerating Medicines Partnership: Parkinson’s Disease

The Foundation for the National Institutes of Health (FNIH), in collaboration with the Accelerating Medicines Partnership, has designed a dedicated database for PD and PD-related drugs. The Accelerating Medicines Partnership (AMP) is a public–private partnership between the NIH, FNIH, multiple biopharmaceutical companies, and nonprofit organizations. The basic aim of AMP is to identify and validate promising protein targets for PD. Moreover, AMP provides research expertise and technical support for screening biomarkers with good therapeutic potential against PD.

22.3 The Parkinson’s Progression Markers Initiative (PPMI)

The PPMI is an important database that collects the results of clinical and behavioral studies in order to comprehensively evaluate study cohorts and to identify potential biomarkers for PD. PPMI acquires samples from different clinical studies and constitutes a comprehensive Parkinson’s database and biorepository. The PPMI was created through the concerted efforts of numerous collaborators sponsored by The Michael J. Fox Foundation for Parkinson’s Research.

22.4 ParkDB

ParkDB is a queryable database that collects curated and annotated microarray datasets from publicly available gene expression studies. ParkDB allows researchers to identify and compare gene expression signatures involved in PD and in dopaminergic neuron differentiation under different biological conditions. It also allows users to mine differentially expressed genes.
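To make “mining differentially expressed genes” concrete, the sketch below flags genes whose mean expression differs between PD and control samples, using a log2 fold change together with Welch’s t statistic. It is a minimal illustration under stated assumptions, not ParkDB’s actual pipeline: the gene names and values are hypothetical, and the cut-offs (|log2FC| ≥ 1, |t| ≥ 3) are arbitrary. Real microarray analyses usually work on normalized, log-transformed intensities, use moderated statistics, and correct for multiple testing.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic for two samples (needs >= 2 values each, not all identical)."""
    return (mean(x) - mean(y)) / math.sqrt(variance(x) / len(x) + variance(y) / len(y))


def differentially_expressed(pd_expr, ctrl_expr, min_abs_log2fc=1.0, min_abs_t=3.0):
    """Flag genes whose mean expression differs between PD and control samples.

    pd_expr, ctrl_expr: {gene: [positive, non-log expression values]}.
    Returns {gene: (log2_fold_change, welch_t)} for genes passing both cut-offs.
    """
    hits = {}
    for gene, pd_values in pd_expr.items():
        ctrl_values = ctrl_expr[gene]
        log2fc = math.log2(mean(pd_values) / mean(ctrl_values))
        t_stat = welch_t(pd_values, ctrl_values)
        if abs(log2fc) >= min_abs_log2fc and abs(t_stat) >= min_abs_t:
            hits[gene] = (log2fc, t_stat)
    return hits


if __name__ == "__main__":
    # Hypothetical expression values for two genes in three PD and three control samples.
    pd_samples = {"gene_1": [2.0, 2.2, 1.8], "gene_2": [5.0, 5.5, 4.5]}
    ctrl_samples = {"gene_1": [8.0, 8.4, 7.6], "gene_2": [5.0, 4.0, 6.0]}
    print(differentially_expressed(pd_samples, ctrl_samples))
```

Combining an effect-size filter (fold change) with a variability-aware statistic (t) avoids flagging genes that change a lot but noisily, or change consistently but negligibly.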

22.5 PDbase

PDbase is a research platform for PD-related genetic variations and functional elements. It comprises 2698 PD-related genes collected from substantia nigra (SN) tissues of healthy individuals and PD patients and from other public resources. Additionally, PDbase provides information on mitochondrial proteins, miRNA gene regulatory elements, single-nucleotide polymorphism (SNP) markers within PD-related gene structures, repeat elements, pathways, and networks with protein–protein interaction information.

22.6 Parkinson’s Disease Data and Organizing Center

The Parkinson’s Disease Data and Organizing Center (PD-DOC) was established at the University of Rochester in 2004 under a cooperative agreement with the National Institute of Neurological Disorders and Stroke (NINDS). PD-DOC serves as a national resource to advance translational and clinical research in PD. It is creating a centralized and standardized database of clinical, epidemiological/environmental risk, postmortem neuropathological, and biological data from subjects with PD and other parkinsonian conditions.

22.7 BioGPS and ProADD

BioGPS is a gene portal built with two guiding principles in mind: customizability and extensibility. PD-related gene data can be retrieved from BioGPS and used to better understand PD. ProADD, a database for protein aggregation diseases, was developed to organize such data on a single platform and to facilitate easy access for researchers. Diseases caused by protein aggregation, and the proteins involved in each of these diseases, are integrated into the ProADD database. ProADD helps classify the proteins involved in protein aggregation diseases on the basis of sequence and structural analysis.

22.8 WALTZ-DB 2.0 and CPAD 2.0

WALTZ-DB 2.0 is a platform for characterizing short peptides; the majority of its data come from electron microscopy, FTIR, and Thioflavin-T experiments performed by the Switch lab. In addition, WALTZ-DB 2.0 provides amyloid annotations for several other short peptides reported in the current scientific literature. The Curated Protein Aggregation Database (CPAD) is a manually curated, open-access database dedicated to providing comprehensive information on the mechanistic, kinetic, and structural aspects of protein and peptide aggregation.

23 Conclusion

In this chapter, we have shown how computational approaches and big data analytics help us to better understand the etiology of PD. Computational analysis of sequence and structural data helps us to understand the misfolding and aggregation of proteins that lead to PD. In addition, big data methods can provide useful information on subtypes of participants in PD clinical studies and can contribute both to a better understanding of the molecular nature of PD and to the development of drugs against PD.

Conflict of Interest

All authors declare no conflict of interest.

References

1. Meireles J, Massano J (2012) Cognitive impairment and dementia in Parkinson’s disease: clinical features, diagnosis, and management. Front Neurol 3:88
2. Watson GS, Leverenz JB (2010) Profile of cognitive impairment in Parkinson disease. Brain Pathol 20:640–645
3. DeMaagd G, Philip A (2015) Parkinson’s disease and its management. Part 1: disease entity, risk factors, pathophysiology, clinical presentation, and diagnosis. P T 40:504–510
4. Alexander GE (2004) Biology of Parkinson’s disease: pathogenesis and pathophysiology of a multisystem neurodegenerative disorder. Dialogues Clin Neurosci 6:259–280
5. Sebastian WS, Samaranch L, Kells AP et al (2013) Gene therapy for misfolding protein diseases of the central nervous system. Neurotherapeutics 10:498–510
6. Tan JMM, Wong ESP, Lim KL (2009) Protein misfolding and aggregation in Parkinson’s disease. Antioxid Redox Signal 11:2119–2134
7. Ma QL, Chan P, Yoshii M, Ueda K (2003) α-Synuclein aggregation and neurodegenerative diseases. J Alzheimers Dis 5:139–148
8. Valtierra S (2008) α-Synuclein phosphorylation and nitration in Parkinson’s disease. Eukaryon 4:90–94
9. McNaught KSP, Olanow CW (2006) Protein aggregation in the pathogenesis of familial and sporadic Parkinson’s diseases. Neurobiol Aging 27:530–545
10. Licker V, Kövari E, Hochstrasser DF et al (2009) Proteomics in human Parkinson’s diseases research. J Proteome 73:10–29
11. Gitler AD, Chesi A, Geddie ML et al (2009) α-Synuclein is part of a diverse and highly conserved interaction network that includes PARK9 and manganese toxicity. Nat Genet 41:308–315
12. Carla M, Tome L, Tyson T et al (2013) Inflammation and α-synuclein’s prion-like behavior in Parkinson’s disease—is there a link? Mol Neurobiol 47:561–574
13. Allison JR, Varnai P, Dobson CM et al (2009) Determination of the free energy landscape of alpha-synuclein using spin label nuclear magnetic resonance measurements. J Am Chem Soc 131:18314–18326
14. Wu KP, Weinstock DS, Narayanan C et al (2009) Structural reorganization of alpha-synuclein at low pH observed by NMR and REMD simulations. J Mol Biol 391:784–796
15. Nath A, Sammalkorpi M, DeWitt DC et al (2012) The conformational ensembles of α-synuclein and tau: combining single-molecule FRET and simulations. Biophys J 103:1940–1949
16. Jónsson SA, Mohanty S, Irbäck A (2012) Distinct phases of free α-synuclein—a Monte Carlo study. Proteins 80:2169–2177
17. Coskuner O, Wise-Scira O (2013) Structures and free energy landscapes of the A53T mutant-type α-synuclein protein and impact of A53T mutation on the structures of the wild-type α-synuclein protein with dynamics. ACS Chem Neurosci 4:1101–1113
18. Wise-Scira O, Aloglu AK, Dunn A et al (2013) Structures and free energy landscapes of the wild-type and A30P mutant-type α-synuclein proteins with dynamics. ACS Chem Neurosci 4:486–497
19. Wise-Scira O, Dunn A, Aloglu AK et al (2013) Structures of the E46K mutant-type α-synuclein protein and impact of E46K mutation on the structures of the wild-type α-synuclein protein. ACS Chem Neurosci 4:498–508
20. Hazy E, Bokor M, Kalmar L et al (2011) Distinct hydration properties of wild-type and familial point mutant A53T of α-synuclein associated with Parkinson’s disease. Biophys J 101:2260–2266
21. Tsigelny IF, Sharikov Y, Miller MA et al (2008) Mechanism of alpha-synuclein oligomerization and membrane interaction: theoretical approach to unstructured proteins studies. Nanomedicine 4:350–357
22. Matthes D, Gapsys V, de Groot BL (2012) Driving forces and structural determinants of steric zipper peptide oligomer formation elucidated by atomistic simulations. J Mol Biol 421:390–416
23. Jónsson SÆ, Mitternacht S, Irbäck A (2013) Mechanical resistance in unstructured proteins. Biophys J 104:2725–2732
24. Zhengjian LV, Hashemi M, Banerjee S et al (2019) Assembly of α-synuclein aggregates on phospholipid bilayers. Biochim Biophys Acta Proteins Proteom 1867:802–812
25. Galvagnion C, Brown JWP, Ouberai MM et al (2016) Chemical properties of lipids strongly affect the kinetics of the membrane-induced aggregation of α-synuclein. PNAS 113:7065–7070
26. Cabin DE et al (2002) Synaptic vesicle depletion correlates with attenuated synaptic responses to prolonged repetitive stimulation in mice lacking α-synuclein. J Neurosci 22:8797–8807
27. Lee HJ, Choi C, Lee SJ (2002) Membrane-bound α-synuclein has a high aggregation propensity and the ability to seed the aggregation of the cytosolic form. J Biol Chem 277(1):671–678
28. Goedert M (2001) α-Synuclein and neurodegenerative diseases. Nat Rev Neurosci 2(7):492–501
29. Chiti F, Dobson CM (2006) Protein misfolding, functional amyloid, and human disease. Annu Rev Biochem 75:333–366
30. Galvagnion C, Buell AK, Meisl G, Michaels TCT, Vendruscolo M, Knowles TPJ, Dobson CM (2015) Lipid vesicles trigger α-synuclein aggregation by stimulating primary nucleation. Nat Chem Biol 11(3):229–234
31. Kim Y, Cho H, Kim HJ et al (2020) PSEN1 variants in Korean patients with clinically suspicious early-onset familial Alzheimer’s disease. Sci Rep 10:3480
32. Cai Y, Soo S, Kim SY (2015) Mutations in presenilin 2 and its implications in Alzheimer’s disease and other dementia-associated disorders. Clin Interv Aging 10:1163–1172
33. Pickrell AM, Youle RJ (2015) The roles of PINK1, parkin, and mitochondrial fidelity in Parkinson’s disease. Neuron 85:257–273
34. Solti K, Kuan WL, Fórizs B et al (2020) DJ-1 can form β-sheet structured aggregates that co-localize with pathological amyloid deposits. Neurobiol Dis 134:104629
35. Zimprich A, Biskup S, Leitner P et al (2004) Mutations in LRRK2 cause autosomal-dominant parkinsonism with pleomorphic pathology. Neuron 44:601–607
36. Faccio L, Fusco C, Chen A et al (2000) Characterization of a novel human serine protease that has extensive homology to bacterial heat shock endoprotease HtrA and is regulated by kidney ischemia. Int J Biol Chem 275:2581–2588
37. Jones JM, Datta P, Srinivasula SM et al (2003) Loss of Omi mitochondrial protease activity causes the neuromuscular disorder of mnd2 mutant mice. Nature 425:721–727
38. Strauss KM, Martins LM, Plun-Favreau H et al (2005) Loss of function mutations in the gene encoding Omi/HtrA2 in Parkinson’s disease. Hum Mol Genet 14:2099–2111
39. LaVoie MJ, Ostaszewski BL, Weihofen A et al (2005) Dopamine covalently modifies and functionally inactivates parkin. Nat Med 11:1214–1221
40. Herl L, Thomas AV, Lill CM et al (2009) Mutations in amyloid precursor protein affect its interactions with presenilin/gamma-secretase. Mol Cell Neurosci 41:166–174
41. Wolfe MS (2019) Structure and function of the γ-secretase complex. Biochemist 58(27):2953–2966
42. Ishikawa A, Piao YS, Miyashita A et al (2005) A mutant PSEN1 causes dementia with Lewy bodies and variant Alzheimer’s disease. Ann Neurol 57:429–434
43. O’Brien RJ, Wong PC (2011) Amyloid precursor protein processing and Alzheimer’s disease. Annu Rev Neurosci 34:185–204
44. Kitada T, Asakawa S, Hattori N et al (1998) Mutations in the parkin gene cause autosomal recessive juvenile parkinsonism. Nature 392:605–608
45. Ciechanover A, Kwon Y (2015) Degradation of misfolded proteins in neurodegenerative diseases: therapeutic targets and strategies. Exp Mol Med 47:e147
46. Seirafi M, Kozlov G, Gehring K (2015) Parkin structure and function. FEBS J 282:2076–2088
47. Dawson TM, Dawson VL (2014) The role of parkin in familial and sporadic Parkinson’s disease. Mov Disord 25:S32–S39
48. Unoki M, Nakamura Y (2001) Growth-suppressive effects of BPOZ and EGR2, two genes involved in the PTEN signaling pathway. Oncogene 20(33):4457–4465
49. Dagda RK, Pien I, Wang R et al (2013) Beyond the mitochondrion: cytosolic PINK1 remodels dendrites through protein kinase A. J Neurochem 128:864–877
50. Bonifati V, Rizzu P, van Baren MJ et al (2003) Mutations in the DJ-1 gene associated with autosomal recessive early-onset parkinsonism. Science 299:256–259
51. Paisán-Ruíz C, Jain S, Evans EW, Gilks WP, Simón J, van der Brug M, López de Munain A, Aparicio S, Gil AM, Khan N, Johnson J, Martinez JR, Nicholl D, Carrera IM, Pena AS, de Silva R, Lees A, Martí-Massó JF, Pérez-Tur J, Wood NW, Singleton AB (2004) Cloning of the gene containing mutations that cause PARK8-linked Parkinson’s disease. Neuron 44(4):595–600. https://doi.org/10.1016/j.neuron.2004.10.023
52. Gaiter D, Westerlund M, Carmine A et al (2006) LRRK2 expression linked to dopamine-innervated areas. Ann Neurol 59(4):714–719
53. Greggio E, Jain S, Kingsbury A et al (2006) Kinase activity is required for the toxic effects of mutant LRRK2/dardarin. Neurobiol Dis 23(2):329–341
54. Li T, Yang D, Sushchky S et al (2011) Models for LRRK2-linked parkinsonism. Parkinsons Dis 942412:16
55. Desideri E, Martins LM (2012) Mitochondrial stress signalling: HTRA2 and Parkinson’s disease. Int J Cell Biol 2012:607929
56. Gandomi A, Haider M (2015) Beyond the hype: big data concepts, methods, and analytics. Int J Inf Manag 35(2):137–144
57. Espay AJ, Vizcarra JA et al (2019) Revisiting protein aggregation as pathogenic in sporadic Parkinson and Alzheimer diseases. Neurology 92:329–337
58. Sivarajah U, Kamal MM, Irani Z et al (2017) Critical analysis of Big Data challenges and analytical methods. J Bus Res 70:263–286
59. Tsiolaki PL, Nastou KC, Hamodrakas SJ et al (2017) Mining databases for protein aggregation: a review. Amyloid 24:143–152
60. Redler RL et al (2014) Computational approaches to understanding protein aggregation in neurodegeneration. J Mol Cell Biol 6:104–115
61. Sharma S, Ding F, Dokholyan NV (2008) Probing protein aggregation using discrete molecular dynamics. Front Biosci 13:4795–4808
62. Chiu S, Chen C, Yuan G et al (2006) Association algorithm to mine the rules that govern enzyme definition and to classify protein sequences. BMC Bioinform 7:304
63. Cha Y, Erez T, Reynolds IJ et al (2018) Drug repurposing from the perspective of pharmaceutical companies. Br J Pharmacol 175:168–180
64. Kim RS, Goossens N, Hoshida Y (2016) Use of Big Data in drug development for precision medicine. Expert Rev Precis Med Drug Dev 1:245–253
65. Kalimuthu S, Jeong JH, Oh JM (2017) Drug discovery by molecular imaging and monitoring therapy response in lymphoma. Int J Mol Sci 18:1639
66. Qian T, Zhu S, Hoshida Y (2019) Use of Big Data in drug development for precision medicine: an update. Expert Rev Precis Med Drug Dev 4:189–200
67. Mucke HAM (2014) A new journal for the drug repurposing community. Drug Repurpos Rescue Reposition 1:3–4
68. Hassan M, Raza H, Abbasi MA et al (2019) The exploration of novel Alzheimer’s therapeutic agents from the pool of FDA approved medicines using drug repositioning, enzyme inhibition and kinetic mechanism approaches. Biomed Pharmacother 109:2513–2526
69. Padhy BM, Gupta YK (2011) Drug repositioning: re-investigating existing drugs for new therapeutic indications. J Postgrad Med 57:153–160

Chapter 20: Final Remarks

Mai Suan Li, Andrzej Kloczkowski, Marek Cieplak, and Maksim Kouza

Abstract

Aggregation of biomolecules is responsible for a number of neurodegenerative diseases, but it is also behind the formation of membraneless organelles that are vital to life. There are many novel experimental tools to investigate the phenomenon. There is also rapid progress in its computational study, as evidenced by the chapters in this volume.

Key words: Aggregation-related diseases, Mechanisms of aggregation, Computational studies, Membraneless organelles

The subject of the aggregation of biomolecules is important for public health, as our rapidly aging society is increasingly exposed to aggregation-related diseases, such as Parkinson’s, Alzheimer’s, and Huntington’s diseases. Finding a remedy for these diseases is a top priority for the NIH and other funding agencies around the globe. At the same time, scientific studies of this subject are challenging and, therefore, exciting and rapidly evolving. We are witnessing tremendous recent progress in experimental techniques: atomic force microscopy, cryogenic electron microscopy, nuclear magnetic resonance, small-angle X-ray scattering, optogenetics, and many others. They offer increased spatial and/or temporal resolution in measurements at the molecular level. There is also a growing variety of useful medical tools, such as patient gene sequencing. Important insights into the mechanisms of aggregation are often obtained through computational studies. This book reviews the current status of molecular dynamics approaches, bioinformatics-based prediction, and quantum mechanical methods used in the context of aggregation. Moreover, new computational methods, such as exascale computing and deep learning, are appearing on the horizon. These new methods may soon change this field of research completely, which should lead to further development of efficient therapies for diseases that involve protein and peptide aggregation.

Another growing area is the subject of membraneless organelles, which are proteinaceous droplets that arise through aggregation of proteins, often with the help of nucleic acids. These droplets are responsible for many vital processes in the cell, such as the formation of ribosomes in the nucleoli. We may anticipate that in the next few years, this volume of Methods in Molecular Biology will require substantial updating, and a new edition will be needed.

Mai Suan Li et al. (eds.), Computer Simulations of Aggregation of Proteins and Peptides, Methods in Molecular Biology, vol. 2340, https://doi.org/10.1007/978-1-0716-1546-1_20, © Springer Science+Business Media, LLC, part of Springer Nature 2022

Mai Suan Li et al.

Index

A
Accelerating Medicines Partnership (AMP), 463
Activation energy (AE), 387, 388
Activation-relaxation technique (ART), 180
Adaptive multiple time split algorithm (A-MTS), 81
Aggregates, 5, 13, 26, 46, 48, 51, 52, 54, 55, 57, 60, 61, 63, 69, 79, 85–86, 91, 92, 99, 106, 107, 112, 117, 118, 122–124, 127, 128, 131, 132, 134, 175, 176, 179, 181–183, 197, 198, 200–203, 213, 215, 221, 222, 232, 235, 236, 238, 246, 251, 256, 258, 281, 282, 285–289, 293, 295, 297–300, 309–322, 324–328, 330–332, 334–337, 358, 374, 380, 402, 451, 455, 461
Aggregation, 2, 4, 6, 7, 12, 13, 17–38, 43, 48, 49, 52–55, 57, 59–61, 63–69, 80, 83, 85–87, 91, 92, 94, 95, 99, 105–118, 121, 123, 124, 127, 128, 130, 131, 140, 175–177, 179–181, 183, 198, 200, 215, 236, 238, 248–250, 256, 261, 262, 281, 283–290, 294, 295, 298, 314, 320, 321, 326–328, 343–346, 348–355, 379, 380, 386, 387, 397, 402, 405, 426, 434, 441, 450–453, 455, 460, 469, 470
Aggregation kinetics, 3, 13, 53, 176, 345, 354, 355, 451
Aggregation of proteins, 6, 13, 197–204, 206, 207, 209–215, 450, 465
Aggregation prediction, 5, 349
Aggregation-prone regions, 12, 25, 26, 28, 31
Aggregation propensities, 5, 7, 12, 17, 18, 22, 24, 25, 27, 37, 107, 344, 345, 347, 405
Aggregation rates, 37, 55, 57, 69, 116
Aggregation states, 86, 99, 236, 257, 259
Aggrescan, 7, 8, 17, 22, 38, 223, 224
Aggrescan3D, 7, 17–19, 21, 23–28, 31, 33, 34, 36, 38
AggScore, 12
AgMata, 12
Air–water interface, 200–202, 215
All-atom molecular dynamics, 117, 197–204, 206, 207, 209–215, 222, 362, 363, 369, 381
All-atom simulations, 68, 80, 81, 83, 96, 107, 108, 110, 112, 114, 115
α-synuclein (AS), 121, 122, 130–134, 296, 311
Alzheimer’s disease (AD), v, vi, 51, 54, 59, 79, 90, 121, 140, 175, 184, 190, 197, 209, 215, 221, 235, 237, 261, 284, 320, 325, 379, 401, 449, 455, 469
AMBER99, 227
AMBER99sb, 180
AMBER99sb-ildn, 350, 355
Amber ff03Ws, 405, 406, 414, 415, 431, 432, 436–440
Amide bands, 292
Amide I, 185, 186, 188, 290–292
Amphipathic, 64, 130, 132, 133, 450
AmyCo, 4
Amylins, 121–127, 283, 286
AmyLoad, 3
AmyloGram, 11, 12
Amyloid aggregation, 1, 13, 81, 121–135, 236, 283, 349
Amyloid-beta, 7, 51, 57, 79, 284
Amyloid β peptide, 86, 453
Amyloid fibril formation, 52, 91, 175, 176, 179, 180, 182, 184, 197, 295
Amyloid fibrils, 66, 112, 131, 175, 177, 184–189, 197, 198, 201, 203, 205, 209, 210, 212–215, 222, 228, 235, 281, 286–288, 290, 292–298, 343, 372, 379
Amyloid glass, 63, 110, 111
Amyloid motifs, 343–355
AmyloidMutants, 8, 9
Amyloid oligomers, 121, 122, 127, 134, 180
Amyloid peptides, 13, 53, 57, 177, 179, 282, 283, 285
Amyloid precursor protein (APP), 139, 140, 453
Amyloidogenic, 1–11, 57, 59, 63, 121–123, 140, 221–223, 225, 232, 256, 281, 285, 292, 297, 325, 344, 345, 401, 405, 411, 430, 450
Amyloidogenicity Propensity Prediction Neural Network (APPNN), 11
Amyloidogenic peptides, 8, 53, 222, 228, 236, 289, 401–442
Amyloidogenic regions, 6, 10, 12, 222–224
Amyloidoses, 13
Amyloids, v–vii, 1–13, 46, 53, 57, 63, 79, 80, 86, 91, 92, 94, 96, 121, 122, 126–130, 134, 140, 175–185, 190, 202, 209, 213, 215, 221, 222, 258, 281–283, 285–300, 344, 345, 354, 401, 455, 463, 464
AmylPred, 8
AMYPdb, 3
AmyPro, 3, 4
Analytical, 3, 285, 288–292, 298, 299, 358, 360, 426
Analytics, 449–465
Andersen barostat, 199, 203
Annotations, 4, 5, 153, 461, 464
Antibody design, 30–37
Antiparallel β-sheet, 90, 182, 183, 210, 256, 289
Apriori algorithm, 461
ArchCandy, 10
Aromaticity, 356
Artificial neural networks (ANNs), 8, 11, 141
Assembly of macromolecules, vii
Assembly of proteins, 53
Associations, 42, 60, 61, 67, 69, 106, 107, 112, 114, 116, 118, 176, 180, 392, 402, 418–426
Atomic force microscopy (AFM), 140, 205, 294, 295, 297, 300, 357, 358, 363, 452, 469
Atomistic equilibrium molecular dynamics (EMD), 406–408, 410, 411, 414, 415
Attenuated total reflectance (ATR), 290, 291
Aβ16-22, 55, 61, 66, 179–181, 183, 237, 238, 240, 246, 248, 250–258, 262
Aβ peptides, 51, 62, 66, 69, 80, 91, 95, 96, 99, 127, 129, 139, 176, 180, 182, 197, 198, 200–205, 209, 212, 213, 215, 222, 223, 225, 237, 238, 246, 324, 325, 327, 328, 330, 337

Mai Suan Li et al. (eds.), Computer Simulations of Aggregation of Proteins and Peptides, Methods in Molecular Biology, vol. 2340, https://doi.org/10.1007/978-1-0716-1546-1, © Springer Science+Business Media, LLC, part of Springer Nature 2021

COMPUTER SIMULATIONS OF AGGREGATION OF PROTEINS AND PEPTIDES

B
Backbones, 31, 34, 60, 64–66, 81, 89, 106, 109–111, 178, 179, 181, 189, 250, 251, 284, 354, 364, 367, 393, 409
Bacterial filaments, 357, 364, 371–373
Beads, 60, 63, 64, 66, 115, 179, 181, 367, 369, 371, 372
Berendsen barostat, 227, 406
β-amyloid, vii, 288, 357, 366, 373–374, 453
β-cells, 122
β-hairpin, 60, 88, 176, 209, 210, 215, 405
β-layer, 223, 227, 228
β-propensity, 4, 64, 181, 349
β-secretase, 139
BetaSerpentine, 10
β-sheet, 4, 6, 60, 64, 65, 86, 87, 96, 97, 99, 134, 175, 176, 179, 181, 183, 186, 204, 205, 209, 210, 215, 235, 236, 256, 282, 284, 289, 290, 292, 326, 349, 379, 380, 402, 405, 450–452, 455
β-strand, 59, 65–67, 94, 95, 98, 99, 124, 127, 179–182, 225, 257, 259, 261, 403, 405, 451
β-turns, 345, 346, 349
Big Data, vii, 449–465
Bilayers, 57, 60, 65, 177, 180, 182, 380, 392–395, 397, 452
Binding affinity, 11, 287, 385
Binding energy, 321, 385, 388
Binding sites, 41, 42, 44–48, 123–125, 128, 132, 134, 321, 332, 392
Binormal vector, 110
Bio3D, 420, 425
BioGPS, 463, 464
Bioinformatics, v, vi, 1–13, 139–160, 221, 350
Biotherapeutics, 26
Block averages, 384, 409, 433, 435, 441
Block covariance overlap, 408–410
Bubble cavitation, 189
Bubbles, 188–190, 212–215

C CABS-flex .............................7, 18–20, 22, 26, 31, 34–36 CABS model....................................................... 26, 33–36 Canonical ensemble (NVT)......................... 83, 149, 178, 189, 199, 207, 239, 244, 245, 350 Capsids.................................................................. 108, 109 Catalysis ......................................................................... 122 Cell metabolism ............................................................ 122 Central hydrophobic domain (CHD).........................402, 424–427, 434 Characterization ...................................48, 177, 284, 291, 294–297, 358–370, 460 Charges ............................... 4, 20, 55, 69, 108, 143, 145, 147, 148, 158, 179, 188, 209, 223, 225, 242, 283, 312–314, 316, 317, 336, 337, 381, 394, 406, 431 CHARMM .....................................................54, 384, 405 Charmm22* ..............................405, 408, 414, 415, 431, 432, 436–440 CHARMM36 ............................148, 393, 405, 406, 414, 415, 431, 432, 436–440 CHARMM36m.........................114, 147, 180, 237, 239, 381, 393, 405, 406, 414, 415, 431, 432, 436–440 CHARMM force field............................................... v, 405 Chemical shifts .................. 387, 390, 391, 433, 436–439 Circular dichroism (CD) .....................86, 140, 176, 282, 287–290, 292, 298, 299, 321 Cluster analysis ..................................................... 249, 256 Coarse-grained model.............................v, 47, 53, 58–62, 80, 107, 109, 110, 112 Coarse graining ..............................................58, 249, 369 Collision thermostat ..................................................... 227 Computational biology................................................. 460 Computational modeling ............................................. 452 Computer simulations .......................175–190, 236, 310, 316, 335

Conformational distribution ............................... 403, 411 Conformational energies ....................... 8, 124, 383–388, 394, 396, 428, 430–435 Conformations ........................5, 6, 9, 10, 44–45, 54, 55, 62, 84–88, 90, 91, 93–96, 98, 99, 105, 106, 114, 116, 123, 124, 128, 130, 131, 134, 145, 176, 178–182, 198, 201, 203–205, 209, 212, 237, 249, 257, 259, 261, 286, 289, 298, 310, 311, 344–346, 348, 354, 355, 366, 380–384, 391, 392, 394, 396, 397, 403, 406, 407, 411, 414, 415, 418, 430, 435, 441, 450, 452 Contact distances .......................................................... 352 Contact frequencies .............. 9, 247, 346, 348, 427, 429 Contact maps .....................................109, 246–249, 262, 267–271, 345, 352, 353, 366, 367, 369, 420, 427 Contacts.......................... 8, 43–47, 61, 63, 88, 106–110, 112, 114–118, 176, 181, 209, 246, 248, 249, 256, 257, 261, 262, 294, 311, 314, 317, 346–348, 352–354, 358–362, 365–367, 369, 372, 383, 385, 387, 392, 394, 395, 405, 411, 412, 420, 423, 424, 427–429, 441, 451 Continuum mechanics ................................ 358, 360, 361 Coordination ............................. 128, 130–132, 134, 319, 324–327, 329, 330, 332, 335–337 Counterion condensation ..........312, 314, 315, 317, 319 Cowpea chlorotic mottle virus (CCMV)............ 108, 109 CPAD 2.0 ............................................................. 463, 464 Car–Parrinello molecular dynamics (CPMD) simulations ........................................140, 144–148 Cross β-sheet ................................................................. 292 Cross-β sheet protein assemblies .................................. 343 Cross-correlation network analyses.............................. 403 Crowding effects ............................................................. 64 Cryo-electron microscopy (Cryo-EM) .................. 52, 92, 176, 296 C-terminus ........................................ 
139, 223, 237, 316, 318, 405, 415 Cumulative secondary structure average ..................... 409 Curated Protein Aggregation Database (CPAD)........... 3, 463, 464 Cyclization..................................................................... 105 Cysteines .................................... 110, 123–125, 283, 285, 323, 336, 337 Cα–Cα distance ..................................................... 203, 204

D Databases ....................................... 2–6, 8, 13, 42, 61, 69, 349, 366, 462–465 Debye–Hückel............................................................... 110 Deep learning ............................. 140, 141, 154–160, 469 Density functional theory (DFT)....................... 140–144, 154–160 Deuterium oxide (D₂O) ...................................... 291, 298

Dihedral angles.....................................82, 124, 204, 259, 293, 387, 435 Dimerization ..................................................98, 105, 262 Dimers .................................... 55, 57, 61, 86, 87, 98, 99, 109, 128, 140, 177, 180, 209, 210, 247, 248, 250, 252, 254–259, 261, 332, 335, 336, 382, 385, 396, 452 Discrete molecular dynamics (DMD).......................... 181 Disorder predictors ..................................... 140, 141, 153 Dissociation ............................... vi, 89, 95, 175–190, 248 Disulfide bonds .................................... 81, 110, 123–125, 285, 292 Divalent metal ions .............................................. 122, 127 DJ-1 .....................................................450, 453, 455, 457 Docking ..................................................... 41, 94, 95, 178 Dodecahedron box ....................................................... 350 Domain–domain interactions .............................. 429, 430 Dopaminergic neurons ........................................ 449, 464 Droplets ........................................ 83, 106, 111, 112, 470 Drug repurposing ......................................................... 462 Dyshomeostasis ............................................................. 127

E E-cadherins ............................................................ v, 41–49 Effective dimension D .................................................. 419 Elastic constants ............................................................ 108 Electric field.......................................................... 177, 185 Electron microscopy (EM) ........................ 140, 243, 244, 294–297, 300, 464, 469 Electron paramagnetic resonance (EPR) ..................... 131 Electrostatic interactions .......................10, 81, 110, 209, 242, 248, 312, 335, 381, 384, 393, 428 Electrostatics..............................12, 24, 41, 67, 179, 209, 227, 263, 264, 266, 351, 381, 384, 406, 430 Empirical contact potentials ........................................... 45 Empirical potentials ......................................... 41–49, 106 Environmental factors................................................... 131 Experimental .............................1, 4, 5, 7, 13, 17, 27, 36, 42, 52, 55, 63, 64, 67, 68, 86, 91, 95, 96, 99, 123, 126, 128, 129, 131–134, 140, 176, 177, 180, 181, 186, 187, 203, 209, 212, 214, 215, 231, 236, 261, 281–300, 311, 314, 319–328, 331, 334, 344, 348, 350, 358, 360, 363, 364, 366, 369, 372, 373, 380–382, 387, 391, 396, 402, 403, 429, 433, 435–440, 460, 461, 469 Explicit solvent ........................... 177, 180–183, 363, 364 Ewald method .....................................381, 393, 406, 421

F Fibril formation ............................. 53, 59, 60, 64, 66–68, 88, 128, 131, 181, 286, 294, 343, 402, 451 Fibrilizer .......................................................................... 11 Fibrillary states ....................................126, 128, 131–133

Fibrillation .............................. 13, 53, 63, 124, 126, 288, 289, 296, 402, 451 Fibril-like structure ......................................................... 63 Fibril-prone conformations ..............................................vi Fibril-prone state............................................................. 67 Fibrils .........................................9, 11, 52, 54, 55, 59, 61, 64–68, 88–95, 98, 99, 123, 125–127, 132, 134, 139, 140, 176, 179, 180, 185–190, 201, 204, 205, 212, 214, 222, 223, 256, 287, 288, 292–297, 299, 325, 326, 343, 357–374, 379, 380, 402, 405, 417, 429, 450–452 Fibril template ..........................61, 63, 88, 89, 91–96, 98 FISH Amyloid ................................................................... 9 Fluorescence ....................... 68, 176, 179, 286–288, 292, 345, 349, 350, 354, 373 FoldAmyloid................................................ 8, 9, 223, 224 Folded states................................................ 410, 412, 431 Force fields .................................... 24, 25, 53, 80, 82, 91, 96, 114, 180, 183, 190, 227, 237, 239, 310, 350, 355, 367, 381, 393, 403, 405–407, 431 Fourier transform infrared spectroscopy (FTIR) .......176, 282, 290, 291, 299, 464 Fraction of native contacts......................... 382, 383, 393, 410, 411, 413 Free energies......................... 68, 90, 92, 94, 97, 99, 179, 182, 202, 210, 211, 213, 236, 313, 314, 365, 384, 403, 408, 418, 421, 430, 431 Free energy landscape (FEL)..................... 52, 94, 98, 99, 179, 182 Free energy surfaces ............................198, 203, 418, 419 Frenet system of coordinates........................................ 109

G γ-secretase .................................................... 139, 452, 453 Generalized Born ................................................. 114, 177 Generalized-ensemble algorithm ............................ vi, 198 Globular proteins ......................................... 7, 12, 18, 47, 54, 344, 406 Glucantransferase Bgl2p ...................................... 222, 223 Glycemic regulation ...................................................... 122 GoMartini model ................................................. 370, 374 Gō model..................................................... 369, 370, 374 Groningen machine for chemical simulations (GROMACS) .................. 11, 147–149, 237–243, 245, 246, 261, 345, 350, 355, 387, 391, 406

H Hamiltonian replica exchange with solute scaling (HREX)....................................407, 408, 410, 411 Helical intermediates ..........................402, 405, 411, 418 Helical tetramers ............... 380, 381, 387, 388, 394–397 Helix–helix associations ...............................402, 418–426 Hemoglobin ..............................................................47, 48

Hertz model .................................................................. 366 High performance computing (HPC)................ 338, 365 High-resolution..................................178, 285, 292–298, 357, 358, 361 High-speed AFM .......................................................... 294 Hot spot ............................................................... 1, 3, 7, 8 HTRA2 ................................................................. 453, 457 Hydrogen bonds (HBs).................47, 88, 90–95, 97–99, 179, 189, 204, 205, 257, 282, 283, 365, 367 Hydropathy .......................................................... 112, 425 Hydrophobic clusters............................42, 128, 402, 405 Hydrophobic interactions........................ 44, 46, 67, 204, 428, 429, 432, 434 Hydrophobicity ......................... vi, 4, 31, 35, 41–48, 283 Hydrophobic packing ................................. 381, 385, 392 Hydrophobic patches................................................44–48

I Indentation..........................................360, 362, 366, 371 Insulin .......................................................... 122, 286, 292 Interaction energies ...............................6, 179, 384, 427, 428, 430–433 Interaction maps .................................403, 427, 428, 430 Intrinsically disordered proteins (IDPs) ................ 48, 53, 55–57, 59, 105–107, 109, 114, 140, 177, 179, 235, 316, 344, 348, 397, 401, 405, 408, 412, 420, 433 In vitro ..............................108, 123, 124, 128, 131, 175, 176, 185, 285, 286, 288, 295, 296, 298, 311, 326, 335, 337, 345, 374 Ions ...........................122–128, 130–134, 140, 144, 147, 149, 203, 242, 244, 246, 262, 299, 309–338, 350, 351, 363, 372, 394, 406 Isoforms..................................................... 3, 92, 344, 345 Isolation ......................................................................... 282 Isothermal-isobaric ensemble (NPT)................. 178, 189, 199, 200, 239, 245, 351, 406

J Jarzynski equality .......................................................... 365 J-coupling constants ..................387, 390, 391, 436, 439

K Karplus equations........................................ 391, 433, 435 Kinetics of fibril formation .......................................53, 64 KLVFF ............................................................................. 95 KLVFFAE ............................................................... 55, 200

L Lasers ................................. 185–188, 288, 292, 297, 299 Lattice models .................................................. 65–67, 179 Leucine-rich repeat kinase 2 (LRRK2) .............. 450, 453, 456–458

Lewy bodies......................... 54, 379, 449, 450, 453, 455 Ligands ................................. 42, 46, 132, 144, 222, 236, 310, 319, 322, 332, 334, 336, 337 LINCS ........................................ 245, 264–266, 381, 406 Lipid membranes .......................177, 180, 294, 450, 451 Liquids .......................................106, 110, 212, 282, 285, 296, 309, 310, 314, 354, 360 Local secondary structures ........................................... 344 Long-range contacts ..................................................... 346

M Machine learning..............................2, 5, 7–9, 11–13, 69, 141, 182, 460 Markov state models (MSMs) ...................... vii, 236, 237, 239, 240, 242–256, 258–262, 264, 265 Martini force field ...............................180, 367, 372, 374 MDAnalysis .......................................................... 237, 246 MDTraj .......................................237, 246, 433, 435, 439 Mechanical stability............................... 68, 357–370, 374 Membraneless organelles ..................................... 106, 470 Metal-amyloid aggregate .............................................. 122 Metal ions ...................................... 52, 69, 122–124, 127, 128, 131, 134, 177, 319–321, 326, 330, 337 MetAmyl ............................................................................ 9 Metropolis algorithms ..........................66, 203, 206, 207 Microscopy .........................................222, 228, 285–287, 291, 294, 297, 363 Misfolding ......................................... 180, 281, 320, 321, 450–451, 465 Miyazawa-Jernigan.......................................................... 45 MM/PBSA ........................................................... 428, 431 MODELLER ......................................................... 13, 147 Molecular dynamics (MD) ........................ 41, 52, 55, 58, 61, 63, 80–87, 89, 95, 96, 99, 106, 124, 128, 129, 132, 134, 140, 141, 144, 149, 178, 180–183, 185, 186, 188, 189, 198–215, 221–232, 236–253, 256–267, 320, 327, 328, 344–346, 348–355, 358, 362–366, 369, 371–374, 381, 386, 391, 393, 394, 403, 406–408, 411, 420–423, 433, 435, 451, 469 Molecular dynamics (MD) simulation......................... 198 Molecular modeling..................... 13, 122, 124, 131, 132 Monomeric state ..................................... 67, 99, 259, 405 Monomers ..................................... 
48, 52, 55, 57, 61, 63, 65–68, 83, 85–86, 88–96, 99, 111, 123, 128, 129, 132, 134, 139, 140, 176, 179–181, 185, 189, 197, 201, 210, 211, 237, 238, 247, 256–259, 261, 288, 294, 299, 328, 335, 336, 371, 379–388, 391, 396, 397, 401–405, 418, 426, 433, 452 Monte Carlo (MC) .................................... 58, 62, 65, 66, 178, 179, 203, 207, 208, 451 Monte Carlo simulations ............... 62, 66, 178–179, 451 Morphologies ........... 63, 64, 69, 89, 294–296, 298, 452 mTIP3P ............................. 414, 415, 431, 432, 436–440

Multibody potentials....................................................... 63 Multimers ................................... 380, 382–389, 396, 429 Multiplexed replica exchange molecular dynamics (MREMD) .....................................................83–85 Mutations ........................................ 1, 4, 7, 9, 13, 18, 19, 21, 24–28, 30, 31, 33–38, 60, 107, 180, 182, 222, 283, 320, 344, 345, 348, 349, 380–382, 388, 397, 450–453, 457, 458

N NAC domain ........................................................ 132, 426 NAMD........................................114, 237, 338, 430, 431 Nanoindentation ........................................................... 108 Nerve cells ................................................... 235, 449, 455 Net charge ................................................... 242, 363, 392 NetCSSP ............................................................................ 8 NetTurnP ...................................................................... 349 Network models.........................160, 236, 262, 420, 422 Neurodegeneration .............................................. 180, 452 Neurodegenerative diseases ............................51, 79, 105, 122, 139–143, 145, 147–149, 152–154, 158, 160, 190, 197, 235, 282, 320, 326, 379 Neurofilaments .............................................................. 450 Neurons ........................................ 79, 127, 335, 452, 455 Neurotoxic....................................................140, 379–397 NMR observables................................................. 433, 435 Nonbonded interactions................................57, 165, 368 Non-equilibrium MD (NEMD) .................177, 184–189 Normal vector ............................................................... 109 Nosé–Hoover thermostat ............ 82, 146, 199, 200, 203 NPT ensemble ................... 149, 227, 245, 265, 351, 407 nrPDB.............................................................................. 48 N-terminus ................................ 139, 176, 223, 316–318, 334, 335, 405, 414 Nuclear magnetic resonance (NMR) ................... 89, 131, 140, 147, 176, 203, 204, 282, 292, 293, 296–299, 309, 321, 326, 387, 391, 403–405, 433, 451, 469 Nucleation ..................................... 52, 53, 59, 60, 63, 64, 66, 176, 179, 181, 258, 287, 294 Nucleation mechanism ................................................. 
181 Nucleus .......................52, 53, 64–67, 88, 179, 180, 188, 213, 214, 292, 310, 453 NVT ensemble................... 149, 189, 227, 244, 264, 406 NWCHEM package...................................................... 144

O Oligomers ......................... 52, 53, 55, 59–65, 79, 85–87, 91, 121, 123, 124, 126–130, 132–134, 176–182, 190, 197, 201, 202, 209–211, 215, 222, 224, 225, 227, 228, 230–232, 235, 246–249, 257–259, 261, 262, 286, 288–292, 294–296, 298, 335, 379, 380, 401, 402, 418, 426, 451 Omics data..................................................................... 462

OPEP force field ........................................................... 179 OPLS force field.......................................................... v, 57 Optimized helical stabilities................................. 426–433 Optimized potential for efficient peptide (OPEP)....... 58, 60, 61, 66, 179, 180, 183 Orphan drugs ................................................................ 462 Oxidative stress.................. 122, 123, 127, 326, 450, 455

P Packing ............................ 8, 10, 177, 178, 221–232, 237 Packmol ................................................................ 237, 238 Pafig ............................................................................... 8, 9 Pairwise % contacts..............................346, 347, 352, 353 PALES.......................................................... 433, 435, 439 Pancreas ......................................................................... 122 PARK2 ................................................450, 453, 455, 458 ParkDB ................................................................. 463, 464 Parkin ...................................................450, 453, 455, 458 Parkinson’s ...................................... 54, 60, 79, 221, 235, 261, 320, 455, 463, 464, 469 Parkinson’s disease (PD) ..............................51, 121–123, 130, 131, 140, 175, 197, 379, 380, 387, 401, 449–465 Parkinson’s Disease Data and Organizing Center (PDDOC) .......................................................... 464 Parkinson’s Disease Research Database (PDRD) ....... 463 Parrinello-Rahman ............................................... 266, 351 Partially folded states .................................................... 424 Particle mesh Ewald (PME) ...................... 263, 264, 266, 351, 381, 393, 406, 421 PASTA........................................................................6, 349 PASTA 2.0 ...........................................6, 7, 223, 224, 344 Path analyses................................................ 420, 422, 423 Pathogenic mutations ............................................ 18, 345 PATHs .................................. 13, 22, 24, 25, 34, 68, 180, 313, 321, 365, 422, 424–426 PDbase.................................................................. 
463, 464 Peptide aggregation .............................57, 62, 79, 83, 91, 99, 237, 261, 297, 355, 465, 470 Peptides ...................................v–viii, 1–13, 51–55, 57, 58, 60–69, 79–81, 85–86, 88, 90, 91, 95–99, 107, 122, 123, 125–129, 139, 140, 176–186, 188, 189, 197–215, 221–232, 236–242, 244, 246–249, 251, 253, 256–259, 261, 262, 281–295, 297, 298, 300, 324–330, 332–335, 337, 344–355, 372, 374, 387, 392, 394, 402–404, 406, 411, 416, 418–421, 428, 430, 431, 433, 434, 452, 453, 463, 464 Peptide self-assembly ........................................... 402, 418 Periodic boundary conditions (PBCs)................. 68, 114, 141, 148, 212, 227, 242, 247, 258, 351, 406 Phase diagrams ...............................................63, 110, 111 Physiological conditions ............................ 221, 242, 296, 300, 381, 387, 401, 455

Platelets.......................................................................... 112 PLUMED ...................................................................... 349 Poisson coefficients ....................................................... 362 Polyalanines ............................................................ 63, 110 Polyglutamines ...............................................63, 110, 197 Polymorphic states ............ 123, 124, 130, 132, 134, 336 Polymorphism ................................... 124, 128, 131, 132, 296, 310, 311, 380, 411 Population shifts ..........................................130, 418–420 Power law .................................................... 107, 112, 114 Predictive analysis................................................. 459–460 Preeclampsia ................................................................ v, viii Presenilin 1 (PS1) ................................................ 452–454 Presenilin 2 (PS2) ................................................ 453, 454 Pressure......................................188, 189, 199, 200, 203, 210, 212–215, 227, 245, 266, 351, 359, 360, 381, 383, 393, 406, 407 Principal component analysis (PCA).............84, 408, 409 Prions ...................................... 3, 4, 6, 57, 175, 237, 310, 326, 330, 334, 335, 344 ProADD ............................................................... 463, 464 PROFASI....................................................................... 179 Protein aggregations .................................... 2, 13, 17, 26, 46–49, 51–69, 79–101, 106, 175–183, 185, 186, 188–190, 198, 235–237, 239, 240, 242–262, 264, 265, 291, 311, 348, 354, 365, 449–465 Protein assemblies ........................................................... 63 Protein design ...........................................................17–38 Protein flexibility .............................. 7, 19, 20, 22, 24, 31 Protein folding .............................. 
52, 54, 59, 60, 67, 81, 178, 180, 236, 314 Protein misfolding ...................................... 291, 354, 450 Protein–protein binding ...........................................41–49 Protein-protein docking ............................................... 385 Proteins............................................. 1–14, 17–19, 21–31, 33, 34, 36–38, 41–48, 51–61, 63, 64, 68, 69, 79, 81, 83–85, 105–112, 114–116, 118, 121, 127, 130, 131, 139–141, 148, 149, 152, 153, 175–180, 182–185, 187, 188, 190, 197, 198, 200, 213, 215, 221–225, 231, 232, 235–237, 241, 244–248, 257, 258, 262, 264, 266, 281, 282, 285–298, 309–311, 314–335, 337, 343, 344, 357, 358, 363–367, 369, 371, 372, 374, 380, 381, 383, 385, 387, 393, 403, 405–408, 410–412, 420, 422, 450–453, 455, 457–465, 470 Protein stability ................................ 7, 25, 29, 32, 37, 69 Protein tau ...........................................107, 116, 117, 344 Protein therapeutics ........................................................ 17 Protein unfolding.......................................................... 369 Protofibrils .................................52, 63, 64, 91, 140, 178, 202, 215, 288, 289, 294, 299 Psipred ........................................................................... 349 PTEN-induced kinase 1 (PINK1) .....450, 453, 455–457 Pulling..................................................189, 364, 371, 372

PUMA ........................................................................... 225 Purification .................................................. 282, 284, 285 PyEMMA..................................................... 249, 250, 254 PyMOL .................22, 88, 147, 237, 346, 347, 350, 355 Python ..........................7, 12, 13, 18, 38, 154, 237, 246, 247, 249, 262, 267–271, 435

R Raman spectroscopy (RS)........................... 289, 292, 297 Redox.................................................................... 124, 134 Replica exchange molecular dynamics (REMD).......... 55, 62, 83, 85, 89–91, 93, 97, 116, 128, 148, 149, 182–184 Replica exchanges.......................... 83, 84, 141, 178, 182, 198, 203–210, 213, 215, 407, 408 Replica exchange with solute tempering (REST) ........ 90, 94, 141, 182, 324, 343, 386, 407, 416, 429 Replica-permutation method (RPM) ................ 198, 203, 207, 209, 215 Residual dipolar coupling (RDC) ...................... 433, 435, 436, 439, 440 Restricted Contacts of Structural Unit rCSU ............106, 367–369 RFAmyloid ...................................................................... 12 Ribosomal protein S1 ................................................... 223 Ribozymes ..................................................................... 122 Root mean square fluctuation (RMSF) ................. 26, 31, 34, 345, 346, 352, 353 Rosetta ........................................................................... 178 Rupture force ................................................................ 295

S Secondary chemical shifts ............................................. 433 Secondary structures ................................ 7, 8, 12, 59, 60, 99, 124, 127, 131, 176, 178, 188, 209, 210, 226, 228, 283, 287–290, 292, 297, 298, 330, 346, 349, 365, 374, 379, 383, 387, 409, 414, 433 Self-aggregation ................................................................ 1 Self-assembly ................................. 26, 51, 53, 63–65, 67, 68, 108, 109, 114, 121, 123, 127, 236, 249, 288, 294, 344, 348, 401, 402, 405, 418, 419, 426, 452 Shape complementarity................................................... 41 Shear ...........................................177, 184, 210, 215, 374 Shell model .................................................................... 108 SHIFTX2..................................................... 391, 433, 436 Short-lived helical monomers....................................... 441 Sidechains ........................................ 64, 80, 81, 109, 209, 210, 336, 337, 393 Simulated tempering (ST) .......................... 183, 184, 198 SLLOD method ............................................................ 210 SNCA ............................................................................ 450 Solid-film ....................................................................... 291 Solid state ............................................293, 296, 297, 299

Solid-state NMR ........................... 52, 91, 176, 178, 201, 203, 212, 222, 298 Solubility......................................... v, 7, 17–38, 130, 284, 293, 297, 298 Solvent accessible surface area (SASA)............... 354, 411, 414–417, 430 Solvents......................................7, 10, 12, 31, 36, 43, 45, 63, 66, 80, 81, 107, 108, 114, 149, 177, 179–182, 186, 209–211, 240, 242, 244, 246, 282, 284, 285, 291, 293, 312, 319, 335, 348, 349, 365, 373, 384, 407, 411, 415–418, 430, 434 Specific heat ................................................................... 112 Specificity .....................31, 42, 43, 45–47, 363, 372, 420 Stabilities....................................9, 11, 18, 20, 25, 27, 28, 30, 35–37, 44, 46, 54, 55, 57, 59, 64, 88, 89, 91, 95, 130, 182, 186, 228, 232, 284, 287, 311, 328, 332, 346, 357–375, 383, 385, 394, 396, 402, 417, 426–434 Staining .............................. 285–288, 291, 295, 298, 299 Steered molecular dynamics (SMD) ...........363–365, 369 Stiffness.............................. 110, 184, 295, 314, 364, 372 Structural plasticity .............................................. 311, 312 Structure-based coarse graining model .............. 365–369 Structure-based design ................................................... 27 Substantia nigra.................................................... 449, 464 Supervised learning ...................................................5, 141 Suwa–Todo algorithm ........................................ 205–208 Synaptic vesicles........................................... 379, 393, 455 Synthesis ......................................... 46, 51, 222, 282–285 Synucleinopathies................................................. 451, 452

T
Tango, 5, 8, 344, 349
Tau, 61, 79, 92, 96, 99, 116, 117, 127, 140, 176, 177, 266, 344, 345, 349
Tau proteins, 51, 57, 60, 91, 92, 97–99, 127, 175, 179, 450
Tensile, 369, 371, 372, 374
Terminal deletions, 431, 433, 434
Tetramers, 57, 61, 86, 176, 182, 210, 211, 248, 335, 336, 380–382, 385–397
Thioflavin T (ThT) fluorescence, vii, 68, 344, 345, 354
3D profiles, 6
TIP3P, 54, 148, 189, 227, 237, 381, 393, 405
TIP4P/2005, 405, 406, 414, 415, 431, 432, 436–438
TIP4P-D, 405, 406, 408, 414, 415, 431, 432, 436–440
Toxic aggregates, 121
Toxicity, 55, 130, 182, 287, 288
Transition networks, 62, 180, 235–262, 264, 265, 271–278


T-REMD simulation, 140, 147–152
Trimers, 55, 61, 86, 182, 183, 210, 213, 214, 247, 328, 382, 385, 387, 396
Turn region, 402, 404, 411, 416, 424, 427, 428, 430
Type II diabetes (T2D), v, 121–123
Type IV pili, 371

U
Ubiquitin, 450, 453, 455
Ultrasonication, 185, 214
Unidirectionality, 205
UNited RESidue (UNRES), 59, 61, 62, 80–86, 88, 96
UNRES force field, vi, 79–100, 181
UNRES model of polypeptide chains, 80, 81

V
van der Waals radii, 106, 158, 225
Vibrational spectroscopy, 289, 298
Virus capsids, 107–109
Virus dissociation, 177, 184–189
Visualization, 23, 26, 141, 143, 147, 237, 252, 285, 288, 291, 294, 295, 297, 300, 363
VMD, 114, 144, 147, 237, 247, 258, 338, 364, 386, 430, 431

W
Waltz, 8, 9, 223, 224
Waltz DB, 3, 349
WALTZ-DB 2.0, 3, 463, 464
Weighted histogram analysis method (WHAM), 83, 85
WSAXS spectra, 186

X
X-ray, 134, 140, 147, 292, 296, 309, 311, 314, 321, 325, 329, 469
X-ray absorption, 133, 311, 321–322
X-ray diffraction (XRD), 292, 297, 299, 300, 314, 315, 326

Y
YASARA, 223, 225, 228
Young modulus, 360–362, 366, 372

Z
ZipperDB, 4, 6, 344, 349