Concepts in Protein Engineering and Design: An Introduction [Reprint 2020 ed.] 9783110874242, 9783110129755


English Pages 396 Year 1994


Concepts in Protein Engineering and Design

Concepts in Protein Engineering and Design An Introduction Editors Paul Wrede • Gisbert Schneider

Walter de Gruyter • Berlin • New York 1994

Editors
Paul Wrede, Priv. Doz., Dr. rer. nat.
Gisbert Schneider, Dr. rer. nat.
Freie Universität Berlin
Universitätsklinikum Benjamin Franklin
Institut für Medizinisch-Technische Physik und Lasermedizin
AG Molekulare Bioinformatik
Krahmerstraße 6-10
D-12207 Berlin
Germany

With 108 figures and 15 tables

Cover illustration: Structure of FELIX, a de novo designed protein (see: M. H. Hecht, J. S. Richardson, D. C. Richardson, R. C. Ogden (1990). Science 249, 884-891; courtesy of Jane Richardson)

Library of Congress Cataloging-in-Publication Data
Concepts in protein engineering and design : an introduction / editors, Paul Wrede, Gisbert Schneider. Includes index. ISBN 3-11-012975-2 1. Protein engineering. I. Wrede, Paul, 1948- II. Schneider, Gisbert, 1965- TP248.65.P76C66 1994 660'.63-dc20 94-30261 CIP

Die Deutsche Bibliothek - Cataloging-in-Publication Data
Concepts in protein engineering and design : an introduction / ed. Paul Wrede ; Gisbert Schneider. - Berlin ; New York : de Gruyter, 1994 ISBN 3-11-012975-2 NE: Wrede, Paul [Hrsg.]

Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.

© Copyright 1994 by Walter de Gruyter & Co., D-10785 Berlin
All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

Typesetting: Arthur Collignon GmbH, Berlin. Printing: Karl Gerike GmbH, Berlin. Binding: Dieter Mikolai, Berlin. Cover Design: Hansbernd Lindemann, Berlin. Printed in Germany.

Preface

New techniques often result from the fusion of different scientific disciplines and ideas. The progress made depends mainly on the current degree of knowledge in each of the individual disciplines and the abilities of the interacting scientists. The development of the biosciences in recent decades, in which physics and chemistry have played a central role, has resulted in a deeper understanding of the structure and function of proteins and nucleic acids. In the present situation, the new fusion of scientific disciplines in the biosciences even includes fields such as computer science, engineering and technology. This continuing process makes the field of protein engineering and design an exciting adventure with many surprising and unforeseeable developments. The nine chapters in this book present a bird's-eye view of the current state of such a fusion process. The topics selected start with an extensive introduction to the field (Chapter 1), followed by two papers on the fundamental prerequisites for protein engineering: the analysis and characterization of proteins (Chapter 2), and the determination of their spatial structure, modeling and site-directed mutagenesis (Chapter 3). The concepts and significance of structural and functional design are presented in the following part (Chapters 4 and 5). Drawbacks and limitations are reported alongside the successes. The crucial problem in protein design is still the lack of thorough knowledge of the rules describing how an amino acid sequence folds up into a given three-dimensional structure. Nonetheless, the understanding we do have of protein folding has already led to a number of successful design approaches. In the near future protein design will prove to be an additional powerful method in biotechnology. Nevertheless, considerable efforts in many parts of science will be necessary to attain this goal.
In Chapters 6 and 7, examples of two rather different approaches show that, on the one hand, a thorough understanding of organic and physical chemistry is necessary for creating a successful design and, on the other, a good background in biochemistry and cell biology is indispensable. The first example describes the design of catalytic antibodies for the catalysis of special reactions between organic molecules: a designed protein is now able to direct an organic chemical reaction (Chapter 6). The second paper describes the design of regulatory sequences for membrane and secretory proteins. The method has yielded very promising results for many biotechnological applications (Chapter 7). In particular, combinations of in vivo and in vitro design will be fused with rational design procedures to enhance such applications in the near future. In addition to the in vivo and in vitro design approaches described in Chapters 1 to 7, which are in part rather well understood, several entirely computer-based examples of in machina design are introduced in Chapters 8 and 9. One method is based on artificial neural networks and systematic optimization of amino acid sequences by "simulated molecular evolution"; another makes use of special computer programs for designing nanoscale DNA structures. Such techniques are likely to be components of future design procedures, and they might be of value for designing other biomolecules as well.

This book is mainly addressed to novices in the field of protein design with a background in biochemistry, biophysics, physical chemistry and, of course, to a certain extent in computer science. We hope that the book will be an introductory guide for the versatile scientist who would like to combine different scientific concepts for the creation of innovative approaches in applied protein science. Of course, many aspects have not been considered here and the interested reader is referred to the literature cited. The bold-faced words in the text can be found in the glossary at the end of the book, which has been included to keep the reader from getting lost in a terminology which in part derives from various fields of scientific research and in part is still emerging.

We would like to express our great pleasure at the fact that one of the founders of protein design, Jane Richardson, wrote the Prologue, giving an overview of the current state of research. We are equally thankful to Alexander Rich for the Epilogue summarizing things to come, based on his background of more than 40 years of experience in pioneering work in molecular biology.
We thank all our colleagues who supported us in editing this book. We would especially like to thank Georg Büldt, Heinz Schweppe, Wolfram Saenger, and Andreas Plückthun for their help and advice. Further, we thank George Rosenberg, Petra Schneider and Johannes Schuchhardt for their assistance. Monika Bulang-Lorcher is thanked for the excellent graphic work and Peter Germain for his extremely careful proofreading. The German Ministry of Research and Technology (BMFT), the Deutsche Forschungsgemeinschaft (DFG), and the Fonds der Chemischen Industrie (FCI) are thanked for support. We are grateful to Michael Schönherr of the Walter de Gruyter Verlag for his farsighted decisions, encouragement and support at the outset of this project and to Mario Noyer-Weidner, who continued the project with great engagement.

Berlin, July 1994

Paul Wrede Gisbert Schneider

Contents

Abbreviations . . . XV
Prologue, Jane S. Richardson . . . XVII

1 An Introduction to Protein Engineering . . . 1
Thomas J. Graddis, Dale L. Oxender
1.1 Protein Engineering is a New Addition to the Biotechnology Revolution . . . 1
1.2 Proteins Consist of Several Structural Elements . . . 2
1.3 Technologies that Enable the Development of Protein Engineering . . . 3
Recombinant DNA and Genetic Engineering Technologies 4 - Protein Purification Plays a Role in the Cycle of Protein Engineering 8 - Functional Analysis is Used to Evaluate Structural Changes of Engineered Proteins 11 - Structural Analysis of Proteins is Essential 12 - The Use of X-Ray Crystallography in the Structural Analysis of Proteins 12 - Nuclear Magnetic Resonance Spectroscopy is an Emerging Technology 13 - The Use of Design and Prediction in Protein Engineering 14 - De novo Design of Proteins is an Important Long-Term Goal 15 - Knowledge-Based Structural Prediction is Used to Model Proteins of Unknown Structure 16
1.4 There are Currently Several Bottlenecks to the Field of Protein Engineering . . . 17
The Rules Governing Protein Folding are Complex 18 - A Few Recognized Steps in Protein Folding are Beginning to Emerge 19 - Chaperones are Proteins that Assist the Folding of Other Proteins 19 - Understanding the Structure-Function Relationship is Central to Protein Engineering 20 - Computational Chemistry is Finding Increasing Utility 20 - Data Base Management and Computer Graphics are Useful Tools 21 - Choosing a Protein Expression System is Often an Empirical Science 22 - Advantages and Disadvantages of Protein Expression Systems 23
1.5 A Progress Report in Protein Engineering Includes Many Exciting Topics . . . 24
Attempts to Increase Protein Stability have Met with Outstanding Success 25 - Altering Enzyme Catalytic Activity has been Achieved in a Number of Systems 27 - Enzyme Specificity and Molecular Recognition are Fundamental to Enzyme Mechanism 28 - Peptide Hormones are Promising Targets for Protein Engineering 30 - Significant Effort has been Applied to the Engineering of Antibodies 32 - Humanized Antibodies are Providing Important Therapeutic Agents 32 - Catalytic Antibodies May Become Important Tools for Synthetic Chemists 33 - Random Libraries Represent a Semi-Rational Approach to Protein Engineering 33 - Synthetic Peptide Libraries Provide a Systematic Alteration of Peptide Sequence 34 - Combinatorial Libraries Generate Numerous Antibodies with Varied Binding Activity 35 - Phage Display Libraries Permit the Functional Screening of Vast Numbers of Protein Sequences 35 - The Future of Nanotechnology may Utilize DNA Structural Engineering 37
1.6 The Application of Protein Engineering to Agricultural, Industrial, and Medical Arenas has Produced Useful Products . . . 38
Agricultural Applications 38 - Industrial Applications 40 - Medical Applications 41 - Mass Screening of Natural Products will Utilize the Results of Engineered Proteins 43 - Growth Factors are a Promising Area for Protein Engineers 44 - The Marriage of Electronics and Biochemistry will Provide Many New Products 44

2 Analysis and Characterization of Proteins . . . 47
Brigitte Wittmann-Liebold, Peter Jungblut
2.1 Introduction: How to Approach the Structure and Function of Proteins . . . 47
By the Reductionistic Approach Individual Proteins are Selected for Investigation 49 - By the Global Approach Single Proteins from Thousands of Polypeptides in the Cell are Selected 50
2.2 The Various Preparative and Analytical Protein Purification Techniques . . . 50
How to Consider an Appropriate Strategy for Protein Isolation 50 - Prefractionation is Necessary for Reducing the Number of Proteins in a Complex Protein Mixture 53 - Strategies to Obtain Pure Proteins in a Few Steps 54 - Conventional Column Chromatography for Preparative Isolation of Proteins 54 - Gel Filtration is Based on Differences in Molecular Mass 54 - Ion Exchange Chromatography Allows the Isolation of Native Proteins 55 - Hydrophobic Interaction Chromatography may Assist Isolation of Membrane Proteins 56 - Covalent Chromatography Binds Proteins to Supports for Selective Separation 56 - Immobilized Metal Affinity Chromatography Uses Metal Chelate Formation 57 - Affinity Chromatography is Frequently Applied in Immunological Investigations 58 - HPLC-Separations and 2-DE-Polyacrylamide Gel Electrophoresis Resolve Complex Protein Mixtures 60 - Electrophoresis is Applied for Separating Highly Complex Protein Mixtures 60 - Electrophoresis in Combination with Blotting are Fast Means of Screening Proteins for Microsequencing and Immunostaining 65 - One Band in SDS Gel Electrophoresis Does Not Guarantee Purity of a Protein 67 - Desalting and Concentrating of Protein Fractions 69 - Spectroscopic Methods Can be Used for the Preliminary Quantitative Determination of Proteins 71 - Amino Acid Analysis is a First Step towards Characterizing a Protein 72
2.3 Investigation of Proteins and Protein Complexes . . . 72
Identification of Subunits 73 - Extraction of Proteins from Hetero-Complexes 73 - End-Group Determinations Yield Information about Subunit Structures in Multi-Component Complexes 74 - Modern Mass Spectrometry Makes Possible Analysis of Proteins and Peptides 75 - Reduction and Alkylation versus Treatment by Detergents 75 - Separation of Subunits 75
2.4 Strategies for Primary Structure Analysis of Proteins and Peptides . . . 76
Generation of Peptide Fragments 76 - Cyanogen Bromide Cleavage Releases Large Fragments for Sequence Alignment 76 - Mild Acid Treatment Generates Peptides from Insoluble Proteins 77 - Enzymatic Protein Digestions Yield Suitable Peptides for Internal Sequence Analysis 77 - Assignment of Cysteine Residues and Cystine Bridges in Proteins 79 - Mass Spectrometry of Cysteine and Cystine Peptides 80 - Crosslinking of Neighboring Constituents 81 - Protein-Protein Crosslinking Yields Important Information about Distances in the Complex 82 - Investigation of the Crosslinked Proteins on the Amino Acid Level Yields a Valuable Fine Structure Analysis of the Complex 84 - Crosslinks between RNA (DNA) and Protein are More Difficult to Analyze than Protein-Protein Crosslinks 86 - Localization of Structural and Functional Domains 87 - Surface Peptides in Ribosomes 87 - Immunological Studies 88 - Anti-Protein Antibody Studies 88 - Synthetic Peptides are Useful for Detecting Sequence-Specific Antigenic Sites 89
2.5 Protein Microsequence Analysis . . . 90
The Chemistry of the Stepwise Edman Degradation Technique 91 - Advantages of the Automatic Microsequence Procedure 93 - Sample Preparation for Automated Sequence Analysis 94 - Manual Methods are Applied to Screen Many Peptide Samples Simultaneously 94 - Dansyl-Edman Degradation Yields Highly Sensitive Fluorescent Amino Acid Derivatives 95 - The Manual DABITC/PITC-Double Coupling Method is Used for Visual Detection of the Released Amino Acids in the Picomole Range 95 - Efforts to Develop Fluorescent Isothiocyanates for a Sensitive Degradation Failed 96 - Manual Methods Allow the Simultaneous Degradation of Many Peptide Samples 96 - DNA Sequencing of the Protein's Gene 97 - Amino Acid Sequencing is Necessary for Confirming the Deduced Gene Sequence 98 - Mass Spectrometric Methods 98 - Mass Ion Determinations have Become Possible for Big Proteins Now 99 - Sequencing by Mass Spectrometry is a Complementary Technique to the Edman Degradation 100 - Determination of the C-Terminal Amino Acids is Difficult and of Limited Relevance 100
2.6 Conclusions . . . 102

3 Structure Determination, Modeling and Site-directed Mutagenesis Studies . . . 109
Ulrich Hahn, Udo Heinemann
3.1 Introduction . . . 109
3.2 A Model System: Ribonuclease T1 . . . 110
Ribonuclease T1 is Small, Stable and Water-Soluble 112 - Single-Stranded RNA is Cleaved by Ribonuclease T1 Specifically after Guanine 112
3.3 Methods of Structure Determination, Modeling and Mutagenesis . . . 114
The Three-Dimensional Structure of a Protein Molecule can be Determined with High Accuracy 115 - X-Ray Crystallography Yields High-Resolution Structures in the "Solid" State 117 - Nuclear Magnetic Resonance Spectroscopy Yields an Average Solution Conformation 122 - Databases Archive Structural Information 125 - Modeling and Force-Field Calculations Can Help in the Planning and Understanding of Mutagenesis Experiments 127 - Recombinant DNA Technology Permits the Creation of Protein Molecules at Will 128 - Foreign DNA Fragments can be Biologically Amplified in Bacteria such as Escherichia coli 134 - High-Level Expression Provides the Amounts of Proteins Required for Many Experiments 138 - Site-Directed Mutagenesis Changes Protein Sequences according to Prespecified Goals 138
3.4 The Structures of Natural and Mutated Ribonuclease T1 . . . 145
The Structure of Wild-Type Ribonuclease T1 has been Determined by X-Ray Crystallography 146 - Ribonuclease T1-Inhibitor Complexes Provide Information about Enzyme Function 147 - Single Amino Acid Mutations may have Drastic Effect on Protein Function and Conformation 155 - NMR Spectroscopy Shows Protein Structure and Flexibility 156
3.5 Catalysis and Specificity . . . 157
Active Site Mutations Help Us to Understand the Mechanism of RNA Hydrolysis 158 - The Prediction of Protein Variants with Altered Substrate Specificity Remains a Problem 159
3.6 Folding Pathways . . . 160
Ribonuclease T1 may Unfold and Refold with Complete Restoration of Activity 160 - Cis-trans Isomerization of two Prolines is the Rate-Limiting Step in Ribonuclease T1 Folding 160
3.7 Protein Stability . . . 161
The Stability of Ribonuclease T1 and Other Proteins is Very Low 161 - Hydrophobic Interactions and Hydrogen Bonds Contribute Equally to the Stability of Ribonuclease T1 161
3.8 Internal Motions in Ribonuclease T1 . . . 162
3.9 Modeling of Homologous Ribonucleases . . . 163
3.10 Conclusions . . . 164

4 Rational Design of Proteins with New Properties . . . 169
Dietmar Schomburg
4.1 Overview - The Current Status of Rational Protein Design . . . 169
Interdisciplinary Work is Essential for the Protein Design Cycle 172 - Step 1: Screening, Purification and Characterization 172 - Step 2: Cloning, Expression and Genetic Engineering of the Wild-Type Enzyme 173 - Step 3: Molecular Modeling 173 - Step 4: Site-Directed Mutagenesis and Evaluation of the Variant 174
4.2 Knowledge of the Protein 3D-Structure is the Essential Prerequisite . . . 174
X-Ray Crystallography and NMR are Used for Experimental Structure Determination 176 - Protein Crystallography 176 - NMR-Methods 177 - Structure Prediction is an Alternative Sequence-Oriented Approach 178 - Correct Sequence Alignment is Necessary for Correct Structure Predictions by Homology 180 - Similarity Matrices Improve Sequence Alignment 181 - Predicting the Main-Chain Folding of Insertions and Deletions 184 - Exchange of Amino Acid Side-Chains 185 - Check of the Model 187
4.3 The Game with Large Numbers: Design of New Variants . . . 188
Molecular Graphics are Helpful for Planning Experiments 190 - Experimental Background is Essential 192
4.4 The Necessary Check: Force Field Calculations . . . 192
Energy Minimization is Usually the First Step of the Force Field Treatment 194 - Protein Dynamics Simulations Help to Find the Global Energy Minimum of a Protein Structure 194
4.5 Successful Protein Design Examples . . . 195
Successfully Designed Enzymes 197 - An Important Goal: Redesign of Protein/Protein Interactions 197 - An Instructive Example: Design of a Highly Effective and Selective Inhibitor for Human Granulocyte Elastase 197 - Protein Stability can be Improved by Design 199 - Summary 200

5 Structural Design of Proteins . . . 209
Chris Sander
5.1 Introduction: From Natural Evolution to Protein Design . . . 209
5.2 Topological Redesign of Natural Proteins . . . 209
The Core Hypothesis: Side Chain Interactions in the Protein Interior Determine the Specific Fold 209 - Protein Chains can be Cyclically Permuted but Fold as Before 210 - Reengineering of Loop Connections in a Four-Helix Bundle Left the Fold Intact 210 - Point of Departure: A Natural Four-Helix Bundle 210 - Steps in Redesign Involved Cutting and Pasting 211 - Experimental Verification Proved Successful Design 213 - The Protein Engineer is Free to Redesign Loop Connections 213
5.3 De novo Design of Protein Structures . . . 214
Define the Structure, Invent a Sequence 214 - Construct a Backbone Model 215 - Design an Amino Acid Sequence 216 - Avoid Alternate Folds 216 - Optimize the Model 217 - Express, Synthesize, Purify - and Determine the Structure 217 - More to Come 218
5.4 Design of α-Helices to Form Coiled Coils . . . 218
Different Approaches to the Design of Four-Helix Bundles 219
5.5 Attempts at Designing Larger Proteins: (βα)8 Barrels . . . 221
Sandwiches of β-Sheets 223 - Mixed αβ Structures 223
5.6 Design Exercises on Computers Help Formulate New Design Projects . . . 224
5.7 De novo Design of Protein Function . . . 224
First Attempts to Engineer Binding Sites: DDT and Metal Ions 224 - Design of Enzymatic Function by a Hybrid Approach 226 - Catalysis Near a Porphyrin Ring 226 - An Attempt to Create an Esterase 227 - Membrane Ion Channels Formed by Simple Helical Peptides 228
5.8 Computer Design and Molecular Selection . . . 229
5.9 Looking Ahead . . . 230
5.10 Protein Databases: Structures and Sequences . . . 230
Database of Protein Structures 230 - Database of Protein Sequences 231 - Derived Databases Useful in Protein Design 231

6 Antibody Catalysis . . . 237
Theodore Tarasow, Donald Hilvert
6.1 Introduction . . . 237
6.2 Exploiting Antibodies as Catalysts . . . 239
6.3 Utilization of Entropy to Speed Chemical Reactions . . . 244
6.4 Catalysis Through Substrate Destabilization . . . 250
6.5 Catalytic Groups and Cofactors can be Used to Accelerate Reactions . . . 253
6.6 Future Prospects for Antibody Catalysis . . . 257

7 Design of Protein Targeting Signals and Membrane Protein Engineering . . . 263
Gunnar von Heijne
7.1 Introduction . . . 263
7.2 Protein Targeting: Pathways and Signals . . . 264
Sorting in the Secretory Pathway Depends on Multiple Signals 264 - Mitochondrial Targeting Peptides Form Amphiphilic α-Helices 266 - Chloroplast Transit Peptides Contain Many Serines and Threonines 268 - Nuclear Localization Sequences are Composed of Two Closely Spaced Clusters of Basic Residues 269 - There are Two Kinds of Peroxisomal Targeting Signals 270
7.3 Membrane Proteins: Principles of Assembly and Engineering . . . 270
Membrane Proteins are Adapted to a Non-Polar Environment 270 - The Topology of Membrane Proteins is Controlled by Positively Charged Residues 271 - Transmembrane α-Helices Pack Together in Much the Same Way as α-Helices in Globular Proteins 273 - The Topology of Membrane Proteins can be Predicted from Their Amino Acid Sequence 274 - Membrane Proteins have Sorting Signals not Found in Soluble Proteins 275
7.4 Conclusions . . . 276

8 The Rational Design of Amino Acid Sequences . . . 281
Gisbert Schneider, Reinhard Lohmann, Paul Wrede
8.1 Introduction . . . 281
8.2 Understanding the Sequence-Function Relationship is a Prerequisite for Rational Protein Design . . . 282
Appropriate Sequence Representations Allow the Extraction of Features Responsible for a Certain Protein Function 283 - Heuristics Serve as a Guide through the Feature Space 292
8.3 Artificial Neural Networks and Machine Learning are Methods of Choice for Feature Extraction . . . 294
Reliable Prediction Systems are Necessary for Sequence Design: The Problem of Generalization Ability 296 - Artificial Neural Networks Provide Flexible Systems for Pattern Recognition in Amino Acid Sequences and Sequence Classification 298 - Development of Artificial Neural Networks is an Optimization Task 304
8.4 Simulated Molecular Evolution is a Potential Method for Rational Sequence-Oriented Protein Design . . . 305
Amino Acid Distance Maps can be Used as a Guide for Sequence Variation 307

9 Structural Control and Engineering of Nucleic Acids . . . 319
Nadrian C. Seeman
9.1 Introduction . . . 319
9.2 The Assignment of Sequences to DNA Objects . . . 326
9.3 Forming DNA Objects Requires Structural, Environmental and Synthetic Considerations . . . 328
9.4 Structures Built from Branched DNA Molecules . . . 329
9.5 Potential Applications of Branched DNA Objects and Lattices . . . 335
9.6 Concluding Remarks . . . 339

Epilogue by Alexander Rich . . . 345
Authors . . . 349
Glossary . . . 353
Index . . . 361

Abbreviations

1-D; 1D  One-Dimensional
2-D; 2D  Two-Dimensional
2-DE  Two-Dimensional Electrophoresis
3-D; 3D  Three-Dimensional
ADP  Adenosine Diphosphate
AI  Artificial Intelligence
AIDS  Acquired Immuno-Deficiency Syndrome
AMP  Adenosine Monophosphate
ATP  Adenosine Triphosphate
BPTI  Bovine Pancreatic Trypsin Inhibitor
CD  Circular Dichroism
CE  Capillary Electrophoresis
COSY  Correlated SpectroscopY
Da  Dalton (molecular mass unit)
DABITC  N,N'-DimethylAminoazoBenzene IsoThioCyanate
DDT  2,2-bis(p-chlorophenyl)-1,1,1-Trichloroethane
DNA  DeoxyriboNucleic Acid
DNP  DeoxyriboNucleic-Protein particle
ER  Endoplasmic Reticulum
ES-MS  ElectroSpray Mass Spectrometer
FAB  Fast Atom Bombardment
Fab  Antigen-Binding Fragment of an antibody
FPLC  Fast Performance Liquid Chromatography
GdnHCl  GuanidiniumHydroChloride
GMP  GuanosineMonoPhosphate
HIC  Hydrophobic Interaction Chromatography
HPLAC  High Performance Liquid Affinity Chromatography
HPLC  High Pressure Liquid Chromatography
IDA  N-methylIminoDiacetic Acid
IEC  Ion Exchange Chromatography
IEF  IsoElectric Focusing
IEM  Immuno Electron Microscopy
IMAC  Immobilized Metal Affinity Chromatography
kDa  Kilo-Dalton
LEC  Ligand Exchange Chromatography
MALDI-MS  Matrix-Assisted Laser Desorption Mass Spectrometer
MCS  Multi Cloning Site
mRNA  Messenger Ribonucleic Acid
NADH  NicotinAmide Adenine Dinucleotide, reduced form
NADPH  NicotinAmide Adenine Dinucleotide Phosphate, reduced form
NMR  Nuclear Magnetic Resonance
NOE  Nuclear Overhauser Effect
PAGE  PolyAcrylamide Gel Electrophoresis
PCR  Polymerase Chain Reaction
PDB  Brookhaven Protein Data Base
PIR  Protein Identification Resource
PITC  PhenylIsoThioCyanate
rDNA  Recombinant DNA
RNA  RiboNucleic Acid
RNP  RiboNucleic-Protein particle
RP-HPLC  Reversed-Phase High Pressure Liquid Chromatography
SDS  Sodium Dodecyl Sulfate
SME  Simulated Molecular Evolution
TOF-MS  Time Of Flight-Mass Spectrometer
tRNA  Transfer Ribonucleic Acid

Prologue

This is an exciting time for protein engineering and design: there have already been enough successes on many different fronts to convince even the skeptical; however, it is also clear that even greater developments lie ahead. It seems timely, then, to produce a book that surveys underlying fundamentals, describes a wide selection of present successes, and points to likely future developments on a variety of fronts. As is clear from the present volume, the field of protein (or nucleic-acid) engineering and design is enormously diverse. What unites it, more than anything else, is an attitude or frame of mind: molecular designers all believe that our tools and our knowledge are mature enough to productively tinker with macromolecules, as well as just studying the ones provided by nature. Two different goals motivate this activist approach, mixed in different proportions for each researcher. One goal is to use protein engineering and design as a more effective way of learning about the molecules, particularly such difficult questions as protein folding and biological function. The second goal is to actually produce molecules with new and useful functions designed into them. Each goal can be served either by the modification of natural proteins or by inventing completely new ones out of "whole cloth". Already we can design some useful new functions by choosing the arena carefully, as is demonstrated by several authors in this volume. For example, catalytic antibodies (see Tarasow and Hilvert chapter) take advantage of a biological selection process to optimize the detailed design of a catalytic site, while subcellular localization can be changed in useful ways (see von Heijne chapter) by modifying or swapping signal regions whose properties are relatively simple functions of local sequence.
If in parallel we also strike out boldly into the wilder sorts of design, we can hope to learn enough that some day fairly soon we might have engineering-level control of the entire process of protein production, folding, and functionality, at least for a useful set of specific cases. We are most definitely not there yet, nor are we within a few straightforward steps, but the benefits of such thorough understanding and control would be enormous, and we know many fruitful ways of proceeding toward that goal. In approaching these long-range goals, the active meddling process of protein design and engineering is especially valuable because it can force us to stumble over our incorrect assumptions. In other words, making good mistakes may be the most valuable part of the process. For example, in the early days of protein engineering it was assumed that adding disulfide crosslinks between Cα atoms at the right distance would necessarily stabilize a protein. From several such attempts that actually decreased stability, we learned first that the details of geometry must be just right, and more recently that effective stabilization is easiest to achieve in the most mobile regions. Another paradigm shift is taking place at present, driven by results from the sort of de novo design described in the Sander chapter and also done in our own laboratory. Ten years ago we all thought it would be extremely difficult (although worth trying) to come up with completely de novo protein designs that would fold up at all, but that once we got close it should be simple to make incremental improvements. However, it has turned out that it is actually fairly easy to achieve approximately correct secondary and tertiary structure, but that even after much effort no one has yet designed a unique, well-ordered structure that behaves like a natural protein. In other words, we were wrong about where the hard part was, and major research effort will now be concentrating on issues like internal packing, the determinants of uniqueness, and the attributes of the transition from a molten-globule-like state to a native state. In summary, the field of protein engineering and design offers substantial rewards in both the intellectual and the practical spheres. I hope you enjoy reading about it.

Jane Shelby Richardson

1 An Introduction to Protein Engineering

Thomas J. Graddis, Dale L. Oxender

1.1 Protein Engineering is a New Addition to the Biotechnology Revolution

The primary goal of protein engineering is the rational design and construction of novel proteins and polypeptides with enhanced or unique properties. Rapid progress toward this goal has already provided several important products for market and promises to soon deliver many more. The world pharmaceutical industry is presently the driving economic force powering the development of protein engineering. The economic potential of protein engineering has spawned numerous start-up companies and prompted the major pharmaceutical and chemical concerns to expend increasing percentages of their research and development funding on programs to develop protein engineering technology. International competition is increasing as governments and industry foresee the potential of protein engineering. Japan's Ministry of International Trade and Industry has catalyzed the formation of a consortium of 14 companies to develop a Protein Engineering Research Institute (PERI) which has a 10-year research budget of over $100 million to develop new and improved proteins. Corporate strategies are focusing on valuable biologicals and polypeptide hormones, novel drug screening technologies utilizing the products of protein engineering, and 'superior' enzymes for use as catalysts in the production of high-value speciality chemicals and pharmaceuticals. Protein engineering is defined as the modification of the native structure of proteins. The means by which proteins are engineered fall into two broad categories: those modifications carried out in vivo and those administered in vitro.
In vivo modifications of protein structure are the most frequently used methods and utilize the tools of molecular biology, including specific alterations in primary structure by gene mutation, construction of multi-functional chimeric proteins via gene fusion, and post-translational modification, whether achieved by selection of or engineering of host-cell expression systems. In vitro engineering of proteins includes enzymatic and chemical modification of isolated proteins, synthetic polypeptides, and transcriptional and translational cell free systems.

Thomas J. Graddis, Dale L. Oxender

The majority of in vivo protein engineering experiments involve the introduction of point mutations into well-characterized enzyme systems in order to investigate the role of specific residues in binding and catalytic activities. A non-rational approach to protein design that is finding increasing favor among biotechnologists includes random epitope libraries and combinatorial libraries, both of which raise the prospect of rapid genetic screening for optimizing or obtaining new binding activities. Protein engineering provides the essential methodology by which the structure-function relationship of proteins may be delineated, the knowledge of which is a prerequisite to rational design.

The birth of protein engineering is the logical consequence of developments during the last decade in the methods for deoxyribonucleic acid (DNA) synthesis, recombinant DNA (rDNA) technologies, and significant improvements in protein structural analysis. In the late 1970's a technological breakthrough, termed site-directed mutagenesis, for which Dr. M. Smith received a Nobel Prize in 1993, permitted investigators to alter precisely and efficiently the genetic information coding for a protein. Site-directed mutagenesis allows investigators to delete, add or replace specific amino acids in a protein structure at will. The success of protein engineering requires an interdisciplinary collaboration of scientists with expertise in biochemistry, molecular genetics, protein structural analysis, enzymology, and computational and theoretical methods. An examination of the efforts currently being carried out suggests that studies of well-established model systems that lend themselves to an interdisciplinary collaboration will be most effective for developing the generic tools for protein engineering. Advances in protein engineering will have important applications in the fields of medicine, chemical industry, and agriculture.
This chapter is an overview of the concepts and methods of engineering proteins. Our intention is to provide a conceptual framework by which the reader may judge both the potential of protein engineering and its limitations.

1.2 Proteins Consist of Several Structural Elements

The transfer of information in a cell from gene to protein is a multistep process that involves transcribing information from the DNA of the gene to the messenger ribonucleic acid (mRNA) and then translating this information into a linear sequence of amino acids referred to as the primary structure of a protein. The information or code for folding the protein into a three-dimensional structure is contained in the primary amino acid sequence. The information for the folding pathway has sometimes been referred to as the second half of the genetic code, the determination of which would overcome a major bottleneck for the protein engineering field.

1 An Introduction to Protein Engineering

The architecture of a protein is a complex mixture of three-dimensional structural motifs. As mentioned above, the amino acid sequence is referred to as a protein's primary structure. Secondary structure, i.e. local formation of alpha-helices, beta-sheets, or turns, is based on the most favorable conformation for the specific amino acid sequence. The helices, turns and beta-sheet secondary structural motifs of each polypeptide chain are folded into a compact three-dimensional tertiary structure. Finally, quaternary structure refers to the shape of protein complexes that are composed of more than one polypeptide chain.

The tertiary structure of a polypeptide chain can often be subdivided into smaller functional units, called domains. Domains are composed of sections of contiguous amino acid sequence that independently fold into defined structural units. Domains often represent functional units as well, containing intact binding and catalytic sites. These binding and catalytic sites are also often found at the interface between domains or at the interface between polypeptide chains of multimeric proteins. Nature has provided a set of functionally discrete structural domains that may be spliced together in unique combinations that either retain activity or in some cases produce new activities. A specific aim of protein engineering is the design and formation of functionally active chimeric proteins by combining domains of different proteins.

Analysis of protein functional families and divergence between families has begun to uncover some of the principles nature uses in the natural evolution and design of functional proteins. Several generalizations can be made:

- Three-dimensional structures of proteins, either tertiary or quaternary, appear to be more conserved in evolution than the linear sequence of amino acids and considerably more conserved than DNA sequence.
- The alpha-helices and beta-sheets of related or recently diverged proteins can often be superimposed, while amino acid replacements, insertions or deletions tend to occur on the surface of the protein in loops between these secondary structures, thus leaving the central globular core unchanged.

1.3 Technologies that Enable the Development of Protein Engineering

The successful application of protein engineering requires an interdisciplinary approach dependent on a flow of information between the five disciplines or areas of expertise outlined in Figure 1.1. For purposes of this discussion we will start with the structural and functional analysis of an important hypothetical industrial enzyme. It is imperative to be able to obtain the three-dimensional structure of this enzyme so that structural features can be related to specific function. At the present time, the three-dimensional structure of over 400 proteins has been determined; however, structural analysis remains one of the bottlenecks in the cycle represented in the Figure. From knowledge of the relationship between structure and function, predictions and design principles (next circle, Figure 1.1) can be used to develop plans for modifying the structure of the enzyme to produce a new enzyme with improved catalytic activity or stability. Once the modification has been decided, the tools of recombinant DNA and genetic engineering allow the structural modification to be made. Using recombinant DNA technology it is now possible to isolate the gene for essentially any protein from the appropriate cell or tissue and modify its structure by site-directed mutagenesis to produce a specifically altered protein. A large battery of techniques is available for overexpression and purification (Figure 1.1) of the newly modified protein, which can then be subjected to a new round of functional and structural analysis. It is often possible to screen newly modified proteins by carrying out functional analysis prior to complete purification, thus avoiding the more laborious structural analysis of modified proteins not considered useful. The following section briefly describes the current status of the technologies necessary for the applications of protein engineering.

Figure 1.1 The cycle of Protein Engineering. Schematic representation of the five major paradigms comprising the field of protein engineering and the relationship between them.

Recombinant DNA and Genetic Engineering Technologies

The isolation of a gene coding for a protein of interest, by cloning or by synthetic methods, is a prerequisite to protein engineering. A gene may be isolated based on its pattern of expression, its ability to bind a complementary strand of DNA, or its ability to code for a particular protein detected with antibodies. Once the gene has been isolated the nucleotide sequence can be translated by the genetic code to provide the amino acid sequence of the corresponding cloned protein product. The cloned gene will serve as the template for engineering the protein product.

Protein engineering has been aided by recent technological breakthroughs in DNA sequencing, synthesis and splicing. A class of enzymes discovered in the early 1970's, called restriction endonucleases, will cleave DNA at unique recognition sequences. Today, several hundred of these enzymes are available commercially and are in wide use. Availability of these restriction enzymes permits isolation of a gene from one organism and splicing it into the genome of another organism, such as a bacterium, for efficient expression and production of the protein product. The wide array of procedures available for this type of manipulation is referred to as recombinant DNA (rDNA) technologies. These technologies can be used to facilitate large-scale, relatively low-cost production of proteins in plants, animals, and fast-growing microbes.

A chemical procedure for sequencing DNA, developed by Maxam and Gilbert, enjoyed widespread use by the mid-1970's. In the early 1980's Fred Sanger developed an enzymatic means to sequence DNA that is amenable to automation. The chemistry of solid phase DNA synthesis has evolved to the level that it is now standard to generate synthetic DNA fragments of desired sequence in a matter of hours using a fully automated system. Biotechnologists increasingly find it convenient to synthesize the complete gene at a modest cost. This approach, impractical just a few years ago, offers advantages in that it allows the researcher to introduce restriction sites for future mutational analysis with gene cassettes or to alter and optimize codon usage for a particular host expression system.
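The idea that a restriction endonuclease cuts at a fixed recognition sequence, and that a designed gene should carry sites that occur only once, can be sketched in a few lines of Python. This is a minimal illustration, not a method from the chapter: the function names and the example sequence are made up, and only three well-known enzymes with palindromic six-base sites are listed.

```python
# Recognition sequences of three widely used restriction enzymes.
# The sites are palindromic, so searching one strand is sufficient.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_sites(dna, sites=RECOGNITION_SITES):
    """Return {enzyme: [0-based start positions]} for every site found in dna."""
    dna = dna.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = dna.find(motif)
        while start != -1:
            positions.append(start)
            start = dna.find(motif, start + 1)
        hits[enzyme] = positions
    return hits


def unique_cutters(dna, sites=RECOGNITION_SITES):
    """Enzymes whose site occurs exactly once (hypothetical helper)."""
    return sorted(e for e, pos in find_sites(dna, sites).items() if len(pos) == 1)


# Made-up sequence used only to exercise the functions.
example_gene = "ATGGAATTCGCTAAGCTTGGTAAGCTTTAA"
print(find_sites(example_gene))
print(unique_cutters(example_gene))
```

In this toy sequence EcoRI cuts once and HindIII twice, so only EcoRI would be usable as a unique site for bracketing a cassette.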
The level of sophistication that mutagenesis has achieved is demonstrated in the recent report by Santi and coworkers at the University of California, San Francisco. They mutagenized a synthetic thymidylate synthase gene with degenerate cassettes and produced 125 mutants at 12 different amino acid residues. In the mid-1980's, scientists at Cetus Corporation introduced polymerase chain reaction (PCR) technology. PCR utilizes an enzymatic primer extension procedure that has found broad application in molecular biology. PCR is a relatively simple semi-automated method for amplifying genes for subsequent cloning and for introducing specific or random genetic mutations. It is often overlooked, but it is difficult to overstate the importance of the improved quality of reagents available to the molecular biologist. The high purity and better properties of reagents, be they restriction enzymes, polymerases, or oligonucleotides, permit the modern researcher to carry out tasks, such as cloning individual isotypes of a receptor or rescuing immunoglobulin domains, that were difficult if not impossible just a few years ago.

Once a gene has been cloned it is a relatively simple matter, through a variety of genetic engineering techniques, to edit information in that gene one base at a time and therefore specifically substitute any amino acid in the protein structure with another. This makes it possible to examine the role of individual amino acid residues in the structure and function of a protein. Figure 1.2 presents the transfer of information encoded in the DNA sequence of a gene which specifies a protein. The DNA is a double-stranded linear polymer of deoxynucleotide repeating units. The two complementary strands of DNA form a double helix. One strand of the helix contains regions of code called genes which direct the synthesis of cellular products such as hormones and enzymes. The first step in the information transfer is the transcription of the gene into a messenger ribonucleic acid, abbreviated here as mRNA. The information in the messenger is translated into a polypeptide or protein enzyme. The protein is produced as a linear polymer of amino acid units and the sequence of these amino acids is coded for by the sequence of nucleotides in the DNA. The functional enzyme is formed by specific folding into a unique three-dimensional structure. Folding of the nascent polypeptide may occur concomitant with translation or post-translationally and is often aided by other folding components called chaperones.

It is now possible to take advantage of this information transfer pathway to produce modified enzymes by changing the information in the gene coding for the enzyme. The circle in the DNA structure at the top of the diagram represents a designed alteration in the nucleotide sequence of the gene which was accomplished by in vitro site-directed mutagenesis techniques. This change in the gene results in a change in the messenger and, ultimately, a corresponding change in the amino acid sequence of the enzyme. The altered amino acid sequence may produce a more stable enzyme or one with modified catalytic properties. Protein engineering techniques can be used for virtually any enzyme by first cloning the gene, and then applying the appropriate gene splicing and engineering procedures (Figure 1.3).

Figure 1.2 The transfer of information from gene to protein (DNA, transcription, mRNA, translation, polypeptide, folding, protein). The mutation in the DNA, represented here as a circle, is first transcribed then translated into the final folded protein structure.
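The information transfer described above, and the effect of a single designed codon change, can be followed in a short Python sketch. It uses the standard genetic code; the function names and the 15-base "gene" are illustrative assumptions, and the mutation shown (a glutamic acid codon changed to a glutamine codon) is chosen only as an example.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """Return the mRNA corresponding to a DNA coding strand."""
    return coding_strand.upper().replace("T", "U")


def translate(coding_strand):
    """Translate a DNA coding strand into one-letter amino acids, stopping at a stop codon."""
    protein = []
    for i in range(0, len(coding_strand) - 2, 3):
        amino_acid = CODON_TABLE[coding_strand[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def mutate_codon(coding_strand, residue_number, new_codon):
    """Replace the codon of a 1-based residue number (hypothetical helper)."""
    start = 3 * (residue_number - 1)
    return coding_strand[:start] + new_codon + coding_strand[start + 3:]


wild_type = "ATGGCTGAAAAATAA"                 # made-up gene: Met-Ala-Glu-Lys-stop
mutant = mutate_codon(wild_type, 3, "CAA")    # GAA (Glu) changed to CAA (Gln)

print(transcribe(wild_type))                  # AUGGCUGAAAAAUAA
print(translate(wild_type), translate(mutant))  # MAEK MAQK
```

A single base change in the gene thus propagates through the messenger to one altered residue in the polypeptide, which is the essence of the circle drawn in Figure 1.2.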

Figure 1.3 a) Scheme for cassette mutagenesis of a receptor gene. A synthetic receptor gene is synthesized and cloned into an expression vector. This designer gene contains unique restriction sites carefully positioned in the coding region, thus permitting the modification of particular segments of mature protein.

Figure 1.3 b) Superimposition of cassettes on the structure of the mature protein. The multitude of unique restriction sites bracket the functional domains of the receptor protein. Structure-function analysis of these domains is carried out by cloning mutated cassettes into the receptor gene and expressing the engineered protein.

Figure 1.3 c) Construction of a chimeric protein (construction of chimeric G-protein coupled receptors: the designer genes for receptors I and II are digested at unique restriction sites and ligated to give a I/II chimeric receptor, whose binding activity is determined in COS-7 cells). Designer genes may be used to fuse protein segments from two similar proteins in order to derive altered or unique function.

Protein Purification Plays a Role in the Cycle of Protein Engineering

Cells normally contain several thousand different proteins as well as DNA, RNA, lipids, and carbohydrates. The challenge of protein purification is to isolate one protein from this complex mixture with reasonable efficiency, yield and purity, while keeping in mind that for each step of purification the yield decreases and the cost of producing the protein increases. Protein purification procedures depend on the properties of a protein and therefore must be optimized for each protein. Therapeutic use requires highly purified proteins in small quantities, while some industrial uses require larger quantities of somewhat less pure protein, provided the remaining contaminants do not interfere with the reaction to be catalyzed. New protein purification techniques have emerged from advances in material sciences, such as ultrafiltration with highly specific membranes or high-performance liquid chromatography using new column resins. These methods complement classical approaches and give scientists a powerful set of tools for handling various protein purification problems.

Recombinant DNA techniques can be used to overproduce proteins so that in certain cases they represent as much as 50% of the total soluble protein of a bacterial cell. This degree of overproduction greatly aids the purification process. Some proteins when over-expressed in bacteria are produced as soluble active protein, whereas many others form cytoplasmic inclusion bodies that precipitate inside the cell. Large quantities of protein can be sequestered in these inclusion bodies, which may facilitate their isolation; however, renaturing active protein from inclusion bodies can present a problem.

It is often possible to use special properties engineered into proteins to greatly assist their own purification. A cysteine residue engineered into subtilisin, a serine protease, allowed researchers at Genentech to use a thiol-specific-binding resin to isolate the modified protein. A powerful method for purifying proteins that is now commercially available involves inserting the gene coding for a protein of interest between the genes coding for two different binding domains, such as a metal binding, biotin binding and/or carbohydrate binding domain. The expressed hybrid protein contains two unique binding activities and may be isolated in two affinity chromatographic steps. The unwanted binding domains are then removed by either chemical or enzymatic cleavage of specific sites engineered into the linker sequences between the binding domains and the protein of interest. A scheme for this procedure is given in Figure 1.4.

Figure 1.4 Design scheme for expression and one-step purification of a synthetic peptide or protein. The top of the Figure shows a gene construct that encodes a protein with several functional domains. These domains include a target mature protein or peptide, a site for cleavage by protease, a specific binding domain, and a signal peptide that directs secretion from the host organism. Host cells transfected with an expression vector containing this gene will express and secrete the multi-domain protein into the media with concomitant cleavage and loss of the signal peptide. The protein is directly affinity purified from the media by utilizing the specific binding domain. The binding domain is removed from the target protein by protease digestion.

Sometimes it is possible to remove amino acid residues from a protein which interfere with the expression and purification of a fully functional protein. For example, expression in bacteria of properly folded beta-interferon, a polypeptide hormone with antiviral activity, is hampered by the presence of an unpaired cysteine residue. This cysteine residue interferes with proper formation of a disulfide bridge between two other important cysteine residues in the protein. Since the unpaired sulfhydryl group was not essential for activity, engineering a serine replacement for this cysteine residue resulted in an improved yield of a more stable protein without significant loss in activity.

In some cases it may aid purification to have a protein secreted by cells into the culture medium. This is particularly true for proteins that contain several disulfide bonds. When proteins containing disulfide bonds are expressed in microorganisms, the reducing environment inside the cells is not appropriate for disulfide bond formation, and when the proteins are extracted they often form incorrectly folded aggregates. To get around this problem, the secretion of a protein may be controlled by genetically attaching cleavable bacterial signal sequences that direct a protein to be secreted from the cell. The extracellular location of the modified protein and the oxidizing environment will enable proper folding and disulfide bond formation.
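The one-step purification scheme of Figure 1.4 can be mimicked with plain strings to make the order of events explicit. The sketch below is only an illustration under stated assumptions: the class and method names are invented, the signal peptide and target sequences are placeholders, a hexahistidine stretch stands in for a metal-binding domain, and IEGR (a Factor Xa recognition sequence, cleaved after the arginine) stands in for the protease site. None of these specific choices comes from the chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionConstruct:
    """Hypothetical model of the multi-domain gene product in Figure 1.4."""
    signal_peptide: str   # directs secretion, lost on export
    affinity_tag: str     # binds the affinity column
    protease_site: str    # engineered linker, cleaved after its last residue
    target: str           # mature protein or peptide of interest

    def translated(self):
        """Full-length product as made on the ribosome."""
        return self.signal_peptide + self.affinity_tag + self.protease_site + self.target

    def secreted(self):
        """Product found in the medium: the signal peptide has been removed."""
        return self.translated()[len(self.signal_peptide):]

    def binds_column(self, protein):
        """A protein is retained on the column only if it still carries the tag."""
        return self.affinity_tag in protein

    def after_protease(self):
        """Cleave after the protease site and keep the C-terminal target."""
        secreted = self.secreted()
        cut = secreted.index(self.protease_site) + len(self.protease_site)
        return secreted[cut:]


construct = FusionConstruct(
    signal_peptide="MKKLLALAAA",  # placeholder, not a real signal sequence
    affinity_tag="HHHHHH",        # assumed metal-binding tag
    protease_site="IEGR",         # assumed protease recognition site
    target="ACDEFGHIK",           # placeholder target peptide
)

print(construct.secreted())        # HHHHHHIEGRACDEFGHIK
print(construct.after_protease())  # ACDEFGHIK
```

The released target no longer binds the column, so a second pass over the same resin would separate it from the cleaved tag.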

Functional Analysis is Used to Evaluate Structural Changes of Engineered Proteins

Specific and sensitive assays of protein function, whether it be catalysis, ligand binding, or stability, are essential to effective evaluation of the results of site-directed mutation steps in the cycle of protein engineering. The greater the sensitivity of such an assay, the better the investigator is able to evaluate even small structural changes. A rapid assay will find greater utility in quantitating protein function. In addition, it is necessary to have a means of monitoring the progress of purification of a protein. It is often necessary for an assay which monitors protein activity to be tailored to a specific protein or class of proteins. Binding assays for proteins such as ligand receptors or DNA binding proteins are often difficult to quantitate. Protein engineering has been used to facilitate binding assays in several ways: to provide and modify protein substrates used in screening assays; to introduce random non-rational screening assays; to develop bio-availability assays utilizing expression of intracellular receptors; and to produce chimeric proteins so that the binding activity of interest is fused to an enzyme whose catalytic activity is easily determined. In addition, an indirect assay may be developed to evaluate DNA binding proteins. For example, beta-galactosidase, an enzyme whose activity is easily monitored in a colorimetric assay, has been fused to a variety of proteins whose expression is dependent on the activity of a DNA binding protein. In this manner the DNA binding activity of the protein of interest may be determined by monitoring the enzymatic activity, and thus the expression, of the chimeric protein.

Structural Analysis of Proteins is Essential

The heart of protein engineering lies in obtaining a broad range of structural information, which is often the rate-limiting step in the cycle of protein engineering outlined in Figure 1.1. The three-dimensional structure of a protein determines its biological function, whether it acts as an enzyme, a hormone, or serves some other purpose. It is important to have a high resolution three-dimensional structure of the protein available when designing a new function for a protein. In addition, it is essential at this stage to acquire structural information on an engineered protein when testing design hypotheses so that the new structural elements can be related to altered function. With increasing computational sophistication we are better able to model three-dimensional structures of proteins based on the similarity of their amino acid sequences to those of other proteins of known crystallographic structure.

The Use of X-Ray Crystallography in the Structural Analysis of Proteins

To date there are more than 400 refined protein structures in the Brookhaven Protein Data Bank (PDB). These structures were acquired using X-ray diffraction of crystalline proteins. This technique requires substantial investment in time and dollars, sophisticated technology and highly trained personnel. Crystallographic techniques for determining protein structure have greatly improved in the last few years. Data acquisition, which takes hours by conventional means, has now been shortened by using more intense X-ray sources and modern two-dimensional area detectors coupled to a computer. This rapid data acquisition permits transient phenomena to be captured in successive images, a sort of motion picture at atomic resolution. A number of large pharmaceutical concerns, including Merck, SmithKline Beecham, Glaxo, and Hoffmann-La Roche, have formed a consortium to buy time at a U.S. government synchrotron high intensity radiation facility in order to obtain X-ray diffraction patterns of proteins of commercial interest to these companies. Considering the reluctance of major corporations to enter into consortiums, this development underscores the importance of obtaining crystallographic structural information on proteins.

The intrinsic weakness of a crystallographic approach is that suitable protein crystals must be obtained. Protein crystallization is still as much art as science; however, emerging guidelines and improved techniques have yielded greater success. Crystallization experiments require large quantities of highly purified protein and are labor intensive. The laboratories of major research institutes often include industrial robots dedicated to crystallization of proteins. These instruments save time and provide greater experimental consistency. To get the initial three-dimensional structure of a protein, it must be crystallized with a suitable heavy metal atom to serve as a reference point. Once a structure has been determined, solving the structure of a series of mutant proteins is made easier by using the atomic coordinates of the first structure and looking for density differences. In this manner, for example, Brian Matthews and coworkers at the University of Oregon have obtained the three-dimensional structures of over 50 derivatives of T4 lysozyme. Many laboratories have used protein engineering to produce more easily crystallized extracellular domains of cell-surface receptors. Other laboratories are seeking to improve crystallization properties of proteins by removing, substituting, or adding residues and peptides, ideally without altering their active sites.

Nuclear Magnetic Resonance Spectroscopy is an Emerging Technology

Two-dimensional Nuclear Magnetic Resonance (NMR) spectroscopy is a powerful technique for examining the structure of a polypeptide in solution. This method is used to look at the dynamic structure of proteins in solution, bypassing the difficult crystallization step in X-ray diffraction. Three-dimensional structural information is gleaned from resonance patterns through complex computer-assisted calculations. Two-dimensional NMR techniques are expected to offer advantages for the study of protein folding, since many of the intermediates in protein folding may be unstable and difficult if not impossible to crystallize. The major limitation of two-dimensional NMR is that it has so far been useful only for small proteins of mass less than 15,000 daltons. New 600 MHz instruments as well as improved techniques, such as selective deuteration, stable isotopic labeling, and three- and four-dimensional NMR, offer advantages by reducing the data collection time as well as providing the ability to study larger polypeptides. The relative ease with which proteins and protein-ligand systems may be isotopically enriched has led to a surge in the study of protein dynamics. An impressive example of an isotopic labeling NMR study by Wüthrich's group in Zurich demonstrates that cyclosporin A when bound to cyclophilin has a conformation distinct from that of either its free solution structures or its X-ray crystal structure. Homonuclear and heteronuclear three-dimensional NMR can yield more accurate proton-to-proton distance information, greatly reduce spectral overlap, and provide better characterization of internal protein dynamics compared to that derived from two-dimensional NMR. This approach should prove useful in the study of not only proteins but also nucleic acids and carbohydrates, where problems of spectral overlap can be extremely severe. Heteronuclear four-dimensional NMR is a viable approach for deconvoluting the extremely crowded spectra of medium-sized proteins that are not readily accessible to three-dimensional NMR.

The Use of Design and Prediction in Protein Engineering

Mutational studies play a central role in identifying sites within protein structures critical to function, stability or structure, often uncovering surprising and unpredicted principles not immediately obvious from perusal of protein structure. For example, amino acid substitutions that increase conformational stability were identified in second-site revertant studies of Staphylococcal nuclease, an enzyme that degrades DNA. The enzymatic activity of mutant enzymes in which the mutations were remote from the active site was restored by just a few key second-site suppressor mutations, a phenomenon termed "global suppression" of many different alleles. When only the suppressor mutations were present, these proteins proved more stable to unfolding than the wild type protein.

The classical approach to obtaining proteins with altered properties has been largely limited to proteins produced by microorganisms that have been subjected to chemically induced mutagenesis followed by an appropriate selection or screening procedure. This "pot-luck" approach has produced a number of altered proteins and enzymes and still proves useful for identifying specific amino acid residues that are important for activity or structure. Since such methods primarily lead to a single amino acid substitution, it is difficult to obtain altered forms of enzymes that may require several amino acid substitutions. Systematic design of proteins via site-directed mutagenesis offers advantages well beyond the reach of classical mutagenesis methods. It is possible to replace, delete or introduce single amino acids or large numbers of amino acid residues, and to alter the protein structure radically by eliminating or adding an entire domain.
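A rough count helps to show why forms requiring several substitutions are hard to reach by random methods. As a sketch only, counting distinct protein sequences and ignoring the constraints of the genetic code, the number of variants of an n-residue protein that differ from the wild type at exactly k positions is given below; the value n = 100 is an assumption chosen purely for illustration.

```latex
N(n,k) = \binom{n}{k}\,19^{k}, \qquad
N(100,1) = 100 \times 19 = 1900, \qquad
N(100,2) = 4950 \times 361 = 1\,786\,950
```

Under these assumptions the number of double mutants is already about a thousand times the number of single mutants, so a screen that samples variants at random covers a rapidly shrinking fraction of the possibilities as k grows.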

It is premature to say we are designing proteins based on theoretical predictions; rather, we are modifying existing structures in nature to test hypotheses. There are two types of alterations obtained by site-directed mutagenesis that are most useful in defining the role of a particular amino acid residue. Isosteric replacement can alter amino acid function without changing shape, such as replacing glutamic acid with glutamine or aspartic acid with asparagine, each of which removes a negative charge but maintains most of the hydrogen bonds. Substituting alanine for serine removes a potential hydrogen bond and somewhat increases hydrophobicity. A second type of change looks at the dependence of function on structure. Replacing a glutamic acid residue with aspartic acid would change structure but not function. This mutation shortens the side chain carboxylate by less than an angstrom, provided there are only minor structural oscillations. Since we do not have the rules defining an amino acid's contribution to protein structure, it is imperative, at this stage, to obtain the structural information for mutant proteins so that the nature of a functional aberration may be assigned to a structural element. While these types of mutations help define the role an amino acid plays in a protein, changing both structure and function of a residue can lead to practical applications for proteins.
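The two classes of diagnostic substitution described above can be written down as a small lookup, which is sometimes handy when planning a mutant series. The sketch below encodes only the four examples given in the text; the function name, the label strings, and the mutation notation (wild-type residue, position, replacement, for example E35Q) are assumptions made for illustration.

```python
# Directional substitutions taken from the examples in the text
# (one-letter codes: E = Glu, Q = Gln, D = Asp, N = Asn, S = Ser, A = Ala).
FUNCTION_CHANGED_SHAPE_KEPT = {("E", "Q"), ("D", "N"), ("S", "A")}
SHAPE_CHANGED_FUNCTION_KEPT = {("E", "D")}

ISOSTERIC = "isosteric: function altered, shape retained"
RESHAPED = "shape altered, function retained"
OTHER = "not covered by the examples in the text"


def classify(mutation):
    """Classify a mutation written like 'E35Q'; returns (position, label)."""
    wild_type, replacement = mutation[0], mutation[-1]
    position = int(mutation[1:-1])
    pair = (wild_type, replacement)
    if pair in FUNCTION_CHANGED_SHAPE_KEPT:
        return position, ISOSTERIC
    if pair in SHAPE_CHANGED_FUNCTION_KEPT:
        return position, RESHAPED
    return position, OTHER


for m in ("E35Q", "D52N", "S195A", "E35D", "W62Y"):
    print(m, classify(m))
```

The residue numbers in the usage loop are arbitrary; any substitution outside the four listed pairs is deliberately reported as not covered rather than guessed at.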

De novo Design of Proteins is an Important Long Term Goal The design of protein structures from first principles is at a primitive stage in development. It is clear that for the immediate future both de novo design and other approaches that begin with functional proteins are important to the progress of protein engineering. Realizing the full potential of protein engineering will require rational de novo design and, therefore, it remains as an important long range goal. The simplest de novo protein designs have been most successful, involving only small polypeptides of 20 — 70 amino acids whose structures are entirely composed of either beta-sheet or alpha-helices with clefts or pockets that can later be used to form binding and catalytic sites. The physical properties of many of these designed polypeptides resemble those of the molten globule, indicating defined secondary structure with disordered side chains in the non-specific hydrophobic core. This may be one reason why most of these molecules have resisted attempts to crystallize them. The dimeric alpha-helical coiled-coil structural motif found in fibrous proteins has been an important starting structure for a number of attempts at de novo design. Recently, Erikson and co-workers at the University of North Carolina synthesized a 39 residue peptide which contains a cell-surface ligand. This peptide forms a stable coiled-coil dimeric structure. The authors hope that this molecule will bind to cell-surface receptors and be useful for diagnostic or therapeutic purposes. A number of investigators intend to build binding or catalytic


Thomas J. Graddis, Dale L. Oxender

sites into simple structures assembled from helical bundles and beta-sheets. The now classical designs by Berndt Gutte of the Biophysical Institute, University of Zürich, include a 24 amino acid polypeptide that forms a beta-sheet structure and binds DDT weakly but considerably better than if by chance, and a 34-residue beta-sheet/alpha-helix structure that exhibits some ribonuclease activity. More recently, Stewart and co-workers at the University of Colorado utilized a branched peptide approach to build a parallel four-helix bundle with an amino-terminal chymotrypsin-like esterase catalytic site. This construct catalyzes the hydrolysis of ethyl esters only at a low rate; nevertheless, the design has generated considerable interest in the scientific community. Sasaki and the late Thomas Kaiser at Rockefeller University designed a parallel four-helix bundle tethered to a porphyrin derivative in order to mimic the architecture of the enzyme cytochrome P450. It was intended that the four helices would create a hydrophobic binding cavity near the heme group. This molecule exhibited aniline hydroxylase activity similar to that of natural hemeproteins. Two groups have recently engineered a zinc-binding site into a stable 72-residue de novo designed four-helical bundle. Both DeGrado's group at du Pont laboratories and Clark's group at Johns Hopkins University were successful in demonstrating tight binding of zinc (K_D = 25 nM). Interestingly, the protein loses much of its molten-globule-like character upon binding zinc. Metalloproteins can be potent catalysts; thus, the design of metal binding sites may be viewed as a significant step toward de novo design of catalytically active proteins. In addition, DeGrado and colleagues have synthesized model amphiphilic alpha-helical peptides that aggregate within a lipid bilayer. These aggregates form channels with ion permeability characteristics resembling those of the acetylcholine receptor.

Knowledge-Based Structural Prediction is Used to Model Proteins of Unknown Structure

Knowledge-based predictions represent one of the current approaches to design. Modeling the three-dimensional structure of a protein can be based on strong sequence homology to other proteins of known structure, usually proteins within the same functional family. Aligning the homologous amino acid stretches of a protein of unknown structure with those of known structure permits identification of analogies in secondary structure motifs, domains or ligand interactions. The final stage is the assembly of a three-dimensional protein framework by borrowing bits and pieces of structure from the homologous proteins closest in sequence to the modeled protein, preferably connecting chains at junctures between defined secondary structure elements. This procedure, termed the knowledge-based approach by J. M. Thornton and co-workers, primarily uses information from the 400-plus protein structures in the Brookhaven Protein Data Bank (PDB).

1 An Introduction to Protein Engineering


Using a knowledge-based procedure, researchers were able to model the serine protease domain of the mammalian protease tissue plasminogen activator using 17 pieces from the known X-ray structures of four serine proteases: trypsin, elastase, chymotrypsin, and kallikrein. The success of this procedure has been promising. The variable domain of immunoglobulin was modeled and compared with X-ray analysis. A high degree of accuracy was obtained for the main chain region, and 4 out of 6 of the more difficult hypervariable loop regions were correctly predicted. T. Blundell and co-workers have used this approach to model the active site conformation of the angiotensinogen processing enzyme renin, an aspartyl protease secreted by the kidney that is involved in high blood pressure. The structure of this protein was unknown since earlier attempts to crystallize it failed. As a result of these models, successful design of renin inhibitors has led to valuable pharmaceutical products. These results reinforce the practical utility of a knowledge-based approach. It is hoped that this procedure will be extended to cases of little sequence homology but suspected structural homology. The challenge is to identify amino acid sequence fingerprints that yield common structural motifs. A knowledge-based approach in conjunction with advanced computer programs will have a profound effect on the ability to rationally design proteins for a given function.
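The step of choosing the homologous proteins "closest in sequence" can be illustrated with a simple ranking by percent identity. This is a minimal sketch that assumes the sequences have already been aligned (gaps written as "-"); the function names and the short toy sequences are hypothetical and are not taken from any of the proteins discussed above.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over aligned, non-gap positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap positions are not scored
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def rank_templates(target, templates):
    """Sort candidate template structures by identity to the target."""
    scored = [(name, percent_identity(target, seq)) for name, seq in templates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy, made-up aligned fragments used only to exercise the functions.
target = "GDSGGPLV"
templates = {
    "template_A": "GDSGGPVV",
    "template_B": "GDSG-PLA",
    "template_C": "ADAGKPLV",
}
for name, identity in rank_templates(target, templates):
    print(f"{name}: {identity:.1f}% identity")
```

In practice the choice of fragments also depends on secondary structure boundaries, as the text notes, so a ranking of this kind would only be a first filter.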

1.4 There are Currently Several Bottlenecks to the Field of Protein Engineering

The immediate challenge for the field of protein engineering is to understand the rules or code relating the primary sequence to its three-dimensional structure and then to determine how that structure is related to function. Though much progress has been made both experimentally and theoretically, our predictive powers are still largely limited to secondary structure for soluble globular proteins. As pointed out in an earlier section, knowing how proteins with similar sequences fold can be used with the aid of computers to make knowledge-based intelligent predictions. A major bottleneck to efficient design using the knowledge-based approach is the limited number of three-dimensional structures in the present data base. There is a significant backlog of proteins whose primary sequence is known but whose three-dimensional structures have yet to be determined. Advances in the techniques of structural determination, employing X-ray diffraction and NMR, and the increasing number of laboratories concentrating on structural analysis of proteins offer hope that this backlog will be decreased and more structures will soon be added to the data base. Another important bottleneck in protein design is the need for more sophisticated computational analysis of protein


dynamics and energy minimization profiles. At present, huge stores of information regarding proteins are being generated by various university and industrial laboratories. As this body of information grows, there is a critical demand for more efficient and innovative data base management as well as advanced algorithms capable of extracting important structure-function relationships hidden in the data bases. Additional minor bottlenecks in the progress of protein engineering include the expression and large scale production of functional proteins. Further development of efficient host-vector systems, as well as determination of the role of post-translational modification (methylation or glycosylation) in protein folding and activity, must receive greater attention. In the remaining portion of this section we will discuss these bottlenecks to protein engineering and design in more detail and provide examples where investigators are addressing these problems.

The Rules Governing Protein Folding are Complex

The forces governing protein folding are generally measured in thermodynamic terms: a decrease in free energy of the folded protein provides the driving force for spontaneous folding. Globular proteins in their water environments are more stable in the folded state than the unfolded state. The two predominant opposing forces contributing to protein folding are the cohesive and highly cooperative interatomic forces, primarily hydrophobic interactions, and the destabilizing entropic forces. Hydrophobic interactions serve to sequester apolar portions of the polypeptide side chains away from the aqueous solvent. The entropy of a folded protein relative to an unfolded protein contributes the major destabilizing component to the overall free energy of a protein conformation, and unless stabilizing interactions outweigh the entropy term a protein will not fold. These forces acting in concert cause a net decrease in free energy of some 10-20 kcal/mol for most folded proteins. This value is small considering the enormous total energy input from stabilizing and destabilizing interactions of a polypeptide chain, 107 kcal/mole for a 100 residue polypeptide. As a result, overall protein stability may be disrupted by a seemingly minor loss of stabilizing interactions, whether they are van der Waals contacts, ion pairs, hydrogen bonds, or hydrophobic interactions. The structure of a folded protein may be either the lowest free energy thermodynamic state or some kinetically accessible intermediate state. For many small single-domain, single-chain proteins the native state is usually the lowest free energy state. For some more complex multidomain polypeptides there is evidence for trapping of the active protein in an intermediate state at a local energy minimum. Protein folding may be thought of as initially consisting of numerous parallel and intersecting pathways that progressively converge to fewer pathways as


final structure becomes defined. For this reason the effect of a single amino acid replacement on folding efficiency may be masked by alternative folding pathways. This, however, is not always the case. Scientists at the Upjohn Company found that the rate of refolding of bovine growth hormone is reduced some thirty-fold when a lysine residue is replaced with leucine. Stabilization of a folding intermediate is believed to be responsible for this decrease in the refolding rate. Kinetic studies complement thermodynamic studies of protein folding by looking at processes between the initial and final states. Protein dynamic studies, including NMR experiments, time-resolved fluorescence spectroscopy and hydrogen exchange help to quantitate the folding process and give a qualitative view of protein folding in vitro.
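The numbers quoted in this section can be put on a common scale with two textbook relations. The sketch below rests on assumptions that are not made in the chapter: a simple two-state equilibrium between folded and unfolded protein, a temperature of 298.15 K, and an Arrhenius-like relation with an unchanged prefactor to convert a rate ratio into an apparent free energy difference. The function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_folded(delta_g_unfold, temperature=298.15):
    """Two-state fraction folded for a given unfolding free energy (kcal/mol).

    A positive delta_g_unfold means the folded state is the more stable one.
    """
    k_unfold = math.exp(-delta_g_unfold / (R_KCAL * temperature))
    return 1.0 / (1.0 + k_unfold)


def apparent_barrier_change(rate_ratio, temperature=298.15):
    """Free energy difference (kcal/mol) equivalent to a given rate ratio,
    assuming the pre-exponential factor is unchanged."""
    return R_KCAL * temperature * math.log(rate_ratio)


# Net stability range quoted in the text: 10-20 kcal/mol.
for dg in (10.0, 20.0):
    print(f"dG = {dg:4.1f} kcal/mol -> fraction folded = {fraction_folded(dg):.10f}")

# Thirty-fold slower refolding, as reported for the bovine growth hormone mutant.
print(f"30-fold rate change ~ {apparent_barrier_change(30.0):.2f} kcal/mol")
```

Under these assumptions a thirty-fold change in refolding rate corresponds to roughly 2 kcal/mol, a sizeable fraction of the 10-20 kcal/mol net stability, which illustrates why a single replacement can have a visible kinetic effect.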

A Few Recognized Steps in Protein Folding are Beginning to Emerge

As a random coil polypeptide begins to fold, the secondary structure motifs quickly form into alpha-helices, beta-sheets, turns and loops. The extended polypeptide then collapses into a compact globule in which the hydrophobic portions of a domain are buried in a solvent-excluded core. Some proteins appear to collapse into a "molten state" with undefined tertiary structure but with considerable secondary structure that represents 60-90% of that present in the native protein. Further reshuffling of non-covalent and covalent interactions leads to an ordered three-dimensional native structure. This last step is often the rate-limiting step. Recent experimental evidence using peptide models suggests that subdomains play a key role in folding and that not all of the protein is required to stabilize folding intermediates. Kim and co-workers at the Massachusetts Institute of Technology have designed several peptide models of protein folding intermediates of bovine pancreatic trypsin inhibitor (BPTI). One model was designed to test the structural role of a disulfide bond that is predominant during refolding of reduced BPTI. In native BPTI, both of the cysteine residues are located in regions of defined secondary structure. Subdomain model peptides containing these residues form native-like structure only when the disulfide bond is formed; otherwise the reduced species appear unstructured.

Chaperones are Proteins that Assist the Folding of Other Proteins

In some cases, such as secreted or translocated proteins and polypeptides of multimeric proteins, the folding process is mediated by other proteins, referred to as chaperones. Chaperones function in vivo to recognize and stabilize partially folded intermediates during protein folding, assembly, and disassembly.


Initially identified because of their specific induction during the cellular response of all organisms to heat shock, chaperones have been shown to be highly conserved and universally distributed. A number of these proteins have ATPase activity which serves to dissociate the chaperone from the unfolded protein, thus allowing the protein to proceed with folding. Protein disulfide isomerase (PDI) catalyzes thiol/disulfide interchange reactions during protein folding. Widely distributed peptidyl prolyl cis-trans isomerases (PPI) may act to catalyze the slow steps in the initial folding and/or rearrangement of protein structures. During its biosynthesis, the heavy chain moiety of class I histocompatibility antigen (HLA) has been shown to fold into a short-lived intermediate. This high energy intermediate state must be stabilized by binding with a small chain polypeptide, forming an active oligomeric protein, or it will continue to fold into a lower free energy, biologically inactive state. Some proteins are synthesized as inactive precursors, such as insulin, trypsin, pepsin, chymosin, and penicillinase. Their precursors fold into a specific conformation that can be activated by enzymatic removal of a segment of the polypeptide chain. The common feature of these proteins is that they become metastable after cleavage, having lost some of the stabilizing interactions required to reach that folded state.

Understanding the Structure-Function Relationship is Central to Protein Engineering

The roles of the individual amino acids in protein structure and function are beginning to be defined through site-directed mutagenesis. Site-directed mutagenesis studies of enzyme function are fast becoming the method of choice in most laboratories. In many respects, these experiments are descendants of traditional chemical modification studies. Unlike chemical modification, however, protein engineering allows one to propagate a mutant protein and to control both the site and the extent of modification. Attempts to design new substrate specificities into enzymes have met with some successes as well as surprises. Certain mutational studies designed to change enzyme specificity by substituting residues thought to be involved in substrate binding, not catalysis, instead led to a precipitous fall in enzymatic activity.

Computational Chemistry is Finding Increasing Utility

Effective application of protein engineering necessitates a coupling of experimental data with theoretical studies for predicting the folded structure of globular polypeptides and for the development of models that anticipate the conformational effects of a given amino acid substitution. Deriving energy potential functions from energy minimization and molecular dynamics calculations demands the computational capacity of a mainframe or supercomputer. Due to the current expense of computers and computational analysis, it is not always practical to determine the global energy minimum of entire macromolecular systems, and investigators must settle for regional or local energy minimization. Energy minimization and molecular dynamics procedures are constantly being improved and have exhibited considerable success in predicting protein secondary structure, yet their limitations painfully remind us that the prediction of tertiary structure remains a major bottleneck for progress in protein design. Any serious effort in protein engineering must develop a program for macromolecular computational chemistry. As a consequence of this demand, a number of groups are presently dedicated to meeting this challenge.
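The distinction between local and global energy minimization can be seen even in one dimension. The following toy sketch uses a made-up double-well function, not a molecular force field, together with plain steepest descent; the function names, step size and iteration count are all assumptions chosen for illustration.

```python
def energy(x):
    """Toy double-well 'energy surface' with one deep and one shallow minimum."""
    return x**4 - 3 * x**2 + x


def gradient(x):
    """Analytical derivative of the toy energy function."""
    return 4 * x**3 - 6 * x + 1


def minimize(x, step=0.01, iterations=5000):
    """Steepest descent: repeatedly move downhill along the gradient."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


# Two different starting conformations end in two different minima.
for start in (2.0, -2.0):
    x_min = minimize(start)
    print(f"start {start:+.1f} -> x = {x_min:+.4f}, energy = {energy(x_min):+.4f}")
```

Starting at x = 2 the descent settles in the shallower well, while starting at x = -2 it reaches the deeper one. A downhill minimizer has no way of knowing that a lower minimum exists elsewhere, which is the practical limitation described in the text.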

Data Base Management and Computer Graphics are Useful Tools

The impact of computer technology on the field of protein engineering is difficult to overstate. Despite rapid progress in data base management technology that permits storage and retrieval of vast amounts of information, this area remains a major bottleneck in protein design. Laboratories throughout the world are producing immense quantities of information from DNA and protein sequences in addition to studies of kinetic, thermodynamic and structural characteristics of native and mutant proteins. This ever increasing body of information presents a formidable task for experts in data base management. Its storage must be in an economical form readily accessible by a diverse field of researchers and in a manner that anticipates the development of new searching and comparison algorithms. Dynamic programming methods using alignment algorithms to carry out a global comparison of DNA sequences, amino acid sequences and protein structures have proved invaluable in identifying homologies and analogies provided by nature. Numerous data banks, both commercially and governmentally sponsored, have emerged worldwide: European Molecular Biology Laboratory (EMBL, Heidelberg, Federal Republic of Germany); The National Biomedical Research Foundation (NBRF, Maryland, USA) contains software to search, compare, align, and detect similarities and to score and display the degree of similarity; GenBank of Bolt, Beranek, and Newman Inc. contains some five million DNA bases in over 5,000 entries; Protein Identification Resource Databank (PIR) contains over 3,000 proteins; Brookhaven Protein Data Bank (PDB, Brookhaven National Laboratory, Upton, New York) contains the three-dimensional coordinates for over 400 proteins. Advances in computer graphics technology allow three-dimensional models of proteins to be displayed and manipulated. Graphics modeling of protein interactions with substrates makes it possible to make an educated guess concerning possible changes in protein structure on the screen before going into the laboratory to make the changes using gene splicing technologies. Supercomputers can aid in the complex calculations of molecular dynamics and energy minimization of protein-solvent systems. The U.S. National Science Foundation has recently announced a multi-million dollar nationwide project to link six supercomputers (at San Diego, Boulder, Pittsburgh, Princeton, Ithaca and Urbana-Champaign) and seven regional centers in one network called NSFNET. This 'data highway' will provide greater access to information and computational power to a larger number of investigators, thereby increasing the rate of knowledge expansion.
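The dynamic programming approach to global sequence comparison mentioned in this section can be sketched with the Needleman-Wunsch algorithm. The scoring values used here (match +1, mismatch -1, gap -1) are assumptions chosen for simplicity; practical comparisons of proteins would normally use a substitution matrix and more refined gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of two sequences; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = needleman_wunsch("ACGT", "AGT")
print(best)
print(top)
print(bottom)
```

The table has one cell per pair of prefixes, so time and memory grow with the product of the two sequence lengths, which is one reason data base searches of this kind were computationally demanding.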

Choosing a Protein Expression System is Often an Empirical Science

Functional proteins may be isolated from their natural sources, or from sources into which the foreign gene has been introduced by means of recombinant DNA technologies. In some cases proteins may be synthesized in vitro. For producing short polypeptides in small quantities, in vitro chemical and/or enzymatic synthesis is proving to be competitive with recombinant methods. For production of polypeptides in cells, the efficiency of gene expression is affected by both the structure of a particular gene and the physiology of the host cell. Factors such as promoters, enhancers, ribosome binding sites and codon usage relative to the abundance of transfer RNAs will affect the level of gene expression, while host cell factors such as plasmid maintenance stability, gene copies per cell, secretion machinery, protease composition, tissue specificity and post-translational modification are all important for maximum gene expression. The optimum expression of a gene is often a unique combination of these factors and is not always transferable to a second gene. Animal or human tissue has been used as a traditional source of hormones for pharmaceutical purposes. Enzymes and hormones obtained from animal sources, however, sometimes induce an immunogenic response in patients. Many hormones are species-specific and must be isolated from human sources. The availability of human tissue is limited and hormone concentrations are vanishingly small. In addition, a persistent danger for products purified from human sources is contamination, particularly by viruses. Viral contamination of growth hormone, thyroid-stimulating hormone and prolactin obtained from human pituitary glands forced the U.S. Food and Drug Administration to ban proteins derived from this source.
The advent of recombinant DNA technology has permitted the production of human proteins in either bacteria or mammalian cell culture where viral contamination and low concentration problems are avoided.


Advantages and Disadvantages of Protein Expression Systems

At present microorganisms are used to produce most of the major industrial enzymes. Production of proteins in bacteria can be advantageous since it can usually be scaled up to produce large quantities in a relatively short time at a modest cost. Escherichia coli is the most common microorganism used for the production of a protein that remains in the cytoplasm. At the end of the fermentation, cells are harvested and homogenized in order to isolate the protein. Escherichia coli bacteria, however, contain poisonous endotoxins, whose presence can pose problems in purification of pharmaceutical proteins expressed in this organism. A protein can be directed to be secreted into the periplasm of Escherichia coli by attaching a signal sequence via gene splicing techniques. The advantage of having a protein secreted into the periplasm is that it can simplify purification and can benefit proteins that have complex folding pathways involving disulfide bonds. A continuous cycle of protein production can be achieved utilizing Bacillus subtilis, a bacterium that secretes proteins into the medium via a signal sequence mechanism. The protein can then be isolated from the culture medium flowing through a bioreactor. One disadvantage associated with secretion of recombinant proteins from gram-positive bacteria is that they are often degraded due to the concomitant secretion of native proteases. Bacteria can be a disadvantage for production of proteins that demand post-translational modification for biological activity. Glycosylation, the attachment of carbohydrates to residue side chains of a protein, is a common feature of many proteins from higher organisms. In some cases it is possible to produce proteins in bacteria and modify them in vitro, provided the modification enzymes are available and are not cost-prohibitive.
Eucaryotic microorganisms such as fungi or yeast offer an alternative host for expression and modification of proteins. Yeast contain most of the protein modification machinery of higher organisms. One problem with yeast is that they tend to overglycosylate proteins, including some proteins that are normally not glycosylated. Production of mammalian proteins in mammalian cell culture, such as hamster ovary cells, overcomes the problem of post-translational modification associated with bacterial production. Cell culture methods often require complex, expensive medium that imitates the body environment of an animal. Despite the enormous research effort and substantial progress in improving cell culture techniques, the high cost of production persists due to problems in keeping cells alive, low efficiency, preventing contamination, and scaling up. Insect cell culture has recently gained wider acceptance among investigators due to several advantages over mammalian cell culture. In mammalian cell culture it usually takes six to eight months to amplify genes, a step required for increased protein expression, whereas genes may be amplified extrachromosomally in insect cell culture within six weeks. Insect cells grow in the absence


of carbon dioxide, at a lower temperature, and to a higher density than mammalian cells. Insect cells will glycosylate protein; however, they do not form the complex carbohydrate structures found in human proteins. An alternative approach to the production of mammalian proteins is the expression of these proteins in the body fluids of transgenic mammals, such as the milk of a mouse, goat or even a cow, where they are processed and modified as fully functional proteins for human medicinal use. The expression of tissue plasminogen activator in the mammary glands of mice has been reported by researchers at Integrated Genetics Corporation. To obtain this expression system, the genetic information which directs the expression of a milk protein, such as casein, in the mammary gland and its secretion into the milk was attached by gene splicing techniques to the gene coding for tissue plasminogen activator. In this model system the tissue plasminogen activator was recovered from the milk as a fully functional protein without sacrificing the animal. In another example, DNX Corporation of Princeton, New Jersey, USA, has transplanted the human gene that prompts the production of hemoglobin into a pig. A portion of the pig's red blood cells, typically about 15 percent, produce human rather than pig hemoglobin. DNX hopes to develop herds of these transgenic animals for production of human hemoglobin for use as a blood substitute in acute therapeutic treatments. There are drawbacks and advantages to each host-expression system mentioned above. No one method will suffice for all protein expression, and often a trial-and-error approach is necessary. Expressing proteins in plants is still a relatively young field. Scientists throughout the world are studying the molecular biology of plants; however, the tools for cloning genes into plants and achieving high level expression of protein are still limited.
Nonetheless, the ability of plants to produce huge quantities of biomass inexpensively makes these organisms attractive alternatives for protein expression. Sometime in the future, we can expect to see large scale production of industrial and medicinal proteins in plants.

1.5 A Progress Report in Protein Engineering Includes Many Exciting Topics

By surveying current knowledge of naturally occurring proteins and performing mutational studies on well-defined model systems we may address the gaps in our understanding of protein folding and structure-function problems. There are an estimated one million functionally different proteins in nature. Over 3,000 proteins have been characterized and 400 of these have been crystallized and spatially modeled. A number of examples, including proteases and their inhibitors, DNA binding proteins, immunoglobulins, aminoacyl-tRNA synthetases, triosephosphate isomerase, dihydrofolate reductase and T4 lysozyme, are


being developed as model systems for extensive enzymatic, folding, structural and thermodynamic analyses utilizing protein engineering. Protein engineers have achieved considerable success in altering the stability or the specificity of several important biologically active proteins. In the following section we will highlight a few examples from investigators responsible for some of the present progress in the field of protein engineering.

Attempts to Increase Protein Stability have Met with Outstanding Success

Extensive mutational and crystallographic studies carried out with bacteriophage T4 lysozyme by Brian Matthews and colleagues at the University of Oregon provide one of the larger data bases of temperature-sensitive mutational effects on protein structure. Currently, over 50 mutant proteins have been crystallized and analyzed by X-ray diffraction. Selecting temperature-sensitive mutations by genetic techniques has been an efficient method for identifying residues important to thermal stability. These investigators used site-directed mutagenesis to study the effect of other amino acid substitutions at these positions. Matthews and co-workers identified a single substitution of isoleucine for threonine, located in a turn between two helices on the surface of T4 lysozyme, that disrupts a hydrogen bond network, which in turn strongly diminishes protein thermal stability. A glycine substitution at this site yielded a mutant protein whose thermal stability was unexpectedly near that of wild type. Structural analysis of the glycine-containing mutant showed that a water molecule was located in the position normally occupied by the hydroxyl group of threonine so that the important hydrogen bonding network was maintained, underscoring the important role solvent can play in protein stability. In a separate study with the serine protease subtilisin, scientists at Genentech Corporation introduced serine for the native asparagine residue in order to improve hydrogen bond interactions in the protein core. This substitution resulted in improved thermal stability without loss of activity. These experiments indicate that single hydrogen bonds can contribute significantly to protein stability. Of the total population of hydrogen bonds in a protein structure only a small percentage would be expected to represent these potent stabilizing hydrogen bonds.
The role of disulfide bond formation in protein stabilization has been the subject of intense scrutiny. In general, attempts to stabilize protein structure by introducing cysteine residues in positions capable of forming disulfide bridges have met with mixed results. Matthews and associates were able to significantly increase the thermal stability of T4 lysozyme by introducing cysteine residues at sites suggested by theoretical calculations and computer modeling that would form disulfide bridges. At the same time they had to remove an unpaired cysteine residue to prevent intermolecular thermally induced thiol/disulfide interchange that could result in inactivation. In some cases, engineered disulfide bonds can diminish a protein's dynamic flexibility and cause a reduction of stability and activity. A disulfide bond engineered into dihydrofolate reductase, for example, resulted in a mutant protein whose activity was significantly diminished for the oxidized enzyme but not the reduced enzyme. Scientists at Cetus Corporation were able to introduce disulfide bonds into the peptide hormone Interleukin II, which resulted in thermal stabilization of this commercially important protein. Mutations that enhance the compactness of the protein core by increasing intrahelix packing interactions, such as changes of alanine for glycine, tend to improve the stability of proteins like T4 lysozyme. The common theme of antiparallel alignment of helices in folded proteins suggests that these helix dipoles serve to stabilize protein structure. The thermal stability of T4 lysozyme could be improved without decreasing catalytic activity by engineering amino acid replacements that are believed to decrease the conformational entropy of the unfolded state. Replacing an alanine residue with the more restrictive proline increased thermal stability of T4 lysozyme. Studies of chemical denaturation of tryptophan synthase from E. coli, an enzyme involved in tryptophan biosynthesis, show that substitution of a buried amino acid residue by each of the other 19 amino acids increased protein stability with increasing hydrophobicity of the replacement residue, provided it did not introduce steric barriers. These studies suggest that there are numerous methods for achieving protein stability and that there are energetic tradeoffs for a given alteration. Increasing the thermal stability of a protein often correlates with increased stability in solvents, resistance to oxidative processes, and increased stability to protease digestion.
Robert Sauer and colleagues at M.I.T. have shown that the thermal sensitivity of the amino-terminal domain of mutant phage lambda repressors, a DNA-binding protein, correlates with resistance to proteolysis and may be compared with the activity of temperature-sensitive mutants in vivo. As a protein ages its integrity is destroyed by several irreversible chemical processes, including hydrolysis or rearrangement of aspartyl bonds, deamidation of asparagine and glutamine, beta-elimination of cysteine, and oxidation of methionine to sulfoxide. In all cases, substitution of these amino acids in native proteins by chemically inert residues increases the longevity of the mutant proteins. For example, alpha-1 antitrypsin, a potent inhibitor of the protease neutrophil elastase, whose excess activity is believed to lead to emphysema, is inactivated by oxidation of a methionine residue located at a site that binds and inhibits elastase. Replacing this methionine with valine using site-directed mutagenesis techniques produced a fully functional protease inhibitor that was resistant to oxidative inactivation. As a result, the valine variant of antitrypsin may be used as a more stable alternative in the treatment of emphysema.
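The chemically labile residues listed above suggest a simple first screen of a sequence for candidate replacement sites. The sketch below only flags residue types by one-letter code; it makes no attempt to judge solvent exposure or local sequence context, which in practice strongly influence reactivity. The function name and the short test sequence are hypothetical.

```python
# Residue types named in the text and the degradation route associated with each.
LABILE = {
    "D": "hydrolysis or rearrangement of aspartyl bonds",
    "N": "deamidation",
    "Q": "deamidation",
    "C": "beta-elimination",
    "M": "oxidation to sulfoxide",
}


def find_labile_sites(sequence):
    """Return (position, residue, process) for each chemically labile residue.

    Positions are 1-based, following the usual convention for protein sequences.
    """
    return [
        (position, residue, LABILE[residue])
        for position, residue in enumerate(sequence.upper(), start=1)
        if residue in LABILE
    ]


# Made-up peptide used only to exercise the function.
for site in find_labile_sites("MKTAYNGQC"):
    print(site)
```

A list of this kind would only nominate positions for study; whether a given replacement preserves function has to be tested experimentally, as in the antitrypsin example.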

1 An Introduction to Protein Engineering


Proteins can often be profoundly stabilized by combining a number of selected mutations in one molecule. Investigators at Genentech Corporation screened a number of subtilisin mutant proteins in a thermophilic bacterium (which grows at 70 °C) and analyzed the thermally stabilizing mutations. By incorporating a number of these amino acid replacements into a single mutant protein via site-directed mutagenesis they increased the thermal stability of this protein 50-fold. The definition of protein stability may be broadened to include other important stability characteristics. In practice, using proteins in repetitive cycles, such as an affinity or catalytic matrix, often results in a rapid decay of function. In these systems, proteins will partially or completely unfold and refold incorrectly into inactive forms. It is important to recognize that properties of the unfolded states may have a greater impact on stability than the thermal stability of the native state. Jonathan King's group at the Massachusetts Institute of Technology has identified mutations in the Salmonella phage P22 tailspike protein that suppress a number of temperature-sensitive folding mutations. The global suppression of protein folding defects reported in this study suggests that a key set of residues carries information inhibiting unproductive off-pathway conformations. Protein engineers may take advantage of this surprising result by improving the likelihood that an unfolded polypeptide will refold properly. This would serve to extend the lifetime of proteins used in repetitive cycles. Another form of stability involves a protein's behavior in the human body. The pharmaceutical industry is faced with the problem of increasing the serum half-life of protein therapeutics. Several approaches have been used to circumvent the body's natural clearance mechanisms.
These include attaching a cell-surface binding ligand to the functional protein in order to tether it to cells in the blood stream, building functional sites onto proteins with known long-term serum half-life, such as immunoglobulins or albumin, and decorating the protein with polyethylene glycols or other moieties that do not inhibit function but do slow clearance.

Altering Enzyme Catalytic Activity has been Achieved in a Number of Systems

The relative ease with which recombinant DNA methods allow protein structure to be engineered has led to a plethora of reports. The success of the technique allowed early investigators to draw dramatic, and sometimes unwarranted, conclusions concerning the role of a single residue in catalysis. The field of protein engineering has since matured; it is now clear that even conservative replacements can affect global structure and catalytic chemistry in unanticipated ways. Lattman and Loll, from Johns Hopkins University and the University of Chicago, respectively, recently presented a sobering demonstration of potential structural effects far from the site of the mutation in staphylococcal


Thomas J. Graddis, Dale L. Oxender

nuclease. In this example, the authors replaced the glutamic acid residue at position 43 with aspartic acid, a conservative mutation. The mutant aspartate side-chain was unable to function as a general base in catalysis, did not stabilize a network of hydrogen-bonded water molecules found in the native protein, and resulted in the destabilization of an eleven-residue loop containing side-chains adjacent to the active site. Another report describes the replacement by alanine of a glutamic acid residue known to serve as the catalytic base in native ribonuclease T1. The activity of the resulting mutant protein was near wild type. Upon further analysis, the investigators concluded that a histidine residue, not involved in catalysis in the native structure, substitutes as the active-site base. Both studies reinforce the conclusion that functional effects caused by even conservative mutations may be considerably more drastic than would have been predicted, and both should serve as caveats to those who draw conclusions about structure-function relationships without extensive mechanistic and structural data.

Enzyme Specificity and Molecular Recognition are Fundamental to Enzyme Mechanism

Alan Fersht and co-workers at Imperial College of Science and Technology originally demonstrated the feasibility of distinguishing between the roles of individual residues in enzyme mechanism and substrate binding. This group carried out extensive mutational and kinetic studies with tyrosyl-tRNA synthetase, an enzyme involved in the charging of tyrosyl-tRNA with tyrosine. Utilizing pre-steady-state kinetic analysis of engineered enzymes, these investigators were able to quantitate the thermodynamic binding energies of individual amino acids with regard to the substrate, transition state and products in the catalytic cycle. These studies show that a single residue may affect either substrate binding (Km) or the rate of catalysis (kcat).


Figure 2.1 Two-dimensional protein pattern of human myocardial proteins. 200 µg protein of a biopsy (2 mg) were separated by two-dimensional gel electrophoresis and stained by silver staining. The sample was applied at the anodic side of the isoelectric focusing (IEF) gel. The proteins were focused in a carrier ampholyte pH-gradient. The second dimension, sodium dodecylsulfate polyacrylamide gel electrophoresis (SDS-PAGE), was performed in a 15% acrylamide gel according to Laemmli (1971). Molecular weight and isoelectric point calibration were obtained by coelectrophoresis of marker proteins. The original gel size was 23 × 30 × 0.15 cm. Fifteen of the proteins were identified by amino acid analysis, N-terminal or internal sequencing. 1) myoglobin; 2) myosin light chain; 3) alpha-crystallin; 4) actin; 5) tropomyosin; 6) fatty acid binding protein; 7) serum albumin; 8) phosphoglycerate mutase; 9) GTP-binding protein Rab-4; 10) proendothelin-3 precursor; 11) myosin light chain II; 12) myosin light chain II fragment; 13) creatine kinase M-chain; 14) mitochondrial creatine kinase; 15) glyceraldehyde phosphate dehydrogenase.

2 Analysis and Characterization of Proteins

Brigitte Wittmann-Liebold, Peter Jungblut

researcher is forced to have a clear conception at the outset of each protein project in order not to waste too much material and manpower. Hence, some general rules should be set up on how to handle proteins and peptides for certain purposes. Since single cells are composed of a minimum of 3,000 different proteins, a strategy for the selection and purification of the essential polypeptides for a particular research topic has to be evaluated. This chapter may assist the investigator in selecting a successful strategy, starting from the initial disruption of the cells to the elucidation of the proteins playing a major role in the biological phenomenon under investigation, for sequence analysis and the final identification of the protein. In some projects it may be clear which protein is to be isolated, e.g., a certain enzyme of known physiological properties, and functional assays might already be available. In other cases, the decision first has to be made which proteins are to be selected to obtain answers to a given scientific question. However, for the analysis and characterization of proteins two principally different strategies are possible: a reductionistic or a global approach.

By the Reductionistic Approach Individual Proteins are Selected for Investigation

This approach starts with the idea that a particular protein is involved in the biological problem under investigation. For this protein an assay has to be developed, which then is used in the following purification procedure to detect the protein of interest. The enriched protein is checked for purity by various methods, e.g. column chromatography, one-dimensional or two-dimensional polyacrylamide electrophoresis (1-DE or 2-DE, respectively) or capillary electrophoresis (CE). If the protein has been isolated in pure form, then it may be characterized according to molecular weight, isoelectric point, amino acid composition and protein sequence. Further, if sufficient amounts of the protein are available, studies of its secondary structure elements and of its tertiary domain structure may be started. The highly purified proteins may also be used as antigens for antibody production. In the case of enzymes the main interest may be in the enzymatic activity, in enzyme kinetics, in finding inhibitors or activators and their substrate specificity. Of further interest is how to design a new protein with optimized properties. The protein-chemical, biophysical and immunological characterization of the protein and a genetic characterization might shed light on the structure and function of the protein.

By the Global Approach Single Proteins from Thousands of Polypeptides in the Cell are Selected

The global approach is comparable to a morphological approach. A morphologist uses the electron microscope to describe the structures of the cell. Accordingly, a protein analyst uses 2-DE to resolve as many of the entire protein species of an organism, a tissue, or a cell type as possible in order to analyze the protein composition of the sample. An example of a 2-DE pattern of a tissue is shown in Figure 2.1. The advantage of a global approach is that the researcher is not likely to miss an important protein in the investigation. In the next step changes in this protein composition occurring concomitantly with biological, pharmaceutical, medical, developmental, or other changes of the biological system under investigation may be detected. Only then does one formulate a hypothesis about the involvement of the proteins as variants in the changing biological system. By the development of methods for the identification of proteins on 2-DE protein maps the purely descriptive character of this analysis was overcome and it became possible to attribute changes in 2-DE spot intensities or positions to individual proteins; hence, functional changes in the biological system can be detected. The variant proteins can be identified by N-terminal or internal sequencing, amino acid analysis, mass spectrometry, or immunological methods. This approach is also known as subtractive analysis, because the spot intensities of one 2-DE pattern are subtracted from those of another one. Subtractive analyses have also been used on the mRNA level (cascade hybridization) or on the DNA level (restriction enzyme fragment length polymorphism).
The development of the automatic evaluation of 2-DE gels and identification of proteins on 2-DE protein maps has already resulted in the first 2-DE protein databases, by means of which the global approach will be very effective in helping researchers to find proteins involved in differentiation or in diseases.
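The subtraction of one 2-DE pattern from another can be illustrated with a minimal sketch. It assumes that spots have already been detected, matched between gels and quantified; the spot names, the intensities and the two-fold threshold are hypothetical and serve only to show the principle, not the procedure of any particular gel evaluation program.

```python
def subtract_patterns(control, treated, min_fold=2.0):
    """Compare two 2-DE patterns given as {spot_id: intensity} dicts.

    Returns {spot_id: (difference, label)} for spots that are present in
    only one pattern or whose intensity changes at least `min_fold`-fold.
    """
    changes = {}
    for spot in sorted(set(control) | set(treated)):
        a = control.get(spot, 0.0)   # intensity in the reference pattern
        b = treated.get(spot, 0.0)   # intensity in the pattern under study
        if a == b:
            continue                 # no change (also covers 0 versus 0)
        if a == 0.0:
            label = "new"
        elif b == 0.0:
            label = "lost"
        elif b >= min_fold * a:
            label = "increased"
        elif a >= min_fold * b:
            label = "decreased"
        else:
            continue                 # change below the chosen threshold
        changes[spot] = (b - a, label)
    return changes


# Hypothetical spot intensities in arbitrary units.
control = {"spot_01": 120.0, "spot_02": 40.0, "spot_03": 75.0}
treated = {"spot_01": 115.0, "spot_02": 95.0, "spot_04": 30.0}

for spot, (difference, label) in subtract_patterns(control, treated).items():
    print(f"{spot}: {label} ({difference:+.1f})")
```

The variant spots flagged in this way would then be the candidates for identification by sequencing, amino acid analysis, mass spectrometry or immunological methods.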

2.2 The Various Preparative and Analytical Protein Purification Techniques

How to Consider an Appropriate Strategy for Protein Isolation

In contrast with fluid biological materials such as milk, blood, serum, urine, cerebrospinal fluid and extracellular medium after cultivation of microorganisms, solid biological materials such as organs, tissues or cells have to be


disrupted first to obtain a homogenate from which proteins may be extracted for isolation. Hard and stringy materials like muscles have to be minced first in a mixer, especially if large amounts are to be homogenized. Protein characterization, therefore, depends to a considerable degree on how the cells are disrupted. Homogenization of gram amounts may be performed in glass homogenizers, obtainable with different interspaces, and with different shapes and surfaces for the pestles. Homogenization can be supported by freezing and thawing, sonication, or the use of hypotonic solutions. Amounts of material in the milligram range are homogenized with minimal loss of material in polypropylene tubes and by well-fitted pestles surrounded by liquid nitrogen. Depending on the problem (investigation of the primary structure or of biological activity), the optimal extraction medium has to be chosen for each individual protein mixture. Temperature, buffer salts, pH, detergents, chaotropic reagents, reducing agents, chelators, metal ions, protease inhibitors and bacteriostatics are the main factors influencing the efficiency of the extraction, and they have an influence on the solubility and suitability of the proteins for further investigation. For a subtractive analysis, in which no protein activity measurements are required, the following conditions can be applied for a broad range of biological materials: 9 M urea, 75 mM dithiothreitol, 50 mM Tris/HCl pH 7.1, 25 mM KCl, 3 mM ethylenediamine tetraacetic acid, 1.0 mM phenylmethylsulfonyl fluoride, 0.1 µM pepstatin A, 3 mM benzamidine, and 2 µM leupeptin. Addition of 5 to 10 volumes of such a solution to a homogenate obtained by pulverization in liquid nitrogen assures the optimal solubility of proteins since the hydrophobic interactions are reduced due to the presence of the chaotropic agent urea.
Further, salting-in or salting-out effects at an ionic strength near physiological conditions are avoided, and a pH value higher than the pI value of most of the cellular proteins is maintained. The addition of this extraction buffer and protease inhibitors directly to the thawing homogenate leads to a minimized protease activity. The extraction conditions for the purification of single proteins are not aimed at extracting as much as possible of all of the proteins, but at obtaining the protein of interest in the highest yield. Hence, a purification effect excluding the extraction of the other proteins as much as possible is desired. This means that the choice of the extraction method is always a compromise between maximum recovery and maximum purity. In the case of eukaryotic proteins, cloned and expressed in bacteria, a high purification may be achieved by selecting conditions where the protein of interest is secreted out of the bacterial cell. If an extraction cannot be avoided, these conditions have to be optimized for each protein. In general, low temperatures (4 °C) are used to maintain the activity of the protein and to avoid chemical modification or degradation. Usually the pH value of maximum protein activity is chosen. A list of buffers maintaining different pH values is shown in Table 2.1. For the extraction of proteins that are difficult to solubilize, nonionic detergents such as Triton X-100, Lubrol, octylglycoside, or CHAPS are used. More critical is the use of chaotropic reagents like urea or guanidinium hydrochloride because of their drastic reduction of hydrophobic interactions, resulting in changes of protein conformation. Reducing agents such as mercaptoethanol, dithiothreitol or dithioerythritol in concentrations of 10 to 25 mM are used to protect cysteine residues against oxidation, without reduction of disulfide bridges. Addition of glycerol or sucrose may be advantageous, especially for stabilizing the protein during freezing and thawing. Metal ions may be added to stabilize proteins in solution, whereas metal chelators like ethylenediamine tetraacetic acid inhibit metal proteases. Other protease activities are prevented by the addition of a protease inhibitor cocktail containing all of the protease inhibitors listed in Table 2.2. If the protein solutions are handled for more than one hour at 4 °C, the addition of bacteriostatics like sodium azide (1 mM), Merthiolate or Thimerosal (2-ethylmercurimercapto benzoic acid) (0.005%) is strongly recommended.
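The urea extraction buffer quoted for subtractive analysis can be turned into weighing amounts with a short calculation. The following is only a sketch: the concentrations are those given in the text, whereas the molar masses are standard values for the free compounds that were added here and should be checked against the label of the reagent actually used. Pepstatin A, benzamidine and leupeptin are left out because their molar masses depend on the salt form supplied.

```python
# Molar masses in g/mol. Assumption: free compounds, not salts or hydrates.
MOLAR_MASS = {
    "urea": 60.06,
    "dithiothreitol": 154.25,
    "Tris base": 121.14,
    "KCl": 74.55,
    "EDTA (free acid)": 292.24,
    "PMSF": 174.19,
}

# Final concentrations in mol/l, as quoted in the text.
CONCENTRATION = {
    "urea": 9.0,
    "dithiothreitol": 0.075,
    "Tris base": 0.050,
    "KCl": 0.025,
    "EDTA (free acid)": 0.003,
    "PMSF": 0.001,
}


def grams_needed(volume_ml):
    """Return the mass in grams of each component for `volume_ml` of buffer."""
    litres = volume_ml / 1000.0
    return {
        name: CONCENTRATION[name] * MOLAR_MASS[name] * litres
        for name in CONCENTRATION
    }


recipe = grams_needed(100.0)  # 100 ml of extraction buffer
for name, grams in recipe.items():
    print(f"{name:18s} {grams:8.3f} g")
```

The calculation makes the practical point visible: at 9 M, urea alone accounts for more than half a gram per millilitre of buffer, so the final volume must be adjusted after the urea has dissolved.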

Table 2.1 Buffers for protein extraction. The buffering capacity is optimal if pH = pK.

Buffer                  pK-value
Sodium acetate          4.75
Sodium bicarbonate      6.5; 10.25
Sodium citrate          3.09; 4.75; 5.41
Ammonium acetate        4.75; 9.25
Ammonium bicarbonate    6.50; 9.25; 10.25
Tris-HCl                8.21
Sodium phosphate        1.5; 7.5; 12.0
Tris/phosphate          7.5; 8.21

Table 2.2 Protease inhibitors

Inhibitor                              Enzymes inhibited           Concentration
Phenylmethylsulfonylfluoride (PMSF)    Serine proteases            0.5-1 mM
Ethylenediamine tetraacetate (EDTA)    Metal-activated proteases   1-10 mM
Leupeptin                              Thiol proteases             1-3 µM
Benzamidine                            Serine proteases            1-3 mM
Pepstatin A                            Acid proteases              0.1 µM


Prefractionation is Necessary for Reducing the Number of Proteins in a Complex Protein Mixture

Organs, tissues, fluids of organisms, cells of plants, animals, fungi, or microorganisms contain at least one thousand and up to several tens of thousands of chemically different protein species. Even 2-DE, with a resolution of about 5,000 proteins per gel, may not completely resolve the investigated material. An additional difficulty derives from the differences in the number of molecules per cell of each protein. Thus, rare proteins may escape detection in 2-DE patterns. On the other hand, as in the case of ribosomal subunits, HPLC separations with a resolving power of about 20 to 30 proteins (Kamp and Wittmann-Liebold, 1988) may be sufficient for resolving most of the proteins (Figure 2.2). But in many other cases, various prefractionations have to be applied prior to chromatography.

Figure 2.2 High performance liquid chromatography (HPLC) of ribosomal proteins by micro-bore RP-HPLC (presented by Dr. Albrecht Otto, MDC Berlin-Buch). Total protein mixture from Bacillus stearothermophilus ribosomal subunits (50 ng total protein mixture in 50 µl) was injected onto a µRPC C2/C18 column (i.d. 2.1 mm, length 10 cm) at 20 °C and separated in the SMART microbore HPLC-System (both purchased from Pharmacia, Freiburg). The flow-rate was 80 µl/min, and the gradient was made from 0.1% TFA in water and 0.1% TFA in acetonitrile. The effluent was monitored at 214 nm.


Strategies to Obtain Pure Proteins in a Few Steps

As stated above, there is no general procedure by which all proteins may be purified and the choice of the method depends largely on whether the isolation is to be performed micro- or large-scale. For micro-scale purifications electrophoretic methods, e.g., gel electrophoresis, have become increasingly important, because of their high resolving power. Especially if the activity of the proteins is not to be maintained, e.g., if the protein is subjected to sequencing, purification by 2-DE in a one-step procedure directly from a tissue homogenate may be possible. Alternatively, a combination of ion exchange or size exclusion chromatography with reversed phase high performance liquid chromatography (HPLC) may be applied (see below); this usually requires more than one purification step. Biological prefractionations such as differential centrifugation to separate different cell compartments may enrich rare proteins so that electrophoretic separation may suffice to obtain the desired protein in sufficient purity and amounts for sequencing. But still today preparative two-dimensional electrophoresis yields microgram amounts of protein only, whereas one-dimensional methods may yield from milligram to gram amounts. For large-scale purification, precipitation and extraction methods, as well as conventional chromatography in combination with HPLC, are most widely used. Limitations in large-scale HPLC purification derive from the high costs of HPLC supports.

Conventional Column Chromatography for Preparative Isolation of Proteins

Properties of proteins such as their molecular mass, net charge, hydrophobicity, covalent attachment by SH-groups or through metal chelate complex formation, binding to drugs and specific interactions of enzymes with substrates are used for the separation and purification of proteins by column chromatography.

Gel Filtration is Based on Differences in Molecular Mass

Separations based on different molecular weights are performed by gel filtration. Gel filtration, gel permeation chromatography and size-exclusion chromatography are often used as synonyms, although it would be more appropriate to define the separation principle as size-exclusion chromatography. Elutions in aqueous solutions are typical for gel filtration and elutions in non-aqueous media are typical for gel permeation chromatography. The principle is that proteins of different molecular weights migrate with different velocities through the column, since smaller proteins have more contacts with the beads of the matrix


than larger ones. Smaller proteins move into the pores of the beads, whereas large molecules migrate between the beads, leaving the column earlier. Gel filtration matrices are available for different molecular weight ranges between 1 and 50,000 kDa. There are gels with bead sizes between 50 and 200 µm based on dextran, agarose or polyacrylamide; further, combinations of agarose and polyacrylamide are commercially available. When applying the protein to the column, it must be dissolved in a small volume to reach a final high resolution. The chromatographic development is performed by isocratic elution. In contrast to gradient elution, the isocratic elution conditions (salt composition, pH of the liquid phase, pressure, temperature) are kept constant during the whole experiment. Because of the need for small sample volumes, gel filtration is often used as one of the final steps of purification. By decreasing the bead size down to 1-30 µm and using silica, hydrophilized vinyl polymers or highly cross-linked agarose, a high resolution gel filtration may be performed.
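Because large molecules elute first and small ones last, a gel filtration column can be calibrated with proteins of known molecular mass. The sketch below uses the customary partition coefficient Kav = (Ve - V0)/(Vt - V0) and a straight-line fit of Kav against log10(mass); neither the formula nor the calibration values come from the text above. The column volumes and standards are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def k_av(elution_volume, void_volume, total_volume):
    """Fraction of the gel volume accessible to the protein (0 = excluded)."""
    return (elution_volume - void_volume) / (total_volume - void_volume)


def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mass_kda(elution_volume, void_volume, total_volume, standards):
    """Estimate a molecular mass from standards given as (mass_kda, elution_ml)."""
    xs = [math.log10(mass) for mass, _ in standards]
    ys = [k_av(ve, void_volume, total_volume) for _, ve in standards]
    slope, intercept = fit_line(xs, ys)
    kav = k_av(elution_volume, void_volume, total_volume)
    return 10 ** ((kav - intercept) / slope)


# Hypothetical column: void volume 40 ml, total volume 120 ml.
# Hypothetical standards: (molecular mass in kDa, elution volume in ml).
standards = [(10.0, 100.0), (100.0, 80.0), (1000.0, 60.0)]
mass = estimate_mass_kda(90.0, 40.0, 120.0, standards)
print(f"estimated mass: {mass:.1f} kDa")
```

Such an estimate is only meaningful inside the fractionation range of the chosen matrix and for proteins of roughly globular shape, since the separation responds to molecular size and not to mass as such.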

Ion Exchange Chromatography Allows the Isolation of Native Proteins

In ion exchange chromatography (IEC) the criterion of separation is charge density, as in electrophoresis without detergents, whereas in chromatofocusing and in isoelectric focusing the separation principle is based on the pI value. The charge density of a protein may be influenced by the ionic strength and by the nature of the particular ions in the solvent, the pH, and additives, such as organic components. According to their net charges and surface charge distribution, proteins are adsorbed onto the ion exchange material. Matrices in IEC are either hydrophilic synthetic or naturally occurring polymers such as cellulose, dextran, agarose or silica gels. Hydrophobic polystyrene-based or partly hydrophobic polymethacrylate-based polymers or various synthetic hydrophilic polymers supplied as hard or moderately hard beads are used for ion-exchange HPLC applications. In the case of anion exchangers diethylaminoethyl (DEAE), triethylaminoethyl (TEAE), trimethyl hydroxypropyl (QA), quaternary aminomethyl (Q), among others, are used as functional groups. Hydroxyapatite also may be employed as an anion exchanger. For cation exchangers carboxymethyl (CM), orthophosphate (P) or sulfonate (S) are often employed. Protein samples can be applied to the column in large volumes, but with low ionic strength. The column is washed with a starting buffer and the bound proteins are eluted in a pH gradient or, preferentially, in an ionic strength gradient with increasing salt concentrations. With hydroxyapatite columns proteins are eluted in a concentration gradient of phosphate buffer. For IEC, batch-wise operation, step-wise elution, or isocratic elution (for analytical investigations) are also possible. IEC is very well-suited for large-scale protein purifications, because of the


high binding capacity, the possibility of applying large volumes of sample and because of its ability to concentrate proteins from dilute samples. The method is often used as the first step in a purification scheme.
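The dependence of protein charge on pH, which decides whether an anion or a cation exchanger is appropriate, can be estimated roughly from the amino acid sequence. The sketch below applies the Henderson-Hasselbalch relation to each ionizable group. The pKa values are approximate textbook figures chosen for illustration and are an assumption, not data from this chapter; real proteins deviate from them, and, as stated above, binding depends on charge density and surface charge distribution, not on net charge alone.

```python
# Approximate pKa values of ionizable groups (illustrative assumptions).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERMINUS_PKA = 9.0
C_TERMINUS_PKA = 2.0


def net_charge(sequence, ph):
    """Estimate the net charge of a peptide (one-letter code) at a given pH."""
    # A basic group carries +1 when protonated: fraction = 1 / (1 + 10^(pH - pKa)).
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERMINUS_PKA))
    # An acidic group carries -1 when deprotonated: fraction = 1 / (1 + 10^(pKa - pH)).
    charge -= 1.0 / (1.0 + 10 ** (C_TERMINUS_PKA - ph))
    for residue in sequence.upper():
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def suggest_exchanger(sequence, ph, threshold=0.5):
    """Suggest an exchanger type from the sign of the estimated net charge."""
    charge = net_charge(sequence, ph)
    if charge <= -threshold:
        return "anion exchanger"     # protein negatively charged, e.g. DEAE
    if charge >= threshold:
        return "cation exchanger"    # protein positively charged, e.g. CM
    return "near isoelectric point"  # weak binding to either type expected


# Hypothetical peptide used only to demonstrate the calls.
peptide = "MKDEEHLRK"
for ph in (4.0, 7.0, 10.0):
    print(ph, round(net_charge(peptide, ph), 2), suggest_exchanger(peptide, ph))
```

In practice the suitable pH and exchanger are confirmed experimentally; the estimate merely indicates on which side of the isoelectric point a given buffer is likely to lie.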

Hydrophobic Interaction Chromatography may Assist Isolation of Membrane Proteins

In hydrophobic interaction chromatography (HIC) proteins are separated according to different strengths of the hydrophobic interaction between protein and column material. Agarose is mainly used as the matrix. The functional groups are alkyl chains, preferentially octyl, or phenyl groups. The protein sample is applied in high salt concentration to increase the hydrophobic interaction. The strength of interaction follows the sequence:

Na2SO4 > NaCl > (NH4)2SO4 > NH4Cl > NaBr > NaSCN

Elution may be performed stepwise or with a gradient by decreasing the ionic strength, decreasing the polarity (for example, by the addition of ethylene glycol or (iso)propanol), or by adding detergents. The high capacity of HIC adsorbents makes them suitable for use at an early stage of a purification scheme. When HIC follows IEC, no desalting or concentration step is necessary. HIC is also often applied in large-scale purifications with rigid gels allowing high flow rates.

Covalent Chromatography Binds Proteins to Supports for Selective Separation

In contrast with the chromatographic methods mentioned so far, covalent chromatography binds proteins covalently to the column matrix. For this procedure, the covalent bond has to be split without destroying the protein during elution. The covalent bond most commonly found suitable for this condition is the bond between the two sulfur atoms in the disulfide group. Beaded agarose gels are mainly preferred as column support. A reactive disulfide is bound to the matrix via a spacer molecule. The resulting mixed disulfide is the functional group of covalent chromatography. Among others, the reaction product of a reactive disulfide like 2,2'-dipyridyl disulfide with thiopropyl or glutathione groups bound to the matrix is very useful for covalent chromatography. The principle of the separation is shown by the following scheme:

Immobilization of protein:
Matrix-Spacer-S-S-2-pyridine + Protein-SH -> Matrix-Spacer-S-S-Protein + 2-thiopyridone

Elution of the protein:
Matrix-Spacer-S-S-Protein + R-SH (excess) -> Matrix-Spacer-SH + Protein-SH + RS-SR


2-Thiopyridone is stabilized by thiol-thione tautomerism and, therefore, an almost quantitative binding of the protein to the column is assured. After immobilization of the thiol-containing proteins the column should be washed with one or two column volumes of application buffer or with a high ionic strength buffer containing, for example, 0.1-0.3 M NaCl. Elution of the immobilized protein may be achieved by 10-24 mM DTT or 25-50 mM 2-mercaptoethanol. Stepwise elution to separate thiol proteins with different binding forces has also been described. Proteins were eluted by low-molecular weight thiols with increasing reducing power: 20 mM L-cysteine, 50 mM glutathione, and 20 mM DTT. Covalent chromatography should be used in the last steps of a purification scheme, because of its low capacity, which may also be further decreased by endogenous thiols. Gel filtration should, therefore, be one of the preceding steps to remove low-molecular weight thiols.

Immobilized Metal Affinity Chromatography Uses Metal Chelate Formation

A special form of ligand exchange chromatography (LEC) is immobilized metal affinity chromatography (IMAC). Metal ions (Me) are bound on the surface of the column material and proteins may be separated according to their capability to form metal chelate complexes. Proteins with histidine, cysteine, tryptophan or, even better, clusters of histidines on their surface are bound, whereas a clear separation of metalloproteins from non-metalloproteins may not be obtainable. The principle of IMAC separation is the following:

Activation of the column material:
Matrix-spacer-chelator + Me -> Matrix-spacer-chelator-Me

Adsorption of the protein:
Matrix-spacer-chelator-Me + protein -> Matrix-spacer-chelator-Me-protein

Desorption of the protein:
Matrix-spacer-chelator-Me-protein + EDTA -> Matrix-spacer-chelator + protein + Me-EDTA

Agarose or crosslinked agarose are predominantly applied as matrix for fast flow chromatography. Spacers with 3 to 12 C-atoms in the main chain are bound to agarose, for example by reaction with 1,4-bis-(2,3-epoxypropoxy)-butane. As chelator N-methyliminodiacetic acid (IDA) is often employed. Several metal ions, such as Cu2+, Zn2+, Ni2+, Co2+, Ca2+ and Fe3+, have been described as binding to IDA. Cu2+ has the highest binding force. Fe-IDA columns are used for separating phosphoproteins. To reduce ionic exchange effects the starting buffer should contain about 100 mM NaCl. Elution of the bound proteins may be achieved in one step by 50 mM EDTA without any discrimination of the bound


proteins or by gradients of increasing ammonium chloride, glycine, histidine or imidazole concentrations, or by reduction of the pH. By fusion of DNA elements coding for adjacent histidine residues at the end of a gene of a recombinant protein, proteins with histidine groups at their end have been produced in E. coli. These proteins were very efficiently purified by IMAC. The advantages of IMAC are its possible use in high salt concentrations, the application of large volumes of very dilute protein samples and the high potential for large-scale purifications.

Affinity Chromatography is Frequently Applied in Immunological Investigations

The term "affinity chromatography" is not defined uniquely in the literature. Sometimes it is used for nearly all adsorption chromatographies. Here we want to review only procedures employing biological or biomimetic interactions between ligand and counterligand. In most cases the functional group is a ligand with a specific biological affinity to the protein to be separated. But the specificity may be a mono-specific or a group-specific one. The ligand may be a low molecular weight ligand or one of high molecular weight. The principle of affinity chromatography is nearly the same as that of IMAC. A spacer is bound to the matrix and the functional group is bound to the spacer. Examples for mono-specific low molecular weight ligands are steroid hormones, vitamins, and enzyme inhibitors. The main problem with these ligands is that they have very large association constants. Elution without denaturation is often not possible. The ligand has to be modified so that the specificity is not lost but the association constant is decreased. 5'-AMP, 2',5'-ADP, ATP, NAD+, NADP+ and the imidazol triazine dyes belong to the group-specific low molecular weight ligands. A large variety of different dyes are commercially available. The originally predicted specificity for NADH-fold proteins has not been confirmed and so the term pseudo-affinity chromatography was introduced for dye-ligand chromatography. Nevertheless, high purification factors for a large number of proteins have been obtained by the use of dyes as ligands. Bound proteins can easily be eluted from the column material by 1 M NaCl in one step or by a gradient or, more specifically in the case of NADH-fold proteins, by NADH or NADPH.
Mono-specific macromolecular ligands are, for example, fibronectin (specific for gelatin), antithrombin (specific for thrombin and heparin), transferrin receptor (specific for transferrin), and antibodies (specific for their antigen). Purification factors of several thousands may be obtained by the use of monoclonal antibodies in one purification step. Because the production of these antibodies is very expensive, crude extracts of the desired protein


should be prefractionated to avoid proteolytic degradation of the antibodies bound onto the matrix of the affinity column by endogenous protease activity. Group-specific macromolecular ligands are, for example, lectins (for glycoproteins), staphylococcal protein A and streptococcal protein G (for immunoglobulin G), calmodulin (for calcium-binding proteins) and heparin (broad affinity for lipoproteins, coagulation proteins, receptors and others). Many lectins with specificities for different carbohydrates are commercially available. For affinity chromatography agarose, especially crosslinked agarose, is a widely used matrix material. For large-scale industrial applications cellulose is a cheaper alternative. High performance liquid affinity chromatography (HPLAC) may be performed with beads of 5-15 µm diameter. Many procedures are described for the immobilization of proteins, but it is beyond the scope of this article to describe all of them in detail. The reader is referred to the literature (see References).

An example of the purification of a protein from an entire tissue extract is shown in Table 2.3. Some of the principles described above are realized in this purification. IEC is used as the first separation step. Hydroxylapatite chromatography resulted in only a small increase in purity, but this may have been caused by residual low concentrations of phosphate which inhibited the enzyme in the activity test, thus yielding a lower enzyme activity than under noninhibiting conditions. HIC was applied next, avoiding a desalting and concentrating step. Rechromatography by IEC, again without desalting, reduced the volume of the protein sample with a further increase in purity. The final protein fraction was ultrafiltered and applied to gel filtration, which gave a further 4-fold purification. In total, an 850-fold purification was obtained for this 260 kDa protein, which was very difficult to maintain in an active form.

Table 2.3 Purification table for the purification of carnosine synthetase from chicken skeletal muscle. The enzyme activity is given in units (U) and the specific activity in units per milligram of protein (U/mg; tabulated here in mU/mg). One unit is defined as the enzymatic activity which catalyzes the formation of 1 mmol carnosine per minute at 37 °C.

Purification step        Volume   Protein conc.   Specific activity   Purification   Yield
                         (ml)     (mg/ml)         (mU/mg)             factor         (% of tissue extract)
Crude extract            800      20.2             0.084                1            100
DEAE-Cellulose           990       1.1             4.3                 51            341
Hydroxylapatite          535       0.5             4.8                 57             94
Phenylsepharose CL-4B    186       0.6            14.2                168            116
DEAE-Cellulose           131       0.4            19.0                225             73
ACA34 gel filtration      72       0.2            71.7                850             75
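The columns of Table 2.3 are linked by simple arithmetic, which the following minimal sketch makes explicit. It assumes that the total activity of a step is volume × protein concentration × specific activity, that the purification factor is the specific activity relative to the crude extract, and that the yield is the total activity relative to the crude extract. The function and variable names are illustrative only. Because the printed concentrations are rounded, the recomputed values can differ slightly from those in the table.

```python
# Values transcribed from Table 2.3:
# (step, volume in ml, protein concentration in mg/ml, specific activity in mU/mg)
STEPS = [
    ("Crude extract", 800, 20.2, 0.084),
    ("DEAE-Cellulose", 990, 1.1, 4.3),
    ("Hydroxylapatite", 535, 0.5, 4.8),
    ("Phenylsepharose CL-4B", 186, 0.6, 14.2),
    ("DEAE-Cellulose (2nd)", 131, 0.4, 19.0),
    ("Gel filtration", 72, 0.2, 71.7),
]


def purification_table(steps):
    """Return (step, total activity in mU, purification factor, yield in %) per step.

    The first entry of `steps` is taken as the reference (crude extract).
    """
    _, vol0, conc0, spec0 = steps[0]
    total0 = vol0 * conc0 * spec0  # total activity of the crude extract
    rows = []
    for name, vol, conc, spec in steps:
        total = vol * conc * spec          # ml * mg/ml * mU/mg = mU
        factor = spec / spec0              # enrichment of specific activity
        yield_pct = 100.0 * total / total0 # recovered activity in percent
        rows.append((name, total, factor, yield_pct))
    return rows


if __name__ == "__main__":
    for name, total, factor, yield_pct in purification_table(STEPS):
        print(f"{name:24s} {total:8.0f} mU  {factor:6.0f}-fold  {yield_pct:5.0f} %")
```

A yield above 100% after the first DEAE-cellulose step simply means that more activity was measured after the step than in the crude extract; the text attributes a similar measurement effect to inhibition of the enzyme by residual phosphate in the hydroxylapatite step.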


Brigitte Wittmann-Liebold, Peter Jungblut

HPLC-Separations and 2-DE-Polyacrylamide Gel Electrophoresis Resolve Complex Protein Mixtures

HPLC stands for high pressure, high performance or, as some people have suggested, high price liquid chromatography. In the mode discussed here, the chromatography is performed in the reverse mode on a hydrophobic matrix made from silica by derivatization with short or long aliphatic side chains, with elution by increasing concentrations of organic modifier. HPLC involves liquid chromatography operating under high pressures, which results in short analysis times; high resolutions are obtained automatically, but at a high cost. The goal of performing liquid chromatography in times between 5 min and several hours was reached after the development of matrices composed of small rigid beads, typically with diameters in the range of 3–10 µm, which cause the high back pressures of the columns, and with a high porosity in the range of 10–50 nm nominal pore diameter. Most of the separation procedures described above for liquid chromatography may also be performed in a high pressure mode. Because this technique is very common, we concentrate here on the hydrophobic interaction principle in the HPLC mode called reversed phase HPLC (RP-HPLC). The stationary phase typically consists of highly porous small beads of silica. Alkyl groups such as C18, C8 or C4 groups, or biphenyl groups, are used as functional groups. Recovery of proteins with molecular masses of about 30 kDa may be better with butyl or biphenyl columns than with octyl or octadecyl columns. In order to minimize the free silanol group content of the support, endcapping by silylation with methylsilyl chloride has to be performed. The protein is preferably applied to the column without salt, but in an acidic solution, e.g., in 0.1% trifluoroacetic acid, to ensure optimal solubility of the protein and its unique charge. The bound proteins are eluted by increasing concentrations of organic solvents. The eluotropic strength increases from methanol through ethanol and acetonitrile to 1-propanol or 2-propanol. As an example, the separation of ribosomal proteins by microbore RP-HPLC is shown in Figure 2.2. About thirty peaks were obtained; eleven of the proteins eluted in one peak each and were pure enough for direct microsequence analysis.
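Elution by increasing concentrations of organic solvent is usually programmed as a gradient. The sketch below assumes a linear gradient and records the solvent order given above; the gradient start, end and duration are hypothetical defaults chosen only for illustration, and the function names are not taken from any instrument software.

```python
# Organic modifiers in order of increasing eluotropic strength, as listed in the text.
ELUOTROPIC_ORDER = ["methanol", "ethanol", "acetonitrile", "propanol (1- or 2-)"]


def is_stronger_eluent(solvent_a, solvent_b):
    """Return True if solvent_a is listed as a stronger eluent than solvent_b."""
    return ELUOTROPIC_ORDER.index(solvent_a) > ELUOTROPIC_ORDER.index(solvent_b)


def percent_organic(t_min, start_pct=0.0, end_pct=60.0, duration_min=60.0):
    """Percent organic solvent delivered at time t_min for a linear gradient.

    Before the gradient starts the value is start_pct; after duration_min
    it stays at end_pct. All default values are hypothetical.
    """
    if duration_min <= 0:
        raise ValueError("duration_min must be positive")
    # Clamp the elapsed fraction of the gradient to the interval [0, 1].
    fraction = min(max(t_min / duration_min, 0.0), 1.0)
    return start_pct + fraction * (end_pct - start_pct)


if __name__ == "__main__":
    for t in (0, 15, 30, 45, 60):
        print(f"t = {t:2d} min -> {percent_organic(t):.1f} % organic solvent")
    print(is_stronger_eluent("acetonitrile", "methanol"))
```

With a stronger eluent, a given protein would be expected to elute at a lower percentage of organic solvent, so the choice of modifier and the gradient slope are normally adjusted together.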

Electrophoresis is Applied for Separating Highly Complex Protein Mixtures

Whereas chromatography has been optimized for high selectivity and easy, automated performance, electrophoresis has been brought to extremely high resolutions. Automation of electrophoresis is just at the beginning, with one exception: capillary electrophoresis (CE). But the high resolutions needed for resolving complex protein mixtures still cannot be reached by capillary electrophoresis. On the other hand, the improvements in resolution of two-dimensional electrophoresis (2-DE) over the past 40 years may be used to demonstrate the progress that has been made. In 1956 Smithies and Poulik separated serum proteins into 15 components by a combination of filter paper electrophoresis (separation according to charge) and starch gel electrophoresis (separation according to molecular weight). The first 2-DE techniques employed polyacrylamide as matrix and changed only the acrylamide concentration from the first to the second dimension; however, this did not increase the number of serum proteins resolved. In 1970 Kaltschmidt and Wittmann separated ribosomal proteins into about 50 components by a 2-DE method combining an 8% acrylamide gel under alkaline buffer conditions with an 18% acrylamide gel in an acidic milieu. For basic proteins like ribosomal proteins this system gave an excellent resolution and enabled the separation of all the proteins of the Escherichia coli ribosome. Macko and Stegemann (1969) combined isoelectric focusing in gels of low acrylamide concentration with PAGE of high acrylamide content. This

[Figure: schematic of isoelectric focusing. Recoverable labels: anolyte; ampholyte mixture; assembly of the pH gradient; movement of the protein to its pI (focusing).]