Synthetic Biology: Methods and Protocols (Methods in Molecular Biology, 2760) [2 ed.] 107163657X, 9781071636572

This second edition provides new and updated techniques and applications associated with synthetic biology. Chapters gui


English Pages 542 [523] Year 2024


Table of contents :
Preface
Contents
Contributors
Part I: Gene Circuits and Biochemical Pathways
Chapter 1: Plant Engineering to Enable Platforms for Sustainable Bioproduction of Terpenoids
1 Introduction
2 Terpenoid Production in Plants
3 Engineering Strategies to Improve the Bioproduction of Terpenoids in Plants
4 Compartmentalization of Pathways
5 Engineering Plants for the Production of Terpenoids
6 Advancements in Plant Genetic Engineering
7 Conclusions
References
Chapter 2: Compartmentalized Terpenoid Production in Plants Using Agrobacterium-Mediated Transient Expression
1 Introduction
2 Materials
2.1 Vector Assembly
2.2 Preparation and Transformation of Agrobacterium Competent Cells
2.3 N. benthamiana Transformation
3 Methods
3.1 Constructing Transient Expression Vectors
3.2 Preparation and Transformation of Agrobacterium Competent Cells
3.3 Syringe Infiltration
3.4 Vacuum Infiltration
3.5 Analytics
4 Notes
References
Chapter 3: Design Principles for Biological Adaptation: A Systems and Control-Theoretic Treatment
1 Introduction
2 Mathematical Preliminaries
2.1 Systems Theory: A Primer
2.2 Relevant Results on Algebraic Graph Theory
3 Methodology and Applications
3.1 Proposed Methodology
3.2 Identifying Adaptive Network Structures
3.2.1 Assumptions
3.3 Conditions for Adaptation
3.4 From Abstraction to Structure
4 Conclusions
5 Notes
References
Chapter 4: Construction of Xylose-Utilizing Cyanobacterial Chassis for Bioproduction Under Photomixotrophic Conditions
1 Introduction
2 Materials and Equipment
2.1 Target Genes, Plasmids, and Strains
2.2 Medium
2.3 Chemicals and Reagents
2.4 Sample Content Analysis
2.5 Equipment
3 Methods
3.1 Determination of Xylose-Utilizing Capability of Wild-Type Synechococcus 2973
3.2 Construction of Xylose-Utilization Pathway in Synechococcus 2973
3.3 Determination of Xylose Content During Photomixotrophic Conditions
3.4 Determining the Efficiency of Converting Xylose to Intracellular Acetyl-CoA in Strain JQ01
3.5 Analyze the Metabolism Changes of Strain JQ01 Under Photomixotrophic Conditions
3.6 Drive More Carbon Flux to Acetyl-CoA by Rewiring Central Carbon Metabolism
3.7 Bioproduction of 3-HP in the Rewired Chassis Under Photomixotrophic Conditions
4 Notes
References
Chapter 5: Allosteric-Regulation-Based DNA Circuits in Saccharomyces cerevisiae to Detect Organic Acids and Monitor Hydrocarbo...
1 Introduction
2 Materials
2.1 In Silico DNA Analysis and PCR
2.2 Agarose Gel Electrophoresis
2.3 Isothermal DNA Assembly (Gibson Method)
2.4 Yeast Transformation
2.5 Media: Solutions and Plates
2.6 Fluorescence-Activated Cell Sorting (FACS)
2.7 Other Chemicals and Reagents
3 Methods
3.1 Biosensor Design and Assembly
3.2 Touchdown PCR
3.3 Bacterial Transformation and Sequence Verification
3.4 Integration into the Genome of S. cerevisiae
3.5 Cell Selection-Reporter Element
3.6 Cell Selection-Complete Circuit (Receptor and Reporter)
3.7 FACS Measurements
3.8 Detection of Putative Compounds in Environmental Samples
3.9 Real-Time Monitoring of Hydrocarbon Metabolism
4 Notes
References
Chapter 6: dCas12a:Pre-crRNA: A New Tool to Induce mRNA Degradation in Saccharomyces cerevisiae Synthetic Gene Circuits
1 Introduction
2 Materials
2.1 PCR
2.2 Agarose Gel Electrophoresis
2.3 Agarose Gel Preparation
2.4 Gibson Reaction Buffer
2.5 Gibson Assembly Master Mixture
2.6 Mini-preparation
2.7 Yeast Transformation
2.8 FACS Beads and Cleaning Solution Preparation
2.9 Media and Plate
3 Methods
3.1 Plasmid Design
3.2 Gene Circuit Design
3.3 Touchdown PCR
3.4 Agarose Gel Electrophoresis
3.5 Backbone Digestion
3.6 Gibson Assembly
3.7 Escherichia coli DH5α Transformation
3.8 Mini-preparation
3.9 Checking the Sequences of the Assembled Plasmids
3.10 Yeast Transformation
3.11 Strain Selection: FACS Measurement
3.12 Strains Transformed with Two Plasmids
3.13 Strains Transformed with the Complete IMPLY Boolean Gates Responding to Copper and Methionine
4 Notes
References
Part II: Genome Editing and Modification
Chapter 7: CRISPRi-Driven Genetic Screening for Designing Novel Microbial Phenotypes
1 Introduction
2 Materials
2.1 Synthesis of Oligomer Library and Assembly of Plasmid Library
2.2 Transformation of sgRNA Plasmid Library to Helper Strain for Conjugation
2.3 Conjugation of Bacterial Hosts
2.4 Quality Control (QC) of CRISPRi Library
2.5 Growth Profiling of CRISPRi Library to Provide Selected Library
3 Methods
3.1 Design Considerations for sgRNA Oligomer Library
3.2 Synthesis of Oligomer Library and Assembly of Plasmid Library
3.2.1 Assembly of Plasmid Libraries
3.3 Transformation of sgRNA Plasmid Library to Helper Strain for Conjugation
3.3.1 Preparation of Electrocompetent Cells
3.3.2 Electroporation
3.3.3 Transformation of sgRNA Plasmid Library to Helper Strain
3.4 Conjugation of Bacterial Hosts
3.5 Quality Control (QC) of CRISPRi Library
3.5.1 Sequencing Library Preparation
3.5.2 Running NGS
3.5.3 Sequencing Analysis
3.6 Determination of Gene Essentiality
3.6.1 Growth Profiling of CRISPRi Library to Provide Selected Library
3.6.2 Calculation of Gene Fitness Score
3.6.3 Identification of Essential Genes
4 Notes
References
Chapter 8: Enzymatic Preparation of DNA with an Expanded Genetic Alphabet Using Terminal Deoxynucleotidyl Transferase and Its ...
1 Introduction
2 Materials
2.1 Reagents, Kits, and Apparatus
2.2 Oligonucleotides
3 Methods
3.1 Primer Extension with Unnatural Nucleoside Triphosphates by TdT
3.2 Enzyme-Linked Oligonucleotide Assay (ELONA) with a dTPT3Bio-Labeled DNA Aptamer
3.3 Imaging of Bacterial Cells with a dTPT3FAM-Labeled DNA Aptamer
3.4 Enzymatic Synthesis of DNA Strands Containing an Internal UB/UBP
3.4.1 Preparation of DNA Oligonucleotides Containing an Internal Unnatural Nucleobase
3.4.2 PCR Amplification of the Unnatural Nucleobase-Containing DNA Oligonucleotide Product and Biotin Gel Shift Assay
3.4.3 Magnetic Beads Purification of the UBP-Containing DNA Product Produced by PCR
4 Notes
References
Chapter 9: Single-Nucleotide Microbial Genome Editing Using CRISPR-Cas12a
1 Introduction
2 Materials
2.1 Electroporation
2.2 MacConkey Agar
2.3 Sanger Sequencing
3 Methods
3.1 Preparation of Electrocompetent Cell
3.2 Electroporation
3.3 Sanger Sequencing
4 Notes
References
Chapter 10: Multiplex Marker-Less Genome Integration in Pichia pastoris Using CRISPR/Cas9
1 Introduction
2 Materials
2.1 Gibson Assembly (GA) Mix
2.2 Buffer and Reagents
2.3 Culture Medium
3 Methods
3.1 Plasmid Construction
3.2 Transformation of P. pastoris Competent Cells
3.3 Validation of P. pastoris Transformants
4 Notes
References
Chapter 11: Genome Editing, Transcriptional Regulation, and Forward Genetic Screening Using CRISPR-Cas12a Systems in Yarrowia ...
1 Introduction
2 Materials
2.1 Molecular Biology Reagents
2.2 Cell Culture and Transformation
3 Methods
3.1 Design and Cloning of Gene Disruption Constructs
3.2 Transformation of Y. lipolytica with CRISPR-Cas12a Plasmids for Genome Editing or Transcriptional Regulation
3.3 Genome Mutation Analysis
3.4 Design, Cloning, and Application of CRISPRi (Interference) and CRISPRa (Activation) Using CRISPR-Cas12a
3.5 Design, Cloning, and Validation of a Genome-Wide CRISPR-Cas12a Library for Pooled CRISPR Screening in Y. lipolytica
3.6 Pooled CRISPR Screening Protocol for Genome-Wide Cas12a gRNA Activity Profiling in Y. lipolytica
3.7 Bioinformatic Processing of NextSeq Reads to Obtain gRNA Abundances
4 Notes
References
Chapter 12: An Improved Method for Eliminating or Creating Intragenic Bacterial Promoters
1 Introduction
2 Materials
2.1 Hardware and Software Requirements
2.1.1 Software
3 Methods
3.1 Choose Coding Sequence
3.2 Generate New Sequences with σ70 Promoters Weakened (CORPSE) or Strengthened (iCORPSE)
3.3 Analyzed Results
4 Notes
References
Chapter 13: Genetic Code Expansion in Pseudomonas putida KT2440
1 Introduction
2 Materials
2.1 Media and Reagents
2.2 Plasmids
2.3 Equipment
3 Methods
3.1 Cell Cultivation
3.2 Competent Cells Preparation (See Note 5)
3.3 Heat Shock Transformation of Plasmids
3.4 Fluorescence Measurement
3.5 Purification of sfGFP Variant Proteins
3.6 Sample Preparation for MS Analysis of Purified sfGFP
4 Notes
References
Chapter 14: Genome-Wide Screen for Enhanced Noncanonical Amino Acid Incorporation in Yeast
1 Introduction
2 Materials
2.1 Transformation of Reporter System into Yeast
2.2 Prepare Cells for Electroporation
2.3 Electroporation
2.4 Yeast Culture Expansion
2.5 Long-Term Library Storage
2.6 Library Characterization
2.7 Sanger Sequencing
2.8 Fluorescence-Activated Cell Sorting (FACS)
2.9 Deep Sequencing
3 Methods
3.1 Transform pSPS-RepTAG-OTS and pSPS-Rep-OTS into Chemically Competent Yeast
3.2 Prepare Cells for Electroporation
3.2.1 Yeast Culture Growth
3.2.2 Pellet Paint
3.2.3 Making Electrocompetent Cells
3.3 Electroporation
3.4 Expanding Culture
3.5 Long-Term Library Storage
3.6 Library Characterization
3.6.1 Prepare Cells for Flow Cytometry Analysis
3.6.2 Flow Cytometry
3.7 Sequencing Clones Isolated from Naïve Library
3.8 Fluorescence-Activated Cell Sorting
3.8.1 Analytical Flow Cytometry on Recovered Sorts
3.8.2 Sorted Library Characterization
3.9 Deep Sequencing
4 Notes
References
Chapter 15: Positive Selection Screens for Programmable Endonuclease Activity Using I-SceI
1 Introduction
2 Materials
2.1 Cell Culture and Cultivation
2.2 Molecular Biology Reagents
3 Methods
3.1 Electrocompetent Cell Creation
3.2 Transformation (Electroporation)
3.3 I-SceI Lethality Validation (Fig. 3)
3.4 Nuclease Activity Assay (Fig. 4)
3.5 Enrichment and Positive Selection (Fig. 6)
4 Notes
References
Chapter 16: CRISPR-Cas9-Mediated Genome Editing in Paenibacillus polymyxa
1 Introduction
2 Materials
2.1 Molecular Biology
2.2 Cultivation and Transformation
3 Methods
3.1 Cloning of Plasmids for Deletions and Integrations
3.2 Chemical Transformation of E. coli TOP10, E. coli Turbo, and E. coli S17-1
3.3 Conjugation
3.4 Multipurpose Genome Editing
3.4.1 Deletion and Integration
3.4.2 Point Mutations
3.4.3 Multiplexing
4 Notes
References
Part III: Genome Language and Computing
Chapter 17: Programming Juxtacrine-Based Synthetic Signaling Networks in a Cellular Potts Framework
1 Introduction
2 Materials
3 Methods
3.1 Getting Started
3.2 Creating Cells in CC3D
3.3 Adding Adhesion to the In Silico Cell Line (ISCL)
3.4 Adding Motility to ISCL
3.5 Adding Size, Growth, and Division to ISCL
3.6 Adding Genetic Circuits to ISCL: Assigning and Tracking Reporter Levels in Cells
3.7 Adding Genetic Circuits to ISCL: Defining the Signal from Ligands on Neighbor Cells
3.8 Adding Genetic Circuits to ISCL: Determining How the Focal Cell Responds to Signal from Ligands on Neighbors
3.9 Adding Genetic Circuits to ISCL: Changing the State of the Cell as a Result of Reporter Accumulation
3.10 Setting the Initial Simulation State
3.11 Technical Simulation Parameters
3.12 Running the Simulation
3.13 Optional: Quantifications for Monitoring the Signaling Network
3.14 Optional: Parameterscan
4 Notes
References
Chapter 18: Encoding Genetic Circuits with DNA Barcodes Paves the Way for High-Throughput Profiling of Dose-Response Curves of...
1 Introduction
2 Materials
2.1 Synthetic Complete Medium Minus Ura (SC-Ura)
2.2 Yeast Transformation Buffer
3 Methods
3.1 Trackable Assembly Methods
3.2 Transformation of Plasmids into Yeast
3.3 Sort-Seq Experiment
3.4 NGS Library Preparation
3.5 Data Processing and Machine Learning
4 Notes
References
Chapter 19: Machine Learning for Biological Design
1 Introduction
1.1 Predictive Models and Supervised Learning
1.2 Nomenclature
1.3 Anatomy of an Adaptive Experimental Design Workflow
2 Different Design Objectives
2.1 A Working Example
3 Action Improvement
3.1 Bayesian Optimization
3.2 Bandits
3.3 Reinforcement Learning
3.4 Recommendations and Further Reading
4 Predictor Improvement
4.1 Bayes Optimal Experiment Design
4.2 Model Discrimination
4.3 Active Learning
4.4 Recommendations and Further Reading
5 Discussion and Conclusion
5.1 Greedy, Non-Greedy, and Batch Algorithms
5.2 Summary
References
Chapter 20: A Machine Learning Approach for Predicting Essentiality of Metabolic Genes
1 Introduction
2 Theoretical Background and Concepts
2.1 Genome-Scale Metabolic Models
2.2 Flux Balance Analysis
2.3 Prediction of Gene Essentiality with FBA
2.4 Mass Flow Graphs
3 Binary Classification
4 Gene Essentiality Data for Model Training
4.1 Mass Flow Graph of iML1515
4.2 Feature Extraction
4.3 Essentiality Labels
4.3.1 Algorithm for Gene to Reaction Essentiality Mapping
5 Binary Classifiers Trained on Mass Flow Graphs
5.1 Data Standardisation
5.2 Dimensionality Reduction
5.3 Evaluating Classification Performance
5.4 Application to Escherichia coli Metabolic Network
5.4.1 Baseline Classification Models
5.4.2 Hyperparameter Tuning
5.4.3 Dimensionality Reduction Using Principal Component Analysis
5.4.4 Evaluation on the Test Set
6 Transfer Learning
7 Discussion
Appendix 1: Acronyms
Appendix 2: Mapping Gene Essentiality to Reaction Essentiality
Appendix 3: List of Genes in the Test Set
References
Chapter 21: The Causes for Genomic Instability and How to Try and Reduce Them Through Rational Design of Synthetic DNA
1 Introduction
1.1 SSR and RMD in Genetic Instability
2 Counter Selection for SSR in Evolutionary Conserved Sequences
3 ESO Detection of SSR and RMD Sites
4 Epigenetics and the Challenges It Presents to Synthetic Biology
5 ESO Detection of Methylation Motifs
5.1 ESO Automatic DNA Sequence Optimization
6 Conclusions
Bibliography
Chapter 22: Genetic Network Design Automation with LOICA
1 Introduction
2 Materials
2.1 Dependencies
2.2 Installation
3 Methods
3.1 Designing a NOR Gate
3.2 Genetic Ring Oscillator
3.3 Receiver and Inverter Characterization
4 Notes
References
Chapter 23: Flapjack: Data Management and Analysis for Genetic Circuit Characterization
1 Introduction
2 Materials
2.1 Installation
3 Methods
3.1 Data Model
3.2 Accessing Flapjack
3.2.1 Creating an Account
3.2.2 Logging in to an Existing Account
3.3 Home Page
3.4 Preparing and Uploading Data in Flapjack
3.4.1 Preparing the Data File
3.4.2 Uploading the File
Study
Creating a New Study
Machine
Data File
3.5 Browse Page
3.5.1 Studies
Actions
Share
Make Public
Delete
3.5.2 Assays
3.5.3 Vectors
3.6 View Page
3.6.1 View Page Filters and Options
Query
Analysis
Plot Options
3.7 Creating a Plot
3.7.1 Induction Curve
3.8 pyFlapjack
3.8.1 Importing pyFlapjack
3.8.2 Connecting to Flapjack
3.8.3 Functions
get Function
Create Function
Delete Function
Analyzing Data
4 Notes
4.1 Patch Function
References
Part IV: Molecular Assembly
Chapter 24: In Vivo DNA Assembly Using the PEDA Method
1 Introduction
2 Materials
2.1 Polymerase Chain Reaction (PCR)
2.1.1 Equipment
2.1.2 Reagents
2.2 DNA Electrophoresis and Purification
2.2.1 Equipment
2.2.2 Reagents
2.3 Electrocompetent E. coli Cells
2.3.1 Equipment
2.3.2 Reagents
3 Methods
3.1 Design the Primers
3.2 Amplification of the DNA Fragments
3.2.1 Prepare the Following Master Mixture in 50 μL Volume
3.2.2 Set Up the Thermal Cycling Under the Following Program
3.3 Purification of the DNA Fragments
3.4 Preparation of the Electrocompetent Cells
3.5 Electrotransformation of DNA into the Cells
3.6 Validation of the Assembled DNA by Colony PCR
4 Notes
References
Chapter 25: Cell-Free Synthesis and Quantitation of Bacteriophages
1 Introduction
2 Materials
2.1 Phage Amplification
2.2 Phage DNA Extraction and Purification
2.3 Cell-Free Gene Expression
2.4 Dilutions of Cell-Free Synthesized Phage Reactions
2.5 Spotting Assay
2.6 Kinetic Assay
3 Methods
3.1 Preparation of a Phage Lysate
3.2 Phage DNA Extraction
3.3 TXTL of Phages
3.4 Serial Dilutions of Cell-Free Synthesized Phages
3.5 Phage-Host Infection Spotting Assay
3.6 Phage-Host Infection Kinetics Assay
4 Notes
References
Chapter 26: Multimodal Control of Bacterial Gene Expression by Red and Blue Light
1 Introduction
2 Materials
2.1 Bacterial Strains and Cultivation
2.2 Implementation and Analysis of Individual Plasmid Systems
2.2.1 pREDusk and pREDawn
2.2.2 pCrepusculo and pAurora
2.3 Multiplexing of Systems
2.4 Multiplexed Optogenetic Control of Semi-preparative Protein Production
3 Methods
3.1 Implementation and Analysis of Individual Plasmid Systems
3.1.1 pREDusk and pREDawn
3.1.2 pAurora and pCrepusculo
3.2 Multiplexing of Systems
3.3 Multiplexed Optogenetic Control of Semi-preparative Protein Production
4 Notes
References
Chapter 27: In Silico Design, In Vitro Construction, and In Vivo Application of Synthetic Small Regulatory RNAs in Bacteria
1 Introduction
2 Materials
2.1 Computational Equipment
2.2 Plasmids
2.3 DNA Oligonucleotides
2.4 Enzymes
2.5 Antibiotics
2.6 Chemicals, Buffers, and Media Components
2.7 Consumables
2.8 Strains
3 Methods
3.1 Seed Prediction Using SEEDling
3.2 Construction and Validation of Synthetic sRNA TUs by Golden Gate Cloning
3.2.1 Golden Gate Cloning
3.2.2 Transformation of Golden Gate Reaction into E. coli Cells
3.2.3 Screening and Validation of Plasmid DNA
3.3 Synthetic sRNA Functionality Test on Solid Media
3.4 Synthetic sRNA Functionality Test in Liquid Media
3.5 Fluorescence Reporter Assay
3.6 Summary & Perspectives
4 Notes
References
Index


Methods in Molecular Biology 2760

Jeffrey Carl Braman  Editor

Synthetic Biology Methods and Protocols Second Edition

METHODS IN MOLECULAR BIOLOGY

Series Editor John M. Walker School of Life and Medical Sciences University of Hertfordshire Hatfield, Hertfordshire, UK

For further volumes: http://www.springer.com/series/7651

For over 35 years, biological scientists have come to rely on the research protocols and methodologies in the critically acclaimed Methods in Molecular Biology series. The series was the first to introduce the step-by-step protocols approach that has become the standard in all biomedical protocol publishing. Each protocol is provided in readily-reproducible step-by-step fashion, opening with an introductory overview, a list of the materials and reagents needed to complete the experiment, and followed by a detailed procedure that is supported with a helpful notes section offering tips and tricks of the trade as well as troubleshooting advice. These hallmark features were introduced by series editor Dr. John Walker and constitute the key ingredient in each and every volume of the Methods in Molecular Biology series. Tested and trusted, comprehensive and reliable, all protocols from the series are indexed in PubMed.

Synthetic Biology Methods and Protocols Second Edition

Edited by

Jeffrey Carl Braman Agilent Technologies, Inc, La Jolla, CA, USA


ISSN 1064-3745
ISSN 1940-6029 (electronic)
Methods in Molecular Biology
ISBN 978-1-0716-3657-2
ISBN 978-1-0716-3658-9 (eBook)
https://doi.org/10.1007/978-1-0716-3658-9

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Humana imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer Nature.

The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.

Paper in this product is recyclable.

Preface

Synthetic Biology represents a unique branch of science requiring the expertise of chemists, biologists, and engineers. This broad statement is intended to entice biochemists, analytical and organic chemists, chemical engineers, molecular biologists, computer scientists, software, mechanical, and electrical engineers, biophysicists, plant and evolutionary biologists, process and manufacturing engineers, and pharmaceutical scientists into research programs and collaborations that will solve difficult problems such as producing renewable energy resources, feeding an ever-increasing world population, and curing disease, to name just a few examples.

The second edition of Synthetic Biology complements and enhances the first edition and provides you with updated and enhanced techniques and tools. As with the first edition, experienced contributors to the second edition have previously published their work in reputable, peer-reviewed periodicals, and their contributions to this second edition provide expert step-by-step guidance and ideas for conducting your own synthetic biology research.

La Jolla, CA, USA

Jeffrey Carl Braman

Contents

Preface
Contributors

PART I GENE CIRCUITS AND BIOCHEMICAL PATHWAYS

1 Plant Engineering to Enable Platforms for Sustainable Bioproduction of Terpenoids
Jacob D. Bibik and Björn Hamberger

2 Compartmentalized Terpenoid Production in Plants Using Agrobacterium-Mediated Transient Expression
Jacob D. Bibik, Abigail E. Bryson, and Björn Hamberger

3 Design Principles for Biological Adaptation: A Systems and Control-Theoretic Treatment
Priyan Bhattacharya, Karthik Raman, and Arun K. Tangirala

4 Construction of Xylose-Utilizing Cyanobacterial Chassis for Bioproduction Under Photomixotrophic Conditions
Xinyu Song, Yue Ju, Lei Chen, and Weiwen Zhang

5 Allosteric-Regulation-Based DNA Circuits in Saccharomyces cerevisiae to Detect Organic Acids and Monitor Hydrocarbon Metabolism In Vitro
Michael Dare Asemoloye and Mario Andrea Marchisio

6 dCas12a:Pre-crRNA: A New Tool to Induce mRNA Degradation in Saccharomyces cerevisiae Synthetic Gene Circuits
Lifang Yu and Mario Andrea Marchisio

PART II GENOME EDITING AND MODIFICATION

7 CRISPRi-Driven Genetic Screening for Designing Novel Microbial Phenotypes
Minjeong Kang, Kangsan Kim, and Byung-Kwan Cho

8 Enzymatic Preparation of DNA with an Expanded Genetic Alphabet Using Terminal Deoxynucleotidyl Transferase and Its Applications
Guangyuan Wang, Yuhui Du, and Tingjian Chen

9 Single-Nucleotide Microbial Genome Editing Using CRISPR-Cas12a
Ho Joung Lee and Sang Jun Lee

10 Multiplex Marker-Less Genome Integration in Pichia pastoris Using CRISPR/Cas9
Jucan Gao, Jintao Cheng, and Jiazhang Lian

11 Genome Editing, Transcriptional Regulation, and Forward Genetic Screening Using CRISPR-Cas12a Systems in Yarrowia lipolytica
Adithya Ramesh, Sangcheon Lee, and Ian Wheeldon

12 An Improved Method for Eliminating or Creating Intragenic Bacterial Promoters
Ellina Trofimova, Dominic Y. Logel, and Paul R. Jaschke

13 Genetic Code Expansion in Pseudomonas putida KT2440
Tianyu Gao, Jiantao Guo, and Wei Niu

14 Genome-Wide Screen for Enhanced Noncanonical Amino Acid Incorporation in Yeast
Briana R. Lino and James A. Van Deventer

15 Positive Selection Screens for Programmable Endonuclease Activity Using I-SceI
Michael A. Mechikoff, Kok Zhi Lee, and Kevin V. Solomon

16 CRISPR-Cas9-Mediated Genome Editing in Paenibacillus polymyxa
Giulia Ravagnan, Meliawati Meliawati, and Jochen Schmid

PART III GENOME LANGUAGE AND COMPUTING

17 Programming Juxtacrine-Based Synthetic Signaling Networks in a Cellular Potts Framework
Calvin Lam and Leonardo Morsut

18 Encoding Genetic Circuits with DNA Barcodes Paves the Way for High-Throughput Profiling of Dose-Response Curves of Metabolite Biosensors
Huibao Feng, Yikang Zhou, and Chong Zhang

19 Machine Learning for Biological Design
Tom Blau, Iadine Chades, and Cheng Soon Ong

20 A Machine Learning Approach for Predicting Essentiality of Metabolic Genes
Lilli J. Freischem and Diego A. Oyarzún

21 The Causes for Genomic Instability and How to Try and Reduce Them Through Rational Design of Synthetic DNA
Matan Arbel-Groissman, Itamar Menuhin-Gruman, Hader Yehezkeli, Doron Naki, Shaked Bergman, Yarin Udi, and Tamir Tuller

22 Genetic Network Design Automation with LOICA
Gonzalo Vidal, Carolus Vitalis, Tamara Matúte, Isaac Núñez, Fernán Federici, and Timothy J. Rudge

23 Flapjack: Data Management and Analysis for Genetic Circuit Characterization
Guillermo Yáñez Feliú, Gonzalo Vidal, Carolus Vitalis, Macarena Muñoz Silva, Tamara Matúte, Isaac Núñez, Fernán Federici, and Timothy J. Rudge

PART IV MOLECULAR ASSEMBLY

24 In Vivo DNA Assembly Using the PEDA Method
Tianyuan Su, Qingxiao Pang, and Qingsheng Qi

25 Cell-Free Synthesis and Quantitation of Bacteriophages
Antoine Levrier, Steven Bowden, Bruce Nash, Ariel Lindner, and Vincent Noireaux

26 Multimodal Control of Bacterial Gene Expression by Red and Blue Light
Stefanie S. M. Meier, Elina Multamäki, Américo T. Ranzani, Heikki Takala, and Andreas Möglich

27 In Silico Design, In Vitro Construction, and In Vivo Application of Synthetic Small Regulatory RNAs in Bacteria
Michel Brück, Bork A. Berghoff, and Daniel Schindler

Index

Contributors MATAN ARBEL-GROISSMAN • Shmunis School of Biomedicine and Cancer Research, The George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel MICHAEL DARE ASEMOLOYE • School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, China BORK A. BERGHOFF • Institute for Microbiology and Molecular Biology, Justus-Liebig University Giessen, Giessen, Germany SHAKED BERGMAN • Department of Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel PRIYAN BHATTACHARYA • Department of Chemical Engineering, Indian Institute of Technology, Madras (IIT Madras), Chennai, India; Robert Bosch Centre of Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai, India; Initiative for Biological Science and Systems mEdicine (IBSE), IIT Madras, Chennai, India JACOB D. BIBIK • Department of Biochemistry, Michigan State University, East Lansing, USA; MelaTech, LLC, Baltimore, Maryland, USA TOM BLAU • CSIRO, Data 61, Eveleigh, NSW, Australia STEVEN BOWDEN • University of Minnesota, Department of Food Science and Nutrition, Saint Paul, MN, USA MICHEL BRU¨CK • Max Planck Institute for Terrestrial Microbiology, Marburg, Germany; Institute for Microbiology and Molecular Biology, Justus-Liebig University Giessen, Giessen, Germany ABIGAIL E. 
BRYSON • Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
IADINE CHADES • CSIRO, Environment, Brisbane, QLD, Australia
JINTAO CHENG • Key Laboratory of Biomass Chemical Engineering of Ministry of Education, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, China; ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China
LEI CHEN • Laboratory of Synthetic Microbiology, School of Chemical Engineering & Technology, Tianjin University, Tianjin, People’s Republic of China; Key Laboratory of Systems Bioengineering and Frontier Science Center of Synthetic Biology, the Ministry of Education of China, Tianjin University, Tianjin, People’s Republic of China
TINGJIAN CHEN • MOE International Joint Research Laboratory on Synthetic Biology and Medicines, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, P. R. China
BYUNG-KWAN CHO • Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea; Innovative Biomaterials Research Center, KI for the BioCentury, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea; Intelligent Synthetic Biology Center, Daejeon, Republic of Korea
YUHUI DU • MOE International Joint Research Laboratory on Synthetic Biology and Medicines, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, P. R. China
FERNÁN FEDERICI • ANID-Millennium Science Initiative Program Millennium Institute for Integrative Biology (iBio), Santiago, Chile; FONDAP Center for Genome Regulation, Santiago, Chile; Institute for Biological and Medical Engineering Schools of Engineering, Medicine and Biological Sciences Pontificia Universidad Catolica de Chile, Santiago, Chile
GUILLERMO YÁÑEZ FELIÚ • Interdisciplinary Computing and Complex Biosystems, School of Computing, Newcastle University, Newcastle upon Tyne, UK
HUIBAO FENG • MOE Key Laboratory for Industrial Biocatalysis, Institute of Biochemical Engineering, Department of Chemical Engineering, Tsinghua University, Beijing, China
LILLI J. FREISCHEM • Department of Physics, University of Oxford, Oxford, UK
JUCAN GAO • Key Laboratory of Biomass Chemical Engineering of Ministry of Education, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, China; ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China
TIANYU GAO • Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
JIANTAO GUO • Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Nebraska, USA; The Nebraska Center for Integrated Biomolecular Communication (NCIBC), University of Nebraska-Lincoln, Lincoln, Nebraska, USA
BJÖRN HAMBERGER • Department of Biochemistry, Michigan State University, East Lansing, USA
PAUL R. JASCHKE • School of Natural Sciences, Macquarie University, Sydney, New South Wales, Australia; ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, Australia
YUE JU • Laboratory of Synthetic Microbiology, School of Chemical Engineering & Technology, Tianjin University, Tianjin, People’s Republic of China; Key Laboratory of Systems Bioengineering and Frontier Science Center of Synthetic Biology, the Ministry of Education of China, Tianjin University, Tianjin, People’s Republic of China
MINJEONG KANG • Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
KANGSAN KIM • Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
CALVIN LAM • Department of Biochemistry and Molecular Biology, University of Nebraska Medical Center, Omaha, NE, USA
HO JOUNG LEE • Department of Systems Biotechnology, and Institute of Microbiomics, Chung-Ang University, Anseong, Republic of Korea
KOK ZHI LEE • Department of Energy, Environmental & Chemical Engineering, Washington University at St. Louis, St. Louis, MO, USA
SANG JUN LEE • Department of Systems Biotechnology, and Institute of Microbiomics, Chung-Ang University, Anseong, Republic of Korea
SANGCHEON LEE • Department of Chemical and Environmental Engineering, University of California, Riverside, USA
ANTOINE LEVRIER • Université de Paris, INSERM U1284, Center for Research and Interdisciplinarity (CRI), Paris, France; University of Minnesota, Physics and Nanotechnology, Minneapolis, MN, USA
JIAZHANG LIAN • Key Laboratory of Biomass Chemical Engineering of Ministry of Education, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, China; ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China; Zhejiang Key Laboratory of Smart Biomaterials, Zhejiang University, Hangzhou, China


ARIEL LINDNER • Université de Paris, INSERM U1284, Center for Research and Interdisciplinarity (CRI), Paris, France
BRIANA R. LINO • Chemical and Biological Engineering Department, Tufts University, Medford, MA, USA
DOMINIC Y. LOGEL • School of Natural Sciences, Macquarie University, Sydney, New South Wales, Australia; ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, Australia
MARIO ANDREA MARCHISIO • School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, China
TAMARA MATÚTE • ANID-Millennium Science Initiative Program Millennium Institute for Integrative Biology (iBio), Santiago, Chile; FONDAP Center for Genome Regulation, Santiago, Chile; Institute for Biological and Medical Engineering Schools of Engineering, Medicine and Biological Sciences Pontificia Universidad Catolica de Chile, Santiago, Chile
MICHAEL A. MECHIKOFF • Department of Biology, US Air Force Academy, Colorado Springs, CO, USA
STEFANIE S. M. MEIER • Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
MELIAWATI MELIAWATI • Institute of Molecular Microbiology and Biotechnology, University of Münster, Münster, Germany
ITAMAR MENUHIN-GRUMAN • School of Mathematical Sciences, The Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, Israel
ANDREAS MÖGLICH • Department of Biochemistry, University of Bayreuth, Bayreuth, Germany; Bayreuth Center for Biochemistry and Molecular Biology, Universität Bayreuth, Bayreuth, Germany; North-Bavarian NMR Center, Universität Bayreuth, Bayreuth, Germany
LEONARDO MORSUT • The Eli and Edythe Broad CIRM Center, Department of Stem Cell Biology and Regenerative Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Department of Biomedical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
ELINA MULTAMÄKI • Department of Anatomy, University of Helsinki, Helsinki, Finland
DORON NAKI • Shmunis School of Biomedicine and Cancer Research, The George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
BRUCE NASH • Cold Spring Harbor Laboratory, DNA Learning Center, Cold Spring Harbor, NY, USA
WEI NIU • Department of Chemical & Biomolecular Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA; The Nebraska Center for Integrated Biomolecular Communication (NCIBC), University of Nebraska-Lincoln, Lincoln, Nebraska, USA
VINCENT NOIREAUX • University of Minnesota, Physics and Nanotechnology, Minneapolis, MN, USA
ISAAC NÚÑEZ • ANID-Millennium Science Initiative Program Millennium Institute for Integrative Biology (iBio), Santiago, Chile; FONDAP Center for Genome Regulation, Santiago, Chile; Institute for Biological and Medical Engineering Schools of Engineering, Medicine and Biological Sciences Pontificia Universidad Catolica de Chile, Santiago, Chile
CHENG SOON ONG • CSIRO, Data 61, Canberra, ACT, Australia
DIEGO A. OYARZÚN • School of Informatics, University of Edinburgh, Edinburgh, UK; School of Biological Sciences, University of Edinburgh, Edinburgh, UK; The Alan Turing Institute, London, UK


QINGXIAO PANG • Shandong Lishan Biotechnology Co. LTD, Jinan, People’s Republic of China
QINGSHENG QI • State Key Laboratory of Microbial Technology, Shandong University, Qingdao, People’s Republic of China
KARTHIK RAMAN • Bhupat and Jyoti Mehta School of Biosciences, IIT Madras, Chennai, India; Robert Bosch Centre of Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai, India; Initiative for Biological Science and Systems mEdicine (IBSE), IIT Madras, Chennai, India
ADITHYA RAMESH • Department of Chemical and Environmental Engineering, University of California, Riverside, USA
AMÉRICO T. RANZANI • Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
GIULIA RAVAGNAN • Institute of Molecular Microbiology and Biotechnology, University of Münster, Münster, Germany
TIMOTHY J. RUDGE • Interdisciplinary Computing and Complex Biosystems, School of Computing, Newcastle University, Newcastle upon Tyne, UK
DANIEL SCHINDLER • Max Planck Institute for Terrestrial Microbiology, Marburg, Germany; Center for Synthetic Microbiology (SYNMIKRO), Philipps-University Marburg, Marburg, Germany
JOCHEN SCHMID • Institute of Molecular Microbiology and Biotechnology, University of Münster, Münster, Germany
MACARENA MUÑOZ SILVA • Interdisciplinary Computing and Complex Biosystems, School of Computing, Newcastle University, Newcastle upon Tyne, UK
KEVIN V. SOLOMON • Department of Chemical & Biomolecular Engineering, University of Delaware, Newark, DE, USA
XINYU SONG • Laboratory of Synthetic Microbiology, School of Chemical Engineering & Technology, Tianjin University, Tianjin, People’s Republic of China; Key Laboratory of Systems Bioengineering and Frontier Science Center of Synthetic Biology, the Ministry of Education of China, Tianjin University, Tianjin, People’s Republic of China; Center for Biosafety Research and Strategy, Tianjin University, Tianjin, People’s Republic of China
TIANYUAN SU • State Key Laboratory of Microbial Technology, Shandong University, Qingdao, People’s Republic of China
HEIKKI TAKALA • Department of Anatomy, University of Helsinki, Helsinki, Finland; Department of Biological and Environmental Science, Nanoscience Center, University of Jyvaskyla, Jyvaskyla, Finland
ARUN K. TANGIRALA • Department of Chemical Engineering, Indian Institute of Technology, Madras (IIT Madras), Chennai, India; Robert Bosch Centre of Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai, India; Initiative for Biological Science and Systems mEdicine (IBSE), IIT Madras, Chennai, India
ELLINA TROFIMOVA • School of Natural Sciences, Macquarie University, Sydney, New South Wales, Australia; ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, Australia
TAMIR TULLER • Department of Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel; The Sagol School of Neuroscience, Tel-Aviv University, Tel Aviv, Israel
YARIN UDI • Department of Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel
JAMES A. VAN DEVENTER • Chemical and Biological Engineering Department, Tufts University, Medford, MA, USA; Biomedical Engineering Department, Tufts University, Medford, MA, USA


GONZALO VIDAL • Interdisciplinary Computing and Complex Biosystems, School of Computing, Newcastle University, Newcastle upon Tyne, UK
CAROLUS VITALIS • Department of Electrical, Computer, and Energy Engineering, University of Colorado Boulder, Boulder, Colorado, USA
GUANYUAN WANG • MOE International Joint Research Laboratory on Synthetic Biology and Medicines, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, P. R. China
IAN WHEELDON • Department of Chemical and Environmental Engineering, University of California, Riverside, USA
HADER YEHEZKELI • Department of Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel
LIFANG YU • School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, China
CHONG ZHANG • MOE Key Laboratory for Industrial Biocatalysis, Institute of Biochemical Engineering, Department of Chemical Engineering, Tsinghua University, Beijing, China; Center for Synthetic and Systems Biology, Tsinghua University, Beijing, China
WEIWEN ZHANG • Laboratory of Synthetic Microbiology, School of Chemical Engineering & Technology, Tianjin University, Tianjin, People’s Republic of China; Key Laboratory of Systems Bioengineering and Frontier Science Center of Synthetic Biology, the Ministry of Education of China, Tianjin University, Tianjin, People’s Republic of China; Center for Biosafety Research and Strategy, Tianjin University, Tianjin, People’s Republic of China
YIKANG ZHOU • MOE Key Laboratory for Industrial Biocatalysis, Institute of Biochemical Engineering, Department of Chemical Engineering, Tsinghua University, Beijing, China

Part I Gene Circuits and Biochemical Pathways

Chapter 1 Plant Engineering to Enable Platforms for Sustainable Bioproduction of Terpenoids

Jacob D. Bibik and Björn Hamberger

Abstract

Terpenoids represent the most diverse class of natural products, with a broad spectrum of industrial relevance including applications in green solvents, flavors and fragrances, nutraceuticals, colorants, and therapeutics. They are typically challenging to extract from their natural sources, where they occur in small amounts and in mixtures with related but unwanted byproducts. Formal chemical synthesis, where established, is reliant on petrochemistry. Hence, there is great interest in developing sustainable solutions to assemble biosynthetic pathways in engineered host organisms. Metabolic engineering for chemical production has largely focused on microbial hosts, yet plants offer a sustainable production platform. Plants contain the precursor pathways that generate the terpenoid building blocks, along with the cell structures and compartments required for, or amenable to, localization of the enzymes involved. They may also provide a low-input system to produce these chemicals using only carbon dioxide and sunlight. There have been significant recent advancements in the discovery of pathways to terpenoids of interest as well as strategies to boost yields in host plants. While part of the phytochemical field is focusing on the discovery of biosynthetic pathways, this review focuses on advancements in using the pathway toolbox to engineer plants for the production of terpenoids. We highlight strategies currently used to produce target products, optimization of known pathways to improve yields, compartmentalization of pathways within cells, and genetic tools developed to facilitate complex engineering of biosynthetic pathways. These advancements in synthetic biology are bringing engineered plant systems closer to commercially relevant hosts for the bioproduction of terpenoids.
Key words Terpene, Engineering, Terpene synthase, Synthetic biology, Metabolic engineering, Bioproduction, Compartmentalization, Regulation.

Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_1, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024

1 Introduction

Plants contain enormous chemical diversity, which humans have recognized for thousands of years, especially in traditional medicinal plants [1–3]. Since the earliest plants with highly limited terpene diversity began to colonize the land [4], this diversity has exploded as rapidly speciating plants have evolved an array of specialized metabolites, with roles in adaptation and response to various environmental conditions, both biotic and abiotic, as well as signaling and communication. More than 1 million unique metabolites are estimated to be produced across all plants [5, 6], which establishes them as excellent natural product factories. This potential can be further expanded through engineering for biotechnological applications [7, 8]. Terpenes, and the further functionalized terpenoids, are a particularly diverse group of natural products with 66,650 of the total 86,000 reported structures identified in plants, according to the Dictionary of Natural Products (Dictionary of Natural Products 31.1, https://dnp.chemnetbase.com/ accessed 9/2022). While terpenoids have important roles within central metabolism across plant species, the majority have evolved as specialized metabolites, often only found in specific plant lineages or species [9–11]. The expansive chemical diversity of terpenes and terpenoids far surpasses their known roles in plants [12]. Yet, these specific natural product-accumulating plants have been exploited throughout human history as culinary herbs and spices, medicinal plants, for resins, and other traditional applications [13]. The current global market for terpene flavors and fragrances (C10–C30) is expected to reach nearly $5B by 2024 with an annual increase in demand of 5.2% [14]. There has been growing interest in advancing synthetic biology approaches to engineer plants into chemical factories through engineering of biosynthetic pathways for terpenoids important to modern society, while even further expanding beyond naturally occurring terpenoid chemistry [9, 11, 15–17]. The discovery and installation of these biosynthetic pathways have enabled the production of not only natural terpenoids but also the creation of new-to-nature structures by combining modular pathways [18–20].
Much of the success in plant engineering has involved simple pathways, i.e., short terpene pathways, but strategies are being developed to build multigene complex pathways by controlling how, where, and when terpenoids are produced within plants. In addition to the innovations with pathway engineering, there have been advancements in genetic engineering tools. Yet, to enable the engineering of longer and more complex pathways beyond terpene synthases, i.e., downstream enzymes responsible for the functionalization and decoration of terpene scaffolds, advanced tools will be needed. Here, we review recent strategies to engineer plants for the production of terpenoids, optimization of biosynthetic pathways and carbon flux, compartmentalization of pathways and products, and advancements in genetic tools to build plant hosts with increasingly larger and more complex pathways.

2 Terpenoid Production in Plants

Terpenoids are found across all kingdoms of life and are formed from common C5 building blocks. These building blocks, isopentenyl diphosphate (IDP) and dimethylallyl diphosphate (DMADP), are isomers synthesized via two pathways that are both present in plants, although most non-plant organisms contain only one. The mevalonate (MVA) pathway, which is commonly found in eukaryotic organisms, synthesizes a cytosolic pool of IDP/DMADP starting with condensation of three acetyl-CoA molecules in two enzymatic steps to form 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA). HMG-CoA is then reduced by HMG-CoA reductase (HMGR) to form mevalonate, the committed step of the MVA pathway. The methylerythritol 4-phosphate (MEP) pathway, often found in prokaryotic organisms and characteristic of plant plastids, synthesizes a pool of IDP/DMADP, beginning with the condensation of pyruvate and glyceraldehyde 3-phosphate (GAP). This reaction forms 1-deoxy-D-xylulose 5-phosphate (DXP) and is catalyzed by DXP synthase (DXS). In addition to this natural compartmentalization of IDP/DMADP accumulation, the enzymes involved in the immediate terpene precursor and scaffold biosynthesis are also co-localized to access the respective building block pools in each compartment. Typically localized to plastids are the diphosphate synthases, which form the C10 geranyl diphosphate (GDP) or the C20 geranylgeranyl diphosphate (GGDP) through condensation of two or four IDP/DMADP molecules by GDP synthase or GGDP synthase (GGDPS), respectively. Furthermore, mono- and diterpene synthases co-localized to plastids utilize GDP and GGDP to synthesize the C10 mono- and C20 diterpenes, respectively. Some plants are native emitters of the C5 hemiterpene isoprene and contain an isoprene synthase (ISPS), which also typically resides in the plastid. In the cytosol, three IDP/DMADP molecules form the C15 intermediate farnesyl diphosphate (FDP) through condensation by FDP synthase (FDPS), which serves as the substrate for sesquiterpene synthases to synthesize sesquiterpenes.
Localized to the endoplasmic reticulum is squalene synthase (SQS), which condenses two FDP molecules to form the C30 squalene, which is the precursor to sterols and triterpenoids. There have been many research efforts to understand and engineer these pathways while developing hosts for biosynthesis of terpenoids as direct products and intermediates in semi-chemical synthesis, e.g., refs [21–23]. In some instances, plant species that are known to synthesize particular terpenoids have been manipulated to enable larger-scale production. For example, the diterpene (-)-sclareol from Salvia sclarea (clary sage) is commonly extracted as an intermediate of the semi-chemical synthesis of ambroxide, a high-value product used in the fragrance industry [24]. Ambroxide was historically extracted from sperm whale ambergris but the biosynthetic pathway is unknown, although bacterial pathways have recently been engineered to synthesize the triterpenoid precursor, ambrein, of which ambroxide is an oxidized product [25, 26]. In another example, plant tissue culture techniques were developed as the production platform for paclitaxel, a complex diterpenoid found in yew trees (Taxus spp.) [27]. Under standard growth conditions, paclitaxel was not produced at industrially relevant yields, but the development and optimization of tissue culture enabled efficient production, allowing the compound to become widely used in chemotherapy. While these examples demonstrate the ingenuity to produce high-value terpenoids from native species, this is not feasible for most plants or terpenoids. To this end, engineering plants with specific biosynthetic pathways and strategies to optimize yields has become a focus in synthetic biology and will be in line with developing the potential of plants for a sustainable future [28]. While microbial fermentation is often used for the production of chemicals like terpenoids, engineered plants present an opportunity for a low-input chassis system by utilizing sunlight and CO2 to produce large amounts of biomass [29, 30]. This can be especially effective in high biomass-producing, non-food crops, which can be grown on marginal lands not suitable for food crops. Additionally, plants already contain cellular structures, co-enzymes, and precursor pathways to support the production of many natural products. There are many examples of effective plant engineering to produce valuable terpenoids with biotechnological importance. A broad spectrum of terpenoid pathways has been engineered in tobacco (Nicotiana spp.), which has become extensively used due to the ease of generating transgenic plants, as well as transient expression systems to rapidly test pathways [11, 15–17, 19, 31–36]. Other model plants have also been used for terpenoid production, including tomato [37], Arabidopsis [38, 39], Camelina [40], and even moss [4, 41].
These platforms have been employed to produce volatile terpenoids including the hemiterpene isoprene [33], monoterpenoids [15, 38, 39], and sesquiterpenoids [40, 41], many of which are commonly used in the flavor and fragrance industry. The longer chain, non-volatile terpenoids have also been of great interest; combinatorial strategies have expanded the number of diterpenoids that can be produced in planta [11, 19, 42], and these compounds have potential roles in a range of industries. Additionally, there have been efforts to engineer the production of squalene as well as downstream triterpenoids and sterols, which have applications in cosmetic oils, biofuels, or pharmaceuticals [16, 17, 35]. Finally, there has been a focus on engineering tetraterpenoids and, in particular, carotenoids, which are of interest as natural pigments and nutritional additives [32, 36]. While many studies have enabled the synthesis of an array of terpenoids, yields are often low because the pathways are not optimized for production, and engineered pathways can also have deleterious effects and unbalance native metabolism. We will next discuss strategies to improve production in plant hosts, through pathway optimization and compartmentalization.
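The carbon bookkeeping behind the precursor pools described in this section can be made explicit with a short calculation. The following is only an illustrative sketch: the dictionary and function names are hypothetical, and the numbers of C5 units are those stated in the text (two for GDP, three for FDP, four for GGDP, and two FDP molecules for squalene).

```python
# Illustrative sketch: carbon counts of the terpenoid precursors named above.
# All identifiers here are hypothetical helpers, not part of any published tool.

C5_UNIT = 5  # carbons in one IDP or DMADP building block

# Number of C5 units condensed into each precursor, as stated in the text.
PRECURSOR_UNITS = {
    "IDP/DMADP": 1,  # substrate pool for hemiterpenes such as isoprene
    "GDP": 2,        # substrate for monoterpene synthases
    "FDP": 3,        # substrate for sesquiterpene synthases
    "GGDP": 4,       # substrate for diterpene synthases
}


def carbons(precursor: str) -> int:
    """Return the carbon count of a prenyl diphosphate precursor."""
    return PRECURSOR_UNITS[precursor] * C5_UNIT


def squalene_carbons() -> int:
    """Squalene synthase condenses two FDP molecules into squalene."""
    return 2 * carbons("FDP")


if __name__ == "__main__":
    for name in PRECURSOR_UNITS:
        print(f"{name}: C{carbons(name)}")
    print(f"squalene: C{squalene_carbons()}")
```

Keeping the unit counts in one table makes it easy to check that each terpene class draws on the precursor pool of the compartment described above.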

3 Engineering Strategies to Improve the Bioproduction of Terpenoids in Plants

Approaches have been developed to increase metabolic flux toward desired terpenoids including overexpression of key bottleneck enzymes in the MEP/MVA pathways, engineering de-regulated variants of enzymes, e.g., a truncated HMGR, or mutated DXS [43, 44], incorporation of alternative contributors to influence IDP/DMADP pools [38, 39, 45], and silencing of competing pathways [34]. Each of these approaches either improves the availability of or redirects IDP/DMADP for the production of desired terpenoids. For example, overexpression of rate-limiting steps of the MVA or MEP pathways, HMGR and DXS, enables significant increases in terpenoids by increasing metabolic flux toward IDP/DMADP [45–48]. These enzymes have also been targeted for engineering to reduce feedback inhibition from the end products IDP and DMADP [43, 44]. Other strategies have been developed via alternative contributions to the MVA/MEP pathways or IDP/DMADP formation directly. One set of enzymes, phosphomevalonate decarboxylase (MPD) and isopentenyl phosphate kinase (IPK), was used to recreate an archaeal pathway in plants, which performs the final two steps of the MVA pathway to form IDP/DMADP, but essentially in reverse order [38, 39]. Another group found that overexpressing the gene for Biotin Carboxyl Carrier Protein 1 (BCCP1) can disrupt the proper formation of the acetyl-CoA carboxylase complex, increasing the availability of acetyl-CoA to enter the MVA pathway by reducing conversion to malonyl-CoA in plastids [45]. Virus-induced gene silencing has been pursued to reduce the conversion of shared precursors and redirect them toward desired terpenoids. For example, silencing expression of phytoene synthase, which converts GGDP to phytoene, a carotenoid precursor, increased the production of the diterpene taxadiene, which is also derived from GGDP [34].
There are also other enzyme engineering strategies from microbes that may be of value in plants. For example, bacterial enzyme variants of DXS have been developed that synthesize DXP from ribulose 5-phosphate instead, potentially providing an unregulated mechanism to improve MEP pathway flux in plants [49]. Additionally, random mutagenesis approaches developed in Escherichia coli to improve enzyme functions may yield enzymes that can ultimately be used for plant production. For example, engineered GGDPS variants created through random mutagenesis showed improved lycopene production in screening as well as increased microbial production of the diterpene levopimaradiene [50].

4 Compartmentalization of Pathways

Strategies have been developed to improve terpenoid production by re-targeting pathways between the cytosol and plastids [15, 17, 31, 36, 51], creating synthetic compartments [17, 31], and targeting pathways to specific tissues [52–58]. Plants naturally compartmentalize terpenoid biosynthesis within cells as well as across different tissue types, plausibly to insulate specific routes from competition with general metabolism, to utilize dedicated pools of precursor building blocks, and for their specialized roles. This strategy can be applied to engineer novel pathways. Re-targeting terpenoid biosynthesis from the cytosol to plastids has been shown to improve terpenoid production [15, 16], as has re-targeting natively plastidial pathways to the cytosol [36]. Mitochondria have also been targeted for the production of terpenoids, in particular sesquiterpenes, which are typically produced in the cytosol [51, 59–61]. Hijacking native organelles may enable the production of a target compound while reducing negative regulation or competition present in the native compartments. Furthermore, strategies have been developed to improve the production of hydrophobic terpenoids using lipid droplets as synthetic storage compartments [17, 31]. Specifically, ectopic expression of the gene encoding WRINKLED1, the master regulator of plastidial fatty acid metabolism from Arabidopsis, together with a cytosolic or plastidial Lipid Droplet Surface Protein from the microalga Nannochloropsis, afforded lipid droplets in the respective subcellular compartment. Co-production of both cytosolic and plastidial lipid droplets and terpenoid pathways was shown to improve overall yields, plausibly by sequestration of the bioproducts, diterpenoid resin acids and squalene [31, 62].
Re-engineering subcellular compartmentalization can not only improve terpenoid production but also alleviate the negative effects these pathways may have on the plant host, as demonstrated in tobacco [16, 17]. Recently, an oleosin-based strategy was successful in synthesizing squalene, a triterpenoid, in plastids and trapping it in plastidial lipid droplets [17]. These novel organelles are also amenable to anchoring distinct steps of the biosynthetic pathways through protein fusion to the surface of the lipid droplet, and this approach was found to lead to the efficient production of terpenoids [31]. Control and manipulation of compartments and subcellular localization are, therefore, effective strategies to further improve terpenoid production in plants. In addition to the natural subcellular localization of different pathways, plants have naturally developed differential expression and accumulation of pathways and products across different tissues, which also provide opportunities for engineering [9, 63–65]. Terpenoid biosynthesis is seen in specific tissues, and accumulation of terpenoids can even be localized to single cell types and structures associated with these tissues. For example, specialized cork cells [63] or secretory ducts in root tissue [66], resin ducts in stems [67], and other oleoresin structures commonly found in conifers [68] have been shown to synthesize and store terpenoids. Similarly, glandular trichomes on leaf tissue are known to accumulate a variety of specialized metabolites, including terpenoids, and can be specifically engineered for terpenoid production [69]. Flowers are also known to specifically produce terpenoids, especially volatile variants [64], and can even emit and store these products in the stigma through a natural fumigation process in unopened flower buds [70]. Building from natural biosynthesis examples, engineering specific tissues, cell types, and structures may allow for greater control of terpenoid production in plant hosts.

5 Engineering Plants for the Production of Terpenoids

Development of engineering strategies is often performed using Agrobacterium-mediated transient expression, which allows rapid testing of pathways and biological parts in a matter of days as opposed to the months or years that are required for creating stable transgenic plants. Pathways can be quickly tested in a combinatorial approach by mixing parts, which enables the characterization of biosynthetic pathways and the construction of novel terpenoid pathways [11, 35, 71]. Transient expression is typically localized to the infiltrated leaf tissue, although some methods have been developed to transiently express in other tissues, e.g., wood-forming tissue in Eucalyptus, or roots in Chinese cabbage [72, 73]. It is possible to produce potentially commercially relevant yields of terpenoids using Agrobacterium-mediated transient expression in lab plants like tobacco [35], but stable transformation may allow more sustainable and large-scale production of compounds. Furthermore, engineering terpenoid production in crops with industrial uses may be a strategy to add value to existing infrastructure while reducing the cost of producing terpenoids of interest [74]. Engineering fast-growing, high biomass-producing crops like poplar and sorghum (Sorghum spp.) has become a major focus to generate bioenergy and bioproduct feedstocks. These crops are typically desirable as they represent significant lignocellulosic feedstocks that can be converted to simple sugars and monolignols for microbial production of biofuels and bioproducts, like terpenoids [75, 76]. Their robust growth and established transformation protocols also make bioenergy crops a candidate for engineering production of specialty chemicals and other bioproducts. A technoeconomic analysis modeling sorghum biomass showed that engineering the crops to yield a variety of compounds, including terpenoids, may improve the economics when extracting the compounds prior to lignocellulosic biomass processing [74]. This strategy may also be effective for a woody bioenergy crop like poplar, which is commonly used in the pulp and paper industry and in the manufacture of oriented strand board (OSB). Poplar has previously been engineered for the production of specialty chemicals derived from phenylpropanoid aromatics, which serve in plants as common precursors of the monolignols that make up lignin [77, 78]. These studies demonstrate that poplar is valuable not only as a bioenergy crop but also as a platform for the direct production of high-value chemicals. Poplar has also been extensively studied because many species can synthesize and emit substantial amounts of isoprene, promising a metabolic capacity to produce large amounts of terpenoids if engineered [79]. Furthermore, isoprene has a large role in climate change, as massive amounts, over 500 Tg year⁻¹ from all plants [80], are emitted each year, and reducing emissions in commercial poplar would be consistent with sustainable strategies [81]. Engineering poplar to re-route IDP/DMADP carbon from isoprene toward the production of target terpenoids may be an effective strategy for high-value products while reducing isoprene emissions in poplar plantations. Combinatorial assays and co-expression of complex pathways are easily performed by transient expression, i.e., mixing of Agrobacterium strains harboring different plasmids with individual target genes. The best-established host system is the tobacco relative N. benthamiana, but recent advances demonstrated transient expression in the biomass species poplar and sorghum [82, 83]. Transient expression does not require gene stacking in multigene constructs, nor does it rely on plant selection markers because genomic integration is not required.
Additionally, transient expression typically relies on strong constitutive promoters, like the cauliflower mosaic virus (CaMV) 35S promoter [84], with no concern over temporal or spatial expression regulation. These expression constructs can also be paired with over-expression of the gene encoding viral P19 protein, which can suppress RNA silencing in the host plants, as is seen in the pEAQ vector series [85]. Including viral silencing suppressors like P19 when generating stable transformants, however, often leads to developmental issues as the suppression of RNA silencing is not specific [86]. Therefore, many of the strategies that have been optimized for transient expression are not directly translatable to generating stable transformants, but recent innovations in genetic engineering are enabling more complex pathway design.
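
The combinatorial mixing of Agrobacterium strains described above can be planned with a few lines of code. The following is a minimal sketch, not a protocol from this chapter: the strain labels, the stock OD600 of 2.0, the per-strain OD600 of 0.2, and the 10 mL final volume are all assumptions chosen for illustration, and the helper functions `combinatorial_mixes` and `stock_volumes` are hypothetical names. It enumerates one strain per pathway step, adds a P19 helper strain to every mix, and uses C1 × V1 = C2 × V2 to work out how much of each stock culture to combine.

```python
from itertools import product


def combinatorial_mixes(candidates_per_step, helper_strains=()):
    """Yield every mix: one strain per pathway step, plus any helper strains."""
    step_names = list(candidates_per_step)
    for choice in product(*(candidates_per_step[s] for s in step_names)):
        yield tuple(choice) + tuple(helper_strains)


def stock_volumes(mix, stock_od600, od_per_strain, final_volume_ml):
    """Return (mL of each stock culture, mL of buffer) for one mix.

    Uses C1 * V1 = C2 * V2 so that every strain ends up at od_per_strain.
    """
    volumes = {
        strain: od_per_strain * final_volume_ml / stock_od600[strain]
        for strain in mix
    }
    buffer_ml = final_volume_ml - sum(volumes.values())
    if buffer_ml < 0:
        raise ValueError("stock cultures are too dilute for this mix")
    return volumes, buffer_ml


# Hypothetical strain labels; each stands for one Agrobacterium culture
# carrying a plasmid with a single target gene.
candidates = {
    "precursor_supply": ["DXS_strain", "HMGR_strain"],
    "terpene_synthase": ["TPS_a", "TPS_b", "TPS_c"],
    "oxidation": ["CYP_x", "CYP_y"],
}
mixes = list(combinatorial_mixes(candidates, helper_strains=("P19",)))

# Assumed numbers, for illustration only.
stocks = {strain: 2.0 for mix in mixes for strain in mix}
volumes, buffer_ml = stock_volumes(
    mixes[0], stocks, od_per_strain=0.2, final_volume_ml=10.0
)
print(len(mixes), "mixes; first mix:", mixes[0])
print(volumes, "buffer mL:", round(buffer_ml, 2))
```

Because each gene sits on its own plasmid in its own strain, adding a candidate to one step only lengthens one list; no new multigene construct has to be built, which is the practical advantage the text attributes to transient expression.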

6 Advancements in Plant Genetic Engineering

When engineering larger and more complex pathways in stable transformants, the plants are typically transformed with multiple genes through one of three strategies: (i) consecutive re-transformations with different genes, (ii) co-transformations with genes on multiple constructs, or (iii) transformation with multigene constructs [87, 88]. Consecutive re-transformation can take months or years, co-transformations are inefficient and require multiple selection markers, and multigene platforms require several unique promoters, resulting in large constructs with low transformation efficiency. There have been advancements in the assembly of larger constructs, which enable more efficient transformation with large DNA fragments. For example, Collier and co-workers developed a recombination system to assemble a large construct with a 28.5 kb transfer DNA region, which required multiple promoters and two selection markers to ensure genomic integration and expression of all genes [89]. Shih and co-workers reported efficient gene stacking of the three-enzyme bisabolene biosynthetic pathway through an elegant assembly utilizing yeast homologous recombination [90]. A third example is the direct in vivo assembly of multiple linear DNA fragments by homologous recombination. King and co-authors demonstrated the efficient deletion of a genomic locus encoding diterpene metabolism and re-routing of carbon into the sesquiterpene amorphadiene in the moss Physcomitrium patens (syn. Physcomitrella patens) [91]. However, this approach is limited to organisms with effective homologous recombination. Technologies like these provide insights to begin building larger constructs, but diverse genetic elements are still needed for more complex metabolic engineering in plants. A set of regulatory elements has traditionally been used to reliably overexpress genes of engineered pathways. 
These are from viral sources like the CaMV 35S promoter [84] and terminator [92], bacterial sources like the promoter and terminator sequences from various opine synthases [93], or plant sources like the maize Ubiquitin [94] or the rice Actin promoter [95]. Additionally, the use of 5′ and 3′ untranslated regions (UTRs) has been applied to increase expression, in particular, the UTRs from cowpea mosaic virus [96], which are implemented in the pEAQ-HT vectors [97]. These regulatory tools have proven effective within plant biotechnology when engineering high expression of genes and pathways, but additional tools are being developed to further improve expression regulation, tissue specificity, and gene stacking abilities from natural and synthetic elements [52, 53, 55, 98–105]. A key consideration for metabolic engineering is the regulation of gene expression strengths and tissue specificity, which is largely

controlled by promoters in plants. A central theme is creating synthetic promoters for robust expression, mainly by combining known elements from different sequences, as recently reviewed [98]. In general, synthetic promoter design is less advanced in plants than in microbial systems, but there have been major strides recently in developing synthetic and tunable plant promoters [100–104]. For example, Cai and co-workers devised a strategy to computationally design constitutive, synthetic minimal promoters, which demonstrated a range of expression strengths [101]. In a recent study by Jores and co-workers, a comprehensive analysis of Arabidopsis, Zea mays, and Sorghum bicolor core promoter sequences was performed to create a series of synthetic variants [103]. A recent review has summarized advancements in the synthetic regulation of pathways through post-transcriptional and translational approaches in addition to promoter design [99]. Post-transcriptional engineering approaches include UTRs on the 5′ and 3′ ends of a transcript [105, 106] as well as terminator sequences [107]. Synthetic riboswitches have also been developed for translational control of pathways [108, 109], enabling inducible translation of mRNA from target genes. These strategies yield additional layers of regulation by influencing gene expression, mRNA stability, and translation [110, 111]. In addition to expression tunability, regulating where multigene pathways are expressed may be especially important to dictate where products like terpenoids accumulate and to reduce potential adverse effects on plant development. A number of promoters have been developed to regulate tissue specificity [52–55, 57, 58, 112]. 
Engineering high specificity can prove advantageous, as has been demonstrated in leaf oil production using a leaf senescence-specific promoter to reduce pleiotropic effects of accumulation while obtaining oilseed-like levels in more biomass than traditional seed production [113]. In the oilseed crop Camelina sativa, the promoter and terminator of the soybean oleosin gene mediated seed-specific expression of mono- and sesquiterpene pathways and afforded accumulation of up to about 0.08 mg g⁻¹ seed of (4S)-limonene per day [40]. In addition to tissue specificity, strategies have been developed to efficiently express multiple genes in a single construct. For example, an elegant system has been developed in which a synthetic activator gene is placed under the control of an endosperm-specific promoter; when expressed, the activator drives the expression of multiple downstream genes under the control of synthetic promoters responsive to it [100]. A simpler approach to the expression of multiple genes is the use of bidirectional promoters, which have been isolated from native genomic sequences [56, 58, 114, 115] or created synthetically by fusing unidirectional promoters, like the CaMV 35S, tail-to-tail, i.e., CAT-3′-5′::5′-3′-AUG [52, 116, 117]. Bidirectional

promoters have been well studied in microbial engineering [118, 119] but have yet to be used more broadly for plant engineering. In combination with linker peptides like the self-cleaving 2A peptide from the foot-and-mouth disease virus [120] or the more efficient hybrid LP4/2A linker [121], bidirectional promoters may enable polycistronic expression on either side of the promoter. These strategies to regulate the expression of multiple genes would enable more complex gene stacking to build entire metabolic pathways in a single construct used for plant transformation. This effort is complementary to the vision shared in the field of Plant Synthetic Biology for a unified catalog of standardized, characterized DNA parts that will accelerate plant bioengineering [122]. There have been many advancements in genetic tools for plant engineering, and these will aid in engineering precise regulation of metabolic pathways. Combining tunable expression regulation, tissue specificity, and compact construct assemblies will enable the complex engineering of large multigene metabolic pathways. While some of these tools may have limited functionality outside specific plant species, the further discovery of species-agnostic tools is an attractive endeavor. Those approaches may at least provide inspiration to further engineer regulatory tools for chassis species more predisposed to the production of high-value small molecules like terpenoids. Additionally, improved gene stacking strategies will help avoid issues such as silencing of repeated promoter regions and limitations in selectable markers, and will yield smaller constructs for more reliable transformation with a larger number of genes.
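
The two design pressures named in this section, construct size and reuse of the same regulatory parts, can be checked before any cloning is done. The sketch below is an illustration under stated assumptions rather than an established tool: the `Cassette` record, the `audit_construct` function, all part lengths, and the 10 kb size threshold are hypothetical and would need to be replaced with measured values and a limit appropriate to the transformation system in use.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Cassette:
    """One expression cassette in a multigene design (lengths are made up)."""
    promoter: str
    cds: str
    terminator: str
    length_bp: int


def audit_construct(cassettes, size_limit_bp):
    """Summarize total size and reused regulatory parts of a multigene design."""
    promoters = Counter(c.promoter for c in cassettes)
    terminators = Counter(c.terminator for c in cassettes)
    total_bp = sum(c.length_bp for c in cassettes)
    return {
        "total_bp": total_bp,
        "over_size_limit": total_bp > size_limit_bp,
        "repeated_promoters": sorted(p for p, n in promoters.items() if n > 1),
        "repeated_terminators": sorted(t for t, n in terminators.items() if n > 1),
    }


# Hypothetical three-gene design; the size limit is an arbitrary example value.
design = [
    Cassette("CaMV35S", "geneA", "35S_term", 3200),
    Cassette("CaMV35S", "geneB", "nos_term", 2900),
    Cassette("ZmUbi", "geneC", "nos_term", 4100),
]
report = audit_construct(design, size_limit_bp=10_000)
print(report)
```

A report that lists repeated promoters flags exactly the situation the text warns about, and suggests where a different promoter, a bidirectional promoter, or a 2A-linked polycistron might be substituted.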

7 Conclusions

While plants are natural chemical factories, advancements in metabolic engineering have facilitated redesigning plants as chassis for the production of chemicals including terpenoids. With developments in engineering the MEP and MVA pathways and downstream enzymes, terpenoid production at industrially relevant yields may come within reach. For this production to become economically viable, it will be important to consider the optimization of pathways, compartmentalization of pathways and storage of products, and the proper plant host, organ, tissue, or cell type for production and sequestration. From a techno-economic perspective, yields, purity, ease of extraction, and alignment with downstream processing of the residual biomass will be critical. Furthermore, developing the genetic tools for the construction of pathways in hosts will be key. Controlling tissue and cell type specificity, along with intracellular compartmentalization, will enable precision engineering of terpenoid production, which could improve yields while limiting or eliminating negative effects on the host. Creative gene stacking with diverse promoters will enable expansion to engineering larger terpenoid pathways while reducing potential silencing due to expression from repeated promoter sequences. Furthermore, establishing verified and unified libraries of regulatory elements with varying expression strengths will allow expression tunability for each gene in a pathway, providing the tools to begin regulating metabolic pathway stoichiometry in plants.
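
To make the idea of tuning pathway stoichiometry from a promoter library concrete, the following sketch assigns each gene the unused promoter whose relative strength is closest to a target expression level. Everything in it is assumed for illustration: the promoter names, their relative strengths, the target levels, and the function name `assign_promoters` are hypothetical, and the greedy strategy is a simple stand-in that is not guaranteed to find the best overall assignment.

```python
def assign_promoters(target_levels, library):
    """Greedily give each gene the unused promoter closest to its target level.

    target_levels: gene -> desired relative expression (same scale as library)
    library: promoter name -> relative strength (hypothetical measurements)
    """
    if len(library) < len(target_levels):
        raise ValueError("need at least one distinct promoter per gene")
    available = dict(library)
    assignment = {}
    # Handle the highest targets first so they get first pick of strong parts.
    for gene, target in sorted(target_levels.items(), key=lambda kv: -kv[1]):
        best = min(available, key=lambda name: abs(available[name] - target))
        assignment[gene] = best
        del available[best]  # use each promoter once to avoid repeated sequences
    return assignment


# Hypothetical library and targets on a 0-to-1 relative scale.
library = {"synP_1": 1.00, "synP_2": 0.55, "synP_3": 0.30, "synP_4": 0.10}
targets = {"geneA": 1.0, "geneB": 0.5, "geneC": 0.25}
plan = assign_promoters(targets, library)
print(plan)
```

Using each promoter only once reflects the concern raised above about silencing from repeated promoter sequences; a larger, well-characterized library gives the assignment more room to match the targets closely.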

Acknowledgments

This work was supported by the Great Lakes Bioenergy Research Center, U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research under Award Number DE-SC0018409. We would also like to acknowledge partial support from the Department of Biochemistry and Molecular Biology startup funding and support from AgBioResearch (MICL02454). We collectively acknowledge that Michigan State University occupies the ancestral, traditional, and contemporary Lands of the Anishinaabeg—Three Fires Confederacy of Ojibwe, Odawa, and Potawatomi peoples. In particular, the University resides on Land ceded in the 1819 Treaty of Saginaw. We recognize, support, and advocate for the sovereignty of Michigan’s twelve federally recognized Indian nations, for historic Indigenous communities in Michigan, for Indigenous individuals and communities who live here now, and for those who were forcibly removed from their Homelands. By offering this Land Acknowledgement, we affirm Indigenous sovereignty and will work to hold Michigan State University more accountable to the needs of American Indians and Indigenous peoples.

References

1. Leroi-Gourhan A (1975) The flowers found with Shanidar IV, a Neanderthal Burial in Iraq. Science (1979) 190(4214):562–564
2. Gurib-Fakim A (2006) Medicinal plants: traditions of yesterday and drugs of tomorrow. Mol Asp Med 27(1):1–93
3. Newman DJ, Cragg GM (2016) Natural products as sources of new drugs from 1981 to 2014. J Nat Prod [Internet] 79(3):629–661. Available from: https://doi.org/10.1021/acs.jnatprod.5b01055
4. Banerjee A, Arnesen JA, Moser D, Motsa BB, Johnson SR, Hamberger B (2019) Engineering modular diterpene biosynthetic pathways in Physcomitrella patens. Planta [Internet] 249(1):221–233. Available from: https://doi.org/10.1007/s00425-018-3053-0
5. Afendi FM, Okada T, Yamazaki M, Hirai-Morita A, Nakamura Y, Nakamura K, Ikeda S, Takahashi H, Altaf-Ul-Amin M, Darusman LK, Saito K, Kanaya S (2012) KNApSAcK family databases: integrated metabolite–plant species databases for multifaceted plant research. Plant Cell Physiol 53(2):e1
6. Dixon RA, Strack D (2003) Phytochemistry meets genome analysis, and beyond. Phytochemistry 62(6):815–816
7. Yuan L, Grotewold E (2015) Metabolic engineering to enhance the value of plants as green factories. Metab Eng 27:83–91
8. Huchelmann A, Boutry M, Hachez C (2017) Plant Glandular Trichomes: natural cell factories of high biotechnological interest. Plant Physiol 175(1):6–22
9. Miller GP, Bhat WW, Lanier ER, Johnson SR, Mathieu DT, Hamberger B (2020) The biosynthesis of the anti-microbial diterpenoid leubethanol in Leucophyllum frutescens proceeds via an all-cis prenyl intermediate. Plant J [Internet] 104(3):693–705. Available from: https://doi.org/10.1111/tpj.14957
10. Karunanithi PS, Zerbe P (2019) Terpene synthases as metabolic gatekeepers in the evolution of plant Terpenoid chemical diversity. Front Plant Sci [Internet] 10. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6779861/
11. Johnson SR, Bhat WW, Bibik J, Turmo A, Hamberger B, Consortium EMG, Hamberger B (2019) A database-driven approach identifies additional diterpene synthase activities in the mint family (Lamiaceae). J Biol Chem [Internet] 294(4):1349–1362. Available from: http://www.jbc.org/content/early/2018/11/29/jbc.RA118.006025.abstract
12. Pichersky E, Raguso RA (2018) Why do plants produce so many terpenoid compounds? New Phytol 220(3):692–702
13. Chassagne F, Cabanac G, Hubert G, David B, Marti G (2019) The landscape of natural product diversity and their pharmacological relevance from a focus on the Dictionary of Natural Products®. Phytochem Rev 18(3):601–622
14. Chen J (2020) Global Markets for Flavors and Fragrances. BCC Research CHM034F
15. Wu S, Schalk M, Clark A, Miles RB, Coates R, Chappell J (2006) Redirection of cytosolic or plastidic isoprenoid precursors elevates terpene production in plants. Nat Biotechnol 24(11):1441–1447
16. Wu S, Jiang Z, Kempinski C, Eric Nybo S, Husodo S, Williams R, Chappell J (2012) Engineering triterpene metabolism in tobacco. Planta 236(3):867–877
17. Zhao C, Kim Y, Zeng Y, Li M, Wang X, Hu C, Gorman C, Dai SY, Ding SY, Yuan JS (2018) Co-compartmentation of terpene biosynthesis and storage via synthetic droplet. ACS Synth Biol 7(3):774–781
18. Zerbe P, Bohlmann J (2015) Plant diterpene synthases: exploring modularity and metabolic diversity for bioengineering. Trends Biotechnol 33(7):419–428
19. Andersen-Ranberg J, Kongstad KT, Nielsen MT, Jensen NB, Pateraki I, Bach SS, Hamberger B, Zerbe P, Staerk D, Bohlmann J, Møller BL, Hamberger B (2016) Expanding the landscape of diterpene structural diversity through stereochemically controlled combinatorial biosynthesis. Angewandte Chemie Int Edn 55(6):2142
20. Mafu S, Jia M, Zi J, Morrone D, Wu Y, Xu M, Hillwig ML, Peters RJ (2016) Probing the promiscuity of ent-kaurene oxidases via combinatorial biosynthesis. Proc Natl Acad Sci U S A 113(9):2526–2531
21. Welsch R, Li L (2022) Golden Rice—lessons learned for inspiring future metabolic engineering strategies and synthetic biology solutions. In: Methods in enzymology. Academic Press Inc., pp 1–29
22. Bohlmann J, Schrader J (eds) (2015) Advances in biotechnological engineering/biotechnology Series Editor: T. Scheper. Springer International Publishing AG Switzerland, pp 1–470
23. Jørgensen L, McKerrall SJ, Kuttruff CA, Ungeheuer F, Felding J, Baran PS (2013) 14-step synthesis of (+)-Ingenol from (+)-3-Carene. Science (1979) [Internet] 341(6148):878 LP–882. Available from: http://science.sciencemag.org/content/341/6148/878.abstract
24. Yang S, Tian H, Sun B, Liu Y, Hao Y, Lv Y (2016) One-pot synthesis of (-)-Ambrox. Sci Rep 6:32650
25. Ke D, Caiyin Q, Zhao F, Liu T, Lu W (2018) Heterologous biosynthesis of triterpenoid ambrein in engineered Escherichia coli. Biotechnol Lett 40(2):399–404
26. Moser S, Strohmeier GA, Leitner E, Plocek TJ, Vanhessche K, Pichler H (2018) Whole-cell (+)-ambrein production in the yeast Pichia pastoris. Metab Eng Commun 7:e00077
27. Croteau R, Ketchum REB, Long RM, Kaspera R, Wildung MR (2006) Taxol biosynthesis and molecular genetics. Phytochem Rev 5(1):75–97
28. Henkhaus N, Bartlett M, Gang D, Grumet R, Jordon-Thaden I, Lorence A, Lyons E, Miller S, Murray S, Nelson A, Specht C, Tyler B, Wentworth T, Ackerly D, Baltensperger D, Benfey P, Birchler J, Chellamma S, Crowder R, Donoghue M, Dundore-Arias JP, Fletcher J, Fraser V, Gillespie K, Guralnick L, Haswell E, Hunter M, Kaeppler S, Kepinski S, Li FW, Mackenzie S, McDade L, Min Y, Nemhauser J, Pearson B, Petracek P, Rogers K, Sakai A, Sickler D, Taylor C, Wayne L, Wendroth O, Zapata F, Stern D (2020) Plant science decadal vision 2020–2030: reimagining the potential of plants for a healthy and sustainable future. Plant Direct 4(8):e00252
29. Xu J, Dolan MC, Medrano G, Cramer CL, Weathers PJ (2012) Green factory: plants as bioproduction platforms for recombinant proteins. Biotechnol Adv 30(5):1171–1184
30. Tiwari P, Khare T, Shriram V, Bae H, Kumar V (2021) Plant synthetic biology for producing potent phyto-antimicrobials to combat antimicrobial resistance. Biotechnol Adv 48:107729
31. Sadre R, Kuo P, Chen J, Yang Y, Banerjee A, Benning C, Hamberger B (2019) Cytosolic lipid droplets as engineered organelles for production and accumulation of terpenoid biomaterials in leaves. Nat Commun 10(1):853
32. Busch M, Seuter A, Hain R (2002) Functional analysis of the early steps of carotenoid biosynthesis in tobacco. Plant Physiol 128(2):439–453
33. Vickers CE, Possell M, Laothawornkitkul J, Ryan AC, Hewitt CN, Mullineaux PM (2011) Isoprene synthesis in plants: lessons from a transgenic tobacco model. Plant Cell Environ 34(6):1043–1053
34. Hasan MM, Kim HS, Jeon JH, Kim SH, Moon B, Song JY, Shim SH, Baek KH (2014) Metabolic engineering of Nicotiana benthamiana for the increased production of taxadiene. Plant Cell Rep 33(6):895–904
35. Reed J, Stephenson MJ, Miettinen K, Brouwer B, Leveau A, Brett P, Goss RJM, Goossens A, O’Connell MA, Osbourn A (2017) A translational synthetic biology platform for rapid access to gram-scale quantities of novel drug-like molecules. Metab Eng 42:185–193
36. Andersen TB, Llorente B, Morelli L, Torres-Montilla S, Bordanaba-Florit G, Espinosa FA, Rodriguez-Goberna MR, Campos N, Olmedilla-Alonso B, Llansola-Portoles MJ, Pascal AA, Rodriguez-Concepcion M (2021) An engineered extraplastidial pathway for carotenoid biofortification of leaves. Plant Biotechnol J 19(5):1008–1021
37. Kovacs K, Zhang L, Linforth RST, Whittaker B, Hayes CJ, Fray RG (2007) Redirection of carotenoid metabolism for the efficient production of taxadiene [taxa-4(5),11(12)-diene] in transgenic tomato fruit. Transgenic Res 16(1):121–126
38. Henry LK, Gutensohn M, Thomas ST, Noel JP, Dudareva N (2015) Orthologs of the archaeal isopentenyl phosphate kinase regulate terpenoid production in plants. Proc Natl Acad Sci U S A 112(32):10050–10055
39. Henry LK, Thomas ST, Widhalm JR, Lynch JH, Davis TC, Kessler SA, Bohlmann J, Noel JP, Dudareva N (2018) Contribution of isopentenyl phosphate to plant terpenoid metabolism. Nat Plants 4(9):721
40. Augustin JM, Higashi Y, Feng X, Kutchan TM (2015) Production of mono- and sesquiterpenes in Camelina sativa oilseed. Planta 242(3):693–708
41. Zhan X, Zhang YH, Chen DF, Simonsen HT (2014) Metabolic engineering of the moss Physcomitrella patens to produce the sesquiterpenoids patchoulol and α/β-santalene. Front Plant Sci 5(NOV):636
42. Jia M, Mishra SK, Tufts S, Jernigan RL, Peters RJ (2019) Combinatorial biosynthesis and the basis for substrate promiscuity in class I diterpene synthases. Metab Eng [Internet] 55:44–58. Available from: http://www.sciencedirect.com/science/article/pii/S1096717619301417
43. Harker M, Holmberg N, Clayton JC, Gibbard CL, Wallace AD, Rawlins S, Hellyer SA, Lanot A, Safford R (2003) Enhancement of seed phytosterol levels by expression of an N-terminal truncated Hevea brasiliensis (rubber tree) 3-hydroxy-3-methylglutaryl-CoA reductase. Plant Biotechnol J 1(2):113–121
44. Banerjee A, Preiser AL, Sharkey TD (2016) Engineering of recombinant Poplar Deoxy-D-Xylulose-5-Phosphate Synthase (PtDXS) by site-directed mutagenesis improves its activity. PLoS One 11(8):e0161534
45. Lee AR, Kwon M, Kang MK, Kim J, Kim SU, Ro DK (2019) Increased sesqui- and triterpene production by co-expression of HMG-CoA reductase and biotin carboxyl carrier protein in tobacco (Nicotiana benthamiana). Metab Eng 52:20–28
46. Wright LP, Rohwer JM, Ghirardo A, Hammerbacher A, Ortiz-Alcaide M, Raguschke B, Schnitzler JP, Gershenzon J, Phillips MA (2014) Deoxyxylulose 5-phosphate synthase controls flux through the Methylerythritol 4-phosphate pathway in Arabidopsis. Plant Physiol 165(4):1488–1504
47. Chappell J, Wolf F, Proulx J, Cuellar R, Saunders C (1995) Is the reaction catalyzed by 3-Hydroxy-3-Methylglutaryl coenzyme A reductase a rate-limiting step for isoprenoid biosynthesis in plants? Plant Physiol 109(4):1337–1343
48. Gutensohn M, Henry LK, Gentry SA, Lynch JH, Nguyen TTH, Pichersky E, Dudareva N (2021) Overcoming bottlenecks for metabolic engineering of Sesquiterpene production in tomato fruits. Front Plant Sci 12:691754
49. Kirby J, Nishimoto M, Chow RWN, Baidoo EEK, Wang G, Martin J, Schackwitz W, Chan R, Fortman JL, Keasling JD (2015) Enhancing terpene yield from sugars via novel routes to 1-Deoxy-d-Xylulose 5-phosphate. Appl Environ Microbiol 81(1):130–138
50. Leonard E, Ajikumar PK, Thayer K, Xiao WH, Mo JD, Tidor B, Stephanopoulos G, Prather KLJ (2010) Combining metabolic and protein engineering of a terpenoid biosynthetic pathway for overproduction and selectivity control. Proc Natl Acad Sci 107(31):13654–13659
51. Malhotra K, Subramaniyan M, Rawat K, Kalamuddin M, Qureshi MI, Malhotra P, Mohmmed A, Cornish K, Daniell H, Kumar S (2016) Compartmentalized metabolic engineering for Artemisinin biosynthesis and effective malaria treatment by oral delivery of plant cells. Mol Plant 9(11):1464–1477
52. Bai J, Wang X, Wu H, Ling F, Zhao Y, Lin Y, Wang R (2020) Comprehensive construction strategy of bidirectional green tissue-specific synthetic promoters. Plant Biotechnol J 18(3):668–678
53. Gao L, Tian Y, Chen MC, Wei L, Gao TG, Yin HJ, Zhang JL, Kumar T, Liu LB, Wang SM (2019) Cloning and functional characterization of epidermis-specific promoter MtML1 from Medicago truncatula. J Biotechnol 300:32–39
54. Li Y, Liu S, Yu Z, Liu Y, Wu P (2013) Isolation and characterization of two novel root-specific promoters in rice (Oryza sativa L.). Plant Sci 207:37–44
55. Li Y, Dong C, Hu M, Bai Z, Tong C, Zuo R, Liu Y, Cheng X, Cheng M, Huang J, Liu S (2019) Identification of flower-specific promoters through comparative transcriptome analysis in Brassica napus. Int J Mol Sci 20(23):5949
56. Liu X, Li S, Yang W, Mu B, Jiao Y, Zhou X, Zhang C, Fan Y, Chen R (2018) Synthesis of seed-specific bidirectional promoters for metabolic engineering of anthocyanin-rich maize. Plant Cell Physiol 59(10):1942–1955
57. Stålberg K, Ellerstöm M, Ezcurra I, Ablov S, Rask L (1996) Disruption of an overlapping E-box/ABRE motif abolished high transcription of the napA storage-protein promoter in transgenic Brassica napus seeds. Planta [Internet] 199(4). Available from: http://link.springer.com/10.1007/BF00195181
58. Wang R, Yan Y, Zhu M, Yang M, Zhou F, Chen H, Lin Y (2016) Isolation and functional characterization of bidirectional promoters in rice. Front Plant Sci [Internet] 7. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4885881/
59. van Herpen TWJM, Cankar K, Nogueira M, Bosch D, Bouwmeester HJ, Beekwilder J (2010) Nicotiana benthamiana as a production platform for Artemisinin precursors. PLoS One 5(12):e14222
60. Liu Q, Majdi M, Cankar K, Goedbloed M, Charnikhova T, Verstappen FWA, Vos de RCH, Beekwilder J, Krol van der S, Bouwmeester HJ (2011) Reconstitution of the Costunolide biosynthetic pathway in yeast and Nicotiana benthamiana. PLoS One 6(8):e23255
61. Eljounaidi K, Cankar K, Comino C, Moglia A, Hehn A, Bourgaud F, Bouwmeester H, Menin B, Lanteri S, Beekwilder J (2014) Cytochrome P450s from Cynara cardunculus L. CYP71AV9 and CYP71BL5, catalyze distinct hydroxylations in the sesquiterpene lactone biosynthetic pathway. Plant Sci 223:59–68
62. Bibik JD, Weraduwage SM, Banerjee A, Robertson K, Espinoza-Corral R, Sharkey TD, Lundquist PK, Hamberger BR (2022) Pathway engineering, re-targeting, and synthetic scaffolding improve the production of squalene in plants. ACS Synth Biol 11(6):2121–2133
63. Pateraki I, Andersen-Ranberg J, Hamberger B, Heskes AM, Martens HJ, Zerbe P, Bach SS, Møller BL, Bohlmann J, Hamberger B (2014) Manoyl oxide (13R), the biosynthetic precursor of forskolin, is synthesized in specialized root cork cells in Coleus forskohlii. Plant Physiol 164(3):1222–1236
64. Boachon B, Junker RR, Miesch L, Bassard JE, Höfer R, Caillieaudeaux R, Seidel DE, Lesot A, Heinrich C, Ginglinger JF, Allouche L, Vincent B, Wahyuni DSC, Paetz C, Beran F, Miesch M, Schneider B, Leiss K, Werck-Reichhart D (2015) CYP76C1 (Cytochrome P450)-mediated linalool metabolism and the formation of volatile and soluble linalool oxides in Arabidopsis flowers: a strategy for defense against floral antagonists. Plant Cell 27(10):2972–2990
65. Muchlinski A, Chen X, Lovell JT, Köllner TG, Pelot KA, Zerbe P, Ruggiero M, Callaway L, Laliberte S, Chen F, Tholl D (2019) Biosynthesis and emission of stress-induced volatile terpenes in roots and leaves of Switchgrass (Panicum virgatum L.) [Internet]. Front Plant Sci 10:1144. Available from: https://www.frontiersin.org/article/10.3389/fpls.2019.01144
66. Andersen TB, Martinez-Swatson KA, Rasmussen SA, Boughton BA, Jørgensen K, Andersen-Ranberg J, Nyberg N, Christensen SB, Simonsen HT (2017) Localization and in-vivo characterization of Thapsia garganica CYP76AE2 indicates a role in Thapsigargin biosynthesis. Plant Physiol 174(1):56–72
67. Abbott E, Hall D, Hamberger B, Bohlmann J (2010) Laser microdissection of conifer stem tissues: isolation and analysis of high quality RNA, terpene synthase enzyme activity and terpenoid metabolites from resin ducts and cambial zone tissue of white spruce (Picea glauca). BMC Plant Biol 10:106
68. Celedon JM, Bohlmann J (2019) Oleoresin defenses in conifers: chemical diversity, terpene synthases and limitations of oleoresin defense under climate change. New Phytol 224(4):1444–1463
69. Kortbeek RWJ, Xu J, Ramirez A, Spyropoulou E, Diergaarde P, Otten-Bruggeman I, de Both M, Nagel R, Schmidt A, Schuurink RC, Bleeker PM (2016) Chapter Twelve - Engineering of tomato glandular Trichomes for the production of specialized metabolites. In: O’Connor SEBTM in E, O’Connor SE (ed) Methods in enzymology [Internet], pp 305–331. (Synthetic Biology and Metabolic Engineering in Plants and Microbes Part B: Metabolism in Plants; vol. 576). Available from: https://www.sciencedirect.com/science/article/pii/S0076687916000987
70. Boachon B, Lynch JH, Ray S, Yuan J, Caldo KMP, Junker RR, Kessler SA, Morgan JA, Dudareva N (2019) Natural fumigation as a mechanism for volatile transport between flower organs. Nat Chem Biol 15(6):583
71. Andersen-Ranberg J, Kongstad KT, Nielsen MT, Jensen NB, Pateraki I, Bach SS, Hamberger B, Zerbe P, Staerk D, Bohlmann J, Møller BL, Hamberger B (2016) Expanding the landscape of Diterpene structural diversity through Stereochemically controlled combinatorial biosynthesis. Angewandte Chemie Int Edn [Internet] 55(6):2142–2146. Available from: https://onlinelibrary.wiley.com/doi/abs/10.1002/anie.201510650
72. Spokevicius AV, van Beveren K, Leitch MA, Bossinger G (2005) Agrobacterium-mediated in vitro transformation of wood-producing stem segments in eucalypts. Plant Cell Rep 23(9):617–624
73. Zhong L, Zhang Y, Liu H, Sun G, Chen R, Song S (2016) Agrobacterium-mediated transient expression via root absorption in flowering Chinese cabbage. Springerplus 5(1):1825
74. Yang M, Baral NR, Simmons BA, Mortimer JC, Shih PM, Scown CD (2020) Accumulation of high-value bioproducts in planta can improve the economics of advanced biofuels. Proc Natl Acad Sci [Internet] 117(15):8639 LP–8648. Available from: http://www.pnas.org/content/117/15/8639.abstract
75. Mathur S, Umakanth AV, Tonapi VA, Sharma R, Sharma MK (2017) Sweet sorghum as biofuel feedstock: recent advances and available resources. Biotechnol Biofuels 10(1):1–19
76. Sannigrahi P, Ragauskas AJ, Tuskan GA (2010) Poplar as a feedstock for biofuels: a review of compositional characteristics. Biofuels Bioprod Biorefin 4(2):209–226
77. Costa MA, Marques JV, Dalisay DS, Herman B, Bedgar DL, Davin LB, Lewis NG (2013) Transgenic hybrid poplar for sustainable and scalable production of the commodity/specialty chemical, 2-Phenylethanol. PLoS One 8(12):e83169
78. Lu D, Yuan X, Kim SJ, Marques JV, Chakravarthy PP, Moinuddin SGA, Luchterhand R, Herman B, Davin LB, Lewis NG (2017) Eugenol specialty chemical production in transgenic poplar (Populus tremula × P. alba) field trials. Plant Biotechnol J 15(8):970–981
79. Schnitzler JP, Louis S, Behnke K, Loivamäki M (2010) Poplar volatiles – biosynthesis, regulation and (eco)physiology of isoprene and stress-induced isoprenoids. Plant Biol 12(2):302–316
80. Guenther AB, Jiang X, Heald CL, Sakulyanontvittaya T, Duhl T, Emmons LK, Wang X (2012) The model of emissions of gases and aerosols from nature version 2.1 (MEGAN2.1): an extended and updated framework for modeling biogenic emissions. Geosci Model Dev 5(6):1471–1492
81. Vanzo E, Jud W, Li Z, Albert A, Domagalska MA, Ghirardo A, Niederbacher B, Frenzel J, Beemster GTS, Asard H, Rennenberg H, Sharkey TD, Hansel A, Schnitzler JP (2015) Facing the future: effects of short-term climate extremes on isoprene-emitting and nonemitting poplar. Plant Physiol 169(1):560–575
82. Sharma R, Liang Y, Lee MY, Pidatala VR, Mortimer JC, Scheller HV (2020) Agrobacterium-mediated transient transformation of sorghum leaves for accelerating functional genomics and genome editing studies. BMC Res Notes 13(1):116
83. Zheng L, Yang J, Chen Y, Ding L, Wei J, Wang H (2021) An improved and efficient method of agrobacterium syringe infiltration for transient transformation and its application in the elucidation of gene function in poplar. BMC Plant Biol 21(1):54
84. Amack SC, Antunes MS (2020) CaMV35S promoter – a plant biology and biotechnology workhorse in the era of synthetic biology. Curr Plant Biol 24:100179
85. Peyret H, Lomonossoff GP (2013) The pEAQ vector series: the easy and quick way to produce recombinant proteins in plants. Plant Mol Biol 83(1–2):51–58
86. Siddiqui SA, Sarmiento C, Truve E, Lehto H, Lehto K (2008) Phenotypes and functional effects caused by various viral RNA silencing suppressors in transgenic Nicotiana benthamiana and N. tabacum. Mol Plant-Microbe Interact 21(2):178–187
87. Halpin C (2005) Gene stacking in transgenic plants – the challenge for 21st century plant biotechnology. Plant Biotechnol J 3(2):141–155
88. Bock R (2013) Strategies for metabolic pathway engineering with multiple transgenes. Plant Mol Biol 83(1):21–31
89. Collier R, Thomson JG, Thilmony R (2018) A versatile and robust Agrobacterium-based gene stacking system generates high-quality transgenic Arabidopsis plants. Plant J 95:573–583. https://doi.org/10.1111/tpj.13992
90. Shih PM, Vuu K, Mansoori N, Ayad L, Louie KB, Bowen BP, Northen TR, Loqué D (2016) A robust gene-stacking method utilizing yeast assembly for plant synthetic biology. Nat Commun 7:13215
91. King BC, Vavitsas K, Ikram NKBK, Schrøder J, Scharff LB, Bassard JÉ, Hamberger B, Jensen PE, Simonsen HT (2016) Corrigendum: in vivo assembly of DNA-fragments in the moss, Physcomitrella patens. Sci Rep 6:31261
92. Hirt H, Kögl M, Murbacher T, Heberle-Bors E (1990) Evolutionary conservation of transcriptional machinery between yeast and plants as shown by the efficient expression from the CaMV 35S promoter and 35S terminator. Curr Genet 17(6):473–479
93. Koncz C, de Greve H, André D, Deboeck F, van Montagu M, Schell J (1983) The opine synthase genes carried by Ti plasmids contain all signals necessary for expression in plants. EMBO J 2(9):1597–1603
94. Christensen AH, Sharrock RA, Quail PH (1992) Maize polyubiquitin genes: structure, thermal perturbation of expression and transcript splicing, and promoter activity following transfer to protoplasts by electroporation. Plant Mol Biol 18(4):675–689
95. McElroy D, Zhang W, Cao J, Wu R (1990) Isolation of an efficient actin promoter for use in rice transformation. Plant Cell 2(2):163–171
96. Sainsbury F, Lomonossoff GP (2008) Extremely high-level and rapid transient protein production in plants without the use of viral replication. Plant Physiol 148(3):1212–1218
97. Sainsbury F, Thuenemann EC, Lomonossoff GP (2009) pEAQ: versatile expression vectors for easy and quick transient expression of heterologous proteins in plants. Plant Biotechnol J 7(7):682–693
98. Ali S, Kim WC (2019) A fruitful decade using synthetic promoters in the improvement of transgenic plants. Front Plant Sci [Internet] 10. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6838210/
99. Huang D, Kosentka PZ, Liu W (2021) Synthetic biology approaches in regulation of targeted gene expression. Curr Opin Plant Biol 63:102036
100. Belcher MS, Vuu KM, Zhou A, Mansoori N, Agosto Ramos A, Thompson MG, Scheller HV, Loqué D, Shih PM (2020) Design of orthogonal regulatory systems for modulating gene expression in plants. Nat Chem Biol 16:1–9
101. Cai YM, Kallam K, Tidd H, Gendarini G, Salzman A, Patron NJ (2020) Rational design of minimal synthetic promoters for plants. Nucleic Acids Res [Internet]. Available from: https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkaa682/5897334
102. Gupta D, Dey N, Leelavathi S, Ranjan R (2021) Development of efficient synthetic promoters derived from pararetrovirus suitable for translational research. Planta 253(2):42
103. Jores T, Tonnies J, Wrightsman T, Buckler ES, Cuperus JT, Fields S, Queitsch C (2021) Synthetic promoter designs enabled by a comprehensive analysis of plant core promoters. Nat Plants 7(6):842–855
104. 
Yang Y, Lee JH, Poindexter MR, Shao Y, Liu W, Lenaghan SC, Ahkami AH, Blumwald E, Stewart CN Jr (2021) Rational design and testing of abiotic stress-inducible synthetic promoters from poplar cis-regulatory elements. Plant Biotechnol J 19(7): 1354–1369

20

Jacob D. Bibik and Björn Hamberger

105. Peyret H, Brown JKM, Lomonossoff GP (2019) Improving plant transient expression through the rational design of synthetic 5′ and 3′ untranslated regions. Plant Methods 15(1):108 106. Diamos AG, Rosenthal SH, Mason HS (2016) 5′ and 3′ untranslated regions strongly enhance performance of Geminiviral replicons in Nicotiana benthamiana leaves. Front Plant Sci [Internet] 7. Available from: https://www.frontiersin.org/article/10.33 89/fpls.2016.00200 107. Diamos AG, Mason HS (2018) Chimeric 3′ flanking regions strongly enhance gene expression in plants. Plant Biotechnol J 16(12):1971–1982 108. Verhounig A, Karcher D, Bock R (2010) Inducible gene expression from the plastid genome by a synthetic riboswitch. Proc Natl Acad Sci 107(14):6204–6209 109. Agrawal S, Karcher D, Ruf S, Erban A, Hertle AP, Kopka J, Bock R (2022) Riboswitchmediated inducible expression of an astaxanthin biosynthetic operon in plastids. Plant Physiol 188(1):637–652 110. Bernardes WS, Menossi M (2020) Plant 3’ regulatory regions from mRNA-encoding genes and their uses to modulate expression. Front Plant Sci [Internet] 11. Available from: https://www.frontiersin.org/article/10.33 89/fpls.2020.01252 111. Srivastava AK, Lu Y, Zinta G, Lang Z, Zhu JK (2018) UTR dependent control of gene expression in plants. Trends Plant Sci 23(3): 248–259 112. Liu X, Yang W, Mu B, Li S, Li Y, Zhou X, Zhang C, Fan Y, Chen R (2018) Engineering of ‘Purple Embryo Maize’ with a multigene expression system derived from a bidirectional promoter and self-cleaving 2A peptides. Plant Biotechnol J 16(6):1107–1109 113. Vanhercke T, Dyer JM, Mullen RT, Kilaru A, Rahman MM, Petrie JR, Green AG, Yurchenko O, Singh SP (2019) Metabolic engineering for enhanced oil in biomass. Prog Lipid Res 74:103–129 114. Wang C, Ding D, Yan R, Yu X, Li W, Li M (2008) A novel bi-directional promoter cloned from melon and its activity in cucumber and tobacco. J Plant Biol 51(2):108–115 115. 
Kourmpetli S, Lee K, Hemsley R, Rossignol P, Papageorgiou T, Drea S (2013) Bidirectional promoters in seed development and related hormone/stress responses. BMC Plant Biol 13(1):187

116. Zhang C, Gai Y, Zhu Y, Chen X, Jiang X (2008) Construction of a bidirectional promoter and its transient expression in Populus tomentosa. Front Forest China 3(1):112–116 117. Zhang C, Gai Y, Wang W, Zhu Y, Chen X, Jiang X (2008) Construction and analysis of a plant transformation binary vector pBDGG harboring a bi-directional promoter fusing dual visible reporter genes. J Genet Genomics 35(4):245–249 118. Poliner E, Pulman JA, Zienkiewicz K, Childs K, Benning C, Farre´ EM (2018) A toolkit for Nannochloropsis oceanica CCMP1779 enables gene stacking and genetic engineering of the eicosapentaenoic acid pathway for enhanced long-chain polyunsaturated fatty acid production. Plant Biotechnol J 16(1):298–309 119. Vogl T, Kickenweiz T, Pitzer J, Sturmberger L, Weninger A, Biggs BW, Ko¨hler EM, Baumschlager A, Fischer JE, Hyden P, Wagner M, Baumann M, Borth N, Geier M, Ajikumar PK, Glieder A (2018) Engineered bidirectional promoters enable rapid multi-gene co-expression optimization. Nat Commun 9(1):3589 120. de Felipe P, Hughes LE, Ryan MD, Brown JD (2003) Co-translational, Intraribosomal cleavage of polypeptides by the foot-andmouth disease virus 2A Peptide. J Biol Chem 278(13):11441–11448 121. Sun H, Zhou N, Wang H, Huang D, Lang Z (2017) Processing and targeting of proteins derived from polyprotein with 2A and LP4/2A as peptide linkers in a maize expression system. PLoS One 12(3):e0174804 122. 
Patron NJ, Orzaez D, Marillonnet S, Warzecha H, Matthewman C, Youles M, Raitskin O, Leveau A, Farre´ G, Rogers C, Smith A, Hibberd J, Webb AAR, Locke J, Schornack S, Ajioka J, Baulcombe DC, Zipfel C, Kamoun S, Jones JDG, Kuhn H, Robatzek S, van Esse HP, Sanders D, Oldroyd G, Martin C, Field R, O’Connor S, Fox S, Wulff B, Miller B, Breakspear A, Radhakrishnan G, Delaux PM, Loque´ D, Granell A, Tissier A, Shih P, Brutnell TP, Quick WP, Rischer H, Fraser PD, Aharoni A, Raines C, South PF, Ane´ JM, Hamberger BR, Langdale J, Stougaard J, Bouwmeester H, Udvardi M, Murray JAH, Ntoukakis V, Sch€afer P, Denby K, Edwards KJ, Osbourn A, Haseloff J (2015) Standards for plant synthetic biology: a common syntax for exchange of DNA parts. New Phytol 208(1):13

Chapter 2

Compartmentalized Terpenoid Production in Plants Using Agrobacterium-Mediated Transient Expression

Jacob D. Bibik, Abigail E. Bryson, and Björn Hamberger

Abstract

As the field of plant synthetic biology continues to grow, Agrobacterium-mediated transient expression has become an essential method to rapidly test pathway candidate genes in a combinatorial fashion. This is especially important when elucidating and engineering more complex pathways to produce commercially relevant chemicals like many terpenoids, a widely diverse class of natural products of often industrial relevance. Agrobacterium-mediated transient expression has facilitated multiplex expression of recombinant and modified enzymes, including synthetic biology approaches to compartmentalize the biosynthesis of terpenoids subcellularly. Here, we describe methods on how to deploy Agrobacterium-mediated transient expression in Nicotiana benthamiana to rapidly develop terpenoid pathways and compartmentalize terpenoid biosynthesis within plastids, the cytosol, or at the surface of lipid droplets.

Key words Agrobacterium-mediated transient expression, Terpenoids, Pathway compartmentalization, Plant synthetic biology

1 Introduction

Plants provide the potential of a sustainable platform for the production of an array of metabolites and specialty chemicals. As photosynthetic organisms, plants have abundant reduction equivalents, such as NADPH, which is needed for many natural product pathways. They have distinct organelles, e.g., plastids, mitochondria, and the endoplasmic reticulum, which can be targeted and offer a distinct metabolic environment from the cytosol. Lastly, they can be engineered for the co-production of storage organelles alongside biosynthetic pathways to high-value natural products. Bioproducts of particular interest are the terpenoids, the most diverse class of chemicals naturally accumulating across the kingdoms of life. Building from native terpenoid precursor pathways, plants can be further engineered to produce a variety of terpenoids through new-to-nature pathways [1–4]. Stable metabolic

Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_2, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024


engineering in plants is time-consuming as it can take several months to generate transgenic lines in many species. Strategies have been developed utilizing Agrobacterium-mediated transient expression as an effective method to rapidly test pathways and develop plant synthetic biology tools [1–3, 5–9]. By manipulating the natural ability of Agrobacterium to infect and transfer DNA to host plants, transgenes can be easily introduced and expressed in a transient manner in select tissues, especially leaves. These methods can be used to not only test single gene functions but also rapidly develop complex metabolic pathways. By simply mixing and co-infiltrating Agrobacterium strains harboring different genes of interest (GOIs) into leaf tissues, entire pathways can be overexpressed for the production of target compounds. Transient expression enables rapid design and optimization of metabolic pathways, including improvement of yields through co-expression of rate-limiting steps of precursor pathways and re-targeting of pathways to different cellular compartments. This allows pathway validation prior to scaled transient expression to produce terpenoids at small or medium (up to milligram) scale. Here, we present strategies to re-target terpenoid biosynthetic pathways from the cytosol to plastids or anchor them to synthetic lipid droplets in the cytosol using Agrobacterium-mediated transient expression with both syringe- and vacuum-infiltrations. Native transit peptides or retention signals, if any, can be removed and replaced with a plastid transit peptide sequence from Arabidopsis thaliana Rubisco small subunit [10] to target enzymes to plastids or the sequence for the Lipid Droplet Surface Protein (LDSP) from Nannochloropsis oceanica [11] to anchor them to the surface of lipid droplets, respectively [1, 3]. 
When co-expressing these target genes with genes encoding enzymes to boost the precursor pathways, we demonstrate a robust system to produce terpenoids and develop plant synthetic biology tools. The following methods are based on our previous work producing squalene, a C30 triterpene [1], in Nicotiana benthamiana. They have also been applied to other terpene synthases [3] and can likely be applied to other compounds and plant species.

2 Materials

2.1 Vector Assembly

1. pEAQ-HT vector (Plant Bioscience Limited, Norwich Research Park, UK).
2. E. cloni 10G Escherichia coli cells with recovery media (Lucigen), or common Super Optimal Broth (SOC).
3. NucleoSpin Gel and PCR Clean-up Kit (Macherey-Nagel).
4. XhoI and NruI restriction enzymes with rCutSmart buffer (New England Biolabs).


5. Phusion High-Fidelity DNA Polymerase (New England Biolabs).
6. In-Fusion HD cloning kit (Takara Bio).
7. 1× Tris–HCl, borate, EDTA (TBE) buffer.
8. 0.8% agarose in 1× TBE.
9. Low salt Luria-Bertani (LB) plates (1.5% agar) with 50 μg/mL kanamycin (Sigma Aldrich).
10. Primers for pEAQ-HT (forward sequence 5′-GCTTGACGAGGTATTGTTGCCTGTAC-3′, reverse 5′-CTTTGTGAGCTCCTGTTTAGCAGG-3′).
11. Sterile glass beads.

2.2 Preparation and Transformation of Agrobacterium Competent Cells

12. Low salt LB liquid media with 25 μg/mL rifampicin.
13. Low salt LB plates (1.5% agar) with 25 μg/mL rifampicin and 50 μg/mL kanamycin.
14. Agrobacterial strain of choice; here, we use LBA4404.
15. Ice cold, sterile 10% glycerol in water.
16. Ice cold, sterile water.
17. 2 mm electroporation cuvette.
18. Eppendorf Electroporator 2510.

2.3 N. benthamiana Transformation

19. Low salt LB liquid media with 25 μg/mL rifampicin and 50 μg/mL kanamycin.
20. Low salt LB liquid media with 50 μg/mL kanamycin.
21. Nicotiana benthamiana plants 4–5 weeks old, seeds available from Herbalistics, Pty Ltd, ABN 32 104 503 657, PO Box 135. Bli Bli. QLD. 4560, AUSTRALIA.
22. Sterile 50 mL conical tubes.
23. 1 L shake flasks.
24. 1 mL needleless syringes.
25. 200 μM acetosyringone in water.
26. Metal cork borer.
27. Wide mouth vacuum chamber (e.g., vacuum desiccator).
28. Plastic sheet with slit wide enough to slide around plant stem.
29. Aluminum foil.
30. Hexane.
31. N-hexacosane (or other internal standard).
32. 2 mL screw cap, polypropylene tubes.
33. 4 mm tungsten milling beads.
34. Gas chromatography vials with screw caps containing PTFE/rubber septa.

3 Methods

3.1 Constructing Transient Expression Vectors

1. Digest 3 μg of pEAQ-HT or pEAQ-TP (see Note 1) with 2 μL (40 units) each of restriction enzymes XhoI and NruI for 60 min (see Note 2) at 37 °C in a total of 50 μL 1× rCutSmart Buffer.
2. Add 6× gel loading buffer to each reaction, load the samples on a 0.8% agarose gel, and run the electrophoresis in 1× TBE at 100 V for 30 min to separate the linearized vector from the digested fragment. Cut the linearized vector from the gel and purify using the NucleoSpin Gel and PCR Clean-up Kit, or similar kit or protocol.
3. Amplify GOIs using Phusion High-Fidelity DNA Polymerase or a similar high-fidelity polymerase. Primers should be designed specific to your GOI with an additional 15 nt, 5′-overhangs (see Note 3) specific to the digested vector or adjacent genes (see Fig. 1).
4. To design constructs to target soluble proteins to chloroplasts or anchor proteins to cytosolic lipid droplets, genes can be fused to a plastid transit peptide sequence or a Lipid Droplet Surface Protein (LDSP) sequence (see Note 4).
5. Insert GOIs into the digested vector with the In-Fusion HD cloning kit (Takara Bio), following the manufacturer’s protocol.
6. Thaw 20 μL aliquots of E. cloni 10G cells on ice for 10 min.

Fig. 1 Schematic construct designs to re-target desired proteins to plastids or the surface of lipid droplets. Shown are the 5′ to 3′ regions of the pEAQ-HT cloning site, which can be digested by NruI and XhoI prior to insertion of target genes. The left and right show key elements to either anchor proteins to the surface of lipid droplets or target proteins to plastids, respectively. Not shown for clarity, Arabidopsis thaliana WRINKLED1 transcription factor; upstream pathways providing either cytosolic or plastidial precursors. 35S, cauliflower mosaic virus constitutive promoter CaMV35S; GOI, gene of interest; LDSP, Nannochloropsis oceanica Lipid droplet surface protein; and TP, plastidial transit peptide
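The primer layout described in step 3 of Subheading 3.1 (a gene-specific region preceded by a 15 nt 5′-overhang matching the digested vector) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the fixed 20 nt annealing length, and all sequences below are hypothetical placeholders, not actual pEAQ-HT or gene sequences, and no melting-temperature optimization is attempted.

```python
# Illustrative sketch: primer = 15-nt vector overhang + gene-specific annealing region.
# Every sequence here is a made-up placeholder, not a real pEAQ-HT or GOI sequence.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(goi, upstream_flank, downstream_flank,
                   overhang_len=15, anneal_len=20):
    """Return (forward, reverse) primers, both written 5'->3'.

    upstream_flank / downstream_flank are top-strand vector sequences lying
    immediately 5' and 3' of the insertion site in the digested vector.
    """
    if len(goi) < anneal_len:
        raise ValueError("gene of interest is shorter than the annealing region")
    if min(len(upstream_flank), len(downstream_flank)) < overhang_len:
        raise ValueError("vector flanks must be at least overhang_len long")
    # Forward primer: last 15 nt of the upstream flank, then the start of the GOI.
    forward = upstream_flank[-overhang_len:] + goi[:anneal_len]
    # Reverse primer: reverse complement of (end of GOI + first 15 nt downstream).
    reverse = reverse_complement(goi[-anneal_len:] + downstream_flank[:overhang_len])
    return forward, reverse


# Hypothetical inputs for demonstration only.
example_goi = "ATG" + "GCA" * 10 + "TAA"
example_upstream = "TTTTT" + "ACGTACGTACGTACG"
example_downstream = "CATGCATGCATGCAT" + "GGGGG"

fwd, rev = design_primers(example_goi, example_upstream, example_downstream)
print("forward:", fwd)
print("reverse:", rev)
```

For fusions to a transit peptide or LDSP sequence, the same helper could be called with the adjacent gene fragment in place of a vector flank, which mirrors the "digested vector or adjacent genes" wording of step 3.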


7. Add

$$0, \quad \forall k = 1(i)N \tag{21}$$

$$C_k s^{(N-k)}.$$

In order to deduce the design principles, the precise mathematical conditions (PMC) for adaptation, as shown in (17) and (20) have to be posed as certain structural requirements of the networks.


Priyan Bhattacharya et al.

3.4 From Abstraction to Structure

In this section, we use the wealth of algebraic graph theory to make sense of (17) and (20) vis-à-vis the structure. Without any loss of generality, we assume the input disturbance-receiving node as node 1, and the output is obtained by measuring the concentration of the kth node. In this case, the matrices $B$ and $C$ can be expressed as

$$B = \alpha e_1 \tag{22}$$

$$C = \beta e_k^T \tag{23}$$

where α, β∈ are constants and ej ∈N is a unit vector in direction of jth axis. In the language of systems theory, the first condition requires the system in (15), (16) to be controllable by the disturbance output dðtÞ—this can be ensured as long as there exists at least one forward path from the disturbance-receiving node to the output node [24]. For the second condition, in the absence of a feed-through term (20) can be modified as βeTk A - 1 αe1 = 0

ð24Þ

The above equation is satisfied for any choice of $(\alpha, \beta)$ as long as the following conditions are satisfied

$$\mathrm{Minor}(A_{1,k}) = 0 \tag{25}$$

$$\mathrm{Re}(\sigma(A)) < 0 \tag{26}$$

where $\sigma(A)$ refers to all the eigenvalues of $A$. Again from algebraic graph theory, barring the product of the diagonals, every term in the determinant expression of $A$ contains at least one loop expression which includes the set of terms obtained by combining the element $A_{1,k}$ with its minor. Therefore, we conclude that every term in $\mathrm{Minor}(A_{1,k})$ has to contain at least one forward path from the input disturbance-receiving node to the output node. The second condition, as depicted in (26), refers to the stability of the system. Since it is extremely difficult to arrive analytically at a closed-form expression of the eigenvalues of a matrix of any size, we use a weaker (necessary) condition for stability. For any matrix $A \in \mathbb{R}^{N \times N}$, the characteristic polynomial $C_A(s)$ can be expressed as

$$C_A(s) = \mathrm{Det}(sI - A) \tag{27}$$
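Conditions (24) to (26) are straightforward to check numerically for a given digraph matrix. The following is a minimal sketch, assuming NumPy is available; the helper names and the two 3-node example matrices are illustrative assumptions (hypothetical edge weights with node 1 as input and node 3 as output), not values taken from this chapter.

```python
import numpy as np


def minor(A, i, j):
    """Determinant of A after deleting row i and column j (1-based indices)."""
    sub = np.delete(np.delete(A, i - 1, axis=0), j - 1, axis=1)
    return float(np.linalg.det(sub))


def adaptation_checks(A, k, alpha=1.0, beta=1.0):
    """Evaluate conditions (24)-(26) for input node 1 and output node k."""
    N = A.shape[0]
    B = alpha * np.eye(N)[:, [0]]        # B = alpha * e_1, column vector (Eq. 22)
    C = beta * np.eye(N)[[k - 1], :]     # C = beta * e_k^T, row vector (Eq. 23)
    gain = (C @ np.linalg.solve(A, B)).item()   # C A^{-1} B, left side of Eq. (24)
    return {
        "gain": gain,
        "minor_1k": minor(A, 1, k),                              # Eq. (25)
        "stable": bool(np.all(np.linalg.eigvals(A).real < 0)),   # Eq. (26)
    }


# Hypothetical matrices: A[i, j] is the influence of node j+1 on node i+1.
A_zero_minor = np.array([[-1.0, 0.0, 0.0],
                         [1.0, -1.0, 0.0],
                         [1.0, -1.0, -1.0]])
A_nonzero_minor = A_zero_minor.copy()
A_nonzero_minor[2, 0] = 0.5              # unbalance the direct path 1 -> 3

print(adaptation_checks(A_zero_minor, k=3))
print(adaptation_checks(A_nonzero_minor, k=3))
```

In the first matrix the two forward paths into node 3 cancel, so the minor and the gain both vanish; changing a single edge weight in the second matrix leaves a non-zero gain even though the system remains stable.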

Since stability requires all the roots of the equation $C_A(s) = 0$ to carry a negative real part, one of the associated necessary conditions can be expressed as

$$\alpha_i(C_A) \times \alpha_j(C_A) > 0 \quad \forall (i, j) \in 0 \text{ to } N + 1 \tag{28}$$

Design Principles for Biological Adaptation: A Systems and Control. . .


where $\alpha_k(C_A)$ is the coefficient of $s^{N-k}$ in the characteristic polynomial. It is evident from (27) that $C_A(s)$ is a monic polynomial in $s$; therefore, the modified stability condition can be written as

$$\alpha_i(C_A) > 0 \quad \forall i \in 0 \text{ to } N + 1 \tag{29}$$
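The coefficient test in (29) can be tried out numerically. The sketch below assumes NumPy, whose `np.poly` returns the coefficients of $\mathrm{Det}(sI - A)$ with the leading (monic) coefficient first, so the entry at index $k$ multiplies $s^{N-k}$. The three example matrices are hypothetical and were chosen only to show a passing case, a failing case, and a case illustrating that the condition is necessary but not sufficient.

```python
import numpy as np


def char_poly_coeffs(A):
    """Coefficients of Det(sI - A), highest power first; entry k multiplies s^(N-k)."""
    return np.real(np.poly(A))


def passes_coefficient_test(A, tol=1e-12):
    """Necessary condition (29): every coefficient is strictly positive."""
    return bool(np.all(char_poly_coeffs(A) > tol))


def is_stable(A):
    """Condition (26): every eigenvalue has a negative real part."""
    return bool(np.all(np.linalg.eigvals(A).real < 0))


# Two-node loops with a zero diagonal entry on node 2 (hypothetical weights).
A_negative_loop = np.array([[-1.0, -1.0],
                            [1.0, 0.0]])      # s^2 + s + 1
A_positive_loop = np.array([[-1.0, 1.0],
                            [1.0, 0.0]])      # s^2 + s - 1

# Companion matrix of s^3 + s^2 + s + 6 = (s + 2)(s^2 - s + 3):
# all coefficients are positive, yet two roots lie in the right half-plane.
A_counterexample = np.array([[0.0, 1.0, 0.0],
                             [0.0, 0.0, 1.0],
                             [-6.0, -1.0, -1.0]])

for name, A in [("negative loop", A_negative_loop),
                ("positive loop", A_positive_loop),
                ("counterexample", A_counterexample)]:
    print(name, char_poly_coeffs(A), passes_coefficient_test(A), is_stable(A))
```

The positive-loop matrix with a zero diagonal entry fails the coefficient test, in line with the sign argument used for looped networks, while the companion matrix shows why (29) can only rule structures out and cannot certify stability on its own.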

Further, any network can be of two types: (i) feed-forward (without loops) or (ii) looped. For a feed-forward network, the minor of $A_{1,k}$ contains only additive combinations of all the forward paths from the input-receiving node to the output node. Therefore, for the minor to be zero, at least two mutually opposing forward paths have to exist. This imposes the structural condition that there have to exist multiple forward paths from the disturbance-receiving node to the output node with mutually opposing effects on the output (refer to Fig. 4). This class of structures is called the balancer module (Fig. 5) [24]. In the second scenario, where no mutually opposing paths exist, the only way to zero out the minor of $A_{1,k}$ is to make each term in the minor individually zero; this is possible if and only if at least one diagonal element is made zero and at least one feedback loop exists in the network. Further, from algebraic graph theory, barring the product of the diagonals, each component in the coefficients of the characteristic polynomial of $A$ contains at least one loop. We previously proved that if at least one diagonal element of a digraph matrix is zero and all the loops in the network have a cumulative positive sign, then the coefficients of the characteristic polynomial contain at least one sign change, indicating the emergence of instability [24]. Therefore, a network structure without any mutually opposing forward paths from the input disturbance-receiving node to the output requires at least one negative feedback loop with a zero diagonal component (buffer action) to provide perfect adaptation. This is known as the opposer module (Fig. 6). Further, it can also be shown that an opposer module cannot provide perfect

Fig. 4 A schematic of balancer topology. The edges with Red and green refer to repression and activation, respectively. The node in yellow refers to the controller node


Fig. 5 Simulation of the balancer topology. (a) refers to the disturbance input, (b) showcases the response of a 3-node balancer topology, whereas (c) demonstrates the same for a 5-node balancer topology. The simulation details can be obtained in the supporting information

Fig. 6 A schematic of opposer topology. The edges with Red and green refer to repression and activation, respectively. The node in yellow refers to the controller node with the buffer action

adaptation if it contains an edge from the output to the disturbance input-receiving node. Figure 7 demonstrates the adaptation capability of opposer modules of different sizes.
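The qualitative behavior of the two modules can be reproduced with a small linear simulation. The sketch below is not the simulation behind Figs. 5 and 7 (those details are in the supporting repository); it assumes NumPy, a unit step disturbance entering node 1, output at node 3, and hypothetical edge weights chosen so that the balancer has two opposing forward paths and the opposer has a zero-diagonal controller node inside a negative feedback loop.

```python
import numpy as np


def step_response(A, k, t_end=40.0, dt=0.01):
    """Integrate dx/dt = A x + e_1 * d with d = 1 and x(0) = 0 using classical RK4.

    Returns the trajectory of node k (1-based), i.e. the deviation of the output
    from its pre-disturbance steady state.
    """
    N = A.shape[0]
    B = np.eye(N)[:, 0]

    def f(x):
        return A @ x + B

    x = np.zeros(N)
    steps = int(round(t_end / dt))
    y = np.empty(steps + 1)
    y[0] = x[k - 1]
    for n in range(steps):
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x = x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        y[n + 1] = x[k - 1]
    return y


# Hypothetical balancer: paths 1 -> 3 (activation) and 1 -> 2 -| 3 (repression via node 2).
A_balancer = np.array([[-1.0, 0.0, 0.0],
                       [1.0, -1.0, 0.0],
                       [1.0, -1.0, -1.0]])

# Hypothetical opposer: node 2 has a zero diagonal (buffer action) and closes a
# negative feedback loop with the output node 3; there is no edge from node 3 to node 1.
A_opposer = np.array([[-1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0],
                      [1.0, -1.0, -1.0]])

y_balancer = step_response(A_balancer, k=3)
y_opposer = step_response(A_opposer, k=3)
print("balancer: peak %.3f, final %.2e" % (np.max(np.abs(y_balancer)), y_balancer[-1]))
print("opposer:  peak %.3f, final %.2e" % (np.max(np.abs(y_opposer)), y_opposer[-1]))
```

In both cases the output shows a transient excursion (non-zero sensitivity) and then returns to its pre-disturbance value, which is the signature of perfect adaptation in the linearized setting.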


Fig. 7 Simulation of the opposer topology. (a) refers to the disturbance input, (b) showcases the response of a 3-node opposer topology, whereas (c) demonstrates the same for a 5-node opposer topology. The simulation details can be obtained in the supporting information

The analysis in the foregoing sections assumes that all the adaptive modules function in isolation. This is far from reality. In fact, owing to their disturbance-rejection property, adaptive networks are often situated in cell signaling networks, where the adaptive modules are connected to a large downstream system. Therefore, it is necessary to verify whether the adaptive topologies remain functionally invariant with respect to downstream connections. In this work, we assume that the downstream system is connected only to the output node of the adaptive network in a feedback fashion. We previously showed using the proposed methodology that an adaptive topology (either a balancer or an opposer module without an edge from the output to the input-receiving node) retains its adaptive characteristic in the presence of a looped downstream connection with the output as long as the stability of the system is not altered [24]. Table 1 presents a schematic view of the mappings from the abstract mathematical conditions to the structural underpinnings. Figure 8 illustrates this fact by simulating both the opposer and balancer modules in the presence of downstream connections.
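The invariance to downstream connections can be illustrated on small examples. The sketch below assumes NumPy and models the downstream system as a single extra node connected only to the output node in a loop; the helper names, the 3-node module matrices, and the edge weights are hypothetical choices for illustration, not the configurations simulated in Fig. 8.

```python
import numpy as np


def attach_downstream(A, k, to_down, from_down, self_term=-1.0):
    """Append one downstream node that is connected only to output node k (1-based)."""
    N = A.shape[0]
    A_aug = np.zeros((N + 1, N + 1))
    A_aug[:N, :N] = A
    A_aug[N, k - 1] = to_down      # edge: output node k -> downstream node
    A_aug[k - 1, N] = from_down    # edge: downstream node -> output node k
    A_aug[N, N] = self_term        # self-degradation of the downstream node
    return A_aug


def steady_state_gain(A, k):
    """e_k^T A^{-1} e_1, i.e. the left side of (24) with alpha = beta = 1."""
    e1 = np.eye(A.shape[0])[:, 0]
    return float(np.linalg.solve(A, e1)[k - 1])


def is_stable(A):
    return bool(np.all(np.linalg.eigvals(A).real < 0))


# Hypothetical 3-node modules with input node 1 and output node 3.
A_balancer = np.array([[-1.0, 0.0, 0.0],
                       [1.0, -1.0, 0.0],
                       [1.0, -1.0, -1.0]])
A_opposer = np.array([[-1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0],
                      [1.0, -1.0, -1.0]])

# Downstream node in a negative loop with the output: stability is preserved.
bal_down = attach_downstream(A_balancer, k=3, to_down=1.0, from_down=-0.5)
opp_down = attach_downstream(A_opposer, k=3, to_down=1.0, from_down=-0.5)

# Strong positive loop with the output: the gain is still zero, but stability is lost.
bal_unstable = attach_downstream(A_balancer, k=3, to_down=1.0, from_down=2.0)

for name, A in [("balancer + downstream", bal_down),
                ("opposer + downstream", opp_down),
                ("balancer + destabilizing downstream", bal_unstable)]:
    print(name, "gain = %.1e" % steady_state_gain(A, 3), "stable =", is_stable(A))
```

The last case makes the stability caveat concrete: the zero-gain condition survives the connection, but a destabilizing downstream loop removes the steady state that adaptation relies on.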


Table 1 Finding the design principles for adaptation by first translating the design criteria to some systems requirements using systems and control theory, subsequently mapping the systems and controltheoretic conditions to structural requirements via algebraic graph theory

Fig. 8 Simulation of different combinations. (a) refers to the disturbance input, (b) showcases the response of a 3-node balancer topology and negative feedback. (c) demonstrates the behavior of a 5-node opposer topology when connected to a downstream system. Similarly, (d) demonstrates the behavior of a 5-node balancer in the presence of a downstream system. The simulation details can be obtained in the supporting information

4 Conclusions

Systems biology enables us to adopt a network-based approach to understanding complex biological systems. Together with the classical fields of biology, the network-level approach aids in drawing crucial and comprehensive insights that can, in turn, advance the development of efficient algorithms for synthetic design. For instance, the seminal work of [32, 33] revealed that oscillation involves the presence of two genes, namely per and cry in Drosophila. Further, Snoussi [34] showed that a two-node network needs to be connected in a negative feedback fashion to facilitate oscillation. As an extension of this result, Snoussi [34] also argued that a complete network of any size requires at least one negative feedback loop for sustained oscillation. Inspired by the advances in understanding oscillator networks, Elowitz and Leibler [35] synthesized a genetic repressilator that provided sustained oscillation with a tunable time period. A similar trajectory of progress can also be observed for toggle switches [34–36] and perfect adaptation [13, 37]. Therefore, a network-level treatment of emerging functionalities can provide a holistic understanding of the network structure and particular insights into the rate dynamics that are instrumental in the subsequent design procedure. As mentioned earlier, the approaches hitherto invested in discovering the design principles for perfect adaptation can be broadly divided into three categories: (1) computational screening, (2) rule-based, and (3) systems and control-theoretic approaches. Due to its explicit dependence on the particular rate kinetics and brute-force simulation, computational screening is neither generalizable nor scalable. On the other hand, the rule-based technique, albeit scalable, can only provide a subset of admissible network structures, thereby losing out on exhaustivity.
Although all three qualities (generalizability, scalability, and exhaustivity) can be met through systems and control-theoretic approaches, a thorough analysis was due. This chapter attempted to develop algorithms inspired by systems theory to deduce network structures for the different varieties of adaptation and their interconnections. Further, for each variant of perfect adaptation, we adopted a two-step approach for unraveling the design principles. At first, we asked the following question: what are the minimal (both in terms of nodes and edges) motifs that can perform adaptation? The answer to this question served as the basis for detecting all possible admissible network structures. In its entirety, the chapter contributes to the existing literature in the following ways: we attempt to deduce the network motifs that can produce perfect adaptation in the presence of deterministic disturbance. In order to obtain the minimal motifs in this case, we characterized perfect, infinitesimal adaptation by two well-known


performance parameters, namely sensitivity and precision. We argue that an adaptive response should ideally possess infinite precision and non-zero sensitivity. Further, the infinite precision and non-zero sensitivity can be translated to the requirement of a zero-gain, stable, linear, and time-invariant system controllable by external disturbance. Inferring these conditions using the wealth of algebraic graph theory enables us to obtain general structural requirements for adaptation. It is to be noticed at this juncture that due to the complicated dependence of the network structures on the frequency domain parameters of the system, it becomes increasingly difficult with the network size to draw any meaningful insight about the network structure from the mathematical conditions involving the parameters of the transfer function. Therefore, we propose a state-space-based approach to discover the network structures capable of perfect adaptation. Interestingly, with certain assumptions, the system matrix obtained by linearizing the nonlinear dynamical system around a steady state serves as the network structure’s digraph matrix. Therefore, posing the requirement for adaptation on the elements of the digraph matrix enabled us to draw valuable insights into the structural requirements for a network of any size using algebraic graph theory. The proposed stability conditions yield the most stringent set of network structures among the existing literature, thereby proving the conjecture by Araujo and Liotta [20]. Although this study proposes a scalable, generalizable algorithm that yields the adaptive topologies in an exhaustive manner, it consists of several assumptions and limitations that can be relaxed for more reliable predictions. The network structures obtained in this work assume identical internal environments across the cells, which is far from reality. 
Therefore, extending the proposed algorithm to the scenario of varying parameters can be an attractive area for future study. Secondly, this work assumes that the adaptive topologies are dedicated to adaptation, which is not realistic, since these network structures constitute only a tiny part of extensive, complex biochemical networks. Therefore, a proper treatment that takes the effect of unmodelled connections and inherent randomness into account is still an open problem. Further, the linearized treatment assumes that the disturbance is infinitesimally small and that the time interval between two subsequent step-type disturbances is larger than the system’s settling time. Further, the biological feasibility of the network requires the states of the underlying dynamical system to be positive, which cannot be guaranteed through the Jacobian analysis. This necessitates a nonlinear systems theory-based methodology that can further restrict the set of admissible network topologies for perfect adaptation. Overall, the methodology developed in this chapter does not require any particular knowledge of the rate dynamics of the underlying network. Therefore, the consequent structural


recommendations remain valid for many biological networks across the organism. Further, the proposed methodologies can potentially aid in drawing novel structural predictions for other functionalities crucial for the survival and growth of the living organism. Moreover, we hope the multi-disciplinary nature of the proposed methodology can encourage the multi-disciplinary research community to apply the framework of mathematical systems theory to shed light on some of the endless mysteries of life.
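The statement that the linearized system matrix doubles as the digraph matrix of the network can be made concrete with a toy model. The sketch below assumes NumPy and a purely hypothetical 3-node nonlinear model (it is not a model from this chapter): node 1 receives the disturbance `d`, node 2 is activated by node 1, and node 3 is activated by node 1 and repressed by node 2 through a ratio term. The Jacobian is estimated by central finite differences at the analytically known steady state.

```python
import numpy as np


def rhs(x, d):
    """Right-hand side of a hypothetical 3-node model with disturbance d at node 1."""
    x1, x2, x3 = x
    return np.array([d - x1,          # node 1: driven by the disturbance
                     x1 - x2,         # node 2: activated by node 1
                     x1 / x2 - x3])   # node 3: activated by node 1, repressed by node 2


def numerical_jacobian(f, x, h=1e-6):
    """Central finite-difference estimate of the Jacobian of f at x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    J = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        J[:, j] = (f(x + step) - f(x - step)) / (2.0 * h)
    return J


d = 2.0
x_ss = np.array([d, d, 1.0])     # steady state of this toy model: x1 = x2 = d, x3 = 1

A = numerical_jacobian(lambda x: rhs(x, d), x_ss)
sign_pattern = np.sign(np.round(A, 6)).astype(int)   # +1 activation, -1 repression, 0 no edge

# Minor of A obtained by deleting row 1 and column 3 (input node 1, output node 3).
minor_13 = A[1, 0] * A[2, 1] - A[1, 1] * A[2, 0]

print(np.round(A, 4))
print(sign_pattern)
print("Minor(A_{1,3}) = %.2e" % minor_13)
```

The sign pattern of the Jacobian recovers the wiring of the toy network (two opposing forward paths into node 3), and the vanishing minor reflects the fact that the steady-state output of this particular model does not depend on `d`.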

5 Notes

Necessary information for the simulation in this text can be obtained in https://github.com/RamanLab/SystemsTheoryAdaptation.

References

1. Marcelo B, Nan H, Dohlman G, Timothy C (2007) Mathematical and computational analysis of adaptation via feedback inhibition in signal transduction pathways. Biophys J 93:806–821
2. Bernardo M, Yuhai T (2003) Perfect and near-perfect adaptation in a model of bacterial chemotaxis. Biophys J 84:2943–2956
3. Goh LK, Sorkin A (2013) Endocytosis of receptor tyrosine kinases. Cold Spring Harb Perspect Biol 5:833–849
4. Xiao F, Doyle JC (2018) Robust perfect adaptation in biomolecular reaction networks. In: 2018 IEEE conference on decision and control (CDC), pp 4345–4352
5. Khammash MH (2021) Perfect adaptation in biology. Cell Syst 12(6):509–521
6. Königs V, de Oliveira Freitas Machado C, Arnold B, et al (2020) SRSF7 maintains its homeostasis through the expression of Split-ORFs and nuclear body assembly. Nat Struct Mol Biol 27(3):260–273
7. Vittadello ST, Stumpf MPH (2022) Open problems in mathematical biology. Math Biosci 354:108926. https://doi.org/10.1016/j.mbs.2022.108926
8. Bhattacharya P, Raman K, Tangirala A (2022) Discovering design principles for biological functionalities: perspectives from systems biology. J Biosci 47:1–23
9. Ma W, Trusina A, El-Samad H, et al (2009) Defining network topologies that can achieve biochemical adaptation. Cell 138:760–773
10. Milo R, Shen-Orr S, Itzkovitz S, et al (2002) Network motifs: simple building blocks of complex networks. Science 298:824–827
11. Qiao L, Zhao W, Tang C, et al (2019) Network topologies that can achieve dual function of adaptation and noise attenuation. Cell Syst 9:271–285
12. Eduardo S (2003) Adaptation and regulation with signal detection implies internal model. Syst Control Lett 50:119–126
13. Briat C, Gupta A, Khammash M (2016) Antithetic integral feedback ensures robust perfect adaptation in noisy biomolecular networks. Cell Syst 2:15–26
14. Briat C, Gupta A, Khammash M (2018) Antithetic proportional-integral feedback for reduced variance and improved control performance of stochastic reaction networks. J R Soc Interface 15:20180079
15. Drengstig T, Ueda H, Ruoff P (2008) Predicting perfect adaptation motifs in reaction kinetic networks. J Phys Chem B 112:16752–16758
16. Drengstig T, Kjosmoen T, Ruoff P (2011) On the relationship between sensitivity coefficients and transfer functions of reaction. J Phys Chem B 115:6272–6278
17. Waldherr S, Streif S, Allgöwer F (2012) Design of biomolecular network modifications to achieve adaptation. IET Syst Biol 6:223–31
18. Bhattacharya P, Raman K, Tangirala A (2018) A systems-theoretic approach towards designing biological networks for perfect adaptation. IFAC-PapersOnLine 51:307–312
19. Bhattacharya P, Raman K, Tangirala A (2021) Systems-theoretic approaches to design biological networks with desired functionalities. Methods Mol Biol 2189:133–155
20. Araujo RP, Liotta LA (2018) The topological requirements for robust perfect adaptation in networks of any size. Nat Commun 9:1757–1769
21. Wang Y, Huang Z, Antoneli F, Golubitsky M (2021) The structure of infinitesimal homeostasis in input–output networks. J Math Biol 82:1–43
22. Artyukhin A, Wu, Altschuler J (2009) Only two ways to achieve perfection. Cell Syst 138:619–621
23. Golubitsky M, Wang Y (2020) Infinitesimal homeostasis in three-node input-output networks. J Math Biol 80:1163–1185
24. Bhattacharya P, Raman K, Tangirala AK (2022) Discovering adaptation-capable biological network structures using control-theoretic approaches. PLoS Comput Biol 18(1):1–28. https://doi.org/10.1371/journal.pcbi.1009769
25. Khalil HK (2002) Nonlinear systems, 3rd edn. Prentice-Hall, Upper Saddle River, NJ. https://cds.cern.ch/record/1173048
26. Shankar S (2013) Nonlinear systems: analysis, stability, and control, 2nd edn. Prentice Hall, New Jersey. Springer Science and Business Media
27. Hespanha Joao P (2018) Linear systems theory, 2nd edn. Princeton University Press, Princeton
28. Maybee J, Driessche P, Olesky D, et al (1989) Matrices, digraphs, and determinants. Soc Ind Appl Math 10:500–519
29. Bullo F (2022) Lectures on network systems, 1.6 edn. Kindle Direct Publishing, Seattle. http://motion.me.ucsb.edu/book-lns
30. Rose HE (2002) Linear algebra: a pure mathematical approach, 1st edn. Birkhäuser, Basel
31. Khammash M (2021) Perfect adaptation in biology. Cell Syst 12:509–521
32. Saez L, Young MW (1996) Regulation of nuclear entry of the drosophila clock proteins period and timeless. Neuron 17(5):911–920
33. Hardin PE, Hall JC, Rosbash M (1992) Circadian oscillations in period gene mRNA levels are transcriptionally regulated. Proc Natl Acad Sci U S A 89:11711–11715
34. Snoussi E (1998) Necessary conditions for multistationarity and stable periodicity. J Biol Syst 6:3–9
35. Elowitz MB, Leibler S (2000) A synthetic oscillatory network of transcriptional regulators. Nature 403(6767):335–338. https://doi.org/10.1038/35002125
36. Lu J, Sherman D, Devor M, et al (2006) A putative flip–flop switch for control of REM sleep. Nature 441:589–594
37. Briat C, Khammash M (2018) Perfect adaptation and optimal equilibrium productivity in a simple microbial biofuel metabolic pathway using dynamic integral control. ACS Synth Biol 7(2):419–431. https://doi.org/10.1021/acssynbio.7b00188

Chapter 4

Construction of Xylose-Utilizing Cyanobacterial Chassis for Bioproduction Under Photomixotrophic Conditions

Xinyu Song, Yue Ju, Lei Chen, and Weiwen Zhang

Abstract

Xylose is a major component of lignocellulose and the second most abundant sugar in nature after glucose; it has therefore been considered a promising renewable resource for the production of biofuels and chemicals. However, no natural cyanobacterial strain is known to be capable of utilizing xylose. Here, we take the fast-growing cyanobacterium Synechococcus elongatus UTEX 2973 as an example to develop a synthetic biology-based methodology for constructing a new xylose-utilizing cyanobacterial chassis with increased acetyl-CoA for bioproduction.

Key words Xylose, Photomixotrophic, Cyanobacteria, Acetyl-CoA, Chassis engineering

1 Introduction

Photosynthetic organisms, especially cyanobacteria, have attracted great interest as microbial cell factories for producing value-added chemicals, owing to their ability to use solar energy to fix CO2 and the simplicity of their cultivation conditions and genetic manipulation [1–7]. However, despite various genetic engineering manipulations, the titer and productivity achieved in cyanobacterial cell factories remain far below the threshold needed for industrial application, which restricts large-scale utilization of the cyanobacterial chassis [8]. Photomixotrophic cultivation, which refers to the supplementation of photoautotrophic cultures with additional organic carbon substrates such as glucose, has recently been proposed as an effective alternative for improving cyanobacterial growth and chemical production [9]. Wang et al. revealed that the addition of glucose can dramatically promote biomass accumulation in the model cyanobacterium Synechocystis sp. PCC 6803 (hereafter Synechocystis 6803) [10]. Yoshikawa et al.

Xinyu Song and Yue Ju contributed equally with all other contributors. Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_4, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024


found that the intracellular pool sizes of glycolytic intermediates and vital metabolites of the oxidative pentose phosphate (OPP) pathway in Synechocystis 6803 were higher under photomixotrophic conditions with glucose, suggesting that the addition of reduced carbon feedstocks might be valuable for the production of chemicals in cyanobacterial chassis [11].

Xylose is a major component of lignocellulose and the second most abundant sugar in nature after glucose [12]; it has therefore been considered a promising renewable resource for producing biofuels and chemicals [13]. However, no natural cyanobacterial strain is known to be capable of utilizing xylose. To achieve xylose utilization in cyanobacteria, genes encoding xylose-specific transport systems, such as xylE, which codes for a relatively low-affinity xylose/proton symporter, or xylFGH, which codes for a high-affinity xylose transporter complex [14, 15], need to be introduced to transport xylose into the cells. Intracellular xylose is then converted to xylulose by xylose isomerase (XI, encoded by xylA), and xylulose is subsequently phosphorylated by xylulokinase (XK, encoded by xylB) to xylulose-5-phosphate (X5P), which enters the pentose phosphate pathway. Several attempts have recently been made to introduce a xylose-utilizing pathway into cyanobacteria [16–18]. For example, Lee et al. introduced xylose utilization genes (xylAB) into an ethylene-producing Synechocystis 6803 strain and found that ethylene production was enhanced by 64% under photoautotrophic conditions with 40 μE m-2 s-1 illumination in the presence of xylose [17]. In addition, the production of keto acids increased from 4.26 to 6.92 mM at 96 h of cultivation when xylAB was introduced into a glycogen-synthesis mutant, in which nearly half of the carbon in the keto acids was derived from xylose under photomixotrophic conditions [17]. In another study, McEwen et al. engineered 2,3-butanediol production in Synechococcus elongatus PCC 7942 (hereafter Synechococcus 7942) and found that production increased two- to four-fold upon supplementation with glucose or xylose [16].

Synechococcus elongatus UTEX 2973 (hereafter Synechococcus 2973) was recently identified as having a shorter doubling time and distinct advantages in tolerance to high light and high temperature, making it a potential chassis for cyanobacterial cell factories [19–21]. In a recent study, Song et al. increased sucrose secretion in Synechococcus 2973 under salt stress conditions by expressing the sucrose permease-encoding gene cscB; the extracellular sucrose productivity reached 35.5 mg/L/h [22]. Previously, Ducat et al. had found that sucrose export was undetectable in wild-type cultures of Synechococcus 7942, a close relative of Synechococcus 2973 [23]. Lin et al. further overexpressed the sucrose-phosphate phosphatase gene spp and the sucrose-phosphate synthase gene sps in this engineered strain and achieved the highest sucrose


productivity of 1.1 g/L/day without salt induction [24]. Hyejin et al. engineered a polyhydroxybutyrate (PHB)-producing Synechococcus 2973 strain by introducing heterologous phaCAB genes and achieved a PHB production of 420 mg/L in 10 days under photoautotrophic conditions, which was 2.4-fold higher than that of the native PHB producer Synechocystis 6803 under nitrogen deprivation [25]. All these studies demonstrate the application potential and value of the Synechococcus 2973 chassis in cyanobacterial cell factories.

Acetyl coenzyme A (hereafter acetyl-CoA) is a critical precursor metabolite for the biosynthesis of a variety of fine chemicals, such as fatty acids, polyketides, and isoprenoids [26]. A higher acetyl-CoA content could benefit carbon yield, which is one key factor determining economic feasibility, especially under photomixotrophic conditions with reduced carbon feedstocks [27]. Despite its faster cell growth, Synechococcus 2973 contains a smaller acetyl-CoA pool than other cyanobacteria, such as the model cyanobacterium Synechocystis 6803 [28]. To address this issue, Song et al. recently rewired the central carbon metabolism pathway by introducing a phosphoketolase gene (all1483) from Anabaena 7120 and a phosphotransacetylase gene (pta) from Clostridium kluyveri and deleting the native acetate kinase gene (ack) to improve the acetyl-CoA content in Synechocystis 6803 [27]. With these modifications combined, the acetyl-CoA content in the engineered Synechocystis 6803 was approximately two-fold higher than in the wild type [27].

In this chapter, we present detailed protocols for constructing a xylose-utilizing cyanobacterial chassis using a synthetic biology approach. In addition, to drive carbon flux from xylose to acetyl-CoA, the heterologous all1483 gene from Anabaena 7120 was expressed in the xylose-utilizing Synechococcus 2973, combined with phosphofructokinase (pfk) gene knockout and fructose-1,6-bisphosphatase (fbp) gene overexpression. 3-Hydroxypropionic acid (3-HP) was used as a “proof-of-molecule”; it is an important platform chemical that can be converted to various compounds, such as malonic acid, acrylic acid, and polyhydroxypropionic acid, and is widely applied in paints, polymers, daily chemical products, and food additives or preservatives. We demonstrate that 3-HP biosynthesis can be improved approximately 4.1-fold (from 22.5 to 91.3 mg/L) compared with the engineered strain without rewired metabolism under photomixotrophic conditions, and approximately 14-fold compared with the strain under photoautotrophic conditions. The strategy not only increases the intracellular concentration of acetyl-CoA but also demonstrates the potential to improve the production of value-added chemicals that require acetyl-CoA as a key precursor in the cyanobacterium Synechococcus 2973.
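The improvement quoted for 3-HP is a simple ratio of titers, and it is easy to confuse a fold change (ratio) with a relative increase (percentage). The following minimal sketch reproduces the arithmetic from the two titers given above; the helper names `fold_change` and `percent_increase` are illustrative and are not part of the published protocol.

```python
def fold_change(before: float, after: float) -> float:
    """Ratio of the new value to the reference value."""
    return after / before


def percent_increase(before: float, after: float) -> float:
    """Relative increase over the reference value, in percent."""
    return (after - before) / before * 100.0


# 3-HP titers (mg/L) quoted in the Introduction.
TITER_WITHOUT_REWIRING = 22.5
TITER_WITH_REWIRING = 91.3

if __name__ == "__main__":
    fc = fold_change(TITER_WITHOUT_REWIRING, TITER_WITH_REWIRING)
    pct = percent_increase(TITER_WITHOUT_REWIRING, TITER_WITH_REWIRING)
    print(f"fold change: {fc:.2f}")
    print(f"percent increase: {pct:.1f}%")
```

Reporting both numbers side by side avoids ambiguity when values from different sections of a protocol are compared.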

2 Materials and Equipment

2.1 Target Genes, Plasmids, and Strains

The xylose transporter-encoding gene xylE (b4031), xylulokinase-encoding gene xylB (b3564), and xylose isomerase-encoding gene xylA (b3565) were amplified from E. coli K-12 MG1655 (ATCC 700926) genomic DNA. The Pkt-encoding gene (pkt, all1483) was isolated from the Anabaena 7120 (ATCC 27893) genome. The endogenous Fbp-encoding gene fbp (M744_04600) and DNA fragments homologous to the regions upstream and downstream of gene pfk (M744_13890) were amplified from Synechococcus 2973 genomic DNA. The malonyl-CoA reductase (MCR)-encoding gene mcr (Caur_2614) was obtained from Chloroflexus aurantiacus. Genomic DNA of the above strains was taken from laboratory stock. The promoters Pcpc560, Ptrc, and PsbA3 used in this study were amplified according to a previous publication. The plasmids and strains used in this study are listed in Table 1. All primers used are provided in Table 2.

2.2 Medium

The composition of BG11 liquid medium is as follows: 17.6 mM NaNO3, 0.175 mM K2HPO4, 0.245 mM CaCl2·2H2O, 0.027 mM Na2EDTA, 0.3 mM MgSO4·7H2O, 0.023 mM (III) citrate, 0.046 mM H3BO3, 0.0091 mM MnCl2·4H2O, 0.00077 mM ZnSO4·7H2O, 0.00161 mM Na2MoO4·2H2O, 0.00032 mM CuSO4·5H2O, 0.00017 mM Co(NO3)2·6H2O, 0.189 mM Na2CO3, 0.031 mM citric acid, pH 7.5. For BG11 solid medium, 19 mM sodium thiosulfate, 10 mL/L 50× TES buffer, and 1.5% agar are added to the BG11 liquid medium, and the pH is adjusted to 8.2.
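When BG11 is prepared from dry reagents, the millimolar values above have to be converted to mass per litre. The sketch below does this for a few of the listed components. The molar masses are standard reference values supplied here as assumptions (they are not given in the chapter), and the function name `grams_per_litre` is illustrative.

```python
# Molar masses (g/mol): standard reference values added for illustration;
# they do not appear in the protocol itself.
MOLAR_MASS = {
    "NaNO3": 84.99,
    "K2HPO4": 174.18,
    "CaCl2·2H2O": 147.01,
    "MgSO4·7H2O": 246.47,
    "Na2CO3": 105.99,
}

# Concentrations (mM) copied from the BG11 recipe above.
BG11_MM = {
    "NaNO3": 17.6,
    "K2HPO4": 0.175,
    "CaCl2·2H2O": 0.245,
    "MgSO4·7H2O": 0.3,
    "Na2CO3": 0.189,
}


def grams_per_litre(conc_mm: float, molar_mass: float) -> float:
    """Convert mM to g/L: (mmol/L) * (g/mol) / 1000."""
    return conc_mm * molar_mass / 1000.0


if __name__ == "__main__":
    for name, conc in BG11_MM.items():
        mass = grams_per_litre(conc, MOLAR_MASS[name])
        print(f"{name}: {mass:.4f} g/L")
```

For trace components it is usually more practical to weigh out a concentrated stock and dilute it, since the per-litre masses are too small to weigh accurately.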

2.3 Chemicals and Reagents

Phusion DNA polymerase and all restriction enzymes used in this study were purchased from Thermo Fisher Scientific Inc. (Waltham, MA, USA). The molecular biology kits used were from Omega Bio-Tek Inc. (Norcross, GA, USA), Transgene Inc. (Beijing, China), or Thermo Scientific Inc. (Waltham, MA, USA). DNA oligonucleotide synthesis and DNA sequencing were provided by Genewiz Inc. (Suzhou, Jiangsu, China). Xylose standard was purchased from Macklin Inc. (Shanghai, China). Acetate standard was purchased from Fengchuan Inc. (Tianjin, China). The 3-HP standard was purchased from TCI (Shanghai) Development Co., Ltd. (Shanghai, China).

2.4 Sample Content Analysis

Xylose, acetate, and 3-HP were all measured by high-performance liquid chromatography (HPLC). The samples were analyzed on an Agilent 1260 series HPLC (Agilent Technologies, Santa Clara, CA, USA) equipped with an Aminex HPX-87H 300 × 7.8 mm ion exchange column (Bio-Rad, Hercules, CA, USA). Xylose, acetate, and


Table 1 Plasmids and strains used in this study

Strains/plasmids    Genotypes                                             References
Plasmids
pJQ01               NSII targeting vector, Pcpc560-xylEBA-Trbcl; kanR     [18]
pJQ02               NSI targeting vector, Ptrc-pkt-Trbcl; cmR             [18]
pJQ03               pfk targeting vector, Pcpc560-fbp-Trbcl; ampR; emR    [18]
pJQ04               NSIII targeting vector, PsbA3-mcr-Trbcl; speR         [18]
Strains
E. coli DH5α        Cloning host strain                                   TransGenBiotech
E. coli HB101       Conjugation of Synechococcus 2973                     [29]
Synechococcus 2973  Wild-type Synechococcus 2973                          [18]
JQ01                NSII::Pcpc560-xylEBA-Trbcl in WT                      [18]
JQ02                NSI::Ptrc-pkt-Trbcl in JQ01                           [18]
JQ03                pfk::Pcpc560-fbp-Trbcl in JQ01                        [18]
JQ04                pfk::Pcpc560-fbp-Trbcl in JQ02                        [18]
WT-mcr              NSIII::PsbA3-mcr-Trbcl in WT                          [18]
JQ01-mcr            NSIII::PsbA3-mcr-Trbcl in JQ01                        [18]
JQ02-mcr            NSIII::PsbA3-mcr-Trbcl in JQ02                        [18]
JQ03-mcr            NSIII::PsbA3-mcr-Trbcl in JQ03                        [18]
JQ04-mcr            NSIII::PsbA3-mcr-Trbcl in JQ04                        [18]

3-HP were detected with an Agilent 1260 series Refractive Index Detector. Metabolomic analysis and acetyl-CoA quantitation assays were carried out on an Agilent 1260 series binary HPLC (Agilent Technologies, Santa Clara, CA, USA) using an XBridge Amide column (150 × 2.1 mm, 3.5 μm; Waters, Milford, MA, USA) coupled to an Agilent 6410 550 triple quadrupole mass analyzer equipped with an electrospray ionization source (ESI). The solvents of samples were removed using a vacuum concentrator system (ZLS-1, Her-exi, Hunan, China).

2.5 Equipment

The optical density of the cells was measured at 750 nm (OD750) with a spectrophotometer (UV-1750, Shimadzu, Kyoto, Japan). E. coli was cultured in a constant temperature incubator (HPX-9162 MBE, Boxun, Shanghai, China). Synechococcus 2973 wild-type and engineered strains were cultured in an illumination incubator (PGX-150B, Zhongyiguoke, Beijing, China). All the


Table 2 Primers used in this study

Name               Sequence (5′–3′)
P560-F             ACCTGTAGAGAAGAGTCCCTG
P560-R             TGAATTAATCTCCTACTTGAC
Ptrc-F             ATGAGCTGTTGACAATTAATC
Ptrc-R             GTGTGAAATTGTTATCCGC
Psba3-F            AATTCGCAAAGTTTTGTTATTATTAGC
Psba3-R            GATGTTTTGAGTCCAGTGAATTTTTATG
Trbcl-F            ACCGGTGTTTGGATTGTCGG
Trbcl-R            GCTGTCGAAGTTGAACATC
xylE-F             ATGAATACCCAGTATAATTCCAG
xylE-rbs-xylB-R    CCGATATACATATATTACCTCCTTTACAGCGTA
xylB-F             ATGTATATCGGGATAGATC
xylB-rbs-xylA-R    GGCTTGCATATATTACCTCCTTTACGCCATTAATGGC
rbs-xylA-F         AGGAGGTAATATATGCAAGCCTATTTTGACC
xylA-R             TTATTTGTCGAACAGATAATG
xylA-rt-F          ATGCAAGCCTATTTTGACC
xylA-rt-R          TTATTTGTCGAACAGATAATGG
xylB-rt-F          ATGTATATCGGGATAGATCTTG
xylB-rt-R          TTACGCCATTAATGGCAGAAG
xylE-rt-F          ATGAATACCCAGTATAATTCCAG
xylE-rt-R          TTACAGCGTAGCAGTTTG
pkt-cz-F           ATTTCACACAATGACTATTTCTCCTTC
pkt-cz-R           CAAACACCGGTTTAGTAAGGCC
fbp-F              ATGGCTCAATCCACCACTTCCG
fbp-R              TTAGGGCACCGACTCAGCCAAG
pfk-us-F           ATCGCTGCCAGTTCCGAGTCAGC
pfk-us-R           CGCGCACCCTTTCGATCGGCTTTGC
pfk-ds-R           ACGATCGCCCCCGCTTTTGC
mcr-cz-F           CTCAAAACATCATGAGCGGAACAGG
mcr-cz-R           AACACCGGTTTACACGGTAATCG

fragments were generated by polymerase chain reaction (PCR) using one of the following instruments: Veriti 96-Well Thermal Cycler (Applied Biosystems, Waltham, MA, USA), T100™ Thermal Cycler (Bio-Rad, Hercules, CA, USA), or FastAmp-T96 (Bio-DL, Shanghai, China). Centrifugation used a benchtop microcentrifuge (Microfuge 16, Beckman, Shanghai, China) or a refrigerated centrifuge (Centrifuge 5430R, Eppendorf, Hamburg, Germany).
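Several primers in Table 2 carry overlaps that join xylE, xylB, and xylA through a shared ribosome binding site, so it can be worth checking the junctions computationally before ordering oligonucleotides. The sketch below is one possible check, not part of the published protocol: it computes the reverse complement of xylE-rbs-xylB-R and confirms that its 3′ end matches the start of xylB-F. The helper names are illustrative, and the sequences are copied as read from Table 2.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Sequences as read from Table 2.
XYLE_RBS_XYLB_R = "CCGATATACATATATTACCTCCTTTACAGCGTA"
XYLB_F = "ATGTATATCGGGATAGATC"
RBS = "AGGAGGTAATAT"  # RBS spacer as it appears at the 5' end of rbs-xylA-F

if __name__ == "__main__":
    top_strand = reverse_complement(XYLE_RBS_XYLB_R)
    print("top strand of junction:", top_strand)
    print("contains RBS:", RBS in top_strand)
    print("ends with start of xylB:", top_strand.endswith(XYLB_F[:11]))
    print(f"GC fraction of xylB-F: {gc_fraction(XYLB_F):.2f}")
```

The same two helpers can be applied to the xylB-rbs-xylA-R / rbs-xylA-F pair to check the second junction.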

3 Methods

3.1 Determination of Xylose-Utilizing Capability of Wild-Type Synechococcus 2973

To determine whether wild-type Synechococcus 2973 can utilize xylose naturally, the optical density and xylose content are analyzed while the wild-type strain is grown under photomixotrophic conditions with 2 g/L xylose.

1. Culture wild-type Synechococcus 2973 cells in BG11 medium under 200 μmol photons m-2 s-1 illumination at 37 °C; for photomixotrophic cultivation, add 2 g/L xylose to the BG11 medium.
2. Remove a 200 μL sample of the culture daily and measure the optical density of the cells at 750 nm (OD750).
3. Draw a line graph with time as the horizontal axis and cell density as the vertical axis (see Fig. 1a).
4. Centrifuge the 200 μL sample at 12,000 × g for 5 min and then filter the supernatant through a 0.22 μm nylon syringe filter for xylose analysis.
5. Run the HPLC program using 5 mM H2SO4 as the solvent at a flow rate of 0.6 mL/min for 30 min, maintaining the column at 65 °C. Integrate the xylose peak at the retention time of 9.69 min [17].
6. Draw a line graph with time as the horizontal axis and xylose content as the vertical axis (see Fig. 1b).

As shown in Fig. 1a, b, during the first 4 days of cultivation no difference in cell growth was observed between photomixotrophic cultivation with added xylose and photoautotrophic cultivation. However, once the cells entered the late stationary phase, a slight improvement in cell density and xylose consumption was observed. This is consistent with previous findings, which have shown that no natural cyanobacterial strain is known to utilize xylose [16, 17].
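Step 5 yields a peak area rather than a concentration, so a calibration curve built from the xylose standard is needed to obtain the xylose content plotted in step 6. The following is a minimal sketch of that conversion using an ordinary least-squares line. The standard concentrations and peak areas are hypothetical placeholders, not measured values, and the function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def area_to_concentration(area, slope, intercept):
    """Invert the calibration line to convert a peak area to g/L."""
    return (area - intercept) / slope


# HYPOTHETICAL calibration data: xylose standards (g/L) and RID peak areas.
STANDARD_G_PER_L = [0.5, 1.0, 2.0, 4.0]
STANDARD_AREA = [1000.0, 2000.0, 4000.0, 8000.0]

slope, intercept = fit_line(STANDARD_G_PER_L, STANDARD_AREA)

if __name__ == "__main__":
    sample_area = 3000.0  # hypothetical sample peak area
    conc = area_to_concentration(sample_area, slope, intercept)
    print(f"estimated xylose: {conc:.2f} g/L")
```

Samples whose peak areas fall outside the range of the standards should be diluted and re-run rather than extrapolated.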

3.2 Construction of Xylose-Utilization Pathway in Synechococcus 2973

To engineer xylose-utilizing Synechococcus 2973, three xylose assimilation-related genes, xylE, xylB, and xylA, were cloned from Escherichia coli and integrated into neutral site (NS) II of Synechococcus 2973 under the control of the super strong promoter Pcpc560, resulting in strain JQ01.

1. Construct the recombinant plasmid pJQ01 containing the target genes xylE, xylB, and xylA, using E. coli DH5α (see Fig. 1c).
2. Cultivate E. coli DH5α harboring plasmid pJQ01 and E. coli HB101 harboring pRL443 and pRL623 overnight, and then inoculate each into fresh LB liquid medium with appropriate antibiotics at a ratio of 1:100 to 1:50.


Fig. 1 Engineering of a xylose-utilizing pathway into Synechococcus 2973. (a) Cell growth of Synechococcus 2973 wild-type and engineered strain JQ01 under 200 μmol photons m-2 s-1 illumination at 37 °C under photoautotrophic conditions or under photomixotrophic conditions with 2 g/L xylose added. (b) Xylose consumption of Synechococcus 2973 wild-type and engineered strain JQ01 under photomixotrophic conditions with 2 g/L xylose. (c) Schematic representation of the integration of target genes related to the xylose-utilizing pathway into neutral site II in the chromosome of Synechococcus 2973. KanR, kanamycin resistance gene. Pcpc560, promoter. rbs, ribosome binding site. Trbcl, terminator. (d) Expression of the target genes related to the xylose-utilizing pathway, analyzed by reverse-transcription PCR. Lanes 1, 3, and 5 are control samples with Synechococcus 2973 wild-type cDNA as the template. Lanes 2, 4, and 6 are samples with strain JQ01 cDNA as the template

3. Take 10 mL of each culture at the exponential growth phase (OD600 ≈ 0.5), centrifuge at 4000 × g and 4 °C for 8 min, and then wash the cells three times with 1 mL fresh LB medium.
4. Resuspend each of the two strains in 0.2 mL of fresh LB, mix them together, and incubate at 37 °C for 30 min.
5. Centrifuge 10 mL of wild-type Synechococcus 2973 at the exponential growth phase (OD750 ≈ 1) at 4 °C and 4000 × g for 6 min. Discard the supernatant.
6. Resuspend the cell pellet in 0.2 mL BG11 medium and wash twice with BG11 medium to remove antibiotic(s).
7. Mix the cyanobacterial cells with the E. coli mixture from step 4 and incubate at 37 °C for 30 min.
8. Spread 0.4 mL of the mixture on sterile filters (0.45 μm pore size) and place them on BG11 agar plates without antibiotics for a 24-h incubation at 37 °C and an illumination intensity of 100 μmol photons m-2 s-1.
9. Transfer the filter onto a new BG11 agar plate with appropriate antibiotics and incubate the plate at an illumination intensity of 100 μmol photons m-2 s-1 for 5 days.
10. Select newly grown colonies from the BG11 agar plate for verification and determine the expression of the three heterologous genes by reverse-transcription PCR; the positive strain is defined as strain JQ01 (see Fig. 1d).

Reverse-transcription PCR is performed as follows, using conventional commercial bacterial RNA extraction and reverse transcription kits:

(a) Subculture Synechococcus 2973 wild-type and engineered strain JQ01 for 2 days until they reach the exponential growth phase, centrifuge cells at the same OD750, discard the supernatant, and collect the cells.
(b) Resuspend the cells in 500 μL Trizol, transfer to a 2.5 mL Eppendorf centrifuge tube, add 2 scoops of zirconia beads for grinding, and run the cell disruptor (setting: 4000 times, 3 min).
(c) Centrifuge the lysate at 12,000 rpm for 30 s to pellet cell debris, transfer all the supernatant to a 1.5 mL RNase-free centrifuge tube, add 500 μL RNase-free absolute ethanol, and mix well.
(d) Transfer the extract to an RNase-free adsorption column and centrifuge at 12,000 rpm for 1 min.
(e) Transfer the adsorption column to a new collection tube, add 400 μL Direct-zol RNA prewash buffer, centrifuge at 12,000 rpm for 1 min, discard the waste liquid, and repeat the wash once.
(f) Add 700 μL RNA wash buffer, centrifuge at 12,000 rpm for 1 min, discard the waste liquid, and centrifuge once more.
(g) Transfer the adsorption column to a new RNase-free 1.5 mL centrifuge tube, add 30–50 μL DNase/RNase-free water, let stand for 1 min, and centrifuge at 12,000 rpm for 2 min to obtain total RNA.
(h) Use a NanoDrop to determine the concentration and quality of the total RNA.

Cyanobacterial cDNA is synthesized as follows:

(a) Transfer 500 ng of total RNA to a 200 μL RNase-free PCR tube.
(b) Add 4 μL 4 × gDNA wiper Mix to the PCR tube.
(c) Add water to a final volume of 16 μL, mix evenly by pipetting, and centrifuge for 30 s.
(d) Incubate at 42 °C for 2 min.
(e) Add 4 μL qRT SuperMix II to the PCR tube, mix well by pipetting, and centrifuge briefly for 30 s.
(f) React at 50 °C for 15 min and then incubate at 85 °C for 5 s.

Reverse transcription PCR (RT-PCR) follows the same procedure as verification PCR: the cDNA product is used as the template, specific oligonucleotide primers are designed on the Oligo Architect™ Online web page, and the PCR reactions are performed directly.

11. Analyze the cell growth and xylose utilization of strain JQ01 as described in Subheading 3.1.

As expected, the cell growth of strain JQ01 was significantly improved under photomixotrophic conditions with 2 g/L xylose compared with that under photoautotrophic conditions; consistently, xylose consumption was also observed (see Fig. 1a, b).

3.3 Determination of Xylose Content During Photomixotrophic Conditions

Given the obvious improvement in cell growth of strain JQ01 under photomixotrophic conditions, we further determine the possible effects of xylose concentration on cell growth.

1. Incubate strain JQ01 in medium supplemented with different concentrations of xylose: 2, 4, 6, 8, and 10 g/L.
2. Measure the optical density of the cells at 750 nm (OD750) and draw a line graph with time as the horizontal axis and cell density as the vertical axis (see Fig. 2a).

The overall growth rate was improved, and the highest OD750 was observed after 5 days at an initial xylose concentration of 6 g/L. The xylose addition was therefore set to 6 g/L in all subsequent experiments.
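Setting up the concentration series in step 1 comes down to a C1V1 = C2V2 dilution. The sketch below computes how much xylose stock to add for each target concentration. It assumes a hypothetical 200 g/L sterile xylose stock and a 50 mL final culture volume; neither value is specified in the protocol, and the function name is illustrative.

```python
def stock_volume_ml(target_g_per_l: float,
                    final_volume_ml: float,
                    stock_g_per_l: float) -> float:
    """Volume of stock (mL) giving the target concentration in the final volume."""
    if target_g_per_l > stock_g_per_l:
        raise ValueError("target concentration exceeds stock concentration")
    return target_g_per_l * final_volume_ml / stock_g_per_l


# ASSUMED values for illustration only (not given in the protocol).
STOCK_G_PER_L = 200.0
FINAL_VOLUME_ML = 50.0

# Target concentrations (g/L) from step 1.
TARGETS_G_PER_L = [2, 4, 6, 8, 10]

if __name__ == "__main__":
    for target in TARGETS_G_PER_L:
        vol = stock_volume_ml(target, FINAL_VOLUME_ML, STOCK_G_PER_L)
        print(f"{target} g/L: add {vol:.2f} mL stock, "
              f"bring to {FINAL_VOLUME_ML:.0f} mL with BG11")
```

Keeping the final volume constant across the series means that only the xylose concentration differs between cultures.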

3.4 Determining the Efficiency of Converting Xylose to Intracellular Acetyl-CoA in Strain JQ01

Carbon yield is one key factor that determines the economic feasibility of a microbial chassis. Acetyl-CoA is a critical precursor metabolite for the biosynthesis of many fine chemicals, such as fatty acids, polyketides, and isoprenoids [26, 27]. It is thus valuable to determine how efficiently assimilated xylose is converted to acetyl-CoA in strain JQ01 under photomixotrophic conditions with 6 g/L xylose. Acetyl-CoA quantitation assays are performed using a previously described LC–MS method [30], except that the mass spectrometer ESI is operated in positive ion mode.

1. Dissolve an acetyl-CoA standard, dilute it with ddH2O before use, and store it at -20 °C.


Fig. 2 Optimization of the xylose concentration for strain JQ01 under photomixotrophic conditions. (a) Effect of xylose concentration on cell growth in strain JQ01. (b) Comparison of acetyl-CoA abundance in strain JQ01 under photoautotrophic conditions (black bar) and photomixotrophic conditions with 6 g/L xylose (red bar). Assays were performed independently in triplicate, and standard deviations (SD) are indicated by error bars. *** represents p value < 0.001

2. Rapidly collect cell samples (10 OD750) at 48 h by centrifugation at 4000 × g for 10 min at 4 °C, quench them in liquid nitrogen for 5 min, and then freeze the samples at -80 °C until extraction.
3. Resuspend the pellets in 800 μL 80% methanol (v/v), freeze-thaw three times, and then centrifuge at 14,000 × g for 5 min at 4 °C to obtain the supernatants.
4. Repeat the extraction with 600 μL of 80% methanol and combine each supernatant with the corresponding supernatant from step 3.
5. Remove the solvent using a vacuum concentrator system and dissolve the metabolite residue in 100 μL ddH2O.
6. Run the program on an Agilent 1260 series binary HPLC using an XBridge Amide column (150 × 2.1 mm, 3.5 μm; Waters, Milford, MA, USA) coupled to an Agilent 6410 550 triple quadrupole mass analyzer equipped with an electrospray ionization source (ESI). Mobile phase A is the aqueous phase, acetonitrile:water (2:98) with 0.1% ammonium hydroxide and 0.2% formic acid. Mobile phase B is the organic phase, acetonitrile:water (95:5) with 0.075% ammonium hydroxide and 0.1% formic acid. The linear gradient is as follows: 0–5 min, 100% B; 5–15 min, 100–55% B; 15–25 min, 55–25% B; 25–26 min, 25–15% B; 26–29 min, 15% B; 29–30 min, 15–100% B; and 30–37 min, 100% B. The total run time is 37 min at a flow rate of 0.2 mL/min.


7. Integrate the acetyl-CoA peak of each sample by reference to the acetyl-CoA standard and compare the acetyl-CoA abundance in strain JQ01 under photoautotrophic conditions with that under photomixotrophic conditions with 6 g/L xylose (see Fig. 2b).

These results show that the acetyl-CoA content in strain JQ01 increased from 3.60 μg/g under autotrophic conditions to 17.25 μg/g under photomixotrophic conditions with 6 g/L xylose, an increase of approximately 3.8-fold. The increased acetyl-CoA content might lead to higher productivity of acetyl-CoA-derived target products under photomixotrophic conditions.

3.5 Analyze the Metabolic Changes of Strain JQ01 Under Photomixotrophic Conditions

The intracellular concentration of acetyl-CoA in cyanobacteria is typically lower than that in heterotrophic E. coli. This is especially true for Synechococcus 2973, which has an even lower intracellular acetyl-CoA concentration than Synechocystis 6803 [28]. To further enhance the productivity of the new xylose-utilizing cyanobacterial chassis, the central carbon metabolism was analyzed by LC–MS-based metabolomics. Extraction and measurement of the metabolites were performed according to previous publications with the following modifications [30–33].

1. Dissolve the candidate metabolite standards, including erythrose-4-phosphate (E4P), fructose-1,6-bisphosphate (FBP), fructose-6-phosphate (F6P), glyceraldehyde-3-phosphate (G3P), α-ketoglutarate (AKG), phosphoenolpyruvate (PEP), 3-phosphoglycerate (3-PG), ribose-5-phosphate (R5P), and ribulose-1,5-bisphosphate (RuBP), dilute with ddH2O before use, and store at -20 °C.
2. Cultivate strain JQ01 under photomixotrophic conditions with 6 g/L xylose and under autotrophic conditions, respectively.
3. Rapidly collect cell samples (10 OD750) at 48 h by centrifugation at 4000 × g for 10 min at 4 °C, quench them in liquid nitrogen for 5 min, and then freeze the samples at -80 °C until extraction.
4. Extract the intracellular metabolites as described in Subheading 3.4, steps 3–5, and run the program as described in Subheading 3.4, step 6.
5. Integrate the peaks of the samples by reference to the corresponding standards and compare the changes in central carbon metabolites in strain JQ01 between autotrophic and photomixotrophic conditions.
6. Normalize the data from photomixotrophic conditions by setting the value under autotrophic conditions to 1 for each metabolite (see Fig. 3).

Several pivotal metabolites involved in the Embden-Meyerhof-Parnas (EMP) and oxidative pentose phosphate (OPP) pathways, including F6P, FBP, G3P, R5P, and E4P,


Fig. 3 Targeted LC-MS metabolomic analysis of the engineered strain JQ01 under photoautotrophic and photomixotrophic conditions. The black bar represents photoautotrophic conditions, and the red bar represents photomixotrophic conditions. Assays were performed independently in triplicate, and standard deviations (SD) are indicated by error bars. A significant difference is indicated by an asterisk (*). Abbreviations: acetyl coenzyme A (Ac-CoA), acetyl-phosphate (AcP), citrate (CIT), dihydroxyacetone phosphate (DHAP), erythrose-4-phosphate (E4P), fructose-1,6-bisphosphate (FBP), fructose-6-phosphate (F6P), fumarate (FUM), glucose-6-phosphate (G6P), glyceraldehyde-3-phosphate (G3P), isocitrate (ICIT), α-ketoglutarate (AKG), malate (MAL), oxaloacetate (OAA), phosphoenolpyruvate (PEP), pyruvate (PYR), 3-phosphoglycerate (3-PG), ribose-5-phosphate (R5P), ribulose-1,5-bisphosphate (RuBP), ribulose-5-phosphate (Ru5P), sedoheptulose-7-phosphate (S7P), succinate (SUC), tricarboxylic acid (TCA), xylulose-5-phosphate (X5P), xylose isomerase gene (xylA), xylose transporter gene (xylE), and xylulokinase gene (xylB)

were significantly increased under photomixotrophic conditions, suggesting that intracellular central carbohydrate metabolism in Synechococcus 2973 was enhanced by photomixotrophic cultivation supplemented with xylose. Interestingly, although the F6P content increased up to 21.5-fold under photomixotrophic conditions compared with photoautotrophic conditions, acetyl-CoA increased only 3.8-fold, which led us to hypothesize that acetyl-CoA


Fig. 4 Engineering the central carbon metabolism pathway to drive more carbon flux to acetyl-CoA. (a–c) Schematics of engineered strains JQ02, JQ03, and JQ04, respectively. CmR, chloramphenicol resistance gene. Ptrc, promoter. EmR, erythromycin resistance gene. Pcpc560, promoter. rbs, ribosome binding site. Trbcl, terminator. (d) Analysis of the intracellular acetyl-CoA content in engineered strains under photomixotrophic conditions. Assays were performed independently in triplicate, and standard deviations (SD) are indicated by error bars. DCW, dry cell weight

could be further increased by modifying native central carbohydrate metabolism. 3.6 Drive More Carbon Flux to AcetylCoA by Rewiring Central Carbon Metabolism

To improve the conversion efficiency from xylose to acetyl-CoA, the central carbon metabolism is rewired by synthetic biology to enlarge the acetyl-CoA pool.

1. Introduce the phosphoketolase-encoding gene pkt (all1483) from Anabaena 7120 into strain JQ01 by inserting it into neutral site I to drive more carbon flux from F6P and X5P to acetyl-CoA, resulting in strain JQ02 (see Fig. 4a).
2. Delete the pfk gene, which is responsible for the reaction from F6P to FBP, by inserting all1483 into the native pfk site in strain JQ01, to further drive carbon flux from FBP to F6P, resulting in strain JQ03 (see Fig. 4b).
3. Analyze the acetyl-CoA content in all the engineered strains after 5 days of cultivation. Synechococcus 2973 culture (20 mL) is harvested, vacuum freeze-dried, and weighed. The dry cell weight (DCW) is calculated in units of g/L. The intracellular acetyl-CoA contents of strains JQ02 and JQ03 were 22.3 μg/g DCW and 47.1 μg/g DCW, respectively, representing increases of 24.6% and 163.1% compared with the control strain JQ01 (17.9 μg/g DCW) (see Fig. 4d).
4. Integrate the modifications of both strain JQ02 and strain JQ03 to generate strain JQ04 (see Fig. 4c) and analyze the acetyl-CoA content in all the engineered strains after 5 days of cultivation. The acetyl-CoA content in strain JQ04 was further enhanced to 58.6 μg/g DCW, representing an increase of 227.4% compared with the control strain JQ01 (see Fig. 4d).

3.7 Bioproduction of 3-HP in the Rewired Chassis Under Photomixotrophic Conditions

3-HP was selected as a “proof-of-molecule” to determine the effectiveness of the chassis with rewired central carbohydrate metabolism. As the most important precursor of 3-HP production, malonyl-CoA is generated from acetyl-CoA by acetyl-CoA carboxylase, and the final product 3-HP is produced via a two-step reaction catalyzed by malonyl-CoA reductase (MCR).

1. Clone the malonyl-CoA reductase-encoding gene mcr (Caur_2614) from Chloroflexus aurantiacus.
2. Introduce the mcr gene into the NSIII neutral site, under the control of the strong PsbA3 promoter, in the genome of strains JQ01, JQ02, JQ03, and JQ04, generating strains JQ01-mcr, JQ02-mcr, JQ03-mcr, and JQ04-mcr.
3. Cultivate these engineered strains in an illuminating incubator under continuous illumination of 200 μmol photons m⁻² s⁻¹ at 37 °C.
4. Determine the cell growth of these four engineered strains and draw a line graph with time as the horizontal axis and cell density as the vertical axis (see Fig. 5a).
5. Analyze the xylose consumption of these four engineered strains as described in Subheading 3.1, steps 4 and 5, and draw a line graph with time as the horizontal axis and xylose content as the vertical axis (see Fig. 5b).
6. Remove 200 μL of culture from each of the four engineered strains daily and centrifuge the samples at 12,000 × g for 5 min.
7. Filter the supernatants, freeze-dry them for 24 h, and then dissolve each sample in 40 μL ddH2O.
8. Prepare the HPLC system with an Aminex HPX-87H 300 × 7.8 mm ion exchange column. Run the program using 5 mM H2SO4 as the solvent at a flow rate of 0.6 mL/min for 30 min and maintain the column at 65 °C.
9. Detect 3-HP with an Agilent 1260 series refractive index detector and record the peak at a retention time of 13.34 min. Draw a line graph with time as the horizontal axis and 3-HP titer as the vertical axis (see Fig. 5c).
10. Increase the cultivation temperature from 37 to 41 °C to optimize the productivity of strain JQ04-mcr.
11. Determine the xylose concentration and 3-HP production of JQ04-mcr cultivated at 41 °C and draw a line graph with time as the horizontal axis and xylose content and 3-HP titer as the vertical axis (see Fig. 5d).

Fig. 5 Effect of the rewired central carbon metabolism on 3-HP production. (a–c) Cell growth, xylose consumption, and 3-HP production of the 3-HP-producing strains under photomixotrophic conditions, respectively. (d) 3-HP production and xylose concentration of strain JQ04-mcr cultivated at 41 °C under photomixotrophic conditions. Assays were performed independently in triplicate, and standard deviations (SD) are indicated by error bars
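The sample preparation in steps 6 and 7 concentrates each sample before HPLC, so a concentration read from the chromatogram has to be scaled back to the culture volume. The following Python sketch illustrates that arithmetic only. It assumes that the whole 200 μL aliquot is recovered through centrifugation, filtration, and freeze-drying (losses are ignored); the function names and the example reading are hypothetical and are not taken from this chapter.

```python
def concentration_factor(sampled_volume_ul: float, redissolved_volume_ul: float) -> float:
    """Fold concentration obtained by drying a sample and redissolving it in less liquid."""
    if sampled_volume_ul <= 0 or redissolved_volume_ul <= 0:
        raise ValueError("volumes must be positive")
    return sampled_volume_ul / redissolved_volume_ul


def culture_titer(measured_conc: float, factor: float) -> float:
    """Scale a concentration measured in the concentrated sample back to the culture."""
    return measured_conc / factor


# Step 6 samples 200 uL of culture; step 7 redissolves the dried sample in 40 uL.
factor = concentration_factor(200.0, 40.0)

# Hypothetical HPLC reading (mg/L) used only to show the back-calculation.
example_reading_mg_per_l = 500.0
print(factor)                                           # 5.0
print(culture_titer(example_reading_mg_per_l, factor))  # 100.0
```

If recovery is known to be incomplete, a measured recovery fraction would have to be included in the factor; that refinement is outside the protocol as written.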

4 Notes

1. To prepare BG11 medium for cyanobacterial cultivation, CaCl2 and ferric ammonium citrate need to be autoclaved separately and added only after the other autoclaved components of the medium have cooled, to avoid precipitation [34].
2. Avoid bubbles in the 200 μL of liquid when measuring the optical density of cells in a 96-well plate; otherwise, the accuracy of the measurement will be affected [34].
3. To obtain positive transformants, make sure that the sterile filter is fully attached to the solid plate during conjugation of recombinant plasmids into Synechococcus 2973 [30].
4. When performing HPLC, the mobile phase must undergo ultrasonic degassing, and the liquid must not be shaken violently after degassing is completed [36].
5. Samples used for the detection of metabolites or 3-HP need to be concentrated before LC or LC–MS analysis [7].
6. For the insertion or deletion of genes in the genome, the corresponding plasmids need to be constructed first. Draw a map according to the plasmid function described in the article, amplify the required fragments with the primers in Table 2 using the relevant template (genome, plasmids, or other PCR products), and then purify the PCR products (using a commercial PCR purification kit). The fragments are cloned into the vector by blunt-end ligation. The plasmid is transformed into E. coli DH5α by the heat shock method, and a positive single colony is verified by PCR. Transformation of the correct plasmid into cyanobacteria follows the steps in Subheading 3.2. For specific experimental steps, please refer to the chapter published by Li et al. in MIMB in 2021 [34].
7. Under photomixotrophic conditions, cyanobacteria can utilize inorganic carbon sources (such as CO2, fixed through the Calvin cycle) and simultaneously utilize organic carbon sources (such as glucose and acetate). Photomixotrophic conditions can enhance the photosynthesis of cyanobacteria and promote strain growth and biomass accumulation. Acetyl-CoA is a key precursor of many chemicals, and an increase in its content may lead to an increase in the production of its derivative chemicals [27].
8. For each metabolite in Fig. 3, the data obtained under the photomixotrophic condition were divided by the data obtained under the photoautotrophic condition; that is, the value of the photoautotrophic condition on the vertical axis of the figure is 1 [18].
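The percentage increases quoted in Subheading 3.6 and the normalization described in Note 8 are both simple ratios to a reference condition. The Python sketch below recomputes the reported percentages from the acetyl-CoA contents given in the text and shows the Note 8 normalization. The function names are illustrative, and the raw values passed to `normalize_to_reference` are hypothetical numbers chosen only so that the ratio equals the 21.5-fold change reported for F6P.

```python
def percent_increase(value: float, control: float) -> float:
    """Relative increase of `value` over `control`, in percent."""
    return (value - control) / control * 100.0


def normalize_to_reference(values: dict, reference_key: str) -> dict:
    """Divide every entry by the reference entry, so the reference becomes 1."""
    reference = values[reference_key]
    return {key: value / reference for key, value in values.items()}


# Acetyl-CoA contents (ug/g DCW) reported in Subheading 3.6.
acetyl_coa = {"JQ01": 17.9, "JQ02": 22.3, "JQ03": 47.1, "JQ04": 58.6}
control = acetyl_coa["JQ01"]
increases = {
    strain: round(percent_increase(content, control), 1)
    for strain, content in acetyl_coa.items()
    if strain != "JQ01"
}
print(increases)  # {'JQ02': 24.6, 'JQ03': 163.1, 'JQ04': 227.4}

# Hypothetical raw signals for one metabolite (arbitrary units), not measured data.
f6p_signal = {"photoautotrophic": 2.0, "photomixotrophic": 43.0}
print(normalize_to_reference(f6p_signal, "photoautotrophic"))
# {'photoautotrophic': 1.0, 'photomixotrophic': 21.5}
```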

Acknowledgments

This chapter was supported by grants from the National Key Research and Development Program of China (Grant Nos. 2019YFA0904600, 2018YFA0903000, 2018YFA0903600, and 2020YFA0906800) and the National Natural Science Foundation of China (Grant Nos. 31901016, 31770035, 31972931, 32270091, and 21621004).


References

1. Atsumi S, Higashide W, Liao JC (2009) Direct photosynthetic recycling of carbon dioxide to isobutyraldehyde. Nat Biotechnol 27(12):1177–1180. https://doi.org/10.1038/nbt.1586
2. Gao X, Gao F, Liu D et al (2016) Engineering the methylerythritol phosphate pathway in cyanobacteria for photosynthetic isoprene production from CO2. Energy Environ Sci 9(4):1400–1411. https://doi.org/10.1039/C5EE03102H
3. Liu X, Miao R, Lindberg P et al (2019) Modular engineering for efficient photosynthetic biosynthesis of 1-butanol from CO2 in cyanobacteria. Energy Environ Sci 12(9):2765–2777. https://doi.org/10.1039/C9EE01214A
4. Mulkidjanian AY, Koonin EV, Makarova KS et al (2006) The cyanobacterial genome core and the origin of photosynthesis. Proc Natl Acad Sci U S A 103(35):13126–13131. https://doi.org/10.1073/pnas.0605709103
5. Wang B, Pugh S, Nielsen DR et al (2014) Engineering cyanobacteria for photosynthetic production of 3-hydroxybutyrate directly from CO2. Metab Eng 21:1–1. https://doi.org/10.1016/j.ymben.2013.10.008
6. Wang X, Liu W, Xin CP et al (2016) Enhanced limonene production in cyanobacteria reveals photosynthesis limitations. Proc Natl Acad Sci U S A 113(50):14225–14230. https://doi.org/10.1073/pnas.1613340113
7. Wang Y, Sun T, Gao X et al (2016) Biosynthesis of platform chemical 3-hydroxypropionic acid (3-HP) directly from CO2 in cyanobacterium Synechocystis sp. PCC 6803. Metab Eng 34:60–70. https://doi.org/10.1016/j.ymben.2015.10.008
8. Angermayr SA, Gorchs Rovira A, Hellingwerf KJ (2015) Metabolic engineering of cyanobacteria for the synthesis of commodity products. Trends Biotechnol 33(6):352–361. https://doi.org/10.1016/j.tibtech.2015.03.009
9. Yu H, Jia S, Dai Y (2009) Growth characteristics of the cyanobacterium Nostoc flagelliforme in photoautotrophic, mixotrophic and heterotrophic cultivation. J Appl Phycol 21(1):127–133. https://doi.org/10.1007/s10811-008-9341-5
10. Wang Y, Li Y, Shi D et al (2002) Characteristics of mixotrophic growth of Synechocystis sp. in an enclosed photobioreactor. Biotechnol Lett 24(19):1593–1597. https://doi.org/10.1023/A:1020384029168
11. Yoshikawa K, Hirasawa T, Ogawa K et al (2013) Integrated transcriptomic and metabolomic analysis of the central metabolism of Synechocystis sp. PCC 6803 under different trophic conditions. Biotechnol J 8(5):571–580. https://doi.org/10.1002/biot.201200235
12. Van Dyk J, Pletschke B (2012) A review of lignocellulose bioconversion using enzymatic hydrolysis and synergistic cooperation between enzymes-factors affecting enzymes, conversion and synergy. Biotechnol Adv 30(6):1458–1480. https://doi.org/10.1016/j.biotechadv.2012.03.002
13. Zhao Z, Xian M, Liu M et al (2020) Biochemical routes for uptake and conversion of xylose by microorganisms. Biotechnol Biofuels 13:21. https://doi.org/10.1186/s13068-020-1662-x
14. Davis E, Henderson P (1987) The cloning and DNA sequence of the gene xylE for xylose-proton symport in Escherichia coli K12. J Biol Chem 262(29):13928–13932. https://doi.org/10.1016/s0021-9258(18)47883-0
15. Jojima T, Omumasaba CA, Inui M et al (2010) Sugar transporters in efficient utilization of mixed sugar substrates: current knowledge and outlook. Appl Microbiol Biotechnol 85(3):471–480. https://doi.org/10.1007/s00253-009-2292-1
16. McEwen JT, Machado IM, Connor MR et al (2013) Engineering Synechococcus elongatus PCC 7942 for continuous growth under diurnal conditions. Appl Environ Microbiol 79(5):1668–1675. https://doi.org/10.1128/AEM.03326-12
17. Lee TC, Xiong W, Paddock T et al (2015) Engineered xylose utilization enhances bio-products productivity in the cyanobacterium Synechocystis sp. PCC 6803. Metab Eng 30:179–189. https://doi.org/10.1016/j.ymben.2015.06.002
18. Yao J, Wang J, Ju Y et al (2022) Engineering a xylose-utilizing Synechococcus elongatus UTEX 2973 chassis for 3-hydroxypropionic acid biosynthesis under photomixotrophic conditions. ACS Synth Biol 11(2):678–688. https://doi.org/10.1021/acssynbio.1c00364
19. Mueller TJ, Ungerer JL, Pakrasi HB et al (2017) Identifying the metabolic differences of a fast-growth phenotype in Synechococcus UTEX 2973. Sci Rep 7:41569. https://doi.org/10.1038/srep41569
20. Ungerer J, Wendt KE, Hendry JI et al (2018) Comparative genomics reveals the molecular determinants of rapid growth of the cyanobacterium Synechococcus elongatus UTEX 2973. Proc Natl Acad Sci U S A 115(50):201814912. https://doi.org/10.1073/pnas.1814912115
21. Yu J, Liberton M, Cliften PF et al (2015) Synechococcus elongatus UTEX 2973, a fast growing cyanobacterial chassis for biosynthesis using light and CO2. Sci Rep 5:8132. https://doi.org/10.1038/srep08132
22. Song K, Tan X, Liang Y et al (2016) The potential of Synechococcus elongatus UTEX 2973 for sugar feedstock production. Appl Microbiol Biotechnol 100(18):7865–7875. https://doi.org/10.1007/s00253-016-7510-z
23. Ducat DC, Avelar-Rivas JA, Way JC et al (2012) Rerouting carbon flux to enhance photosynthetic productivity. Appl Environ Microbiol 78(8):2660–2668. https://doi.org/10.1128/AEM.07901-11
24. Lin P-C, Zhang F, Pakrasi HB (2020) Enhanced production of sucrose in the fast-growing cyanobacterium Synechococcus elongatus UTEX 2973. Sci Rep 10(1):390. https://doi.org/10.1038/s41598-019-57319-5
25. Hr A, Jsl A, Hong I et al (2021) Improved CO2-derived polyhydroxybutyrate (PHB) production by engineering fast-growing cyanobacterium Synechococcus elongatus UTEX 2973 for potential utilization of flue gas. Bioresour Technol 327:124789. https://doi.org/10.1016/j.biortech.2021.124789
26. Choi YN, Lee JW, Kim JW et al (2020) Acetyl-CoA-derived biofuel and biochemical production in cyanobacteria: a mini review. J Appl Phycol 5991. https://doi.org/10.1007/s10811-020-02128-x
27. Song X, Diao J, Yao J et al (2021) Engineering a central carbon metabolism pathway to increase the intracellular acetyl-CoA pool in Synechocystis sp. PCC 6803 grown under photomixotrophic conditions. ACS Synth Biol 10(4):836–846. https://doi.org/10.1021/acssynbio.0c00629
28. Abernathy MH, Yu J, Ma F et al (2017) Deciphering cyanobacterial phenotypes for fast photoautotrophic growth via isotopically nonstationary metabolic flux analysis. Biotechnol Biofuels 10:273. https://doi.org/10.1186/s13068-017-0958-y
29. Kumar V, Ashok S, Park S (2013) Recent advances in biological production of 3-hydroxypropionic acid. Biotechnol Adv 31(6):945–961. https://doi.org/10.1016/j.biotechadv.2013.02.008
30. Li S, Sun T, Xu C et al (2018) Development and optimization of genetic toolboxes for a fast-growing cyanobacterium Synechococcus elongatus UTEX 2973. Metab Eng 48:163–174. https://doi.org/10.1016/j.ymben.2018.06.002
31. Cui J, Sun T, Li S et al (2020) Improved salt tolerance and metabolomics analysis of Synechococcus elongatus UTEX 2973 by overexpressing Mrp antiporters. Front Bioeng Biotechnol 8:500. https://doi.org/10.3389/fbioe.2020.00500
32. Cui J, Good NM, Bo H et al (2016) Metabolomics revealed an association of metabolite changes and defective growth in Methylobacterium extorquens AM1 overexpressing ecm during growth on methanol. PLoS One 11(4):e0154043. https://doi.org/10.1371/journal.pone.0154043
33. Day JG, Achilles-Day U, Brown S et al (2007) Cultivation of algae and protozoa. Manual of Environmental Microbiology 79–92. https://doi.org/10.1128/9781555815882.ch7
34. Li S, Sun T, Chen L et al (2021) Designing and constructing artificial small RNAs for gene regulation and carbon flux redirection in photosynthetic cyanobacteria. Methods Mol Biol 2290:229–252. https://doi.org/10.1007/978-1-0716-1323-8_16
35. Berg M, Undisz K, Thiericke R et al (2001) Evaluation of liquid handling conditions in microplates. J Biomol Screen 6(1):47–56. https://doi.org/10.1177/108705710100600107
36. Battino R, Clever HL (1966) The solubility of gases in liquids. Chem Rev 66(4):395–463. https://doi.org/10.1021/cr60242a003

Chapter 5

Allosteric-Regulation-Based DNA Circuits in Saccharomyces cerevisiae to Detect Organic Acids and Monitor Hydrocarbon Metabolism In Vitro

Michael Dare Asemoloye and Mario Andrea Marchisio

Abstract

We show the engineering of prokaryotic-transcription-factor-based biosensing devices in Saccharomyces cerevisiae cells for the in vitro detection of common hydrocarbon intermediates/metabolites and, potentially, for monitoring the metabolism of carbon compounds. We employed the bacterial receptor proteins MarR (multiple antibiotic resistance regulator) and PdhR (pyruvate dehydrogenase-complex regulator) to detect benzoate/salicylate and pyruvate, respectively. The yeast-enhanced green fluorescent protein (yEGFP) was adopted as the output signal. Indeed, the engineered yeast strains showed a strong and dynamic fluorescent output signal in the presence of the input chemicals at concentrations ranging from 2 fM up to 5 mM. In addition, we describe how to use these strains to assess, over time, the metabolism of complex hydrocarbon compounds by the hydrocarbon-degrading fungus Trichoderma harzianum (KY488463).

Key words Receptor protein, Biosensor, Saccharomyces cerevisiae, Hydrocarbons, MarR, PdhR

1 Introduction

In order to avoid expensive chromatographic analytical methods (e.g., HPLC, LC/MS/MS, GC-MS, GC×GC, and solid-phase microextraction (SPME) [1, 2]), scientists are now interested in developing low-cost biological devices for the rapid detection/screening of different chemicals in either the intracellular or the extracellular environment. A biosensor, as the name suggests, entails the sensing of signals (be it light, chemicals, ions, or metals) by a biological entity and their transduction into a response that can be read directly or via cheap sensor devices with high sensitivity and specificity. Biosensors were initially developed for the prompt spot detection of hazardous/xenobiotic compounds [3, 4]. Nowadays, they are needed in various industrial and environmental applications as well as in clinical diagnostics [5]. In general, biosensors are engineered with the utmost aim of detecting specific compounds/inducers prioritizing

Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_5, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024


cost-effectiveness and response time, and they should be qualified in terms of specificity and scalability [3]. Transcriptional-based biosensors (TBBs) are engineered by leveraging the capability of transcription factors (TFs) to control gene expression. This is commonly utilized by synthetic biologists to genetically modify microbes for the detection of compounds of interest ranging from environmental pollutants and xenobiotics to clinical biomarkers [6–9]. TBBs have great potential and are rated as the next generation of synthetic biology-derived biosensors [5]. They are cheap, biotic, and stable at room temperature, which makes them advantageous for point-of-care and real-time applications [9]. Many TBBs have been designed as toolboxes that aid the functionality of many circuits in vivo [5, 10–12]. Meanwhile, a new milestone in biosensor design is the engineering of inter/intracellular metabolic biosensors that would allow real-time monitoring of either the metabolism of particular compounds or the detection of their intermediate/end products [3].

Microbial conversion/degradation of hydrocarbons to methane/methyl (methanogenesis/methylogenesis) is one of the most important bioprocesses occurring in hydrocarbon-polluted ecosystems. These phenomena entail a series of events, such as the secretion of cascades of enzymes for the transformation of more complex into simpler hydrocarbon intermediates, from which organic acids like catechol, benzoic acid (benzoate), and salicylic acid (salicylate) are finally formed [13–15]. Permeation of organic acids is known to have non-negligible effects on living cells/tissues, including the modification of cell pH [16, 17], the accumulation in plants [18–20], and the triggering of antibiotic resistance in many enteric bacterial pathogens [21–23]. It is thus important to know their presence or concentration in living cells and the natural environment. The MarR protein, in E. coli cells, dissociates from its promoter upon binding benzoate or salicylate. This causes the derepression of the multiple antibiotic resistance operon marRAB, which triggers a large number of multidrug efflux systems [23–26]. Similarly, the PdhR protein belongs to the GntR family of transcription factors, which is named after the repressor of the gluconate operon in Bacillus subtilis [27, 28]. Upon binding pyruvate, PdhR dissociates from its operator sequence on the DNA, thereby provoking the derepression of the pdhR-aceEF-lpdA operon.

In this chapter, we describe the engineering of yeast TBBs, which detect organic acids in environmental samples, by using the bacterial allosteric transcription factors MarR and PdhR. We also illustrate a protocol to use these biosensors to monitor the metabolism of carbon/hydrocarbon compounds. These engineered yeast strains might become important for environmental care and bioremediation.


2 Materials

Use double distilled water (ddH2O) for the preparation of chemicals, buffers, reagents, and media. All solutions and plates should be autoclaved at 121 °C for 20 min before being stored at 4 °C. Every protein-coding DNA sequence should be yeast codon optimized before being synthesized (see Note 1). Competent E. coli cells are used to store plasmids (see Note 2). For our experiments, the wild-type yeast strain CEN.PK2-1C was chosen (see Note 3).

2.1 In Silico DNA Analysis and PCR

1. DNA parts and primer design: We use the “CLC sequence viewer 8.0” (QIAGEN Aarhus A/S) and the “OligoAnalyzer” (IDT, Integrated DNA Technologies) software for the design of DNA sequences (both circuit components and oligos; see Note 4).
2. Primer solution: Primers should be used at a final concentration of 10 μM. Centrifuge the primer tubes for at least 10 s and add ddH2O to dissolve the DNA into a 100 μM stock solution (see Note 5). Mix 1 μL of the 100 μM primer stock solution with 9 μL ddH2O to get a 10 μM primer solution for PCR.
3. dNTP mix: Make a 2.5 mM dNTP mix by adding 250 μL each of 100 mM dATP, dGTP, dCTP, and dTTP to a 15 mL centrifuge tube together with 9 mL ddH2O. Aliquot the solution into 1.5 mL tubes (0.5 mL/tube). Store at -20 °C.
4. PCR mixture: Mix DNA template (20–40 ng), primers (1 μL of each primer at 10 μM), 5 μL of 2.5 mM dNTP mix, 0.5 μL of DNA polymerase, 10 μL of 5× DNA polymerase reaction buffer, and ddH2O up to a volume of 50 μL.
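The primer and dNTP recipes above are dilutions of the form C1 × V1 = C2 × V2. As a quick check, the Python sketch below recomputes the working concentrations from the stated volumes and then the final concentrations in a 50 μL reaction. It is a minimal illustration; the helper name `final_concentration` is hypothetical, and volumes are assumed to be additive.

```python
import math


def final_concentration(stock_conc: float, stock_volume: float, total_volume: float) -> float:
    """C2 = C1 * V1 / V2, with both volumes in the same unit."""
    return stock_conc * stock_volume / total_volume


# Primer working solution: 1 uL of 100 uM stock plus 9 uL ddH2O.
primer_working_um = final_concentration(100.0, 1.0, 1.0 + 9.0)

# dNTP mix: 250 uL of each of the four 100 mM stocks plus 9 mL ddH2O.
dntp_total_ul = 4 * 250.0 + 9000.0
dntp_each_mm = final_concentration(100.0, 250.0, dntp_total_ul)

# Final concentrations in the 50 uL PCR mixture.
primer_in_pcr_um = final_concentration(primer_working_um, 1.0, 50.0)
dntp_in_pcr_mm = final_concentration(dntp_each_mm, 5.0, 50.0)

print(primer_working_um, dntp_each_mm)   # 10.0 2.5
print(primer_in_pcr_um, dntp_in_pcr_mm)  # 0.2 0.25
assert math.isclose(dntp_total_ul, 10000.0)
```

Each primer therefore ends up at 0.2 μM and each dNTP at 0.25 mM in the reaction.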

2.2 Agarose Gel Electrophoresis

1. 10× TBE buffer: Weigh 108 g of Tris and 55 g of boric acid, and pour them into a 1-L glass bottle. Add 40 mL of 0.5 M EDTA (pH 8.0) and 700 mL of ddH2O. Mix with a magnetic stir bar. Add ddH2O up to a volume of 1 L. Mix again with a magnetic stir bar.
2. 0.5× TBE buffer: Mix 50 mL of 10× TBE buffer with 950 mL ddH2O.
3. Agarose gel: Add agarose to 50 mL of 0.5× TBE buffer in a glass flask (see Note 6). Melt the agarose by heating the flask in a microwave oven.
4. Pour the solution into the gel preparation cast. Insert a comb with the desired number and size of teeth. Let the solution solidify for about 20 min at 4 °C.
5. 6× DNA loading dye: Mix 25 mg of amaranth dye, 3 mL glycerol, and ddH2O up to a final volume of 10 mL (use a 15 mL plastic tube). Aliquot the solution into 1.5 mL black tubes (1 mL/tube) and store them at -20 °C. Add a nucleic acid dye before use (see Note 7).

2.3 Isothermal DNA Assembly (Gibson Method)

1. Gibson reaction buffer: Mix 1.5 g of PEG 8000, 3 mL of 1 M Tris-HCl (pH 7.5), 150 μL of 2 M MgCl2, 60 μL each of 100 mM dGTP, dATP, dTTP, and dCTP, 300 μL of 1 M DTT, 1.5 mL of 20 mM NAD, and ddH2O up to an overall volume of 6 mL. Use a 10 mL tube. Aliquot the solution into 1.5 mL sterile tubes (320 μL/tube). Store at -80 °C.
2. Gibson assembly master mixture: Place 80 PCR tubes on ice. Take 320 μL of the Gibson reaction buffer (one tube prepared as in Subheading 2.3, item 1) and add 0.64 μL of T5 exonuclease (10 U/μL), 20 μL of Phusion DNA polymerase (2 U/μL), 160 μL of Taq DNA ligase (40 U/μL), and ddH2O up to a volume of 1.2 mL. Mix with a bench vortex mixer and spin down in a minicentrifuge. Aliquot the solution into the ice-cold PCR tubes (15 μL/tube). Store at -20 °C.
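The recipe above can be checked by computing the final concentration of each component in the 6 mL reaction buffer and the number of 15 μL aliquots obtained from 1.2 mL of master mixture. The Python sketch below does this under two stated assumptions: volumes are additive, and the 60 μL of 100 mM dNTP stock is read as 60 μL of each nucleotide. The variable names are illustrative only.

```python
import math

BUFFER_TOTAL_UL = 6000.0  # final volume of the Gibson reaction buffer (6 mL)

# name: (stock concentration in mM, volume added in uL)
components = {
    "Tris-HCl": (1000.0, 3000.0),
    "MgCl2": (2000.0, 150.0),
    "each dNTP": (100.0, 60.0),  # assumption: 60 uL of each of the four dNTPs
    "DTT": (1000.0, 300.0),
    "NAD": (20.0, 1500.0),
}

# Final concentration (mM) of every liquid component in the reaction buffer.
final_mm = {
    name: stock_mm * volume_ul / BUFFER_TOTAL_UL
    for name, (stock_mm, volume_ul) in components.items()
}
peg_percent_wv = 1.5 / 6.0 * 100.0  # 1.5 g of PEG 8000 in 6 mL

# Master mixture: 320 uL of buffer brought to 1.2 mL, dispensed as 15 uL aliquots.
buffer_fraction = 320.0 / 1200.0
n_aliquots = int(1200.0 // 15.0)

print(final_mm)
print(peg_percent_wv, n_aliquots)  # 25.0 80
assert math.isclose(final_mm["Tris-HCl"], 500.0)
```

The 80 aliquots match the 80 PCR tubes placed on ice in item 2.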

2.4 Yeast Transformation

1. 10× TE buffer: Mix 5 mL of 1 M Tris-HCl (pH 8.0), 1 mL of 0.5 M EDTA (pH 8.0), and 44 mL ddH2O in a 100 mL glass bottle. Autoclave the solution for 20 min at 121 °C. Store at room temperature.
2. 1× TE buffer: Mix 10 mL of 10× TE buffer with 90 mL ddH2O.
3. 10 mg/mL salmon sperm DNA (ssDNA): Add 500 mg of salmon sperm DNA to 50 mL of TE buffer. Mix over 2 days at 4 °C. Aliquot into 1.5 mL tubes (1.0 mL/tube). Heat the tubes for 20 min at 95 °C (see Note 8).
4. 1 M lithium acetate solution (LiAc): Dissolve 10.2 g of lithium acetate dihydrate in 100 mL ddH2O.
5. LiAc mix: Mix 10 mL of LiAc, 10 mL of 10× TE buffer, and 80 mL ddH2O in a beaker. Sterilize the solution with a 0.22 μm filter.
6. PEG mix: Dissolve 40 g of PEG 3350 in 70 mL LiAc mix. Add LiAc mix up to a volume of 100 mL. Sterilize the solution with a 0.22 μm filter.
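The stock concentrations in this subheading follow from the weighed masses and volumes. The Python sketch below recomputes them as a sanity check. The molar mass of lithium acetate dihydrate (102.02 g/mol) is a value supplied here, not one stated in the chapter, and the final volume of the LiAc solution is taken to be 100 mL, which ignores the small volume contributed by the dissolved salt.

```python
LIAC_DIHYDRATE_G_PER_MOL = 102.02  # assumption: molar mass of CH3COOLi.2H2O


def molarity(mass_g: float, molar_mass_g_per_mol: float, volume_l: float) -> float:
    """Molar concentration of a solute from its mass and the solution volume."""
    return mass_g / molar_mass_g_per_mol / volume_l


# Item 4: 10.2 g of lithium acetate dihydrate in 100 mL.
liac_stock_m = molarity(10.2, LIAC_DIHYDRATE_G_PER_MOL, 0.100)

# Item 5: 10 mL of LiAc stock and 10 mL of 10x TE in a total of 100 mL.
liac_in_mix_m = liac_stock_m * 10.0 / 100.0
te_in_mix_x = 10.0 * 10.0 / 100.0

# Item 3: 500 mg of salmon sperm DNA in 50 mL; item 6: 40 g of PEG 3350 in 100 mL.
ssdna_mg_per_ml = 500.0 / 50.0
peg_percent_wv = 40.0 / 100.0 * 100.0

print(round(liac_stock_m, 3), round(liac_in_mix_m, 3))  # 1.0 0.1
print(te_in_mix_x, ssdna_mg_per_ml, peg_percent_wv)     # 1.0 10.0 40.0
```

The LiAc mix is thus approximately 0.1 M lithium acetate in 1× TE, and the PEG mix is 40% (w/v) PEG 3350 in that solution.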

2.5 Media: Solutions and Plates

1. LB (Luria-Bertani) solution/plate: Pour 10 g of bacto-tryptone, 5 g of yeast extract, and 10 g of NaCl into a 1 L glass bottle (for plates, add 15 g of agar, i.e., 1.5%). Add ddH2O up to 999 mL and autoclave.
2. LB plus ampicillin solution/plate: Let the LB (or LB plus agar) solution, prepared as in Subheading 2.5, item 1, cool down to 55–60 °C and add 1 mL of ampicillin (100 mg/mL) (see Note 9).


3. YPD medium: Pour 20 g of bacto-peptone, 10 g of yeast extract, and adenine hemisulfate (a spatula tip) into a 1 L glass bottle. Add ddH2O up to 800 mL. Dissolve 20 g of glucose in 150 mL ddH2O, then add ddH2O to reach a volume of 200 mL. Autoclave the two solutions. Pour the glucose solution (200 mL) into the 800 mL mixture to get 1 L of YPD solution.
4. AA mix preparation: Mix, in a 250 mL glass bottle, 0.5 g of adenine and 2.0 g of each of the following amino acids: alanine, arginine, asparagine, aspartic acid, cysteine, glutamine, glutamic acid, glycine, isoleucine, methionine, lysine, phenylalanine, proline, serine, threonine, tyrosine, and valine (see Note 10). Finally, add glass beads. Shake the bottle for about 15 min.
5. Synthetic defined complete (SDC) medium: Mix 20 g of glucose, 6.7 g of YNB, 2 g of AA mix, 79.2 mg of histidine, 396 mg of leucine, 79.2 mg of tryptophan, and 79.2 mg of uracil in a 1 L glass bottle. Add ddH2O to reach a volume of 1 L.
6. Bushnell Haas (BH) synthetic broth M350 for characterization: This growth medium contains only essential micro- and macro-nutrients without any carbon source. Mix: magnesium sulfate (0.20 g/L), calcium chloride (0.20 g/L), monopotassium dihydrogen phosphate (1.00 g/L), dipotassium phosphate (1.00 g/L), ammonium nitrate (1.00 g/L), ferric chloride (1.00 g/L), and ddH2O up to a volume of 1 L.

2.6 Fluorescence-Activated Cell Sorting (FACS)

1. Fluorescent beads preparation: Pour 800–900 μL of ddH2O into a 2 mL black tube. Add a drop of beads (BD FACSuite™ CS&T Research Beads 650621).
2. Cleaning solution preparation: Filter 50 mL of sodium hypochlorite solution with a 0.22 μm filter.

2.7 Other Chemicals and Reagents

1. Anthracene, benzoate, salicylate, pyruvate, dichloromethane, ethyl acetate, and ethanol preparation: All chemicals should be of analytical grade (>99%) and, where necessary, mixed with ultra-pure water (see Note 11).
2. An environmental sample containing (some of) the chemicals listed in Subheading 2.7, item 1 was needed to test the biosensor operability. We used engine oil waste from an auto mechanic shop in Tianjin, China.

3 Methods

3.1 Biosensor Design and Assembly

1. Every biosensor is organized in a receptor and a reporter element. The former interacts with the inputs, and the latter converts them into an output signal. In these biosensors, both receptor and reporter consist of a single transcription unit (TU), i.e., a DNA sequence where a gene (encoding, in this case, a protein) is flanked by a promoter and a terminator. Each transcription unit is inserted into the integrative plasmid (backbone) pRSII405 (the reporter) or pRSII406 (the receptor) [29] after removing the multiple cloning site (MCS) located between the Acc65I and SacI restriction sites.
2. The receptor elements: MarR and PdhR have been selected since they are orthogonal repressors in Saccharomyces cerevisiae (see Note 12). Moreover, their effect on transcription can be controlled by varying the concentration of benzoate/salicylate and pyruvate, respectively. MarR and PdhR have been synthesized, yeast codon optimized, and fused to a nuclear localization sequence (NLS) and a HIS tag (for Western blotting).
3. Their expression is driven by the moderately strong ADH1 promoter (pADH1; about 38% of the strength of the GPD promoter, see Note 13).
4. The reporter element: We employed the yeast-enhanced green fluorescent protein (yEGFP) as the source of the output signal [30]. The reporter TU contains a synthetic promoter where either MarR or PdhR operators are inserted into the well-characterized sequence of the S. cerevisiae constitutive CYC1 promoter stripped of its two UASs (upstream activating sequences) and one TATA box (pCYC1min) [31]. The short synthetic terminator Tsynth6 [32] is placed at the end of the TU.
5. The synthetic promoters in the reporter element: We realized three different synthetic promoters for the inducible expression of fluorescence:
   (a) pCYC1min_marOpR(-65, 22 nt), which contains the short 22-nt-long binding site from the right part of the full marOp (76 nt overall). marOpR is placed, along the synthetic promoter, at position -65 with respect to the transcription start site (TSS).
   (b) pCYC1min_marOpL(-99, 22 nt)_marOpR(-65, 22 nt), where a second binding site, from the left part of the original full marOp (marOpL), lies upstream of marOpR, spanning 22 nt from position -99 with respect to the TSS.
   (c) pCYC1min_pdhOp(-57, 19 nt), which hosts the 19-nt-long PdhR binding site starting at position -57 with respect to the TSS.
6. In vitro DNA assembly: Promoter, coding region, and terminator are amplified via touchdown PCR with primers designed to guarantee an overlap of about 40 nt between adjacent parts (including the cut-open backbone).
7. Purified PCR products: Promoters, coding regions, and terminators are checked via gel electrophoresis (see Note 14). They are mixed in equimolar amounts with the cut-open backbone and joined together via the Gibson (isothermal) assembly method [33]. E. coli cells are transformed with the plasmids obtained via the Gibson method [34]. The same plasmids are later extracted, in high concentration, from E. coli cells through mini-preparation. The sequence of the assembled TU is verified via the Sanger method [35].
8. In vivo circuit assembly: The plasmid carrying the reporter element (e.g., pRSII405-pCYC1min_marOpL(-99, 22 nt)_marOpR(-65, 22 nt)-yEGFP-Tsynth6) is integrated into the wild-type S. cerevisiae strain CEN.PK2-1C at the sequence homologous to the LEU2 marker [36].
9. The strain expressing the highest fluorescence level (let us term it strain A) is selected, with a flow cytometer (see Note 15), to complete the circuit assembly. The plasmid hosting the bacterial allosteric repressor protein able to bind the synthetic promoter inside strain A (e.g., pRSII406-pADH1-NLS_MarR_HIStag-CYC1t) is integrated into strain A at the homologous region of the URA3 marker.
10. Fluorescence intensity is measured, with a flow cytometer, in the absence (OFF state) and in the presence (ON state) of the input chemical (e.g., benzoate). The strain with the highest ON/OFF ratio is selected to make a cryostock for future analyses (see Notes 16 and 17 and Figs. 1 and 2).

3.2 Touchdown PCR

1. PCR program: Set the following PCR program in a thermal cycler.
   Stage 1: DNA denaturation at 98 °C for 30 s.
   Stage 2 (three phases, cycled 10 times): DNA denaturation at 98 °C for 10 s; primer annealing to the DNA for 20 s, starting at, for instance, 68 °C and decreasing the temperature by 1 °C at every cycle; elongation at 72 °C (the duration depends on the DNA length and the DNA polymerase speed).
   Stage 3 (three phases, cycled 25 times): DNA denaturation at 98 °C for 10 s; primer annealing at 59 °C (the last annealing temperature in Stage 2) for 20 s; elongation as in Stage 2.
   Stage 4: final elongation at 72 °C for 2 min.
   Stage 5: the tube is held at 4 °C indefinitely (see Note 18).
2. PCR results: Gel electrophoresis makes it possible to verify, by comparison with a DNA ladder, whether the length of the amplified DNA sequences is correct. DNA bands are excised from the agarose gel and purified with a “DNA elution kit.”
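The annealing temperatures of Stages 2 and 3 can be listed cycle by cycle to confirm that ten touchdown cycles starting at 68 °C end at the 59 °C used in Stage 3. The Python sketch below generates that schedule. The function names are illustrative, and the elongation helper takes the polymerase speed as an explicit argument because the chapter does not specify one; the 1500 bp amplicon and 30 s/kb rate in the example are hypothetical.

```python
import math


def touchdown_schedule(start_anneal_c=68, touchdown_cycles=10, step_c=1, plateau_cycles=25):
    """Return the annealing temperature (in C) of every cycle in Stages 2 and 3."""
    stage2 = [start_anneal_c - step_c * cycle for cycle in range(touchdown_cycles)]
    stage3 = [stage2[-1]] * plateau_cycles  # Stage 3 keeps the last Stage 2 temperature
    return stage2, stage3


def elongation_seconds(amplicon_bp, seconds_per_kb):
    """Elongation time for an amplicon, rounded up to a whole second."""
    return math.ceil(amplicon_bp / 1000 * seconds_per_kb)


stage2, stage3 = touchdown_schedule()
print(stage2)                                # [68, 67, 66, 65, 64, 63, 62, 61, 60, 59]
print(stage3[0], len(stage2) + len(stage3))  # 59 35

# Hypothetical example: a 1500 bp amplicon with a polymerase needing 30 s per kb.
print(elongation_seconds(1500, 30))          # 45
```

Changing `start_anneal_c` shifts the whole schedule, so the Stage 3 temperature should always be taken from the last Stage 2 cycle rather than fixed at 59 °C.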

3.3 Bacterial Transformation and Sequence Verification

1. Plasmid number amplification: Escherichia coli DH5α cells are transformed with the plasmids assembled via the Gibson method. A high plasmid concentration is required to carry out, first, mini-preparation and, then, sequence verification of the TU of the receptor/reporter element via Sanger sequencing. Mix 5 μL of the newly assembled plasmid solution with 50 μL of E. coli DH5α cells in a 1.5 mL glass tube (transformation tube). Keep the tube on ice for 10–15 min. Thermal shock: place the transformation tube in a thermal shaker at 42 °C for 30 s. Add 300 μL of Luria-Bertani (LB) broth to the transformation tube and leave it on the bench for 10–15 min, during which the cells recover from the thermal shock. Plating the cells: spread the whole solution of transformed cells onto an LB agar plate supplied with 0.1 g/L ampicillin. Place the plate into an incubator at 37 °C. Let the cells grow for 8–16 h. Pick at least eight colonies, each with a different sterile pipette tip. Drop the tips into glass tubes containing 2 mL of LB plus ampicillin solution. Culture the cells in LB plus ampicillin for 8–16 h in a shaker at 37 °C (240 RPM) and carry out a DNA mini-preparation to extract the plasmids from the transformed E. coli cells. Measure the plasmid concentration with a nano-drop spectrophotometer.
2. Plasmid control digestion: cleave 5 μg of the plasmid with the restriction endonucleases KpnI and SacI to extract the TU carrying the receptor/reporter sequence. Gel electrophoresis: mix the digestion solution with the 6× DNA dye. Load the whole solution into a 0.8% agarose gel. Run the gel for 20–30 min. Verify that the length of the TUs is the expected one (see Fig. 3). Obtain the actual sequences of the assembled TUs via Sanger sequencing.

Fig. 1 Benzoate/salicylate biosensor. (a) Circuit scheme and truth table. The reporter element contains a single short operator, marOpR(-65, 22 nt). (b) Response to benzoate. (c) Response to salicylate. Each bar represents the mean fluorescence value (in arbitrary units, A.U.) ± the standard deviation from three independent experiments. Star symbols on top of different bars indicate a statistically significant difference with respect to the fluorescence level in the absence of the chemical, which means that the biosensor works properly by discriminating between the absence and the presence of the input (two-sided Welch’s t-test, *p-value < 0.05, **p-value < 0.01, ***p-value < 0.001). Star symbols with the same color indicate fluorescence levels that are not significantly different according to one-way ANOVA (three or more samples) or two-sided Welch’s t-test (two samples only)

Fig. 2 Pyruvate biosensor. (a) Circuit scheme and truth table. (b) Response to pyruvate at concentrations from 2 fM to 5000 μM. Each bar represents the mean fluorescence value ± standard deviation from three independent measurements. Star symbols on top of different bars indicate a statistically significant difference with respect to the fluorescence level in the absence of the chemical, i.e., the biosensor is working correctly (two-sided Welch’s t-test, *p-value < 0.05, **p-value < 0.01, ***p-value < 0.001). Star symbols with the same color indicate fluorescence levels that are not significantly different according to one-way ANOVA

3.4 Integration into the Genome of S. cerevisiae

1. Plasmid linearization: digest the plasmid at a restriction site that is present only in the portion of the auxotrophic marker that is homologous to a region of the CEN.PK2-1C genome (see Note 19).
2. Yeast cell culture: S. cerevisiae cells are grown overnight in YPD (rich medium) in a shaker at 30 °C (240 RPM).
3. Yeast cell transformation: in the morning, cells are first diluted into 20 mL of YPD (in a baffled flask) and then grown for 4–5 h in a shaker at 30 °C (130 RPM) to reach an OD600 between 0.8 and 2. Cells are finally transformed with a linearized plasmid according to the PEG/LiAc/heat shock protocol [32].
4. Cell plating: yeast cells are spread on SD-URA (first integration) or SD-LEU (second integration) plates for auxotrophic selection. Plates are placed into an incubator at 30 °C to allow the yeast cells to grow for 2–4 days.

Fig. 3 Control digestion. Plasmids that have been assembled via the Gibson method are digested with KpnI and SacI to separate the TU that carries the receptor/reporter element from the backbone. The lane “L” contains a DNA ladder that shows DNA bands of known length. Odd lanes are occupied by the digested plasmids, whereas even lanes show undigested plasmids. The length of the TUs extracted from the plasmids is estimated by comparison with the DNA ladder

3.5 Cell Selection: Reporter Element

1. Pick, under sterile conditions, at least eight yeast colonies from the SD-URA plate by using sterile inoculating loops. Streak them onto a new SD-URA plate to eliminate false-positive strains. Place the plate into an incubator at 30 °C and allow the cells to grow for 24 h. In principle, only the strains containing the reporter will grow on this plate.
2. Cell culture: grow the cells overnight in 3 mL SDC. Dilute 15 μL of the cell solution into 300 μL of water (use a 1.5 or 2 mL plastic tube). Measure cell fluorescence with a flow cytometer (see Note 17).

3.6 Cell Selection: Complete Circuit (Receptor and Reporter)

1. Repeat the process described in Subheading 3.5 by using, however, SD-LEU plates. The strains carrying the complete circuit are compared with those containing the reporter element only. The fluorescence emitted by the uninduced complete circuit shall be much lower than that coming from the reporter element only. Strains showing the lowest fluorescence levels are selected for further characterization under induction with a proper chemical.

3.7 FACS Measurements

1. Cell culture: To read fluorescence from the engineered yeast biosensors, MarR-containing transformants shall be grown for 24 h in SDC, whereas the PdhR-containing strains need to be grown, over the same time interval, in the BH medium to avoid any pyruvate flux.
2. Cell induction: To assess how the engineered yeast strains react to benzoate, salicylate, and pyruvate, cells are induced with the corresponding input(s) at different concentrations (titration curve) ranging from 0.2 fM up to 5 mM. Uninduced cells serve to quantify the fluorescence of the circuit in the OFF state. MarR-based strains: 20 μL of the cell culture is added to 980 μL SDC containing salicylate or benzoate (or both) at different concentrations and allowed to grow for 18 h at 30 °C (240 RPM). PdhR-based strains: 20 μL of the cell culture is added to 980 μL BH broth M350 (pH = 4.5, see Note 20) supplied with diverse concentrations of pyruvate. The induction lasts 18 h at 30 °C (240 RPM) (see Note 21).
3. Cell dilution: before measuring fluorescence with a flow cytometer, cells should be diluted 1:40 in ddH2O by using sterile 2 mL tubes.
4. For each strain, the whole experiment should be repeated three times independently, i.e., on different days. Yeast strains are expected to show increased green fluorescence only in the presence of the corresponding input chemical(s).
5. The accuracy and stability of the FACS machine are checked using fluorescent beads whose fluorescence is measured at the beginning and end of any experiment. Results are accepted only if the relative difference between the initial and final values of the bead peaks is not higher than 5%.
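The acceptance rule in step 5 and the dilution in step 2 are simple arithmetic that can be scripted alongside the analysis. The following Python sketch is only an illustration: the function names are hypothetical, and the relative difference is assumed to be taken with respect to the initial bead peak (the protocol does not specify the reference value).

```python
def bead_drift(initial_peak: float, final_peak: float) -> float:
    """Relative difference between the bead peaks measured at the start and
    at the end of an experiment (assumption: relative to the initial peak)."""
    if initial_peak <= 0:
        raise ValueError("initial_peak must be positive")
    return abs(final_peak - initial_peak) / initial_peak


def run_is_accepted(initial_peak: float, final_peak: float,
                    max_drift: float = 0.05) -> bool:
    """Apply the 5% acceptance threshold of step 5."""
    return bead_drift(initial_peak, final_peak) <= max_drift


def dilution_factor(sample_ul: float, diluent_ul: float) -> float:
    """Fold dilution obtained by adding sample_ul of culture to diluent_ul."""
    return (sample_ul + diluent_ul) / sample_ul


if __name__ == "__main__":
    # Hypothetical bead peak values, in arbitrary fluorescence units.
    print(run_is_accepted(1000.0, 1030.0))   # drift of 3%
    print(run_is_accepted(1000.0, 1080.0))   # drift of 8%
    # Induction step: 20 µL of culture into 980 µL of medium.
    print(dilution_factor(20.0, 980.0))
```

The induction step therefore corresponds to a 1:50 dilution of the starting culture, which is separate from the 1:40 dilution in ddH2O applied just before the measurement.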

3.8 Detection of Putative Compounds in Environmental Samples

1. As an environmental sample, we obtained a complex hydrocarbon mixture (CHM) from a mechanic shop in Tianjin (China); it consisted of engine oil waste from different automobiles. We induced our yeast-based biosensors with this CHM to check whether they could detect the putative inducing compounds: benzoate, salicylate, and pyruvate.
2. Cell culture: Grow the yeast strains containing the MarR- and PdhR-based biosensors overnight in SDC at 30 °C (240 RPM).
3. Fluorescence induction with benzoate/salicylate: Grow the yeast strains containing MarR-based biosensors overnight in SDC at 30 °C (240 RPM). Dilute the cell solution to reach OD600 = 0.2. Prepare a 150 mL solution of SDC supplemented with CHM at 0.03, 0.06, 0.125, 0.25, 0.50, and 1.00% (v/v) concentration. Add 20 μL of the solution containing the cells engineered with the MarR-based biosensors. Benzoate and salicylate in the CHM would cause an increase in fluorescence (compared to uninduced cells).
4. Fluorescence induction with pyruvate: Grow the yeast strains containing the PdhR-based biosensors overnight in BH at 30 °C (240 RPM). Dilute the cell solution to reach OD600 = 0.2. Prepare a solution of the BH medium supplied with CHM at 0.03, 0.06, 0.125, 0.25, 0.50, and 1.00% (v/v) concentration (pH 4.5, see Note 20). Add 20 μL of the solution containing the cells engineered with the PdhR-based biosensors. Pyruvate present in the CHM would provoke an increase in the cell fluorescence level (in comparison to that from uninduced cells).
5. Fluorescence detection: Let the cells grow for 12 h (overnight) and measure fluorescence with a flow cytometer (see Fig. 4).

3.9 Real-Time Monitoring of Hydrocarbon Metabolism

1. Hydrocarbons are metabolized by some microorganisms to benzoate, salicylate, and pyruvate. Such a process can be monitored by using aliquots of the solution containing processed hydrocarbons to induce fluorescence in our engineered yeast strains at different time intervals.
2. Hydrocarbon degradation: Set up an anthracene-degradation system by means of the fungus Trichoderma harzianum, strain asemoJ (KY488463), which is capable of degrading hydrocarbons from a crude-oil-polluted site [3, 37] (see Note 22).

Fig. 4 Response to a complex hydrocarbon mixture (CHM). (a) MarR-based biosensor. (b) PdhR-based biosensor. Each bar is the mean fluorescence value ± standard deviation from three replicas. ns = not a significant difference, one-way ANOVA (three or more samples), or two-tailed Welch’s t-test (two samples)


3. Grow the fungus on a potato dextrose agar (PDA) plate and incubate it at 25 °C for 3–5 days to activate the cells.
4. Prepare 1 L of sterile BH broth (pH 4.5–6.0) and supplement it with 0.1 g/L ampicillin to suppress bacteria. Distribute 50 mL of the medium into 250 mL flasks.
5. Supplement the BH medium with 1 mM anthracene as the sole carbon source.
6. Make equal amounts of mycelial plugs (diameter = 5 mm) from the young fungal cultures and aseptically transfer them into the flasks containing 50 mL BH medium.
7. Negative control: Prepare, in different flasks, 50 mL BH and T. harzianum without any carbon source.
8. Make three replicas for each flask. Place them into a shaker at 30 °C (120 RPM).
9. Monitoring anthracene degradation: Take an aliquot of 1.5 mL from each flask every 12 h over a time span of 84 h. Spin it at 14,000 RPM for 10 min to remove the cells. Add 1 mL of the supernatant to 1 mL of either fresh SDC medium to grow MarR-based strains or the BH medium for PdhR-based strains.
10. Grow the yeast cells for at least 12 h before measuring fluorescence with a flow cytometer.
11. Any increase in fluorescence, upon comparison to uninduced cells, would signal the presence of benzoate/salicylate or pyruvate (see Fig. 5).
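The sampling plan in step 9 can be tabulated in advance to check how much culture is withdrawn from each flask. The Python sketch below is a planning aid under stated assumptions: the function names are hypothetical, and whether a sample is also taken at time 0 is not specified in the protocol, so it is exposed as a parameter.

```python
from typing import List


def sampling_times_h(interval_h: int = 12, total_h: int = 84,
                     include_start: bool = True) -> List[int]:
    """Time points (in hours) at which an aliquot is taken.

    include_start is an assumption: set it to False if no sample is
    taken when the culture is started.
    """
    first = 0 if include_start else interval_h
    return list(range(first, total_h + 1, interval_h))


def withdrawn_volume_ml(n_samples: int, aliquot_ml: float = 1.5) -> float:
    """Total volume removed from one flask over the whole time course."""
    return n_samples * aliquot_ml


if __name__ == "__main__":
    times = sampling_times_h()
    print(times)
    print(withdrawn_volume_ml(len(times)), "mL withdrawn from a 50 mL culture")
```

Under these assumptions, eight aliquots (12 mL in total) are removed from each 50 mL culture, i.e., about a quarter of the starting volume.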

4 Notes

1. Our proteins were synthesized by Genewiz Inc., Suzhou (China).
2. Bacterial cells (E. coli strain DH5α, Life Technology, 18263012) are grown in Luria Bertani (LB) broth (or 2% agar plates) supplemented with ampicillin (EC number: 200-708-1, CAS number: 69-52-3).
3. The genotype of the S. cerevisiae strain CEN.PK2-1C is: MATa; his3Δ1; leu2-3_112; ura3-52; trp1-289; MAL2-8c; SUC2.
4. Primers are designed according to the Gibson assembly specifications, i.e., they shall guarantee an overlap of about 40 nt between two contiguous DNA sequences and they shall not fold into hairpins whose melting temperature is higher than 50 °C. Touchdown PCR makes it possible to amplify a DNA sequence with a forward and a reverse primer characterized by different annealing temperatures (up to 9 °C difference, in our experiments).


Fig. 5 Monitoring hydrocarbon metabolism. (a) Response from two different MarR-based biosensors (marOp and 2XmarOp: the promoters in the reporter elements contain a different kind and number of MarR binding sites) to SDC plus T. harzianum-anthracene culture extracts. An enhanced expression of the yEGFP starts after 48 h, which indicates that benzoate and/or salicylate are produced during anthracene metabolism/degradation by T. harzianum. (b) Response from two strains hosting the same PdhR-based biosensor to BH plus T. harzianum-anthracene culture extracts. Values are the mean fluorescence from three replicates together with their standard deviation. Fluorescence begins to increase after 12 h and it is still increasing after 84 h, which points out an active metabolism of the compound that, however, was not fully degraded/metabolized

5. The volume of water, in μL, is calculated by multiplying the primer amount (in nanomoles, as written on the primer tube) by 10.
6. The percentage of the agarose gel depends on the length of the DNA sequence (the longer the DNA sequence, the lower the amount of agarose in the gel).
7. We prefer to avoid using lithium bromide in our lab.
8. ssDNA should be boiled for 5 min at 95 °C regularly, i.e., after any five transformations.
9. Higher temperatures would inactivate ampicillin.
10. AA mix excludes histidine, leucine, tryptophan, and uracil since they are used for preparing selective media for yeast auxotrophic selection.
11. All waste disposal regulations should be followed diligently.
12. A component of a synthetic gene circuit is orthogonal to the circuit chassis (the host cell) if it does not interfere with the usual molecular processes that take place in the same host cell.
13. The GPD promoter is the strongest promoter in the yeast S. cerevisiae. At the end of the transcription unit, the CYC1 terminator (CYC1t) was placed.
14. Promoters, coding regions, and terminators are loaded on an agarose gel to verify that their length is correct. They are, then, eluted from the gel with a kit developed ad hoc. Their concentration is measured with a nano-drop spectrophotometer.
15. We used a BD FACSVerse flow cytometer (blue laser: 488 nm; emission filter: 527/32 nm).
16. A biosensor is classified as “working” if the ON/OFF ratio is at least equal to 2.
17. As an alternative to the flow cytometer, fluorescence can be measured with a multimode plate reader (we used a TECAN NanoQuant Infinite M200). In the overall procedure, the solution of yeast cells that have been cultured for 8–16 h is centrifuged for 2 min at MAX speed, the supernatant is removed, and the cells are resuspended in 500 μL of water. Then, 200 μL of this solution shall be transferred into a 96-well plate. Both absorbance and fluorescence are measured with the plate reader. As for the total fluorescence intensity (FIT), the excitation and emission wavelengths of the yEGFP shall be set to 476 and 512 nm, respectively [36]. As for the absorbance, the cell solution is further diluted 1:100 before the measurement (OD600-1:100). The average fluorescence intensity level, ⟨FI⟩, is calculated as

$$\langle FI \rangle = \left(\frac{FIT}{100 \cdot OD_{600\text{-}1:100}}\right)_{\text{Sample}} - \left(\frac{FIT}{100 \cdot OD_{600\text{-}1:100}}\right)_{\text{Background}},$$

where the “Sample” is an engineered strain, whereas the “Background” corresponds to the CEN.PK2-1C strain.

18. The temperatures used here are compatible with Q5 Hot Start DNA polymerase.
19. Plasmid linearization increases the probability that homologous recombination takes place. When the pRSII40X plasmids are used as backbones, URA3 can be digested with StuI and LEU2 with BstEII, provided that these restriction sites are absent from the reporter and the receptor TU, respectively.
20. BH medium pH is around 7.0 and needs to be adjusted to 4.5 to be suitable for yeast growth.
21. PdhR is activated by pyruvate generated during cellular metabolism [38]. Therefore, yeast PdhR-based biosensors should be grown in a BH medium to avoid fluorescence emission due to the intracellular pool of pyruvate.
22. This test can be conducted using any other hydrocarbon and any microbe that can utilize the chosen hydrocarbon.
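The plate-reader normalization of Note 17 and the ON/OFF criterion of Note 16 can be combined in a few lines. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, the numbers in the usage example are invented solely to show the arithmetic, and the background-subtracted values are assumed to be positive when the ratio is taken.

```python
def fluorescence_per_od(fit: float, od600_1_100: float) -> float:
    """FIT divided by 100 * OD600 measured on the 1:100 diluted culture."""
    if od600_1_100 <= 0:
        raise ValueError("OD600 must be positive")
    return fit / (100.0 * od600_1_100)


def mean_fluorescence(sample_fit: float, sample_od: float,
                      background_fit: float, background_od: float) -> float:
    """Background-subtracted fluorescence intensity as defined in Note 17.

    The background strain is CEN.PK2-1C (no fluorescent reporter).
    """
    return (fluorescence_per_od(sample_fit, sample_od)
            - fluorescence_per_od(background_fit, background_od))


def is_working(on_level: float, off_level: float,
               threshold: float = 2.0) -> bool:
    """Note 16: a biosensor 'works' if the ON/OFF ratio is at least 2."""
    if off_level <= 0:
        raise ValueError("OFF level must be positive")
    return on_level / off_level >= threshold


if __name__ == "__main__":
    # Hypothetical plate-reader readings (arbitrary units).
    off = mean_fluorescence(3000.0, 0.05, 500.0, 0.05)
    on = mean_fluorescence(9000.0, 0.05, 500.0, 0.05)
    print(off, on, is_working(on, off))
```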

Acknowledgments

We thank the students of the Synthetic Biology lab for their help. We also want to thank Xiangyang Zhang and Zhi Li for their assistance in the FACS experiments.

References

1. Mohler RE, O’Reilly KT, Zemo DA, Tiwary AK, Magaw RI, Synowiec KA (2013) Non-targeted analysis of petroleum metabolites in groundwater using GC×GC-TOFMS. Environ Sci Technol 47:10471–10476
2. O’Reilly KT, Mohler RE, Zemo DA, Ahn S, Tiwary AK, Magaw RI, Espino Devine C, Synowiec KA (2015) Identification of ester metabolites from petroleum hydrocarbon biodegradation in groundwater using GC×GC-TOFMS. Environ Toxicol Chem 34:1959–1961. https://doi.org/10.1002/etc.3022
3. Asemoloye MD, Marchisio MA (2023) Synthetic metabolic transducers in Saccharomyces cerevisiae as sensors for aromatic permeant acids and bioreporters of hydrocarbon metabolism. Biosens Bioelectron 220:114897. https://doi.org/10.1016/j.bios.2022.114897
4. Mehinto AC et al (2015) Interlaboratory comparison of in vitro bioassays for screening of endocrine active chemicals in recycled water. Water Res 83:303–309
5. Voyvodic PL, Pandi A, Koch M, Conejero I et al (2019) Plug-and-play metabolic transducers expand the chemical detection space of cell-free biosensors. Nat Commun 10:1697. https://doi.org/10.1038/s41467-019-09722-9
6. Fernandez-López R, Ruiz R, de la Cruz F, Moncalián G (2015) Transcription factor-based biosensors enlightened by the analyte. Front Microbiol 6:648
7. Park M, Tsai SL, Chen W (2013) Microbial biosensors: engineered microorganisms as the sensing machinery. Sensors 13:5777–5795
8. van der Meer JR, Belkin S (2010) Where microbiology meets microengineering: design and applications of reporter bacteria. Nat Rev Microbiol 8:511–522
9. Ma J, Ptashne M (1987) A new class of yeast transcriptional activators. Cell 51:113–119. https://doi.org/10.1016/0092-8674(87)90015-8
10. Raut N, O’Connor G, Pasini P, Daunert S (2012) Engineered cells as biosensing systems in biomedical analysis. Anal Bioanal Chem 402:3147–3159
11. Garamella J, Marshall R, Rustad M, Noireaux V (2016) The all E. coli TX-TL Toolbox 2.0: a platform for cell-free synthetic biology. ACS Synth Biol 5:344–355
12. Shin J, Noireaux V (2012) An E. coli cell-free expression toolbox: application to synthetic gene circuits and artificial cells. ACS Synth Biol 1:29–41
13. Asemoloye MD, Ahmad R, Jonathan SG (2018) Transcriptomic responses of catalase, peroxidase and laccase encoding genes and enzymatic activities of oil spill inhabiting rhizospheric fungal strains. Environ Pollut 235:55–64. https://doi.org/10.1016/j.envpol.2017.12.042
14. Daccò C, Girometta C, Asemoloye MD et al (2020) Key fungal degradation patterns, enzymes and their applications for the removal of aliphatic hydrocarbons in polluted soils: a review. Int Biodeterior Biodegrad 147:104866. https://doi.org/10.1060/j.ibiod.2019.104866
15. Sun S, Wang H, Chen Y, Lou J, Wu L, Xu J (2019) Salicylate and phthalate pathways contributed differently on phenanthrene and pyrene degradations in Mycobacterium sp. WY10. J Hazard Mater 364:509–518. https://doi.org/10.1016/j.jhazmat.2018.10.064
16. Creamer KE, Ditmars FS, Basting PJ (2016) Benzoate- and salicylate-tolerant strains of Escherichia coli K-12 lose antibiotic resistance during laboratory evolution. Appl Environ Microbiol 83(2):e02736. https://doi.org/10.1128/AEM.02736-16
17. Lennerz B, Vafai SB, Delaney NF, Clish CB, Deik AA, Pierce KA, Ludwig D, Mootha VK (2015) Effects of sodium benzoate, a widely used food preservative, on glucose homeostasis and metabolic profiles in humans. Mol Genet Metab 114:73–79
18. An C, Mou Z (2011) Salicylic acid and its function in plant immunity. J Integr Plant Biol 53:412–428
19. Hawley S, Fullerton MD, Ross F (2012) The ancient drug salicylate directly activates AMP-activated protein kinase. Science 336(80):918–922
20. Spadafranca A, Bertoli S, Fiorillo G, Testolin G, Battezzati A (2007) Circulating salicylic acid is related to fruit and vegetable consumption in healthy subjects. Br J Nutr 98:802–806
21. Qin J, Li R et al (2010) A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464:59–67
22. Kanjee U, Houry WA (2013) Mechanisms of acid resistance in Escherichia coli. Annu Rev Microbiol 67:65–81
23. Martin RG, Rosner JL (1995) Binding of purified multiple antibiotic-resistance repressor protein (MarR) to mar operator sequences. Proc Natl Acad Sci U S A 92:5456–5460
24. Sundaramoorthy NS, Sivasubramanian A, Nagarajan S (2020) Simultaneous inhibition of MarR by salicylate and efflux pumps by curcumin sensitizes colistin resistant clinical isolates of Enterobacteriaceae. Microb Pathog 148:104445
25. Sundaramoorthy NS, Suresh P, Selva Ganesan S et al (2019) Restoring colistin sensitivity in colistin-resistant E. coli: combinatorial use of MarR inhibitor with efflux pump inhibitor. Sci Rep 9:19845. https://doi.org/10.1038/s41598-019-56325-x
26. Chubiz LM, Rao CV (2010) Aromatic acid metabolites of Escherichia coli K-12 can induce the marRAB operon. J Bacteriol 192:4786–4789
27. Ogasawara H, Ishida Y, Yamada K, Yamamoto K, Ishihama A (2007) PdhR (pyruvate dehydrogenase complex regulator) controls the respiratory electron transport system in Escherichia coli. J Bacteriol 189(15):5534–5541. https://doi.org/10.1128/JB.00229-07
28. Quail MA, Haydon DJ, Guest JR (1994) The pdhR-aceEF-lpd operon of Escherichia coli expresses the pyruvate dehydrogenase complex. Mol Microbiol 12(1):95–104
29. Chee MK, Haase SB (2012) New and redesigned pRS plasmid shuttle vectors for genetic manipulation of Saccharomyces cerevisiae. G3 (Bethesda) 2:515–526. https://doi.org/10.1534/g3.111.001917
30. Sheff MA, Thorn KS (2004) Optimized cassettes for fluorescent protein tagging in Saccharomyces cerevisiae. Yeast 21(8):661–670. https://doi.org/10.1002/yea.1130
31. Hahn S, Hoar ET, Guarente L (1985) Each of three “TATA elements” specifies a subset of the transcription initiation sites at the CYC-1 promoter of Saccharomyces cerevisiae. Proc Natl Acad Sci U S A 82:8562–8566
32. Curran KA, Morse NJ, Markham KA, Wagman AM, Gupta A, Alper HS (2015) Short synthetic terminators for improved heterologous gene expression in yeast. ACS Synth Biol 4:824–832
33. Gietz RD, Woods RA (2002) Transformation of yeast by lithium acetate/single-stranded carrier DNA/polyethylene glycol method. Methods Enzymol 350:87–96. https://doi.org/10.1016/s0076-6879(02)50957-5
34. Gibson D (2009) One-step enzymatic assembly of DNA molecules up to several hundred kilobases in size. Protoc Exch 6:343–345
35. Sambrook MR, Ga J (2018) Molecular cloning, 4th edn. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York
36. Yu L, Zhang Y, Marchisio MA (2022) Gene digital circuits based on CRISPR–Cas systems and anti-CRISPR proteins. J Vis Exp (188). https://doi.org/10.3791/64539
37. Asemoloye MD, Marchisio MA (2022) Synthetic Saccharomyces cerevisiae tolerate and degrade highly pollutant complex hydrocarbon mixture. Ecotoxicol Environ Saf 241:11376
38. Anzai T, Imamura S, Ishihama A, Shimada T (2020) Expanded roles of pyruvate sensing PdhR in transcription regulation of the Escherichia coli K-12 genome: fatty acid catabolism and cell motility. Microb Genom 6(10):mgen000442. https://doi.org/10.1099/mgen.0.000442

Chapter 6
dCas12a:Pre-crRNA: A New Tool to Induce mRNA Degradation in Saccharomyces cerevisiae Synthetic Gene Circuits
Lifang Yu and Mario Andrea Marchisio

Abstract
We describe a new way to trigger mRNA degradation in Saccharomyces cerevisiae synthetic gene circuits. Our method requires modifying either the 5′- or the 3′-UTR that flanks a target gene with elements from the pre-crRNA of type V Cas12a proteins and expressing a DNase-deficient Cas12a (dCas12a). dCas12a recognizes and cleaves the pre-crRNA motifs on mRNA sequences. Our tool does not require complex engineering operations and permits an efficient control of protein expression via mRNA degradation.

Key words CRISPR-dCas12a, Pre-crRNA, Repeat, mRNA, Degradation, Gene circuits, S. cerevisiae

1 Introduction

The messenger RNA (mRNA), as illustrated in the central dogma of molecular biology, plays an essential role in gene expression in any kind of organism [1]. Therefore, the concentration of the proteins involved in a synthetic gene circuit can be controlled by acting on the mRNA. One possibility is to regulate translation with RNA-binding proteins or riboswitches that prevent or (less commonly) facilitate ribosome binding to, and elongation along, the mRNA [2]. Another, highly effective, way is to undermine mRNA stability, which triggers mRNA degradation and provokes a drastic reduction in protein synthesis. To date, at least four different controls of mRNA stability have been engineered in synthetic cells. They make use of (1) Pumilio/FBF (PUF) proteins, which bind a specific 8-nt-long site placed on the 3′ UTR (untranslated region) of the target mRNA and recruit RNA degradation factors [3, 4]; (2) short anti-sense RNA sequences, such as siRNAs (small interfering RNAs) and miRNAs (micro RNAs) [5, 6]; (3) ribozymes, which auto-cleave upon binding a chemical [7];

Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_6, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024


and (4) Csy4 (also known as Cas6f) and Cas13a/b/d, i.e., a type I-F and some type VI CRISPR-associated proteins that work as RNases [8–10]. Cas12a, formerly known as Cpf1, belongs to the type V CRISPR-Cas system and is characterized by a double nuclease activity. When, in prokaryotic cells, the presence of a foreign DNA element triggers the CRISPR-Cas-based immune system, the CRISPR array (a library of the intruding DNAs met by the cell) is transcribed into a long pre-crRNA (pre-CRISPR RNA) molecule. The pre-crRNA is organized into several identical hairpin structures (the repeats) separated by sequences of different lengths (the spacers that come from the invading DNA). Cas12a recognizes the hairpin structure of the repeats and cleaves the pre-crRNA in their vicinity. In this way, Cas12a processes the pre-crRNA into mature crRNA molecules, each consisting of a repeat followed by a spacer (Rs). Upon forming a complex with a crRNA, the Cas12a:crRNA ribonucleoprotein binds and cleaves a DNA sequence that matches the spacer, provided that it is preceded by the TTTV protospacer adjacent motif (PAM). The cut DNA molecule is finally degraded by the cell [11] (see Fig. 1). So far, in Synthetic Biology, Cas12a and its DNase-deficient version, dCas12a, have been used extensively in gene editing and transcription regulation, respectively [12–14]. We established in Saccharomyces cerevisiae a new, efficient mRNA degradation system, termed the dCas12a:pre-crRNA tool, based on dCas12a proteins and their repeats. We considered, mainly, three different dCas12as: dFnCas12a (from Francisella novicida), dLbCas12a (from Lachnospiraceae bacterium), and the enhanced dAsCas12a (denAsCas12a, from Acidaminococcus sp. BV3L6) [15]. Each dCas12a was employed in three different configurations: the bare protein, the protein fused to two nuclear localization sequences (NLSs), and the protein fused to two nuclear export sequences (NESs).
We showed that every yeast codon-optimized dCas12a was capable of inducing strong mRNA degradation in S. cerevisiae after inserting (shortened) pre-crRNA sequences (cognate and non-cognate) in the 5′- or 3′-UTR of the target mRNA (encoding either the yeast enhanced green fluorescent protein, yEGFP, or the Trametes trogii laccase 1 protein, Ttlcc1) [16]. We pointed out that the introduction of a single direct repeat on the 5′ UTR was, in general, more effective than modifying the 3′ UTR with up to nine repeats. dFnCas12a appeared to degrade the mRNA only when fused to the NLS or NES sequences and displayed a clear preference for its cognate pre-crRNA. In contrast, the bare denAsCas12a outperformed its variants containing two NLSs or NESs, with good performance also on non-cognate repeats. dLbCas12a induced a remarkable decrease in protein expression when fused to the NESs and acting on the cognate Rs on the 5′ UTR. Overall, the fluorescence signal


Fig. 1 CRISPR-Cas12a working mechanism. Once Cas12a is expressed in prokaryotic cells, it binds and cuts at the repeat-spacers of the pre-crRNA to form the Cas12a:crRNA ribonucleoprotein that leads to DNA cleavage

from our synthetic S. cerevisiae cells decreased by up to 81.04-fold when the 5′ UTR was cleaved and 24.17-fold when the pre-crRNA elements were placed on the 3′ UTR. Taken together, our work shows a new and flexible mRNA degradation system that is engineered in two steps only. The first one requires the insertion of (part of) the pre-crRNA associated with a Cas12a protein into the 5′ or 3′ UTR of a target gene. The second step requires the expression of a dCas12a protein in the previously engineered cells. dCas12a recognizes the pre-crRNA element(s) on the target mRNA, cleaves them, and, as a consequence, induces mRNA degradation. This tool was successfully applied to the construction of one- and two-input Boolean gates. Compared to RNA-binding proteins and ribozymes, we provide a more straightforward way to tune the mRNA level inside yeast synthetic gene circuits without resorting to any complicated genetic engineering operations.

2 Materials

Double distilled water (ddH2O) was used for buffer, solid and liquid media, and reagents preparation. All solutions and plates were autoclaved at 121 °C for 20 min before being stored at 4 °C. Every dCas12a gene was yeast codon optimized and synthesized by Genewiz Inc., Suzhou (China). Plasmid computational design and DNA sequence analysis were carried out with the software CLC sequence viewer 8.0 (QIAGEN Aarhus A/S).

2.1 PCR

1. Primer design: CLC sequence viewer 8 and OligoAnalyzer (IDT, Integrated DNA Technologies) programs.
2. 2.5 mM dNTP mix preparation: Pour 250 μL each of 100 mM dATP, dGTP, dCTP, and dTTP into a 15 mL centrifuge tube together with 9 mL ddH2O. Mix the components by inverting the tube several times, then aliquot the solution into 1.5 mL tubes (0.5 mL/tube). Store them at -20 °C.
3. 10 μM primer preparation: Centrifuge the tubes containing dry DNA for at least 10 s to collect the primers at the bottom of the tube (and reduce any loss of oligos). Add ddH2O to dissolve the DNA and prepare 100 μM stock solutions. The volume of water, in μL, is calculated by multiplying the primer amount (in nanomoles) by 10. Add 1 μL of 100 μM primer stock solution to 9 μL ddH2O to get a 10 μM primer solution for PCR.
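The volumes in steps 2 and 3 follow from the dilution rule C1·V1 = C2·V2. The Python sketch below reproduces them; it is an illustration only, the function names are hypothetical, and the 25.3 nmol primer amount in the usage example is an invented value.

```python
from typing import Tuple


def water_for_100um_stock_ul(primer_nmol: float) -> float:
    """Water (µL) needed for a 100 µM stock: 1 nmol in 10 µL equals 100 µM."""
    return primer_nmol * 10.0


def dilute(stock_conc: float, final_conc: float,
           final_volume: float) -> Tuple[float, float]:
    """Return (stock volume, diluent volume) from C1*V1 = C2*V2.

    Concentrations share one unit and volumes share another.
    """
    if final_conc > stock_conc:
        raise ValueError("final concentration exceeds stock concentration")
    stock_volume = final_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


def dntp_each_mm(stock_mm: float = 100.0, volume_each_ul: float = 250.0,
                 n_dntps: int = 4, water_ul: float = 9000.0) -> float:
    """Final concentration of each dNTP in the mix of step 2."""
    total_ul = n_dntps * volume_each_ul + water_ul
    return stock_mm * volume_each_ul / total_ul


if __name__ == "__main__":
    print(water_for_100um_stock_ul(25.3))   # hypothetical 25.3 nmol primer
    print(dilute(100.0, 10.0, 10.0))        # 10 µL of 10 µM working solution
    print(dntp_each_mm())                   # concentration of each dNTP, mM
```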

2.2 Agarose Gel Electrophoresis

1. 10× TBE buffer: Weigh 108 g Tris and 55 g boric acid into a 1-L glass bottle. Add 40 mL 0.5 M EDTA (pH 8.0) and 700 mL ddH2O. Mix them thoroughly with a magnetic stir bar. Add ddH2O up to 1000 mL and mix the solution again with a magnetic stir bar.
2. 0.5× TBE buffer: Mix 50 mL of 10× TBE buffer and 950 mL ddH2O in a 1-L glass bottle.
3. 6× DNA loading dye: Add, into a 15 mL centrifuge tube, 25 mg of amaranth dye, 3 mL glycerol, and ddH2O up to a final volume of 10 mL. Aliquot the solution into 1.5 mL black tubes (1 mL/tube) and store them at -20 °C. Add a nucleic acid dye before using (see Note 1).

2.3 Agarose Gel Preparation

1. Weigh enough agarose to achieve the required gel concentration. Mix the agarose in a flask with a proper amount of 0.5× TBE buffer (see Note 2).
2. Heat the flask in a microwave oven to dissolve the agarose completely.

dCas12a:Pre-crRNA Induces mRNA Degradation in Yeast


3. Pour the solution into the gel preparation cast where a comb has already been placed. Let the solution solidify at 4 °C (it takes 15–20 min; see Note 3).

2.4 Gibson Reaction Buffer

1. Weigh 1.5 g of PEG8000 in a 10 mL centrifuge tube and add to it 3 mL of 1 M Tris–HCl (pH 7.5).
2. Add to the above solution 150 μL of 2 M MgCl2, 60 μL each of 100 mM dGTP, dATP, dTTP, and dCTP, 300 μL of 1 M DTT, 1.5 mL of 20 mM NAD, and ddH2O up to an overall volume of 6 mL. Mix by inverting the tube.
3. Aliquot the solution into 1.5 mL sterile tubes (320 μL/tube). Store them at -80 °C.

2.5 Gibson Assembly Master Mixture

1. Place 80 PCR tubes on ice to precool them.
2. Take one tube of Gibson reaction buffer (320 μL) from the -80 °C freezer and add 0.64 μL of T5 exonuclease (10 U/μL), 20 μL of Phusion DNA polymerase (2 U/μL), 160 μL of Taq DNA ligase (40 U/μL), and ddH2O up to a volume of 1.2 mL. Mix the solution by shaking and centrifuging the tube (see Note 4); repeat this operation 3–4 times.
3. Aliquot the solution into the precooled PCR tubes (15 μL/tube). Store them at -20 °C.
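As a quick consistency check of the recipe above, the following Python sketch computes the final enzyme concentrations in the 1.2 mL master mixture and the number of 15 μL aliquots it yields. The volumes and stock concentrations are those listed in item 2; the function and variable names are illustrative only.

```python
TOTAL_UL = 1200.0    # final volume of the master mixture (item 2)
ALIQUOT_UL = 15.0    # volume per PCR tube (item 3)

# enzyme name -> (volume added in uL, stock concentration in U/uL)
ENZYMES = {
    "T5 exonuclease": (0.64, 10.0),
    "Phusion DNA polymerase": (20.0, 2.0),
    "Taq DNA ligase": (160.0, 40.0),
}


def final_units_per_ul(volume_ul, stock_units_per_ul, total_ul=TOTAL_UL):
    """Enzyme concentration (U/uL) after dilution into the master mixture."""
    return volume_ul * stock_units_per_ul / total_ul


def number_of_aliquots(total_ul=TOTAL_UL, aliquot_ul=ALIQUOT_UL):
    """How many full aliquots the master mixture provides."""
    return int(total_ul // aliquot_ul)


if __name__ == "__main__":
    for name, (volume, stock) in ENZYMES.items():
        print(f"{name}: {final_units_per_ul(volume, stock):.4f} U/uL")
    print("aliquots:", number_of_aliquots())
```

The aliquot count matches the 80 precooled PCR tubes requested in item 1.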

2.6 Mini-preparation

1. Resuspension solution (with the addition of 100 μg/mL RNase A): Mix 12.5 mL of 1 M Tris–HCl (pH 8.0), 10 mL of 0.5 M EDTA (pH 8.0), and 477.5 mL ddH2O in a 500 mL glass bottle with a magnetic stir bar to obtain 500 mL of resuspension solution. Store at 4 °C. Add 30 μL of RNase A (100 mg/mL) solution to 30 mL of resuspension solution in a 50 mL centrifuge tube. Store at 4 °C.
2. Lysis solution: First prepare 10 M NaOH: Weigh 100 g of NaOH in a 500 mL beaker, add 100 mL ddH2O, and mix with a magnetic stir bar. Add a further 150 mL ddH2O and mix again with a magnetic stir bar. Store at room temperature. Then dissolve 1 g of SDS in 2 mL of 10 M NaOH and add ddH2O up to a volume of 100 mL. Store at room temperature.
3. Neutralization solution: First prepare a 5 M KAc stock solution: Dissolve 49.075 g of KAc (molecular weight, 98.15 g/mol) in 100 mL ddH2O. Store at room temperature. Then pour 60 mL of 5 M KAc into a new 100 mL glass bottle and add 11.5 mL of 96% acetic acid. Finally, add 28.5 mL ddH2O to obtain 100 mL of the neutralization solution. Store at 4 °C.
4. 70% ethanol: Mix 35 mL 100% ethanol (analytical level) and 15 mL sterile ddH2O in a 50 mL centrifuge tube.


2.7 Yeast Transformation

All synthetic constructs were placed into the S. cerevisiae strain CEN.PK2-1C (MATa; his3D1; leu2-3_112; ura3-52; trp1-289; MAL2-8c; SUC2).

1. 10× TE buffer: Mix 5 mL of 1 M Tris–HCl (pH 8.0), 1 mL of 0.5 M EDTA (pH 8.0), and 44 mL ddH2O in a 100 mL glass bottle. Autoclave the solution (121 °C for 20 min). Store at room temperature.
2. TE buffer (1×): Mix 10 mL of 10× TE buffer with 90 mL ddH2O.
3. ssDNA (10 mg/mL): Weigh 500 mg salmon sperm DNA. Add it to 50 mL of TE buffer. Shake the mixture over 2 days at 4 °C. Aliquot into 1.5 mL tubes (1.0 mL/tube). Boil the tubes for 20 min at 95 °C (see Note 5).
4. LiAc (lithium acetate) 1 M solution: Weigh 10.2 g of lithium acetate dihydrate and dissolve it in 100 mL ddH2O.
5. LiAc mix: Mix 10 mL of LiAc, 10 mL of 10× TE buffer, and 80 mL ddH2O in a beaker. Sterilize the solution with a 0.22 μm filter membrane.
6. PEG mix: Dissolve 40 g of PEG 3350 in 70 mL LiAc mix. Add further LiAc mix up to an overall volume of 100 mL and sterilize the solution with a 0.22 μm filter membrane.

2.8 FACS Beads and Cleaning Solution Preparation

1. Fluorescent beads preparation: Pour 800–900 μL ddH2O into a 2 mL black tube. Add a drop of beads into the tube (see Note 6).
2. Cleaning solution preparation: Filter 50 mL sodium hypochlorite solution with a 0.22 μm filter membrane.

2.9 Media and Plates

1. LB (Luria-Bertani) (plus ampicillin) solution/plate: Weigh 10 g of bacto-tryptone, 5 g of yeast extract, and 10 g of NaCl. Pour them into a 1-L glass bottle (for plates, weigh an extra 15 g of agar). Add ddH2O up to 999 mL and autoclave. When preparing LB plus ampicillin, allow the LB (or LB plus agar) solution to cool down to 55–60 °C (the temperature that a hand can bear) and then add 1 mL ampicillin solution (100 mg/mL) (see Note 7).
2. YPD medium: Mix 20 g of bacto-peptone, 10 g of yeast extract, and adenine hemisulfate (a spatula tip) in a 1-L glass bottle. Add ddH2O up to 800 mL. Dissolve 20 g of glucose in 150 mL ddH2O, then add further ddH2O to reach a 200 mL volume. Autoclave the two solutions, then pour the 200 mL of glucose solution into the 800 mL mixture to get 1 L YPD solution.
3. AA mix preparation: Mix, in a 250 mL glass bottle, 0.5 g adenine, 2.0 g alanine, 2.0 g arginine, 2.0 g asparagine, 2.0 g aspartic acid, 2.0 g cysteine, 2.0 g glutamine, 2.0 g glutamic acid, 2.0 g glycine, 2.0 g isoleucine, 2.0 g methionine, 2.0 g lysine, 2.0 g phenylalanine, 2.0 g proline, 2.0 g serine, 2.0 g threonine, 2.0 g tyrosine, 2.0 g valine, and glass beads. Shake the bottle vigorously for about 15 min.
4. SDC (synthetic defined complete) medium: Mix 20 g of glucose, 6.7 g of YNB, 2 g AA mix, 79.2 mg histidine, 396 mg leucine, 79.2 mg tryptophan, and 79.2 mg uracil (see Note 8) in a 1000 mL glass bottle. Add ddH2O to reach 1 L volume.

3 Methods

3.1 Plasmid Design

1. A transcription unit (TU), which consists of, at least, a promoter followed by a coding region and a terminator, is placed into a plasmid backbone from which the multiple cloning site (MCS) has been removed. The bacterial-yeast integrative shuttle vectors pRSII40X (where X is a label for the auxotrophic marker) are used here [17].
2. Plasmids carrying the pre-crRNA element(s) in the 3′ UTR (pRSII404-pCYC1min-yEGFP-nxRs-CYC1t): a promoter of moderate strength shall be chosen to avoid an excessive production of mRNA (see Note 9) [18]. The yeast-enhanced green fluorescent protein (yEGFP) [19] is a convenient reporter protein for measurements with a flow cytometer or a microscope (see Note 10). A variable number (n) of repeat-spacer (see Note 11) elements is placed downstream of the STOP codon of the yEGFP, i.e., in front of the terminator. They are transcribed as the initial segment of the 3′ UTR. The spacers are bacterial sequences to prevent any action of the dCas12a:crRNA complex on the yeast genome (see Note 12). A well-characterized yeast terminator, such as CYC1t, is placed at the end of the TU.
3. Plasmids carrying a single Rs in the 5′ UTR (pRSII404-pTEF2-Rs-yEGFP-CYC1t): choose a rather strong promoter to compensate for the reduction in the transcription initiation rate due to the hairpin structure (the repeat) in the 5′ UTR (see Note 13). A single Rs is placed downstream of the promoter, before the START codon of the yEGFP. The CYC1 terminator ends this TU as well.
4. Acceptor vectors for the constitutive expression of dCas12a proteins (pRSII406-pGPD-ATG-(NLS/NES)-HIStag-BamH1-sp-Xho1-(NLS/NES)-TAA-CYC1t): choose a strong promoter, such as pGPD, to maximize the expression of dCas12a. Downstream of the promoter, place, in this order: the START codon of the CDS; an NLS (or an NES) to force dCas12a into the nucleus (or the cytoplasm) (see Note 14); the HIS (or another short) tag for Western blotting experiments; a random, up to 20-nt-long DNA sequence (sp) flanked by two restriction sites that are present neither elsewhere in the acceptor vector nor along the dCas12a sequence (e.g., BamHI and XhoI); another NLS (NES) to achieve a strong protein localization within a cell compartment (see Note 14); the STOP codon; and a well-characterized terminator such as CYC1t.
5. Acceptor vectors for the controlled expression of dCas12a proteins (pRSII406-regulated promoter-ATG-(NLS/NES)-HIStag-BamH1-sp-Xho1-(NLS/NES)-TAA-CYC1t): the only difference with respect to the plasmid design in Subheading 3.1, step 4 is in the promoter, whose transcription initiation rate is controlled, here, by a chemical. Possible choices are pGAL1, activated by galactose; pCUP1, activated by copper; and pMET25, repressed by methionine. These promoters are useful to build gene digital circuits whose inputs are the chemicals to which the promoters respond (see Note 15).

3.2 Gene Circuit Design

1. An IMPLY Boolean gate that responds to copper and methionine: The gate consists of three TUs. Two of them sense the inputs (copper and methionine) and express the two halves of a split dLbCas12a [14]. The last TU carries the yEGFP gene preceded by a single Rs motif. dLbCas12a is reconstituted (and, therefore, able to cleave the mRNA of the yEGFP) only in the presence of its two components, i.e., when the cell culture contains copper but lacks methionine (see Fig. 2).

Fig. 2 The diagram of the IMPLY gate. One Rs motif is inserted into the 5′ UTR of the yEGFP gene. The two parts of the split dLbCas12a are expressed by the regulated CUP1 and MET25 promoters. The circuit triggers the degradation of the yEGFP mRNA only for the (1,0) truth table entry, i.e., when copper is the only input present (FI: fluorescence intensity)

2. Plasmid expressing yEGFP: the same as in Subheading 3.1, step 3, i.e., pRSII404-pTEF2-Rs-yEGFP-CYC1t (see Note 16).
3. Plasmid expressing dLbCas12a_N406, i.e., the first 406 amino acids of dLbCas12a (pCUP1-NES-HIStag-dLbCas12a_N406-Z1-TGUO1): the inducible CUP1 promoter precedes an NES, a HIS tag, the dLbCas12a_N406 fragment, a leucine zipper domain (Z1-EE12RR345L [14]), and the GUO1 terminator (TGUO1) (see Note 17).
4. Plasmid expressing dLbCas12a_N407, the rest of dLbCas12a (pMET25-Z2-dLbCas12a_N407-NES-CYC1t): the MET25 promoter (repressed by methionine) is placed in front of a leucine zipper domain (Z2-RR12EE345L) [14], dLbCas12a_N407, an NES, and the CYC1 terminator. The binding between dLbCas12a_N406 and dLbCas12a_N407 is facilitated by the presence of the two complementary leucine zipper domains, Z1 and Z2.

3.3 Touchdown PCR

1. Primer design: two pairs of primers are required to assemble pRSII404-pCYC1min-yEGFP-nxRs-CYC1t. They are termed pair 1 (forward primer Fw1, reverse primer Rev1) and pair 2 (Fw2, Rev2), as in Fig. 3 (see Note 18). Pair 1 amplifies the pCYC1min-yEGFP fragment, whereas pair 2 is used for CYC1t. The nxRs, i.e., the Cas12a pre-crRNA elements, are located on the primers Rev1 and Fw2. Two more pairs of primers are designed to build pRSII404-pTEF2-Rs-yEGFP-CYC1t. Fw3 and Rev3 serve to obtain pTEF2 from a template plasmid, whereas Fw4 and Rev4 are used to get yEGFP-CYC1t. Rs is introduced via Rev3 and Fw4 (see Note 19).

Fig. 3 Assembling the plasmids that express the target mRNA. Plasmid 1, the backbone, is digested by two enzymes (Acc65I and SacI) to remove the MCS. pCYC1min-yEGFP and CYC1t fragments are obtained, from other plasmids, via PCR with primer pair 1 (Fw1, Rev1) and pair 2 (Fw2, Rev2), respectively. A variable number, n, of Cas12a repeat-spacer elements is introduced between the STOP codon of yEGFP and the beginning of CYC1t (3′ UTR) by Fw2 and Rev1. During the Gibson assembly, pCYC1min-yEGFP, nxRs, and CYC1t are assembled into the cut-open backbone. The plasmid pRSII404-pTEF2-Rs-yEGFP-CYC1t is constructed in a similar way. A single Cas12a Rs is inserted upstream of yEGFP (5′ UTR) via the primers Fw4 and Rev3

2. Take out of the -20 °C freezer: primers (10 μM), dNTP mix (2.5 mM), Q5 Hot Start DNA polymerase reaction buffer, and DNA templates (i.e., the plasmids containing the transcription units pCYC1min-yEGFP-CYC1t or pTEF2-yEGFP-CYC1t). Keep all tubes on ice for about 5–8 min. Shake the tubes for a few more seconds.
3. PCR reaction mixture preparation: Add, into a 200 μL PCR tube, 1 μL of DNA template (see Note 20), 1 μL of forward primer (10 μM), 1 μL of reverse primer (10 μM), 5 μL of 2.5 mM dNTP mix, 10 μL of 5× Q5 Hot Start DNA polymerase reaction buffer, and 31.5 μL sterile ddH2O (for a total volume of 49.5 μL). Shake the tube and centrifuge it for 1–3 s (see Note 21). Add 0.5 μL of Q5 Hot Start DNA polymerase (see Note 22) to reach an overall volume of 50 μL (see Note 23).
4. Set up the touchdown PCR program: After placing the PCR tubes into a thermal cycler, set the PCR program as follows. Stage 1: 30 s at 98 °C. Stage 2: 98 °C for 10 s, Ta1 for 20 s (see Note 24), and 72 °C for Etime (see Note 25), to be repeated 10 times (at each repetition the annealing temperature, Tai, decreases by 1 °C). Stage 3: 98 °C for 10 s, Ta10 for 20 s, and 72 °C for Etime, to be repeated 20–25 times. Stage 4: 2 min at 72 °C. Stage 5: “infinite” time at 4 °C.

3.4 Agarose Gel Electrophoresis

1. Take the agarose gel out of the 4 °C fridge, remove the comb from the gel, and put the gel into an electrophoresis tank.
2. Load the PCR solutions, supplied with 10 μL of 6× DNA loading dye (60 μL overall), on the lanes of the gel (see Note 26).
3. Mix 8–10 μL of DNA marker with 2 μL of 6× DNA loading dye in a new 1.5-mL tube, then load it on the gel.
4. Run the gel at 100 V for 15–20 min (see Note 27).
5. Compare the length of the PCR products with the DNA marker. Cut the bands out of the gel if their length is correct. Put each of them into a 2 mL tube.
6. Extract the DNA from the gel with a DNA elution kit.
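The gel concentration recommended in Note 2 and the loading dye rule of Note 26 can be written as two small Python helpers. This is only a sketch: the function names are illustrative, the 50 mL gel volume is a hypothetical example, and the handling of the boundary lengths (exactly 500 or 1000 nt) is an assumption, since Note 2 does not state which range they belong to.

```python
def gel_percent(dna_length_nt):
    """Agarose percentage (w/v) suggested in Note 2 for a DNA length.

    Assumption: 500 nt falls in the 1.5% range and 1000 nt in the 1% range.
    """
    if dna_length_nt < 100:
        return 2.0
    if dna_length_nt <= 500:
        return 1.5
    if dna_length_nt <= 1000:
        return 1.0
    return 0.8


def agarose_grams(percent, buffer_ml):
    """Mass of agarose (g) for a weight/volume percentage in 0.5x TBE."""
    return percent * buffer_ml / 100.0


def loading_dye_ul(sample_ul):
    """Volume of 6x loading dye (uL): sample volume divided by 5 (Note 26)."""
    return sample_ul / 5.0


if __name__ == "__main__":
    pct = gel_percent(750)                    # hypothetical 750-nt PCR product
    print(pct, "% gel;", agarose_grams(pct, 50.0), "g agarose in 50 mL")
    print(loading_dye_ul(50.0), "uL dye for a 50 uL PCR reaction")
```

For a 50 μL PCR reaction the helper returns the 10 μL of dye used in step 2, which brings the dye to 1× in the 60 μL that are loaded.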

3.5 Backbone Digestion

1. Preparation of 30 μL digestion mixture: Mix 5 μg of the backbone, 3 μL of the 10× reaction buffer, and ddH2O in a variable quantity in a 1.5 mL tube. Add 1 μL Acc65I and 0.5 μL SacI (restriction endonucleases) to remove the MCS from the backbone. The final volume of the reaction solution is 30 μL. Hence, the concentration of the cut-open backbone should be equal to 166.7 ng/μL (see Note 28).
2. Preparation of 30 μL control solution: The control solution is prepared as in Subheading 3.5, step 1, except that the two enzymes are omitted.
3. Put the digestion and control tubes in an incubator at 37 °C for 1 h.
4. Backbone digestion results: Mix 5 μL of the digestion solution and 5 μL of the control solution with 1 μL of 6× DNA loading dye each, in two separate tubes. Load both solutions on a gel and run agarose gel electrophoresis as in Subheading 3.4.
5. Inactivation of the enzymes: Put the digestion tube, which now contains a cut-open backbone, in an incubator at 65 °C for 20 min (see Note 29).

3.6 Gibson Assembly

1. Mix the cut-open backbone and the PCR products in equimolar amounts inside a 1.5 mL tube (DNA tube). In another 1.5 mL tube, mix the cut-open backbone (same volume as in the previous tube) with ddH2O to have an overall volume of 5 μL (control tube). Take out of the -20 °C freezer two PCR tubes containing 15 μL of the Gibson assembly master mixture (see Subheading 2.5, item 3). Put them on ice. Add to one tube 5 μL from the “DNA tube” (Gibson reaction solution), and to the other tube the whole content of the “control tube” (Gibson control solution) (see Note 30). Put both PCR tubes on a thermal cycler for 1 h at 50 °C.
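Mixing fragments "in equimolar amounts" requires converting each mass concentration (ng/μL) into a molar one. The Python sketch below shows one way to do it, assuming an average mass of 650 g/mol per base pair of double-stranded DNA, a common approximation that is not stated in the protocol. The fragment lengths, the PCR product concentrations, and the 0.02 pmol target are hypothetical example values; only the 166.7 ng/μL backbone concentration and the 5 μL "DNA tube" volume come from the text.

```python
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one dsDNA base pair


def pmol_per_ul(conc_ng_per_ul, length_bp):
    """Molar concentration (pmol/uL) of a dsDNA fragment."""
    return conc_ng_per_ul * 1000.0 / (length_bp * BP_MASS_G_PER_MOL)


def gibson_volumes(fragments, pmol_each, total_ul=5.0):
    """Volumes (uL) that deliver the same pmol of every fragment.

    fragments: {name: (concentration in ng/uL, length in bp)}
    The remainder up to total_ul is filled with ddH2O (see Note 30).
    """
    volumes = {
        name: pmol_each / pmol_per_ul(conc, length)
        for name, (conc, length) in fragments.items()
    }
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("fragments do not fit: lower pmol_each")
    volumes["ddH2O"] = water
    return volumes


if __name__ == "__main__":
    parts = {
        "backbone": (166.7, 4000),        # length is hypothetical
        "pCYC1min-yEGFP": (40.0, 1000),   # hypothetical PCR product
        "CYC1t": (30.0, 300),             # hypothetical PCR product
    }
    for name, volume in gibson_volumes(parts, pmol_each=0.02).items():
        print(f"{name}: {volume:.2f} uL")
```

The backbone volume returned here is also the volume to use in the control tube, topped up with ddH2O to 5 μL.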

3.7 Escherichia coli DH5α Transformation

1. Mix, separately, 5 μL of the Gibson reaction solution and 5 μL of the Gibson control solution with 50 μL of E. coli DH5α cells. Bacterial transformation requires a heat shock at 42 °C (30 s, at least) [12]. Afterwards, plate the cells on LB plates supplied with ampicillin (0.1 g/L). Allow the cells to grow overnight in an incubator at 37 °C (see Note 31). Count the number of bacterial colonies on the two plates. Divide the number of colonies on the “Gibson reaction solution” plate by that on the control plate. The higher this ratio, the lower the probability of picking a false positive from the transformation plate.

3.8 Mini-preparation

1. Cell culture: Prepare at least four glass tubes containing 3 mL of LB supplied with ampicillin (0.1 g/L). Pick four colonies from the “Gibson reaction solution” plate (see Subheading 3.7, step 1) using each time a sterile white tip. Drop the tips into the glass tubes. Place the glass tubes into a shaker at 37 °C, 240 RPM. Grow the E. coli cells for 8–16 h.

Fig. 4 Mini-preparation. Bacterial cell solutions after adding the neutralization solution

2. Collect the cells: Pour 2 mL of each cell culture into a 2 mL centrifuge tube and centrifuge at MAX speed for 1 min. Remove the supernatant with a pump.
3. Suspend the cells: Add 250 μL of the resuspension solution (supplied with RNase A) to the cells deposited at the bottom of the 2 mL tubes. Resuspend the cells by scraping the tubes on a rack.
4. Lyse the cells: Add 250 μL of the lysis solution to the solutions of resuspended cells. Invert the tubes (1–2 s each time) until the cell mixture becomes clear.
5. Neutralization reaction: Add 350 μL of the neutralization solution to the solutions of lysed cells. Invert the tubes gently (about 7–8 s each time) until the solutions become cloudy (see Fig. 4).
6. Centrifuge the 2-mL tubes at MAX speed for 10 min to get the plasmids in the supernatant.
7. Transfer the supernatant into 1.5 mL centrifuge tubes containing 800 μL isopropanol (analytical level). Invert the tubes at least five times. Centrifuge the tubes at MAX speed for 10 min.
8. Remove the supernatant with a pump, carefully (if possible, decrease the pressure of the pump), to avoid removing the plasmids that have adhered to the tube wall.
9. Add 500 μL of 70% ethanol. Centrifuge at MAX speed for 5 min. This step washes the DNA by removing salt.
10. Remove the supernatant and put the tubes with open lids on a thermal shaker at 37 °C, 900 RPM, to dry the DNA (it takes at least 15 min).
11. Add 50 μL ddH2O to dissolve the DNA in the 1.5 mL tubes. Vortex the tubes briefly at half speed (8–10 s) and measure the concentration of the plasmids with a NanoDrop spectrophotometer (see Note 32).
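The purity criteria of Note 32 for the reading in step 11 can be expressed as a small Python classifier. This is a sketch under stated assumptions: Note 32 only gives the reference ratios (about 1.8 for pure DNA, about 2.0 for pure RNA, lower values for protein or phenol contamination), so the 0.05 tolerance below 1.8 and the 1.9 midpoint used to separate DNA from RNA are illustrative choices, as are the function name and the example readings.

```python
def classify_a260_a280(a260, a280, dna_ratio=1.8, rna_ratio=2.0, tolerance=0.05):
    """Return (ratio, label) for an absorbance reading, following Note 32.

    Assumptions: ratios below dna_ratio - tolerance are flagged as possible
    protein/phenol contamination; the midpoint between dna_ratio and
    rna_ratio separates 'mostly DNA' from 'mostly RNA'.
    """
    ratio = a260 / a280
    midpoint = (dna_ratio + rna_ratio) / 2.0
    if ratio < dna_ratio - tolerance:
        label = "possible protein or phenol contamination"
    elif ratio < midpoint:
        label = "mostly DNA"
    else:
        label = "mostly RNA"
    return ratio, label


if __name__ == "__main__":
    # Hypothetical readings from three mini-preparations.
    for a260, a280 in [(1.80, 1.00), (2.05, 1.00), (1.50, 1.00)]:
        ratio, label = classify_a260_a280(a260, a280)
        print(f"A260/A280 = {ratio:.2f}: {label}")
```

A high ratio after a mini-preparation would suggest that the RNase A treatment in the resuspension solution was incomplete.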


3.9 Checking the Sequences of the Assembled Plasmids

1. Digest the plasmids obtained from the mini-preparation (see Subheading 3.8) as described for the backbone in Subheading 3.5, using 2 μg of DNA and KpnI instead of Acc65I (see Note 33). If the size of the insert is the expected one, the plasmids can be sequenced via the Sanger method [20].

3.10 Yeast Transformation

1. Plasmid DNA linearization: Mix in a 1.5 mL tube 5 μg of an integrative plasmid, a restriction endonuclease that cuts the plasmid along the auxotrophic marker (see Note 34), the enzyme reaction buffer, and ddH2O for an overall volume of 25 μL. Prepare a control solution where the enzyme is absent. Place the digestion and control solutions in an incubator at 37 °C for 1–4 h. Check the plasmid linearization via agarose gel electrophoresis (use 5 μL of the two solutions). The digestion solution, which contains the linearized plasmid, will be used for yeast transformation.
2. Yeast preculture: Grow yeast cells in 5 mL YPD medium at 30 °C, 240 RPM, for 14–16 h.
3. Dilution: Add 500 μL of yeast preculture solution to 20 mL of fresh YPD inside a flask. Grow the cells at 30 °C, 130 RPM, for 4.5–5 h such that the OD600 reaches at least 0.8 (up to 2.0 is fine).
4. Yeast transformation is carried out as described in [21] by using LiAc mix, ssDNA, and PEG mix (see Note 35).
5. Grow the transformed cells on a plate containing a synthetic selective medium for 2–4 days.

3.11 Strain Selection: FACS Measurement

1. Strains transformed with pRSII404-pCYC1min-yEGFP-nxRs-CYC1t or pRSII404-pTEF2-As_Rs-yEGFP-CYC1t only. These strains should produce a moderate fluorescent signal.
2. Cell culture: Pick with a sterile loop at least eight strains from the transformation plate (see Subheading 3.10). Streak them on a new selective plate and let them grow overnight in an incubator at 30 °C (see Note 36). Collect cells, from each strain that grew on the new plate, with a sterile loop. Resuspend them in a 24- or 48-well plate. Each well should contain 2 mL SDC. Grow the cells in a shaker for 18 h at 30 °C, 240 RPM (see Note 37). Dilute 15 μL of the cell culture in 300 μL ddH2O inside a 1.5 or 2 mL centrifuge tube (see Note 38).
3. FACS laser warming up: Switch ON the FACS machine 20 min before starting the measurements to preheat the laser.
4. Open the FACS software on the computer (see Note 39). Set the filter to FITC or GFP, the excitation wavelength to 488 nm, the emission wavelength to 527/32 nm [22], and the number of events (acquisitions) to 10,000 (see Note 40).

5. Wash the sample channel: Use 2 mL ddH2O to clean the channel until the fluorescence intensity is close to 0.
6. Adjust FITC voltage: Measure the two peaks of the fluorescent beads and adjust the voltage such that the relative difference between the current and the last-measured peak values is less than 5%.
7. Wash the sample channel with ddH2O to remove any remaining beads.
8. Measure the samples one by one.
9. Measure the beads again. If the relative difference with respect to the values obtained at the beginning of the experiment (see Subheading 3.11, step 6) does not exceed 5%, the machine was stable during the measurements and the recorded fluorescence values are reliable.
10. Export FACS data as FCS files.
11. Clean the FACS machine: Wash the sample channel with ethanol for 10 min, then with the cleaning solution (see Subheading 2.8, item 2) again for 10 min, and finally with ddH2O for 10 more minutes. Switch OFF the machine.
12. Analyze FCS files with RStudio [22] or FlowJo software. The mean fluorescence values and their standard deviations are extracted from the FCS files (see Fig. 5).
13. Discard the samples whose mean fluorescence intensity is too low. After repeating the experiment at least three times, select one (or more) samples to be stored at -80 °C as a cryostock and used for the next plasmid integration.

3.12 Strains Transformed with Two Plasmids

1. These strains carry two plasmids: one with the yEGFP gene and the other with a dCas12a gene. Their fluorescence should be very low.
2. Repeat steps 2–12 of Subheading 3.11 (see Note 41).
3. Discard the samples whose fluorescence intensity is too high, i.e., very close to that measured in Subheading 3.11, step 12. Select the sample(s) that minimize(s) fluorescence expression and store them as cryostocks.
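Two simple ratios underlie the measurements repeated here: the bead stability check (Subheading 3.11, steps 6 and 9) and the comparison of each strain with the yEGFP-only control of Note 41. The Python sketch below illustrates both. The function names and all numerical values are hypothetical, and in practice the mean fluorescence values would first be extracted from the exported FCS files.

```python
def relative_difference(reference, value):
    """Relative difference of a value with respect to a reference."""
    return abs(value - reference) / reference


def beads_stable(peaks_start, peaks_end, max_rel_diff=0.05):
    """True if every bead peak moved by no more than 5% during the run."""
    return all(
        relative_difference(start, end) <= max_rel_diff
        for start, end in zip(peaks_start, peaks_end)
    )


def fold_decrease(control_mean, sample_mean):
    """Fold reduction in mean fluorescence relative to the control strain."""
    return control_mean / sample_mean


if __name__ == "__main__":
    # Hypothetical bead peaks (arbitrary units) before and after the samples.
    print("stable run:", beads_stable((1200.0, 9800.0), (1230.0, 9500.0)))
    # Hypothetical means: yEGFP-only control vs. a strain expressing dCas12a.
    print("fold decrease:", fold_decrease(2000.0, 250.0))
```

Strains with the largest fold decrease are the ones to keep as cryostocks in step 3.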

3.13 Strains Transformed with the Complete IMPLY Boolean Gates Responding to Copper and Methionine

1. Only when copper is present and methionine is absent is a functional dLbCas12a (fused to two NESs) reconstituted, so that the fluorescence intensity drops to very low values.
2. Cell culture: select at least eight strains, after yeast transformation, as in Subheading 3.11, step 2. Each strain shall be grown in four different solutions corresponding to the entries of the truth table of the IMPLY gate: SDC (0,0), SDC supplemented with 10 mM methionine (0,1), SDC supplemented with 500 μM copper (1,0), SDC supplemented with 10 mM methionine and 500 μM copper (1,1). Each solution occupies a 2 mL volume. Grow cells in a shaker for 18 h at 30 °C, 240 RPM. Dilute 15 μL of each cell culture in 300 μL ddH2O as mentioned in Subheading 3.11, step 2.
3. Repeat steps 3–12 of Subheading 3.11.
4. The best working IMPLY gate corresponds to the strain that maximizes the ON/OFF ratio (ρ), i.e., the ratio between the minimal 1 output value and the only 0 output value (see Note 42) [23] (see Fig. 5). Store the strain with the highest ρ value as a cryostock.

Fig. 5 Results of IMPLY gate. Two IMPLY gates were built by using different Rs elements along the 5′ UTR, namely, 1xFn_Rs (cognate to FnCas12a) and 1xLb_Rs (cognate to LbCas12a). Both gates performed well since they showed a significant reduction of yEGFP expression only for the (1,0) truth table entry. The split dLbCas12a achieved the best performance on 1xLb_Rs with a 10.94-fold decrease in fluorescence
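The truth table of the IMPLY gate and the ON/OFF ratio ρ used to rank the strains can be sketched in a few lines of Python. Inputs are ordered as (copper, methionine), as in step 2. The fluorescence values are hypothetical, and the function names are illustrative rather than taken from the authors' analysis scripts.

```python
from itertools import product


def imply(copper, methionine):
    """Expected logic output: 0 only when copper is present and methionine absent."""
    return int((not copper) or bool(methionine))


def on_off_ratio(fluorescence):
    """rho = (lowest fluorescence among 1 outputs) / (highest among 0 outputs).

    fluorescence: {(copper, methionine): mean fluorescence intensity}
    """
    ones = [f for inputs, f in fluorescence.items() if imply(*inputs) == 1]
    zeros = [f for inputs, f in fluorescence.items() if imply(*inputs) == 0]
    return min(ones) / max(zeros)


if __name__ == "__main__":
    for inputs in product((0, 1), repeat=2):
        print(inputs, "->", imply(*inputs))

    # Hypothetical mean fluorescence of one strain in the four media.
    measured = {(0, 0): 1000.0, (0, 1): 950.0, (1, 0): 100.0, (1, 1): 900.0}
    print("rho =", on_off_ratio(measured))
```

Computing ρ for each of the eight (or more) candidate strains and sorting by the result identifies the strain to store as a cryostock.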

4 Notes

1. Both amaranth dye and nucleic acid dye are sensitive to light. For this reason, they shall be stored in black tubes. DNA loading buffer (6×), supplied with a nucleic acid dye, can be kept at room temperature.
2. The concentration of the agarose gel (usually between 0.8% and 2%) depends on the length of the DNA (the longer the DNA sequence, the lower the gel concentration). We recommend: DNA length < 100 nt, 2% gel; 100 up to 500 nt, 1.5%; 500 up to 1000 nt, 1%; over 1000 nt, 0.8%.
3. Agarose gel preparation can also be carried out at room temperature, without placing the gel preparation cast inside a 4 °C fridge. On the bench, agarose needs more time to solidify (at least 25 min).
4. Use a bench mini-centrifuge for this task.
5. ssDNA should be boiled for 5 min at 95 °C every fifth time it is used.
6. Shake the beads stock solution tube (stored at 4 °C) before adding the beads to water. Beads are fluorescent microspheres sensitive to light. Hence, they shall be kept in a black tube when diluted in water. The beads solution can be stored at 4 °C for long-term use.
7. High temperature will inactivate ampicillin. In this case, the false-positive rate on plates will increase.
8. AA mix excludes histidine, leucine, tryptophan, and uracil since they are used for preparing selective plates (“SD minus”) for yeast auxotrophic selection.
9. pCYC1min is derived from the core sequence of the yeast constitutive CYC1 promoter. It contains two TATA boxes (instead of three) and expresses about 18% of the green fluorescence reached by the GPD promoter, the strongest in S. cerevisiae.
10. As an alternative, we used the laccase 1 protein from Trametes trogii that returns a signal visible to the naked eye, i.e., a green-purple color on SDC supplied with 0.5 mM ABTS [24].
11. The folding of this short pre-crRNA can be simulated with RNAFold. The temperature shall be set to 30 °C, i.e., the one at which yeast cells are cultured in the lab [25].
12. Possible OFF-target effects can be determined computationally, for instance via the Cas-OFFinder algorithm [26].
13. pTEF2 strength is about 44% of that of pGPD.
14. The NLS or the NES is omitted when the bare dCas12a is expressed.
15. pCUP1 and pMET25 show a rather strong leakage effect in the absence of copper and in the presence of methionine, respectively. This can spoil the Boolean behavior of the circuits in which they are used.
16. We used two different repeats: one cognate to LbCas12a and the other to AsCas12a. The former provided a circuit ON/OFF ratio about four times higher than that achieved with the latter.
17. A different terminator is used here to avoid having too many copies of the same part (CYC1t) in the circuit.
18. Primers are designed according to the Gibson assembly specifications, i.e., they guarantee an overlap of about 40 nt between two contiguous DNA sequences and they cannot fold into hairpins whose melting temperature is higher than 50 °C. Touchdown PCR makes it possible to amplify a DNA sequence with a forward primer and a reverse primer characterized by very different annealing temperatures (up to 9 °C, usually).
19. Further primers are designed in an analogous way to assemble the TUs that encode both the complete and the split dLbCas12a.
20. The concentration of the DNA template for PCR should be 20–40 ng/μL. A lower DNA concentration might prevent PCR success. A higher DNA concentration, in contrast, might result in unspecific binding of the primers to the template.
21. The PCR solution shall be mixed, here, to remove bubbles at the bottom of the tube.
22. Phusion Hot Start DNA polymerase can be used as well.
23. Q5 Hot Start DNA polymerase (like any other enzyme) should be kept on ice after taking it out of the -20 °C freezer. Otherwise, its activity could be lost. For this reason, it is preferable to add Q5 Hot Start DNA polymerase to the PCR mixture at the very last step.
24. The temperatures Ta1 and Ta10 are set according to the melting temperature of the forward and reverse primers. It holds that Ta10 = Ta1 - 10 + 1 °C, i.e., Ta10 is 9 °C lower than Ta1.
25. The elongation time (Etime) depends on the length of the DNA and the speed of the DNA polymerase. For example, Q5 Hot Start DNA polymerase travels over 1 kb in 20–30 s. If the DNA length is 500 bp, then Etime can be set equal to 15 s.
26. The quantity of 6× DNA loading dye is equal to the sample volume divided by 5.
27. The time for running the gel depends on the concentration of the gel and the length of the DNA sequences.
28. The quantity of an enzyme is calculated according to its concentration and buffer. For example, Acc65I (NEB-R0599S) is sold at a concentration of 10 U/μL and its cleaving efficiency is 25% in CutSmart buffer and 100% in Buffer 3.1. SacI (NEB-R3156S) is sold at 20 U/μL with a cleaving efficiency of 100% in CutSmart buffer, whereas it shows star activity in Buffer 3.1. Hence, CutSmart shall be selected for the digestion solution containing both enzymes. However, the working efficiency of Acc65I is only a quarter of that of SacI. Therefore, the amount of Acc65I to cut 5 μg of the backbone shall be increased from 0.5 μL to at least 1 μL and the digestion time shall be 2 h instead of one. The backbones for the assembly of the plasmids expressing the complete and the split dCas12a are acceptor vectors such as those described in Subheading 3.1, steps 4 and 5. They are cut open with BamHI and XhoI.
29. The heat-inactivation temperature changes from enzyme to enzyme. About 65 °C is fine for both SacI and Acc65I.
30. If necessary, add ddH2O to the “DNA tube” to reach a volume of 5 μL.
31. E. coli DH5α transformation might demand more than 5 μL of the Gibson reaction solution. If the DNA parts (including the backbone) are more than 4 and/or their concentrations are less than 10 ng/μL, a higher volume of the Gibson reaction solution, such as 10–20 μL, can be utilized during the bacterial transformation to improve process efficiency.
32. DNA concentration: If the ratio OD260/OD280 is ~1.8, the solution contains pure DNA; if OD260/OD280 is close to 2.0, then the solution contains pure RNA. When OD260/OD280 is less than 1.8, the sample might contain proteins or phenol.
33. This digestion is to check if a plasmid was assembled properly, i.e., if it contains an insert instead of the MCS. KpnI (NEB-R3142S) and Acc65I are isoschizomers. KpnI has, like SacI, 100% efficiency in CutSmart buffer. However, it cannot be heat inactivated. Therefore, its use is not recommended in the backbone digestion before the Gibson assembly.
34. More precisely, the enzyme shall cut the marker sequence homologous to that in the CEN.PK2-1C strain genome.
35. To transform yeast cells, a 15-min-long thermal shock is required.
36. A selective medium is used to eliminate possible false positives.
37. After 18 h the cells have reached a steady state in fluorescence emission.
38. It is important to grow and measure the fluorescence of the original CEN.PK2-1C to estimate the background fluorescence value. A positive control, i.e., a strain whose (reasonably high) fluorescence intensity was previously measured, should be added to the experiment as well.
39. It is convenient to prepare a file that contains a template for the detection of green fluorescence. This file is loaded at the beginning of every experiment.
40. The FACS machine we used is a BD FACSVerse.
41. A strain that was transformed only with the plasmid carrying the yEGFP gene and the pre-crRNA elements shall be measured as well, as a negative control.
42. More generally, the ON/OFF ratio is calculated as $\rho = \frac{\min(1)}{\max(0)}$, where min(1) is the lowest fluorescence among the truth table entries whose output is 1 and max(0) is the highest fluorescence among the entries whose output is 0.
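Notes 24 and 25 determine the two free parameters of the touchdown program in Subheading 3.3, step 4: the annealing temperatures Ta1 to Ta10 and the elongation time Etime. The Python sketch below builds the five stages from Ta1 and the amplicon length. It assumes an elongation speed of 30 s per kb (the slower end of the range in Note 25), and the function name, the output layout, and the example Ta1 of 68 °C are illustrative assumptions.

```python
import math


def touchdown_program(ta1_c, length_bp, sec_per_kb=30.0, stage3_cycles=25):
    """Stages of the touchdown PCR as lists of (temperature in C, seconds).

    Stage 2 holds ten cycles whose annealing temperature drops by 1 C per
    cycle, so that Ta10 = Ta1 - 10 + 1 (Note 24).
    """
    etime_s = math.ceil(length_bp / 1000.0 * sec_per_kb)   # Note 25
    annealing = [ta1_c - i for i in range(10)]             # Ta1 ... Ta10
    ta10_c = annealing[-1]
    return {
        "etime_s": etime_s,
        "stage1": [(98, 30)],
        "stage2": [[(98, 10), (ta, 20), (72, etime_s)] for ta in annealing],
        "stage3": {
            "steps": [(98, 10), (ta10_c, 20), (72, etime_s)],
            "repeats": stage3_cycles,
        },
        "stage4": [(72, 120)],
        "stage5": [(4, None)],   # hold at 4 C until the tubes are collected
    }


if __name__ == "__main__":
    program = touchdown_program(ta1_c=68, length_bp=500)   # hypothetical Ta1
    print("Etime:", program["etime_s"], "s")
    print("annealing:", [cycle[1][0] for cycle in program["stage2"]])
    print("stage 3 annealing:", program["stage3"]["steps"][1][0], "C")
```

For a 500 bp amplicon the sketch returns the 15 s elongation time quoted in Note 25.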


Acknowledgments We are grateful to the students of the Synthetic Biology lab for their help. We want to thank Xiangyang Zhang and Zhi Li for their assistance in the FACS experiments.

Part II Genome Editing and Modification

Chapter 7

CRISPRi-Driven Genetic Screening for Designing Novel Microbial Phenotypes

Minjeong Kang, Kangsan Kim, and Byung-Kwan Cho

Abstract

The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/Cas9 system has enabled rapid advances in genomic engineering and transcriptional regulation. Specifically, the CRISPR interference (CRISPRi) system has been used to systematically investigate the gene functions of microbial strains in a high-throughput manner. This method involves growth profiling using cells that have been transformed with deactivated Cas9 (dCas9) and single-guide RNA (sgRNA) libraries that target individual genes. The fitness score of each gene is calculated by measuring the abundance of individual sgRNAs during cell growth and represents gene essentiality. This chapter describes a process for functional genetic screening using CRISPRi at the whole-genome scale, from the synthesis of sgRNA libraries and the construction of CRISPRi libraries to the identification of essential genes through growth profiling. The commensal bacterium Bacteroides thetaiotaomicron was used to implement the protocol. This method is expected to be applicable to a broader range of microorganisms to explore their novel phenotypic characteristics.

Key words CRISPRi, Single-guide RNA, Gene fitness score, Gene essentiality, Bacteroides thetaiotaomicron

1 Introduction

Understanding how genes interact to express a particular phenotype is crucial for genetic studies. Many efforts have been undertaken to understand the complex dynamics of gene expression associated with specific phenotypes. Typically, functional analysis is conducted via gene knockout or reverse engineering of genetic variants. However, this process is time consuming and often insufficient to elucidate the genotype–phenotype relationships of genes at a genome-wide level. Recent advances in sequencing technology have enabled the systematic and high-throughput exploration of gene function.

Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_7, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024

Transposon-insertion sequencing (Tn-Seq) is a representative example of high-throughput genetic screening [1]. It starts from a saturated transposon insertion library that contains a large pool of mutants, and high-throughput sequencing of this library allows enriched and depleted mutants to be identified after selection [2]. Genomic regions enriched with transposon insertions are deemed non-essential, while genomic sites devoid of insertions are considered essential for survival under the specific culture conditions. In principle, Tn-Seq provides a binary readout of the essentiality of genic and intergenic regions at the genome-wide level.

With the advent of the clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 system and oligomer synthesis technology, a more efficient high-throughput genetic screening method has been developed [3, 4]. Cas9 is an RNA-guided nuclease derived from the bacterial adaptive immune system. Mutating Cas9 to abolish its nuclease activity yields deactivated Cas9 (dCas9), the basis of the CRISPR interference (CRISPRi) system, which represses transcription by blocking RNA polymerase elongation. The CRISPRi system can be applied as a genome-wide genetic screening method. Briefly, CRISPRi-mediated gene repression leads to fitness changes that depend on the essentiality of the target genes, so the variation in fitness can, in turn, be translated into gene essentiality. Using an sgRNA library that covers the entire genome, one growth profiling experiment can provide an entire genotype–phenotype map. CRISPRi screening has the advantage of being able to detect both essential and non-essential genes because it is a transcriptional repression method rather than a gene knockout one. Through CRISPRi screening, the essentiality of all genes and intergenic regions can be quantified.

In addition, it is possible to specify the target genomic sites (tiling library) and adjust the library size accordingly, making this method more flexible than Tn-Seq [5]. The method also has disadvantages, such as off-target binding and polar effects. To address off-target binding, several mathematical matrices have been developed to calculate the probability of off-target occurrence during sgRNA design and to eliminate off-target candidates [6]. Polar effects are addressed by the manual correction of the fitness of genes that exhibit defects [7].

High-throughput genetic screening using CRISPRi has been applied to a broad range of prokaryotic cells, ranging from the model strain Escherichia coli to pathogenic Staphylococcus species [8, 9]. The CRISPRi system has generally been applied to microbes to identify functional connections between genotypes and phenotypes, including, but not limited to, genetic determinants of cell growth [7], essentiality, susceptibility to chemicals [8, 10], and phage infection [11]. CRISPRi has been increasingly exploited for diverse engineering purposes, including the identification of novel antibiotic targets for pathogenic species and the assessment of engineering targets for maximizing target compound production [12].

This chapter describes the application of high-throughput genetic screening to Bacteroides thetaiotaomicron (B. theta), one of the most prominent gut commensal bacteria. The protocol includes an sgRNA design method for the synthesis of the pooled sgRNA library and an analysis method developed for identifying the essential genes. It is expected that these procedures can be applied to a wide range of microorganisms other than model organisms to explore novel microbial phenotypes.

2 Materials

2.1 Synthesis of Oligomer Library and Assembly of Plasmid Library

1. pMM553 (Addgene, plasmid #68543).
2. Oligomer library.
3. One Shot™ PIR1 Chemically Competent E. coli (Thermo Fisher Scientific).
4. 5× Phusion HF Buffer, 10 mM dNTPs, Phusion DNA Polymerase, and nuclease-free water.
5. AccuPower PCR Master Mix (2× Master Mix solution).
6. DpnI, XbaI, and SalI.
7. FastAP Thermosensitive Alkaline Phosphatase (Thermo Fisher).
8. 10× T4 DNA Ligase Buffer, T4 DNA Ligase, and nuclease-free water.
9. LB (Luria-Bertani) agar containing ampicillin (100 μg/mL).
10. 1× TAE Buffer.
11. 1% agarose gel in TAE Buffer.
12. 1 kb and 100 bp DNA ladder.
13. Gel caster, gel tray, comb, gel electrophoresis apparatus.
14. PCR tubes.
15. PCR purification kit.
16. Cell scrapers.
17. PCR machine.

2.2 Transformation of sgRNA Plasmid Library to Helper Strain for Conjugation

1. S17-1λpir (ATCC BAA-2428).
2. HindIII.
3. Liquid LB medium.
4. Liquid LB medium containing ampicillin (100 μg/mL).
5. LB agar containing ampicillin (100 μg/mL).
6. SOC medium.
7. Ice-cold 10% (v/v) glycerol.
8. 50% (v/v) glycerol.
9. 1× TAE Buffer.
10. 1% agarose gel in TAE Buffer.
11. 1 kb DNA ladder.
12. Gel caster, gel tray, comb, gel electrophoresis apparatus.
13. 15 mL and 50 mL falcon cell culture tubes.
14. 1.5 mL Eppendorf tubes.
15. 90 × 15 mm and 150 × 20 mm petri dishes.
16. Spectrophotometer, disposable cuvette.
17. Refrigerated centrifuge.
18. Shaking incubator.
19. 0.1 cm gap Gene Pulser/MicroPulser Electroporation Cuvettes (Bio-Rad).
20. MicroPulser Electroporator (Bio-Rad).
21. Plasmid purification kit.
22. Cell scrapers.
23. Nanodrop spectrophotometer.
24. -80 °C freezer.

2.3 Conjugation of Bacterial Hosts

1. dCas9 genome-integrated B. theta VPI-5482 (B. theta::dCas9, dCas9 from Streptococcus pyogenes).
2. Liquid LB medium containing ampicillin (100 μg/mL).
3. Columbia agar supplemented with 5% defibrinated sheep blood (blood agar).
4. Brain heart infusion medium supplemented with 10 mg/L hemin, 0.5 g/L cysteine, and 1 mg/L vitamin K3 (BHIS).
5. Liquid BHIS medium containing erythromycin (25 μg/mL).
6. BHIS agar containing erythromycin (25 μg/mL) and gentamicin (200 μg/mL).
7. 50% (v/v) glycerol.
8. Anaerobic chamber with an atmosphere of 20% CO2, 4% H2, and balanced N2.
9. Gassing station for gas exchange of 20% CO2, 4% H2, and balanced N2.
10. Anaerobic serum bottles.
11. 90 × 15 mm petri dishes.
12. 245 × 245 mm square dishes.
13. Spectrophotometer, disposable cuvette.
14. Cell scrapers.
15. Refrigerated centrifuge.
16. Shaking incubator.
17. Vortex mixer.
18. -80 °C freezer.

2.4 Quality Control (QC) of CRISPRi Library

1. 5× Q5 Reaction Buffer, 10 mM dNTPs, Q5 High-Fidelity DNA polymerase, 10× SYBR green solution, and nuclease-free water.
2. HotStarTaq Master Mix, 10× SYBR green solution, and nuclease-free water.
3. KAPA Library Quantification Kit (Roche Molecular Systems).
4. 1× TAE Buffer.
5. 1% agarose gel in TAE Buffer.
6. 1 kb and 100 bp DNA ladder.
7. Gel caster, gel tray, comb, and gel electrophoresis apparatus.
8. Qubit 1× dsDNA Broad Range (BR) assay kit and Qubit 1× dsDNA High Sensitivity (HS) assay kit.
9. Qubit 4 Fluorometer.
10. QuantStudio 1 Real-Time PCR system (Thermo Fisher Scientific).
11. D1000 ScreenTape, D1000 Reagents, and 2200 TapeStation system.
12. MiSeq system, MiSeq Reagent Kit v3 (150-cycle), and PhiX Control v3.
13. CLC Genomics Workbench 6.
14. Genomic DNA extraction kit (DNeasy PowerSoil Pro kit, Qiagen).
15. PCR purification kit.

2.5 Growth Profiling of CRISPRi Library to Provide Selected Library

1. BHIS liquid medium containing erythromycin (25 μg/mL).
2. BHIS liquid medium containing 100 μM isopropyl β-D-1-thiogalactopyranoside (IPTG) and erythromycin (25 μg/mL).
3. Anaerobic serum bottles.
4. Spectrophotometer and disposable cuvette.
5. Anaerobic chamber with an atmosphere of 20% CO2, 4% H2, and balanced N2.
6. Gassing station for gas exchange of 20% CO2, 4% H2, and balanced N2.
7. Shaking incubator.

3 Methods

3.1 Design Considerations for sgRNA Oligomer Library

A sufficient number of sgRNAs at the genome scale is necessary to obtain a high-quality sgRNA library. Generally, 10 sgRNAs per gene provide statistical robustness for data analysis [4]. To this end, one sgRNA was designed near the 5′ end of every 100 bp segment of each open reading frame (ORF) sequence. This corresponded to a library size of 11.5 sgRNAs per gene (see Note 1). In cases where no sgRNA target site was available within a 100 bp segment, the sgRNA-binding site closest to the preceding segment was chosen. To effectively inhibit transcription elongation, sgRNAs were designed to hybridize to the coding strand of target ORFs, as reported previously [13]. Guides that did not meet the off-target score criteria described in a previous study were excluded [14]. Finally, 55,378 sgRNAs were designed, each binding within one 100 bp segment of an ORF sequence (Fig. 1).
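The tiling rule described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the design pipeline used for the library: the function name `tile_guides` is hypothetical, an NGG PAM is assumed because the chapter uses dCas9 from Streptococcus pyogenes, the fallback for segments without a target site is only one possible reading of the text, and the off-target filter [14] is omitted.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_guides(orf, segment=100, spacer_len=20):
    """Pick one spacer per `segment` bp of an ORF given as its coding strand.

    Assumption: a guide that hybridizes to the coding strand has its NGG PAM
    on the template strand, which reads as 5'-CCN-N20-3' on the coding strand;
    the spacer is the reverse complement of that N20.
    """
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=CC[ACGT]([ACGT]{%d}))" % spacer_len)
    sites = [(m.start(), revcomp(m.group(1))) for m in pattern.finditer(orf)]

    chosen = []
    used = set()
    for seg_start in range(0, len(orf), segment):
        free = [s for s in sites if s[0] not in used]
        if not free:
            break
        in_segment = [s for s in free if seg_start <= s[0] < seg_start + segment]
        # Prefer the site nearest the 5' end of the segment; otherwise fall
        # back to the unused site closest to the segment start (assumption).
        pick = in_segment[0] if in_segment else min(
            free, key=lambda s: abs(s[0] - seg_start))
        used.add(pick[0])
        chosen.append(pick)
    return chosen
```

Each returned tuple holds the position of the CCN motif on the coding strand and the corresponding 20 nt spacer, so the number of tuples per ORF gives the number of sgRNAs per gene.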

3.2 Synthesis of Oligomer Library and Assembly of Plasmid Library

The sgRNA library can be delivered as either an oligomer library or a plasmid library (cloned oligomer library on a plasmid). A plasmid library can be delivered both as lyophilized plasmid DNA and as a glycerol cell stock. Oligomers were assembled as follows.

Fig. 1 (a) Distribution of sgRNAs in the B. theta CRISPRi library. All the mapped reads are assigned to the B. theta genome. The sgRNAs are uniformly distributed throughout the genome. (b) Histogram of the number of sgRNAs per mapped read count. Most of the mapped read counts are between 50 and 100

Table 1 Primer sequences used in this study

Uses | Name | Primer sequence (5′–3′) | PCR product size
Backbone amplification | pMM553_RE_F | GACTGTCGACCAGCCTTACTTGTGCCTG | 4836 bp
Backbone amplification | pMM553_RE_R | CTAGTCTAGACAGCAGCTTTCATTGCT | 4836 bp
Oligomer amplification | sgRNA_RE_F | GATCTTCTAGACAATTGGGCTACCT | 218 bp
Oligomer amplification | sgRNA_RE_R | GGCTGGTCGACGCACCGA | 218 bp
Cloning verification | Con_F | GTAACGCACTGAGAAGCCCT | 603 bp
Cloning verification | Con_R | GGTCATGAAATGGACGGGAC | 603 bp
1′ nested PCR | BT_F | TTCGTACCTTTGCACCGCTT | 3553 bp
1′ nested PCR | sgRNA_R | AACAGGCACAAGTAAGGCTG | 3553 bp
2′ nested PCR | UDi5_F1 | AATGATACGGCGACCACCGAGATCTACACAGCGCTAGACACTCTTTCCCTACACGACGCTCTTCCGATCTTTGTTTGCAATGGTTAATCTATTGTT | 301 bp
2′ nested PCR | UDi7_R1 | CAAGCAGAAGACGGCATACGAGATAACCGCGGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCACTCGGTGCCACTTTTTCAAGTTG | 301 bp
2′ nested PCR | UDi5_F2 | AATGATACGGCGACCACCGAGATCTACACGATATCGAACACTCTTTCCCTACACGACGCTCTTCCGATCTTTGTTTGCAATGGTTAATCTATTGTT | 301 bp
2′ nested PCR | UDi7_R2 | CAAGCAGAAGACGGCATACGAGATGGTTATAAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCACTCGGTGCCACTTTTTCAAGTTG | 301 bp

3.2.1 Assembly of Plasmid Libraries

1. The backbone vector was amplified by PCR with a master mix containing 10 μL of 5× Phusion HF Buffer, 1 μL of 10 mM dNTPs, 1 μL each of pMM553_RE_F and pMM553_RE_R (10 μM stock, Table 1), 0.5 μL of Phusion DNA Polymerase, 10 ng of backbone plasmid, and nuclease-free water made up to 50 μL. The following PCR parameters were used: 98 °C for 30 s; 30 cycles of 98 °C for 15 s, 66.4 °C for 30 s, and 72 °C for 2 min 30 s; 72 °C for 1 min; and then storage at 4 °C.
2. The size of the PCR product was verified by gel electrophoresis using an aliquot of the PCR mixture. The expected PCR product size was 4836 bp.
3. 1 μL of DpnI was added to the PCR products, which were then incubated for 1 h at 37 °C.
4. The insert fragments (oligomer library) were amplified with a PCR master mix containing 10 μL of 5× Phusion HF Buffer, 1 μL of 10 mM dNTPs, 1 μL each of sgRNA_RE_F and sgRNA_RE_R (10 μM stock, Table 1), 0.5 μL of Phusion DNA Polymerase, 10 ng of oligomer library, and nuclease-free water made up to 50 μL. The following PCR parameters were used: 98 °C for 30 s; 30 cycles of 98 °C for 15 s, 63.8 °C for 30 s, and 72 °C for 30 s; 72 °C for 1 min; and then storage at 4 °C.
5. The size of the PCR product was verified by gel electrophoresis using an aliquot of the PCR mixture. The expected size of the PCR product was 218 bp.
6. All PCR products were purified using a commercial PCR purification kit.
7. The FastDigest restriction enzymes XbaI and SalI were used to digest both the backbone and oligomer PCR products according to the Thermo Fisher Scientific protocol.
8. The digested backbone fragment was dephosphorylated using FastAP Thermosensitive Alkaline Phosphatase following the Thermo Fisher Scientific protocol.
9. The digested backbone and oligomer were ligated in a mixture of 100 ng of digested backbone fragment, 20.95 ng of digested oligomer (1:5 ratio), 2 μL of 10× T4 DNA Ligase Buffer, 1 μL of T4 DNA Ligase, and nuclease-free water made up to 20 μL, incubated for 16 h at 16 °C.
10. The ligation mixture was transformed into One Shot™ PIR1 Chemically Competent E. coli, following the Thermo Fisher Scientific protocol. The recovered solution was spread onto LB agar plates containing ampicillin (100 μg/mL).
11. Colony PCR was performed to determine whether cloning was successful. The PCR master mix contained 10 μL of 2× PCR Master Mix solution, 1 μL each of Con_F and Con_R (10 μM stock, Table 1), and nuclease-free water made up to 20 μL. The PCR parameters used were 95 °C for 5 min; 30 cycles of 95 °C for 30 s, 59 °C for 30 s, and 72 °C for 40 s; 72 °C for 5 min; and then storage at 4 °C.
12. The size of the PCR product was verified by gel electrophoresis using an aliquot of the PCR mixture. The PCR product size for a successfully cloned plasmid was 603 bp.
13. All the colonies were scraped and the plasmid was purified using a commercial plasmid extraction kit.
3.3 Transformation of sgRNA Plasmid Library to Helper Strain for Conjugation

3.3.1 Preparation of Electrocompetent Cells

1. 50 μL of S17-1 from cell stock was inoculated into 5 mL of LB medium and grown overnight at 37 °C with 200 rpm agitation.
2. The culture was then transferred to 510 mL of fresh LB medium at an OD600 of 0.05 and grown at 37 °C with 200 rpm agitation.
3. Ice-cold 10% (v/v) glycerol, 50 mL falcon tubes, 1 mm electroporation cuvettes, and 1.5 mL Eppendorf tubes were prepared.
4. When the culture reached the mid-log phase (OD600 of 0.4–0.6), ten aliquots of the cell culture were placed in ice-cold 50 mL falcon tubes and centrifuged at 4000 rpm (3134 × g) at 4 °C for 10 min to harvest the cells.
5. The supernatant was carefully removed and 50 mL of ice-cold 10% (v/v) glycerol was added. Pellets were resuspended by brief vortexing.
6. The resuspension was centrifuged at 4000 rpm (3134 × g) at 4 °C for 10 min. This wash (centrifugation followed by resuspension in ice-cold 10% glycerol, as in steps 4 and 5) was repeated twice.
7. The supernatant was discarded, and the cell pellet was resuspended in 1 mL of ice-cold 10% (v/v) glycerol by pipetting.
8. 100 μL aliquots of resuspended competent cells were placed into ice-cold 1.5 mL Eppendorf tubes and immediately frozen in liquid nitrogen.
9. The competent cells were kept at -80 °C until the transformation was performed (see Note 2).

3.3.2 Electroporation

1. Electrocompetent cells were thawed on ice.
2. 500 ng of the plasmid library was added to a 100 μL aliquot of electrocompetent cells and mixed gently by pipetting.
3. The mixture was placed into 1 mm electroporation cuvettes and pulsed using the Bio-Rad MicroPulser Electroporator according to the recommended protocol for E. coli transformation (1.8 kV with 0.1 cm cuvettes).
4. 900 μL of SOC was immediately added and the cells were recovered for 1 h at 37 °C with 200 rpm agitation.
5. Serially diluted recovered cells were spread on pre-warmed LB agar plates containing ampicillin (100 μg/mL) and the plates were incubated at 37 °C overnight.
6. The transformation efficiency was estimated by counting colonies, and from it the dilution factor for plating electroporated cells and the number of selection plates needed to generate the desired number of colonies were determined. For example, to generate more than 3,000,000 colonies per transformation with a transformation efficiency of 15,000,000 colonies, the transformed cells were diluted 1500-fold to obtain 10,000 colonies per plate. Consequently, 400 selection plates were used and more than 4,000,000 colonies were obtained.
7. Plasmids were extracted from several colonies and digested with a restriction enzyme to confirm that the desired plasmid had been transformed.
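The plating arithmetic in step 6 can be checked with a short Python sketch. It is a minimal illustration, not part of the protocol: the function name `plating_plan` is hypothetical, the transformation efficiency is read as colony-forming units per mL of recovered culture, and 1 mL is assumed to be spread on each plate, as stated in Subheading 3.3.3.

```python
import math


def plating_plan(cfu_per_ml, colonies_per_plate, target_colonies, plated_ml=1.0):
    """Return (dilution factor, minimum number of plates).

    cfu_per_ml: estimated transformants per mL of recovered culture (assumed unit)
    colonies_per_plate: desired colony density on one selection plate
    target_colonies: total number of colonies wanted for the library
    plated_ml: volume spread on each plate (assumed to be 1 mL)
    """
    dilution = cfu_per_ml * plated_ml / colonies_per_plate
    min_plates = math.ceil(target_colonies / colonies_per_plate)
    return dilution, min_plates


# Values from step 6: 15,000,000 transformants, 10,000 colonies per plate,
# and a target of more than 3,000,000 colonies.
dilution, min_plates = plating_plan(15_000_000, 10_000, 3_000_000)
colonies_from_400_plates = 400 * 10_000
```

With these inputs the sketch gives a 1500-fold dilution and a minimum of 300 plates, so the 400 plates used in the text leave a margin above the 3,000,000-colony target and correspond to 4,000,000 colonies.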

3.3.3 Transformation of sgRNA Plasmid Library to Helper Strain

1. 50 μL of the cell stock harboring the sgRNA plasmid library (1/100 volume) was inoculated into 5 mL of LB medium containing ampicillin (100 μg/mL) and grown to the mid-log phase at 37 °C with 200 rpm agitation.
2. All the grown cells were transferred to a 15 mL falcon tube and centrifuged for 10 min at 3134 × g, and the supernatant was carefully removed.
3. The plasmids were purified with a commercial kit.
4. The plasmids were digested with the restriction enzyme HindIII and the fragment sizes were analyzed by gel electrophoresis to confirm that the plasmids contained the sgRNA insert. This yielded three DNA fragments of 2679, 1481, and 847 bp.
5. The DNA concentration was measured using a Nanodrop spectrophotometer and was expected to be 250–500 ng/μL after elution in 30–50 μL.
6. The sgRNA plasmids were electroporated into S17-1 competent cells following the steps in Subheading 3.3.2.
7. After recovery, the transformed cells were diluted 1500-fold, and 1 mL of the solution was spread on each selection plate (LB with ampicillin [100 μg/mL], 150 × 20 mm plate). In total, over 4,000,000 colonies were obtained from 400 plates (this process can be completed over a period of several days).
8. Monitoring plates were used to estimate the number of colonies on the 150 × 20 mm plates. The electroporated cells were further diluted at two or more dilution factors and spread on LB agar selection plates containing ampicillin (100 μg/mL) in 90 × 15 mm petri dishes. The dilution factor and the number of colonies on the monitoring plates were used to determine the number of colonies. Cell scrapers were used to scrape all colonies from the 150 × 20 mm petri dishes using 1 mL of LB medium with 100 μg/mL ampicillin. The colonies were then resuspended in LB medium with ampicillin. The cell suspension was mixed thoroughly and divided equally into 1.5 mL Eppendorf tubes. Subsequently, an equal volume of 50% (v/v) glycerol was added to make a 25% (v/v) glycerol cell stock, which was then stored at -80 °C. All steps after cell scraping were performed with the cells kept on ice.

3.4 Conjugation of Bacterial Hosts

The conjugation efficiency and the required library size determine the number of conjugation rounds. In this experiment, approximately 100,000 colonies were obtained per conjugation, and 35 rounds of conjugation were conducted with the goal of obtaining approximately 3,000,000 colonies. Seven conjugations were performed per day over 5 days. In total, 3,500,000 colonies were obtained, creating a library with an average sgRNA coverage of 63×.
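As a check of this arithmetic, the coverage follows from the number of colonies and the library size of 55,378 sgRNAs given in Subheading 3.1, assuming that every round yields about 100,000 colonies:

$$\text{coverage} = \frac{35 \times 100{,}000\ \text{colonies}}{55{,}378\ \text{sgRNAs}} = \frac{3{,}500{,}000}{55{,}378} \approx 63$$

Under the same assumption, the 3,000,000-colony goal alone would require 3,000,000 / 100,000 = 30 rounds, so the 35 rounds performed leave a margin of five rounds.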

1. dCas9 genome-integrated B. theta (B. theta::dCas9) from blood agar plates was inoculated into 5 mL of fresh BHIS medium and grown overnight at 37 °C in an anaerobic chamber (20% CO2, 4% H2, and balanced N2).
2. The overnight culture was transferred to fresh BHIS medium at an OD600 of 0.05 and grown at 37 °C with 200 rpm agitation under anaerobic conditions (CO2 10% and balanced N2).
3. 20 mL of the grown cells were harvested at the early exponential phase (OD600 of approximately 0.2), and 69 μL of sgRNA-transformed S17-1 cell stock (stock OD600 = 11.6) was added for mating. The mating mixture was centrifuged.
4. The supernatant was discarded and the pellet was resuspended in 200 μL of BHIS medium.
5. The mating mixture was spotted on blood agar.
6. The culture was incubated aerobically for 20 h in a standing incubator at 37 °C.
7. The cells were scraped and resuspended in 20 mL of BHIS medium containing erythromycin (25 μg/mL) and then vortexed thoroughly.
8. The resuspension was spread on two 245 × 245 mm square plates of BHIS agar containing erythromycin (25 μg/mL) and gentamicin (200 μg/mL). The plates were incubated at 37 °C in an anaerobic chamber (20% CO2, 4% H2, and balanced N2) for 2 days.
9. For monitoring, the cell resuspension was diluted twice and spread on BHIS agar containing erythromycin (25 μg/mL) and gentamicin (200 μg/mL) in 90 × 15 mm petri dishes. The dilution factor and the number of colonies on these monitoring plates were used to determine the number of colonies on the 245 × 245 mm square plates.
10. Scrapers were used to scrape all colonies from the square plates, and the colonies were resuspended in BHIS medium with 25 μg/mL erythromycin. Each plate was scraped using 14 mL of this medium. The cell suspension was mixed thoroughly and divided into 1.8 mL cryovials. Subsequently, an equal volume of 50% (v/v) glycerol was added to make a 25% (v/v) glycerol cell stock.
11. The cell stock was placed in an anaerobic chamber (20% CO2, 4% H2, and balanced N2) to perform gas exchange and then stored at -80 °C.
12. The newly created cell stocks were combined into a single stock, with each being mixed in proportion to the number of colonies formed.

3.5 Quality Control (QC) of CRISPRi Library

A quality control process is necessary to verify that the CRISPRi library is evenly distributed in the B. theta genome. QC involves reading each sgRNA sequence inserted into the B. theta genome using next-generation sequencing (NGS). From this, the type and abundance of transformed sgRNAs can be determined, as well as the distribution of each sgRNA across the genome.

3.5.1 Sequencing Library Preparation

1. Genomic DNA (gDNA) was extracted from the conjugant cell stock using a commercial kit (see Notes 3 and 4).
2. The Qubit 1× dsDNA BR assay kit was used to measure gDNA concentration, and gel electrophoresis was used to detect gDNA shearing.
3. A nested PCR strategy was used to avoid amplification of sgRNA from the S17-1 strain, which was used as the conjugation helper strain. In the first amplification step of nested PCR, sgRNAs inserted into the B. theta genome were amplified using primers that bind to the B. theta genome and the sgRNA, respectively. The PCR master mix contained 10 μL of 5× Q5 Reaction Buffer, 1 μL of 10 mM dNTPs, 1 μL each of BT_F and sgRNA_R (10 μM stock, Table 1), 0.5 μL of Q5 High-Fidelity DNA polymerase, 1 μL of 10× SYBR green solution, 25 ng of gDNA, and nuclease-free water made up to 50 μL. The PCR parameters were 98 °C for 3 min; cycles of 98 °C for 10 s, 65 °C for 30 s, and 72 °C for 2 min, repeated until just before saturation; 72 °C for 2 min; and then storage at 4 °C (see Note 5).
4. After the first PCR, the size of the PCR product was verified by gel electrophoresis using an aliquot of the PCR mixture. The expected PCR product size was 3353 bp.
5. PCR purification was performed using a commercial kit (gel extraction may be required if a non-specific band appears on gel electrophoresis).
6. In the second amplification step of nested PCR, the parts of the sgRNAs required for sequencing were amplified. The PCR master mix contained 1 μL each of UDi5_F1 and UDi7_R1, 25 μL of HotStarTaq Master Mix, 1 μL of 10× SYBR green solution, 50 ng of purified PCR product, and nuclease-free water made up to 50 μL. The PCR parameters were 95 °C for 15 min; cycles of 94 °C for 30 s, 54 °C for 30 s, and 72 °C for 30 s, repeated until just before saturation; 72 °C for 10 min; and then storage at 4 °C (see Note 5).
7. After the second PCR, the size of the PCR product was verified by gel electrophoresis using an aliquot of the PCR mixture. The expected size of the PCR product was 301 bp.
8. The PCR amplicons were purified using a commercial kit (gel extraction may be required if a non-specific band appears in the gel).
9. The concentration and size distribution of the sequencing library were analyzed using a Qubit 1× dsDNA HS assay kit and a D1000 ScreenTape assay for the 2200 TapeStation system.
10. The final library was quantified by qPCR using the KAPA Library Quantification Kit, as an additional quality control measure.

3.5.2 Running NGS

1. Sequencing was conducted on the MiSeq System as a 150-cycle single-end run with 10% PhiX control, following the manufacturer’s protocol.

3.5.3 Sequencing Analysis

1. Low-quality reads and adapter sequences were trimmed using the CLC Genomics Workbench. The promoter sequence (CAATTGGGCTACCTTTTTTTTGTTTTGTTTGCAATGGTTAATCTATTGTTAAAATTTAAAGTTTCACTTGAACTTTCAAATAATGTTCTTATATTTGCAG) and scaffold sequence (GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC) were deleted, leaving the spacer sequence (N20 sequence).
2. Each of the identified spacer sequences was assigned to the designed sgRNAs to calculate the read count for each unique sgRNA.
3. sgRNAs were distributed evenly throughout the genome and can be enriched or depleted (Fig. 1).
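The spacer extraction and read counting in Subheading 3.5.3 were performed with the CLC Genomics Workbench. As a hedged illustration of the same logic only, the following Python sketch locates the spacer between short anchors taken from the promoter and scaffold sequences given above. The function names, the 10 nt anchor length, exact matching without mismatches, and the assumption that reads run from promoter to scaffold are all choices made for this sketch, not details from the protocol.

```python
from collections import Counter

PROMOTER_END = "ATATTTGCAG"    # last 10 nt of the promoter sequence in the text
SCAFFOLD_START = "GTTTTAGAGC"  # first 10 nt of the scaffold sequence in the text
SPACER_LEN = 20


def extract_spacer(read):
    """Return the N20 between the promoter and scaffold anchors, or None."""
    left = read.find(PROMOTER_END)
    if left == -1:
        return None
    start = left + len(PROMOTER_END)
    end = read.find(SCAFFOLD_START, start)
    # Reject reads without a scaffold anchor or with a spacer of the wrong length.
    if end - start != SPACER_LEN:
        return None
    return read[start:end]


def count_spacers(reads, designed):
    """Count reads per designed spacer; reads with other spacers are ignored."""
    counts = Counter({spacer: 0 for spacer in designed})
    for read in reads:
        spacer = extract_spacer(read)
        if spacer in counts:
            counts[spacer] += 1
    return counts
```

Designed spacers that receive no reads stay in the result with a count of zero, which makes dropouts visible when the distribution in Fig. 1 is inspected.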

3.6 Determination of Gene Essentiality

Gene essentiality can be determined using gene fitness as a proxy. The fitness score for each gene was determined by measuring the abundance of each sgRNA during cell growth.

3.6.1 Growth Profiling of CRISPRi Library to Provide Selected Library

1. (Pre-culture, initial library) The B. theta CRISPRi library was inoculated into 50 mL of BHIS medium with erythromycin (25 μg/mL) in triplicate at an OD600 of 0.05, and grown at 37 °C with 200 rpm agitation under anaerobic conditions (CO2 10% and balanced N2).
2. When the cultures reached the mid-log phase (OD600 of 0.5), the three cultures were combined in proportion to their OD to make a mixture.
3. (Main culture, selected library) The mixture was transferred to 50 mL of fresh BHIS medium containing erythromycin (25 μg/mL) and the inducer (100 μM IPTG) in triplicate at an OD600 of 0.05. The culture was then grown at 37 °C with 200 rpm agitation under anaerobic conditions (CO2 10% and balanced N2).
4. When the culture reached an OD600 of 1, each of the triplicates was transferred into 50 mL of fresh BHIS medium containing erythromycin (25 μg/mL) and the inducer (100 μM IPTG).
5. Steps 3 and 4 were repeated six times (the cell culture was transferred six times and cultured for ~25 generations, see Note 6).
6. The sequencing library was prepared and sequenced for both the pre-culture and each passage sample (Subheadings 3.5.1 and 3.5.2).

3.6.2 Calculation of Gene Fitness Score

sgRNA fitness was determined by comparing the abundance of sgRNAs in the selected library with that of the initial library using the Z-score value. Gene fitness was defined as the 75th percentile of the sgRNA fitness of all sgRNAs binding to a gene (see Note 7). Each sgRNA read count was normalized to the sum of the total read counts (1) and transformed into log10 (2). Next, the Z-score was calculated for each sgRNA (sgRNA fitness) by subtracting the average log-transformed frequency for all sgRNAs of the initial library from the log-transformed frequency of the selected library, and dividing by the standard deviation of the log-transformed frequency for all sgRNAs of the initial library (3). Finally, the fitness values of the 75th percentile of all sgRNAs that bind to a gene were defined as representative fitness values for the gene (4).

\[
\text{Frequency}_{\text{sgRNA}} = \frac{\text{read count}}{\sum_{i=1}^{n} \text{read count}} \times n \tag{1}
\]

\[
F' = \log_{10}\left(\text{Frequency}_{\text{sgRNA}} + 0.1\right) \tag{2}
\]

\[
\text{Fitness}_{\text{sgRNA}} = \frac{F'_{\text{selected library}} - F'_{\mu}}{F'_{\sigma}} \tag{3}
\]

where F′μ is the mean of the log-transformed frequency for all sgRNAs in the initial library, and F′σ is the standard deviation of the log-transformed frequency for all sgRNAs in the initial library.

\[
\text{Fitness}_{\text{gene}} = 75\text{th percentile of } \left(Z_{\text{sgRNA}1}, Z_{\text{sgRNA}2}, Z_{\text{sgRNA}3}, \ldots, Z_{\text{sgRNA}n}\right) \tag{4}
\]

3.6.3 Identification of Essential Genes

The distribution of gene fitness was fitted with a Gaussian curve, and genes whose fitness was more than 2.5 standard deviations below the mean were defined as essential (see Fig. 2).
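The calculations in Subheadings 3.6.2 and 3.6.3 can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code. Several choices are assumptions made here: the function names, the use of the population standard deviation, the "inclusive" percentile interpolation, and the replacement of the Gaussian curve fit by the plain mean and standard deviation of the gene fitness values.

```python
import math
import statistics

def log_frequency(counts, pseudocount=0.1):
    """Eqs. 1 and 2: read frequency scaled by n, then log10 with a 0.1 offset."""
    total = sum(counts.values())
    n = len(counts)
    return {sg: math.log10(c / total * n + pseudocount) for sg, c in counts.items()}

def sgrna_fitness(initial_counts, selected_counts):
    """Eq. 3: Z-score of each selected-library value against the initial library."""
    f_init = list(log_frequency(initial_counts).values())
    f_sel = log_frequency(selected_counts)
    mu = statistics.mean(f_init)
    sigma = statistics.pstdev(f_init)  # assumption: population SD
    return {sg: (f - mu) / sigma for sg, f in f_sel.items()}

def gene_fitness(z_scores, sgrna_to_gene):
    """Eq. 4: 75th percentile of the Z-scores of all sgRNAs targeting a gene."""
    by_gene = {}
    for sg, z in z_scores.items():
        by_gene.setdefault(sgrna_to_gene[sg], []).append(z)
    result = {}
    for gene, zs in by_gene.items():
        if len(zs) == 1:
            result[gene] = zs[0]
        else:
            # assumption: linear interpolation between data points ("inclusive")
            result[gene] = statistics.quantiles(zs, n=4, method="inclusive")[2]
    return result

def essential_genes(fitness_by_gene, k=2.5):
    """Genes whose fitness lies more than k SDs below the mean.

    Simplification: the mean and SD are computed directly from the data
    instead of being taken from a fitted Gaussian curve.
    """
    values = list(fitness_by_gene.values())
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sorted(g for g, f in fitness_by_gene.items() if f < mu - k * sd)
```

Both libraries must contain the same set of designed sgRNAs, with sgRNAs that received no reads entered as zero counts, so that n is identical in the initial and selected libraries.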

CRISPRi-Driven Genetic Screening


Fig. 2 Essential genes identified in BHIS across the passages. The distribution of gene fitness was fitted with a Gaussian curve, and genes whose fitness was more than 2.5 standard deviations below the mean were defined as essential

4 Notes

1. 55,378 sgRNAs were synthesized that bind evenly to 4873 genes.
2. It is recommended that all electrocompetent cells be generated simultaneously to maintain the same transformation efficiency throughout each electroporation cycle.
3. It is necessary to use appropriate types of gDNA kits based on the sample type. For example, if the medium contains humic substances, cell debris, and proteins, it is recommended to use a gDNA extraction kit, such as the DNeasy PowerSoil Pro kit (Qiagen), to remove these contaminants effectively. It is critical to use a high-quality gDNA extraction kit suitable for the sample to maximize the PCR efficiency for subsequent library synthesis.
4. Shearing of gDNA during the gDNA extraction needs to be minimized.
5. To prevent PCR bias during the construction of the sequencing library, performing an excessive number of PCR cycles should be avoided.
6. In pre-culture, the B. theta CRISPRi library was inoculated at an OD600 of 0.05 and grown to an OD600 of 0.5 (5). In the main culture, the B. theta CRISPRi library was inoculated at an OD600 of 0.05 and grown to an OD600 of 1 (6), and this was repeated six times by cell transfer.

\[
\log_2\left(\frac{0.5}{0.05}\right) = 3.3\ \text{generations} \tag{5}
\]

\[
\log_2\left(\frac{1}{0.05}\right) = 4.3\ \text{generations} \tag{6}
\]

7. It is essential to choose appropriate algorithms for sequencing analysis. It is critical that the results be reproducible between replicates, to maintain robustness. Furthermore, the calculated gene fitness should adequately represent mapped read counts to avoid over- or under-representation of raw mapped reads [8].
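The generation estimates in Note 6 follow from the number of doublings between the inoculation and harvest OD600 values. A minimal sketch, assuming that OD600 is proportional to cell number over this range (the function name `generations` is introduced here for illustration):

```python
import math

def generations(od_start, od_end):
    """Number of doublings needed to grow from od_start to od_end."""
    return math.log2(od_end / od_start)

pre_culture = generations(0.05, 0.5)    # Eq. 5, about 3.3 generations
main_culture = generations(0.05, 1.0)   # Eq. 6, about 4.3 generations
six_passages = 6 * main_culture         # cumulative doublings over six transfers

print(f"pre-culture: {pre_culture:.1f} generations")
print(f"one main-culture passage: {main_culture:.1f} generations")
print(f"six passages: {six_passages:.1f} generations")
```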


References

1. Langridge GC, Phan MD, Turner DJ, Perkins TT, Parts L, Haase J, Charles I, Maskell DJ, Peters SE, Dougan G, Wain J, Parkhill J, Turner AK (2009) Simultaneous assay of every Salmonella Typhi gene using one million transposon mutants. Genome Res 19(12):2308–2316
2. van Opijnen T, Camilli A (2013) Transposon insertion sequencing: a new tool for systems-level analysis of microorganisms. Nat Rev Microbiol 11(7):435–442
3. Cui L, Vigouroux A, Rousset F, Varet H, Khanna V, Bikard D (2018) A CRISPRi screen in E. coli reveals sequence-specific toxicity of dCas9. Nat Commun 9:1912
4. Wang TM, Guan CG, Guo JH, Liu B, Wu YA, Xie Z, Zhang C, Xing XH (2018) Pooled CRISPR interference screening enables genome-scale functional genomics study in bacteria with superior performance. Nat Commun 9:2475
5. Zhang RY, Xu WS, Shao S, Wang QY (2021) Gene silencing through CRISPR interference in bacteria: current advances and future prospects. Front Microbiol 12:635227
6. Naeem M, Majeed S, Hoque MZ, Ahmad I (2020) Latest developed strategies to minimize the off-target effects in CRISPR-Cas-mediated genome editing. Cells 9(7):1608
7. Peters JM, Colavin A, Shi H, Czarny TL, Larson MH, Wong S, Hawkins JS, Lu CHS, Koo BM, Marta E, Shiver AL, Whitehead EH, Weissman JS, Brown ED, Qi LS, Huang KC, Gross CA (2016) A comprehensive, CRISPR-based functional analysis of essential genes in bacteria. Cell 165(6):1493–1506
8. Jiang W, Oikonomou P, Tavazoie S (2020) Comprehensive genome-wide perturbations via CRISPR adaptation reveal complex genetics of antibiotic sensitivity. Cell 180(5):1002–1017.e1031
9. Rousset F, Cabezas-Caballero J, Piastra-Facon F, Fernandez-Rodriguez J, Clermont O, Denamur E, Rocha EPC, Bikard D (2021) The impact of genetic diversity on gene essentiality within the Escherichia coli species. Nat Microbiol 6(3):301–312
10. Bosch B, DeJesus MA, Poulton NC, Zhang W, Engelhart CA, Zaveri A, Lavalette S, Ruecker N, Trujillo C, Wallach JB, Li S, Ehrt S, Chait BT, Schnappinger D, Rock JM (2021) Genome-wide gene expression tuning reveals diverse vulnerabilities of M. tuberculosis. Cell 184(17):4579–4592.e4524
11. Rousset F, Cui L, Siouve E, Becavin C, Depardieu F, Bikard D (2018) Genome-wide CRISPR-dCas9 screens in E. coli identify essential genes and phage host factors. PLoS Genet 14(11):e1007749
12. Li S, Jendresen CB, Landberg J, Pedersen LE, Sonnenschein N, Jensen SI, Nielsen AT (2020) Genome-wide CRISPRi-based identification of targets for decoupling growth from production. ACS Synth Biol 9(5):1030–1040
13. Qi LS, Larson MH, Gilbert LA, Doudna JA, Weissman JS, Arkin AP, Lim WA (2013) Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152(5):1173–1183
14. Sanjana NE, Shalem O, Zhang F (2014) Improved vectors and genome-wide libraries for CRISPR screening. Nat Methods 11(8):783–784

Chapter 8

Enzymatic Preparation of DNA with an Expanded Genetic Alphabet Using Terminal Deoxynucleotidyl Transferase and Its Applications

Guangyuan Wang, Yuhui Du, and Tingjian Chen

Abstract

Efficient preparation of DNA oligonucleotides containing unnatural nucleobases (UBs) that can pair with their cognates to form unnatural base pairs (UBPs) is an essential prerequisite for the application of UBPs in vitro and in vivo. Traditional preparation of oligonucleotides containing unnatural nucleobases largely relies on solid-phase synthesis, which needs to use unstable nucleoside phosphoramidites and a DNA synthesizer, and is environmentally unfriendly and limited in product length. To overcome these limitations of solid-phase synthesis, we developed enzymatic methods for daily laboratory preparation of DNA oligonucleotides containing unnatural nucleobase dNaM, dTPT3, or one of the functionalized dTPT3 derivatives, which can be used for orthogonal DNA labeling or the preparation of DNAs containing UBP dNaM-dTPT3, one of the most successful UBPs to date, based on the template-independent polymerase terminal deoxynucleotidyl transferase (TdT). Here, we first provide a detailed procedure for the TdT-based preparation of DNA oligonucleotides containing 3′-nucleotides of dNaM, dTPT3, or one of dTPT3 derivatives. We then present the procedures for enzyme-linked oligonucleotide assay (ELONA) and imaging of bacterial cells using DNA oligonucleotides containing 3′-nucleotides of dTPT3 derivatives with different functional groups. The procedure for enzymatic synthesis of DNAs containing an internal UBP dNaM-dTPT3 is also described. Hopefully, these methods will greatly facilitate the application of UBPs and the construction of semi-synthetic organisms with an expanded genetic alphabet.

Key words Terminal deoxynucleotidyl transferase (TdT), DNA synthesis, Unnatural base pairs (UBPs), Nucleic acid labeling, Semi-synthetic life, DNA data storage

1 Introduction

Unnatural base pairs (UBPs) are a group of artificially designed and synthesized extra base pairs that replicate orthogonally to natural base pairs. A large number of UBPs, represented by dZ-dP, dS-dB, dDs-dPx, and dNaM-dTPT3, have been developed [1–4],

Guangyuan Wang and Yuhui Du contributed equally to this chapter. Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_8, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024


Guangyuan Wang et al.

providing new tools and materials for site-specific labeling of nucleic acids [5], screening of high-affinity aptamers [6], sequencing of DNA lesions [7], DNA storage [8], and creation of novel genetic codons [9]. Among these, dNaM-dTPT3 and its analog dNaM-d5SICS have been successfully introduced as the third base pair to construct a semi-synthetic organism (SSO), which stores and retrieves increased genetic information efficiently [10–12]. The extensive application of UBPs requires the efficient preparation of DNA oligonucleotides containing unnatural nucleobases (UBs). Traditionally, UBs were incorporated into DNAs via solid phase synthesis. Although it has been broadly used in the synthesis of oligonucleotides, solid phase synthesis still has many obvious limitations, such as the need for unstable nucleoside phosphoramidites and a DNA synthesizer, potential contamination to the environment, difficulty in the synthesis of long oligonucleotides, and limited compatibility of substrates with unstable moieties (any groups prone to be damaged by the harsh reaction conditions) [13, 14]. An alternative approach for the preparation of UB-containing DNA oligonucleotides is enzymatic synthesis. Many DNA polymerases have proven capable of recognizing and replicating different UBPs [4, 15, 16], which enables the enzymatic synthesis of DNAs containing these UBPs in a template-dependent manner. However, most regular DNA polymerases depend on primer-template duplex-driven synthesis, limiting their application in enzymatic de novo DNA synthesis [17]. Terminal deoxynucleotidyl transferase (TdT), which incorporates random nucleotides to the 3′-end of a single-stranded DNA (ssDNA) primer in a template-independent manner, is the most employed enzyme for de novo DNA synthesis [18–21]. 
Extensive studies have shown that TdT possesses a broad substrate repertoire [22–26] and can elongate an ssDNA strand to several kilobases within 24 h [27], and is thus a promising tool for DNA labeling and fast preparation of long DNA oligonucleotides for biotechnology and DNA data storage [28]. Our previous study demonstrated that TdT could incorporate dNaMTP, dTPT3TP, and dTPT3TP derivatives to the 3′-end of an ssDNA primer [29]. We also optimized the reaction conditions for TdT-mediated ssDNA extension with dNaMTP or dTPT3TP, resulting in an approximately 100% extension rate with both of the nucleoside triphosphates. Based on these results, we developed methods for efficient 3′-labeling of ssDNAs with functionally modified dTPT3TP and enzymatic synthesis of dsDNAs containing dNaM-dTPT3 at arbitrary positions. These methods have great potential to facilitate the preparation of UB or UBP-containing DNAs and their application in orthogonal nucleic acid labeling, bioimaging, and construction of genetic code expansion systems in vitro and in vivo.

Preparation of Unnatural Nucleobase-Containing DNAs Using TdT


In this chapter, we first describe a detailed protocol for the template-independent primer extension with different unnatural nucleoside triphosphates by TdT. We then describe protocols for orthogonal labeling of functional DNAs by TdT-mediated incorporation of dTPT3 derivatives that bear different functional groups and their use in enzyme-linked oligonucleotide assay (ELONA) and imaging of bacterial cells. We next provide a protocol for the enzymatic preparation of DNAs with an internal UBP at arbitrary positions with TdT. It should be noted that while we only describe the preparation and use of DNAs containing dNaM, dTPT3, and dTPT3 derivatives, these protocols may also be applied to the preparation and use of DNAs containing other unnatural nucleobases after proper optimization. Some specific precautions and troubleshooting tips are also summarized.

2 Materials

2.1 Reagents, Kits, and Apparatus

1. Commercial TdT from calf.
2. Deoxyribonucleoside triphosphate (dNTP) solution set.
3. Unnatural nucleoside triphosphates dNaMTP, dTPT3TP, amino-modified dTPT3TP (dTPT3AmTP), and biotinylated dTPT3TP (dTPT3BioTP).
4. Manganese(II) acetate (Mn(OAc)2) and magnesium acetate (Mg(OAc)2).
5. Streptavidin (SA).
6. Thrombin.
7. Reagents for Enzyme-Linked Oligonucleotide Assay (ELONA), including SA-HRP (Streptavidin–Horseradish Peroxidase), OPD/H2O2, H2SO4 (3 M).
8. Microplate reader.
9. Zymo ssDNA/RNA Clean & Concentrator Kit.
10. NHS-FAM (5-carboxyfluorescein N-hydroxysuccinimide ester) for the fluorescent labeling of dTPT3Am via coupling with amino group.
11. Escherichia coli (E. coli) DH5α cells.
12. Fluorescence microscope.
13. T4 DNA ligase and 10× T4 DNA ligase buffer.
14. SYBR Gold nucleic acid gel stain.
15. Reagents for PCR amplification with unnatural nucleoside triphosphates, including dNTPs, MgSO4, OneTaq DNA polymerase, OneTaq standard buffer (5×), and Deep Vent DNA polymerase.
16. SA-coated magnetic beads.
17. BWBS buffer (10 mM Tris–HCl, 1 mM EDTA, 1 M NaCl, 0.01% Tween 20, pH 7.5).
18. Thermocycler.
19. Nuclease-free molecular biology grade water.
20. 10× Thrombin binding buffer (1.5 M NaCl, 20 mM CaCl2, 200 mM HEPES, 0.5% Tween 20, pH 7.3).
21. 10× Phosphate-buffered saline (PBS buffer), pH 8.5.
22. 10× E. coli binding buffer (10× PBS buffer supplemented with 10% BSA and 0.5% Tween 20).

2.2 Oligonucleotides

FAM-T36-A: 5′-FAM-CACACAGGAAACAGCTATGACGAATTCAGTGTGGAA

FAM-T36-T: 5′-FAM-CACACAGGAAACAGCTATGACGAATTCAGTGTGGAT

FAM-T36-G: 5′-FAM-CACACAGGAAACAGCTATGACGAATTCAGTGTGGAG

FAM-T36-C: 5′-FAM-CACACAGGAAACAGCTATGACGAATTCAGTGTGGAC

31TBA: 5′-CACTGGTAGGTTGGTGTGGTTGGGGCCAGTG

E1: 5′-GCAATGGTACGGTACTTCCACTTAGGTCGAGGTTAGTTTGTCTTGCTGGCGCATCCACTGAGCGCAAAAGTGCACGCTACTTTGCTAA

T50: 5′-ACAGATACATCTGCCTTGCAAGGTACCTGGTATTCGCAGGAACCATCGGG

T59-P-NH2: 5′-P-CACACAGGAAACAGCTATGACGAATTCAGTGTGGAGAGAGTAGTTAAACAGGAAACAGG-NH2

sp-A: 5′-TCATAGCTGTTTCCTGTGTGACCCTATAGTGAGTCGTAT

sp-T: 5′-TCATAGCTGTTTCCTGTGTGTCCCTATAGTGAGTCGTAT

splint-A: 5′-CGTCATAGCTGTTTCCTGTGTGACCCGATGGTTCCTGCGAATACC

splint-T: 5′-CGTCATAGCTGTTTCCTGTGTGTCCCGATGGTTCCTGCGAATACC

PCR-F (HindIII): 5′-CCGTCGACAAGCTTACAGATACATCTGCCTTGC

PCR-R (BamHI): 5′-CTGATATCGGATCCCCTGTTTCCTGTTTAACTAC
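The splint design can be checked computationally. The sketch below joins the line-wrapped fragments of T50, T59-P-NH2, and the two splints listed above, and confirms that each splint consists of an arm complementary to the 5′-end of T59-P-NH2, one unpaired base, and an arm complementary to the 3′-end of T50. The 22-nt arm length was read off the listed sequences, and the function names are introduced here for illustration only.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Sequences from Subheading 2.2 (terminal modifications omitted).
T50 = "ACAGATACATCTGCCTTGCAAGGTACCTGGTATTCGCAGGAACCATCGGG"
T59 = "CACACAGGAAACAGCTATGACGAATTCAGTGTGGAGAGAGTAGTTAAACAGGAAACAGG"
SPLINT_A = "CGTCATAGCTGTTTCCTGTGTGACCCGATGGTTCCTGCGAATACC"
SPLINT_T = "CGTCATAGCTGTTTCCTGTGTGTCCCGATGGTTCCTGCGAATACC"

def splint_layout(splint, upstream, downstream, arm=22):
    """Return (gap base, arms_ok) for a splint bridging two oligonucleotides.

    Read 5'->3', the splint is: [arm pairing with the 5'-end of `downstream`]
    [one unpaired base][arm pairing with the 3'-end of `upstream`].
    """
    down_arm = splint[:arm]
    gap = splint[arm]
    up_arm = splint[arm + 1:]
    arms_ok = (down_arm == revcomp(downstream[:arm])
               and up_arm == revcomp(upstream[-len(up_arm):]))
    return gap, arms_ok

for name, splint in (("splint-A", SPLINT_A), ("splint-T", SPLINT_T)):
    gap, ok = splint_layout(splint, T50, T59)
    print(name, "gap base:", gap, "arms complementary:", ok)
```

The single unpaired splint position lies opposite the nucleotide that TdT adds to the 3′-end of T50 before ligation to T59-P-NH2.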

3 Methods

3.1 Primer Extension with Unnatural Nucleoside Triphosphates by TdT

Protocol for TdT incorporation of dNaMTP is described as an example. TdT incorporation of dTPT3TP and its derivatives dTPT3AmTP and dTPT3BioTP can be conducted using the same protocol (see Note 1) (Fig. 1).

1. Design and purchase FAM (6-Carboxyfluorescein)-labeled natural oligonucleotides FAM-T36-A/T/G/C with different 3′ nucleobases (see Note 2).
2. Prepare the reaction mixture for primer extension (see Table 1).
3. Incubate the reaction mixture at 37 °C for 1 h.
4. Add equal volume of 2× TBE-urea sample buffer to the reaction mixture, and incubate at 95 °C for 10 min (see Note 3).
5. Analyze the primer extension products with a 20% denaturing PAGE containing 8 M urea.
6. Visualize the gel with a gel imager without staining.

Fig. 1 Chemical structures of unnatural nucleobases dNaM, dTPT3, and dTPT3 derivatives

Table 1 Reaction mixture for the primer extension with dNaMTP using TdT

Reagent                             Volume (μL)
Primer FAM-T36-A/T/G/C (10 μM)      0.2
TdT buffer (10×)                    1
dNaMTP (1 mM)                       0.5
Mg(OAc)2 (100 mM)                   1
Mn(OAc)2 (10 mM)                    1
TdT (20 U/μL)                       0.5
ddH2O                               Up to 10
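The final concentrations implied by Table 1 follow from simple dilution (stock concentration × volume / total volume). The sketch below works them out for the 10 μL reaction; the dictionary layout and the function name `final_concentration` are illustrative choices, not part of the protocol.

```python
TOTAL_UL = 10.0

# name: (stock concentration, unit, volume in uL), taken from Table 1
COMPONENTS = {
    "Primer":   (10.0,  "uM",   0.2),
    "dNaMTP":   (1.0,   "mM",   0.5),
    "Mg(OAc)2": (100.0, "mM",   1.0),
    "Mn(OAc)2": (10.0,  "mM",   1.0),
    "TdT":      (20.0,  "U/uL", 0.5),
}
BUFFER_10X_UL = 1.0  # 10x TdT buffer, giving 1x in 10 uL

def final_concentration(stock, volume_ul, total_ul=TOTAL_UL):
    """Concentration after diluting volume_ul of stock into total_ul."""
    return stock * volume_ul / total_ul

# Water needed to bring the reaction to the total volume.
water_ul = TOTAL_UL - BUFFER_10X_UL - sum(v for _, _, v in COMPONENTS.values())

for name, (stock, unit, vol) in COMPONENTS.items():
    print(f"{name}: {final_concentration(stock, vol):g} {unit}")
print(f"ddH2O: {water_ul:g} uL")
```

The resulting values (1 mM manganese acetate, 10 mM magnesium acetate, 1 U/μL TdT) agree with the concentrations stated in Subheading 3.4.1.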


3.2 Enzyme-Linked Oligonucleotide Assay (ELONA) with a dTPT3Bio-Labeled DNA Aptamer

Human α-thrombin is used as the model target to be detected with ELONA.

1. Prepare thrombin-coated 96-well ELISA plate. Coat each well with 100 μL of 0.05 mg/mL thrombin in 1× thrombin binding buffer (150 mM NaCl, 2 mM CaCl2, 20 mM HEPES, 0.05% Tween 20, pH 7.3) by incubation at 4 °C overnight (see Note 4).
2. Block the wells with 5% bovine serum albumin (BSA) by incubation at room temperature for 1 h, and wash the wells with 1× thrombin binding buffer 6 times.
3. Carry out TdT-mediated primer extension experiment as described in Subheading 3.1 to incorporate dTPT3BioTP into the 3′-end of thrombin aptamer 31TBA.
4. Purify the primer extension product, 31TBA-dTPT3Bio, with a Zymo ssDNA/RNA Clean & Concentrator Kit.
5. Fold 500 nM 31TBA-dTPT3Bio in the 1× thrombin binding buffer by incubating at 95 °C for 5 min and slowly cooling down to room temperature (see Note 5).
6. Add 100 μL of 5 nM/μL folded 31TBA-dTPT3Bio in the 1× thrombin binding buffer into the thrombin-coated well, and incubate at 37 °C for 30 min to allow the binding of the aptamer to thrombin (Fig. 2).

Fig. 2 Overview of orthogonal labeling of functional DNAs with modified unnatural nucleobases using TdT. (Reproduced from ref. [29] with permission from American Chemical Society)


7. Wash the wells with 1× thrombin binding buffer 6 times.
8. Add 100 μL of 1 μg/mL SA-HRP into each well. Incubate the plate at room temperature for 1 h, and wash the wells with 1× thrombin binding buffer 6 times.
9. Add 200 μL of the solution of HRP substrate OPD/H2O2 into each well, and incubate the plate at room temperature for 15 min (see Note 6).
10. Add 50 μL of 3 M H2SO4 into each well to quench the reaction.
11. Read OD490 with a microplate reader (see Note 7).

3.3 Imaging of Bacterial Cells with a dTPT3FAM-Labeled DNA Aptamer

DH5α cells are used as the model cells to be imaged.

1. Inoculate a single colony of E. coli DH5α into LB medium, and grow the culture to log phase (OD600 = 0.6).
2. Centrifuge the culture at 6000 rpm for 10 min to remove the medium, and wash the cells with 1× E. coli binding buffer three times (see Note 8).
3. Carry out TdT-mediated primer extension experiment as described in Subheading 3.1 to incorporate dTPT3AmTP into the 3′ end of E. coli aptamer E1.
4. Purify the primer extension product, E1-dTPT3Am, with a Zymo ssDNA/RNA Clean & Concentrator Kit.
5. Mix 1 μM of the purified product with 0.1 mM of NHS-FAM in 1× PBS buffer (pH 8.5), and incubate at room temperature with mild shaking overnight to couple NHS-FAM with the amino group in dTPT3Am (see Note 9).
6. Purify the FAM-labeled coupling product, E1-dTPT3FAM, with a Zymo ssDNA/RNA Clean & Concentrator kit.
7. Fold 500 nM E1-dTPT3FAM in 1× binding buffer by incubating at 95 °C for 5 min and slowly cooling down on ice for 5 min.
8. Mix 500 nM of folded E1-dTPT3FAM with washed E. coli cells (OD600 = 0.8) in 1× E. coli binding buffer, and incubate at 25 °C with mild shaking for 1 h to allow the binding of the aptamer with E. coli cells.
9. Wash the cells 3 times with 1× E. coli binding buffer.
10. Visualize the cells binding with E1-dTPT3FAM using a fluorescence microscope.


3.4 Enzymatic Synthesis of DNA Strands Containing an Internal UB/UBP

3.4.1 Preparation of DNA Oligonucleotides Containing an Internal Unnatural Nucleobase

1. Design and purchase natural oligonucleotides T50, T59-P-NH2, and Splint-T or Splint-A (see Note 10).
2. Mix 1 μM ssDNA T50 with 1 μM ssDNA T59-P-NH2, 1 μM splint ssDNA Splint-T (for the preparation of dTPT3-containing DNA) or Splint-A (for the preparation of dNaM-containing DNA), 1 mM manganese acetate, 10 mM magnesium acetate, and 0.01 mM dTPT3TP or dNaMTP in 1× TdT buffer.
3. Incubate the mixture at 95 °C for 10 min, and slowly cool down to room temperature to anneal T50 and T59-P-NH2 onto the splint ssDNA.
4. Mix the annealing product with 1 U/μL TdT, and incubate at 37 °C for 1 h.
5. Terminate the reaction by incubating at 75 °C for 20 min, and cool down to room temperature.
6. Add T4 DNA ligase directly to the product solution to a final concentration of 20 U/μL, and T4 DNA ligase buffer to a final concentration of 1×. Mix thoroughly.
7. Incubate the ligation reaction at 16 °C for 16 h, and terminate the reaction by incubating at 75 °C for 15 min.
8. To analyze the product, mix an aliquot of the product solution with one volume of 2× TBE-urea sample buffer, incubate at 95 °C for 5 min, and then assay with a 15% denaturing PAGE gel containing 8 M urea.
9. Stain the gel with SYBR Gold, and image with a gel imager (Fig. 3).

3.4.2 PCR Amplification of the Unnatural Nucleobase-Containing DNA Oligonucleotide Product and Biotin Gel Shift Assay

1. Size-separate the ligation product with denaturing PAGE, and purify the ssDNA product containing the unnatural nucleobase by excising the target band from the gel.
2. PCR amplify the purified product with dNaMTP and dTPT3BioTP. The recipe of the PCR mixture is shown in Table 2 (see Note 11).
3. Perform PCR with the following program in a thermal cycler: 94 °C for 2 min; 15–20 cycles (96 °C for 30 s, 53 °C for 1 min, 68 °C for 4 min); 68 °C for 10 min (see Note 12).
4. For higher product purity, further separate the desired dsDNA product containing dNaM-dTPT3Bio from the PCR products using SA-coated magnetic beads (see Subheading 3.4.3).
5. Purify the PCR product with a Zymo ssDNA/RNA Clean & Concentrator kit.
6. Mix an aliquot of the purified PCR product with 0.5 μg/μL streptavidin, incubate at 37 °C for 4 h, and assay with 8% PAGE.
7. Stain the gel with SYBR Gold, and visualize with a gel imager.

Fig. 3 Schematic diagram for the enzymatic production of DNAs with a UBP incorporated in the middle of the sequence. (Reproduced from ref. [29] with permission from American Chemical Society)

3.4.3 Magnetic Beads Purification of the UBP-Containing DNA Product Produced by PCR

1. Add 100 μL of 10 mg/mL SA-coated magnetic beads into a centrifuge tube, and wash 3 times with 1 mL BWBS buffer (10 mM Tris–HCl, 1 mM EDTA, 1 M NaCl, 0.01% Tween 20, pH 7.5).
2. Mix 50 μL of the DNA product produced by PCR with dNaMTP and dTPT3BioTP with 450 μL of BWBS buffer, and incubate the resulting solution with the washed beads at room temperature for 4 h with mild shaking.


Table 2 Reaction mixture for PCR amplification of ssDNA containing dNaM or dTPT3 with unnatural nucleoside triphosphates

Reagent                                       Volume (μL)
ssDNA containing dNaM/dTPT3 (100 ng/μL)       1
PCR-F (HindIII) (10 mM)                       2
PCR-R (BamHI) (10 mM)                         2
dNTPs (10 mM)                                 1.2
dTPT3BioTP (0.1 mM)                           1
dNaMTP (2 mM)                                 1
MgSO4 (100 mM)                                0.24
OneTaq standard buffer (5×)                   4
OneTaq DNA polymerase (5 U/μL)                0.08
Deep Vent DNA polymerase (0.2 U/μL)           0.25
ddH2O                                         Up to 20

3. Wash the beads 3 times with BWBS buffer.
4. Elute the bead-bound DNA with 95% formamide by incubating at 95 °C for 5 min.
5. Purify the DNA with a Zymo ssDNA/RNA Clean & Concentrator kit.

4 Notes

1. The same reaction conditions can be applied to the primer extension experiments with dNaMTP, dTPT3TP, and dTPT3 derivatives dTPT3AmTP and dTPT3BioTP. Under these conditions, the primer extension efficiencies with all these unnatural nucleoside triphosphates are satisfactory.
2. The incorporation efficiency of the unnatural nucleoside triphosphate by TdT can be affected by the 3′-end nucleobase of the ssDNA primer.
3. High temperature treatment of the extension products in 1× TBE-urea sample buffer helps to denature the ssDNA, eliminating possible complex structures that can affect gel analysis of the extension products.
4. To maintain the activity of the protein to be detected, a shorter coating time should be applied when the plate coating is carried out at a higher temperature, e.g., room temperature.
5. Slow cooling helps the ssDNA aptamer to fold.
6. Prepare the OPD/H2O2 solution immediately before use.
7. Read OD490 immediately after quenching the reaction to avoid further color change of the solution.
8. Wash the cells gently and at low temperature to avoid cell death and lysis.
9. A much higher concentration of NHS-FAM than of ssDNA is used, since the NHS group is easily hydrolyzed in aqueous solution.
10. T59-P-NH2 is phosphorylated at the 5′-end and modified with an amino group at the 3′-end. Splint-T (for the preparation of dTPT3-containing DNA) and Splint-A (for the preparation of dNaM-containing DNA) are splint ssDNAs of T50 and T59-P-NH2, and have reverse complementary regions with both T50 and T59-P-NH2.
11. dTPT3BioTP was observed to be misincorporated into undesired positions of the PCR products when at high concentration, so a lower concentration of dTPT3BioTP than of dNaMTP is used. Addition of a small amount of extra Deep Vent DNA polymerase to the PCR mixture can help improve the PCR fidelity.
12. A longer extension time in each PCR cycle than in regular PCR with only dNTPs helps the complete replication of UBP-containing DNA. Fewer PCR cycles than in regular PCR with only dNTPs are applied to avoid excessive loss of the UBP during amplification.

Acknowledgements

The authors thank the National Key R&D Program of China (2019YFA0904102), the National Natural Science Foundation of China (21978100), the Guangdong Provincial Pearl River Talents Program (2019QN01Y228), and the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (2019ZT08Y318) for the financial support of this work.

References

1. Hutter D, Benner SA (2003) Expanding the genetic alphabet: non-epimerizing nucleoside with the pyDDA hydrogen-bonding pattern. J Org Chem 68(25):9839–9842. https://doi.org/10.1021/jo034900k
2. Piccirilli JA, Krauch T, Moroney SE, Benner SA (1990) Enzymatic incorporation of a new base pair into DNA and RNA extends the genetic alphabet. Nature 343(6253):33–37. https://doi.org/10.1038/343033a0
3. Kimoto M, Kawai R, Mitsui T, Yokoyama S, Hirao I (2009) An unnatural base pair system for efficient PCR amplification and functionalization of DNA molecules. Nucleic Acids Res 37(2):e14. https://doi.org/10.1093/nar/gkn956
4. Li L, Degardin M, Lavergne T, Malyshev DA, Dhami K, Ordoukhanian P, Romesberg FE (2014) Natural-like replication of an unnatural base pair for the expansion of the genetic alphabet and biotechnology applications. J Am Chem Soc 136(3):826–829. https://doi.org/10.1021/ja408814g
5. Seo YJ, Malyshev DA, Lavergne T, Ordoukhanian P, Romesberg FE (2011) Site-specific labeling of DNA and RNA using an efficiently replicated and transcribed class of unnatural base pairs. J Am Chem Soc 133(49):19878–19888. https://doi.org/10.1021/ja207907d
6. Futami K, Kimoto M, Lim YWS, Hirao I (2019) Genetic alphabet expansion provides versatile specificities and activities of unnatural-base DNA aptamers targeting cancer cells. Mol Ther Nucleic Acids 14:158–170. https://doi.org/10.1016/j.omtn.2018.11.011
7. Zhu W, Wang H, Li X, Tie W, Huo B, Zhu A, Li L (2022) Amplification, enrichment, and sequencing of mutagenic methylated DNA adduct through specifically pairing with unnatural nucleobases. J Am Chem Soc 144(44):20165–20170. https://doi.org/10.1021/jacs.2c06110
8. Lim CK, Nirantar S, Yew WS, Poh CL (2021) Novel modalities in DNA data storage. Trends Biotechnol 39(10):990–1003. https://doi.org/10.1016/j.tibtech.2020.12.008
9. Manandhar M, Chun E, Romesberg FE (2021) Genetic code expansion: inception, development, commercialization. J Am Chem Soc 143(13):4859–4878. https://doi.org/10.1021/jacs.0c11938
10. Malyshev DA, Dhami K, Lavergne T, Chen T, Dai N, Foster JM, Correa IR Jr, Romesberg FE (2014) A semi-synthetic organism with an expanded genetic alphabet. Nature 509(7500):385–388. https://doi.org/10.1038/nature13314
11. Zhang Y, Ptacin JL, Fischer EC, Aerni HR, Caffaro CE, Jose KS, Feldman AW, Turner CR, Romesberg FE (2017) A semi-synthetic organism that stores and retrieves increased genetic information. Nature 551(7682):644–647. https://doi.org/10.1038/nature24659
12. Fischer EC, Hashimoto K, Zhang YK, Feldman AW, Dien VT, Karadeema RJ, Adhikary R, Ledbetter MP, Krishnamurthy R, Romesberg FE (2020) New codons for efficient production of unnatural proteins in a semisynthetic organism. Nat Chem Biol 16(5):570–575. https://doi.org/10.1038/s41589-020-0507-z
13. Jensen MA, Davis RW (2018) Template-independent enzymatic oligonucleotide synthesis (TiEOS): its history, prospects, and challenges. Biochemistry 57(12):1821–1832. https://doi.org/10.1021/acs.biochem.7b00937
14. Eritja R (2007) Solid-phase synthesis of modified oligonucleotides. Int J Pept Res Ther 13(1–2):53–68. https://doi.org/10.1007/s10989-006-9053-0
15. Ledbetter MP, Karadeema RJ, Romesberg FE (2018) Reprograming the replisome of a semisynthetic organism for the expansion of the genetic alphabet. J Am Chem Soc 140(2):758–765. https://doi.org/10.1021/jacs.7b11488
16. Sun L, Ma X, Zhang B, Qin Y, Ma J, Du Y, Chen T (2022) From polymerase engineering to semi-synthetic life: artificial expansion of the central dogma. RSC Chem Biol 3(10):1173–1197. https://doi.org/10.1039/d2cb00116k
17. Obeid S, Baccaro A, Welte W, Diederichs K, Marx A (2010) Structural basis for the synthesis of nucleobase modified DNA by Thermus aquaticus DNA polymerase. Proc Natl Acad Sci U S A 107(50):21327–21331. https://doi.org/10.1073/pnas.1013804107
18. Sarac I, Hollenstein M (2019) Terminal deoxynucleotidyl transferase in the synthesis and modification of nucleic acids. Chembiochem 20(7):860–871. https://doi.org/10.1002/cbic.201800658
19. Ramadan K, Shevelev IV, Maga G, Hubscher U (2004) De novo DNA synthesis by human DNA polymerase lambda, DNA polymerase mu and terminal deoxyribonucleotidyl transferase. J Mol Biol 339(2):395–404. https://doi.org/10.1016/j.jmb.2004.03.056
20. Tang L, Navarro LA Jr, Chilkoti A, Zauscher S (2017) High-molecular-weight polynucleotides by transferase-catalyzed living chain-growth polycondensation. Angew Chem Int Ed Engl 56(24):6778–6782. https://doi.org/10.1002/anie.201700991
21. Gouge J, Rosario S, Romain F, Beguin P, Delarue M (2013) Structures of intermediates along the catalytic cycle of terminal deoxynucleotidyltransferase: dynamical aspects of the two-metal ion mechanism. J Mol Biol 425(22):4334–4352. https://doi.org/10.1016/j.jmb.2013.07.009
22. Flickinger JL, Gebeyehu G, Buchman G, Haces A, Rashtchian A (1992) Differential incorporation of biotinylated nucleotides by terminal deoxynucleotidyl transferase. Nucleic Acids Res 20(9):2382. https://doi.org/10.1093/nar/20.9.2382
23. Horakova P, Macickova-Cahova H, Pivonkova H, Spacek J, Havran L, Hocek M, Fojta M (2011) Tail-labelling of DNA probes using modified deoxynucleotide triphosphates and terminal deoxynucleotidyl transferase. Application in electrochemical DNA hybridization and protein-DNA binding assays. Org Biomol Chem 9(5):1366–1371. https://doi.org/10.1039/c0ob00856g
24. Rothlisberger P, Levi-Acobas F, Sarac I, Marliere P, Herdewijn P, Hollenstein M (2017) On the enzymatic incorporation of an imidazole nucleotide into DNA. Org Biomol Chem 15(20):4449–4455. https://doi.org/10.1039/c7ob00858a
25. Rothlisberger P, Levi-Acobas F, Sarac I, Marliere P, Herdewijn P, Hollenstein M (2019) Towards the enzymatic formation of artificial metal base pairs with a carboxyimidazole-modified nucleotide. J Inorg Biochem 191:154–163. https://doi.org/10.1016/j.jinorgbio.2018.11.009
26. Jarchow-Choy SK, Krueger AT, Liu H, Gao J, Kool ET (2011) Fluorescent xDNA nucleotides as efficient substrates for a template-independent polymerase. Nucleic Acids Res 39(4):1586–1594. https://doi.org/10.1093/nar/gkq853
27. Tjong V, Yu H, Hucknall A, Rangarajan S, Chilkoti A (2011) Amplified on-chip fluorescence detection of DNA hybridization by surface-initiated enzymatic polymerization. Anal Chem 83(13):5153–5159. https://doi.org/10.1021/ac200946t
28. Lee HH, Kalhor R, Goela N, Bolot J, Church GM (2019) Terminator-free template-independent enzymatic DNA synthesis for digital information storage. Nat Commun 10(1):2383. https://doi.org/10.1038/s41467-019-10258-1
29. Wang G, He C, Zou J, Liu J, Du Y, Chen T (2022) Enzymatic synthesis of DNA with an expanded genetic alphabet using terminal deoxynucleotidyl transferase. ACS Synth Biol 11(12):4142–4155. https://doi.org/10.1021/acssynbio.2c00456

Chapter 9 Single-Nucleotide Microbial Genome Editing Using CRISPR-Cas12a

Ho Joung Lee and Sang Jun Lee

Abstract Microbial genome editing can be achieved by donor DNA-directed mutagenesis and CRISPR-Cas12a-mediated negative selection. Single-nucleotide-level genome editing enables the manipulation of microbial cells exactly as designed. Here, we describe single-nucleotide substitutions/indels in target DNA of the E. coli genome using a mutagenic DNA oligonucleotide donor and a truncated crRNA/Cas12a system. The maximal truncation of nucleotides at the 3′-end of the crRNA enables Cas12a-mediated single-nucleotide-level precise editing at galK targets in the genome of E. coli.

Key words Cas12a, 3′-truncated crRNA, Single-base, Precise genome editing

1 Introduction

The CRISPR-Cas12a system is used for genome editing of various organisms, including bacteria [1], plants [2], mice [3], and human cells [4]. For bacterial genome editing, negative selection is usually performed using single-stranded, double-stranded, or circular mutagenic donor DNAs. Previous studies suggested modifying guide RNAs to improve the editing efficiency of the CRISPR-Cas system [5–7]. Recently, a mismatched sequence was introduced in advance into the single-molecule guide RNA (sgRNA) to overcome the mismatch tolerance of CRISPR-Cas9 and facilitate microbial genome editing at the single-nucleotide level [8]. Similarly, the mismatched crRNA method was applied to the CRISPR-Cas12a system and successfully enabled single-base genome editing in microbial cells [1]. Furthermore, truncation of the sgRNA can improve CRISPR-Cas9 nuclease specificity and reduce off-target effects [9]. We reported that a two-nucleotide truncation at the 5′-end of an sgRNA enabled CRISPR-Cas9-mediated genome editing at the single-nucleotide level [10].

Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_9, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024


Thus, a CRISPR-Cas12a method utilizing maximally 3′-end-truncated crRNA was employed for negative selection of single-nucleotide-edited targets [11]. crRNA plasmids were designed with 3′-end truncations of 1 to 6 nucleotides, so that the crRNAs target 20- to 15-nucleotide segments within positions 497–517 of the galK gene. The number of surviving transformants with 1–5-nt-truncated crRNAs (Δ1–Δ5) was approximately equal to the number of transformants harboring perfectly matched, untruncated crRNAs (around 10⁴ per μg of crRNA plasmid). However, when the Δ6 crRNA was used, the genomic DNA cleavage activity of the Cas12a nuclease was not preserved (refer to Fig. 1a). This suggests that FnCas12a retains its nuclease activity in vivo with a crRNA truncation of at most 5 nucleotides. Consequently, the Δ5 crRNA was employed for the negative selection of single-nucleotide-edited targets (refer to Fig. 1b). To evaluate the editing efficiency, the single-nucleotide mutagenic oligonucleotide (510C→A in galK) and crRNA plasmids were co-electroporated into cells expressing both Cas12a and Bet proteins. Editing efficiency was scored by the D-galactose-fermenting phenotype, as the proportion of white colonies on MacConkey agar containing D-galactose. This proportion was low (approximately 12%) with 0–4-nt-truncated crRNAs. However, when the 3′-end 5-nt-truncated crRNA (Δ5) was employed, 88% of the transformants formed white colonies (refer to Fig. 2). For single-nucleotide indels, the proportion of white colonies obtained with the Δ4 crRNA was only slightly higher than that obtained with Δ0. Remarkably, when the Δ5 crRNA was used, the proportions of white colonies resulting from the insertion (510G) and deletion (509ΔG) in the galK target increased to 79% and 76%, respectively (refer to Fig. 3). This maximally truncated crRNA method significantly improved the efficiency and accuracy of microbial genome editing at the single-nucleotide level, including insertions, deletions, and substitutions.
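The 3′-end truncation series described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 21-nt sequence and the start coordinate 497 are taken from Table 2 (pHK461), whereas the function names are hypothetical, and the strings are the DNA target segments rather than the transcribed crRNA spacers.

```python
# Untruncated (Δ0) galK target segment and its start coordinate, as listed in Table 2.
FULL_TARGET = "TAGGCTGTAACTGCGGGATCA"
TARGET_START = 497


def truncate_3prime(target: str, n: int) -> str:
    """Return the target segment after removing n nucleotides from its 3' end."""
    if not 0 <= n < len(target):
        raise ValueError("n must be between 0 and len(target) - 1")
    return target[: len(target) - n]


def describe(n: int) -> str:
    """Format a truncated target in the coordinate style used in Table 2."""
    seq = truncate_3prime(FULL_TARGET, n)
    end = TARGET_START + len(seq) - 1
    return f"Δ{n}: {TARGET_START}{seq}{end} ({len(seq)} nt)"


if __name__ == "__main__":
    for n in range(7):  # Δ0 to Δ6
        print(describe(n))
```

Under these assumptions, Δ5 ends at position 512 and Δ6 at position 511, in line with the pHL190 and pHL189 entries of Table 2.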

2 Materials

2.1 Electroporation

1. Prepare all solutions using purified deionized water and analytical grade reagents.
2. Synthesized mutagenic oligonucleotides (refer to Table 1): each harbors the single-nucleotide mutagenic sequence flanked on both sides by 20-mers homologous to the target DNA. Dissolve to 100 pmol μL⁻¹ and store at -20 °C.


Fig. 1 The negative selection of the single-base-edited target using truncated crRNA/Cas12a. (a) The number of surviving colonies indicates the genomic DNA cleavage efficiency of Cas12a with the galK target using 3′-truncated crRNAs. The presence of a large number (>10⁶) of surviving cells signifies the failure of target recognition and cleavage by the truncated crRNA/Cas12a complex. Each bar represents the mean value obtained from three independent experiments. (b) The unaltered target DNA can be cleaved by the 3′-end 5-nt-truncated crRNA/Cas12a complex. Conversely, the single-nucleotide-edited target DNA cannot be recognized or cleaved by the 3′-end 5-nt-truncated crRNA/Cas12a complex. (Reproduced from ref. [11] with permission from the authors)

3. crRNA- or bet-expressing plasmids (refer to Table 2): prepared from E. coli DH5α using the NucleoSpin Plasmid EasyPure kit (Macherey-Nagel, Cat. No. 740727).
4. All plasmids and strains listed in Table 2 were generated as described earlier [11].
5. Gene Pulser Xcell Electroporation Systems (Bio-Rad, #1652662).


Fig. 2 The efficiencies of Cas12a-mediated single-base substitutions using different truncated crRNAs. The editing efficiency of converting 510C to A in the galK gene reached up to 88%, when a 5-nt-truncated crRNA was utilized. (Reproduced from ref. [11] with permission from the authors)

6. Electroporation cuvette: Gene Pulser/MicroPulser Electroporation Cuvettes, 0.1 cm gap (Bio-Rad, #1652089).
7. SOC medium: 10 mM MgCl2 and 20 mM D-glucose in SOB medium (Difco, Cat. No. 244310).

2.2 MacConkey Agar

1. 20% D-galactose solution: dissolve 10 g of D-galactose (CAS No. 59-23-4) powder in 50 mL of ultrapure water, filter through a 0.2 μm filter, and store at room temperature (see Note 1).
2. Spectinomycin: 50 mg mL⁻¹ solution in water, filtered through a 0.2 μm filter and stored in aliquots at -20 °C.
3. Dissolve 24 g of MacConkey agar base (Difco, Cat. No. 281810) powder in 600 mL of ultrapure water using a 1 L glass beaker.
4. Adjust the pH to 7.1 using 10 N NaOH or 2 N HCl.
5. After autoclave sterilization, add D-galactose and spectinomycin (final 0.5% and 75 μg mL⁻¹, respectively), and pour into 90 mm petri dishes.
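The supplement volumes implied by step 5 can be estimated with the usual C1·V1 = C2·V2 relation. The sketch below is an assumption-laden illustration, not part of the published protocol: the function name is hypothetical, the 600 mL batch size comes from step 3, and the small increase in total volume caused by the additions is ignored.

```python
def stock_volume_ml(final_conc: float, stock_conc: float, final_volume_ml: float) -> float:
    """Volume of stock (mL) needed so that C1 * V1 = C2 * V2.

    final_conc and stock_conc must be given in the same unit.
    The volume added by the stock itself is neglected (simplification).
    """
    if stock_conc <= 0 or final_conc < 0 or final_volume_ml <= 0:
        raise ValueError("concentrations and volume must be positive")
    return final_conc * final_volume_ml / stock_conc


AGAR_VOLUME_ML = 600.0  # batch size from step 3

# 20% D-galactose stock diluted to 0.5% final (both in % w/v).
galactose_ml = stock_volume_ml(0.5, 20.0, AGAR_VOLUME_ML)

# 50 mg/mL spectinomycin stock (= 50,000 ug/mL) diluted to 75 ug/mL final.
spectinomycin_ml = stock_volume_ml(75.0, 50_000.0, AGAR_VOLUME_ML)

if __name__ == "__main__":
    print(f"D-galactose stock: {galactose_ml:.1f} mL")
    print(f"Spectinomycin stock: {spectinomycin_ml:.2f} mL")
```

With these inputs the estimate is about 15 mL of the 20% D-galactose stock and about 0.9 mL of the spectinomycin stock per 600 mL of agar.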


Fig. 3 The insertion and deletion of single nucleotides in the galK gene using the truncated crRNA/Cas12a system. The editing efficiency of a 4-nt-truncated crRNA is comparable to that of an untruncated crRNA. However, when a 5-nt-truncated crRNA is used, the editing efficiencies of the insertion of 510G and the deletion of 509G in the galK target increased to 79% and 76%, respectively. (Reproduced from ref. [11] with permission from the authors)

Table 1 Mutagenic oligonucleotides for single-nucleotide editing

Name           Sequence (5′→3′)                                  Description
galKC510A      CAGTTTGTAGGCTGTAACTGAᵃGGGATCATGGATCAGCTAAT        galK 510C to A substitution
galK509Gdel    CAGTTTGTAGGCTGTAACT(G)ᵇCGGGATCATGGATCAGCTAAT      galK 509G deletion
galK510Gins    CAGTTTGTAGGCTGTAACTGGᵃCGGGATCATGGATCAGCTAAT       G insertion after galK 509G

ᵃ The target nucleotides for single-nucleotide editing and single-nucleotide insertion (underlined in the original table) immediately precede the marker
ᵇ The nucleotide in parentheses indicates a deleted nucleotide


Table 2 Strains and plasmids used in this study

Name      Characteristics                                                                                        Source/reference

Strain
DH5α      F- fhuA2 Δ(lacZYA-argF) U169 phoA glnV44 Φ80lacZΔM15 gyrA96 recA1 relA1 endA1 thi-1 hsdR17             Lab stock
HK1061    MG1655, araBAD::PBAD-cas12a-KmR                                                                        [11]

Plasmid
pHK463    λ bet gene, araC, pSC101 ori ts, AmpR                                                                  [8]

crRNA plasmid
pHK461    Δ0 crRNA targeting 497TAGGCTGTAACTGCGGGATCA517 in galK, pBR322 ori, SpR                                [11]
pHL173    Δ1 crRNA targeting 497TAGGCTGTAACTGCGGGATC516 in galK, pBR322 ori, SpR                                 [11]
pHL172    Δ2 crRNA targeting 497TAGGCTGTAACTGCGGGAT515 in galK, pBR322 ori, SpR                                  [11]
pHL171    Δ3 crRNA targeting 497TAGGCTGTAACTGCGGGA514 in galK, pBR322 ori, SpR                                   [11]
pHL170    Δ4 crRNA targeting 497TAGGCTGTAACTGCGGG513 in galK, pBR322 ori, SpR                                    [11]
pHL190    Δ5 crRNA targeting 497TAGGCTGTAACTGCGG512 in galK, pBR322 ori, SpR                                     [11]
pHL189    Δ6 crRNA targeting 497TAGGCTGTAACTGCG511 in galK, pBR322 ori, SpR                                      [11]

2.3 Sanger Sequencing

1. Taq DNA Polymerase (Biofact, Cat. No. ST402).
2. NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel, #740609).
3. Sanger sequencing was performed by Bionics Co., Ltd.

3 Methods

3.1 Preparation of Electrocompetent Cells

1. Perform all procedures at room temperature unless otherwise specified.
2. Inoculate a single colony of E. coli HK1061 cells harboring the pHK463 plasmid into 4 mL of LB broth containing ampicillin at 50 μg mL⁻¹ in a 14 mL round-bottom tube, and grow for 16 h at 30 °C, 180 rpm.
3. Transfer 1% of the seed culture into 200 mL of LB broth containing ampicillin at 50 μg mL⁻¹ in a 1 L flask, and monitor the optical density at 600 nm (OD600) while incubating at 30 °C, 180 rpm.
4. At OD600 = 0.4, add 150 μL of 20% L-arabinose solution to the 200 mL of LB broth in the flask (final L-arabinose concentration, 1 mM), and grow the cells for an additional 3 h at 30 °C, 180 rpm.
5. After 3 h, chill the flask on ice for 10 min.
6. Harvest the cells by centrifugation at 3500 rpm, 4 °C for 20 min, and discard the supernatant.
7. Resuspend the cells in 1 mL of ice-chilled 10% glycerol solution, and add 10% glycerol solution up to 40 mL (see Note 2).
8. Repeat steps 5–6 once.
9. Discard the supernatant, resuspend the cells in 1 mL of 10% glycerol solution, and divide the resuspended cells into aliquots of 50 μL each. Keep one aliquot at 4 °C for immediate use and store the remaining aliquots at -80 °C.

3.2 Electroporation

1. Chill one 50 μL aliquot of electrocompetent HK1061 cells harboring pHK463 and the electroporation cuvette on ice.
2. Add 200 ng of purified crRNA plasmid (refer to Table 2) and 100 pmol of mutagenic oligonucleotide to the electrocompetent cells, mix by gentle pipetting, and incubate for 5 min on ice.
3. Transfer the mixed electrocompetent cells into the electroporation cuvette (see Notes 3–5).
4. Electroporate at 25 μF, 200 Ω, and 1.8 kV, and immediately add 950 μL of SOC to the electroporated cells (see Notes 6 and 7).
5. Transfer the 1 mL of electroporated cells to a 14 mL round-bottom tube, and recover for 1 h at 37 °C, 180 rpm.
6. Spread the cells on MacConkey agar containing D-galactose and spectinomycin (see Note 8).
7. Incubate for 16 h at 37 °C (see Note 9).

3.3 Sanger Sequencing

1. Calculate the ratio of the number of white colonies to the total number of colonies on MacConkey agar containing D-galactose and spectinomycin.
2. Amplify the galK gene by PCR from four white colonies per electroporation.
3. After agarose gel electrophoresis, purify each amplified galK fragment using the NucleoSpin Gel and PCR Clean-up kit for Sanger sequencing.
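The ratio in step 1 of Subheading 3.3 can be computed as a pooled fraction over the plates of one electroporation. The sketch below is illustrative only: the function name and the colony counts are hypothetical and were chosen so that the result matches the 88% figure quoted for the Δ5 crRNA.

```python
from typing import Iterable, Tuple


def editing_efficiency(plates: Iterable[Tuple[int, int]]) -> float:
    """Pooled fraction of white colonies.

    Each plate is given as (white_colonies, total_colonies).
    """
    white = 0
    total = 0
    for w, t in plates:
        if t < 0 or not 0 <= w <= t:
            raise ValueError("white count must lie between 0 and the plate total")
        white += w
        total += t
    if total == 0:
        raise ValueError("no colonies were counted")
    return white / total


# Hypothetical counts from two plates of the same electroporation.
EXAMPLE_PLATES = [(90, 100), (86, 100)]

if __name__ == "__main__":
    print(f"Editing efficiency: {editing_efficiency(EXAMPLE_PLATES):.0%}")
```

Pooling counts before dividing weights each colony equally, which avoids over-weighting sparsely populated plates; averaging per-plate ratios would be an equally defensible choice if the plates are treated as replicates.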

4 Notes

1. Incubate the 20% D-galactose solution in a dry oven at 60 °C to dissolve the D-galactose powder thoroughly.
2. Resuspend the cell pellet in a conical tube on ice, and pipette carefully to avoid foaming.
3. Pipette carefully to avoid forming bubbles, which can lead to incomplete electroporation.
4. Hold the upper part of the electroporation cuvette to prevent heat transfer from the hands to the cells.
5. If air bubbles form between the electrodes of the cuvette, remove them by tapping the electroporation cuvette on a paper towel 4–5 times.
6. Preheat sterilized SOC medium to 37 °C before electroporation.
7. Tilt the cuvette while adding SOC to prevent bubble formation in the electroporated cells.
8. When spreading 2, 20, and 200 μL of the electroporated and recovered cells, add 200 μL of ultrapure water onto the MacConkey agar to prevent dense colony formation.
9. After incubation at 37 °C for 16 h, store the MacConkey agar plates at room temperature (not in the refrigerator) to prevent colonies from developing red coloration.

References

1. Kim HJ, Oh SY, Lee SJ (2020) Single-base genome editing in Corynebacterium glutamicum with the help of negative selection by target-mismatched CRISPR/Cpf1. J Microbiol Biotechnol 30(10):1583–1591. https://doi.org/10.4014/jmb.2006.06036
2. Kim H, Kim ST, Ryu J, Kang BC, Kim JS, Kim SG (2017) CRISPR/Cpf1-mediated DNA-free plant genome editing. Nat Commun 8:14406. https://doi.org/10.1038/ncomms14406
3. Kim Y, Cheong SA, Lee JG, Lee SW, Lee MS, Baek IJ, Sung YH (2016) Generation of knockout mice by Cpf1-mediated gene targeting. Nat Biotechnol 34(8):808–810. https://doi.org/10.1038/nbt.3614
4. Kim D, Kim J, Hur JK, Been KW, Yoon SH, Kim JS (2016) Genome-wide analysis reveals specificities of Cpf1 endonucleases in human cells. Nat Biotechnol 34(8):863–868. https://doi.org/10.1038/nbt.3609
5. Zhang X, Xu L, Fan R, Gao Q, Song Y, Lyu X, Ren J, Song Y (2018) Genetic editing and interrogation with Cpf1 and caged truncated pre-tRNA-like crRNA in mammalian cells. Cell Discov 4:36. https://doi.org/10.1038/s41421-018-0035-0
6. Park HM, Liu H, Wu J, Chong A, Mackley V, Fellmann C, Rao A, Jiang F, Chu H, Murthy N, Lee K (2018) Extension of the crRNA enhances Cpf1 gene editing in vitro and in vivo. Nat Commun 9(1):3313. https://doi.org/10.1038/s41467-018-05641-3
7. Kim H, Lee WJ, Oh Y, Kang SH, Hur JK, Lee H, Song W, Lim KS, Park YH, Song BS, Jin YB, Jun BH, Jung C, Lee DS, Kim SU, Lee SH (2020) Enhancement of target specificity of CRISPR-Cas12a by using a chimeric DNA-RNA guide. Nucleic Acids Res 48(15):8601–8616. https://doi.org/10.1093/nar/gkaa605
8. Lee HJ, Kim HJ, Lee SJ (2020) CRISPR-Cas9-mediated pinpoint microbial genome editing aided by target-mismatched sgRNAs. Genome Res 30(5):768–775. https://doi.org/10.1101/gr.257493.119
9. Fu YF, Sander JD, Reyon D, Cascio VM, Joung JK (2014) Improving CRISPR-Cas nuclease specificity using truncated guide RNAs. Nat Biotechnol 32(3):279–284. https://doi.org/10.1038/nbt.2808
10. Lee HJ, Kim HJ, Lee SJ (2021) Mismatch intolerance of 5′-truncated sgRNAs in CRISPR/Cas9 enables efficient microbial single-base genome editing. Int J Mol Sci 22(12):6457. https://doi.org/10.3390/ijms22126457
11. Lee HJ, Kim HJ, Park YJ, Lee SJ (2022) Efficient single-nucleotide microbial genome editing achieved using CRISPR/Cpf1 with maximally 3′-end-truncated crRNAs. ACS Synth Biol 11(6):2134–2143. https://doi.org/10.1021/acssynbio.2c00054

Chapter 10 Multiplex Marker-Less Genome Integration in Pichia pastoris Using CRISPR/Cas9

Jucan Gao, Jintao Cheng, and Jiazhang Lian

Abstract Pichia pastoris is known for its excellent protein expression ability. As an industrial methylotrophic yeast, it can effectively utilize methanol as the sole carbon source, serving as a potential platform for C1 biotransformation. Unfortunately, the lack of synthetic biology tools in P. pastoris limits its broad applications, particularly when multigene pathways must be manipulated. Here, the CRISPR/Cas9 system is established to efficiently integrate multiple heterologous genes to construct P. pastoris cell factories. In this protocol, with the 2,3-butanediol (BDO) biosynthetic pathway as a representative example, the procedures to construct P. pastoris cell factories are detailed using the established CRISPR-based multiplex genome integration toolkit, including donor plasmid construction, competent cell preparation and transformation, and transformant verification. The application of the CRISPR toolkit is demonstrated by the construction of engineered P. pastoris for converting methanol to BDO. This lays the foundation for the construction of P. pastoris cell factories harboring multigene biosynthetic pathways for the production of high-value compounds.

Key words Pichia pastoris, CRISPR/Cas9, Multiplex genome integration, 2,3-Butanediol, Microbial cell factories

1 Introduction

As a methylotrophic yeast, Pichia pastoris (also known as Komagataella phaffii) has attracted considerable attention in both academic research and industrial production and is currently one of the most widely used eukaryotic expression hosts [1–4]. P. pastoris has a very strong transcriptional regulation system and offers many advantages in genetic manipulation, post-translational modification, secretory expression, and high-density fermentation, which make it suitable for large-scale industrial production. Moreover, its efficient assimilation of methanol, a cheap substrate, makes P. pastoris a leading chassis for C1 compound biotransformation [1, 5–7]. In addition, the US Food and Drug Administration has recognized P. pastoris as a GRAS (generally recognized as safe) strain,

Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_10, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024


which has further enhanced its potential applications in the fields of drugs and food [8, 9]. However, the construction of P. pastoris cell factories has been largely hampered by the lack of genetic tools to manipulate polygenic biosynthetic pathways. As the most widely used genetic editing technique at present, the CRISPR/Cas9 system has been studied and developed in P. pastoris [10, 11]. In 2016, Weninger et al. first established the CRISPR/Cas9 system for genome editing of P. pastoris through systematic optimization of the expression of Cas9 and gRNA [12]. Afterwards, key elements of the CRISPR/Cas9 system, such as promoters, donor fragments, and the plasmid autonomous replication sequence (ARS), were further optimized to improve genome editing efficiency in P. pastoris [13–18]. Compared with traditional gene manipulation techniques, the CRISPR/Cas9 system mainly induces the endogenous repair mechanism through the double-strand break (DSB) generated by Cas9 cleavage, allowing the P. pastoris genome to be manipulated in a selection marker-free manner. Recently, our group has established a CRISPR-based synthetic biology toolkit that can be used to integrate multiple genes into the P. pastoris genome in a single step. Through the characterization of synthetic biology elements (e.g., integration sites) and the optimization of genome engineering parameters, the established CRISPR toolkit enables the integration of heterologous genes into a single locus, two loci, and three loci with efficiencies as high as 100%, ~93%, and ~75%, respectively. Finally, the application of the toolkit was demonstrated by efficiently integrating heterologous pathway genes into P. pastoris for the production of 2,3-butanediol (BDO), β-carotene, zeaxanthin, and astaxanthin [18]. Here, the procedures for the construction of P. pastoris cell factories using the established CRISPR-based genome integration toolkit are described in detail, mainly including linear donor fragment preparation, competent cell preparation and transformation, as well as transformant verification. While the BDO biosynthetic pathway is chosen as a case study, the toolkit is generally applicable to the construction of P. pastoris cell factories that synthesize other value-added products.

2 Materials

Prepare all solutions using ultrapure water and analytical-grade reagents. Store prepared solutions at 4 °C (unless otherwise stated).

2.1 Gibson Assembly (GA) Mix

1. 5× GA reaction buffer: The formula of the 5× GA reaction buffer is listed in Table 1. Aliquot 320 μL into each tube and store at -20 °C.
2. GA mix: The formula of the GA mix is listed in Table 2. Aliquot 15 μL into each tube and store at -20 °C.

2.2 Buffer and Reagents

1. BEDS: 0.16317 g N,N-dihydroxyethyl glycine (bicine), 3 mL ethylene glycol, and 18.217 g sorbitol; add water to a total volume of 95 mL (pH 8.3). Store at 4 °C after autoclaving.
2. DMSO: Dimethyl sulfoxide, filtered and stored at room temperature.
3. DTT: Dithiothreitol, prepared at a concentration of 1 M, filtered, and stored at room temperature.
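The BEDS recipe can be sanity-checked by converting the listed masses into concentrations. The sketch below rests on several assumptions that are not stated in the chapter: the molecular weights (bicine 163.17 g/mol, sorbitol 182.17 g/mol) are standard reference values, the function names are hypothetical, and the 950 μL BEDS plus 50 μL DMSO mixture from Subheading 3.2 is taken as the working solution.

```python
BICINE_MW = 163.17    # g/mol, standard value (assumption, not from the chapter)
SORBITOL_MW = 182.17  # g/mol, standard value (assumption, not from the chapter)


def molarity(mass_g: float, mw_g_per_mol: float, volume_ml: float) -> float:
    """Molar concentration (mol/L) of mass_g dissolved in volume_ml."""
    return mass_g / mw_g_per_mol / (volume_ml / 1000.0)


def beds_stock(volume_ml: float = 95.0) -> dict:
    """Concentrations in the BEDS stock as prepared in Subheading 2.2."""
    return {
        "bicine_mM": molarity(0.16317, BICINE_MW, volume_ml) * 1000.0,
        "sorbitol_M": molarity(18.217, SORBITOL_MW, volume_ml),
        "ethylene_glycol_pct": 3.0 / volume_ml * 100.0,
    }


def after_dmso(stock: dict, beds_ul: float = 950.0, dmso_ul: float = 50.0) -> dict:
    """Concentrations after mixing BEDS with DMSO (volumes assumed additive)."""
    factor = beds_ul / (beds_ul + dmso_ul)
    return {name: value * factor for name, value in stock.items()}


if __name__ == "__main__":
    for name, value in after_dmso(beds_stock()).items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, making the stock up to 95 mL rather than 100 mL yields roughly 10 mM bicine, 1 M sorbitol, and 3% ethylene glycol once DMSO is added to 5% (v/v).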

2.3 Culture Medium

1. LB: 0.5 g yeast extract, 2 g tryptone, and 1 g sodium chloride; add water to a total volume of 100 mL.
2. YPD: 1 g yeast extract, 2 g peptone, and 2 g glucose; add water to a total volume of 100 mL.
3. YPM: 1 g yeast extract and 2 g peptone; add water to a total volume of 98 mL. After autoclaving, add 2 mL of filtered methanol.
4. Incubation solution: Mix equal volumes of 1 M sorbitol and YPD and store at 4 °C.
5. Ampicillin: The stock concentration is 200 mg/mL and the working concentration is 200 mg/L.
6. Zeocin: The stock concentration is 100 mg/mL and the working concentration is 100 mg/L.

3 Methods

3.1 Plasmid Construction

Precise genome editing at multiple sites can be efficiently achieved using the CRISPR-based synthetic biology toolkit [18]. Taking the construction of the BDO biosynthetic pathway in P. pastoris as an example, the experimental procedures are described in detail (see Fig. 1). Three heterologous genes (AlsS, AlsD, and BDH) involved in BDO biosynthesis are introduced into three different integration loci (Int1, Int12, and Int21) [18]. First, the helper plasmids Int1-PAOX2*-T0547, Int12-PAOX1-TCYC1, and Int21-PFLD1-TDUS4 corresponding to the three integration sites are linearized at the pre-reserved restriction enzyme sites (see Note 1 and Table 3). The heterologous genes are cloned into the linearized helper plasmids using GA to obtain the donor plasmids Int1-AlsS, Int12-AlsD, and Int21-BDH (see Note 2). The constructed vectors are screened


Table 1 5× GA reaction buffer

Component            Concentration    Volume
Tris–HCl (pH 7.5)    1 M              3 mL
MgCl2                2 M              150 μL
dNTP                 25 mM            240 μL
DTT                  1 M              300 μL
NAD                  100 mM           300 μL
PEG-8000             1.5 g            /
H2O                  /                Add water to a total volume of 6 mL
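The final concentrations in the 5× GA reaction buffer follow from the stock concentrations and volumes in Table 1 by simple dilution. The following sketch shows the arithmetic; the dictionary layout and function name are illustrative choices, not part of the protocol.

```python
TOTAL_VOLUME_UL = 6000.0  # Table 1: water is added to a total volume of 6 mL

# component: (stock concentration in mM, volume added in uL), from Table 1
STOCKS = {
    "Tris-HCl (pH 7.5)": (1000.0, 3000.0),
    "MgCl2": (2000.0, 150.0),
    "dNTP": (25.0, 240.0),
    "DTT": (1000.0, 300.0),
    "NAD": (100.0, 300.0),
}


def final_mM(stock_mM: float, volume_ul: float, total_ul: float = TOTAL_VOLUME_UL) -> float:
    """Final concentration (mM) after diluting volume_ul of stock to total_ul."""
    return stock_mM * volume_ul / total_ul


FINAL = {name: final_mM(conc, vol) for name, (conc, vol) in STOCKS.items()}

# PEG-8000 is added as a solid: 1.5 g in 6 mL, expressed as % (w/v).
PEG_PERCENT_WV = 1.5 / 6.0 * 100.0

if __name__ == "__main__":
    for name, conc in FINAL.items():
        print(f"{name}: {conc:g} mM")
    print(f"PEG-8000: {PEG_PERCENT_WV:g}% (w/v)")
```

The computed values are 500 mM Tris–HCl, 50 mM MgCl2, 1 mM dNTP, 50 mM DTT, 5 mM NAD, and 25% (w/v) PEG-8000 in the 5× buffer.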

Table 2 GA mix

5× Gibson buffer          320 μL
T5 exonuclease            0.64 μL
Phusion DNA polymerase    20 μL
H2O                       Add water to a total volume of 1040 μL

and validated by colony PCR and DNA sequencing (Table 4). During the entire vector construction process, Escherichia coli DH5α is used as the host, LB is used as the medium, and ampicillin is used for selection. The specific steps are as follows:

1. Design primers according to the heterologous genes to be inserted and the selected helper plasmids (AlsS-F/R, AlsD-F/R, and BDH-F/R).
2. Amplify and purify the heterologous genes using the designed primers.
3. Linearize the helper plasmids with restriction enzymes, and then purify and recover them with a PCR product purification kit.
4. Clone the amplified heterologous gene fragments into the linearized helper plasmids using GA (see Note 3).
5. Add 3 μL of the GA reaction mixture to 30 μL of E. coli DH5α competent cells, mix well, and incubate on ice for 30 min.
6. Heat shock the mixture at 42 °C for 30 s and then incubate on ice for 3 min. Add 300 μL LB and culture at 37 °C, 220 rpm for 1 h.
7. Spread 100 μL of the cell culture onto LB plates containing ampicillin. Incubate the selection plates at 37 °C for 12–16 h.


Fig. 1 Procedures for the construction of P. pastoris cell factories using the CRISPR-based multiplex genome editing toolkit

Table 3 Helper plasmids for vector construction

Heterologous genes    Helper plasmids       Pre-reserved restriction enzyme cutting sites
AlsS                  Int1-PAOX2*-T0547     AatII and EcoRI
AlsD                  Int12-PAOX1-TCYC1     KpnI and BsaI
BDH                   Int21-PFLD1-TDUS4     BamHI and XhoI

8. Select eight transformants randomly to verify vector construction by colony PCR.
9. Sequence the heterologous genes of the verified transformants.
10. Amplify the heterologous gene expression cassettes containing 500 bp homologous arms on both sides using the donor amplification primers (Int1-donor-F/R, Int12-donor-F/R, and Int21-donor-F/R).
11. Purify the amplified expression cassettes containing homologous arms and store at -20 °C.


Table 4 Primers used for the amplification of heterologous genes

Primers          Primer sequencesᵃ
AlsS-F           agaagatcaaaaaacaactaattattcgaagacgtATGATGCACTCATCTGCCTGC
AlsS-R           atgtaagcgtgacataactaattacatgagacgtTTAGTTTTCGACGGAACGGATC
AlsD-F           aactttgcttgttcatacaattcttgatattcacagATGCAAAAAGTTGCTCTCGTAAC
AlsD-R           aatttttttttgtgctttgctcgattgacttggTTAGTTGAACACCATCCCACCATC
BDH-F            aatccccacaaacaaatcaactgagaaaaagtcATGAATAATGTAGCCGCTAAAAATG
BDH-R            ttagtcttaaactaagcgaaactacgtacaggtctcaTCAAGATTGCTTAGAGGCTTC
Int1-donor-F     ctgggcagtagtgaattggttgcatg
Int1-donor-R     acattgttcgtgaggctaatcc
Int12-donor-F    gatactacaagaaaggttgttgatg
Int12-donor-R    aatgtttctttactattgaatcttcag
Int21-donor-F    attgatctttgatcagattcacg
Int21-donor-R    accatgaaactgccattgaatacc

ᵃ The homology arms for recombination with the helper plasmids are shown in lower case, while the sequences for the amplification of heterologous pathway genes are shown in upper case

3.2 Transformation of P. pastoris Competent Cells

1. Pick a single colony of P. pastoris GS115-Cas9, inoculate it into 5 mL of YPD medium, and grow overnight at 30 °C and 250 rpm.
2. Transfer the overnight culture to 50 mL YPD to an initial OD600 between 0.15 and 0.2. Culture the yeast cells under the same conditions for an additional 4–5 h, until the OD600 reaches 0.8–1.0.
3. Spin down the yeast cells by centrifuging at 4000 rpm for 5 min at room temperature, and discard the supernatant.
4. Gently resuspend the yeast cells in 8.55 mL BEDS, 450 μL DMSO, and 1 mL DTT.
5. Incubate the cell suspension at 100 rpm and 30 °C for 5 min.
6. Spin down the yeast cells by centrifuging at 4000 rpm for 5 min at room temperature, and discard the supernatant.
7. Resuspend the yeast cells in 950 μL BEDS and 50 μL DMSO. Transfer the suspension to a pre-chilled 1.5 mL EP tube.
8. Spin down the yeast cells in the EP tube by centrifuging at 4000 rpm and 4 °C for 30 s, and discard the supernatant.


9. Resuspend the yeast cells in 190 μL BEDS and 10 μL DMSO. Aliquot the resultant competent cells at 40 μL/tube and store at -80 °C (see Note 4).
10. Mix 40 μL competent cells, 1500 ng gRNA plasmid, and 2000 ng linearized DNA fragment, transfer the mixture to a pre-chilled electroporation cuvette, and incubate on ice for 3 min (see Note 5).
11. Electroporate using the following parameters: cuvette gap, 2.0 mm; charging voltage, 1500 V; resistance, 400 Ω; capacitance, 25 μF.
12. Add 1 mL of the incubation solution immediately after electroporation. Culture the cell suspension at 30 °C and 220 rpm for 4 h (see Note 6).
13. Spread 100 μL of the yeast culture onto a YPD plate containing zeocin (see Note 7).
14. Incubate the agar plates at 30 °C for 2–3 days.

3.3 Validation of P. pastoris Transformants

1. Pick ten transformants randomly from the selection plate, and add 20 μL of MightyPrep reagent for DNA to each. Incubate the mixture at 98 °C for 15 min.
2. Spin down the cell debris by centrifuging and use the supernatant as the template for diagnostic PCR.
3. Perform PCR amplification using the verification primers for genome integration (Table 5), and determine whether the heterologous gene expression cassette is integrated according to the size of the amplified fragment (Fig. 2).
4. Sequence the fragments of the correct size to further confirm the integration of the foreign genes. Calculate the integration efficiency from the number of correct transformants (see Note 8). As shown in Fig. 2, the AlsS, AlsD, and BDH expression cassettes are integrated with efficiencies of 100%, 90%, and 70%, respectively. As for the BDO biosynthetic pathway, seven out of ten clones show the integration of all three expression cassettes. In other words, the three-gene BDO biosynthetic pathway is integrated with an efficiency of 70% in a single step using the CRISPR-based genome integration toolkit.
5. Inoculate the correctly sequenced strains into 3 mL YPD medium and incubate at 30 °C, 220 rpm for 16 h.
6. Transfer the seed culture to 50 mL YPM and carry out fermentation at 30 °C, 120 rpm for 3 days.
7. Spin down the yeast cells from 1 mL of fermentation broth by centrifuging at 12000 rpm for 1 min. Dilute the supernatant tenfold, filter it, and place it in HPLC vials.


Table 5 Primers for the verification of genome integration

Heterologous genes    Primers used for genome validation
Int1-L                aaagtgaatctgaacgttgc
Int1-R                tgttttctcctctgaatcacg
Int12-L               tggttccattagttcgactgc
Int12-R               aacaccagtaatgacacagc
Int21-L               agcacagaaatgaaatctatcg
Int21-R               ttattgactgagatgctcagg

Fig. 2 Verification of the integration of the BDO pathway genes by colony PCR
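The per-locus and whole-pathway integration efficiencies quoted in step 4 of Subheading 3.3 can be tallied from per-clone PCR calls as sketched below. The call pattern is hypothetical: it was constructed only to be consistent with the reported 100%, 90%, and 70% per-locus values and the seven triple-positive clones, and it is not read from Fig. 2. Function names are likewise illustrative.

```python
from typing import Dict, List

# Hypothetical diagnostic PCR calls for ten clones (True = cassette integrated).
PCR_CALLS: Dict[str, List[bool]] = {
    "Int1 (AlsS)": [True] * 10,
    "Int12 (AlsD)": [True] * 7 + [False] + [True] * 2,
    "Int21 (BDH)": [True] * 7 + [False] * 3,
}


def per_locus_efficiency(calls: Dict[str, List[bool]]) -> Dict[str, float]:
    """Fraction of clones that are positive at each locus."""
    return {locus: sum(flags) / len(flags) for locus, flags in calls.items()}


def pathway_efficiency(calls: Dict[str, List[bool]]) -> float:
    """Fraction of clones that are positive at every locus simultaneously."""
    per_clone = list(zip(*calls.values()))
    if not per_clone:
        raise ValueError("no clones were scored")
    return sum(all(clone) for clone in per_clone) / len(per_clone)


if __name__ == "__main__":
    for locus, eff in per_locus_efficiency(PCR_CALLS).items():
        print(f"{locus}: {eff:.0%}")
    print(f"All three loci: {pathway_efficiency(PCR_CALLS):.0%}")
```

Note that the pathway efficiency is not the product of the per-locus values; it depends on which clones fail at which locus, so it has to be counted clone by clone.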


Fig. 3 HPLC analysis of the production of BDO from methanol

8. Detect the production of BDO using a Shimadzu HPLC chromatograph equipped with an Aminex HPX-87H column (Bio-Rad, Hercules, CA, USA) and a Shimadzu RID-20A refractive index detector. The column temperature is maintained at 65 °C, the mobile phase is 0.5 mM sulfuric acid, the constant flow rate is 0.6 mL/min, and the injection volume is 10 μL.
9. Dissolve BDO standards at different concentrations for HPLC analysis. Identify BDO qualitatively according to the retention time of the standard (Fig. 3). Determine the titer of the target product quantitatively from the peak areas of the standards (standard curve method, see Note 9).
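The standard curve method in step 9 amounts to a linear fit of peak area against standard concentration, followed by back-calculation of the sample titer and correction for the dilution made before injection. The sketch below uses ordinary least squares in plain Python; the standard concentrations, peak areas, and function names are all hypothetical placeholders, not measured values.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard concentrations must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def titer_g_per_l(peak_area: float, slope: float, intercept: float,
                  dilution_factor: float) -> float:
    """Concentration in the undiluted broth from a diluted sample's peak area."""
    return (peak_area - intercept) / slope * dilution_factor


# Hypothetical BDO standards (g/L) and their peak areas (arbitrary units).
STD_CONC = [0.5, 1.0, 2.0, 4.0]
STD_AREA = [1000.0, 2000.0, 4000.0, 8000.0]
SLOPE, INTERCEPT = fit_line(STD_CONC, STD_AREA)

if __name__ == "__main__":
    # Sample diluted tenfold before injection, as in step 7.
    print(f"{titer_g_per_l(3000.0, SLOPE, INTERCEPT, 10.0):.2f} g/L")
```

In practice the sample peak area should fall within the range spanned by the standards; extrapolating beyond the highest standard is a common source of error.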

4 Notes

1. All the plasmids used in this chapter can be requested from the corresponding author with a Materials Transfer Agreement.
2. GA: 15 μL GA mix, 0.03 pmol linearized helper plasmid, and 0.03 pmol heterologous gene; add water to a total volume of 20 μL. After mixing thoroughly, incubate at 50 °C for 1 h.
3. As restriction sites are reserved on the helper plasmids, the heterologous pathway genes can also be cloned by restriction digestion and ligation.
4. The transformation efficiency of the competent cells does not change within 2 weeks and decreases afterwards.
5. For single-locus, two-loci, and three-loci editing, the amounts of gRNA plasmid and linearized DNA fragment are 500 ng and 1000 ng, 1000 ng and 1500 ng, and 1500 ng and 2000 ng, respectively.
6. Incubation can be omitted if an auxotrophic marker is used.
7. Spread more yeast cells when more loci are edited. About 100 μL, 150 μL, and 200 μL of yeast culture are suggested for single-locus, two-loci, and three-loci integration, respectively.

166

Jucan Gao et al.

8. To a certain extent, the longer the homologous arms, the higher the integration efficiency. 9. The methanol peak and BDO peak are relatively close to each other, and the samples should be diluted five to tenfold in ddH2O for HPLC analysis.
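Note 2 specifies DNA amounts in pmol, whereas DNA is usually quantified in ng/μL. The sketch below shows one way to convert between the two. It is an illustration rather than part of the protocol: it assumes double-stranded DNA with an average mass of about 650 g/mol per base pair (a commonly used approximation), and the fragment length, stock concentration, and function name are hypothetical.

```python
AVG_DALTON_PER_BP = 650.0  # approximate average mass of one dsDNA base pair (assumption)


def pmol_to_ng(pmol, length_bp):
    """Mass in ng of `pmol` picomoles of a dsDNA fragment of `length_bp` base pairs."""
    # pmol * (g/mol) gives pg; divide by 1000 to obtain ng.
    return pmol * length_bp * AVG_DALTON_PER_BP / 1000.0


# Hypothetical 5000 bp linearized helper plasmid at 50 ng/uL.
mass_ng = pmol_to_ng(0.03, 5000)
volume_ul = mass_ng / 50.0
print(f"0.03 pmol of a 5000 bp fragment is about {mass_ng:.1f} ng ({volume_ul:.2f} uL at 50 ng/uL)")
```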

Acknowledgements

This work was supported by the National Key Research and Development Program of China (2021YFC2103200), the Natural Science Foundation of Zhejiang Province (LR20B060003), the Natural Science Foundation of China (22278361), and the Fundamental Research Funds for the Central Universities (226-202200214).

Chapter 11

Genome Editing, Transcriptional Regulation, and Forward Genetic Screening Using CRISPR-Cas12a Systems in Yarrowia lipolytica

Adithya Ramesh, Sangcheon Lee, and Ian Wheeldon

Abstract

Class II Type V endonucleases have increasingly been adapted to develop sophisticated and easily accessible synthetic biology tools for genome editing, transcriptional regulation, and functional genomic screening in a wide range of organisms. One such endonuclease, Cas12a, is an attractive alternative to Cas9-based systems. Its ability to mature its own guide RNAs (gRNAs) from a single transcript has been leveraged for easy multiplexing, and because it does not require a tracrRNA element, its gRNA expression cassettes are short. To extend these functionalities to the industrially relevant oleaginous yeast Yarrowia lipolytica, we developed a set of CRISPR-Cas12a vectors for easy multiplexed gene knockout, repression, and activation. We further extended the utility of this CRISPR-Cas12a system to functional genomic screening by constructing a genome-wide guide library targeting every gene with eightfold coverage. Pooled CRISPR screens conducted with this library were used to profile Cas12a guide activities and to develop a machine learning algorithm that could accurately predict highly efficient Cas12a gRNAs. In this protocols chapter, we first present a method by which protein-coding genes may be functionally disrupted via indel formation with CRISPR-Cas12a systems. We then describe how Cas12a fused to a transcriptional regulator can be used with shortened gRNAs to achieve transcriptional repression or activation. Finally, we describe the design, cloning, and validation of a genome-wide library, as well as a protocol for executing a pooled CRISPR screen to determine guide activity profiles in a genome-wide context in Y. lipolytica. The tools and strategies discussed here expand the list of available synthetic biology tools for facile genome engineering in this industrially important host.

Key words Genome editing, Transcriptional regulation, Synthetic Biology, CRISPR-Cas12a, CRISPR interference and activation, Pooled CRISPR screens, Yarrowia lipolytica

Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_11, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024

1 Introduction

Yarrowia lipolytica is a nonconventional oleaginous yeast that can utilize a wide variety of inexpensive and renewable substrates (such as sugars, glycerol, fatty acids, alkanes, and other hydrophobic substrates) as carbon sources. It also displays halotolerance and pH tolerance, with the ability to grow under high levels of salt stress (up to 10% w/v) and over a wide pH range from 4 to 11 [1, 2]. As an oleaginous yeast, Y. lipolytica natively accumulates high levels of intracellular lipids as triacylglycerides (TAGs) [3–7]. As a consequence of this oleaginous behavior, it can accommodate a high flux of the metabolic precursor acetyl-CoA. These characteristics have made Y. lipolytica an attractive industrial host for the production of a wide variety of chemicals such as lipid-derived biofuels and oleochemicals, organic acids, terpenoids, and sugar alcohols [8–16].

In part, Y. lipolytica owes its success as a production host to the development of synthetic biology tools for genome editing over the past few years. There now exists a suite of CRISPR-Cas9-based tools for gene editing, integration, and transcriptional regulation for facile genetic engineering [17–22]. As increasingly complex host and pathway engineering is performed to push the limits of this organism for industrial bioprocessing, more advanced tools will be required. Synthetic biology tools that facilitate multiplexed and combinatorial genome editing strategies are necessary for shorter design-build-test-learn cycles during strain engineering. Because Y. lipolytica is a nonconventional yeast, much of its genome encodes proteins with unknown functions. Genome-wide screening strategies therefore provide a promising avenue for elucidating the as-yet-undiscovered genetics and metabolism of this host.

The growing number of available CRISPR endonucleases offers potent solutions for the development of such advanced synthetic biology tools, and CRISPR-Cas12a systems show several attractive features for genome engineering. Cas12a from Acidaminococcus sp. BV3L6 (AsCpf1), Lachnospiraceae bacterium ND2006 (LbCpf1), and Francisella novicida U112 (FnCpf1) are some of the most commonly adapted CRISPR endonucleases for genome editing.
Cas12a targets genomic loci in a manner similar to Cas9 but has the advantage of processing its own CRISPR-RNA arrays [23]. The ability to mature its own guide RNAs (gRNAs) from a single transcript can be leveraged for easy multiplexing [24]. Cas12a also benefits from a T-rich PAM sequence (TTTV) that is distinct from the PAM recognized by Cas9 and, unlike Cas9, does not require a tracrRNA sequence, which shortens gRNA expression cassettes [25]. Moreover, Cas12a is also purported to show less off-target editing than its Cas9 counterpart. Y. lipolytica, like most nonmodel yeasts, preferentially repairs DNA using error-prone nonhomologous end joining, which causes indel mutations and, when these occur in protein-coding genes, loss of gene function [26]. Cas12a can thus easily induce double-strand breaks (DSBs) to disrupt gene function in this yeast. Coupled with Cas12a's ability to mature its own gRNAs, multiplexed gene disruption can also easily be achieved [17, 21].

Another interesting feature of Cas endonucleases is their gRNA length-dependent nuclease activity [21, 27]. While full-length gRNAs (22–25 nt) effectively induce Cas12a to cause a DSB at the target locus, gRNAs truncated to 16 nt or fewer still target Cas12a to the desired locus but do not induce Cas12a nuclease activity. This feature may be leveraged to control gene transcription. Cas12a fused to a transcriptional regulator protein may be targeted upstream of a desired gene to modulate its expression. In this way, Cas12a function can be changed simply by varying gRNA length, without the need for catalytic deactivation of the endonuclease. Interestingly, Cas12a fused to a transcriptional regulator still retains nuclease activity when used with a full-length spacer, allowing for easy switching between editing and transcriptional control modalities [21].

Genome-wide pooled CRISPR screens have shown great success in discovering genotype-phenotype relationships and engineering new phenotypes in a wide range of organisms [28–32]. The design of highly active sgRNAs is critical to obtaining accurate screening results. Machine learning-based guide activity prediction algorithms are potent tools for designing CRISPR gRNAs in genetic engineering applications [33–37]. However, predictions from such tools are organism-specific, and training them requires large datasets of guide activity profiles generated in the organism in question. One strategy for obtaining gRNA activities at such a scale involves the design of gRNA libraries that target an organism at the whole-genome level [37–39]. Subsequently, a negative selection screen is performed in a background in which the dominant DNA repair mechanism is disrupted. The gRNA library is transformed into such a strain in the presence of Cas activity, and guide activity is measured as the change in abundance of a gRNA compared to a background lacking the Cas protein. Highly active guides cause DSBs more efficiently, and the lack of DNA repair causes cell death and a decline in abundance of the corresponding gRNA.

The following sections describe the materials and experimental protocols necessary to implement CRISPR-Cas12a-based genome editing, transcriptional regulation, and pooled CRISPR screening in Y. lipolytica. Similar protocols for genome engineering with Cas9 have been described in detail in a previous book chapter [40].
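The guide activity measurement described above (change in gRNA abundance relative to a background lacking the Cas protein) can be sketched as a log2 ratio of normalized read counts. This is a minimal illustration under stated assumptions, not the scoring method used by the authors: the read counts, the pseudocount, and the function names are hypothetical.

```python
import math


def normalized_abundance(counts, pseudocount=1.0):
    """Scale raw read counts by sequencing depth; the pseudocount avoids division by zero."""
    total = sum(counts.values())
    return {guide: (count + pseudocount) / total for guide, count in counts.items()}


def activity_scores(control_counts, cas_counts):
    """log2(control abundance / Cas12a-strain abundance); higher means stronger depletion."""
    control = normalized_abundance(control_counts)
    treated = normalized_abundance(cas_counts)
    return {guide: math.log2(control[guide] / treated[guide]) for guide in control}


# Hypothetical read counts for three guides, illustration only.
control_counts = {"g1": 99, "g2": 99, "g3": 99}   # strain lacking Cas12a
cas_counts = {"g1": 99, "g2": 24, "g3": 174}      # repair-deficient strain expressing Cas12a

scores = activity_scores(control_counts, cas_counts)
for guide, score in scores.items():
    print(guide, round(score, 2))
```

In this toy example, g2 is depleted fourfold in the Cas12a-expressing strain and therefore receives the highest score, whereas g1 is unchanged.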

2 Materials

2.1 Molecular Biology Reagents

1. Plasmids.
2. Standard desalted DNA oligos for cloning of sgRNA sequences, homology donor vectors, and genome screening and amplification.
3. Cloning PCR reagents: Phusion High-Fidelity DNA Polymerase, 5× Phusion HF buffer, and 10 mM dNTP mix (NEB, M0530L).
4. Screening PCR reagents: Taq DNA Polymerase, 10× Standard Taq Reaction Buffer, and 10 mM dNTP mix (NEB, M0273L).
5. Cloning enzymes: SpeI-HF (NEB, R3133S), BssHII (NEB, R0199L), NheI-HF (NEB, R3131L), AvrII (NEB, R0174S), XmaI (NEB, R0180L), Calf Intestinal Alkaline Phosphatase (CIP), T4 DNA ligase (NEB, M0202M).
6. 10× Cutsmart buffer (NEB, B6004S).
7. 10× T4 DNA ligase buffer.
8. Purified water (DNase/RNase-free) (ddH2O).
9. Thermocycler.
10. PCR tubes.
11. DNA cleanup kit (Zymo Research, D4004).
12. Gibson Assembly Master Mix 2× (GA MM) (NEB, M5510AA).
13. Nanodrop or similar UV-Vis spectrophotometer (for DNA quantification).
14. Competent Escherichia coli, such as chemically competent DH5α or TOP10.
15. Incubator for plates.
16. Incubator with shaking for liquid cultures.
17. Lysogeny broth (LB).
18. Agar.
19. Petri dishes.
20. Ampicillin.
21. LB-ampicillin media (1× LB media, 100 μg/mL ampicillin).
22. LB-ampicillin agar plates (1× LB media, 100 μg/mL ampicillin, 15 g/L agar).
23. 14 mL Falcon Round-Bottom Polystyrene Tubes.
24. Plasmid miniprep kit (Zymo Research, D4037).
25. Access to Sanger sequencing services (e.g., Genewiz).
26. Microcentrifuge tubes.
27. Microcentrifuge.
28. DNA gel extraction kit (Zymo Research, D4002).
29. 5-Fluoroorotic acid (5-FOA).
30. YPD with 5-FOA (1× YPD, 1 mg/mL 5-FOA).
31. Linearized homology donor (custom designed for a given application).

2.2 Cell Culture and Transformation

1. Yarrowia lipolytica strain of interest stored at -80 °C as glycerol stock, e.g., Y. lipolytica PO1f.
2. YPD media (20 g/L peptone, 10 g/L yeast extract, 2% glucose).
3. YPD agar plates (1× YPD, 20 g/L agar).
4. Deoxyribonucleic acid, single stranded from salmon testes, 10 mg/mL.
5. 10× TE buffer, pH 8.0 (100 mM Tris–HCl, 10 mM EDTA).
6. Lithium acetate (LiAc), 1 M.
7. Triacetin.
8. β-Mercaptoethanol.
9. Transformation buffer (0.3 M LiAc, 1× TE).
10. ssDNA mix (8 mg/mL deoxyribonucleic acid, single stranded from salmon testes, 1× TE).
11. Triacetin mix (5% v/v β-mercaptoethanol in triacetin).
12. Polyethylene glycol (PEG), MW = 3350.
13. PEG solution (70% w/v polyethylene glycol in H2O).
14. Complete supplement mixture without leucine (CSM-Leu) powder (Sunrise Science Products, 1005).
15. Complete supplement mixture without leucine and uracil (CSM-Leu-Ura) powder (Sunrise Science Products, 1038).
16. Glucose.
17. Yeast nitrogen base without amino acids.
18. Synthetic defined without leucine (SD-Leu) media (6.7 g/L yeast nitrogen base without amino acids, 0.67 g/L CSM-Leu, 2% glucose).
19. SD-Leu agar plates (6.7 g/L yeast nitrogen base without amino acids, 0.67 g/L CSM-Leu, 2% glucose, 20 g/L agar).
20. Synthetic defined without leucine and uracil (SD-Leu-Ura) media (6.7 g/L yeast nitrogen base without amino acids, 0.64 g/L CSM-Leu-Ura, 2% glucose).
21. SD-Leu-Ura agar plates (6.7 g/L yeast nitrogen base without amino acids, 0.64 g/L CSM-Leu-Ura, 2% glucose, 20 g/L agar).
22. Agarose.
23. 0.5× TBE (45 mM Tris–Borate, 1 mM EDTA, pH 8.0).
24. 1% agarose gel (0.5× TBE, 10 g/L agarose).
25. 250 mL baffled flask.
26. TE with LiAc (1× TE, 100 mM LiAc).
27. TE with LiAc and PEG (1× TE, 100 mM LiAc, 40% w/v PEG).
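Several of the working solutions above are prepared by diluting the listed stocks. As a worked example of the underlying arithmetic (C1 × V1 = C2 × V2), the sketch below computes the stock volumes for the transformation buffer (item 9) from 1 M LiAc (item 6) and 10× TE (item 5). The 10 mL batch size and the function name are assumptions made for illustration.

```python
def stock_volume(final_conc, stock_conc, final_volume):
    """Volume of stock needed so that the final solution has `final_conc` (C1*V1 = C2*V2)."""
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_conc * final_volume / stock_conc


final_ml = 10.0  # hypothetical batch size

liac_ml = stock_volume(0.3, 1.0, final_ml)   # 0.3 M LiAc from a 1 M stock
te_ml = stock_volume(1.0, 10.0, final_ml)    # 1x TE from a 10x stock
water_ml = final_ml - liac_ml - te_ml        # remainder made up with ddH2O

print(f"1 M LiAc: {liac_ml:.1f} mL, 10x TE: {te_ml:.1f} mL, ddH2O: {water_ml:.1f} mL")
```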

3 Methods

3.1 Design and Cloning of Gene Disruption Constructs

This protocol is used for the design and generation of episomal plasmids for genome editing and transcriptional regulation using the CRISPR-LbCas12a system in Yarrowia lipolytica (Fig. 1). These plasmids can be used to disrupt genes either individually or in a multiplexed manner via indel formation as described in Subheading 3.2. These plasmids are available with either LEU2 or URA3 as the selection marker (see Note 1).

1. Target sequences for Lachnospiraceae bacterium ND2006 CRISPR-Cas12a (or Cpf1) have two components: a PAM sequence, which consists of the bases "TTTV" (V = A/G/C), and a 23–25 bp guide sequence immediately 3′ of the PAM. Target sequences close to the start codon and in exons are preferable to ensure that a mutation eliminates function of the targeted gene. Target sequences should also be verified for uniqueness using a BLAST search against the genome of Y. lipolytica.

[Figure 1, panels a–d. Recoverable axis labels: (b) MGA1 disruption efficiency (%) for gRNA 1 to gRNA 5; (c) disruption efficiency (%) for MGA1, CAN1, and URA3, alone and in multiplexed combinations; (d) MGA1 disruption efficiency (%) versus spacer length (14 to 31 nt).]

Fig. 1 CRISPR-LbCas12a genome editing and transcriptional regulation. (a) Schematic showing the pCpf1_yl plasmid and the expression cassettes for LbCpf1 and the gRNA. When constructing multiplexed cassettes, the target sequences may be tiled one after another, separated by a direct repeat (DR) sequence. (b) Gene disruption efficiency for five different gRNAs targeting MGA1. The best gRNA for MGA1 achieved a disruption efficiency of over 80%. (c) Efficiency of multiplexed disruption of three different genes in Y. lipolytica (MGA1, CAN1, and URA3). (d) Effect of gRNA length on gene disruption efficiency. Reprinted with permission from Guide RNA engineering enables dual purpose CRISPR-Cpf1 for simultaneous gene editing and gene regulation in Yarrowia lipolytica (ACS Synth. Biol. 2020, 9 (4), 967–971)


2. After selecting a 23–25 bp target sequence with a 5′ PAM sequence (of the form TTTV(N)25), order oligos containing the 23–25 bp guide sequence as shown below. For uniformity, 25 bp guide sequences are shown throughout: (N)25 denotes a 25 nt guide, and (N1)25, (N2)25, and (N3)25 denote the guides for the first, second, and third target genes. In the reverse primer templates, the N positions are the reverse complement of the corresponding guide.

• 2a. Oligos for single gene disruptions are shown below.

Forward CRISPR primer template
5′-ATTTCTACTAAGTGTAGAT(N)25TTTTTTACGTCTAAGAAA-3′

Reverse CRISPR primer template
5′-TTTCTTAGACGTAAAAAA(N)25ATCTACACTTAGTAGAAAT-3′

• 2b. When constructing a plasmid for dual knockouts, the 23–25 bp target sequences may be tiled one after another, separated by a direct repeat (DR) sequence. The forward and reverse CRISPR templates may each be split into two so that they can still be ordered as primers of up to 90 bp in length, which are synthesized easily and inexpensively (e.g., by IDT). Subscripts 1 and 2 signify the target sequences used to knock out each of the two genes.

Forward CRISPR primer template 1
5′-GGCGCATAATTTCTACTAAGTGTAGAT(N1)25AATTTCTACTAAGTGTAGATCTACCCGATATCT-3′

Reverse CRISPR primer template 1
5′-AGATATCGGGTAGATCTACACTTAGTAGAAATT(N1)25ATCTACACTTAGTAGAAATTATGCGCC-3′

Forward CRISPR primer template 2
5′-TCCAACGACTCGTTCAATTTCTACTAAGTGTAGAT(N2)25TTTTTTACGTCTAAGAAACCAT-3′

Reverse CRISPR primer template 2
5′-ATGGTTTCTTAGACGTAAAAAA(N2)25ATCTACACTTAGTAGAAATTGAACGAGTCGTTGGA-3′

• 2c. Similarly, CRISPR templates to be cloned into plasmids for triple knockouts are presented below. Subscripts 1, 2, and 3 signify the target sequences used to knock out each of the three genes.

Forward CRISPR primer template 1
5′-CAAATTTCTACTAAGTGTAGAT(N1)25AATTTCTACTAAGTGTAGAT(N2)25-3′

Reverse CRISPR primer template 1
5′-(N2)25ATCTACACTTAGTAGAAATT(N1)25ATCTACACTTAGTAGAAATTTG-3′

Forward CRISPR primer template 2
5′-(N2)25AATTTCTACTAAGTGTAGAT(N3)25TTTTTTACGTCTAAGAAACC-3′

Reverse CRISPR primer template 2
5′-GGTTTCTTAGACGTAAAAAA(N3)25ATCTACACTTAGTAGAAATT(N2)25-3′

• Resuspend the ordered oligos and mix each forward primer with its respective reverse primer under the conditions shown.

   12.5 μL   ddH2O
   2.5 μL    10× Cutsmart buffer
   5 μL      Forward CRISPR primer
   5 μL      Reverse CRISPR primer

The mixture should then be placed in a thermocycler and subjected to the following program to anneal the oligos and yield double-stranded DNA.

   95 °C   4 min
   95 °C   1 min, then ramp to 40 °C at 10 °C/min
   4 °C    Until ready

3. To prepare the vector for cloning, digest the pCpf1_yl plasmid with the restriction enzyme SpeI.

   39 μL   ddH2O
   5 μL    10× Cutsmart buffer
   5 μL    200 ng/μL pCpf1_yl
   1 μL    SpeI

After digestion, a silica-column-based kit such as one from Zymo can be used to purify the digested backbone. The digested vector should then be quantified using a Nanodrop or another UV-Vis spectrophotometer.

4. Isothermal assembly is then used to clone the annealed oligos into the digested vector. We use Gibson Assembly master mix and the mixture shown below, but other DNA assembly methods can be used.

   3 μL   ddH2O
   1 μL   Annealed oligos
   1 μL   50 ng/μL SpeI-digested pCpf1_yl
   5 μL   2× GA MM

Place the mixture in a thermocycler and subject it to the following program.

   50 °C   60 min
   4 °C    Until ready

5. Transform 1–5 μL of the Gibson Assembly reaction into the preferred competent Escherichia coli cells, following the recommended procedure. Our lab uses chemically competent TOP10. Plate the transformation on LB agar plates containing ampicillin to select for successful transformants. Incubate the plates at 37 °C overnight (or 16–20 h).

6. Ensure that the incubated plate has single colonies. Typically, plasmid transformation into TOP10 results in 100–1000 colonies, although this depends on the competent cells used. To maximize the odds of obtaining a correct clone, pick between two and five colonies, use each to inoculate a 2 mL LB-ampicillin liquid culture, and incubate with shaking at 37 °C overnight. The following day, isolate plasmid from the cultures using a spin column kit from the preferred manufacturer.

7. Sequence the isolated plasmid using a primer that binds outside of the cloned region. The sequencing primer we use is shown below. The fraction of correct clones varies from ~20% to 100%, depending on the sequence of the 25-mer guide. Cloning efficiencies of dual and triple knockout plasmids are generally lower, as they require a three-fragment Gibson assembly.

pCpf1_yl sequencing primer
5′-CTTCGACTCTAGAGGATCTGG-3′
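The target selection and single-guide oligo design in steps 1 and 2a can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' design tool: the flanking sequences are taken from the templates in step 2a, while the function names and the example sequence are hypothetical. It scans only the strand supplied (pass the reverse complement to scan the other strand) and does not perform the uniqueness check against the genome.

```python
# Flanking sequences from the single-guide templates in step 2a.
FWD_PREFIX = "ATTTCTACTAAGTGTAGAT"
FWD_SUFFIX = "TTTTTTACGTCTAAGAAA"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_targets(seq, guide_len=25):
    """Return (PAM position, guide) for every TTTV PAM on the given strand."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 4 - guide_len + 1):
        pam = seq[i:i + 4]
        if pam[:3] == "TTT" and pam[3] in "ACG":  # V = A, C, or G
            hits.append((i, seq[i + 4:i + 4 + guide_len]))
    return hits


def single_guide_oligos(guide):
    """Forward and reverse oligos for annealing; the reverse oligo is the reverse complement."""
    forward = FWD_PREFIX + guide + FWD_SUFFIX
    return forward, reverse_complement(forward)


# Hypothetical sequence containing one TTTA PAM followed by a 25 nt guide.
example = "GG" + "TTTA" + "ACGT" * 6 + "A" + "CC"
targets = find_targets(example)
fwd, rev = single_guide_oligos(targets[0][1])
print(targets)
print(fwd)
print(rev)
```

Building the reverse oligo as the reverse complement of the forward oligo reproduces the flanks of the reverse template in step 2a and guarantees that the two oligos anneal over their full length.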


3.2 Transformation of Y. lipolytica with CRISPR-Cas12a Plasmids for Genome Editing or Transcriptional Regulation

The protocol described below is a method for rapid, high-efficiency transformation of Y. lipolytica with CRISPR-Cas12a plasmids. Details of the protocol depend upon the application as described in Note 2 and Subheading 3.5.

1. To prepare Y. lipolytica cells for transformation, streak cells stored at -80 °C for single colonies on solid media and incubate at 30 °C for 24 h. Pick a single colony and use it to inoculate 2 mL of liquid YPD media. Grow the culture at 30 °C for 24 h with shaking at 200 RPM so that the cells reach stationary phase.

2. Aliquot 250 μL of stationary phase cells into microcentrifuge tubes, pellet the cells via centrifugation at 6500 × g, wash the cells in 250 μL transformation buffer, and resuspend in 100 μL transformation buffer. Add 3 μL of ssDNA mix, approximately 1 μg of plasmid DNA (between 1 and 10 μL in H2O), and 15 μL of triacetin-2-mercaptoethanol mix, in that order. The ssDNA mix contains 8 mg/mL of sheared salmon sperm DNA (200–500 bp fragment size) in 1× Tris–EDTA. The triacetin-2-mercaptoethanol mix contains 95% v/v triacetin and 5% v/v 2-mercaptoethanol. Mix the solution via pipetting and incubate for 30 min at room temperature. Add 150 μL PEG solution, mix well via pipetting, and incubate for 30 min at room temperature. Heat shock at 37 °C for 15 min, then add 1 mL ddH2O and mix via pipetting. Pellet the cells via centrifugation, resuspend in 100 μL of ddH2O, and use to inoculate 2 mL of SD-Leu liquid media.

3. Grow the cells for 48–72 h in SD-Leu media at 30 °C with shaking at 200 RPM, until growth is visible. The cells may also be subcultured to improve editing efficiency, because this gives Cas12a more time to cut. Subculturing is typically done by transferring 15 μL of cells into 2 mL of fresh selective media. Plate dilutions of the cells on YPD plates to isolate single colonies. Screening is done using colony PCR as described in Subheading 3.3. To cure plasmids from successful mutants, grow single colonies in YPD media at 30 °C for 24 h with shaking at 200 RPM and plate to isolate single colonies. Plasmid removal can be confirmed by restreaking on SD-Leu agar plates.
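When several transformations are run in parallel, it helps to compute up front how much of each reagent to prepare and what volume of each plasmid prep delivers roughly 1 μg of DNA. The sketch below does this bookkeeping using the per-transformation volumes from step 2. The 10% overage, the plasmid concentration, and the function names are assumptions for illustration and are not part of the protocol.

```python
# Per-transformation reagent volumes (uL) taken from step 2.
PER_TRANSFORMATION_UL = {
    "transformation buffer": 350.0,  # 250 uL wash + 100 uL resuspension
    "ssDNA mix": 3.0,
    "triacetin mix": 15.0,
    "PEG solution": 150.0,
}


def plasmid_volume_ul(conc_ng_per_ul, mass_ng=1000.0):
    """Volume delivering about 1 ug of plasmid; step 2 expects 1 to 10 uL."""
    volume = mass_ng / conc_ng_per_ul
    if not 1.0 <= volume <= 10.0:
        raise ValueError("adjust the plasmid concentration so the volume is 1 to 10 uL")
    return volume


def reagent_totals(n_transformations, overage=0.10):
    """Total volume of each reagent to prepare, with a pipetting overage (assumed 10%)."""
    factor = n_transformations * (1.0 + overage)
    return {name: vol * factor for name, vol in PER_TRANSFORMATION_UL.items()}


plasmid_ul = plasmid_volume_ul(250.0)  # hypothetical prep at 250 ng/uL
totals = reagent_totals(4)
print(f"plasmid: {plasmid_ul:.1f} uL per transformation")
for name, volume in totals.items():
    print(f"{name}: {volume:.0f} uL")
```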

3.3 Genome Mutation Analysis

Mutations resulting from error-prone repair of double-strand breaks introduced by CRISPR-Cas12a can be identified by the protocol described here.

1. Amplify the targeted sequence in the genome via colony PCR using primers approximately 250 bp upstream and downstream of the target site. Pick single colonies from transformed and plated cells and use them as template in a PCR with the volumes and concentrations shown below.

   20.875 μL   ddH2O
   2.5 μL      10× standard Taq buffer
   0.5 μL      10 μM forward primer
   0.5 μL      10 μM reverse primer
   0.5 μL      10 mM dNTP mix
   0.125 μL    Taq DNA polymerase

2. Run the reaction using the following thermocycler parameters.

   95 °C       10 min   1 cycle
   95 °C       30 s
   52 °C (a)   20 s     35 cycles
   68 °C       45 s
   68 °C       5 min    1 cycle
   4 °C        Hold     1 cycle

   (a) The annealing temperature depends on the screening primers; we target 52 °C when designing primers.

3. Confirm successful PCR using a 1% agarose gel to visualize a single band of the appropriate size. Use a column-based DNA purification kit to isolate the PCR product, and sequence it by Sanger sequencing (see Note 3). Align the sequencing results to the native DNA sequence. For functional disruptions, confirm that a frameshift mutation resulting in a premature stop codon has been introduced.

3.4 Design, Cloning, and Application of CRISPRi (Interference) and CRISPRa (Activation) Using CRISPR-Cas12a

By targeting gRNAs to the promoter of a Y. lipolytica gene using the pCpf1i_yl or pCpf1a_yl plasmid, the transcription of the targeted gene can be repressed or activated, respectively. Unlike conventional CRISPRi or CRISPRa vectors, which carry mutations in the endonuclease that deactivate the nuclease domains, cutting activity in the CRISPR-Cas12a system is modulated by the length of the gRNA [21]. Full-length gRNAs (23–25 bp), as discussed in Subheading 3.1, are used for gene disruption, while shorter sgRNAs (14 bp) are used for gene repression or activation (Fig. 2).

1. To design sgRNAs for CRISPRi, the promoter region of the gene to be targeted must be downloaded from a database, sequenced, or identified using another method. The transcription start site (TSS) should then be identified, and the 25–120 bp upstream of the TSS should be manually inspected for a TATA box or similar element. If no TATA element is apparent, a location approximately 40–60 bp upstream of the TSS may be used. Alternatively, gRNAs may be designed to target regions immediately upstream of the TSS. A web server such as YeasTSS, which collates the TSSs of coding genes in various yeast species, may be used to ease the design of gRNAs [41].

2. To design sgRNAs for CRISPRa, the promoter region and the TSS of the gene to be targeted must be identified as discussed in the previous step. Yeast promoters typically span hundreds of base pairs, with a ~150 bp core promoter region and upstream activation or repression elements that are repeated multiple times [42]. gRNAs for activation must be designed to target the upstream activation elements, which typically lie 100–200 bp upstream of the TSS.

3. A gRNA that spans 14 bp from a TTTV PAM sequence must be designed at the target region and subsequently cloned into the pCpf1i_yl or pCpf1a_yl plasmid following Subheading 3.1, steps 1–6. The cloned plasmids must be verified for sequence correctness as outlined in Subheading 3.1, step 7.

4. The plasmids may then be transformed into Y. lipolytica following the procedures outlined in Subheading 3.2 and plated to obtain single colonies on agar plates containing selective media (SD-Leu).

[Figure 2, panels a and b. Recoverable axis labels: (a) OD600 at day 2 and fold change in CAN1 expression; (b) fluorescence of hrGFP (RFU) and fold change in hrGFP expression. Conditions shown: no gRNA, nontargeting gRNA control, t-gRNAs (16 nt), and dCpf1 positive control.]

Fig. 2 CRISPRi (interference) and CRISPRa (activation) using CRISPR-Cas12a. (a) CRISPR interference of CAN1 with truncated gRNAs and LbCpf1-MXI1. Canavanine challenge assay results show that repression of CAN1 with t-gRNA1, -2, -4, and -5 enables growth (t-gRNA indicates a truncated gRNA of 16 bp in length). qPCR results confirmed the repression effect. (b) CRISPR activation of hrGFP with truncated gRNAs and LbCpf1-VPR. CRISPRa increases GFP fluorescence and the hrGFP mRNA level. GFP fluorescence was measured by flow cytometry. Reprinted with permission from Guide RNA engineering enables dual purpose CRISPR-Cpf1 for simultaneous gene editing and gene regulation in Yarrowia lipolytica (ACS Synth. Biol. 2020, 9 (4), 967–971)


5. Gene repression or activation is typically measured by qPCR. The protocol for qPCR to confirm transcriptional regulation is briefly outlined as follows. Grow Y. lipolytica transformants to early stationary phase in SD-Leu (OD600 ~ 10) and centrifuge 1 mL of the culture at 6500 × g for 2 min. Use the YeaStar RNA isolation kit from Zymo to extract total nucleic acids and subject them to DNase I digestion for 45 min at 37 °C in a volume of 50 μL. Purify the DNA-free RNA using an RNA purification kit such as RNA Clean and Concentrator-25 from Zymo. Determine the RNA concentration using a Qubit fluorometer and use 400 ng of the DNA-free RNA to set up a reverse transcription reaction to generate cDNA (Bio-Rad iScript Reverse Transcription Supermix). Dilute the resulting mix eightfold and use 2 μL per well in an RT-qPCR experiment, with a kit such as SsoAdvanced Universal SYBR Green Supermix from Bio-Rad and appropriate qPCR primers. qPCR primers may be designed using tools such as IDT PrimerQuest and then checked to ensure that they have high free energies of self- and heterodimerization. Generate standard curves to ensure that primer efficiencies fall between 90% and 110%, as is typical for qPCR applications. Perform the RT-qPCR on the gene of interest with and without CRISPRi/a, and normalize its expression to that of a housekeeping gene such as actin or GAPDH using the ΔΔCt method to determine the efficiency of repression or activation.

3.5 Design, Cloning, and Validation of a Genome-Wide CRISPR-Cas12a Library for Pooled CRISPR Screening in Y. lipolytica
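As a worked illustration of the qPCR analysis in Subheading 3.4, step 5, the sketch below computes primer efficiency from a standard curve slope and the fold change in expression by the ΔΔCt method. It assumes the usual definitions (efficiency = 10^(-1/slope) - 1 for a plot of Ct against log10 template amount, and fold change = 2^(-ΔΔCt), which presumes roughly 100% efficiency). The Ct values and function names are hypothetical.

```python
def primer_efficiency(slope):
    """Amplification efficiency from the slope of Ct versus log10(template amount)."""
    return 10 ** (-1.0 / slope) - 1.0


def fold_change(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the ddCt method, normalized to a housekeeping gene."""
    delta_test = ct_target_test - ct_ref_test   # with CRISPRi/a
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl   # without CRISPRi/a
    return 2 ** -(delta_test - delta_ctrl)


# Hypothetical values, illustration only.
efficiency = primer_efficiency(-3.32)           # slope of a standard curve
change = fold_change(ct_target_test=25.0, ct_ref_test=18.0,
                     ct_target_ctrl=22.0, ct_ref_ctrl=18.0)

print(f"primer efficiency: {efficiency:.1%}")
print(f"fold change: {change:.3f}")
```

In this example the target gene crosses threshold three cycles later under CRISPRi than in the control after normalization, which corresponds to an eightfold repression (fold change of 0.125).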

For the design of an unbiased n-fold coverage CRISPR-Cas12a library targeting all protein-coding genes in Y. lipolytica, the reader is referred to another book chapter in Methods in Molecular Biology, which explains the methodology in detail [39]. Custom MATLAB scripts that were used to design an eightfold coverage Cas12a library have also been deposited on GitHub (https://github.com/ianwheeldon/acCRISPR/tree/main/MATLAB_scripts_genome_wide_CRISPR_screens_Y_lipolytica/Cas12a_Library_Design). The designed Cas12a library was synthesized by Agilent and cloned in our lab using Agilent's SureVector CRISPR library kit (G7556A), as per vendor instructions, with a few small modifications. Further, a backbone plasmid (pLbCas12ayl-GW) that could receive the Cas12a library was also constructed. This was done by first adding a second direct repeat sequence at the 5′ end of the polyT terminator in pCpf1_yl, to allow library gRNAs to end in one or more thymine residues without these being construed as part of the terminator. Subsequently, the LbCas12a was removed from the plasmid so that it could instead be integrated into the genome of Y. lipolytica for stable expression during the genome-wide screen. The following protocol details the cloning

Adithya Ramesh et al.

[Fig. 3, text recovered from panel (a). Column labels: Genes, Genome, Cas12a Library, Uniqueness. Boxes: Import all CDS from YALI1 assembly; Import all chromosomes of YALI1; Identify all sgRNA (25 nt) from a TTTV PAM; Identify all sgRNA (25 nt) from a TTTN PAM; Seed sequences (14 bp closest to PAM) of sgRNA were compared; Only sgRNA that have exactly a single occurrence pass filter; Pick the now unique sgRNA from top and bottom strands in a non-biased manner to generate final 8-fold coverage library. Panels (b) and (c) are plots.]

Fig. 3 (a) Flow diagram for the design of an eightfold coverage library of sgRNAs for pooled CRISPR-Cpf1 screens. Guide uniqueness was verified within a seed region of 14 nt to minimize off-target activity. (b) Coverage of the genome-wide Cas12a library. 80% of genes have eight sgRNA, and >90% of genes have at least five unique sgRNA. (c) Raw library characterization. Agilent printed the library as oligos and we cloned it in house, so the raw library was characterized. Even representation of sgRNA is indicated by a tight normal distribution with minimal skew

of a CRISPR library with the example of the CRISPR-Cas12a guide library that we constructed (Fig. 3).
1. Linearize the backbone vector pLbCas12ayl-GW using PCR with the primers InversePCR-F and InversePCR-R.
   InversePCR-F 5′-TTTTTTTACGTCTAAGAAACCATTATTATCATGACATTAACCT-3′
   InversePCR-R 5′-TGCGCCGACCCGGAATCGAACCGGGGGCCC-3′
2. Digest the purified PCR product with DpnI (which cuts only the intact, E. coli-derived plasmid) in a 100 μL reaction at 37 °C for 2 h, and clean up the reaction using AMPure XP SPRI PCR beads following vendor instructions (using 1× the volume of beads to volume of digest).

   56 μL    ddH2O
   10 μL    10× CutSmart buffer
   30 μL    Inverse PCR product
   4 μL     DpnI


3. Transform 2 μL of the cleaned-up product into 100 μL of competent TOP10 cells and plate the entire transformation on LB supplemented with ampicillin. Run appropriate controls, such as transformation of any intact plasmid, to ensure that transformation is efficient. Verify that the plate with the inverse PCR product transformation has few to no colonies. This signifies that the DpnI treatment was a success and that there is no residual intact backbone plasmid in the inverse PCR reaction.
4. Resuspend the lyophilized library oligos obtained from Agilent in 100 μL of TE buffer at pH 8.0 and vortex briefly to ensure full resuspension. Determine the concentration of the library using the Bioanalyzer or Qubit (for single-stranded DNA), and use 2 nM of the library as template for PCR amplification with the OLS-F and OLS-R primers.
   OLS-F 5′-GTTTAGTGGTAAAATCCATCGTTGCCATCG-3′
   OLS-R 5′-GATACGCCTATTTTTATAGGTTAATGTCATG-3′

   10 μL          5× Q5 reaction buffer
   1 μL           dNTPs
   2.5 μL         10 μM OLS-F
   2.5 μL         10 μM OLS-R
   X μL           2 nM oligo library
   0.5 μL         Taq DNA polymerase
   33.5 − X μL    Nuclease-free water

   98 °C       120 s    1 cycle
   98 °C       20 s     15 cycles
   70 °C(a)    20 s
   72 °C       20 s
   72 °C       2 min    1 cycle
   4 °C        Hold

   (a) Annealing temperature is dependent upon the OLS amplification primers; we target 70 °C when designing primers

5. Purify the PCR product using AMPure XP SPRI beads following vendor instructions (using 1.8× the volume of beads to volume of the pooled PCR product). Analyze the PCR product on the Bioanalyzer to determine its concentration accurately. Dilute the final product to 5 ng/μL and store at −20 °C for cloning into the backbone that was linearized by inverse PCR.


6. For cloning the amplified library into the linearized backbone, first prepare a 20% SureSolution stock by diluting 10 μL of SureSolution in 40 μL of water. Make a 4.5× cloning reaction master mix as shown in the table below and aliquot 20 μL of the final reaction mix into each of four PCR tubes. Set up the thermocycler as shown to proceed with the cloning of the OLS library.

   9.0 μL     10× SureVector buffer
   3.6 μL     SureVector dNTP mix
   6.7 μL     50 ng/μL linearized pLbCas12ayl-GW vector
   5.7 μL     5 ng/μL OLS Cas12a library
   3.6 μL     SureVector enzyme mix
   9.0 μL     20% SureSolution
   52.4 μL    Nuclease-free water

   95 °C    60 s    1 cycle
   95 °C    20 s    15 cycles
   60 °C    90 s
   65 °C    60 s
   4 °C     Hold

7. Combine the cloning reactions and purify them using AMPure XP SPRI beads or equivalent PCR cleanup beads (0.8× the volume of beads to the volume of the four pooled cloning reactions).
8. Prepare two 1 L library amplification bottles by combining 1 L of LB and 3 g of Agilent's library-grade soft gelling agar. Leave the stir bar in and autoclave the bottles. Once autoclaved, cool the bottles, add 1 mL of ampicillin (100 mg/mL), mix via gentle stirring, and maintain at 37 °C.
9. Perform 18 replicate transformations of the cloned library into high-efficiency electrocompetent cells such as Agilent's ElectroTen-Blue cells (Catalog #200159) via electroporation (0.2 cm cuvette, 2.5 kV, 1 pulse, time constant 5.7). Recover the cells for 1 h in SOC media (2% tryptone, 0.5% yeast extract, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl2, 10 mM MgSO4, and 20 mM glucose) at 37 °C.
10. Split all successful transformations (those that did not arc) between the two amplification bottles and mix gently using the magnetic stir bars. Once thoroughly mixed, plate 100 μL from each bottle on LB plates containing ampicillin to determine the efficiency of the transformation. Then, submerge the amplification bottles in an ice bath with ice covering each bottle up to its neck (important) for 2 h, and then leave the bottles in the incubator at 30 °C for 48 h. Analyze the plates the next day to determine transformation efficiency (transformation is generally considered efficient if there are at least 100 times as many transformants as unique sgRNA in the library).
11. After 48 h, recover the colonies by centrifuging the two amplification bottles at 8000 × g for 30 min at 22 °C. Pool the cell pellets and split them between four flasks, each with 200 mL of LB supplemented with ampicillin (100 μg/mL), for a second round of amplification for 4 h.
12. Finally, pellet the cells from these cultures and gigaprep them using a kit such as the ZymoPURE II Plasmid Gigaprep Kit (Catalog #D4202) to obtain between 1.5 and 3 mg of the CRISPR plasmid library. This library may then be validated via NGS using the Illumina NextSeq platform to test for fold coverage of individual sgRNA and skew. The protocol for NGS library preparation is discussed in subsequent sections.

3.6 Pooled CRISPR Screening Protocol for Genome-Wide Cas12a gRNA Activity Profiling in Y. lipolytica
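The screen in this subheading is read out by comparing the abundance of each sgRNA in a control strain lacking Cas12a with its abundance in the Cas12a-expressing, NHEJ-deficient strain. A minimal Python sketch of such a comparison is given here for orientation only. The library-size normalization, the pseudocount of 1, the log2 ratio, and all function names are illustrative assumptions and are not presented as the authors' scoring pipeline.

```python
import math


def normalize(counts):
    """Convert raw read counts per sgRNA into fractions of the sample total."""
    total = sum(counts.values())
    return {guide: count / total for guide, count in counts.items()}


def log2_depletion(control_counts, cas12a_counts, pseudocount=1):
    """Return log2(control fraction / Cas12a-strain fraction) per sgRNA.

    Larger values mean stronger depletion in the Cas12a strain, which in a
    negative-selection screen is interpreted as higher guide activity.
    The pseudocount (an assumption) avoids division by zero for guides
    that drop out completely.
    """
    control = normalize(
        {g: c + pseudocount for g, c in control_counts.items()}
    )
    treated = normalize(
        {g: cas12a_counts.get(g, 0) + pseudocount for g in control_counts}
    )
    return {g: math.log2(control[g] / treated[g]) for g in control_counts}


# Toy counts (invented numbers, for illustration only)
control = {"sgRNA_1": 99, "sgRNA_2": 99}
cas12a = {"sgRNA_1": 99, "sgRNA_2": 24}
scores = log2_depletion(control, cas12a)
```

With these toy counts, sgRNA_2 is depleted relative to sgRNA_1 and therefore receives the higher score.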

Negative selection CRISPR screens in the absence of nonhomologous end joining (NHEJ, the dominant DNA repair mechanism in Y. lipolytica) allow for the generation of gRNA activity profiles. Cas12a-induced double-stranded breaks in a strain deficient in NHEJ cause cell death or significantly impair growth, thus linking gRNA abundance (as measured by next-generation sequencing of the recovered sgRNA expression plasmids) to Cas12a activity. This abundance is compared to that in a control strain lacking Cas12a, where the lack of cutting is expected to keep the sgRNA abundance at relatively constant levels. To conduct this screen, the KU70 gene was first disrupted in Y. lipolytica PO1f, and subsequently, LbCas12a expressed from a UAS1B8-TEF (136) promoter was integrated into the A08 locus. The methodology for conducting pooled CRISPR screens comprises scaled transformation of the library of sgRNA-containing plasmids into yeast, growth-based screening, isolation of plasmids at the end of the screen, library preparation with isolated plasmids, and next-generation sequencing to quantify the sgRNA abundance. The following protocol provides details on each of these steps (Fig. 4).

Fig. 4 Workflow of genome-wide CRISPR loss-of-function screen in Y. lipolytica

1. To prepare Y. lipolytica cells for transformation with the genome-wide library, streak cells stored at −80 °C for single colonies on solid media and incubate at 30 °C for 24 h. Pick a single colony to inoculate 3 mL of liquid YPD media. Allow the culture to grow at 30 °C for 20 h with shaking at 225 RPM, so that cells reach stationary phase (OD 35–40).
2. Pellet all 3 mL of stationary phase cells in microcentrifuge tubes via centrifugation at 6500 × g, wash cells in 2 mL transformation buffer, and resuspend in 1.2 mL transformation buffer. Transfer the cell suspension into a 15 mL Eppendorf tube. Add 36 μL of ssDNA mix, approximately 10 μg of plasmid DNA (between 1 and 10 μL in H2O), and 180 μL of triacetin-2-mercaptoethanol mix, in that order. The ssDNA mix contains a final concentration of 8 mg/mL of sheared salmon sperm DNA (200–500 bp fragment size) in 1× Tris–EDTA. The triacetin-2-mercaptoethanol mix contains 95% v/v of triacetin and 5% v/v of 2-mercaptoethanol. Mix the cell suspension with all the reagents thoroughly via pipetting and incubate for 30 min at room temperature. Add 1.8 mL of 70% PEG 3350 solution, mix well via pipetting, and incubate for 30 min at room temperature. Heat shock at 37 °C for 25 min, and then transfer the viscous cell suspension into a 50 mL Eppendorf tube containing 20 mL of water. Pellet cells via centrifugation and resuspend in 1 mL of ddH2O. Plate a dilution of 0.001% of the transformation on SD-Leu to determine transformation efficiency, and use the remainder to inoculate 50 mL of SD-Leu liquid media in a baffled shake flask.

3. Cells typically reach confluency after 2 days of growth. While CRISPR cuts that cause gene disruptions typically occur by 2 days of growth, the cells have not undergone enough doublings for the effect of the gene disruption to manifest as changes in gRNA abundance. Thus, day 2 samples may be stored as stocks to start new screens. To make glycerol stocks, add 178 μL of 80% glycerol to 812 μL of cell culture, mix well by pipetting, and store at −80 °C.
4. To subculture the cells, take 200 μL of the cell culture and inoculate into a 250 mL baffled flask containing 25 mL of fresh SD-Leu media. This cell culture may also be passaged once more upon reaching confluency, in a similar manner.
5. Upon reaching confluency, remove and store 1 mL of cell culture to isolate plasmids. Treat each 1 mL sample with 25 μL of 10× DNase I buffer and 2 μL of DNase I and incubate for 1 h at 30 °C to remove any extracellular plasmid DNA. Pellet the cells by centrifugation at 4500 × g and store at −80 °C until plasmid extraction.
6. Thaw and resuspend each of the frozen cell pellets in 400 μL of nuclease-free water, and then split each suspension into two 200 μL samples. Splitting into separate samples accommodates the capacity of the Zymo Yeast Miniprep Kit and, specifically, ensures complete lysis of cells using Zymolyase and lysis buffer. This step is critical in ensuring sufficient plasmid recovery and library coverage for downstream sequencing. Follow vendor instructions to isolate plasmids into 10 μL of water and pool the samples that were split at the start.
7. Plasmid concentration and copy number are typically quantified using qPCR, as the quantity of extracted plasmid falls below the detection threshold of common measurement instruments such as the NanoDrop or Qubit. Design qPCR primers that bind to and amplify the plasmid and not the genome (such as the M13F and M13R primers). Check that the designed primers fulfill qPCR kit requirements (Tm of primers, length of the PCR product, and GC clamp are typically specified by the kit) and do not form homo- or heterodimers. Also ascertain that the primer efficiencies fall between 90% and 110% with the help of a qPCR primer standard curve. The template we used to determine primer efficiencies was the pLbCas12ayl-GW empty vector. Tenfold serial dilutions of this plasmid, starting from 10 ng down to 0.001 ng, were used to make the standard curve, and equations correlating qPCR cycle thresholds (Ct values) to plasmid copy number and plasmid quantity in nanograms were also obtained. The qPCR primers that were designed for the Cas12a library are presented below.
   qPCR-GW-F 5′-TTATGAACTGAAAGTTGATGGC-3′
   qPCR-GW-R 5′-TCACACAGGAAACAGCTATG-3′
8. Make a tenfold dilution stock of the extracted plasmid samples from the genome-wide experiment by adding 1 μL of the plasmids and 9 μL of nuclease-free water in separate PCR tubes. Set up the qPCR as shown in the table below. The table presents reagent quantities for each well in the qPCR plate. Make a master mix that combines the ddH2O, the SYBR green qPCR 2× mix, and the forward and reverse primers. Aliquot 18 μL of the master mix into each well of the qPCR plate. Subsequently, add 2 μL of the tenfold diluted plasmid extract from the genome-wide screen. We typically perform two technical replicates for each plasmid sample. Each experiment also contains three wells where nuclease-free water is added in place of the plasmid sample to serve as no-plasmid controls.

   7 μL      ddH2O
   0.5 μL    qPCR-GW-F
   0.5 μL    qPCR-GW-R
   10 μL     SYBR green 2× master mix
   2 μL      Tenfold diluted plasmid/ddH2O

9. Set up the qPCR thermocycler (Bio-Rad CFX Connect) with the following program. Once complete, obtain the Ct values as determined by the CFX Connect, and use the previously developed equations to determine plasmid quantity and copy number.

   95 °C       30 s    1 cycle
   95 °C       30 s    40 cycles
   60 °C(a)    15 s
   65 °C       5 s     Ramp to 95 °C at 0.5 °C/5 s

   (a) Annealing temperature is dependent upon the qPCR primers; we target 60 °C, as required by the kit, when designing primers

10. Library preparation for deep sequencing involves PCR amplification of the gRNA portion of each plasmid, while adding all the components needed to enable sequencing of the PCR product on the Illumina NextSeq platform. PCR amplify the gRNA portion of each plasmid with primers that add the Illumina adapters, the standard TruSeq Read 1 sequencing primer annealing site, 8 bp Illumina barcodes, the Index Read 1 sequencing primer annealing site, and pseudobarcodes. The sequencing fragment for the Cas12a after PCR is shown in Fig. 4. Forward primers typically add the P5 adapter (required for binding to the NextSeq flow cell), the standard TruSeq Read 1 site, and pseudobarcodes, while the reverse primer adds the P7 adapter (required for binding to the flow cell), the 8 bp Illumina barcode index, and the Index Read 1 site. The table below presents the list of forward and reverse primers we use for amplification of the Cas12a sequencing fragments.

Primer name    Illumina barcode (reverse primer)/pseudobarcode (forward primer) for demultiplexing(a)    Primer sequence

ILU1-F     ^TTCCGG     AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTTCCGGGTCGGCGCAAATTTC
ILU2-F     ^AGATCG     AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTAGATCGGGTCGGCGCAAATTTCT
ILU3-F     ^GCTATT     AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTGCTATTCGGGTCGGCGCAAATTTCT
ILU4-F     ^CAGGAC     AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCAGGACTACGGGTCGGCGCAAATTTCT
ILU1-R     CAAGGCGA    CAAGCAGAAGACGGCATACGAGATTCGCCTTGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTAGAGGATCTGGGCCTCGTGATAC
ILU2-R     CTCTCGTC    CAAGCAGAAGACGGCATACGAGATGACGAGAGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTAGAGGATCTGGGCCTCGTGATAC
ILU3-R     CCAAGTCT    CAAGCAGAAGACGGCATACGAGATAGACTTGGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTAGAGGATCTGGGCCTCGTGATAC
ILU4-R     TAATACAG    CAAGCAGAAGACGGCATACGAGATCTGTATTAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTAGAGGATCTGGGCCTCGTGATAC
ILU5-R     GGTTCAGG    CAAGCAGAAGACGGCATACGAGATCCTGAACCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTAGAGGATCTGGGCCTCGTGATAC
ILU6-R     AACCTGAT    CAAGCAGAAGACGGCATACGAGATATCAGGTTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTAGAGGATCTGGGCCTCGTGATAC
ILU7-R     GTCACCTA    CAAGCAGAAGACGGCATACGAGATTAGGTGACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTAGAGGATCTGGGCCTCGTGATAC
ILU8-R     ACTGTTCG    CAAGCAGAAGACGGCATACGAGATCGAACAGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTAGAGGATCTGGGCCTCGTGATAC
ILU9-R     GATCGAAC    CAAGCAGAAGACGGCATACGAGATGTTCGATCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTAGAGGATCTGGGCCTCGTGATAC
ILU10-R    AGCTAGGT    CAAGCAGAAGACGGCATACGAGATACCTAGCTGTGACTGGAGTTCAGACGTGTGCCTTCCGATCTTAGAGGATCTGGGCCTCGTGATAC
ILU11-R    TCATCTCT    CAAGCAGAAGACGGCATACGAGATAGAGATGAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTAGAGGATCTGGGCCTCGTGATAC
ILU12-R    AAGTCCAG    CAAGCAGAAGACGGCATACGAGATCTGGACTTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTAGAGGATCTGGGCCTCGTGATAC

(a) The ^ symbol indicates that the sequence is a pseudobarcode. It also indicates that the sequence is anchored at the 5′ end of the sequencing read, meaning that these are the first few bases of the read that will be sequenced if a particular sample is amplified with the specified forward primer. This information may be used to demultiplex samples that have the same Illumina barcode (reverse primer) but different pseudobarcodes (forward primers)

11. Every sample is amplified with a unique combination of forward and reverse primers, such that samples are distinguishable from each other based on the combination of Illumina barcodes and pseudobarcodes. As a general rule of thumb, biological replicates of any given sample will share the same Illumina barcode but will contain different pseudobarcodes. Samples that differ from one another during the genome-wide screen in time point, strain background, or treatment condition will all have different Illumina barcodes, to afford easy sample identification and corroboration during bioinformatic processing of NGS reads.
12. Set up the PCR as shown in the table below, with the thermocycler settings also shown below. About 0.2 ng of plasmid template is subjected to 16–18 cycles of amplification. It is recommended to stay at or below 18 cycles to avoid PCR amplification bias. Take extra precautions at this stage to ensure each sample truly contains a unique combination of forward and reverse primers. Samples with redundant primer combinations will confound data analysis and any conclusions that can be drawn from the screen.

   10 μL          5× Q5 reaction buffer
   1 μL           dNTPs
   2.5 μL         10 μM forward primer (one of ILU1-4-F)
   2.5 μL         10 μM reverse primer (one of ILU1-12-R)
   X μL           0.2 ng of extracted plasmid
   0.5 μL         Taq DNA polymerase
   33.5 − X μL    Nuclease-free water

   98 °C       120 s    1 cycle
   98 °C       20 s     16–18 cycles
   70 °C(a)    20 s
   72 °C       20 s
   72 °C       2 min    1 cycle
   4 °C        Hold

13. Purify the PCR products using AMPure XP SPRI beads or equivalent PCR beads, using 0.9× the volume of beads to the volume of PCR product (see Note 4).
14. Quantify the purified samples on the Bioanalyzer to obtain accurate molar concentrations of the DNA fragment of the required size (typically this will be a band at ~290 bp).
15. Pool all PCR samples by aliquoting equimolar quantities of each sample into a microcentrifuge tube. Anywhere between 10 and 50 nM of each sample may be used for pooling. The pooled sample is now ready to be run on the NextSeq.

3.7 Bioinformatic Processing of NextSeq Reads to Obtain gRNA Abundances
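The second round of demultiplexing and the exact-matching read count covered in this subheading can be pictured with a short Python sketch. It uses the four 5′-anchored pseudobarcodes listed in the primer table (ILU1-F to ILU4-F) and counts a read toward an sgRNA when a 25-nt window of the read matches a library guide exactly. This is an illustration under stated assumptions, not the Cutadapt/Galaxy and MATLAB workflow used by the authors: the function names, the sliding-window search, and the in-memory list of reads are all hypothetical.

```python
from collections import Counter, defaultdict

# Pseudobarcodes from the primer table; each is anchored at the 5' end of the read
PSEUDOBARCODES = {
    "TTCCGG": "ILU1-F",
    "AGATCG": "ILU2-F",
    "GCTATT": "ILU3-F",
    "CAGGAC": "ILU4-F",
}


def demultiplex(reads, barcodes=PSEUDOBARCODES):
    """Bin reads by the forward primer implied by their first six bases."""
    bins = defaultdict(list)
    for read in reads:
        bins[barcodes.get(read[:6], "unassigned")].append(read)
    return dict(bins)


def count_guides(reads, guides, guide_len=25):
    """Count reads that contain a library guide as an exact substring.

    Each read contributes at most one count (the first matching window).
    """
    guide_set = set(guides)
    counts = Counter({guide: 0 for guide in guides})
    for read in reads:
        for start in range(len(read) - guide_len + 1):
            window = read[start:start + guide_len]
            if window in guide_set:
                counts[window] += 1
                break
    return counts
```

In practice the reads would be streamed from the fastq files that the sequencing core has already split by Illumina barcode, and inexact matching may be needed to recover reads with sequencing errors.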

The next-generation sequencing run outputs raw fastq files that can be processed using a combination of the Galaxy bioinformatics platform [43] and MATLAB to generate gRNA abundances in all samples. Since samples are pooled prior to sequencing on the NextSeq, the output is typically a single readset. However, if the Illumina barcodes are provided at the start, the reads may be demultiplexed by the sequencing core when they are converted to fastq files. These reads are only demultiplexed based on the 8 bp Illumina barcode. Another round of demultiplexing is required to separate samples that contained the same Illumina barcode but different pseudobarcodes. Subsequently, gRNA read counts from fully demultiplexed datasets may be generated using exact matching, inexact matching, or a combination of both (see Note 5). This section provides the general protocol, as well as links to custom MATLAB scripts that our lab uses, to obtain gRNA read counts from CRISPR-Cas12a screens.
1. Upload all the fastq datasets and demultiplex the datasets using Cutadapt. We used Cutadapt Galaxy version 1.16.6 with the settings provided in the table presented at the end of this section. The reader is also referred to the previous table that provides correlations between the pseudobarcode and the forward primer used during the PCR.
2. Assess the read quality of demultiplexed reads using FastQC. Make note of readsets with regions containing poor quality scores ( Run all (Fig. 3c).
5. Ensure your computer does not disconnect from the Colab hosted runtime during the run (see Note 2).
6. At the completion of the run, your browser will ask for permission (Fig. 3d) to download up to four (see Note 3) comma-separated values (.csv) files containing the results (Fig. 3e). These files are:
   (a) Increasing the forward strand transcription (PSSMPromoterCalculator_MAX_FWD_results.csv).
   (b) Decreasing the forward strand transcription (PSSMPromoterCalculator_MIN_FWD_results.csv).
   (c) Increasing the reverse strand transcription (PSSMPromoterCalculator_MAX_REV_results.csv).
   (d) Decreasing the reverse strand transcription (PSSMPromoterCalculator_MIN_REV_results.csv).

Option 2: Command Line Installation
https://github.com/ellinium/pssm_promoter_tool
https://pypi.org/project/pssm-promoter-tool/

1. Install the required libraries through pip (Fig. 4a).
   pip install pssm-promoter-tool

Method to Add or Remove Intragenic Bacterial Promoters


Fig. 3 How to use the PSSM promoter tool hosted on Google Colaboratory. (a) Copy the notebook to your own Google Drive, (b) paste a sequence into the window and give it a name, (c) select run all option, (d) upon run completion, a download will automatically start. You may need to give permission to download multiple files, (e) a list of downloaded csv files

Fig. 4 How to use the command line version of the PSSM promoter tool. (a) Use pip to install the required Python libraries, (b) download the package from GitHub, (c) files downloaded from GitHub, (d) example output to command line after a successful analysis


Ellina Trofimova et al.

2. Download PSSM Promoter Tool from the GitHub repository (Fig. 4b) and unpack the files (Fig. 4c), or use the "clone" command (see Note 4):
   git clone https://github.com/ellinium/pssm_promoter_tool

1. Create a FASTA file of the CDS you wish to process using this tool.
2. Save the FASTA file in the same directory as the pssm_promoter_calculator.py file.
3. Analyze the target sequence via the script (Fig. 4d):
   python pssm_promoter_calculator.py

4. Depending on the analyzed results (see Note 3), up to four csv files will be generated (Fig. 4d). These are:
   (a) Increasing the forward strand transcription (PSSMPromoterCalculator_MAX_FWD_results.csv).
   (b) Decreasing the forward strand transcription (PSSMPromoterCalculator_MIN_FWD_results.csv).
   (c) Increasing the reverse strand transcription (PSSMPromoterCalculator_MAX_REV_results.csv).
   (d) Decreasing the reverse strand transcription (PSSMPromoterCalculator_MIN_REV_results.csv).

3.3 Analyzed Results

Fig. 5 PSSM promoter tool analysis results. (a) PhiX174 gene A contains an intragenic promoter (pB) that is detected by Promoter Calculator v1.0 along with other potential promoters, (b) example csv output obtained from the promoter erasure algorithm, (c) the mCherry fluorescent reporter gene is not known to naturally contain a promoter. Several potential proto-promoters are detected by Promoter Calculator v1.0, (d) example csv output from the PSSM promoter tool algorithm silently modifying the proto-promoter at coordinate 543 to have a 9.3-fold greater predicted transcription rate

The generated csv files contain tables of the results of the PSSM Promoter Tool analysis, including both wild-type sequences (Type column: Original Promoter) and modified sequences (Type column: Modified Promoter) specified by the CORPSE algorithm. The transcription start site (TSS column) for each promoter is predicted by Promoter Calculator v1.0 [22]. Additionally, this calculator generates ΔG values for each component of the promoter, which indicate how biophysically favorable it is for the component to function (see Note 5). The CORPSE-specific columns are PSSM_hex35 and PSSM_hex10, which are the CORPSE PSSM values described in our previous work [18]. Two columns are relevant for deciding which modified promoters may be suitable for further testing in the lab. "Tx_rate" is the transcription rate predicted from the promoter sequence by the Promoter Calculator v1.0 model, and "Tx_rate_FoldChange" is the change between the wild-type sequence (Original Promoter) and each generated sequence (Modified Promoter), indicating the predicted fold difference in expression due to the sequence modifications (Fig. 5b, d). In the case of the pB promoter within gene A (Fig. 5a), analysis shows that the three strongest predicted promoters are at positions 651, 756, and 918 (the pB promoter). The PSSM Promoter Tool results give multiple options for synonymous mutations that are predicted to reduce expression from the promoters to essentially zero (Fig. 5a, b). Similarly, the mCherry gene contains several weak predicted promoters on the forward strand (Fig. 5c). The PSSM Promoter Tool results show that the proto-promoter at 543 could be synonymously mutated to increase the predicted strength 9.3-fold (Tx_rate_FoldChange), from a "Tx_rate" of 1100.27 to 10,230.86 (Fig. 5c, d). To synthesize and experimentally assess the model predictions, the "new_gene_sequence" column gives the full gene sequence with mutated bases for each promoter variant generated.
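To shortlist variants from one of the result files, the rows can be filtered on the Type column and ranked by Tx_rate_FoldChange. The Python sketch below is a minimal illustration that uses only the column names mentioned in the text. The exact column order, the value reported for the original promoter, and the two-row toy table are assumptions; real output files contain additional columns.

```python
import csv
import io


def rank_modified(csv_text, maximize=True):
    """Return 'Modified Promoter' rows sorted by predicted fold change.

    maximize=True puts the largest fold change first (promoter strengthening);
    maximize=False puts the smallest first (promoter weakening).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    modified = [row for row in reader if row["Type"] == "Modified Promoter"]
    return sorted(
        modified,
        key=lambda row: float(row["Tx_rate_FoldChange"]),
        reverse=maximize,
    )


# Toy table (invented rows; sequences shortened for illustration only)
EXAMPLE = """Type,TSS,Tx_rate,Tx_rate_FoldChange,new_gene_sequence
Original Promoter,543,1100.27,1.0,ATGGTG
Modified Promoter,543,10230.86,9.3,ATGGTC
Modified Promoter,543,4400.00,4.0,ATGGTA
"""

best = rank_modified(EXAMPLE)[0]
```

To work on an actual result file, the text passed to the function could be read from disk, for example from PSSMPromoterCalculator_MAX_FWD_results.csv.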

4 Notes
1. A Google Drive account is necessary to run the hosted Colaboratory Python Jupyter notebook.
2. You may be disconnected from the Google Colaboratory hosted runtime if your computer goes to sleep or the internet connection is disrupted. You may need to adjust your computer's power settings to ensure disconnection does not occur over the timeframe of the run (10–60+ min).
3. If promoters are not found on either the forward or reverse strand, then there will be no output files for that strand. The number of files returned will therefore be either 0 (no promoters found on either forward or reverse strands), 2 (promoter(s) found on only the forward or only the reverse strand), or 4 (promoters found on both forward and reverse strands).
4. A user account is required to use the GitHub command "clone." The account credentials (login and password) are requested during command execution.
5. More information on each Promoter Calculator v1.0 value (TSS, Tx_rate, hex35, hex10, UP, spacer, disc, ITR, dG values, UP_position, hex35_position, spacer_position, hex10_position, disc_position) can be found on the PSSM Promoter Tool GitHub page (https://github.com/ellinium/pssm_promoter_tool).

References
1. Müller KM, Arndt KM (2012) Standardization in synthetic biology. In: Synthetic gene networks: methods and protocols, pp 23–43
2. Del Vecchio D, Dy AJ, Qian Y (2016) Control theory meets synthetic biology. J R Soc Interface 13(120):20160380
3. Temme K, Zhao D, Voigt CA (2012) Refactoring the nitrogen fixation gene cluster from Klebsiella oxytoca. Proc Natl Acad Sci U S A 109(18):7085–7090. https://doi.org/10.1073/pnas.1120788109
4. Springman R, Molineux IJ, Duong C, Bull RJ, Bull JJ (2012) Evolutionary stability of a refactored phage genome. ACS Synth Biol 1(9):425–430. https://doi.org/10.1021/sb300040v
5. Ghosh D, Kohli AG, Moser F, Endy D, Belcher AM (2012) Refactored M13 bacteriophage as a platform for tumor cell imaging and drug delivery. ACS Synth Biol 1(12):576–582. https://doi.org/10.1021/sb300052u
6. Chan LY, Kosuri S, Endy D (2005) Refactoring bacteriophage T7. Mol Syst Biol 1(2005):0018. https://doi.org/10.1038/msb4100025
7. Song M, Sukovich DJ, Ciccarelli L, Mayr J, Fernandez-Rodriguez J, Mirsky EA, Tucker AC, Gordon DB, Marlovits TC, Voigt CA (2017) Control of type III protein secretion using a minimal genetic system. Nat Commun 8(1):14737. https://doi.org/10.1038/ncomms14737
8. Thomason MK, Bischler T, Eisenbart SK, Förstner KU, Zhang A, Herbig A, Nieselt K, Sharma CM, Storz G (2015) Global transcriptional start site mapping using differential RNA sequencing reveals novel antisense RNAs in Escherichia coli. J Bacteriol 197(1):18–28. https://doi.org/10.1128/jb.02096-14
9. Logel DY, Jaschke PR (2020) A high-resolution map of bacteriophage ϕX174 transcription. Virology 547:47–56. https://doi.org/10.1016/j.virol.2020.05.008
10. Frumkin I, Lajoie MJ, Gregg CJ, Hornung G, Church GM, Pilpel Y (2018) Codon usage of highly expressed genes affects proteome-wide translation efficiency. Proc Natl Acad Sci 115(21):E4940–E4949. https://doi.org/10.1073/pnas.1719375115
11. Yu C-H, Dang Y, Zhou Z, Wu C, Zhao F, Sachs Matthew S, Liu Y (2015) Codon usage influences the local rate of translation elongation to regulate co-translational protein folding. Mol Cell 59(5):744–754. https://doi.org/10.1016/j.molcel.2015.07.018
12. Hanson G, Coller J (2018) Codon optimality, bias and usage in translation and mRNA decay. Nat Rev Mol Cell Biol 19(1):20–30
13. Gorochowski TE, Espah Borujeni A, Park Y, Nielsen AA, Zhang J, Der BS, Gordon DB, Voigt CA (2017) Genetic circuit characterization and debugging using RNA-seq. Mol Syst Biol 13(11):952
14. Brophy JA, Voigt CA (2014) Principles of genetic circuit design. Nat Methods 11(5):508–520
15. Wright BW, Molloy MP, Jaschke PR (2022) Overlapping genes in natural and engineered genomes. Nat Rev Genet 23(3):154–168
16. Wright BW, Ruan J, Molloy MP, Jaschke PR (2020) Genome modularization reveals overlapped gene topology is necessary for efficient viral reproduction. ACS Synth Biol. https://doi.org/10.1021/acssynbio.0c00323
17. Jaschke PR, Dotson GA, Hung KS, Liu D, Endy D (2019) Definitive demonstration by synthesis of genome annotation completeness. Proc Natl Acad Sci U S A 116(48):24206–24213. https://doi.org/10.1073/pnas.1905990116
18. Logel DY, Trofimova E, Jaschke PR (2022) Codon-restrained method for both eliminating and creating intragenic bacterial promoters. ACS Synth Biol. https://doi.org/10.1021/acssynbio.1c00359
19. Vvedenskaya IO, Vahedian-Movahed H, Zhang Y, Taylor DM, Ebright RH, Nickels BE (2016) Interactions between RNA polymerase and the core recognition element are a determinant of transcription start site selection. Proc Natl Acad Sci 113(21):E2899–E2905
20. Feklistov A, Darst SA (2011) Structural basis for promoter −10 element recognition by the bacterial RNA polymerase σ subunit. Cell 147(6):1257–1269
21. Lane WJ, Darst SA (2006) The structural basis for promoter −35 element recognition by the group IV σ factors. PLoS Biol 4(9):e269
22. LaFleur TL, Hossain A, Salis HM (2022) Automated model-predictive design of synthetic promoters to control transcriptional profiles in bacteria. Nat Commun 13(1):5159. https://doi.org/10.1038/s41467-022-32829-5

Chapter 13

Genetic Code Expansion in Pseudomonas putida KT2440

Tianyu Gao, Jiantao Guo, and Wei Niu

Abstract

The emerging microorganism Pseudomonas putida KT2440 is used for the synthesis of biobased chemicals from renewable feedstocks and for bioremediation. However, methods for analyzing, engineering, and regulating the biosynthetic enzymes and protein complexes in this organism remain underdeveloped. Such efforts can be advanced by the genetic code expansion-enabled incorporation of noncanonical amino acids (ncAAs) into proteins, which also enables further control over the strain's biological processes. Here, we give a step-by-step account of the incorporation of two ncAAs into any protein of interest (POI) in response to a UAG stop codon by two commonly used orthogonal archaeal tRNA synthetase and tRNA pairs. Using superfolder green fluorescent protein (sfGFP) as an example, this method lays a solid foundation for future work to study and enhance the biological functions of KT2440.

Key words Genetic code expansion, Noncanonical amino acids, Pseudomonas putida KT2440, Amber suppression, Orthogonal tRNA synthetase and tRNA

1 Introduction

Pseudomonas putida KT2440 (KT2440) is a promising biotechnological chassis with diverse metabolic capabilities and good genetic tractability. KT2440 has FDA HV1 certified status and lacks virulence factors [1]. It has recently drawn significant attention as a host strain for environmentally friendly chemical production, for example from lignocellulosic carbon sources [2–8]. To fully realize the promise of KT2440 in biotechnological applications, reliable and versatile methods are needed to further characterize, engineer, and control the functions of protein complexes and biosynthetic enzymes in this organism. In synthetic biology, genetic code expansion is a powerful enabling tool for engineering new biological activities for both theoretical research and practical applications [9–12]. An orthogonal aminoacyl-tRNA synthetase (aaRS)/tRNA pair can be introduced to incorporate noncanonical amino acids (ncAAs) into the host's proteins in response to a blank codon, frequently the UAG amber stop

Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_13, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024


codon (Fig. 1a). Over the past 20 years, more than 200 ncAAs with various chemical characteristics have been genetically incorporated into proteins of interest in viruses, bacteria, yeasts, mammalian cells, and even entire animals [9–12]. In this chapter, we describe a plasmid-based method for incorporating tyrosine analogs using the M. jannaschii tyrosyl-tRNA synthetase/tRNATyrCUA (MjTyrRS/tRNA) pair [13] and lysine analogs using the M. barkeri pyrrolysyl-tRNA synthetase/tRNAPylCUA (MbPylRS/tRNA) pair [14, 15] at position 149 of the superfolder green fluorescent protein (sfGFP). The first plasmid carries an MjTyrRS-derived or MbPylRS-derived tRNA synthetase cloned into the pSEVA621 vector (RK2 ori) under the Ptac promoter. The second plasmid, based on the pBBRMCS2 vector (pBBR ori), carries sfGFP with an amber codon at the permissive site N149 behind the Ptac promoter, together with the cognate tRNA: tRNATyrCUA under the control of an E. coli-derived ProK promoter, or tRNAPylCUA under the KT2440 P_t01 promoter (Fig. 2). The two plasmids are then cotransformed into KT2440 cells. In the presence of the ncAA, the tRNA synthetase specifically aminoacylates the cognate tRNA with the ncAA, and the charged tRNA decodes the UAG amber codon at position 149 in sfGFP. Amber suppression results in the expression of the full-length sfGFP protein, which is detected by fluorescence measurement, protein purification, SDS-PAGE separation, and mass spectrometry analysis.
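The decoding logic of amber suppression described above can be illustrated with a short sketch. It is not part of the protocol: the toy open reading frame, the function name `translate`, and the use of `X` to mark the ncAA are assumptions made for illustration, and the actual sfGFP-N149TAG sequence is not reproduced here.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering
# (third codon position varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf, suppress_amber=False, ncaa_symbol="X"):
    """Translate an ORF; optionally read TAG as an ncAA instead of a stop."""
    protein = []
    usable_length = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    for i in range(0, usable_length, 3):
        codon = orf[i:i + 3].upper()
        if codon == "TAG" and suppress_amber:
            # The charged orthogonal tRNA decodes the amber codon.
            protein.append(ncaa_symbol)
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":  # stop codon: translation terminates
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical toy ORF (not sfGFP): Met-Lys-[TAG]-Gly-stop
toy_orf = "ATGAAATAGGGTTAA"
truncated = translate(toy_orf)                          # no suppression
full_length = translate(toy_orf, suppress_amber=True)   # amber suppression
print(truncated, full_length)
```

Without suppression the chain terminates at the TAG codon, which is why full-length sfGFP fluorescence serves as a readout of ncAA incorporation.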

2 Materials

All commercial chemicals are of analytical grade or higher. All solutions are prepared with ultrapure water.

2.1 Media and Reagents

1. Media: LB agar or LB liquid media. Media can be stored at room temperature after autoclaving.
2. Antibiotics: Add where appropriate to the following final concentrations: gentamicin (10 mg/L), kanamycin (50 mg/L), ampicillin (100 mg/L), and chloramphenicol (25 mg/L). Stock solutions are prepared at 1000-fold the final concentration in water, except for chloramphenicol, which is prepared in ethanol. The aqueous stock solutions are filtered through 0.22 μm sterile membrane filters. Stock solutions are stored at -20 °C.
3. Noncanonical Amino Acids: p-azido-L-phenylalanine (pAzF) (cat. no. 4020250.0001) and Nε-(tert-butoxycarbonyl)-L-lysine (BocK) (cat. no. 4000211.0005) are purchased from Bachem. Both chemicals are stored at -20 °C.


Fig. 1 General scheme of genetic code expansion in KT2440. (a) An orthogonal aaRS/tRNA pair is efficiently expressed to genetically incorporate ncAA into a protein of interest in response to an amber stop codon. (b) Structures of ncAAs used in this chapter: p-azido-L-phenylalanine (pAzF) and Nε-(tert-butoxycarbonyl)-L-lysine (BocK)

Fig. 2 Plasmid-based system for codon expansion in KT2440. (a) Plasmid constructs for the expression of sfGFP with tyrosine analogs incorporated at position 149. (b) Plasmid constructs for the expression of sfGFP with lysine analogs incorporated at position 149

4. Noncanonical Amino Acid Solutions: Dissolve ncAAs in 1 M NaOH to make stock solutions. After the ncAA is added to the cell culture, the pH of the culture is immediately adjusted to neutral (see Note 1). Stock solutions of ncAA can be stored at -20 °C.
5. Reagents for Preparing Competent Cells:
(a) 0.1 M MgCl2.
(b) TG buffer containing 15% glycerol, 75 mM CaCl2, and 6 mM MgCl2.
The above solutions are stored at 4 °C after autoclaving.
6. Reagents for Protein Purification:


(a) Binding buffer: 25 mM potassium phosphate (KPi), pH 7.4, 500 mM NaCl, 5 mM imidazole.
(b) Wash buffer: 25 mM KPi, pH 7.4, 500 mM NaCl, 50 mM imidazole.
(c) Elution buffer: 25 mM KPi, pH 7.4, 500 mM NaCl, 250 mM imidazole.
(d) Desalting buffer: 25 mM KPi, pH 7.4.
(e) The above solutions are stored at 4 °C.
7. Reagents for Protein Concentration Determination:
(a) Quick Start™ Bradford protein assay kit (Bio-Rad Laboratories).

2.2 Plasmids

1. pSEVA621-AzFRS: The plasmid is based on the pSEVA621 vector [16]. This plasmid allows efficient expression of AzFRS [17] (see Note 2) under the Ptac promoter to incorporate pAzF into sfGFP-N149TAG.
2. pSEVA621-PylRS*: The plasmid is based on the pSEVA621 vector. This plasmid allows efficient expression of PylRS* [18] (see Note 3) under the Ptac promoter to incorporate BocK into sfGFP-N149TAG.
3. pBBR-sfGFP N149TAG-MjtRNA: The plasmid is based on the pBBRMCS2 vector [19]. This plasmid allows efficient expression of sfGFP under the Ptac promoter and MjtRNA under the control of an E. coli-derived ProK promoter.
4. pBBR-sfGFP N149TAG P_t01-PylT: The plasmid is based on the pBBRMCS2 vector. This plasmid allows efficient expression of sfGFP under the Ptac promoter and tRNAPylCUA under the control of the promoter for a Tyr tRNA (P_t01) in KT2440 (see Note 4).
5. pBBR-sfGFP wild type: The plasmid is based on the pBBRMCS2 vector. This plasmid allows efficient expression of wild-type sfGFP under the Ptac promoter.
6. All plasmids are available from the authors upon reasonable request.

2.3 Equipment

1. Barnstead Nanopure® ultrapure water purification system (Thermo Fisher Scientific Inc).
2. Synergy H1 Hybrid plate reader (BioTek Instrument).
3. Incubator shaker.
4. 96-well plates.
5. 0.22 μm sterile membrane filters.
6. Pipettes and tips (1000 μL, 200 μL, and 20 μL).
7. Ni Sepharose 6 Fast Flow resin (GE Healthcare).


8. Ultrasonic cell disruptor.
9. Amicon Ultra-0.5 mL centrifugal filters.
10. Refrigerated centrifuge.

3 Methods

3.1 Cell Cultivation

Pseudomonas putida KT2440 can be obtained from the American Type Culture Collection (ATCC 47054). The strain is routinely cultured at 30 °C on LB agar plates or in LB liquid media.

3.2 Competent Cells Preparation (See Note 5)

1. Inoculate 5 mL of LB liquid media with a single colony of KT2440 picked from an LB plate without antibiotics. Shake the culture overnight at 30 °C, 250 rpm.
2. Add 1 mL of the overnight culture to 25 mL of LB media. Incubate the culture at 30 °C, 250 rpm for 2.5 h.
3. Collect the cells by centrifugation at 4 °C and 3500 × g for 5 min. Remove the supernatant (after this step, the cells should always be kept cold).
4. Resuspend the pelleted cells in 2.5 mL of 0.1 M MgCl2 and incubate on ice for 2 h.
5. Collect the cells by centrifugation at 4 °C and 3500 × g for 5 min.
6. Remove the supernatant, then resuspend the cells in 1.5 mL of TG buffer.
7. Dispense 50 μL of resuspended cells per tube. Store at -80 °C.

3.3 Heat Shock Transformation of Plasmids

1. Incubate 50 μL of competent cells with 500 ng of DNA on ice for 15 min (see Note 6).
2. Heat shock the cells at 37 °C for 2.5 min.
3. Incubate the cells on ice for 1 min.
4. Add 200 μL of LB media, then incubate the cells at 30 °C, 250 rpm for 1 h.
5. Streak 100 μL of cells on an LB plate with appropriate antibiotics. Incubate the plate at 30 °C overnight.

3.4 Fluorescence Measurement

1. Transformed P. putida KT2440 cells are grown in 5 mL of LB media containing appropriate antibiotics at 30 °C with agitation at 250 rpm. When the cell density reaches an OD600 of 0.4, the ncAA is added to the culture medium at the indicated concentration (see Note 7).
2. After 12 h of cultivation, cells are harvested by centrifugation at 3500 × g for 5 min, washed three times with an equal volume of 1× PBS, and resuspended in 1× PBS for cell density measurement and fluorescence quantification (excitation at 485 nm, emission at 528 nm) using a BioTek Synergy HTX plate reader (see Notes 8 and 9).
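The plate reader output can be processed as described in Note 9: fluorescence is divided by cell density, and incorporation efficiency is the ratio of normalized fluorescence to that of cells expressing wild-type sfGFP. The following is a minimal sketch of that arithmetic. The function names and all readings are hypothetical placeholders, not data from this chapter, and blank subtraction is left out because the protocol does not specify it.

```python
from statistics import mean


def normalized_fluorescence(fluorescence, od600):
    """Absolute fluorescence divided by cell density (OD600)."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return fluorescence / od600


def mean_normalized(replicates):
    """Average normalized fluorescence over (fluorescence, OD600) replicates."""
    return mean(normalized_fluorescence(f, od) for f, od in replicates)


def incorporation_efficiency(with_ncaa, wild_type):
    """Normalized fluorescence with ncAA relative to wild-type sfGFP."""
    return mean_normalized(with_ncaa) / mean_normalized(wild_type)


def fold_over_background(with_ncaa, without_ncaa):
    """Normalized fluorescence with ncAA relative to the no-ncAA control."""
    return mean_normalized(with_ncaa) / mean_normalized(without_ncaa)


# Hypothetical (fluorescence, OD600) pairs; replace with measured values.
plus_ncaa = [(400.0, 0.5), (420.0, 0.5)]
minus_ncaa = [(50.0, 0.5), (52.5, 0.5)]
wild_type = [(1000.0, 0.5), (1050.0, 0.5)]

efficiency = incorporation_efficiency(plus_ncaa, wild_type)
fold = fold_over_background(plus_ncaa, minus_ncaa)
print(f"incorporation efficiency: {efficiency:.1%}, fold over background: {fold:.1f}")
```

Averaging the normalized values per replicate, rather than normalizing averaged raw readings, is a design choice of this sketch; either convention should be applied consistently across samples.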


3.5 Purification of sfGFP Variant Proteins

1. For the purification of sfGFP with ncAA incorporation, a 50 mL cell culture is routinely grown under the same conditions that are used for fluorescence measurement.
2. Cells are harvested by centrifugation at 4 °C and 5000 × g for 10 min. The collected cells are then resuspended in binding buffer (4 mL per gram of wet cell pellet) and lysed by sonication (see Note 10).
3. sfGFP variants with a C-terminal 6×His tag are purified on Ni Sepharose 6 Fast Flow resin (GE Healthcare) following the manufacturer's protocol with binding buffer, wash buffer, and elution buffer. Purified protein is quantified by the Bradford assay and analyzed by SDS-PAGE.

3.6 Sample Preparation for MS Analysis of Purified sfGFP

1. For full-length protein analysis: The purity of the target protein is first evaluated by SDS-PAGE. A purity above 80% is generally required. Imidazole and NaCl in the protein solution are removed by buffer exchange into desalting buffer at 4 °C using Amicon Ultra centrifugal filters following the manufacturer's protocol (see Note 11). After the desalting step, the concentrations of NaCl and imidazole in the protein solution should be lower than 10 mM. Molecular weight analysis of the full-length protein is conducted by a proteomics facility (see Note 12).
2. For tandem MS analysis: The target protein is separated from impurities by SDS-PAGE. The band of the desired size is excised from the gel and collected into a 1.5 mL microtube for proteomics analysis. Briefly, the gel bands containing sfGFP variants are digested with trypsin. The resulting peptides are analyzed using a Q-Exactive HF mass spectrometer. Scaffold is used for data processing and analysis. The MS result for the incorporation of pAzF into sfGFP N149TAG is shown in Fig. 3 as an example. The peptide that covers the site of incorporation has a modified phenylalanine residue at position 149. The molecular weight of the residue equals that of pAzF. The result supports site-specific incorporation of pAzF.
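As a plausibility check on the tandem MS assignment, the expected mass of the tryptic peptide LEYNFNSH(F+41)VYITADK can be estimated from standard monoisotopic residue masses. This sketch is an illustration and not the facility's workflow. It assumes that the +41 shift corresponds to an azide replacing one hydrogen on phenylalanine (N3 minus H, about 41.0014 Da), and the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 canonical amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the peptide termini
PROTON = 1.007276   # mass of a proton

# Assumption: pAzF differs from Phe by an azide (N3) replacing one H.
AZIDE_SHIFT = 3 * 14.003074 - 1.007825


def peptide_mass(sequence, modifications=None):
    """Neutral monoisotopic mass; modifications maps 1-based position to Da."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    for position, delta in (modifications or {}).items():
        if not 1 <= position <= len(sequence):
            raise ValueError(f"position {position} is outside the peptide")
        mass += delta
    return mass


def mass_to_charge(neutral_mass, charge):
    """m/z of the protonated ion with the given positive charge."""
    return (neutral_mass + charge * PROTON) / charge


peptide = "LEYNFNSHFVYITADK"
pazf_site = 9  # the second F in the peptide, annotated F+41 in Fig. 3
neutral = peptide_mass(peptide, {pazf_site: AZIDE_SHIFT})
print(f"neutral mass: {neutral:.3f} Da, [M+2H]2+ m/z: {mass_to_charge(neutral, 2):.3f}")
```

Under these assumptions the doubly charged ion is expected close to the precursor m/z annotated in Fig. 3.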

4 Notes

1. For ncAAs with good solubility in H2O, a 100 mM stock solution is prepared directly in water. The ncAA stock solutions used in this chapter are prepared in 1 M NaOH. Aliquots of 100 mM ncAA stock solution are stored at -20 °C. To prepare medium with ncAA, add the ncAA stock solution first, then add an equal volume of 1 M HCl solution. The final pH of the medium is checked with pH test paper and should be neutral.


Fig. 3 LC-MS/MS analysis of the sfGFP-149pAzF tryptic peptide fragment that contains the position of incorporation, LEYNFNSH(F+41)VYITADK. The spectrum plots relative intensity (0–100%) against m/z (0–2000) with annotated a2, b2, and y1–y12 fragment ions; precursor: 1,001.48 m/z, 2+, 2,000.95 Da (parent error: 3.5 ppm)

2. The orthogonality of the AzFRS/tRNATyrCUA pair to the host's endogenous system has been verified.
3. A reported MbPylRS mutant (PylRS*, Y349F) with good activity toward BocK is used in this effort. The orthogonality of the MbPylRS*/tRNAPylCUA pair to the host's endogenous system has been verified.
4. tRNAPylCUA expression can be a limiting factor for efficient ncAA incorporation in KT2440. Therefore, an RNA-seq dataset was analyzed for the expression levels of endogenous tRNAs in KT2440 cells cultured to the exponential growth phase. The highest expression is observed when the promoter for a Tyr tRNA (P_t01) is used.
5. Precool the centrifuge, 0.1 M MgCl2, and TG buffer to 4 °C before the experiment.
6. 50 μL of KT2440 competent cells are cotransformed with either 500 ng of pBBR-sfGFP N149TAG-MjtRNA plasmid and 500 ng of pSEVA621-AzFRS plasmid for the incorporation of pAzF, or 500 ng of pSEVA621-PylRS* plasmid and 500 ng of pBBR-sfGFP N149TAG P_t01-PylT plasmid for the incorporation of BocK.
7. To quantify GFP expression, plasmids are transformed into P. putida KT2440, and cells are selected on LB agar plates containing appropriate antibiotics. Following incubation at 30 °C for 16 h, single colonies are picked and cultivated at 30 °C with agitation at 250 rpm for 12 h, and the overnight culture is then diluted into fresh LB at 2%. Biological triplicates are recommended for this analysis.


8. To wash the cells with 1× PBS, pellet the cells at 3500 × g for 5 min, then remove as much supernatant as possible. After the final wash, resuspend the cells in an appropriate volume of 1× PBS to keep the OD600 between 0.4 and 1.0. This can be achieved by making a series of dilutions of the final suspension with 1× PBS.
9. Normalized fluorescence is calculated by dividing the absolute fluorescence by the cell density. The incorporation efficiency is calculated by dividing the normalized fluorescence of cells expressing the ncAA-containing sfGFP by the normalized fluorescence of cells expressing the wild-type sfGFP from the pBBRMCS2 vector under the same cultivation conditions. With a low background of fluorescence in the absence of pAzF, the KT2440 cells cotransformed with pBBR-sfGFP N149TAG-MjtRNA and pSEVA621-AzFRS show approximately 8.8-fold higher sfGFP expression in the presence of 1 mM pAzF. This level of sfGFP expression corresponds to a 78.4% incorporation efficiency in comparison to the expression of the wild-type sfGFP from the pBBRMCS2 vector under the same cultivation conditions [20]. The KT2440 cells cotransformed with pBBR-sfGFP N149TAG P_t01-PylT and pSEVA621-PylRS* can achieve 34.6% decoding efficiency with 1 mM BocK compared to the wild-type sfGFP under the same cultivation conditions [20].
10. For cell lysis, resuspended cells are placed in a 50 mL conical tube and sonicated for a total of 6 min with 4 s on and 9 s off at 50% amplitude. Throughout the process, cells are kept cold by placing the tube in an ice bath to reduce heat accumulation. Do not let the sonication probe touch the wall of the tube during sonication, to avoid damaging the instrument.
11. The desalting step can be accomplished using the Amicon Ultra-0.5 device. After the protein sample is concentrated from 500 μL down to 50 μL by centrifugation at 4 °C and 14,000 × g for 30 min, the sample volume is brought back to 500 μL by the addition of 450 μL of desalting buffer for a second round of centrifugation. After three rounds of desalting, a dilution factor of 1000× can be achieved. Alternatively, samples can be desalted by dialysis.
12. Because detergent can contaminate the LC-MS system, it should be avoided in all buffers that are used in the purification. For the sfGFP variant of 28 kDa, 100 ng of protein in 10–50 μL of desalting buffer is sufficient for the full-length analysis. The result complements that of the tandem MS analysis. It also produces more reliable data when the ncAA is unstable in tandem MS analysis, which is the case for BocK.
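The dilution arithmetic in Note 11 can be checked with a few lines of code. The sketch below assumes that the purified protein starts in elution buffer (500 mM NaCl, 250 mM imidazole), that the desalting buffer contains neither solute, and that each round concentrates 500 μL to exactly 50 μL. The function names are hypothetical.

```python
def residual_concentration(initial_mM, rounds, start_uL=500.0, retained_uL=50.0):
    """Solute concentration after repeated concentrate-and-refill rounds."""
    if retained_uL <= 0 or start_uL <= retained_uL:
        raise ValueError("start volume must exceed a positive retained volume")
    fold_per_round = start_uL / retained_uL  # 10-fold for 500 uL -> 50 uL
    return initial_mM / fold_per_round ** rounds


def rounds_needed(initial_mM, target_mM, start_uL=500.0, retained_uL=50.0):
    """Smallest number of rounds that brings the solute below target_mM."""
    if target_mM <= 0:
        raise ValueError("target concentration must be positive")
    rounds = 0
    while residual_concentration(initial_mM, rounds, start_uL, retained_uL) >= target_mM:
        rounds += 1
    return rounds


# Assumed starting composition: elution buffer from Subheading 2.1.
for solute, start_mM in (("NaCl", 500.0), ("imidazole", 250.0)):
    after_three = residual_concentration(start_mM, rounds=3)
    minimum = rounds_needed(start_mM, target_mM=10.0)
    print(f"{solute}: {after_three} mM after 3 rounds; "
          f"{minimum} rounds to fall below 10 mM")
```

In practice the retained volume varies between spins, so the calculated values are best treated as estimates rather than measured concentrations.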


References

1. Kampers LF, Volkers RJ, Martins Dos Santos VA (2019) Pseudomonas putida KT2440 is HV1 certified, not GRAS. Microb Biotechnol 12:845–848
2. Nelson KE, Weinel C, Paulsen IT et al (2002) Complete genome sequence and comparative analysis of the metabolically versatile Pseudomonas putida KT2440. Environ Microbiol 4:799–808
3. Belda E, van Heck RG, José Lopez-Sanchez M et al (2016) The revisited genome of Pseudomonas putida KT2440 enlightens its value as a robust metabolic chassis. Environ Microbiol 18:3403–3424
4. Beckham GT, Johnson CW, Karp EM et al (2016) Opportunities and challenges in biological lignin valorization. Curr Opin Biotechnol 42:40–53
5. Nikel PI, Chavarría M, Danchin A, de Lorenzo V (2016) From dirt to industrial applications: Pseudomonas putida as a synthetic biology chassis for hosting harsh biochemical reactions. Curr Opin Chem Biol 34:20–29
6. Nikel PI, de Lorenzo V (2018) Pseudomonas putida as a functional chassis for industrial biocatalysis: from native biochemistry to transmetabolism. Metab Eng 50:142–155
7. Johnson CW, Salvachua D, Rorrer NA et al (2019) Chemicals and materials from bacterial aromatic catabolic pathways. Joule 3:1523–1537
8. Niu W, Willett H, Mueller J et al (2020) Direct biosynthesis of adipic acid from lignin-derived aromatics using engineered Pseudomonas putida KT2440. Metab Eng 59:151–161
9. Liu CC, Schultz PG (2010) Adding new chemistries to the genetic code. Annu Rev Biochem 79:413–444
10. Dumas A, Lercher L, Spicer CD, Davis BG (2015) Designing logical codon reassignment – expanding the chemistry in biology. Chem Sci 6:50–69
11. Young DD, Schultz PG (2018) Playing with the molecules of life. ACS Chem Biol 13:854–870
12. Dela Torre D, Chin JW (2021) Reprogramming the genetic code. Nat Rev Genet 22:169–184
13. Wang L, Brock A, Herberich B, Schultz PG (2001) Expanding the genetic code of Escherichia coli. Science 292:498–500
14. Srinivasan G, James CM, Krzycki JA (2002) Pyrrolysine encoded by UAG in archaea: charging of a UAG-decoding specialized tRNA. Science 296:1459–1462
15. Neumann H, Peak-Chew SY, Chin JW (2008) Genetically encoding Nε-acetyl lysine in recombinant proteins. Nat Chem Biol 4:232–234
16. Martínez-García E, Aparicio T, Goñi-Moreno A et al (2015) SEVA 2.0: an update of the Standard European Vector Architecture for de-/re-construction of bacterial functionalities. Nucleic Acids Res 43:D1183–D1189
17. Chin JW, Santoro SW, Martin AB et al (2002) Addition of p-Azido-L-phenylalanine to the genetic code of Escherichia coli. J Am Chem Soc 124:9026–9027
18. Yanagisawa T, Ishii R, Fukunaga R et al (2008) Multistep engineering of pyrrolysyl-tRNA synthetase to genetically encode Nε-(o-azidobenzyloxycarbonyl) lysine for site-specific protein modification. Chem Biol 15:1187–1197
19. Kovach ME, Phillips RW, Elzer PH et al (1994) pBBR1MCS: a broad-host-range cloning vector. BioTechniques 16:800–802
20. Xinyuan H, Tianyu G, Yan C et al (2022) Genetic code expansion in Pseudomonas putida KT2440. ACS Synth Biol 11:3724–3732

Chapter 14

Genome-Wide Screen for Enhanced Noncanonical Amino Acid Incorporation in Yeast

Briana R. Lino and James A. Van Deventer

Abstract

Expanding the genetic code beyond the 20 canonical amino acids enables access to a wide range of chemical functionality that is inaccessible within conventionally biosynthesized proteins. The vast majority of efforts to expand the genetic code have focused on the orthogonal translation systems required to achieve the genetically encoded addition of noncanonical amino acids (ncAAs) into proteins. There remain tremendous opportunities for identifying genetic and genomic factors that enhance ncAA incorporation. Here we describe genome-wide screening strategies to identify factors that enable more efficient addition of ncAAs to biosynthesized proteins. These unbiased screens can reveal previously unknown genes or mutations that can enhance ncAA incorporation and deepen our understanding of the translation apparatus.

Key words High-throughput screening, Genome-wide screen, Yeast, Noncanonical amino acids, Genetic code expansion

1 Introduction

Bespoke protein synthesis has played a substantial role in the advancement of science and engineering efforts for applications ranging from fundamental biological studies to therapeutic discovery [1]. Naturally encoded amino acids contain a relatively narrow range of functional groups, which in turn limits the functions of genetically encoded proteins. Accessing a broader range of functionalities requires looking beyond the canonical amino acids (cAAs). The introduction of genetically encoded noncanonical amino acids (ncAAs) into proteins is a powerful approach to augmenting protein function. In cells, ncAAs can be genetically encoded either by repurposing sense codons to replace a canonical amino acid or by adding amino acids to the genetic code in response to nonsense codons. Residue-specific replacement of a cAA with a ncAA repurposes one or more sense codons. This results

Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_14, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024


in global replacement of a cAA at any encoded position with a ncAA, which can be useful in proteins of interest but under some conditions also causes undesirable changes to protein folding and structure throughout the proteome [2]. Site-specific incorporation of a ncAA in response to a nonsense codon employs an orthogonal aminoacyl-tRNA synthetase and tRNA pair, also referred to as an orthogonal translation system (OTS), to introduce a ncAA at a specific location within a protein; this approach allows for point mutations to proteins of interest to be made with minimal disruption to protein structure (although readthrough of nonsense codons throughout the genome may also occur) [3]. Most commonly, site-specific incorporation is performed in response to one of the three stop codons, although schemes to incorporate ncAAs in response to quadruplet [4] or quintuplet [5] nucleotides and in response to a codon containing an unnatural base pair have also been devised [6].

A major challenge of genetic code expansion is that ncAA incorporation in response to a nonsense codon is typically inefficient compared to wild type (WT) protein translation. Genome engineering has enabled codon compression and more efficient translation with expanded genetic codes in E. coli [7], and strains bearing one or more codon-compressed chromosomes in the yeast S. cerevisiae are also now available [8]. However, these designed genomes (chromosomes) leave many questions open about how to best engineer cells to accommodate expanded genetic codes. An alternative approach to genome engineering is to conduct genome-wide screens in search of genomic manipulations that enhance ncAA incorporation efficiency. In principle, these approaches could also be paired with genomic diversity-generating mechanisms, such as Synthetic Chromosome Rearrangement and Modification by LoxP-mediated Evolution (SCRaMbLE) [9].

The vast genetic and genomic resources available in the yeast S. cerevisiae make this an attractive organism in which to conduct genome-wide screens for enhanced ncAA incorporation. In addition, there are a number of protein engineering and synthetic biology tools available in this organism that allow for immediate exploitation of more efficient ncAA incorporation systems. Finally, since many core elements of eukaryotic cellular machinery are conserved throughout the tree of life, insights into enhanced translation systems gained in yeast may find immediate application in mammalian cells. We have recently implemented systems that support high-throughput screens for enhanced ncAA incorporation in nearly any strain of S. cerevisiae. In this chapter, we describe how to implement these tools to conduct genome-wide screens in yeast in search of factors that enhance noncanonical amino acid incorporation.

2 Materials

2.1 Transformation of Reporter System into Yeast

1. Chemically competent Saccharomyces cerevisiae: Yeast strain compatible with auxotrophic markers in reporter system.
2. pSPS-RepTAG-OTS and pSPS-Rep-OTS: Single-plasmid system (SPS) encodes both a reporter (Rep) and an orthogonal translation system (OTS). The version of the SPS with a TAG codon in its sequence is used during screening, and the version of the SPS with a cognate codon is used as a control during quantitative characterizations of candidate enhancements.
3. Selective solid growth media: SD-SCAA –RepDO, pH 6.0: "–RepDO" refers to the omission of the amino acid produced by the selectable marker encoded on the reporter plasmid. Combine 182 g sorbitol, 5.4 g anhydrous disodium phosphate, 8.56 g monosodium phosphate monohydrate, 15 g agar, and a magnetic stir bar. Add deionized water (diH2O) up to 900 mL. Autoclave the mixture. In a separate vessel, combine 20 g dextrose, 6.7 g yeast nitrogen base without amino acids, and 2 g of synthetic amino acid drop-out mixture that excludes the amino acids required to maintain the SPS (see Note 2 for synthetic drop-out mix recipe), and add diH2O up to 100 mL. Once completely dissolved, sterile filter the mixture with a 0.2 μm filter cup into the autoclaved components that have cooled to 55 °C, under sterile conditions. Pour media into petri dishes under sterile conditions and leave overnight to solidify at room temperature. Store at 4 °C the following day. One liter of media is sufficient for pouring approximately three to four sleeves of plates.
4. Yeast transformation kit for preparing chemically competent yeast.
5. 1.7 mL microcentrifuge tubes.
6. Sterile petri dishes.
7. Incubator at 30 °C (stationary for yeast plate incubation).
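When a batch other than 1 L is needed, the amounts in the SD-SCAA –RepDO solid medium recipe above can be scaled linearly. The sketch below only illustrates that arithmetic. The dictionary keys and function names are hypothetical, and it assumes that the 900 mL autoclaved and 100 mL filter-sterilized portions scale in the same proportion.

```python
# Grams per 1 L of SD-SCAA -RepDO solid medium, copied from the recipe above.
SD_SCAA_SOLID_G_PER_L = {
    "sorbitol": 182.0,
    "anhydrous disodium phosphate": 5.4,
    "monosodium phosphate monohydrate": 8.56,
    "agar": 15.0,
    "dextrose": 20.0,
    "yeast nitrogen base without amino acids": 6.7,
    "synthetic amino acid drop-out mixture": 2.0,
}


def scale_recipe(recipe_g_per_l, batch_volume_l):
    """Return component masses (g) for a batch of the given volume (L)."""
    if batch_volume_l <= 0:
        raise ValueError("batch volume must be positive")
    return {name: round(grams * batch_volume_l, 3)
            for name, grams in recipe_g_per_l.items()}


def portion_volumes_ml(batch_volume_l):
    """Final volumes (mL) of the autoclaved and filter-sterilized portions."""
    total_ml = batch_volume_l * 1000.0
    return {"autoclaved portion": 0.9 * total_ml,
            "filter-sterilized portion": 0.1 * total_ml}


half_batch = scale_recipe(SD_SCAA_SOLID_G_PER_L, 0.5)
for component, grams in half_batch.items():
    print(f"{component}: {grams} g")
print(portion_volumes_ml(0.5))
```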

2.2 Prepare Cells for Electroporation

1. May include materials from previous sections.
2. Selective liquid growth media: SD-SCAA –RepDO media, pH 4.5: Dissolve 20 g glucose, 6.7 g yeast nitrogen base without amino acids, 2 g of synthetic amino acid drop-out mixture that excludes the amino acids required to maintain the SPS (see Note 2 for synthetic drop-out mix recipe), and citrate buffer salts (10.4 g sodium citrate, 7.4 g citric acid monohydrate) in 1 L diH2O. Filter sterilize the completely dissolved mixture using a 0.2 μm filter cup and store at room temperature.


3. 100× Penicillin-streptomycin (pen-strep): Penicillin G, potassium salt (10,000 IU/mL) and streptomycin sulfate (10,000 μg/mL).
4. Spectrophotometer (to measure optical density of yeast at 600 nm).
5. Incubator at 30 °C (shaking at 300 rpm for yeast liquid culture growth).
6. Sterile polyethylene culture tubes with caps.
7. 100 mM lithium acetate (LiAc).
8. Sterile water.
9. 1 M dithiothreitol (DTT): Dissolve DTT (MW 154.25 g/mol) at 0.15425 g/mL in diH2O, then sterile filter using a 0.2 μm filter. Store on ice or at 4 °C until use. Prepare on the same day it will be used.
10. Yeast cells previously transformed with pSPS-RepTAG-OTS.
11. Pellet Paint NF Co-Precipitant.
12. 3 M sodium acetate.
13. 200 proof ethanol.
14. 70% ethanol in deionized water (diH2O).
15. pSPS-RepTAG-OTS.

If using a plasmid-based collection:
16. pPOI-Collection: Plasmid-based library of proteins of interest (POI) on an auxotrophic marker compatible with the strain used for selection in yeast and an antibiotic resistance marker for selection in E. coli. See Note 1 for selecting libraries on different auxotrophic markers than the single-plasmid reporter/OTS.

2.3 Electroporation

1. May include materials from previous sections.
2. 2 mm electroporation cuvettes.
3. Rich growth media: Yeast Extract–Peptone–Dextrose (YPD) Medium: Combine 20 g peptone with 10 g yeast extract and add diH2O up to 900 mL. In a separate vessel, combine 20 g dextrose in 100 mL diH2O. Autoclave both mixtures, then combine under sterile conditions once both components have cooled to 60 °C. Store at room temperature.
4. Selective solid growth media: SD-SCAA –RepDO, pH 6.0.
5. Selective solid growth media: SD-SCAA –RepDO –LibDO, pH 6.0: "–LibDO" refers to the omission of the amino acid produced by the selectable marker encoded on the library plasmid. Combine 182 g sorbitol, 5.4 g anhydrous disodium phosphate, 8.56 g monosodium phosphate monohydrate, 15 g agar, and a magnetic stir bar. Add diH2O up to 900 mL. Autoclave the mixture. In a separate vessel, combine 20 g dextrose, 6.7 g yeast nitrogen base without amino acids, and 2 g of synthetic amino acid drop-out mixture that excludes the amino acids required to maintain the SPS and plasmid-based library (see Note 2 for synthetic drop-out mix recipe), and add diH2O up to 100 mL. Once completely dissolved, sterile filter the mixture with a 0.2 μm filter cup into the autoclaved components that have cooled to 55 °C, under sterile conditions. Pour media into petri dishes under sterile conditions and leave overnight to solidify at room temperature. Store at 4 °C the following day. One liter of media is sufficient for pouring approximately three to four sleeves of plates.
6. Selective liquid growth media: SD-SCAA –RepDO media, pH 4.5.
7. Selective liquid growth media: SD-SCAA –RepDO –LibDO media, pH 4.5: Dissolve 20 g glucose, 6.7 g yeast nitrogen base without amino acids, 2 g of synthetic amino acid drop-out mixture that excludes the amino acids required to maintain the SPS and plasmid-based library (see Note 2 for synthetic drop-out mix recipe), and citrate buffer salts (10.4 g sodium citrate, 7.4 g citric acid monohydrate) in 1 L diH2O. Filter sterilize the completely dissolved mixture using a 0.2 μm filter cup and store at room temperature.
8. Electroporator.
9. Sterile 250 mL flasks for recovery.
10. Sterile 1 L bottles.
11. 15 mL sterile culture tubes.
12. KimWipes.

2.4 Yeast Culture Expansion

1. May include materials from previous sections.
2. 50 mL conical tubes.
3. 100× pen-strep.

If using plasmid-based library:
4. Selective liquid growth media: SD-SCAA –RepDO –LibDO media, pH 4.5.

If using yeast strain library:
5. Selective liquid growth media: SD-SCAA –RepDO media, pH 4.5.

2.5 Long-Term Library Storage

1. May include materials from previous sections.
2. 750 mL Nalgene bottles (or bottles for large volume centrifugation).


3. 2 mL cryogenic storage tubes.
4. 60% glycerol: Add 60 mL 100% glycerol to 40 mL diH2O. Autoclave to sterilize and store at room temperature.

2.6 Library Characterization

1. May include materials from previous sections.
2. Sterile polyethylene culture tubes with caps.
3. 100× pen-strep.
4. 50 mM noncanonical amino acid (ncAA): Prepare a 50 mM liquid stock of the L-isomer of the ncAA by dissolving the ncAA in diH2O at 90% of the final volume and vortexing thoroughly. The addition of NaOH may be required to fully dissolve the ncAA. Add diH2O to the final volume and sterile filter using a 0.2 μm filter before use (see Note 3).
5. Incubator at 20 °C (shaking at 300 rpm for yeast liquid culture induction).
6. Analytical flow cytometer.
7. Flow cytometry data analysis software.
8. 10× PBS, pH 7.4: Dissolve 80 g sodium chloride, 2 g potassium chloride, 14.4 g disodium phosphate, and 2.4 g monopotassium phosphate in 900 mL diH2O. Adjust pH to 7.4 and add diH2O to a final 1 L volume. Filter sterilize the completely dissolved mixture using a 0.2 μm filter cup and store at room temperature.
9. 1× PBSA, pH 7.4: Combine 100 mL 10× PBS and 1 g bovine serum albumin (BSA) with 850 mL diH2O. Stir until the BSA has completely dissolved, then adjust pH to 7.4. Add diH2O to a final 1 L volume, filter sterilize using a 0.2 μm filter cup, and store at room temperature.

If using plasmid-based library:
10. Selective liquid growth media: SD-SCAA –RepDO –LibDO media, pH 4.5.
11. Selective liquid induction media: SG-SCAA –RepDO –LibDO media, pH 6.0: Dissolve 20 g galactose, 2 g glucose, 6.7 g yeast nitrogen base without amino acids, 2 g synthetic amino acid drop-out mixture that excludes the amino acids required to maintain the SPS and plasmid-based library (see Note 2 for synthetic drop-out mix recipe), 5.4 g anhydrous disodium phosphate, and 8.56 g monosodium phosphate monohydrate in 1 L diH2O. Filter sterilize using a 0.2 μm filter cup and store at room temperature.

If using yeast strain library:
12. Selective liquid growth media: SD-SCAA –RepDO media, pH 4.5.

Genome-Wide Screen for Enhanced ncAA Incorporation


13. Selective liquid induction media: SG-SCAA –RepDO media, pH 6.0: Dissolve 20 g galactose, 2 g glucose, 6.7 g yeast nitrogen base without amino acids, 2 g synthetic amino acid drop-out mixture that excludes the amino acids required to maintain the SPS (see Note 2 for synthetic drop-out mix recipe), 5.4 g anhydrous disodium phosphate, and 8.56 g monosodium phosphate monohydrate in 1 L diH2O. Filter sterilize using a 0.2 μm filter cup and store at room temperature.
14. Primary antibodies.
    (a) Mouse anti-HA.
    (b) Chicken anti-c-myc.
15. Secondary antibodies.
    (a) Goat anti-mouse Alexa Fluor 488.
    (b) Goat anti-chicken Alexa Fluor 647.

2.7 Sanger Sequencing

For Plasmid-Based Collections:
1. May include materials from previous sections.
2. Selective liquid growth media: SD-SCAA –RepDO –LibDO media, pH 4.5.
3. 100× pen-strep.
4. Yeast plasmid miniprep kit.
5. Chemically competent E. coli (see Note 4).
6. E. coli miniprep kit.
7. Liquid growth media: Luria-Bertani (LB): Dissolve 10 g tryptone, 5 g yeast extract, and 10 g sodium chloride in 1 L diH2O. Autoclave to sterilize and store at room temperature. Note: LB medium is also available as a premixed powder.
8. Selective solid growth media: Luria-Bertani (LB) plates with antibiotics: Dissolve 5 g tryptone, 2.5 g yeast extract, 5 g sodium chloride, and 7.5 g agar in 500 mL diH2O in a 1 L bottle. Add a stir bar and autoclave to sterilize. Once media has cooled to 55 °C, add antibiotic.
9. Antibiotic stock solution: Dissolve antibiotic in diH2O at 100 mg/mL and sterile filter with a 0.2 μm filter. Store at -20 °C for up to 1 year or at 4 °C for up to 1 month. See Note 5 for working concentrations of various common antibiotics.
10. Sanger sequencing primers (see Note 6).
11. Gel extraction kit.
12. PCR reagents (see Note 7).


Briana R. Lino and James A. Van Deventer

For Barcoded Yeast Strain Libraries:
1. Sanger sequencing primers (see Note 6).
2. Gel extraction kit.
3. PCR reagents (see Note 7).
4. Genomic isolation materials (see Note 8).

2.8 Fluorescence-Activated Cell Sorting (FACS)

1. May include materials from previous sections.
2. Round bottom polystyrene test tubes with cell strainer (5 mL).
3. Fluorescence-activated cell sorter.
4. Polystyrene culture tubes.
If using plasmid-based library:
5. Selective liquid growth media: SD-SCAA –RepDO –LibDO media, pH 4.5.
6. Selective liquid induction media: SG-SCAA –RepDO –LibDO media, pH 6.0.
If using yeast strain library:
7. Selective liquid growth media: SD-SCAA –RepDO media, pH 4.5.
8. Selective liquid induction media: SG-SCAA –RepDO media, pH 6.0.

2.9 Deep Sequencing

1. May include materials from previous sections.
2. Amplicon sequencing primers: Primers designed to flank target DNA fragments, with or without adapter sequences.
3. Bioinformatics software for deep sequencing analysis.
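The media recipes in Subheading 2.6 are stated per 1 L, and the ncAA stock is specified only by its molarity. A minimal sketch of the two conversions this implies is given below: linear scaling of the SG-SCAA induction medium to another batch volume, and the mass of solid needed for a stock of a given molarity. The function names are illustrative, and the molecular weight used in the example is a placeholder assumption rather than a value for any particular ncAA; substitute the value from the supplier's datasheet.

```python
# Grams per 1 L of SG-SCAA induction medium, transcribed from Subheading 2.6.
SG_SCAA_G_PER_L = {
    "galactose": 20.0,
    "glucose": 2.0,
    "yeast nitrogen base without amino acids": 6.7,
    "synthetic amino acid drop-out mixture": 2.0,
    "anhydrous disodium phosphate": 5.4,
    "monosodium phosphate monohydrate": 8.56,
}


def scale_recipe(grams_per_litre, final_volume_ml):
    """Scale a per-litre recipe linearly to the requested final volume (mL)."""
    factor = final_volume_ml / 1000.0
    return {name: round(grams * factor, 3) for name, grams in grams_per_litre.items()}


def stock_mass_mg(molecular_weight_g_per_mol, conc_mM, volume_ml):
    """Mass of solid (mg) needed for a stock of the given molarity and volume.

    mM x mL gives micromoles; micromoles x g/mol gives micrograms;
    dividing by 1000 converts micrograms to milligrams.
    """
    return conc_mM * volume_ml * molecular_weight_g_per_mol / 1000.0


if __name__ == "__main__":
    # 250 mL batch of induction medium.
    for component, grams in scale_recipe(SG_SCAA_G_PER_L, 250).items():
        print(f"{component}: {grams} g")
    # 10 mL of 50 mM stock; 200.0 g/mol is a hypothetical molecular weight.
    print(stock_mass_mg(200.0, 50, 10), "mg")
```

The pH targets and sterilization steps are unaffected by scaling and still need to be followed as written in the recipe.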

3 Methods

The yeast S. cerevisiae is a key biological model organism that has led to a critical understanding of many elements of eukaryotic biology. The translation apparatus in yeast is conserved with the translation apparatus of higher-order eukaryotes, suggesting that enhanced ncAA incorporation systems engineered in yeast could also lead to improvements in protein biosynthesis in mammalian cells. The genetic and genomic resources available in S. cerevisiae are extensive: this includes collections of strains harboring gene deletions [10] and titratable gene expression strains [11], plasmid-encoded gene overexpression collections [12], transposon-insertion libraries [13], and many more resources. Recently, we reported a genome-wide screen for enhanced ncAA incorporation in yeast using a pooled Saccharomyces cerevisiae


Fig. 1 Schematic for a genome-wide screen of a pooled yeast knockout collection. Yeast strain-based libraries may be screened for enhanced ncAA incorporation using the process depicted. A pooled library that contains barcode sequences flanking the gene of interest is first transformed with a reporter and synthetase plasmid prior to inducing reporter expression in the presence of a ncAA. The induced library can then be screened for enhanced ncAA incorporation using fluorescence-activated cell sorting (FACS). The populations sorted for improved stop codon readthrough are then subjected to barcode sequencing to identify candidate strains. (Reprinted with permission from [14]. Copyright 2022 American Chemical Society)

heterozygous diploid BY4743 molecular barcoded yeast knockout (YKO) collection (Fig. 1) [14]. Using fluorescent reporters of ncAA incorporation, Zackin et al. identified 55 candidate gene knockouts that appeared to enhance ncAA incorporation (two knockouts were also validated extensively in this work). Interestingly, many of the candidate knockouts have no known connection to the protein translation apparatus, demonstrating the power of this unbiased screening approach. Overall, screening additional genetic and genomic diversities has strong potential to lead to further enhanced systems for ncAA incorporation and to basic biological insights about the eukaryotic translation apparatus.

Fig. 2 Overview of methods for genome-wide screen for enhanced ncAA incorporation in yeast. Plasmid-based libraries or yeast strain libraries containing genome-wide manipulations are first transformed with a fluorescent reporter and orthogonal translation system plasmid (single-plasmid system) prior to screening. The naïve library is then subjected to flow cytometric analysis and sequencing to confirm library diversity and compatibility with the reporter system. Once confirmed, the naïve library may be sorted for enhanced ncAA incorporation via fluorescence-activated cell sorting (FACS). The sorted population(s) is then characterized to confirm enrichment for improved ncAA incorporation. To comprehensively evaluate enrichments of mutations to identify sequence-function landscapes, deep sequencing may be performed on the naïve and sorted libraries. Isolation of single clones from sorted populations, sequencing, and characterization via flow cytometric analysis is important to further confirm that mutations (or genes) of interest lead to enhanced ncAA incorporation efficiency

The methods detailed in this chapter serve as a guide to design and implement genome-wide screens for enhanced ncAA incorporation in yeast (Fig. 2). In addition, the modularity of these methods enables them to be used to screen libraries in which genetic diversity is concentrated within one gene or a handful of genes suspected to control ncAA incorporation efficiency. These methods use yeast display [15] and fluorescent reporters [16] to support quantification of ncAA incorporation efficiency and fidelity. To conduct a genome-wide screen, it is necessary to identify compatible combinations of genetic/genomic diversity, ncAA incorporation reporter system, and orthogonal translation system (OTS). We have established a number of systems that support genome-wide screens, with Single-Plasmid Reporter/OTS [17] systems allowing for the most flexibility in using existing genetic/genomic collections by requiring only a single selection marker to maintain the plasmid encoding the reporter and OTS. These single-plasmid systems (SPSs) come in two forms: an intracellular dual-fluorescent protein reporter and a yeast display reporter. Both systems include a TAG stop codon flanked by N-terminal and C-terminal sequences that encode fluorescent proteins (dual-fluorescent) or epitope tags (yeast display) that can be detected in flow cytometry experiments (Fig. 3). The intracellular system uses Blue Fluorescent Protein on the N-terminal side and Green


Fig. 3 Reporter and orthogonal translation system (OTS) architecture. Single-plasmid and dual-plasmid systems are available for both yeast display reporters and intracellular reporters. Yeast display systems encode an HA tag, a TAG codon downstream of the HA tag, and an scFv followed by a c-myc tag downstream of the TAG codon. The display system used is the Aga1p-Aga2p yeast display system [18]. Intracellular systems encode Blue Fluorescent Protein upstream of a linker containing a TAG codon followed by Green Fluorescent Protein downstream of the TAG. Single-plasmid systems encode both the reporter sequence and the OTS on the same plasmid. Dual-plasmid systems encode the reporter and OTS on separate plasmids, necessitating the transformation of two separate plasmids into yeast to facilitate screening for enhanced ncAA incorporation. (Reprinted with permission from [17]. Copyright 2021 American Chemical Society)

Fluorescent Protein on the C-terminal side. The yeast display system utilizes an HA tag on the N-terminal side and a c-myc tag on the C-terminal side; these tags can be labeled with primary and secondary antibodies for fluorescent detection. Both systems are under the control of a galactose-inducible Gal1-10 promoter. Within the single-plasmid systems, the aminoacyl-tRNA synthetase/tRNACUA pair is encoded under constitutive promoters on the same plasmid as the inducible reporter system. A list of these single-plasmid systems along with their auxotrophic markers is available in Table 1. Some yeast strains known to be compatible with the single-plasmid systems are summarized in Table 2. There are numerous possible combinations of genetic/genomic diversity, yeast strain background, and reporter/OTS. For example, for plasmid-based genetic diversities, the yeast strain BY4741 (MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0), transformed with the single-plasmid system pRS416-BXG–LeuOmeRS (dual-fluorescent reporter + OTS on URA marker), could be transformed with a pooled plasmid library encoded on either a HIS or LEU marker plasmid. If a yeast display reporter is desired in place of an intracellular reporter, the Aga1p-Aga2p yeast display system used here requires use of a yeast display strain such as RJY100 (this strain contains a genomically integrated, galactose-inducible copy of the


Table 1 Examples of single-plasmid reporter/OTSs

Plasmid                               Auxotrophic marker   Antibiotic resistance   Reporter type   Yeast display strain needed?
pCTCON2-FAPB2.3.6L1TAG–LeuOmeRS       TRP                  Amp                     Display         Yes
pCTCON2-FAPB2.3.6-L1TAG-TyrAcFRS      TRP                  Amp                     Display         Yes
pCTCON2-FAPB2.3.6–LeuOmeRS            TRP                  Amp                     Display         Yes
pCTCON2-FAPB2.3.6TyrAcFRS             TRP                  Amp                     Display         Yes
pRS416-BXG-altTAG–LeuOmeRS            URA                  Amp                     Intracellular   No
pRS416-BXG-altTAGTyrOmeRS             URA                  Amp                     Intracellular   No
pRS416-BXG–LeuOmeRS                   URA                  Amp                     Intracellular   No
pRS416-BXG-TyrOmeRS                   URA                  Amp                     Intracellular   No
pRS416-BYG-TyrOmeRS                   URA                  Amp                     Intracellular   No
pRS416-BYG–LeuOmeRS                   URA                  Amp                     Intracellular   No
pRS413-TyrOmeRS-Aga1pFAPB2.3.6        HIS                  Amp                     Display         No
pRS413-TyrOmeRS-Aga1pFAPB2.3.6L1TAG   HIS                  Amp                     Display         No

Table 2 Yeast strains compatible with SPS reporter/OTS

Yeast strain                Yeast display strain?   Genotype
RJY100 (Trp-, Leu-)         Yes                     MATa AGA1::GAL1-AGA1::URA3 ura3-52 trp1::KanMX leu2-Δ200 his3-Δ200 pep4::HIS3 prbd1.6R can1 GAL
BY4741 (His-, Leu-, Ura-)   No                      MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0
BY4743 (His-, Leu-, Ura-)   No                      MATa/α his3Δ1/his3Δ1 leu2Δ0/leu2Δ0 LYS2/lys2Δ0 met15Δ0/MET15 ura3Δ0/ura3Δ0

gene encoding Aga1p). If opting to screen a yeast strain collection (genomic diversity), such as the BY4743 molecular barcoded knockout collection (MATa/α his3Δ1/his3Δ1 leu2Δ0/leu2Δ0 LYS2/lys2Δ0 met15Δ0/MET15 ura3Δ0/ura3Δ0), the single-plasmid system would need to be compatible with the strain


background of the collection (here: single-plasmid system on a HIS, LEU, or URA marker). Barcoded strain collections, where the genomic manipulation encoded within an individual strain can be identified via sequencing of short DNA barcodes, are highly desirable for these screens to facilitate rapid identification of candidate strains (and associated genomic manipulations) that appear to support enhanced ncAA incorporation. Similarly, for plasmid-based collections, a known target or conserved sequence must be available in the plasmid adjacent to the gene of interest in order to enable high-throughput sequencing to establish which genes are contributing to improved stop codon readthrough.

3.1 Transform pSPS-RepTAG-OTS and pSPS-Rep-OTS into Chemically Competent Yeast

For Plasmid-Based Libraries: Chemically transforming yeast with the reporter plasmid before making the yeast electrocompetent increases the chances that both the reporter plasmid and the plasmid-based library are retained when the library is later introduced by electroporation.
1. Thaw one aliquot of chemically competent yeast on ice for 15–20 min.
2. Thaw reporter/OTS plasmids (pSPS-RepTAG-OTS and pSPS-Rep-OTS) at room temperature (or on ice). The SPS must be chosen so that it contains an auxotrophic marker compatible with the yeast strain transformed with the plasmid library encoding genetic diversity, or with the collection of yeast strains encoding genomic diversity (see Note 1). In some cases, dual-plasmid systems can be used; when using a plasmid-based library, single-plasmid systems are preferred because they leave more auxotrophic markers available and place less strain on cells that must retain multiple plasmids. See Table 1 for a list of available single-plasmid systems and corresponding specifications.
3. Combine 10 μL of competent yeast cells, 1 μL of plasmid, and 100 μL of EZ 3 solution under sterile conditions.
4. Incubate at 30 °C for 45 min without shaking, flicking tubes every 15 min.
5. Place two SD-SCAA –RepDO plates in the 30 °C incubator to warm up at this time.
6. Plate 50 μL of each cell mixture on separate plates under sterile conditions, using a pipette tip held at an angle to spread the cells, and leave in the 30 °C incubator until individual colonies grow (2–3 days). Once colonies have formed, plates can be stored at 4 °C for 4–6 weeks.
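The marker-compatibility reasoning in step 2 can be sketched as a simple lookup over Tables 1 and 2. The rules encoded below are one reading of the text, not a published algorithm: the SPS marker must complement an auxotrophy of the strain, it must differ from the marker used by the library plasmid (if any), and a display reporter that requires a yeast display strain can only be used in one. The function and variable names are illustrative.

```python
from typing import List, Optional

# Rows transcribed from Table 1:
# (plasmid, auxotrophic marker, reporter type, yeast display strain needed?)
SPS_TABLE = [
    ("pCTCON2-FAPB2.3.6L1TAG–LeuOmeRS", "TRP", "Display", True),
    ("pCTCON2-FAPB2.3.6-L1TAG-TyrAcFRS", "TRP", "Display", True),
    ("pCTCON2-FAPB2.3.6–LeuOmeRS", "TRP", "Display", True),
    ("pCTCON2-FAPB2.3.6TyrAcFRS", "TRP", "Display", True),
    ("pRS416-BXG-altTAG–LeuOmeRS", "URA", "Intracellular", False),
    ("pRS416-BXG-altTAGTyrOmeRS", "URA", "Intracellular", False),
    ("pRS416-BXG–LeuOmeRS", "URA", "Intracellular", False),
    ("pRS416-BXG-TyrOmeRS", "URA", "Intracellular", False),
    ("pRS416-BYG-TyrOmeRS", "URA", "Intracellular", False),
    ("pRS416-BYG–LeuOmeRS", "URA", "Intracellular", False),
    ("pRS413-TyrOmeRS-Aga1pFAPB2.3.6", "HIS", "Display", False),
    ("pRS413-TyrOmeRS-Aga1pFAPB2.3.6L1TAG", "HIS", "Display", False),
]

# From Table 2: strain -> (usable auxotrophic markers, is a yeast display strain)
STRAINS = {
    "RJY100": ({"TRP", "LEU"}, True),
    "BY4741": ({"HIS", "LEU", "URA"}, False),
    "BY4743": ({"HIS", "LEU", "URA"}, False),
}


def compatible_sps(strain: str, library_marker: Optional[str] = None) -> List[str]:
    """Return Table 1 plasmids usable in `strain` alongside a library marker."""
    usable_markers, is_display_strain = STRAINS[strain]
    hits = []
    for plasmid, marker, _reporter, needs_display_strain in SPS_TABLE:
        if marker not in usable_markers:
            continue  # strain cannot select for this plasmid
        if library_marker is not None and marker == library_marker:
            continue  # marker already taken by the library plasmid
        if needs_display_strain and not is_display_strain:
            continue  # reporter depends on genomically encoded Aga1p
        hits.append(plasmid)
    return hits


if __name__ == "__main__":
    # BY4741 carrying a plasmid library on a HIS marker.
    for name in compatible_sps("BY4741", library_marker="HIS"):
        print(name)
```

For BY4741 with a HIS-marked library, this returns only the URA-marked pRS416 plasmids, which matches the pRS416-BXG–LeuOmeRS example given earlier in the text.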


3.2 Prepare Cells for Electroporation

3.2.1 Yeast Culture Growth

For Plasmid-Based Libraries
1. Grow yeast in fresh media.
   (a) Inoculate a single colony of yeast transformed with pSPS-RepTAG-OTS into 5 mL SD-SCAA –RepDO + 50 μL 100× Pen-strep.
       (i) Leave shaking at 300 rpm in a 30 °C incubator at a 45 degree angle for 2–3 days until saturated. Note that the appropriate shaking speed may vary with the incubator used. See the manufacturer's guidelines for the speed recommended for yeast culture growth to ensure proper oxygenation.
       (ii) Check the optical density at 600 nm (OD600) of the saturated culture and calculate the volume of culture needed for a final OD600 = 1 in 5 mL. Note that in yeast, a saturated culture should have an OD600 of approximately 8–10.
       (iii) Combine the calculated culture volume from the previous step with enough fresh SD-SCAA –RepDO media supplemented with 50 μL 100× Pen-strep for a 5 mL final culture volume.
       (iv) Leave shaking in a 30 °C incubator at a 45 degree angle overnight at 300 rpm.
       (v) Repeat steps ii–iv for a second grow-out.
2. Dilute the saturated culture from the incubator to an OD600 of 1 in 5 mL of fresh SD-SCAA –RepDO + 50 μL 100× Pen-strep.
3. Leave shaking in a 30 °C incubator at a 45 degree angle overnight at 300 rpm.
4. Measure the OD600 of the yeast culture and calculate the volume needed for a 100 mL culture at an OD600 = 0.2. (Note: 100 mL is enough for four transformations. Scale up/down as needed.)
5. Aliquot the calculated volume into a 50 mL conical tube and spin down at 2400 rcf for 5 min. Discard supernatant.
6. Resuspend the cell pellet in 50 mL YPD. Transfer culture to a 250 mL flask and add another 50 mL of YPD for a total culture volume of 100 mL. Leave shaking at 300 rpm in a 30 °C incubator until OD600 reaches ~1.5 (about 4–6 h). An OD600 within the range of 1.35–1.8 is acceptable.
7. In this time, concentrate DNA using the Pellet Paint Protocol in Subheading 3.2.2. For a detailed Pellet Paint protocol see Ref. [19].
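The volume calculations in steps 1(ii), 2, and 4 all follow the dilution relation C1 × V1 = C2 × V2, with OD600 standing in for concentration. A minimal sketch is shown below; the function name and the example OD600 readings are illustrative assumptions, and OD600 is treated as linear in cell density, which in practice holds only if dense cultures are diluted into the linear range of the spectrophotometer before reading.

```python
def culture_volume_ml(measured_od600, target_od600, final_volume_ml):
    """Volume of starting culture (mL) that yields target_od600 in final_volume_ml.

    Applies C1 * V1 = C2 * V2 with OD600 used as the concentration.
    """
    if measured_od600 <= 0:
        raise ValueError("measured OD600 must be positive")
    if target_od600 > measured_od600:
        raise ValueError("target OD600 exceeds measured OD600; culture cannot be diluted up")
    return target_od600 * final_volume_ml / measured_od600


if __name__ == "__main__":
    # Hypothetical saturated culture read at OD600 = 8.
    v_passage = culture_volume_ml(8.0, 1.0, 5.0)
    print(f"Passage: {v_passage:.3f} mL culture + {5.0 - v_passage:.3f} mL fresh media")

    # Hypothetical overnight culture read at OD600 = 6; cells for 100 mL at OD600 = 0.2.
    v_pellet = culture_volume_ml(6.0, 0.2, 100.0)
    print(f"Pellet {v_pellet:.2f} mL of culture, then resuspend in 100 mL YPD")
```

In step 4 the calculated volume is pelleted and resuspended in fresh YPD rather than topped up, so the full 100 mL is fresh medium.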


For Yeast Strain Libraries:
1. Grow yeast in fresh media.
   (a) Passage an appropriate volume of pooled yeast strain library that would cover at least 10× library diversity into 100 mL YPD (see Notes 5 and 9 about antibiotic selection and working concentrations). Note that for yeast genomic diversity, the reporter/OTS plasmid will be transformed into the yeast strain library via electroporation.
       (i) Leave shaking at 300 rpm in a 30 °C incubator at a 45 degree angle overnight.
       (ii) Check OD600 of saturated culture and calculate volume of culture needed for a final OD600 = 1 in 5 mL.
       (iii) Combine calculated culture volume from previous step with enough fresh YPD media supplemented with 50 μL 100× Pen-strep and antibiotic for a 5 mL final culture volume.
       (iv) Leave shaking in a 30 °C incubator at a 45 degree angle overnight at 300 rpm.
       (v) Repeat steps ii–iv for a second grow-out.
2. Dilute the saturated culture from the incubator to an OD600 of 1 in 5 mL of fresh YPD + 50 μL 100× Pen-strep.
3. Leave shaking in a 30 °C incubator at a 45 degree angle overnight at 300 rpm.
4. Measure the OD600 of the yeast culture and calculate volume needed for a 100 mL culture at an OD600 = 0.2. (Note: 100 mL is enough for four transformations. Scale up/down as needed.)
5. Aliquot calculated volume into a 50 mL conical tube and spin down at 2400 rcf for 5 min. Discard supernatant.
6. Resuspend the cell pellet in 50 mL YPD. Transfer culture to a 250 mL flask and add another 50 mL of YPD for a total culture volume of 100 mL. Leave shaking at 300 rpm in 30 °C incubator until OD600 reaches ~1.5 (about 4–6 h). An OD600 within the range of 1.35–1.8 is acceptable.

3.2.2 Pellet Paint

For Plasmid-Based Libraries
1. Mix 1 μg of pooled plasmid-based library DNA (pPOI-Collection) and 1 μg of reporter/OTS plasmid (pSPS-RepTAG-OTS) in a 1.7 mL microcentrifuge tube. Bring up to 100 μL with sterile water. Note: Although the yeast cells have been previously transformed with the reporter/OTS plasmid, this plasmid is included in pellet paint to increase chances of retaining both the reporter and library plasmids during and after electroporation.
2. In a separate tube, add 1 μg reporter/OTS plasmid only and bring up to 100 μL with sterile water. This will be the negative control. It is strongly encouraged to additionally prepare a positive control electroporation using a plasmid that contains the same auxotrophic marker as is encoded within the library plasmid.
3. Follow the Pellet Paint Protocol according to manufacturer's guidelines.
4. Allow pellet to air dry for a few hours until it bounces around the tube when flicked.
5. When ready to use, resuspend dried pellets in 10 μL sterile H2O. Note: If extended storage of pelleted and resuspended DNA is required, the DNA may be stored at -20 °C.

For Yeast Strain Libraries
1. Concentrate 1 μg reporter/OTS plasmid DNA (pSPS-RepTAG-OTS) using the Pellet Paint Protocol outlined in Subheading 3.2.2, omitting the pPOI-Collection plasmid. The negative electroporation control will consist of cells electroporated without the addition of a reporter/OTS plasmid.

3.2.3 Making Electrocompetent Cells

For Plasmid-Based Libraries
1. Chill 2 mm electroporation cuvettes on ice. LiAc and sterile water should be chilled in advance at 4 °C.
2. Once OD600 of yeast culture diluted in YPD reaches ~1.5, under sterile conditions transfer culture to two sterile 50 mL conical tubes.
3. Spin down for 5 min at 2000 rcf, decant, and resuspend each tube in 25 mL chilled sterile LiAc (100 mM) by shaking vigorously.
4. Cool centrifuge to 4 °C.
5. Add 250 μL freshly prepared, sterile filtered 1 M DTT to each 50 mL tube and invert to mix.
6. Loosen caps of the conical tubes to ensure maximal aeration; use tape to gently secure the cap on the tube, and shake at 300 rpm in 30 °C incubator for 10 min.
7. At the end of the incubation, keep all cells on ice or at 4 °C for the remainder of the procedure.
8. After the 10-min incubation period, tighten caps and centrifuge at 2000 rcf for 5 min at 4 °C.
9. Decant supernatant and resuspend each pellet in 25 mL chilled sterile H2O with vigorous shaking.
10. Centrifuge at 2000 rcf for 5 min at 4 °C.
11. Decant supernatant and resuspend each pellet in 250 μL sterile chilled water by pipetting. Ensure that the pellets are completely resuspended with no clumps in solution using repeated pipetting. The combined volume of each pellet and water will be approximately 500 μL, enough for two electroporations.

For Yeast Strain Libraries
1. Follow steps 1–11 listed above for plasmid-based libraries (Subheading 3.2.3) to make yeast strain library cells electrocompetent.

3.3 Electroporation

For Plasmid-Based Libraries
1. Prepare a conical tube with 6 mL YPD (2 mL per electroporation) and three sterile culture tubes for recovery. Note: This protocol is written with volumes that account for three electroporations: one library-based electroporation and two controls (positive and negative). Volumes may be scaled up to accommodate larger libraries.
2. Remove DNA from -20 °C and let thaw at room temperature if previously frozen. Ensure DNA is completely thawed prior to use.
3. Take 250 μL electrocompetent cells and add to 10 μL preparations of resuspended DNA.
4. Mix cells and DNA thoroughly using a pipette and transfer to chilled 2 mm electroporation cuvette.
5. Dry the outside of the cuvette with a KimWipe, place cuvette in shock pad, and shock cells using a square wave protocol, 500 V, 15 ms pulse, 1 pulse. The droop percentage should be ~6.
6. Immediately after the shock is completed, add 1 mL YPD into the cuvette, pipet up and down, and pour contents into a sterile culture tube. Rinse out the cuvette with an additional 1 mL YPD, pipetting up and down in the gap, and pour into the culture tube.
7. Perform electroporations for any additional samples and controls and incubate culture tubes at 30 °C without shaking for 1 h. Warm three SD-SCAA –RepDO –LibDO plates at 30 °C. During the hour, prelabel dilution tubes (100×, 1000×, 10,000×, 100,000× dilutions) and prepare a sterile 250 mL flask with 95 mL SD-SCAA –RepDO –LibDO + 1 mL 100× Pen-strep for each electroporation to transform the library into the yeast strain (excluding controls).
8. Label plates and draw quadrants for each dilution step.
9. After 1 h, remove tubes with transformed cells, vortex gently at low intensity, set aside 10 μL of cells from each electroporation (including controls), and dilute in 990 μL SD-SCAA –RepDO –LibDO. This is the 100× dilution.
10. Spin down remaining volume of cells from each electroporation (excluding controls) for 5 min at 900 rcf.
11. Aspirate supernatant and resuspend in 5 mL SD-SCAA –RepDO –LibDO. Add this 5 mL to 94 mL SD-SCAA –RepDO –LibDO + 1 mL 100× Pen-strep in a 250 mL flask.
12. Incubate the 100 mL culture at 30 °C for 48 h at 300 rpm.
13. To create a serial dilution for plating and colony counting, take the 100× dilution samples that were set aside in step 9, vortex gently, and transfer 20 μL into 180 μL SD-SCAA –RepDO –LibDO (this tube is the 1000× dilution). Mix repeatedly with the pipet and vortex gently. Dilute the reporter-only control into SD-SCAA –RepDO.
    (a) Repeat the previous step until 10,000× and 100,000× dilutions are created. This will give a 100× dilution, 1000× dilution, 10,000× dilution, and 100,000× dilution.
    (b) Remove prewarmed, labeled SD-SCAA –RepDO –LibDO plates and SD-SCAA –RepDO plate.
    (c) Vortex dilutions vigorously and plate 20 μL of each dilution in the appropriate quadrant.
    (d) Incubate at 30 °C for 3–4 days. Count colonies in each quadrant of plate.
    Note: A single colony in the 100×, 1000×, 10,000×, and 100,000× quadrants corresponds to 1 × 10^4, 1 × 10^5, 1 × 10^6, and 1 × 10^7 transformed cells, respectively. The numbers of transformed cells determined from colony counts in different quadrants should ideally agree. The counts determined from two adjacent quadrants can be averaged to more accurately estimate the number of transformants in the library. See Note 10 about transformation efficiency calculations to determine library diversity.
    Note: The negative control plates should show no growth, because the selective solid media lack the amino acids that only cells carrying the library plasmid marker (absent from the negative control sample) can do without.

For Yeast Strain Libraries
1. Prepare a conical tube with 4 mL YPD (2 mL per electroporation) and two sterile culture tubes for recovery. Note: This protocol is written with volumes that account for two electroporations: one library-based electroporation (with pSPS-RepTAG-OTS) and one negative control (yeast strain library without pSPS-RepTAG-OTS). Volumes may be scaled up to accommodate multiple transformations of the library.
2. Remove DNA from -20 °C and let thaw at room temperature if previously frozen.
3. Take 250 μL electrocompetent cells and add to 10 μL preparations of resuspended DNA. Reserve one tube for electrocompetent cells with no DNA added for the negative control.
4. Mix cells and DNA thoroughly using a pipette and transfer to chilled 2 mm electroporation cuvette.
5. Dry the outside of the cuvette with a KimWipe, place cuvette in shock pad, and shock cells using a square wave protocol, 500 V, 15 ms pulse, 1 pulse. The droop percentage should be ~6.
6. Immediately after the shock is completed, add 1 mL YPD into the cuvette, pipet up and down, and pour contents into a sterile culture tube. Rinse out the cuvette with an additional 1 mL YPD, pipetting up and down in the gap, and pour into the culture tube.
7. Repeat electroporations for additional samples and controls and incubate culture tubes at 30 °C without shaking for 1 h. Warm SD-SCAA –RepDO plates at 30 °C.
8. During the hour, prelabel dilution tubes (100×, 1000×, 10,000×, 100,000× dilutions) and prepare sterile 250 mL flasks with 95 mL SD-SCAA –RepDO + 1 mL 100× Pen-strep.
9. Label plates and draw quadrants for each dilution step.
10. After 1 h, remove tubes with transformed cells, vortex gently at low intensity, set aside 10 μL of cells from each electroporation (including control), and dilute in 990 μL SD-SCAA –RepDO. This is the 100× dilution.
11. Spin down remaining volume of cells from each electroporation (excluding control) for 5 min at 900 rcf.
12. Aspirate supernatant and resuspend in 5 mL SD-SCAA –RepDO. Add this to 94 mL SD-SCAA –RepDO + 1 mL 100× Pen-strep (prepared in advance) in a 250 mL flask.
13. Incubate the 100 mL culture at 30 °C for 48 h at 300 rpm.
14. To create a serial dilution, take the 100× dilution samples that were set aside in step 10, vortex gently, and transfer 20 μL into 180 μL SD-SCAA –RepDO (this tube is the 1000× dilution). Mix repeatedly with the pipet and vortex gently.
    (a) Repeat the previous step until 10,000× and 100,000× dilutions are created. This will give a 100× dilution, 1000× dilution, 10,000× dilution, and 100,000× dilution.
    (b) Remove prewarmed, labeled SD-SCAA –RepDO plates.
    (c) Vortex dilutions vigorously and plate 20 μL of each dilution in the appropriate quadrant.
    (d) Incubate at 30 °C for 3–4 days. Count colonies in each quadrant of plate.
    Note: A single colony in the 100×, 1000×, 10,000×, and 100,000× quadrants corresponds to 1 × 10^4, 1 × 10^5, 1 × 10^6, and 1 × 10^7 transformed cells, respectively. The numbers of transformed cells determined from colony counts in different quadrants should ideally agree. The counts determined from two adjacent quadrants can be averaged to more accurately estimate the number of transformants in the library. See Note 10 about transformation efficiency calculations to determine library diversity.

3.4 Expanding Culture

For Plasmid-Based Libraries
1. Once the 100 mL liquid culture is saturated, pour cultures into 2 × 50 mL conical tubes per flask. Spin down at 2000 rcf for 5 min.
2. Decant supernatant and resuspend entire pellet in 1 L SD-SCAA –RepDO –LibDO + 10 mL 100× Pen-strep in a 2 L flask.
3. Leave shaking overnight at 30 °C at 300 rpm.
4. This culture can be scaled up further to prepare enough cells to accommodate aliquots with larger numbers of cells (if a large number of transformants has been generated) or to increase the number of frozen aliquots to be prepared in the next section.

For Yeast Strain Libraries
1. Once the 100 mL liquid culture is saturated, pour cultures into 2 × 50 mL conical tubes per flask. Spin down at 2000 rcf for 5 min.
2. Decant supernatant and resuspend entire pellet in 1 L SD-SCAA –RepDO + 10 mL 100× Pen-strep in a 2 L flask.
3. Leave shaking overnight at 30 °C at 300 rpm.
4. This culture can be scaled up further to prepare enough cells to accommodate aliquots with larger numbers of cells (if a large library has been created) or to increase the number of frozen aliquots to be prepared in the next section.
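Whether further scale-up is warranted depends on the number of transformants estimated from the quadrant counts in Subheading 3.3. A minimal sketch of that conversion follows. It assumes a 2 mL recovery volume per electroporation and 20 μL plated per quadrant, which reproduces the per-colony values quoted in the Subheading 3.3 note; the function names, the colony counts, and the library size in the example are hypothetical.

```python
RECOVERY_VOLUME_UL = 2000.0  # 2 mL YPD recovery per electroporation
PLATED_VOLUME_UL = 20.0      # volume of each dilution plated per quadrant


def transformants_from_quadrant(colonies, dilution_factor):
    """Total transformants implied by the colony count in one quadrant."""
    return colonies * dilution_factor * RECOVERY_VOLUME_UL / PLATED_VOLUME_UL


def estimate_transformants(counts_by_dilution):
    """Average the estimates from the countable quadrants.

    counts_by_dilution maps dilution factor (e.g. 1000) to colony count.
    """
    estimates = [
        transformants_from_quadrant(colonies, dilution)
        for dilution, colonies in counts_by_dilution.items()
    ]
    return sum(estimates) / len(estimates)


def coverage_fold(transformants, library_size):
    """Fold coverage of a library with `library_size` distinct members."""
    return transformants / library_size


if __name__ == "__main__":
    # Hypothetical counts from two adjacent quadrants.
    total = estimate_transformants({1000: 42, 10000: 5})
    print(f"Estimated transformants: {total:.2e}")
    # Hypothetical library of 6000 members.
    print(f"Coverage: {coverage_fold(total, 6000):.0f}x")
```

The constants ignore the small volume contributed by the cells and DNA, as the note in Subheading 3.3 does.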

3.5 Long-Term Library Storage

For Plasmid-Based and Yeast Strain Libraries:
1. Take the 2 L flasks out of 30 °C and set aside 5 mL of each into culture tubes for analysis.
   (a) These 5 mL samples can be stored at 4 °C for approximately 1 month.
2. Transfer remaining volume (~1 L) into two sterile centrifuge bottles (~500 mL each). Use scale to balance weights of bottles prior to centrifugation.
3. Spin down at 2000 rcf for 5 min.
4. Decant supernatant and resuspend the pellet of the first bottle with 5 mL 60% glycerol. Once the pellet is completely resuspended, transfer the solution and resuspend the pellet in the second bottle. If resuspended volume is less than 20 mL, add sterile water until final volume is equal to 20 mL.
5. Aliquot into 2 mL cryo tubes, ensuring that each aliquot has at least 10× number of cells compared to number of transformants in order to cover the entire library diversity.
6. Add tubes to a cryogenic storage box and freeze at -80 °C.
7. Check 10 μL from 5 mL reserve under a microscope to ensure cell morphology appears healthy and to ensure that no bacteria are detected under the microscope.

3.6 Library Characterization

3.6.1 Prepare Cells for Flow Cytometry Analysis
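Two small calculations recur in the induction and sampling steps of this subsection: the volume of ncAA stock needed for a target final concentration, and the culture volume that contains a given number of cells. A minimal sketch is given below, using the conversion stated in this subsection (an OD600 of 1 corresponds to 10^7 cells/mL). The function names and the example OD600 reading are illustrative, and, as in the protocol, the small volume added by the ncAA stock is ignored.

```python
CELLS_PER_ML_AT_OD1 = 1e7  # conversion stated in the protocol


def ncaa_stock_volume_ul(stock_mM, final_mM, culture_volume_ml):
    """Volume of ncAA stock (uL) for a nominal final concentration.

    The volume added by the stock itself is neglected, as in the protocol.
    """
    return final_mM * culture_volume_ml * 1000.0 / stock_mM


def volume_for_cells_ul(n_cells, od600):
    """Culture volume (uL) containing n_cells at the measured OD600."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    cells_per_ml = od600 * CELLS_PER_ML_AT_OD1
    return n_cells / cells_per_ml * 1000.0


if __name__ == "__main__":
    # 50 mM stock, 1 mM final, 2 mL induction culture.
    print(ncaa_stock_volume_ul(50, 1, 2), "uL ncAA stock")
    # Two million cells from a hypothetical induced culture at OD600 = 3.
    print(round(volume_for_cells_ul(2e6, 3.0), 1), "uL culture")
```

With a 50 mM stock and a 2 mL culture, the first function gives the 40 μL used in the protocol for a nominal 1 mM final concentration.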

For Plasmid-Based Libraries
1. Passage 200 μL of the 5 mL recovered cells stored at 4 °C into 5 mL of fresh SD-SCAA –RepDO –LibDO + 50 μL 100× Pen-strep. Incubate at 30 °C overnight at 300 rpm at a 45 degree angle.
2. Dilute a portion of the saturated overnight culture to OD600 of 1 in 5 mL of fresh SD-SCAA –RepDO –LibDO + 50 μL 100× Pen-strep. Incubate at 30 °C at a 45 degree angle for 4–8 h (until cells have doubled one to two times for an OD600 = 2–5), shaking at 300 rpm.
3. After confirming OD600 is between 2 and 5, calculate the volume of cells needed to prepare a 2 mL culture of OD600 = 1. Aliquot calculated volume into a sterile 15 mL culture tube.
4. Spin down for 5 min at 2400 rcf. Aspirate supernatant.
5. Resuspend pellet in 2 mL of SG-SCAA –RepDO –LibDO + 20 μL 100× Pen-strep. The galactose media will induce protein synthesis.
6. Add 40 μL of 50 mM ncAA of interest for a final concentration of 1 mM in media and vortex to mix. Note that other final concentrations may be used to determine efficiency of incorporation under higher or lower concentrations of ncAA.
7. Place culture at a 45 degree angle in the 20 °C incubator with shaking at 300 rpm overnight (~16 h).
8. The next morning, measure OD600 and calculate the volume equivalent to two million cells (an OD600 value of 1 is equal to 10^7 cells/mL).


9. Aliquot two million cells into a 1.7 mL microcentrifuge tube. Add PBSA to 1 mL final volume. If using a yeast display system, aliquot an additional three samples of WT control to be used for flow cytometry color controls during the fluorescent antibody labeling step. Cells expressing individual fluorescent proteins can be used as controls for fluorescent protein reporters (see Note 11 for more details on color controls).
10. Spin down for 30 s at 12,000 rpm at 4 °C.
11. Aspirate supernatant. Wash 3× in 1 mL cold PBSA (resuspend pellet in PBSA, spin down 12,000 rpm for 30 s, aspirate, repeat). If using a fluorescent reporter system, skip steps 12–18. If using a yeast display reporter system, follow the remainder of the steps in this section for fluorescent antibody labeling.
12. Add 50 μL primary labeling mixture for N- and C-terminal tags per tube (excluding unlabeled control) at a 1:500 dilution. Note: antibodies should always be kept on ice or at 4 °C.
13. Incubate the samples on a rotary wheel at room temperature for at least 30 min. After this incubation, all steps should be done on ice.
14. Wash the samples 3× in 1 mL ice cold PBSA.
15. Add 50 μL of well-mixed secondary label per tube at a 1:500 dilution. Note: protect these antibodies from light by covering the ice bucket as much as possible.
16. Incubate the samples on ice for 15 min. Protect from light.
17. Dilute once in 1 mL PBSA, spin down 12,000 rpm for 30 s, aspirate, then wash once in 1 mL PBSA, spin down 12,000 rpm for 30 s, aspirate.
18. After the last wash, aspirate supernatant and leave pellet on ice or at 4 °C until ready for flow cytometry.

For Yeast Strain Libraries
19. Passage 200 μL of the 5 mL recovered cells stored in 4 °C in 5 mL of fresh SD-SCAA –RepDO + 50 μL 100× Pen-strep. Incubate in 30 °C overnight at 300 rpm at a 45 degree angle.
20. Dilute a portion of the saturated overnight culture to OD600 of 1 in 5 mL of fresh SD-SCAA –RepDO + 50 μL 100× Pen-strep. Incubate in 30 °C at a 45 degree angle for 4–8 h (until cells have doubled one to two times for an OD600 = 2–5) shaking at 300 rpm.
21. After confirming OD600 is between 2 and 5, calculate the volume of cells needed to prepare a 2 mL culture of OD600 = 1. Aliquot calculated volume into a sterile 15 mL culture tube.

Genome-Wide Screen for Enhanced ncAA Incorporation


22. Spin down for 5 min at 2400 rcf. Aspirate supernatant.
23. Resuspend pellet in 2 mL of SG-SCAA –RepDO + 20 μL 100× Pen-strep. The galactose media will induce protein synthesis.
24. Add 40 μL of 50 mM ncAA of interest for a final concentration of 1 mM in media and vortex to mix. Note that other final concentrations of ncAA may be used to determine the efficiency of incorporation at higher or lower ncAA concentrations.
25. Place culture at a 45 degree angle in the 20 °C incubator with shaking at 300 rpm overnight (~16 h).
26. The next morning, measure OD600 and calculate the volume of culture that contains two million cells (an OD600 value of 1 is equal to 10^7 cells/mL).
27. Aliquot two million cells into a 1.7 mL microcentrifuge tube. Add PBSA to 1 mL final volume. If using a yeast display system, aliquot an additional three samples of WT control to be used for flow cytometry color controls during the fluorescent antibody labeling step. Cells expressing individual fluorescent proteins can be used as controls for fluorescent protein reporters (see Note 11 for more details on color controls).
28. Spin down for 30 s at 12,000 rpm at 4 °C.
29. Aspirate supernatant. Wash 3× in 1 mL cold PBSA (resuspend pellet in PBSA, spin down at 12,000 rpm for 30 s, aspirate, repeat). If using a fluorescent reporter system, skip steps 30–36. If using a yeast display reporter system, follow the remainder of the steps in this section for fluorescent antibody labeling.
30. Add 50 μL primary labeling mixture for N- and C-terminal tags per tube (excluding unlabeled control) at a 1:500 dilution. Note: antibodies should always be kept on ice or at 4 °C.
31. Incubate the samples on a rotary wheel at room temperature for at least 30 min. After this incubation, all steps should be done on ice.
32. Wash the samples 3× in 1 mL ice-cold PBSA.
33. Add 50 μL of well-mixed secondary label per tube at a 1:500 dilution. Note: protect these antibodies from light by covering the ice bucket as much as possible.
34. Incubate the samples on ice for 15 min. Protect from light.
35. Dilute once in 1 mL PBSA, spin down at 12,000 rpm for 30 s, aspirate, then wash once in 1 mL PBSA, spin down at 12,000 rpm for 30 s, aspirate.
36. After the last wash, aspirate supernatant and leave pellet on ice or at 4 °C until ready for flow cytometry.
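The culture arithmetic in the steps above (diluting to a target OD600, aliquoting a fixed number of cells, and adding ncAA stock) can be sketched in Python as follows. This is a minimal illustration rather than part of the published protocol: the function names and the example OD600 readings are hypothetical, and the only constant taken from the text is the conversion of OD600 = 1 to 10^7 cells/mL.

```python
CELLS_PER_ML_PER_OD = 1e7  # protocol value: OD600 of 1 corresponds to 10^7 cells/mL


def volume_for_target_od(measured_od, target_od, final_volume_ml):
    """mL of culture whose pellet, resuspended in final_volume_ml, gives target_od."""
    if measured_od <= 0:
        raise ValueError("measured OD600 must be positive")
    return target_od * final_volume_ml / measured_od


def volume_for_cell_count(measured_od, n_cells):
    """mL of culture containing n_cells, using the OD600-to-density conversion."""
    if measured_od <= 0:
        raise ValueError("measured OD600 must be positive")
    return n_cells / (measured_od * CELLS_PER_ML_PER_OD)


def final_concentration_mm(stock_mm, stock_ul, culture_ml):
    """Final concentration (mM) after adding stock_ul of stock to culture_ml of culture."""
    return stock_mm * stock_ul / (culture_ml * 1000 + stock_ul)


if __name__ == "__main__":
    # Hypothetical readings, chosen only to illustrate the arithmetic.
    print(volume_for_target_od(4.0, 1.0, 2.0))          # mL of an OD600 = 4 culture for a 2 mL, OD600 = 1 induction
    print(volume_for_cell_count(2.5, 2e6) * 1000)       # uL holding two million cells at OD600 = 2.5
    print(round(final_concentration_mm(50, 40, 2), 2))  # 40 uL of 50 mM stock added to 2 mL
```

Because the added stock slightly increases the total volume, 40 μL of 50 mM stock in 2 mL gives roughly 0.98 mM, which the protocol rounds to 1 mM.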

3.6.2 Flow Cytometry

For Plasmid-Based and Yeast Strain Libraries:
1. Resuspend pellets in 500 μL cold PBSA immediately prior to use for flow cytometry.
2. On the flow cytometer, run color controls first. Monitor real-time collection results with axes set to the correct detection channels, and adjust instrument settings as needed so that data points fall within the graph. If fluorescent populations are located too high or too low on an axis, adjust the voltage for the corresponding detector and restart data collection for all samples.
3. Collect data for each sample on the flow cytometer using the detector voltages determined above.
4. Once all samples are run, export FCS files for analysis. Software such as FlowJo can be used to analyze flow cytometry data and create overlays of library populations over WT or other controls to determine phenotypic changes as they correspond to ncAA incorporation. See Note 12 for a reference guide on flow cytometry analysis. At this step, ensure that the library appears to have the desired phenotypic properties before moving on to sorting.

3.7 Sequencing Clones Isolated from Naïve Library

Prior to sorting, the newly created library must first be validated via sequencing to ensure that library diversity is equal to or greater than 10× the theoretical diversity and to confirm that mutations are occurring within regions of interest.

For Plasmid-Based Collections
1. Grow an overnight 5 mL culture with 100 μL cells recovered from electroporation.
2. Yeast miniprep according to manufacturer’s instructions using a yeast plasmid miniprep kit.
3. Run a restriction digest with the full 30 μL volume of DNA from the yeast miniprep to cleave the reporter/OTS plasmid and leave the library plasmid intact. Buffer and restriction enzyme volumes can be determined according to manufacturer’s guidelines. Ensure that the restriction enzymes are compatible with cut sites that exist within the reporter/OTS plasmid and are not found in the library plasmid. Note: If reporter/OTS and library plasmids have different antibiotic resistance, this step is not needed.
4. Transform the full volume of DNA digest into E. coli according to standard protocols (see Ref. [20]), using SOC media for the outgrowth step. Plate on selective solid growth media containing the appropriate antibiotic, compatible with the antibiotic resistance marker in the library plasmid.


5. Once individual colonies form (~12–16 h after plating), inoculate and miniprep a minimum of ten colonies to serve as representative clones from the library.
6. Sequence to determine whether individual clones contain diversity consistent with the library design. Note: With the advent of affordable full-plasmid sequencing technologies, it is recommended that at least ten clones from the plasmid library are fully sequenced. This will ensure that the phenotypes characterized are a direct result of mutations within the gene of interest (or within a desired collection of genes) and not due to off-target effects (see Note 6 on sequencing types and specifications).

For Barcoded Yeast Strain Libraries
1. Isolate genomic DNA from recovered sorts. See Note 8 for the genomic DNA isolation protocol.
2. Run PCR on genomic DNA to amplify molecular barcodes unique to each yeast strain in the library. PCR conditions are listed in Note 7.
3. Perform a PCR cleanup using a gel extraction kit following the manufacturer’s protocol.
4. Add the appropriate primers to the eluted DNA and analyze via Sanger sequencing to identify candidate yeast strains and ensure strain diversity within the transformed library.

3.8 Fluorescence-Activated Cell Sorting

The preliminary characterizations of the naïve library must meet a specific set of criteria prior to sorting: (1) Flow cytometry data validates that the reporter system chosen is compatible with the library of choice, and the library population exhibits fluorescence characteristics consistent with the library design; (2) Sequences in library contain expected types of diversification; and (3) Library diversity is adequately covered. Once all three sorting prerequisites are met, the naïve library can be screened using FACS.
1. Prepare cells, including controls, for flow cytometry analysis as described in Subheading 3.6.1, but at step 8 calculate the volume of cells needed to obtain 10× the library size to maintain library diversity. For example, for a one million member library, calculate the volume needed for ten million cells. Controls can be reduced to two million cells for analysis since they will not be sorted (coverage of diversity is not necessary during analysis).
2. Starting with the color controls, resuspend pellet in 1 mL cold PBSA, pass cells through polystyrene test tubes with cell strainer, and insert tube into loading stage.


3. In the FACS software, set up dot plots with axes corresponding to the channels that detect the fluorescence emissions of the reporters used.
4. Run analyses on control samples to set detector voltages. Collect data on controls and on a sample of the library and set gates to collect cells exhibiting fluorescence that corresponds to enhanced ncAA incorporation (high levels of C-terminal tag detection in comparison to N-terminal tag detection).
5. Once the gates are set, prepare an empty polystyrene collection tube with 1 mL of SD-SCAA –RepDO –LibDO for recovery. The tube volume may vary depending on the sorter used. Note: pipette media along the rim to ensure cells slide to the media at the bottom and do not get stuck on the side of the tube when sorting.
6. Place collection tube in the sorter collection device at the designated sorting position.
7. When ready to sort the library, resuspend pelleted cells in 1 mL cold PBSA, and pass cells through a cell strainer into a polystyrene collection tube for sorting.
8. Place the filtered tube of library cells in the loading stage, and begin sorting for enhanced ncAA incorporation.
9. Sort a number of events corresponding to at least ten times the library diversity (and collect approximately the top 0.1–1% of events), then stop the sort, pull the recovery tube from the sorter, and add 1 mL of selective dextrose media (SD-SCAA –RepDO –LibDO) to the rim to move collected cells to the bottom of the collection tube. Leave the tube on ice or at 4 °C until remaining samples have been sorted.
10. Transfer recovered cells to 5 mL SD-SCAA –RepDO –LibDO + pen-strep and grow at 30 °C with shaking at 300 rpm for ~3 days until saturated.

3.8.1 Analytical Flow Cytometry on Recovered Sorts

1. To characterize recovered populations for ncAA incorporation via flow cytometry, passage, dilute, and induce cells from the saturated culture as described in Subheading 3.6.1. In parallel, prepare the naïve library as described in Subheading 3.6.1 to serve as a control to evaluate for enrichment (or, if multiple rounds of sorts have been completed, prepare the population isolated in the prior round of sorting).
2. Conduct flow cytometry analysis on induced presort and postsort samples to determine if increases in fluorescence corresponding to enhanced ncAA incorporation have occurred, indicating a successful sort (see Note 13 for examples of phenotypic analyses).
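One simple way to put a number on the presort versus postsort comparison is to apply the same gate to both samples and compare the fraction of events that fall inside it. The sketch below is only an illustration under stated assumptions: events are given as (N-terminal, C-terminal) fluorescence pairs that have already been exported from the analysis software, and the gate (a minimum N-terminal signal plus a minimum C-to-N ratio), the function names, and the toy values are hypothetical rather than taken from the protocol.

```python
def fraction_in_gate(events, min_n_signal, min_c_to_n_ratio):
    """Fraction of (n_term, c_term) events inside a hypothetical ratio gate."""
    if min_n_signal <= 0:
        raise ValueError("min_n_signal must be positive to avoid dividing by zero")
    events = list(events)
    if not events:
        raise ValueError("no events supplied")
    hits = sum(
        1 for n_term, c_term in events
        if n_term >= min_n_signal and c_term / n_term >= min_c_to_n_ratio
    )
    return hits / len(events)


def fold_enrichment(presort, postsort, min_n_signal, min_c_to_n_ratio):
    """Ratio of gated fractions (postsort / presort) using the same gate for both."""
    pre = fraction_in_gate(presort, min_n_signal, min_c_to_n_ratio)
    post = fraction_in_gate(postsort, min_n_signal, min_c_to_n_ratio)
    if pre == 0:
        raise ValueError("no presort events in gate; fold enrichment is undefined")
    return post / pre


if __name__ == "__main__":
    # Toy fluorescence values for illustration only.
    presort = [(1000, 100), (1200, 150), (900, 800), (50, 40), (1100, 90)]
    postsort = [(1000, 700), (950, 600), (1100, 200), (1000, 900)]
    print(fold_enrichment(presort, postsort, min_n_signal=500, min_c_to_n_ratio=0.5))
```

A fold enrichment above 1 is consistent with a successful sort, but overlays of the full distributions remain the primary readout.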

3.8.2 Sorted Library Characterization

1. Isolate and sequence individual clones from the sorted library by repeating Subheading 3.7 starting with populations recovered from FACS.
(a) For plasmid-based libraries: If using Sanger sequencing in lieu of full plasmid sequencing to investigate gene(s) of interest, submit plasmids and sequencing primers according to the guidance of the sequencing facility of interest.
(b) For yeast strain libraries: Follow protocols for PCR amplifying molecular barcodes prior to submitting samples for sequencing (see Notes 6, 7, and 8 for information on sequencing platforms, PCR protocol, and genomic DNA isolation protocol, respectively).
2. In addition to sequencing individual sorted clones, retransform plasmids encoding variants of interest (plasmid-based screens) or acquire separate strains of interest (strain-based screens) and run additional flow cytometry analysis to evaluate enhancement of ncAA incorporation with candidate genetic/genomic variants. Flow cytometry and sequencing data can be used in combination with one another to determine which mutations result in enhanced ncAA incorporation (see Note 14 for information on where to acquire individual strains).
3. Relative Readthrough Efficiency (RRE) and Maximum Misincorporation Frequency (MMF) measurements can be performed to enable more rigorous determination of ncAA incorporation efficiency and fidelity (see Ref. [21] for the full protocol; see also Note 13 for more information on RRE and MMF calculations). Note: single clones of candidate plasmids encoding variants of interest or candidate strains must be retransformed with both a pSPS-RepTAG-OTS and a pSPS-Rep-OTS plasmid in order to generate the data required to perform RRE and MMF calculations. Candidate variants can be compared to parent strains transformed with the same reporter-OTS plasmids (pSPS-RepTAG-OTS and pSPS-Rep-OTS), which serve as a control in determining the phenotypic effects of plasmid/strain variations.

3.9 Deep Sequencing

Deep sequencing uses next-generation sequencing to generate tens of thousands to millions of sequencing reads on populations of clones (variants of individual genes or barcodes corresponding to genomic variations). In the types of screens described here, deep sequencing enables evaluation of which mutations (or genes) are most highly enriched in populations exhibiting phenotypes consistent with improved ncAA incorporation. This requires performing sequencing on both the naïve library and one or more enriched populations that exhibit phenotypes of interest. Comprehensive insights into the landscape of mutations that are enriched in individual genes of interest, or the genes that are enriched in screens of strains, can be gained from these approaches.
1. Design and order primers with or without adapter sequences as outlined by the deep sequencing platform used.
2. Isolate DNA from both the naïve library and the recovered sorted library populations using a yeast plasmid miniprep kit according to the manufacturer’s protocol.
3. Perform a PCR amplification according to standard protocols to amplify a gene of interest (or barcodes of interest) in both the naïve and sorted library populations. If using short-read Illumina sequencing and the region to be amplified is greater than 400 bp, design primers to amplify multiple gene fragments, each no larger than 400 bp, to achieve the cleanest sequencing reads.
4. Submit samples for sequencing according to sequencing service or sequencing core guidelines.
5. Process and analyze data to eliminate low-quality sequencing reads and then determine which mutations (genes) are enriched in populations sorted for enhanced ncAA incorporation. See Ref. [22] for more information on deep sequencing analysis.
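As a rough illustration of the enrichment comparison in step 5, the sketch below computes a log2 ratio of each variant's frequency in the sorted population to its frequency in the naïve library. It assumes read counts per variant (or per barcode) have already been obtained after quality filtering; the pseudocount, the function name, and the example counts are assumptions made for illustration and are not the analysis pipeline of Ref. [22].

```python
import math


def enrichment_scores(naive_counts, sorted_counts, pseudocount=0.5):
    """log2(frequency in sorted / frequency in naive) for every observed variant.

    A positive pseudocount (an assumption of this sketch) keeps variants that are
    missing from one population from producing a division by zero.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    variants = set(naive_counts) | set(sorted_counts)
    naive_total = sum(naive_counts.values()) + pseudocount * len(variants)
    sorted_total = sum(sorted_counts.values()) + pseudocount * len(variants)
    scores = {}
    for variant in variants:
        f_naive = (naive_counts.get(variant, 0) + pseudocount) / naive_total
        f_sorted = (sorted_counts.get(variant, 0) + pseudocount) / sorted_total
        scores[variant] = math.log2(f_sorted / f_naive)
    return scores


if __name__ == "__main__":
    # Hypothetical read counts keyed by variant or barcode name.
    naive = {"variant_A": 100, "variant_B": 100, "variant_C": 50}
    sorted_pop = {"variant_A": 300, "variant_B": 100}
    ranked = sorted(enrichment_scores(naive, sorted_pop).items(), key=lambda kv: kv[1], reverse=True)
    for name, score in ranked:
        print(name, round(score, 2))
```

Variants with the highest positive scores are candidates for follow-up; low read counts make these ratios noisy, so a minimum-count filter is usually worth applying first.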

4 Notes

1. Auxotrophic marker selection: The reporter plasmids used should have an auxotrophic marker gene that is compatible with the yeast strain being used. Additionally, if using a plasmid-based library for a genome-wide screen, ensure that the library contains an auxotrophic marker gene that is also compatible with the yeast strain used, and one different from the auxotrophic marker used in the reporter plasmid. For example, for the yeast strain BY4741 (genotype: MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0), compatible auxotrophic marker genes would be HIS3, LEU2, MET15 (not commonly used), or URA3.
2. Synthetic drop-out mix recipe: Weigh out each of the components in the table below (omitting the amino acids that will be dropped out). Blend the amino acid mixture until homogeneous. Store the mixture in 50 mL Falcon tubes at 4 °C.


Amino acid        Mass (g)    Amino acid                Mass (g)
Adenine           0.5         Leucine                   10.0
Alanine           2.0         Lysine                    2.0
Arginine          2.0         Methionine                2.0
Asparagine        2.0         Para-aminobenzoic acid    0.2
Aspartic acid     2.0         Phenylalanine             2.0
Cysteine          2.0         Proline                   2.0
Glutamine         2.0         Serine                    2.0
Glutamic acid     2.0         Threonine                 2.0
Glycine           2.0         Tryptophan                2.0
Histidine         2.0         Tyrosine                  2.0
Inositol          2.0         Uracil                    2.0
Isoleucine        2.0         Valine                    2.0

3. Noncanonical amino acid stock recipe: For 5 mL of a 50 mM stock of the L-isomer of a noncanonical amino acid: MW (g/mol) × 0.00025 mol = mass in grams of the L-isomer of the ncAA needed. For ncAAs available only as racemic mixtures, double the mass in grams of the DL-mixture to obtain the mass needed for 50 mM of the L-isomer. Use ncAA stock solutions immediately or store at 4 °C. For photosensitive ncAAs, cover in foil to protect from light. The stability of ncAAs in solution at 4 °C can vary considerably depending on the ncAA and thus should be monitored carefully.
4. Chemically competent E. coli can be purchased or made using commercially available kits. E. coli transformations in our lab are generally performed using strain DH5αZ1, though other strains may be used.
5. Working concentrations of common antibiotics: (1) Ampicillin/Carbenicillin: 50 μg/mL, (2) Kanamycin: 34 μg/mL, (3) Zeocin: 25 μg/mL.
6. Sequencing platforms: If a known portion of a gene is being diversified, primers can be ordered to sequence just the gene of interest (GOI). These primers can be added via PCR amplification with the target DNA. If plasmid DNA has been isolated from E. coli, sequencing primers and plasmids can be submitted directly for sequencing. Otherwise, full plasmid sequencing can be done, which typically only requires purified plasmid DNA isolated from E. coli to be sent. For a comprehensive enrichment analysis of the frequency with which each mutation appears in naïve versus sorted libraries, deep sequencing can be performed. See Subheading 3.9 and references therein for further details.
7. Example of a PCR amplification, here using a Q5 polymerase:
(a) In a PCR tube, add 10 μL Q5 reaction buffer, 2.5 μL forward primer, 2.5 μL reverse primer, 1.0 μL dNTPs, 1.0 μL DMSO, 0.5 μL Q5 Polymerase, 31.5 μL diH2O, and 1.0 μL DNA from a single colony.
(b) Run PCR using the following cycling conditions (Note: Annealing temperature can vary):

Step               Temperature (°C)    Time      Cycles
Initialization     98                  30 s
Denaturation       98                  10 s      35 cycles
Annealing          61                  30 s      35 cycles
Extension          72                  1 min     35 cycles
Final extension    72                  4 min
Final hold         4                   ∞

(c) Run PCR product on an agarose gel and extract the appropriate band of DNA for sequencing.
8. Genomic DNA isolation protocol:
(a) Plate 50 μL of recovered cell culture on solid YPD growth medium (+ antibiotic selection if necessary).
(b) Once colonies have formed (~1–2 days later), inoculate individual colonies into 2 mL YPD culture (+ antibiotic selection if necessary). Repeat this step for a total of 10–20 inoculations of individual colonies. Incubate at 30 °C overnight at a 45 degree angle, shaking at 300 rpm.
(c) Transfer 1–1.5 mL saturated cultures into microcentrifuge tubes and spin down at 14,000 rpm for 2 min. Aspirate the supernatant.
(d) Resuspend pellet in 200 μL of 200 mM Lithium Acetate 1% SDS solution, vortex cells, and incubate either at room temperature for 10–15 min or at 65 °C for 5–10 min.
(e) Add 600 μL of 100% Ethanol. Briefly vortex cells and centrifuge tubes at 14,000 rpm for 3 min.


(f) Aspirate supernatant. Wash pellet with 500 μL of 70% Ethanol and centrifuge at 14,000 rpm for 3 min.
(g) Aspirate supernatant. Let the pellet dry completely.
(h) Once dry, resuspend the pellet in 600 μL diH2O.
9. Antibiotic selection: For libraries containing an antibiotic resistance marker, grow in the presence of the appropriate antibiotic for selection. For example, libraries with a KanMX cassette should be grown in media supplemented with G418.
10. Transformation efficiency: For a gene library prepared via random mutagenesis with a target mutation rate of one to two mutations per gene, attempt to obtain at least one million transformants. For strain-based libraries, the number of transformants needed to confidently cover library diversity should be 10× the number of strains contained in the pooled collection. For libraries created via saturation mutagenesis, or other mutagenesis strategies in which a theoretical diversity can be calculated, attempt to obtain a number of transformants corresponding to at least 10× the theoretical diversity of the library design whenever possible. See Ref. [23] for information on how to calculate theoretical diversity.
11. Color controls for flow cytometry and FACS: When using a yeast display system that utilizes N-terminal and C-terminal tags, it is important to have three color controls: an unlabeled control population that will indicate cellular autofluorescence, a control population that is fluorescently labeled with only the reagents for N-terminal detection, and a control population that is fluorescently labeled with only the reagents for C-terminal detection. When using dual-fluorescent reporters, it is recommended to use individual fluorescent proteins as color controls. For more information on choosing color controls for yeast display and intracellular reporter systems, see Ref. [21].
12. For an in-depth guide on flow cytometry analysis, see Ref. [21].
13. To calculate stop codon readthrough efficiency and ncAA incorporation fidelity, relative readthrough efficiency (RRE) and maximum misincorporation frequency (MMF) can be calculated with the equations below. Note that these calculations require data generated from cells transformed with a TAG reporter induced in the presence and absence of a ncAA, and cells transformed with a WT reporter (no TAG) induced in the presence and absence of a ncAA. See Ref. [21] for information on how to generate data for RRE and MMF. For enhanced ncAA incorporation systems, high efficiency and low misincorporation are desired. These metrics are calculated on a scale from 0 to 1. A ncAA incorporation system that exhibits the properties of wild-type translation would have an RRE of 1 and an MMF of 0.
(a) RRE:
$$\mathrm{RRE} = \frac{\left(\dfrac{\text{C-terminus detection for RepTAG}}{\text{N-terminus detection for RepTAG}}\right)}{\left(\dfrac{\text{C-terminus detection for Rep}}{\text{N-terminus detection for Rep}}\right)}$$
(b) MMF:
$$\mathrm{MMF} = \frac{\mathrm{RRE}_{-\text{ncAA}}}{\mathrm{RRE}_{+\text{ncAA}}}$$

14. Individual strains can be acquired from repositories including Euroscarf and companies including Horizon Discovery.
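The RRE and MMF definitions in Note 13 translate directly into a few lines of Python. The sketch below is a minimal illustration that assumes each input is a single summary fluorescence value per sample (for example, a background-corrected median; how those values are obtained is covered in Ref. [21] and is not reproduced here). The function names and the example numbers are hypothetical.

```python
def rre(c_reptag, n_reptag, c_rep, n_rep):
    """Relative readthrough efficiency: (C/N for RepTAG) divided by (C/N for Rep)."""
    if n_reptag == 0 or n_rep == 0 or c_rep == 0:
        raise ValueError("N-terminal signals and the WT C-terminal signal must be nonzero")
    return (c_reptag / n_reptag) / (c_rep / n_rep)


def mmf(rre_minus_ncaa, rre_plus_ncaa):
    """Maximum misincorporation frequency: RRE without ncAA divided by RRE with ncAA."""
    if rre_plus_ncaa == 0:
        raise ValueError("RRE in the presence of ncAA must be nonzero")
    return rre_minus_ncaa / rre_plus_ncaa


if __name__ == "__main__":
    # Hypothetical fluorescence values for one candidate, for illustration only.
    rre_plus = rre(c_reptag=400, n_reptag=1000, c_rep=800, n_rep=1000)   # induced with ncAA
    rre_minus = rre(c_reptag=8, n_reptag=1000, c_rep=800, n_rep=1000)    # induced without ncAA
    print("RRE (+ncAA):", rre_plus)
    print("RRE (-ncAA):", rre_minus)
    print("MMF:", mmf(rre_minus, rre_plus))
```

Computing the same two values for the parent strain alongside each candidate gives the comparison described in Subheading 3.8.2.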

Acknowledgments

Research in the Van Deventer Lab on enhancing the efficiency of genetic code expansion in yeast is supported by a grant from the National Institute of General Medical Sciences of the National Institutes of Health (R35GM133471). The content of this work is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The authors would like to thank Rebecca Hershman, Arlinda Rezhdo, and Jessica Stieglitz for their insightful comments and feedback on these protocols.

References

1. Rezhdo A, Islam M, Huang M, Van Deventer JA (2019) Future prospects for noncanonical amino acids in biological therapeutics. Curr Opin Biotechnol 60:168–178. https://doi.org/10.1016/j.copbio.2019.02.020
2. Katoh T, Suga H (2022) In vitro genetic code reprogramming for the expansion of usable noncanonical amino acids. Annu Rev Biochem 91:221–243. https://doi.org/10.1146/annurev-biochem-040320-103817
3. Johnson JA, Lu YY, Van Deventer JA, Tirrell DA (2010) Residue-specific incorporation of non-canonical amino acids into proteins: recent developments and applications. Curr Opin Chem Biol 14(6):774–780. https://doi.org/10.1016/j.cbpa.2010.09.013
4. Murakami H, Hohsaka T, Ashizuka Y, Sisido M (1998) Site-directed incorporation of p-nitrophenylalanine into streptavidin and site-to-site photoinduced electron transfer from a pyrenyl group to a nitrophenyl group on the protein framework. J Am Chem Soc 120(30):7520–7529. https://doi.org/10.1021/ja971890u
5. Hohsaka T, Ashizuka Y, Murakami H, Sisido M (2001) Five-base codons for incorporation of nonnatural amino acids into proteins. Nucleic Acids Res 29(17):3646–3651. https://doi.org/10.1093/nar/29.17.3646
6. Bain JD, Switzer C, Chamberlin AR, Benner SA (1992) Ribosome-mediated incorporation of a non-standard amino acid into a peptide through expansion of the genetic code. Nature 356(6369):537–539. https://doi.org/10.1038/356537a0
7. Robertson WE, Funke LFH, de la Torre D, Fredens J, Elliott TS, Spinck M, Christova Y, Cervettini D, Boge FL, Liu KC, Buse S, Maslen S, Salmond GPC, Chin JW (2021) Sense codon reassignment enables viral resistance and encoded polymer synthesis. Science 372(6546):1057–1062. https://doi.org/10.1126/science.abg3029
8. Richardson SM, Mitchell LA, Stracquadanio G, Yang K, Dymond JS, DiCarlo JE, Lee D, Huang CL, Chandrasegaran S, Cai Y, Boeke JD, Bader JS (2017) Design of a synthetic yeast genome. Science 355(6329):1040–1044. https://doi.org/10.1126/science.aaf4557
9. Luo Z, Wang L, Wang Y, Zhang W, Guo Y, Shen Y, Jiang L, Wu Q, Zhang C, Cai Y, Dai J (2018) Identifying and characterizing SCRaMbLEd synthetic yeast using ReSCuES. Nat Commun 9(1):1930. https://doi.org/10.1038/s41467-017-00806-y
10. Giaever G, Nislow C (2014) The yeast deletion collection: a decade of functional genomics. Genetics 197(2):451–465. https://doi.org/10.1534/genetics.114.161620
11. Arita Y, Kim G, Li Z, Friesen H, Turco G, Wang RY, Climie D, Usaj M, Hotz M, Stoops EH, Baryshnikova A, Boone C, Botstein D, Andrews BJ, McIsaac RS (2021) A genome-scale yeast library with inducible expression of individual genes. Mol Syst Biol 17(6):e10207. https://doi.org/10.15252/msb.202110207
12. Gelperin DM, White MA, Wilkinson ML, Kon Y, Kung LA, Wise KJ, Lopez-Hoyo N, Jiang L, Piccirillo S, Yu H, Gerstein M, Dumont ME, Phizicky EM, Snyder M, Grayhack EJ (2005) Biochemical and genetic analysis of the yeast proteome with a movable ORF collection. Genes Dev 19(23):2816–2826. https://doi.org/10.1101/gad.1362105
13. Kumar A (2016) Multipurpose transposon-insertion libraries in yeast. Cold Spring Harb Protoc 2016(6). https://doi.org/10.1101/pdb.top080259
14. Zackin MT, Stieglitz JT, Van Deventer JA (2022) Genome-wide screen for enhanced noncanonical amino acid incorporation in yeast. ACS Synth Biol 11(11):3669–3680. https://doi.org/10.1021/acssynbio.2c00267
15. Boder ET, Wittrup KD (2000) Yeast surface display for directed evolution of protein expression, affinity, and stability. Methods Enzymol 328:430–444. https://doi.org/10.1016/s0076-6879(00)28410-3
16. Potts KA, Stieglitz JT, Lei M, Van Deventer JA (2020) Reporter system architecture affects measurements of noncanonical amino acid incorporation efficiency and fidelity. Mol Syst Des Eng 5(2):573–588. https://doi.org/10.1039/c9me00107g
17. Stieglitz JT, Potts KA, Van Deventer JA (2021) Broadening the toolkit for quantitatively evaluating noncanonical amino acid incorporation in yeast. ACS Synth Biol 10(11):3094–3104. https://doi.org/10.1021/acssynbio.1c00370
18. Boder ET, Wittrup KD (1997) Yeast surface display for screening combinatorial polypeptide libraries. Nat Biotechnol 15(6):553–557. https://doi.org/10.1038/nbt0697-553
19. Van Deventer JA, Wittrup KD (2014) Yeast surface display for antibody isolation: library construction, library screening, and affinity maturation. Methods Mol Biol 1131:151–181. https://doi.org/10.1007/978-1-62703-992-5_10
20. Addgene Protocol-Bacterial Transformation. https://www.addgene.org/protocols/bacterial-transformation/
21. Stieglitz JT, Van Deventer JA (2022) Incorporating, quantifying, and leveraging noncanonical amino acids in yeast. Methods Mol Biol 2394:377–432. https://doi.org/10.1007/978-1-0716-1811-0_21
22. Shomron N (2013) Deep sequencing data analysis. Methods in molecular biology. Springer. https://doi.org/10.1007/978-1-62703-514-9
23. Bosley AD, Ostermeier M (2005) Mathematical expressions useful in the construction, description and evaluation of protein libraries. Biomol Eng 22(1–3):57–61. https://doi.org/10.1016/j.bioeng.2004.11.002

Chapter 15

Positive Selection Screens for Programmable Endonuclease Activity Using I-SceI

Michael A. Mechikoff, Kok Zhi Lee, and Kevin V. Solomon

Abstract

Positive selection screens are high-throughput assays to characterize novel enzymes from environmental samples and enrich for more powerful variants from libraries in applications such as biodiversity mining and directed evolution. However, overly stringent selection can limit the power of these screens due to a high false-negative rate. To create a more flexible and less restrictive screen for novel programmable DNA endonucleases, we developed a novel I-SceI-based platform. In this system, mutant E. coli genomes are cleaved upon induction of I-SceI to inhibit cell growth. Growth is rescued in an activity-dependent manner by plasmid curing or cleavage of the I-SceI expression plasmid via endonuclease candidates. More active candidates more readily proliferate and overtake growth of less active variants, leading to enrichment. While demonstrated here with Cas9, this protocol can be readily adapted to any programmable DNA endonuclease and used to characterize single candidates or to enrich more powerful variants from pooled candidates or libraries.

Key words Endonuclease, CRISPR, I-SceI, Positive selection, Directed evolution, Enrichment

1 Introduction

DNA endonucleases form the foundation of all biotechnology as they enable the targeted manipulation of DNA sequences. These enzymes recognize diverse substrates that are either inherent to the enzyme or programmed. Among the most important are restriction enzymes [1], homing DNA endonucleases [2], engineered meganucleases such as zinc finger nucleases [3, 4], and programmable endonucleases such as CRISPR-Cas [5]. However, target specificity and activity vary greatly among these enzymes leading to a gold rush to engineer and/or identify more precise endonucleases with high activity [6–8]. These efforts are supported by the recent explosion of environmental genomic datasets [9] and powerful

Michael A. Mechikoff and Kok Zhi Lee contributed equally. Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_15, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024


Michael A. Mechikoff et al.

tools of synthetic biology (e.g., directed evolution [7, 10]), which deliver an exponentially growing list of candidate enzymes to be evaluated. This list grows more quickly than candidates are evaluated leading to a need for high-throughput screening approaches for DNA endonuclease activity. Positive selection screens are powerful high-throughput tools for the discovery and characterization of natural and synthetic proteins [11–13]. Core to a positive selection screen is the linkage of cell survival to enzyme function. A common approach for DNA endonuclease activity is to link activity to curing of a plasmid expressing a lethal gene [7, 14]. While simple, this can be nontrivial to devise as complex phenotypes such as survival may lack sensitivity or be too stringent. For example, expression of barnase or bacterial RNAse is toxic outside of its native host in the absence of its intracellular inhibitor [15]. While this system has been successfully exploited to select for novel DNA endonucleases that can cure plasmids expressing barnase [14], the high stringency of barnase required amber mutations in barnase and suppressor tRNA to function. Even then, activity from high-efficiency homing endonucleases could rescue only up to ~80% of cell viability suggesting that less active enzymes, with potentially superior properties such as better target discrimination, may be missed. Other common toxins such as ccdB share similar stringency concerns and require exotic engineering solutions to reduce the number of false negatives [7]. While such stringency is sought after for directed evolution, the high false-negative rate will miss many promising DNA endonucleases from environmental samples that are less active but possess unique properties (e.g., improved accuracy in target recognition). Thus, we developed a less stringent DNA endonuclease selection assay to identify and characterize novel enzymes. 
In this system, we repurpose the homing DNA endonuclease, I-SceI, as a conditionally lethal gene for the selection of novel endonucleases [16] (Fig. 1). I-SceI recognizes and cleaves the unique 18 nt sequence TAGGGATAACAGGGTAAT, which is non-native to E. coli [17]. We introduced two copies of this sequence into E. coli MG1655 (DE3) via recombineering to create strain KS165, an I-SceI-susceptible strain whose genome is cleaved, with loss of viability, when I-SceI is expressed. Induction of I-SceI in this strain results in a 3-log (99.9%) reduction in cell viability, providing a wide dynamic range for the detection of novel enzyme activities [16]. Targeting this lethal I-SceI plasmid for curing with a DNA endonuclease, such as CRISPR-Cas9, completely rescues growth. Thus, compared with previous approaches, our system is less stringent and easier to implement. Despite the reduced stringency, this system is still able to enrich for more active DNA endonucleases [16], making it suitable for both biodiversity mining and directed evolution applications.
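Two quantities in the paragraph above are easy to check computationally: whether a given sequence contains the 18 nt I-SceI recognition site on either strand, and how a log reduction in viability maps to a percentage. The sketch below is a minimal illustration; the recognition sequence is the one quoted in the text, while the function names and the short test sequence are hypothetical and do not represent the KS165 genome.

```python
I_SCEI_SITE = "TAGGGATAACAGGGTAAT"  # 18 nt recognition sequence quoted in the text


def reverse_complement(seq):
    """Reverse complement of an uppercase or lowercase DNA string (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq, site=I_SCEI_SITE):
    """Return sorted (0-based start, strand) tuples for every occurrence of the site."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", site.upper()), ("-", reverse_complement(site))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


def percent_reduction(log10_reduction):
    """Percent loss of viable cells for a given log10 reduction."""
    return 100 * (1 - 10 ** (-log10_reduction))


if __name__ == "__main__":
    # Hypothetical sequence with one site on each strand, for illustration only.
    toy = "AAAA" + I_SCEI_SITE + "CCCC" + reverse_complement(I_SCEI_SITE) + "GG"
    print(find_sites(toy))
    print(percent_reduction(3))
```

The same search can be run on a candidate host genome or plasmid sequence to confirm that the site is absent before it is introduced.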

Positive Selection for Endonuclease Activity


Fig. 1 Overview of the positive selection system. Our system uses two plasmids: a lethal plasmid and a rescue plasmid. The lethal plasmid encodes a gene that cleaves the host genome inhibiting viability. The rescue plasmid harbors an endonuclease of choice, targeted to the lethal plasmid. Proper endonuclease targeting and cleavage of the lethal plasmid result in healthy, growing cells, while improper targeting or inefficient cleaving of the lethal plasmid results in cell death

2 Materials

2.1 Cell Culture and Cultivation

1. Miller LB: Mix 10 g tryptone, 10 g NaCl, and 5 g yeast extract in 975 mL ddH2O. Make up to 1 L with water and sterilize by autoclaving.
2. SOC: Mix 20 g tryptone, 5 g yeast extract, and 0.5 g NaCl in 939.5 mL ddH2O. Add 10 mL of 250 mM KCl. Autoclave to sterilize. Aseptically add 5 mL of sterile-filtered 2 M MgCl2 and 20 mL of sterile-filtered 1 M glucose.
3. Antibiotics: Ampicillin (100 μg/mL), kanamycin (50 μg/mL), and tetracycline (10 μg/mL), sterile filtered (0.20 μm filter) (see Note 1).
4. 0.85% NaCl: Mix 8.5 g of NaCl with 1000 mL water. Autoclave to sterilize.
5. Inducers: L-Arabinose (0.2% w/v), aTc (200 ng/mL), and rhamnose (0.2% w/v) (see Note 1).
6. E. coli KS165 (MG1655 (DE3) nth::ISceIrs-tetA-ISceIrs), available upon request.
7. Serological pipet.
8. Sterile Falcon tubes.
9. Sterile 15 mL culture tubes.
10. Sterile 1.5 mL microcentrifuge tubes.
11. Sterile 1 L shake flasks.


Michael A. Mechikoff et al.

Fig. 2 Schematic of lethal and rescue plasmids. The lethal plasmid encodes homing endonuclease I-SceI under the arabinose-inducible BAD promoter. The rescue plasmid encodes the programmable DNA endonuclease to be tested (e.g., Cas9 mutants in the provided plasmids), driven by an aTc-inducible Tet promoter. The rescue plasmid also includes guide RNA targeting the lethal plasmid under the rhamnose-inducible rhamBAD promoter. The restriction enzyme sites, SalI and SpeI, flank the endonuclease for alternative endonuclease cloning

12. 10% (v/v) glycerol.
13. 1 mm gap electroporation cuvette.
14. Electroporator.
15. LB agar plates.

2.2 Molecular Biology Reagents

1. Restriction enzymes: BglII, EcoRI, PstI, SalI, SpeI (BcuI), and XhoI (see Note 2).
2. Polymerase: Phusion High-Fidelity PCR Master Mix with HF Buffer.
3. Ligase: T4 Ligase.
4. Plasmids: pColEI-ISceI (Addgene #179578; lethal plasmid), pFREE2-eSpCas9-sgRNA (Addgene #179580; rescue plasmid), and, optionally, pColEI-ISceI-D44S (available upon request) (see Note 3) (Fig. 2).

3 Methods

3.1 Electrocompetent Cell Creation

This protocol is used to generate competent cells that can be transformed with plasmids used to validate I-SceI lethality (see Subheading 3.3) or to test programmable endonuclease activity (see Subheading 3.4). Specific strains and antibiotics to be used depend on the protocol step being followed.


1. Inoculate from a single colony or freezer stock and grow each strain in 5 mL LB with antibiotics (if appropriate) for 16–20 h in a sterile culture tube.
2. Dilute the overnight culture 100× (2 mL) into 200 mL LB with antibiotics (if appropriate) in a 1 L shake flask.
3. Grow the cells until OD600 = 0.5.
4. Transfer the cells to two 50 mL sterile Falcon tubes (keep the cells on ice for this step).
5. Centrifuge to pellet the cells at 4000 RPM for 5 min at 4 °C.
6. Pour off the supernatant promptly.
7. Repeat steps 4–6 until all 200 mL of the culture is pelleted, using the same two Falcon tubes to concentrate the cells.
8. Add 50 mL of prechilled ddH2O to each Falcon tube (wash step).
9. Vortex gently to resuspend the cells.
10. Centrifuge to pellet the cells at 4000 RPM for 5 min at 4 °C.
11. Pour off the supernatant promptly.
12. Repeat steps 7–11 once.
13. Add 50 mL of prechilled 10% glycerol to each Falcon tube.
14. Vortex gently to resuspend the cells.
15. Centrifuge to pellet the cells at 4000 RPM for 5 min at 4 °C.
16. Use a serological pipet to remove the supernatant (see Note 5).
17. Resuspend the pellet in 2 mL of 10% glycerol.
18. Aliquot 50 μL of the cells into each of 40 sterile 1.5 mL microcentrifuge tubes.
19. Store the cells at −80 °C or proceed with transformation promptly.

3.2 Transformation (Electroporation)

This protocol is used to introduce the plasmids to be tested into competent cells.
1. Thaw the competent cells on ice.
2. Mix 30–100 μg of plasmid DNA with the competent cells.
3. Transfer the cells into a prechilled electroporation cuvette (1 mm gap), ensuring that there are no air bubbles between the sample and the bottom of the cuvette.
4. Set the electroporator to 1.8 kV, 25 μF, and 200 Ω.
5. Place the cuvette into the electroporator and complete the electroporation according to the instrument protocol.
6. Resuspend the cells in 1 mL SOC and transfer them into a sterile 1.5 mL Eppendorf tube.


7. Let the cells grow for 1 h at 37 °C, shaking at 250 RPM, to recover.
8. Centrifuge at 12,000 RPM to pellet the cells.
9. Discard the supernatant.
10. Resuspend the pellet in 20 μL LB.
11. Streak on an agar plate with antibiotics (if applicable).
12. Grow the cells on LB agar plates overnight at 37 °C.

3.3 I-SceI Lethality Validation (Fig. 3)

The first step in performing the positive selection system is to ensure the lethal plasmid is properly killing the KS165 strain. Because the system relies directly on the lethal plasmid’s ability to cause cell death, this validation is a good check before introducing the rescue plasmid. To ensure the lethal plasmid is causing cell death in KS165, use the following protocol:
1. Prepare electrocompetent KS165 cells in tetracycline media (see Subheading 3.1).
2. Transform 50 ng pColEI-ISceI and 50 ng pColEI-ISceI-D44S (catalytic mutant) into KS165 competent cells (see Subheading 3.2 and Note 3).
3. Recover in SOC for 1 h at 37 °C with tetracycline, shaking at 250 RPM (see Subheading 3.2 and Note 6).
4. Plate on LB with ampicillin for 16 h.
5. Pick one colony and inoculate into 3 mL LB in a culture tube with ampicillin. Grow at 37 °C for 4 h, 250 RPM.

Fig. 3 Induction of the lethal plasmid results in cell death. (a) Liquid cultures harboring the lethal plasmid, pColEI-ISceI, after 4 h of growth in LB/amp postinduction. Col 3A is an induced replicate (cell death), Col 3- is an uninduced replicate (control, cell growth). (b) Plated cultures after serial dilution. The lethal plasmid, pColEI-ISceI (I-SceI Wild Type) showed sufficient cell death when induced. Meanwhile, the I-SceI catalytic mutant plasmid, pColEI-ISceI-D44S, resulted in extensive cell growth regardless of the presence of the inducer, confirming that wild-type I-SceI on pColEI-ISceI is responsible for cell death when induced


6. Inoculate 5 mL of fresh LB containing ampicillin and 10 mM arabinose in a culture tube with 50 μL of the culture from step 5. Repeat this step without arabinose as a control (see Note 7).
7. Take the initial OD600 reading immediately after dilution.
8. Grow at 37 °C for 4 h, shaking at 250 RPM.
9. Take the final OD600 reading (see Note 8).
10. From the 5 mL cultures, dilute 45 μL into 10 mL of sterile 0.85% NaCl. Then, from this new 10 mL tube, immediately dilute another 45 μL into a separate 10 mL of sterile 0.85% NaCl. Finally, plate 50 μL of this last dilution on LB agar with ampicillin and grow overnight (~12 h) at 37 °C.

3.4 Nuclease Activity Assay (Fig. 4)

In this protocol, cell survival is directly linked to proper endonuclease targeting and cleaving activity. Growth, or the % rescue, directly scales with endonuclease activity and can be used to easily estimate endonuclease activity without time-intensive real-time protocols. This can be completed in liquid culture and automated on a plate reader or via cell counts. Typical results are depicted in Fig. 5 using SpCas9 as the endonuclease. Relative endonuclease activity of cloned endonucleases in the rescue plasmid can be measured as follows:

Fig. 4 Workflow for the nuclease activity assay


Fig. 5 Typical nuclease activity assay results for (a) liquid and (b) solid plate cultures. Here, results are shown using SpCas9 in the rescue plasmid and are mean ± standard error. (Reprinted with permission from Lee et al. [16]. Copyright 2022 American Chemical Society)

1. Transform the rescue plasmid (e.g., pFREE2-eSpCas9-sgRNA) into KS165 competent cells harboring the I-SceI lethal plasmid (pColEI-ISceI) (see Subheading 3.2) and, after recovery, plate on LB with kanamycin and ampicillin overnight at 37 °C.
2. Pick a colony and grow overnight in LB with kanamycin and ampicillin at 37 °C, shaking at 250 RPM.
3. Dilute the culture to an OD600 of 0.01 in 5 mL LB (with kanamycin and ampicillin) and grow at 37 °C at 250 RPM until an OD600 of 0.5.
4. Inoculate two tubes of 5 mL fresh LB with 50 μL of the culture from step 3. Induce only one tube with 100 μL 10% rhamnose (guide inducer) and 10 μL aTc (endonuclease inducer). This is the Rescue+ sample. The other tube, without inducers, serves as a control (Rescue-). Grow for 4 h at 37 °C and 250 RPM.
5. After 4 h, transfer 50 μL of each tube from step 4 to two culture tubes of 5 mL fresh LB. For one tube in each pair of tubes, add 50 μL of 1 M arabinose (I-SceI inducer; Lethal+ sample). The uninduced tube will serve as an uninduced


control (Lethal-). Grow for 4 h at 37 °C, shaking at 250 RPM, then take an OD600 reading. *Complete step 5 for both the experimental and control cultures from step 4.
6. At this stage, there should be four cultures with the following configurations: Rescue+ Lethal+, Rescue+ Lethal-, Rescue- Lethal+, and Rescue- Lethal-. For each, serially dilute 10× eight times in sterile LB, and plate 100 μL of each dilution on LB. For each condition (Rescue+ Lethal+, etc.), use the plate that forms the most distinct colonies to calculate CFUs/mL (Fig. 4). *Rescue+/- refers to your endonuclease to be tested; Lethal+/- refers to the I-SceI plasmid.
7. To measure the recovery efficiency of your endonuclease, use the following equation (Fig. 5) (see Note 9):

\[
\%\,\text{rescue} = \frac{\mathrm{OD}_{600}(\text{Rescue}^{+}\,\text{Lethal}^{+}) - \mathrm{OD}_{600}(\text{Rescue}^{-}\,\text{Lethal}^{+})}{\mathrm{OD}_{600}(\text{Rescue}^{+}\,\text{Lethal}^{-}) - \mathrm{OD}_{600}(\text{Rescue}^{-}\,\text{Lethal}^{+})}
\]

3.5 Enrichment and Positive Selection (Fig. 6)

In addition to estimating relative endonuclease activity, this positive selection system may be used to enrich or select for more active DNA endonucleases. Pooled candidates or libraries of DNA endonucleases can be directly selected for and/or measured for enrichment via next-generation sequencing (NGS). The procedure to quantify enrichment via NGS is presented below:
1. After transforming the rescue and lethal plasmids, inoculate individual colonies in 5 mL LB with kanamycin and ampicillin (Tubes A, B, and C) (see Notes 10 and 11). *For libraries of endonuclease candidates, transform your rescue plasmid library and then your lethal (I-SceI) plasmid.
2. Culture the cells at 37 °C, shaking at 250 RPM, until the OD600 reaches 0.5.
3. Dilute each culture 100× and combine into 5 mL of LB with rhamnose/aTc to produce a master culture (Tube D). Grow a parallel master culture without additional antibiotics/inducers to serve as a control (Tube E) (see Note 12).
4. Culture the tubes for 4 h at 37 °C, shaking at 250 RPM.
5. For Tube D, dilute 100× in LB with and without L-arabinose (Tubes F and G, respectively). Repeat for Tube E (to make Tubes H and I).
6. Culture the tubes (F, G, H, I) for 4 h at 37 °C, shaking at 250 RPM.
7. Centrifuge to pellet the cells, resuspend in fresh LB containing kanamycin without inducers (rhamnose, aTc, and L-arabinose), and grow for 16 h at 37 °C, shaking at 250 RPM (see Note 13).


Fig. 6 Workflow for enrichment or positive selection

8. Miniprep the overnight culture and sequence the plasmids using next-generation sequencing.
9. Prepare the libraries according to the protocol of the preferred next-generation sequencing platform.
10. Map the sequencing reads to the rescue plasmid using a program appropriate for the preferred sequencing platform.
11. Count the frequency of each rescue plasmid variant based on the abundance of single-nucleotide polymorphisms within the sequencing data set.
12. Normalize all the frequencies of single-nucleotide polymorphisms for a given variant in each condition.
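The counting and normalization in steps 11 and 12 can be sketched in Python as follows. This is an illustrative assumption rather than the authors' analysis pipeline: the per-variant read counts are taken as already extracted from the mapped reads, the variant names and counts are invented, and the pseudocount and the ratio between a selected and a control culture are one common way to express enrichment, not something the protocol prescribes.

```python
def frequencies(counts: dict) -> dict:
    """Normalize per-variant read counts to fractions of the total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("no reads counted")
    return {variant: n / total for variant, n in counts.items()}


def enrichment(selected: dict, control: dict, pseudocount: float = 0.5) -> dict:
    """Ratio of variant frequency in the selected vs. control culture.

    A pseudocount (an assumption of this sketch) avoids division by
    zero for variants that are absent from one of the two conditions.
    """
    variants = sorted(set(selected) | set(control))
    sel = frequencies({v: selected.get(v, 0) + pseudocount for v in variants})
    ctl = frequencies({v: control.get(v, 0) + pseudocount for v in variants})
    return {v: sel[v] / ctl[v] for v in variants}


# Hypothetical read counts for three pooled variants.
tube_f = {"variant_1": 900, "variant_2": 80, "variant_3": 20}    # with L-arabinose
tube_g = {"variant_1": 350, "variant_2": 330, "variant_3": 320}  # without L-arabinose

for variant, ratio in enrichment(tube_f, tube_g).items():
    print(f"{variant}: {ratio:.2f}")
```

A ratio above 1 indicates a variant that became more abundant under selection; comparing against the uninduced cultures (Tubes H and I) in the same way gives a check on background drift.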

4 Notes

1. Concentrations shown are the working/final concentrations. Antibiotics should be prepared as 100×–1000× stocks and diluted to the indicated final concentration in prepared solutions.
2. Keep restriction enzymes on ice during sample preparation to prevent denaturation.


3. The lethal plasmid was designed with the homing endonuclease I-SceI, which targets the E. coli host genome. The lethal plasmid is a critical part of the positive selection system and does not need to be edited. The rescue plasmid houses the endonuclease of choice (Cas9 in our examples) and includes Cas9 guide RNAs targeting the lethal plasmid. The guides were constructed using Golden Gate Assembly and are flanked by introduced PstI and XhoI sites, so you can swap in your own guide using PstI and XhoI. Note that the guide sits between two direct repeats, so the direct repeat sequences need to be incorporated into the guide fragment you wish to clone in. To use this system to test your own endonuclease, use the SalI and SpeI restriction sites on the rescue plasmid to swap out Cas9 for your endonuclease (see Note 4 for parameter recommendations if PCR reactions fail to produce product). I-SceI validation (see Subheading 3.3) can be completed without pColEI-ISceI-D44S. This plasmid expresses an inactivated I-SceI mutant and is used as a control to recreate, as closely as possible, the effects of metabolic burden. A plasmid-free control will behave similarly, although not identically, due to the lack of plasmid replication or expression.
4. During our attempts to clone Cas9 mutants into our rescue plasmid, multiple PCR attempts failed to amplify the Cas9 mutants (eSpCas9 and xCas9) from their host plasmids. Iterations included different primers, the addition of DMSO, and different PCR parameters. The target Cas9 mutants were finally amplified successfully when the annealing temperature was lowered to 55 °C, allowing proper primer annealing.
5. Pipetting carefully is recommended to avoid dislodging the pellet or inadvertently removing cells from the pellet.
6. Supplement the SOC with tetracycline in step 6 of Subheading 3.2.
7. Because arabinose is the inducer for I-SceI, a control should be included that copies the protocol exactly, minus the inclusion of arabinose (resulting in no I-SceI expression). Ampicillin is the selection marker for the plasmid, so the culture should be grown in the presence of ampicillin to prevent spontaneous curing of the plasmid.
8. At this point, cultures should resemble Fig. 3a and should be visibly different. Cultures grown in the presence of arabinose should have very little cell growth, resulting in transparent media. Cultures grown without arabinose should have significant cell growth (no I-SceI induction), resulting in opaque/cloudy media. The OD600 reading and a visual inspection of the cultures should give you a good idea as to whether the pColEI-ISceI plasmid is sufficiently causing cell death. However, to confirm cell toxicity, serial dilutions were performed and the


cultures were plated to obtain a colony count, more accurately identifying cell toxicity. Plating the cells is not necessary, though, because the point of this step is simply to identify whether the I-SceI plasmid is causing cell death. If a more accurate account of cell growth is desired, a serial dilution followed by plating the cells is recommended.
9. This equation can be adapted for colony counts rather than liquid culture. Instead of using OD600, you can substitute the number of colonies, keeping the proper experimental and control group orientation within the equation. If you opt to use colony counts in the equation, you must keep track of the dilution factor after serially diluting the cultures and incorporate it appropriately into the equation.
10. Though it may seem more efficient to make electrocompetent cells harboring the lethal plasmid and then introduce the rescue plasmid, we found that the assay works best when the rescue plasmid is transformed first, followed by the lethal plasmid.
11. We recommend running parallel replicates to obtain the most accurate results. So, instead of picking a single colony for Endonuclease 1 (Tube A), we recommend picking multiple colonies and growing them in their own cultures (Tubes A1, A2, A3, A4, etc.). Repeat for Endonucleases 2 and 3. Then, combine A1, B1, and C1 to create the first mixed culture and A2, B2, and C2 for the next mixed culture (and so forth). Continue this methodology throughout the experiment. Doing this may result in many simultaneous trials, so a consistent naming convention for the tubes is recommended.
12. For sample libraries, no combination/pooling is required. Dilute directly and inoculate in 5 mL of LB with rhamnose/aTc as directed.
13. For directed evolution, the most active versions should dominate this culture and can be directly propagated/isolated/analyzed at this step without sequencing for enrichment.
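The %rescue equation of Subheading 3.4 and its colony-count adaptation described in Note 9 can be sketched in Python as follows. The function and argument names are hypothetical, and the example readings are invented. The multiplication by 100 is an assumption made here so that the result reads as a percentage; the equation as printed gives the ratio only.

```python
def percent_rescue(rescue_pos_lethal_pos: float,
                   rescue_neg_lethal_pos: float,
                   rescue_pos_lethal_neg: float) -> float:
    """%rescue from three measurements (OD600 values or CFU/mL).

    Numerator:   (Rescue+ Lethal+) - (Rescue- Lethal+)
    Denominator: (Rescue+ Lethal-) - (Rescue- Lethal+)
    The factor of 100 is an assumption of this sketch.
    """
    denominator = rescue_pos_lethal_neg - rescue_neg_lethal_pos
    if denominator == 0:
        raise ValueError("Rescue+ Lethal- equals Rescue- Lethal+; ratio undefined")
    return 100.0 * (rescue_pos_lethal_pos - rescue_neg_lethal_pos) / denominator


def cfu_per_ml(colonies: int, dilution_factor: float, plated_volume_ml: float) -> float:
    """Back-calculate CFU/mL of the undiluted culture from one plate."""
    if plated_volume_ml <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor / plated_volume_ml


# Hypothetical OD600 readings.
print(percent_rescue(0.8, 0.1, 1.1))

# Hypothetical colony counts: 100 uL plated from the 10^5-fold dilution.
r_pos_l_pos = cfu_per_ml(42, 1e5, 0.1)
r_neg_l_pos = cfu_per_ml(3, 1e3, 0.1)
r_pos_l_neg = cfu_per_ml(55, 1e5, 0.1)
print(percent_rescue(r_pos_l_pos, r_neg_l_pos, r_pos_l_neg))
```

Converting each plate to CFU/mL before applying the equation is one way to incorporate the dilution factor that Note 9 asks you to track, since plates from different dilutions then become directly comparable.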

Acknowledgments

This work was supported by the Ralph W. and Grace M. Showalter Trust (Grant # 41000622) and by the National Science Foundation (NSF MCB-2143856). The authors declare no conflict of interest. Figures 1, 2, 4, and 6 were created with BioRender.com.


References

1. Pingoud A, Jeltsch A (2001) Structure and function of type II restriction endonucleases. Nucleic Acids Res 29(18):3705–3727
2. Stoddard BL (2005) Homing endonuclease structure and function. Q Rev Biophys 38(1):49–95. https://doi.org/10.1017/S0033583505004063
3. Nemudryi AA, Valetdinova KR, Medvedev SP, Zakian SM (2014) TALEN and CRISPR/Cas genome editing systems: tools of discovery. Acta Nat 6(3):19–40
4. Urnov FD, Rebar EJ, Holmes MC, Zhang HS, Gregory PD (2010) Genome editing with engineered zinc finger nucleases. Nat Rev Genet 11(9):636–646. https://doi.org/10.1038/nrg2842
5. Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna JA, Charpentier E (2012) A programmable dual-RNA–guided DNA endonuclease in adaptive bacterial immunity. Science 337(6096):816–821. https://doi.org/10.1126/science.1225829
6. Collias D, Beisel CL (2021) CRISPR technologies and the search for the PAM-free nuclease. Nat Commun 12(1):555. https://doi.org/10.1038/s41467-020-20633-y
7. Chen Z, Zhao H (2005) A highly sensitive selection method for directed evolution of homing endonucleases. Nucleic Acids Res 33(18):e154. https://doi.org/10.1093/nar/gni148
8. Christie KA, Guo JA, Silverstein RA, Doll RM, Mabuchi M, Stutzman HE, Lin J, Ma L, Walton RT, Pinello L, Robb GB, Kleinstiver BP (2022) Precise DNA cleavage using CRISPR-SpRYgests. Nat Biotechnol 1–8. https://doi.org/10.1038/s41587-022-01492-y
9. Chen I-MA, Chu K, Palaniappan K, Pillay M, Ratner A, Huang J, Huntemann M, Varghese N, White JR, Seshadri R, Smirnova T, Kirton E, Jungbluth SP, Woyke T, Eloe-Fadrosh EA, Ivanova NN, Kyrpides NC (2019) IMG/M v.5.0: an integrated data management and comparative analysis system for microbial genomes and microbiomes. Nucleic Acids Res 47(D1):D666–D677. https://doi.org/10.1093/nar/gky901
10. Paschon DE, Lussier S, Wangzor T, Xia DF, Li PW, Hinkley SJ, Scarlott NA, Lam SC, Waite AJ, Truong LN, Gandhi N, Kadam BN, Patil DP, Shivak DA, Lee GK, Holmes MC, Zhang L, Miller JC, Rebar EJ (2019) Diversifying the structure of zinc finger nucleases for high-precision genome editing. Nat Commun 10(1):1133. https://doi.org/10.1038/s41467-019-08867-x
11. Yoshida S, Hiraga K, Takehana T, Taniguchi I, Yamaji H, Maeda Y, Toyohara K, Miyamoto K, Kimura Y, Oda K (2016) A bacterium that degrades and assimilates poly(ethylene terephthalate). Science 351(6278):1196–1199. https://doi.org/10.1126/science.aad6359
12. Kranz RG, Gabbert KK, Madigan MT (1997) Positive selection systems for discovery of novel polyester biosynthesis genes based on fatty acid detoxification. Appl Environ Microbiol 63(8):3010–3013. https://doi.org/10.1128/aem.63.8.3010-3013.1997
13. Santoro SW, Wang L, Herberich B, King DS, Schultz PG (2002) An efficient system for the evolution of aminoacyl-tRNA synthetase specificity. Nat Biotechnol 20(10):1044–1048. https://doi.org/10.1038/nbt742
14. Gruen M, Chang K, Serbanescu I, Liu DR (2002) An in vivo selection system for homing endonuclease activity. Nucleic Acids Res 30(7):e29. https://doi.org/10.1093/nar/30.7.e29
15. Hartley RW (1988) Barnase and barstar: expression of its cloned inhibitor permits expression of a cloned ribonuclease. J Mol Biol 202(4):913–915. https://doi.org/10.1016/0022-2836(88)90568-2
16. Lee KZ, Mechikoff MA, Parasa MK, Rankin TJ, Pandolfi P, Fitzgerald KS, Hillman ET, Solomon KV (2022) Repurposing the homing endonuclease I-SceI for positive selection and development of gene-editing technologies. ACS Synth Biol 11(1):53–60. https://doi.org/10/gn4mj7
17. Gimble FS, Wang J (1996) Substrate recognition and induced DNA distortion by the PI-SceI endonuclease, an enzyme generated by protein splicing. J Mol Biol 263(2):163–180. https://doi.org/10.1006/jmbi.1996.0567

Chapter 16

CRISPR-Cas9-Mediated Genome Editing in Paenibacillus polymyxa

Giulia Ravagnan, Meliawati Meliawati, and Jochen Schmid

Abstract

In recent years, the clustered regularly interspaced short palindromic repeats-Cas (CRISPR-Cas) technology has become the method of choice for precision genome editing in many organisms due to its simplicity and efficacy. Multiplex genome editing, point mutations, and large genomic modifications are attractive features of the CRISPR-Cas9 system. These applications increase both the ease and the speed of genetic manipulations and facilitate the discovery of novel functions. In this protocol chapter, we describe the use of a CRISPR-Cas9 system for multiplex integrations and deletions, for deletions of large genomic regions using a single guide RNA (sgRNA), and, finally, for targeted point mutations in Paenibacillus polymyxa.

Key words CRISPR, Cas9, Multiplexing, Point mutations, Genome engineering, Single sgRNA, Paenibacillus polymyxa

1 Introduction

One dominant goal of metabolic engineering and synthetic biology is to overproduce a target compound. To this end, genetic engineering tools are required to manipulate the microorganism of interest. Paenibacillus polymyxa is a non-pathogenic bacterium used as a biofertilizer and a potent alternative chassis organism due to its ability to produce the relevant chemical (R,R)-2,3-butanediol, different exopolysaccharides (EPS) with unique characteristics and physicochemical properties, and several antimicrobial peptides [1–3]. Genetic engineering tools have previously been developed for P. polymyxa but showed low efficiencies [4–6]. However, our laboratory has successfully implemented a CRISPR-Cas9-based system that enables robust and efficient genome editing in this species for the first time [7]. The clustered regularly interspaced short palindromic repeat (CRISPR) system is an adaptive defense mechanism found in many

Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_16, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024


Giulia Ravagnan et al.

bacteria and archaea. To date, the CRISPR-Cas9 system from Streptococcus pyogenes is the best characterized and most extensively studied [8]. The programmable Cas9 endonuclease targets a genomic locus with high specificity and induces a double-stranded DNA (dsDNA) break [8–10]. Target specificity is controlled by a coexpressed guide RNA (gRNA), which consists of a CRISPR RNA (crRNA) and trans-activating crRNA (tracrRNA) complex containing a 20-nucleotide (nt) spacer sequence complementary to the target region, which is located immediately upstream of a 5′-NGG-3′ sequence, the so-called protospacer-adjacent motif (PAM) [8, 10]. Once the dsDNA break is generated, the DNA repair system is triggered, which, depending on the organism, proceeds by homology-directed repair (HDR), nonhomologous end joining, or alternative end-joining repair mechanisms [11–13]. The CRISPR-Cas system has proven to be an efficient tool for the genetic editing of microorganisms. It also offers transcriptional manipulation tools, such as CRISPR-Cas-based activation or interference, and base editing tools, which have facilitated the engineering of the next generation of microbial cell factories [14–16]. Despite the general utility of Cas9-based CRISPR-Cas technology, targeting one locus at a time limits its efficiency and biotechnological applications. Multiplexing strategies that enable the simultaneous editing of multiple loci minimize the labor and time required for strain engineering [17]. Furthermore, streamlined approaches to obtain simplified and reduced genomes are increasingly used to improve microbial cell factories [18–21]. To achieve this goal, large-scale genomic deletions help speed up the engineering process and can additionally provide a better understanding of novel gene functions [22, 23]. In P. polymyxa, we have successfully employed the nuclease-deactivated Cas12a (dCas12a) to develop a transcriptional perturbation (CRISPRi/a) tool for both single and multiple gene targets [24]. We have also attempted to exploit the endonuclease activity of Cas12a for genome editing but with no positive outcome (unpublished data). Finally, a base editing tool for multiplex genome editing, in which dCas9 is fused with a cytidine deaminase, has been developed by Kim et al. [25]. Yet, no tools for large genomic deletions or multiplexed genomic modifications had been explored in this species until now. To this end, we further explored the potential of the single-vector-based system (pCasPP) that we developed in 2017 [7]. The pCasPP plasmid contains the cas9 gene from S. pyogenes, which is codon-optimized for Streptomyces. The expression of Cas9 is under the control of the constitutive sgsE promoter from Geobacillus stearothermophilus, while the expression of the sgRNA is controlled by the constitutive gapdh promoter from Eggerthella lenta [26]. Additionally, homologous regions of 1 kb and the appropriate 20-nt spacer sequences are provided


Fig. 1 Genome editing strategies exploiting the one-plasmid Cas9-mediated system in P. polymyxa. The plasmid harbors all elements needed for the targeted modifications including the Cas9, CRISPR array, and homologous regions as repair template. The CRISPR array consists of one or two sgRNA expression cassettes, each of them containing a promoter, spacer sequence, crRNA, and a terminator. (The figure is adapted from [27])

based on the genomic target. In the initial study, the deletion of an 18 kb genomic region was achieved by utilizing two targeting sgRNAs without attempting any multiplexing [7].

In this protocol, we describe in detail the improved CRISPR-Cas9-based approach for large cluster deletions using a single sgRNA in P. polymyxa, as published in [27]. Additionally, we describe how to design the sgRNA with respect to the position of the cutting site to increase the efficiency of the deletion of these large genomic regions. Finally, the design of the system for multiplexing of single gene deletions, gene integrations, and even single-nucleotide modifications is described, as shown in Fig. 1 [27, 28]. This system represents a very powerful tool for future studies and for the development of highly efficient chassis strains of P. polymyxa and related species.
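The targeting rule summarized in this Introduction (a 20-nt spacer located immediately upstream of a 5′-NGG-3′ PAM) can be illustrated with a short Python sketch that lists candidate spacers on both strands of a sequence. The function name and the toy sequence are hypothetical. The sketch does not score on-target or off-target activity, so it is not a substitute for a dedicated design tool.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_spacers(seq: str, spacer_len: int = 20) -> list:
    """List (strand, start, spacer, pam) for every spacer + NGG candidate.

    `start` is 0-based on the strand that was scanned. A lookahead is
    used so that overlapping candidates are all reported.
    """
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    hits = []
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


# Hypothetical toy target: a 20-nt stretch followed by a TGG PAM.
toy_target = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
for strand, start, spacer, pam in find_spacers(toy_target):
    print(strand, start, spacer, pam)
```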

2 Materials

2.1 Molecular Biology

1. Primers are synthesized by Microsynth AG.
2. The high-fidelity PCR polymerases Q5 (NEB, New England Biolabs Inc.) and Accuzyme (Bioline) are used for DNA cloning purposes; GoTaq (Promega) is used for the colony PCR (cPCR) to screen for the correct mutant strain of P. polymyxa.
3. PCR cleanup is performed with the GeneJET PCR Purification Kit (Thermo Fisher Scientific). If the PCR product contains two or more amplicons, gel purification is performed with the Monarch Gel Purification Kit (NEB).


4. Isothermal Assembly Master Mix.
5. Plasmid isolation is performed using the GeneJET Plasmid Miniprep Kit (Thermo Fisher Scientific).
6. Genomic DNA (gDNA) is isolated with the DNeasy Blood & Tissue Kit (Qiagen).
7. Sequencing of plasmids and PCR products is performed by Microsynth AG.

2.2 Cultivation and Transformation

1. LB medium, liquid or solid (15 g/L agar): 10 g/L tryptone, 5 g/L NaCl, and 5 g/L yeast extract. Neomycin (neo, 50 μg/mL) is added for the selection of E. coli transformants; neomycin (50 μg/mL) and polymyxin (pmx, 40 μg/mL) are added for the selection of P. polymyxa transconjugants.
2. MM1 P100: EPS-inducing agar plate (15 g/L agar, 30 g/L glucose, 5 g/L peptone, 1.33 g/L MgSO4·7H2O, 0.05 g/L CaCl2, 1.67 g/L KH2PO4, 2 mL/L RPMI 1640 vitamin solution (Sigma-Aldrich), and 1 mL/L trace elements (2.5 g/L FeSO4·7H2O, 2.1 g/L C4H4Na2O6·2H2O, 1.8 g/L MnCl2·4H2O, 0.075 g/L CoCl2·6H2O, 0.031 g/L CuSO4·7H2O, 0.258 g/L H3BO3, 0.023 g/L Na2MoO4, and 0.021 g/L ZnCl2)), used as a first visual screen for deletion of the heteropolysaccharide paenan (pep).
3. Water (Milli-Q).
4. 15-mL Falcon culture tubes.
5. Eppendorf centrifuge tubes, 1.5 mL and 2 mL.
6. Spatula.

3 Methods

3.1 Cloning of Plasmids for Deletions and Integrations

Proper selection of the targeting sgRNA has to be considered when cloning the targeting pCasPP plasmid. The oligonucleotides for the amplification of each fragment (backbone, sgRNA, and the homologous flanks) have to include 20 bp overlapping sequences for efficient cloning via isothermal assembly. All the primers required are listed in Table 1.
1. The sgRNA is selected based on the PAM sequence, which is “NGG” for the S. pyogenes Cas9; the spacer consists of the 20-nt sequence immediately upstream of the PAM. The spacer sequence can be identified using several publicly available websites, such as https://www.benchling.com/, and it is recommended to select an sgRNA with a high on-target score. If the desired modification involves the deletion of large genomic regions, the sgRNA should be chosen as described in detail in Subheading 3.4.1.


Table 1 DNA oligonucleotides used in this protocol

Primer   Sequence (5′-3′)
P01      TCTAGTGGCCAGGAACCGTAAAAAGG
P02      CGACATTGATATGAATATGCCTGTAACAG
P03      CTGTTACAGGCATATTCATATCAATGTCG
P04      CGAATTACCCGTGTGGGTCGGCGTATCCCCTTTCAGATACTCG
P05      CGACCCACACGGGTAATTCGGTTTTAGAGCTAGAAATAGCAAG
P06      CCTCACCTCCTGCTACCATTCCTAGTCGGCGGGCTTGATGCG
P07      GAATGGTAGCAGGAGGTGAGG
P08      GCATCCTGCTTTTACCGACTTGTTGATTATGGAAAAAGAAATCGAAAAAAGCC
P09      AGTCGGTAAAAGCAGGATGC
P10      TACGGTTCCTGGCCACTAGATTACGGATTTACTACAGACTGATG
P11      CCGAATATATCGGTTATGCGTGG
P12      GGAAAGTCTACACGAACCCTTTGGC
P13      ATCCGTCATGGATTGGCCAAG
P14      CGGATGATACATCGCATTCG
P15      AGAGTTTGATCMTGGCTCAG
P16      AAGGAGGTGWTCCARCC
P17      GACCTGACGCCAAAGGAAAAGGAAG
P18      TGTTCTCAATTCTCAGCTTAGCGTATCCCCTTTCAGATACTCG
P19      TAAGCTGAGAATTGAGAACAGTTTTAGAGCTAGAAATAGCAAG
P20      ATTATGAATTCCTTCTGTCGCTAGTCGGCGGGCTTGATGCG
P21      CGACAGAAGGAATTCATAATTCGATGTC
P22      TTTGTTCTCAATCCTTAGTTTATCTACTACCATC
P23      AAACTAAGGATTGAGAACAAAGTGTCCTGAAAG
P24      CATGCCTGTCCTCTTCCAAAC
P25      GCTGTGACACATGTATTTGTGAATG
P26      TCAAATGACTGTATGTCCTTAAAGCC
P27      CTAACCGTGTTCGCCAATTAG
P28      GGAATGTGACAACGACATGGGTTTTAGAGCTAGAAATAGCAAG
P29      TGCATCACTGGAACGATGAACTAGTCGGCGGGCTTGATGC
P30      TTCATCGTTCCAGTGATGCAC
P31      GTTCATTCCACCCTTTCCAGC
(continued)


Giulia Ravagnan et al.

Table 1 (continued)

Primer   Sequence (5′-3′)
P32      CTGGAAAGGGTGGAATGAACATGATTAATCCTTTTGAGCAGAATG
P33      CTCACCTCCTGCTACCATTCTTGCTCAATAGATGCATTACGAC
P34      GAAGATTGGTTGGCCTTCATC
P35      CAAGGCCGTGTCAAATCGTAG
P36      GGAAGCAGGACATTCAAGTGG

gBlock   Sequence (5′-3′)
gb1      CGACCCACACGGGTAATTCGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTACTCCATCTGGATTTGTTCAGAACGCTCGGTTGCCGCCGGGCGTTTTTTAGCTGCTCCTTCGGTCGGACGTGCGTCTACGGGCACCTTACCGCAGCCGTCGGCTGTGCGACACGGACGGATCGGGCGAACTGGCCGATGCTGGGAGAAGCGCGCTGCTGTACGGCGCGCACCGGGTGCGGAGCCCCTCGGCGAGCGGTGTGAAACTTCTGTGAATGGCCTGTTCGGTTGCTTTTTTTATACGGCTGCCAGATAAGGCTTGCAGCATCTGGGCGGCTACCGCTATGATCGGGGCGTTCCTGCAATTCTTAGTGCGAGTATCTGAAAGGGGATACGCGGAATGTGACAACGACATGG

2. The backbone of a specific pCasPP plasmid is split into two fragments: fragment 1 and fragment 2. Fragment 1 is amplified with the primers P01 + P02. Fragment 2 is amplified with the forward primer P03 and a reverse primer that carries the annealing region on the backbone, with sequence 5′-ATCCCCTTTCAGATACTCGC, plus an overhang specific to the spacer, which serves as the overlapping sequence between fragment 2 and fragment 3. Fragment 3 consists of the sgRNA and the origin of transfer, oriT. Its forward primer always has the same annealing sequence, 5′-GTTTTAGAGCTAGAAATAGCAAG, but the overhang has to be adjusted to the chosen spacer sequence. The reverse primer used to amplify the sgRNA fragment always anneals in the oriT. Consequently, the primers required to create fragment 2 and fragment 3 always have to be adjusted when cloning different plasmids. Fragment 4 and fragment 5 are the upstream (US) and downstream (DS) homology sequences, which always vary between plasmids and have to be adjusted accordingly. Constructing plasmids for multiplexing requires more fragments, as described in Subheading 3.4.3. Homology sequences should be around 1 kb (see Note 1).
3. PCRs are carried out using either the pCasPP plasmid or the gDNA as a template with a high-fidelity polymerase (Q5 or Accuzyme) following the corresponding protocol [29, 30].


4. The PCR fragments are verified via gel electrophoresis. If the gel shows a single clean band of the expected size, a PCR cleanup is performed. Otherwise, when unspecific bands are visible, a gel extraction is performed to purify the desired PCR fragment following the manufacturer’s protocols [31, 32].
5. The amplified PCR fragments are assembled via isothermal assembly.
6. The isothermal assembly product is transformed into E. coli TOP10 or E. coli Turbo following the procedure described in Subheading 3.2.

3.2 Chemical Transformation of E. coli TOP10, E. coli Turbo, and E. coli S17-1

E. coli TOP10 or E. coli Turbo is used for plasmid propagation. Once the correctly cloned plasmid is obtained, E. coli S17-1 is transformed with the plasmid using the same transformation protocol. E. coli S17-1 will be used for conjugation with P. polymyxa as explained in Subheading 3.3.
1. A 1.5-mL microcentrifuge tube containing chemically competent E. coli TOP10, E. coli Turbo, or E. coli S17-1 is thawed on ice.
2. Either the whole mixture of the isothermal reaction or 0.5–2 μL of the correct plasmid is mixed with 50 μL of chemically competent cells and incubated on ice for 30 min.
3. The cells are heat shocked at 42 °C for 45 s and then incubated on ice for 5 min.
4. 900 μL of LB medium is added to the cells, which are then incubated for recovery in a shaker for 1 h at 37 °C and 250 rpm.
5. After the recovery, the cell culture is spun down at 8000 rpm for 3 min and the supernatant is discarded. The pellet is resuspended in 100 μL of LB and plated on LB + neo plates for selection. The plates are then incubated overnight at 37 °C.
6. After colonies appear on the plates the next day, a positive colony growing on LB + neo plates is picked and inoculated in 3 mL of LB + neo overnight. The culture will be further used either for plasmid extraction, as explained below, or for conjugation, as explained in Subheading 3.3.
7. The plasmid is isolated from the E. coli culture according to the Miniprep protocol, and the E. coli variant is stocked in 24% glycerol at -80 °C.
8. The concentration of the plasmid is measured with a NanoDrop (Implen, Germany) at 260 nm.
9. The plasmid sequence is checked through Sanger sequencing [33].

3.3 Conjugation

Transformation of P. polymyxa is done via conjugation with E. coli S17-1 containing the plasmid of interest as the donor strain.
1. Freshly streaked single colonies are inoculated in 3 mL of LB medium, with or without antibiotic as appropriate, in a 13-mL culture tube. P. polymyxa is cultivated at 30 °C and E. coli S17-1 at 37 °C, with shaking at 250 rpm.
2. The overnight culture is then diluted 1:100 in the same medium and grown at 37 °C and 250 rpm for 4 h (see Note 2).
3. 900 μL of P. polymyxa culture is transferred to a 1.5-mL reaction tube and incubated in a 42 °C water bath for 15 min.
4. Afterward, 300 μL of E. coli S17-1 culture is added to the reaction tube containing the P. polymyxa culture, and the mixture is centrifuged at 8000 rpm for 2 min (see Note 3).
5. The supernatant is discarded, and the pellet is carefully resuspended in 100 μL of LB medium.
6. The resuspended cells are slowly dropped onto an LB plate and incubated overnight at 30 °C.
7. The culture is scraped off the plate with a spatula and resuspended in 500 μL of LB medium.
8. The cells are plated on an LB plate containing neomycin and polymyxin and incubated at 30 °C for 48 h (see Note 4).
9. Initial screening of transconjugants is performed by cPCR to confirm the desired genome modifications. For this, a set of primers binding outside the homologous flanks provided as repair templates is used. cPCR is performed with GoTaq polymerase according to the manufacturer’s protocol [34].
10. The genomic DNA (gDNA) of the transconjugants is isolated by using the DNeasy Blood & Tissue Kit (Qiagen) [35] and once again checked via PCR and sequencing to revalidate the correct modification.

3.4 Multipurpose Genome Editing

The system based on the pCasPP plasmid can be flexibly used for various genome modification purposes in P. polymyxa, including deletion, integration, point mutation, and multiplexing, as shown in Fig. 2.
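All of the editing modes listed here start from a 20-nt spacer that sits immediately upstream of an “NGG” PAM (Subheading 3.1). The following is a minimal sketch of that selection rule only, not the scoring performed by tools such as Benchling. The function name `find_spcas9_spacers` is hypothetical, only the given strand is scanned, and no on-target or off-target score is computed.

```python
import re


def find_spcas9_spacers(seq, spacer_len=20):
    """List candidate spacers on the given strand.

    Returns (spacer, pam, start) tuples, where `spacer` is the
    `spacer_len` nucleotides immediately upstream of an NGG PAM and
    `start` is the 0-based index of the first spacer nucleotide.
    Illustrative helper; the reverse strand is not scanned.
    """
    seq = seq.upper()
    # A zero-width lookahead reports overlapping candidate sites too.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    return [(m.group(1), m.group(2), m.start()) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence: the pep spacer from this chapter followed by a
    # hypothetical AGG PAM (the real genomic PAM is not given in the text).
    toy = "TT" + "CGACCCACACGGGTAATTCG" + "AGG" + "TT"
    for spacer, pam, start in find_spcas9_spacers(toy):
        print(start, spacer, pam)
```

To scan the opposite strand as well, the same function could be applied to the reverse complement of the input sequence.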

3.4.1 Deletion and Integration

Approximately 1 kb upstream and downstream of the targeted region are provided as the homologous repair template and are incorporated into the targeting pCasPP plasmid. When aiming for gene integration, the gene(s) of interest is placed in between the homologous flanks. For the deletion of large biosynthetic gene clusters (BGC) (>12 kb), the position of the targeted sequence in the cluster becomes highly important [27]. The region near the 5′ end of the cluster should be avoided, and the region in the middle or close to the 3′ end should be targeted instead. Here, we describe an example of the deletion of a 33 kb EPS cluster in P. polymyxa, namely the pep cluster.

Fig. 2 Schematic representation of genome editing strategies with the one-plasmid Cas9-mediated system in P. polymyxa. (a) single large cluster deletion, (b) single point mutation modification, (c) two multiplex deletions, and (d) two multiplex integrations. The plasmid harbors all elements needed for the targeted modifications including Cas9, CRISPR array, and homologous regions as repair template. The CRISPR array consists of one or two sgRNA expression cassettes, each of them containing a promoter, spacer sequence, crRNA, and a terminator. (The figure is adapted from [27])
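The two spacer-specific primers of a pCasPP plasmid (the reverse primer of fragment 2 and the forward primer of fragment 3, see Subheading 3.1) follow a fixed pattern that can be sketched in code. This is a hedged illustration, not the authors' design tool: the function name `spacer_primers` is hypothetical, and the two constant annealing parts are taken from the shared 3′ portions of the corresponding primers in Table 1 (P04 and P18; P05, P19, and P28).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Constant 3' annealing parts, read off the primers in Table 1
# (assumption: these shared parts are the annealing regions).
BACKBONE_REV_ANNEAL = "GCGTATCCCCTTTCAGATACTCG"   # shared by P04 and P18
SGRNA_FWD_ANNEAL = "GTTTTAGAGCTAGAAATAGCAAG"      # shared by P05, P19, P28


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def spacer_primers(spacer):
    """Build the two spacer-carrying primers for a 20-nt spacer.

    The spacer itself is the 20 bp overlap between fragment 2 and
    fragment 3, so it appears as a 5' overhang on both primers.
    """
    spacer = spacer.upper()
    if len(spacer) != 20 or set(spacer) - set("ACGT"):
        raise ValueError("spacer must be 20 nt of A, C, G, or T")
    return {
        "fragment2_reverse": reverse_complement(spacer) + BACKBONE_REV_ANNEAL,
        "fragment3_forward": spacer + SGRNA_FWD_ANNEAL,
    }


if __name__ == "__main__":
    # pep spacer from this subheading; the output can be compared
    # with P04 and P05 in Table 1.
    for name, primer in spacer_primers("CGACCCACACGGGTAATTCG").items():
        print(name, primer)
```

For the pep spacer, the two strings produced this way can be compared directly with P04 and P05 in Table 1, which is a quick sanity check before ordering primers for a new spacer.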

1. The assembly of the pCasPP-pep plasmid requires five fragments, as generally described in Step 2 of Subheading 3.1. The backbone of the plasmid is amplified as fragment 1 (3640 bp) and fragment 2 (4926 bp) with the primers P01 + P02 and P03 + P04, respectively. Fragment 3 (714 bp) is obtained with primers P05 + P06. Fragment 4 and fragment 5 are amplified from the gDNA of P. polymyxa by using primers P07 + P08 and P09 + P10. The selected spacer sequence is 5′-CGACCCACACGGGTAATTCG, which is located 24 kb and 8.8 kb away from the 5′ and 3′ ends of the pep cluster, respectively. Other sgRNAs can also be used to target the pep cluster. Ideally, they should be located at least 8 kb away from the beginning of the cluster. The PCR fragments are visualized via gel electrophoresis and purified.
2. All five fragments are then assembled by using isothermal assembly.
3. E. coli TOP10 or Turbo is transformed as described in Subheading 3.2, and the desired pCasPP-pep plasmid is confirmed by Sanger sequencing with primers P11 and P12 to verify the sgRNA and the homologous sequences.
4. Transformation of E. coli S17-1 and conjugation with P. polymyxa are performed as described in Subheadings 3.2 and 3.3.
5. As the pep cluster is responsible for the EPS biosynthesis in P. polymyxa [7], initial screening of the mutant can be done by assessing the lack of EPS production. Single colonies of the transconjugants are streaked on an MM1 P100 plate and incubated overnight at 30 °C.
6. Further screening of the transconjugants that do not show a slimy phenotype on the EPS-inducing plate (MM1 P100) is done by cPCR with primers P13 + P14. The mutant with the pep cluster deleted will show a band of 3 kb, while the wild type will not give any PCR fragment, as the applied PCR conditions do not support the amplification of such a large region (33 kb). Primers P15 + P16 are used as a control to confirm the viability of the cPCR. The fragment is purified and sent for sequencing with primer P17 to confirm the deletion.
7. The gDNA is isolated from the colony that shows a positive result in the cPCR and sequencing. The DNA is re-checked as described in Step 6 to validate the result.

3.4.2 Point Mutations

To achieve targeted point mutations of a particular gene, the desired mutations are incorporated in the homologous flanks provided in the pCasPP plasmid. Depending on the number of mutations, the flanks can either be synthesized or the mutations can be introduced via the primers used to amplify the fragments that are combined by isothermal assembly. The spacer sequence located closest to the sequence to be mutated should be chosen. If applicable, silent mutations can additionally be introduced into the PAM site or the region in close proximity to the targeted sequence to increase editing efficiency. The mutation of Spo0A A257V is given as an example, with four additional silent mutations including one in the corresponding PAM sequence [28].
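The relationship between a nucleotide position and the affected codon can be checked with a few lines of arithmetic. The sketch below uses the five nucleotide positions given for the Spo0A example in this subheading (770, 888, 891, 894, and 906) and assumes that numbering starts at the first nucleotide of the start codon. The helper names are hypothetical, and the actual spo0A codons are not given in this chapter, so only the position within each codon is derived.

```python
ALA_CODONS = {"GCT", "GCC", "GCA", "GCG"}   # standard genetic code
VAL_CODONS = {"GTT", "GTC", "GTA", "GTG"}   # standard genetic code

# Nucleotide position -> (reference base, new base), as listed in the text.
SPO0A_EDITS = {770: ("C", "T"), 888: ("G", "A"), 891: ("G", "A"),
               894: ("A", "G"), 906: ("G", "A")}


def codon_of(nt_pos):
    """Map a 1-based CDS position to (codon number, position in codon)."""
    if nt_pos < 1:
        raise ValueError("positions are 1-based")
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


def mutate(codon, offset, new_base):
    """Replace the base at 1-based `offset` within a codon."""
    return codon[:offset - 1] + new_base + codon[offset:]


if __name__ == "__main__":
    for pos, (ref, alt) in sorted(SPO0A_EDITS.items()):
        codon, offset = codon_of(pos)
        print(f"nt {pos} {ref}->{alt}: codon {codon}, codon position {offset}")
    # Any alanine codon with C->T at its second base encodes valine.
    print(all(mutate(c, 2, "T") in VAL_CODONS for c in ALA_CODONS))
```

Under this numbering assumption, position 770 is the second base of codon 257, which fits the A257V exchange, and the other four positions are third codon positions, which fits their description as silent mutations. Whether a given third-position change is actually silent still has to be confirmed against the real codon.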


1. Similar to the previous example, the assembly of the pCasPP-Spo0A* plasmid also requires five fragments. Primers P01 + P02 are used to obtain fragment 1 (3640 bp), primers P03 + P18 for fragment 2 (4926 bp), and primers P19 + P20 for fragment 3 (714 bp). The spacer sequence is 5′-TAAGCTGAGAATTGAGAACA and is introduced into the plasmid via primers P18 and P19. Fragment 4 and fragment 5, containing the US and DS homologous flanks, are amplified by using primers P21 + P22 and P23 + P24, respectively. Primers P22 and P23 contain the five mutations to be introduced into the spo0A gene. The point mutations are C → T, G → A, G → A, A → G, and G → A at nucleotide positions 770, 888, 891, 894, and 906, respectively. The mutation at position 770 results in the desired Spo0A A257V, and the mutation at 906 changes the PAM sequence so that it is no longer recognized by the Cas9 nuclease. The other mutations are located within the original spacer sequence. Hence, these mutations prevent Cas9 from cutting the desired mutant.
2. The five fragments are assembled via isothermal assembly.
3. Transformation of E. coli Turbo or TOP10 is performed as described in Subheading 3.2, and the desired pCasPP-Spo0A* plasmid is verified by screening the colonies with cPCR, followed by Sanger sequencing with primers P11 and P12 to verify the sgRNA and the homologous sequences.
4. Transformation of E. coli S17-1 and conjugation with P. polymyxa are performed as described in Subheadings 3.2 and 3.3.
5. Following the conjugation, the transconjugants are verified by cPCR with primers P25 + P26, which should result in a band of ~2.2 kb. Primers P15 + P16 are used as a control to confirm the functionality of the cPCR. The fragment is purified and sent for sequencing with primer P27 to confirm the mutation.

3.4.3 Multiplexing

The pCasPP plasmid can also be employed for multiplexing by incorporating two sgRNA expression cassettes, each of them consisting of a promoter, spacer, crRNA, and terminator. By using this approach, the pep (33 kb) and dhb (12 kb) clusters were simultaneously deleted in P. polymyxa [27].
1. The pCasPP-pep-dhb plasmid is assembled from eight fragments. As previously mentioned, fragment 1 (3640 bp) is obtained with primers P01 + P02, fragment 2 (4926 bp) with P03 + P04, and fragment 3 (714 bp) with P28 + P29. Fragment 4 and fragment 5 are the homologous flanks of the dhb cluster and are amplified with primers P30 + P31 and P32 + P33. Fragment 6 and fragment 7 are the homologous flanks of the pep cluster and are amplified with primers P07 + P08 and P09 + P10. Fragment 8 (gb1, 456 bp) was synthesized by Eurofins (Germany). It consists of the spacer sequence for the pep cluster (5′-CGACCCACACGGGTAATTCG), gRNA-1, promoter-2, and the spacer for the dhb cluster (5′-GGAATGTGACAACGACATGG).
2. All eight fragments are subsequently assembled via isothermal assembly to generate the dual-targeting plasmid pCasPP-pep-dhb (see Note 5).
3. E. coli TOP10 or Turbo is transformed as described in Subheading 3.2, and the desired pCasPP-pep-dhb plasmid is verified by Sanger sequencing with primers P07, P11, and P12 to verify the sgRNAs and the homologous sequences.
4. Transformation of E. coli S17-1 and conjugation with P. polymyxa are performed as described in Subheadings 3.2 and 3.3.
5. Following the conjugation, the transconjugants are screened on an MM1 P100 plate to evaluate the deletion of the pep cluster, as described in Step 5 of Subheading 3.4.1.
6. Further screening by cPCR is done by using primers P13 + P14 for the pep deletion and P34 + P35 for the dhb deletion. Primers P15 + P16 are used as a control to confirm the viability of the cPCR and should give a band of ~1.5 kb. Deletion of the pep and dhb clusters will result in a band of ~3 kb, while the wild type will not give any PCR fragment, since the applied PCR conditions do not support the amplification of such large regions (33 kb and 12 kb). DNA sequencing is used to confirm the deletion of pep and dhb by using primers P17 and P36, respectively.
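The layout of the synthesized fragment gb1 described in step 1 can be checked against the sequence in Table 1. The sketch below is only a consistency check under stated assumptions: the variable and function names are hypothetical, the gb1 sequence is the Table 1 entry with line-wrap spaces removed, and the middle part (gRNA-1 plus promoter-2) is treated as one unannotated block because its internal boundaries are not given in this chapter.

```python
# gb1 from Table 1, joined from the line-wrapped pieces.
GB1 = "".join([
    "CGACCCACACGGGTAATTCGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGC",
    "TAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTACTCCATC",
    "TGGATTTGTTCAGAACGCTCGGTTGCCGCCGGGCGTTTTTTAGCTGCTCC",
    "TTCGGTCGGACGTGCGTCTACGGGCACCTTACCGCAGCCGTCGGCTG",
    "TGCGACACGGACGGATCGGGCGAACTGGCCGATGCTGGGAGAAGCGCGCTGC",
    "TGTACGGCGCGCACCGGGTGCGGAGCCCCTCGGCGAGCGGTGTGA",
    "AACTTCTGTGAATGGCCTGTTCGGTTGCTTTTTTTATACGGCTGCCAGATAAGGC",
    "TTGCAGCATCTGGGCGGCTACCGCTATGATCGGGGCGTTCCTGCAATTCTTAG",
    "TGCGAGTATCTGAAAGGGGATACGCGGAATGTGACAACGACATGG",
])

PEP_SPACER = "CGACCCACACGGGTAATTCG"
DHB_SPACER = "GGAATGTGACAACGACATGG"
P28 = "GGAATGTGACAACGACATGGGTTTTAGAGCTAGAAATAGCAAG"  # fragment 3 forward primer


def split_gb1(gb1, first_spacer, second_spacer):
    """Split gb1 into (first spacer, middle block, second spacer)."""
    if not (gb1.startswith(first_spacer) and gb1.endswith(second_spacer)):
        raise ValueError("gb1 does not start/end with the expected spacers")
    middle = gb1[len(first_spacer):len(gb1) - len(second_spacer)]
    return first_spacer, middle, second_spacer


if __name__ == "__main__":
    first, middle, second = split_gb1(GB1, PEP_SPACER, DHB_SPACER)
    print(len(GB1), len(first), len(middle), len(second))
    # The 3' end of gb1 is the 5' overhang of P28, i.e. the 20 bp
    # overlap to fragment 3 used for isothermal assembly.
    print(P28.startswith(GB1[-20:]))
```

The same start and end check can be reused when a new gBlock is designed for a different pair of spacers.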

4 Notes

1. The length of the homologous flanks can also be reduced, down to a minimum of 300 bp.
2. If a mutant of P. polymyxa is used instead of the wild type, the time needed to reach the ideal growth phase for conjugation (early exponential phase) could vary. Instead of 4 h, it might be less or more. This might negatively affect the genome-editing outcome. For this reason, the incubation time should be adjusted accordingly.
3. E. coli S17-1 might not grow as expected due to the burden that a plasmid exerts on the cells. For this reason, double or even triple the amount of the E. coli S17-1 culture can be added instead of just 300 μL.
4. Serial dilutions should be performed to obtain single colonies of transconjugants, especially if a high transformation efficiency is expected.
5. For complicated plasmids, it is also advisable to use the Golden Gate assembly method instead of isothermal assembly, as described in [7].

Acknowledgments

This study is part of the project Polymore (no. 031B0855A), funded by the German Federal Ministry of Education and Research (BMBF) and by BASF SE, Germany.

References

1. Jeong H, Choi SK, Ryu CM et al (2019) Chronicle of a soil bacterium: Paenibacillus polymyxa E681 as a Tiny Guardian of plant and human health. Front Microbiol 10:467
2. Rütering M, Schmid J, Rühmann B et al (2016) Controlled production of polysaccharides-exploiting nutrient supply for levan and heteropolysaccharide formation in Paenibacillus sp. Carbohydr Polym 148:326–334
3. Schilling C, Ciccone R, Sieber V et al (2020) Engineering of the 2,3-butanediol pathway of Paenibacillus polymyxa DSM 365. Metab Eng 61:381–388
4. Zarschler K, Janesch B, Zayni S et al (2009) Construction of a gene knockout system for application in Paenibacillus alvei CCM 2051T, exemplified by the S-layer glycan biosynthesis initiation enzyme WsfP. Appl Environ Microbiol 75:3077–3085
5. Bin KS, Timmusk S (2013) A simplified method for gene knockout and direct screening of recombinant clones for application in Paenibacillus polymyxa. PLoS One 8:e68092
6. Choi SK, Park SY, Kim R et al (2008) Identification and functional analysis of the fusaricidin biosynthetic gene of Paenibacillus polymyxa E681. Biochem Biophys Res Commun 365:89–95
7. Rütering M, Cress BF, Schilling M et al (2017) Tailor-made exopolysaccharides-CRISPR-Cas9 mediated genome editing in Paenibacillus polymyxa. Synth Biol 2:ysx007
8. Jinek M, Chylinski K, Fonfara I et al (2012) A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337:816–821
9. Garneau JE, Dupuis MÈ, Villion M et al (2010) The CRISPR/Cas bacterial immune system cleaves bacteriophage and plasmid DNA. Nature 468:67–71
10. Gasiunas G, Barrangou R, Horvath P et al (2012) Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria. Proc Natl Acad Sci U S A 109:E2579–E2586
11. Szostak JW, Orr-Weaver TL, Rothstein RJ et al (1983) The Double-Strand-Break repair model for recombination. Cell 33:25–35
12. Chayot R, Montagne B, Mazel D et al (2010) An end-joining repair mechanism in Escherichia coli. Proc Natl Acad Sci U S A 107:2141–2146
13. Lieber MR (2010) The mechanism of double-strand DNA break repair by the nonhomologous DNA end-joining pathway. Annu Rev Biochem 79:181–211
14. Bikard D, Jiang W, Samai P et al (2013) Programmable repression and activation of bacterial gene expression using an engineered CRISPR-Cas system. Nucleic Acids Res 41:7429–7437
15. Qi LS, Larson MH, Gilbert LA et al (2013) Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152:1173–1183
16. Komor AC, Kim YB, Packer MS et al (2016) Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533:420–424
17. Adiego-Pérez B, Randazzo P, Daran JM et al (2019) Multiplex genome editing of microorganisms using CRISPR-Cas. FEMS Microbiol Lett 366:fnz086
18. Tian J, Xing B, Li M et al (2022) Efficient large-scale and scarless genome engineering enables the construction and screening of Bacillus subtilis biofuel overproducers. Int J Mol Sci 23:4853
19. Baumgart M, Unthan S, Kloß R et al (2018) Corynebacterium glutamicum Chassis C1*: building and testing a novel platform host for synthetic biology and industrial biotechnology. ACS Synth Biol 7:132–144
20. Fan X, Zhang Y, Zhao F et al (2020) Genome reduction enhances production of polyhydroxyalkanoate and alginate oligosaccharide in Pseudomonas mendocina. Int J Biol Macromol 163:2023–2031
21. Zhang F, Huo K, Song X et al (2020) Engineering of a genome-reduced strain Bacillus amyloliquefaciens for enhancing surfactin production. Microb Cell Factories 19:223
22. Milunovic B, diCenzo GC, Morton RA et al (2014) Cell growth inhibition upon deletion of four toxin-antitoxin loci from the megaplasmids of Sinorhizobium meliloti. J Bacteriol 196:811–824
23. diCenzo GC, Zamani M, Milunovic B et al (2016) Genomic resources for identification of the minimal N2-fixing symbiotic genome. Environ Microbiol 18:2534–2547
24. Schilling C, Koffas MAG, Sieber V et al (2021) Novel prokaryotic CRISPR-Cas12a-based tool for programmable transcriptional activation and repression. ACS Synth Biol 9:3353–3363
25. Kim MS, Kim HR, Jeong DE et al (2021) Cytosine base editor-mediated multiplex genome editing to accelerate discovery of novel antibiotics in Bacillus subtilis and Paenibacillus polymyxa. Front Microbiol 12:691839
26. Cobb RE, Wang Y, Zhao H (2015) High-efficiency multiplex genome editing of Streptomyces species using an engineered CRISPR/Cas System. ACS Synth Biol 4:723–728
27. Meliawati M, Teckentrup C, Schmid J (2022) CRISPR-Cas9-mediated large cluster deletion and multiplex genome editing in Paenibacillus polymyxa. ACS Synth Biol 11:77–84
28. Meliawati M, May T, Eckerlin J et al (2022) Insights in the complex DegU, DegS, and Spo0A regulation system of Paenibacillus polymyxa by CRISPR-Cas9-based targeted point mutations. Appl Environ Microbiol 88:e0016422
29. NEB protocol: PCR Using Q5® High-Fidelity DNA Polymerase (M0491). https://international.neb.com/protocols/2013/12/13/pcr-using-q5-high-fidelity-dna-polymerase-m0491. Accessed March 2023
30. Bioline protocol: Accuzyme DNA polymerase. https://www.bioline.com/mwdownloads/download/link/id/2703/accuzyme_dna_polymerase_product_manual.pdf. Accessed March 2023
31. Thermo Fisher Scientific protocol: GeneJET PCR purification kit. https://www.thermofisher.com/document-connect/document-connect.html?url=https://assets.thermofisher.com/TFS-Assets%2FLSG%2Fmanuals%2FMAN0012662_GeneJET_PCR_Purification_UG.pdf. Accessed March 2023
32. NEB protocol: Monarch DNA Gel extraction kit protocol card. https://international.neb.com/protocols/2015/11/23/monarch-dna-gel-extraction-kit-protocol-t1020. Accessed March 2023
33. Microsynth protocol: Sanger sequencing. https://www.microsynth.com/files/Inhalte/PDFs/Sanger/UserGuide_EconomyRun.pdf. Accessed March 2023
34. Promega protocol: GoTaq G2 green master mix. https://www.promega.de/-/media/files/resources/protocols/product-information-sheets/g/gotaq-g2-green-master-mix-protocol.pdf?rev=ba2e5156136b43068889f3c33c344240&la=en. Accessed March 2023
35. Qiagen protocol: DNeasy Blood & Tissue Handbook. https://www.qiagen.com/us/resources/resourcedetail?id=68f29296-5a9f-40fa-8b3d-1c148d0b3030&lang=en. Accessed March 2023

Part III Genome Language and Computing

Chapter 17

Programming Juxtacrine-Based Synthetic Signaling Networks in a Cellular Potts Framework

Calvin Lam and Leonardo Morsut

Abstract

Synthetic development is a synthetic biology subfield aiming to reprogram higher-order eukaryotic cells for tissue formation and morphogenesis. Reprogramming efforts commonly rely upon implementing custom signaling networks into these cells, but the efficient design of these signaling networks is a substantial challenge. It is difficult to predict the tissue/morphogenic outcome of these networks, and in vitro testing of many networks is both costly and time-consuming. We therefore developed a computational framework with an in silico cell line (ISCL) that sports basic but modifiable features such as adhesion, motility, growth, and division. More importantly, ISCL can be quickly engineered with custom genetic circuits to test, improve, and explore different signaling network designs. We implemented this framework in a free cellular Potts modeling software CompuCell3D. In this chapter, we briefly discuss how to start with CompuCell3D and then go through the steps of how to make and modify ISCL. We then go through the steps of programming custom genetic circuits into ISCL to generate an example signaling network.

Key words Synthetic biology, Computational modeling, Cellular Potts, Tissue engineering, Morphogenesis, Self-organization, Juxtacrine signaling, Synthetic receptors, Synthetic development, Differential adhesion

Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_17, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024

1 Introduction

Synthetic development is a recent subbranch of synthetic biology that aims to control tissue formation and morphogenesis through custom signaling networks [1–6]. Efforts have recently expanded into engineering mammalian cells, but they proceed rather slowly due to reliance on trial-and-error iterations [7–10]. One possible solution is to perform parallel screens of numerous signaling networks. However, this is not practical in vitro as it is both time-consuming and costly. A computational analog could be an ideal solution here [11]. For example, an in silico framework with a simulated cell line that can be implemented with different signaling networks would be modularly identical to that of the in vitro system. However, rather than requiring time-consuming modifications of the real cell’s properties, it would be a rapid modification of code representing the property in the simulated cell. Such a framework enables cost-effective and rapid screening of numerous signaling networks, as it becomes possible to design, implement, alter, and test even within the same day.

We have developed one such framework and thus far have used it to study the synNotch differential adhesion system in Toda et al. [8, 12]. In this in vitro system, several signaling networks are implemented as genetic circuits into a mouse fibroblast cell line (L929). These genetic circuits consist of the synthetic Notch (synNotch) receptor that enables cells to sense a user-specified ligand and induce the expression of a fluorescent reporter and adhesion protein. Different signaling networks comprise cells programmed with different synNotch genetic circuits, enabling cells to communicate with one another and undergo organization into different structures. One signaling network (Fig. 1) from this system will serve as a demo in this chapter, and we will go through the steps on how to create this network. This signaling network served as a reference for others in the in vitro system and offers a good balance of generalizability and complexity. As we implemented our framework in a free cellular Potts software, CompuCell3D (CC3D) [13], we will show how to generate this network in CC3D. In general, however, this framework can be abstracted to other modeling software provided it can model the composing features.

2 Materials

Materials to use the framework include the CompuCell3D installation file for your operating system, the framework code on the GitHub repository, and a computer. As the framework was built in CC3D version 3.7.8, we will show how to make the reference network in this version. Download the installation file for your operating system at: https://compucell3d.org/SrcBinOldReleases#A378 (see Note 1). The code for the reference signaling network is available at: https://github.com/calvinlamk/Book-Chapter-Demo. “ChapterDemo” is the project file containing all the code needed for the CC3D simulation. This signaling network corresponds to Fig. 4g in [12] (see Note 2). A modern computer running either Windows, Mac, or Linux can be used with CC3D.

Engineering Morphogenic Signaling Networks In Silico

Fig. 1 Reference signaling network from [8] as modeled in [12]. (Redrawn from Toda 2018 with permission from AAAS). The reference signaling network contains two parts. (a) First, a developmental trajectory is designed consisting of a mixture of blue and gray cells. Blue cells then signal to gray cells turning them green and adhesive to other green cells. Green cells then signal to blue cells turning them red and adhesive to green cells more than other red cells. End point structure is shown. (b) To realize this developmental trajectory, cells were programmed with the following genetic circuits. Blue cells (in silico coded as type name B with type Id 3) signal with a triangle ligand to gray cells (in silico coded as type name Y with type Id 1) via a notched synNotch receptor (step 1). This causes gray cells to activate to green cells (in silico coded as type name G with type Id 2) (step 2). Green cells have high levels of E-cadherin (an adhesive protein denoted as circles on a stick) and a GFP reporter-ligand (GFP-lig) that can signal to blue cells via a square indented synNotch receptor (step 3). This signaling causes blue cells to activate, turning them red (in silico coded as type name R with type Id 4) via a red fluorescent reporter protein and expressing low levels of E-cadherin (step 4). As green cells and red cells still have the notched and square indented synNotch receptor, respectively, they can still receive signaling from the cognate ligand

3 Methods

3.1 Getting Started

After installing CC3D, we will open the built-in project editor called Twedit++ (see Note 3). Twedit++ can be found in the CC3D install directory; on Windows, CC3D installs by default into the C directory. Be sure to open the run file and not the folder, which contains the backend. The program will pop up along with a terminal, which serves to show the commands processed. Open the downloaded demo project file called “ChapterDemo” by navigating to the “CC3D Project” tab (top left of the program), then clicking “Open CC3D Project. . .”, and then opening the CC3D file named “ChapterDemo” in the project folder. Both the project folder and the CC3D file are named “ChapterDemo.” This will load all constitutive files in the project folder, which will show up at the left of Twedit++. For this demo, you should see the CC3D icon (little green globe) with “ChapterDemo.cc3d” next to it. To see the code in the project, expand the project by clicking the > icons. Notice that the project comprises several files. In the demo, these are “ELUGM.py,” “ELUGM.xml,” “ELUGMSteppables.py,” and “ParameterScanSpecs.xml.” Click each of these individual files to load them into Twedit++ for viewing and editing. For the framework, the files we will discuss are the ELUGM xml file, the ELUGM Steppables py file, and the ParameterScanSpecs xml file (see Note 4). The xml file contains the basic parameters for the cellular Potts calculations and simple plugins for features such as cells, adhesion, and initialization of the simulation. The steppables.py file contains the majority of the framework and is organized by comments headed by the # symbol. These comments serve as section headers and offer a simple way to locate sections by searching via Ctrl F. The parameterscan xml file, though not necessary for running the simulation, is crucial for testing signaling network parameters.

3.2 Creating Cells in CC3D

To start, we first determine the number of cell types needed in our signaling network (Fig. 1). In the reference signaling network, we have two cell types, gray and blue, each with its own genetic circuit. With signaling, each cell type can change its adhesive state by expressing different levels of an adhesive protein. Gray cells can turn green, indicating high E-cadherin, and blue cells can turn red, indicating low E-cadherin (Fig. 1b). Therefore, we need to specify four cell types in total (see Note 5). To create these cells in CC3D, we navigate to the xml file “ELUGM.xml” and find the line “<Plugin Name="CellType">” (Fig. 2a). Directly underneath this line is a block of code specifying the existence of each cell type along with its type Id and type name. For example, “<CellType TypeId="1" TypeName="Y"/>” tells CC3D that cells of type Y exist in this code and are recognized with a type Id of 1. The name is arbitrary; here, it is Y for gray. We strongly recommend naming cell types in a manner easily recognizable in the signaling network schematic. From this block of code, we can see that we have four cell types: Y, G, B, and R for gray, green, blue, and red, with type Ids 1, 2, 3, and 4, respectively. To add more cell types, add lines of the form “<CellType TypeId="X" TypeName="Z"/>,” where X is a type Id number not previously used for another cell type and Z is the name (see Note 6). For example, the line “<CellType TypeId="5" TypeName="O"/>” will create a cell type O with type number 5.

Engineering Morphogenic Signaling Networks In Silico

Fig. 2 Establishing the cells in CC3D. (a) CellType plugin allows specifying cells in CC3D through an Id number and a type name. (b) ConnectivityGlobal plugin prevents cells from fragmenting during the simulation. (c) With these lines specified, we have defined the cells needed in our reference signaling network in CC3D

We have utilized a CC3D plugin that helps stop cells from fragmenting, and the added cell types need to be included there as well. Find the line “<Plugin Name="ConnectivityGlobal">” (Fig. 2b). Copy one of the per-type lines listed under it, replace the type name in the copy with the name of the new type, and place the copy under the plugin line with the others. In the example above, the added line would name type O. With this section completed, we have defined the cell types required to model the reference signaling network (Fig. 2c).
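The CellType entries described above can also be generated programmatically, which helps keep type Ids unique when many types are added. The following is a minimal sketch, assuming the entries use the attribute names TypeId and TypeName; the helper name build_celltype_plugin is hypothetical and is not part of CC3D or of the demo project. Any other entries already present in “ELUGM.xml” are not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_celltype_plugin(type_names):
    """Build a CellType plugin element with consecutive type Ids starting at 1.

    Hypothetical helper for illustration; attribute names are assumed to
    match the CellType lines in the project xml file.
    """
    if len(set(type_names)) != len(type_names):
        raise ValueError("cell type names must be unique")
    plugin = ET.Element("Plugin", {"Name": "CellType"})
    for type_id, name in enumerate(type_names, start=1):
        ET.SubElement(plugin, "CellType",
                      {"TypeId": str(type_id), "TypeName": name})
    return plugin


# Four reference types plus the example type O (type Id 5).
plugin = build_celltype_plugin(["Y", "G", "B", "R", "O"])
for element in plugin:
    print(ET.tostring(element, encoding="unicode"))
```

The printed lines can be compared against the block under the CellType plugin line before pasting anything into the xml file.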


Calvin Lam and Leonardo Morsut

3.3 Adding Adhesion to the In Silico Cell Line (ISCL)

With cells defined in CC3D, we can start adding properties to these cells. We begin with adhesion, as it is a basic feature of all biological cells and is key in our reference signaling network. In the framework, we utilize the basic adhesion plugin, the Contact plugin, which is located in the same xml file, “ELUGM.xml” (see Note 7). Find the line “<Plugin Name="Contact">” (Fig. 3a).

A Contact plugin energies in “ELUGM.xml”: Medium-Medium 0; Medium to Y, G, B, and R 26.0 each; Y-Y 45.0; Y-G 40.0; Y-B 49.0; Y-R 40.0; G-G 20.0; G-B 49.0; G-R 35.0; B-B 49.0; B-R 49.0; R-R 40.0; NeighborOrder 4

Fig. 3 Defining cell-cell adhesion in CC3D. (a) We have utilized a contact plugin that allows basic modeling of cell-cell adhesion as it differs between types of cells. The higher the value, the weaker the adhesion between the two cell types. For example, G-R adhesion is 35, which is intermediate in our reference signaling network. Medium is a special double-interface calculation; thus, as long as cell-cell adhesion is less than 52 (2 × 26), cells will prefer to adhere to one another rather than disperse. NeighborOrder controls the number of surrounding pixels used when calculating the effect of adhesion in moving cells. We find that a value of four helps cells sort at a reasonable rate for these signaling networks. (b) With the adhesion values implemented, we are beginning to piece our signaling network together
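The two rules stated in the caption (cell-cell values below twice the cell-to-medium value, and a lower value meaning stronger adhesion) can be checked before running a simulation. The following is a minimal sketch under stated assumptions: the values are those read from Fig. 3a, the assignment of individual numbers to type pairs is inferred from the panel order and the text, and the function names are hypothetical.

```python
# Contact energies for the reference network (lower value = stronger adhesion).
# Pair assignments are inferred from Fig. 3a and should be checked against
# the Contact plugin in the project xml file.
CELL_TO_MEDIUM = 26.0
CONTACT = {
    ("Y", "Y"): 45.0, ("Y", "G"): 40.0, ("Y", "B"): 49.0, ("Y", "R"): 40.0,
    ("G", "G"): 20.0, ("G", "B"): 49.0, ("G", "R"): 35.0,
    ("B", "B"): 49.0, ("B", "R"): 49.0,
    ("R", "R"): 40.0,
}


def energy(a, b):
    """Look up a symmetric contact energy regardless of argument order."""
    return CONTACT[(a, b)] if (a, b) in CONTACT else CONTACT[(b, a)]


def dispersing_pairs(contact, cell_to_medium):
    """Return type pairs whose energy is not below 2 x cell-to-medium."""
    threshold = 2 * cell_to_medium
    return [pair for pair, value in contact.items() if value >= threshold]


# An empty list means every pair of cell types prefers contact over dispersal.
print(dispersing_pairs(CONTACT, CELL_TO_MEDIUM))

# Hierarchy of homotypic adhesion: G-G strongest, then R-R, then Y-Y and B-B.
hierarchy_ok = energy("G", "G") < energy("R", "R") < energy("Y", "Y") <= energy("B", "B")
print(hierarchy_ok)
```

When new cell types are added, extending the dictionary and rerunning both checks is a quick way to confirm that the intended adhesion hierarchy is preserved.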


Underneath the Contact plugin line, you will notice a block of code following the pattern “<Energy Type1="X" Type2="Z">NUMBER</Energy>,” where X and Z are names of cell types specified in the CellType plugin we just examined. For example, the line “<Energy Type1="Y" Type2="Y">45.0</Energy>” sets the adhesion value between cells of type Y and other cells of type Y to 45. In the Potts formulation, a smaller number corresponds to a stronger adhesion. Thus, G-G adhesion in this signaling network is much stronger than Y-Y adhesion (20 < 45). Also, notice that the cell-to-medium adhesion is specified as well. Here, it is 26, which CC3D counts twice, once for the cell interface and once for the medium interface. This interpretation is unique to cell-to-medium contacts. Therefore, to have cells weakly aggregate to one another, we must set the adhesion to just a bit less than 2 × 26 (52). Adhesion between B and any cell type is the weakest in the signaling network and is accordingly set to 49 (see Note 8). In general, adhesion values between cell types will depend on the signaling network modeled. As a broad rule, we find that the exact values can vary within a range and that the hierarchy of type-to-type adhesion is far more important. In the reference signaling network, we set a hierarchy based on the adhesion protein. High E-cadherin (H.ECAD) cells will prefer to adhere to other high E-cadherin cells over low E-cadherin (L.ECAD) cells. L.ECAD cells will prefer to adhere to other E-cadherin-bearing cells rather than non-cadherin-bearing cells. Therefore, in our hierarchy, G-G is more adhesive than R-R, which is more adhesive than Y-Y or B-B. Cross-interaction values such as G-Y lie somewhere in between. To add more cell types, replace the X and Z in the line “<Energy Type1="X" Type2="Z">NUMBER</Energy>” with the appropriate type names and NUMBER with the desired adhesion value. Then, place the line under the Contact plugin section with the other similar lines (see Note 9). With this, we have established how our cells adhere to one another (Fig. 3b).

3.4 Adding Motility to ISCL

With cell adhesion defined, we can move on to another basic cell feature, motility. In CC3D, motility can be specified at the individual cell level via the line “cell.fluctAmpl.” These lines are found under the #Defining and Calculating Cell Physical Properties Code/Parameters section in the “ELUGMSteppables.py” file. The first instance of this line is “cell.fluctAmpl=BASAL+SCF*(CtoM*CSAM+YtoY*CSAY+YtoG*CSAG+YtoB*CSAB+YtoR*CSAR)/cell.surface,” appearing in the block of code in Fig. 4a. This block effectively states that, if the focal cell is of type 1 (type name Y, gray cells), its motility is set according to the given formula. Similar blocks of code exist for all the other cell types. This formula calculates motility from a baseline motility BASAL and a neighbor-dependent term with weight SCF. The weight defines how well a cell’s neighbors attenuate its motility, and we built the formula with the logic that cells with more adhesive neighbors are less likely to move. Dividing by the cell’s surface area, cell.surface, normalizes the attenuation. BASAL, SCF, and CtoM (cell to medium) are constant values that are the same across all cells in this signaling network (see Note 10). They are therefore specified under the section #Motility Parameters at the top of the steppables.py file. The other variables are calculated at the individual cell level and are examined next.

A
#Defining and Calculating Cell Physical Properties Code/Parameters
if cell.type==1:
    cell.lambdaSurface=2.2
    cell.lambdaVolume=2.2
    NUMTY+=1
    cell.fluctAmpl=BASAL+SCF*(CtoM*CSAM+YtoY*CSAY+YtoG*CSAG+YtoB*CSAB+YtoR*CSAR)/cell.surface
    YGPTS+=cell.dict["PTS"][0]

B
#Iterating Over Neighbors Code
for neighbor, commonSurfaceArea in self.getCellNeighborDataList(cell):
    if neighbor is None:
        continue
    if neighbor.type==1:
        CSAY+=commonSurfaceArea
        PTSY+=0
    if neighbor.type==2:
        CSAG+=commonSurfaceArea
        PTSG+=commonSurfaceArea*neighbor.dict["PTS"][0]/(neighbor.surface)
    if neighbor.type==3:
        CSAB+=commonSurfaceArea
        PTSB+=commonSurfaceArea*(CONEXPSCF/(1+math.exp(-(mcs-THETA)/XI)))/neighbor.surface
    if neighbor.type==4:
        CSAR+=commonSurfaceArea
        PTSR+=commonSurfaceArea*(CONEXPSCF/(1+math.exp(-(mcs-THETA)/XI)))/neighbor.surface
CSAM=cell.surface-(CSAY+CSAG+CSAB+CSAR)

C
#Calling Adhesion Values From the XML File Code
global YtoY,YtoG,GtoY,YtoB,BtoY,YtoR,RtoY,GtoG,GtoB,BtoG,GtoR,RtoG,BtoB,BtoR,RtoB,RtoR
YtoY=float(self.getXMLElementValue(['Plugin','Name','Contact'],['Energy','Type1','Y','Type2','Y']))
YtoG=GtoY=float(self.getXMLElementValue(['Plugin','Name','Contact'],['Energy','Type1','Y','Type2','G']))
YtoB=BtoY=float(self.getXMLElementValue(['Plugin','Name','Contact'],['Energy','Type1','Y','Type2','B']))
YtoR=RtoY=float(self.getXMLElementValue(['Plugin','Name','Contact'],['Energy','Type1','Y','Type2','R']))
GtoG=float(self.getXMLElementValue(['Plugin','Name','Contact'],['Energy','Type1','G','Type2','G']))
GtoB=BtoG=float(self.getXMLElementValue(['Plugin','Name','Contact'],['Energy','Type1','G','Type2','B']))
GtoR=RtoG=float(self.getXMLElementValue(['Plugin','Name','Contact'],['Energy','Type1','G','Type2','R']))
BtoB=float(self.getXMLElementValue(['Plugin','Name','Contact'],['Energy','Type1','B','Type2','B']))
BtoR=RtoB=float(self.getXMLElementValue(['Plugin','Name','Contact'],['Energy','Type1','B','Type2','R']))
RtoR=float(self.getXMLElementValue(['Plugin','Name','Contact'],['Energy','Type1','R','Type2','R']))

Fig. 4 Adding motility to ISCL. (a) Motility is defined in section #Defining and Calculating Cell Physical Properties Code/Parameters of the steppables.py file “ELUGMSteppables.py.” In this example block of code for gray cells (determined by the line “if cell.type==1:”), motility is defined as “cell.fluctAmpl,” calculated with the formula shown. This formula calculates motility as a sum of baseline motility (BASAL) and motility affected by adhesive neighbors. (b) To determine the values of the parameters in the motility/cell.fluctAmpl formula, we must calculate the common surface area the focal cell shares with each cell type. This block of code, for a focal cell, iterates over its neighbors and tallies the total surface area shared with each cell type. For example, this is done for green neighbors via the line “CSAG+=commonSurfaceArea.” (c) The other parameters (i.e., YtoY, YtoG, etc.) are determined by the cell-cell adhesion specified in the Contact plugin of the xml file. To call these values from the xml plugin, we define several variables and assign them to the appropriate xml value through the lines shown. Once assigned, these variables can be referred to in the steppables.py file

The variables CSAM, CSAY, CSAG, CSAB, and CSAR refer to the common surface area the focal cell shares with the medium and with neighbors of type gray, green, blue, and red, respectively. These variables are defined in the #Defining Per Step Per Cell Variables for Signaling and Motility Code section, and their values are calculated in the #Iterating Over Neighbors Code section. In the former, we set each variable to 0; in the latter, we update it from the neighbor calculations. In the #Iterating Over Neighbors Code section, the neighbors of each focal cell are iterated over and the type of each neighbor is identified (Fig. 4b). For instance, the surface area the focal cell shares with cells of type 1 is totaled by the line “CSAY+=commonSurfaceArea.” Once calculated, the values are used in the motility code. To add additional cell types, for example, cell type 5 with name O, the variable CSAO should be added to the #Defining Per Step Per Cell Variables for Signaling and Motility Code section with the line “CSAO=0.” Next, add this cell type to the #Iterating Over Neighbors Code section with lines such as:

if neighbor.type==5:
    CSAO+=commonSurfaceArea
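The motility calculation described above can be sketched outside CC3D as two small functions. This is a minimal sketch, not the framework's code: it replaces the per-type variables (CSAY, YtoY, and so on) with dictionaries so that an added type such as O needs no new variable names, and the values of BASAL and SCF and the Y-O adhesion value are hypothetical placeholders.

```python
def common_surface_by_type(neighbors, cell_surface, type_names):
    """Tally shared surface area per neighbor type; the rest touches medium."""
    csa = {name: 0.0 for name in type_names}
    for neighbor_type, shared_area in neighbors:
        csa[neighbor_type] += shared_area
    # Surface not shared with any cell is in contact with the medium.
    csa["Medium"] = cell_surface - sum(csa.values())
    return csa


def fluct_ampl(focal_type, csa, cell_surface, adhesion, ctom, basal, scf):
    """BASAL + SCF * (adhesion-weighted shared surface) / cell surface."""
    weighted = ctom * csa["Medium"]
    for name, area in csa.items():
        if name != "Medium":
            weighted += adhesion[frozenset((focal_type, name))] * area
    return basal + scf * weighted / cell_surface


# Adhesion values for a gray (Y) focal cell; the Y-O value is hypothetical.
ADHESION = {
    frozenset(("Y", "Y")): 45.0, frozenset(("Y", "G")): 40.0,
    frozenset(("Y", "B")): 49.0, frozenset(("Y", "R")): 40.0,
    frozenset(("Y", "O")): 45.0,
}
CTOM, BASAL, SCF = 26.0, 10.0, 0.5  # BASAL and SCF are placeholder values

# (neighbor type, shared surface area) pairs for one focal cell of surface 100.
neighbors = [("Y", 20.0), ("G", 30.0), ("O", 10.0)]
csa = common_surface_by_type(neighbors, 100.0, ["Y", "G", "B", "R", "O"])
ampl = fluct_ampl("Y", csa, 100.0, ADHESION, CTOM, BASAL, SCF)
print(csa["Medium"], ampl)
```

Because neighbors with lower contact values contribute less to the weighted sum, a cell surrounded by strongly adhesive neighbors ends up with a lower fluctuation amplitude, which is the behavior the formula was designed to give.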

Also, modify the line “CSAM=cell.surface-(CSAY+CSAG+CSAB+CSAR)” to “CSAM=cell.surface-(CSAY+CSAG+CSAB+CSAR+CSAO),” as O cells can now also contribute to shared surface area. The other variables, YtoY, YtoG, YtoB, and YtoR, refer to the focal cell’s adhesion to each neighbor type. As our focal cell is of type 1 (Y), these variables refer to the Y-Y, Y-G, Y-B, and Y-R adhesion strengths we previously defined in the adhesion section of the xml file. To call these values from the xml file, we use the code located in the #Calling Adhesion Values From the XML File Code section. To begin, we define the names of the variables in the line “global YtoY,YtoG,GtoY,YtoB,BtoY,YtoR,RtoY,GtoG,GtoB,BtoG,GtoR,RtoG,BtoB,BtoR,RtoB,RtoR” (Fig. 4c). The global statement tells Python that these names refer to module-level variables, so the same values are available anywhere in the steppables.py file. Next, we call the values directly from the xml file using the line “float(self.getXMLElementValue(['Plugin','Name','Contact'],['Energy','Type1','X','Type2','Z'])).” This line tells the steppables.py file to obtain the adhesion value between cells of type X and Z from the Contact plugin in the xml file. For example, the line “YtoG=GtoY=float(self.getXMLElementValue(['Plugin','Name','Contact'],['Energy','Type1','Y','Type2','G']))” assigns both YtoG and GtoY the adhesion value between types Y and G (40) in the xml file (see Note 11). To add additional cell types, again with cell type 5 with name O as an example, the corresponding adhesion values can be called into the steppables.py file with similar lines. Do not forget to add each new variable name to the global line before assigning it a value. With the common surface areas calculated and the adhesion values called from the xml file, cell motility can be calculated. Continuing the example of an added cell type 5 with name O, in the #Defining and Calculating Cell Physical Properties Code/Parameters section, for cells of type 1 (denoted by the line “if cell.type==1:”), “cell.fluctAmpl” would be changed to “cell.fluctAmpl=BASAL+SCF*(CtoM*CSAM+YtoY*CSAY+YtoG*CSAG+YtoB*CSAB+YtoR*CSAR+YtoO*CSAO)/cell.surface.” This change needs to be repeated for each cell type defined in this section. We also need to specify a block for cell type O in this section, such as “if cell.type==5: cell.fluctAmpl=BASAL+SCF*(CtoM*CSAM+OtoY*CSAY+OtoG*CSAG+OtoB*CSAB+OtoR*CSAR+OtoO*CSAO)/cell.surface.” In later sections, we go over the other lines in the block (Fig. 4a) and give examples of how they are added to this block.

3.5 Adding Size, Growth, and Division to ISCL

In CC3D, cells can be programmed at the individual cell level to target a surface area and volume through the variables “cell.targetSurface” and “cell.targetVolume.” For example, a “cell.targetSurface” of 113 and a “cell.targetVolume” of 113 tell the cells to try to maintain a surface area and volume of 113 pixels. How closely the cell adheres to these targets depends on the parameters “cell.lambdaSurface” and “cell.lambdaVolume,” respectively. These lambda parameters can be thought of as spring constants and dictate how far the cell can deviate from its assigned target values. A higher value constrains the cell closer to its target surface/volume, while a lower value allows the cell to deviate more from the target surface/volume. In this framework, we chose to create and specify a target radius for each cell, and targetSurface and targetVolume are calculated from this target radius through the classic formulas for the surface area and volume of a sphere, respectively. This can be found in the section #Initialization of Cells Parameters with the lines “cell.dict["RDM"]=RNG.gauss(RADAVG,RADDEV),” “cell.targetSurface=4*math.pi*cell.dict["RDM"]**2,” and “cell.targetVolume=(4/3)*math.pi*cell.dict["RDM"]**3” (Fig. 5a). The first line assigns each cell a radius drawn from a Gaussian distribution with mean RADAVG and standard deviation RADDEV. These parameters are specified in the #Cell Size and Division Parameters section at the top of the steppables.py file. The two subsequent lines calculate the targetSurface and targetVolume based on this target radius.

A
#Initialization of Cells Parameters
for cell in self.cellList:
    cell.dict["RDM"]=RNG.gauss(RADAVG,RADDEV)
    cell.lambdaSurface=2.5
    cell.targetSurface=4*math.pi*cell.dict["RDM"]**2
    cell.lambdaVolume=2.5
    cell.targetVolume=(4/3)*math.pi*cell.dict["RDM"]**3
    cell.dict["PTS"]=[0]

B
#Mitosis Codes
from PySteppablesExamples import MitosisSteppableBase
class MitosisSteppable(MitosisSteppableBase):
    def __init__(self,_simulator,_frequency=1):
        MitosisSteppableBase.__init__(self,_simulator,_frequency)
        self.setParentChildPositionFlag(0)
    def step(self,mcs):
        cells_to_divide=[]
        for cell in self.cellList:
            cell.dict["RDM"]+=RNG.uniform(MTFORCEMIN,MTFORCEMAX)
            cell.targetSurface=4*math.pi*cell.dict["RDM"]**2
            cell.targetVolume=(4/3)*math.pi*cell.dict["RDM"]**3
            if cell.volume>2*(4/3)*math.pi*RADAVG**3:
                cells_to_divide.append(cell)
        for cell in cells_to_divide:
            self.divideCellRandomOrientation(cell)
    def updateAttributes(self):
        self.parentCell.dict["RDM"]=RNG.gauss(RADAVG,RADDEV)
        self.parentCell.targetVolume=(4/3)*math.pi*self.parentCell.dict["RDM"]**3
        self.parentCell.targetSurface=4*math.pi*self.parentCell.dict["RDM"]**2
        self.cloneParent2Child()

C
#Defining and Calculating Cell Physical Properties Code/Parameters
if cell.type==1:
    cell.lambdaSurface=2.2
    cell.lambdaVolume=2.2
    NUMTY+=1
    cell.fluctAmpl=BASAL+SCF*(CtoM*CSAM+YtoY*CSAY+YtoG*CSAG+YtoB*CSAB+YtoR*CSAR)/cell.surface
    YGPTS+=cell.dict["PTS"][0]
if cell.type==2:
    cell.lambdaSurface=1.0
    cell.lambdaVolume=1.0
    NUMTG+=1
    cell.fluctAmpl=BASAL+SCF*(CtoM*CSAM+GtoY*CSAY+GtoG*CSAG+GtoB*CSAB+GtoR*CSAR)/cell.surface
    YGPTS+=cell.dict["PTS"][0]
if cell.type==3:
    cell.lambdaSurface=2.2
    cell.lambdaVolume=2.2
    NUMTB+=1
    cell.fluctAmpl=BASAL+SCF*(CtoM*CSAM+BtoY*CSAY+BtoG*CSAG+BtoB*CSAB+BtoR*CSAR)/cell.surface
    BRPTS+=cell.dict["PTS"][0]
if cell.type==4:
    cell.lambdaSurface=1.0
    cell.lambdaVolume=1.0
    NUMTR+=1
    cell.fluctAmpl=BASAL+SCF*(CtoM*CSAM+RtoY*CSAY+RtoG*CSAG+RtoB*CSAB+RtoR*CSAR)/cell.surface
    BRPTS+=cell.dict["PTS"][0]

Fig. 5 Adding physical dimensions and division into ISCL. (a) In the #Initialization of Cells Parameters section, we initialize cells with a target radius, which is then used to calculate the target surface area and target volume. Starting cells with a distribution in radius prevents mitosis from synchronizing and can represent a cell line with cells in various stages of the cell cycle. The two lambda values here are temporary and are reassigned on the cell type level in (c) below. Cells are also initialized with a dictionary named “PTS,” short for points. This dictionary is used to track reporter levels. (b) In the mitosis section of the steppables.py file, cells undergo random fluctuations in their target radius, with bias toward increasing to cause cells to grow. When

With target radius defined, we can also add a growth and division model. Navigate to the #Mitosis Codes section in the steppables.py file, which contains effectively all the growth and division code (Fig. 5b). Here, cells experience changes in their target radius through the line “cell.dict["RDM"]+=RNG.uniform(MTFORCEMIN,MTFORCEMAX).” These changes are drawn from a uniform distribution with minimum MTFORCEMIN and maximum MTFORCEMAX. These parameters can be found in the #Cell Size and Division Parameters section. Once the target radius is adjusted, the new targetSurface and targetVolume are calculated, and if the cell volume exceeds twice the volume calculated using the mean radius (RADAVG), the cell undergoes division. Upon dividing, cells are assigned a new target radius and repeat the cycle.

“cell.lambdaSurface” and “cell.lambdaVolume” are specified in the section #Defining and Calculating Cell Physical Properties Code/Parameters. They are also defined in the section #Initialization of Cells Parameters, but the value there is rapidly overwritten. They are included there so that CC3D does not throw an error when trying to initialize the simulation. Now, navigate to the #Defining and Calculating Cell Physical Properties Code/Parameters section, where the actual values are specified via the lines “cell.lambdaSurface” and “cell.lambdaVolume” (Fig. 5c).
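The radius-based size and growth rules can be tried out without CC3D. The following is a minimal sketch under stated assumptions: the parameter values are hypothetical placeholders (RADAVG = 3 is chosen only because it reproduces the 113-pixel example above), and the target volume stands in for the actual cell volume that the framework tests in “cell.volume,” since no Potts dynamics are simulated here.

```python
import math
import random


def sphere_targets(radius):
    """Target surface area and volume of a sphere with the given radius."""
    return 4 * math.pi * radius**2, (4 / 3) * math.pi * radius**3


def division_volume(rad_avg):
    """Division threshold: twice the volume of a cell with the mean radius."""
    return 2 * (4 / 3) * math.pi * rad_avg**3


# Hypothetical placeholder values; the real ones live in the
# #Cell Size and Division Parameters section of the steppables file.
RADAVG, RADDEV = 3.0, 0.5
MTFORCEMIN, MTFORCEMAX = -0.01, 0.02  # biased toward growth

rng = random.Random(0)
radius = rng.gauss(RADAVG, RADDEV)
steps = 0
# Grow until the target volume passes the threshold (capped for safety).
while sphere_targets(radius)[1] <= division_volume(RADAVG) and steps < 100000:
    radius += rng.uniform(MTFORCEMIN, MTFORCEMAX)
    steps += 1

print(sphere_targets(RADAVG))  # both values are close to 113 for radius 3
print(steps, radius)
```

A wider interval between MTFORCEMIN and MTFORCEMAX makes division times more variable, while shifting the interval upward shortens the average time to division.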
Notice that their values are specified on a per-cell-type basis but, like the “cell.fluctAmpl” motility line, could also be specified at the individual cell level via a formula if desired (see Note 12). In this implementation of size, growth, and division, there are minimal changes to make if an additional cell type is added. Again, with example cell type 5 with name O, cell.lambdaSurface and cell.lambdaVolume would need to be specified in the #Defining and Calculating Cell Physical Properties Code/Parameters section. A complete block for a cell type, excluding quantification and signaling, then has cell.fluctAmpl (added in the motility section above), cell.lambdaSurface, and cell.lambdaVolume in this implementation. Figure 4a serves as an example.

Fig. 5 (continued) cells reach twice the expected cell volume, calculated as the volume using the mean of the population, cells undergo division. They are then reassigned a new target radius along with a new calculated target surface area and target volume. (c) Here, cells have their lambdaSurface and lambdaVolume specified on the per cell type level. For example, as G cells are adhesive and change their morphology when adhesive [12], we relax their lambdas to 1. Notice this is the same section as for the motility code. In general, this section is useful for specifying parameters on a per cell type level

3.6 Adding Genetic Circuits to ISCL: Assigning and Tracking Reporter Levels in Cells

With ISCL and its physical properties defined, we can now add the genetic circuits that control the changes in adhesion in the reference signaling network. To start, navigate to the #Initialization of Cells Parameters section (Fig. 5a). Here, we specify that each cell has a dictionary entry attached to it. For example, through the line “cell.dict["PTS"]=[0],” we give each cell an entry called “PTS” (for points) holding a list with one numerical value, which keeps track of the level of reporter (and thus the effect of signaling) in each cell. For our gray Y and green G cells, this entry tracks GFP reporter levels, while for our blue B and red R cells, it tracks RFP reporter levels. The value 0 reflects that cells begin with 0 reporter, as in the reference signaling network (see Note 13).

3.7 Adding Genetic Circuits to ISCL: Defining the Signal from Ligands on Neighbor Cells

With reporter tracking enabled and the initial reporter condition defined, we now calculate the amount of ligand a focal cell is exposed to per simulation timestep. We first define the variables in the section #Defining Per Step Per Cell Variables for Signaling and Motility Code. PTSY, PTSG, PTSB, and PTSR refer to points from gray, green, blue, and red cells, respectively. Note that this is similar to the code we use to calculate motility for each cell. Next, navigate to the #Iterating Over Neighbors Code section (Fig. 6a). Just as we totaled the surface area the focal cell shares with cells of a specific type, we now also total the ligands the focal cell is exposed to from cells of a specific type. In the reference signaling network (Fig. 1b), gray cells do not have ligands that signal to neighbors. Therefore, they contribute 0 points via the code “PTSY+=0” (see Note 14). Now, turn to the line under “if neighbor.type==2:” (recall that type 2 is green cells with type name G), and look at the green points code “PTSG+=commonSurfaceArea*neighbor.dict["PTS"][0]/(neighbor.surface).” In the signaling network, green cells signal via a reporter that doubles as a ligand (GFP-lig) and is thus tracked by our “PTS” dictionary. To calculate how much reporter-ligand our focal cell is exposed to, we call upon the total reporter level of the neighboring cell via “neighbor.dict["PTS"][0]” and divide it by the neighbor’s surface area to obtain a density. We then multiply by the surface area shared with the focal cell to effectively get the number of ligands that contact our focal cell. The += operation adds to PTSG for each neighbor that is a green cell. A similar line occurs in the subsequent code for PTSB and PTSR. However, as blue and red cells signal via a constitutive ligand (not induced or affected by the genetic circuit), we instead have their total ligand calculated by the sigmoid equation “(CONEXPSCF/(1+math.exp(-(mcs-THETA)/XI))).” These parameters are constant and are defined in the #Constitutive Ligand Parameters section at the top of the steppables.py file (see Note 15). Just as with neighbors of type 2 (green cells), the amount of ligand contacting our focal cell is calculated by dividing by the neighbor’s surface area and multiplying by the amount of surface contacting our focal cell. If additional cell types with signaling ligands are to be added, this is the section in which to do it. Again, using the example cell type 5 with name O, the “if neighbor.type==5:” line would be added along with a PTSO line underneath (see Note 16). This completes defining ligands on cell types for our reference signaling network (Fig. 6b).

A
#Iterating Over Neighbors Code
for neighbor, commonSurfaceArea in self.getCellNeighborDataList(cell):
    if neighbor is None:
        continue
    if neighbor.type==1:
        CSAY+=commonSurfaceArea
        PTSY+=0
    if neighbor.type==2:
        CSAG+=commonSurfaceArea
        PTSG+=commonSurfaceArea*neighbor.dict["PTS"][0]/(neighbor.surface)
    if neighbor.type==3:
        CSAB+=commonSurfaceArea
        PTSB+=commonSurfaceArea*(CONEXPSCF/(1+math.exp(-(mcs-THETA)/XI)))/neighbor.surface
    if neighbor.type==4:
        CSAR+=commonSurfaceArea
        PTSR+=commonSurfaceArea*(CONEXPSCF/(1+math.exp(-(mcs-THETA)/XI)))/neighbor.surface

Fig. 6 Determining the level of ligand each cell is exposed to. (a) To determine what ligand and how much of that ligand the focal cell is exposed to, we turn to the #Iterating Over Neighbors Code. Here, each cell in the signaling network is iterated over and its neighbors assessed. For neighbors of each cell type (gray, green, blue, red), the variables PTSY, PTSG, PTSB, and PTSR are calculated via the formula shown. Effectively, each variable totals the ligand level from one cell type by assessing the total ligand on a neighboring cell of the specified type, dividing by the neighbor’s surface area, and then multiplying by the focal cell’s contacted surface area. This is repeated on all neighboring cells of that type. This process is then repeated for each cell type. (b) With this section of code, the ligand type on each cell is specified along with the type and level of ligands each cell in the simulation is exposed to

3.8 Adding Genetic Circuits to ISCL: Determining How the Focal Cell Responds to Signal from Ligands on Neighbors

Now that we know the type and amount of ligand our focal cell is exposed to from each cell type, the next step is to determine how the focal cell responds. Navigate to the section #Changing Reporter as a Result of Signaling Changes. Here, we encode which cells process the ligands as a signal and how they do so. We first notice, from our reference signaling network (Fig. 1b), that gray and green cells receive signal from blue and red cells (triangle ligand signaling to notched receptor) to express H.ECAD (high E-cadherin) and GFP-lig (GFP-ligand). We code this underneath “if (cell.type==1 or cell.type==2):” with the indented line “DTRES=(1/(ALPHAYG+math.exp(-((PTSR+PTSB)-BETAYG)/EPSILONYG)))-(1/KAPPAYG)*cell.dict["PTS"][0]” (Fig. 7a). This line calculates DTRES, the change (delta) in reporter (GFP-ligand in this case) as a result of signaling (see Note 17). Note that DTRES is first defined in the same code block as PTSY. The constant parameters (ALPHAYG, BETAYG, EPSILONYG, KAPPAYG) for this formula are found in the section #YG Signaling Parameters at the top of the steppables.py file. Once DTRES is calculated, it is added to the PTS dictionary via the line “cell.dict["PTS"][0]+=DTRES,” effectively recording the change in total reporter as a result of signaling at a timestep. Notice that we only specify PTSB and PTSR in this DTRES formula for gray and green cells. This is because, in our reference signaling network, only blue and red cells signal to gray and green cells. Thus, this formula for DTRES not only calculates the change in reporter because of signaling but also specifies which ligands the cell should respond to. We repeat this code for the blue and red cells, starting with the “if (cell.type==3 or cell.type==4):” line. However, the change in reporter, DTRES, is calculated only from PTSG, as only green cells can signal to blue or red cells (Fig. 1b). Once DTRES is calculated, the cell reporter level is updated in the PTS dictionary. Notice that the parameters differ per genetic circuit: gray and green cells have their own parameters, while blue and red cells have their own parameters (i.e., BETAYG vs. BETABR; see Note 18). With this, we complete the ligand-receptor interactions in our signaling network (Fig. 7b). If desired, additional cell types that respond to a ligand in the signaling network can be added in this #Changing Reporter as a Result of Signaling Changes section via similar lines.
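The reporter update can be explored outside CC3D to see how a fixed signal drives the reporter toward a plateau. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the parameter values are placeholders rather than the values in the #YG Signaling Parameters section, and the neighbor list is invented for illustration. Setting DTRES to zero in the formula gives the plateau level KAPPA/(ALPHA+exp(-(signal-BETA)/EPSILON)), which the loop should approach.

```python
import math


def ligand_exposure(neighbors):
    """Sum over neighbors of shared area x neighbor ligand / neighbor surface."""
    return sum(shared * ligand / surface for shared, ligand, surface in neighbors)


def reporter_delta(signal, level, alpha, beta, epsilon, kappa):
    """Change in reporter for one step, following the DTRES formula:
    a sigmoidal term in the signal minus a term proportional to the level."""
    production = 1 / (alpha + math.exp(-(signal - beta) / epsilon))
    return production - level / kappa


# Hypothetical placeholder parameters for one genetic circuit.
ALPHA, BETA, EPSILON, KAPPA = 1.0, 10.0, 2.0, 50.0

# (shared surface area, neighbor ligand level, neighbor surface) per neighbor.
neighbors = [(12.0, 100.0, 60.0), (8.0, 150.0, 60.0)]
signal = ligand_exposure(neighbors)

level = 0.0  # cells start with no reporter
for _ in range(2000):
    level += reporter_delta(signal, level, ALPHA, BETA, EPSILON, KAPPA)

steady = KAPPA / (ALPHA + math.exp(-(signal - BETA) / EPSILON))
print(signal, level, steady)
```

In this form, KAPPA sets both the plateau height and how quickly it is approached, while BETA and EPSILON set the signal level and steepness at which the response switches on.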


A
#Changing Reporter as a Result of Signaling Changes
if (cell.type==1 or cell.type==2):
    DTRES=(1/(ALPHAYG+math.exp(-((PTSR+PTSB)-BETAYG)/EPSILONYG)))-(1/KAPPAYG)*cell.dict["PTS"][0]
    cell.dict["PTS"][0]+=DTRES
if (cell.type==3 or cell.type==4):
    DTRES=(1/(ALPHABR+math.exp(-((PTSG)-BETABR)/EPSILONBR)))-(1/KAPPABR)*cell.dict["PTS"][0]
    cell.dict["PTS"][0]+=DTRES

Fig. 7 Specifying ligand-receptor signaling and calculating the change in reporter due to signaling. (a) The section #Changing Reporter as a Result of Signaling Changes specifies which cells receive signal from what cell types. In this block, cells of type 1 or type 2 (gray or green) receive signal (PTSB+PTSR) from cells of type 3 or type 4 (blue or red), as determined by the notched synNotch receptor and triangle ligand, respectively. A similar process occurs for cells of type 3 or type 4, except they only receive signaling from green cells, hence PTSG. This signal is then fed into the formula shown to calculate DTRES, the change in reporter at this timestep as a result of signaling. This change is then added to the “PTS” dictionary as it tracks reporter level. (b) This section completes the ligand-receptor signaling and links it to the reporter levels crucial for tracking change in adhesion

3.9 Adding Genetic Circuits to ISCL: Changing the State of the Cell as a Result of Reporter Accumulation

Our signaling network is nearly complete. We can finish it by adding the last piece, the state transition code. In our framework, cells transition discretely to the activated state (gray to green and blue to red), representing changing from weakly/nonadhesive to adhesive via their representative cadherins. They can only transition when they have enough reporter, as it is a proxy for tracking their adhesion protein levels. We code this in the section #Changing State as a Result of Reporter Changes (Fig. 8a). As gray cells can


A
#Changing State as a Result of Reporter Changes
if cell.type==1:
    if cell.dict["PTS"][0]>=THRESHOLDUPYG:
        cell.type=2
if cell.type==2:
    if cell.dict["PTS"][0] [...] =THRESHOLDUPBR:
        cell.type=4
if cell.type==4:
    if cell.dict["PTS"][0] [...]

[...] 0) or consumed ($S_{ij} < 0$) by reaction $R_j$ (Fig. 2a, b). In addition to the stoichiometric matrix, GSMMs define the relationship between genes, proteins, and reactions via logical rules known as the Gene-Protein-Reaction (GPR) map [14]. As illustrated in Fig. 1, the GPR map describes the link between enzyme-coding

Fig. 1 Illustration of two Gene-Protein-Reaction (GPR) maps in Escherichia coli. (a) The Sdh enzyme is built from 4 peptides and catalyses the two reactions SUCD4 and SUCD1i. (b) GAPD reaction is catalysed by two proteins (GapA and GapC); GapC is composed of two peptides encoded by distinct genes. (c) The resulting GPR map. Figure adapted from Cardoso et al. [14]


Lilli J. Freischem and Diego A. Oyarzún

Fig. 2 Formulation of an FBA problem. (a) A genome-scale metabolic network is reconstructed. (b) Metabolic reactions and constraints are mathematically represented. (c) A set of linear equations is defined by the mass balance (Sv = 0). (d) An objective function Z is defined. (e) Fluxes that maximise Z are calculated. Figure adapted from [7]

genes and metabolic reactions ($R_j$) in the GSMM [8]; this mapping is generally not one-to-one, because some reactions can be catalysed by several enzymes coded by different genes (which corresponds to an OR logical rule between genes) and others require enzyme complexes coded in different genes (which corresponds to an AND rule between genes).

2.2 Flux Balance Analysis

Flux Balance Analysis (FBA) is a widely adopted method for predicting metabolic phenotypes from GSMMs. In its most basic form, FBA assumes that organisms have evolved to optimise their metabolic activity and predicts the distribution of metabolic fluxes as the solution of a linear programming problem [7]. As illustrated in Fig. 2, FBA finds the solution vector $v^{*}$ to the following constrained optimisation problem:

$$
\begin{aligned}
\text{maximise:} \quad & Z = c^{\top} v \\
\text{subject to:} \quad & S v = 0 \\
& v_{\mathrm{lb}} \le v \le v_{\mathrm{ub}},
\end{aligned}
\tag{2}
$$

where c is a vector of flux weights which encodes the cell objective function and vlb and vub are lower and upper bounds on reaction fluxes, respectively (Fig. 2e). The expression Sv = 0 describes the mass balance of metabolic reactions and assumes that intracellular metabolism is in steady state.
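The linear program in Eq. (2) can be illustrated with a minimal sketch. It assumes SciPy's general-purpose `linprog` solver and a hypothetical three-reaction toy network; the function name `fba` and all numerical values are illustrative and are not taken from the chapter.

```python
import numpy as np
from scipy.optimize import linprog


def fba(S, c, v_lb, v_ub):
    """Solve max c^T v subject to S v = 0 and v_lb <= v <= v_ub (Eq. 2)."""
    S = np.asarray(S, dtype=float)
    c = np.asarray(c, dtype=float)
    result = linprog(
        -c,                               # linprog minimises, so negate the objective
        A_eq=S,
        b_eq=np.zeros(S.shape[0]),        # steady-state mass balance, S v = 0
        bounds=list(zip(v_lb, v_ub)),     # flux bounds per reaction
        method="highs",
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.x, float(c @ result.x)


# Hypothetical toy network (rows: metabolites A, B; columns: reactions R1..R3)
# R1: -> A (uptake), R2: A -> B, R3: B -> (biomass)
S = np.array([[1, -1, 0],
              [0, 1, -1]])
c = [0, 0, 1]                 # objective: flux through the biomass reaction R3
v_lb = [0, 0, 0]
v_ub = [10, 1000, 1000]       # uptake is capped at 10 flux units

v_opt, Z = fba(S, c, v_lb, v_ub)
print("optimal fluxes:", v_opt)
print("objective value Z:", Z)
```

In this toy case the uptake bound is the only active constraint, so the optimal biomass flux equals the uptake limit. Genome-scale models are normally solved with dedicated constraint-based modelling software rather than a hand-built matrix.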

A Machine Learning Approach for Predicting Essentiality of Metabolic Genes


For microbial systems, a common FBA objective function is growth rate maximisation, whereby the weights in the vector c represent the relative contribution of macromolecular components such as lipids, carbohydrates, or proteins to biomass production. A key advantage of FBA is that it allows various modalities of 'omics data to be incorporated into the models, for example, by scaling the flux bounds v_lb,j and v_ub,j with data on protein abundance [15]. A common application of FBA is the simulation of different environmental conditions and genetic perturbations. Next we briefly explain how FBA can be employed to predict metabolic phenotypes in such scenarios.

2.3 Prediction of Gene Essentiality with FBA

A gene is defined as essential if its deletion impairs cell survival. The identification of essential genes is typically performed with large growth assays designed to test the impact of deletions in multiple growth conditions [1, 16]. In the case of metabolic genes, i.e., genes that code for metabolic enzymes, flux balance analysis has demonstrated an excellent ability to predict essentiality in microbes, particularly for those that have well-curated GSMMs. For example, the latest and most comprehensive models for the E. coli bacterium can predict gene essentiality with accuracy over 90% across several growth conditions [8]. Predicting gene essentiality with FBA is done via manipulation of the reaction flux bounds in Eq. (2). For example, deletion of the gene i encoding for reaction v_i can be simulated by setting v_lb,i = v_ub,i = 0 and then solving the growth maximisation problem. The fitness effect of deleting gene i is then:

\[
e_i = 1 - \frac{g_i}{g_{WT}}, \tag{3}
\]

where g_WT and g_i are the FBA-predicted maximal growth rate of the wild type and deletion strain, respectively. In practice, most fitness predictions are close to 0 or 1, so typically the fitness effect is binarized [17]:

\[
y_i = \begin{cases} 0, & \text{if } e_i < 0.5, \\ 1, & \text{otherwise.} \end{cases} \tag{4}
\]

For more complex reactions, such as those that can be catalysed by several enzymes or those that require enzyme complexes as in Fig. 1, we first need to map the impact of the gene deletion onto the space of reactions through the GPR map. Overall, the above procedure can be looped over all genes in the GSMM and thus produce genome-scale predictions of essentiality. Moreover, we are often interested in condition-dependent essentiality, for example under specific growth conditions; these can be accounted for by manipulating the bounds of uptake reactions to match the composition of the growth medium.
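The deletion procedure of Eqs. (3) and (4), including the GPR lookup, can be sketched as follows. This is a minimal illustration under stated assumptions: the toy network, the gene names `g1` to `g5`, and the representation of each GPR rule as a list of gene sets (OR across sets, AND within a set) are hypothetical, and SciPy's `linprog` stands in for a dedicated FBA solver.

```python
import numpy as np
from scipy.optimize import linprog


def max_growth(S, c, bounds):
    """FBA-predicted maximal growth; 0.0 if the problem is infeasible."""
    res = linprog(-np.asarray(c, dtype=float), A_eq=S,
                  b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    return float(-res.fun) if res.success else 0.0


def reaction_active(complexes, deleted_gene):
    """OR over alternative enzymes, AND over the genes within each enzyme."""
    if not complexes:                       # no gene association: never deactivated
        return True
    return any(deleted_gene not in genes for genes in complexes)


def essentiality_labels(S, c, bounds, gpr, threshold=0.5):
    """Loop single-gene deletions and binarise the fitness effect."""
    g_wt = max_growth(S, c, bounds)
    all_genes = sorted({g for complexes in gpr for genes in complexes for g in genes})
    labels = {}
    for gene in all_genes:
        # Set lb = ub = 0 for every reaction the deletion deactivates
        ko_bounds = [b if reaction_active(complexes, gene) else (0.0, 0.0)
                     for b, complexes in zip(bounds, gpr)]
        e = 1.0 - max_growth(S, c, ko_bounds) / g_wt      # Eq. (3)
        labels[gene] = int(e >= threshold)                # Eq. (4)
    return labels


# Hypothetical toy network: R1: -> A, R2: A -> B, R3: B -> (biomass)
S = np.array([[1, -1, 0],
              [0, 1, -1]])
c = [0, 0, 1]
bounds = [(0, 10), (0, 1000), (0, 1000)]
gpr = [
    [{"g1"}],            # R1: single gene
    [{"g2"}, {"g3"}],    # R2: isozymes, g2 OR g3
    [{"g4", "g5"}],      # R3: enzyme complex, g4 AND g5
]

labels = essentiality_labels(S, c, bounds, gpr)
print(labels)
```

The isozyme genes come out as non-essential because either one alone keeps R2 active, whereas losing any subunit of the complex blocks R3 and abolishes growth.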


Fig. 3 Construction of mass flow graphs. Mass flow graphs (MFG) can be directly built from the stoichiometric matrix and a flux vector [21]. The MFG is a directed graph with reactions as nodes, and edge weights corresponding to the total mass flow between two reactions, defined by the adjacency matrix in Eq. (7)

2.4 Mass Flow Graphs

A number of studies have employed tools from network science to study the connectivity of metabolic networks [18, 19]. In this representation, metabolism is described by a graph with nodes and edges that model the connectivity between metabolites and enzymes [20]. There are many ways to represent cellular metabolism as a graph, depending on how the nodes and edges are defined. For example, one can define metabolic graphs with metabolites as nodes, reactions as nodes, or both metabolites and reactions as nodes in a hypergraph. In this chapter, we focus on the second type of graphs: reaction-centric graphs, where nodes are assumed to be reactions. These graphs have several advantages and, as we shall see later, allow data from gene knockout screens to be incorporated into predictive machine learning algorithms. Our approach is based on a graph construction first proposed by Beguerisse-Díaz and colleagues [21], termed "mass flow graphs" (MFGs). These are flux-based, weighted graphs that can be directly computed from flux solutions obtained with FBA. The nodes in an MFG are reactions, and two nodes share an edge if they share metabolites either as reactants or products. To construct an MFG, we define the weight of the connection between reactions R_i and R_j as the total flux of metabolites produced by R_i that are consumed by R_j (Fig. 3). The MFG can be directly constructed from the stoichiometric matrix S and any FBA solution vector v*. By incorporating the FBA solution into the graph structure, MFGs account for environmental conditions and genetic perturbations [21]. Directed graphs can be specified in terms of their adjacency matrix A, the entries of which define the connectivity between nodes. For example, if there is a connection from node i to node j with weight w_ij, then the corresponding entry of the adjacency matrix is A_ij = w_ij. To construct the adjacency matrix of an MFG, we first unfold the flux vector v* into 2m forward and reverse reactions:

\[
v^{*}_{2m} = \begin{bmatrix} v^{*+} \\ v^{*-} \end{bmatrix} = \frac{1}{2} \begin{bmatrix} \operatorname{abs}(v^{*}) + v^{*} \\ \operatorname{abs}(v^{*}) - v^{*} \end{bmatrix}. \tag{5}
\]

Similarly, the stoichiometric matrix S is unfolded as:

\[
S_{2m} = \begin{bmatrix} S & -S \end{bmatrix} \begin{bmatrix} I_m & 0 \\ 0 & \operatorname{diag}(r) \end{bmatrix},
\]

where r is the m-dimensional reversibility vector such that

\[
r_j = \begin{cases} 1, & \text{if reaction } j \text{ is reversible,} \\ 0, & \text{otherwise.} \end{cases} \tag{6}
\]

The adjacency matrix of the MFG is then given by:

\[
M(v^{*}) = \left( S^{+}_{2m} V^{*} \right)^{\top} J_v^{\dagger} \left( S^{-}_{2m} V^{*} \right), \tag{7}
\]

where † denotes the matrix pseudoinverse, and the matrices are defined as V* = diag(v*_2m), J_v = diag(S^+_2m v*_2m), with

\[
S^{+}_{2m} = \frac{1}{2} \left( \operatorname{abs}(S_{2m}) + S_{2m} \right), \tag{8}
\]

\[
S^{-}_{2m} = \frac{1}{2} \left( \operatorname{abs}(S_{2m}) - S_{2m} \right). \tag{9}
\]

As explained in detail in [21], the edge weights in the adjacency matrix M(v*) have units of mass per unit of time and represent the strength of connectivity between reactions in terms of the chemical mass flow they share.
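The construction in Eqs. (5) to (9) can be written in a few lines of NumPy. The sketch below is a direct transcription of those equations for a hypothetical three-reaction chain; the function name `mass_flow_graph` is an assumption made for illustration and does not reproduce the interface of the MFGpy package.

```python
import numpy as np


def mass_flow_graph(S, v, r):
    """Adjacency matrix M(v*) of the mass flow graph, following Eqs. (5)-(9)."""
    S = np.asarray(S, dtype=float)
    v = np.asarray(v, dtype=float)
    r = np.asarray(r, dtype=float)
    m = S.shape[1]

    # Eq. (5): unfold fluxes into forward and reverse parts
    v2m = 0.5 * np.concatenate([np.abs(v) + v, np.abs(v) - v])

    # Unfolded stoichiometric matrix, with reverse columns kept only if reversible
    unfold = np.block([[np.eye(m), np.zeros((m, m))],
                       [np.zeros((m, m)), np.diag(r)]])
    S2m = np.hstack([S, -S]) @ unfold

    # Eqs. (8)-(9): production and consumption stoichiometries
    S_plus = 0.5 * (np.abs(S2m) + S2m)
    S_minus = 0.5 * (np.abs(S2m) - S2m)

    V = np.diag(v2m)
    J = np.diag(S_plus @ v2m)          # total production flux of each metabolite

    # Eq. (7)
    return (S_plus @ V).T @ np.linalg.pinv(J) @ (S_minus @ V)


# Hypothetical chain: R1: -> A, R2: A -> B, R3: B -> ; all irreversible
S = np.array([[1, -1, 0],
              [0, 1, -1]])
v_star = [10.0, 10.0, 10.0]
r = [0, 0, 0]

M = mass_flow_graph(S, v_star, r)
print(M.shape)
print("R1 -> R2:", M[0, 1], " R2 -> R3:", M[1, 2])
```

The result is a 2m × 2m matrix whose first m rows and columns refer to forward reactions and the remaining ones to reverse reactions. In the toy chain the only edges are R1 → R2 and R2 → R3, each carrying the full flux of 10 units.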

3 Binary Classification

Supervised machine learning includes a wide range of tools for building predictive models from data [22]. A central problem in supervised learning is binary classification, where the goal is to extract patterns from observations of two classes of samples and use these patterns to automatically determine the class of new samples. In our case, we are interested in building binary classifiers that determine whether a gene is essential or non-essential. More specifically, assume we have N pairs:

\[
\left( x^{(1)}, y^{(1)} \right), \left( x^{(2)}, y^{(2)} \right), \ldots, \left( x^{(N)}, y^{(N)} \right), \tag{10}
\]

where x^(i) ∈ ℝ^p is a p-dimensional vector of features associated with the ith gene, and y^(i) ∈ {0, 1} is the label or class of the ith gene. We denote non-essential and essential genes as the negative class (0) and positive class (1), respectively.


The feature vectors and labels are assembled into a feature matrix X and a vector of class labels y:

\[
X = \left[ x^{(1)}, x^{(2)}, \ldots, x^{(N)} \right]^{\top}, \qquad y = \left[ y^{(1)}, y^{(2)}, \ldots, y^{(N)} \right]^{\top}, \tag{11}
\]

that are employed to train a classification algorithm to learn the patterns in X required to predict the labels in y. Once trained, the algorithm can automatically assign labels to new samples on the basis of their feature vectors. Given the feature vector of a new sample x̂, the binary classifier predicts the corresponding class label ŷ based on the learned classification rules. There exists a huge number of algorithms for binary classification, and these typically differ in their assumptions on the shape of the feature space [22, 23]. The choice of a specific algorithm depends on the characteristics of the data and the task at hand. Model training is usually accompanied by cross-validation analyses to perform model selection and prevent overfitting to the training data. For a comprehensive review on machine learning models aimed at a biology audience, we refer the reader to the excellent review by Greener and colleagues [24]. In this chapter, we explore the suitability of four standard classification algorithms for prediction of gene essentiality using the connectivity of the MFG as features:

Random Forest (RF): An ensemble method based on a collection of decision trees that are trained on random subsets of data. Each decision tree classifies samples based on orthogonal splits of the feature space.

Logistic Regression: Applies a logistic sigmoid function to a linear combination of the features to model a binary output variable [25]. The resulting outputs correspond to the probability of a sample belonging to each class.

Multilayer Perceptron (MLP): Uses a feedforward neural network with multiple layers to learn classification rules; MLPs are universal function approximators and are thus able to learn non-linear models [26].

c-Support Vector Machine (SVC): Splits the feature space by finding a hyperplane that optimally separates the two classes [26]. SVCs aim to maximise the margin that separates the classes, which reduces the risk of misclassification.
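To make the comparison of these four algorithms concrete, the following sketch trains each of them with scikit-learn and scores them by stratified 5-fold cross-validation. It is only an illustration: the data are synthetic (generated with `make_classification` with a 25%/75% class split), the library defaults are assumed for all hyperparameters except iteration limits, and the macro-averaged F1 score is used as the metric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for a feature matrix X and labels y
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           weights=[0.25, 0.75], random_state=0)

classifiers = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Multilayer Perceptron": MLPClassifier(max_iter=1000, random_state=0),
    "c-Support Vector Machine": SVC(),
}

# Stratified folds keep the class ratio similar in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = {}
for name, clf in classifiers.items():
    scores[name] = cross_val_score(clf, X, y, cv=cv, scoring="f1_macro")
    print(f"{name}: {scores[name].mean():.3f} +/- {scores[name].std():.3f}")
```

Reporting the mean and standard deviation across folds mirrors the way cross-validation results are usually tabulated; the numbers obtained on synthetic data carry no meaning for the E. coli dataset.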

4 Gene Essentiality Data for Model Training

To train a binary classifier, we employed the Escherichia coli knockout data from Monk et al. [8] as well as iML1515, the genome-scale metabolic model of E. coli developed in that same work. We computed the MFG of E. coli in the conditions matching the essentiality studies and extracted reaction-node features. Then, we translated gene essentiality measurements to reaction essentiality. The details of how we collected and labelled the dataset are discussed in this section.

4.1 Mass Flow Graph of iML1515

We chose E. coli as the test organism for our method because it has been the focus of extensive gene essentiality studies, so the required essentiality data are available. Monk et al. [8] provide iML1515, the metabolic reconstruction of E. coli K-12 MG1655. Alongside the model, they provide in vitro gene essentiality measurements obtained from studying E. coli cells of the K-12 BW25113 strain. For our experiments, we used cell measurements with glucose as primary carbon source. To train machine learning models to predict reaction essentiality labels inferred from gene essentiality measurements (explained in Subheading 4.3), the computational E. coli model must match the cells that were investigated experimentally. We thus adjusted the constraints on iML1515 accordingly. K-12 BW25113 lacks several genes that are present in K-12 MG1655: araBAD, rhaBAD, and lacZ [8], so we set the upper and lower flux bounds of the associated reactions to zero. The cells were growing aerobically, so we adjusted the oxygen exchange reaction bounds to simulate aerobic growth. Glucose was used as primary carbon source and, hence, the glucose uptake reaction bounds were adjusted appropriately. The updated bounds are summarised in Table 1. We calculated the FBA solution vector of the resulting model. Then, we used MFGpy, a Python package for the automatic computation of mass flow graphs [27], to compute the corresponding MFG, which contains 444 reactions (Fig. 4).

4.2 Feature Extraction

We obtain reaction features from the MFG adjacency matrix M, as defined in Eq. (7). Reactions that are not in the MFG do not contribute to the optimal FBA solution and thus are non-essential.1 Their corresponding rows and columns in M only contain zeros. Hence, they are removed from M for classification. Mk denotes the resulting k × k matrix where k is the number of nodes in the MFG.

1 Knocking out a reaction forces its flux to be zero. Reactions that are not in the MFG already have zero flux in the wild-type cell, so their knockout does not impact cell growth, meaning they are, by definition, non-essential.


Table 1 The adjusted lower bounds (lb) and upper bounds (ub) on reaction flux in iML1515 to simulate aerobic cell growth using glucose as primary carbon source

| Reactions | Adjusted bounds |
|---|---|
| L-arabinose isomerase (ARAI), L-ribulokinase (RBK_L1), Rhamnulose-1-phosphate aldolase (RMPA), Lyxose isomerase (LYXI), L-rhamnose isomerase (RMI), Rhamnulokinase (RMK), and β-galactosidase (LACZ) | lb = ub = 0 |
| Oxygen exchange (EX_o2_e) | lb = -20 |
| Glucose uptake (EX_glc__D_e) | lb = -10 |

The rows in M_k correspond to the outgoing edges and the columns to the incoming edges of each node. To consider both incoming and outgoing fluxes for each reaction, we concatenate M_k and its transpose. We obtain the k × 2k node-feature matrix:

\[
X = \begin{bmatrix} M_k & M_k^{\top} \end{bmatrix}, \tag{12}
\]

where row j of X contains first the outgoing and then the incoming edges of reaction R_j.

4.3 Essentiality Labels

Monk et al. [8] provide gene essentiality measurements for E. coli K-12 BW25113 in different environmental conditions. As MFGs have reactions as nodes, we had to translate gene essentiality to reaction essentiality using the GPR map included in the iML1515 model. The GPR map provides a formal connection that links genes to their corresponding proteins and the reactions they catalyse. Hence, it enables mapping data from the gene space to the reaction space. iML1515 provides the most up-to-date set of characterised genes and metabolic reactions for E. coli. It contains 1516 genes and 2712 reactions; essentiality measurements are available for 1502 of the genes [8].

4.3.1 Algorithm for Gene to Reaction Essentiality Mapping

We analysed the GPR map of iML1515 to find reaction knockouts that have a one-to-one mapping to a gene knockout under the following assumption:

Assumption 1 If the knockout of gene G deactivates exactly one reaction R_j, the essentiality of R_j corresponds to the measured essentiality of G: e_G = e_{R_j}.


Fig. 4 Mass flow graph of Escherichia coli under aerobic growth with glucose as sole carbon source. The graph contains k = 444 reaction nodes and 14,459 edges and was computed from the most complete genome-scale reconstruction (iML1515) by Monk et al. [8]

This assumption formalises the following intuition: a gene knockout that deactivates a single reaction whilst all other reactions remain active is equivalent to the knockout of that single reaction. Therefore, the cell growth rate after the knockout of G is equivalent to the growth rate after the knockout of R_j, that is, g_G = g_{R_j}. As the essentiality of a gene or reaction U is defined as e_U = 1 - g_U / g_WT, we have:

\[
e_G = 1 - \frac{g_G}{g_{WT}} = 1 - \frac{g_{R_j}}{g_{WT}} = e_{R_j}. \tag{13}
\]

The set of reactions with a one-to-one mapping to a gene knockout was computed from the GPR map. Out of the 444 reactions in the MFG of iML1515, only 155 reactions had a one-to-one knockout mapping to a gene (Fig. 5). We thus obtained essentiality labels for 155 reactions when using one-to-one


Fig. 5 Reaction and gene counts in iML1515. From top to bottom, we show the total number of genes, the total number of reactions, the number of reactions which are deactivated by single-gene knockouts that do not deactivate any other reactions (“1-to-1”) and that deactivate one or more reactions (“1-to-X”)

mappings only. To maximise the number of labelled reactions, we made the following second assumption:

Assumption 2 The essentiality of reaction R_j corresponds to the measured essentiality of gene G if the knockout of G deactivates R_j. This holds even if there exists a reaction R_l ≠ R_j which is also deactivated by the knockout of G.

With this assumption, we could increase the number of essentiality labels obtained from the measurement data for reactions in the MFG to 255 (Fig. 5). We judged that the benefit of this assumption (increasing the size of our dataset by a factor of 1.65) outweighs the potential approximation error it introduces. The assumption is not entirely accurate and can lead to labelling too many reactions as essential: (i) a reaction that is not essential but is knocked out by the same single-gene knockout as a different, essential reaction, or (ii) a set of reactions which are not essential individually but are essential as a group and are knocked out by the same single-gene knockout. However, since we can only up/downregulate metabolic genes and are unable to target individual reactions, the definition of individual essentiality of reactions that can only be deactivated as a group is unclear. Reactions that cannot be deactivated by the knockout of a single gene were excluded from model training because their essentiality labels cannot be inferred from the available growth data.

We obtain a small dataset of 255 reactions in the MFG with measured essentiality labels. The limited size is caused by the number of reactions in the MFG (only 444) as well as the number of reactions whose essentiality cannot be measured via single-gene knockouts. Nonetheless, this dataset is sufficient for our experiments, which act as a proof of concept. Pseudocode for the algorithm is included in Appendix 1.
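The two labelling assumptions can also be illustrated with a short Python sketch. It is not the pseudocode of Appendix 1: the data structures, the function names, the toy GPR map, and the essentiality values are all hypothetical. Each GPR rule is assumed to be a list of gene sets (OR across sets, AND within a set), and a reaction that is hit by several single-gene knockouts simply collects all of their labels, leaving any tie-breaking to the user.

```python
def deactivated_reactions(gene, gpr_map):
    """Reactions that lose every catalysing enzyme when `gene` is knocked out."""
    return {rxn for rxn, complexes in gpr_map.items()
            if complexes and all(gene in genes for genes in complexes)}


def map_gene_to_reaction_labels(gpr_map, gene_essentiality, one_to_one_only=False):
    """Transfer measured gene essentiality to reactions.

    one_to_one_only=True  -> Assumption 1 (gene deactivates exactly one reaction)
    one_to_one_only=False -> Assumption 2 (gene deactivates one or more reactions)
    """
    labels = {}
    for gene, essentiality in gene_essentiality.items():
        hit = deactivated_reactions(gene, gpr_map)
        if one_to_one_only and len(hit) != 1:
            continue
        for rxn in hit:
            labels.setdefault(rxn, {})[gene] = essentiality
    return labels


# Hypothetical toy GPR map and illustrative essentiality values
gpr_map = {
    "R1": [{"gA"}],              # single gene
    "R2": [{"gB", "gC"}],        # complex shared with R3
    "R3": [{"gB", "gC"}],
    "R4": [{"gD"}, {"gE"}],      # isozymes: no single knockout deactivates R4
    "R5": [],                    # no gene association
}
gene_essentiality = {"gA": 1.0, "gB": 0.0, "gC": 0.0, "gD": 1.0, "gE": 0.0}

one_to_one = map_gene_to_reaction_labels(gpr_map, gene_essentiality, one_to_one_only=True)
one_to_many = map_gene_to_reaction_labels(gpr_map, gene_essentiality)
print(one_to_one)
print(one_to_many)
```

In the toy example only R1 is labelled under the one-to-one rule, while the relaxed rule also labels R2 and R3. R4 and R5 stay unlabelled in both cases, which corresponds to the reactions excluded from training.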

5 Binary Classifiers Trained on Mass Flow Graphs

We train binary classifiers to predict reaction essentiality labels from MFG reaction node features. An overview of the machine learning pipeline is depicted in Fig. 6. We first extract node features from the MFG (see Subheading 4.2) and apply preprocessing steps. The resulting feature matrix is provided as input to classification algorithms, together with the essentiality class labels computed from gene essentiality measurements.
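The first two pipeline steps can be sketched in NumPy as follows: feature extraction as in Eq. (12), followed by the sparsity-preserving scaling of Subheading 5.1, in which columns are divided by their training-set standard deviation without subtracting the mean. The function names and the small adjacency matrix are hypothetical, and leaving constant columns unscaled is an assumption of this sketch.

```python
import numpy as np


def node_features(M):
    """Drop nodes without edges and build X = [M_k, M_k^T] (Eq. 12)."""
    M = np.asarray(M, dtype=float)
    in_graph = np.any(M != 0, axis=0) | np.any(M != 0, axis=1)
    M_k = M[np.ix_(in_graph, in_graph)]
    return np.hstack([M_k, M_k.T]), np.flatnonzero(in_graph)


def scale_without_centring(X_train, X_other):
    """Scale columns to unit standard deviation; zeros stay zeros."""
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0          # assumption: leave constant columns unscaled
    return X_train / sigma, X_other / sigma


# Hypothetical 4-node adjacency matrix; node 3 has no edges
M = np.array([[0, 2, 0, 0],
              [0, 0, 3, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0]])

X, kept = node_features(M)
X_scaled, _ = scale_without_centring(X, X)
print(kept)       # indices of the nodes that remain in the graph
print(X)          # row j: outgoing edges of node j, then incoming edges
print(X_scaled)
```

Because no mean is subtracted, every zero entry of X remains zero after scaling, so the sparsity pattern that encodes the graph connectivity is unchanged.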

5.1 Data Standardisation

Classification models, especially distance-based classifiers, such as SVCs, are negatively impacted by differently scaled dimensions in the data. They aim to maximise the distance between the separating plane and the support vectors; if features have different scales, features with large values will dominate over smaller features when calculating the distance. Prior to training such models, the dataset thus has to be standardised. Additionally, standardisation is a prerequisite for using dimensionality reduction techniques such as PCA. Tree-based models, however, do not require standardisation as they search for partition rules that best split the feature space which is independent of feature scaling. Typically, features are standardised by subtracting the feature mean and subsequently dividing by the feature standard deviation. However, our feature matrix is sparse; only 7% of the entries in the adjacency feature matrix of the iML1515 MFG are nonzero. Subtracting the feature means would break its sparsity structure and is thus omitted [28]. Hence, features are scaled to unit standard deviation only and are computed as:

Fig. 6 Training binary classification algorithms on mass flow graphs. The node feature matrix X is computed from the adjacency matrix of a mass flow graph M. Binary classifiers are trained using X and measured essentiality labels y. For hyperparameter optimization, different performance metrics were employed to account for the imbalanced number of essential and non-essential genes


\[
\tilde{x}_{ij,(\mathrm{sparse})} = \frac{x_{ij}}{\sigma_j}, \tag{14}
\]

where x_ij is the entry in the ith row and the jth column of X, and σ_j is the standard deviation of column j in the training set; the column mean μ_j is deliberately not subtracted.

5.2 Dimensionality Reduction

The adjacency feature matrix X is of size k × 2k and thus potentially of high dimension. The number of trainable parameters in classification algorithms increases with the dimensionality of the input data. To avoid training unnecessarily complex models, we attempt to reduce the dimensionality of the input features. Each column has at least one nonzero entry, so we are not able to simply remove columns from the input matrix. We, therefore, investigate the effects of adding dimensionality reduction, more specifically PCA, to the machine learning pipeline. The benefit would be a significant reduction in problem size, which would be especially useful because of the small amount of data available for model training. PCA is a method for linear dimensionality reduction that minimises information loss and increases the interpretability of the data [29]. To compute the principal components of X we first compute its covariance matrix A as:

\[
A = \operatorname{cov}(X, X) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top}. \tag{15}
\]

The principal components are the eigenvectors of this covariance matrix, so they are the vectors v that solve the following equation:

\[
A v = \lambda v, \tag{16}
\]

where λ is the corresponding eigenvalue. The eigenvectors are sorted by their eigenvalues, as they correspond to the amount of variation that samples show along the direction of the eigenvector. To reduce the dimensionality of X to q, we select the eigenvectors with the q largest eigenvalues. Using PCA to reduce the dimensionality of X, we obtain X_PCAq = PCA_q(X), where q denotes the number of principal components that are kept and, therefore, the dimensionality of X_PCAq. The value of q has to be chosen manually, as a trade-off between minimising dimensionality and minimising information loss [29].

5.3 Evaluating Classification Performance

A binary classifier labels samples as either positive or negative. Our samples are genes which are labelled as essential (positive, class label 1) or non-essential (negative, class label 0). The decisions made by the classifier fall into one of the following four categories:


1. True Positives (TP): essential reactions correctly labelled as essential.
2. False Positives (FP): non-essential reactions incorrectly labelled as essential.
3. True Negatives (TN): non-essential reactions correctly labelled as non-essential.
4. False Negatives (FN): essential reactions incorrectly labelled as non-essential.

Performance metrics need to be defined to assess classification performance. Our dataset contains an unbalanced number of essential and non-essential reactions (75%/25% split). The overall classification accuracy alone, therefore, does not contain sufficient information to evaluate classifiers. Instead, performance was quantified using accuracy, precision, recall (true positive rate, TPR), specificity (true negative rate, TNR) and the F1 score:

\[
\begin{aligned}
\text{accuracy} &= (TP + TN)/(TP + TN + FP + FN), \\
\text{precision} &= TP/(TP + FP), \\
\text{recall (TPR)} &= TP/(TP + FN), \\
\text{specificity (TNR)} &= TN/(TN + FP), \\
\text{F1-score} &= \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}.
\end{aligned}
\tag{17}
\]

In addition, we compute the macro-averaged F1 (Macro-F1) score which is the arithmetic mean of the F1 scores of the positive and negative class. Therefore, it prevents neglecting the classification performance on non-essential reactions (the minority class). Because of the significant imbalance of non-essential to essential reactions, we employed the normalised confusion matrix to evaluate classification performance (Fig. 7). It contains the entries of a standard confusion matrix (TP, FP, TN, FN) normalised by class size, so its rows sum to 1. This makes it easier to compare performance on the two classes.
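The metrics above, including the Macro-F1 score and the row-normalised confusion matrix, can be computed with plain Python as in the sketch below. The function names and the eight example labels are illustrative only; returning 0 when a denominator is zero is an assumption of this sketch.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) with `positive` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, tn, fn


def safe_div(a, b):
    return a / b if b else 0.0       # assumption: undefined ratios count as 0


def f1(y_true, y_pred, positive=1):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision, recall = safe_div(tp, tp + fp), safe_div(tp, tp + fn)
    return safe_div(2 * precision * recall, precision + recall)


def evaluate(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "f1": f1(y_true, y_pred, positive=1),
        # Macro-F1: mean of the F1 scores of the positive and negative class
        "macro_f1": 0.5 * (f1(y_true, y_pred, 1) + f1(y_true, y_pred, 0)),
        # Rows: true class (0, 1); columns: predicted class (0, 1); rows sum to 1
        "normalised_confusion": [
            [safe_div(tn, tn + fp), safe_div(fp, tn + fp)],
            [safe_div(fn, tp + fn), safe_div(tp, tp + fn)],
        ],
    }


# Illustrative labels: 6 essential (1) and 2 non-essential (0) reactions
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 1, 0]
report = evaluate(y_true, y_pred)
for name, value in report.items():
    print(name, value)
```

In this example the accuracy is 0.75 and the F1 score is about 0.83, whereas the Macro-F1 score drops to about 0.67 because only half of the non-essential reactions are recovered. This is the effect that motivates using Macro-F1 on imbalanced data.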

Fig. 7 Normalised confusion matrix to assess the performance of a binary classifier


Fig. 8 Measured essentiality of all reactions in the MFG of the K-12 BW25113 E. coli strain. The MFG was computed using the iML1515 model

Table 2 Baseline cross-validation performance of different classifiers on the feature matrix X

| Classifier | Accuracy | F1 score | Macro-F1 score |
|---|---|---|---|
| Random Forest | 76.0 ± 4.2% | 85.3 ± 2.7% | 58.3 ± 8.2% |
| Logistic Regression | 74.0 ± 1.9% | 85.0 ± 1.2% | 44.2 ± 3.5% |
| Multi-Layer Perceptron | 76.5 ± 1.9% | 86.2 ± 1.0% | 52.0 ± 8.3% |
| c-Support Vector Machine | 74.5 ± 2.3% | 85.3 ± 1.5% | 44.5 ± 4.1% |

All models were trained on a fixed fraction of the k connected nodes of the MFG, and we held out a subset of the graph nodes to test model performance on unseen data. The dataset was split using stratified sampling to account for the imbalance between the number of essential and non-essential reactions. To compare different models, feature sets, and hyperparameters, we used stratified 5-fold cross-validation.

5.4 Application to Escherichia coli Metabolic Network

We conducted a first analysis of the essentiality distribution in the dataset. As shown in Fig. 8, around 75% of labelled reactions in the MFG are essential. Only 23 reactions (9%) have an essentiality between 0.1 and 0.9. This supports the claim that essentiality prediction can be approached as a binary classification problem.

5.4.1 Baseline Classification Models

Our baseline models were trained on the adjacency features as derived in Eq. (12). The classifier with the best cross-validation Macro-F1 score of 58.3% was a Random Forest; this is our baseline model for further evaluations (Table 2).


Table 3 The hyperparameter search space used for optimising the Random Forest

Random Forest Hyperparameters

| Hyperparameter | Search space |
|---|---|
| Number of estimators | randint(100, 500) |
| Maximum depth of the tree | randint(10, 200) |
| Minimum number of samples per leaf | randint(1, 10) |
| Criterion | Gini or entropy |
| Max features | sqrt or log2 |

5.4.2 Hyperparameter Tuning

Next, we employed hyperparameter optimisation. To reduce the computational cost of this process, we used the hyperopt package [30], which employs Bayesian optimisation to automatically choose optimal hyperparameters. This method produces a reproducible and unbiased optimisation process and yields a significant reduction in computation time. Selecting hyperparameters through Bayesian optimisation requires the definition of a measure of prediction quality. As explained previously, accuracy is not a good measure of classification skill due to the class imbalance in our dataset. Empirically, we found that the hyperparameter settings optimised for accuracy produced unskilled classifiers which naively label all reactions as essential. Instead, we used the Macro-F1 score, which prevents choosing models without classification skill for non-essential reactions by assigning equal weight to precision and recall on non-essential and essential reactions. The model with the highest Macro-F1 score on standardised adjacency features was a Random Forest. We used hyperopt to find optimal hyperparameters within the search space shown in Table 3. The optimised model contains 300 trees, with a maximum depth of 50, using information gain as criterion, and considering log2(2k) features when looking for the best split. It had a cross-validation Macro-F1 score of 64.8%, with a precision of 82% and a recall of 86%. As shown in Table 4, the Macro-F1 score improved by 5.3% compared to the model with default hyperparameters; however, the accuracy and F1 score are slightly lower (by 1.1% and 1.8%, respectively). Whilst the model is better at correctly classifying non-essential reactions, it makes more mistakes in its predictions of essential reactions.

5.4.3 Dimensionality Reduction using Principal Component Analysis

The feature vectors in X are 888-dimensional because the MFG contains 444 nodes; the dimensionality of our data is over three times the size of our dataset. Therefore, we explored if PCA can reduce the problem size and improve classification performance.


Table 4 Performance of the Random Forest with largest Macro-F1 score on standardised adjacency features found in hyperparameter tuning

| Classifier | Accuracy | F1 score | Macro-F1 score |
|---|---|---|---|
| RF | 74.9 ± 8.6% | 83.4 ± 6.5% | 64.8 ± 10.8% |


Fig. 9 Principal component analysis of XM . (a) A cumulative plot of the explained variance of the principal components. (b) Random Forest cross-validation performance across varying number of principal components

The use of principal components as input features was found to be detrimental for performance. The resulting models could not reach cross-validation accuracies beyond 70% and the Macro-F1 scores were around 50% and lower (Fig. 9b). An explanation can be derived from Fig. 9a. The first 20 principal components explain less than 50% of the variance of the feature matrix, and we need 140 principal components to reach 90% of explained variance. Hence, a significant dimensionality reduction causes loss of information and prevents accurate classification of samples. Furthermore, even with 160 principal components, classification performance is significantly worse than on the original features. This indicates that the sparsity of the adjacency matrix which encodes the graph’s connectivity is key for accurate prediction. Based on these results, we decided not to apply dimensionality reduction in further experiments.
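The explained-variance analysis behind this conclusion can be reproduced in outline with NumPy. The sketch below follows the covariance and eigenvector formulation of Subheading 5.2 on synthetic data; the function name `pca`, the random data, and the 90% threshold are used purely for illustration.

```python
import numpy as np


def pca(X, q):
    """Project X onto its first q principal components.

    Returns the projected data and the cumulative explained-variance ratio.
    """
    X_centred = X - X.mean(axis=0)
    A = np.cov(X_centred, rowvar=False)             # covariance matrix, 1/(n-1) scaling
    eigval, eigvec = np.linalg.eigh(A)              # eigh: A is symmetric
    order = np.argsort(eigval)[::-1]                # sort by decreasing eigenvalue
    eigval = np.clip(eigval[order], 0.0, None)      # remove tiny negative round-off
    eigvec = eigvec[:, order]
    cumulative = np.cumsum(eigval) / eigval.sum()
    return X_centred @ eigvec[:, :q], cumulative


# Synthetic stand-in for a feature matrix: 50 samples, 10 correlated features
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10)) @ rng.normal(size=(10, 10))

scores, cumulative = pca(X, q=3)

# Smallest number of components whose cumulative explained variance reaches 90%
n90 = int(np.searchsorted(cumulative, 0.9) + 1)
print(scores.shape)
print(np.round(cumulative, 3))
print("components needed for 90% of the variance:", n90)
```

Plotting `cumulative` against the number of components gives a curve of the kind shown in Fig. 9a. A curve that rises slowly indicates that no small set of components captures most of the variance.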


Fig. 10 Classification performance of the optimised Random Forest model on the reactions in our test set using standardised adjacency features

5.4.4 Evaluation on the Test Set

The model with the highest Macro-F1 score on the standardised adjacency features was a Random Forest with the hyperparameter settings presented in Subheading 5.4.2. Evaluated on the test set, the model had an overall accuracy of 76% and a Macro-F1 score of 67.3%, with a precision of 82.5% and a recall of 86.8% (Fig. 10). We note that on the test set, the model has a slightly better Macro-F1 score than on average during cross-validation (67.3% vs 64.8%). This is likely caused by the limited size of our test set of 51 reactions but shows that the model performs similarly on reactions that did not guide the model selection process. To evaluate the quality of predictions, we compared the performance of our classifiers to the quality of the essentiality predictions of iML1515. From the paper published by Monk et al. [8], we know that iML1515 has a gene essentiality prediction accuracy of 93.4% across all of their experiments. However, we are only training on and predicting the essentiality of a small subset of reactions in the model, namely the reactions contained in the MFG. To enable a direct comparison, we computed the metrics of iML1515 predictions on the genes associated with the reactions in our test set (Fig. 11). We find that iML1515's FBA predictions on these genes have 84.3% accuracy, precision and recall both of 89.5%, and a Macro-F1 score of 79.4%. In the direct comparison with predictions of the Random Forest on standardised adjacency features, we find that our method can predict essential reactions nearly as well as FBA; its true positive rate is only 2% lower. However, our models struggle noticeably to predict non-essential reactions and only reach a true negative rate of 46%, compared to iML1515's 69%. Consequently, the Macro-F1 score of our method is 12% lower than that of iML1515.


Fig. 11 Confusion matrix of the FBA gene essentiality predictions of iML1515 on the group of genes in our test set. The list of genes in the test set can be found in Appendix 2

6 Transfer Learning

To explore the predictive power of the approach, we investigated whether classification models trained on essentiality labels of one MFG could be used to predict gene essentiality in a different environmental condition. For this purpose, we generated a second MFG of iML1515, now using acetate as carbon source. Acetate was chosen as it had the highest number of genes with different measured essentiality compared to glucose. We thus expect models to find it harder to predict gene essentialities in this environment. Hence, it is most suited for assessing the suitability of our method for transfer learning and the prediction of conditional essentiality, i.e., genes which are essential only under specific environmental conditions. We computed the MFGs for both environmental conditions and generated feature sets from the adjacency matrices as presented in Subheading 4.2. To use models trained on the glucose MFG for predictions on the acetate MFG, the columns in the feature matrix need to correspond to the same features; in this case, to edges to and from the same reaction nodes. The two MFGs share 397 reactions; 47 and 44 reactions appear only in the MFG of glucose and acetate, respectively. We removed columns that correspond to reactions which only appear in one of the MFGs from each feature matrix. Next, we computed essentiality labels for reactions in the acetate MFG. Labels were available for 252 of the overlapping reactions; this set was used for evaluation.
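The column alignment described above can be sketched with NumPy. The example keeps every node of each graph as a row but retains only the feature columns that refer to reactions present in both graphs, in a common order. The function name, the reaction identifiers, and the two small adjacency matrices are hypothetical.

```python
import numpy as np


def shared_feature_matrix(M, reaction_ids, shared):
    """Build [outgoing, incoming] features restricted to the shared reactions."""
    M = np.asarray(M, dtype=float)
    position = {rxn: i for i, rxn in enumerate(reaction_ids)}
    cols = [position[rxn] for rxn in shared]
    # M[:, cols]: edges from each node to the shared reactions
    # M.T[:, cols]: edges from the shared reactions to each node
    return np.hstack([M[:, cols], M.T[:, cols]])


# Hypothetical MFGs for two growth conditions
ids_glc = ["R1", "R2", "R3"]
M_glc = np.array([[0, 5, 0],
                  [0, 0, 5],
                  [0, 0, 0]])

ids_ace = ["R2", "R3", "R4"]
M_ace = np.array([[0, 4, 0],
                  [0, 0, 0],
                  [4, 0, 0]])

# Reactions present in both graphs, in a fixed common order
shared = [rxn for rxn in ids_glc if rxn in set(ids_ace)]

X_glc = shared_feature_matrix(M_glc, ids_glc, shared)
X_ace = shared_feature_matrix(M_ace, ids_ace, shared)
print(shared)
print(X_glc)
print(X_ace)
```

After alignment both matrices have the same number of columns with the same meaning, so a classifier fitted on the rows of one condition can be applied to the rows of the other.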


Fig. 12 Predictions of the Random Forest trained on the MFG of iML1515 using glucose as carbon source on the MFG of iML1515 using acetate as carbon source

We trained the Random Forest with optimised hyperparameters on the training set of the glucose reactions and used the glucose test set for validation; this ensured that removing reactions that are unique to the glucose MFG did not significantly impact classification performance. Classification skill did not decrease when these columns were removed from the feature matrix. Evaluated on acetate reaction nodes, the model had an overall accuracy of 70.2% and a Macro-F1 score of 64.8%. The corresponding confusion matrix (Fig. 12a) shows that its true positive rate is 11 percentage points lower and its true negative rate is 9 percentage points higher than on the test set of reactions of the glucose MFG (compare to Fig. 10a). The confusion matrix and precision-recall curve in Fig. 12 indicate that the classifier trained on the glucose MFG can predict reaction essentiality in an MFG of a closely related cell.

7 Discussion

We presented a method for predicting gene essentiality using Flux Balance Analysis in combination with machine learning algorithms. Essentiality prediction was approached as a binary classification problem, and reaction essentiality labels were extracted from gene essentiality measurements. The application of our method was demonstrated using iML1515, the metabolic network of E. coli, as an example. We simulated cells according to the biological scenarios in which essentiality studies were conducted, generated the MFG of iML1515 and, in combination with reaction essentiality labels inferred from gene essentiality measurements published by Monk et al. [8], created a small dataset for training a binary classifier. Structural reaction node features were extracted from the MFG, and an array of binary classifiers was trained to predict essentiality labels using our feature set. The best model was a Random Forest trained on standardised adjacency features with an accuracy of 74.9% and a Macro-F1 score of 64.8%. The small number of non-essential reactions in the training dataset posed a challenge to our classifiers and caused low prediction performance on this essentiality class. We also demonstrated the potential of this approach to predict essentiality across growth conditions.

These are promising results and underline the importance of crafting informative features for prediction. In our case, the adjacency features we employed can be used to make predictions in closely related genome-scale models, but they are not well suited for predictions across organisms, as these will have a significant number of distinct reactions in their MFGs. Further studies should be conducted to investigate whether more elaborate machine learning models combined with different structural feature sets could enable transfer learning across more distant genome-scale metabolic models and flux distributions, for example through the use of graph neural networks [32]. Compared to the FBA predictions of iML1515, which were used as a benchmark, we found that our method achieves true positive rates near the state of the art but performs significantly worse at detecting non-essential reactions.

Machine learning is already a subject of growing interest in metabolic engineering tasks [31], and its integration with genome-scale modelling has led to progress on multiple challenges encountered in the field [33]. With the advent of more detailed and better-curated metabolic models, current literature suggests that tighter integration of black-box machine learning with the mechanistic descriptions afforded by genome-scale models offers substantial opportunities for progress in both systems and synthetic biology.

Appendix 1: Acronyms

E. coli: Escherichia coli
iML1515: Escherichia coli iML1515 metabolic model
FBA: Flux Balance Analysis
GPR: Gene-Protein-Reaction
GSMM: Genome-Scale Metabolic Model
MFG: Mass Flow Graph
MLP: Multilayer Perceptron
PCA: Principal Component Analysis
SVC: C-Support Vector Machine


Appendix 2: Mapping Gene Essentiality to Reaction Essentiality

In this section, we present the pseudocode of the algorithm used to find reactions in iML1515 that can be deactivated by single-gene knockouts.

Algorithm 1 FINDREQUIREDGENES(rule)

The GPR maps are provided individually for each reaction in iML1515. In MAPGENETOREACTIONKNOCKOUTS, we loop through the rules for each reaction Rj. We use the recursive algorithm FINDREQUIREDGENES to compute the set of genes that, if knocked out individually, would knock out Rj. If the set of genes contains exactly one gene g, reaction Rj is knocked out by the single-gene knockout of g, but not by that of any other gene. In this case, we assume the essentiality of Rj to equal the essentiality of g, following Assumption 2. We append reaction Rj to the list of reactions that are deactivated by the single-gene knockout of g.

Algorithm 2 MAPGENETOREACTIONKNOCKOUTS(model)
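The two routines described above can be sketched in Python as follows. This is our reading of the prose, not the authors' pseudocode. It assumes that GPR rules are Boolean strings built from gene identifiers, `and`, `or` and parentheses, and that gene identifiers are valid Python names, so the standard `ast` parser can be reused. The input format (a dictionary from reaction ID to rule string) is hypothetical.

```python
import ast


def find_required_genes(rule):
    """Genes whose individual knockout disables a reaction with this GPR rule."""

    def visit(node):
        if isinstance(node, ast.Name):        # a single gene
            return {node.id}
        if isinstance(node, ast.BoolOp):
            child_sets = [visit(value) for value in node.values]
            if isinstance(node.op, ast.And):  # every operand is needed
                return set().union(*child_sets)
            # 'or': a gene is required only if every alternative needs it
            return set.intersection(*child_sets)
        raise ValueError("unsupported element in GPR rule")

    if not rule.strip():                      # reaction without a GPR rule
        return set()
    return visit(ast.parse(rule, mode="eval").body)


def map_gene_to_reaction_knockouts(gpr_rules):
    """Map each gene to the reactions disabled by knocking out that gene alone,
    keeping only reactions with exactly one such gene (as in the text)."""
    knockouts = {}
    for reaction_id, rule in gpr_rules.items():
        required = find_required_genes(rule)
        if len(required) == 1:
            (gene,) = required
            knockouts.setdefault(gene, []).append(reaction_id)
    return knockouts


# Hypothetical rules for four reactions.
rules = {"R1": "g1", "R2": "g1 and g2", "R3": "g2 or g3", "R4": "g1 and (g2 or g3)"}
print(map_gene_to_reaction_knockouts(rules))
```

In this toy input, R2 is skipped because two different single-gene knockouts disable it, and R3 is skipped because no single knockout does.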


Appendix 3: List of Genes in the Test Set

We evaluated the performance of our machine learning algorithm on a test set with 51 genes, corresponding to 20% of all nodes in the MFG. All these genes were held out during model selection, hyperparameter optimisation, and when training the model in Subheading 5.4.4: adk, argA, argH, aroB, bioB, bioF, dapB, dapF, deoB, dxr, eno, fabG, fabZ, fadE, folE, glmM, glmS, glmU, gltX, glyA, gmk, gnd, gsk, hemD, hisA, hisD, iscU, ispU, lpxC, ltaE, murC, nadB, nadC, panD, pssA, purA, purC, purM, purN, ribE, serA, tesB, thiE, thiG, thiL, tnaA, trpD, ubiD, waaA, yrbG, zupT.

References

1. Rancati G, Moffat J, Typas A, Pavelka N (2018) Emerging and evolving concepts in gene essentiality. Nat Rev Genet 19(1):34–49. ISSN: 1471-0064
2. Stephanopoulos G, Aristidou AA, Nielsen J (1998) Metabolic engineering: principles and methodologies
3. Zhan T, Boutros M (2016) Towards a compendium of essential genes – From model organisms to synthetic lethality in cancer cells. Crit Rev Biochem Mol Biol 51:74–85. ISSN: 1549-7798
4. Lu Y, Deng J, Rhodes JC, Lu H, Lu LJ (2014) Predicting essential genes for identifying potential drug targets in Aspergillus fumigatus. Comput Biol Chem 50:29–40
5. Aromolaran O, Aromolaran D, Isewon I, Oyelade J (2021) Machine learning approach to gene essentiality prediction: a review. Brief Bioinform 22(5):bbab128. ISSN: 1477-4054
6. Campos TL, Korhonen PK, Gasser RB, Young ND (2019) An evaluation of machine learning approaches for the prediction of essential genes in eukaryotes using protein sequence-derived features. Comput Struct Biotechnol J 17:785–796. ISSN: 2001-0370
7. Orth JD, Thiele I, Palsson BØ (2010) What is flux balance analysis? Nat Biotechnol 28(3):245–248. ISSN: 1546-1696
8. Monk JM, et al (2017) iML1515, a knowledgebase that computes Escherichia coli traits. Nat Biotechnol 35(10):904–908. ISSN: 1546-1696
9. Plaimas K, Eils R, König R (2010) Identifying essential genes in bacterial metabolic networks with machine learning methods. BMC Syst Biol 4(1):1–16. ISSN: 1752-0509
10. Nandi S, Subramanian A, Sarkar RR (2017) An integrative machine learning strategy for improved prediction of essential genes in Escherichia coli metabolism using flux-coupled features. Mol BioSyst 13(8):1584–1596
11. Freischem LJ, Barahona M, Oyarzún DA (2022) Prediction of gene essentiality using machine learning and genome-scale metabolic models. IFAC-PapersOnLine 55(23):13–18. ISSN: 2405-8963
12. Lewis NE, Nagarajan H, Palsson BO (2012) Constraining the metabolic genotype-phenotype relationship using a phylogeny of in silico methods. Nat Rev Microbiol 10(4):291–305. ISSN: 1740-1534
13. King ZA, Lu J, Dräger A, Miller P, Federowicz S, Lerman JA, Ebrahim A, Palsson BO, Lewis NE (2016) BiGG models: a platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Res 44(D1):D515–D522. ISSN: 0305-1048
14. Cardoso J, Vilaça P, Soares S, Rocha M (2012) An algorithm to assemble gene-protein-reaction associations for genome-scale metabolic model reconstruction. In: Lecture notes in computer science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol 7632 LNBI, pp 118–128. ISSN: 0302-9743
15. Colijn C, Brandes A, Zucker J, Lun DS, Weiner B, Farhat MR, Cheng T-Y, Moody DB, Murray M, Galagan JE (2009) Interpreting expression data with metabolic flux models: predicting Mycobacterium tuberculosis mycolic acid production. PLoS Comput Biol 5(8):e1000489. ISSN: 1553-7358
16. Price MN, et al (2018) Mutant phenotypes for thousands of bacterial genes of unknown function. Nature 557(7706):503–509. ISSN: 1476-4687
17. Bartha I, di Iulio J, Venter JC, Telenti A (2018) Human gene essentiality. Nat Rev Genet 19(1):51–62. ISSN: 1471-0064
18. Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256
19. Dusad V, Thiel D, Barahona M, Keun HC, Oyarzún DA (2021) Opportunities at the interface of network science and metabolic modeling. Front Bioeng Biotechnol 8:591049. ISSN: 2296-4185
20. Smart AG, Amaral LAN, Ottino JM (2008) Cascading failure and robustness in metabolic networks. Proc Natl Acad Sci USA 105(36):13223–13228. ISSN: 1091-6490
21. Beguerisse-Díaz M, Bosque G, Oyarzún D, Picó J, Barahona M (2018) Flux-dependent graphs for metabolic networks. npj Syst Biol Appl 4(1):32. ISSN: 2056-7189
22. Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning. ISBN: 978-0387-84857-0
23. Larrañaga P, et al (2006) Machine learning in bioinformatics. Brief Bioinform 7(1):86–112. ISSN: 1467-5463
24. Greener JG, Kandathil SM, Moffat L, Jones DT (2022) A guide to machine learning for biologists. Nat Rev Mol Cell Biol 23(1):40–55. ISSN: 1471-0080
25. Dreiseitl S, Ohno-Machado L (2002) Logistic regression and artificial neural network classification models: a methodology review. J Biomed Inform 35(5):352–359. ISSN: 1532-0464
26. Kotsiantis SB (2007) Supervised machine learning: a review of classification techniques. In: Proceedings of the 2007 conference on emerging artificial intelligence applications in computer engineering: real word AI systems with applications in EHealth, HCI, information retrieval and pervasive technologies. IOS Press, New York, pp 3–24. ISBN: 9781586037802
27. Freischem L, Oyarzún DA (2023) MFGpy: computation of mass flow graphs for genome-scale metabolic models. https://doi.org/10.5281/zenodo.7882034
28. Zheng A, Casari A (2018) Feature engineering for machine learning: principles and techniques for data scientists, 1st edn. O'Reilly Media, Inc, New York. ISBN: 1491953241
29. Jolliffe IT, Cadima J (2016) Principal component analysis: a review and recent developments. Philos Trans R Soc A Math Phys Eng Sci 374(2065):20150202
30. Bergstra J, Yamins D, Cox D (2013) Making a science of model search: hyper-parameter optimization in hundreds of dimensions for vision architectures. In: Dasgupta S, McAllester D (eds) Proceedings of the 30th international conference on machine learning, vol 28. Proceedings of Machine Learning Research 1. PMLR, Atlanta, pp 115–123
31. Kim GB, Kim WJ, Kim HU, Lee SY (2020) Machine learning applications in systems metabolic engineering. Curr Opin Biotechnol 64:1–9. ISSN: 0958-1669
32. Hasibi R, Michoel T, Oyarzún DA (2023) Integration of graph neural networks and genome-scale metabolic models for predicting gene essentiality. bioRxiv. https://doi.org/10.1101/2023.08.25.554757
33. Zampieri G, Vijayakumar S, Yaneske E, Angione C (2019) Machine and deep learning meet genome-scale metabolic modeling. PLoS Comput Biol 15(7):e1007084. ISSN: 1553-7358

Chapter 21

The Causes for Genomic Instability and How to Try and Reduce Them Through Rational Design of Synthetic DNA

Matan Arbel-Groissman, Itamar Menuhin-Gruman, Hader Yehezkeli, Doron Naki, Shaked Bergman, Yarin Udi, and Tamir Tuller

Abstract

Genetic engineering has revolutionized our ability to manipulate DNA and engineer organisms for various applications. However, this approach can lead to genomic instability, which can result in unwanted effects such as toxicity, mutagenesis, and reduced productivity. To overcome these challenges, smart design of synthetic DNA has emerged as a promising solution. By taking into consideration the intricate relationships between gene expression and cellular metabolism, researchers can design synthetic constructs that minimize metabolic stress on the host cell, reduce mutagenesis, and increase protein yield. In this chapter, we summarize the main challenges of genomic instability in genetic engineering and address the dangers of unknowingly incorporating genomically unstable sequences in synthetic DNA. We also demonstrate the instability of those sequences by showing that they are selected against in conserved sequences in nature. We highlight the benefits of using ESO, a tool for the rational design of DNA that avoids genetically unstable sequences, and also summarize the main principles and working parameters of the software that allow maximizing its benefits and impact.

Key words Genetic stability, Evolvability, DNA damage repair, DNA optimization, Computational models

1 Introduction

The issue of the evolutionary stability of heterologous genes is gaining more and more traction. At the beginning of the genetic revolution, the mere concept of transforming living organisms seemed far-fetched and hard to achieve. Nowadays, with significant advances in the field, it has become increasingly easy to insert foreign DNA into living cells. The question that remains is how to design the inserted DNA so that it remains functional for as long as possible [1, 2]. When a heterologous gene is inserted into an organism, it imposes a metabolic burden on the organism while carrying no evolutionary benefit [3, 4]. This metabolic load is even more daunting when the heterologous gene is highly expressed [3, 5], which is almost always the case in biotechnological manufacturing. Mutation, generated either by mutagenic DNA damage repair [6, 7] or by errors in replication, will eventually lead to the creation of an individual in the population that does not express the heterologous gene. This mutated individual possesses a higher fitness than the rest of the population, as it does not carry the metabolic load of highly expressing said heterologous gene and can focus all its finite cellular resources on growing and dividing. It will take over the population, leading at first to a decline in overall protein production and eventually to the complete loss of the inserted gene from the population (Fig. 1).

This highly problematic Achilles heel of bio-engineering and biotechnology will only be completely solved with a combination of different solutions and technologies. However, rational design of the DNA sequence can increase the evolutionary stability of any genetic circuit [8]. The DNA sequence inherently affects the mutation and recombination rates of the sequence [9, 10]. In any given sequence, there are genetically unstable regions, whether they are hard-to-replicate regions [11] that promote polymerase slippage [12, 13], recombination-prone areas, or any other specific sequence that can elevate the rate of mutation or recombination in a certain region [14]. The two best characterized types of sequences that can cause instability are simple sequence repeats (SSRs) and homology regions that can cause recombination-mediated deletion (RMD) [8, 15]. Those types of sequences are inherently unstable. An SSR is any repetitive sequence that can cause either polymerase slippage or misalignment during replication, which can result in insertion or deletion events.

Matan Arbel-Groissman and Itamar Menuhin-Gruman contributed equally with all other contributors.
Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_21, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024

Fig. 1 Evolutionary stability of genetically engineered microorganisms. Mutants that have lost their function may take over the population of genetically engineered microorganisms due to a selection process

RMD is an event of sequence deletion caused by recombination [16]. When DNA is damaged, especially after a double-stranded break, the cell tries frantically to repair said damage [17]. The way the damaged site is repaired varies between organisms. Some, like mammals, prefer non-homologous end joining (NHEJ) to repair the double-stranded break [18]. Others, like yeast, prefer recombination [19]. In recombination, after the occurrence of a double-stranded break (DSB), a special DNA endonuclease resects DNA from the point of breakage, exposing ssDNA on both sides of the DSB [20]. Those exposed strands now migrate, searching for homology [21, 22]. After homology is found, the exposed strand is elongated; then, the elongated and exposed ssDNAs from both sides are re-attached, and DNA polymerase fills the remaining gaps [23, 24]. When done properly, this allows for seamless DNA repair, but having homology sites surrounding a genetic circuit can create a large deletion event during recombination as a by-product of several homologous recombination DNA repair pathways [10, 25]. Such homology sites are sometimes created by cloning techniques and can drastically lower the evolutionary stability of a given construct [26, 27].

Evolutionary instability is not the only issue that can hurt synthetic construct function over time. Epigenetic silencing also has the potential to render any genetic engineering obsolete. Epigenetic silencing is the regulation of gene expression either by DNA modification or by the three-dimensional organization of the chromatin structure [28]. DNA is tightly packed around nucleosomes, which are composed of histones [29]. Those histones carry different posttranslational modifications that determine their affinity to other histones in the region [30]. When many histones in a certain region carry modifications that strengthen their affinity to each other, a heterochromatin region is established [31]: an inaccessible area in which the expression of all genes is greatly reduced or shut down altogether [32]. DNA modification, mainly methylation, creates a similar phenomenon [33, 34]. DNA methylation, mainly around genes and their promoters, can lower the ability of RNA polymerase to transcribe them, thus changing the gene expression pattern [35]. Both of these are naturally occurring processes, one of which happens in all eukaryotes while the second happens mainly in mammalian and insect cells [36]. Both of those gene silencing mechanisms can greatly hurt any genetic circuit. The success of inserting a heterologous gene and making sure it is sufficiently expressed can be erased quite easily by the epigenetic regulation of its expression.

These two weak spots in genetic engineering, DNA instability hotspots and epigenetic silencing, will be reviewed in the chapter. More importantly, ESO, a software tool for rationally designing synthetic circuits in a way that reduces DNA instability hotspots and epigenetic silencing, will be reviewed. We will go over the inner mechanism of the model and highlight ways to utilize the software in order to maximize its benefits.

1.1 SSR and RMD in Genetic Instability

Mutation predominantly occurs either during DNA replication or as a result of DNA damage repair [6]. During DNA replication, the DNA polymerase has an intrinsic error rate [37], which is greatly influenced by its structure and by the sequence of the DNA being replicated [38, 39]. In some organisms, there are several DNA polymerases, some of which are considered "error-prone", with a much higher mutation rate than other polymerases [6]. Deleting some of those "error-prone" polymerases will greatly decrease the organism's mutation rate, leading to increased evolutionary stability of any heterologous gene [40]. Many DNA polymerases have a proofreading domain, which is able to detect and repair mutations during replication, and its ability is what determines the polymerase error rate [41]. Although it has not been done yet, theoretically speaking, with advances in protein bioinformatic tools and protein engineering, it will be possible to create a polymerase mutant with increased proofreading capabilities (although probably with decreased replication speed), which could greatly aid various future synthetic biology technologies.

During DNA replication, any DNA polymerase has an uneven mutation rate, which increases greatly in specific segments known as mutational hotspots [42]. Those regions are mainly composed of simple sequence repeats (SSRs). First, those regions promote slippage mutations, resulting in insertions or deletions [43, 44] (Fig. 2). This is caused by misalignment of the DNA strands during replication. Second, those hard-to-replicate regions, especially GC-rich repetitive sequences, result in numerous SNPs [11]. SSRs promote so much genetic instability that, when compared to the rest of the coding genome, essential genes have markedly fewer SSR sites [8]. This is presumably because of the evolutionary disadvantage of having a mutational hotspot in an essential gene. Detection and prevention of SSR sites in any sequence can greatly increase the evolutionary stability of said sequence.
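As a rough illustration of what detecting SSR sites can look like, the sketch below scans a sequence for tandem repeats with a regular expression. It is not the detection routine used by ESO. The function name and the thresholds (`max_unit`, `min_repeats`) are our assumptions, and only perfect, uninterrupted repeats are reported.

```python
import re


def find_ssr_sites(seq, max_unit=6, min_repeats=3):
    """Return (start, unit, copies) for each perfect tandem repeat in `seq`.

    The lazy quantifier makes the regex try the shortest unit first at
    each start position; the backreference then demands further copies.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    sites = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        sites.append((match.start(), unit, copies))
    return sites


# Toy sequence (hypothetical): a (TA)4 dinucleotide repeat and an A5 run.
print(find_ssr_sites("GGTATATATACCAAAAAG"))
```

Since an insertion or deletion at such a site typically changes the length by whole repeat units, units whose length is not a multiple of three are the ones most likely to cause frameshifts in a coding sequence.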
Fig. 2 An illustration of the mechanism by which some insertion and deletion mutations happen during the replication of an SSR region. During replication, dissociation events can happen stochastically, and the problem arises in regions with many repeats, where misalignment can easily occur. After misalignment, a deletion or an insertion will happen, depending on which strand was folded on itself. The size of the deletion or insertion will mostly be determined by the size of the repetitive unit. Such a mutation can easily cause a frameshift, resulting in the deactivation of the gene of interest

The second cause of mutation is DNA damage, or more precisely, DNA damage repair [6]. When DNA is damaged, whether by ssDNA or dsDNA breaks or by any chemical modification of the nucleotides, the result is almost always not an alteration in the sequence, but simply damaged DNA that cannot be translated. The mutations are created when said damaged sites are repaired. Generally speaking, DNA damage repair can be divided into two groups, "error-prone" and "error-free" [45, 46]. "Error-prone" DNA damage repair, which can be divided into many sub-groups, relies on special DNA polymerases that can replace or replicate the damaged site where the normal DNA polymerase cannot [6]. Those polymerases are usually very mutagenic [47]. The second group of DNA repair pathways are collectively termed "error-free" and all rely on homologous recombination to some extent; although it is not implied by their name, those repair pathways do cause genomic instability, just to a lesser extent than the "error-prone" pathways [48]. As those pathways use homologous recombination to repair damaged DNA using information from the sister chromatid, the presence of homology sites within the same sequence can promote deletion. Detecting those RMD sites in a synthetic sequence and replacing them will greatly decrease the chances of the synthetic sequence being deleted as a result of recombination (Fig. 3).
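A simple way to flag candidate RMD sites is to look for subsequences that occur more than once in a construct. The sketch below does this for exact direct repeats only. It is an illustration under our own assumptions (function name, `min_len` threshold, toy construct) and not the algorithm implemented in ESO.

```python
from collections import defaultdict


def find_direct_repeats(seq, min_len=8):
    """Map every subsequence of length `min_len` that occurs more than once
    to the list of its start positions in `seq`."""
    positions = defaultdict(list)
    for start in range(len(seq) - min_len + 1):
        positions[seq[start:start + min_len]].append(start)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


# Toy construct (hypothetical): a payload flanked by two identical 8-nt arms.
arm = "ACGTTGCA"
construct = arm + "TTTTCCCCGGGG" + arm
repeats = find_direct_repeats(construct)
print(repeats)

# Recombination between the two copies would remove one arm plus the payload.
for kmer, starts in repeats.items():
    print(kmer, "potential deletion of", starts[-1] - starts[0], "nt")
```

In practice the minimal homology length that matters depends on the host organism, so `min_len` would need to be chosen accordingly.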

2

Counter Selection for SSR in Evolutionary Conserved Sequences If indeed certain sequences do promote genetic instability, they will be selected against important sequences that have a low tolerance for mutations, for instance coding sequences (CDS) [49] that exhibit great evolutionary conservation [50]. While there were previous results [8], showing that SSR sites are significantly less prone to be found in essential gene compared to the rest of the genome in the budding yeast, we sought out to verify it across multiple organisms from different domains. We compared between the distribution of the patterns in the different genomic regions of

376

Matan Arbel-Groissman et al.

Fig. 3 Illustration demonstrating three possible mechanisms for recombinationmediated deletion. A Flowing double-stranded break, DNA is resected 5–3 until homology is found. After uncovering the homology region, the sticky ends of the homology region will anneal, leading to the exonuclease cutting the excess DNA, and the untethered DNA will be ligated, leading to the deletion of any sequence that was between the homology sequences. B Inter-chromatid recombination between the sister chromatids will lead to a duplication of one of the chromatids and large deletion in the second one. If this happens near or at a synthetic gene, this will probably lead to two daughter cells with non-functioning synthetic gene. C Intra recombination event, in which there was an unwanted recombination event between the homology sequence, this will lead to the deletion of the sequence between the homology regions, similarly to illustration A

native sequences (list of organisms in Table 1 supplementary data) that were compared against this distribution calculated for the randomized that preserves important genomic attributes. This statistical comparison included the examination of two opposite hypotheses: The real genome has significantly higher or lower SUR (short units’ repetitiveness) values than expected in random and that may imply selection for or against patterns, respectively. Each random version of the genome (a total of 200 random genomes) was generated in the following manner: The randomization of each CDS sequence was achieved by shuffling the synonymous codons. Thus, the randomized CDS sequences maintained the GC content, amino acid content and order, and codon frequencies of the original genome. The randomization of the other types of sequences (regulatory and intergenic sequences) preserves the di-nucleotide distribution

How to Rationally Design DNA and Reduce Genomic Instability

377

Table 1 List of species TaxID

Species

Nickname Phylum

Domain

64091

Halobacterium salinarum NRC-1

Hsal

Euryarchaeota

Archaea

469382

Halogeometricum borinquense DSM 11551

Hbor

Euryarchaeota

Archaea

1737403 Nanohaloarchaea archaeon SG9

Narc

Candidatus

Archaea

797304

Natronobacterium gregoryi SP2

Ngre

Euryarchaeota

Archaea

186497

Pyrococcus furiosus DSM 3638

Pfur

Euryarchaeota

Archaea

273063

Sulfurisphaera tokodaii str. 7

Stok

Crenarchaeota

Archaea

28892

Methanofollis liminatans DSM 4140

Mlim

Euryarchaeota

Archaea

414004

Cenarchaeum symbiosum A

Csym

Thaumarchaeota

Archaea

224308

Bacillus subtilis subsp. subtilis str. 168

Bsub

Firmicutes

Bacteria

283166

Bartonella henselae str. Houston-1

Bhen

Proteobacteria

Bacteria

192222

Campylobacter jejuni subsp. jejuni NCTC 11168 = ATCC 700819

Cjej

Proteobacteria

Bacteria

867900

Cellulophaga lytica DSM 7489

Clyt

Bacteroidetes

Bacteria

546414

Deinococcus deserti VCD115

Ddes

DeinococcusThermus

Bacteria

511145

Escherichia coli str. K-12 substr. MG1655

Ecol

Proteobacteria

Bacteria

85962

Helicobacter pylori 26695

Hpyl

Proteobacteria

Bacteria

189518

Leptospira interrogans serovar Lai str. 56601

Lint

Spirochaetes

Bacteria

449447

Microcystis aeruginosa NIES-843

Maer

Cyanobacteria

Bacteria

83332

Mycobacterium tuberculosis H37Rv

Mtub

Actinobacteria

Bacteria

722438

Mycoplasma pneumoniae FH

Mpne

Tenericutes

Bacteria

208964

Pseudomonas aeruginosa PAO1

Paer

Proteobacteria

Bacteria

373153

Streptococcus pneumoniae D39

Spne

Firmicutes

Bacteria

1314

Streptococcus pyogenes

Spyo

Firmicutes

Bacteria

391774

Desulfovibrio vulgaris str. Hildenborough

Dvul

Proteobacteria

Bacteria

754

Pasteurella dagmatis ATCC 43325

Pdag

Proteobacteria

Bacteria

289376

Thermodesulfovibrio yellowstonii DSM 11347

Tyel

Nitrospirae

Bacteria

1162668 Leptospirillum ferrooxidans C2–3

Lfer

Nitrospirae

Bacteria

216432

Croceibacter atlanticus HTCC2559

Catl

Bacteroidetes

Bacteria

59374

Fibrobacter succinogenes subsp. succinogenes S85

Fsuc

Fibrobacteres

Bacteria

63737

Nostoc punctiforme PCC 73102

Npun

Cyanobacteria

Bacteria (continued)

378

Matan Arbel-Groissman et al.

Table 1 (continued) TaxID

Species

Nickname Phylum

Domain

351607

Acidothermus cellulolyticus 11B

Acel

Actinobacteria

Bacteria

196627

Corynebacterium glutamicum ATCC 13032

Cglu

Actinobacteria

Bacteria

2021

Thermobifida fusca strain UPMC 901

Tfus

Actinobacteria

Bacteria

521095

Lancefieldella parvulum DSM 20469

Lpar

Actinobacteria

Bacteria

266117

Rubrobacter xylanophilus DSM 9941

Rxyl

Actinobacteria

Bacteria

525909

Acidimicrobium ferrooxidans DSM 10331

Afer

Actinobacteria

Bacteria

1608957 Euzebya pacifica strain DY32–46

Epac

Actinobacteria

Bacteria

194439

Chlorobium tepidum TLS

Ctep

Chlorobi

Bacteria

83555

Chlamydia abortus strain 15-70d24

Cabo

Chlamydiae

Bacteria

511051

Caldisericum exile AZM16c01

Cexi

Caldiserica

Bacteria

240015

Acidobacterium capsulatum ATCC 51196

Acap

Acidobacteria

Bacteria

309799

Dictyoglomus thermophilum H-6-12

Dthe

Dictyoglomi

Bacteria

331113

Simkania negevensis Z

Sneg

Chlamydiae

Bacteria

316274

Herpetosiphon aurantiacus DSM 785

Haur

Chloroflexi

Bacteria

926569

Anaerolinea thermophila UNI-1

Athe

Chloroflexi

Bacteria

226186

Bacteroides thetaiotaomicron strain DSM 2079 Bthe

Bacteroidetes

Bacteria

243090

Rhodopirellula baltica SH 1

Rbal

Planctomycetota

Bacteria

1142394 Phycisphaera mikurensis NBRC 102666

Pmik

Planctomycetota

Bacteria

322098

Aster yellows witches’-broom phytoplasma AYWB

Ayel

Tenericutes

Bacteria

243273

Mycoplasma genitalium G37

Mgen

Tenericutes

Bacteria

46541

Thermosipho melanesiensis strain 431

Tmel

Thermotogae

Bacteria

445932

Elusimicrobium minutum Pei191

Emin

Elusimicrobia

Bacteria

452637

Opitutus terrae PB90–1

Oter

Verrucomicrobia

Bacteria

580340

Thermovirga lienii DSM 17291

Tlie

Synergistetes

Bacteria

157692

Pseudoleptotrichia goodfellowii strain JCM16774

Pgoo

Fusobacteria

Bacteria

639282

Deferribacter desulfuricans SSM1

Ddesu

Deferribacteres

Bacteria

880073

Caldithrix abyssi DSM 13497

Caby

Calditrichaeota

Bacteria

1618372 Candidatus Beckwithbacteria bacterium GW2011_GWC1_49_16

Cbec

Candidatus Berkelbacteria

Bacteria

1619007 Candidatus Wolfebacteria bacterium GW2011_GWB1_47_1

Cwol

Candidatus Wolfebacteria

Bacteria (continued)

How to Rationally Design DNA and Reduce Genomic Instability

Table 1 (continued)

TaxID | Species | Nickname | Phylum | Domain
487 | Neisseria meningitidis strain 11-7 | Nmen | Proteobacteria | Bacteria
228410 | Nitrosomonas europaea ATCC 19718 | Neur | Proteobacteria | Bacteria
637390 | Acidithiobacillus thiooxidans ATCC 19377 | Athi | Proteobacteria | Bacteria
959 | Bdellovibrio bacteriovorus strain 109J | Bbac | Proteobacteria | Bacteria
1128398 | Gottschalkia acidurici 9a | Gaci | Firmicutes | Bacteria
642492 | Clostridium lentocellum DSM 5427 | Clen | Firmicutes | Bacteria
608538 | Hydrogenobacter thermophilus TK-6 | Hthe | Aquificae | Bacteria
648996 | Thermovibrio ammonificans | Tamm | Aquificae | Bacteria
945713 | Ignavibacterium album JCM 16511 | Ialb | Ignavibacteriae | Bacteria
751945 | Thermus oshimai JL-2 | Tosh | Deinococcus-Thermus | Bacteria
649638 | Truepera radiovictrix DSM 17093 | Trad | Deinococcus-Thermus | Bacteria
104782 | Adineta vaga | Avag | Rotifera | Eukaryota
3055 | Chlamydomonas reinhardtii | Crei | Chlorophyta | Eukaryota
353152 | Cryptosporidium parvum Iowa II | Cpar | Apicomplexa | Eukaryota
6669 | Daphnia pulex | Dpul | Arthropoda | Eukaryota
284811 | Eremothecium gossypii ATCC 10895 | Egos | Ascomycota | Eukaryota
45351 | Nematostella vectensis | Nvec | Cnidaria | Eukaryota
3218 | Physcomitrium patens | Ppat | Streptophyta | Eukaryota
242507 | Pyricularia oryzae | Pory | Ascomycota | Eukaryota
559292 | Saccharomyces cerevisiae S288C | Scer | Ascomycota | Eukaryota
4896 | Schizosaccharomyces pombe | Spom | Ascomycota | Eukaryota
227321 | Aspergillus nidulans FGSC A4 | Anid | Ascomycota | Eukaryota
484906 | Babesia bovis T2Bo | Bbov | Apicomplexa | Eukaryota
436017 | Ostreococcus lucimarinus | Oluc | Chlorophyta | Eukaryota
296543 | Thalassiosira pseudonana CCMP1335 | Tpse | Bacillariophyta | Eukaryota

(including content) of the sequences and was done by a shuffle that preserves the distribution of pairs of nucleotides in each sequence. To compare the distribution of the patterns in the real genome and the null model, a way to measure and compare SUR was needed. A basic unit was defined as the shortest repeating subsequence of a repetitive sequence. The number of repetitions of a


Matan Arbel-Groissman et al.

basic unit inside a pattern was referred to as the multiplicity factor (MF). For instance, the basic unit of the pattern TATATA is TA, and the MF is 3. Various basic units with the same MF can be found inside an SSR site. For the SSR site TATATATATA, for example, we have two basic units, TA and AT, that can each repeat three times (meaning both have an MF of 3). Thus, this site contains the following five overlapping patterns with MF = 3 and a basic unit of length 2, one starting at each of positions 1 to 5: TATATA, ATATAT, TATATA, ATATAT, TATATA. Because short repetitive sequences increase the probability of mutation during DNA replication, we can infer that a higher number of repeats may further increase that probability. As a result, we designed our measure to be based on the number of repetitions of the short DNA sequences, MF. Since short repetitive units may increase a sequence's genetic instability, we expect to discover more patterns in a less stable sequence.

Based on that, we defined the short units' repetitiveness (SUR) measure, for a given sequence and MF, as the sum of all appearances of patterns that consist of basic units repeated MF times, across all the SSRs that the sequence contains. We computed eight SURs for each genomic sequence based on MFs ranging from 3 to 10. SURi represents the SUR measure computed for MF = i, where 3 ≤ i ≤ 10. The SUR3-SUR10 measures were computed for the CDSs of the real genome and of 200 randomized genomes. We then quantified SURi, 3 ≤ i ≤ 10, per genomic region: in each genome, we summed the unnormalized SURi values of all sequences in the same genomic region and normalized the total by the sum of the sequences' lengths. Using the 200 SURi values calculated for the randomized genomes, we calculated an empirical p-value. The results provide strong evidence for global selection against high SURs in CDSs throughout the evolutionary tree, as shown in Fig. 4.
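The SUR definition above can be illustrated with a short Python sketch. It is not the authors' implementation: the function names, the `max_unit_len` cutoff, and the add-one form of the empirical p-value are assumptions made for illustration, since the text does not specify them. Overlapping patterns are counted separately, following the TATATATATA example.

```python
def is_basic_unit(unit: str) -> bool:
    """True if `unit` is not itself a repetition of a shorter string."""
    n = len(unit)
    return not any(n % d == 0 and unit[:d] * (n // d) == unit for d in range(1, n))


def sur(seq: str, mf: int, max_unit_len: int = 6) -> int:
    """Count every (possibly overlapping) pattern made of a basic unit repeated `mf` times.

    `max_unit_len` is a hypothetical cutoff on the basic-unit length.
    """
    count = 0
    for unit_len in range(1, max_unit_len + 1):
        span = unit_len * mf
        for start in range(len(seq) - span + 1):
            unit = seq[start:start + unit_len]
            if is_basic_unit(unit) and seq[start:start + span] == unit * mf:
                count += 1
    return count


def normalized_region_sur(seqs: list[str], mf: int) -> float:
    """Sum the unnormalized SUR over a genomic region and divide by total length."""
    return sum(sur(s, mf) for s in seqs) / sum(len(s) for s in seqs)


def empirical_p_value(real: float, randomized: list[float]) -> float:
    """Share of randomized genomes with SUR <= the real value (add-one form, an assumption)."""
    return (1 + sum(r <= real for r in randomized)) / (1 + len(randomized))
```

For the site TATATATATA, `sur("TATATATATA", 3)` counts the five patterns listed in the text; a small empirical p-value would indicate that the real genome carries fewer patterns than the null model.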
The selection against high SURs is even stronger in highly expressed genes than in lowly expressed genes (Fig. 5). To examine how the repetitiveness of short sequences is related to expression level, we compared the distributions of the SURs in highly and lowly expressed genes. At first, we used protein levels to determine the groups of highly and lowly expressed genes and performed the analysis described below. When we sought to expand our analysis to additional species for which protein levels are unavailable, we used ENC-based values as a proxy for expression levels to define the expression groups. The effective number of codons (ENC) is a codon usage bias (CUB) measurement that quantifies how far the codon usage of a gene departs from the equal usage of synonymous codons. The following are the steps for calculating ENC for a coding sequence [51]:


Fig. 4 Comparison between the Z-score values for the SUR3-SUR10 measurements that were calculated for the CDSs of the real genome and the null model of all the microorganisms. Each pixel represents the Z-score calculated according to the formula for the different SURs. For each species, a bluer color indicates that the CDSs of the real genome had fewer patterns than those found in the null model, while a redder color means that the CDSs of the real genome had more patterns than the null model. Thresholds of ±10 were set to represent most of the Z-score values while maintaining good resolution. Consequently, any Z-score value that exceeded this range was represented in the darkest color of the corresponding direction

(a) Counting the total number of appearances of each amino acid:

$$n_a = \sum_{i=1}^{k} n_i \tag{1}$$

where $n_i$ is the number of appearances of the synonymous codon $i$ and $k$ is the number of synonymous codons for the amino acid of interest.

(b) The frequency of the $i$th synonymous codon is then calculated by:

$$p_i = \frac{n_i}{n_a} \tag{2}$$

(c) The frequency of an amino acid is obtained by:

$$F_a = \frac{n_a \sum_{i=1}^{k} p_i^2 - 1}{n_a - 1} \tag{3}$$


Fig. 5 Comparison between the log(p-values) of the SUR3-SUR10 measurements that were calculated for the CDSs of the highly and lowly expressed genes of all the microorganisms. Pixels with positive/negative log(p-value) values represent tests that were significant for the hypothesis of more/fewer patterns in the lowly expressed genes compared to the highly expressed genes, respectively. For each species, pixels that represent more patterns in lowly expressed genes compared to highly expressed genes correspond to a greener color, while pixels that represent more patterns in highly expressed genes compared to lowly expressed genes correspond to a pinker color

(d) The average of the frequencies of amino acids with the same number of synonymous codons is then computed:

$$\bar{F}_{RC} = \frac{1}{n_{RC}} \sum_{a \in RC} F_a \tag{4}$$

where $n_{RC}$ is the number of amino acids in the same redundancy class, RC (meaning amino acids that have 2, 3, 4, or 6 synonymous codons).

(e) Finally, the ENC is computed as follows:

$$\mathrm{ENC} = 2 + \frac{9}{\bar{F}_2} + \frac{1}{\bar{F}_3} + \frac{5}{\bar{F}_4} + \frac{3}{\bar{F}_6} \tag{5}$$
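Steps (a) to (e) can be followed in a minimal Python sketch. It assumes the standard genetic code, an in-frame coding sequence, and that every redundancy class is represented by at least one amino acid with two or more observed codons; capping the result at 61 is a common convention adopted here as an assumption. The function name `enc` is illustrative and this is not the ENCprime program.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
# Number of synonymous codons (k) for each amino acid.
DEGENERACY = Counter(aa for aa in CODON_TO_AA.values() if aa != "*")


def enc(cds: str) -> float:
    """Effective number of codons of an in-frame coding sequence, Eqs. (1)-(5)."""
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    counts = Counter(c for c in codons if CODON_TO_AA.get(c, "*") != "*")
    f_by_class = {2: [], 3: [], 4: [], 6: []}
    for aa, k in DEGENERACY.items():
        if k == 1:
            continue  # Met and Trp are covered by the constant 2 in Eq. (5)
        n_i = [n for codon, n in counts.items() if CODON_TO_AA[codon] == aa]
        n_a = sum(n_i)                                          # Eq. (1)
        if n_a < 2:
            continue  # F_a is undefined for fewer than two observations
        sum_p2 = sum((n / n_a) ** 2 for n in n_i)               # Eq. (2)
        f_by_class[k].append((n_a * sum_p2 - 1) / (n_a - 1))    # Eq. (3)
    if any(not values for values in f_by_class.values()):
        raise ValueError("a redundancy class has no usable amino acid")
    f_bar = {k: sum(v) / len(v) for k, v in f_by_class.items()}  # Eq. (4)
    value = 2 + 9 / f_bar[2] + 1 / f_bar[3] + 5 / f_bar[4] + 3 / f_bar[6]  # Eq. (5)
    return min(value, 61.0)  # assumed convention: cap at the 61 sense codons
```

A gene that uses a single codon per amino acid gives the minimum value of 20, while near-uniform usage of all synonymous codons gives values at or near 61.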

When the codon usage deviates from the uniform usage of each synonymous codon, ENC decreases from 61. One drawback of ENC is that extreme GC content values might distort the ENC scores. For this reason, we also used the ENc prime (ENc′) [52] measurement, which corrects this disadvantage of the typical ENC and allows comparisons of codon usage bias among species with different background nucleotide compositions. ENc′ has been widely used to study the codon usage bias in various organisms [53–55]. The ENc′ is based on Pearson's χ² statistic and measures the codon usage bias of a gene after filtering out the expected codon usage calculated from the background nucleotide composition. The steps for calculating this index are [52]:

(a) Calculating Pearson's χ² for each amino acid to quantify the deviation of each codon's usage from some expected usage for each amino acid:

$$\chi_a^2 = \sum_{i=1}^{k} \frac{n_a (p_i - e_i)^2}{e_i} \tag{6}$$

where $p_i$ is the frequency of the $i$th codon, $e_i$ is the expected frequency for the $i$th codon, $k$ is the number of synonymous codons for the amino acid of interest, and $n_a$ is the observed number of codons for the amino acid. There are several methods for calculating the $e_i$ of a codon, including using the frequencies of mono-, di-, or trinucleotides. For example, using mononucleotide frequencies, the expected frequency for a codon of interest, $N_1 N_2 N_3$, can be calculated by multiplying the frequencies of its nucleotides as follows:

$$e_{N_1 N_2 N_3} = \prod_{j=1}^{3} p_{N_j} \tag{7}$$

where $N_j$ is the nucleotide in position $j$ and $p_{N_j}$ is its frequency.

(b) Using the $\chi_a^2$ value, the next step is to calculate the frequency of a given amino acid as follows:

$$F'_a = \frac{\chi_a^2 + n_a - k}{k (n_a - 1)} \tag{8}$$

(c) The average of the frequencies of amino acids with the same number of synonymous codons is computed similarly to formula (4):

$$\bar{F}'_{RC} = \frac{1}{n_{RC}} \sum_{a \in RC} F'_a \tag{9}$$

where $n_{RC}$ is the number of amino acids in the same redundancy class, RC (meaning amino acids that have 2, 3, 4, or 6 synonymous codons).

(d) Finally, the ENc′ is computed in the same manner as formula (5):

$$\mathrm{ENc}' = 2 + \frac{9}{\bar{F}'_2} + \frac{1}{\bar{F}'_3} + \frac{5}{\bar{F}'_4} + \frac{3}{\bar{F}'_6} \tag{10}$$
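The per-amino-acid part of this calculation, Eqs. (6) to (8), can be sketched in a few lines of Python. The function names are illustrative, and renormalizing the expected frequencies so that they sum to 1 within each amino acid is an assumption made here so that observed and expected frequencies are comparable; it is not stated in the text. This is not the ENCprime program.

```python
def expected_codon_freqs(codons: list[str], nuc_freq: dict[str, float]) -> list[float]:
    """Expected frequencies of one amino acid's synonymous codons from mononucleotide
    frequencies, Eq. (7), renormalized within the amino acid (an assumption)."""
    raw = [nuc_freq[c[0]] * nuc_freq[c[1]] * nuc_freq[c[2]] for c in codons]
    total = sum(raw)
    return [r / total for r in raw]


def f_prime(counts: list[int], expected: list[float]) -> float:
    """F'_a for one amino acid from its codon counts and expected frequencies."""
    n_a = sum(counts)   # observed codons for this amino acid
    k = len(counts)     # number of synonymous codons
    chi2 = sum(n_a * (n / n_a - e) ** 2 / e for n, e in zip(counts, expected))  # Eq. (6)
    return (chi2 + n_a - k) / (k * (n_a - 1))                                    # Eq. (8)
```

With uniform expected frequencies, `f_prime` reduces to the $F_a$ of formula (3). When the observed usage matches a skewed expectation, for example `f_prime([8, 2], [0.8, 0.2])`, the result equals that of unbiased usage under a uniform background, which is the correction ENc′ is meant to provide.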

The ENc′ departs from 61 when the codon usage does not match the distribution predicted by nucleotide composition. The ENC and ENc′ scores for the coding sequences of each organism were calculated by the ENCprime program (GitHub user jnovembre, commit 0ead568, October 2016) using the default settings. Highly expressed genes tend to have a strong codon usage bias [56, 57] and are thus expected to have low ENc′ values. Therefore, the 10% of genes with the lowest ENc′ values were classified as highly expressed genes, and the 10% of genes with the highest ENc′ values were classified as lowly expressed genes. The SUR3-SUR10 measures were calculated for each CDS and regulatory sequence of the highly and lowly expressed genes of all microorganisms. Then, each SURi, 3 ≤ i ≤ 10, was normalized by the length of the sequence. Using the normalized SURi values of highly and lowly expressed genes of each microorganism, we compared their distributions with a Mann–Whitney U-test [58]. We performed the test for two opposite hypotheses in each genomic region: the first hypothesis was that the SURi values of the lowly expressed genes were higher than those of the highly expressed genes, and the second was that the SURi values of the highly expressed genes were higher than those of the lowly expressed genes. A p-value lower than 0.05 was considered significant, and a "significant result" implies that the SURi values in one expression group are significantly lower than those in the other. Highly expressed genes are usually highly important housekeeping genes, which strengthens the case that highly important areas of the genome are under counter-selection against SUR and instability-promoting sequences [59, 60].
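The grouping and the one-sided comparison described above can be sketched as follows. This is an illustrative, standard-library-only version: the function names are hypothetical, and the p-value uses the normal approximation to the Mann–Whitney U statistic without a tie correction, which is an assumption and not necessarily the exact test variant used in the study.

```python
import math


def split_by_enc_prime(enc_prime: dict[str, float], fraction: float = 0.10):
    """Return (highly expressed, lowly expressed) gene lists.

    Genes with the lowest ENc' values are treated as highly expressed and
    genes with the highest values as lowly expressed.
    """
    ranked = sorted(enc_prime, key=enc_prime.get)
    n = max(1, int(len(ranked) * fraction))
    return ranked[:n], ranked[-n:]


def mann_whitney_greater(x: list[float], y: list[float]) -> float:
    """One-sided p-value for the hypothesis that values in x tend to exceed those in y."""
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)  # U statistic of x
    n, m = len(x), len(y)
    mean = n * m / 2
    sd = math.sqrt(n * m * (n + m + 1) / 12)  # no tie correction (simplification)
    z = (u - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the standard normal
```

Testing the first hypothesis for a given SURi would then amount to calling `mann_whitney_greater(sur_low, sur_high)` on the normalized SURi values of the two groups, and the second hypothesis to swapping the arguments.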

3 ESO Detection of SSR and RMD Sites

As previously described, the ESO detects various hypermutable sites in order to replace them during the sequence optimization process, following the workflow shown in Fig. 6. A method to detect SSR and RMD sites was introduced in [15]. Its authors use empirical data from [12, 61–64] to estimate the mutation rates of these sites in E. coli, assuming and demonstrating an exponential decay. While these rates are estimated for E. coli, the underlying mechanism of mutation for these sites is similar in other organisms; thus, SSR and RMD sites that are more likely to mutate in E. coli are expected to be more likely to mutate in other organisms as well.


Fig. 6 Optimization workflow—full blocks are necessary inputs and outputs, white blocks are optional inputs and outputs, and orange circles are internal objects used by the ESO in the workflow

For SSR sites, the factors impacting a site's mutation rate per generation (denoted $\mu_{SSR}$) are the length of its base unit (denoted $L$) and its multiplicity factor (denoted MF). For example, ATATATATA could have a base unit (AT) or (TA) with L = 2, MF = 4, or a base unit (ATAT) or (TATA) with L = 4, MF = 2; the representation most likely to mutate is selected. The mutation rates are calculated for all SSR candidates within the sequence, defined as all short subsequences (L ≤ 15) satisfying (MF ≥ 4, L = 1) or (MF ≥ 3, L ≥ 2). Their mutation rate is estimated by the empirical equation presented in [15]:

$$\log \mu_{SSR} = \begin{cases} -12.9 + 0.729 \times MF, & L = 1 \\ -4.749 + 0.063 \times MF, & L > 1 \end{cases}$$

RMD candidates are long (L ≥ 16), identical subsequences that appear in separate locations within a given sequence. The factors impacting an RMD site's mutation rate (denoted $\mu_{RMD}$) are the length of the identical sites (denoted $L$) and the distance between them (denoted $L_s$). Their mutation rate is also estimated in [15]:

$$\mu_{RMD} = (A + L_s)^{-\alpha / L} \times \frac{L}{1 + B L}$$

where $A$, $B$, and $\alpha$ are empirically determined factors: A = 5.8 ± 0.4, B = 1465.6 ± 50.0, and α = 29.0 ± 0.1.
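The two rate estimates and the SSR candidate rule can be written directly as small Python functions. This is a sketch with illustrative function names, using the central values of A, B, and α given above; the base of the logarithm is not stated in the text, so the SSR function returns the log-rate as is.

```python
def is_ssr_candidate(mf: int, unit_len: int) -> bool:
    """Candidate rule from the text: L <= 15 and (MF >= 4, L = 1) or (MF >= 3, L >= 2)."""
    if unit_len > 15:
        return False
    return (unit_len == 1 and mf >= 4) or (unit_len >= 2 and mf >= 3)


def ssr_log_rate(mf: int, unit_len: int) -> float:
    """Log of the SSR mutation rate per generation (log base as in the source equation)."""
    if unit_len == 1:
        return -12.9 + 0.729 * mf
    return -4.749 + 0.063 * mf


def rmd_rate(repeat_len: int, spacer_len: int,
             a: float = 5.8, b: float = 1465.6, alpha: float = 29.0) -> float:
    """RMD mutation rate for two identical sites of length L separated by L_s bases."""
    return (a + spacer_len) ** (-alpha / repeat_len) * repeat_len / (1 + b * repeat_len)
```

Under these formulas, the estimated RMD rate falls as the distance between the repeats grows and rises with repeat length, and the SSR log-rate rises linearly with MF.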

4 Epigenetics and the Challenges It Presents to Synthetic Biology

Genetic instability is not the only factor hindering the stability of synthetic biology constructs; epigenetics also poses a major challenge. Epigenetics is a term for numerous modifications upon the DNA that can change an organism's phenotype without altering its DNA sequence. Epigenetic mechanisms mainly alter the expression pattern of genes by changing the three-dimensional structure of the DNA and by attaching several modifications to the DNA that can limit its accessibility to RNA polymerase (but not in all organisms) [28, 65]. In eukaryotes, DNA is tightly packed around nucleosomes, formed from histones. Histones carry tails that are continuously post-translationally modified by several proteins. These post-translational modifications control the affinity between the nucleosomes [66]. In an area where all the histones are modified in a manner that increases their affinity to each other, the area becomes tightly packed and gene expression drops significantly or is shut down altogether. These post-translational modifications are transient and interchangeable in nature. Furthermore, in every replication, all the histones are removed from the chromatid to clear the way for the replication fork. They then need to be re-established in the proper place and order to maintain the epigenetic state of the cell. This epigenetic "picture" needs to be copied entirely and accurately to the newly synthesized DNA strand [67]. Thus, changes in the epigenetic state of the cell can occur quite easily. In an organism expressing a synthetic gene, with the associated metabolic burden and no benefit to the cell, it is a matter of time before this synthetic gene is epigenetically silenced. Silencing increases the fitness of the silenced individual, leading to the loss of expression of the synthetic gene in the whole population (Fig. 7). A second type of epigenetic regulation of gene expression pattern is DNA methylation [68].
DNA methylation, especially near the promoter region of the gene, hinders the ability of RNA polymerase to attach to the promoter and thus transcribe the gene [35]. As such, DNA methylation is a powerful tool for silencing genes. Only some organisms exhibit DNA methylation (e.g., the budding yeast lost it during evolution). For those that do (e.g., insects and mammals), this poses a major challenge for the stable expression of synthetic genes [36].

Fig. 7 Epigenetic effect on gene silencing. (a) Post-translational modification of histones affects their affinity to each other, which affects the density of histones in a specific region. Modification encouraging higher density in the region will lead to the DNA being inaccessible to RNA polymerase and to gene silencing. (b) DNA methylation, mainly in the promoter region, causes a decrease in the ability of RNA polymerase to attach to the region and transcribe it, leading to loss of expression and silencing of the gene

While there is currently no solution on the horizon to epigenetic silencing of synthetic genes resulting from changes to the three-dimensional structure of the genome, it is possible to greatly reduce the probability of incurring DNA methylation in a given sequence. The likelihood of DNA methylation was found empirically for different sequences and encapsulated within methylation motifs that promote DNA methylation in a given sequence [69, 70]. By using these motifs and avoiding sites likely to promote DNA methylation in a synthetic construct, the probability of a synthetic gene being epigenetically silenced is reduced.
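Motif avoidance of this kind is commonly implemented as a sliding-window scan. The following Python sketch assumes each motif is given as a list of per-position nucleotide probabilities and that the score is the log-ratio of those probabilities to the host's background frequencies. The function name, the toy motif, and the threshold are hypothetical and are not taken from [69].

```python
import math


def scan_motif(seq: str, motif_probs: list[dict[str, float]],
               background: dict[str, float], threshold: float = 0.0):
    """Return (start, score) for every window whose log-odds score reaches `threshold`.

    All probabilities must be non-zero (e.g., after adding pseudocounts).
    """
    width = len(motif_probs)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(math.log2(col[base] / background[base])
                    for col, base in zip(motif_probs, window))
        if score >= threshold:
            hits.append((start, round(score, 3)))
    return hits


# Toy two-position motif that favors "CG" (illustrative values only).
toy_motif = [
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
]
uniform_background = {base: 0.25 for base in "ACGT"}
hits = scan_motif("ATCGTACG", toy_motif, uniform_background)
```

Windows returned by such a scan would be the candidate sites to mark for avoidance in a later optimization step.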

5 ESO Detection of Methylation Motifs

As previously noted, DNA methylation is a powerful silencing mechanism in certain organisms, such as mammalian and insect cells. For these organisms, avoiding sites that are likely to promote DNA methylation should be a primary objective in ensuring continued expression. Fortunately, the probabilities of subsequences matching DNA methylation sites were summarized using position-specific scoring matrices (PSSMs) by [69]. They identified 313 methylation motifs within brain, liver, and pancreatic cells, and for each motif, they found the likelihood of each nucleotide per position, normalized by the overall likelihood of the nucleotide in the host. Thus, taking one motif at a time, the probability that each subsequence within a host sequence matches the motif can be estimated. As the reduction in expression due to different methylation motifs has not been analyzed so far, this probability of matching different motifs is the best proxy available for finding methylation sites. Thus, per motif and per subsequence (defined by a sliding window), the probability of matching is estimated, and the most likely sites are marked for avoidance in the optimization process. We note that PSSMs are a common analytical tool in genomics [71]. Thus, if users have custom sites to avoid in their sequence (whether encapsulated in an external library or custom-made), they may use other PSSMs accordingly. This accommodates many different engineering needs, such as avoiding specific transcription sites, protein-protein interactions, and protein functions.

5.1 ESO Automatic DNA Sequence Optimization

The ESO, as opposed to its predecessors, is intended to provide automatically optimized and corrected sequences for any input. For this purpose, it uses DNA Chisel, an efficient and flexible sequence optimization framework released by the Edinburgh Genome Foundry [72]. This framework can combine multiple objectives by first applying hard constraints (minimal requirements that must be satisfied) and then optimizing an objective function within these constraints. Users should note that the more constraints they use, even assuming they do not clash, the narrower the search space becomes and the weaker the solution found for the objective function is likely to be. The following hard constraints are employed by the optimization mechanism:

1. Following the detection of hypermutable sites, sites prone to epigenetic silencing, and/or other custom sites to avoid, these sites are denoted as hard constraints (sequences to avoid).
2. The amino acid translation of the ORF is conserved.
3. The GC content can be constrained, and this constraint is applied on a sliding window along the sequence. This ensures that there will be no locally GC-enriched or GC-deficient regions.
4. Sites may be "locked" by the user, ensuring that they are not changed in the optimization process.

In addition to stability, the ESO may optimize the sequence for high expression levels, assuming a host organism and optimization objective are provided. It does so by employing common codon matching methods as possible training objectives. These methods include:

1. Replacing each codon with the most frequent synonymous codon (CAI optimization).
2. Matching the final sequence's codon usage to the target species' codon usage profile.
3. Harmonizing RCA [73]: also codon usage matching, but taking into account the different roles of codons in target and host organisms.

The organisms for which codon optimization is available are Bacillus subtilis, Caenorhabditis elegans, Drosophila melanogaster, E. coli, Gallus gallus, Homo sapiens, Mus musculus, Mus musculus domesticus, and S. cerevisiae.


We note that avoiding hypermutable sites is not expected to influence the overall GC content (or other hard constraints) significantly. Thus, the optimization is done in three steps:

1. Resolving all other hard constraints and optimization objectives, generating an intermediate sequence.
2. Detecting SSR, RMD, and possibly DNA methylation or custom sites.
3. Optimizing a second time, taking avoidance of these sites as hard constraints, which is a more easily satisfied optimization problem.
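The final step, removing detected sites while conserving the translation and respecting locked positions, can be illustrated with a small greedy recoder in pure Python. This is a deliberately simplified stand-in and not DNA Chisel or the ESO implementation: it assumes the standard genetic code and an in-frame ORF, ignores the GC-content constraint, and all names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(orf: str) -> str:
    """Translate an in-frame ORF with the standard genetic code."""
    return "".join(CODON_TO_AA[orf[i:i + 3]] for i in range(0, len(orf), 3))


def count_hits(seq: str, patterns: list[str]) -> int:
    """Number of (overlapping) occurrences of any pattern to avoid."""
    return sum(seq.startswith(p, i) for p in patterns for i in range(len(seq)))


def avoid_patterns(orf: str, patterns: list[str], locked: frozenset = frozenset()) -> str:
    """Greedily apply synonymous codon changes until no pattern remains.

    `locked` holds codon indices that must not be changed.
    """
    seq = orf
    while count_hits(seq, patterns) > 0:
        best = None
        for idx in range(len(seq) // 3):
            if idx in locked:
                continue
            old = seq[3 * idx:3 * idx + 3]
            for alt, aa in CODON_TO_AA.items():
                if alt == old or aa != CODON_TO_AA[old]:
                    continue  # keep the amino acid translation conserved
                candidate = seq[:3 * idx] + alt + seq[3 * idx + 3:]
                if best is None or count_hits(candidate, patterns) < count_hits(best, patterns):
                    best = candidate
        if best is None or count_hits(best, patterns) >= count_hits(seq, patterns):
            raise ValueError("no synonymous change removes the remaining sites")
        seq = best
    return seq


recoded = avoid_patterns("GCTGCTGCTGCT", ["GCTGCTGCT"])
```

In this toy example, a run of four identical alanine codons containing a GCT repeat with MF = 3 is recoded so that the repeat disappears while the protein sequence is unchanged. A production optimizer additionally balances such changes against codon usage objectives and GC windows.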

6 Conclusions

In conclusion, the success of genetic engineering depends on the stability of the introduced DNA sequences in the host organism's genome. However, genomic instability and epigenetic silencing present major challenges to efficient and predictable gene expression. We show, for the first time, evidence of evolutionary selection against patterns that decrease genomic stability. The development of software tools that can detect and remove unstable sequences in synthetic DNA constructs represents a significant step toward improving the reliability and efficiency of genetic engineering. The ESO software is an example of such a tool, using a scanning algorithm to identify and eliminate unstable sequences in synthetic DNA. By addressing the problem of genomic instability, tools such as ESO will enable more efficient and predictable gene expression in a variety of applications. With ongoing advancements in synthetic biology and genome editing technologies, software tools such as ESO will continue to play an important role in the development of next-generation genetic engineering solutions. We believe that a better understanding of how evolution "deals" with genetic instability in various organisms across the tree of life may help develop the next generation of tools for designing stable synthetic biology constructs.

Acknowledgement

S.B. is supported by a fellowship from the Edmond J. Safra Center for Bioinformatics at Tel Aviv University. This study was partially funded by the Israel Innovation Authority. Figures 2, 3, and 5 were created using Biorender.com.

Bibliography

1. Lenski RE, Nguyen TT (1988) Stability of recombinant DNA and its effects on fitness. Trends Ecol Evol 3:S18
2. Lenski RE (1991) Quantifying fitness and gene stability in microorganisms. Biotechnology (Reading, Mass) 15:173
3. Sleight SC, Sauro HM (2013) Visualization of evolutionary stability dynamics and competitive fitness of Escherichia coli engineered with randomized multigene circuits. ACS Synth Biol 2:519–528
4. Couto JM, McGarrity A, Russell J, Sloan WT (2018) The effect of metabolic stress on genome stability of a synthetic biology chassis Escherichia coli K12 strain. Microb Cell Factories 17:8
5. Bandopadhyay R, Haque I, Singh D, Mukhopadhyay K (2010) Levels and stability of expression of transgenes. Transgenic Crop Plants: Principles and Development 145–186. https://doi.org/10.1007/978-3-642-04809-8_5
6. Arbel M, Liefshitz B, Kupiec M (2021) DNA damage bypass pathways and their effect on mutagenesis in yeast. FEMS Microbiol Rev 45. https://doi.org/10.1093/femsre/fuaa038
7. Volkova NV et al (2020) Mutational signatures are jointly shaped by DNA damage and repair. Nat Commun 11:2169
8. Menuhin-Gruman I et al (2021) Evolutionary stability optimizer (ESO): a novel approach to identify and avoid mutational hotspots in DNA sequences while maintaining high expression levels. ACS Synth Biol 11:1142. https://doi.org/10.1021/acssynbio.1c00426
9. Lovett ST (2004) Encoded errors: mutations and rearrangements mediated by misalignment at repetitive DNA sequences. Mol Microbiol 52:1243. https://doi.org/10.1111/j.1365-2958.2004.04076.x
10. Bishop AJR, Schiestl RH (2000) Homologous recombination as a mechanism for genome rearrangements: environmental and genetic effects. Hum Mol Genet 9:2427
11. Kiktev DA, Sheng Z, Lobachev KS, Petes TD (2018) GC content elevates mutation and recombination rates in the yeast Saccharomyces cerevisiae. Proc Natl Acad Sci 115:E7109–E7118
12. Vogler AJ et al (2006) Effect of repeat copy number on variable-number tandem repeat mutations in Escherichia coli O157:H7. J Bacteriol 188:4253. https://doi.org/10.1128/JB.00001-06
13. Viguera E, Canceill D, Ehrlich SD (2001) Replication slippage involves DNA polymerase pausing and dissociation. EMBO J 20:2587
14. Rogozin IB, Pavlov YI (2003) Theoretical analysis of mutation hotspots and their DNA sequence context specificity. Mutat Res Rev Mutat Res 544:65
15. Jack BR et al (2014) Predicting the genetic stability of engineered DNA sequences with the EFM calculator. ACS Synth Biol 4:939. https://doi.org/10.1021/acssynbio.5b00068
16. Sung P, Klein H (2006) Mechanism of homologous recombination: mediators and helicases take on regulatory functions. Nat Rev Mol Cell Biol 7:739
17. Cannan WJ, Pederson DS (2016) Mechanisms and consequences of double-strand DNA break formation in chromatin. J Cell Physiol 231:3
18. Chang HHY, Pannunzio NR, Adachi N, Lieber MR (2017) Non-homologous DNA end joining and alternative pathways to double-strand break repair. Nat Rev Mol Cell Biol 18:495
19. Lee B-G et al (2016) ELG1, a yeast gene required for genome stability, forms a complex related to replication factor C. Mol Cell 10. https://doi.org/10.1038/s41467-019-10376-w
20. Symington LS (2014) End resection at double-strand breaks: mechanism and regulation. Cold Spring Harb Perspect Biol 6:a016436
21. Agmon N, Liefshitz B, Zimmer C, Fabre E, Kupiec M (2013) Effect of nuclear architecture on the efficiency of double-strand break repair. Nat Cell Biol 15:694
22. Renkawitz J, Lademann CA, Jentsch S (2014) Mechanisms and principles of homology search during recombination. Nat Rev Mol Cell Biol 15:369
23. Donnianni RA et al (2019) DNA polymerase delta synthesizes both strands during break-induced replication. Mol Cell 76:371
24. McVey M, Khodaverdian VY, Meyer D, Cerqueira PG, Heyer WD (2016) Eukaryotic DNA polymerases in homologous recombination. Annu Rev Genet 50:393
25. Piazza A, Heyer WD (2019) Homologous recombination and the formation of complex genomic rearrangements. Trends Cell Biol 29:135
26. Tan L, Strong EJ, Woods K, West NP (2018) Homologous alignment cloning: a rapid, flexible and highly efficient general molecular cloning method. PeerJ 2018:e5146
27. Rozwadowski K, Yang W, Kagale S (2008) Homologous recombination-mediated cloning and manipulation of genomic DNA regions using Gateway and recombineering systems. BMC Biotechnol 8:88
28. Allis CD, Jenuwein T (2016) The molecular hallmarks of epigenetic control. Nat Rev Genet 17:487
29. Cutter AR, Hayes JJ (2015) A brief review of nucleosome structure. FEBS Lett 589:2914
30. Lai WKM, Pugh BF (2017) Understanding nucleosome dynamics and their links to gene expression and DNA replication. Nat Rev Mol Cell Biol 18:548
31. Padeken J, Methot SP, Gasser SM (2022) Establishment of H3K9-methylated heterochromatin and its functions in tissue differentiation and maintenance. Nat Rev Mol Cell Biol 23:623
32. Grewal SIS, Moazed D (2003) Heterochromatin and epigenetic control of gene expression. Science 301:798
33. Moore LD, Le T, Fan G (2013) DNA methylation and its basic function. Neuropsychopharmacology 38:23
34. Anastasiadi D, Esteve-Codina A, Piferrer F (2018) Consistent inverse correlation between DNA methylation of the first intron and gene expression across tissues and species. Epigenetics Chromatin 11:37
35. Cholewa-Waclaw J et al (2019) Quantitative modelling predicts the impact of DNA methylation on RNA polymerase II traffic. Proc Natl Acad Sci U S A 116:14995–15000
36. Nasrullah, Hussain A, Ahmed S, Rasool M, Shah AJ (2022) DNA methylation across the tree of life, from micro to macro-organism. Bioengineered 13:1666
37. Reha-Krantz LJ (2010) DNA polymerase proofreading: multiple roles maintain genome stability. Biochim Biophys Acta, Proteins Proteomics 1804:1049
38. Oman M, Alam A, Ness RW (2022) How sequence context-dependent mutability drives mutation rate variation in the genome. Genome Biol Evol 14:evac032
39. Duan C et al (2018) Reduced intrinsic DNA curvature leads to increased mutation rate. Genome Biol 19:132
40. Renda BA, Hammerling MJ, Barrick JE (2014) Engineering reduced evolutionary potential for synthetic biology. Mol BioSyst 10:1668. https://doi.org/10.1039/c3mb70606k
41. Bębenek A, Ziuzia-Graczyk I (2018) Fidelity of DNA replication - a matter of proofreading. Curr Genet 64:985
42. Bzymek M, Lovett ST (2001) Evidence for two mechanisms of palindrome-stimulated deletion in Escherichia coli: single-strand annealing and replication slipped mispairing. Genetics 158:527
43. Li YC, Korol AB, Fahima T, Nevo E (2004) Microsatellites within genes: structure, function, and evolution. Mol Biol Evol 21:991
44. Bhargava A, Fuentes FF (2010) Mutational dynamics of microsatellites. Mol Biotechnol 44:250
45. Huang RX, Zhou PK (2020) DNA damage response signaling pathways and targets for radiotherapy sensitization in cancer. Signal Transduct Target Ther 5:60
46. Huang D, Piening BD, Paulovich AG (2013) The preference for error-free or error-prone postreplication repair in Saccharomyces cerevisiae exposed to low-dose methyl methanesulfonate is cell cycle dependent. Mol Cell Biol 33:1515
47. Suzuki T et al (2021) Error-prone bypass patch by a low-fidelity variant of DNA polymerase zeta in human cells. DNA Repair (Amst) 100:103052
48. Smirnova M, Klein HL (2003) Role of the error-free damage bypass postreplication repair pathway in the maintenance of genomic stability. Mutat Res - Fundam Mol Mech Mutagen 532:117
49. Vorontsov IE et al (2016) Negative selection maintains transcription factor binding motifs in human cancer. BMC Genomics 17:395
50. Nitta KR et al (2015) Conservation of transcription factor binding specificities across 600 million years of bilateria evolution. eLife 2015:e04837
51. Wright F (1990) The ‘effective number of codons’ used in a gene. Gene 87:23
52. Fuglsang A (2006) Accounting for background nucleotide composition when measuring codon usage bias: brilliant idea, difficult in practice. Mol Biol Evol 23:1345
53. Peeri M, Tuller T (2020) High-resolution modeling of the selection on local mRNA folding strength in coding sequences across the tree of life. Genome Biol 21:63
54. Satapathy SS, Sahoo AK, Ray SK, Ghosh TC (2017) Codon degeneracy and amino acid abundance influence the measures of codon usage bias: improved Nc (N^c) and ENCprime (N^′c) measures. Genes Cells 22:277
55. Hershberg R, Petrov DA (2009) General rules for optimal codon choice. PLoS Genet 5:e1000556
56. Bahiri-Elitzur S, Tuller T (2021) Codon-based indices for modeling gene expression and transcript evolution. Comput Struct Biotechnol J 19:2646
57. Frumkin I et al (2018) Codon usage of highly expressed genes affects proteome-wide translation efficiency. Proc Natl Acad Sci U S A 115:E4940–E4949
58. Mann HB, Whitney DR (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 18:50
59. Dötsch A et al (2010) Evolutionary conservation of essential and highly expressed genes in Pseudomonas aeruginosa. BMC Genomics 11:234
60. Karlin S, Mrázek J, Campbell A, Kaiser D (2001) Characterizations of highly expressed genes of four fast-growing bacteria. J Bacteriol 183:5025
61. Oliveira PH, Lemos F, Monteiro GA, Prazeres DMF (2008) Recombination frequency in plasmid DNA containing direct repeats - predictive correlation with repeat and intervening sequence length. Plasmid 60:159. https://doi.org/10.1016/j.plasmid.2008.06.004
62. U’Ren JM et al (2007) Tandem repeat regions within the Burkholderia pseudomallei genome and their application for high resolution genotyping. BMC Microbiol 7. https://doi.org/10.1186/1471-2180-7-23
63. Girard JM et al (2004) Differential plague-transmission dynamics determine Yersinia pestis population genetic structure on local, regional, and global scales. Proc Natl Acad Sci U S A 101:8408. https://doi.org/10.1073/pnas.0401561101
64. Lee H, Popodi E, Tang H, Foster PL (2012) Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proc Natl Acad Sci U S A 109:E2774. https://doi.org/10.1073/pnas.1210309109
65. Stajic D, Perfeito L, Jansen LET (2019) Epigenetic gene silencing alters the mechanisms and rate of evolutionary adaptation. Nat Ecol Evol 3:491
66. Allshire RC, Madhani HD (2018) Ten principles of heterochromatin formation and function. Nat Rev Mol Cell Biol 19:229
67. Panne D et al (2018) Mechanistic insights into histone deposition and nucleosome assembly by the chromatin assembly factor-1. Nucleic Acids Res 46:9907–9917
68. Mattei AL, Bailly N, Meissner A (2022) DNA methylation: a historical perspective. Trends Genet 38:676
69. Wang M et al (2019) Identification of DNA motifs that regulate DNA methylation. Nucleic Acids Res 47:6753–6768
70. Scala G, Federico A, Greco D (2021) CpGmotifs: a tool to discover DNA motifs associated to CpG methylation events. BMC Bioinformatics 22:278
71. Nielsen H, Tsirigos KD, Brunak S, von Heijne G (2019) A brief history of protein sorting prediction. Protein J 38:200
72. Zulkower V, Rosser S (2020) DNA Chisel, a versatile sequence optimizer. Bioinformatics 36:4508–4509
73. Claassens NJ et al (2017) Improving heterologous membrane protein production in Escherichia coli by combining transcriptional tuning and codon usage algorithms. PLoS One 12:e0184355

Chapter 22 Genetic Network Design Automation with LOICA

Gonzalo Vidal, Carolus Vitalis, Tamara Matúte, Isaac Núñez, Fernán Federici, and Timothy J. Rudge

Abstract

Genetic design automation (GDA) is the use of computer-aided design (CAD) in designing genetic networks. GDA tools are necessary to create more complex synthetic genetic networks in a high-throughput fashion. At the core of these tools is the abstraction of a hierarchy of standardized components. The components’ input, output, and interactions must be captured and parametrized from relevant experimental data. Simulations of genetic networks should use those parameters and include the experimental context to be compared with the experimental results. This chapter introduces Logical Operators for Integrated Cell Algorithms (LOICA), a Python package used for designing, modeling, and characterizing genetic networks using a simple object-oriented design abstraction. LOICA represents different biological and experimental components as classes that interact to generate models. These models can be parametrized by direct connection to the Flapjack experimental data management platform to characterize abstracted components with experimental data. The models can be simulated using stochastic simulation algorithms or ordinary differential equations with varying noise levels. The simulated data can be managed and published using Flapjack alongside experimental data for comparison. LOICA genetic network designs can be represented as graphs and plotted as networks for visual inspection and serialized as Python objects or in the Synthetic Biology Open Language (SBOL) format for sharing and use in other designs.

Key words Computer-Aided Design, Genetic Design Automation, Genetic Network, Modeling, SBOL, Synthetic Biology

Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_22, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024

1 Introduction

Synthetic biology, or engineering biology, as an engineering discipline has the Design, Build, Test, Learn (DBTL) cycle at its core. Modeling is key to the DBTL cycle and is essential to the Design and Learn stages, given that a model states a well-defined hypothesis about the system operation. Therefore, synthetic biologists have used mathematical and computational models to represent their hypotheses about how biological systems behave at the Design stage. Physical implementations of that biological system are constructed at the Build stage, and measurements are obtained from samples of it at the Test stage. Then, experimental data should be analyzed, characterized, and compared with the design to extract information at the Learn stage. The information generated at the Test stage should feed into the Learn and Design stages to increase the understanding of the biological system, improving future models and designs.

Abstraction enables the construction and analysis of models based on components, devices, and systems that can be used to compose genetic networks and derive their DNA sequences. It is the basis for genetic design automation (GDA), a computer-aided design (CAD) approach to genetic design, which can accelerate and automate the synthetic genetic network design process by compiling models into DNA sequences. In order for GDA to proceed in a rational way, the abstract elements of genetic networks must be accessible to characterization, allowing parametrization of models of their operation and interactions. GDA tools enable the functional decoupling between researchers working at the sequence level and researchers working at the systems level [1, 2].

Standardization is at the foundations of synthetic biology, enabling the collaborative work of a community. The Synthetic Biology Open Language (SBOL) is a free and open-source standard for the representation of biological designs [3]. The SBOL standard was developed by the synthetic biology community to create a standardized format for the electronic exchange of information on the structural and functional aspects of biological designs. This community developed an ecosystem of software tools to, for example, design, model, visualize, store, and share SBOL files [4–10]. Flapjack is a tool from that ecosystem for synthetic biology experimental data management [9]. Experimental data in Flapjack can be linked to SBOL metadata in SynBioHub [5].
Logical Operators for Integrated Cell Algorithms (LOICA) is a Python package allowing programmatic design, simulation, parametrization, and analysis of genetic networks [11]. While perhaps not as accessible as a graphical user interface, this approach is more flexible, extensible, and amenable to automation. It can be easily combined with the large ecosystem of biological Python projects [9, 12–14], and uses simple programming concepts that are commonly understood by researchers from a range of disciplines. LOICA genetic network designs can be represented as graphs and plotted as networks for visual inspection. Furthermore, Operators and Metabolism have models of gene expression and growth whose parameters can be defined as input or obtained through the analysis of relevant experimental data [15]. Two-way communication with Flapjack allows LOICA to get experimental data for model parametrization and to upload simulated data to the platform in a straightforward way. GeneticNetworks can generate an SBOL representation that can be shared and used as a component in another design. Here, we describe some use cases and how LOICA can be used to design, visualize, and model genetic networks.

2 Materials

2.1 Dependencies

1. Installing LOICA via pip will automatically install the required Python packages. LOICA itself requires Python 3.8 or a later version. We recommend the use of an environment manager, such as Anaconda (https://www.anaconda.com/).

2. Additionally, users must install pyFlapjack (see Note 1) to be able to interact with Flapjack (see next chapter).

2.2 Installation

1. The LOICA Python package is distributed through the Python Package Index (PyPI) and is installed and updated with pip. The latest stable release of LOICA can be installed using the following command (see Note 2):

    pip install loica

2. To verify that the installation was successful, users should be able to run the following Python statement with no errors:

    import loica as lc
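As a complement to the import check above, the installed release can be compared against the version mentioned in Note 2. The following is a minimal sketch that relies only on the Python standard library; the helper name installed_version is hypothetical and is not part of LOICA.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent.

    Hypothetical helper for illustration; it only queries package metadata
    and does not import the package itself.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed LOICA version, or None if LOICA is not installed.
print(installed_version("loica"))
```

If the printed value is None, the package was most likely installed into a different environment from the one running the interpreter.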

3 Methods

In the following sections, we detail how LOICA can be used to generate genetic network designs, graph visualizations, and simulations. We exemplify its use by designing a NOR gate [16], a repressilator [17], and characterizing an inverter, covering the main use cases. To learn more about the advanced features, we recommend consulting our previous publication [11] and exploring the source repository https://github.com/RudgeLab/LOICA with more examples and tutorials, as well as the documentation https://loica.readthedocs.io.

3.1 Designing a NOR Gate

To create a NOR genetic network design in LOICA, we instantiate two Supplements representing acyl homoserine lactones (AHLs), each of which induces a Receiver Operator that expresses a regulator. These regulators, in turn, repress a Hill2 Operator that expresses a GFP Reporter. This process can be described step by step, as follows:

1. Create a GeneticNetwork (Table 2).

    nor = lc.GeneticNetwork(vector=0)

Table 1 Supplement

Parameter | Description | Format/values | Default
name | Name of the supplement | str | n/a
concentration | Concentration of the supplement in Molar | int or float | n/a
pubchemid | PubChemID URI of the supplement | str | n/a
supplier_id | Supplier ID of the supplement. A URL of the product that you acquire. Accepts a list of the form [product URL, catalog number, batch]. | str or List | n/a
sbol_comp | SBOL component of the supplement | str | n/a

Table 2 Genetic network

Parameter | Description | Format/values | Default
operators | List of Operators that are part of the genetic network | List | []
regulators | List of Regulators that are part of the genetic network | List | []
reporters | List of Reporters that are part of the genetic network | List | []
vector | Flapjack ID of the vector that is associated with the genetic network. If none, use 0. | int or float | n/a

2. Create Supplements (Table 1) (see Note 3).

    ahl1 = lc.Supplement(name="AHL1")
    ahl2 = lc.Supplement(name="AHL2")

3. Create Regulators (Table 3) and add them to the nor GeneticNetwork. In this example, we use TetR and LacI.

    tetr_reg = lc.Regulator(name="TetR", degradation_rate=1)
    laci_reg = lc.Regulator(name="LacI", degradation_rate=1)
    nor.add_regulator([tetr_reg, laci_reg])

4. Create a Reporter (Table 4) and add it to the nor GeneticNetwork. The example shows the creation of a GFP Reporter.

Table 3 Regulator

Parameter | Description | Format/values | Default
name | Name of the gene product | str | n/a
init_concentration | Initial concentration of the gene product in Molar | float | 0
degradation_rate | Degradation rate of the gene product | float | 0
type_ | Molecular type of the gene product, options are ‘PRO’ (protein) or ‘RNA’ (RNA) | str, optional | PRO
uri | SynBioHub URI | str, optional | None
sbol_comp | SBOL Component | SBOL Component, optional | None
color | Color displayed on the network representation | str | lightgreen

Table 4 Reporter

Parameter | Description | Format/values | Default
name | Name of the gene product | str | n/a
init_concentration | Initial concentration of the gene product in Molar | int or float | 0
degradation_rate | Degradation rate of the gene product | int or float | 0
type_ | Molecular type of the gene product, can be ‘PRO’ or ‘RNA’ | str, optional | PRO
uri | SynBioHub URI | str, optional | None
sbol_comp | SBOL Component | SBOL Component, optional | None
signal_id | Flapjack ID of the Signal that the reporter is associated with | str, optional | None
color | Color displayed on the network representation | str, optional | white

    gfp_rep = lc.Reporter(name="GFP", degradation_rate=1, signal_id=0, color="green")
    nor.add_reporter(gfp_rep)

5. Create Operators and add them to the nor GeneticNetwork. The example shows the creation of two Receiver Operators (Table 5) and one Hill2 Operator (Table 6), modeling a transcriptional NOR gate.

Table 5 Operator receiver

Parameter | Description | Format/values | Default
input | The input of the operator that regulates the expression of the output | Regulator or Supplement | n/a
output | The output of the operator that is regulated by the input | Regulator or Reporter | n/a
alpha | [Basal expression rate, Regulated expression rate in MEFL/second] | List | n/a
K | Half expression input concentration in Molar | int or float | n/a
n | Hill coefficient, cooperative degree (unitless) | int or float | n/a
uri | SynBioHub URI | str, optional | None
sbol_comp | SBOL Component | SBOL Component, optional | None
name | Name of the operator displayed on the network representation | str, optional | None
color | Color displayed on the network representation | str, optional | skyblue

Table 6 Operator Hill2

Parameter | Description | Format/values | Default
input | The inputs of the operator that regulate the expression of the output | List[Regulator or Supplement] | n/a
output | The output of the operator that is regulated by the input | Regulator, Reporter, or List | n/a
alpha | Regulated and unregulated expression rates (see LOICA [11]) | List | n/a
K | Half expression input concentration in Molar | List | n/a
n | Hill coefficient, cooperative degree (unitless) | int or float | n/a
uri | SynBioHub URI | str, optional | None
sbol_comp | SBOL Component | SBOL Component, optional | None
name | Name of the operator displayed on the network representation | str, optional | None
color | Color displayed on the network representation | str, optional | orange

This design represents a set of three transcriptional units: one induced by AHL1 that expresses TetR, another induced by AHL2 that expresses LacI, and a third repressed by LacI and TetR that expresses GFP. All the Operators have parameters to model a Hill equation.

    ahl1_rec_tetr = lc.Receiver(name="pahl1", input=ahl1, output=[tetr_reg], alpha=[10000, 0.1], K=10, n=2)
    ahl2_rec_laci = lc.Receiver(name="pahl2", input=ahl2, output=laci_reg, alpha=[10000, 0.1], K=10, n=2)
    tetr_laci_nor_gfp = lc.Hill2(name="NOR", input=[tetr_reg, laci_reg], output=gfp_rep, alpha=[10000, 10000, 0.1, 0.1], K=[10, 10], n=[2, 2])
    nor.add_operator([ahl1_rec_tetr, ahl2_rec_laci, tetr_laci_nor_gfp])

6. Plot the graph representation for visual inspection of the design (Fig. 1a).

    import matplotlib.pyplot as plt

    plt.figure(figsize=(3.3, 3.3), dpi=300)
    nor.draw()

7. Alternatively, plot the contracted graph representation for a simplified visual inspection of the design (Fig. 1b).

    plt.figure(figsize=(3.3, 3.3), dpi=100)
    nor.draw(contracted=True)

8. Create a Metabolism. The example shows the creation of a SimulatedMetabolism (Table 7) based on the Gompertz growth model [18].

    def growth_rate(t):
        return lc.metabolism.gompertz_growth_rate(t, 0.01, 1, 1, 1)

    def biomass(t):
        return lc.metabolism.gompertz(t, 0.01, 1, 1, 1)

    metabolism = lc.SimulatedMetabolism("LOICA metab", biomass, growth_rate)

9. Create Samples (Table 8) encapsulating the GeneticNetwork and the Metabolism with different concentrations of Supplements. The example shows the creation of multiple Samples containing the nor GeneticNetwork, the simulated metabolism, and AHLs at different concentrations.

    import numpy as np

    samples = []
    for conc1 in np.logspace(-3, 3, 12):
        for conc2 in np.logspace(-3, 3, 12):
            sample = lc.Sample(genetic_network=nor, metabolism=metabolism)
            sample.set_supplement(ahl1, conc1)
            sample.set_supplement(ahl2, conc2)
            samples.append(sample)

Fig. 1 NOR gate genetic network diagram. (a) GeneticNetwork diagram generated by its draw method, which creates a graph representation and plots it as a network. (b) Contracted version of the GeneticNetwork graph representation. In these networks, light blue nodes are Hill1 Operators, orange nodes are Hill2 Operators, light green nodes are Regulators, pink nodes are Supplements and Reporters are represented in their assigned color, in this case, green. Regulators are not shown in the contracted version. Pointy arrows represent production or induction and blunt arrows represent repression

Table 7 SimulatedMetabolism

Parameter | Description | Format/values | Default
name | Name of the metabolism | str | None
biomass | A function of time that describes biomass, f(t) = biomass | Function | n/a
growth_rate | A function of time that describes the growth rate, f(t) = growth rate | Function | n/a

Table 8 Sample

Parameter | Description | Format/values | Default
genetic_network | Genetic network that is part of the sample | GeneticNetwork | None
metabolism | Metabolism that drives the genetic network in the sample | Metabolism | None
assay | Assay to which this sample belongs | Assay | None
media | Name of the media in the sample | str, optional | None
strain | Name of the strain in the sample | str, optional | None

10. Create an Assay (Table 9). The example shows the creation of an Assay that contains all the Samples created in the previous step and takes 100 measurements every 15 min, with a name, a description, and OD as the biomass signal.

    assay = lc.Assay(samples, n_measurements=100, interval=0.25, name="LOICA NOR gate", description="Simulated NOR gate generated by LOICA", biomass_signal_id=0)

11. Run the Assay. The example shows how to run an Assay with default parameters; this uses ODEs for simulations (Fig. 2).

    assay.run()
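To build intuition for the heatmap produced by this protocol, the steady-state response of a NOR gate can be sketched with NumPy alone. This is an illustrative model under stated assumptions, not LOICA's internal implementation: it assumes a generic Hill form for induction, independent repression by the two regulators, and first-order degradation. The function names, the parameter names (a_min, a_max), and the way the two repression terms are combined are all hypothetical.

```python
import numpy as np


def hill_activation(x, a_min, a_max, K, n):
    """Generic Hill induction: a_min at x = 0, approaching a_max for x >> K."""
    u = (x / K) ** n
    return (a_min + a_max * u) / (1.0 + u)


def nor_steady_state(ahl1, ahl2, a_min=0.1, a_max=10000.0, K=10.0, n=2, gamma=1.0):
    """Steady-state output of an assumed NOR gate model (not LOICA's own equations)."""
    # Each inducer drives one repressor; steady state = production / degradation.
    tetr = hill_activation(ahl1, a_min, a_max, K, n) / gamma
    laci = hill_activation(ahl2, a_min, a_max, K, n) / gamma
    # Assumption: the repressors act independently, so their free-promoter
    # fractions multiply.
    free1 = 1.0 / (1.0 + (tetr / K) ** n)
    free2 = 1.0 / (1.0 + (laci / K) ** n)
    return (a_min + (a_max - a_min) * free1 * free2) / gamma


# Same 12 x 12 log-spaced inducer grid as in the Samples of the protocol.
conc = np.logspace(-3, 3, 12)
grid1, grid2 = np.meshgrid(conc, conc, indexing="ij")
gfp = nor_steady_state(grid1, grid2)
```

In this sketch the output is high only in the corner of the grid where both inducers are low, which is the qualitative signature of a NOR gate.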

3.2 Genetic Ring Oscillator

To create a genetic ring oscillator design in LOICA, similar to the repressilator (see Note 4), we instantiate three repressor Regulators, each of which represses a Hill1 Operator that expresses the next Regulator in a ring fashion (A ⊣ B ⊣ C ⊣ A). One of these Operators also expresses a GFP Reporter in a bicistronic way. The process, including some variations for simulation, can be described step by step as follows:

Table 9 Assay

Parameter | Description | Format/values | Default
samples | List of Samples that belong to the Assay | List[Sample] | n/a
n_measurements | Number of measurements to take | int | n/a
interval | Time in hours between measurements | float | n/a
name | Name of the Assay | str | LOICA assay
description | Description of the Assay | str |
biomass_signal_id | Flapjack ID of the Signal measuring culture growth (e.g. OD600) | int | None

Fig. 2 NOR gate simulation heatmap. NOR gate simulated with a Gompertz metabolism and adding Supplements to Samples in 12 concentrations between -3 and 3 in log space. The Assay runs for 17.5 h measuring every 15 min. High mean expression is shown in yellow and low expression rate is shown in blue

1. Create a GeneticNetwork.

    repressilator = lc.GeneticNetwork(vector=0)

2. Create Regulators and add them to the repressilator GeneticNetwork. In this example, we use TetR, LacI, and CI.

    tetr_reg = lc.Regulator(name="TetR", degradation_rate=1)
    laci_reg = lc.Regulator(name="LacI", degradation_rate=1)
    ci_reg = lc.Regulator(name="cI", degradation_rate=1)
    repressilator.add_regulator([tetr_reg, laci_reg, ci_reg])

3. Create a Reporter and add it to the repressilator GeneticNetwork. The example shows the creation of a GFP Reporter.

    gfp_rep = lc.Reporter(name="GFP", degradation_rate=1, signal_id=0, color="green")
    repressilator.add_reporter(gfp_rep)

4. Create Operators and add them to the repressilator GeneticNetwork. The example shows the creation of three Hill1 Operators (Table 10), modeling repressible transcriptional units: one repressed by LacI that expresses TetR and GFP, one repressed by TetR that expresses CI, and one repressed by CI that expresses LacI. All the Operators have parameters to model a Hill equation.

    laci_not_tetr_gfp = lc.Hill1(name="pLac", input=laci_reg, output=[tetr_reg, gfp_rep], alpha=[10000, 0.1], K=10, n=2)

Table 10 Operator Hill1

Parameter | Description | Format/values | Default
input | The input of the operator that regulates the expression of the output | Regulator or Supplement | n/a
output | The output of the operator that is regulated by the input | Regulator or Reporter | n/a
alpha | [Basal expression rate, Regulated expression rate in MEFL/second] | List | n/a
K | Half expression input concentration in Molar | int or float | n/a
n | Hill coefficient, cooperative degree (unitless) | int or float | n/a
uri | SynBioHub URI | str, optional | None
sbol_comp | SBOL Component | SBOL Component, optional | None
name | Name of the operator displayed on the network representation | str, optional | None
color | Color displayed on the network representation | str, optional | skyblue

    tetr_not_ci = lc.Hill1(name="pTet", input=tetr_reg, output=ci_reg, alpha=[10000, 0.1], K=10, n=2)
    ci_not_laci = lc.Hill1(name="pcI", input=ci_reg, output=laci_reg, alpha=[10000, 0.1], K=10, n=2)
    repressilator.add_operator([laci_not_tetr_gfp, ci_not_laci, tetr_not_ci])

5. Plot the graph representation for visual inspection of the design (Fig. 3a).

    plt.figure(figsize=(3.3, 3.3), dpi=100)
    repressilator.draw()

6. Alternatively, plot the contracted graph representation for a simplified visual inspection of the design (Fig. 3b).

    plt.figure(figsize=(3.3, 3.3), dpi=100)
    repressilator.draw(contracted=True)

7. Create a Metabolism. The example shows the creation of a SimulatedMetabolism based on the Gompertz growth model [18].

    def growth_rate(t):
        return lc.metabolism.gompertz_growth_rate(t, 0.01, 1, 1, 1)

    def biomass(t):
        return lc.metabolism.gompertz(t, 0.01, 1, 1, 1)

    metabolism = lc.SimulatedMetabolism("LOICA metab", biomass, growth_rate)

8. Create a Sample encapsulating the GeneticNetwork and the Metabolism. The example shows the creation of a Sample containing the repressilator and metabolism.

    sample = lc.Sample(genetic_network=repressilator, metabolism=metabolism)

9. Create an Assay. The example shows the creation of an Assay that takes 100 measurements every 15 min, with a name, a description, and OD as the biomass signal.

    assay = lc.Assay([sample], n_measurements=100, interval=0.25, name="LOICA repressilator", description="Simulated repressilator generated by LOICA", biomass_signal_id=0)


Fig. 3 Simple repressilator genetic network diagram. (a) GeneticNetwork diagram generated by its draw method, which creates a graph representation and plots it as a network. (b) Contracted version of the GeneticNetwork graph representation. In these networks light blue nodes are Operators, light green nodes are Regulators and Reporters are represented in their assigned color, in this case green. Regulators are not shown in the contracted version. Pointy arrows represent production or induction and blunt arrows represent repression

10. Run the Assay. The example shows how to run an Assay with default parameters; this uses ODEs for simulations (Fig. 4a).

    assay.run()

11. Alternatively, run the Assay with noise. The example shows how to run an Assay with a noise-to-signal ratio of 10^-3; this uses ODEs with noise for simulations (Fig. 4b).

    assay.run(nsr=1e-3)

Fig. 4 Genetic ring oscillator simulations. (a) Genetic ring oscillator simulation using ordinary differential equations; lines correspond to the signals of OD and green fluorescence over 25 h. (b) Genetic ring oscillator simulation using ordinary differential equations with a noise-to-signal ratio (NSR) of 10^-3; lines correspond to the signals of OD and green fluorescence over 25 h. (c) Genetic ring oscillator simulation using a stochastic algorithm; lines correspond to the signals of OD and green fluorescence over 10 h

12. Let us model this genetic network with another metabolism. Create a Metabolism. The example shows the creation of a SimulatedMetabolism with a constant biomass of 1 and a growth rate of 0.

    def growth_rate(t):
        return 0

    def biomass(t):
        return 1

    metabolism = lc.SimulatedMetabolism("LOICA metab", biomass, growth_rate)

13. Create a Sample encapsulating the GeneticNetwork and the Metabolism. The example shows the creation of a Sample containing the repressilator and metabolism.

    sample = lc.Sample(genetic_network=repressilator, metabolism=metabolism)

14. Create an Assay. The example shows the creation of an Assay that takes 1000 measurements every 0.01 h (10 h in total), with a name, a description, and OD as the biomass signal.

    assay = lc.Assay([sample], n_measurements=1000, interval=1e-2, name="LOICA repressilator", description="Simulated repressilator generated by LOICA", biomass_signal_id=0)

15. Run the Assay. The example shows how to run an Assay with the stochastic option enabled; this uses a stochastic simulation algorithm instead of ODEs (Fig. 4c).

    assay.run(stochastic=True)
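The structure of the model simulated in this section, three repressors in a ring whose concentrations are diluted by growth, can be sketched in plain NumPy. This is a simplified illustration under stated assumptions rather than LOICA's solver: it uses forward Euler integration, a generic Hill repression term, a loss rate equal to degradation plus growth rate, and the Zwietering form of the modified Gompertz curve [18]. All function and parameter names are hypothetical, and the Gompertz parametrization need not match that of lc.metabolism.

```python
import numpy as np


def gompertz_growth_rate(t, A=1.0, mu_max=1.0, lag=1.0):
    """Growth rate d(ln biomass)/dt of a modified Gompertz curve (assumed form)."""
    z = mu_max * np.e / A * (lag - t) + 1.0
    return mu_max * np.e * np.exp(z - np.exp(z))


def no_growth(t):
    """Growth rate of a non-growing culture, as in step 12."""
    return 0.0


def simulate_ring(growth_rate, t_end=25.0, dt=0.01, a_max=10000.0,
                  a_min=0.1, K=10.0, n=2, gamma=1.0):
    """Forward-Euler simulation of a three-repressor ring (illustrative only)."""
    steps = int(round(t_end / dt))
    p = np.array([1.0, 0.0, 0.0])  # asymmetric start so the ring is not at rest
    traj = np.empty((steps + 1, 3))
    traj[0] = p
    for k in range(steps):
        t = k * dt
        repressor = np.roll(p, 1)  # protein i is repressed by protein i - 1
        u = (repressor / K) ** n
        production = (a_max + a_min * u) / (1.0 + u)
        # Assumption: loss is first-order degradation plus dilution by growth.
        p = p + dt * (production - (gamma + growth_rate(t)) * p)
        traj[k + 1] = p
    return traj


traj_growing = simulate_ring(gompertz_growth_rate)
traj_static = simulate_ring(no_growth, t_end=10.0)
```

Whether the trajectories oscillate, and for how long, depends on the parameter values and on the growth profile, so the output of this sketch should not be read as a prediction of the figures in this chapter.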

3.3 Receiver and Inverter Characterization

To characterize a receiver in LOICA, we connect to experimental data in Flapjack, then instantiate a Receiver Operator (see Note 5) and use the characterize method. To characterize an inverter in LOICA, we connect to experimental data in Flapjack, then instantiate a Hill1 Operator and pass a previously characterized Receiver Operator as an argument to its characterize method. This process can be described step by step, as follows:

1. Log in to your Flapjack account. For more information on Flapjack, see the next chapter.

    import getpass
    from flapjack import *

    fj = Flapjack(url_base="flapjack.rudge-lab.org:8000")
    fj.log_in(username=input("Flapjack username: "), password=getpass.getpass("Password: "))

2. Get Flapjack data objects.

    vector = fj.get("vector", name="pAN1818_cyan")
    yfp = fj.get("signal", name="YFP")
    media = fj.get("media", name="M9 Glicerol")
    strain = fj.get("strain", name="Top10")
    biomass_signal = fj.get("signal", name="OD")

3. Create a receiver GeneticNetwork.

    receiver = lc.GeneticNetwork(vector=vector.id[0])

4. Create an inducer Supplement. In this example, we use IPTG.

    iptg = lc.Supplement(name="IPTG")

5. Create a Reporter and add it to the receiver GeneticNetwork. The example shows the creation of a YFP Reporter.

    yfp_rep = lc.Reporter(name="YFP", degradation_rate=0, signal_id=yfp.id[0], color="green")
    receiver.add_reporter(yfp_rep)

6. Create an Operator and add it to the receiver GeneticNetwork. The example shows the creation of a Receiver Operator. This design represents a transcriptional unit that is induced by IPTG and expresses YFP. The Operator has parameters to model a Hill equation.

    iptg_rec_yfp = lc.Receiver(input=iptg, output=yfp_rep, alpha=[1e-3, 1e4], K=1e-5, n=2)
    receiver.add_operator(iptg_rec_yfp)

7. Plot the graph representation for visual inspection of the receiver design (Fig. 5a, left panel).

    plt.figure(figsize=(3, 3), dpi=300)
    receiver.draw()

8. Use the characterize method. This fits the Receiver Operator model to the experimental data, parametrizing alpha, K, and n.

    iptg_rec_yfp.characterize(fj, vector=vector.id, media=media.id, strain=strain.id, signal=yfp.id, biomass_signal=biomass_signal.id)

9. Connect to inverter experimental data.

    vector2 = fj.get("vector", name="pSrpR-S3_cyan")

10. Create an inverter GeneticNetwork.

    inverter = lc.GeneticNetwork(vector=vector2.id[0])

Fig. 5 LOICA inverter characterization workflow. (a) To characterize an inverter, you need to build two genetic networks, one for the receiver and another connecting the receiver to a NOT gate to form the inverter. (b) These genetic networks need to be measured under different concentrations of AHL to drive the expression of the receiver. (c) The measurements can be uploaded to Flapjack by its built-in parser. (d) LOICA can access experimental data in Flapjack to characterize the Receiver and then the Hill1 Operator, thus parametrizing an inverter

11. Create a Regulator and add it to the inverter GeneticNetwork. In this example, we use SrpR.

    srpr_reg = lc.Regulator("SrpR")
    inverter.add_regulator(srpr_reg)

12. Modify the Receiver from the previous steps, create a Hill1 Operator, and add both to the inverter GeneticNetwork. This design represents a set of two transcriptional units, one induced by IPTG that expresses SrpR and another repressed by SrpR that expresses YFP.

    # iptg_rec_srpr refers to the same object as iptg_rec_yfp, so changing
    # its output also changes the output of iptg_rec_yfp.
    iptg_rec_srpr = iptg_rec_yfp
    iptg_rec_srpr.output = srpr_reg
    srpr_not_yfp = lc.Hill1(input=srpr_reg, output=yfp_rep, alpha=[10, 1e-3], K=1e3, n=2)
    inverter.add_operator([srpr_not_yfp, iptg_rec_srpr])

13. Plot the graph representation for visual inspection of the design (Fig. 5a, right panel).

    plt.figure(figsize=(3.3, 3.3), dpi=300)
    inverter.draw()

14. Assemble the genetic networks from the designs, prepare samples, and measure them in a serial dilution of inducer (Fig. 5b). The example shows the preparation of two 96-well plates with a gradient of AHL inducer, one with each genetic network. Fluorescence and OD are measured in a plate reader.

15. Collect the experimental data, usually an Excel output, and upload it to Flapjack (Fig. 5c).

16. Use the characterize method. This will get experimental data from Flapjack and fit the Hill1 Operator model, parametrizing alpha, K, and n (Fig. 5d). Operators characterized from experimental data can be used to build new genetic networks and simulate their behaviour (see Note 6).

    srpr_not_yfp.characterize(fj, receiver=iptg_rec_yfp, inverter=vector2.id, media=media.id, strain=strain.id, signal=yfp.id, biomass_signal=biomass_signal.id, gamma=0)
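The idea behind parametrizing alpha, K, and n from a dose-response curve can be illustrated with a small NumPy sketch. It is not the algorithm used by the characterize method, which fits dynamic models to Flapjack data; it only shows, on synthetic noiseless data, how the parameters of a generic Hill curve can be recovered. The function names, the grid-search strategy, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def hill(x, a0, a1, K, n):
    """Generic Hill curve: a0 at x = 0, approaching a1 as x grows."""
    u = (x / K) ** n
    return (a0 + a1 * u) / (1.0 + u)


def fit_hill(x, y, K_grid, n_grid):
    """Grid search over K and n; a0 and a1 follow from linear least squares."""
    best = None
    for K in K_grid:
        for n in n_grid:
            u = (x / K) ** n
            # For fixed K and n the model is linear in a0 and a1.
            design = np.column_stack([1.0 / (1.0 + u), u / (1.0 + u)])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            sse = float(np.sum((design @ coef - y) ** 2))
            if best is None or sse < best[0]:
                best = (sse, coef[0], coef[1], K, n)
    _, a0, a1, K, n = best
    return a0, a1, K, n


# Synthetic dose-response data generated from known parameters (no noise).
inducer = np.logspace(-7, -3, 12)
observed = hill(inducer, 1e-3, 1e4, 1e-5, 2.0)

a0_fit, a1_fit, K_fit, n_fit = fit_hill(
    inducer, observed, np.logspace(-7, -3, 41), np.linspace(1.0, 4.0, 31)
)
```

With real plate-reader data the fit would also have to deal with noise, growth, and reporter maturation, which is why a dedicated characterization routine working on time series is preferable to this static sketch.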

4 Notes

1. A Flapjack account is required to run certain examples; if you do not already have an account, you may register following the instructions in the Flapjack chapter.

2. The version of LOICA used in this example is 1.0.5.

3. Supplements are added to Samples.

4. The original repressilator constructed by Elowitz [17] has the functional repressilator in one plasmid and the reporter in another. For simplicity, in this design, the reporter is expressed with one of the repressors in a bicistronic fashion.

5. This code assumes the following import: from flapjack import *.

6. More details can be found in https://github.com/RudgeLab/LOICA/tree/master/notebooks.

References

1. Endy D (2005) Foundations for engineering biology. Nature 438(7067):449
2. Aldulijan I, Beal J, Billerbeck S, Bouffard J, Chambonnier G, Ntelkis N, Guerreiro I, Holub M, Ross P, Selvarajah V et al (2023) Functional synthetic biology. Synth Biol 8(1):ysad006
3. McLaughlin JA, Beal J, Mısırlı G, Grünberg R, Bartley BA, Scott-Brown J, Vaidyanathan P, Fontanarrosa P, Oberortner E, Wipat A et al (2020) The synthetic biology open language (SBOL) version 3: simplified data exchange for bioengineering. Front Bioeng Biotechnol 8:1009
4. Mitchell T, Beal J, Bartley B (2022) pySBOL3: SBOL3 for python programmers. ACS Synth Biol 11(7):2523
5. McLaughlin JA, Myers CJ, Zundel Z, Mısırlı G, Zhang M, Ofiteru ID, Goñi-Moreno A, Wipat A (2018) SynBioHub: a standards-enabled design repository for synthetic biology. ACS Synth Biol 7(2):682
6. Sents Z, Stoughton TE, Buecherl L, Thomas PJ, Fontanarrosa P, Myers CJ (2023) SynBioSuite: a tool for improving the workflow for genetic design and modeling. ACS Synth Biol 12(3):892
7. Crowther M, Wipat A, Goñi-Moreno Á (2022) A network approach to genetic circuit designs. ACS Synth Biol 11(9):3058
8. Jones TS, Oliveira SM, Myers CJ, Voigt CA, Densmore D (2022) Genetic circuit design automation with Cello 2.0. Nat Protoc 17(4):1097
9. Yáñez Feliú G, Earle Gómez B, Codoceo Berrocal V, Muñoz Silva M, Nuñez IN, Matute TF, Arce Medina A, Vidal G, Vitalis C, Dahlin J et al (2020) Flapjack: Data management and analysis for genetic circuit characterization. ACS Synth Biol 10(1):183
10. Samineni SP, Vidal G, Vitalis C, Feliú GY, Rudge TJ, Myers CJ, Mante J (2023) Experimental data connector (XDC): integrating the capture of experimental data and metadata using standard formats and digital repositories. ACS Synth Biol 12(4):1364
11. Vidal G, Vitalis C, Rudge TJ (2022) LOICA: Integrating models with data for genetic network design automation. ACS Synth Biol 11(5):1984
12. Bartley BA, Choi K, Samineni M, Zundel Z, Nguyen T, Myers CJ, Sauro HM (2018) pySBOL: a python package for genetic design automation and standardization. ACS Synth Biol 8(7):1515
13. Yeoh JW, Swainston N, Vegh P, Zulkower V, Carbonell P, Holowko MB, Peddinti G, Poh CL (2021) Synbiopython: an open-source software library for synthetic biology. Synth Biol 6: Article ysab001
14. Chapman B, Chang J (2000) Biopython: Python tools for computational biology. ACM Sigbio Newslett 20(2):15
15. Vidal G, Vitalis C, Muñoz Silva M, Castillo-Passi C, Yáñez Feliú G, Federici F, Rudge TJ (2022) Accurate characterization of dynamic microbial gene expression and growth rate profiles. Synth Biol 7(1):ysac020
16. Tamsir A, Tabor JJ, Voigt CA (2011) Robust multicellular computing using genetically encoded NOR gates and chemical ‘wires’. Nature 469(7329):212
17. Elowitz MB, Leibler S (2000) A synthetic oscillatory network of transcriptional regulators. Nature 403(6767):335
18. Zwietering MH, Jongenburger I, Rombouts FM, Van’t Riet K (1990) Modeling of the bacterial growth curve. Appl Environ Microbiol 56(6):1875

Chapter 23

Flapjack: Data Management and Analysis for Genetic Circuit Characterization

Carolus Vitalis, Guillermo Yáñez Feliú, Gonzalo Vidal, Macarena Muñoz Silva, Tamara Matúte, Isaac Núñez, Fernán Federici, and Timothy J. Rudge

Abstract

Flapjack presents a valuable solution for addressing challenges in the Design, Build, Test, Learn (DBTL) cycle of engineering synthetic genetic circuits. This platform provides a comprehensive suite of features for managing, analyzing, and visualizing kinetic gene expression data and associated metadata. By utilizing the Flapjack platform, researchers can effectively integrate the test phase with the build and learn phases, facilitating the characterization and optimization of genetic circuits. With its user-friendly interface and compatibility with external software, the Flapjack platform offers a practical tool for advancing synthetic biology research. This chapter provides an overview of the data model employed in Flapjack and its hierarchical structure, which aligns with the typical steps involved in conducting experiments and facilitates intuitive data management for users. Additionally, this chapter offers a detailed description of the user interface, guiding readers through accessing Flapjack, navigating its sections, performing essential tasks such as uploading data and creating plots, and accessing the platform through the pyFlapjack Python package.

Key words Data management, Genetic circuit characterization, SBOL, Web application, Visualization tools

1 Introduction

Flapjack [1] plays a crucial role in the Design, Build, Test, Learn (DBTL) cycle of engineering synthetic genetic circuits. By offering a comprehensive set of features for managing, analyzing, and visualizing kinetic gene expression data and associated metadata, Flapjack facilitates the seamless integration of the Test phase with the Build and Learn phases. With its user-friendly interface and interaction with external software tools such as LOICA [2] and SynBioHub [3], Flapjack is an invaluable resource for advancing research in synthetic biology.

Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_23, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024


Flapjack can handle large volumes of measurement data and associated metadata. Researchers can effectively collect, organize, and analyze data from numerous circuits, conditions, assays, and laboratories. This enables a comprehensive understanding of circuit behaviors in diverse contexts, leading to improved performance and functionality of genetic circuits.

Flapjack provides an intuitive interface for exploring and analyzing measurement data. Its interactive visualization tools empower researchers to dynamically examine and interpret results, uncovering patterns and correlations within the data. This capability aids in the identification of potential circuit optimizations and design improvements, ultimately enhancing the efficiency and effectiveness of the DBTL cycle.

Moreover, the Flapjack platform can interact with different tools and workflows via its REST API and Python package. This integration enables researchers to incorporate Flapjack into existing workflows, leveraging its capabilities with other tools. By facilitating interoperability, Flapjack enhances the efficiency and versatility of the DBTL cycle, allowing researchers to fully leverage the potential of synthetic genetic circuits in various applications.

2 Materials

Flapjack can be used entirely from a web browser through the frontend user interface. For more complex analysis, post-processing, and custom plotting, users may optionally install the pyFlapjack Python package. pyFlapjack requires Python 3.7 or a later version; installing it via pip automatically installs the other required packages. We recommend the use of an environment manager, such as Anaconda (https://www.anaconda.com/).

2.1 Installation

The pyFlapjack Python package is distributed through the Python Package Index (PyPI) and is installed and updated with the pip package installer. pyFlapjack can be installed using the following command:

    pip install pyflapjack

To verify that the installation was successful, users should be able to run the following command in Python with no errors:

    import flapjack
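The same check can be scripted, for example at the start of an analysis notebook. The following is a minimal sketch using only the Python standard library; the helper name `is_installed` is hypothetical and is not part of pyFlapjack.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "flapjack" is the module name used in the import command above.
    if is_installed("flapjack"):
        print("pyFlapjack is installed")
    else:
        print("pyFlapjack not found; try: pip install pyflapjack")
```

This reports a missing installation with a readable message instead of an ImportError traceback.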

3 Methods

3.1 Data Model

The data model used in Flapjack is analogous to typical laboratory workflows and data organization, making data management intuitive to the user. The data model can be seen in Table 1.

3.2 Accessing Flapjack

3.2.1 Creating an Account

Access the Flapjack page at http://flapjack.rudge-lab.org/. The front page is shown in Fig. 1. Here, the user will find two buttons. Select “Ready to get started? Sign Up!” to create a new account. On the next page, provide the Username, Email, and Password, and click “Register.”

3.2.2 Logging in to an Existing Account

If the user has already registered in the platform, simply select the button “Already have an account? Log In!” Enter the Username and Password on the next page and then press “Login.”

3.3 Home Page

Figure 2 shows Flapjack’s home page after signing up. The home page has three buttons, each with its description:

• Upload: Kinetic data from a microplate reader and other sources can be uploaded along with associated metadata.
• Browse: Browse published studies, assays, and available DNA.
• Search and Analyse: Query public and private data to visualize, analyze, and model.

3.4 Preparing and Uploading Data in Flapjack

Flapjack is a robust platform that facilitates the efficient uploading and management of data obtained from plate readers. This section provides a detailed guide on preparing and uploading data files into Flapjack. It covers the necessary steps for file preparation, including adding relevant sheets with assay information. It also outlines the process of uploading the file to Flapjack, creating new studies, and associating data with existing information in the platform’s database. By following these instructions, researchers can seamlessly integrate their experimental data into Flapjack for further analysis and exploration.

3.4.1 Preparing the Data File

Data obtained directly from a plate reader can be uploaded to Flapjack as an Assay after minor adjustments. The Excel file obtained from the plate reader will resemble the example in Fig. 3. Some sheets with relevant metadata about the Assay must be added to this file. First, the sheet containing the measurement data must be renamed to “Data,” so that Flapjack can identify the data it contains. This step is crucial; otherwise, Flapjack will not recognize the file, and the upload cannot continue.


Table 1 Flapjack models and their attributes. Attributes in italics are optional when creating an object of that type

Model        Attributes                                            Description
Study        name, description, doi, owner, shared_with, public   A project, for example, a paper or report, that corresponds to a particular question a researcher wants to address.
Assay        study, name, machine, description, temperature       Measurement experiments performed to explore different study aspects. This includes replicates and varying experimental conditions.
Media        owner, name, description                              Composition of the substrate which drives the genetic circuit, media in the case of live cell assays, or extract for cell-free experiments.
Strain       owner, name, description                              The chassis organism, if any, hosting the genetic circuit.
Chemical     owner, name, description, pubchemid                   Any supplementary chemicals that interact with components of the genetic circuit.
Supplement   owner, name, chemical, concentration
Dna          owner, name, sboluri                                  Describes the synthetic DNAs encoding a genetic circuit, including links to part composition and sequence via the corresponding SBOL URIs.
Vector       owner, name, dnas
Sample       assay, media, strain, vector, supplements, row, col  Corresponds to the basic unit subject to measurement, for example, a colony or a well in a microplate.
Signal       owner, name, description, color
Measurement  sample, signal, value, time                           The raw measurement value recorded for a particular sample during an assay at a particular time.


Fig. 1 Flapjack frontend front page, seen before logging in

Fig. 2 Flapjack home page, seen after logging in


Fig. 3 An example of an Excel data file produced by a plate reader

Fig. 4 Media metadata sheet, which is added to the original Excel file from a plate reader to specify the media in which each Sample (well) was grown

The first sheet to be added is “Media,” shown in Fig. 4, which indicates the media used in each well of the plate. Because the plate reader arranges wells in rows A to H and columns 1 to 12, the sheet follows the same arrangement, with the first cell, A1, reserved for the sheet’s title. Cells A2 to A9 therefore hold the row labels, and cells B1 to M1 hold the column labels. Each remaining cell is filled with the media used in the corresponding well; if no media was used, write “None” in the cell. In this example, the same media was used in all the wells, so the value is repeated in every cell.

The next sheet is called “Strains.” Following the same arrangement, enter the strain used in each well, and again write “None” for any well in which no strain was used. In this example, the last row contains the value “None,” indicating that these Samples are controls. Flapjack automatically recognizes these control Samples and uses them to correct for background fluorescence and optical density.
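The sheet layout described above can be generated programmatically instead of being typed by hand. The following is a minimal sketch of the cell grid only, using the Python standard library; the function name `make_plate_sheet`, the medium name “M9,” and the use of integer column labels are assumptions made for illustration, and writing the grid into the Excel file would require a spreadsheet library of the user’s choice.

```python
import string

ROWS = string.ascii_uppercase[:8]   # row labels "A" to "H"
COLUMNS = range(1, 13)              # column labels 1 to 12


def make_plate_sheet(title, value_for_well):
    """Build the cell grid for one metadata sheet of a 96-well plate.

    title: text placed in cell A1 (for example "Media").
    value_for_well: function (row_letter, column_number) -> cell text.
    Returns a list of 9 rows with 13 cells each.
    """
    grid = [[title] + list(COLUMNS)]            # A1, then B1 to M1
    for row in ROWS:                            # A2 to A9
        grid.append([row] + [value_for_well(row, col) for col in COLUMNS])
    return grid


# Same medium in every well ("M9" is a placeholder name).
media = make_plate_sheet("Media", lambda row, col: "M9")

# Last row (H) holds no cells, which marks those wells as controls.
strains = make_plate_sheet(
    "Strains", lambda row, col: "None" if row == "H" else "Top10"
)
```

Each grid can then be written row by row into a sheet with the matching name.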


Fig. 5 DNA metadata sheet added to the original Excel file from a plate reader, showing how multiple plasmids can be present in each Sample (well). “None” indicates that the well does not contain cells with synthetic DNA (negative controls)

Next, we generate the “DNA” sheet, in which we place, again following the same arrangement, the plasmids used in each well. If more than one plasmid was used, add a new table below the first one, leaving an empty row between them. In our example, cells were co-transformed with two plasmids; hence, we have two tables, “DNA 1” and “DNA 2,” as shown in Fig. 5. Finally, we create the “Chemicals” sheet, where we add the Chemicals used, following the same arrangement as before. A Chemical is any substance that is added to the media as a Supplement at a given concentration, such as an inducer of gene expression. Note that the concentrations must be entered in molar units. In the end, the file should have the following sheets: Data, Media, Strains, DNA, and Chemicals.

3.4.2 Uploading the File

We are now ready to upload the file to Flapjack. Once logged in (see Subheadings 3.2.1 and 3.2.2), place the cursor over your username, where the “Upload” option will be displayed, as shown in Fig. 6. After clicking, the page shown in Fig. 7 will be displayed. The fields to be filled are covered in the following subsections.

Study

A Study represents a group of Assays that were performed for some research purpose, perhaps to assess the effect of growth conditions on gene expression. Select one of the studies available from the drop-down menu. Only studies owned by or shared with the user are displayed. If the study does not yet exist, select the “Create new study” button. In our example, we will create a new one named “Example.”


Fig. 6 Flapjack dropdown menu showing the Upload option

Fig. 7 Upload File dialog showing the first metadata entry form

Creating a New Study

For a new study, the user is asked for three fields: Name, Description, and DOI, as shown in Fig. 8. The DOI is optional, but providing it is good practice if one is available. There is also an option to make the study public. After entering the details, select “Create Study.”

Machine

This field indicates the type of instrument on which the experiment was conducted. This is important because each machine produces files in a different format, and Flapjack treats them accordingly. In our example, it is an HTX Synergy plate reader.

Data File

Select the file to be uploaded. To remove the file and replace it with another one, select the trash can icon. Then select “Next.” We will now be asked for the details of the Assay: Name, Description, and the temperature in degrees Celsius at which the experiment was conducted. All of these fields are mandatory.


Fig. 8 Create new Study popup window showing metadata entry form for describing a new Study

Fig. 9 Metadata entry form for describing the DNA used in the Assay to be uploaded

Remember that a Study can contain several Assays, so choose the Assay name accordingly. In our example, we will call it “Assay 1,” but feel free to provide a more detailed name. Once this is done, press Submit. After the file has been uploaded to the platform, the window shown in Fig. 9 will appear with data to be completed according to our particular file. In this window, the user must match the data (DNA names, for instance) with the data available on the server using the drop-down menu or the search function. If a value is not found, it can be added with the “Create new plasmid” button, using the same name that was entered for that plasmid in the Excel file. The pop-up window shown in Fig. 10 will then appear. Enter the name to be assigned (required) and an SBOL URI (optional), and then press “Create DNA.” Next, we must enter the Chemical details, following the same steps as above, as indicated in Fig. 11. If the chemical is not found, select the “Create new chemical” button; a pop-up window will appear, as depicted in Fig. 12, with fields to fill in: Name, Description, and PubChem ID (optional).

Fig. 10 Popup window to create a new DNA

Fig. 11 Metadata entry form to describe chemicals (e.g., inducers) added to Samples in the Assay


Fig. 12 Popup window to create a new chemical (e.g., inducer)

Fig. 13 Metadata entry form to describe the Signals measured in the Assay to be uploaded

Once this is completed, press “Next” to go to the next section. Finally, we fill in the fields for each Signal, as shown in Fig. 13. Again, a new Signal can be created if it is not found, as illustrated in Fig. 14, by selecting the “Create new signal” button and entering the Signal name as it appears in the data file. Here, we must complete the fields Name, Description, and Color, all of which are required. After verifying that the entered data are correct, select the “Submit Metadata” button. A loading bar will indicate the progress of the upload; once it completes, the data have been uploaded to Flapjack.


Fig. 14 Popup window to create a new Signal

Fig. 15 Flapjack Browse Page, which lists Studies, Assays and Vectors available to the user. The Studies tab is selected. The other two tabs—Assays and Vectors—are shown

3.5 Browse Page

All the Studies, Assays, and Vectors available for review are displayed on the Browse page, as seen in Fig. 15, either because they are public or because they have been shared with this account. The Search box can be used in both the Studies and Assays sections. To clear the search, simply select the x button.

3.5.1 Studies

The list of studies can be accessed in the first tab, Studies. This tab shows the Name, Description, and DOI of each study, which allow us to identify the different studies. Next to these columns is the Actions column, which shows the actions that can be performed on the study.

Actions

The available Actions depend on whether the user is the Study’s owner. An owner can manage the data through the Manage button, which displays three options: Share, Make public, or Delete.

Share

When sharing a study, a pop-up window will appear. The user email of the person with whom the study will be shared is entered here. This window also shows a list of all the people who were given access to this study. The message “Study successfully shared” will be displayed on the screen after pressing the Share button. Access to the study can be removed by selecting the trash can icon next to the user’s email if the decision to share it with someone is changed at any point. The message “Study unshared successfully” will appear on the screen after the trash can icon has been pressed.

Make Public

The study can be made public by selecting this button so that anyone can browse its content without individual sharing. This function is independent of the Share function, so users with whom the study was previously shared keep their access if the study is made private again after being public.

Delete

The study will be removed from the database by this button. No confirmation is required, and the action cannot be undone. All associated Assays and their measurement data will be removed. Caution is advised.

3.5.2 Assays

An Assay is a procedure which measures Signals produced by a set of Samples, such as a kinetic plate-reader experiment. In this tab, we can see the name, ID, description, study to which the assay is associated, temperature, machine, and a button to view the data.

3.5.3 Vectors

Once in the Vectors tab, we can see the Vectors, which are the synthetic DNA transformed into the cell Strain. A SynBioHub URI (link) is associated with each Vector, which allows quick access to the corresponding SBOL representation for more detailed metadata. SynBioHub is shown in Fig. 16. One Transcriptional Unit is reviewed in this example.

3.6 View Page

The Data Viewer button under the Actions tab can be selected to view the data associated with a Study, Assay, or Vector. This opens the Analysis page, which can also be reached via the View tab on the top bar. When the View page is opened, a new tab named Analysis # is generated, where # is the number of the tab according to when it was generated. The tab can be renamed by pressing the pencil icon (Fig. 17). The View page has three sections: Query, Analysis, and Plot.

Fig. 16 SynBioHub entry linked from Flapjack, which can be accessed via the Browse page for Vector

3.6.1 View Page Filters and Options

Query

The Query section is subdivided into six filters to select a specific set of measurement data:

• Studies
• Assays
• Vector
• Strain
• Media
• Signal

In this example, we select Reporter behavior in the Study filter, as seen in Fig. 17. Selecting this Study automatically selects all related Assays and their components, such as Vector, Strain, Media, and Signal. Individual entries can then be removed from each filter as desired.

Fig. 17 Flapjack View page. Example of a Query section filter. Here, the Reporter behaviour study has been selected

Analysis

The data can be analyzed by selecting the desired option in this section. The available analysis types are listed below. More details can be found in references [1] and [4].

• Expression Rate (Indirect)
• Expression Rate (Direct)
• Expression Rate (Inverse)
• Rho
• Alpha
• Induction Curve
• Heatmap
• Kymograph

Plot Options

• Normalize: The default option is no normalization (None), but Temporal Mean, Mean/std, and Min/Max are also available.
• Subplots: Data can be grouped into different subplots according to the user’s plot needs. Signal is the default option, but Study, Assay, Vector, Media, Strain, and Supplement are also available.
• Lines/Markers: Different line and marker colors can be plotted according to the user’s needs. The default option is to group by Vector, but other available options are Study, Assay, Media, Strain, Supplement, and Signal.
• Plot: This determines whether to show all Samples or to compute their mean and/or standard deviation. Mean +/- std is the default option, but Mean and All data points are also available.

Each filter or option has a Clear button to clear the settings that have been selected.
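To illustrate what the Normalize options are for, the sketch below applies two textbook rescalings to made-up values: min-max scaling and mean/standard-deviation scaling. These definitions are assumptions chosen for illustration; this chapter does not state the exact formulas Flapjack uses, and the function names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def normalize_min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def normalize_mean_std(values):
    """Center values on their mean and scale by their standard deviation."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


# Made-up optical density and fluorescence readings on very different scales.
od = [0.1, 0.2, 0.4, 0.8]
yfp = [1000.0, 50000.0, 200000.0, 600000.0]

od_scaled = normalize_min_max(od)
yfp_scaled = normalize_min_max(yfp)
```

After min-max scaling, both signals span the same range, which is why normalization helps when signals of very different magnitude share one set of axes.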

Fig. 18 Plots downloaded from Flapjack’s front end, showing raw measurement data (top) and computed gene expression rate (CFP, RFP, and YFP) and growth rate (OD)

3.7 Creating a Plot

Now that we have familiarized ourselves with the interface, we will recreate the plot shown in Fig. 18 step by step using the Flapjack frontend. After logging into our account (see Subheadings 3.2.1 and 3.2.2), we go to the View tab and, under the option “Studies,” select the study “Context effects.” Now we can click on the Plot button. A plot will be generated, and the loading percentage is shown on the screen while it is prepared. However, with the default options, we get a plot that does not look much like our target plot, because we have not yet defined the best display parameters for this particular case.

The default plot shows 14 vectors (pAAA, pGEA, pEAA, pECA, pEDA, pEFA, pGAA, pGDA, pGCA, pGFA, pBAA, pBCA, pBDA, and pBFA), which is correct because this Study was made with all those vectors, but this time we only want to visualize the vector pAAA. Hence, the first step is to deselect the other vectors. To do so, we open the option Vector and remove all vectors except pAAA, then click on Plot again, obtaining Fig. 19.

Fig. 19 Plots from the Context effects study showing mean and standard deviation of measurements of fluorescence (CFP, RFP, and YFP) and optical density (OD). Panels: CFP, OD, RFP, and YFP for vector pAAA; axes: Measurement versus Time

We now have only the data that interest us; however, they are separated into four plots because the default option groups subplots according to Signal. To have them all together in one plot, we must group them by Vector. To do this, we go to the Plot Options section and, under Subplots, select “Vector.” Also, under Lines/Markers, we select “Signal” to show each measurement channel in a different color. We select Plot again, and we should have something like Fig. 20.

Fig. 20 Plot of mean and standard deviation of all the Signals measured for Vector pAAA in the Context effects Study. Single panel: pAAA, with lines for CFP, OD, RFP, and YFP; axes: Measurement versus Time

We are now approaching the original plot. However, we see the raw measurements here, so we must still normalize the data for better visualization. For this, we go to the Normalize option and select “Min/Max.” To eliminate distractions, we remove the standard deviation by selecting only “Mean” under Plot. We press Plot once more, and finally, we have the correct plot.

3.7.1 Induction Curve

In Fig. 21, we consider a Study characterizing signal inverter circuits that use repressor proteins to produce low gene expression when the input signal (IPTG in this case) is at high concentration, and vice versa. To characterize the behavior of these circuits, we use the Induction Curve analysis type, choosing the appropriate signal Chemical. In this plot, we have grouped subplots by Vector to show four different inverter circuits, and lines/markers are colored according to Signal (YFP). We can see that the circuits are functional.

Fig. 21 Visualization and analysis of the Induction Curve study. Here four signal inverters are characterized by plotting their mean fluorescence as a function of added IPTG inducer

3.8 pyFlapjack

This section covers the basics of how to use the pyFlapjack Python package. It should be noted that studies can only be accessed by their owners, by those who have been granted access, or by anyone if they are public.

3.8.1 Importing pyFlapjack

To import pyFlapjack, use the following command:

    import flapjack

3.8.2 Connecting to Flapjack

The pyFlapjack package defines a class called Flapjack, which connects to the Flapjack web application via its API. The Flapjack instance to which it connects, either online or local, can be specified. This example connects to the main instance of Flapjack:

    fj = flapjack.Flapjack('flapjack.rudge-lab.org:8000')

After the Flapjack object has been created, the following command is used to log in:

    fj.log_in(username=user, password=passwd)
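In scripts that are shared or version-controlled, it is usually preferable not to write the username and password into the code. The following is a minimal sketch that reads them from environment variables and passes them to the log_in call shown above; the helper name and the variable names FLAPJACK_USER and FLAPJACK_PASSWORD are hypothetical choices for this example, not part of pyFlapjack.

```python
import os


def log_in_from_env(fj, user_var="FLAPJACK_USER", password_var="FLAPJACK_PASSWORD"):
    """Log in a Flapjack client with credentials read from environment variables.

    fj: an object exposing log_in(username=..., password=...), as in the text.
    The default variable names are hypothetical and can be changed.
    """
    user = os.environ.get(user_var)
    passwd = os.environ.get(password_var)
    if not user or not passwd:
        raise RuntimeError(f"Set {user_var} and {password_var} before connecting")
    fj.log_in(username=user, password=passwd)
    return fj


# Usage (requires pyFlapjack and network access):
#   import flapjack
#   fj = log_in_from_env(flapjack.Flapjack('flapjack.rudge-lab.org:8000'))
```

The helper fails early with a clear message when a variable is missing, rather than sending an incomplete login request.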

3.8.3 Functions

The Flapjack package provides several functions to access the Flapjack data model. These functions make HTTPS requests to the Flapjack web application to retrieve the required data and metadata.

get Function

This function retrieves information about objects from a particular table in the database.

    variable_name = fj.get('model', attribute=value)

Here, variable_name is the name assigned to the result for easy later access, and fj.get calls the get function. Table 1 lists the values accepted for model and attribute, and value is the value that is sought. Here is an example of using the get function to find a study called Context effects:

    study = fj.get('study', name='Context effects')

The database may be queried against any of the attributes of the model in question, and the keyword argument “search” can be specified to query all attributes.

Possible errors: If the user specifies a model not present in Flapjack, for example by typing studies instead of study, an error stating “model studies does not exist” is reported. If no objects match the query, an empty data frame is returned.
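Because an unmatched query returns an empty data frame rather than raising an error, a small wrapper can make missing or ambiguous matches explicit. The following is a minimal sketch; the helper name `get_single` is hypothetical, and it assumes the returned data frame supports `.empty` and `len()`, as a pandas DataFrame does.

```python
def get_single(fj, model, **query):
    """Return the result of fj.get(model, **query) if exactly one object matches.

    Raises LookupError when the query matches nothing or more than one object.
    """
    result = fj.get(model, **query)
    if result.empty:
        raise LookupError(f"No {model} matches {query}")
    if len(result) > 1:
        raise LookupError(f"{len(result)} {model} objects match {query}; refine the query")
    return result


# Usage with a connected client `fj`:
#   study = get_single(fj, 'study', name='Context effects')
```

This catches a mistyped name at the query step, before the empty result is passed on to later analysis calls.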


Create Function

This function creates a new object. Every required attribute of the model must be specified (see Table 1); otherwise, an error is returned.

    variable_name = fj.create('model', attribute_1=value_1, attribute_n=value_n)

Here, variable_name is the name assigned to the result for easy later access, and fj.create calls the create function. The values accepted for model and attribute are listed in Table 1, and value is the value the user wants to assign to the corresponding attribute. Here is an example of using the create function to create a study called “Test study” with the description “This is a test”:

    study = fj.create('study', name='Test study', description='This is a test')

Note that we provided values only for name and description in this example because these values are required, as shown in Table 1.
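When a script is run more than once, calling create each time may produce duplicate objects. The following is a minimal sketch of a get-or-create pattern built from the get and create calls described above; the helper name is hypothetical, and it assumes that get returns a data frame with an `.empty` attribute, as pandas data frames have.

```python
def get_or_create(fj, model, name, **attributes):
    """Return the existing object called `name`, creating it only if none exists."""
    existing = fj.get(model, name=name)
    if not existing.empty:
        return existing
    return fj.create(model, name=name, **attributes)


# Usage with a connected client `fj`:
#   study = get_or_create(fj, 'study', 'Test study', description='This is a test')
```

The lookup here is by name only, so objects that share a name but differ in other attributes are treated as the same object.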

Delete Function

This function deletes an object. The user has to specify the model and the id of the object.

    variable_name = fj.delete('model', id_number)

Here, variable_name is the name assigned to the result for easy later access, fj.delete calls the delete function, and id_number is the id of the object to be deleted, which may be obtained by calling the get function to query for existing objects. The values accepted for model are listed in Table 1. Here is an example of deleting an assay with id = 0:

    deleting_assay = fj.delete('assay', 0)
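Since delete takes a numeric id, it is easy to remove the wrong object by mistyping a number. The following is a minimal sketch that looks the id up by name first and refuses to proceed unless exactly one object matches. The helper name is hypothetical; it assumes the data frame returned by get has `.empty`, `len()`, and an `id` column whose first entry is read as `id[0]`, the same access pattern used later in this chapter.

```python
def delete_by_name(fj, model, name):
    """Delete the single object of `model` called `name`; refuse if ambiguous."""
    matches = fj.get(model, name=name)
    if matches.empty:
        raise LookupError(f"No {model} named {name!r}")
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} {model} objects named {name!r}; not deleting")
    object_id = int(matches.id[0])      # id of the only match
    return fj.delete(model, object_id)


# Usage with a connected client `fj`:
#   deleting_assay = delete_by_name(fj, 'assay', 'Assay 1')
```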

Analyzing Data

To analyze data in Flapjack, we first use the get function to identify the metadata. For example,

    study = fj.get('study', name='voigt inverters RVs')
    vector = fj.get('vector', name='pAN1818_cyan')
    vector2 = fj.get('vector', name='pSrpR-S3_cyan')
    media = fj.get('media', name='M9 Glicerol')
    strain = fj.get('strain', name='Top10')
    iptg_chem = fj.get('chemical', name='IPTG')
    yfp = fj.get('signal', name='YFP')
    biomass_signal = fj.get('signal', name='OD')

Then, we can use the plot function. The example shows how to query by study, vector, and signal. For the analysis, set type to Induction Curve and function to Mean Expression (the value to plot for each inducer concentration). Keyword arguments for the analysis include analyte (the inducer Chemical) and biomass_signal (the Signal representing culture growth). The grouping of data can be specified with parameters corresponding to the options in the front end. The final result can be seen in Fig. 22.

    fig = fj.plot(study=study.id,
                  vector=[vector.id, vector2.id],
                  signal=yfp.id,
                  type='Induction Curve',
                  analyte=iptg_chem.id[0],
                  function='Mean Expression',
                  biomass_signal=biomass_signal.id,
                  normalize='None',
                  subplots='Signal',
                  markers='Vector',
                  plot='All data points')
    fig

Fig. 22 Induction curves plot generated using pyFlapjack. Induction curves of the signal receiver pAN1818 and the signal inverter pSrpR-S3 showing mean fluorescence at different concentrations of inducer IPTG. Panel: YFP, with lines for pAN1818_cyan and pSrpR-S3_cyan; axes: Mean expression (AU) versus Concentration IPTG (M)
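The grouping and display parameters passed to the plot function mirror the front-end Plot Options, so a mistyped option name is an easy mistake to make. The following is a minimal sketch that checks the four display parameters against the option names listed in Subheading 3.6.1 before they are used. It assumes that pyFlapjack accepts exactly the front-end spellings, which is confirmed in this chapter only for the values 'None', 'Signal', 'Vector', and 'All data points'; the helper name is hypothetical.

```python
NORMALIZE = {"None", "Temporal Mean", "Mean/std", "Min/Max"}
GROUPINGS = {"Study", "Assay", "Vector", "Media", "Strain", "Supplement", "Signal"}
PLOT = {"Mean +/- std", "Mean", "All data points"}


def plot_options(normalize="None", subplots="Signal", markers="Vector",
                 plot="Mean +/- std"):
    """Validate display options and return them as keyword arguments.

    Defaults follow the front-end defaults described in Subheading 3.6.1.
    """
    checks = [("normalize", normalize, NORMALIZE),
              ("subplots", subplots, GROUPINGS),
              ("markers", markers, GROUPINGS),
              ("plot", plot, PLOT)]
    for key, value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"{key}={value!r} is not one of {sorted(allowed)}")
    return {"normalize": normalize, "subplots": subplots,
            "markers": markers, "plot": plot}


# Usage with a connected client `fj` and the objects queried above:
#   fig = fj.plot(study=study.id, signal=yfp.id, type='Induction Curve', ...,
#                 **plot_options(subplots='Vector', markers='Signal'))
```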


4 Notes

4.1 Patch Function

This function allows the user to modify an object that already exists. The user has to specify the model and the id of the object.

    variable_name = fj.patch('model', id_number, attribute=value)

Here, variable_name is the name assigned to the result for easy later access, fj.patch calls the patch function, and id_number is the id of the object whose attributes are to be modified. The values accepted for model and attribute are listed in Table 1, and value is the new value the user wants to assign. Here is an example where we modify the name of a study with ID 0 to “Changed name”:

    study = fj.patch('study', 0, name='Changed name')
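Before patching, it can be useful to send only the attributes that actually differ from the stored values. The following is a minimal sketch of that comparison in plain Python; the helper name `changed_attributes` and the example dictionaries are hypothetical, and how the current values are obtained from the data frame returned by get is left to the user.

```python
def changed_attributes(current, updates):
    """Return only the entries of `updates` whose values differ from `current`."""
    return {key: value for key, value in updates.items()
            if current.get(key) != value}


# Example values standing in for an existing study and the desired state.
current = {"name": "Test study", "description": "This is a test"}
changes = changed_attributes(
    current, {"name": "Changed name", "description": "This is a test"}
)

# With a connected client `fj`, only the differences would then be sent:
#   if changes:
#       study = fj.patch('study', 0, **changes)
```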

References

1. Yáñez Feliú G, Earle Gómez B, Codoceo Berrocal V, Muñoz Silva M, Nuñez IN, Matute TF, Arce Medina A, Vidal G, Vitalis C, Dahlin J, Federici F, Rudge TJ (2021) Flapjack: data management and analysis for genetic circuit characterization. ACS Synth Biol 10:183–191. https://doi.org/10.1021/acssynbio.0c00554
2. Vidal G, Vitalis C, Rudge TJ (2021) LOICA: integrating models with data for genetic network design automation. ACS Synth Biol 2021.09.21.460548-2021.09.21.460548. https://doi.org/10.1021/acssynbio.1c00603
3. McLaughlin JA, Myers CJ, Zundel Z, Mısırlı G, Zhang M, Ofiteru ID, Goñi-Moreno A, Wipat A (2018) SynBioHub: a standards-enabled design repository for synthetic biology. ACS Synth Biol 7:682–688. https://doi.org/10.1021/ACSSYNBIO.7B00403
4. Vidal G, Vitalis C, Muñoz Silva M, Castillo-Passi C, Yáñez Feliú G, Federici F, Rudge TJ (2022) Accurate characterization of dynamic microbial gene expression and growth rate profiles. Synth Biol 7. https://doi.org/10.1093/SYNBIO/YSAC020

Part IV Molecular Assembly

Chapter 24

In Vivo DNA Assembly Using the PEDA Method

Tianyuan Su, Qingxiao Pang, and Qingsheng Qi

Abstract

Simple and efficient DNA assembly methods have been widely used in synthetic biology. Here, we provide the protocol for the recently developed PEDA (phage enzyme-assisted in vivo DNA assembly) method for direct in vivo assembly of individual DNA parts in multiple microorganisms, such as Escherichia coli, Ralstonia eutropha, Pseudomonas putida, Lactobacillus plantarum, and Yarrowia lipolytica. PEDA allows in vivo assembly of DNA fragments with homologous sequences as short as 5 bp, and its efficiency is comparable to that of the prevailing in vitro DNA assembly methods, which will broadly boost the rapid progress of synthetic biology.

Key words In vivo, DNA assembly, Molecular cloning, T4 DNA ligase, T5 DNA exonuclease, Multiple microorganisms

1 Introduction

Molecular cloning is essential for biomedical, biotechnological, and synthetic biology research [1, 2]. Recombination-dependent in vitro DNA assembly methods, such as Gibson assembly [3], TEDA [4], SLIC, CPEC [5], and In-Fusion™ [6], have been broadly used in the construction of synthetic metabolic pathways and complex genetic circuits. These methods enable seamless assembly of multiple DNA parts in one tube using diverse enzymes or enzyme mixtures. However, with the increasing availability of genome data and the development of cost-effective chemical DNA synthesis techniques, DNA fragment assembly is becoming more important than ever, and reliable, simple, and efficient DNA assembly methods are urgently needed [7]. Compared to in vitro DNA assembly, direct in vivo assembly of DNA parts is more attractive because it is cost-effective and time-saving [8–11]. In this chapter, we describe an efficient in vivo DNA assembly method (phage enzyme-assisted in vivo DNA assembly, PEDA) in detail [12]. Specifically, combinatorial expression of the phage-derived T5 DNA exonuclease (T5 Exo) and T4 DNA ligase (T4 Lig) enables seamless assembly of DNA parts directly in the host. As shown in Fig. 1, the 5′ ends of the introduced linear DNA are first digested by T5 Exo, producing ssDNA. Then, the endogenous RecA filaments promote the annealing of the homologous ssDNA [13]. The DNA gaps are filled by the intracellular DNA polymerases (e.g., DNA polymerase I [14]), and the resulting nicks are finally repaired by T4 Lig, generating the intact circular DNA. In general, PEDA is efficient for routine DNA cloning as well as for direct construction of DNA libraries in multiple microorganisms. This highly host-compatible in vivo DNA assembly method makes molecular cloning simpler, faster, and more economical and will greatly accelerate the study of molecular biology, metabolic engineering, and synthetic biology.

Fig. 1 The molecular mechanisms of PEDA. First, the 5′ ends of the introduced linear DNA are degraded by the T5 Exo. Then, the assembly of RecA filaments promotes the annealing of the homologous ssDNA. Finally, the host's DNA polymerase fills the DNA gaps and the T4 Lig repairs the DNA nicks, generating the desired circular DNA

Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_24, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024

2 Materials

2.1 Polymerase Chain Reaction (PCR)

2.1.1 Equipment

1. Thermocycler.
2. Microcentrifuge.
3. Refrigerator (-80 °C).

2.1.2 Reagents

1. 0.25 mL PCR tubes.
2. 10 mM dNTPs.
3. High fidelity DNA polymerase (such as Phanta Max Super-Fidelity DNA Polymerase from Vazyme Biotech) for DNA amplification.
4. High fidelity DNA polymerase reaction buffer.
5. Nuclease-free water.
6. DNA template for the amplification of the specific DNA fragment.
7. Specific primers for the amplification of DNA fragments.
8. DNA polymerase mixture (such as 2× Taq Master Mix from Vazyme Biotech) for colony PCR.

2.2 DNA Electrophoresis and Purification

2.2.1 Equipment

1. Gel imaging system.
2. Microcentrifuge.
3. Transilluminator (UV or blue light).
4. NanoDrop spectrophotometer.
5. DNA gel electrophoresis setup.
6. Scalpel.

2.2.2 Reagents

1. FastDigest DpnI (Thermo Fisher Scientific).
2. FastDigest buffer (Thermo Fisher Scientific).
3. 1.5 mL microfuge tubes.
4. 1× TAE (40 mM Tris base, 40 mM acetic acid, 1 mM EDTA).
5. 0.8% agarose gel in TAE.
6. SYBR Safe nucleic acid gel stain.
7. GeneRuler 1 kb DNA ladder (Thermo Fisher Scientific).
8. 6× DNA gel loading dye (Thermo Fisher Scientific).
9. DNA gel extraction kit.
10. DNA PCR extraction kit.

2.3 Electrocompetent E. coli Cells

2.3.1 Equipment

1. Microcentrifuge.
2. Incubator shaker (for bacterial cell culture).
3. Spectrophotometer.
4. Clean bench (for inoculation).
5. Gene Pulser electroporation system (for electrotransformation).
6. Autoclave.

2.3.2 Reagents

1. E. coli DH5α::araC-T4T5 (DH5α genome integrating araC, T4 Lig and T5 Exo) [12].
2. E. coli Turbo::araC-T4T5 (Turbo genome integrating araC, T4 Lig and T5 Exo) [12].
3. Lysogeny broth (LB) medium (5 g/L yeast extract, 10 g/L tryptone, 10 g/L NaCl, pH 7.0).
4. SOC medium (20 mM glucose, 20 g/L tryptone, 5 g/L yeast extract, 0.5 g/L NaCl, 2.5 mM KCl, 5 mM MgCl2, 5 mM MgSO4, pH 7.0).
5. 50 mL centrifuge tubes.
6. 300-mL shake flasks.
7. Antibiotic stock solution.
8. 1.5 mL microfuge tubes.
9. LB agar plate.
10. Arabinose stock solution (10% w/v).
11. Glycerol (10% v/v).

3 Methods

3.1 Design the Primers

1. The target DNA fragment used for in vivo DNA assembly should overlap with the adjacent DNA fragment at its ends. Therefore, the primers consist of two parts: the 3′ end part (20–25 bp), which is identical to the target DNA for amplification, and the 5′ end part (10–20 bp), which is identical to the end of the adjacent DNA fragment for assembly (see Note 1).
2. The linearized vector is obtained by endonuclease digestion or PCR amplification (see Note 2). The primers used to amplify the linearized vector are designed as usual, that is, 20–25 bp in length and annealing to the DNA ends.
3. To detect the assembled plasmid, the test primers should be designed on the vector backbone so that the PCR product contains the DNA ligation junctions (see Note 3).
4. All of the primers are ordered from commercial vendors (such as Tsingke Biotech).
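The two-part primer layout in step 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the example sequences, and the default lengths are hypothetical, sequences are given 5′ to 3′ on the top strand, and the lower overlap bound of 5 nt follows Note 1. It does not check melting temperature or secondary structure.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def design_assembly_primers(insert, upstream, downstream,
                            anneal_len=20, overlap_len=15):
    """Build forward/reverse primers: 5' overlap part + 3' annealing part.

    insert     -- top-strand sequence of the fragment to amplify
    upstream   -- top-strand sequence of the neighbour ending at the 5' junction
    downstream -- top-strand sequence of the neighbour starting at the 3' junction
    """
    if not 20 <= anneal_len <= 25:
        raise ValueError("annealing part should be 20-25 nt")
    if not 5 <= overlap_len <= 20:
        raise ValueError("overlap part should be 5-20 nt (10-20 nt recommended)")
    if len(insert) < anneal_len:
        raise ValueError("insert is shorter than the annealing part")
    if len(upstream) < overlap_len or len(downstream) < overlap_len:
        raise ValueError("neighbouring sequence is shorter than the overlap")
    # Forward primer: end of the upstream neighbour, then the start of the insert.
    forward = upstream[-overlap_len:] + insert[:anneal_len]
    # Reverse primer: bottom strand, so the downstream overlap ends up at its 5' end.
    reverse = reverse_complement(insert[-anneal_len:] + downstream[:overlap_len])
    return forward, reverse


# Made-up example sequences, for illustration only.
upstream = "GGATCCGTTAACCTGCAGGTACCGAGCTCG"
insert = "ATGAGCAAAGGCGAAGAACTGTTTACCGGCGTGGTGCCGATTCTGGTGGAACTGGATGGCTAA"
downstream = "CTCGAGCACCACCACCACCACCACTGAGAT"
fwd, rev = design_assembly_primers(insert, upstream, downstream)
print(fwd)
print(rev)
```

Each primer is 35 nt with these defaults; lengthening `overlap_len` toward 20 nt corresponds to the recommendation in Note 1 for multi-fragment or library assemblies.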

3.2 Amplification of the DNA Fragments

3.2.1 Prepare the Following Master Mixture in 50 μL Volume

Reagents                                     Volume (μL)
10 μM forward primer                         2
10 μM reverse primer                         2
10 mM dNTP                                   1
Phanta Max Super-Fidelity DNA polymerase     1
5× DNA polymerase buffer                     10
DNA template                                 1
Nuclease-free water                          Top up to 50

3.2.2 Set Up the Thermal Cycling Under the Following Program

Step                    Temperature (°C)    Time
Initial denaturation    95                  3 min
35 cycles               95                  15 s
                        56                  15 s
                        72                  1 min/kb
Final extension         72                  3 min
Hold                    16

3.3 Purification of the DNA Fragments



1. If the DNA template used in PCR is a plasmid, add 1 μL of FastDigest DpnI per 50 μL of PCR product and incubate for 1 h at 37 °C to remove the DNA template (see Note 4).
2. Run the PCR products on a 0.8% agarose gel in 1× TAE buffer using the DNA gel electrophoresis setup and visualize them with the gel imaging system to check the DNA size.
3. If only the target DNA band is present without any undesired bands, the PCR product can be recovered with the PCR extraction kit according to the manufacturer's protocol. If there is a large amount of undesired product, cut out the desired DNA band under the transilluminator and gel-purify the DNA fragment with the DNA gel extraction kit according to the manufacturer's protocol.
4. Measure the final DNA concentrations using the NanoDrop.
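When several fragments are amplified in parallel, the 50 μL recipe of Subheading 3.2.1 and the 1 min/kb extension rule of Subheading 3.2.2 reduce to simple arithmetic, sketched below. The function names and the `overage` parameter are hypothetical conveniences, and including the template in the shared mix is an assumption that only holds when all reactions use the same template.

```python
import math

# Per-reaction volumes (uL) from the 50 uL recipe; water makes up the remainder.
PER_REACTION_UL = {
    "forward_primer_10uM": 2.0,
    "reverse_primer_10uM": 2.0,
    "dNTP_10mM": 1.0,
    "high_fidelity_polymerase": 1.0,
    "polymerase_buffer_5x": 10.0,
    "dna_template": 1.0,
}
REACTION_VOLUME_UL = 50.0


def master_mix(n_reactions, overage=0.0):
    """Scale the recipe to n reactions; overage (e.g. 0.1) adds pipetting excess."""
    scale = n_reactions * (1.0 + overage)
    mix = {name: vol * scale for name, vol in PER_REACTION_UL.items()}
    mix["nuclease_free_water"] = REACTION_VOLUME_UL * scale - sum(mix.values())
    return mix


def extension_seconds(product_bp):
    """Extension time at 72 C using the 1 min/kb rule, rounded up to whole seconds."""
    return math.ceil(product_bp * 60 / 1000)


mix = master_mix(4)
for name, vol in mix.items():
    print(f"{name}: {vol:.1f} uL")
print("extension for a 2.5 kb product:", extension_seconds(2500), "s")
```

The same scaling applies to the DpnI treatment in step 1 above: one microliter of enzyme per 50 μL of PCR product.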

3.4 Preparation of the Electrocompetent Cells

All procedures in this section should be carried out on a clean bench. All of the experimental supplies, including tubes, tips, ddH2O, media, and antibiotics, should be sterilized by standard autoclaving (121 °C, 15 min) or membrane filtration.

1. Streak the E. coli cells with genomic integration of arabinose-induced T4 DNA Lig and T5 DNA Exo (DH5α::araC-T4T5 or Turbo::araC-T4T5) onto an LB plate and incubate at 37 °C overnight (see Note 5).
2. Pick a single colony from the LB plate, inoculate it into 5 mL of liquid LB medium, and incubate at 37 °C overnight in the incubator shaker with agitation at 250 rpm.
3. Inoculate 1 mL of the overnight culture into 50 mL of fresh LB medium and supplement with 1 mL of arabinose stock solution (10% w/v) to induce the expression of T4 DNA Lig and T5 DNA Exo. Continue to incubate the culture in the incubator shaker at 37 °C until the OD600 reaches 0.6–0.8.
4. Transfer the cell culture to a 50 mL centrifuge tube and centrifuge at 2900× g for 10 min at room temperature to harvest the cells (see Note 6).
5. Discard the supernatant and gently resuspend the pellet in the centrifuge tube with 50 mL of 10% (v/v) sterile glycerol.
6. Repeat steps 4 and 5 three times to completely remove the ions from the cells.
7. Discard the supernatant and gently resuspend the pellet in the centrifuge tube with 1 mL of 10% (v/v) sterile glycerol.
8. Aliquot the electrocompetent cell suspension in 100 μL portions into sterile 1.5 mL microfuge tubes. Keep the aliquots at room temperature for no more than 2 h before use, or store them at -80 °C for long-term storage.

3.5 Electrotransformation of DNA into the Cells

1. Add 10–20 μL of DNA fragments (100–500 ng) to 100 μL of electrocompetent cells and mix gently by tapping the tube (see Notes 7 and 8).
2. Transfer the mixture into a sterile 2-mm electroporation cuvette and electroporate at 2.5 kV, 25 μF, and 200 Ω (see Note 9).
3. Immediately add 1 mL of SOC liquid medium to the electroporated cells, transfer to a sterile 1.5 mL microfuge tube, and recover for 1 h at 37 °C in the incubator shaker.
4. Plate 100 μL of the cell culture onto LB plates with the appropriate antibiotic and incubate at 37 °C overnight (see Note 10).
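Step 1 and Note 8 together imply a small calculation: split a total DNA mass of 100–500 ng between the fragments so that they are equimolar, then check that the pipetted volume stays within 20 μL. The sketch below assumes an average mass of 650 g/mol per base pair of double-stranded DNA, which is a common approximation rather than a value from this protocol; the function names and the example lengths and concentrations are hypothetical.

```python
DA_PER_BP = 650.0  # approximate average mass of one dsDNA base pair (assumption)


def ng_to_pmol(mass_ng, length_bp):
    """Convert a dsDNA mass in ng to pmol for a fragment of the given length."""
    return mass_ng * 1000.0 / (length_bp * DA_PER_BP)


def equimolar_volumes(fragments, total_ng=300.0, max_volume_ul=20.0):
    """Plan a 1:1 molar mix.

    fragments -- dict: name -> (length_bp, concentration_ng_per_ul)
    Returns dict: name -> {"ng", "pmol", "ul"}.
    """
    if not 100.0 <= total_ng <= 500.0:
        raise ValueError("total DNA should be 100-500 ng")
    total_bp = sum(length for length, _ in fragments.values())
    plan = {}
    for name, (length_bp, conc) in fragments.items():
        # Equal moles means each fragment's mass is proportional to its length.
        mass_ng = total_ng * length_bp / total_bp
        plan[name] = {
            "ng": mass_ng,
            "pmol": ng_to_pmol(mass_ng, length_bp),
            "ul": mass_ng / conc,
        }
    total_ul = sum(entry["ul"] for entry in plan.values())
    if total_ul > max_volume_ul:
        raise ValueError(f"mix needs {total_ul:.1f} uL; concentrate the DNA first")
    return plan


plan = equimolar_volumes({"vector": (3000, 50.0), "insert": (1000, 40.0)})
for name, entry in plan.items():
    print(f"{name}: {entry['ng']:.0f} ng = {entry['pmol']:.3f} pmol in {entry['ul']:.2f} uL")
```

With these made-up inputs the vector contributes 225 ng and the insert 75 ng, which are equal molar amounts despite the threefold difference in mass.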

3.6 Validation of the Assembled DNA by Colony PCR

1. Pick single colonies and re-streak them on an LB plate using sterile pipette tips (see Note 11).
2. Gently stir the pipette tip in a 0.25 mL PCR tube containing 20 μL of PCR mixture to transfer a small amount of cells into the PCR mixture.
3. The PCR mixture consists of 10 μL of 2× Taq Master Mix, 8 μL of nuclease-free water, 1 μL of 10 μM forward test primer, and 1 μL of 10 μM reverse test primer (see Note 12).
4. Run the PCR according to the following program: initial denaturation at 95 °C for 10 min, followed by 30 cycles of (95 °C for 15 s, 56 °C for 15 s, 72 °C for 1 min/kb of PCR product), and a final extension at 72 °C for 5 min.
5. Electrophorese 5 μL of the PCR product on a 0.8% agarose gel for 25–30 min to confirm the desired DNA band.
6. Sanger-sequence the PCR product to further confirm the correct assembly of the target DNA fragment.

Summary

PEDA can be used for in vivo assembly of DNA in various microorganisms using short homologous sequences with considerable efficiency. Ralstonia eutropha, Pseudomonas putida, Lactobacillus plantarum, and Yarrowia lipolytica have been experimentally confirmed to work with PEDA. Specifically, the host microorganism should first express T4 Lig and T5 Exo; the DNA fragments are then introduced into the cells by the corresponding DNA transformation method for in vivo DNA assembly. Finally, successful assembly of the DNA can be determined by colony PCR. The final plasmid can be extracted using a plasmid extraction kit according to the supplier's protocol.
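For the colony PCR in Subheading 3.6, the 20 μL recipe can be scaled to the number of colonies screened (Note 12), and the band expected on the gel follows from where the test primers sit on the backbone (Note 3). The sketch below is a hedged illustration: the function names and flank lengths are hypothetical, and the "empty vector" size assumes a backbone that closed without the insert, which is only one possible source of a wrong band.

```python
import math

# Per-colony volumes (uL) of the 20 uL colony PCR mixture in step 3.
COLONY_PCR_UL = {
    "taq_master_mix_2x": 10.0,
    "nuclease_free_water": 8.0,
    "forward_test_primer_10uM": 1.0,
    "reverse_test_primer_10uM": 1.0,
}


def colony_pcr_mix(n_colonies):
    """Total volumes for n colonies; Note 12 asks for at least 3 per plate."""
    if n_colonies < 3:
        raise ValueError("screen at least 3 single colonies per plate")
    return {name: vol * n_colonies for name, vol in COLONY_PCR_UL.items()}


def expected_bands(upstream_flank_bp, insert_bp, downstream_flank_bp):
    """Amplicon sizes for test primers placed on the vector backbone."""
    empty = upstream_flank_bp + downstream_flank_bp
    assembled = empty + insert_bp
    return {
        "assembled_bp": assembled,
        "empty_vector_bp": empty,  # assumption: backbone closed without insert
        "extension_s": math.ceil(assembled * 60 / 1000),  # 1 min/kb
    }


mix = colony_pcr_mix(8)
bands = expected_bands(upstream_flank_bp=150, insert_bp=1200, downstream_flank_bp=150)
print(mix)
print(bands)
```

A 1.5 kb band here indicates the insert is present between the test primers, whereas a 0.3 kb band would point to a backbone without insert; sequencing in step 6 remains the definitive check.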

4 Notes

1. For the assembly of two DNA fragments, a minimum overlap region of 5 bp is acceptable. For more than two DNA fragments or for a DNA library, increasing the overlap region to 20 bp can significantly improve the efficiency.
2. PCR is recommended for the preparation of linearized plasmid fragments because it removes the dependence on endonuclease recognition sites. However, for plasmid backbones over 6 kb, there is an increased risk of mutations arising from PCR amplification, and subsequent DNA sequencing is required.
3. The PCR products need to contain the DNA overlap regions used in the DNA assembly, so that successful assembly can be easily judged by the size of the amplified DNA fragments on the agarose gel. For multiple DNA fragments, or when the two DNA overlap regions are more than 5 kb apart, separate primers can be designed to detect each DNA junction.
4. FastDigest DpnI only cleaves fully adenine-methylated dam sites. Plasmid DNA purified from a dam+ strain is a substrate for DpnI, whereas DNA produced by PCR amplification is not digested by DpnI. Thus, DpnI digestion can remove the plasmid DNA template from the PCR product.
5. For routine DNA cloning, DH5α is recommended. To speed up the experimental process, Turbo is a better choice, as it grows faster than DH5α: individual colonies can be seen within 5 h after transformation, although the efficiency is lower than with DH5α.
6. The entire electrotransformation process was carried out at room temperature, as we found that room temperature did not affect the assembly efficiency compared to the conventional low-temperature condition. Similar room-temperature electrotransformation has also been reported previously [15].
7. If only the target DNA band is present without any undesired bands, a small amount of unpurified PCR product can be used directly for in vivo DNA assembly to simplify the procedure. However, because of the ions present, no more than 5 μL of unpurified PCR product should be added to the mixture for electrotransformation.
8. The highest assembly efficiency is achieved when the molar ratio of the DNA fragments is 1:1.
9. A high salt concentration in the solution increases the conductivity and causes "arcing" during electroporation, which results in experimental failure. Therefore, it is important to keep the ion content of the solution as low as possible.
10. For routine DNA cloning, spreading 100 μL of the recovery culture onto the appropriate antibiotic plate is sufficient. If more colonies are required, e.g., for DNA library construction, divide the recovery culture into several portions and spread them onto multiple plates.
11. Do not pick too many bacterial cells in this step, as transferring too many cells into the PCR tube is detrimental to colony PCR.
12. At least three single colonies from each plate should be picked for colony PCR. Multiple aliquots of PCR mixture can be prepared in a 1.5 mL microtube and then dispensed into the 0.25 mL PCR tubes.
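Note 4 relies on the template plasmid carrying dam sites, i.e., GATC motifs, for DpnI to cut. A quick in silico check of the template sequence is sketched below; the function name and the toy sequences are hypothetical, and the check only counts GATC motifs on a circular sequence. It says nothing about the methylation state, which depends on the strain the plasmid was purified from.

```python
def dpni_sites(plasmid, site="GATC"):
    """Return 0-based start positions of GATC motifs in a circular sequence."""
    seq = plasmid.upper()
    # Append the first bases so that a motif spanning the origin is also found.
    wrapped = seq + seq[: len(site) - 1]
    return [i for i in range(len(seq)) if wrapped.startswith(site, i)]


template = "AAGATCTTTTGATCCC"      # toy sequence with two GATC motifs
print(dpni_sites(template))
print(dpni_sites("TCAAAAGA"))      # motif spans the origin of the circle
```

A template with no GATC motif at all would not be removed by DpnI treatment, in which case gel purification of the PCR product is the safer route.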

Acknowledgments

This work was supported by grants from the National Key R&D Program of China (2021YFC2100500), the National Natural Science Foundation of China (31730003, 32200081), and Shandong Provincial Natural Science Foundation (ZR2021QC021).

References
1. Baek CH, Liss M, Clancy K, Chesnut J, Katzen F (2014) DNA assembly tools and strategies for the generation of plasmids. Microbiol Spectr 2(5):5. https://doi.org/10.1128/microbiolspec.PLAS-0014-2013
2. Ellis T, Adie T, Baldwin GS (2011) DNA assembly for synthetic biology: from parts to pathways and beyond. Integr Biol-UK 3(2):109–118. https://doi.org/10.1039/c0ib00070a
3. Gibson DG, Young L, Chuang RY, Venter JC, Hutchison CA, Smith HO (2009) Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods 6(5):343–345. https://doi.org/10.1038/Nmeth.1318
4. Xia Y, Li K, Li J, Wang T, Gu L, Xun L (2019) T5 exonuclease-dependent assembly offers a low-cost method for efficient cloning and site-directed mutagenesis. Nucleic Acids Res 47(3):e15. https://doi.org/10.1093/nar/gky1169
5. Quan JY, Tian JD (2009) Circular polymerase extension cloning of complex gene libraries and pathways. PLoS One 4(7). https://doi.org/10.1371/journal.pone.0006441
6. Sleight SC, Bartley BA, Lieviant JA, Sauro HM (2010) In-fusion BioBrick assembly and re-engineering. Nucleic Acids Res 38(8):2624–2636. https://doi.org/10.1093/nar/gkq179
7. Chao R, Yuan Y, Zhao H (2015) Recent advances in DNA assembly technologies. FEMS Yeast Res 15(1):1–9. https://doi.org/10.1111/1567-1364.12171
8. Watson JF (2019) In vivo DNA assembly using common laboratory bacteria: a re-emerging tool to simplify molecular cloning. J Biol Chem 294(42):15271–15281. https://doi.org/10.1074/jbc.REV119.009109
9. Ogawa T, Iwata T, Kaneko S, Itaya M, Hirota J (2015) An inducible recA expression Bacillus subtilis genome vector for stable manipulation of large DNA fragments. BMC Genomics 16:209. https://doi.org/10.1186/s12864-015-1425-4
10. King BC, Vavitsas K, Ikram NK, Schroder J, Scharff LB, Bassard JE, Hamberger B, Jensen PE, Simonsen HT (2016) In vivo assembly of DNA-fragments in the moss, Physcomitrella patens. Sci Rep 6:25030. https://doi.org/10.1038/srep25030
11. Wang H, Li Z, Jia R, Yin J, Li A, Xia L, Yin Y, Muller R, Fu J, Stewart AF, Zhang Y (2018) ExoCET: exonuclease in vitro assembly combined with RecET recombination for highly efficient direct DNA cloning from complex genomes. Nucleic Acids Res 46(5):2697. https://doi.org/10.1093/nar/gkx1296
12. Pang Q, Ma S, Han H, Jin X, Liu X, Su T, Qi Q (2022) Phage enzyme-assisted direct in vivo DNA assembly in multiple microorganisms. ACS Synth Biol 11(4):1477–1487. https://doi.org/10.1021/acssynbio.1c00529
13. Muller B, Stasiak A (1991) RecA-mediated annealing of single-stranded DNA and its relation to the mechanism of homologous recombination. J Mol Biol 221(1):131–145. https://doi.org/10.1016/0022-2836(91)80210-l
14. Zhang Y, Werling U, Edelmann W (2012) SLiCE: a novel bacterial cell extract-based DNA cloning method. Nucleic Acids Res 40(8):e55. https://doi.org/10.1093/nar/gkr1288
15. Tu Q, Yin J, Fu J, Herrmann J, Li Y, Yin Y, Stewart AF, Muller R, Zhang Y (2016) Room temperature electrocompetent bacterial cells improve DNA transformation and recombineering efficiency. Sci Rep 6:24648. https://doi.org/10.1038/srep24648

Chapter 25

Cell-Free Synthesis and Quantitation of Bacteriophages

Antoine Levrier, Steven Bowden, Bruce Nash, Ariel Lindner, and Vincent Noireaux

Abstract

Cell-free transcription-translation (TXTL) enables an ever-growing number of applications, ranging from the rapid characterization of DNA parts to the production of biologics. As TXTL systems gain in versatility and efficacy, larger DNAs can be expressed in vitro, extending the scope of cell-free biomanufacturing to new territories. The demonstration that complex entities such as infectious bacteriophages can be synthesized from their genomes in TXTL reactions opens new opportunities, especially for biomedical applications. Over the last century, phages have been instrumental in the discovery of many ground-breaking biotechnologies including CRISPR. The primary function of phages is to infect bacteria. In that capacity, phages are considered an alternative approach to tackling current societal problems such as the rise of antibiotic-resistant microbes. TXTL provides an alternative means to produce phages, with several advantages over in vivo synthesis methods. In this chapter, we describe the basic procedures to purify phage genomes, cell-free synthesize phages, and quantitate them using an all-E. coli TXTL system.

Key words TXTL (cell-free transcription-translation), Bacteriophages, E. coli, Spotting assay, Kinetic infection assay

1 Introduction

Cell-free transcription-translation (TXTL) is becoming one of the most convenient and transformative technologies for many developing synthetic biology applications. As the scope of TXTL utilization is rapidly diversifying, the production of biologics and high-value chemicals remains TXTL's major strength [1–4]. Moreover, as cell-free gene expression systems are gaining in versatility and efficacy, expressing larger natural or synthetic DNAs in vitro promises to expand TXTL's capabilities in biomanufacturing to new areas of research. One promising area is the cell-free synthesis (CFS) of phages from their genomes, which is just starting to be exploited for synthetic biology applications [5–8]. Phages are a virtually limitless resource of bioactive materials that are exploited in biotechnology [9], nanotechnology [10], medicine [11], agriculture [12], and bioremediation [13]. Their host specificity has led to several phage therapies and to commercial success in food and agriculture for the control of foodborne pathogens, despite costly manufacturing. Phages are a promising solution to the threat of antibiotic-resistant microbes. CFS of phages has several major advantages compared to current industrial methods: (1) it avoids the use of dangerous pathogens, (2) it reduces the production cost of phages, and (3) it accelerates phage characterization and engineering. CFS of phages also offers an affordable and safe setting for student or biotechnician training, while providing a platform for hands-on practice in molecular biology, next-generation sequencing (NGS), and bioinformatics. This chapter provides a detailed description of basic procedures to extract and purify phage genomes, cell-free synthesize phages, and quantitate phage titers using an all-E. coli TXTL system. We use the Escherichia coli-specific phage T7 (40 kbp, 60 genes) as an example.

Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_25, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024

2 Materials

All the solutions should be prepared using dH2O (ultrapure water, resistivity of 18 MΩ-cm at 25 °C, autoclaved when indicated) and analytical grade reagents. Prepare and store all reagents at room temperature (unless indicated otherwise). Follow all waste disposal regulations when disposing of waste materials. Wear gloves, lab coat, and mask to prevent contamination of the solutions.

2.1 Phage Amplification

1. Phage lysate (from ATCC for phage T7).
2. Phage host strain. For T7: E. coli strain B.
3. Luria-Bertani (LB) broth liquid medium at room temperature (no antibiotic).
4. 1.5% LB agar plates.
5. 0.22 μm cellulose acetate (CA) membrane filters and syringes.
6. Optional: SM buffer (5.8 g NaCl, 2.0 g MgSO4·7H2O, 50 mL 1 M Tris-HCl pH 7.4, in 1 L dH2O. Autoclave and 0.02 μm filter-sterilize before use, and store at room temperature).
7. Optional: 100 kDa membrane Amicon Ultra-15 centrifugal filter units.

2.2 Phage DNA Extraction and Purification

1. Phage lysate (prepared in Subheading 3.1).
2. DNase I recombinant, RNase-free.
3. RNase A 10 mg/mL.
4. 1 M MgCl2 solution.
5. Proteinase K 20 mg/mL.
6. SDS, 20% wt/vol solution, RNase-free.
7. Phenol:chloroform:isoamyl alcohol 25:24:1 (PCI) saturated with 10 mM Tris-HCl, pH 8.0, 1 mM EDTA. Use PCI under a fume hood.
8. 500 mL separatory funnel.
9. Chloroform.
10. 3 M sodium acetate solution, pH 5.2.
11. Ethanol absolute and 70% vol/vol ethanol.

2.3 Cell-Free Gene Expression

1. TXTL system: myTXTL Sigma 70 master mix kit.
2. Chi6 oligonucleotides.
3. Freshly extracted T7 genomic DNA (prepared in Subheading 3.2).
4. 1.5 mL tubes and a 30 °C static incubator.
5. Plasmid P70a-deGFP.

2.4 Dilutions of Cell-Free Synthesized Phage Reactions

1. Cell-free expression reaction of phage T7 (prepared in Subheading 3.3).
2. LB liquid at room temperature (RT) (no antibiotic).
3. Thermomixer.
4. Filter tips.
5. Multichannel pipets.
6. 96-well plates.

2.5 Spotting Assay

1. Dilutions of cell-free synthesized phage reactions (prepared in Subheading 3.4).
2. Overnight E. coli B culture (5 mL LB broth, 37 °C, 250 rpm).
3. 1.5% wt/vol agar-LB plates, solid, at 37 °C (no antibiotic).
4. 1-well plates.
5. 0.7% wt/vol soft bacto-agar, liquid, at 50 °C (keep in a water bath at 50 °C, no antibiotic).
6. Liquid LB broth at RT (no antibiotic).
7. Static incubator at 37 °C.
8. 14 mL round-bottom culture tubes.
9. Filter tips.
10. Multichannel pipets.
11. 96-well plates.

2.6 Kinetic Assay

1. Dilutions of cell-free synthesized phage reactions (prepared in Subheading 3.4).
2. Overnight E. coli B culture (5 mL LB broth, 37 °C, 250 rpm).
3. LB liquid at RT (no antibiotic).
4. Plate reader with absorbance measurement capabilities (e.g., Biotek H1m).
5. 96-well bacterial culture plates (flat bottom, transparent) with lids (e.g., Thermo Scientific™ Nunc MicroWell 96-Well Optical-Bottom Plates).

3 Methods

The whole cycle of procedures takes 3 days (Table 1). Carry out all procedures at room temperature unless otherwise specified.

Table 1 Agenda of the whole method presented in this chapter

Day      Procedures                                   Duration
Day 1    Preparation of phage lysate                  ~4 h
         Titration of phages in the lysate            ~4 h
Day 2    Phage genome extraction and purification     ~4–6 h
         Cell-free synthesis of phages                ~3 h
Day 3    Dilutions of cell-free reactions             ~1 h
         Phage kinetics assay                         ~4 h
         Phage spotting assay                         ~4 h

3.1 Preparation of a Phage Lysate

To get phage genomic DNA ready for TXTL reactions, one must first amplify the phage in cell cultures. The phage T7 is amplified in E. coli strain B liquid cultures using a commercial phage lysate (ATCC BAA-1025-B2). Infectious phages can also be acquired from phage banks (see Note 1). The preparation of a phage lysate takes about 4 h, provided the simple steps on the two preceding days are successful [14].

1. Two days before Day 1: Plate the E. coli B strain on an LB-agar Petri dish to get colonies the next day. Incubate the plate overnight at 37 °C.
2. The day before Day 1: From the agar plate, start an overnight culture of E. coli B strain (5 mL LB broth, 37 °C, 250 rpm), as late as possible.
3. On Day 1, start a culture of E. coli host B by adding 2 mL of an overnight E. coli B culture to 100 mL of LB broth and incubate at 37 °C (250 rpm).
4. Monitor the culture optical density at 600 nm (OD). When the OD reaches 0.3, add 1 mL of phage lysate at 10^9 PFU/mL to start the infection with an initial multiplicity of infection (MOI) between 0.01 and 0.1. This allows roughly 2–3 bacterial cycles before complete lysis and increases the titer of the lysate. Complete lysis is obtained when there are as many phages as bacteria in the culture.
5. Monitor the lysis by OD. For the phage T7, complete lysis of the bacterial culture (phage lysate) is obtained after approximately 2 h of incubation. The bacterial culture is completely cleared (Fig. 1), and the OD is 1)

Fig. 1 Picture of E. coli strain B cultures after 2 h of incubation at 37 °C. On the left, 100 mL of an E. coli strain B culture not infected with phage T7, and on the right, 100 mL of an E. coli strain B culture cleared after infection by the phage T7 (phage lysate)

3.2 Phage DNA Extraction

From 10 mL of phage lysate at 10^9–10^11 PFU/mL (see Subheading 3.1), one can extract 100 μL of T7 genomic DNA at a concentration of about 10 nM in water. The extraction and purification of a phage genome takes about 4–6 h [15]. Alternatively, various phage genome extraction kits based on silica spin column DNA extraction can be used (see Note 3).

1. Add 10 mL of phage lysate to a 50 mL tube.
2. Add 100 μL of 1 M MgCl2 (10 mM MgCl2 final).
3. Add 400 U of DNase I and 750 μg of RNase A (see Note 4). Mix by gently vortexing the tube.
4. Incubate for 30 min at 37 °C.
5. Incubate for 15 min at 75 °C to inactivate the DNase/RNase reactions.
6. Add proteinase K to 100 μg/mL and SDS to 0.5% wt/vol to lyse the phage capsids and release the genomes.
7. Mix the solution by inverting the tube gently several times (do not vortex). Incubate at 55 °C for 30 min. Mix the lysate by inverting the tube 2–3 times during incubation (do not vortex).
8. Add 10 mL of the pre-treated lysate to a separatory funnel (Fig. 3).
9. Perform steps 9–15 in a fume hood. Add 10 mL of phenol:chloroform:isoamyl alcohol (PCI).
10. Close the stopper and invert slowly. Hold the separatory funnel tightly at the stopper and the stopcock. Release the pressure by opening the stopcock towards the back of the hood. Close the stopcock and shake the funnel gently. Vent it again. Repeat this step until no more gas escapes (see Note 5).
11. Allow the layers to separate for about 20–30 min until 3 phases are clearly separated (see Note 6).
12. Remove the stopper and drain the bottom layer and the interphase into a waste container.
13. Add 10 mL of PCI and repeat the procedure (steps 9–12).
14. Add 10 mL of PCI and repeat the procedure again (steps 9–12). At this step the interphase should be very thin and the aqueous phase very clear.
15. Add 10 mL of pure chloroform to remove phenol traces. Shake, vent, decant, and drain the bottom layer (chloroform).
16. Recover the aqueous phase (~10 mL), containing trace chloroform and DNA, in a 50 mL glass tube.
17. To the aqueous phase, add sodium acetate to a final concentration of 0.3 M and twice the volume of pre-chilled (4 °C) 100% ethanol.
18. Gently mix by inverting, then precipitate at -80 °C for 1 h (see Note 7).

Fig. 3 Time series of a mixture of PCI and phage lysate in a separatory funnel. Right after mixing, an off-white emulsion is obtained. Upon decanting, the three phases slowly separate until the interphase becomes clearly separated from the upper phase, and the upper phase becomes transparent

19. Recover the tube from -80 °C and centrifuge the precipitated DNA for 20 min at 4000× g at 4 °C (see Note 8).
20. Discard the supernatant by inverting the tube and wash the DNA pellet with 1 mL of 70% ethanol four times without disturbing the pellet.
21. Carefully pipet off all remaining ethanol droplets around the DNA pellet and let the pellet dry for 15 min at room temperature under a fume hood (Fig. 4). Do not let the pellet dry for more than 30 min, as it may then be difficult to resuspend in water.
22. Resuspend the pellet in 100 μL of deionized water.
23. Transfer the resuspended DNA to a 1.5 mL centrifuge tube and measure the DNA concentration using a NanoDrop. Test the phage DNA to ensure the absence of phages by the spotting and kinetic assays (Subheadings 3.5 and 3.6) (see Note 9).

Fig. 4 T7 genome dried pellet after precipitation and wash. The pellet can be white to brown

3.3 TXTL of Phages

The goal is to synthesize the phage T7 from its purified genome (Subheadings 3.1 and 3.2) in a TXTL reaction. This has already been demonstrated with an all-E. coli TXTL system [5, 7, 16], now sold by Arbor Biosciences under the name myTXTL. Transcription and translation are performed by endogenous molecular components provided by an E. coli cytoplasmic extract. A typical TXTL reaction is composed of 33% (v/v) of E. coli extract. The remaining 67% of the reaction volume is composed of an energy mixture, amino acids, ions (magnesium and potassium), a molecular crowder (PEG8000), and the DNAs to be expressed. A complete description of this system has been published [16, 17]. For cell-free expression of the T7 phage, 3 μM of chi6 DNA is added to the mix to inhibit degradation of the linear dsDNA genome by the RecBCD complex [18]. The myTXTL Sigma 70 master mix kit provides 75 μL aliquots that make up 75% of the final volume (e.g., 9 μL of the mix for a final reaction volume of 12 μL). The kit contains all the necessary components except chi6 and the DNAs to be expressed. The cell-free synthesis of phages takes about 3 h.

1. Thaw a 75 μL myTXTL Sigma 70 master mix kit aliquot on ice. Vortex the mix gently.
2. Split the mix into 9 μL aliquots in sterile 1.5 mL tubes.
3. Add 1 μL of Chi6 DNA at 36 μM (3 μM final, see Note 10) to each of the 9 μL reactions.
4. Add 1.2 μL of T7 genome at 10 nM to one tube (1 nM final concentration) and bring to 12 μL by adding 0.8 μL of water.
5. To a second tube, add 1.2 μL of plasmid P70a-deGFP at 50 nM (5 nM final concentration) and bring to 12 μL by adding 0.8 μL of water. This reaction is a control (synthesis of deGFP).
6. To a third tube, add 2 μL of water (negative control).
7. Vortex each reaction gently.
8. Incubate the reactions overnight at 29 °C.
9. To perform the spotting assay (Subheading 3.5) and the kinetic assay (Subheading 3.6) the next day, start an overnight culture of E. coli host B (5 mL LB broth, 37 °C, 250 rpm), as late as possible.

3.4 Serial Dilutions of Cell-Free Synthesized Phages

Ten-fold serial dilutions of the reactions are prepared in LB to determine (by the spotting assay and/or the kinetic assay) the concentration of infectious T7 phages synthesized overnight in TXTL. The serial dilution takes about 1 h using multichannel pipets.

1. Recover the TXTL reaction tubes from the incubator. Add 108 μL of LB to each of the 12-μL reactions (first ten-fold dilution).
2. Place the tubes on a thermomixer and mix for 20 min at 1000 rpm and 37 °C (see Note 11).
3. Serially dilute the reactions ten-fold by transferring 10 μL of each dilution into 90 μL of LB (see Notes 12 and 13).
4. Store the cell-free phage dilutions at 4 °C for spotting and optical density kinetic assays.
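The cumulative dilution factor of each tube in this series can be tracked with a short calculation. The Python sketch below assumes that the first step is 12 μL of reaction plus 108 μL of LB and that every later step transfers 10 μL into 90 μL of LB (our reading of step 3); the function name, parameter names, and the choice of ten steps are illustrative and not part of the protocol.

```python
import math


def dilution_series(n_steps, first_sample_ul=12.0, first_diluent_ul=108.0,
                    transfer_ul=10.0, diluent_ul=90.0):
    """Return the cumulative dilution factor after each of n_steps dilutions."""
    factors = []
    cumulative = 1.0
    for step in range(n_steps):
        if step == 0:
            # First dilution: 12 uL TXTL reaction + 108 uL LB
            fold = (first_sample_ul + first_diluent_ul) / first_sample_ul
        else:
            # Assumed: 10 uL of the previous dilution + 90 uL LB
            fold = (transfer_ul + diluent_ul) / transfer_ul
        cumulative *= fold
        factors.append(cumulative)
    return factors


series = dilution_series(10)  # ten steps chosen for illustration
for number, factor in enumerate(series, start=1):
    print(f"dilution {number}: 10^{round(math.log10(factor))}")
```

Changing the volume arguments shows how a pipetting scheme other than 1:10 would alter the dilution factor assigned to each spot or well.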

Antoine Levrier et al.

Fig. 5 Results of the spotting assay for the cell-free synthesis of T7 phages. Lines A, B, and C are three parallel ten-fold serial dilutions (prepared in Subheading 3.4) of phage lysate (prepared in Subheading 3.1). gDNA 1 and 2 are the spotting of 5 μL of T7 genomic DNA stocks from two different DNA extractions prepared on two different days. No plaques are detected. Lines D, E, and F are the spotting of cell-free phage dilutions of three independent TXTL reactions (see Subheading 3.4) prepared from gDNA 1, and lines G, H, and I from gDNA 2. The dilution factors are indicated as powers of 10. 3 μL of each dilution were spotted on a soft agar bacterial layer and incubated at 37 °C overnight. The average PFU/mL and standard deviations are calculated based on the plaque counts of the last dilution with visible plaques (1–20 plaques)

3.5 Phage-Host Infection Spotting Assay

The spotting assay consists of adding a small volume of each of the dilutions of a phage solution to a lawn of the host strain. After several hours, the lawn of bacteria is lysed wherever phages are present, forming translucent circles in the lawn. At low phage concentrations, single plaques are observed that enable titration of phages (Figs. 2 and 5). In the case of T7, the spotting assay takes about 1 h to set up and 3–4 h to incubate.

1. Warm up 1.5% agar-LB plates at 37 °C for at least 1 h.
2. Make 100 mL of 0.7% soft agar solution. Add 2.5 g LB and 0.7 g Bacto-agar to a 100 mL bottle. Fill to 100 mL with de-ionized water. Autoclave at 121 °C for 15 min. Set the water bath to 50 °C.
3. Once the autoclave cycle is finished, incubate the soft agar solution at 50 °C in the water bath. Wait at least 15 min so that the temperature equilibrates.
4. Plate the soft-agar layer as follows: to a 14 mL round-bottom culture tube, add 5 mL liquid soft agar (from the 50 °C water bath) and 100 μL of the overnight culture of E. coli host B (5 mL LB broth, 37 °C, 250 rpm). Vortex gently. Using a 5 mL pipette, slowly dispense 2.5 mL of solution onto the center of each plate (avoid bubbles). Gently tilt the dish such that the solution coats the entire top surface of the agar plate. Let the soft-agar plates solidify for 15 min at room temperature on a flat surface. The dryness of the agar is very important: if too moist, the drops will run and coalesce; if too dry, the bacteria will grow poorly.
5. Spotting: for each phage dilution, add 3 μL on the soft agar. Upon phage spotting, touch the surface of the soft agar with the tip to mark the position of the center of the droplet. Space the spots evenly (it is helpful to use a 96-well plate lid with condensation rings as a guide under the soft-agar plate).
6. Let the plate dry for an additional 15 min on the bench to make sure all the droplets are absorbed by the soft agar. Label the plate and mark the positions of the spots underneath the plate.
7. Place the plates at 37 °C, facing down, for 4 h (lysis should become visible after 2 h of incubation for T7 phage).
8. The first dilutions (typically 10¹–10⁵) usually yield a clear and uniform area slightly bigger than the dispensed droplet for T7 spots on the E. coli host. Plaques become progressively countable within the spotted droplet area at higher dilutions.
9. Count the plaques at the dilution where each spot has approximately 1–20 plaques.
10. PFU/mL: example for 12 plaques at the 10⁷ dilution. There are 12 PFU in 3 μL, i.e., 4 × 10³ PFU/mL in the 10⁷ dilution. This corresponds to 4 × 10¹⁰ PFU/mL in the undiluted sample. A more accurate phage titer is obtained by increasing the number of spots and calculating the mean and standard deviation (see Note 14).

3.6 Phage-Host Infection Kinetics Assay

The spotting assay enables a direct count of the plaques and hence a precise phage titer in PFU/mL for phage in lysate, phage in SM buffer, or cell-free synthesized phages. The kinetic assay, based on optical density readings over time in a multiwell plate, also enables quantifying the concentration of phages in a solution, such as a TXTL reaction (Figs. 6 and 7). The kinetic assay has the advantage of providing the time course of infection for each phage dilution [19]. The more concentrated the phages, the quicker and less variable the lysis. With increasing dilution, fewer and fewer phages are initially present in each well, which increases the variability between wells. Despite this, comparing the titers from the spotting assay with those from the liquid infection assay for T7 phage in E. coli B shows that the sensitivity of the liquid assay (taken as the last dilution at which all three wells lyse) is around 10 PFU/mL. In the case of T7, the kinetics assay takes about 2–4 h. The duration of the kinetics assay must be tested for other phages.
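The spotting-assay titer that serves as the reference here (Subheading 3.5, step 10) follows from the plaque count, the spotted volume, and the dilution factor. The following Python sketch reproduces the worked example of 12 plaques in 3 μL at the 10⁷ dilution; the function name and the replicate counts are hypothetical and only illustrate how a mean and standard deviation could be obtained from several spots.

```python
from statistics import mean, stdev


def titer_pfu_per_ml(plaques, spot_volume_ul, dilution_factor):
    """Titer of the undiluted sample (PFU/mL) estimated from a single spot."""
    pfu_per_ml_in_dilution = plaques / spot_volume_ul * 1000.0  # uL -> mL
    return pfu_per_ml_in_dilution * dilution_factor


# Worked example from Subheading 3.5: 12 plaques, 3 uL spot, 10^7 dilution
example = titer_pfu_per_ml(12, 3.0, 1e7)

# Hypothetical plaque counts from three replicate spots at the same dilution
replicate_counts = [12, 9, 15]
titers = [titer_pfu_per_ml(count, 3.0, 1e7) for count in replicate_counts]

print(f"single spot: {example:.1e} PFU/mL")
print(f"replicates:  {mean(titers):.1e} +/- {stdev(titers):.1e} PFU/mL")
```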


Fig. 6 T7 stock lysate kinetic assay. Serial dilution of the stock lysate (amplified in Subheading 3.1). Each curve is the result of three independent serial dilutions of the stock (prepared in Subheading 3.4). Only the dilutions where the three wells lysed are represented. The stock concentration was titered at 2 × 10¹⁰ PFU/mL in Subheading 3.5. This indicates that the kinetic assay is sensitive down to ~20 PFU/mL

Fig. 7 Cell-free T7 phage expression from purified T7 genomes (first preparation) kinetic assay. Serial dilution of the cell-free expressed T7 phages (prepared in Subheading 3.4). Each curve is the result of three parallel independent cell-free expression reactions. Only the dilutions where the three wells lysed at the same dilution are represented here. Left: the stock concentration was titered at 2 × 10⁸ PFU/mL in Subheading 3.5. This indicates that the kinetic assay is sensitive down to ~20 PFU/mL. Right: the stock concentration was titered at 3 × 10⁸ PFU/mL in Subheading 3.5. This indicates that the kinetic assay is sensitive down to ~3 PFU/mL

1. Program the plate reader:
(a) Set the temperature to 37 °C.
(b) Set continuous double orbital shaking at 90–180 rpm.
(c) Set a kinetic read for 5 h (T7 phage and E. coli B) with absorbance reads at 600 nm every 2 min.
(d) For analysis, plot the mean and standard deviation of each condition. Subtract the negative control from all conditions to obtain the blank-corrected OD value.
2. Dilute the overnight E. coli B culture to an OD of 0.033 and determine the total volume of culture needed. For a whole 96-well plate, the volume of host culture at OD 0.033 is 96 × 200 μL ≈ 20 mL. From a starting culture at OD = 1, add 660 μL of host culture to 19,340 μL LB to obtain 20 mL at OD 0.033.
3. Dispense 180 μL of the host culture into the wells of a flat-bottom 96-well plate.
4. Add 20 μL of the different dilutions of the overnight TXTL reactions to each well containing the host culture for a final volume of 200 μL per well. For each condition, prepare four replicates. The controls are:
– Positive control = 180 μL host + 20 μL LB (>4 replicates).
– Negative control = 200 μL LB (>4 replicates).
5. Close the 96-well plate with a lid to reduce evaporation.
6. Insert the plate into the preset plate reader.
7. Lysis shows up as an abrupt drop in the growth curves. Lysis occurs quickly for the first dilutions (usually 10¹, 10², 10³, and 10⁴ within the first two hours) and is progressively delayed for the higher dilutions. The first dilution at which growth is not always inhibited (typically 10¹⁰) indicates that less than one phage per well was initially present, providing a first estimate of the initial phage titer of ~10¹⁰–10¹¹ PFU/mL. For example, if the OD drops at the 10⁹ dilution but not at 10¹⁰, at least one phage per well was present at 10⁹; 1 phage in 20 μL corresponds to ~50 PFU/mL, which multiplied by the 10⁹ dilution factor gives ~5 × 10¹⁰ PFU/mL. More accurate titer estimates can be obtained by running a calibration experiment with a phage stock of known concentration (see Notes 15 and 16).
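The two calculations in steps 2 and 7 can be written out explicitly. The Python sketch below applies C1·V1 = C2·V2 to the host dilution and reproduces the order-of-magnitude titer estimate from the last dilution that still lyses. Function names are illustrative, and the estimate assumes roughly one infectious phage per 20 μL inoculum at that dilution, as in the example of step 7.

```python
def host_dilution(od_stock, od_target, total_volume_ul):
    """Volumes of culture and LB (uL) needed to reach od_target (C1*V1 = C2*V2)."""
    culture_ul = od_target / od_stock * total_volume_ul
    return culture_ul, total_volume_ul - culture_ul


def endpoint_titer(last_lysing_dilution, inoculum_ul=20.0):
    """Rough titer (PFU/mL), assuming ~1 phage per well at the last lysing dilution."""
    pfu_per_ml_in_dilution = 1.0 / inoculum_ul * 1000.0  # 1 phage in 20 uL = 50 PFU/mL
    return pfu_per_ml_in_dilution * last_lysing_dilution


culture_ul, lb_ul = host_dilution(od_stock=1.0, od_target=0.033, total_volume_ul=20000.0)
estimate = endpoint_titer(1e9)  # OD drops at 10^9 but not at 10^10

print(f"{culture_ul:.0f} uL culture + {lb_ul:.0f} uL LB")
print(f"estimated titer: {estimate:.0e} PFU/mL")
```

Because the overnight culture rarely has an OD of exactly 1, passing the measured value as `od_stock` gives the corresponding volumes.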

4 Notes

1. The Felix d'Hérelle Reference Center for Bacterial Viruses, Leibniz Institute DSMZ (German Collection of Microorganisms and Cell Cultures), Bacteriophage Bank of Korea, American Type Culture Collection (ATCC) Bacteriophage Collection, and National Collection of Type Cultures (NCTC) Bacteriophage Collection.
2. Buffer exchange slightly increases DNA purity. It is also relevant for other experiments requiring a pure phage solution free of small bacterial effectors.
3. Kits that offer an alternative to phenol/chloroform extraction are commercially available, e.g., from Norgen Biotek (46800) and bioWORLD (10760112). However, the genomic DNA might be fragmented by the silica columns, which would reduce cell-free phage expression.
4. MgCl₂ is a DNase/RNase cofactor.
5. Significant pressure might build up in the separatory funnel during the first extraction.
6. The bottom layer is the phenol phase containing lipids and cellular debris, the interphase is an emulsion containing aggregated proteins, and the upper layer is the aqueous phase containing the genomic DNA.
7. The mixture should not freeze at −80 °C.
8. A free-floating DNA precipitate should be visible when recovering the tube from the −80 °C freezer.
9. TXTL systems are sensitive to the presence of solvents. Notably, ethanol and phenol strongly inhibit cell-free gene expression. It is critical to obtain a genome as pure as possible.
10. Chi6_sense: TCACTTCACTGCTGGTGGCCACTGCTGGTGGCCACTGCTGGTGGCCACTGCTGGTGGCCACTGCTGGTGGCCACTGCTGGTGGCCA
Chi6_antisense: TGGCCACCAGCAGTGGCCACCAGCAGTGGCCACCAGCAGTGGCCACCAGCAGTGGCCACCAGCAGTGGCCACCAGCAGTGAAGTGA
11. Do not try to pipet the reaction as it is typically very viscous. There is no need to use the thermomixer after the first dilution.
12. Use filter pipet tips and pipet up and down a dozen times to ensure proper mixing at each step of the serial dilution.
13. A convenient way to do the additional dilutions is to use a 96-well plate and multichannel pipets.
14. The T7 genomes prepared in Subheading 3.2 are also spotted to control for the presence of residual phages in the DNA.
15. The percentage of inhibition (PI) can be calculated from the OD curves as follows:

\[ \mathrm{PI} = \frac{(A_{\mathrm{control}} - A_{\mathrm{blank}}) - (A_{\mathrm{phage}} - A_{\mathrm{blank}})}{A_{\mathrm{control}} - A_{\mathrm{blank}}} \times 100 \]

where A_control is the area under the curve of the positive control, A_blank is the area under the curve of the negative control, and A_phage is the area under the curve of a given phage dilution. These areas under the curve are calculated between two time points arbitrarily defined to start before the first lysis (SPD) and end after the last one (EPD).
16. The typical titers obtained with T7 are 10⁹–10¹² PFU/mL. Titers for other phages have been measured [5, 16]. For instance, MS2 and phiX174 (both E. coli phages) have titers of about 10¹² PFU/mL.
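The percentage of inhibition defined in Note 15 can be computed from plate-reader traces with a few lines of code. The Python sketch below is a minimal illustration that assumes trapezoidal integration between the two chosen time points; the function names and the OD values are invented for demonstration and are not measured data.

```python
def auc(times, values, t_start, t_end):
    """Trapezoidal area under the curve, using the sampled points in [t_start, t_end]."""
    points = [(t, v) for t, v in zip(times, values) if t_start <= t <= t_end]
    return sum((t2 - t1) * (v1 + v2) / 2.0
               for (t1, v1), (t2, v2) in zip(points, points[1:]))


def percent_inhibition(times, control, phage, blank, t_start, t_end):
    """PI = ((A_control - A_blank) - (A_phage - A_blank)) / (A_control - A_blank) * 100."""
    a_control = auc(times, control, t_start, t_end)
    a_phage = auc(times, phage, t_start, t_end)
    a_blank = auc(times, blank, t_start, t_end)
    return ((a_control - a_blank) - (a_phage - a_blank)) / (a_control - a_blank) * 100.0


# Invented OD600 traces (time in min), for illustration only
times = [0, 30, 60, 90, 120]
blank = [0.05, 0.05, 0.05, 0.05, 0.05]     # negative control: LB only
control = [0.10, 0.20, 0.40, 0.70, 1.00]   # positive control: host without phage
phage = [0.10, 0.20, 0.30, 0.10, 0.05]     # host plus one phage dilution

pi_value = percent_inhibition(times, control, phage, blank, t_start=0, t_end=120)
print(f"PI = {pi_value:.1f}%")
```

A well that grows like the positive control gives a PI of 0, and a well whose trace equals the blank gives a PI of 100.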


Acknowledgments

This material is based upon work supported by the National Science Foundation (award CBET FMRG 2228971).

References

1. Pardee K, Slomovic S, Nguyen PQ et al (2016) Portable, on-demand biomolecular manufacturing. Cell. https://doi.org/10.1016/j.cell.2016.09.013
2. Pardee K, Green AA, Takahashi MK et al (2016) Rapid, low-cost detection of Zika virus using programmable biomolecular components. Cell. https://doi.org/10.1016/j.cell.2016.04.059
3. Wilding KM, Schinn SM, Long EA, Bundy BC (2018) The emerging impact of cell-free chemical biosynthesis. Curr Opin Biotechnol. https://doi.org/10.1016/j.copbio.2017.12.019
4. Silverman AD, Karim AS, Jewett MC (2020) Cell-free gene expression: an expanded repertoire of applications. Nat Rev Genet. https://doi.org/10.1038/s41576-019-0186-3
5. Rustad M, Eastlund A, Marshall R et al (2017) Synthesis of infectious bacteriophages in an E. coli-based cell-free expression system. J Vis Exp. https://doi.org/10.3791/56144
6. Rustad M, Eastlund A, Jardine P, Noireaux V (2018) Cell-free TXTL synthesis of infectious bacteriophage T4 in a single test tube reaction. Synth Biol. https://doi.org/10.1093/synbio/ysy002
7. Shin J, Jardine P, Noireaux V (2012) Genome replication, synthesis, and assembly of the bacteriophage T7 in a single cell-free reaction. ACS Synth Biol. https://doi.org/10.1021/sb300049p
8. Garenne D, Bowden S, Noireaux V (2021) Cell-free expression and synthesis of viruses and bacteriophages: applications to medicine and nanotechnology. Curr Opin Syst Biol 28:100373. https://doi.org/10.1016/j.coisb.2021.100373
9. Harada LK, Silva EC, Campos WF et al (2018) Biotechnological applications of bacteriophages: state of the art. Microbiol Res. https://doi.org/10.1016/j.micres.2018.04.007
10. Daube SS, Arad T, Bar-Ziv R (2007) Cell-free co-synthesis of protein nanoassemblies: tubes, rings, and doughnuts. Nano Lett. https://doi.org/10.1021/nl062560n
11. Cui Z, Guo X, Feng T, Li L (2019) Exploring the whole standard operating procedure for phage therapy in clinical practice. J Transl Med. https://doi.org/10.1186/s12967-019-2120-z
12. Keen EC (2015) A century of phage research: bacteriophages and the shaping of modern biology. BioEssays. https://doi.org/10.1002/bies.201400152
13. Sharma RS, Karmakar S, Kumar P, Mishra V (2019) Application of filamentous phages in environment: a tectonic shift in the science and practice of ecorestoration. Ecol Evol. https://doi.org/10.1002/ece3.4743
14. Bonilla N, Barr JJ (2018) Phage on tap: a quick and efficient protocol for the preparation of bacteriophage laboratory stocks. Methods Mol Biol 1838:37–46. https://doi.org/10.1007/978-1-4939-8682-8_4
15. Pickard DJJ (2009) Preparation of bacteriophage lysates and pure DNA. In: Clokie MRJ, Kropinski AM (eds) Bacteriophages. Humana Press, Totowa, pp 3–9
16. Garamella J, Marshall R, Rustad M, Noireaux V (2016) The all E. coli TX-TL toolbox 2.0: a platform for cell-free synthetic biology. ACS Synth Biol 5. https://doi.org/10.1021/acssynbio.5b00296
17. Sun ZZ, Hayes CA, Shin J et al (2013) Protocols for implementing an Escherichia coli based TX-TL cell-free expression system for synthetic biology. J Vis Exp. https://doi.org/10.3791/50762
18. Marshall R, Maxwell CS, Collins SP et al (2017) Short DNA containing χ sites enhances DNA stability and gene expression in E. coli cell-free transcription–translation systems. Biotechnol Bioeng 114. https://doi.org/10.1002/bit.26333
19. Rajnovic D, Muñoz-Berbel X, Mas J (2019) Fast phage detection and quantification: an optical density-based approach. PLoS One 14:e0216292. https://doi.org/10.1371/journal.pone.0216292

Chapter 26

Multimodal Control of Bacterial Gene Expression by Red and Blue Light

Stefanie S. M. Meier, Elina Multamäki, Américo T. Ranzani, Heikki Takala, and Andreas Möglich

Abstract

By applying sensory photoreceptors, optogenetics realizes the light-dependent control of cellular events and state. Given reversibility, noninvasiveness, and exquisite spatiotemporal precision, optogenetic approaches enable innovative use cases in cell biology, synthetic biology, and biotechnology. In this chapter, we detail the implementation of the pREDusk, pREDawn, pCrepusculo, and pAurora optogenetic circuits for controlling bacterial gene expression by red and blue light, respectively. The protocols provided here guide the practical use and multiplexing of these circuits, thereby enabling graded protein production in bacteria at analytical and semi-preparative scales.

Key words ANTAR, Bacteriophytochrome, Gene expression, Histidine kinase, Light-oxygen-voltage, Optogenetics, RNA binding, Sensory photoreceptor, Synthetic biology, Two-component system

Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_26, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024

1 Introduction

Sensory photoreceptors mediate the sensation of light across manifold organisms [1, 2]. In optogenetics, photoreceptors double as genetically encoded actuators and enable the light-dependent perturbation of cellular traits and processes in a reversible, noninvasive, and spatiotemporally acute manner [3]. Protein engineering has introduced a wealth of artificial photoreceptors that serve as custom-tailored optogenetic implements for the precision control of ever new and more complex biological processes [4, 5]. Although optogenetics originates in the neurosciences, where it predominantly relies on rhodopsin photoreceptors acting as light-activated ion pumps and channels [6–9], the general concept and benefits of light-based cellular control extend to diverse organisms and applications in cell biology and biotechnology. Within bacteria, gene expression is a particularly frequent and versatile subject of optogenetics [10]. Optogenetic gene regulation is mainly exerted at the level of transcription initiation, but other control points also exist. Light-dependent gene expression affords spatial and temporal definition, precise dosing, scope for automation, noninvasiveness, and reversibility. These benefits set the optogenetic approach apart from conventional chemical induction, e.g., via L-arabinose or isopropyl-β-thiogalactoside, and may aid in addressing challenges of irreversibility and poor tunability.

Depending on the underlying photoreceptor, optogenetic circuits for the regulation of bacterial gene expression are sensitive to distinct bands of the electromagnetic spectrum ranging from near-UV to near-infrared light [11]. Sensory photoreceptors usually exhibit a bipartite architecture consisting of a chromophore-binding photosensor and an effector module. Optogenetic applications in bacteria often harness sensory photoreceptors of the light-oxygen-voltage (LOV) [12–17] and bacteriophytochrome (BphP) classes [18–20] that sense blue and red/far-red light, respectively. Cyanobacteriochromes (CBCR) [21], which like the BphPs belong to the phytochrome superfamily, also find frequent use in bacterial optogenetics [19], with individual CBCRs possessing different light sensitivities, for instance red and green in the case of CcaS from Synechocystis sp.

In this chapter, we describe the pREDusk and pREDawn systems [20] for down- and upregulating, respectively, bacterial gene expression under red light, and the pCrepusculo and pAurora setups that mediate similar effects but in response to blue light [17]. All four systems are realized as single plasmids and allow the light-dependent expression of target genes included on the same plasmid backbone (Fig. 1). The pREDusk and pREDawn plasmids (Fig. 1a, b) descend from the earlier, blue-light-responsive pDusk and pDawn systems [12, 22], which are widely used in bacterial optogenetics [10].

Sensitivity to red light in pREDusk and pREDawn is conferred by the chimeric sensor histidine kinase DrF1 [20], which derives from the fusion of the photosensory core module (PCM) of the Deinococcus radiodurans BphP [23] and the effector moiety of the Bradyrhizobium japonicum sensor histidine kinase FixL [24]. Immediately upstream of DrF1, a heme oxygenase (HO), also from D. radiodurans, is included in the same operon to supply the chromophore biliverdin via oxidative cleavage of heme. DrF1 and its cognate response regulator FixJ, placed downstream of DrF1 within the same operon, form a red-light-responsive two-component system [25, 26]. In the case of pREDusk [20], DrF1 phosphorylates FixJ in darkness, thus enabling its binding to the FixK2 promoter and ramping up gene expression (Fig. 1a). By contrast, red light strongly reduces the net kinase activity of DrF1 and essentially shuts off target-gene expression. In pREDawn, the signaling output is inverted by a genetic circuit comprising the λ phage cI repressor and its associated target promoter pR [12, 27]. Red light thus strongly upregulates target-gene expression in pREDawn rather than inhibiting it (Fig. 1b). Both the pREDusk and pREDawn plasmids possess a ColE1 origin of replication (ori), and separate versions of the vectors are available with ampicillin (Amp), kanamycin (Kan), and streptomycin (Strep) resistance markers [20].

Fig. 1 Architecture and light responses of the pREDusk, pREDawn, pCrepusculo, and pAurora plasmids. (a) pREDusk relies on the chimeric photoreceptor DrF1, which consists of the photosensory core module of the bacterial phytochrome from Deinococcus radiodurans and the histidine kinase effector from Bradyrhizobium japonicum FixL. Together with the response regulator FixJ, also from B. japonicum, DrF1 forms a two-component system and regulates the expression of a gene of interest (GOI), with strong expression in darkness but not under red light. A heme oxygenase (HO) enzyme provides the biliverdin chromophore of DrF1. (b) The pREDawn architecture extends pREDusk by a gene cassette comprising the λ phage cI repressor and its promoter pR. The system response is thereby inverted and gene expression is induced by red light. (c) pCrepusculo is based on the photoreceptor PAL from Nakamurella multipartita, which sequence-specifically binds to small RNA hairpins upon blue-light exposure. By embedding said hairpins in the ribosome-binding site upstream of the GOI, its expression is subjected to light control. The expression in darkness is higher than under blue light. (d) The response of pAurora is inverted by the λ-cI-based gene-inversion cassette and affords gene expression that is activated by blue light rather than being repressed

pCrepusculo and pAurora differ from the above systems not only in their color sensitivity, i.e., blue instead of red and far-red, but also in that light-dependent regulation is achieved at the mRNA level [17] (Fig. 1c, d). The two systems leverage the RNA-binding LOV receptor PAL from Nakamurella multipartita [28], which upon photoactivation by blue light sequence-specifically interacts with small RNA hairpins with up to several hundredfold enhanced affinity [17]. By interspersing the RNA hairpin with the ribosome-binding site of a target gene, light-induced binding by PAL can modulate the expression of this gene. This concept is realized in pCrepusculo (Fig. 1c), which affords downregulation of target genes under blue light. As described for pREDawn above, a λ-cI-based gene cassette inverts and amplifies the light-dependent signal output in pAurora (Fig. 1d). Given that pCrepusculo and pAurora use a CDF ori and a Strep resistance marker, they can be readily combined inside the same bacterial cell with pREDusk and pREDawn (that is, with the Amp- and Kan-conferring plasmid versions). In this manner, the multiplexed control of two separate target genes is achieved [20].

We here provide protocols detailing the deployment of the pREDusk, pREDawn, pCrepusculo, and pAurora plasmids for light-regulated gene expression in Escherichia coli. The performance of each light-responsive genetic circuit can be assessed separately using fluorescent reporter genes and flow cytometry (see Subheading 3.1). Due to different replication origins, light color sensitivities, and regulatory mechanisms, pREDusk and pREDawn lend themselves to combinations with pCrepusculo and pAurora, which we describe in Subheading 3.2. Multiplexed gene expression using combinations of these plasmids is scalable to the semi-preparative scale and supports the graded production of two individual proteins, as controlled by red and blue light (see Subheading 3.3). Taken together, these protocols guide the implementation and, as desired, the multiplexing of the plasmids for light-regulated bacterial expression. Given the increasing relevance of pertinent approaches for biotechnological, biomedical, and material-science use cases [10], the pREDusk, pREDawn, pCrepusculo, and pAurora plasmids appear versatile and widely applicable.
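The qualitative behavior of the four plasmids described above can be summarized as a small lookup table. The Python sketch below is an idealized on/off model written for illustration only: it ignores graded responses, leakiness, and any cross-talk between light colors, and the class, field, and function names are our own assumptions rather than part of the published systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Circuit:
    name: str
    light: str            # light color the circuit responds to
    light_induces: bool   # True if light switches expression on
    ori: str              # origin of replication
    markers: tuple        # resistance markers of available versions


CIRCUITS = {
    "pREDusk": Circuit("pREDusk", "red", False, "ColE1", ("Amp", "Kan", "Strep")),
    "pREDawn": Circuit("pREDawn", "red", True, "ColE1", ("Amp", "Kan", "Strep")),
    "pCrepusculo": Circuit("pCrepusculo", "blue", False, "CDF", ("Strep",)),
    "pAurora": Circuit("pAurora", "blue", True, "CDF", ("Strep",)),
}


def expression_on(circuit, illuminated_colors):
    """Idealized output: expression is on when the light state matches the inducing state."""
    lit = circuit.light in illuminated_colors
    return lit == circuit.light_induces


def compatible(a, b):
    """Co-maintenance needs different origins and at least one pair of distinct markers."""
    if a.ori == b.ori:
        return False
    return any(m_a != m_b for m_a in a.markers for m_b in b.markers)


for colors in (set(), {"red"}, {"blue"}, {"red", "blue"}):
    state = {name: expression_on(c, colors) for name, c in CIRCUITS.items()}
    print(sorted(colors) or "dark", state)
```

In this simplified picture, `compatible(CIRCUITS["pREDusk"], CIRCUITS["pAurora"])` is true because the origins differ and an Amp or Kan version of pREDusk can be paired with the Strep marker of pAurora.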

2 Materials

2.1 Bacterial Strains and Cultivation

The cultivation of bacteria underlies all experiments described in Subheadings 3.1, 3.2, and 3.3 and requires these materials:

1. Lysogeny broth (LB) liquid medium supplemented with 50 μg mL⁻¹ kanamycin (Kan) and/or 100 μg mL⁻¹ streptomycin (Strep).
2. LB agar plates supplemented with 50 μg mL⁻¹ Kan and/or 100 μg mL⁻¹ Strep.
3. E. coli strain of choice, e.g., the CmpX13 strain [29] (see Note 1).
4. Plasmids pREDusk-MCS (Addgene identifier 188970), pREDawn-MCS (188971), pCrepusculo-MCS (190156), and pAurora-MCS (190158) (MCS, multiple cloning site) (see Note 2).
5. Microtiter plate (MTP) shaker (e.g., PMS-1000i Grant-bio).
6. Incubator (e.g., Lucky Reptile), protected from ambient light by covering the windows with tin foil or cardboard.
7. Incubator shaker (e.g., New Brunswick Innova 42R) with window covered for light protection.
8. Light power meter to measure and adjust light intensities (e.g., Newport model 842-PE equipped with a 918D-UV-OD3 silicon detector).

2.2 Implementation and Analysis of Individual Plasmid Systems

The following materials are required for the experiments in Subheading 3.1:

2.2.1 pREDusk and pREDawn

1. Plasmids pREDusk-DsRed and pREDawn-DsRed bearing the red-fluorescent reporter DsRed Express2 [30] and a Kan resistance marker (see Note 3).
2. Empty-vector control pREDusk-MCS (see above).
3. Phosphate-buffered saline pH 7.4 (Fisher Scientific).
4. 10 mM phosphate buffer pH 7.4.
5. 50 mg mL⁻¹ chloramphenicol in 100% ethanol.
6. 10 mg mL⁻¹ tetracycline in water.
7. Arduino-controlled red LED spotlight (e.g., Mightex SLS-0310-C, 656 nm) with 3D-printed holders for illumination of Erlenmeyer flasks (Fig. 2a).
8. 500 mL baffled Erlenmeyer flasks.
9. Microcentrifuge tubes.
10. Flow cytometer (e.g., NovoCyte Quanteon 4025 or BioRad S3e Cell sorter).

2.2.2 pCrepusculo and pAurora

1. Plasmids pCrepusculo-DsRed and pAurora-DsRed bearing the red-fluorescent reporter DsRed Express2 [30] (see Note 3).
2. Empty-vector control pCDF-Duet (Novagen).
3. Panel with blue-light-emitting diodes (e.g., Winger WEHBL01-D1M, 470 nm).
4. ProFlow sheath fluid (BioRad) containing 1 mM Na EDTA, 1.9 mM K phosphate, 3.8 mM KCl, 16.6 mM Na phosphate, 139 mM NaCl, pH 7.
5. 96-deep-well clear MTPs (Axygen P-DW-11-C).
6. 96-well clear MTPs (ThermoFisher 269620).
7. Gas-permeable sealing membrane (Corning BF-410400-S).
8. Flow cytometer, e.g., NovoCyte Quanteon 4025 or S3e Cell sorter (BioRad).

Fig. 2 Illumination setups for optogenetic control. (a) Schematic of illumination setup for Erlenmeyer flasks. (b) Setup for illuminating bacterial cells during incubation comprising: (1) incubator protected from ambient light; (2) Arduino microcontroller; (3) programmable LED matrix embedded in 3D-printed housing (see panel (c) for details); (4) shaker. (c) Schematic of device depicted in panel (b) that allows illumination of bacterial cultures in MTPs from below. (1) Spring clip; (2) gas-permeable membrane; (3) black-wall clear-bottom 96-well MTP; (4) bacterial cultures; (5) 3D-printed adapter; (6) 8 × 8 LED array. (d) Top view of the assembled setup with individual wells exposed to varying intensities of red [(634 ± 8) nm] and blue light [(463 ± 12) nm]

2.3 Multiplexing of Systems

The following materials are required for the experiments in Subheading 3.2:

1. Plasmids pREDusk-YPet and pREDawn-YPet bearing the yellow-fluorescent YPet [31] reporter and a Kan resistance marker (see Note 3).
2. pCrepusculo-DsRed and pAurora-DsRed bearing the red-fluorescent reporter DsRed Express2 [30] and a Strep resistance marker (see Note 3).
3. Empty-vector controls pREDusk-MCS, pREDawn-MCS, pCrepusculo-MCS, and pAurora-MCS (see above).
4. Panel with blue-light-emitting diodes (e.g., Winger WEHBL01-D1M, 470 nm).
5. Panel with red-light-emitting diodes (e.g., Kingbright L53SRCJ4, 660 nm).
6. Arduino-controlled LED illumination setup detailed in [32, 33] (Fig. 2c, d).
7. 3D-printed housing for LED illumination device (templates available for download at http://www.moeglich.uni-bayreuth.de/en/software/index.html).
8. 96-well MTPs with black walls and clear bottom (μclear plates, Greiner 655906).
9. 96-well clear MTPs (ThermoFisher 269620).
10. 96-well black MTPs (ThermoFisher 237108).
11. Multimode MTP reader (e.g., Tecan Infinite M200pro).

2.4 Multiplexed Optogenetic Control of Semi-preparative Protein Production

The following materials are required for the experiments in Subheading 3.3:

1. Plasmid pREDusk-YPet bearing the yellow-fluorescent YPet [31] reporter and a Kan resistance marker (see Note 3).
2. pAurora-DsRed bearing the red-fluorescent reporter DsRed Express2 [30] and a Strep resistance marker (see Note 3).
3. Empty-vector controls pREDusk-MCS and pAurora-MCS (see above).
4. 35 mg mL⁻¹ chloramphenicol in 100% ethanol.
5. 15 mg mL⁻¹ tetracycline in 70% ethanol.
6. 96-well clear MTPs (ThermoFisher 269620).
7. 96-well black MTPs (ThermoFisher 237108).
8. Panel with blue-light-emitting diodes (e.g., Winger WEHBL01-D1M, 470 nm).
9. Panel with red-light-emitting diodes (e.g., Kingbright L53SRCJ4, 660 nm).
10. 250 mL baffled Erlenmeyer flasks.
11. PCR tubes.
12. Multimode MTP reader (e.g., Tecan Infinite M200pro).

3 Methods

3.1 Implementation and Analysis of Individual Plasmid Systems

Subheading 3.1.1 describes the use of the pREDusk and pREDawn plasmids for red-light-controlled bacterial gene expression. Likewise, Subheading 3.1.2 covers the blue-light-responsive pCrepusculo and pAurora setups.


3.1.1 pREDusk and pREDawn

1. Streak bacteria carrying the plasmids pREDusk-DsRed, pREDawn-DsRed, or pREDusk-MCS on LB/Kan agar plates.
2. Incubate the plates overnight (~16 h) at 37 °C under non-inducing conditions, i.e., red light (660 nm, 100 μW cm⁻²) or darkness for pREDusk and pREDawn, respectively.
3. Prepare five 100 mL LB/Kan cultures in 500-mL Erlenmeyer flasks and inoculate them with single colonies from the agar plates (2× pREDusk-DsRed, 2× pREDawn-DsRed, and 1× pREDusk-MCS).
4. Incubate the cultures at 37 °C and 220 rpm agitation under non-inducing conditions, i.e., red light (660 nm, 100 μW cm⁻²) or darkness for pREDusk and pREDawn, respectively.
5. At an optical density of the cultures at 600 nm (OD600) of 0.5, transfer one flask each of pREDusk-DsRed and pREDawn-DsRed to inducing conditions, i.e., darkness or red light (660 nm, 100 μW cm⁻²) for pREDusk and pREDawn, respectively (Fig. 2a). Keep the other flasks under non-inducing conditions, and the pREDusk-MCS culture in darkness (see Note 4).
6. Continue the incubation at 37 °C and 220 rpm agitation for 20 h.
7. Following incubation, pipet 200 μL of each culture into microcentrifuge tubes and supplement with antibiotics by adding 14 μL chloramphenicol solution and 8 μL tetracycline solution (see Note 5).
8. Keep the samples on ice in darkness for 2 h to allow DsRed maturation.
9. Pellet the bacteria by centrifugation at 1845 × g for 2.5 min. Resuspend the pellet in 500 μL PBS and store on ice.
10. For analysis by flow cytometry, dilute the samples ten-fold in PBS.
11. Record single-cell fluorescence of approximately 100,000 cells on a flow cytometer using a 561-nm excitation laser and a (586 ± 20)-nm bandpass emission filter (see Note 6).
12. Fit the frequency distribution of the logarithm of the single-cell fluorescence to a skewed Gaussian probability density function (Fig. 3a, b), e.g., using the Fit-o-mat software [34].
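For readers who prefer to script the analysis of step 12, the following Python sketch shows one possible way to set it up. It assumes the standard skew-normal density (location, scale, shape), which may differ from the parameterization used by Fit-o-mat, and it uses synthetic random numbers in place of real flow-cytometry events. All function names are illustrative; the residual function could be handed to any numerical minimizer to obtain the fit.

```python
import math
import random


def skew_normal_pdf(x, location, scale, shape):
    """Standard skew-normal density; shape = 0 gives the ordinary Gaussian."""
    z = (x - location) / scale
    gauss = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    skew = 1.0 + math.erf(shape * z / math.sqrt(2.0))
    return gauss * skew / scale


def log_histogram(fluorescence, n_bins=50):
    """Density histogram of log10 fluorescence; non-positive events are dropped."""
    logs = [math.log10(f) for f in fluorescence if f > 0]
    low, high = min(logs), max(logs)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for value in logs:
        index = min(int((value - low) / width), n_bins - 1)
        counts[index] += 1
    centres = [low + (i + 0.5) * width for i in range(n_bins)]
    density = [c / (len(logs) * width) for c in counts]
    return centres, density, width


def sum_squared_residuals(centres, density, location, scale, shape):
    """Objective to minimize when fitting the skew-normal to the histogram."""
    return sum((d - skew_normal_pdf(c, location, scale, shape)) ** 2
               for c, d in zip(centres, density))


random.seed(0)
events = [10 ** random.gauss(4.0, 0.3) for _ in range(5000)]  # synthetic, not real data
centres, density, width = log_histogram(events)
good = sum_squared_residuals(centres, density, 4.0, 0.3, 0.0)
bad = sum_squared_residuals(centres, density, 3.0, 0.3, 0.0)
print(f"residuals near the true parameters: {good:.3f}; far from them: {bad:.3f}")
```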

3.1.2 pAurora and pCrepusculo

1. Streak bacteria harboring the plasmids pCrepusculo-DsRed, pAurora-DsRed, or the pCDF control on LB/Strep agar plates.
2. Incubate the plates overnight (~16 h) in darkness at 37 °C.
3. Inoculate triplicates of 500 μL LB/Strep medium for every sample in a 96-deep-well plate by picking single bacterial clones from the respective agar plates.

[Fig. 3, panels a-d: Frequency plotted against Fluorescence [a. u.]]

Fig. 3 Characterization of the light responses of the pREDusk, pREDawn, pCrepusculo, and pAurora systems. The fluorescence of the red-fluorescent reporter DsRed was measured at the single-cell level by flow cytometry. (a) Bacteria harboring pREDusk-DsRed were incubated in darkness (gray curves) or under red light (red curves). The empty-vector control pREDusk-MCS is shown in orange. Note that the abscissa is split with fluorescence values below 100 a. u. shown on a linear scale and those above on a logarithmic scale. (b) As in panel a but for bacteria carrying pREDawn-DsRed. (c) Single-cell fluorescence of bacteria harboring pCrepusculo-DsRed and cultivated in darkness or under blue light (blue curves), with a pCDF empty-vector control displayed in orange. (d) As in panel c but for bacteria equipped with pAurora-DsRed. For each bacterial clone and light condition, data for three biologically independent replicates are shown. For each replicate, at least 10^5 cells were analyzed

4. Seal the plates with a gas-permeable membrane and incubate for 16 h in darkness at 37 °C and 600 rpm agitation.
5. Transfer 2× 2 μL of each culture to 198 μL LB/Strep medium within two separate clear MTPs and seal them with a gas-permeable membrane.
6. Incubate one plate in darkness and the other under blue light (470 nm, 60 μW cm-2) at 37 °C and 800 rpm agitation for 24 h (see Note 7).
7. For flow-cytometry measurements, dilute the cells ten- to thirtyfold in sheath fluid.


8. Analyze single-cell fluorescence on a flow cytometer [excitation lasers at 488 and 561 nm, emission at (585 ± 15) nm] by collecting approximately 200,000 events per sample (see Note 6).
9. Fit the data to log-normal functions (Fig. 3c, d), e.g., by using the Fit-o-mat software [34].

3.2 Multiplexing of Systems

1. Streak bacteria harboring either pREDusk-YPet or pREDawn-YPet combined with either pCrepusculo-DsRed or pAurora-DsRed on LB/Kan + Strep agar plates. As empty-vector controls, streak bacteria harboring the pREDusk-MCS plasmid on LB/Kan plates (see Note 8).
2. Incubate the agar plates overnight (~16 h) at 37 °C in darkness.
3. Inoculate 5-mL LB cultures supplemented with antibiotics for each bacterial clone carrying different plasmid combinations by picking a single colony from the respective agar plates.
4. Incubate for 24 h at 30 °C and 225 rpm agitation under non-inducing conditions, i.e., darkness for the pREDawn/pAurora combination, red light (660 nm, 100 μW cm-2) for pREDusk/pAurora, blue light (470 nm, 60 μW cm-2) for pREDawn/pCrepusculo, and both red and blue light for pREDusk/pCrepusculo.
5. After incubation, dilute each of the cultures 100-fold into 20 mL LB medium supplemented with antibiotics. Transfer 200 μL of the dilution into wells A1-H8 (i.e., 64 wells in total) of a black-walled, clear-bottom MTP and seal with a gas-permeable membrane.
6. Configure the Arduino LED illumination setup such that each of the desired 64 wells (A1-H8) is illuminated with an individualized combination of blue and/or red light at varying intensities (Fig. 2d) (see Note 9).
7. Place the MTP on top of the programmable Arduino illumination setup (Fig. 2b, c) [32, 33].
8. Incubate at 37 °C and 750 rpm agitation for 18 h in an incubator protected against ambient light.
9. After incubation, transfer 50 μL from each well into wells of a clear MTP containing 200 μL H2O each. Measure the absorbance at 600 nm (OD600) of the resulting fivefold dilution using a multimode MTP reader.
10. Next, transfer 50 μL from each well of the first, fivefold dilution into wells of a black MTP containing 200 μL H2O each. Measure the YPet and DsRed fluorescence (F) using excitation wavelengths of (500 ± 9) nm and (530 ± 20) nm, and emission wavelengths of (554 ± 9) nm and (591 ± 20) nm, respectively (see Note 10).


Fig. 4 Multiplexed optogenetic control of bacterial expression by red and blue light. Bacteria harboring either pREDusk-YPet or pREDawn-YPet paired with either pCrepusculo-DsRed or pAurora-DsRed were cultivated under different intensities of red and blue light. The left panels (a, c, e, g) show the YPet fluorescence in the bacterial cultures following incubation, and the right panels (b, d, f, h) report the DsRed fluorescence in these cultures. The individual panels show data for bacteria carrying (a, b) pREDusk-YPet and pCrepusculo-DsRed; (c, d) pREDusk-YPet and pAurora-DsRed; (e, f) pREDawn-YPet and pCrepusculo-DsRed; and (g, h) pREDawn-YPet and pAurora-DsRed. Data represent the mean of three biologically independent replicates

11. Determine F/OD600 ratios for each well and calculate the mean and standard deviation of the biological replicates. Correct the data for background fluorescence of the empty-vector control.


12. Normalize the data and plot as a three-dimensional surface diagram (Fig. 4), e.g., by using Python/matplotlib.

3.3 Multiplexed Optogenetic Control of Semi-preparative Protein Production

1. Streak bacteria harboring the plasmids pREDusk-YPet and pAurora-DsRed on LB/Kan + Strep agar plates. As an empty-vector control, also streak bacteria containing the pREDusk-MCS and pAurora-MCS plasmids (see Note 8).
2. Incubate the agar plates overnight (~16 h) at 37 °C in darkness.
3. Inoculate 5 mL LB/Kan + Strep medium with single bacterial clones from the plates in triplicate.
4. Incubate overnight (~16 h) at 37 °C and 225 rpm agitation at non-inducing conditions, i.e., red light (660 nm, 100 μW cm-2).
5. After incubation, for each replicate use 4× 1 mL culture to inoculate four Erlenmeyer flasks containing 100 mL LB/Kan + Strep (see Note 11).
6. Incubate flasks at 37 °C and 225 rpm agitation at non-inducing conditions, i.e., red light (660 nm, 100 μW cm-2).
7. At an OD600 of ~0.5, transfer one flask for each of the replicates of the pREDusk-YPet/pAurora-DsRed combination or the empty-vector control to red light (660 nm, 100 μW cm-2), blue light (470 nm, 10 μW cm-2), or both red and blue light. The remaining flasks are kept in darkness.
8. Incubate at 37 °C and 225 rpm agitation for 24 h.
9. After incubation, take 200-μL aliquots from each flask and add 23 μL chloramphenicol and 6 μL tetracycline (see Note 5).
10. Keep the samples on ice in darkness for 2 h to allow DsRed and YPet maturation.
11. Measure the optical density at 600 nm (OD600), and the DsRed and YPet fluorescence (F) of the samples with a multimode MTP reader as described in Subheading 3.2.
12. Determine the F/OD600 ratio and calculate the mean and standard deviation across the biological replicates. Correct the data for background fluorescence of the empty-vector control.
13. Normalize the data and plot (Fig. 5).
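The evaluation in steps 12 and 13 (and likewise steps 11 and 12 of Subheading 3.2) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' analysis script: the function names are hypothetical, the background is taken as the mean F/OD600 of the empty-vector replicates, and normalization to the largest value is an assumption suggested by the 0 to 1 ordinate of Fig. 5.

```python
from statistics import mean, stdev


def f_per_od(fluorescence, od600):
    """F/OD600 ratio for each replicate (paired readings)."""
    return [f / od for f, od in zip(fluorescence, od600)]


def corrected_mean_sd(sample_ratios, control_ratios):
    """Subtract the mean empty-vector background, then return mean and s.d."""
    background = mean(control_ratios)
    corrected = [r - background for r in sample_ratios]
    return mean(corrected), stdev(corrected)


def normalize(values):
    """Scale values so that the largest equals 1 (assumed convention)."""
    top = max(values)
    return [v / top for v in values]


# Hypothetical plate-reader values for three biological replicates.
sample = f_per_od([105.0, 110.0, 115.0], [0.5, 0.5, 0.5])
control = f_per_od([5.0, 5.0, 5.0], [0.5, 0.5, 0.5])
avg, sd = corrected_mean_sd(sample, control)
```

Using the sample standard deviation (`stdev`) rather than the population form is a design choice that suits a small number of biological replicates.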

4 Notes

1. To the extent tested, pREDusk, pREDawn, pCrepusculo, and pAurora widely apply to different E. coli bacteria. The use of a specific strain is hence not required.

[Fig. 5: bar chart of Fluorescence / OD600 [a.u.] (scale 0 to 1) for the conditions Red, Dark, Red+Blue, and Blue]

Fig. 5 Multiplexed optogenetic control of protein expression at the semipreparative scale. Bacteria containing the pREDusk-YPet and pAurora-DsRed plasmids were incubated either under red light (i.e., non-inducing conditions), in darkness (i.e., activation of pREDusk), under red and blue light (i.e., activation of pAurora), or under blue light (i.e., activation of both pREDusk and pAurora). The DsRed and YPet fluorescence in the cultures after incubation is shown in pink and green, respectively. Data represent mean ± s.d. of three biologically independent replicates

2. For the subsequent experiments, fluorescent reporters must be introduced into these plasmids, for instance, via restriction or Gibson cloning [35].
3. The reporter may be substituted with another fluorescent protein of choice.
4. For keeping cell cultures in darkness, the flasks can be, for instance, covered with tin foil.
5. The mix of chloramphenicol and tetracycline arrests cell growth and translation.
6. The settings for the excitation laser and the emission filter chosen here apply to DsRed. If another fluorescent protein is used, these values require adjustment.
7. For adjusting the light intensity, take into account the attenuation by the sealing membrane.
8. The combined plasmid systems can be introduced to the bacteria by transformation either simultaneously or sequentially.
9. Rather than continuous illumination, intermittent light with a duty cycle of, e.g., 1:10 may be used, i.e., 20 s illumination, followed by 180 s in darkness.
10. The settings for the excitation laser and the emission filter chosen here apply to DsRed and YPet, respectively. If other fluorescent proteins are used, these values require adjustment.


11. To prevent inadvertent induction by ambient light, this step should be performed under red light (e.g., by using LED stripes).
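The intermittent illumination described in Note 9 can be expressed as a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the time-averaged intensity is plain arithmetic that does not imply the biological response scales linearly with the average light dose.

```python
def duty_cycle(on_seconds, off_seconds):
    """Fraction of each period during which the light is on."""
    return on_seconds / (on_seconds + off_seconds)


def time_averaged_intensity(peak_intensity, on_seconds, off_seconds):
    """Mean intensity over one full on/off period."""
    return peak_intensity * duty_cycle(on_seconds, off_seconds)


def is_lit(t_seconds, on_seconds, off_seconds):
    """True if the LED is on at time t, with each period starting lit."""
    return (t_seconds % (on_seconds + off_seconds)) < on_seconds


# Note 9 example: 20 s illumination followed by 180 s in darkness.
fraction = duty_cycle(20, 180)                      # 1:10 duty cycle
average = time_averaged_intensity(100.0, 20, 180)   # for 100 uW cm-2 peak
```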

Acknowledgements

Financial support was provided by the European Commission (FET Open NEUROPA, grant 863214 to A.M.), the Deutsche Forschungsgemeinschaft (grant MO2192/4-2 to A.M.), the Academy of Finland (grant 330678 to H.T.), a three-year grant from the University of Helsinki (to E.M. and H.T.), the Finnish Cultural Foundation (grant 00220697 to E.M.), and a Bayreuth Humboldt Centre Senior Fellowship 2020 (to H.T.).

References

1. Hegemann P (2008) Algal sensory photoreceptors. Annu Rev Plant Biol 59:167–189
2. Möglich A, Yang X, Ayers RA et al (2010) Structure and function of plant photoreceptors. Annu Rev Plant Biol 61:21–47
3. Deisseroth K, Feng G, Majewska AK et al (2006) Next-generation optical technologies for illuminating genetically targeted brain circuits. J Neurosci 26:10380–10386
4. Losi A, Gardner KH, Möglich A (2018) Blue-light receptors for optogenetics. Chem Rev 118:10659–10709
5. Tang K, Beyer HM, Zurbriggen MD et al (2021) The red edge: bilin-binding photoreceptors as optogenetic tools and fluorescence reporters. Chem Rev 121:14906–14956
6. Nagel G, Ollig D, Fuhrmann M et al (2002) Channelrhodopsin-1: a light-gated proton channel in green algae. Science 296:2395–2398
7. Nagel G, Szellas T, Huhn W et al (2003) Channelrhodopsin-2, a directly light-gated cation-selective membrane channel. Proc Natl Acad Sci U S A 100:13940–13945
8. Boyden ES, Zhang F, Bamberg E et al (2005) Millisecond-timescale, genetically targeted optical control of neural activity. Nat Neurosci 8:1263–1268
9. Zhang F, Wang L-P, Brauner M et al (2007) Multimodal fast optical interrogation of neural circuitry. Nature 446:633–639
10. Ohlendorf R, Möglich A (2022) Light-regulated gene expression in bacteria: fundamentals, advances, and perspectives. Front Bioeng Biotechnol 10:1029403
11. Ziegler T, Möglich A (2015) Photoreceptor engineering. Front Mol Biosci 2:30
12. Ohlendorf R, Vidavski RR, Eldar A et al (2012) From dusk till dawn: one-plasmid systems for light-regulated gene expression. J Mol Biol 416:534–542
13. Baumschlager A, Aoki SK, Khammash M (2017) Dynamic blue light-inducible T7 RNA polymerases (Opto-T7RNAPs) for precise spatiotemporal gene expression control. ACS Synth Biol 6:2157–2167
14. Han T, Chen Q, Liu H (2017) Engineered photoactivatable genetic switches based on the bacterium phage T7 RNA polymerase. ACS Synth Biol 6:357–366
15. Ding Q, Ma D, Liu G-Q et al (2020) Light-powered Escherichia coli cell division for chemical production. Nat Commun 11:2262
16. Dietler J, Schubert R, Krafft TGA et al (2021) A light-oxygen-voltage receptor integrates light and temperature. J Mol Biol 433:167107
17. Ranzani AT, Wehrmann M, Kaiser J et al (2022) Light-dependent control of bacterial expression at the mRNA level. ACS Synth Biol 11:3482–3492
18. Levskaya A, Chevalier AA, Tabor JJ et al (2005) Synthetic biology: engineering Escherichia coli to see light. Nature 438:441–442
19. Tabor JJ, Levskaya A, Voigt CA (2011) Multichromatic control of gene expression in Escherichia coli. J Mol Biol 405:315–324
20. Multamäki E, García de Fuentes A, Sieryi O et al (2022) Optogenetic control of bacterial expression by red light. ACS Synth Biol 11:3354–3367
21. Fushimi K, Narikawa R (2019) Cyanobacteriochromes: photoreceptors covering the entire UV-to-visible spectrum. Curr Opin Struct Biol 57:39–46
22. Diensthuber RP, Bommer M, Gleichmann T et al (2013) Full-length structure of a sensor histidine kinase pinpoints coaxial coiled coils as signal transducers and modulators. Structure 21:1127–1136
23. Davis SJ, Vener AV, Vierstra RD (1999) Bacteriophytochromes: phytochrome-like photoreceptors from nonphotosynthetic eubacteria. Science 286:2517–2520
24. Möglich A, Ayers RA, Moffat K (2009) Design and signaling mechanism of light-regulated histidine kinases. J Mol Biol 385:1433–1444
25. Buschiazzo A, Trajtenberg F (2019) Two-component sensing and regulation: how do histidine kinases talk with response regulators at the molecular level? Annu Rev Microbiol 73:507–528
26. Möglich A (2019) Signal transduction in photoreceptor histidine kinases. Protein Sci 28:1923–1946
27. Elowitz MB, Leibler S (2000) A synthetic oscillatory network of transcriptional regulators. Nature 403:335–338
28. Weber AM, Kaiser J, Ziegler T et al (2019) A blue light receptor that mediates RNA binding and translational regulation. Nat Chem Biol 15:1085–1092
29. Mathes T, Vogl C, Stolz J et al (2009) In vivo generation of flavoproteins with modified cofactors. J Mol Biol 385:1511–1518
30. Strack RL, Strongin DE, Bhattacharyya D et al (2008) A noncytotoxic DsRed variant for whole-cell labeling. Nat Methods 5:955–957
31. Nguyen AW, Daugherty PS (2005) Evolutionary optimization of fluorescent proteins for intracellular FRET. Nat Biotechnol 23:355–360
32. Hennemann J, Iwasaki RS, Grund TN et al (2018) Optogenetic control by pulsed illumination. Chembiochem 19:1296–1304
33. Dietler J, Stabel R, Möglich A (2019) Pulsatile illumination for photobiology and optogenetics. In: Methods in enzymology. Elsevier, pp 227–248
34. Möglich A (2018) An open-source, cross-platform resource for nonlinear least-squares curve fitting. ACS Publications
35. Gibson DG, Young L, Chuang R-Y et al (2009) Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods 6:343–345

Chapter 27

In Silico Design, In Vitro Construction, and In Vivo Application of Synthetic Small Regulatory RNAs in Bacteria

Michel Brück, Bork A. Berghoff, and Daniel Schindler

Abstract

Small regulatory RNAs (sRNAs) are short non-coding RNAs in bacteria capable of post-transcriptional regulation. sRNAs have recently gained attention as tools in basic and applied sciences, for example, to fine-tune genetic circuits or biotechnological processes. Even though sRNAs often have a rather simple and modular structure, the design of functional synthetic sRNAs is not necessarily trivial. This protocol outlines how to use computational predictions and synthetic biology approaches to design, construct, and validate synthetic sRNA functionality for their application in bacteria. The computational tool, SEEDling, matches the optimal seed region with the user-selected sRNA scaffold for repression of target mRNAs. The synthetic sRNAs are assembled using Golden Gate cloning and their functionality is subsequently validated. The protocol uses the acrA mRNA as an exemplary proof-of-concept target in Escherichia coli. Since AcrA is part of a multidrug efflux pump, acrA repression can be revealed by assessing oxacillin susceptibility in a phenotypic screen. However, in case target repression does not result in a screenable phenotype, an alternative validation of synthetic sRNA functionality based on a fluorescence reporter is described.

Key words Synthetic sRNAs, Golden Gate cloning, Post-transcriptional regulation, Synthetic biology, Synthetic sRNA prediction, Functional sRNA profiling, Seed region

Jeffrey Carl Braman (ed.), Synthetic Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2760, https://doi.org/10.1007/978-1-0716-3658-9_27, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2024

1 Introduction

Synthetic biology aims to see biological functions and concepts from the perspective of an engineer. Synthetic biologists perform characterization and standardization of molecular functions and try to apply them towards a greater aim, for example, the construction and improvement of producer strains. Producer strains often contain a heterologous pathway which has to be optimized on multiple levels. The toolbox for gene expression is growing constantly and many tools for the precise control of gene expression by transcription factors are available. In recent years, tools for post-transcriptional regulation gained increasing interest. Post-transcriptional regulation can be achieved, e.g., by CRISPR/Cas systems. They rely on the interplay of a large protein (Cas13 or Cas7-11) and co-expression of a respective guide RNA [1–5]. However, CRISPR/Cas systems for post-transcriptional regulation may interfere with the cellular machinery in bacteria because the mRNA-targeting CRISPR/Cas systems may induce dormancy or cell death [3]. Other elements for post-transcriptional control are riboswitches which are integrated into the 5′ untranslated region (UTR) of mRNAs for precise control of translation [6]. However, riboswitches need the genetic engineering of every target gene, which may be cumbersome in particular in non-model organisms.

Bacteria exploit manifold post-transcriptional regulators to quickly adapt to environmental stimuli. Among them are small regulatory RNAs (sRNAs) which are expressed in specific growth phases or upon environmental stress conditions [7]. Expression of natural sRNAs often prevents translation of mRNAs by an antisense mechanism blocking translation initiation (Fig. 1a). The RNA chaperone Hfq is an important factor facilitating many sRNA/mRNA interactions. Furthermore, Hfq may recruit RNase E and mediate RNase E-dependent mRNA decay [8, 9]. sRNAs, like the well-studied RybB sRNA, have a modular structure consisting of a seed region and a scaffold (Fig. 1b). The scaffold usually provides a Rho-independent transcriptional terminator (hairpin loop followed by consecutive uracils) and mediates interaction with Hfq. The seed region is capable of binding the corresponding target mRNAs by base-pairing. Binding of the seed region within the translation initiation region (TIR) of an mRNA usually prevents protein translation. Because of their simple and modular structure, sRNAs are gaining an increased interest for synthetic biology and biotechnological applications. The rational design of seed regions together with suitable scaffolds supports engineering of synthetic sRNAs, which can subsequently be applied for targeted repression of mRNA translation.
The modularity and small size allow fast and large-scale synthesis of multiple variants and parallel testing of the resulting synthetic sRNAs. Here, we provide (i) a detailed protocol for in silico prediction of seed regions for synthetic sRNA design, (ii) a modular cloning approach for the in vitro construction of sRNAs, and (iii) assays for the functional characterization of the synthetic sRNAs in vivo (Fig. 2a). Once the optimal sRNA is identified and characterized, it can be used for its target application. The seed prediction tool, SEEDling (https://github.com/DIGGER-Bac/SEEDling), is used to predict optimal seed regions for a chosen target [10]. SEEDling integrates multiple parameters to reduce off-target effects and minimize the risk of structural changes in the synthetic sRNAs. The predicted seed regions are synthesized as oligonucleotides and used for in vitro construction of synthetic sRNA transcriptional units (TUs) by Golden Gate cloning. The chapter further provides strategies for the functional characterization of the constructed sRNA TUs based on phenotypic and/or

[Fig. 1. Panel A: sRNA mediated translation inhibition (5′-UTR, CDS, 3′-UTR, ribosome) and sRNA mediated mRNA decay (RNase E, 5′-UTR, CDS, 3′-UTR). Panel B: secondary structure of the RybB sRNA with the seed region, the hairpin loop, and the Hfq binding site; the latter two form the RybB scaffold]

Fig. 1 Function and modular structure of bacterial sRNAs. (a) Inhibition of translation by an sRNA bound to its target mRNA (upper panel), and RNase E recruitment to facilitate mRNA decay (lower panel). UTR indicates untranslated region and CDS the coding sequence. For simplicity the TIR is not indicated. (b) RybB, a well characterized sRNA, shows three distinct features: a seed region for base-pairing with its target, a hairpin loop for transcription termination, and a Hfq binding site. The transcriptional terminator and Hfq binding site are part of the RybB scaffold

fluorescent reporter assays [11]. In this protocol, we target the acrA mRNA, encoding a component of a major multidrug efflux pump in E. coli and relatives, as an example. The absence of AcrA increases susceptibility towards the β-lactam antibiotic oxacillin and

[Fig. 2. Panel A: SEEDling sequence design; synthetic sRNA construction and validation; phenotypic characterization; reporter based characterization; application. Panel B: MG1655 and ΔacrA in a 2.5-fold dilution series of 500, 200, 80, 32, 12.80, 5.12, 2.05, 0.82, 0.33, 0.13, 0.05, and 0 μg/mL oxacillin. Panel C: acrAB operon (acrAp, σ70, acrA, acrB) with the -35 and -10 regions marked in the sequence agatttacatacatttgtgaatgtatgtaccatagcacgacgataatataaacgcagcaatgggtttattaacttttgaccattgaccaatttgaaatcggacactcgaggtttacatATGAACAAAAACAGAGGG...]

Fig. 2 Workflow for synthetic sRNA construction, oxacillin susceptibility assay and structure of the acrAB operon. (a) The general workflow of the presented protocol for in silico design, in vitro construction and in vivo application of synthetic sRNAs in bacteria. (b) Determination of the minimum inhibitory concentration (MIC) for oxacillin in E. coli wild type MG1655 and a ΔacrA deletion strain. The increased oxacillin susceptibility of E. coli ΔacrA can be used as a phenotypic readout for the functionality of synthetic sRNAs. The dilution series is performed in a 2.5-fold manner indicating an approximately 100-fold higher susceptibility of the ΔacrA strain compared to the wild type. (c) Detailed visualization of the acrAB operon. The Shine-Dalgarno sequence and the first codons of the targeted mRNA are the ideal region to identify optimal seed regions for synthetic sRNAs. For visualization, the TIR is underlined and the -35 and -10 regions are indicated

allows for quick functional characterization of acrA-targeting sRNAs (Fig. 2b) [11]. The seed region of the synthetic sRNA ideally covers the TIR, consisting of the Shine-Dalgarno sequence and the start codon, to achieve efficient translational repression (Fig. 2c). Detailed information for E. coli endogenous transcriptional units can be found on https://ecocyc.org/ [12]. Notably, if repression of the targeted mRNA does not produce a phenotype, fluorescent reporters can be used as readout for sRNA functionality. The whole described workflow can be performed in any standard molecular biology laboratory.
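The oxacillin concentrations of the 2.5-fold dilution series in Fig. 2b follow from simple arithmetic, sketched below. The function name is hypothetical; the starting concentration of 500 μg/mL and the dilution factor are taken from the figure. As a rough plausibility check, and assuming a difference of five dilution steps between the two strains (an assumption, not stated in the text), the fold change would be 2.5^5 ≈ 98, in line with the approximately 100-fold difference mentioned in the caption.

```python
def dilution_series(start, factor, steps):
    """Concentrations of a serial dilution, starting at `start`."""
    return [start / factor ** i for i in range(steps)]


# Eleven oxacillin concentrations in ug/mL; the twelfth well contains none.
series = [round(c, 2) for c in dilution_series(500.0, 2.5, 11)]

# Fold difference spanned by five dilution steps (assumed step count).
fold_five_steps = 2.5 ** 5
```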

2 Materials

Carrying out the described protocol requires the following standard laboratory equipment and consumables.

1. Standard reagents, consumables, and instrumentation for PCRs.
2. Standard reagents and equipment for gel electrophoresis.
3. Standard reagents, consumables, and instrumentation for microbial culturing.
4. Standard reagents, consumables, and instrumentation for transformation of E. coli.
5. Standard reagents, consumables, and instrumentation for plasmid extraction.
6. Plate reader for continuous growth absorbance measurements.
7. Plate reader for fluorescence and absorbance detection.
8. Standard micropipettes and consumables; 12-channel pipettes are advised for work with microtiter plates.

2.1 Computational Equipment

Computer with Linux, macOS, or Windows OS with Docker installed. The most recent SEEDling Docker image and its documentation are available at https://github.com/DIGGER-Bac/SEEDling under the CC BY-NC-SA 4.0 license (see Note 1). R and the graphical interface RStudio are used for analysis (https://www.r-project.org/).

2.2 Plasmids

All plasmids relevant for this protocol are listed in Table 1. Plasmids can be requested from the corresponding author. Notably, any other Golden Gate cloning acceptor plasmids can be used. However, it is imperative to adjust the outlined protocol towards the cloning standard in regard to the desired type IIs enzymes, corresponding overhangs and plasmid features (e.g., antibiotic resistance).

2.3 DNA Oligonucleotides

Relevant oligonucleotides for carrying out the protocol are provided in Table 2. 100 μM stocks are generated with ddH2O and stored at -20 °C. For PCR and sequencing reactions, 10 μM working stocks are generated with ddH2O and stored at -20 °C.

2.4 Enzymes

Any enzyme with corresponding properties can be used. For the presented protocol, all used enzymes were provided by New England Biolabs (NEB) (see Note 2).

1. T4 DNA Ligase (400,000 units/mL).
2. SapI (10,000 units/mL).
3. Taq DNA Polymerase (5000 units/mL).

2.5 Antibiotics

All antibiotics used in this study are dissolved in sterile H2O and are stored in 1 mL aliquots at -20 °C. The stock concentration is 1000× except for oxacillin which was used in the indicated concentrations.

1. Kanamycin (50 mg/mL stock).
2. Oxacillin (50 mg/mL stock).
3. Spectinomycin (120 mg/mL stock).
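The working concentrations implied by the 1000× stocks can be derived with a short calculation. The sketch below is illustrative only and the function names are hypothetical; it covers kanamycin and spectinomycin, since oxacillin is used at the concentrations indicated for each experiment.

```python
def working_conc_ug_per_ml(stock_mg_per_ml, stock_fold=1000):
    """Final concentration in ug/mL when a stock is diluted stock_fold-fold."""
    return stock_mg_per_ml * 1000.0 / stock_fold


def stock_volume_ul(final_volume_ml, stock_fold=1000):
    """Volume of stock in uL to add to a given final volume in mL."""
    return final_volume_ml * 1000.0 / stock_fold


kanamycin = working_conc_ug_per_ml(50)        # 50 mg/mL stock
spectinomycin = working_conc_ug_per_ml(120)   # 120 mg/mL stock
per_100_ml = stock_volume_ul(100)             # stock volume for 100 mL medium
```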


Table 1 Plasmids constructed and used in this study

Name | Relevant features | Parental plasmid | References
pSL009 | pBAD-TOPO derivative serving as empty plasmid control; KanR | - | [11]
pSL099 | Level 0 plasmid for subcloning of fragments to be released with SapI; SpecR | pMA60 [18] | This study
pSL123 | Level 0 plasmid containing RybB scaffold to be released with SapI; SpecR | pSL099 | This study
pSL135 | Level 0 plasmid containing PLlacO-1 to be released with SapI; SpecR | pSL099 | This study
pSL137 | pBAD-TOPO derivative allowing Golden Gate cloning of TUs with SapI; KanR | pSL009 | This study
pSL574 | pSL137 derivative containing RybB TU under PLlacO-1 control with SEEDling seed sequence prediction S#1; KanR | pSL137 | This study
pSL575 | pSL137 derivative containing RybB TU under PLlacO-1 control with SEEDling seed sequence prediction S#2; KanR | pSL137 | This study
pSL576 | pSL137 derivative containing RybB TU under PLlacO-1 control with SEEDling seed sequence prediction S#3; KanR | pSL137 | This study
pSL577 | pSL137 derivative containing RybB TU under PLlacO-1 control with SEEDling seed sequence prediction S#4; KanR | pSL137 | This study
pSL578 | pSL137 derivative containing RybB TU under PLlacO-1 control with SEEDling seed sequence prediction S#5; KanR | pSL137 | This study
pSL579 | pSL137 derivative containing RybB TU under PLlacO-1 control with SEEDling seed sequence prediction S#6; KanR | pSL137 | This study
pSL580 | pSL137 derivative containing RybB TU under PLlacO-1 control with SEEDling seed sequence prediction S#7; KanR | pSL137 | This study
pSL581 | pSL137 derivative containing RybB TU under PLlacO-1 control with SEEDling seed sequence prediction S#8; KanR | pSL137 | This study

2.6 Chemicals, Buffers, and Media Components

1. LB medium (1% (w/v) tryptone, 0.5% (w/v) yeast extract, 1% (w/v) sodium chloride, pH 7.0 ± 0.2).
2. 10× annealing buffer (10 mM Tris–HCl, pH 7.5–8.0, 50 mM NaCl, and 1 mM EDTA).
3. 1× TAE buffer (1 mM EDTA · Na2 · 2 H2O, 20 mM acetate, 40 mM Tris–HCl).
4. DNA dye: Thiazole orange dissolved in dimethyl sulfoxide (DMSO) (10,000× stock concentration: 13 mg/mL).
5. Agarose (standard).
6. DNA ladder.
7. Solid media is prepared with 2% agar.

Table 2 Oligonucleotides for the conduction of the described protocol

Name | Sequence (5′-3′)a | Information
SLo3961 | CTGTCAAATGGACGAAGCAG | Colony PCR and sequencing primer
SLo3962 | CAGGCAAATTCTGTTTTATCAGACC | Colony PCR and sequencing primer
SLo5230 | ATGTTCATATGTAAACCTC | Forward primer of SEEDling predicted seed region S#1 targeting AcrA mRNA with corresponding Golden Gate cloning overhang indicated in bold.
SLo5231 | ATGGTTCATATGTAAACCT | Forward primer of SEEDling predicted seed region S#2 targeting AcrA mRNA with corresponding Golden Gate cloning overhang indicated in bold.
SLo5232 | ATGGCCAGAGGCGTAAACC | Forward primer of SEEDling predicted seed region S#3 targeting AcrA mRNA with corresponding Golden Gate cloning overhang indicated in bold.
SLo5233 | ATGTCATATGTAAACCTCG | Forward primer of SEEDling predicted seed region S#4 targeting AcrA mRNA with corresponding Golden Gate cloning overhang indicated in bold.
SLo5234 | ATGCCCTCTGTTTTTGTTC | Forward primer of SEEDling predicted seed region S#5 targeting AcrA mRNA with corresponding Golden Gate cloning overhang indicated in bold.
SLo5235 | ATGCGCCAGAGGCGTAAAC | Forward primer of SEEDling predicted seed region S#6 targeting AcrA mRNA with corresponding Golden Gate cloning overhang indicated in bold.
SLo5236 | ATGTGTTCATATGTAAACC | Forward primer of SEEDling predicted seed region S#7 targeting AcrA mRNA with corresponding Golden Gate cloning overhang indicated in bold.
SLo5237 | ATGGACCGCCAGAGGCGTA | Forward primer of SEEDling predicted seed region S#8 targeting AcrA mRNA with corresponding Golden Gate cloning overhang indicated in bold.
SLo5238 | ATCGAGGTTTACATATGAA | Reverse primer of SEEDling predicted seed region S#1 targeting AcrA mRNA with corresponding Golden Gate cloning overhang indicated in bold.
SLo5239 | ATCAGGTTTACATATGAAC | Reverse primer of SEEDling predicted seed region S#2 targeting AcrA mRNA with corresponding Golden Gate cloning overhang indicated in bold.
SLo5240 | ATCGGTTTACGCCTCTGGC | Reverse primer of SEEDling predicted seed region S#3 targeting AcrA mRNA with corresponding Golden Gate cloning overhang indicated in bold.
SLo5241 | ATCCGAGGTTTACATATGA | Reverse primer of SEEDling predicted seed region S#4 targeting AcrA mRNA with corresponding Golden Gate cloning overhang indicated in bold.
SLo5242 | ATCGAACAAAAACAGAGGG | Reverse primer of SEEDling predicted seed region S#5 targeting AcrA mRNA with corresponding Golden Gate cloning overhang indicated in bold.
SLo5243 | ATCGTTTACGCCTCTGGCG | Reverse primer of SEEDling predicted seed region S#6 targeting AcrA mRNA with corresponding Golden Gate cloning overhang indicated in bold.
SLo5244 | ATCGGTTTACATATGAACA | Reverse primer of SEEDling predicted seed region S#7 targeting AcrA mRNA with corresponding Golden Gate cloning overhang indicated in bold.
SLo5245 | ATCTACGCCTCTGGCGGTC | Reverse primer of SEEDling predicted seed region S#8 targeting AcrA mRNA with corresponding Golden Gate cloning overhang indicated in bold.

a Bold letters indicate the overlaps for Golden Gate cloning
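The pairing of the forward and reverse oligonucleotides in Table 2 can be checked with a few lines of Python. This is a minimal sketch and not part of the published protocol: the function names are hypothetical, and it assumes that the first three nucleotides of each oligonucleotide (ATG or ATC) form the single-stranded Golden Gate overhang, so that only the remaining bases anneal.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals(forward, reverse, overhang=3):
    """True if the oligos are complementary outside the assumed 5' overhangs."""
    return reverse_complement(forward[overhang:]) == reverse[overhang:]


# Forward/reverse pairs for seed regions S#1 to S#8, as listed in Table 2.
PAIRS = {
    "S#1": ("ATGTTCATATGTAAACCTC", "ATCGAGGTTTACATATGAA"),
    "S#2": ("ATGGTTCATATGTAAACCT", "ATCAGGTTTACATATGAAC"),
    "S#3": ("ATGGCCAGAGGCGTAAACC", "ATCGGTTTACGCCTCTGGC"),
    "S#4": ("ATGTCATATGTAAACCTCG", "ATCCGAGGTTTACATATGA"),
    "S#5": ("ATGCCCTCTGTTTTTGTTC", "ATCGAACAAAAACAGAGGG"),
    "S#6": ("ATGCGCCAGAGGCGTAAAC", "ATCGTTTACGCCTCTGGCG"),
    "S#7": ("ATGTGTTCATATGTAAACC", "ATCGGTTTACATATGAACA"),
    "S#8": ("ATGGACCGCCAGAGGCGTA", "ATCTACGCCTCTGGCGGTC"),
}

results = {seed: anneals(fwd, rev) for seed, (fwd, rev) in PAIRS.items()}
```

A check of this kind may help to catch transcription errors before ordering oligonucleotides for newly predicted seed regions.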


2.7 Consumables

1. Single-well microtiter plates (e.g., PlusPlates (PLU-003), Singer Instruments).
2. 96-well microtiter plates.
3. Optical clear microtiter plate seal.
4. PCR tubes.
5. Standard petri dishes.

2.8 Strains

All relevant laboratory E. coli strains for the outlined protocol are provided in Table 3.

Table 3 Strains used in this study

Name             Relevant features                                                                       References
E. coli MG1655   K-12 F- λ-                                                                              [19]
ΔacrA            E. coli MG1655 ΔacrA::cat; CmR                                                          [11]
acrA-9′-syfp2    E. coli MG1655 acrA-9′-syfp2-cat, translational fusion of first 9 acrA codons; CmR      [11]

3 Methods

3.1 Seed Prediction Using SEEDling

This part of the methods section explains the use of SEEDling (https://github.com/DIGGER-Bac/SEEDling) [10] for the prediction of seed regions for mRNA targeting. The process is demonstrated by targeting the acrA mRNA of E. coli. However, SEEDling can be adjusted to any target organism and target mRNA, and it is capable of generating global predictions for whole genomes. The predicted sequences will be used for the downstream steps of the protocol. The workflow for seed prediction using SEEDling is visualized in Fig. 3a.

Fig. 3 Prediction of seed regions using SEEDling. (a) The workflow of SEEDling is divided into four steps. The initial idea and project conceptualization is followed by the user setting the sRNA criteria. The settings are used by SEEDling to predict the best possible seed regions while maintaining the sRNA structure integrity and preventing potential off-target effects. SEEDling provides an output file with detailed information on each prediction to allow the user the best possible selection. (b) Visualization of synthetic sRNAs binding to acrA mRNA in E. coli. Seed regions with a length of 16 nucleotides are based on SEEDling predictions. The number for each synthetic sRNA indicates the ranking within the SEEDling prediction. The 5′-UTR is visualized in italics, the coding sequence in capital letters and the TIR is underlined

1. Install the most recent version of SEEDling based on the documentation on GitHub (https://github.com/DIGGER-Bac/SEEDling). Using the Docker image is the preferred way to run SEEDling because it prevents incompatibility issues. Prior to the use of the SEEDling Docker image, the compatible Docker Desktop App must be installed (https://www.docker.com/products/docker-desktop/) (see Note 3).
2. For SEEDling prediction, the following files in the input folder need to be adjusted towards the prediction goal:
(a) config.yml: The configuration file needs to be adjusted towards the prediction goals (e.g., target gene, reference genome, seed region length, intended scaffold). The protocol described here generates predictions to target the acrA mRNA (Fig. 3b).
(b) exclude.fasta: The exclude file contains a list of DNA sequences in fasta format to be excluded from the prediction (e.g., type IIs recognition sites).
(c) include.txt: The include file contains the list of target genes selected from the provided reference genome.
(d) reference_file.gb: The reference file contains the annotated genome of the target organism in GenBank format (see Note 4).
All files can be edited with a standard text file editor (e.g., Editor in Windows OS or Notepad++, https://notepad-plus-plus.org/).

3. Adjust the config.yml according to the desired output. The individual settings are explained in detail in the SEEDling GitHub documentation. For this protocol, the following settings were used:

    # Path to GenBank target genome (used in off-target detection)
    subject_path: "input/e_coli_k12_mg1655.gb"
    # Path to GenBank files of genes for which the SEED will be calculated
    target_path: "input/e_coli_k12_mg1655.gb"
    # Path to output file, .csv
    output_path: "input/test.csv"
    # Number of SEED predictions per gene
    select_top: 8
    # Offset (left of the start codon)
    start_offset: 40
    # Offset (right of the start codon)
    end_offset: 20
    # Step size (for sliding window)
    step_size: 1
    # Length of the SEED region
    seq_length: 16
    # Prefix (Scaffold)
    srna_prefix: ""
    # Suffix (Scaffold)
    srna_suffix: "GATGTCCCCATTTTGTGGAGCCCATCAACCCCGCCATTTCGGTTCAAGGTTGATGGGTTTTTTGTT"
    # Scaffold+SEED which should be used as a reference for RNApdist
    srna_template: "GCCACTGCTTTTCTTTGATGTCCCCATTTTGTGGAGCCCATCAACCCCGCCATTTCGGTTCAAGGTTGATGGGTTTTTTGTT"
    # FASTA file of sites that should be excluded in the final scaffold+SEED
    exclude_sequences_path: "input/exclude.fasta"
    # Newline separated txt file of gene names which should be included (whitelist)
    include_genes_path: "input/include.txt"
    # E-value for BLASTn off target checks
    blast_evalue: 0.04

4. Adjust the exclude.fasta according to the desired output (see Note 5). For this protocol the following settings were used:

    >SapI
    GCTCTTC

5. Adjust the include.txt according to the desired output (see Note 6). For this protocol the following settings were used:

    acrA

6. Provide the reference of your target organism in the form of a GenBank file. For this protocol the NC_000913 reference of E. coli was used to generate a GenBank file (“e_coli_k12_mg1655.gb”). Besides providing the target gene sequences for seed predictions, the reference is further used for off-target predictions.
7. Open the Docker Desktop application. Load the SEEDling Docker image via terminal, cmd or PowerShell with the command (see Note 7):

    docker load --input seedling

8. Start the SEEDling Docker image to open a Docker container by choosing the ‘run’ option in the Docker Images tab. Important: When creating the container, set the container path in the optional settings to “/home/DIGGER/SEEDling/input”. The host path can be freely adjusted by the user.
9. In the Docker Containers tab, open the terminal of the SEEDling container and execute the prediction program, which generates the output as a csv file. Command:

    python3 SEEDling.py -c input/config.yml

10. The output file is written as test.csv into the input folder (as defined in the config.yml file). The output file is used to identify ideal seed sequences to be subsequently ordered as oligonucleotides (see Note 8). For the settings used in this protocol, eight seed sequences were predicted and ordered as forward and reverse oligonucleotides for subsequent annealing to generate double-stranded DNA for Golden Gate cloning (see Note 9). Figure 3b visualizes the position of the individual predicted seed sequences and Table 4 provides output information for seed region S#1.

3.2 Construction and Validation of Synthetic sRNA TUs by Golden Gate Cloning
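Before moving on to cloning, the sliding-window settings from Subheading 3.1, step 3 (start_offset, end_offset, step_size, seq_length) can be illustrated with a short Python sketch. It is not SEEDling code. It assumes that the two offsets bound the window start positions relative to the first base of the start codon, an interpretation that is consistent with the eight ordered seeds but has not been checked against the SEEDling source. The ranking criteria (structure distance, hybridization energy, BLASTn off-target check, excluded sites) are not reproduced. The mRNA fragment is the one printed in Fig. 3b, and all function names are hypothetical.

```python
# Sketch: enumerate candidate seed regions around the acrA start codon.
# ASSUMPTION: start_offset/end_offset limit the window START positions
# relative to the first base of the start codon; SEEDling may differ.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# acrA 5' region as printed in Fig. 3b (5'-UTR lower case, CDS upper case)
UTR = "acgauaauauaaacgcagcaauggguuuauuaacuuuugaccauugaccaauuugaaaucggacacucgagguuuacau"
CDS = "AUGAACAAAAACAGAGGGUUUACGCCUCUGGCGGUCGUUCUGAUG"

# Seeds ordered in this protocol (forward primers without the 3-nt prefix)
ORDERED_SEEDS = {
    "S#1": "TTCATATGTAAACCTC", "S#2": "GTTCATATGTAAACCT",
    "S#3": "GCCAGAGGCGTAAACC", "S#4": "TCATATGTAAACCTCG",
    "S#5": "CCCTCTGTTTTTGTTC", "S#6": "CGCCAGAGGCGTAAAC",
    "S#7": "TGTTCATATGTAAACC", "S#8": "GACCGCCAGAGGCGTA",
}


def seed_for_target(target_rna):
    """Seed written as DNA (5'->3') that is antisense to an mRNA window."""
    dna = target_rna.upper().replace("U", "T")
    return dna.translate(COMPLEMENT)[::-1]


def candidate_seeds(utr, cds, start_offset=40, end_offset=20,
                    step_size=1, seq_length=16):
    """Map window start (relative to the start codon) to the seed sequence."""
    mrna = utr + cds
    start = len(utr)  # index of the A of the start codon
    first = max(0, start - start_offset)
    last = min(len(mrna) - seq_length, start + end_offset)
    return {
        pos - start: seed_for_target(mrna[pos:pos + seq_length])
        for pos in range(first, last + 1, step_size)
    }


if __name__ == "__main__":
    seeds = candidate_seeds(UTR, CDS)
    print(len(seeds), "candidate windows")
    for name, seed in ORDERED_SEEDS.items():
        hits = [pos for pos, s in seeds.items() if s == seed]
        print(name, seed, "window start:", hits)
```

Under this reading, every ordered seed is the reverse complement of one 16-nt window of the target region, which is a useful sanity check when adapting the offsets to another gene.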

The predicted sequences of SEEDling are ordered as oligonucleotides and used in this protocol to generate synthetic sRNA TUs in a pBAD derivative (pSL137) by Golden Gate cloning in combination with previously generated parts for the PLlacO-1 promoter (pSL135) and RybB scaffold (pSL123) (see Note 10). This section describes the Golden Gate cloning procedure in a stepwise manner. The protocol assumes that the individual parts in level 0 plasmids and the acceptor plasmid are available to the user. However, this procedure can be adapted to any other Golden Gate cloning system. The only requirements are the adjustment of the type IIs enzyme and the corresponding overhangs for the sRNA TU cloning. The Golden Gate cloning principle, the acceptor plasmid, the Golden Gate reaction and the workflow are depicted in Fig. 4.

Table 4 SEEDling output data for predicted seed region S#1

ID              ZLZRJ1XP
Source          input/e_coli_k12_mg1655.gb
RNAdist         3.86201
Hybrid Energy   -34.48
Prefix          -
Suffix          GATGTCCCCATTTTGTGGAGCCCATCAACCCCGCCATTTCGGTTCAAGGTTGATGGGTTTTTTGTT
Seed            TTCATATGTAAACCTC
Gene            acrA
Start           485614
End             485630
Strand          -1
Offsite         FALSE
Illegal Site    FALSE
Fold            .......................((((...))))((((((((((((..(((.....)))....)))))))))))).......
FullSeq         TTCATATGTAAACCTCGATGTCCCCATTTTGTGGAGCCCATCAACCCCGCCATTTCGGTTCAAGGTTGATGGGTTTTTTGTT

3.2.1 Golden Gate Cloning

1. Generate double-stranded DNA from single-stranded oligonucleotides by annealing.
2. The two corresponding oligonucleotides are annealed by mixing 4.5 μL of each 100 μM oligonucleotide with 1 μL of 10× annealing buffer in a PCR microcentrifuge tube. Annealing is performed in a thermocycler by heating the respective sample(s) to 95 °C for 5 min and cooling to 25 °C with the lowest possible ramp speed. Alternatively, the reaction can be performed by boiling the mixture(s) in a heat block or water bath and slowly cooling down to room temperature at the bench.
3. Add 100 μL ddH2O to the annealed oligos (see Note 11).

Fig. 4 Golden Gate cloning of synthetic small RNAs. (a) Visualization of the Golden Gate cloning principle. The cloning strategy relies on the use of type IIs restriction enzymes, which have a defined recognition site but cut an undefined DNA sequence at a fixed distance (upper panel). If type IIs recognition sites and resulting overhangs are orchestrated (indicated by colors), they can be used to perform a multi-fragment assembly into a corresponding acceptor plasmid (lower panel). If the assembly is planned properly, the resulting plasmid has the fragments in the right order and no more recognition sites of the used enzyme are present (green box with arrows). Level 0 parts are released from the plasmid with type IIs recognition sites in the plasmid backbone while the acceptor plasmid has the type IIs recognition sites in the opposite orientation. It is imperative that level 0 plasmids and the acceptor plasmid have different antibiotic selection markers, indicated by the dark blue and beige bar, respectively. (b) Easy-to-use acceptor plasmid based on Koebel et al. 2022 containing a dual-selection cloning cassette in a pBAD derivative. pSL137 differs from previous plasmids by relying on the use of the SapI type IIs enzyme instead of BbsI. SapI generates 3-nucleotide overhangs instead of 4-nucleotide overhangs (BbsI). (c) Golden Gate cloning strategy used within this protocol. Two basic part plasmids (PLlacO-1 promoter [pSL123] and rybB scaffold [pSL135]) and the respective annealed oligonucleotides (red) are assembled with the acceptor plasmid pSL137 in a one-pot Golden Gate reaction using SapI, resulting in the respective sRNA expression plasmid. The arrow depicts the promoter and the red bar in the assembled plasmid depicts the seed region of the synthetic sRNA TU. (d) The workflow of this protocol section includes the preparation and conduction of the Golden Gate reaction with subsequent transformation into E. coli cells and plasmid validation. The stepwise procedure is given in the text. If all parts are available, the reaction and transformation into E. coli can be achieved within 1 day. Single colonies can be isolated on the second day and plasmids for validation can be extracted on the third day. This pipeline allows characterization and application of synthetic sRNA TUs starting on the fourth day

4. Measure and prepare the other DNA parts (see Note 12), such as the acceptor plasmid (pSL137) and level 0 plasmids (pSL123 and pSL135) (see Note 13).
5. Mix the following components in a PCR microcentrifuge tube for the Golden Gate reaction (see Note 14):

Compound                                   Volume
T4 DNA Ligase Buffer                       1 μL
T4 DNA Ligase (400,000 units/mL)           1 μL
Type IIs restriction enzyme (here: SapI)   1 μL
Acceptor plasmid                           ~20 fmol (see Note 15)
Level 0 plasmid                            Each ~20 fmol
Annealed oligonucleotides                  20 fmol up to 2 pmol
ddH2O                                      To a final volume of 10 μL

6. Mix the reaction briefly by flipping the tube or pulse vortexing. Spin the reactions briefly in a microcentrifuge to ensure the reaction mix is at the bottom of the tube and place the reaction in a thermocycler running the following program:

Temperature [°C]   Time [min]
37                 300
50                 20
80                 10
8                  ∞ (storage)

7. Use the Golden Gate reaction directly for transformation into the E. coli host strain or store the reaction mix at -20 °C for subsequent transformation.

3.2.2 Transformation of Golden Gate Reaction into E. coli Cells
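Before transformation, the molar amounts used in the Golden Gate reaction (Subheading 3.2.1, steps 2 to 5) can be cross-checked with a small calculation. The Python sketch below is an illustration under stated assumptions: complete annealing of the oligonucleotides, an average mass of 650 g/mol per base pair of double-stranded DNA (a common approximation), and a purely hypothetical plasmid size and concentration in the usage example, since the protocol does not list them here.

```python
# Sketch: concentration of annealed oligonucleotides and fmol-to-volume
# conversion for plasmid parts.
# ASSUMPTIONS: complete annealing; 650 g/mol per bp as average dsDNA mass.

AVG_BP_MASS = 650.0  # g/mol per base pair (approximation)


def annealed_duplex_uM(oligo_uM=100.0, oligo_uL=4.5, buffer_uL=1.0,
                       water_uL=100.0):
    """Duplex concentration after annealing and dilution (pmol/uL == uM)."""
    duplex_pmol = oligo_uM * oligo_uL  # uM x uL = pmol of each strand
    total_uL = 2 * oligo_uL + buffer_uL + water_uL
    return duplex_pmol / total_uL


def fmol_to_ng(fmol, length_bp):
    """Mass in ng of a given molar amount of double-stranded DNA."""
    return fmol * length_bp * AVG_BP_MASS * 1e-6


def volume_for_fmol(fmol, conc_ng_per_uL, length_bp):
    """Volume in uL that contains the requested amount of a plasmid."""
    return fmol_to_ng(fmol, length_bp) / conc_ng_per_uL


if __name__ == "__main__":
    duplex = annealed_duplex_uM()
    print(f"annealed oligos: {duplex:.2f} pmol/uL")
    print(f"volume for 2 pmol: {2.0 / duplex:.2f} uL")
    # Hypothetical plasmid: 4000 bp at 26 ng/uL
    print(f"20 fmol plasmid: {volume_for_fmol(20, 26.0, 4000):.2f} uL")
```

With the volumes given in steps 2 and 3, the diluted annealing mix contains roughly 4 pmol/μL duplex, so about 0.5 μL already covers the upper end of the stated range.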

In this protocol, in-house generated competent cells are used for transformation. Cells are generated according to a RbCl-based method [13]. However, other methods are suitable as well and can be adjusted to the preferences of the user. In our case, E. coli wild type MG1655 cells are used, allowing direct testing of synthetic sRNA TUs.

1. Transform chemically competent E. coli cells with the Golden Gate reaction mix. For this purpose, add 2–10 μL reaction mix to 20 μL competent cells in a 1.5 mL microcentrifuge tube. Mix carefully by flipping the microcentrifuge tube.
2. Incubate the mixture on ice for 30 min. Heat up a water bath to 42 °C.
3. Heat shock: Place the mixture in the preheated water bath for 30 s.
4. Put the reaction mix back on ice for 5 min.
5. Add 1 mL of LB medium (ideally preheated to 37 °C).
6. Incubate at 37 °C with a shaking frequency of 180 rpm for 60 min.
7. Centrifuge the tube at 7000 × g for 3 min to obtain a cell pellet.
8. Discard part of the supernatant and retain a volume of approximately 100 μL.
9. Resuspend the cell pellet by gently pipetting up and down.
10. Plate the cell suspension on LB agar with the corresponding antibiotic for plasmid maintenance using sterile glass beads or a cell spreader.
11. Incubate plates overnight at 37 °C (see Note 16).

3.2.3 Screening and Validation of Plasmid DNA

In this protocol colony PCR (cPCR) is used to identify potentially correct candidates. Subsequently, extracted plasmid DNA is sent for external Sanger sequencing services to validate the integrity of the DNA sequence.

1. Add 50 μL of sterile ddH2O to a PCR microcentrifuge tube.
2. With a sterile toothpick or pipette tip, pick a colony of the transformation plate and put it in the prepared tube. Stir the toothpick or pipette tip to suspend cells in the water. Store at 4 °C after use (see Note 17).
3. On ice, prepare cPCR master mix according to the number of screened candidates. Plan at least 10% dead volume for the master mix preparation. Dispense 9 μL of master mix into PCR tubes or the wells of a 96-well PCR plate. Add 1 μL of candidate cell suspension to the corresponding wells. Example components of the master mix for a single reaction:

Component                                                                            1-fold volume
Forward cPCR primer (10 μM)                                                          0.2 μL
Reverse cPCR primer (10 μM)                                                          0.2 μL
2× Polymerase Master Mix including dNTPs and colored buffer for direct gel loading   5 μL
ddH2O                                                                                3.6 μL

4. In a thermocycler, use the following settings:

Description            Cycles   Temperature   Time
Initial denaturation   1        94 °C         5 min
Denaturation           35       94 °C         15–30 s
Annealing              35       45–68 °C      15–60 s
Elongation             35       68 °C         60 s/kb
Final extension        1        68 °C         5 min
Storage                1        8 °C          ∞

5. Prepare a 1–2% (w/v) agarose gel depending on expected fragment size. Mix the agarose with a DNA dye (e.g., thiazole orange (see Note 18)).
6. Perform agarose gel electrophoresis with the whole cPCR mix and a respective DNA ladder in 1× TAE buffer at 100 V for ~40 min (see Note 19).
7. Visualize and document results of the gel electrophoresis with a dedicated documentation system.
8. Inoculate potentially correct candidates containing desired cloning products in 5 mL LB with the respective antibiotic for plasmid maintenance and incubate overnight at 37 °C with shaking at 180 rpm.
9. Purify plasmids of candidates with method of choice (see Note 20).
10. Verify integrity of plasmid sequences by Sanger sequencing using an appropriate sequencing primer.
11. Optional: Preserve candidates as cryo-culture by combining 700 μL of dense early stationary phase culture with 300 μL 50% glycerol and store at -70 °C in a cryo-tube.

3.3 Synthetic sRNA Functionality Test on Solid Media
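For the cPCR screening in Subheading 3.2.3, steps 3 and 4, the master mix volumes and the elongation time scale with the number of candidates and the amplicon length. The following Python sketch shows one way to compute them. The per-reaction volumes, the 10% dead volume and the 60 s/kb rule are taken from the protocol, while the function names and the example numbers (24 candidates, 500 bp amplicon) are hypothetical.

```python
# Sketch: scale the cPCR master mix and estimate the elongation time.
import math

# Per-reaction volumes in uL (9 uL master mix + 1 uL cell suspension)
PER_REACTION_UL = {
    "Forward cPCR primer (10 uM)": 0.2,
    "Reverse cPCR primer (10 uM)": 0.2,
    "2x Polymerase Master Mix": 5.0,
    "ddH2O": 3.6,
}


def master_mix(n_candidates, dead_volume_fraction=0.10):
    """Return component volumes (uL) for n reactions plus dead volume."""
    scale = n_candidates * (1 + dead_volume_fraction)
    return {name: round(vol * scale, 2)
            for name, vol in PER_REACTION_UL.items()}


def elongation_time_s(amplicon_bp, seconds_per_kb=60):
    """Elongation time in whole seconds for a given amplicon length."""
    return math.ceil(amplicon_bp / 1000 * seconds_per_kb)


if __name__ == "__main__":
    for component, volume in master_mix(24).items():
        print(f"{component}: {volume} uL")
    print("elongation:", elongation_time_s(500), "s")
```

Rounding to two decimals is a presentation choice; in practice the volumes would be rounded to what the available pipettes can dispense.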

Repression of acrA results in oxacillin susceptibility of E. coli cells, allowing for a functional characterization of the constructed synthetic sRNA TUs. The initial test is performed on solid media with increasing oxacillin concentrations to obtain a qualitative assessment of synthetic sRNA functionality. Figure 5 gives an overview of the workflow and shows exemplary results.

Fig. 5 Phenotypic testing of the constructed synthetic small RNAs on solid agar. (a) The workflow is depicted with the detailed steps given in the protocol. The procedure consists of five main steps, which are the preparation of cultures and plates, generation and spotting of dilution series, and data acquisition. (b) Oxacillin susceptibility assay on solid medium was performed using the Singer Instruments Rotor HDA+ and the 7 × 7 spotting program. The indicated dilution series (undiluted and 10^-2 to 10^-7) was prepared in sterile distilled H2O in a 96-well microtiter plate and spotted on LB agar in single-well plates containing kanamycin for plasmid maintenance and the indicated oxacillin concentration (0, 25 or 50 μg/mL). The ΔacrA strain shows high susceptibility to oxacillin. The synthetic sRNAs S#1-, S#4- and S#5-RybB show reduced growth at 50 μg/mL oxacillin. For sRNAs S#2-, S#3-, S#6- and S#8-RybB no regulation can be observed. Synthetic sRNA S#7-RybB indicates off-target effects by a small colony phenotype already in the absence of oxacillin, which underlines the importance of benchmarking synthetic sRNAs prior to their application. The results are consistent with the alternative benchmarking procedures in the form of liquid growth and fluorescence reporter assays, indicating that one type of benchmarking may be sufficient for sRNA characterization

1. If plasmids are not in the desired strain background, transform into respective E. coli cells and make sure respective controls are generated (here: E. coli wild type with pSL009 and E. coli ΔacrA with pSL009).
2. Prepare single-well microtiter plates with agar. Use a 50 mL conical tube, fill it with 40 mL molten LB-agar containing the appropriate antibiotic for plasmid maintenance, and add the respective volume of compound for the phenotypic screening (here: oxacillin at 0–100 μg/mL) (see Note 21). Always include a control without the compound. Important: The conical tube can be reused if plates are poured in order of increasing compound concentration. Important: Handle the warm molten medium with care.
3. Close the conical tube and invert it carefully 2–3 times without creating bubbles to equally distribute the antibiotic(s) in the medium.
4. Carefully pour the medium into a single-well microtiter plate and make sure that plates are not moved until properly solidified to prevent uneven surfaces (see Note 22).
5. After plates are solidified, store upside down in an airflow-free area at room temperature overnight. Protect plates from direct sunlight. If plates are not used the following day, store in a plastic bag at 4 °C.
6. Grow precultures of dedicated strains from single colonies overnight in LB medium containing the appropriate antibiotic to maintain the sRNA expression plasmid. Always include dedicated positive and negative controls (here: E. coli wild type with pSL009 and E. coli ΔacrA with pSL009).
7. Transfer 200 μL of the dedicated strains into row A of a 96-well microtiter plate.
8. Dispense 198 μL sterile H2O into row B and 180 μL into the remaining rows of the 96-well microtiter plate.
9. Dilute the first row 1:100 into the second row. From the second row onwards, perform 1:10 dilutions (see Note 23).
10. In this protocol, the Singer Instruments Rotor HDA+ was used to perform spotting onto solid agar plates utilizing the 7 × 7 spotting program, creating a grid of 49 spots for 96 samples in parallel (Fig. 5b). If no Rotor HDA+ is available, spotting can be performed using multichannel pipettes. Carefully transfer up to 5 μL of the corresponding dilution onto the agar plate (see Note 24).
11. Incubate the spotted plates at 37 °C and document growth by imaging at dedicated time points (e.g., overnight and 24 h). In this protocol the Singer Instruments PhenoBooth was used. Any other photo documentation system is suitable as well.
12. Perform qualitative evaluation of acquired data (see Note 25).

3.4 Synthetic sRNA Functionality Test in Liquid Media
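The dilution series from Subheading 3.3, steps 7 to 9 can be written out explicitly to check transfer volumes and cumulative dilution factors per plate row. The Python sketch below is illustrative only. The transfer volumes (2 μL into 198 μL, 20 μL into 180 μL) are inferred from the stated water volumes and dilution ratios, assuming a working volume of 200 μL per well, and the function name is hypothetical.

```python
# Sketch: transfer volumes and cumulative dilutions for rows A-H.
# ASSUMPTION: each well holds 200 uL before the next transfer, so the
# transfer volume equals 200 uL minus the dispensed water volume.
from fractions import Fraction

ROWS = "ABCDEFGH"


def dilution_series(final_uL=200,
                    water_uL=(198, 180, 180, 180, 180, 180, 180)):
    """Return [(row, transfer_uL, cumulative_dilution)] for rows A to H."""
    plan = [("A", 0, Fraction(1))]  # row A holds the undiluted preculture
    cumulative = Fraction(1)
    for row, water in zip(ROWS[1:], water_uL):
        transfer = final_uL - water  # volume taken from the previous row
        cumulative *= Fraction(transfer, final_uL)
        plan.append((row, transfer, cumulative))
    return plan


if __name__ == "__main__":
    for row, transfer, dilution in dilution_series():
        print(f"row {row}: transfer {transfer} uL, dilution {dilution}")
```

Exact fractions are used instead of floating-point numbers so that the dilution factors print as 1/100, 1/1000 and so on.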

Parallel growth of many synthetic sRNA-expressing E. coli strains in a plate reader can be utilized for functional characterization. The corresponding workflow is depicted in Fig. 6a. The comparison of the area under the curve (AUC) of E. coli cultures containing a respective compound and cultures containing no compound allows for a more quantitative characterization of the synthetic sRNAs in comparison to the respective controls (Fig. 6b). The AUC was found to be more precise than maximum cell density and generation time because potential suppressor mutants may overgrow the culture or cell filamentation may be induced by the compound [11]. Results for the generated synthetic sRNAs within this protocol are shown in Fig. 6c.

Fig. 6 Phenotypic testing of the constructed synthetic small RNAs in liquid media. (a) The workflow is depicted with the detailed steps given in the protocol. The procedure consists of four main steps, which are the preparation of cultures and plates, data acquisition and data analysis. (b) Exemplary ‘growthcurver’ output showing microbial growth (absorbance at 600 nm over 0 to 8 hours) for a single well in the presence and absence of 50 μg/mL oxacillin. Dots represent individual measurements and the red line represents the fitted curve. Based on the fitted curve ‘growthcurver’ determines the area under the curve (AUC). The S#1-RybB growth curve in the presence of oxacillin shows a small peak which becomes more apparent at lower oxacillin concentrations (data not shown). This effect is caused by a filamentation phenotype of bacterial cells. (c) Visualization of oxacillin liquid growth test. AUC results for the synthetic sRNA expression strains are shown in comparison to the wild type and the ΔacrA strain. Measurements are performed in quadruplicates for each condition. Red boxes show AUC in the absence of oxacillin and turquoise boxes AUC in the presence of 50 μg/mL oxacillin. Student’s t-test was applied for statistical evaluation (n.s.: not significant, *P < 0.01, **P < 0.001). Synthetic sRNAs S#1-, S#2-, S#4- and S#5-RybB cause a strong growth defect in the presence of oxacillin. Synthetic sRNAs S#3-, S#6- and S#8-RybB have almost no effect on growth. Targeting the TIR of acrA, instead of the coding region, seems to provide optimal sRNA functionality (cf. Fig. 3b). Synthetic sRNA S#7-RybB already shows a growth defect in the absence of oxacillin, indicating off-target effects of this sRNA. The liquid media results are consistent with the alternative benchmarking procedures except for S#2-RybB, which does not indicate regulation on solid media. However, the different assays indicate that one type of benchmarking may be sufficient for sRNA characterization

1. Inoculate candidates and controls from single colonies and grow overnight in LB medium containing the appropriate antibiotic for plasmid maintenance (see Note 26).
2. Prepare a 96-well microtiter plate for optical measurements with 150 μL of LB medium containing the antibiotic for plasmid maintenance (see Note 27).
3. If the preculture was already prepared in a 96-well microtiter plate, it is recommended to use a 96-pin replica-plating stamp or, if available, a Singer Instruments Rotor HDA+ for inoculation of the experiment.
4. Seal the plate with an optical clear adhesive seal or, if available, a thermal plate sealer with optical clear seal and the respective settings. Important: Be careful with hot surfaces if heat sealing is used. Important: Handle sealed plates with care to avoid splashes on the seal.
5. Use a plate reader with kinetic settings to record microbial growth for at least 12 h at 37 °C. Adjust settings according to the experimental setup if necessary (see Note 28).
6. After data acquisition is finished, export the data as a csv file for further analysis.
7. Analyze the data with your preferred analysis pipeline. Here, we use the open-source statistics software R (https://www.r-project.org/) with the graphical user interface RStudio to analyze and visualize the generated data.
8. Install and load the package ‘growthcurver’ [14].

    install.packages('growthcurver')
    library('growthcurver')

9. Prepare the input data as shown in the ‘growthcurver’ documentation (https://cran.r-project.org/web/packages/growthcurver/vignettes/Growthcurver-vignette.html).
10. Exemplary data format for a 96-well plate saved as a csv file containing the column headers in the first row:

time   A1     B1     C1     D1     E1     F1     G1     H1     A2     [. . .]   H12
x      0.05   0.05   0.10   0.09   0.06   0.08   0.05   0.05   0.06   [. . .]   0.05

11. The ‘growthcurver’ application generates a graphical output and corresponding values including the AUC, which is used for the functional characterization of synthetic sRNAs.
12. The following commands can be used for the output generation:

    # Load data in form of csv file
    data