Plant Gene Regulatory Networks: Methods and Protocols (Methods in Molecular Biology, 2698) [2nd ed. 2023] 1071633538, 9781071633533

This second edition details protocols that analyze and explore gene regulatory networks (GRNs). Chapters guide readers t


English Pages 394 [380] Year 2023


Table of contents :
Preface
Contents
Contributors
Chapter 1: Characterization of Gene Regulatory Networks in Plants Using New Methods and Data Types
1 Introduction
2 Experimental Methods to Map GRNs in Plants
3 Data Analysis and Inference Methods to Accurately Model Gene Regulation
4 Conclusions
References
Chapter 2: Inducible, Tissue-Specific Gene Expression in Arabidopsis Using GR-LhG4-Mediated Trans-Activation
1 Introduction
2 Materials
3 Methods
3.1 Preparing Plant Expression Vector Using GreenGate Cloning
3.2 PCR Amplification of Insert DNA
3.3 Entry Module Preparation
3.4 Destination Module Preparation
3.5 Intermediate Supermodule Preparation
3.6 Transform the Plant Expression Vector into A. tumefaciens
3.7 Generation of Transgenic Arabidopsis Plants
3.8 Induction of Trans-Activation in Arabidopsis Driver Lines
3.8.1 Root
3.8.2 Shoot Apical Meristem (SAM)
3.8.3 Stem
3.9 Imaging of Reporter Expression in Arabidopsis Driver Lines
3.9.1 Root
3.9.2 Shoot Apical Meristem
3.9.3 Stem
4 Notes
References
Chapter 3: Targeted Activation of Arabidopsis Genes by a Potent CRISPR-Act3.0 System
1 Introduction
2 Materials
2.1 Plant and Bacterial Strains
2.2 Plasmids
2.3 Vector Construction
2.4 Culture Media and Stock Solutions
2.5 Arabidopsis Transformation Using the Floral Dip Method
2.6 qRT-PCR Analysis of Transgenic Plants
3 Methods
3.1 Growth of Arabidopsis Plants
3.2 CRISPR-Act3.0 Construct Assembly
3.2.1 sgRNA Design for Targeted Gene Activation
3.3 Floral Dipping Transformation of Arabidopsis Plants
3.4 Quantitate the Activation Level of Target Genes in Transgenic Plants
4 Notes
References
Chapter 4: Single Cell RNA-Sequencing in Arabidopsis Root Tissues
1 Introduction
2 Materials
2.1 Plant Material
2.2 Generating Protoplasts
2.3 Quality/Quantity Assessment
2.4 Flow Cytometry
2.5 Library Preparation and Sequencing
3 Methods
3.1 Plant Material Preparation
3.2 Protoplast Generation
3.3 Sample Preparation for Direct Loading
3.4 Sample Preparation for Flow Cytometry
3.5 Loading Protoplasts
3.6 10x Genomics Sample Preparation, Library Construction, and Sequencing
3.7 Computational Workflow
3.7.1 Build the Arabidopsis Genome Reference
3.7.2 Read the Filtered Feature-Barcode Matrices in R and Clean the Data
3.7.3 Preprocessing of the Data
3.7.4 Normalizing the Raw Counts
3.7.5 Detecting Highly Variable Genes
3.7.6 Scaling the Data
3.7.7 Performing Linear Dimensional Reduction (PCA)
3.7.8 Clustering and Visualization
3.7.9 Cell Identity Assignment and Validation
4 Notes
References
Chapter 5: Analysis of Chromatin Accessibility, Histone Modifications, and Transcriptional States in Specific Cell Types Using...
1 Introduction
2 Materials
2.1 In Vitro Plant Cultivation
2.2 Protoplasting
2.3 Nuclei Isolation from Protoplasts
2.4 Sorting (FACS/FANS)
2.5 RNA-seq Library
2.6 ATAC-seq Library
2.7 ChIP-seq Library
3 Methods
3.1 Protoplast Generation and Purification
3.2 Nuclei Isolation from Protoplasts
3.3 FACS
3.4 FANS
3.5 RNA-seq Paired-End Library Preparation (Protoplasts)
3.6 ATAC-seq Paired-End Library Preparation (Nuclei)
3.7 ChIP-seq Paired-End Library Generation (Nuclei)
4 Notes
References
Chapter 6: Untargeted Proteomics and Metabolomics Analysis of Plant Organ Development
1 Introduction
2 Materials
2.1 Plant Material (See Notes 1 and 2)
2.2 Protein and Metabolite Extraction (as Described in) (See Notes 3, 5 and 9)
2.3 Metabolomic Reagents (Using MS-Grade Reagents) (as Described in)
2.4 Lipidomic Reagents (Using MS-Grade Reagents) (as Described in)
2.5 Proteomic Reagents (Using MS-Grade Reagents)
3 Methods
3.1 Metabolite and Protein Extraction (See Note 5)
3.2 Lipidomic Analysis
3.3 Metabolomic Analysis
3.4 Proteomic Analysis (See Notes 8 and 9)
3.5 Data Integration
4 Notes
References
Chapter 7: DamID-seq: A Genome-Wide DNA Methylation Method that Captures Both Transient and Stable TF-DNA Interactions in Plan...
1 Introduction
2 Materials
2.1 Cloning
2.2 Plant Growth and Protoplasting
2.3 Solutions
2.4 Protoplast Transfections
2.5 DNA Preparation
2.6 DamID
2.7 DNA Library Preparation
2.8 Equipment
3 Methods
3.1 Vector Cloning and Preparation
3.2 Plant Growth and Protoplast Isolation
3.3 Protoplast Transfection and Treatments (TARGET)
3.4 Isolation and Amplification of Methylated DNA Fragments (DamID)
3.5 Library Preparation and Sequencing
3.6 Data Analysis: TF-Target Gene Identification
4 Notes
References
Chapter 8: CUT&Tag for Mapping In Vivo Protein-DNA Interactions in Plants
1 Introduction
2 Materials
2.1 Formaldehyde Fixation
2.2 Nuclei Isolation
2.3 CUT&Tag
2.4 DNA Purification and Library Amplification
3 Methods
3.1 Formaldehyde Fixation
3.2 Nuclei Isolation
3.3 CUT&Tag
3.4 DNA Purification and Library Amplification
4 Notes
References
Chapter 9: Identification of Plant Transcription Factor DNA-Binding Sites Using seq-DAP-seq
1 Introduction
2 Materials
2.1 Equipment
2.2 Kits
2.3 Reagents and Materials
2.4 Buffers
2.5 Primers
2.6 Bioinformatics Requirements
3 Methods
3.1 DAP-seq and ampDAP-seq Input Library Preparation
3.1.1 ampDAP Libraries Without DNA Modifications (Fig. 5)
3.2 TF Expression, TF Complex Formation, and Sequential Pull Down
3.3 Binding of DNA to Immobilized TF Complex
3.4 DNA Amplification, Pooling, and Sequencing
3.5 Bioinformatic Analysis (Fig. 10)
4 Notes
References
Chapter 10: Estimating DNA-Binding Specificities of Transcription Factors Using SELEX-Seq
1 Introduction
2 Materials
2.1 Double-Stranded DNA Library Preparation
2.2 Protein (Protein Complex) In Vitro Synthesis
2.3 First Round (R1) of SELEX
2.4 PCR Amplification of the Selected DNA Fragments
2.5 Validation of SELEX by Electrophoretic Mobility Shift Assay (EMSA)
2.6 High-Throughput Sequencing of the SELEX Libraries
3 Methods
3.1 Double-Stranded DNA Library Preparation
3.2 Protein (Protein Complex) In Vitro Synthesis
3.3 First Round (R1) of SELEX
3.4 PCR Amplification of the Selected DNA Sequences
3.5 Subsequent Rounds (Rx) of SELEX
3.6 Validation of SELEX by Electrophoretic Mobility Shift Assay (EMSA)
3.7 High-Throughput Sequencing of the SELEX Libraries
4 Bioinformatics Analysis
5 Notes
References
Chapter 11: Immunoprecipitation-Mass Spectrometry (IP-MS) of Protein-Protein Interactions of Nuclear-Localized Plant Proteins
1 Introduction
2 Materials
2.1 Plant Lines
2.2 Common Material
2.3 Nuclear Protein Extraction and Protein Immunoprecipitation
2.4 IP and Input Sample Processing
2.5 Peptide Desalting
2.6 Liquid Chromatography-Mass Spectrometry (LC-MS)
2.7 Data Analysis
3 Methods
3.1 Plant Tissue Preparation
3.2 Nuclear Protein Extraction
3.3 Protein Immunoprecipitation
3.4 Input Sample Processing
3.5 IP Sample Processing
3.6 Peptide Desalting
3.7 LC-MS Measurements (See Note 9)
3.8 Protein Identification, Label-Free Quantification, and Data Analysis
4 Notes
References
Chapter 12: Mapping Active Gene-Associated Chromatin Loops by ChIA-PET in Rice
1 Introduction
2 Materials
2.1 Dual Crosslinking
2.2 Nuclei Lysis
2.3 Chromatin Immunoprecipitation
2.4 Proximity Ligation
2.5 Reverse Crosslinking and DNA Purification
2.6 Library Preparation and Sequencing
3 Methods
3.1 Dual Crosslinking
3.2 Nuclei Lysis and Chromatin Fragmentation
3.3 Chromatin Immunoprecipitation
3.4 Proximity Ligation
3.5 Reverse Crosslinking and DNA Purification
3.6 Library Preparation and Sequencing
4 Notes
References
Chapter 13: Building High-Confidence Gene Regulatory Networks by Integrating Validated TF-Target Gene Interactions Using Conne...
1 Introduction
2 Materials
3 Methods
3.1 Intersecting Validated TF-Target Interaction Datasets
3.2 Network Walking: Unified Networks Linking Direct Targets to In Planta Responses
3.3 Pruning Predicted TF-Target Interactions with Precision-Recall Analysis
3.4 Setting Up a Private Instance of ConnecTF
4 Notes
References
Chapter 14: The ChIP-Hub Resource: Toward plantEncode
1 Introduction
2 A User Guide for the ChIP-Hub Website
2.1 Resources of the ChIP-Hub Website
2.2 Search with ChIP-Hub
2.3 Online Analysis by ChIP-Hub
2.4 Download Data
3 Integrative Analyses with ChIP-Hub
3.1 Comparative Genomic Analysis by "lastz to"
3.2 Chromatin State Analysis
4 Perspectives
5 Notes
References
Chapter 15: A Practical Guide to Inferring Multi-Omics Networks in Plant Systems
1 Introduction
2 Materials
2.1 Software
2.2 Data
3 Methods
3.1 Setup
3.2 Transcriptomics Data Preprocessing
3.2.1 Downloading RNA Sequencing Data from SRA
3.2.2 Preprocessing RNA-seq Data
3.3 Proteomics Data Preprocessing
3.3.1 Downloading Raw Proteomics Files from ProteomeExchange
3.3.2 Preprocessing Proteomics Data
3.4 Network Inference Using SC-ION
3.4.1 Preprocessing Data Tables for SC-ION
3.4.2 Running the SC-ION RShiny Application
3.5 Network Visualization and Analysis Using Cytoscape
3.5.1 Importing SC-ION Networks into Cytoscape
3.5.2 Merging Networks in Cytoscape
3.5.3 Importing Additional Information from Tables into Cytoscape Networks
3.5.4 Changing Network Visualization Parameters in Cytoscape
3.5.5 Network Motif (Importance) Score Calculation Using NetMatch* and R
4 Notes
References
Chapter 16: Gene Regulatory Network Modeling Using Single-Cell Multi-Omics in Plants
1 Introduction
2 Materials
2.1 Multi-Omics Integration with Seurat
2.1.1 Download Data for scATAC-seq + scRNA-seq
2.1.2 Install R Packages
2.2 Install Anaconda and Create an Environment for Machine Learning Project
2.3 Download and Install FIMO
2.4 Download Input Data for FIMO Analysis
3 Methods
3.1 Multi-Omics Integration with Seurat
3.1.1 Preprocessing scRNA-seq and scATAC-seq Data
3.1.2 Integrate RNA-seq and ATAC-seq
3.1.3 Marker Gene Identification
3.2 Running FIMO Analysis and Machine Learning
3.2.1 Summarizing the FIMO Outputs
3.2.2 Creating a Data Matrix
3.2.3 Selecting the Genes for Training the Machine Learning Algorithms
3.2.4 Applying the Machine Learning Models on the Dataset
4 Notes
References
Chapter 17: Methodology for Constructing a Knowledgebase for Plant Gene Regulation Information
1 Introduction
1.1 Need for Plant GRN Knowledgebases and Their Maintenance
1.2 The GRASSIUS Plant GRN Knowledgebase
1.3 Stages and Goals of Implementing a Plant GRN Knowledgebase
2 Materials
2.1 Hardware Needs: Minimal and Preferred
2.2 Expertise Required
3 Methods
3.1 Data Scope Definition
3.1.1 Identification and Classification of TF Repertoire for a Plant Species
3.1.2 Identification of TF-Target Gene Interactions [Protein-DNA Interactions (PDIs)]
3.1.3 Mapping of Transcription Start Sites (TSSs): CAGE and Other Techniques
3.1.4 Prediction of TF DNA-Binding Sites
3.2 Schema Design
3.3 Implementation of the Database
3.3.1 Implementing Grassius in the Chado Schema
3.3.2 Population of the Database
3.4 User Interface Development
3.4.1 Web Server Setup
3.4.2 Creation of the Web Application
3.4.3 Development of User Views
3.4.4 Development of Queries (Example)
3.4.5 Implementation of Search Features
3.4.6 Addition of Graphic Displays
3.4.7 Addition of Interactive Tools
3.4.8 Development of the User Interface
4 Notes
References
Chapter 18: Predicting Gene Regulatory Interactions Using Natural Genetic Variation
1 Introduction
2 Materials
2.1 Data Needed for the Analyses
2.2 Computational Infrastructure
2.3 Software
3 Methods
3.1 Data Needed to Perform GWAS
3.2 Statistical Model for GWAS
3.3 Workflow to Perform Univariate GWAS
3.4 More Complex GWAS Models
3.4.1 The Multi-Locus Mixed Model
3.4.2 2D-GWAS
3.4.3 Integrating Knowledge About Protein-Protein Interactions
3.4.4 Integrating Gene Expression Data
3.5 Future Perspectives
4 Notes
References
Chapter 19: Prediction of Transcription Factor Regulators and Gene Regulatory Networks in Tomato Using Binding Site Information
1 Introduction
2 Materials
2.1 Bioinformatics Resources
2.2 Data and Code Availability
2.3 Gene ID Conversion Protocol
2.4 Set of Functionally Related or Coregulated Genes
2.5 Tomato Motif Mapping and Enrichment Protocol
2.6 Gene Ontology (GO) Enrichment with PLAZA Dicots 5.0 Workbench
2.7 Functional Network Visualization Using Cytoscape
2.7.1 Genes Within the Gene Set Annotated to a Specific GO Term
2.7.2 All Genes Annotated to a Specific GO Term
2.7.3 Cytoscape Network Visualization
3 Methods
3.1 Conversion of Gene Identifiers (IDs) Between ITAG 2.5 and ITAG 4.0 Genome Annotations
3.2 Running a Motif Enrichment Analysis
3.3 Analysis of Motif Enrichment Results
3.4 Functional GO Analysis of the Motif Enrichment Results
3.5 Visualization of Functional Networks Obtained Through Motif Enrichment
4 Notes
References
Chapter 20: AGENT for Exploring and Analyzing Gene Regulatory Networks from Arabidopsis
1 Introduction
2 Materials
3 Methods
3.1 Exploring Curated GRNs in AGENT
3.2 Network Motif Discovery and Network Attributes
3.3 Expression Overlay
4 Notes
References
Chapter 21: A Transferable Machine Learning Framework for Predicting Transcriptional Responses of Genes Across Species
1 Introduction
2 Materials
2.1 Transcriptomic Data
2.2 Working Environment
2.3 Software Installation
3 Methods
3.1 Label Responsive and Nonresponsive Genes
3.1.1 RNA-seq Data Processing
3.1.2 Differential Expression Gene Analysis
3.2 Binning of Responsive and Nonresponsive Genes
3.3 Quantifying Gene Features
3.3.1 Sorghum Example Data Set
3.3.2 Feature Extraction
3.3.3 Merge Genomic Features and Gene Classification Label
3.4 Gene-Family Clustering
3.5 Model Training/Hyperparameter Tuning
3.6 Model Evaluation
3.7 Cross-Species Predictions
4 Notes
References
Index


Methods in Molecular Biology 2698

Kerstin Kaufmann · Klaas Vandepoele  Editors

Plant Gene Regulatory Networks Methods and Protocols Second Edition

METHODS IN MOLECULAR BIOLOGY

Series Editor John M. Walker School of Life and Medical Sciences University of Hertfordshire Hatfield, Hertfordshire, UK

For further volumes: http://www.springer.com/series/7651

For over 35 years, biological scientists have come to rely on the research protocols and methodologies in the critically acclaimed Methods in Molecular Biology series. The series was the first to introduce the step-by-step protocols approach that has become the standard in all biomedical protocol publishing. Each protocol is provided in readily reproducible step-by-step fashion, opening with an introductory overview and a list of the materials and reagents needed to complete the experiment, followed by a detailed procedure that is supported by a helpful notes section offering tips and tricks of the trade as well as troubleshooting advice. These hallmark features were introduced by series editor Dr. John Walker and constitute the key ingredient in each and every volume of the Methods in Molecular Biology series. Tested and trusted, comprehensive and reliable, all protocols from the series are indexed in PubMed.

Edited by

Kerstin Kaufmann Berlin, Germany

Klaas Vandepoele Ghent, Belgium

ISSN 1064-3745
ISSN 1940-6029 (electronic)
Methods in Molecular Biology
ISBN 978-1-0716-3353-3
ISBN 978-1-0716-3354-0 (eBook)
https://doi.org/10.1007/978-1-0716-3354-0

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Humana imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer Nature. The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.

Paper in this product is recyclable.

Preface

Plant development and environmental responses require the coordinated, orchestrated, and often cell-type-specific activities of many gene products. The complex regulatory relationships between genes and gene products are abstractly described as gene regulatory networks (GRNs). The characterization of GRNs typically requires quantitative experimental data on gene expression dynamics, e.g., in response to an environmental or developmental cue, together with information on the “hard-wiring” of the system, reporting the molecular players and interactions that determine regulatory activity and specificity. Natural genetic variation and evolutionary conservation of genes and regulatory elements can further enrich the diversity in data types used to delineate and analyze GRNs. Together, different types of experimental input serve as a foundation for computational data integration and modelling of gene-regulatory interactions that underlie phenotypic traits and plant behavior in response to environmental cues. Advanced experimental technologies to interrogate plant GRNs at unprecedented resolution are rapidly evolving, accompanied by the introduction of innovative computational approaches to integrate multiscale data and predict gene-regulatory interactions.

The second edition of Plant Gene Regulatory Networks aims to introduce different experimental techniques and computational approaches to elucidate GRN structure and functions in plants. The experimental technologies comprise multi-scale analyses of gene activities, such as cell-type and single-cell transcriptome analyses, approaches for targeted perturbation, and untargeted analyses of proteomes and metabolomes. Different innovative techniques to identify molecular interactions in vitro and in vivo and to elucidate mechanisms of specificity in gene regulation are presented. It is becoming increasingly clear that mechanisms of gene regulation, and thereby mechanistic regulatory relationships forming the basis of plant GRNs, can only be understood by elucidating the higher-order architecture of genetic interactions and 3D molecular topology in gene promoters, and in general in the plant nucleus.

Besides providing protocols for selected experimental approaches, Plant Gene Regulatory Networks highlights bioinformatics and data resources and pipelines for primary data analysis, integration, and network analysis. Selected innovative computational approaches for network modelling, including machine learning approaches and agent-based modelling, are introduced. In sum, the chapters presented in the second edition of Plant Gene Regulatory Networks aim to expand the toolbox for GRN analysis in different plant species, at different experimental and computational levels, by providing detailed practical and technical insights.

Kerstin Kaufmann, Berlin, Germany
Klaas Vandepoele, Ghent, Belgium


Contents

Preface
Contributors

1 Characterization of Gene Regulatory Networks in Plants Using New Methods and Data Types
Klaas Vandepoele and Kerstin Kaufmann

2 Inducible, Tissue-Specific Gene Expression in Arabidopsis Using GR-LhG4-Mediated Trans-Activation
Tasnim Zerin, Thomas Greb, and Sebastian Wolf

3 Targeted Activation of Arabidopsis Genes by a Potent CRISPR–Act3.0 System
Changtian Pan and Yiping Qi

4 Single Cell RNA-Sequencing in Arabidopsis Root Tissues
Yuji Ke, Max Minne, Thomas Eekhout, and Bert De Rybel

5 Analysis of Chromatin Accessibility, Histone Modifications, and Transcriptional States in Specific Cell Types Using Flow Cytometry
Kenneth W. Berendzen, Christopher Grefen, Takuya Sakamoto, and Daniel Slane

6 Untargeted Proteomics and Metabolomics Analysis of Plant Organ Development
Venkatesh P. Thirumalaikumar, Alisdair R. Fernie, and Aleksandra Skirycz

7 DamID-seq: A Genome-Wide DNA Methylation Method that Captures Both Transient and Stable TF-DNA Interactions in Plant Cells
José M. Alvarez, Will E. Hinckley, Lauriebeth Leonelli, Matthew D. Brooks, and Gloria M. Coruzzi

8 CUT&Tag for Mapping In Vivo Protein-DNA Interactions in Plants
Weizhi Ouyang and Xingwang Li

9 Identification of Plant Transcription Factor DNA-Binding Sites Using seq-DAP-seq
Stephanie Hutin, Romain Blanc-Mathieu, Philippe Rieu, François Parcy, Xuelei Lai, and Chloe Zubieta

10 Estimating DNA-Binding Specificities of Transcription Factors Using SELEX-Seq
Peilin Chen, Cezary Smaczniak, Johanna Haffner, Jose M. Muino, and Kerstin Kaufmann

11 Immunoprecipitation-Mass Spectrometry (IP-MS) of Protein-Protein Interactions of Nuclear-Localized Plant Proteins
Cezary Smaczniak

12 Mapping Active Gene-Associated Chromatin Loops by ChIA-PET in Rice
Weizhi Ouyang and Xingwang Li

13 Building High-Confidence Gene Regulatory Networks by Integrating Validated TF–Target Gene Interactions Using ConnecTF
Ji Huang, Manpreet S. Katari, Che-Lun Juang, Gloria M. Coruzzi, and Matthew D. Brooks

14 The ChIP-Hub Resource: Toward plantEncode
Yangming Lan, Xue Zhao, and Dijun Chen

15 A Practical Guide to Inferring Multi-Omics Networks in Plant Systems
Natalie M. Clark, Bhavna Hurgobin, Dior R. Kelley, Mathew G. Lewsey, and Justin W. Walley

16 Gene Regulatory Network Modeling Using Single-Cell Multi-Omics in Plants
Tran Chau, Prakash Timilsena, and Song Li

17 Methodology for Constructing a Knowledgebase for Plant Gene Regulation Information
Hadi Nayebi Gavgani, Erich Grotewold, and John Gray

18 Predicting Gene Regulatory Interactions Using Natural Genetic Variation
Maura John, Dominik Grimm, and Arthur Korte

19 Prediction of Transcription Factor Regulators and Gene Regulatory Networks in Tomato Using Binding Site Information
Nicolás Manosalva Pérez and Klaas Vandepoele

20 AGENT for Exploring and Analyzing Gene Regulatory Networks from Arabidopsis
Vincent Lau and Nicholas J. Provart

21 A Transferable Machine Learning Framework for Predicting Transcriptional Responses of Genes Across Species
Zhikai Liang, Xiaoxi Meng, and James C. Schnable

Index

Contributors

JOSÉ M. ALVAREZ • Centro de Biotecnología Vegetal, Facultad de Ciencias de la Vida, Universidad Andrés Bello, Santiago, Chile; Agencia Nacional de Investigacion y Desarrollo–Millennium Science Initiative Program, Millennium Institute for Integrative Biology (iBio), Santiago, Chile
KENNETH W. BERENDZEN • Center for Plant Molecular Biology, University of Tübingen, Tübingen, Germany
ROMAIN BLANC-MATHIEU • Laboratoire de Physiologie Cellulaire et Végétale, CNRS, CEA, Université Grenoble Alpes, INRAE, IRIG, CEA Grenoble, Grenoble, France
MATTHEW D. BROOKS • Global Change and Photosynthesis Research Unit, USDA ARS, Urbana, IL, USA
TRAN CHAU • Graduate Program in Genetics, Bioinformatics and Computational Biology (GBCB), Blacksburg, VA, USA
DIJUN CHEN • State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
PEILIN CHEN • Institute for Biology, Plant Cell and Molecular Biology, Humboldt-Universität zu Berlin, Berlin, Germany
NATALIE M. CLARK • Proteomics Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
GLORIA M. CORUZZI • Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA; Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA
THOMAS EEKHOUT • Ghent University, Department of Plant Biotechnology and Bioinformatics, Ghent, Belgium; VIB Center for Plant Systems Biology, Ghent, Belgium; VIB Single Cell Core, VIB, Ghent/Leuven, Belgium
ALISDAIR R. FERNIE • Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
HADI NAYEBI GAVGANI • Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA; Dandelions Therapeutics Inc., San Francisco, CA, USA
JOHN GRAY • Department of Biological Sciences, University of Toledo, Toledo, OH, USA
THOMAS GREB • Centre for Organismal Studies (COS) Heidelberg, University of Heidelberg, Heidelberg, Germany
CHRISTOPHER GREFEN • Faculty of Biology and Biotechnology, Molecular and Cellular Botany, University of Bochum, Bochum, Germany
DOMINIK GRIMM • Technical University of Munich & Weihenstephan-Triesdorf University of Applied Sciences, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
ERICH GROTEWOLD • Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
JOHANNA HAFFNER • Institute for Biology, Plant Cell and Molecular Biology, Humboldt-Universität zu Berlin, Berlin, Germany
WILL E. HINCKLEY • Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
JI HUANG • Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA
BHAVNA HURGOBIN • Australian Research Council Research Hub for Medicinal Agriculture, La Trobe University, Bundoora, VIC, Australia; La Trobe Institute for Sustainable Agriculture and Food, Department of Animal, Plant and Soil Sciences, School of Agriculture, Biomedicine and Environment, La Trobe University, Bundoora, VIC, Australia
STEPHANIE HUTIN • Laboratoire de Physiologie Cellulaire et Végétale, CNRS, CEA, Université Grenoble Alpes, INRAE, IRIG, CEA Grenoble, Grenoble, France
MAURA JOHN • Technical University of Munich & Weihenstephan-Triesdorf University of Applied Sciences, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
CHE-LUN JUANG • Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA
MANPREET S. KATARI • Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA
KERSTIN KAUFMANN • Institute of Biology, Humboldt-Universitaet zu Berlin, Berlin, Germany; Institute for Biology, Plant Cell and Molecular Biology, Humboldt-Universität zu Berlin, Berlin, Germany
YUJI KE • Ghent University, Department of Plant Biotechnology and Bioinformatics, Ghent, Belgium; VIB Center for Plant Systems Biology, Ghent, Belgium
DIOR R. KELLEY • Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, USA
ARTHUR KORTE • Center for Computational and Theoretical Biology, University of Würzburg, Würzburg, Germany
XUELEI LAI • Laboratoire de Physiologie Cellulaire et Végétale, CNRS, CEA, Université Grenoble Alpes, INRAE, IRIG, CEA Grenoble, Grenoble, France; National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, China
YANGMING LAN • State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
VINCENT LAU • Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, Canada
LAURIEBETH LEONELLI • Department of Agricultural and Biological Engineering at the University of Illinois, Urbana, IL, USA
MATHEW G. LEWSEY • Australian Research Council Research Hub for Medicinal Agriculture, La Trobe University, Bundoora, VIC, Australia; La Trobe Institute for Sustainable Agriculture and Food, Department of Animal, Plant and Soil Sciences, School of Agriculture, Biomedicine and Environment, La Trobe University, Bundoora, VIC, Australia; Australian Research Council Centre of Excellence in Plants for Space, AgriBio Building, La Trobe University, Bundoora, VIC, Australia
SONG LI • Graduate Program in Genetics, Bioinformatics and Computational Biology (GBCB), Blacksburg, VA, USA; School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA, USA
XINGWANG LI • National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, China
ZHIKAI LIANG • Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, MN, USA
NICOLÁS MANOSALVA PÉREZ • Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium; Center for Plant Systems Biology, VIB, Ghent, Belgium
XIAOXI MENG • Department of Horticultural Science, University of Minnesota, Saint Paul, MN, USA
MAX MINNE • Ghent University, Department of Plant Biotechnology and Bioinformatics, Ghent, Belgium; VIB Center for Plant Systems Biology, Ghent, Belgium
JOSE M. MUINO • Institute for Biology, Plant Cell and Molecular Biology, Humboldt-Universität zu Berlin, Berlin, Germany
WEIZHI OUYANG • National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, China
CHANGTIAN PAN • Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD, USA
FRANÇOIS PARCY • Laboratoire de Physiologie Cellulaire et Végétale, CNRS, CEA, Université Grenoble Alpes, INRAE, IRIG, CEA Grenoble, Grenoble, France
NICHOLAS J. PROVART • Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, Canada
YIPING QI • Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD, USA; Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD, USA
PHILIPPE RIEU • Laboratoire de Physiologie Cellulaire et Végétale, CNRS, CEA, Université Grenoble Alpes, INRAE, IRIG, CEA Grenoble, Grenoble, France
BERT DE RYBEL • Ghent University, Department of Plant Biotechnology and Bioinformatics, Ghent, Belgium; VIB Center for Plant Systems Biology, Ghent, Belgium
TAKUYA SAKAMOTO • Department of Applied Biological Science, Faculty of Science and Technology, Tokyo University of Science, Chiba, Japan
JAMES C. SCHNABLE • Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, USA; Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, USA
ALEKSANDRA SKIRYCZ • Boyce Thompson Institute, Ithaca, NY, USA; Cornell University, Ithaca, NY, USA
DANIEL SLANE • Department of Applied Biological Science, Faculty of Science and Technology, Tokyo University of Science, Chiba, Japan; The University of Tokyo, Graduate School of Frontier Sciences, Department of Integrated Biosciences, Laboratory of Integrated Biology, Chiba, Japan
CEZARY SMACZNIAK • Institute for Biology, Plant Cell and Molecular Biology, Humboldt-Universität zu Berlin, Berlin, Germany
VENKATESH P. THIRUMALAIKUMAR • Boyce Thompson Institute, Ithaca, NY, USA
PRAKASH TIMILSENA • School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA, USA
KLAAS VANDEPOELE • Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium; VIB-UGent Center for Plant Systems Biology, Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
JUSTIN W. WALLEY • Department of Plant Pathology and Microbiology, Iowa State University, Ames, IA, USA
SEBASTIAN WOLF • Center for Plant Molecular Biology (ZMBP), University of Tübingen, Tübingen, Germany; Centre for Organismal Studies (COS) Heidelberg, University of Heidelberg, Heidelberg, Germany
TASNIM ZERIN • Center for Plant Molecular Biology (ZMBP), University of Tübingen, Tübingen, Germany
XUE ZHAO • State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
CHLOE ZUBIETA • Laboratoire de Physiologie Cellulaire et Végétale, CNRS, CEA, Université Grenoble Alpes, INRAE, IRIG, CEA Grenoble, Grenoble, France

Chapter 1
Characterization of Gene Regulatory Networks in Plants Using New Methods and Data Types

Klaas Vandepoele and Kerstin Kaufmann

Abstract
A major question in plant biology is to understand how plant growth, development, and environmental responses are controlled and coordinated by the activities of regulatory factors. Gene regulatory network (GRN) analyses require integrated approaches that combine experimental approaches with computational analyses. A wide range of experimental approaches and tools are now available, such as targeted perturbation of gene activities, quantitative and cell-type specific measurements of dynamic gene activities, and systematic analysis of the molecular ‘hard-wiring’ of the systems. At the computational level, different tools and databases are available to study regulatory sequences, including intuitive visualizations to explore data-driven gene regulatory networks in different plant species. Furthermore, advanced data integration approaches have recently been developed to efficiently leverage complementary regulatory data types and learn context-specific networks.

Key words Transcription factor, Gene regulatory network, Gene activity, Genome-wide profiling, Plant

Kerstin Kaufmann and Klaas Vandepoele (eds.), Plant Gene Regulatory Networks: Methods and Protocols, Methods in Molecular Biology, vol. 2698, https://doi.org/10.1007/978-1-0716-3354-0_1, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023

1 Introduction

Transcriptional regulation is one of the fundamental processes controlling gene expression. Within multicellular organisms, including plants, detailed regulatory programs orchestrate gene activities, resulting in phenotypic diversity. This diversity covers the development of specific cell types and organs as well as specific cellular responses to various external stimuli. Transcription factors (TFs), together with other regulatory factors such as miRNAs and other classes of ncRNAs, are key actors influencing gene expression. TFs regulate their target genes by recognizing short sequences in the DNA called TF binding sites (TFBSs). While TF binding can result in the activation or repression of the associated target gene, the interplay of multiple regulators results in the combinatorial control that determines spatiotemporal gene expression [1, 2]. The full set of regulatory interactions between a set of functionally related TFs and their target genes forms a gene regulatory network (GRN), and these networks are of major interest for obtaining an overview of the organization and complexity of transcriptional regulation at the organismal level. GRNs are pivotal to understanding how different biological processes like growth, development, or stress responses are controlled in plants. Well-characterized examples include GRNs controlling root, leaf, and flower development, floral transition, and photomorphogenesis, as well as cellular responses to plant hormones such as ethylene and jasmonic acid [3–7]. Furthermore, numerous TFs involved in the transcriptional response to various biotic or abiotic stresses have been reported [8–10].

Transcription factors can be classified into different TF families based on the presence of protein domains with documented specific DNA binding activity associated with transcriptional regulation. While most TF families are shared between Viridiplantae species (green algae and/or land plants) and other eukaryotes, a subset of families is specific to Viridiplantae or land plants [11]. Plant-specific TF families play key roles in the regulation of various plant developmental and signaling pathways, stress responses, and the biosynthesis of metabolites that, in some cases, are of major importance for plant adaptation [12]. Between 35 and 58 TF families can be found in different plant lineages, each having specific DNA-binding signature domains. The number of TFs varies greatly between plant species, from ~100 TFs in the unicellular green alga Ostreococcus tauri to >3600 TFs in polyploid species like Triticum aestivum (wheat) [13]. Comparison of plant and algal genomes revealed that several TF families have expanded in land plants, including bHLH, NAC, GRAS, AP2/ERF, ASL/LBD, and WRKY [11]. Whole-genome duplication is the main mechanism responsible for the global expansion of TF gene families. Apart from TF gene copy number variation, the level of divergence, either at the protein level or in spatiotemporal expression, is another major determinant contributing to network complexity, evolution, and rewiring of GRNs in different plant species [14]. An assessment of TF DNA binding domain variation found that TF sequence preference divergence varies between TF families. While MYB, C2H2 zinc-finger, and B3 TFs showed the greatest intrafamily divergence, AP2, WRKY, and bHLH were among the families showing the least divergence [15].

Recent examples of systematic GRN reconstruction in Arabidopsis root and flower development [16–20], maize leaf development [6], or wheat [21, 22] typically combine dynamic genome-wide expression analyses with information from genome-wide TF binding, e.g., as determined by chromatin immunoprecipitation followed by deep sequencing (ChIP-seq), DNA affinity purification followed by deep sequencing (DAP-seq), or other ways of mapping TF binding sites.
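
As a minimal illustration of how binding and expression data can be combined, the sketch below intersects, for each TF, the genes it binds with the genes whose expression responds to its perturbation, yielding candidate direct targets. It is a sketch under stated assumptions rather than the procedure of any cited study: the function name `candidate_direct_targets`, the TF and gene identifiers, and the toy input sets are hypothetical, and a real analysis would add peak calling, peak-to-gene assignment, and statistical testing.

```python
from typing import Dict, Set


def candidate_direct_targets(
    bound_genes: Dict[str, Set[str]],
    responsive_genes: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """For each TF, keep genes that are both bound and expression-responsive."""
    targets = {}
    for tf, bound in bound_genes.items():
        # A TF without expression data contributes no candidate targets.
        responsive = responsive_genes.get(tf, set())
        targets[tf] = bound & responsive
    return targets


# Hypothetical toy input; TF and gene names are placeholders.
bound = {"TF_A": {"gene1", "gene2", "gene3"}, "TF_B": {"gene2", "gene4"}}
responsive = {"TF_A": {"gene2", "gene3", "gene5"}, "TF_B": {"gene1"}}

network = candidate_direct_targets(bound, responsive)
# Flatten into a sorted edge list (TF, target gene).
edges = sorted((tf, gene) for tf, genes in network.items() for gene in genes)
print(edges)
```

In this toy input only TF_A retains edges, because none of the genes bound by TF_B respond to its perturbation; binding alone is treated as insufficient evidence for regulation.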


Reconstruction of developmental GRNs by combining genome-wide binding and transcriptome analyses revealed a high level of feedback control and cross-regulation between TFs integrating hormonal control, organ and cell-type identity, and growth [23]. GRN modeling is potentially complicated by the fact that TFs may act as repressors or activators depending not only on the cis-regulatory grammar [24–26] but also on TF concentration [24] and on potentially tissue-type or condition-specific interactions with transcriptional cofactors [27]. A further challenge is to integrate spatial gradients and movement of TFs or other regulatory molecules in GRN modeling, since intercellular communication is an important factor driving developmental patterning and environmental responses, e.g., to light or pathogen attack. Challenges posed by spatial heterogeneity and cell-type specific regulatory interactions can now be overcome by applying techniques for cell-type specific and single-cell omics analysis [20], which can be complemented by computational reconstruction of cell position in a tissue context [28, 29]. High-resolution transcriptome analyses can be assisted by more sensitive methods that measure transient TF-DNA interactions via DamID-seq or cell-type specific interactions via CUT&RUN/CUT&Tag. As the next major milestone, spatial quantitative targeted and untargeted analyses of gene expression and regulatory activities have recently started to be developed. Complementary to this, innovative approaches for genetic perturbation and synthetic regulatory systems have been and are being developed, including the application of dCas9 variants for targeted induction/repression, cell-type specific knockout by CRISPR/Cas9, and synthetic TFs and TF circuits.

2 Experimental Methods to Map GRNs in Plants

How TFs act in a multicellular context to control cell-type and condition-specific gene expression is still far from understood. Experimental methods that specifically perturb and sensitively analyze TF functions in an in vivo cellular context are needed to address this challenge.

Innovative synthetic systems for cell-type specific activation or knockdown of gene activities have been established. One of these systems is based on a synthetic LhGR TF that combines the DNA-binding domain of a high-affinity DNA-binding mutant of the bacterial lac repressor, a transcriptional activation domain of the yeast GAL4 TF, and the ligand-binding domain of the rat glucocorticoid receptor (GR). Inducible transgene expression is mediated by a synthetic promoter (pOp) that harbors lacI binding sites. The system was originally established in plants by the Moore lab [30] and has been successfully utilized in a number of studies (e.g., [31, 32]). A critical aspect of cell-type specific induction is to monitor the spatiotemporal specificity of the system, which has been addressed by utilizing reporter gene expression, such as GUS [32] or fluorescent reporters [33] (see Chapter 2). More recently, inducible systems based on other synthetic TFs, recombinase activities, or combinations thereof have been introduced, providing versatility in tissue-specific activation of transgenes and in building synthetic regulatory circuits [34, 35].

In order to modulate the activities of endogenous genes in a specific manner, an endonuclease-deficient version of Cas9 (dCas9) has been fused with potent transcriptional activation or repression domains ([36, 37], see Chapter 3). Similar systems have been introduced using synthetic TFs whose DNA-binding specificity can be modulated to target specific endogenous gene loci, such as TAL effector (TALE) or zinc finger TFs [38].

Besides specific activation or repression of regulatory proteins in cells and tissues, methods for obtaining context-dependent transcriptional readout and epigenetic status have been developed, facilitated by fluorescence-activated cell sorting (see Chapter 5) or INTACT-based techniques [39]. Single-cell genomics approaches are now in place to establish cellular heterogeneity of regulatory protein functions and networks at unprecedented resolution (see Chapter 4). Beyond the level of transcriptional regulation, assessed for example by measuring transcript abundance or chromatin status, dynamic changes in the proteome and metabolome can provide an informative readout in developmental or environmental-response time series (see Chapter 6) and help to elucidate downstream changes of gene-regulatory programs or possible regulatory mechanisms beyond transcriptional control.

Elucidating the molecular ‘hard-wiring’, such as protein-DNA interactions and protein-protein interactions, is important for understanding the mechanistic basis of gene-regulatory interactions.
Chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) is a classical method to identify TF-bound genomic regions [2, 23, 40]. In recent years, novel methods to capture not only stable but also transient or cell-type specific protein-DNA interactions in vivo at genome-wide scale have been established. For example, DNA adenine methyltransferase identification followed by sequencing (DamID-seq; [41, 42], see Chapter 7) relies on the detection of adenine-methylated DNA regions in eukaryotic cells that otherwise lack adenine methylation. DamID-seq uses a fusion protein of a TF or other DNA-binding protein of interest with E. coli DNA adenine methyltransferase (Dam). This enzyme can methylate the adenine base in GATC sequences that are in spatial proximity of the bound genomic regions. Advantages of the method include the relatively low amount of tissue required and the ability to capture transient protein-DNA interactions.

Another approach toward sensitive in vivo mapping of TF-DNA interactions relies on specific cleavage of DNA close to genomic regions bound by the protein of interest. In the case of CUT&RUN (Cleavage Under Targets and Release Using Nuclease), protein A/G fused to micrococcal nuclease (MNase) is used to direct nuclease activity toward genomic regions bound by a specific protein, facilitated by interaction of protein A/G with an antibody against the protein of interest [39]. CUT&Tag (Cleavage Under Targets and Tagmentation) is a related approach ([43, 44], see Chapter 8), in which protein A/G is fused to hyperactive Tn5 transposase pre-loaded with sequencing adaptors, allowing for simultaneous cleavage of DNA and library preparation. Both methods are highly sensitive and applicable to low amounts of tissue, yet they require proper controls to correct for non-specific cuts in accessible genomic regions.

Once TF-bound genomic regions have been identified, a final challenge is to associate the regulatory regions to which the TF binds with potentially directly regulated target genes. This is particularly challenging in larger genomes with complex and distantly located regulatory regions. Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET) addresses this question by determining long-range chromatin interactions mediated by a protein of interest ([45], see Chapter 12). The method essentially combines ChIP with chromatin conformation capture (3C) and has been successfully used in species such as rice and maize [45]. Related methods can also be applied to capture higher-order chromatin interactions in a targeted or untargeted manner [45].

A limitation of in vivo methods for identifying protein-DNA interactions is that they typically do not reveal the precise mechanisms of TF DNA-binding, e.g., the requirement for combinatorial protein-protein interactions or DNA recognition. Mechanistic knowledge is, however, crucial for generating predictive knowledge on conditional or cell-type specific regulatory interactions. In vitro methods to elucidate physical parameters of DNA-binding, such as specificity and affinity, can be applied toward this goal.
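
Returning briefly to the problem of associating TF-bound regions with target genes: a common first approximation, shown here only as a hedged baseline and not as a method prescribed in this chapter, is to assign each peak to the gene with the nearest transcription start site (TSS) within a distance cutoff. The function name `nearest_gene`, the 3000 bp cutoff, and the coordinates are illustrative assumptions. The sketch handles a single chromosome, ignores strand, and cannot recover the long-range interactions that ChIA-PET is designed to detect.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def nearest_gene(
    peak_summit: int,
    tss_sorted: List[Tuple[int, str]],
    max_distance: int = 3000,  # assumed cutoff, chosen for illustration only
) -> Optional[str]:
    """Return the gene with the TSS closest to the summit, or None if too far."""
    positions = [pos for pos, _ in tss_sorted]
    i = bisect_left(positions, peak_summit)
    # Only the TSSs flanking the insertion point can be the nearest ones.
    candidates = tss_sorted[max(i - 1, 0): i + 1]
    if not candidates:
        return None
    pos, gene = min(candidates, key=lambda t: abs(t[0] - peak_summit))
    return gene if abs(pos - peak_summit) <= max_distance else None


# Hypothetical TSS positions on one chromosome, sorted by coordinate.
tss = [(1000, "geneA"), (5000, "geneB"), (20000, "geneC")]

assigned = {summit: nearest_gene(summit, tss) for summit in (1200, 4500, 12000)}
print(assigned)
```

The peak at position 12000 remains unassigned because both flanking TSSs lie beyond the cutoff, which is exactly the situation in which distal regulatory regions in large genomes call for interaction-based evidence.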
DNA affinity purification followed by sequencing (DAP-seq) is a method that relies on isolating TF-bound fragments of sheared genomic DNA [46]. Affinity-purified DNA sequences are subjected to high-throughput sequencing and mapped to the genome. While DAP-seq typically identifies DNA regions bound by a specific TF that is produced in vitro or in E. coli, a modified version of the protocol can also be used to identify DNA-binding sites of heteromeric TF complexes ([47], see Chapter 9). While DAP-seq can provide an overall view of potential DNA binding sites of a TF, SELEX-seq (Systematic Evolution of Ligands by Exponential Enrichment followed by sequencing) can alternatively be used to characterize and compare TF DNA-binding specificities ([48], see Chapter 10). Instead of starting with a pool of genomic DNA, SELEX-seq uses pools of short (~20–40 bp) randomized DNA as input for the TF-DNA-binding reaction. Protein-DNA complexes are affinity purified, and the purified DNA is used as input for additional rounds of enrichment. DNA from several rounds of enrichment is sequenced, followed by computational analysis of enriched sequence patterns. Both DAP-seq and SELEX-seq have been successfully used to characterize DNA-binding sites and DNA-binding specificities of plant transcription factors.

Besides protein-DNA interactions, the ‘hard-wiring’ of GRNs is also influenced by protein-protein interactions, such as combinatorial TF interactions as well as interactions of TFs and cofactors. A large number of approaches to probe physical interactions of TFs and other proteins in vitro or in planta have been established. Typical targeted in planta methods to study interactions of two specific proteins include bimolecular fluorescence complementation (BiFC), split luciferase (SLC), and Förster Resonance Energy Transfer combined with Fluorescence Lifetime Imaging (FRET-FLIM). FRET-FLIM is particularly suitable to identify cell-type specific interactions of regulatory proteins [49]. All methods require appropriate control experiments in the experimental design (see, e.g., [50]).

Untargeted in vivo methods for identifying regulatory protein interactions are mostly based on mass spectrometry-based identification of proteins. ‘Classical’ methods are usually based on affinity purification of proteins of interest fused to a tag. Such tags include simple tags like GREEN FLUORESCENT PROTEIN (GFP, [51], see Chapter 11) or tags for tandem affinity purification [52]. More recently, proximity labeling has been introduced to identify complex partners of low-abundance regulatory proteins [53]. The method has also been adapted to identify the nuclear proteome of a rare plant cell type [53]. Essentially, modified versions of bacterial biotin ligase (BirA), such as TurboID [54], are fused to a protein of interest and expressed in plants. Upon exogenous supply of the substrate biotin, the ligase binds and modifies biotin, which can then covalently bind to lysine residues in proximally located proteins, such as complex partners of the protein of interest. For all the untargeted methods, careful experimental design and quantitative data analysis are required to identify protein complex partners.
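
To give a flavor of what such quantitative analysis can involve, the sketch below scores each identified protein by its log2 enrichment in a bait sample over a control sample and keeps proteins above a threshold. This is an illustrative simplification and not the analysis used in the cited studies: the function names, the pseudocount of 1, the threshold of 2, and the spectral counts are hypothetical, and a real analysis would use biological replicates and statistical testing.

```python
import math
from typing import Dict, List


def log2_enrichment(
    bait_counts: Dict[str, float],
    control_counts: Dict[str, float],
    pseudocount: float = 1.0,  # assumed value; avoids division by zero
) -> Dict[str, float]:
    """log2 ratio of bait over control abundance for each protein in the bait sample."""
    scores = {}
    for protein, bait in bait_counts.items():
        control = control_counts.get(protein, 0.0)
        scores[protein] = math.log2((bait + pseudocount) / (control + pseudocount))
    return scores


def candidate_partners(scores: Dict[str, float], min_log2: float = 2.0) -> List[str]:
    """Proteins whose enrichment reaches the (assumed) threshold."""
    return sorted(p for p, s in scores.items() if s >= min_log2)


# Hypothetical spectral counts for a tagged bait and an untagged control.
bait = {"protA": 31, "protB": 7, "protC": 15}
control = {"protA": 0, "protB": 7, "protC": 3}

scores = log2_enrichment(bait, control)
partners = candidate_partners(scores)
print(scores, partners)
```

Here protB is detected equally in both samples and is therefore treated as background, which illustrates why a matched control is part of the experimental design.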

3 Data Analysis and Inference Methods to Accurately Model Gene Regulation

Given the wealth of experimental profiling methods to characterize gene regulation in plants, new analysis and integration methods are needed to fully exploit the complementary information present in the obtained datasets. Experimental methods available to study gene regulation capture biological insights at different levels, ranging from the genomic DNA over TF proteins to regulatory interactions between regulators and target genes. Context-specific information, either from different organs, tissues, or cell types, together with temporal dynamics, offers another level of information to comprehensively and accurately map plant GRNs in vivo. While each experimental method requires dedicated data analysis, standardization and automation are of utmost importance to guarantee reproducibility. Furthermore, access to datasets processed using the same underlying methods is essential when (1) comparing results from different organs/cell types, species, or research labs and (2) devising integration strategies aiming to combine the complementary knowledge captured by different data types ([55–57], see Chapters 13 and 14).

Several studies have demonstrated how the integration of complementary omics data types offers new insights into biological networks [58]. While a targeted or context-specific setup focuses on a specific developmental process or environmental response of interest, untargeted approaches have the potential to study gene regulation covering a broad array of processes, often making use of publicly available datasets. Zander and co-workers applied a targeted multi-omics network approach, in which the dynamic profiling of regulatory interactions, chromatin state, transcriptome, and (phospho)proteome was used to delineate jasmonate signaling networks in Arabidopsis ([3], see Chapter 15). Using an untargeted approach, the integrative analysis of transcriptomics and interactomics data was used to infer functional and regulatory annotations for >5000 unknown Arabidopsis genes [59]. While condition-specific co-expression modules together with guilt-by-association were first used to predict new gene functions, the integration of experimental protein-DNA and protein-protein interaction networks added physical and/or regulatory support.

Various databases and platforms offer data-driven approaches to study gene regulation and infer GRNs in plants, including PlantRegMap, AtRegNet, AIV2, ConnecTF, and TF2Network ([13, 55, 56, 60, 61], see Chapters 13, 17, 19, and 20).
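
The guilt-by-association principle mentioned above can be sketched in a few lines: genes whose expression profiles correlate strongly with that of an uncharacterized gene lend it their annotations. The code below is a toy illustration under stated assumptions and does not reproduce the pipeline of the cited study: the function names, the Pearson threshold of 0.9, the expression values, and the annotation terms are hypothetical, and real analyses operate on condition-specific modules with enrichment statistics.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes equal length and non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def coexpressed_partners(
    query: str, expression: Dict[str, List[float]], min_r: float = 0.9
) -> List[str]:
    """Genes whose profile correlates with the query at or above min_r."""
    return sorted(
        gene
        for gene, profile in expression.items()
        if gene != query and pearson(expression[query], profile) >= min_r
    )


def transfer_annotations(
    query: str,
    expression: Dict[str, List[float]],
    annotations: Dict[str, List[str]],
    min_r: float = 0.9,
) -> Dict[str, int]:
    """Count how many co-expressed partners carry each annotation term."""
    votes: Dict[str, int] = {}
    for gene in coexpressed_partners(query, expression, min_r):
        for term in annotations.get(gene, []):
            votes[term] = votes.get(term, 0) + 1
    return votes


# Hypothetical expression profiles across four conditions.
expression = {
    "unknown1": [1.0, 2.0, 3.0, 4.0],
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [1.5, 2.5, 3.5, 4.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}
annotations = {
    "geneA": ["cold response"],
    "geneB": ["cold response", "root development"],
    "geneC": ["photosynthesis"],
}

votes = transfer_annotations("unknown1", expression, annotations)
print(votes)
```

Co-expression alone yields only a functional hypothesis; as noted above, physical protein-DNA or protein-protein interaction evidence is what adds regulatory support to such predictions.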
These databases and platforms, as well as related initiatives, leverage diverse experimental data types, including expression data, chromatin data, TF motifs, genomic variation data (see Chapter 18), and protein-DNA and protein-protein interactions, to identify regulatory interactions. Querying these regulatory data types for a specific TF, gene, pathway, or biological process of interest immediately sheds light on the available experimental evidence and allows new research questions or hypotheses to be formulated. These complementary experimental datasets also form the basis for powerful integration strategies, e.g., using machine learning to infer and prioritize new regulatory interactions ([62, 63], see Chapter 21). ConSReg is a condition-specific regulatory network inference engine that uses a machine learning approach to integrate expression data, TF-DNA binding data, and open chromatin data ([64], see Chapter 16). It focuses on using protein-DNA interaction data and open chromatin data to predict the combinations of TFs that can best explain observed differential gene expression under different environmental perturbations or in different cell types. In a related approach [63], different input networks capturing complementary information about DNA motifs, open chromatin, TF-binding, and expression-based regulatory interactions were combined using a supervised learning approach, resulting in an integrated Arabidopsis gene regulatory network (iGRN) covering 1.7 million interactions. This iGRN performs similarly to state-of-the-art experimental methods in recovering functional interactions and allows known and novel TF functions to be inferred, as demonstrated for regulators predicted to be involved in reactive oxygen species stress regulation [63].

Apart from experimental data, meta-data information about genotypes (e.g., mutant or over-expression lines) and ecotypes, growth conditions, sampling procedure, stress application or perturbation, and technical properties of the applied profiling method is equally important to identify, and potentially correct, biological or technical biases leading to confounders during the analysis of regulatory datasets. As an example, the integration of different epigenomic datasets, such as accessible chromatin regions, various histone modifications, or DNA methylation, only makes sense if the samples being combined are derived from the same organ and developmental stage. As such, databases implementing sample validation and certification services improve the FAIRness (i.e., level of findability, accessibility, interoperability, and reusability) of samples at the source (i.e., at submission time) and facilitate extracting novel biological knowledge through the integrative analysis of different datasets. An example of such an integrative approach was the identification of unmethylated regions (UMRs), which provide useful information for the identification of functional genes and cis-regulatory elements (CREs) [65]. The comparison of UMRs in different plant genomes with publicly available tissue-specific chromatin accessibility and gene expression information provided evidence for the functional role of these regions. For plant species with large genomes, such as maize and barley, the authors suggested that the identification of UMRs from a single tissue can assist in delineating a fairly complete catalog of potential regulatory elements and expressed genes across many developmental stages.
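
A comparison of this kind ultimately rests on interval overlap. As a hedged sketch, the code below computes the fraction of accessible chromatin regions from one tissue that overlap at least one UMR. The function name `fraction_overlapping`, the half-open coordinate convention, and all coordinates are assumptions made for illustration; the cited study's actual procedure is not reproduced here, and genome-scale data would call for an indexed or sorted-sweep implementation rather than this quadratic loop.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def fraction_overlapping(queries: List[Interval], references: List[Interval]) -> float:
    """Fraction of query intervals that overlap at least one reference interval."""
    if not queries:
        return 0.0
    hits = sum(
        # Two half-open intervals overlap if each starts before the other ends.
        any(q_start < r_end and r_start < q_end for r_start, r_end in references)
        for q_start, q_end in queries
    )
    return hits / len(queries)


# Hypothetical coordinates: accessible chromatin regions from one tissue and UMRs.
accessible_regions = [(100, 200), (500, 650), (900, 950), (2000, 2100)]
umrs = [(50, 300), (600, 1000)]

covered = fraction_overlapping(accessible_regions, umrs)
print(covered)
```

A high fraction across several tissues would be consistent with the suggestion that UMRs identified in a single tissue capture regulatory elements that are active elsewhere, although the toy numbers here carry no biological meaning.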

4 Conclusions

In summary, the development and application of novel experimental and computational methodologies allow plant scientists to examine gene activities at unprecedented resolution, further improving our understanding of plant GRNs. The complementarity of these methods is both a blessing and a curse: while they make it possible to measure specific molecular activities and generate novel insights for specific TFs, promoters, or target genes of interest, the integration of these different layers of information, measuring regulatory properties at different scales, also brings new challenges in unifying all this knowledge into comprehensive and understandable biological models.


Acknowledgments

The authors wish to thank the German Research Foundation (DFG) [458750707, 438774542, 270050988, 355312821 to K.K.], the Research Foundation-Flanders (FWO) for ELIXIR Belgium [I002819N to K.V.], and Ghent University BOF Grant [BOF24Y2019001901 to K.V.] for support. We apologize to all authors whose work could not be cited due to space constraints.

References

1. Schmitz RJ, Grotewold E, Stam M (2022) Cis-regulatory sequences in plants: their importance, discovery, and future challenges. Plant Cell 34:718–741
2. Heyndrickx KS et al (2014) A functional and evolutionary perspective on transcription factor binding in Arabidopsis thaliana. Plant Cell 26:3894–3910
3. Zander M et al (2020) Integrated multi-omics framework of the plant response to jasmonic acid. Nat Plants 6:290–302
4. Chang KN et al (2013) Temporal transcriptional response to ethylene gas drives growth hormone cross-regulation in Arabidopsis. eLife 2:e00675
5. Lorenzo O, Solano R (2005) Molecular players regulating the jasmonate signalling network. Curr Opin Plant Biol 8:532–540
6. Tu X et al (2020) Reconstructing the maize leaf regulatory network using ChIP-seq data of 104 transcription factors. Nat Commun 11:5089
7. Sullivan AM et al (2014) Mapping and dynamics of regulatory DNA and transcription factor networks in A. thaliana. Cell Rep 8:2015–2030
8. Walley JW, Dehesh K (2010) Molecular mechanisms regulating rapid stress signaling networks in Arabidopsis. J Integr Plant Biol 52:354–359
9. Urano K et al (2010) 'Omics' analyses of regulatory networks in plant abiotic stress responses. Curr Opin Plant Biol 13:132–138
10. Ma S, Bohnert HJ (2007) Integration of Arabidopsis thaliana stress-related transcript profiles, promoter structures, and cell-specific expression. Genome Biol 8:R49
11. Lehti-Shiu MD et al (2017) Diversity, expansion, and evolutionary novelty of plant DNA-binding transcription factor families. Biochim Biophys Acta Gene Regul Mech 1860:3–20
12. Yamasaki K et al (2013) DNA-binding domains of plant-specific transcription factors: structure, function, and evolution. Trends Plant Sci 18:267–276
13. Jin J et al (2017) PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants. Nucleic Acids Res 45:D1040–D1045
14. Jones DM, Vandepoele K (2020) Identification and evolution of gene regulatory networks: insights from comparative studies in plants. Curr Opin Plant Biol 54:42–48
15. Lambert SA et al (2019) Similarity regression predicts evolution of transcription factor sequence specificity. Nat Genet 51:981–989
16. Brady SM et al (2011) A stele-enriched gene regulatory network in the Arabidopsis root. Mol Syst Biol 7:459
17. Chen D et al (2018) Architecture of gene regulatory networks controlling flower development in Arabidopsis thaliana. Nat Commun 9:4534
18. Moreno-Risueno MA et al (2015) Transcriptional control of tissue formation throughout root development. Science 350:426–430
19. O'Maoileidigh DS, Graciet E, Wellmer F (2014) Gene networks controlling Arabidopsis thaliana flower development. New Phytol 201:16–30
20. Ferrari C, Manosalva Perez N, Vandepoele K (2022) MINI-EX: integrative inference of single-cell gene regulatory networks in plants. Mol Plant 15:1807–1824
21. Zhang Y et al (2021) Evolutionary rewiring of the wheat transcriptional regulatory network by lineage-specific transposable elements. Genome Res 31:2276–2289
22. Zhang Y et al (2022) Transposable elements orchestrate subgenome-convergent and -divergent transcription in common wheat. Nat Commun 13:6940
23. Gaudinier A, Brady SM (2016) Mapping transcriptional networks in plants: data-driven discovery of novel biological mechanisms. Annu Rev Plant Biol 67:575–594


24. Perales M et al (2016) Threshold-dependent transcriptional discrimination underlies stem cell homeostasis. Proc Natl Acad Sci U S A 113:E6298–E6306
25. Rodriguez K et al (2022) Concentration-dependent transcriptional switching through a collective action of cis-elements. Sci Adv 8:eabo6157
26. White MA et al (2016) A simple grammar defines activating and repressing cis-regulatory elements in photoreceptors. Cell Rep 17:1247–1254
27. Plant AR, Larrieu A, Causier B (2021) Repressor for hire! The vital roles of TOPLESS-mediated transcriptional repression in plants. New Phytol 231:963–973
28. Laureyns R et al (2022) An in situ sequencing approach maps PLASTOCHRON1 at the boundary between indeterminate and determinate cells. Plant Physiol 188:782–794
29. Liu C et al (2022) A spatiotemporal atlas of organogenesis in the development of orchid flowers. Nucleic Acids Res 50:9724–9737
30. Samalova M, Brzobohaty B, Moore I (2005) pOp6/LhGR: a stringently regulated and highly responsive dexamethasone-inducible gene expression system for tobacco. Plant J 41:919–935
31. Lopez-Salmeron V et al (2019) Inducible, cell type-specific expression in Arabidopsis thaliana through LhGR-mediated trans-activation. J Vis Exp (146)
32. Samalova M, Kirchhelle C, Moore I (2019) Universal methods for transgene induction using the dexamethasone-inducible transcription activation system pOp6/LhGR in Arabidopsis and other plant species. Curr Protoc Plant Biol 4:e20089
33. Schurholz AK et al (2018) A comprehensive toolkit for inducible, cell type-specific gene expression in Arabidopsis. Plant Physiol 178:40–53
34. Brophy JAN et al (2022) Synthetic genetic circuits as a means of reprogramming plant roots. Science 377:747–751
35. Lloyd JPB et al (2022) Synthetic memory circuits for stable cell reprogramming in plants. Nat Biotechnol 40:1862–1872
36. Lowder LG, Malzahn A, Qi Y (2018) Plant gene regulation using multiplex CRISPR-dCas9 artificial transcription factors. Methods Mol Biol 1676:197–214
37. Pan C et al (2021) CRISPR-Act3.0 for highly efficient multiplexed gene activation in plants. Nat Plants 7:942–953
38. Liu W, Stewart CN Jr (2016) Plant synthetic promoters and transcription factors. Curr Opin Biotechnol 37:36–44

39. Deal RB, Henikoff S (2010) A simple method for gene expression and chromatin profiling of individual cell types within a tissue. Dev Cell 18:1030–1040
40. van Mourik H et al (2015) Characterization of in vivo DNA-binding events of plant transcription factors by ChIP-seq: experimental protocol and computational analysis. Methods Mol Biol 1284:93–121
41. Alvarez JM et al (2020) Transient genome-wide interactions of the master transcription factor NLP7 initiate a rapid nitrogen-response cascade. Nat Commun 11:1157
42. Wu F, Olson BG, Yao J (2016) DamID-seq: genome-wide mapping of protein-DNA interactions by high throughput sequencing of adenine-methylated DNA fragments. J Vis Exp 107:e53620
43. Kaya-Okur HS et al (2019) CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat Commun 10:1930
44. Ouyang W et al (2021) Rapid and low-input profiling of histone marks in plants using nucleus CUT&Tag. Front Plant Sci 12:634679
45. Ouyang W et al (2020) Unraveling the 3D genome architecture in plants: present and future. Mol Plant 13:1676–1693
46. O'Malley RC et al (2016) Cistrome and Epicistrome features shape the regulatory DNA landscape. Cell 165:1280–1292
47. Lai X et al (2020) Genome-wide binding of SEPALLATA3 and AGAMOUS complexes determined by sequential DNA-affinity purification sequencing. Nucleic Acids Res 48:9637–9648
48. Slattery M et al (2011) Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147:1270–1282
49. Long Y et al (2017) In vivo FRET-FLIM reveals cell-type-specific protein interactions in Arabidopsis roots. Nature 548:97–102
50. Strotmann VI, Stahl Y (2022) Visualization of in vivo protein-protein interactions in plants. J Exp Bot 73:3866–3880
51. Smaczniak C et al (2012) Characterization of MADS-domain transcription factor complexes in Arabidopsis flower development. Proc Natl Acad Sci U S A 109:1560–1565
52. Van Leene J et al (2015) An improved toolbox to unravel the plant cellular machinery by tandem affinity purification of Arabidopsis protein complexes. Nat Protoc 10:169–187
53. Mair A et al (2019) Proximity labeling of protein complexes and cell-type-specific organellar proteomes in Arabidopsis enabled by TurboID. eLife 8:e47864

54. Branon TC et al (2018) Efficient proximity labeling in living cells and organisms with TurboID. Nat Biotechnol 36:880–887 55. Brooks MD et al (2021) ConnecTF: a platform to integrate transcription factor-gene interactions and validate regulatory networks. Plant Physiol 185:49–66 56. Kulkarni SR et al (2018) TF2Network: predicting transcription factor regulators and gene regulatory networks in Arabidopsis using publicly available binding site information. Nucleic Acids Res 46:e31 57. Fu LY et al (2022) ChIP-Hub provides an integrative platform for exploring plant regulome. Nat Commun 13:3413 58. Depuydt T, De Rybel B, Vandepoele K (2022) Charting plant gene functions in the multiomics and single-cell era. Trends Plant Sci 28:283 59. Depuydt T, Vandepoele K (2021) Multi-omics network-based functional annotation of unknown Arabidopsis genes. Plant J 108(4):1193 60. Palaniswamy SK et al (2006) AGRIS and AtRegNet. A platform to link cis-regulatory elements and transcription factors into regulatory networks. Plant Physiol 140:818–829 61. Dong S et al (2019) Proteome-wide, structure-based prediction of protein-protein interactions/new molecular interactions viewer. Plant Physiol 179:1893–1907 62. Meng X et al (2021) Predicting transcriptional responses to cold stress across plant species. Proc Natl Acad Sci U S A 118:e2026330118 63. De Clercq I et al (2021) Integrative inference of transcriptional networks in Arabidopsis yields novel ROS signalling regulators. Nat Plants 7:500–513 64. Song Q et al (2020) Prediction of condition-specific regulatory genes using machine learning. Nucleic Acids Res 48:e62 65. Crisp PA et al (2020) Stable unmethylated DNA demarcates expressed genes and their cis-regulatory space in plant genomes. Proc Natl Acad Sci U S A 117:23991–24000

Chapter 2

Inducible, Tissue-Specific Gene Expression in Arabidopsis Using GR-LhG4-Mediated Trans-Activation

Tasnim Zerin, Thomas Greb, and Sebastian Wolf

Abstract

Inducible, tissue-specific gene expression is a potent tool to study gene regulatory networks as it allows spatially and temporally controlled genetic perturbations. To this end, we generated a toolkit that covers many cell types in the three main meristems: the root apical meristem, the shoot apical meristem, and the vascular cambium. The system is based on an extensive set of driver lines expressing a synthetic transcription factor under cell type-specific promoters. Induction leads to nuclear translocation of the transcription factor and expression of response elements under control of a cognate synthetic promoter. In addition, a fluorescent reporter incorporated in the driver lines allows induction to be monitored. All previously generated driver lines are available from the Nottingham Arabidopsis Stock Centre. This protocol describes how users can create their own constructs compatible with the existing set of lines, as well as induction and imaging procedures.

Key words Inducible expression, Trans-activation, Promoter-reporter studies, Expression control, Mis-expression, Functional genetics

Kerstin Kaufmann and Klaas Vandepoele (eds.), Plant Gene Regulatory Networks: Methods and Protocols, Methods in Molecular Biology, vol. 2698, https://doi.org/10.1007/978-1-0716-3354-0_2, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023

1 Introduction

The capability to genetically perturb living systems is crucial for understanding regulatory networks. Bipartite gene expression systems that selectively express a gene in a cell type-specific or stage-specific manner address this need in a particularly efficient and versatile manner [1]. We generated a comprehensive set of Arabidopsis (Arabidopsis thaliana) driver lines suited for tissue-specific trans-activation of an effector cassette in a wide range of cell types [2]; these lines also allow gene activation to be monitored through a fluorescent promoter reporter. This enables cell type-specific complementation or knockdown, facilitates time-resolved monitoring of the response to a given cue, and allows the maintenance of effector lines for potentially deleterious gene products without expression until crossed to driver lines. An effector (e.g., a gene of interest) can be expressed in different tissues simply by crossing a silent effector line to different driver lines. Similarly, the same driver can easily be used to promote the expression of various effectors. Stable, homozygous lines can be generated in the following generations before the effector is expressed for the first time, minimizing the risk of silencing. However, as F1 hybrids are genetically uniform, experiments can be performed almost immediately after crossing if enough seeds are obtained.

This system utilizes the widely used LhG4/pOp system, which ensures rapid, stable induction with minimal adverse effects of the inducer on plant growth, combined with the ligand-binding domain of the rat glucocorticoid receptor (GR) [3, 4]. LhG4 is a chimeric transcription factor with high DNA-binding affinity. Fused to the GR ligand-binding domain, it is retained in the cytosol through sequestration by HEAT SHOCK PROTEIN90 in the absence of the inducer. After induction with the synthetic glucocorticoid dexamethasone (Dex), nuclear import of the transcription factor occurs, resulting in the transcriptional activation of expression cassettes that are under the control of the synthetic pOp 5′ regulatory region (Fig. 1). This regulatory region consists of a cauliflower mosaic virus (CaMV) 35S minimal promoter and multiple interspersed repeats of the E. coli lac operator elements (pOp4 and pOp6) and allows cell type-specific, LhG4-responsive over- or mis-expression of a target gene [3]. Driver lines are available at the Nottingham Arabidopsis Stock Centre (https://arabidopsis.info/) so that user-designed effector lines can be used for a wide range of cell type-specific expression experiments by crossing. Alternatively, driver lines can be directly transformed with effector constructs.

Fig. 1 Schematic representation of the Dex-inducible LhGR/pOp system with driver and effector lines. In driver lines, the synthetic transcription factor LhG4 is expressed under the control of tissue-specific promoters. LhG4 is translationally fused to the ligand-binding domain of the rat glucocorticoid receptor (GR), which prevents the nuclear translocation of LhG4 in the absence of an inducer (here, dexamethasone or Dex). The effector line harbors a transcriptional cassette driven by multiple interspersed repeats of a pOp element (pOp4/pOp6) and a minimal 35S promoter. After the cross between the driver line and the effector line, Dex induction leads to GR-LhG4 binding to the pOp-type elements in the reporter cassette and in the effector module, which induces the transcription of the mTurquoise2 reporter and the effector gene of interest. Confocal micrograph shows reporter expression after induction of the pAHP6 driver line (pAHP6>>)

The presence of a fluorescent reporter expressed under the regulation of the pOp regulatory sequence in the driver lines described here allows the monitoring of the spatiotemporal dynamics of effector gene induction and can potentially reveal any effect of induction on the respective tissue identity. Furthermore, it can be utilized to assess the feedback regulation of the effector gene and the promoter it is expressed from.

Here, we describe the generation of driver and effector lines and provide protocols for induction experiments in different tissues. The driver lines were generated through the fast and efficient GreenGate cloning system [5], based on Golden Gate assembly [6] (see Note 1), but they are compatible with any vector/transgenic line in which the expression of an effector is under the control of derivatives of the pOp promoter element. GreenGate cloning is based on type IIS restriction enzymes such as Eco31I or its isoschizomer BsaI. These enzymes cut downstream of their recognition sites, producing overhangs whose base composition depends on the flanking sequence. By incorporating Eco31I/BsaI restriction sites in oligonucleotides and allocating specific overhang sequences to DNA elements, modular cloning is achieved, facilitating the generation of large assemblies in a quick, reliable manner.
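The way a type IIS enzyme defines an overhang can be illustrated with a short sketch. It assumes the commonly documented BsaI/Eco31I cut geometry, GGTCTC(N1)/(N5), in which the top strand is cut one base after the site and the bottom strand five bases after it, so that bases 2 to 5 downstream of the site form the four-base 5′ overhang. The function name and the example sequence are hypothetical, and only sites in forward orientation on the given strand are considered.

```python
BSAI_SITE = "GGTCTC"  # BsaI/Eco31I recognition sequence (top strand)


def bsai_overhangs(top_strand):
    """Return the 4-nt 5' overhangs created by forward-oriented BsaI sites.

    Assumes the cut geometry GGTCTC(N1)/(N5): one spacer base follows the
    site, and the next four bases become the single-stranded overhang.
    """
    seq = top_strand.upper()
    overhangs = []
    start = seq.find(BSAI_SITE)
    while start != -1:
        cut = start + len(BSAI_SITE) + 1  # skip the single spacer base
        if cut + 4 <= len(seq):
            overhangs.append(seq[cut:cut + 4])
        start = seq.find(BSAI_SITE, start + 1)
    return overhangs


# Hypothetical primer-like sequence: AACA + site + spacer A + overhang + insert
example = "AACAGGTCTCAACCTATGGCT"
print(bsai_overhangs(example))  # ['ACCT']
```

Because the overhang lies outside the recognition site, its sequence can be chosen freely in the primer, which is what makes the adapter-based module order possible.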

2 Materials

1. Standard molecular biology equipment.
2. GreenGate modules and empty vectors (available at Addgene: www.addgene.org): pGGA000, pGGB000, pGGC000, pGGD000, pGGE000, pGGF000, pGGM000, pGGN000, pGGZ001, pGGZ003.
3. Oligonucleotides: Primers are designed with an overhang containing module type-specific adapter sequences:
   Forward primer: 5′ AACA GGTCTC A NNNN (n) nn* + specific sequence 3′.
   Reverse primer: 5′ AACA GGTCTC A NNNN (n) + reverse complement of specific sequence 3′.
   GGTCTC in the overhang is the BsaI/Eco31I recognition site; AACA is added because the enzyme does not cut if the restriction site is at the extreme end of a PCR product. NNNN represents the module-specific overhang, whereas the additional nucleotides (nn) are needed to maintain the reading frame of the coding sequence.
4. Plasmid miniprep kit.
5. PCR clean-up kit.
6. BsaI/Eco31I.
7. T4 ligase and buffer.
8. 28 °C incubator.
9. Silwet L-77.
10. 4-week-old Arabidopsis plants (we grow ~20 plants in 15 cm pots and use 4–6 pots per transformation), tray with lid, half-strength MS medium (without sugar) with appropriate selective agent.
11. Fine tweezers, e.g., Dumont 5 INOX.
12. 6 cm Petri dishes, 3/4 filled with 2% agarose.
13. Porous tape.
14. Stereo (dissecting) microscope.
15. Upright confocal microscope equipped with a long working distance dipping lens (e.g., Zeiss W N-Achroplan 40×/0.75 M27, Nikon NIR Apo 40×/0.8 NA) for SAM imaging. Standard inverted confocal microscope for root and stem section imaging.
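The primer layout given in item 3 can be sketched as a small helper. This is an illustration only: the function name, the annealing length of 20 nt, and the overhangs used in the example call are hypothetical placeholders, and the module-specific overhangs must be taken from the GreenGate documentation. The reverse overhang is passed as it should appear in the primer (read 5′ to 3′).

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def greengate_primers(insert, fwd_overhang, rev_overhang,
                      fwd_pad="", rev_pad="", anneal_len=20):
    """Build forward and reverse primers in the AACA-GGTCTC-A-NNNN layout.

    fwd_pad / rev_pad are the optional extra bases used to keep the
    reading frame; anneal_len is an assumed length of the specific part.
    """
    prefix = "AACA" + "GGTCTC" + "A"  # protective bases, BsaI site, spacer
    insert = insert.upper()
    fwd = prefix + fwd_overhang + fwd_pad + insert[:anneal_len]
    rev = prefix + rev_overhang + rev_pad + reverse_complement(insert[-anneal_len:])
    return fwd, rev


# Placeholder insert and placeholder overhangs, for illustration only
insert = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTG"
fwd, rev = greengate_primers(insert, fwd_overhang="GGCT", rev_overhang="CTGA")
print(fwd)
print(rev)
```

Melting temperature, secondary structure, and the exact length of the annealing region are not handled here and should be checked with the usual primer design tools.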

3 Methods

3.1 Preparing Plant Expression Vector Using GreenGate Cloning

In the general GreenGate cloning scheme, six entry modules (A–F) and a plant transformation vector backbone are assembled into one expression plasmid (destination vector) [5]. Typically, the A-module contains a promoter sequence, the B-module contains an N-terminal tag or “dummy” sequence, the C-module houses a CDS, the D-module contains a C-terminal tag or dummy, the E-module contains a terminator, and finally, the F-module contains a resistance cassette for selection of transgenic plants. These DNA modules are flanked by overhang sequences that serve as adapters in the respective entry modules, and they are assembled into the final expression vector in that (A–F) order (Fig. 2).

3.2 PCR Amplification of Insert DNA

1. Amplify the sequence of interest with the designed primers by polymerase chain reaction (PCR) according to standard protocols. If there are BsaI/Eco31I sites present inside the insert DNA sequence, they can be removed by site-directed mutagenesis (see Note 2).
2. Separate the PCR product on an agarose gel. Excise and column purify the correct fragments using a commercial kit as per the manufacturer’s instructions.
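Before ordering primers, the insert can be screened for internal BsaI/Eco31I sites, as mentioned in step 1. The following minimal sketch reports sites on both strands by searching for GGTCTC and its reverse complement GAGACC; the function name is hypothetical and positions are 0-based.

```python
def find_bsai_sites(seq):
    """Return sorted (position, strand) tuples for BsaI/Eco31I sites.

    "+" marks GGTCTC on the given strand, "-" marks GAGACC, i.e. a site
    on the complementary strand. Positions are 0-based start indices.
    """
    seq = seq.upper()
    hits = []
    for motif, strand in (("GGTCTC", "+"), ("GAGACC", "-")):
        pos = seq.find(motif)
        while pos != -1:
            hits.append((pos, strand))
            pos = seq.find(motif, pos + 1)
    return sorted(hits)


# Hypothetical insert containing one site on each strand
print(find_bsai_sites("AAGGTCTCTTGAGACCAA"))  # [(2, '+'), (10, '-')]
```

An empty result means the insert can be used directly; any hit indicates a site that should be removed as described in Note 2.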

3.3 Entry Module Preparation

1. Digest the empty modules (entry vector) and the PCR fragments separately with BsaI/Eco31I using ~1 μg of DNA (vector or fragment), 4 μL 10× digestion buffer, and 1 μL BsaI/Eco31I in a tube and bring the volume to 40 μL with ddH2O.
2. Mix gently by pipetting up and down and spin down briefly. Incubate the reaction mix at 37 °C on a heat block for 15 min (or following the time recommendation of the manufacturer).
3. After digestion, column purify each sample using a commercial kit and quantify the purified DNA by determining the optical density at 260 nm (OD260) using a spectrophotometer.
4. Mix the digested insert and the digested vector with 1 μL T4 ligase and 3 μL 10× T4 ligation buffer in a total volume of 30 μL and incubate at room temperature overnight (O/N) (following the recommendation of the T4 ligase supplier). The desired molar ratio of entry module:insert DNA sequences (e.g., 3:1, 5:1) for the ligation reaction depends on the concentration and length of the entry module inserts.
5. To increase the transformation efficiency, heat inactivate the T4 ligase by incubating the reaction mix at 65 °C for 20 min.
6. Use 15–20 μL of the ligation product to transform chemically competent E. coli cells according to standard lab protocols and spread the bacteria on agar plates supplemented with ampicillin (100 μg/mL). All GreenGate entry module vectors confer ampicillin resistance.
7. Incubate the plates at 37 °C O/N.
8. Screen for colonies with the correct length of the desired plasmid insert by colony PCR using the primers 5′-GTTGTGTGGAATTGTGAGC-3′ and 5′-GTTTTCCCAGTCACGACG-3′. The same primer pair binds to all entry vectors (A–F).
9. Grow an overnight liquid culture of single colonies (positive for the desired entry vector) using 5 mL LB liquid culture supplemented with ampicillin (100 μg/mL) and incubate O/N in a 37 °C shaker.
10. Isolate plasmids from the O/N liquid culture with a plasmid mini-prep kit (according to the manufacturer’s manual). Confirm the isolated plasmids by restriction digestion using enzymes that cut uniquely in the insert. Finally, sequence the selected plasmids using appropriate oligonucleotides, e.g., those used for colony PCR and, if necessary, internal insert-specific oligos (see Note 3).

Fig. 2 GreenGate cloning principle. The GreenGate cloning system generally consists of six different entry vectors pGGA000–pGGF000 and a destination vector pGGZ001 or pGGZ003. The entry vectors contain an ampicillin resistance cassette (AmpR) and a ccdB cassette flanked by the specific adaptors (sequence in blue) for each entry vector (e.g., pGGA000 entry vector) and the “GGTCTC” BsaI/Eco31I recognition sites. BsaI/Eco31I (type IIS restriction endonuclease) recognizes the non-palindromic “GGTCTC” sequence (red) in the empty entry vectors, cuts asymmetrically from the second nucleotide downstream of the recognition site, and creates a four-base 5′ overhang. (a) BsaI/Eco31I digestion of pGGA000 releases ccdB and creates the pGGA000-specific four-nucleotide overhangs (dark blue, bold). A desired promoter sequence (usual for the A module) is amplified by primers containing “GGTCTC” and pGGA000-specific adaptors and digested with BsaI/Eco31I. Finally, the digested empty pGGA000 entry vector and the promoter sequence are ligated into a final A module. (b) The same procedure is followed to generate the other modules. These six modules (A–F) harbor a plant promoter, an N-terminal tag, a coding sequence (i.e., the gene of interest), a C-terminal tag, a plant terminator, and a plant resistance marker to select for transgenic plants. The modules can only be ligated in this pre-defined order. (c) The final GreenGate reaction combines the BsaI/Eco31I digestion of the destination vector pGGZ001 and the six entry vectors pGGA000–pGGF000 with the simultaneous ligation of all modules into the destination vector. (d) For the creation of two expression cassettes on a single T-DNA, the “H” overhang (“TAGG”), two adapter modules, and two intermediate vectors are used. In the first step, two expression cassettes (“supermodules”) are assembled in parallel in two different intermediate vectors (pGGM000 and pGGN000), where the BsaI/Eco31I sites are retained in the supermodule (unlike the A–F modules). In the next step, these two supermodules are transferred into a destination vector via a typical GreenGate reaction. The overhang types are denoted in capital letters. P 1,2 = promoter, N 1,2 = N-terminal tag, CDS 1,2 = coding sequence, C 1,2 = C-terminal tag, T 1,2 = terminator, R = plant resistance marker, F-H = FH-adapter module, H-A = HA-adapter module

3.4 Destination Module Preparation

1. For the construction of the final destination vector, add 1 μL (50–150 ng) empty destination vector (pGGZ001, pGGZ002, or pGGZ003), 1.5 μL (50–300 ng) of each entry module, 2 μL 10× digestion buffer, 1.5 μL 10 mM ATP, 1 μL T4 ligase (30 U/μL), 1 μL BsaI/Eco31I, and ddH2O to a total volume of 20 μL (see Note 4).
2. Perform the GreenGate reaction in a PCR thermocycler according to the following steps: 50 cycles of 37 °C for 5 min followed by 16 °C for 5 min, then single steps of 10 min at 50 °C and 10 min at 80 °C.
3. Use the total ligation reaction mix to transform chemically competent E. coli and spread the bacteria on agar plates supplemented with the appropriate selective agent spectinomycin (50 μg/mL). Incubate the transformed E. coli on the plate at 37 °C overnight.
4. To confirm transformation, re-streak each of the selected single colonies on spectinomycin (50 μg/mL) and ampicillin (100 μg/mL) containing plates, respectively, and incubate overnight at 37 °C. Use colonies that grow only on spectinomycin but not on ampicillin plates. Check for positive colonies by colony PCR using appropriate primers.
5. Grow an overnight liquid culture of single colonies positive for the desired construct using 5 mL LB liquid culture supplemented with spectinomycin (50 μg/mL) and incubate O/N in a 37 °C shaker. Isolate the corresponding plasmids with a mini-prep kit, check the isolated plasmids by restriction enzyme analysis, and sequence selected positive constructs using the primers 5′-ACCTCTCGGGCTTCTGG-3′ and 5′-CCTTTTTACGGTTCCTG-3′ directed against the vector backbone, together with appropriate internal oligonucleotides (see Note 3).
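The pipetting scheme in step 1 and the cycling program in step 2 involve simple arithmetic that is worth checking before setting up the reaction. The sketch below assumes a standard assembly with six entry modules (A–F); the function names are hypothetical, and ramping times of the thermocycler are ignored.

```python
def water_volume_ul(n_modules=6, total_ul=20.0):
    """Volume of ddH2O needed to reach the total GreenGate reaction volume."""
    fixed_ul = {
        "empty destination vector": 1.0,
        "10x digestion buffer": 2.0,
        "10 mM ATP": 1.5,
        "T4 ligase": 1.0,
        "BsaI/Eco31I": 1.0,
    }
    used = sum(fixed_ul.values()) + 1.5 * n_modules  # 1.5 uL per entry module
    water = total_ul - used
    if water < 0:
        raise ValueError("components exceed the total reaction volume")
    return water


def program_minutes(cycles=50, digest_min=5, ligate_min=5, final_steps=(10, 10)):
    """Net run time of the cycling program, excluding temperature ramps."""
    return cycles * (digest_min + ligate_min) + sum(final_steps)


print(water_volume_ul())   # 4.5 (uL of ddH2O for six entry modules)
print(program_minutes())   # 520 (min, i.e. 8 h 40 min)
```

With six modules, 15.5 μL of the 20 μL are taken up by DNA, buffer, ATP, and enzymes, and the program runs for almost nine hours, so it is conveniently started in the evening.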

3.5 Intermediate Supermodule Preparation

Our driver lines contain two expression cassettes, one for the GR-LhG4 transcription factor and one for the internal reporter pOp6:ERmTurquoise. As this cannot be built using single A–F modules, supermodules were used to assemble both expression cassettes in a single plant transformation vector (Fig. 2d).

1. To combine two independent sets of entry modules, first build two intermediate plasmids, called supermodules, before assembling the final expression plasmid in pGGZ001 or pGGZ003.
2. Add the F–H adaptor at the end of the first supermodule and the H–A adaptor at the beginning of the second supermodule (these serve as connections between the two constructs). Only the pGGN000 intermediate module carries the resistance cassette. To generate the expression plasmid, perform the GreenGate reaction with the destination vector and the pGGN000 and pGGM000 intermediate supermodules. Alternatively, it is also possible to mix the destination vector, the pGGN000 intermediate supermodule, and the remaining single modules to perform the GreenGate reaction. The latter method is less efficient than the former but can be faster.
3. To create a pGGM000/pGGN000 supermodule, mix 1.5 μL (100–300 ng/μL) of each of the entry modules with 1 μL (30 ng/μL) empty intermediate vector (pGGM000 or pGGN000), 2 μL 10× digestion buffer, 1.5 μL 10 mM ATP, 1 μL T4 ligase (30 U/μL), and 5–10 U BsaI/Eco31I in a total volume of 20 μL. Calculate the desired molar ratio of entry module:intermediate vector (e.g., 3:1, 5:1) for the ligation depending on the concentration and length of the entry module inserts.
4. Mix and perform the GreenGate reaction as described in Subheading 3.4.
5. Use 20 μL of the ligation reaction to transform competent E. coli, streak on plates containing kanamycin (50 μg/mL), and incubate overnight at 37 °C (see Note 5).
6. Re-streak each of the selected single colonies on kanamycin (50 μg/mL) and ampicillin (100 μg/mL) containing plates, respectively. Incubate the E. coli colonies on the plates overnight at 37 °C.
7. Select the single colonies that grow only on kanamycin but not on ampicillin for plasmid isolation. Use 5 mL LB liquid culture supplemented with kanamycin (50 μg/mL) and incubate overnight in a 37 °C shaker.
8. Using a mini-prep kit, isolate the corresponding plasmids, check them by restriction enzyme analysis, and confirm by sequencing using the primers 5′-AGGCATCAAACTAAGCAGAAG-3′ and 5′-CGTTTCCCGTTGAATATGGC-3′, which anneal to the pGGM000/pGGN000 backbone. If the insert cannot be fully sequenced with these primers, design internal primers for sequencing.
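Where the protocol asks for a molar ratio such as 3:1 or 5:1, the corresponding DNA mass follows from the relative lengths of the two molecules, because the mass of double-stranded DNA per mole scales with its length. The sketch below uses this standard relation; the function name and the plasmid sizes in the example are hypothetical.

```python
def mass_for_molar_ratio(ref_ng, ref_bp, other_bp, ratio):
    """Mass (ng) of a second DNA needed for a given molar ratio.

    ref_ng, ref_bp: mass and length of the reference DNA (e.g. the vector).
    other_bp: length of the DNA to be added (e.g. an entry module).
    ratio: desired molar excess of the added DNA over the reference.
    """
    if ref_bp <= 0 or other_bp <= 0:
        raise ValueError("lengths must be positive")
    return ratio * ref_ng * other_bp / ref_bp


# Hypothetical example: 30 ng of a 3000 bp vector, 1500 bp module, 3:1 ratio
print(mass_for_molar_ratio(ref_ng=30, ref_bp=3000, other_bp=1500, ratio=3))  # 45.0
```

Dividing the resulting mass by the measured concentration (ng/μL) gives the volume to pipette; if this deviates strongly from the 1.5 μL stated in the protocol, the modules can be diluted accordingly.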


3.6 Transform the Plant Expression Vector into A. tumefaciens

Using your preferred method, transform an A. tumefaciens pSOUP+ strain (e.g., ASE), as the ori used in the GreenGate destination vectors requires this helper plasmid, and plate on the appropriate combination of antibiotics, including spectinomycin. Incubate the transformed A. tumefaciens on the plate in a 28 °C incubator for 2–3 days.

3.7 Generation of Transgenic Arabidopsis Plants

1. Transform A. thaliana plants according to your preferred method. For simplicity, we prefer the following protocol: Resuspend a single A. tumefaciens colony in 100 μL of H2O and plate 50 μL each on two LB plates containing the appropriate antibiotics.
2. After 2 days at 28 °C, resuspend the bacteria from both plates in 15 mL of LB without antibiotics. Add 120 mL of 5% sucrose solution and 45 μL of Silwet L-77 (0.03%). Mix and place the solution in a flat container, such as a glass Petri dish or plastic scale pan. Dip plants in the solution for 1 min, making sure that all inflorescences are covered in liquid.
3. Place plants in a greenhouse tray, cover with a lid, and keep in the dark overnight. The following day, move plants to a growth chamber or greenhouse. To increase transformation efficiency, repeat the procedure after 1 week.
4. To select the transformed plants, harvest the T1 seeds and plate them on selection medium. Select the dark green seedlings that grow well for propagation.
5. In the T2 generation, select lines with a single integration event, which show a 3:1 segregation ratio on the selection medium. Propagate these lines to the T3 generation and select homozygous plants by plating seeds on selection medium. If all offspring survive and the line originates from a single-integration T1 plant, it is a stable line.
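Whether counts of resistant and sensitive T2 seedlings are compatible with the 3:1 ratio mentioned in step 5 can be judged with a chi-square goodness-of-fit test. This test is a common convention and not part of the protocol itself; the function names are hypothetical, and the threshold of 3.841 is the critical value for one degree of freedom at a significance level of 0.05.

```python
CHI2_CRITICAL_DF1_P05 = 3.841  # chi-square critical value, df = 1, alpha = 0.05


def chi_square_3_to_1(resistant, sensitive):
    """Chi-square statistic for observed counts against a 3:1 expectation."""
    total = resistant + sensitive
    if total <= 0:
        raise ValueError("no seedlings counted")
    expected_r = 0.75 * total
    expected_s = 0.25 * total
    return ((resistant - expected_r) ** 2 / expected_r
            + (sensitive - expected_s) ** 2 / expected_s)


def consistent_with_single_locus(resistant, sensitive):
    """True if the counts do not deviate significantly from 3:1."""
    return chi_square_3_to_1(resistant, sensitive) < CHI2_CRITICAL_DF1_P05


# Hypothetical counts from two T2 families
print(consistent_with_single_locus(80, 20))  # True
print(consistent_with_single_locus(90, 10))  # False
```

A line that passes this check is only a candidate for a single insertion; small seedling numbers give the test little power, so counting at least several dozen seedlings per family is advisable.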

3.8 Induction of Trans-Activation in Arabidopsis Driver Lines

To test reporter expression in A. thaliana driver lines generated through Steps 1 and 2, sterilize seeds as described below. Induction times, Dex concentration, and other experimental conditions should be adapted to your purpose (see Note 6). The procedures described here are intended for testing newly generated transgenic lines but might serve as a starting point for your other experiments (see Notes 7, 8, and 9). As a positive control, driver lines are available from the Nottingham Arabidopsis Stock Centre (www.arabidopsis.info).

1. Add 0.5–1.0 mL of 70% ethanol + 0.01% non-ionic detergent to approximately 100 seeds (20 mg) in a 1.5 mL reaction tube and invert the tube a few times. Spin down at 1000 × g for 15 s and discard the supernatant.
2. Add 0.5–1.0 mL of absolute ethanol. Invert the tube several times, spin down at 1000 × g for 15 s, and discard the supernatant.
3. Allow seeds to dry inside the hood.

3.8.1 Root

1. Prepare half-strength Murashige and Skoog medium, pH 5.8, and add 0.9% plant agar and 1% sucrose. After autoclaving, add dexamethasone (Dex) to a final concentration of 30 μM to induction plates and an equal amount of DMSO (mock) to control plates, respectively.
2. Put seeds for root imaging on plates and stratify them for 48 h in darkness and cold (4 °C).
3. Put plates in a vertical position in a plant incubator and grow them for 4–7 days, depending on your experiment.
4. Five days after germination (dag), image the seedlings using confocal laser scanning microscopy. In addition, induction can be verified via RT-qPCR with primers directed against the mTurquoise2 reporter gene.
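The Dex working solutions used here and in the stem protocol are simple dilutions of a concentrated stock, so the stock volume follows from C1·V1 = C2·V2. The sketch below assumes a 25 mM stock, which is the concentration given for stem induction; the stock concentration for plates is not stated in the protocol, and the 500 mL batch size is a hypothetical example. Function names are illustrative.

```python
def stock_volume_ul(final_um, final_volume_ml, stock_mm):
    """Stock volume (uL) needed for a working solution, from C1*V1 = C2*V2.

    Units cancel as: uM * mL / mM = uL.
    """
    if stock_mm <= 0:
        raise ValueError("stock concentration must be positive")
    return final_um * final_volume_ml / stock_mm


def solvent_percent(stock_ul, final_volume_ml):
    """Final solvent concentration in % (v/v) contributed by the stock."""
    return stock_ul / (final_volume_ml * 1000.0) * 100.0


# 25 uM Dex in 750 mL from a 25 mM stock (dipping solution for stems)
dip_ul = stock_volume_ul(final_um=25, final_volume_ml=750, stock_mm=25)
print(dip_ul, round(solvent_percent(dip_ul, 750), 3))

# 30 uM Dex in a hypothetical 500 mL batch of plate medium, same stock assumed
plate_ul = stock_volume_ul(final_um=30, final_volume_ml=500, stock_mm=25)
print(plate_ul, round(solvent_percent(plate_ul, 500), 3))
```

The 1:1000 dilution of the 25 mM stock corresponds to 0.1% solvent, which matches the mock concentration used for stems; mock treatments should always receive the same solvent volume as the induced samples.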

3.8.2 Shoot Apical Meristem (SAM)

1. Prepare plates with ½ MS, 0.9% plant agar, and 1% sucrose medium and put sterilized seeds on the plates. Store the plates for 2 days in darkness and cold for stratification.
2. Put plates in a vertical position in a plant incubator. Six to seven days after germination, transfer the seedlings to soil, each plant in a single pot (long day (16/8 h), 22 °C, humidity 65%).
3. When the stem is around 1 cm long (25–30 dag), spray the inflorescence SAM with 30 μM Dex solution (see Note 10) or an equal amount of DMSO (mock).
4. 24–48 h after induction, dissect the SAMs and proceed to imaging. In addition, induction can be verified via RT-qPCR with primers directed against the mTurquoise2 reporter gene.

3.8.3 Stem

1. Sterilize seeds as described earlier and put seeds on ½ MS, 0.9% plant agar, and 1% sucrose plates. Stratify for 48 h in darkness and cold (4 °C).
2. Six to seven days after germination, transfer the seedlings to soil, each plant in a single pot (long day (16/8 h), 22 °C, humidity 65%).
3. Induce trans-activation of the effector gene in stems by watering with Dex or mock (0.1% ethanol) solution, or by dipping the plants in Dex or mock solutions, respectively. For watering, use a 25 μM Dex solution in water prepared from a stock of 25 mM Dex dissolved in ethanol. Water every 2–3 days until the desired time of induction. For dipping, prepare 750 mL of water containing 0.02% Silwet L-77 with 25 μM Dex or an equivalent amount of mock solution in a 1 L beaker. Dip single plants for 30 s in the induction or in the mock solution. Repeat every 2–3 days until the desired time of imaging. In addition, induction can be verified via RT-qPCR with primers directed against the mTurquoise2 reporter gene.

3.9 Imaging of Reporter Expression in Arabidopsis Driver Lines

3.9.1 Root

1. Transfer seedlings from the plate to a 10 μg/mL propidium iodide (PI) solution and counter-stain them for 5 min.
2. Mount the roots on microscopy slides and image using CLSM with your desired magnification (see Note 11).

3.9.2 Shoot Apical Meristem

1. Cut the stem 2 cm below the shoot tip with forceps. Hold the stem in one hand and use fine forceps to remove the flower buds and large primordia. To remove young primordia, fix the SAM in an upright position in a Petri dish containing 2% agarose. Keep removing the floral buds under a stereomicroscope until the meristem is visible through the eyepiece.
2. For staining, prepare 1 mL of a 1 mg/mL PI solution in water in a small Petri dish. Immerse the whole stem for 2 min and rinse in water. Alternatively, pipet a few drops of staining solution on top of the dissected SAM for 2 min and rinse with water.
3. Fix the stem with the exposed meristem on a 2% agar plate.
4. Image samples using a confocal laser scanning microscope with a 25× or 40× water-dipping lens with long working distance (see Note 12).

3.9.3 Stem

1. Cut sections of plant stems horizontally with a razor blade, starting from an excised segment of approximately 3 cm. With your finger, fix the stem on the side opposite to the section to be cut. Perform several fine cuts sequentially, but note that only sections from a similar position in the stem should be compared.
2. After each cut, immerse the razor blade in a Petri dish containing tap water and collect the stem sections. Stain sections at this point or directly mount them on microscope slides.
3. For staining, prepare 1 mL of a 1 mg/mL PI solution in water in a small Petri dish. Immerse the sections for 2 min and rinse them in water. Transfer them to microscope slides with fine forceps or a fine paintbrush. Take care not to squeeze the samples with the coverslip.
4. Image samples using a confocal laser scanning microscope. Use 561 nm laser light to excite PI fluorescence and collect emission from 570 to 620 nm. Use 405 nm to excite the mTurquoise2 fluorophore encoded by the driver line constructs and collect emission from 425 to 475 nm. Record image stacks spanning 50–100 μm in z-direction with a step size of 0.4 μm.
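The stack dimensions in step 4 determine how many optical sections are recorded, which in turn sets acquisition time and file size. A minimal sketch of this arithmetic, assuming that the first and the last plane are both included in the stack (the function name is hypothetical):

```python
def z_slices(depth_um, step_um):
    """Number of optical sections for a stack of given depth and step size.

    Both end planes are counted, hence the +1. Depths that are not a
    multiple of the step size are rounded to the nearest whole step.
    """
    if depth_um < 0 or step_um <= 0:
        raise ValueError("depth must be >= 0 and step size > 0")
    return int(round(depth_um / step_um)) + 1


print(z_slices(50, 0.4))   # 126
print(z_slices(100, 0.4))  # 251
```

A stack therefore contains roughly 125 to 250 sections per channel, which is worth considering when choosing scan speed and averaging.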

4 Notes

1. GreenGate is a version of Golden Gate cloning distinguished by using unified adapter sequences for cloning into plant expression vectors based on the pGreen [7] plasmid. This simplifies sharing between laboratories and maximizes the combinatorial power inherent to the modular system. If only one or a few constructs are required, traditional cloning methods might be more efficient if all required components are present in the lab.
2. An alternative way to remove an endogenous BsaI/Eco31I site is to amplify the sequence in two fragments while introducing a (silent) single-base exchange that destroys the endogenous restriction site in one of the internal primers. As the overhangs generated by type IIS enzymes can be freely chosen, it is possible to scarlessly ligate the two PCR fragments by adding BsaI/Eco31I sites in the overhangs of the internal primers.
3. The pOp6 promoter sequence is prone to recombination events in E. coli, and loss of Op repeats can frequently be observed. Therefore, it is recommended to check the length of the promoter by PCR before proceeding with the constructs. In our hands, a reduction from 6 to 4 operator elements (i.e., pOp4) does not seem to affect functionality. We have not observed recombination events in Agrobacterium but recommend checking by colony PCR.
4. It is possible to replace entry modules with purified, digested PCR products; however, this might result in reduced efficiency. It is recommended to avoid creating these PCR products using spectinomycin-resistant plasmids as template, as these might give false-positive colonies.
5. If cloning into the pGGM000 or pGGN000 vectors is unsuccessful, one possible solution is to digest pGGM000 or pGGN000 with BsaI/Eco31I, run the digest on a gel, and purify the backbone fragment (approximately 2000 bp) from the gel without the ccdB cassette (approximately 1400 bp). Then proceed normally with the ligation reaction.
6. For experiments involving effector expression lines, we recommend using the “empty” driver line, i.e., the parental line without effector (aside from the internal reporter), induced at the same time as the experimental plants, as an additional control besides uninduced plants.
7. Dexamethasone can be applied via spraying, droplets, and watering and can be supplied to agar-based media (after autoclaving). Localized induction can be performed by applying Dex-containing chunks of agarose, for example.
8. Although not verified in our experiments, IPTG might be applied to quench LhG4-based transcription after Dex application and thus make induction reversible.
9. Only uninduced plants are used for propagation to reduce the probability of gene silencing. Plants in T2/T3 should be used for imaging due to possible artifacts from the T1 selection procedure.
10. SAMs can also be induced by brushing the Dex solution on them with a paintbrush to avoid spraying droplets. It is suggested to wear a mask when Dex is applied by spraying.
11. We recommend reusable microscope imaging chambers (e.g., 1-well tissue culture chambers) with a 0.17 mm cover slip glass as bottom.
12. Upright microscopes are preferable for imaging dissected SAMs.

References

1. del Valle RA, Didiano D, Desplan C (2012) Power tools for gene expression and clonal analysis in Drosophila. Nat Methods 9:47–55. https://doi.org/10.1038/nmeth.1800
2. Schürholz A-K, López-Salmerón V, Li Z et al (2018) A comprehensive toolkit for inducible, cell type-specific gene expression in Arabidopsis. Plant Physiol 178:40–53. https://doi.org/10.1104/pp.18.00463
3. Craft J, Samalova M, Baroux C et al (2005) New pOp/LhG4 vectors for stringent glucocorticoid-dependent transgene expression in Arabidopsis: inducible expression from improved pOp promoters. Plant J 41:899–918. https://doi.org/10.1111/j.1365-313X.2005.02342.x
4. Samalova M, Brzobohaty B, Moore I (2005) pOp6/LhGR: a stringently regulated and highly responsive dexamethasone-inducible gene expression system for tobacco. Plant J 41:919–935. https://doi.org/10.1111/j.1365-313X.2005.02341.x
5. Lampropoulos A, Sutikovic Z, Wenzl C et al (2013) GreenGate - a novel, versatile, and efficient cloning system for plant transgenesis. PLoS One 8:e83043. https://doi.org/10.1371/journal.pone.0083043
6. Engler C, Kandzia R, Marillonnet S (2008) A one pot, one step, precision cloning method with high throughput capability. PLoS One 3:e3647. https://doi.org/10.1371/journal.pone.0003647
7. Hellens RP, Edwards EA, Leyland NR et al (2000) pGreen: a versatile and flexible binary Ti vector for Agrobacterium-mediated plant transformation. Plant Mol Biol 42:819–832. https://doi.org/10.1023/a:1006496308160

Chapter 3

Targeted Activation of Arabidopsis Genes by a Potent CRISPR–Act3.0 System

Changtian Pan and Yiping Qi

Abstract

The CRISPR/Cas system has emerged as a versatile platform for sequence-specific genome engineering in plants. Beyond genome editing, CRISPR/Cas systems based on nuclease-deficient Cas9 (dCas9) have been repurposed as an RNA-guided platform for transcriptional regulation. CRISPR activation (CRISPRa) represents a novel gain-of-function (GOF) strategy, conferring robust overexpression of the target gene within its native chromosomal context. CRISPRa systems enable precise, scalable, and robust RNA-guided transcription activation, holding great potential for a variety of fundamental and translational research. In this chapter, we provide a step-by-step guide for efficient gene activation in Arabidopsis based on a highly robust CRISPRa system, CRISPR–Act3.0. We present detailed procedures for sgRNA design, CRISPR–Act3.0 system construction, Agrobacterium-mediated transformation of Arabidopsis using the floral dip method, and identification of the desired transgenic plants.

Key words CRISPR/Cas9, CRISPR activation, CRISPR–Act3.0, Targeted gene activation, Multiplex gene activation, Arabidopsis, Floral dip, Transcriptional regulation, Gain-of-function

1 Introduction

The causal relationship between gene expression and phenotype is at the center of genetic studies in plants. Precisely modulating the expression of the desired genes is key to understanding complex gene regulatory networks and their functions. Gene overexpression using cDNA overexpression vectors is the conventional gain-of-function (GOF) strategy. With this method, however, it is challenging to capture the complexity of transcript isoforms, and large cDNA sequences are generally difficult to clone into vectors. More importantly, it lacks the flexibility and scalability to achieve simultaneous multigene activation [1]. A technology that overcomes these drawbacks and enables precise, programmable, and scalable GOF perturbations at endogenous loci is therefore needed.

Kerstin Kaufmann and Klaas Vandepoele (eds.), Plant Gene Regulatory Networks: Methods and Protocols, Methods in Molecular Biology, vol. 2698, https://doi.org/10.1007/978-1-0716-3354-0_3, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023


Recently, RNA-guided endonucleases have been engineered as a versatile DNA-binding platform for precisely modulating endogenous gene expression [2]. Among the established custom DNA-binding domains, Cas9 is the most promising endonuclease for targeted gene perturbations because of its simplicity, specificity, and scalability [3]. Guided by a single guide RNA (sgRNA), Cas9 can be directed to precisely edit genomic DNA. Beyond genome editing, nuclease-deficient Cas9 (dCas9) has been functionalized with transcriptional activation domains, enabling dCas9 to serve as a unique platform for targeted gene activation [4]. The dCas9 protein retains its ability to bind the targeted genomic DNA but lacks the nuclease activity needed for cleavage; it can therefore serve as a sequence-specific, RNA-guided DNA-binding platform [4]. The design of CRISPRa systems has typically relied on the fusion or recruitment of transactivation domains to either terminus of the dCas9 protein or to RNA aptamers such as MS2 inserted in the stem-loop of the sgRNA scaffold [5]. Recent reports have suggested that RNA-guided CRISPRa enables precise, scalable, and robust transcription activation of target genes in plants [3, 6–8].

More recently, a highly robust CRISPRa system, CRISPR–Act3.0, has been developed in plants [9]. The CRISPR–Act3.0 system has been shown to give four- to sixfold higher activation than the state-of-the-art CRISPRa systems [9]. In addition, CRISPR–Act3.0 allows simultaneous activation of multiple genes in Arabidopsis, and the activation is stably transmitted to the T3 generation [9]. In the CRISPR–Act3.0 system (Fig. 1), dSpCas9 is fused with the transcription activation effector VP64, and the MS2 RNA aptamer is inserted into both the sgRNA tetraloop and stem-loop 2 for recruiting the MS2 bacteriophage coat protein (MCP), resulting in gR2.0 (single guide RNA 2.0). The SunTag, composed of ten copies of the GCN4 peptide, is fused to MCP. Meanwhile, the single-chain variable fragment (scFv) of the GCN4 antibody, fused with a superfolder green fluorescent protein (sfGFP) and two TAL Activation Domain (2xTAD) motifs, is co-expressed. Many copies of the 2xTAD transactivation domain can therefore be recruited to a specific locus via the gR2.0 scaffold (Fig. 1).

In this chapter, we provide a step-by-step guide to achieve single or multiplexed gene activation in Arabidopsis using the CRISPR–Act3.0 system. The procedure includes the following main steps (Fig. 2): (1) identifying the promoter region of the gene of interest and designing sgRNAs for activation; (2) CRISPR–Act3.0 vector construction using Golden Gate and Gateway cloning methods; (3) Agrobacterium-mediated transformation of Arabidopsis using the floral dip method; (4) identification of transgenic Arabidopsis plants with high activation efficiency of target genes.


Fig. 1 Schematic illustration of the CRISPR–Act3.0 system. The dSpCas9 is fused with a VP64. The coupled gR2.0 contains two MS2 RNA aptamers (in pink) for recruiting the MS2 bacteriophage coat protein (MCP), which is fused to the SunTag. The single-chain variable fragment (scFv) of GCN4 antibody is fused to a super folder green fluorescent protein (sfGFP), which serves as a linker between the scFv and transcription activator 2xTAD

Fig. 2 Flowchart of target gene activation in Arabidopsis by a potent CRISPR–Act3.0 system. The whole process consists of four phases which take around 11 weeks. Phase 1, sow Arabidopsis seeds and grow seedlings; Phase 2, sgRNA design for target gene activation and CRISPR–Act3.0 vector construction; Phase 3, Agrobacterium-mediated transformation of Arabidopsis using the floral dip method; Phase 4, Identification of transgenic Arabidopsis plants with high activation efficiency of target genes

2 Materials

2.1 Plant and Bacterial Strains

1. Arabidopsis thaliana (ecotype Columbia). 2. E. coli competent cells DH5 alpha. 3. Agrobacterium chemically competent cells GV3101.

2.2 Plasmids

1. Vectors obtained from Addgene (https://www.addgene.org/): pYPQ131B2.0 (99885), pYPQ132B2.0 (99888), pYPQ133B2.0 (99892), pYPQ134B2.0 (167158), pYPQ135B2.0 (167159), pYPQ136B2.0 (167160), pYPQ142 (69294), pYPQ143 (69295), pYPQ144 (69296), pYPQ145 (69297), pYPQ146 (69298), pYPQ-dzCas9-Act3.0 (158414), pYPQ202 (86198, see Note 1).

2.3 Vector Construction

1. Online tool for identifying the promoter region of the gene of interest, Phytozome 13 (https://phytozome-next.jgi.doe.gov/).
2. Online tool for sgRNA design, CRISPR-P v2.0 (http://crispr.hzau.edu.cn/CRISPR2/).
3. DNA cloning software (e.g., SnapGene).
4. Restriction enzymes: BsmBI (Esp3I), BsaI-HF v2, BamHI, XhoI, EcoRI.
5. DNA oligonucleotides for sgRNA cloning.
6. T4 Polynucleotide Kinase.
7. T4 DNA ligase.
8. Gel Extraction Kit.
9. Gateway™ LR Clonase™ II Enzyme mix.
10. Plasmid DNA Miniprep Kit.
11. Sequencing primers: pTC14-F2 (5′-caagcctgattgggagaaaa-3′) for pYPQ131B2.0 to pYPQ136B2.0. M13-F (5′-ttcccagtcacgacgttgtaaaac-3′) and M13-R (5′-TTTGAGACACGGGCCAGAGCTGC-3′) for pYPQ142 to pYPQ146.
12. 1 kb DNA ladder (e.g., Azura Genomics).
13. DNA Gel Loading Dye (6×).
14. Heating block or water bath.
15. Temperature-controlled shaker and incubator.
16. Thermocycler.
17. NanoDrop™ One UV-Visible spectrophotometer.
18. Equipment and supplies for agarose gel electrophoresis.
19. 2 mL and 1.7 mL centrifuge tubes.
20. 0.2 mL PCR tubes.

2.4 Culture Media and Stock Solutions

1. LB broth: 10 g/L of tryptone, 5 g/L of yeast extract, 10 g/L of NaCl. For LB solid media, add 15 g/L agar. Adjust the pH to 7.0 with NaOH, and autoclave the mixture for 30 min at 120 °C.
2. MS2 medium: 4.3 g/L Murashige and Skoog with Vitamins and FeNaEDTA, 10 g/L sucrose, 8 g/L agar. Adjust the pH to 5.7 with NaOH, and autoclave the mixture for 30 min at 120 °C.
3. 10 mg/mL Tetracycline (1000×) (see Note 2).
4. 100 mg/mL Spectinomycin (2000×) (see Note 2).
5. 50 mg/mL Kanamycin (1000×) (see Note 2).
6. 15 mg/mL Rifampin (1000×) (see Note 2).
7. 50 mg/mL Hygromycin (1000×) (see Note 2).
8. 100 mg/mL Timentin (1000×) (see Note 2).
9. 0.1 M IPTG (see Note 2).
10. 20 mg/mL X-Gal (see Note 2).

2.5 Arabidopsis Transformation Using the Floral Dip Method

1. Silwet L-77. 2. 70% (vol/vol) ethanol. 3. 50% Clorox bleach. 4. Tween 20. 5. 0.05% agarose (wt/vol). 6. 0.5× MS selection plates with the relevant antibiotics.

2.6 qRT-PCR Analysis of Transgenic Plants

1. TRIzol RNA Isolation Reagents. 2. DNase I (RNase-free). 3. SuperScript™ III First-Strand Synthesis System. 4. AzuraView™ GreenFast qPCR Blue Mix. 5. PURline™ 96-well PCR plates. 6. CFX96 Touch Real-Time PCR Detection System.

3 Methods

3.1 Growth of Arabidopsis Plants

1. Suspend seeds in 0.05% agarose solution in a 10 mL culture tube and incubate them at 4 °C in the dark for 3 days (see Note 3), then sow about 20 seeds on wet soil covered with a bridal veil (see Note 4) for each 3.5 in. × 3.5 in. pot [10]. 2. Germinate seeds and grow the seedlings in a growth chamber under long-day conditions (16 h light/8 h dark) at 22 °C until flowering (around 3–4 weeks).

3.2 CRISPR–Act3.0 Construct Assembly

3.2.1 sgRNA Design for Targeted Gene Activation

1. Retrieve 300–500 bp of promoter sequence for the candidate gene from Phytozome 13 (https://phytozome-next.jgi.doe.gov/) (see Note 5). The genes AtFT (AT1G65480) and AtTCL1 (AT2G30432) are used in this protocol as an example of multiplexed gene activation.
2. Design two sgRNAs targeting the retrieved promoter sequence of each target gene using CRISPR-P v2.0 (http://crispr.hzau.edu.cn/CRISPR2/); each protospacer must be followed by a canonical 5′-NGG-3′ PAM for SpCas9 (see Table 1). The length of the guide RNAs can vary slightly, but 20 bp is recommended. A few principles should be considered when designing sgRNAs for gene activation:
(a) To ensure high activation of the target gene, we encourage designing multiple sgRNAs (3–5) per gene and pre-testing them individually for activation efficiency in a protoplast system. The most efficient sgRNA for each gene is then used for targeted gene activation. Alternatively, without pre-testing, we suggest designing at least two sgRNAs per gene and assembling them into the multiplexed sgRNA expression vector.
(b) The promoter region between 200 bp and 0 bp upstream of the TSS (transcription start site), especially the stretch from 180 bp to 30 bp upstream, is the most promising targeting window for gene activation.
(c) sgRNAs with a GC content between 35% and 60% typically give higher activation efficiency. A poly-T stretch (four or more consecutive T) within the sgRNA sequence should be avoided.
(d) An sgRNA that binds the non-coding strand of the promoter region (i.e., its PAM is located on the coding strand) might give higher activation than one that binds the coding strand.
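Design principles (b) and (c) above lend themselves to a simple pre-filter for candidate protospacers. The following Python sketch is illustrative only: the function names, the convention of passing the distance upstream of the TSS as a positive number, and the example distance of 100 bp are assumptions, and the filter does not replace the output of CRISPR-P v2.0 or protoplast pre-testing.

```python
import re


def gc_content(seq: str) -> float:
    """Return the GC content of a DNA sequence as a percentage."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def passes_design_rules(protospacer: str, bp_upstream_of_tss: int,
                        gc_range=(35.0, 60.0), window=(0, 200)) -> bool:
    """Check a candidate against principles (b) and (c).

    bp_upstream_of_tss is a positive distance from the TSS (hypothetical
    convention chosen for this sketch).
    """
    seq = protospacer.upper()
    # Principle (c): avoid four or more consecutive T.
    if re.search(r"T{4,}", seq):
        return False
    # Principle (c): GC content between 35% and 60%.
    if not gc_range[0] <= gc_content(seq) <= gc_range[1]:
        return False
    # Principle (b): target within 200 bp upstream of the TSS.
    return window[0] <= bp_upstream_of_tss <= window[1]


# The distance of 100 bp is a made-up value for illustration.
print(passes_design_rules("GGATTTGCATTAACTCGGGT", 100))
```

Passing `window=(30, 180)` restricts the check to the narrower window highlighted in principle (b).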

Table 1 Protospacer sequences of AtFT and AtTCL1 for activation

sgRNA            Forward (5′ to 3′)           Reverse (5′ to 3′)
AtFT-sgRNA-1     ggtcAGGATTTGCATTAACTCGGGT    aaacACCCGAGTTAATGCAAATCCT
AtFT-sgRNA-2     ggtcAGTGTATTAGTGTGGTGGGTT    aaacAACCCACCACACTAATACACT
AtTCL1-sgRNA-1   ggtcATTAATTTATTCAGTAAGTTA    aaacTAACTTACTGAATAAATTAAT
AtTCL1-sgRNA-2   ggtcACATGTATTATTATTACAAAG    aaacCTTTGTAATAATAATACATGT

Lowercase text in the oligonucleotides indicates the overhang sequences. The red text in the forward sequences indicates an extra adenine (A) added to the 5′ end of the sgRNA. Correspondingly, in the reverse sequence, a thymine (T) is added to the 3′ end of the sgRNA
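The oligonucleotide layout in Table 1 can be sketched in a few lines of Python. This is a minimal sketch, not part of the published protocol: the function names are hypothetical, and the default overhangs are the 5′-ggtc-3′ and 5′-aaac-3′ sequences used with pYPQ131B2.0 to pYPQ136B2.0.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_oligos(protospacer: str,
                  fwd_overhang: str = "ggtc",
                  rev_overhang: str = "aaac") -> tuple:
    """Build the forward and reverse cloning oligos for one sgRNA.

    An extra A is prepended when the protospacer does not already start
    with A, because the AtU3 promoter requires an A at the 5' end.
    """
    core = protospacer.upper()
    if not core.startswith("A"):
        core = "A" + core
    forward = fwd_overhang + core
    reverse = rev_overhang + reverse_complement(core)
    return forward, reverse


# Reproduces the AtFT-sgRNA-1 row of Table 1 from its 20-nt protospacer.
forward, reverse = design_oligos("GGATTTGCATTAACTCGGGT")
print(forward)
print(reverse)
```

Keeping the overhangs in lowercase mirrors the table and makes it easy to spot them when ordering oligos.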


3. Add 5′ overhangs for sgRNA cloning, and order each sgRNA with its overhang sequence as a pair of complementary DNA oligonucleotides (see Table 1). When using pYPQ131B2.0 to pYPQ136B2.0, add the 4-nt overhang sequences 5′-ggtc-3′ and 5′-aaac-3′ to the 5′ ends of the forward and reverse sgRNA sequences, respectively. The 5′ end of each sgRNA should start with “A” because an AtU3 promoter drives sgRNA expression in pYPQ131B2.0 to pYPQ136B2.0; add an extra “A” to the 5′ end of the sgRNA when its first nucleotide is not “A”.
4. While the gRNA oligonucleotides are being synthesized, linearize the gRNA entry plasmids pYPQ131B2.0 to pYPQ136B2.0 with BsmBI (Esp3I) using the reaction below. Digest 2 μg of plasmid at 37 °C for 3 h, and then clean up the digested plasmid with the QIAquick Gel Extraction Kit (see Note 6).

Digestion
pYPQ13XB2.0                2 μg
Buffer Tango (10×)         5 μL
DTT (10 mM)                5 μL
Esp3I (BsmBI) (10 U/μL)    2 μL
Nuclease-free water        Up to 50 μL

pYPQ13XB2.0 denotes any one of pYPQ131B2.0 to pYPQ136B2.0

5. Phosphorylate and anneal the sgRNA oligonucleotides. Dissolve the lyophilized DNA oligos in sterilized water to a final concentration of 100 μM. Then, phosphorylate the oligos using T4 PNK (polynucleotide kinase) at 37 °C for 30 min. Next, anneal the phosphorylated oligos by placing the tubes in boiling water and letting the water cool down to room temperature. Dilute the annealed oligos 1:200 for sgRNA cloning in the next step.

Phosphorylation
sgRNA forward oligo (100 μM)    1 μL
sgRNA reverse oligo (100 μM)    1 μL
T4 PNK Reaction Buffer (10×)    1 μL
T4 PNK (10 U/μL)                0.5 μL
Nuclease-free water             Up to 10 μL
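The concentration of the diluted duplex that goes into the ligation follows directly from the volumes in the phosphorylation table. The Python sketch below works through that arithmetic; it assumes complete annealing of both oligos, and the function name is hypothetical.

```python
def annealed_duplex_nM(oligo_stock_uM: float = 100.0,
                       oligo_volume_uL: float = 1.0,
                       reaction_volume_uL: float = 10.0,
                       dilution_factor: float = 200.0) -> float:
    """Concentration (nM) of annealed duplex after the 1:200 dilution.

    Assumes every forward oligo pairs with a reverse oligo.
    """
    # 1 uL of 100 uM oligo in a 10 uL reaction gives 10 uM of each strand.
    duplex_uM = oligo_stock_uM * oligo_volume_uL / reaction_volume_uL
    # Convert uM to nM, then apply the dilution.
    return duplex_uM * 1000.0 / dilution_factor


conc_nM = annealed_duplex_nM()
# 1 nM equals 1 fmol/uL, so 1 uL of the dilution carries conc_nM fmol.
print(f"{conc_nM:.0f} nM, i.e. {conc_nM:.0f} fmol of duplex per uL")
```

With the default values the diluted duplex is at 50 nM, so each microliter added to a ligation contributes about 50 fmol of insert.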


Fig. 3 Schematic illustrations of assembling sgRNAs for multiplexed gene activation. Four sgRNAs are inserted into the BsmBI-digested gR2.0 (guide RNA scaffold containing two MS2 RNA aptamers) entry plasmids pYPQ131B2.0 to pYPQ134B2.0 and then assembled using Golden Gate cloning. The final T-DNA expression vector is constructed by Gateway cloning-mediated assembly of dCas9-Act3.0 activator and sgRNA array cassettes into a destination vector pYPQ202

6. Ligate the annealed DNA oligos separately into the BsmBI-linearized pYPQ131B2.0 to pYPQ134B2.0 using T4 ligase (Fig. 3). Incubate the reactions at 22 °C for 3 h and transform 10 μL of each reaction into 50 μL of E. coli DH5α competent cells. Plate the E. coli cells onto LB solid plates with tetracycline (10 μg/mL), and incubate the plates overnight at 37 °C.

Ligation
BsmBI-linearized plasmid                   15 ng
Diluted annealed oligos (1:200 dilution)   1 μL
T4 DNA Ligase Buffer (10×)                 1 μL
T4 DNA Ligase (400 U/μL)                   1 μL
Nuclease-free water                        Up to 10 μL

7. Pick two colonies for each vector, miniprep, and confirm sgRNA insertion using Sanger sequencing with primer pTC14-F2. 8. To construct a multiplexed sgRNA expression vector (4 sgRNAs in the example here), assemble four sgRNA cassettes (pYPQ131B2.0 to pYPQ134B2.0) into a single Gateway compatible entry vector pYPQ144 using Golden Gate assembly (see Note 7).


Golden Gate assembly
pYPQ131B2.0                  100 ng
pYPQ132B2.0                  100 ng
pYPQ133B2.0                  100 ng
pYPQ134B2.0                  100 ng
pYPQ144                      100 ng
T4 DNA Ligase Buffer (10×)   2 μL
BsaI                         1 μL
T4 DNA Ligase (400 U/μL)     1 μL
Nuclease-free water          Up to 20 μL

Steps   Temperature (°C)   Time (min)   Cycles
1       37                 5            10 (Step 1 + Step 2)
2       16                 10
3       50                 5            1
4       80                 5            1
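For planning purposes, the thermocycler program above can be written as a small data structure and its total hold time computed. This Python sketch is only an illustration: the class and variable names are hypothetical, and ramping time between temperatures is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """A block of (temperature in degrees C, minutes) steps repeated `cycles` times."""
    steps: tuple
    cycles: int


# Golden Gate program from the table: steps 1 and 2 cycle ten times,
# followed by single holds at 50 and 80 degrees C.
GOLDEN_GATE_PROGRAM = (
    Stage(steps=((37, 5), (16, 10)), cycles=10),
    Stage(steps=((50, 5),), cycles=1),
    Stage(steps=((80, 5),), cycles=1),
)


def total_minutes(program) -> int:
    """Sum the hold times of all stages, excluding ramp time."""
    return sum(stage.cycles * sum(minutes for _, minutes in stage.steps)
               for stage in program)


print(total_minutes(GOLDEN_GATE_PROGRAM))  # 160
```

The program therefore occupies the thermocycler for roughly 2 h 40 min plus ramping.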

9. Transform the above reaction into 50 μL of E. coli DH5α competent cells using the heat shock method. Plate the E. coli cells onto LB solid plates containing 50 mg/L spectinomycin, 200 mg/L X-gal, and 0.1 mM IPTG for blue-white screening (see Note 8). Incubate the plates overnight at 37 °C.
10. Pick two white colonies for each vector, miniprep, and digest the plasmids with BamHI and XhoI in CutSmart buffer for 2 h. To confirm a successful Golden Gate reaction, run the digestion products on a 1% agarose gel. Further confirm the sequence of the assembled multiple sgRNA expression cassettes by Sanger sequencing with M13-F and M13-R (see Note 9).
11. Assemble the three Gateway reaction components to generate the final CRISPR–Act3.0 T-DNA vector (Fig. 3). In parallel, perform the same Gateway reaction with the entry vector pYPQ144 without assembled sgRNAs to generate the control CRISPR–Act3.0 T-DNA vector. Incubate the reaction at 25 °C for at least 5 h. Then, transform the reaction product into E. coli DH5α competent cells using the heat shock method. Plate the E. coli cells on LB solid medium containing 50 mg/L kanamycin (see Note 10) and incubate at 37 °C overnight.


Gateway LR reaction
Cas9 entry vector        1 μL (150 ng)
Multiple sgRNAs vector   1 μL (100 ng)
Destination vector       2 μL (200 ng)
LR Clonase II            1 μL
Total                    5 μL

12. Pick 2 colonies for each vector, miniprep, and verify the CRISPR–Act3.0 T-DNA plasmids by restriction digestion using EcoRI (see Note 11). Run the digestion products on a 1% agarose gel to confirm a successful reaction. Neither PCR amplification nor Sanger sequencing is needed.

3.3 Floral Dipping Transformation of Arabidopsis Plants

1. Transform the CRISPR–Act3.0 T-DNA plasmid and the control plasmid separately into Agrobacterium GV3101 chemically competent cells using the freeze-thaw method. Pick 3–5 colonies for each vector (see Note 12) into 5 mL of liquid LB medium containing both 50 mg/L kanamycin and 15 mg/L rifampin for binary vector selection. Incubate the culture at 28 °C overnight.
2. Transfer this feeder culture into 500 mL of liquid LB containing both 50 mg/L kanamycin and 15 mg/L rifampin and grow the culture at 28 °C until the OD = 1.5–2.0 (around 24 h).
3. Spin down the Agrobacterium cells by centrifugation at 4000 × g for 10 min at room temperature and resuspend the cells in 500 mL of fresh 5% (wt/vol) sucrose solution by gentle vortexing. Add 100 μL of Silwet L-77 (0.02% vol/vol) per 500 mL of resuspended solution and mix well with a stirring bar (see Note 13).
4. Transfer the resuspended Agrobacterium solution to a 500 mL beaker. Invert the pot and dip the aerial parts/flower buds of the Arabidopsis plants in the Agrobacterium cell suspension for 1 min, and then remove the dipped plants from the suspension [10].
5. Wrap the dipped plants with plastic film, lay the pots sideways in a black plastic tray, and cover with a second black tray (see Note 14) overnight.
6. Remove the cover and place the plants upright in the growth chamber for 1 month to harvest T1 seeds.
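When fewer plants are dipped, the resuspension solution in step 3 can be scaled down. The following Python sketch scales the sucrose and Silwet L-77 amounts from the percentages given in step 3; the function name and the 200 mL example volume are assumptions made for illustration.

```python
def dipping_solution(volume_mL: float,
                     sucrose_pct_wv: float = 5.0,
                     silwet_pct_vv: float = 0.02) -> dict:
    """Return the sucrose mass and Silwet L-77 volume for a given volume."""
    # % (wt/vol) means grams per 100 mL.
    sucrose_g = sucrose_pct_wv / 100.0 * volume_mL
    # % (vol/vol) means mL per 100 mL; convert mL to uL.
    silwet_uL = silwet_pct_vv / 100.0 * volume_mL * 1000.0
    return {"sucrose_g": sucrose_g, "silwet_uL": silwet_uL}


# The 500 mL case corresponds to step 3: about 25 g sucrose and 100 uL Silwet L-77.
print(dipping_solution(500))
# A hypothetical scaled-down preparation.
print(dipping_solution(200))
```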

3.4 Quantitate the Activation Level of Target Genes in Transgenic Plants

1. Sterilize T1 transgenic seeds and control seeds (see Note 15) with 50% bleach containing 0.05% Tween 20 for 10 min, and then rinse them with sterile water 5 times.
2. Resuspend the sterilized seeds in sterile 0.05% agarose in a 10 mL culture tube and keep them at 4 °C for 2 days. Spread ~3 mL of the T1 transgenic and control seed-agarose suspensions onto separate MS plates with hygromycin (15 μg/mL) (see Note 16). Culture them in the same growth chamber.
3. After 10 days, move seedlings with long roots and true leaves to soil in a 72-cell seedling tray. Culture these plants in the same growth chamber under long-day conditions (16 h light/8 h dark) at 22 °C.
4. Allow the plants to grow for another week to recover. Then, look for the expected phenotype and sample the desired plant organs or tissues for RNA extraction (Fig. 4a). Sample the same organs or tissues from control seedlings as controls.
5. Isolate total RNA from the collected samples using TRIzol RNA Isolation Reagents. Then, remove contaminating genomic DNA from the RNA samples using DNase I according to the manufacturer’s recommendations. Next, synthesize cDNA with the SuperScript™ III First-Strand Synthesis System.
6. Use AzuraView™ GreenFast qPCR Blue Mix and the Real-Time PCR Detection System to quantitate the mRNA levels of AtFT and AtTCL1 with their specific qRT-PCR primers (see Table 2). EF-1α (AT5G60390) is used as the endogenous control gene (see Table 2). The Ct value of EF-1α should be adjusted to around 20.

Fig. 4 CRISPR–Act3.0-mediated simultaneous activation of AtFT and AtTCL1 in Arabidopsis plants. (a) Early flowering phenotype in the T1 population of CRISPR–Act3.0 transgenic plants. Most T1 CRISPR–Act3.0 transgenic plants displayed an early flowering phenotype, an anticipated phenotype for robust AtFT overexpression. In addition, the number of trichomes per leaf of CRISPR–Act3.0 lines was significantly decreased, an anticipated phenotype for robust AtTCL1 overexpression. (b) Analysis of AtFT and AtTCL1 expression levels in T1 CRISPR–Act3.0 transgenic plants. All data are presented as the mean ± s.d. (n = 3 technical replicates). EF-1α is used as the endogenous control gene


Table 2 Primers for real-time reverse transcription PCR (qRT-PCR)

Gene             Sequence (5′ to 3′)
qPCR-AtFT-F      GACCTCAGGAACTTCTATACTTTGGTTATG
qPCR-AtFT-R      CTGTTTGCCTGCCAAGCTG
qPCR-AtTCL1-F    CCCAAGTTCACTCATAGCTCTCAAGAAG
qPCR-AtTCL1-R    GAGTGACTTTTGCGTCATTTGTGGGAG
qPCR-EF-1α-F     TGAGCACGCTCTTCTTGCTTTCA
qPCR-EF-1α-R     GGTGGTGGCATCCATCTTGTTACA

7. To analyze the activation level, calculate the fold changes of the target genes using the 2^−ΔΔCt method [11] (Fig. 4b). In our example, the AtFT and AtTCL1 mRNA levels increased by 150- to 200-fold and 5- to 10-fold, respectively (Fig. 4b).
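The fold-change calculation in step 7 can be sketched as follows. This is a minimal Python illustration of the 2^−ΔΔCt calculation, assuming roughly 100% amplification efficiency for both primer pairs; the function name and all Ct values are hypothetical and are not data from this chapter.

```python
def fold_change(ct_target_sample: float, ct_ref_sample: float,
                ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression of a target gene by the 2^-ddCt method.

    The reference gene here would be EF-1alpha; perfect doubling per
    cycle is assumed.
    """
    # Normalize the target gene to the reference gene within each plant.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Compare the activated plant with the control plant.
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target amplifies 3 cycles earlier in the
# activated line than in the control, with identical reference Ct values.
print(fold_change(22.0, 20.0, 25.0, 20.0))  # 8.0
```

In practice, the Ct values entered here would be the means of the technical replicates for each gene and sample.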

4 Notes

1. The pYPQ202 vector contains an Arabidopsis ubiquitin 10 promoter (AtUBQ10) to drive the expression of dzCas9. It also contains a hygromycin resistance gene for transgenic plant selection. Other comparable destination vectors with the desired promoter and selective marker can also be used. One can also replace the AtUBQ10 promoter in pYPQ202 with a promoter of interest, such as a tissue-specific or inducible promoter, to achieve tissue-specific or inducible gene activation.
2. Antibiotics and chemical reagents, including tetracycline, spectinomycin, kanamycin, hygromycin, timentin, and IPTG, are dissolved in sterilized water, whereas rifampin and X-Gal are dissolved in methanol and ethanol, respectively. All stock solutions are filtered using a 0.45 μm syringe filter and then kept at −20 °C.
3. This step is optional and is used to break seed dormancy, which allows uniform and maximal seed germination.
4. The bridal veil or a 4-mm Polyester Hex Mesh can prevent the loss of soil when the Arabidopsis plants are inverted for dipping.
5. The promoter sequence doesn’t include the 5′ untranslated region (UTR), which is a regulatory region of DNA located at the 5′ end of protein-coding genes.
6. Gel extraction, rather than PCR clean-up of the digested products, is encouraged to minimize the unintentional carry-over of undigested plasmid.


Table 3 Assembly of multiple sgRNAs

Number of sgRNAs assembled        2         3         4         5         6
Entry plasmids
  pYPQ131B2.0                     o         o         o         o         o
  pYPQ132B2.0                     o         o         o         o         o
  pYPQ133B2.0                     -         o         o         o         o
  pYPQ134B2.0                     -         -         o         o         o
  pYPQ135B2.0                     -         -         -         o         o
  pYPQ136B2.0                     -         -         -         -         o
Gateway compatible entry vector   pYPQ142   pYPQ143   pYPQ144   pYPQ145   pYPQ146
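The plasmid choices in Table 3 can also be expressed as a small lookup, which is convenient when planning several constructs. The Python sketch below encodes the table as reconstructed here; the function and variable names are hypothetical.

```python
# sgRNA entry plasmids in the order they are used in the assembly.
ENTRY_PLASMIDS = [f"pYPQ13{i}B2.0" for i in range(1, 7)]

# Gateway-compatible entry vector for each number of sgRNAs (Table 3).
GATEWAY_ENTRY_VECTOR = {
    2: "pYPQ142",
    3: "pYPQ143",
    4: "pYPQ144",
    5: "pYPQ145",
    6: "pYPQ146",
}


def assembly_plan(n_sgrnas: int) -> tuple:
    """Return (sgRNA entry plasmids, Gateway entry vector) for n sgRNAs."""
    if n_sgrnas not in GATEWAY_ENTRY_VECTOR:
        raise ValueError("Table 3 covers assemblies of 2 to 6 sgRNAs")
    return ENTRY_PLASMIDS[:n_sgrnas], GATEWAY_ENTRY_VECTOR[n_sgrnas]


# Four sgRNAs, as in the AtFT/AtTCL1 example of this protocol.
print(assembly_plan(4))
```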

7. The choice of Gateway-compatible entry vector (pYPQ142 to pYPQ146) depends on how many sgRNAs need to be assembled into one vector (see Table 3).
8. Alternatively, spread 70 μL of 0.1 M IPTG and 70 μL of 20 mg/mL X-gal onto the surface of the LB solid medium and let the chemicals dry (20 min in the flow hood) before plating the transformed cells.
9. Both forward and reverse sgRNA sequences can be used as sequencing primers to analyze the sequence of upstream or downstream assembled sgRNA expression cassettes.
10. The antibiotic used depends on the selection marker of the destination vector.
11. The choice of restriction enzymes depends on the destination vector.
12. We encourage confirming the presence of the CRISPR–Act3.0 construct in the Agrobacterium strains used for transformation by digesting re-miniprepped plasmids with the appropriate restriction enzymes.
13. The 500 mL of Agrobacterium cell suspension is sufficient for transforming at least 15 Arabidopsis plants. We usually dip five plants for one construct. Dipping the plants twice, 7 days apart, transforms new flower buds and can therefore significantly increase the transformation frequency.
14. It is important to maintain a high-humidity environment for the dipped plants to ensure high transformation efficiency. In addition, the dipped plants should not be exposed to high temperatures.


15. The control plants are prepared as control for future qRT-PCR assays. Thus, the control and T1 transgenic plants are cultured under the same conditions until sampling. 16. Alternatively, one can dry seeds and plates with a laminar flow hood until the agarose solution dries up.

Acknowledgments

This work was supported by the National Science Foundation Plant Genome Research Program grants (award nos. IOS-1758745 and IOS-2029889) and Biotechnology Risk Assessment Grant Program competitive grants (award nos. 2018-33522-28789 and 2020-33522-32274) from the US Department of Agriculture.

References
1. Dominguez AA, Lim WA, Qi LS (2016) Beyond editing: repurposing CRISPR-Cas9 for precision genome regulation and interrogation. Nat Rev Mol Cell Biol 17:5–15. https://doi.org/10.1038/nrm.2015.2
2. Pan C, Sretenovic S, Qi Y (2021) CRISPR/dCas-mediated transcriptional and epigenetic regulation in plants. Curr Opin Plant Biol 60:101980. https://doi.org/10.1016/j.pbi.2020.101980
3. Lowder LG, Zhou J, Zhang Y et al (2018) Robust transcriptional activation in plants using multiplexed CRISPR-Act2.0 and mTALE-Act systems. Mol Plant 11:245–256. https://doi.org/10.1016/j.molp.2017.11.010
4. Qi LS, Larson MH, Gilbert LA et al (2013) Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152:1173–1183. https://doi.org/10.1016/j.cell.2013.02.022
5. Konermann S, Brigham MD, Trevino AE et al (2015) Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature 517:583–588. https://doi.org/10.1038/nature14136
6. Li Z, Zhang D, Xiong X et al (2017) A potent Cas9-derived gene activator for plant and mammalian cells. Nat Plants 3:930–936. https://doi.org/10.1038/s41477-017-0046-0
7. Papikian A, Liu W, Gallego-Bartolomé J, Jacobsen SE (2019) Site-specific manipulation of Arabidopsis loci using CRISPR-Cas9 SunTag systems. Nat Commun 10:1–11. https://doi.org/10.1038/s41467-019-08736-7
8. Selma S, Bernabé-Orts JM, Vazquez-Vilar M et al (2019) Strong gene activation in plants with genome-wide specificity using a new orthogonal CRISPR/Cas9-based Programmable Transcriptional Activator. Plant Biotechnol J 1–3:1703. https://doi.org/10.1111/pbi.13138
9. Pan C, Wu X, Markel K et al (2021) CRISPR–Act3.0 for highly efficient multiplexed gene activation in plants. Nat Plants 7:942–953. https://doi.org/10.1038/s41477-021-00953-7
10. Zhang X, Henriques R, Lin SS et al (2006) Agrobacterium-mediated transformation of Arabidopsis thaliana using the floral dip method. Nat Protoc 1:641–646. https://doi.org/10.1038/nprot.2006.97
11. Livak KJ, Schmittgen TD (2001) Analysis of relative gene expression data using real-time quantitative PCR and the 2^−ΔΔCT method. Methods 25:402–408. https://doi.org/10.1006/meth.2001.1262

Chapter 4

Single Cell RNA-Sequencing in Arabidopsis Root Tissues

Yuji Ke, Max Minne, Thomas Eekhout, and Bert De Rybel

Abstract

Droplet-based single-cell RNA-sequencing (scRNA-seq) empowers transcriptomic profiling with an unprecedented resolution, facilitating insights into the cellular heterogeneity of tissues, developmental progressions, stress-response dynamics, and more at the single-cell level. In this chapter, we describe the experimental workflow of processing Arabidopsis root tissue into protoplasts and generating single-cell transcriptomes. We also describe the general computational workflow of visualizing and utilizing scRNA-seq data. This protocol can be used as a starting point for establishing a scRNA-seq workflow.

Key words Arabidopsis, FACS, Protoplast, Root, scRNA-seq, Single cell, Transcriptomics

1 Introduction

The establishment of droplet-based single-cell RNA-sequencing (scRNA-seq) in plants has allowed for the construction of cell atlases and an unprecedented resolution in resolving questions about cellular progression during development and unraveling stress-response dynamics [1–3]. Practically, the process of scRNA-seq involves material collection, generating protoplasts, cell enrichment, gel emulsion (GEM) generation, and sequencing, followed by data preprocessing, quality control (QC), clustering, and visualization [4]. Protoplasts are plant cells whose cell walls have been removed. Historically, plant protoplasts have mostly been generated for cell cultures and transformation purposes, but they have also been used to extract large amounts of cell type-specific mRNA for bulk RNA sequencing [5–7] and only more recently for scRNA-seq [8–30]. Because of this, many existing protocols to generate protoplasts prioritize viability and longevity. For scRNA-seq, however, efficiency and speed are more important to minimize transcriptional perturbation. This is achieved through efficient collection of materials and fast processing prior to loading. To make protoplasts physically and chemically compatible with microfluidic technology, extra filtering and washing steps can be used, as well as fluorescence-activated cell sorting (FACS). Additionally, flow cytometry can be used to collect cells based on their distinctive characteristics, to distinguish dead cells or debris from live cells, or to enrich cell-type specific protoplasts using fluorescent marker lines.

Protoplasts are typically generated by enzymatically digesting the cell wall, and numerous protocols exist that have proven effective in a range of plant species and tissues [8–34]. However, most lack step-by-step details or critical-but-often-ignored tips for researchers to follow. In this chapter, we aim to describe in great detail how we process Arabidopsis root material from fresh tissue to large quantities of viable protoplasts as quickly and efficiently as possible. We showcase the step-by-step computational workflow of how to visualize and utilize the scRNA-seq data generated with the Chromium platform from 10× Genomics, including user-friendly procedures, alternative methods, and useful tips and tricks. One can expect that following this protocol will readily generate at least 100,000 FACS-sorted high-quality protoplasts by plating 1 mL of Arabidopsis seeds. Using the computational workflow presented in this chapter and the practical R script available through the GitHub link (https://github.com/vibscc/PlantSingleCellAnalysis), one can not only repeat our data analysis workflow [17, 26] but also tailor the scripts according to the tips and tricks to fit one’s specific needs.

Kerstin Kaufmann and Klaas Vandepoele (eds.), Plant Gene Regulatory Networks: Methods and Protocols, Methods in Molecular Biology, vol. 2698, https://doi.org/10.1007/978-1-0716-3354-0_4, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023

2 Materials

Prepare all solutions in milli-Q water unless stated otherwise, and store at room temperature. Buffers are best prepared fresh on the day of the experiment.

2.1 Plant Material

1. 120 × 120 mm square Petri dishes.
2. Solid (1% agarose) ½ Murashige and Skoog (MS) medium without sucrose.
3. Nylon sieve mesh, pore size 20 μm (SEFAR NITEX) (see Note 1).
4. 1 mL (0.6 g) Col-0 Arabidopsis seeds (see Note 2).
5. 6 mL 0.1% UltraPure™ Agarose (Invitrogen).
6. Scalpel handle and blades #22 or smaller.

2.2 Generating Protoplasts

The following enzyme buffer was optimized for Arabidopsis roots and will likely need significant adjustment or optimization for other tissues and species. Several protocols have been published for other species like rice, maize, poplar, and legumes [31–34] and we refer to these for the relevant compositions.

Single Cell RNA-Sequencing in Arabidopsis Roots


1. Enzyme buffer: 1.5% (w/v) Cellulase Y-C, 0.1% (w/v) Pectolyase Y-23, 400 mM Mannitol, 20 mM MES, 20 mM KCl, 10 mM CaCl2.
2. Minimal washing buffer (see Note 3): 8% Mannitol.
3. Complete washing buffer: 400 mM Mannitol, 10 mM CaCl2, 20 mM MES, 20 mM KCl.
4. Corning™ Falcon 15 mL Conical Centrifuge Tubes.
5. Corning™ Falcon 50 mL Conical Centrifuge Tubes.
6. Falcon™ 40 μm and 70 μm Cell Strainers.
7. 100 mL glass Erlenmeyer flasks with screw cap.

2.3 Quality/Quantity Assessment

1. Fast-Read 102® plastic counting chamber (BioLab).
2. DIC light microscope (e.g., Olympus BX51).

2.4 Flow Cytometry

1. 1 mg/mL stock 4′,6-diamidino-2-phenylindole (DAPI) (Invitrogen).
2. 1 mg/mL stock Propidium iodide (PI) powder (Sigma-Aldrich).
3. Flow cytometer (e.g., BD FACSAria™ II cell sorter) (see Note 4).
4. VWR® Non-treated, 24 wells Tissue Culture Plates.

2.5 Library Preparation and Sequencing

1. Chromium Next GEM Single Cell 3′ Reagent Kits (V3.1 chemistry, 10× Genomics).
2. Chromium Controller (10× Genomics).
3. PCR machine (Deep Well).
4. NovaSeq6000 (Illumina).

3 Methods

Carry out all experimental procedures at room temperature, unless otherwise specified.

3.1 Plant Material Preparation

1. Cut the nylon sieve mesh into 100 × 100 mm pieces and sterilize (see Note 5).
2. Fill 20 square Petri dishes each with 50 mL sterile ½ MS without sucrose.
3. Once solidified, place one 100 × 100 mm piece of mesh on each plate, using sterile tweezers. Leave the mesh to settle in the laminar airflow (LAF) for approximately 5–10 min before closing the lids (see Note 6).
4. Suspend 1 mL sterilized seeds in 5–6 mL sterilized 0.1% agarose (see Note 7).
5. Using a 1000 μL tip, pipette the seed suspension in two dense, straight lines on the mesh (see Note 8).
6. Let the plated seed suspension dry in the LAF for 5–10 min so that it will not droop once the plates are placed upright, then seal the Petri dishes with micropore tape or parafilm.
7. After growing vertically for 5–7 days, the roots are ready for harvesting (see Note 9).
8. Check every plate for contamination and discard any infected plates (see Note 10).

3.2 Protoplast Generation

1. Prepare the protoplast enzyme buffer in milli-Q water fresh before harvesting and adjust the pH to 5.7 (see Note 11).
2. Cut the root tips of the seedlings at 0.5 cm with a scalpel (see Notes 12 and 13).
3. Transfer roots, split into batches of 10 plates, into separate flasks, each containing 10 mL enzyme buffer (see Note 14).
4. Incubate for 1 h, gently shaking on a benchtop orbital shaker (70 rpm).
5. At the end of the enzymatic incubation, use 1000 μL tips to gently pipette the suspension up and down until it becomes cloudy, indicating the release of protoplasts.

At this point, two options are possible for further clean-up before loading the protoplasts (see Note 15): either sample preparation for direct loading (Subheading 3.3, see Note 16) or sample preparation for flow cytometry (Subheading 3.4, see Note 17).
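The amounts to weigh out for step 1 follow from the recipe in Subheading 2.2 and the 10 mL per flask used in step 3. The sketch below is only an illustration of that arithmetic: the function names are hypothetical, the total of 20 mL assumes two flasks, and the molecular weights are assumed values for the anhydrous compounds (mannitol 182.17, MES free acid 195.24, KCl 74.55, CaCl2 110.98 g/mol). Hydrated forms such as MES monohydrate or CaCl2 dihydrate need different values, so check the reagent label.

```shell
#!/bin/sh
# Hypothetical helpers for scaling the enzyme buffer recipe (Subheading 2.2).

# grams_from_mM CONC_mM VOLUME_mL MW_g_per_mol
# mM x mL gives micromoles; x g/mol gives micrograms; divide by 1e6 for grams.
grams_from_mM() {
    awk -v c="$1" -v v="$2" -v mw="$3" 'BEGIN { printf "%.3f\n", c * v * mw / 1000000 }'
}

# grams_from_percent PERCENT_w_v VOLUME_mL   (1% w/v = 1 g per 100 mL)
grams_from_percent() {
    awk -v p="$1" -v v="$2" 'BEGIN { printf "%.3f\n", p * v / 100 }'
}

vol=20   # mL, ASSUMED: two flasks of 10 mL each

# Molecular weights are ASSUMED anhydrous values, not taken from the protocol.
echo "Cellulase Y-C    $(grams_from_percent 1.5 "$vol") g"
echo "Pectolyase Y-23  $(grams_from_percent 0.1 "$vol") g"
echo "Mannitol         $(grams_from_mM 400 "$vol" 182.17) g"
echo "MES              $(grams_from_mM 20 "$vol" 195.24) g"
echo "KCl              $(grams_from_mM 20 "$vol" 74.55) g"
echo "CaCl2            $(grams_from_mM 10 "$vol" 110.98) g"
```

Changing `vol` rescales every component at once, which may be convenient when fewer plates are harvested. The pH adjustment to 5.7 is not covered by this calculation.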

3.3 Sample Preparation for Direct Loading

To minimize the time between cutting roots and loading the processed sample on the 10× Chromium Controller, the protoplasts can be cleaned using only washing, filtering, and centrifugation.

1. Place a 70 μm cell strainer onto a 50 mL Falcon tube and pipette 1 mL of washing buffer through it.
2. Carefully pour the protoplast suspension from one flask through the 70 μm cell strainer into the tube (see Note 18).
3. Rinse the now empty flask with 2 mL of washing buffer, which can be poured over the strainer into the tube as well.
4. Repeat Steps 2–3 with the rest of the flasks, collecting into the same Falcon tube and using a new strainer for every flask (see Note 19).
5. Transfer the protoplast suspension to a 15 mL Falcon tube, and centrifuge at 200 g for 6 min, which should result in a thick, clearly visible pellet (see Note 20).
6. Discard the supernatant and resuspend the pellet in 10 mL of minimal washing buffer (8% mannitol).
7. Centrifuge the suspension at 200 g for 6 min, which should again result in a thick, clearly visible pellet. Discard the supernatant, leaving around 400 μL of washing buffer in which the protoplasts can be resuspended.

3.4 Sample Preparation for Flow Cytometry

If filtering/washing is insufficient to clean up the sample, it might be desirable to make use of a flow cytometer. This will help retain living cells and remove unviable protoplasts and/or debris. The same steps can be used to extract specific cell types using a fluorescent reporter.

1. Place a 70 μm cell strainer onto a 50 mL Falcon tube and pipette 1 mL of washing buffer through it.
2. Carefully pour the protoplast suspension from one flask through the 70 μm cell strainer into the tube.
3. Rinse the now empty flask with 2 mL of washing buffer, which can be poured over the strainer into the tube as well.
4. Repeat Steps 2–3 with the rest of the flasks, collecting into the same Falcon tube and using a new strainer for every flask.
5. Transfer the protoplast suspension to a 15 mL Falcon tube, and centrifuge at 200 g for 6 min, which should result in a thick, clearly visible pellet.
6. Finally, discard most of the supernatant using a pipette, leaving around 400 μL of buffer in which the protoplasts can be resuspended.
7. Transfer 350 μL cell suspension into a pre-cooled FACS tube.
8. Add PI or DAPI to a final concentration of 14 μL and incubate the mixture for 1 min (see Note 21).
9. Perform cell sorting at 4 °C using the largest nozzle of a sorter (e.g., the 100 μm nozzle on a BD FACSAria II).
9.1. Size exclusion is performed using forward and side scatter [SSC-A vs. FSC-A] to discriminate between cells and debris (Fig. 1a).
9.2. PI (excitation max. at 488 nm; emission max. at 617 nm) or DAPI (excitation max. at 358 nm; emission max. at 461 nm) fluorescence is determined in the cell fraction [DAPI/PI vs. FSC-A] (Fig. 1b). Both dyes are membrane impermeant and thus excluded from viable cells. Cells with low fluorescence are selected as living protoplasts and sorted.
9.3. Use a 24-well plate to capture 100,000 sorted cells (Fig. 1c). Each sorted population is collected in 350 μL minimal washing buffer.

Fig. 1 Gating strategy using FACS after protoplast generation: (a) first gating for size exclusion to discriminate between cells and debris, (b) second gating to isolate viable protoplasts from the cell fraction based on fluorescence, (c) gated protoplasts in a 24-well plate. Scale bar represents 100 μm

3.5 Loading Protoplasts

Determine the protoplast concentration prior to loading (see Note 22).

1. Load 7 μL of protoplast suspension in a counting chamber.
2. Count the number of round-looking cells in N squares.
3. The final concentration is calculated as:

\[ \text{Cells}/\mu\text{L} = \frac{\text{cells counted in } N \text{ squares}}{N} \times 10 \]

4. Adjust to 700–1200 cells/μL (see Note 23).
5. Proceed by following the instructions in either the 10× Chromium Next GEM 3′ v3.1 Rev E Demonstrated protocol (CG000315 Rev E, for dual index libraries) or the 10× Chromium Next GEM 3′ v3.1 Rev D Demonstrated protocol (CG000204 Rev D, for single index libraries) (see Note 24).

3.6 10× Genomics Sample Preparation, Library Construction, and Sequencing

Sequencing libraries are prepared according to 10× Genomics protocols, where the number of expected cells determines the number of PCR cycles needed to construct the cDNA library. After sample indices have been added to each sequencing library, the libraries are loaded on a compatible sequencer (e.g., Illumina HiSeq4000 or NovaSeq6000) (see Note 25).
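Because the loading volume depends on the protoplast concentration, the counting arithmetic from Subheading 3.5 can be sketched in shell as follows. This is a minimal illustration rather than part of the original protocol: the function names `cells_per_ul` and `dilution_volume` are hypothetical, the example counts are made up, and the factor of 10 is simply the multiplier from the formula in step 3.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the counting formula in Subheading 3.5.

# cells_per_ul TOTAL_COUNT N_SQUARES
# Mean count per square multiplied by the chamber factor of 10.
cells_per_ul() {
    awk -v total="$1" -v n="$2" 'BEGIN { printf "%.0f\n", (total / n) * 10 }'
}

# dilution_volume CURRENT_CONC TARGET_CONC SAMPLE_UL
# Volume of washing buffer (uL) to add so that SAMPLE_UL at CURRENT_CONC
# ends up at TARGET_CONC. Prints 0 when the sample is already at or below target.
dilution_volume() {
    awk -v c="$1" -v t="$2" -v v="$3" 'BEGIN {
        if (c <= t) { print 0; exit }
        printf "%.0f\n", v * (c / t) - v
    }'
}

# Example with made-up numbers: 600 cells counted across 4 squares,
# then 100 uL of suspension diluted to 1000 cells/uL.
conc=$(cells_per_ul 600 4)
echo "Concentration: ${conc} cells/uL"
echo "Buffer to add: $(dilution_volume "$conc" 1000 100) uL"
```

A sample that is too dilute cannot be corrected by adding buffer; in that case the function prints 0 and the suspension would need to be concentrated by other means.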

3.7 Computational Workflow

Once sequencing is done, raw data are demultiplexed and aligned to the reference genome using 10× CellRanger to generate a digital gene expression matrix of UMI counts per gene (in rows) and per cell (in columns). Step-by-step analyses are listed below.

3.7.1 Build the Arabidopsis Genome Reference

1. Download the Arabidopsis thaliana reference genome using the wget command in a terminal emulator (Mac Terminal or Windows MobaXterm):

    wget http://ftp.ensemblgenomes.org/pub/plants/release-52/fasta/arabidopsis_thaliana/dna/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz

2. Decompress the reference genome fasta file:

    gunzip Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz

3. Download the Arabidopsis thaliana gene annotation gtf file:

    wget http://ftp.ensemblgenomes.org/pub/plants/release-52/gtf/arabidopsis_thaliana/Arabidopsis_thaliana.TAIR10.52.gtf.gz

4. Decompress the gene annotation gtf file:

    gunzip Arabidopsis_thaliana.TAIR10.52.gtf.gz

5. Run cellranger to make a custom reference (see Note 26):

    cellranger mkref --genome=Arabidopsis_thaliana_genome \
      --fasta=Arabidopsis_thaliana.TAIR10.dna.toplevel.fa \
      --genes=Arabidopsis_thaliana.TAIR10.52.gtf

6. Run cellranger count to generate the expression matrix:

    cellranger count --id=sample \
      --transcriptome=/path-to-Arabidopsis_thaliana_genome \
      --fastqs=/path-to-fastq_files \
      --sample=id_of_sample_fasta
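Before building the reference in step 5, it can be worth confirming that every sequence name used in the GTF also exists in the FASTA file, since annotation on a sequence that is absent from the genome cannot be used. The following is a sketch of such a check, not a step from the original protocol. The function name `gtf_names_missing_from_fasta` is hypothetical, and the check assumes the usual conventions that FASTA headers start with `>` followed by the sequence name and that the first tab-separated GTF column holds the sequence name.

```shell
#!/bin/sh
# Hypothetical pre-flight check before `cellranger mkref`.

# gtf_names_missing_from_fasta FASTA GTF
# Prints each sequence name that occurs in the GTF but not in the FASTA headers.
gtf_names_missing_from_fasta() {
    awk -F'\t' '
        NR == FNR {
            # First file (FASTA): remember the name up to the first whitespace.
            if (/^>/) {
                name = substr($0, 2)
                sub(/[ \t].*/, "", name)
                seen[name] = 1
            }
            next
        }
        /^#/ { next }   # skip GTF comment lines
        !($1 in seen) && !($1 in reported) {
            reported[$1] = 1
            print $1
        }
    ' "$1" "$2"
}

fasta=Arabidopsis_thaliana.TAIR10.dna.toplevel.fa
gtf=Arabidopsis_thaliana.TAIR10.52.gtf

# Only run the example when both decompressed files are present.
if [ -f "$fasta" ] && [ -f "$gtf" ]; then
    missing=$(gtf_names_missing_from_fasta "$fasta" "$gtf")
    if [ -n "$missing" ]; then
        echo "Sequence names in GTF but not in FASTA:"
        echo "$missing"
    else
        echo "All GTF sequence names are present in the FASTA file."
    fi
fi
```

Both files downloaded above come from the same Ensembl Plants release, so the names are expected to agree; the check is mainly useful when the genome and annotation are obtained from different sources.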

This will generate a set of output files including the raw/filtered feature-barcode matrices, which you can use as the input for downstream analyses in R (see Note 27). Here we briefly describe a general workflow to process the gene expression data using the filtered feature-barcode matrices through Seurat [35] and also provide a practical R script (https://github.com/vibscc/PlantSingleCellAnalysis) used to analyze the Arabidopsis dataset presented in [17].

3.7.2 Read the Filtered Feature-Barcode Matrices in R and Clean the Data
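Before moving to R, a quick look at the filtered matrix from the command line can confirm that the run produced a plausible number of cells and genes. The sketch below is an illustration under stated assumptions: the function name `matrix_summary` is hypothetical, and the directory `sample/outs/filtered_feature_bc_matrix` assumes the `--id=sample` used above together with the gzip-compressed `barcodes.tsv.gz`, `features.tsv.gz`, and `matrix.mtx.gz` files written by recent Cell Ranger versions.

```shell
#!/bin/sh
# Hypothetical helper: summarize a filtered feature-barcode matrix directory.

# matrix_summary DIR
# DIR is expected to hold barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz.
matrix_summary() {
    dir=$1
    # One barcode (cell) per line and one feature (gene) per line.
    n_barcodes=$(gzip -dc "$dir/barcodes.tsv.gz" | wc -l | tr -d ' ')
    n_features=$(gzip -dc "$dir/features.tsv.gz" | wc -l | tr -d ' ')
    # The first non-comment line of a Matrix Market file reads
    # "rows columns entries", i.e., genes, cells, and non-zero counts.
    header=$(gzip -dc "$dir/matrix.mtx.gz" | awk '!/^%/ { print; exit }')
    echo "barcodes=$n_barcodes features=$n_features mtx_header=$header"
}

outdir=sample/outs/filtered_feature_bc_matrix   # ASSUMED output location
if [ -d "$outdir" ]; then
    matrix_summary "$outdir"
fi
```

The first two numbers of the Matrix Market header should match the feature and barcode counts, in that order, which matches the genes-in-rows, cells-in-columns layout described in Subheading 3.7.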

Use the Read10X() function to read in the filtered feature-barcode matrices from the step above and clean the data.

1. Load the essential R packages and Arabidopsis dataset:

    library(Seurat)
    library(scater)
    library(ggpubr)
    library(dplyr)
    count.matrix