MicroRNA Target Identification: Methods and Protocols [1st ed.] 978-1-4939-9206-5;978-1-4939-9207-2

This detailed book provides a comprehensive state-of-the-art presentation of all aspects of miRNA target identification,


English Pages XI, 359 [358] Year 2019


Table of contents :
Front Matter ....Pages i-xi
A Practical Guide to miRNA Target Prediction (Most Mauluda Akhtar, Luigina Micolucci, Md Soriful Islam, Fabiola Olivieri, Antonio Domenico Procopio)....Pages 1-13
miRandb: A Metadatabase of Online Resources of miRNA and miRNA Targets (Seyed Hamid Aghaee-Bakhtiari)....Pages 15-30
Sfold Tools for MicroRNA Target Prediction (William Rennie, Shaveta Kanoria, Chaochun Liu, C. Steven Carmack, Jun Lu, Ye Ding)....Pages 31-42
The Role of Tertiary Structure in MicroRNA Target Recognition (Hin Hark Gan, Kristin C. Gunsalus)....Pages 43-64
Protocols for miRNA Target Prediction in Plants (Gaurav Sablok, Kun Yang, Xiaopeng Wen)....Pages 65-73
Improving miRNA Target Prediction Using CLASH Data (Xiaoman Li, Haiyan Hu)....Pages 75-83
Integration of Omics Data to Identify Cancer-Related MicroRNA (Luciano Cascione)....Pages 85-99
Functional Analysis of Genetic Variants and Somatic Mutations Impacting MicroRNA-Target Recognition: Bioinformatics Resources (Jesse D. Ziebarth, Anindya Bhattacharya, Yan Cui)....Pages 101-120
Construction and Analysis of miRNA Regulatory Networks (Antonella Mensi, Vincenzo Bonnici, Simone Caligola, Rosalba Giugno)....Pages 121-167
Identification of Disease–miRNA Networks Across Different Cancer Types Using SWIM (Giulia Fiscon, Federica Conte, Lorenzo Farina, Marco Pellegrini, Francesco Russo, Paola Paci)....Pages 169-181
Computational Methods to Investigate the Impact of miRNAs on Pathways (Salvatore Alaimo, Giovanni Micale, Alessandro La Ferlita, Alfredo Ferro, Alfredo Pulvirenti)....Pages 183-209
isoTar: Consensus Target Prediction with Enrichment Analysis for MicroRNAs Harboring Editing Sites and Other Variations (Rosario Distefano, Giovanni Nigita, Dario Veneziano, Giulia Romano, Carlo M. Croce, Mario Acunzo)....Pages 211-235
A Roadmap for the Computational Prediction and Experimental Validation of Competitive Endogenous RNAs (Florian A. Karreth)....Pages 237-250
Investigating miRNA–lncRNA Interactions: Computational Tools and Resources (Dario Veneziano, Gioacchino P. Marceca, Sebastiano Di Bella, Giovanni Nigita, Rosario Distefano, Carlo M. Croce)....Pages 251-277
Combing the Hairball: Improving Visualization of miRNA–Target Interaction Networks (Francesco Russo, Jessica Xin Hu, Jose Alejandro Romero Herrera, Søren Brunak)....Pages 279-289
Inconsistencies and Limitations of Current MicroRNA Target Identification Methods (Sophie Mockly, Hervé Seitz)....Pages 291-314
Experimental Validation of MicroRNA Targets: Luciferase Reporter Assay (Luisa Tomasello, Landon Cluts, Carlo M. Croce)....Pages 315-330
Experimental Validation of MicroRNA Targets: Mutagenesis of Binding Regions (Luisa Tomasello, Landon Cluts, Carlo M. Croce)....Pages 331-339
Experimental Validation of MicroRNA Targets: Analysis of MicroRNA Targets Through Western Blotting (Luisa Tomasello, Landon Cluts, Carlo M. Croce)....Pages 341-353
Back Matter ....Pages 355-359


Methods in Molecular Biology 1970

Alessandro Laganà Editor

MicroRNA Target Identification Methods and Protocols

METHODS IN MOLECULAR BIOLOGY

Series Editor John M. Walker School of Life and Medical Sciences University of Hertfordshire Hatfield, Hertfordshire, AL10 9AB, UK

For further volumes: http://www.springer.com/series/7651

MicroRNA Target Identification Methods and Protocols

Edited by

Alessandro Laganà Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, NY, USA


ISSN 1064-3745 ISSN 1940-6029 (electronic) Methods in Molecular Biology ISBN 978-1-4939-9206-5 ISBN 978-1-4939-9207-2 (eBook) https://doi.org/10.1007/978-1-4939-9207-2 Library of Congress Control Number: 2019936006 © Springer Science+Business Media, LLC, part of Springer Nature 2019 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Humana Press imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer Nature. The registered company address is: 233 Spring Street, New York, NY 10013, U.S.A.

Preface

MicroRNAs (miRNAs) are small endogenous RNA molecules that regulate gene expression at the post-transcriptional level. Since their discovery in nematodes 25 years ago, extensive molecular, clinical, and computational research has uncovered their fundamental role in a plethora of biological processes and pathways, their crucial involvement in disease initiation and progression, and their implications as biomarkers for prognosis and drug response. miRNAs exert their function by binding target messenger RNAs (mRNA) through sequence complementarity and, consequently, inhibiting their translation or promoting their degradation. Thus, a fundamental task in miRNA research is the reliable identification of miRNA targets. Several computational methods and laboratory techniques to predict and validate miRNA targets have been introduced and perfected over the past 20 years. Despite the remarkable progress achieved, the prediction of miRNA targets and the characterization of their function still represent a challenge, which calls for novel integrated systems biology approaches coupled with machine learning and novel deep sequencing techniques.

The aim of this book is to provide a comprehensive state-of-the-art presentation of all aspects of miRNA target identification, from the prediction of miRNA binding sites on their target molecules to their experimental validation and downstream functional analysis. Some of the chapters review methodologies and tools for target prediction and related analysis. Other chapters offer step-by-step tutorials on specific tasks, tools, and analysis pipelines.

The first eight chapters cover the principles of miRNA target recognition and the features incorporated in computational target prediction methods and tools. Chapter 1 provides an overview of the basic concepts in computational miRNA target prediction, introducing miRNA targeting features and the most commonly available tools. Chapter 2 presents a user-friendly web-based resource for miRNA research, which enables data retrieval from multiple databases and target prediction tools. Chapters 3 and 4 illustrate the importance of investigating secondary and tertiary structure of target RNA molecules for the identification of miRNA binding sites and describe related tools and computational approaches. Since most chapters in the book specifically cover target prediction in animals, Chapter 5 is specifically dedicated to introducing the main tools for miRNA target predictions in plants. Chapter 6 describes the use of crosslinking immunoprecipitation (CLIP) data to improve target prediction. Chapter 7 describes how to integrate miRNA and mRNA sequencing data to improve identification of miRNA targets. Chapter 8 discusses and describes computational resources to investigate the impact of genetic variants and somatic mutations on miRNA target recognition.

Chapters 9–15 present methodologies and tools for downstream analysis of miRNAs and their targets. More specifically, Chapters 9–11 address various aspects concerning the analysis of miRNA in networks and pathways, with an emphasis on oncology applications. Chapter 12 presents a novel approach to investigate the impact of RNA editing on miRNA targeting and its functional consequences. Chapters 13 and 14 focus on the interaction between miRNAs and long noncoding RNAs, with particular focus on the role of the latter as competitive endogenous RNAs or miRNA “sponges,” which are molecules that compete with natural miRNA targets to fine-tune gene expression regulation. Chapter 15 is devoted to describing visualization approaches to represent miRNA-target interaction networks.

The last four chapters are focused on the evaluation and validation of predicted miRNA targets. In particular, Chapter 16 discusses inconsistencies and limitations of in vitro and in silico methods for target identification, while Chapters 17–19 provide laboratory protocols for the three common techniques employed in the validation of specific targets: luciferase reporter assay, mutagenesis of the binding region, and western blotting.

I am very thankful for having been given the unique opportunity to assemble and curate this volume, which offers a comprehensive overview and update on basic and advanced concepts involving miRNA target identification and provides an extremely useful guide to specific computational tasks and analysis in miRNA research. I would like to thank all the authors for their valuable contributions. My sincere gratitude also goes to John Walker, the series editor, and to David C. Casey at Springer for their guidance and support throughout the development of this book.

New York, NY, USA
Alessandro Laganà

Contents

Preface . . . v
Contributors . . . ix

1 A Practical Guide to miRNA Target Prediction . . . 1
Most Mauluda Akhtar, Luigina Micolucci, Md Soriful Islam, Fabiola Olivieri, and Antonio Domenico Procopio

2 miRandb: A Metadatabase of Online Resources of miRNA and miRNA Targets . . . 15
Seyed Hamid Aghaee-Bakhtiari

3 Sfold Tools for MicroRNA Target Prediction . . . 31
William Rennie, Shaveta Kanoria, Chaochun Liu, C. Steven Carmack, Jun Lu, and Ye Ding

4 The Role of Tertiary Structure in MicroRNA Target Recognition . . . 43
Hin Hark Gan and Kristin C. Gunsalus

5 Protocols for miRNA Target Prediction in Plants . . . 65
Gaurav Sablok, Kun Yang, and Xiaopeng Wen

6 Improving miRNA Target Prediction Using CLASH Data . . . 75
Xiaoman Li and Haiyan Hu

7 Integration of Omics Data to Identify Cancer-Related MicroRNA . . . 85
Luciano Cascione

8 Functional Analysis of Genetic Variants and Somatic Mutations Impacting MicroRNA-Target Recognition: Bioinformatics Resources . . . 101
Jesse D. Ziebarth, Anindya Bhattacharya, and Yan Cui

9 Construction and Analysis of miRNA Regulatory Networks . . . 121
Antonella Mensi, Vincenzo Bonnici, Simone Caligola, and Rosalba Giugno

10 Identification of Disease–miRNA Networks Across Different Cancer Types Using SWIM . . . 169
Giulia Fiscon, Federica Conte, Lorenzo Farina, Marco Pellegrini, Francesco Russo, and Paola Paci

11 Computational Methods to Investigate the Impact of miRNAs on Pathways . . . 183
Salvatore Alaimo, Giovanni Micale, Alessandro La Ferlita, Alfredo Ferro, and Alfredo Pulvirenti

12 isoTar: Consensus Target Prediction with Enrichment Analysis for MicroRNAs Harboring Editing Sites and Other Variations . . . 211
Rosario Distefano, Giovanni Nigita, Dario Veneziano, Giulia Romano, Carlo M. Croce, and Mario Acunzo

13 A Roadmap for the Computational Prediction and Experimental Validation of Competitive Endogenous RNAs . . . 237
Florian A. Karreth

14 Investigating miRNA–lncRNA Interactions: Computational Tools and Resources . . . 251
Dario Veneziano, Gioacchino P. Marceca, Sebastiano Di Bella, Giovanni Nigita, Rosario Distefano, and Carlo M. Croce

15 Combing the Hairball: Improving Visualization of miRNA–Target Interaction Networks . . . 279
Francesco Russo, Jessica Xin Hu, Jose Alejandro Romero Herrera, and Søren Brunak

16 Inconsistencies and Limitations of Current MicroRNA Target Identification Methods . . . 291
Sophie Mockly and Hervé Seitz

17 Experimental Validation of MicroRNA Targets: Luciferase Reporter Assay . . . 315
Luisa Tomasello, Landon Cluts, and Carlo M. Croce

18 Experimental Validation of MicroRNA Targets: Mutagenesis of Binding Regions . . . 331
Luisa Tomasello, Landon Cluts, and Carlo M. Croce

19 Experimental Validation of MicroRNA Targets: Analysis of MicroRNA Targets Through Western Blotting . . . 341
Luisa Tomasello, Landon Cluts, and Carlo M. Croce

Index . . . 355

Contributors

MARIO ACUNZO  Division of Pulmonary Diseases and Critical Care Medicine, Virginia Commonwealth University, Richmond, VA, USA
SEYED HAMID AGHAEE-BAKHTIARI  Bioinformatics Research Center, Mashhad University of Medical Sciences, Mashhad, Razavi Khorasan, Iran; Department of Medical Biotechnology, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Razavi Khorasan, Iran
MOST MAULUDA AKHTAR  Bioinformatics, Asian University for Women, Chattogram, Bangladesh
SALVATORE ALAIMO  Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
ANINDYA BHATTACHARYA  Department of Computer Science and Engineering, University of California, San Diego, CA, USA
VINCENZO BONNICI  Department of Computer Science, University of Verona, Verona, Italy
SØREN BRUNAK  Novo Nordisk Foundation Center for Protein Research, Translational Disease Systems Biology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
SIMONE CALIGOLA  Department of Computer Science, University of Verona, Verona, Italy
C. STEVEN CARMACK  New York State Department of Health, Wadsworth Center, Center for Medical Science, Albany, NY, USA
LUCIANO CASCIONE  Institute of Oncology Research, Università della Svizzera Italiana, Bellinzona, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
LANDON CLUTS  Department of Cancer Biology and Medical Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
FEDERICA CONTE  Institute for Systems Analysis and Computer Science Antonio Ruberti, National Research Council, Rome, Italy; SysBio Centre for Systems Biology, Milan, Italy
CARLO M. CROCE  Department of Cancer Biology and Medical Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
YAN CUI  Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA; Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, TN, USA
SEBASTIANO DI BELLA  NMS Oncology, Nerviano Medical Sciences Srl, Nerviano, Italy
YE DING  New York State Department of Health, Wadsworth Center, Center for Medical Science, Albany, NY, USA
ROSARIO DISTEFANO  Department of Cancer Biology and Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
LORENZO FARINA  Department of Computer, Control, and Management Engineering Antonio Ruberti, Sapienza University of Rome, Rome, Italy
ALFREDO FERRO  Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
GIULIA FISCON  Institute for Systems Analysis and Computer Science Antonio Ruberti, National Research Council, Rome, Italy; SysBio Centre for Systems Biology, Milan, Italy
HIN HARK GAN  Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA
ROSALBA GIUGNO  Department of Computer Science, University of Verona, Verona, Italy
KRISTIN C. GUNSALUS  Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA; Center for Genomics and Systems Biology, NYU Abu Dhabi, Abu Dhabi, United Arab Emirates
HAIYAN HU  Department of Computer Science, University of Central Florida, Orlando, FL, USA
JESSICA XIN HU  Novo Nordisk Foundation Center for Protein Research, Translational Disease Systems Biology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
MD SORIFUL ISLAM  Department of Gynecology and Obstetrics, Johns Hopkins University, School of Medicine, Baltimore, USA
SHAVETA KANORIA  New York State Department of Health, Wadsworth Center, Center for Medical Science, Albany, NY, USA
FLORIAN A. KARRETH  Department of Molecular Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA
ALESSANDRO LA FERLITA  Department of Physics and Astronomy, University of Catania, Catania, Italy
XIAOMAN LI  Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL, USA
CHAOCHUN LIU  New York State Department of Health, Wadsworth Center, Center for Medical Science, Albany, NY, USA
JUN LU  Department of Genetics, Yale Stem Cell Center, Yale University, New Haven, CT, USA
GIOACCHINO P. MARCECA  Department of Cancer Biology and Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA; Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
ANTONELLA MENSI  Department of Computer Science, University of Verona, Verona, Italy
GIOVANNI MICALE  Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
LUIGINA MICOLUCCI  Laboratory of Experimental Pathology, Department of Clinical and Molecular Sciences, Università Politecnica delle Marche, Ancona, Italy; Computational Pathology Unit, Department of Clinical and Molecular Sciences, Università Politecnica delle Marche, Ancona, Italy
SOPHIE MOCKLY  IGH (CNRS and University of Montpellier), Montpellier, France
GIOVANNI NIGITA  Department of Cancer Biology and Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
FABIOLA OLIVIERI  Laboratory of Experimental Pathology, Department of Clinical and Molecular Sciences, Università Politecnica delle Marche, Ancona, Italy; Center of Clinical Pathology and Innovative Therapies, Italian National Research Center on Aging (INRCA-IRCCS), Ancona, Italy
PAOLA PACI  Institute for Systems Analysis and Computer Science Antonio Ruberti, National Research Council, Rome, Italy; SysBio Centre for Systems Biology, Milan, Italy
MARCO PELLEGRINI  Institute of Informatics and Telematics, National Research Council, Pisa, Italy
ANTONIO DOMENICO PROCOPIO  Laboratory of Experimental Pathology, Department of Clinical and Molecular Sciences, Università Politecnica delle Marche, Ancona, Italy; Center of Clinical Pathology and Innovative Therapies, Italian National Research Center on Aging (INRCA-IRCCS), Ancona, Italy
ALFREDO PULVIRENTI  Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
WILLIAM RENNIE  New York State Department of Health, Wadsworth Center, Center for Medical Science, Albany, NY, USA
GIULIA ROMANO  Division of Pulmonary Diseases and Critical Care Medicine, Virginia Commonwealth University, Richmond, VA, USA
JOSE ALEJANDRO ROMERO HERRERA  Novo Nordisk Foundation Center for Protein Research, Translational Disease Systems Biology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
FRANCESCO RUSSO  Novo Nordisk Foundation Center for Protein Research, Translational Disease Systems Biology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
GAURAV SABLOK  Finnish Museum of Natural History (Botany), University of Helsinki, Helsinki, Finland; OEB Research Programme, Department of Biological and Environmental Sciences, Viikki Plant Science Center, Helsinki, Finland
HERVÉ SEITZ  IGH (CNRS and University of Montpellier), Montpellier, France
LUISA TOMASELLO  Department of Cancer Biology and Medical Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
DARIO VENEZIANO  Department of Cancer Biology and Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
XIAOPENG WEN  Key Laboratory of Plant Resources Conservation and Germplasm Innovation in Mountainous Region, Ministry of Education, Institute of Agro-Bioengineering, College of Life Sciences, Guizhou University, Guiyang, China
KUN YANG  Key Laboratory of Plant Resources Conservation and Germplasm Innovation in Mountainous Region, Ministry of Education, Institute of Agro-Bioengineering, College of Life Sciences, Guizhou University, Guiyang, China
JESSE D. ZIEBARTH  Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA; Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, TN, USA

Chapter 1

A Practical Guide to miRNA Target Prediction

Most Mauluda Akhtar, Luigina Micolucci, Md Soriful Islam, Fabiola Olivieri, and Antonio Domenico Procopio

Abstract

MicroRNAs (miRNAs) are small endogenous noncoding RNA molecules that posttranscriptionally regulate gene expression. Since their discovery, a huge number of miRNAs have been identified in a wide range of species. By binding to the 3′ UTR of an mRNA, an miRNA can block translation or stimulate degradation of the targeted mRNA, thus affecting nearly all biological processes. Predicting and identifying miRNA target genes is crucial for understanding the biology of miRNAs. Currently, a number of sophisticated bioinformatics approaches are available to predict miRNA target sites effectively. In this chapter, we present the major features that most algorithms take into account to predict miRNA targets efficiently: seed match, free energy, conservation, target site accessibility, and contribution of multiple binding sites. We also give an overview of the frequently used bioinformatics tools for miRNA target prediction. Understanding the basis of these prediction methodologies may help users to select the appropriate tools and analyze their output.

Key words MicroRNA, Target prediction, Seed match, Conservation, Free energy, Site accessibility, Bioinformatic tools

1 Introduction

Since their discovery in 1993 [1], miRNAs have added a new paradigm to the understanding of gene regulatory networks. MiRNAs are tiny endogenous noncoding RNAs of about 22 nucleotides in length that have been found in diverse plants, animals, and viruses [2, 3]. Using different experimental and computational approaches, a total of 30,424 mature miRNAs from 206 species, including 2578 mature human miRNAs, have been identified and deposited in the publicly available miRBase database [3]. Release 22 of this database (available from March 2018) contains 38,589 entries representing hairpin precursor miRNAs, expressing 48,885 mature miRNA products, in 271 species (http://www.mirbase.org/ and ftp://mirbase.org/pub/mirbase/).

Alessandro Laganà (ed.), MicroRNA Target Identification: Methods and Protocols, Methods in Molecular Biology, vol. 1970, https://doi.org/10.1007/978-1-4939-9207-2_1, © Springer Science+Business Media, LLC, part of Springer Nature 2019


This exponential growth of miRNA data within a short period of time indicates their biological relevance. Following transcription, cleavage, and processing in the nucleus, mature miRNAs are exported to the cytoplasm, where they are loaded onto the RNA-induced silencing complex (RISC) to silence target mRNAs [4]. It has been computationally predicted that miRNAs target 60% of human protein-coding genes through conserved base-pairing between the 3′ UTR of the mRNA and the 5′ region of the miRNA, called the seed region [5]. Through pairing with mRNA, they can downregulate gene expression by inhibiting translation or inducing mRNA degradation [6]. A single miRNA is estimated to regulate from hundreds to over a thousand target genes, and several miRNAs can target a single gene [7]. They modulate gene expression in a broad range of biological processes including tissue development [8], cell proliferation and apoptosis [9], cell division [10], cell differentiation [11], and metabolism [12]. Perturbations in their expression levels have been associated with various clinical consequences. Currently, disease-associated miRNAs are considered diagnostic and prognostic biomarkers as well as therapeutic targets for different human pathological conditions, including age-related diseases and cancers [13–15]. It is now clear that identifying miRNA target genes helps to understand their biological functions. A number of computational algorithms have been developed [16] for the efficient prediction of miRNA targets in animals.

1.1 Chapter Outline

In this chapter, we describe common features that are used in developing miRNA target prediction tools to refine the initial sequence analysis. We also present an overview of the most commonly used computational tools for predicting putative miRNA targets. Their strengths and limitations are discussed as well, to help users assess and select the tool best suited to their needs.

2 Features of miRNA Target Prediction Tools

Several features are considered in the development of miRNA target prediction algorithms. First, seed pairing between the miRNA and the mRNA is checked. Next, the thermodynamics of binding are evaluated by computing the free energy of miRNA–mRNA duplex formation. Then, cross-species sequence comparison is used to check whether the target sequence is evolutionarily conserved across related species. The 3′ UTR is also checked for structural accessibility to miRNA binding. Lastly, the number of target sites for the miRNA is counted, as mRNAs are likely to be regulated by miRNAs at multiple binding sites. All these features can be used stepwise (Fig. 1) and are described in the following sections.


Fig. 1 Sequential steps of microRNA target prediction
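The stepwise use of these features can be sketched as a chain of filters. This is only a minimal illustration under stated assumptions, not the implementation of any published tool: the `Site` record, its field names, the gene names, and the energy threshold are all hypothetical, and each feature is assumed to have been computed beforehand.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A candidate binding site with precomputed features (hypothetical fields)."""
    gene: str
    seed_match: bool       # step 1: seed pairing found
    duplex_energy: float   # step 2: free energy of the duplex, kcal/mol
    conserved: bool        # step 3: site conserved in related species
    accessible: bool       # step 4: site in the 3' UTR is structurally accessible


def filter_sites(sites, max_energy=-15.0):
    """Apply the feature checks one after another (threshold is illustrative)."""
    kept = [s for s in sites if s.seed_match]
    kept = [s for s in kept if s.duplex_energy <= max_energy]
    kept = [s for s in kept if s.conserved]
    kept = [s for s in kept if s.accessible]
    return kept


def sites_per_gene(sites):
    """Step 5: count the surviving sites for each gene."""
    counts = {}
    for s in sites:
        counts[s.gene] = counts.get(s.gene, 0) + 1
    return counts


candidates = [
    Site("geneA", True, -22.1, True, True),
    Site("geneA", True, -18.4, True, True),
    Site("geneB", True, -9.0, True, True),    # fails the energy threshold
    Site("geneC", False, -25.0, True, True),  # no seed match
]
print(sites_per_gene(filter_sites(candidates)))
```

Real tools differ in the order of the steps and in whether a feature acts as a hard filter or contributes to a score; the hard filters here were chosen only for brevity.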


2.1 Sequence Complementarity

The miRNA sequence is complementary to the 3′ UTR sequence of candidate target mRNAs. Whereas plant miRNAs bind to their targets with perfect complementarity, animal miRNAs show only partial complementarity, which makes it difficult to determine potential miRNA targets with high specificity [17, 18]. The seed sequence of an miRNA, defined as nucleotides 2–7 from its 5′ end, is considered critical for mRNA targeting. Most of the available target prediction algorithms require Watson–Crick (WC) pairing with the targeted sites [19]. A WC pairing between an miRNA and an mRNA nucleotide occurs when adenosine (A) pairs with uracil (U) and guanine (G) pairs with cytosine (C). A strong seed match between the miRNA and the mRNA is very important for targeting. Several types of seed match may be considered, depending on the algorithm. The major types [20] are:

1. 6mer: A perfect six nucleotide-long WC match between the miRNA seed and the mRNA.
2. 7mer-m8: A perfect WC match from nucleotides 2–8 of the miRNA seed.
3. 7mer-A1: A perfect WC match from nucleotides 2–7 of the miRNA seed in addition to an A across from the miRNA nucleotide 1.
4. 8mer: A perfect WC match from nucleotides 2–8 of the miRNA seed in addition to an A across from the miRNA nucleotide 1.
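The four seed match types can be told apart with a few string operations. The sketch below is an illustration, not the code of any particular tool: the function names are hypothetical, both sequences are assumed to be written 5′ to 3′ in the RNA alphabet, and the let-7a sequence and the short UTR fragments serve only as example input.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def classify_seed_sites(mirna, utr):
    """Return (position, type) for every seed match found in a 3' UTR.

    The position is the 0-based index in the UTR where the match to
    miRNA nucleotides 2-7 begins.
    """
    core = revcomp(mirna[1:7])    # UTR sequence pairing with miRNA nucleotides 2-7
    m8 = COMPLEMENT[mirna[7]]     # UTR base pairing with miRNA nucleotide 8
    sites = []
    start = utr.find(core)
    while start != -1:
        end = start + 6
        has_m8 = start > 0 and utr[start - 1] == m8   # upstream base pairs with nt 8
        has_a1 = end < len(utr) and utr[end] == "A"   # A across from miRNA nt 1
        if has_m8 and has_a1:
            kind = "8mer"
        elif has_m8:
            kind = "7mer-m8"
        elif has_a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((start, kind))
        start = utr.find(core, start + 1)
    return sites


let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # example input
print(classify_seed_sites(let7a, "AAACUACCUCAGGG"))
```

Because the two strands pair antiparallel, the base opposite miRNA nucleotide 8 lies immediately upstream of the core match in the UTR, and the base opposite nucleotide 1 lies immediately downstream.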

2.2 Thermodynamic Properties

The thermodynamic properties of the miRNA–mRNA duplex are an important feature for evaluating hybridization stability. Most miRNA target prediction algorithms assess them by calculating the free energy of the putative binding. Free energy is a measure of the stability of a biological system, and a stable miRNA–mRNA binding is more likely to represent a true target. The lower the free energy of two paired RNA strands, the more energy is required to break up the pairing. Thus, an miRNA–mRNA duplex with low free energy is considered thermodynamically more stable, meaning that the pairing between the RNA strands is stronger [21]. The Vienna package is the most commonly used software tool for thermodynamic analysis and can be used to estimate free energy and secondary structure [22]. However, it is not easy to determine proper free energy thresholds, as data sets of known miRNA–mRNA duplexes are inadequate. Furthermore, lower free energy does not always result in reliable prediction of miRNA target genes. Thus, other features such as sequence conservation are taken into account to predict miRNA targets effectively [21].
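The link between free energy and stability can be made concrete with the standard thermodynamic relation ΔG = −RT ln K, where K is the equilibrium constant of duplex formation. The short sketch below is not taken from the chapter: the function name is hypothetical and the two free energy values are made-up examples, chosen only to show how strongly K grows as ΔG decreases.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def equilibrium_constant(delta_g, temp_k=310.15):
    """K = exp(-dG / (R*T)) for duplex formation; delta_g in kcal/mol."""
    return math.exp(-delta_g / (R_KCAL * temp_k))


# Two illustrative duplex free energies (kcal/mol) at 37 degrees Celsius
for dg in (-10.0, -20.0):
    print(f"dG = {dg:6.1f} kcal/mol  ->  K = {equilibrium_constant(dg):.2e}")
```

In practice the free energy itself would come from a folding package such as the Vienna package mentioned above; this sketch only interprets a value once it is available.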


2.3 Sequence Conservation of the Target Site

Conserved sequences are similar or identical sequences of nucleic acids (DNA and RNA) or proteins across species. The level of sequence conservation is another feature frequently used to filter miRNA targets from a genome sequence. A number of target prediction algorithms determine orthologous 3′ UTR sequences and then check whether the potential miRNA target site is conserved in other closely related species. Requiring predicted targets to be conserved across related species efficiently reduces the number of false positive predictions. Conservation is higher in the miRNA seed region than in other regions [23]. In some miRNA–mRNA interactions, conserved pairing is found at the 3′ end of the miRNA, which can compensate for seed mismatches. These sites are called 3′ compensatory sites [5]. Although conservation is perhaps the most powerful measure for target prediction, it can be limiting, because certain miRNAs have evolved to act on nonconserved 3′ UTRs [24]. There is growing interest in conservation analysis of the genomic regions flanking miRNA genes and miRNA target genes. Generally, software tools such as BLASTn [25] or AVID [26, 27] are used to align 3′ UTR sequences of different species and check their conservation. Moreover, the University of California Santa Cruz (UCSC) genome database [28] provides genome-wide alignments for a broad range of species as well as conservation analysis.
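Once orthologous 3′ UTRs have been aligned, a simple conservation check compares the site columns across species. The sketch below assumes the alignment is already available as equal-length strings; the function name, the species labels, and the toy sequences are hypothetical, and real tools use more elaborate measures than exact identity.

```python
def site_conservation(alignment, start, end, reference):
    """Fraction of non-reference species whose aligned sequence is identical
    to the reference over the half-open column range [start, end)."""
    ref_site = alignment[reference][start:end]
    others = [species for species in alignment if species != reference]
    if not others:
        return 0.0
    identical = sum(alignment[species][start:end] == ref_site for species in others)
    return identical / len(others)


# Toy alignment of a 3' UTR fragment (made-up sequences of equal length)
aligned_utrs = {
    "human":   "AAACUACCUCAGGG",
    "mouse":   "AAGCUACCUCAGGA",
    "chicken": "AAACUACGUCAGGG",
}
# Columns 3-10 hold the candidate site in the reference sequence
print(site_conservation(aligned_utrs, 3, 11, "human"))
```

Here the mouse sequence differs from the human one only outside the site, so it counts as conserved, whereas the chicken sequence carries a substitution inside the site.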

2.4 Target Site Accessibility

Another way to increase the precision of predictions is to consider the structural accessibility of the binding sites. Both the miRNA and the 3′ UTR should be accessible, at least in the region corresponding to the seed, for interaction to occur. However, only the accessibility of the 3′ UTR needs to be assessed, since in their active state miRNAs are assembled into the RISC, which guarantees the accessibility of their seed. Some methods select partially accessible binding sites; the predicted targets are then ranked according to an ad hoc score [29] or free energy [30]. Other methods consider the sum of the free energy required to make the complementary site accessible and the hybridization energy, and use this sum both to determine which binding sites are accessible and to rank the predictions [31].
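The energy-sum idea in the last two sentences can be illustrated with a short Python sketch. It assumes a sign convention in which the hybridization energy is negative (energy gained) and the opening energy is positive (energy cost), so that a negative total means binding is favorable overall. The `Site` type, the site names, and all energy values are hypothetical; real values would be computed by an RNA folding program.

```python
from typing import NamedTuple


class Site(NamedTuple):
    name: str
    dg_hybrid: float  # kcal/mol, negative: gained on duplex formation
    dg_open: float    # kcal/mol, positive: cost of unpairing the site


def total_energy(site):
    """Sum of hybridization energy and site-opening cost."""
    return site.dg_hybrid + site.dg_open


def rank_accessible_sites(sites, max_total=0.0):
    """Keep sites whose total energy is below max_total, best first."""
    accessible = [s for s in sites if total_energy(s) < max_total]
    return sorted(accessible, key=total_energy)


# Hypothetical candidate sites.
sites = [
    Site("s1", dg_hybrid=-22.0, dg_open=6.0),    # total -16.0
    Site("s2", dg_hybrid=-25.0, dg_open=14.0),   # total -11.0
    Site("s3", dg_hybrid=-12.0, dg_open=13.5),   # total +1.5, rejected
]
ranked_names = [s.name for s in rank_accessible_sites(sites)]
```

In this toy example `s2` has the strongest hybridization energy but ranks below `s1` because its site is more costly to open, which is exactly the effect accessibility-based methods are designed to capture.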

2.5 Multiple Targets in the Same 3′ UTR

Several studies have demonstrated that more than one miRNA can potentially target a single gene and, consequently, that multiple miRNAs may cooperatively control the expression of target genes [32, 33]. The level of miRNA target regulation varies with the number of miRNAs that bind to a target gene [34]. Multitargeting happens more frequently than expected. Prediction based on multitargeting significantly decreases the number of predicted targets. Since this feature increases the probability of finding true target genes, it is useful for experimental validation studies of miRNA targets [35].
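A multitargeting filter of this kind can be sketched as follows. The example assumes that predictions are available as (miRNA, gene) pairs from some upstream tool; the miRNA and gene names are placeholders, and the threshold of two distinct miRNAs is an illustrative assumption rather than a value taken from the literature.

```python
from collections import defaultdict


def mirnas_per_gene(predictions):
    """Group predicted (miRNA, gene) pairs by gene, ignoring duplicates."""
    by_gene = defaultdict(set)
    for mirna, gene in predictions:
        by_gene[gene].add(mirna)
    return by_gene


def multitargeted_genes(predictions, min_mirnas=2):
    """Genes predicted to be targeted by at least min_mirnas distinct miRNAs."""
    by_gene = mirnas_per_gene(predictions)
    return {
        gene: sorted(mirnas)
        for gene, mirnas in by_gene.items()
        if len(mirnas) >= min_mirnas
    }


# Hypothetical predictions; the repeated pair counts only once.
predictions = [
    ("miR-A", "GENE1"),
    ("miR-B", "GENE1"),
    ("miR-A", "GENE2"),
    ("miR-A", "GENE1"),
]
shortlist = multitargeted_genes(predictions)
```

Only `GENE1` survives the filter, which shows how the criterion shrinks the candidate list before experimental validation.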


3

Most Mauluda Akhtar et al.

Commonly Used Tools for miRNA Target Prediction

Several methods are now available for the computational prediction of miRNA targets. These prediction platforms are based on different assumptions and implementations. The most popular tools are summarized below (see Table 1 for their general features).

MiRanda is one of the earliest and most frequently used target prediction algorithms. Although initially used to find targets in Drosophila, it was later also used to predict targets in vertebrates. The tool can be used through a Web interface that predicts candidate targets and scores them with the mirSVR machine learning method according to their potency to repress targeted genes [36]. MiRanda is written in C, and the source code is available for download. It is relatively simple to compile and run. Nevertheless, both of these steps, and the need to run it from the command line, may represent a technical obstacle for many users. On the other hand, advanced users may find its several adjustable parameters useful for studying particular miRNA targets. MiRanda recognizes target sites using sequence complementarity assessed by a position-weighted local alignment algorithm, the free energies of miRNA–mRNA duplexes calculated with the Vienna RNA fold package [49], and conservation of targets in related genomes as a final filter. It considers conservation of both binding site and position.

TargetScan [5, 21] is a Web tool that predicts biological targets of miRNAs primarily by searching for conserved 8mer, 7mer, and 6mer sites that match the seed region of each miRNA. These putative targets are then ranked by the context scores of the sites [38]. The score is made up of features such as 3′ compensatory pairing, local AU content and position contribution, target site abundance, seed-pairing stability, and probability of conserved targeting. TargetScan also predicts secondary structure to calculate the free energy of predicted duplexes. It also provides nonconserved targets or poorly conserved sites as an option.

RNAhybrid [39, 40] is a program that predicts multiple potential binding sites of miRNAs in large target RNAs and considers the free energy of miRNA–mRNA duplexes. It searches for the energetically optimal hybridization sites of a small RNA in a large RNA and can search many long targets in a short time. RNAhybrid is available for download and as a Web tool.

PicTar [34, 41] is a computational method that identifies binding sites for a single miRNA as well as multiple sites regulated by different miRNAs acting cooperatively. It predicts common targets of miRNAs by relying on comparative data from several species. Moreover, PicTar calculates the maximum likelihood that a given RNA sequence (typically a 3′ UTR) is bound by combinations of miRNAs. Initially, target genes are predicted using a common

Table 1 Select bioinformatics resources for microRNA target prediction

| Tools | Type | Source code | Organism(s) | Last updated | URL | References |
| --- | --- | --- | --- | --- | --- | --- |
| Miranda | Database/software | A | h, m, r, f, w | 2010 (N) | http://www.microrna.org/ | [36, 37] |
| TargetScan | Web server | A | h, m, r, d, cn, c, rh, cw, o, fr, z, f, w | 2016 (Y) | http://www.targetscan.org/ | [5, 21, 38] |
| RNAhybrid | Web server/software | A | f, w | 2006 (N) | http://bibiserv.techfak.uni-bielefeld.de/rnahybrid/ | [39, 40] |
| PicTar | Web server | NA | h, m, f, w | 2007 (N) | http://pictar.mdc-berlin.de/ | [41] |
| rna22 | Web server | NA | h, m, f, w | 2015 (Y) | https://cm.jefferson.edu/rna22/ | [42, 43] |
| PITA | Web server/software | A | h, m, f, w | 2008 (N) | http://genie.weizmann.ac.il/pubs/mir07/ | [44] |
| miRDB | Database/web server | NA | h, m, r, d, cn | 2015 (Y) | http://mirdb.org/ | [45, 46] |
| DIANA-microT-CDS | Web server/software | A | h, m, f, w | 2013 (Y) | http://www.microrna.gr/webServer | [47] |
| STarMir | Web server | NA | h, m, w | 2014 (Y) | http://sfold.wadsworth.org/cgi-bin/starmirtest2.pl | [48] |

Source code: A available; NA not available
Organisms: h human, m mouse, r rat, d dog, cn chicken, c chimpanzee, rh rhesus, cw cow, o opossum, fr frog, z zebrafish, f fly, w worm, p pufferfish, mq mosquito
Last updated: Y if updated over the past 5 years, N not updated


criterion, that is, optimal binding free energy, and then tested statistically using genome-wide alignment of eight vertebrate genomes to filter out true targets.

In contrast, rna22 [42, 43] is a pattern-based method that identifies miRNA binding sites and their corresponding heteroduplexes without relying on cross-species conservation. It first searches for putative miRNA binding sites in a given sequence and then identifies the targeting miRNA. It is thus able to identify putative miRNA binding sites even when the targeting miRNA is unknown.

PITA [44] is a parameter-free model for miRNA–target interaction. Its key feature is target site accessibility. It computes the difference between the free energy gained from miRNA–target duplex formation and the energy cost of unpairing the target to make it accessible to the miRNA. This approach is limited by the accuracy of predicting stable secondary structures, which becomes unreliable for long-distance interactions and therefore for larger RNA structures.

miRDB [45, 46] is an online database for miRNA target prediction and functional annotation. It consists of two child databases: (1) a Web interface for computationally predicted miRNA targets generated by an SVM machine learning algorithm, and (2) a wiki editing interface that provides an open platform for collecting manual annotations from miRNA researchers. Unlike most other miRNA databases, miRDB presents functional annotations with a primary focus on mature miRNAs, which are the functional carriers of miRNA-mediated gene expression regulation.

DIANA-microT-CDS [47, 50, 51] is a Web server that provides access to the updated version of the widely used DIANA-microT target prediction algorithm. It provides a user-friendly Web interface that enables users without a sophisticated computational infrastructure to perform extensive miRNA-related analyses on results derived from high-throughput techniques such as microarrays or next-generation sequencing (NGS). It offers ready-to-use pipelines for queries concerning predicted or validated miRNA–gene interactions and their impact on metabolic and signaling pathways. Moreover, the Web server hosts the KEGG database to provide a search module based on KEGG pathway descriptions [52]. It is one of the few available online tools that predict miRNA targets in coding sequences (CDSs) in addition to 3′ UTRs. For target prediction it considers features such as binding category weight, distance to the nearest end of the region (CDS or 3′ UTR) or to an adjacent binding site, the predicted free energy of the miRNA–mRNA duplex, conservation, and AU content. Moreover, for advanced users, it offers an in-house developed DIANA-Taverna Plug-in that gives direct access to the target prediction server (microT-CDS) from the graphical interface of Taverna and allows advanced miRNA analysis functionalities to be incorporated into custom pipelines.

Another tool, STarMir [48], implements logistic prediction models developed with miRNA binding data from CLIP studies [53]. To predict miRNA binding sites, STarMir computes comprehensive sequence, thermodynamic, and target structure features, and a logistic probability as a measure of confidence for each predicted site (3′ UTR, CDS, and 5′ UTR).

The tools reviewed above can be used independently. Recently, a number of online integrated platforms have been built that combine multiple algorithms to overcome the limitations of individual tools (Table 2). As these platforms mostly incorporate the tools described above, we do not describe them in this chapter.

4

Discussion

The purpose of these different bioinformatics approaches is to reduce large lists of predicted targets while minimizing the loss of true targets. Although all the tools reviewed here have their own predictive strengths, they also have limitations that pertain to the features they use. Combining the results of multiple tools to improve the quality of target predictions is common practice. However, there is no evidence that this increases prediction quality and systematically reduces the number of candidates [66]. Furthermore, a tool relying solely on seed match for miRNA target identification cannot establish whether the target site sequence is evolutionarily conserved, accessible for binding, or thermodynamically stable for miRNA–mRNA duplex formation. It has been reported that many nonconserved binding sites in 3′ UTRs are functional [67]. Thus, exclusive use of conservation-based miRNA target prediction systems is unlikely to capture such miRNA–mRNA interactions.

Most miRNA target prediction tools are not consistently updated (Table 1). Regular updating is crucial, since miRNA nomenclature changes regularly and novel miRNAs are added to miRBase every year. MiRanda, a widely used downloadable algorithm, has not been updated in the past 5 years. The Web servers of PicTar and PITA have also been out of date for more than 5 years. However, PITA offers a downloadable version that can be run with user-provided data. RNAhybrid is available through an old version of its Web server and has not been updated recently. It does not offer default values and requires adjustment of complex settings with user-supplied input, which makes it complicated for nonexpert users. Other tools, such as TargetScan, rna22, STarMir, DIANA-microT-CDS, and miRDB, are kept relatively up to date. In particular, TargetScan and miRDB are the most frequently updated, and

Table 2 Select integrated bioinformatics resources for microRNA target prediction

| Tools | Type | Source code | Organism(s) | Last updated | URL | References |
| --- | --- | --- | --- | --- | --- | --- |
| miRNAMap | Database | NA | h, m, r, d, cn, o, fr, z, f, w, p, mq | 2008 (N) | http://mirnamap.mbc.nctu.edu.tw/ | [54, 55] |
| MiRror/miRror suite | Database | NA | h, m, r, f, w, z | 2014 (Y) | http://www.proto.cs.huji.ac.il/mirror/index.php | [56, 57] |
| miRTar | Web server | NA | h | 2011 (N) | http://mirtar.mbc.nctu.edu.tw/human/ | [58] |
| miRWalk | Database | NA | h, m, r | 2017 (Y) | http://www.umm.uni-heidelberg.de/apps/zmf/mirwalk/ | [59] |
| mirDIP | Web server | NA | h | 2017 (Y) | http://ophid.utoronto.ca/mirDIP/ | [60] |
| ComiR | Web server | A | h, m, f, w | 2014 (Y) | http://www.benoslab.pitt.edu/comir/ | [61] |
| mirTarPri | Web server | NA | h | 2013 (Y) | http://210.46.85.180:8080/mirTarPri/ | [62] |
| miRmap | Web server/software | A | h, m, r, cw, o, cn, c, z | 2013 (Y) | http://mirmap.ezlab.org/ | [63, 64] |
| ToppMiR | Web server | NA | | 2014 (Y) | https://toppmir.cchmc.org/ | [65] |

Source code: A available; NA not available
Organisms: h human, m mouse, r rat, d dog, cn chicken, c chimpanzee, rh rhesus, cw cow, o opossum, fr frog, z zebrafish, f fly, w worm, p pufferfish, mq mosquito
Last updated: Y if updated over the past 5 years, N not updated


use the High-Confidence status of an miRNA for functional miRNA identification. It is evident that miRNA-binding sites within coding sequences are also involved in controlling gene expression [68]. Tools that predict targets solely in conserved 3′ UTRs cannot predict miRNA–mRNA interactions in other regions. Among the tools reviewed above, DIANA-microT-CDS can identify miRNA targets in CDSs as well as 3′ UTRs, and rna22, miRDB, and STarMir can do so in CDSs, 5′ UTRs, and 3′ UTRs.

It is clear that current target prediction algorithms are based on diverse prediction assumptions and implementations, which may significantly hinder researchers in selecting the appropriate tool for their specific requirements. We thus encourage users to be aware of the underlying assumptions, strengths, limitations, versions, maintenance, and data utilized for each tool, in order to make an informed choice.

References

1. Lee RC, Feinbaum RL, Ambros V (1993) The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75(5):843–854
2. Reinhart BJ, Slack FJ, Basson M, Pasquinelli AE, Bettinger JC, Rougvie AE, Horvitz HR, Ruvkun G (2000) The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403(6772):901–906
3. Kozomara A, Griffiths-Jones S (2014) miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res 42(D1):D68–D73
4. Winter J, Jung S, Keller S, Gregory RI, Diederichs S (2009) Many roads to maturity: microRNA biogenesis pathways and their regulation. Nat Cell Biol 11(3):228–234
5. Friedman RC, Farh KK-H, Burge CB, Bartel DP (2009) Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 19(1):92–105
6. Bartel DP (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116(2):281–297
7. Thomson DW, Bracken CP, Goodall GJ (2011) Experimental strategies for microRNA target identification. Nucleic Acids Res 39(16):6845–6853
8. Zhao Y, Samal E, Srivastava D (2005) Serum response factor regulates a muscle-specific microRNA that targets Hand2 during cardiogenesis. Nature 436(7048):214
9. Cheng AM, Byrom MW, Shelton J, Ford LP (2005) Antisense inhibition of human miRNAs and indications for an involvement of miRNA in cell growth and apoptosis. Nucleic Acids Res 33(4):1290–1297
10. Hatfield S, Shcherbata H, Fischer K, Nakahara K, Carthew R, Ruohola-Baker H (2005) Stem cell division is regulated by the microRNA pathway. Nature 435(7044):974
11. Naguibneva I, Ameyar-Zazoua M, Polesskaya A, Ait-Si-Ali S, Groisman R, Souidi M, Cuvellier S, Harel-Bellan A (2006) The microRNA miR-181 targets the homeobox protein Hox-A11 during mammalian myoblast differentiation. Nat Cell Biol 8(3):278
12. Novák J, Olejníčková V, Tkáčová N, Santulli G (2015) Mechanistic role of microRNAs in coupling lipid metabolism and atherosclerosis. In: MicroRNA: basic science. Springer, New York, pp 79–100
13. Cho W (2010) MicroRNAs: potential biomarkers for cancer diagnosis, prognosis and targets for therapy. Int J Biochem Cell Biol 42(8):1273–1281
14. Micolucci L, Akhtar MM, Olivieri F, Rippo MR, Procopio AD (2016) Diagnostic value of microRNAs in asbestos exposure and malignant mesothelioma: systematic review and qualitative meta-analysis. Oncotarget 7(36):58606
15. Olivieri F, Capri M, Bonafè M, Morsiani C, Jung HJ, Spazzafumo L, Viña J, Suh Y (2017) Circulating miRNAs and miRNA shuttles as biomarkers: perspective trajectories of healthy and unhealthy aging. Mech Ageing Dev 165:162–170
16. Akhtar MM, Micolucci L, Islam MS, Olivieri F, Procopio AD (2015) Bioinformatic tools for microRNA dissection. Nucleic Acids Res 44(1):24–44
17. Zhang Y (2005) miRU: an automated plant miRNA target prediction server. Nucleic Acids Res 33(suppl 2):W701–W704
18. Ambros V (2004) The functions of animal microRNAs. Nature 431(7006):350–355
19. Bartel DP (2009) MicroRNAs: target recognition and regulatory functions. Cell 136(2):215–233
20. Peterson SM, Thompson JA, Ufkin ML, Sathyanarayana P, Liaw L, Congdon CB (2014) Common features of microRNA target prediction tools. Front Genet 5:23
21. Lewis BP, Burge CB, Bartel DP (2005) Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120(1):15–20
22. Wuchty S, Fontana W, Hofacker IL, Schuster P (1999) Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers 49(2):145–165
23. Lewis BP, Shih I, Jones-Rhoades MW, Bartel DP, Burge CB (2003) Prediction of mammalian microRNA targets. Cell 115(7):787–798
24. Lai EC (2004) Predicting and validating microRNA targets. Genome Biol 5(9):115
25. Waterman MS, Eggert M (1987) A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisons. J Mol Biol 197(4):723–728
26. Bray N, Dubchak I, Pachter L (2003) AVID: a global alignment program. Genome Res 13(1):97–102
27. Couronne O, Poliakov A, Bray N, Ishkhanov T, Ryaboy D, Rubin E, Pachter L, Dubchak I (2003) Strategies and tools for whole-genome alignments. Genome Res 13(1):73–80
28. Rosenbloom KR, Armstrong J, Barber GP, Casper J, Clawson H, Diekhans M, Dreszer TR, Fujita PA, Guruvadoo L, Haeussler M (2014) The UCSC genome browser database: 2015 update. Nucleic Acids Res 43(D1):D670–D681
29. Robins H, Li Y, Padgett RW (2005) Incorporating structure to predict microRNA targets. Proc Natl Acad Sci U S A 102(11):4006–4009
30. Long D, Lee R, Williams P, Chan CY, Ambros V, Ding Y (2007) Potent effect of target structure on microRNA function. Nat Struct Mol Biol 14(4):287
31. Marín RM, Vaníček J (2010) Efficient use of accessibility in microRNA target prediction. Nucleic Acids Res 39(1):19–29
32. Enright AJ, John B, Gaul U, Tuschl T, Sander C, Marks DS (2003) MicroRNA targets in Drosophila. Genome Biol 5(1):R1
33. Watanabe Y, Yachie N, Numata K, Saito R, Kanai A, Tomita M (2006) Computational analysis of microRNA targets in Caenorhabditis elegans. Gene 365:2–10
34. Krek A, Grün D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, Da Piedade I, Gunsalus KC, Stoffel M (2005) Combinatorial microRNA target predictions. Nat Genet 37(5):495
35. Ritchie W, Rasko JE, Flamant S (2013) MicroRNA target prediction and validation. In: MicroRNA cancer regulation. Springer, New York, pp 39–53
36. Betel D, Koppal A, Agius P, Sander C, Leslie C (2010) Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites. Genome Biol 11(8):R90
37. Betel D, Wilson M, Gabow A, Marks DS, Sander C (2008) The microRNA.org resource: targets and expression. Nucleic Acids Res 36(suppl 1):D149–D153
38. Agarwal V, Bell GW, Nam J-W, Bartel DP (2015) Predicting effective microRNA target sites in mammalian mRNAs. eLife 4
39. Rehmsmeier M, Steffen P, Höchsmann M, Giegerich R (2004) Fast and effective prediction of microRNA/target duplexes. RNA 10(10):1507–1517
40. Krüger J, Rehmsmeier M (2006) RNAhybrid: microRNA target prediction easy, fast and flexible. Nucleic Acids Res 34(suppl 2):W451–W454
41. Lall S, Grün D, Krek A, Chen K, Wang Y-L, Dewey CN, Sood P, Colombo T, Bray N, MacMenamin P (2006) A genome-wide map of conserved MicroRNA targets in C. elegans. Curr Biol 16(5):460–471
42. Miranda KC, Huynh T, Tay Y, Ang Y-S, Tam W-L, Thomson AM, Lim B, Rigoutsos I (2006) A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell 126(6):1203–1217
43. Loher P, Rigoutsos I (2012) Interactive exploration of RNA22 microRNA target predictions. Bioinformatics 28(24):3322–3323
44. Kertesz M, Iovino N, Unnerstall U, Gaul U, Segal E (2007) The role of site accessibility in microRNA target recognition. Nat Genet 39(10):1278–1284
45. Wang X (2008) miRDB: a microRNA target prediction and functional annotation database with a wiki interface. RNA 14(6):1012–1017
46. Wong N, Wang X (2014) miRDB: an online resource for microRNA target prediction and functional annotations. Nucleic Acids Res 43(D1):D146–D152
47. Paraskevopoulou MD, Georgakilas G, Kostoulas N, Vlachos IS, Vergoulis T, Reczko M, Filippidis C, Dalamagas T, Hatzigeorgiou AG (2013) DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows. Nucleic Acids Res 41(W1):W169–W173
48. Rennie W, Liu C, Carmack CS, Wolenc A, Kanoria S, Lu J, Long D, Ding Y (2014) STarMir: a web server for prediction of microRNA binding sites. Nucleic Acids Res 42(Web Server issue):W114–W118
49. John B, Enright AJ, Aravin A, Tuschl T, Sander C, Marks DS (2004) Human microRNA targets. PLoS Biol 2(11):e363
50. Maragkakis M, Reczko M, Simossis VA, Alexiou P, Papadopoulos GL, Dalamagas T, Giannopoulos G, Goumas G, Koukis E, Kourtis K (2009) DIANA-microT web server: elucidating microRNA functions through target prediction. Nucleic Acids Res 37(Web Server issue):W273–W276
51. Maragkakis M, Vergoulis T, Alexiou P, Reczko M, Plomaritou K, Gousis M, Kourtis K, Koziris N, Dalamagas T, Hatzigeorgiou AG (2011) DIANA-microT web server upgrade supports Fly and worm miRNA target prediction and bibliographic miRNA to disease association. Nucleic Acids Res 39(suppl 2):W145–W148
52. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M (2011) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res 40(D1):D109–D114
53. Liu C, Mallick B, Long D, Rennie WA, Wolenc A, Carmack CS, Ding Y (2013) CLIP-based prediction of mammalian microRNA binding sites. Nucleic Acids Res 41(14):e138
54. Hsu PWC, Huang H-D, Hsu S-D, Lin L-Z, Tsou A-P, Tseng C-P, Stadler PF, Washietl S, Hofacker IL (2006) miRNAMap: genomic maps of microRNA genes and their target genes in mammalian genomes. Nucleic Acids Res 34(suppl 1):D135–D139
55. Hsu S-D, Chu C-H, Tsou A-P, Chen S-J, Chen H-C, Hsu PW-C, Wong Y-H, Chen Y-H, Chen G-H, Huang H-D (2008) miRNAMap 2.0: genomic maps of microRNAs in metazoan genomes. Nucleic Acids Res 36(suppl 1):D165–D169
56. Friedman Y, Naamati G, Linial M (2010) MiRror: a combinatorial analysis web tool for ensembles of microRNAs and their targets. Bioinformatics 26(15):1920–1921
57. Friedman Y, Karsenty S, Linial M (2014) miRror-Suite: decoding coordinated regulation by microRNAs. Database 2014:bau043
58. Hsu JBK, Chiu C-M, Hsu S-D, Huang W-Y, Chien C-H, Lee T-Y, Huang H-D (2011) miRTar: an integrated system for identifying miRNA-target interactions in human. BMC Bioinformatics 12(1):300
59. Dweep H, Sticht C, Pandey P, Gretz N (2011) miRWalk - database: prediction of possible miRNA binding sites by “walking” the genes of three genomes. J Biomed Inform 44(5):839–847
60. Shirdel EA, Xie W, Mak TW, Jurisica I (2011) NAViGaTing the micronome - using multiple microRNA prediction databases to identify signalling pathway-associated microRNAs. PLoS One 6(2):e17429
61. Coronnello C, Benos PV (2013) ComiR: combinatorial microRNA target prediction tool. Nucleic Acids Res 41(W1):W159–W164
62. Wang P, Ning S, Wang Q, Li R, Ye J, Zhao Z, Li Y, Huang T, Li X (2013) mirTarPri: improved prioritization of MicroRNA targets through incorporation of functional genomics data. PLoS One 8(1):e53685
63. Vejnar CE, Zdobnov EM (2012) miRmap: comprehensive prediction of microRNA target repression strength. Nucleic Acids Res 40(22):11673–11683
64. Vejnar CE, Blum M, Zdobnov EM (2013) miRmap web: comprehensive microRNA target prediction online. Nucleic Acids Res 41(W1):W165–W168
65. Wu C, Bardes EE, Jegga AG, Aronow BJ (2014) ToppMiR: ranking microRNAs and their mRNA targets based on biological functions and context. Nucleic Acids Res 42(Web Server issue):W107–W113
66. Ritchie W, Flamant S, Rasko JE (2009) Predicting microRNA targets and functions: traps for the unwary. Nat Methods 6(6):397
67. Farh KK-H, Grimson A, Jan C, Lewis BP, Johnston WK, Lim LP, Burge CB, Bartel DP (2005) The widespread impact of mammalian MicroRNAs on mRNA repression and evolution. Science 310(5755):1817–1821
68. Reczko M, Maragkakis M, Alexiou P, Grosse I, Hatzigeorgiou AG (2012) Functional microRNA targets in protein coding sequences. Bioinformatics 28(6):771–776

Chapter 2

miRandb: A Metadatabase of Online Resources of miRNA and miRNA Targets

Seyed Hamid Aghaee-Bakhtiari

Abstract

MicroRNA (miRNA) studies deliver numerous types of information, including miRNA identification, miRNA sequences, target prediction, roles in diseases, and interactions in signaling pathways. Given the different types of miRNA data, the number of miRNA databases has been increasing quickly. Although these resources are intended to simplify miRNA analysis, scientists face the challenging task of choosing the most suitable tool to retrieve the relevant information. In this chapter, we introduce the use of miRandb, a resource that we have established to present an overview of the different types of miRNA online resources and to simplify finding the miRNA information that scientists need for their research. miRandb offers a user-friendly platform for finding information about any type of miRNA data among more than 188 existing miRNA databases. miRandb is easy to use, and information can be retrieved by resource category. For each database, miRandb lists several kinds of information, including database activity, description, main and unique features, organism, URL, publication, category, published year, citations per year, last update, and relative popularity. miRandb facilitates access to diverse classes of microRNA resources. miRandb is available at http://miRandb.ir.

Key words miRNA, Metadatabase, Online databases, Web-accessible databases, Online resources

1

Introduction

MicroRNAs (miRNAs) are small endogenous noncoding RNAs that control gene expression posttranscriptionally. Several studies have confirmed the important role of miRNAs in a wide range of biological processes, such as differentiation [1, 2], cell proliferation [3], apoptosis, cell cycle, and metabolism [4]. miRNAs have essential regulatory roles in a wide range of human diseases, and several studies have shown that these molecules have great value as biomarkers for prognosis and diagnosis [5], as therapeutic molecules [6], and for evaluating disease progression [7]. Accordingly, miRNA studies are essential to fully understand the molecular bases of healthy physiology and disease.

Alessandro Laganà (ed.), MicroRNA Target Identification: Methods and Protocols, Methods in Molecular Biology, vol. 1970, https://doi.org/10.1007/978-1-4939-9207-2_2, © Springer Science+Business Media, LLC, part of Springer Nature 2019



miRNA online resources are essential tools for researchers to obtain data, and in this decade their number has been increasing quickly. miRNA studies encompass several types of analyses and data, including miRNA discovery, pre-miRNA and miRNA nucleotide sequences, genomic position, target assessment by prediction algorithms or validation methods, expression profiles in different cell lines and tissues, pathway and network interaction, function assignment, and roles in different diseases [8]. Given the rapidly increasing number of recorded miRNAs for human and other species and the different types of associated information, miRNA online resources play a fundamental role in managing this huge amount of information. For this purpose, several databases have been established to organize diverse sets of data, and several studies have reviewed miRNA online tools [9, 10].

Although several databases have been made available to simplify miRNA analysis, scientists are challenged with the task of choosing the right database to retrieve the appropriate information [11]. Several databases have been developed for specific types of miRNA-related data, and researchers have to review multiple articles or sites to decide on a preferred tool. Therefore, a website that integrates and classifies all the accessible online tools is needed. We established the miRandb (miRNA algorithmic network database) database (http://miRandb.ir), which contains various data modalities and provides simplified access to the diverse classes of miRNA databases. In miRandb, we classify miRNA online tools into eight groups and present concepts and appropriate resources for each class. miRandb offers an easy-to-follow procedure for finding the required data about each miRNA among 188 miRNA online resources [12]. This chapter can guide scientists with different expertise to find the miRNA databases appropriate to their studies.

The enormous number of miRNA databases and the variety of miRNA-related data types call for an organized method to retrieve and interpret the data. We classified these online databases into eight groups, considering the following categories of data: (1) miRNA sequence, (2) miRNA discovery in a sequence, (3) target prediction of miRNAs, (4) miRNA target validation, (5) expression of miRNA and its regulation, (6) function of miRNAs and their roles in diseases, (7) miRNA associations in pathways and networks, and (8) miRNA deep sequencing.

1.1 miRNA Sequence Databases

Considering the very large number of miRNAs in diverse species, a Web-based searchable resource for miRNA sequences and related information is needed. In addition, miRNA nomenclature is another significant topic in this field. miRNA sequence databases include pre-miRNA and mature miRNA sequences of diverse organisms, stem-loop structures, expression of 3p and 5p miRNAs, miRNA genes, genomic annotation, and miRNA identity in different species [13].

miRandb: miRNA Meta-Database


1.2 miRNA Discovery in a Sequence

Discovering the complete list of miRNAs is the primary step in understanding their role. Several experimental techniques and computational methodologies are available to identify miRNAs in a given sequence. Computational methods identify novel miRNAs based on stem-loop configuration, conservation in different species, and physical stability. Using several computational programs and choosing the potential miRNA sequences identified by most algorithms enhances the precision of prediction [8]. Nonetheless, the results of these computational programs should be experimentally validated, as effective recognition of miRNA sequences requires the integration of both computational and experimental methods [14, 15].
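Choosing the candidates identified by most algorithms amounts to a majority vote, which can be sketched in Python as follows. The tool names and candidate identifiers are placeholders, and the "more than half of the tools" rule is one possible reading of "most algorithms", stated here as an assumption.

```python
from collections import Counter


def consensus_candidates(predictions_by_tool, min_fraction=0.5):
    """Candidates reported by more than min_fraction of the tools.

    predictions_by_tool: dict mapping tool name -> iterable of candidate IDs.
    Each tool contributes at most one vote per candidate.
    """
    votes = Counter()
    for candidates in predictions_by_tool.values():
        votes.update(set(candidates))
    n_tools = len(predictions_by_tool)
    return sorted(c for c, v in votes.items() if v / n_tools > min_fraction)


# Hypothetical output of three miRNA discovery programs.
predictions_by_tool = {
    "tool_1": ["cand_a", "cand_b"],
    "tool_2": ["cand_b", "cand_c"],
    "tool_3": ["cand_b", "cand_a"],
}
consensus = consensus_candidates(predictions_by_tool)
```

Here `cand_c` is dropped because only one of the three programs reports it. As the text stresses, the surviving candidates still require experimental validation.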

1.3 Target Prediction of miRNAs Databases

Because only a small number of miRNA targets have been experimentally validated, approaches are needed that can detect and rank prospective miRNA targets [16]. The existing algorithms are based on several factors, such as mRNA and miRNA complementarity, miRNA–mRNA duplex stability, abundance of target sites in 3′ UTRs, and conservation of miRNA target sites between species [17]. Target prediction resources can be classified into four groups [12]. Single-resource databases present precomputed calculations and cover only one target prediction algorithm. Users can choose a desired miRNA and obtain a list of target gene names, or input a gene name to obtain a list of miRNAs that could target it [18]. User-defined sequence databases are a second class of target prediction resources; they compute potential interactions for any desired pair of miRNA and mRNA sequences. These databases are useful for assessing interactions of novel miRNAs with prospective mRNAs and for evaluating interspecies miRNA–mRNA interactions [19]. Composite resource databases are a third class, which combine the results of several primary databases. Target prediction with any single algorithm produces a large number of results, many of which may be false positives, so there is an essential need to select the most probable targets [20, 21]. One solution to this problem is to integrate predictions from various databases: selecting targets that have been predicted by more algorithms may increase accuracy [22]. Resources that correlate target prediction with expression data are the last class; they refine predictions by comparing them with experimental expression data. There is normally a negative relationship between the expression of a target gene and that of its regulating miRNA, so predicted targets that show a negative correlation with miRNA expression can be considered more reliable [23].
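The integration of predictions from several tools can be illustrated with a simple vote count. The following Python sketch is only an illustration under stated assumptions: the tool names and the gene lists are invented toy data, not real predictions, and the composite databases described here may combine results differently.

```python
from collections import Counter

def consensus_targets(predictions, min_tools):
    """Return the genes predicted by at least `min_tools` of the given tools."""
    # Each tool contributes at most one vote per gene.
    votes = Counter(gene for genes in predictions.values() for gene in set(genes))
    return sorted(gene for gene, n in votes.items() if n >= min_tools)

# Hypothetical prediction lists for one miRNA (toy data, invented memberships).
predictions = {
    "tool_A": {"COL1A1", "ZEB1", "ZEB2"},
    "tool_B": {"ZEB1", "ZEB2", "KEAP1"},
    "tool_C": {"ZEB1", "COL1A1"},
}

union = consensus_targets(predictions, 1)                      # "or" of all tools
majority = consensus_targets(predictions, 2)                   # at least 2 of 3 tools
intersection = consensus_targets(predictions, len(predictions))  # "and" of all tools
```

Raising `min_tools` trades sensitivity for specificity: the union keeps every candidate, whereas the intersection keeps only the targets on which all tools agree.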

1.4 miRNA Target Validation Databases

Recognition of miRNA targets is essential to identify their function, and there are some experimental methods that have been established to authenticate miRNA–mRNA interaction. There are several Web resources that collect and investigate experimentally validated miRNA targets from published articles. For each miRNA–mRNA interaction, these databases include the experimental validation system, cell line/tissue type, and regulation type [9, 24].

1.5 Expression of miRNA and Its Regulation Databases

miRNA research has progressed quickly during the past few years and has produced a large amount of miRNA expression data from various organisms, cell lines, tissues, and diseases. Comparative analysis of miRNA expression can reveal the important functions of miRNAs in healthy and pathological conditions. These databases include various kinds of expression data, such as the expression of each miRNA in different cell lines and tissues, the expression of different miRNAs in each cell or tissue, and comparisons between tissues and cell lines [20, 25]. miRNAs also take part in transcription factor (TF) signaling networks, where they exert their function either by affecting TF activity directly or by acting in collaboration with TFs. The major emphasis of these databases is on the identification of TFs, the promoter regions of miRNA genes, and transcription start sites [26].

1.6 Databases of Function of miRNAs and Their Roles in Diseases

Determining the function of a miRNA in a cell relies on its targets and is an important step toward a holistic view of its role. miRNAs can initiate or suppress different cellular processes by controlling molecular interaction complexes. The Gene Ontology (GO) resource catalogs many of these mechanisms in association with the target mRNAs of a specific miRNA. Because the role of each gene has already been annotated, computational assessment can be used to infer miRNA function in molecular pathways and diseases from the GO annotation of the miRNA targets. Usually, function databases use various target prediction tools or validated targets to generate lists of genes that are targeted by a chosen miRNA and then, based on GO, determine the resulting function [27]. Numerous studies have confirmed that miRNA expression profiles vary in disease conditions, so they could be used as disease biomarkers. Additionally, miRNAs play important roles in the pathogenic process of some diseases; in cancer, for example, several miRNAs serve as tumor suppressors or oncogenes, promoting disease control or progression, respectively. Several miRNA resources extract, collect, and categorize miRNA–disease associations from the literature or from prediction algorithms. In these databases, users can perform searches by miRNA ID, disease name, or gene name [28].
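One common way to infer function from the GO annotation of a target list is an over-representation test. The sketch below uses a one-sided hypergeometric test, which is an assumption made here for illustration; the chapter does not state which statistic the function databases apply. The gene identifiers and the GO term membership are invented toy data.

```python
from math import comb

def hypergeom_pvalue(k, n_targets, n_annotated, n_background):
    """P(X >= k) when n_targets genes are drawn from n_background genes,
    n_annotated of which carry the GO term of interest."""
    upper = min(n_targets, n_annotated)
    total = comb(n_background, n_targets)
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_targets - i)
        for i in range(k, upper + 1)
    )
    return tail / total

def go_enrichment(targets, term_genes, background):
    """Return (overlap size, p-value) for one GO term and one target list."""
    background = set(background)
    targets = set(targets) & background
    term_genes = set(term_genes) & background
    k = len(targets & term_genes)
    return k, hypergeom_pvalue(k, len(targets), len(term_genes), len(background))

# Toy example: 20 background genes, 5 annotated with the term, 4 miRNA targets.
background = {f"G{i}" for i in range(20)}
term_genes = {"G0", "G1", "G2", "G3", "G4"}
targets = {"G0", "G1", "G2", "G10"}

overlap, p_value = go_enrichment(targets, term_genes, background)
```

In practice many GO terms are tested at once, so the resulting p-values would also need a multiple-testing correction, which this sketch omits.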


1.7 Databases of miRNA Interactions in Pathways and Networks

This category of databases illustrates the effect of miRNA on a desired pathway and annotates microRNA target genes on the pathway map by incorporating predicted and validated miRNA–mRNA relations’ records with cellular pathway tools. A universal methodology to recognize miRNA function in cellular biology includes inspecting their effects on networks and pathways, rather than focusing on distinct mRNAs. miRNA pathway databases can be searched by miRNA name or pathway name and a list of pathways or miRNAs is then presented, respectively, in the results page [29, 30]. miRNA activity is part of a complex network which determines gene expression through densely linked regulatory elements [31, 32].

1.8 miRNA Deep Sequencing Databases

Next-generation sequencing techniques measure the expression of known miRNAs and discover novel ones. To facilitate working with next-generation sequencing data, several Web-based resources have been developed that analyze sequencing data for researchers working under resource constraints [33, 34].
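The idea of measuring expression from sequencing reads can be sketched as counting the reads that match known mature miRNA sequences. This is a deliberately simplified illustration and not the method of the cited pipelines: it assumes adapter-trimmed reads and exact matches only, and the reads are invented. The two mature sequences are given as commonly listed in miRBase and should be checked there before reuse.

```python
from collections import Counter

def count_known_mirnas(reads, known):
    """Count reads that exactly match a known mature miRNA sequence.

    Returns (counts per miRNA name, counts of unmatched read sequences).
    """
    by_sequence = {seq: name for name, seq in known.items()}
    counts = Counter()
    unmatched = Counter()
    for read in reads:
        rna = read.upper().replace("T", "U")  # reads are usually reported as DNA
        name = by_sequence.get(rna)
        if name is None:
            unmatched[rna] += 1  # candidates for novel-miRNA analysis
        else:
            counts[name] += 1
    return counts, unmatched

known = {
    "hsa-let-7a-5p": "UGAGGUAGUAGGUUGUAUAGUU",
    "hsa-miR-200a-3p": "UAACACUGUCUGGUAACGAUGU",
}
# Invented, already adapter-trimmed reads.
reads = (
    ["TGAGGTAGTAGGTTGTATAGTT"] * 3
    + ["TAACACTGTCTGGTAACGATGT"]
    + ["ACGTACGTACGTACGTACGT"] * 2
)

counts, unmatched = count_known_mirnas(reads, known)
```

Real analyses also handle sequencing errors, isomiRs, and normalization between samples, which is part of what the Web-based resources take care of.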

2 Materials

2.1 Hardware

A computer running Windows, macOS, or Unix, with access to the Internet.

2.2 Software

Internet browser: a modern Web browser, such as Google Chrome (http://www.google.com/chrome), Internet Explorer (http://www.microsoft.com/windows/internet-explorer/worldwide-sites.aspx), Firefox (http://www.mozilla.org/firefox), or Safari (http://www.apple.com/safari).

3 Methods

The methods presented in this chapter explain how to use the miRandb Web resource to search miRNA online resources by two different methods, inspect their features, compare the tools in each category, and search the subcategories of target prediction resources (see Note 1).

3.1 Investigation of Various Tools in Each Category of the miRandb Database

The miRandb resource implements a convenient procedure for easy data retrieval. By means of miRNA category resources, users can pick any class of tools and obtain all the resources in that group. The following steps explain how to work with the miRandb website to search, analyze, and compare the databases in the category of miRNA roles in diseases:

1. Go to miRNA category resources and click “Resources for miRNA functions and roles in diseases” to open the page (Fig. 1a).
2. On the next page, select “Resources of miRNA roles in various diseases.”
3. All the databases in this category are presented on this page, so that they can be compared and the most appropriate tool for your study can be selected.
4. From the results, select the miR2Disease database; a description of its characteristics is displayed (Fig. 1b).
5. The result page contains several kinds of data. The Activity field indicates whether the tool is functional (active) or nonfunctional (inactive). The Description section introduces the features of the tool and the basis of the resource. The Main Feature field indicates the main characteristic of the resource, and Unique Features lists the properties that are specific to the tool. The Organism section shows the organisms included in the resource, while the URL and Publication fields display the Web address and the articles describing the tool (Fig. 1b).
6. The Category field displays the kind of data and other types of information that the resource covers. The Last observed field shows the time of the latest data update, and the Year published field presents the publication year of the article. The Citations per year field shows the Scopus citations of the tool divided by the number of years since publication. The Last update and Updated over past 3 years fields display the last update of the resource and the update status of the tool, respectively. Relative popularity provides a ranking of the resources based on the Alexa site (Fig. 1b) (see Notes 2 and 3).

Fig. 1 (a) Finding proper databases using miRNA category resources. (b) miR2Disease database information types

3.2 Selection of Desired Tools in Every Category of miRandb Database

Using the database search section, researchers can select any category of databases and filter the results by the desired properties of the tools. The following procedure describes how to use the miRandb site to investigate databases of miRNA interactions in pathways and to filter for active databases that include human data, were published in 2015, and have been updated over the last 3 years:

1. Go to database search and select “Resources for miRNA interactions in pathways” from the category field (Fig. 2a).
2. From the activity field, select the “active” status, and select “human” from the organism section.
3. In the year published field, select “2015,” and from updated over past 3 years choose “Yes.”
4. Click the search button; in the results, the DIANA_miRPath v3.0 database is presented with an explanation of its specifications (Fig. 2b).

Fig. 2 (a) Finding suitable tools with database search. (b) DIANA_miRPath v3.0 database information types
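The logic of the database search form can be pictured as a conjunction of filters over tool records. The sketch below is a hypothetical model, not the miRandb implementation: the record keys simply mirror the form fields named above, and the two "example_tool" records are invented. Only the DIANA_miRPath v3.0 values follow from the procedure described in this section.

```python
def filter_resources(resources, category=None, active=None, organism=None,
                     year_published=None, updated_last_3_years=None):
    """Return the names of records that satisfy every criterion that is set.

    A criterion left as None is ignored, like an unselected form field.
    """
    selected = []
    for r in resources:
        if category is not None and r["category"] != category:
            continue
        if active is not None and r["active"] != active:
            continue
        if organism is not None and organism not in r["organisms"]:
            continue
        if year_published is not None and r["year_published"] != year_published:
            continue
        if (updated_last_3_years is not None
                and r["updated_last_3_years"] != updated_last_3_years):
            continue
        selected.append(r["name"])
    return selected

PATHWAYS = "Resources for miRNA interactions in pathways"
resources = [
    {"name": "DIANA_miRPath v3.0", "category": PATHWAYS, "active": True,
     "organisms": {"human"}, "year_published": 2015, "updated_last_3_years": True},
    # Invented records, present only to show the filters at work.
    {"name": "example_tool_1", "category": PATHWAYS, "active": False,
     "organisms": {"human"}, "year_published": 2015, "updated_last_3_years": False},
    {"name": "example_tool_2", "category": PATHWAYS, "active": True,
     "organisms": {"mouse"}, "year_published": 2012, "updated_last_3_years": True},
]

hits = filter_resources(resources, category=PATHWAYS, active=True,
                        organism="human", year_published=2015,
                        updated_last_3_years=True)
```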

3.3 Prediction of miRNA Targets and Prediction of miRNAs Targeting a Specific Gene Using Single Resource-Databases

This section explains how to work with miRandb to search the single-resource target prediction tools and to screen for active tools that contain human information and have been updated over the past 3 years. We then search the TargetScan database [18] to find candidate targets for hsa-mir-200a and to predict miRNAs targeting the COL1A1 (collagen, type I, alpha 1) gene:

1. Go to database search and choose “single resource target prediction tools” from the category section (Fig. 2a).
2. Select “active” from the activity field, choose “human” from the organism section, and select “Yes” from updated over past 3 years.
3. Click on search and then select the TargetScan database from the list.
4. Go to http://www.targetscan.org (Fig. 3a) and select the species “Human” in field 3.A. Enter the miRNA name “hsa-miR-200a-3p” into field 3.B and click “submit.” The result window (Fig. 3b) contains the genes targeted by hsa-miR-200a-3p and their conserved and poorly conserved sites.
5. Go to http://www.targetscan.org and select the species in field 3.A. Enter the gene name “COL1A1” into field 3.C and click submit (Fig. 3a). The result window (Fig. 4a) contains the miRNAs targeting the gene and its conserved and poorly conserved sites. To obtain the results in table format, click on “View table of miRNA sites” in field 4.A. The table format is shown in Fig. 4b, presenting the miRNAs targeting the desired gene.

Fig. 3 (a) The TargetScan homepage and (b) predicted targets for hsa-miR-200a-3p
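The complementarity on which such predictions rest can be illustrated by a plain seed search. The sketch below looks for 7mer-m8 sites, that is, stretches of the target that are perfectly complementary to miRNA nucleotides 2 to 8. It is a minimal illustration and not the TargetScan algorithm, which also weighs site context and conservation. The miRNA sequence is hsa-miR-200a-3p as commonly listed in miRBase (verify before reuse), and the target sequence is an invented toy string, not the COL1A1 3′ UTR.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def seed_site(mirna):
    """Target-side sequence that pairs with miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]
    # Reverse complement, because the duplex is antiparallel.
    return seed.translate(COMPLEMENT)[::-1]

def find_seed_sites(mirna, utr):
    """Return 1-based start positions of all 7mer-m8 seed sites in `utr`."""
    utr = utr.upper().replace("T", "U")
    site = seed_site(mirna)
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start + 1)
        start = utr.find(site, start + 1)  # allow overlapping matches
    return positions

MIR200A_3P = "UAACACUGUCUGGUAACGAUGU"   # assumed mature sequence, see lead-in
TOY_UTR = "GGACAGUGUUACCCCAGUGUUGG"     # invented target fragment

site = seed_site(MIR200A_3P)
sites = find_seed_sites(MIR200A_3P, TOY_UTR)
```

A seed hit alone does not establish a functional site, which is why the result pages distinguish conserved from poorly conserved sites.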

Fig. 4 (a) TargetScan prediction for miRNAs targeting the COL1A1 gene and (b) the table of miRNAs targeting the COL1A1 gene

3.4 User Gene and miRNA Sequence Interaction Prediction Using User Defined Sequence Target Prediction Tools

Using miRNA category resources, researchers can select any group of databases and find all the tools in that class. This section describes how to use miRandb to investigate target prediction tools that accept user-defined sequences. The following steps illustrate how to predict the interaction of the COL1A1 3′ UTR and hsa-miR-29b-3p with the RNAhybrid tool [19]:

1. Go to miRNA category and click “miRNA target prediction resources” to open the page.
2. On the subsequent page, choose “User specific sequence.”
3. All the tools in this group are displayed; RNAhybrid is selected for evaluation. Go to https://bibiserv.cebitec.uni-bielefeld.de/rnahybrid?id=rnahybrid_view_submission (Fig. 5), select the “copy & paste” item as the data input method (field 5.A), and enter the sequence of the target mRNA COL1A1 3′ UTR in field 5.B and the desired miRNA hsa-miR-29b-3p in field 5.C. All sequences should be in FASTA format. Click the submit button and, on the next page, select the source of the target sequence (fly, worm, or human) in field 5.D, which provides an approximate estimate of the p-values. The next window shows the target along with the miRNA interactions and their minimum free energy (mfe). Interactions with a more negative mfe are more stable (5.E).

Fig. 5 RNAhybrid database predicting interaction between COL1A1 3′ UTR and hsa-miR-29b-3p

3.5 Prediction of miRNAs Targeting a Specific Gene Using Composite Resource Databases

This section explains how to use miRandb to investigate composite resource databases and to filter for active databases that include human data and have been updated over the last 3 years. We then use the miRWalk 2.0 database [21] to predict the miRNAs targeting COL1A1 that are supported by the majority of the prediction algorithms:

1. Go to database search and choose “Composite target prediction resource tools” from the category field (Fig. 2a).
2. From the activity field, the organism section, and updated over past 3 years, choose “active,” “human,” and “Yes,” respectively.
3. Click on search and, in the results, choose the miRWalk 2.0 tool.
4. Go to http://zmf.umm.uni-heidelberg.de/apps/zmf/mirwalk2/, click on “Predicted Target Module,” and select “Gene-miRNA Targets.” On the next page (Fig. 6), select the “human” species in field 6.A, select “Gene” from search database in field 6.B, enter the gene name “COL1A1” (6.C), choose “Official symbol” from field 6.D, select the “3′-UTR” miRNA binding region (5′-UTR, CDS, or 3′-UTR) (6.E), choose the desired algorithms from the 12 prediction programs, which can be combined by “or” or “and” (6.F), and click on search. On the next page, the results of prediction by miRWalk and the chosen algorithms are presented; click 3′-UTR of chosen algorithms (6.G). The next window shows the predictions of the 12 programs for each miRNA. To download the results in Excel format, click on “Download Complete Table” (6.H). For miRNA analysis in the miRWalk database, click on “Predicted Target Module” and select “MicroRNA-gene Targets.”

Fig. 6 (a) The miRWalk search page, (b) intermediate result page, and (c) miRWalk results of the miRNAs targeting the COL1A1 gene

3.6 Specific miRNA Target Prediction Using Target Prediction Correlated with Expression Data

By means of miRNA category resources, researchers can select any group of databases and find all the tools in that class. The following steps describe how to use miRandb to search databases of target prediction correlated with expression data. We predict target genes of hsa-mir-200a via the TargetScan algorithm, using negative correlation with the expression data, by means of the mimiRNA resource [23]:

1. Go to miRNA category and click “miRNA target prediction resources” to open the page.
2. On the subsequent page, choose “Target prediction and correlation with expression data.”
3. Go to http://mimirna.centenary.org.au, click on item 5 “Which genes does my microRNA target?” (Fig. 7), select the desired miRNA (hsa-mir-200a) (7.A), select a p-value cutoff (7.B), click on the desired algorithm (TargetScan or Miranda) to integrate with the expression data (7.C), and finally click on “Find correlated genes.” On the result page (7.D), the genes marked with the “>>>” sign are those that are predicted by the chosen algorithm and are positively or negatively correlated with the expression data.

Fig. 7 The mimiRNA search page and (red lines) mimiRNA database results of target genes of hsa-mir-200a via the TargetScan algorithm with negative correlation with the expression data
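The reasoning behind this class of resources, keeping predicted targets whose expression is negatively correlated with that of the miRNA, can be sketched as follows. This is an illustration under stated assumptions and not the mimiRNA implementation: it uses a plain Pearson coefficient with a hypothetical cutoff of r ≤ −0.5 instead of a p-value cutoff, and the expression values and gene names are invented.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / (sd_x * sd_y)

def anticorrelated_targets(mirna_expr, gene_expr, predicted, max_r=-0.5):
    """Keep predicted targets whose correlation with the miRNA is <= max_r."""
    kept = {}
    for gene in predicted:
        r = pearson(mirna_expr, gene_expr[gene])
        if r <= max_r:
            kept[gene] = r
    return kept

# Invented expression profiles across five samples.
mirna_expr = [1, 2, 3, 4, 5]
gene_expr = {
    "gene_A": [10, 8, 6, 4, 2],   # falls as the miRNA rises
    "gene_B": [1, 2, 3, 4, 5],    # rises with the miRNA
    "gene_C": [3, 1, 4, 1, 5],    # weak positive trend
}
predicted = ["gene_A", "gene_B", "gene_C"]

supported = anticorrelated_targets(mirna_expr, gene_expr, predicted)
```

Only `gene_A` survives the filter in this toy case; with real data the number of samples and the significance of each coefficient would also have to be considered.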

4 Notes

1. Users can search miRandb either by database category or by filtered results, which are obtained from the category resources part and the database search section, respectively.
2. The relative popularity reflects the recognition of each database; a smaller ranking number indicates that the database is more popular.
3. The citations per year, last update, and relative popularity items can help researchers to select the most suitable tool on the basis of several ranking systems.

Acknowledgments

This research is supported by Mashhad University of Medical Sciences (No. 941245, 950909).

References

1. Zare M, Soleimani M, Akbarzadeh A, Bakhshandeh B, Aghaee-Bakhtiari SH, Zarghami N (2015) A novel protocol to differentiate induced pluripotent stem cells by neuronal microRNAs to provide a suitable cellular model. Chem Biol Drug Des 86(2):232–238. https://doi.org/10.1111/cbdd.12485
2. Fallah P, Arefian E, Naderi M, Aghaee-Bakhtiari SH, Atashi A, Ahmadi K, Shafiee A, Soleimani M (2013) miR-146a and miR-150 promote the differentiation of CD133+ cells into T-lymphoid lineage. Mol Biol Rep 40(8):4713–4719. https://doi.org/10.1007/s11033-013-2567-6
3. Rahimian A, Soleimani M, Kaviani S, Aghaee-Bakhtiari SH, Atashi A, Arefian E, Nikougoftar M (2011) Bypassing the maturation arrest in myeloid cell line U937 by over-expression of microRNA-424. Hematology 16(5):298–302. https://doi.org/10.1179/102453311X13085644680140
4. Attar M, Arefian E, Nabiuni M, Adegani FJ, Bakhtiari SH, Karimi Z, Barzegar M, Soleimani M (2012) MicroRNA 17-92 expressed by a transposone-based vector changes expression level of cell-cycle-related genes. Cell Biol Int 36(11):1005–1012. https://doi.org/10.1042/CBI20110089
5. Aghaee-Bakhtiari SH, Arefian E, Soleimani M, Noorbakhsh F, Samiee SM, Fard-Esfahani P, Mahdian R (2016) Reproducible and reliable real-time PCR assay to measure mature form of miR-141. Appl Immunohistochem Mol Morphol 24(2):138–143. https://doi.org/10.1097/PAI.0000000000000169
6. Aghaee-Bakhtiari SH, Arefian E, Naderi M, Noorbakhsh F, Nodouzi V, Asgari M, Fard-Esfahani P, Mahdian R, Soleimani M (2015) MAPK and JAK/STAT pathways targeted by miR-23a and miR-23b in prostate cancer: computational and in vitro approaches. Tumour Biol 36(6):4203–4212. https://doi.org/10.1007/s13277-015-3057-3
7. Arefian E, Kiani J, Soleimani M, Shariati SA, Aghaee-Bakhtiari SH, Atashi A, Gheisari Y, Ahmadbeigi N, Banaei-Moghaddam AM, Naderi M, Namvarasl N, Good L, Faridani OR (2011) Analysis of microRNA signatures using size-coded ligation-mediated PCR. Nucleic Acids Res 39(12):e80. https://doi.org/10.1093/nar/gkr214
8. Tan Gana NH, Victoriano AF, Okamoto T (2012) Evaluation of online miRNA resources for biomedical applications. Genes Cells 17(1):11–27. https://doi.org/10.1111/j.1365-2443.2011.01564.x
9. Vlachos IS, Hatzigeorgiou AG (2013) Online resources for miRNA analysis. Clin Biochem 46(10–11):879–900. https://doi.org/10.1016/j.clinbiochem.2013.03.006
10. Akhtar MM, Micolucci L, Islam MS, Olivieri F, Procopio AD (2016) Bioinformatic tools for microRNA dissection. Nucleic Acids Res 44(1):24–44. https://doi.org/10.1093/nar/gkv1221
11. Lukasik A, Wojcikowski M, Zielenkiewicz P (2016) Tools4miRs - one place to gather all the tools for miRNA analysis. Bioinformatics 32(17):2722–2724. https://doi.org/10.1093/bioinformatics/btw189
12. Aghaee-Bakhtiari SH, Arefian E, Lau P (2017) miRandb: a resource of online services for miRNA research. Brief Bioinform. https://doi.org/10.1093/bib/bbw109
13. Kozomara A, Griffiths-Jones S (2014) miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res 42(Database issue):D68–D73. https://doi.org/10.1093/nar/gkt1181
14. Mendes ND, Freitas AT, Sagot MF (2009) Current tools for the identification of miRNA genes and their targets. Nucleic Acids Res 37(8):2419–2433. https://doi.org/10.1093/nar/gkp145
15. Gomes CP, Cho JH, Hood L, Franco OL, Pereira RW, Wang K (2013) A review of computational tools in microRNA discovery. Front Genet 4:81. https://doi.org/10.3389/fgene.2013.00081
16. Hammell M (2010) Computational methods to identify miRNA targets. Semin Cell Dev Biol 21(7):738–744. https://doi.org/10.1016/j.semcdb.2010.01.004
17. Huang Y, Zou Q, Song H, Song F, Wang L, Zhang G, Shen X (2010) A study of miRNAs targets prediction and experimental validation. Protein Cell 1(11):979–986. https://doi.org/10.1007/s13238-010-0129-4
18. Agarwal V, Bell GW, Nam JW, Bartel DP (2015) Predicting effective microRNA target sites in mammalian mRNAs. eLife 4. https://doi.org/10.7554/eLife.05005
19. Kruger J, Rehmsmeier M (2006) RNAhybrid: microRNA target prediction easy, fast and flexible. Nucleic Acids Res 34(Web Server issue):W451–W454. https://doi.org/10.1093/nar/gkl243
20. Schmitz U, Wolkenhauer O (2013) Web resources for microRNA research. Adv Exp Med Biol 774:225–250. https://doi.org/10.1007/978-94-007-5590-1_12
21. Dweep H, Gretz N (2015) miRWalk2.0: a comprehensive atlas of microRNA-target interactions. Nat Methods 12(8):697. https://doi.org/10.1038/nmeth.3485
22. Min H, Yoon S (2010) Got target? Computational methods for microRNA target prediction and their extension. Exp Mol Med 42(4):233–244. https://doi.org/10.3858/emm.2010.42.4.032
23. Ritchie W, Flamant S, Rasko JE (2010) mimiRNA: a microRNA expression profiler and classification resource designed to identify functional correlations between microRNAs and their targets. Bioinformatics 26(2):223–227. https://doi.org/10.1093/bioinformatics/btp649
24. Vlachos IS, Paraskevopoulou MD, Karagkouni D, Georgakilas G, Vergoulis T, Kanellos I, Anastasopoulos IL, Maniou S, Karathanou K, Kalfakakou D, Fevgas A, Dalamagas T, Hatzigeorgiou AG (2015) DIANA-TarBase v7.0: indexing more than half a million experimentally supported miRNA:mRNA interactions. Nucleic Acids Res 43(Database issue):D153–D159. https://doi.org/10.1093/nar/gku1215
25. Hausser J, Berninger P, Rodak C, Jantscher Y, Wirth S, Zavolan M (2009) MirZ: an integrated microRNA expression atlas and target prediction resource. Nucleic Acids Res 37(Web Server issue):W266–W272. https://doi.org/10.1093/nar/gkp412
26. Le Bechec A, Portales-Casamar E, Vetter G, Moes M, Zindy PJ, Saumet A, Arenillas D, Theillet C, Wasserman WW, Lecellier CH, Friederich E (2011) MIR@NT@N: a framework integrating transcription factors, microRNAs and their targets to identify sub-network motifs in a meta-regulation network model. BMC Bioinformatics 12:67. https://doi.org/10.1186/1471-2105-12-67
27. Lagana A, Forte S, Giudice A, Arena MR, Puglisi PL, Giugno R, Pulvirenti A, Shasha D, Ferro A (2009) miRo: a miRNA knowledge base. Database (Oxford) 2009:bap008. https://doi.org/10.1093/database/bap008
28. Jiang Q, Wang Y, Hao Y, Juan L, Teng M, Zhang X, Li M, Wang G, Liu Y (2009) miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res 37(Database issue):D98–D104. https://doi.org/10.1093/nar/gkn714
29. Chiromatzo AO, Oliveira TY, Pereira G, Costa AY, Montesco CA, Gras DE, Yosetake F, Vilar JB, Cervato M, Prado PR, Cardenas RG, Cerri R, Borges RL, Lemos RN, Alvarenga SM, Perallis VR, Pinheiro DG, Silva IT, Brandao RM, Cunha MA, Giuliatti S, Silva WA Jr (2007) miRNApath: a database of miRNAs, target genes and metabolic pathways. Genet Mol Res 6(4):859–865
30. Vlachos IS, Zagganas K, Paraskevopoulou MD, Georgakilas G, Karagkouni D, Vergoulis T, Dalamagas T, Hatzigeorgiou AG (2015) DIANA-miRPath v3.0: deciphering microRNA function with experimental support. Nucleic Acids Res 43(W1):W460–W466. https://doi.org/10.1093/nar/gkv403
31. Sales G, Coppe A, Bisognin A, Biasiolo M, Bortoluzzi S, Romualdi C (2010) MAGIA, a web-based tool for miRNA and genes integrated analysis. Nucleic Acids Res 38(Web Server issue):W352–W359. https://doi.org/10.1093/nar/gkq423
32. Huang GT, Athanassiou C, Benos PV (2011) mirConnX: condition-specific mRNA-microRNA network integrator. Nucleic Acids Res 39(Web Server issue):W416–W423. https://doi.org/10.1093/nar/gkr276
33. Huang PJ, Liu YC, Lee CC, Lin WC, Gan RR, Lyu PC, Tang P (2010) DSAP: deep-sequencing small RNA analysis pipeline. Nucleic Acids Res 38(Web Server issue):W385–W391. https://doi.org/10.1093/nar/gkq392
34. Wu J, Liu Q, Wang X, Zheng J, Wang T, You M, Sheng Sun Z, Shi Q (2013) mirTools 2.0 for non-coding RNA discovery, profiling and functional annotation based on high-throughput sequencing. RNA Biol 10(7):1087–1092

Chapter 3

Sfold Tools for MicroRNA Target Prediction

William Rennie, Shaveta Kanoria, Chaochun Liu, C. Steven Carmack, Jun Lu, and Ye Ding

Abstract

Computational prediction of miRNA binding sites on target mRNAs facilitates experimental investigation of miRNA functions. In this chapter, we describe STarMir and STarMirDB, two application modules of the Sfold RNA package. STarMir is a Web server for performing miRNA binding site predictions for mRNA and target sequences submitted by users. STarMirDB is a database of precomputed transcriptome-scale predictions. Both STarMir and STarMirDB provide comprehensive sequence, thermodynamic, and target structure features, a logistic probability as a measure of confidence for each predicted site, and a publication-quality diagram of the predicted miRNA–target hybrid. In addition, STarMir now offers a new quantitative score to address combined regulatory effects of multiple seed and seedless sites. This score provides a quantitative measure of the overall regulatory effects of both seed and seedless sites on the target. STarMir and STarMirDB are freely available to all through the Sfold Web application server at http://sfold.wadsworth.org.

Key words miRNA, CLIP, Target mRNA, RNA secondary structure, miRNA binding site, Efficacy, Database, Quantitative score

Alessandro Laganà (ed.), MicroRNA Target Identification: Methods and Protocols, Methods in Molecular Biology, vol. 1970, https://doi.org/10.1007/978-1-4939-9207-2_3, © Springer Science+Business Media, LLC, part of Springer Nature 2019

1 Introduction

MicroRNAs (miRNAs) are a class of naturally occurring, small noncoding RNAs (ncRNAs) ~22 nucleotides (nt) in length that have been found in plants, animals, and some viruses [1, 2]. Mature miRNAs exert their function by guiding the RNA-induced silencing complex (RISC) to partially complementary sequences that are typically located in the 3′ untranslated regions (3′ UTRs) of target mRNAs. Target binding by miRNA induces translational repression and/or mRNA degradation of the target mRNA [1, 2]. miRNA-mediated gene regulation is rather extensive, as one miRNA may regulate hundreds of targets, whereas an individual mRNA can be targeted by multiple miRNAs [3]. miRNAs play important roles in several biological processes such as development, differentiation, apoptosis, and proliferation [2, 4]. Moreover, misregulation in miRNA activity has been found to be associated with numerous human diseases [5, 6].

Most existing algorithms for miRNA target prediction are primarily based on the seed rule [7]. With the development of the cross-linking immunoprecipitation (CLIP) technique [8], it has become possible to identify short Argonaute (AGO) cross-linked sequences containing miRNA binding sites. CLIP involves UV irradiation of tissues, organisms, or cells to covalently cross-link miRNA targets to the AGO proteins (the catalytic component of the RISC complex). The cross-linked RNAs are shortened by partial digestion with RNase to ~50 nt and further amplified by RT-PCR. The shortened RNA fragments are then sequenced for the identification of AGO tags containing miRNA binding sites on the target mRNAs. Several CLIP studies have been published in recent years, which include HITS-CLIP for mouse brain [8], PAR-CLIP in human cell lines [9], variants of PAR-CLIP [10], and a study in worm [11]. These CLIP studies provide a genome-wide map of miRNA target interactions by generating short target fragments containing miRNA binding sites.

High-throughput data from the CLIP studies have been used successfully to develop logistic models for improved miRNA binding site prediction [12]. These models are based on a comprehensive list of sequence, thermodynamic, and target structure features that were found to be enriched for miRNA binding sites identified by CLIP, and were validated by intra-dataset, inter-dataset, and cross-species validations [12]. The models have been implemented in the STarMir application module of the Sfold RNA package, which predicts miRNA binding sites on a target mRNA [13]. The models have also been employed to create STarMirDB, a database of precomputed transcriptome-scale predictions [14]. This chapter provides an updated description of the protocol for using the STarMir and STarMirDB Web applications, which can be freely accessed at http://sfold.wadsworth.org.

2 Materials

STarMir and STarMirDB are available at http://sfold.wadsworth.org as application modules of the Sfold RNA package. STarMir can be accessed via a Web browser such as Safari, Chrome, or Firefox. The user can use the Web service either by providing the miRBase ID for the microRNA and the RefSeq ID for the mRNA, or by submitting their miRNA and target sequences. STarMirDB searches can be performed by providing a miRBase ID, RefSeq ID, or gene ID.

Sfold microRNA Target Prediction

3 Methods

STarMir is a Web application that can be used to generate prediction data for sequences not included in STarMirDB. STarMir now provides a new quantitative score to address combined regulatory effects of multiple seed and seedless sites. This score, developed based on an in-house unpublished high-throughput luciferase reporter dataset including 461 miRNAs and 11 3′ UTRs for a total of 4994 miRNA–3′ UTR pairs, provides a quantitative measure of the overall regulatory effects of both seed and seedless sites for one miRNA on a given target 3′ UTR (see Note 1). Both the details of the luciferase reporter data and the development of the score method will be described in a separate text and published elsewhere. It is our intention to also add the score to the STarMirDB database in the future.

3.1 STarMirDB

STarMirDB is a Web application based on a searchable database of precomputed miRNA–target pairs. The user can search the database by any combination of miRBase ID and Gene ID/RefSeq ID.

3.1.1 Input Page

Figure 1 shows the STarMirDB input page. A target species can be selected from the drop-down menu. One or more microRNAs may also be selected from the provided menu. Alternatively, all or part of the miRBase ID for the desired microRNA(s) may be entered into the miRNA search box. Multiple IDs and/or partial IDs can be entered. If a partial ID is entered, all IDs in the database containing that string will be matched. If no microRNAs are selected or entered, results for all miRNAs matching the given targets will be returned. The RefSeq IDs or gene symbols for the desired targets are entered in the target search window. Multiple IDs may be entered. Radio buttons on the input page allow the user to view either all the features computed for the binding sites or a selection of the most relevant and useful features. The logistic probability computed by our algorithm is a measure of confidence in a binding site: the higher the probability, the more likely the site is a miRNA binding site. The database contains all sites with probabilities above 0.5. Even with this threshold, the result sets are large. The user can limit the search to high-confidence sites by selecting a higher logistic probability threshold in the provided box. Sites with higher logistic probability (e.g., 0.75) are more likely to be experimentally supported (see Note 2).
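How a logistic probability arises from site features, and how a threshold narrows a result set, can be sketched in a few lines. The feature names, coefficients, and intercept below are hypothetical placeholders chosen for illustration; they are not the STarMir model, whose features and fitted weights are described in the cited publications. The site records are likewise invented.

```python
from math import exp

def logistic_probability(features, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    z = intercept + sum(coefficients[name] * value
                        for name, value in features.items())
    return 1.0 / (1.0 + exp(-z))

def high_confidence(sites, threshold=0.75):
    """Keep only the sites whose logistic probability reaches the threshold."""
    return [site for site in sites if site["probability"] >= threshold]

# Hypothetical model and one hypothetical site (not STarMir parameters).
coefficients = {"seed_match": 1.5, "site_accessibility": 0.5}
intercept = -2.0
example = logistic_probability(
    {"seed_match": 1.0, "site_accessibility": 1.0}, coefficients, intercept
)

# Invented search results, all above the database floor of 0.5.
sites = [
    {"site": "site_1", "probability": 0.55},
    {"site": "site_2", "probability": 0.80},
    {"site": "site_3", "probability": 0.75},
]
selected = [site["site"] for site in high_confidence(sites)]
```

Raising the threshold shortens the list at the cost of discarding some true sites, so the choice depends on how many candidates can be followed up experimentally.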

3.1.2 Output Page

Figure 2 shows the STarMirDB output page. The “Interactive Site Viewer” window displays the binding sites returned by the search. The binding sites returned are divided into several pages by region (the 50 UTR, the coding region and the 30 UTR) and by seed and

34

William Rennie et al.

Fig. 1 The STarMirDB input page. The user may enter information using drop-down boxes, or by entering data in text fields. The search can be limited by selecting a logistic probability threshold from the lowest drop-down box

seedless sites. The pages can be selected using the tabs near the top of the viewer window. The figure displays the “most relevant” output fields. In addition to energy measures, the display includes site conservation, conservation of the seed region, and logistic probability. The last

Sfold microRNA Target Prediction

35

Fig. 2 The STarMirDB output page. The page displays the search results in the upper “Interactive Viewer” window that scrolls both vertically or horizontally. The lower window allows the user to download the full search results in a tab-separated file format. The user can limit the number of hits displayed in the window with the uppermost drop-down box. By default, only the 100 sites with the highest logistic probability scores are shown

field, “CLIP,” indicates if the site was supported by CLIP data. The “Hybrid Conformation” field provides a link to a publication-quality PDF diagram of the predicted hybridization conformation of the site. Figure 3 shows examples of these diagrams, one for a site with seed and the other for a seedless site. Below the interactive viewer is a window that includes links to download the data in text format. If the miRNA and target sequences of interest are not present in the database, the user can use the complementary STarMir tool for any set of miRNA and target sequences. The quantitative score is not currently available in the database, and is expected to be included in the next database update.

3.2 STarMir

STarMir is a Web application that can be used to generate prediction data for sequences not present in the STarMirDB database. Using a Web browser, such as Safari, Chrome, or Firefox, users can use the Web service by providing either the miRBase ID for the


[Fig. 3 diagrams: (a) hsa-let-7a-5p paired with NM_000024, target positions 1753–1788, ΔG = −23.2 kcal/mol; (b) hsa-let-7i-5p paired with NM_000024, target positions 1516–1544, ΔG = −21.7 kcal/mol]

Fig. 3 Examples of hybridization diagrams produced by STarMir and STarMirDB, one for a seed site (a) and one for a seedless site (b). Note that the seed region of the miRNA is highlighted in red

microRNA and RefSeq ID for the mRNA, or by submitting their miRNA and the target sequences. Because multiple target sites may exist on a single 3′ UTR for a given miRNA, STarMir now provides a new quantitative score to address combined regulatory effects of multiple seed and seedless sites. The score reflects the predicted efficacy of miRNA-mediated target regulation.

3.2.1 Input Page

Figure 4 illustrates the STarMir Web application sequence input page, with the manual sequence entry option selected for both miRNA and target sequences. The user can input the sequence information for a single miRNA or multiple miRNAs and a single target mRNA for prediction of miRNA–target binding sites. Upon job submission, a link is provided for tracking the progress of the job and for accessing the prediction results. A detailed description of the input is given below.


Fig. 4 The STarMir input page showing manual entry of the target sequence. Alternatively, only the RefSeq ID can be entered in the provided box


3.2.2 Input Page: Model

STarMir predicts miRNA binding sites based on three models built for human (Homo sapiens), mouse (Mus musculus), and worm (Caenorhabditis elegans). These models have been trained on V-CLIP data for human [9], HITS-CLIP data for mouse [8], and ALG-1 CLIP data for worm [11]. The two mammalian models were cross-validated and can be broadly used for other species [12].

3.2.3 Input Page: Species

The user must select a species for prediction to retrieve prestored evolutionary conservation information. If the user enters the RefSeq ID of the target mRNA and selects one of the three model species, the stored conservation information will be used. If the mRNA sequence is entered manually, the selection of species does not have any effect on prediction. Likewise, if “Other” is selected as the species, conservation information is not used in the predictions.

3.2.4 Input Page: miRNA

miRNA information can be provided in two ways. In the default option, one or more miRNA IDs can be entered (an example of a miRNA ID is shown in Fig. 4), for which the sequences are retrieved from an internal database built from release 20 of miRBase [15]. An alternative is to enter one or more miRNA sequences in FASTA format into the input box, or upload a FASTA file (Fig. 4). Although there is no limit to the number of miRNA sequences that can be entered, each sequence must not exceed 55 nt. Any characters other than A, T, G, C, and U in the miRNA sequence are removed.
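Users who prepare FASTA input programmatically may wish to apply the same rules before submission. The following is a minimal sketch, assuming that lowercase letters are converted to uppercase before filtering (the text does not say how case is handled); the function names are illustrative and are not part of STarMir.

```python
import re

MAX_MIRNA_LEN = 55  # per-sequence limit stated in the text


def clean_sequence(raw):
    """Drop every character other than A, T, G, C, and U (after uppercasing)."""
    return re.sub(r"[^ATGCU]", "", raw.upper())


def parse_fasta(text):
    """Return {record name: cleaned sequence} for a multi-record FASTA string."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif line and name is not None:
            records[name] += clean_sequence(line)
    return records


def too_long(records, limit=MAX_MIRNA_LEN):
    """Names of sequences that exceed the length limit."""
    return [name for name, seq in records.items() if len(seq) > limit]


fasta = (
    ">hsa-let-7a-5p\nUGAGGUAGUAGGUUGUAUAGUU\n"
    ">noisy\nugag-guag N123\n"
    ">long\n" + "AU" * 30
)
mirnas = parse_fasta(fasta)
rejected = too_long(mirnas)
```

Here the `noisy` record loses its hyphen, space, digits, and the ambiguity code N, and the 60-nt `long` record is reported as exceeding the limit.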

3.2.5 Input Page: mRNA

The target mRNA information can be entered in three different ways. The default method is to enter the RefSeq ID in the provided input box (Fig. 4). The sequence will then be retrieved from our internal database of mRNA sequences for human and mouse. If the RefSeq ID of the mRNA is provided, evolutionary conservation information [16] will be used to make more accurate miRNA binding site predictions [12]. Alternatively, by selecting the “Manual sequence entry” option, the user can enter the sequence in raw or FASTA format, or upload a FASTA file. If the sequence is uploaded as a FASTA file, the file must not contain more than one sequence. As in the case of miRNA, any character in the mRNA sequences other than A, T, G, C, and U is removed. As the current limit of the Web server on the length of the mRNA sequence is 10,000 nt, longer sequences will be truncated to 10,000 nt starting from the 5′ end. Sequences over 5000 nt take a long time to process (see Note 3). For manual sequence entry, the user must specify whether the entered sequence represents an entire mRNA or a single region, that is, the 5′ UTR, the CDS, or the 3′ UTR, through a drop-down box directly above the sequence input box. If a complete mRNA sequence is given, the nucleotide positions for the start and end of the coding region must be specified in the boxes shown below


the input window (the first nucleotide of an entered sequence is indexed as 1). If the user provides a RefSeq ID for the mRNA, the coding region start and end are retrieved from our internal mRNA database and the binding sites will be predicted for all three mRNA regions.

3.2.6 Input Page: Email Address

Provision of an email address is optional. If an email address is provided, the user will receive a notification once the job is completed. Alternatively, the user can check for job status using the link provided after job submission.
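The rules for manually entered target sequences (Subheadings 3.2.3 and 3.2.5) can be summarized in a short sketch. It assumes that the coding-region end position is inclusive, which the text does not state explicitly, and the function and variable names are hypothetical rather than taken from the server.

```python
MAX_MRNA_LEN = 10_000                       # server limit stated in the text
MODEL_SPECIES = {"human", "mouse", "worm"}  # the three model species


def uses_conservation(species, target_given_as_refseq_id):
    """Conservation is used only for a RefSeq ID plus one of the model species."""
    return target_given_as_refseq_id and species in MODEL_SPECIES


def split_mrna(seq, cds_start, cds_end):
    """Split a complete mRNA into 5' UTR, CDS, and 3' UTR.

    cds_start and cds_end are 1-indexed; cds_end is assumed to be inclusive.
    """
    seq = seq[:MAX_MRNA_LEN]  # keep the first 10,000 nt from the 5' end
    if not 1 <= cds_start <= cds_end <= len(seq):
        raise ValueError("CDS coordinates must lie within the (truncated) sequence")
    return {
        "5'UTR": seq[:cds_start - 1],
        "CDS": seq[cds_start - 1:cds_end],
        "3'UTR": seq[cds_end:],
    }


regions = split_mrna("GGAUGAAAUAGCC", cds_start=3, cds_end=11)
```

Because truncation happens before the coordinate check, a coding region that extends past position 10,000 is rejected rather than silently shortened; this is a design choice of the sketch, not documented server behavior.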

3.2.7 Output

Upon job completion, the results are presented through an interactive viewer. The results are also available as downloadable files. An illustration of a typical output as an interactive viewer is shown in Fig. 5, with the “CDS-seedless” tab and all miRNAs selected for display. The results are categorized as seed and seedless sites for each of the three target mRNA regions, that is, the 5′ UTR, CDS, and the 3′ UTR. Each tab displays the prediction results for one or all of the submitted miRNAs, which can be selected from the drop-down menu. The sites are presented in descending order of their logistic probabilities, as computed by our models. The output presents comprehensive sequence, thermodynamic, and target structure features, including the logistic probability score as the measure of confidence for a predicted site. The final “score” field is a measure of the predicted efficacy of the miRNA–target pair. The score is the linear combination of the contributions that were determined for seed and seedless binding sites, and it is calculated for a microRNA–target pair. Examples are given in Table 1. Note that 8-mer sites, presumably the most effective among all types of sites, do not necessarily ensure a high score (a high score being predictive of effective regulation). On the other hand, large numbers of seedless sites may have a strong impact on the score. For convenience, the score is displayed in the interactive viewer for each binding site. It also appears as a field in the downloadable text files. Additionally, a link is provided to the graphic representation of the hybrid conformation. Examples of hybrid diagrams are shown in Fig. 3. The PDF of the hybrid diagram is also available for visualization and download. A file providing definitions of the features is available via the link for “Feature definitions” below the result table. The results are also provided as downloadable tab-delimited text files, which present all site features calculated by STarMir.
The features marked with an asterisk (*) are the ones that are used in the prediction model. The prediction models are based on the features that were enriched in the CLIP experiments [12]. A text file is provided for each of the six categories, as shown in different tabs. Alternatively, all the results can also be


Fig. 5 Output for the STarMir Web application. An interactive window displays the predicted binding sites for one or all the input miRNA sequences. The full results can also be downloaded with the links provided in the bottom window

Table 1 Examples of quantitative scores with number of different types of binding sites

Transcript name   microRNA name   Number of 8-mer sites   Number of 7-mer sites   Number of 6-mer sites   Number of seedless sites   Score
hKLF4             hsa-mir153-2    0                       0                       2                       13                         0.037326758
hDICER            hsa-let-7c      0                       2                       3                       319                        0.470307676
mBMI1             hsa-mir154      1                       0                       4                       77                         0.144362275
hOCT4             hsa-mir335      1                       0                       0                       19                         0.021329511
hLIN28            hsa-mir-92      0                       1                       6                       196                        0.282444367

downloaded as a compressed archive, including a text version of the hybrid conformation diagrams for each site and a file showing the probability of each nucleotide in the site to be unpaired or single-stranded. The score is calculated for each miRNA–target pair and, for convenience only, is displayed in the interactive viewer for each binding site of the pair. For example, in Fig. 5, all five sites for


the hsa-let-7a-3p:NM_017589 pair have a score of 0.023. “Score” also appears as a field in the downloadable text files.
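The point that an 8-mer site does not guarantee a high score can be checked directly against the Table 1 values. The sketch below ranks the five example pairs by score and flags those at or above 0.09, the score level discussed in Note 1; the variable names are illustrative only.

```python
# Rows of Table 1: (transcript, microRNA, 8-mer, 7-mer, 6-mer, seedless, score)
TABLE_1 = [
    ("hKLF4", "hsa-mir153-2", 0, 0, 2, 13, 0.037326758),
    ("hDICER", "hsa-let-7c", 0, 2, 3, 319, 0.470307676),
    ("mBMI1", "hsa-mir154", 1, 0, 4, 77, 0.144362275),
    ("hOCT4", "hsa-mir335", 1, 0, 0, 19, 0.021329511),
    ("hLIN28", "hsa-mir-92", 0, 1, 6, 196, 0.282444367),
]
SCORE_CUTOFF = 0.09  # score level discussed in Note 1

# Highest score first.
ranked = sorted(TABLE_1, key=lambda row: row[-1], reverse=True)
above_cutoff = [row[0] for row in ranked if row[-1] >= SCORE_CUTOFF]
with_8mer = [row[0] for row in ranked if row[2] > 0]

for transcript, mirna, n8, n7, n6, seedless, score in ranked:
    flag = "*" if score >= SCORE_CUTOFF else " "
    print(f"{flag} {transcript:7s} {mirna:13s} 8mer={n8} seedless={seedless:3d} score={score:.3f}")
```

In this small set, the two highest-scoring pairs have no 8-mer site but many seedless sites, whereas hOCT4 has an 8-mer site and the lowest score.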

4 Notes

1. Our quantitative score was developed using reporter expression data. This dataset revealed that when the score is 0.09 or higher, 54% of targets were effectively downregulated.

2. The logistic probability is a measure of confidence of a predicted site that is calculated by models developed using CLIP studies. It is difficult to assign a level of contribution toward the overall functional efficacy given the logistic probability of a particular binding site. However, we recommend a high logistic probability threshold of 0.75 when considering a binding site for further experimental investigation.

3. The STarMir calculation time grows steeply and nonlinearly with the length of the mRNA sequence. Typical calculation times on the Sfold Web server are 3 min for 500 nt, 5 min for 1000 nt, 30 min for 2000 nt, 2 h for 3000 nt, 5 h for 4000 nt, and 9 h for 5000 nt. The current limit for sequence length is 10,000 nt. Longer sequences are truncated, but the results may be less accurate than those obtained with the whole sequence because of possible long-range RNA base interactions.
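For planning purposes, the timings in Note 3 can be interpolated to estimate the run time for an intermediate sequence length. The sketch below interpolates on log-log axes between the tabulated points, which is an assumption about how run time varies between them and not a documented property of the server; it refuses to extrapolate outside 500–5000 nt because no timings are given there.

```python
import math

# (sequence length in nt, typical run time in minutes), taken from Note 3
TIMINGS = [(500, 3), (1000, 5), (2000, 30), (3000, 120), (4000, 300), (5000, 540)]


def estimate_minutes(length_nt):
    """Interpolate a run time on log-log axes between tabulated lengths."""
    if not TIMINGS[0][0] <= length_nt <= TIMINGS[-1][0]:
        raise ValueError("no timing data outside 500-5000 nt")
    for (x0, y0), (x1, y1) in zip(TIMINGS, TIMINGS[1:]):
        if x0 <= length_nt <= x1:
            t = math.log(length_nt / x0) / math.log(x1 / x0)
            return math.exp(math.log(y0) + t * math.log(y1 / y0))


def local_exponent(x0, y0, x1, y1):
    """Slope on log-log axes: time ~ length**exponent between two points."""
    return math.log(y1 / y0) / math.log(x1 / x0)


# One exponent per pair of neighboring table entries.
exponents = [local_exponent(*a, *b) for a, b in zip(TIMINGS, TIMINGS[1:])]
estimate_2500 = estimate_minutes(2500)
```

The local exponents give a rough sense of how quickly the cost rises with length, which can help decide whether to submit a single region (for example, only the 3′ UTR) instead of a complete mRNA.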

Acknowledgments

The Bioinformatics Core at the Wadsworth Center is acknowledged for supporting computing resources for this work. This work is supported in part by the National Institutes of Health (grants GM099811, GM116855 to Y.D. and J.L.).

References

1. Ambros V (2004) The functions of animal microRNAs. Nature 431(7006):350–355
2. Bartel DP (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116(2):281–297
3. Friedman RC, Farh KK, Burge CB, Bartel DP (2009) Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 19(1):92–105. https://doi.org/10.1101/gr.082701.108
4. Harfe BD (2005) MicroRNAs in vertebrate development. Curr Opin Genet Dev 15(4):410–415. https://doi.org/10.1016/j.gde.2005.06.012
5. Esau CC, Monia BP (2007) Therapeutic potential for microRNAs. Adv Drug Deliv Rev 59(2-3):101–114. https://doi.org/10.1016/j.addr.2007.03.007
6. Erson AE, Petty EM (2008) MicroRNAs in development and disease. Clin Genet 74(4):296–306. https://doi.org/10.1111/j.1399-0004.2008.01076.x
7. Lewis BP, Burge CB, Bartel DP (2005) Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120(1):15–20
8. Chi SW, Zang JB, Mele A, Darnell RB (2009) Argonaute HITS-CLIP decodes microRNA–mRNA interaction maps. Nature 460(7254):479–486. https://doi.org/10.1038/nature08170
9. Hafner M, Landthaler M, Burger L, Khorshid M, Hausser J, Berninger P, Rothballer A, Ascano M Jr, Jungkamp AC, Munschauer M, Ulrich A, Wardle GS, Dewell S, Zavolan M, Tuschl T (2010) Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141(1):129–141. https://doi.org/10.1016/j.cell.2010.03.009
10. Kishore S, Jaskiewicz L, Burger L, Hausser J, Khorshid M, Zavolan M (2011) A quantitative analysis of CLIP methods for identifying binding sites of RNA-binding proteins. Nat Methods 8(7):559–564. https://doi.org/10.1038/nmeth.1608
11. Zisoulis DG, Lovci MT, Wilbert ML, Hutt KR, Liang TY, Pasquinelli AE, Yeo GW (2010) Comprehensive discovery of endogenous Argonaute binding sites in Caenorhabditis elegans. Nat Struct Mol Biol 17(2):173–179. https://doi.org/10.1038/nsmb.1745
12. Liu C, Mallick B, Long D, Rennie WA, Wolenc A, Carmack CS, Ding Y (2013) CLIP-based prediction of mammalian microRNA binding sites. Nucleic Acids Res 41(14):e138. https://doi.org/10.1093/nar/gkt435
13. Rennie W, Liu C, Carmack CS, Wolenc A, Kanoria S, Lu J, Long D, Ding Y (2014) STarMir: a web server for prediction of microRNA binding sites. Nucleic Acids Res 42(Web Server issue):W114–W118. https://doi.org/10.1093/nar/gku376
14. Rennie W, Kanoria S, Liu C, Mallick B, Long D, Wolenc A, Carmack CS, Lu J, Ding Y (2016) STarMirDB: A database of microRNA binding sites. RNA Biol 13(6):554–560. https://doi.org/10.1080/15476286.2016.1182279
15. Kozomara A, Griffiths-Jones S (2014) miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res 42(Database issue):D68–D73. https://doi.org/10.1093/nar/gkt1181
16. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, Weinstock GM, Wilson RK, Gibbs RA, Kent WJ, Miller W, Haussler D (2005) Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15(8):1034–1050. https://doi.org/10.1101/gr.3715005

Chapter 4

The Role of Tertiary Structure in MicroRNA Target Recognition

Hin Hark Gan and Kristin C. Gunsalus

Abstract

Translational repression and degradation of transcripts by microRNAs (miRNAs) is mediated by a ribonucleoprotein complex called the miRNA-induced silencing complex (miRISC, or RISC). Advances in experimental determination of RISC structures have enabled detailed analysis and modeling of known miRNA targets, yet a full appreciation of the structural factors influencing target recognition remains a challenge, primarily because target recognition involves a combination of RNA–RNA and RNA–protein interactions that can vary greatly among different miRNA–target pairs. In this chapter, we review progress toward understanding the role of tertiary structure in miRNA target recognition using computational approaches to assemble RISC complexes at known targets and physics-based methods for computing target interactions. Using this framework to examine RISC structures and dynamics, we describe how the conformational flexibility of Argonautes plays an important role in accommodating the diversity of miRNA–target duplexes formed at canonical and noncanonical target sites. We then discuss applications of tertiary structure-based approaches to emerging topics, including the structural effects of SNPs in miRNA targets and cooperative interactions involving Argonaute–Argonaute complexes. We conclude by assessing the prospects for genome-scale modeling of RISC structures and modeling of higher-order Argonaute complexes associated with miRNA biogenesis, mRNA regulation, and other functions.

Key words MicroRNA, miRNA target recognition, Tertiary structure modeling, Molecular dynamics, Argonaute, RISC, RISC-SNPs

1 Introduction

The biological function of miRNAs is mediated by posttranscriptional gene silencing through the multimeric RNA–protein complex RISC. Understanding the silencing complex at the molecular level is therefore key to deciphering the mechanisms of miRNA activity [1, 2]. The core of the complex is an Argonaute family protein bound to a ~22 nt single-stranded guide miRNA, which interacts directly only with its mRNA targets and Argonaute (Fig. 1). A second protein called GW182 binds to a different surface of the Argonaute (Fig. 1) and recruits the deadenylase complex to bring about target mRNA silencing [3, 4]. This chapter

Alessandro Laganà (ed.), MicroRNA Target Identification: Methods and Protocols, Methods in Molecular Biology, vol. 1970, https://doi.org/10.1007/978-1-4939-9207-2_4, © Springer Science+Business Media, LLC, part of Springer Nature 2019


Fig. 1 The core silencing complex, showing an Argonaute complexed with a miRNA (orange) that is base-paired with the seed region of a target mRNA (cyan). Argonautes have four domains (N, PAZ, Mid, PIWI) and two linker regions (L1, L2). Also shown is GW182 (green sticks), which binds near the PIWI–Mid interface. For this model, GW182 from the solved human AGO1-GW182 complex (PDB ID: 5W6V) was superimposed on the human AGO2 structure (4W5T). Argonaute color scheme: N, blue; PAZ, red; Mid, green; PIWI, magenta; L1, gray; L2, yellow

reviews the functional interactions and dynamics of the core silencing complex derived from experimental [5–9] and tertiary structure modeling studies [10–13] and explores prospects for future developments in this area.

Current computational analysis of miRNA interactions is largely based on sequence and secondary structure, involving miRNA–target matching/binding methods such as RNA–RNA hybridization algorithms [14–16]. The simplicity and efficiency of computing secondary structures and energetics has enabled prediction of transcriptome-wide targets for known miRNA families in multiple species using these methods. Target prediction algorithms typically assume that target recognition is driven by the seed region (5–7 nucleotides in the 5′ end of the miRNA). Biophysical and structural studies support the importance of this region in target recognition, but also reveal that computational approaches based solely on RNA–RNA interactions capture only some aspects of the rich interactions in the RISC complex.


For example, biophysical experiments have demonstrated that Argonaute acts as a stabilizer for the solvent-exposed seed region by reducing its conformational entropy and increasing target binding affinity [17]. Crystallographic structures of RISC complexes show that the miRNA–target duplex fits tightly within Argonaute, especially near the middle region of the duplex (positions 8–11 from the miRNA 5′ end) where it interacts with the PIWI domain and a helix in the L2 linker, which can detect seed duplex formation [5, 18]. Thus, the duplex conformation has to fit within the constraints of the Argonaute structure, which acts as a selector of admissible duplex conformations. Since solved RISC complexes to date contain near-perfect duplexes, understanding the influence of duplex conformations on target recognition requires tertiary structure modeling. Moreover, the target recognition process is highly dynamic and involves conformational movements of the Argonaute domains [5, 6, 19]. Crystallographic studies have elegantly shown that the target is initially recognized by the solvent-exposed seed region, followed by the other bases in the miRNA guide strand [6]. To accommodate the incoming miRNA–target duplex, the Argonaute undergoes a closed-to-open conformational switch. As shown by superposition of RISC complexes, structures with only the bound miRNA strand are in the closed conformational state, whereas structures with a duplex are in the open state [5, 12]. Computational studies of Argonaute dynamics using all-atom normal mode analysis support the existence of this conformational switch associated with target recognition [10]. The switch is likely mediated by a combination of the three lowest-frequency modes, called hinge, torsional, and breathing motions, which open and increase the volume of the RNA-binding channel [10, 12]. Detailed coordinated motions of RISC complexes have also been examined using molecular dynamics simulations [20].
A recent modeling analysis of RISC complexes with increasing duplex length indicates that both RNA–RNA and Argonaute–RNA interactions favor duplex elongation [12]. These structural insights suggest that modeling efforts can complement crystallographic studies. Knowledge of Argonaute structure and dynamics is also essential for understanding a range of important processes beyond target recognition, including RISC assembly and recycling, recruitment of the deadenylase complex, and cooperative miRNA interactions. During miRNA biogenesis, the nascent miRNA–passenger strand duplex is delivered into the Argonaute structure by Dicer with the aid of the HSP90 chaperone in a stage called Argonaute loading [21]. After target silencing the RISC complex is recycled, likely through an active destabilization of the target by Argonaute [22]. A recent crystallographic study showed that the binary Argonaute–miRNA complex greatly enhances Argonaute’s affinity for GW182, suggesting that only Argonaute–miRNA complexes


can recruit the deadenylase complex [3]. It is probable that the bound miRNA strand alters the conformation of the PIWI-Mid domain where the GW182 binding site resides. In addition, adjacent miRNA sites act cooperatively through Argonaute–Argonaute interactions to silence targets [23, 24]. The range of coordinated movements and interactions within RISC accompanying these different processes highlights the need for tertiary structure modeling to adequately explain various aspects of miRNA function. Successful RISC modeling depends both on the availability of well-defined miRNA targets and on progress in computational methods for constructing 3D miRNA–target duplexes and loading them into structural models of Argonaute. Comprehensive sets of miRNA–target duplexes for diverse miRNA families and species have been produced by advances in identifying high-resolution miRNA target (~30 nt) sequences using cross-linking and immunoprecipitation (CLIP) methods (e.g., HITS-CLIP [25, 26], PAR-CLIP [27, 28], CLIP/ligation [29, 30]). These datasets provide a unique opportunity for transcriptome-wide assessment of target recognition and for dissection of RISC’s functional interactions and dynamics. Experimentally determined miRNA–target pairs are used as templates to predict 3D duplex structures, which is by now a reasonably mature field. The most reliable algorithms are those based on predicted secondary structures and assembly of single- and double-stranded RNA fragments from existing RNA structures. For example, the MC-Sym algorithm has been used to construct pre-miRNA duplex structures for human let-7c, miR-19a, and miR-29a [31]. To model RISC assembly, predicted tertiary duplex structures are then loaded into Argonaute computationally. RNA duplex prediction algorithms typically generate an ensemble of structures that are ranked based on binding energy and selected for loading into Argonaute to produce candidate RISC complexes.
Noncanonical targets typically result in imperfect duplexes with bulges, internal loops, and mismatches that can produce diverse conformations. How these diverse duplex conformations are accommodated within the Argonaute structure is poorly understood. Two related computational approaches have been developed to model RISC structures at many CLIP-derived targets, not just the few models available from solved structures. In one approach, a fixed, open conformational state of the T. thermophilus Argonaute structure was used to build RISC complexes with human and mouse miRNA–target duplexes to assess the effects of Argonaute structure on target recognition [13]. By modeling hundreds of known and functional targets, this study suggested that Argonaute plays a role in target selection. In the second approach, the Argonaute conformation was allowed to vary to find the optimal fit for miRNA–target duplex structures formed at different miRNA targets [12]. The method therefore incorporates the fact that target


recognition involves changes in the Argonaute structure. Algorithmically, a RISC structure is assembled by searching for Argonaute and duplex conformations that can form an energetically viable complex. This work characterized human and C. elegans RISC structures for a number of CLIP-derived canonical and noncanonical targets. In this chapter, we focus on the methods and applications of the RISC assembly approach with variable Argonaute conformations for analyzing the contributions of RNA–RNA and protein–RNA interactions to target recognition. Subheading 2 gives a general outline of the RISC assembly approach. Subheading 3 focuses on the role of Argonaute and duplex conformations in assembling RISC complexes. Subheading 4 describes the scheme for computing RISC interactions that contribute to the binding affinity between any two macromolecular components. Subheading 5 discusses the role of duplex conformations, duplex–Argonaute interactions, and duplex length in target recognition. Applications of the RISC assembly method to assess the structural effects of SNPs and cooperative miRNA interactions are introduced in Subheadings 6 and 7, respectively. We conclude by considering the prospects for modeling higher-order Argonaute complexes associated with various aspects of miRNA function.

2 Tertiary Structure-Based Approach to Modeling miRNA Target Recognition

To gain a comprehensive understanding of the role of tertiary structure in target recognition, an important goal is to develop a method for building RISC complexes at many known miRNA target sites, including targets with little or no base pairing in the seed region and miRNA–target duplexes with large internal loops. Although solved crystal structures have already provided many insights into the mechanisms of target recognition, they are mostly limited to a few near-perfect duplexes, leaving open the question of how the majority of duplex structures are formed within RISC complexes. Our approach to building RISC complexes is based on two key ingredients: high-resolution miRNA targets and conformational flexibility of Argonaute and RNA structures. High-resolution miRNA target sequences are needed to minimize errors in predicted miRNA–target duplexes that can produce spurious results in constructed complexes. This is facilitated by experimental identification of targets derived from CLIP/ligation methods, which have a resolution of ~30 nt [27, 29]. During target recognition, both Argonaute and duplex conformations dynamically adjust to attain the equilibrium RISC conformations. Thus, finding the optimal RISC structure must take into account the conformational flexibility of both duplex and Argonaute structures.


[Fig. 2 flowchart: structural templates (X-ray, NMR, homology models) and RNA targets (CLIP data) → complex assembly (load RNA duplexes, protein dynamics) → structure refinement (minimization) → compute properties of complexes (binding affinities); schematic: Ago-miRNA + mRNA → RISC]

Fig. 2 A computational framework for assembling and analyzing RISC structures based on high-resolution, CLIP-derived miRNA targets and solved Argonaute structures. RISC models are assembled by loading duplexes into open Argonaute conformations derived from analysis of protein dynamics. Binding affinities of refined RISC models are computed as interactions between the binary Argonaute–miRNA complex and an RNA target

Based on these two ingredients, our general strategy for building RISC complexes integrates experimental data (CLIP-derived miRNA targets, solved Argonaute structures) and structural modeling methods (Fig. 2). Target and miRNA sequences from CLIP/ligation methods yield predicted 2D miRNA–target structures, which are used to construct 3D duplex models. The solved Argonaute and Argonaute–RNA structures are used as templates to generate Argonaute conformations for duplex loading. In cases where solved Argonaute structures are not available, homology models derived from closely related proteins may be used. The duplex and Argonaute structures are then assembled into candidate RISC complexes (Subheading 3), complexes are refined by energy minimization, and finally target binding affinities are computed. This general protocol is designed to apply broadly to diverse miRNA families, mRNA targets, and Argonaute structures. Below, we detail the structural modeling steps for (1) constructing 3D from 2D miRNA–target duplexes, (2) generating


Argonaute conformations and loading 3D duplexes into Argonaute conformations, (3) refining RISC complexes, and (4) computing interactions in RISC structures.

1. To predict 3D from 2D duplexes we use the MC-Sym algorithm, which assembles 3D duplexes using RNA fragments (single-stranded and double-stranded RNAs) [31]. For each 2D duplex, the MC-Sym algorithm constructs an ensemble of 3D structures [12, 31]. The conformational ensemble size required depends on the structure’s conformational variability, with perfect duplexes needing the fewest structural models. We typically generate up to 1000 duplex structures and rank their binding affinities using molecular force fields (total intermolecular energy from Tinker’s “analyze” module [32]).

2. To generate a large pool of candidate RISC complexes, we use duplex and Argonaute conformational ensembles (Fig. 3). The Argonaute structure ensemble is generated using an elastic network model method (Subheading 3). Using a solved RISC structure template, each duplex is docked to an Argonaute structure to build model RISC complexes.

3. Assembled RISC complexes are refined before computing target binding affinities. Of the thousands of candidate complexes, only the top 5, as ranked by the fewest steric clashes between atoms, are considered for structure refinement. We use a local minimization method (limited-memory L-BFGS¹) with the Amber99 plus GBSA² force field implemented in Tinker [32].

4. For binding affinity calculations (i.e., RNA–RNA and RNA–protein interactions), we consider both the enthalpic and entropic terms of the binding free energy [11]. The enthalpic term consists of atomic van der Waals and electrostatic interactions as well as implicit solvation terms to account for the presence of ions and water molecules. The entropic term includes translational, rotational, and vibrational entropies [33].
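The screening in step 3 (ranking candidate complexes by steric clashes and keeping the top 5) can be illustrated with a small sketch. It assumes that a clash is any protein–RNA atom pair closer than a fixed distance; the 3.0 Å cutoff, the data layout, and the function names are illustrative assumptions and are not taken from the published protocol.

```python
import math


def count_clashes(ago_atoms, duplex_atoms, min_dist=3.0):
    """Number of Argonaute-duplex atom pairs closer than min_dist (in angstroms)."""
    return sum(
        1
        for a in ago_atoms
        for d in duplex_atoms
        if math.dist(a, d) < min_dist
    )


def select_for_refinement(candidates, top_n=5, min_dist=3.0):
    """Rank candidate complexes by clash count and keep the top_n with the fewest."""
    scored = sorted(
        (count_clashes(c["ago"], c["duplex"], min_dist), name)
        for name, c in candidates.items()
    )
    return [(name, clashes) for clashes, name in scored[:top_n]]


# Toy coordinates (x, y, z); real inputs would be parsed from structure files.
ago = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
candidates = {
    "A": {"ago": ago, "duplex": [(1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]},
    "B": {"ago": ago, "duplex": [(0.0, 8.0, 0.0), (5.0, 8.0, 0.0)]},
    "C": {"ago": ago, "duplex": [(1.0, 0.0, 0.0), (4.0, 0.0, 0.0)]},
}
selected = select_for_refinement(candidates, top_n=2)
```

The all-pairs loop is quadratic in the number of atoms; for thousands of full-size complexes, a spatial grid or k-d tree would be a natural replacement, at the cost of a longer example.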

3 Assembly of RISC Complexes

Although target recognition dynamics suggest that growing a duplex within a miRNA-bound Argonaute structure is a natural description of the process, this approach is not practical for transcriptome-wide modeling because of the need to simulate Argonaute and RNA dynamics. A more attainable goal is to focus

¹ Broyden–Fletcher–Goldfarb–Shanno algorithm.
² Generalized Born Surface Area solvation approximation.


[Fig. 3 flowchart: duplex structure ensemble + Ago structure ensemble → candidate RISC structures → screening of RISC structures (minimal atomic overlaps) → refinement of RISC structures (energy minimization) → predicted RISC structure]

Fig. 3 Computational method for assembling RISC models using duplex and Argonaute structure ensembles. The miRNA–target duplex structures are constructed using a 3D duplex assembly algorithm and Argonaute structures are generated using an elastic network model, a computational approach for analyzing global domain motions of macromolecules. A large pool of assembled candidate RISC structures is then screened and selected candidates are refined to predict RISC models

on constructing the final, equilibrium RISC complexes. The problem then becomes how to load a duplex conformation into an Argonaute structure. This question is not purely theoretical because Argonaute loading is an essential step of the miRNA biogenesis pathway, in which Dicer transfers the miRNA–passenger strand duplex to an open Argonaute structure stabilized by the HSP90 chaperone [21, 34]. The problem of finding an optimal RISC complex for a given miRNA target can be formulated as a search for duplex and Argonaute conformations that can be assembled into viable RISC complexes (i.e., no steric clashes and

The Role of Tertiary Structure in MicroRNA Target Recognition


favorable energetics). Since algorithms already exist for generating duplex conformations (e.g., MC-Sym), we describe below a method for sampling Argonaute’s conformational space, followed by a brief discussion on assembly, screening, and refinement of candidate RISC complexes. Another recently developed structural modeling method incorporates some elements of this approach but uses a fixed Argonaute structure and a slightly different strategy for building RNA duplexes [13].

3.1 Argonaute Conformational Ensemble

Multidomain proteins like Argonautes often possess large-scale, relative domain motions in solution called the normal modes [35, 36]. Normal modes are discrete conformational movements with specific frequencies, which can be computed using methods such as the elastic network model (ENM) theory [37, 38]. Low-frequency modes of motion are energetically more likely to occur than higher-frequency motions. Comparative analysis of modes of motion for RNA-free Argonautes from several species (T. thermophilus, K. polysporus, and H. sapiens) showed that their three lowest-frequency (nontrivial) modes are substantially similar, suggesting these modes of motion can serve as a basis for sampling Argonaute’s conformational space [12]. An earlier work on RNA-free prokaryotic Argonautes also concluded that these are the most important frequencies for target recognition [10].

Argonaute–RNA complexes possess modes of motion that differ from those of RNA-free Argonaute. Analysis of solved Argonaute structures with and without RNA suggests that more general Argonaute conformations can be found by applying ENM iteratively. This procedure typically considers transitions between the most relevant low-frequency, nontrivial modes (1, 2, and 3), in which new conformations are generated by applying ENM to a particular structural mode (e.g., mode 2 conformations generated from a mode 1 structure); the new conformations can in turn be used to generate other conformations, and so on. Applying ENM to the low-frequency modes in different combinations leads to several possible sequences of Argonaute conformations: 1 → 2, 1 → 3, 2 → 3, 1 → 2 → 3, etc. In addition to these conformations, the structure ensemble includes conformations in modes 1, 2, and 3 generated by applying ENM to a native structure.

Computationally, each mode of motion is represented by a set of conformations. If N conformations are considered per mode (defined by the user), then there are O(N²) possible conformations for mode i → j transitions and O(N³) conformations for mode i → j → k transitions. We have used N = 9 and considered all possible transitions to generate conformational ensembles of size ~1000; to speed up computations, we have also used a smaller subset of ~100 “open” Argonaute conformations [12]. Another ENM parameter is the amplitude of


motion, which sets the extent to which the structure is allowed to vary. In the ElNemo implementation of ENM [37], we have used an amplitude of 600. Argonaute loading requires generating a separate conformational ensemble for each Argonaute structure (e.g., from yeast, human, or worm).

3.2 Screening of Candidate RISC Complexes
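As a back-of-the-envelope check on the ensemble sizes that set the scale of the screening problem, the counting argument for the Argonaute ensemble (N conformations per mode, N² per two-mode transition, N³ per three-mode transition) can be written out in a few lines of Python. The sketch assumes that only increasing mode sequences are counted (1, 2, 3, 1 → 2, 1 → 3, 2 → 3, and 1 → 2 → 3); the text lists these sequences followed by "etc.", so the exact set is an assumption, as is the function name.

```python
from itertools import combinations


def ensemble_size(n_per_mode, modes=(1, 2, 3)):
    """Count conformations over all increasing mode sequences.

    A sequence of k modes contributes n_per_mode ** k conformations.
    """
    total = 0
    for length in range(1, len(modes) + 1):
        n_sequences = sum(1 for _ in combinations(modes, length))
        total += n_sequences * n_per_mode ** length
    return total


argonaute_ensemble = ensemble_size(9)  # 3 * 9 + 3 * 81 + 1 * 729
duplex_ensemble = 1000                 # upper bound on duplex models (Subheading 2)
print(argonaute_ensemble, argonaute_ensemble * duplex_ensemble)  # prints: 999 999000
```

Under this assumption, N = 9 gives 999 Argonaute conformations, in line with the quoted ensemble size of ~1000, and pairing them with up to 1000 duplex models gives roughly a million candidate complexes.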

4

For duplex and Argonaute conformational ensembles of size ~1000, there are ~10⁶ candidate RISC complexes, which poses a considerable challenge for screening. This large set of complexes can be reduced by focusing on low-energy duplexes (12 h compute time per complex), a challenge is to model transcriptome-wide targets to generate comprehensive structural data that could be exploited for developing more effective measures


of target recognition, especially for noncanonical targets that cannot be adequately assessed using current secondary structure-based methods [41].

6 Structural Effects of SNPs in miRNA Targets

About half a million single nucleotide polymorphisms (SNPs) in humans occur within gene regulatory regions (promoters, enhancers, transcription factor binding sites, untranslated regions) [42]. In silico surveys of the human genome and catalogued sequence variants (https://www.ncbi.nlm.nih.gov/snp/) suggest that ~2300 SNPs occur in pre-miRNAs [43] and ~14,000 in CLIP-derived miRNA targets (our unpublished data). SNPs in pre-miRNAs could potentially affect the miRNA biogenesis pathway, whereas SNPs in miRNA targets could alter or disrupt target recognition [44]. A number of SNPs in miRNA target sites have been proposed as plausible causal mutations in human diseases [45]. Progress in deciphering the link between such “RISC-SNPs” and biology can be made by assessing the effects of SNPs on miRNA target recognition. Current approaches to estimate the impact of RISC-SNPs are based on secondary structure analysis [46] and could be enhanced by the application of tertiary structure methods, which allow more precise calculation of target binding affinities associated with subtle structural changes in RISC complexes induced by SNPs. Structure-based analysis guided by large-scale data on gene expression levels influenced by genetic variants [47] could be particularly productive.

Building RISC complexes at known miRNA targets with SNPs involves a modification of the RISC assembly algorithm (Subheading 2). A SNP can either generate local structural changes (base conformations) or abolish the target site. To account for these possibilities, the miRNA–target duplex formation needs to be reassessed using a hybridization program (e.g., RNAduplex [48]). The predicted 2D duplex is then used to generate 3D duplex conformations, followed by RISC assembly and refinement, as described in Subheadings 2 and 3. To compare the effects of allelic SNPs, the RISC model for each allele needs to be independently assembled.
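Before a hybridization program is run, the per-allele inputs can be enumerated and given a quick sanity check. The following Python sketch is a hedged illustration only: it builds the seed-match string for each allele of rs1047420, using the seed sequences as they appear in Table 2, and labels each aligned position as a Watson–Crick pair, a G·U wobble, or a mismatch. The ungapped position-by-position comparison is a simplification that stands in for, and does not reproduce, a thermodynamic program such as RNAduplex; the function names are hypothetical, and the A and G seed-match strings are inferred from the flattened table.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}

SEED = "AGCAGCA"  # miRNA nucleotides 2-8 (5'->3') as drawn in Table 2


def classify_pairs(mirna_5to3, target_3to5):
    """Label each aligned position as 'WC', 'wobble', or 'mismatch' (no gaps)."""
    labels = []
    for m, t in zip(mirna_5to3, target_3to5):
        if (m, t) in WATSON_CRICK:
            labels.append("WC")
        elif (m, t) in WOBBLE:
            labels.append("wobble")
        else:
            labels.append("mismatch")
    return labels


def seed_sites(alleles="CUAG"):
    """Target seed-match strings (3'->5'), one per allele at the SNP position."""
    return {allele: "U" + allele + "GUCGU" for allele in alleles}


for allele, site in seed_sites().items():
    labels = classify_pairs(SEED, site)
    print(allele, labels.count("WC"), labels.count("wobble"), labels.count("mismatch"))
# prints:
# C 7 0 0
# U 6 1 0
# A 6 0 1
# G 6 0 1
```

A count like this only flags which alleles change the pairing pattern; it says nothing about binding energies, which is why each allele still has to be passed through duplex prediction and RISC assembly separately.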
Structural modeling reveals that SNP alleles can induce favorable or unfavorable interactions, depending on their structural contexts, that are not always evident from patterns of base pairing in predicted miRNA–mRNA target secondary structures. We illustrate assessment of the structural effects induced by single nucleotide substitutions at human SNP rs1047420, which occurs in the 3′ UTR of the GLRX (glutaredoxin) gene and affects the target strand of the seed duplex formed with hsa-miR-15a-5p (Table 2 and Fig. 5). Notably, this site is not predicted as a miRNA target by

Table 2 Binding affinity changes induced by human SNP rs1047420 in a target of hsa-miR-15a-5p

Allele   ΔG(2D)   ΔG(3D)
C        16.80    38.1
U        14.40    2.2
A        12.60    20.2
G        12.70    49.2

miRNA–mRNA duplex, allele C:

miRNA 5′ U       CAU        G
          AGCAGCA   AAUGGUUU
          |||||||   ||||||||
          UCGUCGU   UUACUAAA
mRNA 3′  U       U          A

miRNA–mRNA duplex, allele U:

miRNA 5′ U       CAU        G
          AGCAGCA   AAUGGUUU
          |||||||   ||||||||
          UUGUCGU   UUACUAAA
mRNA 3′  U       U          A

miRNA–mRNA duplex, allele A:

miRNA 5′ G     CAU        G
          CAGCA   AAUGGUUU
          |||||   ||||||||
          GUCGU   UUACUAAA
mRNA 3′  A     U          A

miRNA–mRNA duplex, allele G:

miRNA 5′ U G     CAU        G
          A CAGCA   AAUGGUUU
          | |||||   ||||||||
          U GUCGU   UUACUAAA
mRNA 3′  U G     U          A

SNP alleles: C, ancestral; U, minor; A and G, hypothetical. Binding affinities ΔG(2D) and ΔG(3D) (kcal/mol) were computed using RNAduplex and our 3D method, respectively. Binding affinities for the 3D models were computed using the formula ΔG(3D) = ΔG(seed) + αΔG(3′), where the weight factor α = 0.2, and ΔG(seed) and ΔG(3′) are the affinities for the seed (nucleotides 2–8) and 3′ (nucleotides 9 and beyond) regions, respectively. A low α value was chosen to recognize the dominance of the seed region
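The weighted sum in the table note is simple enough to state as a short Python sketch. The function name is hypothetical and the component energies in the example are made-up inputs chosen only to show the arithmetic; they are not values from Table 2.

```python
ALPHA = 0.2  # weight factor for the 3' region, as given in the Table 2 note


def delta_g_3d(dg_seed, dg_3prime, alpha=ALPHA):
    """Combine seed and 3'-region affinities (kcal/mol): dG(seed) + alpha * dG(3')."""
    return dg_seed + alpha * dg_3prime


# Hypothetical component energies, for illustration only
example = delta_g_3d(dg_seed=-30.0, dg_3prime=-40.0)
```

With α = 0.2, a change of 5 kcal/mol in the 3′ term moves the total by only 1 kcal/mol, which is how the formula encodes the dominance of the seed region.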

Fig. 5 Superposition of RISC models showing the structural effects induced by nucleotide substitutions at the site of human SNP rs1047420 in a CLIP-derived target of hsa-miR-15a-5p. Single-nucleotide variants and their binding affinities are: C, ancestral allele (38.1 kcal/mol, yellow); T, minor SNP allele (2.2 kcal/mol, red); A (20.2 kcal/mol; unpaired, not shown); and G (49.2 kcal/mol; blue). Strand color scheme: Argonaute, gray; L2 α7 helix, orange; miRNA, green; mRNA, cyan. Human AGO1 was used to model the complex. The A variant is not modeled because it disrupts base pairing in the target 3′ end


either TargetScan [49] or PicTar [50], presumably because sequence conservation in the seed region is restricted to primates. We model the ancestral allele common to human and chimpanzee (C), the single annotated SNP allele (T), and the two other possible derived single nucleotide variants (SNVs, A and G). C-to-T transitions are fairly common and occur by oxidative deamination of cytosine; for this SNP, the minor T allele is present at a frequency of 23–26% in the human population (see rs1047420 in dbSNP). The computed target binding affinities, defined as interaction energies between the Argonaute–miRNA complex and the target sequences, show that binding for the ancestral C allele is favorable (38.1 kcal/mol), whereas binding for the observed minor T allele is highly unfavorable (2.2 kcal/mol); hypothetical SNVs arising from transversions (A and G) would be expected to impact the binding affinity to a lesser degree and in opposite directions (Table 2). Inspection of the superimposed RISC models (Fig. 5) indicates that all three substitutions induce changes in the target backbone conformation that alter its interactions with the α7 helix in the L2 linker, which acts as a sensor of the quality of the seed duplex (note that for the A variant, the target 3′ end is not modeled since it does not participate in base pairing). Structural modeling thus indicates that computed binding affinities are related to the induction of favorable or unfavorable interactions with the L2 α7 helix. A comparison of binding affinities computed using 2D- and 3D-based methods reveals pronounced differences in predicted outcomes at this site (Table 2). While 2D models clearly show that the A variant is more disruptive to base pairing in the seed region, 3D modeling is required to discern that the conformational perturbation induced by the T allele is likely more deleterious for target binding. Consequently, the T allele has greater potential to impair the function of this miRNA target site by altering its binding equilibrium. This example illustrates the importance of evaluating both RNA–RNA and RNA–protein interactions to fully account for the effects of subtle structural changes induced by SNP alleles. Large-scale assessment of SNPs using 3D modeling is thus needed to uncover the diversity of effects that can influence RISC structure and function in vivo.

A potential application of this structural assessment method is to help identify biomarkers for diseases associated with SNPs in miRNA targets [45]. To achieve this goal, a key step is to develop an efficient way to screen such SNPs on a genome scale. Since most SNPs are biallelic (~95%), there are ~28,000 RISC complexes to consider for ~14,000 SNPs in CLIP-derived miRNA targets. To make such an effort feasible, a fruitful strategy may be to prescreen putative RISC-SNPs for predicted changes in secondary and tertiary RNA duplex structures, and any associated effects on gene expression based on eQTLs [47], before building RISC models.
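The prescreening idea outlined above can be sketched as a simple filter. The Python example below assumes that a 2D binding affinity has already been computed for both alleles of each SNP and that an eQTL flag is available; the record layout, the 1.0 kcal/mol threshold, and the identifiers are hypothetical choices made for illustration and are not part of the published method.

```python
from dataclasses import dataclass


@dataclass
class SnpSite:
    rsid: str
    dg2d_ref: float   # 2D affinity for the reference allele (kcal/mol)
    dg2d_alt: float   # 2D affinity for the alternative allele (kcal/mol)
    has_eqtl: bool    # True if the SNP is associated with an expression change


def prescreen(sites, min_abs_ddg=1.0, require_eqtl=True):
    """Return rsIDs whose 2D affinity change passes the threshold.

    When `require_eqtl` is True, a site must also carry eQTL support.
    """
    kept = []
    for site in sites:
        ddg = site.dg2d_alt - site.dg2d_ref
        if abs(ddg) >= min_abs_ddg and (site.has_eqtl or not require_eqtl):
            kept.append(site.rsid)
    return kept


# Hypothetical records, for illustration only
sites = [
    SnpSite("rs_example_1", -16.0, -13.5, True),   # large change, eQTL support
    SnpSite("rs_example_2", -12.0, -11.8, True),   # negligible change
    SnpSite("rs_example_3", -15.0, -10.0, False),  # large change, no eQTL support
]
shortlist = prescreen(sites)

# Biallelic SNPs need one RISC model per allele
n_models = 2 * 14_000
```

Only the shortlisted sites would then go on to 3D duplex generation and RISC assembly, which is where most of the compute time is spent.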

7 Cooperative Interactions Between miRNA Targets

Although RISC complexes at multiple miRNA targets in a transcript are commonly regarded as independent regulatory units, accumulating experimental evidence indicates that miRNA targets in close proximity can act cooperatively. Specifically, adjacent miRNA sites exhibit cooperative interactions whose strength diminishes with increasing distance between the target sites [23, 24]. Cooperative miRNA interactions represent an altered mode of target recognition. However, the molecular mechanism of cooperative miRNA interactions remains obscure. We illustrate modeling of cooperative miRNA interactions using the experimentally verified adjacent binding sites in the C. elegans toh-1 3′ UTR for miR-80 and miR-35 (Fig. 6), which are abundantly expressed in embryos [23]. Data indicate that binding of both sites by miR-80 and miR-35 is required for target

Fig. 6 Computational method for assembling a cooperative RISC–RISC interaction based on experimentally verified adjacent miRNA sites in the C. elegans toh-1 3′ UTR. Initially, the RISC structures at miR-80 and miR-35 sites were assembled independently using procedures outlined in Figs. 2 and 3. Model RISC–RISC complexes were then assembled using mRNA linker (CUU) conformations from solved RNA structures. ALG-2, a C. elegans Argonaute, was used to model RISCs


deadenylation. The miR-80–target duplex has a perfect seed duplex, but the weaker miR-35–target duplex has only five base pairs and a bulge in the seed duplex. Biochemical experiments showed that the recruitment of RISC to the weaker, noncanonical miR-35 site is dependent on occupation of the adjacent miR-80 site [23]. Cooperative interactions have also been demonstrated in mammalian cells using reporter constructs with up to six adjacent target sites [24]. There are two basic questions regarding these and other examples of adjacent miRNA target sites. Can two adjacent RISC complexes be accommodated without steric clashes between the Argonautes? Which residues at the Argonaute–Argonaute interface promote cooperative interactions? Structural modeling of RISC–RISC complexes can help provide answers to these questions.

The general rule for modeling two or more linearly adjacent RISC complexes on the same mRNA is to build on the RISC assembly algorithm (Subheadings 2 and 3). First, the RISC at each miRNA binding site is assembled separately (Fig. 6). Second, regarding each RISC as a rigid body, adjacent RISCs are merged by sampling possible linker mRNA conformations obtained from solved RNA structures (CUU was used for the miR-80/miR-35 adjacent site pair). Third, the assembled RISC–RISC complex is refined using a local energy minimization algorithm. This procedure generates multiple structural models that need to be further evaluated for biological relevance. For the adjacent miR-80/miR-35 sites, a single large cluster of similar complexes was obtained (along with two dissimilar orphan conformations), and members of this cluster were selected as the most biologically relevant. The modeled RISC–RISC complex for the miR-80/miR-35 example using C. elegans ALG-2 shows that the adjacent Argonaute structures are in the same orientation with respect to the miRNA–mRNA helical axis (Fig. 6) [12]. 
The prominent interactions are between the Mid and PIWI domains of adjacent Argonaute structures. The geometry of this complex, which is free of steric clashes, supports the experimental finding that a series of two or more adjacent miRNA targets can participate in cooperative interactions [24]. Computational analysis pinpointed several conserved residues (Arg487 in ALG-2 Mid domain and Gln730 and Gln748 in ALG-2 PIWI domain) that were confirmed experimentally by mutagenesis as likely sources of cooperative miRNA interactions [23]. This work suggests that combined structural modeling and experimental analysis is a productive strategy for deciphering interactions in large regulatory complexes.
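The selection of a dominant cluster of RISC–RISC models, as described above for the miR-80/miR-35 example, can be illustrated with a small Python sketch. It assumes that a matrix of pairwise structural distances (for instance RMSD values) between refined models has already been computed, and it groups models by single-linkage clustering with a distance cutoff. The clustering method, the 2.0 cutoff, and the toy matrix are assumptions for illustration; the chapter does not specify how the clusters were obtained.

```python
def cluster_models(rmsd, cutoff):
    """Single-linkage clustering: models closer than `cutoff` share a cluster.

    `rmsd` is a symmetric matrix (list of lists) of pairwise distances.
    Returns clusters as lists of model indices, largest cluster first.
    """
    n = len(rmsd)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster root, compressing the path
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if rmsd[i][j] < cutoff:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)


# Toy distance matrix: models 0-2 are similar, models 3 and 4 are orphans
rmsd = [
    [0.0, 1.0, 1.2, 9.0, 8.5],
    [1.0, 0.0, 0.9, 9.2, 8.8],
    [1.2, 0.9, 0.0, 9.1, 8.7],
    [9.0, 9.2, 9.1, 0.0, 7.5],
    [8.5, 8.8, 8.7, 7.5, 0.0],
]
clusters = cluster_models(rmsd, cutoff=2.0)
dominant = clusters[0]
```

Single linkage is used here only because it is short to write; other clustering schemes would work with the same distance matrix, and the choice of cutoff would need to be justified for real models.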

8 Future Prospects

Consideration of tertiary structure interactions is needed to go beyond the analysis of miRNA target recognition that is possible using secondary structure approaches. Solved RISC structures have


provided significant information about the role of Argonaute structure and dynamics in target recognition [5–9]. Based on solved structures, structural modeling methods have contributed to our understanding of miRNA target recognition, including elucidating the modes of Argonaute conformational movements involved in target recognition and characterizing the RISC structures for diverse types of miRNA targets (straight and distorted miRNA–target duplexes) [10, 12]. Importantly, structural modeling approaches provide a broad framework for analyzing a host of Argonaute-related complexes and activities, such as structural effects of SNPs and cooperative interactions [23]. Improving the accuracy of RISC models, especially for noncanonical targets, will require further development of modeling methods as well as new data in the form of solved RISC structures for imperfect seed and seedless targets. In the near future, systematic testing of modeling methods using large miRNA target datasets, especially experimentally validated functional targets, will also help refine current structural models. An emerging application of tertiary structure-based computational modeling is building higher-order Argonaute complexes to elucidate their activities. Argonaute proteins have a number of direct interactors, including Dicer, HSP90, GW182, and Pumilio family (Puf) proteins [34]. Although Dicer and HSP90 participate in Argonaute loading, the process by which loading occurs is not clearly understood. In particular, experiments suggest that HSP90–Argonaute interactions maintain the Argonaute structure in an open conformational state for duplex loading [21, 34]. Explicit structural modeling is needed to demonstrate this role of HSP90 in Argonaute loading. In addition, a recent crystallographic and biochemical study showed that the strength of GW182 binding to Argonaute is miRNA-dependent [3]. 
Computational studies of RISC dynamics are needed to elucidate how miRNA binding affects the efficacy of GW182–Argonaute interactions and therefore the recruitment of deadenylase preceding target degradation. Intriguingly, posttranscriptional regulation by Argonautes has been shown to occur in association with Puf proteins [51], which are deeply conserved RNA binding proteins and translational regulators with important roles in animal development [52]. Experiments with C. elegans and human Argonautes showed that Argonaute can form a complex with a Puf and eEF1A, the eukaryotic elongation factor 1A responsible for delivering aa-tRNA to the ribosome via its GTPase activity [51]. A plausible scenario is that eEF1A binding to the preformed Argonaute–Puf complex alters eEF1A’s conformation, disabling its GTPase activity and thus blocking translational elongation. Detailed structural modeling of the Argonaute–Puf complex and its interactions with eEF1A can help illuminate this novel mechanism of translational regulation.

Another emerging area is DNA targeting by guide RNA and DNA strands. The conventional view in miRNA research is that the


binary Argonaute–miRNA complex finds the RNA targets. Emerging experiments suggest that RISC can bind DNA targets to form RNA–DNA hybrid duplexes [53, 54]. Indeed, the solved T. thermophilus RISC structure contains a hybrid duplex [5], and T. thermophilus Argonaute is now understood to load guide and target DNA in host defense against foreign DNAs in vivo [55, 56]. A role for Argonaute-mediated RNA–DNA interactions in transcriptional regulation has also been reported [57]. We currently lack computational methods to predict targets involving RNA–DNA interactions due partly to a paucity of structural data on hybrid duplexes. As the wider significance of DNA targets becomes clearer, the need to develop computational tools for hybrid duplexes will gain greater urgency in the near future. This and other emerging topics suggest that tertiary structure-based modeling methods can play an important role in elucidating the diverse regulatory functions of Argonaute proteins in living cells.

References

1. Fabian MR, Sonenberg N (2012) The mechanics of miRNA-mediated gene silencing: a look under the hood of miRISC. Nat Struct Mol Biol 19(6):586–593. https://doi.org/10.1038/nsmb.2296
2. Ipsaro JJ, Joshua-Tor L (2015) From guide to target: molecular insights into eukaryotic RNA-interference machinery. Nat Struct Mol Biol 22(1):20–28. https://doi.org/10.1038/nsmb.2931
3. Elkayam E, Faehnle CR, Morales M, Sun J, Li H, Joshua-Tor L (2017) Multivalent recruitment of human Argonaute by GW182. Mol Cell 67(4):646–658e643. https://doi.org/10.1016/j.molcel.2017.07.007
4. Fabian MR, Cieplak MK, Frank F, Morita M, Green J, Srikumar T, Nagar B, Yamamoto T, Raught B, Duchaine TF, Sonenberg N (2011) miRNA-mediated deadenylation is orchestrated by GW182 through two conserved motifs that interact with CCR4-NOT. Nat Struct Mol Biol 18(11):1211–1217. https://doi.org/10.1038/nsmb.2149
5. Wang Y, Juranek S, Li H, Sheng G, Tuschl T, Patel DJ (2008) Structure of an argonaute silencing complex with a seed-containing guide DNA and target RNA duplex. Nature 456(7224):921–926
6. Wang Y, Juranek S, Li H, Sheng G, Wardle GS, Tuschl T, Patel DJ (2009) Nucleation, propagation and cleavage of target RNAs in Ago silencing complexes. Nature 461(7265):754–761
7. Nakanishi K, Weinberg DE, Bartel DP, Patel DJ (2012) Structure of yeast Argonaute with guide RNA. Nature 486(7403):368–374
8. Schirle NT, MacRae IJ (2012) The crystal structure of human Argonaute2. Science 336(6084):1037–1040. https://doi.org/10.1126/science.1221551
9. Faehnle CR, Elkayam E, Haase AD, Hannon GJ, Joshua-Tor L (2013) The making of a slicer: activation of human Argonaute-1. Cell Rep 3(6):1901–1909. https://doi.org/10.1016/j.celrep.2013.05.033
10. Ming D, Wall ME, Sanbonmatsu KY (2007) Domain motions of Argonaute, the catalytic engine of RNA interference. BMC Bioinformatics 8:470. https://doi.org/10.1186/1471-2105-8-470
11. Gan HH, Gunsalus KC (2013) Tertiary structure-based analysis of microRNA-target interactions. RNA 19(4):539–551. https://doi.org/10.1261/rna.035691.112
12. Gan HH, Gunsalus KC (2015) Assembly and analysis of eukaryotic Argonaute-RNA complexes in microRNA-target recognition. Nucleic Acids Res 43(20):9613–9625. https://doi.org/10.1093/nar/gkv990
13. Leoni G, Tramontano A (2016) A structural view of microRNA-target recognition. Nucleic Acids Res 44(9):e82. https://doi.org/10.1093/nar/gkw043
14. Bartel DP (2009) MicroRNAs: target recognition and regulatory functions. Cell 136(2):215–233
15. Rajewsky N (2006) microRNA target predictions in animals. Nat Genet 38(Suppl):S8–S13
16. Cao S, Chen SJ (2012) Predicting kissing interactions in microRNA-target complex and assessment of microRNA activity. Nucleic Acids Res 40:4681–4690. https://doi.org/10.1093/nar/gks052
17. Parker JS, Parizotto EA, Wang M, Roe SM, Barford D (2009) Enhancement of the seed-target recognition step in RNA silencing by a PIWI/MID domain protein. Mol Cell 33(2):204–214
18. Kuhn CD, Joshua-Tor L (2013) Eukaryotic Argonautes come into focus. Trends Biochem Sci 38(5):263–271. https://doi.org/10.1016/j.tibs.2013.02.008
19. Parker JS (2010) How to slice: snapshots of Argonaute in action. Silence 1(1):3
20. Wang Y, Li Y, Ma Z, Yang W, Ai C (2010) Mechanism of microRNA-target interaction: molecular dynamics simulations and thermodynamics analysis. PLoS Comput Biol 6(7):e1000866
21. Iwasaki S, Kobayashi M, Yoda M, Sakaguchi Y, Katsuma S, Suzuki T, Tomari Y (2010) Hsc70/Hsp90 chaperone machinery mediates ATP-dependent RISC loading of small RNA duplexes. Mol Cell 39(2):292–299. https://doi.org/10.1016/j.molcel.2010.05.015
22. Jo MH, Shin S, Jung SR, Kim E, Song JJ, Hohng S (2015) Human Argonaute 2 has diverse reaction pathways on target RNAs. Mol Cell 59(1):117–124. https://doi.org/10.1016/j.molcel.2015.04.027
23. Flamand MN, Gan HH, Mayya VK, Gunsalus KC, Duchaine TF (2017) A non-canonical site reveals the cooperative mechanisms of microRNA-mediated silencing. Nucleic Acids Res 45(12):7212–7225. https://doi.org/10.1093/nar/gkx340
24. Broderick JA, Salomon WE, Ryder SP, Aronin N, Zamore PD (2011) Argonaute protein identity and pairing geometry determine cooperativity in mammalian RNA silencing. RNA 17(10):1858–1869. https://doi.org/10.1261/rna.2778911
25. Zisoulis DG, Lovci MT, Wilbert ML, Hutt KR, Liang TY, Pasquinelli AE, Yeo GW (2010) Comprehensive discovery of endogenous Argonaute binding sites in Caenorhabditis elegans. Nat Struct Mol Biol 17(2):173–179
26. Chi SW, Zang JB, Mele A, Darnell RB (2009) Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 460(7254):479–486
27. Grosswendt S, Filipchyk A, Manzano M, Klironomos F, Schilling M, Herzog M, Gottwein E, Rajewsky N (2014) Unambiguous identification of miRNA:target site interactions by different types of ligation reactions. Mol Cell 54(6):1042–1054. https://doi.org/10.1016/j.molcel.2014.03.049
28. Hafner M, Landthaler M, Burger L, Khorshid M, Hausser J, Berninger P, Rothballer A, Ascano M, Jungkamp AC, Munschauer M, Ulrich A, Wardle GS, Dewell S, Zavolan M, Tuschl T (2010) Transcriptome-wide identification of RNA-binding protein and MicroRNA target sites by PAR-CLIP. Cell 141(1):129–141. https://doi.org/10.1016/j.cell.2010.03.009
29. Helwak A, Kudla G, Dudnakova T, Tollervey D (2013) Mapping the human miRNA interactome by CLASH reveals frequent noncanonical binding. Cell 153(3):654–665. https://doi.org/10.1016/j.cell.2013.03.043
30. Helwak A, Tollervey D (2014) Mapping the miRNA interactome by cross-linking ligation and sequencing of hybrids (CLASH). Nat Protoc 9(3):711–728. https://doi.org/10.1038/nprot.2014.043
31. Parisien M, Major F (2008) The MC-fold and MC-Sym pipeline infers RNA structure from sequence data. Nature 452(7183):51–55
32. Pappu RV, Hart RK, Ponder JW (1998) Analysis and application of potential energy smoothing and search methods for global optimization. J Phys Chem B 102(48):9725–9742
33. Tidor B, Karplus M (1994) The contribution of vibrational entropy to molecular association. The dimerization of insulin. J Mol Biol 238(3):405–414
34. Meister G (2013) Argonaute proteins: functional insights and emerging roles. Nat Rev Genet 14(7):447–459. https://doi.org/10.1038/nrg3462
35. Bahar I, Lezon TR, Yang LW, Eyal E (2010) Global dynamics of proteins: bridging between structure and function. Annu Rev Biophys 39:23–42
36. Tama F, Valle M, Frank J, Brooks CL (2003) Dynamic reorganization of the functionally active ribosome explored by normal mode analysis and cryo-electron microscopy. Proc Natl Acad Sci U S A 100(16):9319–9323
37. Suhre K, Sanejouand YH (2004) ElNemo: a normal mode web server for protein movement analysis and the generation of templates for molecular replacement. Nucleic Acids Res 32(Web Server issue):W610–W614. https://doi.org/10.1093/nar/gkh368
38. Tama F, Wriggers W, Brooks CL (2002) Exploring global distortions of biological macromolecules and assemblies from low-resolution structural information and elastic network theory. J Mol Biol 321(2):297–305
39. Baker NA (2004) Poisson-Boltzmann methods for biomolecular electrostatics. Methods Enzymol 383:94–118
40. Kertesz M, Iovino N, Unnerstall U, Gaul U, Segal E (2007) The role of site accessibility in microRNA target recognition. Nat Genet 39(10):1278–1284
41. Brennecke J, Stark A, Russell RB, Cohen SM (2005) Principles of microRNA-target recognition. PLoS Biol 3(3):e85
42. Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR (2015) A global reference for human genetic variation. Nature 526(7571):68–74. https://doi.org/10.1038/nature15393
43. Gong J, Liu C, Liu W, Wu Y, Ma Z, Chen H, Guo AY (2015) An update of miRNASNP database for better SNP selection by GWAS data, miRNA expression and online tools. Database 2015:bav029. https://doi.org/10.1093/database/bav029
44. Ryan BM, Robles AI, Harris CC (2010) Genetic variation in microRNA networks: the implications for cancer research. Nat Rev Cancer 10(6):389–402. https://doi.org/10.1038/nrc2867
45. Sethupathy P, Collins FS (2008) MicroRNA target site polymorphisms and human disease. Trends Genet 24(10):489–497. https://doi.org/10.1016/j.tig.2008.07.004
46. Sabarinathan R, Wenzel A, Novotny P, Tang X, Kalari KR, Gorodkin J (2014) Transcriptome-wide analysis of UTRs in non-small cell lung cancer reveals cancer-related genes with SNV-induced changes on RNA secondary structure and miRNA target sites. PLoS One 9(1):e82699. https://doi.org/10.1371/journal.pone.0082699
47. Battle A, Brown CD, Engelhardt BE, Montgomery SB (2017) Genetic effects on gene expression across human tissues. Nature 550(7675):204–213. https://doi.org/10.1038/nature24277
48. Hofacker IL (2003) Vienna RNA secondary structure server. Nucleic Acids Res 31(13):3429–3431
49. Agarwal V, Bell GW, Nam JW, Bartel DP (2015) Predicting effective microRNA target sites in mammalian mRNAs. eLife 4. https://doi.org/10.7554/eLife.05005
50. Lall S, Grun D, Krek A, Chen K, Wang YL, Dewey CN, Sood P, Colombo T, Bray N, Macmenamin P, Kao HL, Gunsalus KC, Pachter L, Piano F, Rajewsky N (2006) A genome-wide map of conserved microRNA targets in C. elegans. Curr Biol 16(5):460–471
51. Friend K, Campbell ZT, Cooke A, Kroll-Conner P, Wickens MP, Kimble J (2012) A conserved PUF-Ago-eEF1A complex attenuates translation elongation. Nat Struct Mol Biol 19(2):176–183. https://doi.org/10.1038/nsmb.2214
52. Kimble J, Crittenden SL (2007) Controls of germline stem cells, entry into meiosis, and the sperm/oocyte decision in Caenorhabditis elegans. Annu Rev Cell Dev Biol 23:405–433. https://doi.org/10.1146/annurev.cellbio.23.090506.123326
53. Doxzen KW, Doudna JA (2017) DNA recognition by an RNA-guided bacterial Argonaute. PLoS One 12(5):e0177097. https://doi.org/10.1371/journal.pone.0177097
54. Miyoshi T, Ito K, Murakami R, Uchiumi T (2016) Structural basis for the recognition of guide RNA and target DNA heteroduplex by Argonaute. Nat Commun 7:11846. https://doi.org/10.1038/ncomms11846
55. Swarts DC, Jore MM, Westra ER, Zhu Y, Janssen JH, Snijders AP, Wang Y, Patel DJ, Berenguer J, Brouns SJ, van der Oost J (2014) DNA-guided DNA interference by a prokaryotic Argonaute. Nature 507(7491):258–261. https://doi.org/10.1038/nature12971
56. Sheng G, Zhao H, Wang J, Rao Y, Tian W, Swarts DC, van der Oost J, Patel DJ, Wang Y (2014) Structure-based cleavage mechanism of Thermus thermophilus Argonaute DNA guide strand-mediated DNA target cleavage. Proc Natl Acad Sci U S A 111(2):652–657. https://doi.org/10.1073/pnas.1321032111
57. Toscano-Garibay JD, Aquino-Jarquin G (2014) Transcriptional regulation mechanism mediated by miRNA-DNA*DNA triplex structure stabilized by Argonaute. Biochim Biophys Acta 1839(11):1079–1083. https://doi.org/10.1016/j.bbagrm.2014.07.016

Chapter 5

Protocols for miRNA Target Prediction in Plants

Gaurav Sablok, Kun Yang, and Xiaopeng Wen

Abstract

Next-generation sequencing has opened up new avenues for the identification of microRNAs (miRNAs) and their corresponding roles in abiotic and biotic stress responses. Recently, a plethora of evidence has suggested a canonical action of miRNA–mRNA interactions to regulate plant systems biology at the posttranscriptional level, thus leading to the gain or loss of genetic adaptation in plants. In this chapter, we present a detailed protocol for the identification of miRNA targets using six different prediction tools.

Key words MicroRNA, Target prediction, Arabidopsis thaliana, TAPIR, psRobot, psRNATarget, TarHunter, CleaveLand

1 Introduction

Obtaining a broader view of the genetic adaptation of plants to their environmental surroundings and understanding the fluctuations at the transcriptional and translational levels has always been a challenge. With the advent of next-generation sequencing technologies, substantial knowledge has been acquired at the transcriptional and posttranscriptional levels to understand plant regulomes. Posttranscriptional regulation mechanisms involve small 20–22 nt sequences called microRNAs (miRNAs), which play a critical role in the endonucleolytic cleavage of messenger RNAs (mRNAs). miRNA target prediction has always been a challenging task, and methods such as local alignments and rules inferred from the observation of miRNA–mRNA duplexes have been employed for the accurate prediction of targets. As opposed to animals, rules for miRNA target prediction in plants have been relatively easy to explore since they were first described [1, 2], and have been subsequently revisited and revised for seed length [3, 4]. Considering the predefined rules and the perfect complementarity between miRNAs and their targets, several Web-based and stand-alone tools have been developed, such as

Alessandro Laganà (ed.), MicroRNA Target Identification: Methods and Protocols, Methods in Molecular Biology, vol. 1970, https://doi.org/10.1007/978-1-4939-9207-2_5, © Springer Science+Business Media, LLC, part of Springer Nature 2019


TAPIR [5], SoMART [6], psRNATarget [7], TargetFinder [8], psRobot [9], and TarHunter [10]. Most of these target prediction approaches use FASTA as a search engine and a local alignment approach to infer the targets, with the exception of psRNATarget, which uses a scoring scheme based on seed length. An alternative approach, which uses cleaved target site interactions (degradome sequencing), is incorporated into CleaveLand [11]. A combination of transcript-based and degradome-based methods, with associated expression data filtered through a lasso-regression approach, has also been described [12]. In this chapter, we provide protocols for miRNA target prediction in plants using both the transcript-based and the degradome-based approaches.

2

Materials

2.1 Required Target Prediction Tools and Associated Dependencies

For the proper execution of the protocol for miRNA target prediction, the tools listed in Table 1 need to be installed. Several dependencies also need to be installed: CD-HIT (available at http://weizhongli-lab.org/cd-hit/), GNU Parallel (available at http://www.gnu.org/software/parallel/), MCL (available at http://micans.org/mcl/), RNAhybrid (available at http://bibiserv.techfak.uni-bielefeld.de/rnahybrid), Perl (v5.8+) with BioPerl v1.007002_1, FASTA34, FASTA35, and FASTA36 binaries (all available at http://fasta.bioch.virginia.edu/fasta_www2/fasta_down.shtml), and MFOLD v3.5 (available at http://unafold.rna.albany.edu/?q=mfold/download-mfold).
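Before running the protocol, it can help to confirm that the dependencies are reachable from the shell. The following is a minimal sketch and not part of the original protocol; the function name check_deps is hypothetical, and the executable names passed to it are assumptions that depend on how each package was installed.

```shell
# Report which of the given executables cannot be found on PATH.
# Returns 0 if all are present, 1 otherwise.
check_deps() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf 'missing: %s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call; the executable names are assumptions, adjust them to your install.
check_deps perl cd-hit parallel mcl RNAhybrid || echo "Install the missing tools before continuing."
```

Any tool reported as missing either needs to be installed or its directory needs to be added to PATH.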

Table 1 Summary of the plant miRNA target prediction tools

miRNA target prediction tool   Availability                                          References
TargetFinder                   https://github.com/carringtonlab/TargetFinder        [13]
psRNATarget                    https://plantgrn.noble.org/psRNATarget/              [7]
TAPIR                          http://bioinformatics.psb.ugent.be/webtools/tapir/   [5]
TarHunter                      https://github.com/XMaBio/TarHunter                  [10]
TarHunterL                     https://github.com/XMaBio/TarHunterL                 [10]
psRobot                        http://omicslab.genetics.ac.cn/psRobot/              [9]
CleaveLand                     https://github.com/MikeAxtell/CleaveLand4            [11]

2.2 File Preparation for Target Predictions

For the present protocol, we selected a few miRNA families of Arabidopsis thaliana, namely miR156, miR157, and miR158. The corresponding miRNAs were downloaded from miRBase and used for target prediction with six target prediction methods, which predict targets from transcript binding sites. To facilitate miRNA target prediction, a FASTA file with the miRNA sequences was prepared (selected_Arabidopsis_miRNAs.fasta), which lists the miRNAs together with their family classification, as shown below:

CODE:
$ head -n 6 selected_Arabidopsis_miRNAs.fasta
>ath-miR156a-5p MIMAT0000166 Arabidopsis thaliana miR156a-5p
UGACAGAAGAGAGUGAGCAC
>ath-miR156a-3p MIMAT0031865 Arabidopsis thaliana miR156a-3p
GCUCACUGCUCUUUCUGUCAGA

To run the target predictions in a shell loop, we used the following one-line command to remove the FASTA header lines and keep only the miRNA sequences:

CODE:
$ sed '/^>/d' selected_Arabidopsis_miRNAs.fasta > selected_miRNAs_target_finder.fasta

The formatted file for TargetFinder now contains one miRNA sequence per line:

CODE:
$ head -n 6 selected_miRNAs_target_finder.fasta
UGACAGAAGAGAGUGAGCAC
GCUCACUGCUCUUUCUGUCAGA
UGACAGAAGAGAGUGAGCAC

For target prediction, the target input file was downloaded from Phytozome version 12, available from https://phytozome.jgi.doe.gov/pz/portal.html.
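The sed command used in this section assumes that every sequence occupies exactly one line. As a hedged alternative, the sketch below uses awk to join sequences that are wrapped over several lines, so that each FASTA record still yields a single line. The function name fasta_to_seq_lines is hypothetical and is not part of any of the tools described in this chapter.

```shell
# Print one sequence per line from a FASTA file, dropping header lines
# and joining sequences that are wrapped over several lines.
fasta_to_seq_lines() {
    awk '
        /^>/ { if (seq != "") print seq; seq = ""; next }  # new record: flush the previous one
        { gsub(/[ \t\r]/, ""); seq = seq $0 }              # strip whitespace, append chunk
        END { if (seq != "") print seq }                   # flush the last record
    ' "$1"
}
```

For single-line records, `fasta_to_seq_lines selected_Arabidopsis_miRNAs.fasta > selected_miRNAs_target_finder.fasta` should give the same result as the sed one-liner.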

3

Methods

3.1 Target Predictions Using TargetFinder

TargetFinder has been widely used for the prediction of miRNA targets in plants. It employs the Smith–Waterman alignment approach [8, 13]. For the prediction of targets using TargetFinder there are two options. The first option is to use the script targetfinder.pl, which requires an miRNA sequence and the target database as input. The second option is to use the script targetfinder_threads.pl, which allows for using FASTA formatted files for both the miRNAs and the targets.


Fig. 1 Table format of TargetFinder’s output

Fig. 2 Alignment format of TargetFinder’s output

Before invoking TargetFinder, add the directory containing the ssearch35_t aligner to your PATH, as follows:

CODE:
$ export PATH=/FASTA35_installation/bin:$PATH

For option 1, the script targetfinder.pl requires a miRNA sequence and the target file. To run it over all sequences in a loop, use the following command:

CODE:
$ cat selected_miRNAs_target_finder.fasta | while read -r line; do perl targetfinder.pl -s "$line" -d Athaliana_167_TAIR10.fasta -p table > "$line.target.txt"; done
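Because the loop names each output file after the miRNA sequence, identical sequences from different family members (such as the first and third lines of the sequence listing in Subheading 2.2) write to the same file. The sketch below is one possible way around this, offered as an illustration only: it prints the commands instead of running them and numbers the output files. The function name and the mirna_NNN.target.txt naming are assumptions; the targetfinder.pl options are those used in this section.

```shell
# Print (rather than run) one targetfinder.pl command per miRNA sequence.
# Blank lines are skipped, and outputs are numbered so that identical
# sequences from different family members do not overwrite each other.
print_targetfinder_cmds() {
    seq_file=$1
    target_db=$2
    n=0
    while IFS= read -r seq || [ -n "$seq" ]; do
        [ -z "$seq" ] && continue
        n=$((n + 1))
        printf 'perl targetfinder.pl -s %s -d %s -p table > mirna_%03d.target.txt\n' \
            "$seq" "$target_db" "$n"
    done < "$seq_file"
}
```

After inspecting the printed commands, they could be executed by piping the output to sh, for example `print_targetfinder_cmds selected_miRNAs_target_finder.fasta Athaliana_167_TAIR10.fasta | sh`.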

The predicted targets for the corresponding set of miRNAs will be displayed in table format (Fig. 1). For option 2, the script targetfinder_threads.pl requires a FASTA file with multiple miRNAs and the corresponding target file. It should be invoked as follows:

CODE:
$ perl targetfinder_threads.pl -f selected_Arabidopsis_miRNAs.fasta -d Athaliana_167_TAIR10.fasta -o selected_miRNAs_TAIR10.txt -c 4 -t 5 -p classic

The predicted targets for the corresponding set of miRNAs will be displayed in the classic alignment format (Fig. 2) (see Notes 1 and 2).

3.2 Target Predictions Using psRNATarget

psRNATarget [7] is an online miRNA target prediction resource (http://plantgrn.noble.org/psRNATarget/analysis?function=1), which identifies miRNA targets by applying a predefined scoring scheme to the complementary matching between miRNAs and their targets. It also allows for the prediction of miRNA targets based on target site accessibility, thus increasing the sensitivity of target prediction and reducing false positives. The current version of psRNATarget improves miRNA target prediction by implementing a new scoring scheme, which allows for the identification of canonical and noncanonical targets at


Fig. 3 Graphical user interface of psRNATarget

higher recall rates [7], by extending the seed region to nucleotides 2–13 and allowing two mismatches [1, 3]. psRNATarget offers an easy-to-use graphical user interface, which supports the following three analysis modes (Fig. 3): (1) user-defined small RNAs and preloaded transcripts, which allows for a comprehensive search of miRNA targets in species for which well-annotated transcripts are available; (2) user-defined transcripts and preloaded small RNAs, which allows for the identification of new targets using previously defined small RNAs; (3) user-defined small RNAs and transcripts, which extends the scope of psRNATarget to target prediction in nonmodel species (see Note 3).

3.3 Target Predictions Using TAPIR

TAPIR [5] was among the first Web-based approaches developed and was also made available as a stand-alone package. It performs miRNA target prediction based on the FASTA search module and miRNA–mRNA duplexes. In TAPIR, two search modes are available: (1) a fast search, which detects target interactions based on minimal free energy (MFE); (2) a precise search, which employs the tool RNAhybrid to check for structure optimization. Before searching for targets with TAPIR, add the TAPIR directory to your PATH, as follows:

CODE:
$ export PATH="${PATH}:/protocol/chapter/tapir/"

For the compilation of TAPIR, FASTA34 libraries are needed (see Subheading 2.1). The path of the FASTA34 directory should be exported as follows: CODE: export PATH=/Users/protocol/FASTA34/:$PATH

To search for miRNA targets using the FASTA fast search for the miRNAs in the file selected_Arabidopsis_miRNAs.fasta, run the following command:

CODE:
$ ./tapir_fasta --mir_file selected_Arabidopsis_miRNAs.fasta --target_file Athaliana_167_TAIR10.fasta --score 4 --mfe 0.8

(see Fig. 4) (see Note 4).

3.4 Target Predictions Using psRobot

psRobot was developed for the high-throughput identification of small RNAs with stem-loop precursors, and it also allows for the prediction of miRNA targets based on previously defined criteria [1, 2]. The unique features of psRobot allow for both transcript-based and degradome-based target prediction coupled with expression

Fig. 4 Output of TAPIR


datasets and for a target site multiplicity and conservation-based approach for a given set of miRNAs. To install psRobot on Linux (4.4.0-134-generic #160-Ubuntu x86_64):

CODE:
$ tar zxvf psRobot_v1.2.tar.gz
$ cd psRobot_v1.2
$ export PATH=/user/mfold/Bin64:$PATH
$ ./configure
$ make && sudo make install

Target prediction for a set of miRNAs can be performed as follows:

Strict mode
CODE:
$ psRobot_tar -s selected_Arabidopsis_miRNAs.fasta -t Athaliana_167_TAIR10.fasta -o target.txt -ts 3.0 -fp 1 -tp 31 -gl 1 -p 4 -gn 0

Moderate mode
CODE:
$ psRobot_tar -s selected_Arabidopsis_miRNAs.fasta -t Athaliana_167_TAIR10.fasta -o target.txt -ts 2.5 -fp 2 -tp 17 -gl 30 -p 4 -gn 1

Loose mode
CODE:
$ psRobot_tar -s ../selected_Arabidopsis_miRNAs.fasta -t ../Athaliana_167_TAIR10.fasta -o target.txt -ts 3.0 -fp 2 -tp 17 -gl 1 -p 4 -gn 2
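All three psRobot commands write to target.txt, so running them one after another in the same directory keeps only the last result. The sketch below shows one way to keep the modes apart. It is an illustration only: the function name and the target_<mode>.txt naming are assumptions, while the parameter values are copied from the three commands in this section.

```shell
# Map a mode name to the psRobot_tar parameter set used in this protocol.
psrobot_mode_args() {
    case "$1" in
        strict)   echo "-ts 3.0 -fp 1 -tp 31 -gl 1 -p 4 -gn 0" ;;
        moderate) echo "-ts 2.5 -fp 2 -tp 17 -gl 30 -p 4 -gn 1" ;;
        loose)    echo "-ts 3.0 -fp 2 -tp 17 -gl 1 -p 4 -gn 2" ;;
        *)        echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Print one command per mode, each with its own output file
# (target_<mode>.txt is an illustrative naming choice).
for mode in strict moderate loose; do
    echo "psRobot_tar -s selected_Arabidopsis_miRNAs.fasta -t Athaliana_167_TAIR10.fasta -o target_${mode}.txt $(psrobot_mode_args "$mode")"
done
```

The printed commands can be reviewed and then run individually.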

The output of psRobot is a tab-delimited file showing the alignment of the miRNAs to the corresponding targets along with the score value (Fig. 5).

3.5 Target Prediction Using TarHunter and TarHunterL

Fig. 5 Output of psRobot

miRNA target prediction using TarHunter and TarHunterL [10] relies on a cross-species conservation filter, which allows for the identification of cross-species conserved targets and targets with noncanonical binding. TarHunterL represents a simplified version


Fig. 6 Output of TarHunterL

of TarHunter, which predicts miRNA targets based on a cutoff score and can be invoked as follows:

CODE:
$ perl TarHunterL.pl -q selected_Arabidopsis_miRNAs.fasta -b Athaliana_167_TAIR10.fasta -o target.txt -T 4 -t 1 -f 4

Here, the parameter -f defines the cutoff score for target predictions. A score of 4 has been used across the target prediction methods described in this chapter for consistency. The delimited output, which includes the cleavage site and the start and end coordinates, can be viewed in Excel (Fig. 6). TarHunter provides several additional options, which enable the detection of species-specific miRNAs and of miRNAs conserved across different species.

4

Notes

1. There are two parameters that users can customize according to their needs: (a) -c, the prediction score cutoff value. The default value is 4, which works well for most target predictions. (b) -p, the output format of the small RNA–target pairs, which can be classic, gff, json, or table.
2. For insertion into a database or for alignment files, either the classic alignment format or the table format can be used.
3. psRNATarget searches can be optimized by specifying two parameters: (1) seed matching and (2) the expectation score, which determines the rate of false positives.
4. The score cutoff in TAPIR is a measure of the number of mismatches, gaps, and G:U pairs in the miRNA–mRNA duplexes.

References

1. Meyers BC, Axtell MJ, Bartel B, Bartel DP, Baulcombe D, Bowman JL, Cao X, Carrington JC, Chen X, Green PJ, Griffiths-Jones S, Jacobsen SE, Mallory AC, Martienssen RA, Poethig RS, Qi Y, Vaucheret H, Voinnet O, Watanabe Y, Weigel D, Zhu JK (2008) Criteria for annotation of plant MicroRNAs. Plant Cell 20:3186–3190
2. Rhoades MW, Reinhart BJ, Lim LP, Burge CB, Bartel B, Bartel DP (2002) Prediction of plant microRNA targets. Cell 110:513–520
3. Axtell MJ (2013) Classification and comparison of small RNAs from plants. Annu Rev Plant Biol 64:137–159
4. Dai X, Zhuang Z, Zhao PX (2011) Computational analysis of miRNA targets in plants: current status and challenges. Brief Bioinform 12:115–121
5. Bonnet E, He Y, Billiau K, Van de Peer Y (2010) TAPIR, a web server for the prediction of plant microRNA targets, including target mimics. Bioinformatics 26:1566–1568
6. Li F, Orban R, Baker B (2012) SoMART: a web server for plant miRNA, tasiRNA and target gene analysis. Plant J 70:891–901
7. Dai X, Zhuang Z, Zhao PX (2018) psRNATarget: a plant small RNA target analysis server (2017 release). Nucleic Acids Res 46(W1):W49–W54
8. Fahlgren N, Howell MD, Kasschau KD, Chapman EJ, Sullivan CM, Cumbie JS, Givan SA, Law TF, Grant SR, Dangl JL, Carrington JC (2007) High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent birth and death of MIRNA genes. PLoS One 2:e219
9. Wu HJ, Ma YK, Chen T, Wang M, Wang XJ (2012) PsRobot: a web-based plant small RNA meta-analysis toolbox. Nucleic Acids Res 40:W22–W28
10. Ma X, Liu C, Gu L, Mo B, Cao X, Chen X (2018) TarHunter, a tool for predicting conserved microRNA targets and target mimics in plants. Bioinformatics 34:1574–1576
11. Addo-Quaye C, Miller W, Axtell MJ (2009) CleaveLand: a pipeline for using degradome data to find cleaved small RNA targets. Bioinformatics 25:130–131
12. Zhang L, Qin C, Mei J, Chen X, Wu Z, Luo X, Cheng J, Tang X, Hu K, Li SC (2017) Identification of microRNA targets of Capsicum spp. using MiRTrans, a trans-Omics approach. Front Plant Sci 8:495
13. Fahlgren N, Carrington JC (2010) miRNA target prediction in plants. In: Meyers B, Green P (eds) Plant MicroRNAs, Methods in molecular biology (methods and protocols), vol 592. Humana Press

Chapter 6

Improving miRNA Target Prediction Using CLASH Data

Xiaoman Li and Haiyan Hu

Abstract

In this chapter, we present a computational method, TarPmiR, for miRNA target prediction. TarPmiR is based on emerging features of miRNA–target interactions learned from CLASH (crosslinking, ligation and sequencing of hybrids) data. First, we introduce miRNA target prediction, delineate existing methods for miRNA target prediction, and discuss their usage and limitations. Next, we describe available CLASH data, the learning of new miRNA binding features from CLASH data, and the usage of CLASH features in miRNA target prediction. Finally, we detail the computational pipeline of TarPmiR, discuss its performance compared with existing computational methods for miRNA target prediction, and present its installation and usage for miRNA target prediction. This chapter will facilitate the common understanding of CLASH data, new characteristics of miRNA–target interactions, and the use of the CLASH based miRNA target prediction tool TarPmiR.

Key words miRNA, CLASH data, miRNA target prediction, New features, TarPmiR

1

Introduction

The identification of microRNA (miRNA) target sites is crucial for understanding posttranscriptional gene regulation [1, 2]. miRNAs are short noncoding RNAs that can bind to mRNAs. The binding of miRNAs to their target mRNAs is often through imperfect base pairing, in which nucleotides in the seed regions of miRNAs (positions 2 to 7) are often complementary to the nucleotides in the target sites of the target mRNAs. The binding of miRNAs to their target sites can degrade the target mRNAs and/or repress their translation [1]. It is thus important to identify miRNA target sites. Experimental methods have been developed to identify miRNA target sites [1]. Early experiments are low throughput and tedious. However, it is through these experiments that scientists have significantly advanced the understanding of miRNA biogenesis, miRNA binding, and common characteristics of miRNA target sites [3, 4]. With the advent of next generation sequencing technologies, several types of crosslinking immunoprecipitation (CLIP) based methods have been developed [5–8]. These methods

Alessandro Laganà (ed.), MicroRNA Target Identification: Methods and Protocols, Methods in Molecular Biology, vol. 1970, https://doi.org/10.1007/978-1-4939-9207-2_6, © Springer Science+Business Media, LLC, part of Springer Nature 2019


generate thousands of miRNA target sequences, each several dozen nucleotides long, in one experiment. Although CLIP-based methods are high throughput and relatively cost-effective, we still cannot afford to perform such experiments under various experimental conditions. Computational methods are thus indispensable for miRNA target site prediction and identification. Dozens of computational methods have been developed to identify miRNA target sites [9–18]. The majority of existing methods are based on conventional features of miRNA target sites. These features include seed matching, conservation, local AU content, and miRNA–target RNA binding energy. Almost all methods require seed matching for miRNA target site prediction [1]. However, recent CLIP-based experiments such as photoactivatable ribonucleoside-enhanced CLIP (PAR-CLIP) and cross-linking ligation and sequencing of hybrids (CLASH) show that seed matching may be unnecessary for certain miRNAs [1, 5, 6]. In other words, conventional features cannot represent the common features of all miRNA target sites. Several recent methods have thus been developed to predict target sites directly from CLIP-based experimental data or with features learned from CLIP-based experimental data. Among them, a new method, TarPmiR, has used 13 features learned from CLASH data to predict miRNA target sites, which shows superior performance to three existing methods [9]. In this chapter, we present how to apply TarPmiR for miRNA target site prediction. We choose TarPmiR because it not only shows better recall and better precision when compared with popular methods but also complements the majority of existing methods that are based on conventional features of miRNA target sites [9]. We will briefly describe the CLASH data used and the 13 features of target sites learned from the CLASH data. We then summarize the TarPmiR method and its performance when compared with three existing methods. 
Next, we describe the installation of TarPmiR, specify the input data to and output results from TarPmiR, and show how to run TarPmiR. Finally, we emphasize several additional notes that may help with the application of TarPmiR. We hope that this chapter will provide an easy but thorough guidance to the application of TarPmiR to miRNA target site prediction.

2

Materials and Methods

2.1 The CLASH Dataset

The CLASH dataset is from Helwak et al. [6] and contains 18,514 target sites of 399 miRNAs in HEK293 cells [19, 20]. This is the only CLASH dataset in human that is publicly accessible. Unlike datasets used to train existing popular methods, in which often only a small number of target sites without their corresponding miRNAs are available, this CLASH dataset provides target sites and their


corresponding miRNAs. Moreover, a large number of target sites are present for the same miRNAs. For instance, among the 399 miRNAs in the dataset, there are 79 miRNAs with at least 50 target sites each. If relative stable target sites (with low binding energy, ”. In other words, this file is in the FASTA format. TarPmiR will identify target sites of the miRNAs listed in this file in input mRNAs (see Note 2). The second parameter is similar to the first one but it indicates the input mRNA file. This file specifies the mRNAs that will be considered as possible targets of the input miRNAs. It is preferred for users to input expressed mRNAs under the corresponding conditions (see Note 2). The -m parameter specifies the trained random forests model

Running TarPmiR

Table 1 Performance comparison of three methods with TarPmiR

Method             Dataset   Precision   Recall
TarPmiR            I         0.029       0.86
TarPmiR            II        0.04        0.90
TarPmiR            III       0.04        0.97
TarPmiR            IV        0.13        0.53
TargetScan V2015   I         0.02        0.66
TargetScan V2015   II        0.03        0.65
TargetScan V2015   III       0.03        0.93
TargetScan V2015   IV        0.13        0.37
miRanda            I         0.02        0.60
miRanda            II        0.03        0.55
miRanda            III       0.02        0.82
miRanda            IV        0.12        0.42
miRmap             I         0.01        0.65
miRmap             II        0.01        0.59
miRmap             III       0.02        0.71
miRmap             IV        0.08        0.33

I and II are human PAR-CLIP datasets from Hafner et al. and Kishore et al., respectively, each with 120 miRNAs. III is a HITS-CLIP dataset in mouse with 119 miRNAs from Chi et al. IV is a dataset collected from Tarbase 7.0 with 100 miRNAs. See the TarPmiR paper for details

that will be used to make the target site prediction. The current TarPmiR package has only a human model available, although it also works well for mouse miRNA target site prediction (see Note 1) [9]. The -p parameter specifies the probability cutoff above which a candidate site is reported as a target site. The default value of -p is 0.5, which resulted in the largest recall during the testing of TarPmiR. For higher precision, users can choose larger cutoffs. When the above command is run, TarPmiR writes its predictions to a file. In this file, the first four columns of each row contain the miRNA name, the mRNA name, the target site location, and the miRNA/mRNA binding probability. Since TarPmiR is trained with both conventional and new features, many of its predicted target sites may not be predicted by popular tools such as TargetScan and miRanda [2, 14]. This does not mean that these unshared target sites are false predictions. Users should pay attention to the values of the 13 features for each binding, which are provided in the fifth to the 17th columns of the output file. If a target site does not have "good" values for conventional features, it is unlikely to be predicted by conventional tools, though it may still be a bona fide site. In this sense, TarPmiR is complementary to existing tools, as it can pick up the majority of the target sites predicted by popular tools and also include novel sites (see Note 3) [9].
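Predictions can also be filtered after the run by applying a stricter cutoff to the binding probability in the fourth column. The sketch below is an illustration only: it assumes that the output file is tab-delimited and has no header line, neither of which is stated here, and the function name filter_by_probability is hypothetical.

```shell
# Keep rows whose binding probability (column 4) is at least the cutoff.
# Assumes a tab-delimited file without a header line.
filter_by_probability() {
    awk -F '\t' -v cutoff="$2" '$4 + 0 >= cutoff + 0' "$1"
}
```

For example, `filter_by_probability predictions.txt 0.8 > high_confidence.txt` would keep only sites with a probability of 0.8 or more (both file names are placeholders).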

3

Notes

1. Although TarPmiR is based on the 13 features learned from the CLASH dataset, it can easily be extended to consider other features and retrained using other datasets. The TarPmiR package includes an extension directory and a concrete example that describes how to extend TarPmiR.
2. It is preferable to run TarPmiR with the miRNAs and mRNAs expressed under a specific condition. This is because the binding of miRNAs to mRNAs is condition-specific, and the input of unrelated miRNAs or mRNAs will likely add unnecessary complexity to the target site identification problem [1, 23]. Moreover, running TarPmiR on a large number of miRNAs and mRNAs is slow, as the tool considers target sites in the entire mRNA sequences using a machine learning approach. It takes TarPmiR ~7940 CPU seconds to predict target sites of 20 miRNAs in 400 mRNA sequences, each 2000 nucleotides long. If it is infeasible to obtain active miRNAs and mRNAs, users can choose top miRNAs, top mRNAs, or a list of miRNAs that are of interest.
3. The miRNA–target site interactions predicted by TarPmiR should be considered "static," in the sense that the predicted target sites are not condition-specific. Although inputting active miRNAs and active mRNAs will help to reduce the number of predicted non-condition-specific interactions, other factors such as the abundance of miRNAs, the abundance of mRNAs, and the competition and cooperation of miRNAs and mRNAs will also affect the interactions. Users may want to intersect the results from TarPmiR with those from a recently published tool, CCmiR [24], to obtain a better set of condition-specific target sites. In addition, the target sites predicted by TarPmiR only suggest that the corresponding miRNAs may bind the sites; such binding may not have any functional consequence.

References

1. Bartel DP (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116:281–297
2. Agarwal V, Bell GW, Nam JW et al (2015) Predicting effective microRNA target sites in mammalian mRNAs. eLife 4
3. Kuhn DE, Martin MM, Feldman DS et al (2008) Experimental validation of miRNA targets. Methods 44:47–54
4. Thomson DW, Bracken CP, Goodall GJ (2011) Experimental strategies for microRNA target identification. Nucleic Acids Res 39:6845–6853
5. Hafner M, Landthaler M, Burger L et al (2010) Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141:129–141
6. Helwak A, Kudla G, Dudnakova T et al (2013) Mapping the human miRNA interactome by CLASH reveals frequent noncanonical binding. Cell 153:654–665
7. Moore MJ, Scheel TK, Luna JM et al (2015) miRNA-target chimeras reveal miRNA 3′-end pairing as a major determinant of Argonaute target specificity. Nat Commun 6:8864
8. Chi SW, Zang JB, Mele A et al (2009) Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 460:479–486
9. Ding J, Li X, Hu H (2016) TarPmiR: a new approach for microRNA target site prediction. Bioinformatics 32(18):2768–2775
10. Lewis BP, Burge CB, Bartel DP (2005) Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120:15–20
11. Lewis BP, Shih I-H, Jones-Rhoades MW et al (2003) Prediction of mammalian microRNA targets. Cell 115:787–798
12. Ule J, Jensen KB, Ruggiu M et al (2003) CLIP identifies Nova-regulated RNA networks in the brain. Science 302:1212–1215
13. Vejnar CE, Zdobnov EM (2012) MiRmap: comprehensive prediction of microRNA target repression strength. Nucleic Acids Res 40:11673–11683
14. Enright AJ, John B, Gaul U et al (2004) MicroRNA targets in Drosophila. Genome Biol 5:R1
15. Paraskevopoulou MD, Georgakilas G, Kostoulas N et al (2013) DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows. Nucleic Acids Res 41(Web Server issue):W169–W173
16. Loher P, Rigoutsos I (2012) Interactive exploration of RNA22 microRNA target predictions. Bioinformatics 28:3322–3323
17. Friedman RC, Farh KK-H, Burge CB et al (2009) Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 19:92–105
18. Wang X (2010) Computational prediction of microRNA targets. Methods Mol Biol 667:283–295
19. Ding J, Li X, Hu H (2014) MicroRNA modules prefer to bind weak and unconventional target sites. Bioinformatics. btu833
20. Wang XW (2014) Composition of seed sequence is a major determinant of microRNA targeting patterns. Bioinformatics 30:1377–1383
21. Kishore S, Jaskiewicz L, Burger L et al (2011) A quantitative analysis of CLIP methods for identifying binding sites of RNA-binding proteins. Nat Methods 8:559–564
22. Vlachos IS, Paraskevopoulou MD, Karagkouni D et al (2015) DIANA-TarBase v7.0: indexing more than half a million experimentally supported miRNA:mRNA interactions. Nucleic Acids Res 43:D153–D159
23. Wang Y, Li X, Hu H (2011) Transcriptional regulation of co-expressed microRNA target genes. Genomics 98:445–452
24. Ding J, Li X, Hu H (2017) CCmiR: a computational approach for competitive and cooperative microRNA binding prediction. Bioinformatics

Chapter 7

Integration of Omics Data to Identify Cancer-Related MicroRNA

Luciano Cascione

Abstract

MicroRNAs regulate genes involved in various biological processes and may play oncogenic or tumor suppressive roles. Many studies have investigated the relationships between microRNAs and their target genes using mRNA and microRNA expression data. Integrating different types of molecular data could lead to a better understanding of the regulatory network of disease-causing pathways. For this potential to be fully realized, methods for properly integrating omics data are necessary. Here, the computational methods for addressing these challenges are described, and key considerations for analyzing and interpreting profiling data are discussed.

Key words miRNA expression, Gene expression, RNAseq, Correlation, Integrative analysis

1

Introduction

MicroRNAs (miRNAs) are small noncoding RNAs (20–24 nucleotides long) that regulate gene expression posttranscriptionally by recognizing complementary sites in the 3′ untranslated region (3′-UTR) of mRNAs [1]. miRNAs bind to mRNA sequences, leading to a reduction in mRNA levels or to the suppression of translational efficiency [2]. These small noncoding molecules play a role in diverse biological pathways through the effects of individual miRNAs on multiple mRNAs; their alteration may therefore affect a number of cellular processes, including apoptosis, proliferation, cell cycle, migration, and metabolism [3, 4]. The importance of miRNAs for cancer development and progression has been demonstrated [5, 6]. Many miRNAs either contribute to or repress the cancer phenotype by inhibiting the expression of tumor suppressors or oncogenes, respectively. Generally, oncogenic miRNAs (oncomiRs) are overexpressed in cancers, while tumor suppressive miRNAs are underexpressed [7]. Thus, miRNAs have classically been categorized as either oncogenic or tumor suppressive. miRNAs mainly regulate function through their targets, and

Alessandro Laganà (ed.), MicroRNA Target Identification: Methods and Protocols, Methods in Molecular Biology, vol. 1970, https://doi.org/10.1007/978-1-4939-9207-2_7, © Springer Science+Business Media, LLC, part of Springer Nature 2019


elucidating miRNA–target interactions is the subject of intense ongoing research. High-throughput transcriptomics now enables us to measure miRNA and gene expression from the same samples at a reasonable price. One of the most common aims of this type of experimental profiling is to provide new insights into the pathogenesis of cancer and to identify important biomarkers and therapeutic targets, as well as to investigate the role of miRNAs by integrating miRNA and gene expression. A commonly used approach for this task assumes that the expression levels of miRNAs and their target mRNAs are negatively correlated. It therefore selects potential miRNA–target pairs that are negatively correlated and that have been identified as interacting by sequence-based target prediction or experimental methods. Furthermore, the relationships between the miRNome and the target genes are complex and form a complicated network. However, it is not clear how best to perform this type of integrated analysis, and the task remains a computational challenge requiring bioinformatics expertise. A step-by-step procedure is described and discussed in this chapter.
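The negative-correlation criterion can be illustrated with a small sketch. It computes the Pearson correlation between the expression of one miRNA and one candidate target across samples and reports whether the pair would be kept. The input layout (one sample per line, miRNA expression followed by mRNA expression), the function names, and the default threshold of -0.5 are assumptions made for illustration and are not taken from this chapter.

```shell
# Pearson correlation of two whitespace-separated columns read from stdin:
# column 1 = miRNA expression, column 2 = mRNA expression, one sample per line.
pearson() {
    awk '
        { n++; sx += $1; sy += $2; sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2 }
        END {
            den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
            if (n < 2 || den == 0) { print "NA"; exit }
            printf "%.4f\n", (n * sxy - sx * sy) / den
        }
    '
}

# Succeed only if the correlation is below a (hypothetical) threshold,
# given as the first argument and defaulting to -0.5.
is_anticorrelated() {
    r=$(pearson)
    [ "$r" != "NA" ] && awk -v r="$r" -v t="${1:--0.5}" 'BEGIN { exit !(r < t) }'
}
```

A pair would be retained only if `is_anticorrelated` succeeds and the interaction is also supported by target prediction or experimental evidence, as described in the text.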

2

Materials

The analyses described here are designed to run on a standard Linux system through the Unix command line. The maximal memory usage depends on the size of the datasets but is usually less than 30 GB for typical depths of sequencing. Some of the programs used are multithreaded and will therefore complete faster if run on multiple cores.

2.1 Genomic Sequencing and Annotation Files

1. A FASTA file containing the hg38 (most recent) version of the human genome can be downloaded from: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz. The UCSC Genome Browser also has genome files for many other species. Other rich genomic resources are ENSEMBL (http://ensemblgenomes.org/) and the NCBI website (http://www.ncbi.nlm.nih.gov/assembly/).
2. The latest GENCODE annotation in GTF format: http://www.gencodegenes.org/.
3. The latest mature.fa.gz from http://www.mirbase.org/ftp.shtml containing all the mature miRNA sequences.

2.2 Software Packages

1. BWA: It aligns read to a reference genome using the Burrows–Wheeler transform to create an index of the genome. http:// bio-bwa.sourceforge.net/index.shtml. 2. STAR: It is a fast spliced aware read aligner for RNA-seq data. http://smithlabresearch.org/software/preseq/.

3. samtools: A toolkit of utilities for postprocessing alignments in SAM/BAM format. It can sort, index, and filter alignments, and it also provides a pileup function. http://www.htslib.org/.
4. FastQC: Available both as a GUI (requires Java) and as a command line program, it quickly produces summary statistics for checking the quality of your FASTQ files. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
5. Cutadapt: Trims raw reads before mapping to remove adapters and poor-quality reads that might reduce alignment speed and accuracy. http://cutadapt.readthedocs.io/en/stable/index.html.
6. featureCounts: Counts reads mapping to genomic features such as genes, exons, promoters, and genomic bins. http://bioinf.wehi.edu.au/featureCounts/.
7. anamiR: An R package to identify potential miRNA–target gene interactions using miRNA and mRNA expression data.
8. prodlim: An R package that implements nonparametric estimators for censored event history (survival) analysis. https://cran.r-project.org/package=prodlim.
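Before starting, it can be useful to confirm that the command line tools listed above are installed. The following is a minimal sketch; the function name check_tools is hypothetical, and the executable names in the commented usage line are assumed to match the usual installation names of each tool.

```shell
# Hypothetical helper: report which of the given executables are missing from PATH.
# Returns 0 when all are found and 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed executable names; adjust them to the local installation.
# check_tools bwa STAR samtools fastqc cutadapt featureCounts
```

The R packages (anamiR, prodlim) are not covered by this check and have to be verified from within R.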

3 Methods

Raw sequencing reads from the Illumina platform are generated in a binary base call (BCL) file format; therefore, the first preprocessing step involves the conversion of BCL files to the universally accepted FASTQ format. This initial step should be done by the sequencing facility and is not covered in this chapter (see Note 1). Our first step in data preprocessing is to eliminate the Illumina adapters and the poly(A) or poly(T) sequences (added during cDNA and library preparation) that may be present at the ends of the reads. We recommend Cutadapt, though other tools such as Trimmomatic and FASTX-Toolkit can be used. Quality control (QC) is crucial: software such as FastQC provides key information about the quality of sequencing reads, including the quality score distribution along the reads, GC content, read length, and level of sequence duplication. Once the FASTQ files have been validated, sequence alignment and bioinformatics analysis are performed based on the goals of the experiment. The general outline of the pipeline is presented in Fig. 1.

3.1 Preparation of Genomic Files

1. Download and uncompress the human genome sequence file (hg38) and the comprehensive gene annotation (gtf file):

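    mkdir genomes; cd genomes
    mkdir hg38; cd hg38
    # GTF file of GENCODE release 28 (release page: https://www.gencodegenes.org/human/release_28.html)
    wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.annotation.gtf.gz
    gunzip gencode.v28.annotation.gtf.gz
    mkdir sequence; cd sequence
    wget ftp://ftp.ensembl.org/pub/release-77/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
    gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

Fig. 1 Summary of the analysis pipeline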

2. Create STAR indexes (see Note 2): This is an initial, one-time step that prepares the genome index files used during read mapping. Run it from the directory that contains the genomes folder, using the uncompressed FASTA and GTF files:

    mkdir genomes/hg38/star-indexes
    STAR --runThreadN 4 --runMode genomeGenerate \
         --genomeDir genomes/hg38/star-indexes \
         --genomeFastaFiles genomes/hg38/sequence/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
         --sjdbGTFfile genomes/hg38/gencode.v28.annotation.gtf

3.2 Quality Control

1. Evaluate sequencing quality with FastQC and look at its report. Keep in mind that individual modules of the report may raise a warning or a failure. This does not mean your data are unusable. In most cases poor quality reads can be eliminated by subsequent cleaning steps without losing a large amount of sequence (see Note 3).

    mkdir fastqc-Sample_1
    fastqc Sample_1.R1.fastq.gz Sample_1.R2.fastq.gz -o fastqc-Sample_1

3.3 Reads Mapping

1. Decompress the reads, trim the sequences to the desired read length (if necessary), align, convert to BAM, and sort the BAM file. This can be carried out as follows:

    # Decompress a copy of the reads for counting.
    gunzip -c SAMPLE.fastq.gz > SAMPLE.fastq
    # Count raw reads: divide the number of lines by 4
    # (each read is represented by four lines in a FASTQ file).
    wc -l SAMPLE.fastq
    # Mapping step
    PROJECT_DIR=$(pwd)
    mkdir star_alignment
    cd star_alignment
    GENOME_DIR=$PROJECT_DIR/genomes/hg38/star-indexes
    GTF_FILE=$PROJECT_DIR/genomes/hg38/gencode.v28.annotation.gtf
    # Paired-end reads of the sample (file names as in Subheading 3.2).
    R1=$PROJECT_DIR/Sample_1.R1.fastq.gz
    R2=$PROJECT_DIR/Sample_1.R2.fastq.gz
    STAR --runThreadN 10 --genomeDir $GENOME_DIR --readFilesIn $R1 $R2 \
         --sjdbGTFfile $GTF_FILE --outSAMtype BAM SortedByCoordinate \
         --quantMode TranscriptomeSAM GeneCounts --readFilesCommand gunzip -c \
         --outFileNamePrefix Sample_1_

2. Index the BAM files with samtools:

    samtools index Sample_1_Aligned.sortedByCoord.out.bam
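The read count described in step 1 (number of lines divided by four) can also be obtained without writing the decompressed file to disk. The following is a minimal sketch; the function name count_fastq_reads is hypothetical, and the code assumes the standard four-line FASTQ record layout with unwrapped sequences.

```shell
# Hypothetical helper: number of reads in a gzipped FASTQ file.
# Assumes four lines per read (header, sequence, separator, qualities).
count_fastq_reads() {
    gunzip -c "$1" | awk 'END { print NR / 4 }'
}

# Example call on the files used in this section:
# count_fastq_reads Sample_1.R1.fastq.gz
```

For paired-end data the R1 and R2 files should give the same number; a mismatch usually points to a truncated or corrupted file.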

The same procedure is carried out in parallel for all samples in your cohort (see Note 4). The resulting BAM files can be used as input for the next steps. With the --quantMode GeneCounts option, STAR counts the number of reads per gene while mapping and writes them to a file called Sample_1_ReadsPerGene.out.tab. The counts coincide with those produced by htseq-count with default parameters. This option requires annotations (GTF or GFF, supplied with the --sjdbGTFfile option) either at the genome generation step or at the mapping step.

3.4 MicroRNA Read Alignment

Preprocessing the reads coming from any small-RNA sequencing experiment is an essential step. miRNAs tend to be shorter than the reads generated by sequencers, which means that sequencing continues past the biological RNA into the 3′ adapter of the sequencing protocol. This nonbiological sequence at the end of the read makes it difficult for the aligner to map the read, so it is trimmed off first (see Note 5).

3.4.1 Adapter Trimming

There are many programs available for trimming adapters. We will run a program called cutadapt by typing the following command, where ADAPTER stands for the 3′ adapter sequence of your library preparation kit:

    cutadapt -a ADAPTER --trimmed-only --minimum-length=18 Sample_1.fastq > Sample_1.trimmed.fastq

The program trims the adapter sequence given with -a from the reads in the FASTQ file and prints the results to a new file, Sample_1.trimmed.fastq. It also applies the following filters: only reads in which the adapter was trimmed are printed to the output, and only reads that are at least 18 nucleotides long after trimming are printed.

3.4.2 Alignment

Reads are aligned using BWA, typically with the following settings (see Note 6):

    # miRNA FASTQ file (set project_dir to your project directory)
    input_file=${project_dir}/Sample_1.smallRNA.fastq
    # Genomic reference FASTA file
    ref=genomes/hg38/sequence/Homo_sapiens.GRCh38.dna.primary_assembly.fa
    # Build the BWA index of the reference (one-time step)
    bwa index $ref
    bwa aln $ref $input_file > Sample_1.sai
    bwa samse -n 10 $ref Sample_1.sai $input_file > Sample_1.sam

Once the reads have been mapped to the genome, it is often informative to view them with IGV or some other genome browser. For this we need to convert the SAM file to BAM (its compressed binary format), sort it by coordinate, and index it:

    # Indexing requires a coordinate-sorted BAM file.
    samtools view -Sb Sample_1.sam | samtools sort -o Sample_1.bam -
    samtools index Sample_1.bam

Then, open IGV and load the BAM file together with the file containing all microRNA annotations, hsa.gff3, which can be obtained with the command:

    wget ftp://mirbase.org/pub/mirbase/CURRENT/genomes/hsa.gff3

To load a file, first select the correct genome (“Human hg38”) in the top left menu. Then go to the File menu, select “Load from file”, and select the files described above. Type the name of your favorite microRNA (e.g., “mir-155”) to go to that locus.

3.4.3 Quantification of MicroRNAs

We can now summarize the mapped reads to see which microRNAs are expressed in the different samples, and to do some global comparisons. For this, we use the SAM files created by BWA and quantify the known microRNAs in miRBase by counting the reads mapped to their loci.

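featureCounts is a useful program for counting reads mapping to different genomic regions. Run it as follows, for all SAM files at once, giving the miRNA annotation file (for example, the hsa.gff3 file from miRBase) after -a, the name of the output file after -o, and then the SAM files:

    featureCounts -t miRNA -g Name -O -s 1 -M -a ANNOTATION_FILE -o OUTPUT_FILE SAM_FILES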

Here we only look at loci that are “miRNA”, and we use the “Name” attribute to name the loci. The -O flag tells the program that reads mapping to several overlapping microRNAs should be assigned to all of them. The -s 1 flag tells the program to count only reads that map to the same strand as the microRNA, and the -M flag makes sure that multimapping reads are counted. The output is a table with the number of reads mapping to each microRNA.

Once the reads mapping to each microRNA have been counted, we can analyze the microRNA expression levels using R. To compare expression levels from different libraries, the read counts have to be normalized to compensate for different sequencing depths. For this we will use TMM normalization. This method uses a trimmed mean of M-values (TMM) between each pair of samples to find a set of scaling factors for the library sizes that minimizes the log-fold changes between the samples for most genes. To use this method we need to load the edgeR package. edgeR is an R package with many useful functions for normalizing RNA-seq data and finding differentially expressed genes. Here we will only use one of its normalization functions. You will see a different prompt, since you are now typing commands in the R environment (see Note 7): R count.data