Next-Generation Sequencing & Molecular Diagnostics 9781780841861, 9781780841885


English Pages 117 Year 2013


Next-Generation Sequencing & Molecular Diagnostics

Editor
Dimitrios H Roukos
Ioannina University School of Medicine, Ioannina, Greece

Published by Future Medicine Ltd
Future Medicine Ltd, Unitec House, 2 Albert Place, London N3 1QB, UK
www.futuremedicine.com

ISSN: 2047-332X
ISBN: 978-1-78084-188-5 (print)
ISBN: 978-1-78084-187-8 (epub)
ISBN: 978-1-78084-186-1 (pdf)

© 2012 Future Medicine Ltd

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without prior written permission of the copyright holder.

British Library Cataloguing-in-Publication Data. A catalogue record for this book is available from the British Library.

Although the author and publisher have made every effort to ensure accuracy of published drug doses and other medical information, they take no responsibility for errors, omissions, or for any outcomes related to the book contents and take no responsibility for the use of any products described within the book. No claims or endorsements are made for any marketed drug or putative therapeutic agent under clinical investigation. Any product mentioned in the book should be used in accordance with the prescribing information prepared by the manufacturers, and ultimate responsibility rests with the prescribing physician.

Content Development Editor: Duc Hong Le
Senior Manager, Production & Design: Karen Rowland
Head of Production: Philip Chapman
Managing Production Editor: Harriet Penny
Production Editor: Georgia Patey
Assistant Production Editors: Samantha Whitham & Gemma King
Graphics & Design Manager: Hannah Morton

Contents

Translating complex genomic discoveries into molecular diagnostic tests
Dimitrios H Roukos

Integrating NGS and third-generation sequencing technologies into clinical genomic medicine
Margaret Tzaphlidou, Costas Papaloukas & Dimitrios H Roukos

Next-generation sequencing in cancer research & diagnostics
Chee-Seng Ku & David N Cooper

Clinical relevance of miRNAs in cancer
Hong Kiat Ng, Chee-Seng Ku, David N Cooper & Richie Soong

Comprehensive whole genome and transcriptome analysis for novel diagnostics
Artur Silva, Adriana R Carneiro, Flávia Aburjaile, Luis C Guimarães, Rommel TJ Ramos, Thiago LP Castro, Vinicius Abreu, Wanderson M Silva, Paula Schneider & Vasco Azevedo

Genome function, ChIP-Seq and personalized diagnostics
Chandra S Pareek & Andrzej Tretyn

Diagnostic perspectives in the epoch of next-generation sequencing
Danbin Xu, Juan Caballero, Gustavo Glusman & Qiang Tian

Index

About the Editor

Dimitrios H Roukos
Dimitrios H Roukos is Associate Professor and Scientific Director of the Centre for Biosystems & Genomic Network Medicine (CBS.GenNetMed), an innovative research centre at Ioannina University, Greece. Over the last 5 years, he has moved from traditional reductionist medicine to NGS-based integrated deep genome sequencing, functional genome landscape and network biology-based medicine. He has published over 200 papers (ISI/PubMed/Scopus) with an overall impact factor of over 1400 and more than 6000 citations (Scopus). He is an evaluator for multiple national and international projects and currently evaluates large-scale research project proposals (EC-FP7-Health 2011/2012; France [ANR]; and the Italian Ministry for Education, University and Research) in the area of Innovation in Life Science. He is on the editorial board of more than 20 high-impact-factor journals.


© 2012 Future Medicine www.futuremedicine.com

Foreword

Translating complex genomic discoveries into molecular diagnostic tests
Dimitrios H Roukos

Changes in the structure, regulation and function of the human genome underlie disease pathogenesis. Inherited and, more frequently, somatically acquired mutations, in conjunction with deregulation of gene expression and signal transduction pathways, transform normal cells into disease cells. If complex intracellular signaling networks and cell–cell interactions fail to restore normal cell function and tissue and organ homeostasis through biochemical signals, disease becomes evident with clinical symptoms [1–4]. Next-generation sequencing (NGS) technologies, coupled with advanced microarrays and referred to together as high-throughput technologies (HTs), enable the identification of new mutations and a deeper exploration of the principles orchestrating the function of genes, genomes and cells. This low-cost capacity for fast and accurate DNA sequencing at a genome-wide scale has revolutionized life science and research. Applying HTs together with advanced bioinformatics, using systems and computational biology approaches, can transform raw data into biochemical pathways. The field of genome sciences is rapidly evolving into clinical genomics aimed at improving healthcare. The deeper HT-based insight into the clinical genome now offers the potential for molecular characterization of the genome in health and disease, and raises exciting opportunities for medical applications, such as molecular diagnostics and therapeutic target discovery for most major diseases [5–12]. If these complex genomic discoveries are translated into clinical medicine, substantial benefits for the population may emerge as we move towards clinical personalized health management [2].

doi:10.2217/EBO.12.305


This book explores the power of, and the challenges in, integrating NGS into the clinic. Deep whole-genome sequencing allows the identification of all classes of mutations, including translocations that may affect disease initiation and progression in some patients. RNA sequencing and chromatin immunoprecipitation sequencing at a genome-wide scale provide the capacity for transcriptome analysis and for identifying transcription factor binding. In addition, NGS in conjunction with modern microarrays can expand the analysis of epigenome modifications, such as DNA methylation and histone modifications, as well as noncoding RNAs, providing a broader picture of the biology underlying health and disease. In Chapter 1, Tzaphlidou and colleagues describe the technological advances from NGS, or second-generation sequencing, to third- and fourth-generation nanopore sequencing platforms. In Chapter 2, Ku and colleagues discuss the opportunities and challenges in translating NGS-based raw data into biologically meaningful signals for cancer diagnostic applications. In Chapter 3, Ng and colleagues describe how gene expression profiling of mRNAs and miRNAs in cancer could be used to develop multigene assays with prognostic and predictive value for cancer patients in the clinic. In Chapter 4, Silva and colleagues describe the opportunities that NGS and microarrays offer for microbiome analysis, for identifying the bacterial pathogenic mechanisms underlying drug resistance and for developing new, more effective antibacterial drugs. In Chapter 5, Pareek and Tretyn focus on functional genomics and the complex functional elements of the human genome, describing the advantages and challenges of studying DNA–protein binding events using chromatin immunoprecipitation sequencing. Transcriptome analysis now attracts major biomedical research interest because of its exciting medical applications.
However, understanding genome regulation and function represents one of the biggest challenges in biology. The recent reports of the Encyclopedia of DNA Elements (ENCODE) project reveal that, despite advances, only approximately 10% of the work towards understanding genome function has been completed [13–15]. In the last chapter, Xu and colleagues describe how deeper insights into the genome landscape could be translated into clinical molecular diagnostics. We have entered the NGS-based and network-biology-based era [16] of genomic network medicine [1–4]. Exciting perspectives are now being created. However, reaching clinical success will require innovation [17], because long-term and extensive research work is needed to understand the principles of genome regulation and function. Despite the lack of a comprehensive view of the genome landscape, innovation in clinical genomics and dynamic network biology [17] can result in novel molecular diagnostics and robust molecular subtypes in various diseases for personalizing treatment decisions.

Financial & competing interests disclosure
The author has no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties. No writing assistance was utilized in the production of this manuscript.

References

1 Green ED, Guyer MS; National Human Genome Research Institute. Charting a course for genomic medicine from base pairs to bedside. Nature 470(7333), 204–213 (2011).
2 Chin L, Andersen JN, Futreal PA. Cancer genomics: from discovery science to personalized medicine. Nat. Med. 17, 297–303 (2011).
3 Barabási AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 12, 56–68 (2011).
4 Roukos DH. Networks medicine: from reductionism to evidence of complex dynamic biomolecular interactions. Pharmacogenomics 12(5), 695–698 (2011).
5 Stephens PJ, Tarpey PS, Davies H et al. The landscape of cancer genes and mutational processes in breast cancer. Nature 486(7403), 400–404 (2012).
6 Ellis MJ, Ding L, Shen D et al. Whole-genome analysis informs breast cancer response to aromatase inhibition. Nature 486(7403), 353–360 (2012).
7 Banerji S, Cibulskis K, Rangel-Escareno C et al. Sequence analysis of mutations and translocations across breast cancer subtypes. Nature 486(7403), 405–409 (2012).
8 Shah SP, Roth A, Goya R et al. The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature 486(7403), 395–399 (2012).
9 Curtis C, Shah SP, Chin SF et al. The genomic and transcriptomic architecture of 2000 breast tumours reveals novel subgroups. Nature 486(7403), 346–352 (2012).
10 Ross-Innes CS, Stark R, Teschendorff AE et al. Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature 481(7381), 389–393 (2012).
11 Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487(7407), 330–337 (2012).
12 Seshagiri S, Stawiski EW, Durinck S et al. Recurrent R-spondin fusions in colon cancer. Nature 488(7413), 660–664 (2012).
13 Ecker JR, Bickmore WA, Barroso I, Pritchard JK, Gilad Y, Segal E. Genomics: ENCODE explained. Nature 489(7414), 52–55 (2012).
14 Pennisi E. Genomics. ENCODE project writes eulogy for junk DNA. Science 337(6099), 1159–1161 (2012).
15 Gerstein MB, Kundaje A, Hariharan M et al. Architecture of the human regulatory network derived from ENCODE data. Nature 489(7414), 91–100 (2012).
16 Ideker T, Krogan NJ. Differential network biology. Mol. Syst. Biol. 8, 565 (2012).
17 Roukos DH. Disrupting cancer cells biocircuits with interactome-based drugs: is ‘clinical’ innovation realistic? Expert Rev. Proteomics 9(4), 349–353 (2012).

About the Authors

Margaret Tzaphlidou
Margaret Tzaphlidou is Professor and Head of the Medical Physics Department, and Dean of the Medical School, University of Ioannina, Greece. She was awarded the Alexandros Onassis Fellowship and, after fellowships awarded by the British Council, a senior fellowship from NATO. She has been a project evaluator for the European Commission, INTAS, the Greek Ministry of Research and Technology, the Greek Ministry of Education and the Italian Ministry for Education, University and Research. She has also been involved in the activities of the UNESCO Chair in Life Sciences – Life Sciences International Higher Educational Schools (LSIHES).

Costas Papaloukas
Costas Papaloukas is Assistant Professor of Bioinformatics at the Department of Biological Applications and Technology, University of Ioannina. He is also a member of the scientific committee of the Center for BioSystems & Genomic Network Medicine (CBS.GenNetMed, University of Ioannina), a member of the Academic Advisers Network at the University of Ioannina, a member of the Epirus Research Foundation and a collaborator of the Scientific and Technological Park of Epirus. His research interests include computational analysis of proteins, biomedical signal processing and analysis, tissue classification and clinical decision support systems. He has also worked on DNA interaction modelling and analysis, and on machine learning in environmental health.

Dimitrios H Roukos
Dimitrios H Roukos is Associate Professor and Scientific Director of the Center for Biosystems & Genomic Network Medicine (CBS.GenNetMed), an innovative research center at Ioannina University, Greece. Over the last 5 years, he has moved from traditional reductionist medicine to NGS-based integrated deep genome sequencing, functional genome landscape and network biology-based medicine. He has published over 200 papers (ISI/PubMed/Scopus) with an overall impact factor of over 1400 and more than 6000 citations (Scopus). He is an evaluator for multiple national and international projects and currently evaluates large-scale research project proposals (EC-FP7-Health 2011/2012; France [ANR]; and the Italian Ministry for Education, University and Research) in the area of Innovation in Life Science. He is on the editorial board of more than 20 high-impact-factor journals.


Chapter 1

Integrating NGS and third-generation sequencing technologies into clinical genomic medicine

Introduction
Cancer mutation landscape
HGP & NGS
Conclusion
Future perspective

doi:10.2217/EBO.12.341


Margaret Tzaphlidou, Costas Papaloukas & Dimitrios H Roukos

The completion of the Human Genome Project in 2003 marked a new era of genome sciences, technology and genomic medicine for the 21st century. Since the sequencing of approximately 3 billion base pairs was completed, high-throughput next-generation sequencing (NGS) technologies have evolved rapidly, revolutionizing biomedical research. Third-generation sequencing platforms, which improve on technical aspects of the second-generation NGS platforms, are now commercially available, and fourth-generation nanopore sequencing technology has been announced. These breakthrough technologies have enabled the partial mapping of the functional elements of the genome recently released by the Encyclopedia of DNA Elements (ENCODE) project. At the same time, massive efforts are underway to translate deep sequencing and integrated genome analysis data into clinical medicine. Here, we discuss these advances as well as the multiple challenges that must be overcome to achieve clinical genomic medicine.


Next-generation sequencing: these technologies, also referred to as second-generation sequencing, are massively parallel sequencing platforms that provide high-throughput DNA sequencing. These platforms represent an evolution of low-throughput Sanger sequencing, also known as capillary sequencing or first-generation sequencing.

Introduction
The Human Genome Project (HGP) required 13 years to complete the entire human genome sequence in 2003 [1]. Although there has been criticism that, even today, the improvement in healthcare is modest, the HGP laid the foundation for the rapid development of genomics research, genome-wide mapping technologies [2] and future clinical personalized genomic medicine [3]. Over the last few years, high-throughput (HT) next-generation sequencing (NGS) technologies have evolved rapidly [2], revolutionizing biomedical research. The third-generation (3G) sequencing platforms described below are commercially available and improve on technical aspects of the second-generation (2G) platforms, also referred to as NGS platforms, while fourth-generation (4G) nanopore sequencing technology has been announced. It has been increasingly recognized that genome sequencing alone will probably have a modest impact on clinical medicine, and that a better understanding of genome function is essential for improving healthcare. This chapter describes the power of HT sequencing technologies for understanding the structure and function of the human genome, the latest advances in the ENCODE project [4–6] and, in particular, the prospects for translating HT-based genome science discoveries into personalized genomic healthcare management.

ENCODE: the Encyclopedia of DNA Elements (ENCODE) project explores the functional elements of the genome. In contrast to the widely accepted view that the vast majority of the human genome is ‘junk’ DNA, the ENCODE project now reveals that most noncoding DNA, and approximately 80% of the human genome overall, has a functional role. The ENCODE project has provided deeper insights into gene regulation and function, transcription factors, the epigenome and noncoding RNA, but it represents only about 10% of what is needed to understand whole-genome regulation and function.
Cancer mutation landscape
HT sequencing technologies not only enable DNA sequencing at a genome-wide level but also generate data for integrative omics analysis. These platforms now provide deeper insights into the structural and functional architecture of genomes in healthy and diseased cells. Among the eight hallmarks of cancer described by Hanahan and Weinberg [7], which include deregulation of cell survival, growth and apoptosis, the disruption of critical genes is possibly the most significant for understanding cancer. A crucial first step in understanding how the exome and genome malfunction in cancer is the identification of the genes harboring all classes of genomic alterations, including point mutations, copy-number changes and translocations. Consequently, the prospective goal is to treat cancer by targeting and interrupting these cellular deregulations with targeted drugs in patients selected on the basis of biomarkers [3,8,9].

Whole-genome sequencing: next-generation sequencing and third-generation sequencing platforms provide the capacity to sequence both the whole protein-coding portion of the genome (that is, a total of approximately 21,000 genes, termed the exome) and the noncoding portion of the DNA, which accounts for approximately 98.5% of the whole genome.

Whole-exome sequencing: sequencing of the whole exome, that is, the sequencing of all protein-encoding genes.

Nowadays, the human genome can be sequenced more quickly, cheaply and precisely thanks to the rapid technological progress from the earlier first-generation (1G) low-throughput Sanger sequencing equipment to HT NGS technologies. NGS has since advanced from 2G to 3G and 4G sequencing machines. The methods, advantages, flaws and costs of this equipment have recently been evaluated and are summarized in Table 1.1 [10]. The widespread adoption of these platforms is explained by the unparalleled power of HT sequencing to detect all types of mutations rapidly and accurately at a cost of only US$5000–10,000 per genome sequence [11]. However, can these HT sequencing-based genomic innovations be converted into clinical practice for decision-making in individual cancer patients? Even though the US$1000 human genome will inevitably be achieved within a few years, the cost of clinical analysis, interpretation and decision-making based on whole-genome sequencing (WGS) or whole-exome sequencing (WES) data in today’s real world is currently estimated at approximately US$1 billion [101]. This extraordinary figure reflects the difficulties of transcriptome, proteome, epigenome and interactome analysis. Understanding these biological systems and analyzing their interactions is regarded as essential to transforming healthcare, but it is presently beyond the state of the art of science and technology [12,101].
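To make the difference in scale between WGS and WES concrete, the arithmetic below estimates the raw sequence each requires. It is a sketch under stated assumptions: the genome size (about 3 billion base pairs) and the protein-coding fraction (about 1.5%, the complement of the 98.5% noncoding share) come from this chapter, while the mean depths of 30x for WGS and 100x for WES are illustrative values chosen here, not figures from the text.

```python
"""Back-of-envelope sequencing yield for WGS versus WES (illustrative only)."""

GENOME_BP = 3_000_000_000   # ~3 billion bp, as stated in the chapter
CODING_FRACTION = 0.015     # ~1.5% protein-coding (100% minus ~98.5% noncoding)
WGS_DEPTH = 30              # assumed mean depth for WGS (illustrative)
WES_DEPTH = 100             # assumed mean depth for WES (illustrative)


def exome_size_bp(genome_bp: float = GENOME_BP,
                  fraction: float = CODING_FRACTION) -> float:
    """Approximate size of the exome in base pairs."""
    return genome_bp * fraction


def bases_required(target_bp: float, mean_depth: float) -> float:
    """Raw bases needed to cover a target at a given mean depth
    (ignores duplicates, off-target reads and uneven coverage)."""
    return target_bp * mean_depth


if __name__ == "__main__":
    wgs = bases_required(GENOME_BP, WGS_DEPTH)
    wes = bases_required(exome_size_bp(), WES_DEPTH)
    print(f"Exome size:   ~{exome_size_bp() / 1e6:.0f} Mb")
    print(f"WGS at {WGS_DEPTH}x:  ~{wgs / 1e9:.0f} Gb")
    print(f"WES at {WES_DEPTH}x: ~{wes / 1e9:.1f} Gb")
    print(f"WGS/WES ratio: ~{wgs / wes:.0f}")
```

Under these assumptions WGS needs roughly 20 times more raw sequence than WES (about 90 Gb versus 4.5 Gb), one reason WES is cheaper per sample; commercial exome capture targets vary in size from kit to kit, so this is only an order-of-magnitude estimate.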

Mutational landscape: high-throughput next-generation sequencing and third-generation sequencing technologies provide the power to identify all classes of structural change: point mutations, such as single-nucleotide polymorphisms; inherited copy-number variants; somatic copy-number aberrations; and genomic translocations (gene fusions). These genome changes are crucial in understanding complex disease pathogenesis.

HGP & NGS
The HGP has transformed life science. It took 13 years and US$3 billion to deliver, in 2003, the first version of a complete human genome sequence. The project has been criticized for the time and high cost it required, in contrast to its modest impact on healthcare [13]. Indeed, a decade after the draft genome was achieved, changes have still not occurred in the prevention and treatment of incurable disorders, including major chronic diseases such as cancer, diabetes,


Table 1.1. Comparison of next-generation sequencing platforms, including disadvantages, clinical applications and system cost (US$/run), for the Roche Applied Science 454 genome sequencer (Roche, Switzerland; pyrophosphate released at the time of base incorporation), the Roche 454 GS Junior, the Applied Biosystems SOLiD™ 4 (Life Technologies, CA, USA; driven by DNA ligase instead of DNA polymerase) and the MiSeq™ (Illumina; fluorescent-labeled nucleotides added).
ChIP: Chromatin immunoprecipitation; NA: Not applicable; NGS: Next-generation sequencing; Seq: Sequencing; WES: Whole-exome sequencing; WGS: Whole-genome sequencing.

100 Mb (chip316) and >1 Gb (chip318). Similarly, the Illumina MiSeq produces sequencing data ranging from >120 Mb to >1 Gb, depending on the read length and on whether single-end or paired-end sequencing is performed. By contrast, the Roche 454 Genome Sequencer Junior has a much lower throughput (>35 Mb) per instrument run, but a longer average read length (400 bp) than the other two platforms. These bench-top NGS machines have further enhanced technical and logistical flexibility (e.g., a smaller number of samples can be processed) while avoiding ‘redundant’ sequencing (i.e., sequencing to a greater depth than required), thereby further improving cost-effectiveness. In molecular diagnostics, sequencing a panel of genes in a small number of samples is both common and routine. Using Lynch syndrome once again as an example, the Ion Torrent chip314 would be sufficient to meet the sequencing demand, as would the Roche 454 Junior if multiple samples are available for multiplexing [29]. In parallel, the development of multiple custom-designed oligonucleotide assays based on hybrid selection (e.g., by Agilent and Nimblegen) and of PCR-based amplification methods (such as Fluidigm and RainDance technologies and Illumina TruSeq™ Custom Amplicon sequencing) has made targeted sequencing of gene panels covering genomic regions of different sizes highly feasible. For example, Illumina TruSeq Custom Amplicon sequencing allows multiplexing of up to 384 amplicons with a targeted size ranging from 4 to 96 kb [103]. By contrast, the Agilent and Nimblegen custom-designed oligonucleotide enrichment assays offer a larger target size, ranging from hundreds of kilobases to several megabases [10]. More importantly, the recent development of pre-hybridization barcoding protocols, in which multiple samples can be pooled for a single hybrid selection experiment, has further reduced the cost of genomic enrichment. Taken together, by harnessing recent technological developments in enrichment and sequencing methodologies, targeted sequencing, WES and WGS have become not only more technically feasible and more accessible to the clinical diagnostic setting, but also more cost-effective. Table 2.1 summarizes the technological features of high-throughput and bench-top NGS platforms. As these technologies are advancing very rapidly, the reader is encouraged to refer to the vendors’ websites for the latest information.
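The trade-off between run yield, target size, depth and multiplexing described above can be made explicit with a small calculation. The sketch below assumes a mean target depth, a fraction of reads landing on target after enrichment, and a 150-kb panel; all three values are hypothetical and chosen only for illustration, while the >1 Gb and >35 Mb run yields echo the bench-top figures quoted above.

```python
import math


def samples_per_run(run_yield_bp: float, target_bp: float,
                    mean_depth: float, on_target_fraction: float = 0.8) -> int:
    """How many barcoded samples one run can carry at the requested depth.

    Bases needed per sample = target size x depth / on-target fraction.
    """
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    per_sample_bp = target_bp * mean_depth / on_target_fraction
    return math.floor(run_yield_bp / per_sample_bp)


def achieved_depth(run_yield_bp: float, target_bp: float,
                   n_samples: int, on_target_fraction: float = 0.8) -> float:
    """Mean depth each sample receives if the run is split evenly."""
    return run_yield_bp * on_target_fraction / (target_bp * n_samples)


if __name__ == "__main__":
    PANEL_BP = 150_000   # hypothetical gene-panel size
    DEPTH = 500          # hypothetical required mean depth
    for name, yield_bp in [("~1 Gb bench-top run", 1_000_000_000),
                           ("~35 Mb bench-top run", 35_000_000)]:
        n = samples_per_run(yield_bp, PANEL_BP, DEPTH)
        print(f"{name}: {n} samples per run at {DEPTH}x")
    # With only 3 samples, most of a 1 Gb run becomes 'redundant' depth.
    print(f"3 samples on 1 Gb: ~{achieved_depth(1_000_000_000, PANEL_BP, 3):.0f}x each")
```

Under these assumptions a ~1 Gb run can carry about ten such samples, whereas loading only three gives each one far more depth than required, which is the 'redundant' sequencing that pooling and lower-yield bench-top machines help to avoid.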

NGS in deciphering cancer genetics
Over the past several years, WES and WGS have been increasingly used to delineate the somatic mutational profiles of various cancers. The first pioneering cancer WGS study was performed on acute myeloid leukemia (AML) and identified numerous tumor-specific variants, showing the technical and analytical feasibility of applying NGS to interrogate somatic mutations across entire cancer genomes in parallel with paired constitutional DNA samples [30]. Subsequent studies identified recurrent somatic mutations in the IDH1 and DNMT3A genes in AML [31,32]. Additional WGS studies have implicated novel candidate genes harboring putative pathogenic somatic mutations in hepatocellular carcinoma [33], melanoma [34], prostate cancer [35] and lung cancer [36]. A further advantage of WGS lies in its ability to detect chromosomal rearrangements and fusion genes. For example, in hepatocellular carcinoma, WGS identified 33 somatic rearrangements, of which 22 were validated by Sanger sequencing of the breakpoints in both the tumor and lymphocyte genomes. Four somatic fusion transcripts generated by different chromosomal rearrangements were also identified and validated: the BCORL1–ELF4 and CTNND1–STX5 fusion genes, formed by intrachromosomal inversions (Xq25 and 11q12, respectively); the VCL–ADK fusion gene, formed by an interstitial deletion in 10q22; and the CABP2–LOC645332 fusion gene, resulting from a tandem duplication in 11q13 [33]. WGS of a metastatic lobular breast cancer, compared with the primary tumor from the same patient, also provided new insights into cancer metastasis [37]. Clonal evolution in relapsed AML was also investigated by WGS to determine the mutational spectrum associated with relapse [38].
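Sequencing a cancer genome in parallel with paired constitutional DNA rests on a simple idea: a candidate somatic variant is supported in the tumor but essentially absent from the matched normal sample. The sketch below encodes that idea with hypothetical thresholds (minimum depth, minimum tumor allele fraction, maximum allele fraction tolerated in the normal) and toy coordinates; production somatic callers use statistical models of sequencing error and tumor purity rather than fixed cut-offs.

```python
from typing import Dict, NamedTuple, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


class Counts(NamedTuple):
    alt_reads: int  # reads supporting the alternative allele
    depth: int      # total reads covering the site


def somatic_candidates(tumor: Dict[Variant, Counts],
                       normal: Dict[Variant, Counts],
                       min_depth: int = 10,
                       min_tumor_vaf: float = 0.10,
                       max_normal_vaf: float = 0.02) -> Set[Variant]:
    """Return tumor variants that are well covered in both samples and
    essentially absent from the matched normal (illustrative thresholds)."""
    hits = set()
    for var, t in tumor.items():
        if t.depth < min_depth or t.alt_reads / t.depth < min_tumor_vaf:
            continue  # weak evidence in the tumor
        n = normal.get(var)
        if n is None or n.depth < min_depth:
            continue  # normal not adequately covered: cannot call somatic
        if n.alt_reads / n.depth > max_normal_vaf:
            continue  # also present in constitutional DNA: likely germline
        hits.add(var)
    return hits


if __name__ == "__main__":
    tumor = {
        ("chr2", 2000, "C", "T"): Counts(30, 80),   # absent from normal
        ("chr1", 1000, "A", "G"): Counts(40, 80),   # heterozygous in normal too
        ("chr3", 5000, "G", "A"): Counts(2, 90),    # too few supporting reads
    }
    normal = {
        ("chr2", 2000, "C", "T"): Counts(0, 60),
        ("chr1", 1000, "A", "G"): Counts(35, 70),
        ("chr3", 5000, "G", "A"): Counts(0, 75),
    }
    print(somatic_candidates(tumor, normal))  # {('chr2', 2000, 'C', 'T')}
```

Requiring adequate coverage in the normal sample before calling a variant somatic reflects the same logic: absence of evidence in poorly covered constitutional DNA is not evidence of absence.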

Table 2.1. Summary and comparison of technological features of high-throughput and bench-top next-generation sequencing platforms.

High-throughput NGS platforms

Roche 454 Genome Sequencer FLX Titanium™
Template preparation/amplification: Emulsion PCR
Detection of nucleotide incorporation: Emission of chemiluminescent light (pyrosequencing)
Throughput per run: Up to 700 Mb
Read length: Up to 700 bp
Paired-end sequencing: Yes; single-end sequencing: Yes; indexing or barcoding of samples: Yes
Dominant error type: Indel errors in homopolymers
Suitability for WGS, WES and targeted sequencing of the human genome: Targeted sequencing; WES (multiple sequencing runs)

Illumina GAIIx™/HiSeq2000™
Template preparation/amplification: Bridge amplification (cluster generation)
Detection of nucleotide incorporation: Emission of fluorescent light
Throughput per run: Up to 300 Gb
Read length: Up to 150 bp
Paired-end sequencing: Yes; single-end sequencing: Yes; indexing or barcoding of samples: Yes
Dominant error type: Nucleotide substitution errors
Suitability for WGS, WES and targeted sequencing of the human genome: WGS and WES

Life Technologies SOLiD4/5500™/5500XL™
Template preparation/amplification: Emulsion PCR
Detection of nucleotide incorporation: Emission of fluorescent light
Throughput per run: Up to 300 Gb
Read length: Up to 75 bp
Paired-end sequencing: Yes; single-end sequencing: Yes; indexing or barcoding of samples: Yes
Dominant error type: Nucleotide substitution errors
Suitability for WGS, WES and targeted sequencing of the human genome: WGS and WES

Bench-top NGS platforms

Roche 454 Junior™
Template preparation/amplification: Emulsion PCR
Detection of nucleotide incorporation: Emission of chemiluminescent light (pyrosequencing)
Throughput per run: >35 Mb
Read length: Up to 400 bp
Paired-end sequencing: Yes; single-end sequencing: Yes; indexing or barcoding of samples: Yes
Dominant error type: Indel errors in homopolymers
Suitability for WGS, WES and targeted sequencing of the human genome: Targeted sequencing

Illumina MiSeq™
Template preparation/amplification: Bridge amplification (cluster generation)
Detection of nucleotide incorporation: Emission of fluorescent light
Throughput per run: >1 Gb
Read length: Up to 150 bp
Paired-end sequencing: Yes; single-end sequencing: Yes; indexing or barcoding of samples: Yes
Dominant error type: Nucleotide substitution errors
Suitability for WGS, WES and targeted sequencing of the human genome: Targeted sequencing; WES (multiple sequencing runs)

Life Technologies Ion Torrent™ Personal Genome Machine
Template preparation/amplification: Emulsion PCR
Detection of nucleotide incorporation: Release of hydrogen ion and pH changes
Throughput per run: >1 Gb
Read length: Up to 200 bp
Paired-end sequencing: Yes; single-end sequencing: Yes; indexing or barcoding of samples: Yes
Dominant error type: Indel errors in homopolymers
Suitability for WGS, WES and targeted sequencing of the human genome: Targeted sequencing; WES (multiple sequencing runs)

The technological information of the sequencing platforms summarized in this table (and discussed in the chapter) was derived from the company websites. As the information is being updated frequently, readers are encouraged to refer to the vendors’ websites for the latest information. NGS: Next-generation sequencing; WES: Whole-exome sequencing; WGS: Whole-genome sequencing. Reproduced with permission from [55].
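Table 2.1 lists indel errors in homopolymers as the dominant error type of the pyrosequencing (Roche 454) and Ion Torrent platforms. A common precaution is to flag indel calls that fall in long single-base runs for closer review. The sketch below assumes VCF-style allele representation (REF and ALT share a leading anchor base at the given 0-based position) and an arbitrary run-length threshold of 4; neither detail comes from the chapter.

```python
def homopolymer_run_length(seq: str, pos: int) -> int:
    """Length of the single-base run that contains seq[pos] (0-based)."""
    if not 0 <= pos < len(seq):
        raise IndexError("position outside sequence")
    base = seq[pos]
    left = pos
    while left > 0 and seq[left - 1] == base:
        left -= 1
    right = pos
    while right + 1 < len(seq) and seq[right + 1] == base:
        right += 1
    return right - left + 1


def is_homopolymer_indel(ref_seq: str, pos: int, ref: str, alt: str,
                         min_run: int = 4) -> bool:
    """Flag an indel whose anchor base, or the base just after it, lies in a
    homopolymer of at least min_run bases (assumes VCF-style anchored alleles)."""
    if len(ref) == len(alt):
        return False  # substitution, not an indel
    run = homopolymer_run_length(ref_seq, pos)
    if pos + 1 < len(ref_seq):
        run = max(run, homopolymer_run_length(ref_seq, pos + 1))
    return run >= min_run


if __name__ == "__main__":
    ref_seq = "GCTAAAAAAGC"  # toy reference with a 6-base A run
    print(is_homopolymer_indel(ref_seq, 2, "TA", "T"))   # True: 1-base deletion in the A run
    print(is_homopolymer_indel(ref_seq, 9, "G", "C"))    # False: substitution
    print(is_homopolymer_indel(ref_seq, 0, "G", "GT"))   # False: no long run nearby
```

Flagging rather than discarding such calls keeps genuine homopolymer indels available for orthogonal confirmation, for example by Sanger sequencing.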

By contrast, the delay in applying WES to cancer genome sequencing may be attributed to the initial technical difficulty of enriching all exons in the human genome. Nevertheless, several early studies applied traditional PCR-based Sanger sequencing (in ‘brute-force’ mode) to sequence most of the consensus coding sequence and RefSeq genes [39,40]. For example, up to 125,624 PCR primers were needed to amplify 6196 RefSeq transcripts [40]. This technical obstacle was removed with the development of commercial whole-exome enrichment kits, although WES studies of cancer genomes did not appear until 2011 [41–44]. In comparison with WGS, WES is more cost-effective and analytically less challenging, and hence more affordable and feasible for larger sample sizes. A larger sample size is advantageous for prioritizing potential candidates (either recurrent mutations or highly mutated genes) for subsequent validation. This was demonstrated in a WES study of melanoma that specifically searched for novel recurrent mutations occurring in at least two of the 14 samples for further validation. Follow-up investigation in an additional 153 melanoma samples identified TRRAP, which harbored a recurrent mutation in approximately 4% of samples [44]. Taken together, numerous recurrent mutations and highly mutated genes have been identified in a number of cancers through WES; these are very likely to be driver mutations or candidate cancer genes with key roles in carcinogenesis. In addition, one of the most exciting findings from recent cancer genome sequencing studies has probably been the identification of tumor-associated mutations in genes involved in chromatin remodeling [43,45,46]. This further highlights the importance of the interaction between genetic and epigenetic aberrations in cancer.
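The prioritization strategy in the melanoma study, keeping only mutations seen in at least two of the 14 discovery samples, amounts to counting how many samples carry each variant. A minimal sketch follows; the gene names and variant labels are hypothetical placeholders, not data from the study.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List


def recurrent_variants(calls_by_sample: Dict[str, Iterable[Hashable]],
                       min_samples: int = 2) -> Dict[Hashable, List[str]]:
    """Map each variant seen in >= min_samples samples to its carriers."""
    carriers = defaultdict(set)
    for sample, variants in calls_by_sample.items():
        for variant in set(variants):  # count each sample once per variant
            carriers[variant].add(sample)
    return {v: sorted(s) for v, s in carriers.items() if len(s) >= min_samples}


if __name__ == "__main__":
    calls = {
        "S1": [("GENE_A", "c.100C>T"), ("GENE_B", "c.5G>A")],
        "S2": [("GENE_A", "c.100C>T")],
        "S3": [("GENE_C", "c.42del")],
    }
    print(recurrent_variants(calls))  # {('GENE_A', 'c.100C>T'): ['S1', 'S2']}
```

Keying the same count by gene instead of by exact variant gives a 'highly mutated genes' criterion; either way the shortlist only nominates candidates for validation in a larger cohort.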
In parallel, NGS has been applied to familial cancer syndromes, successfully identifying pathogenic germline mutations responsible for familial pancreatic cancer, hereditary pheochromocytoma and familial melanoma [2–4]. For example, a germline truncating mutation in the PALB2 gene was identified by WES of a patient with suspected familial pancreatic cancer [2]. Similarly, germline mutations in MAX were identified in three unrelated individuals with hereditary pheochromocytoma [3]. These studies applied a robust set of filtering criteria to identify the putative causal mutations. For example, the WES study of hereditary pheochromocytoma focused on heterozygous SNVs and small indels, because it was considered very unlikely that homozygous variants could act as founder mutations. In addition, several filtering criteria common to studies of Mendelian disorders were applied [47]: for example, retaining only variants within coding regions that cause amino acid changes, and only those affecting the same gene in all three samples. This yielded a total of five SNVs located within two genes (MAX and ADCY6). The two MAX variants segregated with hereditary pheochromocytoma in the two families from which DNA from affected relatives was available, whereas no such segregation was observed for ADCY6. Further evidence for a causative role came from screening 59 additional cases of hereditary pheochromocytoma, which identified two further truncating mutations and three missense variants in MAX [3]. Taken together, these studies demonstrate the value of WES in discovering the underlying genetic causes of hereditary cancer syndromes.
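The filtering cascade used in the hereditary pheochromocytoma study can be sketched as a short sequence of per-variant checks followed by a set intersection across affected individuals. This is a minimal Python illustration under stated assumptions, not the authors' pipeline: the consequence labels, the 20-bp ceiling for a 'small' indel, the Variant record and the gene names GENE_A and GENE_B are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical annotation labels treated as amino-acid-changing; real
# annotation tools use their own (larger) vocabularies.
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "inframe_indel"}


@dataclass(frozen=True)
class Variant:
    sample: str
    gene: str
    zygosity: str     # "het" or "hom"
    var_type: str     # "SNV" or "indel"
    length: int       # indel length in bp (0 for SNVs)
    consequence: str  # e.g. "missense", "synonymous", "intronic"


def passes_filters(v, max_indel_bp=20):
    """Heterozygous SNV or small indel that changes the encoded protein."""
    is_small = v.var_type == "SNV" or (v.var_type == "indel" and v.length <= max_indel_bp)
    return v.zygosity == "het" and is_small and v.consequence in PROTEIN_ALTERING


def genes_shared_by_all(variants, samples, max_indel_bp=20):
    """Genes carrying at least one filtered variant in every one of `samples`."""
    genes_by_sample = {s: set() for s in samples}
    for v in variants:
        if v.sample in genes_by_sample and passes_filters(v, max_indel_bp):
            genes_by_sample[v.sample].add(v.gene)
    if not genes_by_sample:
        return set()
    return set.intersection(*genes_by_sample.values())


# Toy variants for three affected individuals (hypothetical genes).
variants = [
    Variant("P1", "GENE_A", "het", "SNV", 0, "missense"),
    Variant("P2", "GENE_A", "het", "indel", 1, "frameshift"),
    Variant("P3", "GENE_A", "het", "SNV", 0, "nonsense"),
    Variant("P1", "GENE_B", "het", "SNV", 0, "missense"),
    Variant("P2", "GENE_B", "hom", "SNV", 0, "missense"),    # fails: homozygous
    Variant("P3", "GENE_B", "het", "SNV", 0, "synonymous"),  # fails: no amino-acid change
]
print(genes_shared_by_all(variants, ["P1", "P2", "P3"]))  # {'GENE_A'}
```

Requiring the same gene, rather than the same variant, in all affected individuals allows for allelic heterogeneity, which is what the MAX findings illustrate.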

NGS in cancer diagnostics

The application of NGS in cancer diagnostics has become increasingly evident, as exemplified by two recent WGS studies [5,48]. More specifically, WGS has demonstrated both a discovery and a confirmatory role in cases with an ambiguous diagnosis or clinical presentation. For example, it was used to unravel the genetic aberration in a patient diagnosed with AML of unclear subtype [5], where molecular characterization carried important implications for treatment and management. The ambiguity arose because the patient's clinical presentation was consistent with acute promyelocytic leukemia (a subtype of AML with a favorable prognosis), whereas cytogenetic analysis indicated a different subtype associated with a poor prognosis, for which bone marrow transplantation in first remission is recommended. This diagnostic and therapeutic uncertainty was resolved by performing WGS on DNA from the original leukemic bone marrow and from a skin biopsy. The analysis detected a novel insertional translocation on chromosome 17 that generated a pathogenic PML–RARA gene fusion, thereby confirming a diagnosis of acute promyelocytic leukemia. A complex rearrangement of this kind would not have been detected without WGS, demonstrating that WGS is a comprehensive analytical tool for the entire genome [5]. More importantly, this confirmatory molecular diagnosis directly shaped the patient's treatment: the patient became eligible for treatment with retinoic acid, which significantly improves the overall prognosis of patients with acute promyelocytic leukemia, and was spared the risks inherent in bone marrow transplantation, which was no longer considered.
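Rearrangements that join sequences from different chromosomes, such as those producing gene fusions, leave a characteristic signal in paired-end WGS data: clusters of read pairs whose two mates map to different chromosomes [11,12]. The Python sketch below shows only this core clustering idea. It is not the analysis used in [5], and the bin size, the support threshold and the coordinates in the toy example are assumptions chosen for illustration.

```python
from collections import Counter


def interchromosomal_clusters(read_pairs, window=1000, min_support=3):
    """Cluster read pairs whose mates map to different chromosomes.

    Each pair is (chrom1, pos1, chrom2, pos2). Both ends are assigned to
    `window`-bp bins, and bin pairs supported by at least `min_support`
    read pairs are returned as candidate rearrangement junctions.
    """
    support = Counter()
    for chrom1, pos1, chrom2, pos2 in read_pairs:
        if chrom1 == chrom2:
            continue  # this sketch ignores intrachromosomal signals
        # Order the two ends so A->B and B->A pairs share one key.
        key = tuple(sorted([(chrom1, pos1 // window), (chrom2, pos2 // window)]))
        support[key] += 1
    return {key: n for key, n in support.items() if n >= min_support}


# Toy alignments (hypothetical coordinates).
pairs = [("chr15", 1050, "chr17", 5020)] * 3 + [
    ("chr17", 5100, "chr15", 1200),  # same junction, reported from the other end
    ("chr2", 10, "chr3", 20),        # isolated pair, likely mapping noise
    ("chr1", 10, "chr1", 500),       # both mates on the same chromosome
]
print(interchromosomal_clusters(pairs))  # {(('chr15', 1), ('chr17', 5)): 4}
```

Production tools additionally filter on mapping quality and duplicate status and refine breakpoints with split reads; those steps are omitted here.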
The clinical significance was clear, as the results of the analysis were used to make decisions about the patient's therapy. In practical terms, WGS and the subsequent analysis were completed within a clinically reasonable timeframe (6 weeks) [5].

Similarly, WGS has been used to resolve the genetic cause in a patient with a suspected cancer susceptibility syndrome, based on the early onset of several primary tumors [48]. The patient developed multiple cancers (breast cancer and ovarian cancer) at an early age, as well as treatment-related AML. This clinical presentation prompted genetic testing of the BRCA1 and BRCA2 genes, which was uninformative. The underlying genetic cause was instead resolved by WGS of leukemic cells and skin cells from the patient. The analysis identified a novel heterozygous deletion of three exons of the TP53 gene; in the leukemic cells, the intact copy of TP53 had been lost through uniparental disomy. This case demonstrated the utility of WGS in the face of unexpected 'genetic heterogeneity', where genes other than BRCA1 and BRCA2 had not been tested at the outset. Although the finding did not affect subsequent clinical decision-making, revealing the underlying genetic defect had important implications for screening family members. These successful applications of NGS, and of WGS in particular, in cancer diagnostics are likely to be among the first examples of the new technologies proving their worth, and the number of such reports is expected to increase rapidly in the coming year.

As noted above, in clinical oncology it is important to generate comprehensive genomic data, analyze them and interpret the results within a timeframe that is clinically relevant for the patient. In addition to these WGS studies, another study assessed the clinical utility of combining WGS with WES and transcriptome sequencing, from both technical and cost perspectives [49]. All three genomic approaches were applied to tumors in an effort to identify potentially pathogenic aberrations.
This study showed that a 'comprehensive genomic approach' can be both time- and cost-effective. In particular, the time from biopsy sampling, through wet-lab experiments and computational analysis, to initial results was streamlined to just 24 days. Furthermore, the total cost of sequencing for the three experiments and the analysis was US$5400 per patient during the study, a cost that may be expected to decrease over time. A further advantage of this 'integrative genomic approach' is that findings can be cross-validated more efficiently. For example, both WGS and WES detected an amplification on chromosome 13q spanning the CDK8 gene in a metastatic colorectal carcinoma, and the overexpression of CDK8 was confirmed by transcriptome sequencing. One of the critical challenges in applying these technologies in a clinical setting is handling and interpreting the huge volume of genomic data. To address this concern, the study introduced the notion of a multidisciplinary 'sequencing tumor board' (including clinicians, geneticists, pathologists, biologists, bioinformatics specialists and bioethicists) responsible for the clinical interpretation of the sequencing data obtained from each patient [49]. Personnel specialized in these high-throughput genomic technologies are clearly needed. As these challenges (cost, analysis and interpretation) are increasingly being addressed, or even alleviated, it is widely expected that the clinical use of NGS will become commonplace in the near future. Figure 2.1 displays the workflow of a 'sequencing everything' approach in a clinical context.

NGS has also been assessed as a diagnostic tool for detecting known germline mutations in hereditary cancers. Leveraging technological advances in custom enrichment and NGS, Walsh et al. designed custom oligonucleotides to capture, in solution, 21 genes responsible for an inherited risk of breast and ovarian cancers [6]. Enrichment followed by NGS (using the Illumina GA) was tested in 20 women diagnosed with breast or ovarian cancer who carried a known mutation in one of these predisposition genes. The results were encouraging: all of the known point mutations, small indels (1–19 bp), and large genomic duplications and deletions (160–101,013 bp) were detected in all of the samples. The large deletions and duplications were detected using a read-depth strategy, and the results were in complete agreement with those of the multiplex ligation-dependent probe amplification (MLPA) assay [6].
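The read-depth strategy can be illustrated with a minimal Python sketch: each exon's read count is normalized by the sample's total, compared with the average of reference samples, and the resulting ratio is read as a copy-number state (about 0.5 for a heterozygous deletion, about 1.5 for a single-copy duplication). The thresholds, function names and toy counts are assumptions for illustration and do not reproduce the method of Walsh et al. [6].

```python
def normalize(depths):
    """Scale per-exon read counts by the sample's total targeted reads."""
    total = sum(depths.values())
    return {exon: count / total for exon, count in depths.items()}


def call_exon_copy_number(test, references, del_max=0.65, dup_min=1.35):
    """Compare a test sample's normalized exon depths with reference samples.

    Returns {exon: (ratio, state)}. The cut-offs are illustrative
    assumptions, not validated diagnostic thresholds.
    """
    test_norm = normalize(test)
    ref_norms = [normalize(r) for r in references]
    calls = {}
    for exon, value in test_norm.items():
        expected = sum(r[exon] for r in ref_norms) / len(ref_norms)
        ratio = value / expected
        if ratio <= del_max:
            state = "deletion"
        elif ratio >= dup_min:
            state = "duplication"
        else:
            state = "normal"
        calls[exon] = (round(ratio, 2), state)
    return calls


# Toy read counts for four exons (hypothetical).
reference = {"ex1": 100, "ex2": 100, "ex3": 100, "ex4": 100}
patient = {"ex1": 100, "ex2": 100, "ex3": 50, "ex4": 150}
print(call_exon_copy_number(patient, [reference, reference]))
# {'ex1': (1.0, 'normal'), 'ex2': (1.0, 'normal'), 'ex3': (0.5, 'deletion'), 'ex4': (1.5, 'duplication')}
```

Normalizing by total reads is the simplest choice; because a large event shifts the denominator, more robust normalizations (for example, by median exon depth) are common in practice.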
Beyond point mutations and small indels, the ability to detect larger deleted and duplicated regions is a further advantage of NGS over Sanger sequencing; this matters in a diagnostic setting because some causal genes are affected by copy number variants. The promise of NGS in the genetic diagnosis of familial breast cancer has also been demonstrated by another study, which attempted to detect TP53, BRCA1 and BRCA2 mutations in tumor cell lines and in DNA from patients with germline mutations; all of the known pathogenic mutations (including point mutations and small indels of up to 16 nucleotides) were identified [50]. Attempts have also been made to incorporate custom genomic enrichment and NGS into diagnostic testing for Lynch syndrome (hereditary nonpolyposis colorectal cancer). A capture assay was designed to target every exon in a panel of 22 genes (most of which are associated with hereditary colorectal cancer), and the captured DNA was sequenced on both the Roche 454 GS-FLX and the Illumina GA to evaluate and compare their performance [27].


Figure 2.1. Workflow of a 'sequencing everything' approach in a clinical context.
[Flowchart: patient, after informed consent → tumor biopsy (DNA and RNA extraction) → genomic analysis (WGS, WES: point mutations, small indels, copy number alterations, structural rearrangements, uniparental disomy); transcriptomic analysis (RNA-Seq: expression levels of coding and noncoding RNAs, fusion transcripts, alternative splicing); epigenomic analysis (Bisulfite-Seq, ChIP-Seq: DNA methylation patterns, histone modifications, transcription factor binding sites) → integrative analysis of the combined information, with validation of the results by a CLIA-certified laboratory for clinical decision making → expert panel (clinicians, geneticists, pathologists, oncologists, biologists, bioinformatics specialists, bioethicists) responsible for data interpretation, report generation and disclosure of results → personalized genomic medicine, or personalized patient management and treatment.]
ChIP: Chromatin immunoprecipitation; CLIA: Clinical Laboratory Improvement Amendments; Seq: Sequencing; WES: Whole-exome sequencing; WGS: Whole-genome sequencing. Reproduced with permission from [55].

Although these technologies are promising for molecular diagnostics, their technical limitations must also be recognized. For example, GC-rich regions are difficult to enrich and, in the worst case, may not be captured at all. This was demonstrated in a targeted sequencing study of several ataxia genes, in which two exons that lacked any sequence coverage had a very high GC content (76.1 and 63.6%, respectively) compared with an average GC content of 37.6% for the 50 best-covered exons [51].
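One simple way to anticipate such capture failures is to compute the GC content of each targeted exon before sequencing. In the Python sketch below, the 60% threshold and the exon sequences are hypothetical; the threshold was chosen only because it lies between the 37.6% average and the values of the uncovered exons reported in [51], not because it is an established cut-off.

```python
def gc_fraction(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T); N's are ignored."""
    seq = seq.upper()
    unambiguous = sum(seq.count(base) for base in "ACGT")
    if unambiguous == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / unambiguous


def flag_gc_rich(exons, threshold=0.6):
    """Return names of exons whose GC fraction exceeds `threshold` (an assumed cut-off)."""
    return sorted(name for name, seq in exons.items() if gc_fraction(seq) > threshold)


# Toy exon sequences (hypothetical and far shorter than real exons).
exons = {"exon_1": "GGCCGCAT", "exon_2": "ATATATGC", "exon_3": "GCGCNNAT"}
print(flag_gc_rich(exons))  # ['exon_1', 'exon_3']
```

Flagged exons could then be given extra capture probes or routed to a complementary method such as PCR-based Sanger sequencing.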

The clinical utility of next-generation sequencing technologies has become increasingly evident. In addition to its affordability, next-generation sequencing (e.g., whole-genome sequencing) in a molecular diagnostic setting has a turnaround time that lies within a reasonable clinical timeframe.

Pipelines for interpreting whole-genome sequencing and transcriptome sequencing data derived from tumors, involving professionals from different disciplines, have also been suggested. However, tests must be properly regulated in a clinical setting and operated according to the Clinical Laboratory Improvement Amendments.

Off-target sequencing is also an issue, leading to redundant sequencing beyond the targeted regions. Moreover, uneven sequencing coverage across the targeted regions (due to factors such as uneven enrichment, uneven sequencing and difficulty in aligning sequence reads to repetitive regions) may result in poor coverage of some regions, which in turn affects the sensitivity and specificity of variant detection [27]. Last but not least, a comparison of the two NGS platforms also revealed discrepancies between the sequence variants called by each. This implies that, if only one platform is used, further validation by Sanger sequencing may be needed. In a clinical setting, mutations deemed important to patients that were identified in a research setting require confirmation in a Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory.

The development of a diagnostic tool that can accurately and cost-effectively detect different types of genetic aberration across panels of genes is important for its adoption in a clinical setting; without such a comprehensive tool, several diagnostic tests would be needed per patient. For example, in genetic testing for BRCA1/BRCA2 mutations, a separate diagnostic test has been offered to detect large exonic deletions and duplications that are undetectable by PCR-based Sanger sequencing. Similarly, deletion/duplication analysis of the genes implicated in Lynch syndrome (MLH1, MSH2, MSH6, PMS2 and EPCAM) has been performed separately from gene sequencing analysis.
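The discrepancies between platforms mentioned above can be quantified by intersecting the two variant call sets. This Python sketch assumes that calls are already normalized to exact (chromosome, position, reference, alternate) tuples; the concordance measure used here (shared calls divided by all distinct calls) is one common choice among several, and the call sets are toy data. Calls reported by only one platform are the natural candidates for Sanger confirmation.

```python
def compare_call_sets(calls_a, calls_b):
    """Split two platforms' variant calls into shared and platform-specific sets.

    Calls are compared as exact (chrom, pos, ref, alt) tuples, so indels
    must already be represented consistently (e.g. left-aligned).
    """
    a, b = set(calls_a), set(calls_b)
    shared = a & b
    union = a | b
    return {
        "concordant": shared,
        "only_a": a - b,  # candidates for Sanger confirmation
        "only_b": b - a,  # candidates for Sanger confirmation
        "concordance": len(shared) / len(union) if union else 1.0,
    }


# Toy call sets from two hypothetical platforms.
platform_a = {("chr1", 100, "C", "T"), ("chr2", 200, "G", "A"), ("chr3", 300, "A", "G")}
platform_b = {("chr1", 100, "C", "T"), ("chr2", 200, "G", "A"), ("chr4", 400, "T", "C")}
result = compare_call_sets(platform_a, platform_b)
print(result["concordance"])  # 0.5
```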
NGS has shown its advantages in integrating analyses of different types of genetic variants into a single, convenient test. In addition, owing to its cost-effectiveness and higher throughput, all of the known disease genes can be screened simultaneously, obviating the need for Sanger sequencing-based testing of genes one by one, which is both time consuming and costly. However, for molecular diagnostics involving a small panel of genes, such as in breast/ovarian cancer and Lynch syndrome, usually only a few samples are available at a time, so bench-top NGS machines are the more suitable platforms. Although only a limited number of studies have so far been performed to develop and assess NGS-based diagnostic assays for hereditary cancers, many have been conducted for Mendelian disorders [47].

Perspective & conclusion

NGS technologies have already made significant progress in characterizing somatic mutations in cancer genomes. This endeavor will be further accelerated by international initiatives such as the International Cancer Genome Consortium [52], which aims to interrogate the somatic mutational landscape of at least 50 different cancer types and subtypes in thousands of samples, and eventually to integrate these genomic data with transcriptomic and epigenomic data. This integrative approach is critical for characterizing the genomic complexity of cancers. Accelerated discovery of novel driver mutations and candidate cancer genes in various cancers will not only lead to a better understanding of cancer pathogenesis but should also advance personalized cancer medicine. Similarly, further studies of hereditary cancer syndromes whose genetic etiologies have not yet been fully explained would be expected to identify new causal mutations, which could be invaluable in diagnostic testing. Although the technical and analytical feasibility and cost-effectiveness of NGS in cancer diagnostics have been amply demonstrated, attention should also be given to the ethical issues raised by these powerful information-generating tools, for example, their ability to reveal results that may be considered incidental. It is noteworthy that the application of NGS in cancer diagnostics has so far been demonstrated primarily in a research setting. All clinical genetic tests (whether NGS-based or not) must be validated in a heavily regulated clinical setting if their results are to be used to make a diagnosis or a therapeutic recommendation to the patient. Although the adoption of NGS in cancer diagnostics is inevitable, the authors believe that targeted methods such as PCR-based Sanger sequencing may still be practical, and could suffice, for certain applications involving single candidate genes with hotspot mutations.
For example, in the context of therapeutic prediction, the selection of patients eligible for anti-EGFR tyrosine kinase inhibitors in non-small-cell lung cancer, or for anti-EGFR monoclonal antibodies in colorectal cancer, is guided by the status of several hotspot mutations in the EGFR and KRAS genes [53]. By contrast, NGS has advantages for the simultaneous sequencing of multiple genes in multiple samples (i.e., targeted sequencing, as demonstrated for the genes implicated in breast cancer and Lynch syndrome), which has resulted in cost and time savings. Although not the focus of this chapter, therapeutic prediction has also benefited significantly from NGS as a powerful discovery tool. Most notably, a recent study that used a targeted NGS approach to sequence 138 cancer genes in melanoma samples taken before and after relapse from a single patient identified a mutation in the MEK1 gene responsible for acquired resistance to PLX4032 (vemurafenib) after an initial dramatic response. The in vitro demonstration that this mutant MEK1 protein had increased kinase activity and conferred resistance to both RAF and MEK inhibition further supported this novel mechanism of acquired drug resistance [54]. Moreover, the discovery power of WGS in making a diagnosis in cases with an unsuspected genetic etiology is evident. Finally, it is anticipated that continuing advances in NGS technologies and computational tools, together with cost reductions, will make NGS ever more accessible in clinical practice.

Financial & competing interests disclosure

The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties. No writing assistance was utilized in the production of this manuscript.

Summary
• The advent of next-generation sequencing (NGS) technologies has advanced cancer genetics research in two distinct directions: research discovery and clinical application.
• Studies employing NGS technologies have sought to identify somatic driver mutations in various sporadic cancers, as well as high-penetrance causal germline variants in hereditary cancer syndromes.
• NGS technologies are also a promising diagnostic tool for cancers.
• These recent advances in research and diagnostics would not have been possible with the traditional low-throughput PCR amplification and Sanger sequencing methods.
• NGS technologies are characterized by massively parallel sequencing of hundreds of millions of sequence reads; the production of hundreds of gigabases of DNA sequence data (at a very low cost per nucleotide) has made whole-genome sequencing both technically feasible and affordable.
• Although the technical and analytical feasibility and cost-effectiveness of NGS in cancer diagnostics have been amply demonstrated, attention should also be given to ethical issues pertinent to the use of these powerful information-generating tools, for instance, their ability to reveal results that may be considered incidental.
• It is anticipated that continuing advances in NGS technologies and computational tools, together with cost reductions, will make NGS ever more accessible in clinical practice.


References
1 Wong KM, Hudson TJ, McPherson JD. Unraveling the genetics of cancer: genome sequencing and beyond. Annu. Rev. Genomics Hum. Genet. 12, 407–430 (2011).
2 Jones S, Hruban RH, Kamiyama M et al. Exomic sequencing identifies PALB2 as a pancreatic cancer susceptibility gene. Science 324(5924), 217 (2009).
3 Comino-Mendez I, Gracia-Aznarez FJ, Schiavi F et al. Exome sequencing identifies MAX mutations as a cause of hereditary pheochromocytoma. Nat. Genet. 43(7), 663–667 (2011).
4 Yokoyama S, Woods SL, Boyle GM et al. A novel recurrent mutation in MITF predisposes to familial and sporadic melanoma. Nature 480(7375), 99–103 (2011).
5 Welch JS, Westervelt P, Ding L et al. Use of whole-genome sequencing to diagnose a cryptic fusion oncogene. JAMA 305(15), 1577–1584 (2011).
6 Walsh T, Lee MK, Casadei S et al. Detection of inherited mutations for breast and ovarian cancer using genomic capture and massively parallel sequencing. Proc. Natl Acad. Sci. USA 107(28), 12629–12633 (2010).
7 Metzker ML. Sequencing technologies – the next generation. Nat. Rev. Genet. 11(1), 31–46 (2010).
8 Mardis ER. A decade's perspective on DNA sequencing technology. Nature 470(7333), 198–203 (2011).
9 Rothberg JM, Hinz W, Rearick TM et al. An integrated semiconductor device enabling non-optical genome sequencing. Nature 475(7356), 348–352 (2011).
10 Mertes F, Elsharawy A, Sauer S et al. Targeted enrichment of genomic DNA regions for next-generation sequencing. Brief Funct. Genomics 10(6), 374–386 (2011).
11 Korbel JO, Urban AE, Affourtit JP et al. Paired-end mapping reveals extensive structural variation in the human genome. Science 318(5849), 420–426 (2007).
12 Medvedev P, Stanciu M, Brudno M. Computational methods for discovering structural variation with next-generation sequencing. Nat. Methods 6(11 Suppl), S13–S20 (2009).
13 Medvedev P, Fiume M, Dzamba M, Smith T, Brudno M. Detecting copy number variation with mated short reads. Genome Res. 20(11), 1613–1622 (2010).
14 Koboldt DC, Ding L, Mardis ER, Wilson RK. Challenges of sequencing human genomes. Brief Bioinform. 11(5), 484–498 (2010).
15 Robison K. Application of second-generation sequencing to cancer genomics. Brief Bioinform. 11(5), 524–534 (2010).
16 Mardis ER. The impact of next-generation sequencing technology on genetics. Trends Genet. 24(3), 133–141 (2008).
17 Li Y, Wang J. Faster human genome sequencing. Nat. Biotechnol. 27(9), 820–821 (2009).
18 Shendure J, Ji H. Next-generation DNA sequencing. Nat. Biotechnol. 26(10), 1135–1145 (2008).
19 Meyerson M, Gabriel S, Getz G. Advances in understanding cancer genomes through second-generation sequencing. Nat. Rev. Genet. 11(10), 685–696 (2010).
20 Wheeler DA, Srinivasan M, Egholm M et al. The complete genome of an individual by massively parallel DNA sequencing. Nature 452(7189), 872–876 (2008).
21 Wang J, Wang W, Li R et al. The diploid genome sequence of an Asian individual. Nature 456(7218), 60–65 (2008).
22 Bentley DR, Balasubramanian S, Swerdlow HP et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456(7218), 53–59 (2008).
23 Ng SB, Turner EH, Robertson PD et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461(7261), 272–276 (2009).
24 Ng SB, Buckingham KJ, Lee C et al. Exome sequencing identifies the cause of a mendelian disorder. Nat. Genet. 42(1), 30–35 (2010).
25 Ng SB, Bigham AW, Buckingham KJ et al. Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nat. Genet. 42(9), 790–793 (2010).
26 Schadt EE, Turner S, Kasarskis A. A window into third-generation sequencing. Hum. Mol. Genet. 19(R2), R227–R240 (2010).
27 Hoppman-Chaney N, Peterson LM, Klee EW, Middha S, Courteau LK, Ferber MJ. Evaluation of oligonucleotide sequence capture arrays and comparison of next-generation sequencing platforms for use in molecular diagnostics. Clin. Chem. 56(8), 1297–1306 (2010).
28 Artuso R, Fallerini C, Dosa L et al. Advances in Alport syndrome diagnosis using next-generation sequencing. Eur. J. Hum. Genet. 20(1), 50–57 (2012).
29 Ku CS, Wu M, Cooper DN et al. Technological advances in DNA sequence enrichment and sequencing for germline genetic diagnosis. Expert Rev. Mol. Diagn. 12(2), 159–173 (2012).
30 Ley TJ, Mardis ER, Ding L et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 456(7218), 66–72 (2008).
31 Mardis ER, Ding L, Dooling DJ et al. Recurring mutations found by sequencing an acute myeloid leukemia genome. N. Engl. J. Med. 361(11), 1058–1066 (2009).
32 Ley TJ, Ding L, Walter MJ et al. DNMT3A mutations in acute myeloid leukemia. N. Engl. J. Med. 363(25), 2424–2433 (2010).
33 Totoki Y, Tatsuno K, Yamamoto S et al. High-resolution characterization of a hepatocellular carcinoma genome. Nat. Genet. 43(5), 464–469 (2011).
34 Pleasance ED, Cheetham RK, Stephens PJ et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463(7278), 191–196 (2010).
35 Berger MF, Lawrence MS, Demichelis F et al. The genomic complexity of primary human prostate cancer. Nature 470(7333), 214–220 (2011).
36 Lee W, Jiang Z, Liu J et al. The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature 465(7297), 473–477 (2010).
37 Shah SP, Morin RD, Khattra J et al. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature 461(7265), 809–813 (2009).
38 Ding L, Ley TJ, Larson DE et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature 481(7382), 506–510 (2012).
39 Sjoblom T, Jones S, Wood LD et al. The consensus coding sequences of human breast and colorectal cancers. Science 314(5797), 268–274 (2006).
40 Wood LD, Parsons DW, Jones S et al. The genomic landscapes of human breast and colorectal cancers. Science 318(5853), 1108–1113 (2007).
41 Yan XJ, Xu J, Gu ZH et al. Exome sequencing identifies somatic mutations of DNA methyltransferase gene DNMT3A in acute monocytic leukemia. Nat. Genet. 43(4), 309–315 (2011).
42 Wang K, Kan J, Yuen ST et al. Exome sequencing identifies frequent mutation of ARID1A in molecular subtypes of gastric cancer. Nat. Genet. 43(12), 1219–1223 (2011).
43 Varela I, Tarpey P, Raine K et al. Exome sequencing identifies frequent mutation of the SWI/SNF complex gene PBRM1 in renal carcinoma. Nature 469(7331), 539–542 (2011).
44 Wei X, Walia V, Lin JC et al. Exome sequencing identifies GRIN2A as frequently mutated in melanoma. Nat. Genet. 43(5), 442–446 (2011).
45 Dalgliesh GL, Furge K, Greenman C et al. Systematic sequencing of renal carcinoma reveals inactivation of histone modifying genes. Nature 463(7279), 360–363 (2010).
46 Li M, Zhao H, Zhang X et al. Inactivating mutations of the chromatin remodeling gene ARID2 in hepatocellular carcinoma. Nat. Genet. 43(9), 828–829 (2011).
47 Ku CS, Cooper DN, Polychronakos C, Naidoo N, Wu M, Soong R. Exome sequencing: dual role as a discovery and diagnostic tool. Ann. Neurol. 71(1), 5–14 (2012).
48 Link DC, Schuettpelz LG, Shen D et al. Identification of a novel TP53 cancer susceptibility mutation through whole-genome sequencing of a patient with therapy-related AML. JAMA 305(15), 1568–1576 (2011).
49 Roychowdhury S, Iyer MK, Robinson DR et al. Personalized oncology through integrative high-throughput sequencing: a pilot study. Sci. Transl. Med. 3(111), 111ra121 (2011).
50 Morgan JE, Carr IM, Sheridan E et al. Genetic diagnosis of familial breast cancer using clonal sequencing. Hum. Mutat. 31(4), 484–491 (2010).
51 Hoischen A, Gilissen C, Arts P et al. Massively parallel sequencing of ataxia genes after array-based enrichment. Hum. Mutat. 31(4), 494–499 (2010).
52 Hudson TJ, Anderson W, Artez A et al. International network of cancer genome projects. Nature 464(7291), 993–998 (2010).
53 Ross JS, Cronin M. Whole cancer genome sequencing by next-generation methods. Am. J. Clin. Pathol. 136(4), 527–539 (2011).
54 Wagle N, Emery C, Berger MF et al. Dissecting therapeutic resistance to RAF inhibition in melanoma by tumor genomic profiling. J. Clin. Oncol. 29(22), 3085–3096 (2011).
55 Ku C-S, Cooper DN, Ziogas ED, Halkia E, Tzaphlidou M, Roukos DH. Research and clinical applications of cancer genome sequencing. Curr. Opin. Obstet. Gynecol. doi:10.1097/GCO.0b013e32835af17c (2012) (Epub ahead of print).

Websites
101 The Human Gene Mutation Database. www.hgmd.org
102 Illumina. MiSeq™ Personal Sequencer. www.illumina.com/systems/miseq.ilmn
103 Illumina. TruSeq™ Custom Amplicon. www.illumina.com/products/truseq_custom_amplicon.ilmn

About the Authors

Hong Kiat Ng

Hong Kiat Ng is currently a PhD candidate at the Cancer Science Institute of Singapore, National University of Singapore. His research interests focus on applying high-throughput technologies to study epigenetic markers for the identification of cancer biomarkers, as well as their role in tumorigenesis.

Chee-Seng Ku

Ku Chee-Seng completed his PhD at the National University of Singapore in 2011/2012. He then worked as a Research Associate at the Cancer Science Institute of Singapore. His research interests focus on applying high-throughput microarray and sequencing technologies to studies of human genetic variation and disease genetics (Mendelian and complex diseases), and to diagnostic applications. He is currently a Foreign Adjunct Faculty member at the Department of Medical Epidemiology and Biostatistics, Karolinska Institutet (Sweden) and an Honorary Adjunct Research Fellow in the Saw Swee Hock School of Public Health, National University of Singapore. He also serves on the editorial boards of several international journals, including Human Genetics, Journal of Medical Genetics and Human Genomics.

David N Cooper

David N Cooper is Professor of Human Molecular Genetics at Cardiff University, UK. His research interests are largely focused upon elucidating the mechanisms of mutagenesis underlying human genetic disease. He has published over 350 papers in the field of human molecular genetics and has coauthored/coedited a number of books on mutation in the context of inherited disease or molecular evolution. He curates the Human Gene Mutation Database (www.hgmd.org) and is European Editor of Human Genetics.


© 2012 Future Medicine www.futuremedicine.com

Chapter 3

Clinical relevance of miRNAs in cancer

Introduction
mRNA versus miRNA in differentiating tumor types
Early detection & disease monitoring
Differentiating cancer subtypes, prognostication & drug response prediction
Therapeutic target
Clinical applications of other noncoding RNAs
Conclusion

doi:10.2217/EBO.12.131

© 2012 Future Medicine

Hong Kiat Ng, Chee-Seng Ku, David N Cooper & Richie Soong

High-throughput genomic technologies have enabled the large-scale measurement of coding and noncoding transcript expression for characterizing the molecular basis of disease. mRNA expression signatures have proved very useful in clinical applications such as the classification of cancer types and subtypes, and the US FDA has approved several mRNA expression tests for breast cancer (e.g., MammaPrint, PAM50, MapQuant Dx™ and OncotypeDX®) for use both in prognostic assessment and in the prediction of therapeutic response. More recently, however, the expression signatures of miRNAs have also emerged as useful biomarkers for these applications; indeed, miRNAs have several inherent advantages over mRNAs for expression profiling. Finally, the dysregulation of miRNA expression as a hallmark of cancer has led to the development of therapeutic agents: miRNA antagonists and miRNA sponges to inhibit overexpressed oncogenic miRNAs, and miRNA mimics to restore the normal expression of tumor-suppressive miRNAs. Taken together, miRNAs appear to have a very promising future both as clinical markers and as therapeutic tools in cancer.

Next-generation sequencing technologies: High-throughput sequencing technologies that can simultaneously sequence up to hundreds of millions of sequence reads, and thus generate hundreds of gigabases of sequence data per instrument run, at a significantly lower cost per nucleotide than traditional Sanger sequencing.

Introduction
The advent of high-throughput genomic technologies, such as DNA microarrays and next-generation sequencing (NGS) platforms, has been instrumental in enabling the large-scale, whole-genome measurement of transcript expression for both protein-coding and noncoding RNAs, such as miRNAs. Comprehensive transcript profiling has improved our ability to characterize the molecular basis of disease and to identify biomarkers for applications such as early disease detection, prognostication and the prediction of drug responses. Biomarkers for early disease detection should be informative at an early stage of disease development, whereas prognostic markers aim to stratify patients according to their risk of a specific clinical outcome (e.g., disease recurrence or overall survival). Biomarkers of drug response, on the other hand, predict responsiveness to particular therapies and can therefore be used to tailor treatment to individual patients so as to optimize the therapeutic outcome.

miRNAs: Small (~22 nucleotide) noncoding RNAs with a regulatory role in modulating mRNA or gene expression levels post-transcriptionally.

The expression patterns or signatures of mRNAs have been studied intensively over the past decade in the context of different cancers [1,2]. As a consequence, numerous mRNA expression signatures have been found to be informative in relation to early disease detection, prognostication, and prediction of drug responses. These advances have led to approval being granted by the US FDA for the use of several mRNA expression signatures, such as MammaPrint [3], PAM50 [4], MapQuant Dx™ [5] and OncotypeDX® [6] for breast cancer, and Oncotype DX colon [7] for colon cancer, for use in either prognostication or prediction of therapeutic response. In addition to the FDA-approved mRNA expression tests, molecular assays such as quantitative real-time PCR are also being developed to assess the mRNA expression patterns for other cancers (e.g., lung and cervix) but these assays will require further validation in clinical trials before they can be adopted in a clinical setting for patient management [8,9]. However, more recently, miRNAs have also emerged as a group of promising biomarkers for these applications [10].


miRNAs are small (~22 nucleotides) noncoding RNAs with a regulatory role in modulating mRNA or gene expression levels post-transcriptionally [11]. miRNAs are encoded endogenously by DNA sequences throughout the genome. miRNA synthesis starts with the transcription of long primary transcripts by RNA polymerase II, which are subsequently processed by the RNase III endonucleases Drosha and Dicer to form mature miRNAs (Figure 3.1) [12]. Mature miRNAs are then incorporated into the RNA-induced silencing complex, and their binding to the 3' untranslated regions (3'UTRs) of their target mRNAs, with either partial or perfect complementarity, leads to translational repression or transcript degradation, respectively [13]. This post-transcriptional regulation is complex because a single miRNA can target multiple mRNAs and a single mRNA can be targeted by numerous miRNAs [14]. To date, more than 21,000 mature miRNAs from assorted organisms have been annotated in miRBase release 18 [15], of which 1527 are human miRNAs. miRNA-mediated mRNA regulation plays an essential role in many biological processes, such as cellular differentiation, cell growth, proliferation and apoptosis; thus, dysregulation of miRNA expression has been found to be associated with various diseases, including cancer [16,17]. Indeed, experimental data have increasingly shown that aberrant miRNA expression is associated with disease, particularly cancer, where distinct differences in expression signature between diseased tissue and its normal counterpart have been found [18,19]. The DNA sequences that encode miRNAs are often located in fragile sites and in genomic regions associated with cancers [20,21]. Several genomic aberrations have been found to contribute to the dysregulation of miRNA expression (Figure 3.1). For example, overexpression of oncogenic miRNAs due to genomic amplification or loss of epigenetic silencing can suppress tumor-suppressor genes, thereby inhibiting antioncogenic mechanisms in the cell [22,23]. Conversely, decreased expression or loss of tumor-suppressive miRNAs due to genomic deletion, translocation or increased promoter methylation can enhance oncogene/proto-oncogene expression [24,25]. In addition to these structural genetic aberrations and epigenetic mechanisms, somatic point mutations or single nucleotide polymorphisms in the miRNA seed sequence (i.e., the sequence that binds to the 3'UTR) have been found to reduce or abolish binding to mRNA targets, or to significantly alter the specificity of complementarity so that other pathways are targeted [26]. Beyond variants in the seed sequences of mature miRNAs, sequence variants in primary miRNA transcripts have also been found to affect miRNA biogenesis and the efficiency with which the mature miRNA is incorporated into the RNA-induced silencing complex, resulting in loss of miRNA function [27,28]. Besides the 3'UTR, miRNAs have also been found to bind to the 5'UTR and the open reading frame [29–31]. Furthermore, miRNAs do not invariably downregulate their target mRNAs; studies have found that miRNAs can upregulate gene expression under growth-arrest conditions [32] and when binding to gene promoters [33].

In subsequent sections, we compare and discuss the clinical applications of mRNAs and miRNAs in various cancers. We then summarize current knowledge of the clinical applications of miRNAs in cancer, focusing on the feasibility of using miRNAs as biomarkers for early disease detection, prognostication and prediction of response to chemotherapy. Finally, we highlight several miRNAs as potential therapeutic targets in cancer, including miRNA-targeting drugs that have progressed to clinical trials.

RNA-induced silencing complex: a multiprotein complex that cleaves RNA when a single small interfering RNA or miRNA is incorporated as a template for recognizing mRNA.

Post-transcriptional regulation: the regulation of gene expression at the mRNA level (i.e., between the transcription and translation of a gene), for example, through miRNA-mediated mechanisms such as mRNA decay or translational repression.

Single nucleotide polymorphisms: single nucleotide substitutions at a particular locus in a DNA sequence that generate two alleles, with the less frequent allele present at a population frequency of at least 1%. The allele with the lower frequency is known as the minor allele and the other as the major allele. The arbitrary 1% cut-off serves to distinguish polymorphisms from point mutations.

Figure 3.1. The genomic alterations of miRNAs leading to post-transcriptional regulation of mRNAs and the biogenesis of mature miRNAs.
(A) Genomic amplification of a miRNA results in decreased target gene expression, whereas (B) genomic deletion of a miRNA leads to increased target gene expression. (C) Mutation in the seed region of a miRNA can affect its complementarity with 3' untranslated regions, thereby changing its targets. (D) Primary transcripts are transcribed in the nucleus before being processed by the RNase III endonucleases Drosha and Dicer to form mature miRNAs. Reproduced with permission from [110]. © Macmillan Publishers Ltd (2011).
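The seed-region effects described above (and in panel C of Figure 3.1) can be illustrated with a short sketch. It assumes the widely used convention that the seed spans nucleotides 2–8 of the mature miRNA; the chapter does not define the exact positions. The sketch only searches a 3'UTR for perfect Watson–Crick matches to that 7-nucleotide seed. Real target-prediction tools also weigh G:U wobble pairs, site context and conservation. The miRNA and UTR sequences are hypothetical, and a single seed substitution stands in for a seed SNP or mutation.

```python
# Sketch: scan a 3'UTR for perfect 7-nt seed matches (miRNA nucleotides 2-8).
# Assumptions: RNA alphabet only (A, C, G, U); no G:U wobble; sequences hypothetical.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_site(mirna: str) -> str:
    """Return the 5'->3' mRNA site that pairs with seed nucleotides 2-8."""
    seed = mirna[1:8]  # 0-based slice = nucleotides 2..8
    return reverse_complement(seed)


def find_seed_matches(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions of perfect seed-site matches in a UTR."""
    site = seed_site(mirna)
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


if __name__ == "__main__":
    mirna = "UCCGAUAGCAAGGUACUAGCUA"   # hypothetical mature miRNA
    utr = "AAACUAUCGGAAAUUUCUAUCGGAA"  # hypothetical 3'UTR fragment
    print(seed_site(mirna))               # CUAUCGG
    print(find_seed_matches(mirna, utr))  # [3, 16]

    # Hypothetical seed variant: nucleotide 4 changed from G to A
    variant = mirna[:3] + "A" + mirna[4:]
    print(find_seed_matches(variant, utr))  # []
```

With the single substitution at seed position 4, neither site in the hypothetical UTR is recognized any longer. This is the kind of target loss that a seed mutation or polymorphism can cause. With a different UTR, the same change could instead create a new target.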

mRNA versus miRNA in differentiating tumor types
Over the past decade, microarray analysis of mRNA expression has made substantial progress towards the development of molecular classifiers for different cancers [34,35]. Whole-genome mRNA expression microarray experiments have been performed on many different cancers with the aim of identifying expression signatures for classifying cancer types and subtypes, as well as for early detection, prognostication and prediction of drug responses [36,37]. miRNAs were discovered in Caenorhabditis elegans in the early 1990s [38], and associations between miRNAs and various cancers were subsequently found; for example, deletion and downregulation of miR-15 and miR-16 at chromosome 13q14 were frequently observed in chronic lymphocytic leukemia [39]. Since then, miRNAs have emerged as a promising class of molecular markers for clinical applications. Genome-wide miRNA profiling studies using different platforms, such as microarrays, bead-based flow cytometry and NGS, have consistently found that altered miRNA expression profiles are a hallmark of cancer, and that aberrant miRNA expression patterns vary between cancer types and subtypes [18,19,40].
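The sketch below shows how an expression profile can be turned into a tumor-type classifier using one of the simplest rules, a nearest-centroid classifier. Each tumor type is summarized by its mean log2 expression profile, and a new sample is assigned to the type whose centroid is closest. This is an illustrative choice, not the method used in any study cited in this chapter. The miRNA features, expression values and tumor labels are hypothetical.

```python
# Sketch: nearest-centroid tumor-type classifier on log2 expression profiles.
# All feature values and labels are hypothetical.
import math
from statistics import mean


def centroids(training: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """Mean log2 expression of each feature, per tumor type."""
    return {
        label: [mean(values) for values in zip(*profiles)]
        for label, profiles in training.items()
    }


def classify(sample: list[float], cents: dict[str, list[float]]) -> str:
    """Assign the tumor type whose centroid is nearest (Euclidean distance)."""
    return min(cents, key=lambda label: math.dist(sample, cents[label]))


if __name__ == "__main__":
    # Feature order: three hypothetical miRNAs (miR-a, miR-b, miR-c)
    training = {
        "colon": [[10.1, 6.0, 8.2], [9.8, 6.3, 8.0]],
        "lung": [[7.0, 9.5, 8.1], [7.3, 9.9, 7.8]],
    }
    cents = centroids(training)
    print(classify([9.6, 6.5, 8.3], cents))  # colon
    print(classify([7.5, 9.0, 8.0], cents))  # lung
```

Far fewer miRNAs than mRNAs need to be measured, so the feature vectors for a miRNA classifier are much shorter than those for a whole-transcriptome mRNA classifier. In practice, features are usually filtered and samples normalized before centroids are computed.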


The expression signatures of mRNAs and miRNAs as cancer classifiers have therefore been assessed and compared [18,41]. For example, Lu et al. profiled whole-genome mRNA and miRNA expression to compare the robustness of each profile in classifying cancers [18]. More specifically, miRNA-based and mRNA-based classifiers were generated from 195 miRNAs and 14,546 mRNAs that showed a log2 fold change of over 7.25 in 68 well-differentiated tumors from 11 different cancer types. The miRNA classifier proved more accurate than the mRNA classifier in classifying poorly differentiated tumors. With respect to sample source, another study compared miRNA and mRNA expression profiling in formalin-fixed paraffin-embedded tissue and matched frozen tissue. miRNA expression profiling outperformed mRNA expression profiling in identifying the molecular type of the frozen tissue. This highlights the feasibility of using formalin-fixed paraffin-embedded tissues for miRNA analysis and supports the greater robustness of miRNA profiles relative to mRNA profiles in this application [41]. Several studies have also attempted to incorporate both miRNAs and mRNAs into cancer classifiers to increase their sensitivity and specificity [42–44]. However, Lanza et al. did not observe substantial differences in classification between the mRNA and miRNA signatures tested independently and a combined miRNA/mRNA signature [43]. The robustness of miRNA expression signatures in cancer classification has also been demonstrated in the case of metastasis [45]. The expression pattern of a set of 48 miRNAs, derived from 22 different tumor tissues and metastases (205 primary and 131 metastatic tumors), predicted the tissue of origin of primary tumors and metastases with high accuracy (89%).
In fact, 100% accuracy was achieved for 16 of the 22 tumor tissues and metastases, demonstrating the effectiveness of miRNAs as biomarkers for identifying the tissue of origin of a cancer. This development is clinically important because carcinomas of undetermined primary origin account for 3–5% of all malignancies and carry a poor prognosis [46]. In addition to their robustness in cancer classification, miRNAs have several inherent advantages over mRNAs for expression profiling. For example, miRNAs represent a smaller set of RNA species than mRNAs; hence, fewer probes (microarray method) or less sequencing data (sequencing method) are required to study the entire collection of miRNAs in tumor tissue samples. Furthermore, owing to their small size, miRNAs are more resistant to degradation than mRNAs and can therefore be readily studied in formalin-fixed paraffin-embedded samples and other biospecimens, such as whole blood, plasma or serum [47,48], saliva [49], sputum [50], urine [51] and stool [52]. Circulating miRNAs are also more stable and RNase-resistant in serum and plasma than mRNAs. Because samples can be derived from such varied sources, studying miRNAs is advantageous in terms of sample logistics. The inherent stability of miRNAs in serum or plasma also creates opportunities to use them as less invasive cancer biomarkers.
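The distinction drawn above between overall accuracy (89%) and per-tissue accuracy (100% for 16 of the 22 tissues) can be seen with a small calculation. In the sketch below, overall accuracy is the fraction of all samples classified correctly. Per-class accuracy is computed separately within each true tissue of origin. The labels are hypothetical and do not reproduce the cited study's data.

```python
# Sketch: overall accuracy versus per-class (per-tissue) accuracy.
# Labels are hypothetical and do not reproduce any cited study.
from collections import Counter


def overall_accuracy(truth: list[str], predicted: list[str]) -> float:
    """Fraction of all samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(truth, predicted))
    return correct / len(truth)


def per_class_accuracy(truth: list[str], predicted: list[str]) -> dict[str, float]:
    """Fraction correct within each true class (i.e., per-class recall)."""
    totals = Counter(truth)
    hits = Counter(t for t, p in zip(truth, predicted) if t == p)
    return {label: hits[label] / n for label, n in totals.items()}


if __name__ == "__main__":
    truth = ["breast", "breast", "colon", "colon", "lung", "lung", "lung", "stomach"]
    predicted = ["breast", "breast", "colon", "colon", "lung", "lung", "stomach", "colon"]
    print(overall_accuracy(truth, predicted))  # 0.75
    print(per_class_accuracy(truth, predicted))
```

Here, two of the four tissues are classified perfectly even though overall accuracy is only 75%. On a toy scale, this mirrors how a signature can reach 100% for some tissues while its overall accuracy is lower.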

Early detection & disease monitoring
Biomarkers for early cancer detection are important because cancer prognosis and survival depend considerably upon tumor stage at the time of detection. Such biomarkers should detect cancer before symptoms appear (symptoms usually appear at later stages) and could hence be integrated into diagnostic testing or medical screening. miRNAs hold great promise in this application because alterations in miRNA expression have been observed in the transition from normal colon to adenoma and from adenoma to carcinoma [53]. More specifically, global miRNA expression was analysed in 315 samples (52 normal colonic mucosas, 41 tubulovillous adenomas, 158 adenocarcinomas with proficient DNA mismatch repair selected for stage and age of onset, and 64 adenocarcinomas with defective DNA mismatch repair selected for sporadic [n = 53] and inherited [n = 11] colon cancers). Most of the miRNAs differentially expressed between normal colon and adenoma showed a similar magnitude of differential expression when normal colon was compared with carcinoma. Similarly, reduced expression of pri-miR-34a was observed in premalignant cervical intraepithelial neoplasia and cervical cancer compared with normal cervical epithelium [54]. This suggests that miRNA dysregulation is an early event in cancer development and can be used as an early detection marker. In addition to profiling miRNAs in cancer tissues, several studies have profiled circulating miRNAs in different cancers [47,55,56], especially to identify biomarkers for early detection [57–59]. Notably, Bianchi et al. reported a serum signature of 34 miRNAs that identified patients with early-stage non-small-cell lung carcinoma (NSCLC) in a cohort of asymptomatic high-risk individuals with 80% accuracy [58].
This study was performed on serum samples from lung cancer patients collected 1–2 years before the onset of disease. The signature was also able to differentiate malignant tumors from benign lesions detected by low-dose spiral chest computed tomography (LDCT). Similar success was achieved by others, who identified three different circulating miRNA signatures for lung cancer prediction, diagnosis and prognosis in high-risk individuals. These signatures were derived from asymptomatic lung cancer patients recruited to the INT-IEO cohort (a screening project using LDCT). One of these signatures, comprising 16 miRNAs, discriminated (with ~80% accuracy) subjects at high risk of NSCLC using plasma collected 1–2 years before lung cancer was detected by LDCT [57]. These signatures were further validated, with a similar success rate, in an independent cohort (the Multicentric Italian Lung Detection cohort). It must be noted, however, that current clinical techniques such as LDCT might not be sensitive enough to detect minute nodules, resulting in false-negative results [60,61]. The high-risk miRNA expression signatures identified by these studies could therefore represent a more sensitive approach than traditional screening methods alone. Patients who exhibit these signatures might warrant more comprehensive screening in addition to LDCT. Several caveats apply to studies of miRNAs in serum and plasma, however, as circulating expression patterns do not always correspond to those in the primary cancer tissue. In some cases they agree: for example, miR-375 and miR-141 were significantly upregulated in both prostate tissues and serum samples from patients with metastatic prostate cancer compared with benign prostate tissues [62]. However, some miRNAs, such as miR-21, miR-141 and miR-200c, showed discordant tissue and serum patterns in lung cancer.
All three miRNAs had higher expression levels in NSCLC tissues than in matched noncancerous tissues. However, only miR-21 also had a higher level in serum; miR-141 and miR-200c showed lower serum levels in cancer patients than in noncancer individuals [63]. This discrepancy could be due to the heterogeneity of the examined tissues (i.e., miRNA expression may vary between different sections of a tumor, either because of intratumoral heterogeneity or because imperfect sectioning includes normal tissue).

In addition to early detection, miRNAs are also useful biomarkers for monitoring residual disease and predicting disease recurrence. For example, Tsujiura et al. observed reduced levels of miR-21 and miR-106b in postoperative plasma samples from gastric cancer patients who underwent gastrectomy, compared with the patients' matched preoperative plasma samples [64]. Measuring these miRNAs in serum and plasma could therefore enable less invasive disease monitoring or the detection of residual disease after surgical resection. Similarly, a combined expression profile of miR-375 and miR-142-5p in plasma was found to be a potential predictor of recurrence risk in gastric cancer [65].
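Comparisons such as the pre- versus postoperative plasma measurements above are often made by quantitative real-time PCR. One widely used way to express such a change is the comparative Ct (2^−ΔΔCt) method, sketched below. The chapter does not state how the cited studies normalized their data. The choice of a spiked-in control RNA as the reference, the assumption of near-100% amplification efficiency and all Ct values here are illustrative assumptions.

```python
# Sketch: relative quantification by the comparative Ct (2^-ddCt) method.
# Assumptions: ~100% amplification efficiency for target and reference;
# the reference (e.g., a spiked-in control RNA) and all Ct values are hypothetical.
import math


def fold_change(ct_target_test: float, ct_ref_test: float,
                ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Expression of the target in the test sample relative to the control sample."""
    delta_test = ct_target_test - ct_ref_test  # normalize to reference in test sample
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # normalize to reference in control sample
    delta_delta = delta_test - delta_ctrl
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Control = preoperative plasma, test = postoperative plasma (same patient)
    fc = fold_change(ct_target_test=30.0, ct_ref_test=20.0,
                     ct_target_ctrl=28.0, ct_ref_ctrl=20.0)
    print(fc)             # 0.25, i.e., a fourfold reduction after surgery
    print(math.log2(fc))  # -2.0
```

A higher Ct means less starting template. In this example, the target therefore crosses the threshold two cycles later after surgery relative to the reference, corresponding to roughly a quarter of the preoperative level.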


Similar applications have also been demonstrated in other cancers. For example, two miRNA signatures were generated by studying 357 stage I NSCLCs. The first signature, comprising 34 miRNAs, was predictive of recurrence/relapse-free survival independently of cancer subtype, whereas the second, comprising 27 miRNAs, was adenocarcinoma-specific [66]. A low level of miR-34a expression was also found to predict relapse in surgically resected NSCLC [67]. The ability to predict disease recurrence is important because recurrence is usually associated with poor survival. Identifying at-risk patients allows clinicians to give adjuvant chemotherapy to this group, while sparing other patients for whom it might not be necessary.
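Multi-miRNA prognostic signatures such as those above are often applied as a risk score. The score is a weighted sum of expression values, with weights typically taken from a survival model such as Cox regression. A cut-off, often the median score in a training cohort, then splits patients into higher- and lower-risk groups. The sketch below assumes this general scheme. It is not the scoring rule of the cited NSCLC studies, and the miRNA names, weights and values are hypothetical.

```python
# Sketch: weighted multi-miRNA risk score with a median cut-off.
# Weights would typically come from a survival model (e.g., Cox regression);
# the miRNA names, weights and expression values here are hypothetical.
from statistics import median


def risk_score(expression: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of log2 expression over the signature miRNAs."""
    return sum(weight * expression[mirna] for mirna, weight in weights.items())


def stratify(cohort: dict[str, dict[str, float]],
             weights: dict[str, float]) -> dict[str, str]:
    """Label each patient 'high' or 'low' risk relative to the cohort median score.

    Scores exactly at the median are labeled 'low'.
    """
    scores = {pid: risk_score(expr, weights) for pid, expr in cohort.items()}
    cutoff = median(scores.values())
    return {pid: "high" if score > cutoff else "low" for pid, score in scores.items()}


if __name__ == "__main__":
    # Positive weight: higher expression -> higher risk; negative weight: protective
    weights = {"miR-x": 0.8, "miR-y": -0.5}
    cohort = {
        "P1": {"miR-x": 6.0, "miR-y": 8.0},
        "P2": {"miR-x": 9.0, "miR-y": 5.0},
        "P3": {"miR-x": 7.0, "miR-y": 7.0},
        "P4": {"miR-x": 5.0, "miR-y": 9.0},
    }
    print(stratify(cohort, weights))  # {'P1': 'low', 'P2': 'high', 'P3': 'high', 'P4': 'low'}
```

In a real study, the weights and cut-off would be fixed in a training cohort and then applied unchanged to an independent validation cohort, rather than recomputed on each new cohort as this toy function does.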

Differentiating cancer subtypes, prognostication & drug response prediction
Apart from early cancer detection and tracing the tissue of origin of metastases, miRNA expression signatures are also informative for distinguishing subtypes of a particular cancer, for prognostication and for predicting drug responses [68–70]. These applications have been amply demonstrated. For example, based on an expression signature of 19 miRNAs, Ueda et al. were able to distinguish diffuse- and intestinal-type gastric cancer with 74% accuracy [68], while a seven-miRNA signature was found to be predictive of overall survival and relapse-free survival [71]. Different miRNA signatures have also been found to distinguish between luminal and basal breast cancer subtypes [69,72] and to classify breast cancer according to estrogen receptor, progesterone receptor and HER2/neu receptor status [73]. Similarly, the prognostic value of miRNAs has been demonstrated in chronic lymphocytic leukemia [39,74], hepatocellular carcinoma [75], breast cancer [76] and pancreatic cancer [77]. For example, 20 miRNAs were found to be associated with overall survival in patients with resectable pancreatic ductal adenocarcinoma [77]. Among these miRNAs, high expression of miR-21 and reduced expression of miR-34a and miR-30d were associated with poor overall survival following resection, independently of clinical covariates.

In addition to prognostication, expression signatures that can predict responses to specific therapies are also of clinical importance, and several studies have investigated the association of miRNA profiles with therapeutic responses [78–80]. For instance, reduced levels of miR-26a and miR-26b have been consistently associated with shorter overall survival in hepatocellular carcinoma in three independent patient cohorts. However, among patients treated with adjuvant IFN-α therapy, those whose tumors had a lower level of miR-26 showed significantly improved overall survival compared with those whose tumors had a higher level [79]. In another study, introduction of miR-21 into a hepatocellular carcinoma cell line was found to confer resistance to IFN-α/5-fluorouracil therapy [80]. Similar results have been observed in pancreatic cancer, where a low level of miR-21 expression was associated with longer disease-free survival in patients treated with gemcitabine or 5-fluorouracil [81,82]. These findings suggest that miR-21 and miR-26 expression levels might be useful markers for identifying patients who would benefit from therapy. Such patients would include those with hepatocellular carcinoma who would benefit from adjuvant IFN-α and 5-fluorouracil, and those with pancreatic cancer who would benefit from gemcitabine.

Besides miRNA expression levels, genetic variation in the miRNA itself and within miRNA binding sites (i.e., the 3'UTRs of mRNAs) has also been shown to be associated with therapeutic outcomes in metastatic colon cancer [83] and ovarian cancer [84]. Of note, a single-nucleotide polymorphism in pri-miR-26a-1 (rs7372209) was studied in patients with metastatic colorectal cancer treated with 5-fluorouracil and irinotecan. The CC and CT genotypes were associated with more favorable treatment outcomes than the TT genotype [83]. Taken together, these associations with therapeutic response (for both expression level and genetic variation) highlight the importance of miRNAs as predictive markers of therapy. Such markers could also be incorporated into clinical trials to identify patients likely to benefit from specific drugs.
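The rs7372209 comparison above groups CC and CT patients together against TT, which corresponds to a dominant model for the C allele. The sketch below shows that grouping together with the allele-frequency calculation behind the 1% minor-allele cut-off used to define a single nucleotide polymorphism. The genotype counts and response outcomes are hypothetical and are not taken from [83]. In this toy example, C happens to be the minor allele; the chapter does not say which allele is minor for rs7372209.

```python
# Sketch: allele frequencies from genotype counts, and response rates under a
# dominant model (carriers of at least one copy of an allele vs non-carriers).
# Genotype counts and patient outcomes are hypothetical.


def allele_frequencies(genotype_counts: dict[str, int]) -> dict[str, float]:
    """Allele frequencies from diploid genotype counts, e.g. {'CC': 10, 'CT': 40}."""
    allele_counts: dict[str, int] = {}
    for genotype, n in genotype_counts.items():
        for allele in genotype:  # two single-letter alleles per genotype
            allele_counts[allele] = allele_counts.get(allele, 0) + n
    total = sum(allele_counts.values())
    return {allele: count / total for allele, count in allele_counts.items()}


def response_by_carrier(patients: list[tuple[str, bool]],
                        allele: str) -> dict[str, float]:
    """Response rate in carriers vs non-carriers of `allele` (dominant model)."""
    tallies = {"carrier": [0, 0], "non-carrier": [0, 0]}  # [responders, total]
    for genotype, responded in patients:
        group = "carrier" if allele in genotype else "non-carrier"
        tallies[group][0] += int(responded)
        tallies[group][1] += 1
    return {group: (r / n if n else float("nan")) for group, (r, n) in tallies.items()}


if __name__ == "__main__":
    freqs = allele_frequencies({"CC": 10, "CT": 40, "TT": 50})
    minor = min(freqs, key=freqs.get)
    print(freqs, minor)          # {'C': 0.3, 'T': 0.7} C
    print(freqs[minor] >= 0.01)  # True: meets the 1% polymorphism cut-off

    patients = [("CC", True), ("CT", True), ("CT", False),
                ("TT", False), ("TT", True), ("TT", False)]
    print(response_by_carrier(patients, allele="C"))
```

Grouping genotypes this way is a modeling choice. A recessive model (CC vs CT + TT) or a per-allele (additive) model would group the same patients differently and could give different conclusions.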

Therapeutic target
Apart from identifying dysregulated miRNAs as biomarkers for the clinical applications discussed above, studies have also investigated the tumorigenic effects of miRNAs in cancers in an attempt to identify potential targets for therapeutic intervention. Repression or introduction of a single miRNA in cultured cancer cell lines and mouse models of cancer has successfully identified miRNAs with oncogenic or tumor-suppressive effects that contribute to tumor progression [85,86]. Common oncogenic miRNAs, including miR-21, miR-155 and the miR-17–92 cluster, are frequently overexpressed in various cancers. In contrast, tumor-suppressive miRNAs, such as miR-34a, miR-24, miR-26 and the let-7 family, are frequently downregulated in multiple cancers. It is widely believed that treating cancers with a drug that targets a single gene is likely to be ineffective, because cancers are complex diseases involving multiple genomic aberrations or types of dysregulation. miRNAs therefore represent an alternative treatment option, as they have the ability to control multiple dysregulated targets. In a hypothetical example, aberrant expression of a miRNA could lead to abnormalities in multiple targeted mRNAs and their pathways. A drug targeting the aberrant miRNA could then correct the abnormalities in all the downstream targets. Such an approach is believed to be more effective than developing individual drugs for each targeted mRNA.

Locked nucleic acids: a class of modified RNA analogs in which the ribose ring is 'locked' in the ideal conformation for Watson–Crick binding.

There are two approaches to developing miRNA-based therapies. The more common approach is to directly target overexpressed miRNAs with antisense oligonucleotides to inhibit their oncogenic effects. These antagonists carry a chemically modified phosphate backbone to delay their degradation and to increase their binding affinity for the targeted miRNAs [87–89]. Binding of an antagonist to its miRNA results either in degradation of the miRNA or in its exclusion from the RNA-induced silencing complex, rendering it functionally inert. Examples of such antagonists are anti-miRs, antagomiRs and locked nucleic acids [87,89]. Alternatively, the function of a family of oncogenic miRNAs can be inhibited by competitive inhibitors known as 'miRNA sponges' [90,91]. miRNA sponges are designed with complete complementarity to the seed sequence of the mature miRNA but with a bulge at nucleotide positions 9–12 to prevent degradation. Their mechanism of action is similar to that of miRNA antagonists. However, they target all miRNAs with a complementary heptameric seed sequence, thereby silencing a whole family of sequence-related miRNAs rather than blocking a single miRNA.

Proof of concept for targeting miRNAs in vivo with antagomiRs was first established in 2005. Intravenous administration of antagomiRs against miR-16, miR-122, miR-192 and miR-194 to mice, at 80 mg per kg body weight, silenced these specific endogenous miRNAs in multiple organs [87]. Silencing of miR-122 was shown to reduce plasma cholesterol in both mice and African green monkeys, suggesting a role for this miRNA in lipid metabolism [92]. In addition, treating chronically HCV-infected chimpanzees with a locked nucleic acid-modified oligonucleotide (SPC3649) complementary to miR-122 suppressed HCV viremia without any hepatotoxicity or renal toxicity [93]. This led to the completion of Phase II clinical trials of SPC3649, the first miRNA-targeting drug tested in humans, as a potential treatment for HCV infection (Clinical Trial Identifier: NCT01200420).
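The sponge design described above, with full complementarity to the seed and a bulge opposite positions 9–12, can be sketched as a simple string operation. The sketch assumes the binding site is written 5'→3' as the reverse complement of the mature miRNA. It models the bulge as a mismatched segment: opposite each of positions 9–12, the site carries the same base as the miRNA, so no pair can form there. Some published designs instead insert extra unpaired nucleotides. The spacer sequence, copy number and miRNA sequence are hypothetical.

```python
# Sketch: a bulged miRNA sponge binding site and a tandem sponge array.
# Assumptions: site written 5'->3' as the reverse complement of the mature miRNA,
# with mismatches opposite miRNA positions 9-12 (the "bulge"); the spacer sequence,
# copy number and miRNA sequence are hypothetical. Published designs vary.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def sponge_site(mirna: str, bulge_start: int = 9, bulge_end: int = 12) -> str:
    """Return a site fully complementary to the miRNA except at the bulge.

    Opposite each miRNA position in bulge_start..bulge_end (1-based), the site
    carries the same base as the miRNA, so no Watson-Crick or G:U pair can form.
    """
    n = len(mirna)
    site = [COMPLEMENT[base] for base in reversed(mirna)]
    for pos in range(bulge_start, bulge_end + 1):
        site[n - pos] = mirna[pos - 1]  # site index n - pos faces miRNA position pos
    return "".join(site)


def sponge_array(mirna: str, copies: int = 4, spacer: str = "AAUU") -> str:
    """Concatenate several bulged sites separated by a (hypothetical) spacer."""
    return spacer.join([sponge_site(mirna)] * copies)


if __name__ == "__main__":
    mirna = "UCCGAUAGCAAGGUACUAGCUA"  # hypothetical mature miRNA
    print(sponge_site(mirna))                  # UAGCUAGUACGAACCUAUCGGA
    print(len(sponge_array(mirna, copies=3)))  # 74
```

The seed-pairing segment of each site (here CUAUCGG) is left intact. As a result, any miRNA sharing that seed could in principle be captured, which is why sponges act on a whole family rather than on a single miRNA.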


In addition to miR-122, several other miRNA antagonists have been tested in various preclinical mouse models of cancer, such as germline transgenic or knockout mice [94]. For example, systemic treatment of tumor-bearing mice with miR-10b antagomiRs depleted endogenous miR-10b and increased expression of its target Hoxd10 [94]. Although the treatment did not reduce primary tumor growth, it suppressed metastasis of the breast cancer cells, highlighting the potential of miR-10b antagomiRs as antimetastatic drugs. Repressing miR-182 with anti-miR-182 has also been shown to suppress metastasis in a mouse model of melanoma liver metastases [95]. In a similar vein, inhibition of miR-191 by intraperitoneal administration of a 2'-O-methoxyethyl anti-miR-191 was found to reverse the cancer phenotype in an orthotopic xenograft mouse model of hepatocellular carcinoma [96].


In contrast to miRNA antagonists, the second approach aims to restore the function or normal expression level of tumor-suppressive miRNAs through the introduction of miRNA mimics. Unlike oncogenic miRNAs, these miRNAs are depleted in tumor tissues relative to normal tissues. Because the mimics have the same sequence as the depleted endogenous miRNAs, they are expected to target the same set of mRNAs, thereby limiting off-target effects. To date, various tumor-suppressive miRNAs have been identified and tested in cell lines and mouse cancer models (lung, prostate and pancreas). Restoring two well-characterized tumor-suppressor miRNAs, miR-34a and let-7, with miRNA mimics has shown encouraging therapeutic effects, such as inhibition of tumor growth or reduction of tumor burden [97,98]. For example, a KrasG12D autochthonous NSCLC mouse model was treated systemically via tail vein injection with a novel neutral lipid emulsion of either the miR-34a or the let-7b mimic. Both groups of mice displayed a significant decrease in tumor burden [97]. Mice treated with the miR-34a mimic showed up to a 60% decrease in tumor area compared with controls. This suggests that the mimic was successfully delivered to tumor cells and that the reduced tumor growth resulted from its antioncogenic effects. Similarly, systemic delivery of a lipid-based nanovector of miR-34a inhibited tumor growth without apparent signs of toxicity in subcutaneous and orthotopic MiaPaCa-2 pancreatic ductal adenocarcinoma xenografts [98]. Other miRNA mimics, such as miR-16 and miR-26a, have also shown the desired therapeutic effects in mouse models of cancer [99,100]. Furthermore, to increase therapeutic efficacy, other small molecules such as cholesterol, folate or glycans can be chemically conjugated to miRNA mimics to enhance delivery of the conjugates to specific tumor sites [101]. Nanoparticles or liposomes carrying these small RNA molecules can also be made cell-specific by fusing them with aptamers that recognize specific cell-surface receptors. Cholesterol-conjugated miRNA mimics have already been shown to induce gene silencing in various cancer models when systemically delivered into mice and nonhuman primates [87,92,102].

The roles of miRNAs as oncogenes and tumor suppressors are sometimes dependent on cellular context and cancer type. For example, a recent study reported downregulation of miR-146 family members (e.g., miR-146a and miR-146b-5p), with a tumor-suppressive role, in castration-resistant prostate cancer [103]. In contrast, these miRNAs are overexpressed in breast and gastric cancer [104,105]. Moreover, because cancer is a heterogeneous disease, the same miRNA might not have identical therapeutic effects even within the same cancer type. For instance, the same lipid-based miR-34a mimic was delivered systemically to two orthotopic prostate cancer xenograft models [106]. In LAPC9 xenografts, it reduced lung metastasis without inhibiting primary tumor growth, whereas in PC3 xenografts only a reduced tumor burden was observed. This might reflect different molecular subtypes of the prostate cancer cells. Appropriate molecular classification of cancer subtypes is therefore important so that the most suitable therapy can be administered to each patient.

Clinical applications of other noncoding RNAs

As with miRNAs, other noncoding RNAs, such as long noncoding RNAs, small nucleolar RNAs and PIWI-interacting RNAs, are gaining recognition as important regulators of many biological processes. Although they are not as well studied as miRNAs, disruption of these noncoding RNAs has already been implicated in many cancers, so they could potentially serve as biomarkers or therapeutic targets [107]. A recent study by Gupta et al. found that the long noncoding RNA HOTAIR is involved in specific chromatin remodeling and that increased expression of HOTAIR is a strong predictor of metastasis in breast cancer [108]. Furthermore, overexpression of six small nucleolar RNAs was observed in tumors from NSCLC patients compared with their adjacent normal tissues [107]. Three of these small nucleolar RNAs (SNORD33, SNORD66 and SNORD76) were further examined in plasma samples and were able to distinguish NSCLC patients from healthy individuals and from individuals with chronic obstructive pulmonary disease, with 81.1% sensitivity and 95.8% specificity. The long noncoding RNAs MALAT-1 and HEIH also play a key role in hepatocellular carcinoma: overexpression of MALAT-1 was found to predict hepatocellular carcinoma recurrence in patients after liver transplantation, and inhibition of MALAT-1 in HepG2 cells reduced cell viability, motility and invasiveness and increased sensitivity to apoptosis. Collectively, these results suggest that other noncoding RNAs are also important in tumorigenesis, and future research should investigate them more comprehensively, both for clinical applications and as therapeutic targets, as has been done for miRNAs.
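Sensitivity and specificity figures like those reported for the plasma small nucleolar RNA panel are simple ratios over a confusion matrix of test results. A minimal sketch; the counts below are hypothetical and are not taken from the cited study:

```python
# Sensitivity = TP / (TP + FN): fraction of patients the test correctly flags.
# Specificity = TN / (TN + FP): fraction of controls the test correctly clears.


def sensitivity(true_pos: int, false_neg: int) -> float:
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Hypothetical plasma test on 50 patients and 100 controls.
    tp, fn = 45, 5
    tn, fp = 90, 10
    print(f"sensitivity = {sensitivity(tp, fn):.1%}")  # 90.0%
    print(f"specificity = {specificity(tn, fp):.1%}")  # 90.0%
```

Note that the control group matters: specificity against healthy individuals and against patients with chronic obstructive pulmonary disease can differ, which is why studies report which controls were used.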


Conclusion

Over the past decade, the role of miRNAs in the post-transcriptional regulation of mRNA expression has been increasingly appreciated. miRNAs have been found to be involved in many biological processes and systems, and their dysregulation has been implicated in many diseases, including cancers [17]. Beyond their biological roles, miRNAs also serve as biomarkers for clinical applications such as cancer classification, as miRNA expression profiles have been shown to be a more robust molecular signature than mRNA profiles for distinguishing cancer types and subtypes [18,19]. NGS technologies have considerably advanced genome-wide studies compared with microarrays, because they are particularly well suited to, and more powerful in, identifying novel miRNAs. Many studies have therefore employed sequencing-based methods to study miRNAs. Nevertheless, low-throughput techniques such as northern blotting, in situ hybridization and quantitative real-time PCR remain important for validating novel candidates discovered in different types of samples. As technologies advance, sequencing noncoding RNAs directly, without first reverse transcribing them to cDNA, should further increase the accuracy of RNA sequencing and of expression quantification [109]. The feasibility of using miRNAs as biomarkers for early detection and prognostication is very encouraging, but several issues remain to be addressed before they can be applied in a clinical setting, notably the lack of consistency among results so far. This inconsistency might be due to the different sample preparation methods and microarrays used to study miRNAs. For example, miRNAs identified as potential biomarkers in recent studies using newer microarrays (with probes designed from the most recent version of the miRNA database) might have been missed by earlier studies using older versions of microarrays.
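The probe limitation of microarrays can be illustrated with a toy comparison. The sketch assumes that sequencing reads have already been assigned to RNA species upstream; it normalizes per-species counts to reads per million (RPM) and flags any species that a fixed probe panel could not report. The species names, counts and probe panel are all hypothetical:

```python
from __future__ import annotations

from collections import Counter


def reads_per_million(assignments: list[str]) -> dict[str, float]:
    """Normalize per-species read counts to reads per million (RPM)."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {name: n * 1_000_000 / total for name, n in counts.items()}


def missed_by_array(detected: dict[str, float], probe_panel: set[str]) -> list[str]:
    """Species seen by sequencing that a fixed probe panel cannot report."""
    return sorted(set(detected) - probe_panel)


if __name__ == "__main__":
    # Hypothetical read assignments, one entry per read.
    reads = ["miR-21"] * 6 + ["miR-34a"] * 3 + ["novel-miR-1"]
    rpm = reads_per_million(reads)
    print(rpm)  # {'miR-21': 600000.0, 'miR-34a': 300000.0, 'novel-miR-1': 100000.0}
    older_array = {"miR-21", "miR-34a"}      # hypothetical older probe set
    print(missed_by_array(rpm, older_array))  # ['novel-miR-1']
```

Sequencing reports the novel species with an expression estimate, whereas an array can only measure what its probes were designed for.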
However, as the cost of NGS plummets, profiling all miRNAs and other noncoding RNAs by sequencing is becoming cost-effective; this avoids the inherent limitation of microarrays, in which detection of miRNAs is restricted by the availability of probes. A sequencing-based approach simultaneously detects and quantifies all RNA species present, which obviates the need to design hybridization probes. Taken together, studies to date have shown very encouraging results for the clinical application of miRNAs in early cancer detection, prognostication and the prediction of drug responses. Although a miR-122 antagonist has successfully completed Phase II clinical trials, the development of small molecules that target aberrant miRNAs requires further work, notably efficient and specific means of delivering antagomiRs or miRNA mimics to target sites. Potential adverse side effects must also be evaluated for miRNA-based therapeutics to be implemented successfully.

Acknowledgements

CS Ku approved the final version and had responsibility for this book chapter as the invited author. CS Ku and HK Ng contributed to the conceptualization of the book chapter. HK Ng contributed to the writing of the book chapter. CS Ku and DN Cooper contributed to the critical reading and editing of the book chapter.

Financial & competing interests disclosure

The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties. No writing assistance was utilized in the production of this manuscript.

Summary

• Comprehensive transcript profiling has improved our ability to characterize the molecular basis of disease as well as to identify biomarkers for applications such as early disease detection, prognostication and the prediction of drug responses. Numerous mRNA expression signatures have been found to be informative for these applications.
• More recently, miRNAs have also emerged as a group of promising biomarkers for early disease detection, prognostication and the prediction of drug responses.
• Recognition of dysregulated miRNA expression as a hallmark of cancer has led to the development of small molecules, such as miRNA antagonists and miRNA sponges, that can be used in a therapeutic context.
• miRNAs appear to have a very promising future both as clinical markers and as therapeutic tools in cancer.


References

1 Lizardi PM, Forloni M, Wajapeyee N. Genome-wide approaches for cancer gene discovery. Trends Biotechnol. 29(11), 558–568 (2011).

2 Kasaian K, Jones SJ. A new frontier in personalized cancer therapy: mapping molecular changes. Future Oncol. 7(7), 873–894 (2011).

3 Glas AM, Floore A, Delahaye LJ et al. Converting a breast cancer microarray signature into a high-throughput diagnostic test. BMC Genomics 7, 278 (2006).

4 Korde LA, Lusa L, McShane L et al. Gene expression pathway analysis to predict response to neoadjuvant docetaxel and capecitabine for breast cancer. Breast Cancer Res. Treat. 119(3), 685–699 (2010).

5 Loi S, Haibe-Kains B, Desmedt C et al. Definition of clinically distinct molecular subtypes in estrogen receptor-positive breast carcinomas through genomic grade. J. Clin. Oncol. 25(10), 1239–1246 (2007).

6 Paik S, Shak S, Tang G et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N. Engl. J. Med. 351(27), 2817–2826 (2004).

7 Webber EM, Lin JS, Whitlock EP. Oncotype DX tumor gene expression profiling in stage II colon cancer. Application: prognostic, risk prediction. PLoS Curr. 2, pii:RRN1177 (2010).

8 Kratz JR, He J, Van Den Eeden SK et al. A practical molecular assay to predict survival in resected non-squamous, non-small-cell lung cancer: development and international validation studies. Lancet 379(9818), 823–832 (2012).

9 Huang L, Zheng M, Zhou QM et al. Identification of a 7-gene signature that predicts relapse and survival for early stage patients with cervical carcinoma. Med. Oncol. doi:10.1007/s12032-012-01663 (2012) (Epub ahead of print).

10 Iorio MV, Croce CM. MicroRNA dysregulation in cancer: diagnostics, monitoring and therapeutics. A comprehensive review. EMBO Mol. Med. 4(3), 143–159 (2012).

11 Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116(2), 281–297 (2004).

12 Lee Y, Kim M, Han J et al. MicroRNA genes are transcribed by RNA polymerase II. EMBO J. 23(20), 4051–4060 (2004).

13 Bartel DP. MicroRNAs: target recognition and regulatory functions. Cell 136(2), 215–233 (2009).

14 Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120(1), 15–20 (2005).

15 Kozomara A, Griffiths-Jones S. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 39(Suppl. 1), D152–D157 (2011).

16 Sayed D, Abdellatif M. MicroRNAs in development and disease. Physiol. Rev. 91(3), 827–887 (2011).

17 Farazi TA, Spitzer JI, Morozov P, Tuschl T. miRNAs in human cancer. J. Pathol. 223(2), 102–115 (2011).

18 Lu J, Getz G, Miska EA et al. MicroRNA expression profiles classify human cancers. Nature 435(7043), 834–838 (2005).

19 Volinia S, Calin GA, Liu CG et al. A microRNA expression signature of human solid tumors defines cancer gene targets. Proc. Natl Acad. Sci. USA 103(7), 2257–2261 (2006).

20 Calin GA, Sevignani C, Dumitru CD et al. Human microRNA genes are frequently located at fragile sites and genomic regions involved in cancers. Proc. Natl Acad. Sci. USA 101(9), 2999–3004 (2004).

21 Landau DA, Slack FJ. MicroRNAs in mutagenesis, genomic instability, and DNA repair. Semin. Oncol. 38(6), 743–751 (2011).

22 Iorio MV, Visone R, Di Leva G et al. MicroRNA signatures in human ovarian cancer. Cancer Res. 67(18), 8699–8707 (2007).

23 Moskwa P, Buffa FM, Pan Y et al. miR-182-mediated downregulation of BRCA1 impacts DNA repair and sensitivity to PARP inhibitors. Mol. Cell 41(2), 210–220 (2011).

24 Zheng B, Liang L, Wang C et al. MicroRNA-148a suppresses tumor cell invasion and metastasis by downregulating ROCK1 in gastric cancer. Clin. Cancer Res. 17(24), 7574–7583 (2011).

25 Brueckner B, Stresemann C, Kuner R et al. The human let-7a-3 locus contains an epigenetically regulated microRNA gene with oncogenic function. Cancer Res. 67(4), 1419–1423 (2007).

26 Gong J, Tong Y, Zhang HM et al. Genome-wide identification of SNPs in microRNA genes and the effects on microRNA target binding and biogenesis. Hum. Mutat. 33(1), 254–263 (2012).

27 Mencia A, Modamio-Hoybjor S, Redshaw N et al. Mutations in the seed region of human miR-96 are responsible for nonsyndromic progressive hearing loss. Nat. Genet. 41(5), 609–613 (2009).

28 Sun G, Yan J, Noltner K et al. SNPs in human miRNA genes affect biogenesis and function. RNA 15(9), 1640–1651 (2009).

29 Lytle JR, Yario TA, Steitz JA. Target mRNAs are repressed as efficiently by microRNA-binding sites in the 5' UTR as in the 3' UTR. Proc. Natl Acad. Sci. USA 104(23), 9667–9672 (2007).

30 Schnall-Levin M, Rissland OS, Johnston WK, Perrimon N, Bartel DP, Berger B. Unusually effective microRNA targeting within repeat-rich coding regions of mammalian mRNAs. Genome Res. 21(9), 1395–1403 (2011).

31 Moretti F, Thermann R, Hentze MW. Mechanism of translational regulation by miR-2 from sites in the 5' untranslated region or the open reading frame. RNA 16(12), 2493–2502 (2010).

32 Vasudevan S, Tong Y, Steitz JA. Switching from repression to activation: microRNAs can up-regulate translation. Science 318(5858), 1931–1934 (2007).

33 Place RF, Li LC, Pookot D, Noonan EJ, Dahiya R. MicroRNA-373 induces expression of genes with complementary promoter sequences. Proc. Natl Acad. Sci. USA 105(5), 1608–1613 (2008).

34 Cheng Q, Chang JT, Geradts J et al. Amplification and high-level expression of heat shock protein 90 marks aggressive phenotypes of human epidermal growth factor receptor 2 negative breast cancer. Breast Cancer Res. 14(2), R62 (2012).

35 Irgon J, Huang CC, Zhang Y, Talantov D, Bhanot G, Szalma S. Robust multi-tissue gene panel for cancer detection. BMC Cancer 10, 319 (2010).

36 Galamb O, Sipos F, Solymosi N et al. Diagnostic mRNA expression patterns of inflamed, benign, and malignant colorectal biopsy specimen and their correlation with peripheral blood results. Cancer Epidemiol. Biomarkers Prev. 17(10), 2835–2845 (2008).

37 Zhao X, Rodland EA, Sorlie T et al. Combining gene signatures improves prediction of breast cancer survival. PLoS ONE 6(3), e17845 (2011).

38 Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75(5), 843–854 (1993).

39 Calin GA, Dumitru CD, Shimizu M et al. Frequent deletions and down-regulation of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc. Natl Acad. Sci. USA 99(24), 15524–15529 (2002).

40 Schulte JH, Marschall T, Martin M et al. Deep sequencing reveals differential expression of microRNAs in favorable versus unfavorable neuroblastoma. Nucleic Acids Res. 38(17), 5919–5928 (2010).

41 Liu A, Tetzlaff MT, Vanbelle P et al. MicroRNA expression profiling outperforms mRNA expression profiling in formalin-fixed paraffin-embedded tissues. Int. J. Clin. Exp. Pathol. 2(6), 519–527 (2009).

42 Donahue TR, Tran LM, Hill R et al. Integrative survival-based molecular profiling of human pancreatic cancer. Clin. Cancer Res. 18(5), 1352–1363 (2012).

43 Lanza G, Ferracin M, Gafa R et al. mRNA/microRNA gene expression profile in microsatellite unstable colorectal cancer. Mol. Cancer 6, 54 (2007).

44 Sieuwerts AM, Mostert B, Bolt-de Vries J et al. mRNA and microRNA expression profiles in circulating tumor cells and primary tumors of metastatic breast cancer patients. Clin. Cancer Res. 17(11), 3600–3618 (2011).

45 Rosenfeld N, Aharonov R, Meiri E et al. MicroRNAs accurately identify cancer tissue origin. Nat. Biotechnol. 26(4), 462–469 (2008).

46 Massard C, Loriot Y, Fizazi K. Carcinomas of an unknown primary origin – diagnosis and treatment. Nat. Rev. Clin. Oncol. 8(12), 701–710 (2011).

47 Liu R, Zhang C, Hu Z et al. A five-microRNA signature identified from genome-wide serum microRNA expression profiling serves as a fingerprint for gastric cancer diagnosis. Eur. J. Cancer 47(5), 784–791 (2011).

48 Chen X, Ba Y, Ma L et al. Characterization of microRNAs in serum: a novel class of biomarkers for diagnosis of cancer and other diseases. Cell Res. 18(10), 997–1006 (2008).

49 Liu CJ, Lin SC, Yang CC, Cheng HW, Chang KW. Exploiting salivary miR-31 as a clinical biomarker of oral squamous cell carcinoma. Head Neck 34(2), 219–224 (2012).

50 Yu L, Todd NW, Xing L et al. Early detection of lung adenocarcinoma in sputum by a panel of microRNA markers. Int. J. Cancer 127(12), 2870–2878 (2010).

51 Bryant RJ, Pawlowski T, Catto JW et al. Changes in circulating microRNA levels associated with prostate cancer. Br. J. Cancer 106(4), 768–774 (2012).

52 Wu CW, Ng SS, Dong YJ et al. Detection of miR-92a and miR-21 in stool samples as potential screening biomarkers for colorectal cancer and polyps. Gut 61(5), 739–745 (2011).

53 Oberg AL, French AJ, Sarver AL et al. miRNA expression in colon polyps provides evidence for a multihit model of colon cancer. PLoS ONE 6(6), e20465 (2011).

54 Li B, Hu Y, Ye F, Li Y, Lv W, Xie X. Reduced miR-34a expression in normal cervical tissues and cervical lesions with high-risk human papillomavirus infection. Int. J. Gynecol. Cancer 20(4), 597–604 (2010).

55 Schrauder MG, Strick R, Schulz-Wendtland R et al. Circulating micro-RNAs as potential blood-based markers for early stage breast cancer detection. PLoS ONE 7(1), e29770 (2012).

56 Borel F, Konstantinova P, Jansen PL. Diagnostic and therapeutic potential of miRNA signatures in patients with hepatocellular carcinoma. J. Hepatol. 56(6), 1371–1383 (2012).

57 Boeri M, Verri C, Conte D et al. MicroRNA signatures in tissues and plasma predict development and prognosis of computed tomography detected lung cancer. Proc. Natl Acad. Sci. USA 108(9), 3713–3718 (2011).

58 Bianchi F, Nicassio F, Marzi M et al. A serum circulating miRNA diagnostic test to identify asymptomatic high-risk individuals with early stage lung cancer. EMBO Mol. Med. 3(8), 495–503 (2011).

59 Song MY, Pan KF, Su HJ et al. Identification of serum microRNAs as novel noninvasive biomarkers for early detection of gastric cancer. PLoS ONE 7(3), e33608 (2012).

60 Veronesi G, Bellomi M, Mulshine JL et al. Lung cancer screening with low-dose computed tomography: a non-invasive diagnostic protocol for baseline lung nodules. Lung Cancer 61(3), 340–349 (2008).

61 Kakinuma R, Ohmatsu H, Kaneko M et al. Detection failures in spiral CT screening for lung cancer: analysis of CT findings. Radiology 212(1), 61–66 (1999).

62 Brase JC, Johannes M, Schlomm T et al. Circulating miRNAs are correlated with tumor progression in prostate cancer. Int. J. Cancer 128(3), 608–616 (2011).

63 Liu XG, Zhu WY, Huang YY et al. High expression of serum miR-21 and tumor miR-200c associated with poor prognosis in patients with lung cancer. Med. Oncol. 29(2), 618–626 (2011).

64 Tsujiura M, Ichikawa D, Komatsu S et al. Circulating microRNAs in plasma of patients with gastric cancers. Br. J. Cancer 102(7), 1174–1179 (2010).

65 Zhang X, Yan Z, Zhang J et al. Combination of hsa-miR-375 and hsa-miR-142-5p as a predictor for recurrence risk in gastric cancer patients following surgical resection. Ann. Oncol. 22(10), 2257–2266 (2011).

66 Lu Y, Govindan R, Wang L et al. MicroRNA profiling and prediction of recurrence/relapse-free survival in stage I lung cancer. Carcinogenesis 33(5), 1046–1054 (2012).

67 Gallardo E, Navarro A, Vinolas N et al. miR-34a as a prognostic marker of relapse in surgically resected non-small-cell lung cancer. Carcinogenesis 30(11), 1903–1909 (2009).

68 Ueda T, Volinia S, Okumura H et al. Relation between microRNA expression and progression and prognosis of gastric cancer: a microRNA expression analysis. Lancet Oncol. 11(2), 136–146 (2010).

69 Blenkiron C, Goldstein LD, Thorne NP et al. MicroRNA expression profiling of human breast cancer identifies new markers of tumor subtype. Genome Biol. 8(10), R214 (2007).

70 Garzon R, Volinia S, Liu CG et al. MicroRNA signatures associated with cytogenetics and prognosis in acute myeloid leukemia. Blood 111(6), 3183–3189 (2008).

71 Li X, Zhang Y, Ding J, Wu K, Fan D. Survival prediction of gastric cancer by a seven-microRNA signature. Gut 59(5), 579–585 (2010).

72 Sempere LF, Christensen M, Silahtaroglu A et al. Altered microRNA expression confined to specific epithelial cell subpopulations in breast cancer. Cancer Res. 67(24), 11612–11620 (2007).

73 Lowery AJ, Miller N, Devaney A et al. MicroRNA signatures predict oestrogen receptor, progesterone receptor and HER2/neu receptor status in breast cancer. Breast Cancer Res. 11(3), R27 (2009).

74 Fabbri M, Bottoni A, Shimizu M et al. Association of a microRNA/TP53 feedback circuitry with pathogenesis and outcome of B-cell chronic lymphocytic leukemia. JAMA 305(1), 59–67 (2011).

75 Li D, Liu X, Lin L et al. MicroRNA-99a inhibits hepatocellular carcinoma growth and correlates with prognosis of patients with hepatocellular carcinoma. J. Biol. Chem. 286(42), 36677–36685 (2011).

76 Volinia S, Galasso M, Sana ME et al. Breast cancer signatures for invasiveness and prognosis defined by deep sequencing of microRNA. Proc. Natl Acad. Sci. USA 109(8), 3024–3029 (2012).

77 Jamieson NB, Morran DC, Morton JP et al. MicroRNA molecular profiles associated with diagnosis, clinicopathologic criteria, and overall survival in patients with resectable pancreatic ductal adenocarcinoma. Clin. Cancer Res. 18(2), 534–545 (2012).

78 Yamamoto Y, Yoshioka Y, Minoura K et al. An integrative genomic analysis revealed the relevance of microRNA and gene expression for drug-resistance in human breast cancer cells. Mol. Cancer 10, 135 (2011).

79 Ji J, Shi J, Budhu A et al. MicroRNA expression, survival, and response to interferon in liver cancer. N. Engl. J. Med. 361(15), 1437–1447 (2009).

80 Tomimaru Y, Eguchi H, Nagano H et al. MicroRNA-21 induces resistance to the anti-tumour effect of interferon-alpha/5-fluorouracil in hepatocellular carcinoma cells. Br. J. Cancer 103(10), 1617–1626 (2010).

81 Hwang JH, Voortman J, Giovannetti E et al. Identification of microRNA-21 as a biomarker for chemoresistance and clinical outcome following adjuvant therapy in resectable pancreatic cancer. PLoS ONE 5(5), e10630 (2010).

82 Giovannetti E, Funel N, Peters GJ et al. MicroRNA-21 in pancreatic cancer: correlation with clinical outcome and pharmacologic aspects underlying its role in the modulation of gemcitabine activity. Cancer Res. 70(11), 4528–4538 (2010).

83 Boni V, Zarate R, Villa JC et al. Role of primary miRNA polymorphic variants in metastatic colon cancer patients treated with 5-fluorouracil and irinotecan. Pharmacogenomics J. 11(6), 429–436 (2011).

84 Liang D, Meyer L, Chang DW et al. Genetic variants in microRNA biosynthesis pathways and binding sites modify ovarian cancer risk, survival, and treatment response. Cancer Res. 70(23), 9765–9776 (2010).

85 Esquela-Kerscher A, Slack FJ. Oncomirs – microRNAs with a role in cancer. Nat. Rev. Cancer 6(4), 259–269 (2006).

86 Lee YS, Dutta A. MicroRNAs in cancer. Annu. Rev. Pathol. 4, 199–227 (2009).

87 Krutzfeldt J, Rajewsky N, Braich R et al. Silencing of microRNAs in vivo with ‘antagomirs’. Nature 438(7068), 685–689 (2005).

88 Meister G, Landthaler M, Dorsett Y, Tuschl T. Sequence-specific inhibition of microRNA- and siRNA-induced RNA silencing. RNA 10(3), 544–550 (2004).

89 Orom UA, Kauppinen S, Lund AH. LNA-modified oligonucleotides mediate specific inhibition of microRNA function. Gene 372, 137–141 (2006).

90 Ebert MS, Neilson JR, Sharp PA. MicroRNA sponges: competitive inhibitors of small RNAs in mammalian cells. Nat. Methods 4(9), 721–726 (2007).

91 Poliseno L, Salmena L, Zhang J, Carver B, Haveman WJ, Pandolfi PP. A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature 465(7301), 1033–1038 (2010).

92 Elmen J, Lindow M, Schutz S et al. LNA-mediated microRNA silencing in nonhuman primates. Nature 452(7189), 896–899 (2008).

93 Lanford RE, Hildebrandt-Eriksen ES, Petri A et al. Therapeutic silencing of microRNA-122 in primates with chronic hepatitis C virus infection. Science 327(5962), 198–201 (2010).

94 Ma L, Reinhardt F, Pan E et al. Therapeutic silencing of miR-10b inhibits metastasis in a mouse mammary tumor model. Nat. Biotechnol. 28(4), 341–347 (2010).

95 Huynh C, Segura MF, Gaziel-Sovran A et al. Efficient in vivo microRNA targeting of liver metastasis. Oncogene 30(12), 1481–1488 (2011).

96 Elyakim E, Sitbon E, Faerman A et al. HSA-miR-191 is a candidate oncogene target for hepatocellular carcinoma therapy. Cancer Res. 70(20), 8077–8087 (2010).

97 Trang P, Wiggins JF, Daige CL et al. Systemic delivery of tumor suppressor microRNA mimics using a neutral lipid emulsion inhibits lung tumors in mice. Mol. Ther. 19(6), 1116–1122 (2011).

98 Pramanik D, Campbell NR, Karikari C et al. Restitution of tumor suppressor microRNAs using a systemic nanovector inhibits pancreatic cancer growth in mice. Mol. Cancer Ther. 10(8), 1470–1480 (2011).

99 Takeshita F, Patrawala L, Osaki M et al. Systemic delivery of synthetic microRNA-16 inhibits the growth of metastatic prostate tumors via downregulation of multiple cell-cycle genes. Mol. Ther. 18(1), 181–187 (2010).

100 Kota J, Chivukula RR, O’Donnell KA et al. Therapeutic microRNA delivery suppresses tumorigenesis in a murine liver cancer model. Cell 137(6), 1005–1017 (2009).

101 Horwich MD, Zamore PD. Design and delivery of antisense oligonucleotides to block microRNA function in cultured Drosophila and human cells. Nat. Protoc. 3(10), 1537–1549 (2008).

102 Zimmermann TS, Lee AC, Akinc A et al. RNAi-mediated gene silencing in non-human primates. Nature 441(7089), 111–114 (2006).

103 Xu B, Wang N, Wang X et al. MiR-146a suppresses tumor growth and progression by targeting EGFR pathway and in a p-ERK-dependent manner in castration-resistant prostate cancer. Prostate 72(11), 1171–1178 (2011).

104 Xiao B, Zhu ED, Li N et al. Increased miR-146a in gastric cancer directly targets SMAD4 and is involved in modulating cell proliferation and apoptosis. Oncol. Rep. 27(2), 559–566 (2012).

105 Garcia AI, Buisson M, Bertrand P et al. Downregulation of BRCA1 expression by miR-146a and miR-146b-5p in triple negative sporadic breast cancers. EMBO Mol. Med. 3(5), 279–290 (2011).

106 Liu C, Kelnar K, Liu B et al. The microRNA miR-34a inhibits prostate cancer stem cells and metastasis by directly repressing CD44. Nat. Med. 17(2), 211–215 (2011).

107 Liao J, Yu L, Mei Y et al. Small nucleolar RNA signatures as biomarkers for non-small-cell lung cancer. Mol. Cancer 9, 198 (2010).

108 Gupta RA, Shah N, Wang KC et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464(7291), 1071–1076 (2010).

109 Ozsolak F, Platt AR, Jones DR et al. Direct RNA sequencing. Nature 461(7265), 814–818 (2009).

110 Kasinski AL, Slack FJ. Epigenetics and genetics. MicroRNAs en route to the clinic: progress in validating and targeting microRNAs for cancer therapy. Nat. Rev. Cancer 11(12), 849–864 (2011).


About the Authors

Vasco Azevedo

Vasco Azevedo, a molecular geneticist, graduated from the Veterinary School of the Federal University of Bahia (Salvador, Brazil). He obtained his Master's and PhD degrees in molecular genetics at the Institut National Agronomique Paris-Grignon and the Institut National de la Recherche Agronomique, France. He completed postdoctoral training in the Microbiology Department of the School of Medicine, University of Pennsylvania, PA, USA. Since 1995, he has been a Professor at the Federal University of Minas Gerais, Brazil. He is a pioneer of the genetics of lactic acid bacteria and Corynebacterium pseudotuberculosis in Brazil. His specialization and current research cover bacterial genetics, genomics, transcriptomics and proteomics, and the development of new vaccines and diagnostics against infectious diseases.

Artur Silva, Adriana R Carneiro, Flávia Aburjaile, Luis C Guimarães, Rommel TJ Ramos, Thiago LP Castro, Vinicius Abreu, Wanderson M Silva & Paula Schneider


© 2012 Future Medicine www.futuremedicine.com

Chapter 4

Comprehensive whole genome and transcriptome analysis for novel diagnostics

Advantages of NGS for diagnostics in comparison to previous strategies
Genome assembly & functional annotation
Metagenomics
Transcriptome
Small noncoding RNA as a target for novel drugs discovery

doi:10.2217/EBO.12.208

© 2012 Future Medicine

Artur Silva, Adriana R Carneiro, Flávia Aburjaile, Luis C Guimarães, Rommel TJ Ramos, Thiago LP Castro, Vinicius Abreu, Wanderson M Silva, Paula Schneider & Vasco Azevedo

Life sciences are undergoing a major revolution driven by new sequencing technologies, collectively called next-generation DNA sequencing (NGS) because they arose after the first-generation Sanger method. The key feature of NGS platforms, such as the pyrosequencing-based Roche 454 FLX, is the generation of up to hundreds of millions of reads in parallel, which makes comprehensive assays possible in a short time [1]. Despite this analytical power, which is contributing greatly to medical research, obstacles still prevent the routine use of NGS in diagnostic laboratories, among them the high costs involved and the lack of sufficient biological information linked to specific genetic traits. Overcoming these barriers would benefit not only the large-scale detection of human genetic disorders but also the diagnosis of infectious diseases caused by viruses or pathogenic microorganisms [2].

Next-generation sequencers: DNA sequencing platforms that generate a large amount of data in a single run with high coverage and accuracy.

NGS-based assays may provide rapid identification of pathogens from clinical samples, which is crucial for treating patients promptly and for designing strategies to contain outbreaks. Beyond this, NGS analysis provides comprehensive data for understanding particular gene-expression profiles and facilitates screening for proteins involved in pathogenicity and in the metabolic pathways related to the infectious process [3]. The data gathered may also help improve conventional treatment procedures, identify new drug targets, collect epidemiological data and track possible sources of contamination (for example, water and food). Although most studies examine only the microbial species known to be associated with a given condition, there is growing interest in the commensal microbial populations of humans, such as those that inhabit our skin, gut and other mucosal areas in the mouth and genitals [4]. For each human cell in our bodies there are at least ten microbial cells, suggesting a very intricate cross-talk between human cells and microbial communities [5]. Studies indicate the need to characterize the diversity of the human microbiota, especially given its complex relationship with metabolic regulation, nutrient acquisition and host defence, the latter through the production of substances involved in microbial antagonism [6]. In this context, NGS technologies contribute greatly to the assessment of genomic and transcriptomic information representing the wide variety of colonizing populations in both healthy and diseased individuals. In this chapter, we address how NGS technologies could be applied to the diagnosis of microbe-related human diseases, discussing the contributions of genomics and transcriptomics to this purpose.
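One way to turn sequencing reads into a rapid pathogen call is to compare each read's k-mers (substrings of length k) with k-mer sets built from reference genomes. The toy sketch below illustrates that idea under stated assumptions: exact k-mer matching, a simple best-hit rule, a tiny k, and hypothetical reference names and sequences. Real read classifiers use much larger k, full genome databases and taxonomic handling of ties.

```python
from __future__ import annotations

from typing import Optional


def kmers(seq: str, k: int) -> set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each reference name to its k-mer set."""
    return {name: kmers(seq, k) for name, seq in references.items()}


def classify_read(read: str, index: dict[str, set[str]], k: int,
                  min_shared: int = 1) -> Optional[str]:
    """Return the reference sharing the most k-mers with the read, or None."""
    read_kmers = kmers(read, k)
    best_name, best_hits = None, 0
    for name, ref_kmers in index.items():
        hits = len(read_kmers & ref_kmers)
        if hits > best_hits:
            best_name, best_hits = name, hits
    return best_name if best_hits >= min_shared else None


if __name__ == "__main__":
    K = 5
    index = build_index({"pathogen_A": "ATGCGTACGTTAGC",      # hypothetical
                         "pathogen_B": "TTTTGGGGCCCCAAAA"}, K)  # hypothetical
    print(classify_read("GCGTACGTT", index, K))   # pathogen_A
    print(classify_read("ACACACACAC", index, K))  # None (unclassified)
```

Reads that match nothing are left unclassified rather than forced onto the nearest reference, which is where metagenomic approaches to unknown organisms take over.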

Advantages of NGS for diagnostics in comparison to previous strategies

With the introduction of NGS technologies, the cost of obtaining high-throughput data has fallen compared with previous strategies. An important advantage of NGS is that whole sets of sequences are determined from libraries of amplified single DNA fragments, separated on sequencing chips, without the need to clone them into vectors before sequencing. Another point to consider is that the high throughput of NGS makes it feasible to sequence an individual's genome in order to find correlations between genotypes and normal or disease phenotypes.

Interactome: the study of interactions between microorganisms and the human body.

NGS platforms can sequence an enormous number of reads, reaching coverage of 200-fold for a bacterial genome in a single run. NGS techniques are therefore a powerful tool for metagenomics-based strategies to detect unknown viruses and bacteria associated with disease. The high sensitivity of NGS methods is a great advantage over conventional microarray-based assays, because NGS can potentially detect the full spectrum of viruses and bacteria, including unknown species [7]. NGS is also becoming an important ally for the detection of single-nucleotide polymorphisms (SNPs). One example is an NGS-based study of Plasmodium genes that identified SNPs specific to Plasmodium falciparum and absent from the other human malaria parasites [8]. Moreover, in this study, NGS delivered results faster and at lower cost than restriction fragment length polymorphism assays. The detection of insertions and deletions (indels) throughout genomes using NGS technologies is becoming more powerful thanks to mate-paired reads, which allow reads separated by a known distance to be mapped as pairs. This facilitates the incorporation of new sequences into the reference genome. The main weakness of the mate-paired read method is that it does not currently detect single-base indels in homopolymers efficiently [9].
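Two quantitative ideas in this paragraph can be sketched in a few lines. Expected mean coverage follows the usual Lander–Waterman estimate C = N × L / G (N reads of length L over a genome of size G), so a 200-fold target fixes the number of reads needed. For mate pairs, a deletion in the sample relative to the reference makes the mates map farther apart than the library insert size, and an insertion makes them map closer together; the sketch flags pairs whose deviation exceeds a few standard deviations. The genome size, read length, insert size and 3-SD threshold are illustrative assumptions, and orientation checks are ignored. The same logic suggests why single-base indels are hard to see this way: a 1 bp shift is far smaller than normal insert-size variation.

```python
from __future__ import annotations

import math


def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Mean fold coverage C = N * L / G."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest read count that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


def classify_mate_pair(mapped_distance: int, expected_insert: int,
                       insert_sd: float, n_sd: float = 3.0) -> tuple[str, int]:
    """Classify a mate pair by how far its mapped distance on the reference
    deviates from the library insert size (simplified, orientation ignored)."""
    deviation = mapped_distance - expected_insert
    if abs(deviation) <= n_sd * insert_sd:
        return ("concordant", 0)
    if deviation > 0:
        return ("deletion", deviation)    # mates map too far apart
    return ("insertion", -deviation)      # mates map too close together


if __name__ == "__main__":
    genome = 5_000_000                                  # hypothetical 5 Mb genome
    print(reads_needed(200, 100, genome))               # 10000000 reads of 100 nt
    print(expected_coverage(10_000_000, 100, genome))   # 200.0
    print(classify_mate_pair(3_500, 3_000, 100))        # ('deletion', 500)
    print(classify_mate_pair(2_400, 3_000, 100))        # ('insertion', 600)
    print(classify_mate_pair(3_001, 3_000, 100))        # ('concordant', 0)
```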

Genome assembly & functional annotation

The availability of genomic data generated by NGS is increasing at an accelerating rate, revolutionizing biological research. Nevertheless, the processes that ensure the quality of a genome, such as assembly and functional annotation, are not keeping pace with the volume of data produced by rapid sequencing. A properly finished genome provides highly accurate and complete sequences of an organism, greatly contributing to further data mining [10]. After a genome is sequenced, assembly is a crucial step in which reads are filtered by quality and then overlapped into contiguous sequences, using either an ab initio approach, in which only overlaps within the pool of acquired reads are considered, or a reference assembly, in which the new reads are aligned according to their similarity to a previously assembled genome.

Silva, Carneiro, Aburjaile et al.,

NGS platforms such as the SOLiD 5500 (Life Technologies) produce a huge number of short reads (up to 75 nucleotides), which are difficult to assemble, particularly across repetitive regions. Paired-end libraries offer a partial solution to this problem, but they are not yet fully effective. Finally, manual curation is necessary to close gaps and analyze the genome. Studies show that draft genomes, whose assembly has not been completed, are accumulating in public databases faster than complete genomes [11].

Following assembly, the genome undergoes gene prediction (gene finding), in which computational tools use search algorithms to identify coding regions in the genomic DNA. The genome is then subjected to automatic annotation, a computational process that compares the obtained sequences with those already in databases and allows a preliminary identification of putative open reading frames. Manual curation of the pre-annotated genome is further recommended for more reliable predictions of protein families, metabolic pathways and phylogenetic classifications, and to facilitate mining of experimental data from the literature. Nevertheless, manual curation is costly, requiring considerable time and dedication as well as skilled personnel to carry out the necessary tasks [11].

Ensuring quality in assembly and annotation is a challenge with no fixed rules to follow. A single protein may be named in different ways, and methods vary between research groups. Genetic elements likely to be identified during manual curation include fragments of viral origin, mobile elements such as transposons and integrons, alternative splicing events, pseudogenes, repetitive elements and relevant SNPs.
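The gene-finding step can be illustrated with the simplest possible approach: scanning both strands for open reading frames (ORFs), that is, runs of codons that begin with ATG and end at the first in-frame stop codon. This is a deliberately naive, hypothetical sketch; production gene finders use statistical models of coding sequence, alternative start codons and ribosome-binding-site signals, none of which are modeled here. The default minimum length of 100 codons is an assumption chosen only for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list:
    """Find ATG...stop ORFs in all three frames of both strands.

    Returns (strand, start, end) tuples with 0-based, end-exclusive
    coordinates on the strand being scanned ('-' coordinates refer to the
    reverse complement). Lengths are counted in codons, stop codon included.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Step codon by codon; the last i satisfies i + 3 <= len(s).
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next ATG in this frame
    return orfs


if __name__ == "__main__":
    # A toy 3-codon ORF; min_codons is lowered so it is reported.
    print(find_orfs("ATGAAATAG", min_codons=2))
```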
One of the uses of SNP mapping is to determine where one organism differs from another by comparing their genomic sequences [11]. Taking advantage of NGS technologies, German authorities were able to identify and characterize the Shiga toxin-producing Escherichia coli O104:H4 in the early stages of an outbreak. This streamlined process allowed them to control the spread of the outbreak in record time [12]. In the future, greater effort will be needed to develop bioinformatics tools that keep pace with NGS. Furthermore, given the volume of data generated in a short period of time, these data must be subjected to critical analysis and quality management.
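In its simplest form, comparing two genomes to find SNPs means walking along an alignment and reporting positions where the bases differ. The sketch below assumes the two sequences are already aligned to the same length with '-' marking gaps; gap columns and ambiguous 'N' bases are skipped because they are not single-nucleotide substitutions. It ignores base qualities and read depth, which real variant callers must take into account, and the sequences shown are hypothetical.

```python
def call_snps(ref: str, query: str) -> list:
    """Report (position, ref_base, query_base) for each substitution.

    Positions are 1-based coordinates on the reference, counting only
    non-gap reference columns.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    ref_pos = 0
    for r, q in zip(ref.upper(), query.upper()):
        if r != "-":
            ref_pos += 1  # advance only on reference bases
        if "-" in (r, q) or "N" in (r, q):
            continue      # indel or ambiguous column: not a SNP
        if r != q:
            snps.append((ref_pos, r, q))
    return snps


print(call_snps("ACGTAC-GTA", "ACCTACTGTT"))  # [(3, 'G', 'C'), (9, 'A', 'T')]
```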

The combination of these new technologies with sophisticated software points to a scenario in which detecting pathogens in clinical samples from whole-genome sequences becomes increasingly routine. Comprehensive genome-based analysis of the causative agents should help doctors select the most appropriate treatment for a given patient, in addition to contributing to the development of strategies for disease prevention and control [10,12].

Metagenomics

Culture techniques and biochemical tests have been the gold standard for detecting bacteria in basic research and clinical diagnostics for over a century. This was made possible by the postulates of Robert Koch in the 1880s, which revolutionized microbiological research through the development of culture media and led to the division of bacteria into two groups: cultivable and uncultivable. With the introduction of molecular biology in the 20th century, the phylogenetic study of microorganisms reached an important milestone in the 1980s. More recent studies based on amplification and sequencing of 16S ribosomal DNA, which allowed microbial identification independent of isolation or culture, also revolutionized microbiological research and enabled the characterization of several unknown bacterial species. Advances in molecular biology techniques now allow large-scale characterization of the microbial community in a given environment, giving rise to a new genomic approach called metagenomics. Metagenomic analysis aims to identify microorganisms in a given habitat without culturing them, and with recent advances in genomic research this sequencing is now performed by NGS. Metagenomic studies follow two strategies (Figure 4.1): shotgun metagenomics, in which random fragments are sequenced directly; and construction of a metagenomic library, which can be screened in two ways: homology-based screening, to identify regions of already-known genes or protein families; and functional screening, which uses a specific assay to identify known genes or genes with a novel function [13]. The human body is inhabited by microorganisms, and the interaction of this microbial community with our body has been the subject of much research.
In this regard, metagenomic approaches have been applied to expand our knowledge about these interactions, and metagenomic studies have characterized the oral, vaginal, skin, gastrointestinal and gut microbial flora. These studies show that diseases such as obesity, bowel diseases, diabetes and colon cancer may be influenced by microbial populations. In this context, metagenomic approaches can be applied in both scientific research and clinical diagnostics.

Metagenomics provides identification of microorganisms in a culture-independent way, assisting the study of the influence of microbial populations in diseases such as obesity, diabetes, colon cancer and bowel disorders.

Figure 4.1. Metagenomics approaches: workflow.
[Figure panels: 1 Genomic DNA; 2 Shotgun sequence; 3 Metagenomic library (DNA fragment cloned into a bacterial vector carrying a replication origin and a selection marker); 3a Identification of homology; 3b Assay function; 4 Amplification of specific fragments (forward and reverse primers); 5 Sequence.]
(1) Environmental genomic DNA extracted from different microorganisms. (2) Sequencing of random fragments (shotgun). (3) Construction of metagenomic libraries after cloning; (3a) identification of clones by homology; (3b) selection of clones to identify genes with a specific function. (4) Identification of clones by conventional ribosomal DNA PCR amplification and sequencing. (5) Sequence results from next-generation sequencing.
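Homology-based identification of shotgun reads can be sketched with exact k-mer matching, a simplified stand-in for the alignment or k-mer classification tools used in practice. The reference sequences, taxon names and k = 5 below are hypothetical toy values; real analyses use curated databases and much longer k-mers. Each read is assigned to the reference sharing the most k-mers with it; reads sharing no k-mer are reported as unclassified, which is how candidate unknown organisms surface in such a screen.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set:
    """All distinct substrings of length k in seq (empty if seq is shorter)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: dict, k: int) -> dict:
    """Map each k-mer to the set of reference taxa that contain it."""
    index = {}
    for taxon, genome in references.items():
        for km in kmers(genome, k):
            index.setdefault(km, set()).add(taxon)
    return index


def classify_read(read: str, index: dict, k: int) -> str:
    """Assign a read to the taxon with the most shared k-mers.

    Returns 'unclassified' when no k-mer matches and 'ambiguous' when the
    top two taxa are tied.
    """
    hits = Counter()
    for km in kmers(read, k):
        for taxon in index.get(km, ()):
            hits[taxon] += 1
    if not hits:
        return "unclassified"
    ranked = hits.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "ambiguous"
    return ranked[0][0]


if __name__ == "__main__":
    refs = {"taxon_A": "AAAAACCCCCGGGGG", "taxon_B": "TTTTTGGGGGAAAAA"}  # toy data
    k = 5
    index = build_index(refs, k)
    for read in ["AACCCCCG", "TTTGGGGG", "ATATATAT", "AAAAA"]:
        print(read, classify_read(read, index, k))
```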

Transcriptomics evaluates gene-expression profiles under conditions of interest, for instance, in a microorganism exposed to environmental stress. It allows the identification of genes related to pathogenicity in several infectious diseases, leading to the discovery of new therapeutic targets and the development of drugs.

Metagenomic studies of intestinal disorders such as irritable bowel syndrome (IBS) have shown that γ-proteobacteria are present at a high percentage in patients with IBS. These patients experience varying levels of abdominal pain, and pain frequency has been associated with specific microorganisms such as Alistipes [14]. A metagenomic approach to detecting bacterial pathogens in brain abscesses revealed polymicrobial infections composed of uncultivated bacterial species and novel microorganisms that had never been found in brain abscesses [15]. Knowledge of the interactions between the microbial community and the human organism, and of their impact on human health, is essential for understanding the genetic basis of these microorganisms. Moreover, it allows the identification of unknown microorganisms, most of which are uncultivable, and provides important information about the functional relationship between microbial genes and illness. Such knowledge will promote a more comprehensive view of the cellular processes involved in this symbiotic interaction, leading to the development of diagnostic methods, treatments and disease prevention.
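Statements such as "γ-proteobacteria are present at a high percentage in IBS patients" rest on relative abundance: the number of reads assigned to a taxon divided by all classified reads in the sample. A minimal sketch follows; the per-sample read counts are hypothetical and are not data from the cited study.

```python
def relative_abundance(counts: dict) -> dict:
    """Convert per-taxon read counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no classified reads")
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical class-level read counts, for illustration only.
patient = {"Gammaproteobacteria": 1_200, "Bacteroidia": 5_000, "Clostridia": 3_800}
control = {"Gammaproteobacteria": 300, "Bacteroidia": 5_200, "Clostridia": 4_500}

for name, sample in (("patient", patient), ("control", control)):
    share = relative_abundance(sample)["Gammaproteobacteria"]
    print(f"{name}: {share:.1%} Gammaproteobacteria")
# patient: 12.0% Gammaproteobacteria
# control: 3.0% Gammaproteobacteria
```

Because proportions always sum to 1, an increase in one taxon necessarily lowers the share of the others, so group comparisons built on such values need statistics that account for this constraint.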

Transcriptome

Besides the advantages concerning genome structure, NGS technologies have provided a new approach to evaluating gene expression, called RNA-Seq, which consists of cDNA sequencing in which gene expression is estimated from sequencing coverage [16]. The scientific community has been testing NGS platforms to identify infectious agents that cause many human diseases, using transcriptome approaches that combine in silico bioinformatic methods with molecular biology techniques such as PCR, immunohistochemistry and serological analysis [17,18]. Identifying infectious agents such as viruses and bacteria by RNA-Seq allows both their detection and the quantification of their expression levels in human samples.
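Because RNA-Seq infers expression from how many reads fall on each transcript, raw counts are usually normalized for transcript length and sequencing depth. The sketch below computes two common measures, RPKM (reads per kilobase of transcript per million mapped reads) and TPM (transcripts per million). It assumes the supplied counts cover all mapped reads, and the gene names, lengths and counts are hypothetical.

```python
def rpkm(counts: dict, lengths_bp: dict) -> dict:
    """Reads per kilobase per million mapped reads.

    Assumes sum(counts) equals the total number of mapped reads.
    """
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}


def tpm(counts: dict, lengths_bp: dict) -> dict:
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rate = {g: counts[g] / (lengths_bp[g] / 1_000) for g in counts}
    scale = sum(rate.values())
    return {g: r / scale * 1e6 for g, r in rate.items()}


if __name__ == "__main__":
    counts = {"geneA": 500, "geneB": 500, "geneC": 1_000}       # hypothetical
    lengths = {"geneA": 1_000, "geneB": 2_000, "geneC": 1_000}  # hypothetical, bp
    print(rpkm(counts, lengths))
    print(tpm(counts, lengths))
```

geneA and geneB receive the same number of reads, but geneB is twice as long, so both measures rank it lower. TPM values sum to one million in every sample, which makes them easier to compare across samples than RPKM.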

RNA-Seq: new method to evaluate whole transcriptomes through next-generation sequencing platforms.
Transcriptome: set of transcripts of an organism.

Moore et al. evaluated the accuracy of RNA-Seq in identifying sequences of microorganisms in biopsies of colorectal tumor tissue. In the same work, the authors presented a bioinformatic pipeline to identify probable microorganism transcripts in samples of human tissue (Figure 4.2) [17].

Small RNA: non-protein-coding RNA that exerts control over both the synthesis and the activity of specific proteins.

This promising strategy can be applied to diagnose infectious agents involved in the development of some cancers, by identifying differentially expressed pathogenicity-related genes through comparative analysis of normal and tumor tissues from the same individual [17]. The major limitation of this pipeline is its dependence on sequences available in public databases, which serve as references for transcript mapping and help in the discovery of new microbial transcripts [17]. Semenov et al. performed RNA-Seq analysis of human blood plasma on the SOLiD platform, identifying the types of RNA circulating in eight apparently healthy individuals by mapping reads against human RNA databases with the Bowtie software [19]. rRNA transcripts, mitochondrial transcripts, miRNA, scRNA, small nuclear RNA, small nucleolar RNA and mRNA fragments were detected, as well as novel transcripts in blood plasma. The unmapped reads were then aligned against the human microbiome database (ftp.ncbi.nih.gov/genomes/HUMAN_MICROBIOM) and viral genomes (ftp.ncbi.nih.gov/genomes/Viruses) to identify RNA from microorganisms. The resulting sequences revealed ribosomal and transfer RNA from the bacterial genera Escherichia, Acinetobacter and Propionibacterium, and also indicated the presence of human herpesvirus 1.
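The two-step logic described above, mapping reads to the host first and then searching only the unmapped reads against microbial and viral references, can be sketched as follows. This is not the pipeline of Semenov et al. or Moore et al.: exact substring matching is an assumption standing in for an aligner such as Bowtie, it ignores mismatches and reverse complements, and all sequences and reference names are hypothetical toy data.

```python
from typing import Dict, List, Optional


def matches(read: str, references: Dict[str, str]) -> Optional[str]:
    """Return the first reference name containing the read exactly, else None."""
    for name, seq in references.items():
        if read in seq:
            return name
    return None


def subtractive_screen(reads: List[str], host: Dict[str, str],
                       microbes: Dict[str, str]) -> Dict[str, List[str]]:
    """Step 1: set aside host-derived reads. Step 2: match leftovers to microbes."""
    result: Dict[str, List[str]] = {"host": [], "unmapped": []}
    for read in reads:
        if matches(read, host):
            result["host"].append(read)
            continue
        hit = matches(read, microbes)
        result.setdefault(hit or "unmapped", []).append(read)
    return result


if __name__ == "__main__":
    host = {"chr_toy": "ACGTTGCAACGTTGCA"}
    microbes = {"Escherichia_toy": "TTTAAACCCGGG", "HHV1_toy": "GGGCCCAAATTT"}
    reads = ["ACGTTGCA", "AAACCC", "CCCAAA", "ATATAT"]
    print(subtractive_screen(reads, host, microbes))
```

Reads that match neither reference set remain in the "unmapped" bin, which mirrors the limitation noted above: organisms absent from the reference databases cannot be identified this way.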

Small noncoding RNA as a target for novel drug discovery

Small RNAs (sRNAs) include several classes of noncoding RNAs: miRNAs, short interfering RNAs, small nucleolar RNAs and small nuclear RNAs. These sRNAs do not encode proteins, but control both the synthesis of proteins and, by binding to them, the activity of specific proteins. In prokaryotes, sRNA-coding sequences are normally found in intergenic regions and vary widely in size, from 50 to 500 bp [18,20]. Bacterial sRNAs are differentially expressed in vitro or in vivo, and because they have been widely found in affected host tissues, where they may mediate the course of disease, they are likely to be useful biomarkers for diagnostics [21]. The use of chromatin immunoprecipitation (ChIP)

Figure 4.2. Flowchart of the pipeline for detection of transcripts of microorganisms generated from RNA-Seq libraries.

Raw reads of the RNA-Seq library

Reads