Omics for Animal Sciences: Principles and Approaches 9781685079017, 1685079016

English Pages [272] Year 2022

Table of contents :
Contents
Foreword
Preface
Chapter 1
Introduction of Omics Loop
Abstract
Introduction
Omes and Omics
Omes and Omics People
Omics Concept
Omics Studies Types
Genomics
Cognitive Genomics
Comparative Genomics
Functional Genomics
Metagenomics
Neurogenomics
Pangenomics
Personal Genomics
Epigenomics
Nucleomics
Lipidome
Lipidomics
Proteome
Proteomics
Immunoproteomics
Nutriproteomics
Proteogenomics
Structural Genomics
Glycomics
Foodomics
Transcriptomics
Metabolism
Nutrition, Pharmacology and Toxicology
Integration of Omics
Conclusion
References
Chapter 2
Animal Genomics
Abstract
Introduction
Structural Genomics
Functional Genomics
Subdivision of Genetics
Animals as a Genetic Model
Structure of Genome
Types of Genomes
Mitochondrial Genomes: Ancestral and Derived
Discovery of mtDNA in Donkey
Composition of DNA
Stability of Genome
Chargaff’s Rule
Form of DNA
Characteristics of Genome
Genome Project
Gene Desert
Advances in Buffalo Genome
Buffalo Genome and 90K SNP Chip
The Bos taurus and Domestic Cow Whole-Genome Assembly
Recombinant DNA Technology in Domestic Animals
Characteristics of the Mammalian Mitochondrial Genome That Are Conserved throughout the Bovine Mitochondrial DNA Sequence
Bison bison and Bison-Cattle Hybrid Mitochondrial DNA Sequence Analysis: Function and Phylogeny
Conclusion
References
Chapter 3
Transcriptomics
Abstract
Introduction
Transcriptomic Techniques
Expressed Sequence Tags (ESTs)
Serial and Cap Analysis of Gene Expression (SAGE/CAGE)
Microarrays
RNA-Seq
Validation
Applications
Transcriptome Databases
Transcriptomics and the Mediterranean Diet
Transcriptomics and IVF in Bovine
Potential Biomarkers in Cows by Granulosa Cell Transcriptomics
Transcriptional Role for Biosynthesis of Milk
Liver Transcriptomic in Beef Cattle
Comparative Transcriptomics
Transcriptome Studies of Ovarian Granulosa Cells in Buffalo
A Beneficial Effect on Hoof Transcriptomics in Cow
Whole Blood Transcriptomics in Pigs
Transcriptomics and Ruminant Methanogenesis
A Transcriptomics Approach in Arabidopsis thaliana
Conclusion
References
Chapter 4
Animal Proteomics: An Overview
Abstract
Introduction
Post-Translational Modifications
Phosphorylation
Ubiquitination
Further Modifications
Proteomics Databases
Detection of Proteins
Using Antibodies for Detection of Proteins
Antibody-Free Detection
Methods for Detection
Separation Methods
Hybridization of the Technologies
Modern Research Methodologies
High-Throughput Technologies for Proteomics
Mass Spectrometry and Protein Profiling
Protein Chips
Reverse-Phased Protein Microarrays
Practical Uses of Proteomics
Interaction Proteomics and Protein Networks
Expression Proteomics
Biomarkers
Proteogenomics
Structural Proteomics
Proteomics Method in Quantification of Dairy Products
Proteomics of Dairy Cow Fatty Liver Metabolism
Milk Authenticity by Ion-Trap Proteomics
Comparative Proteomics Analysis of Laminitis in Chinese Holstein Cows
Proteome Dynamics of Autoimmune Uveitis in Horse
Proteomics Analysis of Frozen Horse Mackerel
Proteomic Characterization of Bovine Mammary Gland
Bovine Mastitis and Proteomics
Conclusion
References
Chapter 5
Metabolomics
Abstract
Introduction
Metabolomics Related Definitions
Metabolome
Metabolome Mapping
Metabolomics
Metabolic Profiling
Metabolic/Metabolomics Fingerprinting
Metabolic/Metabolite Target Profiling
Untargeted Metabolic Analysis
Metabonomics
Metabolomics Technologies
Classification of Metabolites
Different Analytical Techniques
Metabolomics in Different Areas of Research
Metabolomics and Male Fertility
Metabolomics and Livestock Genomics
Biomarker Discovery in Animals and Metabolomics
Canine Hepatology and Metabolomics
Use of Anabolic Steroids in Cattle and Metabolomics
Genetically Modified Crops and Metabolomics
Asthenozoospermia and Metabolomics Fingerprinting
Nuclear Magnetic Resonance Metabonomics and Somatic Cell Count (SCC) in Cattle Milk
Milk Metabolite Profiles and Milk Traits of Holstein Cows
Grain Diets Effect on Rumen Health and Metabolomics
Genetic Modulation of Mammalian Growth and Metabolomics
The Human Metabolome Database (HMDB)
The Bovine Ruminal Fluid Metabolome
Metabolomics: Beyond Biomarkers and towards Mechanisms
Future Perspectives of Metabolomics
Conclusion
References
Chapter 6
Molecular Markers
Abstract
Introduction
Single Nucleotide Polymorphism (SNP)
Types of SNPs
Analysis of SNPs
Databases and Programs for SNPs
Importance and Applications of Single Nucleotide Polymorphisms
RFLP (Restriction Fragment Length Polymorphism)
Principle
Steps for Analysis
Applications of RFLP
Negative Aspects of RFLP
Alternatives of RFLP
Cleaved Amplified Polymorphic Sequence
Analysis Steps
Applications of CAPS
Negative Aspects of CAPS
Terminal Restriction Fragment Length Polymorphism
Analysis Steps
Advantages
Disadvantage
Amplified Fragment Length Polymorphism (AFLP)
Principle
Method
Applications
Advantages
Disadvantages
Variable Number Tandem Repeats (VNTR)
Procedure
Applications
Disadvantages
Short Tandem Repeats (STRs)
Classification of STRs
Mutations in STRs
13 CODIS (Combined DNA Index System)
Advantages of CODIS Short Tandem Repeats
Method Used for Analysis
Applications and Advantages of STRs
Disadvantages
Random Amplified Polymorphic DNA (RAPD)
Procedure
Applications and Advantages
Disadvantages
Restriction Site Associated DNA (RAD) Marker
History of RAD Markers
Restriction Site Associated DNA (RAD) Marker Genotyping
Restriction Site Associated DNA (RAD) Markers
RAD Mapping Principles
Steps Involved in RAD Mapping
Isolation of RAD Tags
Modified RAD Tag Isolation Procedure
RAD Markers Typing and Identification
RAD Markers Identification and Typing on Microarrays
Identification and Typing of RAD Markers through RAD Tag Sequencing (RAD-Seq)
Conclusion
References
Chapter 7
QTL Analysis: Ancient and Modern Perspectives
Abstract
Introduction
Quantitative Trait Locus (QTL)
Quantitative Trait
Multifactorial Traits
QTL Detection
QTL Mapping
Simple Marker Analysis
Analysis of Variance
Interval Mapping
Composite Interval Mapping
Family-Pedigree Based Mapping
Linkage Analysis
Parametric Linkage Analysis
Major Types of Genetic Markers
Morphological/Classical/Visible Marker
Biochemical Markers
Molecular Markers
Types of Molecular Markers
Linkage Map
Procedure of Constructing a Linkage Map
Future of QTL Mapping
Genetic Improvement Programs for Dairy Cattle and Other Livestock
Infrastructure of Data Collection: Milk Recording
Pedigree Information
Health, Fertility, Calving Ability, and Longevity Data
Progeny Testing
Estimating Breeding Values
Pre-Adjustment for Established Effects of the Environment
Contemporary Groups
Genetic Evaluations of Animal Models
Four Paths of Selection
Increased Productivity through Selection
Milk Production
Maintenance Costs and Milk Production Efficiency
Functional Traits Selection
Calving Performance
Fertility in Males and Females
Sire Selection
Design for Livestock QTL Detection and Implications for MAS
GWAS and QTL
Meta-Analysis Methodology
Genomic Selection
Conclusion
References
Chapter 8
Epigenetics and Its Applications
Abstract
Introduction
Rise of Epigenetics
Epigenetic Mechanisms
DNA Methylation
DNA Methylation and Gene Expression
Histone Modifications
Chromatin Variation
Noncoding RNA
Inheritance and Epigenetics
X-Chromosome Inactivation
Genomic Imprinting
Environment and Inheritance
Inheritance of Phenotypic Variation in Livestock
Applications of Epigenetics
Epigenetics and Livestock
Epigenetic Changes and Cloning of Domestic Animals
Epigenetics and Nutrition
Epigenetics and Therapeutics
Conclusion
References
Chapter 9
Nutrigenomics
Abstract
Introduction
Nutrigenetics
Nutrigenomics Concept
Modern Biotechnology Tools Related to Nutritional Genomics
Nutrigenomics as a Holistic Approach
Interaction between Genes and Nutrients
Gene Expression Profiling
Approaches of Nutrigenomics
Nutrigenomics Is Complicated
Application of Nutrigenomics for Human Nutrition and Health
Application of Nutrigenomics in Animal Sector
Application of Nutrigenomics in Ruminants
Bovine Genome Characteristics
Ruminant Nutritional Genomics
Applying Nutrigenomics for Better Nutrition and Health
Application of Nutrigenomics in Ruminants for Higher Fat Quantity in Milk
Application of Nutrigenomics in Ruminant Reproduction and Fertility
The Technologies
Single Nucleotide Polymorphisms
Biomarkers
Transcriptomics
Proteomics
Metabolomics
Data Integration and the Omics Workflow
Gene Diet Disease Interaction
Nutrigenetic Diseases
Nutrigenomics and Diet Supplementation
Regulatory, Ethical and Social Implications of Nutrigenomics
Opportunities and Challenges
Conclusion
References
Chapter 10
Next-Generation Sequencing: Advantages, Disadvantages, and the Future
Abstract
Introduction
First Generation Sequencing
Second Generation Sequencing
Third Generation DNA Sequencing
Approaches for Next Generation Sequencing
Pyrosequencing
Pyrosequencing Principle
Benefits of Pyrosequencing
Reversible Terminator Sequencing Technology’s Working Mechanism
Chemical Aspects of Next Generation Sequencing
General Workflow of NGS
DNA Sequencing
Fragmentation
Size Selection
Polymerase Chain Reaction (PCR)
Bioinformatics in Next-Generation Sequencing
Next Generation Sequencing Data Analysis
Bioinformatics for Next Generation Sequencing Data
Alignment
De-Novo Assembly
SNP/Indel Detection
Alignment/Assembly Viewers
NGS Technologies Overview
Roche/454
Illumina Genome Analyzer
AB SOLiD
HeliScope
Reversible Termination Sequencing Technology
Next Generation Sequencing Data Analysis
Conclusion
References
Chapter 11
Next-Generation Sequencing Technologies for Livestock Improvement
Abstract
Introduction
Exome and Targeted Sequencing
Whole-Genome Sequencing
Transcriptome Sequencing (RNA-seq)
Large-Scale Genome Sequencing Projects
Sequence Census Applications
Discovery of Novel RNAs and Profiling Small Non-Coding RNA
Annotation of Protein-Coding Genes Based on Transcriptome Sequencing Data
Detection of Aberrant Transcription Events
Forensic Application Prospects of NGS Technology
Short Tandem Repeat (STR) Analysis
Mitochondrial Genome Analysis
Y Chromosome Analysis
Forensic Microbiological Analysis
Plant and Animal DNA Analysis
Ancestry Research and Phenotypic Inferences
Epigenetic Analysis
MicroRNA Analysis
Future Perspectives
Conclusion
References
About the Authors
Index
Biochemistry and Molecular Biology in the Post Genomic Era

No part of this digital document may be reproduced, stored in a retrieval system or transmitted in any form or by any means. The publisher has taken reasonable care in the preparation of this digital document, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained herein. This digital document is sold with the clear understanding that the publisher is not engaged in rendering legal, medical or any other professional services.

The Future of Metabolic Engineering
Abhishek Sharma, PhD and Dhruti Amin, PhD (Editors)
2022. ISBN: 978-1-68507-362-6 (Hardcover)
2022. ISBN: 978-1-68507-464-7 (eBook)

Recent Advances in Computer Aided Drug Designing
Akhil Varshney, PhD and Ashutosh Mani, PhD (Editors)
2021. ISBN: 978-1-53619-739-6 (Hardcover)
2021. ISBN: 978-1-53619-904-8 (eBook)

A Closer Look at Proteolysis
Jelena Radosavljević (Editor)
2020. ISBN: 978-1-53618-677-2 (Hardcover)
2020. ISBN: 978-1-53618-743-4 (eBook)

Caspase-3: Structure, Functions and Interactions
Lunawati L. Bennett, PhD (Editor)
2020. ISBN: 978-1-53618-610-9 (Softcover)
2020. ISBN: 978-1-53618-686-4 (eBook)

HSP70s: Discovery, Structure and Functions
Rajib Deb, PhD (Editor)
2020. ISBN: 978-1-53618-179-1 (Softcover)
2020. ISBN: 978-1-53618-208-8 (eBook)

More information about this series can be found at https://novapublishers.com/shop/hsp70s-discovery-structure-and-functions/

Asif Nadeem and Maryam Javed

Omics for Animal Sciences Principles and Approaches

Copyright © 2022 by Nova Science Publishers, Inc. https://doi.org/10.52305/PCIP7813 All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying, recording or otherwise without the written permission of the Publisher. We have partnered with Copyright Clearance Center to make it easy for you to obtain permissions to reuse content from this publication. Simply navigate to this publication’s page on Nova’s website and locate the “Get Permission” button below the title description. This button is linked directly to the title’s permission page on copyright.com. Alternatively, you can visit copyright.com and search by title, ISBN, or ISSN. For further questions about using the service on copyright.com, please contact: Copyright Clearance Center Phone: +1-(978) 750-8400 Fax: +1-(978) 750-4470 E-mail: [email protected].

NOTICE TO THE READER The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained in this book. The Publisher shall not be liable for any special, consequential, or exemplary damages resulting, in whole or in part, from the readers’ use of, or reliance upon, this material. Any parts of this book based on government reports are so indicated and copyright is claimed for those parts to the extent applicable to compilations of such works. Independent verification should be sought for any data, advice or recommendations contained in this book. In addition, no responsibility is assumed by the Publisher for any injury and/or damage to persons or property arising from any methods, products, instructions, ideas or otherwise contained in this publication. This publication is designed to provide accurate and authoritative information with regard to the subject matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in rendering legal or any other professional services. If legal or any other expert assistance is required, the services of a competent person should be sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS. Additional color graphics may be available in the e-book version of this book.

Library of Congress Cataloging-in-Publication Data ISBN: 978-1-68507-901-7 (eBook)

Published by Nova Science Publishers, Inc., New York

Dedicated to

Rosalind Elsie Franklin whose contributions to the discovery of the structure of DNA were largely recognized posthumously

Foreword

Improvements in biotechnology have allowed the scientific community to consider the study of DNA in its entirety, i.e., the genome, rather than being restricted to a handful of genes. The first map of the human genome resulting from the Human Genome Project was released in the mid 1990s. Since then, and with subsequent improvements, this information has allowed biomedical scientists to ‘deep dive’ into understanding the molecular machinery of human life and to explore the genetic diversity within the human population, including its pre-history. Importantly, this information has been used to diagnose disease and develop targeted therapies, such as for the treatment of cancer. The genomes of many other species have since been sequenced, across a range of animals, plants, microbes and viruses.

However, there have been astounding biotechnology developments across many other areas of molecular biology. For example, transcriptomics deals with the expression levels of all genes in an organism: often the focus is on comparing gene expression levels across different tissues or different life stages. Proteomics is concerned with the study of the entire set of proteins expressed by an organism, i.e., its proteome. Similarly, the entire set of metabolites, originating from the proteome, i.e., the metabolome, is now being studied and utilised. Collectively, these four areas, genomics, transcriptomics, proteomics and metabolomics, are known as ‘omics’. With each of these ‘omics’ technologies, the trend is for (1) greater capacity, and (2) lower cost, allowing routine use to be viable in many situations. Accompanying this has been the development of data-handling procedures, i.e., bioinformatics, facilitated by improvements in computing capacity and processing speed.

An important development in omics is the concept of ‘integrative omics’, where these four areas are synthesised to provide a very detailed understanding of the functional biology of an organism. As with any emerging discipline, it takes some time before developments are described in reference works and textbooks, making them accessible to a wider readership. While a number of books have been published over the past five years or so, primarily with applications in biomedical, clinical, microbiological and plant agriculture fields, there is a gap in books that deal with applications in animal science. The current book, Omics for Animal Sciences, addresses this gap. Dr Asif Nadeem and Dr Maryam Javed have done a very thorough job of introducing the range of omics topics, including the four areas listed above as well as other related topics. Examples originating mostly from animal research are used to highlight concepts, and sections are dedicated specifically to animal genomics and its application to livestock improvement. The book is thoughtfully constructed and well written, and I believe it will be a useful resource both as a reference text and for teaching courses on the range of omics techniques, particularly for those with an interest in animal or veterinary science.

Peter Thomson
Honorary Associate Professor
The University of Sydney, Australia

Preface

“Omics” is the field of research that analyzes and integrates studies of many different “omes,” including the genome, proteome, and metabolome, using bioinformatics and computational biology. Omics aims at the collective characterization and quantification of pools of biological molecules that translate into the structure, function, and dynamics of an organism or organisms. These studies enable biological engineers to design innovative applications in different fields, including veterinary medicine, livestock production and wildlife preservation. Many “omes” beyond the original “genome” have become useful and have been widely adopted by research scientists. “Proteomics” has become well established as a term for studying proteins at a large scale. “Omes” can provide an easy shorthand to encapsulate a field; for example, an interactomics study is clearly recognizable as relating to large-scale analyses of gene-gene, protein-protein, or protein-ligand interactions. Tools developed by such studies are invaluable for identifying, measuring and reproducing valuable genetic traits in domestic animals and livestock. Selection for productive traits has led to remarkable gains in most livestock species.

In an agricultural country like Pakistan, a good yield of biological produce is of vital importance, as a large portion of the national income depends on it. An intense focus needs to be maintained on research that can help combat the problems faced in our agrarian system and on the creation of facilities to strengthen it. This book on animal omics is expected to bring important issues into focus so that technologies can be devised and suggestions made to improve the system. It covers the principles underlying animal genomic technologies for the selection of superior animals through whole genome sequencing, transcriptome analysis, identification of selection signatures for traits of economic significance, and marker-assisted selection. It also provides ample knowledge of the application of genomic information to develop genetically modified organisms by recombinant DNA technology and genome editing. It will help introduce the information gained by recent research to active professionals working in the field, and in turn these professionals can help guide future research towards projects that matter more for the country by virtue of their application.

Chapter 1

Introduction of Omics Loop

Abstract

Research in the biological sciences has been transformed by advances in high-throughput technologies. Though these advances have increased our understanding of various biological phenomena, they have also revealed the complex nature of the underlying mechanisms. New strategies and tools are therefore required to investigate and comprehend such tremendous biological complexity. These new approaches are embodied by ‘omics,’ which is defined as the comprehensive study of related sets of biological molecules and includes genomics, transcriptomics, proteomics, and metabolomics. Many bioinformatics-based methods are available to investigate and exploit the large quantity of data created by analyzing the many omes and their interactions, with the goal of uncovering the molecular interactome underpinning any given physical condition of living organisms. As sample analysis costs and processing times continue to decrease, the number of omics datasets continues to increase, giving birth to further disciplines such as foodomics, glycomics, lipidomics, and pharmacogenomics. However, as the number of disciplines in the omics field increases, so does the need to integrate these areas of research. In this chapter, we provide in-depth information on the development of omics, existing tools, and the various subdisciplines of omics research. It is intended as a comprehensive guide for researchers interested in omics research as well as those interested in integrating its various aspects into their own research areas.

Keywords: omics, bioinformatics, molecular interactome, existing tools, sub-disciplines

Introduction

We have seen a dramatic shift in the way biomedical research is conducted over the last ten years. We can now also examine the influence of the Human Genome Project on the discovery and development of different therapeutics (Bilello, 2005). Now that a preliminary draft of the human genome has been completed, more and more researchers are interested in examining the interactions of genes and proteins that generate various other proteins. The complete sequence of the human genome shows that sequence variants are quite prevalent. A large number of genetic changes (called polymorphisms) have already been shown to be linked with human diseases and with complications from exposure to toxicants or drugs, implying that the genome influences sensitivity to pharmaceuticals and environmental factors, vulnerability to disease, and response to therapeutics. The implementation of “-omic” methods may result in a paradigm shift and open new horizons in healthcare.

Omes and Omics

Research areas in the biological sciences such as genomics, proteomics, and interactomics have one thing in common: the suffix -omics. The related suffix -ome denotes the objects of study in these domains, such as the genome, proteome, and interactome. However, the suffix -ome is sometimes misinterpreted as pointing to a totality of some kind, as well as to intricate networks inside the omes. Jong Bhak openfree provided the ome versus omics graph, which indicates that the cost associated with every multi-omics sample will continue to rise until the technology reaches a level where a large number of studies (big data analytics) have become automated and the numbers of ome and omics studies intersect at a balance point.

Figure 1. Ome versus Omics graph. Image is adapted from http://omics.org/.

Omes and Omics People

The term “ome” was first used by molecular biologists and bioinformaticists in a broad sense. Bioinformaticists at Cambridge, UK, home to several early bioinformatics and omics-related labs such as the Cavendish lab, the EBI, the MRC Centre, the Sanger Centre, and the genetics and biochemistry departments, were its early major proponents. The earliest genome and proteome research initiatives, for example, were conducted at the MRC Centre. As a result, the University of Cambridge had some of the earliest bioinformaticists who specialized in genomic sequences, structures, and the databases related to them. However, it is the EBI that is known for having the first bioinformaticists. The term textome, for example, was coined in Christos Ouzounis’ lab at the EBI. During the mid-1990s, several academics did not take the omes and omics trend seriously and joked about the usage of such terms and technologies. On the other hand, many scientists, especially younger ones, realized the potential of these techniques and developed and organized many conceptual frameworks of the omes and omics. Jong Bhak, a researcher at the University of Cambridge, was one of the earliest scientists to adopt the omes and omics terminology, as well as the biofying trend. On the other side of the Western world, the Church lab at Harvard Medical School advocated for the conceptualization of omes and omics in the United States. Mark Gerstein (who obtained his Ph.D. at the MRC Centre in the University of Cambridge, UK) was also involved in this movement at Yale.

One trend emerges from the historical record: as researchers became more interested in combining biology and informatics, they began to employ omics. For biologists, the term -omics effectively communicated a fundamental concept, the complex systems approach, which is strongly linked to the study of networks, emergent properties, and encapsulating principles in conceptual computer science. Stuart Kauffman’s ideas were picked up by biologists who also understood informatics. In the late 1990s and the early years of the 21st century, computer scientists and physicists published papers that opened a debate on the applicability of scale-free network features to biological systems. These academic discourses helped pave the way for improvements in omics technology, which led to the wider use of omics as a means of describing complex networks of heterogeneous objects.

Omics Concept

Recent innovations that allow for the simultaneous screening of hundreds or even thousands of micro- and macromolecules promise to enable the continuous surveillance of numerous (or even all) major biological pathways. These modern “global” ways of detecting and quantifying families of cellular molecules, such as RNA, intermediary metabolites, and proteins, are dubbed “omic” technologies because they can characterize all, or nearly all, members of a molecule family in a single examination (Figure 2). Using these new methods, researchers can now perform thorough analyses of the functional activities of biochemical pathways, as well as of gene sequence variations between individuals and between species, which was not possible before.

Figure 2. The examination of the genome, transcriptome, proteome, and metabolome in relation to the Central Dogma and the interacting “omes.” Image adapted from Debnath et al. (2010).

Fields of biology whose names end in the suffix -omics, such as genomics, metagenomics, transcriptomics, proteomics, and metabolomics, are collectively called omics. The goal of omics is to characterize and quantify pools of biomolecules that translate into the structure, function, and dynamics of an organism or organisms. The related suffix -ome, on the other hand, refers to the objects of study in these fields, such as the genome, transcriptome, and proteome. The suffix -ome is a neologism, or more appropriately a “neo-suffix,” formed by abstraction from various Greek terms ending in -ωμα, a sequence that does not form a recognizable suffix in Greek. The goal of functional genomics is to determine the roles of as many genes in an organism as possible. It combines various -omics approaches, such as transcriptomics and proteomics, with saturated mutant libraries.

According to the Oxford English Dictionary (OED), the -ome suffix has three main areas of use:

1. Creating nouns with the sense “tumour, swelling” in medicine
2. Creating nouns in botany and zoology in the sense of “a portion of a plant or an animal with a certain construction”
3. Creating nouns in disciplines such as cellular and molecular biology with the sense of all constituents considered collectively

The -ome suffix began as a variant of -oma and became popular in the last part of the nineteenth century. Sclerome and rhizome were among the first terms to use it. All of these words come from Greek words ending in -ωμα, a sequence that can be analyzed as -ω-μα, with the ω belonging to the word stem (typically a verb) and the μα being a genuine Greek suffix that forms abstract nouns. According to the OED, the third sense arose as a back-formation from “mitome”; early attestations include biome (1916) and genome (first coined as German Genom in 1920). The etymological association of -ome with the molecular biology word chromosome is spurious. The word chromosome is built from the Greek stems χρωμ(ατ)-, meaning “color,” and σωμ(ατ)-, meaning “body.” While the word σωμα (“body”) does contain the -μα suffix, the preceding -ω- is not a stem-forming suffix but part of the word’s root. Because the genome is the whole genetic composition of an organism, the newly formed suffix -ome came to suggest “wholeness” or “completeness.”

Omics Studies Types

Genomics

Genomics is the study of the genomes of organisms. It encompasses a wide range of subfields.

Asif Nadeem and Maryam Javed

Cognitive Genomics
Cognitive genomics studies variations in cognitive processes that are linked to genetic profiles.

Comparative Genomics
Comparative genomics studies the link between genome structure and function across different living organisms.

Functional Genomics
Functional genomics studies the functions and interactions of genes and proteins, often with the help of transcriptomics.

Metagenomics
Metagenomics investigates metagenomes, that is, the whole genetic material collected directly from target samples.

Neurogenomics
Neurogenomics studies the influence of an organism’s genetics on the development and function of its nervous system.

Pangenomics
Pangenomics is the study of the complete set of genes or genomes found within a species.

Personal Genomics
Personal genomics is concerned with sequencing and then analyzing the genome of an individual organism. Once an individual’s genotype is known, it can be compared with genotypes published in the literature to estimate the likelihood of differences in trait expression and of developing a certain disease. This supports the practice of personalized medicine, in which therapeutic strategies are selected according to each person’s genetic profile.

Despite tremendous scientific progress, only a few genomics-based diagnostics or medicines have made it all the way to consumers. Genomic studies have shown that the gene expression patterns induced by a single drug can differ drastically from those induced by the same drug used in combination.

The growing body of genomic and molecular data is the basis for understanding higher-order biological systems, such as cells and organisms, and their interactions with their environments, as well as for medical, industrial, and other applications (Kanehisa et al., 2006). Combining advanced genome sequencing technologies with an understanding of gene sequences and their conservation has led to the discovery of genetic markers that make it easier to find genetic variations and their outcomes in biological systems. Moreover, “-omic” technologies enable efficient characterization of biochemical pathways, yielding new markers that help determine susceptibility to disease and its prognosis. With these technologies, gene sequences, their transcripts, their protein products, and intermediate metabolites can be analyzed simultaneously, so that multiple physiological pathways can be monitored at once with little effort. As a result, critical biomarkers and signalling molecules linked to cell metabolism, proliferation, and death have been identified. Such biomarkers make it easier to track damage at the molecular and cellular level, along with damage-response mechanisms (Macgregor, 2004).

Over the last two decades, advances in genomic technologies have made it possible to generate massive amounts of data from biological sources. These data include gene sequences, information on their expression, protein structures, and the roles of proteins in metabolic pathways. Thanks to automation, huge volumes of data in a variety of formats can now be created and stored in computer systems automatically.

Aside from genomic information, biotechnology and pharmaceutical companies also hold a wealth of “legacy data”: information inherited from a variety of sources about the chemical properties of compounds and the related toxicological and clinical findings. Most of these data are kept in older databases, each created for a specific sort of data, so integrating new genomic data with these database systems in a way that facilitates decision-making is a major challenge.

Epigenomics

The epigenome is the supporting structure of the genome. It comprises alternative DNA structures, RNA and protein binders, and chemical changes to DNA. Modern technologies for investigating epigenomes include studies of chromosomal conformation with Hi-C, ChIP-seq and many other sequencing techniques used in combination with proteomic fractionation, and sequencing techniques that detect chemical changes to cytosines, such as bisulfite sequencing.

Nucleomics

Nucleomics is the study of the nucleome, the entire set of genomic components that make up “the cell nucleus as a complex, dynamic biological system.” In 2017, the 4D Nucleome Consortium became a member of the International Human Epigenome Consortium (IHEC).

In approach and overall objectives, epigenomics has much in common with other genomics disciplines. Epigenomics aims to discover and characterize epigenetic alterations on a global scale, analogous to how genomics studies the entire collection of DNA and proteomics studies the whole set of proteins present in a cell. The rationale for global epigenetic analysis is that it allows inferences about epigenetic alterations that would be impossible to make by examining particular loci (Barski et al., 2007). Bioinformatics, which integrates biology, computer science, and mathematics, is used extensively in epigenomics, as it is in other genomics studies (Petter, 2010). Although many epigenetic alterations have been identified and examined for decades, global assessments have become possible only because of advances in bioinformatics methods. Many current approaches rely on older methodologies adapted to genome-wide experiments.

Lipidome

The lipidome is the complete complement of cellular lipids in an organism or system, including any alterations made to a specific group of lipids.

Lipidomics

Lipidomics is the large-scale study of lipid pathways and networks. It makes extensive use of mass spectrometry methods. In lipidomics, the molecular species of lipids are identified and quantified, and their interactions with other lipids, proteins, and other substances in the cell are analyzed. Lipidomics researchers study the structures, roles, interactions, and dynamics of cellular lipids, including the changes that take place when the system is perturbed.

Han and Gross (2003) were the first to define lipidomics, combining the chemical properties of lipid molecular species with a thorough mass spectrometric technique. Although lipidomics is part of the larger subject of “metabolomics,” it is treated as a separate field of research because of the distinctiveness and functional specialization of lipids in comparison with other metabolites. Lipidomic research accumulates a tremendous amount of data that quantitatively describe the temporal and spatial changes in the composition and content of distinct lipid molecular species after a cell is perturbed by changes in its physiological or pathological state. The insights gleaned from these investigations aid the mechanistic understanding of alterations in cellular function. Lipidomic studies are therefore critical for identifying changes in lipid metabolism, transport, and homeostasis, and for elucidating the molecular mechanisms underlying lipid-related disease states. The European Lipidomics Initiative (ELIfe) and the LIPID Metabolites and Pathways Strategy (LIPID MAPS) Consortium are good examples of how lipid research is becoming more popular.

Proteome

The proteome is the total protein complement of an organism or system, including any alterations made to a specific collection of proteins.

Proteomics

Proteomics is the large-scale study of proteins, concerned primarily with their structures and functional properties. It makes extensive use of mass spectrometry tools.

Immunoproteomics

Immunoproteomics is the application of proteomics to the immune system: it studies large groups of proteins associated with immune function.

Nutriproteomics

Nutriproteomics uses proteomics to study the nutritional and non-nutritional components of the diet. Data from proteomic mass spectrometry are used for research on protein expression.

Proteogenomics

Proteogenomics is a new research area of the biological sciences that combines genomics and proteomics and uses proteomics data for gene annotation.

Structural Genomics

Structural genomics studies the three-dimensional structure of each protein encoded by a genome by combining experimental data with mathematical methods.

Proteomics is the large-scale analysis of gene products, or proteins, using modern protein sequencing technologies. Protein expression patterns, modifications, and networks are studied in relation to cellular function and to biological processes such as body growth, health, and the development of disease states (Macaulay et al., 2005). Thanks to the availability of the human genome map, proteomics has quickly evolved into a fascinating new area of study, one that supplements rather than replaces genomics. The genome contains all the genes of an organism, which carry the instructions for producing the proteins required for its various functions. Proteomics, in short, studies proteins as a whole, including the biological activities and processes in which they take part. It has made research into protein structural conformation and protein-protein interactions possible and has become widely used in the biomedical and pharmaceutical sciences. Because proteins play such a vital role in an organism’s life, proteomics is critical for identifying biomarkers that signify disease.

The Human Genome Project revealed that the human genome contains far fewer protein-coding genes than there are proteins in the human proteome (20,000–25,000 genes vs. around 1,000,000 proteins). By some estimates, more than 2 million proteins are present in the human body, each with its own set of functions (http://en.wikipedia.org/wiki/Proteomics). Alternative splicing and posttranslational protein modifications are believed to be responsible for this protein variety. The disparity means that gene expression studies alone cannot properly characterize protein variety, which is why proteomics is important for identifying normal and pathological biological processes.

Cataloguing all human proteins, their activities, and their interactions is a huge challenge, and the Human Proteome Organization (HUPO) is leading an international effort to achieve these aims. Most proteins work in tandem with other proteins, and one of the goals of proteomics is to determine how they interact. This frequently provides crucial information on the functional roles of newly identified proteins.
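The size of the gap between gene and protein counts can be made concrete with a little arithmetic. The following is a minimal Python sketch that uses only the figures quoted in the text (20,000 to 25,000 protein-coding genes and around 1,000,000 proteins). The function and constant names are illustrative assumptions, and the result is a rough average rather than a measured quantity.

```python
def proteins_per_gene(protein_count, gene_count):
    """Average number of distinct proteins per protein-coding gene."""
    if gene_count <= 0:
        raise ValueError("gene_count must be positive")
    return protein_count / gene_count


# Figures quoted in the text; both are estimates, not measurements.
PROTEIN_ESTIMATE = 1_000_000
GENE_ESTIMATES = (20_000, 25_000)

ratios = [proteins_per_gene(PROTEIN_ESTIMATE, genes) for genes in GENE_ESTIMATES]
print(ratios)  # [50.0, 40.0]
```

Under these estimates each gene would correspond, on average, to roughly 40 to 50 protein forms, which illustrates why gene-level measurements alone cannot capture protein variety.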

Glycomics

Glycomics is the study of the glycome, which includes sugars and carbohydrates. The name combines the biochemical prefix “glyco-,” which means “sweetness” or “sugar,” with the omics nomenclature established by genomics (which focuses on genes) and proteomics (which is concerned with proteins). To appreciate the importance of glycomics, one should first appreciate the many and varied functions of glycans. The following are a few examples:

• Glycolipids and glycoproteins on the cell surface play an important role in the detection of bacterial and viral pathogens.
• They take part in cellular signaling pathways and influence cellular functions.
• They play an important role in the modulation of innate immunity.
• They are important in determining the development of cancers.
• They direct the ultimate fate of cells, suppress proliferation, control circulation, and prevent invasion of cells.
• They affect the stability and folding of proteins.
• They influence glycoprotein pathways and reaction outcomes.
• There are numerous glycan-specific illnesses, many of which are inherited.

Various elements of glycomics have crucial medical applications:

• In hematopoietic stem cell transplantation, lectins are used to fractionate cells in order to prevent graft-versus-host disease.
• In cancer therapeutics, glycomics contributes to the stimulation and proliferation of cytolytic CD8 T cells.

Glycans perform a variety of roles in bacterial physiology, so glycomics is especially significant in microbiological research (Twine and Logan, 2012). Bacterial glycomics research can potentially contribute to:

• the creation of new drugs
• the discovery of bioactive glycans
• the development of glycoconjugate vaccines

Foodomics

According to a definition proposed in 2009, foodomics is the study of food and nutrition through the use of -omics technologies to improve consumers’ health, well-being, and knowledge. Foodomics became a hot topic after its introduction at the first international conference in Cesena, Italy, in 2009. Many professionals in omics and nutrition were invited to the conference to explore new approaches and possibilities in food science and technology. Research and development in foodomics is currently limited by the need for high-throughput analytical procedures. Foodomics gives scientists working in food and nutritional sciences greater access to data that can be used, among other things, to assess the impact of food on public health. It is regarded as a further step towards a better grasp of technological development in the food industry. Foodomics research also contributes to other omics sub-areas, such as nutrigenomics, which combines nutrition, genetics, and omics research.

Transcriptomics

The transcriptome is the collection of all the RNA molecules generated in one or more cells, including mRNAs, rRNAs, tRNAs, and various non-coding RNAs. Transcriptomics is the study of transcriptomes, their structures, and their activities. At different stages of development and under different physiological conditions, each cell uses (expresses) a different set of genes. A given tissue typically expresses a characteristic group of genes, which can be used to identify it in the absence of other evidence. The human brain, for instance, expresses around 30% of all known genes, producing a set of transcripts distinct from that of the heart. As a result, molecular markers based on expression patterns can be created and then used to classify normal cells and tissues automatically into the appropriate group.

The same expression analysis technique can be used to study disease processes. For example, neurons in Alzheimer’s disease have different gene expression profiles from normal neurons. Because such a profile is a surrogate marker of the cellular state, this sort of information could serve as a molecular diagnostic criterion when histopathology data are lacking. Transcriptomics expands on the more traditional study of gene expression, in which individual genes are analyzed using Northern blots. Rather

than looking at a single gene at a time, this modern technique looks at the complete transcriptome, that is, the full collection of messenger RNA molecules present in a particular cell population at a particular time. This offers a quick snapshot of all the genes being transcribed, and because the method is somewhat quantitative, it also provides information on the expression level of each gene. For instance, by comparing transcriptional studies of plants subjected to different environmental conditions, the genes participating in the adaptation process can be identified quickly. Every gene and its proteins may be thought of as tools for constructing an organism’s biochemical makeup and, as a result, its physiological uniqueness.

The same approach could be applied to clinical conditions that are less visible or that lack any diagnostic marker (for example, autism). It will also make testing easier in presymptomatic situations, encouraging early treatments that could lead to improved outcomes. Transcriptomics allows illnesses that appear identical on the surface to be divided into subcategories. This has been demonstrated for a range of illnesses, notably cancer, where an expression profile can be used to predict outcomes, survival rates, and medication responses. Microarray technology and other high-throughput techniques are proving essential in the biological sciences because they allow researchers to track a large number of genes, usually thousands, in a single experiment. It has therefore become necessary to use computational tools to compare and assess these expression patterns in a way that allows biological interpretation (Subramanian et al., 2005).
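The idea of assigning a sample to a tissue or disease group from its expression profile can be sketched in a few lines of Python. The example below is a minimal illustration, assuming a nearest-centroid rule (one simple classification approach among many, not a method named in the chapter). The gene count, tissue labels, and expression values are hypothetical and chosen only to show the mechanics.

```python
import math


def centroid(profiles):
    """Mean expression per gene across a list of equal-length profiles."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(sample, reference):
    """Return the label whose centroid lies closest to the sample."""
    centroids = {label: centroid(p) for label, p in reference.items()}
    return min(centroids, key=lambda label: euclidean(sample, centroids[label]))


# Hypothetical expression levels for three genes (arbitrary units).
reference_profiles = {
    "brain": [[9.0, 1.0, 4.0], [8.0, 2.0, 5.0]],
    "heart": [[2.0, 8.0, 4.0], [1.0, 9.0, 3.0]],
}

unknown = [8.5, 1.5, 4.2]
print(classify(unknown, reference_profiles))  # brain
```

Real expression data involve thousands of genes, normalization, and statistical validation, none of which is shown here; the sketch only conveys how an expression pattern can act as a marker for a group.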

Metabolism

• Metabolomics: the scientific study of chemical processes involving metabolites. It is a “systematic examination of the distinct chemical fingerprints left behind by specific biological processes,” that is, the analysis of their small-molecule metabolite profiles.
• Metabonomics: the quantitative analysis of a living system’s dynamic, multiparametric metabolic responses to pathophysiological stimuli or genetic change.

Nutrition, Pharmacology and Toxicology

• Nutritional genomics: the branch of science that studies the relationship between the human genome and nutrition and their impact on health.
  ◦ Nutrigenetics: investigates the impact of genetic differences on the relationship between nutrition and health, with consequences for vulnerable populations.
  ◦ Nutrigenomics: the study of how foods and their ingredients affect the expression of genes. It examines the impact of nutrition on the genome, proteome, and metabolome.
• Pharmacogenomics: studies how the sum of variations in the human genome affects the response to pharmaceuticals.
• Pharmacomicrobiomics: studies how variations in the human microbiome affect pharmaceuticals, and vice versa.
• Toxicogenomics: the branch of science concerned with gathering, interpreting, and archiving data on gene expression and protein function in specific cells or tissues of an organism in response to harmful agents.

Integration of Omics

Creating a novel medicine is a time-consuming and costly process. Genomics, transcriptomics, proteomics, and metabolomics, together with other recently established high-throughput research methods, make it possible for the first time to examine disease mechanisms thoroughly at the molecular level. They significantly increase the number of genes or proteins that can be identified at the same time, and they make it possible to link complex compositions to complex consequences through gene or protein expression patterns. The basic goal of “omic” technology is the untargeted identification of all gene products (such as transcripts, proteins, and metabolites) found in a single biological specimen. A further and more difficult component of omic techniques is the sophisticated analysis of quantitative fluctuations in living organisms. The “-omics” methods make it easier to characterise the physiology of therapeutic targets systematically, lowering attrition rates in discovery studies and increasing the overall productivity of pharmaceutical research. The fast expanding amounts of automatically

generated biological data, together with a shortage of robust database systems and computational tools, are now the main hurdles to fully exploiting these experimental methods (Fischer, 2005).

Work at the “omic” level requires high-throughput processing. Four main categories of high-throughput quantification are widely practiced: genomic single nucleotide polymorphism (SNP) analysis (i.e., large-scale SNP genotyping), transcriptomic quantification (i.e., concurrent quantification of the expression of each gene in a cell or tissue type), proteomic quantification (i.e., determination of every protein present in a cell or tissue type), and metabolomic quantification (i.e., determination and measurement of every metabolite found in a cell or tissue type). Each of these four techniques offers a distinct viewpoint on disease onset and progression, and each provides methods for disease prediction, prevention, and treatment (Venkatesh and Harlow, 2002).

SNP genotyping examines a person’s genotype at hundreds of thousands of single nucleotide variations across the genome. These SNPs are quite prevalent (i.e., 5% or more of the population carries at least one copy of the less common allele), although they are not, strictly speaking, the cause of disease. Instead, SNPs can act in concert with other SNPs and with environmental factors to raise or lower a person’s disease risk. This makes relevant SNPs hard to find, because the share of variation in a trait that any single SNP explains is small in comparison with the overall variance of the trait. Even so, SNPs are possibly among the most valuable markers for forecasting disease risk, because genotypes stay constant throughout life (barring alterations in individual cells).

Transcriptomic studies (also known as gene expression microarrays or, colloquially, “gene chips”) are the earliest and best-established high-throughput technology. Gene expression levels have a stronger impact on traits than SNPs do, so meaningful relationships are easier to find. Transcriptomic measures are less effective for pre-disease prognosis, since a person’s gene expression levels can vary dramatically and are unlikely to be informative before disease onset. They are, however, helpful for predicting disease progression. They are well suited either to early disease detection (identifying people with disease-related levels of gene expression but no other symptoms) or to dividing individuals with a given disease into subgroups (by determining levels of gene expression that are linked to better or worse outcomes, or to higher or lower values of a particular disease trait).
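The 5% prevalence criterion for SNPs described above can be illustrated with a short calculation. The following Python sketch computes, from genotype counts, the minor allele frequency and the fraction of individuals carrying at least one copy of the less common allele, then flags SNPs that meet a carrier threshold. The function names, SNP identifiers, and genotype counts are hypothetical, and the sketch assumes a biallelic SNP whose counts are already ordered with the less common allele last.

```python
def minor_allele_stats(n_major_hom, n_het, n_minor_hom):
    """Return (minor allele frequency, carrier fraction) from genotype counts.

    Assumes a biallelic SNP and that the third count is the homozygote
    for the less common allele.
    """
    n = n_major_hom + n_het + n_minor_hom
    if n == 0:
        raise ValueError("no genotyped individuals")
    maf = (n_het + 2 * n_minor_hom) / (2 * n)  # each person carries 2 alleles
    carriers = (n_het + n_minor_hom) / n       # at least one minor-allele copy
    return maf, carriers


def is_common(counts, threshold=0.05):
    """True if the carrier fraction reaches the threshold (5% by default)."""
    _, carriers = minor_allele_stats(*counts)
    return carriers >= threshold


# Hypothetical genotype counts: (major homozygotes, heterozygotes, minor homozygotes).
snps = {
    "snp_a": (900, 95, 5),
    "snp_b": (990, 10, 0),
}

common = [name for name, counts in snps.items() if is_common(counts)]
print(common)  # ['snp_a']
```

In this toy data, 10% of individuals carry the minor allele of snp_a, while only 1% carry that of snp_b, so only the first would count as prevalent under the criterion in the text.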

Figure 3. Integration of omics technologies. Integrating genomic, proteomic, and other related data into a more comprehensive view of a living organism remains one of the challenges of systems biology. Adapted from Debnath et al. (2010).

Proteomics is comparable in character to transcriptomics. Proteomic measures, like transcriptomic measures, are useful for detecting disease early or for separating individuals into subgroups. Metabolites, like proteins, are quantified with a very quick serial method, and NMR is commonly used to detect and measure them. Although metabolomics is relatively new and less widely employed than the other techniques, the same considerations apply. Metabolite levels, like levels of gene expression and of expressed proteins, are changeable and therefore well suited to early detection of disease or to the delineation of disease subtypes.

The emerging omics techniques appear to be on track to meet their ambitious goals, and when used together they could be tremendously useful in functional gene investigations. DNA microarray methods are commonly used in genomics and transcriptomics research. Proteomics and metabolomics, on the other hand, still lack standardized protocols. Typically, the proteome is assessed using two-dimensional gel electrophoresis and chromatography-mass spectrometry, whereas the metabolome is studied using MALDI-TOF, liquid chromatography-mass spectrometry, liquid chromatography-nuclear magnetic resonance, and gas chromatography-mass spectrometry. Proteomics connects gene expression patterns to protein products, while transcriptomics provides the tool for interpreting them. Metabolomics, which specifies the metabolic network or networks related to gene expression, is recognised as the third critical partner. The use of NMR and

mass spectrometry allows broad screening of the metabolome and of the conversion processes associated with it, going beyond traditional, narrowly targeted metabolic research. Omics will influence not just our knowledge of biological functions but also our ability to detect and treat diseases more accurately (Loughlin, 2007).

Conclusion

With the advent of technologies that made it possible to map and sequence the genomes of various organisms, it has become possible to obtain a great deal of information about the molecular composition of tissues and cells. These technologies give a picture of the underlying biochemistry of a biological system of interest at a level of precision that was previously unattainable. The scientific domains concerned with high-throughput measurement of biological molecules are collectively referred to as “omics.” The massive rise over the last decade in the sequencing and profiling of biological molecules such as DNA, RNA, and proteins, as well as other molecules such as metabolites, continues to reveal the immense range of molecular processes in different organisms that have yet to be discovered. Owing to these advances, the number of available omics approaches is constantly growing; genomics, transcriptomics, proteomics, epigenomics, and metabolomics are the best known. Omics techniques are expected to develop further as validated technologies, research designs, and statistical approaches for data interpretation improve. This will significantly increase the role of omics technologies in different areas of biological research and will help integrate different disciplines into a unified view of the interplay between the processes occurring in living organisms.

References

Barski, Artem, Suresh Cuddapah, Kairong Cui, Tae-Young Roh, Dustin E. Schones, Zhibin Wang, Gang Wei, Iouri Chepelev, and Keji Zhao. “High-resolution profiling of histone methylations in the human genome.” Cell 129, no. 4 (2007): 823-837. https://doi.org/10.1016/j.cell.2007.05.009.

Bilello, John A. “The agony and ecstasy of “OMIC” technologies in drug development.” Current molecular medicine 5, no. 1 (2005): 39-52. https://doi.org/10.2174/1566524053152898.

Debnath, Mousumi, Godavarthi B. K. S. Prasad, and Prakash S. Bisen. Molecular diagnostics: promises and possibilities. Springer Science & Business Media, 2010.

Fischer, Hans Peter. “Towards quantitative biology: integration of biological information to elucidate disease pathways and to guide drug discovery.” Biotechnology annual review 11 (2005): 1-68. https://doi.org/10.1016/S1387-2656(05)11001-1.

Kanehisa, Minoru, Susumu Goto, Masahiro Hattori, Kiyoko F. Aoki-Kinoshita, Masumi Itoh, Shuichi Kawashima, Toshiaki Katayama, Michihiro Araki, and Mika Hirakawa. “From genomics to chemical genomics: new developments in KEGG.” Nucleic acids research 34, no. suppl_1 (2006): D354-D357. https://doi.org/10.1093/nar/gkj102.

Loughlin, Michael F. “Using ‘omic’ technology to target Helicobacter pylori.” Expert opinion on drug discovery 2, no. 8 (2007): 1041-1051. https://doi.org/10.1517/17460441.2.8.1041.

Macaulay, Iain C., Philippa Carr, Arief Gusnanto, Willem H. Ouwehand, Des Fitzgerald, and Nicholas A. Watkins. “Platelet genomics and proteomics in human health and disease.” The Journal of clinical investigation 115, no. 12 (2005): 3370-3377. https://doi.org/10.1172/JCI26885.

Macgregor, James T. “Biomarkers of cancer risk and therapeutic benefit: new technologies, new opportunities, and some challenges.” Toxicologic pathology 32, no. 1_suppl (2004): 99-105. https://doi.org/10.1080/01926230490425067.

Petter, J. “iGenetics a Molecular Approach.” (2010).

Subramanian, Aravind, Pablo Tamayo, Vamsi K. Mootha, Sayan Mukherjee, Benjamin L. Ebert, Michael A. Gillette, Amanda Paulovich et al. “Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.” Proceedings of the National Academy of Sciences 102, no. 43 (2005): 15545-15550. https://doi.org/10.1073/pnas.0506580102.

Twine, S. M., and S. M. Logan. “Bacterial Flagellar Glycosylation Systems: Opportunities and Applications in Bacterial Glycomics, Current Research, Technology and Applications. Eds. Reid, Twine, Reid.” (2012).

Venkatesh, T. V., and Harry B. Harlow. “Integromics: challenges in data integration.” (2002): 1-3. https://doi.org/10.1186/gb-2002-3-8-reports4027.

Chapter 2

Animal Genomics

Abstract

Livestock production is a major source of animal protein and an important source of income for a large portion of the world’s population. Animal breeding revolves mainly around evaluating and selecting for traits such as increased productivity, tolerance to diseases, and longevity. Traditionally, phenotype data were paired with pedigree data to estimate the extent of genetic variation and the potential for improving traits through selection. Economically important farm animals such as cattle and chickens, and laboratory animals such as rats, have served as essential biomedical research subjects. In the future, genomics will accelerate the genetic improvement of farm animals and raise output levels. To maximize animal productivity, it is imperative to understand the genetic structure and expression of animal genomes, as well as how these genetic structures interact with non-genetic components in a biological system. As the genomes of a growing range of animals are unraveled, breeders will likely change how they view traits and the value of farm animals for selection. This presents a unique opportunity to revolutionize animal agriculture and husbandry by making animals healthier and more productive while requiring less investment in production systems.

Keywords: livestock production, animal breeding, phenotypes, genetic improvement, animal genomes

Introduction

The term genomics was first used by Thomas Roderick in 1986. Genomics is a discipline of biology that studies the entire genome, focusing particularly on the architecture, activity, and evolution of each gene and on its relationships with other biological molecules. The discipline also entails the creation and use of more efficient sequencing, mapping, and other

computational biology techniques. Genomicists generate massive volumes of data using large-scale molecular techniques such as genome sequencing, linkage analysis, and physical mapping, and then analyze these data using computers. With advanced software, they can infer the presence and, in some instances, the basic roles of previously unknown genes. These predictions can then be verified using molecular biology methods. Genomics can be divided into two disciplines: structural genomics and functional genomics (Del Giacco and Cattaneo 2012).

Structural Genomics

Structural genomics is the study of the structure of a species’ whole genome, including genome size and the physical position of genes on chromosomes. Its subject is the organization and sequence of the genetic information stored in a genome. Preparing physical and genetic maps of chromosomes is frequently the first step in genome characterization. These maps show where genes, molecular markers, and chromosome fragments are located in relation to one another, which is useful for locating chromosome segments and then aligning sequenced DNA into a complete genome sequence (Goldsmith-Fischman and Honig 2003).

Functional Genomics

Functional genomics deals with the function of all the genes in a genome. It is concerned with both the transcriptome and the proteome: the transcriptome is the complete set of RNAs transcribed from a genome, while the proteome is the complete set of proteins expressed from it. To define the function of a DNA sequence, functional genomics uses both laboratory research and bioinformatics techniques. Lab-based approaches for finding genes and determining their roles include in vitro methods such as experimental mutagenesis, in situ hybridization, and in vivo techniques such as knockouts and transgenic animals. These techniques can be used to analyze individual genes and can yield useful knowledge about the position and function of genetic elements (Bunnik and Le Roch 2013).


Subdivision of Genetics

Transmission genetics, molecular genetics, and population genetics are the three primary subdisciplines of genetics. Transmission genetics, often called classical genetics, deals with the fundamentals of heredity and how traits are passed from one generation to the next. The relationship between chromosomes and inheritance, the arrangement of genes on chromosomes, and gene mapping are all covered in this subdiscipline. Its focus is the individual organism: how it inherits its genetic makeup and passes its genes on to the next generation. Molecular genetics examines the chemical nature of the gene itself: its structure, organization, function, and expression. It covers the processes by which genetic information is replicated, transcribed, and translated (the transfer of genetic information), as well as gene regulation (the control of gene activity). Population genetics studies the genetic makeup of groups of individuals of the same species (populations) and how that makeup varies from place to place and over time. Because evolution is genetic change, population genetics is essentially the study of evolution; its focus is the set of genes present in a population. Although this traditional division of genetics into three subfields is useful and convenient, the subdivisions overlap, and each can be further divided into more specialized disciplines such as biochemical genetics, chromosomal genetics, and quantitative genetics. An alternative approach is to study the genetics of a particular species (such as the fruit fly, corn, or a bacterium) at the transmission, molecular, or population level. Modern genetics contains many subfields and specialties, all of which are interconnected (Griffiths 2000).

Animals as a Genetic Model

A vast amount of genetic data has been amassed for organisms whose characteristics make them particularly well suited to genetic study. Drosophila melanogaster (fruit fly), the domestic mouse, Saccharomyces cerevisiae (baker’s yeast), Caenorhabditis elegans (nematode worm), and Escherichia coli (a bacterium found in the guts of humans and other animals) are the five model species that have


been studied most extensively by geneticists. Because these organisms are so widely used, their genomes have been sequenced to complement data from the Human Genome Project. Xenopus laevis (clawed frog) and Danio rerio (zebrafish) are two further species that are the subject of extensive genetic research and are used as genetic models (Goldstein and King 2016).

Structure of Genome

An organism’s genome is its entire set of genetic material. It consists of DNA, except in RNA viruses, where it consists of RNA. The genetic material includes both coding and non-coding DNA and is organized into chromosomes. The number of chromosomes varies from species to species; the numbers for some species are given in Table 1.1. The proportions of coding and non-coding DNA also differ greatly between species, and only a small percentage of the genome is coding sequence. Coding DNA is the sequence that encodes proteins; non-coding DNA does not. Some non-coding DNA is transcribed into functional molecules such as ribosomal RNAs, transfer RNAs, and regulatory RNAs. Other non-coding sequences regulate the transcription and translation of protein-coding sequences, or form telomeres, centromeres, origins of DNA replication, and scaffold attachment regions. Non-coding DNA also mediates complex genetic interactions and epigenetic activity. The remaining non-coding DNA, which has no known function, is often called junk DNA. The genomes of most multicellular animals contain a considerable number of moderately and highly repetitive sequences, and the proportion is higher in species with larger genomes. Many of these repeats appear to have arisen by transposition, and they are especially prominent in the human genome: transposable elements account for 45 percent of human DNA, although many of them are defective and can no longer move. In multicellular species, most DNA is non-coding, and many genes are interrupted by introns. Introns are both more numerous and larger in more complex eukaryotes (Makalowski 2001).


Table 1.1. Number of chromosomes in different animals

Common name                  Scientific name            No. of chromosomes
Human                        Homo sapiens               46
Water buffalo (river type)   Bubalus bubalis            50
Water buffalo (swamp type)   Bubalus bubalis            48
Sheep                        Ovis orientalis aries      54
Goat                         Capra aegagrus hircus      60
Cow/Bull                     Bos primigenius            60
Donkey                       Equus africanus asinus     62
Mule                         (hybrid; none listed)      63
Horse                        Equus ferus caballus       64
Dog                          Canis lupus familiaris     78
Chicken                      Gallus gallus domesticus   78
Cat                          Felis silvestris catus     38

Many genes occupy the same relative positions in related genomes, a phenomenon known as colinearity. This is one of the features of genome evolution revealed by comparing the genome sequences of different organisms. The explanation for colinearity is that the genomes descend from a common ancestral genome, and evolutionary pressures have preserved the order of genes in the descendants.
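Colinearity can be checked computationally once the order of shared genes along two genomes is known. The following Python sketch is illustrative only: the gene names are hypothetical, each gene is assumed to occur once per genome, and only blocks lying in the same orientation are reported (inverted segments are ignored).

```python
def colinear_blocks(order_a, order_b, min_len=2):
    """Return runs of shared genes that are adjacent and in the same order in both genomes."""
    shared = set(order_a) & set(order_b)
    # Keep only genes present in both genomes, preserving each genome's own order.
    a = [g for g in order_a if g in shared]
    b = [g for g in order_b if g in shared]
    pos_b = {gene: i for i, gene in enumerate(b)}

    blocks, current = [], []
    for gene in a:
        # Extend the block only if this gene directly follows the previous one in genome B.
        if current and pos_b[gene] == pos_b[current[-1]] + 1:
            current.append(gene)
        else:
            if len(current) >= min_len:
                blocks.append(current)
            current = [gene]
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


# Hypothetical gene orders: g4-g6 are inverted in the second genome, x9 is unique to it.
genome_1 = ["g1", "g2", "g3", "g4", "g5", "g6"]
genome_2 = ["g1", "g2", "g3", "x9", "g6", "g5", "g4"]
blocks = colinear_blocks(genome_1, genome_2)
print(blocks)
```

Real comparative analyses work on orthologous genes identified by sequence similarity and also track strand and inversions; this sketch only shows the core idea of conserved gene order.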

Types of Genomes

Eukaryotic cells contain two types of genome: the nuclear genome, located in the cell nucleus, and the mitochondrial genome, located in the mitochondria. A diploid cell carries two copies of the nuclear genome and a haploid cell carries one, whereas a single cell is estimated to contain 100–1000 copies of mitochondrial DNA (mtDNA). The nuclear genome is organized into linear chromosomes with open ends, while the mitochondrial genome is circular. The nuclear genome is diploid and inherited from both parents, whereas the mitochondrial genome is haploid and inherited solely from the mother. Nuclear DNA is responsible for most of an individual’s genetic makeup, whereas mitochondrial DNA is responsible for metabolic activity. The two genomes are thought to have evolved independently: endosymbiotic theory holds that mtDNA originated from bacteria that were engulfed by the ancestors of modern eukaryotic cells. The human mitochondrial genome consists of 16,569 nucleotides and contains 37 genes. It is extremely small compared with the nuclear genome, which contains about 3 billion nucleotides and encodes between 20,000 and 25,000 genes (Excoffier and Langaney 1989).
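The difference in scale between the two human genomes can be made concrete with a little arithmetic. The sketch below uses only the figures quoted above (taking 3 billion nucleotides as the nuclear genome size); the variable names are illustrative.

```python
# Figures quoted in the text for the human genomes.
MT_GENOME_BP = 16_569
MT_GENES = 37
NUCLEAR_GENOME_BP = 3_000_000_000   # "about 3 billion nucleotides"
NUCLEAR_GENES = (20_000, 25_000)    # lower and upper estimates

# How many times larger the nuclear genome is than the mitochondrial genome.
size_ratio = NUCLEAR_GENOME_BP / MT_GENOME_BP

# Gene density expressed as genes per 1,000 bp (kb).
mt_genes_per_kb = MT_GENES / (MT_GENOME_BP / 1_000)
nuclear_genes_per_kb = tuple(n / (NUCLEAR_GENOME_BP / 1_000) for n in NUCLEAR_GENES)

print(f"Nuclear genome is about {size_ratio:,.0f} times larger")
print(f"mtDNA: {mt_genes_per_kb:.2f} genes per kb")
print(f"Nuclear: {nuclear_genes_per_kb[0]:.4f} to {nuclear_genes_per_kb[1]:.4f} genes per kb")
```

On these figures the nuclear genome is roughly 180,000 times larger than the mitochondrial genome, yet the mitochondrial genome packs its genes far more densely.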

Mitochondrial Genomes: Ancestral and Derived

Mitochondrial genomes fall into two broad forms, ancestral and derived. There is, however, a lot of variation within each form, and the mtDNA of some organisms does not fit well into either. Compared with derived genomes, ancestral mitochondrial genomes have more genes, a complete or nearly complete set of tRNA genes, and rRNA genes that encode eubacterial-like ribosomes. They use the universal codons, have few introns and little non-coding DNA between genes, and organize their genes into clusters like those found in eubacteria. Ancestral mitochondrial genomes thus retain many features of their eubacterial ancestors. Derived mitochondrial genomes, in contrast, are usually smaller and contain fewer genes. Their rRNA genes and the ribosomes they specify differ from those of eubacteria, their DNA sequences diverge more from typical eubacterial sequences than those of ancestral genomes do, and they use non-universal codons. Most animal and fungal mitochondrial genomes are of the derived type (Aquadro and Greenberg 1983).

Table 1.2. Mitochondrial genome sizes in various organisms

Common name    Scientific name           Size of mtDNA (bp)
Fruit fly      Drosophila melanogaster   19,517
Earthworm      Lumbricus terrestris      14,998
Frog           Xenopus laevis            17,553
House mouse    Mus musculus              16,295
Human          Homo sapiens              16,569


Discovery of mtDNA in Donkey

Mitochondrial DNA (mtDNA) offers several advantages for studying evolutionary relationships, which is why Beja-Pereira and coworkers used it in their research on donkeys. First, mtDNA is small compared with the DNA in the chromosomes of eukaryotic cells and is therefore easier to study. Second, mtDNA is abundant, because every cell has multiple mitochondria, each with multiple copies of the mitochondrial chromosome; as a result, it is more easily isolated than nuclear DNA. Third, because mtDNA evolves faster than nuclear DNA in animals, it is valuable for studying relationships between closely related organisms. Finally, because mtDNA is usually inherited from a single parent (typically the mother), its genes are not reshuffled by recombination every generation, a process that can obscure genetic relationships (Xu et al. 1996).

Composition of DNA

DNA is a long polymer made up of nucleotide monomers. Each nucleotide consists of one of four nitrogenous bases, a deoxyribose sugar, and a phosphate group. Deoxyribose is a 5-carbon sugar, and successive sugars in a DNA strand are joined by covalent phosphodiester linkages. A nitrogenous base is covalently bonded to carbon atom 1' (one prime) of each sugar residue. The four bases, adenine (A), cytosine (C), guanine (G), and thymine (T), are heterocyclic rings of carbon and nitrogen atoms, and they fall into two classes. Purines (A and G) have two fused heterocyclic rings, whereas pyrimidines (C and T) have a single ring. A nucleoside is a sugar with an attached base, while a nucleotide, the fundamental repeating unit of a DNA strand, is a nucleoside with a phosphate group attached at carbon atom 5' or 3'. RNA molecules are similar to DNA in composition but contain ribose in place of deoxyribose and uracil (U) in place of thymine. In both cases, 3',5' phosphodiester bonds connect each sugar residue to its neighbors: a phosphate group links carbon atom 3' of one sugar to carbon atom 5' of the next. RNA is usually single-stranded, while DNA exists in double-stranded form. The two strands of a DNA molecule are antiparallel, meaning that they run in opposite directions: the 5' end of one strand lies opposite the 3' end of the other. DNA is not only double-stranded; its two strands are also coiled around one another in a helix (Pray 2008a).

Stability of Genome

DNA exists as a double helix in which two DNA molecules (called DNA strands) are held together by weak hydrogen bonds to form a duplex. RNA molecules, in contrast, are generally single-stranded. The hydrogen bonds form between bases lying opposite each other on the two strands. Following the Watson-Crick rules, adenine pairs specifically with thymine and cytosine pairs specifically with guanine (Harding et al. 2018).
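The pairing rules and the antiparallel arrangement of the strands together determine the sequence of one strand from the other. As a minimal sketch (the function name and example sequence are illustrative, and only the four unambiguous DNA bases are handled), the partner strand can be derived in Python as follows:

```python
# Watson-Crick pairing: A with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand_5_to_3: str) -> str:
    """Return the partner strand, also written 5' to 3'.

    Because the strands are antiparallel, the partner is read in the
    opposite direction, so the sequence is reversed as well as complemented.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand_5_to_3.upper()))


strand = "ATGGCA"
partner = reverse_complement(strand)
print(f"5'-{strand}-3'")
print(f"3'-{partner[::-1]}-5'   (written 5' to 3': {partner})")
```

Applying the function twice returns the original sequence, which reflects the fact that each strand fully specifies the other.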

Chargaff’s Rule

Erwin Chargaff and coworkers measured the amounts of the four DNA bases in several species and found that base composition varies substantially between organisms. This discovery refuted the tetranucleotide theory, which held that the four bases occur in equal amounts. They also found that the base ratios are consistent within each species: the amount of adenine is almost always equal to that of thymine (A = T), and the amount of cytosine equals that of guanine (C = G). These observations became known as Chargaff’s rules (Chargaff et al. 1952).
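Chargaff’s rules follow directly from base pairing in double-stranded DNA, which a short calculation can illustrate. The Python sketch below builds a duplex from an arbitrary example strand (an assumption for illustration, not real data) and counts the bases; in the duplex A equals T and G equals C, even though the single strand shows no such equality.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def duplex_base_counts(strand: str) -> Counter:
    """Count bases over both strands of the duplex formed by `strand` and its partner."""
    partner = "".join(COMPLEMENT[base] for base in reversed(strand))
    return Counter(strand + partner)


strand = "ATGGCATTTC"                 # arbitrary example sequence
single = Counter(strand)              # one strand only
duplex = duplex_base_counts(strand)   # both strands

gc_fraction = (duplex["G"] + duplex["C"]) / sum(duplex.values())
print("single strand:", dict(single))
print("duplex:       ", dict(duplex))
print(f"G+C fraction of duplex: {gc_fraction:.2f}")
```

The G+C fraction is free to differ between species, which is the variation in base composition that Chargaff observed, while the A = T and G = C equalities hold for any duplex.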

Form of DNA

DNA can adopt several different double-helical conformations. The most stable and prevalent of these is the one described by Watson and Crick: a right-handed helix with ten base pairs per turn. The two grooves that run along the helix are called the major and minor grooves because of the differences in their depth. To distinguish it from two other helical forms, A-DNA and Z-DNA, the typical Watson and Crick double helix is called the B form, or B-DNA. Double-stranded RNA frequently adopts the A form, whereas Z-DNA is a left-handed helix with 12 base pairs per turn.


Characteristics of Genome

Until the structure of DNA was discovered, it was unknown how DNA could store and transmit genetic information. Even before nucleic acids were identified as the carriers of genetic information, biologists knew that the genetic material, whatever it was, had to have three essential properties (items 1 to 3 below). The remaining items summarize further basic concepts about genes and genomes.

1. Complex data must be present in the genetic material. The genetic material must be able to store huge amounts of information: instructions for all of an organism’s traits and functions. Because different species, and individual members of a species, differ in their genetic makeup, this information must be able to vary. At the same time, the genetic material must be stable, because changes in genetic information (called mutations) are frequently harmful (Pray 2008b).
2. The genetic material must be able to duplicate itself in a consistent manner. Each organism begins as a single cell that must divide billions of times to produce a complex, multicellular individual. The genetic information must be accurately transmitted to the daughter cells at each cell division, and it must also be copied faithfully when organisms reproduce and pass genes to their offspring (Sclafani and Holzen 2007).
3. The phenotype must be encoded in the genetic material. The genotype of an organism (its genetic content) must be able to “code for” (determine) the phenotype (its traits). Because the product of a gene is frequently a protein, there must be a mechanism for translating genetic information into the amino acid sequence of a protein (Fredrick and Ibba 2009).
4. Genes form the basis of inheritance. The precise definition of a gene varies with the biological context. At its simplest, a gene can be thought of as a unit of information that encodes a genetic characteristic (Simpson et al. 2001).
5. Genes come in several forms, known as alleles. Alleles are different versions of a gene that determines a trait. A gene for coat color in cats, for example, could have two alleles: one encoding black fur and the other encoding orange fur (Elston et al. 2012).
6. Phenotypes are determined by genes. The distinction between traits and genes is one of the most important concepts in genetics. Traits themselves are not inherited. Genes are inherited and, together with environmental factors, determine how traits are expressed. The genetic information that an individual organism carries is its genotype; the trait is its phenotype. A clear and surprising trend in eukaryotic genomes is the degree of similarity among genes found in even distantly related species. Humans and mice, for example, can share 99 percent of the DNA sequence in some genes, and about half of the genes in fruit flies are similar to human genes (Merrick 1992).
7. DNA and RNA are both carriers of genetic information. Genetic information is carried by two forms of nucleic acid: deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). Nucleic acids are polymers of nucleotides, repeating units each made up of a nitrogenous base, a sugar, and a phosphate. The four nitrogenous bases in DNA are adenine (A), thymine (T), cytosine (C), and guanine (G); in RNA they are adenine, uracil, cytosine, and guanine. The genetic information is encoded in the sequence of these nucleotides. DNA consists of two complementary nucleotide strands. Most species store their genetic information in DNA, while a few viruses store it in RNA (Travers and Muskhelishvili 2015).
8. Genes reside on chromosomes. Chromosomes, which consist of DNA and associated proteins, are the carriers of genetic information inside a cell. The cells of each species have a characteristic number of chromosomes; for instance, bacterial cells typically contain a single chromosome, human cells have 46, and pigeon cells have 80. Each chromosome carries a large number of genes (Miko 2008).
9. Mitosis and meiosis are the two mechanisms that separate chromosomes. Together they ensure that every cell produced by cell division receives a complete set of the organism’s chromosomes. Mitosis is the separation of chromosomes during the division of somatic cells. Meiosis is the pairing and separation of chromosomes during the division of reproductive cells to produce gametes (Ohkura 2015).
10. Genetic information is transferred from DNA to RNA and from RNA to protein. Many genes code for traits by specifying the structure of proteins. DNA is first transcribed into RNA, which is then translated into the amino acid sequence of a protein (Crick 1970).
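The flow of information from DNA to RNA to protein described in the last item can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: transcription is modeled by rewriting the coding strand with U in place of T (a cell actually copies the template strand), the standard genetic code is assumed, translation starts at the first base, and the example sequence is made up.

```python
from itertools import product

# Standard genetic code, listed in UCAG order for each codon position ("*" marks stop).
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA corresponding to a DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCTGA"            # hypothetical coding sequence: Met, Ala, stop
mrna = transcribe(dna)
protein = translate(mrna)
print(dna, "->", mrna, "->", protein)
```

Mitochondrial genomes with non-universal codons would need a different codon table, so the table is kept separate from the translation logic.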

Genome Project

The genomes of more than 100 eukaryotic species, including protists, fungi, insects, and vertebrates, have been completely sequenced. The sequenced eukaryotes include aphids, anemones, chimpanzees, cows, dogs, fruit flies, horses, humans, mice, mosquitoes, and rats. Even the genomes of extinct organisms such as the Neanderthal and the woolly mammoth have been sequenced, and hundreds of other eukaryotic genomes are currently being sequenced. Although a genome may be described as “fully sequenced,” many gaps remain in the final sequence assembly, and heterochromatic regions may not have been sequenced at all. As a result, the sizes of eukaryotic genomes are often estimates, and the number of base pairs reported for a particular species may vary. Estimating the number of genes in a genome is likewise difficult, and the reported number may vary with the gene-finding tools used and the assumptions applied (Pray 2008b).

Table 1.3. The characteristics of various completely sequenced eukaryotic genomes

Species                                Genome size (Mbp)   Number of predicted genes
Caenorhabditis elegans (roundworm)     103                 20,598
Drosophila melanogaster (fruit fly)    170                 13,525
Anopheles gambiae (mosquito)           278                 14,707
Danio rerio (zebrafish)                1465                22,409
Mus musculus (mouse)                   2627                26,762
Rattus norvegicus (Norway rat)         2571                23,761
Pan troglodytes (chimpanzee)           2733                22,524
Homo sapiens (human)                   3223                ~24,000
Gallus gallus (chicken)                ~1000               20,000-23,000
Canis familiaris (domestic dog)        ~2400               ~25,000
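One way to read Table 1.3 is to convert each row into an average gene density. The Python sketch below does this for the species whose values are given exactly (rows with approximate values are left out); the dictionary and function names are illustrative.

```python
# (genome size in Mbp, number of predicted genes), copied from Table 1.3.
TABLE_1_3 = {
    "Caenorhabditis elegans": (103, 20_598),
    "Drosophila melanogaster": (170, 13_525),
    "Anopheles gambiae": (278, 14_707),
    "Danio rerio": (1465, 22_409),
    "Mus musculus": (2627, 26_762),
    "Rattus norvegicus": (2571, 23_761),
    "Pan troglodytes": (2733, 22_524),
}


def genes_per_mbp(size_mbp: float, n_genes: int) -> float:
    """Average number of predicted genes per million base pairs."""
    return n_genes / size_mbp


density = {species: genes_per_mbp(*row) for species, row in TABLE_1_3.items()}

# List species from the most gene-dense genome to the least.
for species, value in sorted(density.items(), key=lambda item: item[1], reverse=True):
    print(f"{species:<26}{value:7.1f} genes per Mbp")
```

Gene number changes little across these species while genome size changes more than twentyfold, so the larger vertebrate genomes have far fewer genes per Mbp than the invertebrate ones.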


Gene Desert

Gene density varies widely across a typical eukaryotic genome, with some chromosomes having a high gene density and others a low one. In some parts of the genome, long stretches of DNA, frequently hundreds of thousands to millions of base pairs long, lack any known genes or other functional elements. These regions are called gene deserts, and they are surprisingly common in eukaryotes. The human genome contains about 500 or more gene deserts, which account for about 25% of total euchromatin. Gene deserts are especially extensive on human chromosomes 4, 5, and 13, where they can cover up to 40% of the chromosome’s length. Marcelo Nobrega and coworkers investigated the functional importance of gene deserts by deleting them in mice. They engineered mice lacking either a 1,500,000-bp gene desert on mouse chromosome 3 or an 845,000-bp gene desert on mouse chromosome 19. Interbreeding produced mice homozygous for the deletions, whose genomes were therefore entirely devoid of the DNA in these gene deserts. The investigators carefully tracked the blood chemistry, weight gain, and survival of these mice, and several organs, including the brain, bladder, heart, intestines, kidneys, lungs, reproductive organs, spleen, stomach, and thymus, were examined visually and pathologically at different ages. Strikingly, mice with the gene deserts deleted appeared healthy and could not be distinguished from controls. These findings suggest that large sections of the mammalian genome can be removed without severe phenotypic consequences, although the mice with the deletions may have had defects that went undetected (Nóbrega et al. 2004).
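Given the positions of annotated genes on a chromosome, candidate gene deserts can be located as long gaps between consecutive genes. The following Python sketch is an illustration under assumptions: the 500,000-bp threshold is an arbitrary choice within the range mentioned above, coordinates are 0-based half-open intervals, and the gene positions are invented.

```python
def find_gene_deserts(genes, chrom_length, min_size=500_000):
    """Return (start, end) intervals of at least `min_size` bp that contain no gene.

    `genes` is an iterable of (start, end) gene intervals on one chromosome.
    """
    deserts = []
    prev_end = 0
    for start, end in sorted(genes):
        if start - prev_end >= min_size:
            deserts.append((prev_end, start))
        # max() handles overlapping or nested genes.
        prev_end = max(prev_end, end)
    if chrom_length - prev_end >= min_size:
        deserts.append((prev_end, chrom_length))
    return deserts


# Hypothetical 2 Mbp chromosome with four annotated genes.
genes = [
    (100_000, 150_000),
    (200_000, 260_000),
    (1_200_000, 1_250_000),
    (1_300_000, 1_320_000),
]
chrom_length = 2_000_000
deserts = find_gene_deserts(genes, chrom_length)
desert_fraction = sum(end - start for start, end in deserts) / chrom_length

print(deserts)
print(f"{desert_fraction:.0%} of the chromosome lies in gene deserts")
```

In practice the result depends heavily on the gene annotation used and on whether other functional elements, such as conserved non-coding sequences, are also excluded.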

Advances in Buffalo Genome

There are 130,000,000 river buffalo in the world, kept for milk and meat production. In many of the countries where buffalo are economically very important, research resources are very limited, and as a result research on the buffalo genome is often not feasible there. Buffalo genomics nevertheless rests on a strong foundation in cytogenetics and fluorescence in situ hybridization. Synteny maps were produced with a somatic cell hybrid panel and combined with cytogenetic maps, allowing syntenic groups to be assigned directly to chromosomes. The cytogenetic map contains about three hundred loci, many of which have homologs mapped in other species, which has enabled comparative mapping. Chromosome banding patterns show that the karyotypes of cattle (2N = 60) and river buffalo (2N = 50) are nearly the same: five couples of one-armed cattle chromosomes appear to be the separated arms of the five bi-armed buffalo chromosomes. Comparative mapping among bovid species confirms the identity of these chromosome arms. Newly created radiation hybrid panels for buffalo can take comparative mapping to a higher level of resolution. The work of J. Elliott, E. Amaral, and J.E. Womack suggests that homologous regions of the buffalo genome can be amplified using cattle microsatellite primers. If a sufficient number of these microsatellites are polymorphic in buffalo, they will help in developing a linkage map once pedigreed families are accurately characterized and their DNA is available to the buffalo mapping community (Moaeen-ud-Din 2014).

Buffalo Genome and 90K SNP Chip

A group of researchers sequenced the genomes of several buffalo of different breeds and aligned them against the bovine genome, which helped to identify variants in the buffalo genome. The variants were assessed by their frequencies within and between buffalo breeds and by their distribution across the genome relative to the bovine genome. Ninety thousand single nucleotide polymorphisms (SNPs) were selected to produce the Axiom Buffalo Genotyping Array 90K. The SNP chip was tested in different buffalo populations from Brazil and Italy, and at least 75% of the markers were found to be high-quality and polymorphic in these populations. The 90K SNP chip was then used to study different buffalo populations and to locate variants with a significant effect on milk production (Iamartino et al. 2017).
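The proportion of high-quality polymorphic markers on a genotyping array can be estimated from the genotype calls in a population. The Python sketch below is only an illustration and is not the procedure used by the array’s designers: genotypes are assumed to be coded as the number of copies of one allele (0, 1, or 2) with None for a failed call, a marker counts as high-quality if at least 95% of animals are called (an assumed threshold), and as polymorphic if both alleles are observed. The marker names and data are invented.

```python
def call_rate_and_maf(genotypes):
    """Return (call rate, minor allele frequency) for one marker."""
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    if not called:
        return call_rate, 0.0
    # Each animal carries two alleles, so divide by twice the number of called animals.
    allele_freq = sum(called) / (2 * len(called))
    return call_rate, min(allele_freq, 1 - allele_freq)


def fraction_polymorphic(markers, min_call_rate=0.95):
    """Fraction of markers that pass the call-rate filter and show both alleles."""
    passing = 0
    for genotypes in markers.values():
        call_rate, maf = call_rate_and_maf(genotypes)
        if call_rate >= min_call_rate and maf > 0:
            passing += 1
    return passing / len(markers)


# Hypothetical genotype calls for four markers in ten animals.
markers = {
    "snp_a": [0, 1, 2, 1, 0, 1, 0, 2, 1, 1],        # polymorphic
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],        # monomorphic
    "snp_c": [2, 2, 1, None, None, 2, 1, 2, 2, 2],  # too many failed calls
    "snp_d": [0, 0, 1, 0, 0, 0, 0, 1, 0, 0],        # polymorphic, rare allele
}
result = fraction_polymorphic(markers)
print(f"{result:.0%} of markers are high-quality and polymorphic")
```

Real quality control usually applies further filters as well, such as a minimum minor allele frequency and checks on genotype cluster quality.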

The Bos Taurus and Domestic Cow Whole-Genome Assembly

The genome of the domestic cow, Bos taurus, was sequenced using a combination of whole-genome shotgun and hierarchical sequencing methods. The 35 million sequence reads were assembled and a variety of assembly improvement techniques were applied, creating an assembly of 2.86 billion base pairs with many improvements over earlier assemblies. In the current assembly, hundreds of single-nucleotide errors have been fixed, many erroneous translocations, deletions, and inversions have been corrected, and hundreds of gaps have been filled, so that the assembly covers more of the genome and is more complete. Independent measures show that the new assembly is far more reliable and comprehensive than earlier versions. Using independent mapping data and conserved synteny between the human and cow genomes, the researchers produced an assembly with large-scale contiguity, with an estimated 90% of the genome placed on the 30 chromosomes of Bos taurus. They also built a new human-cow synteny map that extends earlier maps, and for the first time determined part of the Y chromosome of Bos taurus (Zimin et al. 2009).

Recombinant DNA Technology in Domestic Animals

Recombinant DNA technology is also applied to domestic animals. The gene for growth hormone, for instance, was isolated from cattle and cloned in E. coli, which then produce large amounts of bovine growth hormone; the hormone is given to dairy cows to boost milk output. Certain eukaryotic proteins must be modified after translation, and only other eukaryotes, not bacteria, can carry out these modifications, so such proteins are produced in animal hosts instead. For instance, the regulatory region of the sheep gene for the milk protein β-lactoglobulin has been linked to the gene for human clotting factor VIII. The fused gene was injected into sheep embryos, producing transgenic sheep that secrete human clotting factor in their milk, which can be used to treat hemophilia in humans.

Characteristics of the Mammalian Mitochondrial Genome That Are Conserved throughout the Bovine Mitochondrial DNA Sequence

A group of researchers reported the complete 16,338-nucleotide sequence of the bovine mitochondrial genome. The bovine mitochondrial genome was found to have the same organization as the human mitochondrial genome, with the genes arranged in the same order. Protein-coding genes in the bovine mitochondrial genome are 63–79% homologous to their counterparts in the human mitochondrial genome, with most of the differences occurring at the third nucleotide of codons. The minimum rate of base substitution needed to account for the nucleotide differences at third codon positions is exceptionally high: at least 6 × 10⁻⁹ changes per position per year. The bovine 12S and 16S ribosomal RNA genes, compared with the human mitochondrial ribosomal RNA genes, show conserved features consistent with the proposed secondary structures of ribosomal RNAs. In contrast to the high degree of homology between human and bovine mtDNA over most of the genome, the D-loop region of the bovine mitochondrial genome shows little homology to the corresponding human region. The size of the D-loop region also differs between the bovine and human mitochondrial genomes (Anderson et al. 1982).
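The observation that most differences fall at the third nucleotide of codons can be checked by tallying mismatches by codon position in two aligned coding sequences. The Python sketch below assumes the sequences are already aligned in frame, of equal length, and free of gaps; the two short sequences are invented for illustration and are not bovine or human data.

```python
def differences_by_codon_position(seq_a: str, seq_b: str) -> list:
    """Count mismatches at the first, second, and third codon positions."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, equal in length, and in frame")
    counts = [0, 0, 0]
    for index, (base_a, base_b) in enumerate(zip(seq_a, seq_b)):
        if base_a != base_b:
            counts[index % 3] += 1   # 0, 1, 2 = first, second, third position
    return counts


# Hypothetical aligned coding sequences (four codons each).
seq_1 = "ATGGCCTTAGGA"
seq_2 = "ATGGCTCTAGGG"

position_counts = differences_by_codon_position(seq_1, seq_2)
identity = 1 - sum(position_counts) / len(seq_1)

print("differences at codon positions 1, 2, 3:", position_counts)
print(f"overall identity: {identity:.0%}")
```

Third-position changes are often synonymous because of the redundancy of the genetic code, which is why they accumulate faster than changes at the first two positions.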

Bison Bison and Bison-Cattle Hybrid Mitochondrial DNA Sequence Analysis: Function and Phylogeny

The complete mitochondrial genomes of 43 bison and bison-cattle hybrids were sequenced and compared with those of other bovids. The animals were chosen to represent the current taxonomic structure and historical range of bison. The study found evidence of population substructure in bison and produced a complete mitochondrial DNA phylogenetic tree for the species. Sixty-six polymorphic sites define the seventeen bison haplotypes, while 728 fixed differences and 86 non-synonymous mutations separate the bison sequences from the bison-cattle hybrid sequences (Douglas et al. 2011).
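Polymorphic sites and haplotypes of the kind reported above are derived from a set of aligned sequences. The Python sketch below shows the idea on a toy example: the sequences and animal names are hypothetical, the sequences are assumed to be aligned and of equal length, and a haplotype is taken to be a distinct sequence.

```python
def polymorphic_sites(sequences):
    """Return the alignment positions at which more than one base is observed."""
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, column in enumerate(zip(*sequences)) if len(set(column)) > 1]


def group_haplotypes(sequences_by_animal):
    """Group animals that carry identical sequences (the same haplotype)."""
    groups = {}
    for animal, seq in sequences_by_animal.items():
        groups.setdefault(seq, []).append(animal)
    return groups


# Hypothetical aligned mtDNA fragments from four animals.
samples = {
    "animal_1": "ACGTACGT",
    "animal_2": "ACGTACGT",
    "animal_3": "ACGAACGT",
    "animal_4": "ACGAACTT",
}

sites = polymorphic_sites(list(samples.values()))
haplotypes = group_haplotypes(samples)

print("polymorphic sites:", sites)
print("number of haplotypes:", len(haplotypes))
```

Fixed differences between two groups, such as bison and hybrids carrying cattle mtDNA, would be the sites at which every sequence in one group differs from every sequence in the other.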

Conclusion

The human population is expected to reach 10 billion by the year 2050. This growth will coincide with improving economic conditions in developing nations, so demand for animal-derived products is expected to increase. This will put enormous strain on animal production systems, which already face many challenges, including concerns about the welfare of livestock, regulatory pressures, and market competitiveness. A paradigm shift is therefore needed in how animal trait research is done, with more emphasis placed on understanding the biological processes occurring in animals (Rexford et al. 2018). Animal genomics, which integrates genomics with the animal sciences, is one approach that seeks to achieve this. It involves studying the structure of the genomes of individual organisms or species (structural genomics) and how the information contained in those genomes is expressed (functional genomics). High-throughput DNA sequencing techniques have accelerated the publication of animal genomes, while developments in bioinformatics tools have streamlined the collection and analysis of genomic data. In the future, such information is expected to enable animal breeding that is sustainable, cost-effective, less environmentally stressful, and mindful of animal welfare.

Asif Nadeem and Maryam Javed

Chapter 3

Transcriptomics

Abstract

Transcriptomics examines gene expression at the level of RNA, providing genome-wide data on the composition and function of transcripts to uncover the molecular pathways underlying particular biological activities. An organism’s blueprint is stored in its genome and transmitted via transcription to produce proteins. Although a cell produces many kinds of RNA with distinct roles, only mRNA carries this biological information from gene to protein. mRNA transcripts therefore act as a temporary intermediate in the information transfer system. The complete set of transcripts produced within a cell is its transcriptome. Analysis of the transcriptome has improved our knowledge of the RNA-dependent gene regulation network, and the development of high-throughput technologies such as next-generation sequencing has accelerated this process. Transcriptomics has recently expanded to include a wide range of agricultural and animal species, connecting knowledge about transcriptomes between both areas. Newer approaches are constantly being introduced to address the shortcomings of older technologies and to provide more precise insight into fundamental research questions. In this chapter, we review current advances in various transcriptomics methods, such as microarrays and RNA-Seq, and explore their applications in different areas of research. Furthermore, we showcase pertinent implementations of these methods in farm animals and agriculture in order to demonstrate their enormous potential.

Keywords: transcriptomics, RNAs, transcripts, intermediate element, RNA dependent gene regulation network

Introduction

Transcriptomics is the study of the transcriptome, the whole set of RNA transcripts produced by the genome of a particular cell under defined conditions, by means of high-throughput methods. The transcriptome is the complete set of RNA molecules present in a cell or in a population of cells. Depending on the experiment, the term may refer to all RNAs or only to mRNA. It differs from the exome in that it includes only the RNA molecules present in a specific population of cells, and it also includes the concentration of each RNA molecule. Comparative transcriptomics identifies genes that are differentially expressed in distinct cells under different circumstances. Transcriptomics is usually described in terms of the mRNA content of a cell, but the same techniques also apply to noncoding RNAs (ncRNAs), which are not translated into protein and instead have regulatory functions, with roles in RNA splicing, DNA replication, transcriptional regulation, and protein translation (Noller 1991). Some ncRNAs have a role in disease states, including cardiovascular disease, cancer, and neurological diseases (Hüttenhofer et al. 2005). Transcriptomic studies, including splice variant analysis and expression profiling, examine the level of RNA expression in a cell population, usually focusing on mRNA but sometimes also including other RNAs such as sRNAs and tRNAs. Transcriptomic approaches have broad application across biomedical research, including disease profiling and diagnosis. RNA-Seq methodologies allow the large-scale identification of transcriptional start sites, alternative promoter usage, and novel splicing alterations. Because these regulatory elements are important in disease, defining such variants is crucial for interpreting disease association studies (Costa et al. 2013). RNA-Seq can also identify disease-associated single nucleotide polymorphisms (SNPs), allele-specific expression, and gene fusions, which contributes to the understanding of disease-causing variants (Khurana et al. 2016). Researchers studying carcinogenesis and cell differentiation have a keen interest in the transcriptomes of cancer cells and stem cells.
Transcriptome analysis of human oocytes and embryos is used to identify the molecular mechanisms and signalling pathways that control early embryonic development, and could be a powerful tool for selecting suitable embryos in in vitro fertilization (Assou et al. 2011). Transcriptomics can identify the genes and pathways that respond to various biotic and abiotic environmental stresses. Because transcriptomics is untargeted, it allows novel transcriptional networks to be identified in complex systems. For example, comparative analysis of a variety of chickpea lines at different developmental stages identified the transcriptional profiles linked with drought and salinity stresses, and a role for transcript isoforms of AP2-EREBP was also observed (Garg et al. 2016). Transcriptomic profiling also delivers vital information on mechanisms of drug resistance. Analysis of more than 1000 Plasmodium falciparum isolates from Southeast Asia showed that artemisinin resistance is associated with upregulation of the unfolded protein response and with slower progression through the early stages of the asexual intraerythrocytic developmental cycle (Mok et al. 2015). All transcriptomic techniques are useful for identifying the functions of genes and the genes responsible for particular phenotypes. Transcriptomics of Arabidopsis ecotypes that hyperaccumulate metals associated genes involved in metal uptake, tolerance, and homeostasis with the phenotype (Verbruggen et al. 2009).

Transcriptomic Techniques

Expressed Sequence Tags (ESTs)

ESTs are short nucleotide sequences derived from a single RNA transcript. The RNA is first copied into cDNA by the enzyme reverse transcriptase, and the cDNA is then sequenced (Marra et al. 1998). Before the introduction of high-throughput technologies such as sequencing by synthesis (Illumina/Solexa, San Diego, CA), the Sanger method was the most widely used for sequencing. Because ESTs do not require prior knowledge of the organism from which they originate, they can be applied to a variety of organisms and environmental samples. Although higher-throughput approaches are now used, EST libraries historically provided the sequence information for early microarray designs. For example, 350,000 previously sequenced ESTs were used to build a GeneChip for barley (Close et al. 2004).

Serial and Cap Analysis of Gene Expression (SAGE/CAGE)

SAGE was a refinement of the EST approach that increased the throughput of tags produced and also allowed transcript abundance to be quantified. cDNA is produced from RNA and then digested into 11 bp “tag” fragments by restriction enzymes that cut at a specific site and again 11 bp along the sequence from that site. The cDNA tags are then concatenated head-to-tail into long strands (>500 bp) and sequenced using low-throughput methods. SAGE and CAGE give information on more genes than the sequencing of single ESTs, but sample preparation and data analysis are labour-intensive (Velculescu et al. 1995).
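The tag-counting idea behind SAGE can be illustrated with a short sketch. The Python below is not taken from the protocol of Velculescu et al.; it assumes, purely for illustration, that the anchoring enzyme recognises CATG (the NlaIII site commonly used in SAGE) and that the 11 bp tag lies immediately downstream of the 3ʹ-most site in each cDNA. The sequences and function names are hypothetical.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # assumed anchoring-enzyme site (NlaIII); not stated in the text
TAG_LENGTH = 11       # tag length quoted in the text


def extract_tag(cdna, site=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the tag following the 3'-most anchoring site, or None if there is none."""
    position = cdna.rfind(site)
    if position == -1:
        return None  # the enzyme does not cut this cDNA
    start = position + len(site)
    tag = cdna[start:start + tag_length]
    # Discard truncated tags that run off the end of the cDNA.
    return tag if len(tag) == tag_length else None


def count_tags(cdnas):
    """Count how often each tag occurs across a collection of cDNA sequences."""
    tags = (extract_tag(sequence) for sequence in cdnas)
    return Counter(tag for tag in tags if tag is not None)


# Hypothetical cDNA sequences: the first two carry the same tag.
example_cdnas = [
    "AAACATGACGTACGTACGTT",
    "GGCATGACGTACGTACGAAA",
    "TTTCATG" + "G" * 11 + "AA",
    "AAAAAAAA",  # no anchoring site, so no tag
]
tag_counts = count_tags(example_cdnas)
```

Because each transcript contributes a single tag, the resulting counts act as a rough digital estimate of transcript abundance.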

Microarrays

Microarrays consist of short nucleotide oligomers, called “probes,” arrayed on a solid substrate. Transcript abundance is determined by hybridisation of fluorescently labelled transcripts to these probes: the fluorescence intensity at each probe location indicates the abundance of the transcript matching that probe sequence (Barbulovic-Nad et al. 2006). To design the probes, however, microarrays require prior knowledge of the organism, for example in the form of an annotated genome sequence or an EST library. The pursuit of transcriptome data at the level of individual cells has driven improvements in RNA-Seq library preparation methods, resulting in dramatic advances in sensitivity. Single-cell transcriptomes are now well described and have been extended to in situ RNA-Seq, in which the transcriptomes of individual cells are interrogated directly in fixed tissues (Lee et al. 2014).

RNA-Seq

RNA-Seq combines high-throughput sequencing with computational methods to capture and quantify the transcripts present in an RNA extract. The nucleotide sequences generated are usually about 100 bp long, although they can range from 30 bp to 10,000 bp depending on the sequencing technology used. RNA-Seq relies on deep sampling of the transcriptome with many short fragments, which allows the original RNA transcripts to be reconstructed computationally by aligning reads to a reference genome or to each other; the latter is referred to as de novo assembly.

Validation

Transcriptome analyses can be validated by an independent method such as quantitative PCR (qPCR), which is well recognised and statistically assessable. Gene expression is measured against defined standards for both the genes of interest and control genes. A qPCR measurement is similar to one obtained by RNA-Seq in that a value is calculated for the concentration of a target in the sample. qPCR is, however, restricted to amplicons smaller than 300 bp, typically located towards the 3ʹ end of the coding region and avoiding the 3ʹ untranslated region (3ʹUTR). When transcript isoforms need to be validated, an inspection of the RNA-Seq read alignments should indicate where qPCR primers can be placed for maximum discrimination. Measuring several control genes alongside the genes of interest produces a stable reference within the biological context. Most qPCR validation of RNA-Seq data has shown that the different RNA-Seq methodologies are highly correlated (Camarena et al. 2010).
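As a worked illustration of relative quantification by qPCR, the sketch below uses the widely applied 2^(-ΔΔCt) calculation, averaging the cycle threshold (Ct) values of several control genes to form the reference. This choice is an assumption made here: the text does not state which quantification model is used, and the calculation presumes an amplification efficiency close to 100%. All Ct values and function names are hypothetical.

```python
from statistics import mean


def delta_ct(ct_target, ct_controls):
    """Ct of the gene of interest minus the mean Ct of the control genes."""
    return ct_target - mean(ct_controls)


def relative_expression(ct_target_test, ct_controls_test,
                        ct_target_ref, ct_controls_ref):
    """Fold change of the test sample relative to the reference sample.

    Assumes the amount of product doubles every cycle (about 100% efficiency).
    """
    delta_delta_ct = (delta_ct(ct_target_test, ct_controls_test)
                      - delta_ct(ct_target_ref, ct_controls_ref))
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses the threshold 3 cycles earlier
# in the test sample, while the two control genes are unchanged.
fold_change = relative_expression(22.0, [18.0, 20.0], 25.0, [18.0, 20.0])
```

A fold change obtained in this way can then be compared, gene by gene, with the fold change estimated from RNA-Seq read counts for the same pair of samples.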

Applications

Transcriptomics is a developing and rapidly growing field for the discovery of biomarkers used in drug safety and chemical risk assessment. Individual transcriptomes can be used to infer evolutionary relationships. The transcriptome is a precursor to the proteome, the complete set of proteins expressed by a genome. Comparatively small changes in the expression of an mRNA may result in large changes in the total quantity of the corresponding protein in the cell, which complicates analysis at the mRNA level. Gene set enrichment analysis is a method that identifies networks of coregulated genes, rather than individual genes, that are up- or downregulated in different cell populations (Subramanian et al. 2005). DNA microarrays and next-generation sequencing technologies such as RNA-Seq are examples of transcriptomics approaches. Single-cell transcriptomics can examine transcription at the level of individual cells (Wang et al. 2009). RNA-Seq may be used to find genes within a genome or to identify genes that are active at a specific time, and read counts can be used to model gene expression levels accurately (Tachibana 2015). RNA-Seq (high-throughput RNA sequencing) is quickly becoming the preferred method for transcriptome-focused research, and it can also be used to examine changes in gene expression across the whole transcriptome (Ozsolak and Milos 2011). In bovine species, the unbiased and sensitive detection of all expressed genes has been used to identify the likely transcriptional mechanisms and components underlying diverse physiological and phenotypic changes. These include negative energy balance, liver transcripts linked with dietary restriction (Keogh et al. 2016), and different levels of feed efficiency in cattle. RNA-Seq-based transcriptomic profiling has also been used to identify the effects of diet, and the underlying mechanisms, on hepatic gene expression in heifers with equal intake of metabolizable energy (ME).
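The use of read counts to model expression levels can be made concrete with a small example. The sketch below converts raw counts to transcripts per million (TPM), one common normalisation that corrects for transcript length and sequencing depth; the text does not prescribe a particular normalisation, so this choice is an assumption, and the gene names, counts, and lengths are hypothetical.

```python
def counts_to_tpm(read_counts, lengths_bp):
    """Convert raw read counts per gene to transcripts per million (TPM)."""
    # Reads per kilobase: corrects for longer transcripts attracting more reads.
    per_kilobase = {
        gene: read_counts[gene] / (lengths_bp[gene] / 1000.0)
        for gene in read_counts
    }
    total = sum(per_kilobase.values())
    if total == 0:
        raise ValueError("no reads were counted in this sample")
    # Rescale so that the values within one sample sum to one million.
    return {gene: rate / total * 1_000_000 for gene, rate in per_kilobase.items()}


# Hypothetical counts and transcript lengths for three genes in one sample.
counts = {"geneA": 100, "geneB": 300, "geneC": 0}
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 500}
tpm = counts_to_tpm(counts, lengths)
```

In this example geneB has three times as many reads as geneA but is also three times as long, so the two genes receive the same TPM value.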

Transcriptome Databases

Transcriptomics research generates massive amounts of data with uses that go far beyond the initial goals of the research. As a result, raw and processed data can be stored in public databases for the benefit of the scientific community at large.

Table 3. Transcriptomics databases

Name | Host | Data | Description
Expression Atlas | EBI | Microarray, RNA-Seq | Plant and animal tissue-specific gene expression database. Secondary analyses and visualisations, such as Gene Ontology and InterPro domains, are displayed.
NONCODE | noncode.org | RNA-Seq | Non-coding RNAs (ncRNAs), which do not include tRNA or rRNA.
ArrayExpress | ENA | Microarray | Accepts direct submissions and imports datasets from the Gene Expression Omnibus.
RefEx | DDBJ | All | Heatmaps showing gene expression projected onto 3D reconstructions of anatomical structures.
Gene Expression Omnibus | NCBI | Microarray, RNA-Seq | The first transcriptomics database; introduced the MIAME and MINSEQE community standards.
[name missing] | Privately curated | Microarray, RNA-Seq | Hand curation of available transcriptome datasets, with a focus on plant and medical biology.

Transcriptomics and the Mediterranean Diet

Because of the versatility and consistent chemical properties of RNA, transcriptional studies have been addressed more extensively than studies of proteins. High-throughput gene expression analysis using DNA chips and microarrays has been introduced and applied to transcriptome analysis to better understand the interactions between nutrients and gene expression. Unlike the genome, an organism does not have a single transcriptome; each cell has its own transcriptome, which may change under particular environmental circumstances. The discovery of a large number of non-coding RNAs (approximately 70,000) with regulatory functions opens a new field for the study of nutrient action, with the transcriptome acting as an end-point of regulatory control. Different tissues run distinct transcriptional programs (Uhlén et al. 2015), and their responses to nutritional stimuli involve a variety of transcription factors with distinct physiological roles (Arnal et al. 2015). The Mediterranean diet has been shown to be highly effective in preventing cancer and cardiovascular disease and in lowering overall mortality. Because noncoding RNAs regulate many biological processes, transcriptomics is becoming increasingly relevant. Many studies in experimental models have provided evidence that the Mediterranean diet affects the transcriptomes of different tissues, but very little information is available on how regulatory RNAs contribute to this effect. Virgin olive oil has received special attention because diets rich in monounsaturated fatty acids inhibit the expression of inflammatory genes in various tissues, an effect also observed with the phenolic compounds of olive oil. Secoiridoids, tyrosol, and hydroxytyrosol have been found to be particularly active in regulating cell cycle gene expression, while oleanolic acid, a less studied terpene, is a significant modulator of circadian clock genes. The response to these compounds is general but highly complex, given the different genes expressed in each tissue and the number of tissues and organisms studied (Herrera-Marcos et al. 2017).

Transcriptomics and IVF in Bovine

In vitro fertilization (IVF) is a valuable tool for both industry and research, with applications ranging from trait preservation for cloning to gamete selection. Efficiency remains a problem, even though approximately 339,685 bovine embryos were transferred in 2010 alone. It is unusual for more than 40% of in vitro fertilized cattle oocytes to reach the blastocyst stage by day 8 of culture, and reported pregnancy rates for in vitro produced embryos are below 45%. To examine the potential impact of IVF on embryo development, RNA-Seq was used to compare in vitro- and in vivo-derived bovine blastocysts of the same stage and quality grade (expanded, excellent quality) and so determine the degree of transcriptomic variation that exists beyond morphology. IVF affects embryos at the transcriptome level, and morphology alone provides only a partial description of bovine preimplantation embryos (Driver et al. 2012).

Changes in gene expression caused by in vitro production (IVP) and somatic cell nuclear transfer (SCNT) were analyzed in elongated bovine embryos using the Affymetrix bovine genome array. Day-16 bovine embryos derived from IVP, SCNT, and artificial insemination (AI) were collected from recipients for transcriptome analysis. Despite similar in vivo development rates, SCNT embryos showed a considerably smaller elongation size than non-cloned embryos (93.3 mm for SCNT, compared with 186.6 mm for IVP and 196.3 mm for AI embryos). A comparison of AI and SCNT embryos showed that transcript levels of 477 genes, involved in several pathways including arginine and proline metabolism, fatty acid metabolism, and glycerolipid metabolism, were significantly altered. Similarly, 365 genes were differentially expressed between AI and IVP embryos, affecting a number of pathways including the tight junction and TNRF-1 signaling pathways. Genes that were differentially expressed, either uniquely or in common, in IVP and SCNT embryos relative to AI embryos or fibroblast donor cells were examined to determine whether the altered transcripts were associated with errors in transcriptional reprogramming or with culture conditions. In this regard, 71 transcripts were found not to be transcriptionally reprogrammed, because their expression resembled that of the donor cells more than that of AI embryos; the remaining transcripts were reprogrammed partly or incompletely. In summary, the study found differences in gene expression, elongation size, and the corresponding molecular pathways between Day-16 IVP and SCNT conceptuses and their AI counterparts, which could be linked to the outcome of fetal development (Betsha et al. 2013).

Potential Biomarkers in Cows by Granulosa Cell Transcriptomics

Although Ovum Pick Up-In Vitro Production (OPU-IVP) of embryos is a unique reproductive method for cattle production, the intricate mechanisms driving IVP results are unknown. To discover the genes and biological processes underlying IVP-relevant traits in donor cows, RNA from the granulosa cells of Holstein cows, collected during oocyte aspiration before IVP, was sequenced, and IVP was performed independently for each animal. This led to the discovery of 56 genes that were strongly associated with IVP scores (kinetic, BL rate, and morphology). HEY2, BEX2, RGN, TXNDC11, and TNFAIP6 were negatively associated with all IVP scores, whereas STC1 and Mx1 were positively associated with all of them. Functional analysis identified a variety of biological mechanisms, such as cell growth, proliferation, and death. Four key upstream regulators (IL1, COX2, PRL, and TRIM24) are involved in these pathways. A successful IVP outcome was found to be closely linked to primary follicular atresia. It was also found that high-GI bulls can be used for breeding without lowering IVP performance. These results may contribute to the development of biomarkers from follicular fluid and to the improvement of Genomic Selection (GS) methods that use functional information in cattle breeding, allowing large-scale application of GS-IVP (Mazzoni et al. 2017).

Transcriptional Role for Biosynthesis of Milk

The global demand for high-quality milk increases every day. Data on the transcriptional and posttranscriptional control of genes coding for proteins involved in protein, lactose, and fat synthesis in the mammary gland can be used to increase the efficiency of milk production. Research in this area is still limited, but the available data clearly support this view. Milk fat synthesis appears to be controlled, particularly in bovines, via a communication network involving LXR, PPAR, and SREBP1, with possible roles for other transcription factors such as ChREBP, Sp1, and Spot14. Insulin and amino acids, together with their transporters, all play a role in milk protein synthesis via transcriptional and posttranscriptional pathways; the insulin-mTOR pathway is particularly important. Although the precise transcriptional regulation of lactose production in milk is unknown, glucose transporters play a critical role and may also interact positively with amino acid transporters and the mTOR pathway (Osorio et al. 2016).

Liver Transcriptomics in Beef Cattle

Selecting beef cattle for feed efficiency (FE) traits is vital for the economic and productive efficiency of livestock systems and for reducing their environmental impact. Eight genes are differentially expressed between animals with low and high feed efficiency (LFE and HFE, respectively). Co-expression analyses identified 34 gene modules, 4 of which were strongly associated with feed efficiency traits. These modules were mostly enriched for terms related to inflammation or the inflammatory response. LFE animals showed high levels of serum cholesterol and of GGT, a biomarker of liver injury. Liver histopathology revealed a high percentage of periportal inflammation as well as mononuclear infiltration. Taken together, the liver transcriptome network analysis and the other results indicate that LFE animals had altered lipid metabolism and increased hepatic periportal lesions associated with an inflammatory response composed mainly of mononuclear cells (Alexandre et al. 2015). In the milk production industry, feed is the principal variable cost, so improving feed efficiency makes better use of resources. One project characterized feed efficiency in dairy cattle by combining bioinformatics, genomics, and systems biology methods to connect transcriptomic differences with important traits linked to feed efficiency. Twenty cows (10 Jersey; 10 Holstein Friesian) were used for the experiment. Each breed group was divided into low and high feed efficiency groups according to feed efficiency status. mRNA was extracted from liver biopsy samples and sequenced on the Illumina HiSeq2500. Blood samples were collected for genotyping, and plasma was extracted from the blood for analysis of NEFA, glucose, β-hydroxybutyrate, urea, and triacylglycerides. Feed efficiency, expressed as the Kleiber Ratio and Residual Feed Intake and based on body weight, daily feed intake, dry matter intake, and milk production records, was also measured.
The bovine RNA-Seq gene expression data were analyzed with statistical and bioinformatics tools and systems biology methods to identify differentially expressed genes, co-expressed genes, differentially wired networks, co-expression networks, hub genes/biomarkers, and transcriptional regulatory networks for feed efficiency. The analysis provided information on molecular-level metabolic processes, nutrient partitioning, and energy balance, and delivered predictive biomarkers for feed efficiency in cattle (Salleh et al. 2018). Gram-negative bacteria such as Escherichia coli (E. coli) are regarded as major agents of severe clinical mastitis in dairy cattle (Ahmadzadeh et al. 2009). A rapid diagnostic method is needed to prevent the spread of the disease to other cows and to reduce inappropriate antibiotic use. Six microarray-based studies were analysed to examine the transcriptomic profile of the mammary gland induced by E. coli mastitis. The analysis looked beyond individual cells and also considered response to drug, response to hypoxia, anti-apoptosis, and positive regulation of transcription from the RNA polymerase II promoter, processes associated with up-regulated genes. Ten different attribute weighting algorithms were used to prioritise small groups of genes that provide significant information about E. coli mastitis. Twelve meta-genes were identified by the majority of the attribute weighting algorithms as the most informative genes: NFKBIZ, CXCL8 (IL8), HP, PDE4B, ZC3H12A, CASP4, CCL20, CXCL2, GRO1 (CXCL1), S100A9, CFB, and S100A8. Notably, all of these genes have been confirmed as key genes in inflammation, the immune response, or mastitis. Decision tree models revealed the best combination of meta-genes as a bio-signature and confirmed that some top-ranked genes (CXCL2, ZC3H12A, GRO, and CFB) act as biomarkers of E. coli mastitis, with an accuracy of 83%. This research showed that combining two data mining tools, meta-analysis and machine learning, increased the power to detect the most informative genes, which can help improve diagnosis and treatment approaches for E. coli mastitis in cattle (Sharifi et al. 2018). Selective breeding of cattle with high feed efficiency (FE) is an important goal for meat and milk production (Rotz et al. 2010). Global gene expression patterns in relevant tissues can be used to study the functions of genes that are significantly involved in the regulation of FE. To discover differentially expressed genes (DEGs) between low and high FE groups of cows (based on Residual Feed Intake, or RFI), high-throughput RNA sequencing data from liver biopsies of 19 dairy cows were studied.
The RNA-Seq gene expression results from bovine liver were examined to identify DEGs and, from them, the molecular mechanisms, pathways, and potential biomarkers of feed efficiency. A total of 57 million reads (short mRNA sequences of less than about 200 bases) were sequenced. On average, 52 million reads were mapped, and 24,616 known transcripts were quantified against the bovine reference genome. In Jersey and Holstein cows, the comparison of low and high RFI groups revealed 70 and 19 DEGs, respectively. The DEGs discovered were the GIMAP and CYP genes for Jersey and Holstein cows, respectively, which are associated with the primary immunodeficiency pathway and play a key role in feed consumption and in sugar, lipid, and protein metabolism (Salleh et al. 2017).
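The two feed efficiency measures named above can be sketched in a few lines. The example below is a deliberately simplified illustration, not the model used in the cited studies: it takes the Kleiber ratio as average daily gain divided by metabolic body weight (body weight raised to the power 0.75) and computes RFI as the residual from a one-predictor least-squares regression of dry matter intake on metabolic body weight. Published RFI models normally include further terms such as milk production, and all numbers and function names here are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def kleiber_ratio(average_daily_gain_kg, body_weight_kg):
    """Average daily gain per unit of metabolic body weight."""
    return average_daily_gain_kg / body_weight_kg ** 0.75


def residual_feed_intake(dry_matter_intake_kg, body_weight_kg):
    """Observed intake minus the intake predicted from metabolic body weight."""
    metabolic_bw = [bw ** 0.75 for bw in body_weight_kg]
    slope, intercept = fit_line(metabolic_bw, dry_matter_intake_kg)
    return [
        intake - (intercept + slope * mbw)
        for intake, mbw in zip(dry_matter_intake_kg, metabolic_bw)
    ]


def efficiency_group(rfi_values):
    """Animals eating less than predicted (negative RFI) are the efficient ones."""
    return ["high" if rfi < 0 else "low" for rfi in rfi_values]


# Hypothetical body weights (kg) and daily dry matter intakes (kg) for four cows.
body_weights = [550.0, 600.0, 650.0, 700.0]
intakes = [18.0, 21.5, 20.0, 23.0]
rfi = residual_feed_intake(intakes, body_weights)
groups = efficiency_group(rfi)
```

Grouping animals by the sign of their RFI, as in the last step, mirrors the division into low and high feed efficiency groups whose liver transcriptomes are then compared.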

Comparative Transcriptomics

Few protozoan parasites are reported to display extreme organ tropism like the flagellate Tritrichomonas foetus (Mattos et al. 1997). T. foetus infects the reproductive system of cattle, causing abortion, whereas in cats the infection results in chronic large bowel diarrhoea (Parsonson et al. 1976). In the absence of a T. foetus genome, a de novo approach was used to obtain the transcriptomes of the feline and bovine genotypes in order to uncover host-specific adaptations and virulence factors particular to each genotype. Illumina RNA-seq reads were assembled into two representative feline and bovine transcriptomes containing 36,559 and 42,363 contigs, respectively. The coding and non-coding regions of the libraries revealed striking similarities, with 24,620 shared homolog pairs reduced to 7,547 coding orthologs between the two genotypes. Despite the differences in parasite origin and host, the transcriptomes were nearly identical in their distribution of functional categories, with no indication of selective pressure acting on the orthologs. Orthologs formed a considerable proportion of the highly expressed transcripts in both genotypes (feline genotype: 56 percent, bovine genotype: 76 percent). When the libraries were mined for protease virulence factors, the cysteine proteases (CP) were found to be the most abundant. Using the MEROPS database, 483 bovine and 445 feline T. foetus transcripts were identified as candidate proteases, with 9 hits to putative protease inhibitors. In bovine T. foetus, CP8 was the preferentially transcribed CP, whereas transcription of CP7 was more abundant in the feline genotype. Gene discovery analysis of the RNA-seq data revealed striking parallels between feline and bovine T. foetus, indicating recent adaptation to host or niche. T. foetus represents a unique case of a mammalian protozoan expanding its parasitic range across distantly related host lineages. In silico drug targeting showed the significance of host range: drug targets identified for the parasite in one host are not necessarily suitable for the same parasite in a different host (Morin-Adeline et al. 2014).

Transcriptome Studies of Ovarian Granulosa Cells in Buffalo

Normal maturation of ovarian follicles and normal ovulation are important for conception and for improving fertility in buffalo (Jain et al. 2016). However, the molecular mechanisms that regulate follicle growth in buffalo remain unknown. A study was therefore carried out to analyze gene expression profiles linked with ovarian follicle growth in buffalo. RNA sequencing detected a total of 17,700 unigenes and 13,672 differentially expressed genes (DEGs). Across four stages of follicular growth, 30 DEGs common to all stages were identified with broadly synchronized expression patterns, suggesting that the products of these genes cooperate in regulating follicular development. Furthermore, KEGG and GO enrichment analyses showed that many DEGs in the initial stage of follicular growth were enriched in the ribosome and oxidative phosphorylation signaling pathways, with these DEGs upregulated at the start of follicular growth (12 mm); the analyses also indicated an important role for the immune system in the last stage of follicular maturation and ovulation. The study provided a gene expression profile of buffalo follicle growth, with insight into the biological processes involved in the molecular regulation of ovarian follicle growth (Li et al. 2017).
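Enrichment analyses such as the KEGG and GO analyses mentioned above typically ask whether the genes of a pathway are over-represented among the DEGs. The sketch below illustrates one common way to test this, a one-sided hypergeometric test, using only the Python standard library. It is not the procedure reported by Li et al. (2017): the function name, the choice of all unigenes as the background set, and the pathway counts are assumptions made for illustration; only the totals of 17,700 unigenes and 13,672 DEGs come from the text.

```python
from math import comb


def hypergeom_enrichment_p(total_genes: int, pathway_genes: int,
                           deg_count: int, overlap: int) -> float:
    """Return P(X >= overlap) for a hypergeometric draw.

    deg_count genes are drawn without replacement from total_genes,
    of which pathway_genes belong to the pathway of interest.
    """
    max_overlap = min(pathway_genes, deg_count)
    denominator = comb(total_genes, deg_count)
    tail = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, deg_count - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / denominator


# Totals from the text; the pathway size and overlap are hypothetical.
total_unigenes = 17_700
total_degs = 13_672
pathway_size = 150       # hypothetical number of pathway genes
pathway_degs = 130       # hypothetical number of those genes that are DEGs

p_value = hypergeom_enrichment_p(total_unigenes, pathway_size,
                                 total_degs, pathway_degs)
print(f"Enrichment p-value: {p_value:.3g}")
```

In practice such p-values are computed for many pathways at once and then corrected for multiple testing.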

A Beneficial Effect on Hoof Transcriptomics in Cow

Providing trace minerals in the form of more bioavailable amino acid complexes (AAC) might help reduce the occurrence of hoof disorders in peripartal dairy cows (Cha et al. 2010). The beneficial effects of peroxisome proliferator-activated receptors (PPAR) in controlling the expression of genes involved in the anti-inflammatory response, lipid metabolism, and growth of ruminants have been studied (Bionaz et al. 2013). The upregulation of PPARD and PPARA following AAC supplementation suggested that greater bioavailability of trace minerals might directly or indirectly stimulate these transcription factors. The greater mRNA abundance of PPARD or PPARG than PPARA in hoof tissue (Osorio et al. 2016) was taken to indicate that PPARA expression might be particularly important in the biology of corium tissue. To investigate claw composition, inflammation, chemotaxis, oxidative stress, and transcriptional regulation, researchers examined the effects of supplementing metal AAC during the peripartal period on the expression of 28 genes in corium tissue. Forty-four multiparous Holstein cows were fed a common diet for 30 days prior to parturition, together with an oral bolus containing inorganic trace minerals (INO) or AAC (i.e., organic) Zn, Mn, Cu, and Co to achieve supplementation levels of 75, 65, 11, and 1 ppm, respectively, based on the dry matter of the total diet. Inorganic trace minerals were delivered in sulfate form, and AAC were delivered via Availa Mn, Availa Zn, Availa Cu, and COPRO (Zinpro Corp., Eden Prairie, MN). Locomotion score was recorded before enrollment and weekly throughout the experiment. The incidence of hoof health problems at 30 days in milk was assessed before a hoof biopsy of a subset of cows (AAC = 9, INO = 10). Locomotion score did not differ between treatments in the prepartum or postpartum period. The incidence of heel horn erosion was lower in AAC cows, but the incidence of sole ulcers did not differ. Downregulation of CTH, KRT5, CYBB, and CALML5 and upregulation of BTD in AAC cows indicated a decrease in the need to activate cellular pathways to stimulate corium tissue and to enhance biotin availability in the sole claw. These molecular changes in the sole may have been triggered by the lower incidence of heel erosion in response to AAC. Among the genes associated with oxidative stress, AAC cows had greater expression of NFE2L2, a transcription factor that regulates the antioxidant enzyme SOD1 and the antioxidant response. Among the genes linked with inflammation, AAC cows had greater TLR4 expression and lower expression of TLR2, TNF, and IL1B than INO cows. Supplementation with metal AAC during the peripartal period affected the expression of genes involved in oxidative stress, composition, and inflammation status in the corium (Osorio et al. 2016).

Whole Blood Transcriptomics in Pigs

The molecular mechanisms underlying feed efficiency need to be better understood in order to improve animal efficiency, a research priority in support of competitive and sustainable livestock production. A study was carried out to describe how differences in feed efficiency or in consumed nutrients affect the pig blood transcriptome. Growing pigs from two lines divergently selected for residual feed intake (RFI) and fed isocaloric and isoproteic diets contrasted in energy source and nutrients were considered. Between the ages of 74 and 132 days, pigs (n = 12 per line and diet) were fed either a standard diet high in cereals and low in fat (LF) or a diet in which cereals were partially replaced by fibres and lipids (HF). Neither diet nor line affected the total number of white blood cells at the end of the feeding trial. However, the number of red blood cells was higher (P < 0.001) in low-RFI pigs than in high-RFI pigs. In the whole-blood transcriptome assessed with a porcine microarray, more probes were differentially expressed (DE) between RFI lines than between diets (2,154 versus 92 DE probes, P < 0.01). When low-RFI pigs were compared with high-RFI pigs, 528 genes were overexpressed and 477 genes were underexpressed. The overexpressed genes were mostly involved in translational elongation. The underexpressed genes were involved in the inflammatory response, the immune response, cell structure, and anti-apoptosis processes. These findings indicate that selection for RFI has influenced the status of immune and defense mechanisms in pigs. Genes that were DE between the diets were largely associated with lipid metabolism and the immune system. The study demonstrated the usefulness of the blood transcriptome for identifying the main biological processes influenced by feeding strategies and genetic selection (Jégou et al. 2016).

Transcriptomics and Ruminant Methanogenesis

Methane (CH4) emissions account for 40-45 percent of greenhouse gas emissions from ruminant animals, with enteric fermentation accounting for about 90 percent of those emissions (McAllister and Newbold 2008). The conversion of carbon dioxide to CH4 is important for successful fermentation in ruminants because it prevents the accumulation of reducing equivalents in the rumen. Methanogens have a symbiotic relationship with rumen fungi and protozoa and are found in biofilms associated with feed and the rumen wall. Transcriptomics and genomics are becoming increasingly important in characterizing the ecology of ruminal methanogenesis and in identifying mitigation options. Metagenomic approaches have revealed differences in the abundance and species composition of the methanogen community among ruminants that vary in feed efficiency, CH4 emissions, and response to CH4 mitigators. The genomes of rumen methanogens have been sequenced, providing insight into surface proteins that may benefit vaccine development and an assembled set of metabolic pathways for use in chemogenomic approaches to reducing ruminal CH4 emissions. In a number of anaerobic environments, transcriptomics has been used to identify variations in RNA abundance related to methanogens or methanogenesis. One study examined the transcriptome of rumen methanogens (Shi et al. 2014). Transcriptomic investigations are useful for understanding the methanogenesis process as well as the ecological and physiological adaptations of methanogens to their surroundings. Freitag and Prosser (2009) investigated the dynamics of environmental methanogens using soil transcriptome analysis. Although there was no association between transcript abundance and methanogenesis, there was a linear correlation (r² = 0.79) between the mcrA transcript-to-gene ratio and the rate of CH4 production in peat soil (Freitag and Prosser 2009). Metagenomic and metatranscriptomic analyses of whole rumen microbial communities are giving fresh insight into how methanogens interact with other components of their environment and into how these interactions might be altered to reduce methanogenesis. Identifying community members that produce antimethanogen agents, which inhibit or kill methanogens, might point to new mitigation approaches. One example is the discovery of a lytic archaeophage that lyses only methanogens. Because of a dearth of sequence information relevant to the rumen microbial community, efforts to use genetic data to modify methanogenesis have been limited (McAllister et al. 2015).
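To make the reported statistic concrete, the sketch below shows how a coefficient of determination such as the value of 0.79 cited above can be computed for a simple linear relationship. The data points are invented for illustration and are not taken from Freitag and Prosser (2009); `linear_r_squared` is a hypothetical helper written with the Python standard library only.

```python
from statistics import mean


def linear_r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation, equal to r^2 of a simple linear fit."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain some variation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical data: mcrA transcript-to-gene ratios and CH4 production rates.
transcript_gene_ratios = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ch4_production_rates = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]

r2 = linear_r_squared(transcript_gene_ratios, ch4_production_rates)
print(f"r^2 = {r2:.2f}")
```

An r² of 0.79 would mean that about 79 percent of the variance in the CH4 production rate is accounted for by a linear function of the transcript-to-gene ratio.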

A Transcriptomics Approach in Arabidopsis Thaliana

A subset of plant defenses is disrupted by loss of Arabidopsis thaliana PARG1 (poly(ADP-ribose) glycohydrolase) or by pharmacological inhibition of poly(ADP-ribose) polymerase (PARP). The impact of altered poly(ADP-ribosyl)ation on early gene expression induced by the microbe-associated molecular patterns (MAMPs) flagellin (flg22) and EF-Tu (elf18) was examined. Statistical analysis and filtering revealed 178 genes whose MAMP-induced mRNA abundance patterns were changed by the PARP inhibitor 3-aminobenzamide (3AB) or by PARG1 knockout. From these 178 genes, approximately fifty Arabidopsis T-DNA insertion lines were chosen and examined for basal defense responses. Among the lines examined were knockouts of At3g55630 (FPGS3, a cytosolic folylpolyglutamate synthetase), At1g47370 (a TIR-X (Toll-Interleukin Receptor domain)), and At5g64060 (a predicted pectin methylesterase inhibitor). “Innate immune response” was an overrepresented GO term in the elf18/parg1 gene expression study, highlighting a subgroup of elf18-activated, defense-associated genes with altered expression in parg1 plants. The study also provided a tightly controlled comparison of early mRNA abundance responses to elf18 and flg22 in wild-type Arabidopsis, which revealed several differences. The PARP inhibitor 3-methoxybenzamide (3MB), which has pleiotropic effects, was also used in gene expression profiling. This transcriptomics study identified MAMP-induced plant immune responses, the impact of PARP inhibitors, and the molecular mechanisms by which poly(ADP-ribosyl)ation controls MAMP-induced responses in plants as areas for further exploration (Briggs et al. 2017).

Conclusion

Over the previous few decades, improvements in sequencing technologies have raised our expectations about what can be investigated. These advanced technologies have made it possible to explore the intricacies of life in great depth. By applying these sequencing technologies to RNA research, we have gained a deeper understanding of the expression of genomes. The last few years have seen a tremendous rise in the number of sequenced transcriptomes from a wide range of organisms. This has been made possible by the use of NGS-based technologies in RNA sequencing research. While technological advances have made the large-scale generation of transcriptome data possible, it is also vital to recognize that particular methodological and analytical obstacles hindering the standardization of transcriptomics still persist. Hopefully, these problems will be addressed in the future as this field of research matures.

References

Ahmadzadeh, A., F. Frago, B. Shafii, J. C. Dalton, W. J. Price, and M. A. McGuire. “Effect of clinical mastitis and other diseases on reproductive performance of Holstein cows.” Animal reproduction science 112, no. 3-4 (2009): 273-282. https://doi.org/10.1016/j.anireprosci.2008.04.024.

Alexandre, Pamela A., Lisette J. A. Kogelman, Miguel H. A. Santana, Danielle Passarelli, Lidia H. Pulz, Paulo Fantinato-Neto, Paulo L. Silva et al. “Liver transcriptomic networks reveal main biological processes associated with feed efficiency in beef cattle.” BMC genomics 16, no. 1 (2015): 1-13. https://doi.org/10.1186/s12864-015-2292-8.

Arnal, Carmen, Jose M. Lou‐Bonafonte, María V. Martínez‐Gracia, María J. Rodríguez‐Yoldi, and Jesús Osada. “Transcriptomics and nutrition in mammalians.” Genomics, proteomics and metabolomics in nutraceuticals and functional foods (2015): 581-608. https://doi.org/10.1002/9781118930458.ch46.

Assou, Said, Imène Boumela, Delphine Haouzi, Tal Anahory, Hervé Dechaud, John De Vos, and Samir Hamamah. “Dynamic changes in gene expression during human early embryo development: from fundamental aspects to clinical applications.” Human reproduction update 17, no. 2 (2011): 272-290. https://doi.org/10.1093/humupd/dmq036.

Barbulovic-Nad, Irena, Michael Lucente, Yu Sun, Mingjun Zhang, Aaron R. Wheeler, and Markus Bussmann. “Bio-microarray fabrication techniques—a review.” Critical reviews in biotechnology 26, no. 4 (2006): 237-259. https://doi.org/10.1080/07388550600978358.

Betsha, S., M. Hoelker, D. Salilew-Wondim, E. Held, F. Rings, C. Grosse-Brinkhause, M. U. Cinar, V. Havlicek, U. Besenfelder, E. Tholen, C. Looft, K. Schellander and D. Tesfaye. “Transcriptome profile of bovine elongated conceptus obtained from SCNT and IVP pregnancies.” Molecular reproduction and development 80 (2013): 333. https://doi.org/10.1002/mrd.22165.

Bionaz, Massimo, Shuowen Chen, Muhammad J. Khan, and Juan J. Loor. “Functional role of PPARs in ruminants: potential targets for fine-tuning metabolism during growth and lactation.” PPAR research 2013 (2013). https://doi.org/10.1155/2013/684159.

Briggs, Amy G., Lori C. Adams-Phillips, Brian D. Keppler, Sophia G. Zebell, Kyle C. Arend, April A. Apfelbaum, Joshua A. Smith, and Andrew F. Bent. “A transcriptomics approach uncovers novel roles for poly(ADP-ribosyl)ation in the basal defense response in Arabidopsis thaliana.” PLoS One 12, no. 12 (2017): e0190268. https://doi.org/10.1371/journal.pone.0190268.

Camarena, Laura, Vincent Bruno, Ghia Euskirchen, Sebastian Poggio, and Michael Snyder. “Molecular mechanisms of ethanol-induced pathogenesis revealed by RNA-sequencing.” PLoS pathogens 6, no. 4 (2010): e1000834. https://doi.org/10.1371/journal.ppat.1000834.

Cha, E., J. A. Hertl, D. Bar, and Y. T. Gröhn. “The cost of different types of lameness in dairy cows calculated by dynamic programming.” Preventive veterinary medicine 97, no. 1 (2010): 1-8. https://doi.org/10.1016/j.prevetmed.2010.07.011.

Close, Timothy J., Steve I. Wanamaker, Rico A. Caldo, Stacy M. Turner, Daniel A. Ashlock, Julie A. Dickerson, Rod A. Wing, Gary J. Muehlbauer, Andris Kleinhofs, and Roger P. Wise. “A new resource for cereal genomics: 22K barley GeneChip comes of age.” Plant Physiology 134, no. 3 (2004): 960-968. https://doi.org/10.1104/pp.103.034462.

Costa, Valerio, Marianna Aprile, Roberta Esposito, and Alfredo Ciccodicola. “RNA-Seq and human complex diseases: recent accomplishments and future perspectives.” European Journal of Human Genetics 21, no. 2 (2013): 134-142. https://doi.org/10.1038/ejhg.2012.129.

Driver, Ashley M., Francisco Peñagaricano, Wen Huang, Khawaja R. Ahmad, Katie S. Hackbart, Milo C. Wiltbank, and Hasan Khatib. “RNA-Seq analysis uncovers transcriptomic variations between morphologically similar in vivo- and in vitro-derived bovine blastocysts.” BMC genomics 13, no. 1 (2012): 1-9. https://doi.org/10.1186/1471-2164-13-118.

Freitag, Thomas E., and James I. Prosser. “Correlation of methane production and functional gene transcriptional activity in a peat soil.” Applied and Environmental Microbiology 75, no. 21 (2009): 6679-6687. https://doi.org/10.1128/AEM.01021-09.

Garg, Rohini, Rama Shankar, Bijal Thakkar, Himabindu Kudapa, Lakshmanan Krishnamurthy, Nitin Mantri, Rajeev K. Varshney, Sabhyata Bhatia, and Mukesh Jain. “Transcriptome analyses reveal genotype- and developmental stage-specific molecular responses to drought and salinity stresses in chickpea.” Scientific reports 6, no. 1 (2016): 1-15. https://doi.org/10.1038/srep19228.

Herrera-Marcos, Luis V., José M. Lou-Bonafonte, Carmen Arnal, María A. Navarro, and Jesús Osada. “Transcriptomics and the mediterranean diet: A systematic review.” Nutrients 9, no. 5 (2017): 472. https://doi.org/10.3390/nu9050472.

Hüttenhofer, Alexander, Peter Schattner, and Norbert Polacek. “Non-coding RNAs: hope or hype?” TRENDS in Genetics 21, no. 5 (2005): 289-297. https://doi.org/10.1016/j.tig.2005.03.007.

Jain, A., T. Jain, P. Kumar, M. Kumar, S. De, M. Gohain, R. Kumar, and T. K. Datta. “Follicle-stimulating hormone–induced rescue of cumulus cell apoptosis and enhanced development ability of buffalo oocytes.” Domestic animal endocrinology 55 (2016): 74-82. https://doi.org/10.1016/j.domaniend.2015.10.007.

Jégou, Maëva, Florence Gondret, Annie Vincent, Christine Trefeu, Hélène Gilbert, and Isabelle Louveau. “Whole blood transcriptomics is relevant to identify molecular changes in response to genetic selection for feed efficiency and nutritional status in the pig.” PloS one 11, no. 1 (2016): e0146550. https://doi.org/10.1371/journal.pone.0146550.

Keogh, Kate, David A. Kenny, Paul Cormican, Alan K. Kelly, and Sinead M. Waters. “Effect of dietary restriction and subsequent re-alimentation on the transcriptional profile of hepatic tissue in cattle.” BMC genomics 17, no. 1 (2016): 1-16. https://doi.org/10.1186/s12864-016-2578-5.

Khurana, Ekta, Yao Fu, Dimple Chakravarty, Francesca Demichelis, Mark A. Rubin, and Mark Gerstein. “Role of non-coding sequence variants in cancer.” Nature Reviews Genetics 17, no. 2 (2016): 93-108. https://doi.org/10.1038/nrg.2015.17.

Lee, Je Hyuk, Evan R. Daugharthy, Jonathan Scheiman, Reza Kalhor, Joyce L. Yang, Thomas C. Ferrante, Richard Terry et al. “Highly multiplexed subcellular RNA sequencing in situ.” Science 343, no. 6177 (2014): 1360-1363. https://doi.org/10.1126/science.1250212.

Li, J., Z. Li, S. Liu, R. Zia, A. Liang, and L. Yang. “Transcriptome studies of granulosa cells at different stages of ovarian follicular development in buffalo.” Animal reproduction science 187 (2017): 181-192. https://doi.org/10.1016/j.anireprosci.2017.11.004.

Marra, Marco A., Ladeana Hillier, and Robert H. Waterston. “Expressed sequence tags—ESTablishing bridges between genomes.” Trends in Genetics 14, no. 1 (1998): 4-7.

Mattos, A., A. M. Sole-Cava, G. DeCarli, and M. Benchimol. “Fine structure and isozymic characterization of trichomonadid protozoa.” Parasitology research 83, no. 3 (1997): 290-295. https://doi.org/10.1007/s004360050249.

Mazzoni, Gianluca, Suraya M. Salleh, Kristine Freude, Hanne S. Pedersen, Lotte Stroebech, Henrik Callesen, Poul Hyttel, and Haja N. Kadarmideen. “Identification of potential biomarkers in donor cows for in vitro embryo production by granulosa cell transcriptomics.” PloS one 12, no. 4 (2017): e0175464. https://doi.org/10.1371/journal.pone.0175464.

McAllister, T. A., and C. J. Newbold. “Redirecting rumen fermentation to reduce methanogenesis.” Australian Journal of Experimental Agriculture 48, no. 2 (2008): 7-13. https://doi.org/10.1071/EA07218.

McAllister, T. A., S. J. Meale, E. Valle, L. L. Guan, M. Zhou, W. J. Kelly, G. Henderson, G. T. Attwood, and P. H. Janssen. “Ruminant nutrition symposium: use of genomics and transcriptomics to identify strategies to lower ruminal methanogenesis.” Journal of animal science 93, no. 4 (2015): 1431-1449. https://doi.org/10.2527/jas.2014-8329.

Mok, Sachel, Elizabeth A. Ashley, Pedro E. Ferreira, Lei Zhu, Zhaoting Lin, Tomas Yeo, Kesinee Chotivanich et al. “Population transcriptomics of human malaria parasites reveals the mechanism of artemisinin resistance.” Science 347, no. 6220 (2015): 431-435. https://doi.org/10.1126/science.1260403.

Morin-Adeline, Victoria, Rodrigo Lomas, Denis O’Meally, Colin Stack, Ana Conesa, and Jan Šlapeta. “Comparative transcriptomics reveals striking similarities between the bovine and feline isolates of Tritrichomonas foetus: consequences for in silico drug-target identification.” BMC genomics 15, no. 1 (2014): 1-18. https://doi.org/10.1186/1471-2164-15-955.

Noller, Harry F. “Ribosomal RNA and translation.” Annual review of biochemistry 60, no. 1 (1991): 191-227. https://doi.org/10.1146/annurev.bi.60.070191.001203.

Osorio, J. S., Erminio Trevisi, C. Li, J. K. Drackley, M. T. Socha, and J. J. Loor. “Supplementing Zn, Mn, and Cu from amino acid complexes and Co from cobalt glucoheptonate during the peripartal period benefits postpartal cow performance and blood neutrophil function.” Journal of dairy science 99, no. 3 (2016): 1868-1883. https://doi.org/10.3168/jds.2015-10040.

Osorio, J. S., F. Batistel, E. F. Garrett, M. M. Elhanafy, M. R. Tariq, M. T. Socha, and J. J. Loor. “Corium molecular biomarkers reveal a beneficial effect on hoof transcriptomics in peripartal dairy cows supplemented with zinc, manganese, and copper from amino acid complexes and cobalt from cobalt glucoheptonate.” Journal of dairy science 99, no. 12 (2016): 9974-9982. https://doi.org/10.3168/jds.2015-10698.

Osorio, Johan S., Jayant Lohakare, and Massimo Bionaz. “Biosynthesis of milk fat, protein, and lactose: roles of transcriptional and posttranscriptional regulation.” Physiological genomics 48, no. 4 (2016): 231-256. https://doi.org/10.1152/physiolgenomics.00016.2015.

Ozsolak, Fatih, and Patrice M. Milos. “RNA sequencing: advances, challenges and opportunities.” Nature reviews genetics 12, no. 2 (2011): 87-98. https://doi.org/10.1038/nrg2934.

Parsonson, I. M., B. L. Clark, and J. H. Dufty. “Early pathogenesis and pathology of Tritrichomonas foetus infection in virgin heifers.” Journal of comparative pathology 86, no. 1 (1976): 59-66. https://doi.org/10.1016/0021-9975(76)90028-1.

Rotz, C. A., Felipe Montes, and D. S. Chianese. “The carbon footprint of dairy production systems through partial life cycle assessment.” Journal of dairy science 93, no. 3 (2010): 1266-1282. https://doi.org/10.3168/jds.2009-2162.

Salleh, M. S., Gianluca Mazzoni, J. K. Höglund, D. W. Olijhoek, P. Lund, P. Løvendahl, and H. N. Kadarmideen. “RNA-Seq transcriptomics and pathway analyses reveal potential regulatory genes and molecular mechanisms in high- and low-residual feed intake in Nordic dairy cattle.” BMC genomics 18, no. 1 (2017): 1-17. https://doi.org/10.1186/s12864-017-3622-9.

Salleh, Suraya Mohamad. “A transcriptomics and systems biology approach to identify candidate genes and biological pathways determining residual feed intake in Danish dairy cattle.”

Sharifi, Somayeh, Abbas Pakdel, Mansour Ebrahimi, James M. Reecy, Samaneh Fazeli Farsani, and Esmaeil Ebrahimie. “Integration of machine learning and meta-analysis identifies the transcriptomic bio-signature of mastitis disease in cattle.” PloS one 13, no. 2 (2018): e0191227. https://doi.org/10.1371/journal.pone.0191227.

Shi, Weibing, Christina D. Moon, Sinead C. Leahy, Dongwan Kang, Jeff Froula, Sandra Kittelmann, Christina Fan et al. “Methane yield phenotypes linked to differential gene expression in the sheep rumen microbiome.” Genome research 24, no. 9 (2014): 1517-1525. https://doi.org/10.1101/gr.168245.113.

Subramanian, Aravind, Pablo Tamayo, Vamsi K. Mootha, Sayan Mukherjee, Benjamin L. Ebert, Michael A. Gillette, Amanda Paulovich et al. “Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.” Proceedings of the National Academy of Sciences 102, no. 43 (2005): 15545-15550. https://doi.org/10.1073/pnas.0506580102.

Tachibana, Chris. “Transcriptomics today: Microarrays, RNA-seq, and more.” Science 349, no. 6247 (2015): 544-546. https://doi.org/10.1126/science.349.6247.544.

Uhlén, Mathias, Linn Fagerberg, Björn M. Hallström, Cecilia Lindskog, Per Oksvold, Adil Mardinoglu, Åsa Sivertsson et al. “Tissue-based map of the human proteome.” Science 347, no. 6220 (2015). https://doi.org/10.1126/science.1260419.

Velculescu, Victor E., Lin Zhang, Bert Vogelstein, and Kenneth W. Kinzler. “Serial analysis of gene expression.” Science 270, no. 5235 (1995): 484-487. https://doi.org/10.1126/science.270.5235.484.

Verbruggen, Nathalie, Christian Hermans, and Henk Schat. “Molecular mechanisms of metal hyperaccumulation in plants.” New phytologist 181, no. 4 (2009): 759-776. https://doi.org/10.1111/j.1469-8137.2008.02748.x.

Wang, Zhong, Mark Gerstein, and Michael Snyder. “RNA-Seq: a revolutionary tool for transcriptomics.” Nature reviews genetics 10, no. 1 (2009): 57-63. https://doi.org/10.1038/nrg2484.

Chapter 4

Animal Proteomics: An Overview

Abstract

Proteomics entails the parallel investigation of many different aspects of proteins, with the goal of obtaining thorough assessments of the structure, activities, and regulation of biological processes in both normal and disease conditions. Advances in proteomics approaches and technology have extended the spectrum of biological investigation from the reductionist biochemical examination of individual proteins to proteome-wide measurements. Researchers use proteomics-based techniques for a variety of purposes, including the identification of diagnostic markers, pathogenicity mechanisms, and vaccine candidates, the study of changes in expression profiles in response to a variety of signals, and the analysis of protein pathways in different diseases. The application of proteomics in animal sciences is termed animal proteomics. Animal husbandry and wellbeing are critical economic factors, especially when it comes to the global food supply. Proteomics in animals and veterinary medicine is a rapidly developing subject with enormous potential, not just for basic and applied research into the physiology and pathophysiology of domestic animals, but also for translational implications in human disease studies. While there are some obvious technological barriers to its development, these could be addressed now and in the coming years through greater participation from proteomics researchers, which will broaden access to species-specific proteins and increase the volume of data in bioinformatics databases.

Keywords: proteomics, proteome-wide measures, animal sciences, animal husbandry, translational implications

Introduction

Proteomics is the large-scale study of proteomes, where a proteome is the entire set of proteins produced in a particular organism, system, or biological context. Examples include the proteome of a species, such as Homo sapiens (humans), or the proteome of an organ, such as the liver. Every cell has its own unique proteome, which changes over time. To some extent, the proteome reflects the underlying transcriptome. However, protein activity, which is commonly assessed by the reaction rate of the processes in which the protein takes part, is influenced by a variety of factors, including the expression level of the relevant gene. Proteins are vital components of living organisms and serve a variety of functions. In 1997, the term proteomics was coined by analogy with genomics, which entails studying the entire genome (James 1997). Marc Wilkins, a PhD student at Macquarie University at that time, coined the term proteome, a blend of protein and genome. In 1995, Macquarie University established the first dedicated proteomics laboratory, the Australian Proteome Analysis Facility (APAF), in Australia (Williams and Walsh 1996). The proteome is the collection of proteins produced or modified by a biological system or organism, and it varies with time and with the demands and stresses that a cell or organism encounters (Anderson et al. 2016). Proteomics is an interdisciplinary field that has profited immensely from the genetic data of the Human Genome Project (Hood and Rowen 2013). It also covers the investigation of proteomes at the overall level of intracellular protein composition, structure, and activity. Proteomics plays a vital role in functional genomics as well. After genomics and transcriptomics, proteomics forms the next phase in the examination of biological systems, and it is more challenging than genomics because the genome of an individual is more or less constant, whereas the proteome differs from cell to cell and over time.

Post-Translational Modifications

Protein diversity does not arise only during translation from mRNA. After translation, a wide range of chemical modifications also generates different protein forms, and many of these post-translational modifications are vital for protein function.

Phosphorylation

Phosphorylation is one of these post-translational modifications; it occurs in many enzymes and structural proteins during cell signaling. Phosphorylation is the addition of a phosphate group to a specific amino acid, most commonly serine or threonine (Olsen et al. 2006), mediated by serine/threonine kinases, or more rarely tyrosine, mediated by tyrosine kinases. The modification causes the protein to become a target for binding by, or interaction with, a distinct set of other proteins that recognize the phosphorylated domain. Protein phosphorylation is among the most commonly investigated protein modifications, and numerous “proteomic” studies have focused on identifying the set of phosphorylated proteins within a particular cell or tissue type under various conditions. This tells researchers which signaling pathways may be active under the conditions of the experiment.
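The residue rule described above can be illustrated with a short sketch that lists every serine, threonine, and tyrosine in a sequence as a candidate phosphorylation site and computes the mass a peptide would have after phosphorylation. This is only an illustration: actual site identification relies on experimental evidence such as mass spectrometry, the helper names and example peptide are hypothetical, and the mass shift used is the standard monoisotopic value for one added phosphate group (HPO3, about 79.966 Da).

```python
# Monoisotopic mass added by one phosphate group (HPO3), in daltons.
PHOSPHO_MASS_SHIFT_DA = 79.96633

# Residues that can accept a phosphate group, as described in the text.
PHOSPHO_ACCEPTORS = {"S": "serine", "T": "threonine", "Y": "tyrosine"}


def candidate_phosphosites(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position, residue) for every S, T, or Y."""
    return [
        (position, residue)
        for position, residue in enumerate(sequence.upper(), start=1)
        if residue in PHOSPHO_ACCEPTORS
    ]


def phosphorylated_mass(unmodified_mass_da: float, n_phosphates: int) -> float:
    """Mass of a peptide after adding n_phosphates phosphate groups."""
    if n_phosphates < 0:
        raise ValueError("n_phosphates must not be negative")
    return unmodified_mass_da + n_phosphates * PHOSPHO_MASS_SHIFT_DA


peptide = "MSTAYKR"  # hypothetical peptide used only for illustration
for position, residue in candidate_phosphosites(peptide):
    print(f"{PHOSPHO_ACCEPTORS[residue]} at position {position}")
```

A mass shift of this size between a modified and an unmodified peptide is what mass spectrometry-based phosphoproteomics looks for.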

Ubiquitination

Ubiquitin is a small protein that is attached to specific protein substrates by enzymes called E3 ubiquitin ligases. Determining the fate of poly-ubiquitinated proteins helps in understanding how protein pathways are controlled, so this constitutes another relevant “proteomic” study. Similarly, once a researcher has determined which substrates are ubiquitinated by which ligase, it is helpful to determine which ligases are expressed in which cell type.

Further Modifications

In addition to phosphorylation and ubiquitination, proteins are subject to a variety of other modifications, such as acetylation, glycosylation, methylation, nitrosylation, and oxidation. Some proteins undergo all of these modifications, often in time-dependent combinations, which adds to the difficulty of studying protein structure and function.

Proteomics Databases

Table 2. Proteomics databases

Sr. No. | Name | Description
1 | PSORT (WWW Server) | PSORT is a program that predicts protein localization sites in cells. It accepts amino acid sequence information, including the source of the sequence, and analyses it using predefined rules for various sequence properties.
2 | PRoteomics IDEntifications database (PRIDE) | PRIDE is a centralized, standards-compliant public data repository for proteomics data.
3 | Laboratory of Mass Spectrometry and Gaseous Ion Chemistry | It includes multiple tools for protein sequence searching, retrieval, and analysis, for finding modifications, and for identifying proteins by matching mass spectra.
4 | The Global Proteome Machine Organization | The Global Proteome Machine Organization was established to enable proteomics scientists who use tandem mass spectrometry to analyze proteomes with the resulting data.
5 | PeptideAtlas | PeptideAtlas is an online database of peptides that also contains a vast amount of proteomics information obtained by tandem mass spectrometry.
6 | Open Proteomics Database | OPD is a publicly accessible database for the storage and dissemination of mass spectrometry-based proteomics data. Approximately 3,000,000 spectra are currently stored in this database.
7 | pepMapper | pepMapper, from UMIST, is a protein search tool that uses mass spectrometry data produced by digestion of a protein.
8 | Protein Prospector | It is an online tool for mining sequence databases in combination with mass spectrometry investigations in proteomics.
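Several of the tools in Table 2, such as pepMapper, search with mass spectrometry data produced by digesting a protein. As a rough illustration of the first step of such a search, the sketch below performs an in-silico tryptic digest and computes monoisotopic peptide masses. It assumes the commonly used rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P), and it allows no missed cleavages. It is not the algorithm of any tool listed in the table, and the function names and example sequence are hypothetical.

```python
import re

WATER_DA = 18.01056  # monoisotopic mass of H2O added to each free peptide

# Standard monoisotopic residue masses of the 20 common amino acids (Da).
RESIDUE_MASS_DA = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def tryptic_digest(sequence: str) -> list[str]:
    """Split after K or R, except when the next residue is P."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence.upper())
    return [piece for piece in pieces if piece]


def peptide_mass(peptide: str) -> float:
    """Monoisotopic mass of an unmodified peptide in daltons."""
    try:
        residues = sum(RESIDUE_MASS_DA[aa] for aa in peptide.upper())
    except KeyError as err:
        raise ValueError(f"unknown residue: {err.args[0]}") from None
    return residues + WATER_DA


protein = "MKTAYIAKQRPLEK"  # hypothetical sequence for illustration only
for peptide in tryptic_digest(protein):
    print(f"{peptide}\t{peptide_mass(peptide):.4f} Da")
```

A search tool would then compare such theoretical masses with the measured peptide masses, within a mass tolerance, to rank candidate proteins.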

Detection of Proteins

In proteomics, there are a variety of approaches for studying proteins. Proteins are routinely detected either with antibodies (immunoassays) or with mass spectrometry. When a complex biological sample is analyzed, either a highly specific antibody is required for quantitative dot blot (QDB) analysis, or a biochemical separation is needed before detection and quantification, because the sample contains too many analytes.

Using Antibodies for Detection of Proteins

In cell biology and biochemistry research, antibodies specific to certain proteins or to their modified forms are widely used; they are now among the most common tools of molecular biologists. Protein detection can be done with a number of different approaches. For decades, the enzyme-linked immunosorbent assay (ELISA) has been used for the detection and quantification of proteins in sample materials. The Western blot may also be used to detect and quantify individual proteins: a complex protein mixture is first separated by SDS-PAGE, and the protein of interest is then identified with an antibody. Modified proteins are studied by raising an antibody specific to that modification. For instance, antibodies that recognize certain proteins only when they are phosphorylated on tyrosine are referred to as phospho-specific antibodies. Similarly, antibodies specific to other modifications exist and can be used to identify the set of proteins that carry a given modification.

Disease detection at the molecular level is driving the emerging revolution in early diagnosis and treatment. Early detection is difficult because protein biomarkers are present in low abundance. The lower limit of detection of conventional immunoassay technology is the upper femtomolar range (10^-13 M). Digital immunoassay technology has improved detection sensitivity by three logs, to the attomolar range (10^-16 M), which has the potential to open new doors in diagnostics and therapeutics. However, these techniques are not yet well suited to routine use (Wilson et al. 2016).
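To make the two detection limits concrete, the following minimal Python sketch converts a molar concentration into a count of analyte molecules. The 100 microlitre sample volume is a hypothetical value chosen only for illustration; the two concentrations are the ones quoted above.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def molecules_in_volume(molar_conc: float, volume_litres: float) -> float:
    """Return the number of analyte molecules in a sample of the given volume."""
    return molar_conc * volume_litres * AVOGADRO


conventional_limit = 1e-13  # M, conventional immunoassay (from the text)
digital_limit = 1e-16       # M, digital immunoassay (from the text)
sample_volume = 100e-6      # litres; hypothetical 100 microlitre sample

for label, conc in [("conventional", conventional_limit),
                    ("digital", digital_limit)]:
    count = molecules_in_volume(conc, sample_volume)
    print(f"{label}: about {count:.3g} molecules in the sample")
```

The ratio of the two limits is the "three logs" (a factor of 1000) mentioned in the text.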

Antibody-Free Detection

In molecular biology, proteins can also be detected without antibodies. These approaches have a number of benefits: they can often determine the sequence of a protein or peptide, they may offer higher throughput than antibody-based detection, and they can identify and quantify proteins for which no antibody exists.

Methods for Detection

Edman degradation, introduced in 1967, is one of the oldest methods for protein analysis. A single peptide is subjected to repeated steps of chemical degradation to resolve its sequence. Higher-throughput technologies have since taken its place, and recent approaches rely on mass spectrometry. The invention of “soft ionization” techniques such as matrix-assisted laser desorption/ionization (MALDI) and electrospray ionization (ESI) made mass spectrometry-based protein analysis possible.
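As an illustration of what a mass spectrometer measures for a peptide, the sketch below computes a monoisotopic peptide mass and the mass-to-charge ratio (m/z) of its protonated ions, as produced by ESI. The residue masses are standard monoisotopic values; the function names are hypothetical, and the example peptide is one of the signature peptides quoted later in this chapter.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # one water is added for the free N- and C-termini
PROTON = 1.007276   # mass of the charge-carrying proton


def monoisotopic_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(peptide: str, charge: int) -> float:
    """m/z of the peptide carrying `charge` protons."""
    return (monoisotopic_mass(peptide) + charge * PROTON) / charge


peptide = "LAFNPTQLEGQCHV"
for z in (1, 2, 3):
    print(f"{peptide} [M+{z}H]{z}+ m/z = {mz(peptide, z):.4f}")
```

The sketch ignores modifications such as cysteine alkylation, which shift the mass in real experiments.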


Separation Methods

The analysis of complex biological samples requires a reduction of sample complexity, which can be achieved off-line by one- or two-dimensional separation. More recently, on-line methods have emerged in which individual peptides (in bottom-up proteomics approaches) are separated by reversed-phase chromatography and then directly ionized by ESI. The direct coupling of separation and analysis explains the term “on-line”.
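Bottom-up workflows start from peptides produced by enzymatic digestion. The following sketch performs a simple in silico tryptic digest under the commonly used simplifying rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The function name is hypothetical, missed cleavages are ignored, and the input is the internal-standard sequence quoted later in this chapter with its label marks dropped.

```python
def tryptic_digest(sequence: str) -> list[str]:
    """Split a protein sequence after K or R, except when followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal peptide that does not end in K or R.
        peptides.append(sequence[start:])
    return peptides


print(tryptic_digest("LKALPMHIRLAFNPTQLEGQCHV"))
```

Each resulting peptide would then be separated by reversed-phase chromatography and ionized, as described above.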

Hybridization of the Technologies

Several hybrid technologies combine an antibody-based step with subsequent mass spectrometric analysis for identification and quantification. For instance, Randall Nelson created the mass spectrometric immunoassay (MSIA) approach in 1995, and Leigh Anderson introduced the Stable Isotope Standard Capture with Anti-Peptide Antibodies (SISCAPA) technique in 2004.

Modern Research Methodologies

Fluorescence two-dimensional differential gel electrophoresis (2-D DIGE) may be used for quantification. Quantifying the variation in the 2-D DIGE process can help establish statistically valid thresholds for assigning quantitative changes between samples.

Comparative proteome analysis can reveal the function of proteins in complex biological systems such as reproduction. The insecticide triazophos, for instance, increases the content of male accessory gland proteins (Acps) in brown planthoppers (Nilaparvata lugens (Stål)). These proteins are passed to females during mating and increase fecundity, that is, the number of offspring a female produces. Researchers used comparative proteome analysis of mated N. lugens females to identify the changes in accessory gland proteins and reproductive proteins that females received from males. The findings indicated that these proteins play a role in the reproductive process of N. lugens (Ge et al. 2011).
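The idea of deriving a threshold from process variation, mentioned above for 2-D DIGE, can be sketched as follows. This is one simple approach under stated assumptions, not a prescribed DIGE procedure: spot ratios from comparisons of a sample with itself are assumed to be available, and a change is called only when a ratio falls outside the mean plus or minus two standard deviations of those log ratios. All numbers and function names are hypothetical.

```python
import math
import statistics


def fold_change_bounds(null_ratios, n_sd=2.0):
    """Lower and upper ratio bounds from same-versus-same comparisons."""
    logs = [math.log2(r) for r in null_ratios]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)  # sample standard deviation
    return 2 ** (mean - n_sd * sd), 2 ** (mean + n_sd * sd)


def is_changed(ratio, bounds):
    """True when a spot ratio lies outside the technical-variation bounds."""
    lower, upper = bounds
    return ratio < lower or ratio > upper


# Hypothetical spot ratios from a sample compared against itself.
technical_ratios = [0.92, 1.05, 1.10, 0.97, 1.01, 0.89, 1.08]
bounds = fold_change_bounds(technical_ratios)
print(f"ratios outside {bounds[0]:.2f} to {bounds[1]:.2f} count as changed")
print(is_changed(1.8, bounds), is_changed(1.02, bounds))
```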


Arabidopsis peroxisomal proteome analysis has proven to be a valuable, unbiased approach for discovering new peroxisomal proteins on a large scale.

The human proteome is characterized using a variety of approaches. It is expected to comprise between 20,000 and 25,000 non-redundant proteins. Because of RNA splicing and proteolysis, the number of unique protein species appears to increase to between 50,000 and 500,000. If post-translational modification is also taken into account, the overall number of unique human proteins is likely to be in the millions (Jensen 2004).

The proteome of animal cancers was recently published. In Macrobrachium rosenbergii, proteome analysis has been employed as a functional approach for protein profiling.

High-Throughput Technologies for Proteomics

With the development of multiple methodologies, proteomics has gained traction during the last decade. Some of these methodologies are entirely new, while others build on older techniques. Mass spectrometry and microarrays are the most common techniques for studying proteins on a large scale.

Mass Spectrometry and Protein Profiling

Two mass spectrometry-based approaches are currently used for protein profiling. The more established and widely used approach separates the proteins from different samples by high-resolution two-dimensional electrophoresis (2DE), then selects and stains differentially expressed proteins for identification by mass spectrometry. Despite the advances in 2DE and its maturity, it has limitations. The main concern is the difficulty of resolving all the proteins in a sample, given their differing properties and wide range of expression levels. The second, quantitative approach uses stable isotope tags to differentially label the proteins from two different complex mixtures. The proteins in a complex mixture are first labelled isotopically and then digested to yield labelled peptides. The labelled mixtures are then combined, and the peptides are separated by multidimensional liquid chromatography and analysed by tandem mass spectrometry. Isotope-coded affinity tag (ICAT) reagents are the most widely used isotope tags. In this approach, the cysteine residues of proteins are covalently attached to the ICAT reagent, which reduces the complexity of the mixture by excluding peptides that do not contain cysteine.


Quantitative proteomics with stable isotope tagging is becoming an increasingly valuable tool in modern research. First, chemical reactions are used to introduce tags at specific sites of proteins in order to probe their functions. Isotopic labelling combined with selective chemistries has been used to isolate phosphorylated peptides by capturing that protein fraction from a complex mixture. Second, the ICAT approach is used to differentiate between purified or partially purified macromolecular complexes, such as the large RNA polymerase II preinitiation complex and the proteins complexed with a transcription factor. Third, ICAT labelling has recently been combined with chromatin isolation to detect and quantify proteins that are structurally or functionally associated with chromatin. Finally, ICAT reagents can be used for proteomic profiling of cellular compartments and organelles (Weston and Hood 2004).
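The two ideas behind ICAT quantification described above, namely that only cysteine-containing peptides are retained and that each peptide is quantified as a heavy-to-light intensity ratio, can be sketched in a few lines. The peptide sequences, intensities, and function name below are hypothetical and serve only as an illustration.

```python
def icat_ratios(peptide_intensities):
    """Return heavy/light ratios for cysteine-containing peptides only.

    `peptide_intensities` maps a peptide sequence to a tuple of
    (light_intensity, heavy_intensity) from the two labelled samples.
    """
    ratios = {}
    for sequence, (light, heavy) in peptide_intensities.items():
        if "C" not in sequence:
            continue  # no cysteine: not tagged, so not captured
        if light <= 0:
            continue  # a ratio is undefined without a light signal
        ratios[sequence] = heavy / light
    return ratios


# Hypothetical measurements: (light sample, heavy sample).
measured = {
    "ACDEFK": (1.0e6, 2.0e6),
    "GGSLLK": (5.0e5, 5.0e5),   # no cysteine, excluded
    "MCPTVR": (8.0e5, 2.0e5),
}
for sequence, ratio in icat_ratios(measured).items():
    print(f"{sequence}: heavy/light = {ratio:.2f}")
```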

Protein Chips

Microarrays complement the use of mass spectrometers in proteomics and healthcare. Protein microarrays print hundreds of protein-detecting features for the analysis of biological samples. Antibody arrays, for instance, use a variety of antibodies to detect their antigens in a human blood sample. Another approach is to array multiple types of protein in order to study properties such as protein-protein, protein-ligand, and protein-DNA interactions. Functional proteome arrays contain the complete set of proteins of a given organism. The first version of these arrays consisted of 5000 purified yeast proteins deposited on glass microscope slides. Despite the success of this first device, implementing protein arrays proved to be more difficult. Proteins are harder to work with than DNA because they have a broader dynamic range, are less stable, and do not readily maintain their structure on glass slides. In comparison to protein chip technologies, the global ICAT technology has significant benefits (Weston and Hood 2004).

Reverse-Phased Protein Microarrays

This is a more recent microarray application for diagnosing, studying, and treating complex disorders like cancer. The approach combines laser capture microdissection (LCM) with microarray technology to create reverse-phase protein microarrays. In these microarrays, the entire collection of proteins is immobilized in order to capture different stages of disease in a single individual. Reverse-phase arrays, when used in conjunction with LCM, can track the changing state of the proteome in different cell populations within a small area of human tissue. This method can be used to examine the status of cellular signaling molecules in tissue cross sections that include both malignant and healthy cells. It is valuable for tracking the status of key factors in both healthy prostate tissue and invasive prostate cancer. The tissue is then dissected by LCM, and the protein lysates are arrayed on nitrocellulose slides and probed with specific antibodies. This technique can monitor a wide range of molecular processes and compare healthy and diseased tissues in the same patient, which supports the development of medicines and diagnostics. The ability to obtain proteomic snapshots of neighbouring cell populations by using reverse-phase microarrays and LCM offers a wide range of uses outside of tumor research. The approach is useful for describing developmental processes and abnormalities, as well as the normal physiology and pathology of all tissues (Weston and Hood 2004).

Practical Uses of Proteomics

A key advance resulting from the study of human genes and proteins is the discovery of potential new drugs to treat a variety of diseases. This relies on genome and proteome information to identify proteins related to a disease, which computer software can then use as targets for designing new drugs.

Interaction Proteomics and Protein Networks

Interaction proteomics studies protein interactions at all scales, from binary interactions to the proteome or network level. Most proteins function through protein–protein interactions, and the purpose of interaction proteomics is to identify binary protein interactions, protein complexes, and interactomes. Protein–protein interactions can be probed in a variety of ways. Yeast two-hybrid analysis is the traditional approach, while affinity purification with tagged protein baits followed by protein mass spectrometry is a promising newer technology. Other methods include surface plasmon resonance (SPR), dual polarization interferometry, microscale thermophoresis, protein microarrays, experimental approaches like phage display, and computational techniques (Bensimon et al. 2012).
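The step from binary interactions to a network can be illustrated with a small sketch. It builds an undirected interaction network from a list of protein pairs and groups the proteins into connected components. The protein names are hypothetical, and a connected component is only a rough stand-in for a complex, since real complex detection uses more refined criteria.

```python
from collections import defaultdict, deque


def build_network(pairs):
    """Build an undirected adjacency map from binary interactions."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network


def connected_groups(network):
    """Return sorted groups of proteins linked directly or indirectly."""
    seen = set()
    groups = []
    for start in sorted(network):
        if start in seen:
            continue
        group = set()
        queue = deque([start])
        seen.add(start)
        while queue:  # breadth-first search from the start protein
            node = queue.popleft()
            group.add(node)
            for neighbour in network[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(group))
    return groups


# Hypothetical binary interactions, e.g. from a two-hybrid screen.
interactions = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
print(connected_groups(build_network(interactions)))
```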


Expression Proteomics

Expression proteomics is the large-scale study of protein expression. It helps to identify the main proteins in a sample, as well as proteins that are differentially expressed between samples, such as normal and diseased tissue. A protein found only in the diseased sample may be useful as a diagnostic marker or drug target. Proteins with the same or similar expression profiles may also be functionally related. Technologies such as 2D-PAGE and mass spectrometry are used in expression proteomics (Smith and Life 2014).
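The comparison described above, finding proteins present in only one of two samples, reduces to set operations once each sample has been turned into a list of identified proteins. The sketch below uses hypothetical protein names; real studies would also compare abundances rather than presence alone.

```python
def compare_samples(normal, diseased):
    """Partition identified proteins by the sample(s) they occur in."""
    return {
        "diseased_only": sorted(diseased - normal),  # candidate markers
        "normal_only": sorted(normal - diseased),
        "shared": sorted(normal & diseased),
    }


# Hypothetical identifications from two tissue samples.
normal_tissue = {"ProtA", "ProtB", "ProtC"}
diseased_tissue = {"ProtB", "ProtC", "ProtD"}

for category, proteins in compare_samples(normal_tissue, diseased_tissue).items():
    print(category, proteins)
```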

Biomarkers

The National Institutes of Health defines a biomarker as “a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention” (AJ et al. 2001).

Proteogenomics

In proteogenomics, proteomic methods like mass spectrometry are used to improve gene annotations. Parallel analysis of the genome and the proteome aids the detection of post-translational modifications and proteolytic events, especially when several species are compared (comparative proteogenomics) (Gupta et al. 2008).

Structural Proteomics

Structural proteomics is the study of protein structures on a large scale. It compares protein structures and helps to identify the functions of newly discovered genes. Structural analysis also aids the understanding of how drugs bind to proteins and how proteins interact with each other. This is accomplished with techniques such as X-ray crystallography and NMR spectroscopy.

Proteomics Method in Quantification of Dairy Products

Fraud in milk and dairy products occurs when cow milk is added to goat or sheep milk for commercial reasons. No selective, sensitive, and reliable method existed for quantifying the percentage of milk from each species. The development and validation of a proteomics-based method was reported for the qualitative detection and quantitative determination of goat, sheep, and cow milk in raw materials used for dairy products. β-Lactoglobulin was selected as the protein marker because it is a major protein in milk and whey powder. The tryptic peptides LAFNPTQLEGQCHV and LSFNPTQLEEQCHI were used as signature peptides for sheep and goat milk and for cow milk, respectively. The winged peptides LKALPMHIRLAFNPTQL*EGQCHV* and LKALPMHIRLSFNPTQL*EEQCHI* were designed and synthesized as internal standards. The method was validated and showed good sensitivity, specificity, reproducibility, precision, and accuracy, and it can easily be applied in routine laboratory analysis without an extensive proteomics background (Chen et al. 2016).
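The principle of quantifying against an internal standard can be sketched as below. This is a simplified illustration, not the validated procedure of Chen et al.: it assumes a single-point comparison of the signature-peptide peak area with the peak area of a known amount of internal standard, and it assumes that the marker protein content per unit of milk is equal across species. Function names and all numeric values are hypothetical.

```python
# Signature peptides as quoted in the text.
SIGNATURE_PEPTIDES = {
    "goat_sheep": "LAFNPTQLEGQCHV",
    "cow": "LSFNPTQLEEQCHI",
}


def peptide_amount(analyte_area, standard_area, standard_amount):
    """Amount of signature peptide from its ratio to the internal standard."""
    if standard_area <= 0:
        raise ValueError("internal standard signal is required")
    return analyte_area / standard_area * standard_amount


def cow_milk_percentage(cow_amount, goat_sheep_amount):
    """Share of cow milk, assuming equal marker content per unit of milk."""
    total = cow_amount + goat_sheep_amount
    if total <= 0:
        raise ValueError("no signature peptide detected")
    return 100.0 * cow_amount / total


# Hypothetical peak areas and a hypothetical spiked standard amount.
cow = peptide_amount(analyte_area=2.0e5, standard_area=1.0e5, standard_amount=5.0)
goat_sheep = peptide_amount(analyte_area=6.0e5, standard_area=1.0e5, standard_amount=5.0)
print(f"estimated cow milk share: {cow_milk_percentage(cow, goat_sheep):.1f}%")
```

In practice a calibration curve built from mixtures of known composition would replace the equal-content assumption.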

Proteomics of Dairy Cow Fatty Liver Metabolism

The transition from late pregnancy to early lactation, when physiological changes take place to enable a rapid rise in milk yield, is the period in which the high-yielding dairy cow experiences the greatest changes. Coordinated physiological processes mobilize energy and nutrients from body tissues to meet the rapid surge in nutrient requirements for milk production during early lactation. Large amounts of energy are released from adipose tissue as non-esterified fatty acids (NEFAs), and, despite a substantial rise in energy intake, cows enter a state of negative energy balance. Around 25% of the NEFAs enter the liver, where they are metabolized and esterified into triglycerides. When triglycerides accumulate in excess, fatty liver (hepatic lipidosis) develops, usually as a subclinical disease state (Kuhla and Ingvartsen 2018).

Milk Authenticity by Ion-Trap Proteomics

The fraudulent addition of adulterants to milk to boost profits is a global problem. Furthermore, higher-priced milk from minor dairy species is frequently mixed illegally with cheaper cow milk. The presence of species-specific proteins that differ from those declared on the label can be a serious concern for some allergic consumers. Developing suitable analytical methods is therefore critical to protecting consumers and ensuring product authenticity.


A proteomic approach for detecting adulteration in milk mixtures was proposed. A few microliters of milk were digested with trypsin and chymotrypsin and then analysed by nanoLC-ESI-IT-MS. Post-database processing was performed to increase confidence in the peptide sequence assignments, which allowed milk adulteration to be detected at levels below 1%. Species-specific peptides from bovine αS1-casein and β-lactoglobulin were identified as suitable peptide markers of milk authenticity (Nardiello et al. 2018).

Comparative Proteomics Analysis of Laminitis in Chinese Holstein Cows

Laminitis is the most common cause of hoof lameness in dairy cows and results in significant economic losses in the dairy industry. In a proteomic investigation of laminitis in Chinese Holstein cows, 19 protein spots were differentially expressed, and 16 proteins were identified by peptide mass fingerprinting and bioinformatics analysis. Twelve proteins were up-regulated and four were down-regulated. The differentially expressed proteins (DEPs) were shown to be involved in lipid metabolism, glucose metabolism, inflammatory response, molecular transport, immune regulation, oxidative stress, and other processes. The DEPs were closely linked to the occurrence and progression of laminitis, suggesting that disrupted lipid metabolism may represent a novel mechanism in the development of laminitis in dairy cows. These findings lay the groundwork for further research into the causes of laminitis and for the identification of early diagnostic proteins and treatment targets (Dong et al. 2015).

Proteome Dynamics of Autoimmune Uveitis in Horse

Equine recurrent uveitis, in which T cells target retinal proteins, is the only spontaneous model of recurrent autoimmune uveitis in humans. To identify differences between autoaggressive and normal lymphocytes, the proteomes of peripheral blood derived lymphocytes (PBLs) from the same case of interphotoreceptor retinoid binding protein induced uveitis were examined before (Day 0), during (Day 15), and after (Day 23) the uveitic attack. The relative protein abundances in PBLs were investigated in a quantitative, label-free differential proteome study of cells that had been kept frozen for 14 years since the initial experiments. Quantitative data could be obtained for 2632 proteins at all three time points. Profound changes (at least a twofold change) in PBL protein abundance were observed when comparing Day 0 with Day 15, which represents acute inflammation (1070 regulated proteins), and when comparing Day 0 with Day 23. During active uveitis, significant changes affected proteins involved in integrin signalling, in the pathways “Erk and pi3 kinase are necessary for collagen binding in corneal epithelia,” “integrin linked kinase signalling,” and “integrins in angiogenesis.” In contrast, at the cessation of the uveitic attack, the significantly changed proteins belonged to the pathways “nongenotropic androgen signaling,” “Amb2 integrin signaling,” and “classical complement pathway.” Several members of these pathways had earlier been shown to change in naturally occurring uveitis, which underlines the importance of the findings and demonstrates the value of the induced model for simulating spontaneous autoimmune uveitis. All MS data have been deposited to the ProteomeXchange consortium via the PRIDE partner repository (Hauck et al. 2017).
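The twofold-change criterion used in the study above can be illustrated with a short sketch. It is a minimal example with hypothetical protein names and abundances, not the statistical pipeline of Hauck et al.: a protein is reported as regulated between two time points when its abundance ratio is at least twofold in either direction.

```python
import math


def regulated_proteins(before, after, min_fold=2.0):
    """Return log2 fold changes for proteins changed at least `min_fold`."""
    threshold = math.log2(min_fold)
    changed = {}
    for protein in before.keys() & after.keys():
        a, b = before[protein], after[protein]
        if a <= 0 or b <= 0:
            continue  # ratio undefined without signal at both time points
        log2_fc = math.log2(b / a)
        if abs(log2_fc) >= threshold:
            changed[protein] = log2_fc
    return changed


# Hypothetical relative abundances at two time points.
day0 = {"ProtA": 100.0, "ProtB": 100.0, "ProtC": 100.0}
day15 = {"ProtA": 200.0, "ProtB": 150.0, "ProtC": 25.0}

for protein, log2_fc in sorted(regulated_proteins(day0, day15).items()):
    print(f"{protein}: log2 fold change = {log2_fc:+.2f}")
```

Working on the log2 scale treats a doubling and a halving symmetrically.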

Proteomics Analysis of Frozen Horse Mackerel

The effects of high-pressure processing (HPP) (150, 300, and 450 MPa for 0, 2.5, and 5 minutes) on total sodium dodecyl sulphate (SDS)-soluble and sarcoplasmic proteins in frozen horse mackerel (Trachurus trachurus) were investigated. The researchers used proteomics tools based on image analysis of SDS–PAGE protein gels and protein identification by tandem mass spectrometry (MS/MS). Although HPP caused no substantial changes in the total SDS-soluble fraction, it modified the 1-D SDS–PAGE sarcoplasmic patterns in a direct-dependent manner and had a selective effect on particular proteins depending on the processing conditions. Phosphoglycerate mutase 2, pyruvate kinase muscle isozyme, beta-enolase, glycogen phosphorylase muscle form, triosephosphate isomerase, and phosphoglucomutase-1 were all significantly reduced when the maximum pressure (450 MPa) was applied (Pazos et al. 2015).


Proteomic Characterization of Bovine Mammary Gland

Proteomic investigations rely on the separation of proteomes and on determining the abundance of individual proteins, both of which depend on databases (private or public) that are essential for protein identification. In recent years, the proteomes of specific cell types and tissues in the bovine mammary gland have been studied. The proteome of mammary epithelial cells, the milk-secreting cuboidal cells in the innermost layer of the alveoli, and their role in lactation were studied in Bos indicus. Mammary epithelial cells were isolated from milk and analysed with 1D-Gel-LC-MS/MS and 2DE MALDI-TOF/TOF MS-based approaches, which identified 500 distinct proteins in 28 different pathways involved in lactation function. The proteome of the bovine teat canal lining (TCL) was also investigated. This structure serves as the first line of defense when pathogens invade the mammary gland. 2DE, Western blotting, and fluorescence immunohistochemistry were used to determine the abundance and location of selected proteins in the structure. The location of two main protein groups, the keratins and the S100 family, in teat canal keratinocytes was also determined. The abundance of S100 proteins in the TCL varied significantly (Smolenski et al. 2014). A proteomic study of microsomes from lactating cow mammary tissue was performed using one-dimensional SDS-PAGE and RP-LC-ESI-MS/MS. Over 700 proteins were identified and categorized by their biological roles, subcellular localization, and relevance to lipid metabolism (Peng et al. 2008).

Bovine Mastitis and Proteomics

Mastitis, caused by bacterial or viral pathogens infecting the mammary gland, is the most common condition affecting animal health and welfare, and it results in major economic losses in the dairy sector (Hogeveen et al. 2011). Over the last decade, the analysis of milk proteins in response to infection has made increasing use of sophisticated technology and proteomic methods (Boehmer 2011, Duarte et al. 2015, Verma and Ambatipudi 2016, Wheeler et al. 2012). These improvements are mirrored in increasingly informative milk research, which has progressed from early studies that used 2DE and MALDI-TOF to separate and detect highly abundant milk proteins (Galvani et al. 2001). Studies of the alterations that occur during mammary gland infection (Hogarth et al. 2004) have since described 292 proteins whose abundance fluctuates at different stages of the process. In experimentally induced Streptococcus uberis (S. uberis) mastitis, a reduction in the concentration of the highly abundant proteins of normal, healthy milk was confirmed 30 hours after infection (Thomas et al. 2015). Later in the infection, albumin levels increased at 36 hours and lactoferrin levels at 57 hours. Proteomic analysis of the response to S. uberis, predominantly using LC-MS and 2DE, identified 68 proteins with antimicrobial, immune or inflammatory regulatory, and pathogen detection functions (Smolenski et al. 2007).

Proteomic investigations of the mammary gland of small ruminants (sheep and goat) are extremely rare. Most omics research on the mammary gland of sheep (Ovis aries) appears to be concentrated on transcriptome studies, with the majority of the work coming from France and Italy. The literature includes work comparing ovine and bovine lactation (Singh et al. 2013), a study of milking ability in dairy sheep (Dhorne‐Pollet et al. 2012), and a comparison of dairy ewe mammary gland transcriptomes between breeds (Suárez-Vega et al. 2016).

Conclusion

Animal proteomics is both a science and a technology. It is still in its infancy, but it holds considerable promise for future advances in animal welfare and productivity. Proteomics has made great advances over the past few years, including the start of what will undoubtedly be a massive and long-term influx of information about proteins and their functions within biological systems. Proteomic methods are quick and efficient and capture a large portion of the proteome. Incorporating these methods into animal science research has improved protein purification, quantification, and characterization, as well as sequence, structural, and bioinformatic analysis, resulting in the discovery of a huge variety of proteins in various types of animals. Furthermore, a wide range of cell-state-specific proteins has been identified in various livestock species, producing protein expression databases that cover a wide range of cell types. Proteomics promises to allow researchers to decipher the intricate interplay of the many proteins participating in biological functions, and to detect and quantify the complex network of proteins involved in cellular responses to the environment. This information can help to identify the proteins that participate in physiological processes in animals, which in turn can help to promote their wellbeing and productivity.

References

Aj, J. A., W. A. Colburn, V. G. DeGruttola, D. L. DeMets, G. J. Downing, D. F. Hoth, J. A. Oates, C. C. Peck, R. T. Schooley, and B. A. Spilker. 2001. “Biomarkers and Surrogate Endpoints: Preferred Definitions and Conceptual Framework.” Clin. Pharmacol. Ther 69: 89–95. https://doi.org/10.1067/mcp.2001.113989.

Anderson, Johnathon D., Henrik J. Johansson, Calvin S. Graham, Mattias Vesterlund, Missy T. Pham, Charles S. Bramlett, Elizabeth N. Montgomery et al. “Comprehensive proteomic analysis of mesenchymal stem cell exosomes reveals modulation of angiogenesis via nuclear factor‐kappaB signaling.” Stem cells 34, no. 3 (2016): 601-613. https://doi.org/10.1002/stem.2298.

Bensimon, Ariel, Albert J. R. Heck, and Ruedi Aebersold. “Mass spectrometry–based proteomics and network biology.” Annual review of biochemistry 81 (2012): 379-405. https://doi.org/10.1146/annurev-biochem-072909-100424.

Boehmer, Jamie L. “Proteomic analyses of host and pathogen responses during bovine mastitis.” Journal of mammary gland biology and neoplasia 16, no. 4 (2011): 323-338. https://doi.org/10.1007/s10911-011-9229-x.

Chen, Q., X. Ke, J. S. Zhang, S. Y. Lai, F. Fang, W. M. Mo, and Y. P. Ren. “Proteomics method to quantify the percentage of cow, goat, and sheep milks in raw materials for dairy products.” Journal of dairy science 99, no. 12 (2016): 9483-9492. https://doi.org/10.3168/jds.2015-10739.

Dhorne‐Pollet, Sophie, Christèle Robert‐Granié, Marie-Rose Aurel, and Christel Marie‐Etancelin. “A functional genomic approach to the study of the milking ability in dairy sheep.” Animal genetics 43, no. 2 (2012): 199-209. https://doi.org/10.1111/j.1365-2052.2011.02237.x.

Dong, Shu-Wei, Shi-Dong Zhang, Dong-Sheng Wang, Hui Wang, Xiao-Fei Shang, Ping Yan, Zuo-Ting Yan, and Zhi-Qiang Yang. “Comparative proteomics analysis provide novel insight into laminitis in Chinese Holstein cows.” BMC veterinary research 11, no. 1 (2015): 1-9. https://doi.org/10.1186/s12917-015-0474-x.

Duarte, Carla M., Paulo P. Freitas, and Ricardo Bexiga. “Technological advances in bovine mastitis diagnosis: an overview.” Journal of veterinary diagnostic investigation 27, no. 6 (2015): 665-672. https://doi.org/10.1177/1040638715603087.

Galvani, Marina, Mahmoud Hamdan, and Pier Giorgio Righetti. “Two‐dimensional gel electrophoresis/matrix‐assisted laser desorption/ionisation mass spectrometry of commercial bovine milk.” Rapid communications in mass spectrometry 15, no. 4 (2001): 258-264. https://doi.org/10.1002/rcm.220.

Ge, Lin-Quan, Yao Cheng, Jin-Cai Wu, and Gary C. Jahn. “Proteomic analysis of insecticide triazophos-induced mating-responsive proteins of Nilaparvata lugens Stål (Hemiptera: Delphacidae).” Journal of proteome research 10, no. 10 (2011): 4597-4612. https://doi.org/10.1021/pr200414g.

Gupta, Nitin, Jamal Benhamida, Vipul Bhargava, Daniel Goodman, Elisabeth Kain, Ian Kerman, Ngan Nguyen et al. “Comparative proteogenomics: combining mass spectrometry and comparative genomics to analyze multiple genomes.” Genome research 18, no. 7 (2008): 1133-1142. https://doi.org/10.1101/gr.074344.107.

Hauck, Stefanie M., Marlen F. Lepper, Michael Hertl, Walter Sekundo, and Cornelia A. Deeg. “Proteome dynamics in biobanked horse peripheral blood derived lymphocytes (PBL) with induced autoimmune uveitis.” Proteomics 17, no. 19 (2017): 1700013. https://doi.org/10.1002/pmic.201700013.

Hogarth, Caroline J., Julie L. Fitzpatrick, Andrea M. Nolan, Fiona J. Young, Andrew Pitt, and P. David Eckersall. “Differential protein composition of bovine whey: a comparison of whey from healthy animals and from those with clinical mastitis.” Proteomics 4, no. 7 (2004): 2094-2100. https://doi.org/10.1002/pmic.200300723.

Hogeveen, Henk, K. Huijps, and T. J. G. M. Lam. “Economic aspects of mastitis: new developments.” New Zealand veterinary journal 59, no. 1 (2011): 16-23. https://doi.org/10.1080/00480169.2011.547165.

Hood, Leroy, and Lee Rowen. “The human genome project: big science transforms biology and medicine.” Genome medicine 5, no. 9 (2013): 1-8. https://doi.org/10.1186/gm483.

James, Peter. “Protein identification in the post-genome era: the rapid rise of proteomics.” Quarterly reviews of biophysics 30, no. 4 (1997): 279-331. https://doi.org/10.1017/S0033583597003399.

Jensen, Ole Nørregaard. “Modification-specific proteomics: characterization of posttranslational modifications by mass spectrometry.” Current opinion in chemical biology 8, no. 1 (2004): 33-41. https://doi.org/10.1016/j.cbpa.2003.12.009.

Kuhla, Björn, and Klaus L. Ingvartsen. “Proteomics and the Characterization of Fatty Liver Metabolism in Early Lactation Dairy Cows.” In Proteomics in Domestic Animals: from Farm to Systems Biology, pp. 219-231. Springer, Cham, 2018. https://doi.org/10.1007/978-3-319-69682-9_11.

Nardiello, Donatella, Anna Natale, Carmen Palermo, Maurizio Quinto, and Diego Centonze. “Milk authenticity by ion-trap proteomics following multi-enzyme digestion.” Food chemistry 244 (2018): 317-323. https://doi.org/10.1016/j.foodchem.2017.10.052.

Olsen, Jesper V., Blagoy Blagoev, Florian Gnad, Boris Macek, Chanchal Kumar, Peter Mortensen, and Matthias Mann. “Global, in vivo, and site-specific phosphorylation dynamics in signaling networks.” Cell 127, no. 3 (2006): 635-648. https://doi.org/10.1016/j.cell.2006.09.026.

Pazos, Manuel, Lucía Méndez, Manuel Vázquez, and Santiago P. Aubourg. “Proteomics analysis in frozen horse mackerel previously high-pressure processed.” Food chemistry 185 (2015): 495-502. https://doi.org/10.1016/j.foodchem.2015.03.144.

Peng, Lifeng, Pisana Rawson, Danyl McLauchlan, Klaus Lehnert, Russell Snell, and T. William Jordan. “Proteomic analysis of microsomes from lactating bovine mammary gland.” Journal of proteome research 7, no. 4 (2008): 1427-1432. https://doi.org/10.1021/pr700819b.

Singh, Mini, Peter C. Thomson, Paul A. Sheehy, and Herman W. Raadsma. “Comparative transcriptome analyses reveal conserved and distinct mechanisms in ovine and bovine lactation.” Functional & integrative genomics 13, no. 1 (2013): 115-131. https://doi.org/10.1007/s10142-012-0307-y.

Smith A., Life H. Monthly Archives: August 2014.

Smolenski, Grant, Stephen Haines, Fiona Y. S. Kwan, Jude Bond, Vicki Farr, Stephen R. Davis, Kerst Stelwagen, and Thomas T. Wheeler. “Characterisation of host defence proteins in milk using a proteomic approach.” Journal of proteome research 6, no. 1 (2007): 207-215. https://doi.org/10.1021/pr0603405.

Smolenski, Grant A., Marita K. Broadhurst, Kerst Stelwagen, Brendan J. Haigh, and Thomas T. Wheeler. “Host defence related responses in bovine milk during an experimentally induced Streptococcus uberis infection.” Proteome science 12, no. 1 (2014): 1-14. https://doi.org/10.1186/1477-5956-12-19.

Suárez-Vega, Aroa, Beatriz Gutiérrez-Gil, Christophe Klopp, Gwenola Tosser-Klopp, and Juan-José Arranz. “Comprehensive RNA-Seq profiling to evaluate lactating sheep mammary gland transcriptome.” Scientific data 3, no. 1 (2016): 1-11. https://doi.org/10.1038/sdata.2016.51.

Thomas, Funmilola Clara, Mary Waterston, Peter Hastie, Timothy Parkin, Hayley Haining, and Peter David Eckersall. “The major acute phase proteins of bovine milk in a commercial dairy herd.” BMC veterinary research 11, no. 1 (2015): 1-10. https://doi.org/10.1186/s12917-015-0533-3.

Verma, Aparna, and Kiran Ambatipudi. “Challenges and opportunities of bovine milk analysis by mass spectrometry.” Clinical proteomics 13, no. 1 (2016): 1-13. https://doi.org/10.1186/s12014-016-9110-4.

Weston, Andrea D., and Leroy Hood. “Systems biology, proteomics, and the future of health care: toward predictive, preventative, and personalized medicine.” Journal of proteome research 3, no. 2 (2004): 179-196.
https://doi.org/10.1021/pr0499693. Wheeler, T. T., G. A. Smolenski, D. P. Harris, S. K. Gupta, B. J. Haigh, M. K. Broadhurst, A. J. Molenaar, and K. Stelwagen. “Host-defence-related proteins in cows’ milk.” Animal 6, no. 3 (2012): 415-422. https://doi.org/10.1017/S175173 1111002151. Williams, Keith L., and Bradley J. Walsh. “APAF: the Australian proteome analysis facility.” Australasian Biotechnology 6 (1996): 178-180. Wilson, David H., David M. Rissin, Cheuk W. Kan, David R. Fournier, Tomasz Piech, Todd G. Campbell, Raymond E. Meyer et al. “The Simoa HD-1 analyzer: a novel fully automated digital immunoassay analyzer with single-molecule sensitivity and multiplexing.” Journal of laboratory automation 21, no. 4 (2016): 533-547. https://doi.org/10.1177/2211068215589580.

Chapter 5

Metabolomics

Abstract

Metabolomics is a relatively recent approach to studying the metabolome that involves identifying, quantifying, and characterizing large numbers of molecules in biological materials. Mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectroscopy, combined with robust chemometric software, allow large numbers of biochemical molecules to be detected and compared concurrently, which has expanded biomolecular chemistry research in animals, plants, and bacteria. Continuous improvements in these analytical tools will make metabolomics more widely used and will enable deeper integration of biomolecular research into the biological sciences. Because alterations in the metabolome are a ‘downstream’ effect of gene expression, they are thought to represent cellular activity at the functional level particularly well. Metabolomics complements other omics techniques such as genomics, transcriptomics, and proteomics, and it has promising applications in medical research as well as in animal, plant, and microbial sciences. In agriculture and animal production, metabolomics would be a crucial component of the “next generation phenotyping” methodologies needed to refine trait characterization and, as a result, to improve predictions of animal and plant breeding values for traditional and innovative selection programs.

Keywords: metabolome, MS, NMR, chemometric software, omics techniques, agriculture and animal production

Introduction

The term metabolome was first introduced by Oliver et al. in 1998. The metabolome is the set of small organic molecules of low molecular mass that are produced by various biochemical pathways and found in biological media. Metabolites are the final products of complex processes occurring inside and outside the cell or living organism: inside, they reflect the genome; outside, they reflect the environment. Metabolomics can be defined as the comprehensive measurement of metabolites. As a consequence, it can be used to study interactions between genes and the environment and to describe the phenotype in detail (Menezes et al. 2018; Monteiro et al. 2013). A related term, metabotype, is defined as the metabolic readout of the phenotype (Fontanesi 2016). Metabolomics was proposed as a method of functional genomics and is defined as the study of cells, tissues, and body fluids through the examination of their metabolites. Metabolites are the end products of biological activity, and the set of metabolites that an organism produces constitutes its ‘metabolome’ (Fiehn 2002; Denkert et al. 2006; Qi et al. 2014). Adamski and Suhre (2013) reported that metabolomics makes it possible to identify, characterize, and quantify the metabolites in a biological sample in a single experiment. Metabolomics is an interdisciplinary field: analytical chemistry provides the raw chemical data, while biochemistry, biostatistics, and bioinformatics are used for data mining and interpretation.

Metabolomics Related Definitions

Metabolome

The term “metabolome” refers to the entire set of low-molecular-weight metabolites found in a biological sample. These metabolites arise as downstream products of gene expression.

Metabolome Mapping

Metabolome mapping is the identification of the metabolites in a sample.

Metabolomics

Metabolomics is the study of all the metabolites present in a biological sample, including their identification and quantification.

Metabolic Profiling

Each metabolite is associated with a particular metabolic pathway. Identifying the metabolites that belong to a specific metabolic pathway is known as metabolic profiling.

Metabolic/Metabolomics Fingerprinting

In metabolomics fingerprinting, sample extracts are analyzed rapidly and at high throughput for screening purposes. Individual metabolites are neither identified nor quantified.

Metabolic/Metabolite Target Profiling

The study of one or several metabolites related to a particular metabolic reaction is called metabolic target analysis. The analysis is both qualitative and quantitative.

Untargeted Metabolic Analysis

In untargeted metabolic profiling, a comparative analysis between treatment and control groups is carried out.

Metabonomics

Metabonomics is the study of the metabolic response of living cells to physiological and pathological stimuli or to genetic modification.

Metabolomics Technologies

Because metabolites are so varied, the complete human metabolome cannot be analyzed with a single analytical tool. Optical and non-optical spectroscopy are the two main groups of techniques used in metabolomics research (Dunn et al. 2005). Each has its own advantages and disadvantages for quantitative and qualitative metabolomics analysis.

Classification of Metabolites

There are two main types of metabolites: endogenous metabolites and xenobiotics. Endogenous metabolites are produced directly by the organism, whereas xenobiotics are not produced by the organism but derive from a foreign molecule. The foreign molecule can be an environmental element such as a pollutant, a drug metabolite, or anything else that can be transformed into a metabolite in the body of the organism. Endogenous metabolites are further subdivided into primary and secondary metabolites. Amino acids, sugar phosphates, organic acids, and nucleotides are examples of primary metabolites. Secondary metabolites are derived from primary metabolites and include lipids, small hormones, and phytochemicals. It follows from this classification that a metabolite is an organic compound that originates from metabolism rather than directly from gene expression (Junot et al. 2014). Endogenous and exogenous small molecules with a molecular weight below 1500 Da are called metabolites, and the targeted and non-targeted analysis of these metabolites is called metabolomics.
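The classification described above can be written down as a small decision rule. The following Python sketch is only an illustration: the `Compound` record, the `classify` function, and the label strings are hypothetical names introduced here. It follows the 1500 Da cutoff and the compound classes as they are given in this section, and other sources draw these boundaries differently.

```python
from dataclasses import dataclass

# Mass cutoff quoted in this section; an assumption, since other sources use other limits.
MAX_METABOLITE_MASS_DA = 1500.0

# Compound classes named in this section, grouped as the text groups them.
PRIMARY_CLASSES = frozenset({"amino acid", "sugar phosphate", "organic acid", "nucleotide"})
SECONDARY_CLASSES = frozenset({"lipid", "small hormone", "phytochemical"})


@dataclass(frozen=True)
class Compound:
    name: str
    mass_da: float          # molecular weight in daltons
    compound_class: str     # e.g. "amino acid"
    made_by_organism: bool  # True if the organism synthesizes it itself


def classify(compound: Compound) -> str:
    """Return a label that follows the classification in the text."""
    if compound.mass_da >= MAX_METABOLITE_MASS_DA:
        return "not a metabolite (too large)"
    if not compound.made_by_organism:
        return "xenobiotic metabolite"
    if compound.compound_class in PRIMARY_CLASSES:
        return "endogenous primary metabolite"
    if compound.compound_class in SECONDARY_CLASSES:
        return "endogenous secondary metabolite"
    return "endogenous metabolite (unclassified)"


# Illustrative records; the drug metabolite is a made-up placeholder.
valine = Compound("valine", 117.15, "amino acid", True)
palmitic_acid = Compound("palmitic acid", 256.42, "lipid", True)
drug_product = Compound("hypothetical drug metabolite", 300.0, "drug metabolite", False)

for c in (valine, palmitic_acid, drug_product):
    print(c.name, "->", classify(c))
```

Checking origin before compound class mirrors the order of the text, in which the endogenous/xenobiotic split comes first and only endogenous metabolites are divided into primary and secondary.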

Different Analytical Techniques

The most commonly used techniques in metabolomics are gas chromatography-mass spectrometry (GC-MS), liquid chromatography-mass spectrometry (LC-MS) (Wilson et al. 2005; Want et al. 2005), and nuclear magnetic resonance (NMR) spectroscopy (Robertson 2005; Moore et al. 2007). For the analysis of low-molecular-weight metabolites, capillary electrophoresis-mass spectrometry (Moore et al. 2007) and Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometry (Brown et al. 2005) have also attracted interest. Large numbers of metabolites can be measured with approaches such as high-performance liquid chromatography (HPLC), MS, and NMR, which provide both quantification and chemical identification (Fontanesi 2016). Each method has its own benefits and drawbacks. Optical spectroscopy and NMR spectroscopy are based on pattern recognition and are used in screening and general diagnosis (Petricoin III et al. 2002). Although NMR spectroscopy is time-consuming, costly, and relatively insensitive, and its instruments require trained personnel, it can be used to analyze human seminal plasma metabolites. Near-infrared (NIR) spectroscopy and Fourier transform-infrared (FT-IR) spectroscopy are alternative techniques that are precise, fast, and cheap for metabolomics analysis.

Metabolomics in Different Areas of Research

Improvements in analytical chemistry and in metabolite data analysis are making metabolomics accessible to a wider range of research fields. It is now employed in food and nutritional studies, biological research (from biomarker discovery to disease-related research), environmental monitoring, and crop characterization (Moore et al. 2007; Wishart 2008; Kim et al. 2016; Jalali et al. 2016). In 1990, just two papers were published on metabolomics; by 2015, more than 2400 studies had appeared. In agricultural research, metabolomics is applied to pesticide monitoring, crop trait selection, and crop breeding (Simó et al. 2014; Sumner et al. 2015; Mahdavi et al. 2015; Mahdavi et al. 2016). Metabolomics is a promising tool for biomarker discovery (Nicholson and Lindon 2008). It has been used in toxicology, drug discovery, diabetes research, natural product discovery, and environmental stress studies, and to compare growth stages and mutants (Wang et al. 2011; Zhang et al. 2010; Sreekumar et al. 2009). Metabolites are assessed as indicators of normal and pathological biological processes and are widely used for clinical diagnosis and therapeutic intervention (Yanes et al. 2011; Cao et al. 2011; Kim et al. 2010). Metabolomics is also used to investigate interactions among metabolites and allosteric regulation, that is, the regulatory role that metabolites play through their interaction with proteins, transcripts, and genes.

Metabolomics and Male Fertility

Male infertility is a factor for about half of infertile couples. Despite many years of research on infertility, many questions about male fertility remain unanswered. Metabolomics has therefore been proposed as a novel omics field to apply to male infertility. A number of terms have been established to describe mixtures of metabolites in relation to metabolite quantity and quality. Metabolomic investigations have been performed in the hope of identifying specific biomarkers for better detection of male infertility (Minai-Tehrani et al. 2016).

Metabolomics and Livestock Genomics

Metabolomics is still less commonly used in livestock than in other fields, but it is of great value there. Its power lies in the noninvasive detection of complex phenotypic changes, dietary responses, and innate phenotypic propensities, which makes it a well-suited tool for livestock breeding and research (Fiehn 2002; Houle et al. 2010; Duggan et al. 2011; Jones et al. 2012; May et al. 2013; Gilany et al. 2014; Minai-Tehrani et al. 2016). Many papers on livestock metabolomics have been published recently, showing how metabotyping can help livestock researchers, veterinarians, farmers, and the livestock industry. For example, metabolomics can help to predict residual feed intake (RFI) and feed efficiency (Karisa et al. 2014), evaluate dietary responses to various feeds (Saleem et al. 2012; Abarghuei et al. 2014), detect disease predisposition (Hailemariam et al. 2014; LeBlanc et al. 2005; Sundekilde et al. 2013), assess fertility (Chapinal et al. 2012), milk quality (Melzer et al. 2013a; Melzer et al. 2013b), and bioproduct content (Castejón et al. 2015), and study many other important breeding traits in livestock.

In animal breeding, external phenotypes such as milk production, growth rate, and fat deposition have great economic importance, and metabolomics can provide information about these phenotypes that is economically useful in animal production. Metabotypes make it possible to study traits that cannot be investigated with conventional approaches, including complex phenotypes that are difficult to measure. For a complete dissection of complex traits, metabolomics can be integrated with other omics fields such as transcriptomics and proteomics (Fontanesi 2016). Because metabolites lie close to the genetic makeup of animals, they can be considered molecular or internal phenotypes. Animal metabolomics benefits from access to biofluids and tissues, such as milk in dairy species or muscle and other tissues collected after slaughter, that cannot easily be obtained from humans.
In addition, standard operating protocols for sampling and sample preparation are more demanding to design for metabolomic analysis (Fontanesi 2016). Population-based genome-wide association studies of metabolite traits (mGWAS) have been described for dairy cattle milk using untargeted metabolomics and for pigs using targeted metabolomics (Fontanesi et al. 2014; Fontanesi et al. 2015). Only a small number of animals were examined in these mGWAS, but significant markers were reported. In pigs, the plasma levels of various circulating nutrients were associated with particular genes. These associations explain a significant percentage of the genetic variability of these metabotypes and open new avenues for nutrigenomics research. In a Holstein population, mGWAS was performed on milk for eight metabolites. In addition, 14 metabolites were linked to 21 chromosome-wide significant associations (Buitenhuis et al. 2013).
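At its core, an mGWAS tests each marker for association with each metabolite level. The sketch below shows that single test as an ordinary least-squares regression of a metabolite level on allele dosage. It is a simplified illustration rather than the method of the cited studies: the function name and the data are hypothetical, and a real analysis would typically also model relatedness among animals, include covariates, and correct for multiple testing.

```python
import math


def single_marker_association(dosages, metabolite_levels):
    """Regress a metabolite level on allele dosage (0, 1, or 2) for one SNP.

    Returns (slope, t_statistic). The slope estimates the change in
    metabolite level per copy of the counted allele.
    """
    n = len(dosages)
    if n != len(metabolite_levels) or n < 3:
        raise ValueError("need at least three paired observations")

    mean_x = sum(dosages) / n
    mean_y = sum(metabolite_levels) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("marker is monomorphic in this sample")

    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(dosages, metabolite_levels))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares and the standard error of the slope.
    rss = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(dosages, metabolite_levels))
    if rss == 0:
        # Perfect fit: the t statistic is unbounded unless the slope is zero.
        return slope, (math.copysign(math.inf, slope) if slope != 0 else 0.0)
    standard_error = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / standard_error


# Made-up example: six animals, one SNP, one milk metabolite (arbitrary units).
dosages = [0, 0, 1, 1, 2, 2]
levels = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]
slope, t_stat = single_marker_association(dosages, levels)
print(f"slope per allele copy: {slope:.2f}, t statistic: {t_stat:.1f}")
```

The t statistic would be compared against a t distribution with n - 2 degrees of freedom, and the resulting p-value against a genome-wide or chromosome-wide significance threshold.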

In another study, 248 samples were collected from animals to study glycerophosphocholine, phosphocholine, and the ratio of the two metabolites in milk. The study identified variants in the apolipoprotein B receptor gene on chromosome 25 that are associated with glycerophosphocholine and with the ratio of the two metabolites (Tetens et al. 2015). Metabolomics monitors changes in cellular function that may be most apparent at the level of small-molecule metabolism, and it can give a coherent view of how biological systems respond to environmental and genetic effects (German et al. 2005; Orešič et al. 2006). In animals, metabolomics gives insight into the molecular pathogenesis of disease and offers a complete picture of a sick animal.

Biomarker Discovery in Animals and Metabolomics

Readily accessible metabolites are available for identifying biomarkers of specific disease states (Serkova and Niemann 2006). In metabolomics experiments, however, biological and analytical influences on the composition of body fluids and tissues need to be evaluated carefully (Stanley et al. 2005; Bollard et al. 2005; Wang et al. 2006; Teahan et al. 2006; Gu et al. 2007). Low-molecular-weight metabolites require a variety of analytical platforms for their detection, identification, and quantification. The techniques used must be robust and sensitive and must be able to provide data on metabolite profiles. A thorough overview of metabolite composition, which is needed to find biomarkers in body fluids such as plasma and urine, is difficult to achieve and requires a comprehensive and cohesive approach to metabolite analysis and data processing (Dunn et al. 2005). Ideally, metabolomic analysis should detect every individual metabolite rather than only specific molecules, and for this purpose multiple analytical techniques should be used. Although metabolomics is a relatively new field, its approaches have been used in botany to identify metabolic variation caused by changes in gene function (Weckwerth 2003; Schauer and Fernie 2006), to explore mechanisms of drug toxicity (Nicholson et al. 2002; Lindon et al. 2003), and to explore the metabolism of microorganisms (Kell 2004). In addition, metabolomics helps in evaluating pathological processes in animal models of human disease (Wang et al. 2004; Major et al. 2006; Griffin 2006) and in diagnostic applications (Brindle et al. 2002; Lamers et al. 2005; Kenny et al. 2005).

Canine Hepatology and Metabolomics

Metabolomic analysis has been used to characterize metabolic disturbances in canine liver disease (Whitfield et al. 2005). It can also support the diagnosis of portovascular abnormalities in dogs and the monitoring of their progression. In one study, LC-MS was used to examine plasma metabolite profiles from three groups of dogs: those with hepatic disorders, those with non-hepatic disorders, and those with congenital portovascular anomalies. The plasma metabolite profiles of the three groups were then compared using multivariate data analysis. Cholic acid, chenodeoxycholic acid, and the taurine conjugates of the bile acids were the most markedly increased metabolites, whereas 16:0-, 18:2-, and 18:0-lysophosphatidylcholine were reduced. Conventional laboratory methods only identified affected animals, whereas the metabolomic approach discriminated between affected and control dogs and also distinguished acquired hepatic disorders from congenital portovascular anomalies (Moore et al. 2007).

Use of Anabolic Steroids in Cattle and Metabolomics

The European Union and the U.S. have prohibited the use of corticosteroids, sex steroids, and beta-agonists as growth promoters in veal calves because these agents are harmful to both consumers and treated animals. Because the compounds are used at low dosages, their presence in biological matrices can go unnoticed. New methods are therefore needed to detect these low-molecular-weight compounds. A metabolomics strategy has been used to examine the metabolic responses of cattle to steroid treatment (Dumas et al. 2005). Dumas et al. administered a range of steroids to Hereford steers and collected urine samples at several time intervals. NMR spectroscopy was used to analyze metabolites of diagnostic interest.
Creatine, creatinine, dimethylamine, citrate, and trimethylamine-N-oxide were detected in the urine samples. All of these are involved in nitrogen metabolism and indicate a coordinated response to anabolic steroids.

Genetically Modified Crops and Metabolomics

Omics technologies are important tools for understanding how organisms respond to environmental and genetic changes (Valdés et al. 2013). Metabolomics can add new dimensions to the analysis of genetically modified organisms. Mass spectrometry (MS) and nuclear magnetic resonance (NMR) are well-established analytical techniques for the fingerprinting analysis of plants (Seger and Sturm 2007; Hegeman 2010). NMR detects metabolites of medium to high abundance and requires only limited sample preparation (Eisenreich and Bacher 2007; Pan and Raftery 2007). MS-based approaches are used to analyze complex mixtures of plant metabolites because they are more sensitive than NMR. Analytical performance, selectivity, and sensitivity can be improved further by using high- and ultra-high-resolution mass spectrometers (García-Cañas et al. 2011; Herrero et al. 2012).

Asthenozoospermia and Metabolomics Fingerprinting

Asthenozoospermia is defined as reduced or absent sperm motility. Metabolites have molecular weights of up to about 2000 Da and include sugars, oligonucleotides, peptides, alkaloids, steroids, lipids, amino acids, amines, aldehydes, pollutants, toxins, food additives, minerals, vitamins, and drugs (Weiss and Kim 2012; Wishart et al. 2012). The Human Metabolome Database currently lists almost 40,000 metabolites (Wishart et al. 2012). Raman spectroscopy is used to describe abnormal sperm morphology, and combined with chemometrics it identifies differences between asthenozoospermic and normozoospermic men. Raman spectroscopy can also be used to identify azoospermia easily, as an alternative to the invasive testicular sperm extraction method.

Nuclear Magnetic Resonance Metabonomics and Somatic Cell Count (SCC) in Cattle Milk

Somatic cell count (SCC) is an indicator of mastitis infection and is linked with changes in milk metabolites. NMR is used to analyze differences associated with SCC in milk collected from individual cows. One GC-MS study of mastitic milk reported compounds whose levels increased depending on the pathogen (Hettinga et al. 2009). NMR has identified small molecules in cow milk that may be involved in coagulation, since the amounts of lactose, citric acid, carnitine, and choline in milk are closely related to coagulation properties (Sundekilde et al. 2011). GC-MS combined with NMR has also been used to assess the metabolic status of cows in relation to ketosis (Klein et al. 2012). NMR analysis shows that the amounts of eight metabolites in cow’s milk are significantly associated with low or high SCC. Elevated SCC and whey proteins are responsible for changes in milk composition (Verdi et al. 1987). In one such study, multiple metabolites were affected when SCC increased, because metabolites were transported from bacteria, secreted by somatic cells, released by epithelial cells with altered metabolism, or derived from the blood. The study also reported new indicators in cow’s milk that can be used to diagnose mastitis or to discriminate milk quality. Moreover, β-hydroxybutyrate (BHBA), hippurate, fumarate, butyrate, and isoleucine were identified as novel metabolites associated with SCC.
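An association between SCC and a milk metabolite can be illustrated with a simple correlation. The Python sketch below makes two assumptions that do not come from the studies cited above: it log-transforms SCC into a somatic cell score using the linear-score convention common in dairy recording, SCS = log2(SCC / 100,000) + 3, and it uses the Pearson correlation as the measure of association. The function names and the data are hypothetical.

```python
import math


def somatic_cell_score(scc_cells_per_ml):
    """Convert a somatic cell count (cells/mL) to a log2-based score.

    Assumed convention: SCS = log2(SCC / 100,000) + 3.
    """
    if scc_cells_per_ml <= 0:
        raise ValueError("SCC must be positive")
    return math.log2(scc_cells_per_ml / 100_000) + 3


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Made-up data: SCC per cow and the concentration of one milk metabolite.
scc = [50_000, 100_000, 200_000, 400_000, 800_000]
metabolite = [0.8, 1.1, 1.3, 1.9, 2.4]  # arbitrary units

scores = [somatic_cell_score(c) for c in scc]
print("somatic cell scores:", scores)
print("correlation with metabolite:", round(pearson_r(scores, metabolite), 3))
```

Working on the log scale keeps a few cows with very high counts from dominating the correlation, which is the usual reason for transforming SCC before analysis.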

Milk Metabolite Profiles and Milk Traits of Holstein Cows

In dairy cattle research, milk quantity and quality are screened by monitoring traditional milk traits. Beyond these standard tests, milk metabolites can be analyzed efficiently and can serve as candidate biomarkers. In one study, 1305 Holstein cows were examined for 14 milk traits and 190 milk metabolites. One group of metabolites was negatively correlated with pH and lactose, and another group was positively correlated with milk casein. In addition, lactic acid, uracil, and nine other metabolites were identified in milk in relation to somatic cell score (Melzer et al. 2013a). Lactic acid has already been reported as a biomarker of mastitis in cattle. Analyzing milk traits through a set of metabolites offers a new perspective.

Grain Diets Effect on Rumen Health and Metabolomics

Dairy cows fed high-grain diets are prone to metabolic disorders, but the mechanism by which grain diets cause disease is not clear. To understand how grain diets affect rumen health and lead to metabolic disease in dairy cows, a quantitative and comprehensive metabolic analysis was carried out on eight dairy cows fed four diets containing 0, 15, 30, or 45% barley grain. Rumen fluid samples were collected, and a total of 93 metabolites were identified and quantified using GC-MS, direct flow injection tandem mass spectrometry, and NMR spectroscopy. The rumen metabolites under the 45% barley grain diet differed from those under the 0, 15, and 30% diets. In addition, various inflammatory, unnatural, and toxic compounds, such as short-chain fatty acids, ethanolamine, methylamines, and putrescine, were detected at high concentrations in rumen fluid under the 30% and 45% grain diets. Changes in the concentrations of several amino acids and related compounds (phenylacetylglycine, valine, arginine, leucine, lysine, ornithine, and phenylalanine) were also detected (Saleem et al. 2012). This study shows that a metabolomic approach is helpful for a detailed understanding of dietary effects on rumen fluid and of metabolic causes and effects.

Genetic Modulation of Mammalian Growth and Metabolomics

Metabolites are small molecules that are processed by the transporter proteins and enzymes of the body (Suhre and Gieger 2012). In the flow of information from the genomic blueprint, the metabolome comes after the transcriptome and the proteome and is the third level of phenotypic expression. Metabolites are very close to the final classical phenotypes that are routinely measured in animals and can therefore be called intermediate phenotypes. Importantly, conventionally measured phenotypes give a less specific picture than metabolic phenotypes do, because classical phenotypes result from a large number of physiological processes. In livestock production, metabolomics is used for various non-genetic purposes such as determining the origin of food products, monitoring oocyte and embryo quality, and controlling drug abuse (Kühn 2012). Integrating metabolomics into genetic studies requires appropriate analytical methods and genetic variance in the metabolites. Large-scale MS or NMR metabolome studies have estimated low to medium heritabilities for metabolites (Buitenhuis et al. 2013; Wittenburg et al. 2013). Mittelstrass et al. (2011) described the sex-specific architecture of the human metabolome, and Chan et al. (2010) reported genotype-environment interactions of metabolomic profiles in Arabidopsis. Metabolomic differences between control pigs and fetal pigs during late gestation have been described. Milk metabolomics has been studied in cattle because milk samples are easy to obtain, and blood plasma metabolomics has been applied to predict carcass traits in pigs (Rohart et al. 2012). In conclusion, combined with genomic and conventional phenotypic information, the emerging field of metabolomics can give new insight into the molecular background of complex traits in livestock species.

The Human Metabolome Database (HMDB)

The Human Metabolome Database (HMDB) is the most comprehensive and complete collection of human metabolism data. It contains information on metabolites gathered from thousands of journal articles, books, and other databases, and covers 2180 endogenous metabolites. The HMDB also holds metabolite concentration data obtained by applying NMR and MS to blood, cerebrospinal fluid, and urine samples. Each metabolite entry carries approximately 90 separate data fields, including the metabolite name, synonyms, reference NMR and MS spectra, physicochemical data, structural information, disease associations, biofluid concentrations, gene sequence data, enzyme data, pathway information, SNP and mutation data, and links to other public databases. Data browsing tools, relational querying, and extensive searching are also provided. The HMDB is available at www.hmdb.ca. It is intended for members of the metabolomics community, nutritionists, medical geneticists, physicians, clinical chemists, and biochemists (Wishart et al. 2007).

The Bovine Ruminal Fluid Metabolome

The rumen is a specialized organ in domestic livestock such as sheep, cattle, and goats and is the primary site of microbial fermentation of ingested plant material. Whether the rumen is healthy depends mainly on the chemical composition of the ruminal fluid, and rumen health is critical for the production of good-quality meat and milk. A detailed understanding of ruminal fluid composition, and of how diet affects it, is therefore needed to improve the effectiveness and efficiency of veterinary and farming practices, and characterization of the bovine ruminal fluid metabolome is much needed. For this purpose, ruminal fluid metabolites were identified and quantified by direct flow injection (DFI) mass spectrometry, gas chromatography-mass spectrometry (GC-MS), lipidomics, inductively coupled plasma mass spectrometry (ICP-MS), and NMR spectroscopy, coupled with computer-aided literature mining. A total of 246 ruminal fluid metabolites of the bovine rumen metabolome, together with their concentrations, related literature links and references, and known diet associations, are given in Table 1. The data are also freely available at http://www.rumendb.ca. Using multiple metabolomics technologies and platforms thus enhances metabolome coverage and allows the relative strengths and weaknesses of the techniques to be assessed.

Metabolomics

Metabolomics: Beyond Biomarkers and towards Mechanisms

Metabolites drive vital cellular functions such as apoptosis, signal transduction, and energy production and storage. They may be produced by the organism in which they are found or derived from microorganisms, xenobiotics, the diet and other exogenous sources (Johnson et al. 2012). Metabolites can regulate epigenetic mechanisms and the pluripotency of embryonic stem cells (Sperber et al. 2015; Ulanovskaya et al. 2013). In addition, metabolites such as S-adenosyl methionine, NAD+, acetyl-CoA and ATP act as co-substrates in different reactions and thereby regulate posttranslational modifications (Wellen et al. 2009; Nakahata et al. 2008). It is also well established that hormones and fatty acids bind to plasma proteins, which enables their transport in blood (Gornall 1980; Richieri and Kleinfeld 1995). The role of metabolites in signal transduction is evident from metabolite-protein interactions that initiate signaling cascades (Li et al. 2010; Hubbard et al. 2015). Metabolites also influence the environment in which they are produced. For instance, acidic metabolites in the colon, such as the short-chain fatty acids produced by fermentation of dietary carbohydrates, lower the pH of that environment (Sharma et al. 2015; Louis et al. 2014). Metabolites thus have a broad range of functions, from general physiological roles to specific functions within cells. Mass spectrometry, in targeted or global (untargeted) mode, is the main methodology for identifying and quantifying metabolites. Untargeted metabolomics measures the broadest range of metabolites and analyzes them without a priori information, whereas targeted metabolomics analyzes metabolites selected on the basis of a priori information. Compared with the untargeted approach, targeted metabolomics offers higher selectivity and sensitivity (Ivanisevic et al. 2015).
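The difference between the two modes can be sketched in a few lines. This is an illustrative simplification, not a description of any particular software: the feature list, the target names, the m/z values and the 5 ppm tolerance are all hypothetical placeholders. The targeted function looks only for features that match an a priori list, while the untargeted function keeps every feature above an intensity floor.

```python
# Hypothetical feature list: (m/z, intensity) pairs as they might come out of
# an LC-MS run. All values are illustrative placeholders.
features = [
    (132.1019, 8.2e5),
    (181.0707, 3.1e6),
    (205.0972, 4.4e4),
    (348.0704, 1.9e5),
]

# Hypothetical a priori target list (name -> expected m/z).
targets = {"target_A": 181.0705, "target_B": 348.0710, "target_C": 500.0000}


def ppm_error(observed, expected):
    """Mass error of an observed m/z relative to the expected value, in ppm."""
    return (observed - expected) / expected * 1e6


def targeted(features, targets, tol_ppm=5.0):
    """For each target, return the most intense feature within the ppm
    tolerance, or None when nothing matches."""
    hits = {}
    for name, expected in targets.items():
        matches = [
            f for f in features if abs(ppm_error(f[0], expected)) <= tol_ppm
        ]
        hits[name] = max(matches, key=lambda f: f[1]) if matches else None
    return hits


def untargeted(features, min_intensity=0.0):
    """Keep every feature above an intensity floor; no prior list is used."""
    return [f for f in features if f[1] >= min_intensity]


hits = targeted(features, targets)
all_features = untargeted(features)
print(hits)
print(len(all_features), "features retained without a priori information")
```

The targeted result is restricted to the listed compounds, which is where its selectivity comes from; the untargeted result still has to be annotated afterwards.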
Future Perspectives of Metabolomics

Metabolomic analysis can cover all detectable metabolites in a sample at once, instead of analyzing each compound separately. GC and HPLC/UPLC are high-throughput techniques that can measure more than a hundred metabolites in a single run. Applied to biomedical and biochemical experiments, metabolomic analysis helps in the discovery and validation of biomarkers. Integrated techniques such as HPLC/UPLC-RPMS, HILIC-LC-MS and GC-TOF-MS give researchers enough data for data mining and multivariate analysis and provide adequate mapping of the metabolome.

Asif Nadeem and Maryam Javed

Conclusion

Metabolomics is now used to address genetic questions in species such as pigs, chickens and cattle. Applications in other livestock species are also expected, because the field is well suited to answering specific biological questions. Because metabolomic profiles are very sensitive to environmental conditions, practical problems arise in sample collection and in choosing a suitable experimental design. Another limitation is that current methods detect and measure only a fraction of the metabolites present in a biofluid, whereas genomic technologies offer far more complete coverage of their target. Innovations in this field are nevertheless anticipated, and new methodological developments in metabolomics are expected to make a major contribution to animal breeding and genetics in the omics era (Moore et al. 2007). Metabolomic analysis is still a young field, and the clear and distinct identification of the metabolites of a biological system is not easy. The eventual goal is to understand and predict complex biological systems, particularly in plants, using metabolomics as a relatively cheap, precise and reliable tool. In principle, metabolomic changes can be linked to the enzymes involved and, indirectly, to genetic changes. At the time of Fiehn's (2002) review, results of this type had not been published, and approaches to mathematical modeling and data mining suited to metabolomics data had not yet been developed. Metabolomics data contain a great deal of information, so publicly available metabolomics databases must be designed if metabolomic profiling is to be used comprehensively (Fiehn 2002).

References Abarghuei, M. J., Y. Rouzbehan, A. Z. M. Salem, and M. J. Zamiri. “Nitrogen balance, blood metabolites and milk fatty acid composition of dairy cows fed pomegranatepeel extract.” Livestock Science 164 (2014): 72-80. https://doi.org/10.1016/ j.livsci.2014.03.021. Adamski, Jerzy, and Karsten Suhre. “Metabolomics platforms for genome wide association studies—linking the genome to the metabolome.” Current opinion in biotechnology 24, no. 1 (2013): 39-47. https://doi.org/10.1016/j.copbio.2012.10.003. Bollard, Mary E., Elizabeth G. Stanley, John C. Lindon, Jeremy K. Nicholson, and Elaine Holmes. “NMR‐based metabonomic approaches for evaluating physiological influences on biofluid composition.” NMR in Biomedicine: An International Journal

Devoted to the Development and Application of Magnetic Resonance In vivo 18, no. 3 (2005): 143-162. https://doi.org/10.1002/nbm.935. Brindle, Joanne T., Henrik Antti, Elaine Holmes, George Tranter, Jeremy K. Nicholson, Hugh WL Bethell, Sarah Clarke et al. “Rapid and noninvasive diagnosis of the presence and severity of coronary heart disease using 1 H-NMR-based metabonomics.” Nature medicine 8, no. 12 (2002): 1439-1445. https://doi.org/ 10.1038/nm1202-802. Brown, Stephen C., Gary Kruppa, and Jean‐Louis Dasseux. “Metabolomics applications of FT‐ICR mass spectrometry.” Mass spectrometry reviews 24, no. 2 (2005): 223-231. https://doi.org/10.1002/mas.20011. Buitenhuis, A. J., U. K. Sundekilde, N. A. Poulsen, H. C. Bertram, L. B. Larsen, and P. Sørensen. “Estimation of genetic parameters and detection of quantitative trait loci for metabolites in Danish Holstein milk.” Journal of Dairy Science 96, no. 5 (2013): 32853295. https://doi.org/10.3168/jds.2012-5914. Cao, Dong-Sheng, Bing Wang, Mao-Mao Zeng, Yi-Zeng Liang, Qing-Song Xu, LiangXiao Zhang, Hong-Dong Li, and Qian-Nan Hu. “A new strategy of exploring metabolomics data using Monte Carlo tree.” Analyst 136, no. 5 (2011): 947-954. Castejón, David, Juan Manuel García-Segura, Rosa Escudero, Antonio Herrera, and María Isabel Cambero. “Metabolomics of meat exudate: Its potential to evaluate beef meat conservation and aging.” Analytica Chimica Acta 901 (2015): 1-11. https://doi.org/10.1016/j.aca.2015.08.032. Chapinal, N., M. E. Carson, S. J. LeBlanc, K. E. Leslie, S. Godden, M. Capel, J. E. P. Santos, M. W. Overton, and T. F. Duffield. “The association of serum metabolites in the transition period with milk production and early-lactation reproductive performance.” Journal of dairy science 95, no. 3 (2012): 1301-1309. https://doi.org/10.3168/jds.2011-4724. Denkert, Carsten, Jan Budczies, Tobias Kind, Wilko Weichert, Peter Tablack, Jalid Sehouli, Silvia Niesporek, Dominique Könsgen, Manfred Dietel, and Oliver Fiehn. 
“Mass spectrometry–based metabolic profiling reveals different metabolite patterns in invasive ovarian carcinomas and ovarian borderline tumors.” Cancer research 66, no. 22 (2006): 10795-10804. https://doi.org/10.1158/0008-5472. Duggan, Gavin E., Dustin S. Hittel, Christoph W. Sensen, Aalim M. Weljie, Hans J. Vogel, and Jane Shearer. “Metabolomic response to exercise training in lean and diet-induced obese mice.” Journal of Applied Physiology 110, no. 5 (2011): 1311-1318. https://doi.org/10.1152/japplphysiol.00701.2010. Dumas, Marc-Emmanuel, Cecile Canlet, Joseph Vercauteren, Francois Andre, and Alain Paris. “Homeostatic signature of anabolic steroids in cattle using 1h− 13c hmbc nmr metabonomics.” Journal of proteome research 4, no. 5 (2005): 1493-1502. https://doi.org/10.1021/pr0500556. Dunn, Warwick B., Nigel J. C. Bailey, and Helen E. Johnson. “Measuring the metabolome: current analytical technologies.” Analyst 130, no. 5 (2005): 606-625. Eisenreich, Wolfgang, and Adelbert Bacher. “Advances of high-resolution NMR techniques in the structural and metabolic analysis of plant biochemistry.” Phytochemistry 68, no. 22-24 (2007): 2799-2815. https://doi.org/10.1016/ j.phytochem.2007.09.028.

Fiehn, Oliver. “Metabolomics—the link between genotypes and phenotypes.” Functional genomics (2002): 155-171. https://doi.org/10.1007/978-94-010-0448-0_11. Fontanesi, Luca. “Metabolomics and livestock genomics: Insights into a phenotyping frontier and its applications in animal breeding.” Animal Frontiers 6, no. 1 (2016): 7379. https://doi.org/10.2527/af.2016-0011. Fontanesi, L., S. Bovo, G. Mazzoni, A. B. Samorè, G. Schiavo, E. Scotti, F. Fanelli et al. “Genome wide perspective of genetic variation in pig metabolism and production traits.” Manuscript 359 (2014). Fontanesi, L., Schiavo, G., Bovo, S., Mazzoni, G., Fanelli, F., Ribani, A., Utzeri, V., Luise, D., Samorè, A., Galimberti, G., editors. Abstract retrieved from the book of abstracts of the 66th annual meeting of the European Federation of Animal Science. “Book of Abstracts. (2015). García‐Cañas, Virginia, Carolina Simó, Carlos León, Elena Ibáñez, and Alejandro Cifuentes. “MS‐based analytical methodologies to characterize genetically modified crops.” Mass spectrometry reviews 30, no. 3 (2011): 396-416. https://doi.org/ 10.1002/mas.20286. German, J. Bruce, Bruce D. Hammock, and Steven M. Watkins. “Metabolomics: building on a century of biochemistry to guide human health.” Metabolomics 1, no. 1 (2005): 3-9. https://doi.org/10.1007/s11306-005-1102-8. Gilany, Kambiz, Roudabeh Sadat Moazeni‐Pourasil, Naser Jafarzadeh, and Elham Savadi‐ Shiraz. “Metabolomics fingerprinting of the human seminal plasma of asthenozoospermic patients.” Molecular Reproduction and Development 81, no. 1 (2014): 84-86. https://doi.org/10.1002/mrd.22284. Gornall, Allan G., ed. Applied biochemistry of clinical disorders. HarperCollins Publishers, 1980. Griffin, Julian L. “Understanding mouse models of disease through metabolomics.” Current opinion in chemical biology 10, no. 4 (2006): 309-315. https://doi.org/ 10.1016/j.cbpa.2006.06.027. Gu, Haiwei, Huanwen Chen, Zhengzheng Pan, Ayanna U. 
Jackson, Nari Talaty, Bowei Xi, Candice Kissinger et al. “Monitoring diet effects via biofluids and their implications for metabolomics studies.” Analytical chemistry 79, no. 1 (2007): 89-97. https://doi.org/10.1021/ac060946c. Hailemariam, D., R. Mandal, F. Saleem, S. M. Dunn, D. S. Wishart, and B. N. Ametaj. “Identification of predictive biomarkers of disease state in transition dairy cows.” Journal of dairy science 97, no. 5 (2014): 2680-2693. https://doi.org/10.3168/ jds.2013-6803. Hegeman, Adrian D. “Plant metabolomics—meeting the analytical challenges of comprehensive metabolite analysis.” Briefings in functional genomics 9, no. 2 (2010): 139-148. https://doi.org/10.1093/bfgp/elp053. Herrero, Miguel, Carolina Simó, Virginia García‐Cañas, Elena Ibáñez, and Alejandro Cifuentes. “Foodomics: MS‐based strategies in modern food science and nutrition.” Mass spectrometry reviews 31, no. 1 (2012): 49-69. https://doi.org/10.1002/mas. 20335. Hettinga, K. A., H. J. F. van Valenberg, T. J. G. M. Lam, and A. C. M. van Hooijdonk. “The origin of the volatile metabolites found in mastitis milk.” Veterinary

microbiology 137, no. 3-4 (2009): 384-387. https://doi.org/10.1016/j.vetmic.2009. 01.016. Houle, David, Diddahally R. Govindaraju, and Stig Omholt. “Phenomics: the next challenge.” Nature reviews genetics 11, no. 12 (2010): 855-866. https://doi.org/ 10.1038/nrg2897. Hubbard, Troy D., Iain A. Murray, William H. Bisson, Tejas S. Lahoti, Krishne Gowda, Shantu G. Amin, Andrew D. Patterson, and Gary H. Perdew. “Adaptation of the human aryl hydrocarbon receptor to sense microbiota-derived indoles.” Scientific reports 5, no. 1 (2015): 1-13. https://doi.org/10.1038/srep12689. Ivanisevic, Julijana, Darlene Elias, Hiroshi Deguchi, Patricia M. Averell, Michael Kurczy, Caroline H. Johnson, Ralf Tautenhahn et al. “Arteriovenous blood metabolomics: a readout of intra-tissue metabostasis.” Scientific reports 5, no. 1 (2015): 1-13. https://doi.org/10.1038/srep12757. Jalali, Amir, Amir Hatamie, Tahere Saferpour, Alireza Khajeamiri, Tahere Safa, and Foad Buazar. “Impact of pharmaceutical impurities in Ecstasy tablets: gas chromatographymass spectrometry study.” Iranian journal of pharmaceutical research: IJPR 15, no. 1 (2016): 221. Johnson, Caroline H., Andrew D. Patterson, Jeffrey R. Idle, and Frank J. Gonzalez. “Xenobiotic metabolomics: major impact on the metabolome.” Annual review of pharmacology and toxicology 52 (2012): 37-56. https://doi.org/10.1146/annurevpharmtox-010611-134748. Jones, Dean P., Youngja Park, and Thomas R. Ziegler. “Nutritional metabolomics: progress in addressing complexity in diet and health.” Annual review of nutrition 32 (2012): 183-202. https://doi.org/10.1146/annurev-nutr-072610-145159. Junot, Christophe, François Fenaille, Benoit Colsch, and François Bécher. “High resolution mass spectrometry based techniques at the crossroads of metabolic pathways.” Mass spectrometry reviews 33, no. 6 (2014): 471-500. https://doi.org/10.1002/mas.21401. Karisa, B. K., J. Thomson, Z. Wang, C. Li, Y. R. Montanholi, S. P. Miller, S. S. Moore, and G. S. Plastow. 
“Plasma metabolites associated with residual feed intake and other productivity performance traits in beef cattle.” Livestock Science 165 (2014): 200-211. https://doi.org/10.1016/j.livsci.2014.03.002. Kell, Douglas B. “Metabolomics and systems biology: making sense of the soup.” Current opinion in microbiology 7, no. 3 (2004): 296-307. https://doi.org/ 10.1016/j.mib.2004.04.012. Kenny, Louise C., Warwick B. Dunn, David I. Ellis, Jenny Myers, Philip N. Baker, and Douglas B. Kell. “Novel biomarkers for pre-eclampsia detected using metabolomics and machine learning.” Metabolomics 1, no. 3 (2005): 227-234. https://doi.org/ 10.1007/s11306-005-0003-1. Kim, Dong-Hyun, Roger M. Jarvis, Yun Xu, Anthony W. Oliver, J. William Allwood, Lynne Hampson, Ian N. Hampson, and Royston Goodacre. “Combining metabolic fingerprinting and footprinting to understand the phenotypic response of HPV16 E6 expressing cervical carcinoma cells exposed to the HIV anti-viral drug lopinavir.” Analyst 135, no. 6 (2010): 1235-1244.

Kim, Sooah, Jungyeon Kim, Eun Ju Yun, and Kyoung Heon Kim. “Food metabolomics: from farm to human.” Current Opinion in Biotechnology 37 (2016): 16-23. https://doi.org/10.1016/j.copbio.2015.09.004. Klein, Matthias S., Nina Buttchereit, Sebastian P. Miemczyk, Ann-Kathrin Immervoll, Caridad Louis, Steffi Wiedemann, Wolfgang Junge, Georg Thaller, Peter J. Oefner, and Wolfram Gronwald. “NMR metabolomic analysis of dairy cows reveals milk glycerophosphocholine to phosphocholine ratio as prognostic biomarker for risk of ketosis.” Journal of proteome research 11, no. 2 (2012): 1373-1381. https://doi.org/10.1021/pr201017n. Kühn, Christa. “Metabolomics in animal breeding.” In Genetics Meets Metabolomics, pp. 107-123. Springer, New York, NY, 2012. https://doi.org/10.1007/978-1-4614-16890_8. Lamers, R. J. A. N., J. H. J. Van Nesselrooij, V. B. Kraus, J. M. Jordan, J. B. Renner, A. D. Dragomir, G. Luta, J. Van Der Greef, and J. DeGroot. “Identification of an urinary metabolite profile associated with osteoarthritis.” Osteoarthritis and Cartilage 13, no. 9 (2005): 762-768. https://doi.org/10.1016/j.joca.2005.04.005. LeBlanc, S. J., K. E. Leslie, and T. F. Duffield. “Metabolic predictors of displaced abomasum in dairy cattle.” Journal of dairy science 88, no. 1 (2005): 159-170. https://doi.org/10.3168/jds.S0022-0302(05)72674-6. Li, Xiyan, Tara A. Gianoulis, Kevin Y. Yip, Mark Gerstein, and Michael Snyder. “Extensive in vivo metabolite-protein interactions revealed by large-scale systematic analyses.” Cell 143, no. 4 (2010): 639-650. https://doi.org/10.1016/j.cell.2010. 09.048. Lindon, John C., Jeremy K. Nicholson, Elaine Holmes, Henrik Antti, Mary E. Bollard, Hector Keun, Olaf Beckonert et al. “Contemporary issues in toxicology the role of metabonomics in toxicology and its evaluation by the COMET project.” Toxicology and applied pharmacology 187, no. 3 (2003): 137-146. https://doi.org/10.1016/ S0041-008X(02)00079-0. Louis, Petra, Georgina L. Hold, and Harry J. Flint. 
“The gut microbiota, bacterial metabolites and colorectal cancer.” Nature reviews microbiology 12, no. 10 (2014): 661-672. https://doi.org/10.1038/nrmicro3344. Mahdavi, Vahideh, Mahdi Moridi Farimani, Fariba Fathi, and Alireza Ghassempour. “A targeted metabolomics approach toward understanding metabolic variations in rice under pesticide stress.” Analytical biochemistry 478 (2015): 65-72. https://doi.org/10.1016/j.ab.2015.02.021. Mahdavi, Vahideh, Faezeh Ghanati, and Alireza Ghassempour. “Integrated pathway-based and network-based analysis of GC-MS rice metabolomics data under diazinon stress to infer affected biological pathways.” Analytical biochemistry 494 (2016): 31-36. https://doi.org/10.1016/j.ab.2015.10.017. Major, Hilary J., Rebecca Williams, Amy J. Wilson, and Ian D. Wilson. “A metabonomic analysis of plasma from Zucker rat strains using gas chromatography/mass spectrometry and pattern recognition.” Rapid Communications in Mass Spectrometry: An International Journal Devoted to the Rapid Dissemination of Up‐to‐the‐Minute Research in Mass Spectrometry 20, no. 22 (2006): 3295-3302. https://doi.org/10.1002/rcm.2732.

May, Damon H., Sandi L. Navarro, Ingo Ruczinski, Jason Hogan, Yuko Ogata, Yvonne Schwarz, Lisa Levy, Ted Holzman, Martin W. McIntosh, and Johanna W. Lampe. “Metabolomic profiling of urine: response to a randomised, controlled feeding study of select fruits and vegetables, and application to an observational study.” British journal of nutrition 110, no. 10 (2013): 1760-1770. doi:10.1017/S000 711451300127X. Melzer, Nina, Dörte Wittenburg, S. Hartwig, S. Jakubowski, U. Kesting, Lothar Willmitzer, Jan Lisec, Norbert Reinsch, and Dirk Repsilber. “Investigating associations between milk metabolite profiles and milk traits of Holstein cows.” Journal of dairy science 96, no. 3 (2013): 1521-1534. https://doi.org/ 10.3168/jds.2012-5743. Melzer, Nina, Dörte Wittenburg, and Dirk Repsilber. “Integrating milk metabolite profile information for the prediction of traditional milk traits based on SNP information for Holstein cows.” PLoS One 8, no. 8 (2013): e70256. https://doi.org/10. 1371/journal.pone.0070256. Menezes, Erika, Thu Dinh, Erdogan Memili, Ana Luiza Cazaux Velho, Arlindo Alencar Moura, Einko Topper, and Abdullah Kaya. “Metabolomic markers of fertility in bull seminal plasma.” PLoS ONE 13, no. 4 (2018). Minai‐Tehrani, A., N. Jafarzadeh, and K. Gilany. “Metabolomics: a state‐of‐the‐art technology for better understanding of male infertility.” Andrologia 48, no. 6 (2016): 609-616. https://doi.org/10.1111/and.12496. Monteiro, M. S., Márcia Carvalho, M. L. Bastos, and P. Guedes de Pinho. “Metabolomics analysis for biomarker discovery: advances and challenges.” Current medicinal chemistry 20, no. 2 (2013): 257-271. Moore, Rowan E., Jennifer Kirwan, Mary K. Doherty, and Phillip D. Whitfield. “Biomarker discovery in animal health and disease: the application of post-genomic technologies.” Biomarker insights 2 (2007): 117727190700200040. https://doi.org/ 10.1177/117727190700200040. 
Nakahata, Yasukazu, Milota Kaluzova, Benedetto Grimaldi, Saurabh Sahar, Jun Hirayama, Danica Chen, Leonard P. Guarente, and Paolo Sassone-Corsi. “The NAD+-dependent deacetylase SIRT1 modulates CLOCK-mediated chromatin remodeling and circadian control.” Cell 134, no. 2 (2008): 329-340. https://doi.org/10.1016/j.cell.2008.07.002. Nicholson, Jeremy K., John Connelly, John C. Lindon, and Elaine Holmes. “Metabonomics: a platform for studying drug toxicity and gene function.” Nature reviews Drug discovery 1, no. 2 (2002): 153-161. https://doi.org/10.1038/nrd728. Nicholson, Jeremy K., and John C. Lindon. “Metabonomics.” Nature 455, no. 7216 (2008): 1054-1056. https://doi.org/10.1038/4551054a. Orešič, Matej, Antonio Vidal-Puig, and Virve Hänninen. “Metabolomic approaches to phenotype characterization and applications to complex diseases.” Expert review of molecular diagnostics 6, no. 4 (2006): 575-585. https://doi.org/10.1586/ 14737159.6.4.575. Pan, Zhengzheng, and Daniel Raftery. “Comparing and combining NMR spectroscopy and mass spectrometry in metabolomics.” Analytical and bioanalytical chemistry 387, no. 2 (2007): 525-527. https://doi.org/10.1007/s00216-006-0687-8.

Petricoin III, Emanuel F., Ali M. Ardekani, Ben A. Hitt, Peter J. Levine, Vincent A. Fusaro, Seth M. Steinberg, Gordon B. Mills et al. “Use of proteomic patterns in serum to identify ovarian cancer.” The lancet 359, no. 9306 (2002): 572-577. https://doi.org/10.1016/S0140-6736(02)07746-2. Qi, Yunpeng, Yunlong Song, Haiwei Gu, Guorong Fan, and Yifeng Chai. “Global metabolic profiling using ultra-performance liquid chromatography/quadrupole time-of-flight mass spectrometry.” In Mass Spectrometry in Metabolomics, pp. 15-27. Humana Press, New York, NY, 2014. https://doi.org/10.1007/978-1-4939-1258-2_2. Richieri, G. V., and A. M. Kleinfeld. “Unbound free fatty acid levels in human serum.” Journal of lipid research 36, no. 2 (1995): 229-240. Robertson, Donald G. “Metabonomics in toxicology: a review.” Toxicological Sciences 85, no. 2 (2005): 809-822. https://doi.org/10.1093/toxsci/kfi102. Rohart, Florian, Alain Paris, Béatrice Laurent, Cecile Canlet, Jerome Molina, Marie-José Mercat, Thierry Tribout et al. “Phenotypic prediction based on metabolomic data for growing pigs from three main European breeds.” Journal of animal science 90, no. 13 (2012): 4729-4740. https://doi.org/10.2527/jas.2012-5338. Saleem, F., B. N. Ametaj, S. Bouatra, R. Mandal, Q. Zebeli, S. M. Dunn, and D. S. Wishart. “A metabolomics approach to uncover the effects of grain diets on rumen health in dairy cows.” Journal of Dairy Science 95, no. 11 (2012): 6606-6623. https://doi.org/10.3168/jds.2012-5403. Schauer, Nicolas, and Alisdair R. Fernie. “Plant metabolomics: towards biological function and mechanism.” Trends in plant science 11, no. 10 (2006): 508-516. https://doi.org/10.1016/j.tplants.2006.08.007. Seger, Christoph, and Sonja Sturm. “Analytical aspects of plant metabolite profiling platforms: current standings and future aims.” Journal of proteome research 6, no. 2 (2007): 480-497. https://doi.org/10.1021/pr0604716. Serkova, Natalie J., and Claus U. Niemann. 
“Pattern recognition and biomarker validation using quantitative 1H-NMR-based metabolomics.” Expert review of molecular diagnostics 6, no. 5 (2006): 717-731. https://doi.org/10.1586/14737159.6.5.717. Sharma, Mohit, Madhusudan Astekar, Sonal Soi, Bhari S. Manjunatha, Devi C. Shetty, and Raghu Radhakrishnan. “pH gradient reversal: an emerging hallmark of cancers.” Recent patents on anti-cancer drug discovery 10, no. 3 (2015): 244-258. Simó, Carolina, Clara Ibáez, Alberto Valdés, Alejandro Cifuentes, and Virginia GarcíaCañas. “Metabolomics of genetically modified crops.” International Journal of Molecular Sciences 15, no. 10 (2014): 18941-18966. https://doi.org/10.3390/ ijms151018941. Sperber, Henrik, Julie Mathieu, Yuliang Wang, Amy Ferreccio, Jennifer Hesson, Zhuojin Xu, Karin A. Fischer et al. “The metabolome regulates the epigenetic landscape during naive-to-primed human embryonic stem cell transition.” Nature cell biology 17, no. 12 (2015): 1523-1535. https://doi.org/10.1038/ncb3264. Sreekumar, Arun, Laila M. Poisson, Thekkelnaycke M. Rajendiran, Amjad P. Khan, Qi Cao, Jindan Yu, Bharathi Laxman et al. “Metabolomic profiles delineate potential role for sarcosine in prostate cancer progression.” Nature 457, no. 7231 (2009): 910-914. https://doi.org/10.1038/nature07762.

Stanley, E. G., N. J. C. Bailey, M. E. Bollard, J. N. Haselden, C. J. Waterfield, E. Holmes, and J. K. Nicholson. “Sexual dimorphism in urinary metabolite profiles of Han Wistar rats revealed by nuclear-magnetic-resonance-based metabonomics.” Analytical biochemistry 343, no. 2 (2005): 195-202. https://doi.org/10.1016/j.ab. 2005.01.024. Suhre, Karsten, and Christian Gieger. “Genetic variation in metabolic phenotypes: study designs and applications.” Nature reviews genetics 13, no. 11 (2012): 759-769. https://doi.org/10.1038/nrg3314. Sumner, Lloyd W., Zhentian Lei, Basil J. Nikolau, and Kazuki Saito. “Modern plant metabolomics: advanced natural product gene discoveries, improved technologies, and future prospects.” Natural product reports 32, no. 2 (2015): 212-229. Sundekilde, U. K., Nina Aagaard Poulsen, Lotte Bach Larsen, and H. C. Bertram. “Nuclear magnetic resonance metabonomics reveals strong association between milk metabolites and somatic cell count in bovine milk.” Journal of Dairy Science 96, no. 1 (2013): 290-299. https://doi.org/10.3168/jds.2012-5819. Sundekilde, Ulrik Kræmer, Pernille Dorthea Frederiksen, Morten Rahr Clausen, Lotte Bach Larsen, and Hanne Christine Bertram. “Relationship between the metabolite profile and technological properties of bovine milk from two dairy breeds elucidated by NMR-based metabolomics.” Journal of agricultural and food chemistry 59, no. 13 (2011): 7360-7367. https://doi.org/10.1021/jf202057x. Teahan, Orla, Simon Gamble, Elaine Holmes, Jonathan Waxman, Jeremy K. Nicholson, Charlotte Bevan, and Hector C. Keun. “Impact of analytical bias in metabonomic studies of human blood serum and plasma.” Analytical chemistry 78, no. 13 (2006): 4307-4318. https://doi.org/10.1021/ac051972y. Tetens, Jens, Claas Heuer, Iris Heyer, Matthias S. Klein, Wolfram Gronwald, Wolfgang Junge, Peter J. Oefner, Georg Thaller, and Nina Krattenmacher. 
“Polymorphisms within the APOBR gene are highly associated with milk levels of prognostic ketosis biomarkers in dairy cows.” Physiological genomics 47, no. 4 (2015): 129-137. https://doi.org/10.1152/physiolgenomics.00126.2014. Ulanovskaya, Olesya A., Andrea M. Zuhl, and Benjamin F. Cravatt. “NNMT promotes epigenetic remodeling in cancer by creating a metabolic methylation sink.” Nature chemical biology 9, no. 5 (2013): 300-306. https://doi.org/10.1038/nchembio.1204. Valdés, Alberto, Carolina Simó, Clara Ibáñez, and Virginia García-Cañas. “Foodomics strategies for the analysis of transgenic foods.” TrAC Trends in Analytical Chemistry 52 (2013): 2-15. https://doi.org/10.1016/j.trac.2013.05.023. Verdi, Robert Joseph, D. M. Barbano, M. E. Dellavalle, and G. F. Senyk. “Variability in true protein, casein, nonprotein nitrogen, and proteolysis in high and low somatic cell milks.” Journal of Dairy Science 70, no. 2 (1987): 230-242. https://doi.org/ 10.3168/jds.S0022-0302(87)80002-4. Wang, Xijun, Hui Sun, Aihua Zhang, Wenjun Sun, Ping Wang, and Zhigang Wang. “Potential role of metabolomics apporoaches in the area of traditional Chinese medicine: as pillars of the bridge between Chinese and Western medicine.” Journal of pharmaceutical and biomedical analysis 55, no. 5 (2011): 859-868. https://doi.org/10.1016/j.jpba.2011.01.042.

Wang, Yulan, Elaine Holmes, Jeremy K. Nicholson, Olivier Cloarec, Jacques Chollet, Marcel Tanner, Burton H. Singer, and Jürg Utzinger. “Metabonomic investigations in mice infected with Schistosoma mansoni: an approach for biomarker identification.” Proceedings of the National Academy of Sciences 101, no. 34 (2004): 12676-12681. https://doi.org/10.1073/pnas.0404878101. Wang, Yulan, Elaine Holmes, Huiru Tang, John C. Lindon, Norbert Sprenger, Marco E. Turini, Gabriela Bergonzelli, Laurent B. Fay, Sunil Kochhar, and Jeremy K. Nicholson. “Experimental metabonomic model of dietary variation and stress interactions.” Journal of proteome research 5, no. 7 (2006): 1535-1542. https://doi.org/10.1021/pr0504182. Want, Elizabeth J., Benjamin F. Cravatt, and Gary Siuzdak. “The expanding role of mass spectrometry in metabolite profiling and characterization.” Chembiochem 6, no. 11 (2005): 1941-1951. Weckwerth, Wolfram. “Metabolomics in systems biology.” Annual review of plant biology 54, no. 1 (2003): 669-689. https://doi.org/10.1146/annurev.arplant.54.031902. 135014. Weiss, Robert H., and Kyoungmi Kim. “Metabolomics in the study of kidney diseases.” Nature Reviews Nephrology 8, no. 1 (2012): 22-33. https://doi.org/10.1038/ nrneph.2011.152. Wellen, Kathryn E., Georgia Hatzivassiliou, Uma M. Sachdeva, Thi V. Bui, Justin R. Cross, and Craig B. Thompson. “ATP-citrate lyase links cellular metabolism to histone acetylation.” Science 324, no. 5930 (2009): 1076-1080. https://doi.org/ 10.1126/science.1164097. Whitfield, Phillip David, Peter-John Mantyla Noble, Hilary Major, Robert Jeffrey Beynon, Rachel Burrow, Alistair Iain Freeman, and Alexander James German. “Metabolomics as a diagnostic tool for hepatology: validation in a naturally occurring canine model.” Metabolomics 1, no. 3 (2005): 215-225. https://doi.org/10.1007/s11306-005-0001-3. Wilson, Ian D., Robert Plumb, Jennifer Granger, Hilary Major, Rebecca Williams, and Eva M. Lenz. 
“HPLC-MS-based methods for the study of metabonomics.” Journal of Chromatography B 817, no. 1 (2005): 67-76. https://doi.org/10.1016/ j.jchromb.2004.07.045. Wishart, David S. “Metabolomics: applications to food science and nutrition research.” Trends in food science & technology 19, no. 9 (2008): 482-493. https://doi.org/10.1016/j.tifs.2008.03.003. Wishart, David S., Timothy Jewison, An Chi Guo, Michael Wilson, Craig Knox, Yifeng Liu, Yannick Djoumbou et al. “HMDB 3.0—the human metabolome database in 2013.” Nucleic acids research 41, no. D1 (2012): D801-D807. https://doi.org/10.1093/nar/gks1065. Wishart, David S., Dan Tzur, Craig Knox, Roman Eisner, An Chi Guo, Nelson Young, Dean Cheng et al. “HMDB: the human metabolome database.” Nucleic acids research 35, no. suppl_1 (2007): D521-D526. https://doi.org/10.1093/nar/gkl923. Wittenburg, Dörte, Nina Melzer, Lothar Willmitzer, Jan Lisec, U. Kesting, Norbert Reinsch, and Dirk Repsilber. “Milk metabolites and their genetic variability.” Journal of dairy science 96, no. 4 (2013): 2557-2569. https://doi.org/10.3168/ jds.2012-5635.

Yanes, Oscar, Ralf Tautenhahn, Gary J. Patti, and Gary Siuzdak. “Expanding coverage of the metabolome for global metabolite profiling.” Analytical chemistry 83, no. 6 (2011): 2152-2161. https://doi.org/10.1021/ac102981k. Zhang, Aihua, Hui Sun, Zhigang Wang, Wenjun Sun, Ping Wang, and Xijun Wang. “Metabolomics: towards understanding traditional Chinese medicine.” Planta medica 76, no. 17 (2010): 2026-2035. https://doi.org/10.1055/s-0030-1250542.

Chapter 6

Molecular Markers

Abstract

A population of sexually reproducing organisms generates heritable genomic diversity through mutations, insertions or deletions, and gene duplication events. Identifying and evaluating these genetic variations can aid our understanding of the molecular underpinnings of a variety of biological processes. The term molecular marker refers to a genetic locus that can be readily monitored and measured in populations and is associated with a specific gene or trait. Amplified fragment length polymorphism (AFLP), random amplified polymorphic DNA (RAPD), restriction fragment length polymorphism (RFLP) and simple sequence repeats (SSR) are all molecular, or DNA-based, marker methods that are regularly used in studies of genetic diversity, phylogenetics and evolutionary biology, in genome mapping and gene tagging, and in forensic investigations. These approaches are reliable, low-cost and convenient to apply. They are well established, and their benefits and drawbacks have been identified. Thanks to fairly recent breakthroughs in DNA sequencing technologies, the development and use of molecular markers has reached high-throughput and even ultrahigh-throughput levels. In this chapter, we aim to provide an overview of several molecular markers, including the techniques used, their advantages and shortcomings, and their applicability in diagnostics and genetic studies.

Keywords: genetic loci, populations, biological processes, reliability, low cost, ultrahigh-throughput

Asif Nadeem and Maryam Javed

Introduction

During the first 70 years of genetic mapping, the markers on the maps were genes with variant alleles that produced detectably different phenotypes. As more information about organisms became available, large numbers of such genes could be used as map markers. Yet even in systems where the maps appeared to be “full” of loci with known phenotypic effects, measurements indicated that the chromosomal intervals between genes had to contain vast amounts of DNA. Because there were no markers in those intervals, linkage analysis could not map them. Large numbers of additional genetic markers were needed to fill the gaps and produce higher-resolution maps. The discovery of several types of molecular markers met this need. Molecular markers are sites of heterozygosity for silent DNA variants that are not associated with any detectable phenotypic difference. In mapping analysis, a heterozygous “DNA locus” can be used in the same way as a conventional heterozygous allele pair. This chapter discusses the types of molecular markers.

Single Nucleotide Polymorphism (SNP)

A single nucleotide polymorphism (SNP) is a variation in which a single nucleotide differs at a particular position in the genome. A single-base variant is generally called a SNP when it is present in more than 1% of the population. SNPs help scientists understand how the body responds to disease: depending on the allele carried, an individual may respond rapidly or slowly to the progression of a disease. For example, a single base change in the apolipoprotein E gene is linked with Alzheimer's disease.

Types of SNPs

Single nucleotide polymorphisms may occur in the coding or the non-coding regions of the genome. There are two types of SNPs in the coding region:

1. Synonymous SNPs: the amino acid sequence of the protein does not change.
2. Non-synonymous SNPs: the amino acid sequence changes. There are two subtypes: missense (one amino acid is replaced by another) and nonsense (a premature stop codon is introduced, which may produce a truncated protein).

SNPs in non-coding regions can affect mRNA degradation, gene splicing and non-coding RNA sequences.
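The distinction between synonymous, missense and nonsense SNPs can be illustrated with a short sketch. It assumes the standard genetic code and a change confined to a single codon; the function name `classify_coding_snp` and the example codons are illustrative choices, not part of any established tool.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-base change inside one codon.

    Simplification: a change that removes a stop codon is reported
    as 'missense' in this sketch.
    """
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be three bases long")
    if sum(a != b for a, b in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one base")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


print(classify_coding_snp("GAA", "GAG"))  # Glu -> Glu
print(classify_coding_snp("GAG", "GTG"))  # Glu -> Val
print(classify_coding_snp("TAC", "TAA"))  # Tyr -> stop
```

In practice the reading frame and strand of the gene must be known before a genomic SNP can be mapped to a codon in this way.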

Analysis of SNPs

Different methods are used to detect SNPs and to genotype known single nucleotide polymorphisms:

• Sequencing of DNA
• RFLP (restriction fragment length polymorphism)
• Single base extension
• SSCP (single-strand conformation polymorphism)
• Capillary electrophoresis

Databases and Programs for SNPs

• Mutation Taster
• PolyPhen
• SNAP2
• SuSPect
• OMIM
• dbSNP
• SNPedia
• GWAS Central

Importance and Applications of Single Nucleotide Polymorphisms

• SNPs are used for haplotype mapping.
• In population genetics, SNPs are used for linkage disequilibrium analysis.
• In biomedical research, they are of great importance for genome-wide association studies.
• SNPs are also used in gene mapping.
• SNPs are linked with drug metabolism and play an important role in pharmacogenetics.
• SNPs are also used in forensics, alongside STR typing, to identify individuals and to predict phenotypes such as eye and hair colour.


RFLP (Restriction Fragment Length Polymorphism)

Botstein et al. developed restriction fragment length polymorphism (RFLP) analysis in 1980, using restriction enzymes to cut DNA for further analysis. Alec Jeffreys first used RFLP in 1984 to solve a forensic case, and its use in crime scene investigation spread from 1986. In 1997, terminal restriction fragment length polymorphism was introduced by Liu and coworkers. RFLP is a method for detecting variations present at different sites in DNA; it detects genetic polymorphisms at the sequence level. Genetic polymorphisms are inherited genetic variations that occur in more than one percent of the normal population. RFLP is a first-class genetic marker used to construct linkage maps. RFLP analysis identifies genetic differences between two individuals on the basis of specific patterns in DNA, such as variable number tandem repeats (VNTRs). The repeat unit is generally 10-100 bp long, and VNTRs are also called minisatellite DNA. Tandem repeats are sequences repeated one after another with no other sequence between them. Most RFLP markers are highly locus specific and co-dominant.

Principle

Restriction enzymes break DNA into small pieces. Restriction endonucleases recognise specific DNA sequences and cleave the DNA at specific points, producing DNA fragments whose lengths differ between individuals.

Steps for Analysis

The following steps are used to analyze restriction fragment length polymorphisms:

• DNA is extracted by organic, inorganic or kit methods. Samples used for DNA extraction include blood, saliva and hair.
• The DNA is fragmented with restriction endonucleases. The recognition sites of these enzymes are four to six bp long. The shorter the recognition sequence, the larger the number of fragments produced by digestion.
• The fragments are analyzed by gel electrophoresis in an electrophoretic tank, which separates them on the basis of charge and size. When an electric current is applied, DNA fragments move from the negative to the positive electrode, and smaller fragments move faster than larger ones.
• For Southern blotting, filter paper, plastic mask, nylon membrane, gel and filter paper again are stacked in sequence.
• Overnight prehybridization is carried out, and labeled probes are added after prehybridization.
• When only gel electrophoresis is performed, a UV transilluminator or Gel Doc is used to visualize the bands. After overnight incubation in Southern blotting, luminescent dyes are used for visualization.
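The principle behind RFLP can be sketched in a few lines of code: a restriction site that is lost through a single base change alters the fragment lengths produced by digestion. The sequences below are invented toy alleles, EcoRI (G^AATTC) is used only as a familiar example enzyme, and the function names are illustrative. The sketch also shows why a shorter recognition sequence gives more fragments, under the assumption of a random sequence with equal base frequencies.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting a linear molecule at every `site`.

    `cut_offset` is the cut position inside the site (1 for EcoRI, G^AATTC).
    Only the given strand is scanned, which is sufficient for palindromic sites.
    """
    cuts, start = [], 0
    while (hit := sequence.find(site, start)) != -1:
        cuts.append(hit + cut_offset)
        start = hit + 1
    edges = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(edges, edges[1:])]


def expected_site_spacing(site_length: int) -> int:
    """Average distance between sites, assuming equal base frequencies."""
    return 4 ** site_length


# Toy alleles (hypothetical): allele_b has lost the second EcoRI site
# through a single base change (GAATTC -> GAGTTC).
allele_a = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 15
allele_b = "A" * 10 + "GAATTC" + "C" * 20 + "GAGTTC" + "T" * 15

print(digest(allele_a, "GAATTC", 1))
print(digest(allele_b, "GAATTC", 1))
print(expected_site_spacing(4), expected_site_spacing(6))
```

The two alleles give different sets of fragment lengths, which is what appears as different band patterns after electrophoresis and blotting.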

Applications of RFLP

• RFLP is used in gene mapping and in the analysis of genetic diseases.
• It is an older method that was used for genetic fingerprinting.
• Paternity testing is also done using this technique.
• In animal populations, RFLP is used to determine breeding patterns and to characterize genetic diversity.
• It is used in criminal investigation.
• RFLP is used to detect carriers of a disease-causing mutation in a family.

Negative Aspects of RFLP

Although this method has many applications, it also has some disadvantages:

• It is a time-consuming and tedious procedure.
• The method is expensive.
• It has many steps and requires weeks to produce results, which increases the chance of error.
• Large amounts of DNA are required for analysis.
• It is less sensitive than other methods.
• RFLP is now rarely used; more robust techniques are used to analyze samples in forensics and many other fields.


Alternatives of RFLP

RFLP is still used in marker-assisted selection. Cleaved amplified polymorphic sequences (CAPS) and terminal restriction fragment length polymorphism (TRFLP) are alternatives to RFLP.

Cleaved Amplified Polymorphic Sequence

CAPS is also called PCR-RFLP. The target sequence is first amplified by polymerase chain reaction (PCR) and then digested with a restriction enzyme. PCR-RFLP analyzes more samples in less time. Allele-specific oligonucleotide probes can also be used for analysis. The method does not use radioisotopes, which makes it more convenient in clinical settings.

Analysis Steps

• Extraction of DNA from the target samples using an appropriate method.
• Quantification of the DNA and amplification by polymerase chain reaction.
• Restriction digestion of the amplified DNA.
• Gel electrophoresis to analyze the DNA fragments.

Applications of CAPS

• CAPS markers are used for gene mapping.
• They can also be used in forensics when only a small quantity of sample is available.

Negative Aspects of CAPS

• Because the amplified fragments are small (300-1800 bp), CAPS polymorphisms are more difficult to discover.
• Sequence data are required to design the PCR primers.

Terminal Restriction Fragment Length Polymorphism

TRFLP is a molecular biology technique that was initially used to characterize bacterial populations and has since been applied to other groups, such as fungi. PCR is performed with fluorescently labeled primers. After amplification, the products are digested and the fragments are visualized on a DNA sequencer. Analysis is done by counting peaks. In some respects the method is similar to denaturing gradient gel electrophoresis.

Analysis Steps

• DNA is extracted from the samples to be analyzed.
• Primers are designed against conserved regions of the target gene, typically 16S rDNA.
• The primers carry a fluorescent tag that labels the products.
• The target region is amplified by PCR.
• The products are digested with suitable restriction endonucleases.
• The mixture of digested fragments is separated in a DNA sequencer by polyacrylamide or capillary electrophoresis.
• The sizes of the DNA fragments are determined by a fluorescence detector.
• Results are presented graphically: the x-axis shows DNA fragment size and the y-axis shows fluorescence intensity.

Advantages

• Highly reproducible results
• Numerical data can be used for quantitative and statistical analysis
• In silico analysis of the peaks is possible
• More robust than RFLP

Disadvantages

• Variation in 16S rRNA gene copy number among microbes makes the technique semi-quantitative.
• It has the same flaws as all PCR-based analysis techniques.
• The reliable lower limit of detection of PCR products in a mixture is about 1%.
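The steps above can be sketched in silico. Because only the fluorescently labeled end of each amplicon is detected, each amplicon contributes one terminal fragment whose length is set by the first restriction site after the labeled primer. The sketch below assumes a 5′ label, invented toy amplicons and MspI (C^CGG) as an example enzyme; the function names are illustrative.

```python
from collections import Counter


def terminal_fragment_length(amplicon: str, site: str, cut_offset: int) -> int:
    """Length of the 5'-labeled terminal fragment after digestion.

    If the amplicon contains no site, the whole amplicon is detected.
    """
    hit = amplicon.find(site)
    return len(amplicon) if hit == -1 else hit + cut_offset


def trflp_profile(amplicons, site: str, cut_offset: int) -> dict:
    """Map terminal fragment length (x-axis) to relative peak height (y-axis)."""
    counts = Counter(
        terminal_fragment_length(a, site, cut_offset) for a in amplicons
    )
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}


# Hypothetical community: three amplicons from taxon A, one from taxon B.
taxon_a = "AT" * 10 + "CCGG" + "AT" * 5
taxon_b = "AT" * 15 + "CCGG" + "AT" * 2
community = [taxon_a, taxon_a, taxon_a, taxon_b]

print(trflp_profile(community, "CCGG", 1))
```

Real peak heights also depend on gene copy number and PCR bias, which is why the text describes the technique as semi-quantitative.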

Amplified Fragment Length Polymorphism (AFLP)

In past decades, several techniques have been used for typing and identifying eukaryotes and prokaryotes. The ideal techniques are those that produce reproducible results. In 1990, Keygene in the Netherlands invented one of the most promising techniques, amplified fragment length polymorphism, which was described by Vos et al. in 1995. AFLP is a method in which restriction enzymes are used for digestion and specific primers are used for PCR amplification. AFLP has also been used during outbreaks to identify species, such as Acinetobacter species. The technique is more robust than RFLP. Although its name refers to fragment length polymorphism, the results are not scored as length polymorphisms; instead, they are scored as the presence or absence of polymorphisms such as SNPs.

Principle

AFLP is a technique for the selective amplification of digested fragments. Adapters are ligated to the digested DNA fragments, and PCR amplification is carried out with adapter-specific primers. The amplified products are analyzed by PAGE or autoradiography.

Method

• Genomic DNA is extracted from samples such as blood or plant tissue. Usually 500 ng of genomic DNA is used for AFLP; the amount can be adjusted by changing the volume of DNA solution used.
• Two restriction enzymes, one with a moderate cutting frequency and the other with a higher cutting frequency, are used to digest the DNA. Adapters are then ligated to the digested fragments. Ligation of an adapter alters the restriction site, which prevents a second restriction during ligation. An adapter is usually about 20 nucleotides long, and its sequence can be used as the primer site in the polymerase chain reaction. Restriction and ligation are mostly carried out in a single reaction.
• In preamplification, primers matching the adapters are used, so only adapter-ligated DNA fragments are amplified. The products of the first steps of AFLP (extraction, digestion and ligation, and preamplification) are checked on a 1.6 percent agarose gel.
• During selective amplification, the primers consist of three parts: a sequence matching the adapter at the 5′ end, a sequence matching the restriction site, and selective nucleotides at the 3′ end. The primers are labeled with a fluorescent dye for visualization. In this way a more specific subset of fragments is amplified.
• After amplification, an automated capillary sequencing instrument is used to obtain the results, which are displayed as peaks in an electropherogram. The amplified fragments range from 30 to 400 bp.
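The effect of the selective nucleotides can be sketched as follows. Each selective base at the 3′ end of a primer must match the first base of the genomic insert next to the adapter, so, assuming random sequence with equal base frequencies, each added base reduces the amplified subset about fourfold. The inserts and selective bases below are invented for illustration, and the function names are not taken from any AFLP software.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def fraction_amplified(n_selective_1: int, n_selective_2: int) -> float:
    """Expected fraction of fragments amplified (equal base frequencies assumed)."""
    return 1 / 4 ** (n_selective_1 + n_selective_2)


def selectively_amplified(inserts, selective_1: str, selective_2: str):
    """Keep inserts whose ends match the selective bases of both primers.

    Primer 1 extends into the start of the insert; primer 2 anneals to the
    opposite strand, so its selective bases are compared with the start of
    the reverse complement of the insert.
    """
    return [
        s for s in inserts
        if s.startswith(selective_1)
        and reverse_complement(s).startswith(selective_2)
    ]


# Hypothetical genomic inserts between the two adapters.
inserts = ["ACGTTTGCA", "ACGGGGGTT", "GGGTTTGCA"]

print(selectively_amplified(inserts, "AC", "TG"))
print(fraction_amplified(3, 3))
```

With three selective bases on each primer, roughly one fragment in 4096 is expected to be amplified, which is what brings the number of bands down to a scorable range.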


Applications

• AFLP is used in plant molecular genetics. In plants, AFLP generates 150 locus-specific bands. AFLP patterns are informative about the diversity, phylogeny and gene pools of plants.
• AFLP has also been used in backcross breeding.
• AFLP is also used in animal genetics. It is useful for estimating the genomic divergence of related species, such as wild and domestic cattle.
• It is used in mitochondrial analysis.
• In microbiology, it is used to identify species or strains.
• It is also used in medical diagnostics.
• It is used to improve understanding of drug targets and of drug sensitivity.

Advantages

• AFLP has high discrimination power.
• It is a very robust method.
• It requires only a small amount of DNA.
• The high reliability of this marker has led it to replace other markers.
• It can detect polymorphisms within a genome without requiring prior genome information.
• It can amplify between 50 and 100 fragments at once.

Disadvantages

• AFLP has difficulties with the analysis of mixtures.
• It is not suitable for forensic analysis.
• Developing locus-specific markers from individual fragments is difficult.
• It is not suitable for co-dominant scoring.

Variable Number Tandem Repeats (VNTR)

Variable number tandem repeats are also called minisatellite DNA. The first human minisatellite was discovered in 1980 by R. White and Wyman. From 1986, minisatellites were also used as genetic markers for population studies and linkage analysis, but they were replaced by microsatellites in 1990. A VNTR is a site in the genome where a short nucleotide sequence is present as tandem repeats. These sequences can be present on several chromosomes and vary in length between individuals. Each variant is inherited as an allele and can be used for personal and parental identification. In a minisatellite, 10-60 bp of DNA is repeated 5-50 times. Minisatellites are present at more than 1000 sites in the human genome. They are highly diverse and have a high mutation rate (10⁻³ to 10⁻⁴ mutations/site/generation) in the population. They are mostly found at the telomeres and centromeres of chromosomes. Minisatellites are generally GC-rich sequences with a length of 10-100 bp. Hypervariable minisatellites have a length of 9-64 bp and are mainly present in the centromeric region. Most of these sequences are part of noncoding DNA, but some may be part of a gene.

Procedure

• DNA is extracted by organic, inorganic or kit methods. Samples used for DNA extraction include blood, saliva and hair.
• The DNA is fragmented with restriction endonucleases.
• The fragments are analyzed by gel electrophoresis in an electrophoretic tank, which separates them on the basis of charge and size. When an electric current is applied, DNA fragments move from the negative to the positive electrode, and smaller fragments move faster than larger ones.
• Filter paper, plastic mask, nylon membrane, gel and filter paper again are stacked in sequence. The apparatus is closed and the pump is started. Overnight prehybridization is carried out, and labeled probes are added after prehybridization.
• When only gel electrophoresis is performed, a UV transilluminator or Gel Doc is used to visualize the bands. After overnight incubation in Southern blotting, luminescent dyes are used for visualization.

Applications

• VNTRs are used for linkage analysis of the genome.
• They are an ideal marker in forensic crime investigation.
• They are used in parental identification.
• They are used to study genetic diversity and to identify breeding patterns in animals.
• Various diseases are also diagnosed using VNTRs.


Disadvantages

• It is a laborious and time-consuming process.
• It is an expensive method.
• It has many steps and requires weeks to produce results, which increases the chance of error.
• A large amount of DNA is required for analysis.
• It is less sensitive than other methods.

Short Tandem Repeats (STRs)

Short tandem repeats, also called simple sequence repeats (SSRs) or microsatellites, were identified in eukaryotes in 1970. Like VNTRs, they began to be used as markers in the 1990s. Short tandem repeats (STRs) consist of repeated DNA units of 1-6 bp. These repeats form strings of about one hundred nucleotides. STRs are present in both prokaryotes and eukaryotes. In humans, STRs make up three percent of the whole genome. On chromosomes, they are found in subtelomeric regions. Most STRs lie in non-coding areas of the genome, but eight percent are located in coding areas. The density of STRs varies among chromosomes; in humans it is highest on chromosome 19. On average, one STR occurs every two thousand base pairs in the human genome. The most common STRs in humans are A-rich sequences: AG, A, AAAN, AC and AAN. Although STRs are abundant in the genome, most of them have no known biological function and are regarded as junk DNA.

Classification of STRs

Short tandem repeats are divided into mono-, di-, tri-, tetra-, penta- and hexanucleotide repeats on the basis of the length of the repeat unit. In the human genome, dinucleotide repeats are the most abundant short tandem repeats. Short tandem repeats are also divided into the following two categories:

• Simple repeats: these consist of only one repetitive unit and are also called perfect repeats.
• Compound repeats: these consist of different repetitive units and are also called imperfect repeats.
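The classification by repeat-unit length can be illustrated with a minimal sketch that scans a sequence for simple (perfect) repeats of 1-6 bp motifs using a regular expression. The minimum copy number of four and the example sequences are arbitrary assumptions; compound or imperfect repeats are not detected by this approach.

```python
import re

REPEAT_CLASS = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def find_simple_strs(sequence: str, min_copies: int = 4):
    """Return (start, motif, copies, class) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest motif first, so a
    run such as CACACACA is reported as (CA)4 rather than (CACA)2.
    """
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append(
            (match.start(), motif, copies, REPEAT_CLASS[len(motif)] + "nucleotide")
        )
    return hits


print(find_simple_strs("TTGATAGATAGATAGATACC"))
print(find_simple_strs("GGCACACACACATT"))
```

Dedicated repeat finders handle interrupted repeats and scoring thresholds; this sketch only shows how the motif length determines the class.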


Mutations in STRs

Most DNA sequences have a low mutation rate (10⁻⁹ per nucleotide per generation), but in short tandem repeats mutation rates are very high (10⁻² to 10⁻⁶ per nucleotide per generation). In humans, the mutation rate of short tandem repeats is 10⁻³ to 10⁻⁵ per nucleotide per cell division. Chakraborty et al. reported that dinucleotide repeats have a high mutation rate, whereas in tetranucleotide repeats the mutation rate is 50% lower and the mutations are non-pathogenic. Different approaches are used for the estimation of mutation rates:

• Familial approach
• Population approach
• Germ line approach
• Biological model approach

13 CODIS (Combined DNA Index System)

The Federal Bureau of Investigation has established a data bank for the identification of perpetrators. All short tandem repeats in the Combined DNA Index System are tetrameric repeat sequences. Short tandem repeat loci are given names like D3S1266, where D stands for DNA, 3 for the chromosome on which the locus lies (chromosome 3), S for single-copy sequence, and 1266 for the unique identifier. The STR loci and their genotypes are as follows:

Table 6. STR loci and their genotypes

Locus       Genotype
D3S1358     15, 18
vWA         16, 16
FGA         19, 24
D8S1179     12, 13
D21S11      29, 31
D18S51      12, 13
D5S818      11, 13
D13S317     11, 11
D7S820      10, 10
D16S539     11, 11
TH01        9, 9.3
TPOX        8, 8
CSF1PO      11, 11
AMEL        X, Y


Advantages of CODIS Short Tandem Repeats

• This system is used for forensic analysis.
• The data are digital and suitable for databases.
• Criminal identification can easily be done using these alleles.
• It is a very robust method for criminal identification.

Method Used for Analysis

• Samples of blood, hair, vaginal secretions, etc. are collected in different ways.
• DNA is extracted from the cells by organic, inorganic or kit methods.
• PCR is performed targeting specific regions of DNA, such as the STR loci. Fluorescent primers are used for amplification, and different dyes are used to label the primers for different loci.
• Gel electrophoresis is performed for analysis. The amplified regions are separated according to their sizes, and the results are shown as peaks of different colors corresponding to the different dyes.
• The sample results are analyzed by comparing them with the results from the suspect.

Applications and Advantages of STRs

• STRs are highly polymorphic and co-dominant.
• These loci are widely used in scientific and applied research.
• STRs are widely used in the construction of genetic maps and in genetic linkage analysis.
• They are used to identify individuals and to locate genes.
• They are used for paternity testing.
• They are used for the diagnosis of different diseases.
• They are used for criminal investigation.
• They are also used in population genetics.
• STR loci are now also used to understand the relationships between populations in different areas, as well as the migration routes of ancient peoples.

Disadvantages

• The chance of contamination is higher than with other techniques.
• It is a very expensive method.
• Analysis of the results is very complicated.
• Identical twins give the same results.
• Sample contamination leads to false results.
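The final step of the method, comparing the sample profile with the suspect profile, can be sketched as a locus-by-locus comparison of genotypes. The evidence profile below reuses a few loci and genotypes from Table 6; the two suspect profiles and the function names are hypothetical. Alleles are stored as strings so that microvariants such as 9.3 are compared exactly.

```python
def normalise(genotype):
    """Sort the two alleles so that ('18', '15') equals ('15', '18')."""
    return tuple(sorted(genotype))


def compare_profiles(evidence: dict, reference: dict) -> dict:
    """Compare two STR profiles at the loci typed in both."""
    shared = sorted(set(evidence) & set(reference))
    mismatches = [
        locus for locus in shared
        if normalise(evidence[locus]) != normalise(reference[locus])
    ]
    return {
        "loci_compared": len(shared),
        "mismatches": mismatches,
        "consistent": bool(shared) and not mismatches,
    }


evidence = {
    "D3S1358": ("15", "18"),
    "vWA": ("16", "16"),
    "FGA": ("19", "24"),
    "TH01": ("9", "9.3"),
    "AMEL": ("X", "Y"),
}
# Hypothetical reference profiles.
suspect_1 = {
    "D3S1358": ("18", "15"),
    "vWA": ("16", "16"),
    "FGA": ("24", "19"),
    "TH01": ("9.3", "9"),
    "AMEL": ("X", "Y"),
}
suspect_2 = dict(suspect_1, FGA=("19", "22"))

print(compare_profiles(evidence, suspect_1))
print(compare_profiles(evidence, suspect_2))
```

A consistent profile is not by itself proof of identity; casework also weighs how common the profile is in the relevant population, which this sketch does not attempt.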

Random Amplified Polymorphic DNA (RAPD)

Random amplified polymorphic DNA, or RAPD, was discovered and first applied in plants in 1990. RAPD is a kind of PCR in which the amplified DNA fragments are unknown. Arbitrary primers of eight to twelve nucleotides are designed for the PCR, and only one primer is required for amplification. The random primers are used to determine genetic diversity. The technique does not require any prior knowledge of the target DNA. A 10-mer primer may or may not lead to the amplification of a DNA segment, depending on where it anneals: for example, no fragment is produced if the primer sites are too far apart or if the 3′ ends of the primers do not match the target. Likewise, no PCR product is produced if a mutation occurs in the template DNA at a site where the primer anneals, which leads to a different pattern of amplified DNA segments on the gel.

Procedure

• Samples such as blood, plant extract, saliva, semen or another biological material are collected.
• DNA is extracted from any of the samples mentioned above.
• A nano spectrophotometer can be used to quantify the DNA and to check its purity.
• Arbitrary primers are used to amplify the target DNA.
• Restriction digestion of the fragments is also performed with different types of enzymes.
• The fragments are analyzed by gel electrophoresis, which separates them on the basis of charge and size. Smaller fragments move faster than larger ones.
• Gels with the amplified fragments are visualized in a Gel Doc under UV light.

Applications and Advantages

• RAPD is used in plant breeding.
• Species are also classified on the basis of RAPD data.
• RAPD analysis is used to identify molecular markers linked to differential flowering behavior.
• Phylogenetic analysis of animal and plant species can also be done using RAPD.

Disadvantages

• RAPD cannot be used with degraded DNA samples.
• The resolving power of this technique is much lower than that of targeted methods.
• Detection of polymorphisms is limited.
• Because of the low annealing temperatures, RAPD is less reproducible than other procedures.
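The reason a single arbitrary primer can give a product, and the reason a mutation at a primer site removes a band, can be sketched as follows. A product is possible where the primer sequence occurs on one strand and its reverse complement occurs downstream within an amplifiable distance. The 10-mer primer, the templates, the 3000 bp size limit and the requirement for exact matches are all simplifying assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_all(sequence: str, word: str):
    """Start positions of every occurrence of `word` in `sequence`."""
    positions, start = [], 0
    while (hit := sequence.find(word, start)) != -1:
        positions.append(hit)
        start = hit + 1
    return positions


def rapd_product_sizes(template: str, primer: str, max_size: int = 3000):
    """Sizes of products expected from a single primer (exact matches only)."""
    forward_sites = find_all(template, primer)
    reverse_sites = find_all(template, reverse_complement(primer))
    return sorted(
        j + len(primer) - i
        for i in forward_sites
        for j in reverse_sites
        if i < j and j + len(primer) - i <= max_size
    )


primer = "GGTGCGGGAA"  # hypothetical arbitrary 10-mer
wild_type = "AT" * 5 + primer + "AC" * 20 + reverse_complement(primer) + "AT" * 5
# One base changed inside the first primer site.
mutant = "AT" * 5 + "GGTGCGGGAT" + "AC" * 20 + reverse_complement(primer) + "AT" * 5

print(rapd_product_sizes(wild_type, primer))
print(rapd_product_sizes(mutant, primer))
```

The wild-type template gives one band and the mutant gives none, which is the presence/absence polymorphism scored on the gel.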

Restriction Site Associated DNA (RAD) Marker

To determine the involvement of genes in various biological processes, genetic mapping of natural or induced genomic variation is required. SNPs are among the most frequent and abundant types of genetic marker, and their high density makes them suitable for studying genomic inheritance. Current genotyping technologies, however, require a significant initial investment to discover SNPs and then to genotype them in a large number of individuals (Baird et al. 2008).

A molecular marker is a DNA sequence that can be reliably identified and whose inheritance can be easily tracked. The use of molecular markers is based on naturally occurring DNA polymorphism. A marker should be polymorphic, that is, present in different forms, so that a chromosome carrying a mutant gene can be distinguished from a chromosome carrying the normal gene by the form of the marker it carries. Genetic polymorphism is the simultaneous occurrence in the same population of two or more discontinuous genotypes or variants of a trait (Kumar et al. 2009). There are two basic types of DNA-based markers: (1) non-PCR-based markers (RFLP) and (2) PCR-based markers (RAPD, AFLP, SSR, SNP, etc.). Microsatellite DNA markers are important because of their ease of use (simple PCR followed by denaturing gel electrophoresis for allele size determination) and the high degree of information provided by their large number of alleles per locus. The SNP marker, although bi-allelic, has nevertheless grown in popularity. With the development of new specialised types of markers, their value in studying genetic variability and diversity within and between species has grown (Kumar et al. 2009).

SNPs that disrupt restriction endonuclease sites have long been studied using genetic markers such as RFLP or AFLP. A disadvantage of current restriction site polymorphism-based approaches is that they screen only a small subset of the potential restriction sites for each restriction endonuclease. A method was therefore needed that screens every restriction site of a given enzyme simultaneously (Miller et al. 2007). One such approach uses restriction site associated DNA (RAD) markers for genotyping.

History of RAD Markers

The RAD marker was first used with microarrays and later adapted for use with next-generation sequencing (NGS). It was created in Eric Johnson’s lab at the University of Oregon in 2006. RAD markers were used to identify recombination breakpoints in Drosophila melanogaster and to locate QTLs in threespine sticklebacks (Miller et al. 2007). Double digest RADseq, an improved RAD tagging approach, was published in 2012 (Peterson et al. 2012). To perform low-cost population genotyping, it introduced a second restriction endonuclease and a rigorous DNA size selection step.

Restriction Site Associated DNA (RAD) Markers

RAD markers are a form of genetic marker that can be used in population genetics, association mapping, ecology, QTL mapping, and evolution. RAD mapping is the use of RAD markers for genetic mapping. Other restriction site marker approaches (such as AFLP or RFLP) distinguish genetic polymorphisms by the fragment length polymorphism that arises from variation in restriction sites. RAD tags, on the other hand, use the DNA sequences flanking each restriction site. The use of flanking DNA sequences in RAD tag approaches is called the reduced representation method (Miller et al. 2007; Davey et al. 2011).

RAD Mapping Principles

Isolating RAD tags is the first stage in RAD mapping. RAD tags are the DNA sequences that immediately flank each cutting site of a given restriction endonuclease throughout the genome. Once isolated, RAD tags are used to identify and genotype DNA sequence polymorphisms, especially SNPs. RAD markers are the polymorphisms discovered and genotyped through the isolation and analysis of RAD tags (Miller et al. 2007).

Steps Involved in RAD Mapping

Isolation of RAD Tags

The isolation of RAD tags begins when genomic DNA is digested with a particular restriction endonuclease. The overhangs of the digested fragments are then ligated to biotinylated linkers. The DNA is next randomly sheared into fragments that are substantially smaller than the average distance between restriction sites, so that only the parts directly flanking a restriction site remain attached to biotin. Streptavidin beads are used to isolate and purify the biotinylated fragments: these fragments are immobilized on the beads while the remainder of the DNA is washed away. The tags are then released from the beads by digestion at the original restriction site. Throughout the genome, this technique isolates the DNA tags that directly flank the restriction sites of the chosen enzyme. Isolation of RAD tags is the initial step for microarray analysis (Lewis et al. 2007; Miller et al. 2007; Miller et al. 2007). The restriction endonuclease used during RAD tag isolation determines the density of RAD tags in the genome (Baird et al. 2008). Differences in the hybridization patterns of RAD-tag samples on a microarray are used to identify and type restriction site associated DNA (RAD) markers.
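The idea of a RAD tag can be illustrated in silico: for every occurrence of a restriction site, the flanking sequence on each side is collected, and tags from two individuals are compared position by position to find SNPs. The toy genomes, the tag length of 10 bp and the use of the SbfI recognition sequence (CCTGCAGG) are assumptions made for this sketch; the function names are illustrative.

```python
def rad_tags(genome: str, site: str, tag_length: int):
    """Return (site_position, upstream_tag, downstream_tag) for each site."""
    tags, start = [], 0
    while (hit := genome.find(site, start)) != -1:
        upstream = genome[max(0, hit - tag_length):hit]
        downstream = genome[hit + len(site):hit + len(site) + tag_length]
        tags.append((hit, upstream, downstream))
        start = hit + 1
    return tags


def differing_positions(tag_a: str, tag_b: str):
    """Positions within a pair of aligned tags where the bases differ."""
    return [i for i, (a, b) in enumerate(zip(tag_a, tag_b)) if a != b]


# Two hypothetical individuals differing by one base downstream of the site.
SITE = "CCTGCAGG"
individual_a = "ACGTACGTAC" + SITE + "TTGACCATGA"
individual_b = "ACGTACGTAC" + SITE + "TTGACTATGA"

tags_a = rad_tags(individual_a, SITE, 10)
tags_b = rad_tags(individual_b, SITE, 10)
print(tags_a)
print(differing_positions(tags_a[0][2], tags_b[0][2]))
```

Because only sequence next to restriction sites is examined, the number of sites, and hence of tags, depends on the enzyme chosen, as the text notes.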


Modified RAD Tag Isolation Procedure

The technique for isolating and typing RAD tags has since been improved by the use of high-throughput sequencing on the Illumina Genome Analyzer and of embedded nucleotide barcodes for sample tracking. The first step in this modified technique is restriction enzyme digestion of genomic DNA, followed by ligation of a first adapter, termed P1, to the overhanging ends of the fragments. The adapters allow the fragments to bind to the flow cell used for Illumina sequencing. The P1 adapter contains a forward amplification primer site, an Illumina sequencing primer site, and a nucleotide barcode of 4 or 5 bp for sample identification. This molecular identifier (MID) barcode makes it possible to pool DNA samples carrying different barcodes and to track each sample when they are sequenced in the same reaction. To minimize incorrect sample assignment owing to sequencing errors, all barcodes differ from one another by at least two nucleotides. The adapter-ligated fragments are then pooled and randomly sheared to lengths of a few hundred base pairs. A second adapter, known as P2, is ligated to the sheared ends. Fragments containing both adapters are then amplified by PCR. The P2 adapter has a divergent ‘Y’ structure: the reverse amplification primer cannot bind to P2 unless its complementary sequence has been filled in during the first round of forward elongation from the P1 amplification primer. Owing to this structure of the P2 adapter, only P1 adapter-ligated RAD tags are amplified during the final PCR amplification step. The sheared, sequencer-ready fragments are then size-selected (fragments with a length of 200–500 bp are extracted), and the RADSeq library is sequenced on the Illumina sequencing platform.

Sequencing starts at the molecular identifier (MID) in the P1 adapter and proceeds across the restriction site of the chosen enzyme, producing a massive data set of RAD tags, i.e., sequences downstream of restriction sites, obtained from a greatly reduced fraction of the original genome. If the recognition site of the endonuclease is symmetric, each site produces two RAD tags. Because the Illumina sequencing platform can sequence up to 150 bases per read, polymorphisms can be discovered in roughly 300 bases flanking each restriction site. The use of high-throughput sequencing to analyze RAD tags is a form of reduced representation sequencing and is known as RAD sequencing, or RADSeq.
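The barcode design rule described above (every pair of barcodes differs by at least two nucleotides) and the use of the barcode to track pooled samples can be sketched as follows. The barcodes, sample names, and reads are invented for illustration, all barcodes are assumed to have the same length, and reads are assigned only on an exact barcode match; real demultiplexing software may behave differently.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def barcodes_are_safe(barcodes, min_distance=2):
    """True if every pair of barcodes differs by at least min_distance bases."""
    return all(hamming(a, b) >= min_distance for a, b in combinations(barcodes, 2))


def demultiplex(reads, barcode_to_sample):
    """Assign reads to samples by exact barcode prefix and trim the barcode."""
    length = len(next(iter(barcode_to_sample)))
    by_sample = {sample: [] for sample in barcode_to_sample.values()}
    unassigned = []
    for read in reads:
        sample = barcode_to_sample.get(read[:length])
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(read[length:])
    return by_sample, unassigned


# Hypothetical 4-bp barcodes and pooled reads.
barcode_to_sample = {"ACGT": "animal_1", "TGCA": "animal_2", "GATC": "animal_3"}
reads = ["ACGTTGCAGGAAA", "TGCATGCAGGCCC", "ACGATGCAGGTTT"]

design_ok = barcodes_are_safe(list(barcode_to_sample))
by_sample, unassigned = demultiplex(reads, barcode_to_sample)
print(design_ok, by_sample, unassigned)
```

Because the barcodes differ by at least two bases, a single sequencing error in a barcode (as in the third read) produces a sequence that matches no sample, so the read is set aside rather than assigned to the wrong individual.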

Molecular Markers

Because next-generation sequencers allow all of the polymorphism data in the sequenced tags to be used, this improved approach is a significant advance in the RAD genotyping platform. Multiplexed, massively parallel sequencing of RAD tag libraries enables rapid discovery of thousands of SNPs and high-throughput genotyping of large populations. The advantages of using sequenced RAD tags for genetic mapping are as follows: (1) sequenced RAD tags are a reduced representation of the genome that allows oversampling of the nucleotides downstream of restriction sites and detection of SNPs, both of which are appealing qualities for genetic mapping; (2) the number of markers can be matched to an application by choosing a particular restriction enzyme, and can be increased almost without limit by adding more enzymes; and (3) both multiplexed genotyping of individuals and genotyping of pooled populations for bulk segregant analysis in fine-scale mapping are possible with this method.

RAD Markers Typing and Identification

Once RAD tags have been isolated, they can be used to detect and genotype DNA sequence polymorphisms such as SNPs. RAD markers are, by nature, polymorphic sites.

RAD Markers Identification and Typing on Microarrays

RAD markers are identified and typed from differences in the hybridization patterns of RAD-tag samples on a microarray. Hybridization to oligonucleotide or cDNA microarrays can be used to identify RAD markers, but the number of RAD markers that can be found is limited by the low sequence representation of these types of microarrays. In many circumstances, a suitable pre-existing array for hybridizing RAD tags does not exist. To address this issue, a new type of array for the identification and typing of RAD markers has been devised. This microarray is made up of RAD tags that have been enriched for informative RAD markers. To enrich for informative RAD tags, subtractive hybridizations were performed between the samples. When the RAD-tag samples from the individuals used to build the RAD array were hybridized directly against one another, this enriched RAD marker array typed a number of informative markers. The protocol for this technique included extraction of genomic DNA from these subjects, labeling, and then hybridization of the DNA to the array without the use of RAD-tag isolation.

Hybridization to microarrays can be used to identify RAD markers. However, as the number of sequences represented on the array decreases, the marker density drops as well. The primary advantage of using expression arrays to genotype RAD markers is that they are not confined to a particular population subset or strain but can be used to genotype any individual. RAD markers can also be genotyped using high-density tiling path arrays, which are becoming more readily available. RAD tags themselves can be used to make customized arrays that allow the most sensitive detection and typing of RAD markers. Using DNA subtraction techniques, it is possible to enrich for RAD tags that are present in one sample but not in the other, which greatly increases the number of useful markers on the array. Making an optimized RAD marker microarray is potentially straightforward and affordable: the reagents are commonly available, and thousands of RAD arrays can be made from PCR products (Miller et al. 2007). The limited sensitivity of RAD-based microarrays is a disadvantage. The method can identify only polymorphisms that disrupt a restriction site, resulting in the absence of a RAD tag, or substantial DNA sequence polymorphisms that disrupt RAD tag hybridization. As a result, the genetic marker density achievable with microarrays is significantly reduced.

Identification and Typing of RAD Markers through RAD Tag Sequencing (RAD-Seq)

The most efficient method for identifying RAD markers is high-throughput sequencing of RAD tags, known as RAD sequencing or RAD tag sequencing and abbreviated RAD-Seq or RADSeq. RADSeq combines Illumina sequencing with two molecular biology approaches: restriction enzymes to cut DNA into fragments (as in AFLPs and RFLPs) and molecular identifiers (MIDs) to match sequence data to particular individuals. RADSeq provides significant advantages over earlier approaches to marker discovery and can be used in research on animals for which very little genomic data is available. Like methods based on amplified fragment length polymorphisms (AFLPs) and restriction fragment length polymorphisms (RFLPs), it reduces genomic complexity by subsampling only at specific locations determined by restriction enzymes. RADSeq outperforms these methods in its capacity to discover, validate, and score markers simultaneously (instead of requiring a lengthy development process) and in its ability to reliably determine the origin of the markers from every site. RADSeq can be used in wild populations and on crosses of any design, allowing not just SNP detection and genotyping but also more advanced analyses, including quantitative genetic and phylogeographic research (Davey et al. 2011). In wild populations, a diversity of about 0.1 percent is expected, so around 20–30 percent of restriction sites should be flanked by a polymorphism in the adjacent 200–300 bases of sequence. Following sequencing, the molecular identifier is used to separate the sequences from each individual. RADSeq can detect polymorphisms at restriction sites (by finding a tag that is present in one set of individuals but absent in another, indicating restriction site variation), as well as SNPs and indels in the sequence flanking the restriction site. If a reference genome is available, the raw sequence is aligned to it, and SNPs and indels are detected using existing next-generation sequencing bioinformatics tools such as SAMtools, BWA, and Bowtie. Mapping the reads to a reference genome also allows the small amount of sequencing error in the reads to be recognized and discounted. If no reference sequence is available, RAD tags can still be analyzed. Identical reads are first collapsed into unique sequences, which serve as candidate alleles. SNPs and indels between alleles at the same locus can then be identified by clustering unique sequences that differ by only a few mismatches, and errors can be corrected by comparing the counts of each base at each position. True homozygous or heterozygous alleles will have high read counts, whereas sequences arising from errors will have negligible counts (Baird et al. 2008).
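The reference-free analysis described above (collapse identical reads into candidate alleles, discard rare sequences as likely errors, then cluster similar alleles into loci) can be sketched in a few lines. This is an illustrative toy rather than the procedure of any particular pipeline: the function names, the minimum count of 3, the one-mismatch clustering rule, and the reads are all assumptions, and all reads are assumed to have the same length.

```python
from collections import Counter


def mismatches(a, b):
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def candidate_alleles(reads, min_count=3):
    """Collapse identical reads and keep sequences seen at least min_count times."""
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_count}


def cluster_alleles(alleles, max_mismatches=1):
    """Greedily group alleles that lie within max_mismatches of a cluster's first member."""
    clusters = []
    # Visit the most abundant sequences first so they seed the clusters.
    for seq in sorted(alleles, key=lambda s: (-alleles[s], s)):
        for cluster in clusters:
            if mismatches(seq, cluster[0]) <= max_mismatches:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


# Toy reads from one individual: a heterozygous locus, a homozygous locus,
# and a single read carrying a sequencing error.
reads = (["ACGTACGT"] * 10 + ["ACGTACCT"] * 8
         + ["ACGTACGA"] * 1 + ["TTTTGGGG"] * 9)

alleles = candidate_alleles(reads)
loci = cluster_alleles(alleles)
print(alleles)
print(loci)
```

The singleton read is dropped because of its low count, the two abundant sequences that differ at one base are grouped as two alleles of one locus (a candidate SNP), and the unrelated sequence forms its own locus.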

Conclusion

Molecular markers are a popular tool for discovering genetic variation associated with the phenotypic expression of traits in many organisms. They are used in a wide range of disciplines, from animal and agricultural sciences to forensics, and have been adopted by scientists across these disciplines because of their stability, low cost, and ease of use. Molecular genetic markers are among the most effective tools for genome analysis, allowing heritable phenotypes to be linked to the underlying genomic variation. Molecular markers for individual homoeologous groups or chromosomes are now available for various organisms, which is helping to clarify the molecular basis of various phenotypes. In the future, advances in NGS technologies will rapidly accelerate the discovery of molecular markers and their use in a variety of applications ranging from animal breeding to predicting the onset and progression of diseases, thereby having a significant impact on food production, economies, and health.

References

Baird, Nathan A., Paul D. Etter, Tressa S. Atwood, Mark C. Currey, Anthony L. Shiver, Zachary A. Lewis, Eric U. Selker, William A. Cresko, and Eric A. Johnson. “Rapid SNP discovery and genetic mapping using sequenced RAD markers.” PLoS ONE 3, no. 10 (2008): e3376. https://doi.org/10.1371/journal.pone.0003376.

Davey, John W., Paul A. Hohenlohe, Paul D. Etter, Jason Q. Boone, Julian M. Catchen, and Mark L. Blaxter. “Genome-wide genetic marker discovery and genotyping using next-generation sequencing.” Nature Reviews Genetics 12, no. 7 (2011): 499-510. https://doi.org/10.1038/nrg3012.

Kumar, P., V. K. Gupta, A. K. Misra, D. R. Modi, and B. K. Pandey. “Potential of molecular markers in plant biotechnology.” Plant Omics 2, no. 4 (2009): 141-162. https://doi.org/10.3316/informit.090706285698938.

Lewis, Zachary A., Anthony L. Shiver, Nicholas Stiffler, Michael R. Miller, Eric A. Johnson, and Eric U. Selker. “High-density detection of restriction-site-associated DNA markers for rapid mapping of mutated loci in Neurospora.” Genetics 177, no. 2 (2007): 1163-1171. https://doi.org/10.1534/genetics.107.078147.

Miller, Michael R., Tressa S. Atwood, B. Frank Eames, Johann K. Eberhart, Yi-Lin Yan, John H. Postlethwait, and Eric A. Johnson. “RAD marker microarrays enable rapid mapping of zebrafish mutations.” Genome Biology 8, no. 6 (2007): 1-10. https://doi.org/10.1186/gb-2007-8-6-r105.

Miller, Michael R., Joseph P. Dunham, Angel Amores, William A. Cresko, and Eric A. Johnson. “Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers.” Genome Research 17, no. 2 (2007): 240-248. https://doi.org/10.1101/gr.5681207.

Peterson, Brant K., Jesse N. Weber, Emily H. Kay, Heidi S. Fisher, and Hopi E. Hoekstra. “Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species.” PLoS ONE 7, no. 5 (2012): e37135. https://doi.org/10.1371/journal.pone.0037135.

Chapter 7

QTL Analysis: Ancient and Modern Perspectives

Abstract

The discovery of the mechanisms of biological inheritance, made over a century ago, has undoubtedly been one of the most profound achievements in biology. Quantitative trait locus (QTL) research, which involves the genetic dissection of measurable phenotypes into their Mendelian components, has offered substantial knowledge about the organization and regulation of complex traits. This has been achieved by linking two types of biological data: phenotypic data (measurements of traits) and genotypic data (typically molecular markers). The several genes involved in the polygenic inheritance of a specific trait are dispersed across the genome. A QTL is the location of a gene, or combination of genes, that has an effect on a particular quantitative trait. The capacity to untangle the genetic mechanisms affecting quantitative traits is critical for various fields of biology, including livestock production. In farm animals, most economically important traits are quantitative in nature. QTL research in this area has generated substantial information on how complex traits are organized and controlled in livestock. Locating QTLs in the genome, referred to as QTL mapping, can also permit the discovery of genes that underlie genotypic variation affecting phenotypic traits of substantial economic importance.

Keywords: biological inheritance, measurable phenotypes, phenotypic data, genotypic data, quantitative trait

Introduction

Considerable improvements have been made during the past century in selecting animals with desirable production, behavioral, and health characteristics. While initial selection focused only on phenotypes, an understanding of quantitative genetics principles resulted in a faster rate of genetic improvement. The following text describes some important areas and techniques of quantitative genetics and their relevance to livestock, and concludes with some relatively recent genomic developments in livestock improvement.

Quantitative Trait Locus (QTL)

A quantitative trait locus (QTL) is a region of the genome that contains genes associated with a quantitative trait. Geldermann coined the term in 1975. A single phenotypic trait can be governed by multiple QTLs, which are frequently located on different chromosomes. The number of QTLs that explain variation in a phenotypic trait describes the genetic architecture of that trait. QTLs can underlie both continuous and discrete traits. QTLs can also be used to find candidate genes linked to a trait: a region of DNA that has been identified as important for a trait can be sequenced, and any genes in that sequence can then be evaluated by comparing them with a DNA database of genes whose function is established.

Quantitative Trait

Polygenic inheritance refers to the inheritance of quantitatively measured phenotypic characteristics (traits), each of which is attributable to more than one gene. Multifactorial inheritance refers to polygenic inheritance together with interactions with the environment. Unlike monogenic traits, polygenic traits do not follow simple Mendelian patterns of inheritance. Variation in human skin color is a well-understood example of a polygenic trait. Because several genes play a role in determining an individual’s natural skin color, changing just one of them results in only a modest change in skin color. Diseases such as diabetes and cancer, as well as several other illnesses with hereditary components, are also polygenic. The majority of phenotypic characteristics result from several genes interacting, often through complex forms of epistasis.

QTL Analysis

Multifactorial Traits

Traits are influenced by both genetic and environmental factors, including their interactions ('G×E' interactions). Multifactorial traits are usually exhibited as continuous characteristics, such as skin color, body mass, and height in humans. All of these traits are complex and influenced by the environment. The continuous distribution of traits such as skin color and height reflects the action of genes that do not show typical patterns of dominance and recessiveness, together with gene × gene (epistasis) and even more complex gene × gene × environment interactions.

QTL Detection

When a genome sequence is not available, one option is to sequence the identified region and then infer putative gene functions from homology to genes of known function in other genomes. The Basic Local Alignment Search Tool (BLAST) can be used for this purpose. It is an online resource that allows users to enter a primary sequence and search a database containing genes from a wide range of organisms for homologous sequences (https://blast.ncbi.nlm.nih.gov/Blast.cgi). The region identified is often not the gene underlying the phenotypic trait itself but a segment of DNA that is closely linked to that gene.

QTL Mapping

Rather than examining individual candidate genes, it is common to test for associations across a broad range of genes. A genome scan is undertaken to evaluate associations between many markers (for genes) and the quantitative trait of interest; this is known as QTL mapping. The following methods are used to detect QTLs:

1. Single marker analysis
2. Simple interval mapping
3. Composite interval mapping

Single Marker Analysis

Techniques such as linear regression, the t-test, and analysis of variance (ANOVA), discussed below, are the simplest ways of detecting QTLs linked to single markers.

Analysis of Variance

Analysis of variance (ANOVA) at the marker loci is the simplest method for QTL mapping (Weller 2009). In a backcross, a t-statistic can be used to compare the means of the two marker genotype groups. For crosses with more than two possible genotypes (such as an intercross), a more general form of ANOVA is used, which yields an F-statistic. When a marker and a QTL are tightly linked, recombination between them is unlikely, so they are inherited together and the genotype group means are expected to differ considerably. If the marker and QTL are loosely linked or unlinked, they segregate independently, and no significant difference is expected between the genotype group means. The ANOVA method for QTL mapping has three significant weaknesses. First, it does not give separate estimates of QTL position and QTL effect: because of recombination between the marker and the QTL, the apparent QTL effect at a marker is smaller than the true QTL effect. Second, individuals whose genotypes are missing at the marker must be discarded. Third, if the markers are widely spaced, the QTL may be far from all of them, which lowers the power of QTL detection.
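The single-marker comparison described above can be illustrated with a small calculation. The sketch below computes a pooled-variance two-sample t-statistic for the two genotype classes of a backcross and drops individuals with a missing genotype, mirroring the second weakness listed. It also shows the standard backcross result that the expected difference between marker classes is the true QTL effect multiplied by (1 − 2r), where r is the recombination frequency between marker and QTL. The function names, genotype labels, and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def marker_t_statistic(genotypes, phenotypes):
    """Pooled-variance t-statistic comparing the two genotype classes at one marker.

    Individuals whose genotype is None (missing) are discarded.
    """
    groups = {}
    for genotype, value in zip(genotypes, phenotypes):
        if genotype is None:
            continue
        groups.setdefault(genotype, []).append(value)
    if len(groups) != 2:
        raise ValueError("expected exactly two genotype classes")
    a, b = (groups[key] for key in sorted(groups))
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(pooled * (1 / na + 1 / nb))


def apparent_effect(true_effect, r):
    """Expected marker-class difference in a backcross for a QTL at recombination frequency r."""
    return true_effect * (1 - 2 * r)


# Hypothetical backcross data; the last animal has no genotype call.
genotypes = ["AA", "AB", "AA", "AB", "AA", "AB", None]
phenotypes = [10.0, 12.0, 11.0, 13.0, 9.0, 14.0, 20.0]

t_value = marker_t_statistic(genotypes, phenotypes)
print(round(t_value, 3))
print(apparent_effect(2.0, 0.1))
```

With r = 0.5 (an unlinked marker) the apparent effect is zero, which is why position and effect cannot be separated from a single marker: a small difference may mean a small QTL nearby or a large QTL farther away.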

Interval Mapping

Interval mapping was introduced by Lander and Botstein in 1989 to address the three drawbacks of analysis of variance at marker loci. It is now the most widely used method for QTL mapping. The method relies on a genetic linkage map of the typed markers and, like analysis of variance, assumes the presence of a single QTL. In interval mapping, the intervals between adjacent pairs of linked markers along the chromosomes are examined. Because this accounts for recombination between the markers and the QTL, the analysis is statistically more powerful than single-marker analysis. Each location is considered one at a time, and the logarithm of the odds (LOD score) or another test statistic (such as an F ratio or likelihood ratio) is calculated to assess whether that location harbors a true QTL. The larger the test statistic at a given chromosomal position, the stronger the evidence that a QTL exists there.

Interval mapping is used to estimate the position of a QTL between two markers. Interval mapping was originally based on maximum likelihood, but simple regression was later shown to provide very good approximations at much lower computational cost. The principles of QTL mapping by this method are as follows: (1) given the observed data on marker genotypes and phenotypes, the likelihood or LOD score can be calculated for a given set of parameters (in particular QTL effect and QTL position); (2) the parameter estimates (effect sizes and genomic positions) are those for which the likelihood is greatest; and (3) a significance threshold can be determined by permutation testing. A typical interval map plots the evidence for a QTL against genomic position, with peaks indicating plausible QTL locations. Traditional methods for detecting QTLs rely on comparing single-QTL models with a model that assumes no QTL. In interval mapping, for example, the likelihood of a single putative QTL is evaluated at each position in the genome. QTLs located elsewhere in the genome can, however, interfere with this analysis. As a result, the power to detect QTLs may be reduced, and estimates of their locations and effects may be biased (Lander and Botstein 1989; Knapp 1991). Non-existent, so-called “ghost” QTLs may even appear (Haley and Knott 1992; Martinez and Curnow 1992). Multiple QTLs can therefore be mapped more efficiently and accurately using multiple-QTL models. When many QTLs contribute significantly to a trait, a common approach is to scan the genome iteratively, adding each newly detected QTL to the regression model. This technique, known as composite interval mapping, determines the position and effect size of QTLs more accurately than single-QTL methods, particularly in small mapping populations, where correlations between genotypes can be problematic.
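The LOD score and the permutation threshold mentioned in the principles above can be sketched with a regression-based scan. To keep the example short it evaluates the LOD only at the markers themselves, using the regression form LOD = (n/2) log10(RSS0/RSS1), rather than at positions between markers as full interval mapping does. The genotype coding, marker layout, phenotypes, number of permutations, and function names are assumptions made for illustration.

```python
import math
import random


def lod_score(genotypes, phenotypes):
    """Regression LOD at one marker: (n / 2) * log10(RSS_null / RSS_marker)."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)
    rss_marker = 0.0
    for code in set(genotypes):
        group = [y for g, y in zip(genotypes, phenotypes) if g == code]
        group_mean = sum(group) / len(group)
        rss_marker += sum((y - group_mean) ** 2 for y in group)
    if rss_marker == 0.0:
        return math.inf
    return (n / 2) * math.log10(rss_null / rss_marker)


def permutation_threshold(marker_genotypes, phenotypes, n_perm=200, alpha=0.05, seed=1):
    """Genome-wide threshold: the (1 - alpha) quantile of the maximum LOD over shuffled phenotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        maxima.append(max(lod_score(g, shuffled) for g in marker_genotypes))
    maxima.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return maxima[index]


# Hypothetical backcross: 20 animals, 3 markers coded 0/1; marker 0 carries the effect.
n = 20
markers = [
    [i % 2 for i in range(n)],
    [0] * 10 + [1] * 10,
    [(i // 2) % 2 for i in range(n)],
]
noise = [((i * 7) % 5 - 2) * 0.1 for i in range(n)]
phenotypes = [10.0 + 2.0 * markers[0][i] + noise[i] for i in range(n)]

observed = [lod_score(m, phenotypes) for m in markers]
threshold = permutation_threshold(markers, phenotypes)
print([round(x, 2) for x in observed], round(threshold, 2))
```

Taking the maximum LOD across all markers within each permutation is what makes the threshold genome-wide: it controls the chance of any false peak anywhere in the scan, not just at one position.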

Composite Interval Mapping

In composite interval mapping (CIM), interval mapping is performed using a subset of marker loci as covariates. These markers serve as proxies for other QTLs; by accounting for linked QTLs and reducing residual variation, they increase the resolution of interval mapping. The main difficulty in CIM is choosing which marker loci to use as covariates; once these have been chosen, CIM reduces the model selection problem to a one-dimensional scan. The choice of marker covariates, however, remains unresolved. Not surprisingly, the appropriate markers are those closest to the true QTLs, so if these could be found, the QTL mapping problem would already be solved. CIM software such as PLABQTL, QTL Cartographer, and MapManager QTX is widely used.

Family-Pedigree Based Mapping

In inbred species such as mice, it is relatively easy to undertake a QTL mapping exercise using specific between-strain crosses, such as F2, backcross, and multi-generation intercross designs (e.g., Moradi Marjaneh et al. 2012). Family-based QTL mapping, also known as family-pedigree based mapping (linkage and association mapping), involves many families instead of just one. Where experimental crosses are difficult to make, family-based QTL mapping has been the only way to map genes. Because of its advantages, plant geneticists are now attempting to adopt some of the methodologies developed for human genetics. The use of a family pedigree-based strategy has been discussed by Bink et al. (2008), and combined association and family-based linkage analysis has proved fruitful (Rosyara et al. 2009).

Linkage Analysis

The QTL mapping methods outlined above describe how genes associated with quantitative traits are detected. Particularly in human studies, however, the trait being studied is often binary (disease vs. non-disease), and the aim is to find genes associated with the disease of interest. This process is usually termed linkage analysis and is similar in many respects to QTL mapping. Linkage analysis can be parametric (if the relationship between phenotypic and genetic similarity is known) or non-parametric. In parametric analysis, the LOD score assesses whether the co-segregation of marker and disease in a given pedigree is due to linkage (at a specified recombination value) or to chance. Non-parametric linkage analysis, on the other hand, examines the probability that an allele is identical by descent with itself.

Parametric Linkage Analysis

Morton (1955) developed the LOD score, a statistical test for genetic linkage used in plants, animals, and humans. It compares the likelihood of obtaining the test data if the two loci (QTL and marker) are indeed linked with the likelihood of observing the same data purely by chance. Positive LOD scores favor the presence of linkage, whereas negative LOD scores indicate that linkage is less likely. Computerized LOD score analysis is a simple way to analyze complex family pedigrees in order to determine the linkage between Mendelian traits (or between a trait and a marker, or between two markers).
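For the simplest case of phase-known meioses, the comparison described above reduces to the ratio of the likelihood at a proposed recombination fraction θ to the likelihood at θ = 0.5 (no linkage), expressed as a base-10 logarithm. The sketch below is a minimal illustration under that assumption; the function name and the counts are hypothetical, and real pedigrees with unknown phase or missing data require dedicated software.

```python
import math


def lod_linkage(recombinants, meioses, theta):
    """LOD = log10[ L(theta) / L(0.5) ] for phase-known meioses.

    L(theta) = theta**R * (1 - theta)**(N - R), with R recombinants out of N meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    non_recombinants = meioses - recombinants
    if theta == 0.0:
        if recombinants > 0:
            return -math.inf  # any recombinant rules out complete linkage
        log_likelihood = 0.0
    else:
        log_likelihood = (recombinants * math.log10(theta)
                          + non_recombinants * math.log10(1.0 - theta))
    return log_likelihood - meioses * math.log10(0.5)


# Hypothetical data: 2 recombinants observed in 20 informative meioses.
for theta in (0.05, 0.1, 0.2, 0.3, 0.5):
    print(theta, round(lod_linkage(2, 20, theta), 3))
```

In this toy data set the LOD peaks near θ = 0.1, the observed recombinant fraction, and is exactly zero at θ = 0.5; by convention a LOD of 3 or more is taken as evidence for linkage.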

Major Types of Genetic Markers

Prior to conducting QTL mapping (for quantitative traits) or linkage mapping (for categorical traits), a linkage map must be constructed, that is, a map of genetic markers with known chromosomal positions. These topics are discussed here.

Morphological/Classical/Visible Marker

These markers are themselves phenotypic traits or characters, e.g., flower color, seed characteristics, growth habit, or pigmentation. Indeed, the underlying concept of a QTL was laid down by Sax (1923), who observed an association between seed pigmentation (a marker) and seed weight (the phenotype).

Biochemical Markers

Isoenzymes are allelic variants of enzymes that can be detected by electrophoresis and appropriate staining.

Molecular Markers

In genetics, a molecular marker is a fragment of DNA associated with a particular location in the genome. Molecular markers are used in molecular biology and biotechnology to identify a particular DNA sequence in a pool of unknown DNA.

Types of Molecular Markers

There are several types of genetic markers, each with its own benefits and drawbacks. They are classified into three categories: “First Generation Markers,” “Second Generation Markers,” and “New Generation Markers” (Khan 2015). These markers can show dominant or co-dominant inheritance, and knowing which applies helps to distinguish heterozygotes from homozygotes within a population. Co-dominant markers are more useful because they reveal more than one allele at a time, allowing a specific trait to be followed through mapping. These markers allow particular sequences within the genome to be amplified for comparison and analysis. Molecular markers are effective because they identify linkage between identifiable locations within a chromosome and can be repeated for verification. When used in large numbers, they can detect small differences within the mapping population, allowing mapping species to be distinguished and traits and identities to be segregated. They can be used to locate particular positions on chromosomes, allowing physical maps to be created. Finally, they can be used to determine how many alleles an individual has for a particular trait at a specific locus (i.e., bi-allelic or poly-allelic). It is important to note that a marker must be polymorphic in order to be useful for detecting differences between individuals of the same or different species. Monomorphic markers, for which all members of a study population have the same genotype, cannot differentiate between individuals and are therefore useless for gene mapping.
As noted above, genomic markers have particular strengths and weaknesses, so the properties of a marker must be considered and understood before it is used. A RAPD marker, for example, is dominant (identifying only one distinguishing band) and can suffer from poor reproducibility, usually owing to the conditions under which the assay was run. RAPD analysis also rests on the assumption that two samples producing the same band share the same locus. Currently, SNP markers have revolutionized genomic studies. They are typically available at very high density, allowing very accurate association mapping through genome-wide association studies (GWAS). GWAS is an extremely important tool in human disease studies as well as in agricultural applications in livestock and crops.

Linkage Map

A genetic linkage map is a chromosomal map of a species or population that shows the relative positions of known genes or markers in terms of recombination frequency rather than specific physical coordinates on each chromosome. A genetic linkage map and a physical map differ in that the former places molecular markers in units of genetic recombination, whereas the latter locates markers by their physical base-pair positions. Differences between the two types of map can reveal recombination “hot spots” within the genome. Linkage mapping is essential for finding genes linked to hereditary diseases or production traits. In a typical population, genetic characteristics and markers occur in all possible combinations, and how often a particular combination is found depends on the frequency of each gene. A linkage map is built from the frequencies of recombination between markers during crossing over of homologous chromosomes. The more recombination (segregation) there is between two genetic markers, the farther apart they are on the linkage map. Conversely, the shorter the physical distance between two markers, the more often they are inherited together. Traditionally, the markers used were measurable characteristics (eye color, enzyme production) derived from coding DNA sequences.

Procedure of Constructing a Linkage Map

1. Frequency of recombination: First, the recombination frequency between all pairs of markers is estimated. The recombination frequency is the proportion of individuals in which a recombination event is observed between the pair of markers. Note that a recombination is observed only if there is an odd number of crossovers between the pair of markers; an even number of crossovers is not seen as a recombination event. The principle is that the closer two markers are, the lower the recombination frequency. Very distant or unlinked markers have a recombination frequency of 0.5.
2. Genetic distance: For small recombination frequencies, the recombination frequency is approximately equal to the genetic distance between the pair of markers, but as the recombination frequency increases, the genetic distance becomes larger than the recombination frequency (allowing for the multiple crossovers mentioned above). The relationship between recombination frequency and genetic distance is called the mapping function. The Haldane and Kosambi mapping functions are commonly used (Weller 2009). Map distances are expressed in Morgans (M) (1 Morgan = the distance over which one crossover is expected on average), though they are usually given in centiMorgans (cM).
3. Marker ordering: Once the distances between pairs of markers have been computed, the markers are ordered to produce a consensus map across the genome. Sets of markers that are entirely unlinked are placed in separate linkage groups which, if the mapping is done with sufficient marker density and enough individuals, will correspond to particular chromosomes. This is a computationally intensive step. The first software to produce a linkage map was MAPMAKER (Lander et al. 1987).
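The first two steps of the procedure above can be made concrete with the Haldane and Kosambi mapping functions, which convert a recombination frequency r into a map distance. The sketch below uses their standard forms, d = −½ ln(1 − 2r) for Haldane and d = ¼ ln[(1 + 2r)/(1 − 2r)] for Kosambi (both in Morgans), and reports distances in centiMorgans. The function names and the example counts are illustrative.

```python
import math


def recombination_frequency(n_recombinant, n_total):
    """Proportion of individuals showing a recombination between two markers."""
    return n_recombinant / n_total


def haldane_cm(r):
    """Haldane map distance in cM (assumes no crossover interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance in cM (allows for crossover interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Hypothetical example: 10 recombinants among 100 individuals.
r = recombination_frequency(10, 100)
print(r, round(haldane_cm(r), 3), round(kosambi_cm(r), 3))

# For small r the distance is close to 100 * r; it grows faster as r approaches 0.5.
for value in (0.01, 0.1, 0.3, 0.45):
    print(value, round(haldane_cm(value), 1), round(kosambi_cm(value), 1))
```

Both functions return values close to 100 × r for tightly linked markers and diverge from it as r approaches 0.5, which is why map distances, unlike recombination frequencies, can be added along a chromosome.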

Future of QTL Mapping Of the proposed 19,000 genes in human (Ezkurdia et al. 2014) in the OMIM database, less than 2000 genetic gene variants have been linked to a linked to a particular phenotype. Conventional family linkage study is still the most effective approach for detecting the mutations’ phenotypic traits in the residual genes, particularly for high penetrance mutations like mutations resulting in severe loss-of-function. Using a hybrid linkage/LD technique, recessive traits homozygosity mapping in suitable populations can prove to be much more effective in the gene identification process. Statistical geneticists also use QTL mapping to figure out how complicated the genetic structure is that underlies a phenotypic trait. For instance, they might want to recognize whether a particular phenotype is influenced by a large number of independent loci or a small number of loci,

QTL Analysis


and whether those loci interact. This could provide insight into how the phenotype is changing. DNA microarrays have recently been used to integrate classic QTL studies with gene expression profiling. The resulting expression QTL (eQTL) describe cis- and trans-regulatory elements that influence the expression of genes, many of which are associated with disease. Cross-validating candidate genes against metabolic pathway databases and peer-reviewed literature databases has proved successful in determining the gene responsible.

Genetic Improvement Programs for Dairy Cattle and Other Livestock

Quantitative traits are phenotypes or characteristics whose intensity can be measured on a numerical scale. Numerous quantitative traits, like dairy goat milk yield and sheep fleece weight, are characterized by a continuous distribution of values. Other traits, like the number of eggs laid by laying hens or the litter size of pigs, take whole-number values because they are counts. Some quantitative traits are recorded on a scale of multiple ordered categories, while others are binary. Binary traits, such as the classification of animals as healthy or unwell, are those in which each individual falls into one of two possible outcomes. Body condition scores in beef cattle and marbling scores of beef are examples of multiple-category outcomes. Numerous genes, as well as environmental factors, affect quantitative traits. As a result, quantitative traits are frequently called complex traits.

There are two main goals of dairy cattle selection programs: 1) dairy cattle that can efficiently produce large quantities of milk; and 2) dairy cattle that can resist infectious diseases, metabolic disorders, infertility, and other health problems that lead to high veterinary bills and early culling from the herd. Effective dairy cattle breeding programs are based on sound principles of quantitative genetics and tend to rely on cooperation among commercial dairy producers, government agencies, non-profit organizations, breeding companies, and agricultural educational institutions. Every organization plays a critical role in collecting data, processing it for quality control and statistical analysis, and then using it for educational and outreach programs or product development.

Infrastructure of Data Collection

Milk Recording

The establishment and widespread use of milk recording has considerably aided genetic selection programs in dairy cattle. Data collection standards ensure quality control and govern the day-to-day collection of


Asif Nadeem and Maryam Javed

millions of data points regarding milk volume, milk fat percentage, milk protein percentage, lactose percentage, urea nitrogen content, and somatic cell count. In addition, information regarding individual dairy animals, such as birth dates and calving dates, is gathered through the milk recording program. In the past, these data were recorded manually by testers who travelled to farms on a monthly or bi-monthly basis and, while collecting the milk samples, copied this information from on-farm record-keeping systems. At present, most of the information regarding birth dates, calving dates, inseminations, culling dates, pedigrees, health problems, and other important events comes directly from on-farm herd management software, such as DairyComp 305 (www.vas.com), DHI-Plus (www.dhiprovo.com), and PCDart (www.drms.org).

Pedigree Information

The task of collecting and maintaining pedigree information has been carried out by breed associations since the late 1800s and is necessary for the genetic improvement of cattle.

Health, Fertility, Calving Ability, and Longevity Data

The collection of data on fitness traits, such as health, fertility, calving ability, and longevity, has lagged behind that of production and conformation traits (Berger 1994).

Progeny Testing

Progeny testing programs have been central to genetic improvement in dairy cattle for over 50 years. Progeny testing entails purchasing potentially elite young bulls from pedigree breeders on the basis of a high parent average (PA) for important traits, collecting more than 1000 units of semen from each bull, and then distributing that semen to dozens or even hundreds of co-operator herds.

Estimating Breeding Values

Pre-Adjustment for Known Effects of the Environment

Almost all economically important traits in dairy cattle are quantitative in nature, meaning that they are measured on a continuous scale. In general, every trait can be described by the following simple model, in which a phenotype is the result of genotype and environmental factors (phenotype = genotype + environmental factors). In this equation, phenotype denotes an individual’s observable characteristic, while genotype represents the


animal’s genetic influence on that trait, and environmental factors represent the aggregate of all non-genetic influences. To remove the known environmental effects, statistical corrections must be applied to produce standardized records (Cole et al. 2009). The estimate of the additive genetic part of this model is known as the estimated breeding value (EBV). The approach was originally developed by Henderson (1984), although the work dates back to the 1960s, and it has been a crucial selection tool for animal breeders.
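The model phenotype = genotype + environmental factors can be made concrete with a deliberately simplified sketch. It assumes that each record has already been assigned to a group of animals that shared the same environment, expresses each record as a deviation from its group mean, and shrinks that deviation by an assumed heritability (the textbook own-record predictor). This is not the mixed-model methodology used in national evaluations; the animal names, group labels, yields, and heritability value are hypothetical.

```python
from collections import defaultdict


def simple_ebv(records, h2):
    """Own-record breeding value estimates.

    records: list of (animal_id, group_id, phenotype) tuples
    h2:      assumed heritability of the trait (between 0 and 1)
    Returns a dict mapping animal_id to h2 * (phenotype - group mean).
    """
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("heritability must lie between 0 and 1")

    # Accumulate the sum and count of phenotypes within each group.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _, group, phenotype in records:
        sums[group] += phenotype
        counts[group] += 1
    means = {group: sums[group] / counts[group] for group in sums}

    # Deviation from the group mean removes the shared environmental effect;
    # multiplying by h2 keeps only the part expected to be additive genetic.
    return {animal: h2 * (phenotype - means[group])
            for animal, group, phenotype in records}


if __name__ == "__main__":
    # Hypothetical 305-day milk yields (kg) from two herds.
    data = [
        ("cow1", "herd_A", 9000.0),
        ("cow2", "herd_A", 8000.0),
        ("cow3", "herd_B", 7000.0),
        ("cow4", "herd_B", 7500.0),
    ]
    for animal, ebv in simple_ebv(data, h2=0.3).items():
        print(animal, round(ebv, 1))
```

In this toy example cow4 receives a higher estimate than cow2 even though her raw yield is lower, because she is compared only with animals from her own herd.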

Contemporary Groups

Although records can be standardized for known factors such as age, milking frequency, and lactation length, traits are also influenced by various environmental factors that are not recorded. Examples include the exact weather conditions, the effectiveness of the heat abatement devices used on the farm (e.g., fans, sprinklers, or soakers), the farm’s vaccination program, and the feed offered to each animal, such as the type, quantity, or grade of particular forages, concentrates, and additives, or the energy content of the total mixed diet. As a result, breeders resort to the contemporary group concept, in which animals that shared the same environment are compared with one another.

Genetic Evaluations of Animal Models

Farmers often choose the mates for particular cows in a non-random way, through corrective mating, to remedy flaws in physical characteristics or inadequacies in milk output, reproductive capacity, or udder health. They may also try to maximize the value of costly semen by using it on the most desirable cows. Even after known and unknown environmental factors have been accounted for by grouping animals into contemporary groups, non-random mating must still be taken into account. As a result, an adjustment for the merit of mates is required.

Four Paths of Selection

The rate of genetic progress can be described in terms of four paths of selection, defined by the sex of the parent and of the offspring. The first path, sires of males, represents the outstanding males that are chosen to sire the next generation of young bulls. The second path, sires of females, represents a larger group of males whose semen is used to breed the general cow population in order to produce replacement females for commercial farms. The third path, dams of males, represents an elite group of females, frequently called bull dams, that are mated to bulls from the sires-of-males group in order to produce bull calves.
The fourth path, dams of females, describes the vast


majority of female cattle on commercial farms, which are kept primarily for milk production rather than for breeding. These cows, also known as commercial cows, are mated to bulls from the sires-of-females group to initiate lactation and to produce replacement heifers.
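The way the four paths combine into a single rate of genetic progress can be sketched with the classical formula of Rendel and Robertson, in which the annual gain equals the sum of the genetic superiorities of the selected parents across the four paths divided by the sum of their generation intervals. The chapter does not state this formula, so it is offered here only as an illustration; the path labels and all numerical values are hypothetical.

```python
def annual_genetic_gain(superiority, interval):
    """Annual genetic gain over the four paths of selection.

    superiority: dict of genetic superiority of selected parents per path
                 (in trait units or genetic standard deviations)
    interval:    dict of generation interval per path, in years
    Returns sum(superiority) / sum(interval).
    """
    if set(superiority) != set(interval):
        raise ValueError("both inputs must describe the same paths")
    total_interval = sum(interval.values())
    if total_interval <= 0:
        raise ValueError("generation intervals must sum to a positive value")
    return sum(superiority.values()) / total_interval


if __name__ == "__main__":
    # Hypothetical values: SM = sires of males, SF = sires of females,
    # DM = dams of males, DF = dams of females.
    superiority = {"SM": 1.0, "SF": 0.6, "DM": 0.5, "DF": 0.1}
    interval = {"SM": 6.0, "SF": 7.0, "DM": 5.0, "DF": 6.0}
    gain = annual_genetic_gain(superiority, interval)
    print(f"Expected gain per year: {gain:.3f} genetic standard deviations")
```

Under these assumed numbers the dams-of-females path contributes little to the numerator, which reflects the weak selection applied to commercial cows.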

Increased Productivity through Selection

Milk Production

Milk protein and fat yields, as well as their percentages, are all evaluated genetically. Rather than being consumed as fluid milk, the bulk of milk is used to make cheese and other processed dairy products. As a result, increasing the output of milk solids is more important than improving milk volume or the percentage of milk protein or fat. There are two approaches to increasing total milk solids per cow: (1) raising milk yield without changing the protein and fat percentages, or (2) raising the protein and fat percentages while maintaining milk yield.

Maintenance Costs and Milk Production Efficiency

Selecting for greater milk yield raises the gross efficiency (GE) of milk production by diluting maintenance costs (VandeHaar et al. 2006). By improving management and keeping cows at high multiples of maintenance, farm profitability can be increased significantly. Capper et al. (2009) also observed that milk production per cow can be increased by selecting for better genetics and managing cows more efficiently.

Functional Traits Selection

Calving Performance

The genetic evaluations for calving ability comprise service sire (direct) calving ease, daughter (maternal) calving ease (DCE), service sire (direct) stillbirth rate (SSB), and daughter (maternal) stillbirth rate.

Fertility in Males and Females

Fertility is determined by two factors: the fertility of the bull’s semen and the fertility of the cow that is inseminated (VanRaden et al. 2004; Weigel 2004).


Sire Selection

Rather than relying on individual culling thresholds and risk management, geneticists usually recommend that dairy farmers use a selection index, in which all traits are combined according to their economic values, to choose the bulls to use in their herds. The squared correlation between an animal’s estimated breeding value and its true breeding value is known as reliability (REL). Gender-enhanced semen was introduced in dairy cattle about a decade ago, and its use has since gained widespread acceptance (Norman et al. 2010). Because a dairy farmer cannot decide which heifers are best suited for breeding with costly gender-enhanced semen unless the recorded pedigree information is correct and complete, the availability of gender-enhanced semen has increased interest in genetics and pedigree recording. Genetic improvement programs in dairy cattle that utilize quantitative trait loci have achieved outstanding progress because of the collaborative efforts of milk recording organizations, data processing centers, breed associations, government agencies, insemination companies, and agricultural universities.

Design for Livestock QTL Detection and Implications for MAS

An enhanced form of selecting animals for breeding uses information on QTL that have been mapped. Animals are genotyped for the relevant markers and selected on the basis of favourable marker alleles; this process is called marker-assisted selection (MAS). MAS in cattle comes in a variety of forms, and its usefulness for genetic improvement depends on the precision with which the QTL have been characterized. Three types of marker loci can be distinguished:
1) Functional mutations: loci that carry the functional polymorphism itself and can be genotyped.
2) LD markers: loci that are in population-wide linkage disequilibrium with the functional mutation.
3) LE markers: loci that are in population-wide linkage equilibrium with the functional mutation.
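The selection index recommended under Sire Selection above can be sketched in a few lines of Python: each bull’s breeding values are multiplied by the economic value of the corresponding trait and summed, and bulls are then ranked on the total. The trait names, breeding values, and economic weights below are hypothetical and serve only to show the mechanics; they do not represent any published index.

```python
def selection_index(ebvs, weights):
    """Weighted sum of breeding values for one animal.

    ebvs:    dict mapping trait name to estimated breeding value
    weights: dict mapping trait name to economic value per unit of the trait
    """
    missing = set(weights) - set(ebvs)
    if missing:
        raise KeyError(f"missing breeding values for: {sorted(missing)}")
    return sum(weights[trait] * ebvs[trait] for trait in weights)


def rank_bulls(bulls, weights):
    """Return bull names sorted from highest to lowest index value."""
    return sorted(bulls,
                  key=lambda name: selection_index(bulls[name], weights),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical economic weights and breeding values.
    weights = {"milk_solids": 1.0, "fertility": 10.0, "calving_ease": 4.0}
    bulls = {
        "bull_A": {"milk_solids": 40.0, "fertility": -1.0, "calving_ease": 0.5},
        "bull_B": {"milk_solids": 25.0, "fertility": 2.0, "calving_ease": 1.0},
    }
    for name in rank_bulls(bulls, weights):
        print(name, selection_index(bulls[name], weights))
```

With these assumed numbers bull_B ranks first despite a lower breeding value for milk solids, which is the kind of trade-off that independent culling thresholds on single traits cannot capture.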


GWAS and QTL

Genome-wide association studies (GWAS), also called whole-genome association studies (WGAS), and expression QTL (eQTL) analysis are two newer methodologies, in comparison with classical QTL mapping, for analysing the relationships between numerous genetic variants (usually SNPs) and traits of interest. GWAS, like QTL mapping, contributes to our knowledge of genome-trait connections. GWAS data are based on genome maps, while QTL data are based on linkage maps. To represent both forms of data in parallel, the genome maps are matched with their corresponding linkage maps. Two strategies have been used to align the maps: (i) linearly scaling both the genome map and the linkage map to the same base length so that their map positions are visibly comparable, and (ii) using anchor markers to transfer map information between them. Unlike QTL mapping, GWAS maps on a much finer scale, owing to the much greater marker density compared with the microsatellite markers typically used in QTL mapping. In addition, GWAS is not restricted to particular crosses or family structures, so it can be used for gene mapping in a commercial herd environment.

Meta-Analysis Methodology

G. Glass proposed meta-analysis in 1976 as a technique for integrating and summarising the results of a collection of studies (Glass 1976). Meta-analysis has since become a well-established analytical technique in a wide range of disciplines, particularly in the clinical, behavioral, and social sciences (Hedges 1985). Meta-analysis is the application of basic statistical concepts to situations in which only summary material (e.g., published reports) is available, rather than the original unit record data. A meta-analysis done correctly allows a more objective assessment of the data, which may help to resolve ambiguity and dispute.
Meta-analysis makes the literature review process more transparent, compared with traditional narrative reviews where it is often not clear how the conclusions follow from the data examined (Egger et al. 1998). The application of meta-analysis to QTL detection is recent (Goffinet et al. 2000; Hayes et al. 2001; Khatkar et al. 2004). In comparison to any one study, pooling the results from multiple studies can provide a more precise and consensus estimate of the location of a QTL and its effect. However,


merging the results of QTL mapping across studies presents numerous problems, including differences in study design, sample size, marker density, linkage maps, and the statistical methods used.
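The gain in precision from pooling can be illustrated with a minimal fixed-effect, inverse-variance weighted average, one common way of combining summary estimates. It is not necessarily the method used in the QTL meta-analyses cited above, and it ignores the between-study differences just listed. The positions and standard errors in the example are hypothetical.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted mean of study estimates.

    estimates:  per-study estimates (e.g., QTL position in cM)
    std_errors: per-study standard errors of those estimates
    Returns (pooled estimate, standard error of the pooled estimate).
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


if __name__ == "__main__":
    # Hypothetical QTL positions (cM) reported by three studies.
    positions = [42.0, 48.0, 45.0]
    errors = [6.0, 4.0, 5.0]
    pooled, se = pool_fixed_effect(positions, errors)
    print(f"Pooled position: {pooled:.1f} cM (SE {se:.1f} cM)")
```

The pooled standard error is always smaller than the smallest individual standard error, which is the sense in which pooling gives a more precise consensus estimate.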

Genomic Selection

Another important tool added to the animal breeder’s toolbox is genomic selection (Hayes et al. 2009). It uses markers across the entire genome to arrive at the best prediction of genetic merit, and it adds value to conventional EBVs by deriving genomic EBVs (gEBVs). Unlike GWAS, which aims to detect specific genes associated with the trait of interest, genomic selection treats the genome as a ‘black box’: the purpose is to find the combination of SNP information across the genome that best predicts the breeding value, without considering the biological function of any of the SNP markers that make up this selection index. This tool is delivering substantial progress in livestock breeding.
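The ‘black box’ character of a gEBV can be seen in a minimal sketch of the prediction step. It assumes that SNP effects have already been estimated in a reference population of genotyped and phenotyped animals (the estimation step is not shown), and it simply sums allele dosage times effect over all markers. The marker effects, animal names, and genotypes are hypothetical.

```python
def genomic_ebv(dosages, snp_effects):
    """Genomic breeding value as the sum of allele dosage times SNP effect.

    dosages:     allele counts per SNP for one animal, each 0, 1, or 2
    snp_effects: previously estimated additive effect of each SNP
    """
    if len(dosages) != len(snp_effects):
        raise ValueError("one dosage is required per SNP effect")
    if any(d not in (0, 1, 2) for d in dosages):
        raise ValueError("dosages must be 0, 1, or 2")
    return sum(d * b for d, b in zip(dosages, snp_effects))


if __name__ == "__main__":
    # Hypothetical effects for three SNPs and genotypes for two calves.
    effects = [0.5, -0.2, 0.1]
    calves = {"calf_1": [2, 0, 1], "calf_2": [0, 2, 1]}
    for name, genotype in calves.items():
        print(name, round(genomic_ebv(genotype, effects), 2))
```

Nothing in this calculation depends on what any individual SNP does biologically; in practice the sum runs over tens of thousands of markers rather than three.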

Conclusion

It has now been shown that QTL with significant effects can be found in livestock and that, with sufficient skills, capital, and technology, the gold standard of QTL mapping, identification of the functional mutations, can be achieved. Hence, QTL mapping, together with a full complement of modern genomics-based quantitative genetics tools, must be transferred from academic and research institutions to the animal breeding sector.

References

Berger, P. Jeffrey. “Genetic prediction for calving ease in the United States: Data, models, and use by the dairy industry.” Journal of Dairy Science 77, no. 4 (1994): 1146-1153. https://doi.org/10.3168/jds.S0022-0302(94)77051-X.
Bink, M. C. A. M., M. P. Boer, C. J. F. Ter Braak, J. Jansen, R. E. Voorrips, and W. E. Van de Weg. “Bayesian analysis of complex traits in pedigreed plant populations.” Euphytica 161, no. 1 (2008): 85-96. https://doi.org/10.1007/s10681-007-9516-1.
Capper, Jude L., Roger A. Cady, and Dale E. Bauman. “The environmental impact of dairy production: 1944 compared with 2007.” Journal of Animal Science 87, no. 6 (2009): 2160-2167. https://doi.org/10.2527/jas.2009-1781.
Egger, Matthias, and George Davey Smith. “Meta-analysis bias in location and selection of studies.” BMJ 316, no. 7124 (1998): 61-66. https://doi.org/10.1136/bmj.316.7124.61.


Cole, J. B., D. J. Null, and P. M. VanRaden. “Best prediction of yields for long lactations.” Journal of Dairy Science 92, no. 4 (2009): 1796-1810. https://doi.org/10.3168/jds.2007-0976.
Ezkurdia, Iakes, David Juan, Jose Manuel Rodriguez, Adam Frankish, Mark Diekhans, Jennifer Harrow, Jesus Vazquez, Alfonso Valencia, and Michael L. Tress. “Multiple evidence strands suggest that there may be as few as 19 000 human protein-coding genes.” Human Molecular Genetics 23, no. 22 (2014): 5866-5878. https://doi.org/10.1093/hmg/ddu309.
Glass, Gene V. “Primary, secondary, and meta-analysis of research.” Educational Researcher 5, no. 10 (1976): 3-8. https://doi.org/10.3102/0013189X005010003.
Goffinet, Bruno, and Sophie Gerber. “Quantitative trait loci: a meta-analysis.” Genetics 155, no. 1 (2000): 463-473. https://doi.org/10.1093/genetics/155.1.463.
Haley, Chris S., and Sarah A. Knott. “A simple regression method for mapping quantitative trait loci in line crosses using flanking markers.” Heredity 69, no. 4 (1992): 315-324. https://doi.org/10.1038/hdy.1992.131.
Hayes, Ben J., Phillip J. Bowman, Amanda J. Chamberlain, and Michael E. Goddard. “Invited review: Genomic selection in dairy cattle: Progress and challenges.” Journal of Dairy Science 92, no. 2 (2009): 433-443. https://doi.org/10.3168/jds.2008-1646.
Hayes, Ben, and Mike E. Goddard. “The distribution of the effects of genes affecting quantitative traits in livestock.” Genetics Selection Evolution 33, no. 3 (2001): 1-21. https://doi.org/10.1186/1297-9686-33-3-209.
Khan, Faheema. “Molecular markers: an excellent tool for genetic analysis.” J Mol Biomark Diagn 6, no. 03 (2015): 233. https://doi.org/10.4172/2155-9929.1000233.
Khatkar, Mehar S., Peter C. Thomson, Imke Tammen, and Herman W. Raadsma. “Quantitative trait loci mapping in dairy cattle: review and meta-analysis.” Genetics Selection Evolution 36, no. 2 (2004): 163-190. https://doi.org/10.1051/gse:2003057.
Knapp, S. J. “Using molecular markers to map multiple quantitative trait loci: models for backcross, recombinant inbred, and doubled haploid progeny.” Theoretical and Applied Genetics 81, no. 3 (1991): 333-338. https://doi.org/10.1007/BF00228673.
Lander, Eric S., and David Botstein. “Mapping mendelian factors underlying quantitative traits using RFLP linkage maps.” Genetics 121, no. 1 (1989): 185-199. https://doi.org/10.1093/genetics/121.1.185.
Lander, E. S., P. Green, J. Abrahamson, A. Barlow, M. J. Daly, S. E. Lincoln, and L. A. Newberg. “MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations.” Genomics 1 (1987): 181. https://doi.org/10.1016/0888-7543(87)90010-3.
Martinez, O., and R. N. Curnow. “Estimating the locations and the sizes of the effects of quantitative trait loci using flanking markers.” Theoretical and Applied Genetics 85, no. 4 (1992): 480-488. https://doi.org/10.1007/BF00222330.
Moradi Marjaneh, M., I. C. A. Martin, Edwin P. Kirk, Richard P. Harvey, C. Moran, and P. C. Thomson. “QTL mapping of complex binary traits in an advanced intercross line.” Animal Genetics 43 (2012): 97-101. https://doi.org/10.1111/j.1365-2052.2012.02383.x.
Morton, Newton E. “Sequential tests for the detection of linkage.” American Journal of Human Genetics 7, no. 3 (1955): 277.


Norman, H. D., J. L. Hutchison, and R. H. Miller. “Use of sexed semen and its effect on conception rate, calf sex, dystocia, and stillbirth of Holsteins in the United States.” Journal of Dairy Science 93, no. 8 (2010): 3880-3890. https://doi.org/10.3168/jds.2009-2781.
Rosyara, U. R., J. L. Gonzalez-Hernandez, K. D. Glover, K. R. Gedye, and J. M. Stein. “Family-based mapping of quantitative trait loci in plant breeding populations with resistance to Fusarium head blight in wheat as an illustration.” Theoretical and Applied Genetics 118, no. 8 (2009): 1617-1631. https://doi.org/10.1007/s00122-009-1010-9.
Sax, Karl. “The association of size differences with seed-coat pattern and pigmentation in Phaseolus vulgaris.” Genetics 8, no. 6 (1923): 552.
VandeHaar, Michael J., and Norman St-Pierre. “Major advances in nutrition: Relevance to the sustainability of the dairy industry.” Journal of Dairy Science 89, no. 4 (2006): 1280-1291. https://doi.org/10.3168/jds.S0022-0302(06)72196-8.
VanRaden, P. M., A. H. Sanders, M. E. Tooker, R. H. Miller, H. D. Norman, M. T. Kuhn, and G. R. Wiggans. “Development of a national genetic evaluation for cow fertility.” Journal of Dairy Science 87, no. 7 (2004): 2285-2292. https://doi.org/10.3168/jds.S0022-0302(04)70049-1.
Weigel, K. A. “Improving the reproductive efficiency of dairy cattle through genetic selection.” Journal of Dairy Science 87 (2004): E86-E92. https://doi.org/10.3168/jds.S0022-0302(04)70064-8.
Weller, Joel Ira. Quantitative trait loci analysis in animals. CABI, 2009.

Chapter 8

Epigenetics and Its Applications

Abstract

Only a small portion of the enormous diversity in phenotypic traits can be explained by genomic data alone. It is highly probable that some of the unexplained variation is due to the epigenome. The epigenome comprises the biological molecules that take part in chromatin remodeling, DNA methylation, and modification of histone tails, together with additional molecular species, such as non-coding RNAs, that play a role in conveying epigenetic information. The study of an organism’s epigenome is called epigenetics; it deals with heritable phenotypic modifications that do not involve changes in the DNA sequence itself. Epigenetic alterations affect the activation of specific genes without changing their coding sequence. These changes act by altering the architecture of the DNA or the surrounding chromatin proteins, resulting in gene silencing or activation. The majority of epigenetic modifications occur only within the lifespan of a single organism; nevertheless, certain epigenetic modifications can be passed down to the organism’s progeny. The epigenome can modulate gene expression in response to signals such as nutrition, infection, and climate that are present in the cell or in the organism’s external or internal environment, resulting in particular phenotypic outcomes. In this chapter, we describe current knowledge about epigenetic processes and their applications in fields such as health, nutrition, medicine, and livestock production.

Keywords: epigenomes, heritable phenotypic modifications, DNA sequence, gene silencing, gene activation

Introduction

Genetics is the study of inherited variation. It lays the foundation of life. DNA contains all of an organism’s genetic information. It carries the information that the cell requires to perform the functions necessary for the growth, survival, and reproduction of an organism. Behavior and appearance of an


organism depend on inherited genetic material. The environment can also have an impact on gene expression. Within the past few decades, scientists have progressed from investigating single genes to analysing thousands of genes and the whole genomes of organisms. The science of genomics has rapidly expanded from determining DNA sequences (the order of nucleotides in a given fragment of DNA) to studying the roles of genes and proteins and their expression profiles at a more functional level (Del Giacco and Cattaneo 2012). Roughly 150 years after Gregor Mendel’s discovery of inheritance in pea plants, genetics has gained importance and is developing ever more quickly. In the 1940s, Waddington coined the term epigenetics (using the Greek prefix epi-, which means above) to describe the development of a single zygote into many different kinds of cells in an organism and the observation of non-Mendelian features of heredity within the same DNA. Epigenetics is the study of heritable changes in phenotype or gene expression that can be influenced by environmental factors and that are independent of changes in the DNA sequence; the mechanisms involved include histone modifications, DNA methylation, and some non-coding RNAs (Khatib 2012).

Words that carry different meanings for different people have long had a place in biology. Epigenetics is an extreme case, having more than one meaning and several definitions with different origins. According to Conrad Hal Waddington, epigenetics is the study of the complex developmental processes that lie between the genotype and the phenotype (Waddington 1957). Arthur Riggs and his colleagues, however, defined epigenetics as the study of mitotically and/or meiotically heritable changes in gene function that cannot be explained by changes in DNA sequence (Bird 2007). The embryologist Conrad Waddington coined the word epigenetics in 1942 by associating it with epigenesis, a concept dating from the 17th century.
In the following years, studies of gene regulation were pursued within the framework of genetics rather than epigenetics. In the 21st century, however, epigenetics gained recognition and research on it increased strongly (Deichmann 2016). The literal meaning of the term epigenetics is “in addition to changes in genetic sequence.” The term has evolved to include mechanisms that change the activity of a gene without altering the DNA sequence and that lead to changes transmissible to daughter cells, although research has shown that some epigenetic changes can be reversed. In the decades since the concept was introduced, scientists and researchers have been trying to trace modifications of gene function other than sequence changes. Today, evidence relating epigenetic mechanisms to a wide


variety of health indicators, illnesses, and behaviors has been found. These include cognitive dysfunction, almost all types of cancer, and cardiovascular, autoimmune, respiratory, reproductive, and neurobehavioral diseases. Basic nutrients, hormones, bacteria, viruses, pesticides, heavy metals, radioactivity, diesel exhaust, polycyclic aromatic hydrocarbons, and tobacco smoke are suspected or known drivers of epigenetic mechanisms. Several epigenetic mechanisms have been identified, including acetylation, ubiquitylation, sumoylation, phosphorylation, and methylation. More epigenetic conditions and processes are expected to emerge as investigations proceed. These mechanisms occur naturally and are considered important for the proper functioning of organisms, but defects in epigenetic processes can lead to behavioral effects and adverse health outcomes (Weinhold 2006). The molecular events of epigenetics reflect the way in which the environment is involved in regulating the genomes of organisms. Epigenetic mechanisms lead to changes in phenotypic characteristics such as physiology, behavior, cognition, and appearance (Powledge 2011).

Rise of Epigenetics

Historically, the concept of epigenetics originated in the early 19th century. Scientists in the nineteenth century regarded development and inheritance as the same problem. While the science of genetics was developing rapidly, developmental biologists and embryologists were using procedures and mechanisms that took little account of genes and gene action (Holliday 2006). The history of epigenetics is associated with the study of development and evolution. In the early nineteenth century, the French biologist Jean-Baptiste Lamarck tried to explain evolution by proposing that organisms can acquire transmissible characteristics over their lifetime and that these acquired characteristics are responsible for evolution (Van Soom et al. 2014). A link between Lamarck’s ideas and epigenetics can be drawn, since both describe the interaction of an organism with its environment (Babenko et al. 2012). In the 1880s, however, August Weismann rejected the idea of the inheritance of acquired characteristics after an experiment on twenty-two generations of mice, in which he showed that the loss of the tail was not inherited (Weismann et al. 1891).


In 1859, Charles Darwin suggested that the process of natural selection is important for evolution. According to his theory of survival of the fittest, individuals that are well adapted and suited to their particular environment are more likely to survive and reproduce than those that are less fit. Hence, successful individuals have a greater number of progeny than others and are able to transmit their heritable characteristics to the next generation. The theory thus suggested that the environment plays a major role in the evolution of species (Kovalchuk and Kovalchuk 2012). Darwin explained heredity by proposing another theory, pangenesis. This theory suggested that the cells of an organism, while experiencing and responding to environmental changes, also shed tiny particles called gemmules. These gemmules assemble in the gonads and are involved in the development and transmission of traits to the progeny (Liu 2008). It has been suggested that microRNAs could be potential mediators of gene-environment interaction (Qureshi and Mehler 2012).

Conrad Waddington (1905-75) was the first scientist to establish a relationship between developmental biology and genetics, and he introduced the concept of epigenetics (Holliday 2006). His idea was revolutionary for its time. Embryologists of that era believed that genes had very little role in development, for example controlling only some inconsequential phenotypic traits such as eye color. Waddington proposed, with the help of the epigenetic landscape, that genes and gene regulation control cell fate and the way in which cells become specialized (Waddington 1957). Waddington’s paper on the epigenotype was published in the journal Endeavour in 1942.
Drawing on his strong background in genetics, he explained the concept of epigenetics and loosely defined it as the branch of biology that studies the causal mechanisms between genes and gene products which bring the phenotype into being (Bhattacharya et al. 2011). Waddington tied epigenetics to embryology. He developed the concept by observing the interaction between genes and the environment in Drosophila, showing that a temperature shock after puparium formation causes morphological changes in the flies (crossveinless wings). The terms introduced by Waddington were not widely used until the 21st century, when epigenetics gained recognition and became popular. He did not give an exact definition, and the term evolved with the passage of time (Van Speybroeck 2002). The term epigenetics has also evolved as knowledge of the molecular mechanisms that regulate gene expression in eukaryotes has increased over the past fifty years. It has been understood that


all somatic cells of an organism have essentially the same DNA, while patterns of gene expression vary among different types of cells and can be clonally inherited. Hence, epigenetics can be defined as the branch of biology that studies mitotically and/or meiotically inherited alterations in gene function that are independent of changes in the DNA sequence. Until the middle of the twentieth century, the term epigenetics was mostly used for the regulated processes that start from the genetic material and shape the final product, that is, the developmental processes leading from the fertilized zygote to the mature organism (Felsenfeld 2014). An essential attribute of epigenetics is that the same genome can exhibit alternative phenotypes as a result of different epigenetic states. The term has been evolving since it was introduced, and epigenetics can now be defined as the study of molecular processes within and around the DNA molecule that control gene activity without relying on the nucleotide sequence of the DNA and that can be inherited through meiosis or mitosis (Kumar and Singh 2016).

Epigenetic Mechanisms The term epigenetics has advanced from describing the complex interactions of environment and genome during differentiation and development in higher organisms to denoting heritable changes other than modifications of the DNA sequence. Histone modification and DNA methylation, the epigenetic alterations or tags, modify chromatin structure and DNA accessibility and thereby regulate patterns of gene expression. These mechanisms are essential for the normal differentiation and development of distinct cell lineages in the adult organism. They can be influenced by external agents, leading to environmental modification of phenotypic traits. Epigenetic programming plays a vital role in regulating pluripotency genes, which can be inactivated during differentiation (Loscalzo and Handy 2014). Epigenetic variation can also be promoted by environmental factors. It has further been argued that epigenetics plays a crucial role in evolution and is a primary source of the molecular mechanisms involved in natural selection (Pigliucci 2007).

At its core, the concept of epigenetics is comparatively simple: genes can be turned on or off through an inherited epigenome. In practice, however, these processes are strongly influenced by many environmental and biological events. For example, methylation has different effects depending on the histone residue that is modified, as with H3K36me3 and H3K9me3, and interpretation is further complicated by differences among the organisms studied, such as plants, mammals, worms, flies, yeast, tumor cells and ciliated protozoans (Lim et al. 2010). Epigenetic mechanisms are crucial for the normal development of an organism because they provide the cellular memory needed to maintain the phenotype through mitosis. This is accomplished by chromatin changes that alter patterns of gene expression, including post-translational modifications of histones and covalent modification of DNA. Some of the main epigenetic mechanisms are described briefly below.

DNA Methylation DNA methylation is the most commonly studied epigenetic process and is used by the cell to establish and maintain a controlled pattern of gene expression (Quina et al. 2006). The most frequent epigenetic modification is the methylation of the fifth carbon of a cytosine residue by a DNA methyltransferase to form 5-methylcytosine (5mC) (Kumar et al. 2017). The DNA methyltransferase family of enzymes catalyzes the transfer of a methyl group to the 5th position of cytosine residues in CpG islands located within or near promoter regions. In mammals, DNA methylation has frequently been correlated with gene silencing, as it underlies genome-wide regulation and the inheritance of lineage-specific gene silencing between cell generations (Robertson and Wolffe 2000). DNA methylation patterns are specific to each cell type and help to define cell identity (Szyf 2005a). Methylated DNA is mostly associated with inactive chromatin, whereas unmethylated DNA is associated with an active chromatin configuration (Szyf 2005b). Among the covalent epigenetic modifications, DNA methylation is the only one that occurs on the DNA sequence itself. It is crucial for stem cell differentiation as well as for embryonic development (Bröske et al. 2009; Li et al. 1992; Reik et al. 2001). 5-Methylcytosine (methylation of DNA on cytosine) is found mostly in eukaryotic genomes and sometimes in bacterial genomes. In mammals,
DNA methylation occurs mainly within CpG dinucleotides (C followed by G), although methylation of cytosines in other contexts has also been found in some invertebrates, plants and fungi (Zemach et al. 2010). DNA methylation is heritable because methylation at CpG sites is not lost during DNA replication (Holliday and Pugh 1975; Riggs 1975). Methyltransferases, the family of proteins that maintain DNA methylation in cells, add methyl groups to cytosines of newly synthesized DNA during replication. Methylation can be reversed by demethylases, enzymes that remove methyl groups from DNA (Khatib 2012). The DNA methyltransferase family includes DNMT1, which maintains methylation after DNA replication by using the parental strand as a template for methylating the daughter strand (Bestor et al. 1992; Pradhan et al. 1999). DNMT2 is the smallest mammalian DNA methyltransferase. Its role is unclear: some studies report that it has DNA methylation activity, whereas others found none (Hermann et al. 2003; Kunert et al. 2003; Liu et al. 2003; Rai et al. 2007; Tang et al. 2003). DNMT3A and DNMT3B are the de novo methyltransferases of the family; they establish new patterns of DNA methylation by adding methyl groups to unmethylated DNA, especially during gametogenesis and early embryonic development (Okano et al. 1999; Okano et al. 1998). DNMT3L is a protein related to the DNA methyltransferase family, but it lacks the methyltransferase motifs and therefore cannot methylate DNA. It nevertheless assists the de novo methylation activity of DNMT3A by interacting and forming a dimer with it (Jia et al. 2007). DNMT3L is also involved in the recruitment of HDAC1 (histone deacetylase 1) and in the recognition of histone H3 tails that are unmethylated at lysine 4 (Deplus et al. 2002; Ooi et al. 2007; Turek-Plewa and Jagodzinski 2005).
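
The template-copying role attributed to DNMT1 above can be illustrated with a small sketch. The Python below is a toy model under stated assumptions, not a published algorithm: the function names `cpg_sites` and `maintain_methylation` and the example sequence are hypothetical, methylation is represented simply as a set of cytosine positions, and the complementarity of the two strands is ignored.

```python
def cpg_sites(seq):
    """Return the 0-based position of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def maintain_methylation(seq, parental_marks):
    """Toy maintenance step: a CpG on the daughter strand is methylated
    only if the same CpG is methylated on the parental template."""
    return {pos for pos in cpg_sites(seq) if pos in parental_marks}


sequence = "ATCGGCGTACGA"   # hypothetical sequence with CpGs at positions 2, 5 and 9
parental = {2, 9}           # assumed methylated CpG cytosines on the parental strand
daughter = maintain_methylation(sequence, parental)
print(cpg_sites(sequence), sorted(daughter))
```

In this simplified picture the unmethylated CpG at position 5 stays unmethylated on the daughter strand, which is why a maintenance enzyme alone preserves an existing pattern but cannot create a new one; that task falls to the de novo methyltransferases.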

DNA Methylation and Gene Expression DNA methylation is an epigenetic process in which a methyl group (CH3) is added to DNA, modifying gene function and thus affecting gene expression. The promoter of a gene must be readily accessible to regulatory elements such as enhancers and transcription factors for
transcription of the gene to occur properly (Watt and Molloy 1988). DNA methylation can prevent transcription factor binding and change chromatin structure, restricting the access of transcription factors to the gene promoter. For instance, methylated CpGs are bound by methyl-CpG-binding domain proteins, which recruit repressor complexes that modify histones. Histone proteins are key components of chromatin and act as spools around which DNA is wrapped; they can themselves be altered. Histone modification by repressor complexes leads to heterochromatin, a more condensed chromatin structure, in contrast to euchromatin, the open and active structure that is important for transcription (Bird and Wolffe 1999; Jones et al. 1998; Nan et al. 1998). Defects in DNA methylation can lead to disease, affecting embryogenesis, genomic imprinting and cancer (Lim and Maher 2010). Environmental and nutritional agents can influence DNA methylation, and the resulting changes in gene expression can produce diverse phenotypes with increased or decreased disease risk and productivity (Ibeagha-Awemu and Zhao 2015).

Histone Modifications The nucleosome, the basic unit of chromatin, is formed by tightly packaged eukaryotic DNA. Histone proteins form an octamer containing two copies each of H2A, H2B, H3 and H4. The DNA molecule wraps around this octamer core, which provides the capacity to regulate gene expression and maintains structural stability. Each core histone in the nucleosome consists of a globular domain and a highly dynamic N-terminal tail that extends from it. Histone tails can undergo various post-translational modifications, including phosphorylation, acetylation, sumoylation, methylation, ADP-ribosylation, glycosylation, proline isomerization, propionylation, ubiquitylation, butyrylation and citrullination (Gardner et al. 2011). Histone acetylation is one of the best understood histone modifications, and its disruption can cause tumorigenesis and aberrant gene expression. It is controlled by HATs (histone acetyltransferases) and HDACs (histone deacetylases), which also function as transcriptional co-activators and co-repressors, respectively. Histone acetylation is a reversible modification of particular residues in the histone tails (Cairns 2001; Yang 2004).

Levels of histone acetylation vary among chromatin domains. Acetylation is thought to be present at low levels throughout most of the genome, probably reflecting an equilibrium between HAT and HDAC activities (Vogelauer et al. 2000). Histone acetylation strongly affects chromatin structure, the interaction of transcription-regulatory proteins with their target DNA in chromatin, and the modulation of gene expression at different levels. For example, removal of acetyl groups by HDACs produces a compact chromatin structure that represses transcription. Low levels of histone acetylation may also induce other epigenetic modifications such as DNA methylation, leading to permanent gene silencing (Cairns 2001; Yang 2004).

Chromatin Variation The epigenome of a cell comprises the complete set of epigenetic marks present in the cell at a given point in time, for example DNA methylation, histone tail modifications (methylation, acetylation, ubiquitylation, etc.), chromatin remodeling, and other molecules capable of transmitting information through gene regulation, such as non-coding RNA species (e.g., long non-coding RNAs and microRNAs) (Rakyan et al. 2011). Biochemical and genetic research has provided evidence linking covalent histone modifications with well-established epigenetic phenomena. Gene expression is controlled essentially by chromatin structure. The eukaryotic genome is organized into euchromatin (transcriptionally active) and heterochromatin (transcriptionally inactive) domains. The assembly and regulation of chromatin structure are controlled by three major factors: histone modification, DNA methylation and ATP-dependent chromatin-remodeling complexes. Various ATP-dependent chromatin-remodeling complexes have been observed in vertebrates, such as the NuRD/Mi-2 complex, PRC1 and PRC2, ISWI and the SWI/SNF family. The SWI/SNF family, the most studied complex, has eleven subunits encoded by twenty genes (Khatib 2012). Studies in Drosophila revealed that more than a hundred genes encode important components of heterochromatin. Many of these genes are conserved from flies to humans, for example the histone H3K9 methyltransferase Su(var)3-9 and heterochromatin protein 1 (HP1). It has also been suggested that modifications that alter charge, such as phosphorylation and acetylation, can directly modify the
physical attributes of the chromatin fiber, which can in turn alter higher-order structures (Schotta et al. 2003). Chromatin remodeling and the incorporation of specialized histone variants give the cell additional tools for modifying the chromatin template. ATP-dependent chromatin remodeling complexes are thought to change chromatin accessibility by altering histone-DNA interactions, either by sliding or by ejecting nucleosomes (Goldberg et al. 2007). Histone variants such as H2A.Z and H3.3 have their own modification patterns and are exchanged within chromosomal domains by dedicated chaperones and exchange machinery (Polo and Almouzni 2006). Links exist between the covalent and noncovalent mechanisms; for example, effectors can include subunits of nucleosome remodeling complexes (Wysocka et al. 2006). Histone variants, covalent modifications and nucleosome remodeling work together to introduce significant changes in the chromatin fiber, and their combined role in epigenetics is being actively explored (Goldberg et al. 2007).

Noncoding RNA Transposons are repeated DNA sequences that constitute about thirty-five percent of mammalian genomes, and noncoding RNAs constitute about 1.2 percent of the eukaryotic genome. In humans only two percent of the genome is translated into proteins. RNAs that are not translated into proteins are known as noncoding RNAs (ncRNAs) and perform several cellular functions, including gene regulation. Besides its intermediary function between DNA and protein, RNA therefore plays a distinct role in gene regulation through both ncRNAs and coding RNAs (Mattick et al. 2006). Several classes of noncoding RNAs, such as miRNAs (microRNAs), lncRNAs (long ncRNAs) and siRNAs (small interfering RNAs), regulate gene expression at various levels, including transcription, splicing, mRNA degradation and translation (Kaikkonen et al. 2011). Small interfering RNAs are double-stranded RNAs that mediate post-transcriptional silencing and can induce heterochromatin formation by recruiting histone deacetylase complexes (Grewal 2010). Mature microRNA is a single-stranded RNA 18-24 nucleotides in length, produced by cleavage of a precursor RNA by the RNase III enzymes DROSHA and DICER. MicroRNAs control gene expression either by translational repression or by targeting specific mRNAs for degradation. Gene expression can also be controlled by
microRNAs that recruit chromatin-modifying complexes to DNA by binding to regulatory regions, thereby changing chromatin conformation (Chuang and Jones 2007; Hutvágner and Zamore 2002; Lee et al. 2003). LincRNAs, a subset of long noncoding RNAs, are highly conserved across species. They are known to establish cell type–specific epigenetic states by guiding chromatin-modifying complexes to particular genomic loci. The pluripotency transcription factors NANOG and OCT4 regulate the expression of long noncoding RNAs during embryonic development and contribute to lineage-specific gene expression. LincRNAs also have significant roles in developmental mechanisms such as genomic imprinting and X-chromosome inactivation (Guttman et al. 2009; Mohamed et al. 2010; Ponting et al. 2009). RNA, especially noncoding RNA, has been shown to play a major part in controlling various epigenetic processes. A prominent example is dosage compensation, which is mediated by XIST RNA in mammals and by roX RNAs in Drosophila. RNA involvement extends to the silencing of genes and of repetitive DNA sequences by PTGS (post-transcriptional gene silencing) and TGS (transcriptional gene silencing) RNAi (RNA interference)-related pathways, respectively, in nearly all eukaryotes. To achieve stable silencing, these RNAs work with components of the DNA methylation machinery and of chromatin. PTGS-inducing RNAs such as siRNAs and miRNAs are not considered epigenetic in nature, whereas TGS-inducing RNAs such as small RNAs in S. pombe, Xist RNA and repeat-associated siRNAs are regarded as more truly epigenetic because they can induce long-term silencing that is passed on to the next generation through cell division (Bernstein and Allis 2005).

Inheritance and Epigenetics Epigenetic inheritance has a critical role in biological mechanisms such as transposon silencing, imprinting and gene expression during early embryonic development. Evidence suggests that epigenetic effects can be transferred from generation to generation (Triantaphyllopoulos et al. 2016). Two kinds of epigenetic inheritance must be distinguished: (a) transgenerational epigenetic inheritance through the germ line, in which gene expression patterns are controlled and can be inherited by the next
generation (Daxinger and Whitelaw 2012); and (b) inheritance of epigenetic marks in the soma line, in which the marks are conserved through mitosis and passed on to the next cell generation (Jablonka and Raz 2009). Some of the conditions and mechanisms of epigenetic inheritance are discussed here.

X-Chromosome Inactivation To compensate for the imbalance in X-linked gene dosage, one of the two X-chromosomes in females is silenced through X-chromosome inactivation (Heard and Disteche 2006). At the blastocyst stage of early embryogenesis, embryos with more than one X-chromosome (XXY males and XX or XXX females) undergo random X-chromosome inactivation. The X inactivation center (XIC), a master switch locus, regulates X-chromosome inactivation by controlling the expression of the lincRNA gene X-inactive specific transcript (XIST) and its antisense transcription unit (TSIX). In the mouse, for example, the XIC senses the number of X-chromosomes in the cell and is involved in the random silencing of one X-chromosome. Silencing is achieved by Xist RNA, which coats every future inactive X-chromosome, together with recruitment of polycomb repressive complex 2 and the addition of silencing chromatin marks such as DNA methylation at CpG-rich promoters, H3 lysine 27 methylation and hypoacetylation of histones H3 and H4. Regulation of this process, which limits each diploid cell to one active copy of the X-chromosome, is not yet completely understood (Heard and Disteche 2006; Inbar-Feigenberg et al. 2013; Morey and Avner 2010; Okamoto and Heard 2009).

Genomic Imprinting Genomic imprinting is an epigenetic mechanism in which the female and male germ lines place specific marks, called imprints, on particular chromosomal regions. These imprints provide the potential for monoallelic, parent-of-origin-specific expression in certain cell types or during development (Reik and Walter 2001). Autosomal genes are normally expressed from both the maternal and the paternal allele, whereas imprinted genes are expressed predominantly from either the paternal or the maternal allele in a parent-of-origin-specific manner.
That is, an imprinted gene is expressed from the maternal allele while the paternal copy is silenced, or vice versa. Imprinted genes therefore show parent-of-origin-specific expression and do not follow the pattern of Mendelian inheritance. When imprinted genes are clustered together they form imprinted domains. These domains contain imprinting centers (ICs), which regulate the imprinted genes and are characterized by differentially methylated regions (DMRs). DMRs carry parent-of-origin-specific histone modification and DNA methylation marks. Cis-acting differentially methylated regions and trans-acting factors form the basis of the parent-of-origin-specific expression of imprinted genes. For instance, the IGF2/H19 IC (IGF2, insulin-like growth factor 2) regulates expression of the paternal IGF2 allele and the maternal H19 allele; the two genes lie 90 kb apart in the same imprinted domain (Choufani et al. 2010; Gabory et al. 2010; Tycko and Morison 2002).
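
The parent-of-origin rule described for IGF2 and H19 can be written out as a minimal sketch. The Python below is purely illustrative: the lookup table `EXPRESSED_PARENT` and the function `expressed_alleles` are hypothetical names, the table holds only the two genes mentioned in the text, and real imprinting is often tissue- and stage-specific, which this toy model ignores.

```python
# Assumed toy table: the parental allele from which each imprinted gene is expressed.
EXPRESSED_PARENT = {"IGF2": "paternal", "H19": "maternal"}


def expressed_alleles(gene, maternal_allele, paternal_allele):
    """Return the allele(s) expressed for a gene under the toy imprinting rule."""
    parent = EXPRESSED_PARENT.get(gene)
    if parent == "maternal":
        return [maternal_allele]          # paternal copy silenced
    if parent == "paternal":
        return [paternal_allele]          # maternal copy silenced
    return [maternal_allele, paternal_allele]  # non-imprinted gene: biallelic expression


print(expressed_alleles("IGF2", "A", "a"))
print(expressed_alleles("H19", "B", "b"))
```

The same genotype thus yields a different expressed allele depending only on which parent contributed it, which is why imprinted traits depart from Mendelian expectations.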

Environment and Inheritance Experiments have long shown that traits can be passed from one generation to the next. In the early 19th century, the naturalist Jean-Baptiste Lamarck proposed a role for the environment in inheritance. He suggested that the environment can induce heritable changes in living cells within a generation or two (Jablonka and Lamb 1999). Later, in the early 20th century, further experiments were reported in support of Lamarckian principles. Ivan Pavlov, a physiologist and physician, conducted an experiment on mice and reported that the progeny learned a maze faster than their parents (Pandian and Sugiyama 2013). Afterwards, it was shown that vitamin B, a methyl donor, has a marked effect on DNA methylation and that the resulting methylation favors the birth of healthy pups that were not susceptible to diabetes (Waterland et al. 2006). Environmentally induced epigenetic marks can thus be acquired by individuals, forming a type of transgenerational memory. Environmentally induced changes in the epigenome have been followed in the methylation pattern of genomic DNA for at least three generations. A good example is seen in sheep, where the diet of grandmothers affected the weight of their granddaughters (Daxinger and Whitelaw 2010; Nijland et al. 2008). Nutrition and diet in humans and animals have been a major research focus for health reasons, and this work has yielded significant understanding regarding
potential epigenetic mechanisms that might be involved. In animals, nutritional changes can modify histones as well as DNA methylation patterns, and specific effects of particular nutrients on histone modification and DNA methylation have also been reported (Triantaphyllopoulos et al. 2016).

Inheritance of Phenotypic Variation in Livestock Epigenetics can be an essential source of knowledge about complex inherited diseases and traits, and this information can be very useful in animal breeding. In other words, livestock genetics and breeding can be improved with the help of epigenetics. As discussed earlier, the epigenome is shaped by mechanisms such as histone modification, DNA methylation, regulation of gene expression by non-coding RNAs, chromatin remodeling and any other force that alters the phenotype of the animal. These processes can affect gene expression, behavior, phenotypic plasticity and cell fate (Bannister and Kouzarides 2011; González-Recio et al. 2015; Ibeagha-Awemu and Zhao 2015). The relationship between epigenetics and phenotype has been most evident in disease. Aberrant epigenetic pathways have been found in osteoarthritis, atherosclerosis, imprinting disorders, improper gene inactivation in cancer, neuropsychiatric disorders and lupus erythematosus (Ballestar et al. 2006; Esteller 2002; Feinberg et al. 2006; Lund et al. 2004; Roach and Aigner 2007). Epigenomic abnormalities have also been associated with developmental disorders, mental and metabolic diseases and late-onset adult disorders. In livestock, epigenetic studies have concentrated mainly on the molecular features that control the expression of certain genes or genomic regions, including in response to external environmental factors (Attig et al. 2010).

Applications of Epigenetics The study of epigenetics has changed the way scientific research is done; it serves as a valuable tool in many fields and can be used to improve the quality of life of animals.

Epigenetics and Livestock Some epigenetic mechanisms have been used in livestock production and breeding to obtain desired traits. Quantitative traits are among the most economically important traits in livestock, and some of them are partially regulated by imprinted genes. Genomic imprinting is a form of epigenetic regulation in which the expression of an allele depends on the parent from which it was inherited. Imprinted genes in livestock affect traits such as milk yield, fetal development, carcass traits, meat and fat deposition, and growth. Livestock breeding programs therefore need to take imprinting into account and adapt their standard protocols. It has been suggested that more attention should be paid to the maternal contribution and that separate breeding values for females and males, additive genetic variances and dominance deviations will be needed (Zeric 2012). One of the most commonly studied mammalian imprinted genes is insulin-like growth factor 2 (IGF2), which encodes IGF-II, a fetal mitogenic protein structurally related to insulin. Interest in the role of IGF2 in livestock has increased. IGF2 is considered to play a crucial role in important production traits such as milk and meat production in dairy and beef cattle (O’Dell and Day 1998; Rijnkels et al. 2010; Zeric 2012). Epigenetics may help to account for the missing heritability of complex diseases and traits as well as missing causalities in animal breeding. Environmental factors such as pollution, diet, drugs and stress, among many others, modify not only the life of the individual but also its DNA methylation patterns (Petronis 2010). Some environmental factors are thought to be particularly likely to increase methylation, and these patterns may cause phenotypic differences among individuals.
Parameters might be measured more accurately by removing this noise from the phenotype (González-Recio 2012). Epigenetic information could greatly reduce the occurrence of disease and the use of antibiotics in animal production programs. DNA methylation has been widely used to develop personalized medicine in human cancer research, and the approach may also be useful in veterinary medicine. For example, drugs can be developed to modify methylation patterns associated with genomic regions linked to certain diseases (Gomez and Ingelman‐Sundberg 2009; Peedicayil 2008). These perspectives make epigenetics an interesting field of research, as its potential application could lead to the
management and breeding of livestock in a more sustainable and efficient manner.

Epigenetic Changes and Cloning of Domestic Animals Early embryo development and gametogenesis are thought to be the crucial periods for major changes in DNA methylation. Compared with differentiated somatic cells, gametes (sperm and oocytes) have relatively low DNA methylation levels (Phutikanit et al. 2010). In most mammalian species, both the male and the female pronucleus lose DNA methylation soon after fertilization, although the speed and mechanism of demethylation differ between them. The male pronucleus of the zygote loses methylation rapidly through active demethylation, which occurs in the absence of DNA replication or transcription. Involvement of KAT9 (also known as Elp3), a component of the elongator complex, has been reported (Okada et al. 2010). The female pronucleus, in contrast, undergoes a stepwise decrease in DNA methylation with each round of DNA replication in the absence of functional DNMT1 (DNA methyltransferase 1). Because the newly synthesized DNA strand is not methylated, the overall level of DNA methylation falls, and this replication-dependent process is referred to as passive demethylation. Extensive conservation of demethylation of the paternal genome has been reported in mouse, rat, bovine and pig embryos (Dean et al. 2001). Subsequent quantitative studies of paternal demethylation, however, revealed species variation. Active demethylation occurs only partially in the cow and has not been seen in rabbit and sheep zygotes (Beaujean et al. 2004). The dynamics of global DNA methylation have been compared among zygotes of cattle, goats, mice, sheep, rats, pigs and rabbits, and the species studied were classified into three categories.
The categorization was based on the DNA methylation state of the male pronucleus at the zygotic stage of embryo development. Rat and mouse were classified as type I species, in which the male pronucleus is actively demethylated almost to completion. In type II species (rabbit and sheep), paternal DNA methylation is largely maintained. In type III species (goat and cattle), the male pronucleus is partially demethylated (Park et al. 2007). Another study suggested that goat embryos resemble those of rabbit and sheep, which are classified as type II (Hou et al. 2006).
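
The stepwise, replication-dependent (passive) demethylation described above lends itself to a short worked calculation. The sketch below is an idealization rather than a measured model: it assumes that maintenance methylation is completely absent, so that each round of replication pairs every methylated parental strand with an unmethylated new strand and the methylated fraction halves. The function name `passive_demethylation` is hypothetical.

```python
def passive_demethylation(initial_fraction, rounds):
    """Methylated fraction after each replication round, assuming no
    maintenance methylation: every round halves the previous level."""
    levels = [initial_fraction]
    for _ in range(rounds):
        levels.append(levels[-1] / 2)
    return levels


# Starting from a fully methylated genome, follow three rounds of replication.
print(passive_demethylation(1.0, 3))  # [1.0, 0.5, 0.25, 0.125]
```

Under these assumptions about three rounds of replication are enough to bring methylation below 15 percent of its starting level, which illustrates why passive demethylation is gradual while active demethylation of the male pronucleus can be rapid and replication-independent.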

Although epigenetic modifications during early embryo development differ among domestic species, the underlying phenomenon is the same: a series of events affecting histones and DNA during the complex stages of reprogramming and development. A wide range of aberrations in both histone modifications and DNA methylation has been observed in cloned embryos. Other aspects of epigenetics, namely chromatin architecture and noncoding RNA, are considered the next vital areas of research in somatic cell nuclear transfer (Khatib 2012).

Epigenetics and Nutrition Nutrigenomics is an interesting topic at the interface of epigenetics and nutrition. Its main focus is nutrient-epigenetic regulation in animals. Nutrigenomics helps in understanding the association between dietary components and genetic functions, the effects of nutrients on gene expression, and the activity of dietary constituents. Research in nutrigenomics is still at an early stage, especially in farm animals. It is becoming increasingly clear that a relationship exists between the genome and the external environment, such as nutrient availability and temperature. The existence of an organism depends not only on its DNA sequence but also on external factors, and the genomic sequence only partly determines the phenotype of an individual. Cells, whether free-living or part of a multicellular organism, must respond quickly to changes in the external environment, such as nutrient availability or temperature, in order to survive and thrive under such conditions. The interplay between environment and genes has a vital role in determining the resistance of humans and animals to different types of stress. Phenotypic plasticity (change in the phenotype of an organism) is known to be caused by certain nutritional and environmental factors. Animal and human nutrition have been transformed by the understanding that nutrients can modulate the molecular mechanisms underlying the physiological functions of an organism (Mutch et al. 2005). The relationship between nutrient-epigenetic regulation and phenotype is well illustrated by the volatile fatty acids (VFAs) butyrate, acetate and propionate, also known as short-chain fatty acids (SCFAs), and by the regulation of gene expression that they control.
Short-chain fatty acids are important not only for their nutritional value; they (especially butyrate) also modulate apoptosis, motility, proliferation and differentiation of cells and induce cell cycle arrest.

Asif Nadeem and Maryam Javed

Butyrate is also known for its therapeutic potential against cancer and is being intensively investigated (Myzak and Dashwood 2006). The cell cycle regulatory effects of butyrate have also been investigated in normal bovine cells at the molecular and cellular levels. The modification observed in cells treated with butyrate or other histone deacetylase (HDAC) inhibitors is global hyperacetylation of histones. The associations among cell cycle progression, overall chromosome stability, chromatin structure, DNA replication and histone posttranslational modifications have become increasingly clear (Wolffe and Guschin 2000). Alterations in epigenetic mechanisms such as microRNA and histone posttranslational modification can modify the phenotypic characteristics of animals. Regulation of gene expression provides several pathways through which cells can control their responses to external stimuli. A great deal of research is under way to uncover the molecular phenomena of epigenetics. With the development of novel biotechnological techniques, epigenetic approaches will help to identify genome-wide epigenetic markers that are involved in regulation by dietary components. The characterization and identification of those epigenetic markers will help to clarify the role of dietary factors in regulating epigenetic mechanisms. This will offer a chance to understand the role and action of dietary constituents in modifying epigenetic patterns, and it will also play an essential role in functional genomic research on bovines and the wider farm animal industry. There is a pressing need to better understand the functions of, and the differences between, epigenetics and epigenomics; both fields are highly dynamic and complicated. Standardized procedures and platforms are required for developing maps of epigenomic markers and for epigenomic research on farm animals. Much work remains to be done on the efficient recognition of epigenomic markers and on understanding the elements that induce modifications in these markers. Considerable international coordination is required to achieve this aim. It is expected that the epigenetic landscape will take a prominent place in animal sciences (Khatib 2015). Although epigenetics is among the most rapidly developing fields of molecular biology, nutrigenomic studies on domestic farm animals are still in their early stages. Butyrate, in addition to its nutritional value in cattle, is also involved in inducing histone modifications and altering several biological mechanisms in bovine cells.

Epigenetics and Its Applications

These mechanisms include apoptosis, cellular differentiation and cell cycle arrest (Khatib 2012).

Epigenetics and Therapeutics

A number of studies associate epigenetic modifications with several developmental processes, such as axis patterning and differentiation, genomic imprinting and X-chromosome inactivation, and with various adult and childhood disorders such as mental retardation, chromosomal instabilities and cancer. Although genetics plays a substantial role in the complexity of these processes, unveiling epigenetic processes may be key to understanding the basic mechanisms responsible for the development and regulation of such diseases. Abnormal epigenetic modifications have been found in various human diseases, including autoimmune diseases, neurological disorders and cancers (Portela and Esteller 2010). Compared with epigenetic studies on human diseases, less research has been done on animals. Abnormal DNA methylation has been reported in cattle clones with developmental dysfunctions, indicating that maintaining a normal epigenetic status is essential for domestic animals. The effect of epigenetics on various diseases has been studied in several model animals, including mice and rats. Large-offspring syndrome (LOS), a bovine developmental disorder, is known to have epigenetic components that arise during embryonic development. Large-offspring syndrome is linked with reproductive technologies such as somatic-cell nuclear transfer and in vitro fertilization, which are mainly used in cattle. Its symptoms include difficulty in standing and breathing, overgrowth of organs, increased birth weight, immunological and skeletal defects, and increased rates of neonatal and fetal death (Garry et al. 1996; Kruip and Den Daas 1997; Walker et al. 1996). Marek’s disease virus causes Marek’s disease in chickens, which leads to the formation of T-cell lymphoma and can affect chickens as well as other birds. Vaccines have been developed against the disease, but they have not been a complete success (Davison and Nair 2004). One study found that levels of DNA methylation in thymus cells decreased when these cells were exposed to the virus. It was also reported that viral propagation in infected cells slowed after in vitro pharmacological inhibition of DNA methylation. It has been suggested that susceptibility or resistance to the virus may be associated with DNA methylation in the host (Tian et al. 2013).

Conclusion

Epigenetics can be understood as the control of the expression of genomic DNA. Although a great deal of research has been done on epigenetics, much more is still needed. Rapid advances in technology are greatly accelerating the epigenetic and genomic profiling of various diseases. Epigenetic mechanisms may prove to be useful tools for improving essential production traits in livestock.

References

Attig, Linda, Anne Gabory, and Claudine Junien. “Early nutrition and epigenetic programming: chasing shadows.” Current Opinion in Clinical Nutrition & Metabolic Care 13, no. 3 (2010): 284-293. https://doi.org/10.1097/MCO.0b013e328338aa61.
Babenko, Olena, Igor Kovalchuk, and Gerlinde A. Metz. “Epigenetic programming of neurodegenerative diseases by an adverse environment.” Brain research 1444 (2012): 96-111. https://doi.org/10.1016/j.brainres.2012.01.038.
Ballestar, E., M. Esteller, and B. C. Richardson. “The epigenetic face of systemic lupus erythematosus.” The Journal of Immunology 176, no. 12 (2006): 7143-7147. https://doi.org/10.4049/jimmunol.176.12.7143.
Bannister, Andrew J., and Tony Kouzarides. “Regulation of chromatin by histone modifications.” Cell research 21, no. 3 (2011): 381-395. https://doi.org/10.1038/cr.2011.22.
Beaujean, Nathalie, Geraldine Hartshorne, Jennifer Cavilla, Jane Taylor, John Gardner, Ian Wilmut, Richard Meehan, and Lorraine Young. “Non-conservation of mammalian preimplantation methylation dynamics.” Current Biology 14, no. 7 (2004): R266-R267. https://doi.org/10.1016/j.cub.2004.03.019.
Bernstein, Emily, and C. David Allis. “RNA meets chromatin.” Genes & development 19, no. 14 (2005): 1635-1655. https://doi.org/10.1101/gad.1324305.
Bestor, Timothy H., Glenn Gundersen, Anne-Brit Kolstø, and Hans Prydz. “CpG islands in mammalian gene promoters are inherently resistant to de novo methylation.” Genetic Analysis: Biomolecular Engineering 9, no. 2 (1992): 48-53. https://doi.org/10.1016/1050-3862(92)90030-9.
Bhattacharya, Sudin, Qiang Zhang, and Melvin E. Andersen. “A deterministic map of Waddington’s epigenetic landscape for cell fate specification.” BMC systems biology 5, no. 1 (2011): 1-12. https://doi.org/10.1186/1752-0509-5-85.

Bird, Adrian. “Perceptions of epigenetics.” Nature 447, no. 7143 (2007): 396. https://doi.org/10.1038/nature05913.
Bird, Adrian P., and Alan P. Wolffe. “Methylation-induced repression—belts, braces, and chromatin.” Cell 99, no. 5 (1999): 451-454. https://doi.org/10.1016/S0092-8674(00)81532-9.
Bröske, Ann-Marie, Lena Vockentanz, Shabnam Kharazi, Matthew R. Huska, Elena Mancini, Marina Scheller, Christiane Kuhl et al. “DNA methylation protects hematopoietic stem cell multipotency from myeloerythroid restriction.” Nature genetics 41, no. 11 (2009): 1207-1215. https://doi.org/10.1038/ng.463.
Cairns, Bradley R. “Emerging roles for chromatin remodeling in cancer biology.” Trends in cell biology 11 (2001): S15-S21. https://doi.org/10.1016/S0962-8924(01)82074-2.
Choufani, Sanaa, Cheryl Shuman, and Rosanna Weksberg. “Beckwith–Wiedemann syndrome.” American Journal of Medical Genetics Part C: Seminars in Medical Genetics 154, no. 3 (2010): 343-354. https://doi.org/10.1002/ajmg.c.30267.
Chuang, Jody C., and Peter A. Jones. “Epigenetics and microRNAs.” Pediatric research 61, no. 7 (2007): 24-29. https://doi.org/10.1203/pdr.0b013e3180457684.
Davison, Fred, and Venugopal Nair, eds. Marek’s disease: an evolving problem. Elsevier, 2004.
Daxinger, Lucia, and Emma Whitelaw. “Transgenerational epigenetic inheritance: more questions than answers.” Genome research 20, no. 12 (2010): 1623-1628. https://doi.org/10.1101/gr.106138.110.
Daxinger, Lucia, and Emma Whitelaw. “Understanding transgenerational epigenetic inheritance via the gametes in mammals.” Nature Reviews Genetics 13, no. 3 (2012): 153-162. https://doi.org/10.1038/nrg3188.
Dean, Wendy, Fátima Santos, Miodrag Stojkovic, Valeri Zakhartchenko, Jörn Walter, Eckhard Wolf, and Wolf Reik. “Conservation of methylation reprogramming in mammalian development: aberrant reprogramming in cloned embryos.” Proceedings of the National Academy of Sciences 98, no. 24 (2001): 13734-13738. https://doi.org/10.1073/pnas.241522698.
Deichmann, Ute. “Epigenetics: The origins and evolution of a fashionable topic.” Developmental biology 416, no. 1 (2016): 249-254. https://doi.org/10.1016/j.ydbio.2016.06.005.
Del Giacco, Luca, and Cristina Cattaneo. “Introduction to genomics.” Molecular Profiling (2012): 79-88. https://doi.org/10.1007/978-1-60327-216-2_6.
Deplus, Rachel, Carmen Brenner, Wendy A. Burgers, Pascale Putmans, Tony Kouzarides, Yvan de Launoit, and François Fuks. “Dnmt3L is a transcriptional repressor that recruits histone deacetylase.” Nucleic acids research 30, no. 17 (2002): 3831-3838. https://doi.org/10.1093/nar/gkf509.
Esteller, Manel. “CpG island hypermethylation and tumor suppressor genes: a booming present, a brighter future.” Oncogene 21, no. 35 (2002): 5427-5440. https://doi.org/10.1038/sj.onc.1205600.
Feinberg, Andrew P., Rolf Ohlsson, and Steven Henikoff. “The epigenetic progenitor origin of human cancer.” Nature reviews genetics 7, no. 1 (2006): 21-33. https://doi.org/10.1038/nrg1748.

Felsenfeld, Gary. “A brief history of epigenetics.” Cold Spring Harbor perspectives in biology 6, no. 1 (2014): a018200. https://doi.org/10.1101/cshperspect.a018200.
Gabory, Anne, Hélène Jammes, and Luisa Dandolo. “The H19 locus: role of an imprinted non-coding RNA in growth and development.” Bioessays 32, no. 6 (2010): 473-480. https://doi.org/10.1002/bies.200900170.
Gardner, Kathryn E., C. David Allis, and Brian D. Strahl. “Operating on chromatin, a colorful language where context matters.” Journal of molecular biology 409, no. 1 (2011): 36-46. https://doi.org/10.1016/j.jmb.2011.01.040.
Garry, F. B., R. Adams, J. P. McCann, and K. G. Odde. “Postnatal characteristics of calves produced by nuclear transfer cloning.” Theriogenology 45, no. 1 (1996): 141-152. https://doi.org/10.1016/0093-691X(95)00363-D.
Goldberg, Aaron D., C. David Allis, and Emily Bernstein. “Epigenetics: a landscape takes shape.” Cell 128, no. 4 (2007): 635-638. https://doi.org/10.1016/j.cell.2007.02.006.
Gomez, A., and M. Ingelman-Sundberg. “Pharmacoepigenetics: its role in interindividual differences in drug response.” Clinical Pharmacology & Therapeutics 85, no. 4 (2009): 426-430. https://doi.org/10.1038/clpt.2009.2.
González-Recio, Oscar. “Epigenetics: a new challenge in the post-genomic era of livestock.” Frontiers in genetics 2 (2012): 106. https://doi.org/10.3389/fgene.2011.00106.
González-Recio, Oscar, Miguel Angel Toro, and Alex Bach. “Past, present and future of epigenetics applied to livestock breeding.” Frontiers in genetics 6 (2015): 305. https://doi.org/10.3389/fgene.2015.00305.
Grewal, Shiv I. S. “RNAi-dependent formation of heterochromatin and its diverse functions.” Current opinion in genetics & development 20, no. 2 (2010): 134-141. https://doi.org/10.1016/j.gde.2010.02.003.
Guttman, Mitchell, Ido Amit, Manuel Garber, Courtney French, Michael F. Lin, David Feldser, Maite Huarte et al. “Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals.” Nature 458, no. 7235 (2009): 223-227. https://doi.org/10.1038/nature07672.
Heard, Edith, and Christine M. Disteche. “Dosage compensation in mammals: fine-tuning the expression of the X chromosome.” Genes & development 20, no. 14 (2006): 1848-1867. https://doi.org/10.1101/gad.1422906.
Hermann, Andrea, Sigrid Schmitt, and Albert Jeltsch. “The human Dnmt2 has residual DNA-(cytosine-C5) methyltransferase activity.” Journal of Biological Chemistry 278, no. 34 (2003): 31717-31721. https://doi.org/10.1074/jbc.M305448200.
Holliday, Robin. “Epigenetics: a historical overview.” Epigenetics 1, no. 2 (2006): 76-80. https://doi.org/10.4161/epi.1.2.2762.
Holliday, Robin, and John E. Pugh. “DNA modification mechanisms and gene activity during development.” Science 187, no. 4173 (1975): 226-232.
Hou, J., T. H. Lei, L. Liu, X. H. Cui, X. R. An, and Y. F. Chen. “DNA methylation patterns in in vitro-fertilised goat zygotes.” Reproduction, Fertility and Development 17, no. 8 (2006): 809-813. https://doi.org/10.1071/RD05075.
Hutvágner, György, and Phillip D. Zamore. “A microRNA in a multiple-turnover RNAi enzyme complex.” Science 297, no. 5589 (2002): 2056-2060. https://doi.org/10.1126/science.1073827.

Ibeagha-Awemu, Eveline M., and Xin Zhao. “Epigenetic marks: regulators of livestock phenotypes and conceivable sources of missing variation in livestock improvement programs.” Frontiers in genetics 6 (2015): 302. https://doi.org/10.3389/fgene.2015.00302.
Inbar-Feigenberg, Michal, Sanaa Choufani, Darci T. Butcher, Maian Roifman, and Rosanna Weksberg. “Basic concepts of epigenetics.” Fertility and sterility 99, no. 3 (2013): 607-615. https://doi.org/10.1016/j.fertnstert.2013.01.117.
Jablonka, Eva, and Marion J. Lamb. Epigenetic inheritance and evolution: the Lamarckian dimension. Oxford University Press, 1995.
Jablonka, Eva, and Gal Raz. “Transgenerational epigenetic inheritance: prevalence, mechanisms, and implications for the study of heredity and evolution.” The Quarterly review of biology 84, no. 2 (2009): 131-176. https://doi.org/10.1086/598822.
Jia, Da, Renata Z. Jurkowska, Xing Zhang, Albert Jeltsch, and Xiaodong Cheng. “Structure of Dnmt3a bound to Dnmt3L suggests a model for de novo DNA methylation.” Nature 449, no. 7159 (2007): 248-251. https://doi.org/10.1038/nature06146.
Jones, Peter L., Gert Jan C. Veenstra, Paul A. Wade, Danielle Vermaak, Stefan U. Kass, Nicoletta Landsberger, John Strouboulis, and Alan P. Wolffe. “Methylated DNA and MeCP2 recruit histone deacetylase to repress transcription.” Nature genetics 19, no. 2 (1998): 187-191. https://doi.org/10.1038/561.
Kaikkonen, Minna U., Michael T. Y. Lam, and Christopher K. Glass. “Non-coding RNAs as regulators of gene expression and epigenetics.” Cardiovascular research 90, no. 3 (2011): 430-440. https://doi.org/10.1093/cvr/cvr097.
Khatib, Hasan, ed. Livestock epigenetics. John Wiley & Sons, 2012.
Khatib, Hasan. Molecular and quantitative animal genetics. John Wiley & Sons, 2015.
Kovalchuk, Igor, and Olga Kovalchuk. Epigenetics in health and disease. FT Press, 2012.
Kruip, Th A. M., and J. H. G. Den Daas. “In vitro produced and cloned embryos: effects on pregnancy, parturition and offspring.” Theriogenology 47, no. 1 (1997): 43-52. https://doi.org/10.1016/S0093-691X(96)00338-X.
Kumar, S., and A. Singh. “Epigenetic regulation of abiotic stress tolerance in plants.” Adv. Plants Agric. Res 5, no. 5 (2016): 00179.
Kumar, Suresh, Ashok K. Singh, and Trilochan Mohapatra. “Epigenetics: history, present status and future perspective.” Indian J Genet Plant Breed 77 (2017): 445-63. https://doi.org/10.5958/0975-6906.2017.00061.X.
Kunert, Natascha, Joachim Marhold, Jonas Stanke, Dirk Stach, and Frank Lyko. “A Dnmt2-like protein mediates DNA methylation in Drosophila.” (2003): 5083-5090. https://doi.org/10.1242/dev.00716.
Lee, Yoontae, Chiyoung Ahn, Jinju Han, Hyounjeong Choi, Jaekwang Kim, Jeongbin Yim, Junho Lee et al. “The nuclear RNase III Drosha initiates microRNA processing.” Nature 425, no. 6956 (2003): 415-419. https://doi.org/10.1038/nature01957.
Li, En, Timothy H. Bestor, and Rudolf Jaenisch. “Targeted mutation of the DNA methyltransferase gene results in embryonic lethality.” Cell 69, no. 6 (1992): 915-926. https://doi.org/10.1016/0092-8674(92)90611-F.

Lim, Derek H. K., and Eamonn R. Maher. “DNA methylation: a form of epigenetic control of gene expression.” The Obstetrician & Gynaecologist 12, no. 1 (2010): 37-42. https://doi.org/10.1576/toag.12.1.037.27556.
Lim, Shen Jean, Tin Wee Tan, and Joo Chuan Tong. “Computational epigenetics: the new scientific paradigm.” Bioinformation 4, no. 7 (2010): 331. https://doi.org/10.6026/97320630004331.
Liu, Kui, Yun Fei Wang, Carmen Cantemir, and Mark T. Muller. “Endogenous assays of DNA methyltransferases: Evidence for differential activities of DNMT1, DNMT2, and DNMT3 in mammalian cells in vivo.” Molecular and cellular biology 23, no. 8 (2003): 2709-2719. https://doi.org/10.1128/MCB.23.8.2709-2719.2003.
Liu, Yongsheng. “A new perspective on Darwin’s Pangenesis.” Biological Reviews 83, no. 2 (2008): 141-149. https://doi.org/10.1111/j.1469-185X.2008.00036.x.
Loscalzo, Joseph, and Diane E. Handy. “Epigenetic modifications: basic mechanisms and role in cardiovascular disease (2013 Grover Conference series).” Pulmonary circulation 4, no. 2 (2014): 169-174. https://doi.org/10.1086/675979.
Lund, Gertrud, Linda Andersson, Massimiliano Lauria, Marie Lindholm, Mario F. Fraga, Ana Villar-Garea, Esteban Ballestar, Manel Esteller, and Silvio Zaina. “DNA methylation polymorphisms precede any histological sign of atherosclerosis in mice lacking apolipoprotein E.” Journal of Biological Chemistry 279, no. 28 (2004): 29147-29154. https://doi.org/10.1074/jbc.M403618200.
Mattick, J. S., and I. V. Makunin. “Non-coding RNA.” Hum Mol Genet 15, Spec No 1 (2006): R17.
Mohamed, Jameelah Sheik, Philip Michael Gaughwin, Bing Lim, Paul Robson, and Leonard Lipovich. “Conserved long noncoding RNAs transcriptionally regulated by Oct4 and Nanog modulate pluripotency in mouse embryonic stem cells.” Rna 16, no. 2 (2010): 324-337. https://doi.org/10.1261/rna.1441510.
Morey, Céline, and Philip Avner. “Genetics and epigenetics of the X chromosome.” Annals of the New York Academy of Sciences 1214, no. 1 (2010): E18-E33. https://doi.org/10.1111/j.1749-6632.2010.05943.x.
Mutch, David M., Walter Wahli, and Gary Williamson. “Nutrigenomics and nutrigenetics: the emerging faces of nutrition.” The FASEB journal 19, no. 12 (2005): 1602-1616. https://doi.org/10.1096/fj.05-3911rev.
Myzak, Melinda C., and Roderick H. Dashwood. “Histone deacetylases as targets for dietary cancer preventive agents: lessons learned with butyrate, diallyl disulfide, and sulforaphane.” Current drug targets 7, no. 4 (2006): 443-452. https://doi.org/10.2174/138945006776359467.
Nan, Xinsheng, Huck-Hui Ng, Colin A. Johnson, Carol D. Laherty, Bryan M. Turner, Robert N. Eisenman, and Adrian Bird. “Transcriptional repression by the methyl-CpG-binding protein MeCP2 involves a histone deacetylase complex.” Nature 393, no. 6683 (1998): 386-389. https://doi.org/10.1038/30764.
Nijland, Mark J., Stephen P. Ford, and Peter W. Nathanielsz. “Prenatal origins of adult disease.” Current Opinion in Obstetrics and Gynecology 20, no. 2 (2008): 132-138. https://doi.org/10.1097/GCO.0b013e3282f76753.

O’Dell, Sandra D., and Ian N. M. Day. “Molecules in focus Insulin-like growth factor II (IGF-II).” The international journal of biochemistry & cell biology 30, no. 7 (1998): 767-771. https://doi.org/10.1016/S1357-2725(98)00048-X.
Okada, Yuki, Kazuo Yamagata, Kwonho Hong, Teruhiko Wakayama, and Yi Zhang. “A role for the elongator complex in zygotic paternal genome demethylation.” Nature 463, no. 7280 (2010): 554-558. https://doi.org/10.1038/nature08732.
Okamoto, Ikuhiro, and Edith Heard. “Lessons from comparative analysis of X-chromosome inactivation in mammals.” Chromosome research 17, no. 5 (2009): 659-669. https://doi.org/10.1007/s10577-009-9057-7.
Okano, Masaki, Daphne W. Bell, Daniel A. Haber, and En Li. “DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development.” Cell 99, no. 3 (1999): 247-257. https://doi.org/10.1016/S0092-8674(00)81656-6.
Okano, Masaki, Shaoping Xie, and En Li. “Dnmt2 is not required for de novo and maintenance methylation of viral DNA in embryonic stem cells.” Nucleic acids research 26, no. 11 (1998): 2536-2540. https://doi.org/10.1093/nar/26.11.2536.
Ooi, Steen K. T., Chen Qiu, Emily Bernstein, Keqin Li, Da Jia, Zhe Yang, Hediye Erdjument-Bromage et al. “DNMT3L connects unmethylated lysine 4 of histone H3 to de novo methylation of DNA.” Nature 448, no. 7154 (2007): 714-717. https://doi.org/10.1038/nature05987.
Pandian, Ganesh N., and Hiroshi Sugiyama. “Strategies to modulate heritable epigenetic defects in cellular machinery: lessons from nature.” Pharmaceuticals 6, no. 1 (2013): 1-24. https://doi.org/10.3390/ph6010001.
Park, Jung Sun, Young Sun Jeong, Sang Tae Shin, Kyung-Kwang Lee, and Yong-Kook Kang. “Dynamic DNA methylation reprogramming: active demethylation and immediate remethylation in the male pronucleus of bovine zygotes.” Developmental dynamics: an official publication of the American Association of Anatomists 236, no. 9 (2007): 2523-2533. https://doi.org/10.1002/dvdy.21278.
Peedicayil, Jacob. “Pharmacoepigenetics and pharmacoepigenomics.” (2008): 1785-1786. https://doi.org/10.2217/14622416.9.12.1785.
Petronis, Arturas. “Epigenetics as a unifying principle in the aetiology of complex traits and diseases.” Nature 465, no. 7299 (2010): 721-727. https://doi.org/10.1038/nature09230.
Phutikanit, Nawapen, Junpen Suwimonteerabutr, Dion Harrison, Michael D’Occhio, Bernie Carroll, and Mongkol Techakumphu. “Different DNA methylation patterns detected by the Amplified Methylation Polymorphism Polymerase Chain Reaction (AMP PCR) technique among various cell types of bulls.” Acta Veterinaria Scandinavica 52, no. 1 (2010): 1-9. https://doi.org/10.1186/1751-0147-52-18.
Pigliucci, Massimo. “Do we need an extended evolutionary synthesis?” Evolution: International Journal of Organic Evolution 61, no. 12 (2007): 2743-2749. https://doi.org/10.1111/j.1558-5646.2007.00246.x.
Polo, Sophie E., and Geneviève Almouzni. “Chromatin assembly: a basic recipe with various flavours.” Current opinion in genetics & development 16, no. 2 (2006): 104-111. https://doi.org/10.1016/j.gde.2006.02.011.

Ponting, Chris P., Peter L. Oliver, and Wolf Reik. “Evolution and functions of long noncoding RNAs.” Cell 136, no. 4 (2009): 629-641. https://doi.org/10.1016/j.cell.2009.02.006.
Portela, Anna, and Manel Esteller. “Epigenetic modifications and human disease.” Nature biotechnology 28, no. 10 (2010): 1057-1068. https://doi.org/10.1038/nbt.1685.
Powledge, Tabitha M. “Behavioral epigenetics: How nurture shapes nature.” BioScience 61, no. 8 (2011): 588-592. https://doi.org/10.1525/bio.2011.61.8.4.
Pradhan, Sriharsa, Albino Bacolla, Robert D. Wells, and Richard J. Roberts. “Recombinant human DNA (cytosine-5) methyltransferase: I. Expression, purification, and comparison of de novo and maintenance methylation.” Journal of Biological Chemistry 274, no. 46 (1999): 33002-33010. https://doi.org/10.1074/jbc.274.46.33002.
Quina, A. S., M. Buschbeck, and L. Di Croce. “Chromatin structure and epigenetics.” Biochemical pharmacology 72, no. 11 (2006): 1563-1569. https://doi.org/10.1016/j.bcp.2006.06.016.
Qureshi, Irfan A., and Mark F. Mehler. “Emerging roles of non-coding RNAs in brain evolution, development, plasticity and disease.” Nature Reviews Neuroscience 13, no. 8 (2012): 528-541. https://doi.org/10.1038/nrn3234.
Rai, Kunal, Stephanie Chidester, Chad V. Zavala, Elizabeth J. Manos, Smitha R. James, Adam R. Karpf, David A. Jones, and Bradley R. Cairns. “Dnmt2 functions in the cytoplasm to promote liver, brain, and retina development in zebrafish.” Genes & development 21, no. 3 (2007): 261-266. https://doi.org/10.1101/gad.1472907.
Rakyan, Vardhman K., Thomas A. Down, David J. Balding, and Stephan Beck. “Epigenome-wide association studies for common human diseases.” Nature Reviews Genetics 12, no. 8 (2011): 529-541. https://doi.org/10.1038/nrg3000.
Reik, Wolf, Wendy Dean, and Jörn Walter. “Epigenetic reprogramming in mammalian development.” Science 293, no. 5532 (2001): 1089-1093. https://doi.org/10.1126/science.1063443.
Reik, Wolf, and Jörn Walter. “Genomic imprinting: parental influence on the genome.” Nature Reviews Genetics 2, no. 1 (2001): 21-32. https://doi.org/10.1038/35047554.
Riggs, Arthur D. “X inactivation, differentiation, and DNA methylation.” Cytogenetic and Genome Research 14, no. 1 (1975): 9-25. https://doi.org/10.1159/000130315.
Rijnkels, Monique, Elena Kabotyanski, Mohamad B. Montazer-Torbati, C. Hue Beauvais, Yegor Vassetzky, Jeffrey M. Rosen, and Eve Devinoy. “The epigenetic landscape of mammary gland development and functional differentiation.” Journal of mammary gland biology and neoplasia 15, no. 1 (2010): 85-100. https://doi.org/10.1007/s10911-010-9170-4.
Roach, H. I., and T. Aigner. “DNA methylation in osteoarthritic chondrocytes: a new molecular target.” Osteoarthritis and cartilage 15, no. 2 (2007): 128-137. https://doi.org/10.1016/j.joca.2006.07.002.
Robertson, Keith D., and Alan P. Wolffe. “DNA methylation in health and disease.” Nature Reviews Genetics 1, no. 1 (2000): 11-19. https://doi.org/10.1038/35049533.
Schotta, G., A. Ebert, R. Dorn, and G. Reuter. “Position-effect variegation and the genetic dissection of chromatin regulation in Drosophila.” Seminars in cell & developmental biology 14 (2003): 75. https://doi.org/10.1016/s1084-9521(02)00138-6.

Szyf, M. “DNA methylation and demethylation as targets for anticancer therapy.” Biochemistry (Moscow) 70, no. 5 (2005): 533-549.
Szyf, Moshe. “Therapeutic implications of DNA methylation.” (2005): 125-135. https://doi.org/10.1517/14796694.1.1.125.
Tang, Lin-Ya, M. Narsa Reddy, Vanya Rasheva, Tai-Lin Lee, Meng-Jau Lin, Ming-Shiu Hung, and C. K. James Shen. “The eukaryotic DNMT2 genes encode a new class of cytosine-5 DNA methyltransferases.” Journal of Biological Chemistry 278, no. 36 (2003): 33613-33616. https://doi.org/10.1074/jbc.C300255200.
Tian, Fei, Fei Zhan, Nathan D. VanderKraats, Jeffrey F. Hiken, John R. Edwards, Huanmin Zhang, Keji Zhao, and Jiuzhou Song. “DNMT gene expression and methylome in Marek’s disease resistant and susceptible chickens prior to and following infection by MDV.” Epigenetics 8, no. 4 (2013): 431-444. https://doi.org/10.4161/epi.24361.
Triantaphyllopoulos, Kostas A., Ioannis Ikonomopoulos, and Andrew J. Bannister. “Epigenetics and inheritance of phenotype variation in livestock.” Epigenetics & chromatin 9, no. 1 (2016): 1-18. https://doi.org/10.1186/s13072-016-0081-5.
Turek-Plewa, Justyna, and Pawel P. Jagodzinski. “The role of mammalian DNA methyltransferases in the regulation of gene expression.” Cellular and Molecular Biology Letters 10, no. 4 (2005): 631.
Tycko, Benjamin, and Ian M. Morison. “Physiological functions of imprinted genes.” Journal of cellular physiology 192, no. 3 (2002): 245-258. https://doi.org/10.1002/jcp.10129.
Van Soom, Ann, Luc Peelman, W. V. Holt, and A. Fazeli. “An introduction to epigenetics as the link between genotype and environment: a personal view.” Reproduction in Domestic Animals 49 (2014): 2-10. https://doi.org/10.1111/rda.12341.
Van Speybroeck, Linda. “From epigenesis to epigenetics: the case of CH Waddington.” Annals of the New York Academy of Sciences 981, no. 1 (2002): 61-81.
Vogelauer, Maria, Jiansheng Wu, Noriyuki Suka, and Michael Grunstein. “Global histone acetylation and deacetylation in yeast.” Nature 408, no. 6811 (2000): 495-498. https://doi.org/10.1038/35044127.
Waddington, C. H. The strategy of the genes. A discussion of some aspects of theoretical biology. With an appendix by H. Kacser. 1957.
Walker, S. K., K. M. Hartwich, and R. F. Seamark. “The production of unusually large offspring following embryo manipulation: concepts and challenges.” Theriogenology 45, no. 1 (1996): 111-120. https://doi.org/10.1016/0093-691X(95)00360-K.
Waterland, Robert A., Dana C. Dolinoy, Juan-Ru Lin, Charlotte A. Smith, Xin Shi, and Kajal G. Tahiliani. “Maternal methyl supplements increase offspring DNA methylation at Axin Fused.” genesis 44, no. 9 (2006): 401-406. https://doi.org/10.1002/dvg.20230.
Watt, Fujiko, and Peter L. Molloy. “Cytosine methylation prevents binding to DNA of a HeLa cell transcription factor required for optimal expression of the adenovirus major late promoter.” Genes & development 2, no. 9 (1988): 1136-1143. https://doi.org/10.1101/gad.2.9.1136.
Weinhold, Bob. “Epigenetics: the science of change.” (2006): A160-A167. https://doi.org/10.1289/ehp.114-a160.

Weismann, August. Essays upon heredity and kindred biological problems. Vol. 1. Clarendon press, 1891.
Wolffe, Alan P., and Dmitry Guschin. “Chromatin structural features and targets that regulate transcription.” Journal of structural biology 129, no. 2-3 (2000): 102-122. https://doi.org/10.1006/jsbi.2000.4217.
Wysocka, Joanna, Tomek Swigut, Hua Xiao, Thomas A. Milne, So Yeon Kwon, Joe Landry, Monika Kauer et al. “A PHD finger of NURF couples histone H3 lysine 4 trimethylation with chromatin remodelling.” Nature 442, no. 7098 (2006): 86-90. https://doi.org/10.1038/nature04815.
Yang, Xiang-Jiao. “The diverse superfamily of lysine acetyltransferases and their roles in leukemia and other diseases.” Nucleic acids research 32, no. 3 (2004): 959-976. https://doi.org/10.1093/nar/gkh252.
Zemach, Assaf, Ivy E. McDaniel, Pedro Silva, and Daniel Zilberman. “Genome-wide evolutionary analysis of eukaryotic DNA methylation.” Science 328, no. 5980 (2010): 916-919. https://doi.org/10.1126/science.1186366.
Zeric, D. “Importance of epigenetics in animal breeding: Genomic imprinting.” Bachelor Thesis, Faculty of Veterinary Medicine and Animal Science, Swedish University of Agricultural Sciences, 2012.

Chapter 9

Nutrigenomics

Abstract

The emergence of genomics, defined as a set of high-throughput genome sequencing technologies for generating, processing, and applying scientific knowledge about genomic structure and function, has opened up previously unimaginable possibilities for learning how nutrients influence the expression of genes and proteins and, in turn, affect metabolism at the cellular and organism levels. The use of high-throughput genomic technologies in the scientific study of nutrition is known as nutrigenomics. It combines diverse fields of biology, such as biochemistry, nutrition, physiology, bioinformatics, genomics, transcriptomics, proteomics, epigenomics, and metabolomics, to investigate and describe the mutual interactions between nutrients and genes at the molecular level. Identifying these nutrient-gene interactions will help in prescribing and administering personalized diets based on the genotype of each individual. In the future, nutrigenomics research could make it possible to implement appropriate dietary interventions to restore proper homeostasis in the body and to prevent the onset and progression of diet-related disorders. Currently, nutrigenomics research is mostly focused on health, disease, nutrition, livestock production and the improvement of economic traits in livestock. It is undoubtedly a fascinating emerging science, with numerous aspects that need to be researched extensively and from various viewpoints.

Keywords: nutrigenomics, high-throughput sequencing, personalized diets, genotype

Introduction Nutritional genomics, also known as nutrigenomics, is the study of the intricate interactions between dietary nutrients, their metabolic bi-products, and the genes of the human body (Kaput et al., 2005). While originally developed in the context of humans, the methods are also applicable to livestock production. However, the concepts will first be considered for

172

Asif Nadeem and Maryam Javed

humans and considerations for livestock later in this review. As in many other areas of genetics, developments in human genetics, particularly biotechnological processes have provided the opening for subsequent innovations in the agricultural sector. Nutrigenomics includes the study of diet-gene interactions in order to determine the dietetic elements that have good or harmful health related consequences (German, 2005). Nutrigenomics includes the study of diet-gene interactions in order to determine the dietetic elements that have good or harmful health related consequences. Additionally, it possesses the ability to assess an individual’s nutritional needs based on his or her genetic composition, and also provide a link between food and chronic diseases, which will aid in understanding the etiology of chronic diseases including cancer, cardiovascular disease (CVD), obesity, and type 2 diabetes. The nutritional environment can be considered one of the most important environmental factors that influence the expression of genes.

Nutrigenetics

Nutrigenetics, by contrast, analyzes how an individual’s genetics affects the metabolism of certain dietary nutrients. It focuses on discovering the genes associated with physiological reactions to dietary intake, as well as the genes in which slight variations, known as polymorphisms, could have large nutritional implications. It also explains why and how different people react differently to the same nutrition. Together, nutrigenetics and nutrigenomics promise to provide a crucial component for understanding how diet influences individual humans, and nutrigenomics will eventually lead to evidence-based nutritional intervention methods to restore health and wellness and to prevent diet-related diseases (Afman & Müller, 2006).

Nutrigenomics Concept

DellaPenna (1999) was the first to define nutritional genomics, as an approach to genetic analysis concerned with nutritional compounds that are manufactured or stored by organisms, be they plant or animal (DellaPenna, 1999; Ordovas and Corella, 2004). It is a branch of science that studies the function of nutrients in relation to the expression of genes and their interactions. Chavez & de Chávez (2003) defined nutrigenomics as the study of the molecular links between nutritional stimuli and gene responses. According to Müller and Kersten (2003), nutrigenomics is the application of high-throughput genomic technologies in nutrition research; if used correctly, it will help researchers better understand how nutrition affects metabolic pathways and homeostatic regulation, how that regulation is disrupted during the initial stages of a nutrition-related disease, and how specific sensitizing genotypes contribute to the establishment of disease.

Modern Biotechnology Tools Related to Nutritional Genomics

A range of new ‘omics’ and computational tools has been developed, many of which are relevant to nutritional genomics (Subbiah et al., 2007). These include:

• Genomics, the study of all the genes in the genome in relation to their activities and interactions.
• Proteomics, the study of alterations in the entire proteome, e.g., post-translational modifications.
• Metabolomics, a technique for measuring changes in the entire metabolome, which includes metabolites with a molecular weight of less than 2 kDa.
• Bioinformatics, which provides the statistical and computational tools to handle and extract information from these large data sets.

Nutrigenomics as a Holistic Approach

Nutrigenomics studies the connections between dietary intake, disease risk, and physiological responses, as well as the connection of diet with genes and their expression, together with the molecular mediators and biomarkers, such as metabolites and hormones, that lie in between. Nutrigenomics employs a variety of techniques to determine disease risk and development. Some of these are:

• Food diaries for keeping track of nutritional intake.
• Biomarkers, including hormone levels or metabolites, to determine how the body reacts.
• Genomic tests to find relevant gene variations.
• Clinical data, including age, sex, BMI, and weight, to track how food impacts health.

Nutrigenomics techniques are being used to address a variety of ailments. These include:

• Approaches to address the incidence of metabolic syndrome, which is based on genetic variation and on factors related to diet or lifestyle.
• Finding the relationship between the gut microbiome, mental health, and obesity.
• Links between a particular nutrient or food and disease, such as coffee and heart abnormalities.

Interaction between Genes and Nutrients

Internal and environmental stimuli work together to turn genes off and on. Internal factors include various hormone concentrations, but one of the most important external factors is nutrients. The nutritional milieu has the potential to influence gene expression and, as a result, health. In addition to the essential nutrients, such as carbohydrates, fatty acids, amino acids, calcium, selenium, zinc, and vitamins A, C, and E, a range of nonessential bioactive substances appears to have a substantial impact on health. Carcinogen metabolism, hormone balance, cell signalling, cell cycle control, angiogenesis, and apoptosis are just a few of the physiological processes that these nonessential bioactive food components have been shown to affect. Bioactive dietary ingredients frequently affect multiple processes at the same time (Törrönen et al., 2006).

Gene Expression Profiling

Expression microarrays measure the expression levels of many thousands of genes simultaneously. They have become an important tool for exploring gene expression profiles in tissues and for investigating the mechanisms behind the regulation of gene transcription. In addition, the technology can be used to build a clinical diagnostic profile based on a patient’s gene expression profile and to suggest possible avenues for therapy (Masotti et al., 2010). Related to gene expression profiling is the mapping of expression quantitative trait loci (eQTL mapping), sometimes termed genetical genomics (Jansen & Nap, 2001; Schadt et al., 2003). This method deals with how the expression of a gene is influenced by the underlying genotype. However, the existence and magnitude of an eQTL are influenced by its environment, in particular the current status of the tissue, including dietary effects. Such methods are now being adopted in nutrigenomic applications (Das & Shama, 2014).
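
The dependence of an eQTL on diet can be illustrated with a minimal sketch. It assumes a hypothetical SNP coded as allele dosage (0, 1, or 2 copies), invented expression values for two hypothetical diet groups, and an ordinary least-squares slope as the measure of the eQTL effect. None of the numbers or names come from the studies cited above, and real eQTL analyses use far larger samples and formal tests.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x: the per-allele effect on expression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical data: allele dosage at one SNP for six animals per diet group.
dosage = [0, 0, 1, 1, 2, 2]

# Hypothetical expression of one gene (arbitrary units) under each diet.
expr_control = [5.0, 5.2, 6.0, 6.1, 7.0, 7.1]
expr_restricted = [5.0, 5.1, 5.2, 5.1, 5.3, 5.2]

effect_control = ols_slope(dosage, expr_control)
effect_restricted = ols_slope(dosage, expr_restricted)

print(f"eQTL effect, control diet:    {effect_control:.3f}")
print(f"eQTL effect, restricted diet: {effect_restricted:.3f}")
```

With these invented values the per-allele effect is about 0.98 expression units under the control diet and 0.10 under the restricted diet, which is the kind of diet-dependent eQTL the paragraph describes.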

Approaches of Nutrigenomics

Recently developed tools for transcriptomics, proteomics, and metabolomics have allowed nutritionists to build a systems approach to understanding the complex genetic interactions associated with the nutritional environment. Transcriptomic methods (microarray technologies) have provided fresh insight into the physiological effects of dietary proteins, whereas proteomic methods (two-dimensional electrophoresis) may be a useful tool for examining the effects of specific amino acids on protein composition. Nutrigenomics may be used to determine how genes interact with nutrients on a genome-wide scale, as well as how DNA and the genetic code affect the requirements for specific nutrients and their quantities. It can also help us understand the impact of nutrition on the expression of genes.

Nutrigenomics Is Complicated

Nutrigenomics is complicated by the following limitations:

• Although more nutrigenomic biomarkers are being identified (e.g., miRNAs), the interactions between these biomarkers are highly complex.
• There is a wide range of tissues to be investigated, each of which requires its own ‘omics’ assays to understand the processes involved.
• While associations between nutrient intake and gene expression may be identified, understanding biological causal pathways remains a major challenge.
• Food and behavioral information is often recorded in diaries kept by the patient or study subject, but these are not always reliable.

Application of Nutrigenomics for Human Nutrition and Health

In a nutrigenomic study, the use of genomic concepts aids in identifying particular connections between nutrients and genetic variables. Nutrigenomics can be employed to develop a personalized dietary plan (specific nutritional needs depending on an individual’s genetic composition), to better understand diet-related disorders, and to identify the genes involved in gene-diet interactions. Furthermore, nutrigenetic techniques and eQTL mapping can be used to detect polymorphisms in genes that may be influenced by major environmental and nutritional variables. In future, nutrigenetics will help develop the information needed to determine the ideal diet for a given subject, whereas nutrigenomics will help generate the data needed to provide a customised optimal diet plan for an individual. Quantitative real-time polymerase chain reaction (qPCR), DNA microarray methods, and DNA sequencing can be used to assess the interaction between genes and diet. In nutrigenomics research, microarray or DNA chip technology now allows huge sets of genes to be screened, offering a full picture of the variation in gene expression patterns.

Application of Nutrigenomics in Animal Sector

Dietary modifications and nutritional strategies are well-known means of altering output and health in the cattle industry. According to Byrne et al. (2005), intake of low-quality feed, which is characterized by nutritional constraints, causes changes in the expression of certain genes involved in cytoskeleton remodeling, protein turnover, and metabolic balance, all of which have a deleterious effect on animal growth. Such changes in gene expression can be examined using microarray technology, and they might be predicted from the variations in animal growth and physiology observed during dietary restriction.

Investigating the impact of nutrition and diet formulations requires an understanding of biochemical, physiological, and metabolic processes, as well as of gene expression, in livestock, especially poultry. Microarrays will be a useful tool for the livestock industry in evaluating nutritional strategies for its animals. The nutritional effects of individual gene markers could be evaluated with a nutrigenetics approach, but these effects vary among individual animals receiving the same level of feed, and comparing gene expression patterns in groups of animals may be an effective way to understand this variation. It is also feasible to find particular parallels and differences in nutritional effects across many genes. Using this information, it will in future be possible to develop an optimal genetic breeding plan coupled to an optimal nutritional plan specific to the local environment.
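
Comparing expression patterns between groups of animals can be sketched in a few lines. The example below is a simplified illustration, not a method taken from the studies cited here: the gene names, group labels, and expression values are hypothetical, and the log2 fold change and Welch t statistic stand in for the more elaborate models used on real microarray or sequencing data, which also correct for multiple testing.

```python
import math
from statistics import mean, variance

def log2_fold_change(treated, control):
    """log2 of the ratio of group means (treated over control)."""
    return math.log2(mean(treated) / mean(control))

def welch_t(a, b):
    """Welch t statistic for two groups with possibly unequal variances."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Hypothetical expression values for two genes in two feeding groups.
expression = {
    "GENE_A": {"control": [100.0, 110.0, 90.0], "restricted": [200.0, 210.0, 190.0]},
    "GENE_B": {"control": [50.0, 55.0, 45.0], "restricted": [52.0, 48.0, 50.0]},
}

results = {}
for gene, groups in expression.items():
    lfc = log2_fold_change(groups["restricted"], groups["control"])
    t_stat = welch_t(groups["restricted"], groups["control"])
    results[gene] = (lfc, t_stat)
    print(f"{gene}: log2 fold change = {lfc:.2f}, Welch t = {t_stat:.2f}")
```

In this invented data set GENE_A doubles its expression under restriction (log2 fold change of 1) while GENE_B is unchanged, so only GENE_A would be carried forward as a candidate diet-responsive gene.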

Application of Nutrigenomics in Ruminants

There are several applications of nutrigenomics in ruminants, some of which are listed below:

a) Improvement in ruminant health
b) Improvement in milk production traits (total, fat, protein)
c) Improvement in fertility and reproductive performance

Bovine Genome Characteristics

The bovine genome comprises around three billion base pairs contained in 29 autosomes and two sex chromosomes. Its size is comparable to that of other mammals (Lewin, 2003). Many complementary markers have been identified and documented in GenBank as a result of mapping efforts, and sequencing investigations have identified and catalogued more than 300,000 expressed sequence tags (ESTs). A collaborative international effort, the Bovine Genome Project, began in 2003 and completed its first draft in 2004.

Ruminant Nutritional Genomics

Nutritional methods and dietary alterations are important techniques for influencing ruminant productivity. The reproductive efficiency and fertility of dairy animals are directly influenced by their genetic disposition and nutritional management. This is especially crucial during early lactation, when the animal is inseminated but is especially vulnerable to nutritional disturbances, which results in lower pregnancy rates. According to Byrne et al. (2005), nutritional limitation caused by poor-quality feeds promotes changes in the expression of certain genes involved in cytoskeletal remodeling, protein turnover, and metabolic balance. Most of these alterations in expression may be predicted from the variations in animal development and physiology observed during feed restriction (Byrne et al., 2005; Jones et al., 2004).

Applying Nutrigenomics for Better Nutrition and Health

Nutritional methods and dietary alterations are important techniques for influencing ruminant productivity. The reproductive success of dairy animals is heavily influenced by their diet and genetic makeup. This is especially critical during the transition phase and early lactation, when the animal is particularly vulnerable to nutritional imbalance. There is very little evidence on the effect of nutrition on the expression of genes relevant to livestock productivity and reproduction, but it may be feasible to gain a better understanding of the link between individual nutrients and the regulation of gene expression. When an animal is given a selenium-deficient feed, protein synthesis is altered at the transcriptional level, and stress increases through the up-regulation of particular genes and signaling pathways. Selenium deficiency also affects genes controlling protection against oxidative damage and detoxification mechanisms, causing alterations in phenotype. However, these traits can be improved by altered dietary regimens (Kore et al., 2008).

Application of Nutrigenomics in Ruminants for Higher Fat Quantity in Milk

The synthesis of milk fat in the mammary glands is a continual process that bioactive fatty acids (FAs) can help to modulate. According to the biohydrogenation theory, diet-induced milk fat depression (MFD) in dairy cows is mediated by specific FAs that are created during ruminal biohydrogenation and inhibit the mammary synthesis of milk fat. Trans-10, cis-12 conjugated linoleic acid was the first FA shown to alter milk fat production, and its impact was characterized as a dose-response relationship. Lipogenic capacity and the transcription of major mammary lipogenic genes are both downregulated during MFD.

The findings support a role for Spot 14 and sterol response element-binding protein-1 (SREBP1), in rodents and ruminants, as a lipogenic signaling pathway that responds to biohydrogenation intermediates. Ruminant milk fat production is regulated by bioactive FAs generated in the rumen (Bauman et al., 2011). According to Mach et al. (2011), fat is the most variable constituent of the milk of dairy cows, with its quantity and biochemical makeup being affected by genetic, physiological, and environmental factors.

Application of Nutrigenomics in Ruminant Reproduction and Fertility

Variations in the function of particular genes caused by dietary changes affect reproductive performance in cattle and other livestock species. The relationship between nutrients and the regulation of gene expression can be understood, as demonstrated by Rao et al. (2001) in a report on the effects of dietary selenium on gene expression in mice.

The Technologies

Small-scale approaches (Northern blotting, qPCR, and differential display PCR) have historically been used in nutrigenomics investigations to evaluate the expression of a limited number of genes. With the advent of microarrays and deep transcriptome sequencing, commonly referred to as RNA-seq, modern genomic tools now detect the expression of genes on a much wider scale and provide more powerful approaches to a wide range of biological problems. Through the automation and parallel processing of protein and DNA/RNA chemistry, these technological advances established the notion of high-throughput information gathering. The technologies designed to collect transcriptomic, proteomic, and metabolomic information are by far the most commonly used omics technologies. Thanks to emerging bioinformatics tools and the biological information obtained from genomics and transcriptomics investigations, biologists can already use systems methods to analyze the interactions that occur within living systems. Using these high-throughput gene expression approaches, nutrigenomics studies the effects of nutrient intake on gene expression. By studying differentially expressed genes, we can uncover the molecular basis of the phenotypic variation observed between animals subjected to different nutritional interventions and identify the genes and metabolic pathways involved in regulating tissue composition. Research on the molecular interactions of dietary components shows that the expression of genes is influenced by a variety of dietary elements, including carbohydrates, fats, proteins, minerals, vitamins, and phytochemicals (such as isothiocyanates and flavonoids).

Most traits of interest, notably meat production, have a multifactorial basis: meat qualities are the result of a complex genetic makeup interacting with the environment, and feed is one of the most important environmental components in animal agriculture. In muscle, feeding has a regulatory effect on biological processes that is reflected in the quality of meat and other tissues. The complex interactions between specific nutrients and the genomes of domesticated animals, which contain an estimated 30,000 to 40,000 genes per species, remain poorly understood. The primary reason is that the technical tools required for such understanding have only recently become available. Progress has been hastened by the continuing genome mapping of the major agricultural animals, as well as by advances in informatics technology and molecular biological procedures.

Single Nucleotide Polymorphisms

Munshi and Duvvuri (2008) described how nutrients influence the outcomes of gene expression, such as the synthesis of mRNA (transcriptomics), proteins (proteomics), and metabolites (metabolomics), using genetic polymorphisms such as SNPs as an example of variation that could be partly responsible for the variability in individual responses to biochemical compounds in the diet (Munshi and Duvvuri, 2008). Siddique et al. (2009) investigated the influence of different nutrients on gene expression in the body and how this knowledge could be applied in many areas. Using molecular biology and genomics methods, scientists have found genes involved in the production of nutritionally essential proteins, including digestive enzymes and the transport molecules that move nutrients and cofactors to their sites of use. A number of common SNPs are reported to affect nutritional needs; one example is SNPs that alter the likelihood of organ dysfunction in humans on low-choline diets. Mitra et al. (2005) concentrated on SNPs and the disorders linked with them (e.g., cancer, CVD, diabetes, Down syndrome, leukemia, neural tube defects, and spina bifida), as well as on the connection between polymorphisms in folate-dependent enzymes and folate intake (folate nutrigenomics).

Biomarkers

Food is now being viewed in a new way: not only as a source of nutrition but also as a drug capable of curing sickness and delaying the effects of ageing (Bhatt and Sharma, 2011). Nutrigenomics is a component of this strategy; it entails identifying indicators of the initial stages of diet-related disorders, when nutritional intervention might still restore health (Kore et al., 2008; Lau et al., 2008; Murray et al., 2010).

The expression of a gene can be controlled using foods or combinations of nutrients to promote animal health, production, and, in turn, overall performance. In modern nutrigenomic research, it is critical to discover genetic markers that are connected to commercially relevant traits, such as meat, milk, and wool productivity, and whose expression can be modified by dietary regimens. This will aid sustainable livestock production. By targeting certain genes through dietary manipulation, it may be possible to achieve the intended performance of livestock with regard to health and production (Kore et al., 2008).

Transcriptomics

The transcriptome comprises all the RNA transcribed in a cell or tissue, including mRNA, rRNA, tRNA, and other noncoding RNA. With respect to mRNA, the transcriptome is essentially a record of cellular metabolism that shows which genes are actively transcribed at any particular time. High-throughput techniques such as microarrays and RNA-seq allow quantification of practically the entire transcriptome and are ideally suited to comprehensive interpretation using systems biology approaches. With suitable computational tools, such as cell signaling and metabolic databases, transcriptome data can be used to reconstruct changes across all the potential biological pathways. Loor et al. (2003) focused on the development of microarray techniques and data mining for research on ruminant nutrition.

Proteomics

Proteome research, in general, describes the protein composition of a cell or tissue at a specific moment. The proteome is more dynamic than the genome because of post-translational changes that can happen very rapidly, such as phosphorylation and dephosphorylation in response to hormone exposure, as seen in adipose tissue (Humphrey et al., 2013). Furthermore, alternative splicing of mRNA and a vast range of post-translational modifications add to the proteome’s complexity. Key advances in proteomics instrumentation and methods have been made in the last decade (May et al., 2011). Livestock researchers have used powerful approaches for identifying and differentially quantifying protein species in complex biological samples (Lippolis and Reinhardt, 2008).

Metabolomics

Metabolomics is the study of metabolites on a global scale, combining high-resolution data with statistical methods such as principal component analysis (PCA) and partial least squares (PLS) to create a comprehensive picture of the metabolites present (Zhang et al., 2012).

These statistical approaches are well suited to data of very high dimensionality, such as measurements of numerous molecular species. The small molecules that can be detected using this method include nucleic acids, amino acids, peptides, carbohydrates, vitamins, alkaloids, polyphenols, organic acids, and inorganic species. Metabolomics provides a platform for comparing metabolites between different nutritional interventions in order to depict the dynamic mechanisms that underpin biological activities. This method has been used to evaluate ruminal adjustments to chronic ruminal acidosis in dairy cattle (Saleem et al., 2012), as well as the response of pigs to feeding highly fermentable proteins (Pieper et al., 2012). Metabolome research can be carried out on a range of biological fluids and tissues and on a range of analytical platforms. Nuclear magnetic resonance (NMR), one of the most widely used spectroscopic methods, can uniquely detect and simultaneously measure a large number of organic molecules in the micromolar range.
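
The role of PCA in comparing metabolite profiles between nutritional interventions can be shown with a small sketch. It assumes invented concentrations of three hypothetical metabolites in six samples from two hypothetical diets, and it finds only the first principal component by power iteration on the covariance matrix. Dedicated statistical software would normally be used, and real data would be scaled and would contain hundreds of metabolites.

```python
import math

def center(rows):
    """Subtract each column mean so every metabolite has mean zero."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return [[value - m for value, m in zip(row, means)] for row in rows]

def covariance(centered_rows):
    """Sample covariance matrix of already-centered data."""
    n = len(centered_rows)
    p = len(centered_rows[0])
    return [[sum(r[i] * r[j] for r in centered_rows) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix by power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical data: rows are samples, columns are three metabolites.
samples = [
    [1.0, 2.0, 0.5],  # diet 1
    [1.2, 2.3, 0.4],
    [0.9, 1.9, 0.6],
    [3.0, 6.1, 0.5],  # diet 2
    [3.2, 6.4, 0.6],
    [2.9, 5.8, 0.4],
]

centered = center(samples)
pc1 = first_component(covariance(centered))
scores = [sum(a * b for a, b in zip(row, pc1)) for row in centered]

for label, score in zip(["diet 1"] * 3 + ["diet 2"] * 3, scores):
    print(f"{label}: PC1 score = {score:+.2f}")
```

With these values the two diets separate along the first component: the diet 1 samples receive negative scores and the diet 2 samples positive scores, driven by the first two metabolites.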

Data Integration and the Omics Workflow

The costs associated with omics experiments, as well as the processing capacity needed to analyze the data sets statistically, have ceased to be a significant barrier; as a result, omics methods are now available to a wider range of livestock researchers. The availability of enormous data sets, however, is no assurance that meaningful insight into a particular system will be acquired unless the data are effectively processed and examined. When attempting to integrate several levels of complexity (e.g., data from various omics analyses within the same study), three kinds of analysis are typically performed: 1) when two methods are used at the same time, a single omics data set (such as transcriptomics) can be used to fill the gaps in another (such as proteomics); 2) various omics levels can be used to cross-validate one another; and 3) several omics data sets can be used to establish mathematical correlations. From a systems perspective, the second option is the more intriguing. When combining the transcriptome and proteome, for example, researchers could concentrate on instances where the anticipated connections between the two are missing, exposing regulatory information not included in the initial knowledge base of the system.

Omics-level analysis of biological systems is highly complex in terms of the acquisition of biological data at the genome, transcriptome, and proteome levels, together with their integration and interpretation. Such high-volume data also require many data screening and internal validation processes to be put in place. Various workflows have been developed for this purpose, based on input from biological, statistical, and computational scientists (Mühlberger et al., 2011; Kohl et al., 2014). The endpoint of such a workflow is an integrated, system-level understanding of how the biological processes work.
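
The idea of looking for missing transcript-protein connections can be sketched as follows. This is an illustrative simplification rather than a published workflow: the gene names, the abundance values across four hypothetical dietary treatments, and the correlation threshold of 0.5 are all assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical abundances measured under four dietary treatments.
transcript = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [1.0, 2.0, 3.0, 4.0],
}
protein = {
    "GENE_A": [2.1, 3.9, 6.2, 8.0],  # protein tracks its transcript
    "GENE_B": [5.0, 5.1, 4.9, 5.0],  # protein does not follow its transcript
}

THRESHOLD = 0.5  # arbitrary cut-off chosen for illustration only

correlations = {gene: pearson(transcript[gene], protein[gene]) for gene in transcript}
discordant = [gene for gene, r in correlations.items() if r < THRESHOLD]

for gene, r in correlations.items():
    print(f"{gene}: transcript-protein r = {r:.3f}")
print("Candidates for additional regulation:", discordant)
```

Genes such as the hypothetical GENE_B, whose protein level does not follow its transcript level, are the cases in which regulation beyond transcription may be at work and which therefore merit closer study.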

Based on this understanding, the same omics-level data, together with bioinformatics, can be used to construct mathematical models that help explain the underlying biological interactions within a system. The goal of genome-scale metabolic network reconstructions in model organisms is to represent all the known metabolic pathways in an organism’s cells (Schellenberger et al., 2010). Metabolic reconstructions have been assembled for a variety of taxa, ranging from unicellular to multicellular life forms and notably including cattle (Seo and Lewin, 2009), but as yet not for other livestock species. The “bottom-up” strategy, which entails creating automated tools and applying mathematical models, aims to build accurate models that can be simulated under various physiological conditions (Shahzad and Loor, 2012). However, as with any model, the output of and inferences drawn from such mathematical models must be confirmed through critical data analysis, preferably from well-designed biological experiments.
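
One common way of turning a metabolic reconstruction into a mathematical model is to write it as a stoichiometric matrix and to ask whether a set of reaction rates (fluxes) leaves every internal metabolite balanced. The chapter does not specify this formalism, so the sketch below is an assumption made for illustration, and the three-reaction pathway, metabolite names, and flux values are hypothetical.

```python
# Hypothetical pathway: R1 takes up A, R2 converts A to B, R3 exports B.
# Rows are metabolites, columns are reactions (R1, R2, R3).
# A positive entry means the reaction produces the metabolite,
# a negative entry means it consumes the metabolite.
STOICHIOMETRY = [
    [1, -1, 0],   # metabolite A
    [0, 1, -1],   # metabolite B
]

def net_production(stoichiometry, fluxes):
    """Net rate of change of each metabolite for the given reaction fluxes."""
    return [sum(coef * flux for coef, flux in zip(row, fluxes)) for row in stoichiometry]

def is_steady_state(stoichiometry, fluxes, tolerance=1e-9):
    """True when no metabolite accumulates or is depleted."""
    return all(abs(rate) <= tolerance for rate in net_production(stoichiometry, fluxes))

balanced_fluxes = [2.0, 2.0, 2.0]    # uptake, conversion, and export all match
unbalanced_fluxes = [2.0, 1.0, 1.0]  # uptake exceeds conversion, so A accumulates

print(is_steady_state(STOICHIOMETRY, balanced_fluxes))
print(net_production(STOICHIOMETRY, unbalanced_fluxes))
```

Genome-scale reconstructions apply the same bookkeeping to thousands of reactions, which is why they can be simulated under different physiological conditions once the reaction list has been curated.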

Gene Diet Disease Interaction

Nutrigenetic Diseases

Overall, 97% of genetic diseases in humans are reported to result from single-gene mutations, although the progression of such diseases can be altered by a change in diet. One example is phenylketonuria (PKU), in which the amino acid phenylalanine cannot be metabolized normally; restricting dietary phenylalanine, and therefore avoiding high protein intake, helps to manage the disease. Another example is galactosaemia, the inability to process galactose because the enzyme galactose-1-phosphate uridyl transferase (GALT) is absent in the liver. In this situation, dairy foods should be avoided because they contain lactose and galactose (Gaboon, 2011).

Nutrigenomics and Diet Supplementation

We now live in a world in which the dietary intake of humans and many other organisms is vastly different from the one to which we had become genetically adapted. Around 10,000 years ago, the agricultural revolution and the domestication and farming of animals brought dramatic changes in the human food supply. Thousands of years later, the Industrial Revolution and advances in food technology resulted in even more significant changes in dietary composition. Among the most significant of these were changes in the quality and quantity of specific fatty acids. Srinivasarao et al. (1997) used discontinuous sucrose density gradient ultracentrifugation to determine the magnitude of variations in polyunsaturated fatty acids, long-chain fatty acids, and molar ratios of cholesterol to phospholipids in synaptosomal membranes in response to the intake of different types of dietary fat (coconut, mustard, peanut, and safflower oil). Diet has an essential role in the aetiology of and protection from cancer. Several plant-based remedies are used for the treatment of cancer in Ayurveda, an Indian system of traditional medicine (Srinivasarao et al., 1997). In India, Sinha et al. (2003) held a symposium on cancer risk and the influence of food on it. The aim of the symposium was to outline the various dietary and other factors linked to cancer. Turmeric has been found to be an effective anti-inflammatory, antioxidant, and chemopreventive agent (Sinha et al., 2003). Numerous elements, such as macronutrients, also have an impact on obesity control. Vitamin A was shown to be a regulator of adipose tissue growth by Jeyakumar et al. (2005): chronic high-dose dietary vitamin A intake effectively controls adipose tissue mass in both the lean and obese phenotypes of the WNIN/Ob rat strain. Vitamin A is also necessary for the proper development of the embryo and fetus, as well as for maintaining the fully differentiated state of tissues in adults (Jeyakumar et al., 2005). Ghoshal et al. (2003) used the phosphoenolpyruvate carboxykinase (PEPCK) gene as a prototypical retinoid-responsive gene to explore the impact of vitamin A deficiency on mouse liver development. In a randomized trial, Singh et al. (1994) found that a low-energy diet rich in fruit and vegetables reduced central adiposity and other symptoms related to glucose intolerance in individuals after acute myocardial infarction (Singh et al., 2002).

Regulatory, Ethical and Social Implications of Nutrigenomics

Nutrigenomics raises moral, legal, and societal concerns, especially regarding how the general public might obtain nutrigenetic tests and the associated dietary and lifestyle guidance. International specialists have identified five key areas (Oliver, 2005), covering both fundamental nutrigenomics studies and therapeutic and commercial applications: (i) nutrigenomics-related health improvement claims, (ii) nutrigenomic data management, (iii) delivery methods for nutrigenomics services, (iv) nutrigenomics products, and (v) equitable access to nutritional services. It is therefore critical to raise the level of debate in order to understand and regulate each of these issues.

Opportunities and Challenges

With worldwide urbanization and population growth, the consumption of animal products has increased. Furthermore, some of the grain feeds currently used in animal production are also suitable for human consumption. This competition necessitates a novel approach to improving the efficiency of feed conversion in order to raise production, and ‘omics’ technologies may help to improve both productivity and the efficiency of nutrient use. Nevertheless, nutrigenomic data for analyzing and correlating genes and nutrient conversion in cattle production remain scarce. The intake of the majority of nutrients is likely to be insufficient in various settings, resulting in a wide range of performance. To remedy this, the genetic component of the variability must first be identified, followed by the implementation of a genetic improvement program. By taking nutrigenomics into account, we can anticipate a shift in how we feed and farm livestock, including poultry. DNA-based testing for genes or markers that influence traits that are difficult to assess, such as meat quality and disease resistance, is expected to make a significant contribution in the future. Such testing could be used to improve output and productivity by breeding animals for improved product quality, better welfare, increased disease resistance, and reduced environmental impact. It will also help breeders and scientists to determine the precise amount of nutrients required in a given situation.

The use of genomic selection is a relatively new addition to the animal breeder’s toolbox (Hayes et al., 2009). Rather than focusing on finding individual genes associated with economically important traits (as the QTL mapping and GWAS approaches do), genomic selection treats the genome as a ‘black box’: the aim is to extract from the genome a genetic signature that is predictive of performance, without regard to specific gene function. The application of nutrigenomics in this context could be important. It may also be economically advantageous to select animals whose response to varying diets is small: this would lead to more predictable and uniform growth, which is a desirable characteristic in some livestock industries, poultry in particular.
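
The ‘black box’ idea behind genomic selection can be illustrated with a minimal sketch. It assumes, purely for illustration, that marker effects are estimated by ridge regression on SNP allele counts and then summed to predict a candidate's genetic merit. The function names, the toy genotypes, the simulated effects, and the `ridge_penalty` value are all hypothetical and are not taken from Hayes et al. (2009); real evaluations use tens of thousands of markers, centred genotypes, and noisy phenotypes.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the caller's data are left untouched.
    aug = [[float(v) for v in matrix[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        # Bring the row with the largest pivot into place for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


def estimate_marker_effects(genotypes, phenotypes, ridge_penalty):
    """Ridge regression: solve (X'X + penalty * I) b = X'y for all markers jointly."""
    n_markers = len(genotypes[0])
    xtx = [[sum(row[i] * row[j] for row in genotypes) for j in range(n_markers)]
           for i in range(n_markers)]
    for i in range(n_markers):
        xtx[i][i] += ridge_penalty
    xty = [sum(row[i] * y for row, y in zip(genotypes, phenotypes))
           for i in range(n_markers)]
    return solve_linear_system(xtx, xty)


def predict_breeding_value(genotype, marker_effects):
    """Genomic prediction: sum of allele counts weighted by estimated marker effects."""
    return sum(g * e for g, e in zip(genotype, marker_effects))


# Toy training data (hypothetical): 6 animals, 3 SNPs coded as 0/1/2 allele counts.
training_genotypes = [[0, 1, 2], [1, 0, 2], [2, 1, 0], [1, 2, 1], [0, 0, 1], [2, 2, 2]]
true_effects = [0.5, -0.2, 1.0]  # used only to simulate noise-free phenotypes
training_phenotypes = [predict_breeding_value(g, true_effects) for g in training_genotypes]

effects = estimate_marker_effects(training_genotypes, training_phenotypes, ridge_penalty=1e-6)

# Rank unphenotyped candidates from their genotypes alone.
candidates = {"candidate_A": [2, 0, 2], "candidate_B": [0, 2, 0]}
ranking = sorted(candidates,
                 key=lambda name: predict_breeding_value(candidates[name], effects),
                 reverse=True)
print(ranking)
```

Note that the model never asks what any individual SNP does biologically; it only needs the marker effects to be jointly predictive, which is the sense in which the genome is treated as a black box.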

Asif Nadeem and Maryam Javed

Conclusion

Nutrients and food components can directly or indirectly upregulate and downregulate gene expression, and can regulate intermediate metabolites involved in signaling pathways, with positive or negative consequences. In nutrigenomics research, microarray or DNA chip technology now allows researchers to analyze a very large number of genes at once and provides a full view of gene expression patterns, so that dietary effects that were previously impossible to investigate can now be studied. The use of molecular genetics techniques to assess heritable characteristics of humans and farm animals, such as body weight, carcass merit, feed intake, growth rate, and milk output and composition, has sparked a lot of interest. Nutrigenomics can, however, only partially solve the problems related to non-genetic factors that affect the health and performance of individuals.

References

Afman, Lydia, and Michael Müller. “Nutrigenomics: from molecular nutrition to prevention of disease.” Journal of the American Dietetic Association 106, no. 4 (2006): 569-576. https://doi.org/10.1016/j.jada.2006.01.001.
Bauman, Dale E., Kevin J. Harvatine, and Adam L. Lock. “Nutrigenomics, rumen-derived bioactive fatty acids, and the regulation of milk fat synthesis.” Annual review of nutrition 31 (2011): 299-319. https://doi.org/10.1146/annurev.nutr.012809.104648.
Bhatt, Shibani N., and Arthvan D. Sharma. “Nutrigenomics: a non-conventional therapy.” International Journal of Pharmaceutical Sciences Review and Research 8, no. 2 (2011): 100-105.
Byrne, K. A., Y. H. Wang, S. A. Lehnert, G. S. Harper, S. M. McWilliam, H. L. Bruce, and A. Reverter. “Gene expression profiling of muscle tissue in Brahman steers during nutritional restriction.” Journal of animal science 83, no. 1 (2005): 1-12. https://doi.org/10.2527/2005.8311.
Chavez, A., and M. Muñoz de Chávez. “Nutrigenomics in public health nutrition: short-term perspectives.” European Journal of Clinical Nutrition 57, no. 1 (2003): S97-S100. https://doi.org/10.1038/sj.ejcn.1601809.
Das, Swapan Kumar, and Neeraj Kumar Sharma. “Expression quantitative trait analyses to identify causal genetic variants for type 2 diabetes susceptibility.” World journal of diabetes 5, no. 2 (2014): 97. https://dx.doi.org/10.4239%2Fwjd.v5.i2.97.
DellaPenna, Dean. “Nutritional genomics: manipulating plant micronutrients to improve human health.” Science 285, no. 5426 (1999): 375-379. https://doi.org/10.1126/science.285.5426.375.

Gaboon, Nagwa E. A. “Nutritional genomics and personalized diet.” Egyptian Journal of Medical Human Genetics 12, no. 1 (2011). https://doi.org/10.1016/j.ejmhg.2011.02.001.
German, J. Bruce. “Genetic dietetics: nutrigenomics and the future of dietetics practice.” Journal of the American Dietetic Association 105, no. 4 (2005): 530-531. https://doi.org/10.1016/j.jada.2005.02.034.
Ghoshal, Saheli, Saritha Pasham, Daniel P. Odom, Harold C. Furr, and Mary M. McGrane. “Vitamin A depletion is associated with low phosphoenolpyruvate carboxykinase mRNA levels during late fetal development and at birth in mice.” The Journal of nutrition 133, no. 7 (2003): 2131-2136. https://doi.org/10.1093/jn/133.7.2131.
Hayes, Ben J., Phillip J. Bowman, Amanda J. Chamberlain, and Michael E. Goddard. “Invited review: Genomic selection in dairy cattle: Progress and challenges.” Journal of dairy science 92, no. 2 (2009): 433-443. https://doi.org/10.3168/jds.2008-1646.
Humphrey, Sean J., Guang Yang, Pengyi Yang, Daniel J. Fazakerley, Jacqueline Stöckli, Jean Y. Yang, and David E. James. “Dynamic adipocyte phosphoproteome reveals that Akt directly regulates mTORC2.” Cell metabolism 17, no. 6 (2013): 1009-1020. https://doi.org/10.1016/j.cmet.2013.04.010.
Jansen, Ritsert C., and Jan-Peter Nap. “Genetical genomics: the added value from segregation.” TRENDS in Genetics 17, no. 7 (2001): 388-391. https://doi.org/10.1016/S0168-9525(01)02310-1.
Jeyakumar, S. M., A. Vajreswari, B. Sesikeran, and N. V. Giridharan. “Vitamin A supplementation induces adipose tissue loss through apoptosis in lean but not in obese rats of the WNIN/Ob strain.” Journal of molecular endocrinology 35, no. 2 (2005): 391-398. https://doi.org/10.1677/jme.1.01838.
Jones, K. L., S. S. King, and M. J. Iqbal. “Endophyte‐infected tall fescue diet alters gene expression in heifer luteal tissue as revealed by interspecies microarray analysis.” Molecular Reproduction and Development: Incorporating Gamete Research 67, no. 2 (2004): 154-161. https://doi.org/10.1002/mrd.10395.
Kaput, Jim, Jose M. Ordovas, Lynnette Ferguson, Ben Van Ommen, Raymond L. Rodriguez, Lindsay Allen, Bruce N. Ames et al. “The case for strategic international alliances to harness nutritional genomics for public and personal health.” British Journal of Nutrition 94, no. 5 (2005): 623-632. https://doi.org/10.1079/BJN20051585.
Kohl, Michael, Dominik A. Megger, Martin Trippler, Hagen Meckel, Maike Ahrens, Thilo Bracht, Frank Weber et al. “A practical data processing workflow for multi-OMICS projects.” Biochimica et Biophysica Acta (BBA)-Proteins and Proteomics 1844, no. 1 (2014): 52-62. https://doi.org/10.1016/j.bbapap.2013.02.029.
Kore, K. B., A. K. Pathak, and Y. P. Gadekar. “Nutrigenomics: Emerging face of molecular nutrition to improve animal health and production.” Veterinary World 1, no. 9 (2008): 285.
Lau, Francis C., Manashi Bagchi, Chandan Sen, Sashwati Roy, and Debasis Bagchi. “Nutrigenomic analysis of diet-gene interactions on functional supplements for weight management.” Current genomics 9, no. 4 (2008): 239-251. https://doi.org/10.2174/138920208784533638.

Lewin, H. A. “The future of cattle genome research: the beef is here.” Cytogenetic and genome research 102, no. 1-4 (2003): 10. https://doi.org/10.1159/000075718.
Lippolis, J. D., and T. A. Reinhardt. “Centennial paper: proteomics in animal science.” Journal of Animal Science 86, no. 9 (2008): 2430-2441. https://doi.org/10.2527/jas.2008-0921.
Loor, Juan J., Massimo Bionaz, and James K. Drackley. “Systems physiology in dairy cattle: Nutritional genomics and beyond.” Annu. Rev. Anim. Biosci. 1, no. 1 (2013): 365-392. https://doi.org/10.1146/annurev-animal-031412-103728.
Masotti, Andrea, Letizia Da Sacco, Gian Franco Bottazzo, and Anna Alisi. “Microarray technology: a promising tool in nutrigenomics.” Critical reviews in food science and nutrition 50, no. 7 (2010): 693-698. https://doi.org/10.1080/10408390903044156.
Mühlberger, Irmgard, Julia Wilflingseder, Andreas Bernthaler, Raul Fechete, Arno Lukas, and Paul Perco. “Computational analysis workflows for Omics data interpretation.” In Bioinformatics for Omics Data, pp. 379-397. Humana Press, 2011. https://doi.org/10.1007/978-1-61779-027-0_17.
Mitra et al. Nutrigenomics: a new frontier (2005). http://www.apiindia.org/images/stories/pdf/medicine_update_2005/chapter_182.pdf. Accessed 25 January 2011.
Müller, Michael, and Sander Kersten. “Nutrigenomics: goals and strategies.” Nature Reviews Genetics 4, no. 4 (2003): 315-322. https://doi.org/10.1038/nrg1047.
Munshi, Anjana, and V. Shanti Duvvuri. “Nutrigenomics: looking to DNA for nutrition advice.” (2008).
Murray, Harry M., Santosh P. Lall, Rajesh Rajaselvam, Lee Anne Boutilier, Brian Blanchard, Robert M. Flight, Stefanie Colombo, Vindhya Mohindra, and Susan E. Douglas. “A nutrigenomic analysis of intestinal response to partial soybean meal replacement in diets for juvenile Atlantic halibut, Hippoglossus hippoglossus, L.” Aquaculture 298, no. 3-4 (2010): 282-293. https://doi.org/10.1016/j.aquaculture.2009.11.001.
Oliver, D. The Future of Nutrigenomics: From The Lab to the Dining Room. Institute for the Future. (2005).
Ordovas, Jose M., and Dolores Corella. “Nutritional genomics.” Annu. Rev. Genomics Hum. Genet. 5 (2004): 71-118. https://doi.org/10.1146/annurev.genom.5.061903.180008.
Pieper, Robert, Susan Kröger, Jan F. Richter, Jing Wang, Lena Martin, Jérôme Bindelle, John K. Htoo et al. “Fermentable fiber ameliorates fermentable protein-induced changes in microbial ecology, but not the mucosal response, in the colon of piglets.” The Journal of nutrition 142, no. 4 (2012): 661-667. https://doi.org/10.3945/jn.111.156190.
Rao, Lin, Birgit Puschner, and Tomas A. Prolla. “Gene expression profiling of low selenium status in the mouse intestine: transcriptional activation of genes linked to DNA damage, cell cycle control and oxidative stress.” The Journal of nutrition 131, no. 12 (2001): 3175-3181. https://doi.org/10.1093/jn/131.12.3175.
Saleem, F., B. N. Ametaj, S. Bouatra, R. Mandal, Q. Zebeli, S. M. Dunn, and D. S. Wishart. “A metabolomics approach to uncover the effects of grain diets on rumen health in dairy cows.” Journal of Dairy Science 95, no. 11 (2012): 6606-6623. https://doi.org/10.3168/jds.2012-5403.
Schadt, Eric E., Stephanie A. Monks, Thomas A. Drake, Aldons J. Lusis, Nam Che, Veronica Colinayo, Thomas G. Ruff et al. “Genetics of gene expression surveyed in maize, mouse and man.” Nature 422, no. 6929 (2003): 297-302. https://doi.org/10.1038/nature01434.
Schellenberger, Jan, Junyoung O. Park, Tom M. Conrad, and Bernhard Ø. Palsson. “BiGG: a Biochemical Genetic and Genomic knowledgebase of large scale metabolic reconstructions.” BMC bioinformatics 11, no. 1 (2010): 1-10. https://doi.org/10.1186/1471-2105-11-213.
Seo, Seongwon, and Harris A. Lewin. “Reconstruction of metabolic pathways for the cattle genome.” BMC systems biology 3, no. 1 (2009): 1-13. https://doi.org/10.1186/1752-0509-3-33.
Shahzad, Khuram, and Juan J. Loor. “Application of top-down and bottom-up systems approaches in ruminant physiology and metabolism.” Current Genomics 13, no. 5 (2012): 379-394. https://doi.org/10.2174/138920212801619269.
Siddique, R. A., M. Tandon, T. Ambwani, S. N. Rai, and S. K. Atreja. “Nutrigenomics: nutrient-gene interactions.” Food Reviews International 25, no. 4 (2009): 326-345. https://doi.org/10.1080/87559120903155883.
Singh, Ram B., Gal Dubnov, Mohammad A. Niaz, Saraswati Ghosh, Reema Singh, Shanti S. Rastogi, Orly Manor, Daniel Pella, and Elliot M. Berry. “Effect of an Indo-Mediterranean diet on progression of coronary artery disease in high risk patients (Indo-Mediterranean Diet Heart Study): a randomised single-blind trial.” The Lancet 360, no. 9344 (2002): 1455-1461. https://doi.org/10.1016/S0140-6736(02)11472-3.
Sinha, R., D. E. Anderson, S. S. McDonald, and P. Greenwald. “Cancer risk and diet in India.” Journal of postgraduate medicine 49, no. 3 (2003): 222.
Srinivasarao, P., K. Narayanareddy, A. Vajreswari, M. Rupalatha, Padmini Surya Prakash, and Padmini Rao. “Influence of dietary fat on the activities of subcellular membrane-bound enzymes from different regions of rat brain.” Neurochemistry international 31, no. 6 (1997): 789-794. https://doi.org/10.1016/S0197-0186(97)00037-5.
Subbiah, M. T. Ravi. “Nutrigenetics and nutraceuticals: the next wave riding on personalized medicine.” Translational Research 149, no. 2 (2007): 55-61. https://doi.org/10.1016/j.trsl.2006.09.003.
Törrönen, Riitta, Marjukka Kolehmainen, and Kaisa Poutanen. “Nutrigenomics–new approaches for nutrition, food and health research.” Food and Health Research Centre (2006): 1-43.
Zhang, Aihua, Hui Sun, Ping Wang, Ying Han, and Xijun Wang. “Modern analytical techniques in metabolomics analysis.” Analyst 137, no. 2 (2012): 293-300.

Chapter 10

Next-Generation Sequencing: Advantages, Disadvantages, and the Future

Abstract

Following the successful completion of the Human Genome Project, molecular sequencing techniques have advanced dramatically, resulting in a lower cost per megabase and an expansion in the quantity and variety of sequenced genomes. Whole-genome sequencing, RNA-sequencing, proteome-sequencing, noncoding RNA expression, and identification of single nucleotide polymorphisms (SNPs) in a genome are just a few of the applications in which these sequencing technologies have been used so far. High-throughput omics research has never been easier, thanks to a new range of sequencing platforms such as AB/SOLiD, Illumina/Solexa, HeliScope, and 454/Roche. Modern state-of-the-art massively parallel systems outperform Sanger techniques in terms of throughput, and future developments in sequencing technology will accelerate the growth of this field. Here, we discuss the important features of first-generation sequencers, second-generation high-throughput (next-generation) sequencers, and third-generation high-throughput platforms such as single-molecule nanopore sequencing, HeliScope™, reversible termination sequencing technology, and SMRT™. We also describe the most popular high-throughput sequencing technologies, the expanding number of sequencing assays based on them, and the issues that existing sequencing systems face in research and clinical applications.

Keywords: first generation sequencers, second generation sequencers, high-throughput sequencing, third generation sequencers

Introduction

Sanger sequencing with fluorescence-based electrophoresis has been used extensively for more than a decade to determine DNA sequences in studies of somatic and germline genetics. The instrumentation has improved alongside high-performance computers and bioinformatics tools. However, the throughput of dideoxy chain-termination sequencing can be increased only by adding more sequencing machines, because every reaction needs its own capillary or well for electrophoresis. Massively parallel sequencing, also known as next-generation sequencing (NGS), overcomes this limitation by creating micro-scale reactors or by attaching the template DNA to a solid surface or to beads, which allows millions of sequencing reactions to run in parallel. Four technologies have been commercialized so far, and many others are under development. NGS represents a big leap in whole-genome sequencing, producing a massive amount of data, about 30 GB of DNA sequence, in a single sequencing run. The nature of the data is even more striking than the throughput and the relatively low cost: the reads are shorter than those of traditional Sanger sequencing, but millions of them are produced in a single run (Pettersson et al. 2009; Voelkerding et al. 2009; Tucker et al. 2009; Fullwood et al. 2009; Morozova and Marra 2008). NGS also differs from Sanger sequencing in that each read derives from a single DNA molecule rather than from a pooled amplification product. This makes it much easier to count and quantify reads, and it allows mutations in the sequenced DNA to be detected and identified (Stratton et al. 2009). Moreover, massively parallel sequencing of DNA fragments from both ends has made it possible to detect somatic rearrangements in a genome-wide fashion (Stratton et al. 2009; Fullwood et al. 2009; Campbell et al. 2008).

Although NGS produces much shorter reads than the conventional Sanger dideoxy chain-termination method, many re-sequencing studies have used it to sequence normal and cancer genomes within a few weeks (Wang et al. 2008; Shah et al. 2009). It is also combined with DNA capture to focus on specific regions or on the whole exome (Ng et al. 2009). The International Cancer Genome Consortium has completed about 1500 breast cancer genomes, which are helpful in the study of the different subtypes of the cancer (www.icgc.org). NGS has also been applied to germline DNA in gene association studies, including those of cancer genomes (Pettersson et al. 2009; Voelkerding et al. 2009; Tucker et al. 2009; Fullwood et al. 2009; Morozova and Marra 2008; Stratton et al. 2009). The accuracy and speed of this technology may be helpful in some rare Mendelian disorders such as Freeman-Sheldon syndrome (Ng et al. 2009). Interpreting any part of the genome requires biologists to deal with previously underestimated numbers of SNPs and other DNA polymorphisms, but the 1000 Genomes Project aims to give an almost complete picture of single nucleotide polymorphisms, copy number polymorphisms, and indels in a general population (http://www.1000genomes.org).

Digital gene expression, RNA sequencing, paired-end RNA sequencing, and small and noncoding RNA sequencing are further applications of NGS. They have made it possible to identify multiple novel splice variants, gene rearrangements (Zhao et al. 2009; Hampton et al. 2009), novel fusion genes (Maher et al. 2009 a, b), and read-throughs (Maher et al. 2009a), which are RNAs produced by the co-splicing of two contiguous genes in the genome. RNA editing, such as the non-synonymous transcript editing of the COG3 and SRP9 genes, can also be studied by NGS in cancer metastasis (Shah et al. 2009). The ENCODE project also uses massively parallel sequencing to study noncoding and small RNAs in order to reveal regulation at the level of transcription. DNA methylation and histone acetylation across the genome are also being studied with massively parallel sequencing strategies (Lister and Ecker 2009). Microarrays are likewise being replaced in RNA interference screens that identify the genes involved in the viability of cancers (Iorns et al. 2007). The wealth of information obtained from these strategies will help in identifying the genes involved in the survival of cancer cells and potential drug targets.
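
Because each read derives from a single DNA molecule, evidence for a mutation can be expressed simply as the fraction of reads that carry it. The following minimal sketch shows that counting step for one genomic position. It assumes the reads have already been aligned and that the bases covering the position are supplied as a string; the function names and the `min_fraction` threshold are hypothetical, and real variant callers additionally model base quality, strand bias, and sequencing error.

```python
from collections import Counter


def allele_fractions(bases_at_position):
    """Return the fraction of reads supporting each base at one position."""
    counts = Counter(bases_at_position)
    depth = sum(counts.values())
    if depth == 0:
        raise ValueError("no reads cover this position")
    return {base: count / depth for base, count in counts.items()}


def candidate_variants(bases_at_position, reference_base, min_fraction=0.2):
    """List non-reference bases seen in at least min_fraction of the reads."""
    fractions = allele_fractions(bases_at_position)
    return sorted(base for base, fraction in fractions.items()
                  if base != reference_base and fraction >= min_fraction)


# Eight reads cover the position: six carry the reference A, two carry G.
pileup = "AAAAAAGG"
print(allele_fractions(pileup))
print(candidate_variants(pileup, reference_base="A"))
```

With deeper coverage the same counts become more reliable, which is one reason short reads produced in very large numbers are well suited to mutation detection.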

First Generation Sequencing

In 1953, Watson and Crick described the three-dimensional structure of DNA, drawing on crystallographic data produced by Rosalind Franklin and Maurice Wilkins (Watson and Crick 1953; Zallen, 2003). This work helped to elucidate the framework of DNA replication and protein synthesis, but it was not yet possible to read or sequence DNA. The strategies used for protein sequencing were difficult to apply to DNA, and a further problem was distinguishing DNA molecules of different lengths (Hutchison, 2007). Initially, researchers focused on sequencing microbial ribosomal RNA or tRNA and the genomes of ssRNA bacteriophages, because these are much shorter than eukaryotic DNA molecules. Despite these advantages, progress was slow, as the available techniques revealed only the nucleotide composition and not the order of the nucleotides (Holley et al. 1961). In 1965, Robert Holley and colleagues used ribonucleases together with existing analytical chemistry techniques to obtain the first whole nucleic acid sequence, that of alanine tRNA from Saccharomyces cerevisiae (Holley et al. 1965). In parallel, Fred Sanger and colleagues developed a technique based on the detection of radiolabelled partial-digestion fragments after two-dimensional fractionation, which allowed steady additions to the growing pool of ribosomal and transfer RNA sequences (Sanger et al. 1965; Brownlee and Sanger 1967; Cory et al. 1968; Dube et al. 1968; Goodman et al. 1968; Adams et al. 1969). Walter Fiers used the same 2-D fractionation to sequence the coat protein gene of bacteriophage MS2 in 1972 and the complete genome of the same organism in 1976 (Fiers et al. 1976).

Researchers then adapted these techniques to DNA, aided by the recent purification of bacteriophages with DNA genomes. The observation that Enterobacteria phage lambda has 5’ overhanging ends led Ray Wu and Dale Kaiser to use DNA polymerase to fill in these ends with radioactively labelled nucleotides and to measure their incorporation (Wu and Kaiser 1968; Wu, 1970). Later, 2-D fractionation was replaced by polyacrylamide gel electrophoresis for separating polynucleotide chains, which provided much better resolution. Sanger and Coulson’s “plus and minus” system and Maxam and Gilbert’s chemical cleavage method were the two main techniques used for sequencing (Sanger and Coulson 1975; Maxam and Gilbert 1977).
In the plus and minus technique, DNA polymerase synthesizes a new polynucleotide chain from a primer, incorporating radiolabelled nucleotides, before two second polymerization reactions are performed. In the ‘plus’ reaction only one type of nucleotide is present, so every extension ends with that nucleotide. In the ‘minus’ reaction the other three types of nucleotide are used, producing sequences that extend up to the position of the missing nucleotide. The products are run on a polyacrylamide gel (PAGE) and compared, which makes it possible to infer the position of the nucleotides in the sequence. Sanger and his coworkers used this technique to sequence the genome of bacteriophage ɸX174, the first DNA genome ever sequenced (Sanger et al. 1977). Maxam-Gilbert sequencing does not use DNA polymerase to generate DNA fragments; instead, radiolabelled DNA is treated with chemicals that break it at specific bases. The fragments are run on PAGE to determine their lengths, from which the sequence is inferred. Because it was the first technique to be adopted worldwide, Maxam-Gilbert sequencing is considered the birth of “first generation” DNA sequencing.

In 1977, the development of another method started a new era of DNA sequencing. It was developed by Sanger and his coworkers and is therefore known as Sanger’s chain-termination or dideoxy technique (Sanger et al. 1977). The chain-termination method uses dideoxyribonucleotides (ddNTPs), which carry a hydrogen instead of a hydroxyl group on the 3’ carbon. The 3’ hydroxyl group is required for chain elongation, because it forms the phosphodiester bond with the next nucleotide (Chidgeavadze and Beabealashyilli 1984). Radiolabelled ddNTPs are mixed into a DNA extension reaction, where their incorporation at the end of a growing strand terminates the chain. This produces DNA strands of different lengths that all start at the same position but end at different occurrences of the dideoxyribonucleotide. Four reactions are run in parallel, each containing a different ddNTP, and the resulting fragments are run in four lanes of a polyacrylamide gel. Autoradiography then reveals the nucleotide sequence of the original DNA template. On this principle, Sanger sequencing became the most common DNA sequencing technology. Many improvements were made to the chain-termination method, including the replacement of phosphorus- or tritium-radiolabelling with fluorometric detection and improved detection through capillary electrophoresis. These improvements led to the development of many automated DNA sequencing machines, including commercial instruments (Smith et al. 1985; Ansorge et al. 1986; Ansorge et al. 1987; Prober et al. 1987; Kambara et al. 1988; Swerdlow and Gesteland 1990; Luckey et al. 1990; Hunkapiller et al. 1991).

These first-generation sequencing machines produce reads of almost 1 kb. To analyze larger DNA fragments, researchers used shotgun sequencing, in which overlapping DNA fragments are cloned and sequenced separately and then assembled in silico into one contiguous sequence (Staden, 1979; Anderson, 1981). The development of the polymerase chain reaction (PCR) (Saiki et al. 1985; Saiki et al. 1988) and of recombinant DNA technologies (Jackson et al. 1972; Cohen et al. 1973) further revolutionized genomics by providing high concentrations of purified DNA for sequencing. Less direct routes also brought improvements. For example, the Klenow fragment of DNA polymerase was used in DNA sequencing because it is able to incorporate dideoxyribonucleotides. The growing number of sequenced genomes and of tools for manipulating them later made it possible to find polymerases that are better at incorporating chemically modified dNTPs (Chen, 2014). Eventually, new DNA sequencers based on the dideoxy chain-termination method allowed the simultaneous sequencing of many hundreds of DNA templates, and these machines were used in the Human Genome Project (Ansorge, 2009).
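
The logic of reading a chain-termination gel can be sketched in a few lines. The example below is an idealized illustration, not a model of real chemistry: it assumes that termination occurs at every possible position, ignores the primer length, and treats the newly synthesized strand as the input. The function names are hypothetical.

```python
def run_four_reactions(new_strand):
    """Return, for each ddNTP reaction, the lengths of the terminated fragments.

    In the ddA reaction every fragment ends at an A, in the ddC reaction at a C,
    and so on, so each lane holds the positions of one base in the new strand.
    """
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes):
    """Read the sequence by ordering all bands from shortest to longest."""
    bands = sorted((length, base)
                   for base, lengths in lanes.items()
                   for length in lengths)
    return "".join(base for _, base in bands)


lanes = run_four_reactions("GATTACA")
print(lanes)
print(read_gel(lanes))
```

Electrophoresis performs the sorting step physically: shorter fragments migrate further, so reading the bands across the four lanes from the bottom of the gel upwards gives the order of the bases.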

Second Generation Sequencing

Another method was later developed that identified nucleotides not with radio- or fluorescently labelled dNTPs but with a luminescent measurement of pyrophosphate synthesis. It relies on a two-enzyme process: ATP sulfurylase converts pyrophosphate to ATP, which then serves as the substrate for luciferase, producing light in proportion to the amount of pyrophosphate (Nyrén and Lundin 1985). Each nucleotide is washed in turn over template DNA fixed to a solid phase (Hyman, 1988), so the light produced after each wash reveals which nucleotide has been incorporated. Although pyrosequencing and Sanger sequencing differ, both are sequencing-by-synthesis techniques. Nyrén and his coworkers developed pyrosequencing, which has several benefits: it uses natural nucleotides rather than the modified nucleotides used in chain termination, and it can be observed in real time (Nyrén and Lundin 1985; Ronaghi et al. 1996; Ronaghi et al. 1998). Later improvements were the attachment of DNA to paramagnetic beads and the enzymatic degradation of unincorporated dNTPs, which removed the need for lengthy washing steps. A major flaw of this technique is the difficulty of determining how many identical nucleotides occur in a row at a given position (Ronaghi et al. 1998).

454 Life Sciences, a biotechnology company, licensed pyrosequencing and developed it into the first major successful commercial NGS technology. Its machines parallelized many sequencing reactions in one run. DNA libraries are attached to beads through adapters and then amplified by water-in-oil emulsion PCR (emPCR) (Tawfik and Griffiths 1998), which coats each bead in a clonal DNA population, ideally with one DNA molecule per bead and each bead emulsified in its own droplet. The DNA-coated beads are then washed over a picoliter reaction plate that fits one bead per well. Smaller bead-linked enzymes and dNTPs are then washed over the plate, and the release of pyrophosphate is measured by a charge-coupled device (CCD) sensor beneath the wells. This setup can produce reads of around 400-500 bp (Margulies et al. 2005). It allowed researchers to sequence a single human genome completely, and more rapidly than the effort of Craig Venter’s team using Sanger sequencing (Wheeler et al. 2008). The first high-throughput sequencing machine was the GS20, and later 454 instruments produced a greater number of reads with better sequence quality (Voelkerding et al. 2009). The large number of parallel sequencing reactions on a micrometer scale came to define second-generation DNA sequencing (Shendure and Ji 2008).

Following the success of the 454 method, many other sequencing techniques arose. One of the most important was Solexa, later acquired by Illumina (Voelkerding et al. 2009). Instead of using emPCR, adapter-ligated DNA is passed over a lawn of complementary oligonucleotides bound to a flow cell. Solid-phase PCR then produces clusters of clonal copies of each original DNA strand on the flow cell (Fedurco et al. 2006; Bentley et al. 2008). The process has been dubbed bridge amplification, because the replicating DNA strands have to arch over to prime the next cycle of polymerization off neighboring surface-bound oligonucleotides (Voelkerding et al. 2009). Sequencing by synthesis is carried out with fluorescent reversible-terminator deoxyribonucleotides (Turcatti et al. 2008). The modified dNTPs and DNA polymerase are washed over the flow cell in cycles. The incorporated nucleotide is identified with a CCD after the fluorophores have been excited with a laser, and the blocking fluorescent moiety is then removed enzymatically so that extension can proceed to the next position. The initial Genome Analyzer produced reads of only around 35 bp, but it could generate paired-end data, in which both ends of each DNA cluster are recorded. This is accomplished by acquiring one read from the single-stranded, flow-cell-bound DNA, then performing a single solid-phase extension cycle and removing the previously sequenced DNA strand.
As a result, the DNA read should be oriented in reverse in relation to the flow-cell. Then, from the other end of the molecule to the first, a second read is received. The lengths of the DNA strands are both known with a great deal of precision. The conventional genome analyzer (GAIIx) was followed by the HiSeq analyzer, which had longer reads, and finally the MiSeq analyzer, which had a lower throughput but faster turnaround and longer length of reads (Balasubramanian, 2011; Quail et al. 2012). Many sequencing companies with their novel methods produced their impact on sequencing experiments. In the start of second generation sequencing, the third option was sequencing by ligation of nucleotides and detection system of SOLiD from Applied Biosystems (Mckernan et al. 2009). SOLiD does not sequence by synthesis but by DNA ligation through DNA

Asif Nadeem and Maryam Javed

ligase enzyme, an approach previously used by George Church's group (Shendure et al. 2005). The SOLiD platform has been unable to compete with Illumina machines. Complete Genomics' DNA nanoball technique also uses sequencing by ligation (Buermans and den Dunnen 2014). In this approach the sequence is likewise obtained by probe ligation, but the method of clonal DNA generation is new. The DNA is amplified by rolling circle amplification, which produces long DNA chains made up of repeated copies of the template sequence bordered by adapters. These chains self-assemble into nanoballs, which are attached to a slide to be sequenced (Drmanac et al. 2010). The last remarkable second-generation sequencing platform is Ion Torrent, developed by Jonathan Rothberg after he left 454. Ion Torrent is claimed to be the first post-light sequencing technology, because it uses neither fluorescence nor luminescence. Beads carrying clonal populations of DNA are washed over a picowell plate, followed by each nucleotide in turn. Nucleotide incorporation is then measured, using complementary metal-oxide-semiconductor (CMOS) technology, through the protons (H+) released during nucleotide extension (Rothberg et al. 2011). A notable limitation of this technology is its limited ability to interpret homopolymer sequences, owing to signal loss. The genomic revolution has drastically altered the cost and ease of DNA sequencing. The capabilities of DNA sequencers have grown faster than Moore's law would predict: the complexity of microchips doubles roughly every two years, whereas sequencing capability doubled every five months between 2004 and 2010 (Stein, 2010).
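The comparison with Moore's law above can be made concrete with a little arithmetic. The following Python sketch simply compounds the two doubling times quoted in the text over the 2004-2010 window; the function name fold_increase is a hypothetical helper, and the result is an idealized illustration rather than measured data.

```python
def fold_increase(elapsed_months: float, doubling_months: float) -> float:
    """Fold growth of a quantity that doubles every `doubling_months`."""
    return 2.0 ** (elapsed_months / doubling_months)


# The 2004-2010 window quoted in the text, expressed in months.
ELAPSED_MONTHS = (2010 - 2004) * 12

moore_fold = fold_increase(ELAPSED_MONTHS, 24)      # doubling every two years
sequencing_fold = fold_increase(ELAPSED_MONTHS, 5)  # doubling every five months

print(f"Microchip complexity: ~{moore_fold:.0f}-fold")
print(f"Sequencing capability: ~{sequencing_fold:,.0f}-fold")
```

Under these assumptions, six years correspond to three doublings for microchips but more than fourteen doublings for sequencing, which is why the gap between the two is so large.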

Third Generation DNA Sequencing

The defining features of third-generation sequencing are generally taken to be single-molecule sequencing, real-time sequencing, and simple divergence from earlier technologies (Schadt et al. 2010; Pareek et al. 2011). Stephen Quake developed the first single-molecule sequencing (SMS) technology, which was commercialized by Helicos Biosciences and works without any kind of bridge amplification. Other firms picked up the third-generation baton after Helicos filed for bankruptcy in early 2012. Pacific Biosciences' Single Molecule Real Time (SMRT) platform is the most widely used third-generation sequencing technology (van Dijk et al. 2014) and is available on the PacBio range of machines. In this technology,

Next-Generation Sequencing

DNA polymerization occurs in arrays of microfabricated nanostructures known as zero-mode waveguides (ZMWs). These are essentially very small holes in a metallic film that covers a chip. They exploit the properties of light passing through an aperture whose diameter is smaller than its wavelength: the light decays exponentially as it passes through, so that only the very bottom of each well is illuminated. Single fluorophore molecules close to the bottom of the nanostructure can therefore be observed (Levene et al. 2003). A single DNA polymerase molecule deposited inside each ZMW sits within the illuminated region. When the DNA library and fluorescently labelled deoxyribonucleotides are washed over the chip, the incorporation of single nucleotides can be monitored in real time (Eid et al. 2009). The whole process takes very little time. The PacBio range has many advantages, which are now being shared by many other companies in their machines. The kinetic data produced during sequencing also allow the detection of modified bases (Flusberg et al. 2010). PacBio machines can produce long reads, of 10 kb and more, which are very useful for de novo genome assembly (Schadt et al. 2010; van Dijk et al. 2014).

Nanopore sequencing is the most anticipated area of third-generation DNA sequencing; here, different molecules are detected and quantified as they pass through nanopores (Haque et al. 2013). The idea originated when researchers drove ssDNA or RNA across a lipid bilayer through alpha-hemolysin ion channels by electrophoresis. The passage of the nucleic acid blocks ion flow, decreasing the current for a time proportional to the length of the nucleic acid (Kasianowicz et al. 1996). Non-biological, solid-state nanopores have also been proposed as a way to sequence double-stranded DNA (Dekker, 2010). 
Oxford Nanopore Technologies (ONT) was the first company to offer nanopore sequencers, with its GridION and MinION platforms (Clarke et al. 2009). The MinION is a small, USB-sized sequencing device released in 2014. Despite their relatively poor data quality, such sequencers are expected to produce long reads rapidly and very cheaply (Branton et al. 2008; Haque et al. 2013). Many bacterial genomes have been sequenced with the MinION (Quick et al. 2014). The MinION has also decentralized sequencing: Joshua Quick and Nicholas Loman sequenced Ebola viruses in Guinea within two days of collecting the sample (Hayden, 2015).

Approaches for Next Generation Sequencing

1. Pyrosequencing
2. Chemistry and reversible terminator sequencing
3. Ligation sequencing
4. Phospholinked fluorescent nucleotides or real-time sequencing

Pyrosequencing

Pyrosequencing is a DNA sequencing technique developed at the Royal Institute of Technology (KTH) in Stockholm, Sweden. Without the use of tagged primers, labeled nucleotides, or gel electrophoresis, the method employs luminometric detection of the pyrophosphate released when a dNTP is incorporated. Fragments of up to 100 bp can be sequenced by pyrosequencing. It is widely used for nucleic acid characterization and offers accuracy, flexibility, automation and parallel processing, making it a successful method for both confirmatory and de novo sequencing (Ronaghi, 2000).

Pyrosequencing Principle

Pyrosequencing is based on the idea of sequencing by synthesis (Hyman, 1988; Melamede, 1989) and on the detection of pyrophosphate released during DNA synthesis. Nucleotide incorporation is detected with a set of four enzymes. A sequencing primer is first annealed to a single-stranded, biotin-labeled DNA template. The primed template is then combined with the enzymes DNA polymerase, ATP sulfurylase, luciferase and apyrase, and with the substrates adenosine 5' phosphosulfate (APS) and luciferin (Gharizadeh et al. 2007). The four dNTPs are added to the reaction mixture separately, in repeated cycles. The reaction starts with polymerization on the template DNA, in which nucleotide incorporation releases pyrophosphate (PPi) in an amount equimolar to the incorporated nucleotides. This PPi is subsequently converted to ATP by ATP sulfurylase in the presence of APS. The ATP drives the luciferase-catalyzed conversion of luciferin to oxyluciferin, which produces visible light in proportion to the amount of ATP. The light has a wavelength of about 560 nm and can be detected by a photon detection device such as a charge-coupled device (CCD) camera or a photomultiplier. Apyrase, the fourth enzyme, degrades ATP and unincorporated dNTPs between nucleotide additions. 
Usually 65 seconds are allowed between nucleotide dispensations so that degradation is complete, and dNTPs are added one at a time (Gharizadeh et al. 2007). Because the order of nucleotide addition is known, the sequence of the template can be determined. The light produced during pyrosequencing appears as a peak in the pyrogram, and the peak height is proportional to the number of dNTPs incorporated: incorporation of three dGTPs, for example, produces a peak three times higher than a single incorporation (Gharizadeh et al. 2007). The DNA sequence is displayed on screen as a pyrogram. The slope of the rising curve in the pyrogram reflects polymerase and ATP sulfurylase activity, the height of the signal reflects luciferase activity, and the slope of the falling curve reflects nucleotide degradation (Gharizadeh, 2003). Integrated software performs base calling and offers features for SNP and sequence analysis (Gharizadeh et al. 2007). Polymerization and light detection take 3-4 seconds at room temperature; the ATP sulfurylase reaction takes about 1.5 seconds and light production less than 0.2 seconds (Nyrén and Lundin 1985). Standard pyrosequencing uses the Klenow fragment of Escherichia coli DNA Pol I. The ATP sulfurylase is a recombinant version of the yeast (Saccharomyces cerevisiae) enzyme, while the luciferase comes from Photinus pyralis, the American firefly (Ronaghi, 1999). Apyrase is obtained from potato, Solanum tuberosum (Pimpernel variety).
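The relationship between peak height and the number of incorporated nucleotides can be illustrated with a small Python sketch. It assumes idealized peak heights that have already been normalized so that one incorporation gives a signal of about 1.0; the function name decode_pyrogram, the dispensation order and the peak values are all hypothetical, and real base-calling software must handle noise and non-linear signals that this toy version ignores.

```python
def decode_pyrogram(dispensation_order: str, peak_heights: list) -> str:
    """Turn per-dispensation light signals into the synthesized sequence.

    Each peak height is assumed to be normalized so that one incorporated
    nucleotide gives a signal of about 1.0.
    """
    bases = []
    for nucleotide, height in zip(dispensation_order, peak_heights):
        copies = round(height)  # 0 = no incorporation, 3 = triple peak
        bases.append(nucleotide * copies)
    return "".join(bases)


# Hypothetical run: cyclic dispensation of A, C, G, T with idealized peaks.
order = "ACGTACGT"
peaks = [1.0, 0.0, 3.1, 0.9, 0.0, 2.0, 0.1, 1.0]
print(decode_pyrogram(order, peaks))  # AGGGTCCT
```

The decoded string is the newly synthesized strand, which is complementary to the template. Rounding to the nearest integer becomes unreliable for long homopolymer runs, where adjacent peak heights are hard to tell apart.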

Benefits of Pyrosequencing

Pyrosequencing has emerged as a viable replacement for Sanger dideoxy chain-termination sequencing for short stretches of template DNA because of its greater speed and real-time readout (Gharizadeh et al. 2007). It uses several enzymes simultaneously for DNA synthesis, and their kinetics can be followed in real time (Gharizadeh et al. 2003c). It is also more convenient because it produces sequencing signal immediately downstream of the primer. Single-stranded DNA preparation takes about 15 minutes, compared with almost 4 hours for the Sanger method. Reagent costs for pyrosequencing are also very low (Gharizadeh et al. 2007). Another advantage is that the nucleotide dispensation order is easily programmed, and changes in the pyrogram pattern reveal mutations, deletions and insertions. Base calling can be observed in real time for each sample. Automated pyrosequencing is also available for large-scale screening (Gharizadeh et al. 2003c).

The 3'-O-blocked reversible terminator should terminate extension more effectively than the other terminators, since it carries a reversible blocking group at the 3' position. The 3'-unblocked reversible terminator, whose 3'-OH group is unmodified, is easily accepted by DNA polymerase. Because DNA polymerases have evolved over billions of years to distinguish between RNA and DNA nucleotides, they closely inspect the 2' and 3' positions of their substrates. For instance, the presence or absence of the oxygen atom at the 3' position determines whether chain extension can continue (Chen et al. 2010; Benner, 2004). Three reversible terminators with a blocking group at the 3'-OH are commercially available: the 3'-ONH2 reversible terminator, developed by Steven A. Benner and coworkers at the Foundation for Applied Molecular Evolution (Chen et al. 2010; Wu et al. 2007); the 3'-O-allyl reversible terminator, developed by Jingyue Ju and coworkers (Guo et al. 2010); and the 3'-O-azidomethyl reversible terminator, developed by Illumina/Solexa (Bentley et al. 2008). For the second type of terminator only one commercial product is available, the virtual terminator developed by Helicos Biosciences Corporation. It was used in the first single-molecule sequencer on the market, at the start of third-generation DNA sequencing (Bowers et al. 2009). More recently, Michael L. Metzker and colleagues developed the Lightning Terminator, which also belongs to the 3'-OH unblocked class of reversible terminators. Its use of UV light to cleave the fluorescent group is unique (Wu et al. 2007; Gardner et al. 2012; Stupi et al. 2012; Litosh et al. 2011).

Reversible Terminator Sequencing Technology's Working Mechanism

Reversible terminator sequencing follows the sequencing-by-synthesis principle, inferring the DNA sequence through stepwise primer elongation. The Illumina platform uses it as its second-generation technology. In general, reversible termination sequencing proceeds as an extension-termination-cleavage-extension cycle:

1. Immobilization of the nucleic acid template and primers on a solid surface
2. Extension of the primers by one base, followed by termination
3. Detection of the color of the fluorophore carried by the incorporated nucleotide
4. Removal of the fluorescent tag and the 3'-O blocking group
5. Washing and repetition of steps 2 to 4

Once a reversible terminator has been designed and synthesized, the next important step is to select a polymerase that can accept the nucleotide analog. It is difficult, however, to find an enzyme that incorporates a particular reversible terminator structure with high fidelity and efficiency. To find a suitable polymerase, potential candidates are tested experimentally by primer extension screening. Two approaches have been used to select polymerases for reversible terminators (Chen et al. 2010; Hutter et al. 2010). The first is simply to screen commercially available DNA polymerases and reverse transcriptases (Hutter et al. 2010). The second is directed evolution or semi-rational design. In directed evolution, mutant libraries are constructed and screened, whereas semi-rational design combines rational design with directed evolution (Chen et al. 2010). Therminator, Klenow, Bst and 9°Nm DNA polymerases are some of the commercially available DNA polymerases that have given better results (Bentley et al. 2008; Guo et al. 2008; Hutter et al. 2010; Wu et al. 2007). Jingyue Ju used AmpliTaq DNA polymerase for sequencing because it tolerates a large fluorescent group at the 5-position of pyrimidines and the 7-position of purines (Guo et al. 2008). Illumina/Solexa introduced a mutant version of 9°Nm DNA polymerase but gave no details of the mutation (Bentley et al. 2008). The Lightning Terminator research group tested eight commercial DNA polymerases and found Bst DNA polymerase to be the best (Wu et al. 2007). Steven A. Benner and coworkers have shown that some reverse transcriptases are also compatible with nucleotide analogs (Benner, 2004).
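The extension-termination-detection-cleavage cycle described in this section can be sketched as a toy Python simulation. It assumes an error-free polymerase that adds exactly one complementary base per cycle; the function name and the template string are hypothetical, and no real chemistry (fluorophore colors, incomplete cleavage, dephasing) is modeled.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_reversible_termination(template: str, cycles: int) -> str:
    """Simulate one-base-per-cycle sequencing by synthesis on a template."""
    read = []
    for position in range(min(cycles, len(template))):
        # Step 2: the polymerase adds exactly one blocked, labeled nucleotide.
        incorporated = COMPLEMENT[template[position]]
        # Step 3: the fluorophore color identifies the incorporated base.
        read.append(incorporated)
        # Step 4: label and 3' block are cleaved so the next cycle can
        # extend by one more base (nothing to model in this toy version).
    return "".join(read)


print(sequence_by_reversible_termination("TACGGT", cycles=4))  # ATGC
```

The read length is set by the number of cycles, not by the template length, which mirrors how read length is chosen on a reversible terminator instrument.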

Chemical Aspects of Next Generation Sequencing

The Human Genome Project was completed in 2003 using the Sanger chain-termination method, now classified as first-generation sequencing. The introduction of the Illumina Genome Analyzer in 2007 started another era of DNA sequencing: next-generation sequencing (NGS). In 2008, NGS was used to complete the first individual human genome sequence (Wheeler et al. 2008). NGS continues to evolve rapidly, and the falling cost of sequencing is expected to lead to a price of $10,000 per diploid human genome (Niedringhaus et al. 2011). Currently, a typical platform can produce 600 Gb in a single sequencing run of 7-10 days. Such a data set may contain 6,000,000,000 sequencing reads of 100 bases each. In general, NGS uses synthesis or ligation chemistry to read template DNA in a massively parallel fashion, producing enormous amounts of sequence data. Sequencing by synthesis (Fuller et al. 2009) uses either a single-molecule approach or an ensemble approach, and either can be carried out in real time or in a synchronous, controlled fashion. Signals are detected through fluorescently labeled nucleotides, enzyme-coupled chemiluminescence assays for pyrophosphate, or the pH changes that occur during nucleotide incorporation. One major contrast between Sanger dideoxy chain-termination sequencing and NGS is read length: the Sanger method produces reads of approximately 500 to 1000 bp, whereas NGS produces many shorter reads of the same template in parallel. NGS offers far greater speed and throughput than the Sanger method.
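The run figures quoted above can be cross-checked with a few lines of Python. The read count and read length are taken from the text; the haploid human genome size of roughly 3.1 billion bases is an outside assumption added here only to turn the yield into an approximate sequencing depth.

```python
reads_per_run = 6_000_000_000   # figure quoted in the text
bases_per_read = 100            # figure quoted in the text
HUMAN_GENOME_BASES = 3.1e9      # assumption: approximate haploid genome size

total_bases = reads_per_run * bases_per_read
mean_coverage = total_bases / HUMAN_GENOME_BASES

print(f"Total yield: {total_bases / 1e9:.0f} Gb")
print(f"Mean depth over one human genome: ~{mean_coverage:.0f}x")
```

The product of reads and read length reproduces the 600 Gb quoted for a run, which under the stated genome-size assumption corresponds to a depth of nearly 200-fold over a single human genome.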

General Workflow of NGS

Nowadays NGS is essential in all fields of the biological sciences, enabling millions of DNA molecules to be sequenced in parallel at low cost (Metzker, 2010). It not only sequences and re-sequences genomes but also gives accurate information about DNA composition. It further supports transcriptome analysis, metagenomics, and the profiling of methylated DNA or DNA-associated proteins. Whatever the chemistry, NGS technologies require careful target preparation before sequencing and careful data analysis afterwards. Pre-sequencing steps include enrichment of the target DNA and preparation of the NGS library. PCR, long-range PCR, and RainDance or Fluidigm PCR are amplification methods used for target enrichment. Hybridization capture, either on a solid phase or in solution, can also be used for enrichment. There are many methods of library preparation, but ligation of adapters to the nucleic acid is a step common to all of them. Adapter ligation is necessary for immobilization on the solid surface and for sequencing. Size selection and PCR amplification are also often performed. The quality of the sequencing data depends on the starting material, which should be high-quality nucleic acid. Library construction is therefore the most important step, and it is briefly described under the following headings:

1. DNA Sequencing
2. Fragmentation
3. Size Selection
4. Polymerase Chain Reaction (PCR)

DNA Sequencing

DNA sequencing always starts with extracted DNA, which is fragmented and then subjected to immunoprecipitation and removal of DNA-bound proteins. Next, paired-end adapters are ligated and size selection is performed to eliminate unligated adapter molecules. PCR is then often carried out to produce enough template DNA for accurate quantification and to enrich for fragments with successfully ligated adapters. PCR with tailed primers can also add further adapter sequence, producing template DNA with all the elements necessary for bridge amplification on the flow cell surface.

Fragmentation

DNA is fragmented either by mechanical forces, such as nebulization or sonication, or enzymatically (Teytelman et al. 2009). Fragmentation yields templates of about 200 bp for the library, but the resulting library tends to contain euchromatin while heterochromatin is lost. Mokry and colleagues developed a double-fragmentation ChIP-seq protocol (Mokry et al. 2010). After conventional crosslinking and immunoprecipitation, the chromatin is de-crosslinked and sheared again into fragments of a size suitable for NGS. This approach decreases the bias against heterochromatic DNA and increases DNA yield.

Size Selection

Current procedures for size selection use solid-phase reversible immobilization beads, which provide a quick and easy way to enrich DNA fragments of a specific size. Gel extraction is still used for size selection as well. Quail and colleagues found that heating an agarose gel slice in chaotropic salt buffer to temperatures above 50°C reduced the representation of AT-rich sequences, because AT-rich fragments are denatured during this step and then re-anneal poorly (Quail et al. 2008). To avoid this problem, gel slices should be melted in the supplied buffer at room temperature (18-22°C), which considerably decreases GC bias.

Polymerase Chain Reaction (PCR)

PCR amplification introduces bias into the sample composition because not all DNA fragments are amplified with the same efficiency. GC-neutral fragments are amplified more efficiently than GC-rich or AT-rich ones. As a result, highly GC-rich or AT-rich fragments may be under-represented or lost during library preparation (Day et al. 1996). This complicates the sequencing of GC-rich genomes (HepB virus) (Perelygina et al. 2003), AT-rich genomes (Plasmodium falciparum) (Gardner et al. 2002), and even the highly GC-rich CpG islands of cancer genomes. Kozarewa and colleagues devised a method that avoids this PCR bias (Kozarewa et al. 2009). They proposed the ligation of adapters containing all the elements required for bridge amplification, which eliminates the need for PCR to add them. Current Illumina TruSeq PCR-free kits use such adapters to reduce bias. In addition, the PCR additive betaine has been used to reduce bias against GC-rich templates (Aird et al. 2011), and tetramethylammonium chloride (TMAC) is another additive, which increases the thermostability of AT base pairs (Oyola et al. 2012). Recombinase polymerase amplification (RPA) is an alternative amplification method based on a strand-displacing polymerase that operates at low temperature (Piepenburg et al. 2006). Linear amplification for deep sequencing (LADS) is another alternative, which uses T7 polymerase to transcribe a library many times, followed by reverse transcription to give a linearly amplified library (Hoeijmakers et al. 2011). Both methods perform better than standard Phusion polymerase amplification, but they produce high levels of chimeric and duplicated reads. Classical PCR with Kapa HiFi polymerase gave better results in library construction (Oyola et al. 2012). Multiple displacement amplification (MDA) is another useful method, based on the observation that Φ29 DNA polymerase can also amplify linear genomic DNA (Lizardi et al. 1998). MDA can yield micrograms of DNA, which is why it has become the method of choice for whole-genome sequencing from a single cell (Yilmaz and Singh 2012). In short, many alternative methods have been proposed for amplifying genomic DNA with biased nucleotide composition, but PCR with Kapa HiFi polymerase has been found to be the best overall, and MDA the best for single cells.
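To make the notion of GC-neutral, GC-rich and AT-rich fragments concrete, the following Python sketch computes the GC fraction of each fragment and labels it. The function names and the 35% and 65% thresholds are illustrative assumptions, not values taken from the studies cited above.

```python
def gc_fraction(fragment: str) -> float:
    """Fraction of G and C bases in a DNA fragment."""
    fragment = fragment.upper()
    if not fragment:
        return 0.0
    return (fragment.count("G") + fragment.count("C")) / len(fragment)


def flag_extreme_fragments(fragments, low=0.35, high=0.65):
    """Label fragments whose GC fraction falls outside [low, high].

    The thresholds are illustrative assumptions, not values from the text.
    """
    labels = {}
    for fragment in fragments:
        gc = gc_fraction(fragment)
        if gc < low:
            labels[fragment] = "AT-rich"
        elif gc > high:
            labels[fragment] = "GC-rich"
        else:
            labels[fragment] = "GC-neutral"
    return labels


print(flag_extreme_fragments(["ATATATTAAT", "GCGCGGCCGC", "ATGCATGCAT"]))
```

Comparing the distribution of such labels before and after amplification is one simple way to see whether a library preparation step has skewed the sample composition.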

Bioinformatics in Next-Generation Sequencing

Bioinformatics faces many fascinating challenges as next-generation sequencing evolves. Improvements are needed in sequence quality scoring, alignment, assembly and data release. Sequence quality scoring has become a very important issue because the different sequencing platforms and biochemistries can produce low-quality data with context-dependent error distributions. As second-generation sequencing matures and is applied to more biological problems, clear metrics for data quality, reliability, reproducibility and biological relevance become necessary. There is now an opportunity to compare old and new platforms, which will improve data quality. Standardized quality measures will benefit many applications in the biological sciences, such as quantifying the quality of de novo sequence assemblies, confidence metrics for data aligned to a reference, and confidence in base calls for polymorphism and mutation detection. New sequencing platforms should evaluate data quality in the following areas:

I. Technical reproducibility
II. Distribution of estimated accuracies for raw base calls
III. Raw and consensus sequencing data errors
IV. True ratio bias and skewness in tag counting applications

Data in simple, standardized forms must be included by default with the sequences. Currently, vendors are trying to include quality metrics when reporting data, which is needed for cross-comparison of data and for the potential integration of data from multiple platforms. Current resequencing approaches try to identify polymorphisms and mutations in the sequenced data, mostly from short reads, but this is prone to errors and false discoveries. Brockman et al. (2008) and Quinlan et al. (2008) have developed new approaches to single nucleotide polymorphism (SNP) detection that assess the probability of error in base calls. It has been suggested that third-party algorithms will improve base calling and polymorphism detection on upcoming sequencing platforms.
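One widely used way of attaching an error estimate to each base call is the Phred quality score, Q = -10 log10(p), where p is the estimated probability that the call is wrong. The text above does not name a specific scoring scheme, so the Python sketch below is offered only as an illustration of how such a metric works; the function names are hypothetical, and the Phred+33 ASCII offset is the encoding commonly used in FASTQ files.

```python
import math


def phred_from_error_probability(p_error: float) -> float:
    """Phred quality score for a given base-call error probability."""
    return -10.0 * math.log10(p_error)


def error_probability_from_phred(quality: float) -> float:
    """Base-call error probability implied by a Phred quality score."""
    return 10.0 ** (-quality / 10.0)


def decode_phred33(quality_string: str) -> list:
    """Decode a FASTQ quality string that uses the Phred+33 ASCII offset."""
    return [ord(character) - 33 for character in quality_string]


print(phred_from_error_probability(0.001))  # a 1-in-1000 error chance
print(error_probability_from_phred(20))     # Q20
print(decode_phred33("I5!"))                # [40, 20, 0]
```

Because the scale is logarithmic, each increase of 10 in Q corresponds to a tenfold drop in the estimated error probability, which makes scores from different platforms comparable only if the underlying probabilities are well calibrated.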

Next Generation Sequencing Data Analysis

Next-generation sequencing produces a massive amount of data, and many software tools are available for its analysis. These tools cover alignment of sequences to a reference, base calling and polymorphism detection, de novo assembly of sequence reads, and genome browsing and annotation. For aligning longer reads, BLAST or BLAT is sufficient. The number of tools for aligning short reads has increased massively. Some build on well-established algorithms such as Smith-Waterman. One example is SOAP, a software package for gapped and ungapped alignment. It uses memory-intensive seed and look-up table algorithms that accelerate alignment, and it allows iterative trimming of the 3' end of reads (Li et al. 2008). Bit encoding is another approach to accelerating alignment, and it also compresses the data into a more compact format (Li et al. 2008; Ning et al. 2001). Another program, MAQ, takes estimated data quality into account when placing reads. MAQ was designed mainly for Solexa and SOLiD data. SHRiMP is a further aligner, which includes a novel color-space and letter-space Smith-Waterman algorithm. Many assembly tools have also been introduced for next-generation sequencing data, especially for short, unpaired reads (Butler et al. 2008; Sundquist et al. 2007; Warren et al. 2007). The major platforms now commonly produce mate-paired reads, which are anticipated to have a major impact on the overall assembly of short reads, and many algorithms have been developed for this purpose (Butler et al. 2008; Sundquist et al. 2007; Warren et al. 2007).
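The seed and look-up table idea mentioned above can be sketched in a few lines of Python. This is a simplified, ungapped illustration and not the algorithm of SOAP or any other named tool: the reference, the read, the seed length and the function names are all hypothetical, and only the first seed of each read is looked up.

```python
from collections import defaultdict


def build_seed_index(reference: str, seed_length: int) -> dict:
    """Look-up table from every seed-length substring to its positions."""
    index = defaultdict(list)
    for start in range(len(reference) - seed_length + 1):
        index[reference[start:start + seed_length]].append(start)
    return index


def align_read(read, reference, index, seed_length, max_mismatches=2):
    """Return (position, mismatches) of the best ungapped hit, or None."""
    best = None
    seed = read[:seed_length]
    for start in index.get(seed, []):
        window = reference[start:start + len(read)]
        if len(window) < len(read):
            continue  # the read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best


reference = "ACGTTGCATGCCGATAGCTAGGCTA"
index = build_seed_index(reference, seed_length=4)
print(align_read("GCCGATTGC", reference, index, seed_length=4))  # (9, 1)
```

The index is built once and reused for every read, which is where the memory cost and the speed gain both come from. A read whose first seed contains an error is missed by this toy version; practical aligners try several seeds per read for that reason.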

Bioinformatics for Next Generation Sequencing Data Alignment

One major problem with next-generation sequencing is the alignment of the reads produced. NGS platforms can produce massive amounts of data, on the order of gigabytes, in just 24 hours (Metzker, 2010). Traditional alignment tools could not deliver the efficiency researchers demanded, so bioinformaticians began developing new tools for read alignment. These tools are tailored to particular technologies and their characteristics, such as the short read lengths of Solexa, SOLiD and Helicos, the low indel error rate of Illumina reads, and the di-base encoding of SOLiD reads. These short-read aligners are faster and more accurate than the traditional BLAST tool (Kent, 2002). A good algorithm for aligning short reads should have the following capabilities:

1. It must be very quick and efficient in aligning the billions of short reads produced by an NGS platform.
2. It must be able to align non-unique reads (from repetitive elements in the reference sequence) and reads that do not match the reference exactly (because of sequencing errors or sequence variation).

Short-read alignment tools such as BWA and Mosaik also work well with Sanger and 454 pyrosequencing reads. Bowtie and MAQ use quality scores for improved accuracy. Bowtie, BWA and SOAP2 can each align about 7 Gb of sequence to the human genome per CPU per day.

De-Novo Assembly

The next important step in NGS data analysis is de novo assembly. Assembling short reads is essential when no reference genome is available, and it is only possible with bioinformatics tools. Many algorithms have been developed for de novo assembly of reads, including ABySS (Simpson et al. 2009), ALLPATHS (Butler et al. 2008), Edena (Hernandez et al. 2008), Velvet (Zerbino and Birney, 2008) and SOAPdenovo (Li et al. 2008). Most of these tools are built on the de Bruijn graph data structure (Pevzner et al. 1989; Idury and Waterman 1995). Repeats in the sequence make it difficult, or nearly impossible, to assemble short reads into long contiguous sequences, so often only short stretches can be assembled.
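A de Bruijn graph of the kind used by these assemblers can be illustrated with a minimal Python sketch. The reads, the value of k and the function names are hypothetical, the reads are assumed to be error-free, and the simple path walk stands in for the far more elaborate graph simplification performed by real assemblers.

```python
from collections import defaultdict


def build_de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous_path(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (node,) = graph[node]
        if node in seen:  # stop on cycles, which arise from repeats
            break
        seen.add(node)
        contig += node[-1]
    return contig


reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = build_de_bruijn_graph(reads, k=4)
print(walk_unambiguous_path(graph, "ATG"))  # ATGGCGTGCA
```

The walk stops as soon as a node has more than one successor or a cycle is reached. Both situations are typical of repeats, which is one way to see why repeats break assemblies into short pieces.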

Identification of SNPs and Indels

The identification of single nucleotide polymorphisms (SNPs) and indels is another very important part of genome resequencing. Only a few useful approaches have been implemented for both. These tools estimate the likelihood that a locus is homozygous or heterozygous, taking into account the error rate of the sequencing platform, the probability of incorrect read mapping, and the coverage. The tools generally follow two steps: preparation of the data, and calling of each nucleotide under a Bayesian framework. The first step involves evaluating and filtering the reads. Reads that may map to paralogs or repeats are discarded, only reads giving good supporting evidence are kept, and quality values are reassigned on the basis of various statistics. Finally, realignment is performed. After preparation, the Bayesian approach is applied: the conditional likelihood of each nucleotide position is computed using Bayes' rule.

This rule gives the posterior probability of a genotype G given the data R; it is calculated from the prior probability of that genotype and the likelihood of observing the data given the genotype. The prior probability is calculated from the variant probability, and the probability of the reads is then estimated for every possible genotype. PolyBayes (Marth et al. 1999), SOAPsnp (Li et al. 2009) and MAQ (Li et al. 2008) are tools that use the Bayesian approach. More recently, Malhis and Jones (2010) and Hoberman et al. (2009) introduced two alternative methods. Malhis et al. proposed an algorithm that is implemented in the Slider tool (Malhis et al. 2009). It considers the most likely base at each position, taking all possibilities into account. If the most likely base matches the reference base, the position is considered non-variant. If the two do not match, the position is considered variable, provided the most likely base is above the cutoff probability. On the other hand, if the reference allele is improbable, the base is considered a candidate SNP.
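Written out, the Bayesian rule described above is P(G|R) = P(R|G) P(G) / sum over all genotypes G' of P(R|G') P(G'). The Python sketch below applies it at a single site. The error model (each read samples either allele with equal probability, and a miscall is equally likely to be any of the three other bases), the prior values and the function name genotype_posteriors are illustrative assumptions, not the models used by PolyBayes, SOAPsnp or MAQ.

```python
from math import prod  # available from Python 3.8


def genotype_posteriors(observed_bases, error_rate, priors):
    """Posterior P(G | R) for diploid genotypes given bases seen at one site.

    Each read is assumed to sample either allele with probability 0.5 and
    to be miscalled as one of the three other bases with `error_rate`.
    """
    def base_likelihood(base, genotype):
        per_allele = [
            (1.0 - error_rate) if base == allele else error_rate / 3.0
            for allele in genotype
        ]
        return 0.5 * sum(per_allele)

    joint = {
        genotype: prior * prod(base_likelihood(b, genotype) for b in observed_bases)
        for genotype, prior in priors.items()
    }
    evidence = sum(joint.values())  # denominator of Bayes' rule
    return {genotype: value / evidence for genotype, value in joint.items()}


# Hypothetical site: reference allele A, candidate alternative allele G.
priors = {"AA": 0.998, "AG": 0.001, "GG": 0.001}
posteriors = genotype_posteriors("AAGAGGAG", error_rate=0.01, priors=priors)
best = max(posteriors, key=posteriors.get)
print(best, posteriors[best])
```

With four A and four G observations, the heterozygous genotype receives almost all of the posterior probability even though its prior is small, because explaining four G reads as sequencing errors under the homozygous reference genotype is extremely unlikely.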

Alignment/Assembly Viewers

Next-generation sequencing has raised the need for fast, efficient and user-friendly tools for viewing assemblies and alignments of re-sequenced genomes. Tools developed for this purpose include EagleView, MapView, the text alignment viewer of SAMtools, MaqView, Tablet, and IGV from the Broad Institute (Huang and Marth 2008; Bao et al. 2009; Li et al. 2009; Li et al. 2008; Milne et al. 2010). Visualization software for NGS data should be quick and efficient, and should provide high-quality rendering and navigation for the relevant data formats.

NGS Technologies Overview

Roche/454

The first 454 sequencer was launched in 2004 as the first commercial Roche GS FLX 454 genome sequencer. The complete genome of an individual human, James D. Watson, was sequenced with the 454 sequencer. In 2008 an upgraded version of the machine was introduced as the 454 GS FLX Titanium system, which increased the read length to up to 700 bp with 99.99% accuracy and delivered 0.7 Gb of data per run in just 24 hours. Previously the sequencer produced reads of 700 bp and 70 Mb of data in 10 to 18 hours. The 454 sequencing system uses pyrosequencing together with emulsion PCR. In emulsion PCR, amplified fragments are attached to beads, which are then deposited into the wells of a picoliter plate. Solid-phase pyrosequencing is carried out in these wells (Lohmann and Klein 2014; Xuan et al. 2013). Enzymes such as polymerase, sulfurylase and luciferase are added, along with unlabeled nucleotides, which are incorporated during sequencing. Nucleotide incorporation is detected by the light emitted as a consequence of the pyrophosphate released during incorporation. Libraries for the 454 sequencer can be constructed by different methods, all of which yield a mixture of short nucleic acid fragments with adaptors at their ends. Emulsion PCR (Dressman et al. 2003) generates clonal sequencing features, and the amplified fragments are captured on 28 µm beads. The beads are then treated with a denaturant to remove untethered strands and are passed through a hybridization-based enrichment step. In contrast to other sequencing platforms, the sequencing must be monitored in real time. The pattern of nucleotide incorporation reveals the sequence of the template. A major drawback of the 454 sequencer concerns homopolymers: because the length of a homopolymer run must be inferred from signal intensity, it is difficult to determine how many identical nucleotides were incorporated. The dominant errors are therefore insertions and deletions rather than substitutions. The 454 platform can sequence only about 600 nucleotides per read, half the read length of current Sanger sequencers (1200 nucleotides). The 454 sequencer thus has a maximum read length of about 600 bp and yields 400-600 Mb per run. Despite these drawbacks, metagenomics has benefited from the 454 sequencer (Weiss et al. 2013). A sequencing primer is hybridized to the universal adapter attached to the unknown sequence, and pyrosequencing then starts (Ronaghi et al. 1996).

Illumina Genome Analyzer

Turcatti and coworkers (Adessi et al. 2000; Turcatti et al. 2008) introduced a next-generation sequencing platform called Solexa. In this platform, nucleic acid libraries can be constructed by any method that gives a mixture of adaptor-ligated nucleic acid fragments several hundred base pairs long. Amplification is performed by bridge PCR (Adessi et al. 2000; Fedurco et al. 2006). In this technique, forward and reverse primers are attached to a solid surface through flexible linkers, so that all the amplicons arising from a single template remain immobilized and clustered at a single location on the array. Bridge PCR uses formamide for denaturation and Bst polymerase for amplification of the template DNA strand. The PCR produces clusters of about 1000 amplicons each, and several million clusters are produced in each of the independent lanes of a single flow cell.

Asif Nadeem and Maryam Javed

Generation of the amplicon clusters is followed by hybridization of sequencing primers that flank the region of interest. Each sequencing cycle consists of a single-base extension using a modified DNA polymerase and a mixture of four nucleotides. The nucleotides are modified in two ways. First, they are reversible terminators: a chemically cleavable group at the 3’ hydroxyl position allows only a single base to be incorporated in each cycle. Second, each of the four nucleotides carries a fluorescent label that corresponds to its identity (Turcatti et al. 2008). Once a single base has been incorporated and imaged, chemical cleavage sets up the next cycle. The usual read length is 36 bp, with longer reads possible. Read length is limited by several factors that cause signal decay and dephasing. The dominant type of error is substitution, and average raw error rates are 1 to 1.5%. Recent modifications have made it possible to produce mate-paired reads, for example with each sequencing feature yielding 2 × 36 bp independent reads, one from each end of a library fragment.
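
To see why dephasing limits read length, consider a deliberately simplified model. The per-cycle probabilities `p_lag` and `p_lead`, the function names and the 70% threshold are assumptions for illustration, not measured platform values. If each strand in a cluster independently falls behind or runs ahead with a fixed probability per cycle and never resynchronises, the in-phase fraction decays geometrically with the number of cycles.

```python
import math


def in_phase_fraction(n_cycles, p_lag=0.005, p_lead=0.005):
    """Fraction of strands in a cluster still synchronised after n_cycles.

    Assumes independent, fixed per-cycle probabilities of lagging or
    leading and no resynchronisation (a simplification).
    """
    return (1.0 - p_lag - p_lead) ** n_cycles


def max_cycles(min_fraction, p_lag=0.005, p_lead=0.005):
    """Largest cycle count at which the in-phase fraction is >= min_fraction."""
    per_cycle = 1.0 - p_lag - p_lead
    return math.floor(math.log(min_fraction) / math.log(per_cycle))


if __name__ == "__main__":
    for cycles in (10, 36, 100):
        print(cycles, round(in_phase_fraction(cycles), 3))
    # Number of cycles before fewer than 70% of strands remain in phase
    print(max_cycles(0.7))
```

Under these assumed rates, even a 1% loss of synchrony per cycle erodes the usable signal within a few dozen cycles, which is consistent with the short read lengths noted above.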

AB SOLiD

The SOLiD platform was first described by Shendure and coworkers in 2005 (Shendure et al. 2005) and by McKernan and coworkers at Agencourt Personal Genomics (Beverly, MA, USA), which was acquired by Applied Biosystems (Foster City, CA, USA) in 2006 (McKernan et al. 2009). Libraries of adaptor-linked fragments may be constructed by any method, although many protocols have been developed for mate-paired tag libraries with a controlled and highly flexible distance distribution (Shendure et al. 2005). Emulsion PCR is used to generate clonal sequencing features, with the amplicons captured on the surface of 1 µm paramagnetic beads (Dressman et al. 2003). After the emulsion is broken, beads bearing amplicons are selectively recovered and immobilized on a solid surface to form a dense, disordered array. Sequencing by synthesis is driven by DNA ligase rather than a polymerase (Shendure et al. 2005; Brenner et al. 2000; McKernan et al. 2009). A universal primer is annealed to the adaptor of each amplified, immobilized fragment. Each sequencing cycle involves the ligation of fluorescently labeled octamers. The octamer mixture is structured so that the identity of a specific base position within the octamer correlates with the identity of the fluorescent label. Ligation is followed by imaging in four channels, which collects data for the same base position across all beads. The octamer is then chemically cleaved between positions 5 and 6 to remove the fluorescent label.

Once a round of sequencing cycles is complete, the extended primer is denatured to reset the system. The process can then be repeated so that every nucleotide position is read. Another important feature of this platform is two-base encoding, an error-correcting scheme in which each pair of adjacent bases is correlated with a label (McKernan et al. 2009). Each base is therefore queried twice, so miscalls are readily identified. The Polonator is a related system developed by Shendure and the Church group at Harvard (Shendure et al. 2005). It also uses emulsion PCR and a sequencing-by-synthesis strategy. The system is low-cost, open source and programmable. Its read length is limited, but sequencing on a high-density array of small beads makes it possible to obtain massive amounts of data.
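
Two-base encoding can be sketched in a few lines. The version below uses the commonly described colour-space scheme, in which each colour is the XOR of the 2-bit codes of two adjacent bases; the numeric colour labels, function names and example sequences are assumptions for illustration and do not reproduce the vendor software. The sketch shows the property that makes miscalls identifiable: a true single-base change alters two adjacent colours, whereas an isolated colour difference is more likely a measurement error.

```python
BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def encode_colors(seq):
    """Colour for each adjacent base pair: XOR of the two 2-bit base codes."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base and the colours."""
    bases = [first_base]
    for color in colors:
        bases.append(BASES[CODE[bases[-1]] ^ color])
    return "".join(bases)


def mismatch_positions(colors_a, colors_b):
    """Indices at which two colour sequences differ."""
    return [i for i, (x, y) in enumerate(zip(colors_a, colors_b)) if x != y]


if __name__ == "__main__":
    reference = "ATGGCA"
    ref_colors = encode_colors(reference)

    # A real single-base variant (G -> C at index 2) changes two adjacent colours.
    variant_colors = encode_colors("ATCGCA")
    print(mismatch_positions(ref_colors, variant_colors))

    # A single miscalled colour changes only one position in colour space,
    # but corrupts every downstream base if decoded naively.
    miscalled = list(ref_colors)
    miscalled[2] = 2
    print(mismatch_positions(ref_colors, miscalled))
    print(decode_colors("A", miscalled))
```

Because a naive decode propagates a single colour error through the rest of the read, comparison against the reference is usually done in colour space before translating to bases.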

HeliScope

Quake’s group (Braslavsky et al. 2003) introduced a next-generation sequencing platform called the Helicos sequencer. This platform is unique in that it requires no clonal amplification; instead, a highly sensitive fluorescence detection system directly detects single DNA molecules as they are sequenced by synthesis. Libraries are constructed by random fragmentation and poly-A tailing, without the polymerase chain reaction (PCR). The single-stranded, poly-A-tailed DNAs are captured by hybridization to surface-tethered poly-T oligomers, which yields an array of primed single-molecule sequencing templates. In each sequencing cycle, DNA polymerase and a single species of fluorescently labeled nucleotide are added, which extends the surface-immobilized primer-template duplexes in a template-dependent manner. After the full array is imaged, the fluorescent label is released by chemical cleavage, and the next round of extension and imaging follows. Several hundred cycles of single-base extension have been reported to yield reads with an average length of 25 bp or greater (Harris et al. 2008). The system has several salient features. First, sequencing is asynchronous: some strands fall ahead of or behind others in a sequence-dependent manner, and some templates fail to incorporate the next nucleotide; however, because each template is a single molecule, dephasing is not a major issue. Second, the labeled nucleotides carry no terminating moiety, which gives rise to a homopolymer problem. Raw sequencing accuracy on the HeliScope platform can be improved by a two-pass strategy in which the single-molecule template array is sequenced and then fully copied. After the new strand is attached to the surface, the original template molecule is removed by denaturation. The distal end of the adapter is then primed to obtain a second

sequence for the same template (Ewing et al. 1998; Harris et al. 2008). Finally, incorporation of contaminating, unlabeled or non-emitting nucleotides is a further problem. The dominant type of error is deletion, which occurs at a higher rate than substitution.

Reversible Termination Sequencing Technology

Dr. Jingyue Ju of Columbia University was the first to report reversible termination sequencing technology (Li et al. 2003). It uses modified nucleotide analogues that terminate chain synthesis reversibly, in contrast to Sanger sequencing, where dideoxyribonucleotides terminate the chain irreversibly. Over the last decade or more, many reversible chain terminators have been developed; they fall into two groups according to their blocking groups (Fuller et al. 2009; Wu et al. 2007; Guo et al. 2008; Ju et al. 2008; Chen et al. 2010; Hutter et al. 2010; Bowers et al. 2009; Gardner et al. 2012; Stupi et al. 2012; Litosh et al. 2011). The first type is the 3’-O-blocked reversible terminator, in which the blocking group is linked to the oxygen atom of the 3’-OH of the pentose sugar and the fluorescent label, which acts as a cleavable reporter, is attached to the nitrogenous base (Bentley et al. 2008; Li et al. 2003; Wu et al. 2007; Guo et al. 2008; Ju et al. 2008; Hutter et al. 2010). The second type is the 3’-unblocked reversible terminator (Bowers et al. 2009; Gardner et al. 2012; Stupi et al. 2012; Litosh et al. 2011), in which the terminating group is attached to the base together with the fluorescent group, so that the base modification serves as both reporter and chain terminator. Each type has benefits and drawbacks: the 3’-O-blocked terminator carries a reversible blocking group directly on the 3’ position and should therefore terminate more effectively, whereas the 3’-unblocked reversible terminator is more readily accepted by DNA polymerase because its 3’-OH group is unmodified.

Next Generation Sequencing Data Analysis

Next-generation sequencing produces a massive amount of data, and many software tools are available for its analysis.
These tools cover alignment of sequence reads to a reference sequence, base calling and polymorphism detection, de novo assembly of the sequence reads, and genome browsing and annotation. For longer reads, BLAST or BLAT is sufficient for alignment. The number of tools for aligning short DNA reads has grown enormously. Smith-Waterman and some other approaches rest on very well-established algorithms. One example is SOAP, a software package for gapped and ungapped alignment. This software

uses memory-intensive seed and look-up table algorithms that accelerate the alignment process, and it allows iterative trimming of the 3’ end of the reads (Warren et al. 2007). Bit encoding is another approach to accelerated alignment, which also compresses the data into a more appropriate format (Li et al. 2008; Ning et al. 2001). Another program, MAQ, also takes the estimated data quality into account when generating read placements. MAQ is designed chiefly for Solexa and SOLiD data, while SHRiMP includes a novel colour-space and letter-space Smith-Waterman algorithm. Many assembly tools have been introduced for next-generation sequencing data, especially for short, unpaired DNA reads (Butler et al. 2008; Sundquist et al. 2007; Warren et al. 2007). The major platforms now routinely produce mate-paired reads, which are anticipated to have a major impact on the assembly of short reads, and many algorithms have been developed for this purpose (Butler et al. 2008; Sundquist et al. 2007; Warren et al. 2007).
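
The seed and look-up table idea mentioned above can be sketched as follows. This is a generic illustration, not the SOAP or MAQ implementation: the function names, the seed length `k`, the mismatch limit and the example sequences are all assumptions. The reference is indexed once by every k-mer; each read is then split into seeds, seed hits propose candidate positions, and each candidate is verified by counting mismatches over the full read (ungapped only).

```python
from collections import defaultdict


def build_seed_index(reference, k):
    """Look-up table mapping every k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def align_read(read, reference, index, k, max_mismatches=2):
    """Ungapped seed-and-verify alignment.

    Returns (start, mismatches) for the best placement, or None if no
    placement has at most max_mismatches mismatches.
    """
    best = None
    checked = set()
    # Non-overlapping seeds: a read with few mismatches keeps at least one exact seed.
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for hit in index.get(seed, ()):
            start = hit - offset
            if start < 0 or start + len(read) > len(reference) or start in checked:
                continue
            checked.add(start)
            mismatches = hamming(read, reference[start:start + len(read)])
            if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
                best = (start, mismatches)
    return best


if __name__ == "__main__":
    reference = "ACGTTGCATGCCGATAGCTAGGCTTAACG"   # hypothetical reference
    index = build_seed_index(reference, k=4)
    read = reference[8:13] + "T" + reference[14:20]  # one substituted base
    print(align_read(read, reference, index, k=4))
```

The table costs memory proportional to the reference length, but it replaces a scan of the whole reference with a handful of dictionary look-ups per read, which is the trade-off the text describes.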

Conclusion

Thanks to NGS, researchers can study biological systems in ways that were previously unimaginable. As a result, NGS has established itself as an important tool in fundamental research and is quickly becoming a well-established method in translational research. A complex genome, such as that of a human, can now be sequenced with NGS in a single run. In comparison, the earlier Sanger sequencing method, which was used to produce the draft of the first human genome, took more than a decade to complete that task. In the foreseeable future, ongoing cost reductions and the creation of standardized pipelines will very likely make NGS a regular technique for everyday analysis. Clinical and forensic laboratories have already begun to employ NGS as a diagnostic tool, and in the near future it is expected to spread across research and diagnostics as cost and time requirements continue to decrease. These trends are expected to continue over the long run, generating sequencing data for a wide range of important organisms. Nonetheless, there are still substantial hurdles in implementing NGS, particularly in data storage and processing. Moreover, the generation of human genomic data raises ethical questions about its proper use. For example, the storage of human genomic data, its access and dissemination, and the discrimination that may result from such knowledge are just a few of the possible ethical and legal

concerns which must be addressed now before the technology becomes widely implemented.

References

Adams, J. M., P. G. N. Jeppesen, F. Sanger, and B. G. Barrell. “Nucleotide sequence from the coat protein cistron of R17 bacteriophage RNA.” Nature 223, no. 5210 (1969): 1009-1014. https://doi.org/10.1038/2231009a0. Adessi, Céline, Gilles Matton, Guidon Ayala, Gerardo Turcatti, Jean-Jacques Mermod, Pascal Mayer, and Eric Kawashima. “Solid phase DNA amplification: characterisation of primer attachment and amplification mechanisms.” Nucleic acids research 28, no. 20 (2000): e87-e87. https://doi.org/10.1093/nar/28.20.e87. Aird, Daniel, Michael G. Ross, Wei-Sheng Chen, Maxwell Danielsson, Timothy Fennell, Carsten Russ, David B. Jaffe, Chad Nusbaum, and Andreas Gnirke. “Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries.” Genome biology 12, no. 2 (2011): 1-14. https://doi.org/10.1186/gb-2011-12-2-r18. Anderson, Stephen. “Shotgun DNA sequencing using cloned DNase I-generated fragments.” Nucleic acids research 9, no. 13 (1981): 3015-3027. https://doi.org/10.1093/nar/9.13.3015. Ansorge, Wilhelm J. “Next-generation DNA sequencing techniques.” New biotechnology 25, no. 4 (2009): 195-203. https://doi.org/10.1016/j.nbt.2008.12.009. Ansorge, Wilhelm, Brian S. Sproat, Josef Stegemann, and Christian Schwager. “A nonradioactive automated method for DNA sequence determination.” Journal of Biochemical and Biophysical Methods 13, no. 6 (1986): 315-323. https://doi.org/10.1016/0165-022X(86)90038-2. Ansorge, Wilhelm, Brian Sproat, Josef Stegemann, Christian Schwager, and Martin Zenke. “Automated DNA sequencing: ultrasensitive detection of fluorescent bands during electrophoresis.” Nucleic acids research 15, no. 11 (1987): 4593-4602. https://doi.org/10.1093/nar/15.11.4593. Balasubramanian, Shankar. “Sequencing nucleic acids: from chemistry to medicine.” Chemical communications 47, no. 26 (2011): 7281-7286. https://doi.org/10.1038/nature07517. Bao, Hua, Hui Guo, Jinwei Wang, Renchao Zhou, Xuemei Lu, and Suhua Shi. 
“MapView: visualization of short reads alignment on a desktop computer.” Bioinformatics 25, no. 12 (2009): 1554-1555. https://doi.org/10.1093/bioinformatics/btp255. Benner, Steven A. “Understanding nucleic acids using synthetic chemistry.” Accounts of Chemical Research 37, no. 10 (2004): 784-797. https://doi.org/10.1021/ar040004z. Bentley, David R., Shankar Balasubramanian, Harold P. Swerdlow, Geoffrey P. Smith, John Milton, Clive G. Brown, Kevin P. Hall et al. “Accurate whole human genome sequencing using reversible terminator chemistry.” nature 456, no. 7218 (2008): 5359. https://doi.org/10.1038/nature07517. Bowers, Jayson, Judith Mitchell, Eric Beer, Philip R. Buzby, Marie Causey, J. William Efcavitch, Mirna Jarosz et al. “Virtual terminator nucleotides for next-generation

DNA sequencing.” Nature methods 6, no. 8 (2009): 593-595. https://doi.org/ 10.1038/nmeth.1354. Branton, Daniel, David W. Deamer, Andre Marziali, Hagan Bayley, Steven A. Benner, Thomas Butler, Massimiliano Di Ventra et al. “The potential and challenges of nanopore sequencing.” Nanoscience and technology: A collection of reviews from Nature Journals (2010): 261-268. https://doi.org/10.1142/9789814287005_0027. Braslavsky, Ido, Benedict Hebert, Emil Kartalov, and Stephen R. Quake. “Sequence information can be obtained from single DNA molecules.” Proceedings of the National Academy of Sciences 100, no. 7 (2003): 3960-3964. https://doi.org/10.1073/ pnas.0230489100. Brenner, Sydney, Maria Johnson, John Bridgham, George Golda, David H. Lloyd, Davida Johnson, Shujun Luo et al. “Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays.” Nature biotechnology 18, no. 6 (2000): 630-634. https://doi.org/10.1038/76469. Brockman, William, Pablo Alvarez, Sarah Young, Manuel Garber, Georgia Giannoukos, William L. Lee, Carsten Russ, Eric S. Lander, Chad Nusbaum, and David B. Jaffe. “Quality scores and SNP detection in sequencing-by-synthesis systems.” Genome research 18, no. 5 (2008): 763-770. https://doi.org/10.1101/gr.070227.107. Brownlee, G. G., and F. Sanger. “Nucleotide sequences from the low molecular weight ribosomal RNA of Escherichia coli.” Journal of molecular biology 23, no. 3 (1967): 337-IN9. https://doi.org/10.1016/S0022-2836(67)80109-8. Buermans, H. P. J., and J. T. Den Dunnen. “Next generation sequencing technology: advances and applications.” Biochimica et Biophysica Acta (BBA)-Molecular Basis of Disease 1842, no. 10 (2014): 1932-1941. https://doi.org/10.1016/j.bbadis. 2014.06.015. Butler, Jonathan, Iain MacCallum, Michael Kleber, Ilya A. Shlyakhter, Matthew K. Belmonte, Eric S. Lander, Chad Nusbaum, and David B. Jaffe. “ALLPATHS: de novo assembly of whole-genome shotgun microreads.” Genome research 18, no. 5 (2008): 810-820. . 
https://doi.org/10.1101/gr.7337908. Campbell, Peter J., Philip J. Stephens, Erin D. Pleasance, Sarah O’Meara, Heng Li, Thomas Santarius, Lucy A. Stebbings et al. “Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing.” Nature genetics 40, no. 6 (2008): 722-729. https://doi.org/10. 1038/ng.128. Check Hayden, Erika. “Pint-sized DNA sequencer impresses first users.” Nature News 521, no. 7550 (2015): 15. https://doi.org/10.1038/521015a. Chen, Cheng-Yao. “DNA polymerases drive DNA sequencing-by-synthesis technologies: both past and present.” Frontiers in microbiology 5 (2014): 305. https://doi.org/10.3389/fmicb.2014.00305. Chen, Fei, Eric A. Gaucher, Nicole A. Leal, Daniel Hutter, Stephanie A. Havemann, Sridhar Govindarajan, Eric A. Ortlund, and Steven A. Benner. “Reconstructed evolutionary adaptive paths give polymerases accepting reversible terminators for sequencing and SNP detection.” Proceedings of the National Academy of Sciences 107, no. 5 (2010): 1948-1953. https://doi.org/10.1073/pnas.0908463107.

Chidgeavadze, Z. G., R. Sh Beabealashvilli, A. M. Atrazhev, M. K. Kukhanova, A. V. Azhayev, and A. A. Krayevsky. “2’, 3’-Dideoxy-3’aminonucleoside 5’-triphosphates are the terminators of DNA synthesis catalyzed by DNA polymerases.” Nucleic acids research 12, no. 3 (1984): 1671. https://dx.doi.org/10.1093%2Fnar%2F12.3.1671. Clarke, James, Hai-Chen Wu, Lakmal Jayasinghe, Alpesh Patel, Stuart Reid, and Hagan Bayley. “Continuous base identification for single-molecule nanopore DNA sequencing.” Nature nanotechnology 4, no. 4 (2009): 265-270. https://doi.org/10.1038/nnano.2009.12. Cohen, Stanley N., Annie C. Y. Chang, Herbert W. Boyer, and Robert B. Helling. “Construction of biologically functional bacterial plasmids in vitro.” Proceedings of the National Academy of Sciences 70, no. 11 (1973): 3240-3244. https://doi.org/10.1073/pnas.70.11.3240. Cory, Suzanne, K. A. Marcker, S. K. Dube, and B. F. C. Clark. “Primary structure of a methionine transfer RNA from Escherichia coli.” Nature 220, no. 5171 (1968): 10391040. https://doi.org/10.1038/2201039a0. Day, Darren J., Phyllis W. Speiser, Egbert Schulze, M. Bettendorf, Jodene Fitness, Francis Barany, and Perrin C. White. “Identification of non-amplifying CYP21 genes when using PCR-based diagnosis of 21-hydroxylase deficiency in congenital adrenal hyperplasia (CAH) affected pedigrees.” Human Molecular Genetics 5, no. 12 (1996): 2039-2048. https://doi.org/10.1093/hmg/5.12.2039. Dekker, Cees. “Solid-state nanopores.” Nanoscience And Technology: A Collection of Reviews from Nature Journals (2010): 60-66. https://doi.org/10.1142/978981 4287005_0007. Dressman, Devin, Hai Yan, Giovanni Traverso, Kenneth W. Kinzler, and Bert Vogelstein. “Transforming single DNA molecules into fluorescent magnetic particles for detection and enumeration of genetic variations.” Proceedings of the National Academy of Sciences 100, no. 15 (2003): 8817-8822. https://doi.org/10.1073/pnas.1133470100. Drmanac, Radoje, Andrew B. Sparks, Matthew J. 
Callow, Aaron L. Halpern, Norman L. Burns, Bahram G. Kermani, Paolo Carnevali et al. “Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays.” Science 327, no. 5961 (2010): 78-81. https://doi.org/10.1126/science.1181498. Dube, S. K., Ko Ao Marcker, B. F. C. Clark, and Suzanne Cory. “Nucleotide sequence of N-formyl-methionyl-transfer RNA.” Nature 218, no. 5138 (1968): 232-233. https://doi.org/10.1038/218232a0. Eid, John, Adrian Fehr, Jeremy Gray, Khai Luong, John Lyle, Geoff Otto, Paul Peluso et al. “Real-time DNA sequencing from single polymerase molecules.” Science 323, no. 5910 (2009): 133-138. https://doi.org/10.1126/science.1162986. Ewing, Brent, and Phil Green. “Base-calling of automated sequencer traces using phred. II. Error probabilities.” Genome research 8, no. 3 (1998): 186-194. https://doi.org/10.1101/gr.8.3.186. Fedurco, Milan, Anthony Romieu, Scott Williams, Isabelle Lawrence, and Gerardo Turcatti. “BTA, a novel reagent for DNA attachment on glass and efficient generation of solid-phase amplified DNA colonies.” Nucleic acids research 34, no. 3 (2006): e22e22. https://doi.org/10.1093/nar/gnj023.

Fiers, Walter, Roland Contreras, Fred Duerinck, Guy Haegeman, Dirk Iserentant, Jozef Merregaert, W. Min Jou et al. “Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene.” Nature 260, no. 5551 (1976): 500-507. https://doi.org/10.1038/260500a0. Flusberg, Benjamin A., Dale R. Webster, Jessica H. Lee, Kevin J. Travers, Eric C. Olivares, Tyson A. Clark, Jonas Korlach, and Stephen W. Turner. “Direct detection of DNA methylation during single-molecule, real-time sequencing.” Nature methods 7, no. 6 (2010): 461-465. https://doi.org/10.1038/nmeth.1459. Fuller, Carl W., Lyle R. Middendorf, Steven A. Benner, George M. Church, Timothy Harris, Xiaohua Huang, Stevan B. Jovanovich et al. “The challenges of sequencing by synthesis.” Nature biotechnology 27, no. 11 (2009): 1013-1023. https://doi.org/10.1038/nbt.1585. Fullwood, Melissa J., Chia-Lin Wei, Edison T. Liu, and Yijun Ruan. “Next-generation DNA sequencing of paired-end tags (PET) for transcriptome and genome analyses.” Genome research 19, no. 4 (2009): 521-532. https://doi.org/10.1101/gr.074906.107. Gardner, Andrew F., Jinchun Wang, Weidong Wu, Jennifer Karouby, Hong Li, Brian P. Stupi, William E. Jack, Megan N. Hersh, and Michael L. Metzker. “Rapid incorporation kinetics and improved fidelity of a novel class of 3′-OH unblocked reversible terminators.” Nucleic acids research 40, no. 15 (2012): 7404-7415. https://doi.org/10.1093/nar/gks330. Gardner, Malcolm J., Neil Hall, Eula Fung, Owen White, Matthew Berriman, Richard W. Hyman, Jane M. Carlton et al. “Genome sequence of the human malaria parasite Plasmodium falciparum.” Nature 419, no. 6906 (2002): 498-511. https://doi.org/10.1038/nature01097. Gharizadeh, Baback, Mehran Ghaderi, and Pål Nyrén. “Pyrosequencing technology for short DNA sequencing and whole genome sequencing.” 生物物理 47, no. 2 (2007): 129-132. https://doi.org/10.2142/biophys.47.129. Gharizadeh, Baback. 
“Method development and applications of Pyrosequencing technology.” PhD diss., Bioteknologi, 2003. Goodman, Howard M., John Abelson, Arthur Landy, S. Brenner, and J. D. Smith. “Amber suppression: a nucleotide change in the anticodon of a tyrosine transfer RNA.” Nature 217, no. 5133 (1968): 1019-1024. https://doi.org/10.1038/2171019a0. Guo, Jia, Lin Yu, Nicholas J. Turro, and Jingyue Ju. “An integrated system for DNA sequencing by synthesis using novel nucleotide analogues.” Accounts of chemical research 43, no. 4 (2010): 551-563. https://doi.org/10.1021/ar900255c. Guo, Jia, Ning Xu, Zengmin Li, Shenglong Zhang, Jian Wu, Dae Hyun Kim, Mong Sano Marma et al. “Four-color DNA sequencing with 3′-O-modified nucleotide reversible terminators and chemically cleavable fluorescent dideoxynucleotides.” Proceedings of the National Academy of Sciences 105, no. 27 (2008): 9145-9150. https://doi.org/10.1073/pnas.0804023105. Hampton, Oliver A., Petra Den Hollander, Christopher A. Miller, David A. Delgado, Jian Li, Cristian Coarfa, Ronald A. Harris et al. “A sequence-level map of chromosomal breakpoints in the MCF-7 breast cancer cell line yields insights into the evolution of a cancer genome.” Genome research 19, no. 2 (2009): 167-177. https://doi.org/10. 1101/gr.080259.108.

Haque, Farzin, Jinghong Li, Hai-Chen Wu, Xing-Jie Liang, and Peixuan Guo. “Solid-state and biological nanopore for real-time sensing of single chemical and sequencing of DNA.” Nano today 8, no. 1 (2013): 56-74. https://doi.org/10.1016/j.nantod. 2012.12.008. Harris, Timothy D., Phillip R. Buzby, Hazen Babcock, Eric Beer, Jayson Bowers, Ido Braslavsky, Marie Causey et al. “Single-molecule DNA sequencing of a viral genome.” Science 320, no. 5872 (2008): 106-109. https://doi.org/10.1126/ science.1150427. Hernandez, David, Patrice François, Laurent Farinelli, Magne Østerås, and Jacques Schrenzel. “De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer.” Genome research 18, no. 5 (2008): 802-809. https://doi.org/10.1101/gr.072033.107. Hoberman, Rose, Joana Dias, Bing Ge, Eef Harmsen, Michael Mayhew, Dominique J. Verlaan, Tony Kwan, Ken Dewar, Mathieu Blanchette, and Tomi Pastinen. “A probabilistic approach for SNP discovery in high-throughput human resequencing data.” Genome Research 19, no. 9 (2009): 1542-1552. . https://doi.org/10.1101/ gr.092072.109. Hoeijmakers, Wieteke A. M., Richárd Bártfai, Kees-Jan Françoijs, and Hendrik G. Stunnenberg. “Linear amplification for deep sequencing.” Nature protocols 6, no. 7 (2011): 1026-1036. https://doi.org/10.1038/nprot.2011.345. Holley, Robert W., Jean Apgar, George A. Everett, James T. Madison, Mark Marquisee, Susan H. Merrill, John Robert Penswick, and Ada Zamir. “Structure of a ribonucleic acid.” Science (1965): 1462-1465. Holley, Robert W., Jean Apgar, Susan H. Merrill, and Paul L. Zubkoff. “Nucleotide and oligonucleotide compositions of the alanine-, valine-, and tyrosine-acceptor “soluble” ribonucleic acids of yeast.” Journal of the American Chemical Society 83, no. 23 (1961): 4861-4862. Huang, Weichun, and Gabor Marth. “EagleView: a genome assembly viewer for nextgeneration sequencing technologies.” Genome research 18, no. 9 (2008): 1538-1543. 
https://doi.org/10.1101/gr.076067.108. Hunkapiller, Tom, R. J. Kaiser, B. F. Koop, and Leroy Hood. “Large-scale and automated DNA sequence determination.” Science 254, no. 5028 (1991): 59-67. https://doi.org/10.1126/science.1925562. Hutchison III, Clyde A. “DNA sequencing: bench to bedside and beyond.” Nucleic acids research 35, no. 18 (2007): 6227-6237. https://doi.org/10.1093/nar/gkm688. Hutter, Daniel, Myong-Jung Kim, Nilesh Karalkar, Nicole A. Leal, Fei Chen, Evan Guggenheim, Visa Visalakshi, Jerzy Olejnik, Steven Gordon, and Steven A. Benner. “Labeled nucleoside triphosphates with reversibly terminating aminoalkoxyl groups.” Nucleosides, Nucleotides and Nucleic Acids 29, no. 11-12 (2010): 879-895. https://doi.org/10.1080/15257770.2010.536191. Hyman, Edward David. “A new method of sequencing DNA.” Analytical biochemistry 174, no. 2 (1988): 423-436. https://doi.org/10.1016/0003-2697(88)90041-3. Idury, Ramana M., and Michael S. Waterman. “A new algorithm for DNA sequence assembly.” Journal of computational biology 2, no. 2 (1995): 291-306. https://doi.org/10.1089/cmb.1995.2.291.

Iorns, Elizabeth, Christopher J. Lord, Nicholas Turner, and Alan Ashworth. “Utilizing RNA interference to enhance cancer drug discovery.” Nature reviews Drug discovery 6, no. 7 (2007): 556-568. https://doi.org/10.1038/nrd2355. Jackson, David A., Robert H. Symons, and Paul Berg. “Biochemical method for inserting new genetic information into DNA of Simian Virus 40: circular SV40 DNA molecules containing lambda phage genes and the galactose operon of Escherichia coli.” Proceedings of the National Academy of Sciences 69, no. 10 (1972): 2904-2909. https://doi.org/10.1073/pnas.69.10.2904. Ju, J., Kim, D. H., Guo, J., Meng, Q., Li, Z., Cao, H., et al. DNA sequencing with nonfluorescent nucleotide reversible terminators and cleavable label modified nucleotide terminators. PCT Int Appl Publ. (2008) WO2009054922. (2008). Kambara, Hideki, Tetsuo Nishikawa, Yoshiko Katayama, and Tomoaki Yamaguchi. “Optimization of parameters in a DNA sequenator using fluorescence detection.” Bio/Technology 6, no. 7 (1988): 816-821. https://doi.org/10.1038/nbt0788-816. Kasianowicz, John J., Eric Brandin, Daniel Branton, and David W. Deamer. “Characterization of individual polynucleotide molecules using a membrane channel.” Proceedings of the National Academy of Sciences 93, no. 24 (1996): 13770-13773. https://doi.org/10.1073/pnas.93.24.13770. Kent, W. James. “BLAT—the BLAST-like alignment tool.” Genome research 12, no. 4 (2002): 656-664. https://doi.org/10.1101/gr.229202. Kozarewa, Iwanka, Zemin Ning, Michael A. Quail, Mandy J. Sanders, Matthew Berriman, and Daniel J. Turner. “Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+ C)-biased genomes.” Nature methods 6, no. 4 (2009): 291-295. https://doi.org/10.1038/nmeth.1311. Levene, Michael J., Jonas Korlach, Stephen W. Turner, Mathieu Foquet, Harold G. Craighead, and Watt W. Webb. “Zero-mode waveguides for single-molecule analysis at high concentrations.” science 299, no. 
5607 (2003): 682-686. https://doi.org/10.1126/science.1079700. Li, R., Y. Li, K. Kristiansen and J. Wang. “SOAP: short oligonucleotide alignment program.” Bioinformatics (Oxford (2008): 714. https://doi.org/10.1093/ bioinformatics/btn025. Li, Ruiqiang, Chang Yu, Yingrui Li, Tak-Wah Lam, Siu-Ming Yiu, Karsten Kristiansen, and Jun Wang. “SOAP2: an improved ultrafast tool for short read alignment.” Bioinformatics 25, no. 15 (2009): 1966-1967. https://doi.org/10.1093/ bioinformatics/btp336. Li, Zengmin, Xiaopeng Bai, Hameer Ruparel, Sobin Kim, Nicholas J. Turro, and Jingyue Ju. “A photocleavable fluorescent nucleotide for DNA sequencing and analysis.” Proceedings of the National Academy of Sciences 100, no. 2 (2003): 414-419. https://doi.org/10.1073/pnas.242729199. Lister, Ryan, and Joseph R. Ecker. “Finding the fifth base: genome-wide sequencing of cytosine methylation.” Genome research 19, no. 6 (2009): 959-966. https://doi.org/10.1101/gr.083451.108. Litosh, Vladislav A., Weidong Wu, Brian P. Stupi, Jinchun Wang, Sidney E. Morris, Megan N. Hersh, and Michael L. Metzker. “Improved nucleotide selectivity and termination of 3′-OH unblocked reversible terminators by molecular tuning of 2-

nitrobenzyl alkylated HOMedU triphosphates.” Nucleic acids research 39, no. 6 (2011): e39-e39. https://doi.org/10.1093/nar/gkq1293. Lizardi, Paul M., Xiaohua Huang, Zhengrong Zhu, Patricia Bray-Ward, David C. Thomas, and David C. Ward. “Mutation detection and single-molecule counting using isothermal rolling-circle amplification.” Nature genetics 19, no. 3 (1998): 225-232. https://doi.org/10.1038/898. Lohmann, Katja, and Christine Klein. “Next generation sequencing and the future of genetic diagnosis.” Neurotherapeutics 11, no. 4 (2014): 699-707. https://doi.org/10.1007/s13311-014-0288-8. Luckey, John A., Howard Drossman, Anthony J. Kostichka, David A. Mead, Jonathan D’Cunha, Tracy B. Norris, and Lloyd M. Smith. “High speed DNA sequencing by capillary electrophoresis.” Nucleic acids research 18, no. 15 (1990): 4417-4421. https://doi.org/10.1093/nar/18.15.4417. Maher, Christopher A., Chandan Kumar-Sinha, Xuhong Cao, Shanker Kalyana-Sundaram, Bo Han, Xiaojun Jing, Lee Sam, Terrence Barrette, Nallasivam Palanisamy, and Arul M. Chinnaiyan. “Transcriptome sequencing to detect gene fusions in cancer.” Nature 458, no. 7234 (2009): 97-101. https://doi.org/10.1038/nature07638. Maher, Christopher A., Nallasivam Palanisamy, John C. Brenner, Xuhong Cao, Shanker Kalyana-Sundaram, Shujun Luo, Irina Khrebtukova et al. “Chimeric transcript discovery by paired-end transcriptome sequencing.” Proceedings of the National Academy of Sciences 106, no. 30 (2009): 12353-12358. https://doi.org/10.1073/ pnas.0904720106. Malhis, Nawar, and Steven J. M. Jones. “High quality SNP calling using Illumina data at shallow coverage.” Bioinformatics 26, no. 8 (2010): 1029-1035. https://doi.org/10.1093/bioinformatics/btq092. Malhis, Nawar, Yaron S. N. Butterfield, Martin Ester, and Steven J. M. Jones. “Slider— maximum use of probability information for alignment of short sequence reads and SNP detection.” Bioinformatics 25, no. 1 (2009): 6-13. https://doi.org/10.1093/ bioinformatics/btn565. 
Manolio, Teri A. “Genomewide association studies and assessment of the risk of disease.” New England journal of medicine 363, no. 2 (2010): 166-176. https://doi.org/ 10.1056/NEJMra0905980. Manolio, Teri A., Francis S. Collins, Nancy J. Cox, David B. Goldstein, Lucia A. Hindorff, David J. Hunter, Mark I. McCarthy et al. “Finding the missing heritability of complex diseases.” Nature 461, no. 7265 (2009): 747-753. https://doi.org/10.1038/ nature08494. Margulies, Marcel, Michael Egholm, William E. Altman, Said Attiya, Joel S. Bader, Lisa A. Bemben, Jan Berka et al. “Genome sequencing in microfabricated high-density picolitre reactors.” Nature 437, no. 7057 (2005): 376-380. https://doi.org/ 10.1038/nature03959. Marth, Gabor T., Ian Korf, Mark D. Yandell, Raymond T. Yeh, Zhijie Gu, Hamideh Zakeri, Nathan O. Stitziel, LaDeana Hillier, Pui-Yan Kwok, and Warren R. Gish. “A general approach to single-nucleotide polymorphism discovery.” Nature genetics 23, no. 4 (1999): 452-456. https://doi.org/10.1038/70570.

Next-Generation Sequencing

223

Maxam, Allan M., and Walter Gilbert. “A new method for sequencing DNA.” Proceedings of the National Academy of Sciences 74, no. 2 (1977): 560-564. https://doi.org/10.1073/pnas.74.2.560. McKernan, K. J., Peckham, H. E., Costa, G. L., McLaughlin, S. F., et al. Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding. “ Genome Res. (2009) 19 (9): 15271541. Melamede, Robert J. “Automatable process for sequencing nucleotide.” U.S. Patent 4,863,849, issued September 5, 1989. Metzker, Michael L. “Sequencing technologies—the next generation.” Nature reviews genetics 11, no. 1 (2010): 31-46. https://doi.org/10.1038/nrg2626. Milne, Iain, Micha Bayer, Linda Cardle, Paul Shaw, Gordon Stephen, Frank Wright, and David Marshall. “Tablet—next generation sequence assembly visualization.” Bioinformatics 26, no. 3 (2010): 401-402. https://doi.org/10.1093/bioinformatics/ btp666. Mokry, Michal, Harma Feitsma, Isaac J. Nijman, Ewart de Bruijn, Pieter J. van der Zaag, Victor Guryev, and Edwin Cuppen. “Accurate SNP and mutation detection by targeted custom microarray-based genomic enrichment of short-fragment sequencing libraries.” Nucleic acids research 38, no. 10 (2010): e116-e116. https://doi.org/10.1093/nar/gkq072. Morozova, Olena, and Marco A. Marra. “Applications of next-generation sequencing technologies in functional genomics.” Genomics 92, no. 5 (2008): 255-264. Nanopore, Oxford. “Oxford Nanopore announcement sets sequencing sector abuzz.” Nature biotechnology 30, no. 4 (2012): 295. https://doi.org/10.1038/nbt0412-295. Ng, Sarah B., Emily H. Turner, Peggy D. Robertson, Steven D. Flygare, Abigail W. Bigham, Choli Lee, Tristan Shaffer et al. “Targeted capture and massively parallel sequencing of 12 human exomes.” Nature 461, no. 7261 (2009): 272-276. https://doi.org/10.1038/nature08250. Niedringhaus, Thomas P., Denitsa Milanova, Matthew B. Kerby, Michael P. Snyder, and Annelise E. Barron. 
“Landscape of next-generation sequencing technologies.” Analytical chemistry 83, no. 12 (2011): 4327-4341. https://doi.org/10.1021/ ac2010857. Ning, Zemin, Anthony J. Cox, and James C. Mullikin. “SSAHA: a fast search method for large DNA databases.” Genome research 11, no. 10 (2001): 1725-1729. https://doi.org/10.1101/gr.194201. Nyrén, Pål, and Arne Lundin. “Enzymatic method for continuous monitoring of inorganic pyrophosphate synthesis.” Analytical biochemistry 151, no. 2 (1985): 504-509. https://doi.org/10.1016/0003-2697(85)90211-8. Oyola, S. O., T. D. Otto, Y. Gu, G. Maslen, M. Manske, S Campino, D. J. Turner, B. Macinnis, D. P. Kwiatkowski, H. P. Swerdlow and M. A. Quail. “Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes.” BMC genomics (2012): 1. https://doi.org/10.1186/1471-2164-13-1. Pareek, Chandra Shekhar, Rafal Smoczynski, and Andrzej Tretyn. “Sequencing technologies and genome sequencing.” Journal of applied genetics 52, no. 4 (2011): 413-435. https://doi.org/10.1007/s13353-011-0057-x.

224

Asif Nadeem and Maryam Javed

Perelygina, Ludmila, Li Zhu, Holley Zurkuhlen, Ryan Mills, Mark Borodovsky, and Julia K. Hilliard. “Complete sequence and comparative analysis of the genome of herpes B virus (Cercopithecine herpesvirus 1) from a rhesus monkey.” Journal of virology 77, no. 11 (2003): 6167-6177. https://doi.org/10.1128/JVI.77.11.6167-6177.2003. Pettersson, Erik, Joakim Lundeberg, and Afshin Ahmadian. “Generations of sequencing technologies.” Genomics 93, no. 2 (2009): 105-111. https://doi.org/10.1016/ j.ygeno.2008.10.003. Pevzner, Pavel A., Mark Yu Borodovsky, and Anrey A. Mironov. “Linguistics of nucleotide sequences II: stationary words in genetic texts and the zonal structure of DNA.” Journal of Biomolecular Structure and Dynamics 6, no. 5 (1989): 1027-1038. https://doi.org/10.1080/07391102.1989.10506529. Piepenburg, Olaf, Colin H. Williams, Derek L. Stemple, and Niall A. Armes. “DNA detection using recombination proteins.” PLoS biology 4, no. 7 (2006): e204. https://doi.org/10.1371/journal.pbio.0040204. Pleasance, Erin D., R. Keira Cheetham, Philip J. Stephens, David J. McBride, Sean J. Humphray, Chris D. Greenman, Ignacio Varela et al. “A comprehensive catalogue of somatic mutations from a human cancer genome.” Nature 463, no. 7278 (2010): 191196. https://doi.org/10.1038/nature08658. Prober, James M., George L. Trainor, Rudy J. Dam, Frank W. Hobbs, Charles W. Robertson, Robert J. Zagursky, Anthony J. Cocuzza, Mark A. Jensen, and Kirk Baumeister. “A system for rapid DNA sequencing with fluorescent chain-terminating dideoxynucleotides.” Science 238, no. 4825 (1987): 336-341. https://doi.org/10.1126/science.2443975. Quail, Michael A., Iwanka Kozarewa, Frances Smith, Aylwyn Scally, Philip J. Stephens, Richard Durbin, Harold Swerdlow, and Daniel J. Turner. “A large genome center’s improvements to the Illumina sequencing system.” Nature methods 5, no. 12 (2008): 1005-1010. https://doi.org/10.1038/nmeth.1270. Quail, Michael A., Miriam Smith, Paul Coupland, Thomas D. Otto, Simon R. 
Harris, Thomas R. Connor, Anna Bertoni, Harold P. Swerdlow, and Yong Gu. “A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers.” BMC genomics 13, no. 1 (2012): 1-13. https://doi.org/10.1186/1471-2164-13-341. Quick, Joshua, Aaron R. Quinlan, and Nicholas J. Loman. “A reference bacterial genome dataset generated on the MinION™ portable single-molecule nanopore sequencer.” Gigascience 3, no. 1 (2014): 2047-217X. https://doi.org/10.1186/2047-217X-3-22. Quinlan, Aaron R., Donald A. Stewart, Michael P. Strömberg, and Gábor T. Marth. “Pyrobayes: an improved base caller for SNP discovery in pyrosequences.” Nature methods 5, no. 2 (2008): 179-181. https://doi.org/10.1038/nmeth.1172. Ronaghi, Mostafa, Mathias Uhlén, and Pål Nyrén. “A sequencing method based on realtime pyrophosphate.” Science 281, no. 5375 (1998): 363-365. https://doi.org/10. 1126/science.281.5375.363. Ronaghi, Mostafa, Samer Karamohamed, Bertil Pettersson, Mathias Uhlén, and Pål Nyrén. “Real-time DNA sequencing using detection of pyrophosphate release.” Analytical biochemistry 242, no. 1 (1996): 84-89. https://doi.org/10.1006/ abio.1996.0432.

Next-Generation Sequencing

225

Ronaghi, Mostafa. “Improved performance of pyrosequencing using single-stranded DNAbinding protein.” Analytical biochemistry 286, no. 2 (2000): 282-288. https://doi.org/10.1006/abio.2000.4808. Rothberg, Jonathan M., Wolfgang Hinz, Todd M. Rearick, Jonathan Schultz, William Mileski, Mel Davey, John H. Leamon et al. “An integrated semiconductor device enabling non-optical genome sequencing.” Nature 475, no. 7356 (2011): 348-352. https://doi.org/10.1038/nature10242. Saiki, R. K., D. H. Gelfand, S. Stoffel, S. J. Scharf, R. Higuchi, G. T. Horn, K. B. Mullis and H. A. Erlich. “Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase.” Science (New York (1988): 491. https://doi.org/10.1126/science.2448875. Saiki, Randall K., Stephen Scharf, Fred Faloona, Kary B. Mullis, Glenn T. Horn, Henry A. Erlich, and Norman Arnheim. “Enzymatic amplification of beta-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia.” Science 230, no. 4732 (1985): 1350-1354. https://doi.org/10.1126/science.2999980. Sanger, Fred, and Alan R. Coulson. “A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase.” Journal of molecular biology 94, no. 3 (1975): 441-448. https://doi.org/10.1016/0022-2836(75)90213-2. Sanger, Frederick, George G. Brownlee, and Bart G. Barrell. “A two-dimensional fractionation procedure for radioactive nucleotides.” Journal of molecular biology 13, no. 2 (1965): 373-IN4. https://doi.org/10.1016/S0022-2836(65)80104-8. Sanger, Frederick, Gilian M. Air, Bart G. Barrell, Nigel L. Brown, Alan R. Coulson, John C. Fiddes, C. A. Hutchison, Patrick M. Slocombe, and Mo Smith. “Nucleotide sequence of bacteriophage φX174 DNA.” nature 265, no. 5596 (1977): 687-695. https://doi.org/10.1038/265687a0. Sanger, Frederick, Steven Nicklen, and Alan R. Coulson. “DNA sequencing with chainterminating inhibitors.” Proceedings of the national academy of sciences 74, no. 12 (1977): 5463-5467. 
https://doi.org/10.1073/pnas.74.12.5463. Schadt, Eric E., Steve Turner, and Andrew Kasarskis. “A window into third-generation sequencing.” Human molecular genetics 19, no. R2 (2010): R227-R240. https://doi.org/10.1093/hmg/ddq416. Shah, Sohrab P., Martin Köbel, Janine Senz, Ryan D. Morin, Blaise A. Clarke, Kimberly C. Wiegand, Gillian Leung et al. “Mutation of FOXL2 in granulosa-cell tumors of the ovary.” New England Journal of Medicine 360, no. 26 (2009): 2719-2729. https://doi.org/10.1056/NEJMoa0902542. Shendure, Jay, and Hanlee Ji. “Next-generation DNA sequencing.” Nature biotechnology 26, no. 10 (2008): 1135-1145. https://doi.org/10.1038/nbt1486. Shendure, Jay, Gregory J. Porreca, Nikos B. Reppas, Xiaoxia Lin, John P. McCutcheon, Abraham M. Rosenbaum, Michael D. Wang, Kun Zhang, Robi D. Mitra, and George M. Church. “Accurate multiplex polony sequencing of an evolved bacterial genome.” Science 309, no. 5741 (2005): 1728-1732. https://doi.org/10.1126/ science.1117389. Simpson, Jared T., Kim Wong, Shaun D. Jackman, Jacqueline E. Schein, Steven JM Jones, and Inanç Birol. “ABySS: a parallel assembler for short read sequence data.” Genome

226

Asif Nadeem and Maryam Javed

research 19, no. 6 (2009): 1117-1123. https://doi.org/10.1101/ gr.089532.108. Smith, Lloyd M., Steven Fung, Michael W. Hunkapiller, Tim J. Hunkapiller, and Leroy E. Hood. “The synthesis of oligonucleotides containing an aliphatic amino group at the 5′ terminus: synthesis of fluorescent DNA primers for use in DNA sequence analysis.” Nucleic acids research 13, no. 7 (1985): 2399-2412. https://doi.org/ 10.1093/nar/13.7.2399. Staden, Rodger. “A strategy of DNA sequencing employing computer programs.” Nucleic acids research 6, no. 7 (1979): 2601-2610. https://doi.org/10.1093/nar/6.7.2601. Stein, Lincoln D. “The case for cloud computing in genome informatics.” Genome biology 11, no. 5 (2010): 1-7. https://doi.org/10.1186/gb-2010-11-5-207. Stratton, Michael R., Peter J. Campbell, and P. Andrew Futreal. “The cancer genome.” Nature 458, no. 7239 (2009): 719-724. https://doi.org/10.1038/nature07943. Stupi, Brian P., Hong Li, Jinchun Wang, Weidong Wu, Sidney E. Morris, Vladislav A. Litosh, Jesse Muniz, Megan N. Hersh, and Michael L. Metzker. “Stereochemistry of benzylic carbon substitution coupled with ring modification of 2‐nitrobenzyl groups as key determinants for fast‐cleaving reversible terminators.” Angewandte Chemie International Edition 51, no. 7 (2012): 1724-1727. https://doi.org/10.1002/ anie.201106516. Sundquist, Andreas, Mostafa Ronaghi, Haixu Tang, Pavel Pevzner, and Serafim Batzoglou. “Whole-genome sequencing and assembly with high-throughput, short-read technologies.” PloS one 2, no. 5 (2007): e484. https://doi.org/10.1371/journal. pone.0000484. Swerdlow, Harold, and Raymond Gesteland. “Capillary gel electrophoresis for rapid, high resolution DNA sequencing.” Nucleic acids research 18, no. 6 (1990): 1415-1419. https://doi.org/10.1093/nar/18.6.1415. Tawfik, Dan S., and Andrew D. Griffiths. “Man-made cell-like compartments for molecular evolution.” Nature biotechnology 16, no. 7 (1998): 652-656. https://doi.org/10.1038/nbt0798-652. 
Teytelman, Leonid, Bilge Özaydın, Oliver Zill, Philippe Lefrançois, Michael Snyder, Jasper Rine, and Michael B. Eisen. “Impact of chromatin structures on DNA processing for genomic analyses.” PloS one 4, no. 8 (2009): e6700. https://doi.org/10.1371/journal.pone.0006700. Tucker, Tracy, Marco Marra, and Jan M. Friedman. “Massively parallel sequencing: the next big thing in genetic medicine.” The American Journal of Human Genetics 85, no. 2 (2009): 142-154. https://doi.org/10.1016/j.ajhg.2009.06.022. Turcatti, Gerardo, Anthony Romieu, Milan Fedurco, and Ana-Paula Tairi. “A new class of cleavable fluorescent nucleotides: synthesis and optimization as reversible terminators for DNA sequencing by synthesis.” Nucleic acids research 36, no. 4 (2008): e25-e25. https://doi.org/10.1093/nar/gkn021. Van Dijk, Erwin L., Hélène Auger, Yan Jaszczyszyn, and Claude Thermes. “Ten years of next-generation sequencing technology.” Trends in genetics 30, no. 9 (2014): 418426. https://doi.org/10.1016/j.tig.2014.07.001.

Next-Generation Sequencing

227

Van Dijk, Erwin L., Yan Jaszczyszyn, and Claude Thermes. “Library preparation methods for next-generation sequencing: tone down the bias.” Experimental cell research 322, no. 1 (2014): 12-20. https://doi.org/10.1016/j.yexcr.2014.01.008. Voelkerding, Karl V., Shale A. Dames, and Jacob D. Durtschi. “Next-generation sequencing: from basic research to diagnostics.” Clinical chemistry 55, no. 4 (2009): 641-658. https://doi.org/10.1373/clinchem.2008.112789. Wang, Jun, Wei Wang, Ruiqiang Li, Yingrui Li, Geng Tian, Laurie Goodman, Wei Fan et al. “The diploid genome sequence of an Asian individual.” Nature 456, no. 7218 (2008): 60-65. https://doi.org/10.1038/nature07484. Warren, René L., Granger G. Sutton, Steven JM Jones, and Robert A. Holt. “Assembling millions of short DNA sequences using SSAKE.” Bioinformatics 23, no. 4 (2007): 500-501. https://doi.org/10.1093/bioinformatics/btl629. Watson, James D., and Francis H. C. Crick. “Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid.” Nature 171, no. 4356 (1953): 737-738. https://doi.org/10.1038/171737a0. Weiss, M. M., B. Zwaag, J. D. Jongbloed and M. Vogel. “Best practice guidelines for the use of next-generation sequencing applications in genome diagnostics: a national collaborative study of Dutch genome diagnostic laboratories.” Human mutation 34 (2013): 1321. https://doi.org/10.1002/humu.22368. Welch, John S., Peter Westervelt, Li Ding, David E. Larson, Jeffery M. Klco, Shashikant Kulkarni, John Wallis et al. “Use of whole-genome sequencing to diagnose a cryptic fusion oncogene.” Jama 305, no. 15 (2011): 1577-1584. https://doi.org/10.1001/ jama.2011.497. Wu, Jian, Shenglong Zhang, Qinglin Meng, Huanyan Cao, Zengmin Li, Xiaoxu Li, Shundi Shi et al. “3′-O-modified nucleotides as reversible terminators for pyrosequencing.” Proceedings of the National Academy of Sciences 104, no. 42 (2007): 16462-16467. https://doi.org/10.1073/pnas.0707495104. Wu, R. 1970. “Nucleotide Sequence Analysis of DNA. I. 
Partial Sequence of the Cohesive Ends of Bacteriophage Lambda and 186 DNA.” Journal of Molecular Biology 51 (3): 501–21. https://doi.org/10.1016/0022-2836(70)90004-5. Wu, Ray, and A. D. Kaiser. “Structure and base sequence in the cohesive ends of bacteriophage lambda DNA.” Journal of molecular biology 35, no. 3 (1968): 523-537. https://doi.org/10.1016/S0022-2836(68)80012-9. Xuan, Jiekun, Ying Yu, Tao Qing, Lei Guo, and Leming Shi. “Next-generation sequencing in the clinic: promises and challenges.” Cancer letters 340, no. 2 (2013): 284-295. https://doi.org/10.1016/j.canlet.2012.11.025. Yilmaz, Suzan, and Anup K. Singh. “Single cell genome sequencing.” Current opinion in biotechnology 23, no. 3 (2012): 437-443. https://doi.org/10.1016/j.copbio. 2011.11.018. Zallen, Doris T. “Despite Franklin’s work, Wilkins earned his Nobel.” Nature 425, no. 6953 (2003): 15-15. https://doi.org/10.1038/425015b. Zerbino, Daniel R., and Ewan Birney. “Velvet: algorithms for de novo short read assembly using de Bruijn graphs.” Genome research 18, no. 5 (2008): 821-829. https://doi.org/10.1101/gr.074492.107.

228

Asif Nadeem and Maryam Javed

Zhao, Qi, Otavia L. Caballero, Samuel Levy, Brian J. Stevenson, Christian Iseli, Sandro J. De Souza, Pedro A. Galante et al. “Transcriptome-guided characterization of genomic rearrangements in a breast cancer cell line.” Proceedings of the National Academy of Sciences 106, no. 6 (2009): 1886-1891. https://doi.org/10.1073/pnas.0812945106.

Chapter 11

Next-Generation Sequencing Technologies for Livestock Improvement

Abstract

With next-generation sequencing (NGS) technology, the complete omes (genome, transcriptome, proteome, etc.) of virtually any organism on Earth can now be sequenced more rapidly, more cheaply, and in far greater depth than was previously possible. With continued rapid advances in high-throughput sequencing technologies, we foresee sequencing hundreds, or possibly thousands, of related or unrelated genomes to assess genetic variation within and among organisms or their populations, instead of sequencing the genome of just one individual. Current sequencing tools allow researchers to examine complex mixtures of RNA and DNA samples in ways that were previously unimaginable. NGS is developing into a kind of molecular scanner that is used in almost all areas of biological research. Over the last decade, NGS technologies and methodologies have improved, and sequencing quality has reached a point where NGS may be employed in the diagnosis of human diseases and in forensic investigations. In agriculture and animal production, genetic improvement can be accelerated once the relevant genes and their interactions with the surrounding environment have been identified. Although sequencing technologies have advanced tremendously since their conception, an important remaining challenge is to improve the methods used to analyze the huge amounts of data they produce, along with accuracy, cost, and turnaround time, and to build efficient algorithms for research.

Keywords: NGS, omes, diagnosis, genetic improvement, analyzing large amounts of data


Asif Nadeem and Maryam Javed

Introduction

Next-generation sequencing (NGS) has wide-ranging applications in the biological sciences and has driven rapid progress in identifying genes and their regulators in pathological processes. Through whole-genome sequencing, NGS is also providing a wealth of information for comparative biology. Through bacterial and viral DNA/RNA sequencing, it contributes to public health and epidemiology, ultimately helping to identify novel virulence factors. Gene expression profiling by RNA-seq is expected to replace microarray analysis. Forensic science is also benefiting from NGS. This chapter reviews some applications of NGS.

Exome and Targeted Sequencing About